Or controlling your IoT by voice.
Two days ago I was nowhere with this – and today I “stand on the cusp” of having full voice control, with just one minor issue bugging me. As this subject has received so much attention, I thought I’d better move people on to the matter at hand, as most of it is now resolved.
If you don’t have an Android device – or home control or MQTT – probably best to read no further.
Having defined my audience…. we’ll move on. I started looking for some means to divert Google Now and the general speech recognition of Google to actually controlling something as against finding things on the web. I put up an article in this very blog – and the number of responses was amazing. Clearly this is of interest.
So, the basis of this is that you have some kind of home control and that you are running or thinking of running Node-Red on some device (PC, Raspberry Pi, whatever) to control stuff. Much experience has led me to believe this is definitely the way forward, having the very reliable Node-Red as the central point to control everything. We’re talking I/O from ESP, Arduino and a host of other hardware gadgets, talking to them via Blynk, websockets, email, Pushbullet, databases, you name it.
So I can control stuff (as you’ll see in other items in the blog) using websockets, Blynk, Nextion serial displays and a host of other stuff – but up until now, controlling by voice was not on the cards.
It is now!
Starting with an HTC ONE M8 phone in my case, I added the following items:
- Autovoice
- Tasker
- MQTT Publisher Plugin
There is little initial setup to do – just load these onto your phone. The first two are not free (but are dirt cheap) – but Tasker is SO useful you’ll want it anyway if you don’t already have it – what a bargain.
So the general idea here is to have a program trap the return information from Google – let’s say I were to say “execute lamp on”. I could make that a phrase to search for but that’s a bit restricting… so you can set Tasker+Autovoice to look for “execute”. I picked that word as it is easily recognised and I can’t imagine wanting to search for “execute” otherwise.
So depending how you go into this – you might say “Ok Google, Execute lamp on” or you may just press the little microphone button then say “execute lamp on”.
So what I wanted to achieve and DID achieve thanks to perseverance and a lot of help from you guys – was to get “execute” trapped – and then get “lamp on” passed to Node-Red via MQTT for further processing.
Before we get into the phone bit let me tell you about the Node-Red bit. Many years ago I designed a package to let others write simple text-based adventures and though that may seem a little wasted now it helped me make some good decisions when it comes to how Node-Red handles the text.
Google likes phrases that make sense – whereas we want the minimal. I wrote some code for Node-Red which you’ll see below (and this is just the start – trust me). Let’s say I want to turn the heating up, down, to a specific temperature or just to auto. So there are FOUR variations here and one of them involves a variable number. That could get messy.
In addition I’d like to be able to speak in English for the sake of my good lady who might have trouble remembering limited sets of instructions.
So what’s needed is the ability to say “execute set the heating to 15 degrees”. What the computer wants is “heating 15” – preferably as two arguments… the easiest way to do that is to separate the LAST argument off.
Here’s what I’ve done – you’ll no doubt immediately think of variations.
In a function in Node-Red…
// Strip out filler words so that, for example, "turn the salt lamp off"
// becomes "salt lamp off" and "set the heating to 15 degrees" becomes "set heating 15"
msg.payload = msg.payload.replace(/the /g, '');
msg.payload = msg.payload.replace(/to /g, '');
msg.payload = msg.payload.replace(/degrees /g, '');
msg.payload = msg.payload.replace(/degrees/g, '');
msg.payload = msg.payload.replace(/turn /g, '');

// A sentence may contain several commands separated by " and "
var incoming = msg.payload.split(" and ");

// msg goes to the first output (speech), msg2 and msg3 to the second and third
var msg2 = { payload: "", topic: "" };
var msg3 = { payload: "", topic: "" };

for (var a = 0; a < incoming.length; a++) {
    // Trim any stray spaces left by the replaces, then peel the LAST word
    // off as the argument - "on", "off" or a number
    var part = incoming[a].trim();
    var lastIndex = part.lastIndexOf(" ");
    var leftbit = part.substring(0, lastIndex);
    var rightbit = part.substring(lastIndex + 1);

    switch (leftbit) {
        case "salt lamp":
        case "salt":
            msg.payload = "Turning the salt lamp " + rightbit;
            msg2.topic = "dfgdfg";                  // placeholder
            msg2.payload = "dfgdfgdfg";             // placeholder
            msg3.topic = "orvibo/ACCF238D6DE6";
            msg3.payload = "salt " + rightbit;
            node.send([msg, msg2, msg3]);
            break;
        default:
            msg.payload = "I do not understand " + leftbit + "?";
            node.send([msg, null, null]);
            break;
    }
}
So – given the example “execute turn the salt lamp off”… Node-Red never sees “execute”; it only gets “turn the salt lamp off”. As it happens, I have a function to control an Orvibo socket (dealt with elsewhere in the blog) and it is expecting “salt on” or “salt off” as a message.
In the function above I have 3 outputs – one to a speech synth (Ivona – again dealt with earlier), the lamp and MQTT. Normally I’d just do everything by MQTT so don’t take this too literally.
First things first – remove unnecessary words, for example “a”, “the”, “turn”, “degrees” etc. You could do this the other way around but then you have to cater for numbers etc.
Then split the sentence on “ and “ and loop through the parts. For each part split off the LAST word as this is your argument “on”, “off” or a number.
And so now we can handle (I’ve only implemented the salt lamp above, but heating etc. is trivial to add) “execute turn the salt lamp on and the heating to 22 degrees”.
That comes down to two sentences… “salt lamp on” (or “salt on”) and “heating 22”
These are then easy to parse in a SWITCH statement, passing any voice (to come out of your speakers somewhere), any direct control – and MQTT messages.
Absolutely works a treat and if it has no idea what “turn boiler off” means it will say “I do not understand boiler” – similarly one could provide witty responses to invalid parameters “you must be kidding – heating at 5 degrees??”.
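By way of illustration, here is roughly how the heating might slot in as another case in the switch above (leftbit, rightbit, msg and msg3 already exist there). The topic “heating/control” and the 10–28 degree sanity range are purely made up for the example – substitute whatever your own kit expects.

        case "heating":
        case "set heating":
            var temp = parseInt(rightbit, 10);      // NaN for "up", "down", "auto"
            if (rightbit === "up" || rightbit === "down" || rightbit === "auto") {
                msg.payload = "Turning the heating " + rightbit;
                msg3.topic = "heating/control";     // placeholder topic - use your own
                msg3.payload = "heating " + rightbit;
                node.send([msg, null, msg3]);
            } else if (!isNaN(temp) && temp >= 10 && temp <= 28) {
                msg.payload = "Setting the heating to " + temp + " degrees";
                msg3.topic = "heating/control";     // placeholder topic - use your own
                msg3.payload = "heating " + temp;
                node.send([msg, null, msg3]);
            } else {
                msg.payload = "You must be kidding - heating at " + rightbit + " degrees?";
                node.send([msg, null, null]);
            }
            break;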
So all of that works – and it is just ITCHING for further development, which by the time you read this it will likely be getting – I’ve already stopped twice to get to the code you see above as I kept thinking of better ways. And please don’t say “you can do this all in Tasker” – I LIKE working in Node-Red.
You may by now be asking what the catch is. There isn’t one, other than one tiny item bugging me… I hope to get responses as to how to fix it.
Autovoice/Tasker is listening to what I have to say and sending all but the command word to MQTT – reliably – from power up – no problems.
However – and some of you have touched on this – depending on what I say after “execute” it may just pass the material on to MQTT and hence to Node-Red – OR it may first display something on the phone, or Google may even have a bash at responding – usually giving up after the first word or so.
I’ve seen your GREAT responses… so before we start, re: Tasker etc.. yes, I’m running Tasker in the foreground, yes I have “Autovoice Google Now integration” turned on. “Do Google Now Search” is disabled.
In other words, pretty much all of the things that are SUPPOSED to stop Chrome+Google sneaking in and saying something or displaying something – which does happen – but occasionally just a tad late – as if Tasker was not quite getting in there quickly enough to stop things.
OK, so having installed AutoVoice, Tasker and the MQTT plugin – here is what I’ve done – some of you might want to follow my lead – hopefully someone else will say “OH WELL THAT’S WHY YOU’RE HAVING A PROBLEM” – whoever you are, I am sitting here waiting to hear from you.
Open up TASKER… Select a new Event – plugin – Autovoice
After Autovoice you want the RECOGNISED option at the bottom of the menu (was just offscreen on my phone)
After recognise, you want “speak filter” – and in there say “command” or whatever clear, unambiguous word or phrase you want to use. When done – tick “contains all” and tick that box at the top. You are now done with the recognition bit.
So then you should be asked for a new TASK – if not – ask the people who designed Tasker because it worked for me (I’m doing all of this from scratch now as I build up this blog entry). Give the task a name – I chose to call it MQTT. Call it FRED if you like. Then you need an ACTION for that task.
Select the MQTT plugin, which just appears by magic (assuming you installed it). Fill in your MQTT details – you need address and port, plus user and pass (some brokers don’t have these, but I would not be using this outside the house without them) – and I used the TOPIC “voice”. The payload is
%avcommnofilter
That variable gets replaced by all the text in your phrase EXCEPT the keyword or phrase.
Hit Accept and if you’ve not messed up that’s it.
I then chose to add another ACTION (not task) – a BEEP. That’s in the PROFILE – click on the green MQTT next to the profile – then hit + and select ALERT then BEEP and make it 8kHz and 1000ms or less (but not 1ms, as you won’t hear anything).
Back out of all of that – press the mic button, say “command turn light on” – and back in your Node-Red you should have some text coming in via MQTT. You might want to test that with an MQTT client listening on the “voice” topic…
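If you want to double-check from the Node-Red side as well, a throwaway function node wired between the MQTT input node (subscribed to “voice”) and a debug node will show exactly what the phone sent – purely a debugging aid:

    // Debugging aid only: log whatever arrives on the "voice" topic
    // before any parsing is attempted, then pass it on unchanged.
    node.warn("Voice command received on '" + msg.topic + "': " + msg.payload);
    return msg;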
Now – given the above – does anyone know how to make changes so that Google does not try to interfere with this – it never stops the command working but seeing a brief flash of screen or a brief unwanted word – could do without that.
SO – I THEN took the phone (which has good mics) – put it on the wall and from the other side of the room tried “command turn salt lamp on” – and it worked. GRANTED it’s not that big a room – and the wife is in the USA right now and the TV was off and we don’t have dogs.
Star Trek here we come. Once someone gets me out of my final misery and stops the occasional Googleisms, I’m adding heating control (3 fixed – up, down, auto and one variable – ie 22). You could do the same with lamps (on, off, up, down, auto) etc… and if you make it so big that it slows down your Raspberry Pi – get something faster!!! The sky is the limit. I do this on a Raspberry Pi 2 with a fast SD – no problem – that and 100 other jobs.
So as an alternative to the BEEP – you could set up Pushbullet and have Node-Red send you a message when it gets a valid command – that way you know for SURE that something has happened. And that is in fact what I have now done – having experimented with lots of MQTT clients for Android, all of which seem less than reliable, to put it mildly.
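As a rough sketch of that – and assuming you have a Pushbullet node for Node-Red installed and configured with your own API key – a small function on one of the outputs above could format the confirmation before it goes off to the notification node. Whether msg.topic ends up as the title depends on the node you use, so treat that bit as an assumption:

    // Build a short confirmation note for the notification node that follows.
    // Most such nodes take the body from msg.payload; msg.topic as the title
    // is an assumption - check the docs of whichever node you wire this to.
    msg.topic = "Voice control";
    msg.payload = "Command accepted: " + msg.payload + " at " + new Date().toLocaleTimeString();
    return msg;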
If you are interested – make sure you SUBSCRIBE to this subject even if you don’t wish to comment – to make sure you get regular updates!
Is there an option for Node-Red to send back something which is turned into voice on the phone?
Well, Node-Red can use Ivona which produces very good speech. How you get that into a phone..
Oh, erm – Imperihome – you run the Imperihome app on your phone – with its own server in the background – and there’s an API that Node-Red can talk to to send speech – you must be inside the local network however.
I was playing around with it and I got it: Define a simple HTTP request and response in node-red, with a function in between. Put the text you want Google to say into msg.payload. In Tasker add an HTTP Get task, followed by a Say task where the text is the HTTP Data variable (%HTTPD). It is as simple as that.
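To make that concrete, here is a minimal sketch of the Node-Red half, assuming an “http in” node on a made-up path such as /speech, the function below, and an “http response” node to finish the flow. On the phone, a Tasker HTTP Get to that URL followed by a Say action using %HTTPD does the speaking, exactly as described above. The “status” query parameter is my own invention for the example:

    // For a GET request the "http in" node puts the query string parameters
    // into msg.payload. Turn them into a sentence for Tasker's Say action -
    // whatever leaves this node as msg.payload is what the phone will speak.
    var reply = "I did not catch that";
    if (msg.payload && msg.payload.status) {
        reply = "The salt lamp is now " + msg.payload.status;
    }
    msg.payload = reply;
    return msg;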
Now, this is easier and more elegant than Nextion :D…
I need to buy a cheap Android tablet or phone for testing when deals show up…
Thanks for the posting!!!!
I don’t know Tasker, and until I have my setup working I will not need it. But first I will try Automate (https://play.google.com/store/apps/details?id=com.llamalab.automate). From what I’ve seen it is similar to Tasker but is free 🙂 and is great. It can even command the ESP8266 to open my garage door when I’m arriving home!
RE Google leftovers… I’ve done a bit of testing and found that the command used can alter what happens, as outlined in the two cases below:
1) Use command filter of “SPEAK” and say
“OK Google, SPEAK, what is the weather today”
then 3 things happen
– The MQTT stuff works fine
– There is a Chrome webpage left hanging in background
– The “Google now” engine strips off “speak” and then instructs the phone to say something back like “The weather is…”
2) Use a command filter of “SAY” and say
“OK Google, SAY, what is the weather today”
then only 2 things happen (i.e. the phone doesn’t say anything back):
– The MQTT stuff works fine
– There is a Chrome webpage left hanging in background
*********************
Also, this is in the FAQ http://joaoapps.com/autovoice/faq/
“When I use the Google Now integration in AutoVoice, the Google Now search is also executed. Is there a way to prevent this?”
“No, there is not. But you can do a “Go Home” action in Tasker which will close Google Now and not show the search. This also cancels out any action Google Now might try to take like creating a reminder, or calling someone.”
I tried this with both cases above – with case (1), it doesn’t stop the speech from starting, but it does cut it off after a second or so (annoying); there is no difference with case (2).
I’ve also tried a few things to stop this small annoyance but I can’t stop it on my phone – it’s looking like the “Chrome cancel plugin” is the way to go now 🙁
Excellent – nice to see someone else struggling with this – let me know if you achieve miracles. A pal of mine is donating a phone (excellent phone but smashed screen) and the plan is to put this in my office behind a panel but for a hole for the microphone – I’ll have Tasker make sure it does not go to sleep – and also keep the screen turned off – and that should be a superb voice control system providing it doesn’t go off making calendar entries etc. The phone is one with a long battery life so that should protect it from any power cuts.
This has opened up other things to think about
– like IFTTT (IF This Then That) with Node-Red (it’s a sort of cloud-based version of Tasker)
– AI (to understand the spoken TTS message and map it to appropriate MQTT – with error messages e.g “did you mean this?” queries)
http://www.cleverbot.com/
There is so much scope here… imagine a tablet device “thing” on the wall – it’s always listening to your commands but only responds if it senses that someone is physically in front of it (IR sensor). If the tablet isn’t sure of a response to your spoken command, an animated face speaks back on the display saying “do you mean any of these commands?” whilst listing the options on a screen for you to select.
While this is going on, the tablet recognises who you are, it records and learns what you’ve selected and when.
Various tablets in the rooms monitor where you are – they are linked into your calendar and ring if you are receiving a phone call. You can answer the call as if it is on “speaker phone”……
Well, it gets better Pete – I’m already planning the tablet on the wall, having just changed the Google settings so that I can say “ok google” from anywhere – even the lock screen. I deliberately parsed only the end argument but of course – if/then/else would be relatively easy to add – and checking the state of things (albeit asynchronously) is already part of the whole Node-Red scenario. Yes, I can see this weekend has been a worthwhile effort – and strangely, since I scrapped the phone setup and started again for the purpose of the blog – the interference from Google seems to have died away somewhat – in particular voice responses – so that it is already absolutely useful. Add to that the fact that Google Now can perfectly adequately make calendar entries – I can see justifying sticking a hole in the wall for power 🙂
Sadly I can’t try this as-is because Tasker is Android only.
For iPhone user who may want to try this it looks like the starting point is Workflow (https://my.workflow.is/).
Or, get a cheap Android device and use it as a slave to the iPhone:
Pete – have you considered having a cheap Android device dedicated to doing the voice recognition, and then you could leave that permanently in the living room listening for command so you don’t even have to pull your phone out of your pocket to turn the lights on? 😉
Absolutely David – just dropped Aidan a note to see what he’s doing with his recently screen-smashed phone – doesn’t need a screen to listen for my voice!!! I tried it on the wall and talking from a distance and as long as you are clear and pointing at the device it seems fine – of course that will to some extent depend on the device and the quality of the microphone. Sometime soon I plan to test the Nexus 7 – it isn’t much use for anything else nowadays.