Category Archives: speech

Talking ImperiHome

I just added another useful tool to the home control arsenal – speech – not on the PC but on my phone. As regular readers know I use ImperiHome a lot – talking directly to Node-Red to control stuff – and that’s fine – it works – it’s not the only package – I also use Blynk. I just found another reason to concentrate on ImperiHome.

Today I discovered what the “API http server” in ImperiHome is for, at least, one use for it. You don’t normally need this with Node-Red as the phone talks to the Node-Red http link and that’s all that is needed. However I was aware that there was a TTS Engine available in ImperiHome but had never taken the time to find out what it could do.

Now I know. You can fire text at the API from Node-Red and it will speak the text, even when the phone is sitting idle and it sounds ok!!

So the first thing you need to know - in SETTINGS, GENERAL PREFERENCES in ImperiHome, tick the API server.

You need to know the IP address of your phone – so that kills using this outside of the house or office unless someone can think of a wheeze. You should ideally set up an IP-MAC binding on your router to ensure the phone always gets the same internal address when on the WIFI.

And with that done, you’re almost up and running.  In Node-Red you need this simple SUBFLOW – create a new sub-flow, call it “My phone” for want of a better name and create this.

The HTTP request node is empty – the work is done in the orange function – put this in the function…

msg.url="http://192.168.1.31:8080/api/rest/speech/tts?text=" + msg.payload;
return msg;

That’s it – you’re done. Whatever you send in the payload to that sub-flow will speak on the phone. Couldn’t be easier.  Oh, that IP address – that needs to be the IP address of your phone.
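As a sketch of what that function builds, here is the same URL construction in plain JavaScript, with the payload URL-encoded so spaces and punctuation don't break the request. The IP address and port 8080 are just the example values from above – substitute your own phone's address.

```javascript
// Sketch of the URL the subflow builds for ImperiHome's TTS API.
// The IP and port are examples from the post - use your phone's address.
function ttsUrl(phoneIp, text) {
    // encodeURIComponent keeps spaces and punctuation from breaking the request
    return "http://" + phoneIp + ":8080/api/rest/speech/tts?text=" +
        encodeURIComponent(text);
}
```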

June 2, 2016: I have found a slight bug. You can let the phone go idle and run other tasks and the speech continues to work – but if you go out of signal range, perhaps onto a mobile network, then come back in – with the mobile signal now off and the phone back on the same IP address – no speech. Bring the app to the foreground and the speech comes back. I reported it and for once got quite a quick response – they think it might be Google power management shutting the API down – so it looks like some re-writing may be in order. Will have to wait and see, but at least they’re aware of it.


Speech … the Final Frontier

Updated 02/01/2017

Buffered, SD-cached high-quality human-like speech working alongside sound effects for your IOT project.

Requirements: Raspberry Pi or similar with audio out, Node-Red and a little of your time.

But you don’t need this blog – there’s a WAY better way to do this – in my latest entry about the Ivona Node.

As you may know I’ve done a couple of blog items on Speech on the Raspberry Pi over time. I was quite happy with the Google Translate API until recently when they stopped making it available for free,  so I was forced to go off in search of alternatives and settled on local synthesizers like eSpeak – the problem being – they generally sound AWFUL. eSpeak sounds like someone’s dog being strangled.

And so it was that one of our readers (thank you for that) contacted me and suggested I take a look at Ivona. https://www.ivona.com/us/

This is new: I’ve just gutted the code from the original blog to produce a much better version – better thought out, with caching for off-line use, unifying speech and sound effects into an auto-created library. If external comms fail and you cannot access the outside world, Ivona is going to fail – and that happened here at Bedrock – SO the idea hit us to CACHE all the MP3 files (assuming you’re not going to do unique messages every time). That way, the SECOND time you ask for a message you will already have a file with that name – the total file space for dozens or even hundreds of these MP3 files is a tiny fraction of a typical SD card’s capacity and not even worth taking into consideration in most projects. Currently we are working on modifying the Ivona node to handle dynamically changing voices. More on that soon.

I originally did a video to accompany this blog - https://www.youtube.com/watch?v=qoxPVa48qRw

If you use the link I’ve provided above (to Ivona) and select a suitable voice on their web page then enter some text in the box, you’ll find it does a pretty good job. Ivona is free to developers – just grab a free account and get an API key and secret key. I spent the entire evening playing with the code and when I looked at the percentage of my “free use” – nothing  - so you’re unlikely to run out of free use and with the mods here even less so.

Take a tip: when you copy and paste that API key information (which you should do immediately on getting an account, as you won’t be able to retrieve the same key later), pass it through NOTEPAD (paste, then copy again) to get rid of any hidden characters.

So now you have an API key and secret key to be used in the Node-Red node Ivona (node-red-contrib-ivona) https://www.npmjs.com/package/node-red-contrib-ivona

Your API key for Node-Red goes in as the ACCESS code and the secret key is the PASSWORD!!! I used my email address for the username – this is all a bit non-intuitive, so beware.

Over to Node-Red. So what this node does is take in your TEXT, send it off to Ivona which returns an MP3 file with your chosen speech. You should also have MPG123 installed on your end computer (I’m using a Raspberry Pi2 for all of this). http://www.mpg123.de/

In the simplest case you would send off your text, get the MP3 file, send that to mpg123 for playback. But then you are stuck with a file… and what if you send 2 in quick succession – they will overlap each other as Node-Red runs asynchronously.

Here’s the solution, and it’s a lot better than the one I had in the past. You can fire off several speech requests, including requests for other .mp3 files. For special effects I have a bunch of MP3 files already stored – such as “alert” and “hailing frequencies open”.

Ivona speech

In the example above (that red block is NOT the Ivona node – it is a subflow I wrote – more in a minute)… let me show you those two INJECTS on the left..

Ivona speech

Ivona speech

The first has “alert” in the topic and some text “Red 1 logged in” – the second simply text.

I can click one, wait for it to speak and then click the second – or I can choose not to wait, clicking wildly – and they will still play in order. If you specify speech for the TOPIC AND the PAYLOAD, both will simply go into the queue in order.

How do I do that – so looking at the red Node-Red SUBFLOW…

flow

The yellowish blocks are user functions, the red block is a Node-Red EXEC node, there is a simple one-second delay in there for good measure – and the purple item is the node-red-contrib-ivona node.

I take in text… if the topic has the word “alert” in it – I put that on the queue BEFORE the main text – other than that there is NO difference between the two.

If there is no text, just a blank message coming in, I check the queue and if not empty, try to use the items on the queue (first in, first out) one at a time.

The INJECT node is needed to start the ball rolling for the first item in the queue. Once I find text in the queue, it is sent to the Ivona node IF a file of that name does not already exist – and then on to the mpg123 player – either way setting a BUSY flag so that those one-second ticks can’t pull another item off the queue until I’m done.

When done – I  send empty messages back into the input to trigger off any further items in the queue.
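The queue-and-busy-flag pattern described above can be sketched outside Node-Red as plain JavaScript. The names here are illustrative only (a real flow keeps the array and flag in context), but the ordering logic is the same: new text is pushed, and a periodic tick with an empty payload pulls the next item only when nothing is playing.

```javascript
// Minimal sketch of the subflow's queue logic: new text is pushed
// (topic first, e.g. an "alert" sound), and an empty-payload "tick"
// dequeues the next item only when the player is not busy.
function makeSpeechQueue() {
    const queue = [];
    let busy = false;
    return {
        // called with each incoming message; returns the text to play now, or null
        handle(msg) {
            if (msg.payload === "") {                     // a tick from the inject node
                if (busy || queue.length === 0) return null;
                busy = true;
                return queue.shift();                     // first in, first out
            }
            if (msg.topic !== "") queue.push(msg.topic);  // alert sound goes first
            queue.push(msg.payload);
            return null;
        },
        // called when mpg123 finishes, so the next tick can dequeue again
        done() { busy = false; }
    };
}
```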

Here is the main function:

var frompush = 0;
// initialise the queue and the busy flag on first run
if (typeof context.arr == "undefined" || !(context.arr instanceof Array)) context.arr = [];
if (typeof context.global.speech_busy == "undefined") context.global.speech_busy = 0;

// a blank payload is a tick: if not busy, pull the next item off the queue
if ((msg.payload === "") && (context.global.speech_busy === 0))
    if (context.arr.length) {
        frompush = 1;
        msg.payload = context.arr.shift();
    }

if (msg.payload !== "") {
    // new text: just push it (topic first, e.g. an alert sound) but not recursively
    if (frompush === 0) {
        if (msg.topic !== "") context.arr.push(msg.topic);
        context.arr.push(msg.payload);
        return;
    }

    // playing: set the busy flag and derive the cache file name from the text
    context.global.speech_busy = 1;
    msg.fname = msg.payload.replace(/ /g, '_');
    msg.fname = msg.fname.replace(/\./g, '');
    msg.fname = msg.fname.replace(/,/g, '');
    msg.fname = msg.fname.toLowerCase();
    msg.fname = "/home/pi/recordings/" + msg.fname + ".mp3";
    return msg;
}

 

Note the busy flag and the use of PUSH for the queue.

The “copy file to payload” is trivial – Ivona returns the filename in msg.file which is not where I want it.

msg.payload=msg.file;
return msg;

The reset flag function simply clears the busy flag and returns a blank message.

context.global.speech_busy=0;
msg.payload=""; return msg;

The trigger is the Node-Red DELAY function simply set to delay for one second and then pass the message on.

Ivona speech

 

The MPG123 EXEC node calls mpg123 and passes the file name as parameter. The DELETE node simply deletes the file that Ivona creates… Here’s the Ivona setup. Put your credentials in the top box.

moustache

Note the triple mustache {{{ }}} – the Ivona examples use a double – but the double version escapes characters such as slashes, and we don’t want that because we have a file path in there.

And that is about it – works a treat and produces high quality buffered speech – for free – for your IOT endeavours.

Pick your own file directory (note that I used /home/pi/recordings but that isn’t in any way special) and for any words or phrases you want SOUNDS for instead of voice – simply replace them with files of the same name (note that spaces in file names are replaced by underscores). So “alert 2” as a file name would be “alert_2.mp3”.
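That text-to-filename rule (spaces to underscores, dots and commas stripped, all lower case) can be sketched on its own, which makes it easy to check which cache file a given phrase will map to. The directory is just the example path used in this post.

```javascript
// Sketch of the cache-filename rule used in the main function:
// spaces become underscores, dots and commas are stripped, all lower case.
// /home/pi/recordings is just the example directory from the post.
function cacheName(text) {
    return "/home/pi/recordings/" +
        text.replace(/ /g, "_")
            .replace(/\./g, "")
            .replace(/,/g, "")
            .toLowerCase() + ".mp3";
}
```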

Of course I'm always on the lookout for alternatives, especially those which profess to need no connection.

https://mimic.mycroft.ai/

This looked great – and the short example in their video sounded like a human. Sadly, on installing this on the Pi (it can't play without drivers which don't come on the Pi as standard – but it can generate files) I gave it my favourite mickey-take of a certain politician – "I'm a little tea-pot short and stout, open my mouth and shite comes out"... and I have to say it made an UTTER and complete mess of it, sounding like a quite ill Dalek. Think I'll stick with Ivona.


IOT Speech recognition

Here’s a thought – now this might be available – but I can’t find it..

Most of us have phones – some Android, some Apple, a few Microsoft. So sticking with Android for now – the latest Android phones (I have the HTC One M8 with Android 6.0, which is marvellous) handle speech recognition well. “Remind me to turn the cooker off in 5 minutes.” Mind you, it never did actually remind me, but never mind.

So all of that works.. but what would be REALLY NICE – if the same exact thing could recognise “NODERED”..

So how about “NODERED turn kitchen light on”.

ALL that is needed is for that entire sentence – but only sentences that begin with NODERED – to be sent by whatever means – MQTT maybe – to the Node-Red installation. From there the most trivial of line parsers could figure out what you were trying to achieve – and do it for you.

SEE THIS UPDATE: https://tech.scargill.net/from-zero-to-star-trek/

 


Node Red Speech – The Sequel

Some time ago, I wrote about speech output (text to speech) on the Raspberry Pi using Node Red. Well, that’s been running just fine but like most things – it works until it doesn’t.

In this case the nice people at Google have decided it is time to make money out of the people they offered free language translation APIs to and the API behind the excellent NORMIT program no longer works – I feel sorry for the fellow who spent ages putting it together. Here’s the original article.. https://tech.scargill.net/talking-raspberry/

So, I’ve checked and the best I can come up with for text-to-speech that doesn’t once again rely on someone else's servers, is “eSpeak”. This is simply installed on your computer as “apt-get install espeak”. I can only vouch for it running on the Raspberry Pi. I tried it on the Radxa Rock yesterday and it failed miserably.

So – I’ve come up with another route.

A program I used which works well to sound alarms is called mpg123 (apt-get install mpg123). You simply hand it the name of an MP3 file on the command line – and it plays it! See http://www.mpg123.de/ for more info.

So yesterday I discovered another service called voicerss (http://www.voicerss.org/) – and you can fire some text at them – and get a streaming MP3 file back!

For example…

http://api.voicerss.org/?key=xxxxxx&src=hello

where xxxxx is the API key they’ll freely give you on application.

So – I wanted this to work in Node-Red from a simple input.  I decided to use msg.payload to carry the text – and reserve msg.topic for the word “alert” if I wanted to prefix the sound with an alarm.

In this version I’ve not implemented complicated queues – I just play the message. See my previous article on queues. These might be needed if you wanted to fire messages out in rapid succession because I’m using Node-Red’s EXEC function and as far as I know – as it fires off things asynchronously – there’s no way to know when jobs are done. I guess you could have it run a batch file which sets and clears some kind of flag – I’ll leave that to others.

Here’s the basic setup.


What you see above – is a simple inject node – with the word “alert” in the topic (optional) and some text in the payload – for example “hello how are you”.

The function in the middle does all the work – the EXEC node at the end simply has “mpg123” as the command – other boxes are defaults.

Here are the internals of “THING” in the middle.

if (msg.payload.indexOf(".mp3") != -1) return msg; // if payload is an mp3 just play it, nothing else
if (msg.topic.indexOf("alert") != -1) {
    // play an alert sound before the speech
    var msg2 = { payload: "", topic: "" };
    msg2.topic = msg.topic;
    msg2.payload = '/usr/audio/alert02.mp3';
    node.send(msg2); // possibly play an alert
}
var moment = context.global.moment;
var timeadd = "";
var dateadd = "";
if (msg.topic.indexOf("time") != -1)
    timeadd = moment().format("h:mm a ") + ', ';
if (msg.topic.indexOf("date") != -1)
    dateadd = moment().format("dddd, MMMM Do YYYY") + ', ';

msg.payload = msg.payload.replace(/''/gi, "'"); // collapse any already-doubled quotes
msg.payload = msg.payload.replace(/'/gi, "''"); // then double single quotes for the shell
msg.payload = " -q 'http://api.voicerss.org/?key=XXXXXXXX&src=" + dateadd + timeadd + msg.payload + "'";
node.send(msg); // synth

Where you see XXXXXX above – put your API key from api.voicerss.org
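The quote handling in that function (collapsing any doubled single quotes, then doubling single quotes so the argument handed to mpg123 survives the shell) can be sketched and tested on its own. The function name here is made up for illustration, and the key is a placeholder as above.

```javascript
// Sketch of the voicerss argument string the function builds for mpg123.
// The key is a placeholder - substitute your own from voicerss.org.
function voicerssArgs(key, text) {
    let t = text.replace(/''/g, "'");   // collapse any already-doubled quotes
    t = t.replace(/'/g, "''");          // then double single quotes for the shell
    return " -q 'http://api.voicerss.org/?key=" + key + "&src=" + t + "'";
}
```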

You’ll note I have a file called /usr/audio/alert02.mp3 in there as well -  you can change the name and location but you should grab a suitable short alert mp3 sound from somewhere. I got it off a trekkie site.

As detailed in the original blog, you’ll need to install and include MOMENT if you want to be able to put “time” or “date” into the topic and have it spoken – if not, just miss that stuff off.

Below is a version with no time and date stuff.

 

if (msg.payload.indexOf(".mp3") != -1) return msg; // if payload is an mp3 just play it, nothing else
if (msg.topic.indexOf("alert") != -1) {
    // play an alert sound before the speech
    var msg2 = { payload: "", topic: "" };
    msg2.topic = msg.topic;
    msg2.payload = '/usr/audio/alert02.mp3';
    node.send(msg2); // possibly play an alert
}

msg.payload = msg.payload.replace(/''/gi, "'"); // collapse any already-doubled quotes
msg.payload = msg.payload.replace(/'/gi, "''"); // then double single quotes for the shell
msg.payload = " -q 'http://api.voicerss.org/?key=XXXXXXXX&src=" + msg.payload + "'";
node.send(msg); // synth

Have fun.


Talking Raspberry

Updated 21/DEC/2015

This entry is now DEAD. Google have changed the goalposts and removed API access to their language translation and text-to-speech system, so the base program behind this work – a program called Normit – is now dead in the water. I’m doing an update blog without the Google service.

This could be a good day: my WIFI problems have disappeared (for now anyway) all on their own, and I’ve given up on the Orange Pi PC until someone comes up with a solution for the missing sound (I had some other minor issues – life’s too short). I have had a couple of successes recently – one of them being… getting the Raspberry Pi 2 to SPEAK. Specifically, giving Node-Red the power of speech.

If you’re not using Node-red – or rather if you don’t have NodeJS on your Pi, then this won’t be of much interest! (I did get Node-Red running on the Orange Pi but no sound) – if you are, read on as this is a winner.

So first things first – this is an example of what I started running on my Raspberry Pi 2… this particular bunch of injects just sits on the Node-Red desktop and can be accessed at the press of a button. The purpose of this, for someone living in Spain but not having a clue what to say to anyone ringing me up in Spanish to tell me about a delivery, is to give me a starting point for talking to them.

Spanish

So I press one of the buttons on the left – and the Pi talks in clear Spanish through the speaker. My next application is more serious – it tells me when my ESP8266 boards have logged in by extracting their ID names from the incoming MQTT response (parser – trivial) and blasting out the text as speech. I was having WIFI issues for a while and it helped to know when the units were in trouble and continuously logging back in.

In this instance the responses play back in English. So how do we get from a non-talking Raspberry Pi to this?

First things first, you need to go and look for normit (npm install normit -g) and get that running on the Pi at the command line. What this neat little package does is take some text, send it off to Google Translate, get an MP3 back and play it. You’ll also want mpg123, which I believe is apt-get install mpg123 (you may need to use sudo if you are not root).

So the test is this..

normit en en “hello there”

That will respond onscreen with the phrase “hello there”. To change that to Spanish:

normit en es “hello there”

and you’ll get the Spanish version.

If you’ve installed mpg123 correctly and your sound is working…

normit en en “Hello there” -t will play the sound back through the speaker – there’s a slight delay depending on the length of the sentence which, I believe, should be no longer than 100 characters.

That in itself is quite useful.

in Node-Red there is a function called EXEC (under advanced – and thanks to the Node-Red guys for helping me out with that one).  It will run command line stuff (you can see what’s coming next, can’t you…)

Grab the EXEC node and fill in as follows:

exec

So now if you pass a message such as en en ‘hello there’ –t  in your msg.payload to this node via, say an inject node – you’ll get speech. EASY!!  And thanks to the Node-Red guys for pointing this out to me. Don’t try “spawn” mode and don’t worry it won’t hold up Node-Red – I can prove that by sending 2 messages quickly – it will play two at once (I’ve a fix for that lower down). Ignore the outputs of the EXEC node, you don’t need them.

So that in itself is ok, but you might want to take that a little further to make actual use more flexible and easier. I did. I also have a package called moment on my Pi which makes for nice time and date formatting.

So in my case the next step was to add a function to this..

image

The contents of the function are as follows – you might fancy something different.

var moment=context.global.moment;
var lang='en en "';
var timeadd="";
var dateadd="";
if (msg.topic.indexOf("time")!=-1)
  timeadd='Time:, ' + moment().format("h:mm a ")+ '.';

if (msg.topic.indexOf("spanish")!=-1)
  lang='en es "';

if (msg.topic.indexOf("date")!=-1)
  dateadd= 'Date:, ' + moment().format("dddd,,, MMMM Do YYYY")+ '.';

msg.payload= lang + dateadd + timeadd + msg.payload + '" -t';
return msg;

So now, in the simplest case the input to that function might simply be…. hello world. The function adds on the technical bits and quotes.
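The "technical bits and quotes" logic can be sketched in isolation. To avoid pulling in the moment library here, the formatted time and date strings are passed in as parameters – that detail is an assumption of this sketch, not how the actual flow works (the flow calls moment itself).

```javascript
// Sketch of the normit command-string logic from the function above,
// with the formatted time and date passed in so we don't need moment here.
function normitPayload(topic, text, timeStr, dateStr) {
    let lang = 'en en "';                       // English in, English out
    let timeadd = "", dateadd = "";
    if (topic.indexOf("time") !== -1) timeadd = 'Time:, ' + timeStr + '.';
    if (topic.indexOf("spanish") !== -1) lang = 'en es "';  // translate to Spanish
    if (topic.indexOf("date") !== -1) dateadd = 'Date:, ' + dateStr + '.';
    return lang + dateadd + timeadd + text + '" -t';
}
```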

Would it not be nice to package that up – seems like that’s what I’ve done, doesn’t it.   Create a SUBFLOW in Node-Red and dump those two items (the function and the exec  node) in there. Edit the name of the subflow to “speech” and add an input to that function. This really is a lot easier done than discussed. I thought it was going to be mega-hard but it’s not. Think of it as a visual macro.

subflow

 

Lo and behold you have a drop-in box you can use for speech in English or Spanish (or, clearly any language you like).

my speech subflow

Heading back to my Spanish example here’s one of the input INJECT boxes I showed you at the top…

inject

I’ve put the text I want to say in the payload (one sentence please – for some reason Google doesn’t like a full stop in the middle of something – you could of course, without too much effort, fix that: split the text up and fire out one part after another, though getting the delay between the parts right could be fun) – and used the topic to add in options such as “spanish”, “time” and “date” – I’m sure you’ll think of others – if you come up with winners, do write in.

So now, from a simple command line tool we have a great drop-in for Node-Red projects to get verbal feedback.  And the sound, though with a strong American accent – really is very good.

Add in speech recognition and a little logic and you have your own Hal-9000 – except you could miss out the bit about not letting people back into the spaceship.

The only issue with this is that as this stands, there is no way to stop the unit from sending messages simultaneously – fine for my Spanish stuff, not so good for multiple simultaneous logins of my little ESP8266 boards.  Now, we can’t EASILY tell if a process is finished, but we can schedule messages, so that messages could be put in a queue and that queue checked, say, once every 3 or 4 seconds. If your messages are not ultra-time-critical, then you could use that solution.

Here’s a variation of my flow...

queue

Note the difference. In this version, an inject node is used to send a blank message every 3 seconds. If a message comes in, it is put in a queue; if a blank message comes in, the queue is checked – and speech sent out if there is anything there, using the wonderfully elegant combination of arr.push and arr.shift – see the code. The only exception is if the message is urgent, in which case it is sent immediately.

The code:

// initialise the queue on first run
if (typeof context.arr == "undefined" || !(context.arr instanceof Array)) {
    context.arr = [];
}

if (msg.payload !== "") {
    var moment = context.global.moment;
    var lang = 'en en "';
    var timeadd = "";
    var dateadd = "";
    if (msg.topic.indexOf("time") != -1)
        timeadd = 'Time:, ' + moment().format("h:mm a ") + '.';

    if (msg.topic.indexOf("spanish") != -1)
        lang = 'en es "';

    if (msg.topic.indexOf("date") != -1)
        dateadd = 'Date:, ' + moment().format("dddd,,, MMMM Do YYYY") + '.';

    msg.payload = lang + dateadd + timeadd + msg.payload + '" -t';
    // append the new value to the queue OR play it now if urgent
    if (msg.topic.indexOf("urgent") != -1)
        return msg;
    else
        context.arr.push(msg.payload);
}
else {
    // blank message from the inject node: pull the next queued item, if any
    if (context.arr.length) {
        msg.payload = context.arr.shift();
        return msg;
    }
}

And just set the inject node to repeat every 3 seconds (well, that works for me) and send a blank message OR add into the topic the word URGENT and it goes out straight away. SIMPLES!!!
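The tick-driven queue with the urgent bypass can be sketched as plain JavaScript (a real flow would keep the array in context; the names here are illustrative only):

```javascript
// Minimal sketch of the 3-second-tick queue with an "urgent" bypass.
function makeTimedQueue() {
    const arr = [];
    return {
        // returns the payload to speak now, or null if nothing should play yet
        handle(msg) {
            if (msg.payload !== "") {
                if (msg.topic.indexOf("urgent") !== -1) return msg.payload; // play now
                arr.push(msg.payload);     // otherwise wait for the next tick
                return null;
            }
            return arr.length ? arr.shift() : null; // tick: first in, first out
        }
    };
}
```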

And if you think THAT’s good – check out THIS version that ALSO handles .mp3 files!!! Simply put a file path, say /usr/audio/alert02.mp3, into the payload (the “urgent” option still remains) and you can add your own favourite Star Trek BEEPS!

Here’s what it looks like…

mpeg handling

If you need beeps….. http://www.trekcore.com/audio/

And here’s the code – I’ve also added an ALERT option to play back both the .mp3 and the voice file (note I’ve embedded a specific file in there as I’m out of fields – I may write a node for all of this yet):

var frompush=0;
if (typeof context.arr == "undefined" || !(context.arr instanceof Array)) context.arr = [];
if (msg.payload=="") if (context.arr.length) { frompush=1; msg=context.arr.shift();  }

if (msg.payload!=="")
    {
    // if not urgent just push but not recursively
    if ((msg.topic.indexOf("urgent")==-1) && (frompush==0)) { context.arr.push(msg); return; }

    if (msg.payload.indexOf(".mp3")!=-1) return [null,msg];
    if (msg.topic.indexOf("alert")!=-1)
    {
    var msg2 = {
                payload : "",
                topic : ""
                };
        msg2.topic=msg.topic;
        msg2.payload='/usr/audio/alert02.mp3';
        node.send([null,msg2]);
    }
    var moment=context.global.moment;
    var lang='en en \'';
    var timeadd="";
    var dateadd="";
    if (msg.topic.indexOf("time")!=-1)
      timeadd= moment().format("h:mm a ")+ ', ';
   
    if (msg.topic.indexOf("spanish")!=-1)
      lang='en es \'';
   
    if (msg.topic.indexOf("date")!=-1)
      dateadd= moment().format("dddd, MMMM Do YYYY")+ ', ';
   
    msg.payload= lang + dateadd + timeadd + msg.payload + '\' -t';
    return [msg,null];
    }


And all of that was fine – until I started to experiment with multiple messages – at which point it all went to hell. For reasons I’m not yet sure of, pushing and shifting an entire message object through the array doesn’t work that well. I made a slight change – pushing and shifting the topic and payload separately, as shown below – and fixed the problem.

var frompush=0;
if (typeof context.arr == "undefined" || !(context.arr instanceof Array)) context.arr = [];
if (msg.payload=="") if (context.arr.length) { frompush=1; msg.topic=context.arr.shift();  msg.payload=context.arr.shift();  }

if (msg.payload!=="")
    {
    // if not urgent just push but not recursively
    if ((msg.topic.indexOf("urgent")==-1) && (frompush==0)) { context.arr.push(msg.topic); context.arr.push(msg.payload); return; }

    if (msg.payload.indexOf(".mp3")!=-1) return [null,msg];
    if (msg.topic.indexOf("alert")!=-1)
    {
    var msg2 = {
                payload : "",
                topic : ""
                };
        msg2.topic=msg.topic;
        msg2.payload='/usr/audio/alert02.mp3';
        node.send([null,msg2]);
    }
    var moment=context.global.moment;
    var lang='en en \'';
    var timeadd="";
    var dateadd="";
    if (msg.topic.indexOf("time")!=-1)
      timeadd= moment().format("h:mm a ")+ ', ';
   
    if (msg.topic.indexOf("spanish")!=-1)
      lang='en es \'';
   
    if (msg.topic.indexOf("date")!=-1)
      dateadd= moment().format("dddd, MMMM Do YYYY")+ ', ';
   
    msg.payload= lang + dateadd + timeadd + msg.payload + '\' -t';
    return [msg,null];
    }

Now, if someone comes up with a way to detect the end of speech instead of using a timer, I’d be most grateful for the code – a fixed timer is OK, but not ideal. Another possibility is to look at the number of characters in the string and arrange a time delay based on that – clearly longer sentences take longer to speak – and it would be good to fire off a sentence before the previous one is finished, as it takes time to download the MP3 file.
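That length-based idea could be sketched like this. Both constants are guesses to be tuned by ear (a fixed startup overhead plus some milliseconds per character), not measured values.

```javascript
// Rough sketch of a length-based delay: assume speech takes a fixed startup
// time plus some milliseconds per character. Both constants are guesses to
// be tuned by ear, not measured values.
function speechDelayMs(text, msPerChar = 80, overheadMs = 500) {
    return overheadMs + text.length * msPerChar;
}
```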

 
