Speech … the Final Frontier

Updated 02/01/2017

Buffered, SD-cached high-quality human-like speech working alongside sound effects for your IOT project.

Requirements: Raspberry Pi or similar with audio out, Node-Red and a little of your time.

But you don’t need this blog – there’s a WAY better way to do this – in my latest entry about the Ivona Node.

As you may know I’ve done a couple of blog items on Speech on the Raspberry Pi over time. I was quite happy with the Google Translate API until recently, when they stopped making it available for free, so I was forced to go off in search of alternatives and settled on local synthesizers like eSpeak. The problem is that they generally sound AWFUL. eSpeak sounds like someone’s dog being strangled.

And so it was that one of our readers (thank you for that) contacted me and suggested I take a look at Ivona. https://www.ivona.com/us/

This is new: I’ve just gutted the code from the original blog to produce a much better version, better thought out, with caching for off-line use and with speech and sound effects unified into an auto-created library. If an external comms failure cuts you off from the outside world, Ivona is going to fail, and this happened here at Bedrock – SO the idea hit us to CACHE all mp3 files (assuming you’re not going to do unique messages every time). That way the SECOND time you ask for a message you will already have a file with that name. The total file space for dozens or even hundreds of these MP3 files is a tiny fraction of a typical SD card’s capacity and not even worth taking into consideration in most projects. Currently we are working on modifying the Ivona node to handle dynamically changing voices. More on that soon.

I originally did a video to accompany this blog - https://www.youtube.com/watch?v=qoxPVa48qRw

If you use the link I’ve provided above (to Ivona), select a suitable voice on their web page and then enter some text in the box, you’ll find it does a pretty good job. Ivona is free to developers – just grab a free account and get an API key and secret key. I spent the entire evening playing with the code and when I looked at the percentage of my “free use” it showed nothing, so you’re unlikely to run out of free use and, with the mods here, even less so.

Take a tip: copy that API key information as soon as you get an account (you won’t be able to get the same key later), and pass it via NOTEPAD (paste, then copy again) to get rid of any hidden characters.

So now you have an API key and secret key to be used in the Node-Red node Ivona (node-red-contrib-ivona) https://www.npmjs.com/package/node-red-contrib-ivona

Your API key for Node-Red is the ACCESS code and the PASSWORD is the secret key !!!!! I used my email address for the username – this is all a bit non-intuitive so beware.

Over to Node-Red. What this node does is take in your TEXT and send it off to Ivona, which returns an MP3 file with your chosen speech. You should also have MPG123 installed on your end computer (I’m using a Raspberry Pi 2 for all of this). http://www.mpg123.de/

In the simplest case you would send off your text, get the MP3 file, send that to mpg123 for playback. But then you are stuck with a file… and what if you send 2 in quick succession – they will overlap each other as Node-Red runs asynchronously.
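As a rough sketch of that playback step outside Node-Red (plain Node.js, assuming mpg123 is installed and on the PATH): the helper names buildPlayerCommand and play are my own inventions for illustration, not part of any library. The callback fires when mpg123 exits, which is the hook you need if you want to stop two files overlapping.

```javascript
const { execFile } = require("child_process");

// Hypothetical helper: describe the mpg123 call for one file.
// "-q" is mpg123's quiet flag, so it does not print its banner.
function buildPlayerCommand(file) {
    return { cmd: "mpg123", args: ["-q", file] };
}

// Hypothetical helper: play one file and report back when the player exits.
// Assumes mpg123 is installed; err is set if it could not be run.
function play(file, onDone) {
    const { cmd, args } = buildPlayerCommand(file);
    execFile(cmd, args, (err) => onDone(err));
}
```

Passing the file name as an argument array, rather than building a shell command string, means the name is handed to mpg123 as-is without any shell quoting.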

Here’s the solution and it’s a lot better than I had in the past. You can fire off several speech requests including requests for other .mp3 files. For special effects I have a bunch of MP3 files already stored – such as “alert” and “hailing frequencies open”.

[Image: example flow with two inject nodes feeding the red speech subflow]

In the example above (that red block is NOT the Ivona node – it is a subflow I wrote – more in a minute)… let me show you those two INJECTS on the left..

[Images: settings of the two inject nodes]

The first has “alert” in the topic and some text “Red 1 logged in” – the second simply text.

I can click one, wait for it to speak and then click the second – or I can choose not to wait, clicking wildly – and they will still play in order. So if you specify speech for the TOPIC AND THE PAYLOAD, both will simply go into the queue in order.

How do I do that – so looking at the red Node-Red SUBFLOW…

[Image: the speech subflow]

The yellow’ish blocks are user functions, the red block is a Node-Red EXEC function, and the purple item is a simple 1 second delay for good measure. The purple item is the node-red-contrib-ivona node.

I take in text… if the topic has the word “alert” in it – I put that on the queue BEFORE the main text – other than that there is NO difference between the two.

If there is no text, just a blank message coming in, I check the queue and if not empty, try to use the items on the queue (first in, first out) one at a time.

The INJECT function is needed to start the ball rolling for the first item in the queue. Once I find text in the queue, it is sent to the Ivona node IF such a named file does not exist, and then on to the mpg123 player – either way setting a BUSY flag so that those one-second ticks can’t pull another item off the queue until I’m done.

When done – I  send empty messages back into the input to trigger off any further items in the queue.
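The “does such a named file exist” decision can be sketched in plain Node.js as below. This is only an illustration under my own assumptions: routeForFile and the two route names are hypothetical, and the actual flow may well do the check with different nodes.

```javascript
const fs = require("fs");

// Hypothetical helper: decide whether a phrase can be played from the
// cache or still needs a trip to Ivona to create the MP3 file.
function routeForFile(fname) {
    // existsSync is true when something is already stored at that path
    return fs.existsSync(fname) ? "play-cached" : "fetch-from-ivona";
}
```

Either way the next stop is the player; the only difference is whether Ivona gets asked first.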

Here is the main function:

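// Set to 1 when the text came off the queue rather than from a new message
var frompush=0;

// Create the queue and the global busy flag on first use
if (typeof context.arr == "undefined" || !(context.arr instanceof Array)) context.arr = [];
if (typeof context.global.speech_busy == "undefined") context.global.speech_busy=0;

// A blank message is a tick: if nothing is playing, take the next item (first in, first out)
if ((msg.payload==="")&&(context.global.speech_busy===0))
{
    if (context.arr.length)
    {
        frompush=1;
        msg.payload=context.arr.shift();
    }
}

if (msg.payload!=="")
{
    // just push but not recursively
    if (frompush===0)
    {
        // the topic (e.g. "alert") goes on the queue BEFORE the main text;
        // a missing or empty topic is skipped
        if (msg.topic) context.arr.push(msg.topic);
        context.arr.push(msg.payload);
        return;
    }

    // Item taken from the queue: mark busy and build the cache file name
    context.global.speech_busy=1;
    msg.fname=msg.payload.replace(/ /g,'_');
    msg.fname=msg.fname.replace(/\./g,'');
    msg.fname=msg.fname.replace(/\,/g,'');
    msg.fname=msg.fname.toLowerCase();
    msg.fname="/home/pi/recordings/"+msg.fname+".mp3";
    return msg;
}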

Note the busy flag and the use of PUSH for the queue.
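If you want to convince yourself of the ordering without firing up Node-Red, here is a stripped-down model of the same idea that runs in plain Node.js. It is a sketch, not the flow itself: createSpeechQueue and its enqueue, tick and done methods are names I made up for the example.

```javascript
// Standalone model of the queue plus busy flag (hypothetical names throughout).
function createSpeechQueue() {
    const queue = [];
    let busy = false;

    return {
        // A message with text: the topic (if any) is queued ahead of the payload.
        enqueue(topic, payload) {
            if (topic) queue.push(topic);
            if (payload) queue.push(payload);
        },
        // A blank "tick": hand out the next phrase unless something is playing.
        tick() {
            if (busy || queue.length === 0) return null;
            busy = true;
            return queue.shift();
        },
        // Called when playback has finished, like the reset flag function.
        done() {
            busy = false;
        }
    };
}

const q = createSpeechQueue();
q.enqueue("alert", "Red 1 logged in");
q.enqueue("", "second message");

const first = q.tick();    // "alert"
const blocked = q.tick();  // null, because "alert" is still playing
q.done();
const second = q.tick();   // "Red 1 logged in"
```

However wildly the messages arrive, nothing comes off the queue while the busy flag is set, so playback stays in order.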

The “copy file to payload” is trivial – Ivona returns the filename in msg.file which is not where I want it.

msg.payload=msg.file;
return msg;

The reset flag function simply clears the busy flag and returns a blank message.

context.global.speech_busy=0;
msg.payload="";
return msg;

The trigger is the Node-Red DELAY function simply set to delay for one second and then pass the message on.


The MPG123 EXEC node calls mpg123 and passes the file name as parameter. The DELETE node simply deletes the file that Ivona creates… Here’s the Ivona setup. Put your credentials in the top box.

[Image: Ivona node settings showing the triple moustache]

Note the triple moustache {{{}}} – Ivona examples use a double, but a double moustache HTML-escapes characters such as slashes, and we don’t want that because we have a file path in there.
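To see why that matters, here is a hand-rolled imitation of the escaping a double moustache applies. It is an illustration only, not the Mustache library itself; escapeLikeDoubleMoustache and the example path are my own, and I am assuming the template holds a file path like the one built in the main function.

```javascript
// Characters a double moustache turns into HTML entities (illustrative subset).
const ENTITY_MAP = {
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&#39;",
    "/": "&#x2F;"
};

// Hypothetical stand-in for what {{value}} does to its input.
function escapeLikeDoubleMoustache(text) {
    return String(text).replace(/[&<>"'\/]/g, (ch) => ENTITY_MAP[ch]);
}

const filePath = "/home/pi/recordings/alert.mp3";
const doubled = escapeLikeDoubleMoustache(filePath); // slashes become &#x2F;
const tripled = filePath;                            // {{{value}}} leaves it alone
```

The escaped version is no longer a usable path, which is why the triple form is the one to use here.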

And that is about it – works a treat and produces high quality buffered speech – for free – for your IOT endeavours.

Pick your own file directory (note that I used /home/pi/recordings but that isn’t in any way special). For any words or phrases where you want SOUNDS instead of voice, simply put sound files of the same name in that directory (note that spaces are replaced in file names by underscores). So “alert 2” as a file name would be “alert_2.mp3”.
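If you want to work out what a given phrase will be called on disk, the same replacements as the main function can be wrapped up like this. phraseToFilename is a hypothetical helper for illustration; the directory is just the one I happened to use.

```javascript
// Hypothetical helper mirroring the replacements in the main function.
function phraseToFilename(phrase, dir) {
    const base = phrase
        .replace(/ /g, "_")   // spaces become underscores
        .replace(/\./g, "")   // full stops are dropped
        .replace(/,/g, "")    // commas are dropped
        .toLowerCase();
    return dir + "/" + base + ".mp3";
}

const example = phraseToFilename("Alert 2", "/home/pi/recordings");
// example is "/home/pi/recordings/alert_2.mp3"
```

Drop your own sound effect into the directory under that name and it gets played instead of a trip to Ivona.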

Of course I'm always on the lookout for alternatives, especially those which profess to need no connection.

https://mimic.mycroft.ai/

This looked great - and the short example in their video sounded like a human. Sadly, on installing this on the Pi (it can't play without drivers which don't come on the Pi as standard - but it can generate files) I gave it my favourite micktake of a certain politician "I'm a little tea-pot short and stout, open my mouth and shite comes out"... and I have to say it made an UTTER and complete mess of it, sounding like a quite ill Dalek. Think I'll stick with Ivona.


13 thoughts on “Speech … the Final Frontier”

  1. Nice one Peter, the YouTube video is especially helpful, thanks for sharing.
    Just for reference - I got the flow working on Windows too (just needed to use "Videolan" instead of mpg123 and change file refs).

  2. Hi
    I'm trying to set up the most basic ie :: inject, Ivona and mpg123.
    First problem is that Ivona supplies an ID, access key and secret key. Which 2 of these are relevant to the node-red credentials? How do I know if Ivona has worked? Does its output need to be saved to file so that mpg123 can pick it up?
    Sorry new to all this..
    John

  3. hi, do you know where to find the espeak module on this site: http://pagenodes.com/
    as it has in his options the google apis and seem to work fine? Sample program:

    [{"id":"e79390cd.186c7","type":"inject","z":"f006e294.0ff92","name":"","topic":"","payload":"prova","payloadType":"string","repeat":"","crontab":"","once":false,"allowDebugInput":false,"x":353,"y":112,"wires":[["f53233e5.0acdd"]]},{"id":"f53233e5.0acdd","type":"espeak","z":"f006e294.0ff92","name":"","variant":"Google italiano","active":true,"x":689,"y":108,"wires":[]}]

  4. If anyone fancies a job - where did I go wrong with single quotes???

    If I put "he's having a nice day" - the single quote goes all to hell. Yet if I try that on the Ivona website it is fine - so it's not their package... it's something I've done.

  5. Hello Peter, I am using Node-js (not sure what node-Red is). I have Ivona-node working almost perfectly with one small problem. I am running on a Raspberry Pi 3 under Raspbian Jessie and could use a little assistance with a continuing problem.

    I create the speech file and then use a child process to play the file using omxplayer. Sometimes the full text of the speech is not heard, the last few words being dropped off. For example I will hear "The quick brown fox jumped over the lazy" instead of "The quick brown fox jumped over the lazy dog". I am not doing anything fancy, and using the standard supplied lexicon with the en us voices. If I try and play the mp3 file a few seconds later using omxplayer it always plays fine. This seems to indicate a timing problem in that perhaps the player is starting before the full file is in. I have even added delays prior to forking off the child process using setTimeout but that does not seem to help.

    Any advice or assistance you can provide would be greatly appreciated.

    Barry

    1. Hi there Barry - as I don't use OMXPLAYER I'm unable to help. Perhaps others can. In my setup I use and recommend mpg123 and that works fine on Raspberry Pi 2 and 3.

      sudo apt-get install -y mpg123

  6. I believe I found my problem. When piping the output of the Ivona create speech process to a file, I was not waiting for the 'end' event to be emitted prior to starting to play the contents of the speech file. I changed the logic to do that and it seems to have solved the problem.
