
Ivona Again

A recurring theme in software development is re-inventing the wheel because you don’t know about or don’t understand what is available already. Well, sometimes it is for me anyway.

And so it was that my PERFECTLY working, well appreciated but totally unnecessary workflow for Ivona speech on Node-Red went in the bin.

But first:

So (here come the links) – Ivona is an online service from Amazon, free within reason – with REALLY nice quality sound. The Ivona NODE is a free node that lets you talk to Ivona in Node-Red easily using, say, a Raspberry Pi (which in its latest incarnation remains the fastest and easiest to use SBC in anywhere near that price range) – as easily as passing a text string to the node – and out pops high quality speech. Don’t like the idea of relying on an external service? Read on…

Yesterday we started working on a better caching mechanism to reduce calls to the Ivona speech API – and this morning – I took a LONG look at the instructions and….. it’s already in.

So here is the MK 2 explanation of how to use the Ivona Node-Red node – this time with the benefit of having actually read the instructions.

It is THIS easy.

[Image: Ivona]

Yup.. all that buffering work for nothing. But I’m making assumptions – let’s start from scratch:

We’re talking about getting high quality speech on, say a Raspberry Pi – using Node-Red.  In my case Node-Red is the central control for my home control… and speech is part of that – I want to know when devices are logging in etc.

There are speech synths for the Pi and other SBCs – and not to put too fine a point on it – most of them sound like someone being strangled. Ivona on the other hand provides REALLY nice speech in a range of languages. Google used to have a good API but they got greedy.

Installing:

From my Pi script…….

sudo apt-get install -y mpg123

and in the .node-red directory wherever that is on your system…

npm install node-red-contrib-ivona

Oh, Err:

In my original version, having put the speech into a first-in/first-out buffer,  I passed the output to MPG123 – a program (accessed by an EXEC function) that plays MP3 files. Why? Because the Ivona node takes in speech and dumps an MP3 file on your disk/SD with the recording in it.

All fine and good but what about overlapping messages – and what about constant use of the API? Well to cut a LONG story short it is all handled in the node itself. If you play the same message twice, it just uses the file it made last time.  The node successfully queues messages as well (all that work…)  If, like me, you didn’t read the instructions, the file went into the /tmp directory and you probably thought you’d have to develop your own file system.

And now:

So – let me show you the setup for the Ivona Node – and then I’ll go through it piece by piece.

See installation above – see also my original blog for more details and putting in the credentials that Ivona give you. A one-off task and free.

 

[Image: Ivona node setup]

So… in the message area above – just leave that as it is… the “moustache” system refers to using braces around stuff. It can take a little grasping but you don’t have to here…  In this case – the message entry simply means – if you fire a msg.payload like “Hello there” into the node – then it will be used. You’ll do that with, for example, an INJECT node (see above) sending TEXT in the payload.

Voice: Well that’s simple enough – dropdown box – pick a voice! For UK users, Brian is good.

Exec:  Here you put in the name (and path if needed) of a program to play the file. I definitely recommend installing MPG123 as it is easy and reliable. {{{file}}} simply means – use the file generated by Ivona node.

File: At the start, to the left of {{{ put the name of the directory in which you want to store files – I made one called “recordings” under my /home/pi directory. The directory should exist before you use it. Don’t forget the slash at the end of the directory.

Then you see a number of items which together take the name of the speaker, the language and the actual text and make these into a filename. I suggest you leave them as-is. In this case..

/home/pi/recordings/{{voice_name}}-{{lang}}-{{slug}}.mp3
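To see how that template turns into a real file name, here is a minimal sketch in plain JavaScript. The slugify and renderName helpers are my own stand-ins, NOT the node's actual code (the node uses the mustache library and its own slug routine), and I am assuming the language field comes out empty, which is what would give the double dash in a name like brian--alert.mp3.

```javascript
// Hypothetical helper: lower-case the text and turn runs of anything
// that is not a letter or digit into single dashes.
function slugify(text) {
  return text.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
}

// Hypothetical stand-in for the moustache rendering:
// replace each {{name}} with vars[name], or nothing if it is missing.
function renderName(template, vars) {
  return template.replace(/\{\{(\w+)\}\}/g, function (match, key) {
    return key in vars ? vars[key] : "";
  });
}

var template = "/home/pi/recordings/{{voice_name}}-{{lang}}-{{slug}}.mp3";
var fileName = renderName(template, {
  voice_name: "brian",
  lang: "",                  // assumed empty here
  slug: slugify("Alert 1")
});

console.log(fileName); // /home/pi/recordings/brian--alert-1.mp3
```

The point is only that the same voice and the same text always give the same file name, which is what makes the caching work.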

That’s it – when you fire speech at the node it should play it (assuming you’ve set the audio output to the right place – check your audio first – I didn’t and got ZILCH – then I realised it was set to go out of HDMI and the monitor I was using had no speakers!!!)

The sound will stay in that file. The next time you go for an identical piece of text with the same speaker (Brian in this case) – it will merely play back the file you already have instead of going off to get more. In a closed system with fixed messages, eventually the Ivona service won’t be needed.
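The idea behind that caching can be sketched in a few lines of Node.js. This is NOT lifted from the node itself, just my illustration of the principle: getSpeechFile is a made-up name, and fetchSpeech is a hypothetical function standing in for the trip out to the speech service.

```javascript
var fs = require("fs");
var os = require("os");
var path = require("path");

// If the file is already there, use it. Otherwise call fetchSpeech
// (hypothetical), save what comes back and use that.
function getSpeechFile(fileName, text, fetchSpeech) {
  if (fs.existsSync(fileName)) {
    return { file: fileName, cached: true };
  }
  fs.writeFileSync(fileName, fetchSpeech(text));
  return { file: fileName, cached: false };
}

// Try it in a throwaway directory with a fake service that counts its calls.
var dir = fs.mkdtempSync(path.join(os.tmpdir(), "recordings-"));
var file = path.join(dir, "brian--hello-there.mp3");
var calls = 0;
function fakeFetch(text) {
  calls++;
  return Buffer.from("fake mp3 for " + text);
}

var first = getSpeechFile(file, "Hello there", fakeFetch);
var second = getSpeechFile(file, "Hello there", fakeFetch);
console.log(first.cached, second.cached, calls); // false true 1
```

Only the first request reaches the (fake) service. Because the check is nothing more than "does a file with this name exist", a file you put there yourself is treated exactly like one the service made.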

This offers up a possibility for sound effects.  Let’s say you send to Ivona “Alert” in the payload. In my case that will generate “brian--alert.mp3”.  Now, let’s say you have a nice alert sound effect…  by simply giving it the file name “brian--alert.mp3”, that file will be used instead of Brian saying “alert” – so you can have a complete library of MP3 effect files used alongside your recorded speech!!! All without any special mechanisms.

Fire multiple messages and they are all played in sequence without you having to worry about overlaps.

All in all an excellent node – just a shame some of us didn’t read the instructions the first time around!!

Update May 22: I have now implemented an improvement to the Ivona node and sent this back to the author – time will tell if he implements it.

So it would be nice to be able to control the VOICE dynamically. For example – you might wish to have your gadgets TALK. 

If you want Ivona to work via MQTT the problem is you ONLY have TOPIC and PAYLOAD. The latter is obviously the speech.

The changes I’ve made allow you to strap an MQTT input node to the Ivona node – with the topic subscription ivona/#

This means that Ivona will listen to any message sent by MQTT to ivona/ - but if you put a name after the topic – for example ivona/brian – that will change the voice temporarily!!

In the Ivona .JS file is a line as follows..

text = mustache.render(node.message, msg);

After that line, add this code (and then restart node-red of course)

 

// Dynamic voice control using topic: ivona/XXX where XXX is a voice - i.e. brian.

node.voice = config.voice; // default voice otherwise any change will be permanent
// only look at the topic if there is one (messages without a topic keep the default voice)
if (typeof msg.topic === "string" && msg.topic.substring(0,6).toLowerCase()=="ivona/")
   {
    var wanted = msg.topic.substring(6).toLowerCase();
    for (var x in voices)
       {
         if (voices[x].name.toLowerCase()==wanted) node.voice=x;
       }
   }
// end of modification
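If you want to try that matching logic outside Node-Red, here is a small stand-alone sketch. The voices table is made up for the test (the real one lives inside the node and is much longer), pickVoice is my own name for it, and I have included a guard for messages that arrive with no topic at all.

```javascript
// Made-up voice table purely for illustration.
var voices = [{ name: "Emma" }, { name: "Brian" }, { name: "Amy" }];

// Return the key of the voice named in an "ivona/<name>" topic,
// or the default if the topic is missing or names no known voice.
function pickVoice(topic, defaultVoice) {
  var chosen = defaultVoice;
  if (typeof topic === "string" && topic.substring(0, 6).toLowerCase() === "ivona/") {
    var wanted = topic.substring(6).toLowerCase();
    for (var x in voices) {
      if (voices[x].name.toLowerCase() === wanted) chosen = x;
    }
  }
  return chosen;
}

console.log(pickVoice("ivona/Brian", "0"));  // 1 (Brian's key in the made-up table)
console.log(pickVoice("ivona/nobody", "0")); // 0 (unknown name, default kept)
console.log(pickVoice(undefined, "0"));      // 0 (no topic, default kept)
```

Note the voice is worked out afresh for every message, so a topic of ivona/brian only affects that one message.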

Simples! And so here we see MQTT-SPY in action.

[Image: ivona on mqtt-spy]

Or with Brian talking…

[Image: ivona mqtt with brian]

 

And so what about sound effects?  Unless you want the words “alert 1” coming out for example, you need to ensure that pre-recorded sound effect files always use the same name. As the default in my example is “emma” – ensure you do NOT send a voice when you want a sound effect.

To make this easy and compatible with the above MQTT – I used “--” as a separator so that for example you might say…

Topic: ivona/brian

Payload: alert 1--This is an alert

The first part is automatically sent to Ivona with no topic – hence the default voice and hence the need for a file emma--alert-1.mp3, whereas the second part is sent with the topic.

[Image: subflow]

And here is the code for the sub-flow:

var ar=msg.payload.split("--");
if (ar.length>1)
  {
    // sound effect first, with no topic so the default voice's file name is used
    node.send({payload:ar[0], topic:""});
    // then the speech itself, keeping the original topic (and so the voice)
    msg.payload=ar[1];
  }
node.send(msg);
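To check what that split actually produces without firing up Node-Red, the same logic can be written as a plain function that returns the messages instead of sending them. This is just a test harness of my own (splitEffect is a made-up name), not part of the flow.

```javascript
// Return the list of messages the sub-flow would send for one incoming message.
function splitEffect(msg) {
  var ar = msg.payload.split("--");
  if (ar.length > 1) {
    return [
      { topic: "", payload: ar[0] },        // the sound effect, default voice
      { topic: msg.topic, payload: ar[1] }  // the speech, with the requested voice
    ];
  }
  return [msg];
}

var out = splitEffect({ topic: "ivona/brian", payload: "alert 1--This is an alert" });
console.log(JSON.stringify(out));
// [{"topic":"","payload":"alert 1"},{"topic":"ivona/brian","payload":"This is an alert"}]
```

A payload with no separator in it comes out as a single, untouched message.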

Here is a VERY handy list of star-trek alert mp3 files.. http://www.lcarscom.net/sounds.htm – just rename the files to have dashes instead of any spaces – and prefix with emma--


Speech … the Final Frontier

Updated 02/01/2017

Buffered, SD-cached high-quality human-like speech working alongside sound effects for your IOT project.

Requirements: Raspberry Pi or similar with audio out, Node-Red and a little of your time.

But you don’t need this blog – there’s a WAY better way to do this – in my latest entry about the Ivona Node.

As you may know I’ve done a couple of blog items on Speech on the Raspberry Pi over time. I was quite happy with the Google Translate API until recently when they stopped making it available for free,  so I was forced to go off in search of alternatives and settled on local synthesizers like eSpeak – the problem being – they generally sound AWFUL. eSpeak sounds like someone’s dog being strangled.

And so it was that one of our readers (thank you for that) contacted me and suggested I take a look at Ivona. https://www.ivona.com/us/

This is new: I’ve just gutted the code from the original blog to produce a much better version, better thought out, with caching for off-line use and unifying speech and sound effects into an auto-created library. In the event of an external comms failure, so that you cannot access the outside world, your Ivona is going to fail, and this happened here at Bedrock – SO the idea hit us to CACHE all mp3 files (assuming you’re not going to do unique messages every time). That way the SECOND time you ask for a message you will already have a file with that name – the total file space for dozens or even hundreds of these MP3 files is a tiny fraction of a typical SD storage capability and not even worth taking into consideration in most projects. Currently we are working on modifying the Ivona node to handle dynamically changing voices. More on that soon.

I originally did a video to accompany this blog - https://www.youtube.com/watch?v=qoxPVa48qRw

If you use the link I’ve provided above (to Ivona) and select a suitable voice on their web page then enter some text in the box, you’ll find it does a pretty good job. Ivona is free to developers – just grab a free account and get an API key and secret key. I spent the entire evening playing with the code and when I looked at the percentage of my “free use” – nothing  - so you’re unlikely to run out of free use and with the mods here even less so.

Take a tip when you copy and paste that API key information (which you should do immediately on getting an account, as you won’t be able to get the same key later): pass it via NOTEPAD (paste, then copy again) to get rid of any hidden characters.

So now you have an API key and secret key to be used in the Node-Red node Ivona (node-red-contrib-ivona) https://www.npmjs.com/package/node-red-contrib-ivona

Your API key for Node-Red is the ACCESS code and the PASSWORD is the secret key !!!!! I used my email address for the username – this is all a bit non-intuitive so beware.

Over to Node-Red. So what this node does is take in your TEXT, send it off to Ivona which returns an MP3 file with your chosen speech. You should also have MPG123 installed on your end computer (I’m using a Raspberry Pi2 for all of this). http://www.mpg123.de/

In the simplest case you would send off your text, get the MP3 file, send that to mpg123 for playback. But then you are stuck with a file… and what if you send 2 in quick succession – they will overlap each other as Node-Red runs asynchronously.

Here’s the solution and it’s a lot better than I had in the past. You can fire off several speech requests including requests for other .mp3 files.  For special effects I have a bunch of MP3 files already stored – such as “alert” and “hailing frequencies open”.

[Image: Ivona speech flow]

In the example above (that red block is NOT the Ivona node – it is a subflow I wrote – more in a minute)… let me show you those two INJECTS on the left..

[Images: the two INJECT nodes]

The first has “alert” in the topic and some text “Red 1 logged in” – the second simply text.

I can click one, wait for it to speak and then click the second – or I can choose not to wait, clicking wildly – and they will still play in order.  So if you specify speech for the TOPIC AND THE PAYLOAD, both will simply go into the queue in order.

How do I do that? Let’s look at the red Node-Red SUBFLOW…

[Image: flow]

The yellow’ish blocks are user functions and the red block is a Node-Red EXEC function. Of the two purple items, one is a simple 1 second delay for good measure and the other is the node-red-contrib-ivona node.

I take in text… if the topic has the word “alert” in it – I put that on the queue BEFORE the main text – other than that there is NO difference between the two.

If there is no text, just a blank message coming in, I check the queue and if not empty, try to use the items on the queue (first in, first out) one at a time.

The INJECT node is needed to start the ball rolling for the first item in the queue. Once I find text in the queue, it is sent to the Ivona node IF such a named file does not exist - and then on to the mpg123 player – either way setting a BUSY flag so that those one-second ticks can’t pull another item off the queue until I’m done.

When done – I  send empty messages back into the input to trigger off any further items in the queue.

Here is the main function:

var frompush=0;
// create the queue and the busy flag the first time through
if (typeof context.arr == "undefined" || !(context.arr instanceof Array)) context.arr = [];
if (typeof context.global.speech_busy == "undefined") context.global.speech_busy=0;

// blank message and nothing playing: pull the next item off the queue
if ((msg.payload==="")&&(context.global.speech_busy===0))
  {
    if (context.arr.length)
      {
        frompush=1;
        msg.payload=context.arr.shift();
      }
  }

if (msg.payload!=="")
  {
    // new text coming in: just push but not recursively
    if (frompush===0)
      {
        if (msg.topic) context.arr.push(msg.topic); // skip empty or missing topics
        context.arr.push(msg.payload);
        return;
      }

    // item taken from the queue: set busy and build the file name
    context.global.speech_busy=1;
    msg.fname=msg.payload.replace(/ /g,'_');
    msg.fname=msg.fname.replace(/\./g,'');
    msg.fname=msg.fname.replace(/\,/g,'');
    msg.fname=msg.fname.toLowerCase();
    msg.fname="/home/pi/recordings/"+msg.fname+".mp3";
    return msg;
  }

 

Note the busy flag and the use of PUSH for the queue.

The “copy file to payload” is trivial – Ivona returns the filename in msg.file which is not where I want it.

msg.payload=msg.file;
return msg;

The reset flag function simply clears the busy flag and returns a blank message.

context.global.speech_busy=0;
msg.payload=""; return msg;

The trigger is the Node-Red DELAY function simply set to delay for one second and then pass the message on.
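If it helps to see the queue and the busy flag working together, here is a stripped-down simulation you can run in plain Node.js. It is only a sketch of the idea: onMessage and onPlaybackDone are names I have made up, ordinary variables stand in for the Node-Red context, and "playing" is instant.

```javascript
// State the function node would keep in its context.
var queue = [];
var busy = false;
var played = [];

// Text gets queued (topic first, if any). A blank message pulls the
// next item, but only when nothing is currently playing.
function onMessage(msg) {
  if (msg.payload !== "") {
    if (msg.topic) queue.push(msg.topic);
    queue.push(msg.payload);
    return null;
  }
  if (busy || queue.length === 0) return null;
  busy = true;
  return queue.shift();
}

// Stands in for the reset flag function once the player has finished.
function onPlaybackDone() {
  busy = false;
}

// Two requests fired in quick succession...
onMessage({ topic: "alert", payload: "Red 1 logged in" });
onMessage({ topic: "", payload: "Hello there" });

// ...then blank "tick" messages drain the queue one item at a time.
var item;
while ((item = onMessage({ payload: "" })) !== null) {
  played.push(item);
  onPlaybackDone();
}

console.log(played.join(" | ")); // alert | Red 1 logged in | Hello there
```

However fast the requests arrive, nothing comes off the queue until the previous item has finished and the flag has been cleared.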

[Image: delay node settings]

 

The MPG123 EXEC node calls mpg123 and passes the file name as parameter. The DELETE node simply deletes the file that Ivona creates… Here’s the Ivona setup. Put your credentials in the top box.

[Image: Ivona setup]

Note the triple moustache {{{}}} – Ivona examples use a double, but a double moustache HTML-escapes its contents, which mangles slashes, and we don’t want that because we have a file path in there.
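To show why that matters, here is a rough illustration of what escaping does to a path. The escapeHtml function below is my own approximation of the kind of HTML escaping a double moustache applies (as far as I know the mustache library escapes the slash along with the usual HTML characters), and the mpg123 command line is just an example.

```javascript
// Approximation of the HTML escaping applied by a double moustache.
var entityMap = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
  "/": "&#x2F;"
};

function escapeHtml(text) {
  return String(text).replace(/[&<>"'\/]/g, function (ch) {
    return entityMap[ch];
  });
}

var file = "/home/pi/recordings/brian--alert.mp3";

// Roughly what a double moustache would hand to the player: not a usable path.
console.log("mpg123 " + escapeHtml(file));
// mpg123 &#x2F;home&#x2F;pi&#x2F;recordings&#x2F;brian--alert.mp3

// What the triple moustache gives: the path untouched.
console.log("mpg123 " + file);
// mpg123 /home/pi/recordings/brian--alert.mp3
```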

And that is about it – works a treat and produces high quality buffered speech – for free – for your IOT endeavours.

Pick your own file directory (note that I used /home/pi/recordings but that isn’t in any way special) and any words or phrases you want SOUNDS for instead of voice – simply replace with files of the same name (note that spaces in file names are replaced by underscores).  So “alert 2” as a file name would be “alert_2.mp3”
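If you want to know what to call a sound effect file for a given phrase, the naming steps from the main function can be pulled out into a little helper and run on their own. toFileName is just my name for it, and the directory is the one I happen to use.

```javascript
// Same steps as the main function: spaces to underscores, drop full
// stops and commas, lower-case, then add the directory and extension.
function toFileName(text, dir) {
  var name = text.replace(/ /g, "_")
                 .replace(/\./g, "")
                 .replace(/,/g, "")
                 .toLowerCase();
  return dir + name + ".mp3";
}

console.log(toFileName("alert 2", "/home/pi/recordings/"));
// /home/pi/recordings/alert_2.mp3
console.log(toFileName("Red 1 logged in.", "/home/pi/recordings/"));
// /home/pi/recordings/red_1_logged_in.mp3
```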

Of course I'm always on the lookout for alternatives, especially those which profess to need no connection.

https://mimic.mycroft.ai/

This looked great - and the short example in their video sounded like a human. Sadly, on installing this on the Pi (it can't play without drivers which don't come on the Pi as standard - but it can generate files) I gave it my favourite mickey-take of a certain politician "I'm a little tea-pot short and stout, open my mouth and shite comes out"... and I have to say it made an UTTER and complete mess of it, sounding like a quite ill Dalek. Think I'll stick with Ivona.
