Amazon Polly Speech

Regular readers will know that I am an Ivona fan. I use the free Ivona service to provide high quality speech for my Raspberry Pi, with Node-Red at the heart of my home control setup. Well, Ivona is now defunct and Amazon Polly is its replacement.

I’ll clarify that: Ivona is SOON to become defunct and you can’t create new accounts. The Amazon Polly system is, for most purposes, a replacement for Ivona.

So – if you go to the Ivona site – you will see the reference to Amazon on the front page. The short, sharp answer is: Polly works, it is effectively free and it is as good as or better than Ivona. Read on.

The Amazon system “Polly” works via an account. I have an Amazon Developer account and when I tried to add Polly, it said I didn’t have the right permissions. So I added user Pete to my account and made him part of the Polly group, and that didn’t work either. Then I noticed something about payment and realised I’d not put any payment details in. I did that and all of a sudden the thing came to life and I got the only things I needed: my user ID and secret key (Amazon calls these the access key ID and secret access key).

DON’T PANIC about payment. There is a free tier of up to (wait for it) 5 million characters per month for the first 12 months, then $4 per million characters (by which point you probably won’t need any, read on). For my purposes there is not a hope in hell I’ll ever reach the free limit. In addition, the way I use it is the way they seem to want you to: download a phrase as a file (MP3) and save it with a meaningful file name. Next time you want that phrase, check to see if the file already exists. If so, play it; if not, get a new file from Amazon. In my typical use case, once each message has been used once there is very little chance of me needing to download anything more, and hence NO chance of incurring charges, at least in the first year.
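The check-before-you-download idea can be sketched in plain Node.js roughly as follows. This is only an illustration, not the Node-Red flow itself: `getSpeech`, `fakeFetch` and the temporary directory are names made up for the example, and the fake fetch just writes an empty file so nothing is actually sent to Amazon.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Return the MP3 for a phrase, calling fetchFn(file) only when it is not on disk yet.
// fetchFn is a stand-in for whatever asks Amazon for the audio and writes the file.
function getSpeech(file, fetchFn) {
    if (fs.existsSync(file)) return { file: file, source: "disk" };
    fetchFn(file);
    return { file: file, source: "amazon" };
}

// Demo with a fake fetch in a throwaway directory (hypothetical names throughout).
let downloads = 0;
function fakeFetch(file) {
    downloads++;
    fs.writeFileSync(file, ""); // stand-in for the MP3 data Polly would return
}

const dir = fs.mkdtempSync(path.join(os.tmpdir(), "polly-"));
const file = path.join(dir, "front_door_open.mp3");
const first = getSpeech(file, fakeFetch);   // not on disk yet, so it is "downloaded"
const second = getSpeech(file, fakeFetch);  // already on disk, no download
console.log(first.source, second.source, downloads);
```

However many times the phrase is requested after the first, the download counter stays at one, which is why the character count billed by Amazon stays so low.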

There are no doubt more elegant ways to do this than calling a command line from Node-Red and sometime someone will write a node to do it – might even be me – but right now this works perfectly and as far as I know it is the only published solution for Node-Red and Polly. If I’m wrong please do tell.

I’m assuming you have your credentials. Don’t worry about location: they don’t have one for England, but all it does is tell the code which server to use. Ireland (region eu-west-1) works for me and it is working from here in Spain.

You need to grab the command line code – I used this on a Pi2.

sudo pip install awscli

Once that was in I used:

aws configure

to set up the user ID, secret key and location which I’d already set up on the Amazon site.

That done I tried this:

aws polly synthesize-speech --output-format mp3 --voice-id Amy --text "Hello my name is peter." peter.mp3

The resulting file was sitting in the /home/pi directory – this used the voice Amy (British female) to store a phrase into peter.mp3.  Good for testing but as you’ll see the final solution is much better.
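If you would rather call the same command from a standalone Node.js script than type it in, a minimal sketch might look like this. It assumes the `aws` command line tool is installed and configured as above; `pollyArgs` and `synthesize` are hypothetical helper names, not part of any library. The arguments are passed as an array so the text needs no shell quoting.

```javascript
const { execFile } = require("child_process");

// Build the argument list for `aws polly synthesize-speech`.
// The options are the same ones used in the command above.
function pollyArgs(text, voice, outFile) {
    return [
        "polly", "synthesize-speech",
        "--output-format", "mp3",
        "--voice-id", voice,
        "--text", text,
        outFile
    ];
}

// Hypothetical helper: runs the CLI and calls back once it has finished.
// Not called here, so the example runs without an Amazon account.
function synthesize(text, voice, outFile, done) {
    execFile("aws", pollyArgs(text, voice, outFile), function (err) {
        done(err, outFile);
    });
}

// Show the command line this would produce.
console.log("aws " + pollyArgs("Hello my name is peter.", "Amy", "peter.mp3").join(" "));
```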

The rest is about queuing messages, storing them with meaningful names, playing them back and making sure you don’t re-record a phrase you have already recorded. If you don’t like Amy – use another voice. If you want different voices for different phrases then you could incorporate the name into the filename (I’ll leave that to the reader).  If you want to add sound effects – just put .MP3 files in the relevant folder with your sound effects and call them by name.
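As one possible take on the "voice in the filename" idea, here is a small sketch. The function name `speechFile` is made up for the example, the clean-up is the same one used in the function node code below, and Brian is simply another British voice picked for illustration.

```javascript
// Build a file name that includes the voice, so the same phrase
// recorded with two voices ends up in two different files.
function speechFile(voice, phrase) {
    const clean = phrase.toLowerCase()
        .replace(/[.,\/#!$%\^&\*;:{}=\-_`~()]/g, "") // strip punctuation
        .replace(/ /g, "_");                          // spaces to underscores
    return "/usr/audio/" + voice.toLowerCase() + "_" + clean + ".mp3";
}

console.log(speechFile("Amy", "Front door open"));   // /usr/audio/amy_front_door_open.mp3
console.log(speechFile("Brian", "Front door open")); // /usr/audio/brian_front_door_open.mp3
```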

[Figure: Polly Speech]

Looking at the above diagram, a test inject passes what you want Polly to say in the payload.

The first function looks to see if the payload has something in it and if so it pushes that onto a stack. The code then looks to see if speech is busy – if not and if there is something on the stack, it checks – if it is an mp3 file it sends the file to the MP3 player. If it is not an mp3, it looks to see if you’ve already created an mp3 for that speech, if so it plays that file, otherwise it passes the message onto Amazon to create the file – which is then played back.

It would have been nice to process new speech while playing something else back but that would get more complicated, involving more flags. As it stands this is easy to understand. You can fire in more speech or .MP3 files while one is playing and they will simply be queued.
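The queue-plus-busy-flag behaviour can be tried outside Node-Red with a few lines of plain JavaScript. This is a simplified stand-in for the function node, not the node itself: `SpeechQueue`, `push` and `finished` are illustrative names, and "playing" just records the item in an array.

```javascript
// Minimal queue with a busy flag, mimicking the first function node.
function SpeechQueue(play) {
    this.items = [];
    this.busy = false;
    this.play = play; // called with the next item when the player is free
}

// Add an item (an empty string adds nothing) and start it if nothing is playing.
SpeechQueue.prototype.push = function (item) {
    if (item !== "") this.items.push(item);
    if (!this.busy && this.items.length) {
        this.busy = true;
        this.play(this.items.shift());
    }
};

// Called when playback ends: clear the flag and poke the queue with an empty message.
SpeechQueue.prototype.finished = function () {
    this.busy = false;
    this.push("");
};

const played = [];
const q = new SpeechQueue(function (item) { played.push(item); });
q.push("hello");        // plays immediately
q.push("doorbell.mp3"); // queued, player is busy
q.push("goodbye");      // queued
q.finished();           // doorbell.mp3 starts
q.finished();           // goodbye starts
console.log(played);
```

Items come out in the order they went in, however quickly they were fired at the queue.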

You clearly need your Amazon account set up and Node-Red for this, and you also need the MPG123 player. Both Node-Red and MPG123 are in my standard script.

Before the code I used in each of those functions: the MPG123 exec node simply has mpg123 for the command, with “append payload” ticked. The AWS exec node has aws for the command, also with “append payload” ticked.

Here is the code for the three yellow function nodes:

if (typeof context.arr == "undefined" || !(context.arr instanceof Array)) context.arr = [];
if (typeof global.get("speech_busy") == "undefined") global.set("speech_busy", 0);

if (msg.payload !== "") context.arr.push(msg.payload);
if ((global.get("speech_busy") === 0) && (context.arr.length)) {
    msg.payload = context.arr.shift();
    global.set("speech_busy", 1);
    if (msg.payload.indexOf(".mp3") == -1) {
        var fs = global.get("fs");
        var mess = msg.payload;
        var messfile = mess.toLowerCase();
        messfile = messfile.replace(/[.,\/#!$%\^&\*;:{}=\-_`~()]/g, "");
        messfile = messfile.replace(/ /g, "_");
        
        if (fs.existsSync("/usr/audio/" + messfile + ".mp3")) {
            msg.payload = "/usr/audio/" + messfile + ".mp3";
            return [null, msg];
        }
        else {
            var voice = "Amy";
            msg.payload = 'polly synthesize-speech --output-format mp3 --voice-id ' + voice + ' --text "' + mess + '" /usr/audio/' + messfile + '.mp3';
            global.set("speech", messfile);
            return [msg, null];
        }
                    
    }
    return [null, msg]; // mp3 or synth        
}

// now play function node
msg.payload="/usr/audio/" + global.get("speech") + ".mp3";
return msg;

// clr busy function node
// Use global.set so the flag is cleared in the same store that global.get reads above
global.set("speech_busy", 0);
msg.payload = "";
return msg;
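One thing to watch with the command string built above: a phrase containing a double quote would break out of the --text "..." part. A possible guard, sketched here on the assumption that the exec node hands the string to a POSIX shell (as on the Pi), is to wrap the text in single quotes instead. `shellQuote` and `pollyCommand` are hypothetical helper names.

```javascript
// Wrap text in single quotes for a POSIX shell, escaping any single quotes inside it.
function shellQuote(text) {
    return "'" + String(text).replace(/'/g, "'\\''") + "'";
}

// Build the payload the AWS exec node appends to "aws" (same shape as in the flow).
function pollyCommand(mess, voice, file) {
    return "polly synthesize-speech --output-format mp3 --voice-id " + voice +
        " --text " + shellQuote(mess) + " " + shellQuote(file);
}

console.log(pollyCommand('He said "hi", didn\'t he', "Amy", "/usr/audio/test.mp3"));
```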

That was version 1. But ultimately I wanted new speech to be processed by Amazon WHILE a previously recorded item was playing (assuming a previously recorded item was playing and the next item had to be created).

Several hours later I came up with this – it appears to work!

[Figure: Polly 2]

and here is the latest code – probably not QUITE as straightforward to read – but when you run it – indicators on the EXEC functions (brown) show clearly that the software is able to play a recorded message while fetching a new one. Could do with some extreme testing.

if (typeof context.arr == "undefined" || !(context.arr instanceof Array)) context.arr = [];
if (typeof global.get("speech_busy") == "undefined") global.set("speech_busy", 0);
if (typeof global.get("create_speech_busy") == "undefined") global.set("create_speech_busy", 0);

if (msg.payload !== "") context.arr.push(msg.payload);
if (context.arr.length) {
    msg.payload = context.arr.shift();
    if (msg.payload.indexOf(".mp3") == -1) {
        var fs = global.get("fs");
        var mess = msg.payload;
        var messfile = mess.toLowerCase();
        messfile = messfile.replace(/[.,\/#!$%\^&\*;:{}=\-_`~()]/g, "");
        messfile = messfile.replace(/ /g, "_");
        messfile = "/usr/audio/" + messfile + ".mp3";

        if (fs.existsSync(messfile)) {
            // Already recorded: play it now, or requeue it if the player is busy
            if (global.get("speech_busy") == 1) {
                context.arr.unshift(msg.payload);
                return [null, null];
            }
            global.set("speech_busy", 1);
            msg.payload = messfile;
            return [null, msg];
        }
        // Not recorded yet: keep the text at the head of the queue so it is played
        // once the file exists, and start a download unless one is already running
        context.arr.unshift(msg.payload);
        if (global.get("create_speech_busy")) return [null, null];
        global.set("create_speech_busy", 1);
        var voice = "Amy";
        msg.payload = 'polly synthesize-speech --output-format mp3 --voice-id ' + voice + ' --text "' + mess + '" ' + messfile;
        return [msg, null];
    }
    // Payload is an .mp3 file name: play it now, or requeue it if the player is busy
    if (global.get("speech_busy") == 1) context.arr.unshift(msg.payload);
    else { global.set("speech_busy", 1); return [null, msg]; }
}

And here are the two small function nodes:

// clear creating
global.set("create_speech_busy",0);
msg.payload="";
return msg;

// clear playing
global.set("speech_busy",0);
msg.payload=""; return msg;
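Stripped of the Node-Red plumbing, the decision the main function makes for the item at the head of the queue boils down to something like the sketch below. It is a simplified restatement for reading purposes, not a replacement for the node: the function name `decide` and its return values are made up for the example.

```javascript
// Decide what to do with the item at the head of the queue.
// Returns "play", "create" or "wait" (illustrative names).
function decide(item, fileExists, speechBusy, createBusy) {
    const isMp3 = item.indexOf(".mp3") !== -1;
    // Sound effects and already-recorded phrases only need the player to be free
    if (isMp3 || fileExists) return speechBusy ? "wait" : "play";
    // New phrases only need the downloader to be free, even while something is playing
    return createBusy ? "wait" : "create";
}

console.log(decide("doorbell.mp3", false, false, false)); // play
console.log(decide("hello", true, true, false));          // wait
console.log(decide("hello", false, true, false));         // create
console.log(decide("hello", false, true, true));          // wait
```

The third case is the whole point of version 2: a new phrase can be fetched from Amazon while a previous one is still playing.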


20 thoughts on “Amazon Polly Speech”

  1. Hi Peter
    Thanks a million for all your hard and very inspirational work. It is super relevant for me and highly appreciated. One suggestion - though not related to "Amazon Polly":

    I started getting your regular and automatic mails with the topic "Latest posts from the Tech Blog" sent from "Admin". They show up in my inbox as coming from "Admin". I can't help but think: who the hell is "Admin"? And I might not be the only one. Hence a subtle suggestion that you change your sender ID to something less generic. Maybe "Scargill's Tech Blog" if you don't want your personal name here.

    Cheers and thanks again from Copenhagen.

  2. Hi Peter,
    is there an easy way to output the mp3 file via a Bluetooth speaker?

    By the way, I love your posts. They helped me a lot. Now I understand Node-Red and I'm using it for my home automation.
    Thanks for your work.

    1. Hmm, I'll have to ponder that. Ok, so ASSUMING you had a Bluetooth dongle on the Pi - and assuming it was supported - and assuming the command line MPG123 player can output to the Bluetooth speaker by a command line option - then yes - easy.

      But that's a lot of assumptions about your knowledge, your ability to find knowledge on the web and your particular hardware. First step would be to look up Bluetooth output on Raspberry Pi, avoiding for the minute the built-in stuff on the Pi3, which uses the serial port or some other thing that makes it fairly useless. Then, having ascertained that Bluetooth sound output is feasible, the next step would be checking the MPG123 site for options to divert the output to Bluetooth. Node-Red etc. does not need to know anything about this.

    1. Hi Peter - well the only experience I have up to now of Bluetooth on the Pi (I ignored the BT on the Pi3 - indeed I deliberately disabled it as it screws with the serial port - how daft is that) is putting a BT dongle (just a cheapy) onto a Pi2 and it worked immediately - indeed I'm using one here now on the main Pi in Spain to pull in BT signals from those little garden sensors - and in fact, being a dongle rather than built in is an advantage as the range is so poor I have it on a lead which goes nearer the wall. I'll drill a hole in the wall at some point so I can use more of the sensors around the garden - right now it just won't reach to the other side of the garden through one breezeblock wall.

  3. In the main function code above I get:
    TypeError: Cannot read property 'existsSync' of undefined

    I've googled it but can't find a resolution

    1. Let me see what I have in my settings file... and no change. Node-Red never enabled this; I just chose to use the file system, and any nodejs modules you use (other than node-red modules) you have to enable in settings.js. I could not tell you what blog entry I covered this in, but I did somewhere, as have others.

      Right in my /home/pi/node-red/settings.js file I have the following....

      var fs = require("fs");

      That is about line 19 in my case - near the top of the file...

      Also you need this - well, you need the fs bit - os is handy also - moment is just something I've added. So this then makes fs available to Node-Red....

      functionGlobalContext: {
      os:require('os'),
      // bonescript:require('bonescript'),
      // jfive:require("johnny-five"),
      moment:require('moment'),
      fs:require('fs')
      },
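      For anyone hitting the same TypeError, a small guard like the one sketched here gives a clearer message when a module is missing from functionGlobalContext. It is only an illustration: `requireFromGlobal` is a made-up helper name, `fakeGlobal` stands in for the function node's global object so the snippet runs outside Node-Red, and settings.js changes need a Node-Red restart to take effect.

```javascript
// Fetch a module that settings.js is expected to expose via functionGlobalContext.
// globalCtx stands for the function node's `global` object.
function requireFromGlobal(globalCtx, name) {
    const mod = globalCtx.get(name);
    if (typeof mod === "undefined") {
        throw new Error(name + " is not in functionGlobalContext; add it to settings.js and restart Node-Red");
    }
    return mod;
}

// Demo with a stand-in global context that only knows about "fs".
const fakeGlobal = {
    store: { fs: require("fs") },
    get: function (key) { return this.store[key]; }
};

console.log(typeof requireFromGlobal(fakeGlobal, "fs").existsSync); // function
```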

    1. No. So firstly, the only way to get custom speech out of Alexa is to make a skill for it. Last time I did that, it involved various hoops including having port 443 access internally, as Amazon insist the end-point is https: and that's ok, but they also insist for reasons beyond me that it HAS to be on port 443. That is not always possible. Secondly, as far as I'm aware, even with a skill it is not currently possible to have Alexa respond on demand, i.e. asynchronously to an instruction. Sadly Amazon's focus on sales seems to prohibit making life easier for developers.

        1. Thanks for that Steve. I'll look forward to seeing that and a relaxation of the need to use port 443, and not just in America. When all that comes together (a simple App without 443 access that can asynchronously allow arbitrary replies from Alexa, and even return the text of a user command) then we will truly have a powerful platform. I'm not holding my breath however.

  4. I thought so. Even with all of your and Aidan's help I never did get the Amazon to talk to Node-Red. Virgin router, port 443, https, SSL certs etc.
