Amazon AWS Polly Neural Speech on RPi

Polly

Regular readers will know that for years I was a great fan of using Ivona speech on my Raspberry Pi. I used the free Ivona service in Node-Red to provide high quality speech at the heart of my home control setup. Well, Ivona, good as it was, has been defunct now for some time. Amazon Polly is great, especially with the “neural” enhancement – and I’m now using it on RPi4 in the UK and Spain. This post was originally written for Ivona in 2017 and has been completely overhauled for the new neural option in Polly, as well as (optionally) including the voice name in the input.

The Amazon Polly system is, for most purposes, a replacement for Ivona. The short, sharp answer is: Polly works, it is effectively free (under 5 million characters a month, or 1 million characters with the new “neural” option) – and it is actually better than Ivona.

Read on, as my simple Node-Red code caches data so that previously-used text does not require successive calls to Amazon servers. In case you’re wondering, I could not see a decent AWS POLLY logo/icon on their website so I took this image from a free-use, no-attribute site 🙂

So the Amazon system “Polly” works via an AWS account. I have a free Amazon “developer” account and when I tried to add Polly – it said I didn’t have the right permissions – so – I added user pete to my account and made him part of the Polly group – and that didn’t work either – then I noted something about payment and realised I’d not put any payment details in – I did that – and all of a sudden the thing came to life. This has been running for 3 years now and I think they’ve charged me maybe £1 sterling in all that time.

The way I use Polly seems appropriate – download a phrase as a file (MP3) from your text input and save it with a meaningful file name. Next time you want that phrase – check to see if the file already exists – if so, play it, if not, get a new file from Amazon. In my typical use case, once a message has been used, it is cached in its own file, so there is no chance of incurring significant charges.
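As a rough sketch of that caching idea in plain Node.js: the helper names cacheFileFor and lookupPhrase are made up for illustration and are not part of my flow, and the demo uses a temporary folder rather than /home/pi/audio. The file-name clean-up mirrors what the Node-Red function further down does.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical helper: turn a phrase into a cache file name
// (lower case, quotes and punctuation stripped, spaces to underscores).
function cacheFileFor(phrase, audioDir) {
    const name = phrase
        .toLowerCase()
        .replace(/['"]/g, "")
        .replace(/[.,\/#!$%\^&\*;:{}=\-_`~()]/g, "")
        .replace(/ /g, "_");
    return path.join(audioDir, name + ".mp3");
}

// Hypothetical helper: report whether a phrase can be played from the cache
// or still has to be fetched from Polly.
function lookupPhrase(phrase, audioDir) {
    const file = cacheFileFor(phrase, audioDir);
    return { file: file, cached: fs.existsSync(file) };
}

// Demonstration in a temporary folder so no real audio folder is touched.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), "audio-"));
const first = lookupPhrase("Hello, my name is Peter.", demoDir);
console.log(first.cached);          // nothing cached yet
fs.writeFileSync(first.file, "");   // stand-in for the MP3 that Polly would return
const second = lookupPhrase("Hello, my name is Peter.", demoDir);
console.log(second.cached);         // now found in the cache
```

The point is simply that the second request for the same phrase never needs to leave the Pi.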

There are, no doubt, more elegant ways to do this than calling a command line from Node-Red – but this method works perfectly and as far as I know, the result is unique for Node-Red and Polly. If I’m wrong please do tell. I tried contrib-tts-ultimate but no joy with Polly.

I’ll assume you have your AWS credentials (simple – free) – don’t worry about the region for now.

Use the command line code below – I used this on RPi2-4 without any issues. As user pi, I created a folder called /home/pi/audio to store files… then…

sudo pip install awscli

When asked for a default region, I set us-west-1 for no other reason than initially not knowing any better – no matter, as this is overridden by the region given in the command below. For the default output format you might expect to enter MP3, but that is not accepted there – so I picked json for no good reason. Again, see the AWS Polly example below.

Once AWS was installed, I used:

aws configure

to enter the access key ID and secret access key (both of which I’d already set up on the Amazon site), the default region and the output format.

pi@ukpi:~ $ aws configure
AWS Access Key ID [*********]: 
AWS Secret Access Key [**********]:
Default region name [us-west-1]:
Default output format [json]:

That done I tried this:

aws polly synthesize-speech --output-format mp3 --voice-id Amy --engine neural --region eu-west-2 --text "Hello my name is peter." /home/pi/audio/peter.mp3

The result was an MP3 file sitting in my /home/pi/audio folder – this used the voice Amy (British female) to store the phrase in peter.mp3. To play it back:

mpg123 /home/pi/audio/peter.mp3

Sorted – good for testing but as you’ll see the final solution is much better. I have this the wrong way around of course – you should first ensure you have a working MP3 player with some standard .MP3 file before testing Polly to keep life simple.
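For anyone calling the AWS CLI from their own Node.js code rather than from an exec node, here is a minimal sketch of the same command built as an argument list. The helper names buildPollyArgs and synthesize are hypothetical; the flags are the ones from the command above. Passing the arguments as an array to execFile avoids the shell, so quotes in the text cannot break the command. The demo only prints the arguments and does not contact Amazon.

```javascript
const { execFile } = require("child_process");

// Hypothetical helper: build the argument list for "aws polly synthesize-speech".
// The default voice and region are simply the ones used in this post.
function buildPollyArgs(text, outFile, voice = "Amy", region = "eu-west-2") {
    return [
        "polly", "synthesize-speech",
        "--output-format", "mp3",
        "--engine", "neural",
        "--region", region,
        "--voice-id", voice,
        "--text", text,
        outFile
    ];
}

// Hypothetical helper: run the AWS CLI without a shell. The callback receives
// (error, stdout, stderr) as usual for execFile.
function synthesize(text, outFile, callback) {
    execFile("aws", buildPollyArgs(text, outFile), callback);
}

// Show the arguments that would be passed; nothing is sent to Amazon here.
const demoArgs = buildPollyArgs("Hello my name is peter.", "/home/pi/audio/peter.mp3");
console.log(demoArgs.join(" "));
```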

The Node-Red sub-flow below is about queuing messages, storing them with meaningful names, playing them back and making sure you don’t re-record a phrase you have already recorded. If you don’t like the default Amy – I’ve included the code to let you add another voice into your input (Brian for example).

If you want to add sound effects – just put .MP3 files in the audio folder and call them by name. I have files like red-alert.mp3 and similar using Star Trek recordings – far better to have the original than a modern voice wailing “red alert”?

The first function looks to see if the payload has something in it and, if so, adds it to the end of a queue. The code then looks to see if speech is busy – if not, and if there is something in the queue, it checks the next item: if it is an mp3 file, it sends the file to the MP3 player. If it is not an mp3, it looks to see if you’ve already created an mp3 for that speech; if so it plays that file, otherwise it passes the message on to Amazon to create the file – which is then played back with a tiny delay.

It would have been nice to process new speech while playing something else back but that would get more complicated, involving more flags. As it stands this is easy to understand. You can fire in more speech or .MP3 files while one is playing and they will simply be queued.
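The queueing logic described above can be modelled outside Node-Red as a small sketch. This is a simplified illustration, not the sub-flow itself: all names are hypothetical, fileExists stands in for the real file check, and an item counts as an MP3 only if its name ends in .mp3.

```javascript
// Simplified model of the speech queue. "fileExists" is passed in so the
// sketch can run anywhere without touching the file system.
function makeSpeechQueue(fileExists) {
    const queue = [];
    const state = { speechBusy: false, createBusy: false };

    // Decide what should happen next: play a file, ask Polly for one, or wait.
    function next(item) {
        if (item) queue.push(item);              // an empty item only re-runs the queue
        if (queue.length === 0) return { action: "idle" };
        const head = queue[0];
        if (head.endsWith(".mp3") || fileExists(head)) {
            if (state.speechBusy) return { action: "wait" };
            state.speechBusy = true;
            return { action: "play", item: queue.shift() };
        }
        if (state.createBusy) return { action: "wait" };
        state.createBusy = true;                 // item stays queued until its file exists
        return { action: "synthesize", item: head };
    }

    return {
        next: next,
        playbackDone: function () { state.speechBusy = false; return next(""); },
        synthesisDone: function () { state.createBusy = false; return next(""); }
    };
}

// Demonstration with an in-memory set of already recorded phrases.
const recorded = new Set();
const q = makeSpeechQueue(function (phrase) { return recorded.has(phrase); });

const step1 = q.next("red-alert.mp3");   // player is free, so this plays
const step2 = q.next("Hello there");     // no file yet, so Polly is asked
recorded.add("Hello there");             // pretend Polly has written the file
const step3 = q.synthesisDone();         // file exists, but the player is still busy
const step4 = q.playbackDone();          // player is free again, so the phrase plays
console.log(step1.action, step2.action, step3.action, step4.action);
```

Two busy flags are all it takes: one for the player and one for the request to Amazon.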

You clearly need your free Amazon account set up and Node-Red for this – you also need the MPG123 player. Both Node-Red and MPG123 are in my standard “The Script” if that helps.

The MPG123 exec node simply has mpg123 for the command and “append payload” ticked. The AWS exec node has aws for the command and “append payload” ticked.

I put the code in a Node-Red sub-flow (for ease of use) that can be used by simply injecting some text into the incoming payload.

Here is the code for the three yellow function nodes:

// main queue function: output 1 carries the Polly command for the AWS exec node,
// output 2 carries the file name for the MPG123 exec node
if (typeof context.arr == "undefined" || !(context.arr instanceof Array)) context.arr = [];
if (typeof global.get("speech_busy") == "undefined") global.set("speech_busy", 0);
if (typeof global.get("create_speech_busy") == "undefined") global.set("create_speech_busy", 0);

// an empty payload adds nothing to the queue, it only triggers processing
if (msg.payload !== "") context.arr.push(msg.payload);
if (context.arr.length) {
    msg.payload = context.arr.shift();
    var original = msg.payload; // re-queue this, so an optional voice prefix is not lost
    var voice = "Amy";
    var paysplit = msg.payload.split("|");
    if (typeof paysplit[1] !== 'undefined') {
        // "voice|text" form: normalise the voice name to a leading capital
        voice = paysplit[0];
        voice = voice.charAt(0).toUpperCase() + voice.slice(1).toLowerCase();
        msg.payload = paysplit[1];
    } else msg.payload = paysplit[0];

    if (msg.payload.indexOf(".mp3") == -1) {
        // speech: derive a file name from the text
        var fs = global.get("fs");
        var mess = msg.payload;
        mess = mess.replace(/['"]/g, ""); // quotes would break the quoted --text argument
        var messfile = mess.toLowerCase();

        messfile = messfile.replace(/[.,\/#!$%\^&\*;:{}=\-_`~()]/g, "");
        messfile = messfile.replace(/ /g, "_");
        messfile = "/home/pi/audio/" + messfile + ".mp3";

        if (fs.existsSync(messfile)) {
            // already recorded: play it, or put it back if the player is busy
            if (global.get("speech_busy") == 1) {
                context.arr.unshift(original);
                return [null, null];
            } else {
                global.set("speech_busy", 1);
                msg.payload = messfile;
                return [null, msg];
            }
        } else {
            // not recorded yet: keep it at the head of the queue and ask Polly for the file
            context.arr.unshift(original);
            if (global.get("create_speech_busy") == 1) {
                return [null, null];
            } else {
                global.set("create_speech_busy", 1);

                msg.payload = 'polly synthesize-speech --output-format mp3 --engine "neural" --region eu-west-2 --voice-id ' + voice + ' --text "' + mess + '" ' + messfile;

                return [msg, null];
            }
        }
    }
    // an .mp3 file name: play it, or put it back if the player is busy
    if (global.get("speech_busy") == 1) context.arr.unshift(original);
    else {
        global.set("speech_busy", 1);
        return [null, msg];
    } // mp3 or synth
}



// clear create_speech_busy function
global.set("create_speech_busy", 0);
msg.payload = "";
return msg;


// clear speech_busy function
global.set("speech_busy", 0);
msg.payload = "";
return msg;

To use the sub-flow, inject either:

msg.payload="Hello there"

or

msg.payload="brian|Hello there"

The latter includes one of the acceptable Polly voice names which must start with upper case but I’ve handled that internally.
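A stand-alone sketch of that “voice|text” handling, for anyone wanting to reuse it elsewhere. The function name parseSpeechInput is hypothetical, and unlike my function node this version keeps any further | characters as part of the text.

```javascript
// Hypothetical helper: split an optional "voice|text" input into its parts,
// falling back to a default voice when no prefix is given.
function parseSpeechInput(payload, defaultVoice = "Amy") {
    const parts = payload.split("|");
    if (parts.length < 2) return { voice: defaultVoice, text: payload };
    const raw = parts[0].trim();
    // Polly voice names start with a capital letter, so "brian" becomes "Brian".
    const voice = raw.charAt(0).toUpperCase() + raw.slice(1).toLowerCase();
    return { voice: voice, text: parts.slice(1).join("|") };
}

console.log(parseSpeechInput("Hello there"));
console.log(parseSpeechInput("brian|Hello there"));
```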

I did consider adding the Google Assistant voice – but by the time I got to the point where Google were asking for inside leg measurement I decided… “NOOO” – give that a miss. See this guide for Google Assistant API access.


42 thoughts on “Amazon AWS Polly Neural Speech on RPi”

  1. Currently I am a heavy Alexa user utilizing node-red to manage all of my home automation but doing it verbally with an Alexa/node-red combo.

    I would like to not be using Alexa but I am a little confused how exactly you use this system. I understand that you are able to capture mp3 audio based on text. But what do you do with it after that?

    1. Hi there Joe

      I’ve used various Node-Red Alexa nodes over a long period of time and almost all of them worked fine until they no longer worked, thanks to Amazon changing the goalposts – and then, finally discouraged by a several-day-long Internet outage, I have all but abandoned using Alexa for anything but additional “nicety” speeches and annoying my cat (my wife uses it as a timer and to play music) – I certainly have no intention to RELY on it for home control in future, good as it is when it works.

      I am using AWS Polly to capture phrases which 99% of the time will then remain cached locally. These are captured on a Raspberry Pi which is also controlling my home locally (and using the VERY reliable rpi-clone to make backups of everything). The .MP3 files, if present, are used automatically by MPG123 to give verbal confirmations, where required, of events happening NOT in my office, i.e. in other parts of the building. So for the most part all activities except “niceties” are happening internally. Node-Red calls the subflow to get a phrase and if it is already available, plays it. If not, it goes to get it from AWS and THEN plays it. I suppose I should add a standard phrase along the lines of “The Internet is bust right now, I cannot play that phrase – would you like me to try later” 🙂

      1. ok, so I think that I “get” the use of using POLLY to convert text to spoken audio. But are you also somehow converting your speech to text so that you can do something with it?

        Also, you mentioned in one of the initial posts that you posted the flow somewhere. I think what I am seeing in that post is a function node?

  2. I’d warn anyone using AWS to make sure they have Multi Factor Authentication enabled.

    Account hacking is rife at the moment. If your account is hacked, expect a bill for thousands, the hackers will spin up high end servers for crypto mining.

  3. So now, months later, I’m copying the blog to remember how to do this on a new pi3… and of course:

    aws polly synthesize-speech --output-format mp3 --voice-id Amy --text "Hello my name is peter." peter.mp3

    is not working. mp3 is not a valid format. Ideas anyone?

      1. I deleted all the config – worked as well. Now I have a larger issue, my new NanoPi K1+ has no working audio output – I’ve written to Friendlyarm to ask what’s wrong.

          1. Thank you. In this case I’m ahead. All utilities installed via script. No second device exists in either of their operating systems (something wrong there), one output with no sound coming out using that test or mpg123. Told FA.

  4. What would be *really* nice would be to send a playlist and let the player do the dirty work. 😀
    I just tried with Google Home, both m3u and pls, with no success.

    However, I also tried cat *.mp3 >> big.mp3 and… bingo! Google Home played the resulting file perfectly!

    OK, I’m definitively not on the elegant side here. 😉

  5. Hmmm… I realize only now that my quote may have sounded a bit negative.
    My point was about someone writing a node to do the stuff, at least partially, not about the elegance (I’m not even the author of those nodes).
    Sorry about that.

    1. Not at all Yannick – anything that leads to a better way of doing things is good.

      Now, that script does everything my code does but for one thing.

      It will happily send out mp3 files (or the same file) one after another – and does nothing about queueing them up – part of what I’ve done is to build a queue and play them in order. So actually, it may not be QUITE as useful as it looks.

      Also I fire off to MPG123 via an EXEC function… I tried one of the nodes that will do this – and if you put the wrong information in – it crashes Node-Red – so that went in the bin 🙂

      What would be nice is a node for the next part – take in a file name – play it if there is nothing there – or queue it – and in the background watch for files finishing playing before starting the next one.

      Am I missing something here. An example – at dusk, my BIGTIMERS report on what’s happening – may be several things at once – if it were not for the orderly queue, I’d get the lot at once.

  6. Sure we have to do something with the mp3 file. 🙂
    Actually I don’t use it directly on my raspberry, just dropping it in my HTTP server document root (Apache, but I could have used Node-RED as well) so Google Home can get it and play it.
    Since GH is basically a Chromecast device, you can send it any arbitrary text as an mp3 file (more exactly: you can send it the URL of any mp3 file, by using the Chromecast node).
    Concerning secret keys, Amazon is quite clear: if you don’t remember your key, just create a new pair. 😉

    1. Well, I have downloaded and set up the node-red-contrib-polly-tts node – but it IS worth noting that you still have to use a command line utility – to make use of the MP3 file it creates! But at least on the surface this does look like a nicer way – once you get past the hassle of creating public and secret keys (and as I found out today once created you cannot access the secret key again from Amazon – so keep a local copy somewhere!)……

      As I don’t have Google Home – can you clarify – can you send any text you like arbitrarily to Google Home – because you certainly can’t do that to Amazon’s Echo….

  7. I thought so, even with all of yours and Aidan’s help I never did get the amazon to talk to node red. Virgin router, port 443, https, SSL certs etc.

    1. No. So firstly the only way to get custom speech out of Alexa is to make a skill for it – and last time I did that, it involved various hoops including having port 443 access internally, as Amazon insist the end-point is https: and that’s ok, but they also insist for reasons beyond me that it HAS to be on port 443. That is not always possible. Secondly, as far as I’m aware, even with a skill, it is not currently possible to have Alexa respond on demand, i.e. asynchronously to an instruction. Sadly Amazon’s focus on sales seems to prohibit making life easier for developers.

        1. Thanks for that Steve – I’ll look forward to seeing that and a relaxation of the need to use port 443 – and not in America. When all that comes together – a simple App without 443 access that can asynchronously allow arbitrary replies from Alexa – and even return the text of a user command – then we will truly have a powerful platform. I’m not holding my breath however.

    1. Let me see what I have in my settings file – and no change – Node-Red never enabled this – I just chose to use the file system, and any nodejs modules you use (other than node-red modules) you have to enable in settings.js – I could not tell you what blog entry I covered this in but I did somewhere, as have others.

      Right in my /home/pi/node-red/settings.js file I have the following….

      var fs = require("fs");

      That is about line 19 in my case – near the top of the file…

      Also you need this – well, you need the fs bit – os is handy also – moment is just something I’ve added. So this then makes fs available to Node-Red….

      functionGlobalContext: {
      os:require('os'),
      // bonescript:require('bonescript'),
      // jfive:require("johnny-five"),
      moment:require('moment'),
      fs:require('fs')
      },

  8. In the main function code above I get:
    TypeError: Cannot read property 'existsSync' of undefined

    I’ve googled it but can’t find a resolution

  9. In my brief experience, trying to get bluetooth working on my pi3 was an exercise in frustration.

    1. Hi Peter – well the only experience I have up to now of Bluetooth on the Pi (I ignored the BT on the Pi3 – indeed I deliberately disabled it as it screws with the serial port – how daft is that) is putting a BT dongle (just a cheapy) onto a Pi2 and it worked immediately – indeed I’m using one here now on the main Pi in Spain to pull in BT signals from those little garden sensors – and in fact, being a dongle rather than built in is an advantage as the range is so poor I have it on a lead which goes nearer the wall. I’ll drill a hole in the wall at some point so I can use more of the sensors around the garden – right now it just won’t reach to the other side of the garden through one breezeblock wall.

  10. Hi Peter,
    is there an easy way to output the mp3 file via a Bluetooth speaker?

    By the way, I love your posts. They helped me a lot. Now I understand Node-Red and I’m using it for my home automation.
    Thanks for your work.

    1. Hmm, I’ll have to ponder that. Ok, so ASSUMING you had a Bluetooth dongle on the Pi – and assuming it was supported – and assuming the command line MPG123 player can output to the Bluetooth speaker by a command line option – then yes – easy.

      But that’s a lot of assumptions about your knowledge, your ability to find knowledge on the web and your particular hardware. First step would be to look up Bluetooth output on Raspberry Pi – avoiding for the minute the built in stuff on the Pi3 which uses the serial port or some other thing that makes it fairly useless… then, having ascertained that Bluetooth sound output is feasible, checking at the MPG123 site for options to divert the output to Bluetooth would be the next step – Node-Red etc does not need to know anything about this.

      1. beware if you did that mod about serial (stealing port to bluetooth) on raspberry 3…. it’s a shame they didn’t add a simple audio swap as they did for hdmi/3.5mm jack in raspi-config…

  11. Hi Peter
    Thanks a million for all your hard and very inspirational work. It is super relevant for me and highly appreciated. One suggestion – though not related to “Amazon Polly”:

    I started getting your regular and automatic mails with the topic : “Latest posts from the Tech Blog” sent from “Admin “. They show up in my inbox as coming from “Admin”. I can’t help but think: who the hell is “Admin”? And I might not be the only one. Hence a subtle suggestion, that you change your sender ID to something less generic. Maybe “Scargill’s Tech Blog” if you don’t want your personal name here.

    Cheers and thanks again from Copenhagen.

    1. Good one – I’m onto it. I’ve never figured out why that mailing system always seems to send out just over 1000 emails – it’s been doing that for 2 years now – I wonder if there’s a hidden limit somewhere.. my subscribers have almost doubled since I noticed an increase in newsletters. Hmmm.
