Regular readers will know that I am an Ivona fan. In Node-Red I use the free Ivona service to provide high quality speech for my Raspberry Pi in Node-Red at the heart of my home control setup. Well, Ivona is now defunct. Amazon Polly is a replacement.
I’ll clarify that, Ivona is SOON to become defunct and you can’t create new accounts. The Amazon Polly system, is for most purposes a replacement for Ivona.
So – if you go to the Ivona site – you will see the reference to Amazon on the front page. The short, sharp answer is: Polly works, it is effectively free and it is as good as or better than Ivona. Read on.
So the Amazon system “Polly” works via an account. I have an Amazon Developer account and when I tried to add Polly – it said I didn’t have the right permissions – so – I added user Pete to my account and made him part of the Polly group – and that didn’t work either – then I noted something about payment and realised I’d not put any payment details in – I did that – and all of a sudden the thing came to life and I got the only things I needed – my user ID and secret password.
DON’T PANIC about payment – there is a free tier of up to (wait for it) 5 million characters per month for the first 12 months then $4 per million characters – by which point you probably won’t need any – read on) – for my purposes there is not a hope in hell I’ll ever reach the free limit. In addition - the way I use it is the way they seem to want you to – download a phrase as a file (MP3) and save it with a meaningful file name. Next time you want that phrase – check to see if the file already exists – if so, play it, if not, get a new file from Amazon. In a typical use case that I might have, once the messages are used once there is very little chance of me needing to download anything and hence NO chance of incurring charges at least in the first year.
There are no doubt more elegant ways to do this than calling a command line from Node-Red and sometime someone will write a node to do it – might even be me – but right now this works perfectly and as far as I know it is the only published solution for Node-Red and Polly. If I’m wrong please do tell.
I’m assuming you have your credentials – don’t worry about location – they don’t have a location for England but all it means is to tell the code which server to use. Ireland works for me and it is working from here in Spain.
You need to grab the command line code – I used this on a Pi2.
sudo pip install awscli
Once that was in I used:
to set up the user ID, secret key and location which I’d already set up on the Amazon site.
That done I tried this:
aws polly synthesize-speech --output-format mp3 --voice-id Amy --text "Hello my name is peter." peter.mp3
The resulting file was sitting in the /home/pi directory – this used the voice Amy (British female) to store a phrase into peter.mp3. Good for testing but as you’ll see the final solution is much better.
The rest is about queuing messages, storing them with meaningful names, playing them back and making sure you don’t re-record a phrase you have already recorded. If you don’t like Amy – use another voice. If you want different voices for different phrases then you could incorporate the name into the filename (I’ll leave that to the reader). If you want to add sound effects – just put .MP3 files in the relevant folder with your sound effects and call them by name.
Looking at the above diagram, a test inject passes what you want Polly to say in the payload.
The first function looks to see if the payload has something in it and if so it pushes that onto a stack. The code then looks to see if speech is busy – if not and if there is something on the stack, it checks – if it is an mp3 file it sends the file to the MP3 player. If it is not an mp3, it looks to see if you’ve already created an mp3 for that speech, if so it plays that file, otherwise it passes the message onto Amazon to create the file – which is then played back.
It would have been nice to process new speech while playing something else back but that would get more complicated, involving more flags. As it stands this is easy to understand. You can fire in more speech or .MP3 files while one is playing and they will simply be queued.
You clearly need your Amazon account setup and Node-Red for this – you also need MPG123 player. Both Node-Red and MPG123 are in my standard script.
Here is the code I used in each of those functions…. the MPG123 exec node simply has mpg123 for the command and the append payload ticked. The AWS exec node has aws for the command and the append payload ticked.
Here is the code for the three yellow function nodes:
That was version 1. But ultimately I wanted new speech to be processed by Amazon WHILE a previously recorded item was playing (assuming a previously recorded item was playing and the next item had to be created).
Several hours later I came up with this – it appears to work!
and here is the modified code – probably not QUITE as straightforward to read – but when you run it – indicators on the EXEC functions (brown) show clearly that the software is able to play a recorded message while fetching a new one. Could do with some extreme testing.
and here, the two small function nodes