In my last post, Using Text to Speech online tools to create audio files for Arduino projects, I covered some online text to audio sites I have been using to create audio alert files for projects. The files produced by ttsmp3.com seemed to have a lower audio level than I expected. I found I could run all the files through a batch process in Audacity to normalize them. Working out how to do that took far longer than the job itself, so I’m making this quick post to have the info for next time.
I was running Audacity 2.1.3, which is now an old version. The terminology and location of the feature changed in 2.3.0. Versions before 2.3 call it a ‘chain’, whereas from 2.3 it became a ‘macro’. I’ll include info for both, as I made notes for the older version first.
Create a chain/macro with the processes that will be run during the batch process. This only needs to be set up once.
Select files and run the task.
There are a lot of other steps that can be included. I’ve also added a conversion to mono in one of mine, as I found some of my files created another way were stereo and others mono. My project only had a single speaker, so I converted them all to mono for consistency.
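To show what a mono conversion does to the audio itself, here is a minimal Python sketch. It assumes the stereo audio has already been decoded into (left, right) pairs of float samples between -1.0 and 1.0. The function name stereo_to_mono is my own and not an Audacity command, and simple averaging is just one common way to downmix, not necessarily what Audacity does internally.

```python
def stereo_to_mono(frames):
    """Average each (left, right) sample pair into a single mono sample.

    Assumes float samples in the range -1.0 to 1.0. Averaging (rather than
    summing) keeps the result inside that range so it cannot clip.
    """
    return [(left + right) / 2.0 for left, right in frames]


mono = stereo_to_mono([(1.0, 0.0), (0.5, 0.5), (-0.2, 0.2)])
print(mono)
```

Averaging halves a sound that is only in one channel, which is one reason it’s worth normalizing after converting to mono rather than before.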
Pre-2.3 versions
In these versions the batch process is referred to as a chain.
To create a chain
Go to File
Select Edit chains…
Click the Add button
Set a name, e.g. Normalize to -0.1dB
Insert Normalize. Tweak the settings if desired; I didn’t
Insert ExportMP3. This is required, otherwise the processed files are not saved
The chain should look something like this
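For reference, peak normalization boils down to one gain calculation: find the loudest sample, then scale everything so that sample lands at the target level, here -0.1 dB below full scale. The Python sketch below assumes float samples where 1.0 is full scale (0 dBFS); normalize_peak is a hypothetical helper, not Audacity’s code. Audacity’s Normalize effect can also remove DC offset, which this sketch does not do.

```python
import math


def normalize_peak(samples, target_db=-0.1):
    """Scale samples so the loudest one sits at target_db dBFS.

    Assumes float samples where 1.0 is full scale. No DC offset removal.
    """
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # pure silence: nothing to scale
    target = 10 ** (target_db / 20)  # dB to linear amplitude, about 0.9886 for -0.1 dB
    gain = target / peak
    return [s * gain for s in samples]


quiet_clip = [0.1, -0.35, 0.2]
louder = normalize_peak(quiet_clip)
new_peak = max(abs(s) for s in louder)
print(new_peak, 20 * math.log10(new_peak))
```

Because the same gain is applied to every sample, the shape of the waveform is unchanged; only its overall level goes up.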
To use the chain
Go to File
Select Apply chains…
Select the chain, in my case ‘Normalize to -0.1dB’
Select Apply to files
Browse to the file folder and choose files
Select Open
Converted files will be placed in a folder called cleaned inside the folder of the selected files
Current version (2.3 and later)
In later versions it is called a macro.
To create a macro
Go to Tools
Select Macros…
Click the New button
Set a name, e.g. Normalize
Insert Normalize. There is also an option ‘Normalize (Macro_Normalize)’ that I have not tried
Insert Export as MP3. This is required, otherwise the processed files are not saved
To use the macro
Go to Tools
Select Macros…
Select the macro, in my case Normalize
Select Files
Browse to the file folder and choose files
Select Open
Converted files will be placed in a folder called macro-output inside the folder of the selected files
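With a big batch it’s easy to miss a file. Here’s a small Python sketch that lists any source MP3 without a matching file in the output folder. It assumes the exported files keep their original names, which I haven’t confirmed for every Audacity version, and missing_outputs is a hypothetical helper name. For pre-2.3 versions, pass "cleaned" as the folder name.

```python
from pathlib import Path


def missing_outputs(source_dir, output_name="macro-output"):
    """List source MP3 names that have no same-named file in the output folder.

    Assumes Audacity keeps the original file names when exporting.
    Matching is case-insensitive; the glob itself only finds lowercase '.mp3'.
    """
    source = Path(source_dir)
    output = source / output_name
    done = {p.name.lower() for p in output.glob("*.mp3")} if output.is_dir() else set()
    return sorted(p.name for p in source.glob("*.mp3") if p.name.lower() not in done)
```

For example, missing_outputs("alerts") checks the newer macro-output folder, while missing_outputs("alerts", "cleaned") checks the folder used by the older chains.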
Before and after
Here is a before and after shot. The before is the one at the top.
As always, let me know if you find any inaccuracies in my info.
I’ve made a couple of projects that play audio alerts using a DFPlayer and MP3 files: one is a clock that plays announcements and the other a countdown timer. When I built those, I used onlinetonegenerator.com to convert text to speech. I liked the voices, but it doesn’t have an option to save the audio as an MP3 file, so I ended up using Audacity to record the computer’s audio. It worked but was very tedious. Since then, I’ve been looking for a simpler way.
The important criteria in a text to speech service for me are:
Ability to enter text, listen to the converted audio in the browser and then download it as an MP3 file.
Sufficient audio quality.
An acceptable volume level.
Suitable lead-in and trailing dead time, so multiple files can be played in sequence and sound like one smooth sentence. For example, these four files together: “The time is”, “eleven”, “thirty two”, “am”.
A suitable voice. They don’t all have the same voices, and I prefer some more than others.
Free or good value.
Ability to change speed, pitch and emphasis is a bonus.
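The dead time criterion can be checked with numbers rather than by ear. This Python sketch measures the near-silent lead-in and tail of a clip, assuming it has been decoded into mono float samples. The threshold of 0.01 (-40 dBFS) is an assumed value you would tune, and silence_ms is a hypothetical name. Roughly, the gap heard between two clips played back to back is the first clip’s trailing silence plus the next clip’s lead-in, plus any delay in the player itself.

```python
def silence_ms(samples, sample_rate, threshold=0.01):
    """Return (lead_in_ms, trailing_ms) of near-silence around a clip.

    A sample counts as silent when its absolute value is below `threshold`
    (0.01 is -40 dBFS, an assumed starting point).
    """
    loud = [i for i, s in enumerate(samples) if abs(s) >= threshold]
    if not loud:
        # The whole clip is silent, so both figures equal its full length.
        total = len(samples) * 1000 / sample_rate
        return total, total
    lead = loud[0] * 1000 / sample_rate
    trail = (len(samples) - 1 - loud[-1]) * 1000 / sample_rate
    return lead, trail
```

For example, a clip at 1000 samples per second with 100 silent samples, 10 loud ones and then 50 silent ones gives (100.0, 50.0), i.e. 100 ms of lead-in and 50 ms of tail.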
I’m unable to listen in the browser. No play button is displayed until a file is created, and the player that then appears seems to use Flash, which my browser blocks. The file can still be downloaded and used.
This is the one that I am intending to use in my next project. Lots of features, MP3 file output and a fair amount of free usage. It is:
Free for 3,000 characters (~375 words) per day.
Lots of different voices.
Supports speed, pitch and other effects using tags.
Multiple voices can be used in the one piece of text by using tags.
MP3 file download.
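The ~375 word figure works out to about 8 characters per word (3,000 / 8 = 375), spaces included. When a project needs lots of alert phrases, it helps to plan them into daily batches that fit the free allowance. This is a minimal greedy Python sketch. It assumes every character, including spaces and any tags, counts toward the limit; I don’t know ttsmp3.com’s exact counting rules. The name daily_batches is hypothetical.

```python
DAILY_LIMIT = 3000  # free characters per day on ttsmp3.com


def daily_batches(phrases, limit=DAILY_LIMIT):
    """Greedily group phrases into batches whose total length fits the limit.

    Assumes every character (spaces and tags included) counts toward the limit.
    """
    batches, current, used = [], [], 0
    for phrase in phrases:
        size = len(phrase)
        if size > limit:
            raise ValueError(f"phrase too long for one day: {phrase[:30]}...")
        if used + size > limit:
            # This phrase would overflow today's allowance, so start a new day.
            batches.append(current)
            current, used = [], 0
        current.append(phrase)
        used += size
    if current:
        batches.append(current)
    return batches
```

Greedy packing keeps the phrases in their original order, which makes it easy to match the downloaded files back to the list.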
TTSMP3 uses Amazon Polly and comes with quite a few voices and features. Additional effects can be applied with tags in your text. More info about tags is available on this Amazon page.
Here is an example of the voices.
That audio file was created by pasting the text below into the converter. Beware: if you do this, it will use up most of your daily 3,000 character limit.
[speaker:Zeina] Hi, I'm Arabic Zeina
[speaker:Russell] Hi, I'm Australian English Russell
[speaker:Nicole] Hi, I'm Australian English Nicole
[speaker:Camila] Hi, I'm Brazilian Portuguese Camila
[speaker:Ricardo] Hi, I'm Brazilian Portuguese Ricardo
[speaker:Vitória] Hi, I'm Brazilian Portuguese Vitória
[speaker:Emma] Hi, I'm British English Emma
[speaker:Amy] Hi, I'm British English Amy
[speaker:Brian] Hi, I'm British English Brian
[speaker:Chantal] Hi, I'm Canadian French Chantal
[speaker:Enrique] Hi, I'm Castilian Spanish Enrique
[speaker:Lucia] Hi, I'm Castilian Spanish Lucia
[speaker:Conchita] Hi, I'm Castilian Spanish Conchita
[speaker:Zhiyu] Hi, I'm Chinese Mandarin Zhiyu
[speaker:Mads] Hi, I'm Danish Mads
[speaker:Naja] Hi, I'm Danish Naja
[speaker:Ruben] Hi, I'm Dutch Ruben
[speaker:Lotte] Hi, I'm Dutch Lotte
[speaker:Céline] Hi, I'm French Céline
[speaker:Léa] Hi, I'm French Léa
[speaker:Mathieu] Hi, I'm French Mathieu
[speaker:Vicki] Hi, I'm German Vicki
[speaker:Marlene] Hi, I'm German Marlene
[speaker:Hans] Hi, I'm German Hans
[speaker:Karl] Hi, I'm Icelandic Karl
[speaker:Dóra] Hi, I'm Icelandic Dóra
[speaker:Aditi] Hi, I'm Indian English Aditi
[speaker:Raveena] Hi, I'm Indian English Raveena
[speaker:Carla] Hi, I'm Italian Carla
[speaker:Giorgio] Hi, I'm Italian Giorgio
[speaker:Bianca] Hi, I'm Italian Bianca
[speaker:Takumi] Hi, I'm Japanese Takumi
[speaker:Mizuki] Hi, I'm Japanese Mizuki
[speaker:Seoyeon] Hi, I'm Korean Seoyeon
[speaker:Mia] Hi, I'm Mexican Spanish Mia
[speaker:Liv] Hi, I'm Norwegian Liv
[speaker:Ewa] Hi, I'm Polish Ewa
[speaker:Jan] Hi, I'm Polish Jan
[speaker:Maja] Hi, I'm Polish Maja
[speaker:Jacek] Hi, I'm Polish Jacek
[speaker:Inês] Hi, I'm Portuguese Inês
[speaker:Cristiano] Hi, I'm Portuguese Cristiano
[speaker:Carmen] Hi, I'm Romanian Carmen
[speaker:Maxim] Hi, I'm Russian Maxim
[speaker:Tatyana] Hi, I'm Russian Tatyana
[speaker:Astrid] Hi, I'm Swedish Astrid
[speaker:Filiz] Hi, I'm Turkish Filiz
[speaker:Joey] Hi, I'm US English Joey
[speaker:Kimberly] Hi, I'm US English Kimberly
[speaker:Salli] Hi, I'm US English Salli
[speaker:Ivy] Hi, I'm US English Ivy
[speaker:Matthew] Hi, I'm US English Matthew
[speaker:Kendra] Hi, I'm US English Kendra
[speaker:Joanna] Hi, I'm US English Joanna
[speaker:Justin] Hi, I'm US English Justin
[speaker:Miguel] Hi, I'm US Spanish Miguel
[speaker:Lupe] Hi, I'm US Spanish Lupe
[speaker:Penélope] Hi, I'm US Spanish Penélope
[speaker:Gwyneth] Hi, I'm Welsh Gwyneth
[speaker:Geraint] Hi, I'm Welsh English Geraint
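Rather than typing that list by hand, a shorter demo text can be generated and its length checked before pasting. This Python sketch builds lines in the same [speaker:Name] format shown in the list. The four-voice selection is just an example, tagged_demo is a hypothetical name, and whether ttsmp3.com counts the tag and newline characters toward the daily limit is an assumption.

```python
VOICES = [
    ("Amy", "British English"),
    ("Brian", "British English"),
    ("Nicole", "Australian English"),
    ("Joanna", "US English"),
]


def tagged_demo(voices):
    """Build one "[speaker:Name] Hi, I'm <description> Name" line per voice."""
    return "\n".join(f"[speaker:{name}] Hi, I'm {desc} {name}" for name, desc in voices)


text = tagged_demo(VOICES)
print(text)
print(len(text), "characters")  # compare against the 3,000 per day allowance
```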
Comparisons
Compared with the original audio files, which I created by using Audacity to record the PC audio from onlinetonegenerator.com, the ttsmp3.com files had a lower volume. I may have had the record level a bit high when I used Audacity, so I’m not sure the ttsmp3 level is actually too low.
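One way to compare levels without relying on the record settings is to measure each file’s peak in dBFS. This Python sketch assumes decoded float samples where 1.0 is full scale; peak_dbfs is a hypothetical name. Peak level is only a rough proxy for perceived loudness, but it’s the same measure a peak normalize step works from.

```python
import math


def peak_dbfs(samples):
    """Peak level in dBFS for float samples where 1.0 is full scale."""
    peak = max((abs(s) for s in samples), default=0.0)
    return -math.inf if peak == 0.0 else 20 * math.log10(peak)


print(round(peak_dbfs([0.5, -0.25]), 2))  # -6.02
```

A file peaking at half of full scale sits about 6 dB below 0 dBFS, so normalizing it to -0.1 dB would raise it by roughly 5.9 dB.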
The bitrate is also different, with the Audacity ones higher. That’s probably because I unnecessarily chose a higher bitrate in Audacity. TTSMP3 was 48 kbps.
That also affects the file size: the TTSMP3 files are much smaller.
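The size difference follows directly from the bitrate. For a constant bitrate MP3, the audio data is the bitrate divided by 8 bits per byte, times the duration. This Python sketch ignores ID3 tags and frame padding, so real files will differ slightly; mp3_size_bytes is a hypothetical name, and 128 kbps is only an arbitrary comparison value, not necessarily the setting I used in Audacity.

```python
def mp3_size_bytes(bitrate_kbps, seconds):
    """Approximate audio data size of a constant bitrate MP3, in bytes.

    Ignores ID3 tags and frame padding.
    """
    return bitrate_kbps * 1000 / 8 * seconds


for kbps in (48, 128):
    print(kbps, "kbps ->", mp3_size_bytes(kbps, 2), "bytes for a 2 second clip")
```

At 48 kbps that’s about 6 kB per second of speech, which is tiny compared with the space on a typical DFPlayer SD card.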
Here are a couple of examples for comparison. For each, I created four separate files and then joined them together to see how smooth the transitions were. The four files were “The time is”, “11”, “32”, “AM”. I had to be a bit creative with the AM for French Céline, as it was pronounced as “am”.
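The four-file test mirrors how a speaking clock builds an announcement from separate clips. Here’s a minimal Python sketch of that lookup, assuming one clip per phrase and per number. The clip names are hypothetical; on the Arduino they would map to the DFPlayer’s numbered files. Minutes of zero are skipped here, which is just one possible choice.

```python
def time_to_clips(hour, minute):
    """Return clip names to play, in order, for a 24-hour time.

    Clip names are hypothetical placeholders for the real audio files.
    """
    suffix = "am" if hour < 12 else "pm"
    hour12 = hour % 12 or 12  # 0 and 12 both become 12
    clips = ["the_time_is", str(hour12)]
    if minute != 0:
        clips.append(str(minute))
    clips.append(suffix)
    return clips


print(time_to_clips(11, 32))  # ['the_time_is', '11', '32', 'am']
```

Because each clip is played whole, any lead-in and trailing silence inside the files shows up directly as the pauses between words.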
onlinetonegenerator.com: the voice is Google français. I like this voice. It has added a lot of character to my speaking clock.
ttsmp3.com: the voice is French Céline. It was much easier to create and the timing between files is OK, but in my opinion the voice doesn’t have the same character as the one above.
ttsmp3.com: this is British Amy. This was just for comparison, to hear how the same text sounds with an English voice.