How to squeeze a reasonable TTS out of OpenSUSE Leap

Hello community,

I need to get one or more reasonable TTS voices working through speech-dispatcher (SD). SD works with a lot of synthesizers.

I tried espeak-ng, which was the default in OpenSUSE, but its voices are of poor quality, and when searching online I didn’t find any information on improving them.

I was then pleased with the online samples of Festival, so I set about testing it. However, after installation I found that the default voices are poor. Searching further, I learned that there are a few types of voices, such as:

  • Festvox diphone (default, poor quality)
  • MBROLA
  • CMU Arctic
  • Nitech HTS

After some more research, I opted for the Japanese Nitech HTS voices (which seem to be the best quality and the least demanding), downloaded them and installed them. I then modified festival.scm to use ALSA and the US BDL HTS voice:

(Parameter.set 'Audio_Command "aplay -q -c 1 -t raw -f s16 -r $SR $FILE")
(Parameter.set 'Audio_Method 'Audio_Command)
(set! voice_default 'voice_nitech_us_bdl_arctic_hts)

But it didn’t work: Festival crashed on startup every time I tried to use the HTS voice.
To cut a long story short, after a lot of research I found that Festival 2.5’s compatibility with HTS voices is broken. Apparently this happens over and over again, and some people say Festival does it on purpose to push its commercial voices.

Next, I found patched voices that work with Festival 2.4, and patches for Festival itself. I spent more time on those, so far without results, because I’m not a Linux developer (thankfully at least .NET) and I’m not accustomed to patching and building Linux programs at all. I do remember the old SuSE days about 20 years ago, when we had to build a lot of programs with ./configure, make, make install. But since the patches are for another distro, I’m not sure how to apply them.

So, I would like to ask: Is there any proven way to get quality TTS voices on OpenSUSE Leap? What should I do?

Oak

Haven’t looked at TTS in a while. A quick look around suggests the same things I played around with are still mostly used if you don’t want to use proprietary products.

Praat is something new and looks interesting,
http://www.praatvocaltoolkit.com/text-to-speech.html

I see you have already trialed espeak so maybe you’ll have no problem setting praat up. Although not required for TTS (AFAIK) it uses machine learning and neural networks, likely for voice recognition.

Praat is in the openSUSE repos

TSU

Thank you for your suggestion. I’m not sure if you meant it that way; Praat is not a synthesizer, it actually uses espeak, which I ruled out due to its terrible voices. Well, I’m not certain about it, but I did not find any better voices or any improvement of sampling for espeak. The default Festival voices are better just by a hair, I’d say, but Festival can be improved; it’s just not that simple for me with the downgrading, patching, compiling,…

So you’re over the HW failure and your new box is up and running? :slight_smile:

Oak

According to descriptions, Praat is actually a voice synthesizer, and highly modifiable.
eSpeak is used only for the text recognition, and then passes its result to Praat for the actual audio.

Am typing this post on my new machine now…
It’ll probably be close to a week before it’ll be even close to having all the tools installed that I use…
Having a blast with this Optane acceleration, it really does seem to write approx 2x as fast as a regular 7200 rpm HDD, and reads pretty fast (don’t know if it’s really 10x an SSD, I can’t judge relative speeds that different).
Building a new VBox VM was pretty fast…you can see each package installing 2x faster than usual…
I’ve been avoiding secure boot in the past for what I deemed an unnecessary complication but it’s time I see this thing up close and deal with its problems…

TSU

OK, maybe I got it wrong, based on this:

Praat Vocal Toolkit (http://www.praatvocaltoolkit.com/text-to-speech.html)
Text to Speech
"This command converts any text into speech by using the eSpeak speech synthesizer included in Praat."
If it weren’t a synthesizer but just a kind of dispatcher, it would not make sense to use it with speech-dispatcher.

But the main problem is that, looking at speech-dispatcher’s list of supported modules (which is very rich), there is no support for Praat. And that is a requirement, as stated at the beginning, because the ATC software I use relies on speech-dispatcher…
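
For anyone who wants to check this on their own system, here is a minimal sketch, assuming the `spd-say` client that ships with speech-dispatcher is installed and that its `-O` option lists the available output modules one per line. The helper names (`parse_modules`, `module_available`, `list_output_modules`) are made up for illustration, and the exact layout of the listing is an assumption, so the parser simply treats every non-empty line as a candidate name.

```python
import subprocess


def parse_modules(listing):
    """Return the stripped, non-empty lines of a module listing as a set.

    Assumption: `spd-say -O` prints one module name per line; a header
    line, if present, simply ends up as one more (harmless) entry.
    """
    return {line.strip() for line in listing.splitlines() if line.strip()}


def module_available(name, listing):
    """True if `name` appears as its own line in the listing text."""
    return name in parse_modules(listing)


def list_output_modules():
    """Run `spd-say -O` and return its raw text output (needs speech-dispatcher)."""
    result = subprocess.run(
        ["spd-say", "-O"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

On a machine with speech-dispatcher installed, `module_available("festival", list_output_modules())` should tell you whether the Festival module is offered; a module that is listed can then be tried directly with `spd-say -o festival "some text"`.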

I downloaded the patched 2.4 sources (all the parts) and successfully built the software. However, when I run festival, it reports version 2.5, so apparently I now have both on my system: the patched 2.4 and the unpatched “broken” 2.5. I am still analyzing it and hopefully I’ll be able to replace the broken Festival with the patched one. I’ll post the solution once I have it.
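
One way to see which copy actually gets started is to list every `festival` executable on the search path in lookup order. This is only a sketch under assumptions: `find_executables` is a made-up helper name, and it only looks at `PATH`, so a build that was never installed (or was installed to a directory outside `PATH`) will not show up.

```python
import os


def find_executables(name, path=None):
    """Return every executable called `name` on the search path, in lookup order.

    The first entry is the one the shell would start; later entries are shadowed.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    found = []
    seen = set()
    for directory in path.split(os.pathsep):
        if not directory:
            continue
        candidate = os.path.join(directory, name)
        # Resolve symlinks so the same file is not reported twice.
        real = os.path.realpath(candidate)
        if real in seen:
            continue
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            seen.add(real)
            found.append(candidate)
    return found


if __name__ == "__main__":
    for index, location in enumerate(find_executables("festival")):
        marker = "used by default" if index == 0 else "shadowed"
        print(f"{location} ({marker})")
```

If two locations are printed, running each one by its full path shows which is the patched 2.4 build and which is the distribution’s 2.5.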

Great to hear about the progress. I’m about 4 weeks in with my new box and still configuring the software. I too use VirtualBox, now because of Visual Studio. I have 3 SSDs in the new machine, 8 cores, fast 32GB RAM, a good MB - and it shows :-). I’m really happy with the performance. I wanted two separate SSDs (and got a 3rd for free with my whole purchase) in case I need dual boot, but I managed to get rid of my dependency on the demanding SW I used in Windows on my laptop, and VS is happy to run in a virtual environment. One has to be careful about SSDs; there are a lot of old ones available and the differences in speed and lifespan are huge. I made sure to choose hi-spec ones.
I have a 2nd Linux box with an old 4 core CPU and a complicated hybrid setup (SSD for the system and 7200 rpm HDD for logs, home, etc.) with the same Leap 15.1, and the difference is huge. In my case, I think the difference in installing software is way more than double, and in total startup time at least 10x, even though the old box is hybrid. The old box is faster at the beginning, but since it has logs and home on the HDD, it gets slower over time, especially while KDE is loading.
I didn’t opt for secure boot, because it’s a security feature targeted at physical security (at least I have a vague idea it’s like that) and my PCs sit at home. It seems it won’t help against a BIOS/UEFI attack from within the installed, certified system.

OK, so after more than 6 weeks on the issue, I cracked the OpenSUSE TTS problem. I summarized some of that effort in one picture:

https://i.postimg.cc/HcY4MSBx/Open-SUSE-TTS.png (https://postimg.cc/HcY4MSBx)

The last problem, the unbound variable error, occurred with the marked voices, but after restarting the computer and Festival those voices worked and others (KAL, CLB) did not. I don’t know the reason yet; it almost seems as if Festival can only take a certain amount of voice switching (and I switched a lot during testing).

My understanding (and I am not a specialist in any way, just a plain user) of available TTS quality is this:

https://i.postimg.cc/N9d8k1zx/TTSquality.png (https://postimg.cc/N9d8k1zx)

This means that I now have a reasonable TTS on my OpenSUSE. I can still hear some (guesstimate) 10% quality difference between what my Festival produces and what an online testing app produces.

Interesting study,
I’d encourage you to create a project others can join, and to describe a recommended method (or multiple methods) of study and analysis that others can follow:

  1. Others might be willing to confirm or improve on anything you might have provided as a baseline
  2. The other projects you mention in the course of your research might themselves identify ways for them to improve their own work
  3. Could lead to a simpler solution that implements your findings.

Projects can be created on
github or gitlab - Free places (for public projects) which incorporate social media practices to make your project easy to find and people to both copy and contribute.
An openSUSE Wiki page (see my signature to all my posts) - Anyone can create a Wiki for any purpose. It’ll likely be seen mostly by openSUSE-only people (unlike github or gitlab) but you’ll likely have fewer issues with controlling your content if it’s clearly associated with your User account

TSU