
LSTM deep neural networks postfiltering for enhancing synthetic voices

dc.creator: Coto Jiménez, Marvin
dc.creator: Goddard Close, John
dc.date.accessioned: 2022-03-24T17:07:44Z
dc.date.available: 2022-03-24T17:07:44Z
dc.date.issued: 2018
dc.description.abstract: Recent developments in speech synthesis have yielded systems capable of producing speech that closely resembles natural speech, and researchers now strive to create models that more accurately mimic human voices. One such development is the incorporation of multiple linguistic styles in various languages and accents. Speech synthesis based on Hidden Markov Models (HMM) is of great interest to researchers, due to its ability to produce sophisticated features with a small footprint. Despite some progress, its quality has not yet reached the level of the current predominant unit-selection approaches, which select and concatenate recordings of real speech, and work has been conducted to improve HMM-based systems. In this paper, we present an application of long short-term memory (LSTM) deep neural networks as a postfiltering step in HMM-based speech synthesis. Our motivation stems from a similar desire to obtain characteristics which are closer to those of natural speech. The paper analyzes four types of postfilters obtained using five voices, which range from a single postfilter to enhance all the parameters, to a multi-stream proposal which separately enhances groups of parameters. The different proposals are evaluated using three objective measures and are statistically compared to determine whether the differences between them are significant. The results described in the paper indicate that HMM-based voices can be enhanced using this approach, especially with the multi-stream postfilters, on the considered objective measures. [es_ES]
dc.description.procedence: UCR::Vicerrectoría de Docencia::Ingeniería::Facultad de Ingeniería::Escuela de Ingeniería Eléctrica [es_ES]
dc.description.sponsorship: Universidad de Costa Rica/[]/UCR/Costa Rica [es_ES]
dc.description.sponsorship: Consejo Nacional de Ciencia y Tecnología/[CB-2012-01, No.182432]/CONACyT/México [es_ES]
dc.identifier.citation: https://www.worldscientific.com/doi/abs/10.1142/S021800141860008X [es_ES]
dc.identifier.doi: 10.1142/S021800141860008X
dc.identifier.issn: 1793-6381
dc.identifier.uri: https://hdl.handle.net/10669/86283
dc.language.iso: eng [es_ES]
dc.source: International Journal of Pattern Recognition and Artificial Intelligence, vol.32(1), pp.1-24. [es_ES]
dc.subject: Long short-term memory (LSTM) [es_ES]
dc.subject: Speech synthesis [es_ES]
dc.subject: Postfiltering [es_ES]
dc.subject: Deep learning [es_ES]
dc.title: LSTM deep neural networks postfiltering for enhancing synthetic voices [es_ES]
dc.type: original article [es_ES]

Files

Original bundle

Name: IJPRAI.pdf
Size: 4.72 MB
Format: Adobe Portable Document Format
Description: Main article

License bundle

Name: license.txt
Size: 3.5 KB
Format: Item-specific license agreed upon to submission
Description: