Speech synthesis based on Hidden Markov Models and deep learning
Authors
Coto Jiménez, Marvin
Goddard Close, John
Abstract
Speech synthesis based on Hidden Markov Models (HMM) and other statistical parametric techniques has been a hot topic for some time. Using these techniques, speech synthesizers are able to produce intelligible and flexible voices. Despite this progress, the quality of voices produced with statistical parametric synthesis has not yet reached the level of the currently predominant unit-selection approaches, which select and concatenate recordings of real speech. Researchers now strive to create models that more accurately mimic human voices. In this paper, we present our proposal to incorporate recent deep learning algorithms, especially Long Short-Term Memory (LSTM) networks, to improve the quality of HMM-based speech synthesis. Thus far, the results indicate that this approach can improve the spectral characteristics of HMM-based voices, but additional research should be conducted to improve other parameters of the voice signal, such as energy and fundamental frequency, to obtain more natural-sounding voices.
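
As a rough illustration of the kind of spectral enhancement described in the abstract, the sketch below trains an LSTM network to map HMM-generated mel-cepstral frames toward natural-speech targets. This is a minimal sketch under assumed settings (frame dimension, layer sizes, and the use of the Keras API are illustrative choices), not the authors' implementation.

```python
# Minimal sketch: LSTM regression from HMM-synthesized spectral frames to
# natural-speech frames. Dimensions and data are placeholders.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, TimeDistributed

MCEP_DIM = 40   # assumed mel-cepstral order per frame
SEQ_LEN = 200   # assumed number of frames per training sequence

model = Sequential([
    LSTM(128, return_sequences=True, input_shape=(SEQ_LEN, MCEP_DIM)),
    LSTM(128, return_sequences=True),
    TimeDistributed(Dense(MCEP_DIM)),  # one enhanced spectral frame per time step
])
model.compile(optimizer="adam", loss="mse")

# Random arrays standing in for time-aligned pairs of HMM-synthesized and
# natural mel-cepstral sequences.
hmm_frames = np.random.randn(32, SEQ_LEN, MCEP_DIM).astype("float32")
natural_frames = np.random.randn(32, SEQ_LEN, MCEP_DIM).astype("float32")
model.fit(hmm_frames, natural_frames, epochs=1, batch_size=8)
```

In practice, the trained network would be applied as a post-processing step on the spectral parameters generated by the HMM synthesizer before waveform reconstruction; energy and fundamental frequency would require separate treatment, as noted in the abstract.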
Keywords
Long short-term memory (LSTM), Hidden Markov Models (HMM), Speech synthesis, Statistical parametric speech synthesis, Deep learning
Citation
https://www.rcs.cic.ipn.mx/2016_112/