Summary
Each national language is described by specific grammatical rules, but rule-based knowledge representations alone cannot capture the natural flow of speech. This paper seeks to optimise the naturalness of speech, i.e. to find the optimal choice of phonetic and phonological parameters for prosody modelling. We try to identify the relevant features (speech parameters) that have the main influence on the fundamental frequency and duration of speech units. If the prosody of the synthesizer is controlled by an artificial neural network (ANN), the ANN topology must be optimised. The topology also depends on the number of input neurons, which represent the most important speech parameters. Pruning the ANN using several approaches (the GUHA method, sensitivities of the synaptic weights, etc.) is a suitable tool for reducing the ANN structure.
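A minimal sketch of sensitivity-based input pruning in Python, assuming a one-hidden-layer network with tanh hidden units and linear outputs for F0 and duration. Here the sensitivity of input i is taken as the mean over the training samples of the sum over outputs k of |∂y_k/∂x_i|. This is one common definition, not necessarily the one used by the authors, and the GUHA method is not shown. The weights, the feature names (`stress`, `phrase_position`, `unused_feature`) and the 5 % threshold are hypothetical, chosen only for illustration.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def hidden_layer(x: Sequence[float], W1: Matrix, b1: Sequence[float]) -> Vector:
    """tanh hidden activations; W1[j][i] is the weight from input i to hidden unit j."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(W1, b1)]


def input_sensitivities(X: Sequence[Sequence[float]], W1: Matrix,
                        b1: Sequence[float], W2: Matrix) -> Vector:
    """Mean over samples of sum_k |dy_k/dx_i| for a tanh-hidden, linear-output MLP.

    W2[k][j] is the weight from hidden unit j to output k (assumed: k=0 F0, k=1 duration).
    """
    n_in = len(W1[0])
    sens = [0.0] * n_in
    for x in X:
        h = hidden_layer(x, W1, b1)
        dh = [1.0 - hj * hj for hj in h]  # tanh'(a) = 1 - tanh(a)^2
        for i in range(n_in):
            for out_row in W2:
                # chain rule: dy_k/dx_i = sum_j W2[k][j] * tanh'(a_j) * W1[j][i]
                grad = sum(out_row[j] * dh[j] * W1[j][i] for j in range(len(h)))
                sens[i] += abs(grad)
    return [s / len(X) for s in sens]


def prune_inputs(sens: Sequence[float], threshold: float = 0.05) -> List[int]:
    """Indices of inputs whose share of the total sensitivity is >= threshold."""
    total = sum(sens)
    if total == 0.0:
        return []
    return [i for i, s in enumerate(sens) if s / total >= threshold]


def drop_inputs(W1: Matrix, keep: Sequence[int]) -> Matrix:
    """Remove the weight columns of pruned inputs, shrinking the input layer."""
    return [[row[i] for i in keep] for row in W1]


# Hypothetical toy network: 3 inputs -> 2 hidden (tanh) -> 2 outputs (F0, duration).
FEATURES = ["stress", "phrase_position", "unused_feature"]  # hypothetical names
W1 = [[0.8, -0.5, 0.0],
      [0.3, 0.9, 0.0]]  # the third input has no outgoing weight
b1 = [0.0, 0.1]
W2 = [[1.0, -0.7],
      [0.4, 0.6]]
X = [[0.0, 1.0, 0.5], [1.0, 0.0, -0.3], [0.5, 0.5, 0.9]]

sens = input_sensitivities(X, W1, b1, W2)
keep = prune_inputs(sens)
print([FEATURES[i] for i in keep])  # ['stress', 'phrase_position']
print(drop_inputs(W1, keep))        # [[0.8, -0.5], [0.3, 0.9]]
```

Dropping a column of `W1` removes one input neuron and so reduces the topology; in practice the reduced network would typically be retrained and re-evaluated, since the remaining weights no longer see the removed parameter.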
Extract
Prosody Optimisation of a Czech Language Synthesizer
(ProQuest: ... denotes formulae omitted.)
1. Introduction
Synthetic speech, its production, analysis, intelligibility and naturalness are in great demand in the information society. The character of the speech signal makes its processing difficult and imperfect: speech is a very complex system shaped by technical, human, physiological, phonological and phonetic properties. Some speech attributes, the prosodic parameters, depend on phonological and phonetic properties. Prosody is very important for any kind of synthetic speech; improper prosody is, in fact, one of the differences between natural and synthetic speech. Our effort is to minimize these differences.
Many research teams around the world are engaged in modelling the prosody of synthetic speech. The problem must be solved with regard to the specific attributes of each language: e.g. [11] for English, [12] and [8] for German, [5] for French, [9] for Japanese, [6] for Korean and [4] for Mandarin. The majority of prosody control systems are based on the implementation of grammatical rules, realised e.g. by decision trees, but some researchers (Sejnowski, Traber, Riedi), including the authors of this paper, use neural networks for prosody modelling. Different languages require different input parameters with a significant impact on speech prosody for neural network training. Therefore, it is very difficult, nearly impossible, to compare the results of prosody controllers across languages.
The most comprehensive evaluation is the listening test, but it is very subjective and cannot be expressed by an objective metric. The prosody depends o...
