MINISTRY OF EDUCATION AND TRAINING
HANOI UNIVERSITY OF TECHNOLOGY
-------------------------------
Thesis for the degree of
MASTER OF SCIENCE
Modeling the prosody
of Vietnamese language for speech
synthesis
Speciality: “Information processing and Communication”
Code: 23.04.3898
MẠC ĐĂNG KHOA
Supervisor:
Prof. PHẠM THỊ NGỌC YẾN
Hanoi, 2007
Faculty of Information Technology
International research center of
Multimedia Information, Communication and Application
Master thesis
Mạc Đăng Khoa
Acknowledgment
Many people gave me generous help and inspiration during my time as a master's
student.
First, I would like to express my deep respect and gratitude to my
supervisors, Dr. Eric Castelli and Prof. Phạm Thị Ngọc Yến. Thank you very much
for orienting and guiding my research in the speech processing domain. Thank you
for all your useful advice, your honest criticism and your patience during my
master's research.
Special thanks also go to Mrs. Geneviève Caelen-Haumont, PhD students Trần
Đỗ Đạt and Vũ Minh Quang, and all members of MICA’s speech group. I could not
have completed this thesis without your support. Thank you all for your
suggestions and your sincere remarks on my entire research.
I would like to thank Ms. Đoàn Thị Ngọc Hiền, who guided me in recording the
corpus. I would also like to thank the many MICA members who spent much of their
time recording and testing for my research.
I am grateful to Prof. Nguyễn Trọng Giảng and MICA’s directorate for providing me
with the best possible working conditions during my time at the International
Research Center MICA.
Finally, I owe a great deal to my parents and my sister for their continued support. I
also give very special thanks to my girlfriend for her constant encouragement,
which gave me strength and motivation in my work and in my life.
Abstract
A Text-To-Speech (TTS) system is a computer system that produces speech from
text. In a TTS system, the naturalness of the produced speech depends greatly on
the variation of pitch, duration and energy during speaking. We call this the
“prosody controlling ability”. A TTS system with good prosody controlling ability
can simulate the human speech prosody that fits the context of speaking.
In tonal languages such as Vietnamese, the prosody of an utterance results from
the combination of two components: "micro-prosody", corresponding to the tone of
each syllable in a sentence, and "macro-prosody", corresponding to the whole
sentence.
The main goal of this thesis is to model the characteristics of Vietnamese prosody
for speech synthesis. It focuses on the influence of macro-prosody on
micro-prosody in three types of sentence: assertive, interrogative and imperative.
The first task is to set up a “prosody corpus” and extract all possible prosody
parameters. Based on the extracted data, we defined seventy-two simple prosody
patterns for Vietnamese syllables in three types of sentence. These patterns were
then applied to synthesize some simple sentences. Finally, perception experiments
were conducted to evaluate the synthesized sentences. The results showed that the
proposed patterns can be applied successfully to generate the prosody of simple
sentences.
This is our preliminary work on Vietnamese prosody, concerning only the sentence
type and the position of the syllable in a sentence. In the future, we expect to
continue this research with more factors of Vietnamese prosody, improve our
patterns and apply them in a Vietnamese TTS system.
List of Figures
Figure 1-1: Category of methods for predicting syllable duration [6]
Figure 2-1: Example of the contours of six tones, as described in [21]
Figure 2-2: The shape of Tone 1 with female and male voice [18]
Figure 2-3: The shape of Tone 2 with female and male voice [18]
Figure 2-4: The shape of Tone 3 with female and male voice [18]
Figure 2-5: The shape of Tone 4 with female and male voice [18]
Figure 2-6: The shape of Tone 5 with female and male voice [18]
Figure 2-7: The shape of Tone 5b with female and male voice [18]
Figure 2-8: The shape of Tone 6 with female and male voice [18]
Figure 2-9: The shape of Tone 6b with female and male voice [18]
Figure 2-10: Sentence classification by structure [20]
Figure 2-11: The sentences “Lan thích ăn cơm không” in
Figure 2-12: The sentences “Bảo cố gắng tập đi” in
Figure 2-13: The sentences “Tân bỏ đi chứ” in
Figure 2-14: The differences of F0 contour between Assertive and Interrogative sentence [16]
Figure 3-1: A general function diagram of TTS system [13]
Figure 3-2: Fujisaki model
Figure 3-3: Fujisaki model for tonal language [19]
Figure 3-4: Function diagram of proposal TTS system
Figure 3-5: Prosody generation module
Figure 4-1: Key-syllable segmentation
Figure 4-2: Extracting F0 contour using PRAAT
Figure 4-3: An example of prosody pattern
Figure 5-1: An example of synthesized non-sense phrase
Figure 5-2: Perception test 1
Figure 5-3: An example of synthesized multi-type sentences
Figure 5-4: Interface for Perception test 2
Figure 5-5: Correct recognition rate with 8 tones of last syllable
Figure 5-6: Correct recognition rate (%) with other types of sentences
Figure 5-7: Result comparison of three experiments
List of Tables
Table 1.1: Prosody functions
Table 1.2: Links between levels of representation of prosodic phenomena [13]
Table 1.3: Intonation model classification
Table 2.1: Vietnamese vowels
Table 2.2: Vietnamese consonants
Table 2.3: Arrangement of Vietnamese consonants
Table 2.4: The phonological hierarchy of Vietnamese syllables with total numbers of each phonetic unit [14]
Table 2.5: The six Vietnamese tones
Table 3.1: Comparison between direct pattern and model pattern
Table 4.1: Prosody corpus structure
Table 4.2: Prosody corpus text information
Table 4.3: Recording information of Prosody corpus
Table 5.1: Confusion matrix (in %) for 8 tones with male voice
Table 5.2: Confusion matrix (in %) for 8 tones with female voice
Table 5.3: Confusion matrix (%) of sentence types with male voice
Table 5.4: Confusion matrix (%) of sentence types with female voice
Table 5.5: Test data for Experiment 2
Table 5.6: Confusion matrix (in %) of sentence types (with male voice)
Table 5.7: Confusion matrix (in %) of sentence types (with female voice)
Table 5.8: Confusion matrix (in %) of sentence types (average of Male and Female)
Table 5.9: Correct recognition rate (%) with other types of sentences
Table 5.10: Result of three experiments
Table of contents
Acknowledgment
Abstract
List of Figures
List of Tables
Table of contents
0 INTRODUCTION
1 PROSODY AND PROSODIC MODEL
1.1. Overview of prosody
1.1.1. The concept of prosody
1.1.2. Major components of prosody
1.1.3. The functions of prosody
1.1.4. Levels of representation of prosodic phenomena
1.2. Prosody modeling
1.2.1. Intonation models
1.2.2. Duration modeling
1.2.3. This thesis work approach
2 VIETNAMESE LANGUAGE AND PROSODY
2.1. Vietnamese language
2.1.1. Vietnamese characteristics
2.1.2. Vietnamese phoneme system
2.1.3. Syllable structure
2.2. Vietnamese prosody
2.2.1. Micro-prosody and tones system in Vietnamese
2.2.2. Macro-prosody and sentence types in Vietnamese
2.2.3. Some special phenomena in Vietnamese prosody
3 TTS SYSTEM AND PROSODY GENERATION
3.1. An overview of TTS system
3.2. Prosody generation
3.2.1. Overview of prosody generation
3.2.2. From text to prosody
3.3. Other researches and our proposal
4 PROSODY PATTERNS EXTRACTION
4.1. Prosody corpus