MINISTRY OF EDUCATION AND TRAINING

HANOI UNIVERSITY OF TECHNOLOGY

-------------------------------

Thesis for the degree of

MASTER OF SCIENCE

Modeling the prosody of Vietnamese language for speech synthesis

Speciality: “Information processing and Communication”

Code: 23.04.3898

MẠC ĐĂNG KHOA

Supervisor:

Prof. PHẠM THỊ NGỌC YẾN

Hanoi, 2007

Faculty of Information Technology

International research center of Multimedia Information, Communication and Application

- 1 -

Master thesis

Mạc Đăng Khoa

Acknowledgment

Many people gave me generous help and inspiration during my time as a master's student.

First, I would like to express my deep respect and gratitude to my supervisors, Dr. Eric Castelli and Prof. Phạm Thị Ngọc Yến. Thank you very much for orienting and guiding my research in the speech processing domain. Thank you for all your useful advice, your honest criticism and your patience during my master's research.

Special thanks also go to Mrs. Geneviève Caelen-Haumont, PhD students Trần Đỗ Đạt and Vũ Minh Quang, and all members of MICA’s speech group. I could not have completed this thesis without your support. Thank you all for your suggestions and your sincere remarks on my research as a whole.

I would like to thank Ms. Đoàn Thị Ngọc Hiền, who guided me in recording the corpus. I would also like to thank the many MICA members who spent much of their time on recording and testing for my research.

I am grateful to Prof. Nguyễn Trọng Giảng and MICA’s directorate for providing me with the best possible working conditions during my time at the International Research Center MICA.

Finally, I owe a great deal to my parents and my sister for their continued support. I also give very special thanks to my girlfriend for her constant encouragement, which gave me strength and motivation in my work and in my life.

Abstract

A Text-To-Speech (TTS) system is a computer system that can produce speech from text. In a TTS system, the naturalness of the produced speech depends greatly on the variation of pitch, duration and energy during speaking. We call this the “prosody controlling ability”. A TTS system with good prosody controlling ability can simulate human speech prosody appropriate to the speaking context.

In tonal languages such as Vietnamese, the prosody of an utterance results from the combination of two components: "micro-prosody", corresponding to the tone of each syllable in a sentence, and "macro-prosody", corresponding to the whole sentence.

The main goal of this thesis is to model the characteristics of Vietnamese prosody for speech synthesis. It focuses on the influence of macro-prosody on micro-prosody in three types of sentence: assertive, interrogative and imperative.

The first task is to set up a “prosody corpus” and extract all possible prosody parameters. Based on the extracted data, we defined seventy-two simple prosody patterns for Vietnamese syllables in three types of sentence. After that, these patterns were applied to synthesize some simple sentences. Finally, perception experiments were carried out to evaluate these synthesized sentences. The results showed that the proposed patterns can be applied successfully to generate the prosody of simple sentences.

This is our preliminary work on Vietnamese prosody, concerning only the sentence type and the position of the syllable in a sentence. In the future, we expect to continue this research with more factors of Vietnamese prosody, improve our patterns and apply them to a Vietnamese TTS system.

List of Figures

Figure 1-1: Category of methods for predicting syllable duration [6]

Figure 2-1: Example of the contours of six tones, as described in [21]

Figure 2-2: The shape of Tone 1 with female and male voice [18]

Figure 2-3: The shape of Tone 2 with female and male voice [18]

Figure 2-4: The shape of Tone 3 with female and male voice [18]

Figure 2-5: The shape of Tone 4 with female and male voice [18]

Figure 2-6: The shape of Tone 5 with female and male voice [18]

Figure 2-7: The shape of Tone 5b with female and male voice [18]

Figure 2-8: The shape of Tone 6 with female and male voice [18]

Figure 2-9: The shape of Tone 6b with female and male voice [18]

Figure 2-10: Sentence classification by structure [20]

Figure 2-11: The sentences “Lan thích ăn cơm không” in

Figure 2-12: The sentences “Bảo cố gắng tập đi” in

Figure 2-13: The sentences “Tân bỏ đi chứ” in

Figure 2-14: The differences of F0 contour between Assertive and Interrogative sentence [16]

Figure 3-1: A general function diagram of TTS system [13]

Figure 3-2: Fujisaki model

Figure 3-3: Fujisaki model for tonal language [19]

Figure 3-4: Function diagram of proposal TTS system

Figure 3-5: Prosody generation module

Figure 4-1: Key-syllable segmentation

Figure 4-2: Extracting F0 contour using PRAAT

Figure 4-3: An example of prosody pattern

Figure 5-1: An example of synthesized non-sense phrase

Figure 5-2: Perception test 1

Figure 5-3: An example of synthesized multi-type sentences

Figure 5-4: Interface for Perception test 2

Figure 5-5: Correct recognition rate with 8 tones of last syllable

Figure 5-6: Correct recognition rate (%) with other types of sentences

Figure 5-7: Result comparison of three experiments

List of Tables

Table 1.1: Prosody functions

Table 1.2: Links between levels of representation of prosodic phenomena [13]

Table 1.3: Intonation model classification

Table 2.1: Vietnamese vowels.

Table 2.2: Vietnamese consonants

Table 2.3: Arrangement of Vietnamese consonants.

Table 2.4: The phonological hierarchy of Vietnamese syllables with total numbers of each phonetic unit [14].

Table 2.5: The six Vietnamese tones

Table 3.1: Comparison between direct pattern and model pattern

Table 4.1: Prosody corpus structure

Table 4.2: Prosody corpus text information

Table 4.3: Recording information of Prosody corpus

Table 5.1: Confusion matrix (in %) for 8 tones with male voice

Table 5.2: Confusion matrix (in %) for 8 tones with female voice

Table 5.3: Confusion matrix (%) of sentence types with male voice

Table 5.4: Confusion matrix (%) of sentence types with female voice

Table 5.5: Test data for Experiment 2

Table 5.6: Confusion matrix (in %) of sentence types (with male voice)

Table 5.7: Confusion matrix (in %) of sentence types (with female voice)

Table 5.8: Confusion matrix (in %) of sentence types (average of Male and Female)

Table 5.9: Correct recognition rate (%) with other types of sentences

Table 5.10: Result of three experiments

Table of contents

Acknowledgment

Abstract

List of Figures

List of Tables

Table of contents

0 INTRODUCTION

1 PROSODY AND PROSODIC MODEL

1.1. Overview of prosody

1.1.1. The concept of prosody

1.1.2. Major components of prosody

1.1.3. The functions of prosody

1.1.4. Levels of representation of prosodic phenomena

1.2. Prosody modeling

1.2.1. Intonation models

1.2.2. Duration modeling

1.2.3. This thesis work approach

2 VIETNAMESE LANGUAGE AND PROSODY

2.1. Vietnamese language

2.1.1. Vietnamese characteristics

2.1.2. Vietnamese phoneme system

2.1.3. Syllable structure

2.2. Vietnamese prosody

2.2.1. Micro-prosody and tones system in Vietnamese

2.2.2. Macro-prosody and sentence types in Vietnamese

2.2.3. Some special phenomena in Vietnamese prosody

3 TTS SYSTEM AND PROSODY GENERATION

3.1. An overview of TTS system

3.2. Prosody generation

3.2.1. Overview of prosody generation

3.2.2. From text to prosody

3.3. Other researches and our proposal

4 PROSODY PATTERNS EXTRACTION

4.1. Prosody corpus
