Improving acoustic models for Vietnamese speech recognition using tonal features as input of neural networks

Nguyen Quoc Bao et al., Tạp chí KHOA HỌC & CÔNG NGHỆ 132(02): 71-76

IMPROVING ACOUSTIC MODELS FOR VIETNAMESE SPEECH RECOGNITION USING TONAL FEATURES AS INPUT OF NEURAL NETWORKS

Nguyen Quoc Bao*, Nguyen Thanh Trung, Nguyen Thu Phuong, Pham Thi Huong

College of Information and Communication Technology - TNU

SUMMARY

In this paper, a neural network method for improving the acoustic models of Vietnamese speech recognition is presented. A deep neural network (DNN) for acoustic modeling is able to achieve significant improvements over baseline systems. The experiments are carried out on a dataset of speech from the Voice of Vietnam (VOV) channel. The results show that adding tonal features as input features of the network yields a relative improvement in recognition performance of around 18%. The DNN using tonal features for Vietnamese recognition decreases the error rate by 49.6% compared to the MFCC baseline.

Keywords: Deep neural network, Vietnamese automatic speech recognition.

INTRODUCTION

In an automatic speech recognition (ASR) system, the acoustic model is an important module. It is used to model the acoustic space of the input features. State-of-the-art acoustic models for speech recognition use a statistical pattern recognition framework called HMM/GMM (Hidden Markov Model/Gaussian Mixture Model) [1] with short-time spectral input features. Although the HMM/GMM approach has been effective in capturing speech patterns, it has several inherent limitations. For example, speech feature vectors at different frames are assumed to be statistically independent given the state sequence. Hence, many researchers have tried to incorporate the power of artificial neural networks into acoustic modeling to improve performance over the traditional HMM/GMM approach.
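The independence assumption mentioned above can be written out explicitly. The notation here is an assumption for illustration and does not come from the paper: \mathbf{o}_t is the feature vector at frame t, s_t is the HMM state at frame t, and c_{jm}, \boldsymbol{\mu}_{jm}, \boldsymbol{\Sigma}_{jm} are the weight, mean, and covariance of the m-th Gaussian in state j.

```latex
% Frames are conditionally independent given the state sequence:
p(\mathbf{o}_1, \dots, \mathbf{o}_T \mid s_1, \dots, s_T)
  = \prod_{t=1}^{T} p(\mathbf{o}_t \mid s_t)

% Each state's emission density is a Gaussian mixture:
p(\mathbf{o}_t \mid s_t = j)
  = \sum_{m=1}^{M} c_{jm}\,
    \mathcal{N}\!\left(\mathbf{o}_t;\ \boldsymbol{\mu}_{jm},\ \boldsymbol{\Sigma}_{jm}\right),
  \qquad \sum_{m=1}^{M} c_{jm} = 1
```

Because the likelihood factorizes frame by frame, correlations between neighboring frames are not modeled directly. This is one of the limitations that neural network acoustic models try to address.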

In Vietnamese speech recognition, another acoustic modeling problem arises from similar monosyllabic words that differ only in tone, such as vang, vàng, váng, vãng, vạng, ..., which are easily confused. Therefore, tonal features that represent tone information are an essential part of a Vietnamese speech recognition system.
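One simple way to make tone information available to an acoustic model is to append pitch values to each spectral frame. The sketch below is only an illustration under stated assumptions and is not the paper's extraction method: the function name `append_tonal_features`, the choice of raw pitch plus its frame-to-frame difference, and the example values are all hypothetical.

```python
def append_tonal_features(spectral_frames, pitch_track):
    """Append a pitch value and its frame-to-frame change to every spectral frame.

    spectral_frames: list of per-frame feature lists (e.g. MFCC coefficients)
    pitch_track:     list with one pitch value per frame (hypothetical input)
    """
    if len(spectral_frames) != len(pitch_track):
        raise ValueError("spectral_frames and pitch_track must have the same length")

    augmented = []
    # The first frame has no predecessor, so its pitch change is defined as 0.
    previous_pitch = pitch_track[0] if pitch_track else 0.0
    for frame, pitch in zip(spectral_frames, pitch_track):
        delta = pitch - previous_pitch
        augmented.append(list(frame) + [pitch, delta])
        previous_pitch = pitch
    return augmented


# Illustrative values only: four frames of 13 zeros plus a made-up pitch contour.
mfcc_frames = [[0.0] * 13 for _ in range(4)]
pitch_hz = [120.0, 125.0, 131.0, 128.0]
augmented_frames = append_tonal_features(mfcc_frames, pitch_hz)
print(len(augmented_frames), len(augmented_frames[0]))  # prints: 4 15
```

The pitch change is included because tones are distinguished by pitch movement over time as well as by pitch height. A real system would also need a policy for unvoiced frames, which this sketch leaves out.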

Previous studies [2][3][4] made efforts toward Vietnamese speech recognition. However, their systems did not employ the full range of state-of-the-art techniques for acoustic modeling.

The purpose of this study is to improve the acoustic model for Vietnamese speech recognition using tonal features as input of neural networks. We also show how to extract the pitch feature using a modified algorithm that can achieve a large improvement. The rest of this paper is organized as follows. The next section gives a brief description of the acoustic model in automatic speech recognition. This is followed by Section 3, which shows how to extract the pitch features. In Section 4, we briefly describe the deep neural network architecture for acoustic modeling. In Sections 5 and 6, the experimental setup and results are presented. Finally, conclusions and future research are given in the last section.

ACOUSTIC MODEL IN SPEECH RECOGNITION

This section presents a brief introduction to the acoustic model in an ASR system.

* Tel: 0919 114252. Email: [email protected]

Figure 1. A left-to-right HMM model
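A left-to-right HMM topology can be sketched as a transition matrix in which each state either repeats or advances to the next state, and never moves backward. The number of states, the self-loop probability of 0.6, and the function name `left_to_right_transitions` are illustrative assumptions, not values from the paper.

```python
def left_to_right_transitions(num_states, self_loop=0.6):
    """Build a left-to-right transition matrix: each state repeats or advances by one."""
    if num_states < 1:
        raise ValueError("num_states must be at least 1")
    if not 0.0 < self_loop < 1.0:
        raise ValueError("self_loop must be strictly between 0 and 1")

    matrix = [[0.0] * num_states for _ in range(num_states)]
    for i in range(num_states):
        if i + 1 < num_states:
            matrix[i][i] = self_loop            # stay in the same state
            matrix[i][i + 1] = 1.0 - self_loop  # move forward to the next state
        else:
            matrix[i][i] = 1.0  # last state absorbs in this simplified sketch
    return matrix


# Three states is a common choice for phone models; used here only as an example.
transitions = left_to_right_transitions(3)
for row in transitions:
    print(row)
```

All entries below the diagonal are zero, which is what makes the model left-to-right: once a state has been left, it cannot be revisited.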
