Improving acoustic models for Vietnamese speech recognition using tonal features as input of neural networks
Nguyen Quoc Bao et al., Tạp chí Khoa học & Công nghệ 132(02): 71-76
IMPROVING ACOUSTIC MODELS FOR VIETNAMESE SPEECH RECOGNITION
USING TONAL FEATURES AS INPUT OF NEURAL NETWORKS
Nguyen Quoc Bao*, Nguyen Thanh Trung, Nguyen Thu Phuong, Pham Thi Huong
College of Information and Communication Technology - TNU
SUMMARY
In this paper, a neural network method for improving the acoustic models of Vietnamese speech
recognition is presented. A deep neural network (DNN) for acoustic modeling is able to achieve
significant improvements over baseline systems. The experiments are carried out on the dataset
containing speech from the Voice of Vietnam (VOV) channel. The results show that adding tonal
features as input features of the network yields around 18% relative improvement in recognition
performance. The DNN using tonal features for Vietnamese recognition decreases the error rate
by 49.6% compared to the MFCC baseline.
Keywords: Deep neural network, Vietnamese automatic speech recognition.
INTRODUCTION
In an automatic speech recognition (ASR)
system, the acoustic model is an important
module. It is used to model the acoustic space
of the input features. State-of-the-art acoustic
models for speech recognition use a
statistical pattern recognition framework
called HMM/GMM (Hidden Markov
Model/Gaussian Mixture Model) [1] with
short-time spectral input features. Although
the HMM/GMM approach has been effective
in capturing speech patterns, it has several
inherent limitations. For example, speech
feature vectors at different frames are
assumed to be statistically independent given
the state sequence. Hence, many researchers
have been trying to incorporate the power of
artificial neural networks into acoustic
modeling to improve performance over the
traditional HMM/GMM approach.
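The independence assumption mentioned above can be written out explicitly. The notation below is an assumption introduced for illustration and does not come from the paper: x_1, ..., x_T are the feature vectors of T frames, q_1, ..., q_T is the HMM state sequence, and each state j has a GMM with M components, weights c_{jm}, means \mu_{jm} and covariances \Sigma_{jm}.

```latex
\begin{align}
p(x_1, \dots, x_T \mid q_1, \dots, q_T) &= \prod_{t=1}^{T} p(x_t \mid q_t), \\
p(x_t \mid q_t = j) &= \sum_{m=1}^{M} c_{jm}\, \mathcal{N}\!\left(x_t;\, \mu_{jm},\, \Sigma_{jm}\right),
\qquad \sum_{m=1}^{M} c_{jm} = 1 .
\end{align}
```

Under this factorization, each frame is scored only from its own state, so any correlation between neighboring frames has to be carried by the state sequence itself.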
In a Vietnamese speech recognition system,
another acoustic modeling problem
arises when there
are similar monosyllabic -words like: vang,
ving, vang, vang, v|ng. ... that will be easily
confused. Therefore, tonal features that
represent tone information are an essential part of
the Vietnamese speech recognition system.
Previous studies [2][3][4] showed efforts
toward Vietnamese speech recognition.
However, their systems did not employ the
full range of state-of-the-art techniques for
acoustic modeling.
The purpose of this study is to improve the
acoustic model for Vietnamese speech
recognition using tonal features as input of
neural networks. We also show how to
extract the pitch feature using a modified
algorithm, which can achieve a large
improvement. The rest of this paper is
organized as follows. The next section gives a brief
description of the acoustic model in automatic speech recognition.
This is followed by Section 3, which shows
how to extract the pitch features. In Section
4, we briefly describe the deep neural network
architecture for acoustic modeling. In Sections 5
and 6, the experimental setup and results are
presented. Finally, conclusions and future
research are given in the last section.
ACOUSTIC MODEL IN SPEECH
RECOGNITION
This section presents a brief introduction to
the acoustic model in an ASR system.
* Tel: 0919 114252. Email: [email protected]
Figure 1. A left-to-right HMM model
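The left-to-right topology named in the figure caption can be sketched as a transition matrix in which each state may only loop on itself or advance to the next state. This is a minimal illustration and not the authors' implementation: the function name `left_to_right_transitions`, the `self_loop` parameter, and the choice to make the last state absorbing are all assumptions made for the example.

```python
def left_to_right_transitions(num_states, self_loop=0.5):
    """Return a row-stochastic transition matrix for a left-to-right HMM.

    Hypothetical helper for illustration: state i may stay in place with
    probability `self_loop` or move to state i + 1 with the remaining mass.
    Skips and backward transitions are not allowed.
    """
    if num_states < 1:
        raise ValueError("num_states must be at least 1")
    if not 0.0 <= self_loop < 1.0:
        raise ValueError("self_loop must be in [0, 1)")

    matrix = [[0.0] * num_states for _ in range(num_states)]
    for i in range(num_states - 1):
        matrix[i][i] = self_loop            # stay in the current state
        matrix[i][i + 1] = 1.0 - self_loop  # advance to the next state
    # Assumption: the final state is absorbing; a real system would usually
    # add an exit transition to the next phone or word model instead.
    matrix[-1][-1] = 1.0
    return matrix


if __name__ == "__main__":
    # A three-state model, a size commonly used for phone-level units.
    for row in left_to_right_transitions(3):
        print(row)
```

Because every entry below the diagonal is zero, a path through the model can never return to an earlier state, which matches the forward-in-time progression of speech.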