A multifunctional embedded system based on deep learning for assisting the cognition of visually impaired people
Feng Chia University
Department of Information Engineering and Computer Science
Doctoral Dissertation
A Multifunctional Embedded System Based on
Deep Learning for Assisting the Cognition of
Visually Impaired People
Advisors: Prof. Chyi-Ren Dow (竇其仁)
Prof. Feng-Cheng Lin (林峰正)
Graduate Student: 吳友輝
January 2021 (Republic of China year 110)
A Multifunctional Embedded System Based on Deep Learning for Assisting the Cognition of Visually Impaired People
i FCU e-Theses & Dissertations (2021)
Acknowledgement
First and foremost, I would like to express my sincere gratitude to my advisor,
Prof. Chyi-Ren Dow, for his motivation, extensive experience, and immense
knowledge. I am very grateful for all the ideas, time, and funding he contributed, which laid
the foundation for my research experience. His passion and enthusiasm for
research were inspiring and motivating to me, especially during tough periods of the
Ph.D. pursuit. I am also thankful for the excellent advice he has offered as an
outstanding professor. This advice has been a valuable lesson throughout my
research and will remain so in the future. It is an honor to be one of his Ph.D. students. Again,
I would like to convey my heartfelt gratitude to him.
I would like to express my sincere gratitude to my co-advisor, Prof. Feng-Cheng
Lin, for his scientific advice, knowledge, and valuable guidance. I am very grateful for
all his ideas, his time, and the devices he provided. He always encouraged and helped me to
build on the strengths of my research, and he especially appreciated our research results. I am also
thankful for his excellent advice. It is an honor to be his first Ph.D.
student. From the bottom of my heart, I would like to express my sincere gratitude to
him again.
I would like to thank my dissertation committee, Prof. Hsiao-Hsi Wang,
Prof. Tsung-Chuan Huang, Prof. Lin-Huang Chang, Prof. Cheng-Min Lin, and Prof.
Hsi-Min Chen, for their meaningful suggestions, which helped me continue to improve
and develop my research.
In addition, I am indebted to Szu-Yi Ho (Toni) for numerous insightful
discussions and suggestions. She supported me and helped me resolve the
challenges I faced in related research issues, especially in publishing papers.
I will always remember my fellow labmates at the Mobile Computing lab for
the inspiring discussions, unconditional support, friendship, and all the fun
we have had in the last four years. In particular, my gratitude goes to Ms. Yu-Yun Chang
(Amber) and Mr. Kuan-Chieh Wang (Rich) for providing essential local support during
these years.
Last but not least, I am grateful to my family members for all their encouragement
and faith in me. They gave me the moral support, encouragement, and motivation
to accomplish my personal goals. Most of all, I thank my parents, who raised me
with unconditional love and gave me unlimited support in every decision I have made.
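摘要 (Chinese Abstract)
People with visual impairment face many difficulties in daily life, such as unassisted
navigation, access to information, and context awareness. Although many smart devices are
available to help visually impaired people, most only provide navigation assistance and
obstacle avoidance. In this study, we focus on context awareness and recognition of
surrounding objects. Unlike most studies, which rely on a client-server architecture or on
computation on a single desktop computer, we propose a multifunctional embedded system
based on deep learning to assist visually impaired people's cognition of their surroundings.
The proposed system also overcomes regional limitations on use and enhances the
capabilities of navigation tasks. We use an embedded device (NVIDIA Jetson AGX Xavier) as
the main processor module and connect it to external peripheral devices (such as a webcam,
Bluetooth speaker, monitor, mouse, and Bluetooth audio adapter). It can perform almost all
the system functions a host should have, including image collection, image processing, and
result presentation. First, the system's webcam captures the user's current scene. Then, the
image is processed by the function selected through a remote controller. Finally, the system
converts the result description of the current scene from text to speech and delivers it to
the user through the Bluetooth speaker. The system has three main functions: face
recognition and emotion classification (the first function), age and gender classification
(the second function), and object detection (the third function). The system is built on
different deep learning models, which may make it challenging for visually impaired people
to use. Therefore, we also propose a process for selecting functions efficiently to reduce
the complexity of system control for visually impaired people. Finally, a prototype is
designed, fabricated, and tested for experimental validation. The experimental results
obtained on the prototype demonstrate the reliability of the proposed system's performance.
Results based on recognition and classification accuracy, computing time, and practical
applicability show that the system is feasible and can be used effectively to help visually
impaired people.
Keywords: age classification, emotion classification, face recognition, gender
classification, object detection.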
Abstract
Individuals with visual impairment confront many difficulties in daily life, for
example, unassisted navigation, access to information, and context awareness. Although
many smart devices have been designed to assist visually impaired people, most of them
aim to provide navigation assistance and obstacle avoidance. In this study, we focus
on context awareness and surrounding object recognition. Unlike most studies, which were
implemented on servers or laptop computers, we propose a multifunctional embedded
system based on deep learning for assisting the cognition of visually impaired people.
The proposed system also overcomes limitations on the area of use and enhances the
capabilities of navigation tasks. An embedded device (NVIDIA Jetson AGX Xavier) is
employed as the central processor module in the system and is connected to peripheral
devices (webcam, speaker, monitor, mouse, and Bluetooth audio transmitter adapter).
It performs almost all the system functions, including image collection, image
processing, and result description. First, the webcam of the system captures
the current scene of the user. Then, this image is processed by the function
selected through a remote controller. Lastly, the system converts the
result description of the current scene from text to voice and delivers it to the user through
the speaker. The three main functions of this system are face recognition and emotion
classification (the first function), age and gender classification (the second function),
and object detection (the third function). The system is built on different deep
learning models, so operating it may be a challenge for visually impaired people. Therefore,
we also propose a process that selects functions efficiently to ease the complexity of
system control for visually impaired people. Finally, a prototype is designed,
fabricated, and tested for experimental validation. The performance of the proposed
system is demonstrated using results obtained from the experiments on the prototype.
Results based on recognition and classification accuracy, computing time, and practical
applicability show that the proposed system is feasible and can be effectively used to
assist visually impaired people.
Keywords: Age Classification, Emotion Classification, Face Recognition, Gender
Classification, Object Detection.
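The capture, process, and speak flow described in the abstract can be pictured with a
minimal sketch. This is only an illustration under stated assumptions, not the
dissertation's implementation: the names run_once, FUNCTIONS, and the three describe_*
stubs are hypothetical, the stubs return fixed strings in place of the deep learning
models, and the webcam and speaker are passed in as plain callables.

```python
from typing import Callable, Dict

# Placeholder type for an image captured by the webcam (assumption: raw bytes).
Frame = bytes


# Hypothetical stand-ins for the three functions named in the abstract.
# A real system would run deep learning models here; these stubs only
# return a fixed text description so the control flow can be followed.
def describe_faces_and_emotions(frame: Frame) -> str:
    return "function 1: face recognition and emotion classification"


def describe_age_and_gender(frame: Frame) -> str:
    return "function 2: age and gender classification"


def describe_objects(frame: Frame) -> str:
    return "function 3: object detection"


# Function number (as chosen on the remote controller) -> processing function.
FUNCTIONS: Dict[int, Callable[[Frame], str]] = {
    1: describe_faces_and_emotions,
    2: describe_age_and_gender,
    3: describe_objects,
}


def run_once(
    selected: int,
    capture: Callable[[], Frame],
    speak: Callable[[str], None],
) -> str:
    """Capture one frame, process it with the selected function, speak the result."""
    if selected not in FUNCTIONS:
        raise ValueError(f"unknown function number: {selected}")
    frame = capture()                         # step 1: capture the current scene
    description = FUNCTIONS[selected](frame)  # step 2: run the selected function
    speak(description)                        # step 3: deliver the text as speech
    return description


if __name__ == "__main__":
    spoken = []  # stands in for the speaker; collects what would be spoken
    run_once(3, capture=lambda: b"", speak=spoken.append)
    print(spoken[0])
```

Passing the capture and speak steps as callables keeps the sketch independent of any
particular camera or text-to-speech library, so real devices could be substituted later.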
Table of Contents
Acknowledgement
摘要 (Chinese Abstract)
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Motivation
1.2 Overview of Research
1.3 Dissertation Organization
Chapter 2 Related Work
2.1 Face Recognition
2.2 Gender, Age and Emotion Classification
2.3 Object Detection
2.4 Smart Healthcare
Chapter 3 System Overview
3.1 System Architecture
3.2 Function Selection
3.2.1 Remote Controller
3.2.2 Function Selection Process
3.3 NVIDIA Jetson AGX Xavier
3.3.1 NVIDIA Jetson Family Introduction
3.3.2 Technical Specification of NVIDIA Jetson AGX Xavier
Chapter 4 Face Recognition Function
4.1 Overview of Face Recognition Function
4.2 Dataset Collection
4.3 Model Architectures
4.4 Enrolling a New Person
Chapter 5 Gender, Age and Emotion Classification Function
5.1 Overview of Gender, Age and Emotion Classification Function
5.2 Gender Classification Schemes
5.3 Age Classification Schemes
5.4 Emotion Classification Schemes
Chapter 6 Object Detection Function
6.1 Overview of Object Detection Function
6.2 Object Detection Schemes
6.2.1 Two-Stage Detectors
6.2.2 One-Stage Detectors
6.3 Arrangement of Result Description
Chapter 7 System Prototype and Implementation
7.1 Devices in System Implementation
7.2 Initialization Program in Embedded System
7.3 Dataset Collection
Chapter 8 Experimental Results
8.1 Evaluation of Face Recognition Results
8.1.1 Results Evaluation in Terms of Precision and Recall
8.1.2 Analysis Results of Face Recognition
8.1.3 Results Comparison in Multiple Standard Datasets
8.2 Examination Results of Gender, Age and Emotion Classification
8.2.1 Evaluation Results of Gender Classification