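Feng Chia University

Department of Information Engineering and Computer Science

Doctoral Dissertation

基於深度學習來輔助視覺障礙者認知之多功能嵌入式系統

A Multifunctional Embedded System Based on Deep Learning for Assisting the Cognition of Visually Impaired People

Advisors: Chyi-Ren Dow (竇其仁), Feng-Cheng Lin (林峰正)

Graduate Student: 吳友輝

January 2021 (Republic of China year 110)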

A Multifunctional Embedded System Based on Deep Learning for Assisting the Cognition of Visually Impaired People

i FCU e-Theses & Dissertations (2021)

Acknowledgement

First and foremost, I would like to express my sincere gratitude to my advisor, Prof. Chyi-Ren Dow, for his motivation, extensive experience, and immense knowledge. I am very grateful for all his ideas, time, and funding contributions, which laid the foundation for my research experience. His passion and enthusiasm for research were inspiring and motivating to me, especially during the tough periods of the Ph.D. pursuit. I am also thankful for the excellent advice he has offered as an outstanding professor. This advice has been a valuable lesson throughout my research and will remain so in the future. It is an honor for me to be one of his Ph.D. students. Again, I would like to convey my heartfelt gratitude to him.

I would like to express my sincere gratitude to my co-advisor, Prof. Feng-Cheng Lin, for his scientific advice, knowledge, and valuable guidance. I am very grateful for all his ideas, his time, and the devices he provided. He always encouraged me and helped me build on the strengths of my research, and he showed particular appreciation for our research results. I am also thankful for his excellent advice. It is an honor for me to be his first Ph.D. student. From the bottom of my heart, I would like to express my sincere gratitude to him again.

I would like to thank my dissertation committee, Prof. Hsiao-Hsi Wang, Prof. Tsung-Chuan Huang, Prof. Lin-Huang Chang, Prof. Cheng-Min Lin, and Prof. Hsi-Min Chen, for their meaningful suggestions, which helped me continue to improve and develop my research.

In addition, I am indebted to Szu-Yi Ho (Toni), who offered numerous insightful discussions and suggestions. She supported me and worked with me to resolve the challenges I faced in related research issues, especially in publishing papers.


I will always remember my fellow labmates at the Mobile Computing lab for the inspiring discussions, unconditional support, friendship, and all the fun we have had in the last four years. In particular, my gratitude goes to Ms. Yu-Yun Chang (Amber) and Mr. Kuan-Chieh Wang (Rich) for providing essential local support during these years.

Last but not least, I am grateful to my family members for all their encouragement and faith in me. They gave me the moral support, encouragement, and motivation to accomplish my personal goals. Most of all, I thank my parents, who raised me with unconditional love and gave me unlimited support in every decision I have made.


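摘要 (Chinese Abstract)

People with visual impairment face many difficulties in daily life, such as unassisted navigation, access to information, and context awareness. Although many smart devices can help visually impaired people, most of them only provide navigation assistance and obstacle avoidance. In this study, we focus on context awareness and the recognition of surrounding objects. Unlike most studies, which rely on a client-server architecture or on computation on a single desktop computer, we propose a multifunctional embedded system based on deep learning to help visually impaired people perceive their surroundings. The proposed system also overcomes the limitation on the area of use and enhances the capability of navigation tasks. We use an embedded device (NVIDIA Jetson AGX Xavier) as the main processor module and connect it to external peripheral devices (such as a webcam, Bluetooth speaker, monitor, mouse, and Bluetooth audio adapter). It performs almost all the system functions expected of a host, including image collection, image processing, and result presentation. First, the system's webcam captures the user's current scene. Then, the image is processed by the function selected through the remote controller. Finally, the system converts the description of the current scene from text to speech, and the Bluetooth speaker delivers it to the user. The three main functions of the system are face recognition and emotion classification (the first function), age and gender classification (the second function), and object detection (the third function). The system is built on different deep learning models, which may make it challenging for visually impaired people to use. Therefore, we also propose a process for selecting functions efficiently to reduce the complexity of system control for visually impaired people. Finally, a prototype was designed, fabricated, and tested for experimental validation. The experimental results obtained on the prototype demonstrate the reliability of the proposed system's performance. Results based on recognition and classification accuracy, computing time, and practical applicability show that the system is feasible and can be used effectively to help visually impaired people.

Keywords: Age Classification, Emotion Classification, Face Recognition, Gender Classification, Object Detection.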


Abstract

Individuals with visual impairment confront many difficulties in daily life, for example, unassisted navigation, access to information, and context awareness. Although many smart devices have been designed to assist visually impaired people, most of them aim to provide navigation assistance and obstacle avoidance. In this study, we focus on context awareness and surrounding object recognition. Unlike most studies, which were implemented on servers or laptop computers, this work proposes a multifunctional embedded system based on deep learning for assisting the cognition of visually impaired people. The proposed system also overcomes the limitation on the area of use and enhances the capabilities of navigation tasks. An embedded device (NVIDIA Jetson AGX Xavier) is employed as the central processor module of the system and is connected to peripheral devices (webcam, speaker, monitor, mouse, and Bluetooth audio transmitter adapter). It performs almost all the system functions, including image collection, image processing, and result description. First, the webcam captures the current scene of the user. Then, the image is processed by the function that the user selects through a remote controller. Lastly, the system converts the result description of the current scene from text to voice and delivers it to the user through the speaker. The system has three main functions: face recognition and emotion classification (the first function), age and gender classification (the second function), and object detection (the third function). Because the system is built on several deep learning models, operating it may be a challenge for visually impaired people. Therefore, we also propose a process for selecting functions efficiently to ease the complexity of system control. Finally, a prototype is designed, fabricated, and tested for experimental validation. The performance of the proposed system is demonstrated using results obtained from experiments on the prototype. Results based on recognition and classification accuracy, computing time, and practical applicability show that the proposed system is feasible and can be effectively used to assist visually impaired people.

Keywords: Age Classification, Emotion Classification, Face Recognition, Gender Classification, Object Detection.
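
The capture, select, process, and speak flow described in the abstract can be sketched roughly as follows. This is only an illustrative outline under stated assumptions, not the dissertation's implementation: the handler names, the `FUNCTIONS` table, `run_once`, and the canned results are all hypothetical, and the deep learning models, webcam, remote controller, and text-to-speech engine are replaced by plain Python stubs so the sketch runs without any hardware.

```python
from typing import Any, Callable, Dict, List

Frame = Any  # placeholder for an image captured by the webcam


# Hypothetical stubs standing in for the three deep learning functions.
# A real system would run trained models on the frame instead.
def face_and_emotion(frame: Frame) -> List[str]:
    return ["Alice, happy"]


def age_and_gender(frame: Frame) -> List[str]:
    return ["woman, young adult"]


def object_detection(frame: Frame) -> List[str]:
    return ["a chair", "a cup"]


# Maps the function number chosen on the remote controller to a handler.
FUNCTIONS: Dict[int, Callable[[Frame], List[str]]] = {
    1: face_and_emotion,
    2: age_and_gender,
    3: object_detection,
}


def run_once(frame: Frame, selected: int, speak: Callable[[str], None]) -> str:
    """Process one frame with the selected function and voice the result."""
    handler = FUNCTIONS.get(selected)
    if handler is None:
        sentence = "Unknown function."
    else:
        results = handler(frame)
        if results:
            sentence = "Detected: " + "; ".join(results) + "."
        else:
            sentence = "Nothing was detected."
    speak(sentence)  # stand-in for text-to-speech output on the speaker
    return sentence


if __name__ == "__main__":
    # print() plays the role of the speaker in this sketch.
    run_once(frame=None, selected=3, speak=print)
```

Keeping the three functions behind a single dispatch table means the control side only has to pass one number, which mirrors the idea of a simplified function selection process for the user.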


Table of Contents

Acknowledgement
摘要 (Chinese Abstract)
Abstract
Table of Contents
List of Figures
List of Tables

Chapter 1 Introduction
    1.1 Motivation
    1.2 Overview of Research
    1.3 Dissertation Organization

Chapter 2 Related Work
    2.1 Face Recognition
    2.2 Gender, Age and Emotion Classification
    2.3 Object Detection
    2.4 Smart Healthcare

Chapter 3 System Overview
    3.1 System Architecture
    3.2 Function Selection
        3.2.1 Remote Controller
        3.2.2 Function Selection Process
    3.3 NVIDIA Jetson AGX Xavier
        3.3.1 NVIDIA Jetson Family Introduction
        3.3.2 Technical Specification of NVIDIA Jetson AGX Xavier

Chapter 4 Face Recognition Function
    4.1 Overview of Face Recognition Function
    4.2 Dataset Collection
    4.3 Model Architectures
    4.4 Enrolling a New Person

Chapter 5 Gender, Age and Emotion Classification Function
    5.1 Overview of Gender, Age and Emotion Classification Function
    5.2 Gender Classification Schemes
    5.3 Age Classification Schemes
    5.4 Emotion Classification Schemes

Chapter 6 Object Detection Function
    6.1 Overview of Object Detection Function
    6.2 Object Detection Schemes
        6.2.1 Two-Stage Detectors
        6.2.2 One-Stage Detectors
    6.3 Arrangement of Result Description

Chapter 7 System Prototype and Implementation
    7.1 Devices in System Implementation
    7.2 Initialization Program in Embedded System
    7.3 Dataset Collection

Chapter 8 Experimental Results
    8.1 Evaluation of Face Recognition Results
        8.1.1 Results Evaluation in Terms of Precision and Recall
        8.1.2 Analysis Results of Face Recognition
        8.1.3 Results Comparison in Multiple Standard Datasets
    8.2 Examination Results of Gender, Age and Emotion Classification
        8.2.1 Evaluation Results of Gender Classification
