Application of machine learning for predicting probability of default of small and medium enterprises, 2022
MINISTRY OF EDUCATION & TRAINING STATE BANK OF VIET NAM
HO CHI MINH CITY UNIVERSITY OF BANKING
--------------------------
NGUYEN THI NGOC ANH
APPLICATION OF MACHINE LEARNING FOR
PREDICTING PROBABILITY OF DEFAULT OF
SMALL AND MEDIUM ENTERPRISES
GRADUATION THESIS
MAJOR: FINANCE – BANKING
CODE: 7340201
HO CHI MINH CITY, 2022
SUPERVISOR
Ph.D. NGUYEN MINH NHAT
ABSTRACT
Corporate default prediction plays an essential role in every sector of the economy, as
highlighted by the Covid-19 pandemic. The recent high incidence of bankruptcies among
Small and Medium Enterprises (SMEs) has underlined the need to anticipate defaults
in many sectors. Given this importance, the study investigates which Machine Learning
models are appropriate for predicting the probability of default of SMEs in the
Vietnamese commercial banking system, and how to choose among them, using a
unique database of 400 Vietnamese SMEs over the 2019–2021 period that includes 13
independent financial variables. The most significant contribution of this research is
the application of Machine Learning approaches to financial indicators in order to
anticipate the default likelihood of SMEs, which should improve the efficiency of
credit risk control in Vietnamese commercial banks in the future. The
research analyzes the performance of a set of Machine Learning (ML) models in
predicting default risk against a standard statistical model, namely the Logistic
Regression model. When only a restricted amount of information is available, as in
the case of financial indicators, the ML models (Decision Tree and Random Forest)
are found to outperform the statistical model in terms of discriminatory power and
precision. The confusion matrix and the F1-score are used to evaluate which model is the
most appropriate for predicting the probability of default of SMEs.
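As a minimal illustration of the two evaluation tools named above, the following sketch computes a binary confusion matrix and the F1-score in plain Python. The label vectors are made up for illustration and are not drawn from the thesis data; the function names `confusion_counts` and `f1_score` are hypothetical helpers, and the convention that label 1 means "default" is an assumption.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN) for binary labels, assuming 1 = default, 0 = non-default."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1  # defaulting firm correctly flagged
        elif actual == 0 and predicted == 1:
            fp += 1  # healthy firm wrongly flagged
        elif actual == 1 and predicted == 0:
            fn += 1  # defaulting firm missed
        else:
            tn += 1  # healthy firm correctly cleared (labels assumed to be 0 or 1)
    return tp, fp, fn, tn


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when undefined."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative labels only (not thesis data).
example_true = [1, 0, 1, 1, 0, 0, 1, 0]
example_pred = [1, 0, 0, 1, 0, 1, 1, 0]
example_counts = confusion_counts(example_true, example_pred)
example_f1 = f1_score(example_true, example_pred)
```

With these illustrative labels, each candidate model would produce its own prediction vector, and the model with the higher F1-score on held-out data would be preferred under this criterion.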
DECLARATION
I declare that this thesis has been composed solely by myself and that it has not been
submitted, in whole or in part, in any previous application for a degree. Except where
stated otherwise by reference or acknowledgment, the work presented is entirely my
own.
The author
NGUYEN THI NGOC ANH
ACKNOWLEDGEMENTS
Throughout the writing of this thesis, I have received a great deal of support and
assistance.
First and foremost, I would like to express my heartfelt and profound
gratitude to the professors of the Ho Chi Minh City University of Banking for their
passionate teaching and for building the firm foundation of knowledge that enabled
me to complete the university program successfully.
Second, I would like to give my warmest thanks to my supervisor,
Ph.D. Nguyen Minh Nhat, for providing me with thorough instruction and unwavering
support in finishing my graduation thesis. It would have been difficult for me to complete
this thesis without his careful assistance.
Because of my limited practical experience, this graduation thesis inevitably
contains some shortcomings; nonetheless, I look forward to receiving further advice from
lecturers and learning from it. This experience, I believe, will be highly beneficial to
my future development.
I sincerely thank you!
Nguyen Thi Ngoc Anh
TABLE OF CONTENTS
LIST OF ABBREVIATIONS
LIST OF FIGURES
LIST OF TABLES
CHAPTER 1 INTRODUCTION
1.1 The urgency of the research
1.2 Research Objectives
1.3 Research Questions
1.4 Research Subject and Scope
1.4.1 Research Subject
1.4.2 Research Scope
1.5 Research Contributions
1.6 Research Methodology
1.7 The Structure of Research
CHAPTER 2 LITERATURE REVIEW
2.1 Probability of Default (PD)
2.2 Overview of the models used to predict the Probability of Default of SMEs
2.2.1 The Structural Models
2.2.1.1 Regression Analysis Models
2.2.1.2 Discriminant Analysis Models
2.2.1.3 Logistic Models
2.2.2 The Non-Structural Models
2.2.2.1 Decision Tree Model (DT)
2.2.2.2 Random Forest Model (RF)
2.2.2.3 Artificial Neural Network Models (ANNs)
2.2.2.4 Ensemble Learning
2.3 Previous Related Research
CHAPTER 3 DATA AND METHODOLOGY
3.1 Methodological Model Framework
3.2 Data collection
3.3 Input Variables Selection
3.4 The Probability of Default prediction models
3.4.1 Logistic Regression Model
3.4.2 Decision Tree Model (DT)
3.4.3 Random Forest Model (RF)
3.4.4 Confusion Matrix
3.4.5 F1-Score
CHAPTER 4 EMPIRICAL RESULTS
4.1 Descriptive statistics results
4.2 Correlations
4.3 Regression results of a parametric model