
MINISTRY OF EDUCATION & TRAINING STATE BANK OF VIET NAM

HO CHI MINH CITY UNIVERSITY OF BANKING

--------------------------

NGUYEN THI NGOC ANH

APPLICATION OF MACHINE LEARNING FOR

PREDICTING PROBABILITY OF DEFAULT OF

SMALL AND MEDIUM ENTERPRISES

GRADUATION THESIS

MAJOR: FINANCE – BANKING

CODE: 7340201


SUPERVISOR

Ph.D. NGUYEN MINH NHAT

HO CHI MINH CITY, 2022


ABSTRACT

Corporate default prediction plays an essential role in every sector of the economy, as the Covid-19 pandemic has made clear. The recent high incidence of bankruptcies among Small and Medium Enterprises (SMEs) has underlined the need to anticipate defaults across many sectors. Given this importance, the study investigates which Machine Learning models are appropriate for predicting the probability of default of SMEs in the Vietnamese commercial banking system, and how to choose the most appropriate one, using a unique database of 400 Vietnamese SMEs over the 2019–2021 period with 13 independent financial variables. The main contribution of this research is the application of Machine Learning approaches to financial indicators in order to anticipate the default likelihood of SMEs, which can improve the efficiency of credit risk control at Vietnamese commercial banks in the future. The research compares the performance of a set of Machine Learning (ML) models in predicting default risk against a standard statistical model, namely the Logistic Regression Model. When only a restricted amount of information is available, such as financial indicators alone, the ML models (Decision Tree and Random Forest) were found to outperform the statistical model in terms of discriminatory power and precision. The confusion matrix and F1-score are used to evaluate which model is most appropriate for predicting the probability of default of SMEs.
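The abstract names the confusion matrix and F1-score as the criteria for choosing between models. As a rough illustration only, the Python sketch below computes both for two hypothetical classifiers on made-up labels, assuming 1 = default and 0 = non-default with default treated as the positive class. The function names `confusion_counts` and `f1_from_counts` and all labels are illustrative assumptions; they are not the thesis's data or code.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int],
                     positive: int = 1) -> Tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN); `positive` is the class of interest (assumed: 1 = default)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # predicted default, firm actually defaulted
            else:
                fp += 1  # predicted default, firm did not default
        else:
            if actual == positive:
                fn += 1  # missed default
            else:
                tn += 1  # correctly predicted non-default
    return tp, fp, fn, tn


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 = 2 * precision * recall / (precision + recall); returns 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical labels for 8 firms; not taken from the thesis dataset.
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    predictions = {
        "Model A": [1, 0, 0, 1, 0, 1, 1, 0],
        "Model B": [1, 0, 0, 0, 0, 0, 1, 0],
    }
    for name, y_pred in predictions.items():
        tp, fp, fn, tn = confusion_counts(y_true, y_pred)
        f1 = f1_from_counts(tp, fp, fn)
        print(f"{name}: TP={tp}, FP={fp}, FN={fn}, TN={tn}, F1={f1:.2f}")
```

Both hypothetical models classify 6 of the 8 firms correctly, so their accuracy is identical, yet Model A scores an F1 of 0.75 against about 0.67 for Model B because B misses more defaults. This is one reason an F1-based comparison can be more informative than accuracy alone when defaults are the class of interest.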


DECLARATION

I declare that this thesis has been composed solely by myself and that it has not been submitted, in whole or in part, in any previous application for a degree. Except where stated otherwise by reference or acknowledgment, the work presented is entirely my own.

The author

NGUYEN THI NGOC ANH


ACKNOWLEDGEMENTS

Throughout the writing of this thesis, I have received a great deal of support and assistance.

First and foremost, I would like to express my heartfelt and profound gratitude to the professors of the Ho Chi Minh City University of Banking for their passionate teaching and for building the firm foundation of knowledge that enabled me to complete the university program successfully.

Second, I would like to give my warmest thanks to my supervisor, Ph.D. Nguyen Minh Nhat, for his thorough instruction and unwavering support in completing my graduation thesis. It would have been difficult for me to accomplish this thesis without his careful assistance.

Because of my limited practical experience, this graduation thesis inevitably contains some shortcomings; I therefore look forward to receiving further advice from lecturers so that I can gain new experience, which I believe will be highly beneficial to my future development.

Thank you sincerely!

Nguyen Thi Ngoc Anh


TABLE OF CONTENTS

LIST OF ABBREVIATIONS
LIST OF FIGURES
LIST OF TABLES
CHAPTER 1 INTRODUCTION
1.1 The urgency of the research
1.2 Research Objectives
1.3 Research Questions
1.4 Research Subject and Scope
1.4.1 Research Subject
1.4.2 Research Scope
1.5 Research Contributions
1.6 Research Methodology
1.7 The Structure of Research
CHAPTER 2 LITERATURE REVIEW
2.1 Probability of Default (PD)
2.2 Overview of the models used to predict the Probability of Default of SMEs
2.2.1 The Structural Models
2.2.1.1 Regression Analysis Models
2.2.1.2 Discriminant Analysis Models
2.2.1.3 Logistic Models
2.2.2 The Non-Structural Models
2.2.2.1 Decision Tree Model (DT)
2.2.2.2 Random Forest Model (RF)
2.2.2.3 Artificial Neural Network Models (ANNs)
2.2.2.4 Ensemble Learning
2.3 Previous Related Research
CHAPTER 3 DATA AND METHODOLOGY
3.1 Methodological Model Framework
3.2 Data collection
3.3 Input Variables Selection
3.4 The Probability of Default prediction models
3.4.1 Logistic Regression Model
3.4.2 Decision Tree Model (DT)
3.4.3 Random Forest Model (RF)
3.4.4 Confusion Matrix
3.4.5 F1-Score
CHAPTER 4 EMPIRICAL RESULTS
4.1 Descriptive statistics results
4.2 Correlations
4.3 Regression results of a parametric model
