Evaluation on Performance and Energy
Efficiency of Distributed Computing
Systems
Ph.D. Dissertation
by
Tran Thi Xuan (MSc)
Supervised by
Prof. Do Van Tien (DSc)
Department of Networked Systems and Services
Budapest University of Technology and Economics
Hungary, 2020
Abstract
The increasing usage of distributed computing systems to serve the growing demand for
scientific computation and big data processing comes with the drastic growth of energy
consumption in computing clusters. Therefore, optimizing the energy consumption of
computational clusters has become more crucial than ever. The dissertation summarizes
a study on the resource allocation problem in distributed systems, motivated by the need
to take into account different resource characteristics and dynamic power management
(DPM) techniques.
First, a generalized model of computational clusters built from heterogeneous types
of COTS servers has been introduced to study resource-aware scheduling. A set of
scheduling heuristics that consider servers’ performance and power consumption
characteristics, as well as the organization of waiting buffers, has been investigated.
We show that the buffering schemes play an important role in ensuring quality-of-service
parameters, namely the waiting time and the response time experienced by arriving jobs.
Moreover, scheduling based on servers’ energy-efficiency characteristics conserves system
energy, whereas a policy that prioritizes high-performance servers yields the best performance.
Second, new scheduling algorithms based on real-time measurements have been proposed
in the thesis to achieve a trade-off between the energy efficiency and the performance
capability of computational clusters. Numerical results show that the proposed algorithms
attain a balance between job execution time and energy efficiency.
Third, the impact of dynamic power management (DPM) in computing systems built
from multicore processors has been investigated. Numerical results indicate that DPM
at the core level of processors can help reduce energy consumption. A resource-aware
scheduling solution has been proposed to achieve energy-efficient processing of
parallel tasks in multicore systems. The obtained results indicate that the proposal reduces
energy consumption significantly in comparison to random allocation.
Last, the energy inefficiency of a common big data scheduler, Hadoop YARN, has
been investigated. Since the resource allocation policy in a Hadoop YARN cluster is
data-aware (i.e., the allocation strongly depends on the locations of data splits in the
Hadoop Distributed File System, HDFS), a new data placement scheme for HDFS was proposed to
achieve energy efficiency when MapReduce tasks are processed by the cluster. Compared
to the existing HDFS data layout scheme, the proposal yields a reduction of more than 50% in
energy consumption at the small expense of an approximately 6% increase in job execution time.
I, the undersigned Tran Thi Xuan, hereby state that I have written this doctoral dissertation myself, and I have used only the sources given in it. I have clearly marked all
the parts taken from other sources either word for word or reworded but with the same
contents, indicating their sources.
The reviews of the dissertation and the report of the thesis discussion are available at the
Dean’s Office of the Electrical Engineering and Informatics Faculty, Budapest University
of Technology and Economics.
Budapest, February 17, 2020
Tran Thi Xuan
Acknowledgements
I would like to thank all the people who have provided invaluable assistance during my study
towards the Ph.D. degree.
I would like to express my sincere gratitude to Prof. Dr. Do Van Tien for his intensive
supervision. He guided the direction of my research from its
early stages. Without his continuous supervision and frank criticism, I could not
have completed this study and earned the Ph.D. degree.
I deeply thank Dr. Do Hoai Nam, a senior researcher in the Analysis, Design and Development of ICT systems laboratory at our department, for his collaboration and
enthusiastic support throughout my research. I also thank all members of the Analysis, Design and Development of ICT systems laboratory, the other Ph.D. students, and the university
staff.
Finally, I extend my heartfelt thanks to my husband and son, Le Linh Bang and
Le Minh Anh, for their love and encouragement. I am also grateful to all family members
and friends who have supported me throughout.
Contents
Abstract
Acknowledgements
List of Figures
List of Tables
1 Introduction
2 A generalized model of heterogeneous computing clusters for investigation of scheduling schemes
2.1 Introduction
2.2 A generalized cluster model and Scheduling algorithms
2.2.1 Ranking of servers
2.2.2 Scheduling algorithms
2.2.3 Performance measures and energy metrics
2.3 Simulation Inputs and Numerical Results
2.3.1 Input parameters
2.3.2 Numerical results
2.4 Conclusion
3 New algorithms for balancing energy consumption and performance in computational clusters
3.1 Introduction
3.2 System description and proposed scheduling algorithms
3.2.1 Scheduling algorithms
3.3 Numerical Results
3.3.1 The parameters of a computational cluster
3.3.2 Job balance
3.3.3 System metrics
3.3.4 Impacts of DVFS
3.3.5 Evaluations with workload traces as input data
3.4 Conclusion
4 Impact of Dynamic power management techniques in computing systems of multicore processors
4.1 Introduction
4.2 Dynamic Power Management practices
4.3 System descriptions and operation scenarios
4.3.1 Job assignment scenarios
4.3.2 Performance and energy metrics
4.4 Evaluation on the impact of DPM
4.4.1 Simulation inputs
4.4.2 Analysis of obtained results
4.5 A proposal of Resource-aware scheduling algorithm
4.5.1 The proposed policy
4.5.2 Numerical results
4.6 Conclusion
5 A New Data Layout Scheme for Energy-Efficient MapReduce Processing Tasks
5.1 Introduction
5.2 Related Work
5.3 The operation of HDFS and YARN in a computing cluster
5.3.1 HDFS
5.3.2 Yet Another Resource Negotiator – YARN
5.3.3 Processing Hadoop MapReduce applications
5.3.4 The default HDFS data layout
5.3.5 A locality relaxation algorithm for resource allocation in RM
5.4 A New Data Layout Algorithm
5.4.1 Subsets of servers
5.4.2 A proposed algorithm
5.5 Numerical Results
5.5.1 Parameters for a case study
5.5.2 Numerical results
5.6 Conclusion
6 Summary
Own Publications
Bibliography