Yale University
EliScholar – A Digital Platform for Scholarly Publishing at Yale
Yale Medicine Thesis Digital Library School of Medicine
January 2019
Searching For Phenotypes Of Sepsis: An
Application Of Machine Learning To Electronic
Health Records
Michael Jarvis Boyle
Follow this and additional works at: https://elischolar.library.yale.edu/ymtdl
This Open Access Thesis is brought to you for free and open access by the School of Medicine at EliScholar – A Digital Platform for Scholarly
Publishing at Yale. It has been accepted for inclusion in Yale Medicine Thesis Digital Library by an authorized administrator of EliScholar – A Digital
Platform for Scholarly Publishing at Yale. For more information, please contact [email protected].
Recommended Citation
Boyle, Michael Jarvis, "Searching For Phenotypes Of Sepsis: An Application Of Machine Learning To Electronic Health Records"
(2019). Yale Medicine Thesis Digital Library. 3477.
https://elischolar.library.yale.edu/ymtdl/3477
Searching for Phenotypes of Sepsis:
An Application of Machine Learning to Electronic Health
Records
A Thesis Submitted to the
Yale University School of Medicine
In Partial Fulfillment of the Requirements for the
Degree of Doctor of Medicine
by
Michael Jarvis Boyle
2019
SEARCHING FOR PHENOTYPES OF SEPSIS: AN APPLICATION OF MACHINE LEARNING TO
ELECTRONIC HEALTH RECORDS. Michael J. Boyle (Sponsored by R. Andrew Taylor).
Department of Emergency Medicine, Yale University School of Medicine, New Haven,
CT.
Sepsis has historically been categorized into discrete subsets based on expert
consensus-driven definitions, but there is evidence to suggest it would be better
described as a continuum. The goal of this study was to perform an exhaustive search
for distinct phenotypes of sepsis using various unsupervised machine learning
techniques applied to the electronic health record (EHR) data of 41,843 Yale New Haven
Health System emergency department patients with infection between 2013 and 2016.
Specifically, the aims were to develop an autoencoder to reduce the high-dimensional
EHR data to a latent representation amenable to clustering, and then to search for and
assess the quality of clusters within that representation using various clustering
methods (partitional, hierarchical, and density-based) and standard evaluation metrics.
Autoencoder training was performed by minimizing the mean squared error of the
reconstruction. With this exhaustive search, no convincing consistent clusters were
found. Various clustering patterns were produced by the different methods but all had
poor quality metrics, while evaluation metrics meant to find the ideal number of
clusters did not agree on a consistent number but seemed to suggest fewer than two
clusters. Inspection of one promising arrangement with eight clusters did not reveal a
statistically significant difference in admission rate. While it is impossible to prove a
negative, these results suggest there are not distinct phenotypic clusters of sepsis.
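To illustrate how a "standard evaluation metric" can distinguish discrete clusters from a continuum, the sketch below computes a silhouette coefficient on synthetic two-dimensional points. It is only an illustration under stated assumptions: the abstract does not name the specific metrics used, so the silhouette coefficient is an assumed example, the k-means routine is a minimal stand-in for the partitional methods, and the toy data and helper names (`kmeans`, `silhouette`, `blob`) are hypothetical and unrelated to the study's EHR data or autoencoder.

```python
import math
import random


def kmeans(points, k, iters=50, seed=0):
    """Minimal Lloyd's k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initialise centers from the data

    def assign():
        return [min(range(k), key=lambda c: math.dist(p, centers[c]))
                for p in points]

    for _ in range(iters):
        labels = assign()
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = tuple(sum(x) / len(members)
                                   for x in zip(*members))
    return assign()


def silhouette(points, labels):
    """Mean silhouette coefficient, in [-1, 1]; higher means tighter,
    better-separated clusters."""
    scores = []
    for i, p in enumerate(points):
        dists = {}
        for j, q in enumerate(points):
            if i != j:
                dists.setdefault(labels[j], []).append(math.dist(p, q))
        own = dists.pop(labels[i], None)
        if not own or not dists:  # singleton cluster or only one cluster
            scores.append(0.0)
            continue
        a = sum(own) / len(own)                             # cohesion
        b = min(sum(d) / len(d) for d in dists.values())    # separation
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def blob(rng, center, n, spread=1.0):
    """Hypothetical toy data: n Gaussian points around a center."""
    return [(rng.gauss(center[0], spread), rng.gauss(center[1], spread))
            for _ in range(n)]


rng = random.Random(42)
separated = blob(rng, (0, 0), 40) + blob(rng, (10, 10), 40)  # two phenotypes
continuum = blob(rng, (0, 0), 80)                            # one population

sep_score = silhouette(separated, kmeans(separated, 2))
cont_score = silhouette(continuum, kmeans(continuum, 2))
print(f"separated: {sep_score:.2f}  continuum: {cont_score:.2f}")
```

When k-means is forced to split a single homogeneous population, the resulting partition scores noticeably lower than one that recovers genuinely separated groups. This is the kind of contrast a cluster-quality metric is meant to expose.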
Acknowledgements
I am indebted to my thesis advisor, Dr. R. Andrew Taylor, for his constant support and
insight, and to my friends and colleagues for their willingness to discuss these ideas and
serve as valuable sounding boards. This work was made possible through the generous
support of the Yale Summer Research Grant.
None of this would be possible, however, without the love and support of my wife,
Shirin Jamshidian. This work is dedicated to her.
INTRODUCTION
Sepsis Definitions
Machine Learning and Electronic Health Records
AIMS
METHODS
Study Design
Study Setting and Population
Study Protocol
Data Set Creation
Imputation
Autoencoder Training
Clustering
RESULTS AND DISCUSSION
Quality of dimensionality reduction and latent representation
Clustering
Assessing clustering propensity
Assessing ideal number of clusters
Partitional Methods
K-means
K-medoids