Practical Java Machine Learning

Projects with Google Cloud Platform and Amazon Web Services

Mark Wickham

Practical Java Machine Learning: Projects with Google Cloud Platform and Amazon Web Services

ISBN-13 (pbk): 978-1-4842-3950-6 ISBN-13 (electronic): 978-1-4842-3951-3

https://doi.org/10.1007/978-1-4842-3951-3

Library of Congress Control Number: 2018960994

Copyright © 2018 by Mark Wickham

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

Trademarked names, logos, and images may appear in this book. Rather than use a trademark symbol with every occurrence of a trademarked name, logo, or image we use the names, logos, and images only in an editorial fashion and to the benefit of the trademark owner, with no intention of infringement of the trademark.

The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.

Managing Director, Apress Media LLC: Welmoed Spahr

Acquisitions Editor: Steve Anglin

Development Editor: Matthew Moodie

Coordinating Editor: Mark Powers

Cover designed by eStudioCalamar

Cover image designed by Freepik (www.freepik.com)

Distributed to the book trade worldwide by Springer Science+Business Media New York, 233 Spring Street, 6th Floor, New York, NY 10013. Phone 1-800-SPRINGER, fax (201) 348-4505, e-mail orders-ny@springer-sbm.com, or visit www.springeronline.com. Apress Media, LLC is a California LLC and the sole member (owner) is Springer Science + Business Media Finance Inc (SSBM Finance Inc). SSBM Finance Inc is a Delaware corporation.

For information on translations, please e-mail [email protected]; for reprint, paperback, or audio rights, please email [email protected].

Apress titles may be purchased in bulk for academic, corporate, or promotional use. eBook versions and licenses are also available for most titles. For more information, reference our Print and eBook Bulk Sales web page at www.apress.com/bulk-sales.

Any source code or other supplementary material referenced by the author in this book is available to readers on GitHub via the book's product page, located at www.apress.com/9781484239506. For more detailed information, please visit www.apress.com/source-code.

Printed on acid-free paper

Mark Wickham

Irving, TX, USA

Table of Contents

About the Author
About the Technical Reviewer
Preface

Chapter 1: Introduction
1.1 Terminology
1.2 Historical
1.3 Machine Learning Business Case
Machine Learning Hype
Challenges and Concerns
Data Science Platforms
ML Monetization
The Case for Classic Machine Learning on Mobile
1.4 Deep Learning
Identifying DL Applications
1.5 ML-Gates Methodology
ML-Gate 6: Identify the Well-Defined Problem
ML-Gate 5: Acquire Sufficient Data
ML-Gate 4: Process/Clean/Visualize the Data
ML-Gate 3: Generate a Model
ML-Gate 2: Test/Refine the Model
ML-Gate 1: Integrate the Model
ML-Gate 0: Deployment
Methodology Summary
1.6 The Case for Java
Java Market
Java Versions
Installing Java
Java Performance
1.7 Development Environments
Android Studio
Eclipse
NetBeans IDE
1.8 Competitive Advantage
Standing on the Shoulders of Giants
Bridging Domains
1.9 Chapter Summary
Key Findings

Chapter 2: Data: The Fuel for Machine Learning
2.1 Megatrends
Explosion of Data
Highly Scalable Computing Resources
Advancement in Algorithms
2.2 Think Like a Data Scientist
Data Nomenclature
Defining Data
2.3 Data Formats
CSV Files and Apache OpenOffice
ARFF Files
JSON
2.4 JSON Integration
JSON with Android SDK
JSON with Java JDK
2.5 Data Preprocessing
Instances, Attributes, Labels, and Features
Data Type Identification
Missing Values and Duplicates
Erroneous Values and Outliers
Macro Processing with OpenOffice Calc
JSON Validation
2.6 Creating Your Own Data
Wifi Gathering
2.7 Visualization
JavaScript Visualization Libraries
D3 Plus
2.8 Project: D3 Visualization
2.9 Project: Android Data Visualization
2.10 Summary
Key Data Findings

Chapter 3: Leveraging Cloud Platforms
3.1 Introduction
Commercial Cloud Providers
Competitive Positioning
Pricing
3.2 Google Cloud Platform (GCP)
Google Compute Engine (GCE) Virtual Machines (VM)
Google Cloud SDK
Google Cloud Client Libraries
Cloud Tools for Eclipse (CT4E)
GCP Cloud Machine Learning Engine (ML Engine)
GCP Free Tier Pricing Details
3.3 Amazon AWS
AWS Machine Learning
AWS ML Building and Deploying Models
AWS EC2 AMI
Running Weka ML in the AWS Cloud
AWS SageMaker
AWS SDK for Java
AWS Free Tier Pricing Details
3.4 Machine Learning APIs
Using ML REST APIs
Alternative ML API Providers
3.5 Project: GCP Cloud Speech API for Android
Cloud Speech API App Overview
GCP Machine Learning APIs
Cloud Speech API Authentication
Android Audio
Cloud Speech API App Summary
3.6 Cloud Data for Machine Learning
Unstructured Data
NoSQL Databases
NoSQL Data Store Methods
Apache Cassandra Java Interface
3.7 Cloud Platform Summary

Chapter 4: Algorithms: The Brains of Machine Learning
4.1 Introduction
ML-Gate 3
4.2 Algorithm Styles
Labeled vs. Unlabeled Data
4.3 Supervised Learning
4.4 Unsupervised Learning
4.5 Semi-Supervised Learning
4.6 Alternative Learning Styles
Linear Regression Algorithm
Deep Learning Algorithms
Reinforcement Learning
4.7 CML Algorithm Overview
4.8 Choose the Right Algorithm
Functional Algorithm Decision Process
4.9 The Seven Most Useful CML Algorithms
Naive Bayes Algorithm (NB)
Random Forest Algorithm (RF)
K-Nearest Neighbors Algorithm (KNN)
Support Vector Machine Algorithm (SVM)
K-Means Algorithm
DBSCAN Algorithm
Expectation-Maximization (EM) Algorithm
4.10 Algorithm Performance
MNIST Algorithm Evaluation
4.11 Algorithm Analysis
Confusion Matrix
ROC Curves
K-Fold Cross-Validation
4.12 Java Source Code
Classification Algorithms
Clustering Algorithms
Java Algorithm Modification

Chapter 5: Machine Learning Environments
5.1 Overview
ML Gates
5.2 Java ML Environments
Weka
RapidMiner
KNIME
ELKI
Java-ML
5.3 Weka Installation
Weka Configuration
Java Parameters Setup
Modifying Weka .prop Files
Weka Settings
Weka Package Manager
5.4 Weka Overview
Weka Documentation
Weka Explorer
Weka Filters
Weka Explorer Key Options
Weka KnowledgeFlow
Weka Simple CLI
5.5 Weka Clustering Algorithms
Clustering with DBSCAN
Clustering with KnowledgeFlow
5.6 Weka Classification Algorithms
Preprocessing (Data Cleaning)
Classification: Random Forest Algorithm
Classification: K-Nearest Neighbor
Classification: Naive Bayes
Classification: Support Vector Machine
5.7 Weka Model Evaluation
Multiple ROC Curves
5.8 Weka Importing and Exporting

Chapter 6: Integrating Models
6.1 Introduction
6.2 Managing Models
Device Constraints
Optimal Model Size
Model Version Control
Updating Models
Managing Models: Best Practices
6.3 Weka Java API
Loading Data
Working with Options
Applying Filters
Setting the Label Attribute
Building a Classifier
Training and Testing
Building a Clusterer
Loading Models
Making Predictions
6.4 Weka for Android
Creating Android Weka Libraries in Eclipse
Adding the Weka Library in Android Studio
6.5 Android Integration
Project: Weka Model Create
Project: Weka Model Load
6.6 Android Weka Model Performance
6.7 Raspberry Pi Integration
Raspberry Pi Setup for ML
Raspberry Pi GUI Considerations
Weka API Library for Raspberry Pi
Project: Raspberry Pi Old Faithful Geyser Classifier
6.8 Sensor Data
Android Sensors
Raspberry Pi with Sensors
Sensor Units of Measure
Project: Android Activity Tracker
6.9 Weka License Notes

Index

About the Author

Mark Wickham is a frequent speaker at Android developer conferences and has written two books, Practical Android and Practical Java Machine Learning. A freelance Android developer, Mark currently resides in Dallas, TX, after living and working in China for nearly 20 years. While at Motorola, Mark led product management, product marketing, and software development teams in the Asia Pacific region. Before joining Motorola, Mark worked on software projects for TRW’s Space Systems Division. Mark has a degree in Computer Science and Physics from Creighton University and an MBA from the University of Washington, and he jointly studied business at the Hong Kong University of Science and Technology. In his free time, Mark also enjoys photography and recording live music. Mark can be contacted via his LinkedIn profile (www.linkedin.com/in/mark-j-wickham/) or GitHub page (www.github.com/wickapps).


About the Technical Reviewer

Jason Whitehorn is an experienced entrepreneur and software developer. He has helped many oil and gas companies automate and enhance their oilfield solutions through field data capture, SCADA, and machine learning. Jason obtained his Bachelor of Science in Computer Science from Arkansas State University, but he traces his passion for development back many years before then, having first taught himself to program BASIC on his family’s computer while still in middle school.

When he’s not mentoring and helping his team at work, writing, or pursuing one of his many side projects, Jason enjoys spending time with his wife and four children and living in the Tulsa, Oklahoma region. More information about Jason can be found on his website at https://jason.whitehorn.us.

Preface

It is interesting to watch trends in software development come and go, and to watch languages become fashionable, and then just as quickly fade away. As machine learning and AI began to reemerge a few years ago, it was easy to look upon the hype with a great deal of skepticism.

• AlphaGo, a program developed by the UK-based company DeepMind, used deep learning to defeat the Go masters. Go is a Chinese board game that is very complicated due to its huge number of possible combinations. I was living in China at the time, and there was a lot of discussion about the panicked Go masters who refused to play the machine for fear that their techniques would be exposed or "learned" by the machines.

• An AI poker bot named Libratus individually defeated four top human professional players in 2017. This was surprising because poker is a difficult game for machines to master. In poker, unlike Go, there is a lot of unknown information, making it an "imperfect information" game.

• Machine traders are replacing human traders at many of the large investment banks. The rise of the "quant" on Wall Street is well documented. Examining the job opportunities at investment banks reveals a trend favoring math majors, data scientists, and machine learning experts.

• IBM's Watson can do amazing things, such as fix an elevator before it breaks, adjust the sprinkler system in a vineyard to optimize yield, and help oilfield workers manage a drilling rig.

Despite the hype, it was not until I was confronted with problems that were very difficult to solve with existing software tools that I began to explore and appreciate the power of machine learning techniques.

Today, after several years spent learning what these new techniques can do and how to apply them, I find myself thinking differently about each problem I encounter. Almost every piece of software can benefit in some way from machine learning techniques.

Developing machine learning software requires us to think differently about problems, resulting in a new way to partition our development efforts. However, change is good, and using machine learning with a data-driven development methodology can allow us to solve previously unsolvable problems.

In this book, I will describe what I have discovered along my journey. I hope that it can help you in your future software endeavors.

Objectives

The book will meet the following objectives:

• Introduce readers to the exciting developments in the AI subfield of machine learning (ML). The book will summarize the types of problems machine learning can solve. Without machine learning, such solutions would be very difficult to accomplish.

• Help readers understand the importance of data as the critical input for any machine learning solution, and how to identify, organize, and architect the data required for ML. The book will also cover strategies and techniques for visualizing and preprocessing data using available Java packages, helping readers who know Java to become more proficient in data science.

• Explore how to deploy ML solutions in conjunction with cloud service providers such as Google and Amazon.

• Focus exclusively on Java libraries and Java-based solutions for ML. The book will NOT cover other popular ML languages such as Python or C++.

• Focus on classic machine learning solutions. The book will not cover implementations of deep learning, which uses neural networks. Deep learning is a topic that requires a complete text of its own for proper exploration.

• Provide readers with an overview of ML algorithms. Rather than covering these algorithms from a mathematical viewpoint, the book will present a practical review of the algorithms and explain which algorithm to select for a particular problem.

• Introduce readers to the most important Java-based ML platforms. The book will provide a deep dive into the popular Weka Java environment and show readers how to port the latest Weka version to Android.

• Take advantage of the ease with which Java developers can transition to the Android mobile platform: the book will show readers how to deploy ML apps for Android devices using the Weka API.

• One of the fastest-growing sources of data is sensor data. Embedded devices often produce sensor data, creating a significant opportunity to deploy ML solutions for these devices. The book will show readers how to implement ML solutions for sensor data using Java.

Audience

This book is intended for the following audiences:

• Developers looking to implement ML solutions for Java platforms

• Data scientists looking to explore Java implementation options

• Business decision makers looking to explore entry into machine learning for their organizations

The book will be of most value to experienced Java developers who have not implemented ML techniques before. The book will explain the various ML techniques that are now feasible due to recent advances in performance, storage, and algorithms.
