Analytics for the Internet of Things (IoT)
Intelligent Analytics for Your Intelligent Devices
Andrew Minteer
BIRMINGHAM - MUMBAI
Analytics for the Internet of Things (IoT)
Copyright © 2017 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or
transmitted in any form or by any means, without the prior written permission of the
publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the
information presented. However, the information contained in this book is sold without
warranty, either express or implied. Neither the author, nor Packt Publishing, and its
dealers and distributors will be held liable for any damages caused or alleged to be caused
directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the
companies and products mentioned in this book by the appropriate use of capitals.
However, Packt Publishing cannot guarantee the accuracy of this information.
First published: July 2017
Production reference: 1210717
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.
ISBN 978-1-78712-073-0
www.packtpub.com
Credits
Author
Andrew Minteer
Copy Editor
Yesha Gangani
Reviewer
Ruben Oliva Ramos
Project Coordinator
Judie Jose
Commissioning Editor
Kartikey Pandey
Proofreader
Safis Editing
Acquisition Editor
Namrata Patil
Indexer
Aishwarya Gangawane
Content Development Editor
Abhishek Jadhav
Graphics
Kirk D'Penha
Technical Editor
Prachi Sawant
Production Coordinator
Aparna Bhagat
About the Author
Andrew Minteer is currently the senior director, data science and research at a leading
global retail company. Prior to that, he served as the director, IoT Analytics and Machine
Learning at a Fortune 500 manufacturing company.
He has an MBA from Indiana University and a background in statistics, software
development, database design, and cloud architecture, and he has led analytics teams for
over 10 years.
He first taught himself to program on an Atari 800 computer at the age of 11 and fondly
remembers the frustration of waiting through 20 minutes of beeps and static to load a
100-line program. He now thoroughly enjoys launching a 1 TB GPU-backed cloud instance in a
few minutes and getting right to work.
Andrew is a private pilot who looks forward to spending some time in the air sometime
soon. He enjoys kayaking, camping, traveling the world, and playing around with his
six-year-old son and three-year-old daughter.
I would like to thank my ever-patient wife, Julie, for her constant support and tolerance of
so many nights and weekends spent working on this technical book. I also thank her for
credibly convincing me that this book was not actually a sleep aid; she was just tired from
watching the kids. I also want to thank my energetic little princess-dress-wearing
daughter, Olivia, and my intelligent Lego-wielding son, Max, for inspiring me to keep at
it. Thank you to my family for your constant support and encouragement, especially my
father who I suspect is more excited about this book than I am.
While I am thanking everyone, I want to give a shout-out to all the fantastic people I have
worked with over the years, both bosses and colleagues. I have learned far more from them
than they have from me. I have been truly lucky to work with such talented people.
Last but not least, I want to thank all my editors and reviewers for their comments and
insights in developing this book.
I hope you, the reader, not only learn a lot about analytics for IoT but also enjoy the
experience.
About the Reviewer
Ruben Oliva Ramos is a computer systems engineer from Tecnologico of León Institute,
with a master's degree in computer and electronic systems engineering, teleinformatics, and
networking specialization from the University of Salle Bajio in Leon, Guanajuato, Mexico.
He has more than 5 years of experience in developing web applications to control and
monitor devices connected with Arduino and Raspberry Pi using web frameworks and
cloud services to build applications using the Internet of Things.
He is a mechatronics teacher at the University of Salle Bajio, where he teaches students in
the master's degree program in design and engineering of mechatronics systems. He also
works at Centro de Bachillerato Tecnologico Industrial 225 in Leon, Guanajuato, Mexico,
teaching subjects such as electronics, robotics and control, automation, and
microcontrollers in the Mechatronics Technician career track. He is also a consultant and
developer on projects in areas such as monitoring systems and datalogger data, using
technologies such as Android, iOS, Windows Phone, HTML5, PHP, CSS, Ajax, JavaScript,
Angular, and ASP.NET; databases such as SQLite, MongoDB, and MySQL; web servers such as
Node.js and IIS; and hardware programming with Arduino, Raspberry Pi, Ethernet Shield, GPS
and GSM/GPRS, and ESP8266, as well as control and monitoring systems for data acquisition
and programming.
I would like to thank my savior and lord, Jesus Christ, for giving me strength and courage
to pursue this project; my dearest wife, Mayte; our two lovely sons, Ruben and Dario;
and my father (Ruben), my dearest mom (Rosalia), my brother (Juan Tomas), and my sister
(Rosalia), whom I love, for all their support while reviewing this book, for allowing me to
pursue my dream, and for tolerating my not being with them after my busy day job.
Table of Contents
Preface
Chapter 1: Defining IoT Analytics and Challenges
The situation
Defining IoT analytics
Defining analytics
Defining the Internet of Things
The concept of constrained
IoT analytics challenges
The data volume
Problems with time
Problems with space
Data quality
Analytics challenges
Business value concerns
Summary
Chapter 2: IoT Devices and Networking Protocols
IoT devices
The wild world of IoT devices
Healthcare
Manufacturing
Transportation and logistics
Retail
Oil and gas
Home automation or monitoring
Wearables
Sensor types
Networking basics
IoT networking connectivity protocols
Connectivity protocols (when the available power is limited)
Bluetooth Low Energy (also called Bluetooth Smart)
6LoWPAN
ZigBee
Advantages of ZigBee
Disadvantages of ZigBee
Common use cases
NFC
Common use cases
Sigfox
Connectivity protocols (when power is not a problem)
Wi-Fi
Common use cases
Cellular (4G/LTE)
Common use cases
IoT networking data messaging protocols
Message Queue Telemetry Transport (MQTT)
Topics
Advantages to MQTT
Disadvantages to MQTT
QoS levels
QoS 0
QoS 1
QoS 2
Last Will and Testament (LWT)
Tips for analytics
Common use cases
Hyper-Text Transport Protocol (HTTP)
Representational State Transfer (REST) principles
HTTP and IoT
Advantages to HTTP
Disadvantages to HTTP
Constrained Application Protocol (CoAP)
Advantages to CoAP
Disadvantages to CoAP
Message reliability
Common use cases
Data Distribution Service (DDS)
Common use cases
Analyzing data to infer protocol and device characteristics
Summary
Chapter 3: IoT Analytics for the Cloud
Building elastic analytics
What is cloud infrastructure?
Elastic analytics concepts
Design with the endgame in mind
Designing for scale
Decouple key components
Encapsulate analytics
Decoupling with message queues
Distributed computing
Avoid containing analytics to one server
When to use distributed and when to use one server
Assuming that change is constant
Leverage managed services
Use Application Programming Interfaces (API)
Cloud security and analytics
Public/private keys
Public versus private subnets
Access restrictions
Securing customer data
The AWS overview
AWS key concepts
Regions
Availability Zones
Subnet
Security groups
AWS key core services
Virtual Private Cloud (VPC)
Identity and Access Management (IAM)
Elastic Compute (EC2)
Simple Storage Service (S3)
AWS key services for IoT analytics
Amazon Simple Queue Service (SQS)
Amazon Elastic Map Reduce (EMR)
AWS machine learning
Amazon Relational Database Service (RDS)
Amazon Redshift
Microsoft Azure overview
Azure Data Lake Store
Azure Analysis Services
HDInsight
The R server option
The ThingWorx overview
ThingWorx Core
ThingWorx Connection Services
ThingWorx Edge
ThingWorx concepts
Thing templates
Things
Properties
Services
Events
Thing shapes
Data shapes
Entities
Summary
Chapter 4: Creating an AWS Cloud Analytics Environment
The AWS CloudFormation overview
The AWS Virtual Private Cloud (VPC) setup walk-through
Creating a key pair for the NAT and bastion instances
Creating an S3 bucket to store data
Creating a VPC for IoT Analytics
What is a NAT gateway?
What is a bastion host?
Your VPC architecture
The VPC Creation walk-through
How to terminate and clean up the environment
Summary
Chapter 5: Collecting All That Data - Strategies and Techniques
Designing data processing for analytics
Amazon Kinesis
AWS Lambda
AWS Athena
The AWS IoT platform
Microsoft Azure IoT Hub
Applying big data technology to storage
Hadoop
Hadoop cluster architectures
What is a Node?
Node types
Hadoop Distributed File System
Parquet
Avro
Hive
Serialization/Deserialization (SerDe)
Hadoop MapReduce
Yet Another Resource Negotiator (YARN)
HBase
Amazon DynamoDB
Amazon S3
Apache Spark for data processing
What is Apache Spark?
Spark and big data analytics
Thinking about a single machine versus a cluster of machines
Using Spark for IoT data processing
To stream or not to stream
Lambda architectures
Handling change
Summary
Chapter 6: Getting to Know Your Data - Exploring IoT Data
Exploring and visualizing data
The Tableau overview
Techniques to understand data quality
Look at your data - au naturel
Data completeness
Data validity
Assessing Information Lag
Representativeness
Basic time series analysis
What is meant by time series?
Applying time series analysis
Get to know categories in the data
Bring in geography
Look for attributes that might have predictive value
R (the pirate's language...if he was a statistician)
Installing R and RStudio
Using R for statistical analysis
Summing it all up
Solving industry-specific analysis problems
Manufacturing
Healthcare
Retail
Summary
Chapter 7: Decorating Your Data - Adding External Datasets to Innovate
Adding internal datasets
Which ones and why?
Customer information
Production data
Field services
Financial
Adding external datasets
External datasets - geography
Elevation
SRTM elevation
National Elevation Dataset (NED)
Weather
Geographical features
Planet.osm
Google Maps API
USGS national transportation datasets
External datasets - demographic
The U.S. Census Bureau
CIA World Factbook
External datasets - economic
Organization for Economic Cooperation and Development (OECD)
Federal Reserve Economic Data (FRED)
Summary
Chapter 8: Communicating with Others - Visualization and Dashboarding
Common mistakes when designing visuals
The Hierarchy of Questions method
The Hierarchy of Questions method overview
Developing question trees
Pulling together the data
Aligning views with question flows
Designing visual analysis for IoT data
Using layout positioning to convey importance
Use color to highlight important data
The impact of using a single color to communicate importance
Be consistent across visuals
Make charts easy to interpret
Creating a dashboard with Tableau
The dashboard walk-through
Hierarchy of Questions example
Aligning visuals to the thought process
Creating individual views
Assembling views into a dashboard
Creating and visualizing alerts
Alert principles
Organizing alerts using a Tableau dashboard
Summary
Chapter 9: Applying Geospatial Analytics to IoT Data
Why do you need geospatial analytics for IoT?
The basics of geospatial analysis
Welcome to Null Island
Coordinate Reference Systems
The Earth is not a ball
Vector-based methods
The bounding box
Contains
Buffer
Dilation and erosion
Simplify
Vector summary
Raster-based methods
Storing geospatial data
File formats
Spatial extensions for relational databases
Storing geospatial data in HDFS
Spatial indexing
R-tree
Processing geospatial data
Geospatial analysis software
ArcGIS
QGIS
ogr2ogr
PostGIS spatial functions
Geospatial analysis in the big data world
Solving the pollution reporting problem
Summary
Chapter 10: Data Science for IoT Analytics
Machine learning (ML)
What is machine learning?
Representation
Evaluation
Optimization
Generalization
Feature engineering with IoT data
Dealing with missing values
Centering and scaling
Time series handling
Validation methods
Cross-validation
Test set
Precision, recall, and specificity
Understanding the bias–variance tradeoff
Bias
Variance