Essentials of Business Analytics
International Series in
Operations Research & Management Science
Bhimasankaram Pochiraju
Sridhar Seshadri Editors
Essentials
of Business
Analytics
An Introduction to the Methodology
and its Applications
International Series in Operations Research
& Management Science
Volume 264
Series Editor
Camille C. Price
Stephen F. Austin State University, TX, USA
Associate Series Editor
Joe Zhu
Worcester Polytechnic Institute, MA, USA
Founding Series Editor
Frederick S. Hillier, Stanford University, CA, USA
More information about this series at http://www.springer.com/series/6161
Editors
Bhimasankaram Pochiraju
Applied Statistics and Computing Lab
Indian School of Business
Hyderabad, Telangana, India
Sridhar Seshadri
Gies College of Business
University of Illinois at Urbana Champaign
Champaign, IL, USA
ISSN 0884-8289 ISSN 2214-7934 (electronic)
International Series in Operations Research & Management Science
ISBN 978-3-319-68836-7 ISBN 978-3-319-68837-4 (eBook)
https://doi.org/10.1007/978-3-319-68837-4
© Springer Nature Switzerland AG 2019
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of
the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information
storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology
now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, express or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Switzerland AG.
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Professor Bhimasankaram: With the divine
blessings of Bhagawan Sri Sri Sri Satya Sai
Baba, I dedicate this book to my parents—Sri
Pochiraju Rama Rao and Smt. Venkata
Ratnamma.
Sridhar Seshadri: I dedicate this book to
the memory of my parents, Smt. Ranganayaki
and Sri Desikachari Seshadri, my
father-in-law, Sri Kalyana Srinivasan
Ayodhyanath, and my dear friend,
collaborator and advisor, Professor
Bhimasankaram.
Contents
1 Introduction
Sridhar Seshadri
Part I Tools
2 Data Collection
Sudhir Voleti
3 Data Management—Relational Database Systems (RDBMS)
Hemanth Kumar Dasararaju and Peeyush Taori
4 Big Data Management
Peeyush Taori and Hemanth Kumar Dasararaju
5 Data Visualization
John F. Tripp
6 Statistical Methods: Basic Inferences
Vishnuprasad Nagadevara
7 Statistical Methods: Regression Analysis
Bhimasankaram Pochiraju and Hema Sri Sai Kollipara
8 Advanced Regression Analysis
Vishnuprasad Nagadevara
9 Text Analytics
Sudhir Voleti
Part II Modeling Methods
10 Simulation
Sumit Kunnumkal
11 Introduction to Optimization
Milind G. Sohoni
12 Forecasting Analytics
Konstantinos I. Nikolopoulos and Dimitrios D. Thomakos
13 Count Data Regression
Thriyambakam Krishnan
14 Survival Analysis
Thriyambakam Krishnan
15 Machine Learning (Unsupervised)
Shailesh Kumar
16 Machine Learning (Supervised)
Shailesh Kumar
17 Deep Learning
Manish Gupta
Part III Applications
18 Retail Analytics
Ramandeep S. Randhawa
19 Marketing Analytics
S. Arunachalam and Amalesh Sharma
20 Financial Analytics
Krishnamurthy Vaidyanathan
21 Social Media and Web Analytics
Vishnuprasad Nagadevara
22 Healthcare Analytics
Maqbool (Mac) Dada and Chester Chambers
23 Pricing Analytics
Kalyan Talluri and Sridhar Seshadri
24 Supply Chain Analytics
Yao Zhao
25 Case Study: Ideal Insurance
Deepak Agrawal and Soumithri Mamidipudi
26 Case Study: AAA Airline
Deepak Agrawal, Hema Sri Sai Kollipara, and Soumithri Mamidipudi
27 Case Study: InfoMedia Solutions
Deepak Agrawal, Soumithri Mamidipudi, and Sriram Padmanabhan
28 Introduction to R
Peeyush Taori and Hemanth Kumar Dasararaju
29 Introduction to Python
Peeyush Taori and Hemanth Kumar Dasararaju
30 Probability and Statistics
Peeyush Taori, Soumithri Mamidipudi, and Deepak Agrawal
Index
Disclaimer
This book contains information obtained from authentic and highly regarded
sources. Reasonable efforts have been made to publish reliable data and information,
but the author and publisher cannot assume responsibility for the validity of
all materials or the consequences of their use. The authors and publishers have
attempted to trace the copyright holders of all material reproduced in this publication
and apologize to copyright holders if permission to publish in this form has not been
obtained. If any copyright material has not been acknowledged please write and let
us know so we may rectify in any future reprint.
Acknowledgements
This book is the outcome of a truly collaborative effort amongst many people who
have contributed in different ways. We are deeply thankful to all the contributing
authors for their ideas and support. The book belongs to them. This book would not
have been possible without the help of Deepak Agrawal. Deepak helped in every
way, from editorial work, solution support, programming help, to coordination with
authors and researchers, and many more things. Soumithri Mamidipudi provided
editorial support, helped with writing summaries of every chapter, and proof-edited
the probability and statistics appendix and cases. Padmavati Sridhar provided editorial support for many chapters. Two associate alumni—Ramakrishna Vempati and
Suryanarayana Ambatipudi—of the Certificate Programme in Business Analytics
(CBA) at Indian School of Business (ISB) helped with locating contemporary
examples and references. They suggested examples for the Retail Analytics and
Supply Chain Analytics chapters. Ramakrishna also contributed to the draft of the
Big Data chapter. Several researchers in the Applied Statistics and Computing
Lab (ASC Lab) at ISB helped in many ways. Hema Sri Sai Kollipara provided
support for the cases, exercises, and technical and statistics support for various
chapters. Aditya Taori helped with examples for the machine learning chapters
and exercises. Saurabh Jugalkishor contributed examples for the machine learning
chapters. The ASC Lab’s researchers and Hemanth Kumar provided technical
support in preparing solutions for various examples referred to in the chapters. Ashish
Khandelwal, Fellow Program student at ISB, helped with the chapter on Linear
Regression. Dr. Kumar Eswaran and Joy Mustafi provided additional thoughts for
the Unsupervised Learning chapter. The editorial team comprising Faith Su, Mathew
Amboy and series editor Camille Price gave immense support during the book
proposal stage, guidance during editing, production, etc. The ASC Lab provided
the research support for this project.
We thank our families for their constant support during the 2-year-long project.
We thank each and every person associated with us during the beautiful journey of
writing this book.
Contributors
Deepak Agrawal Indian School of Business, Hyderabad, Telangana, India
S. Arunachalam Indian School of Business, Hyderabad, Telangana, India
Chester Chambers Carey Business School, Johns Hopkins University, Baltimore,
MD, USA
Maqbool (Mac) Dada Carey Business School, Johns Hopkins University, Baltimore, MD, USA
Manish Gupta Microsoft Corporation, Hyderabad, India
Hema Sri Sai Kollipara Indian School of Business, Hyderabad, Telangana, India
Thriyambakam Krishnan Chennai Mathematical Institute, Chennai, India
Shailesh Kumar Reliance Jio, Navi Mumbai, Maharashtra, India
Hemanth Kumar Dasararaju Indian School of Business, Hyderabad, Telangana,
India
Sumit Kunnumkal Indian School of Business, Hyderabad, Telangana, India
Soumithri Mamidipudi Indian School of Business, Hyderabad, Telangana, India
Vishnuprasad Nagadevara IIM-Bangalore, Bengaluru, Karnataka, India
Konstantinos I. Nikolopoulos Bangor Business School, Bangor, Gwynedd, UK
Sriram Padmanabhan New York, NY, USA
Bhimasankaram Pochiraju Applied Statistics and Computing Lab, Indian School
of Business, Hyderabad, Telangana, India
Ramandeep S. Randhawa Marshall School of Business, University of Southern
California, Los Angeles, CA, USA
Sridhar Seshadri Gies College of Business, University of Illinois at Urbana
Champaign, Champaign, IL, USA
Amalesh Sharma Texas A&M University, College Station, TX, USA
Milind G. Sohoni Indian School of Business, Hyderabad, Telangana, India
Kalyan Talluri Imperial College Business School, South Kensington, London, UK
Peeyush Taori London Business School, London, UK
Dimitrios D. Thomakos University of Peloponnese, Tripoli, Greece
John F. Tripp Clemson University, Clemson, SC, USA
Krishnamurthy Vaidyanathan Indian School of Business, Hyderabad,
Telangana, India
Sudhir Voleti Indian School of Business, Hyderabad, Telangana, India
Yao Zhao Rutgers University, Newark, NJ, USA
Chapter 1
Introduction
Sridhar Seshadri
Business analytics is the science of posing and answering data questions related to
business. It has expanded rapidly in the last few years to include
tools drawn from statistics, data management, data visualization, and machine learning.
There is increasing emphasis on big data handling to assimilate the advances
made in data sciences. As is often the case with applied methodologies, business
analytics has to be soundly grounded in applications in various disciplines and
business verticals to be valuable. The bridge between the tools and the applications
is the set of modeling methods used by managers and researchers in disciplines such as
finance, marketing, and operations. This book covers all three aspects:
tools, modeling methods, and applications.
The purpose of the book is to fill the void in graduate-level study
materials for addressing business problems, with a threefold aim: to pose data questions, to obtain
optimal business solutions via analytics theory, and to ground the solutions in practice.
To make the material self-contained, we have endeavored to provide ample
cases and data sets for practice and testing of tools. Each chapter comes
with data, examples, and exercises showing students what questions to ask, how to
apply the techniques using open source software, and how to interpret the results. In
our approach, simple examples are followed by medium to large applications and
solutions. The book can also serve as a self-study guide for professionals who wish
to enhance their knowledge of the field.
The distinctive features of the book are as follows:
• The chapters are written by experts from universities and industry.
• The main software tools used are R, Python, MS Excel, and MySQL. These are all
topical and widely used in industry.
B. Pochiraju, S. Seshadri (eds.), Essentials of Business Analytics, International
Series in Operations Research & Management Science 264,
https://doi.org/10.1007/978-3-319-68837-4_1
• Extreme care has been taken to ensure continuity from one chapter to the next.
The editors have attempted to make sure that the content and flow are similar in
every chapter.
• In Part A of the book, the tools and modeling methodology are developed in
detail. Then this methodology is applied to solve business problems in various
verticals in Part B. Part C contains larger case studies.
• The Appendices cover required material on Probability theory, R, and Python, as
these serve as prerequisites for the main text.
The structure of each chapter is as follows:
• Each chapter has a business orientation. It starts with business problems, which
are transformed into technological problems. Methodology is developed to solve
the technological problems. Data analysis is done using suitable software and the
output and results are clearly explained at each stage of development. Finally, the
technological solution is transformed back to a business solution. The chapters
conclude with suggestions for further reading and a list of references.
• Exercises (with real data sets when applicable) are at the end of each chapter and
on the Web to test and enhance the understanding of the concepts and application.
• Caselets are used to illustrate the concepts in several chapters.
1 Detailed Description of Chapters
Data Collection: This chapter introduces the concepts of data collection and
problem formulation. Firstly, it establishes the foundation upon which the fields
of data sciences and analytics are based, and defines core concepts that will be used
throughout the rest of the book. The chapter starts by discussing the types of data
that can be gathered, and the common pitfalls that can occur when data analytics
does not take into account the nature of the data being used. It distinguishes between
primary and secondary data sources using examples, and provides a detailed
explanation of the advantages and constraints of each type of data. Following this,
the chapter details the types of data that can be collected and sorted. It discusses the
difference between nominal-, ordinal-, interval-, and ratio-based data and the ways
in which they can be used to obtain insights into the subject being studied.
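The four measurement scales can be illustrated with a short Python sketch. This is only an illustration under stated assumptions and is not taken from the chapter: the records, field names, and values below are invented, and the point is simply which summary is meaningful on which scale.

```python
from collections import Counter
from statistics import mean

# Hypothetical survey records, invented purely for illustration.
records = [
    {"store": "North", "satisfaction": "high", "temp_c": 22.0, "spend": 40.0},
    {"store": "South", "satisfaction": "low", "temp_c": 18.0, "spend": 10.0},
    {"store": "North", "satisfaction": "medium", "temp_c": 20.0, "spend": 20.0},
    {"store": "North", "satisfaction": "high", "temp_c": 24.0, "spend": 30.0},
    {"store": "East", "satisfaction": "medium", "temp_c": 16.0, "spend": 50.0},
]

# Nominal: labels with no order, so only counts and the mode are meaningful.
store_counts = Counter(r["store"] for r in records)
modal_store = store_counts.most_common(1)[0][0]

# Ordinal: ordered labels, so ranking and the median are meaningful,
# but the distance between levels is not defined.
LEVELS = ["low", "medium", "high"]
ranks = sorted(LEVELS.index(r["satisfaction"]) for r in records)
median_satisfaction = LEVELS[ranks[len(ranks) // 2]]

# Interval: differences are meaningful, but zero is arbitrary (0 degrees
# Celsius is not "no temperature"), so ratios should not be interpreted.
temps = [r["temp_c"] for r in records]
temp_range = max(temps) - min(temps)
mean_temp = mean(temps)

# Ratio: a true zero exists, so statements such as "five times as much"
# are meaningful.
spends = [r["spend"] for r in records]
spend_ratio = max(spends) / min(spends)

print(modal_store, median_satisfaction, temp_range, mean_temp, spend_ratio)
```

Each scale supports the summaries of the scales before it plus one more, which is why the scale of a variable should be settled before an analysis method is chosen.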
The chapter then discusses problem formulation and its importance. It explains
how and why formulating a problem will impact the data that is gathered, and
thus affect the conclusions at which a research project may arrive. It describes
a framework by which a messy real-world situation can be clarified so that a
mathematical toolkit can be used to identify solutions. The chapter explains the
idea of decision-problems, which can be used to understand the real world, and
research-objectives, which can be used to analyze decision-problems.
The chapter also details the challenges faced when collecting and collating data.
It discusses the importance of understanding what data to collect, how to collect it,
how to assess its quality, and finally the most appropriate way of collating it so that
it does not lose its value.
The chapter ends with an illustrative example of how the retail industry might
use various sources of data to better serve its customers and understand
their preferences.
Data Management—Relational Database Management Systems: This chapter
introduces the idea of data management and storage. The focus of the chapter
is on relational database management systems (RDBMS), the most
commonly used data organization systems in enterprises. The chapter introduces and
explains the ideas using MySQL, an open-source RDBMS that is queried through
Structured Query Language (SQL) and is used in many of the largest data management systems in the world.
The chapter describes the basic functions of a MySQL server, such as creating
databases, examining data tables, and performing functions and various operations
on data sets. The first set of instructions the chapter discusses is about the rules,
definition, and creation of relational databases. Then, the chapter describes how to
create tables and add data to them using MySQL server commands. It explains how
to examine the data present in the tables using the SELECT command.
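The create, insert, and SELECT workflow described above can be sketched as follows. This is a minimal sketch and not the chapter's own code. As an assumption made so that the example runs without a database server, it uses SQLite from the Python standard library as a stand-in for a MySQL server; the table name, column names, and rows are hypothetical. The SQL statements follow the same pattern in MySQL, although column types and the placeholder syntax of client libraries differ between systems.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for a MySQL server.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Define a relational table (hypothetical schema).
cur.execute(
    """
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        city        TEXT NOT NULL
    )
    """
)

# Add rows; the "?" placeholders keep data separate from the SQL text.
cur.executemany(
    "INSERT INTO customers (customer_id, name, city) VALUES (?, ?, ?)",
    [(1, "Asha", "Hyderabad"), (2, "Ben", "Champaign"), (3, "Chitra", "Hyderabad")],
)
conn.commit()

# Examine the data with SELECT, filtering and ordering the result.
cur.execute(
    "SELECT name FROM customers WHERE city = ? ORDER BY name",
    ("Hyderabad",),
)
hyderabad_customers = [row[0] for row in cur.fetchall()]

# Aggregate over the whole table.
cur.execute("SELECT COUNT(*) FROM customers")
total_customers = cur.fetchone()[0]

conn.close()
print(hyderabad_customers, total_customers)
```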
Data Management—Big Data: This chapter builds on some of the concepts
introduced in the previous chapter but focuses on big data. It describes what
really constitutes big data and presents the basics of big data tools such as
Hadoop and Spark, together with their surrounding ecosystem.
The chapter begins by describing Hadoop’s uses and key features, as well as the
programs in its ecosystem that can also be used in conjunction with it. It also briefly
visits the concepts of distributed and parallel computing and big data cloud.
The chapter describes the architecture of the Hadoop runtime environment. It
starts by describing the cluster, which is the set of host machines, or nodes for
facilitating data access. It then moves on to the YARN infrastructure, which is
responsible for providing computational resources to the application. It describes
two main elements of the YARN infrastructure—the Resource Manager and the
Node Manager. It then details the HDFS Federation, which provides storage,
and also discusses other storage solutions. Lastly, it discusses the MapReduce
framework, which is the software layer.
The chapter then describes the functions of MapReduce in detail. MapReduce
divides tasks into subtasks, which it runs in parallel in order to increase efficiency. The chapter
discusses the manner in which MapReduce takes lists of input data and transforms
them into lists of output data by applying a “map” process followed by a “reduce”
process that aggregates the intermediate results. It describes in detail the steps that MapReduce
takes to produce the output, and describes how Python can be used to create
a MapReduce process for a word count program.
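The map and reduce steps of a word count can be sketched in plain Python. This is an illustrative sketch only and is not the program from the chapter: it runs in a single process, whereas Hadoop would distribute the map and reduce tasks across the nodes of a cluster. The function names and the sample input are hypothetical.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle_phase(pairs):
    """Shuffle: group the emitted values by key so each word has one list."""
    grouped = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped

def reduce_phase(word, counts):
    """Reduce: aggregate the values collected for a single key."""
    return word, sum(counts)

def word_count(lines):
    """Run map, shuffle, and reduce in sequence over a list of lines."""
    mapped = []
    for line in lines:  # on a cluster, each line or block could be mapped in parallel
        mapped.extend(map_phase(line))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())

# Hypothetical input, invented for illustration.
documents = ["big data tools", "big data big ideas"]
counts = word_count(documents)
print(counts)
```

Because each call to map_phase depends only on its own line and each call to reduce_phase depends only on its own key, both steps can be split into independent subtasks, which is the property that allows them to run in parallel.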
The chapter briefly describes Spark and an application using Spark. It concludes
with a discussion about cloud storage. The chapter uses a distributable Cloudera virtual
machine (VM) for its hands-on exercises.