Essentials of Business Analytics

International Series in

Operations Research & Management Science

Bhimasankaram Pochiraju

Sridhar Seshadri Editors

Essentials

of Business

Analytics

An Introduction to the Methodology

and its Applications

International Series in Operations Research

& Management Science

Volume 264

Series Editor

Camille C. Price

Stephen F. Austin State University, TX, USA

Associate Series Editor

Joe Zhu

Worcester Polytechnic Institute, MA, USA

Founding Series Editor

Frederick S. Hillier, Stanford University, CA, USA

More information about this series at http://www.springer.com/series/6161


Editors

Bhimasankaram Pochiraju

Applied Statistics and Computing Lab

Indian School of Business

Hyderabad, Telangana, India

Sridhar Seshadri

Gies College of Business

University of Illinois at Urbana Champaign

Champaign, IL, USA

ISSN 0884-8289 ISSN 2214-7934 (electronic)

International Series in Operations Research & Management Science

ISBN 978-3-319-68836-7 ISBN 978-3-319-68837-4 (eBook)

https://doi.org/10.1007/978-3-319-68837-4

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of

the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,

broadcasting, reproduction on microfilms or in any other physical way, and transmission or information

storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology

now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication

does not imply, even in the absence of a specific statement, that such names are exempt from the relevant

protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book

are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or

the editors give a warranty, express or implied, with respect to the material contained herein or for any

errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional

claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG.

The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Professor Bhimasankaram: With the divine

blessings of Bhagawan Sri Sri Sri Satya Sai

Baba, I dedicate this book to my parents—Sri

Pochiraju Rama Rao and Smt. Venkata

Ratnamma.

Sridhar Seshadri: I dedicate this book to

the memory of my parents, Smt. Ranganayaki

and Sri Desikachari Seshadri, my

father-in-law, Sri Kalyana Srinivasan

Ayodhyanath, and my dear friend,

collaborator and advisor, Professor

Bhimasankaram.

Contents

1 Introduction
Sridhar Seshadri

Part I Tools

2 Data Collection
Sudhir Voleti

3 Data Management—Relational Database Systems (RDBMS)
Hemanth Kumar Dasararaju and Peeyush Taori

4 Big Data Management
Peeyush Taori and Hemanth Kumar Dasararaju

5 Data Visualization
John F. Tripp

6 Statistical Methods: Basic Inferences
Vishnuprasad Nagadevara

7 Statistical Methods: Regression Analysis
Bhimasankaram Pochiraju and Hema Sri Sai Kollipara

8 Advanced Regression Analysis
Vishnuprasad Nagadevara

9 Text Analytics
Sudhir Voleti

Part II Modeling Methods

10 Simulation
Sumit Kunnumkal

11 Introduction to Optimization
Milind G. Sohoni

12 Forecasting Analytics
Konstantinos I. Nikolopoulos and Dimitrios D. Thomakos

13 Count Data Regression
Thriyambakam Krishnan

14 Survival Analysis
Thriyambakam Krishnan

15 Machine Learning (Unsupervised)
Shailesh Kumar

16 Machine Learning (Supervised)
Shailesh Kumar

17 Deep Learning
Manish Gupta

Part III Applications

18 Retail Analytics
Ramandeep S. Randhawa

19 Marketing Analytics
S. Arunachalam and Amalesh Sharma

20 Financial Analytics
Krishnamurthy Vaidyanathan

21 Social Media and Web Analytics
Vishnuprasad Nagadevara

22 Healthcare Analytics
Maqbool (Mac) Dada and Chester Chambers

23 Pricing Analytics
Kalyan Talluri and Sridhar Seshadri

24 Supply Chain Analytics
Yao Zhao

25 Case Study: Ideal Insurance
Deepak Agrawal and Soumithri Mamidipudi

26 Case Study: AAA Airline
Deepak Agrawal, Hema Sri Sai Kollipara, and Soumithri Mamidipudi

27 Case Study: InfoMedia Solutions
Deepak Agrawal, Soumithri Mamidipudi, and Sriram Padmanabhan

28 Introduction to R
Peeyush Taori and Hemanth Kumar Dasararaju

29 Introduction to Python
Peeyush Taori and Hemanth Kumar Dasararaju

30 Probability and Statistics
Peeyush Taori, Soumithri Mamidipudi, and Deepak Agrawal

Index

Disclaimer

This book contains information obtained from authentic and highly regarded

sources. Reasonable efforts have been made to publish reliable data and information,

but the author and publisher cannot assume responsibility for the validity of

all materials or the consequences of their use. The authors and publishers have

attempted to trace the copyright holders of all material reproduced in this publication

and apologize to copyright holders if permission to publish in this form has not been

obtained. If any copyright material has not been acknowledged please write and let

us know so we may rectify in any future reprint.

Acknowledgements

This book is the outcome of a truly collaborative effort amongst many people who

have contributed in different ways. We are deeply thankful to all the contributing

authors for their ideas and support. The book belongs to them. This book would not

have been possible without the help of Deepak Agrawal. Deepak helped in every

way, from editorial work, solution support, programming help, to coordination with

authors and researchers, and many more things. Soumithri Mamidipudi provided

editorial support, helped with writing summaries of every chapter, and proof-edited

the probability and statistics appendix and cases. Padmavati Sridhar provided editorial support for many chapters. Two associate alumni—Ramakrishna Vempati and

Suryanarayana Ambatipudi—of the Certificate Programme in Business Analytics

(CBA) at Indian School of Business (ISB) helped with locating contemporary

examples and references. They suggested examples for the Retail Analytics and

Supply Chain Analytics chapters. Ramakrishna also contributed to the draft of the

Big Data chapter. Several researchers in the Applied Statistics and Computing

Lab (ASC Lab) at ISB helped in many ways. Hema Sri Sai Kollipara provided

support for the cases, exercises, and technical and statistics support for various

chapters. Aditya Taori helped with examples for the machine learning chapters

and exercises. Saurabh Jugalkishor contributed examples for the machine learning

chapters. The ASC Lab’s researchers and Hemanth Kumar provided technical

support in preparing solutions for various examples referred to in the chapters. Ashish

Khandelwal, Fellow Program student at ISB, helped with the chapter on Linear

Regression. Dr. Kumar Eswaran and Joy Mustafi provided additional thoughts for

the Unsupervised Learning chapter. The editorial team comprising Faith Su, Mathew

Amboy and series editor Camille Price gave immense support during the book

proposal stage, guidance during editing, production, etc. The ASC Lab provided

the research support for this project.

We thank our families for the constant support during the 2-year-long project.

We thank each and every person associated with us during the beautiful journey of

writing this book.


Contributors

Deepak Agrawal Indian School of Business, Hyderabad, Telangana, India

S. Arunachalam Indian School of Business, Hyderabad, Telangana, India

Chester Chambers Carey Business School, Johns Hopkins University, Baltimore,

MD, USA

Maqbool (Mac) Dada Carey Business School, Johns Hopkins University, Baltimore, MD, USA

Manish Gupta Microsoft Corporation, Hyderabad, India

Hema Sri Sai Kollipara Indian School of Business, Hyderabad, Telangana, India

Thriyambakam Krishnan Chennai Mathematical Institute, Chennai, India

Shailesh Kumar Reliance Jio, Navi Mumbai, Maharashtra, India

Hemanth Kumar Dasararaju Indian School of Business, Hyderabad, Telangana,

India

Sumit Kunnumkal Indian School of Business, Hyderabad, Telangana, India

Soumithri Mamidipudi Indian School of Business, Hyderabad, Telangana, India

Vishnuprasad Nagadevara IIM-Bangalore, Bengaluru, Karnataka, India

Konstantinos I. Nikolopoulos Bangor Business School, Bangor, Gwynedd, UK

Sriram Padmanabhan New York, NY, USA

Bhimasankaram Pochiraju Applied Statistics and Computing Lab, Indian School

of Business, Hyderabad, Telangana, India

Ramandeep S. Randhawa Marshall School of Business, University of Southern

California, Los Angeles, CA, USA

Sridhar Seshadri Gies College of Business, University of Illinois at Urbana

Champaign, Champaign, IL, USA


Amalesh Sharma Texas A&M University, College Station, TX, USA

Milind G. Sohoni Indian School of Business, Hyderabad, Telangana, India

Kalyan Talluri Imperial College Business School, South Kensington, London, UK

Peeyush Taori London Business School, London, UK

Dimitrios D. Thomakos University of Peloponnese, Tripoli, Greece

John F. Tripp Clemson University, Clemson, SC, USA

Krishnamurthy Vaidyanathan Indian School of Business, Hyderabad,

Telangana, India

Sudhir Voleti Indian School of Business, Hyderabad, Telangana, India

Yao Zhao Rutgers University, Newark, NJ, USA

Chapter 1

Introduction

Sridhar Seshadri

Business analytics is the science of posing and answering data questions related to

business. Business analytics has rapidly expanded in the last few years to include

tools drawn from statistics, data management, data visualization, and machine learning. There is increasing emphasis on big data handling to assimilate the advances

made in data sciences. As is often the case with applied methodologies, business

analytics has to be soundly grounded in applications in various disciplines and

business verticals to be valuable. The modeling methods used by managers and

researchers in disciplines such as finance, marketing, and operations form the bridge

between the tools and the applications. This book covers all three aspects:

tools, modeling methods, and applications.

The book aims to fill the void in graduate-level study materials on addressing

business problems. Its purpose is threefold: to show how to pose data questions,

obtain optimal business solutions via analytics theory, and ground the solutions in

practice. To make the material self-contained, we have endeavored to provide ample

cases and data sets for practice and testing of tools. Each chapter comes

with data, examples, and exercises showing students what questions to ask, how to

apply the techniques using open source software, and how to interpret the results. In

our approach, simple examples are followed with medium to large applications and

solutions. The book can also serve as a self-study guide to professionals who wish

to enhance their knowledge about the field.

The distinctive features of the book are as follows:

• The chapters are written by experts from universities and industry.

• The major software tools used are R, Python, MS Excel, and MySQL. These are all

topical and widely used in the industry.

S. Seshadri

Gies College of Business, University of Illinois at Urbana Champaign, Champaign, IL, USA

e-mail: sridhar@illinois.edu

B. Pochiraju, S. Seshadri (eds.), Essentials of Business Analytics, International

Series in Operations Research & Management Science 264,

https://doi.org/10.1007/978-3-319-68837-4_1


• Extreme care has been taken to ensure continuity from one chapter to the next.

The editors have attempted to make sure that the content and flow are similar in

every chapter.

• In Parts I and II of the book, the tools and modeling methods are developed in

detail. This methodology is then applied to solve business problems in various

verticals in Part III, which also contains larger case studies.

• The appendices cover required material on probability theory, R, and Python, as

these serve as prerequisites for the main text.

The structure of each chapter is as follows:

• Each chapter has a business orientation. It starts with business problems, which

are transformed into technological problems. Methodology is developed to solve

the technological problems. Data analysis is done using suitable software and the

output and results are clearly explained at each stage of development. Finally, the

technological solution is transformed back to a business solution. The chapters

conclude with suggestions for further reading and a list of references.

• Exercises (with real data sets when applicable) are at the end of each chapter and

on the Web to test and enhance the understanding of the concepts and application.

• Caselets are used to illustrate the concepts in several chapters.

1 Detailed Description of Chapters

Data Collection: This chapter introduces the concepts of data collection and

problem formulation. Firstly, it establishes the foundation upon which the fields

of data sciences and analytics are based, and defines core concepts that will be used

throughout the rest of the book. The chapter starts by discussing the types of data

that can be gathered, and the common pitfalls that can occur when data analytics

does not take into account the nature of the data being used. It distinguishes between

primary and secondary data sources using examples, and provides a detailed

explanation of the advantages and constraints of each type of data. Following this,

the chapter details the types of data that can be collected and sorted. It discusses the

difference between nominal-, ordinal-, interval-, and ratio-based data and the ways

in which they can be used to obtain insights into the subject being studied.
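The distinction between the four measurement scales can be made concrete with a short Python sketch. It is only an illustration, not an example taken from the chapter: the data values and variable names are invented, and the point is simply which summary statistics are meaningful at each level.

```python
from statistics import mean, mode

# Hypothetical values, invented purely for illustration.
nominal = ["red", "blue", "red", "green"]              # labels only, no order
ordinal = ["low", "high", "medium", "medium", "low"]   # ordered, gaps not measurable
interval_celsius = [10.0, 20.0, 30.0]                  # differences meaningful, zero arbitrary
ratio_kg = [2.0, 4.0, 8.0]                             # true zero, ratios meaningful

# Nominal: counting and the mode are the only sensible summaries.
nominal_mode = mode(nominal)

# Ordinal: once an order is supplied, values can be ranked and a median taken.
rank = {"low": 0, "medium": 1, "high": 2}
ordinal_sorted = sorted(ordinal, key=rank.__getitem__)
ordinal_median = ordinal_sorted[len(ordinal_sorted) // 2]

# Interval: means and differences are valid, but "30 is three times 10" is not.
interval_mean = mean(interval_celsius)
interval_diff = interval_celsius[-1] - interval_celsius[0]

# Ratio: a true zero makes statements such as "four times as heavy" valid.
ratio_of_weights = ratio_kg[-1] / ratio_kg[0]

print(nominal_mode, ordinal_median, interval_mean, interval_diff, ratio_of_weights)
```

Each scale permits every operation of the scales before it, which is why the choice of scale constrains the analysis that can follow.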

The chapter then discusses problem formulation and its importance. It explains

how and why formulating a problem will impact the data that is gathered, and

thus affect the conclusions at which a research project may arrive. It describes

a framework by which a messy real-world situation can be clarified so that a

mathematical toolkit can be used to identify solutions. The chapter explains the

idea of decision-problems, which can be used to understand the real world, and

research-objectives, which can be used to analyze decision-problems.


The chapter also details the challenges faced when collecting and collating data.

It discusses the importance of understanding what data to collect, how to collect it,

how to assess its quality, and finally the most appropriate way of collating it so that

it does not lose its value.

The chapter ends with an illustrative example of how the retailing industry might

use various sources of data in order to better serve their customers and understand

their preferences.

Data Management—Relational Database Management Systems: This chapter

introduces the idea of data management and storage. The focus of the chapter

is on relational database management systems or RDBMS. RDBMS is the most

commonly used data organization system in enterprises. The chapter introduces and

explains the ideas using MySQL, an open-source relational database management

system that is queried with Structured Query Language (SQL) and is widely used in industry.

The chapter describes the basic functions of a MySQL server, such as creating

databases, examining data tables, and performing functions and various operations

on data sets. The first set of instructions the chapter discusses is about the rules,

definition, and creation of relational databases. Then, the chapter describes how to

create tables and add data to them using MySQL server commands. It explains how

to examine the data present in the tables using the SELECT command.
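The create, insert, and SELECT workflow described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the chapter's own code: it uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for a MySQL server so that it runs without any installation, and the `customers` table and its rows are hypothetical. The SQL statements themselves are standard enough that they should carry over to MySQL with little change, although MySQL client libraries typically use `%s` rather than `?` as the parameter placeholder.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for a MySQL server.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Create a table (hypothetical schema, for illustration only).
cur.execute(
    "CREATE TABLE customers ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT)"
)

# Add data to the table with parameterized INSERT statements.
cur.executemany(
    "INSERT INTO customers (id, name, city) VALUES (?, ?, ?)",
    [(1, "Asha", "Hyderabad"), (2, "Ben", "Chicago"), (3, "Chen", "Hyderabad")],
)
conn.commit()

# Examine the data with SELECT, filtering and ordering the result.
rows = cur.execute(
    "SELECT name FROM customers WHERE city = ? ORDER BY name",
    ("Hyderabad",),
).fetchall()
conn.close()

print(rows)
```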

Data Management—Big Data: This chapter builds on some of the concepts

introduced in the previous chapter but focuses on big data. It describes what

constitutes big data and presents the basics of big data tools such as Hadoop and

Spark, along with their surrounding ecosystem.

The chapter begins by describing Hadoop’s uses and key features, as well as the

programs in its ecosystem that can also be used in conjunction with it. It also briefly

visits the concepts of distributed and parallel computing and big data cloud.

The chapter describes the architecture of the Hadoop runtime environment. It

starts by describing the cluster, which is the set of host machines, or nodes, that

facilitate data access. It then moves on to the YARN infrastructure, which is

responsible for providing computational resources to the application. It describes

two main elements of the YARN infrastructure—the Resource Manager and the

Node Manager. It then details the HDFS Federation, which provides storage,

and also discusses other storage solutions. Lastly, it discusses the MapReduce

framework, which is the software layer.

The chapter then describes the functions of MapReduce in detail. MapReduce

divides tasks into subtasks, which it runs in parallel in order to increase efficiency. It

discusses the manner in which MapReduce takes lists of input data and transforms

them into lists of output data, by implementing a “map” process and a “reduce”

process, which it aggregates. It describes in detail the process steps that MapReduce

takes in order to produce the output, and describes how Python can be used to create

a MapReduce process for a word count program.
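The word count idea can be illustrated with a small pure-Python sketch. This is not the chapter's program and does not run on Hadoop: it simulates the map, shuffle, and reduce steps in a single process, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical names chosen for this illustration.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: aggregate all values emitted for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    # On a real cluster each line (or block of lines) would be mapped on a
    # different node in parallel; here the map calls simply run one by one.
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data tools", "big data cloud"])
print(counts)
```

Because each map call depends only on its own line and each reduce call only on its own key, both phases can be distributed across machines, which is the property the framework exploits.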

The chapter briefly describes Spark and an application using Spark. It concludes

with a discussion about cloud storage. The chapter uses the distributable Cloudera

virtual machine (VM) to demonstrate hands-on exercises.
