Seven Databases in Seven Weeks

What Readers Are Saying About

Seven Databases in Seven Weeks, Second Edition

Choosing a database is perhaps one of the most important architectural decisions

a developer can make. Seven Databases in Seven Weeks provides a fantastic tour

of different technologies and makes it easy to add each to your engineering toolbox.

➤ Dave Parfitt

Senior Site Reliability Engineer, Mozilla

By comparing each database technology to a tool you’d find in any workshop, the

authors of Seven Databases in Seven Weeks provide a practical and well-balanced

survey of a very diverse and highly varied datastore landscape. Anyone looking

to get a handle on the database options available to them as a data platform

should read this book and consider the trade-offs presented for each option.

➤ Matthew Oldham

Director of Data Architecture, Graphium Health

Reading this book felt like some of my best pair-programming experiences. It

showed me how to get started, kept me engaged, and encouraged me to experiment

on my own.

➤ Jesse Hallett

Open Source Mentor

This book will really give you an overview of what’s out there so you can choose

the best tool for the job.

➤ Jesse Anderson

Managing Director, Big Data Institute

Seven Databases in Seven Weeks,

Second Edition

A Guide to Modern Databases and the NoSQL Movement

Luc Perkins

with Eric Redmond

and Jim R. Wilson

The Pragmatic Bookshelf

Raleigh, North Carolina

Many of the designations used by manufacturers and sellers to distinguish their products

are claimed as trademarks. Where those designations appear in this book, and The Pragmatic

Programmers, LLC was aware of a trademark claim, the designations have been printed in

initial capital letters or in all capitals. The Pragmatic Starter Kit, The Pragmatic Programmer,

Pragmatic Programming, Pragmatic Bookshelf, PragProg and the linking g device are trademarks of The Pragmatic Programmers, LLC.

Every precaution was taken in the preparation of this book. However, the publisher assumes

no responsibility for errors or omissions, or for damages that may result from the use of

information (including program listings) contained herein.

Our Pragmatic books, screencasts, and audio books can help you and your team create

better software and have more fun. Visit us at https://pragprog.com.

The team that produced this book includes:

Publisher: Andy Hunt

VP of Operations: Janet Furlow

Managing Editor: Brian MacDonald

Supervising Editor: Jacquelyn Carter

Series Editor: Bruce A. Tate

Copy Editor: Nancy Rapoport

Indexing: Potomac Indexing, LLC

Layout: Gilson Graphics

For sales, volume licensing, and support, please contact [email protected].

For international rights, please contact [email protected].

No part of this publication may be reproduced, stored in a retrieval system, or transmitted,

in any form, or by any means, electronic, mechanical, photocopying, recording, or otherwise,

without the prior consent of the publisher.

Printed in the United States of America.

ISBN-13: 978-1-68050-253-4

Encoded using the finest acid-free high-entropy binary digits.

Book version: P1.0—April 2018

Contents

Acknowledgments

Preface

1. Introduction

It Starts with a Question

The Genres

Onward and Upward

2. PostgreSQL

That’s Post-greS-Q-L

Day 1: Relations, CRUD, and Joins

Day 2: Advanced Queries, Code, and Rules

Day 3: Full Text and Multidimensions

Wrap-Up

3. HBase

Introducing HBase

Day 1: CRUD and Table Administration

Day 2: Working with Big Data

Day 3: Taking It to the Cloud

Wrap-Up

4. MongoDB

Hu(mongo)us

Day 1: CRUD and Nesting

Day 2: Indexing, Aggregating, Mapreduce

Day 3: Replica Sets, Sharding, GeoSpatial, and GridFS

Wrap-Up

5. CouchDB

Relaxing on the Couch

Day 1: CRUD, Fauxton, and cURL Redux

Day 2: Creating and Querying Views

Day 3: Advanced Views, Changes API, and Replicating Data

Wrap-Up

6. Neo4J

Neo4j Is Whiteboard Friendly

Day 1: Graphs, Cypher, and CRUD

Day 2: REST, Indexes, and Algorithms

Day 3: Distributed High Availability

Wrap-Up

7. DynamoDB

DynamoDB: The “Big Easy” of NoSQL

Day 1: Let’s Go Shopping!

Day 2: Building a Streaming Data Pipeline

Day 3: Building an “Internet of Things” System Around DynamoDB

Wrap-Up

8. Redis

Data Structure Server Store

Day 1: CRUD and Datatypes

Day 2: Advanced Usage, Distribution

Day 3: Playing with Other Databases

Wrap-Up

9. Wrapping Up

Genres Redux

Making a Choice

Where Do We Go from Here?

A1. Database Overview Tables

A2. The CAP Theorem

Eventual Consistency

CAP in the Wild

The Latency Trade-Off

Bibliography

Index

Acknowledgments

A book with the size and scope of this one is never the work of just the authors,

even if there are three of them. It requires the effort of many very smart people

with superhuman eyes spotting as many mistakes as possible and providing

valuable insights into the details of these technologies.

We’d like to thank, in no particular order, all of the folks who provided their

time and expertise:

Dave Parfitt, Jerry Sievert, Jesse Hallett,

Matthew Oldham, Ben Rady, Nick Capito,

Jesse Anderson, Sean Moubry

Finally, thanks to Bruce Tate for his experience and guidance.

We’d also like to sincerely thank the entire team at the Pragmatic Bookshelf.

Thanks for entertaining this audacious project and seeing us through it. We’re

especially grateful to our editor, Jackie Carter. Your patient feedback made

this book what it is today. Thanks to the whole team who worked so hard to

polish this book and find all of our mistakes.

For anyone we missed, we hope you’ll accept our apologies. Any omissions

were certainly not intentional.

From Eric: Dear Noelle, you’re not special; you’re unique, and that’s so much

better. Thanks for living through another book. Thanks also to the database

creators and committers for providing us something to write about and make

a living at.

From Luc: First, I have to thank my wonderful family and friends for making

my life a charmed one from the very beginning. Second, I have to thank a

handful of people who believed in me and gave me a chance in the tech industry

at different stages of my career: Lucas Carlson, Marko and Saša Gargenta,

Troy Howard, and my co-author Eric Redmond for inviting me on board to

prepare the most recent edition of this book. My journey in this industry has

changed my life and I thank all of you for crucial breakthroughs.

From Jim: First, I want to thank my family: Ruthy, your boundless patience

and encouragement have been heartwarming. Emma and Jimmy, you’re two

smart cookies, and your daddy loves you always. Also, a special thanks to all

the unsung heroes who monitor IRC, message boards, mailing lists, and bug

systems ready to help anyone who needs you. Your dedication to open source

keeps these projects kicking.

Preface

If we use oil extraction as a metaphor for understanding data in the contemporary world, then databases flat-out constitute—or are deeply intertwined

with—all aspects of the extraction chain, from the fields to the refineries,

drills, and pumps. If you want to harness the potential of data—which has

perhaps become as vital to our way of life as oil—then you need to understand

databases because they are quite simply the most important piece of modern

data equipment.

Databases are tools, a means to an end. But like any complex tool, databases

also harbor their own stories and embody their own ways of looking at the

world. The better you understand databases, the more capable you’ll be of

tapping into the ever-growing corpus of data at our disposal. That enhanced

understanding could lead to anything from undertaking fun side projects to

embarking on a career change or starting your own data-driven company.

Why a NoSQL Book

What exactly does the term NoSQL even mean? Which types of systems does

the term include? How will NoSQL impact the practice of making great software? These were questions we wanted to answer—as much for ourselves as

for others.

Looking back more than a half-decade later, the rise of NoSQL isn’t so much

buzzworthy as it is an accepted fact. You can still read plenty of articles about

NoSQL technologies on Hacker News, TechCrunch, or even WIRED, but the

tenor of those articles has changed from starry-eyed prophecy (“NoSQL will

change everything!”) to more standard reporting (“check out this new Redis

feature!”). NoSQL is now a mainstay and a steadily maturing one at that.

But don’t write a eulogy for relational databases—the “SQL” in “NoSQL”—just

yet. Although NoSQL databases have gained significant traction in the technological landscape, it’s still far too early to declare “traditional” relational

database models as dead or even dying. From the release of Google’s BigQuery

and Spanner to continued rapid development of MySQL, PostgreSQL, and

others, relational databases are showing no signs of slowing down. NoSQL

hasn’t killed SQL; instead, the galaxy of uses for data has expanded, and

both paradigms continue to grow and evolve to keep up with the demand.

So read this book as a guide to powerful, compelling databases with similar

worldviews—not as a guide to the “new” way of doing things or as a nail in the

coffin of SQL. We’re writing a second edition so that a new generation of data

engineers, application developers, and others can get both a high-level understanding of the field and a deep dive into specific databases in one place.

Why Seven Databases

This book’s format originally came to us when we read Bruce Tate’s exemplary

Seven Languages in Seven Weeks [Tat10] many years ago. That book’s style of

progressively introducing languages struck a chord with us. We felt teaching

databases in the same manner would provide a smooth medium for tackling

some of these tough NoSQL questions while also creating conceptual bridges

across chapters.

What’s in This Book

This book is aimed at experienced application developers, data engineers,

tech enthusiasts, and others who are seeking a well-rounded understanding

of the modern database landscape. Prior database experience is not strictly

required, but it helps.

After a brief introduction, this book tackles a series of seven databases

chapter by chapter. The databases were chosen to span five different database

genres or styles, which are discussed in Chapter 1, Introduction, on page 1.

In order, the databases covered are PostgreSQL, Apache HBase, MongoDB,

Apache CouchDB, Neo4J, DynamoDB, and Redis.

Each chapter is designed to be taken as a long weekend’s worth of work, split

up into three days. Each day ends with exercises that expand on the topics

and concepts just introduced, and each chapter culminates in a wrap-up

discussion that summarizes the good and bad points about the database.

You may choose to move a little faster or slower, but it’s important to grasp

each day’s concepts before continuing. We’ve tried to craft examples that

explore each database’s distinguishing features. To really understand what

these databases have to offer, you have to spend some time using them, and

that means rolling up your sleeves and doing some work.

Although you may be tempted to skip chapters, we designed this book to be

read linearly. Some concepts, such as mapreduce, are introduced in depth

in earlier chapters and then skimmed over in later ones. The goal of this book

is to give you a solid understanding of the modern database field, so we recommend that you read every chapter.

What This Book Is Not

Before reading this book, you should know what it won’t cover.

This Is Not an Installation Guide

Installing the databases in this book is sometimes easy, sometimes a bit of

a challenge, and sometimes downright frustrating. For some databases, you’ll

be able to use stock packages or tools such as apt-get (on many Linux systems)

or Homebrew (if you’re a macOS user), and for others you may need to compile

from source. We’ll point out some useful tips here and there, but by and large

you’re on your own. Cutting out installation steps allows us to pack in more

useful examples and a discussion of concepts, which is what you really came

for anyway, right?

Administration Manual? We Think Not

In addition to installation, this book will also not cover everything you’d find

in an administration manual. Each of these databases offers myriad options,

settings, switches, and configuration details, most of which are well covered

online in each database’s official documentation and on forums such as

Stack Overflow. We’re much more interested in teaching you useful concepts

and providing full immersion than we are in focusing on the day-to-day

operations. Though the characteristics of the databases can change based

on operational settings (and we discuss these characteristics in some chapters),

we won’t be able to go into all the nitty-gritty details of all possible configurations. There simply isn’t space!

A Note to Windows Users

This book is inherently about choices, predominantly open source software

on *nix platforms. Microsoft platforms tend to strive for an integrated

environment, which limits many choices to a smaller predefined set. As such,

most of the databases we cover are open source and are developed by (and largely

for) users of *nix systems. This is not our own bias so much as a reflection

of the current state of affairs.

Consequently, our tutorial-esque examples are presumed to be run in a *nix

shell. If you run Windows and want to give it a try anyway, we recommend

setting up Bash on Windows [1] or Cygwin [2] to give you the best shot at success.

You may also want to consider running a Linux virtual machine.

Code Examples and Conventions

This book contains code in a variety of languages. In part, this is a consequence of the databases that we cover. We’ve attempted to limit our choice

of languages to Ruby/JRuby and JavaScript. We prefer command-line tools

to scripts, but we will introduce other languages to get the job done, such

as PL/pgSQL (Postgres) and Cypher (Neo4j). We’ll also explore writing some

server-side JavaScript applications with Node.js.

Except where noted, code listings are provided in full, usually ready to be

executed at your leisure. Samples and snippets are syntax highlighted according to the rules of the language involved. Shell commands are prefixed by $

for *nix shells or by a different token for database-specific shells (such as >

in MongoDB).

Credits

Apache, Apache HBase, Apache CouchDB, HBase, CouchDB, and the HBase

and CouchDB logos are trademarks of The Apache Software Foundation. Used

with permission. No endorsement by The Apache Software Foundation is

implied by the use of these marks.

Online Resources

The Pragmatic Bookshelf’s page for this book [3] is a great resource. There you’ll

find downloads for all the source code presented in this book. You’ll also find

feedback tools such as a community forum and an errata submission form

where you can recommend changes to future releases of the book.

Thanks for coming along with us on this journey through the modern database

landscape.

Luc Perkins, Eric Redmond, and Jim R. Wilson

April 2018

1. https://msdn.microsoft.com/en-us/commandline/wsl/about

2. http://www.cygwin.com/

3. http://pragprog.com/book/pwrdata/seven-databases-in-seven-weeks

CHAPTER 1

Introduction

The non-relational database paradigm—we’ll call it NoSQL throughout this

book, following now-standard usage—is no longer the fledgling upstart that

it once was. When the NoSQL alternative to relational databases came on the

scene, the “old” model was the de facto option for problems big and small.

Today, that relational model is still going strong and for many reasons:

• Databases such as PostgreSQL, MySQL, Microsoft SQL Server, and Oracle,

amongst many others, are still widely used, discussed, and actively

developed.

• Knowing how to run SQL queries remains a highly sought-after skill for

software engineers, data analysts, and others.

• There remains a vast universe of use cases for which a relational database

is still beyond any reasonable doubt the way to go.

But at the same time, NoSQL has risen far beyond its initial upstart status

and is now a fixture in the technology world. The concepts surrounding it,

such as the CAP theorem, are widely discussed at programming conferences,

on Hacker News, on Stack Overflow, and beyond. Schemaless design, massive

horizontal scaling capabilities, simple replication, new query methods that

don’t feel like SQL at all—these hallmarks of NoSQL have all gone mainstream.

Not long ago, a Fortune 500 CTO might have looked at NoSQL solutions with

bemusement if not horror; now, a CTO would be crazy not to at least consider

them for some of their workloads.

In this book, we explore seven databases across a wide spectrum of database

styles. We start with a relational database, PostgreSQL, largely for the sake of

comparison (though Postgres is quite interesting in its own right). From there,

things get a lot stranger as we wade into a world of databases united above

all by what they aren’t. In the process of reading this book, you will learn the
