Siêu thị PDFTải ngay đi em, trời tối mất

Thư viện tri thức trực tuyến

Kho tài liệu với 50,000+ tài liệu học thuật

© 2023 Siêu thị PDF - Kho tài liệu học thuật hàng đầu Việt Nam

Data Structures and Algorithms for Big Databases
PREMIUM
Số trang
226
Kích thước
16.8 MB
Định dạng
PDF
Lượt xem
1815

Data Structures and Algorithms for Big Databases

Nội dung xem thử

Mô tả chi tiết

Data Structures and Algorithms for

Big Databases

Michael A. Bender

Stony Brook & Tokutek

Bradley C. Kuszmaul

MIT & Tokutek

Big data problem

oy vey

??? ???

???

data indexing query processor

queries + answers

???

365

42

data ingestion

Important and universal problem.

2 Hot topic.

Big data problem

oy vey

??? ???

???

data indexing query processor

queries + answers

???

365

42

data ingestion

For on-disk data, one sees funny tradeoffs in the speeds

of data ingestion, query speed, and freshness of data.

Important and universal problem.

2 Hot topic.

data indexing Don’t Thrash: How to Cache Your Hash in Flash query processor

queries +

answers

???

42

data

ingestion

Funny tradeoff in ingestion, querying, freshness

• “Select queries were slow until I added an index onto the timestamp field...

Adding the index really helped our reporting, BUT now the inserts are taking

forever.”

‣ Comment on mysqlperformanceblog.com

• “I'm trying to create indexes on a table with 308 million rows. It took ~20

minutes to load the table but 10 days to build indexes on it.”

‣ MySQL bug #9544

• “They indexed their tables, and indexed them well,

And lo, did the queries run quick!

But that wasn’t the last of their troubles, to tell–

Their insertions, like treacle, ran thick.”

‣ Not from Alice in Wonderland by Lewis Carroll

3

data indexing Don’t Thrash: How to Cache Your Hash in Flash query processor

queries +

answers

???

42

data

ingestion

Funny tradeoff in ingestion, querying, freshness

• “Select queries were slow until I added an index onto the timestamp field...

Adding the index really helped our reporting, BUT now the inserts are taking

forever.”

‣ Comment on mysqlperformanceblog.com

• “I'm trying to create indexes on a table with 308 million rows. It took ~20

minutes to load the table but 10 days to build indexes on it.”

‣ MySQL bug #9544

• “They indexed their tables, and indexed them well,

And lo, did the queries run quick!

But that wasn’t the last of their troubles, to tell–

Their insertions, like treacle, ran thick.”

‣ Not from Alice in Wonderland by Lewis Carroll

4

data indexing Don’t Thrash: How to Cache Your Hash in Flash query processor

queries +

answers

???

42

data

ingestion

Funny tradeoff in ingestion, querying, freshness

• “Select queries were slow until I added an index onto the timestamp field...

Adding the index really helped our reporting, BUT now the inserts are taking

forever.”

‣ Comment on mysqlperformanceblog.com

• “I'm trying to create indexes on a table with 308 million rows. It took ~20

minutes to load the table but 10 days to build indexes on it.”

‣ MySQL bug #9544

• “They indexed their tables, and indexed them well,

And lo, did the queries run quick!

But that wasn’t the last of their troubles, to tell–

Their insertions, like treacle, ran thick.”

‣ Not from Alice in Wonderland by Lewis Carroll

5

This tutorial

• Better data

structures

significantly mitigate

the insert/query/

freshness tradeoff.

• These structures

scale to much larger

sizes while efficiently

using the memory￾hierarchy.

Fractal-tree®

index

LSM

tree

Bɛ-tree

6

Don’t Thrash: How to Cache Your Hash in Flash

What we mean by Big Data

We don’t define Big Data in terms of TB, PB, EB.

By Big Data, we mean

• The data is too big to fit in main memory.

• We need data structures on the data.

• Words like “index” or “metadata” suggest that there are

underlying data structures.

• These underlying data structures are also too big to fit in

main memory.

7

8 Don’t Thrash: How to Cache Your Hash in Flash

In this tutorial we study the

underlying data structures for

managing big data.

File systems

NewSQL

SQL

NoSQL

But enough about

databases...

... more

about us.

Don’t Thrash: How to Cache Your Hash in Flash

Our Research and Tokutek

A few years ago we started working together on

I/O-efficient and cache-oblivious data structures.

Along the way, we started Tokutek to commercialize

our research.

Michael Martin Bradley

10

Tokutek sells open source, ACID compliant,

implementations of MySQL and MongoDB.

11

File System

MySQL Database

-- SQL processing,

-- query optimization

Application

libFT

Disk/SSD

TokuDB

File System

Standard MongoDB

-- drivers,

-- query language, and

-- data model

Application

libFT

TokuMX

Tokutek sells open source, ACID compliant,

implementations of MySQL and MongoDB.

11

File System

MySQL Database

-- SQL processing,

-- query optimization

Application

libFT

Disk/SSD

TokuDB

File System

Standard MongoDB

-- drivers,

-- query language, and

-- data model

Application

libFT

TokuMX

Tokutek sells open source, ACID compliant,

implementations of MySQL and MongoDB.

11

File System

MySQL Database

-- SQL processing,

-- query optimization

Application

libFT

Disk/SSD

TokuDB

File System

Standard MongoDB

-- drivers,

-- query language, and

-- data model

Application

libFT

TokuMX

libFT implements the

persistent structures

for storing data on disk.

Tokutek sells open source, ACID compliant,

implementations of MySQL and MongoDB.

11

File System

MySQL Database

-- SQL processing,

-- query optimization

Application

libFT

Disk/SSD

TokuDB

File System

Standard MongoDB

-- drivers,

-- query language, and

-- data model

Application

libFT

TokuMX

libFT implements the

persistent structures

for storing data on disk.

libFT provides a

Berkeley DB API

and can be used

independently.

Tải ngay đi em, còn do dự, trời tối mất!