
A Statistical Machine Learning Perspective of Deep Learning:
Algorithm, Theory, Scalable Computing

Maruan Al-Shedivat, Zhiting Hu, Hao Zhang, and Eric Xing

Petuum Inc & Carnegie Mellon University

[Figure: Elements of AI/Machine Learning — a layered stack, from task down to hardware:
• Task
• Model: Graphical Models, Regularized Bayesian Methods, Deep Learning, Sparse Coding, Sparse Structured I/O Regression, Large-Margin, Spectral/Matrix Methods, Nonparametric Bayesian Models
• Algorithm: Stochastic Gradient Descent / Backpropagation, Coordinate Descent, L-BFGS, Gibbs Sampling, Metropolis-Hastings
• Implementation: Mahout (MapReduce), MLlib (BSP), CNTK, MXNet, TensorFlow (Async)
• System: Hadoop, Spark, MPI, RPC, GraphLab, …
• Platform and Hardware: network switches, Infiniband, network-attached storage, flash storage, server machines, desktops/laptops, NUMA machines, mobile devices, GPUs, CPUs, FPGAs, TPUs, ARM-powered devices, RAM, flash, SSD, cloud compute (e.g. Amazon EC2), IoT networks, data centers, virtual machines]


ML vs DL


Plan

• Statistical and Algorithmic Foundation and Insight of Deep Learning

• On Unified Framework of Deep Generative Models

• Computational Mechanisms: Distributed Deep Learning Architectures


Part-I

Basics

Outline

• Probabilistic Graphical Models: Basics
• An overview of DL components
  • Historical remarks: early days of neural networks
  • Modern building blocks: units, layers, activation functions, loss functions, etc.
  • Reverse-mode automatic differentiation (aka backpropagation)
• Similarities and differences between GMs and NNs
  • Graphical models vs. computational graphs
  • Sigmoid Belief Networks as graphical models
  • Deep Belief Networks and Boltzmann Machines
• Combining DL methods and GMs
  • Using outputs of NNs as inputs to GMs
  • GMs with potential functions represented by NNs
  • NNs with structured outputs
• Bayesian Learning of NNs
  • Bayesian learning of NN parameters
  • Deep kernel learning



Fundamental questions of probabilistic modeling

• Representation: what is the joint probability distribution over multiple variables?

$P(X_1, X_2, X_3, \ldots, X_n)$

• How many state configurations are there?

• Do they all need to be represented?

• Can we incorporate any domain-specific insights into the representation?

• Learning: where do we get the probabilities from?

• Maximum likelihood estimation? How much data do we need?

• Are there any other established principles?

• Inference: if not all variables are observable, how to compute the conditional distribution of latent variables given evidence?

• Computing $P(H \mid A)$ would require summing over $2^6$ configurations of the unobserved variables
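To make this cost concrete, here is a minimal sketch of brute-force inference, assuming 8 binary variables with X1 standing in for A, X8 for H, and a toy joint table (all names and numbers below are illustrative, not the tutorial's own code):

```python
import itertools
import numpy as np

# Toy joint over 8 binary variables X1..X8, stored as a dense 2^8 table.
# Here it is just a random normalized table; in practice it comes from the model.
rng = np.random.default_rng(0)
joint = rng.random((2,) * 8)
joint /= joint.sum()

def query_H_given_A(joint, a=1):
    """Brute-force P(X8 | X1=a): sum the joint over the 2^6 configurations
    of the unobserved variables X2..X7, then normalize over X8."""
    p_h = np.zeros(2)
    for h in (0, 1):
        total = 0.0
        for cfg in itertools.product((0, 1), repeat=6):  # 2^6 = 64 terms
            total += joint[(a,) + cfg + (h,)]
        p_h[h] = total
    return p_h / p_h.sum()

print(query_H_given_A(joint, a=1))
```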


What is a graphical model?

• A possible world of cellular signal transduction


GM: structure simplifies representation

• A possible world of cellular signal transduction


Probabilistic Graphical Models

• If the $X_i$'s are conditionally independent (as described by a PGM), then the joint can be factored into a product of simpler terms

• Why might we favor a PGM?

• Easy to incorporate domain knowledge and causal (logical) structures

• Significant reduction in representation cost ($2^8$ reduced down to 18)

$P(X_1, X_2, X_3, X_4, X_5, X_6, X_7, X_8) = P(X_1)\,P(X_2)\,P(X_3 \mid X_1)\,P(X_4 \mid X_2)\,P(X_5 \mid X_2)\,P(X_6 \mid X_3, X_4)\,P(X_7 \mid X_6)\,P(X_8 \mid X_5, X_6)$
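As a quick check on the "$2^8$ reduced down to 18" claim, here is a minimal sketch that counts free parameters, assuming binary variables and the parent sets of the example network above:

```python
# Parent sets of the example Bayesian network (X1..X8, all binary).
parents = {
    1: [], 2: [],
    3: [1], 4: [2], 5: [2],
    6: [3, 4], 7: [6],
    8: [5, 6],
}

n = len(parents)
full_joint_entries = 2 ** n  # one entry per configuration of X1..X8

# Each CPT P(Xi | parents) needs one free parameter per parent configuration
# (the probability that the binary Xi equals 1).
factored_params = sum(2 ** len(pa) for pa in parents.values())

print(full_joint_entries)   # 256
print(factored_params)      # 18
```

Conditioning on the graph structure turns one 256-entry table into eight small conditional tables.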


The two types of GMs

• Directed edges assign causal meaning to the relationships (Bayesian Networks or Directed Graphical Models)

• Undirected edges represent correlations between the variables (Markov Random Fields or Undirected Graphical Models)

$P(X_1, X_2, X_3, X_4, X_5, X_6, X_7, X_8) = P(X_1)\,P(X_2)\,P(X_3 \mid X_1)\,P(X_4 \mid X_2)\,P(X_5 \mid X_2)\,P(X_6 \mid X_3, X_4)\,P(X_7 \mid X_6)\,P(X_8 \mid X_5, X_6)$

$P(X_1, X_2, X_3, X_4, X_5, X_6, X_7, X_8) = \frac{1}{Z} \exp\{E(X_1) + E(X_2) + E(X_1, X_3) + E(X_2, X_4) + E(X_5, X_2) + E(X_3, X_4, X_6) + E(X_6, X_7) + E(X_5, X_6, X_8)\}$

$P(H \mid D)$

$\theta = \arg\max_\theta P_\theta(D)$
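For the undirected form, the normalizer Z must itself be computed by summing the unnormalized score over all configurations. A minimal sketch, assuming binary variables, random potential tables, and the clique structure of the example above (all names here are illustrative):

```python
import itertools
import numpy as np

# Cliques of the example MRF over binary X1..X8 (1-indexed).
cliques = [(1,), (2,), (1, 3), (2, 4), (5, 2), (3, 4, 6), (6, 7), (5, 6, 8)]

rng = np.random.default_rng(0)
# One energy table per clique: E_c(x_c), indexed by the clique's binary values.
energies = {c: rng.normal(size=(2,) * len(c)) for c in cliques}

def unnormalized(x):
    """exp{sum of clique energies} for a full assignment x = (x1, ..., x8)."""
    score = sum(energies[c][tuple(x[i - 1] for i in c)] for c in cliques)
    return np.exp(score)

# Partition function: brute-force sum over all 2^8 assignments.
Z = sum(unnormalized(x) for x in itertools.product((0, 1), repeat=8))

def prob(x):
    return unnormalized(x) / Z

print(prob((1, 0, 1, 1, 0, 1, 0, 1)))
```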



Perceptron and Neural Nets

• From biological neuron to artificial neuron (perceptron)

• From biological neural networks to artificial neural networks

[Figure: left, a biological neuron (soma, dendrites, axon, synapses); center, the perceptron unit: inputs x1, x2 with weights w1, w2 feed a linear combiner, followed by a hard limiter with threshold θ producing output Y; right, a multi-layer artificial neural network with input layer, middle layer, and output layer carrying input/output signals.]

McCulloch & Pitts (1943)
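As a concrete rendering of the diagram, here is a minimal sketch of the classic perceptron unit (linear combiner plus hard limiter); the weights and threshold below are illustrative, chosen so the unit computes logical AND:

```python
import numpy as np

def perceptron(x, w, theta):
    """Hard-limiter unit: output 1 if the linear combination of inputs
    exceeds the threshold theta, else 0."""
    return 1 if np.dot(w, x) >= theta else 0

# Illustrative weights/threshold: this choice implements AND on {0,1} inputs.
w = np.array([1.0, 1.0])
theta = 1.5
for x1 in (0, 1):
    for x2 in (0, 1):
        print((x1, x2), perceptron(np.array([x1, x2]), w, theta))
```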


The perceptron learning algorithm

• Recall the nice property of the sigmoid function

• Consider the regression problem f: X → Y, for scalar Y:

• We used to maximize the conditional data likelihood

• Here …
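A minimal sketch of what maximizing the conditional likelihood looks like for a single sigmoid unit (not the tutorial's exact derivation; the toy data and learning rate are illustrative). The simple form of the gradient relies on the sigmoid's property σ'(z) = σ(z)(1 − σ(z)):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative toy data: X (n x d), binary targets y.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)

w = np.zeros(2)
lr = 0.1
for _ in range(500):
    p = sigmoid(X @ w)        # predicted P(y=1 | x)
    # Gradient of the conditional log-likelihood
    # sum_i [y_i log p_i + (1 - y_i) log(1 - p_i)]
    # simplifies to X^T (y - p) thanks to sigma'(z) = sigma(z)(1 - sigma(z)).
    grad = X.T @ (y - p)
    w += lr * grad / len(y)   # gradient ascent on the log-likelihood

print(w)
```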
