
A Statistical Machine Learning Perspective of Deep Learning: Algorithm, Theory, Scalable Computing

Maruan Al-Shedivat, Zhiting Hu, Hao Zhang, and Eric Xing

Petuum Inc & Carnegie Mellon University

[Figure: Elements of AI/Machine Learning, shown as a layered stack]

• Task
• Model: Graphical Models; Regularized Bayesian Methods; Deep Learning; Sparse Coding; Sparse Structured I/O Regression; Large-Margin; Spectral/Matrix Methods; Nonparametric Bayesian Models
• Algorithm: Stochastic Gradient Descent / Backpropagation; Coordinate Descent; L-BFGS; Gibbs Sampling; Metropolis-Hastings
• Implementation: Mahout (MapReduce); MLlib (BSP); CNTK; MXNet; TensorFlow (Async)
• System: Hadoop; Spark; MPI; RPC; GraphLab; …
• Platform and Hardware: network switches; Infiniband; network-attached storage; flash storage; RAM; Flash; SSD; server machines; desktops/laptops; NUMA machines; mobile devices; GPUs, CPUs, FPGAs, TPUs; ARM-powered devices; cloud compute (e.g., Amazon EC2); IoT networks; data centers; virtual machines

ML vs DL


Plan

• Statistical and Algorithmic Foundations and Insights of Deep Learning
• On a Unified Framework of Deep Generative Models
• Computational Mechanisms: Distributed Deep Learning Architectures

Part I: Basics

Outline

• Probabilistic Graphical Models: Basics
• An overview of DL components
  • Historical remarks: early days of neural networks
  • Modern building blocks: units, layers, activation functions, loss functions, etc.
  • Reverse-mode automatic differentiation (aka backpropagation)
• Similarities and differences between GMs and NNs
  • Graphical models vs. computational graphs
  • Sigmoid Belief Networks as graphical models
  • Deep Belief Networks and Boltzmann Machines
• Combining DL methods and GMs
  • Using outputs of NNs as inputs to GMs
  • GMs with potential functions represented by NNs
  • NNs with structured outputs
• Bayesian Learning of NNs
  • Bayesian learning of NN parameters
  • Deep kernel learning


Fundamental questions of probabilistic modeling

• Representation: what is the joint probability distribution over multiple variables?

$P(X_1, X_2, X_3, \dots, X_n)$

• How many state configurations are there?

• Do they all need to be represented?

• Can we incorporate any domain-specific insights into the representation?

• Learning: where do we get the probabilities from?

• Maximum likelihood estimation? How much data do we need?

• Are there any other established principles?

• Inference: if not all variables are observable, how do we compute the conditional distribution of latent variables given evidence?
  • Computing $P(H \mid A)$ would require summing over $2^6$ configurations of the unobserved variables
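To make that cost concrete, here is a minimal brute-force marginalization sketch (my own illustration, not from the tutorial); the loop enumerates all $2^n$ configurations of $n$ binary variables, which is exactly the blow-up a factored representation is meant to avoid:

```python
import itertools

def marginal(joint, n, query_index, query_value):
    """Brute-force P(X_query = value): sum the joint over all
    configurations of the n binary variables that match the query."""
    total = 0.0
    for config in itertools.product([0, 1], repeat=n):  # 2^n terms
        if config[query_index] == query_value:
            total += joint(config)
    return total

# Toy joint over n = 8 binary variables (uniform, purely illustrative)
n = 8
uniform_joint = lambda config: 1.0 / 2 ** n
print(marginal(uniform_joint, n, query_index=0, query_value=1))  # 0.5
```

Even at $n = 30$ the sum already has over a billion terms, so exact inference by enumeration is hopeless without structure.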


What is a graphical model?

• A possible world of cellular signal transduction


GM: structure simplifies representation

• A possible world of cellular signal transduction


Probabilistic Graphical Models

• If the $X_i$'s are conditionally independent (as described by a PGM), then the joint can be factored into a product of simpler terms
• Why might we favor a PGM?
  • Easy to incorporate domain knowledge and causal (logical) structures
  • Significant reduction in representation cost ($2^8$ reduced down to 18)

$P(X_1, X_2, X_3, X_4, X_5, X_6, X_7, X_8) = P(X_1)\,P(X_2)\,P(X_3 \mid X_1)\,P(X_4 \mid X_2)\,P(X_5 \mid X_2)\,P(X_6 \mid X_3, X_4)\,P(X_7 \mid X_6)\,P(X_8 \mid X_5, X_6)$
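As a sanity check on the count above, a short sketch (my own tally, assuming all eight variables are binary): each variable with $k$ parents contributes $2^k$ free Bernoulli parameters, and the totals confirm the drop from $2^8$ to 18.

```python
# Parent sets read off the factorization
# P(X1)P(X2)P(X3|X1)P(X4|X2)P(X5|X2)P(X6|X3,X4)P(X7|X6)P(X8|X5,X6)
parents = {1: [], 2: [], 3: [1], 4: [2], 5: [2],
           6: [3, 4], 7: [6], 8: [5, 6]}

# A binary variable with k binary parents needs 2^k free parameters
# (one Bernoulli parameter per configuration of its parents).
factored = sum(2 ** len(p) for p in parents.values())
full = 2 ** len(parents)  # table size of the unrestricted joint

print(full, factored)  # 256 vs. 18
```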


The two types of GMs

• Directed edges assign causal meaning to the relationships (Bayesian Networks or Directed Graphical Models)
• Undirected edges represent correlations between the variables (Markov Random Fields or Undirected Graphical Models)

Directed:
$P(X_1, X_2, X_3, X_4, X_5, X_6, X_7, X_8) = P(X_1)\,P(X_2)\,P(X_3 \mid X_1)\,P(X_4 \mid X_2)\,P(X_5 \mid X_2)\,P(X_6 \mid X_3, X_4)\,P(X_7 \mid X_6)\,P(X_8 \mid X_5, X_6)$

Undirected:
$P(X_1, X_2, X_3, X_4, X_5, X_6, X_7, X_8) = \frac{1}{Z} \exp\{E(X_1) + E(X_2) + E(X_1, X_3) + E(X_2, X_4) + E(X_5, X_2) + E(X_3, X_4, X_6) + E(X_6, X_7) + E(X_5, X_6, X_8)\}$

Inference: $P(H \mid D)$   Learning: $\theta^* = \arg\max_\theta P_\theta(D)$
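A minimal sketch of the undirected form (my own, with made-up clique energies): the $E$ terms are summed, exponentiated, and normalized by the partition function $Z$, which itself requires a sum over all $2^8$ configurations.

```python
import itertools, math

# Toy clique energy (arbitrary; any real-valued potentials would do)
def E(*vals):
    return 0.5 * sum(vals)

def energy(x):  # x maps variable index (1..8) to its binary value
    return (E(x[1]) + E(x[2]) + E(x[1], x[3]) + E(x[2], x[4]) +
            E(x[5], x[2]) + E(x[3], x[4], x[6]) + E(x[6], x[7]) +
            E(x[5], x[6], x[8]))

configs = [dict(zip(range(1, 9), c))
           for c in itertools.product([0, 1], repeat=8)]
Z = sum(math.exp(energy(x)) for x in configs)  # partition function

def prob(x):
    return math.exp(energy(x)) / Z  # P(X) = exp{sum of E terms} / Z

print(prob(configs[0]))  # probability of the all-zeros configuration
```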



Perceptron and Neural Nets

• From biological neuron to artificial neuron (perceptron)

• From biological neural networks to artificial neural networks

[Figure: a biological neuron (soma, dendrites, axon, synapses) next to its artificial counterpart: inputs $x_1, x_2$ with weights $w_1, w_2$ feed a linear combiner, followed by a hard limiter with threshold $\theta$ that produces output $Y$ (McCulloch & Pitts, 1943); and a multilayer network passing input signals through input, middle, and output layers to output signals.]
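A minimal sketch of the unit in the figure (names and values are illustrative, not from the tutorial): a linear combiner followed by a hard limiter with threshold $\theta$.

```python
def perceptron(x1, x2, w1, w2, theta):
    """Linear combiner followed by a hard limiter:
    Y = +1 if w1*x1 + w2*x2 >= theta, else -1."""
    s = w1 * x1 + w2 * x2            # linear combiner
    return 1 if s >= theta else -1   # hard limiter with threshold theta

# Example: weights and threshold chosen so the unit computes logical AND
print(perceptron(1, 1, w1=1.0, w2=1.0, theta=1.5))  # 1
print(perceptron(1, 0, w1=1.0, w2=1.0, theta=1.5))  # -1
```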


The perceptron learning algorithm

• Recall the nice property of the sigmoid function
• Consider the regression problem $f: X \to Y$, for scalar $Y$:

• We used to maximize the conditional data likelihood

• Here …
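The property referred to is presumably $\sigma'(z) = \sigma(z)(1 - \sigma(z))$, which makes the gradient of the conditional log-likelihood particularly simple. A hedged sketch of gradient ascent for logistic regression under that reading (data and step size are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Gradient ascent on the conditional log-likelihood
#   sum_i [ y_i log s(w.x_i) + (1 - y_i) log(1 - s(w.x_i)) ];
# because s'(z) = s(z)(1 - s(z)), the gradient reduces to X^T (y - s(Xw)).
def step(w, X, y, lr=0.1):
    return w + lr * X.T @ (y - sigmoid(X @ w))

X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0.0, 1.0, 1.0])
w = np.zeros(2)
for _ in range(200):
    w = step(w, X, y)
print(w, sigmoid(X @ w))  # learned weights and fitted probabilities
```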

