A Statistical Machine Learning Perspective of Deep Learning:
Algorithm, Theory, Scalable Computing

Maruan Al-Shedivat, Zhiting Hu, Hao Zhang, and Eric Xing
Petuum Inc. & Carnegie Mellon University
[Figure: Elements of AI/Machine Learning, shown as a layered stack:]
• Task
• Model: Graphical Models; Regularized Bayesian Methods; Deep Learning; Sparse Coding; Sparse Structured I/O Regression; Large-Margin; Spectral/Matrix Methods; Nonparametric Bayesian Models
• Algorithm: Stochastic Gradient Descent / Backpropagation; Coordinate Descent; L-BFGS; Gibbs Sampling; Metropolis-Hastings; …
• Implementation: Mahout (MapReduce); MLlib (BSP); CNTK; MxNet; TensorFlow (Async); …
• System: Hadoop; Spark; MPI; RPC; GraphLab; …
• Platform and Hardware: network switches; Infiniband; network-attached storage; flash storage; SSD; RAM; server machines; desktops/laptops; NUMA machines; mobile devices; ARM-powered devices; GPUs, CPUs, FPGAs, TPUs; cloud compute (e.g., Amazon EC2); IoT networks; data centers; virtual machines
ML vs DL
Plan
• Statistical and Algorithmic Foundations and Insights of Deep Learning
• On a Unified Framework of Deep Generative Models
• Computational Mechanisms: Distributed Deep Learning Architectures
Part I: Basics
Outline
• Probabilistic Graphical Models: Basics
• An overview of DL components
  • Historical remarks: early days of neural networks
  • Modern building blocks: units, layers, activation functions, loss functions, etc.
  • Reverse-mode automatic differentiation (aka backpropagation)
• Similarities and differences between GMs and NNs
  • Graphical models vs. computational graphs
  • Sigmoid Belief Networks as graphical models
  • Deep Belief Networks and Boltzmann Machines
• Combining DL methods and GMs
  • Using outputs of NNs as inputs to GMs
  • GMs with potential functions represented by NNs
  • NNs with structured outputs
• Bayesian Learning of NNs
  • Bayesian learning of NN parameters
  • Deep kernel learning
Fundamental questions of probabilistic modeling
• Representation: what is the joint probability distribution over multiple variables?
  $P(X_1, X_2, X_3, \ldots, X_n)$
• How many state configurations are there?
• Do they all need to be represented?
• Can we incorporate any domain-specific insights into the representation?
• Learning: where do we get the probabilities from?
• Maximum likelihood estimation? How much data do we need?
• Are there any other established principles?
• Inference: if not all variables are observable, how to compute the conditional
distribution of latent variables given evidence?
• Computing $P(H \mid e)$ would require summing over $2^6$ configurations of the unobserved variables
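To make that cost concrete, here is a minimal sketch (a toy, with a made-up uniform joint over binary variables) of brute-force conditional inference: answering a query means summing a dense joint table over every configuration of the unobserved variables.

```python
import itertools

# Toy dense joint over n binary variables: one entry per configuration,
# i.e., 2^n numbers before we even ask a query (uniform here, for illustration).
n = 8
joint = {cfg: 1.0 / 2**n for cfg in itertools.product([0, 1], repeat=n)}

def conditional(joint, query_idx, evidence):
    """P(X_query = 1 | evidence), by brute-force marginalization."""
    num = den = 0.0
    for cfg, p in joint.items():          # visits all 2^n configurations
        if all(cfg[i] == v for i, v in evidence.items()):
            den += p                       # sums over the unobserved variables
            if cfg[query_idx] == 1:
                num += p
    return num / den

# With 2 of 8 variables observed, the sum ranges over 2^6 consistent configurations.
print(conditional(joint, query_idx=7, evidence={0: 1, 1: 0}))  # 0.5 for this uniform toy
```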
What is a graphical model?
• A possible world of cellular signal transduction
GM: structure simplifies representation
• A possible world of cellular signal transduction
Probabilistic Graphical Models
• If the $X_i$'s are conditionally independent (as described by a PGM), the joint can be factored into a product of simpler terms, e.g.,
  $P(X_1, X_2, X_3, X_4, X_5, X_6, X_7, X_8) = P(X_1)\,P(X_2)\,P(X_3 \mid X_1)\,P(X_4 \mid X_2)\,P(X_5 \mid X_2)\,P(X_6 \mid X_3, X_4)\,P(X_7 \mid X_6)\,P(X_8 \mid X_5, X_6)$
• Why might we favor a PGM?
  • Easy to incorporate domain knowledge and causal (logical) structures
  • Significant reduction in representation cost ($2^8$ reduced down to 18)
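To check the "$2^8$ reduced down to 18" count, here is a minimal sketch (binary variables assumed) that tallies the free parameters of each conditional in the factorization above against the dense joint table:

```python
# Parent sets read off the factorization above (0-indexed: X1 -> 0, ..., X8 -> 7).
parents = {0: [], 1: [], 2: [0], 3: [1], 4: [1], 5: [2, 3], 6: [5], 7: [4, 5]}

# A binary conditional P(X_i | parents) needs one Bernoulli parameter
# per configuration of its parents: 2^|parents| numbers.
factored_cost = sum(2 ** len(p) for p in parents.values())
dense_cost = 2 ** len(parents) - 1  # full joint table, minus the normalization constraint

print(factored_cost, dense_cost)  # 18 vs. 255
```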
The two types of GMs
• Directed edges assign causal meaning to the relationships
(Bayesian Networks or Directed Graphical Models)
• Undirected edges represent correlations between the variables
(Markov Random Field or Undirected Graphical Models)
Directed factorization:
$P(X_1, X_2, X_3, X_4, X_5, X_6, X_7, X_8) = P(X_1)\,P(X_2)\,P(X_3 \mid X_1)\,P(X_4 \mid X_2)\,P(X_5 \mid X_2)\,P(X_6 \mid X_3, X_4)\,P(X_7 \mid X_6)\,P(X_8 \mid X_5, X_6)$

Undirected factorization:
$P(X_1, X_2, X_3, X_4, X_5, X_6, X_7, X_8) = \frac{1}{Z}\exp\{E(X_1) + E(X_2) + E(X_1, X_3) + E(X_2, X_4) + E(X_5, X_2) + E(X_3, X_4, X_6) + E(X_6, X_7) + E(X_5, X_6, X_8)\}$

Inference: $P(H \mid D)$    Learning: $\theta = \arg\max_\theta P_\theta(D)$
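A minimal sketch of what the undirected factorization means operationally, using a hypothetical 3-node chain MRF with made-up energies (not the 8-variable model above): the exponentiated energies give unnormalized scores, and the partition function $Z$ sums them over all configurations.

```python
import itertools, math

# Hypothetical pairwise energy, for illustration only.
def E(a, b):
    return 0.5 * (a + b) + 0.3 * a * b

def score(x1, x2, x3):
    # Unnormalized measure exp{E(x1, x2) + E(x2, x3)} on the chain X1 - X2 - X3.
    return math.exp(E(x1, x2) + E(x2, x3))

# Partition function: sum the score over all 2^3 binary configurations.
Z = sum(score(*cfg) for cfg in itertools.product([0, 1], repeat=3))
p = {cfg: score(*cfg) / Z for cfg in itertools.product([0, 1], repeat=3)}

print(Z, sum(p.values()))  # probabilities now sum to 1
```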
Perceptron and Neural Nets
• From biological neuron to artificial neuron (perceptron)
• From biological neuron network to artificial neuron networks
[Figure: a biological neuron (dendrites, soma, axon, synapses) alongside an artificial perceptron: inputs x1, x2 are scaled by weights w1, w2 in a linear combiner, then passed through a hard limiter with threshold θ to produce output Y. Below, a biological neural network alongside an artificial one: input signals pass through an input layer, a middle layer, and an output layer to yield output signals.]
McCulloch & Pitts (1943)
The perceptron learning algorithm
• Recall the nice property of the sigmoid function
• Consider a regression problem f: X → Y, for scalar Y:
• We used to maximize the conditional data likelihood
• Here …
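The derivation on this slide is not recoverable from the extraction, so here is a minimal sketch of the idea it builds toward (assuming a single sigmoid output unit, a Bernoulli conditional likelihood, and hypothetical toy data and learning rate): gradient ascent on the conditional log-likelihood gives the compact update $w \leftarrow w + \eta\, X^\top(y - \hat{p})$.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_sigmoid_unit(X, y, lr=0.5, epochs=1000):
    """Gradient ascent on sum_i [y_i log p_i + (1 - y_i) log(1 - p_i)]."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = sigmoid(X @ w)
        w += lr * X.T @ (y - p)  # gradient of the conditional log-likelihood
    return w

# Toy usage: learn OR over two inputs (first column is a bias term).
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)
y = np.array([0, 1, 1, 1], dtype=float)
w = train_sigmoid_unit(X, y)
print(np.round(sigmoid(X @ w)))  # -> [0. 1. 1. 1.]
```

The gradient is this simple because $\sigma'(z) = \sigma(z)\,(1 - \sigma(z))$, which is presumably the "nice property" of the sigmoid the slide refers to.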