Definition: A probabilistic graphical model is a probabilistic model for which a graph expresses the conditional dependence structure between random variables. In general, PGM obeys following rules: As the dimension of the data increases, the chain rule is harder to compute. In fact, many models try to simplify it in some ways. Reasons: They provide a simple way to visualize the structure of a probabilistic model and can be used to design and motivate new models. Insights into the properties of the model, including conditional independence properties, can be obtained by inspection of the graph. Complex computations, required to perform inference and learning in sophisticated models, can be expressed in terms of graphical manipulations, in which underlying mathematical expressions are carried along implicitly. Representation: Express a probability distribution that models some real-world phenomenon. Inference: Obtain answers to relevant questions from our models. Learning: Fit a model to real-world data. We are going to mainly focus on Representation in this post. Representation: Express a probability distribution that models some real-world phenomenon. Directed graphical models is also known as Bayesian networks.
Intuition: In a directed graph, vertices correspond to variables Formal Definition: A Bayesian network is a directed graph A random variable One conditional probability distribution (CPD) Note: Bayesian networks represent probability distributions that can be formed via products of smaller, local conditional probability distributions (one for each variable). Another way to say it is that each factor in the factorization of Directed models are often used as generative models. Undirected graphical models is also known as Markov random fields (MRFs).
Unlike in the directed case, we cannot say anything about how one variable is generated from another set of variables (as a conditional probability distribution would do). Intuition: Suppose we have five students doing a project and we want to evaluate how well they would cooperate together. Since five people are too many to be evaluated as a whole, we devide it into small subgroups and evaluate these subgroups respectively. In fact, these small subgroups are called clique and we would introduce it later in this section.
Here, we introduce the concept of potential function Note that the left hand side of the queation is a probability but the right hand side is a product of potentials/ scores. To make the right hand side a valid probability, we need to introduce a normalization term Here we say Notice that the summation in Formal Definition: cliques: fully connected subgraphs. maximal clique: A clique is a maximal clique if it is not contained in any larger clique. A Markov Random Field (MRF) is a probability distribution where is a normalizing constant that ensures that the distribution sums to one.
Global Markov Property: Local Markov Property: Pairwise Markov Property: Note: A distribution Global Markov Property A Markov random field reflects conditional independency since it satisfies the Local Markov Property. To see whether a distribution is a Markov random field or Markov network, we have the following theorem: Hammersley-Clifford Theorem: Suppose Bayesian networks effectively show causality, whereas MRFs cannot. Thus, MRFs are preferable for problems where there is no clear causality between random variables. It is much easier to generate data from a Bayesian network, which is important in some applications. In Markov random fields, computing the normalization constant A moral graph is used to find the equivalent undirected form of a directed acyclic graph. The moralized counterpart of a directed acyclic graph is formed by Add edges between all pairs of non-adjacent nodes that have a common child. Make all edges in the graph undirected. Here is an example:
Note that a Bayesian network can always be converted into an undirected network. Therefore, MRFs have more power than Bayesian networks, but are more difficult to deal with computationally. A general rule of thumb is to use Bayesian networks whenever possible, and only switch to MRFs if there is no natural way to model the problem with a directed graph
A Markov network has an undesirable ambiguity from the factorization perspective. Consider the three-node Markov network in the figure (left). Any distribution that factorizes as for some positive function The model family in Definition (factor graph): A factor graph is a bipartite graph An example of a factor graph is shown on the right side of the figure above. In the figure, the circles are variable nodes, and the shaded boxes are factor nodes. Notice that, unlike the undirected graph, the factor graph depicts the factorization of the model unambiguously. Remark: Directed models can be thought of as a kind of factor graph, in which the individual factors are locally normalized in a special fashion so that globally Inference: Obtain answers to relevant questions from our models. Learning: Fit a model to real-world data. Parameter learning: the graph structure is known and we want to estimate the parameters. complete case: incomplete case: We use EM Algorithm to approximate parameters. Example: Guassian Mixture Model (GMM), Hidden Markov Model (HMM). Structure learning: we want to estimate the graph, i.e., determine from data how the variables depend on each other. Reference:
往期文章链接目录
文章目录
Probabilistic Graphical Model (PGM)
Sum Rule : p(x1)=∫p(x1,x2)dx2Product Rule : p(x1,x2)=p(x1∣x2)p(x2)Chain Rule: p(x1,x2,⋯,xp)=i=1∏pp(xi∣xi+1,xi+2…xp)Bayesian Rule: p(x1∣x2)=p(x2)p(x2∣x1)p(x1)Why we need probabilistic graphical models
Three major parts of PGM
Representation
Directed graphical models (Bayesian networks)
xi and edges indicate dependency relationships. Once the graphical representation of a directed graph is given (directed acyclic graphs), we can easily calculate the joint probability. For example, from the figure above, we can calculate the joint probability
p(a,b,c,d,e) by
p(a,b,c,d,e)=p(a)⋅p(b∣a)⋅p(c∣b,d)⋅p(d)⋅p(e∣c)
G=(V,E) together with
xi for each node
i∈V.
p(xi∣xAi) per node, specifying the probability of
xi conditioned on its parents’ values.
p(a,b,c,d,e) is locally normalized (every factor can sum up to one).Undirected graphical models (Markov random fields)
ϕ to evaluete how well they would cooperate together. You can think of it as a score that measures how well a clique cooperate. Higher scores indicate better cooperation. In fact, we requie scores to be non-negative, and depending on how we define the potential functions, we would get different models. As the figure shown above, we could write
p(a,b,c,d,e) as
p(a,b,c,d,e)=ϕ1(a,b,c)⋅ϕ2(b,d)⋅ϕ3(d,e)
1/Z. Hence it becomes
p(a,b,c,d,e)=Z1⋅ϕ1(a,b,c)⋅ϕ2(b,d)⋅ϕ3(d,e)
p(a,b,c,d,e) is globally normalized. Also, we call
Z a partition function, which is
Z=a,b,c,d,e∑ϕ1(a,b,c)⋅ϕ2(b,d)⋅ϕ3(d,e)(1)
(1) is over the exponentially many possible assignments to
a,b,c,d and
e. For this reason, computing
Z is intractable in general, but much work exists on how to approximate it.
p over variables
x1,…,xn defined by an undirected graph
G in which nodes correspond to variables
xi. The probability
p has the form
p(x1,…,xn)=Z1c∈C∏ϕc(xc)(2)
C denotes the set of cliques of
G, and each factor
ϕc is a non-negative function over the variables in a clique. The partition function
Z=x1,…,xn∑c∈C∏ϕc(xc)Markov Properties of undirected graph
p satisfies the global Markov property with respect to a graph
G if for any disjoint vertex subsets
A,
B, and
C, such that
C separates
A and
B, the random variables
XA are conditionally independent of
XB given
XC.
Here,we say
C separates
A and
B if every path from a node in
A to a node in B passes through a node in
C (d-seperation).
p satisfies the local Markov property with respect to
G if the conditional distribution of a variable given its neighbors is independent of the remaining nodes.
p satisfies the pairwise markov property with respect to
G if for any pair of non-adjacent nodes,
s,t∈V, we have
Xs⊥Xt∣XV{s,t}.
p that satisfies the global Markov property is said to be a Markov random field or Markov network with respect to the graph.
⇒ Local Markov Property
⇒Pairwise Markov Property.
p is a strictly positive distribution, and
G is an undirected graph that indexes the domain of
p. Then
p is Markov with respect to G if and only if
p factorizes over the cliques of the graph
G.Comparison between Bayesian networks and Markov random fields
Z requires a summation over the exponentially many possible assignments. For this reason, computing
Z is intractable in general, but much work exists on how to approximate it.Moral graph
Factor Graph
p(x1,x2,x3)∝ϕ(x1,x2,x3)(3)
ϕ is Markov with respect to this graph (check Hammersley-Clifford Theorem mentioned earlier). However, we may wish to use a more restricted parameterization, where
p(x1,x2,x3)∝ϕ1(x1,x2)ϕ1(x2,x3)ϕ1(x1,x3)(4)
(4) is smaller, and therefore may be more amenable to parameter estimation. But the Markov network formalism cannot distinguish between these two parameterizations. In order to state models more precisely, the factorization in
(2) can be represented directly by means of a factor graph.
G=(V,F,E) in which a variable node
xi∈V is connected to a factor node
ϕa∈F if
xi is an argument to
ϕa.
Z=1.Inference
p(y=1)=x1∑x2∑⋯xn∑p(y=1,x1,x2,…,xn)
x1,…,xnmaxp(y=1,x1,…,xn)Learning
往期文章链接目录
本网页所有视频内容由 imoviebox边看边下-网页视频下载, iurlBox网页地址收藏管理器 下载并得到。
ImovieBox网页视频下载器 下载地址: ImovieBox网页视频下载器-最新版本下载
本文章由: imapbox邮箱云存储,邮箱网盘,ImageBox 图片批量下载器,网页图片批量下载专家,网页图片批量下载器,获取到文章图片,imoviebox网页视频批量下载器,下载视频内容,为您提供.
阅读和此文章类似的: 全球云计算