
...menustart

  • Bayes' Nets: Inference
    • Recap: Bayes' Net Representation
    • Recap: Example: Alarm Network
    • Inference
    • Inference by Enumeration
    • Inference by Enumeration in Bayes' Net
    • Inference by Enumeration vs. Variable Elimination
    • Factor Zoo
      • Factor Zoo I
      • Factor Zoo II
      • Factor Zoo III
      • Factor Zoo Summary
      • Example: Traffic Domain
    • Inference by Enumeration: Procedural Outline
      • Operation 1: Join Factors
        • Example: Multiple Joins
      • Operation 2: Eliminate
        • Multiple Elimination
      • Thus Far: Multiple Join, Multiple Eliminate (= Inference by Enumeration)
      • Marginalizing Early (= Variable Elimination)
      • Example: Traffic Domain again
      • Marginalizing Early! (aka VE)
      • Evidence
    • General Variable Elimination
      • Example
      • Same Example in Equations
      • Another Variable Elimination Example
    • Variable Elimination Ordering
    • VE: Computational and Space Complexity
      • Worst Case Complexity?
    • Polytrees
    • Quiz
    • Quiz BN2-2
    • Quiz BN2-3

...menuend

Bayes' Nets: Inference

Recap: Bayes' Net Representation

  • A directed, acyclic graph, one node per random variable
  • A conditional probability table (CPT) for each node
    • A collection of distributions over X, one for each combination of parents' values
      • P(X | a₁, …, aₙ)
  • Bayes' nets implicitly encode joint distributions
    • As a product of local conditional distributions
    • To see what probability a BN gives to a full assignment, multiply all the relevant conditionals together.

Recap: Example: Alarm Network

You build up the local little pieces, the CPTs, so that you can answer questions about the whole distribution by assembling it from the BN.

P(+b, -e, +a, -j, +m) = P(+b) P(-e) P(+a | +b, -e) P(-j | +a) P(+m | +a) = 0.001 × 0.998 × 0.94 × 0.1 × 0.7
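As a quick illustration, the product above can be read off programmatically from the CPTs. Below is a minimal sketch, assuming the standard alarm-network CPT values and a simple dictionary encoding (both are choices of this sketch, not part of the notes):

```python
# Minimal sketch: probability of a full assignment in the alarm network,
# obtained by multiplying one entry from each local CPT (chain rule for BNs).
P_B = {'+b': 0.001, '-b': 0.999}
P_E = {'+e': 0.002, '-e': 0.998}
P_A = {('+b', '+e'): 0.95, ('+b', '-e'): 0.94,
       ('-b', '+e'): 0.29, ('-b', '-e'): 0.001}   # P(+a | B, E)
P_J = {'+a': 0.9, '-a': 0.05}                     # P(+j | A)
P_M = {'+a': 0.7, '-a': 0.01}                     # P(+m | A)

def p_full_assignment(b, e, a, j, m):
    """Multiply the relevant conditional from each CPT."""
    p = P_B[b] * P_E[e]
    p_a = P_A[(b, e)]
    p *= p_a if a == '+a' else 1 - p_a
    p_j = P_J[a]
    p *= p_j if j == '+j' else 1 - p_j
    p_m = P_M[a]
    p *= p_m if m == '+m' else 1 - p_m
    return p

print(p_full_assignment('+b', '-e', '+a', '-j', '+m'))
# 0.001 * 0.998 * 0.94 * 0.1 * 0.7 ≈ 6.57e-05
```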

Inference

  • Inference: calculating some useful quantity from a joint probability distribution
    • So you can imagine that the given is some collection of probabilities.
    • Perhaps it's the whole joint probability table in all its exponential-size glory. More often, it's a bunch of conditional probabilities that define a Bayes' Net.
    • The canonical thing you might calculate from those givens is a posterior probability.
    • Another classic query is a most probable explanation query, where you say: I have some evidence, and I would like to know the most likely value of one or more variables given that evidence.
  • Examples:
    • Posterior probability
      • P( Q | E₁=e₁, …, Eₖ=eₖ )
      • calculating the posterior probability of some set of query variables conditioned on some evidence having been observed
      • I care about variable Q. There's a bunch of evidence variables whose values I know, and unfortunately there's going to be a bunch of other variables, called hidden variables, that I don't care about and also don't observe. We're going to have to sum them out, and that creates a lot of time complexity.
    • Most probable explanation
      • argmax_q P( Q=q | E₁=e₁, … )
      • given some evidence has been observed, what's the most likely assignment of the query variables?

Inference by Enumeration

  • General case

    • All variables: X₁, X₂, …, Xₙ
      • Evidence variables: E₁, …, Eₖ = e₁, …, eₖ
      • Query* variable: Q
      • Hidden variables: H₁, …, Hᵣ
  • the first type of inference, the naive type, is inference by enumeration

  • we have some query, and the query has

    • some evidence variables E₁…Eₖ ,
    • some query variables Q , the ones whose distribution we want given the evidence variables.
    • some hidden variables H₁…Hᵣ , the ones that are neither evidence variables nor query variables, but are still in our joint distribution.
      • we have to deal with them; effectively, we have to sum them out to get rid of them.
  • see details in Probability

  • Problem with this algorithm:

    1. we don't have the giant joint distribution, we've got a Bayes' Net
    2. while we could, in principle, build the giant joint distribution, it would be a waste of space and time.

Inference by Enumeration in Bayes' Net

  • Given unlimited time, inference in BNs is easy

  • Reminder of inference by enumeration, by example:

    • P(B | +j, +m)
      • ∝ P(B, +j, +m)
        • proportional to
        • this says these two things are almost equal, not actually equal.
        • but they'd be equal if we summed the values over B and multiplied by the inverse (the normalization constant).
        • So what this says is: if you want a conditional probability, compute the equivalent joint probability, then normalize.
      • = ∑e,a P(B, e, a, +j, +m)
        • now we can use the definition of the joint distribution as it is specified through the Bayes' net:
      • = ∑e,a P(B)P(e)P(a|B,e)P(+j|a)P(+m|a)
      • = P(B)P(+e)P(+a|B,+e)P(+j|+a)P(+m|+a) + P(B)P(+e)P(-a|B,+e)P(+j|-a)P(+m|-a) + P(B)P(-e)P(+a|B,-e)P(+j|+a)P(+m|+a) + P(B)P(-e)P(-a|B,-e)P(+j|-a)P(+m|-a)
  • if we have 100 variables in the BN and 50 of them are evidence variables, the 50 non-evidence variables mean we will look at 2⁵⁰ entries

    • it gets very expensive: exponential in the number of non-evidence variables, unless almost everything is evidence
  • It's not ideal to do it this way
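To make the enumeration concrete, here is a minimal sketch that sums out the hidden variables e and a exactly as in the derivation above, again assuming the standard alarm-network CPT values and dictionary encoding:

```python
from itertools import product

# Inference by enumeration for P(B | +j, +m) in the alarm network.
P_B = {'+b': 0.001, '-b': 0.999}
P_E = {'+e': 0.002, '-e': 0.998}
P_A = {('+b', '+e'): 0.95, ('+b', '-e'): 0.94,
       ('-b', '+e'): 0.29, ('-b', '-e'): 0.001}   # P(+a | B, E)
P_J = {'+a': 0.9, '-a': 0.05}                     # P(+j | A)
P_M = {'+a': 0.7, '-a': 0.01}                     # P(+m | A)

def joint(b, e, a, j, m):
    pa, pj, pm = P_A[(b, e)], P_J[a], P_M[a]
    return (P_B[b] * P_E[e]
            * (pa if a == '+a' else 1 - pa)
            * (pj if j == '+j' else 1 - pj)
            * (pm if m == '+m' else 1 - pm))

# Sum out the hidden variables E and A for each value of the query variable B.
unnormalized = {
    b: sum(joint(b, e, a, '+j', '+m')
           for e, a in product(['+e', '-e'], ['+a', '-a']))
    for b in ['+b', '-b']
}
Z = sum(unnormalized.values())
print({b: p / Z for b, p in unnormalized.items()})
# P(+b | +j, +m) ≈ 0.284
```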

Inference by Enumeration vs. Variable Elimination

Factor Zoo

Factor Zoo I

  • Joint distribution: P(X,Y)
    • Entries P(x,y) for all x, y
    • Sums to 1

Example: P(T,W)

| T    | W    | P   |
|------|------|-----|
| hot  | sun  | 0.4 |
| hot  | rain | 0.1 |
| cold | sun  | 0.2 |
| cold | rain | 0.3 |
  • Selected joint: P(ten,Y)
    • A slice of the joint distribution
    • Entries P(x,y) for stock-still x, all y
    • Sums to P(10)
      • factors don't have to sum to 1

Example: P(cold, W)

| T    | W    | P   |
|------|------|-----|
| cold | sun  | 0.2 |
| cold | rain | 0.3 |
  • Number of capitals = dimensionality of the table
    • P(T,W) is a 2D array
    • P(cold,W) is a 1D array
    • P(cold,sun) is still a joint probability, but now it's a 0-D object: a scalar.
    • That's important. We can reduce the size of the data structure without changing the semantics of what's in it.
  • So as we work through the variable elimination procedure, the game will be one of trying to keep the number of capitalized variables in our factors small.

Factor Zoo II

  • Single conditional: P(Y | x)
    • Entries P(y | x) for fixed x, all y
    • Sums to 1

Example: P(W|cold)

| T    | W    | P   |
|------|------|-----|
| cold | sun  | 0.4 |
| cold | rain | 0.6 |
  • Family of conditionals: P(X | Y)
    • Multiple conditionals
    • Entries P(x | y) for all x, y
    • Sums to |Y|
    • a little weird. This is a set of distributions over X, one for every value of y, not a fixed y. It's got a value for every value of x and every value of y. If you add them up, each little distribution over X adds up to 1, and there are as many of them as there are values of y. So this whole thing together sums to more than 1.

Example: P(W|T)

| T    | W    | P   |
|------|------|-----|
| hot  | sun  | 0.8 |
| hot  | rain | 0.2 |
| cold | sun  | 0.4 |
| cold | rain | 0.6 |

Factor Zoo III

Now we get to the weird stuff.

  • Specified family: P( y | X )
    • Entries P(y | x) for fixed y, but for all x
    • Sums to … who knows!

Example: P(rain | T)

| T    | W    | P   |
|------|------|-----|
| hot  | rain | 0.2 |
| cold | rain | 0.6 |
  • Information technology'south not a distribution over rain and sun, it's a bunch of probability of rains.
    • This is no longer a distribution. It's no longer a family of distributions. It's a 1D array, each of those entries is the conditional probability of some particular value for Y, which is unremarkably an prove value.
  • How do nosotros get these things. Nosotros get these things because somebody hands us little pieces of Bayes' Internet. Nosotros're going to selecting the value of the bear witness, and and so nosotros're gonna to kickoff multiplying things together. So nosotros're going to get all kinds of stuff.

Factor Zoo Summary

  • In general, when we write P(Y₁ … Yₙ | X₁ … Xₘ)
    • It is a "factor," a multi-dimensional array
    • Its values are P(y₁ … yₙ | x₁ … xₘ)
    • Any assigned (= lower-case) X or Y is a dimension missing (selected) from the array
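The "factor = multi-dimensional array" view is easy to see with a small sketch; the numbers are the P(T,W) table from Factor Zoo I, and the numpy encoding and axis order are assumptions of this illustration:

```python
import numpy as np

# The factor P(T,W) as a 2D array; selecting a value drops a dimension.
T_vals, W_vals = ['hot', 'cold'], ['sun', 'rain']
P_TW = np.array([[0.4, 0.1],    # rows: T, columns: W
                 [0.2, 0.3]])

P_cold_W = P_TW[T_vals.index('cold'), :]     # select T=cold: now 1D
p_cold_sun = P_cold_W[W_vals.index('sun')]   # select W=sun too: 0-D, a scalar
print(P_TW.shape, P_cold_W.shape, p_cold_sun)   # (2, 2) (2,) 0.2
```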

Example: Traffic Domain

  • R → T → L
  • Random Variables
    • R: Raining
    • T: Traffic
    • L: Late for class!

P(R)

| R  | P   |
|----|-----|
| +r | 0.1 |
| -r | 0.9 |

P(T|R), family of conditional distributions

| R  | T  | P   |
|----|----|-----|
| +r | +t | 0.8 |
| +r | -t | 0.2 |
| -r | +t | 0.1 |
| -r | -t | 0.9 |

P(L|T)

| T  | L  | P   |
|----|----|-----|
| +t | +l | 0.3 |
| +t | -l | 0.7 |
| -t | +l | 0.1 |
| -t | -l | 0.9 |

  • query: P(L) = ?
    • by inference by enumeration:
    • P(L) = ∑r,t P(r,t,L)
    • = ∑r,t P(r)P(t|r)P(L|t)
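A quick numeric check of this query against the tables above (the dictionary encoding is just for illustration):

```python
# P(L) by enumeration in the R -> T -> L chain, using the CPTs above.
P_R = {'+r': 0.1, '-r': 0.9}
P_T = {('+r', '+t'): 0.8, ('+r', '-t'): 0.2,
       ('-r', '+t'): 0.1, ('-r', '-t'): 0.9}   # P(t | r)
P_L = {('+t', '+l'): 0.3, ('+t', '-l'): 0.7,
       ('-t', '+l'): 0.1, ('-t', '-l'): 0.9}   # P(l | t)

P_L_marginal = {
    l: sum(P_R[r] * P_T[(r, t)] * P_L[(t, l)]
           for r in P_R for t in ['+t', '-t'])
    for l in ['+l', '-l']
}
print(P_L_marginal)  # {'+l': 0.134, '-l': 0.866}
```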

Inference by Enumeration: Procedural Outline

Operation 1: Join Factors

  • First basic operation: joining factors
  • Combining factors:
    • Just like a database join
    • Get all factors over the joining variable
    • Build a new factor over the union of the variables involved
  • Example: Join on R
  • Computation for each entry: pointwise products
    • ∀ r,t : P(r,t) = P(r)·P(t|r)
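A minimal sketch of the join as a pointwise product, using the P(R) and P(T|R) tables from the traffic domain:

```python
# Joining P(R) and P(T|R) on R yields a factor over (R, T).
P_R = {('+r',): 0.1, ('-r',): 0.9}
P_T_given_R = {('+r', '+t'): 0.8, ('+r', '-t'): 0.2,
               ('-r', '+t'): 0.1, ('-r', '-t'): 0.9}

# Pointwise product: for every (r, t), multiply the matching entries.
P_RT = {(r, t): P_R[(r,)] * p for (r, t), p in P_T_given_R.items()}
print(P_RT)
# {('+r','+t'): 0.08, ('+r','-t'): 0.02, ('-r','+t'): 0.09, ('-r','-t'): 0.81}
```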

Example: Multiple Joins

  • "Join on R" means that you grab all tables that have R in them.

Operation 2: Eliminate

  • Second basic operation: marginalization
  • Take a factor and sum out a variable
    • Shrinks a factor to a smaller one
    • A projection operation
      • get rid of the variables that don't matter -- the hidden variables
      • why do we even have hidden variables?
        • the reason we have them is that we started with a joint distribution over more variables than appear in our query.
  • Example: see the sketch below
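A minimal sketch of summing out R from the joined factor P(R,T) computed in the join sketch above:

```python
# Eliminate (sum out) R: collapse P(R, T) down to the marginal P(T).
P_RT = {('+r', '+t'): 0.08, ('+r', '-t'): 0.02,
        ('-r', '+t'): 0.09, ('-r', '-t'): 0.81}

P_T = {}
for (r, t), p in P_RT.items():
    P_T[t] = P_T.get(t, 0.0) + p   # sum over all values of R
print(P_T)  # {'+t': 0.17, '-t': 0.83}
```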

Multiple Elimination

Thus Far: Multiple Join, Multiple Eliminate (= Inference by Enumeration)

Marginalizing Early (= Variable Elimination)

  • switch the order of the join/eliminate operations
  • intuition
    • if you want to eliminate a variable, you cannot do this until you have joined on that variable.

Example: Traffic Domain again

  • R → T → L
  • P(L) = ?

Marginalizing Early! (aka VE)

  • P(L) = ?

  • so that is variable elimination.
  • what if there is evidence?
    • just like with inference by enumeration: when there is evidence, you look at your tables and only retain those entries consistent with your evidence.
  • Q: why sum out R in the 1st step?
    • Really we are eliminating a hidden variable; we are hunting down the hidden variables.

Evidence

  • If there is evidence, start with factors that select that evidence
  • We eliminate all vars other than query + evidence
  • Result will be a selected joint of query and evidence
    • E.g. for P(L | +r), we would end up with the factor P(+r, L); from the tables above, its entries are (+r, +l): 0.026 and (+r, -l): 0.074
  • To get our answer, just normalize this!
  • That's it!

General Variable Elimination

  • Query: P( Q | E₁=e₁, …, Eₖ=eₖ )
  • Start with initial factors:
    • Local CPTs (but instantiated by evidence)
  • While there are still hidden variables (not Q or evidence):
    • Pick a hidden variable H
      • any ordering of hidden variables is valid
      • but some orderings will lead to very big factors being generated along the way
      • and some orderings might be able to keep the factors generated along the way very small
    • Join all factors mentioning H
    • Eliminate (sum out) H
  • Join all remaining factors and normalize
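The loop above is small enough to sketch in full. This is a toy illustration under two assumptions, binary domains and evidence already folded into the initial factors; it is not the course's reference implementation:

```python
from itertools import product

# A factor is a (vars, table) pair: vars is a tuple of variable names,
# table maps full assignment tuples to numbers. Domains are assumed {0, 1}.
DOMAIN = (0, 1)

def join(factors):
    """Pointwise product of factors, over the union of their variables."""
    vars_ = tuple(dict.fromkeys(v for f_vars, _ in factors for v in f_vars))
    table = {}
    for assign in product(DOMAIN, repeat=len(vars_)):
        env = dict(zip(vars_, assign))
        p = 1.0
        for f_vars, f_table in factors:
            p *= f_table[tuple(env[v] for v in f_vars)]
        table[assign] = p
    return vars_, table

def eliminate(factor, var):
    """Sum out one variable, shrinking the factor by one dimension."""
    f_vars, f_table = factor
    i = f_vars.index(var)
    new_vars = f_vars[:i] + f_vars[i + 1:]
    new_table = {}
    for assign, p in f_table.items():
        key = assign[:i] + assign[i + 1:]
        new_table[key] = new_table.get(key, 0.0) + p
    return new_vars, new_table

def variable_elimination(factors, hidden_order):
    """Join all factors mentioning H, sum H out; repeat, then normalize."""
    factors = list(factors)
    for h in hidden_order:
        mentioning = [f for f in factors if h in f[0]]
        factors = [f for f in factors if h not in f[0]]
        factors.append(eliminate(join(mentioning), h))
    vars_, table = join(factors)
    z = sum(table.values())
    return vars_, {a: p / z for a, p in table.items()}

# Usage on the R -> T -> L traffic chain: P(L), with R and T hidden (1=+, 0=-).
f_R = (('R',), {(1,): 0.1, (0,): 0.9})
f_TR = (('R', 'T'), {(1, 1): 0.8, (1, 0): 0.2, (0, 1): 0.1, (0, 0): 0.9})
f_LT = (('T', 'L'), {(1, 1): 0.3, (1, 0): 0.7, (0, 1): 0.1, (0, 0): 0.9})
print(variable_elimination([f_R, f_TR, f_LT], ['R', 'T']))
# (('L',), {(1,): 0.134, (0,): 0.866})
```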

Example

  • P(B|j,m) ∝ P(B,j,m)
  • Somebody comes along and says: hey, I want the probability of burglary given that John and Mary are both calling.
    • after introducing evidence, we have the following factors:
    • P(B) , P(E) , P(A|B,E) , P(j|A) , P(m|A)

  • Choose A
    • the size of the tables generated along the way is 2³; in this example it wasn't too bad, it's possible to handle. But if you have a very big BN you need to be careful about your ordering to make sure you keep it low.
    • Now these 3 things, P(A|B,E) , P(j|A) , P(m|A), have been replaced by P(j,m|B,E), and we've got:
      • P(B) , P(E) , P(j,m|B,E)
  • Choose E
    • P(B) , P(j,m|B)
  • Finish with B
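For concreteness, here is the same query run through the `variable_elimination` sketch from the previous section, assuming the standard alarm-network CPT numbers (1 = +, 0 = -):

```python
# Evidence +j, +m is folded in by keeping only the matching rows of
# P(j|A) and P(m|A); then A and E are eliminated in order.
f_B = (('B',), {(1,): 0.001, (0,): 0.999})
f_E = (('E',), {(1,): 0.002, (0,): 0.998})
f_A = (('B', 'E', 'A'), {(1, 1, 1): 0.95, (1, 1, 0): 0.05,
                         (1, 0, 1): 0.94, (1, 0, 0): 0.06,
                         (0, 1, 1): 0.29, (0, 1, 0): 0.71,
                         (0, 0, 1): 0.001, (0, 0, 0): 0.999})
f_j = (('A',), {(1,): 0.9, (0,): 0.05})   # P(+j | A): evidence row selected
f_m = (('A',), {(1,): 0.7, (0,): 0.01})   # P(+m | A): evidence row selected
print(variable_elimination([f_B, f_E, f_A, f_j, f_m], ['A', 'E']))
# (('B',), {(1,): ≈0.284, (0,): ≈0.716})
```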

Same Example in Equations

  • Do not process the slide! Don't do it. The slide is just saying that instead of working with these factors, I could have worked all that out using products, and sums, and laws of distribution, and all of that.

  • equations from top to bottom

    1. the marginal can be obtained from the joint by summing out
    2. use the Bayes' net joint distribution expression
    3. use x·(y+z) = xy + xz
    4. joining on a, and then summing out, gives f₁
    5. use x·(y+z) = xy + xz
    6. joining on e, and then summing out, gives f₂
  • All we are doing is exploiting uwy + uwz + uxy + uxz + vwy + vwz + vxy + vxz = (u+v)(w+x)(y+z) to improve computational efficiency! (see the quick check after this list)

  • how do you decide which variables to pick first?

    • the suggestion here was: a variable with very few connections. A connection means that it participates in a factor.
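A quick numeric sanity check of that identity (the values below are arbitrary):

```python
# Distributivity: the factored form needs far fewer operations.
u, v, w, x, y, z = 0.2, 0.8, 0.3, 0.7, 0.4, 0.6
lhs = u*w*y + u*w*z + u*x*y + u*x*z + v*w*y + v*w*z + v*x*y + v*x*z
rhs = (u + v) * (w + x) * (y + z)   # 2 multiplies instead of 16
assert abs(lhs - rhs) < 1e-12
print(lhs, rhs)                     # both 1.0 for these values
```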

Another Variable Elimination Example

  • Query: P(X₃ | Y₁=y₁, Y₂=y₂, Y₃=y₃)
  • Start by inserting evidence, which gives the following initial factors:
    • p(Z), p(X₁|Z), p(X₂|Z), p(X₃|Z), p(y₁|X₁), p(y₂|X₂), p(y₃|X₃)
  • Eliminate X₁; this introduces the factor f₁(Z,y₁) = ∑ₓ₁ p(x₁|Z)p(y₁|x₁) , and we are left with
    • p(Z), f₁(Z,y₁), p(X₂|Z), p(X₃|Z), p(y₂|X₂), p(y₃|X₃)
  • Eliminate X₂; this introduces the factor f₂(Z,y₂) = ∑ₓ₂ p(x₂|Z)p(y₂|x₂) , and we are left with
    • p(Z), f₁(Z,y₁), f₂(Z,y₂), p(X₃|Z), p(y₃|X₃)
  • Eliminate Z; this introduces the factor f₃(y₁,y₂,X₃) = ∑_z p(z)f₁(z,y₁)f₂(z,y₂)p(X₃|z) , and we are left with:
    • p(y₃|X₃), f₃(y₁,y₂,X₃)
    • quiz: why is there a factor p(X₃|z)? why not other forms, like p(X₃,z) or p(Z,y)?
      • X₃ is the query variable; the query variable doesn't go through the elimination process the way a hidden variable does
  • No hidden variables left. Join the remaining factors to get
    • f₄(y₁,y₂,y₃,X₃) = p(y₃|X₃)·f₃(y₁,y₂,X₃)
  • Normalizing over X₃ gives P(X₃|y₁,y₂,y₃)

Variable Elimination Ordering

  • For the query P(Xₙ|y₁,…,yₙ), work through the following two different orderings, as done in the previous slide: Z, X₁, …, Xₙ₋₁ and X₁, …, Xₙ₋₁, Z. What is the size of the maximum factor generated for each of the orderings?
  • Answer: 2ⁿ⁺¹ versus 2² (assuming binary variables)
  • In general: the ordering can greatly affect efficiency.

  • starting from X₁, we get f₁(Z,y₁); then X₂, we get f₂(Z,y₂); then … fₙ₋₁(Z,yₙ₋₁)
    • not Xₙ: Xₙ is the query variable
  • now we eliminate Z and get f(y₁, …, yₙ₋₁, Xₙ); the join here involves only two unassigned variables (Z and Xₙ), hence the 2² bound

VE: Computational and Space Complexity

  • The computational and space complexity of variable elimination is determined by the largest factor
  • The elimination ordering can greatly affect the size of the largest factor.
    • E.g., previous slide's example: 2ⁿ vs. 2
  • Does there always exist an ordering that only results in small factors?
    • No!
    • There are BNs, and we'll see one in the next slides, where no matter which ordering you pick, it's going to generate large factors along the way.

Worst Case Complexity?

  • CSP
    • a 3-SAT problem, a special kind of CSP
    • there are 7 variables, X₁ … X₇; I want to find an assignment to these 7 variables such that the whole formula is true
      • the formula is saying: x₁ or x₂ or not x₃ has to be true, and not x₁ or x₃ or not x₄ has to be true, and … so forth
    • P(Xᵢ=0) = P(Xᵢ=1) = 0.5
    • Y₁ = X₁ ∨ X₂ ∨ ¬X₃ , … , Y₈ = ¬X₅ ∨ X₆ ∨ X₇
    • Y₁,₂ = Y₁ ∧ Y₂ , … , Y₇,₈ = Y₇ ∧ Y₈
    • Y₁,₂,₃,₄ = Y₁,₂ ∧ Y₃,₄ , Y₅,₆,₇,₈ = Y₅,₆ ∧ Y₇,₈
    • Z = Y₁,₂,₃,₄ ∧ Y₅,₆,₇,₈
  • why do we use so many Y variables here?
    • a BN in which a variable has many parents is a large BN; it's not very compact
    • those Y variables give us a BN where every variable has at most 3 parents, so it's a very small BN.
  • If we can answer whether P(z) is equal to zero or not, we have answered whether the 3-SAT problem has a solution.
  • Hence inference in Bayes' nets is NP-hard. There is no known efficient probabilistic inference method in general.

Polytrees

There are certain special graph structures of BNs where inference can be done efficiently. One example is the polytree.

  • A polytree is a directed graph with no undirected cycles
    • so a directed acyclic graph can still have undirected cycles.
    • but if you don't allow those undirected cycles, you have a polytree.
  • For polytrees you can always find an ordering that is efficient
    • Try it!!
  • Cut-set conditioning for Bayes' net inference
    • Choose a set of variables such that, if they are removed, only a polytree remains
    • Exercise: Think about how the specifics would work out!

Quiz

  • Alarm Network

Quiz BN2-2

  • BNs

  • Query: P(C | e=1)

  • we have the following probability tables

  • Step 1: eliminate A

    • so we get the factor for A→B : f₁(B) = ∑ₐ P(a)P(B|a)

| B | A | P(A,B) |
|---|---|--------|
| 0 | 0 | 0.35   |
| 1 | 0 | 0.35   |
| 0 | 1 | 0.27   |
| 1 | 1 | 0.03   |

sum out A, we get

| B | f₁(B) |
|---|-------|
| 0 | 0.62  |
| 1 | 0.38  |
  • Step 2: eliminate B.
    • so we get the factor for C←B→D : f₂(C,D) = ∑_b P(C|b)P(D|b)f₁(b)

| B | C | D | f₁(B)·P(C∣B)·P(D∣B) |
|---|---|---|---------------------|
| 0 | 0 | 0 | 0.155  |
| 0 | 1 | 0 | 0.155  |
| 0 | 0 | 1 | 0.155  |
| 0 | 1 | 1 | 0.155  |
| 1 | 0 | 0 | 0.0076 |
| 1 | 1 | 0 | 0.0684 |
| 1 | 0 | 1 | 0.0304 |
| 1 | 1 | 1 | 0.2736 |

sum out B, we get

| C | D | f₂(C,D) |
|---|---|---------|
| 0 | 0 | 0.1626  |
| 1 | 0 | 0.2234  |
| 0 | 1 | 0.1854  |
| 1 | 1 | 0.4286  |
  • Step 3: eliminate D.
    • so we get the factor for C→E←D : f₃(C,e=1) = ∑_d P(e=1|C,d)f₂(C,d)

| C | D | e=1 | P(e=1∣C,D)·f₂(C,D) |
|---|---|-----|--------------------|
| 0 | 0 | 1 | 0.900·0.1626 |
| 1 | 0 | 1 | 0.600·0.2234 |
| 0 | 1 | 1 | 0.700·0.1854 |
| 1 | 1 | 1 | 0.500·0.4286 |

sum out D, we get

| C | f₃(C,e=1) |
|---|-----------|
| 0 | 0.27612   |
| 1 | 0.34834   |

  • After getting the final factor f₃(C,e=1), a final renormalization step needs to be carried out to obtain the conditional probability P(C|e=1).

| C | P(C∣e=1)   |
|---|------------|
| 0 | 0.44217404 |
| 1 | 0.55782596 |
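The steps above can be replayed in a few lines, starting from the intermediate tables listed in the quiz (the underlying CPTs are on the quiz slide and not reproduced here):

```python
# Replaying quiz BN2-2's elimination steps numerically.
f1 = {0: 0.62, 1: 0.38}                     # after eliminating A

joined = {  # f1(b) * P(C|b) * P(D|b): (b, c, d) -> value, from the step-2 table
    (0, 0, 0): 0.155, (0, 1, 0): 0.155, (0, 0, 1): 0.155, (0, 1, 1): 0.155,
    (1, 0, 0): 0.0076, (1, 1, 0): 0.0684, (1, 0, 1): 0.0304, (1, 1, 1): 0.2736,
}
f2 = {}
for (b, c, d), p in joined.items():         # sum out B
    f2[(c, d)] = f2.get((c, d), 0.0) + p

P_e1 = {(0, 0): 0.9, (1, 0): 0.6, (0, 1): 0.7, (1, 1): 0.5}   # P(e=1 | C, D)
f3 = {}
for (c, d), p in f2.items():                # sum out D
    f3[c] = f3.get(c, 0.0) + P_e1[(c, d)] * p

Z = sum(f3.values())
print({c: p / Z for c, p in f3.items()})
# {0: 0.44217..., 1: 0.55782...}
```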

Quiz BN2-3

  • BNs

  • Query: P(A|+f)

  • we run variable elimination with the following ordering: E, D, C, B.

  • After introducing evidence, we have the following factors:

    • P(A) , P(B|A), P(C|A,B), P(D|C), P(E|C), P(+f|E,D)
  • Step 1:

    • After joining on E and summing out over E, we have generated a new factor f₁ over the following variables and/or evidence
      • C, D, +f
        • the factors containing variable E are P(E|C) and P(+f|E,D)
        • together they contain the variables C, D, E, +f
        • and note E is not included after summing it out
    • After this step, the remaining factors are:
      • P(A) , P(B|A), P(C|A,B), P(D|C), f₁
        • P(E|C) and P(+f|E,D), which contain variable E, are not available any more
        • and the new factor f₁ comes in
  • Step 2:

    • After joining on D and summing out over D, we have generated a new factor f₂ over the following variables and/or evidence
      • C, +f
    • After this step, the remaining factors are:
      • P(A) , P(B|A), P(C|A,B), f₂
        • P(D|C) is not available now
        • f₁ was consumed
        • f₂ comes in
  • Step 3:

    • After joining on C and summing out over C, we have generated a new factor f₃ over the following variables and/or evidence
      • A, B, +f
    • After this step, the remaining factors are:
      • P(A) , P(B|A), f₃
  • Step 4: After joining on B and summing out over B, we have generated a new factor f₄ over

    • A, +f
    • and the remaining factors are: P(A) , f₄


Source: https://github.com/mebusy/notes/blob/master/dev_notes/AI_CS188_BNs_Inference.md
