The first sum equals zero, and the second equals $-\operatorname{Var}X_i/(N-1)$. Hence, $\operatorname{Cov}(X_i, X_j) = -(N+1)/12$.
(iii)
$$\operatorname{Var}Y = n\,\frac{(N+1)(N-1)}{12} - n(n-1)\,\frac{N+1}{12} = \frac{n(N+1)(N-n)}{12}.$$

1.7 Branching processes

Life is a school of probability.
W. Bagehot (1826–1877), British economist

Branching processes are a fine chapter of probability theory. Historically, the concept
of a branching process was conceived to calculate the survival probabilities of noble
families. The name of W.R. Hamilton (1805–1865), the famous Irish mathematician,
should be mentioned here, as well as F. Galton (1822–1911), the English scientist and
explorer, and H.W. Watson (1827–1903), the English mathematician. Since the 1940s
branching processes have been used extensively in the natural sciences, in particular to
calculate the products of nuclear fission (physics) and the size of populations (biology). Later
they found powerful applications in computer science (algorithms on logical trees) and
other disciplines.
The model giving rise to a branching process is simple and elegant. Initially, we have
an item (a particle or a biological organism) that produces a random number of 'offspring',
each of which produces a random number of offspring, and so on. This generates a 'tree-like'
structure where a descendant has a link to the parent and a number of links to its
own offspring. See Figure 1.7.
Each site of the emerging (random) tree has a path that joins it with the ultimate
ancestor (called the origin, or the root of the tree). The length of the path, which is equal
to the number of links in it, measures the number of generations behind the given site
(and the item it represents). Each site gives rise to a subtree that grows from it (for some
sites there may be no continuation, when the number of offspring is zero).
The main assumption is that the process carries on with maximum independence and
homogeneity: the number of offspring produced from a given parent is independent of the
numbers related to other sites.




Figure 1.7

More precisely, we consider RVs $X_0, X_1, X_2, \dots$, where
$X_n$ gives the size of the population in the $n$th generation. That is,
$X_0 = 1$,
$X_1$ = the number of offspring after the 1st fission,
$X_2$ = the number of offspring after the 2nd fission,
etc.
RVs $X_n$ and $X_{n+1}$ are related by the following recursion:
$$X_{n+1} = \sum_{i=1}^{X_n} Y_i^{(n)}, \qquad (1.84)$$
where $Y_i^{(n)}$ is the number of descendants produced by the $i$th member of the $n$th generation.
The RVs $Y_i^{(n)}$ are supposed to be IID, and their common distribution determines the branching
process.
The first important exercise is to calculate the mean value $\mathbb{E}X_n$, i.e. the expected size
of the $n$th generation. By using the conditional expectation,
$$\mathbb{E}X_n = \sum_m \mathbb{P}(X_{n-1} = m)\,\mathbb{E}\left(X_n \mid X_{n-1} = m\right)
= \sum_m \mathbb{P}(X_{n-1} = m)\,\mathbb{E}\sum_{i=1}^{m} Y_i^{(n-1)}$$
$$= \sum_m \mathbb{P}(X_{n-1} = m)\, m\,\mathbb{E}Y_1^{(n-1)} = \mathbb{E}Y_1^{(n-1)}\,\mathbb{E}X_{n-1}. \qquad (1.85)$$

The value $\mathbb{E}Y_i^{(k)}$ does not depend on $k$ and $i$, and we denote it by $\mathbb{E}Y$ for short. Then,
recurrently,
$$\mathbb{E}X_1 = \mathbb{E}Y, \quad \mathbb{E}X_2 = (\mathbb{E}Y)^2, \ \dots,\ \mathbb{E}X_n = (\mathbb{E}Y)^n. \qquad (1.86)$$
We see that if $\mathbb{E}Y < 1$, $\mathbb{E}X_n \to 0$ with $n$, i.e. the process eventually dies out. This case is
often referred to as subcritical. On the contrary, if $\mathbb{E}Y > 1$ (a supercritical process), then
$\mathbb{E}X_n \to \infty$. The borderline case $\mathbb{E}Y = 1$ is called critical.

Remark In formula (1.86) we did not use the independence assumption.
A convenient characteristic is the common PGF $\phi(s) = \mathbb{E}s^{Y_i^{(n)}}$ of the RVs $Y_i^{(n)}$ (again it does
not depend on $n$ and $i$). Here, an important fact is that if $\phi_n(s) = \mathbb{E}s^{X_n}$ is the PGF of the
size of the $n$th generation, then $\phi_1(s) = \phi(s)$ and, recursively,
$$\phi_{n+1}(s) = \phi_n(\phi(s)), \qquad n \ge 1. \qquad (1.87)$$

See Problems 1.72 and 1.80. In other words,
$$\phi_n(s) = \underbrace{\phi(\phi(\cdots\phi(s)\cdots))}_{n \text{ times}}, \qquad (1.88)$$
where $\phi(\phi(\cdots))$ stands for the iteration of the map $s \mapsto \phi(s)$. In particular,
$\phi_{n+1}(s) = \phi(\phi_n(s))$.

This construction leads to an interesting analysis of extinction probabilities
$$\pi_n = \mathbb{P}(X_n = 0) = \phi_n(0). \qquad (1.89)$$

As $\pi_{n+1} = \phi(\pi_n)$, intuitively we would expect the limit $\pi = \lim_{n\to\infty}\pi_n$ to be a fixed point
of the map $s \mapsto \phi(s)$, i.e. a solution to $z = \phi(z)$. One such point is 1 (as $\phi(1) = 1$), but there
may be another solution lying between 0 and 1. An important fact here is that the function
$\phi$ is convex, and the value $\phi(0)$ is between 0 and 1. Then if $\phi'(1) > 1$, there will be
a root of the equation $z = \phi(z)$ in $(0,1)$. Otherwise, $z = 1$ will be the smallest positive root.
See Figure 1.8.

In fact, it is not difficult to check that the limiting extinction probability exists and
$$\pi = \text{the least non-negative solution to } z = \phi(z). \qquad (1.90)$$
Indeed, if there exists $z \in (0,1)$ with $\phi(z) = z$, then it is unique, and also $0 < z < 1$ and
$0 < \phi(0) = \mathbb{P}(Y = 0) < z$ (as $\phi$ is convex and $\phi(1) = 1$). Then $0 < \phi(0) < \phi(\phi(0)) < \cdots < z$
because $\phi$ is monotone (and $\phi(z) = z$). The sequence $\pi_n = \phi_n(0)$ must then have a limit
which, by continuity, coincides with $z$.
If the least non-negative fixed point is $z = 1$, then the above analysis can be repeated
without changes, yielding that $\pi = 1$. We conclude that if $\mathbb{P}(Y = 0) > 0$, then $\pi > 0$
(actually, $\pi > \mathbb{P}(Y = 0)$). On the other hand, if $\phi(0) = 0$ (i.e. $\mathbb{P}(Y = 0) = 0$), then, trivially,
$\pi = 0$. This establishes equation (1.90). We see that even in the supercritical case (with
$\phi'(1) > 1$) the limiting extinction probability can be arbitrarily close to 1.
A slight modification of the above construction arises when we initially have several
items (possibly a random number $X_0$).
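
The fixed-point characterisation (1.90) translates directly into a computation: iterate $\pi_{n+1} = \phi(\pi_n)$ from $\pi_0 = 0$ and the sequence climbs to the least non-negative root. A minimal Python sketch; the offspring distribution here is a hypothetical illustration, not one from the problems below:

```python
import random

# Hypothetical offspring distribution: P(Y=0)=0.2, P(Y=1)=0.3, P(Y=2)=0.5.
# Supercritical, since E Y = 0.3 + 2*0.5 = 1.3 > 1.
pmf = {0: 0.2, 1: 0.3, 2: 0.5}

def phi(s):
    """Offspring PGF: phi(s) = sum_k p_k s^k."""
    return sum(p * s**k for k, p in pmf.items())

# Extinction probability: iterate pi_{n+1} = phi(pi_n) from pi_0 = 0, cf. (1.89).
pi = 0.0
for _ in range(200):
    pi = phi(pi)
print(pi)  # ~0.4, the least non-negative root of z = phi(z)

# Cross-check by simulating the recursion (1.84) itself.
def extinct(max_gen=100, cap=10_000):
    x = 1
    for _ in range(max_gen):
        if x == 0:
            return True
        if x > cap:        # population so large it effectively never dies out
            return False
        x = sum(random.choices(list(pmf), weights=list(pmf.values()), k=x))
    return x == 0

trials = 20_000
print(sum(extinct() for _ in range(trials)) / trials)  # ~0.4
```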

Problem 1.72 In a branching process every individual has probability $p_k$ of producing
exactly $k$ offspring, $k = 0, 1, \dots$, and the individuals of each generation produce offspring
independently of each other and of individuals in preceding generations. Let $X_n$ represent
the size of the $n$th generation. Assume that $X_0 = 1$ and $p_0 > 0$ and let $\phi_n(s)$ be the PGF
of $X_n$. Thus
$$\phi_1(s) = \mathbb{E}s^{X_1} = \sum_{k=0}^{\infty} p_k s^k.$$



Figure 1.8 The curves $y = z$ and $y = \phi(z)$; the extinction probability is the smallest root of $z = \phi(z)$ in $(0, 1]$.

(i) Prove that
$$\phi_{n+1}(s) = \phi_n(\phi_1(s)).$$

(ii) Prove that for $n < m$
$$\mathbb{E}\left(s^{X_n} \mid X_m = 0\right) = \frac{\phi_n\!\left(s\,\phi_{m-n}(0)\right)}{\phi_m(0)}.$$


Solution (i) By definition,
$$\phi_{n+1}(s) = \mathbb{E}s^{X_{n+1}} = \sum_{k=0}^{\infty} \mathbb{P}(X_{n+1} = k)\, s^k,$$
where $X_n$ denotes the size of the $n$th generation. We write
$$\mathbb{E}s^{X_{n+1}} = \sum_l p_l^{(n)}\, \mathbb{E}\left(s^{X_{n+1}} \mid X_n = l\right).$$
Here $p_l^{(n)} = \mathbb{P}(X_n = l)$, and the conditional expectation is
$$\mathbb{E}\left(s^{X_{n+1}} \mid X_n = l\right) = \sum_k \mathbb{P}(X_{n+1} = k \mid X_n = l)\, s^k.$$

Now observe that
$$\mathbb{E}\left(s^{X_{n+1}} \mid X_n = l\right) = \left(\mathbb{E}s^{X_1}\right)^l = \phi_1(s)^l,$$
because (a) under the condition that $X_n = l$,
$$X_{n+1} = \sum_{j=1}^{l} \widetilde{X}_j,$$
where $\widetilde{X}_j$ is the number of offspring produced by the $j$th individual of the $n$th generation,
and (b) all the $\widetilde{X}_j$ are IID and $\mathbb{E}\left(s^{\widetilde{X}_j} \mid X_n = l\right) = \mathbb{E}s^{X_1} = \phi_1(s)$. This relation yields
$$\mathbb{E}s^{X_{n+1}} = \sum_l p_l^{(n)}\, \phi_1(s)^l = \phi_n(\phi_1(s)).$$

(ii) Denote by $I_0^{(m)}$ the indicator $\mathbf{1}(X_m = 0)$. Then $\mathbb{E}I_0^{(m)} = \mathbb{P}\left(I_0^{(m)} = 1\right) = \mathbb{P}(X_m = 0) = \phi_m(0)$. Furthermore,
$$\mathbb{E}\left(s^{X_n} \mid X_m = 0\right) = \mathbb{E}\left[s^{X_n} I_0^{(m)}\right] \big/ \phi_m(0).$$
Hence, it suffices to check that
$$\mathbb{E}\left[s^{X_n} I_0^{(m)}\right] = \phi_n\!\left(s\,\phi_{m-n}(0)\right).$$

Indeed,
$$\mathbb{E}\left[s^{X_n} I_0^{(m)}\right] = \sum_k \mathbb{P}(X_n = k,\, X_m = 0)\, s^k = \sum_k \mathbb{P}(X_m = 0 \mid X_n = k)\, \mathbb{P}(X_n = k)\, s^k.$$
Now, since $\mathbb{P}(X_m = 0 \mid X_n = k) = \left[\phi_{m-n}(0)\right]^k$,
$$\mathbb{E}\left[s^{X_n} I_0^{(m)}\right] = \sum_k \mathbb{P}(X_n = k) \left(s\,\phi_{m-n}(0)\right)^k = \phi_n\!\left(s\,\phi_{m-n}(0)\right).$$
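
Part (i) is easy to sanity-check numerically: build $\phi_n$ by composing $\phi_1$ with itself and compare with a Monte Carlo estimate of $\mathbb{E}s^{X_n}$. A small sketch; the offspring probabilities are again an illustrative assumption:

```python
import random

p = [0.25, 0.45, 0.3]  # hypothetical p_0, p_1, p_2

def phi1(s):
    return sum(pk * s**k for k, pk in enumerate(p))

def phi(n, s):
    """phi_n(s): since phi_{n+1} = phi_n o phi_1, iterate s -> phi_1(s) n times."""
    for _ in range(n):
        s = phi1(s)
    return s

def sample_Xn(n):
    x = 1
    for _ in range(n):
        x = sum(random.choices(range(len(p)), weights=p, k=x)) if x else 0
    return x

n, s, trials = 4, 0.7, 50_000
mc = sum(s ** sample_Xn(n) for _ in range(trials)) / trials
print(phi(n, s), mc)  # should agree to roughly two decimal places
```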


Problem 1.73 A laboratory keeps a population of aphids. The probability of an aphid
passing a day uneventfully is $q < 1$. Given that a day is not uneventful, there is a probability
$r$ that the aphid will have one offspring, a probability $s$ that it will have two offspring,
and a probability $t$ that it will die of exhaustion, where $r + s + t = 1$. Offspring are ready
to reproduce the next day. The fates of different aphids are independent, as are the events
of different days. The laboratory starts out with one aphid.
Let $X_n$ be the number of aphids at the end of $n$ days. Show how to obtain an expression
for the PGF $\phi_n(z)$ of $X_n$. What is the expected value of $X_n$?
Show that the probability of extinction does not depend on $q$ and that if $2r + 3s \le 1$,
the aphids will certainly die out. Find the probability of extinction if $r = 1/5$, $s = 2/5$ and
$t = 2/5$.

Solution Denote by $\phi_{X_1}(z)$ the PGF of $X_1$, i.e. of the number of aphids present at the
end of a single day, starting from a single aphid (including the initial aphid). Then
$$\phi_{X_1}(z) = (1-q)t + qz + (1-q)rz^2 + (1-q)sz^3, \qquad z > 0.$$
Write $\mathbb{E}X_n = (\mathbb{E}X_1)^n$ with $\mathbb{E}X_1 = \phi'_{X_1}(1) = q + 2(1-q)r + 3(1-q)s$. Indeed, $\phi_{X_n}(z) =
\phi_{X_{n-1}}(\phi_{X_1}(z))$ implies $\mathbb{E}X_n = \mathbb{E}X_{n-1}\,\mathbb{E}X_1$, or $\mathbb{E}X_n = (\mathbb{E}X_1)^n$. The probability of extinction
at the end of $n$ days is $\phi_{X_n}(0)$. It is non-decreasing with $n$ and tends to a limit as $n \to \infty$,
giving the probability of (eventual) extinction $\pi$. As we already know, $\pi < 1$ iff $\mathbb{E}X_1 > 1$.
The extinction probability $\pi$ is the minimal positive root of
$$(1-q)t + qz + (1-q)rz^2 + (1-q)sz^3 = z,$$
or, after division by $1-q$ (since $q < 1$):
$$t - z + rz^2 + sz^3 = 0.$$
The last equation does not depend on $q$; hence $\pi$ also does not depend on $q$. The condition
$\mathbb{E}X_1 \le 1$ is equivalent to $2r + 3s \le 1$. In the case $r = 1/5$, $s = 2/5$, $t = 2/5$, the equation
takes the form $2z^3 + z^2 - 5z + 2 = 0$. Dividing by $z - 1$ (as $z = 1$ is a root) one gets
the quadratic equation $2z^2 + 3z - 2 = 0$, with roots $z_\pm = (-3 \pm 5)/4$. The positive root is
$1/2$, and it gives the extinction probability.
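
The final step, finding the smallest positive root of $t - z + rz^2 + sz^3 = 0$, is a one-liner to confirm numerically, e.g. with NumPy:

```python
import numpy as np

r, s, t = 1/5, 2/5, 2/5
# Coefficients of s*z^3 + r*z^2 - z + t, highest degree first.
roots = np.roots([s, r, -1.0, t])
print(np.sort(roots))  # [-2.0, 0.5, 1.0]; the smallest positive root, 0.5,
                       # is the extinction probability, independent of q
```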

Problem 1.74 Let $\phi(s) = 1 - p(1-s)^{\beta}$, where $0 < p < 1$ and $0 < \beta < 1$. Prove that
$\phi(s)$ is a PGF and that its iterates are
$$\phi_n(s) = 1 - p^{1+\beta+\cdots+\beta^{n-1}} (1-s)^{\beta^n}, \qquad n = 1, 2, \dots$$
Find the mean $m$ of the associated distribution and the extinction probability $\pi =
\lim_{n\to\infty} \phi_n(0)$ for a branching process with offspring distribution determined by $\phi$.

Solution The coefficients in Taylor's expansion of $\phi(s)$ are
$$a_k = \frac{d^k}{ds^k}\phi(s)\Big|_{s=0} = p\,\beta(\beta-1)\cdots(\beta-k+1)(-1)^{k-1} \ge 0, \qquad k = 1, 2, \dots, \quad a_0 = 1-p,$$
and $\phi(1) = \sum_{k\ge 0} a_k/k! = 1$. Thus, $\phi(s)$ is the PGF for the probabilities $p_k = a_k/k!$.
The second iterate, $\phi_2(s) = \phi(\phi(s))$, is of the form
$$\phi_2(s) = 1 - p\left(1 - \phi(s)\right)^{\beta} = 1 - p\,p^{\beta}(1-s)^{\beta^2} = 1 - p^{1+\beta}(1-s)^{\beta^2}.$$
Assume inductively that $\phi_k(s) = 1 - p^{1+\beta+\cdots+\beta^{k-1}}(1-s)^{\beta^k}$, $k \le n-1$. Then
$$\phi_n(s) = 1 - p\left(1 - \phi_{n-1}(s)\right)^{\beta} = 1 - p\left(p^{1+\beta+\cdots+\beta^{n-2}}(1-s)^{\beta^{n-1}}\right)^{\beta} = 1 - p^{1+\beta+\cdots+\beta^{n-1}}(1-s)^{\beta^n},$$
as required.
Finally, the mean value is $\phi'(1) = \lim_{s\to 1^-} p\beta(1-s)^{\beta-1} = +\infty$, and the extinction probability is
$$\pi = \lim_{n\to\infty} \phi_n(0) = 1 - p^{1/(1-\beta)},$$
since $1 + \beta + \cdots + \beta^{n-1} \to 1/(1-\beta)$ and $(1-0)^{\beta^n} = 1$.
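
A quick numeric sanity check of the closed form: compose $\phi$ with itself $n$ times and compare against $1 - p^{1+\beta+\cdots+\beta^{n-1}}(1-s)^{\beta^n}$ (the values of $p$, $\beta$, $n$, $s$ below are arbitrary choices):

```python
p, beta = 0.6, 0.3
phi = lambda s: 1 - p * (1 - s) ** beta

n, s = 5, 0.4
iterated = s
for _ in range(n):
    iterated = phi(iterated)

closed = 1 - p ** sum(beta**j for j in range(n)) * (1 - s) ** (beta**n)
print(iterated, closed)          # identical up to floating-point error

# Extinction probability 1 - p**(1/(1-beta))
print(1 - p ** (1 / (1 - beta)))  # ~0.518
```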


Problem 1.75 At time 0, a blood culture starts with one red cell. At the end of 1 min,
the red cell dies and is replaced by one of the following combinations, with probabilities
as indicated:
$$\text{two red cells: } \frac{1}{4}; \qquad \text{one red, one white: } \frac{2}{3}; \qquad \text{two white cells: } \frac{1}{12}.$$
Each red cell lives for 1 min and gives birth to offspring in the same way as the parent
cell. Each white cell lives for 1 min and dies without reproducing. Assume that individual
cells behave independently.
(i) At time $n + \tfrac{1}{2}$ min after the culture began, what is the probability that no white cells
have yet appeared?
(ii) What is the probability that the entire culture dies out eventually?

Solution (i) The event
$$\left\{\text{by time } n + \tfrac{1}{2} \text{ no white cells have yet appeared}\right\}$$
implies that the total number of cells equals the total number of red cells and equals $2^n$.
Then, denoting the probability of this event by $p_n$, we find that
$$p_0 = 1, \quad p_1 = \frac{1}{4}, \quad\text{and}\quad p_{n+1} = p_n\left(\frac{1}{4}\right)^{2^n}, \qquad n \ge 1,$$
whence
$$p_n = \left(\frac{1}{4}\right)^{2^{n-1}} \left(\frac{1}{4}\right)^{2^{n-2}} \cdots \left(\frac{1}{4}\right)^{2^0} = \left(\frac{1}{4}\right)^{2^n - 1}, \qquad n \ge 1.$$
(ii) White cells die within a minute, so the culture survives if and only if the red-cell
line survives; the PGF of the number of red offspring of a red cell is $\tfrac{1}{12} + \tfrac{2}{3}z + \tfrac{1}{4}z^2$.
The extinction probability $\pi$ obeys
$$\pi = \frac{1}{4}\pi^2 + \frac{2}{3}\pi + \frac{1}{12}, \quad\text{or}\quad \frac{1}{4}\pi^2 - \frac{1}{3}\pi + \frac{1}{12} = 0,$$
whence
$$\pi = \frac{2}{3} \pm \frac{1}{3}.$$
We must take $\pi = 1/3$, the smaller root.

Problem 1.76 Let $(X_n)$ be a branching process such that $X_0 = 1$, $\mathbb{E}X_1 = \mu$. If $Y_n =
X_0 + \cdots + X_n$, and for $0 \le s \le 1$
$$\psi_n(s) \equiv \mathbb{E}s^{Y_n},$$
prove that
$$\psi_{n+1}(s) = s\,\phi(\psi_n(s)),$$
where $\phi(s) \equiv \mathbb{E}s^{X_1}$. Deduce that, if $Y = \sum_{n\ge 0} X_n$, then $\psi(s) \equiv \mathbb{E}s^{Y}$ satisfies
$$\psi(s) = s\,\phi(\psi(s)), \qquad 0 \le s \le 1.$$
If $\mu < 1$, prove that $\mathbb{E}Y = (1-\mu)^{-1}$.

Solution Write $Y_{n+1} = 1 + Y_n^{(1)} + \cdots + Y_n^{(X_1)}$, where $Y_n^{(j)}$ is the total progeny, through
$n$ further generations and itself included, of individual $j$ from the first generation, so that
$Y_n^{(j)} \sim Y_n$. Then the RVs $Y_n^{(j)}$ are IID and have the PGF $\psi_n(s)$. Hence
$$\psi_{n+1}(s) = \mathbb{E}s^{Y_{n+1}} = s \sum_j \mathbb{P}(X_1 = j)\, \mathbb{E}\prod_{l=1}^{j} s^{Y_n^{(l)}} = s \sum_j \mathbb{P}(X_1 = j)\, \psi_n(s)^j = s\,\phi(\psi_n(s)).$$
The infinite series $Y = \sum_{n\ge 0} X_n$ has the PGF $\psi(s) = \lim_{n\to\infty} \psi_n(s)$. Hence, it obeys
$$\psi(s) = s\,\phi(\psi(s)).$$

By induction, $\mathbb{E}Y_n = 1 + \mu + \cdots + \mu^n$. In fact, $Y_1 = 1 + X_1$ and $\mathbb{E}Y_1 = 1 + \mu$. Assume that
the formula holds for $n \le k-1$. Then
$$\mathbb{E}Y_k = \psi_k'(1) = 1 + \phi'(\psi_{k-1}(1))\,\psi_{k-1}'(1) = 1 + \mu\left(1 + \mu + \cdots + \mu^{k-1}\right) = 1 + \mu + \mu^2 + \cdots + \mu^k,$$
which completes the induction. Therefore, $\mathbb{E}Y = (1-\mu)^{-1}$.
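
For a subcritical process the identity $\mathbb{E}Y = (1-\mu)^{-1}$ is easy to test by simulating total progeny. A sketch with a hypothetical subcritical offspring law ($\mu = 0.75$):

```python
import random

p = [0.45, 0.35, 0.2]           # hypothetical offspring pmf; mu = 0.35 + 2*0.2 = 0.75
mu = sum(k * pk for k, pk in enumerate(p))

def total_progeny():
    """Y = X_0 + X_1 + ...: total number of individuals ever born."""
    total, current = 1, 1
    while current > 0:
        current = sum(random.choices(range(3), weights=p, k=current))
        total += current
    return total

trials = 100_000
est = sum(total_progeny() for _ in range(trials)) / trials
print(est, 1 / (1 - mu))        # both ~4.0
```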

Problem 1.77 Green's disease turns your hair pink and your toenails blue but has
no other symptoms. A sufferer from Green's disease has a probability $p_n$ of infecting $n$
further uninfected individuals ($n = 0, 1, 2, 3$) but will not infect more than 3. (The standard
assumptions of elementary branching processes apply.) Write down an expression for $e$,
the expected number of individuals infected by a single sufferer.
Starting from first principles, find the probability $\pi$ that the disease will die out if,
initially, there is only one case.
Let $e_A$ and $\pi_A$ be the values of $e$ and $\pi$ when $p_0 = 2/5$, $p_1 = p_2 = 0$ and $p_3 = 3/5$. Let
$e_B$ and $\pi_B$ be the values of $e$ and $\pi$ when $p_0 = p_1 = 1/10$, $p_2 = 4/5$ and $p_3 = 0$. Show that
$e_A > e_B$ but $\pi_A > \pi_B$.

Solution The expression for $e$ is $e = p_1 + 2p_2 + 3p_3$. Let $X_j$ be the number of individuals
in the $j$th generation of the disease, with $X_0 = 1$. Assume that each sufferer dies once
passing the disease on to $n \le 3$ others. Call $\pi$ the probability that the disease dies out; then
$$\pi = \sum_k \mathbb{P}(X_1 = k)\,\pi^k = p_0 + p_1\pi + p_2\pi^2 + p_3\pi^3.$$
Direct calculation shows that $e_A = 9/5$ and $e_B = 17/10$. The value $\pi_A$ is identified as the smallest
positive root of the equation
$$0 = p_0 + (p_1 - 1)\pi + p_2\pi^2 + p_3\pi^3 = \frac{3}{5}\pi^3 - \pi + \frac{2}{5}.$$
Dividing out the root $\pi = 1$ (factorising $3\pi^3 - 5\pi + 2 = (\pi - 1)(3\pi^2 + 3\pi - 2)$) leaves
$3\pi^2 + 3\pi - 2 = 0$. Hence
$$\pi_A = \frac{-3 + \sqrt{33}}{6} \approx 0.46.$$
Similarly, $\pi_B$ is the smallest positive root of the equation
$$0 = \frac{1}{10} - \frac{9}{10}\pi + \frac{4}{5}\pi^2, \quad\text{i.e.}\quad (8\pi - 1)(\pi - 1) = 0,$$
and $\pi_B = 1/8$. So, $e_A > e_B$ and $\pi_A > \pi_B$.
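
The comparison (bigger mean, yet bigger extinction probability) can be confirmed in a few lines, here via the fixed-point iteration $\pi \leftarrow \phi(\pi)$ from the start of this section:

```python
def extinction(pmf, iters=1_000):
    """Least non-negative root of z = phi(z), via z <- phi(z) from z = 0."""
    z = 0.0
    for _ in range(iters):
        z = sum(p * z**k for k, p in enumerate(pmf))
    return z

A = [2/5, 0, 0, 3/5]            # p_0..p_3 for case A
B = [1/10, 1/10, 4/5, 0]        # p_0..p_3 for case B
mean = lambda pmf: sum(k * p for k, p in enumerate(pmf))

print(mean(A), mean(B))              # 1.8 > 1.7
print(extinction(A), extinction(B))  # ~0.4574 > 0.125
```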


Problem 1.78 Suppose that $(X_r,\ r \ge 0)$ is a branching process with $X_0 = 1$ and that
the PGF for $X_1$ is $\phi(s)$. Establish an iterative formula for the PGF $\phi_r(s)$ for $X_r$. State a
result in terms of $\phi(s)$ about the probability of eventual extinction.

(i) Suppose the probability that an individual leaves $k$ descendants in the next generation
is $p_k = 1/2^{k+1}$, for $k \ge 0$. Show from the result you state that extinction is certain. Prove
further that
$$\phi_r(s) = \frac{r - (r-1)s}{r + 1 - rs}, \qquad r \ge 1,$$
and deduce the probability that the $r$th generation is empty.
(ii) Suppose that every individual leaves at most three descendants in the next generation,
and that the probability of leaving $k$ descendants in the next generation is
$$p_k = \binom{3}{k}\frac{1}{2^3}, \qquad k = 0, 1, 2, 3.$$
What is the probability of extinction?

Solution (i) Let $Y_i^{(n)}$ be the number of offspring of individual $i$ in generation $n$. Then
$$X_{n+1} = Y_1^{(n)} + \cdots + Y_{X_n}^{(n)}$$
and
$$\mathbb{E}s^{X_{n+1}} = \sum_{k=0}^{\infty} \mathbb{E}\left(s^{Y_1^{(n)} + \cdots + Y_k^{(n)}} \,\Big|\, X_n = k\right)\mathbb{P}(X_n = k)$$
$$= \sum_{k=0}^{\infty} \mathbb{P}(X_n = k)\left(\mathbb{E}s^{Y_1^{(n)}}\right)^k = \sum_{k=0}^{\infty} \mathbb{P}(X_n = k)\,\phi(s)^k = \phi_n(\phi(s)),$$
and so $\phi_{n+1}(s) = \phi_n(\phi(s))$. The probability of extinction is the least $s \in [0,1]$ such that
$\phi(s) = s$. Further,
$$\phi(s) = \frac{1}{2} + \frac{1}{2^2}s + \frac{1}{2^3}s^2 + \cdots = \frac{1}{2-s}.$$
Solving $s = 1/(2-s)$ yields $(s-1)^2 = 0$, i.e. $s = 1$. Hence the extinction is certain.
The formula for $\phi_r(s)$ is established by induction:
$$\phi_{r+1}(s) = \phi_r\!\left(\frac{1}{2-s}\right) = \frac{r - (r-1)/(2-s)}{r + 1 - r/(2-s)} = \frac{r + 1 - rs}{r + 2 - (r+1)s}.$$
Hence, the probability that the $r$th generation is empty is
$$\phi_r(0) = \frac{r}{r+1}.$$
(ii) Here $\phi(s) = \frac{1}{2^3}(1+s)^3$, whence, solving $2^3 s = (1+s)^3$, i.e. $(s-1)(s^2 + 4s - 1) = 0$, we get the
solutions $s = 1$, $s = \pm\sqrt{5} - 2$, and the extinction probability is $\sqrt{5} - 2 \approx 0.24$.
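
A short check of the closed form in part (i): iterate $\phi(s) = 1/(2-s)$ and compare with $(r - (r-1)s)/(r + 1 - rs)$:

```python
r, s = 6, 0.3
iterated = s
for _ in range(r):
    iterated = 1 / (2 - iterated)

closed = (r - (r - 1) * s) / (r + 1 - r * s)
print(iterated, closed)          # both ~0.86538
print(r / (r + 1))               # phi_r(0): probability that generation r is empty
```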

Problem 1.79 Consider a Galton–Watson process (i.e. the branching process where the
number of offspring is random and independent for each division), with $\mathbb{P}(X_1 = 0) = 2/5$
and $\mathbb{P}(X_1 = 2) = 3/5$. Compute the distribution of the random variable $X_2$. For generation
3 find all probabilities $\mathbb{P}(X_3 = 2k)$, $k = 0, 1, 2, 3, 4$. Find the extinction probability for
this model.

Solution The extinction probability is $\pi = 2/3$ (the smallest positive root of $z = 2/5 + 3z^2/5$),
and the PGF for $X_2$ is
$$\phi_{X_2}(s) = \frac{2}{5} + \frac{12}{125} + \frac{36}{125}s^2 + \left(\frac{3}{5}\right)^3 s^4.$$
The PGF $\phi_{X_3}(s) = \phi_{X_2}\!\left(2/5 + 3s^2/5\right)$. Then
$$\mathbb{P}(X_2 = 0) = \frac{2}{5} + \frac{12}{125}, \quad \mathbb{P}(X_2 = 2) = \frac{36}{125}, \quad \mathbb{P}(X_2 = 4) = \left(\frac{3}{5}\right)^3,$$
$$\mathbb{P}(X_3 = 0) = \frac{2}{5} + \frac{12}{125} + \frac{36}{125}\left(\frac{2}{5}\right)^2 + \left(\frac{3}{5}\right)^3\left(\frac{2}{5}\right)^4 = 0.54761\ldots,$$
$$\mathbb{P}(X_3 = 2) = \frac{2^4 \times 3^3}{5^5} + \frac{2^5 \times 3^4}{5^7} = 0.17142\ldots,$$
$$\mathbb{P}(X_3 = 4) = \frac{4 \times 3^4}{5^5} + \frac{4^2 \times 3^5}{5^7} + \frac{8 \times 3^5}{5^7} = 0.17833\ldots,$$
$$\mathbb{P}(X_3 = 6) = \frac{3^6 \times 2^3}{5^7} = 0.07465\ldots,$$
$$\mathbb{P}(X_3 = 8) = \left(\frac{3}{5}\right)^7 = 0.02799\ldots$$
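
These coefficients can be extracted mechanically by composing the PGFs as polynomials, e.g. with NumPy's polynomial helpers:

```python
import numpy as np
from numpy.polynomial import polynomial as P

phi = [2/5, 0.0, 3/5]           # 2/5 + (3/5)s^2, ascending powers

def compose(a, b):
    """Coefficients of a(b(s)) via Horner's rule (ascending-power convention)."""
    out = np.array([0.0])
    for c in reversed(a):
        out = P.polyadd(P.polymul(out, b), [c])
    return out

phi2 = compose(phi, phi)        # PGF of X_2
phi3 = compose(phi2, phi)       # PGF of X_3 = phi_{X_2}(phi(s))
print(np.round(phi2, 5))        # [0.496 0. 0.288 0. 0.216]
print(np.round(phi3, 5))        # [0.54761 0. 0.17142 0. 0.17833 0. 0.07465 0. 0.02799]
```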

Problem 1.80 By developing the theory of extinction probabilities, or otherwise, solve
the following problem.
No-one in their right mind would wish to be a guest at the Virtual Reality Hotel.
The rooms are numbered 0 to $(3^N - 3)/2$, where $N$ is a very large integer. If $0 \le i \le
(3^{N-1} - 3)/2$ and $j = 1, 2, 3$ there is a door between Room $i$ and Room $3i + j$, through
which (if it is unlocked) guests may pass in both directions. In addition, any room with
a number higher than $(3^{N-1} - 3)/2$ has an open window through which guests can (and
should) escape into the street. So far as the guests are concerned, there are no other doors
or windows. Figure 1.9 shows part of the floor plan of the hotel.


Figure 1.9 Rooms 0–12 of the hotel; Room $i$ has doors to Rooms $3i+1$, $3i+2$, $3i+3$.

Each door in the hotel is locked with probability $1/3$, independently of the others. An
arriving guest is placed in Room 0 and can then wander freely (insofar as the locked
doors allow). Show that the guest's chance of escape is about $(9 - \sqrt{27})/4$.

Solution Denote by $X_r$ the number of rooms available at level $r$ from Room 0. Writing
$\phi_r(t) = \mathbb{E}t^{X_r}$, with $\phi := \phi_1$:
$$\phi_{r+1}(t) = \mathbb{E}t^{X_{r+1}} = \sum_i \mathbb{P}(X_r = i)\,\mathbb{E}\left(t^{X_{r+1}} \mid X_r = i\right) = \sum_i \mathbb{P}(X_r = i)\left(\mathbb{E}t^{X_1}\right)^i = \sum_i \mathbb{P}(X_r = i)\,\phi(t)^i = \phi_r(\phi(t)).$$
Then $\phi_{r+1}(t) = \phi_r(\phi(t)) = \phi(\phi_r(t))$, and
$$\mathbb{P}(\text{can't reach level } r) = \phi_r(0).$$
Now the PGF $\phi(t)$ equals
$$\left(\frac{1}{3}\right)^3 + 3\left(\frac{1}{3}\right)^2\frac{2}{3}\,t + 3\,\frac{1}{3}\left(\frac{2}{3}\right)^2 t^2 + \left(\frac{2}{3}\right)^3 t^3 = \frac{1}{27}\left(1 + 6t + 12t^2 + 8t^3\right).$$
Hence, the equation $\phi(t) = t$ becomes
$$27t = 1 + 6t + 12t^2 + 8t^3, \quad\text{i.e.}\quad 1 - 21t + 12t^2 + 8t^3 = 0.$$
By factorising $1 - 21t + 12t^2 + 8t^3 = (t-1)\left(8t^2 + 20t - 1\right)$, we find that the roots are
$$t = 1, \qquad t = \frac{-5 \pm \sqrt{27}}{4}.$$
The root between 0 and 1 is $\left(\sqrt{27} - 5\right)\!/4$. The sequence $\phi_n(0)$ is monotone increasing
and bounded above. Hence, it converges as $N \to \infty$:
$$\mathbb{P}(\text{no escape}) \to \frac{\sqrt{27} - 5}{4}.$$
Then
$$\mathbb{P}(\text{escape}) \to 1 - \frac{\sqrt{27} - 5}{4} = \frac{9 - \sqrt{27}}{4} \approx 0.950962.$$
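
The convergence of $\phi_n(0)$ to $(\sqrt{27}-5)/4 \approx 0.049$ is very fast; a few lines confirm it, under the assumption that $N$ is large enough for the fixed-point limit to apply:

```python
phi = lambda t: (1 + 6*t + 12*t**2 + 8*t**3) / 27

t = 0.0
for n in range(50):             # phi_n(0) for n = 1..50
    t = phi(t)
print(t)                        # ~0.0490381 = (27**0.5 - 5)/4
print(1 - t)                    # escape probability, ~0.950962
print((9 - 27**0.5) / 4)        # exact limit
```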

Problem 1.81 (i) A mature individual produces offspring according to the PGF $\phi(s)$.
Suppose we start with a population of $k$ immature individuals, each of which grows to
maturity with probability $p$ and then reproduces, independently of the other individuals.
Find the PGF of the number of (immature) individuals in the next generation.
(ii) Find the PGF of the number of mature individuals in the next generation, given
that there are $k$ mature individuals in the parent generation.
Hint: (i) Let $R$ be the number of immature descendants of an immature individual.
If $X^{(n)}$ is the number of immature individuals in generation $n$, then, given that $X^{(n)} = k$,
$$X^{(n+1)} = R_1 + \cdots + R_k,$$
where $R_i \sim R$, independently. The conditional PGF is
$$\mathbb{E}\left(s^{X^{(n+1)}} \mid X^{(n)} = k\right) = \left[g(\phi(s))\right]^k,$$
where $g(x) = 1 - p + px$.
(ii) Let $U$ be the number of mature descendants of a mature individual, and $Z^{(n)}$ be the
number of mature individuals in generation $n$. Then, as before, conditional on $Z^{(n)} = k$,
$$Z^{(n+1)} = U_1 + \cdots + U_k,$$
where $U_i \sim U$, independently. The conditional PGF is
$$\mathbb{E}\left(s^{Z^{(n+1)}} \mid Z^{(n)} = k\right) = \left[\phi(g(s))\right]^k.$$

Problem 1.82 Show that the distributions in parts (i) and (ii) of Problem 1.81 have
the same mean, but not necessarily the same variance.
Hint:
$$\frac{d^2}{ds^2}\,\mathbb{E}\left(s^{X^{(n+1)}} \mid X^{(n)} = 1\right)\Big|_{s=1} = p\,\phi''(1), \qquad
\frac{d^2}{ds^2}\,\mathbb{E}\left(s^{Z^{(n+1)}} \mid Z^{(n)} = 1\right)\Big|_{s=1} = p^2\,\phi''(1).$$
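
A Monte Carlo check of Problem 1.82, with an assumed offspring PGF $\phi(s) = ((1+s)/2)^2$ (Binomial$(2, 1/2)$, so $\phi'(1) = 1$, $\phi''(1) = 1/2$): both mechanisms have mean $p\,\phi'(1)$, but maturing before reproduction (i) and thinning after reproduction (ii) give different variances.

```python
import random

p = 0.7                      # maturation probability (assumed)
offspring = lambda: random.randint(0, 1) + random.randint(0, 1)  # Binomial(2, 1/2)

def R():
    """(i) an immature individual: matures w.p. p, then reproduces."""
    return offspring() if random.random() < p else 0

def U():
    """(ii) a mature individual: reproduces, then each child matures w.p. p."""
    return sum(random.random() < p for _ in range(offspring()))

def stats(draw, n=200_000):
    xs = [draw() for _ in range(n)]
    m = sum(xs) / n
    return m, sum((x - m)**2 for x in xs) / n

print(stats(R))   # mean ~0.7, variance ~0.56
print(stats(U))   # mean ~0.7, variance ~0.455 = Var Binomial(2, 0.35)
```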
2 Continuous outcomes




2.1 Uniform distribution. Probability density functions.
Random variables. Independence

Probabilists do it continuously but discreetly.
(From the series 'How they do it'.)

Bye, Bi, Variate
(From the series 'Movies that never made it to the Big Screen'.)

After developing a background in probabilistic models with discrete outcomes we can
now progress further and do exercises where uncountably many outcomes are explicitly
involved. Here, the events are associated with subsets of a continuous space (a real line $\mathbb{R}$,
an interval $(a, b)$, a plane $\mathbb{R}^2$, a square, etc.). The simplest case is where the outcome
space $\Omega$ is represented by a 'nice' bounded set and the probability distribution corresponds
to a unit mass uniformly spread over it. Then an event (i.e. a subset) $A \subseteq \Omega$ acquires the
probability
$$\mathbb{P}(A) = \frac{v(A)}{v(\Omega)}, \qquad (2.1)$$
where $v(A)$ is the standard Euclidean volume (or area, or length) of $A$ and $v(\Omega)$ that of $\Omega$.
The term 'uniformly spread' is the key here; an example below shows that one has to
be careful about what exactly it means in the given context.


Example 2.1 This is known as Bertrand's paradox. A chord has been chosen at random
in a circle of radius $r$. What is the probability that it is longer than the side of the equilateral
triangle inscribed in the circle? The answer is different in the following three cases:

(i) the middle point of the chord is distributed uniformly inside the circle;
(ii) one endpoint is fixed and the second is uniformly distributed over the circumference;
(iii) the distance between the middle point of the chord and the centre of the circle
is uniformly distributed over the interval $(0, r)$.


In fact, in case (i), the middle point of the chord must lie inside the circle inscribed in
the triangle. Hence,
$$\mathbb{P}(\text{chord longer}) = \frac{\text{area of the inscribed circle}}{\text{area of the original circle}} = \frac{\pi r^2/4}{\pi r^2} = \frac{1}{4}.$$
In case (ii), the second endpoint must then lie on the opposite third of the circumference.
Hence,
$$\mathbb{P}(\text{chord longer}) = \frac{1}{3}.$$
Finally, in case (iii), the middle point of the chord must be at distance $\le r/2$ from the
centre. Hence,
$$\mathbb{P}(\text{chord longer}) = \frac{1}{2}.$$
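
The three chord-sampling conventions are easy to compare by simulation; a sketch (unit radius assumed for convenience):

```python
import math, random

side = math.sqrt(3)           # side of the inscribed equilateral triangle, r = 1

def chord_midpoint():
    # (i) midpoint uniform in the disc (rejection sampling from the square)
    while True:
        x, y = random.uniform(-1, 1), random.uniform(-1, 1)
        if x*x + y*y < 1:
            return 2 * math.sqrt(1 - x*x - y*y)

def chord_endpoints():
    # (ii) one endpoint fixed, the other uniform on the circumference
    theta = random.uniform(0, 2 * math.pi)
    return 2 * math.sin(theta / 2)

def chord_radius():
    # (iii) distance from centre to midpoint uniform on (0, 1)
    d = random.uniform(0, 1)
    return 2 * math.sqrt(1 - d * d)

n = 100_000
for f in (chord_midpoint, chord_endpoints, chord_radius):
    print(sum(f() > side for _ in range(n)) / n)   # ~0.25, ~0.33, ~0.5
```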

A useful observation is that we can think in terms of a uniform probability density
function assigning to a point $x \in \Omega$ the value
$$f^{\mathrm{uni}}(x) = \frac{1}{v(\Omega)}\, I(x \in \Omega), \qquad (2.2)$$
with the probability of event $A \subseteq \Omega$ calculated as the integral
$$\mathbb{P}(A) = \int_A f^{\mathrm{uni}}(x)\,dx = \frac{1}{v(\Omega)} \int_A dx, \qquad (2.3)$$
giving of course the same answer as formula (2.1). Because $f^{\mathrm{uni}} \ge 0$ and $\int f^{\mathrm{uni}}(x)\,dx = 1$,
the probability of event $A \subseteq \Omega$ is always between 0 and 1. Note that the mass assigned to
a single outcome represented by a point of $\Omega$ is zero. Hence the mass assigned to any
finite or countable set of outcomes is zero (as it is the sum of the masses assigned to each
outcome); to get a positive mass (and thus a positive probability), an event $A$ must be
uncountable.

Problem 2.1 Alice and Bob agree to meet in the Copper Kettle after their Saturday
lectures. They arrive at times that are independent and uniformly distributed between
12:00 and 13:00. Each is prepared to wait s minutes before leaving. Find a minimal s
such that the probability that they meet is at least 1/2.

Solution The set $\Omega$ is the unit square with co-ordinates $0 \le x, y \le 1$ (measuring the
time in fractions of an hour between 12:00 and 13:00). Each $\omega = (x, y) \in \Omega$ specifies
Alice's arrival time $x$ and Bob's $y$. Then the event 'they arrive within $s$ minutes of each
other' is a strip around the diagonal $x = y$:
$$A = \left\{(x, y) \in \Omega:\ |x - y| \le \frac{s}{60}\right\}.$$

Figure 2.1 The strip $|x - y| \le s/60$ inside the unit square.

Its complement is formed by two triangles, each of area $\frac{1}{2}\left(1 - s/60\right)^2$. So, the area
$v(A)$ is
$$1 - \left(1 - \frac{s}{60}\right)^2,$$
and we want it to be $\ge 1/2$. See Figure 2.1.
This gives $s \ge 60\left(1 - \sqrt{2}/2\right) \approx 18$ minutes.
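
A quick Monte Carlo confirmation of the roughly 18-minute threshold:

```python
import random

def meet_prob(s, trials=200_000):
    hits = sum(abs(random.random() - random.random()) <= s / 60
               for _ in range(trials))
    return hits / trials

print(meet_prob(17))   # ~0.486, just under 1/2
print(meet_prob(18))   # ~0.51, just over 1/2
```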

Problem 2.2 A stick is broken into two at random; then the longer half is broken
again into two pieces at random. What is the probability that the three pieces will make
a triangle?

Solution Let the stick length be $\ell$. If $x$ is the place of the first break, then $0 \le x \le \ell$
and $x$ is uniform on $(0, \ell)$. If $x \ge \ell/2$, then the second break point $y$ is uniformly chosen
on the interval $(0, x)$. See Figure 2.2. Otherwise $y$ is uniformly chosen on $(x, \ell)$.
Thus
$$\Omega = \left\{(x, y):\ 0 \le x, y \le \ell,\ y \le x \text{ for } x \ge \ell/2 \text{ and } x \le y \le \ell \text{ for } x \le \ell/2\right\},$$
and the area $v(\Omega) = 3\ell^2/4$. See Figure 2.3.
To make a triangle, $(x, y)$ must lie in $A$, where
$$A = \left\{\max(x, y) > \frac{\ell}{2},\ \min(x, y) < \frac{\ell}{2},\ |x - y| < \frac{\ell}{2}\right\},$$
which yields the area $\ell^2/4$. The answer then is $1/3$.




Figure 2.2 Break points $0 < y < x < \ell$ on the stick.




Figure 2.3 The region $\Omega$ and the triangle-forming subset $A$.


It is also possible to reduce $\Omega$ to a 'half' of the above set by assuming that $x$ is always
the length of the longer piece:
$$\Omega = \left\{(x, y):\ \ell/2 \le x \le \ell,\ y \le x\right\};$$
the probability will still be $1/3$.

Problem 2.3 A stick is broken in two places chosen beforehand, completely at random
along its length. What is the probability that the three pieces will make a triangle?
Answer: $1/4$. The loss in probability occurs because we now include among the
outcomes the possibility that the shorter piece is broken again, while in the previous
example this was excluded.
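
For this version a direct simulation is unambiguous: two independent uniform break points, and the pieces form a triangle iff each is shorter than half the stick.

```python
import random

def triangle():
    u, v = sorted((random.random(), random.random()))
    a, b, c = u, v - u, 1 - v          # the three pieces (unit-length stick)
    return max(a, b, c) < 0.5          # triangle inequality for three segments

trials = 200_000
print(sum(triangle() for _ in range(trials)) / trials)   # ~0.25
```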

Problem 2.4 The ecu coin is a disc of diameter 4/5 units. In the traditional game of
drop the ecu, an ecu coin is dropped at random onto a grid of squares formed by lines
one unit apart. If the coin covers part of a line you lose your ecu. If not, you still lose
your ecu but the band plays the national anthem of your choice. What is the probability
that you get to choose your national anthem?

Solution Without loss of generality, assume that the centre of the coin lies in the unit
square with corners $(0,0)$, $(0,1)$, $(1,0)$, $(1,1)$. You will hear an anthem when the
centre lies in the inside square $S$ described by
$$\frac{2}{5} \le x \le \frac{3}{5}, \qquad \frac{2}{5} \le y \le \frac{3}{5}.$$
Hence,
$$\mathbb{P}(\text{anthem}) = \text{area of } S = \frac{1}{25}.$$

There are several serious questions arising here which we will address later. One is
the so-called measurability: there exist weird sets $A \subset \Omega$ (even when $\Omega$ is a unit interval
$(0,1)$) which do not have a correctly defined volume (or length). In general, how does one
measure the volume of a set $A$ in a continuous space? Such sets may not be particularly
difficult to describe (for instance, Cantor's continuum $\mathcal{C}$ has a correctly defined length),
but calculating their volume, area or length goes beyond standard Riemann integration,
let alone elementary formulas. (As a matter of fact, the correct length of $\mathcal{C}$ is zero.) To
develop a complete theory, we would need the so-called Lebesgue integration, named
after H.L. Lebesgue (1875–1941), the famous French mathematician. Lebesgue was
of very humble origin, but became a top mathematician. He was renowned for flawless
and elegant presentations and written works. In turn, Lebesgue integration requires
the concept of a sigma-algebra and an additive measure, which leads to a far-reaching
generalisation of the concepts of length, area and volume, encapsulated in the concept of a
measure. We will discuss these issues in later volumes.
An issue to discuss now is: what if the distribution of the mass is not uniform?
This question is not only of purely theoretical interest. In many practical models $\Omega$ is
represented by an unbounded subset of a Euclidean space $\mathbb{R}^d$ whose volume is infinite (e.g.
by $\mathbb{R}^d$ itself or by $\mathbb{R}_+ = [0, \infty)$, for $d = 1$). Then the denominator $v(\Omega)$ in equation (2.1)
becomes infinite. Here, the recipe is: consider a function $f \ge 0$ with $\int f(x)\,dx = 1$ and set
$$\mathbb{P}(A) = \int_A f(x)\,dx, \qquad A \subseteq \Omega$$
(cf. equation (2.3)). Such a function $f$ is interpreted as a (general) probability density
function (PDF). The following natural (and important) examples appear in problems
below:
A uniform distribution on an interval $(a, b)$, $a < b$: here $\Omega = (a, b)$ and
$$f(x) = \frac{1}{b-a}\, I(a < x < b). \qquad (2.4)$$
A Gaussian, or normal, distribution, with $\Omega = \mathbb{R}$ and
$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2\sigma^2}(x - \mu)^2\right), \qquad x \in \mathbb{R}. \qquad (2.5)$$
Here $\mu \in \mathbb{R}$, $\sigma^2 > 0$ are parameters specifying the distribution.
The graphs of normal PDFs on an interval around the origin and away from it are
plotted in Figures 2.4 and 2.5.
This is the famous curve about which the great French mathematician J.-H. Poincaré
(1854–1912) said: 'Experimentalists think that it is a mathematical theorem while the
mathematicians believe it to be an experimental fact.'
An exponential distribution: here $\Omega = \mathbb{R}_+$ and
$$f(x) = \lambda e^{-\lambda x}\, I(x \ge 0), \qquad x \in \mathbb{R}. \qquad (2.6)$$
Here $\lambda > 0$ is a parameter specifying the distribution.
The graphs of the exponential PDFs are shown in Figure 2.6.


Figure 2.4 The normal PDFs, I ($\mu = 0$; $\sigma = 1, 1.5, 2$).
Figure 2.5 The normal PDFs, II ($\mu = 0$; $\sigma = 1, 1.5, 2$).
Figure 2.6 The exponential PDFs ($\lambda = 1, 1/2$).

A generalisation of formula (2.6) is the Gamma distribution. Here again $\Omega = \mathbb{R}_+$ and
the PDF is
$$f(x) = \frac{\lambda^{\alpha}}{\Gamma(\alpha)}\, x^{\alpha-1} e^{-\lambda x}\, I(x \ge 0), \qquad (2.7)$$
with parameters $\alpha, \lambda > 0$. Here $\Gamma(\alpha) = \int_0^{\infty} x^{\alpha-1} e^{-x}\,dx$ (the value of the Gamma function)
provides the normalising constant. Recall that for a positive integer argument, $\Gamma(n) = (n-1)!$; in
general, for $\alpha > 1$, $\Gamma(\alpha) = (\alpha-1)\Gamma(\alpha-1)$.
The Gamma distribution plays a prominent rôle in statistics and will repeatedly appear
in later chapters. The graphs of the Gamma PDFs are sketched in Figure 2.7.
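
The normalisation in (2.7) is easy to verify numerically, e.g. with SciPy (the parameter values are arbitrary choices):

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma

lam, alpha = 1.0, 2.5

def f(x):
    """Gamma PDF (2.7)."""
    return lam**alpha / gamma(alpha) * x**(alpha - 1) * np.exp(-lam * x)

total, _ = quad(f, 0, np.inf)
print(total)                                          # ~1.0
print(gamma(alpha), (alpha - 1) * gamma(alpha - 1))   # Gamma(a) = (a-1)Gamma(a-1)
```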
Another example is a Cauchy distribution, with $\Omega = \mathbb{R}$ and
$$f(x) = \frac{1}{\pi}\,\frac{\tau}{\tau^2 + (x - \alpha)^2}, \qquad x \in \mathbb{R}, \qquad (2.8)$$
with parameters $\alpha \in \mathbb{R}$ and $\tau > 0$. There is a story that the Cauchy distribution was
discovered by Poisson in 1824 when he proposed a counterexample to the CLT. See
below. The graphs of the Cauchy PDFs are sketched in Figure 2.8.
Cauchy was a staunch royalist and a devoted Catholic and, unlike many other prominent
French scientists of the period, he had difficult relations with the Republican regime. In
1830, during one of the nineteenth century French revolutions, he went into voluntary
exile to Turin and Prague where he gave private mathematics lessons to the children of
the Bourbon Royal family. His admission to the French Academy occurred only in 1838,
after he had returned to Paris.
The Gaussian distribution will be discussed in detail below. At this stage we only
indicate a generalisation to the multidimensional case, where $\Omega = \mathbb{R}^d$ and
$$f(x) = \frac{1}{(2\pi)^{d/2}\left(\det \Sigma\right)^{1/2}} \exp\left(-\frac{1}{2}\left\langle (x - \mu),\, \Sigma^{-1}(x - \mu)\right\rangle\right). \qquad (2.9)$$
Figure 2.7 The Gamma PDFs ($\lambda = 1$; $\alpha = 0.5, 1.5, 5.5$).


Figure 2.8 The Cauchy PDFs ($\alpha = 0$; $\tau = 1, 1.5, 2$).


Here, $x$ and $\mu$ are real $d$-dimensional vectors:
$$x = \begin{pmatrix} x_1 \\ \vdots \\ x_d \end{pmatrix}, \qquad \mu = \begin{pmatrix} \mu_1 \\ \vdots \\ \mu_d \end{pmatrix} \in \mathbb{R}^d,$$
and $\Sigma$ is an invertible positive-definite $d \times d$ real matrix, with determinant $\det \Sigma > 0$
and inverse matrix $\Sigma^{-1} = \left(\Sigma^{-1}_{ij}\right)$. A matrix $\Sigma$ is called positive-definite if it can be
represented as the product $\Sigma = AA^*$, and strictly positive-definite if in this representation
the matrix $A$ is invertible, i.e. the inverse $A^{-1}$ exists (in which case the inverse matrix
$\Sigma^{-1} = (A^*)^{-1}A^{-1}$ also exists). It is easy to see that a positive-definite matrix is always
symmetric (or Hermitian), i.e. obeys $\Sigma^* = \Sigma$. Hence, a positive-definite matrix has an
orthonormal basis of eigenvectors, and its eigenvalues are non-negative (positive if it is
strictly positive-definite). Further, $\langle\,\cdot\,,\,\cdot\,\rangle$ stands for the Euclidean scalar product in $\mathbb{R}^d$:
$$\left\langle (x - \mu),\, \Sigma^{-1}(x - \mu)\right\rangle = \sum_{i,j=1}^{d} (x_i - \mu_i)\,\Sigma^{-1}_{ij}\,(x_j - \mu_j).$$
A PDF of this form is called a multivariate normal, or Gaussian, distribution.
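
The representation $\Sigma = AA^*$ is also the standard recipe for sampling from (2.9): if $Z$ has IID standard normal components, then $\mu + AZ$ is multivariate normal with mean $\mu$ and covariance $\Sigma$. A sketch using a Cholesky factor for $A$; the particular $\mu$ and $\Sigma$ are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])            # positive-definite

A = np.linalg.cholesky(Sigma)             # Sigma = A @ A.T
Z = rng.standard_normal((100_000, 2))
X = mu + Z @ A.T                          # rows are samples of N(mu, Sigma)

print(X.mean(axis=0))                     # ~mu
print(np.cov(X.T))                        # ~Sigma
```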

Remark We have already seen a number of probability distributions bearing personal
names (Gauss, Poisson, Cauchy; more will appear in Chapters 3 and 4). Another example
(though not as frequent) is the Simpson distribution. Here we take $X, Y \sim \mathrm{U}(0,1)$,
independently. Then $X + Y$ has a 'triangular' PDF known as Simpson's PDF:
$$f_{X+Y}(u) = \begin{cases} u, & 0 \le u \le 1, \\ 2 - u, & 1 \le u \le 2, \\ 0, & \text{otherwise.} \end{cases}$$
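
A short check of the triangular shape: compare the empirical CDF of $X + Y$ with the exact one, $u^2/2$ on $[0,1]$ and $1 - (2-u)^2/2$ on $[1,2]$:

```python
import random

n = 200_000
samples = [random.random() + random.random() for _ in range(n)]

for u in (0.5, 1.0, 1.5):
    empirical = sum(s <= u for s in samples) / n
    exact = u**2 / 2 if u <= 1 else 1 - (2 - u)**2 / 2
    print(u, empirical, exact)    # 0.125, 0.5, 0.875 up to Monte Carlo error
```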

T. Simpson (1700–1761), an English scientist, left a notable mark in interpolation and
numerical methods of integration. He was the most distinguished of the group of itinerant
lecturers who taught in fashionable London coffee-houses (a popular way of spreading
scientific information in eighteenth-century England).

As before, we face a question: what type of function $f$ can serve as a PDF? (The
example with $f(x) = I\left(x \in (0,1)\setminus\mathcal{C}\right)$, where $\mathcal{C} \subset (0,1)$ is Cantor's set, is typical. Here
$f \ge 0$ by definition, but how should $\int_0^1 f(x)\,dx$ be defined?) And again, the answer lies in
the theory of Lebesgue integration. Fortunately, in 'realistic' models these matters arise
rarely and are overshadowed by far more practical issues.
So, from now until the end of this chapter our basic model will be where outcomes $\omega$
run over an 'allowed' subset $\Omega$ of $\mathbb{R}$ (such subsets are called measurable and will be
introduced later). Quite often $\Omega$ will be $\mathbb{R}^d$. The probability $\mathbb{P}(A)$ will be calculated for
every such set $A$ (called an event in $\Omega$) as
$$\mathbb{P}(A) = \int_A f(x)\,dx. \qquad (2.10)$$
Here $f$ is a given PDF: $f \ge 0$ with $\int f(x)\,dx = 1$.
As in the discrete case, we have an intuitively plausible property of additivity: if
$A_1, A_2, \dots$ is a (finite or countable) sequence of pair-wise disjoint events, then
$$\mathbb{P}\left(\bigcup_j A_j\right) = \sum_j \mathbb{P}(A_j), \qquad (2.11)$$


while, in the general case, $\mathbb{P}\left(\bigcup_j A_j\right) \le \sum_j \mathbb{P}(A_j)$. As $\mathbb{P}\left(\bigcup_j A_j\right) \le \mathbb{P}(\Omega) = 1$, we obtain that for the
complement $A^{\mathrm{c}} = \Omega \setminus A$, $\mathbb{P}(A^{\mathrm{c}}) = 1 - \mathbb{P}(A)$, and for the set-theoretical difference $A \setminus B$:
$\mathbb{P}(A \setminus B) = \mathbb{P}(A) - \mathbb{P}(A \cap B)$. Of course, more advanced facts that we learned in the
discrete space case remain true, such as the inclusion–exclusion formula.
In this setting, the concept of a random variable develops, unsurprisingly, from its
discrete-outcome analogue: a RV is a function
$$\omega \in \Omega \mapsto X(\omega),$$
with real or complex values $X(\omega)$ (in the complex case we again consider a pair of
real RVs representing the real and imaginary parts). Formally, a real RV must have the
property that $\forall\, x \in \mathbb{R}$, the set $\{\omega \in \Omega:\ X(\omega) < x\}$ is an event in $\Omega$ to which the probability
$\mathbb{P}(X < x)$ can be assigned. Then with each real-valued RV we associate its cumulative
distribution function (CDF)
$$x \in \mathbb{R} \mapsto F_X(x) = \mathbb{P}(X < x), \qquad (2.12)$$
varying monotonically from 0 to 1 as $x$ increases from $-\infty$ to $\infty$. See Figure 2.9.
The quantity
$$\overline{F}_X(x) = 1 - F_X(x) = \mathbb{P}(X \ge x), \qquad (2.13)$$
describing tail probabilities, is also often used.
2.1 Uniform distribution 117


Figure 2.9 A CDF $F(x)$, increasing from 0 to 1.
