$\mathrm{Cov}(X_i, X_j) = -\dfrac{N+1}{12}.$

(iii)

$$\mathrm{Var}\,Y = n\,\frac{(N+1)(N-1)}{12} - n(n-1)\,\frac{N+1}{12} = \frac{n(N+1)(N-n)}{12}.$$


1.7 Branching processes

Life is a school of probability.

W. Bagehot (1826–1877), British economist

Branching processes are a fine chapter of probability theory. Historically, the concept

of a branching process was conceived to calculate the survival probabilities of noble

families. The name of W.R. Hamilton (1805“1865), the famous Irish mathematician,

should be mentioned here, as well as F. Galton (1822–1911), the English scientist and

explorer, and H.W. Watson (1827–1903), the English mathematician. Since the 1940s

branching processes have been used extensively in natural sciences, in particular to

calculate products of nuclear fission (physics) and the size of populations (biology). Later

they found powerful applications in computer science (algorithms on logical trees) and

other disciplines.

The model giving rise to a branching process is simple and elegant. Initially, we have

an item (a particle or a biological organism) that produces a random number of 'offspring', each of which produces a random number of offspring and so on. This generates a 'tree-like' structure where a descendant has a link to the parent and a number of links to its

own offspring. See Figure 1.7.

Each site of the emerging (random) tree has a path that joins it with the ultimate

ancestor (called the origin, or the root of the tree). The length of the path, which is equal

to the number of links in it, measures the number of generations behind the given site

(and the item it represents). Each site gives rise to a subtree that grows from it (for some

sites there may be no continuation, when the number of offspring is zero).

The main assumption is that the process carries on with maximum independence and

homogeneity: the number of offspring produced from a given parent is independent of the

Figure 1.7


numbers related to other sites. More precisely, we consider RVs $X_0, X_1, X_2, \ldots$, where $X_n$ gives the size of the population in the $n$th generation. That is,

$X_0 = 1$,
$X_1$ = the number of offspring after the 1st fission,
$X_2$ = the number of offspring after the 2nd fission,

etc.

RVs $X_n$ and $X_{n+1}$ are related by the following recursion:

$$X_{n+1} = \sum_{i=1}^{X_n} Y_i^{(n)}, \qquad (1.84)$$

where $Y_i^{(n)}$ is the number of descendants produced by the $i$th member of the $n$th generation. The RVs $Y_i^{(n)}$ are supposed to be IID, and their common distribution determines the branching process.
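The recursion (1.84) is straightforward to put on a computer. The following minimal Python sketch simulates successive generation sizes; the particular offspring law (equal chances of 0, 1 or 2 offspring) is an arbitrary illustrative choice, not one taken from the text.

```python
import random

def next_generation(x_n, sample_offspring):
    # One step of the recursion (1.84): each of the x_n members of the
    # current generation contributes an independent number of offspring.
    return sum(sample_offspring() for _ in range(x_n))

def generation_sizes(n_generations, sample_offspring):
    # Sizes X_0, X_1, ..., X_n of a branching process started from X_0 = 1.
    sizes = [1]
    for _ in range(n_generations):
        sizes.append(next_generation(sizes[-1], sample_offspring))
    return sizes

# Illustrative offspring law: 0, 1 or 2 offspring, each with probability 1/3.
offspring = lambda: random.choice([0, 1, 2])
print(generation_sizes(10, offspring))
```

Running it a few times shows the typical behaviour: the population either dies out after a few generations or keeps fluctuating.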

The first important exercise is to calculate the mean value $\mathbb{E}X_n$, i.e. the expected size of the $n$th generation. By using the conditional expectation,

$$\mathbb{E}X_n = \sum_m \mathbb{E}\big(X_n \,\big|\, X_{n-1} = m\big)\,\mathbb{P}(X_{n-1} = m) = \sum_m \mathbb{E}\Big(\sum_{i=1}^{m} Y_i^{(n-1)}\Big)\,\mathbb{P}(X_{n-1} = m)$$
$$= \sum_m m\,\mathbb{E}Y_1^{(n-1)}\,\mathbb{P}(X_{n-1} = m) = \mathbb{E}Y_1^{(n-1)}\,\mathbb{E}X_{n-1}. \qquad (1.85)$$

The value $\mathbb{E}Y_i^{(k)}$ does not depend on $k$ and $i$, and we denote it by $\mathbb{E}Y$ for short. Then, recurrently,

$$\mathbb{E}X_1 = \mathbb{E}Y,\quad \mathbb{E}X_2 = (\mathbb{E}Y)^2,\ \ldots,\quad \mathbb{E}X_n = (\mathbb{E}Y)^n. \qquad (1.86)$$

We see that if $\mathbb{E}Y < 1$, then $\mathbb{E}X_n \to 0$ with $n$, i.e. the process eventually dies out. This case is often referred to as subcritical. On the contrary, if $\mathbb{E}Y > 1$ (a supercritical process), then $\mathbb{E}X_n \to \infty$. The borderline case $\mathbb{E}Y = 1$ is called critical.

Remark In formula (1.86) we did not use the independence assumption.

n

A convenient characteristic is the common PGF s = sY of RVs Yi (again it does

not depend on n and i). Here, an important fact is that if n s = sXn is the PGF of the

size of the nth generation, then 1 s = s and, recursively,

s= n≥1

s (1.87)

n+1 n

See Problems 1.72 and 1.80. In other words,

s= = ···

s s n times (1.88)

n

··· stands for the iteration of the map s ’ s=

s . In particular,

where n+1

s.

n
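Formula (1.88) says that $\phi_n$ is obtained by simply iterating the map $s \mapsto \phi(s)$. A short numerical sketch (the particular offspring PGF below is a hypothetical example, not from the text) computes $\phi_n(0) = \mathbb{P}(X_n = 0)$ this way:

```python
def iterate_pgf(phi, s, n):
    # phi_n(s) via (1.88): apply the map s -> phi(s) n times.
    for _ in range(n):
        s = phi(s)
    return s

# Hypothetical offspring law P(Y=0)=P(Y=1)=P(Y=2)=1/3, so phi(s) = (1+s+s^2)/3.
phi = lambda s: (1 + s + s**2) / 3
for n in (1, 2, 5, 20):
    # phi_n(0) = P(X_n = 0), the probability that generation n is empty.
    print(n, iterate_pgf(phi, 0.0, n))
```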


This construction leads to an interesting analysis of the extinction probabilities

$$\pi_n = \mathbb{P}(X_n = 0) = \phi_n(0). \qquad (1.89)$$

As $\pi_{n+1} = \phi(\pi_n)$, intuitively we would expect the limit $\pi = \lim_{n\to\infty}\pi_n$ to be a fixed point of the map $s \mapsto \phi(s)$, i.e. a solution to $z = \phi(z)$. One such point is 1 (as $\phi(1) = 1$), but there may be another solution lying between 0 and 1. An important fact here is that the function $\phi$ is convex, and the value of $\phi(0)$ is between 0 and 1. Then if $\phi'(1) > 1$, there will be a root of the equation $z = \phi(z)$ in $(0,1)$. Otherwise, $z = 1$ will be the smallest positive root. See Figure 1.8.

In fact, it is not difficult to check that the limiting extinction probability $\pi$ exists and

$$\pi = \text{the least non-negative solution to } z = \phi(z). \qquad (1.90)$$

Indeed, if there exists $z \in (0,1)$ with $\phi(z) = z$, then it is unique, and also $0 < z < 1$ and $0 < \phi(0) = \mathbb{P}(Y = 0) < z$ (as $\phi$ is convex and $\phi(1) = 1$). Then $\phi(0) < \phi(\phi(0)) < \cdots < z$ because $\phi$ is monotone (and $\phi(z) = z$). The sequence $\pi_n = \phi_n(0)$ must then have a limit which, by continuity, coincides with $z$.

If the least non-negative fixed point is $z = 1$, then the above analysis can be repeated without changes, yielding that $\pi = 1$. We conclude that if $\mathbb{P}(Y = 0) > 0$, then $\pi > 0$ (actually, $\pi > \mathbb{P}(Y = 0)$). On the other hand, if $\phi(0) = 0$ (i.e. $\mathbb{P}(Y = 0) = 0$), then, trivially, $\pi = 0$. This establishes equation (1.90). We see that even in the supercritical case (with $\phi'(1) > 1$) the limiting extinction probability can be arbitrarily close to 1.
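The proof of (1.90) is constructive: $\pi_n = \phi_n(0)$ increases to the least non-negative fixed point, so iterating $\phi$ from 0 computes the extinction probability. A minimal sketch, with an illustrative supercritical PGF (not from the text) whose exact answer is 1/2:

```python
def extinction_probability(phi, tol=1e-12, max_iter=100_000):
    # pi_n = phi_n(0) increases to the least non-negative fixed point of phi,
    # which by (1.90) is the extinction probability.
    z = 0.0
    for _ in range(max_iter):
        z_next = phi(z)
        if abs(z_next - z) < tol:
            return z_next
        z = z_next
    return z

# Illustrative supercritical law: phi(s) = 1/4 + s/4 + s^2/2 (mean 5/4 > 1).
# The fixed points of phi are 1/2 and 1, so the extinction probability is 1/2.
print(extinction_probability(lambda s: 0.25 + 0.25 * s + 0.5 * s**2))
```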

A slight modification of the above construction arises when we initially have several

items (possibly a random number $X_0$).

Problem 1.72 In a branching process every individual has probability $p_k$ of producing exactly $k$ offspring, $k = 0, 1, \ldots$, and the individuals of each generation produce offspring independently of each other and of individuals in preceding generations. Let $X_n$ represent the size of the $n$th generation. Assume that $X_0 = 1$ and $p_0 > 0$ and let $\phi_n(s)$ be the PGF of $X_n$. Thus

$$\phi_1(s) = \phi(s) = \mathbb{E}s^{X_1} = \sum_{k=0}^{\infty} p_k s^k.$$

Figure 1.8 The curve $y = \phi(z)$ against the line $y = z$: the fixed point $z = \phi(z)$.


(i) Prove that

$$\phi_{n+1}(s) = \phi_n\big(\phi_1(s)\big).$$

(ii) Prove that for $n < m$

$$\mathbb{E}\big(s^{X_n}\,\big|\,X_m = 0\big) = \frac{\phi_n\big(s\,\phi_{m-n}(0)\big)}{\phi_m(0)}.$$

Solution (i) By definition,

$$\phi_{n+1}(s) = \mathbb{E}s^{X_{n+1}} = \sum_{k=0}^{\infty} \mathbb{P}(X_{n+1} = k)\,s^k,$$

where $X_n$ denotes the size of the $n$th generation. We write

$$\mathbb{E}s^{X_{n+1}} = \sum_l p_l^{(n)}\,\mathbb{E}\big(s^{X_{n+1}}\,\big|\,X_n = l\big).$$

Here $p_l^{(n)} = \mathbb{P}(X_n = l)$, and the conditional expectation is

$$\mathbb{E}\big(s^{X_{n+1}}\,\big|\,X_n = l\big) = \sum_k \mathbb{P}(X_{n+1} = k\,|\,X_n = l)\,s^k.$$

Now observe that

$$\mathbb{E}\big(s^{X_{n+1}}\,\big|\,X_n = l\big) = \big(\mathbb{E}s^{X_1}\big)^l = \big(\phi_1(s)\big)^l,$$

because (a) under the condition that $X_n = l$,

$$X_{n+1} = \sum_{j=1}^{l} \tilde X_j,$$

where $\tilde X_j$ is the number of offspring produced by the $j$th individual of the $n$th generation, and (b) all the $\tilde X_j$ are IID and $\mathbb{E}\big(s^{\tilde X_j}\,\big|\,X_n = l\big) = \mathbb{E}s^{X_1} = \phi_1(s)$. This relation yields

$$\mathbb{E}s^{X_{n+1}} = \sum_l p_l^{(n)}\big(\phi_1(s)\big)^l = \phi_n\big(\phi_1(s)\big).$$

(ii) Denote by $I_0^{(m)}$ the indicator $I(X_m = 0)$. Then $\mathbb{E}I_0^{(m)} = \mathbb{P}\big(I_0^{(m)} = 1\big) = \mathbb{P}(X_m = 0) = \phi_m(0)$. Furthermore,

$$\mathbb{E}\big(s^{X_n}\,\big|\,X_m = 0\big) = \mathbb{E}\big(s^{X_n} I_0^{(m)}\big)\big/\phi_m(0).$$

Hence, it suffices to check that

$$\mathbb{E}\big(s^{X_n} I_0^{(m)}\big) = \phi_n\big(s\,\phi_{m-n}(0)\big).$$

Indeed,

$$\mathbb{E}\big(s^{X_n} I_0^{(m)}\big) = \sum_k s^k\,\mathbb{P}(X_n = k,\ X_m = 0) = \sum_k s^k\,\mathbb{P}(X_m = 0\,|\,X_n = k)\,\mathbb{P}(X_n = k).$$

Now, since $\mathbb{P}(X_m = 0\,|\,X_n = k) = \big(\phi_{m-n}(0)\big)^k$ for all $k$,

$$\mathbb{E}\big(s^{X_n} I_0^{(m)}\big) = \phi_n\big(s\,\phi_{m-n}(0)\big).$$

Problem 1.73 A laboratory keeps a population of aphids. The probability of an aphid

passing a day uneventfully is q < 1. Given that a day is not uneventful, there is a probability

r that the aphid will have one offspring, a probability s that it will have two offspring,

and a probability t that it will die of exhaustion, where r + s + t = 1. Offspring are ready

to reproduce the next day. The fates of different aphids are independent, as are the events

of different days. The laboratory starts out with one aphid.

Let Xn be the number of aphids at the end of n days. Show how to obtain an expression

for the PGF $\phi_n(z)$ of $X_n$. What is the expected value of $X_n$?

Show that the probability of extinction does not depend on q and that if $2r + 3s \le 1$,

the aphids will certainly die out. Find the probability of extinction if r = 1/5, s = 2/5 and

t = 2/5.

Solution Denote by $\phi_{X_1}(z)$ the PGF of $X_1$, i.e. of the number of aphids generated, at the end of a single day, by a single aphid (including the initial aphid). Then

$$\phi_{X_1}(z) = (1-q)t + qz + (1-q)rz^2 + (1-q)sz^3, \qquad z > 0.$$

Write $\mathbb{E}X_n = (\mathbb{E}X_1)^n$, with $\mathbb{E}X_1 = \phi_{X_1}'(1) = q + 2(1-q)r + 3(1-q)s$. Indeed, $\phi_{X_n}(z) = \phi_{X_{n-1}}\big(\phi_{X_1}(z)\big)$ implies $\mathbb{E}X_n = \mathbb{E}X_{n-1}\,\mathbb{E}X_1$, or $\mathbb{E}X_n = (\mathbb{E}X_1)^n$. The probability of extinction at the end of $n$ days is $\phi_{X_n}(0)$. It is non-decreasing with $n$ and tends to a limit as $n\to\infty$, giving the probability of (eventual) extinction $\pi$. As we already know, $\pi < 1$ iff $\mathbb{E}X_1 > 1$.

The extinction probability $\pi$ is the minimal positive root of

$$(1-q)t + qz + (1-q)rz^2 + (1-q)sz^3 = z$$

or, after division by $1-q$ (since $q < 1$):

$$t - z + rz^2 + sz^3 = 0.$$

The last equation does not depend on $q$, hence $\pi$ also does not depend on $q$. The condition $\mathbb{E}X_1 \le 1$ is equivalent to $2r + 3s \le 1$. In the case $r = 1/5$, $s = 2/5$, $t = 2/5$, the equation takes the form $2z^3 + z^2 - 5z + 2 = 0$. Dividing by $z - 1$ (as $z = 1$ is a root) one gets the quadratic equation $2z^2 + 3z - 2 = 0$, with roots $z_\pm = (-3 \pm 5)/4$. The positive root is $1/2$, and it gives the extinction probability.
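As a rough check of the answer, one can also simulate the aphid colony directly. The sketch below uses the values $r = 1/5$, $s = 2/5$, $t = 2/5$ from the problem and an arbitrary $q = 0.5$ (the argument above shows the answer should not depend on $q$); colonies that grow beyond a cap are treated as surviving.

```python
import random

def aphid_dies_out(q, r, s, horizon=100, cap=500):
    # Simulate the aphid colony of Problem 1.73, starting from one aphid.
    # Each aphid independently leaves 1 (uneventful day), 2, 3 or 0 aphids
    # for the next day, with probabilities q, (1-q)r, (1-q)s, (1-q)t.
    population = 1
    for _ in range(horizon):
        if population == 0:
            return True
        if population > cap:          # treat a large colony as surviving
            return False
        new_population = 0
        for _ in range(population):
            if random.random() < q:   # uneventful day: the aphid just survives
                new_population += 1
            else:
                u = random.random()
                if u < r:             # one offspring: 2 aphids tomorrow
                    new_population += 2
                elif u < r + s:       # two offspring: 3 aphids tomorrow
                    new_population += 3
                # otherwise the aphid dies of exhaustion: contributes 0
        population = new_population
    return population == 0

trials = 2_000
extinct = sum(aphid_dies_out(q=0.5, r=0.2, s=0.4) for _ in range(trials))
print(extinct / trials)   # should be close to 1/2, whatever q < 1 is used
```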


Problem 1.74 Let $\phi(s) = 1 - p(1-s)^{\beta}$, where $0 < p < 1$ and $0 < \beta < 1$. Prove that $\phi(s)$ is a PGF and that its iterates are

$$\phi_n(s) = 1 - p^{1+\beta+\cdots+\beta^{n-1}}(1-s)^{\beta^n}, \qquad n = 1, 2, \ldots$$

Find the mean $m$ of the associated distribution and the extinction probability $\pi = \lim_{n\to\infty}\phi_n(0)$ for a branching process with offspring distribution determined by $\phi$.

Solution The coefficients in the Taylor expansion of $\phi(s)$ are

$$a_k = \frac{d^k}{ds^k}\phi(s)\Big|_{s=0} = p\,(-1)^{k-1}\beta(\beta-1)\cdots(\beta-k+1) \ge 0, \qquad k = 1, 2, \ldots, \quad a_0 = 1-p,$$

and $\phi(1) = \sum_{k\ge 0} a_k/k! = 1$. Thus, $\phi(s)$ is the PGF for the probabilities $p_k = a_k/k!$.

The second iterate, $\phi_2(s) = \phi(\phi(s))$, is of the form

$$\phi_2(s) = 1 - p\big(1 - \phi(s)\big)^{\beta} = 1 - p\,p^{\beta}(1-s)^{\beta^2}.$$

Assume inductively that $\phi_k(s) = 1 - p^{1+\beta+\cdots+\beta^{k-1}}(1-s)^{\beta^k}$, $k \le n-1$. Then

$$\phi_n(s) = 1 - p\big(1 - \phi_{n-1}(s)\big)^{\beta} = 1 - p\big(p^{1+\beta+\cdots+\beta^{n-2}}(1-s)^{\beta^{n-1}}\big)^{\beta} = 1 - p^{1+\beta+\cdots+\beta^{n-1}}(1-s)^{\beta^n},$$

as required.

Finally, the mean value is $m = \phi'(1) = \lim_{s\to 1^-}\phi'(s) = +\infty$, and the extinction probability is

$$\pi = \lim_{n\to\infty}\phi_n(0) = 1 - p^{1/(1-\beta)}.$$

Problem 1.75 At time 0, a blood culture starts with one red cell. At the end of 1 min,

the red cell dies and is replaced by one of the following combinations with probabilities

as indicated

two red cells: $\dfrac14$; one red, one white: $\dfrac23$; two white cells: $\dfrac1{12}$.

Each red cell lives for 1 min and gives birth to offspring in the same way as the parent

cell. Each white cell lives for 1 min and dies without reproducing. Assume that individual

cells behave independently.

(i) At time $n + \tfrac12$ min after the culture began, what is the probability that no white cells have yet appeared?

(ii) What is the probability that the entire culture dies out eventually?

Solution (i) The event

{by time $n + \tfrac12$ no white cells have yet appeared}

implies that the total number of cells equals the total number of red cells and equals $2^n$. Then, denoting the probability of this event by $p_n$, we find that

$$p_0 = 1,\quad p_1 = \frac14,\quad\text{and}\quad p_{n+1} = p_n\Big(\frac14\Big)^{2^n},\quad n \ge 1,$$

whence

$$p_n = \Big(\frac14\Big)^{2^{n-1}}\Big(\frac14\Big)^{2^{n-2}}\cdots\Big(\frac14\Big)^{2^{0}} = \Big(\frac14\Big)^{2^{n}-1},\qquad n \ge 1.$$

(ii) The extinction probability $\pi$ obeys

$$\pi = \frac14\pi^2 + \frac23\pi + \frac1{12}\quad\text{or}\quad \frac14\pi^2 - \frac13\pi + \frac1{12} = 0,$$

whence

$$\pi = \frac23 \pm \frac13.$$

We must take $\pi = 1/3$, the smaller root.

Problem 1.76 Let $(X_n)$ be a branching process such that $X_0 = 1$ and $\mathbb{E}X_1 = \mu$. If $Y_n = X_0 + \cdots + X_n$, and for $0 \le s \le 1$

$$\psi_n(s) \equiv \mathbb{E}s^{Y_n},$$

prove that

$$\psi_{n+1}(s) = s\,\phi\big(\psi_n(s)\big),$$

where $\phi(s) \equiv \mathbb{E}s^{X_1}$. Deduce that, if $Y = \sum_{n\ge0}X_n$, then $\psi(s) \equiv \mathbb{E}s^{Y}$ satisfies

$$\psi(s) = s\,\phi\big(\psi(s)\big), \qquad 0 \le s \le 1.$$

If $\mu < 1$, prove that $\mathbb{E}Y = (1-\mu)^{-1}$.

Solution Write $Y_{n+1} = 1 + Y_n^{(1)} + \cdots + Y_n^{(X_1)}$, where $Y_n^{(j)}$ is the total number of offspring produced by individual $j$ from the first generation. Then the RVs $Y_n^{(j)}$ are IID and have the PGF $\psi_n(s)$. Hence

$$\psi_{n+1}(s) = \mathbb{E}s^{Y_{n+1}} = s\sum_j \mathbb{P}(X_1 = j)\,\mathbb{E}\prod_{l=1}^{j}s^{Y_n^{(l)}} = s\sum_j \mathbb{P}(X_1 = j)\big(\psi_n(s)\big)^j = s\,\phi\big(\psi_n(s)\big).$$

The infinite series $Y = \sum_{n\ge0}X_n$ has the PGF $\psi(s) = \lim_{n\to\infty}\psi_n(s)$. Hence, it obeys

$$\psi(s) = s\,\phi\big(\psi(s)\big).$$

By induction, $\mathbb{E}Y_n = 1 + \mu + \cdots + \mu^n$. In fact, $Y_1 = 1 + X_1$ and $\mathbb{E}Y_1 = 1 + \mu$. Assume that the formula holds for $n \le k-1$. Then

$$\mathbb{E}Y_k = \psi_k'(1) = 1 + \phi'\big(\psi_{k-1}(1)\big)\,\psi_{k-1}'(1) = 1 + \mu\big(1 + \mu + \cdots + \mu^{k-1}\big) = 1 + \mu + \mu^2 + \cdots + \mu^k,$$

which completes the induction. Therefore, $\mathbb{E}Y = (1-\mu)^{-1}$.

Problem 1.77 Green™s disease turns your hair pink and your toenails blue but has

no other symptoms. A sufferer from Green™s disease has a probability pn of infecting n

further uninfected individuals ($n = 0, 1, 2, 3$) but will not infect more than 3. (The standard

assumptions of elementary branching processes apply.) Write down an expression for e,

the expected number of individuals infected by a single sufferer.

Starting from first principles, find the probability $\pi$ that the disease will die out if, initially, there is only one case.

Let $e_A$ and $\pi_A$ be the values of $e$ and $\pi$ when $p_0 = 2/5$, $p_1 = p_2 = 0$ and $p_3 = 3/5$. Let $e_B$ and $\pi_B$ be the values of $e$ and $\pi$ when $p_0 = p_1 = 1/10$, $p_2 = 4/5$ and $p_3 = 0$. Show that $e_A > e_B$ but $\pi_A > \pi_B$.

Solution The expression for $e$ is $e = p_1 + 2p_2 + 3p_3$. Let $X_j$ be the number of individuals in the $j$th generation of the disease and $X_0 = 1$. Assume that each sufferer dies once passing the disease on to $n \le 3$ others. Call $\pi$ the probability that the disease dies out:

$$\pi = \sum_k \mathbb{P}(X_1 = k)\,\pi^k = p_0 + p_1\pi + p_2\pi^2 + p_3\pi^3.$$

Direct calculation shows that $e_A = 9/5$, $e_B = 17/10$. The value $\pi_A$ is identified as the smallest positive root of the equation

$$0 = p_0 + (p_1 - 1)\pi + p_2\pi^2 + p_3\pi^3 = \frac25 - \pi + \frac35\pi^3 = \frac35(\pi - 1)\Big(\pi^2 + \pi - \frac23\Big).$$

Hence

$$\pi_A = \frac{-3 + \sqrt{33}}{6} \approx 0.46.$$

Similarly, $\pi_B$ is the smallest positive root of the equation

$$0 = \Big(\frac45\pi - \frac1{10}\Big)(\pi - 1),$$

and $\pi_B = 1/8$. So, $e_A > e_B$ and $\pi_A > \pi_B$.

Problem 1.78 Suppose that $(X_r,\ r \ge 0)$ is a branching process with $X_0 = 1$ and that the PGF for $X_1$ is $\phi(s)$. Establish an iterative formula for the PGF $\phi_r(s)$ for $X_r$. State a result in terms of $\phi(s)$ about the probability of eventual extinction.

(i) Suppose the probability that an individual leaves $k$ descendants in the next generation is $p_k = 1/2^{k+1}$, for $k \ge 0$. Show from the result you state that extinction is certain. Prove further that

$$\phi_r(s) = \frac{r - (r-1)s}{r + 1 - rs}, \qquad r \ge 1,$$

and deduce the probability that the $r$th generation is empty.

(ii) Suppose that every individual leaves at most three descendants in the next generation, and that the probability of leaving $k$ descendants in the next generation is

$$p_k = \binom{3}{k}\frac{1}{2^3}, \qquad k = 0, 1, 2, 3.$$

What is the probability of extinction?

Solution (i) Let $Y_i^{(n)}$ be the number of offspring of individual $i$ in generation $n$. Then

$$X_{n+1} = Y_1^{(n)} + \cdots + Y_{X_n}^{(n)}$$

and

$$\mathbb{E}s^{X_{n+1}} = \mathbb{E}\big[\mathbb{E}\big(s^{X_{n+1}}\,\big|\,X_n\big)\big] = \sum_{k=0}^{\infty}\mathbb{P}(X_n = k)\,\mathbb{E}s^{Y_1^{(n)}+\cdots+Y_k^{(n)}}
= \sum_{k=0}^{\infty}\mathbb{P}(X_n = k)\big(\mathbb{E}s^{Y_1^{(n)}}\big)^k = \sum_{k=0}^{\infty}\mathbb{P}(X_n = k)\,\phi(s)^k = \phi_n\big(\phi(s)\big),$$

and so $\phi_{n+1}(s) = \phi_n(\phi(s))$. The probability of extinction is the least $s \in [0,1]$ such that $\phi(s) = s$. Further,

$$\phi(s) = \frac12 + \frac1{2^2}s + \cdots = \frac1{2-s}.$$

Solving $s = 1/(2-s)$ yields $(s-1)^2 = 0$, i.e. $s = 1$. Hence the extinction is certain.

The formula for $\phi_r(s)$ is established by induction:

$$\phi_{r+1}(s) = \frac{r - (r-1)/(2-s)}{r + 1 - r/(2-s)} = \frac{r + 1 - rs}{r + 2 - (r+1)s}.$$

Hence, the probability that the $r$th generation is empty is

$$\phi_r(0) = \frac{r}{r+1}.$$

(ii) Here $\phi(s) = \frac1{2^3}(1+s)^3$, whence solving $2^3 s = (1+s)^3$, or $(s-1)(s^2 + 4s - 1) = 0$, we get solutions $s = 1$, $s = \pm\sqrt5 - 2$, and the extinction probability is $\sqrt5 - 2 \approx 0.24$.
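A quick numerical check of the closed form in part (i), iterating $\phi(s) = 1/(2-s)$ directly and comparing with $(r - (r-1)s)/(r+1-rs)$ (a sketch, not part of the original solution):

```python
def phi(s):
    # Offspring PGF for p_k = 1/2^(k+1): phi(s) = 1/(2 - s).
    return 1.0 / (2.0 - s)

def phi_iterated(s, r):
    # r-fold iterate phi_r(s) computed directly.
    for _ in range(r):
        s = phi(s)
    return s

# Compare the iterate with the closed form (r - (r-1)s)/(r+1 - rs).
for r in (1, 3, 10):
    for s in (0.0, 0.3, 0.9):
        closed_form = (r - (r - 1) * s) / (r + 1 - r * s)
        print(r, s, round(phi_iterated(s, r), 12), round(closed_form, 12))
```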

Problem 1.79 Consider a Galton–Watson process (i.e. the branching process where the number of offspring is random and independent for each division), with $\mathbb{P}(X_1 = 0) = 2/5$ and $\mathbb{P}(X_1 = 2) = 3/5$. Compute the distribution of the random variable $X_2$. For generation 3 find all probabilities $\mathbb{P}(X_3 = 2k)$, $k = 0, 1, 2, 3, 4$. Find the extinction probability for this model.


Solution The extinction probability is $\pi = 2/3$, and the PGF for $X_2$ is

$$\phi_{X_2}(s) = \frac25 + \frac{12}{125} + \frac{36}{125}s^2 + \Big(\frac35\Big)^3 s^4.$$

The PGF $\phi_{X_3}(s) = \phi_{X_2}\big(2/5 + 3s^2/5\big)$. Then

$$\mathbb{P}(X_2 = 0) = \frac25 + \frac{12}{125},\qquad \mathbb{P}(X_2 = 2) = \frac{36}{125},\qquad \mathbb{P}(X_2 = 4) = \Big(\frac35\Big)^3,$$

$$\mathbb{P}(X_3 = 0) = \frac25 + \frac{12}{125} + \frac{36}{125}\Big(\frac25\Big)^2 + \Big(\frac35\Big)^3\Big(\frac25\Big)^4 = 0.54761,$$

$$\mathbb{P}(X_3 = 2) = \frac{2^4 \times 3^3}{5^5} + \frac{2^5 \times 3^4}{5^7} = 0.17142,$$

$$\mathbb{P}(X_3 = 4) = \frac{4 \times 3^4}{5^5} + \frac{4^2 \times 3^5}{5^7} + \frac{8 \times 3^5}{5^7} = 0.17833,$$

$$\mathbb{P}(X_3 = 6) = \frac{3^6 \times 2^3}{5^7} = 0.07465,$$

$$\mathbb{P}(X_3 = 8) = \Big(\frac35\Big)^7 = 0.02799.$$
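These values can be checked mechanically, since composing PGFs here is just composing polynomials. A sketch using numpy's Polynomial class (which performs polynomial composition when called with a polynomial argument):

```python
from numpy.polynomial import Polynomial

# Offspring PGF phi(s) = 2/5 + (3/5) s^2, stored via its coefficients.
phi = Polynomial([0.4, 0.0, 0.6])

phi2 = phi(phi)     # phi_2(s) = phi(phi(s)), the PGF of X_2
phi3 = phi2(phi)    # phi_3(s) = phi_2(phi(s)), the PGF of X_3

# P(X_3 = 2k), k = 0,...,4, are the even-degree coefficients of phi_3.
print([round(float(c), 5) for c in phi3.coef[::2]])
# Reproduces 0.54761, 0.17142, 0.17833, 0.07465, 0.02799.
```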

Problem 1.80 By developing the theory of extinction probabilities, or otherwise, solve

the following problem.

No-one in their right mind would wish to be a guest at the Virtual Reality Hotel.

The rooms are numbered 0 to $(3^N - 3)/2$, where $N$ is a very large integer. If $0 \le i \le (3^{N-1} - 3)/2$ and $j = 1, 2, 3$ there is a door between Room $i$ and Room $3i + j$ through which (if it is unlocked) guests may pass in both directions. In addition, any room with a number higher than $(3^{N-1} - 3)/2$ has an open window through which guests can (and should) escape into the street. So far as the guests are concerned, there are no other doors

or windows. Figure 1.9 shows part of the floor plan of the hotel.


Figure 1.9


Each door in the hotel is locked with probability 1/3 independently of the others. An

arriving guest is placed in Room 0 and can then wander freely (insofar as the locked

doors allow). Show that the guest's chance of escape is about $(9 - \sqrt{27})/4$.

Solution Denote by $X_r$ the number of rooms available at level $r$ from Room 0. Writing $\phi_r(t) = \mathbb{E}t^{X_r}$, with $\phi := \phi_1$:

$$\phi_{r+1}(t) = \mathbb{E}t^{X_{r+1}} = \sum_i \mathbb{P}(X_r = i)\,\mathbb{E}\big(t^{X_{r+1}}\,\big|\,X_r = i\big)
= \sum_i \mathbb{P}(X_r = i)\big(\mathbb{E}t^{X_1}\big)^i
= \sum_i \mathbb{P}(X_r = i)\,\phi(t)^i = \phi_r\big(\phi(t)\big).$$

Then $\phi_{r+1}(t) = \phi_r(\phi(t)) = \phi(\phi_r(t))$, and

$$\mathbb{P}(\text{can't reach level } r) = \phi_r(0).$$

Now the PGF $\phi(t)$ equals

$$\Big(\frac13\Big)^3 + 3\Big(\frac13\Big)^2\frac23\,t + 3\,\frac13\Big(\frac23\Big)^2 t^2 + \Big(\frac23\Big)^3 t^3 = \frac1{27}\big(1 + 6t + 12t^2 + 8t^3\big).$$

Hence, the equation $\phi(t) = t$ becomes

$$27t = 1 + 6t + 12t^2 + 8t^3,\quad\text{i.e.}\quad 1 - 21t + 12t^2 + 8t^3 = 0.$$

By factorising $1 - 21t + 12t^2 + 8t^3 = (t-1)(8t^2 + 20t - 1)$, we find that the roots are

$$t = 1,\qquad \frac{-5 \pm \sqrt{27}}{4}.$$

The root between 0 and 1 is $(\sqrt{27} - 5)/4$. The sequence $\phi_N(0)$ is monotone increasing and bounded above. Hence, it converges as $N\to\infty$:

$$\mathbb{P}(\text{no escape}) \to \frac{\sqrt{27} - 5}{4}.$$

Then

$$\mathbb{P}(\text{escape}) \to 1 - \frac{\sqrt{27} - 5}{4} = \frac{9 - \sqrt{27}}{4} \approx 0.950962.$$
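Numerically, the convergence of $\phi_N(0)$ to $(\sqrt{27} - 5)/4$ is immediate to verify by iterating $\phi(t) = ((1+2t)/3)^3$ from $t = 0$ (a sketch, not part of the solution):

```python
import math

def phi(t):
    # PGF of the number of open doors out of three: each door is unlocked
    # independently with probability 2/3, so phi(t) = ((1 + 2t)/3)**3.
    return ((1 + 2 * t) / 3) ** 3

t = 0.0
for _ in range(1_000):        # phi_N(0): probability of not reaching level N
    t = phi(t)
print(t, (math.sqrt(27) - 5) / 4)        # both about 0.04904
print(1 - t, (9 - math.sqrt(27)) / 4)    # escape probability, about 0.951
```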

Problem 1.81 (i) A mature individual produces offspring according to the PGF $\phi(s)$. Suppose we start with a population of $k$ immature individuals, each of which grows to maturity with probability $p$ and then reproduces, independently of the other individuals. Find the PGF of the number of (immature) individuals in the next generation.

(ii) Find the PGF of the number of mature individuals in the next generation, given that there are $k$ mature individuals in the parent generation.

Hint: (i) Let $R$ be the number of immature descendants of an immature individual. If $X^{(n)}$ is the number of immature individuals in generation $n$, then, given that $X^{(n)} = k$,

$$X^{(n+1)} = R_1 + \cdots + R_k,$$

where $R_i \sim R$, independently. The conditional PGF is

$$\mathbb{E}\big(s^{X^{(n+1)}}\,\big|\,X^{(n)} = k\big) = \big(g(\phi(s))\big)^k,$$

where $g(x) = 1 - p + px$.

(ii) Let $U$ be the number of mature descendants of a mature individual, and $Z^{(n)}$ be the number of mature individuals in generation $n$. Then, as before, conditional on $Z^{(n)} = k$,

$$Z^{(n+1)} = U_1 + \cdots + U_k,$$

where $U_i \sim U$, independently. The conditional PGF is

$$\mathbb{E}\big(s^{Z^{(n+1)}}\,\big|\,Z^{(n)} = k\big) = \big(\phi(g(s))\big)^k.$$

Problem 1.82 Show that the distributions in parts (i) and (ii) of Problem 1.81 have the same mean, but not necessarily the same variance.

Hint:

$$\frac{d^2}{ds^2}\,\mathbb{E}\big(s^{X^{(n+1)}}\,\big|\,X^{(n)} = 1\big)\Big|_{s=1} = p\,\phi''(1),\qquad
\frac{d^2}{ds^2}\,\mathbb{E}\big(s^{Z^{(n+1)}}\,\big|\,Z^{(n)} = 1\big)\Big|_{s=1} = p^2\,\phi''(1).$$

2 Continuous outcomes

2.1 Uniform distribution. Probability density functions.

Random variables. Independence

Probabilists do it continuously but discreetly.

(From the series 'How they do it'.)

Bye, Bi, Variate

(From the series 'Movies that never made it to the Big Screen'.)

After developing a background in probabilistic models with discrete outcomes we can

now progress further and do exercises where uncountably many outcomes are explicitly

involved. Here, the events are associated with subsets of a continuous space (a real line $\mathbb{R}$, an interval $(a,b)$, a plane $\mathbb{R}^2$, a square, etc.). The simplest case is where the outcome space $\Omega$ is represented by a 'nice' bounded set and the probability distribution corresponds to a unit mass uniformly spread over it. Then an event (i.e. a subset) $A \subseteq \Omega$ acquires the probability

$$\mathbb{P}(A) = \frac{v(A)}{v(\Omega)}, \qquad (2.1)$$

where $v(A)$ is the standard Euclidean volume (or area or length) of $A$ and $v(\Omega)$ that of $\Omega$.

The term 'uniformly spread' is the key here; an example below shows that one has to be careful about what exactly it means in the given context.

Example 2.1 This is known as Bertrand's paradox. A chord has been chosen at random

in a circle of radius r. What is the probability that it is longer than the side of the equilateral

triangle inscribed in the circle? The answer is different in the following three cases:

(i) the middle point of the chord is distributed uniformly inside the circle;

(ii) one endpoint is fixed and the second is uniformly distributed over the circum-

ference;

(iii) the distance between the middle point of the chord and the centre of the circle

is uniformly distributed over the interval $(0, r)$.


In fact, in case (i), the middle point of the chord must lie inside the circle inscribed in the triangle. Hence,

$$\mathbb{P}(\text{chord longer}) = \frac{\text{area of the inscribed circle}}{\text{area of the original circle}} = \frac{\pi r^2/4}{\pi r^2} = \frac14.$$

In case (ii), the second endpoint must then lie on the opposite third of the circumference. Hence,

$$\mathbb{P}(\text{chord longer}) = \frac13.$$

Finally, in case (iii), the middle point of the chord must be at distance $\le r/2$ from the centre. Hence,

$$\mathbb{P}(\text{chord longer}) = \frac12.$$
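Each of the three answers is easy to confirm by Monte Carlo, sampling the chord according to the corresponding mechanism. A sketch (the helper below and its parameters are illustrative, not from the text):

```python
import math
import random

def bertrand(n=100_000, r=1.0):
    # Estimate P(chord longer than the triangle side) under the three
    # different meanings of 'a chord chosen at random'.
    side = r * math.sqrt(3)               # side of the inscribed triangle
    counts = [0, 0, 0]
    for _ in range(n):
        # (i) midpoint uniform in the disc (rejection sampling from the square)
        while True:
            x, y = random.uniform(-r, r), random.uniform(-r, r)
            if x * x + y * y <= r * r:
                break
        counts[0] += x * x + y * y <= (r / 2) ** 2
        # (ii) one endpoint fixed, the other uniform on the circumference
        theta = random.uniform(0.0, 2 * math.pi)
        counts[1] += 2 * r * math.sin(theta / 2) > side
        # (iii) distance from the midpoint to the centre uniform on (0, r)
        counts[2] += random.uniform(0.0, r) < r / 2
    return [c / n for c in counts]

print(bertrand())   # roughly [0.25, 0.33, 0.50]
```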

A useful observation is that we can think in terms of a uniform probability density function assigning to a point $x \in \Omega$ the value

$$f^{\rm uni}(x) = \frac{1}{v(\Omega)}\,I(x \in \Omega), \qquad (2.2)$$

with the probability of event $A \subseteq \Omega$ calculated as the integral

$$\mathbb{P}(A) = \int_A f^{\rm uni}(x)\,dx = \frac{1}{v(\Omega)}\int_A dx, \qquad (2.3)$$

giving of course the same answer as formula (2.1). Because $f^{\rm uni} \ge 0$ and $\int f^{\rm uni}(x)\,dx = 1$, the probability of event $A \subseteq \Omega$ is always between 0 and 1. Note that the mass assigned to a single outcome represented by a point of $\Omega$ is zero. Hence the mass assigned to any finite or countable set of outcomes is zero (as it is the sum of the masses assigned to each outcome); to get a positive mass (and thus a positive probability), an event $A$ must be uncountable.

Problem 2.1 Alice and Bob agree to meet in the Copper Kettle after their Saturday

lectures. They arrive at times that are independent and uniformly distributed between

12:00 and 13:00. Each is prepared to wait s minutes before leaving. Find a minimal s

such that the probability that they meet is at least 1/2.

Solution The set $\Omega$ is the unit square with co-ordinates $0 \le x, y \le 1$ (measuring the time in fractions of an hour between 12:00 and 13:00). Each $\omega = (x, y) \in \Omega$ specifies Alice's arrival time $x$ and Bob's $y$. Then the event 'they arrive within $s$ minutes of each other' is a strip around the diagonal $x = y$:

$$A = \Big\{(x,y) \in \Omega:\ |x - y| \le \frac{s}{60}\Big\}.$$



Figure 2.1

2

Its complement is formed by two triangles, of area 1/2 1 ’ s/60 each. So, the area

v A is

s 2

1’ 1’

60

and we want it to be ≥ 1/2. See Figure 2.1.

√

This gives s ≥ 60 1 ’ 2/2 ≈ 18 minutes.
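A Monte Carlo check of the answer (an illustrative sketch, not part of the solution): estimate the meeting probability for $s = 18$ and $s = 17$ minutes.

```python
import random

def meeting_probability(s_minutes, n=200_000):
    # Alice and Bob arrive at independent uniform times within a 60-minute
    # window; they meet if the arrival times differ by at most s_minutes.
    hits = sum(
        abs(random.uniform(0, 60) - random.uniform(0, 60)) <= s_minutes
        for _ in range(n)
    )
    return hits / n

print(meeting_probability(18))   # about 0.51, just above 1/2
print(meeting_probability(17))   # about 0.49, just below 1/2
```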

Problem 2.2 A stick is broken into two at random; then the longer half is broken

again into two pieces at random. What is the probability that the three pieces will make

a triangle?

Solution Let the stick length be $\ell$. If $x$ is the place of the first break, then $0 \le x \le \ell$ and $x$ is uniform on $(0, \ell)$. If $x \ge \ell/2$, then the second break point $y$ is uniformly chosen on the interval $(0, x)$. See Figure 2.2. Otherwise $y$ is uniformly chosen on $(x, \ell)$. Thus

$$\Omega = \big\{(x,y):\ 0 \le x, y \le \ell,\ y \le x \text{ for } x \ge \ell/2 \text{ and } x \le y \le \ell \text{ for } x \le \ell/2\big\},$$

and the area $v(\Omega) = 3\ell^2/4$. See Figure 2.3.

To make a triangle, $(x, y)$ must lie in $A$, where

$$A = \Big\{\max(x,y) > \frac{\ell}{2},\ |x - y| < \frac{\ell}{2},\ \min(x,y) < \frac{\ell}{2}\Big\},$$

which yields the area $\ell^2/4$. The answer then is 1/3.


Figure 2.2



Figure 2.3

It is also possible to reduce $\Omega$ to a 'half' of the above set, just assuming that $x$ is always the length of the longer stick:

$$\Omega = \big\{(x,y):\ \ell/2 \le x \le \ell,\ 0 \le y \le x\big\};$$

the probability will still be 1/3.

Problem 2.3 A stick is broken in two places chosen beforehand, completely at random

along its length. What is the probability that the three pieces will make a triangle?

Answer: 1/4. The loss in probability occurs because we now include among the outcomes the possibility that the shorter piece is broken again, while in the previous example this was excluded.
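The value 1/4 is also easy to confirm by simulation: break a unit stick at two independent uniform points and test whether every piece is shorter than 1/2 (a sketch, not part of the text):

```python
import random

def triangle_probability(n=200_000):
    # Break a unit stick at two points chosen independently and uniformly;
    # the three pieces form a triangle iff every piece is shorter than 1/2.
    hits = 0
    for _ in range(n):
        u, v = sorted((random.random(), random.random()))
        pieces = (u, v - u, 1 - v)
        hits += max(pieces) < 0.5
    return hits / n

print(triangle_probability())   # about 0.25
```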

Problem 2.4 The ecu coin is a disc of diameter 4/5 units. In the traditional game of

drop the ecu, an ecu coin is dropped at random onto a grid of squares formed by lines

one unit apart. If the coin covers part of a line you lose your ecu. If not, you still lose

your ecu but the band plays the national anthem of your choice. What is the probability

that you get to choose your national anthem?

Solution Without loss of generality, assume that the centre of the coin lies in the unit square with corners $(0,0)$, $(0,1)$, $(1,0)$, $(1,1)$. You will hear an anthem when the centre lies in the inside square $S$ described by

$$\frac25 \le x \le \frac35, \qquad \frac25 \le y \le \frac35.$$

Hence,

$$\mathbb{P}(\text{anthem}) = \text{area of } S = \frac1{25}.$$

There are several serious questions arising here which we will address later. One is

the so-called measurability: there exist weird sets A ‚ (even when is a unit interval

Continuous outcomes

112

(0,1)) which do not have a correctly defined volume (or length). In general, how does one

measure the volume of a set A in a continuous space? Such sets may not be particularly

difficult to describe (for instance, Cantor™s continuum has a correctly defined length),

but calculating their volume, area or length goes beyond standard Riemann integration,

let alone elementary formulas. (As a matter of fact, the correct length of is zero.) To

develop a complete theory, we would need the so-called Lebesgue integration, which is

called after H.L. Lebesgue (1875“1941), the famous French mathematician. Lebesgue was

of a very humble origin, but became a top mathematician. He was renowned for flawless

and elegant presentations and written works. In turn, the Lebesgue integration requires

the concept of a sigma-algebra and an additive measure which leads to a far-reaching

generalisation of the concept of length, area and volume encapsulated in a concept of a

measure. We will discuss these issues in later volumes.

An issue to discuss now is: what if the distribution of the mass is not uniform? This question is not only of a purely theoretical interest. In many practical models $\Omega$ is represented by an unbounded subset of a Euclidean space $\mathbb{R}^d$ whose volume is infinite (e.g. by $\mathbb{R}^d$ itself or by $\mathbb{R}_+ = [0,\infty)$, for $d = 1$). Then the denominator $v(\Omega)$ in equation (2.1) becomes infinite. Here, the recipe is: consider a function $f \ge 0$ with $\int f(x)\,dx = 1$ and set

$$\mathbb{P}(A) = \int_A f(x)\,dx, \qquad A \subseteq \Omega$$

(cf. equation (2.3)). Such a function $f$ is interpreted as a (general) probability density function (PDF). The following natural (and important) examples appear in problems below:

A uniform distribution on an interval $(a, b)$, $a < b$: here $\Omega = (a, b)$ and

$$f(x) = \frac{1}{b - a}\,I(a < x < b). \qquad (2.4)$$

A Gaussian, or normal, distribution, with $\Omega = \mathbb{R}$ and

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\Big(-\frac{1}{2\sigma^2}(x - \mu)^2\Big), \qquad x \in \mathbb{R}. \qquad (2.5)$$

Here $\mu \in \mathbb{R}$, $\sigma > 0$ are parameters specifying the distribution.

The graphs of normal PDFs on an interval around the origin and away from it are

plotted in Figures 2.4 and 2.5.

This is the famous curve about which the great French mathematician J.-H. Poincaré (1854–1912) said 'Experimentalists think that it is a mathematical theorem while the mathematicians believe it to be an experimental fact.'

An exponential distribution: here $\Omega = \mathbb{R}_+$ and

$$f(x) = \lambda e^{-\lambda x}\,I(x \ge 0), \qquad x \in \mathbb{R}. \qquad (2.6)$$

Here $\lambda > 0$ is a parameter specifying the distribution.

The graphs of the exponential PDFs are shown in Figure 2.6.


Figure 2.4 The normal PDFs, I ($\mu = 0$; $\sigma = 1, 1.5, 2$).

Figure 2.5 The normal PDFs, II ($\mu = 0$; $\sigma = 1, 1.5, 2$).

Figure 2.6 The exponential PDFs ($\lambda = 1$ and $\lambda = 1/2$).


A generalisation of formula (2.6) is the Gamma distribution. Here again $\Omega = \mathbb{R}_+$ and the PDF is

$$f(x) = \frac{\lambda^{\alpha}}{\Gamma(\alpha)}\,x^{\alpha-1}e^{-\lambda x}\,I(x \ge 0), \qquad (2.7)$$

with parameters $\alpha, \lambda > 0$. Here $\Gamma(\alpha) = \int_0^{\infty} x^{\alpha-1}e^{-x}\,dx$ (the value of the Gamma function) is the normalising constant. Recall, for a positive integer argument, $\Gamma(n) = (n-1)!$; in general, for $\alpha > 1$, $\Gamma(\alpha) = (\alpha-1)\Gamma(\alpha-1)$.

The Gamma distribution plays a prominent rôle in statistics and will repeatedly appear in later chapters. The graphs of the Gamma PDFs are sketched in Figure 2.7.

Another example is a Cauchy distribution, with $\Omega = \mathbb{R}$ and

$$f(x) = \frac{\tau}{\pi}\,\frac{1}{\tau^2 + (x - \alpha)^2}, \qquad x \in \mathbb{R}, \qquad (2.8)$$

with parameters $\alpha \in \mathbb{R}$ and $\tau > 0$. There is a story that the Cauchy distribution was discovered by Poisson in 1824 when he proposed a counterexample to the CLT. See below. The graphs of the Cauchy PDFs are sketched in Figure 2.8.

Cauchy was a staunch royalist and a devoted Catholic and, unlike many other prominent

French scientists of the period, he had difficult relations with the Republican regime. In

1830, during one of the nineteenth century French revolutions, he went into voluntary

exile to Turin and Prague where he gave private mathematics lessons to the children of

the Bourbon Royal family. His admission to the French Academy occurred only in 1838,

after he had returned to Paris.

The Gaussian distribution will be discussed in detail below. At this stage we only indicate a generalisation to the multidimensional case where $\Omega = \mathbb{R}^d$ and

$$f(\mathbf{x}) = \frac{1}{(2\pi)^{d/2}(\det\Sigma)^{1/2}}\exp\Big(-\frac12\big\langle \mathbf{x} - \boldsymbol\mu,\ \Sigma^{-1}(\mathbf{x} - \boldsymbol\mu)\big\rangle\Big). \qquad (2.9)$$

Figure 2.7 The Gamma PDFs ($\lambda = 1$; $\alpha = 0.5, 1.5, 5.5$).


Figure 2.8 The Cauchy PDFs ($\alpha = 0$; $\tau = 1, 1.5, 2$).

Here, $\mathbf{x}$ and $\boldsymbol\mu$ are real $d$-dimensional vectors:

$$\mathbf{x} = \begin{pmatrix} x_1\\ \vdots\\ x_d \end{pmatrix}, \qquad \boldsymbol\mu = \begin{pmatrix} \mu_1\\ \vdots\\ \mu_d \end{pmatrix} \in \mathbb{R}^d,$$

and $\Sigma$ is an invertible positive-definite $d \times d$ real matrix, with the determinant $\det\Sigma > 0$ and the inverse matrix $\Sigma^{-1} = \big(\Sigma^{-1}_{ij}\big)$. Matrix $\Sigma$ is called positive-definite if it can be represented as the product $\Sigma = AA^*$ and strictly positive-definite if in this representation matrix $A$ is invertible, i.e. the inverse $A^{-1}$ exists (in which case the inverse matrix $\Sigma^{-1} = (A^*)^{-1}A^{-1}$ also exists). It is easy to see that a positive-definite matrix is always symmetric (or Hermitian), i.e. obeys $\Sigma^* = \Sigma$. Hence, a positive-definite matrix has an orthonormal basis of eigenvectors, and its eigenvalues are non-negative (positive if it is strictly positive-definite). Further, $\langle\,\cdot\,,\,\cdot\,\rangle$ stands for the Euclidean scalar product in $\mathbb{R}^d$:

$$\big\langle \mathbf{x} - \boldsymbol\mu,\ \Sigma^{-1}(\mathbf{x} - \boldsymbol\mu)\big\rangle = \sum_{i,j=1}^{d}(x_i - \mu_i)\,\Sigma^{-1}_{ij}\,(x_j - \mu_j).$$

A PDF of this form is called a multivariate normal, or Gaussian, distribution.
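Formula (2.9) can be evaluated directly with a few lines of linear algebra. A sketch (the mean vector and covariance matrix below are arbitrary illustrative values):

```python
import numpy as np

def multivariate_normal_pdf(x, mu, sigma):
    # Direct evaluation of formula (2.9) for a d-dimensional normal
    # distribution with mean vector mu and covariance matrix sigma
    # (sigma must be strictly positive-definite).
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    d = len(mu)
    diff = x - mu
    quad = diff @ np.linalg.inv(sigma) @ diff
    norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(sigma))
    return float(np.exp(-0.5 * quad) / norm)

mu = [0.0, 1.0]
sigma = [[2.0, 0.5],
         [0.5, 1.0]]
print(multivariate_normal_pdf([0.0, 0.0], mu, sigma))
```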

Remark We already have seen a number of probability distributions bearing personal names (Gauss, Poisson, Cauchy; more will appear in Chapters 3 and 4). Another example (though not as frequent) is the Simpson distribution. Here we take $X, Y \sim \mathrm{U}(0,1)$, independently. Then $X + Y$ has a 'triangular' PDF known as Simpson's PDF:

$$f_{X+Y}(u) = \begin{cases} u, & 0 \le u \le 1,\\ 2 - u, & 1 \le u \le 2,\\ 0, & \text{otherwise}. \end{cases}$$
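Simpson's PDF is easy to confirm empirically: sample $X + Y$ and compare a histogram with the triangular density (an illustrative sketch):

```python
import random

def empirical_simpson_density(bins=10, n=200_000):
    # Empirical density of X + Y with X, Y ~ U(0,1) independent, on [0, 2].
    counts = [0] * bins
    width = 2.0 / bins
    for _ in range(n):
        u = random.random() + random.random()
        counts[min(int(u / width), bins - 1)] += 1
    return [c / (n * width) for c in counts]

for i, estimate in enumerate(empirical_simpson_density()):
    midpoint = (i + 0.5) * 0.2
    exact = midpoint if midpoint <= 1 else 2 - midpoint   # Simpson's PDF
    print(f"u = {midpoint:.1f}: empirical {estimate:.3f}, exact {exact:.3f}")
```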


T. Simpson (1700–1761), an English scientist, left a notable mark in interpolation and

numerical methods of integration. He was the most distinguished of the group of itinerant

lecturers who taught in fashionable London coffee-houses (a popular way of spreading

scientific information in eighteenth-century England).

As before, we face a question: what type of function $f$ can serve as a PDF? (The example with $f(x) = I\big(x \in (0,1)\setminus\mathcal{C}\big)$, where $\mathcal{C} \subset (0,1)$ is Cantor's set, is typical. Here $f \ge 0$ by definition, but how should $\int_0^1 f(x)\,dx$ be defined?) And again, the answer lies in the theory of Lebesgue integration. Fortunately, in 'realistic' models, these matters arise rarely and are overshadowed by far more practical issues.

So, from now until the end of this chapter our basic model will be where outcomes $\omega$ run over an 'allowed' subset $\Omega$ of $\mathbb{R}^d$ (such subsets are called measurable and will be introduced later). Quite often $\Omega$ will be $\mathbb{R}^d$. The probability $\mathbb{P}(A)$ will be calculated for every such set $A$ (called an event in $\Omega$) as

$$\mathbb{P}(A) = \int_A f(x)\,dx. \qquad (2.10)$$

Here $f$ is a given PDF: $f \ge 0$ with $\int f(x)\,dx = 1$.

As in the discrete case, we have an intuitively plausible property of additivity: if $A_1, A_2, \ldots$ is a (finite or countable) sequence of pair-wise disjoint events, then

$$\mathbb{P}\Big(\bigcup_j A_j\Big) = \sum_j \mathbb{P}(A_j), \qquad (2.11)$$

while, in a general case, $\mathbb{P}\big(\bigcup_j A_j\big) \le \sum_j \mathbb{P}(A_j)$. As $\mathbb{P}(\Omega) = 1$, we obtain that for the complement $A^{\rm c} = \Omega\setminus A$: $\mathbb{P}(A^{\rm c}) = 1 - \mathbb{P}(A)$, and for the set-theoretical difference $A\setminus B$: $\mathbb{P}(A\setminus B) = \mathbb{P}(A) - \mathbb{P}(A\cap B)$. Of course, more advanced facts that we learned in the discrete space case remain true, such as the inclusion–exclusion formula.

In this setting, the concept of a random variable develops, unsurprisingly, from its discrete-outcome analogue: a RV is a function

$$\omega \in \Omega\ \mapsto\ X(\omega),$$

with real or complex values $X(\omega)$ (in the complex case we again consider a pair of real RVs representing the real and imaginary parts). Formally, a real RV must have the property that for all $x \in \mathbb{R}$, the set $\{\omega \in \Omega:\ X(\omega) < x\}$ is an event in $\Omega$ to which the probability $\mathbb{P}(X < x)$ can be assigned. Then with each real-valued RV we associate its cumulative distribution function (CDF)

$$x \in \mathbb{R}\ \mapsto\ F_X(x) = \mathbb{P}(X < x), \qquad (2.12)$$

varying monotonically from 0 to 1 as $x$ increases from $-\infty$ to $\infty$. See Figure 2.9.

The quantity

$$\overline F_X(x) = 1 - F_X(x) = \mathbb{P}(X \ge x), \qquad (2.13)$$

describing tail probabilities, is also often used.


Figure 2.9 The CDF $F(x)$, increasing from 0 to 1.