Probability of a run of heads

The question is: If you flip n coins in a row, what is the probability that somewhere in those flips you’ll get a series of k (or more) heads in a row?

This is one that comes up periodically on Reddit, and I’ve answered it, though I’m not 100% I’ve always answered it exactly the same way twice. I do know that my answers to this one tend to be a little long and messy. To make a long story short, I always use some version of a recursive approach.

So the purpose of this article is to introduce the idea of recursive probability calculations, for problems which don’t seem amenable to another approach.

It interests me because it seems like there should be a simple formulation, but I always have to work out a fairly complicated approach. I’ve observed over the years that with counting problems there are almost always multiple approaches, and often one that’s quite elegant and simple. I suspect that’s the case here, but if so the alternate methods have so far eluded me.

Anyway, my approach is to use recursion, and if you haven’t seen a recursive probability calculation, then you may find my long-way around useful.

Why is this so hard?

First let’s talk about what makes it hard. Let’s say n = 5 and k = 3. You might be tempted to list the possible outcomes with 3 heads as: HHHxx, xHHHx, and xxHHH, where x means you don’t care what that flip was. Can’t you just count all of the outcomes fitting each of those patterns?

Well, not exactly. For instance HHHxx includes HHHHT, HHHHH, HHHTT and HHHHH. While xHHHx includes HHHHT, THHHT, HHHHH and THHHH. You’ll notice that two of those outcomes, HHHHH and HHHHT, are counted twice, and the pattern xxHHH includes other outcomes that have already been counted. So you wouldn’t just add up the four HHHxx outcomes, the four xHHHX, and the four xxHHH outcomes to say there are twelve ways this can happen. You’ve overcounted, which is always a danger in counting problems.

Now, there are well-established procedures for handling overcounting. First of all, you can try to construct your outcomes so that they don’t overlap. To make your outcomes mutually exclusive in other words. So we could use the pattern HHHTx and count those separately from HHHHx. There is guaranteed to be no overlap between HHHTx and xHHHx or xxHHH. With care, you could construct mutually-exclusive classes to count the outcomes for n = 5, k = 3.

But when I try to do that for larger problems, say n = 10, k = 3, where you could have multiple runs of 3 which are separated or together and a huge number of mutually-exclusive patterns, it gets unwieldy. I’m not saying it can’t be done, just saying it breaks my brain and I haven’t found a way yet.

The other approach to overcounting is inclusion-exclusion. Suppose you want to count the number of things that fall in set A, or in set B or in set C, etc and those sets have overlap as those three patterns HHHxx, xHHHx, and xxHHH do. Then the inclusion-exclusion rule is that to calculate the total number of things in the union of those sets:

  • Add up the membership of each individual set, N(A) + N(B) + \hdots. We know we’re overcounting the overlaps, so next we
  • Subtract the membership of each intersecting pair of sets, -N(A\cap B) -N(B\cap C) -N(A\cap C)\hdots. Each of those pairs is also counting things that belong to three or more sets, so we’ve over done the correction, which means we have to…
  • Add the membership of triplets of sets, N(A \cap B \cap C) +N(A\cap B\cap D)+\hdots
  • etc, alternating addition and subtraction (four sets, five sets, etc), until we’ve handled every combination.

This seems to me even worse than trying to construct mutually-exclusive categories. Suppose there were 20 different positions where the HHH could occur. Then you’d have to construct all the combinations of 2,\:3\;, etc outcomes and count each of those. In effect one term for every subset of a collection of 20 items. There are 2^{20} = 1048576 subsets.

Again, there may be a different collection of events you could define and a clever inclusion-exclusion argument that’s tractable. But I haven’t found it yet. So that brings us to…

Recursive counting of coin flips

Recursive definition of anything is a pretty simple idea. You have a rule for defining higher terms (larger n) in terms of lower terms (smaller n). And then you define what the first terms are so the rule has a place to start.

The classic example used in teaching recursive computer programming is the factorial function. n! can be defined recursively as n * (n-1)!, and you could define 0! = 1 to start things off. So for instance 3! = 3 * 2! = 3 * 2 * 1! = 3 * 2 * 1 * 0! = 3 * 2 * 1 * 1

As another example, the Fibonacci numbers can be defined recursively as a_n = a_{n-1} + a_{n-2}, so each term is defined in terms of two previous terms. It thus requires two numbers to be defined to start it out, so we usually define a_1 = 1, a_2 = 1.

For instance, to use the recursive expression to evaluate a_5 we would do this: a_5 = a_4 + a_3 = (a_3 + a_2) + (a_2 + a_1) = (a_2 + a_1 + a_2) + (a_2 + a_1) = 5. If we were evaluating a_{100} we would have to follow the chain from a_{99} and a_{98} all the way down to a_1 and a_2. It can take a lot of computation to evaluate such an expression for large values of n, though a few hundred should not be a problem for any computer (Python limits recursion to a depth of 1000 by default.)

Let’s define P(n, k) as the probability of at least one run of k heads in n flips, and let’s see what probabilities we can easily calculate for low values of n and k to use as the starting values for the recursions.

First of all, P(n, 1) = 1 for any n. Every flip is a “run” of length 1.

P(n, k) = 0 for all n < k because you can’t have a run of length k in less than k flips. We can include negative values of n as well.

P(n, n) is the probability that all flips are heads and is therefore equal to (1/2)^n for any n.

That handles the cases of n < k and n = k. Now let’s consider a general P(n,k), n > k. There will be a run of length k in the n flips if either

  • There is a run of k within the first n-1, probability P(n-1, k), or
  • The last k+1 flips are THH…H, probability (1/2)^{k+1}, and there is no run of length k in the first n - k - 1, probability 1 - P(n-k-1, k)

This gives us the recursion

\displaystyle P(n,k) = P(n-1,k) + \left ( \frac{1}{2} \right )^{k+1} \left[ 1 - P(n-k-1, k) \right ]

Let’s see how this works with, for example, P(5, 3). The outcomes with 3 or more heads in a row are HHHTT, HHHTH, HHHHT, HHHHH, THHHT, THHHH, HTHHH, TTHHH, 8 altogether out of 2^5 = 32. So P(5,3) = 8/32 = 1/4.


\displaystyle \begin{aligned} P(5,3) &= P(4,3) + \left ( \frac{1}{2} \right )^{4} \left[ 1 - P(1, 3) \right ] \\[9 pt] &= P(4,3) + \frac{1}{16} [1 - 0] = P(4,3) + \frac{1}{16}\\[9 pt] &= \left \{ P(3,3) + \frac{1}{16} \left [1 - P(0,3) \right ] \right \} + \frac{1}{16} \\[9 pt] &= \frac{1}{8} + \frac{1}{16} + \frac{1}{16} = \frac{1}{4}  \end{aligned}

Here’s an implementation of the recursive calculation in Python.

def P(n, k):
    if n < k:
        return 0
    elif n == k:
        return 1/2**n
        return P(n-1, k) + (1 - P(n-k-1, k))/2**(k+1)

and here is a function to generate all possible sequences of n coin flips and collect statistics on the longest sequence of heads found (the logical value True is coding for heads):

import itertools as it
import numpy as np
def seq_stats(n):
    # All possible flips of length n
    all_n_seqs = it.product({True, False}, repeat=n)
    stats = np.zeros((n+1,), dtype=int)
    for flips in all_n_seqs:
        seq_len = 0
        longest_seq = 0
        for flip in flips:
            if flip:
                seq_len += 1
                longest_seq = max(longest_seq, seq_len)
                seq_len = 0
        # print(flips)
        # print(f'Longest seq = {longest_seq}')
        stats[longest_seq] += 1
    return stats

This code runs a side-by-side comparison of the entire probability distribution for any n (takes about 10 seconds for n = 20).

# Do a side by side comparison of the two methods
def compare_methods(n):
    t0 = time.perf_counter()
    flip_stats = seq_stats(n)
    # stats are P(>=k) so do the cumulative sum
    empirical_stats = np.cumsum(flip_stats[::-1])[::-1] / 2**n
    t1 = time.perf_counter()
    theoretical_stats = [P(n,k) for k in range(n+1)]
    t2 = time.perf_counter()
    print('  k Empirical P(n,k)  Theoretical P(n,k)')
    for k in range(n+1):
        print(f'{k:3d} {empirical_stats[k]:12.5f} {theoretical_stats[k]:12.5f}')

and finally, here are the results for n = 10, validating the recursive formula:

  k Empirical P(n,k)  Theoretical P(n,k)
  0      1.00000      1.00000
  1      0.99902      0.99902
  2      0.85938      0.85938
  3      0.50781      0.50781
  4      0.24512      0.24512
  5      0.10938      0.10938
  6      0.04688      0.04688
  7      0.01953      0.01953
  8      0.00781      0.00781
  9      0.00293      0.00293
 10      0.00098      0.00098

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: