/usr/sbin/blog

Alex's Geekery Journal.

This blog has been deprecated. Go here for the new /usr/sbin/blog

Bouncing Python’s Generators With a Trampoline

In this post, I’m going to combine two of my favorite topics: recursion and Python’s generators. My aim is to address a real pain point in Python: its lack of tail call optimization. Let’s start with a simple countdown function that counts from start to 0 using recursion:

Recursive Countdown
def countdown(start):
    print start
    if start == 0:
        return 0
    else:
        return countdown(start - 1)

This function needs no explanation. It simply prints start and then calls itself with start - 1. The issue here is that for large values of start, we can easily blow Python’s stack:

Countdown Demo
% countdown(1000)
1000
999
[...]
3
2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 6, in countdown
[...]
RuntimeError: maximum recursion depth exceeded

Of course, one option is to just loop:

For Loop Countdown
print "\n".join([str(i) for i in xrange(1000,-1,-1)])

But I didn’t write this post just to tell you that recursive functions can be translated into loops (duh). Instead, I’m going to use a technique called trampolining to convert tail-recursive functions into tail-recursive generators, which won’t blow the stack.

Returning to the countdown example, let’s think about why the recursive call is a problem in the first place. The issue is that in return countdown(start - 1), the call to countdown must complete before its value can be returned, thus deepening the call stack. “Well, of course,” you say. “That’s how recursive functions work.” But wait a minute. Is that really necessary? What if, rather than countdown calling itself, it simply returned the next recursive call in the sequence for its caller to execute? Hey, I think we might be on to something:

A Tail-Recursive Generator
def countdown(start):
    print start
    if start == 0:
        yield 0
    else:
        yield countdown(start - 1)

So now countdown is a generator which calls itself. That’s kind of wacky. Let’s call it and see what happens.

Trying It Out
% countdown(5)
<generator object countdown at 0x100492550>

That’s interesting. We now have a generator. Think about that for a moment. What does this generator represent? Ah, that’s right. It’s the next recursive call in the sequence. It just hasn’t been executed yet. Let’s execute it:

Sweet
% g = countdown(5)
% g = g.next()
5

We’re getting somewhere.

Manual Trampoline
% g = countdown(5)
% g = g.next()
5
% g = g.next()
4
% g = g.next()
3
% g = g.next()
2
% g = g.next()
1
% g = g.next()
0
% g = g.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'int' object has no attribute 'next'

Well, that’s really neat. Each generator represents one recursive call, and the value it yields is the next recursive call in the sequence, except for the last call. What happened there? We reached the base case, and the last generator yielded 0, which is an int rather than a generator, hence the AttributeError.

We now have a general pattern. Let’s write a function which executes these tail-recursive generators automatically:

Trampoline Function
import types

def tramp(gen, *args, **kwargs):
    g = gen(*args, **kwargs)
    while isinstance(g, types.GeneratorType):
        g = g.next()
    return g

So, a generator function is passed in, which is then used to create the generator g. g.next() is repeatedly called, and the next recursive call is stored back to g. We keep doing this until g no longer returns a generator. If g is not a generator, then we’ve reached the base case, and we return it (if the base case is a generator, you’ll have to use something like exceptions to signal that you’ve reached the base case).
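To sketch that exception idea (this variant is my own, not part of the original tramp; it uses the builtin next() so it also runs on Python 3): the base case raises a sentinel exception carrying the final value, and the runner catches it.

```python
import types

class Done(Exception):
    '''Sentinel raised by a trampolined generator at its base case.'''
    def __init__(self, value):
        self.value = value

def tramp(gen, *args, **kwargs):
    g = gen(*args, **kwargs)
    try:
        while isinstance(g, types.GeneratorType):
            g = next(g)
    except Done as d:
        # The base case signaled completion explicitly.
        return d.value
    return g

def countdown(start):
    if start == 0:
        raise Done(0)  # Works even though this function is a generator.
    else:
        yield countdown(start - 1)
```

Now tramp(countdown, 10000) returns 0 without deepening the stack, and the value handed to Done is free to be a generator itself.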

Note: It’s about time that I explained where the word “trampoline” comes from. The idea is that each recursive call is “trampolined” off the trampoline runner (the tramp function in this case). That is, each recursive call is returned to tramp and is bounced off of it. I also like to think that each recursive call bounces higher (just like on a trampoline) because the “virtual” recursion depth increases, but it’s very possible that I made that up in my head.

Let’s try this with countdown, and a large start value:

Trampoline Test
% tramp(countdown, 1000)
1000
999
[...]
2
1
0
0   <--- Since we executed in the interactive interp,
         this is the return val of `tramp`.

Excellent! The original function blew the stack, but the trampolined generator does just fine. Furthermore, we’ve written a general trampoline runner to execute these trampolined generators. But, before we call it a day, we have to write a trampolined version of the Fibonacci function, otherwise, what’s even the point of recursion?

Trampolined Fibonacci
def fib(count, cur=0, next_=1):
    if count <= 1:
        yield cur
    else:
        # Notice that this is the tail-recursive version
        # of fib.
        yield fib(count-1, next_, cur + next_)

Let’s take this for a test drive:

The First 10 Fib Numbers
# Print the first 10 Fibonacci numbers.
for i in xrange(1,11):
    print tramp(fib, i)

Output:
0
1
1
2
3
5
8
13
21
34

Well, that’s terrific. Not only is the tail-recursive version just about as efficient as the for-loop version (except, of course, for the function-call and generator overhead), it also won’t blow the stack.

So, this is yet another example of the power of Python’s generators (Python’s most under appreciated feature, as David Beazley argues). What are the takeaways?

  1. This technique only works for tail-recursive functions. The recursive call must be the last thing the function does.
  2. The translation to a trampolined generator is easy! Just turn the return statements into yield expressions.
  3. Although this will protect your stack, creating a generator for each call is probably slow.
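As a quick illustration of takeaway 2, here’s a tail-recursive factorial (my own example, with the runner written against the builtin next() so it runs on Python 3 as well). The only change from the plain recursive version is return becoming yield:

```python
import types

def tramp(gen, *args, **kwargs):
    g = gen(*args, **kwargs)
    while isinstance(g, types.GeneratorType):
        g = next(g)
    return g

# The tail-recursive factorial: return fact(n - 1, acc * n)
# becomes yield fact(n - 1, acc * n).
def fact(n, acc=1):
    if n <= 1:
        yield acc
    else:
        yield fact(n - 1, acc * n)
```

tramp(fact, 5) evaluates to 120, and tramp(fact, 10000) computes a number with tens of thousands of digits where the return-based version would blow the stack.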

If this was fun for you, then definitely check out David Beazley’s presentation on coroutines and concurrency. He uses trampolining, generators, and coroutines to write an OS in Python, complete with a task scheduler, system calls, and blocking and non-blocking IO. This man is a wizard.

Eli Bendersky also has a couple great articles on using generators for lexical scanning and state machines. Those posts inspired me to write a command line argument parser using coroutines, which can process a stream of characters and handles all the usual suspects, like quotes and escape characters. As you can tell, I’ve been getting quite a bit of mileage out of these techniques.

Testing With the Poisson Process

The Poisson process is a mathematical abstraction of random events happening over a stretch of time. It’s often used to model all sorts of occurrences from particle emission due to radioactive decay, to natural disasters.1 The process itself has several mathematical properties such as not allowing for simultaneous events, but these specifics aren’t important for our purposes. What makes it useful to software engineers is that it also turns out to be a good model for events like hits to a webpage2 or requests to an API. For example, suppose you have written a new API for your web site, and you wanted to test it out under normal load. In order to simulate this with a Poisson process, all you’d need to know is the expected number of hits per hour (or minute or second). With this information, you could then generate random inter-arrival times (the times between events). That is, you could model the amount of time between requests to your API using this equation:3

X = -(1/α) ln(U)

This is the equation for an exponentially distributed random variable, where U is a random number between 0 and 1, and 1/α is the expected value of the inter-arrival times. In English, this means that if you expect 100 hits per hour, then 1/α would equal 1/100 (because you expect there to be a hit every 0.01 hours), and U would be some random number generated with a call to Math.random(). The number that pops out the other end would be a randomly generated inter-request time. If you wanted to simulate more than one request, you would evaluate this equation multiple times (using a new randomly generated U each time), and each result would be the time between two requests. So if you ended up with 0.04, 0.2, 0.1, then the second request would come in 0.04 hours after the first, and the third would come in 0.2 hours after the second, and so on.
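To make that worked example concrete, here’s a small numeric sketch (mine, assuming the 100-hits-per-hour figure from above):

```python
import math
import random

a = 100.0                       # expected hits per hour, so 1/a = 0.01 hours
u = random.random()             # U, a uniform random number in [0, 1)
gap = -(1.0 / a) * math.log(u)  # one randomly generated inter-request time

# Sanity check: averaging many such gaps should land near 1/a = 0.01 hours.
samples = [-(1.0 / a) * math.log(random.random()) for _ in range(100000)]
mean_gap = sum(samples) / len(samples)
```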

This technique can be automated using a simple Python script:

import math
import random

# Expected number of arrivals per unit time (e.g. 100 hits per hour).
a = 100.

# Make a function iterable, by repeatedly calling it.
def make_iterable(func, *args):
    try:
        while 1:
            yield func(*args)
    except:
        pass

uni_rand = make_iterable(random.uniform, 0, 1)

# A generator for inter-arrival times.
inter_arrival = (-(1./a)*math.log(u) for u in uni_rand)

This creates a generator inter_arrival which generates one inter-arrival time every time it’s iterated on. Using this to test an API is as simple as:

import random
import math
import sys
import time

# Expected number of arrivals per unit time.
a = float(sys.argv[1])
# Number of events
count = int(sys.argv[2])

# Function for testing API
def test_api():
    print "Testing API..."
    # TODO: Make request to the API

# Make a function iterable, by repeatedly calling it.
def make_iterable(func, *args):
    try:
        while 1:
            yield func(*args)
    except:
        pass

uni_rand = make_iterable(random.uniform, 0, 1)

# A generator for inter-arrival times.
inter_arrival = ( -(1./a)*math.log(u) for u in uni_rand)

# Generate inter-arrival times, then sleep for that long.
inter_arrival_iter = iter(inter_arrival)
for i in xrange(count):
    inter_arrival_seconds = inter_arrival_iter.next() * 3600.
    print "Sleeping for %f seconds." % inter_arrival_seconds
    time.sleep(inter_arrival_seconds)
    test_api()

This script takes in two arguments. The first is the expected number of events per hour, and the second is the number of events to generate. The last loop is where the testing occurs. It generates one inter-arrival time, sleeps for that long, and then calls the test_api function which would need to be filled in with the API test.

Running this file with input 3600 10 generates 10 events, each one spaced by about 1 second:

% python poisson.py 3600 10
Sleeping for 1.710953 seconds.
Testing API...
Sleeping for 5.028327 seconds.
Testing API...
Sleeping for 0.511307 seconds.
Testing API...
Sleeping for 2.336480 seconds.
Testing API...
Sleeping for 1.594504 seconds.
Testing API...
Sleeping for 0.308939 seconds.
Testing API...
Sleeping for 0.682607 seconds.
Testing API...
[snip]

References

  1. Tijms, Henk. Understanding Probability: Chance Rules in Everyday Life. 2nd ed. New York: Cambridge University Press, 2010. 118. Print.
  2. Arlitt, Martin F.; Williamson, Carey L. (1997). “Internet Web servers: Workload characterization and performance implications”. IEEE/ACM Transactions on Networking 5 (5): 631. DOI:10.1109/90.649565
  3. Ibid 1. Page 124

Did the Boulder PD Reduce Traffic Accidents?

There’s an article I came across a couple of years ago, which I filed away in the back of my brain, concerning the number of traffic accidents in Boulder. The police department increased enforcement at several intersections, and they were optimistic about the results. At all of the intersections, the number of accidents had dropped compared to the number of accidents in the year before.

The police department’s traffic unit decided early last year to step up enforcement around three intersections that consistently rank among the top accident locations in the city…That meant placing more officers in marked cars at the intersection…Compared with 2008, last year saw 14 fewer accidents at those intersections, an 11 percent decrease.

The box to the left of the article presents the data, which I’ve reproduced below:

Accident data
Intersection            2008    2009

28th and Arapahoe       52      46

30th and Arapahoe       39      33

Foothills and Arapahoe  35      33

Citywide                3,242   3,405

Source: Boulder Police Department

As you may have guessed, I was quite skeptical of this article. Just because a random variable varies from experiment to experiment, doesn’t mean the variation is anything but coincidence. I just flipped a coin 4 times. The first two times it was heads, and the second two, it was first heads and then tails. That, of course, doesn’t mean the coin suddenly became less lucky. So, the question I want to answer in this post is: Is the Boulder PD justified in thinking that they’ve decreased the number of traffic accidents at these three intersections?

The way I’m going to answer this question is to first model these accidents with a probability distribution, and then use this to find the probability that the decrease in accidents could have been due to random variation. The distribution I’m going to use is a really interesting one called the Poisson distribution. This distribution is great for modeling events that happen very rarely, such as shark attacks and traffic accidents. Many people drive every day, but very few (relatively speaking) are involved in an accident. Further, this distribution has one parameter, λ, and both its variance and expected value are equal to it.

The issue, of course, is that we don’t know the expected value or variance of traffic accidents at these intersections. We can, though, make a charitable assumption. In the interests of some fun mathematics, let’s assume that 2008 was an incredibly average year, and that the number of accidents that occurred that year is exactly the expected value. Since the variance is equal to the expected value, we can, for example, say that the variance is 52 for 28th and Arapahoe.

Now that we know the variance and expected value, we can calculate the z-score of the decrease in accidents. This will tell us the probability that the random variable (in this case, the number of accidents at 28th and Arapahoe) could, by mere chance, decrease by 6 to 46. But wait a minute. z-scores only pertain to normal distributions, and we’re using the Poisson distribution. This is ok, though. Since λ is large (λ > 25) it can be approximated with the normal distribution.1 So, let’s calculate the z-score:

z = (46 - 52) / √52 ≈ -0.832

We now know that a decrease of 6 accidents from 52 falls at exactly -0.832 standard deviations from the mean. From this, we can tell that the chance that the number of accidents in a given year would vary between 46 and 58 (±6 from the mean of 52) is 60%, and the chance that the number of accidents would be something less than or equal to 46 is 20%. So, it seems quite likely that this variation was just random coincidence. The Boulder PD is probably being a bit too optimistic in thinking that it’s their additional enforcement that decreased the number of traffic accidents at 28th and Arapahoe. Now, what about the other intersections? I’ve repeated this calculation for the other three statistics below:
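The arithmetic above is easy to double-check in a few lines (my own verification sketch, using math.erf for the standard normal CDF):

```python
import math

def z_score(expected, observed):
    # For a Poisson model the variance equals the mean,
    # so the standard deviation is sqrt(expected).
    return (observed - expected) / math.sqrt(expected)

def normal_cdf(z):
    # P(Z <= z) for a standard normal random variable.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

z = z_score(52, 46)   # about -0.832
p = normal_cdf(z)     # about 0.202, the left-tailed p-value
```

The other rows of the table follow from the same two functions.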

Accident data
Intersection            2008    2009    z-value     P(z < z-value) (left tailed p-value)

28th and Arapahoe       52      46      -0.832      0.202

30th and Arapahoe       39      33      -0.971      0.1683

Foothills and Arapahoe  35      33      -0.334      0.369

Citywide                3,242   3,405    2.862      0.0021 (right tailed)

The decrease at 30th and Arapahoe is slightly more significant, but still has a 17% chance of being due to randomness. Foothills and Arapahoe fares the worst of the three, with a whopping 37% chance of being random. Ironically, it is the citywide increase in accidents that is the most significant result. There’s only a 0.2% chance that the number of accidents would randomly increase by at least 163! This is statistically significant, and it’s plausible that this variation isn’t due to random chance. Perhaps the Boulder PD spent too much time at these intersections at a cost to all of the other intersections in the city!

In any case, this analysis is, of course, far from perfect. The assumption I made concerning the expected value may very well have been wrong. Perhaps both 2008 and 2009 were anomalous years, and the expected value is actually much different. I will also grant that it is interesting that the citywide total went up, while these three intersections all went down, but I’m also not sure how much cherry picking they did. Really, the point of this post was to showcase some interesting mathematics, and show how far a little algebra can go. With only a little more data, I think this sort of analysis would have some interesting results. Hopefully someone at Boulder PD is listening. I wonder how much more money they threw at this enforcement campaign without really having any good data on its efficacy.

Notes

  1. Tijms, Henk. Understanding Probability: Chance Rules in Everyday Life. 2nd ed. New York: Cambridge University Press, 2010. 169. Print.

Permutations: An Interview Question

Here I go over a coding question I was recently asked during an interview.

You can find the source here: https://gist.github.com/2604992

Notes

  1. In some languages, prepending to a list is faster than appending. In Python it doesn’t matter.
  2. The weird characters in the Vim session are whitespace characters. $ is a newline character.

Look, Ma! No Loops: Reversing Word Order

Here’s a classic interview question:

Write a function that takes a string and reverses its word order. So, “Reverse this string” results in “string this Reverse”.

Notice that the task isn’t to reverse the string, but rather to reverse the order of the string’s words. So the first word becomes the last. The second word becomes the second to last, and so on.

Whenever I come across puzzles like this, I shudder when I start to think about the nested loops and pointer arithmetic. Unfortunately, that describes the classic solution pretty well. The common wisdom is that first you should reverse the entire string. Then you should go back and reverse each word. So:

  1. “Reverse this”
  2. “siht esreveR”
  3. “this Reverse”

The first step isn’t bad, but reversing each word will require at least one nested loop, and some careful thinking about how to deal with the spaces. Instead, let’s think through a functional solution that doesn’t involve any loops. It will necessarily be recursive, so the first step will be to come up with each of the cases we’ll encounter as we walk the string.

  • Recursive Case 1: We’re in the midst of a word. Add the current letter to the end of some word accumulator.
  • Recursive Case 2: We’re at the end of a word (a space). Add the current word accumulator to the beginning of some string accumulator, then clear the word accumulator.
  • Base Case: The string is empty. Return the word accumulator prepended to the string accumulator.

So, there are two accumulators. One grabs the letters of each word as we walk the string and keeps the letters in order (the word accumulator). Then, once an entire word has been read into the word accumulator, it’s prepended to the string accumulator. Prepending reverses the word order. Once it’s prepended, the word accumulator is cleared in preparation for the next word. Let’s translate this case by case.

The base case is the easiest. Let’s start there:

Base Case
if len(s) == 0:
    return word_acc + str_acc

Start by testing if the input string, s, is empty. If it is, then we’ve reached the end of the string, so we return the contents of the word_acc, prepended to the str_acc. word_acc contains the last word in the string, and we’re prepending it to the rest of the string that has already been reversed.

Now let’s deal with the second recursive case:

2nd Recursive Case
elif s[0] == " ":
    return helper(s[1:], "", " " + word_acc + str_acc)

If we’ve gotten this far, the input string must have some length, so we check if the first character is a space. If it is, we’ve reached the end of a word, so we need to prepend str_acc with the word that’s just been read in (word_acc). We then clear out the word_acc in preparation for the next word, and recursively call the function with the rest of the input string, s. The recursive call is to helper because this will be nested inside a convenience wrapper that only takes one argument.

Finally, the last case:

1st Recursive Case
else:
    return helper(s[1:], word_acc + s[0], str_acc)

If we’ve gotten this far, we’re in the midst of a word (the string isn’t empty, and the current character isn’t a space). In this case, we add the current character onto the word_acc and then recurse with the rest of the string. Remember, the word_acc is building up each word character by character as we walk the input string.

Finally, we can simply throw these cases into a function, and it should just work. Ahh, the magic of recursion.

Reversing Word Order
def rev_words(s):
    '''Returns the string `s` with its words in the reverse order.
       Example: rev_words("Reverse this!") #"this! Reverse"
    '''
    def helper(s, word_acc, str_acc):
        # Base case: Return word accumulator plus whatever words
        # you have in the string acc
        if len(s) == 0:
            return word_acc + str_acc
        # This is the end of a word. Clear `word_acc`, and start with
        # the next word.
        elif s[0] == " ":
            return helper(s[1:], "", " " + word_acc + str_acc)
        # In the middle of a word. Add the letter to `word_acc`, and continue
        else:
            return helper(s[1:], word_acc + s[0], str_acc)
    return helper(s, "", "")

As much as I like this solution, it’s not without its drawbacks. The most significant is that Python’s string methods are very slow. Each concatenation causes both strings to be copied, and at least one concat is happening for each character in the string. Possible solutions include turning the string into a list of characters (list methods are much faster), or using a language that better supports the functional paradigm.1 But really, if speed is your highest priority, the imperative solution of swapping characters in place is your best bet, warts and all.
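For comparison, here’s a sketch of the “turn it into a list” route using the builtin split and join (my shortcut, rather than a character list). Splitting on a single space keeps empty strings for repeated spaces, so runs of spaces survive the round trip just as they do in the recursive version:

```python
def rev_words_split(s):
    # Split into words on single spaces, reverse the word list,
    # and stitch the words back together.
    return " ".join(reversed(s.split(" ")))
```

rev_words_split("Reverse this string") gives "string this Reverse".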

If you’d like to play with this code yourself, I’ve posted a gist here, along with some tests.

Notes

  1. If Python had a constant time prepend function (‘cons’ in the functional world), I suspect a better solution would be possible.

Building Data Structures From Functions

Here’s a puzzle I’ve adapted from exercise 2.4 of SICP.

Suppose you are programming in a language that only supports function application. That is, defining functions and applying arguments to these functions are the only things this language supports. Using only these building blocks, could you construct a linked list?

Surprisingly, the answer is yes, and the exercise linked to above provides a partial solution. Below, I’ve translated that solution into Python, and completed the exercise:

Solution in Python
# Create a pair from l and r
def cons(l, r):
    return lambda get: get(l, r)
# Given a pair, return the head (left) value.
def head(pair):
    return pair(lambda l, r: l)
# Give a pair, return the tail (right) value.
def tail(pair):
    return pair(lambda l, r: r)

First, let’s examine how these functions can be used, and then I’ll explain how they work. Consider the snippet below:

Usage Example
l1 = cons(1, 2)
print head(l1)       # Prints 1
print tail(l1)       # Prints 2
l2 = cons(0, l1)
print head(l2)       # Prints 0
print head(tail(l2)) # Prints 1
print tail(tail(l2)) # Prints 2

As can be seen, cons() is the constructor for this pair type. Give it two values, and it will create a pair of those two values. head() and tail() are the basic operators that let us access values inside these pairs; they return the left and right element of the pair, respectively. Also notice that we can create pairs of pairs. The last half of the example creates a pair composed of 0 and (1,2). Why is this significant? Well, we’ve just made a linked list! Linked lists are simply pairs of pairs. The list [1,2,3,4] can, for example, be represented as cons(1,cons(2,cons(3,cons(4,None)))). What’s None doing at the end of the list? You can think of it like the NULL pointer at the end of a linked list in C. If a function were traversing the list, None would signify to the function that it has reached the end. Mathematically, you can think of a linked list as an inductively defined data structure, where None is the base case. None is referred to as the empty list.
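To make the None terminator concrete, here’s a small traversal helper (my own addition, not from SICP) that walks a cons-list and collects its values, stopping at the empty list:

```python
# Create a pair from l and r
def cons(l, r):
    return lambda get: get(l, r)

# Given a pair, return the head (left) value.
def head(pair):
    return pair(lambda l, r: l)

# Given a pair, return the tail (right) value.
def tail(pair):
    return pair(lambda l, r: r)

def to_pylist(lst):
    '''Collect the values of a cons-list into a Python list.'''
    out = []
    while lst is not None:   # None marks the empty list.
        out.append(head(lst))
        lst = tail(lst)
    return out
```

to_pylist(cons(1, cons(2, cons(3, cons(4, None))))) gives [1, 2, 3, 4].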

Now for the interesting part. How do these functions work? First let’s look at the cons() function:

cons()
# Create a pair from l and r
def cons(l, r):
    return lambda get: get(l, r)

This takes in two parameters (l and r) and returns a function.1 This returned function takes yet another function, which it applies to l and r and returns the result. So, if we call cons(1,2), we are returned a function, which takes a function and applies that function to the arguments 1 and 2. If, for example, I called cons(1,2)(lambda x, y: x+y) I’d get 3, because the addition function would be applied to 1 and 2.

Now suppose that we didn’t want to add l and r. Instead, we wanted to pull either l or r out of a pair that was already constructed. In other words, given list = cons(1,2), how could we pull the 1 or 2 out of the function stored in list? Well, all we need to do is pass it a function that returns only the left or right parameter. So, cons(1,2)(lambda l, r: l) would give us back 1, and cons(1,2)(lambda l,r: r) would give us back 2. This is almost exactly what the head() and tail() functions are doing! They take in a function (presumably produced by the cons() function), and apply either lambda l,r: l or lambda l,r: r. Just to cement this all together, below I step through the evaluation for the example head(cons(1,2)).

Evaluation Steps for head(cons(1,2))
head(cons(1,2))
-> head(lambda get: get(1, 2))
-> (lambda get: get(1, 2))(lambda l,r: l)
-> (lambda l,r: l)(1,2)
-> 1

And here is a bit of code that uses these new functions to add the elements of a list:

Sum Function
def sum_list(l):
    return head(l) if tail(l) is None else head(l) + sum_list(tail(l))
sum_list(cons(1,cons(2,cons(3,None)))) # Returns 6

In the end, this might strike you as nothing more than a useless programming trick. In a sense that’s right. I’d never use this in my own code. What makes this technique so valuable is that it actually fits into the broader context of lambda calculus, which is a mathematical abstraction of computation. In the language of lambda calculus, you’re given only a very basic set of tools. Essentially all you have are simple one parameter functions (it doesn’t even have integers!). Incredibly, it turns out that that’s all you need for a Turing complete language. It’s possible to build integers, conditionals, loops, arithmetic operations, and (as you’ve seen here) data structures out of these simple building blocks. As a next step, and to see this for yourself, it might be fun to take a minute and see how the code presented here could be modified to give you binary trees, or even graphs. After that, you could check out this incredible article on writing the infamous FizzBuzz exercise in Ruby using only functions.
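If you want a head start on the binary tree variation, here’s one possible encoding (the representation is my own choice, not from the exercise) built from nothing but the same cons pairs: a node is a pair of a value and a pair of children.

```python
def cons(l, r):
    return lambda get: get(l, r)

def head(pair):
    return pair(lambda l, r: l)

def tail(pair):
    return pair(lambda l, r: r)

# Encode a tree node as (value, (left, right)); None marks a missing child.
def node(value, left=None, right=None):
    return cons(value, cons(left, right))

def value(n):
    return head(n)

def left(n):
    return head(tail(n))

def right(n):
    return tail(tail(n))

t = node(2, node(1), node(3))
# value(t) is 2; value(left(t)) is 1; value(right(t)) is 3.
```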

Notes

  1. Technically, this is really a closure, which is a function with some of its variables already set. In the terminology of the PL world, it’s a function bundled with its environment (the set of variables in its outer scope).

Super Bowl Probabilities: The Coin Toss

Browsing through my Twitter stream, I came across a blog post discussing an allegedly 3.8-sigma event: Apparently the last 14 Super Bowl coin tosses have been won by the NFC. What are the chances of this? The blog linked to above claims a probability of (1/2)^13. Another blog claims (1/2)^14. Which is correct? Before I get into the mathematics, I’ll try to disentangle three different questions:

  1. Out of 14 tosses, what’s the probability that they all come up heads?
  2. Out of 14 tosses, what’s the probability that one team wins all of them?
  3. Out of 45 tosses (one for each Super Bowl), what’s the probability that one team wins a string of 14 of them?1

(1/2)^14 is the correct answer to question (1). There’s a 1 in 2 chance of getting heads. The probability of getting heads 14 times out of 14 tosses is therefore (1/2)^14.

(1/2)^13 is the correct answer to question (2). If you call a coin in the air, there’s a 1 in 2 chance you’ll win the toss. Out of 14 tosses, the chance that either the NFC will win the toss 14 times or the AFC will win the toss 14 times is: (1/2)^14 + (1/2)^14 = (1/2)^13.

The last situation is much more difficult to calculate, and similar questions are often the cause for much surprise. For example, if you toss a coin 20 times, do you think it’s likely or unlikely that you’ll get a string of 5 heads in a row? It seems like this should be unlikely. After all, the probability of tossing a coin 5 times, and ending up with heads every time is quite small: (1/2)^5 = 1/32 (approx. 3%). Believe it or not, the actual probability is around 25%.2

How are these probabilities found? One solution is a rather nasty recursive formula.

A similar recursion can be given to calculate the probability that in n fair coin tosses a run of r heads or r tails occurs. In this case, we say that the tossing process is in state (i, k) when there are k tosses still to go and the last i tosses all showed the same outcome but so far no run of r heads or r tails has occurred. The probability v_k(i) is defined as

v_k(i) = the probability of getting a run of r heads or r tails during n tosses when the current state of the tossing process is (i, k).

The probability v_{n-1}(1) is being sought (why?). Verify for yourself that the following recursion applies for k = 1,2,…,n.

v_k(i) = 0.5*v_{k-1}(i + 1) + 0.5*v_{k-1}(1) for i = 1,…,r - 1.

The boundary conditions are v_0(i) = 0 for 1 <= i <= r - 1 and v_j(r) = 1 for 0 <= j <= n - 1.3

I’m actually not sure why we’re looking for v_{n-1}(1) as opposed to v_n(0). I tested it for a few values, and those expressions seem to be equal.4 In any case, the formula is dense, but you can see the logic behind it. If you’re in the midst of a streak of heads, and your next flip comes up heads, your state is now (i+1, k-1). That is, your streak has increased, but your number of tosses to go has decreased. This explains the first half of the formula: 0.5*v_{k-1}(i + 1). The other case is that you’re in the midst of a streak of heads, but then you get a tail, so your state becomes (1, k-1). That is, you now have a streak of 1 tail, and your total number of tosses to go decrements by one. This explains the last half of the equation. The equation then branches out like a tree, solving for each possibility along the way. Here’s a bit of Scheme that implements this:

An Exact Solution in Scheme
(define (coin-streak n r)
  (define (cs-helper k i)
    (cond ((and (<= 1 i) (<= i (- r 1)) (= k 0)) 0)
          ((and (<= 0 k) (<= k (- n 1)) (= i r)) 1)
          (else (+
                 (* (/ 1 2) (cs-helper (- k 1) (+ i 1)))
                 (* (/ 1 2) (cs-helper (- k 1) 1))))))
  (cs-helper (- n 1) 1))

Notice that this problem quickly becomes intractable for large values of n. Each call branches into two different calls, giving an exponential growth (props to the first person who posts a tail recursive memoization). This is why I prefer solving this by simulation. What we need to do is simulate 45 coin flips, and check if we’ve encountered a string of 14 heads or 14 tails. Do this, say, 100,000 times and see how often this event occurs.
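Taking my own bait on that memoization challenge, here’s a sketch in Python: caching each state (k, i) in a dict means every subproblem is solved once, collapsing the exponential tree into at most n·r entries.

```python
def streak_prob(n, r):
    '''Probability of a run of r heads or r tails in n fair tosses,
       computed from the recursion above with memoization.'''
    memo = {}
    def v(k, i):
        if i == r:   # A run of length r has occurred.
            return 1.0
        if k == 0:   # No tosses left and no run seen.
            return 0.0
        if (k, i) not in memo:
            memo[(k, i)] = 0.5 * v(k - 1, i + 1) + 0.5 * v(k - 1, 1)
        return memo[(k, i)]
    return v(n - 1, 1)
```

streak_prob(45, 14) lands right around the 0.002 that the simulation produces.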

ctsim.py: An Approximation by Simulation in Python
#!/usr/bin/python -O

import random
import sys

STREAK_LEN  = int(sys.argv[1])
TOSSES      = int(sys.argv[2])
TRIALS      = int(sys.argv[3])

def simulation(trials, tosses, streak):
    ''' Simulates *trials* trials of *tosses* tosses and returns the
        fraction of trials that contained at least one streak
        of *streak* or higher.
    '''

    def run_trial():
        ''' Flips a coin *tosses* times and returns True if a streak of
            *streak* length is encountered.
        '''
        cur_streak = 1      # Length of the current streak.
    prev_outcome = None # Outcome of the previous toss.
        # Simulate the tosses.
        for toss in range(tosses):
            outcome = random.randint(0,1)
            # If the last toss is the same as the current toss,
            # increment the length of the current streak.
            if outcome == prev_outcome:
                cur_streak += 1
            else:
                cur_streak = 1

            # If the current streak is equal to the target
            # length, return True
            if cur_streak == streak:
                return True

            prev_outcome = outcome

        return False

    streak_count = 0
    # Simulate all the trials, and keep track of how many had
    # a streak.
    for trial in range(trials):
        had_a_streak = run_trial()
        if had_a_streak:
            streak_count += 1

    return streak_count / float(trials)

if __name__ == "__main__":
    print simulation(TRIALS, TOSSES, STREAK_LEN)

We can now simulate the Super Bowl situation by looking for a streak of 14 wins out of 45 tosses with 100,000 simulations.

ctsim.py Usage
% ./ctsim.py 14 45 100000
0.00202

So, the probability of this happening is 0.00202 or around 0.2%. This is still tiny, but not as small as (1/2)^13 (approx. 0.00012 or 0.012%).

Notes:

  1. Even this is slightly ambiguous. The exact question that we’ll be looking at is: What is the probability that one team will have at least one winning streak of at least 14 tosses?
  2. Tijms, H. (2010). Understanding probability: Chance rules in everyday life. (2 ed.). Cambridge, UK: Cambridge University Press.
  3. Ibid.
  4. Perhaps the equation vn(0) is technically undefined. That is, i cannot equal 0 because you must have a streak of at least 1 head or 1 tail. Nevertheless, it looks like the formula still works for this technically undefined state.

Implementing a Logic Evaluator Without If-expressions or Boolean Operators

I’ve slowly been making my way through An Overview of the Scala Programming Language [PDF], and was struck by this interesting implementation of Booleans:

abstract class Bool {
  def && (x: => Bool): Bool
  def || (x: => Bool): Bool
}
object False extends Bool {
  def && (x: => Bool): Bool = this
  def || (x: => Bool): Bool = x
}
object True extends Bool {
  def && (x: => Bool): Bool = x
  def || (x: => Bool): Bool = this
}

If you’re not a Scala programmer, don’t let the syntax trip you up. It’s pretty straightforward. True and False are singleton objects that inherit from the abstract class Bool. Each object implements the && (and) and || (or) operators. As you’d expect, they each take a Bool as an argument and return a Bool. When you execute:

True && False

It applies True’s && method to False, and correctly returns x, which is, in this case, False.

What’s so interesting about this is the absence of any if-expressions or built-in boolean operators. To illustrate, a less clever, but more obvious implementation would be as follows:

def and(x: Boolean, y: Boolean): Boolean = {
  if(x)
      if(y)
          true
      else
          false
  else
      false
}

So, if x and y are both true, return true, otherwise return false (Scala returns the value of the last executed expression). It works, but doesn’t win any style points. The nested if-expressions are especially ugly.1

So, this is a neat way of cutting down on if-expressions. Can it be extended to the other operations? Yes:

object True extends Bool {
  def && (x: => Bool): Bool = x
  def || (x: => Bool): Bool = this
  def unary_! : Bool = False
  def xor (x: => Bool): Bool = !x
  def if_ (x: => Bool): Bool = this
  def iff(x: => Bool): Bool = x
}

object False extends Bool {
  def && (x: => Bool): Bool = this
  def || (x: => Bool): Bool = x
  def unary_! : Bool = True
  def xor (x: => Bool): Bool = x
  def if_ (x: => Bool): Bool = !this
  def iff(x: => Bool): Bool = !x
}

One thing I like about this implementation is the pleasing symmetry between True’s methods and False’s methods. The unary_! (not) method is trivially opposite. True returns False and vice versa, but the symmetry continues throughout xor, if_, and iff. For example, False’s xor returns its parameter, while True’s xor returns the opposite of its parameter. False xor True returns True, whereas True xor True returns False. I’ll leave verifying the rest of these rules as an exercise to the reader.2

It’s also interesting to compare this to the State Pattern, where a state machine is simulated through the use of objects that implement a common interface. Each state is an object, and transitioning between states is as easy as switching between the different objects. Without this pattern, the alternative would be a complex set of if-expressions dictating what you’re allowed to do in a certain state, and which states you can switch to given your current state. The State Pattern essentially hides all this behind polymorphism, similar to what’s happening in the logic evaluator.
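For a feel of the same trick in this blog's usual language, here's a rough Python sketch (the names T, F, and the method names are my own; note that Python has no equivalent of Scala's by-name => Bool parameters, so arguments are evaluated eagerly rather than lazily):

```python
class _True(object):
    def and_(self, x): return x     # True && x  is x
    def or_(self, x):  return self  # True || x  is True
    def not_(self):    return F     # !True      is False

class _False(object):
    def and_(self, x): return self  # False && x is False
    def or_(self, x):  return x     # False || x is x
    def not_(self):    return T     # !False     is True

# Singleton instances standing in for Scala's True and False objects.
T, F = _True(), _False()
```

As in the Scala version, the truth tables live entirely in which object you're talking to, so no if-statements or built-in boolean operators are needed.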

Notes

  1. You can cut it down to one if-expression if you’re on your toes.
  2. Note that if_ is backwards. True if_ False is the same as False -> True, because that’s closer to the English.

Picking Random Items: Take Two (Hacking Python’s Generators)

Earlier today I had my mind blown by David Beazley’s presentation on the power of Python’s generators, and it inspired me to write this highly Pythonic version of the random word selector, made almost entirely of generators:

1
2
3
4
5
6
7
8
9
import heapq
import random

lines       = (line for line in open("/usr/share/dict/words"))
word_pairs  = ((random.random(), word) for word in lines)
rand_pairs  = heapq.nlargest(4, word_pairs)

rand_words  = [word for rand, word in rand_pairs]
print rand_words

How does this work? First recall that a generator is an object that returns the next item in a sequence every time its next() method is called. There’s an example of this on line 4 where a generator named lines is created, which returns the next line of the file every time lines.next() is called. What’s so handy about a generator is that it can be automatically consumed by a for loop. That is, put a generator in a for loop, and the loop will automatically call the generator’s next() method on every iteration. There’s an example of this on line 5 where another generator is created that uses a for loop to consume the lines generator. This outputs a tuple containing a random number and the line returned by lines.next(). So, the result is that each time word_pairs.next() is called, you get the next line of the file paired with a random value (e.g., (0.12345, 'fire\n')). Finally, we use heapq.nlargest(n, iter) to grab the n largest elements from iter. In this case, it repeatedly calls word_pairs.next() and outputs a list of the 4 words with the highest random values.1 These are our 4 random words. This is all done in around 3 lines (excluding imports and printing). Wowza.
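To try the pipeline without a dictionary file handy, here's a self-contained variant that substitutes a small in-memory word list for the file (the sample words are mine, and it's written for Python 3's print function):

```python
import heapq
import random

# Stand-in for the lines of /usr/share/dict/words.
sample = ["apple\n", "banana\n", "cherry\n", "durian\n", "elderberry\n"]

lines      = (line for line in sample)
word_pairs = ((random.random(), word) for word in lines)
rand_pairs = heapq.nlargest(2, word_pairs)

rand_words = [word.strip() for rand, word in rand_pairs]
print(rand_words)
```

Each run prints a different pair of words, since the ranking comes from the freshly drawn random values.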

As Beazley points out, one advantage of this technique is that it resembles the way simple commands are chained together in the shell to create a pipeline. And, just like in the shell, the pipeline is highly modular, so different filters and stages can be easily inserted at different points. Below, I’ve added two stages to the pipeline that strip the words of their newline characters, and skip words that aren’t exactly 13 characters long:

import heapq
import random

def isValid(word):
    return len(word) == 13

lines       = (line for line in open("/usr/share/dict/words"))
words       = (line.strip() for line in lines)
valid_words = (word for word in words if isValid(word))
word_pairs  = ((random.random(), word) for word in valid_words)
rand_pairs  = heapq.nlargest(4, word_pairs)

rand_words  = [word for rand, word in rand_pairs]
print rand_words

The words generator calls strip() on each line which removes the newline character. The valid_words generator only returns words that pass the isValid test. In this case, isValid returns True only if the word is exactly 13 characters long. The end result is 4 random words that are 13 characters long.

One other advantage is that each generator creates its output only when requested. This translates into minimal memory use. The dictionary file being consumed might be gigabytes in size, but only one word will be loaded into memory at a time (excluding buffering done by the file class, etc). It’s definitely a neat way of parsing large files.

If you enjoyed this, definitely check out Beazley’s presentation, and venture further down the rabbit hole.

Notes

  1. You could even use the built-in function max() if you only need one word.

Picking Random Items From a File

Here’s a deceptively simple programming puzzle: Develop an algorithm for randomly selecting n words from a dictionary file. This is essentially the puzzle I had to solve in order to write my xkcd-style password generator, which implements the xkcd password spec.1

The simplest solution is to parse the dictionary into individual words (easy to do in Python) and put those words into a list. Selecting four random words is then as easy as selecting four random items from the list. This is fast, easy to implement, and simple to understand, but it is also very memory inefficient. I have to load 50,000+ words into memory in order to select four of them.2 Can we do better? Yes.
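For reference, that memory-hungry baseline is only a couple of lines, since random.sample does all the selecting (a sketch for Python 3; the helper name is mine, and it takes any iterable of lines so it works on an open file):

```python
import random

def naive_pick(lines, n):
    ''' Select n distinct random words, loading everything into memory. '''
    words = [line.strip() for line in lines]  # the whole dictionary at once
    return random.sample(words, n)

# e.g. naive_pick(open("/usr/share/dict/words"), 4)
```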

A Memory Efficient Algorithm

The key insight into developing a better algorithm is realizing that it should be possible to select the words as the dictionary file is being parsed, rather than loading the entire thing into memory. The difficulty is making sure that each word has an equal chance of being chosen, and that at least n words are chosen. If, for example, we simply give each word a 1 in 10 chance of being chosen, we’ll end up with way more words than we need (assuming n is small). If we give each a 1 in 50,000 chance, there’s the possibility that we won’t choose enough words. Bryce Boe has a clever solution to this problem where he chooses exactly n words, but the proof that it works is non-trivial, and he doesn’t provide it. This is why I came up with my algorithm.

In order to explain my algorithm, it’s best to think of it in terms of rolling dice. Consider the following procedure for randomly selecting 4 dice from 10.

  1. Roll all 10 dice.
  2. Select the 4 with the highest values.
    1. If, suppose, 5 of the dice all end up with a value of 6, randomly choose 4 from those 5 (perhaps by repeating the procedure with those 5).
    2. If, suppose 2 dice get a value of 6, and 3 get a value of 5, select the 2 with the value of 6, and then randomly select 2 of the 3 with a value of 5.

How can we adapt this procedure to select random words from a file, rather than dice? Here’s how: as we’re parsing the dictionary file, we give each word a random value, and then select the n words with the highest values. The issue is, the naive implementation of this procedure doesn’t really solve our memory problem. If every word gets a random value, don’t we now have to store every word in memory, along with its value? The key here is to observe that only the words with the n highest values need to be kept in memory, and all the others can be immediately discarded. Think about this in terms of the dice example. I want to select 1 die from 10:

  1. I roll the first die. I get a value of 1. I keep this die.
  2. I roll the second. I get a value of 3. I keep this die, and discard the other.
  3. I roll the third. I get a value of 3. I keep both dice.
  4. Fourth: I get a value of 6. I keep this die and discard the other 2.

By the end of the procedure, I might end up with 3 dice that all got a value of 6. I would then randomly select 1 from those 3.

How can we adapt this procedure for selecting random words? We use a priority queue:

  1. Read a word from the dictionary.
  2. Give it a random value.
  3. Insert the value-word pair (as a tuple) into the priority queue.
  4. If the queue has more than n items, pop an item.
  5. Repeat until every word has been read.

Remember that popping from a priority queue removes the item with the lowest value. So, we insert a word, and if we have too many words, we pop the one with the lowest value. At the end of this procedure there will be exactly n words in the queue. These are our n random words. Neat.

There is one issue, though. What if two words have the same random value? Well, one solution is to keep both words, and then break the tie at the end like we did in the dice example, but that breaks the elegance of the priority queue implementation. Another is to break ties randomly as soon as they occur, and discard the losing word, but I’m not sure how to do this in a statistically safe way. The easiest solution is to just pray that collisions don’t occur. In Python, each call to random() produces 53-bits of precision, so it’s very unlikely that two values will collide. If 53-bits isn’t enough (yeah right), you can use multiple random numbers. So, rather than a tuple of (value, word), you can use (value_1, value_2, value_3, word).3 Python’s priority queue implementation will automatically know how to sort that.
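Using multiple random values per word is a one-line change to the priority-queue procedure; here's a sketch (the helper name is mine):

```python
import heapq
import random

def random_words(lines, n):
    ''' Select n random words using an n-item min-heap. '''
    wordq = []
    for line in lines:
        word = line.strip()
        # Two random values per word: a tie now requires both
        # values to collide at once, which is vanishingly unlikely.
        heapq.heappush(wordq, (random.random(), random.random(), word))
        if len(wordq) > n:
            heapq.heappop(wordq)  # discard the lowest-valued word
    return [entry[-1] for entry in wordq]
```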

Without further ado, here’s the proof of concept:

#!/usr/bin/python -O

import random
import heapq

DICT_PATH = "/usr/share/dict/words"
WORD_COUNT = 4

dict_file = open(DICT_PATH)
wordq = []
for line in dict_file:
    word = line.strip()
    rand_val = random.random()
    heapq.heappush(wordq, (rand_val, word) )
    if len(wordq) > WORD_COUNT:
        heapq.heappop(wordq)

print wordq

Endnotes

  1. Summary: a good password is composed of four random words.
  2. A slight improvement on this would be to store only the word’s position in the file, rather than the word itself. Then the word could be retrieved by seeking to that position. http://www.bryceboe.com/2009/03/23/random-lines-from-a-file
  3. If you’re only using one random value, and your dictionary file has 50,000 words, the chance of a collision is 50,000/2^53, which is roughly 3 in 562 trillion. I’ll take those odds. Whoops! This is actually a version of the birthday problem. The actual probability of a collision is: 1.39e-7. Still quite good.