To Understand Recursion You Need To Understand Recursion…

After seeing this problem, and being a developer gradually falling in love with Python (or falling in love-hate with it, to be precise), it occurred to me: “well, why not try to see what Python can do?”

Of course, I didn’t want to use the straightforward recursive solution, since, as my colleague Ahmed Soliman rightly concluded, recursion in Python is slow. (An interesting counter-argument can be found here as well.)

To cut things short, I wrote a C++ and a Python implementation of the Fibonacci algorithm with memoization, and I made some interesting observations.

Here is the C++ implementation:

#include <iostream>
#include <ctime>
#include <cstdlib>
#include <map>
using namespace std;

unsigned long long fib(unsigned int n) {
	static map<unsigned int, unsigned long long> memo;
	if (memo.count(n)) {
		return memo[n];
	} else if (n <= 1) {
		memo[0] = 1;
		memo[1] = 1;
		return memo[n];
	} else {
		unsigned long long x = fib(n-1) + fib(n-2);
		memo[n] = x;
		return x;
	}
}

int main(int argc, char **argv) {
	unsigned int n = 0;
	unsigned long long x = 0;
	time_t t1, t2;
	if (argc != 2) {
		cout << "Usage: fib <n>, where <n> is the input in f(n) = f(n-1) + f(n-2)" << endl;
		return 1;
	}

	n = atoi(argv[1]);

	t1 = time(NULL);
	x = fib(n);
	t2 = time(NULL);

	cout << "Time taken: " << t2 - t1 << endl;
	cout << "fib(n)=" << x << endl;
	return 0;
}
Here are the inputs and outputs; note that some values may be incorrect due to overflow, but what matters here is the function-call mechanism:

mohd@mohd-laptop:~$ ./fib_memo 40
Time taken: 0
mohd@mohd-laptop:~$ ./fib_memo 100
Time taken: 0
mohd@mohd-laptop:~$ ./fib_memo 1000
Time taken: 0
mohd@mohd-laptop:~$ ./fib_memo 900
Time taken: 0

Now here is the Python implementation:

import sys
import time

memo = dict() # I don't like this, can't Python just have static variables?

def fib(n):
	if memo.has_key(n):
		return memo[n]
	elif n == 0 or n == 1:
		memo[n] = 1
	else:
		memo[n] = fib(n-1) + fib(n-2)
	return memo[n]

if __name__ == "__main__":
	if len(sys.argv) != 2:
		print "Usage: fib <n>, where <n> is the input to f(n) = f(n-1) + f(n-2)"
		sys.exit(1)

	n = int(sys.argv[1])
	t1 = time.time()
	x = fib(n)
	t2 = time.time()
	print "Time Taken: ", t2 - t1
	print "fib(n)=", x

And here are my tests:

mohd@mohd-laptop:~$ python fib_memo.py 40
Time Taken:  0.000197887420654
fib(n)= 165580141
mohd@mohd-laptop:~$ python fib_memo.py 100
Time Taken:  0.00048303604126
fib(n)= 573147844013817084101
mohd@mohd-laptop:~$ python fib_memo.py 900
Time Taken:  0.00656604766846
fib(n)= 88793027306605937532517516910637647045239090036365766884466525589158360259770006891772711976920559280382807770394537471560061517120086971996377683290300054868066659454250625417891167369401
mohd@mohd-laptop:~$ python fib_memo.py 1000
Traceback (most recent call last):
  ...
  File "fib_memo.py", line 11, in fib
    memo[n] = fib(n-1) + fib(n-2)
  File "fib_memo.py", line 11, in fib
    memo[n] = fib(n-1) + fib(n-2)
  File "fib_memo.py", line 11, in fib
    memo[n] = fib(n-1) + fib(n-2)
RuntimeError: maximum recursion depth exceeded

So WTF? The recursion depth limit is somewhere near 1000 function calls? To confirm my suspicions I cracked open IPython with this scenario:

mohd@mohd-laptop:~$ ipython
/var/lib/python-support/python2.6/IPython/ DeprecationWarning: the sets module is deprecated
  from sets import Set
Python 2.6.2 (release26-maint, Apr 19 2009, 01:58:18) 
Type "copyright", "credits" or "license" for more information.

IPython 0.9.1 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object'. ?object also works, ?? prints more.

In [1]: from fib_memo import *

In [2]: fib(1000)

RuntimeError: maximum recursion depth exceeded

In [3]: fib(900)
Out[3]: 88793027306605937532517516910637647045239090036365766884466525589158360259770006891772711976920559280382807770394537471560061517120086971996377683290300054868066659454250625417891167369401L

In [4]: fib(1000)
Out[4]: 70330367711422815821835254877183549770181269836358732742604905087154537118196933579742249494562611733487750449241765991088186363265450223647106012053374121273867339111198139373125598767690091902245245323403501L

Now Python is happy, because it only has to do about 100 recursive calls to get fib(1000): everything up to fib(900) is already memoized. This clearly shows that the recursion limit is much smaller than what the C++ implementation offers, and even worse, it means that the function’s behaviour can change according to the order in which it receives its arguments.
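The limit is also inspectable and adjustable at runtime. A small sketch (the function name here is my own, not from the code above), shown in Python 3, where the over-deep call raises RecursionError, a subclass of the RuntimeError seen above:

```python
import sys

def depth(n):
    # Burn one stack frame per level until n reaches zero.
    if n == 0:
        return 0
    return 1 + depth(n - 1)

print(sys.getrecursionlimit())  # CPython's default is 1000

try:
    depth(5000)
except RuntimeError:  # RecursionError in Python 3; plain RuntimeError in Python 2
    print("hit the limit")

sys.setrecursionlimit(10000)
print(depth(5000))  # now succeeds, printing 5000
```

Raising the limit trades safety for depth: the interpreter will happily let you run into the real C stack if you set it high enough.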

On the other hand, the C++ implementation suffers from overflow: it returns wrong results once they exceed the “unsigned long long” limit (64 bits on x86_64). But at least the function’s behaviour is deterministic, and the limiting factors are only how large the map can grow within the process’s address space and how deep the C stack allows the recursion to go; deep enough recursion will eventually segfault in C++ too, just far later than Python’s thousand-frame default.
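To put a number on that overflow: Python’s integers are arbitrary-precision, so we can compute the exact value and then truncate it to 64 bits, which is what an x86_64 “unsigned long long” result would wrap to. A quick sketch of my own (iterative, using the post’s fib(0) = fib(1) = 1 convention):

```python
def fib(n):
    # Bottom-up Fibonacci using the post's convention fib(0) = fib(1) = 1.
    a, b = 1, 1
    for _ in range(n - 1):
        a, b = b, a + b
    return b

print(fib(100))          # 573147844013817084101, matching the Python run above
print(fib(100) % 2**64)  # the truncated value a 64-bit unsigned result wraps to
```

Since fib(100) already exceeds 2**64, every C++ result from about fib(93) onward is silently wrong.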

Bottom line: yes, you can theoretically use recursion in Python, but you should worry A LOT about performance, and even if you find optimizations, be aware of how your functions can get called and make sure inputs don’t push them past the recursion limit. In other words, just spare yourself the time and don’t use recursion in Python unless ABSOLUTELY necessary; otherwise Python offers a lot of tricks to get your things done.


5 Responses

  1. I’m not sure, but in both the Python and the C++ version you don’t want to use a map to perform memoization, since all indices are monotonically increasing numbers. List position lookup would be O(1), whilst map ‘exists’, lookup and set actions are almost certainly worse, so using a statically allocated list of the correct size and direct value lookups might be much better, although that’s mostly guessing. At least in pure C it would be 😉

    2. In Python you can set the max stack depth using sys.setrecursionlimit(N) (and get it using sys.getrecursionlimit()), so the limit you’re facing is only a guard to make sure your code won’t go into eternal recursion.

    3. Don’t use ‘has_key’:

    In [1]: d = dict((n, n) for n in xrange(1000))

    In [2]: def f(x):
       ...:     return d.has_key(x)

    In [3]: def g(x):
       ...:     return x in d

    In [4]: %timeit f(500)
    1000000 loops, best of 3: 556 ns per loop

    In [5]: %timeit g(500)
    1000000 loops, best of 3: 401 ns per loop

    In [6]: import dis

    In [7]: dis.dis(f)
      2           0 LOAD_GLOBAL              0 (d)
                  3 LOAD_ATTR                1 (has_key)
                  6 LOAD_FAST                0 (x)

    In [8]: dis.dis(g)
      2           0 LOAD_FAST                0 (x)
                  3 LOAD_GLOBAL              0 (d)
                  6 COMPARE_OP               6 (in)

    Finally, your implementation (even when using a dict) could do better. Running on the same system as used in my blog post, starting with your code (somewhat):

    $ cat
    import time

    def fib(n):
        if memo.has_key(n):
            return memo[n]
        elif n == 0 or n == 1:
            memo[n] = 1
        else:
            memo[n] = fib(n - 1) + fib(n - 2)
        return memo[n]

    def test(n):
        # Always start with a clean cache
        global memo
        memo = dict()
        return fib(n)

    start = time.time()
    for i in xrange(1000):
        test(i)
    end = time.time()
    print 'Total:', (end - start)

    $ python
    Total: 1.71642112732

    I rewrote this:
    - Using a closure instead of a global variable as cache (although using a global var seems to be faster, as noted in the comments), which is much cleaner
    - Doing some local binding (LOAD_FAST instead of LOAD_GLOBAL…)
    - Prepopulating the values of fib(0) and fib(1), removing the if-clause (yes, a little cheating; I didn’t measure the actual impact)
    - Using ‘in’
    - Copying the result value into a local variable instead of performing the calculation, putting the result in the map and then retrieving it again from the map in the return statement (which has a pretty big impact, after some basic measurements)

    $ cat
    import time

    def gen_fib():
        # Using a closure because that's much nicer than a module-global variable
        # Testing shows a global (also with local binding) is a little bit faster,
        # so the closure cell lookup is most likely pretty slow
        memo = {0: 1, 1: 1}

        def _fib(n):
            # Local binding
            # This is a minor performance helper
            m = memo
            f = _fib

            # Something like
            #     value = m.get(n, -1)
            #     if value != -1:
            #         return value
            # is worse
            if n in m:
                return m[n]

            value = m[n] = f(n - 1) + f(n - 2)
            return value

        return _fib

    def test(n):
        # Always start with a clean cache
        fib = gen_fib()
        return fib(n)

    start = time.time()
    for i in xrange(1000):
        test(i)
    end = time.time()
    print 'Total:', (end - start)

    $ python
    Total: 1.46804785728

  2. Interesting!
    As for your very first point: it’d have been easy to use a vector so the lookup time could be O(1), but bear in mind that we call the function multiple times, adding elements a lot along the way, which can cause a huge performance hit: every time the vector outgrows its capacity, the whole old vector gets copied. (I wonder how realloc() in C behaves: does it just extend the block in place in the heap, or does it copy the whole data structure to accommodate space for the new data, as in the case of the C++ vector?)

    We could alternatively use a linked list, but the lookup time would still be O(n).

    It was also nice knowing that the recursion limit in Python is configurable, but I still think that it should not be a hard-set limit; rather, like Java, it should allow you to use the stack until an overflow occurs, in which case it should throw an exception, no?

  3. I also wonder how different the performance would be between has_key() and the ‘in’ construct?

  4. The ‘in’ vs ‘has_key’ difference can be seen in the first example code snippet: 556 ns using ‘has_key’ becomes 401 ns using ‘in’, which makes sense since ‘in’ removes the need for LOAD_ATTR and (even more importantly, most likely) doesn’t need a CALL_FUNCTION.

    WRT not using an array: you keep a pointer to the value array and a size variable S, check when running fib(n) whether n > S, and if so, do a realloc(); otherwise just use the existing array. This way you need at most one allocation (and optionally a copy cycle) for every first-level call to fib, which is pretty cheap.

    realloc in C is basically malloc + memcpy, unless the existing allocation can be extended in place, but this is very much libc-specific. Check for example the realloc implementation of uclibc at

  5. […] Need To Understand Recursion (Take 2) Posted on August 4, 2009 by Mohammed Gamal Look here […]
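A closing sketch on the list-vs-map thread above (my own code, not from either blog): in Python the array-style cache is trivial, and list.append is amortized O(1) thanks to over-allocation, so the repeated-copy worry raised in the comments largely disappears. Done iteratively, it also sidesteps the recursion limit entirely:

```python
memo = [1, 1]  # memo[n] holds fib(n), with the post's fib(0) = fib(1) = 1

def fib(n):
    # Extend the list up to index n; every entry below len(memo) is always
    # filled, so a plain index lookup replaces the dict membership test.
    while len(memo) <= n:
        memo.append(memo[-1] + memo[-2])
    return memo[n]

print(fib(100))  # 573147844013817084101, matching the runs above
```

fib(1000) works on the first call here, with no warm-up and no recursion-limit tuning.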
