Running mini1:time_fib (on my laptop) I got:
N = 1000: time reverse_hr = 0.000136, time reverse_tr = 0.000077 N = 3000: time reverse_hr = 0.000528, time reverse_tr = 0.000399 N = 10000: time reverse_hr = 0.003446, time reverse_tr = 0.001611 N = 30000: time reverse_hr = 0.015134, time reverse_tr = 0.010520 N = 100000: time reverse_hr = 0.156484, time reverse_tr = 0.112085 N = 300000: time reverse_hr = 1.712151, time reverse_tr = 1.012803Let t_hr(N) denote the time of the head-recursive version, and t_tr(N) the time of the tail recursive one. Both function appear to run in quadratic time as seen by calculating t_hr(N)/N^2, and t_tr(N)/N^2. I added a function, time_fib_is_quadratic/0 to mini1.erl. Here's what it printed:
N = 1000: t_hr(N)/N^2 = 1.340e-10, t_tr(N)/N^2 = 7.686e-11 N = 3000: t_hr(N)/N^2 = 7.011e-11, t_tr(N)/N^2 = 3.681e-11 N = 10000: t_hr(N)/N^2 = 3.512e-11, t_tr(N)/N^2 = 1.606e-11 N = 30000: t_hr(N)/N^2 = 1.799e-11, t_tr(N)/N^2 = 1.064e-11 N = 100000: t_hr(N)/N^2 = 1.633e-11, t_tr(N)/N^2 = 1.117e-11 N = 300000: t_hr(N)/N^2 = 1.880e-11, t_tr(N)/N^2 = 1.146e-11The ratio seems to converge towards an number around 1.6e-11 to 1.9e-11 for fib_hr, and to something around 1.1e-11 for fib_tr. I'll blame the higher ratios for smaller values of N on lower-order terms (i.e. the linear and constant terms). I could do a regression to fit those terms, and use time_it:t to get better measurements in the first place, but that would be overkill. And then I'd have to account for the cache-misses that occur for larger values of N.
Why is it quadratic? That's because fib(N) grows roughly as phi^N where phi = (sqrt(5)+1)/2 (about 1.6). You can also think of that as roughly 2^(2N/3) or 10^(N/5). In other words, when N=100000, the decimal representation of fib(N) has 20000 digits, and the binary one has about 65000 bits -- it takes 1000 64-bit words to represent on the machine. The time to add fib(N-1) and fib(N-2) takes time proportional to the number of words to represent the big-integers. In other words, the time to add is linear in N. Calculating fib(N) requires a total of N adds, and the these goes linearly from 1 to N. The total time is O(N^2).
I also wrote fib_time_ratio that calculates t_hr(N)/t_tr(N). Here's the output:
N = 1000: t_hr(N)/t_tr(N) = 1.419e+0 N = 3000: t_hr(N)/t_tr(N) = 1.284e+0 N = 10000: t_hr(N)/t_tr(N) = 2.025e+0 N = 30000: t_hr(N)/t_tr(N) = 1.307e+0 N = 100000: t_hr(N)/t_tr(N) = 1.415e+0 N = 300000: t_hr(N)/t_tr(N) = 1.667e+0The ratio ranges from 1.28 to 2, typically in the 1.3 to 1.4 range. fib_tr seems to be about 30-40% faster than fib_hr.
I would expect fib_tr to be faster because of the tail call elimination. However, the stack frames should be fixed size (with pointers to their return results). So, I would expect the time difference, t_hr - t_tr to grow as N. But the ratio stays fairly large while t_hr and t_tr are both growing quadratically. That means that the time difference is growing quadratically as well. I'm puzzled. My best guess is that the erlang run-time is doing some optimization for memory allocation and/or garbage collection for the tail recursive case, and this saves time proportional to the amount of memory needed to represent the intermediate sums.