Solution for mini assignment 1

Solutions for questions 1, 2, 3, and 4.a are in mini1.erl.

Solution for question 4.b: describe and explain the trends for the run-times of fib_hr and fib_tr.

First, I give a short answer that covers all the bases (and a bit more) for a good solution. After that, a longer answer presents a more detailed analysis of the timing data.

Short answer to 4.b

Both fib_hr(N) and fib_tr(N) have run times that appear to grow roughly quadratically with N -- increasing N by a factor of 3 (or 3+1/3) increases the time by about a factor of 10. This is because fib(N) grows exponentially with N and takes O(N) machine words to represent. Thus, adding fib(N-1) and fib(N-2) takes O(N) time, and the total time for all the adds is O(N^2). The times for fib_tr seem to be about 30% less than those for fib_hr. Clearly, tail-call optimization makes the code faster, but I'm not sure why it is that much faster. I'd expect that avoiding the creation of N stack frames would save O(N) time, yet the savings are O(N^2). Maybe Erlang is doing some optimization with memory allocation and/or garbage collection.
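
For reference, here is roughly the shape of the two functions I have in mind. The actual definitions are in mini1.erl, so treat this only as a sketch of a head-recursive versus a tail-recursive version:

    % Head-recursive sketch: fib_pair(N) returns {fib(N), fib(N-1)}.  The add
    % happens *after* the recursive call returns, so a stack frame stays live
    % for every level of the recursion.
    fib_hr(0) -> 0;
    fib_hr(N) when N >= 1 ->
        {F, _} = fib_pair(N),
        F.

    fib_pair(1) -> {1, 0};
    fib_pair(N) when N > 1 ->
        {F1, F2} = fib_pair(N-1),
        {F1 + F2, F1}.

    % Tail-recursive sketch: the accumulators carry fib(K) and fib(K+1), so the
    % recursive call is the last thing the function does (no stack growth).
    fib_tr(N) when N >= 0 -> fib_tr(N, 0, 1).

    fib_tr(0, A, _B) -> A;
    fib_tr(N, A, B) when N > 0 -> fib_tr(N-1, B, A+B).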

A longer answer to 4.b

Running mini1:time_fib (on my laptop) I got:

    N =     1000: time fib_hr =   0.000136, time fib_tr =   0.000077
    N =     3000: time fib_hr =   0.000528, time fib_tr =   0.000399
    N =    10000: time fib_hr =   0.003446, time fib_tr =   0.001611
    N =    30000: time fib_hr =   0.015134, time fib_tr =   0.010520
    N =   100000: time fib_hr =   0.156484, time fib_tr =   0.112085
    N =   300000: time fib_hr =   1.712151, time fib_tr =   1.012803
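
For what it's worth, measurements like these can be produced with timer:tc/1, which returns the elapsed time in microseconds. The actual mini1:time_fib may be written differently (e.g. using time_it); this is just a sketch of the idea:

    % Sketch: time fib_hr and fib_tr for a list of N values.
    time_fib() ->
        Ns = [1000, 3000, 10000, 30000, 100000, 300000],
        lists:foreach(
          fun(N) ->
              {T_hr, _} = timer:tc(fun() -> fib_hr(N) end),
              {T_tr, _} = timer:tc(fun() -> fib_tr(N) end),
              io:format("N = ~8B: time fib_hr = ~10.6f, time fib_tr = ~10.6f~n",
                        [N, T_hr/1.0e6, T_tr/1.0e6])   % microseconds -> seconds
          end, Ns).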
  
Let t_hr(N) denote the time of the head-recursive version, and t_tr(N) the time of the tail-recursive one. Both functions appear to run in quadratic time, as seen by calculating t_hr(N)/N^2 and t_tr(N)/N^2. I added a function, time_fib_is_quadratic/0, to mini1.erl. Here's what it printed:
    N =     1000: t_hr(N)/N^2 =  1.340e-10, t_tr(N)/N^2 =  7.686e-11
    N =     3000: t_hr(N)/N^2 =  7.011e-11, t_tr(N)/N^2 =  3.681e-11
    N =    10000: t_hr(N)/N^2 =  3.512e-11, t_tr(N)/N^2 =  1.606e-11
    N =    30000: t_hr(N)/N^2 =  1.799e-11, t_tr(N)/N^2 =  1.064e-11
    N =   100000: t_hr(N)/N^2 =  1.633e-11, t_tr(N)/N^2 =  1.117e-11
    N =   300000: t_hr(N)/N^2 =  1.880e-11, t_tr(N)/N^2 =  1.146e-11
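
The check itself is nothing more than dividing the measured time by N^2. A sketch (quadratic_ratio is a made-up helper name, not necessarily how time_fib_is_quadratic/0 is written):

    % Sketch: seconds per N^2 for one value of N, head-recursive version.
    quadratic_ratio(N) ->
        {T_us, _} = timer:tc(fun() -> fib_hr(N) end),
        (T_us / 1.0e6) / (N*N).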
  
The ratio seems to converge toward a number around 1.6e-11 to 1.9e-11 for fib_hr, and toward something around 1.1e-11 for fib_tr. I'll blame the higher ratios at smaller values of N on lower-order terms (i.e., the linear and constant terms). I could do a regression to fit those terms, and use time_it:t to get better measurements in the first place, but that would be overkill -- and then I'd have to account for the cache misses that occur for larger values of N.

Why is it quadratic? Because fib(N) grows roughly as phi^N, where phi = (sqrt(5)+1)/2 (about 1.6). You can also think of that as roughly 2^(2N/3) or 10^(N/5). In other words, when N = 100000, the decimal representation of fib(N) has about 20000 digits and the binary one has about 65000 bits -- it takes roughly 1000 64-bit words to represent on the machine. Adding fib(N-1) and fib(N-2) takes time proportional to the number of words needed to represent these big integers; in other words, the time per add is linear in N. Calculating fib(N) requires a total of about N adds, and the sizes of the operands grow linearly, from O(1) words up to O(N) words. The total time is therefore O(N^2).
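
The size claim is easy to spot-check: the bit length of fib(N) should be about N*log2(phi), i.e. roughly 0.69*N bits, or about N/92 64-bit words. A sketch (fib_bits is a made-up helper; binary:encode_unsigned/1 rounds up to whole bytes, which is close enough here):

    % Sketch: how many bits (and 64-bit words) does fib(N) occupy?
    fib_bits(N) ->
        Bits = bit_size(binary:encode_unsigned(fib_tr(N))),
        io:format("N = ~B: ~B bits (~.3f bits per unit of N), ~B 64-bit words~n",
                  [N, Bits, Bits/N, (Bits + 63) div 64]).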

I also wrote fib_time_ratio that calculates t_hr(N)/t_tr(N). Here's the output:

    N =     1000: t_hr(N)/t_tr(N) =   1.419e+0
    N =     3000: t_hr(N)/t_tr(N) =   1.284e+0
    N =    10000: t_hr(N)/t_tr(N) =   2.025e+0
    N =    30000: t_hr(N)/t_tr(N) =   1.307e+0
    N =   100000: t_hr(N)/t_tr(N) =   1.415e+0
    N =   300000: t_hr(N)/t_tr(N) =   1.667e+0
  
The ratio ranges from about 1.28 to 2, typically in the 1.3 to 1.4 range: fib_hr typically takes about 30-40% longer than fib_tr.

I would expect fib_tr to be faster because of the tail-call elimination. However, the stack frames should be of fixed size (with pointers to their return results), so I would expect the time difference, t_hr(N) - t_tr(N), to grow as N. But the ratio stays fairly large while t_hr and t_tr are both growing quadratically, which means the time difference is growing quadratically as well. I'm puzzled. My best guess is that the Erlang runtime is doing some optimization of memory allocation and/or garbage collection for the tail-recursive case, and that this saves time proportional to the amount of memory needed to represent the intermediate sums.