boutell: (Default)
[personal profile] boutell


The Erlang programming language freaks my shit out, yo. Tail recursion + concurrency + message passing + immutable variables = kaboom! Bits of Tom's brain all over the place. And the white paper doesn't even mention the tail recursion thing. Heaven help you if you weren't a computer science major.

But I was... a long time ago. So I'll do my best to explain.

You might remember the Fibonacci series from high school. This is a series in which each number is the sum of the two numbers before it:

1, 2, 3, 5, 8, 13, 21, 34...

The Fibonacci series is fun because it shows up all over the place in nature.

Here's a simple mathematical definition of the Fibonacci series. To obtain the nth Fibonacci number:

fibonacci(1) -> 1
fibonacci(2) -> 2
fibonacci(x) -> fibonacci(x - 2) + fibonacci(x - 1)

This is what's called a recursive function, because it is defined in terms of itself. Keep following the rules and you eventually wind up with a result.

Computers are just as capable of following recursive rules as people are, and so it's not hard to express this in any modern programming language. Here it is in PHP, just because:
function fibonacci($x)
{
  if ($x == 1) 
  {
    return 1;
  } else if ($x == 2)
  {
    return 2;
  } else {
    return (fibonacci($x - 2) + fibonacci($x - 1));
  }
}

Kind of elegant, huh.

But there's a big problem with doing things this way: when you ask for a high Fibonacci number... let's say the 50th... the computer has to "dig down" all the way to the smallest Fibonacci numbers before it can start summing things up and returning a result. And that means keeping track of a large tree of information, just to answer a simple question. A human being following the above recipe literally would have the same problem, perhaps taking notes like these:

f(50) -> f(48) + f(49)
f(50) -> (f(46) + f(47)) + (f(47) + f(48))
f(50) -> ((f(44) + f(45)) + (f(45) + f(46))) + ((f(45) + f(46)) + (f(46) + f(47)))

And on and on until f(1) and f(2) are reached and replaced with 1 and 2. And then summing everything up and up and up... eek.

Not to mention, it's easy to spot that lots of work is being done more than once here. f(46) and f(47) appear repeatedly.

For these reasons, if you try my PHP code above, you'll find that fibonacci(20) is about as high as you can go before there is a noticeable or even nigh-infinite delay. Eventually you'll hit a recursion limit imposed by PHP or run out of memory.

There has to be an easier and more efficient way, right?

Well, yes. And the plain-English recipe at the beginning of this post describes it pretty succinctly. Start with 1 and 2, keep adding the last two numbers together until you notice you've reached the nth one. Simple in English. Pretty simple in PHP, too:
function fibonacci($n)
{
  for ($i = 1; ($i <= $n); $i++)
  {
    if ($i == 1)
    {
      $next = 1;
      $backOne = $next;
    }
    else if ($i == 2)
    {
      $next = 2;
      $backTwo = $backOne;
      $backOne = $next;
    }
    else
    {
      $next = $backTwo + $backOne;
      $backTwo = $backOne;
      $backOne = $next;
    }
  }
  return $next;
}

This way the computer doesn't have to keep track of any more information than a person would. So clearly loops, like this PHP for loop, are the way to go... right?

Well... functional programming fans, people who love Scheme and Lisp and Erlang, would dispute that. They feel that recursion is such an elegant way of expressing things that an extra effort should be made to use it. And so they discovered a way to do it without taking so many notes. This idea is called "tail recursion."

"Why are all of your code examples in PHP?"

All of the concepts in this article can be expressed in PHP, a language we're both likely to be familiar with (and due to the abstract nature of these simple bits of code, C/C++/Java/Perl/etc. programmers will also find it accessible).

But the Erlang language actually enforces these concepts, and expresses them more elegantly. So I encourage you to check out the Erlang white paper to learn more about it.

You might think that this code would still require the computer to take notes as it went along, though not nearly so extensively as before:

function fibonacci($n)
{
  return fibonacciBody($n, 1, 0, 0);
}

function fibonacciBody($n, $i, $backOne, $backTwo)
{
  if ($i == 1)
  {
    $next = 1;
    $backOne = $next;
  }
  else if ($i == 2)
  {
    $next = 2;
    $backTwo = $backOne;
    $backOne = $next;
  }
  else
  {
    $next = $backTwo + $backOne;
    $backTwo = $backOne;
    $backOne = $next;
  }
  if ($i == $n)
  {
    return $next;
  }
  else
  {
    return fibonacciBody($n, $i + 1, $backOne, $backTwo);
  }
}

This version of the fibonacci function calls a fibonacciBody function that does the real work recursively, following the same algorithm as the non-recursive version of the code. The only difference is that the loop is implemented using recursion: $i, our loop index, is passed as the second argument to fibonacciBody, and at the end of each pass fibonacciBody checks to see whether the loop is complete. If not, it calls itself recursively, incrementing $i to the next step.

One would expect that this still requires the computer, or a human being simulating the computer, to take notes. After all, the first call to fibonacciBody can't return a value until the second call finishes, which is waiting on the third, and so on.

But because the fibonacciBody function returns the result of a call to itself, and doesn't combine that value with anything else (unlike the aggressively recursive version presented earlier, which summed up two calls to the fibonacci function), the computer can take a clever shortcut. Since there is no other information to be combined with the result of the next call to fibonacciBody, the computer can simply "jump back" to the start of the fibonacciBody function, without bothering to keep track of layer upon layer of calls to fibonacciBody. This is what the functional language fans refer to as "tail recursion."

Personally, though? I think it's mostly ridiculous. Tail recursion replaces a construct that humans have no trouble understanding with a concept that is much less obvious. Recursion is hard enough for some people to grasp; looking carefully for a particular kind of recursion is pain above and beyond the call of duty.

However... you'll notice I did say mostly ridiculous.

And this brings us back to Erlang and the rest of its functional-language friends.

In the Erlang programming language, tail recursion is indispensable. And that's because, in Erlang, variables are immutable.

Now, you could be forgiven for saying "what the flying duck are you talking about, Tom Boutell?" at this point. "Immutable variables? Are those anything like dwarf elephants?"

Well... think about high school math for a moment, if it's not too traumatic. When we see variables in a math problem, we understand that they aren't really subject to change. Their values are fixed, we just don't know what they are yet. When we simplify a problem like 5 + x = 2, we're not changing x, we're just revealing the value it had all along:
5 + x = 2
x = 2 - 5
x = -3

Erlang uses the term "variable" in the same spirit. You can't have a "for" loop in Erlang because you can't have a loop index in Erlang.

"But why, Tom? Why?"

Although immutable variables are deeply strange to those of us who have spent most of our time with commonplace languages like PHP and C, they do have certain benefits.

The biggest is that immutable variables, also known as "single assignment," tend to cut down on confusion that can lead to buggy code. If I write code today to assign a value to a variable, and more code tomorrow that modifies that variable, I might not fully consider the impact on later code in the same program.

If I have a variable that contains the price of lychee nuts in Guatemala, and I later decide to add sales tax to the value of that variable, there's a good chance I'm forgetting that I already took tax into account somewhere else:

$price = 10.00;
// Gee, I left out sales tax
$price = $price * 1.05;
printPrice($price);

function printPrice($price)
{
  // I'm pretty sure I left out sales tax, add it now
  $price = $price * 1.05;
  echo($price);
}


Oops. We've added sales tax twice. Easy mistake to make.

In a language with immutable variables, it's tougher to make this mistake. We're forced to assign a new variable for the price with sales tax included. After that, either we leave the call to printPrice alone as shown below, resulting in the correct outcome (although we haven't eliminated the redundant work), or we alter the call to printPrice to use the new variable, which encourages us to think about what the code is doing... again making it more likely that we'll make correct modifications to produce the correct outcome.

So when we introduce a new variable, both of the probable outcomes produce correct code:
// Listing 1: add the new variable, don't think about changing the printPrice code
$price = 10.00;
// Gee, I left out sales tax
$priceWithSalesTax = $price * 1.05;
// Do something with that; now, much later...
printPrice($price); 

function printPrice($price)
{
  // I'm pretty sure I left out sales tax, add it now
  $price = $price * 1.05;
  echo($price);
}


And alternatively:

// Listing 2: add the new variable, think about changing the printPrice code
$price = 10.00;
// Gee, I left out sales tax
$priceWithSalesTax = $price * 1.05;
// Do something with that; now, much later...
printPriceWithSalesTax($priceWithSalesTax); 

function printPriceWithSalesTax($priceWithSalesTax)
{
  // We already have sales tax
  echo($price);
}


So refusing to modify variables does have benefits, even in a language like PHP that allows modification of variables.

But am I going to run out and switch to Erlang tomorrow morning? Not very likely. And that's fine. Instead, I'll simply apply some of these useful notions to the languages I use every day. Languages that directly implement worthwhile concepts in computer science can be enlightening case studies even for those of us who must continue to be productive in languages that have a broader base of support.

One thing I don't plan to do: replace all of my loops with tail recursion. There are three good reasons for that. The first is that like most programmers I'm comfortable with the idea that variable names like $i, $j and $k are always loop indexes, and it would never occur to me to modify them in nasty and unexpected ways. So they are reasonably safe from the negative consequences of modifying variables. The second is that loops are considerably easier to understand than tail recursion. And the third is that C, PHP and most similar languages in common use today don't implement the "just jump back to the top when you detect tail recursion" feature that makes tail recursion practical.

It should be noted that Erlang isn't really a new language. I put "new" in quotes for a reason. It's been in use at Ericsson for quite a few years, and it's been available in an open source form since 1998. But Erlang, and other languages that attempt to deliver functional programming, are either truly new or long-forgotten to most of us who build practical applications in the field.

Date: 2008-08-21 10:18 pm (UTC)
From: [identity profile] dr-strych9.livejournal.com
You should probably look at Haskell and OCaml.

Date: 2008-08-22 12:45 am (UTC)
From: [identity profile] mskala.livejournal.com
In Prolog, you can do A=B, A=C, B=3, then test C, and find it equal to 3.

Date: 2008-08-22 03:17 am (UTC)
From: [identity profile] boutell.livejournal.com
Yes, Prolog is a wacky little world all its own.

Date: 2008-08-22 03:00 am (UTC)
From: [identity profile] madcaptenor.livejournal.com
So basically you have mathematician-variables, not programmer-variables.

recursion - difficult or not?

Date: 2008-08-22 08:41 am (UTC)
From: (Anonymous)
Novel approach - discussing Erlang by means of PHP snippets. (:

On the topic of recursion, I thought I'd mention Frank Huch's
excellent presentation at the Erlang Workshop 2007 in Freiburg.

http://www.erlang.se/workshop/2007/proceedings/12huch.pdf

It describes a one-week programming course for high school
girls who had no previous experience with Comp Sci or programming.
They found recursion perfectly natural. The course included
concurrency and distribution and ended with a final project
programming a chat server in Erlang (!).

I recommend the "Getting Started" part of the Erlang docs
for a gentler introduction than the white paper:

http://www.erlang.org/doc/getting_started/part_frame.html

BR,
Ulf Wiger

Re: recursion - difficult or not?

Date: 2008-08-22 11:26 am (UTC)
From: [identity profile] boutell.livejournal.com
I think recursion is perfectly natural. I think going out of one's way to be tail-recursive is not so natural. And that's a problem because it is essential in the real world.

But I'll check out the presentation. And then perhaps I'll teach my daughter Erlang.

fib in erlang

Date: 2008-08-22 09:27 am (UTC)
From: (Anonymous)
So why not give the erlang solution? - here it is with example output
(and I bet PHP will not get the right answer for large N :-)



-module(fib).

-export([fib/1, fast_fib/1, test/0]).

%% Example session:
%% $ erl
%% 1> c(fib).
%% {ok,fib}
%% 2> fib:test().
%% fibSeemsOk
%% 3> fib:fast_fib(1500).
%% 1355112566856310195163693686714840837778601071241849724213354315322
%% 1487310873528750612259354035717265300373778814347320257699257082356
%% 5500453499141029242495959974839822286992875272419318113250950996424
%% 4762124220020925443992019696046532143849830534589337893258539338153
%% 9093549479296194800838145996187122583354898000

test() ->
[1,1,2,3,5,8,13,21] = [test(N) || N <- [1,2,3,4,5,6,7,8]],
fibSeemsOk.

test(N) ->
Val = fib(N),
Val = fast_fib(N).

%% slow fib
fib(0) -> 0;
fib(1) -> 1;
fib(N) -> fib(N-1) + fib(N-2).

%% fast fib
fast_fib(N) -> fib(N, 1, 0).

fib(0, _, X) -> X;
fib(N, A, B) -> fib(N-1, B, A+B).

Re: fib in erlang

Date: 2008-08-22 11:24 am (UTC)
From: [identity profile] boutell.livejournal.com
> fast_fib

Neat. fast_fib's "decrement $n rather than counting up $i" solution leads to a less verbose PHP solution as well:

function fast_fib($n)
{
return fast_fib_body($n, 1, 0);
}

function fast_fib($n, $a, $b)
{
if ($n == 0)
{
return $b;
}
return fast_fib($n - 1, $b, $a + $b);
}

Of course the lack of tail recursion optimization in PHP means that the value of this tactic is somewhat limited in practice.

This is my first encounter with the underscore ( _ ). Is this a special placeholder meaning "I'm not using this on the right-and side and that's not an error, it's my intention?" Is it just a convention ( _ is an actual variable) or is it enforced by the language?

Re: fib in erlang

Date: 2008-08-23 07:20 pm (UTC)
From: (Anonymous)
Is the php correct?

Did you actually compute fib 1500 and compare with the
Erlang program? As far as I am aware (and I must admit to being ignorant here) I thought php integers didn't automatically coerce to bignums?

Can you run the php and check?

And yes "_" is a "don't care" variable place holder.

Re: fib in erlang

Date: 2008-08-23 07:26 pm (UTC)
From: [identity profile] boutell.livejournal.com
PHP has no painlessly built-in bignum support, although the bc extension is usually around.

I'm not trying to write PHP that is rigorously equivalent to Erlang. I'm just shedding light on some interesting concepts that PHP programmers don't normally think about.

Messing around with PHP's bc functions for arbitrary precision arithmetic (http://us3.php.net/bc) does not lend itself to that sort of thing.
Edited Date: 2008-08-23 07:27 pm (UTC)

September 2014

S M T W T F S
 123456
78910111213
14151617181920
2122232425 2627
282930    

Most Popular Tags

Style Credit

Expand Cut Tags

No cut tags
Page generated Oct. 17th, 2017 10:09 pm
Powered by Dreamwidth Studios