Chapter 3: Functions

A program often needs to do the same thing in different places. Repeating all the necessary statements every time is tedious and error-prone. It would be better to put them in one place, and have the program take a detour through there whenever necessary. This is what functions were invented for: They are canned code that a program can go through whenever it wants. Putting a string on the screen requires quite a few statements, but when we have a print function we can just write print("Aleph") and be done with it.

To view functions merely as canned chunks of code doesn't do them justice though. When needed, they can play the role of pure functions, algorithms, indirections, abstractions, decisions, modules, continuations, data structures, and more. Being able to effectively use functions is a necessary skill for any kind of serious programming. This chapter provides an introduction to the subject; chapter 6 discusses the subtleties of functions in more depth.

Pure functions, for a start, are the things that were called functions in the mathematics classes that I hope you have been subjected to at some point in your life. Taking the cosine or the absolute value of a number is a pure function of one argument. Addition is a pure function of two arguments.

The defining properties of pure functions are that they always return the same value when given the same arguments, and never have side effects. They take some arguments, return a value based on these arguments, and do not monkey around with anything else.
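To make the distinction concrete, here is a minimal sketch that puts a pure function next to one that is not. The names double, addToTotal, and total are made up for this illustration, and console.log stands in for whatever output function your environment provides.

```javascript
// Pure: the result depends only on the argument, and nothing else is touched.
function double(number) {
  return number * 2;
}

// Not pure: it reads and changes a variable that lives outside the function.
var total = 0;
function addToTotal(number) {
  total += number;
  return total;
}

console.log(double(4));      // 8
console.log(double(4));      // 8 again, same argument, same value
console.log(addToTotal(4));  // 4
console.log(addToTotal(4));  // 8, same argument, different value
```

Calling double never changes what a later call will return. Calling addToTotal does, which is exactly the kind of monkeying around a pure function avoids.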

In JavaScript, addition is an operator, but it could be wrapped in a function like this (and as pointless as this looks, we will come across situations where it is actually useful):

function add(a, b) {
  return a + b;
}

show(add(2, 2));

add is the name of the function. a and b are the names of the two arguments. return a + b; is the body of the function.

The keyword function is always used when creating a new function. When it is followed by a variable name, the resulting function will be stored under this name. After the name comes a list of argument names, and then finally the body of the function. Unlike those around the body of while loops or if statements, the braces around a function body are obligatory [1].

The keyword return, followed by an expression, is used to determine the value the function returns. When control comes across a return statement, it immediately jumps out of the current function and gives the returned value to the code that called the function. A return statement without an expression after it will cause the function to return undefined.
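The following sketch shows both behaviours of return described above: jumping out of the function early, and producing undefined when no expression follows the keyword. The function signName is a hypothetical example, and console.log is assumed as the output function.

```javascript
function signName(number) {
  if (number < 0)
    return "negative";   // control leaves the function right here
  if (number == 0)
    return;              // no expression, so the caller receives undefined
  return "positive";     // only reached for numbers above zero
}

console.log(signName(-3));  // negative
console.log(signName(0));   // undefined
console.log(signName(7));   // positive
```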

A body can, of course, have more than one statement in it. Here is a function for computing powers (with positive, integer exponents):

function power(base, exponent) {
  var result = 1;
  for (var count = 0; count < exponent; count++)
    result *= base;
  return result;
}

show(power(2, 10));

If you solved exercise 2.2, this technique for computing a power should look familiar.

Creating a variable (result) and updating it are side effects. Didn't I just say pure functions had no side effects?

A variable created inside a function exists only inside the function. This is fortunate, or programmers would have to come up with a different name for every variable they need throughout a program. Because result only exists inside power, the changes to it only last until the function returns, and from the perspective of code that calls it there are no side effects.
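One way to convince yourself of this is to create a top-level variable that happens to share the name result and check that calling power leaves it alone. This is only a sketch; the top-level variable and the use of console.log are additions made for the demonstration.

```javascript
function power(base, exponent) {
  var result = 1;   // local: a fresh variable for every call
  for (var count = 0; count < exponent; count++)
    result *= base;
  return result;
}

var result = "untouched";    // a different, top-level variable
console.log(power(2, 10));   // 1024
console.log(result);         // untouched
```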

Ex. 3.1

Write a function called absolute, which returns the absolute value of the number it is given as its argument. The absolute value of a negative number is the positive version of that same number, and the absolute value of a positive number (or zero) is that number itself.

function absolute(number) {
  if (number < 0)
    return -number;
  else
    return number;
}

show(absolute(-144));

Pure functions have two very nice properties. They are easy to think about, and they are easy to re-use.

If a function is pure, a call to it can be seen as a thing in itself. When you are not sure that it is working correctly, you can test it by calling it directly from the console, which is simple because it does not depend on any context [2]. It is easy to make these tests automatic, that is, to write a program that tests a specific function. Non-pure functions might return different values based on all kinds of factors, and have side effects that might be hard to test and think about.
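As a rough idea of what such an automatic test might look like, here is a small program that checks absolute from the exercise above against a few known answers. The helper check and the counter failures are invented for this sketch, and console.log is assumed for output.

```javascript
function absolute(number) {
  if (number < 0)
    return -number;
  else
    return number;
}

var failures = 0;

// Compare one call of absolute against the value we expect back.
function check(input, expected) {
  var actual = absolute(input);
  if (actual !== expected) {
    failures++;
    console.log("absolute(" + input + ") gave " + actual +
                ", expected " + expected);
  }
}

check(-144, 144);
check(0, 0);
check(7, 7);
check(-0.5, 0.5);
console.log(failures + " failed checks");
```

Note that check itself is not pure, since it updates failures and prints. The function being tested is, and that is what makes the checks so easy to write.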

Because pure functions are self-sufficient, they are likely to be useful and relevant in a wider range of situations than non-pure ones. Take show, for example. This function's usefulness depends on the presence of a special place on the screen for printing output. If that place is not there, the function is useless. We can imagine a related function, let's call it format, that takes a value as an argument and returns a string that represents this value. This function is useful in more situations than show.
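A sketch of what such a format function could look like follows. This is a guess at one reasonable behaviour (putting quotes around strings and converting everything else with String), not the way show actually produces its text.

```javascript
// Hypothetical format: returns a string describing the value, prints nothing.
function format(value) {
  if (typeof value == "string")
    return '"' + value + '"';   // make strings recognisable as strings
  else
    return String(value);
}

console.log(format("Aleph"));  // "Aleph" (including the quotes)
console.log(format(42));       // 42
```

Because format only returns a string, the caller is free to print it, store it, or glue it into a longer message.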

Of course, format does not solve the same problem as show, and no pure function is going to be able to solve that problem, because it requires a side effect. In many cases, non-pure functions are precisely what you need. In other cases, a problem can be solved with a pure function but the non-pure variant is much more convenient or efficient.

Thus, when something can easily be expressed as a pure function, write it that way. But never feel dirty for writing non-pure functions.

Functions with side effects do not have to contain a return statement. If no return statement is encountered, the function returns undefined.

function yell(message) {
  alert(message + "!!");
}

yell("Yow");

The names of the arguments of a function are available as variables inside it. They will refer to the values of the arguments the function is being called with, and like normal variables created inside a function, they do not exist outside it. Aside from the top-level environment, there are smaller, local environments created by function calls. When looking up a variable inside a function, the local environment is checked first, and only if the variable does not exist there is it looked up in the top-level environment. This makes it possible for variables inside a function to 'shadow' top-level variables that have the same name.

function alertIsPrint(value) {
  var alert = print;
  alert(value);
}

alertIsPrint("Troglodytes");

The variables in this local environment are only visible to the code inside the function. If this function calls another function, the newly called function does not see the variables inside the first function:

var variable = "top-level";

function printVariable() {
  print("inside printVariable, the variable holds '" +
        variable + "'.");
}

function test() {
  var variable = "local";
  print("inside test, the variable holds '" + variable + "'.");
  printVariable();
}

test();

However, and this is a subtle but extremely useful phenomenon, when a function is defined inside another function, its local environment will be based on the local environment that surrounds it instead of the top-level environment.

var variable = "top-level";
function parentFunction() {
  var variable = "local";
  function childFunction() {
    print(variable);
  }
  childFunction();
}
parentFunction();

What this comes down to is that the set of variables visible inside a function is determined by the place of that function in the program text. All variables that were defined 'above' a function's definition are visible, which means both those in function bodies that enclose it, and those at the top-level of the program. This approach to variable visibility is called lexical scoping.

People who have experience with other programming languages might expect that a block of code (between braces) also produces a new local environment. Not in JavaScript, at least not for variables created with var. For those, functions are the only things that create a new scope. You are allowed to use free-standing blocks like this...

var something = 1;
{
  var something = 2;
  print("Inside: " + something);
}
print("Outside: " + something);

... but the something inside the block refers to the same variable as the one outside the block. In fact, although blocks like this are allowed, they do nothing for variables created with var. Most people agree that this is a bit of a design blunder by the designers of JavaScript. ECMAScript 4, which was expected to add some way to define variables that stay inside blocks, was never completed; ECMAScript 2015 later added the let and const keywords for that purpose.
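For comparison, here is a sketch that sets the var behaviour next to let, the block-scoped declaration introduced in ECMAScript 2015. It assumes an engine that supports that version of the language; the variable name other is made up for the example, and console.log is assumed for output.

```javascript
var something = 1;
{
  var something = 2;   // the same variable as the one above
}
console.log(something);  // 2

let other = 1;
{
  let other = 2;       // a separate variable, visible only inside this block
}
console.log(other);      // 1
```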

Here is a case that might surprise you:

var variable = "top-level";
function parentFunction() {
  var variable = "local";
  function childFunction() {
    print(variable);
  }
  return childFunction;
}

var child = parentFunction();
child();

parentFunction returns its internal function, and the code at the bottom calls this function. Even though parentFunction has finished executing at this point, the local environment where variable has the value "local" still exists, and childFunction still uses it. This phenomenon is called closure.
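A small sketch may help show that the surviving environment is a real, live thing and not just a snapshot. The names makeCounter, next, and count are invented for this example. Every call to makeCounter creates a fresh local environment, and the function it returns keeps reading and updating the count variable in that environment.

```javascript
function makeCounter() {
  var count = 0;           // lives on after makeCounter returns
  function next() {
    count = count + 1;     // updates the variable in the enclosing environment
    return count;
  }
  return next;
}

var first = makeCounter();
var second = makeCounter();
console.log(first());   // 1
console.log(first());   // 2
console.log(second());  // 1, because second has its own count
```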

Apart from making it very easy to quickly see in which part of a program a variable will be available by looking at the shape of the program text, lexical scoping also allows us to 'synthesise' functions. By using some of the variables from an enclosing function, an inner function can be made to do different things. Imagine we need a few different but similar functions, one that adds 2 to its argument, one that adds 5, and so on.

function makeAddFunction(amount) {
  function add(number) {
    return number + amount;
  }
  return add;
}

var addTwo = makeAddFunction(2);
var addFive = makeAddFunction(5);
show(addTwo(1) + addFive(1));

On top of the fact that different functions can contain variables of the same name without getting tangled up, these scoping rules also allow functions to call themselves without running into problems. A function that calls itself is called recursive. Recursion allows for some interesting definitions. Look at this implementation of power:

function power(base, exponent) {
  if (exponent == 0)
    return 1;
  else
    return base * power(base, exponent - 1);
}

This is rather close to the way mathematicians define exponentiation, and to me it looks a lot nicer than the earlier version. It sort of loops, but there is no while, for, or even a local side effect to be seen. By calling itself, the function produces the same effect.

There is one important problem though: In most browsers, this second version is about ten times slower than the first one. In JavaScript, running through a simple loop is a lot cheaper than calling a function multiple times.
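If you want to see how large the difference is on your own machine, a crude measurement along these lines will do. The names powerLoop, powerRecursive, and time are made up for the sketch, Date.now is used as a coarse clock, and the numbers it prints depend heavily on the JavaScript engine, so the ratio you see may be far from ten.

```javascript
function powerLoop(base, exponent) {
  var result = 1;
  for (var count = 0; count < exponent; count++)
    result *= base;
  return result;
}

function powerRecursive(base, exponent) {
  if (exponent == 0)
    return 1;
  else
    return base * powerRecursive(base, exponent - 1);
}

// Call compute(2, 30) many times and report the elapsed milliseconds.
function time(compute, repetitions) {
  var started = Date.now();
  for (var i = 0; i < repetitions; i++)
    compute(2, 30);
  return Date.now() - started;
}

console.log("loop:      " + time(powerLoop, 100000) + " ms");
console.log("recursive: " + time(powerRecursive, 100000) + " ms");
```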

The dilemma of speed versus elegance is an interesting one. It not only occurs when deciding for or against recursion. In many situations, an elegant, intuitive, and often short solution can be replaced by a more convoluted but faster solution.

In the case of the power function above the un-elegant version is still sufficiently simple and easy to read. It doesn't make very much sense to replace it with the recursive version. Often, though, the concepts a program is dealing with get so complex that giving up some efficiency in order to make the program more straightforward becomes an attractive choice.

The basic rule, which has been repeated by many programmers and with which I wholeheartedly agree, is to not worry about efficiency until your program is provably too slow. When it is, find out which parts are too slow, and start exchanging elegance for efficiency in those parts.

Of course, the above rule doesn't mean one should start ignoring performance altogether. In many cases, like the power function, not much simplicity is gained by the 'elegant' approach. In other cases, an experienced programmer can see right away that a simple approach is never going to be fast enough.

The reason I am making a big deal out of this is that surprisingly many programmers focus fanatically on efficiency, even in the smallest details. The result is bigger, more complicated, and often less correct programs, which take longer to write than their more straightforward equivalents and often run only marginally faster.

But I was talking about recursion. A concept closely related to recursion is a thing called the stack. When a function is called, control is given to the body of that function. When that body returns, the code that called the function is resumed. While the body is running, the computer must remember the context from which the function was called, so that it knows where to continue afterwards. The place where this context is stored is called the stack.

The fact that it is called 'stack' has to do with the fact that, as we saw, a function body can again call a function. Every time a function is called, another context has to be stored. One can visualise this as a stack of contexts. Every time a function is called, the current context is thrown on top of the stack. When a function returns, the context on top is taken off the stack and resumed.

This stack requires space in the computer's memory to be stored. When the stack grows too big, the computer will give up with a message like "out of stack space" or "too much recursion". This is something that has to be kept in mind when writing recursive functions.

function chicken() {
  return egg();
}
function egg() {
  return chicken();
}
print(chicken() + " came first.");

In addition to demonstrating a very interesting way of writing a broken program, this example shows that a function does not have to call itself directly to be recursive. If it calls another function which (directly or indirectly) calls the first function again, it is still recursive.
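The limit on stack space can be observed without crashing the program by catching the error the engine raises when it gives up. This sketch uses try and catch, which have not been introduced yet, and the names dive, depth, and message are invented for it. The depth reached and the wording of the message differ between engines.

```javascript
var depth = 0;

// Calls itself forever; every call puts another context on the stack.
function dive() {
  depth++;
  dive();
}

var message = "";
try {
  dive();
} catch (error) {
  // The stack ran out of room; keep the engine's complaint for printing.
  message = String(error);
}

console.log("Gave up after " + depth + " nested calls: " + message);
```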

Recursion is not always just a less-efficient alternative to looping. Some problems are much easier to solve with recursion than with loops. Most often these are problems that require exploring or processing several 'branches', each of which might branch out again into more branches.

Consider this puzzle: By starting from the number 1 and repeatedly either adding 5 or multiplying by 3, an infinite set of new numbers can be produced. How would you write a function that, given a number, tries to find a sequence of additions and multiplications that produces that number?

For example, the number 13 could be reached by first multiplying 1 by 3, and then adding 5 twice. The number 15 can not be reached at all.

Here is the solution:

function findSequence(goal) {
  function find(start, history) {
    if (start == goal)
      return history;
    else if (start > goal)
      return null;
    else
      return find(start + 5, "(" + history + " + 5)") ||
             find(start * 3, "(" + history + " * 3)");
  }
  return find(1, "1");
}

print(findSequence(24));

Note that it doesn't necessarily find the shortest sequence of operations; it is satisfied when it finds any sequence at all.

The inner find function, by calling itself in two different ways, explores both the possibility of adding 5 to the current number and of multiplying it by 3. When it finds the number, it returns the history string, which is used to record all the operations that were performed to get to this number. It also checks whether the current number is bigger than goal, because if it is, we should stop exploring this branch: it is not going to give us our number.

The use of the || operator in the example can be read as 'return the solution found by adding 5 to start, and if that fails, return the solution found by multiplying start by 3'. It could also have been written in a more wordy way like this:

else {
  var found = find(start + 5, "(" + history + " + 5)");
  if (found == null)
    found = find(start * 3, "(" + history + " * 3)");
  return found;
}
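To watch the branching happen, the function can be made to record every number it visits. The variant below, findSequenceTraced, is a sketch written for this purpose, with an added visited string and console.log for output. It is otherwise the same search as findSequence.

```javascript
function findSequenceTraced(goal) {
  var visited = "";   // every number find is called with, in order
  function find(start, history) {
    visited += start + " ";
    if (start == goal)
      return history;
    else if (start > goal)
      return null;
    else
      return find(start + 5, "(" + history + " + 5)") ||
             find(start * 3, "(" + history + " * 3)");
  }
  var found = find(1, "1");
  console.log("visited: " + visited);
  return found;
}

console.log(findSequenceTraced(13));
// visited: 1 6 11 16 33 18 3 8 13
// (((1 * 3) + 5) + 5)
```

The trace shows the whole 'add 5 first' branch (6, 11, 16, 33, 18) being explored and abandoned before the search backs up to 1 and tries multiplying by 3.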

Even though function definitions occur as statements between the rest of the program, they are not part of the same time-line:

print("The future says: ", future());

function future() {
  return "We STILL have no flying cars.";
}

What is happening is that the computer looks up all function definitions, and stores the associated functions, before it starts executing the rest of the program. The same happens with functions that are defined inside other functions. When the outer function is called, the first thing that happens is that all inner functions are added to the new environment.
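The same effect inside a function body can be sketched as follows. The names outer, inner, and answer are made up; the point is only that inner can be called on a line that comes before its definition.

```javascript
function outer() {
  // inner is already part of the local environment here,
  // even though its definition appears further down.
  var answer = inner();

  function inner() {
    return "inner was ready";
  }

  return answer;
}

console.log(outer());  // inner was ready
```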

There is another way to define function values, which more closely resembles the way other values are created. When the function keyword is used in a place where an expression is expected, it is treated as an expression producing a function value. Functions created in this way do not have to be given a name (though it is allowed to give them one).

var add = function(a, b) {
  return a + b;
};
show(add(5, 5));

Note the semicolon after the definition of add. Normal function definitions do not need these, but this statement has the same general structure as var add = 22;, and thus requires the semicolon.

This kind of function value is called an anonymous function, because it does not have a name. Sometimes it is useless to give a function a name, like in the makeAddFunction example we saw earlier:

function makeAddFunction(amount) {
  return function (number) {
    return number + amount;
  };
}

Since the function named add in the first version of makeAddFunction was referred to only once, the name does not serve any purpose and we might as well directly return the function value.

Ex. 3.2

Write a function greaterThan, which takes one argument, a number, and returns a function that represents a test. When this returned function is called with a single number as argument, it returns a boolean: true if the given number is greater than the number that was used to create the test function, and false otherwise.

function greaterThan(x) {
  return function(y) {
    return y > x;
  };
}

var greaterThanTen = greaterThan(10);
show(greaterThanTen(9));

Try the following:

alert("Hello", "Good Evening", "How do you do?", "Goodbye");

The function alert officially only accepts one argument. Yet when you call it like this, the computer does not complain at all, but just ignores the other arguments.

show();

You can, apparently, even get away with passing too few arguments. When an argument is not passed, its value inside the function is undefined.
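Both situations can be tried on a function of our own. The function greet below is a hypothetical example, and console.log is assumed for output.

```javascript
function greet(name, greeting) {
  return greeting + ", " + name;
}

// One argument too many: the extra one is silently ignored.
console.log(greet("Ada", "Hello", "ignored"));  // Hello, Ada

// One argument too few: greeting is undefined inside the function.
console.log(greet("Ada"));                      // undefined, Ada
```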

In the next chapter, we will see a way in which a function body can get at the exact list of arguments that were passed to it. This can be useful, as it makes it possible to have a function accept any number of arguments. print makes use of this:

print("R", 2, "D", 2);

Of course, the downside of this is that it is also possible to accidentally pass the wrong number of arguments to functions that expect a fixed number of them, like alert, and never be told about it.

[1] Technically, this wouldn't have been necessary, but I suppose the designers of JavaScript felt it would clarify things if function bodies always had braces.
[2] Technically, a pure function cannot use the value of any external variables. These values might change, and this could make the function return a different value for the same arguments. In practice, the programmer may consider some variables 'constant' (they are not expected to change) and consider functions that use only constant variables pure. Variables that contain a function value are often good examples of constant variables.