CS109 Tutorials -- Lists

Higher-order functions and function literals

Lists

Until now, when we needed a sequence of elements, we stored them in an array or in an ArrayBuffer. Both arrays and ArrayBuffer are large objects containing many slots with references to the elements.

The linked list is a completely different approach: a List consists of many small objects, each storing a reference to only one element.

One advantage of lists is that adding an element to a linked list is very fast.

One of the oldest programming languages is Lisp (1958). Lisp is from the same period as Fortran and Cobol, but unlike those languages Lisp is still very much alive.

Lisp stands for "List Processing", and lists are used everywhere in Lisp - in fact a Lisp program itself is a list. Scheme and Racket are versions of Lisp that are used commonly in education, and you will probably meet them in courses like "Programming Principles" or "Programming Languages."

The best way to define a List is recursively:

The empty list Nil is a list.
If a is an element and s is a list, then a :: s is a list.

The object a :: s, which is a small object storing references to a and to s, is called a cons-cell. The operator :: is pronounced “cons”. a is the head of the cons-cell a :: s, and s is the tail of a :: s.

In the Scala implementation, a cons-cell of type T is an object of type ::[T]. It has two val-fields, namely head and tail. The tail of a cons-cell of type T contains a reference to the unique object Nil, or to another cons-cell of type T.

We can create a list like this:

scala> val l = 2 :: 3 :: 4 :: 5 :: Nil
l: List[Int] = List(2, 3, 4, 5)

Note that the :: operator is right-associative, so a :: b :: s == a :: (b :: s), and the code above is equivalent to this:

scala> val l = 2 :: (3 :: (4 :: (5 :: Nil)))
l: List[Int] = List(2, 3, 4, 5)

This list consists of nine objects on the heap (four cons-cells, four integers, and the object Nil), like this:

A list on the heap

You may have noticed that Scala displays the list with a different syntax. The same syntax can be used to create lists more easily:

scala> val l2 = List(2, 3, 4, 5)
l2: List[Int] = List(2, 3, 4, 5)
scala> val empty = List[Int]()
empty: List[Int] = List()

Note that I indicated the type of list for the empty list—otherwise Scala cannot know what type of list I want.

Remember that a reference to a List[Int] is really a reference to cons-cell, so we can access the fields head and tail. The head is an integer, the tail is itself a list:

scala> l2.head
res0: Int = 2
scala> l2.tail
res1: List[Int] = List(3, 4, 5)
scala> l2.tail.head
res3: Int = 3
scala> l2.tail.tail
res4: List[Int] = List(4, 5)

Try following the references in the figure above to understand the output!

Do you remember the problem of reading words from a large file with English words from our ArrayBuffer tutorial?

Here is a new implementation of that readWords function, now returning a list instead of an array:

def readWords(): List[String] = {
  val F = Source.fromFile(fname)
  var A = List[String]()
  for (line <- F.getLines()) {
    A = line :: A
  }
  A
}

Here is the result of the test run:

$ scala readwords4.scala 
Reading all 113809 words took 126 milliseconds

The first ten words are:
zymurgy
zymurgies
zymotic
zymosis
zymoses
zymology
zymologies
zymogens
zymogenes
zymogene

So it is quite fast, but there is a slight problem: the final list is in reverse order!

This is true because adding an element to the list, in the line

  A = line :: A

adds the element line at the beginning of the list!

In fact, it is not possible to add elements efficiently at the end of the list. We can fix this by asking for the reversed list after reading the entire file:

def readWords(): List[String] = {
  val F = Source.fromFile(fname)
  var A = List[String]()
  for (line <- F.getLines()) {
    A = line :: A
  }
  A.reverse
}

Scala lists are immutable. In addition to being safer, as we discussed before, this has another advantage: we can reuse or share parts of lists. For instance:

scala> val a = List(5, 7, 9)
a: List[Int] = List(5, 7, 9)
scala> val b = 2 :: a
b: List[Int] = List(2, 5, 7, 9)
scala> val c = 3 :: a 
c: List[Int] = List(3, 5, 7, 9)
scala> val d = a.tail
d: List[Int] = List(7, 9)
scala> a
res0: List[Int] = List(5, 7, 9)
scala> b
res1: List[Int] = List(2, 5, 7, 9)
scala> c
res2: List[Int] = List(3, 5, 7, 9)
scala> d
res3: List[Int] = List(7, 9)

Note that the lists b and c only differ in their first element. If I stored these sequences as arrays, most of the sequence would have to be duplicated. For lists this is not necessary: the common part is stored only once.

Lists are immutable, and they fit very well with the functional style of programming that we mentioned before (where all variables are val variables and functions have no side-effects). In particular, since the tail of a list is also a list, lists can be handled very nicely with recursive functions. If you learn functional programming in Scheme later (for instance in CS220), remember that you can do the same with Scala functions. For instance, the following recursive Scheme function:

(define (square-list items)
  (if (null? items)
      empty
    (cons (square (car items))
          (square-list (cdr items)))))

can be written in Scala like this:

def squareList(items: List[Int]): List[Int] =
  if (items == Nil) Nil else
    square(items.head) :: squareList(items.tail)

Lists