MapsProgramming Practice (CS109)ExceptionsArrayBuffer and StringBuilder

ArrayBuffer and StringBuilder

ArrayBuffer

The length of an array is fixed: Once you have created the array object, the number of slots in the array cannot be changed.

Nevertheless we have used the :+ operator to add an element to an array. Here it is again:

scala> val a = Array(1, 2, 3, 4)
a: Array[Int] = Array(1, 2, 3, 4)
scala> val b = a :+ 17
b: Array[Int] = Array(1, 2, 3, 4, 17)

If you look carefully, you notice that the operator did not modify the array a. Instead, it returned a new array, which contained one more slot than the old one. We can check this again:

scala> a
res0: Array[Int] = Array(1, 2, 3, 4)
scala> b
res1: Array[Int] = Array(1, 2, 3, 4, 17)

The following function plus implements what the :+ operator does:

def plus(a: Array[Int], b: Int): Array[Int] = {
  val r = new Array[Int](a.length + 1)
  for (i <- 0 until a.length)
    r(i) = a(i)
  r(r.length-1) = b
  r
}

However—when the array a is large, then this becomes a slow operation.

We can see this by running the following program on our words.txt file (containing 113809 English words):


import scala.compat.Platform.currentTime
import scala.io.Source

val fname = "words.txt"

def readWords(): Array[String] = {
  val F = Source.fromFile(fname)
  var A = new Array[String](0)
  for (line <- F.getLines()) {
    A = A :+ line
  }
  A
}

println("Starting...")
val t0 = currentTime
val A = readWords()
val t1 = currentTime
printf("Reading all %d words took %d milliseconds\n", A.length, t1 - t0)

println()
println("The first ten words are:")
for (i <- 0 until 10)
  println(A(i))
Here is the output on my computer:
$ scala readwords1.scala 
Starting...
Reading all 113809 words took 6583 milliseconds

The first ten words are:
aa
aah
aahed
aahing
aahs
aal
aalii
aaliis
aals
aardvark
It took over 6 seconds to read the file.

If we knew the number of words in advance, we could of course simply make an array that is large enough and read the words into this array. For instance, we could use this readWords function:

def readWords(): Array[String] = {
  val F = Source.fromFile(fname)
  val A = new Array[String](200000)
  var count = 0
  for (line <- F.getLines()) {
    A(count) = line
    count += 1
  }
  val B = new Array[String](count)
  for (i <- 0 until count)
    B(i) = A(i)
  B
}

The output changes to

$ scala readwords2.scala 
Starting...
Reading all 113809 words took 66 milliseconds

The first ten words are:
aa
aah
aahed
aahing
aahs
aal
aalii
aaliis
aals
aardvark
Note that reading the file has become 100 times faster!

But this is not a good solution, because it requires us to know the maximum possible file size in advance. If the file has only a few words, then we waste a lot of time creating the large array. And if the file happens to be even larger than the large array, then the program crashes.

Fortunately for us, Scala provides a data structure ArrayBuffer, which can be used like an array (it has all the methods described here), but which can also grow efficiently:

scala> val a = new scala.collection.mutable.ArrayBuffer[Int]
a: scala.collection.mutable.ArrayBuffer[Int] = ArrayBuffer()
scala> a.length
res0: Int = 0
scala> a += 1
res1: a.type = ArrayBuffer(1)
scala> a += 2
res2: a.type = ArrayBuffer(1, 2)
scala> a.length
res3: Int = 2
The ArrayBuffer is implemented using an internal array that reserves some extra space for future additions. Only when the extra space has been completely filled, then a new internal array is created and data is copied. The ArrayBuffer manages this efficiently.

Let's see how we can use an ArrayBuffer for reading our file of words:

def readWords(): Array[String] = {
  val F = Source.fromFile(fname)
  var A = new scala.collection.mutable.ArrayBuffer[String]
  for (line <- F.getLines()) {
    A += line
  }
  A.toArray
}

This time, the output is

$ scala readwords3.scala 
Starting...
Reading all 113809 words took 80 milliseconds

The first ten words are:
aa
aah
aahed
aahing
aahs
aal
aalii
aaliis
aals
aardvark

As you can see, the ArrayBuffer is very efficient! If it is necessary, you can convert the contents of the ArrayBuffer to an array by calling its toArray method.

StringBuilder

A similar problem sometimes occurs for strings: we want to build a large string from small pieces. Here is an example:

import scala.compat.Platform.currentTime

def makestring(A: Array[Int]): String = {
  var s = ""
  for (e <- A) {
    if (s.isEmpty)
      s = e.toString
    else
      s = s + ", " + e.toString
  }
  s
}

val n = args(0).toInt
val A = (1 to n).toArray

println("Starting...")
val t0 = currentTime
val str = makestring(A)
val t1 = currentTime
println(str)
printf("Making a string of %d numbers took %d milliseconds\n", n, t1 - t0)

The output is

$ scala makestring1.scala 100
Starting...
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100
Making a string of 100 numbers took 3 milliseconds
$ scala makestring1.scala 1000
Starting...
Making a string of 1000 numbers took 32 milliseconds
$ scala makestring1.scala 10000
Starting...
Making a string of 10000 numbers took 526 milliseconds
$ scala makestring1.scala 20000
Starting...
Making a string of 20000 numbers took 1709 milliseconds
$ scala makestring1.scala 40000
Starting...
Making a string of 40000 numbers took 7179 milliseconds
$ scala makestring1.scala 100000
Starting...
Making a string of 100000 numbers took 48955 milliseconds
(I suppressed the printing of the string for more than 100 elements.)

Note how the code gets slower and slower: Doubling the number of elements causes the runtime to increase by a factor of roughly four.

The reason is the same as in the readwords programs: String objects are immutable, so every time we add a new integer to the string, a new String object has to be created, and all the characters from the long old string have first to be copied to the new string.

The solution is the same: Scala provides a data type StringBuilder which is essentially a mutable String. Once the string has been fully built, we can convert it to a normal string by calling its toString method.

Here is the new makestring function:

def makestring(A: Array[Int]): String = {
  var s = new StringBuilder
  for (e <- A) {
    if (s.isEmpty)
      s ++= e.toString
    else
      s ++= ", " + e.toString
  }
  s.toString
}

Running this script for various number of elements gives the following result:

$ scala makestring2.scala 100
Starting...
Making a string of 100 numbers took 3 milliseconds
$ scala makestring2.scala 1000
Starting...
Making a string of 1000 numbers took 13 milliseconds
$ scala makestring2.scala 10000
Starting...
Making a string of 10000 numbers took 62 milliseconds
$ scala makestring2.scala 100000
Starting...
Making a string of 100000 numbers took 121 milliseconds
$ scala makestring2.scala 1000000
Starting...
Making a string of 1000000 numbers took 350 milliseconds

Note that even a million numbers are no problem now.

StringBuilder objects have some useful methods that you can find here.

MapsProgramming Practice (CS109)ExceptionsArrayBuffer and StringBuilder