ArrayBuffer and StringBuilder |
The length of an array is fixed: Once you have created the array object, the number of slots in the array cannot be changed.
Nevertheless we have used the :+ operator to add an element to an array. Here it is again:
scala> val a = Array(1, 2, 3, 4) a: Array[Int] = Array(1, 2, 3, 4) scala> val b = a :+ 17 b: Array[Int] = Array(1, 2, 3, 4, 17)
If you look carefully, you notice that the operator did not modify the array a. Instead, it returned a new array, which contained one more slot than the old one. We can check this again:
scala> a res0: Array[Int] = Array(1, 2, 3, 4) scala> b res1: Array[Int] = Array(1, 2, 3, 4, 17)
The following function plus implements what the :+ operator does:
def plus(a: Array[Int], b: Int): Array[Int] = { val r = new Array[Int](a.length + 1) for (i <- 0 until a.length) r(i) = a(i) r(r.length-1) = b r }
However—when the array a is large, then this becomes a slow operation.
We can see this by running the following program on our words.txt file (containing 113809 English words):
import scala.compat.Platform.currentTime import scala.io.Source val fname = "words.txt" def readWords(): Array[String] = { val F = Source.fromFile(fname) var A = new Array[String](0) for (line <- F.getLines()) { A = A :+ line } A } println("Starting...") val t0 = currentTime val A = readWords() val t1 = currentTime printf("Reading all %d words took %d milliseconds\n", A.length, t1 - t0) println() println("The first ten words are:") for (i <- 0 until 10) println(A(i))Here is the output on my computer:
$ scala readwords1.scala Starting... Reading all 113809 words took 6583 milliseconds The first ten words are: aa aah aahed aahing aahs aal aalii aaliis aals aardvarkIt took over 6 seconds to read the file.
If we knew the number of words in advance, we could of course simply make an array that is large enough and read the words into this array. For instance, we could use this readWords function:
def readWords(): Array[String] = { val F = Source.fromFile(fname) val A = new Array[String](200000) var count = 0 for (line <- F.getLines()) { A(count) = line count += 1 } val B = new Array[String](count) for (i <- 0 until count) B(i) = A(i) B }
The output changes to
$ scala readwords2.scala Starting... Reading all 113809 words took 66 milliseconds The first ten words are: aa aah aahed aahing aahs aal aalii aaliis aals aardvarkNote that reading the file has become 100 times faster!
But this is not a good solution, because it requires us to know the maximum possible file size in advance. If the file has only a few words, then we waste a lot of time creating the large array. And if the file happens to be even larger than the large array, then the program crashes.
Fortunately for us, Scala provides a data structure ArrayBuffer, which can be used like an array (it has all the methods described here), but which can also grow efficiently:
scala> val a = new scala.collection.mutable.ArrayBuffer[Int] a: scala.collection.mutable.ArrayBuffer[Int] = ArrayBuffer() scala> a.length res0: Int = 0 scala> a += 1 res1: a.type = ArrayBuffer(1) scala> a += 2 res2: a.type = ArrayBuffer(1, 2) scala> a.length res3: Int = 2The ArrayBuffer is implemented using an internal array that reserves some extra space for future additions. Only when the extra space has been completely filled, then a new internal array is created and data is copied. The ArrayBuffer manages this efficiently.
Let's see how we can use an ArrayBuffer for reading our file of words:
def readWords(): Array[String] = { val F = Source.fromFile(fname) var A = new scala.collection.mutable.ArrayBuffer[String] for (line <- F.getLines()) { A += line } A.toArray }
This time, the output is
$ scala readwords3.scala Starting... Reading all 113809 words took 80 milliseconds The first ten words are: aa aah aahed aahing aahs aal aalii aaliis aals aardvark
As you can see, the ArrayBuffer is very efficient! If it is necessary, you can convert the contents of the ArrayBuffer to an array by calling its toArray method.
A similar problem sometimes occurs for strings: we want to build a large string from small pieces. Here is an example:
import scala.compat.Platform.currentTime def makestring(A: Array[Int]): String = { var s = "" for (e <- A) { if (s.isEmpty) s = e.toString else s = s + ", " + e.toString } s } val n = args(0).toInt val A = (1 to n).toArray println("Starting...") val t0 = currentTime val str = makestring(A) val t1 = currentTime println(str) printf("Making a string of %d numbers took %d milliseconds\n", n, t1 - t0)
The output is
$ scala makestring1.scala 100 Starting... 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100 Making a string of 100 numbers took 3 milliseconds $ scala makestring1.scala 1000 Starting... Making a string of 1000 numbers took 32 milliseconds $ scala makestring1.scala 10000 Starting... Making a string of 10000 numbers took 526 milliseconds $ scala makestring1.scala 20000 Starting... Making a string of 20000 numbers took 1709 milliseconds $ scala makestring1.scala 40000 Starting... Making a string of 40000 numbers took 7179 milliseconds $ scala makestring1.scala 100000 Starting... Making a string of 100000 numbers took 48955 milliseconds(I suppressed the printing of the string for more than 100 elements.)
Note how the code gets slower and slower: Doubling the number of elements causes the runtime to increase by a factor of roughly four.
The reason is the same as in the readwords programs: String objects are immutable, so every time we add a new integer to the string, a new String object has to be created, and all the characters from the long old string have first to be copied to the new string.
The solution is the same: Scala provides a data type StringBuilder which is essentially a mutable String. Once the string has been fully built, we can convert it to a normal string by calling its toString method.
Here is the new makestring function:
def makestring(A: Array[Int]): String = { var s = new StringBuilder for (e <- A) { if (s.isEmpty) s ++= e.toString else s ++= ", " + e.toString } s.toString }
Running this script for various number of elements gives the following result:
$ scala makestring2.scala 100 Starting... Making a string of 100 numbers took 3 milliseconds $ scala makestring2.scala 1000 Starting... Making a string of 1000 numbers took 13 milliseconds $ scala makestring2.scala 10000 Starting... Making a string of 10000 numbers took 62 milliseconds $ scala makestring2.scala 100000 Starting... Making a string of 100000 numbers took 121 milliseconds $ scala makestring2.scala 1000000 Starting... Making a string of 1000000 numbers took 350 milliseconds
Note that even a million numbers are no problem now.
StringBuilder objects have some useful methods that you can find here.
ArrayBuffer and StringBuilder |