ExceptionsProgramming Practice (CS109)Immutable and mutable objects, references and the heapSets

Sets

A Set is a data type used to store a collection of elements. There is no order on the collection, and all the elements must be distinct (so you cannot have several copies of the same element in the set). In other words, a Set is the Scala implementation of a set in mathematics.

Sets appear naturally in many problems: Correctly spelled words form a set. Prime numbers form a set.

Of course, we could simply use an array to store a collection of items that forms a set. But this is not natural and often not efficient: An array is an indexed sequence, where the elements have an ordering, the same element can appear several times, and searching for an element can only be done by looking at all the elements in the array one by one.

Fortunately, Scala provides for us a very nice Set data type. More precisely, Set is a parameterized data type, so there is a Set[Int], Set[String], and so on.

Let's create some sets:

scala> val s = Set(2, 3, 5, 7, 9)
s: scala.collection.immutable.Set[Int] = Set(5, 9, 2, 7, 3)
scala> val w = Set("CS109", "is", "wonderful")
w: scala.collection.immutable.Set[String] = Set(CS109, is, wonderful)
scala> val e = Set()
e: scala.collection.immutable.Set[Nothing] = Set()
Note that we can write the empty set simply as Set(). You can also create sets by converting other collections (including arrays) using their toSet method.

Remember that it doesn't matter in which order you write the elements, and an element cannot appear more than once:

scala> val s2 = Set(9, 9, 5, 7, 3, 5, 3, 2)
s2: scala.collection.immutable.Set[Int] = Set(5, 9, 2, 7, 3)
scala> s == s2
res0: Boolean = true
scala> val w2 = Set("wonderful", "is", "CS109")
w2: scala.collection.immutable.Set[String] = Set(wonderful, is, CS109)
scala> w == w2
res1: Boolean = true

The + and - operators can be used to add elements to a set, and to remove elements from a set. The result is a new set. It is okay to add elements already in the set, and to remove elements not in the set.

scala> s + 11
res2: scala.collection.immutable.Set[Int] = Set(5, 9, 2, 7, 3, 11)
scala> s - 6
res3: scala.collection.immutable.Set[Int] = Set(5, 9, 2, 7, 3)
scala> s - 5
res4: scala.collection.immutable.Set[Int] = Set(9, 2, 7, 3)
scala> s + 7
res5: scala.collection.immutable.Set[Int] = Set(5, 9, 2, 7, 3)
scala> s + 7 == s
res6: Boolean = true

There are Scala operators for the standard mathematical operations of union, intersection, difference, and containment:

Here are some examples:

scala> val a = (1 to 10).toSet
a: scala.collection.immutable.Set[Int] = Set(5, 10, 1, 6, 9, 2, 7, 3, 8, 4)
scala> val b = (1 to 10 by 2).toSet
b: scala.collection.immutable.Set[Int] = Set(5, 1, 9, 7, 3)
scala> val c = (1 to 5).toSet
c: scala.collection.immutable.Set[Int] = Set(5, 1, 2, 3, 4)
scala> b subsetOf a
res7: Boolean = true
scala> c subsetOf b
res8: Boolean = false
scala> c subsetOf a
res9: Boolean = true
scala> a diff b
res10: scala.collection.immutable.Set[Int] = Set(10, 6, 2, 8, 4)
scala> b union c
res11: scala.collection.immutable.Set[Int] = Set(5, 1, 9, 2, 7, 3, 4)
scala> b intersect c
res12: scala.collection.immutable.Set[Int] = Set(5, 1, 3)

As a first example of using sets, here is a simple spell checker. It makes use of the file words.txt, containing 113809 English words, one per line.

We read the file and immediately convert it to a set (you can learn about reading text files here). Then we allow the user to enter words from the terminal. We check if the word is in the set of correctly spelled words, and report this back.

import scala.io.Source
import scala.io.StdIn.readLine

val fname = "words.txt"

val F = Source.fromFile(fname)
val words = F.getLines().toSet

while (true) {
  val w = readLine("Enter a word> ").trim
  if (w == "")
    sys.exit()
  if (words contains w)
    println(w + " is a word")
  else
    printf("Error: %s is not a word\n", w)
}
Here is an example run:
$ scala spell1.scala 
Enter a word> lovely
lovely is a word
Enter a word> lovly
Error: lovly is not a word
Enter a word> wierd
Error: wierd is not a word
Enter a word> weird
weird is a word
Enter a word> supercede
Error: supercede is not a word
Enter a word> supersede
supersede is a word
Enter a word> 

You could also use sets to measure the similarity between two texts (for instance to classify documents on the web). Consider the set of words of each text, and compare the size of their union with the size of their intersection.

Another classic application is the Sieve of Erathosthenes to compute prime numbers. Here is one implementation:

def sieve(n: Int): Set[Int] = {
  var s = Set[Int]()
  for (i <- 2 to n)
    s = s + i
  val sqrtn = math.sqrt(n).toInt
  for (i <- 2 to sqrtn) {
    if (s contains i) {
      var k = i * i
      while (k <= n) {
	s = s - k
	k += i
      }
    }
  }
  s
}
  
val num = if (args.nonEmpty) args(0).toInt else 1000

val primes = sieve(num).toArray.sorted

for (i <- primes)
  print(i + " ")
println()
And the output of a run:
$ scala sieve1.scala 500
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
101 103 107 109 113 127 131 137 139 149 151 157 163 167 173 179 181
191 193 197 199 211 223 227 229 233 239 241 251 257 263 269 271 277
281 283 293 307 311 313 317 331 337 347 349 353 359 367 373 379 383
389 397 401 409 419 421 431 433 439 443 449 457 461 463 467 479 487
491 499
Note that to print the output, I first convert the set to an array and then sort the array. Since a set has no natural order, without this the output would be in some arbitrary order.

There are many other applications of sets. For instance, when finding your way in a maze, you could store the positions already seen in a set. Similarly, when implementing a computer game, you could store the game positions already evaluated in a set.

Mutable sets

All the sets we have looked at above are immutable: There is no way to change the contents of a Set. Set operations like union and intersection actually return a new set object.

Sometimes it is more efficient or convenient to use a mutable set (but remember that these are more dangerous to use). Scala provides a mutable set data type, where you can add elements using the operator += and remove elements using the operator -=.

scala> val s = scala.collection.mutable.Set(1, 2, 3, 4)
s: scala.collection.mutable.Set[Int] = Set(1, 2, 3, 4)
scala> s += 9
res0: s.type = Set(9, 1, 2, 3, 4)
scala> s += 13
res1: s.type = Set(9, 1, 13, 2, 3, 4)
scala> s
res2: scala.collection.mutable.Set[Int] = Set(9, 1, 13, 2, 3, 4)
scala> s -= 2
res3: s.type = Set(9, 1, 13, 3, 4)
scala> s
res4: scala.collection.mutable.Set[Int] = Set(9, 1, 13, 3, 4)

Below I've rewritten the Sieve of Erathosthenes to use a mutable set:

import scala.collection.mutable.Set

def sieve(n: Int): Set[Int] = {
  var s = Set[Int]()
  for (i <- 2 to n)
    s += i
  val sqrtn = math.sqrt(n).toInt
  for (i <- 2 to sqrtn) {
    if (s contains i) {
      var k = i * i
      while (k <= n) {
	s -= k
	k += i
      }
    }
  }
  s
}
  
val num = if (args.nonEmpty) args(0).toInt else 1000

val primes = sieve(num).toArray.sorted

for (i <- primes)
  print(i + " ")
println()
ExceptionsProgramming Practice (CS109)Immutable and mutable objects, references and the heapSets