Misspelling words!

Download the cmudict.txt and words.txt data files. The following function reads both files and creates a map from English words (in lower case) to the set of possible pronunciations of each word:
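def readPronounciations(): Map[String,Set[String]] = {
  // Only words from words.txt (plus "i" and "a") are kept
  val words = (scala.io.Source.fromFile("words.txt").getLines().toSet
               union Set("i", "a"))
  val F = scala.io.Source.fromFile("cmudict.txt")
  var M = Map[String,Set[String]]()
  for (l <- F.getLines()) {
    // Skip blank lines and lines that do not start with a letter (such as comments)
    if (l.nonEmpty && l(0).isLetter) {
      // p(0) is the word, p(1) is its pronunciation
      val p = l.trim.split("\\s+", 2)
      // Strip a parenthesized suffix such as "(1)" that marks an alternative pronunciation
      val i = p(0).indexOf('(')
      val word = (if (i >= 0) p(0).substring(0,i) else p(0)).toLowerCase
      if (p.length == 2 && (words contains word)) {
        val pro = p(1)
        val S = M.getOrElse(word, Set())
        M = M + (word -> (S + pro))
      }
    }
  }
  F.close()
  M
}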

Your task is to write a Scala script misspeller.scala.

The script must take a filename as a command line argument and then print the file, but with words replaced by homophones (that is, words that sound exactly the same). When there are several homophones, choose one randomly (but never choose the original word).

For instance, when poem.txt contains the following text:

I have a spell checker,
it came with my PC.
It plainly marks for my review
mistakes I can not see.

I ran this poem through it,
you're surely glad to know,
it's very polished in its way.
My checker told me so.

then the output could look like this:

$ scala misspeller.scala poem.txt 
AYE halve ay spell checker,
it came with my PC.
It plainly marques fore my review
mistakes AY can knot si.

EYE ran this poem threw it,
yew'ree surely glad too no,
it's vary polished inn its weigh.
My checker told mi sew.

Hints

Start by writing a function homophoneMap that builds a map from each word to its homophones (using readPronounciations and reverseMap). It is convenient for the value type to be an array of strings rather than a set. The array must not include the word itself, and words that have no homophone should not be in the map:

def homophoneMap(): Map[String, Array[String]] = // up to you

Some examples:

scala> val hm = homophoneMap()
hm: Map[String,Array[String]] = Map(motts -> Array(mots), breaks ->
Array(brakes), compliment -> Array(complement), haugh -> Array(haw),
derry -> Array(dairy), nicols -> Array(nickles, nickels), alpha ->
Array(alfa), california -> Array(calif), peons -> Array(paeans),...

scala> hm contains "lovely"
res1: Boolean = false

scala> hm("knight")
res2: Array[String] = Array(night)

scala> hm("night")
res3: Array[String] = Array(knight)

scala> hm("write")
res4: Array[String] = Array(wright, rite, right)

scala> hm("right")
res5: Array[String] = Array(wright, rite, write)
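
One possible way to build such a map is sketched below. It is only a sketch under assumptions: reverseMap is not shown on this page, so the version here is a hypothetical stand-in that inverts a word-to-pronunciations map, and homophonesFrom is an illustrative name that takes the pronunciation map as a parameter so it can be tried on a tiny hand-made map instead of the data files.

```scala
// Hypothetical stand-in for reverseMap (the real helper is not shown here):
// turns word -> pronunciations into pronunciation -> words.
def reverseMap(m: Map[String, Set[String]]): Map[String, Set[String]] = {
  val pairs = for ((word, pros) <- m.toSeq; pro <- pros) yield (pro, word)
  pairs.groupBy(_._1).map { case (pro, ps) => pro -> ps.map(_._2).toSet }
}

// Illustrative helper: builds word -> homophones from a pronunciation map.
def homophonesFrom(pron: Map[String, Set[String]]): Map[String, Array[String]] = {
  val byPro = reverseMap(pron)
  pron.map { case (word, pros) =>
    // All words sharing any pronunciation with `word`, minus `word` itself
    word -> (pros.flatMap(p => byPro(p)) - word).toArray
  }.filter { case (_, hs) => hs.nonEmpty } // drop words without homophones
}

// Tiny hand-made pronunciation map, for illustration only
val demo = Map(
  "knight" -> Set("N AY1 T"),
  "night"  -> Set("N AY1 T"),
  "lovely" -> Set("L AH1 V L IY0"))
val demoHm = homophonesFrom(demo)
```

With this in place, homophoneMap() could simply call homophonesFrom(readPronounciations()).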

When you go through the input file, don't try to use split to partition a line into words. It is easier to go through the line from left to right. All characters that are not letters (that is, between 'A' and 'Z' or 'a' and 'z') should be printed immediately. Once you detect a letter, scan forward to find the last consecutive letter, and extract the word as a substring.
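
The left-to-right scan described above might look like the following sketch. The names isAsciiLetter and processLine, and the replace parameter that decides what each word becomes, are assumptions made for illustration.

```scala
// Letters are exactly 'A'..'Z' and 'a'..'z', as the hint specifies
def isAsciiLetter(c: Char): Boolean =
  (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')

// Rewrites one line; `replace` (hypothetical) maps each word to its substitute
def processLine(line: String, replace: String => String): String = {
  val out = new StringBuilder
  var i = 0
  while (i < line.length) {
    if (isAsciiLetter(line(i))) {
      // Scan forward to one past the last consecutive letter
      var j = i
      while (j < line.length && isAsciiLetter(line(j))) j += 1
      out.append(replace(line.substring(i, j)))
      i = j
    } else {
      // Non-letters are copied through unchanged
      out.append(line(i))
      i += 1
    }
  }
  out.toString
}
```

Because an apostrophe is not a letter, a contraction such as "it's" is seen as the two words "it" and "s" with the apostrophe copied between them.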

Try to leave punctuation exactly as it was, changing only the words (sequences of letters). Try to preserve case: If a word was all lower-case, replace it with lower-case. If the first letter was capitalized, capitalize the replacement. If the entire word was in capitals, print the replacement in capitals.

Try using the program on your next English assignment, and drive your English instructor crazy!