Misspelling words!
Download the cmudict.txt and words.txt data files. The following function reads both files and returns a map from each English word (in lower case) to the set of possible pronunciations of that word.
fun readPronounciations(): Map<String, Set<String>> {
  val words = java.io.File("words.txt").useLines { it.toSet() + setOf("i", "a") }
  val M = mutableMapOf<String, Set<String>>()
  java.io.File("cmudict.txt").forEachLine { l ->
    if (l[0].isLetter()) {
      val p = l.trim().split(Regex("\\s+"), 2)
      val i = p[0].indexOf('(')
      val word = (if (i >= 0) p[0].substring(0,i) else p[0]).toLowerCase()
      if (word in words) {
        val pro = p[1]
        val S = M.getOrElse(word) { emptySet<String>() }
        M[word] = S + pro
      }
    }
  }
  return M
}
Your task is to write a script misspeller.kts.
The script must take a filename as a command line argument, and then print the file, but with words replaced by homophones (that is, words that sound exactly the same). When there are several homophones, choose one randomly (but never choose the original word).
For instance, when poem.txt contains the following text:
I have a spell checker, it came with my PC.
It plainly marks for my review mistakes I can not see.
I ran this poem through it, you're surely glad to know,
it's very polished in its way. My checker told me so.

then the output could be like this:

$ kts misspeller.kts poem.txt
AYE halve ay spell checker, it came with my PC.
It plainly marques fore my review mistakes AY can knot si.
EYE ran this poem threw it, yew'ree surely glad too no,
it's vary polished inn its weigh. My checker told mi sew.
Start by modifying the function reverseMap from the tutorial so that it takes a Map<String, Set<String>> as its argument. The result should again be a Map<String, Set<String>>. Test the function:
>>> val m = readPronounciations()
>>> val r = reverseMap(m)
>>> m["lovely"]
[L AH1 V L IY0]
>>> r["L AH1 V L IY0"]
[lovely]
>>> m["write"]
[R AY1 T]
>>> r["R AY1 T"]
[right, rite, wright, write]
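The inversion described above could be sketched roughly as follows. This is only one possible shape, not the tutorial's version: the use of a sorted set is an assumption, chosen so that the words print in alphabetical order as in the session above.

```kotlin
// Sketch only: inverts a map whose values are sets.
// Every (word, pronunciation) pair in the input becomes a
// (pronunciation, word) pair in the result.
fun reverseMap(m: Map<String, Set<String>>): Map<String, Set<String>> {
    val r = mutableMapOf<String, MutableSet<String>>()
    for ((word, pronunciations) in m) {
        for (p in pronunciations) {
            // Assumption: a sorted set, so the words come out alphabetically.
            r.getOrPut(p) { sortedSetOf<String>() }.add(word)
        }
    }
    return r
}
```

A word with several pronunciations ends up in several of the resulting sets, one per pronunciation.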
Next, write a function homophoneMap that constructs a map that maps words to homophones (using readPronounciations and reverseMap). It will be convenient for the value type to be a List<String>. The value should not include the original word. Words that have no homophone should not be in the map:

fun homophoneMap(): Map<String, List<String>> {
  // ...
}
Some examples:
>>> :load misspeller.kts
>>> val h = homophoneMap()
>>> h["be"]
[bee]
>>> h["write"]
[right, rite, wright]
>>> h["rite"]
[right, wright, write]
>>> h["minute"]
null
>>> h["bear"]
[bare]
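To illustrate the idea without reading the data files, here is a minimal sketch of the core step. The helper name homophonesFrom and its two parameters are hypothetical: it takes the pronunciation map and its reversed map as arguments. It also assumes that two words count as homophones when they share at least one pronunciation, which the assignment does not state explicitly.

```kotlin
// Hypothetical helper (not part of the assignment): builds the homophone
// map from a pronunciation map `m` and its reversed map `r`.
fun homophonesFrom(
    m: Map<String, Set<String>>,
    r: Map<String, Set<String>>
): Map<String, List<String>> {
    val h = mutableMapOf<String, List<String>>()
    for ((word, pronunciations) in m) {
        // All words sharing at least one pronunciation, minus the word itself.
        val others = pronunciations
            .flatMap { r[it] ?: emptySet() }
            .filter { it != word }
            .distinct()
            .sorted()
        // Words without any homophone are left out of the map entirely.
        if (others.isNotEmpty()) h[word] = others
    }
    return h
}
```

With such a helper, homophoneMap itself would only need to call readPronounciations and reverseMap and pass both maps along.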
Now it's time to implement the project. Write a function misspellLine(s: String, hm: Map<String, List<String>>) that takes a line of text and the homophone map, and prints the line converted as described above. Don't try to use split to partition the string into words. It is easier to go through the string from left to right. All characters that are not letters (that is, not between 'A' and 'Z' or 'a' and 'z') should be printed immediately. Once you detect a letter, scan forward to find the last consecutive letter, and extract the word as a substring.
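The left-to-right scan might look something like the sketch below. The names mapWords and isAsciiLetter are hypothetical, and unlike misspellLine this version returns the result as a string instead of printing it, purely so that it is easy to test. What happens to each word is left to the replace parameter.

```kotlin
// Hypothetical helper: true only for 'A'..'Z' and 'a'..'z'.
fun isAsciiLetter(c: Char): Boolean = c in 'A'..'Z' || c in 'a'..'z'

// Hypothetical helper: applies `replace` to every maximal run of letters
// in `s` and copies all other characters unchanged.
fun mapWords(s: String, replace: (String) -> String): String {
    val out = StringBuilder()
    var i = 0
    while (i < s.length) {
        if (isAsciiLetter(s[i])) {
            // Scan forward to one past the last consecutive letter.
            var j = i
            while (j < s.length && isAsciiLetter(s[j])) j++
            out.append(replace(s.substring(i, j)))
            i = j
        } else {
            // Non-letters (spaces, punctuation, digits) pass through as they are.
            out.append(s[i])
            i++
        }
    }
    return out.toString()
}
```

Note that an apostrophe is not a letter, so "it's" is seen as the two words "it" and "s".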
Try to leave punctuation exactly as it was, changing only the words (sequences of letters). Try to preserve case: If a word was all lower-case, replace it with lower-case. If the first letter was capitalized, capitalize the replacement. If the entire word was in capitals, print the replacement in capitals.
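The three case rules could be sketched as below. The helper name matchCase is hypothetical, the code assumes Kotlin 1.5 or later (for uppercase and replaceFirstChar), and it assumes the replacement arrives in lower case, as the keys and values of the homophone map do. The all-capitals rule is checked first so that a one-letter word such as "I" is treated as all capitals, which matches the sample output.

```kotlin
// Hypothetical helper: gives `replacement` the same case pattern as `original`.
// Assumes `original` is a non-empty word and `replacement` is lower-case.
fun matchCase(original: String, replacement: String): String = when {
    // Entire word in capitals (this also covers single capital letters like "I").
    original.all { it.isUpperCase() } -> replacement.uppercase()
    // Only the first letter capitalized.
    original.first().isUpperCase() -> replacement.replaceFirstChar { it.uppercase() }
    // Anything else is left in lower case.
    else -> replacement
}
```

Words with mixed case that fit none of the three rules fall into the last branch here; the assignment does not say how they should be handled.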
Finally, the following code will call misspellLine for each line of the file given as an argument.
if (args.size != 1) {
  println("Usage: kts misspeller.kts <filename>")
  kotlin.system.exitProcess(1)
}
val hm = homophoneMap()
java.io.File(args[0]).forEachLine { misspellLine(it, hm) }
Try using the program on your next English assignment, and drive your English instructor crazy!