Gain experience using tree structures, more specifically a trie.
Learn the basic process of writing a Java program from scratch.
Learn how to use hash classes (more about this in recitation sections).
**Make Your Own Best Shakespeare**
Write a program that takes the control file name as a parameter. Read in the whole data file and use a trie data structure to store the n-grams and their frequencies. For each n-gram in the control file, generate more words, each of which is the best guess (the most frequent continuation according to our trie).
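As a rough illustration of the idea, here is a minimal sketch of a trie that stores word n-grams and their frequencies. The class name `NGramTrie`, its methods, and the alphabetical tie-breaking rule are assumptions made for this example and are not part of the assignment. Each node keeps its children in a `HashMap`, and only the node at the end of an n-gram carries a count.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical trie for word n-grams (names are illustrative, not required).
class NGramTrie {
    private static final class Node {
        final Map<String, Node> children = new HashMap<>();
        int count; // how many times an n-gram ended at this node
    }

    private final Node root = new Node();

    // Record one occurrence of the n-gram given as a list of words.
    void add(List<String> ngram) {
        Node node = root;
        for (String word : ngram) {
            node = node.children.computeIfAbsent(word, w -> new Node());
        }
        node.count++;
    }

    // Frequency of an exact n-gram, or 0 if it was never added.
    int count(List<String> ngram) {
        Node node = find(ngram);
        return node == null ? 0 : node.count;
    }

    // Most frequent word following the given context, or null if there is none.
    // Ties are broken alphabetically; that rule is an assumption of this sketch.
    String bestNext(List<String> context) {
        Node node = find(context);
        if (node == null) return null;
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Node> e : node.children.entrySet()) {
            int c = e.getValue().count;
            if (c > bestCount || (c > 0 && c == bestCount && e.getKey().compareTo(best) < 0)) {
                best = e.getKey();
                bestCount = c;
            }
        }
        return best;
    }

    // Walk down the trie along the given words; null if the path is missing.
    private Node find(List<String> words) {
        Node node = root;
        for (String word : words) {
            node = node.children.get(word);
            if (node == null) return null;
        }
        return node;
    }

    public static void main(String[] args) {
        NGramTrie trie = new NGramTrie();
        List<String> text = Arrays.asList("the cat sat on the cat sat on it".split(" "));
        int n = 3;
        for (int i = 0; i + n <= text.size(); i++) {
            trie.add(text.subList(i, i + n)); // slide a window of n words
        }
        System.out.println(trie.count(Arrays.asList("the", "cat", "sat"))); // prints 2
        System.out.println(trie.bestNext(Arrays.asList("cat", "sat")));     // prints on
    }
}
```

With this layout, looking up the best next word for an (n-1)-word context costs one walk of n-1 steps plus a scan of that node's children.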
Note that in some cases we might not be able to produce the required number of words based on the n-gram. In the example above we are supposed to generate up to 10 words using 3-gram frequencies. If we start with the n-gram "president richard nixon", the next word we generate is supposed to be the most frequent word occurring after "richard nixon". Since no 3-gram beginning with "richard nixon" ever occurs in the [url removed, login to view] data file, we stop there and don't produce any more words. Instead we print the token IMSTUCK and go on to the next line.
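The stopping rule can be sketched as a small generation loop. This is only an illustration under stated assumptions: the class `StuckDemo`, the method `generate`, and the lookup function `bestNext` are hypothetical names, and a plain `HashMap` stands in for the trie so the example stays self-contained. Whether the words generated before getting stuck are printed along with IMSTUCK is also an assumption here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical driver showing the "generate up to k words or get stuck" rule.
class StuckDemo {
    static final String STUCK = "IMSTUCK";

    // Generates up to maxWords words after the seed, using the last n-1 words
    // as context each time. bestNext returns null when the context is unknown.
    static List<String> generate(List<String> seed, int n, int maxWords,
                                 Function<List<String>, String> bestNext) {
        if (seed.size() < n - 1) {
            throw new IllegalArgumentException("seed needs at least n-1 words");
        }
        List<String> words = new ArrayList<>(seed);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < maxWords; i++) {
            // Copy the last n-1 words so later appends cannot affect the context.
            List<String> context =
                new ArrayList<>(words.subList(words.size() - (n - 1), words.size()));
            String next = bestNext.apply(context);
            if (next == null) {      // no n-gram starts with this context
                out.add(STUCK);
                break;
            }
            out.add(next);
            words.add(next);
        }
        return out;
    }

    public static void main(String[] args) {
        // Stand-in for the trie: context (joined by spaces) -> best next word.
        Map<String, String> table = new HashMap<>();
        table.put("the cat", "sat");
        table.put("cat sat", "on");
        table.put("sat on", "the");
        table.put("on the", "cat");
        Function<List<String>, String> lookup = ctx -> table.get(String.join(" ", ctx));

        // prints [sat, on, the, cat, sat]
        System.out.println(generate(Arrays.asList("the", "cat"), 3, 5, lookup));
        // prints [IMSTUCK]
        System.out.println(generate(Arrays.asList("president", "richard", "nixon"), 3, 10, lookup));
    }
}
```

Passing the lookup as a function keeps the loop independent of how the n-grams are stored, so a trie-backed lookup could be swapped in without changing `generate`.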
We will expect your main class to be called MyOS. Here is an example command-line call to your class:
java MyOS [url removed, login to view]
Your program should write the results to standard output (STDOUT).
You will pack all your Java and class files into [url removed, login to view] file. If your name is Harry Potter, then the file you will turn in will be called hw2-harrypotter.jar. Do not pack a data file or a control file with your code. We will unpack your jar file in a directory and run the command javac *.java in that base directory, and nothing else, in order to compile your program. Please make sure that this sequence of steps works on a Unix/Linux machine:
jar -xvf [url removed, login to view]
java -Xmx230m -Xms8m MyOS [url removed, login to view] > outfile
You should start out by making a small example and testing it. Below are two large data files you can play with. These files will be used to test your program with different parameters. The memory parameter (230M in the example above) may be increased for some higher n-grams.
Alice in Wonderland [200K] (attached in the folder)
1) Complete and fully-functional working program(s) in executable form as well as complete source code of all work done.
2) Please read the [url removed, login to view] in the attached zip file. Follow the directions, and please do not use overly complicated code. Keep the code at an appropriate difficulty level.