Algorithms and Data Structures part2

As described in the handout for Part 1, the overall aim of the assignment is to develop a program
to build a lexicon and to find the words that match certain patterns.
Whereas for Part 1 we are only concerned with the correctness of the program, for Part 2 we
are concerned with the efficiency. More specifically, you are required to do the tasks described
below.
Besides the information given in the tasks below, please refer to Part 1 of the Assignment for
any other information you need.
Task 1
Write a Java program called WordMatch.java. This program takes four command-line arguments:
1. The first is the name of a text file that contains the names of one or more text files from
which the words are to be read to build the lexicon. The names of the input files must
include the file extensions. (This argument “specifies” the input files.)
2. The second is the name of the text file to which the words in the lexicon are to be written.
This output file can be new. If it already exists, it will be overwritten. (This argument
specifies the file that contains the word list.)
The word list is to be written in the format described in Part 1. A sample is shown below:
a 11 [i]
about 1 []
acknowledged 1 []
all 1 []
also 1 []
and 6 []
answer 1 []
at 2 [it]
austen 1 []
be 2 [by, he, me]
been 1 []
bennet 3 []
but 1 []
by 1 [be, my]
can 1 [cat, man]
cat 1 [can, fat, hat, mat]
chapter 1 []
considered 1 []
cried 1 []
cup 1 []
2
3. The third is the name of a text file that contains a number of matching patterns, one per line.
(This argument specifies the matching patterns.)
4. The fourth is the name of the text file that contains the result of the matching for the given
patterns. This output file can be new. If it already exists, it will be overwritten. (This
argument specifies the file that contains the matching results.)
The format should be like this. First, the matching pattern is displayed, followed by the
matching words and their frequencies (one word per line), followed by a blank line (to
separate this result from the next). A sample is shown below:
ma?
man 2
map 1
mat 1
may 1
?o?
for 1
not 2
too 1
you 5
zoo 1
For this version, the efficiency with which the program performs various operations is the
major concern.
For example, the files read in can be quite long and the lexicon of words can grow to be quite
lengthy. Time to insert the words will be critical here and you will need to carefully consider
which algorithms and data structures you use.
You can use any text files for input to this program. A good source of long text files is at
the Gutenberg project (www.gutenberg.com) which is a project aimed to put into electronic form
older literary works that are in the public domain. The extract from Jane Austen’s book Pride and
Prejudice used as the sample text file above was sourced from this web site. You should choose
files of lengths suitable for providing good information about the efficiency of your program.
A selection of test files have been posted on LMS for your efficiency testing. You can consider
additional test files if you wish.
As expected, the definition of a word, and the content of a query’s result and display of this
result are exactly the same as what described in Part 1.
This program will be marked on correctness, efficiency and coding style and documentation.
With the exceptions of ArrayList and LinkedList, you are not permitted to use any of
the classes in the Java Collections Framework, e.g. TreeMap, HashMap, Collections.
3
Task 2
For this task, you are required to submit electronically a written report. This report is concerned
with the optimized version only. It should have three main sections:
• Description of the Design and Implementation. You should describe how the program
works and which data structures and algorithms you use. You should focus on the data
structures and algorithms used and your reasons for choosing them.
• Correctness Testing. You should describe the tests carried out on the program as evidence
of the correctness of your program. What input files did you use? What test case did you
include? The test cases should be reported in table form with columns such as test case
number, matching pattern, reason and expected result, and actual outcome.
• Efficiency Testing. You should describe the strategies you use to “measure” your program’s
efficiency with respect to each tasks it has to perform. What data sets (i.e. text files) did
you use? Reasons for using them? Matching patterns and reasons for their inclusion? What
techniques dis you use to measure time efficiency? What were the results? And, perhaps,
you should comment on the results obtained.
The report should not be longer than 6 pages. Point form and tables are acceptable where
appropriate.
It should be submitted electronically as a file with the name “report”. Acceptable formats are:
PDF or Word).
Make sure your name and student number are on every page of your report. (Putting it into a
header or footer would be a good idea.)
Task 3 (Optional)
Develop a solution for the puzzle in which you are given two words, and you need to find a
sequence of words to transform the first words into the second by changing one letter at a time.
The key idea is to represent the list of words of the same length as a graph in which each node
represents a word, and two nodes are connected by an edge if the two associated words differ from
each other only by one letter. Thus, the solution would involve algorithms on graphs which we are
yet to cover.
Note: This optional work should NOT be attempted until all other tasks have been successfully
completed. Good attempts at these extensions are worth up to an additional 10% of the mark for
Part 2. However, it is not possible to end up with more than 100% for the total mark of Part 2.
4
Task 4
Consider the B-trees of order M . Assume that we have the following result, which we will refer
to as Lemma 1.
Lemma 1: The barest B-tree of height H contains N = 2K H - 1 elements, where K = ⌈ M 2 ⌉.
Determine the upper bound for a B-tree of order 21 which has 1, 000, 000 = 106 elements.
You must give an integer value as the upper bound of the B-tree.
You are not allow to use the result given in the lecture regarding the upper bound for B-tree’s
height. Instead, you must work out the answer using Lemma 1 above.
Note: The total mark for Part 2 will be 100 for CSE2ALG students and 110 (100 + 10 for Task
4) for CSE5ALG students. The percentage of contribution to the final will be the same.

5

代写C代码代做C程序 C辅导 C家教

远程写代码 Debug 讲解答疑不是中介,本人直接写

微信: ittutor QQ: 14061936 Email: ittutor@qq.com

导航

相关推荐