Project 2: Linear-Time String Matching Algorithms - The Z Algorithm
DEADLINE: March 16th, 2011, 9 p.m.
The goal of this project is to study the theoretical and experimental behaviour of the Z algorithm. In order to do so, this algorithm will test on several data test chosen from different contexts.
More specifically, goals of this project include:
The second part of this pro ject consists of programming the Z algorithm and checking it out over large files and patterns with wild cards. Below you have a table with a list of patterns and texts.
|
Pattern
|
Wild Card
|
Text
|
Alphabet
|
|
Pattern1.txt
|
No
|
Text1.txt
|
Binary
|
|
Pattern2.txt
|
Yes, char=X
|
Text2.txt
|
Binary
|
|
Pattern3.txt
|
No
|
Text3.txt
|
Binary
|
|
Pattern4.txt
|
No
|
Text4.txt
|
DNA
|
|
Pattern2.txt
|
Yes, char=X
|
Text5.txt
|
DNA
|
All the files can be found at:
http://www.eui.upm.es/~fmartin/webpgomez/Docencia/Pattern-Recognition/
Solve the SMC problem and after that answer the following questions:
Implementation of algorithms may be done in any language of student’s choice. However, the language and its compiler should support certain features in order to be able to run the experiments properly. The choice of C, C++, Maple or the like should be enough. Source code and a .exe file have to be handed over.
A paper describing the following points must be handed over.
The paper has to be written in correct English; it also has to possess clarity of thought. Show me what you know; do not force to search for it through a poorly written paper.
The whole project counts 50% of your programming final grade. I will take points off if:
I am willing to answer your questions about algorithms, complexity or the experiment. I will not answer questions about coding errors as it is my feeling that, at this point, writing error-free code is
your responsibility.