Course Information

1.Introduction. This is an introductory course that covers some of the most fundamental topics of exact string pattern recognition. There will be general descriptions of those topics, but there will not be an in-depth discussion of each. Instead, the course is intended to give the student an overview of the field.

2.Course goals. The goals of this course include:

2.1.To know the theoretical and algorithmic foundations of exact string pattern recognition.

2.2.To provide the students with a hands-on approach that will include to know the practical issues involved in the programming of pattern recognition algorithms.

2.3.To know the main applications of exact string pattern recognition to other problems in computer science.

2.4.To know some applications of exact string pattern recognition to problems found in other fields, in particular, in Computational Biology and Computational Music Theory.

3.Course content.

3.1.Review of some basic concepts on complexity, data structures and algorithms.

3.2.Exact pattern recognition. The brute-fore algorithm. Algorithms based on preprocessing. Preprocessing in linear time. Linear-time exact matching algorithm.

3.3.The Boyer-Moore algorithm. Analysis of their complexity. The Knuth-Morris-Pratt algorithm. Pattern recognition with finite automata. Real-time string matching. al.

3.4.Preprocessing in the Knuth-Morris-Pratt algorithm. Exactg matching with a set of patterns.

3.5.Introduction to suffix trees. The naive algorithm to build suffix trees. Ukkonen’s linear-time suffix tree algorithm. Practical implementation issues.

3.6.The edit distance between two strings. Dynamic programming calculation of edit distance. String similarity.

3.7.Aplications of exact string pattern recognition algorithms. Suffix trees and the exact set matching problem. The substring of more than two strings. Longest common substring of two strings. DNA contamination. Circular string linearization. The edit distance and the problem of melodic similarity.

4.Grading.

4.1.For the February examination session:

4.1.1.Attendance of the 75% of sessions is required.

4.1.2.Course grade will be assigned based on scores on four homework assignments. There will be both theoretical and practical (programming) assignnments. There will at least 4 assignments and at most 6, depending on time and pace.

4.1.3.Each assignment will have the same weight over the final grade.

4.1.4.One of the assignments will consist of a programming project.

4.1.5.A pass is obtained with 50 points over 100.

4.2.For the other examination sessions students will have to hand over a project (60%) and write an exam (40%).

 

5.Prerequisites. There is no formal prerequisites for this course.

6.References.

6.1.Cormen, Leiserson y Rivest, Introduction to Algorithms, McGrawHill, 2001.

6.2.Dan Gusfield, Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology, Cambridge University Press, 1997.

6.3.Web site of Algorithms in Real World: http://www.cs.cmu.edu/afs/cs/project/pscico-guyb/realworld/www/index.