Algorithms and Data Structures for Bioinformatics: BI-GY 7573




Lecturer:
Professor B. Mishra
with
Teaching Assistants:
Sumit Aggarwal [ email: sa4764@nyu.edu ]

Calendar
First Day of Class: September 04 2018
Last Day of Class: December 14 2018

Cancelled Classes (tentative):
Note that some of these classes may be covered by Guest Lectures.


Office Hours: By appt.
Office Phone: 212.998.3464
Email Address: mishra@nyu.edu

Day, Time and Place:
Wednesday, 8:00-9:15pm EST, Webex.

Credits for Course:
3

Prerequisites:
Mathematical Maturity (Discrete Mathematics and Linear Algebra) and Programming

Grading Policy:
Syllabus:
The online course is aimed at introducing the foundational ideas from computer science in designing and implementing bioinformatics algorithms. The goal of the underlying algorithms and data structures is to accurately abstract and model the biological problems and to devise provably correct procedures with efficient computational complexity bounds. The algorithms will be described in pseudo-codes in order to simplify the correctness and complexity analysis, but with sufficient details to enable the students implement them in any suitable software pipelines and hardware architectures.

Topic of Modules 1 & 2

Biology, Genomics, Algorithms and Data Structures. By the end of this module, the students will have familiarity with basic concepts of biology in terms genomics, transcriptomics and proteomics and how they are used in formulating combinatorial problems leading to data science algorithms and their complexity. The students will also have a historical perspectives.
[Reading 1] https://en.wikipedia.org/wiki/Algorithm;
https://en.wikipedia.org/wiki/Muhammad_ibn_Musa_al-Khwarizmi
[Reading 2] https://en.wikipedia.org/wiki/Genomics;
https://en.wikipedia.org/wiki/Human_Genome_Project

Topic of Module 3, 4, 5 & 6

Objective(s) By the end of this module, the students will have familiarity with basic concepts of Graph Theory and ability to devise new graph algorithms for bioinformatics applications. Topic of interest: Graph Representation, Trees, DeBruijn Graph, DFS and BFS, Shortest Paths, Eulerian Paths, Hamiltonian Paths, Applications to Genomics
[Reading 1] Cormen et al. Chapters 23 and 25.
[Reading 2] https://en.wikipedia.org/wiki/Hamiltonian_path;
https://en.wikipedia.org/wiki/Eulerian_path

Topic of Module 7, 8 & 9.

Strings & Stringology.
By the end of this module, the students will have familiarity with basic concepts of sequence analysis and will be able to devise new string theoretic algorithms to manipulate data structures involving strings over finite alphabets. Topics of interest: Suffix Trees and Suffix Arrays, String Matching, Edit Distance, Alignment, BLAST, BLAT.
[Reading 1] Cormen et al. Chapter 34
[Reading 2] https://en.wikipedia.org/wiki/BLAST

Topic of Module 10, 11 & 12

Sorting and Searching
By the end of this module, the students will have familiarity with basic concepts of representation of bioinformatics statistics and data and will be able to devise algorithms to search, organize and analyze data using efficient algorithms. Topics of interest: Order Statistics, Sorting, Binary Search, Merge Sort, Heap Sort, Quick Sort, Selection and Median.
[Reading 1] Cormen et al., Chapters 7, 8, 9, & 10.
[Reading 2] https://en.wikipedia.org/wiki/Sorting_algorithm; https://en.wikipedia.org/wiki/Search_algorithm

Topic of Module 13

Computational Complexity Analysis
By the end of this module, the students will have familiarity with basic concepts of complexity theory and will be able to understand, devise and solve recurrence equations for expressing the time complexity of an algorithm.
[Reading 1] Cormen et al. Chapters 2, 3 and 4.
[Reading 2] https://en.wikipedia.org/wiki/Recurrence_relation

Topic of Module 14

Future!


Required Text(s):


Midterm Date:
No Midterm.
Final Date:
Class Project.
Homework(s):
Class Presentation.

Bud Mishra
September 1 2017