Computer Science Colloquium

Arabic Natural Language Processing: Challenges, Solutions, Applications

Nizar Habash, Columbia University

April 10, 2013 11:30AM
Warren Weaver Hall, 1302
251 Mercer Street
New York, NY 10012



Dennis Shasha


The Arabic language is spoken by some 300 million people. It is also the language of worship for over 1.5 billion Muslims. In the context of natural language processing, Arabic poses many challenges: it is both morphologically rich and highly ambiguous, with complex morpho-syntactic agreement rules and many irregular forms. Arabic also has a large number of unstandardized dialectal variants that differ from Standard Arabic as much as the Romance languages differ from Latin. In this talk, I will present some of my research on addressing these challenges. My overall approach combines linguistic knowledge with data-driven statistical modeling. I will also discuss the results of using the tools and resources that came out of my research in natural language applications, in particular machine translation.


Dr. Nizar Habash received his PhD in 2003 from the Computer Science Department, University of Maryland College Park. He is currently a research scientist at the Center for Computational Learning Systems at Columbia University. His research includes work on machine translation, natural language generation, lexical semantics, morphological analysis, generation and disambiguation, syntactic parsing, and computational modeling of Arabic dialects. Nizar recently published the book "Introduction to Arabic Natural Language Processing". Nizar’s website is at
