Fast Linguistic Information Retrieval from Big Data -- Introducing writing tools Linggle and WriteAhead
Speaker: Jason Chang, National Tsing Hua University
Location: Warren Weaver Hall 102
Date: January 22, 2016, 11:30 a.m.
Host: Chee Yap
The past decade has witnessed the emergence of corpus-based statistical methods for automatically rating essays and providing corrective feedback. This has the potential to change the landscape of language learning and testing, as well as the future of writing. However, the goal of fully automatic, high-quality Grammatical Error Correction (GEC) is still elusive. The emergence of open big data, coupled with linguistic theory and cloud computing, could open the door to new ways to attack this problem. In this talk, I will describe our ongoing work on providing easy access to big data for Computer Assisted Writing. As two example applications of this line of research, I will demonstrate Linggle, a Web-scale linguistic search engine, and WriteAhead, an interactive writing system, both aimed at helping learners while they write. Linggle enhances Google’s Web 1T Ngram with part-of-speech information and with exhaustive query inversion to support linguistic search and lightning-fast retrieval of recurring phrases, while WriteAhead extracts and displays grammar patterns from scholarly big data in an attempt to help learners improve their writing fluency and accuracy. Both systems are at the forefront of corpus use in their own right: Linggle is the only system that supports versatile linguistic queries on datasets approaching the size of Google’s data (Fletcher, 2012), while WriteAhead addresses the lack of methods for giving feedback on partially written sentences, so aptly pointed out by Hearst (2015).
The speaker will distribute copies of the Linggle 2016 calendar at this talk.
Professor Chang received his PhD from Courant in 1986 under Chee Yap.
Prof. Jason S. Chang is a professor of computational linguistics at National Tsing Hua University, Taiwan. His recent research focuses on designing software systems that address the pressing need to turn Big Data into cloud-based educational services for second language learners. Collaborating with humanities researchers, he has developed several computer-assisted language learning (CALL) tools, notably TotalRecall (bilingual concordance), Linggle (Web-scale linguistic search engine), and WriteAhead (interactive writing environment).
Refreshments will be offered starting 15 minutes prior to the scheduled start of the talk.