Center for Data Science and Analytics Talk by Daichi Mochihashi

Bayesian Unsupervised Word Segmentation and Beyond
Date & Time: 
Monday, April 17, 2017 - 12:00 to 13:00
Speaker: Daichi Mochihashi
Location: 603, 1555 Century Avenue, Pudong New Area, Shanghai

For NYUSH community RSVP HERE

For those outside the NYUSH community, please RSVP by email

Many languages, including Japanese and Chinese, are written without word boundaries, so word segmentation is a crucial first step in natural language processing. However, because languages inevitably contain novel words and expressions that no dictionary covers, ordinary supervised machine learning methods cannot handle such phenomena. In this talk, I will present a completely unsupervised approach to word segmentation from a Bayesian point of view, which can recognize "words" in raw strings without any human intervention. This language model can be readily applied to any language, even an "alien" one. I will also show some recent extensions to this model, specifically for recognizing motion "words" in robotics using Gaussian processes.

Daichi Mochihashi is an associate professor at the Institute of Statistical Mathematics, Tokyo, Japan. He obtained his BS from the University of Tokyo in 1998 and his PhD from Nara Institute of Science and Technology in 2005. His research interests include statistical natural language processing and machine learning, especially nonparametric Bayesian statistics.

Professor Ryo Okui will introduce Professor Daichi Mochihashi. This event is sponsored by the Center for Data Science and Analytics.

Location & Details: 

To our visitors

  • RSVP may be required for this event. Please check the event details
  • Visitors will need to present a photo ID at the entrance
  • There is no public parking on campus
  • Entrance only through the South Lobby (1555 Century Avenue) 
  • Taxi card


Metro: Century Avenue Station, Metro Lines 2/4/6/9 Exit 6 in location B

Bus: Century Avenue at Pudian Road, Bus Lines 169/987