Introduction

Chanming
Mar 6, 2016

An introduction to Natural Language Processing: its history, techniques, and applications.

History

1950s

“Computing Machinery and Intelligence” - Alan Turing

  • Turing test: measure machine intelligence via a conversational test

“Syntactic Structures” - Noam Chomsky

  • Formal language theory: uses algebra and set theory to define a formal language as a set of symbol sequences
  • “Colourless green ideas sleep furiously”
    • The sentence doesn’t make sense
    • But its grammar seems fine
    • Highlights the difference between semantics (meaning) and syntax (sentence structure); see the toy grammar sketch below
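
To make Chomsky’s point concrete, here is a minimal sketch of a toy context-free grammar that parses the sentence. It assumes Python with the NLTK library installed, and the grammar rules are purely illustrative (not Chomsky’s own): the parse succeeds because the syntax is well-formed, even though the meaning is nonsense.

    import nltk

    # A toy context-free grammar: a system of rules that defines which
    # sequences of words are grammatical, regardless of whether they make sense.
    grammar = nltk.CFG.fromstring("""
        S   -> NP VP
        NP  -> Adj NP | N
        VP  -> V Adv
        Adj -> 'colourless' | 'green'
        N   -> 'ideas'
        V   -> 'sleep'
        Adv -> 'furiously'
    """)

    parser = nltk.ChartParser(grammar)
    sentence = "colourless green ideas sleep furiously".split()

    # A parse tree is found: the grammar (syntax) is satisfied,
    # even though the sentence is semantically nonsensical.
    for tree in parser.parse(sentence):
        tree.pretty_print()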

1960-1970s

  • Symbolic paradigm
    • Generative grammar
    • Discover a system of rules that generates grammatical sentences
      • Parsing algorithms
  • Stochastic paradigm
    • Bayesian method for optical character recognition and authorship attribution
  • First online corpus: Brown corpus of American English
    • 1 million words, 500 documents from different genres (news, novels, etc.)

1970-1980s

  • Stochastic paradigm
    • Hidden Markov models, noisy channel decoding
    • Speech recognition and synthesis
  • Logic-based paradigm
    • More grammar systems (e.g. Lexical Functional Grammar)
  • Natural language understanding
    • Winograd’s SHRDLU
    • Robot embedded in a toy blocks world
    • Program takes natural language commands (e.g. “move the red block to the left of the blue block”)
    • Motivates the field to study semantics and discourse

1980-1990s

  • Finite-state machines
    • Phonology, morphology and syntax
  • Return of empiricism
    • Probabilistic models developed by IBM for speech recognition
    • Inspired other data-driven approaches to part-of-speech tagging, parsing, and semantics
    • Empirical evaluation based on held-out data, quantitative metrics, and comparison with the state of the art

1990-2000s

  • Better computational power
  • Gradual decline in the dominance of Chomskyan theories of linguistics
  • More language corpora developed
    • Penn Treebank, PropBank, RSTBank, etc.
    • Corpora with various forms of syntactic, semantic and discourse annotations
  • Better models adapted from the machine learning community: support vector machines, logistic regression

2000s

  • Emergence of very deep neural networks (i.e. networks with many layers)
  • Started in the computer vision community for image classification
  • Advantage: uses raw data as input (e.g. just words and documents), without the need to develop hand-engineered features
  • Computationally expensive: relies on GPUs to scale to large models and training data
  • Contributed to the AI wave we now experience:
    • Home assistants and chatbots