Introduction
An introduction to Natural Language Processing: its history, techniques and applications.
History
1950s
“Computing Machinery and Intelligence” - Alan Turing
- Turing test: measure machine intelligence via a conversational test
“Syntactic Structures” - Noam Chomsky
- Formal language theory: uses algebra and set theory to define formal languages as sequences of symbols
- “Colourless green ideas sleep furiously”
- Sentence doesn’t make sense
- But its grammar seems fine
- Highlights the difference between semantics (meaning) and syntax (sentence structure); see the toy grammar sketch below
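As an aside (my own illustration, not part of the original notes), the syntax/semantics split can be made concrete with a toy context-free grammar. The rules below are invented for this example and assume NLTK is installed; the parser happily accepts Chomsky's sentence even though it is meaningless.

```python
# Sketch only: a toy context-free grammar (illustrative rules, not Chomsky's)
# that accepts a syntactically well-formed but semantically nonsensical sentence.
# Assumes the nltk package is installed.
import nltk

grammar = nltk.CFG.fromstring("""
    S   -> NP VP
    NP  -> Adj NP | N
    VP  -> V Adv
    Adj -> 'colourless' | 'green'
    N   -> 'ideas'
    V   -> 'sleep'
    Adv -> 'furiously'
""")

parser = nltk.ChartParser(grammar)
tokens = "colourless green ideas sleep furiously".split()

# The parser finds a valid tree: the syntax is fine even though the meaning is not.
for tree in parser.parse(tokens):
    tree.pretty_print()
```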
1960-1970s
- Symbolic paradigm
- Generative grammar
- Discover a system of rules that generates grammatical sentences
- Parsing algorithms
- Stochastic paradigm
- Bayesian methods for optical character recognition and authorship attribution
- First online corpus: Brown corpus of American English
- 1 million words, 500 documents from different genres (news, novels, etc.); see the sketch below
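A quick aside (my own sketch, not from the notes): NLTK ships a copy of the Brown corpus, so the figures above are easy to inspect, assuming NLTK is installed and its `brown` data package has been downloaded.

```python
# Sketch only: inspect NLTK's copy of the Brown corpus.
# Assumes: pip install nltk, then nltk.download('brown') has been run.
from nltk.corpus import brown

print(brown.categories())      # the genres, e.g. 'news', 'fiction', 'romance', ...
print(len(brown.fileids()))    # 500 documents
print(len(brown.words()))      # roughly 1 million word tokens
```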
1970-1980s
- Stochastic paradigm
- Hidden Markov models, noisy channel decoding (see the formulation after this list)
- Speech recognition and synthesis
- Logic-based paradigm
- More grammar systems (e.g. Lexical Functional Grammar)
- Natural language understanding
- Winograd’s SHRDLU
- Robot embedded in a toy blocks world
- Program takes natural language commands (e.g. “move the red block to the left of the blue block”)
- Motivates the field to study semantics and discourse
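As a brief aside, the noisy channel decoding mentioned above can be written out in its standard textbook form (the notation here, O for the observed acoustic signal and W for a candidate word sequence, is my own choice and not spelled out in the notes):

```latex
% Noisy channel decoding: given acoustics O, choose the word sequence \hat{W}.
\hat{W} = \arg\max_{W} P(W \mid O)
        = \arg\max_{W} \frac{P(O \mid W)\, P(W)}{P(O)}
        = \arg\max_{W} \underbrace{P(O \mid W)}_{\text{acoustic (channel) model}} \;
                       \underbrace{P(W)}_{\text{language model}}
```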
1980-1990s
- Finite-state machines
- Phonology, morphology and syntax
- Return of empiricism
- Probabilistic models developed by IBM for speech recognition
- Inspired other data-driven approaches to part-of-speech tagging, parsing, and semantics
- Empirical evaluation based on held-out data, quantitative metrics, and comparison with the state-of-the-art
1990-2000s
- Better computational power
- Gradual lessening of the dominance of Chomskyan theories of linguistics
- More language corpora developed
- Penn Treebank, PropBank, RSTBank, etc.
- Corpora with various forms of syntactic, semantic and discourse annotations
- Better models adapted from the machine learning community: support vector machines, logistic regression (see the sketch below)
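A small sketch (my own example, not from the notes) of the kind of data-driven pipeline this era made standard: bag-of-words features fed to a logistic regression classifier, trained and then applied to held-out text, assuming scikit-learn is installed. The tiny toy dataset is made up for illustration.

```python
# Sketch only: bag-of-words features + logistic regression with scikit-learn.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_texts = [
    "the stock market rose today",
    "shares and bonds were traded",
    "the team won the final match",
    "the striker scored two goals",
]
train_labels = ["finance", "finance", "sport", "sport"]

heldout_texts = ["the market fell after the announcement"]

model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(train_texts, train_labels)
print(model.predict(heldout_texts))   # e.g. ['finance']
```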
2000s
- Emergence of very deep neural networks (i.e. networks with many layers)
- Originated in the computer vision community for image classification
- Advantage: uses raw data as input (e.g. just words and documents), without the need to develop hand-engineered features
- Computationally expensive: relies on GPUs to scale to large models and training data
- Contributed to the AI wave we now experience:
- Home assistants and chatbots