Introduction
An introduction to Natural Language Processing: its history, techniques and applications.
History
1950s
“Computing Machinery and Intelligence” - Alan Turing
- Turing test: measure machine intelligence via a conversational test
“Syntactic Structures” - Noam Chomsky
- Formal language theory: uses algebra and set theory to define formal languages as sequences of symbols
- “Colourless green ideas sleep furiously”
- Sentence doesn’t make sense
- But its grammar seems fine
- Highlights the difference between semantics (meaning) and syntax (sentence structure); see the toy grammar sketch below
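As an aside (my own illustration, not part of the original notes), the syntax/semantics split can be made concrete with a toy context-free grammar. The rules below are invented for this example and assume NLTK is installed; the parser happily accepts Chomsky's sentence even though it is meaningless.

```python
# Sketch only: a toy context-free grammar (illustrative rules, not Chomsky's)
# that accepts a syntactically well-formed but semantically nonsensical sentence.
# Assumes the nltk package is installed.
import nltk

grammar = nltk.CFG.fromstring("""
    S   -> NP VP
    NP  -> Adj NP | N
    VP  -> V Adv
    Adj -> 'colourless' | 'green'
    N   -> 'ideas'
    V   -> 'sleep'
    Adv -> 'furiously'
""")

parser = nltk.ChartParser(grammar)
tokens = "colourless green ideas sleep furiously".split()

# The parser finds a valid tree: the syntax is fine even though the meaning is not.
for tree in parser.parse(tokens):
    tree.pretty_print()
```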
1960-1970s
- Symbolic paradigm
- Generative grammar
- Discover a system of rules that generates grammatical sentences
- Parsing algorithms
- Stochastic paradigm
- Bayesian methods for optical character recognition and authorship attribution
- First online corpus: Brown corpus of American English
- 1 million words, 500 documents from different genres (news, novels, etc.); see the sketch below
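A quick aside (my own sketch, not from the notes): NLTK ships a copy of the Brown corpus, so the figures above are easy to inspect, assuming NLTK is installed and its `brown` data package has been downloaded.

```python
# Sketch only: inspect NLTK's copy of the Brown corpus.
# Assumes: pip install nltk, then nltk.download('brown') has been run.
from nltk.corpus import brown

print(brown.categories())      # the genres, e.g. 'news', 'fiction', 'romance', ...
print(len(brown.fileids()))    # 500 documents
print(len(brown.words()))      # roughly 1 million word tokens
```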
1970-1980s
- Stochastic paradigm
- Hidden Markov models, noisy channel decoding (see the formulation after this list)
- Speech recognition and synthesis
- Logic-based paradigm
- More grammar systems (e.g. Lexical Functional Grammar)
- Natural language understanding
- Winograd’s SHRDLU
- Robot embedded in a toy blocks world
- Program takes natural language commands (e.g. “move the red block to the left of the blue block”)
- Motivates the field to study semantics and discourse
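As a brief aside, the noisy channel decoding mentioned above can be written out in its standard textbook form (the notation here, O for the observed acoustic signal and W for a candidate word sequence, is my own choice and not spelled out in the notes):

```latex
% Noisy channel decoding: given acoustics O, choose the word sequence \hat{W}.
\hat{W} = \arg\max_{W} P(W \mid O)
        = \arg\max_{W} \frac{P(O \mid W)\, P(W)}{P(O)}
        = \arg\max_{W} \underbrace{P(O \mid W)}_{\text{acoustic (channel) model}} \;
                       \underbrace{P(W)}_{\text{language model}}
```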
1980-1990s
- Finite-state machines
- Phonology, morphology and syntax
- Return of empiricism
- Probabilistic models developed by IBM for speech recognition
- Inspired other data-driven approaches to part-of-speech tagging, parsing, and semantics
- Empirical evaluation based on held-out data, quantitative metrics, and comparison with the state-of-the-art
1990-2000s
- Better computational power
- Gradual lessening of the dominance of Chomskyan theories of linguistics
- More language corpora developed
- Penn Treebank, PropBank, RSTBank, etc.
- Corpora with various forms of syntactic, semantic and discourse annotations
- Better models adapted from the machine learning community: support vector machines, logistic regression (see the sketch below)
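A small sketch (my own example, not from the notes) of the kind of data-driven pipeline this era made standard: bag-of-words features fed to a logistic regression classifier, trained and then applied to held-out text, assuming scikit-learn is installed. The tiny toy dataset is made up for illustration.

```python
# Sketch only: bag-of-words features + logistic regression with scikit-learn.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_texts = [
    "the stock market rose today",
    "shares and bonds were traded",
    "the team won the final match",
    "the striker scored two goals",
]
train_labels = ["finance", "finance", "sport", "sport"]

heldout_texts = ["the market fell after the announcement"]

model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(train_texts, train_labels)
print(model.predict(heldout_texts))   # e.g. ['finance']
```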
2000s
- Emergence of very deep neural networks (i.e. networks with many layers)
- Originated in the computer vision community for image classification
- Advantage: uses raw data as input (e.g. just words and documents), without the need to develop hand-engineered features
- Computationally expensive: relies on GPUs to scale to large models and training data
- Contributed to the AI wave we now experience:
- Home assistants and chatbots