Treebanks and their uses

Treebanks are corpora of syntactically annotated sentences. Each sentence in a treebank has been assigned a parse tree, manually or semi-automatically, carrying at least syntactic and morphosyntactic annotation.

The annotation in a treebank typically comprises at least a part-of-speech (i.e. word class) level and a syntactic level. The word-level annotation may also include lemma information and morphological descriptions, and some treebanks add a further annotation level for semantics.
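
The annotation levels described above can be pictured as a small data structure. The following Python sketch is illustrative only: the class names, field names, tag labels and the example sentence are assumptions made for this example, not the scheme of any particular treebank.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union


@dataclass
class Token:
    """Word-level annotation for one token (field names are hypothetical)."""
    form: str                                            # surface form in the sentence
    pos: str                                             # part-of-speech (word class) tag
    lemma: Optional[str] = None                          # optional lemma
    morph: Dict[str, str] = field(default_factory=dict)  # optional morphological features


@dataclass
class Node:
    """Syntactic-level annotation: a labelled phrase over tokens or other phrases."""
    label: str
    children: List[Union["Node", Token]] = field(default_factory=list)


def leaves(node: Union[Node, Token]) -> List[Token]:
    """Return the tokens under a node in left-to-right order."""
    if isinstance(node, Token):
        return [node]
    result: List[Token] = []
    for child in node.children:
        result.extend(leaves(child))
    return result


# A toy annotated sentence, invented for illustration: "The dogs bark"
sentence = Node("S", [
    Node("NP", [
        Token("The", "DET", lemma="the"),
        Token("dogs", "NOUN", lemma="dog", morph={"Number": "Plur"}),
    ]),
    Node("VP", [
        Token("bark", "VERB", lemma="bark"),
    ]),
])

for token in leaves(sentence):
    print(token.form, token.pos, token.lemma, token.morph)
```

A semantic level, where a treebank has one, could be modelled as further optional fields on `Token` or `Node`; it is left out here to keep the sketch short.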

Treebanks have several applications in linguistics, computational linguistics and natural language processing. Linguists use them, among other things, to search for examples and counter-examples to a hypothesis or theory. Psycholinguists can use treebanks to count the frequencies of specific sentence construction types. Applications in computational linguistics include the development and evaluation of text classification, parsing and machine translation systems.
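
Counting the frequencies of construction types, as mentioned above, can be sketched in a few lines of Python. The example assumes trees stored as bracketed strings in a Penn Treebank-like notation and treats each phrase-structure rule (a parent label with its child labels) as one "construction type". The three trees are invented for illustration, and the helper names `parse_bracketed` and `productions` are hypothetical.

```python
from collections import Counter


def parse_bracketed(text):
    """Parse a bracketed tree string into nested (label, children) tuples.

    Words at the leaves are kept as plain strings.
    """
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        assert tokens[pos] == "(", "expected an opening bracket"
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos])  # a word
                pos += 1
        pos += 1  # consume the closing bracket
        return (label, children)

    return read()


def productions(tree):
    """Yield (parent_label, child_labels) for every phrase node in the tree."""
    label, children = tree
    if all(isinstance(child, str) for child in children):
        return  # a part-of-speech tag directly over a word is not a phrase rule
    subtrees = [child for child in children if isinstance(child, tuple)]
    yield (label, tuple(sub[0] for sub in subtrees))
    for sub in subtrees:
        yield from productions(sub)


# Toy treebank, invented for illustration only.
toy_treebank = [
    "(S (NP (DT the) (NN dog)) (VP (VBZ barks)))",
    "(S (NP (DT a) (NN cat)) (VP (VBZ sees) (NP (DT the) (NN dog))))",
    "(S (NP (NNS dogs)) (VP (VBP bark)))",
]

counts = Counter()
for line in toy_treebank:
    counts.update(productions(parse_bracketed(line)))

for (parent, kids), n in counts.most_common():
    print(f"{parent} -> {' '.join(kids)}: {n}")
```

In this toy data the rule `VP -> VBZ NP` (a transitive verb phrase) occurs once, while `NP -> DT NN` occurs three times. A real study would run the same kind of count over a full treebank and would usually need more refined patterns than single rules.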