Corpus Linguistics


alignment
linking (associating) corresponding elements of source and target texts in a parallel corpus; can be done automatically or semi-automatically
tag
a piece of code attached to words in a text representing some feature relating to that word, or the physical markup of an element
concordance
a list of occurrences of a word or set of words shown in context
KWIC (key word in context)
a type of concordance in which the key word is shown with a fixed number of words of context on either side, with the key word centred on the page
KWAL (key word and line)
a type of concordance that shows one or more full lines of context on either side of the key word
span
the number of words of co-text, to the left and right, shown with the word selected for study (e.g. -4, +4 means four words before and four words after)
chunking
roughly dividing sentences into non-overlapping segments
disambiguation
eliminating ambiguity by choosing a specific tag (code) from available options
encoding
representing textual and linguistic data (corpus annotations, tags) in a certain format, usually standardized
SGML (Standard Generalized Markup Language)
an internationally recognized text encoding standard widely used in corpus processing
parsing
assigning syntactic structure to the sentences of a text; a common form of corpus annotation
treebank
a parsed corpus
full parsing
a type of parsing that tries to provide the most detailed sentence structure possible
skeleton parsing
a less detailed approach to parsing that ignores finer points of structure
corpus (pl. corpora)
any body of text(s), especially a machine-readable one
parallel corpus
(a.k.a. aligned corpus, translation corpus) a corpus containing different language versions of the same texts
comparable corpus
a set of corpora in different languages, each built to the same compositional pattern
monitor corpus
a growing and consistently structured collection of texts used mainly in lexicography to reflect language change
monolingual corpus
a corpus of texts in a single language
multilingual corpus
a collection of individual monolingual corpora in different languages
unannotated corpus
a corpus that exists as raw plain text
opportunistic corpus
a corpus that may be in many ways deficient (raw, unannotated, incomplete) but is otherwise cheap and easily available
frequency list
a list of lexical items ordered by frequency count
lexicon
dictionary: a collection of words and related information
CAT (computer-aided translation)
computer systems and software that help human translators work more effectively and reliably
authenticity
the quality that characterizes naturally occurring corpus data
collocation
the characteristic co-occurrence of words in recurring patterns
error tagging
assigning codes to indicate the types of errors occurring in a learner corpus
metadata
data about data, typically contextual information of corpus samples (where they came from)
sorting
arranging items in a given order
wildcard
a special character used in searching or matching to stand for other characters (often * for any sequence of characters and ? for any single character)
corpus analysis
statistical probing, manipulating and generalizing from the corpus dataset
keyword
a significantly frequent (or infrequent) word selected for study
annotation
tagging texts with various forms of information (phonetic, prosodic, syntactic, semantic, pragmatic, etc.)
regular expression
a search pattern that combines literal characters with wildcards and other special symbols, used for complex searches
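
Several of the terms above (frequency list, KWIC concordance, span, regular expression) can be illustrated together in a minimal Python sketch. The function names, the simple lowercase tokenizer, and the sample sentence are all invented for illustration; real concordancers handle tokenization, sorting, and display in far more detail.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens (a simplifying assumption)."""
    return re.findall(r"[a-z']+", text.lower())

def frequency_list(tokens):
    """Return (word, count) pairs ordered from most to least frequent."""
    return Counter(tokens).most_common()

def kwic(tokens, pattern, span=4):
    """Return (left context, key word, right context) for each token
    that fully matches the regular expression `pattern`.
    `span` is the number of words of co-text kept on each side."""
    regex = re.compile(pattern)
    lines = []
    for i, token in enumerate(tokens):
        if regex.fullmatch(token):
            left = " ".join(tokens[max(0, i - span):i])
            right = " ".join(tokens[i + 1:i + 1 + span])
            lines.append((left, token, right))
    return lines

# Hypothetical toy text, not taken from any real corpus.
sample = ("The corpus was tagged and the tagged corpus "
          "was parsed into a treebank of corpora")
tokens = tokenize(sample)

# Frequency list: lexical items ordered by frequency count.
for word, count in frequency_list(tokens)[:4]:
    print(f"{count:>3}  {word}")

# KWIC concordance with a span of -2, +2; the regular expression
# matches both the singular "corpus" and the plural "corpora".
for left, key, right in kwic(tokens, r"corp(us|ora)", span=2):
    print(f"{left:>20}  {key}  {right}")
```

Right-aligning the left context keeps the key word in a single column, which is what gives a KWIC display its characteristic centred layout.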