[ COMPARE COCA TO OTHER CORPORA: BNC, ANC, BOE, OEC ]

The Corpus of Contemporary American English (COCA) offers a balance of availability, size, genres, and currency (how recent it is) that is not found in other corpora, including the American National Corpus (ANC), the British National Corpus (BNC), the Bank of English (BOE), or the Oxford English Corpus (OEC).

In addition, the architecture used for COCA (and the other corpora from corpus.byu.edu) is among the most powerful available for online corpora. It is at least as scalable and feature-rich as Corpus Workbench (including its incarnations in Sketch Engine and BNCweb) and as other architectures such as VISL and PIE.

The chart below provides a summary of the features of the different corpora, and detailed discussions can be found via the links at the top of each column.
 
Feature | COCA | BNC | ANC | BOE | OEC
Availability | Free / web | Free / web | Free | $1150 per year | (Limited) [1]
Size (millions of words) | 400 | 100 | 22 | 455 | 1,898
Time span | 1990-2009 | 1970s-1993 | 2000-2005 ? [2] | 1970s-2005 | 2000-2006
Number of words of text being added each year | 20 million | 0 | 0 | 0 | 0
Can be used as a monitor corpus to see ongoing changes in English | Yes | No | No | (Limited) | (Limited)
Wide range of genres: spoken, fiction, popular magazine, newspaper, academic | Yes | Yes | No | (Yes) | (Yes)
Size of spoken (millions of words) | 83 | 10 | 4 | 62 | 82
Spoken = conversational, unscripted | ? (Mostly: notes) | Yes | Yes | (Some) | (Some)
Dialect | American | British | American | Br / Am + | Am / Br +
Interface | BYU | BYU, Sketch Engine, BNCweb, VISL, PIE | Open ANC (DVD) | Sketch Engine | Sketch Engine

Notes:

1. The OEC is generally available only to researchers at Oxford University Press, although access is occasionally given to selected researchers outside of the OUP.
2. In 2005 the ANC had 22 million words, and the current site still mentions 22 million words, but some people have claimed that new texts have been added since 2005. We'd appreciate details from anyone who knows the current status.
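
The size figures in the chart can also be compared as proportions rather than raw counts. The short Python sketch below computes what share of each corpus is spoken material, using only the "Size" and "Size of spoken" rows above. The names `corpora` and `spoken_share` are illustrative and are not part of any corpus interface; the result is only as reliable as the chart's figures, some of which are approximate.

```python
# Figures copied from the chart above, in millions of words.
corpora = {
    "COCA": {"total": 400, "spoken": 83},
    "BNC": {"total": 100, "spoken": 10},
    "ANC": {"total": 22, "spoken": 4},
    "BOE": {"total": 455, "spoken": 62},
    "OEC": {"total": 1898, "spoken": 82},
}


def spoken_share(total, spoken):
    """Return the spoken portion as a percentage of the whole corpus."""
    return 100.0 * spoken / total


# Map each corpus name to its spoken percentage.
shares = {name: spoken_share(**sizes) for name, sizes in corpora.items()}

# Print the corpora from the largest spoken share to the smallest.
for name, pct in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {pct:.1f}% spoken")
```

On these figures, COCA has the largest spoken share and the OEC the smallest, even though their spoken portions are nearly the same size in absolute terms (83 versus 82 million words).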