[ COMPARE COCA TO THE ANC, BNC, BOE, OEC ]

 

| | Corpus of Contemporary American English | American National Corpus [2] |
|---|---|---|
| Size | 400+ million words [1] | 22 million words |
| Dates | 1990 - 2009 | 1990 - ?? [3] |
| Date distribution | 20 million words each year | 0.5-3 million |
| Updated | Yes, 1-2 times/year | No (??) [4] |
| Availability / price | Free access (but only via web interface) | Free (via Open ANC), or DVD ($75, from the LDC). Full text access. |
| Spoken | 83 million words (4m each year, 1990-2009). Transcripts of unscripted conversation on 150+ different TV and radio programs (ABC, CBS, NBC, Fox, PBS, NPR, etc.) | 4 million words. Call-Home, Charlotte, MICASE, Switchboard |
| Fiction | 79 million words (4m each year, 1990-2009). Short stories and plays from literary magazines, children's magazines, popular magazines, first chapters of first edition books 1990-present, movie and TV scripts | 0.5 million words. From the publishers Hargraves and Eggan |
| Magazines (popular) | 84 million words (4m each year, 1990-2009). 100 magazines; balanced between news, health, home and gardening, women, financial, religion, sports, etc. | 5 million words. 2 magazines: Slate (politics) and Verbatim (linguistics) |
| Newspapers | 79 million words (4m each year, 1990-2009). 10 newspapers, including USA Today, New York Times, Atlanta Journal Constitution, San Francisco Chronicle, etc. | 4 million words. 1 newspaper: New York Times |
| Academic (journals) | 79 million words (4m each year, 1990-2009). 100 journals. Balanced coverage of the entire range of the Library of Congress classification system (K = education, T = technology, etc.) | 4 million words. 2 journals: BioMed and PLOS (Public Library of Science) |
| Other text types | | 3 million words: Blog (Buffy the Vampire Slayer); 1m: Travel guide (Berlitz); 1m: Government ("web data" [??]); <1m: Miscellaneous (9/11 report, letters, other non-fiction) |

Notes

1 The Corpus of Contemporary American English contained about 365 million words when it was released in early 2008 (20 million words each year, 1990-2007). As of mid-2009, it has more than 400 million words. It will continue to grow by 20 million words each year.
2 Refers to the Second Release (2005) of the American National Corpus. There has not been a Third Release since that time.
3 This probably depends on whether and when the ANC is completed.
4 The ANC was projected to have 100 million words upon completion in c. 2005. No plans have been announced to expand the corpus beyond that size, if and when the corpus is completed.
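
As a rough consistency check on the figures above, the genre sizes listed for each corpus can be summed and compared with the stated totals. This is only a sketch: the dictionary names are illustrative, the numbers are copied from the comparison (in millions of words), and the ANC "<1m" miscellaneous category is left out because no exact figure is given.

```python
# Genre sizes in millions of words, copied from the comparison above.
coca = {
    "spoken": 83,
    "fiction": 79,
    "magazines": 84,
    "newspapers": 79,
    "academic": 79,
}
anc = {
    "spoken": 4,
    "fiction": 0.5,
    "magazines": 5,
    "newspapers": 4,
    "academic": 4,
    "blog": 3,
    "travel guide": 1,
    "government": 1,
    # "<1m" miscellaneous is omitted: no exact size is given.
}

coca_total = sum(coca.values())
anc_total = sum(anc.values())

# Size implied by the design of 20 million words per year for 1990-2007 (note 1).
years_at_release = 2007 - 1990 + 1
implied_at_release = 20 * years_at_release

print(f"COCA genres sum to {coca_total} million words")
print(f"ANC listed categories sum to {anc_total} million words")
print(f"20m/year over {years_at_release} years implies {implied_at_release} million words")
```

Under these figures, the COCA genres add up to slightly over 400 million words and the listed ANC categories to roughly 22 million, which agrees with the stated sizes. The 20-million-per-year design implies about 360 million words for 1990-2007, close to the "about 365 million" given in note 1.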