|
| |
Corpus of Contemporary American English |
American National Corpus 2 |
|
Size |
400+ million words 1 |
22 million words |
|
Dates |
1990 - 2009 |
1990 - ?? 3 |
|
Date distribution |
20 million words each year |
0.5-3 million |
|
Updated |
Yes, 1-2 times/year |
No (??) 4 |
|
Availability / price |
Free access (but only via web interface) |
Free (via Open ANC), or DVD ($75, from the LDC). Full text access. |
|
Spoken |
83 million words (4m each year, 1990-2009)
Transcripts of unscripted conversation on 150+ different TV and radio programs (ABC, CBS, NBC, Fox, PBS, NPR, etc.) |
4 million words
Call-Home, Charlotte, MICASE, Switchboard |
|
Fiction |
79 million words (4m each year, 1990-2009)
Short stories and plays from literary magazines, children’s magazines, popular magazines, first chapters of first edition books 1990-present, movie and TV scripts |
0.5 million words
From the publishers Hargraves and Eggan |
|
Magazines (popular) |
84 million words (4m each year, 1990-2009)
100 magazines; balanced between news, health, home and gardening, women, financial, religion, sports, etc. |
5 million words
2 magazines: Slate (politics) and Verbatim (linguistics) |
|
Newspapers |
79 million words (4m each year, 1990-2009)
10 newspapers, including USA Today, New York Times, Atlanta Journal Constitution, San Francisco Chronicle, etc. |
4 million words
1 newspaper: New York Times |
|
Academic (journals) |
79 million words (4m each year, 1990-2009)
100 journals. Balanced coverage of the entire range of the Library of Congress classification system (K = education, T = technology, etc.) |
4 million words
2 journals: BioMed and PLOS (Public Library of Science) |
|
Other text types |
|
3 million words: Blog (Buffy the Vampire Slayer)
1m: Travel guide (Berlitz)
1m: Government ("web data" [??])
<1m: Miscellaneous (9/11 report, letters, other non-fiction)
|
Notes
1 The Corpus of Contemporary American English contained about 365 million words
when it was released in early 2008 (20 million words each year, 1990-2007).
As of mid-2009, it has more than 400 million words. It will continue
to grow by 20 million words each year.
2 Refers to the Second Release (2005) of the American National Corpus. There has
not been a Third Release since that time.
3 This is probably a function of whether/when the ANC is completed.
4 The ANC was projected to have 100 million words upon completion in c. 2005. No
plans have been announced to expand the corpus beyond that size, if/when the
corpus is completed.
|