We have what is perhaps the most complete and accurate word frequency list for American English. The data comes from the Corpus of American English. This corpus:
In other words, this is the only wordlist that is based on a large corpus of American English that has texts from many different genres. This frequency data can be used for many different purposes:
To see a sample frequency listing of the top 20,000 words, click on one of the following: text file, Excel. You can also see examples that show frequency information from each of the five genres (spoken, fiction, popular magazines, newspapers, and academic journals), or the frequency information for each word form of each lemma. In addition, I can create pretty much any other format that you want (such as this hierarchical list, which has lemmas grouped first without part of speech (POS), then by POS, then by word form for a given POS).
Interested users can purchase these frequency lists. The one complication, however, is that in 2009 I will be publishing a frequency dictionary of the top 5,000 lemmas of English, based on this list. As a result, we need to control the distribution of the frequency data before late 2009 (so that the data is not used for a product that might compete with this frequency dictionary), and users will therefore need to sign a non-disclosure agreement.
Until the frequency dictionary is published in late 2009, the prices are as follows. If you are interested in purchasing one of these lists or if you have other questions, please contact us.