Software name: first20hours/google-10000-english
Software URL: https://github.com/first20hours/google-10000-english
Programming language:
Description:

About This Repo

This repo contains a list of the 10,000 most common English words in order of frequency, as determined by n-gram frequency analysis of Google's Trillion Word Corpus. According to the Google Machine Translation Team:
This repo is derived from Peter Norvig's compilation of the 1/3 million most frequent English words. I limited this file to the 10,000 most common words, then removed the appended frequency counts by running this sed command in my text editor:
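
The count-stripping step can also be sketched in Python. This is a minimal illustration rather than the author's sed command. It assumes each input line holds a word followed by whitespace and a frequency count; the function name `strip_counts` and the sample counts are hypothetical.

```python
def strip_counts(lines, limit=10_000):
    """Return up to `limit` words, dropping any trailing frequency count.

    Assumes each line looks like "<word><whitespace><count>" (an assumption
    about the input format, not something stated in this README).
    """
    words = []
    for line in lines:
        if len(words) >= limit:
            break  # stop once the requested number of words is collected
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        words.append(fields[0])  # first field is the word; the count is discarded
    return words


# Illustrative input only: these counts are made up for the example.
sample = ["the\t300", "of\t200", "and\t100"]
print(strip_counts(sample, limit=2))  # ['the', 'of']
```

Because the input is already sorted by frequency, truncating to the first `limit` entries keeps the most common words without any re-sorting.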
Special thanks to koseki for de-duplicating the list.

Swear-free lists

There are two additional lists which are identical to the original 10,000 word list, but with swear words removed. Swear words were removed based on these lists:
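
Filtering a word list against a block list can be sketched as follows. This is a minimal example and not necessarily how the swear-free lists were produced. The function name `remove_blocked` and the placeholder words are hypothetical, and the case-insensitive match is an assumption.

```python
def remove_blocked(words, blocked):
    """Return `words` without any entry found in `blocked`, keeping the original order."""
    blocked_set = {b.lower() for b in blocked}  # set lookup keeps the filter O(1) per word
    return [w for w in words if w.lower() not in blocked_set]


# Placeholder data for illustration only.
words = ["the", "darn", "of", "heck", "and"]
blocked = ["darn", "heck"]
print(remove_blocked(words, blocked))  # ['the', 'of', 'and']
```

Using a list comprehension over the original sequence keeps the frequency ordering intact, which a set difference would not.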
Word length lists

Three of the lists (all based on the US English list) are based on word length:
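
Splitting a list by word length while preserving its order can be sketched like this. The bucket names and the `short_max` and `medium_max` thresholds are hypothetical defaults chosen for illustration; this text does not state the repo's actual boundaries.

```python
def bucket_by_length(words, short_max=4, medium_max=8):
    """Group words into short/medium/long buckets, keeping input order within each.

    The thresholds are illustrative assumptions, not values taken from the repo.
    """
    buckets = {"short": [], "medium": [], "long": []}
    for word in words:
        if len(word) <= short_max:
            buckets["short"].append(word)
        elif len(word) <= medium_max:
            buckets["medium"].append(word)
        else:
            buckets["long"].append(word)
    return buckets


print(bucket_by_length(["the", "information", "about", "of"]))
# {'short': ['the', 'of'], 'medium': ['about'], 'long': ['information']}
```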
Each list retains the original list sorting (by frequency, descending).

Usage

This repo is useful as a corpus for typing training programs. According to analysis of the Oxford English Corpus, the 7,000 most common English lemmas account for approximately 90% of usage, so a 10,000 word training corpus is more than sufficient for practical training applications.

To use this list as a training corpus in Amphetype, paste the contents into the "Lesson Generator" tab with the following settings:
In the "Sources" tab, you should see google-10000-english available for training. Set WPM at 10 more than your current average, set accuracy to 98%, and you're set to train. Enjoy!