wordlists
The Harvard Sentences ⚙️ ( https://www.cs.columbia.edu/~hgs/audio/harvard.html ) contain about 1880 distinct words across the 720 sentences. 💡 ( My script split the word "don't" into "don" and "t", so the count is approximate.)
The word "the" is used strikingly often: 746 times. The next most common words trail far behind ⚙️ ( "a" at 212, "of" at 132, "to" at 123) .
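For reference, counts like these can be reproduced with a few lines of Python. This is only a minimal sketch, not the script actually used: the function name `count_words` and the letters-only tokenizer are assumptions, and `sample` is a short stand-in text rather than the full 720-sentence list. The tokenizer is chosen so that it shows the same "don't" splitting mentioned above.

```python
import re
from collections import Counter


def count_words(text: str) -> Counter:
    """Count word frequencies (hypothetical helper, not the original script)."""
    # Lowercase, then keep only runs of ASCII letters. Apostrophes act as
    # separators, so "don't" is counted as the two tokens "don" and "t".
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(tokens)


# Stand-in sample text; replace with the full sentence list to get real counts.
sample = (
    "The birch canoe slid on the smooth planks. "
    "Glue the sheet to the dark blue background. "
    "Don't skip it."
)

counts = count_words(sample)
print(len(counts), "distinct words")
print(counts.most_common(3))  # the most frequent words with their counts
```

A tokenizer that keeps apostrophes inside words (for example the pattern `[a-z]+(?:'[a-z]+)*`) would count "don't" as a single word and shift the distinct-word total slightly.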
While these are useful sample sentences for certain purposes 💡 ( such as the "what is the verb in this sentence" test for 1B-size LLMs) , the set is too small and idiosyncratic to serve as a useful word bank.