{"channel":"cities","content":"<green> Jamestown, North Dakota, is located along I-94 in the eastern half of the state.\r\n\r\nToday's focus is on \"word frequency\".\r\n\r\n----\r\n\r\n<gray> <<< I started with two corpuses: one of 19th century literature (from Project Gutenberg), one of 20th century \"sci-fi\" literature.  I got a rough word-rank for each, and combined them (<green> using the harmonic mean) to get a combined word-list. >>>\r\n\r\n<teal> <<< While many high-frequency function words such as *the*, *and*, and *of* maintain consistent rankings, others like *said*, *her*, *she*, and *me* show substantial divergence, suggesting notable stylistic or thematic shifts between the two periods and genres. >>>\r\n\r\n----\r\n\r\nIt is also hardly a word-list at all.  Some of the notes:\r\n\r\n# The word \"whale\" shows up a lot more in the 19th century corpus.  This is because one of the books is [[Moby Dick]].\r\n# I am hoping to run an exhaustive listing of a few attributes.  These include:\r\n> polysemy. (<green> I am less concerned with words like << get >>, which have so many meanings as to be indefinable, than with words like << saw >> (\u770b or \u952f\u5b50) or << face >> (\u9762\u5411 or \u8138))\r\n> by lemma.  \"went\" (108th) v. \"go\" (80th).\r\n> by part-of-speech, 
defined as \"what the LLMs define as part-of-speech\".\r\n> a \"second-level\" of word-type details.","created_at":"2025-04-06T20:39:52.199751","id":343,"llm_annotations":{},"parent_id":null,"processed_content":"<p><span class=\"colorblock color-green\">\n    <span class=\"sigil\">\u2699\ufe0f</span>\n    <span class=\"colortext-content\"> Jamestown, North Dakota, is located along I-94 in the eastern half of the state.\r</span>\n  </span></p>\n<p>Today's focus is on \"word frequency\".\r</p> <hr class=\"section-break\" /> <p><div class=\"mlq color-gray\"><button type=\"button\" class=\"mlq-collapse\" aria-label=\"Toggle visibility\"><span class=\"mlq-collapse-icon\">\ud83d\udcad</span></button><div class=\"mlq-content\"><p> I started with two corpuses: one of 19th century literature (from Project Gutenberg), one of 20th century \"sci-fi\" literature.  I got a rough word-rank for each, and combined them <span class=\"colorblock color-green\">\n    <span class=\"sigil\">\u2699\ufe0f</span>\n    <span class=\"colortext-content\">( using the harmonic mean)</span>\n  </span> to get a combined word-list. </p></div></div>\r</p>\n<p><div class=\"mlq color-teal\"><button type=\"button\" class=\"mlq-collapse\" aria-label=\"Toggle visibility\"><span class=\"mlq-collapse-icon\">\ud83e\udd16</span></button><div class=\"mlq-content\"><p> While many high-frequency function words such as <em>the</em>, <em>and</em>, and <em>of</em> maintain consistent rankings, others like <em>said</em>, <em>her</em>, <em>she</em>, and <em>me</em> show substantial divergence, suggesting notable stylistic or thematic shifts between the two periods and genres. </p></div></div>\r</p> <hr class=\"section-break\" /> <p>It is also hardly a word-list at all.  Some of the notes:\r</p>\n<ul>\n<li class=\"number-list\"> The word \"whale\" shows up a lot more in the 19th century corpus.  
This is because one of the books is <a href=\"https://en.wikipedia.org/wiki/Moby_Dick\" class=\"wikilink\" target=\"_blank\">Moby Dick</a>.\r</li>\n<li class=\"number-list\"> I am hoping to run an exhaustive listing of a few attributes.  These include:\r</li>\n</ul>\n<ul>\n<li class=\"arrow-list\"> polysemy. <span class=\"colorblock color-green\">\n    <span class=\"sigil\">\u2699\ufe0f</span>\n    <span class=\"colortext-content\">( I am less concerned with words like <span class=\"literal-text\">get</span>, which have so many meanings as to be indefinable, than with words like <span class=\"literal-text\">saw</span> (<span class=\"annotated-chinese\" data-pinyin=\"K\u00c0N\" data-definition=\"to see; to look at\">\u770b</span> or <span class=\"annotated-chinese\" data-pinyin=\"J\u00d9 ZI\" data-definition=\"a saw\">\u952f\u5b50</span>) or <span class=\"literal-text\">face</span> (<span class=\"annotated-chinese\" data-pinyin=\"MI\u00c0N XI\u00c0NG\" data-definition=\"to face\">\u9762\u5411</span> or <span class=\"annotated-chinese\" data-pinyin=\"LI\u01cdN\" data-definition=\"face\">\u8138</span>))</span>\n  </span>\r</li>\n<li class=\"arrow-list\"> by lemma.  \"went\" (108th) v. \"go\" (80th).\r</li>\n<li class=\"arrow-list\"> by part-of-speech, defined as \"what the LLMs define as part-of-speech\".\r</li>\n<li class=\"arrow-list\"> a \"second-level\" of word-type details.</li>\n</ul>","quotes":[],"subject":"jamestown (part 1)"}