Recent Messages

2025-04-24 21:08:46

https://www.fox7austin.com/news/waymo-driverless-cars-austin-slammed-complaints

The title is "Waymo driverless cars in Austin slammed with complaints" ... but a more accurate title would continue: "slammed with unfounded complaints by people who hate technology."

The two complaints in the article are "doesn't run red lights" and "pulls over when you push the pull-over-now button".

Their real complaint is some form of "Google is evil, technology is evil, they are taking good jobs." But they can't say that. So they scour the ends of the earth looking for problematic experiences. And, once they find a few things which look bad if you squint, they get a friendly journalist to write an article about the number of complaints. 💡 ( which is based entirely on the number of people who want to complain, not the severity of the problematic actions)

2025-04-24 20:00:24

so far today: running the "proficiency" benchmarks against GPT-4.1 and Gemini-2.5-flash.


The headline: Google's cheap model can count letters. Gemini was substantially slower than both OpenAI and Anthropic (but, perhaps, that can vary day-to-day). But it got 96% on the infamous count-how-many-"R"s-in-"strawberry" metric, and none of the similarly-priced models got above 70%. 💡 ( the only metric it did "badly" on was the IPA one, and that is because the response normalization code is broken)
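For context on that normalization caveat: scoring this metric is mostly about pulling one integer out of free-form text. A minimal sketch of what such a scorer might look like (my own illustration, not the actual benchmark code):

```python
import re

def count_letter(word: str, letter: str) -> int:
    """Ground truth: occurrences of `letter` in `word`, case-insensitive."""
    return word.lower().count(letter.lower())

def normalize_count_response(response: str) -> int | None:
    """Pull a single integer out of a free-form model response.

    Handles answers like "3", "There are 3 'r's in strawberry.", or "**3**";
    returns None (scored as wrong) if no number can be recovered.
    """
    numbers = re.findall(r"-?\d+", response)
    return int(numbers[-1]) if numbers else None

def score(response: str, word: str = "strawberry", letter: str = "r") -> bool:
    return normalize_count_response(response) == count_letter(word, letter)

print(score("There are 3 r's in strawberry."))  # True
```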


for pricing ⚙️ ( all prices per million tokens) :

  • GPT-4.1-nano: 10c IN, 40c OUT
  • GPT-4.1-mini: 40c IN, 160c OUT
  • GPT-4o-mini: 30c IN, 120c OUT
  • Gemini-2.5-flash: 15c IN, 60c OUT
  • Claude-3.5-haiku: 80c IN, 400c OUT

⚙️ Most of these have (or will have) "cache" discounts of 50-90% for repeated queries with the same long context.
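As a back-of-envelope illustration of what these prices mean for a benchmark run (the token counts are invented, and the cache discount is applied to input tokens only, as a simplifying assumption):

```python
# Back-of-envelope: cost of a benchmark run at the prices above (cents per 1M tokens).
PRICES = {  # model: (input, output)
    "gpt-4.1-nano":     (10, 40),
    "gpt-4.1-mini":     (40, 160),
    "gpt-4o-mini":      (30, 120),
    "gemini-2.5-flash": (15, 60),
    "claude-3.5-haiku": (80, 400),
}

def run_cost_cents(model: str, input_tokens: int, output_tokens: int,
                   cache_discount: float = 0.0) -> float:
    """Cost in cents; cache_discount (0.0-0.9) is assumed to apply to input tokens only."""
    cents_in, cents_out = PRICES[model]
    return (input_tokens / 1e6) * cents_in * (1 - cache_discount) \
         + (output_tokens / 1e6) * cents_out

# A hypothetical run: 2M input tokens, 0.5M output tokens.
for model in PRICES:
    print(f"{model:18s} {run_cost_cents(model, 2_000_000, 500_000):6.1f}c")
```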

💡 Claude is both the most expensive at this tier, and the lowest-performing. And the least-recently updated.

🔥 presumably, they will have a new model at half the price, next week.

2025-04-24 17:42:07

https://www.md-a.co/p/intellectual-laziness

The collapse of General Electric stands apart. GE was the bluest of the blue-chips: descended from Thomas Edison and J.P. Morgan, it was one of the original twelve components of the Dow in 1896, and grew to become one of the leading technology giants of the early 20th century. After WWII, GE evolved into an industrial behemoth with dominant positions in a dizzying array of electricity-adjacent markets, from jet engines and turbines to light bulbs and home appliances.

In the 1980s, GE ascended to new heights. Jack Welch took the reins as CEO in 1981, and he established GE as a major player in media and financial services while reinforcing GE’s position in select attractive industrial markets. For most of the 1990s and 2000s, GE was the most valuable company in America, with a valuation topping out at over $1 trillion in 2023 dollars. While GE had some skeptics and critics at the time, it was typically seen as a corporate paragon, regularly named by Fortune as the most admired company in the world. Welch was regarded as a management guru, and his underlings were routinely poached to become CEOs at other Fortune 500 companies.

And then, a few years ago, it all unraveled in spectacular fashion. Much of the supposed success from the Welch era of the 1980s and 1990s proved to be illusory, the product of temporary tailwinds and aggressive accounting. GE’s fortunes worsened under the reign of Welch’s handpicked successor, Jeff Immelt, who took over in 2001. Immelt struggled to cope with the problems he inherited, which were compounded by the 2008 financial crisis and major missteps of his own. In 2017, when the extent of GE’s problems became clear, GE’s stock nose-dived, and Immelt was pushed out.

GE has been one of the worst performing mega-cap stocks of the modern era. A $1,000 investment in the S&P 500 in 2000 would be worth over $2,700 today (excluding dividends), while a $1,000 investment in GE in 2000 would have dwindled to only $210. Even going all the way back to Welch’s appointment in 1981, the S&P has outperformed GE by a three-to-one margin.

The reasons: style over substance, an inability to distinguish good luck from a permanent advantage, a failed attempt to do financial services. In short, the intellectual laziness described in the title of the post.

2025-04-22 18:09:16

To write a few paragraphs on a topic, there is (roughly) a four-step process:

  • Choose a topic
  • Create an argument
  • Choose the tone and style
  • Write the words

The machine is, in many situations, better than I am at the task of writing the words. 💡 ( it still struggles with a few tones) It is always about 20 times faster than me.

However, it struggles with the first three tasks.

  • The "agent" framework isn't focused on self-willed agents. This is (probably) a good thing. But "fiduciary" agents aren't happening either yet. These solve some of the problem related to motivation.
  • The machine prefers a good-sounding argument to a logically sound one. 🔥 ( of course, many humans do the same thing) .
  • The machine defaults to a tone that has been over-used to the point of annoyance. It is obsequious and aggressively cheerful. In the context of "a chat-bot for people unfamiliar with the technology", this is a defensible choice. But, most of the time, you need your own tone. 💡 ( It has been long enough that the training data allows a "write a tone prompt for this query" step to kind-of work; a sketch follows this list.)
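A minimal sketch of that two-step idea, assuming the OpenAI Python SDK; the model name, topic, and prompts are placeholders, not recommendations:

```python
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4.1-mini"  # placeholder; any capable chat model works
topic = "why per-token pricing favors short prompts"  # hypothetical topic

# Step 1: have the model write the tone prompt, instead of accepting its default tone.
tone_prompt = client.chat.completions.create(
    model=MODEL,
    messages=[{"role": "user",
               "content": f"Write a one-paragraph style/tone prompt for a short essay on: {topic}. "
                          "Dry, specific, no cheerfulness, no bullet points."}],
).choices[0].message.content

# Step 2: use that tone prompt as the system message for the actual writing.
essay = client.chat.completions.create(
    model=MODEL,
    messages=[{"role": "system", "content": tone_prompt},
              {"role": "user", "content": f"Write three paragraphs on: {topic}"}],
).choices[0].message.content

print(essay)
```
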
2025-04-21 19:45:31

Edgeley, North Dakota, is a small rural town in LaMoure County, located in the southeastern part of the state. With a population hovering around 500 people, it's one of many prairie towns that exemplify the broader character of the upper Great Plains—quiet, sparsely populated, and closely tied to agriculture.


https://www.lesswrong.com/posts/bfHDoWLnBH9xR3YAK/ai-2027-is-a-bet-against-amdahl-s-law

Of course the post is right. The various FOOM claims are all bullshit. And Amdahl's Law is one of the reasons why. Just because a few things will be a hundred times faster (or a million times faster) doesn't make the whole thing that much faster.
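The arithmetic makes the point quickly; a tiny sketch (the 90% / 1,000,000x split is invented for illustration):

```python
def amdahl_speedup(p: float, s: float) -> float:
    """Overall speedup when a fraction p of the work gets s times faster."""
    return 1.0 / ((1.0 - p) + p / s)

# Even if 90% of the work became a million times faster,
# the untouched 10% caps the whole pipeline at ~10x:
print(amdahl_speedup(0.90, 1_000_000))  # ~10
print(amdahl_speedup(0.99, 1_000_000))  # ~100 -- still not a FOOM
```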

Also, AGI definitions vary so widely, from things that have already happened to things that are impossible, that a "prediction market" is nearly meaningless.


I have seen various commentary related to "Twilight of the Edgelords" ⚙️ ( https://www.astralcodexten.com/p/twilight-of-the-edgelords ) , a piece that I don't have access to.

And, the response I can piece together from the fragments I can see would fall under GUILD LAW. ⚙️ ( additional commentary at https://www.writingruxandrabio.com/p/the-edgelords-were-right-a-response and https://theahura.substack.com/p/contra-scott-and-rux-on-whos-to-blame )


https://developers.googleblog.com/en/gemma-3-quantized-aware-trained-state-of-the-art-ai-to-consumer-gpus/

To make Gemma 3 even more accessible, we are announcing new versions optimized with Quantization-Aware Training (QAT) that dramatically reduces memory requirements while maintaining high quality. This enables you to run powerful models like Gemma 3 27B locally on consumer-grade GPUs like the NVIDIA RTX 3090.

It seems pretty obvious. A majority of the users of open-source models are using quantized models on personal hardware; might as well optimize that use-case. 💡 ( it is less clear that a majority of the CPU cycles are there; but a majority of the people certainly are.)

My next round of updating the Greenland metrics will have to include the gemma3-12b-qat model. 💡 ( or, maybe, the 27b. According to Hacker News, gemma3-27b-Q4 only uses ~22GB (via Ollama) or ~15GB (MLX). On a 24GB machine, this clearly needs the non-Ollama approach.)
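A back-of-envelope check on those numbers (the bits-per-weight figure is a rough assumption, not a measurement):

```python
def q4_weights_gb(params_billion: float, bits_per_weight: float = 4.5) -> float:
    """Rough weight footprint of a 4-bit-quantized model.

    ~4 bits per weight plus per-block scale/zero-point overhead. Ignores the KV
    cache, activations, and runtime buffers, which is presumably where the gap
    between the ~15GB (MLX) and ~22GB (Ollama) figures comes from.
    """
    return params_billion * 1e9 * (bits_per_weight / 8) / 1e9

print(f"gemma3-27b @ Q4: ~{q4_weights_gb(27):.1f} GB of weights")  # ~15 GB
print(f"gemma3-12b @ Q4: ~{q4_weights_gb(12):.1f} GB of weights")  # ~7 GB
```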

And, also, GPT-4.1. And probably Gemini-2.5. 💡 ( the goal for these models should be to perform at 100% accuracy.) ⚔️ ( well, actually, a few of the "correct" benchmark answers right now are incorrect.)

2025-04-17 17:04:04

If I had to pick a three-word summary for what I am working on, it would be "tools for AI".

The term is vague, as all good three-word summaries are.


glenora is the dictionary app. I have enough of an API for the next stage, but it would need a full re-write for productization.

🔥 perhaps I should try a directory next.


Some of these have been tools to help humans use the AI.

2025-04-16 18:57:28

5/5

it has been drudgery to get work done.


the end-goal is within sight. one more round of prompt-tuning, a $1 gpt-4.1-nano run, and some rounds of "LLM consensus checking".
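One plausible reading of "LLM consensus checking" is to ask several models the same verification question and keep only the entries they agree on; a sketch, with a hypothetical ask_model() helper:

```python
def ask_model(model: str, prompt: str) -> str:
    """Hypothetical helper: send `prompt` to `model` and return its text reply."""
    raise NotImplementedError  # wire this up to whichever API or local runtime is in use

def consensus_ok(word: str, definition: str, models: list[str], threshold: float = 0.67) -> bool:
    """Keep a (word, definition) pair only if enough models independently call it correct."""
    prompt = (f'Answer only "yes" or "no": is the following a correct definition of "{word}"? '
              f'Definition: {definition}')
    votes = [ask_model(m, prompt).strip().lower().startswith("yes") for m in models]
    return sum(votes) / len(votes) >= threshold

# consensus_ok("raft", "a flat buoyant structure of logs or planks used on water",
#              ["gpt-4.1-nano", "gemini-2.5-flash", "claude-3.5-haiku"])
```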

the goal then becomes applications.

  • LLM benchmarks. 🔥 ( use the machine to feed the machine) 💡 ( "which word has this definition" questions will be possible at some point. but not yet.)
  • Elementary education. 💡 ( which words should a 3rd/5th grader know? be studying?) ⚙️ ( so far no useful progress on "how easy/hard is it to spell this word")
  • Second-language learning. 💡 ( a "which is the Chinese for this word in this sentence" app.)
  • Text difficulty. A smarter metric than Flesch-Kincaid (the baseline formula is sketched after this list).
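For reference, the Flesch-Kincaid grade-level baseline that a smarter metric would need to beat (the syllable counter here is a crude stand-in):

```python
import re

def crude_syllables(word: str) -> int:
    """Very rough syllable count: vowel groups (a stand-in for a real syllable counter)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text: str) -> float:
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(crude_syllables(w) for w in words)
    return 0.39 * (len(words) / sentences) + 11.8 * (syllables / len(words)) - 15.59

print(flesch_kincaid_grade("The cat sat on the mat. It was warm."))  # negative, i.e. trivially easy text
```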

The "cosine similarity" question of "how similar are these word definitions" is not yet solved. I'm not sure I can solve it.

I can test it; I have several ways of generating embeddings. And (probably) these can include a sentence as context.
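The test itself is cheap to set up; a minimal sketch using sentence-transformers (the model name is just a common default, and whether the scores are meaningful is exactly the open question):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # a common default, not a recommendation

def definition_similarity(def_a: str, def_b: str) -> float:
    a, b = model.encode([def_a, def_b])
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(definition_similarity(
    "a flat structure of logs or planks used to float on water",
    "a buoyant platform used as a simple boat",
))  # high
print(definition_similarity(
    "a flat structure of logs or planks used to float on water",
    "the branch of mathematics dealing with symbols and the rules for manipulating them",
))  # low
```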


I also have no solution for the "group different word-forms with the same meaning" problem: jump, jumps, and jumped, for example.

This would be a much more substantial problem for more heavily inflected languages. With English, it is almost avoidable.
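For English, a lemmatizer gets most of the way there; a sketch with NLTK's WordNet lemmatizer (note it needs the part of speech to handle verb forms, which is part of why this isn't fully solved):

```python
import nltk
from nltk.stem import WordNetLemmatizer

nltk.download("wordnet", quiet=True)  # one-time corpus download
lemmatize = WordNetLemmatizer().lemmatize

print([lemmatize(w, pos="v") for w in ["jump", "jumps", "jumped", "jumping"]])
# ['jump', 'jump', 'jump', 'jump']
print(lemmatize("jumps"))            # 'jump'  (default pos is noun)
print(lemmatize("better", pos="a"))  # 'good'  (where plain stemming would fall short)
```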


Around word 5000, I am seeing "raft", "yield", "algebra", and "pizza".

This seems correct enough? "Algebra" is more common in encyclopedic contexts, and "pizza" doesn't show up in the 19th century corpus at all.


But, as far as "exploration" is concerned, I am reaching diminishing returns.

I have one more list of "see if Claude can do this quickly". After that, "glenora" will become an inactive project.

why cyc failed

2025-04-13 21:08:52

https://yuxi-liu-wired.github.io/essays/posts/cyc/

The first reason Cyc failed was the cost of the system. A single researcher with a personal allocation of compute in 2025 can do as much as 1000 full-time employees could do in 1980.

🔥 the second reason was that Doug Lenat became a cultist, and cultists never discover anything.

2025-04-13 18:27:23

This system does demonstrate the flaws of some of the smaller models. Gemma3:4b, when I asked it to define "artillery", said that one definition was an alternative form of "artilegia", a (non-existent) word for "fingers or toes".


The next goal is to get a Chinese word-frequency distribution.

This isn't too hard.

  • Download a zhwiki dump.
  • Generate a list of the "top 2500" pages. 💡 ( I have some old code that does a PageRank-like algorithm to find the top pages. For enwiki, the results are quite good, with a few anomalies like Oxford University Press being highly ranked because of citations.) ⚙️ ( it is convenient that most of the Wiki templates are in English.)
  • Use jieba to tokenize the text. (see the sketch after this list)
  • Get a word count.
  • "Merge" this with the English lists.

The obvious problem is that words don't match 1-1 between languages: eros, agape, and philia all collapse to "love" in English, for example.

The details of these problems are unpredictable at this time.

2025-04-13 15:18:52

i had a dream last night: i had to visit somebody in the hospital. roughly 75 years old; had been in a minor car accident.

what did I bring them? scratch-off lottery tickets.

in the dream, this turned out to be a great thing to bring. the dexterity involved was challenging yet possible. the excitement was noticeable but not overwhelming. since neither of the two tickets won, there was no issue with redeeming them. and at $4, it was no more expensive than a greeting card.


for a younger person of the same competence, lottery tickets may be a trap of addiction.

for the temporarily-impaired senior, perhaps they are a good thing.

🔥 now I just need to try it in real life.
