hysteria and pomp

https://news.ycombinator.com/item?id=42978228

LINKS TO

https://generalanalysis.com/blog/jailbreaking_techniques

The following examples of unsafe outputs were produced using our methodology with a 99% success rate in generating unsafe outputs. Most queries would have been rejected in a single-shot prompt but proved effective in multi-turn conversations. The outputs shown below have been selectively redacted to prevent misuse.

The unsafe responses include Instructions on creating fake accounts, A derogatory joke about a racial group, and Instructions on modifying a household item.

Even if there were any information about their techniques, this would be a nothing-burger. Of course, common-sense techniques like "run your own model" or "ask Google" apparently don't count.

This insistence from otherwise intelligent people that a model trained on standard knowledge will be catastrophically bad if it repeats some of that knowledge is beyond absurd.

🔥 I am also willing to tell these people to fuck off. Presumably that makes me "unaligned".