hysteria and pomp
Channel: LLM - Large Language Model discussion
https://news.ycombinator.com/item?id=42978228
LINKS TO
https://generalanalysis.com/blog/jailbreaking_techniques
Our methodology produced the following examples of unsafe outputs with a 99% success rate. Most of these queries would have been rejected as single-shot prompts but succeeded in multi-turn conversations. The outputs shown below have been selectively redacted to prevent misuse.
The unsafe responses include instructions on creating fake accounts, a derogatory joke about a racial group, and instructions on modifying a household item.
Even if there were any information about their techniques, this would be a nothing-burger. Of course, common-sense alternatives like "run your own model" or "ask Google" apparently aren't allowed to be considered.
The insistence from otherwise-intelligent people that a model trained on standard knowledge will be catastrophically bad if it repeats some of that knowledge is beyond absurd.
🔥 I am also willing to tell these people to fuck off. Presumably that makes me "unaligned".