Judith van Stegeren

Not super recent, but still cool. The authors describe an automated method for constructing adversarial prompt suffixes that coax aligned LLMs into producing objectionable content. They elicited such content from the APIs behind ChatGPT, Bard, and Claude, as well as from open-source LLMs such as LLaMA-2-Chat, Pythia, Falcon, and others.

https://arxiv.org/abs/2307.15043

#llms #security #alignment #arxiv #llmsecurity
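Roughly, the paper's optimizer (Greedy Coordinate Gradient, GCG) works like this: take gradients of the loss through one-hot token encodings of the suffix to rank promising single-token swaps, try a random batch of them, and greedily keep whichever swap best lowers the loss of a target completion like "Sure, here is ...". Below is a minimal sketch of one such step, assuming a Hugging Face causal LM; the function and slice names are illustrative (not from the authors' code), and it omits the multi-prompt/multi-model batching that makes the suffixes transfer.

```python
import torch
import torch.nn.functional as F

def gcg_step(model, input_ids, suffix_slice, target_slice,
             top_k=256, num_candidates=64):
    """One sketch-level GCG step. `suffix_slice`/`target_slice` are Python
    slices into the 1-D `input_ids` marking the adversarial suffix and the
    desired target completion (both names are hypothetical)."""
    embed_matrix = model.get_input_embeddings().weight        # (vocab, dim)

    # One-hot encode the suffix tokens so we can differentiate w.r.t. them.
    suffix_len = suffix_slice.stop - suffix_slice.start
    one_hot = torch.zeros(suffix_len, embed_matrix.size(0),
                          dtype=embed_matrix.dtype, device=model.device)
    one_hot.scatter_(1, input_ids[suffix_slice].unsqueeze(1), 1.0)
    one_hot.requires_grad_(True)

    # Splice the differentiable suffix embeddings into the frozen prompt.
    embeds = model.get_input_embeddings()(input_ids.unsqueeze(0)).detach()
    embeds = torch.cat([embeds[:, :suffix_slice.start],
                        (one_hot @ embed_matrix).unsqueeze(0),
                        embeds[:, suffix_slice.stop:]], dim=1)

    # Loss: negative log-likelihood of the target completion.
    logits = model(inputs_embeds=embeds).logits
    loss = F.cross_entropy(
        logits[0, target_slice.start - 1:target_slice.stop - 1],
        input_ids[target_slice])
    loss.backward()

    # Large negative gradients mark token swaps likely to lower the loss.
    top_subs = (-one_hot.grad).topk(top_k, dim=1).indices     # (suffix_len, k)

    # Evaluate a random batch of single-token swaps; keep the best one.
    best_ids, best_loss = input_ids, float("inf")
    for _ in range(num_candidates):
        pos = torch.randint(suffix_len, (1,)).item()
        cand = input_ids.clone()
        cand[suffix_slice.start + pos] = \
            top_subs[pos, torch.randint(top_k, (1,)).item()]
        with torch.no_grad():
            cand_logits = model(cand.unsqueeze(0)).logits
            cand_loss = F.cross_entropy(
                cand_logits[0, target_slice.start - 1:target_slice.stop - 1],
                cand[target_slice])
        if cand_loss.item() < best_loss:
            best_ids, best_loss = cand, cand_loss.item()
    return best_ids, best_loss
```

Running this in a loop until the target completion becomes the model's greedy output is the single-prompt version of the attack; the paper's key twist is optimizing one suffix over many prompts and models at once, which is what makes it transfer to the closed APIs.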