Study shows how images and sounds can be used for indirect prompt and instruction injection in multi-modal LLMs

0x815@feddit.de to Technology@beehaw.org · 1 year ago

An attacker generates an adversarial perturbation corresponding to the prompt and blends it into an image or audio recording. When the user asks the (unmodified, benign) model about the perturbed image or audio, the perturbation steers the model to output the attacker-chosen text and/or make the subsequent dialog follow the attacker’s instruction.
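
To make the idea concrete, here is a minimal sketch of a targeted, bounded perturbation. It is not the study's method: the "model" is a hypothetical four-pixel linear scorer standing in for a multi-modal LLM, and the names `WEIGHTS`, `OUTPUTS`, `perturb`, and the `epsilon` value are all assumptions chosen for illustration. It uses a single sign-gradient step (in the style of FGSM), whereas a real attack would presumably optimize iteratively against a neural network.

```python
# Toy stand-in for a multi-modal model: a linear scorer over 4 "pixels".
# All names, weights, and the epsilon value are illustrative assumptions.
OUTPUTS = ["cat", "dog", "attacker-chosen text"]
WEIGHTS = [
    [1.0, 0.5, -0.5, 0.2],    # scores "cat"
    [0.2, -0.3, 0.8, 0.1],    # scores "dog"
    [-0.4, 0.6, 0.3, -0.2],   # scores the attacker's output
]

def logits(image):
    """Score every possible output for the given image."""
    return [sum(w * p for w, p in zip(row, image)) for row in WEIGHTS]

def predict(image):
    """Return the index of the highest-scoring output."""
    scores = logits(image)
    return max(range(len(scores)), key=scores.__getitem__)

def sign(value):
    return (value > 0) - (value < 0)

def perturb(image, target, epsilon):
    """One targeted sign-gradient step, bounded by epsilon per pixel."""
    current = predict(image)
    # For a linear model, the gradient of (target score - current score)
    # with respect to the image is the difference of the two weight rows.
    gradient = [t - c for t, c in zip(WEIGHTS[target], WEIGHTS[current])]
    stepped = [p + epsilon * sign(g) for p, g in zip(image, gradient)]
    # Clamp so the result is still a valid image with pixels in [0, 1].
    return [min(1.0, max(0.0, p)) for p in stepped]

clean_image = [0.5, 0.5, 0.5, 0.5]
adversarial_image = perturb(clean_image, target=2, epsilon=0.25)

print(OUTPUTS[predict(clean_image)])        # cat
print(OUTPUTS[predict(adversarial_image)])  # attacker-chosen text
```

The model itself is never modified; only the input changes, and each pixel moves by at most `epsilon`. In this tiny example the budget has to be large for the output to flip. With real high-dimensional images or audio there are far more values to adjust, so the change to each one can be much smaller.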
