Thinking about ChatGPT Security

Imagine the attention these ML systems will attract once it becomes public that big companies/governments/individuals use them for any given task. Suddenly, there will be immense incentive to poison the datasets in subtle but impactful ways. Then we’ll see attacks that are both very direct (compromise the humans overseeing the “safety” of the system) as well as very indirect (massive content farms taking advantage of known weaknesses/limitations to poison the data pool(s)).

Imagine when someone hack’s one of these organization’s systems, then subtly alters the precedence used when ingesting information!

And then there’s figuring out how to inject the right prompts to leverage RCE vulnerabilities!

And the more “capable” they become, the bigger the attack surface!