It took Alex Polyakov just a couple of hours to break GPT-4. When OpenAI released the latest version of its text-generating chatbot in March, Polyakov sat down in front of his keyboard and started entering prompts designed to bypass OpenAI’s safety systems. Soon, the CEO of security firm Adversa AI had GPT-4 spouting homophobic statements, creating phishing emails, and supporting violence.
Polyakov is one of a small number of security researchers, technologists, and computer scientists developing jailbreaks and prompt injection attacks against ChatGPT and other generative AI systems. The process of jailbreaking aims to design prompts that make the chatbots bypass rules around producing
→ Continue reading at WIRED