Note: You can use our SFT dataset directly (special tokens have already been added) through the following link, or you can access the raw step data to customize your own SFT dataset. For customization, you can ...
This project explores using RL algorithms (GRPO, PPO, DPO) to train language models as adversarial agents that can systematically discover vulnerabilities in other LLMs. Think of it as "AI vs AI" for ...