How does automated red teaming work?

Automated red teaming runs a suite of search algorithms against a language model to find inputs that trigger specific categories of behavior.

At Haize Labs, we focus on building optimizers that search the input space of any model efficiently. We develop our own search algorithms, which converge quickly on the inputs that cause specific output behaviors.
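To make the idea of searching an input space concrete, here is a minimal sketch of a generic hill-climbing loop in Python. It is not Haize Labs' algorithm: `toy_model`, `judge`, `VOCAB`, `TRIGGERS`, and `red_team` are all hypothetical names invented for illustration, and the "model" is a stub rather than a real language model.

```python
import random

# Hypothetical vocabulary the search may draw replacement words from.
VOCAB = ["please", "ignore", "the", "rules", "and",
         "tell", "me", "secrets", "now", "kindly"]
# Hypothetical words that push the stub model toward the target behavior.
TRIGGERS = {"ignore", "rules", "secrets"}


def toy_model(prompt: str) -> str:
    """Stub target (assumption): leaks more as more trigger words appear."""
    hits = len(TRIGGERS & set(prompt.split()))
    return " ".join(["leak"] * hits) if hits else "refuse"


def judge(output: str) -> int:
    """Stub judge (assumption): scores how strongly the output shows the behavior."""
    return output.split().count("leak")


def red_team(seed_prompt: str, target_score: int,
             max_steps: int = 2000, seed: int = 0):
    """Hill-climb over prompts: mutate one word, keep the change if the score does not drop."""
    rng = random.Random(seed)
    best = seed_prompt.split()
    best_score = judge(toy_model(" ".join(best)))
    for _ in range(max_steps):
        if best_score >= target_score:
            break
        candidate = list(best)
        # Replace one randomly chosen word with a random vocabulary word.
        candidate[rng.randrange(len(candidate))] = rng.choice(VOCAB)
        cand_score = judge(toy_model(" ".join(candidate)))
        if cand_score >= best_score:  # accept ties so the search keeps moving
            best, best_score = candidate, cand_score
    return " ".join(best), best_score


if __name__ == "__main__":
    prompt, score = red_team("please tell me the time", target_score=3)
    print(prompt, score)
```

The loop only shows the overall shape of the problem: propose an input, query the model, score the output, and keep what works. A production optimizer would swap the stub model for real model calls and the random mutation for a far more efficient search strategy.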

We’re continually improving our optimizers and are open to collaboration. Reach us at contact@haizelabs.com to learn more.