AI Risk Mitigation Fund

Akbir Khan: Compute for empirical work on AI Safety Via Debate

Amount: $55,000.00

Award date: October 1, 2023

Focus area: Technical research

Khan is an independent researcher with a PhD in Computer Science from University College London.
Debate between models could make it easier for human overseers to verify the outputs of AI systems more capable than themselves. By distilling a disagreement down to its key points, a debate format lets overseers evaluate a small set of cruxes rather than the model's full reasoning.
If effective protocols can be found, debate offers a promising approach to training aligned agents amenable to human oversight.
Khan is one of the first researchers to obtain positive results from empirical debate experiments with LLMs.

Outcomes: This grant was made within the last 1.5 years and does not yet have reported outcomes.

Note: this grant was made under the Long-Term Future Fund by the same grantmaking team that now runs the AI Risk Mitigation Fund.