Akbir Khan: Compute for empirical work on AI Safety Via Debate
Amount: $55,000.00
Award date: October 1, 2023
Focus area: Technical research
Khan is an independent researcher with a PhD in Computer Science from University College London.
Debate between models could allow human overseers to more easily verify the output of smarter AI systems. By distilling disagreements down to key points, debate formats enable monitors to evaluate a small set of cruxes, rather than the model’s full reasoning.
If effective protocols can be found, debate offers a promising approach to training aligned agents amenable to human oversight.
Khan is one of the first researchers to obtain positive results from empirical debate work with LLMs.
Outcomes: This grant was made within the last 1.5 years and does not have outcomes yet.
Note: this grant was made by the same grantmaking team under the Long-Term Future Fund. Read more about the AI Risk Mitigation Fund Team here.