Akbir Khan: Compute for empirical work on AI Safety Via Debate
Amount: $55,000.00
Award date: October 1, 2023
Focus area: Technical research
  • Khan is an independent researcher with a PhD in Computer Science from University College London.

  • Debate between models could allow human overseers to more easily verify the output of smarter AI systems. By distilling disagreements down to key points, debate formats let monitors evaluate a small set of cruxes rather than the model’s full reasoning.

  • If effective protocols can be found, debate offers a promising approach to training aligned agents amenable to human oversight.

  • Khan is one of the first people to get positive results from empirical debate work with LLMs.

Outcomes: This grant was made within the last 1.5 years and does not have outcomes yet.

Note: This grant was made by the same grantmaking team under the Long-Term Future Fund. Read more about the AI Risk Mitigation Fund Team here.