Alan Chan: 4-month stipend for a research visit to collaborate with academics in Cambridge on evaluating non-myopia in language models and RLHF systems
Amount: $12,321.00
Award date: October 1, 2022
Focus area: Building research capacity
  • Chan was a PhD student in machine learning at MILA (a top academic machine learning institute).

  • The visit involved collaborating on evaluating non-myopia in language models and RLHF (Reinforcement Learning from Human Feedback) systems. Non-myopia refers to an AI system’s ability to give up short-term gains in exchange for long-term gains, which in future systems could increase extreme risks (for example, if such systems pursued long-term deceptive plans).

  • Chan aimed to provide empirical examples of non-myopia emerging in language models and RLHF systems. Such examples could help build consensus around extreme risks from AI within academic communities.

  • Backing talented researchers such as Chan to collaborate with leading alignment academics can accelerate progress on key technical problems for AI safety and strengthen the links between AI hubs.

Outcomes: Chan published a number of papers from this research:

Note: this grant was made by the same grantmaking team under the Long-Term Future Fund. Read more about the AI Risk Mitigation Fund Team here.