Chan was a PhD student in machine learning at MILA (a leading academic AI research institute).
The visit involved collaborating on evaluating non-myopia in language models and RLHF (Reinforcement Learning from Human Feedback) systems. Non-myopia refers to an AI system’s ability to trade short-term gains for long-term gains, a capability that could increase extreme risks in future systems (for example, if such systems pursued long-term deceptive plans).
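As a rough illustration (not from the report itself): in reinforcement-learning terms, a fully myopic agent optimizes only immediate reward (discount factor gamma = 0), while a non-myopic agent with gamma near 1 will forgo an immediate gain for a larger delayed one. The toy sketch below, using hypothetical reward values, shows how the same agent's preferred action flips as the discount factor changes.

```python
# Toy sketch of myopic vs. non-myopic decision-making (illustrative values,
# not from the report). The agent chooses between two actions:
#   "grab": immediate reward 1.0, nothing afterwards
#   "wait": no immediate reward, a delayed reward of 5.0 three steps later

def discounted_return(rewards, gamma):
    """Sum of rewards, discounting each step t by gamma**t."""
    return sum(r * gamma**t for t, r in enumerate(rewards))

trajectories = {
    "grab": [1.0, 0.0, 0.0, 0.0],
    "wait": [0.0, 0.0, 0.0, 5.0],
}

for gamma in (0.0, 0.9):  # 0.0 = fully myopic, 0.9 = non-myopic
    best = max(trajectories, key=lambda a: discounted_return(trajectories[a], gamma))
    print(f"gamma={gamma}: agent prefers '{best}'")
# gamma=0.0: agent prefers 'grab'  (myopic: takes the short-term gain)
# gamma=0.9: agent prefers 'wait'  (non-myopic: trades short-term for long-term)
```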
Chan aimed to provide empirical examples of non-myopia emerging in LMs/RLHF systems, which could help build consensus within academic communities around extreme risks from AI.
Backing talented researchers such as Chan to collaborate with leading alignment academics can accelerate progress on key technical problems in AI safety, as well as strengthen the links between AI hubs.
Outcomes: Chan published several papers based on this research: