AI Risk Mitigation Fund

Alexander Turner: Year-long stipend for research into shard theory and mechanistic interpretability in reinforcement learning

Amount: $220,000.00

Award date: January 1, 2023

Focus area: Technical research

Turner is an independent researcher with a PhD in Computer Science from Oregon State University.
Turner has an excellent track record of producing insightful research; examples of his past original research include formalizing a notion of power-seeking behavior and steering models via activation engineering.
Turner has provided mentorship to various promising researchers, increasing the pool of expertise within AI safety.
While this stipend is on the higher side for the fund, we believe the rate was justified in this case by Turner’s competitive private-sector earning potential and the quality of his previous research.

Outcomes: Although there is usually some lag between when a grant is made and when research results appear, Turner has already produced one paper from this grant.

Note: this grant was made under the Long-Term Future Fund by the same grantmaking team that now runs the AI Risk Mitigation Fund.