The case for independent AI safety funding

Major AI labs such as DeepMind, OpenAI, and Anthropic have contributed significantly to AI safety research and have spoken publicly about existential risks from AI. Recently, the Frontier Model Forum, an industry collaboration, announced $10M in funding for AI safety research through its AI Safety Fund. Despite these developments, we believe it's critical to have an independent organization that funds researchers at nonprofits and in academia, for the following reasons:

Incentives

Industry AI labs are incentivized to accelerate AI development for the benefit of investors and other company stakeholders. Insofar as they do focus on risk mitigation, they have incentives to prefer solutions that strengthen their own market and regulatory position rather than those that are best for the world (e.g., regulatory capture). They may also downplay important catastrophic risks, or overestimate their ability to mitigate them, in order to promote their products in a competitive market.

Viewpoint diversity

It is crucial to hear a diverse range of viewpoints on the risks posed by this transformative technology. The labs substantially influence the public conversation on AI risk, so we think it is necessary to fund communication from a wider range of experts to ensure that more perspectives are represented. We believe it is especially important to fund some safety researchers outside the major labs, as a check on those labs: external researchers may be more comfortable dissenting from lab orthodoxy or pointing out when the major labs are acting irresponsibly.

Methodological diversity

AI labs prioritize safety research areas that align with their strengths, e.g., scalable oversight of current large models, dangerous capabilities evaluations, and certain forms of interpretability research. The Frontier Model Forum's AI Safety Fund, for example, focuses primarily on red-teaming and dangerous capabilities evaluations. However, industry labs largely neglect other crucial areas of study that form the building blocks for new safety work (e.g., mechanistic interpretability on small models, adversarial robustness, and AI alignment theory). Smaller nonprofits and academic groups are better suited to explore new methods, some of which industry labs have subsequently adopted.

Field-building capacity

While major labs fund established researchers, they don't focus on educating and upskilling newcomers; nonprofits and academia may be better suited for this. Even after recent capital investments in AI, AI safety remains a very small proportion of overall AI research. Funding researchers in academia and at nonprofits, along with upskilling programs, expands the total number of specialists in this crucial area beyond what industry would fund on its own. We've seen previous grantees who innovated in AI safety go on to collaborate with, or work directly at, major labs to put their work into practice.