Trusted #002 - Finding Common Ground on AI Safety
Introduction
In my previous column, I laid out what I see as the four “camps” or “groups” of the AI existential risk debate.
The Existentialists, who think that existential risk from AI is a serious problem.
The Ethicists, who think that existential risk is less important than the harms/risks from AI right now or in the short-term future.
The Pragmatists, who think that other long-term risks are more important than existential risk.
The Futurists, who think that existential risk is ludicrous, and we should be focusing on AI’s benefits.
I lean toward the Pragmatist point of view, but I'd like to try to find common ground between these perspectives. Unfortunately, a lot of the debate on this issue is happening in bad faith. I don't expect the people involved to fully agree, but it would help if they could accept some ground rules. I've thought a lot about this and come up with a list of three.
Common-Sense Statement #1: AI Safety is a real problem.
A large fraction of those currently working in the field believe that AI Safety is a problem (with some notable exceptions). The Stanford 2023 AI Index includes a survey of 480 NLP researchers from 2022, of whom 73% said AI could “cause revolutionary social change” and 36% fully or partially agreed that AI could cause a “nuclear-level catastrophe.” Katja Grace also conducted a 2022 survey of researchers who published papers at ML/NLP conferences and found that 48% of respondents thought there was at least a 10% chance of an extremely bad outcome from AI.
Generally speaking, if a lot of experts believe that something bad could happen, we should probably listen. It's okay to think that maybe they're exaggerating the scope of the problem, but to dismiss it out of hand, as many of the Futurists do, seems foolish. (Terminology note: I prefer “AI Safety” over “AI Alignment,” as I think that framing is more conducive to discussion, especially when introducing the concept to a less-engaged audience.)
Common-Sense Statement #2: Stopping capabilities development is infeasible.
There is no scientific or political consensus that AI capabilities research is dangerous, and it is incredibly unlikely that one will emerge in the near future. Throughout history, humans have shown little ability to restrict the development of dangerous technologies. (Weapons of mass destruction are the exception that proves the rule.) It will likely take an actual catastrophic event to persuade the public that catastrophic events are possible…and the public has a short memory. This also leaves out the fact that AI is potentially enormously beneficial economically, which will make it even more difficult to restrict.
Many Existentialists disagree with this in nonproductive ways. Hyperbolic Twitter (but I repeat myself) statements like "If you go to work at OpenAI, you are contributing to the death of humanity" worsen the problem. All three of the main AI companies were founded on safety principles and have active, engaged safety teams. Arguing that those teams are ineffective is fair game, but dismissing them out of hand is also foolish. It's very possible that a capabilities engineer ends up solving the AI safety problem by accident - as long as the well isn't poisoned! (For a good summary of this debate, see Scott Alexander's post.)
Common-Sense Statement #3: Public support is necessary for government action on AI Safety.
The overall public view of AI right now is neutral-to-negative, with a lot of "unsure." Good data here is hard to find, but this 2022 study from Pew Research breaks it down as 18% favorable, 45% neutral, and 37% unfavorable, with the top five concerns being “loss of human jobs” (19%), “surveillance, hacking, and privacy” (16%), “lack of human connection” (12%), “AI will get too powerful” (8%), and “people misusing AI” (8%).
The issue isn't top-of-mind for anyone except a small slice of people right now, but if AI keeps progressing in capability the way it appears to, voters will take notice. If your proposed regulation doesn't also address the “jobs problem” and the “surveillance problem,” it won't survive to address issues like AI safety. If those concerns are never addressed, the danger is that AI eventually gets regulated into extinction, like American nuclear power. (While that might please some Existentialists, I think it's clear that the end result would be a shift in AI capabilities development to unregulated countries, which is worse.) I would really love to see more ideas here. For example, I really like this paper fleshing out a potential mechanism for how a government could monitor compliance. I don't know enough to agree or disagree, but I'd much rather have discussions on those grounds.
The “Trusted AI” Common Ground
So how can we work on these objectives in a united fashion? In my opinion, the core thread linking all the groups is "reliability."
The Existentialists want reliable alignment, so they can ensure that more-developed systems aren't secretly plotting to kill us (either deliberately or inadvertently). The Ethicists want reliable harm-prevention systems built into current AI systems. The Pragmatists and Futurists want AI systems that work well, whether to help with today's problems or to solve tomorrow's. Reframing the problem this way lets all parties cooperate, and it shifts the dynamic away from a (business jargon incoming) win-lose situation to a win-win one.
It's almost like we need AI that is...Trusted.😎
I'm not sure if this grouping helps or not, but I think it's a starting point for discussions with leaders who want to do the right things but aren't sure where to begin (and who also have to deal with a ton of single-issue lobbyists trying to frame the problem around their specific issue). Focus on building Trusted AI…and many of those downstream issues solve themselves.
I'm going to keep discussing this as it develops. As they said in the Open Letter, “AI research and development should be refocused on making today's powerful, state-of-the-art systems more accurate, safe, interpretable, transparent, robust, aligned, trustworthy, and loyal.” That sounds great! What's not so great is the paragraph immediately following, which calls for policymakers to do a ton of things, many of which are not currently technically possible. As a government employee, I really don't want to contribute to bad policy or law, so I'm hopeful that a consensus can be reached that minimizes the chance of that happening.
(With thanks to Leopold Aschenbrenner, who posted a great series of blog posts that helped crystallize my thinking.)
Standard disclaimer: All views presented are those of the author and do not represent the views of the U.S. government or any of its components.