"AI": Think The Trial as much as The Terminator
The other day, I was discussing with some friends how machine learning models could be used for certain applications. For reasons of anonymity I'm going to change some details, but during this discussion I experienced an interesting dynamic in how people react to the idea of machine learning or "AI" models. At some point the idea of using AutoML[1] was brought up. Someone suggested that for certain applications, particularly high-risk or sensitive ones, it might be a good idea to avoid the phrase "AutoML", since it could be misinterpreted as suggesting that the developers of the model weren't going to properly evaluate and monitor the system.
At first, I found this silly and kind of annoying. I don't think anyone who has an understanding of the technical meaning of this term would claim that it in any way suggests an absence of all testing or monitoring. Avoiding using the term for this reason seemed like enabling ignorance. Why let people who don't understand make the rules about how we can talk about these things?
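(For the curious, here's roughly what's actually being automated when people say "AutoML". This is a minimal sketch using scikit-learn's GridSearchCV as a stand-in for the far more elaborate searches that real AutoML tooling runs; the dataset and parameter grid are arbitrary choices for illustration. The point is that the automation covers model and hyperparameter selection, while a held-out evaluation remains squarely in the loop.)

```python
# Minimal "AutoML"-flavored sketch: an automated search over model
# configurations wrapped around an ordinary held-out evaluation.
# Illustrative only; real AutoML tools search far larger spaces, but the
# structure is the same: automation sits on top of testing, not instead of it.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# The "automated" part: try candidate configurations via cross-validation.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [3, None]},
    cv=5,
)
search.fit(X_train, y_train)

# Evaluation and monitoring still happen, on data the search never saw.
print("chosen config:", search.best_params_)
print("cross-validated accuracy:", round(search.best_score_, 3))
print("held-out test accuracy:", round(search.score(X_test, y_test), 3))
```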
But then I thought about it a bit more. This person wasn't saying that not using "AutoML" was an ideal scenario, but rather warning to be aware of how it might come across. Machine learning models are being used for some important and sensitive applications, and they are only going to become more ubiquitous in the future. It's inevitable that people who don't know what "AutoML" means are going to be exposed to the concept when it's used in something that has a big impact on their lives. When that happens, is it really unreasonable from their perspective that they might react without researching what the term means?
Even though it might annoy me a bit, it actually seems pretty reasonable to react that way. It makes sense that when non-technical people hear about some system that they don't understand possibly playing a big role in their lives, the first thing they do isn't to immediately go research the technical details of said system. They aren't experts; knowing the technical details isn't their job! My annoyance was implicitly based on the belief that non-technical people should have some level of trust that technical people know what they're doing, but technical people shouldn't just get that trust for no reason. We should get it because we're able to explain stuff and convince everyone else that we do, in fact, know what we're doing.
I still feel like being worried by the term "AutoML" is based on a misunderstanding. Despite that, I think I've come to the conclusion that the burden is on technical people who are trying to implement or deploy these systems to resolve the misunderstandings, not the other way around.
The AI Safety Debate
I think the dynamic that I experienced in the situation above often plays a role in press coverage and online discussions about the impact of "AI"[2] on society. For example, a few years ago Scientific American published an article titled "Don't Fear the Terminator", which makes the case that fears of a Sci-Fi AI takeover (as in the Terminator movie series) are largely driven by a misunderstanding of the technology. The article argues that these scenarios are premised on an implicit assumption that "AI" will have a drive to dominate humans, much like humans seem to have an inclination to dominate other species, as well as each other. The misunderstanding, on this view, is assuming that "AI" will be shaped by the same forces that shaped human intelligence. Evolution shaped human intelligence to have an inclination to dominance because that inclination conferred a survival advantage in the past. Artificial intelligence, on the other hand, won't be shaped by those forces, and therefore "AI" systems won't have the same inclination.
This kicked off a discussion on social media (reproduced here), including two Turing Award winners (one of whom, Yann LeCun, is an author of the SA article). The discussion kind of became a debate between two sides. One side, featuring computer scientist Stuart Russell, argues that AI presents a grave threat to humanity. The other side, featuring LeCun, argues that this claim is overblown. Let's call the first group AI safety pessimists (they are pessimistic that AI can be safe without major effort), and the second group AI safety optimists (they think safe AI is reasonably doable without huge effort in the near/medium term)[3]. The optimists feel like they are being unfairly accused of being stupid or malicious, like some caricature taken from a work of fiction (thus the Terminator reference). The pessimists feel like they are being unfairly straw-manned, that the comparison to science fiction is a way to ignore their arguments instead of engaging with them.
I think interactions between these two groups share a lot of emotional and social dynamics with the experience I described at the beginning of this post. Just like my initial reaction in my story was to feel annoyed by what I perceived as an unfair accusation that depended on a misunderstanding, I can imagine an optimist saying something like this:
You're literally saying I'm so stupid or malicious that I want to unleash an apocalypse out of a science fiction movie! Stop being so uncharitable! Yes, there are problems that will need to get solved with AI like any new technology, but I'm aware of that and fully support that work. Invoking these world-ending AI takeover scenarios only feeds misinformed fears among the public that are based on emotion and pop-culture rather than science. It shouldn't be my job to fix this. Rather, people should learn to trust that the experts understand the science behind these systems, and should trust that those experts can solve the legitimate problems that do come up.
I'm extremely sympathetic to this reaction. It's basically the same one I had in the "AutoML" conversation. Nevertheless, I'm writing a blog about the possible negative impact artificial intelligence could have on the future. Why?
For one, I think concerns about what AI can or will do in the future are much more debatable than the definition of the term "AutoML". But, even if you believe the optimistic perspective is pretty compelling, I think the conclusion that I came to in my story still applies here. If one nuclear physicist told me that a new power plant design was safe, and another told me that it could potentially lead to nuclear winter, I would be very concerned about using that design to build a power plant. Even if I thought the arguments that the design was safe seemed a lot more convincing, the amount of certainty for something so important should probably be higher than "convincing". It's hard to meet this burden if experts in the field don't agree.
Russell also had a debate on this topic with yet another expert in the space, Melanie Mitchell, with Russell arguing the pessimistic view and Mitchell the optimistic one. One of the arguments that Mitchell presents is that the level of AI that Russell is worried about isn't going to come any time soon, comparing the current scientific understanding of "AI" to alchemy[4]. I agree that there is a lack of fundamental understanding of how machine learning systems work. At the same time, it seems like large and powerful organizations that employ highly proficient machine learning researchers and engineers are claiming that they are going to create "human-level AI". As Mitchell writes in her paper "Why AI is Harder Than We Think"[5]:
In surveys of AI researchers carried out in 2016 and 2018, the median prediction of those surveyed gave a 50 percent chance that human-level AI would be created by 2040–2060, though there was much variance of opinion, both for sooner and later estimates. Even some of the most well-known AI experts and entrepreneurs are in accord. Stuart Russell, co-author of a widely used textbook on AI, predicts that "superintelligent AI" will "probably happen in the lifetime of my children" and Sam Altman, CEO of the AI company OpenAI, predicts that within decades, computer programs "will do almost everything, including making new scientific discoveries that will expand our concept of 'everything.'" Shane Legg, co-founder of Google DeepMind, predicted in 2008 that, "Human level AI will be passed in the mid-2020s", and Facebook’s CEO, Mark Zuckerberg, declared in 2015 that "One of [Facebook’s] goals for the next five to 10 years is to basically get better than human level at all of the primary human senses: vision, hearing, language, general cognition".
I think Mitchell's perspective on the situation is that worrying about "superintelligent AI" is like worrying about the ability to turn lead into gold. Alchemists can't actually turn lead into gold, so these worries are just wild speculation. But now imagine that major banking and financial institutions are constantly hyping up alchemy and talking about how it's going to revolutionize their industries. I think it would be reasonable to have the opposite perspective: the fact that we lack a deep understanding of how alchemy works, while industry is trying to push massive changes based on the field, is actually a reason to be concerned. If experts disagree about whether a thing could happen, as a member of the public I don't want to rely on that thing never happening as the central solution to possible bad outcomes. Additionally, even if the thing never happens, all the powerful people who are overconfident in their ability to do the thing might cause a lot of harm by prematurely making a lot of changes based on their overly rosy view.
I think I agree with a lot of what Mitchell says, but come to a somewhat[6] opposite conclusion. Facebook or Google or whoever planning to deploy systems based on alchemy-level science in their products makes me more worried, not less. I think it suggests that these companies should convince neutral observers that they've actually moved beyond alchemy and into chemistry before they attempt to deploy these systems.
Dystopian Analogies
This is where I think dystopian analogies (like the Terminator reference in the article mentioned above) are playing a big role in how people view the issue of AI safety. I think part of what is going on is that people like LeCun and Mitchell see these arguments that sound like they are coming out of a Hollywood movie instead of a scientific journal and feel like they need to push back against them. And to an extent, I can't blame them! It does seem like a lot of the arguments that safety pessimists find the most powerful relate to imagining what a world with extremely powerful (sometimes "superintelligent") artificial agents would look like. Here's an example from Nick Bostrom's Superintelligence:
An AI, given the final goal of evaluating the Riemann Hypothesis, pursues this goal by transforming the Solar System into "computronium" (physical resources arranged in a way that is optimized for computation)—including the atoms in the bodies of whomever once cared about the answer.
I think hypotheticals like this can be useful from the pessimist side for examining intuitions about how a very powerful artificial agent might behave. This particular case is meant to demonstrate the idea of instrumental convergence. I think many people find that these types of examples help to communicate the intuitions behind AI safety pessimism.
On the other hand, I can see why people might have a different gut reaction to hearing these hypotheticals that makes them skeptical about pessimist arguments. I'm pretty sympathetic to the pessimist view, but I do get Sci-Fi vibes sometimes when I hear arguments like the one above. I mean, the world being turned into "computronium" is objectively a pretty Sci-Fi scenario!
I can see why an AI safety optimist would find these arguments frustrating. You invented this entirely speculative Hollywood-sounding future that only exists in your mind, and you're mad that people in the field haven't immediately decided that the most critical thing in the world is to address it? The reason most researchers aren't working on these issues is that they don't exist yet! If and when they become real issues, experts who are actually building these systems will come up with practical solutions, just like every technology ever. Just like my "AutoML" story, why should people on the sidelines who don't understand real ML systems get to dictate to the true experts?
In my view, the effectiveness of examples like the one above depends on the reader sharing certain intuitions in response to the example. Just like a joke can "land" with some people and not others, these arguments can trigger an intuition in some people but not others. When they do trigger those intuitions I think they are very helpful, but when they don't "land" it can end up with people having a reaction like the one in the previous paragraph. I'll offer one final example from a discussion between computer scientist Scott Aaronson and psychologist Steven Pinker that I think really expresses the exasperation that some people feel. Pinker says:
All this is relevant to AI safety. I’m all for safety, but I worry that the dazzling intellectual capital being invested in the topic will not make us any safer if it begins with a woolly conception of intelligence as a kind of wonder stuff that you can have in different amounts. It leads to unhelpful analogies, like "exponential increase in the number of infectious people during a pandemic" ≈ "exponential increase in intelligence in AI systems." It encourages other questionable extrapolations from the human case, such as imagining that an intelligent tool will develop an alpha-male lust for domination. Worst of all, it may encourage misconceptions of AI risk itself, particularly the standard scenario in which a hypothetical future AGI is given some preposterously generic single goal such as "cure cancer" or "make people happy" and theorists fret about the hilarious collateral damage that would ensue.
I think this is a genuinely difficult communication problem. People who are skeptical about AI-risk understandably want a concrete description of what the concerns are. The problem is, the concerns are about the future, so whenever safety pessimists give an example of how bad things could happen it inevitably sounds like rank speculation. I want to try a different type of scenario that I think preserves the "dystopian" vibes that pessimists want in order to appropriately communicate their level of concern, while also perhaps hitting on some different intuitions that might help people understand the pessimistic point of view.
The Trial vs The Terminator
I first came across this analogy in the different context of digital privacy. The idea comes from this paper addressing the so-called "I've got nothing to hide" argument. The author makes the point that people often compare actual or potential privacy violations to George Orwell’s 1984, but Franz Kafka's The Trial is an equally enlightening comparison in terms of understanding the value of privacy. From the paper:
In my work on conceptualizing privacy thus far, I have attempted to lay the groundwork for a pluralistic understanding of privacy. In some works, I have attempted to analyze specific privacy issues, trying to better articulate the nature of the problems. For example, in my book, The Digital Person, I argued that the collection and use of personal information in databases presents a different set of problems than government surveillance. Many commentators had been using the metaphor of George Orwell’s 1984 to describe the problems created by the collection and use of personal data. I contended that the Orwell metaphor, which focuses on the harms of surveillance (such as inhibition and social control) might be apt to describe law enforcement’s monitoring of citizens. But much of the data gathered in computer databases is not particularly sensitive, such as one’s race, birth date, gender, address, or marital status. Many people do not care about concealing the hotels they stay at, the cars they own or rent, or the kind of beverages they drink. People often do not take many steps to keep such information secret. Frequently, though not always, people’s activities would not be inhibited if others knew this information.
I suggested a different metaphor to capture the problems: Franz Kafka’s The Trial, which depicts a bureaucracy with inscrutable purposes that uses people’s information to make important decisions about them, yet denies the people the ability to participate in how their information is used. The problems captured by the Kafka metaphor are of a different sort than the problems caused by surveillance. They often do not result in inhibition or chilling. Instead, they are problems of information processing—the storage, use, or analysis of data—rather than information collection. They affect the power relationships between people and the institutions of the modern state. They not only frustrate the individual by creating a sense of helplessness and powerlessness, but they also affect social structure by altering the kind of relationships people have with the institutions that make important decisions about their lives.
I think a very similar analysis applies in the context of AI safety[7]. Instead of SkyNet taking over the world or a superintelligence using all our atoms to make computronium, imagine a world where algorithmic decisions are ubiquitous, but their reasoning is opaque to humans. In your day-to-day life you're constantly interacting with advanced AI systems. Sometimes these work fine, but sometimes they do something like deny you access to something or punish you in some way, without any explanation. No one seems to really understand how these AI systems work, but the populace accepts their existence because society has become dependent on them and developers at least claim that they've done internal testing which shows the systems are accurate.
Now, this description is a lot less concrete and easy to imagine than some of the other scenarios that have been proposed. That's intentional. I think people are sometimes skeptical of scenarios that seem unreasonably specific. We are talking about future events that relate to a technology we don't understand very well. That makes it inherently hard to give a concrete description of what could or will happen, but in the framing that I've given, I think this uncertainty should make us more worried, not less. Instead of trying to articulate real events that I think could happen, I'm trying to convey the intuition for why we should be concerned. The "vibe" of the thing, you might say. The fear and discomfort of having your life controlled by some alien system that you don't understand.
Artificial vs Intelligence
One way to think about this is that a lot of the "superintelligence" scenarios focus on the intelligence part of AI, with the assumption that intelligence leads to scary levels of power. This framing places emphasis on the idea of instrumental convergence. In contrast, the framing that I'm proposing focuses on the artificial part of AI, with the assumption that artificiality leads to scary levels of weirdness or uncertainty. This framing places emphasis on the so-called orthogonality thesis. I think both are useful, but incomplete. As a result, I think using them in tandem could work well, and in particular, some people may be more drawn to one than the other. If so, then I'm hopeful that this artificiality framing can bridge some of the gap between AI safety optimists and pessimists.
Attempting to do some of that gap-bridging, I think once you consider this new framing a lot of the arguments for why we shouldn't be concerned about AI get flipped on their heads. AI isn't subject to the pressures of evolution? That's the problem! Institutions like social norms, laws and governments were built for humans, including all the quirks that evolution built us with. A powerful technology that those institutions aren't prepared to handle can cause great harm without having some type of will towards "dominance". Invoking the "dominance" terminology is itself making the error of anthropomorphizing ML systems. For example, social media can have a huge influence in people's lives without having some type of "intention" to dominate. The experience of loss of control comes from having to interact with an incomprehensible system, not from an explicit malicious intent.
How about the argument that these concerns are pure Sci-Fi, hundreds of years away if they ever become relevant at all? A "superintelligence" turning the world into "computronium" sounds extremely Sci-Fi, but a system so complex that it feels exhausting and insanity-inducing to interact with seems like it already exists! As in the social media example above, I think people already have these types of feelings now with currently existing technology, not to mention human bureaucracies like the legal or financial systems. As machine learning gets more powerful and more difficult to understand, the problem only intensifies. True, the problems I'm worried about are mostly in the future, but that doesn't mean there aren't existing problems that are similar in some ways. So, while my main concerns really are related to ML systems and not social media or tech in general, I think we can also see problems that exist today and imagine how they could be exacerbated or combined with new problems resulting from AI in detrimental ways.
Does this framing underestimate the risk?
I think some AI safety pessimists may dislike this framing because they feel it underestimates how dangerous AI could be. After all, I'm comparing to things like social media; doesn't that mean I'm saying the risk is only "normal" levels of bad, instead of the apocalyptic levels of bad that many pessimists believe is the case? I would say no. I think it is true that some pessimists are more confident than I am that AI will definitely be really, really bad unless major efforts are made to stop that result. But I do think those really, really bad outcomes are a relevant risk that should not be ignored. After all, the main character dies at the end of The Trial. My framing definitely doesn't place the possibility of world-ending disaster center stage like some superintelligence examples, but I'm hopeful that it can convey a broad range of risks, from more "normal" ones all the way up to world-ending ones.
There is a trade-off in emphasis here. I think my read on the risks involved may be different from some pessimists', and as a result I may be more willing to trade off some emphasis on big risks for a higher likelihood of pessimist-optimist gap-bridging. I can see why someone who thinks hammering home world-ending risks is the absolute most important thing would oppose my framing, but I want to make clear that I'm not trying to pull some type of bait and switch. I'm most definitely not saying that the main risks are from social media or currently existing AI systems or more mainstream worries like AI-induced unemployment[8]. Rather, I'm imagining the risks on a spectrum where progressively more powerful AI makes people feel more and more helpless and disempowered. I think the worst versions of that are less "in-your-face" than imagining the literal end of the world[9], but I am hopeful that being less "in-your-face" will actually be more helpful in some conversations.
Likewise, I think some optimists may feel I'm trying to pull a bait and switch on them, by replacing outlandish Sci-Fi scenarios with something more similar to issues presented by existing systems. As I say above, I explicitly am not saying that known or high profile suspected problems with existing systems are the main issue I am concerned with. To the extent that my framing sounds similar to those issues, I would suggest that it's because we can imagine related issues coming up that are similar but also much more challenging to address as ML systems become more powerful.
Practical Conclusions
I think we can draw some practical conclusions based on this new framing. In particular, the big take-away that I'm arguing for is that under this framing, it makes a lot of sense that developers of AI/ML systems should have a burden to explain how their systems work and to show that they are safe. I'm not saying they have to prove beyond a reasonable doubt that a system can never cause any harm, but I think if you are planning to use a new technology whose impacts are highly uncertain, you need to at least show to a reasonable level of confidence that you understand how that system works on a deep level. By "on a deep level", I mean that showing good test set performance isn't enough. It might be necessary, but it's not sufficient to claim that you understand a model "on a deep level". You need a more mechanistic understanding of how it maps inputs to outputs, how it will behave in environments that may not be reflected in its training or evaluation data, and probably many more things as well. I think this follows pretty naturally from my proposed framing. You don't need to directly address every possible scenario people might come up with relating to risks of your model (although doing so isn't bad!), but you do need to have a high enough level of understanding of how your model works to justify making broader conclusions about its safety and effectiveness. You need to prove you are doing chemistry and not alchemy. I think that is a reasonable goal that hopefully makes sense to both AI safety optimists and pessimists.
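To make the "test set performance isn't enough" point a little more tangible, here's a toy sketch. Everything in it is made up for illustration (a synthetic dataset, an artificial shift, scikit-learn's logistic regression); it's not a claim about any particular deployed system. It shows how a model can look fine on held-out data drawn from its training distribution and still fall apart once the environment drifts in a way the evaluation never probed, which is exactly the kind of behavior a more mechanistic understanding would need to anticipate.

```python
# Toy illustration: good in-distribution test accuracy says little about
# behavior in a shifted environment. The data and the "shift" are artificial
# constructions for this sketch, not a real deployment scenario.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Training distribution: two features, only the first one actually matters.
X = rng.normal(size=(2000, 2))
y = (X[:, 0] + 0.1 * rng.normal(size=2000) > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression().fit(X_train, y_train)
print("in-distribution test accuracy:", round(model.score(X_test, y_test), 3))

# Deployment-like drift: the informative feature's scale and offset change.
X_shifted = X_test.copy()
X_shifted[:, 0] = 0.2 * X_shifted[:, 0] - 1.0
print("shifted-environment accuracy:", round(model.score(X_shifted, y_test), 3))
```

By construction, the first number should come out close to perfect while the second drops toward chance. Nothing in the original test set would have warned you about that gap; only some understanding of which features the model leans on, and how the deployment environment might move them, would.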
"AutoML" wasn't the actual terminology in question, but it gives the spirit of the discussion. ↩︎
I usually like to think about "advanced machine learning systems" instead of "AI", but the term "AI" is extremely common in this context, which is why I sometimes use it in scare-quotes. ↩︎
This description, like all the other places in this post where I summarize the views of these two groups, is a generalization. These summaries probably don't represent all the nuances of opinion on either side, nor the diversity of beliefs that the two sides contain. Nevertheless, I think they are reasonably fair summaries of both sides. I'm also trying to strike a balance between giving relevant examples to make the ideas clear and not calling out or characterizing the views of individual people too much. Please don't interpret these summaries either as applying to all people in either camp or as descriptions of the views of particular people. ↩︎
This is discussed at around 31:00 in the audio on the Munk Debates page. ↩︎
I've removed internal citations from this quote. ↩︎
I say "somewhat" because my understanding is that Mitchell does think there are things about AI/ML that are reasonably concerning. She says as much in the debate with Russell, so I don't want to be unfair or misrepresent her position. However, I do think that part of her argument is that AI being "harder than we think" should temper our concerns somewhat. I disagree with that conclusion. ↩︎
I propose The Terminator as the comparison to The Trial in the context of AI safety, but there's definitely no shortage of 1984 comparisons to "AI" out there as well; I just think they tend to focus on different issues. As a result, I think The Terminator is more similar to the "superintelligent AI" scenarios I have in mind. See here and here for commentary by others about the analogy between The Terminator and AI safety. ↩︎
Although to be clear, I don't think these worries are unreasonable. Many of them are extremely reasonable! I just think that there are even bigger risks that are less concrete/obvious now that will become a big deal in the future. ↩︎
I also think that my framing can include "the literal end of the world" as well, it just won't be as obvious. Kind of like the difference between "imagine hearing a loud sound and seeing a big flash of light" vs "imagine a nuclear war between the US and Russia where a bunch of major cities are hit". ↩︎