This is a collaborative post between David Scott Krueger and Otto Barten. David is University Assistant Professor in Machine Learning at the University of Cambridge, focusing on Deep Learning, AI Alignment, and AI safety. He is also a research affiliate of the Centre for the Study of Existential Risk (CSER). Otto is director of the Existential Risk Observatory.
“A computer would deserve to be called intelligent if it could deceive a human into believing that it was human,” Alan Turing wrote in 1950, defining his now-famous Turing test. Google’s new LaMDA model has not quite achieved this, but it did convince Google engineer Blake Lemoine of its sentience last month, sparking significant publicity. Such a pronouncement might have seemed laughable only a few years ago. But rapid progress in AI has generated some uneasiness.
In science fiction, sentience tends to portend an AI system “going rogue”: no sooner will the AI become self-aware than it begins to disobey its human masters, seeking dominion over us. In reality, however, the careless development and deployment of AI poses risks to humanity regardless of whether these systems are considered sentient. Unfortunately, these risks are overlooked and dismissed by those racing to develop more advanced models.
Researchers studying the risks of advanced AI—at Oxford’s Future of Humanity Institute and Cambridge’s Centre for the Study of Existential Risk, for example—have expressed clear concern that out-of-control AI poses an existential risk to humanity. Oxford academic Toby Ord estimates a ten percent chance of AI-induced human extinction. But the fear that AI could indeed seize power arises not from its sentience, but merely from its competence.
In the last five years, large-scale deep learning has produced AI systems that can write convincing essays and computer programs, synthesise surreal scenes in photorealistic detail, and trounce human champions of video and board games, including Go. This rapid progress has led many to wonder: how long will it be until AI can outperform us at any cognitive task? As Geoff Hinton, the “godfather” of deep learning, has warned: “There is not a good track record of less intelligent things controlling things of greater intelligence.”
Beyond such common-sense arguments, researchers have identified several reasons for concern. First, AI systems are built by searching through a gigantic set of poorly understood behaviours. Even their programmers can’t predict what they will do in new situations — after a Google AI system tagged a photo of Black people as “gorillas”, the only sufficiently reliable fix was to remove the gorilla tag entirely. Second, this search is driven by simple metrics, exemplifying “ends justify the means” reasoning. AI systems exploit loopholes like a clever accountant, with a sociopath’s indifference to harmful externalities overlooked by the metric. Having spent millennia seeking to clarify and codify our values, humans now run the risk of consigning the future to machines optimising for advertising revenue.
Fortunately, the research community is increasingly embracing the need to align AI systems’ behaviours with the intentions of designers and users. Still, no definitive solution for ensuring AI stays under human control exists. Whether sentient or not, a cunning AI might thwart our best efforts at human oversight by subterfuge or manipulation. LaMDA almost certainly was not trying to win Lemoine’s empathy — it was trained to predict text, not influence people. That it was able to convince him of its sentience without even trying raises the question: what if it were programmed to persuade?
AI risk sceptics, like Yann LeCun, Meta’s chief AI scientist, often argue that human-level AI is still decades away. But far from being reassuring, such predictions in fact highlight the urgency of addressing these risks: an AI takeover scenario could very well occur within our lifetimes. This does not have to look like a sudden sci-fi coup. AI alignment researcher Paul Christiano argues that the metrics-driven nature of AI “could cause a slow-rolling catastrophe”, as what we want and what we measure peel apart over time. So far, humanity has not tackled climate change, despite decades of advance notice; why should AI risk be any different?
Moreover, AI is an increasingly central part of tech giants’ strategies. If serious restrictions on AI development become warranted, big tech might adopt the playbook of the fossil fuel industry. Already, nearly all AI researchers receive or have received funding from big tech companies. Lemoine’s suspension should be considered in the context of Google’s recent decision to “resignate” AI ethics researcher Timnit Gebru after she wrote a paper about the risks of building ever-larger AI systems, which Google is enthusiastically doing. In a lesser-known incident, Google employees were prohibited from publishing work on aligning recommender systems. This is despite researchers’ warnings that the AI powering YouTube’s recommendations could be incentivized to manipulate user interests — say to favour more extreme content — in order to keep them watching.
Such research needs government support to counteract corporate censorship. Companies can also do a better job in-house, but may need encouragement from policymakers. Assessing whether an AI system is safe to deploy requires both top-notch technical expertise and the proper incentives. Furthermore, AI developers are rarely equipped to study and understand the impact these systems have on users. Ultimately, addressing AI harms and risks may be less of a technical problem and more a matter of coordinating to enforce sensible standards.
Meanwhile, despite media attention, topics like existential risk — and indeed, AI sentience — remain neglected by an AI research community wary of hype. Only 10 years ago, even the term “artificial intelligence” was somewhat taboo. But the development of human-level AI, and its risks, are too important not to talk about. Whether it will turn out to be the best or the worst thing humanity has ever achieved remains disturbingly uncertain.