On April 12, 2024, the Existential Risk Observatory organized an evening at Pakhuis de Zwijger about new developments in AI. Otto Barten (director of ERO) gave a short introduction to existential risks, in particular those of human-level artificial intelligence (AGI), in front of a sold-out large hall.

Large language models such as ChatGPT and Bing Chat suddenly seem to bring a future with AGI closer. Is that really the case? And what happens if AI can perform any cognitive task as well as we can? Prof. Stuart Russell (UC Berkeley) attempted to answer these questions in his lecture. Below are some highlights.
Russell began by listing the potential benefits of AI, such as increased GDP, better and more individualized health care, personalized education for every child, accelerated scientific progress, and much more. Then he asked the question: “Where are we now?”

Due to the development of LLMs (large language models), some people believe that human-level AI already exists, but Russell doesn’t think that is true. The LLMs may be a piece of the AGI puzzle, but we don’t yet know what shape the puzzle has, how this piece fits in, or what the other pieces are. LLMs do not understand the world around them, and they cannot plan at multiple levels of abstraction the way humans can. Future systems, however, may suffer far less from these limitations, and the missing pieces could fall into place very soon. With the rapid development of LLMs we are already seeing sparks of what AGI could be. And where there are sparks, a fire can suddenly start.

Russell quoted Alan Turing, who said in a 1951 lecture: “It seems probable that once the machine thinking method had started, it would not take long to outstrip our feeble powers. […] At some stage therefore we should have to expect the machines to take control.”

So it could be, says Russell, that as we make AI better and better, outcomes only get worse. Russell thinks things could go wrong because of a phenomenon called misalignment: if we do not describe completely and correctly what we want from AI, we create a mismatch, a misalignment, between what we want the future to look like and the objective that the machines are actually pursuing.

An example where we have already seen this go wrong is the recommendation algorithms of social media. The algorithms that determine what millions of people see and read every day have the goal of generating as many clicks as possible, which is a different goal than the user has. These algorithms can gradually change social media users: people become more extreme versions of themselves, and societies polarize as a result. Today’s social media algorithms are relatively simple; if “better” algorithms were used that can manipulate people even more effectively, the consequences of social media use would be far more serious than they are now. “So we need machines that have a beneficial effect on humanity, not machines that are just intelligent.

What are we going to make those machines do?

  1. The machines are constitutionally required to act in your best interest.
  2. The machines are explicitly uncertain about what those human interests are. They know that they don’t know what the goal is.

These two principles together would provide a solution to the control problem of AI. The mathematical version of this is called the assistance game.”
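For readers who want a bit more detail: the sketch below is not from the lecture itself, but gives a minimal, illustrative formulation of the assistance-game idea, loosely following the cooperative inverse reinforcement learning (CIRL) setting of Hadfield-Menell et al. (2016). The notation here is our own and is only meant to convey the structure.

```latex
% Minimal sketch of an assistance game, loosely following the CIRL
% formulation (Hadfield-Menell et al., 2016). Notation is illustrative.
% Two players, a human H and a machine R, share the same reward function,
% but only the human knows its parameter \theta (what people actually want):
\[
  \mathcal{M} = \bigl\langle S,\; \{A^{H}, A^{R}\},\; T,\; \Theta,\;
  R(s, a^{H}, a^{R}; \theta),\; P_{0},\; \gamma \bigr\rangle .
\]
% The machine starts from a prior belief b_{0}(\theta) = P_{0}(\theta) and
% picks a policy that maximizes *expected* human reward under that belief,
% updating the belief as it observes human behaviour:
\[
  \pi^{R} \in \arg\max_{\pi}\;
  \mathbb{E}_{\theta \sim b(\theta)}\!
  \left[ \sum_{t} \gamma^{t}\, R(s_{t}, a^{H}_{t}, a^{R}_{t}; \theta) \right].
\]
% Because the machine never assumes it knows \theta exactly, it has an
% incentive to ask questions, defer to humans, and accept being switched off.
```

The key design choice this formalizes is the second principle above: the machine’s uncertainty about the true objective is what keeps it corrigible.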

Back to the LLMs, the sparks of AGI that can start fires. We have no experience with entities that have “read” every book humanity has ever written. We do not yet know whether an LLM is flexible enough to step effectively outside its training data. One thing we do know about machine learning is that the best solutions are typically found in and by the model itself, without human intervention. So it is not the case that an AI system simply learns what people’s goals are; it forms internal goals of its own. “I asked Microsoft’s Sébastien Bubeck if the LLMs have internal goals and the answer was, ‘We have no idea.’ We should be concerned about that,” said Russell.

And if an AI does have a goal, we don’t know whether it is the ‘right’ goal, or whether the path by which that goal is achieved will be beneficial to humanity. If we as humans want to limit climate change, we know we shouldn’t do so by removing all the oxygen from the air; but an AI that does not understand the world around it could end up pursuing such a goal in exactly that way.

Since no one knows whether LLMs have goals, we also don’t know whether they will pursue them. LLMs are now being deployed without basic safety criteria for humanity having been established. Hence the open letter calling for a six-month pause on the development of LLMs more powerful than GPT-4. “This is fully in line with the agreements signed by all governments of developed Western economies. Nothing should be running on a computer that we are not certain is safe.”

Later, during the Q&A, it was noted that we have seen examples of the enormous damage nuclear weapons can cause, but not yet of AI, which leads many people to underestimate the risks.
In the second half of the evening Mark Brakel (FLI), Lammert van Raan (PvdD), Queeny Rajkowski (VVD) and others discussed AGI further. Maarten Gehem (Argumentenfabriek) moderated.