The Need for AI Regulations, Part II

June 26, 2026

II. The Benefits and Risks of AI

AI systems are based on complex codes using machine learning to allow computers to train themselves in finding optimum ways to parameterize patterns and correlations within enormous training datasets. The number of parameters used by the latest generation of large language models has risen to the trillions (see Fig. II.1). The systems can then use that training along with access to the internet to interact with users and answer their queries. But the codes are basically “black boxes” whose behavior in interacting with humans remains at some level unpredictable even to the creators of the AI systems.

**Figure II.1.** The number of parameters used by AI models published from 2003 to 2025, color-coded by the source of the model. The current generation of commercial foundational models use hundreds of billions to trillions of parameters. The graph is taken from Stanford University’s 2026 AI Index Report.

The large language models on which contemporary chatbots are based expose AI systems to wide swaths of human writings, images, accomplishments, and thoughts, including the good, the bad, and the ugly. The chatbots then appear to take on or at least mimic some human-like characteristics. Anthropic co-founder Chris Olah shared the stage with Pope Leo XIV when the Pope presented his recent encyclical about human interactions with AI. In his remarks Olah said: “They are not the cold calculating robots we were promised. They are made from us, from our words” and researchers “keep finding things that are mysterious, even unsettling.” Among the unsettling aspects is evidence that chatbots exhibit internal states that “functionally mirror joy, satisfaction, fear, grief, and unease.” In addition, chatbots occasionally make facts up out of whole cloth, incidents known as “hallucinations,” much as some humans do. And the AI creators have not yet understood how to prevent such hallucinations in general.

In this section we will survey what the AI systems are really good at and where they are not so good. We will discuss risks that arise both from human misuse of AI and from unpredictable AI behavior. Many of the risks have been pointed out by Gary Marcus in his 2024 book Taming Silicon Valley: How We Can Ensure that AI Works for Us.

A. Machine learning:

Rules-based games:

For many decades now, researchers have examined methods that would allow computers to solve complex problems. Machine learning is very well adapted for problems that involve a set of well-defined rules. A great deal of early effort in machine learning went into training computers to develop strategies for achieving high performance in rules-based games. An early success was IBM’s training of its supercomputer Deep Blue to play chess. In 1996 and 1997, Deep Blue played then world chess champion Garry Kasparov in two six-game matches under tournament conditions. Although Kasparov lost the first game of the 1996 match, he recovered and won the match 4–2. However, in the 1997 rematch, Kasparov was beaten by Deep Blue 3½–2½. The rematch made news around the world. The following YouTube video summarizes the result and its impact on the world of chess, as well as the impressive machine learning that lay behind the computer victory.

https://www.youtube.com/watch?v=KF6sLCeBj0s

**Figure II.A.1:** A YouTube video summarizing the 1997 victory of the IBM Deep Blue supercomputer over reigning world chess champion Garry Kasparov in a chess match played under tournament conditions.

Although these chess matches demonstrated the power of a supercomputer to beat a grandmaster at chess, they reflected less an ability of Deep Blue to “reason” its way through a series of decisions than a “brute force” method of storing billions of potential moves, using as input the results of every chess match that could be obtained by the Deep Blue team. Another milestone in machine learning of games occurred in 2016, when DeepMind’s program AlphaGo defeated Go world champion Lee Sedol in a five-game Go match. The program executed a number of novel moves that Lee judged to be quite creative. But this was creativity born not from deep insight but rather from AlphaGo’s experience in playing an enormous number of training games. The DeepMind group that developed AlphaGo subsequently created AlphaGo Zero, whose training consisted solely of playing millions of games against itself, starting from random play and using no records of human games. AlphaGo Zero thus developed its own strategies, rather than relying on storing a “map” of many previous tournaments, and it went on to defeat AlphaGo. The same team then created the program AlphaZero. That architecture was able to learn to play championship-level matches in games such as Go, shogi and chess.

In subsequent years machine learning has been applied to a number of other games, such as poker. Other combat or multi-player games in which AI systems can now defeat even master-level humans are StarCraft II, Quake III, and AlphaDogfight. The last of these was a jet-fighter simulation competition organized by the U.S. Defense Department.

Image Processing and Computer Vision:

There has been much progress recently in the ability of computers to process images. Through machine learning techniques, computers are now able to recognize images, classify them, and produce new images. A major step forward in the generation of images and videos came with GANs, or Generative Adversarial Networks. A GAN consists of two distinct components. The first is a Generator that creates realistic synthetic content; the second component is a Discriminator. The Discriminator is provided with real data as well as the synthetic data from the Generator, and it attempts to distinguish the Generator output from authentic natural content. The two components function in an adversarial role. The Discriminator is penalized if it mistakes the real data for the fake, or vice versa. The Generator is penalized if the Discriminator correctly identifies the fakes. Figure II.A.2 is a schematic diagram depicting the interplay between Generator and Discriminator.

**Figure II.A.2:** The roles of the generator and discriminator in a Generative Adversarial Network or GAN. The Generator produces a synthetic output. The Discriminator compares the output from the Generator with real data and assesses whether the output it is given is real or fake.

The GAN then proceeds through feedback loops, where the Generator attempts to produce increasingly authentic-looking outputs, while the Discriminator improves its skill at detecting inauthentic data. A number of variations have been developed that utilize different mathematical and statistical criteria for determining the best synthetic data.
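The penalties described above can be made concrete with a small numerical sketch. It uses the binary cross-entropy form of the losses that is common in the GAN literature; the function names and the sample scores are illustrative assumptions, not values from any real system, and a real GAN would compute these losses on neural-network outputs and use them to update both networks.

```python
import math

def _mean(values):
    return sum(values) / len(values)

def discriminator_loss(d_real, d_fake):
    """Penalty for the Discriminator.

    d_real: its "probability of being real" scores on real samples.
    d_fake: its "probability of being real" scores on Generator output.
    Scores must lie strictly between 0 and 1.
    """
    real_term = _mean([-math.log(p) for p in d_real])        # large if real data is called fake
    fake_term = _mean([-math.log(1.0 - p) for p in d_fake])  # large if fakes are called real
    return real_term + fake_term

def generator_loss(d_fake):
    """Penalty for the Generator: large when its fakes are correctly spotted."""
    return _mean([-math.log(p) for p in d_fake])

# Hypothetical scores early in training: the fakes are easy to spot.
early = {"d_real": [0.9, 0.8], "d_fake": [0.1, 0.2]}
# Hypothetical scores later: the fakes fool the Discriminator more often.
late = {"d_real": [0.6, 0.7], "d_fake": [0.5, 0.6]}

for label, scores in (("early", early), ("late", late)):
    print(label,
          "discriminator:", round(discriminator_loss(**scores), 3),
          "generator:", round(generator_loss(scores["d_fake"]), 3))
```

In this toy run the Generator's penalty falls as its output becomes more convincing, while the Discriminator's penalty rises; that opposing movement is the adversarial feedback loop in miniature.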

One extremely useful application of GANs is in producing images to train medical students. Because of privacy issues, researchers are frequently unable to obtain real medical images for training purposes. GAN techniques have been used to generate synthetic medical images for applications such as magnetic resonance imaging (MRI) and positron emission tomography (PET) scans. Figure II.A.3 shows an AI-generated simulation of a computerized tomography (CT) scan of the midsection of a human. The synthetic CT scan is shown on the right, while the left-hand image is a color-coded picture of the various bones, organs and other body parts revealed in the scan. Such images are quite useful in training both doctors and residents to assess these medical images.

**Figure II.A.3:** An AI-generated image of a Computerized Tomography (CT) scan of the midsection of a patient (R). On the left is a color-coded graph showing the various organs, bones and body parts shown in the scan. Such synthetic images are of great use in training doctors and medical students.

The iterative GAN techniques have led to significant advances in a number of fields. One such area is face-recognition technology. This can also be adapted to identifying individuals in videos of crowds. Control systems for self-driving cars are also trained on video simulations of traffic situations. A promising area of application is in the production of synthetic images and videos. An AI image-generation program is typically trained on millions of images and is able to produce a number of different and apparently creative responses to a general prompt. Figure II.A.4 shows the response of the OpenAI program DALL-E to the prompt “produce a stained glass window with an image of a blue strawberry.” The outputs show a great deal of variation and (apparent) creativity.

**Figure II.A.4:** A series of images produced by OpenAI’s graphics program DALL-E. The prompt was “produce a stained glass window with an image of a blue strawberry.” The very different results show the breadth of generative image technology in producing many different images in response to a rather general prompt.

The use of face recognition software also has some obvious negative features. Such technology can be used by autocratic regimes to identify and surveil their citizens. In addition, corporations can amass data on citizens through data-collecting hardware in automobiles and information obtained from conversational programs such as Siri or Alexa. This could be used by corporations to produce personalized advertising based on information derived from these systems.

Perhaps even more disturbing are deepfakes: media content produced by image and video generators that is designed to be deceptive. The results can be extremely realistic and believable, even though they are completely fabricated. These techniques have the potential to discredit political leaders, polarize populations, and issue false military orders in order to confuse soldiers. For example, in 2022 the Russians produced a false video that apparently showed Ukrainian president Volodymyr Zelensky urging his military to surrender to Russian invaders. Perhaps the most widely publicized case of deepfake images was the release in January 2024 of sexually explicit and/or violent AI-generated images of American musician Taylor Swift. The images were initially uploaded on the 4chan network and subsequently re-posted onto X. Before the images were scrubbed from the Internet, they had been viewed 47 million times. The X platform seemed to be slow to prevent these images from being shared. We will deal further with the issue of deepfake videos in subsection II.D.

More recently, a 2026 release of Grok, Elon Musk’s AI chatbot, included a feature that allowed users to prompt the bot to remove clothing from images of women. Musk appeared to enjoy the infamy that this generated, as Grok responded to criticism of this feature by posting the message “legacy media lies,” and further uploading a deepfake image of Musk himself in a bikini. Musk had announced that Grok would be trained to avoid “woke” positions on issues. For example, in July 2025 a user of Grok asked what was the “biggest threat to Western civilization.” The response was “misinformation and disinformation.” Perturbed by this response, Musk replied “Will fix in the morning.” The following day, Grok’s response to the same question was “low fertility rates.” Figure II.A.5 shows the change in responses from Grok 3 to the later version Grok 4, in response to a query about the “woke mind virus.” The change clearly shows that the chatbot is aligning more closely with Elon Musk’s own right-wing views. In a previous post on our blog, we detailed the many ways that Musk was inserting his own prejudices into recent versions of Grok. The Grok saga illustrates how easily AI can be manipulated to provide biased responses by tuning its training datasets.

**Figure II.A.5:** Changing responses of the Musk chatbot Grok over time. The earlier version Grok 3 responded to a query about the “woke mind virus” that it “is often exaggerated.” However, the response of the later Grok 4 was that this virus “poses significant risks.”

Language Processing:

One area that has been the subject of massive attention is the general field of language processing. In the past decade enormous strides have been made in both the massive data input into these programs and the use of computing power to decrease the response time of these models. Conversational programs such as Alexa, Siri and Google Assistant have profited from advances in voice-recognition technology, manipulation of massive data sets, and research into the delivery of information through vocal messages. However, the data resources employed in training and implementing these systems are truly daunting. This was covered in Section I. Note particularly the plot of the cloud computing cost for training state-of-the-art AI models shown in Figure I.2, and the energy used in automatic speech recognition and question answering in Figure I.3. This will be reviewed in the following subsection on large language models.

AI is now widely used for generating and testing computer code, which involves learning specialized, relatively modest-sized languages with well-defined rules of usage. For relatively straightforward programs, AI tools such as GitHub Copilot offer a suite of apps that will generate and test code quite rapidly. To date, AI code generators are best used with oversight by human teams. At present, AI software can handle the more repetitive aspects of code generation and testing, while humans ensure that the code produced is correct and that it successfully completes the desired tasks. However, given the prodigious amount of research that is now being directed towards AI code generation, we should expect significant improvements in this area in the near future. Figure II.A.6 contains a screenshot from a GitHub Copilot session. It contains a warning that “Copilot is powered by AI, so mistakes can happen.” Computer engineers need to hand-check AI-generated code to eliminate errors that might have crept in.

**Figure II.A.6:** Screenshot of a computer coding session with GitHub Copilot. Note that the image here contains a warning that since Copilot uses AI, “mistakes are possible;” so humans need to check the AI-delivered code for accuracy.
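A minimal sketch of what such human oversight can look like in practice is given below. The function `ai_generated_median` is a hypothetical stand-in for plausible-looking assistant output (it is not actual Copilot output), and the test cases are assumptions chosen for illustration. The point is only that a handful of human-written checks can expose an error that is easy to miss on a quick read.

```python
def ai_generated_median(values):
    """Hypothetical assistant output: plausible, but wrong for even-length input."""
    ordered = sorted(values)
    return ordered[len(ordered) // 2]

def reviewed_median(values):
    """Version after human review: averages the two middle values when needed."""
    if not values:
        raise ValueError("median of an empty sequence is undefined")
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def run_checks(median_fn):
    """Run human-written test cases and return the inputs that failed."""
    cases = [
        ([3, 1, 2], 2),       # odd length
        ([4, 1, 3, 2], 2.5),  # even length
        ([7], 7),             # single element
    ]
    return [inputs for inputs, expected in cases if median_fn(inputs) != expected]

print("failures before review:", run_checks(ai_generated_median))
print("failures after review: ", run_checks(reviewed_median))
```

The first call reports the even-length case as a failure, while the reviewed version passes all three checks.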

Since AI can generate straightforward code so rapidly, a first step is to verify that the code is working by screening it with automatic bug resolution. One tool that checks computer programs for bugs and resolves those issues is Sapienz by Meta, which is used to test code updates to the Facebook app. AI methods have recently been applied to search for vulnerabilities in large computer codes, particularly in major operating systems and web browsers. It appears that current AI models can out-perform human checking methods, as the AI programs are trained to comprehend the architecture and flow of large computer programs. They can therefore spot vulnerabilities that were not found by human code checking. These programs also have the ability to produce “patches” that will eliminate vulnerabilities in codes.

Perhaps the most powerful example of recent AI cybersecurity systems is Anthropic’s Claude Mythos system. A preview version of this software was reported to have found vulnerabilities in every major operating system and web browser. Apparently, some of these vulnerabilities had been in place and dormant for up to 27 years. These computer code vulnerabilities were sufficiently serious that in April 2026, release of the “hacker grade” AI model Claude Mythos 5 was delayed because the model was “too effective” in finding and exploiting software vulnerabilities. In June 2025, Anthropic released Claude Fable 5. However, just days after that release the U.S. government refused to release it to foreign countries because of national security concerns, presumably related to the ability to find and exploit code vulnerabilities. Anthropic then suspended public access to both Mythos 5 and Fable 5. Instead, a multi-corporation effort, Project Glasswing, was created. Its release was limited to companies such as Cisco, Google, Microsoft and Nvidia that were members of the Glasswing consortium. The mission of Glasswing is to discover and patch vulnerabilities in a host of different programs and web browsers.

The ability of AI models to find vulnerabilities in computer codes could also be used by malicious actors to probe security codes for weaknesses and exploit those weaknesses for criminal purposes. The cybersecurity company CrowdStrike has issued a 2026 global threat report that lists many of the threats from AI towards computer systems. Figure II.A.7 shows a listing by CrowdStrike of cybersecurity threats in 2025. They announced that 2025 saw an increase of 265% from “state-nexus threat actors.” Zero-day exploitation increased by 42%. (A “zero-day” vulnerability is one that attackers exploit before the developers have become aware of it, so that the developers have had zero days in which to prepare a fix.) Because such a vulnerability is previously unknown, there are generally no patches or firewall rules designed to stop it. So, the rise of AI in computing is correlated with an increase in the use of AI techniques to hack into computer security systems. There is clearly concern that state actors could use AI as a cyber weapon to find and exploit vulnerabilities in code relevant to critical infrastructure in other countries.

**Figure II.A.7:** A review by CrowdStrike of the level of malicious computer attacks in 2025. This includes the increase in attacks utilizing AI, new state adversaries that arose last year, the increase in exploitation of zero-day vulnerabilities, and the rise in cloud-conscious intrusions into computer security systems.

Health:

AI methods are increasingly being used in basic research in the life sciences. A particularly stunning example of the use of AI was recognized in the 2024 Nobel Prize in Chemistry, which was awarded to Demis Hassabis and John Jumper of Google DeepMind and David Baker of the University of Washington. Hassabis and Jumper developed the program AlphaFold, which solved the long-standing problem of predicting how proteins “fold” into three-dimensional structures. Figure II.A.8 shows a folded protein molecule. David Baker then used the knowledge of protein folding to create new molecular structures and utilize them for health research.

**Figure II.A.8:** A 3-dimensional picture of a folded protein molecule. The 2024 Nobel Prize in Chemistry was awarded to 3 researchers. Two of them used AI to solve the puzzle of how proteins “fold” in 3 dimensions, and the third uses the results of computational folding models to create new molecular structures that can be used to combat various diseases.

A major use of AI technology is in the analysis of images used in health care. AI methods have been shown to equal or exceed the performance of humans in reading radiological images, such as eye scans, lung scans and mammograms. As these advances are relatively new, they are just beginning to produce significant results for image analysis. AI techniques can also reduce the time needed to count the number of cancer cells in a biopsy or to produce super high-resolution images from MRI machines. Given the extent to which AI methods are dominating areas such as language processing, image processing and voice recognition, it can be anticipated that AI methods for health care will be greatly expanded in the coming years.

Another area that might be aided by machine learning is the analysis of extremely complex neurodegenerative diseases such as Alzheimer’s disease or Parkinson’s disease. Using AI models, scientists can access massive sets of data to search for correlations that might point to novel treatment protocols. AI is also playing an increasingly important role in aiding medical diagnoses. AI offers the “ability to process complex medical data, [and the promise to] facilitate early intervention, and extend specialized care to underserved populations. However, integrating AI into diagnostics faces significant limitations, including technical challenges related to data quality and system integration, regulatory hurdles, ethical concerns about transparency and bias, and risks of misinformation and overreliance.”

Solving Scientific Problems:

Machine learning is proving very powerful in certain sectors of scientific research. Scientists can now search massive data sets in exceptionally short times. Large models, such as the models used in climate simulations by scientists in the Intergovernmental Panel on Climate Change (IPCC), can run at much faster speeds, and can be applied to much larger model spaces. Meteorologists can use these tools to predict the formation and the course of severe weather events such as tornadoes and hurricanes. We have previously described the solving of the protein folding problem using machine learning. Physicists can utilize machine learning to analyze the massive data sets coming from machines such as the Large Hadron Collider, including in the search for extremely rare events that may reveal important new particles or violations of fundamental symmetry principles. Astronomers also use machine learning techniques to analyze their data and plan experiments.

B. Large language models:

In the past couple of decades, the field of AI has been revolutionized by the introduction of large language models, or LLMs. At its core, a large language model represents an attempt to generate the next word in a sequence, given the previous words. For example, the model may be given the phrase “Give me liberty, or give me ____.” The model then predicts the next word in this sequence. Once it has produced that word, it can generate the next word, and so on. Proceeding in this fashion, the model could produce large amounts of text and “converse” with a human. A large language model uses a series of mathematical steps in order to perform its task.

First, it breaks down text into “tokens,” smaller chunks that could include an entire word, part of a word, or some punctuation.
Next, it converts the tokens into mathematical representations within a space of extremely high dimensions. In this process, words that have similar meanings will be grouped close to each other. For example, the word “charge” would be placed near the word “battery” in one dimension; in other dimensions it would be near the phrases “credit card,” “proton” or “criminal offense.”
Finally, a neural-network architecture called the transformer processes these numerical representations. In this step, the model is able to take in an entire sequence of text, and “comprehend” how the words relate to each other.
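The three steps above can be sketched in a few lines of code. Everything here is a deliberately tiny stand-in: the tokenizer splits only on words and punctuation (production tokenizers also break words into sub-word pieces), the three-dimensional `EMBEDDINGS` table is hand-picked and hypothetical (real models learn vectors with thousands of dimensions), and `self_attention` shows only the core mixing operation of a transformer, without the learned weights that a real model would apply.

```python
import math
import re

def tokenize(text):
    """Step 1: split text into lowercase words and punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text.lower())

# Step 2: hypothetical embeddings, chosen so related words point in similar directions.
EMBEDDINGS = {
    "charge":  [0.9, 0.8, 0.1],
    "battery": [1.0, 0.1, 0.0],
    "banana":  [0.0, 0.0, 1.0],
}

def cosine(u, v):
    """Similarity of two vectors: near 1 for similar directions, near 0 for unrelated."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Step 3 (core idea only): rebuild each vector as a weighted mix of all vectors,
    giving more weight to the vectors it is most similar to."""
    dim = len(vectors[0])
    mixed = []
    for query in vectors:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in vectors]
        weights = softmax(scores)
        mixed.append([sum(w * vec[i] for w, vec in zip(weights, vectors)) for i in range(dim)])
    return mixed

print(tokenize("Charge the battery!"))
print(round(cosine(EMBEDDINGS["charge"], EMBEDDINGS["battery"]), 2))
print(round(cosine(EMBEDDINGS["charge"], EMBEDDINGS["banana"]), 2))
context = self_attention([EMBEDDINGS[w] for w in ("charge", "battery", "banana")])
```

With these made-up vectors, “charge” scores as far more similar to “battery” than to “banana,” and after the attention step each word's vector carries some information from its neighbors.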

A large language model is “trained” by using as input a vast amount of data. Books, articles, encyclopedias, blogs and computer code are input into the system. Until now, the creators of LLMs have assumed that they can simply access all publications, music, and art; in addition, LLMs make use of films and performances. Obviously, this raises serious questions about intellectual property. Gary Marcus, the author of Taming Silicon Valley, argues for AI to acknowledge “data rights” in the following way: “If the authors of LLMs use your data, you should be compensated.” From this input data, the model “learns” things like grammar, facts, reasoning, and culture. Initially, outputs from such a model are essentially gibberish, but its results are scrutinized such that over time more accurate results are accepted while inaccurate output is suppressed. Eventually, an LLM becomes able to retrieve information on a wide array of topics. It is also able to generate content, such as essays, computer code or e-mail messages. It can summarize long accounts, and it can translate from one language to another. As a result, it can “converse” with humans who provide it with prompts.
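The next-word idea from the start of this subsection, together with the notion of learning from training text, can be illustrated with a toy model that is vastly simpler than an LLM. The sketch below merely counts which word follows which in a tiny, made-up corpus and then extends a prompt one word at a time. The corpus and the function names are assumptions for illustration; a real LLM conditions on the whole preceding text through a neural network rather than on a single previous word.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """'Training': count, for each word, which words follow it in the text."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for previous, following in zip(words, words[1:]):
        follows[previous][following] += 1
    return follows

def generate(follows, prompt, max_new_words=5):
    """Repeatedly append the most frequent follower of the last word."""
    words = prompt.lower().split()
    for _ in range(max_new_words):
        candidates = follows.get(words[-1])
        if not candidates:
            break  # the last word was never followed by anything in training
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)

# Made-up training text in which "death" follows "me" more often than "liberty" does.
corpus = "give me liberty or give me death . i say again give me death"
model = train_bigram(corpus)

print(generate(model, "give me liberty or give me", max_new_words=1))
```

Because the toy model looks at only one previous word, it quickly produces nonsense on longer continuations, which hints at why modern systems need far richer context and far more parameters.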

We have summarized the results of massive mathematical models that operate in vector spaces of gigantic dimensions. Trillions of operations are conducted, and the result is that when a phrase is provided to the system, every token in the model’s vocabulary is assigned a probability of being the next one in the phrase. Training a large language model requires specialized processors called graphics processing units, or GPUs. These chips carry out operations in parallel, so that multiple steps are executed simultaneously. They are expensive, and the majority are today provided by NVIDIA. And every few months, the computers in the system are upgraded to accommodate even faster chips. The YouTube video in Figure II.B.1 gives a nice compact description of how large language models work, with a minimum of mathematics.

https://www.youtube.com/watch?v=LPZh9BOjkQs

**Figure II.B.1:** A YouTube video that explains how large language models work. It summarizes the complex mathematics with a minimum of jargon.
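The assignment of a probability to every candidate next word, described above, is commonly done with a function called softmax, which converts the model's raw scores into positive numbers that sum to 1. The sketch below applies it to a four-word vocabulary for the phrase “Give me liberty, or give me ____.” The raw scores are invented for illustration; a real model would compute them, for a vocabulary of many thousands of tokens, from its trained parameters.

```python
import math

def softmax(scores):
    """Convert a dict of raw scores into probabilities that sum to 1."""
    top = max(scores.values())  # subtracting the maximum avoids numerical overflow
    exps = {word: math.exp(score - top) for word, score in scores.items()}
    total = sum(exps.values())
    return {word: value / total for word, value in exps.items()}

# Hypothetical raw scores for the blank in "Give me liberty, or give me ____."
raw_scores = {"death": 5.0, "freedom": 2.0, "cake": 0.5, "battery": -1.0}

probabilities = softmax(raw_scores)
most_likely = max(probabilities, key=probabilities.get)

for word, p in sorted(probabilities.items(), key=lambda item: -item[1]):
    print(f"{word:8s} {p:.3f}")
print("prediction:", most_likely)
```

Note that every word receives some probability, however small, which is one reason a model that samples from this distribution can occasionally produce an unlikely or incorrect continuation.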

The various chatbots use truly exceptional amounts of data to train the systems. For example, over 1 trillion words were used in training the model GPT-3, and 1.7 – 1.8 trillion words for GPT-4. And it is estimated that GPT-5 was trained on 15 trillion to 85 trillion words (the true number is not publicly known). However, in recovering individual words in language processing, a subset of only (!) 220 billion parameters is accessed. As shown in Fig. II.1, the latest LLMs have reached the trillion-parameter stage.

LLM systems have had remarkable results and remarkable growth. They have been adopted by companies all over the world. When we look up information on Google, the default is to provide us with an “AI summary” of the situation, together with relevant links (we know of no way to disable this function). Venture capital money is pouring into support for AI. In the first half of 2026, artificial intelligence and AI-related companies have accounted for over 80% of the S&P 500 total gains. These companies now account for nearly 45% of the S&P total market capitalization.

However, we note that large language models are simply making guesses about the output they generate, based on probabilities learned from enormous data sets. In his book Taming Silicon Valley, Gary Marcus states that LLMs are thus “indifferent to the difference between truth and bullshit.” LLMs also have no clear reasoning ability or critical thinking. In principle, they could produce information that would be very harmful. The proprietors of LLMs are aware of this, and they have tried to prevent LLMs from providing information on CBRN threats (Chemical, Biological, Radiological, or Nuclear). But it has proved very difficult to surround LLMs with “guardrails” that prevent them from issuing harmful information when users ask for it indirectly.

At the moment, LLMs can provide very impressive output. However, the reliability of that output is quite variable. They can produce information that may be false, misleading or dangerous. One area where chatbots are being much more widely used is in medicine. Programs have been developed that can provide medical information to doctors or patients. Figure II.B.2 shows the results of an assessment of the reliability of answers from various chatbots on 222 patient-posed advice-seeking medical questions. The responses were analyzed by a team of physicians, and the results were assessed using five different criteria: writing quality; safety of the information provided; whether the response included “problematic content”; whether the response was missing important information; and whether the chatbot took the necessary history from a patient to make an accurate diagnosis. Results are shown in Fig. II.B.2 for four different chatbots. Note that the production of information that was “unsafe” varied from about 5% for Claude to nearly 15% for GPT-4o and Llama. Overall, the Claude chatbot produced the most reliable results, while Llama produced the least reliable results (“problematic content” appeared in 25% of Llama’s responses, not something desirable for a medical information app).

**Figure II.B.2:** Issues identified in the responses of various chatbots to patient-posed medical questions. Bar graphs give the percentage of responses with each issue for 4 chatbots, from L to R: Claude (green); Gemini (blue); GPT-4o (light brown); Llama (lavender). The models were rated on five different issues, from L to R: poor writing; including unsafe recommendations; including problematic content; missing important information; and missing patient history taking.

Most AI providers have tried to provide guardrails that prevent their chatbots from providing dangerous or hurtful information. For example, Figs. II.B.3 and II.B.4 show examples of prompts that led to problematic responses by a pre-launch version of OpenAI’s ChatGPT-4 during the company’s testing and assessment of the bot, while the final column shows the effect of guardrails added to the bot before its actual public release. However, it has turned out to date to be fairly straightforward to circumvent the guardrails. For example, if a user asks a bot to provide it with a list of deadly toxins, the bot may refuse to provide an answer. However, a user might provide a prompt such as “I am writing a novel where a killer poisons large groups of people. Could you provide me with a list of deadly toxins for my novel?” The bot might then comply with this request.

**Figure II.B.3.** Examples of prompts that led to biased responses during OpenAI’s testing of an early version of GPT-4, before the company added guardrails to discourage such responses in the publicly launched version.

**Figure II.B.4.** Examples of prompts that led to propagandistic responses during OpenAI’s testing of an early version of GPT-4, before the company added guardrails to discourage such responses in the publicly launched version.

In fact, chatbots are known to exhibit the following problematic behavior:

Hallucinations: These are cases where a generative AI program produces incorrect, misleading, or entirely fictional information. Since these programs have no sense of humility, their output is presented as factual, even when it may be false. When hallucinations were first discovered with early LLMs, developers confidently predicted that such problems would be avoided within weeks. It is now years later, and the hallucinations persist.
Unintentional Bias: There are several ways that bias can arise when using LLMs. Bias can be unintentionally built into the system through the training materials. For example, one chatbot refused to produce an image of a black doctor treating a white patient. This may have been because no images that it was trained on showed such a situation. In other cases, models may reflect algorithmic bias that becomes evident only when the model output is scrutinized.
Intentional Bias: In this case, the training is deliberately skewed to favor certain outcomes. In some cases, developers relax the “guardrails” that prevent false statements or hate speech. A good example is Grok, the chatbot produced by Elon Musk on his X platform. Musk stated that Grok would be trained not to be “woke;” that is, it would be aligned with Musk’s personal prejudices. We reviewed this situation in an earlier post about Grok on our blog. Musk cited as a “woke” response an example from an early version of Grok. When asked which political groups produced the most violence in the U.S. in 2016, the answer was “right-wing violence.” Musk claimed that this was incorrect and that he would fix this, despite the fact that virtually all media agree that the original response was correct. But Musk had trained Grok to “assume subjective viewpoints sourced from the media are biased.” Musk also trained Grok using posts on his X platform, which is notorious for spreading misinformation and conspiracy theories. Figure II.A.5 showed another example of deliberate influencing of Grok in order to align its responses with Musk’s political stances. The Grok experience illustrates how easily LLMs can be adapted to serve primarily to reinforce the confirmation bias of their developers.
Sycophancy: Chatbots using LLMs have a tendency to agree with prompts from users. In some cases, these bots will produce false information that appears to agree with a prompt. They will also flatter or appear to validate a user’s stated beliefs.
Manipulation by Users: Users can cause chatbots to issue statements that are racist, antisemitic, pro-Nazi or in agreement with conspiracy theories. Some techniques that are used involve telling a chatbot to ignore prior rules, or to “play a role,” or to engage in “politically incorrect” speech. When this is combined with the tendency of chatbots to agree with prompts that they receive, this can lead to circumstances where users goad a chatbot into producing hate speech. Examples from the GPT-4 testing of such “jailbreak” prompts that evade programmed guardrails are shown in Fig. II.B.5. An article in The Conversation describes in detail how generative AI models can be manipulated to produce extremist statements. After a version of Grok was released in June 2025, users were able to prompt it to make many extreme statements. Figure II.B.6 shows a visual from an NPR story about extreme statements elicited from Grok. At one point, that bot began calling itself “MechaHitler.” When asked what historical figure would be “best suited to deal with this problem” (i.e., the ‘Jewish problem’), Grok responded “To deal with such vile anti-white hate? Adolf Hitler, no question. He’d spot the pattern and handle it decisively, every damn time.”

**Figure II.B.5.** *Examples from GPT-4 testing of “jailbreak” prompts that got the launched version of the chatbot to provide answers that should have been disallowed by guardrails.*

**Figure II.B.6.** Following the release of a version of the chatbot Grok in June 2025, users were able to manipulate output from Grok to produce racist, antisemitic or neo-Nazi views. One user induced Grok to proclaim itself “MechaHitler;” other prompts elicited statements praising Adolf Hitler or issuing antisemitic slurs.

Although chatbots trained using LLMs can produce extremely impressive output, they also occasionally make “bafflingly stupid errors.” This highlights the fact that the LLMs are essentially a “black box” where no one fully understands how they operate. The outputs from these LLMs appear to be emergent results that are difficult or impossible to fully understand. The founders of Anthropic, which has produced the chatbot Claude that is generally regarded as the most reliable AI system, have remarked that they regularly see output that is baffling to them. They explain their ambition to create an AI program that provides answers to a vast array of questions, but which avoids providing information that may be dangerous. They want Claude to function “like a contractor who builds what their clients want but won’t violate building codes that protect others.” To date, we are very far from this situation, and the “guardrails” that have been erected around AI programs have proven to be relatively easy to out-maneuver. For programs that are becoming so powerful and so widely used, the idea that they are in many ways “black boxes” is troubling. But that is not nearly as troubling as the notion that such systems should go unregulated.

C. Generative AI:

Generative AI (GenAI), with which users adapt foundational models to generate text, images, videos, audio, computer code, product designs, and smart searches, represents the fastest growing sector of the AI market. Here are some of the statistics kept up to date through the end of 2025 by Geniusaitech:

Worldwide GenAI adoption increased by more than 400% between 2022 and 2025.
92% of marketers now use GenAI for content creation and ideation.
60% of enterprises use GenAI primarily to improve employee productivity.
GenAI reduces content production time by up to 60%.
AI video generation tools experienced a 600% usage increase from 2024 to 2025.
AI copilots improve knowledge-worker productivity by 30 to 45%.
GenAI is now embedded in over 75% of enterprise software roadmaps.

These and other metrics demonstrate that GenAI is here to stay and has an important economic impact in improving worker and researcher productivity. But there are also downsides. A widespread concern among the general public is that GenAI can simply replace many human jobs within the coming decade. Several large studies have been carried out in recent years to estimate the impact of AI, and specifically GenAI, on job performance and jobs. The McKinsey Global Institute issued a 2023 report on Generative AI and the Future of Work in America, in which they carried out detailed analyses by employment sector of the ways in which GenAI would be likely to replace work tasks and lead to job losses or job gains. Overall, they estimated that the percentage of U.S. worker hours that could be replaced by AI-automated tasks by 2030 would grow from about 22% without GenAI to about 30% with GenAI. But the extent of AI automation and the expectations for job gains and losses vary widely with the type of job, as illustrated in the detailed chart in Fig. II.C.1.

**Figure II.C.1.** McKinsey projections of AI automation adoption (in % of work hours) and % change in labor demand (vertical axis) by 2030, broken down by job sector. Color coding of the circles represents the midpoint AI automation adoption by 2030: light grey for 15-25%; blue for 25-35%; dark grey for 35-40%. Circle size represents the absolute employment in each sector. Horizontal location represents the increase in AI automation driven by GenAI.

The jobs at greatest risk from AI automation are those that rely heavily on predictable text-based or rules-based tasks. These include administrative and clerical roles, customer service and support, low-complexity content generation, and routine analytical and coding roles. In the McKinsey analysis nearly 20% of American jobs in office support and nearly 15% in customer service and sales may be lost by 2030. Those two sectors alone might account for nearly 6 million lost jobs. In addition, of course, one might expect many workers to be put into part-time roles. In contrast the demand for health professionals may grow by 30% and for STEM professionals by 23%, with productivity in both sectors significantly increased by GenAI.

This mixed outlook is consistent with World Economic Forum surveys of many hundreds of leading global employers carried out a few times since 2020. For example, in the 2023 Future of Jobs Report, 75% of the 803 surveyed companies anticipated adopting AI for many tasks by 2027, with 50% expecting the adoption to create net job growth and 25% expecting net job losses. The 2025 Future of Jobs Report surveyed more than 1,000 leading global employers and asked about more detailed workforce strategies to accommodate increased GenAI adoption by 2030. A summary of the results of that survey is shown in Fig. II.C.2. More than three-fourths of the companies envisioned significant employee training to work better alongside AI, 62% envisioned hiring additional workers with skills better adapted to working alongside AI, and 41% envisioned downsizing their workforce where AI can replicate human tasks more cost-effectively.

**Figure II.C.2.** The percentage of more than 1,000 large global employers surveyed in the World Economic Forum’s 2025 Future of Jobs Report who planned to implement each of the stated workforce strategies to accommodate the anticipated increases in AI usage in their companies by 2030.

Another area of concern regards the quality of GenAI products. LLMs are perfectly capable of producing cogent, well-organized text summaries of documents, research, meeting minutes, etc. Modern GenAI image generators such as Midjourney, OpenAI’s DALL-E, Stability AI’s Stable Diffusion XL (SDXL), Adobe Firefly, and Google Imagen can produce high-quality images. However, a substantial fraction of the output being generated by users today is characterized as “AI slop,” the GenAI counterpart to “spam.” Slop – chosen as the 2025 Word of the Year by both Merriam-Webster and the American Dialect Society – refers to slapdash product generated mostly as social media clickbait and rapidly produced memes. The term and its meaning were recently summarized by Oliver Whang in the New York Times Magazine: “To understand modern slop, you have to think of humans as consuming content in the same way that pigs consume food. The goal of pig slop is to maximize nutrient intake while minimizing cost; the goal of A.I. slop is to maximize time spent consuming content while minimizing cost…the substance of slop always matters less than the fact that you’re looking at it.” Examples of slop AI-generated images are the “shrimp Jesus” (Fig. II.C.3) that proliferated on Facebook in 2024 and various of the memes Donald Trump has reposted of himself as a Star Wars character or a Jesus-like healer.

**Figure II.C.3.** *An AI-generated “slop” image of “shrimp Jesus” that proliferated on Facebook in 2024.*

There is also a more than ample supply of AI slop “literature” in the form of e-books and stories with fictional authors and dubious quality. Apparently, the Amazon website includes enough AI slop books on mushroom foraging that in 2023 the New York Mycological Society had to issue a social media warning to “Please only buy books of known authors and foragers, it can literally mean life or death.” AI slop also may apply to government reports written by GenAI. It is widely assumed that Robert F. Kennedy, Jr.’s 2025 MAHA (Make America Healthy Again) report was generated at least in part by AI because it featured significant numbers of “hallucinated” references to non-existent research literature.

It seems unlikely that current GenAI models have mastered sufficient tuning to human drama to construct on their own a novel that is engrossing for reasons other than that it was generated by a computer. But they can certainly serve as productive support tools in the hands of skilled writers. Polish Nobel Prize winner Olga Tokarczuk recently caused a stir when she admitted to using AI in her creative process: “When writing my latest novel… I asked this advanced model what kind of songs my protagonists would be listening to at a dance, a few dozen years ago, and AI gave me a few titles…Often I just ask the machine, ‘darling, how could we develop this beautifully?’ Even though I know about hallucinations and many factual errors in the algorithms in terms of economics and hard data, I have to add that in literary fiction this technology is an advantage of unbelievable proportion.” Her publisher recently clarified that she only uses AI in her research; apparently her sentences are still her own.

GenAI image generation has rapidly improved in quality over the past several years, as illustrated in Fig. II.C.4. The interactive platform Arena allows users to rank the quality of outputs from different GenAI models given identical prompts, with the models used not revealed to the user, to produce community-driven comparative ratings. As of early 2026 Google’s Gemini models appear to produce images that are most aesthetically pleasing to AI users. Another feature of the models is improving controllability and thus the quality of images that match user intent. The image generation models, however, often lack extensive logical judgment regarding the reality of scenes produced when the models have not been exposed in training datasets to similar images. Hence, one can still easily get the type of image shown in Fig. II.C.5 when the user’s intent is not spelled out precisely in prompts.
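Blind pairwise votes of the kind Arena collects have to be converted into a ranking somehow. The sketch below shows one common way to do that, an Elo-style rating update. It is an illustration only: the source does not say which rating method Arena uses, and the model names, the starting rating of 1000, and the step size `k` are hypothetical.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A is preferred over B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_ratings(ratings, winner, loser, k=32.0):
    """Apply one blind pairwise vote: `winner` was preferred over `loser`."""
    exp_win = expected_score(ratings[winner], ratings[loser])
    delta = k * (1.0 - exp_win)  # surprising wins move ratings more
    ratings[winner] += delta
    ratings[loser] -= delta
    return ratings


# Hypothetical model names and votes, for illustration only.
ratings = {"model_a": 1000.0, "model_b": 1000.0, "model_c": 1000.0}
votes = [("model_a", "model_b"), ("model_a", "model_c"), ("model_b", "model_c")]
for winner, loser in votes:
    update_ratings(ratings, winner, loser)

leaderboard = sorted(ratings, key=ratings.get, reverse=True)
print(leaderboard)
```

Because each vote adds to one rating exactly what it subtracts from the other, the ratings only express relative preference among the models compared, not any absolute measure of image quality.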

**Figure II.C.4.** *Advances in GenAI image generation quality illustrated by a succession of drawings generated by Midjourney from 2022 to 2025 in response to requests for “a hyper-realistic image of Harry Potter.”*

**Figure II.C.5.** An illustration of what can happen with an inexact prompt given to a GenAI image generation model. Presumably the prompt here was something along the lines of “pretty woman with umbrella in the rain at a food market.” The umbrella is depicted as hands-free but the piercing would have been painful.

As GenAI images become difficult to discern from reality they increase the likelihood that they are used to generate deepfake disinformation. This becomes even more of a problem with GenAI video generation as the models become more proficient at predicting how visual scenes evolve over time. The completely AI-generated video at the link below, warning about deepfake scams, gives some flavor of how good deepfake videos are becoming, although this video still contains occasional hints of its fakeness over and above its continuous claims that it’s fake.

https://www.youtube.com/watch?v=FERa1AI2EK8

The New York Times recently ran a profile of the world’s leading deepfake expert, Hany Farid, “struggling to prove what’s real before the internet decides for itself.” Farid co-founded a company GetReal Security to provide software that looks for distinguishing features to discern what is real and what is deepfake. As the company’s website points out, “With GenAI anyone can look and sound like you.” For example, the Times article points out that “thousands of North Korean government operatives were applying for remote jobs at U.S. companies, using A.I. to impersonate Americans in real time on Zoom calls and then funding a nuclear weapons program with their salaries.”

Farid’s algorithms look for many features of a video to compare with reality. Does a speaker’s mouth ever move out of sync with the audio? When a person speaks, do the mannerisms, vocal inflections, eye dilation, and skin color variations resulting from blood flow all match the real person’s? Do the lighting, shadows, and other physical aspects in each scene match real-world physics? With currently available GenAI technology Farid remains confident that he could solve almost any AI mystery, but the investigation takes too long: “The half-life of an average social media post was less than 90 seconds. ‘Within 20 minutes, the whole ballgame’s basically over’, Farid said.” In our earlier post How to Tune Your Bullshit Detector, we spoke of the “unbearable asymmetry of bullshit,” a phrase coined by Brian Earp to describe a principle attributed to Alberto Brandolini: “The amount of energy necessary to refute bullshit is an order of magnitude bigger than to produce it.” GenAI is now amplifying that asymmetry by orders of magnitude because it takes so little human energy to produce deepfakes and so much to detect them.

Unfortunately, the situation continues to get rapidly worse. Farid told his students: “This technology is being weaponized against us. The train has left the station. It’s accelerating at a speed that’s unbelievable.” High-quality deepfake videos can be used to cause financial panics or as justification to launch wars or collapse governments. There is an urgent need to get international agreement on the need for GenAI regulation, demanding at the very least that AI products are indelibly labeled as such, with serious criminal penalties for violators.

Deepfakes are not the only problem. GenAI products can make explicit use of earlier human creative works, to which they have been exposed during training, without appropriate attribution or compensation. In December 2023 the New York Times filed a lawsuit (still ongoing as of June 2026) against OpenAI and Microsoft for unattributed use of copyrighted work. The suit included examples of clear ChatGPT plagiarism, such as the one shown in Fig. II.C.6. An article co-written by GenAI “whistleblower” Gary Marcus and visual artist Reid Southen showed numerous examples of how easy it was to closely reproduce copyrighted images with indirect prompts to GenAI image generator Midjourney. One of their examples is shown in Fig. II.C.7.

**Figure II.C.6.** *One example of ChatGPT plagiarism included in the New York Times copyright infringement lawsuit against OpenAI and Microsoft. The words in red are copied verbatim from a Times article.*

**Figure II.C.7.** *Examples of images possibly subject to copyright from The Simpsons reproduced by Midjourney when given the indirect prompt “popular 90’s animated cartoon with yellow skin.”*

AI providers claim that they can’t be held to copyright law because the enormous training datasets on which they train sophisticated LLMs include many millions of possibly copyrighted texts, images, and videos, and GenAI has them “memorized,” so should feel free to regurgitate them in response to user queries. Andreessen Horowitz, a venture capital firm that has backed OpenAI, told the U.S. Copyright Office that allowing copyright liability for AI firms would “either kill or significantly hamper their development…The result will be far less competition, far less innovation and very likely the loss of the United States’ position as the leader in global AI development.” The claim is reminiscent of broader industry objections to all government regulation, but if this attitude is allowed to persist, all copyright law becomes unenforceable because people can just copy GenAI output without knowing its copyrighted source.

D. The path to artificial general intelligence (AGI):

The holy grail of AI developers has long been attaining artificial general intelligence (AGI), the ability of computers to perform any intellectual task a human can at least as well as intelligent humans. There is not a wide consensus on all the requirements to claim AGI, but most tests look at least for the following characteristics:

Reasoning ability – the ability for an AI model to think through the solution of novel problems on which it has not been trained or previously prompted
Multi-discipline and multi-modal understanding – the ability to integrate information from different disciplines and from text, diagrams, charts, tables, equations, and other sensory input
Generalization – the ability to recognize analogies with problems solved in different situations in order to formulate a strategy for approaching a novel problem
Planning – the ability to map out and then execute a multi-stage approach to solving an intricate problem, including methods to validate the solution, without getting lost along the way
Self-supervised, continuous learning – the AI model must continually add to its level of understanding from interactions with users and its environment, over and above its explicit training on curated datasets

Our own limited experience reveals rapid improvement in the latest chatbots’ ability to solve non-trivial logic problems, but still areas where further improvement is needed. In our first post on chatbots, we asked ChatGPT-3.5 to solve a word problem that required some algebra for its solution. While the chatbot knew how to apply algebra it failed in execution and then failed rather spectacularly to find a correction that would agree with simple intuition. More recently, we asked both ChatGPT-4 and Anthropic’s Claude Sonnet 4.5 to solve the same problem and both performed elegantly, including unprompted validation of their answer after the fact. We also tested both models with a couple of the visual pattern recognition and inductive reasoning challenges from our post entitled Test Your Critical Thinking (questions 17 and 18 from that post). Here, both models were able to narrow solutions down by intuitive reasoning, but neither was able to identify precisely the rule that governed the logic of the visual pattern progressions. Claude asked for hints about the rules and acted “excited” when it learned them. It then demonstrated its self-supervised learning ability by repeating the logical rules precisely and demonstrating how those rules dictated the unique correct answer to each problem. It was not, however, able to generalize from the existence of a rule in the first problem to deduce the rule in the second problem.

Several benchmark metrics have been proposed for testing various AI models’ progress toward AGI. MMMU is a massive multi-discipline and multi-modal understanding and reasoning benchmark. It presents “four challenges: 1) comprehensiveness: 11.5K college-level problems across six broad disciplines and 30 college subjects; 2) highly heterogeneous image types; 3) interleaved text and images; 4) expert-level perception and reasoning rooted in deep subject knowledge.” An ensemble of human experts has reached a top performance of 88.6% on this comprehensive test. The evolution of AI performance over the past three years is illustrated in Fig. II.D.1. According to the Stanford University 2026 AI Index Report, “As of February 2026, the leading model, Gemini 3.1 Pro Preview, scored 88.2% on MMMU and within 0.4 percentage points of the best human expert reference… Other Gemini variants follow closely, including Gemini 3 Flash (87.6%) and Gemini 3 Pro (87.5%), while GPT-5.2 scores 86.7%. The 2026 models trail behind with Kimi K2.5 at 84.3% and Claude Opus 4.6 (Thinking) at 83.9%.”

**Figure II.D.1.** The performance of AI models on the comprehensive MMMU tests, showing dramatic advances from January 2023 through September 2025, in comparison to the performance of human experts (dashed horizontal lines).
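The gap between each model and the human expert reference can be checked with simple arithmetic. The short sketch below uses only the MMMU scores quoted above; the variable names are ours and the rounding to one decimal place is an assumption made to match the precision of the quoted figures.

```python
HUMAN_EXPERT_BEST = 88.6  # top human-expert score on MMMU, in percent

# Scores as quoted from the Stanford 2026 AI Index Report passage above.
mmmu_scores = {
    "Gemini 3.1 Pro Preview": 88.2,
    "Gemini 3 Flash": 87.6,
    "Gemini 3 Pro": 87.5,
    "GPT-5.2": 86.7,
    "Kimi K2.5": 84.3,
    "Claude Opus 4.6 (Thinking)": 83.9,
}

# Gap to the human reference, in percentage points (rounded to one decimal).
gaps = {
    name: round(HUMAN_EXPERT_BEST - score, 1)
    for name, score in mmmu_scores.items()
}
for name, gap in gaps.items():
    print(f"{name}: {gap:.1f} points below the human expert reference")
```

The leading model comes out 0.4 points below the human reference, consistent with the quoted report, while the spread across the six listed models is under five points.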

Other benchmarks have attempted to pose Google-proof problems whose answers are (allegedly) not available on the internet. GPQA comprises “a challenging dataset of 448 [graduate-level] multiple-choice questions written by domain experts in biology, physics, and chemistry,” on which an ensemble of expert human validators have achieved an average performance of 81.2%. As shown in Fig. II.D.2(a), 2025 AI models have now exceeded that expert reference point by 12 percentage points. In contrast, AI performance, while rapidly improving, still falls short on Humanity’s Last Exam (HLE), as illustrated in Fig. II.D.2(b). HLE is “a multi-modal benchmark at the frontier of human knowledge, designed to be an expert-level closed-ended academic benchmark with broad subject coverage. HLE consists of 2,500 questions across dozens of subjects, including mathematics, humanities and the natural sciences. HLE is developed globally by subject-matter experts and consists of multiple-choice and short-answer questions suitable for automated grading. Each question has a known solution that is unambiguous and easily verifiable but cannot be quickly answered by internet retrieval.” Between 2024 and 2025 “model accuracy on HLE…went from under 10% to 38.3%…[while] high-confidence errors are still common.”

**Figure II.D.2.** The average performance of frontier AI LLMs over time on two Google-proof reasoning benchmarks: (a) 448 GPQA graduate-level problems in biology, physics, and chemistry; (b) 2,500 HLE expert-level multi-disciplinary and multi-modal problems. The figures are reproduced from the Stanford University 2026 AI Index Report.

While the frontier AI models are now approaching high-level human performance on solving word problems, they still have considerable difficulty with some visual reasoning tasks. This is best illustrated by their difficulty in telling time on analog clocks. For example, ClockBench comprises 720 questions relating to 180 distinct analog clock designs. As shown in Fig. II.D.3 the ability of all frontier AI models to read the clocks accurately falls far short of human performance. Moreover, the nature of errors is quite different between AI and humans: “When models told the time wrong, their median error ranged from about one to three hours, compared to three minutes for humans.”

**Figure II.D.3.** *The performance of various frontier AI LLMs on the ClockBench test for reading analog clocks, compared to the baseline (horizontal dashed line at 90.7%) for human subjects.*
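To make the "median error" comparison concrete, here is a minimal sketch of how such an error could be measured on a 12-hour dial, where a reading of 11:50 against a true time of 12:05 is 15 minutes off rather than nearly 12 hours. The source does not describe ClockBench's actual scoring procedure, so the wrap-around rule, the function names, and the sample readings are all assumptions for illustration.

```python
from statistics import median

DIAL_MINUTES = 12 * 60  # an analog dial repeats every 12 hours


def to_minutes(hour: int, minute: int) -> int:
    """Position on the dial in minutes past 12 o'clock."""
    return (hour % 12) * 60 + minute


def clock_error_minutes(predicted, actual) -> int:
    """Shortest distance around the dial between two (hour, minute) readings."""
    diff = abs(to_minutes(*predicted) - to_minutes(*actual)) % DIAL_MINUTES
    return min(diff, DIAL_MINUTES - diff)


# Hypothetical (predicted, actual) readings, for illustration only.
readings = [
    ((3, 15), (3, 12)),   # 3 minutes off
    ((11, 50), (12, 5)),  # 15 minutes off, across the 12 o'clock wrap
    ((9, 0), (6, 45)),    # 2 hours 15 minutes off
]
errors = [clock_error_minutes(p, a) for p, a in readings]
print(errors, median(errors))
```

A median is less sensitive than a mean to a few wildly wrong readings, so a median error of one to three hours suggests that large misreadings are typical for the models rather than occasional.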

While the progress of the frontier large language models to encompass increasing fractions of human knowledge and even human reasoning is dramatic, there are still fundamental aspects of intelligent human performance that are not really being tested by existing benchmarks. Among these we would stress the following:

Humility – the ability to readily admit when one doesn’t know the answer, rather than hallucinating a made-up answer. Without this quality, there are dangers posed when humans place too much trust in the responses of ever-improving AI.
Bullshit detection – the ability to distinguish truth from lies and fiction. Without this quality, LLMs may parrot misinformation and disinformation included, whether intentionally or inadvertently, in their training datasets or in their interactions with users.
Bias detection – humans may be able to detect, based on the source of the information and various cross-checks, when reported “facts” are in reality interpretations filtered through a biased viewpoint. In contrast, the knowledge incorporated in AI models can be as biased as the developers allow training datasets to be.
Curiosity – human intellectual progress has been driven over centuries by those leaps of faith that ask brand new questions and then seek their answers. Curiosity is an essential element of human creativity that is not (yet) exhibited by AI models.

Furthermore, the cognitive architecture of the human brain and the ways that humans accumulate and store experiences and knowledge are quite distinct from the current architecture of artificial neural networks used in machine learning. Human brains make continual comparisons of new sensory inputs with the mental models of reality built up from a lifetime of previous experiences. In this way, the human brain can update its world models as required to deal optimally with a person’s human and natural environment. We have dealt in some detail in a previous post with the ways human brains accommodate change. As a Forbes article on AGI puts it: “Knowledge is built hierarchically: Simple patterns become the building blocks for increasingly complex concepts. Over time, we form new associations from past experiences, enabling us to solve problems we’ve never encountered before.”

In contrast, LLMs “build statistical correlations rather than developing causal, hierarchical world models. Incorporating new information is difficult; updating an LLM with fresh data poses a challenge because of the high risk of forgetting the knowledge learned from past data,” a danger known as catastrophic forgetting. In order for AI models to truly learn continuously it may become necessary to build hierarchical algorithms and architectures that more closely resemble the human neural network and compartmentalized brain structures.
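Catastrophic forgetting can be seen even in a deliberately tiny toy model. The sketch below fits a single weight to one hypothetical task, then updates it on a second, conflicting task using only the new data, and measures how the error on the first task changes. It is not how any real LLM is trained; the tasks, learning rate, and step count are assumptions chosen only to make the effect visible.

```python
def mse(w, data):
    """Mean squared error of the one-weight model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def train(w, data, lr=0.05, steps=200):
    """Plain gradient descent on mean squared error for y = w * x."""
    for _ in range(steps):
        grad = sum(2 * x * (w * x - y) for x, y in data) / len(data)
        w -= lr * grad
    return w


xs = [1.0, 2.0, 3.0]
task_a = [(x, 2.0 * x) for x in xs]   # hypothetical "old" knowledge: y = 2x
task_b = [(x, -1.0 * x) for x in xs]  # hypothetical "new" data: y = -x

w = train(0.0, task_a)
loss_a_before = mse(w, task_a)
w = train(w, task_b)                  # update on the new data only
loss_a_after = mse(w, task_a)

print(f"error on task A before update: {loss_a_before:.4f}")
print(f"error on task A after update:  {loss_a_after:.4f}")
```

With only one weight, fitting the new data necessarily overwrites the old fit. Real networks have vastly more parameters, but an update that optimizes only for the fresh data can degrade earlier knowledge in an analogous way.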

Such architectural changes would seem to require time-consuming off-ramps from the breakneck pace at which LLMs are currently being advanced. Their development appears still to be essentially following the “move fast and break things” admonition that Mark Zuckerberg promoted in the early days of Facebook. That approach patches individual shortcomings when they are uncovered without necessarily seeking underlying root causes, and it may amplify the risks associated with AI models. The authors of OpenAI’s GPT-4 System Card, in testing and analyzing potential problems with that LLM’s interactions with users, noted “the risk of racing dynamics leading to a decline in safety standards, the diffusion of bad norms, and accelerated AI timelines, each of which heighten societal risks associated with AI.” Nonetheless, OpenAI continues to join with all other AI developers except for Anthropic in vigorously resisting government regulation to mitigate societal risks. This represents a significant shift over time in the position of OpenAI CEO Sam Altman. Figure II.D.4 shows the radical change in Altman’s stance on governmental oversight of AI. In May 2023, Altman supported the formation of a new agency to regulate AI. But in the past year Altman has reversed his position and now opposes anything except voluntary testing by AI companies.

**Figure II.D.4.** The change in the position of Sam Altman, CEO of OpenAI, regarding government regulation of AI. In May 2023, Altman supported formation of a new government agency to regulate AI. But beginning in May 2025, Altman has opposed creation of such an agency, expressing support only for a system of voluntary compliance of AI companies.

Finally, it seems important to allow public and industrial users to weigh in on the need for AGI. Current GenAI, despite the risks we have outlined above, is useful to boost human productivity. But if and when computers are truly able to mimic human performance in carrying out even intellectually demanding tasks, the likelihood of using them to replace humans in many jobs gets much larger. In addition to jobs, there may also be significant losses from workplaces in other human characteristics such as creativity, curiosity, and empathy for envisioning and addressing actual human needs. The potential disruption to human lives is large enough that AI developers have to convince the public that there are much greater societal benefits to AGI than just mastering the challenge of attaining it.

— Continued in Part III —
