November 19, 2025
Each Time AI Gets Smarter, We Change the Definition of Intelligence
As AI systems exceed one benchmark after another, our standards for “humanlike intelligence” keep evolving
“When will AI achieve humanlike intelligence?” I recently asked a friend. “It already has,” he replied, suggesting that if you were to travel back in time to 1995 and evaluate our current versions of artificial intelligence from that vantage, most people would consider the technology’s intelligence humanlike—maybe even superhuman. The goalposts for humanlike intelligence, he said, keep shifting each time AI improves.
Intelligence has never been easy to define. For decades, we’ve debated what makes up analytical, creative and emotional intelligence in people, weighing the value of instruction-following against autonomy. We’ve done the same with machines, and my friend is right: the target we’ve set for AI intelligence has continually moved.
The subject isn’t merely philosophical. Consider the contract put in place when Microsoft and OpenAI began working together in 2019. OpenAI said in a blog post that Microsoft’s $1-billion investment in the company would “support us building artificial general intelligence (AGI),” which OpenAI’s charter defines as “highly autonomous systems that outperform humans at most economically valuable work.”
Three weeks ago, on October 28, Microsoft and OpenAI updated their agreement. Under it, Microsoft keeps special access to OpenAI’s technology and retains the right to use it first in products until OpenAI says it has reached AGI. Microsoft also holds rights to “post-AGI” models through 2032, and if OpenAI claims it has reached AGI, that declaration will now be independently verified by an expert panel. The arrangement raises a difficult question: How will that panel of experts decide when human-level intelligence has been achieved?
Since 1950 the primary benchmark for machine intelligence has been the Turing test, proposed by computer pioneer Alan Turing. The idea is simple: A human judge communicates with an unseen human and a machine via text and must decide which is human. If the judge can’t reliably tell the two apart, the machine passes.
Over the decades that followed Turing’s proposal, researchers built symbolic systems using rules and logic to imitate human problem-solving. Their programs solved puzzles and played games but were largely useless when faced with real-world complexity. Well into the 1990s, “expert systems” were created that encoded human knowledge but functioned only within extremely narrow domains.
In 1997 IBM’s Deep Blue beat chess grandmaster Garry Kasparov, and chess, which had long served as a proxy for “thinking,” suddenly became less important in the intelligence discussion. The modern era began in the 2010s, when neural networks and large datasets allowed machines to learn patterns instead of relying on fixed rules, and models for translation, image recognition and language began to excel. In 2015 a vision model surpassed estimated human performance in classifying objects, and other programs cleared benchmarks in language and reasoning later in the decade. Between 2015 and 2017 AlphaGo, built to play the more complex game of Go, defeated the world’s best Go players.
Cognitive scientist Douglas Hofstadter has argued that we redraw the borders of “real intelligence” whenever machines reach abilities once seen as uniquely human, downgrading those tasks to mere mechanical abilities to preserve humanity’s distinction. Each time AI clears the bar for a human skill, we raise it.
That’s how the concept of AGI emerged: a term for a system that could understand, learn and act across many domains with a human mind’s flexibility. Introduced in 1997 by physicist Mark Avrum Gubrud, it was popularized in the 2000s and stuck because it moved away from the concept of AI as parlor-game imitation and toward the development of benchmarks that evaluate competence across domains and in many different situations. By that standard, Deep Blue, ImageNet and AlphaGo would have had not only to outperform humans in their areas of expertise but also to solve Ph.D.-level math, write prizewinning fiction and make fortunes in the stock market, because, of course, that’s what it means to be human.
This is why, when OpenAI’s GPT-4.5 decisively passed the Turing test in 2025, the achievement barely made the news. It’s also why, when GPT-4 received a top-decile score on a simulated bar exam, or when any of today’s major frontier models solved Ph.D.-level questions, we didn’t take up arms against the robots, as many science-fiction films predicted we would. But if we had gone back to the 1990s and revealed systems that could converse fluently about science, generate websites in seconds, offer real-time spoken translation and write up a serviceable will, people might well have armed the nukes.
Still, there’s something missing. My friend isn’t wrong—machine intelligence meets or surpasses humanlike abilities in many areas—but being an embodied human is complex, and our grasp of intelligence has grown significantly. Although this year’s AI Index Report from the Stanford Institute for Human-Centered AI highlights that the technology is mastering new benchmarks faster than ever, it also stresses that complex reasoning remains a challenge.
As many thinkers have pointed out, the problem may simply be in the concept of humanlike intelligence. If AI intelligence is perceived as uneven and ours isn’t, it’s because we’ve set ourselves as the standard. Evolution gave us highly adaptable reasoning skills and a hard skull that limits the size of our databases. Seen in that light, we are also uneven. And as we constantly move the targets for AGI, the intelligence that arrives may be one we hardly recognize.


