On a recent afternoon, Tudor Achim gave a brain teaser to an A.I. bot called Aristotle.
The question involved a 10-by-10 table filled with a hundred numbers. If you collected the smallest number in each row and the largest number in each column, he asked, could the largest of the small numbers ever be greater than the smallest of the large numbers?
The bot correctly answered “No.” But that was not surprising. Popular chatbots like ChatGPT may give the right answer, too. The difference was that Aristotle had proved that its answer was right. The bot generated a detailed computer program that verified “No” was the correct response.
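The answer is always "No," and the textbook argument (sketched here in standard notation, not as Aristotle's generated proof) fits in one line. Write a_ij for the number in row i and column j, let i* be the row whose minimum is largest, and let j* be the column whose maximum is smallest; the entry where that row and column cross ties the two quantities together:

```latex
% For any grid of numbers a_{ij}: the largest row-minimum can never
% exceed the smallest column-maximum, because the crossing entry
% a_{i*j*} sits between them.
\max_i \min_j a_{ij} = \min_j a_{i^*j}
  \le a_{i^*j^*}
  \le \max_i a_{ij^*} = \min_j \max_i a_{ij}
```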
Chatbots like ChatGPT from OpenAI and Gemini from Google can answer questions, write poetry, summarize news articles and generate images. But they also make mistakes that defy common sense. Sometimes, they make stuff up — a phenomenon called hallucination.
Mr. Achim, the chief executive and co-founder of a Silicon Valley start-up called Harmonic, is part of a growing effort to build a new kind of A.I. that never hallucinates. Today, this technology is focused on mathematics. But many leading researchers believe they can extend the same techniques into computer programming and other areas.
Because math is a rigid discipline with formal ways of proving whether an answer is right or wrong, companies like Harmonic can build A.I. technologies that check their own answers and learn to produce reliable information.
Google DeepMind, the tech giant’s central A.I. lab, recently unveiled a system called AlphaProof that operates in this way. Competing in the International Mathematical Olympiad, the premier math competition for high schoolers, the system achieved “silver medal” performance, solving four of the competition’s six problems. It was the first time a machine had reached that level.
“This is a path around hallucinations,” said David Silver, a principal research scientist at Google DeepMind. “Proof is a form of truth.”
Using similar techniques, some researchers believe they can eventually build an A.I. system that is better at math than any human. That’s the goal of Mr. Achim and his co-founder, Vlad Tenev, better known as the chief executive of the online stock trading company Robinhood. Harmonic has raised $75 million in funding from Sequoia Capital and other investors.
Others, like Dr. Silver, believe these techniques can extend even further, leading to A.I. systems that can verify physical truths as well as mathematical ones.
Around 2017, companies like Google, Microsoft and OpenAI began building large language models. These A.I. systems often spent months analyzing digital text culled from across the internet, including books, Wikipedia articles and chat logs. (The New York Times sued OpenAI and Microsoft in December for copyright infringement of news content related to A.I. systems.)
By pinpointing patterns in all that text, these systems learned to generate text of their own, including term papers, poetry and computer code. They could even carry on a conversation.
But the technology also seemed dopey at times. It seemed to just spit out what it had learned from the internet — unable to verify whether the information was right or wrong, real or completely made up.
This month, OpenAI unveiled a new version of ChatGPT that was designed to reason through questions. It spends time “thinking,” trying different strategies in an effort to reach the right answer. But it still gets things wrong and makes stuff up.
Researchers like Mr. Achim are beginning to address these problems through mathematics. With math, you can formally prove whether an answer is right or wrong.
About a decade ago, a Microsoft researcher, Leonardo de Moura, created a computer programming language specifically for proving mathematical statements. Called Lean, this programming language was originally a tool for human mathematicians. But now that A.I. systems are skillful enough to generate their own computer code, they can also use Lean.
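To give a flavor of what that looks like, here are two minimal Lean 4 statements (illustrative examples, not code from Harmonic or Aristotle). Each claim compiles only if Lean's proof checker can verify it:

```lean
-- "What is 2 + 2?": the proof `rfl` succeeds because both sides of
-- the equation evaluate to the same number, 4.
theorem two_plus_two : 2 + 2 = 4 := rfl

-- A claim about every natural number n: adding zero changes nothing.
-- This, too, holds by computation in Lean 4.
example (n : Nat) : n + 0 = n := rfl
```

A false statement, such as 2 + 2 = 5, would simply fail to compile, which is what makes Lean useful as a referee for machine-generated mathematics.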
Harmonic is designing a large language model that can generate its own Lean proofs. The Lean code it generates is not always perfect. But through trial and error, it can learn to verify a solution.
“It is a lot like a human,” Mr. Achim said. “If you are trying to solve a math problem, you try certain steps. And if they fail, you try others, until you get them right.”
When Aristotle is asked to answer math problems, it can check the answers. These might be simple questions like “What is 2+2?” Or they might be more complex brain teasers like the one with the 10-by-10 grid of numbers.
“If the system can output an answer, it is basically guaranteed to be correct,” Mr. Achim said.
As Aristotle checks its own answers, it becomes a way of generating enormous amounts of trustworthy digital data that can be used to teach A.I. systems. In other words, Aristotle can generate data that can be used to improve itself.
Researchers call this "synthetic data": data produced by A.I. that can then be used to train A.I. Many researchers believe this concept will be a vital part of A.I. development.
Mr. Achim and Mr. Tenev believe that after years of training, Aristotle will be better at math than any human. “We want it to be as smart as the collection of all the mathematicians in the world,” Mr. Tenev said. “We want it to solve problems that have never been solved.”
A.I. systems can use the same techniques to verify their own computer code, which relies heavily on mathematical logic. And if a system can generate reliable code, it can take actions on the internet. It becomes what researchers call an A.I. agent. As these A.I. systems improve, many researchers say, they could automate almost any digital work.
But researchers are quick to add that these A.I. systems have limits. Lean code can prove math theorems and verify computer code, but it cannot handle the complex ins and outs of daily life.
“Once you step out of the mathematical realm, things are very different,” said Angela Fan, a research scientist at Meta. There is often no absolute right or wrong that A.I. systems can work toward, as there is in mathematics.
Dr. Silver acknowledges this problem. But he also says there are verifiable truths in the real world. A rock is a rock. Sound travels at 343 meters per second. The sun sets in the west. If A.I. systems pull information from physical reality, they can verify these truths, too.
“Truth can come from the world,” Dr. Silver said. “If you can get feedback from the world, you can improve and improve and improve.”