Think ChatGPT Is Smart? Ask It to Do Arithmetic

Chatbot AI has advanced, but it won't be reliable soon—or perhaps ever.

Key points

  • ChatGPT has been found to fail at simple arithmetic.
  • AI is best suited to problems where there is a single, verifiable, correct answer.
  • We may eventually go back to having computers do what they do best: tedious algorithmic tasks.

ChatGPT seems to be everywhere. People of all ages—especially my kids—have enjoyed hours of laughs asking its artificial intelligence silly and scatological questions.

Since it has ingested billions of pages of text, including reference works, ChatGPT possesses a massive store of information. But it can do more than just search its memory—it also synthesizes information. You can ask it, “what is the hangy thing called in the back of your throat?” and it will correctly describe the uvula (and it won’t care about your use of colloquial language and awkward grammar).

But there are darker sides to this technology.

While there are some innovations in ChatGPT from a computer science point of view—mainly due to the size of its training set—the practical uses of the technology are often illicit. An online mental health service was caught in January 2023 using ChatGPT to provide counseling to unwitting clients. Students around the world are asking it to do their homework and exams for them, causing a collective freak-out among educators (it's interesting that other trendy digital technologies like cryptocurrency also seem to primarily serve illicit functions like money laundering).

Are the downsides of chatbots an acceptable price for the service they provide? I think this depends on the quality of knowledge that a system produces. At present, as economist Tim Harford argues, when ChatGPT is asked a substantive question, it provides plausible answers, rather than correct ones. These are the kinds of answers that a student who doesn’t entirely understand a question might produce—good enough to get partial credit, but not correct.

ChatGPT and Simple Math

To get a better sense of how smart ChatGPT is, ask it to do some arithmetic. Start with small numbers. It works! Now try longer numbers. Here are some I tried: 732542667 + 2348378780099. ChatGPT confidently spat out this answer:

The sum of 732542667 and 2348378780099 is 2355113707466.

Too bad the actual answer is 2349111322766. This flaw was pointed out recently by science fiction writer Ted Chiang. As Chiang suggests, the system seems to have trouble carrying ones.

ChatGPT isn't always incorrect. And with smaller numbers, the errors are small; again, the answers are plausible. But with 20-digit numbers, errors can throw the answer off by a factor of a hundred or more.
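
For contrast, the procedure ChatGPT fumbles is trivial to state exactly. Below is a minimal Python sketch (mine, not anything from OpenAI) of the grade-school algorithm: walk the digits right to left and carry the ones explicitly. Python's built-in integers are also arbitrary-precision, so even the one-line version at the end cannot drop a carry.

def add_digit_strings(x: str, y: str) -> str:
    """Grade-school addition on digit strings: right to left, carrying ones explicitly."""
    width = max(len(x), len(y))
    x, y = x.zfill(width), y.zfill(width)  # pad the shorter number with leading zeros
    carry, digits = 0, []
    for dx, dy in zip(reversed(x), reversed(y)):
        total = int(dx) + int(dy) + carry
        digits.append(str(total % 10))  # keep the ones digit
        carry = total // 10             # carry the rest to the next column
    if carry:
        digits.append(str(carry))
    return "".join(reversed(digits))

print(add_digit_strings("732542667", "2348378780099"))  # 2349111322766
print(732542667 + 2348378780099)                        # same answer from built-in integers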

There is a convergence between the views of Harford and Chiang: ChatGPT lacks something essential to the task, resulting in only an approximation of the truth, even when a perfectly correct answer is fully achievable by humans (and by much less sophisticated technology).

A related problem is that AI systems like ChatGPT don’t have ways of indicating that they don’t have a good answer to a particular question—they can’t say “I don’t know.” They are designed to choose the statistically best response from a ranked set—they can’t say “all the options I can come up with are bad.”
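
To make that concrete, here is a toy sketch (with invented candidate answers and scores, not real model internals) of what "choose the statistically best response from a ranked set" means: the scores are normalized into probabilities that sum to 1, so some option always wins, even when every option is weak.

import math

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical candidate answers with made-up scores. Normalization forces
# the probabilities to sum to 1, so the top-ranked candidate is emitted no
# matter how weak it is; there is no "none of the above" path.
candidates = ["2355113707466", "2349111322765", "2349111322766"]
scores = [0.3, 0.2, 0.1]  # illustrative numbers only

probs = softmax(scores)
best = max(range(len(candidates)), key=probs.__getitem__)
print(candidates[best], round(probs[best], 2))  # prints the front-runner regardless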

Let Computers Compute, and Humans Think

What’s ironic about all this is that computers were invented for the specific purpose of solving math problems. They’re called computers, after all. Computers were imbued with tireless logic circuits so that our time would not have to be spent performing long and tedious computations. It’s not that we can’t do these tasks: human computers (many of whom were Black women) performed the calculations that got NASA's Apollo astronauts to the moon. It's just that electronic computers do it much faster.

Even if ChatGPT can be improved to solve arithmetic problems correctly every time, let’s take a step back for a moment: why are we trying to build a machine that knows everything? Beyond entertainment value, I think the only explanation is that the problem’s sexiness attracts investors. The idea that such a thing can (let alone should) be built is a kind of utopian dream of an all-encompassing system. But in the quest to make it infallible, all the old problems of our messy world need to be addressed one by one, which defeats the whole purpose of artificial intelligence.

Of course, ChatGPT is not an end in itself—chatbots could eventually be specialized for limited domains, such as mental health counseling. But why would we expect a targeted application such as counseling to be "easier" to solve with AI?

The brain-inspired tech at the heart of AI is not useless. The issue is finding the right problem to solve with it. As Princeton computer scientist Arvind Narayanan argues, AI is best suited to problems where there is a single, verifiable, correct answer. Predicting the structure of an extremely complex molecule such as a protein fits the bill; this is what DeepMind's AlphaFold does, using the same basic tricks as ChatGPT. Chatbot therapy is a totally different kind of problem, and a bad match for AI.

Perhaps the market won't care. If it's good enough (and cheap enough) maybe we'll accept the erosion of truth into plausibility. The damage will become just another externality of big business, like air pollution.

But I am hopeful that the current rush for human-imitating AI will pass and we will eventually go back to having computers do what they do best: tedious algorithmic tasks that humans can’t or won’t do.

Copyright © 2023 Daniel Graham. Unauthorized reproduction of any content on this page is forbidden. For reprint requests, email reprints@internetinyourhead.com.
