Artificial Intelligence
Prediction Based on Past Data Has Its Limits
AI relies on past data, but if the underlying data change, so does prediction accuracy.
Posted December 12, 2023 Reviewed by Michelle Quirk
Key points
- Artificial intelligence has no capacity to learn or to derive answers that exceed the scope of its training data.
- All its responses, whether recommendations, red flags, or a complex block of text, are based on predictions.
- If underlying training data are ill-equipped for accurate predictions, then prediction validity is suspect.
By now, most people are aware of popular artificial intelligence (AI) tools, such as Bard, ChatGPT, and DALL-E. Far fewer, though, appear to possess even a basic understanding of how such tools do what they do. While it certainly isn’t important to know all the inner workings of AI,1 the use of expressions like intelligence and hallucination anthropomorphizes the tool and implies that AI learns and is somehow aware of what it is producing.
This, of course, couldn’t be further from the truth, as AI has no capacity to learn or to derive answers, recommendations, or conclusions that exceed the scope of the data it was trained on.2 As a case in point, Pomeroy (2023) reported on research showing that ChatGPT was unable to think creatively. If the answer wasn’t predictable from the data used to train the AI, it simply struggled to perform well (even failing to outperform young children on a creative task).
Based on Predictions
There are numerous anecdotal examples out there of a large language model (LLM), like ChatGPT or Bard, producing what amounts to an illogical response. The primary reason for this, as I argued when discussing the accuracy of ChatGPT, is that AIs (all of them that I know of) operate based on predictions, no different from the way Facebook’s or X’s algorithms predict which information to place in your feed. An LLM is simply a much more sophisticated version of that and operates on a larger scale.
So, AI doesn’t actually know anything, and all of its responses, whether recommendations, red flags, or a complex block of text, are based on predictions. And those predictions are based on the data it was trained on or otherwise has available. Those predictions can be refined over time, but not without feedback.
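To make that concrete, here is a deliberately tiny sketch in Python (my own illustration, not how ChatGPT or Bard actually works under the hood): a toy model that "knows" only which word tends to follow which in its training text. Ask it about anything outside that text, and it has nothing to offer.

```python
# A minimal sketch of prediction-only "knowledge": a toy bigram model that picks
# the next word purely from counts in its training text. (Illustrative only.)
from collections import Counter, defaultdict

training_text = "the patient has a fever the patient has an infection"

# Count which word follows which in the training data.
followers = defaultdict(Counter)
words = training_text.split()
for current, nxt in zip(words, words[1:]):
    followers[current][nxt] += 1

def predict_next(word):
    """Return the most frequent follower seen in training, or None if unseen."""
    counts = followers.get(word)
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("patient"))  # 'has'  -- predictable from the training text
print(predict_next("sepsis"))   # None   -- never seen, so no prediction at all
```

Real LLMs are vastly larger and more sophisticated, but the principle is the same: the output is a prediction grounded in the training data, not an act of understanding.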
And that brings us back to the issue of novelty. In the case of a task that is not easily predicted based on the training data, an AI can produce a logically or morally indefensible response.3 Although sometimes questionable responses may be based on constraints put on the AI (as I discussed in the context of censorship), if the underlying training data are ill-equipped to lead to accurate predictions, then the validity of those predictions becomes suspect.
Model Drift
But even if the data used to train the AI were well-equipped to lead to accurate predictions, if the underlying data change, so, too, can prediction accuracy (what is known as model drift). Gigerenzer (2022), for example, showed how Google Flu Trends, which had been effective at predicting traditional influenza, failed when applied to predicting swine flu, largely because the data used to predict one were woefully inadequate for predicting the other.
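Here is a minimal illustration of that idea in Python, using made-up synthetic data (nothing to do with Google Flu Trends itself): a simple model is fit to historical data, the relationship between the inputs and the outcome then changes, and accuracy falls even though the model itself hasn’t changed.

```python
# A toy illustration of model drift on synthetic data: the model keeps making the
# same predictions, but the world it was trained to predict has moved on.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_data(n, drifted=False):
    """Two features. Originally the outcome tracks feature 0; after the 'drift'
    it tracks feature 1 instead (a stand-in for changing real-world conditions)."""
    X = rng.normal(size=(n, 2))
    signal = X[:, 1] if drifted else X[:, 0]
    y = (signal + rng.normal(scale=0.5, size=n) > 0).astype(int)
    return X, y

X_hist, y_hist = make_data(5000)                  # historical training data
model = LogisticRegression().fit(X_hist, y_hist)

X_same, y_same = make_data(5000)                  # world unchanged: accuracy holds
X_drift, y_drift = make_data(5000, drifted=True)  # world changed: near coin toss

print("accuracy before drift:", round(model.score(X_same, y_same), 2))
print("accuracy after drift: ", round(model.score(X_drift, y_drift), 2))
```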
More recently, Dinerstein (2023) reported on a test of tools that various vendors developed to predict the occurrence of sepsis among hospital patients. Sepsis is a big issue for hospitals and an even bigger risk for patients. The tools relied on AI-driven predictions and were installed on hospital systems, meaning the tools themselves were not continuously re-trained on current data. Based on what was known about the algorithms underlying the tools,4 researchers tested the accuracy of the models over time.
What they found was that the models were reasonably accurate at first but, over time (several years, in this case), experienced model drift, degrading to little better than a coin toss. Some of the degradation occurred because of a change in the way certain billing information was coded, so simply constraining the model to avoid that information improved accuracy to a fair extent. But the rest of the degradation occurred because of changing patient characteristics.
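For readers who like to see the mechanics, here is a hedged sketch in Python of that billing-code problem, again using synthetic data rather than the vendors’ proprietary sepsis models: one feature’s coding changes, the original model degrades toward a coin toss, and a model constrained to ignore that feature holds up considerably better.

```python
# A synthetic-data sketch of a coding change breaking a deployed model, and of the
# partial fix of constraining the model to ignore the recoded feature.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

def make_data(n, recoded=False):
    """Outcome depends on both features; `recoded` flips how feature 0 is recorded,
    standing in for a change in billing codes."""
    X = rng.normal(size=(n, 2))
    y = (X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=n) > 0).astype(int)
    if recoded:
        X = X.copy()
        X[:, 0] = -X[:, 0]  # same underlying reality, recorded differently
    return X, y

X_old, y_old = make_data(5000)                # historical data
X_new, y_new = make_data(5000, recoded=True)  # data after the coding change

full_model = LogisticRegression().fit(X_old, y_old)           # uses both features
constrained = LogisticRegression().fit(X_old[:, [1]], y_old)  # ignores the recoded one

print("full model on new data:       ", round(full_model.score(X_new, y_new), 2))
print("constrained model on new data:", round(constrained.score(X_new[:, [1]], y_new), 2))
```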
In both cases, the data that originally led to a moderately accurate predictive model no longer did. And with no ability to (a) monitor the accuracy of predictions on their own or (b) adjust the algorithm so it would be more accurate, hospitals began making less-informed decisions over time. In other words, past data were no longer predictive of future occurrence, but there was no way to know that or to adjust the tool so better accuracy could be achieved.
Now, these problems are not unique to AI-driven systems. Any system that relies on prior data to predict future responses has the same potential limitation. Most AI-driven systems, though, lack transparency, so it’s unclear exactly how their arguments, conclusions, or recommendations are derived. And so, unless the AI’s predictive model is tested and recalibrated in an ongoing way (the sepsis AI tools were not5), model drift is likely to occur over time, leading to less predictive accuracy.
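What might that ongoing testing look like in practice? Here is a minimal sketch in Python (my own illustration, not how any deployed sepsis tool is actually monitored): track accuracy on recent cases whose true outcomes are now known, and flag the model for recalibration when that accuracy dips below a chosen threshold.

```python
# A minimal drift-monitoring sketch: keep a rolling record of hits and misses on
# recent, labeled cases and raise a flag when accuracy falls below a threshold.
from collections import deque

class DriftMonitor:
    def __init__(self, window=500, threshold=0.75):
        self.recent = deque(maxlen=window)  # rolling record of correct/incorrect
        self.threshold = threshold

    def record(self, predicted, actual):
        self.recent.append(predicted == actual)

    def needs_recalibration(self):
        if len(self.recent) < self.recent.maxlen:
            return False  # wait until a full window of outcomes is available
        accuracy = sum(self.recent) / len(self.recent)
        return accuracy < self.threshold

# Usage: as each prediction's true outcome becomes known, record it; when the
# monitor trips, a human reviewer or a retraining job steps in rather than the
# model silently drifting toward coin-toss accuracy.
monitor = DriftMonitor()
# monitor.record(model_prediction, observed_outcome)
# if monitor.needs_recalibration(): retrain_or_review()
```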
And yet, many people, including key decision-makers, ascribe a level of intelligence to AI, even though such systems are not intelligent at all. They end up defaulting to the AI’s recommendations or outputs as if those were somehow immune to error. This is why, as I argued here, human expertise is still required to oversee AI, and AI predictions should not be assumed to be perfectly accurate. They are merely another source of data that experts should rely on when making decisions. But even when AI sources are used, there will be a need for continuous monitoring and recalibration so the tool continues to produce reasonably accurate outputs over time.
References
1. After all, even their developers often have little clue.
2. I asked ChatGPT to review my post and provide feedback. It told me that I should emphasize here that “AI doesn't truly understand or possess awareness but operates based on patterns in the training data.” That seemed like a good suggestion, so I offer it here.
3. A recent example of this can be found here.
4. Because the algorithms underlying the predictions were proprietary.
5. But even if they were, this presents its own challenges. Either someone needs to be responsible for managing this task, or the AI has to be allowed to recalibrate itself, which could result in other forms of prediction degradation.
Ross Pomeroy. Young Children Beat ChatGPT in a Simple Problem-Solving Task. RealClear Science, November 14, 2023.
What Is Model Drift? DataTron.
Gerd Gigerenzer. One Data Point Can Beat Big Data. Behavioral Scientist, August 31, 2022.