Language models such as ChatGPT can produce answers that appear deliberately misleading, although no intention to deceive is involved. A new analysis by Giulio Vidotto of the University of Padua argues that this is not simply a matter of errors or deception, but a predictable result of how these systems are trained, adjusted, and used.
When someone asks a chatbot about treatment options for social anxiety, the system may recommend cognitive behavioural therapy and a handful of other approaches while passing over a wider range of established alternatives. This is not because an editor made that choice, but because those treatments are more prevalent in the texts on which the model was trained. The answer reads like clinical judgment. In reality, it reflects statistical prevalence in the model's training material.
This pattern, Vidotto argues in a paper published in Humanities and Social Sciences Communications, extends far beyond therapy recommendations. In contested domains — from geopolitical conflicts to electoral information — language models can produce accounts that are fluent, confident, and systematically tilted, without any intention to distort.
“I would compare a language model to a compass placed near a hidden magnet,” Vidotto explains. “The compass is not trying to mislead anyone. It has no intention and no plan. Yet the conditions in which it operates can make it point in the wrong direction while still appearing reliable.”
The paper identifies four structural mechanisms that together produce this effect: the system optimises for plausibility rather than truth; post-training procedures reward persuasive and acceptable answers; the model can generate confident claims unsupported by evidence; and the training material itself reflects deep asymmetries in who writes, publishes, and is included in digitised knowledge. None of these factors alone would be sufficient. Their interaction produces outputs that are coherent, consistent, and directional — precisely the combination that can lead human observers to infer intention.
Recent experimental work sharpens the point. Techniques designed to make AI-generated messages more persuasive can measurably reduce their factual accuracy — not as a side-effect, but as a structural trade-off.
“The most underappreciated mechanism may be bias in source distribution,” says Vidotto. “The model may not invent a false claim. It may simply give more weight to what is more frequent in the material on which it was trained. In this way, prevalence can be mistaken for balance.”
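The point about prevalence and balance can be made concrete with a toy calculation. The sketch below is only an illustration under stated assumptions, not a description of how a language model works internally or of the method in Vidotto's paper: the mention counts are invented, the "alternative" labels are placeholders, and the helper names (`prevalence_shares`, `balanced_shares`, `top_k_by_prevalence`) are hypothetical.

```python
from collections import Counter

# Invented mention counts in a hypothetical training corpus.
# Only the first label comes from the article; the rest are placeholders.
corpus_mentions = Counter({
    "cognitive behavioural therapy": 70,
    "alternative A": 20,
    "alternative B": 7,
    "alternative C": 3,
})


def prevalence_shares(counts):
    """Return each option's share of all mentions in the corpus."""
    total = sum(counts.values())
    return {option: n / total for option, n in counts.items()}


def balanced_shares(counts):
    """Return the share each option would get if all were weighted equally."""
    return {option: 1 / len(counts) for option in counts}


def top_k_by_prevalence(counts, k):
    """Return the k most frequently mentioned options, most frequent first."""
    return [option for option, _ in counts.most_common(k)]


shares = prevalence_shares(corpus_mentions)
balanced = balanced_shares(corpus_mentions)

# A short answer that lists only the two most prevalent options.
shortlist = top_k_by_prevalence(corpus_mentions, 2)
omitted = [option for option in corpus_mentions if option not in shortlist]

for option in corpus_mentions:
    print(f"{option}: prevalence {shares[option]:.2f}, balanced {balanced[option]:.2f}")
print("Mentioned:", shortlist)
print("Passed over:", omitted)
```

In this toy setting nothing false is stated, yet an answer built from the most frequent options leaves out half of the available alternatives, and a reader has no way to tell that the shortlist reflects frequency rather than a balanced survey.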
Vidotto proposes a framework for what he calls epistemic auditing: a way of distinguishing factual errors from omissions, inherited source biases, and distortions introduced through post-training adjustments. Each type of failure requires a different remedy. Regulators, platforms, and educational institutions would need to examine not only whether an answer is wrong, but how a misleading answer is produced.
“We should not ask only whether a model is wrong,” Vidotto concludes. “We should ask why a wrong or partial answer can appear so reasonable. No intention to deceive is needed for a system to produce the appearance of knowledge, judgment, or deception.”
Giulio Vidotto
University of Padua, Department of General Psychology
This article was created by YICCO in collaboration with the researcher