Thoughtfox

Stanislav ‘Slava’ Sazhin, Russian entrepreneur, flags ChatGPT-4 on LinkedIn for giving an incorrect answer to an elementary arithmetical query; elicits interesting responses


by Dr. Piyush Mathur


As their early childhood continues to unfold, Generative Artificial Intelligence (GAI) models keep attracting (social) media comments about their performance. One striking comment was posted on LinkedIn last week by Stanislav ‘Slava’ Sazhin—a Russian medical entrepreneur currently busy articulating a new business venture in Nigeria.

On July 17, Sazhin posted on his profile a screenshot of his exchange with ChatGPT-4—the latest iteration of the GPT series of large language models developed by OpenAI—in which the model produced a patently incorrect answer to a basic arithmetical query.

A screenshot of Slava Sazhin’s LinkedIn post dated July 17, 2024

Sazhin’s query was typed exactly as follows: ‘9.11 and 9.9 - which is bigger?’

The model’s response, which includes an explanation, was as follows: ‘9.11 is bigger than 9.9. In decimal numbers, each position after the decimal point represents a tenth, hundredth, thousandth, etc., of a whole number. Thus, 9.11 is 9 plus 11 hundredths, and 9.9 is 9 plus 9 tenths. Since 9 tenths is equal to 0.90, 9.11 is greater than 9.90.’
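For the record, the arithmetic itself is elementary: written to the same number of decimal places, the two values are 9.90 and 9.11, and 9.90 is the larger. The following minimal Python check (my own illustration, not part of any quoted exchange) makes the point:

```python
from decimal import Decimal

# Compare the two values exactly, avoiding binary floating-point quirks.
a = Decimal("9.11")
b = Decimal("9.9")

# Padded to hundredths, the comparison is 9.90 versus 9.11.
print(b > a)   # True  -> 9.9 is the bigger number
print(b - a)   # 0.79  -> the difference between the two
```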

From his connections and followers, Sazhin’s post predictably received interesting feedback, revealing gaping holes in the human-GAI interface. To the extent that most of this feedback was advisory in nature—as in pieces of advice to Sazhin qua GAI user—it is apt to receive some audit of its own, which I briefly provide below.

But how advisable is it to advise a GAI user?

Olusegun Usman Q. (whose profile suggests that he is an entrepreneurial pharmacist and a software engineer) advised Sazhin to try using the word ‘greater’ in place of ‘bigger’ in his query to ChatGPT-4, arguing that ‘[t]he keyword used in your prompt determines your response.’ Usman made no mention, though, of whether he had himself tried that linguistic replacement successfully, nor did he provide any evidence to support his specific advice. Moreover, while the choice of words in one’s prompts to GAI models does affect their responses (which is common knowledge anyway), tweaking that choice does not address the underlying issue of a model’s fundamental understanding of basic arithmetic.

A GAI model should be able to handle elementary mathematical tasks regardless of reasonable linguistic variations in the prompts. Expecting users to modify their natural language input to accommodate the model’s limitations is impractical. Users expect a GAI model to understand and process straightforward queries without needing to tailor their language extensively—and Sazhin very likely had this type of expectation (which was belied by ChatGPT-4).

In another comment to Sazhin’s post, Petr Lavrov—whose profile self-description indicates that he is a software engineer—wonders whether Sazhin had overlooked the rule of thumb that ‘if you want to make AI better at math, ask it to explain its reasoning BEFORE giving the answer, and also ask it to think step by step’, stressing that if a GAI model ‘first gives an answer’, it tends to switch ‘to explaining mode instead of reasoning mode—and spews out garbage’.


Lavrov’s recommendation to prompt the AI to explain its reasoning before providing an answer, and to think step by step, can enhance transparency and help the user identify where errors occur. It is theoretically sound and aligns with best practices in debugging complex algorithms, where breaking a problem into smaller, manageable parts can reveal logical inconsistencies or misunderstandings—which helps calibrate AI models. But the recommendation also assumes that users have the patience and expertise to guide the AI through iterative questioning, which may not be feasible for all users, especially those seeking quick and accurate results without additional effort.
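For readers curious about what such ‘reason first, answer later’ prompting looks like in practice, the sketch below is a minimal illustration, assuming the OpenAI Python SDK and its chat-completions interface; the model name and the prompt wording are my own illustrative choices, not a reproduction of anyone’s actual exchange.

```python
# A minimal sketch of 'explain your reasoning before answering' prompting,
# assuming the OpenAI Python SDK (v1-style chat-completions interface).
# The model name and prompt text are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

prompt = (
    "Compare 9.11 and 9.9. Think step by step and explain your reasoning "
    "BEFORE stating which number is bigger."
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
)

print(response.choices[0].message.content)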

A screenshot of Dr. Piyush Mathur’s interaction with ChatGPT-4 dated July 17, 2024

Moreover, the AI's ability to accurately explain its reasoning depends on its initial training data and algorithms. If the foundational logic is flawed or the model is not well-tuned for specific types of queries, then even a step-by-step explanation might not yield the correct outcome. So, at its best, Lavrov’s comment—which otherwise seeks to preach user-guided problem-solving—simply underscores the need for the developers to keep improving the AI's training data, processes and algorithmic robustness to handle basic tasks without requiring extensive user intervention.

For the AI developer community, this suggests a dual approach (which is otherwise supposed to be in place already): refining AI models to handle straightforward tasks autonomously while also developing tools and methods that allow for deeper interrogation and understanding when necessary. This balanced strategy is supposed to help bridge the gap between AI capabilities and user expectations; to the extent that this has not yet quite worked out as planned, we have situations like the one faced by Sazhin.

Meanwhile, ChatGPT-4 ended up admitting its error!

Intrigued by Sazhin’s post, I wanted to cross-check the problem he had faced with his query on ChatGPT-4. When I posed the exact same query to the model, using Sazhin’s words (albeit as a more formal sentence), it reproduced the same incorrect answer while giving a slightly different explanation for it: ‘9.11 is bigger than 9.9.’

In its explanation, though, the model replaced the word ‘bigger’ with the word ‘greater’ (twice), which appears to indicate that this was not a verbal or linguistic problem as far as the model was concerned; in fact, the model seemed to be conveying that ‘bigger’ and ‘greater’ meant the same to it in the context of the query. (See the screenshot to your right to check out my interaction with the model.)

A screenshot of Segun Adepoju’s comment on Sazhin’s LinkedIn post dated July 17, 2024

Ergo, Usman’s suggestion (quoted earlier) to Sazhin to try using the word ‘greater’ instead of ‘bigger’ in his query would not have made any difference to the model’s answer. That becomes even clearer from yet another comment (made by Segun Adepoju) on Sazhin’s post.

In this comment, Adepoju, a cross-sectional technologist (going by his LinkedIn self-description), reports that he also tried out Sazhin’s query on ChatGPT (and he presumably got the same incorrect answer)—except that the model corrected itself when he ‘probed further.’

As for Adepoju’s ‘probe’, it appears to have culminated in the following command to the model: ‘Subtract 9.11 from 9.9’—to which the model’s response was as follows: ‘Subtracting 9.11 from 9.9 results in -0.21, not 0.79. My earlier statement had an error, 9.11 is indeed smaller than 9.9.’

A screenshot of Slava Sazhin’s LinkedIn post dated July 20, 2024

That the model would end up admitting its error in response to a probative workaround by Adepoju utterly undercuts the case, made separately by Usman and Lavrov (and seconded by some other commentators not mentioned in this report), that the user, be it Sazhin or anybody else, needed to improve his or her query to get the right answer.

The saga continued…

Undeterred by the mostly paternalistic feedback to his post, Sazhin made another post on July 20—a screenshot showing that ChatGPT-4 was still giving the same incorrect answer (including the same explanation) to his query. In this new post, Sazhin pokes fun at some of his previous commentators, telling his readers that he was ‘Waiting for the AI fans to say that I phrased my prompt wrong’.

As of July 23, his post had not received any negative feedback on how he had phrased his prompt; however, one commentator—Tópé Isaac Adéríyè—did try to provide him some cold comfort by noting, with a smiley, that ‘AI makes mistakes too’, and by posting a screenshot of another query and Meta AI’s response to it.

A partial screenshot of Tópé Isaac Adéríyè’s comment on Slava Sazhin’s LinkedIn post dated July 20, 2024.

In this screenshot (see the image above), a different GAI model—Meta AI—is shown giving several wrong answers to this question (and to a hilarious series of follow-up ones): ‘The word watermelon contains how many r’s’!
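That particular question, incidentally, is settled by a single line of conventional code; the trivial Python check below (added here purely for reference) returns the correct count of one:

```python
# Count how many times the letter 'r' appears in 'watermelon'.
print("watermelon".count("r"))  # 1
```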


Dr. Piyush Mathur is the author of the book Technological Forms and Ecological Communication: A Theoretical Heuristic (Lexington Books, 2017).

If this report interested you, then you may be interested in these, too: In Nigeria, a Russian medical entrepreneur implements a novel approach to starting a business; gets loads of useful attention and As Russia and Ukraine fight it out, a look back into NSCAI’s Final Report.
