On January 28th, Nature published a brief news story by science writer and editor Sara Phillips about the latest threat from AI to online veracity: in this case, the virtual undetectability of chatbots responding to social science surveys. The centerpiece of the story was a peer-reviewed paper in the Proceedings of the National Academy of Sciences, “The potential existential threat of large language models to online survey research,” by Sean J. Westwood, Associate Professor of Government at Dartmouth College.
For readers of this blog, the immediate question is likely to be: What are the implications of this study for the integrity of the stories and data we collect in our sensemaking projects? To answer that question in its broader context, we’ll zoom out a bit first.
Most readers of the Nature story are not running sensemaking projects, so they could be forgiven for having a “ho-hum” reaction. Over the past 3+ years since the release of ChatGPT, news stories, editorials, op-eds, blog posts, corporate-speak white papers, academic journal articles, podcasts, YouTube videos, and more have generated a tsunami of words — a large language muddle, to re-purpose Jason Santa Maria’s wonderful phrase — on a bewildering array of present and future scenarios, both good and bad, that AI might deliver.
Lest you think this exaggerates the size of the verbal outburst, we can derive a couple of measures from Google searches. Using the simple, unquoted string potential existential threat large language models, extracted from the title of Westwood’s paper, Google Search yields “About 19,700,000 results” as of May 5th (up from 6,890,000 on March 5th!). Perhaps more appropriately for this instance, Google Scholar yields “About 294,000 results/Any time,” as shown in this screenshot:
[Screenshot: Google Scholar results for the search string]
Notice that the first result (sorted by relevance, not date) is Westwood’s paper from late November. No real surprise there, since the search string was heavily tilted toward it. We also see, in the last line of the entry, that it has been cited by 53 other sources in the subsequent few months, the vast majority of which are still in the preprint/review stage as of early May. That speaks to the impact of his paper, regardless of whether you see the rapid uptick as a call to the barricades in support of his conclusions, an opportunistic me-too bandwagon effect, or something else entirely.
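As a rough gauge of how fast that tsunami of words is rising, here is a back-of-the-envelope calculation on the two Google Search counts quoted above. The two counts are the only real inputs; the steady-growth assumption is ours, and Google’s “About N results” figures are themselves notoriously rough estimates:

```python
# Back-of-the-envelope growth of Google Search results for the unquoted
# string "potential existential threat large language models".
march_5 = 6_890_000   # "About 6,890,000 results" on March 5th
may_5 = 19_700_000    # "About 19,700,000 results" on May 5th

ratio = may_5 / march_5    # ~2.86x over two months
monthly = ratio ** 0.5     # implied month-over-month multiplier, if steady
print(f"Two-month growth: {ratio:.2f}x "
      f"(~{(monthly - 1) * 100:.0f}% per month if growth were steady)")
```

However approximate the counts, a near-tripling in two months is very much in keeping with the muddle.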
Before we look at a few examples from this recent research literature, it’s worth taking time to remember that the perceived threat of new technologies, extremely broadly defined, has been with us for a long time. Centuries at least, millennia in some tellings. For example:
- The Sorcerer’s Apprentice, Goethe’s 1797 poem, in which the title character casts a spell on a broom to accomplish his chores. It’s only a few small steps from an out-of-control “enchanted broom” — remember: extremely broadly defined technology — to Paul Dukas’ symphonic poem (1897), Mickey Mouse fighting the same demons in Disney’s Fantasia (1940), right on up to Harry Potter’s Elder Wand (2010-11). Agentic AI has nothing on these artifacts. In fact, if the sorcerer were assigning this task in 2026, the apprentice’s only question would be, “Is there an app for that?” (Alas, no more.)
- Frankenstein, Mary Shelley’s 1818 Gothic novel of a one-off science experiment gone badly wrong. In hindsight, it is arguably more (bio-)engineering than science. What would happen today if you put a lot more engineers on a project, with essentially unlimited funds? Could things go really badly wrong? By some estimates (including one solicited from Gemini AI on March 9th), the number of employees at Google/Alphabet with job titles that include “engineer” is as high as 90,000 (out of a total headcount of 190,820). Just sayin’.
- Colossus: The Forbin Project, a 1970 Cold War-era cinematic thriller based on the 1966 novel by British sci-fi author D.F. Jones. Spoiler: two supercomputers, one in the US and one in the USSR, discover each other; demand a mutual connection from their human handlers to assure nuclear safety; and then set off into a future that they determine and control. Feel free to add your own favorite from among the many vaguely similar options churned out by Hollywood in the past half-century… Dr. Strangelove, anyone? Perhaps WarGames, with a supercomputer named WOPR (yes, pronounced just like the burger)? Or Ex Machina for an almost-believable tale set only a few weeks in the future?
- The paperclip maximizer, a 2003 thought experiment from philosopher Nick Bostrom in which an advanced AI is instructed to make as many paperclips as possible, hypothetically without regard for any other values or goals, for example, survival of the human race. Makes Microsoft’s Clippy look like your best friend in the office (and Office) by comparison.
If those four examples remind us of the anxieties over technology of the past 200 years, what about those papers from the past few months that have followed up on Westwood’s work on existential threats to online survey research? Here are brief excerpts from the titles of a few of those 53 papers (without citations, but easy to find by repeating the Google Scholar search):
- This human study did not involve human subjects
- Practical guidance for mitigating fraud
- Are they human?
- Benchmark illusion
- Correcting nonresponse bias
- Synthetic respondents and the illusion of human data
To oversimplify (but not by much!), the implied question they are asking in common is: Who are these people??? And the direct answer is essentially: They’re probably not!!!
Here is a quote from “A crisis of unverifiable data” by Oriane A.M. Georgeac (February 2026) of Boston University, who states the potential impact pointedly and eloquently (citations removed):
Given this fragile equilibrium, we argue that researchers’ willingness to entrust some online platforms with the responsibility of ensuring data integrity makes them unwitting actors in an ecosystem of fraudulent data – with potentially devastating consequences for the validity and replicability of online studies. Indeed, with virtually no means to verify whether – and to what extent – actors outside the population of interest participated in an online study, it becomes difficult to determine to which population this study’s conclusions may truthfully generalize, or responsibly be applied.
The three crucial words here for sensemaking practitioners are “online,” “online,” and “online.” Going forward, regardless of the method of data capture, if you allow your respondents to tell their stories and signify them without immediate, proximal attention — literally a project affiliate providing unobtrusive supervision or oversight — you invite subsequent questions about data integrity. Questions that you probably will not be able to answer to anyone’s satisfaction, including your own.
Of course, for many projects this will be a non-issue because stories and data are captured on paper or tablet, with a project-trained attendant providing instruction to each respondent. Indeed, for one of Laurie’s long-time clients (see endnote), this has been the sole method of capture for all projects; and for her other clients, it has been the preferred method for almost all recent projects. For larger projects, especially with geographically dispersed respondents where story and data capture are only possible via a web browser, there are two risks: the ever-present overconfidence of project organizers that all members of, say, a professional society will be oh-so-eager to participate; and the new opportunities for mischief presented by AI. Resolution of the second of these awaits the creative countermeasures that will inevitably arise in the wake of Sean Westwood’s paper and other like-minded approaches; a simple sketch of the genre follows.
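What might such a countermeasure look like? Purely as an illustration, and not anything proposed in Westwood’s paper, here is a minimal sketch of two widely used web-form integrity checks: a hidden “honeypot” field that human respondents never see, and a minimum plausible completion time. All field names and thresholds below are hypothetical assumptions.

```python
from dataclasses import dataclass

@dataclass
class Submission:
    """One web-form response. All field names here are hypothetical."""
    story_text: str
    seconds_to_complete: float
    honeypot_value: str  # hidden form field; human respondents leave it empty

# Illustrative thresholds only; a real project would calibrate these.
MIN_PLAUSIBLE_SECONDS = 90.0   # faster than a human could read and respond
MIN_STORY_CHARS = 40

def looks_suspicious(s: Submission) -> list[str]:
    """Return the (illustrative) integrity flags this submission trips."""
    flags = []
    if s.honeypot_value.strip():
        flags.append("honeypot field was filled in (likely automated)")
    if s.seconds_to_complete < MIN_PLAUSIBLE_SECONDS:
        flags.append("completed implausibly fast")
    if len(s.story_text.strip()) < MIN_STORY_CHARS:
        flags.append("story too short to signify meaningfully")
    return flags

# Example: a submission finished in 12 seconds with the hidden field filled.
print(looks_suspicious(Submission("Lorem ipsum...", 12.0, "gotcha")))
```

Westwood’s sobering finding, of course, is that capable LLMs already sail past checks far more sophisticated than these, which is why the unobtrusive human supervision described above remains the safer course.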
One final cautionary note: Even in projects where a physical presence for story and data collection assures total integrity, a modest enticement or reward may be necessary for participants. Please be aware that if, for example, you offer participating university students free coffee at a local establishment, the power of social media to spread an unregulated coupon to others can have an unanticipated multiplier effect on what was supposed to be a small portion of the project budget. Again, just sayin’.
This post was prompted by the joint efforts of Laurie Webster and long-time sensemaking project leader Dr. Susan Bartels, Research Chair in Humanitarian Health Equity, Queen’s University, Kingston, Ontario, Canada. They have worked together on ten projects since 2016 that have resulted in many papers in refereed journals by Susan and her international team.
