Or why the fraud problem is an arms race, not a fix
Three years ago, the fraud problem in online research looked like something you could solve by looking for patterns. Bots moved too fast. Completion times were too short. Respondents gave identical answers to branching logic. You could catch them with simple rules. If completion time < 60 seconds, flag it. If variance in answers = 0, flag it. If the same device fingerprint appears five times in one hour, flag it.
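A minimal sketch of what that first generation of detection amounted to. The field names and exact thresholds here are illustrative, not anyone's production values:

```python
from collections import Counter

def flag_obvious_fraud(response, recent_fingerprints):
    """First-generation rule checks. Field names and thresholds
    are illustrative, not production values."""
    flags = []
    # Rule 1: implausibly fast completion.
    if response["completion_seconds"] < 60:
        flags.append("too_fast")
    # Rule 2: zero variance across scale answers (straight-lining).
    if len(set(response["scale_answers"])) == 1:
        flags.append("zero_variance")
    # Rule 3: same device fingerprint seen five times in the last hour.
    seen = Counter(recent_fingerprints)
    if seen[response["fingerprint"]] >= 5:
        flags.append("fingerprint_reuse")
    return flags
```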
Those rules still work. For obvious fraud.
But the fraud has evolved. And the problem has become something else entirely: the more sophisticated the detection becomes, the more sophisticated the fraud becomes in order to evade it. Earlier this year we wrote about this shift as it was beginning. Now it's the dominant pattern. And it's not getting better. It's getting worse.
AI-generated responses are no longer made of patterns. They're made of mimicry. A bot trained on human survey responses can answer a brand tracker the way a human would. It can vary its response across different attributes. It can answer branching questions logically. It can adjust timing to look natural. It can even tailor answers to screening logic to pass eligibility gates. And because it's been trained on real human data, it doesn't fall into the statistical traps that used to catch fraudsters.
The most dangerous AI-generated responses are the ones that pass attention checks. They don't ignore the attention question. They answer it correctly. They've learned that an inattentive response is a red flag. So they pay attention. They vary their timing. They introduce micro-patterns that look human. They're not bots anymore. They're mimics.
The old fraud detection relied on catching people breaking the rules. The new fraud detection has to catch people following them too well.
Here's what this means operationally. A completion time that looks natural now requires deeper analysis. You have to look at the variance across the survey, the pattern of thinking time between different question types, the relationship between typing speed in open ends and response complexity. A natural completion time used to mean "probably human." Now it means "maybe human, maybe very good mimic."
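As a sketch of what "deeper analysis" can mean in practice, imagine deriving timing features per response instead of applying a single threshold. Everything here, the field names, the per-question-type grouping, is an assumption for illustration:

```python
import statistics

def timing_features(question_logs):
    """question_logs: list of dicts like
    {"type": "scale", "seconds": 4.2, "chars_typed": 0}.
    Returns features for a downstream model to score;
    nothing here is a verdict on its own."""
    times = [q["seconds"] for q in question_logs]
    by_type = {}
    for q in question_logs:
        by_type.setdefault(q["type"], []).append(q["seconds"])
    open_ends = [q for q in question_logs if q["type"] == "open_end"]
    return {
        # Variance across the whole survey: mimics often smooth this out.
        "time_stdev": statistics.stdev(times) if len(times) > 1 else 0.0,
        # Mean thinking time per question type: humans slow down on hard types.
        "mean_time_by_type": {t: statistics.mean(v) for t, v in by_type.items()},
        # Typing speed vs. length in open ends: a long, complex answer
        # typed at a constant machine-like rate is suspicious.
        "open_end_chars_per_sec": [
            q["chars_typed"] / q["seconds"] for q in open_ends if q["seconds"] > 0
        ],
    }
```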
An attention check that gets answered correctly is no longer reliable by itself. You have to ask why it was answered correctly. Did the respondent stop and think about it? Or did a language model recognize the pattern and generate a correct answer? Did they introduce delays before answering? Or did a model simply produce the expected output?
The only defense against this kind of sophistication is sophistication in detection. And that sophistication has to evolve continuously because the attacks evolve continuously.
This is where detection shifts from a feature to an architecture. Earlier this year we published our approach: 35+ browser fingerprinting attributes. We're not just looking at the device. We're looking at how the browser behaves. How are fonts being rendered? Is the screen refresh rate consistent with what the operating system reports? Are WebGL settings realistic? Are plugins actually present or spoofed? These aren't questions that have simple yes/no answers. They require probabilistic reasoning about what makes a device real.
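Server-side, checks like these reduce to cross-attribute consistency tests over the collected fingerprint. A hypothetical sketch, with stand-in attribute names; real checks are probabilistic, not boolean:

```python
def fingerprint_inconsistencies(fp):
    """fp: dict of collected browser attributes. Each check contributes
    evidence, not a verdict; the attribute names are illustrative."""
    issues = []
    # The refresh rate the browser reports should be plausible hardware.
    if fp.get("reported_refresh_hz") not in (60, 75, 90, 120, 144, 165, 240):
        issues.append("unusual_refresh_rate")
    # A GPU string that contradicts the claimed platform suggests spoofing.
    if fp.get("platform") == "iPhone" and "NVIDIA" in fp.get("webgl_renderer", ""):
        issues.append("webgl_platform_mismatch")
    # Headless or emulated browsers often report zero plugins while
    # claiming desktop-class capabilities at the same time.
    if fp.get("plugin_count", 0) == 0 and fp.get("claims_desktop", False):
        issues.append("plugin_capability_mismatch")
    return issues
```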
We're detecting incognito mode, VPNs, proxies, and emulators because these are the tools that fraud uses. A person in Nigeria answering a US survey needs to mask their geography somehow. An automated system needs to appear to be coming from a legitimate device. We can't just block these things—they're used by real respondents in legitimate situations—but we can flag them and increase the scrutiny applied to that response.
We're detecting device tampering: cases where someone has modified the operating system or installed custom versions of Android. Again, this isn't an automatic disqualification. But it's a signal. It says "this device is not in its original state."
Then there's dynamic risk scoring. A single attribute—completion time, fingerprint inconsistency, a suspicious open-end response—is not enough to conclude fraud. But a combination of correlated attributes tells a story. Did this response have three fingerprinting inconsistencies AND a completion time below the 10th percentile AND an open-end response that shows statistical markers of being LLM-written? The probability compounds. That's a signal worth taking seriously.
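One hedged way to model "the probability compounds" is to treat each signal as weighted evidence and combine log-odds rather than stack hard rules. The weights and signal names below are invented for illustration; a real system would fit them from labeled data and keep refitting as fraud evolves:

```python
import math

# Illustrative log-odds weights per signal, not fitted values.
WEIGHTS = {
    "fingerprint_inconsistency": 1.2,   # per inconsistency found
    "completion_below_p10": 1.5,
    "open_end_llm_markers": 2.0,
    "vpn_or_proxy": 0.6,                # weak alone: real users use VPNs too
}

def risk_score(signals, prior_log_odds=-3.0):
    """signals: dict mapping signal name to count (0 if absent).
    Returns a fraud probability from compounded evidence."""
    log_odds = prior_log_odds
    for name, count in signals.items():
        log_odds += WEIGHTS.get(name, 0.0) * count
    return 1.0 / (1.0 + math.exp(-log_odds))

# Three fingerprint inconsistencies AND a bottom-decile completion time
# AND LLM-flavored prose: each survivable alone, damning together.
print(risk_score({"fingerprint_inconsistency": 3,
                  "completion_below_p10": 1,
                  "open_end_llm_markers": 1}))  # ~0.98
```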
The LLM-written detection is its own challenge. Language models have patterns: overuse of certain phrases, a particular kind of word choice, a formulaic structure to complex thoughts. Real humans ramble. They make typos. They contradict themselves. They trail off. Language models tend to wrap things up neatly. We're training on corpora of known LLM outputs and building classifiers that can distinguish them from human writing. It's not perfect—good mimics are very good—but it catches the obvious cases.
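A minimal sketch of that classifier idea, using standard scikit-learn pieces rather than our actual pipeline. The two-line corpora below are placeholders; in practice the training data is large collections of verified human open ends and known LLM outputs, and that data is the hard part:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Placeholder corpora standing in for real labeled collections.
human_texts = [
    "i guess i like it? taste is ok but kinda pricey tbh",
    "used it twice. fine i suppose. the lid broke tho",
]
llm_texts = [
    "Overall, I find the product to be a compelling option that balances quality and value.",
    "In summary, the experience was seamless, and I would recommend it without hesitation.",
]

# Character n-grams pick up formulaic phrasing and the suspicious
# absence of typos; a real system would add stylometric features.
clf = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LogisticRegression(),
)
clf.fit(human_texts + llm_texts, [0] * len(human_texts) + [1] * len(llm_texts))

print(clf.predict_proba(["The experience was seamless and I would highly recommend it."]))
```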
Here's the thing though: this entire detection apparatus is only one half of the solution. The other half is understanding that fraud detection is not a problem that gets solved. It's a problem that gets managed. Every time you build a better detector, someone figures out how to build a better evasion. Every time you deploy a new fingerprinting attribute, attackers find a way to spoof it or mask it. Every time you improve your open-end analysis, fraud becomes more sophisticated about how it generates prose.
This is what we mean by an arms race. The platforms that treat fraud detection as something they solved three years ago and then stopped investing in will fall behind. Quietly at first—just a slow degradation of data quality as fraud gets more sophisticated. Then visibly, when clients start noticing response patterns that don't make sense.
The platforms that understand this as an ongoing investment problem—that hire detection experts, that continuously test new evasion techniques, that update their systems monthly instead of quarterly—those are the ones that will maintain data integrity as fraud evolves.
We're not saying we've solved it. We're saying we're staying ahead of it. We're testing our detection against the latest generation of AI-generated responses. We're monitoring what fraud looks like in the wild. We're updating our systems continuously. Because the moment you stop updating them, you stop detecting.
The fraud problem used to be about catching bad actors. Now it's about staying ahead of them. And staying ahead means accepting that you're never actually done.
