Link testing is the part of survey research that nobody enjoys and everybody cuts short. You've built the instrument. You've set up the logic. Now someone, usually a junior researcher or, worse, the person who wrote the survey, has to click through every path, checking that the questions make sense, the piping works, and the skip logic doesn't strand anyone in an infinite loop.
It's tedious. It's slow. And because it's tedious and slow, teams don't do enough of it. They test the happy path, spot-check a few branches, and hope for the best. Then they find the broken logic three days into field.
There's a better way now. Export your respondent transcripts from MX8 Labs, hand the files to an LLM, and let it do the reading for you.
What You're Working With
MX8 Labs now lets you export respondent transcripts as Markdown files. Each file is a complete record of one respondent's survey experience: every question they saw, every answer option they were shown, and every response they gave, in order. The format reads like a conversation: an `## Interviewer` block with the question, followed by a `## Respondent` block with the answer, separated by horizontal rules.
This isn't a data file. It's a readable document. And that's the point. LLMs are very good at reading documents and spotting things that feel off.
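To make the structure concrete, here is a sketch of what one exchange in a transcript might look like. The question and answer text are invented for illustration; the `## Interviewer` / `## Respondent` blocks and horizontal rules follow the export format described above.

```markdown
## Interviewer

How often do you drink coffee?

- Daily
- A few times a week
- Rarely or never

---

## Respondent

Daily

---
```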
You download them from the survey page using the Download button. Choose your dataset, select "Respondent transcripts," and you'll get a zip file containing a Markdown file for each respondent, organized into `complete/` and `in_progress/` folders. For a detailed walkthrough of the export itself, see the documentation.
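If you want to script the next steps, a small helper can unpack the export and index the files by completion status. This is a minimal sketch: the `complete/` and `in_progress/` folder names come from the export layout described above, and the function name is illustrative.

```python
import zipfile
from pathlib import Path

def extract_transcripts(zip_path: str, dest: str = "transcripts") -> dict:
    """Unpack a transcript export zip and index Markdown files by status.

    Assumes the layout described above: one .md file per respondent,
    under complete/ and in_progress/ folders.
    """
    out = {"complete": [], "in_progress": []}
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)
        for name in zf.namelist():
            if not name.endswith(".md"):
                continue
            if name.startswith("complete/"):
                out["complete"].append(Path(dest) / name)
            elif name.startswith("in_progress/"):
                out["in_progress"].append(Path(dest) / name)
    return out
```

From here, `out["complete"]` gives you the finished interviews to prioritize for review.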
The Three-Stage Workflow
There are three distinct moments in the survey lifecycle where transcript-based QA adds value. Each uses a different dataset and looks for different things.
Stage 1: Simulated Data
Before you've fielded anything, run a simulated dataset and export the transcripts. The responses will be random: age values that make no sense, gibberish in the open-ends. That's fine. You're not evaluating the answers. You're evaluating the questions.
Hand the transcripts to Claude or ChatGPT and ask it to review the survey flow. Does the branching feel logical? Do follow-up questions match what came before? Is any piped text showing up blank or garbled? Are answer options appropriate for the question being asked?
Simulated data is free and instant. There's no reason not to run this check on every survey before it goes live.
Stage 2: Synthetic Data
If you're using Synthetic Twins, the transcripts get more interesting. Synthetic respondents produce realistic response patterns: they answer demographic questions with plausible values, give coherent open-ended responses, and follow the survey in a way that resembles real behavior.
This means your LLM review can catch subtler problems. Does the survey feel repetitive when someone actually goes through the full flow? Are there sections where the tone shifts awkwardly? Does the routing make sense for a respondent who said they've never used the product but somehow ends up in the heavy-user module?
Synthetic transcripts are the closest thing to a dress rehearsal. Use them.
Stage 3: Live Responses
Once real data starts coming in, export the completed transcripts and QA a sample. This is where you catch problems that only surface with real behavior: respondents who interpret questions differently than you intended, answer combinations that expose logic gaps, or routing paths that technically work but produce a confusing experience.
This is also where transcript QA becomes a quality control tool, not just a testing tool. You're reviewing the actual respondent experience, not a hypothetical one.
How to Prompt the LLM
The key is giving the LLM a clear checklist and telling it what to ignore. Here's a prompt you can adapt:
> Review each respondent's transcript and assess whether the survey feels logical and coherent. We're looking for specific issues with question wording, survey structure, and logic.
>
> Check for the following at minimum:
>
> - Does each question feel relevant based on previous answers?
> - Do follow-up questions logically match what was said earlier?
> - Does the survey ever feel repetitive or disjointed?
> - Is any inserted or piped text incorrect, blank, or awkward?
> - Are inconsistent or contradictory answers being allowed through?
> - Do exits or endings feel natural and well-timed?
> - Do the answer options fit the question being asked?
> - Do repeated sections reflect earlier choices?
> - Does the overall journey feel coherent?
>
> If the dataset is simulated, focus only on question wording, structure, and logic. The responses themselves will be random and should not be evaluated for validity.
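If you're reviewing many respondents at once, a small script can stitch the checklist together with a batch of transcript files into one prompt you paste into your LLM tool. A minimal sketch: the directory layout and `limit` default are assumptions, and the checklist constant is abbreviated here.

```python
from pathlib import Path

# Abbreviated stand-in for the review checklist; paste the full
# prompt text from the article for real use.
CHECKLIST = (
    "Review each respondent's transcript and assess whether the survey "
    "feels logical and coherent. Check question wording, structure, "
    "piping, and skip logic."
)

def build_review_prompt(transcript_dir: str, limit: int = 20) -> str:
    """Concatenate the checklist with up to `limit` transcripts."""
    parts = [CHECKLIST]
    for path in sorted(Path(transcript_dir).glob("*.md"))[:limit]:
        # Label each transcript so the LLM can cite which respondent
        # a finding came from.
        parts.append(f"--- Transcript: {path.name} ---\n{path.read_text()}")
    return "\n\n".join(parts)
```

Labeling each transcript with its filename matters: it lets the LLM point its findings back at a specific respondent, which makes the fixes easier to verify.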
Adjust the checklist based on what matters for your study. If you're running a brand tracker, add checks for brand name consistency. If you've got complex quota routing, ask the LLM to verify that respondents are landing in the right cells. If you're piping answers from earlier questions, make sure the piped text reads naturally in context.
Why Markdown Matters
You might wonder why we export as Markdown instead of, say, JSON or CSV. The answer is that LLMs process natural-language documents better than structured data formats when the task is qualitative review. A Markdown transcript reads like a conversation. The LLM understands the flow the same way a human reviewer would: sequentially, contextually, and with an eye for whether things make sense.
JSON would give you structure but lose readability. CSV would give you tabular data but lose sequence. Markdown gives you both: it's structured enough for programmatic processing and readable enough for qualitative analysis. It's also small. A typical transcript fits comfortably within any modern LLM's context window, which means you can process hundreds of respondents in parallel without hitting limits.
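As a rough illustration of that headroom, you could batch transcripts greedily under a token budget before sending them off. The four-characters-per-token ratio is a coarse heuristic, not a real tokenizer, and the budget figure is an assumption, not a limit of any particular model:

```python
def batch_by_budget(texts, max_tokens=100_000, chars_per_token=4):
    """Greedily group transcript texts into batches under a token budget.

    chars_per_token is a crude estimate; swap in a real tokenizer
    for accurate counts.
    """
    batches, current, used = [], [], 0
    for t in texts:
        cost = max(1, len(t) // chars_per_token)
        # Start a new batch when adding this transcript would overflow.
        if current and used + cost > max_tokens:
            batches.append(current)
            current, used = [], 0
        current.append(t)
        used += cost
    if current:
        batches.append(current)
    return batches
```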
What This Replaces (and What It Doesn't)
LLM-based transcript QA replaces the most painful part of link testing: the manual click-through. Instead of one person spending half a day walking through survey paths, you export transcripts covering thousands of paths and let the LLM read them all.
It doesn't replace thinking about your survey design. The LLM will catch mechanical problems: broken piping, illogical routing, missing answer options. It won't tell you that your question about purchase intent is poorly worded for your category, or that your rating scale should use seven points instead of five. That's still your job.
But the mechanical problems are exactly what slip through when teams are under time pressure. And they're exactly what LLMs are good at catching.
Getting Started
Export your transcripts from any survey using the Download button on the survey page. If you haven't fielded yet, run a simulated dataset first. It takes seconds and gives you something to work with immediately.
Drop the files into Claude Cowork, ChatGPT, or whatever LLM tool your team uses. Paste in the prompt above. Read the output. Fix what it finds. Then do it again with synthetic or live data when you have it.
The whole process takes minutes. The alternative is finding the bug in field.
