Or why “AI-generated” isn’t the same as made-up
Most of the commentary around synthetic data falls into two camps: uncritical hype or outright dismissal. The reality, as ever, is more practical. Done right, synthetic data can radically accelerate research workflows. Done badly, it becomes a hall of mirrors.
Today, we’re launching synthetic data in our platform. And we’re doing it the right way.
For teams that opt in, you can now train a private model on your own historical survey responses, then generate synthetic data for new questions or survey designs using that model. Nothing is shared with foundation model providers. Nothing is shared with other customers. The model stays private. The data stays yours.
This isn’t a shortcut to replace real respondents. It’s an extension layer. A way to make your existing data more flexible, more responsive, and more useful. If you’ve ever had to triage a list of fifty early-stage concepts, or stitch together a tracking study after a questionnaire change, or translate research findings into attributes your ad-tech stack actually recognises, this is for you.
Synthetic doesn’t invent opinions. It extrapolates patterns. And when those patterns are grounded in real, clean, high-quality survey data, the output becomes powerful, not as a substitute for truth, but as a companion to it.
Here’s what it’s actually good for.
1. Early-stage triage, before you spend field budget
Most exploratory research starts with a flood of ideas and a thin window to sort through them. You’re asking which of these fifty directions is even worth testing, or whether an alternative frame might land better, or whether a concept makes logical sense on paper. Synthetic gives you a way to test the structure of these questions before you commit to fieldwork. You’re not looking for certainty; you’re mapping the terrain so you know where to look harder.
The model is trained on your real data; it’s consistent, and it’s fast. That’s the power here: directional signal, not decision-grade output, but enough to eliminate the obviously weak ideas. Start with 100 messages and quickly narrow them down to the five worth pursuing. That changes the game.
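As a rough illustration, the triage workflow can be sketched in a few lines. Everything here is an assumption for the sake of the example: `score_concept` is a stand-in for a call to your private synthetic model, and the toy heuristic inside it is purely illustrative, not how the platform scores anything.

```python
def score_concept(concept: str) -> float:
    """Stand-in scorer; in practice this would query the trained synthetic model."""
    # Toy heuristic for illustration only: favour shorter, punchier messages.
    return 1.0 / (1 + len(concept))

def triage(concepts: list[str], keep: int = 5) -> list[str]:
    """Rank a large concept list by modelled appeal; keep the top few for fieldwork."""
    ranked = sorted(concepts, key=score_concept, reverse=True)
    return ranked[:keep]

# 100 candidate messages in, five shortlisted for real fieldwork out.
candidates = [f"Message variant {i}: " + "benefit " * (i % 10) for i in range(100)]
shortlist = triage(candidates, keep=5)
```

The point of the sketch is the shape of the loop: cheap modelled scores do the coarse sorting, and only the shortlist earns field budget.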
2. Exploring surveys that would break a human
Some designs are un-fieldable in their raw form. No human respondent will complete 150 attribute combinations without fatigue. No panel is going to survive an exhaustive grid of feature permutations. Early-stage conjoint or volumetric structures often live in the “impossible but interesting” space. Valuable to explore, but risky to field.
Synthetic lets you model those spaces without the cost or the churn. You can probe the edges, identify what looks promising, and bring only the viable concepts into real-world testing. Think of it as computational scouting, not an end state, but a smarter way to begin.
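To make "computational scouting" concrete, here is a minimal sketch of exhaustively enumerating an attribute grid no panel would sit through, scoring every cell, and shortlisting the survivors. The attribute names and `modelled_utility` are hypothetical placeholders; the latter stands in for a synthetic-model call.

```python
import itertools

# Hypothetical attribute levels for a product grid; real designs may run to
# 150+ combinations, which is exactly what makes them un-fieldable.
price = ["low", "mid", "high"]
size = ["S", "M", "L"]
channel = ["online", "retail"]

def modelled_utility(cell: tuple) -> int:
    """Stand-in for scoring one combination with the synthetic model."""
    return hash(cell) % 100  # illustrative only; not a real utility estimate

# Enumerate the full space computationally, then shortlist for real testing.
grid = list(itertools.product(price, size, channel))
shortlist = sorted(grid, key=modelled_utility, reverse=True)[:5]
```

The exhaustive grid lives entirely in the model; only the shortlist ever reaches human respondents.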
3. Repairing tracking when your questionnaire evolves
Every insight team has been here: you add a few new questions to a brand tracker and suddenly your historical time series becomes a patchwork. Synthetic can’t recreate real responses, but it can estimate what those missing variables might have looked like. It can backfill, interpolate, bridge across waves, or even set stakeholder expectations before the next field run.
These aren’t replacements for measurement — and we don’t treat them as such. But in the real world, where trackers change and brands shift and products launch faster than procurement cycles can keep up, having a flexible tool for continuity is invaluable.
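One simple way to picture the backfill idea: use the wave where old and new questions overlap to learn a relationship, then project the new question backwards. The segment-mean approach below is a deliberately crude assumption for illustration; the platform's model is considerably richer, and all field and segment names are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Wave where respondents answered BOTH the legacy questions and the new one.
overlap_wave = [
    {"segment": "loyal", "new_q": 8}, {"segment": "loyal", "new_q": 9},
    {"segment": "lapsed", "new_q": 3}, {"segment": "lapsed", "new_q": 4},
]

# Learn a simple per-segment estimate from the overlap wave.
by_segment = defaultdict(list)
for r in overlap_wave:
    by_segment[r["segment"]].append(r["new_q"])
estimate = {seg: mean(vals) for seg, vals in by_segment.items()}

# Backfill historic waves that never asked the new question.
historic_wave = [{"segment": "loyal"}, {"segment": "lapsed"}]
backfilled = [{**r, "new_q_est": estimate[r["segment"]]} for r in historic_wave]
```

Crucially, the estimated column is kept separate (`new_q_est`) from measured data, which mirrors how the platform flags modelled values rather than blending them into the human record.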
4. Mapping research data to activation environments
This is where synthetic becomes a connector. You collect attributes A, B, and C in your survey. Your CDP, DSP, or targeting environment needs X, Y, and Z. Rather than re-fielding, you can model X, Y, and Z from the structure in your existing data.
The result isn’t “truth,” but it’s often good enough to activate: survey segments aligned to media taxonomies, audience models that plug into campaign logic, mapping layers that move research closer to the point of action.
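A toy sketch of the mapping layer, under loud assumptions: the attribute names (A, B, C, and the activation labels) are invented for illustration, and the nearest-neighbour match is a stand-in for whatever model actually bridges survey structure to the activation taxonomy.

```python
# Respondents for whom both surveyed attributes and the activation label are known.
labelled = [
    ({"A": 1, "B": 0, "C": 1}, "X_high_intent"),
    ({"A": 1, "B": 0, "C": 0}, "X_high_intent"),
    ({"A": 0, "B": 1, "C": 0}, "X_low_intent"),
]

def predict_x(profile: dict, labelled: list) -> str:
    """Nearest-neighbour on attribute agreement; illustrative only."""
    def overlap(a: dict, b: dict) -> int:
        return sum(a[key] == b[key] for key in a)
    best_profile, best_label = max(labelled, key=lambda pair: overlap(profile, pair[0]))
    return best_label

# Map a survey-only respondent into the activation taxonomy.
mapped = predict_x({"A": 1, "B": 0, "C": 1}, labelled)
```

The mapped labels are estimates, not measurements, but they speak the taxonomy your activation stack expects, which is the whole point of the connector.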
This use case is already generating real commercial impact. It’s not theoretical.
Making researchers 10× faster
This is the real story. Synthetic doesn’t eliminate researchers — it removes the friction that slows them down. It makes exploration easier, iteration faster, and waste smaller. You don’t need to wait two weeks to test if a concept is flawed. You don’t need to re-field to model a reasonable estimate. You don’t need to guess what an interim wave might look like. You have a tool that can simulate, extend, and clarify — and you use real fieldwork to validate what matters.
This is the difference between “more data” and “better thinking.” Synthetic is the latter.
What it can’t do
If you’re trying to estimate incidence, model rare behaviours, or understand emotional nuance, synthetic data isn’t going to help. It doesn’t capture lived experience; it can’t mirror cultural context; it doesn’t replace diversity in the field. It’s not good at surprise. And it shouldn’t be used to make high-stakes decisions without validation.
We’ve built guardrails into the platform for exactly this reason.
Any chart that includes synthetic data is clearly marked, visually distinct, with a disclaimer in the legend and footnotes. We continue to statistically test synthetic outputs where appropriate, but we never render them the same way as human data, and our AI insights will always highlight synthetic data. If it’s modelled, you’ll know.
The goal here is transparency, not trickery. If you’re treating synthetic as a stand-in for truth, you’re misusing it. But if you’re using it to move faster, frame better hypotheses, or prepare smarter fieldwork, you’re exactly where you should be.
The future is hybrid
Synthetic data brings elasticity and scale; human data brings grounding and depth. The best teams will use both. Synthetic to open the space; real responses to choose the path; hybrid systems that move faster than competitors can react.
This isn’t a pivot away from research. It’s a shift in what research can become.
And today, with private, safe, user-controlled synthetic models in the platform, you can use it the way it was always meant to work: grounded in your own data, on your own terms.
