Data Prep Overview

After fieldwork closes, you often need to reshape the data before it is ready for analysis: derive a new metric, roll several answers into a total, group respondents into segments, clean up labels, or fold in data that lives outside the survey. The MX8 Labs Research Platform handles this through the recode script — a small Python program that runs once per respondent against their captured answers.

Use this section when you already know what needs to change in the dataset and need the exact recode-script pattern to do it. For a walkthrough of where the recode script lives in the platform and how to open the data-prep editor, see Recoding and post-field data wrangling. For generated method signatures and parameter tables, see the Recoder API Reference.

The `Recoder` object

Every recode script starts by constructing a Recoder, which gives you access to the current respondent's answers and the methods for storing new values:

from survey import Recoder

r = Recoder(**globals())

The script runs once for each respondent, one audience at a time. Inside it, you read answers with get_value, get_values, or get_dict, calculate whatever you need with ordinary Python, and write results back with store_value or store_values so they appear as new variables in reporting.
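The read, calculate, store flow can be sketched as follows. So that the sketch runs outside the platform, it uses a hypothetical StubRecoder stand-in instead of the real Recoder. The reporting IDs Q_visits_gp and Q_visits_specialist and the variable name total_visits are also invented for illustration. The stub only mimics two of the method names mentioned above, not the platform's actual behavior.

```python
class StubRecoder:
    """Hypothetical stand-in for the platform's Recoder, for local experiments only."""

    def __init__(self, answers):
        self.answers = dict(answers)  # captured answers keyed by reporting ID
        self.stored = {}              # derived variables written by the script

    def get_value(self, reporting_id):
        return self.answers.get(reporting_id)

    def store_value(self, name, value):
        self.stored[name] = value


def run_script(r):
    # 1. Read the captured answers (both reporting IDs are assumptions).
    gp = r.get_value("Q_visits_gp") or 0
    specialist = r.get_value("Q_visits_specialist") or 0
    # 2. Calculate with ordinary Python.
    total = gp + specialist
    # 3. Write the result back as a new variable.
    r.store_value("total_visits", total)


r = StubRecoder({"Q_visits_gp": 3, "Q_visits_specialist": 2})
run_script(r)
print(r.stored)  # {'total_visits': 5}
```

In a real recode script the body of run_script would sit at the top level, after r = Recoder(**globals()).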

Common operations

Most data-prep scripts use the same small set of operations.

get_value(reporting_id) returns the single answer to a question. Use it for select, rating, and numeric questions where only one value is captured.

gender = r.get_value("Q_gender")

get_values(reporting_id) returns the list of selected answers for a multi-select question, in the order they were asked.

selected_brands = r.get_values("Q_brands")

store_value(name, value) saves a single derived value as a new reporting variable without touching the original questions.

age = r.get_value("Q_age")  # "Q_age" is a placeholder ID; assumes a numeric answer
r.store_value("is_young", 1 if age < 35 else 0)

get_dict(reporting_id) returns a structured answer for grid or list-style questions.
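The exact shape of the structure returned by get_dict is not described here, so the sketch below works on a plain dictionary and assumes, purely for illustration, that a grid comes back as a mapping from row label to rating. The helper grid_average and the reporting ID in the comment are hypothetical.

```python
from statistics import mean


def grid_average(grid):
    """Average the numeric ratings in a {row_label: rating} mapping, skipping blanks."""
    ratings = [v for v in grid.values() if isinstance(v, (int, float))]
    return mean(ratings) if ratings else None


# Assumed shape of r.get_dict("Q_satisfaction_grid") for one respondent.
sample_grid = {"Wait time": 4, "Staff": 5, "Facilities": None}
average = grid_average(sample_grid)
print(average)  # 4.5

# On the platform this might become:
# r.store_value("avg_satisfaction", grid_average(r.get_dict("Q_satisfaction_grid")))
```

Check the structure get_dict returns for your own question before relying on this shape.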

store_values(name, values) saves a list of derived values, producing a multi-select-shaped calculated variable.
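One way to produce such a list is to map each selected answer onto a category and keep the distinct categories. The sketch below does this with plain Python; the BRAND_CATEGORY lookup, the helper derive_categories, and the variable name brand_categories are assumptions made for illustration.

```python
# Hypothetical lookup from a brand answer to the category it belongs to.
BRAND_CATEGORY = {
    "Brand A": "Generic",
    "Brand B": "Generic",
    "Brand C": "Branded",
}


def derive_categories(selected, lookup):
    """Return the distinct categories for the selected answers, in first-seen order."""
    categories = []
    for brand in selected:
        category = lookup.get(brand)
        if category is not None and category not in categories:
            categories.append(category)
    return categories


selected_brands = ["Brand A", "Brand C", "Brand B"]
categories = derive_categories(selected_brands, BRAND_CATEGORY)
print(categories)  # ['Generic', 'Branded']

# On the platform this might become:
# r.store_values("brand_categories",
#                derive_categories(r.get_values("Q_brands"), BRAND_CATEGORY))
```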

recode(reporting_id, recodes) maps captured answers onto new values — useful for standardizing messy text or collapsing options into buckets.

r.recode("Q1", {"Male": "M", "Female": "F"})
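When many options collapse into a few buckets, it can be easier to write the buckets first and generate the flat mapping from them. The sketch below is one way to do that; build_recodes, the age bands, and the reporting ID Q_age_band are hypothetical, and the only thing assumed about recode is the old-value-to-new-value dictionary shown above.

```python
def build_recodes(buckets):
    """Turn {bucket: [options]} into the flat {option: bucket} mapping."""
    recodes = {}
    for bucket, options in buckets.items():
        for option in options:
            if option in recodes:
                # Guard against listing the same option under two buckets.
                raise ValueError(f"{option!r} appears in more than one bucket")
            recodes[option] = bucket
    return recodes


# Hypothetical answer options grouped into two buckets.
AGE_BUCKETS = {
    "Under 35": ["18-24", "25-34"],
    "35 and over": ["35-44", "45-54", "55+"],
}

recodes = build_recodes(AGE_BUCKETS)
print(recodes["25-34"])  # Under 35

# On the platform this might become:
# r.recode("Q_age_band", recodes)
```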

mark_poor_quality(respondent_ids) flags respondents to drop from the final output. IDs that are not part of the current recoding pass are ignored.

r.mark_poor_quality(["respondent_1", "respondent_2"])

r.respondent_id returns the current respondent's ID, which is handy when joining in external data keyed by respondent.
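A minimal sketch of such a join, using an in-script dictionary keyed by respondent ID. The CLIENT_DATA table, its fields, and the helper lookup_field are invented for illustration; how external data actually reaches your script depends on your setup.

```python
# Hypothetical client metadata keyed by respondent ID.
CLIENT_DATA = {
    "respondent_1": {"region": "North", "tier": "Gold"},
    "respondent_2": {"region": "South", "tier": "Silver"},
}


def lookup_field(respondent_id, table, field, default="Unknown"):
    """Return one metadata field for a respondent, or a default when there is no match."""
    row = table.get(respondent_id, {})
    return row.get(field, default)


region = lookup_field("respondent_1", CLIENT_DATA, "region")
missing = lookup_field("respondent_9", CLIENT_DATA, "region")
print(region, missing)  # North Unknown

# On the platform this might become:
# r.store_value("client_region",
#               lookup_field(r.respondent_id, CLIENT_DATA, "region"))
```

Returning an explicit default keeps respondents without a matching row visible in reporting instead of leaving the new variable empty.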

Example patterns

The examples below use healthcare-style reporting IDs to keep the code concrete. Replace those IDs with the reporting IDs from your own survey.

Summing numeric values shows how to combine several numeric answers into a single total.
Calculating rates with division shows how to divide one answer by another while guarding against division by zero.
Estimating totals with multiplication shows how to scale a count by a percentage.
Appending external data by respondent ID shows how to join client metadata onto each respondent.
Building segments from multiple variables shows how to combine several answers into a segment label.
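As a rough preview of three of these patterns, the helpers below show a guarded division, a percentage scaling, and a two-variable segment in plain Python. The function names, the 0 to 100 percentage scale, and the segment thresholds are assumptions for illustration; the linked examples remain the reference for the platform patterns.

```python
def safe_rate(numerator, denominator):
    """Return numerator / denominator, or None when the rate is undefined."""
    if numerator is None or not denominator:
        return None  # covers a missing answer and a zero denominator
    return numerator / denominator


def estimate_total(count, percentage):
    """Scale a count by a percentage given on a 0 to 100 scale."""
    if count is None or percentage is None:
        return None
    return count * percentage / 100


def build_segment(patients_per_month, years_in_practice):
    """Combine two answers into one segment label (thresholds are invented)."""
    volume = "High volume" if patients_per_month >= 100 else "Low volume"
    tenure = "established" if years_in_practice >= 10 else "early career"
    return f"{volume}, {tenure}"


print(safe_rate(5, 20))       # 0.25
print(safe_rate(5, 0))        # None
print(estimate_total(200, 15))  # 30.0
print(build_segment(120, 4))  # High volume, early career
```

Each result would then be written back with store_value under a name of your choosing.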

Allowed Python libraries

The recode script runs in the same sandbox as survey code, so the same import rules apply. You may import any of these modules:

collections
datetime
itertools
math
re
statistics
string
survey
time
random

Other libraries, including non-standard ones, are available on an Enterprise plan. See Using Advanced Python Features for the full allowed-imports list and the forbidden-functions rules the editor enforces as you type.
