Segmentation is one of the most valuable things you can do in data prep: take several individual answers and resolve them into a single label that drives the rest of your analysis. The example healthcare survey combines chronic-condition status, appointment recency, medication adherence, and satisfaction into one engagement segment.
The recipe
from survey import Recoder
r = Recoder(**globals())
has_chronic_condition = r.get_value("has-chronic-condition")
last_appointment = r.get_value("last-healthcare-appointment")
medication_adherence = r.get_value("medication-adherence")
satisfaction = r.get_value("satisfaction-with-care") or 0
recent_appointment = last_appointment in ["Past 3 months", "4-6 months ago"]
adherent = medication_adherence in ["Always", "Usually"]
if has_chronic_condition == "No" and not recent_appointment:
segment = "Non-user"
elif has_chronic_condition == "Yes" and not recent_appointment:
segment = "Lapsed patient"
elif has_chronic_condition == "Yes" and recent_appointment and adherent and satisfaction >= 4:
segment = "Active managed patient"
elif has_chronic_condition == "Yes" and recent_appointment and (not adherent or satisfaction <= 2):
segment = "At-risk active patient"
else:
segment = "General care user"
r.store_value("Healthcare engagement segment", segment)
How it works
The script reads the four inputs first. Satisfaction gets the or 0 guard because it feeds numeric comparisons; the select answers are compared as strings, so a blank simply fails to match any branch and falls through to the default.
The two helper variables, recent_appointment and adherent, turn lists of acceptable answers into plain true/false flags. Defining them once keeps the conditions below readable instead of repeating membership tests inline.
The if/elif chain assigns exactly one segment per respondent. Order is what makes this correct: Python stops at the first branch that matches, so the more specific conditions must come before the broader ones. The final else is a deliberate catch-all — every respondent who does not match an earlier rule lands in "General care user" rather than being left unlabeled.
Variations
As segmentation rules grow, a long if/elif chain gets hard to audit. Pulling the logic into a named function makes it testable and easier to read:
def engagement_segment(has_condition, recent, adherent, satisfaction):
if has_condition == "No" and not recent:
return "Non-user"
if has_condition == "Yes" and not recent:
return "Lapsed patient"
if has_condition == "Yes" and recent and adherent and satisfaction >= 4:
return "Active managed patient"
if has_condition == "Yes" and recent and (not adherent or satisfaction <= 2):
return "At-risk active patient"
return "General care user"
r.store_value("Healthcare engagement segment",
engagement_segment(has_chronic_condition, recent_appointment, adherent, satisfaction))
When a respondent naturally belongs to several groups at once — multiple needs, multiple behaviors — store a list with store_values instead of forcing a single label, so reporting treats it like a multi-select variable.