Building Segments from Multiple Variables

Segmentation is one of the most valuable things you can do in data prep: take several individual answers and resolve them into a single label that drives the rest of your analysis. The example healthcare survey combines chronic-condition status, appointment recency, medication adherence, and satisfaction into one engagement segment.

The recipe

from survey import Recoder

r = Recoder(**globals())

has_chronic_condition = r.get_value("has-chronic-condition")
last_appointment = r.get_value("last-healthcare-appointment")
medication_adherence = r.get_value("medication-adherence")
satisfaction = r.get_value("satisfaction-with-care") or 0

recent_appointment = last_appointment in ["Past 3 months", "4-6 months ago"]
adherent = medication_adherence in ["Always", "Usually"]

if has_chronic_condition == "No" and not recent_appointment:
    segment = "Non-user"
elif has_chronic_condition == "Yes" and not recent_appointment:
    segment = "Lapsed patient"
elif has_chronic_condition == "Yes" and recent_appointment and adherent and satisfaction >= 4:
    segment = "Active managed patient"
elif has_chronic_condition == "Yes" and recent_appointment and (not adherent or satisfaction <= 2):
    segment = "At-risk active patient"
else:
    segment = "General care user"

r.store_value("Healthcare engagement segment", segment)

How it works

The script reads the four inputs first. Satisfaction gets the or 0 guard because it feeds numeric comparisons; the select answers are compared as strings, so a blank simply fails to match any branch and falls through to the default.

The two helper variables, recent_appointment and adherent, turn lists of acceptable answers into plain true/false flags. Defining them once keeps the conditions below readable instead of repeating membership tests inline.

The if/elif chain assigns exactly one segment per respondent. Order is what makes this correct: Python stops at the first branch that matches, so the more specific conditions must come before the broader ones. The final else is a deliberate catch-all — every respondent who does not match an earlier rule lands in "General care user" rather than being left unlabeled.

Variations

As segmentation rules grow, a long if/elif chain gets hard to audit. Pulling the logic into a named function makes it testable and easier to read:

def engagement_segment(has_condition, recent, adherent, satisfaction):
    if has_condition == "No" and not recent:
        return "Non-user"
    if has_condition == "Yes" and not recent:
        return "Lapsed patient"
    if has_condition == "Yes" and recent and adherent and satisfaction >= 4:
        return "Active managed patient"
    if has_condition == "Yes" and recent and (not adherent or satisfaction <= 2):
        return "At-risk active patient"
    return "General care user"

r.store_value("Healthcare engagement segment",
              engagement_segment(has_chronic_condition, recent_appointment, adherent, satisfaction))

When a respondent naturally belongs to several groups at once — multiple needs, multiple behaviors — store a list with store_values instead of forcing a single label, so reporting treats it like a multi-select variable.