This page is the methodology companion to the Least Fill function reference. The reference documents the user-facing call and its parameters; this page explains what the technique does, why it matters for inference, and where MX8 uses it under the hood.
Scope
Least-fill balancing is a dynamic assignment rule: given a list of options (items, profiles, or task sets) and a running count of how many respondents have already seen each option, return the options with the lowest counts. Repeating this for each respondent produces roughly equal exposure across options by the end of fielding.
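The rule above can be sketched in a few lines of Python. This is an illustrative model of the behavior described, not MX8's implementation; the function name, tie-breaking, and increment timing are all assumptions:

```python
# A minimal sketch of the least-fill rule (illustrative, not MX8's code):
# track how often each option has been shown and offer the least-shown
# ones next.
from collections import Counter

def least_filled(options, counts, number):
    # Ties break by list order in this sketch; real platforms may randomize.
    return sorted(options, key=lambda o: counts[o])[:number]

options = ["brand_a", "brand_b", "brand_c"]
counts = Counter()

# Simulate 9 respondents, each shown one option.
for _ in range(9):
    for shown in least_filled(options, counts, number=1):
        counts[shown] += 1

# Exposure ends roughly (here exactly) balanced: 3 showings each.
assert all(counts[o] == 3 for o in options)
```

Because each assignment tops up whichever option is currently behind, exposure stays within one showing of perfectly even throughout fielding, not just at the end.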
MX8 uses least-fill balancing in three places:
- The user-facing `s.get_least_filled()` survey function, used for list rotation across brands, concepts, or stimuli.
- Internally inside `conjoint_question` to assign one task set per respondent.
- Internally inside the MaxDiff instrument to choose which auto-generated set each respondent sees. (See "MaxDiff set assignment" below for caveats on what is and isn't documented.)
Least-fill is not a quota and does not terminate respondents. It changes which options each respondent is offered, not whether they continue. If you need termination based on counts, use set_quota() instead — see the Least Fill function reference for the contrast.
Why balanced exposure matters
When you compare items in a survey — average rating for brand A vs brand B, share-of-choice for concept X vs concept Y, MaxDiff utility for item P vs item Q — every comparison is implicitly a between-groups comparison. The "group" for each item is the set of respondents who were asked about it.
If you let exposure be uneven, two things go wrong:
- Effective sample size (ESS) drifts per item. An item shown to twice as many respondents has roughly twice the precision (its standard error shrinks by a factor of about √2); comparisons across items become uneven, and small-base items can swing on a handful of responses. ESS is the same concept used in Weighting methodology to talk about the information content of a weighted sample.
- Order or recency effects can correlate with item identity. If item A is always asked first when included, and inclusion is not balanced over time, you can confound order with the item itself.
Least-fill balancing addresses both: every option ends the fielding having been shown to roughly the same number of respondents, and the option each respondent sees does not depend on properties of that respondent (only on what's been undersampled so far).
For discrete-choice analysis specifically (MaxDiff, CBC), unequal exposure also breaks identification of the underlying utility model. The estimator in Utility and simulated share methodology assumes each respondent's tasks are a balanced sample of the design space; if exposure is heavily uneven, posterior estimates for under-shown items become poorly identified.
The algorithm
The publicly documented behavior of s.get_least_filled(number, from_list, quota) is:
- Take a list of options.
- Look up how many respondents in the named quota have already been assigned each option.
- Return the `number` options with the lowest current counts.
This is documented in the Least Fill function reference and the API reference entry for get_least_filled. The quota argument is a counter name supplied at the call site; the counter is created on first use, so you do not need to set up a quota separately. Two calls that share the same name share the same counter pool.
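The documented steps and the counter-on-first-use behavior can be modeled as a small function. This is a hypothetical sketch, not MX8's implementation: tie-breaking and increment timing are undocumented, so this version breaks ties by list order and increments at assignment time.

```python
# Illustrative model of the documented behavior of
# get_least_filled(number, from_list, quota). Not MX8's internals.
counters = {}  # one counter pool per quota name, created on first use

def get_least_filled(number, from_list, quota):
    pool = counters.setdefault(quota, {})       # named counter, auto-created
    ordered = sorted(from_list,                 # lowest current counts first;
                     key=lambda o: pool.get(o, 0))  # ties fall back to list order
    chosen = ordered[:number]                   # take the `number` least-filled
    for o in chosen:                            # increment timing is undocumented;
        pool[o] = pool.get(o, 0) + 1            # this sketch counts at assignment
    return chosen

# Successive calls against the same quota name draw down one shared pool:
assert get_least_filled(1, ["x", "y", "z"], quota="demo") == ["x"]
assert get_least_filled(1, ["x", "y", "z"], quota="demo") == ["y"]
assert get_least_filled(1, ["x", "y", "z"], quota="demo") == ["z"]
```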
Beyond that, several operational details are consequential for survey authors but are not currently documented:
- Tie-breaking. When multiple options share the lowest count, the rule used to pick among them (randomization within tier, list order, alphabetical, deterministic by respondent) is not documented. In practice, treat tie-breaking as undefined and design your survey so the outcome doesn't depend on it.
- When counts increment. Whether counts increment at the moment a respondent is offered an option, when they complete the question, or when they complete the survey overall, is not documented. This affects what "least-filled" means during fielding, especially for high-dropout audiences.
- Dropout behavior. Whether counts decrement (or are otherwise corrected) when a respondent drops out or is terminated after assignment is not documented.
- Persistence across fielding waves. Whether counters reset when a survey is re-opened, copied, or rerun is not documented.
These are the kinds of details that ought to live alongside the function reference once verified against the implementation. Until they are, the safe assumption for study design is: least-fill produces approximately equal exposure across options by the end of a clean, single-wave fielding, and is not a substitute for explicit quota enforcement if you need guaranteed per-item bases.
Where MX8 uses least-fill
Explicit s.get_least_filled()
The user-facing surface. Used for rotating large item lists — brands, concepts, ads, product variants — so each respondent sees a manageable subset and each item ends the fielding with comparable bases. See the Least Fill function reference for the call signature and a worked brand-rotation example, and the Survey programming cookbook for the "Handling Lists" pattern.
This is also the surface used in profile-rating conjoint, where each respondent rates a subset of a large profile pool — see step 4 of Setting up a conjoint study for the canonical example.
Inside conjoint_question
Choice-based conjoint (CBC) uses least-fill internally to assign one task set per respondent from the pool of task sets the survey author has defined. The survey author does not call get_least_filled directly for this — the conjoint_question instrument handles task-set assignment behind the scenes. This is what makes per-task-set exposure roughly even across respondents, which is what the HB estimator in Utility and simulated share methodology relies on to identify utilities across the design space.
MaxDiff set assignment
Running MaxDiff describes set construction as a design problem (the prime-factor story for choosing the number of items and the size of each set). What that doc does not currently spell out is how the platform assigns the resulting sets to respondents — whether that assignment uses least-fill against the set pool, a precomputed balanced rotation, or something else.
The two layers are related but distinct:
- Set construction is about which sets exist in the design and is determined by item count, set size, and repetition count. The prime-factor story in the MaxDiff doc covers this.
- Set assignment is about which set each respondent sees. This is where least-fill would operate if it operates here at all.
I'm flagging this as related but unconfirmed rather than asserting it. The "each option is shown uniformly" goal in the MaxDiff doc is consistent with least-fill assignment, but it's also consistent with a precomputed balanced-incomplete-block (BIB) rotation that doesn't need a dynamic counter.
Practical implications
- End-of-fielding bases are approximately balanced, not exactly balanced. Least-fill is greedy at the per-respondent level. With small N or high dropout, residual imbalance can persist.
- It doesn't gate. Least-fill never terminates a respondent; an option that is already "filled" simply isn't offered to them in the first place. If a quota truly needs to be enforced (you must have at least X respondents per item), use `set_quota()` and combine it with least-fill rather than relying on least-fill alone.
- Counters are per-name and created automatically. The `quota` argument is a counter name; you do not need to define a quota separately. Two `get_least_filled` calls with the same name share a counter; calls with different names track independently. If you want exposure balanced across the same item pool used in two different places, use the same name in both calls.
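The per-name counter behavior can be illustrated with a short sketch (names hypothetical, not MX8 internals): calls sharing a quota name draw down the same counts, while a different name starts a fresh, independent pool.

```python
# Sketch of per-name counter pools (illustrative, not MX8's implementation).
from collections import defaultdict

pools = defaultdict(dict)  # one counter pool per quota name

def least_filled(number, options, quota):
    pool = pools[quota]
    # Ties break by list order in this sketch; the platform leaves this undefined.
    picks = sorted(options, key=lambda o: pool.get(o, 0))[:number]
    for o in picks:
        pool[o] = pool.get(o, 0) + 1
    return picks

items = ["a", "b", "c", "d"]

least_filled(2, items, quota="rotation")  # -> ["a", "b"]
least_filled(2, items, quota="rotation")  # same pool, so -> ["c", "d"]
least_filled(2, items, quota="other")     # independent pool, so -> ["a", "b"]

assert pools["rotation"] == {"a": 1, "b": 1, "c": 1, "d": 1}
assert pools["other"] == {"a": 1, "b": 1}
```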
When it isn't appropriate
- Small samples. With very few respondents, the difference between balanced and random assignment is small, and least-fill's overhead (a tracked counter) buys little.
- Intentionally unbalanced designs. If you want some items shown more often (for example, focal stimuli vs distractors), use explicit quotas with target ratios rather than least-fill.
- Adaptive designs. If the option a respondent should see depends on their earlier answers (branching, screening, segmentation), that's a routing decision, not an exposure-balancing decision. Use survey logic, not least-fill.
Related
- Least Fill — function reference, signature, and example.
- `get_least_filled` in the API reference — canonical parameter table.
- Survey programming cookbook — "Handling Lists" pattern.
- Choice-based conjoint — uses least-fill for task-set assignment.
- Setting up a conjoint study — uses explicit `get_least_filled` for profile-rating conjoint.
- Running MaxDiff — set construction and the design context for balanced exposure in MaxDiff.
- Utility and simulated share methodology — why balanced exposure matters for HB identification.
- Weighting methodology — for the ESS framing of "information content per item."