This page is the technical companion to the Weighted results FAQ. The FAQ explains what weighting does and when it runs; this page documents the estimator, stage composition, respondent-universe selection, and inference adjustments. It is written for analysts, data-science teams, and methodologists reviewing the approach.
Scope
MX8 Labs uses iterative proportional fitting (IPF, also called raking) to calibrate respondent weights to marginal targets supplied through quota configuration. Point estimates are computed from the final calibrated weights. Standard errors and significance tests use an effective-sample-size adjustment so that precision claims degrade gracefully as weights become more heterogeneous.
Weighting can be run in one stage or two stages:
- Pre-weighting stage (optional)
- Main weighting stage
Each stage can use its own configured respondent universe.
This is a calibration-weighting framework with Kish effective-sample-size adjustment. It is not a full design-based variance estimator: it does not model primary sampling units, explicit strata, or finite-population corrections. In practice, for the kinds of online respondent sources MX8 supports, this framework is well matched to the data and the questions people ask of it, but the scope is worth naming explicitly.
Notation
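Let respondents be indexed by $i$ with final calibrated weights $w_i$.
For a reporting cell $c$ (for example, "women aged 25-34 in the West region"):
- $S_c$ is the set of respondents in cell $c$.
- $n_c = |S_c|$ is the raw base for that cell.
- $W_c = \sum_{i \in S_c} w_i$ is the weighted base.
For a binary outcome $y_i \in \{0, 1\}$, the weighted proportion is:
$$\hat{p}_c = \frac{\sum_{i \in S_c} w_i y_i}{\sum_{i \in S_c} w_i}$$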
Weighted means for numeric or rating outcomes are computed analogously.
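As a minimal sketch of the estimator described above, the function below computes a weighted mean, which reduces to the weighted proportion when the outcome is coded 0/1. The function name and the four-respondent cell are hypothetical and exist only for illustration.

```python
from typing import Sequence


def weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    """Return sum(w * y) / sum(w); for 0/1 outcomes this is the weighted proportion."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(w * y for w, y in zip(weights, values)) / total_weight


# Hypothetical reporting cell with four respondents.
y = [1, 0, 1, 0]
w = [2.0, 1.0, 1.0, 1.0]
p_hat = weighted_mean(y, w)  # (2.0 + 1.0) / 5.0

# The same function handles a rating outcome (here on a made-up 1-5 scale).
rating_mean = weighted_mean([5, 3, 4, 1], w)
```

Note that the denominator is the weighted base of the cell, not the raw base.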
Target marginals
Weighting targets are supplied as category proportions for one or more questions defined on the respondent source (age bands, gender, region, and so on). Targets may be:
- One-way, e.g. the marginal distribution of age alone.
- Nested (joint), e.g. a target distribution over the joint of age and gender. Joint targets are calibrated directly against the joint cell, not against the two marginals separately.
Only categories with a positive target are treated as required for the eligibility checks described below. A category with a zero target is permitted to be absent from the eligible data.
Respondent-universe selection
Each weighting stage defines its own eligible respondent universe. This is a methodological control, not just an operational filter — calibration answers "representative of which population?", and universe selection defines that population.
Common configurations include:
- Completes-only universes for analysis-sample inference.
- Broader universes (for example including selected terminates) when calibrating to entrant-level representativeness.
- Distinct universes across stages when pre-weighting is used to correct source composition before final reporting calibration.
IPF within a stage
Given a seed tensor $T^{(0)}$ built from the stage's eligible respondents and a set of target margins on the weighting dimensions, IPF repeatedly rescales the fitted tensor $T$ along one dimension at a time so that its marginal on that dimension matches the corresponding target. After each pass it moves to the next dimension and rescales again. The loop continues until the fitted tensor matches every target to within a convergence tolerance.
Once $T$ has converged, each calibration cell $k$ receives a cell multiplier:
$$m_k = \frac{T_k}{T^{(0)}_k} \cdot \frac{N}{\sum_j T_j}$$
Here $N$ is the eligible respondent count. The right-hand factor normalizes the weights so that their sum equals the eligible respondent count, which keeps weighted bases on the same scale as raw bases. Each respondent inherits the multiplier of their calibration cell as their weight.
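The loop and the multiplier step can be sketched for the simplest case of two weighting dimensions. This is an illustration under stated assumptions, not the platform's implementation: the function names, the default tolerance, the iteration cap, and the 2x2 seed are all hypothetical, and targets are assumed to be proportions that sum to the same total on both dimensions.

```python
from typing import List

Matrix = List[List[float]]


def ipf_2d(seed: Matrix, row_targets: List[float], col_targets: List[float],
           tol: float = 1e-9, max_iter: int = 1000) -> Matrix:
    """Rake a 2-D seed so that its row and column sums match the targets."""
    fitted = [row[:] for row in seed]  # copy so the seed is left untouched
    for _ in range(max_iter):
        # Rescale each row to its target.
        for r, target in enumerate(row_targets):
            row_sum = sum(fitted[r])
            if row_sum > 0:
                fitted[r] = [v * target / row_sum for v in fitted[r]]
        # Rescale each column to its target.
        for c, target in enumerate(col_targets):
            col_sum = sum(row[c] for row in fitted)
            if col_sum > 0:
                factor = target / col_sum
                for row in fitted:
                    row[c] *= factor
        # Columns now match up to rounding; stop once the rows match too.
        row_error = max(abs(sum(fitted[r]) - t) for r, t in enumerate(row_targets))
        if row_error < tol:
            return fitted
    raise RuntimeError("IPF did not converge within max_iter passes")


def cell_multipliers(seed: Matrix, fitted: Matrix, n_eligible: float) -> Matrix:
    """Fitted-to-seed ratio per cell, scaled so the weights sum to n_eligible."""
    scale = n_eligible / sum(sum(row) for row in fitted)
    return [[(f / s) * scale if s > 0 else 0.0 for s, f in zip(seed_row, fit_row)]
            for seed_row, fit_row in zip(seed, fitted)]


# Hypothetical seed of raw counts: rows = two age bands, columns = two genders.
seed = [[30.0, 20.0], [10.0, 40.0]]
fitted = ipf_2d(seed, row_targets=[0.5, 0.5], col_targets=[0.5, 0.5])
multipliers = cell_multipliers(seed, fitted, n_eligible=100)

# Weighted counts per cell: seed count times that cell's multiplier.
weighted = [[s * m for s, m in zip(seed_row, mult_row)]
            for seed_row, mult_row in zip(seed, multipliers)]
```

In this example the weighted counts sum to 100 and each row and column carries half of the total, matching the targets while preserving the seed's association between the two dimensions.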
Two-stage composition
With both stages configured:
- Pre-weighting runs first on its configured universe and targets. The seed tensor is built from observed respondent counts and IPF produces a first set of respondent weights.
- Main weighting runs second on its own configured universe and targets. Critically, its seed tensor is built from the pre-weighting weights, not from raw counts. IPF then produces a second cell multiplier on top of those weights.
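Because the second stage's seed is the first stage's output, stages compose multiplicatively:
$$w_i = m^{\text{pre}}_i \cdot m^{\text{main}}_i$$
where $m^{\text{pre}}_i$ and $m^{\text{main}}_i$ are the multipliers of the calibration cells that respondent $i$ falls into at the pre-weighting and main stages.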
This supports sequential correction (for example source/frame correction first, reporting calibration second) without collapsing both objectives into a single target system.
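A small sketch of the composition, assuming one one-way target per stage (with a single margin, IPF converges in one pass, so no loop is needed). The helper name, the six respondents, and the "source" and "gender" variables are hypothetical, and both stages are assumed to share the same universe for brevity.

```python
from collections import defaultdict
from typing import Dict, List


def one_way_multipliers(cells: List[str], seed_weights: List[float],
                        targets: Dict[str, float]) -> Dict[str, float]:
    """Multiplier per category for a single one-way target.

    The seed mass of a category is the sum of the incoming weights in it,
    and the result is normalized so the weights sum to the respondent count.
    """
    seed = defaultdict(float)
    for cell, w in zip(cells, seed_weights):
        seed[cell] += w
    n = len(cells)
    total_target = sum(targets.values())
    return {cell: (targets[cell] / mass) * (n / total_target)
            for cell, mass in seed.items()}


# Hypothetical respondents: sample source and gender.
source = ["a", "a", "a", "a", "b", "b"]
gender = ["f", "m", "m", "m", "f", "m"]

# Stage 1 (pre-weighting): seed is raw counts, i.e. every respondent starts at 1.0.
pre_mult = one_way_multipliers(source, [1.0] * 6, {"a": 0.5, "b": 0.5})
pre_weights = [pre_mult[s] for s in source]

# Stage 2 (main): seed is built from the stage-1 weights, not raw counts.
main_mult = one_way_multipliers(gender, pre_weights, {"f": 0.5, "m": 0.5})

# Final weight is the product of the two stage multipliers.
final_weights = [pre_mult[s] * main_mult[g] for s, g in zip(source, gender)]
female_share = sum(w for w, g in zip(final_weights, gender) if g == "f") / sum(final_weights)
```

After the second stage the gender target is met exactly, while the source distribution from the first stage is carried into the seed rather than re-imposed as a target.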
Eligibility guardrails
Before any weighting runs, the platform checks every configured stage. A stage is eligible only if:
- every question referenced by the targets is present in the data,
- the number of respondents complete across the weighting questions exceeds the minimum base for weighting, and
- every category with a positive target has at least one eligible respondent in that stage's selected universe (including joint categories for nested targets).
The third condition is the most commonly hit. It prevents IPF from trying to put mass into a category that is empty in the eligible sample, which would otherwise produce either non-convergence or implausibly large weights.
If any configured stage fails eligibility, no stage is applied and respondent weights remain at 1.0. This all-or-nothing behavior prevents a partial calibration from introducing targets that were never intended to be hit in isolation. The weighting diagnostics will show which check failed so you can fix the underlying issue — typically by collapsing a sparse category, revising the target, or increasing the sample.
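The three checks and the all-or-nothing rule can be sketched as follows. This is an assumption-laden illustration: the data layout (one dict per respondent), the function names, the failure messages, and the `min_base` value are hypothetical, and only one-way targets are handled.

```python
from typing import Dict, List, Optional, Tuple

Respondent = Dict[str, Optional[str]]
Targets = Dict[str, Dict[str, float]]  # question -> category -> target proportion


def check_stage(respondents: List[Respondent], targets: Targets,
                min_base: int) -> Tuple[bool, str]:
    """Return (eligible, reason) for one stage's universe and targets."""
    # Check 1: every targeted question is present in the data.
    present = set()
    for r in respondents:
        present.update(r.keys())
    for question in targets:
        if question not in present:
            return False, f"question {question!r} is not in the data"
    # Check 2: respondents complete on all weighting questions exceed the minimum base.
    complete = [r for r in respondents
                if all(r.get(q) is not None for q in targets)]
    if len(complete) <= min_base:
        return False, f"only {len(complete)} complete respondents (minimum base {min_base})"
    # Check 3: every category with a positive target is observed at least once.
    for question, categories in targets.items():
        observed = {r[question] for r in complete}
        for category, proportion in categories.items():
            if proportion > 0 and category not in observed:
                return False, f"no eligible respondent in {question}={category!r}"
    return True, "ok"


def weighting_is_applied(stage_results: List[Tuple[bool, str]]) -> bool:
    """All-or-nothing: weighting runs only if every configured stage is eligible."""
    return all(ok for ok, _ in stage_results)


respondents = [
    {"age": "18-34", "gender": "f"},
    {"age": "35+", "gender": "m"},
    {"age": "18-34", "gender": "m"},
]
# A zero-target category ("unknown") may be absent from the data.
good = check_stage(respondents,
                   {"age": {"18-34": 0.4, "35+": 0.6, "unknown": 0.0}}, min_base=2)
# A positive-target category with no respondents fails the third check.
bad = check_stage(respondents, {"gender": {"f": 0.5, "m": 0.4, "nb": 0.1}}, min_base=2)
applied = weighting_is_applied([good, bad])
```

Here the second stage fails, so `applied` is false and, following the rule above, every respondent weight would stay at 1.0.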
Effective sample size
For any set of weights $w_1, \dots, w_n$, the Kish effective sample size is:
$$n_{\text{eff}} = \frac{\left(\sum_{i} w_i\right)^2}{\sum_{i} w_i^2}$$
Two properties to keep in mind:
- If every respondent has the same weight, $n_{\text{eff}} = n$. Unequal weights always reduce $n_{\text{eff}}$ below the raw count.
- $n_{\text{eff}}$ is computed separately for each reporting cell, so a subgroup with stable weights can retain most of its precision even if the dataset overall has heavy weighting.
At the dataset level, we report the weighting efficiency:
$$\text{efficiency} = \frac{n_{\text{eff}}}{n}$$
An efficiency near 1 means weighting has cost very little precision. Low efficiency, or a long right tail in the weight histogram, is a signal that the targets are straining the sample — often a cue to collapse sparse categories or revisit the quota.
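The Kish formula and the efficiency ratio are short enough to sketch directly. The function names and the example weights are hypothetical.

```python
from typing import Sequence


def kish_effective_n(weights: Sequence[float]) -> float:
    """Kish effective sample size: (sum of weights)^2 / (sum of squared weights)."""
    sum_sq = sum(w * w for w in weights)
    if sum_sq == 0:
        raise ValueError("at least one weight must be non-zero")
    total = sum(weights)
    return total * total / sum_sq


def weighting_efficiency(weights: Sequence[float]) -> float:
    """Effective sample size as a share of the raw respondent count."""
    return kish_effective_n(weights) / len(weights)


equal = kish_effective_n([1.0, 1.0, 1.0, 1.0])    # equal weights: n_eff equals n
unequal = kish_effective_n([1.0, 1.0, 1.0, 3.0])  # 36 / 12
efficiency = weighting_efficiency([1.0, 1.0, 1.0, 3.0])
```

Because the formula is scale-invariant, multiplying every weight by the same constant leaves the effective sample size unchanged.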
Variance approximation
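Standard errors for weighted proportions use a binomial-style approximation with $n_{\text{eff}}$ substituted for the raw base:
$$\text{SE}(\hat{p}_c) = \sqrt{\frac{\hat{p}_c \, (1 - \hat{p}_c)}{n_{\text{eff},c}}}$$
where $n_{\text{eff},c}$ is the effective sample size of reporting cell $c$.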
This is the standard calibration-weighting shortcut: it captures the first-order precision cost of unequal weights without requiring a full Taylor-linearization pass for every estimand. Mean-aggregated outputs use the same pattern after scale-normalizing the estimate to a proportion.
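A minimal sketch of the approximation for a single reporting cell, computing the weighted proportion, the effective base, and the resulting standard error in one pass. The function name and the example data are hypothetical.

```python
import math
from typing import Sequence, Tuple


def weighted_proportion_with_se(y: Sequence[int],
                                w: Sequence[float]) -> Tuple[float, float, float]:
    """Return (p_hat, n_eff, se) for a 0/1 outcome within one reporting cell."""
    total = sum(w)
    p_hat = sum(wi * yi for wi, yi in zip(w, y)) / total
    n_eff = total * total / sum(wi * wi for wi in w)
    # Binomial-style standard error with n_eff in place of the raw base.
    se = math.sqrt(p_hat * (1.0 - p_hat) / n_eff)
    return p_hat, n_eff, se


# Equal weights: the result matches the usual unweighted binomial standard error.
p_equal, n_equal, se_equal = weighted_proportion_with_se([1, 0] * 50, [1.0] * 100)

# Unequal weights: same raw base of 100, but a smaller effective base and a wider SE.
p_uneq, n_uneq, se_uneq = weighted_proportion_with_se([1, 0] * 50, [0.5, 1.5] * 50)
```

The second call shows the precision cost: the raw base is unchanged, but the heterogeneous weights shrink the effective base that enters the standard error.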
Significance testing
All significance tests in MX8 reports consume $n_{\text{eff}}$ rather than raw counts:
- Column t-tests compare each cell to other cells in its column using the reported weighted mean and standard error, with sample size set to $n_{\text{eff}}$.
- Row t-tests are the same, applied across rows.
- Residual t-tests (the default in cross-tabs) use a critical $t$ with degrees of freedom derived from $n_{\text{eff}}$, so the threshold for flagging a cell as significant tightens when the effective base is small.
The practical consequence is that heavily weighted data produces fewer significant cells than raw counts alone would suggest. This is intentional and avoids overclaiming precision from inflated weighted totals.
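To illustrate why fewer cells are flagged, the sketch below compares two cells' proportions with an unpooled two-sample statistic, once with raw bases and once with effective bases. The exact test form used in MX8 reports is not specified on this page, so the statistic, the function names, and the numbers here are assumptions made for illustration only.

```python
import math


def se_proportion(p: float, n: float) -> float:
    """Binomial-style standard error of a proportion for a base of size n."""
    return math.sqrt(p * (1.0 - p) / n)


def two_cell_t(p1: float, n1: float, p2: float, n2: float) -> float:
    """Unpooled statistic: difference in proportions over its combined standard error."""
    combined_se = math.sqrt(se_proportion(p1, n1) ** 2 + se_proportion(p2, n2) ** 2)
    return (p1 - p2) / combined_se


# Two hypothetical cells, each with 100 raw respondents but an effective base of 80.
t_raw = two_cell_t(0.50, 100, 0.40, 100)  # treats weighted data as if unweighted
t_eff = two_cell_t(0.50, 80, 0.40, 80)    # uses the effective bases instead
```

The same observed difference yields a smaller statistic once the effective bases are used, so it is less likely to clear a given significance threshold.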
A small worked example
Suppose a 600-respondent genpop dataset is weighted to national age and gender targets. After weighting, the weights for one reporting cell — "women in the Northeast" — are distributed as follows:
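- 40 respondents with weight $w_a$
- 60 respondents with weight $w_b$
For this cell:
$$n_{\text{eff}} = \frac{(40\,w_a + 60\,w_b)^2}{40\,w_a^2 + 60\,w_b^2} \approx 93$$
So the 100 raw respondents in this cell carry the information of roughly 93 equally-weighted respondents. If the weighted proportion on a binary outcome is $\hat{p}$, the standard error is:
$$\text{SE}(\hat{p}) = \sqrt{\frac{\hat{p}\,(1 - \hat{p})}{93}}$$
Compared to $\sqrt{\hat{p}\,(1 - \hat{p}) / 100}$ if the cell were unweighted, this is a modest widening (about 4%) that reflects the mild weight dispersion.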
Diagnostics to monitor
Per dataset, weighting diagnostics report:
- raw respondent count $n$,
- effective sample size $n_{\text{eff}}$,
- efficiency $n_{\text{eff}} / n$,
- min, median, mean, and max weight,
- a histogram of the weight distribution,
- stage eligibility status and failure reason.
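The numeric part of this report can be sketched with the standard library. The function name, the dictionary keys, and the example weights are hypothetical; the histogram and the stage eligibility status are left out of the sketch.

```python
import statistics
from typing import Dict, Sequence


def weight_diagnostics(weights: Sequence[float]) -> Dict[str, float]:
    """Summarize a weight vector: bases, efficiency, and distribution statistics."""
    n = len(weights)
    total = sum(weights)
    n_eff = total * total / sum(w * w for w in weights)
    return {
        "n": n,
        "n_eff": n_eff,
        "efficiency": n_eff / n,
        "min": min(weights),
        "median": statistics.median(weights),
        "mean": total / n,
        "max": max(weights),
    }


# Hypothetical weights normalized to sum to the respondent count.
report = weight_diagnostics([0.5, 0.5, 1.0, 2.0])
```

A mean of 1.0 confirms the normalization to the respondent count, while the gap between the median and the maximum gives a first hint of a right tail.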
Points to watch for:
- Efficiency well below 1 — weights are doing a lot of work. Inference is valid but precision is reduced; consider whether the targets are achievable from the realized sample.
- A long right tail on the weight histogram — a small number of respondents are carrying a lot of weight. These cells will have the largest effect on point estimates and the smallest effective base.
- A stage reported as ineligible — read the failure reason in the diagnostics. Usually this points to an empty target category in the eligible sample.
Assumptions and limitations
- Inference assumes independent respondents within calibration cells; there is no explicit modeling of clustering, stratification, or finite-population corrections.
- The quality of point estimates depends on the validity of the supplied marginal targets and on overlap between the sample and the targets.
- Extreme weights reduce efficiency and widen standard errors. The eligibility guardrails and the diagnostics are the operational controls for this.
- The variance approximation is a binomial-style shortcut with $n_{\text{eff}}$ substitution, not a full design-based estimator.
Reproducibility
Weighting is deterministic given the input dataset, the marginal targets, the stage configuration, the respondent universes selected for each stage, and the IPF convergence tolerance. Re-running the pipeline on the same inputs produces the same weights and the same reported statistics.