Documentation

Utility and simulated share methodology

This page is the technical companion to the discrete-choice reporting outputs in MX8 Labs. It documents how utility scores are estimated, how simulated shares are computed, and how uncertainty is summarized in reports.

For user-facing setup of the question types this estimator powers, see Choice-Based Conjoint and Running MaxDiff. The weighted aggregation described below shares its effective-sample-size machinery with the Weighting methodology used elsewhere in reporting.

Scope

MX8 Labs computes respondent-level utilities for MaxDiff and choice-based conjoint (CBC) questions using a Hierarchical Bayes (HB) multinomial logit model. Report outputs are then derived from posterior draws:

  • Utility scores (part-worth/item utility)
  • Scaled scores
  • Ranks
  • Simulated share of preference

Point estimates and uncertainty are produced by combining:

  • posterior variation across utility draws, and
  • weighted sampling variation using effective sample size.

Inputs and notation

Let:

  • respondents be indexed by i = 1, \dots, N,
  • tasks by t = 1, \dots, T_i per respondent,
  • alternatives in a task by a,
  • utility components (MaxDiff items or conjoint attribute-levels) by k.

For each respondent i, the model estimates a vector of part utilities:

\boldsymbol{\beta}_i = (\beta_{i1}, \dots, \beta_{iK})

The utility of each alternative is the additive sum of the utilities of its included components.

Design parsing

The estimator infers question design from reporting rows:

  • Conjoint/CBC: at least one component topic column exists.
  • MaxDiff: no component topic columns; responses are treated as item components.

For MaxDiff, if both best and worst selections are present, the best task is modeled with positive sign and the worst task is modeled on the remaining alternatives with negative sign.
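The best/worst expansion above can be sketched as follows; the function name and the task dictionary layout are illustrative, not MX8 internals:

```python
def maxdiff_tasks(shown, best, worst):
    """Expand one MaxDiff screen into two signed choice tasks.

    shown : item ids displayed on the screen
    best  : item id selected as best
    worst : item id selected as worst
    """
    tasks = []
    # Best task: choose `best` among all shown items, with sign s_t = +1.
    tasks.append({"alts": list(shown), "choice": best, "sign": +1})
    # Worst task: choose `worst` among the remaining alternatives
    # (best removed), modeled with sign s_t = -1.
    remaining = [a for a in shown if a != best]
    tasks.append({"alts": remaining, "choice": worst, "sign": -1})
    return tasks

tasks = maxdiff_tasks(shown=[1, 2, 3, 4], best=2, worst=4)
```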

Hierarchical Bayes estimator

Utilities are estimated with a respondent-level random-coefficients logit model:

\boldsymbol{\beta}_i = \boldsymbol{\mu} + \mathbf{z}_i \odot \boldsymbol{\sigma}

with priors:

  • \mu_k \sim \mathcal{N}(0,1)
  • \sigma_k \sim \text{HalfNormal}(1)
  • z_{ik} \sim \mathcal{N}(0,1)

For each task, utility for alternative a is:

U_{ita} = s_t \sum_{k=1}^{K} x_{itak}\,\beta_{ik}

where:

  • x_{itak} \in \{0,1\} indicates whether component k appears in alternative a,
  • s_t \in \{+1,-1\} is the task sign (used for MaxDiff worst handling).

Choice probability is softmax:

P(y_{it}=a) = \frac{\exp(U_{ita})}{\sum_{a'} \exp(U_{ita'})}
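The task utility and softmax choice probability can be computed directly from the design matrix; this is a minimal sketch, not the platform's estimator code:

```python
import numpy as np

def choice_probs(beta_i, X_t, s_t):
    """Per-task choice probabilities for one respondent.

    beta_i : (K,) part-worth utilities for respondent i
    X_t    : (A, K) 0/1 design matrix, one row per alternative
    s_t    : +1 (best / conjoint choice) or -1 (MaxDiff worst)
    """
    U = s_t * (X_t @ beta_i)      # U_{ita} = s_t * sum_k x_{itak} * beta_{ik}
    U = U - U.max()               # stabilize the exponentials
    expU = np.exp(U)
    return expU / expU.sum()      # softmax over alternatives in the task

beta = np.array([1.0, -0.5, 0.2])
X = np.eye(3)                     # three single-component alternatives
p = choice_probs(beta, X, s_t=+1)
```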

Posterior sampling uses a No-U-Turn Sampler (NUTS) under platform-managed defaults for chains, warm-up, and retained draws. A deterministic subset of posterior draws is cached for downstream reporting.
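The deterministic draw caching can be illustrated with a seeded selection. The actual platform-managed selection rule is not documented here, so this is only an assumption of how a reproducible subset could be chosen:

```python
import numpy as np

def select_cached_draws(draws, n_keep, seed=0):
    """Illustrative (hypothetical) draw thinning: pick a deterministic
    subset of retained posterior draws to cache for reporting.
    A fixed seed makes the subset reproducible across runs.

    draws : (D, N, K) posterior utility draws
    """
    rng = np.random.default_rng(seed)
    idx = np.sort(rng.choice(draws.shape[0], size=n_keep, replace=False))
    return idx, draws[idx]

draws = np.random.default_rng(1).normal(size=(400, 5, 3))
idx, cached = select_cached_draws(draws, n_keep=100)
```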

Cached utility outputs

For each respondent-component pair, MX8 persists:

  • Posterior mean utility
  • Posterior draw utilities (draw_id indexed)

These cached draws drive all downstream utility/share reporting.

Utility score reporting

For utility mode, each row value is the cached posterior draw utility directly:

v_{idk}^{\text{utility}} = \beta_{idk}

where d indexes retained posterior draws.

Conjoint rows are labeled as attribute: level; MaxDiff rows use item names.

Simulated share reporting

Simulated share is computed within each respondent and draw by exponentiating utilities and normalizing:

s_{idr} = \frac{\exp(v_{idr})}{\sum_{r'} \exp(v_{idr'})}

where r indexes reported options (items or attribute-level rows in derived mode, or configured scenarios in explicit simulation mode).
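The per-respondent, per-draw normalization above amounts to a softmax along the row axis of the cached draw array; a minimal sketch, assuming a (respondent, draw, row) layout:

```python
import numpy as np

def simulated_shares(v):
    """Simulated share of preference: softmax across reported rows,
    computed separately for every respondent and posterior draw.

    v : (N, D, R) cached utility draws (respondent, draw, row)
    """
    v = v - v.max(axis=-1, keepdims=True)       # numerical stability
    e = np.exp(v)
    return e / e.sum(axis=-1, keepdims=True)    # shares sum to 1 per (i, d)

s = simulated_shares(np.zeros((2, 3, 4)))       # equal utilities -> equal shares
```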

Two simulation paths are supported:

  1. Derived share: share over the natural reported rows for the question.
  2. Scenario simulation: share over user-defined scenarios.

For scenario simulation:

  • MaxDiff scenario utility is the utility of the specified item.
  • Conjoint scenario utility is the sum of the selected attribute-level utilities in the profile.

Shares are then softmax-normalized across scenarios for that respondent and draw.
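The two scenario-utility rules can be sketched as follows; the function names and the component-key layout are hypothetical, chosen only to illustrate the arithmetic:

```python
import numpy as np

def scenario_utilities(beta_i, scenarios, mode):
    """Scenario utilities for one respondent/draw.

    beta_i    : dict mapping component key -> utility
    scenarios : for MaxDiff, a list of item keys; for conjoint, a list
                of profiles, each a list of attribute-level keys
    """
    if mode == "maxdiff":
        # Scenario utility is the utility of the specified item.
        return np.array([beta_i[item] for item in scenarios])
    # Conjoint: sum the selected attribute-level utilities in the profile.
    return np.array([sum(beta_i[k] for k in profile) for profile in scenarios])

def scenario_shares(u):
    e = np.exp(u - u.max())
    return e / e.sum()            # softmax across scenarios

beta = {"brand:A": 0.8, "brand:B": -0.2, "price:low": 0.5, "price:high": -0.5}
u = scenario_utilities(beta, [["brand:A", "price:low"],
                              ["brand:B", "price:high"]], mode="conjoint")
shares = scenario_shares(u)
```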

Weighted aggregation and uncertainty

For any reported row and tab cell, MX8 aggregates draw-level values using respondent reporting weights.

Within each draw d:

\hat{m}_d = \frac{\sum_i w_i v_{id}}{\sum_i w_i}

The final estimate is the mean across draws:

\hat{m} = \frac{1}{D}\sum_{d=1}^{D}\hat{m}_d

Total variance is decomposed as:

\widehat{\mathrm{Var}}(\hat{m}) = \underbrace{\mathrm{Var}_d(\hat{m}_d)}_{\text{posterior variance}} + \underbrace{\frac{1}{D}\sum_{d=1}^{D}\widehat{\mathrm{Var}}_{\text{sampling},d}}_{\text{sampling variance}}

Sampling variance per draw uses weighted variance with Kish effective sample size:

n_{\mathrm{eff}} = \frac{\left(\sum_i w_i\right)^2}{\sum_i w_i^2}, \quad \widehat{\mathrm{Var}}_{\text{sampling},d} = \frac{\sum_i w_i\,(v_{id}-\hat{m}_d)^2 \,\big/\, \sum_i w_i}{n_{\mathrm{eff}}}

Reported standard error is:

\widehat{\mathrm{SE}} = \sqrt{\widehat{\mathrm{Var}}(\hat{m})}
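The full aggregation pipeline (per-draw weighted means, mean across draws, and the two-part variance with Kish effective sample size) fits in a few lines; this is a sketch of the formulas above, not the production implementation:

```python
import numpy as np

def weighted_estimate(v, w):
    """Point estimate and standard error for one reported cell.

    v : (D, N) draw-level values (draw, respondent)
    w : (N,) respondent reporting weights
    """
    m_d = (v * w).sum(axis=1) / w.sum()            # per-draw weighted mean
    m_hat = m_d.mean()                             # mean across draws
    n_eff = w.sum() ** 2 / (w ** 2).sum()          # Kish effective sample size
    # Weighted variance of values within each draw, divided by n_eff.
    var_sampling = ((w * (v - m_d[:, None]) ** 2).sum(axis=1) / w.sum()) / n_eff
    total_var = m_d.var() + var_sampling.mean()    # posterior + sampling parts
    return m_hat, np.sqrt(total_var)

v = np.array([[1.0, 2.0, 3.0],
              [1.0, 2.0, 3.0]])                    # two identical draws
w = np.array([1.0, 1.0, 2.0])
m_hat, se = weighted_estimate(v, w)
```

With identical draws the posterior-variance term is zero, so the standard error comes entirely from the weighted sampling variance.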

Related output modes

The same posterior draws also support:

  • Scaled scores: per respondent/draw min-max scaling to 0-100.
  • Ranks: dense descending rank per respondent/draw.

These are transformations of the cached utility draws, applied before the same weighted draw-aggregation pipeline described above.
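Both transformations operate within each respondent/draw slice; a minimal sketch, assuming the same (respondent, draw, component) layout used for shares:

```python
import numpy as np

def scaled_scores(v):
    """Min-max scale utilities to 0-100 within each respondent/draw.

    v : (N, D, K) cached utility draws
    """
    lo = v.min(axis=-1, keepdims=True)
    hi = v.max(axis=-1, keepdims=True)
    return 100 * (v - lo) / np.where(hi > lo, hi - lo, 1)

def dense_ranks_desc(v):
    """Dense descending rank per respondent/draw: the highest utility
    gets rank 1, ties share a rank, and ranks have no gaps."""
    out = np.empty(v.shape, dtype=int)
    flat = v.reshape(-1, v.shape[-1])
    res = out.reshape(-1, v.shape[-1])
    for row, dst in zip(flat, res):
        uniq = np.unique(row)[::-1]               # descending unique values
        dst[:] = np.searchsorted(-uniq, -row) + 1 # position among uniques
    return out

v = np.array([[[3.0, 1.0, 3.0, 2.0]]])            # one respondent, one draw
```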

Guardrails and failure modes

  1. Utility estimation requires valid choice-task observations with timestamps and at least one selected alternative pattern.
  2. If utility sidecars are missing, report requests return not-ready status until cache generation completes.
  3. If no valid observations exist for a question, utility generation fails terminally for that question and reporting returns a user-facing validation error.

Assumptions and limitations

  1. Current estimator is HB multinomial logit (hierarchical_bayes_discrete_choice_v1).
  2. The implementation uses additive utility within alternatives and standard softmax choice probabilities.
  3. Report uncertainty is an approximation that combines posterior and effective-sample-size weighted sampling variance; it is not a full complex-survey design variance estimator.
  4. Simulation outputs are preference shares under the model, not market shares.

Reproducibility

For fixed data, estimator settings, and seed, utility estimation and retained draw selection are deterministic, and downstream share calculations are deterministic transformations of those cached draws.

Exporting respondent-level outputs

The cached posterior outputs described above can be exported for offline analysis: