Documentation

Wide Excel Format

Article

Wide Excel Format

1. Overview

The Wide format presents each respondent as a single row, with each survey question represented as a separate column. This format is particularly useful when you want to:

  • Run statistical analysis or modeling in packages like SPSS, R, or Python.
  • Use dashboards or BI tools that expect one record per respondent.
  • Quickly view all answers for a respondent in one row.
2. File Structure & Layout

Each row corresponds to one respondent. Each column corresponds to ** one question or metadata field**.

Example (first 5 columns):
V001_respondent_idV002_Recent_Restaurant_VisitV003_AgeV004_GenderV005_Ethnicity
ps-0097ac15-868c-6608-25fc-c0fe2cd884a8Yes35FemaleWhite
ps-012a5d8b-9dc4-2132-782d-73742be4088fYes21FemaleWhite
ps-022b6b43-6304-2dc5-0830-10d98bd7dee3Yes18MaleWhite
3. Key Columns
  • V001_respondent_id – Unique identifier for each respondent.
  • Question columns (V###_...) – Each prefixed with a variable code (V###) followed by a short label. For example:V002_Recent_Restaurant_Visit
  • V003_Age
  • V004_Gender

The last three columns are always:

  • Weight column (e.g., V171_weight) – The respondent’s statistical weight.
  • Timing columns (e.g., V172_start_time, V173_end_time) – Timestamps recording when the respondent started and completed the survey.
4. Data Representation
Single-choice questions

Stored as one column per question with the selected answer recorded.

Multi-choice questions

Each option is represented as a separate column. Values indicate whether the option was selected, and may also include rank order (e.g., -1 = not selected, 1 = selected first, 2 = selected second, etc.).

Numeric questions

Stored directly as numeric values (e.g., annual income).

Open-end questions

Stored as free-text responses in their own columns.

⚠️ Note: Recoded variables and coded open ends are ** not** included in the Wide format. They are only available in the Long format.

5. Missing & Special Values
  • -1 often denotes an option that was not selected in multi-choice or rank questions.
  • Empty cells may indicate a skipped or non-applicable question.
  • "Prefer not to say" appears as a standard response category.
6. Weighting
  • Apply the weight column when analyzing results to ensure the dataset reflects the target population.
7. Best Practices
  • Use variable codes (V###) for merging with codebooks or comparing across formats.
  • Treat -1 values consistently as non-selections in analysis.
  • For ranked multi-choice questions, filter out -1 and use positive values to analyze the order of selections.
  • When comparing across formats, match on reporting_id (long format) to V### codes (wide format).
  • Look to the Long format for recoded values and coded open ends, since these are not included in the Wide format.
8. When to Use Wide Format
  • For statistical modeling and regressions.
  • When using survey data in BI dashboards or visualization tools.
  • When analysts want one record per respondent with all answers side by side.