Article
Wide Excel Format
1. Overview
The Wide format presents each respondent as a single row, with each survey question represented as a separate column. This format is particularly useful when you want to:
- Run statistical analysis or modeling in packages like SPSS, R, or Python.
- Use dashboards or BI tools that expect one record per respondent.
- Quickly view all answers for a respondent in one row.
2. File Structure & Layout
Each row corresponds to one respondent. Each column corresponds to ** one question or metadata field**.
Example (first 5 columns):
| V001_respondent_id | V002_Recent_Restaurant_Visit | V003_Age | V004_Gender | V005_Ethnicity |
|---|---|---|---|---|
| ps-0097ac15-868c-6608-25fc-c0fe2cd884a8 | Yes | 35 | Female | White |
| ps-012a5d8b-9dc4-2132-782d-73742be4088f | Yes | 21 | Female | White |
| ps-022b6b43-6304-2dc5-0830-10d98bd7dee3 | Yes | 18 | Male | White |
3. Key Columns
- V001_respondent_id – Unique identifier for each respondent.
- Question columns (V###_...) – Each prefixed with a variable code (V###) followed by a short label. For example:
V002_Recent_Restaurant_Visit V003_AgeV004_Gender
The last three columns are always:
- Weight column (e.g., V171_weight) – The respondent’s statistical weight.
- Timing columns (e.g., V172_start_time, V173_end_time) – Timestamps recording when the respondent started and completed the survey.
4. Data Representation
Single-choice questions
Stored as one column per question with the selected answer recorded.
Multi-choice questions
Each option is represented as a separate column. Values indicate whether the option was selected, and may also include rank order (e.g., -1 = not selected, 1 = selected first, 2 = selected second, etc.).
Numeric questions
Stored directly as numeric values (e.g., annual income).
Open-end questions
Stored as free-text responses in their own columns.
⚠️ Note: Recoded variables and coded open ends are ** not** included in the Wide format. They are only available in the Long format.
5. Missing & Special Values
- -1 often denotes an option that was not selected in multi-choice or rank questions.
- Empty cells may indicate a skipped or non-applicable question.
- "Prefer not to say" appears as a standard response category.
6. Weighting
- Apply the weight column when analyzing results to ensure the dataset reflects the target population.
7. Best Practices
- Use variable codes (V###) for merging with codebooks or comparing across formats.
- Treat
-1values consistently as non-selections in analysis. - For ranked multi-choice questions, filter out
-1and use positive values to analyze the order of selections. - When comparing across formats, match on
reporting_id(long format) toV###codes (wide format). - Look to the Long format for recoded values and coded open ends, since these are not included in the Wide format.
8. When to Use Wide Format
- For statistical modeling and regressions.
- When using survey data in BI dashboards or visualization tools.
- When analysts want one record per respondent with all answers side by side.