Long Excel Format
1. Overview
The Long format is designed for detailed, question-by-question analysis. It includes all the data stored in the platform and is identical to the data used internally for reporting. Each row represents a single respondent's answer to a single survey question (or, for multi-choice questions, to a single option of that question). This format is particularly useful when you want to:
- Explore how different questions were answered across respondents.
- Pivot, filter, and group responses in tools like Excel, R, or Python.
- Work with multi-choice and grid questions without needing to manage many columns.
2. File Structure & Layout
Each row corresponds to one respondent's answer to one option of one question. Respondents therefore have multiple rows: at least one for each question they encountered, and one per option for multi-choice questions.
Example (first 3 rows):
| respondent_id | status | question | reporting_id | type | response | raw_response | responded | timestamp | weight |
|---|---|---|---|---|---|---|---|---|---|
| 00044f3b-ec63-2e17-9b5c-970e0efd5a8b | Terminated | How old are you? | Age | NumericQuestion | 35-44 | 36 | 1 | 2025-07-30 16:27:06.025 | 0.638 |
| 00044f3b-ec63-2e17-9b5c-970e0efd5a8b | Terminated | What is your gender? | Gender | MultiChoiceQuestion | Male | Male | 1 | 2025-07-30 16:27:06.068 | 0.638 |
| 00044f3b-ec63-2e17-9b5c-970e0efd5a8b | Terminated | What is your gender? | Gender | MultiChoiceQuestion | Female | Female | 0 | 2025-07-30 16:27:06.068 | 0.638 |
3. Key Columns
- respondent_id – Unique identifier for each participant.
- status – Final survey status (e.g., Completed, Terminated).
- question – Full wording of the question asked.
- reporting_id – The labeled identifier for the question as set in the dashboard (e.g., Age, Gender).
- line_number – The line number of the question in the survey script.
- type – Type of question (NumericQuestion, MultiChoiceQuestion, OpenEnd, etc.).
- response – The recoded, human-readable response category (e.g., 35-44).
- raw_response – The raw value stored (e.g., 36).
- responded – Indicates whether and in what order the respondent selected the option: 0 = not selected, 1 = selected first, 2 = selected second, and so on.
- timestamp – Time when the answer was submitted.
- weight – Weighting factor applied to this respondent’s answers for statistical adjustment.
4. Data Representation
Single-choice questions
Stored as one row with responded=1.
Multi-choice questions
Stored as one row per respondent per option. The chosen options have responded>0, with the number indicating the order in which the options were chosen. Unchosen options have responded=0.
Example: Multi-choice question
Question: Which of the following fruits do you like? (Select all that apply)
| respondent_id | question | response | responded |
|---|---|---|---|
| r1 | Fruits | Apple | 1 |
| r1 | Fruits | Banana | 0 |
| r1 | Fruits | Orange | 2 |
Here, the respondent chose Apple first, Orange second, and did not select Banana.
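The selection order can be recovered with a filter and a sort. The following is a minimal pandas sketch that reuses the rows from the table above; the helper name selected_in_order is hypothetical and not part of any export tooling.

```python
import pandas as pd

# Rows copied from the multi-choice example table above.
long_df = pd.DataFrame({
    "respondent_id": ["r1", "r1", "r1"],
    "question": ["Fruits", "Fruits", "Fruits"],
    "response": ["Apple", "Banana", "Orange"],
    "responded": [1, 0, 2],
})


def selected_in_order(df: pd.DataFrame) -> pd.Series:
    """Return, per respondent, the chosen options in the order they were selected."""
    # Keep only chosen options (responded > 0), then sort by selection order.
    chosen = df[df["responded"] > 0].sort_values(["respondent_id", "responded"])
    return chosen.groupby("respondent_id")["response"].agg(list)


picks = selected_in_order(long_df)
print(picks["r1"])  # ['Apple', 'Orange']
```

If the file contains several multi-choice questions, add reporting_id (or question) to both the sort and the groupby keys so that options from different questions are not mixed.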
Numeric questions
Both response (bucketed/cleaned category, e.g. 35-44) and raw_response (e.g. 36) are provided.
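As a rough illustration of using the two columns together, the sketch below computes a mean from raw_response and counts from the bucketed response. It assumes raw_response arrives as text after loading, which may or may not be true for your file; the second row (41) is invented for the example.

```python
import pandas as pd

# Illustrative rows: the first mirrors the Age example, the second is made up.
rows = pd.DataFrame({
    "reporting_id": ["Age", "Age"],
    "type": ["NumericQuestion", "NumericQuestion"],
    "response": ["35-44", "35-44"],
    "raw_response": ["36", "41"],
})

numeric = rows[rows["type"] == "NumericQuestion"].copy()

# Convert the raw value to a number; anything unparseable becomes NaN.
numeric["raw_value"] = pd.to_numeric(numeric["raw_response"], errors="coerce")

mean_age = numeric["raw_value"].mean()            # exact values from raw_response
bucket_counts = numeric["response"].value_counts()  # categories from response

print(mean_age)
print(bucket_counts)
```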
Open-end questions
The full text appears in response and raw_response.
5. Missing & Special Values
- Non-responses may appear with responded=0 and an empty raw_response.
- "Prefer not to say" or similar options appear as normal response categories.
- Terminated respondents may have partial rows depending on where they dropped out.
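One way to inspect these cases is sketched below with pandas. The data is invented for illustration, and the check assumes an empty raw_response loads as a missing value or a blank string; verify how your own file represents it.

```python
import pandas as pd

# Invented rows: r2 terminated early and left Age unanswered.
long_df = pd.DataFrame({
    "respondent_id": ["r1", "r1", "r2"],
    "status": ["Completed", "Completed", "Terminated"],
    "reporting_id": ["Age", "Gender", "Age"],
    "raw_response": ["36", "Male", None],
    "responded": [1, 1, 0],
})

# Treat missing values and whitespace-only strings as "empty".
is_blank = long_df["raw_response"].fillna("").astype(str).str.strip() == ""

# Rows that look like non-responses: not selected and no raw value.
non_response = long_df[(long_df["responded"] == 0) & is_blank]

# Number of distinct questions each respondent has rows for, by final status.
questions_reached = (
    long_df.groupby(["respondent_id", "status"])["reporting_id"].nunique()
)

print(non_response)
print(questions_reached)
```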
6. Weighting
- Apply the weight column in analysis to ensure results reflect population targets.
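A weighted share for each option can be sketched as the sum of weights of respondents who selected it, divided by the sum of weights of everyone who has a row for it. This is one common approach, not necessarily the calculation the platform uses; the data and the function name weighted_share are illustrative, and the denominator assumes every respondent who saw the question has a row for every option.

```python
import pandas as pd

# Invented data: two respondents with different weights.
long_df = pd.DataFrame({
    "respondent_id": ["a", "a", "b", "b"],
    "reporting_id": ["Gender"] * 4,
    "response": ["Male", "Female", "Male", "Female"],
    "responded": [1, 0, 0, 1],
    "weight": [0.5, 0.5, 1.5, 1.5],
})


def weighted_share(df: pd.DataFrame) -> pd.Series:
    """Weighted proportion of respondents selecting each option of each question."""
    selected = (df["responded"] > 0).astype(float)
    weighted = df.assign(w_selected=selected * df["weight"])
    totals = weighted.groupby(["reporting_id", "response"])[
        ["w_selected", "weight"]
    ].sum()
    return totals["w_selected"] / totals["weight"]


shares = weighted_share(long_df)
print(shares)
```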
7. Best Practices
- Use pivot tables (Excel) or groupby (Python/Pandas) to aggregate responses.
- For multi-choice questions, include all rows where responded>0 to capture all selected options. Use the order number if you need to analyze the sequence of selection.
- When comparing across formats, match reporting_id (long) to variable codes (wide/SPSS).
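The practices above can be sketched in pandas as follows. The data is invented, and the wide column names built here (reporting_id plus option label) are a hypothetical naming scheme; the variable codes in an actual wide or SPSS export may differ.

```python
import pandas as pd

# Invented data: two respondents answering one multi-choice question.
long_df = pd.DataFrame({
    "respondent_id": ["r1", "r1", "r1", "r2", "r2", "r2"],
    "reporting_id": ["Fruits"] * 6,
    "response": ["Apple", "Banana", "Orange"] * 2,
    "responded": [1, 0, 2, 0, 1, 0],
})

# Aggregate: number of respondents who selected each option.
counts = (
    long_df[long_df["responded"] > 0]
    .groupby(["reporting_id", "response"])["respondent_id"]
    .nunique()
)

# Reshape to one row per respondent and one column per question/option pair,
# keeping the selection order as the cell value (0 = not selected).
wide = long_df.pivot_table(
    index="respondent_id",
    columns=["reporting_id", "response"],
    values="responded",
    aggfunc="max",
    fill_value=0,
)
# Flatten the two-level column index into single names (hypothetical scheme).
wide.columns = [f"{rid}_{opt}" for rid, opt in wide.columns]

print(counts)
print(wide)
```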
8. When to Use Long Format
- For deep exploratory analysis.
- When handling multi-select or grid questions where wide format becomes cumbersome.
- When exporting data into R/Python for custom cleaning, text analysis, or advanced visualization.