Long Excel Format

1. Overview

The Long format is designed for detailed, question-by-question analysis. It includes all the data stored in the platform and is identical to the data used internally for reporting. Each row represents a single respondent's answer to a single survey question. This format is particularly useful when you want to:

  • Explore how different questions were answered across respondents.
  • Pivot, filter, and group responses in tools like Excel, R, or Python.
  • Work with multi-choice and grid questions without needing to manage many columns.
2. File Structure & Layout

Each row holds one respondent's answer to one option of one question. A respondent therefore has multiple rows: at least one for each question they encountered, and one per option for multi-choice questions.

Example rows:

| respondent_id | status | question | reporting_id | type | response | raw_response | responded | timestamp | weight |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 00044f3b-ec63-2e17-9b5c-970e0efd5a8b | Terminated | How old are you? | Age | NumericQuestion | 35-44 | 36 | 1 | 2025-07-30 16:27:06.025 | 0.638 |
| 00044f3b-ec63-2e17-9b5c-970e0efd5a8b | Terminated | What is your gender? | Gender | MultiChoiceQuestion | Male | Male | 1 | 2025-07-30 16:27:06.068 | 0.638 |
| 00044f3b-ec63-2e17-9b5c-970e0efd5a8b | Terminated | What is your gender? | Gender | MultiChoiceQuestion | Female | Female | 0 | 2025-07-30 16:27:06.068 | 0.638 |

3. Key Columns
  • respondent_id – Unique identifier for each participant.
  • status – Final survey status (e.g., Completed, Terminated).
  • question – Full wording of the question asked.
  • reporting_id – The labeled identifier for the question as set in the dashboard (e.g., Age, Gender).
  • line_number – The line number of the question in the survey script.
  • type – Type of question (NumericQuestion, MultiChoiceQuestion, OpenEnd, etc.).
  • response – The recoded, human-readable response category (e.g., 35-44).
  • raw_response – The raw value stored (e.g., 36).
  • responded – Indicates whether and in what order the respondent selected the option. 0 = not selected, 1 = selected first, 2 = selected second, and so on.
  • timestamp – Time when the answer was submitted.
  • weight – Weighting factor applied to this respondent’s answers for statistical adjustment.
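
As a rough illustration of how these columns might be read into typed records, the sketch below parses the example rows with Python's standard library. It assumes the sheet has been saved as CSV first; the `SAMPLE` text and the `load_long_rows` helper are hypothetical names introduced for this example, not part of the platform.

```python
import csv
import io
from datetime import datetime

# Hypothetical CSV text: the example rows from this article, saved as CSV.
SAMPLE = """respondent_id,status,question,reporting_id,type,response,raw_response,responded,timestamp,weight
00044f3b-ec63-2e17-9b5c-970e0efd5a8b,Terminated,How old are you?,Age,NumericQuestion,35-44,36,1,2025-07-30 16:27:06.025,0.638
00044f3b-ec63-2e17-9b5c-970e0efd5a8b,Terminated,What is your gender?,Gender,MultiChoiceQuestion,Male,Male,1,2025-07-30 16:27:06.068,0.638
00044f3b-ec63-2e17-9b5c-970e0efd5a8b,Terminated,What is your gender?,Gender,MultiChoiceQuestion,Female,Female,0,2025-07-30 16:27:06.068,0.638
"""

def load_long_rows(text):
    """Parse long-format CSV text into dicts with typed key columns."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["responded"] = int(row["responded"])    # 0 = not selected, 1+ = selection order
        row["weight"] = float(row["weight"])
        # Assumes the timestamp layout shown in the example rows.
        row["timestamp"] = datetime.strptime(row["timestamp"], "%Y-%m-%d %H:%M:%S.%f")
        rows.append(row)
    return rows

rows = load_long_rows(SAMPLE)
```

Note that raw_response is left as text here, because its meaning depends on the question type (a number for numeric questions, a label or free text otherwise).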
4. Data Representation
Single-choice questions

Stored as one row with responded=1.

Multi-choice questions

Stored as multiple rows per respondent, one row per option. Chosen options have responded>0, with the number indicating the order in which they were chosen. Unchosen options have responded=0.

Example: Multi-choice question

Question: Which of the following fruits do you like? (Select all that apply)

| respondent_id | question | response | responded |
| --- | --- | --- | --- |
| r1 | Fruits | Apple | 1 |
| r1 | Fruits | Banana | 0 |
| r1 | Fruits | Orange | 2 |

Here, the respondent chose Apple first, Orange second, and did not select Banana.
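
A minimal sketch of recovering the selected options in selection order, using the fruit example above. The `selections_in_order` helper is a hypothetical name for illustration, and the rows are typed by hand rather than read from an export.

```python
from collections import defaultdict

# The three example rows for respondent r1.
rows = [
    {"respondent_id": "r1", "question": "Fruits", "response": "Apple", "responded": 1},
    {"respondent_id": "r1", "question": "Fruits", "response": "Banana", "responded": 0},
    {"respondent_id": "r1", "question": "Fruits", "response": "Orange", "responded": 2},
]

def selections_in_order(rows):
    """Return {(respondent_id, question): [responses in selection order]}."""
    picked = defaultdict(list)
    for row in rows:
        if row["responded"] > 0:  # skip unchosen options (responded=0)
            key = (row["respondent_id"], row["question"])
            picked[key].append((row["responded"], row["response"]))
    # Sort each list by the responded value, then keep only the labels.
    return {key: [label for _, label in sorted(pairs)] for key, pairs in picked.items()}

ordered = selections_in_order(rows)
```

For the rows above, `ordered[("r1", "Fruits")]` is `["Apple", "Orange"]`.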

Numeric questions

Both response (bucketed/cleaned category, e.g. 35-44) and raw_response (e.g. 36) are provided.
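
Because raw_response keeps the original number, custom buckets can be derived without relying on the categories in response. The sketch below is one way to do that; the bucket edges, labels, and the `rebucket` helper are assumptions made up for this example.

```python
from bisect import bisect_right

# Hypothetical bucket edges and labels, chosen only for illustration.
EDGES = [18, 35, 55]
LABELS = ["under 18", "18-34", "35-54", "55+"]

def rebucket(raw_response):
    """Map a numeric raw_response (stored as text or number) to a custom bucket."""
    value = float(raw_response)
    # bisect_right counts how many edges are <= value, which is the bucket index.
    return LABELS[bisect_right(EDGES, value)]

bucket = rebucket("36")
```

With these edges, a raw_response of 36 falls in "35-54", even though the exported response category is 35-44.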

Open-end questions

The full text appears in both response and raw_response.

5. Missing & Special Values
  • Non-responses may appear with responded=0 and empty raw_response.
  • "Prefer not to say" or similar options appear as normal response categories.
  • Terminated respondents may have partial rows depending on where they dropped out.
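
The points above can be sketched as two small checks: one that flags non-response rows and one that lists which questions each respondent reached. The rows and helper names below are hypothetical and only include the columns needed for the illustration.

```python
# Hypothetical rows; only the columns needed here are included.
rows = [
    {"respondent_id": "r1", "status": "Completed", "reporting_id": "Age",
     "raw_response": "36", "responded": 1},
    {"respondent_id": "r1", "status": "Completed", "reporting_id": "Income",
     "raw_response": "", "responded": 0},
    {"respondent_id": "r2", "status": "Terminated", "reporting_id": "Age",
     "raw_response": "17", "responded": 1},
]

def is_non_response(row):
    """True when the row has responded=0 and an empty raw_response."""
    return row["responded"] == 0 and not (row["raw_response"] or "").strip()

def questions_reached(rows):
    """Map each respondent_id to the set of reporting_ids they have rows for."""
    reached = {}
    for row in rows:
        reached.setdefault(row["respondent_id"], set()).add(row["reporting_id"])
    return reached

non_responses = [row for row in rows if is_non_response(row)]
reached = questions_reached(rows)
```

Checking for an empty raw_response matters because unchosen multi-choice options also carry responded=0 but still have a value in raw_response.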
6. Weighting
  • Apply the weight column in analysis to ensure results reflect population targets.
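
One possible way to apply the weight column is a weighted share per response option, sketched below. The rows are invented, `weighted_shares` is a hypothetical helper, and the choice of base (every respondent who has rows for the question) is an assumption that may differ from how the platform computes its reports.

```python
from collections import defaultdict

# Hypothetical rows: two respondents, one question with two options.
rows = [
    {"respondent_id": "r1", "reporting_id": "Gender", "response": "Male", "responded": 1, "weight": 0.5},
    {"respondent_id": "r1", "reporting_id": "Gender", "response": "Female", "responded": 0, "weight": 0.5},
    {"respondent_id": "r2", "reporting_id": "Gender", "response": "Male", "responded": 0, "weight": 1.5},
    {"respondent_id": "r2", "reporting_id": "Gender", "response": "Female", "responded": 1, "weight": 1.5},
]

def weighted_shares(rows, reporting_id):
    """Weighted share of respondents selecting each option of one question."""
    selected = defaultdict(float)
    base = {}  # respondent_id -> weight, counted once per respondent
    for row in rows:
        if row["reporting_id"] != reporting_id:
            continue
        base[row["respondent_id"]] = row["weight"]
        if row["responded"] > 0:
            selected[row["response"]] += row["weight"]
    total = sum(base.values())
    return {response: weight / total for response, weight in selected.items()}

shares = weighted_shares(rows, "Gender")
```

Here the weighted shares are 0.25 for Male and 0.75 for Female, whereas an unweighted count would give 0.5 each.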
7. Best Practices
  • Use pivot tables (Excel) or groupby (Python/Pandas) to aggregate responses.
  • For multi-choice questions, include all rows where responded>0 to capture every selected option. Use the order number if you need to analyze the sequence of selection.
  • When comparing across formats, match reporting_id (long) to the variable codes (wide/SPSS).
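
To illustrate the aggregation and matching advice above, the sketch below pivots long rows into one record per respondent, keyed by reporting_id. The `to_wide` helper, the sample rows, and the ";" separator are all assumptions for this example; the result is not claimed to match the layout of the platform's own wide or SPSS exports.

```python
# Hypothetical rows combining a numeric and a multi-choice question.
rows = [
    {"respondent_id": "r1", "reporting_id": "Age", "response": "35-44", "responded": 1},
    {"respondent_id": "r1", "reporting_id": "Fruits", "response": "Apple", "responded": 1},
    {"respondent_id": "r1", "reporting_id": "Fruits", "response": "Banana", "responded": 0},
    {"respondent_id": "r1", "reporting_id": "Fruits", "response": "Orange", "responded": 2},
]

def to_wide(rows, sep=";"):
    """Pivot long rows to {respondent_id: {reporting_id: joined selected responses}}."""
    picked = {}
    for row in rows:
        if row["responded"] > 0:  # keep selected options only
            per_respondent = picked.setdefault(row["respondent_id"], {})
            cell = per_respondent.setdefault(row["reporting_id"], [])
            cell.append((row["responded"], row["response"]))
    # Join each cell's responses in selection order.
    return {
        respondent: {rid: sep.join(label for _, label in sorted(pairs))
                     for rid, pairs in cells.items()}
        for respondent, cells in picked.items()
    }

wide = to_wide(rows)
```

For these rows, `wide` is `{"r1": {"Age": "35-44", "Fruits": "Apple;Orange"}}`, which can then be compared column by column against the variable codes in another export.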
8. When to Use Long Format
  • For deep exploratory analysis.
  • When handling multi-select or grid questions where wide format becomes cumbersome.
  • When exporting data into R/Python for custom cleaning, text analysis, or advanced visualization.