Voice Branding Quantitative Analysis - Copilot Instructions
Project Overview
Qualtrics survey analysis for brand personality research. Analyzes voice samples (V04-V91) across speaking style traits, character rankings, and demographic segments. Uses Marimo notebooks for interactive analysis and Polars for data processing.
Architecture
Core Components
- QualtricsSurvey (utils.py): Main class combining data loading, filtering, and plotting via QualtricsPlotsMixin
- Marimo notebooks (0X_*.py): Interactive apps run via uv run marimo run <file>.py
- Data exports (data/exports/<date>/): Qualtrics CSVs with _Labels.csv and _Values.csv variants
- QSF files: Qualtrics survey definitions for mapping QIDs to question text
Data Flow
Qualtrics CSV (3-row header) → QualtricsSurvey.load_data() → LazyFrame with QID columns
↓
filter_data() → get_*() methods → plot_*() methods → figures/<export>/<filter>/
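As a rough illustration of the first step in the flow above, the sketch below separates a 3-row header from the response rows using only the standard library. It is not the `load_data()` implementation, which returns a Polars LazyFrame. The function name `split_qualtrics_header`, the sample text, and the assumption that the first header row holds the column names are all hypothetical.

```python
import csv
import io

def split_qualtrics_header(text, header_rows=3):
    """Split CSV text into (columns, extra header rows, response records)."""
    rows = list(csv.reader(io.StringIO(text)))
    if len(rows) < header_rows:
        raise ValueError("file is shorter than the expected header")
    columns = rows[0]                 # assumption: row 1 carries the column names
    extra_header = rows[1:header_rows]  # remaining header rows, kept as metadata
    records = [dict(zip(columns, row)) for row in rows[header_rows:]]
    return columns, extra_header, records

# Made-up sample for illustration only.
sample = (
    "QID1,QID2\n"
    "Question text A,Question text B\n"
    "Import id A,Import id B\n"
    "34,V14\n"
)
columns, extra_header, records = split_qualtrics_header(sample)
print(columns)
print(records)
```

Only the rows after the header are treated as responses, so the extra header rows never end up mixed into the data.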
⚠️ Critical AI Agent Rules
- NEVER modify Marimo notebooks directly - The XX_*.py files are Marimo notebooks and should not be edited by AI agents
- NEVER run Marimo notebooks for debugging - These are interactive apps, not test scripts
- For debugging: Create a standalone temporary Python script (e.g., debug_temp.py) to test functions
- Reading notebooks is OK - You may read notebook files to understand how functions are used. Ask the user which notebook they're working in for context
- No changelog markdown files - Do not create new markdown files to document small changes or describe new usage
Key Patterns
Polars LazyFrames
Always work with pl.LazyFrame until visualization; call .collect() only when needed:
data = S.load_data() # Returns LazyFrame
subset, meta = S.get_voice_scale_1_10(data) # Returns (LazyFrame, Optional[dict])
df = subset.collect() # Materialize for plotting
Column Naming Convention
Survey columns follow patterns that encode voice/trait info:
- SS_Green_Blue__V14__Choice_1 → Speaking Style, Voice 14, Trait 1
- Voice_Scale_1_10__V48 → 1-10 rating for Voice 48
- Top_3_Voices_ranking__V77 → Ranking position for Voice 77
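The naming patterns above can be decoded with a regular expression. The sketch below is one possible approach, assuming every voice column contains `__V<number>` and optionally ends in `__Choice_<number>`. The helper `parse_voice_column` is hypothetical and is not part of utils.py.

```python
import re
from typing import Optional

# Assumed shape: <prefix>__V<voice>[__Choice_<choice>]
_VOICE_COLUMN = re.compile(
    r"^(?P<prefix>.+?)__V(?P<voice>\d+)(?:__Choice_(?P<choice>\d+))?$"
)

def parse_voice_column(name: str) -> Optional[dict]:
    """Return prefix, voice number, and choice number, or None if no match."""
    match = _VOICE_COLUMN.match(name)
    if match is None:
        return None
    choice = match.group("choice")
    return {
        "prefix": match.group("prefix"),
        "voice": int(match.group("voice")),
        "choice": int(choice) if choice is not None else None,
    }

for column in (
    "SS_Green_Blue__V14__Choice_1",
    "Voice_Scale_1_10__V48",
    "Top_3_Voices_ranking__V77",
):
    print(column, parse_voice_column(column))
```

Columns that do not follow the convention return `None`, which makes it easy to skip them when looping over a schema.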
Filter State & Figure Output
QualtricsSurvey stores filter state and auto-generates output paths:
S.filter_data(data, consumer=['Early Professional'])
# Plots save to: figures/<export>/Cons-Early_Professional/<plot_name>.png
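To make the path convention concrete, here is a minimal sketch of how a consumer filter could map to an output folder. It only reproduces the single example above. The function `consumer_figure_path`, the way multiple values are joined, and the export and plot names used in the call are assumptions rather than the behaviour of `QualtricsSurvey`.

```python
from pathlib import PurePosixPath

def consumer_figure_path(export, plot_name, consumer):
    """Build figures/<export>/Cons-<values>/<plot_name>.png for a consumer filter."""
    # Spaces become underscores, as in "Early Professional" -> "Early_Professional".
    # Joining several values with "_" is an assumption.
    slug = "_".join(value.replace(" ", "_") for value in consumer)
    return PurePosixPath("figures") / export / f"Cons-{slug}" / f"{plot_name}.png"

# "my_export" and "voice_scale" are made-up placeholder names.
path = consumer_figure_path("my_export", "voice_scale", ["Early Professional"])
print(path)
```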
Getter Methods Return Tuples
All get_*() methods return (LazyFrame, Optional[metadata]):
df, choices_map = S.get_ss_green_blue(data) # choices_map has trait descriptions
df, _ = S.get_character_ranking(data) # Second element may be None
Development Commands
# Run interactive analysis notebook
uv run marimo run 02_quant_analysis.py --port 8080
# Edit notebook in editor mode
uv run marimo edit 02_quant_analysis.py
# Headless mode for shared access
uv run marimo run 02_quant_analysis.py --headless --port 8080
Important Files
| File | Purpose |
|---|---|
| utils.py | QualtricsSurvey class, data transformations, PPTX utilities |
| plots.py | QualtricsPlotsMixin with all Altair plotting methods |
| theme.py | ColorPalette and jpmc_altair_theme() for consistent styling |
| validation.py | Data quality checks (progress, duration outliers, straight-liners) |
| speaking_styles.py | SPEAKING_STYLES dict mapping colors to trait groups |
Conventions
Altair Charts & Colors
- ALL colors MUST come from theme.py - Use ColorPalette.PRIMARY, ColorPalette.RANK_1, etc.
- If a new color is needed, add it to ColorPalette in theme.py first, then use it
- Never hardcode hex colors directly in plotting code
- Charts auto-save via _save_plot() when fig_save_dir is set
- Filter footnotes added automatically via _add_filter_footnote()
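One way to check the no-hardcoded-hex rule from a standalone debug script is to scan source text for quoted hex literals. This is a rough sketch. The helper `find_hardcoded_hex` is hypothetical, and it only catches 3- or 6-digit hex values written as string literals.

```python
import re

# Matches "#abc" or "#aabbcc" when written as a quoted string literal.
HEX_LITERAL = re.compile(r"""["'](#(?:[0-9a-fA-F]{6}|[0-9a-fA-F]{3}))["']""")

def find_hardcoded_hex(source):
    """Return (line number, hex string) pairs for quoted hex colors in source."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in HEX_LITERAL.finditer(line):
            hits.append((lineno, match.group(1)))
    return hits

# Made-up snippet for illustration only.
snippet = (
    "base = alt.Chart(df)\n"
    'bad = base.mark_bar(color="#FF0000")\n'
    "good = base.mark_bar(color=ColorPalette.PRIMARY)\n"
)
print(find_hardcoded_hex(snippet))
```

The snippet is only scanned as text, so neither Altair nor theme.py needs to be importable for the check to run.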
QSF Parsing
Use _get_qsf_question_by_QID() to extract question config:
cfg = self._get_qsf_question_by_QID('QID27')['Payload']
recode_map = cfg['RecodeValues'] # Maps choice numbers to values
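For orientation, the sketch below shows what such a lookup might look like on an already-parsed QSF dictionary. It assumes questions are stored under `SurveyElements` with `Element` equal to `"SQ"` and the QID in `PrimaryAttribute`. The function `get_question_by_qid` and the miniature QSF are illustrative, not the project's `_get_qsf_question_by_QID()`.

```python
from typing import Any, Dict

def get_question_by_qid(qsf: Dict[str, Any], qid: str) -> Dict[str, Any]:
    """Return the survey-question element whose PrimaryAttribute equals qid."""
    for element in qsf.get("SurveyElements", []):
        if element.get("Element") == "SQ" and element.get("PrimaryAttribute") == qid:
            return element
    raise KeyError(f"{qid} not found in QSF")

# Made-up miniature QSF for illustration only.
sample_qsf = {
    "SurveyElements": [
        {"Element": "BL", "PrimaryAttribute": "Survey Blocks", "Payload": {}},
        {
            "Element": "SQ",
            "PrimaryAttribute": "QID27",
            "Payload": {"RecodeValues": {"1": "10", "2": "20"}},
        },
    ]
}

cfg = get_question_by_qid(sample_qsf, "QID27")["Payload"]
recode_map = cfg["RecodeValues"]  # maps choice numbers to recoded values
print(recode_map)
```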
PPTX Image Replacement
Images matched by perceptual hash (not filename); alt-text encodes figure path:
utils.update_ppt_alt_text(ppt_path, image_source_dir) # Tag images with alt-text
utils.pptx_replace_named_image(ppt, target_tag, new_image) # Replace by alt-text
This process should be run manually by the user ONLY.
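To show why matching by perceptual hash survives renamed or slightly altered images, here is a toy average hash on small grayscale grids. Average hash is just one simple kind of perceptual hash. Which algorithm utils.py actually uses is not stated here, and the function names below are hypothetical.

```python
def average_hash(pixels):
    """Return one bit per pixel: 1 if brighter than the image mean, else 0."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if value > mean else 0 for value in flat)

def hamming_distance(a, b):
    """Count the positions where two equal-length hashes differ."""
    if len(a) != len(b):
        raise ValueError("hashes must have the same length")
    return sum(x != y for x, y in zip(a, b))

# Made-up 2x2 grayscale grids for illustration only.
original = [[10, 200], [220, 30]]
brighter = [[30, 220], [240, 50]]    # same picture, uniformly brightened
different = [[200, 10], [30, 220]]   # light and dark areas swapped

print(hamming_distance(average_hash(original), average_hash(brighter)))
print(hamming_distance(average_hash(original), average_hash(different)))
```

A small distance means the images look alike regardless of filename, which is the property the image replacement step relies on.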