add copilot instructions and rename classes

2026-02-02 17:21:57 +01:00
parent 02a0214539
commit 6ba30ff041
12 changed files with 133 additions and 28 deletions

.github/copilot-instructions.md (new file, 105 lines)

@@ -0,0 +1,105 @@
# Voice Branding Quantitative Analysis - Copilot Instructions
## Project Overview
Qualtrics survey analysis for brand personality research. Analyzes voice samples (V04-V91) across speaking style traits, character rankings, and demographic segments. Uses **Marimo notebooks** for interactive analysis and **Polars** for data processing.
## Architecture
### Core Components
- **`QualtricsSurvey`** (`utils.py`): Main class combining data loading, filtering, and plotting via `QualtricsPlotsMixin`
- **Marimo notebooks** (`0X_*.py`): Interactive apps run via `uv run marimo run <file>.py`
- **Data exports** (`data/exports/<date>/`): Qualtrics CSVs with `_Labels.csv` and `_Values.csv` variants
- **QSF files**: Qualtrics survey definitions for mapping QIDs to question text
### Data Flow
```
Qualtrics CSV (3-row header) → QualtricsSurvey.load_data() → LazyFrame with QID columns
filter_data() → get_*() methods → plot_*() methods → figures/<export>/<filter>/
```
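As a rough illustration of the 3-row header in the first step, the sketch below separates the header rows from the response rows using only the standard library. It assumes the usual Qualtrics export layout (column names, then question text, then import IDs); the helper name `split_qualtrics_header` and the sample contents are hypothetical, and the real `load_data()` in `utils.py` may work differently and returns a Polars LazyFrame.

```python
import csv
import io


def split_qualtrics_header(text: str):
    """Split CSV text into (names, questions, import_ids, data_rows).

    Assumes the first three rows are header rows; this is a sketch,
    not the project's actual loader.
    """
    rows = list(csv.reader(io.StringIO(text)))
    if len(rows) < 3:
        raise ValueError("expected at least 3 header rows")
    names, questions, import_ids = rows[0], rows[1], rows[2]
    return names, questions, import_ids, rows[3:]


# Made-up sample export with two responses
sample = (
    "ResponseId,Voice_Scale_1_10__V48\n"
    "Response ID,How would you rate Voice 48?\n"
    '"{""ImportId"":""_recordId""}","{""ImportId"":""QID1_1""}"\n'
    "R_001,7\n"
    "R_002,9\n"
)
names, questions, import_ids, data_rows = split_qualtrics_header(sample)
```

Only the first row is needed for column names; the other two header rows carry question text and IDs that can be cross-referenced with the QSF file.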
## ⚠️ Critical AI Agent Rules
1. **NEVER modify Marimo notebooks directly** - The `XX_*.py` files are Marimo notebooks and should not be edited by AI agents
2. **NEVER run Marimo notebooks for debugging** - These are interactive apps, not test scripts
3. **For debugging**: Create a standalone temporary Python script (e.g., `debug_temp.py`) to test functions
4. **Reading notebooks is OK** - You may read notebook files to understand how functions are used. Ask the user which notebook they're working in for context
5. **No changelog markdown files** - Do not create new markdown files to document small changes or describe new usage
## Key Patterns
### Polars LazyFrames
Always work with `pl.LazyFrame` until visualization; call `.collect()` only when needed:
```python
data = S.load_data() # Returns LazyFrame
subset, meta = S.get_voice_scale_1_10(data) # Returns (LazyFrame, Optional[dict])
df = subset.collect() # Materialize for plotting
```
### Column Naming Convention
Survey columns follow patterns that encode voice/trait info:
- `SS_Green_Blue__V14__Choice_1` → Speaking Style, Voice 14, Trait 1
- `Voice_Scale_1_10__V48` → 1-10 rating for Voice 48
- `Top_3_Voices_ranking__V77` → Ranking position for Voice 77
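A minimal sketch of how these names could be decoded with a regular expression. The pattern and the `parse_column` helper are assumptions inferred from the three examples above, not code taken from `utils.py`.

```python
import re

# Assumed pattern: <prefix>__V<voice number>, optionally followed by __Choice_<trait number>
COLUMN_RE = re.compile(
    r"^(?P<prefix>.+?)__V(?P<voice>\d+)(?:__Choice_(?P<trait>\d+))?$"
)


def parse_column(name: str):
    """Return prefix, voice number and optional trait number, or None if no match."""
    match = COLUMN_RE.match(name)
    if match is None:
        return None
    trait = match.group("trait")
    return {
        "prefix": match.group("prefix"),
        "voice": int(match.group("voice")),
        "trait": int(trait) if trait is not None else None,
    }


speaking_style = parse_column("SS_Green_Blue__V14__Choice_1")
voice_scale = parse_column("Voice_Scale_1_10__V48")
ranking = parse_column("Top_3_Voices_ranking__V77")
```

Columns that carry no voice suffix (for example a response ID column) return `None`, so they can be skipped when reshaping.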
### Filter State & Figure Output
`QualtricsSurvey` stores filter state and auto-generates output paths:
```python
S.filter_data(data, consumer=['Early Professional'])
# Plots save to: figures/<export>/Cons-Early_Professional/<plot_name>.png
```
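The path derivation might look roughly like the sketch below. Only the `Cons-Early_Professional` form comes from the example above; the helper names, the `+` separator for multiple values, and the `All` fallback for an unfiltered dataset are hypothetical.

```python
from pathlib import Path

# Hypothetical abbreviation table; only "Cons" for `consumer` appears in the example above.
FILTER_PREFIXES = {"consumer": "Cons"}


def filter_slug(**filters) -> str:
    """Build a directory name from active filters (sketch, not the real implementation)."""
    parts = []
    for key, values in filters.items():
        if not values:
            continue  # an empty filter does not contribute to the path
        prefix = FILTER_PREFIXES.get(key, key)
        cleaned = "+".join(value.replace(" ", "_") for value in values)
        parts.append(f"{prefix}-{cleaned}")
    return "_".join(parts) if parts else "All"


def figure_path(export: str, plot_name: str, **filters) -> Path:
    """Compose figures/<export>/<filter>/<plot_name>.png."""
    return Path("figures") / export / filter_slug(**filters) / f"{plot_name}.png"


path = figure_path("1-23-26", "voice_scale", consumer=["Early Professional"])
```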
### Getter Methods Return Tuples
All `get_*()` methods return `(LazyFrame, Optional[metadata])`:
```python
df, choices_map = S.get_ss_green_blue(data) # choices_map has trait descriptions
df, _ = S.get_character_ranking(data) # Second element may be None
```
## Development Commands
```bash
# Run interactive analysis notebook
uv run marimo run 02_quant_analysis.py --port 8080
# Edit notebook in editor mode
uv run marimo edit 02_quant_analysis.py
# Headless mode for shared access
uv run marimo run 02_quant_analysis.py --headless --port 8080
```
## Important Files
| File | Purpose |
|------|---------|
| `utils.py` | `QualtricsSurvey` class, data transformations, PPTX utilities |
| `plots.py` | `QualtricsPlotsMixin` with all Altair plotting methods |
| `theme.py` | `ColorPalette` and `jpmc_altair_theme()` for consistent styling |
| `validation.py` | Data quality checks (progress, duration outliers, straight-liners) |
| `speaking_styles.py` | `SPEAKING_STYLES` dict mapping colors to trait groups |
## Conventions
### Altair Charts & Colors
- **ALL colors MUST come from `theme.py`** - Use `ColorPalette.PRIMARY`, `ColorPalette.RANK_1`, etc.
- If a new color is needed, add it to `ColorPalette` in `theme.py` first, then use it
- Never hardcode hex colors directly in plotting code
- Charts auto-save via `_save_plot()` when `fig_save_dir` is set
- Filter footnotes added automatically via `_add_filter_footnote()`
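One way to check the no-hardcoded-colors rule is a small scan for hex literals in plotting code. This is a hypothetical helper, not something that exists in the repository, and it will also flag hex-looking text inside comments or strings.

```python
import re

# Matches 6-digit or 3-digit hex color literals such as #1f77b4 or #fff
HEX_COLOR_RE = re.compile(r"#(?:[0-9a-fA-F]{6}|[0-9a-fA-F]{3})\b")


def find_hardcoded_hex(source: str) -> list:
    """Return every hex color literal found in a piece of source code."""
    return HEX_COLOR_RE.findall(source)


bad = "chart = base.mark_bar(color='#1f77b4')"
good = "chart = base.mark_bar(color=ColorPalette.PRIMARY)"

bad_hits = find_hardcoded_hex(bad)
good_hits = find_hardcoded_hex(good)
```

Running such a check against `plots.py` (but not `theme.py`, where the literals are supposed to live) would surface any color that bypasses `ColorPalette`.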
### QSF Parsing
Use `_get_qsf_question_by_QID()` to extract question config:
```python
cfg = self._get_qsf_question_by_QID('QID27')['Payload']
recode_map = cfg['RecodeValues'] # Maps choice numbers to values
```
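For orientation, a lookup of this kind could be sketched as follows. It assumes the QSF file is JSON with a `SurveyElements` list whose question entries have `Element == "SQ"` and the QID in `PrimaryAttribute`; the function name and the sample data are made up, and the real `_get_qsf_question_by_QID()` may differ.

```python
# Tiny stand-in for a parsed .qsf file; the contents are invented for illustration.
qsf = {
    "SurveyElements": [
        {"Element": "FL", "PrimaryAttribute": "Survey Flow", "Payload": {}},
        {
            "Element": "SQ",
            "PrimaryAttribute": "QID27",
            "Payload": {"QuestionID": "QID27", "RecodeValues": {"1": "10", "2": "20"}},
        },
    ]
}


def get_question_by_qid(qsf_data: dict, qid: str) -> dict:
    """Return the survey-question element whose PrimaryAttribute equals qid."""
    for element in qsf_data.get("SurveyElements", []):
        if element.get("Element") == "SQ" and element.get("PrimaryAttribute") == qid:
            return element
    raise KeyError(f"question {qid} not found in QSF")


cfg = get_question_by_qid(qsf, "QID27")["Payload"]
recode_map = cfg["RecodeValues"]  # choice number -> recoded value
```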
### PPTX Image Replacement
Images matched by perceptual hash (not filename); alt-text encodes figure path:
```python
utils.update_ppt_alt_text(ppt_path, image_source_dir) # Tag images with alt-text
utils.pptx_replace_named_image(ppt, target_tag, new_image) # Replace by alt-text
```
This process should be run manually, by the user ONLY.

@@ -27,7 +27,7 @@ def _(Path):
 @app.cell
 def _(qsf_file, results_file, utils):
-    survey = utils.JPMCSurvey(results_file, qsf_file)
+    survey = utils.QualtricsSurvey(results_file, qsf_file)
     data_all = survey.load_data()
     return (survey,)

@@ -11,12 +11,12 @@ def _():
     from pathlib import Path
     from validation import check_progress, duration_validation, check_straight_liners
-    from utils import JPMCSurvey, combine_exclusive_columns, calculate_weighted_ranking_scores
+    from utils import QualtricsSurvey, combine_exclusive_columns, calculate_weighted_ranking_scores
     import utils
     from speaking_styles import SPEAKING_STYLES
     return (
-        JPMCSurvey,
+        QualtricsSurvey,
         Path,
         SPEAKING_STYLES,
         calculate_weighted_ranking_scores,
@@ -49,8 +49,8 @@ def _(Path, file_browser, mo):
 @app.cell
-def _(JPMCSurvey, QSF_FILE, RESULTS_FILE, mo):
-    S = JPMCSurvey(RESULTS_FILE, QSF_FILE)
+def _(QualtricsSurvey, QSF_FILE, RESULTS_FILE, mo):
+    S = QualtricsSurvey(RESULTS_FILE, QSF_FILE)
     try:
         data_all = S.load_data()
     except NotImplementedError as e:

@@ -9,7 +9,7 @@ with app.setup:
     from pathlib import Path
     from validation import check_progress, duration_validation, check_straight_liners
-    from utils import JPMCSurvey, combine_exclusive_columns, calculate_weighted_ranking_scores
+    from utils import QualtricsSurvey, combine_exclusive_columns, calculate_weighted_ranking_scores
     import utils
     from speaking_styles import SPEAKING_STYLES
@@ -35,7 +35,7 @@ def _(file_browser):
 @app.cell
 def _(QSF_FILE, RESULTS_FILE):
-    S = JPMCSurvey(RESULTS_FILE, QSF_FILE)
+    S = QualtricsSurvey(RESULTS_FILE, QSF_FILE)
     try:
         data_all = S.load_data()
     except NotImplementedError as e:

@@ -10,8 +10,8 @@ def _():
     import polars as pl
     from pathlib import Path
-    from utils import JPMCSurvey, combine_exclusive_columns
-    return JPMCSurvey, combine_exclusive_columns, mo, pl
+    from utils import QualtricsSurvey, combine_exclusive_columns
+    return QualtricsSurvey, combine_exclusive_columns, mo, pl
 @app.cell
@@ -29,8 +29,8 @@ def _():
 @app.cell
-def _(JPMCSurvey, QSF_FILE, RESULTS_FILE):
-    survey = JPMCSurvey(RESULTS_FILE, QSF_FILE)
+def _(QualtricsSurvey, QSF_FILE, RESULTS_FILE):
+    survey = QualtricsSurvey(RESULTS_FILE, QSF_FILE)
     data = survey.load_data()
     data.collect()
     return data, survey

@@ -1,4 +1,4 @@
-# Altair Migration Plan: Plotly → Altair for JPMCPlotsMixin
+# Altair Migration Plan: Plotly → Altair for QualtricsPlotsMixin
 **Date:** January 28, 2026
 **Status:** Not Started
@@ -22,9 +22,9 @@ Current Plotly implementation has a critical layout issue: filter annotations ov
 ## Current System Analysis
 ### File Structure
-- **`plots.py`** - Contains `JPMCPlotsMixin` class with 10 plotting methods
+- **`plots.py`** - Contains `QualtricsPlotsMixin` class with 10 plotting methods
 - **`theme.py`** - Contains `ColorPalette` class with all styling constants
-- **`utils.py`** - Contains `JPMCSurvey` class that mixes in `JPMCPlotsMixin`
+- **`utils.py`** - Contains `QualtricsSurvey` class that mixes in `QualtricsPlotsMixin`
 ### Color Palette (from theme.py)
 ```python
@@ -1140,10 +1140,10 @@ uv remove plotly kaleido
 ```python
 import marimo as mo
 import polars as pl
-from utils import JPMCSurvey
+from utils import QualtricsSurvey
 # Load sample data
-survey = JPMCSurvey()
+survey = QualtricsSurvey()
 survey.load_data('path/to/data')
 survey.fig_save_dir = 'figures/altair_test'
@@ -1244,7 +1244,7 @@ After completing all tasks, verify the following:
 ### Regression Testing
 - [ ] Existing Marimo notebooks still work
 - [ ] Data filtering still works (`filter_data()`)
-- [ ] `JPMCSurvey` class initialization unchanged
+- [ ] `QualtricsSurvey` class initialization unchanged
 - [ ] No breaking changes to public API
 ### Documentation

@@ -5,14 +5,14 @@ This example shows how to use the `create_traits_wordcloud` function to visualiz
 ## Basic Usage in Jupyter/Marimo Notebook
 ```python
-from utils import JPMCSurvey, create_traits_wordcloud
+from utils import QualtricsSurvey, create_traits_wordcloud
 from pathlib import Path
 # Load your survey data
 RESULTS_FILE = "data/exports/1-23-26/JPMC_Chase Brand Personality_Quant Round 1_January 23, 2026_Labels.csv"
 QSF_FILE = "data/19-dec_V1_quant_incl_shani_comments.qsf"
-S = JPMCSurvey(RESULTS_FILE, QSF_FILE)
+S = QualtricsSurvey(RESULTS_FILE, QSF_FILE)
 data = S.load_data()
 # Get Top 3 Traits data

@@ -1,6 +1,6 @@
 import polars as pl
-from utils import JPMCSurvey, process_speaking_style_data, process_voice_scale_data, join_voice_and_style_data
+from utils import QualtricsSurvey, process_speaking_style_data, process_voice_scale_data, join_voice_and_style_data
 from plots import plot_speaking_style_correlation
 from speaking_styles import SPEAKING_STYLES
@@ -14,7 +14,7 @@ RESULTS_FILE = "data/exports/OneDrive_2026-01-21/Soft Launch Data/JPMC_Chase Bra
 QSF_FILE = "data/exports/OneDrive_2026-01-21/Soft Launch Data/JPMC_Chase_Brand_Personality_Quant_Round_1.qsf"
 try:
-    survey = JPMCSurvey(RESULTS_FILE, QSF_FILE)
+    survey = QualtricsSurvey(RESULTS_FILE, QSF_FILE)
 except TypeError:
     # Fallback if signature is different or file not found (just in case)
     print("Error initializing survey with paths. Checking signature...")

@@ -11,8 +11,8 @@ from theme import ColorPalette
 import hashlib
-class JPMCPlotsMixin:
-    """Mixin class for plotting functions in JPMCSurvey."""
+class QualtricsPlotsMixin:
+    """Mixin class for plotting functions in QualtricsSurvey."""
     def _process_title(self, title: str) -> str | list[str]:
         """Process title to handle <br> tags for Altair."""

@@ -11,7 +11,7 @@ from io import BytesIO
 import imagehash
 from PIL import Image
-from plots import JPMCPlotsMixin
+from plots import QualtricsPlotsMixin
 from pptx import Presentation
@@ -514,7 +514,7 @@ def normalize_global_values(df: pl.DataFrame, target_cols: list[str]) -> pl.Data
     return res.lazy() if was_lazy else res
-class JPMCSurvey(JPMCPlotsMixin):
+class QualtricsSurvey(QualtricsPlotsMixin):
     """Class to handle JPMorgan Chase survey data."""
     def __init__(self, data_path: Union[str, Path], qsf_path: Union[str, Path]):

@@ -323,12 +323,12 @@ def check_straight_liners(data, max_score=3):
 if __name__ == "__main__":
-    from utils import JPMCSurvey
+    from utils import QualtricsSurvey
     RESULTS_FILE = "data/exports/OneDrive_2026-01-28/1-28-26 Afternoon/JPMC_Chase Brand Personality_Quant Round 1_January 28, 2026_Afternoon_Labels.csv"
     QSF_FILE = "data/exports/OneDrive_2026-01-21/Soft Launch Data/JPMC_Chase_Brand_Personality_Quant_Round_1.qsf"
-    S = JPMCSurvey(RESULTS_FILE, QSF_FILE)
+    S = QualtricsSurvey(RESULTS_FILE, QSF_FILE)
     data = S.load_data()
     # print("Checking Green Blue:")

@@ -1,6 +1,6 @@
 """Word cloud utilities for Voice Branding analysis.
-The main wordcloud function is available as a method on JPMCSurvey:
+The main wordcloud function is available as a method on QualtricsSurvey:
 S.plot_traits_wordcloud(data, column='Top_3_Traits', title='...')
 This module provides standalone imports for backwards compatibility.