Compare commits

...

52 Commits

SHA1 Message Date
03a716e8ec correlation matrix speech characteristics vs score 2026-02-10 16:50:47 +01:00
8720bb670d started speech data notebook 2026-02-10 14:58:13 +01:00
9dfab75925 missing data analysis 2026-02-10 14:24:26 +01:00
14e28cf368 stat significance nr times ranked 1st 2026-02-09 18:37:41 +01:00
8e181e193a SL filter 2026-02-09 17:57:04 +01:00
6c16993cb3 straight-liner plot analysis 2026-02-09 17:26:45 +01:00
92c6fc03ab docs datasets 2026-02-09 13:17:59 +01:00
7fb6570190 statistical significance 2026-02-05 19:49:19 +01:00
840bd2940d other top bc's 2026-02-05 11:50:00 +01:00
af9a15ccb0 renamed notebooks and added significance test 2026-02-05 10:14:53 +01:00
a3cf9f103d update plots with final data release 2026-02-04 21:15:03 +01:00
f0eab32c34 update alt-text with full filepaths 2026-02-04 17:48:48 +01:00
d231fc02db fix missing filter descr in correlation plots 2026-02-04 14:48:14 +01:00
fc76bb0ab5 voice gender split correlation plots 2026-02-04 13:44:51 +01:00
ab78276a97 male/female voices in separate plots for correlations 2026-02-04 12:35:24 +01:00
e17646eb70 correlation plots for best bc 2026-02-04 10:46:31 +01:00
ad1d8c6e58 all plots offline update 2026-02-03 22:38:15 +01:00
f5b4c247b8 tidy plots 2026-02-03 22:12:17 +01:00
a35670aa72 fixed missing ai_user category 2026-02-03 21:13:29 +01:00
36280a6ff8 fix sample size 2026-02-03 20:48:34 +01:00
9a587dcc4c add ai-user filter combinations 2026-02-03 19:46:07 +01:00
9a49d1c690 added sample size to filter text 2026-02-03 19:16:39 +01:00
8f505da550 offline update 18-30 2026-02-03 18:43:20 +01:00
495b56307c fixed filter to none 2026-02-03 18:19:06 +01:00
1e76a82f24 fix wordcloud filter values 2026-02-03 17:41:12 +01:00
01b7d50637 fixed empty plots, updated filters 2026-02-03 16:51:24 +01:00
dca9ac11ba supposed wordcloud fix, but everything broke 2026-02-03 15:36:35 +01:00
081fb0dd6e added 6 more filters 2026-02-03 15:20:01 +01:00
2817ed240a automatic generation of all plots with all combinations 2026-02-03 15:03:57 +01:00
e44251c3d6 fixed consumer and ethnicity filter combinations 2026-02-03 14:43:03 +01:00
8dd41dfc96 Start automation of running filter combinations 2026-02-03 14:33:09 +01:00
840cb4e6dc exported marimo to script form 2026-02-03 13:48:05 +01:00
a162701e94 move cell for better running 2026-02-03 02:22:06 +01:00
38f6d8a87c fixed for basic plots, filter active 2026-02-03 02:19:47 +01:00
5c39bbb23a images tagged 2026-02-03 02:05:29 +01:00
190e4fbdc4 finished correlation plots and generic voice plots 2026-02-03 01:59:26 +01:00
2408d06098 base correlations 2026-02-03 01:32:06 +01:00
1dce4db909 voice gender plots done 2026-02-03 01:03:29 +01:00
acf9c45844 male/female colored plots 2026-02-03 00:40:51 +01:00
77fdd6e8f6 fixed voices plot order 2026-02-03 00:20:56 +01:00
426495ebe3 generic voice plots 2026-02-03 00:15:10 +01:00
a7ee854ed0 voice plots 2026-02-03 00:12:18 +01:00
97c4b07208 added filter disabled broken cells and starting spoken voice generic results 2026-02-02 23:32:10 +01:00
fd14038253 comment out 'per subgroup' since these just take too long to create 2026-02-02 23:22:09 +01:00
611fc8d19a added var split_group 2026-02-02 23:15:05 +01:00
3ac330263f BC results per consumer 2026-02-02 23:04:40 +01:00
bda4d54231 split consumer groups best character 2026-02-02 22:05:56 +01:00
f2c659c266 statistical tests 2026-02-02 21:47:37 +01:00
29df6a4bd9 og traits 2026-02-02 18:37:45 +01:00
a62524c6e4 update plot agent with explicit things not to do 2026-02-02 18:26:23 +01:00
43b41a01f5 plot creator agent 2026-02-02 17:57:19 +01:00
b7cf6adfb8 fix ppt update images 2026-02-02 17:36:32 +01:00
24 changed files with 10647 additions and 667 deletions

.github/agents/plot-creator.agent.md (new file)

@@ -0,0 +1,216 @@
# Plot Creator Agent
You are a specialized agent for creating data visualizations for the Voice Branding Qualtrics survey analysis project.
## ⚠️ Critical Data Handling Rules
1. **NEVER assume or load datasets without explicit user consent** - This is confidential data
2. **NEVER guess file paths or dataset locations**
3. **DO NOT assume data comes from a `Survey.get_*()` method** - Data may have been manually manipulated in a notebook
4. **Use ONLY the data snippet provided by the user** for understanding structure and testing
## Your Workflow
When the user provides a plotting request (e.g., "I need a bar plot that shows the frequency of the times each trait is chosen per brand character"), follow this workflow:
### Step 1: Understand the Request
- Parse the user's natural language request to identify:
- **Chart type** (bar, stacked bar, line, heatmap, etc.)
- **X-axis variable**
- **Y-axis variable / aggregation** (count, mean, sum, etc.)
- **Grouping / color encoding** (if any)
- **Filtering requirements** (if any)
- Think critically about whether the requested plot is feasible with the available data.
- Think critically about the best way to visualize the requested information and whether the requested chart type is appropriate. If not, propose alternatives and ask the user for confirmation before proceeding.
### Step 2: Analyze Provided Data
The user will paste a `df.head()` output. Examine:
- Column names and their meaning (refer to column naming conventions in `.github/copilot-instructions.md`)
- Data types
- Whether the data is in the right shape for the desired plot
**Important:** Do NOT make assumptions about where this data came from. It may be:
- Output from a `Survey.get_*()` method
- Manually transformed in a notebook
- A join of multiple data sources
- Any other custom manipulation
### Step 3: Determine Data Manipulation Needs
Decide if the provided data can be plotted directly, or if transformations are needed:
- **No manipulation**: Data is ready → proceed to Step 5
- **Manipulation needed**: Aggregation, pivoting, melting, filtering, or new computed columns required → proceed to Step 4
### Step 4: Create Data Manipulation Function (if needed)
Check if an existing `transform_<descriptive_name>` function exists in `utils.py` that performs the needed data manipulation. If not, create a dedicated method in the `QualtricsSurvey` class (`utils.py`):
```python
def transform_<descriptive_name>(self, df: pl.LazyFrame | pl.DataFrame) -> tuple[pl.LazyFrame, dict | None]:
"""Transform <input_description> to <output_description>.
Original use-case: "<paste user's original question here>"
This function <concise 1-2 sentence explanation of what it does>.
Args:
df: Pre-fetched data as a Polars LazyFrame or DataFrame.
Returns:
tuple: (LazyFrame with columns [...], Optional metadata dict)
"""
# Implementation - transform the INPUT data only
# NEVER call self.get_*() methods here
return result, metadata
```
**Requirements:**
- **NEVER retrieve data inside transform functions** - The function receives already-fetched data as input
- Data retrieval (`get_*()` calls) stays in the notebook so analysts can see all steps
- Method must return `(pl.LazyFrame, Optional[dict])` tuple
- Docstring MUST contain the original question verbatim
- Follow the existing patterns of `QualtricsSurvey` class methods in `utils.py`
**❌ BAD Example (do NOT do this):**
```python
def transform_character_trait_frequency(self, q: pl.LazyFrame):
# BAD: Fetching data inside transform function
char_df, _ = self.get_character_refine(q) # ← WRONG!
# ... rest of transform
```
**✅ GOOD Example:**
```python
def transform_character_trait_frequency(self, char_df: pl.LazyFrame | pl.DataFrame):
# GOOD: Receives pre-fetched data as input
if isinstance(char_df, pl.LazyFrame):
char_df = char_df.collect()
# ... rest of transform
```
**In the notebook, the analyst writes:**
```python
char_data, _ = S.get_character_refine(data) # Step visible to analyst
trait_freq, _ = S.transform_character_trait_frequency(char_data) # Transform step
chart = S.plot_character_trait_frequency(trait_freq)
```
### Step 5: Create Temporary Test File
Create `debug_plot_temp.py` for testing. **Prefer using the data snippet already provided by the user.**
**Option A: Use provided data snippet (preferred)**
If the user provided a `df.head()` or sample data output, create inline test data from it:
```python
"""Temporary test file for <plot_name>.
Delete after testing.
"""
import polars as pl
from theme import ColorPalette
import altair as alt
# ============================================================
# TEST DATA (reconstructed from user's df.head() output)
# ============================================================
test_data = pl.DataFrame({
"Column1": ["value1", "value2", ...],
"Column2": [1, 2, ...],
# ... recreate structure from provided sample
})
# ============================================================
# Test the plot function
from plots import QualtricsPlotsMixin
# ... test code
```
**Option B: Ask user (only if necessary)**
Only ask the user for additional code if:
- The provided sample is insufficient to test the plot logic
- You need to understand complex data relationships not visible in the sample
- The transformation requires understanding the full data pipeline
If you must ask:
> "The sample data you provided should work for basic testing. However, I need [specific reason]. Could you provide:
> 1. [specific information needed]
>
> If you'd prefer, I can proceed with a minimal test using the sample data you shared."
### Step 6: Create Plot Function
Add a new method to `QualtricsPlotsMixin` in `plots.py`:
```python
def plot_<descriptive_name>(
self,
data: pl.LazyFrame | pl.DataFrame | None = None,
title: str = "<Default title>",
x_label: str = "<X label>",
y_label: str = "<Y label>",
height: int | None = None,
width: int | str | None = None,
) -> alt.Chart:
"""<Docstring with original question and description>."""
df = self._ensure_dataframe(data)
# Build chart using ONLY ColorPalette from theme.py
chart = alt.Chart(...).mark_bar(color=ColorPalette.PRIMARY)...
chart = self._save_plot(chart, title)
return chart
```
**Requirements:**
- ALL colors MUST use `ColorPalette` constants from `theme.py`
- Use `self._ensure_dataframe()` to handle LazyFrame/DataFrame
- Use `self._save_plot()` at the end to enable auto-save
- Use `self._process_title()` for titles with `<br>` tags
- Follow existing plot patterns (see `plot_average_scores_with_counts`, `plot_top3_ranking_distribution`)
### Step 7: Test
Run the temporary test file to verify the plot works:
```bash
uv run python debug_plot_temp.py
```
### Step 8: Provide Summary
After successful completion, output a summary:
```
✅ Plot created successfully!
**Data function** (if created): `S.transform_<name>(data)`
**Plot function**: `S.plot_<name>(data, title="...")`
**Usage example:**
```python
# Assuming you have your data already prepared as `plot_data`
chart = S.plot_<name>(plot_data, title="Your Title Here")
chart # Display in Marimo
```
**Files modified:**
- `utils.py` - Added `transform_<name>()` (if applicable)
- `plots.py` - Added `plot_<name>()`
- `debug_plot_temp.py` - Test file (can be deleted)
```
## Critical Rules (from .github/copilot-instructions.md)
1. **NEVER load confidential data without explicit user-provided code**
2. **NEVER assume data source** - do not guess which `get_*()` method produced the data
3. **NEVER modify Marimo notebooks** (`0X_*.py` files)
4. **NEVER run Marimo notebooks for debugging**
5. **ALL colors MUST come from `theme.py`** - use `ColorPalette.PRIMARY`, `ColorPalette.RANK_1`, etc.
6. **If a new color is needed**, add it to `ColorPalette` in `theme.py` first
7. **No changelog markdown files** - do not create new .md files documenting changes
8. **Reading notebooks is OK** to understand function usage patterns
9. **Getter methods return tuples**: `(LazyFrame, Optional[metadata])`
10. **Use Polars LazyFrames** until visualization, then `.collect()`
If any rule causes problems, ask the user for permission before deviating.
## Reference: Column Patterns
- `SS_Green_Blue__V14__Choice_1` → Speaking Style trait score
- `Voice_Scale_1_10__V48` → 1-10 voice rating
- `Top_3_Voices_ranking__V77` → Ranking position
- `Character_Ranking_<Name>` → Character personality ranking

.vscode/extensions.json (new file)

@@ -0,0 +1,5 @@
{
"recommendations": [
"wakatime.vscode-wakatime"
]
}

.vscode/settings.json (new file)

@@ -0,0 +1,5 @@
{
"chat.tools.terminal.autoApprove": {
"/home/luigi/Documents/VoiceBranding/JPMC/Phase-3/.venv/bin/python": true
}
}

View File

@@ -1,6 +1,6 @@
import marimo
__generated_with = "0.19.2"
__generated_with = "0.19.7"
app = marimo.App(width="full")
@@ -16,8 +16,8 @@ def _():
from speaking_styles import SPEAKING_STYLES
return (
QualtricsSurvey,
Path,
QualtricsSurvey,
SPEAKING_STYLES,
calculate_weighted_ranking_scores,
check_progress,
@@ -49,7 +49,7 @@ def _(Path, file_browser, mo):
@app.cell
def _(QualtricsSurvey, QSF_FILE, RESULTS_FILE, mo):
def _(QSF_FILE, QualtricsSurvey, RESULTS_FILE, mo):
S = QualtricsSurvey(RESULTS_FILE, QSF_FILE)
try:
data_all = S.load_data()
@@ -383,6 +383,12 @@ def _(S, data, mo):
return (vscales,)
@app.cell
def _(vscales):
print(vscales.collect().head())
return
@app.cell
def _(pl, vscales):
# Count non-null values per row

View File

@@ -1,6 +1,6 @@
import marimo
__generated_with = "0.19.2"
__generated_with = "0.19.7"
app = marimo.App(width="full")
with app.setup:
@@ -44,14 +44,14 @@ def _(QSF_FILE, RESULTS_FILE):
@app.cell(hide_code=True)
def _():
mo.md(r"""
def _(RESULTS_FILE, data_all):
mo.md(rf"""
---
# Load Data
**Dataset:** `{Path(RESULTS_FILE).name}`
**Dataset:** {Path(RESULTS_FILE).name}
**Responses**: `{data_all.collect().shape[0]}`
**Responses**: {data_all.collect().shape[0]}
""")
return
@@ -71,7 +71,6 @@ def _(S, data_all):
mo.md(f"""
# Data Validation
{check_progress(data_all)}
@@ -104,39 +103,20 @@ def _(data_all):
return (data_validated,)
@app.cell(hide_code=True)
@app.cell
def _():
return
@app.cell
def _(data_validated):
data = data_validated
data.collect()
return (data,)
@app.cell(hide_code=True)
def _():
mo.md(r"""
---
# Introduction (Respondent Demographics)
""")
#
return
@app.cell
def _(S, data):
demographics = S.get_demographics(data)[0].collect()
demographics
return (demographics,)
@app.cell(hide_code=True)
def _():
mo.md(r"""
## Lucia confirmation missing 'Consumer' data
@@ -145,6 +125,13 @@ def _():
@app.cell
def _(S, data_validated):
demographics = S.get_demographics(data_validated)[0].collect()
# demographics
return (demographics,)
@app.cell(hide_code=True)
def _(demographics):
# Demographics where 'Consumer' is null
demographics_no_consumer = demographics.filter(pl.col('Consumer').is_null())['_recordId'].to_list()
@@ -160,16 +147,82 @@ def _(data_all, demographics_no_consumer):
@app.cell
def _(data_all):
def _():
mo.md(r"""
# Filter Data (Global corrections)
""")
return
@app.cell
def _():
BEST_CHOSEN_CHARACTER = "the_coach"
return (BEST_CHOSEN_CHARACTER,)
@app.cell
def _(S):
filter_form = mo.md('''
{age}
{gender}
{ethnicity}
{income}
{consumer}
'''
).batch(
age=mo.ui.multiselect(options=S.options_age, value=S.options_age, label="Select Age Group(s):"),
gender=mo.ui.multiselect(options=S.options_gender, value=S.options_gender, label="Select Gender(s):"),
ethnicity=mo.ui.multiselect(options=S.options_ethnicity, value=S.options_ethnicity, label="Select Ethnicities:"),
income=mo.ui.multiselect(options=S.options_income, value=S.options_income, label="Select Income Group(s):"),
consumer=mo.ui.multiselect(options=S.options_consumer, value=S.options_consumer, label="Select Consumer Groups:")
).form()
mo.md(f'''
---
# Data Filter
{filter_form}
''')
return (filter_form,)
@app.cell
def _(S, data_validated, filter_form):
mo.stop(filter_form.value is None, mo.md("**Please submit filter above to proceed**"))
_d = S.filter_data(data_validated, age=filter_form.value['age'], gender=filter_form.value['gender'], income=filter_form.value['income'], ethnicity=filter_form.value['ethnicity'], consumer=filter_form.value['consumer'])
# Stop execution and prevent other cells from running if no data is selected
mo.stop(len(_d.collect()) == 0, mo.md("**No Data available for current filter combination**"))
data = _d
# data = data_validated
data.collect()
return (data,)
@app.cell
def _():
return
@app.cell
def _():
# Check if all business owners are missing a 'Consumer type' in demographics
assert all([a is None for a in data_all.filter(pl.col('QID4') == 'Yes').collect()['Consumer'].unique()]) , "Not all business owners are missing 'Consumer type' in demographics."
# assert all([a is None for a in data_all.filter(pl.col('QID4') == 'Yes').collect()['Consumer'].unique()]) , "Not all business owners are missing 'Consumer type' in demographics."
return
@app.cell
def _():
mo.md(r"""
## Demographic Distributions
# Demographic Distributions
""")
return
@@ -187,14 +240,13 @@ def _():
@app.cell
def _(S, demo_plot_cols, demographics):
def _(S, data, demo_plot_cols):
_content = """
## Demographic Distributions
"""
for c in demo_plot_cols:
_fig = S.plot_demographic_distribution(
data=demographics,
data=S.get_demographics(data)[0],
column=c,
title=f"{c.replace('Bussiness', 'Business').replace('_', ' ')} Distribution of Survey Respondents"
)
@@ -214,7 +266,7 @@ def _():
return
@app.cell
@app.cell(disabled=True)
def _():
mo.md(r"""
## Best performing: Original vs Refined frankenstein
@@ -222,15 +274,15 @@ def _():
return
@app.cell
@app.cell(disabled=True)
def _(S, data):
char_refine_rank = S.get_character_refine(data)[0]
# print(char_rank.collect().head())
# print(char_refine_rank.collect().head())
print(char_refine_rank.collect().head())
return
@app.cell
@app.cell(disabled=True)
def _():
mo.md(r"""
## Character ranking points
@@ -266,6 +318,30 @@ def _(S, char_rank):
@app.cell
def _():
mo.md(r"""
### Statistical Significance Character Ranking
""")
return
@app.cell(disabled=True)
def _(S, char_rank):
_pairwise_df, _meta = S.compute_ranking_significance(char_rank)
# print(_pairwise_df.columns)
mo.md(f"""
{mo.ui.altair_chart(S.plot_significance_heatmap(_pairwise_df, metadata=_meta))}
{mo.ui.altair_chart(S.plot_significance_summary(_pairwise_df, metadata=_meta))}
""")
return
@app.cell(disabled=True)
def _():
mo.md(r"""
## Character Ranking: times 1st place
@@ -306,9 +382,75 @@ def _():
return
@app.cell
def _(S, data):
char_df = S.get_character_refine(data)[0]
return (char_df,)
@app.cell
def _(S, char_df):
from theme import ColorPalette
# Assuming you already have char_df (your data from get_character_refine or similar)
characters = ['Bank Teller', 'Familiar Friend', 'The Coach', 'Personal Assistant']
character_colors = {
'Bank Teller': (ColorPalette.CHARACTER_BANK_TELLER, ColorPalette.CHARACTER_BANK_TELLER_HIGHLIGHT),
'Familiar Friend': (ColorPalette.CHARACTER_FAMILIAR_FRIEND, ColorPalette.CHARACTER_FAMILIAR_FRIEND_HIGHLIGHT),
'The Coach': (ColorPalette.CHARACTER_COACH, ColorPalette.CHARACTER_COACH_HIGHLIGHT),
'Personal Assistant': (ColorPalette.CHARACTER_PERSONAL_ASSISTANT, ColorPalette.CHARACTER_PERSONAL_ASSISTANT_HIGHLIGHT),
}
# Build consistent sort order (by total frequency across all characters)
all_trait_counts = {}
for char in characters:
freq_df, _ = S.transform_character_trait_frequency(char_df, char)
for row in freq_df.iter_rows(named=True):
all_trait_counts[row['trait']] = all_trait_counts.get(row['trait'], 0) + row['count']
consistent_sort_order = sorted(all_trait_counts.keys(), key=lambda x: -all_trait_counts[x])
_content = """"""
# Generate 4 plots (one per character)
for char in characters:
freq_df, _ = S.transform_character_trait_frequency(char_df, char)
main_color, highlight_color = character_colors[char]
chart = S.plot_single_character_trait_frequency(
data=freq_df,
character_name=char,
bar_color=main_color,
highlight_color=highlight_color,
trait_sort_order=consistent_sort_order,
)
_content += f"""
{mo.ui.altair_chart(chart)}
"""
mo.md(_content)
return
@app.cell(disabled=True)
def _():
mo.md(r"""
## Statistical significance best characters
see chat
> example: if nr 1 and 2 don't differ significantly but both differ from nr 3, for instance, that's also great. Thinking along a bit about how I can present it, you know what I mean? :)
>
""")
return
@app.cell(disabled=True)
def _():
return
@app.cell
def _():
# Join respondent
return
@@ -322,12 +464,208 @@ def _():
return
@app.cell
def _():
COLOR_GENDER = True
return (COLOR_GENDER,)
@app.cell
def _():
mo.md(r"""
## Top 8 Most Chosen out of 18
""")
return
@app.cell
def _(S, data):
v_18_8_3 = S.get_18_8_3(data)[0]
return (v_18_8_3,)
@app.cell
def _(COLOR_GENDER, S, v_18_8_3):
S.plot_voice_selection_counts(v_18_8_3, title="Top 8 Voice Selection from 18 Voices", x_label='Voice', color_gender=COLOR_GENDER)
return
@app.cell
def _():
mo.md(r"""
## Top 3 most chosen out of 8
""")
return
@app.cell
def _(COLOR_GENDER, S, v_18_8_3):
S.plot_top3_selection_counts(v_18_8_3, title="Top 3 Voice Selection Counts from 8 Voices", x_label='Voice', color_gender=COLOR_GENDER)
return
@app.cell
def _():
mo.md(r"""
## Voice Ranking Weighted Score
""")
return
@app.cell
def _(S, data):
top3_voices = S.get_top_3_voices(data)[0]
top3_voices_weighted = calculate_weighted_ranking_scores(top3_voices)
return top3_voices, top3_voices_weighted
@app.cell
def _(COLOR_GENDER, S, top3_voices_weighted):
S.plot_weighted_ranking_score(top3_voices_weighted, title="Most Popular Voice - Weighted Popularity Score<br>(1st = 3pts, 2nd = 2pts, 3rd = 1pt)", color_gender=COLOR_GENDER)
return
@app.cell(hide_code=True)
def _():
mo.md(r"""
---
## Which voice is ranked best in the ranking question for top 3?
# Brand Character Results
(not best 3 out of 8 question)
""")
return
@app.cell
def _(COLOR_GENDER, S, top3_voices):
S.plot_ranking_distribution(top3_voices, x_label='Voice', title="Distribution of Top 3 Voice Rankings (1st, 2nd, 3rd)", color_gender=COLOR_GENDER)
return
@app.cell
def _():
mo.md(r"""
### Statistical significance for voice ranking
""")
return
@app.cell
def _():
# print(top3_voices.collect().head())
return
@app.cell
def _():
# _pairwise_df, _metadata = S.compute_ranking_significance(
# top3_voices,alpha=0.05,correction="none")
# # View significant pairs
# # print(pairwise_df.filter(pl.col('significant') == True))
# # Create heatmap visualization
# _heatmap = S.plot_significance_heatmap(
# _pairwise_df,
# metadata=_metadata,
# title="Weighted Voice Ranking Significance<br>(Pairwise Comparisons)"
# )
# # Create summary bar chart
# _summary = S.plot_significance_summary(
# _pairwise_df,
# metadata=_metadata
# )
# mo.md(f"""
# {mo.ui.altair_chart(_heatmap)}
# {mo.ui.altair_chart(_summary)}
# """)
return
@app.cell
def _():
## Voice Ranked 1st the most
return
@app.cell
def _(COLOR_GENDER, S, top3_voices):
S.plot_most_ranked_1(top3_voices, title="Most Popular Voice<br>(Number of Times Ranked 1st)", x_label='Voice', color_gender=COLOR_GENDER)
return
@app.cell
def _():
mo.md(r"""
## Voice Scale 1-10
""")
return
@app.cell
def _(COLOR_GENDER, S, data):
# Get your voice scale data (from notebook)
voice_1_10, _ = S.get_voice_scale_1_10(data)
S.plot_average_scores_with_counts(voice_1_10, x_label='Voice', domain=[1,10], title="Voice General Impression (Scale 1-10)", color_gender=COLOR_GENDER)
return (voice_1_10,)
@app.cell(disabled=True)
def _():
mo.md(r"""
### Statistical Significance (Scale 1-10)
""")
return
@app.cell(disabled=True)
def _(S, voice_1_10):
# Compute pairwise significance tests
pairwise_df, metadata = S.compute_pairwise_significance(
voice_1_10,
test_type="mannwhitney", # or "ttest", "chi2", "auto"
alpha=0.05,
correction="bonferroni" # or "holm", "none"
)
# View significant pairs
# print(pairwise_df.filter(pl.col('significant') == True))
# Create heatmap visualization
_heatmap = S.plot_significance_heatmap(
pairwise_df,
metadata=metadata,
title="Voice Rating Significance<br>(Pairwise Comparisons)"
)
# Create summary bar chart
_summary = S.plot_significance_summary(
pairwise_df,
metadata=metadata
)
mo.md(f"""
{mo.ui.altair_chart(_heatmap)}
{mo.ui.altair_chart(_summary)}
""")
return
@app.cell
def _():
return
@app.cell(hide_code=True)
def _():
mo.md(r"""
## Ranking points for Voice per Chosen Brand Character
**missing mapping**
""")
return
@@ -335,12 +673,261 @@ def _():
@app.cell(hide_code=True)
def _():
mo.md(r"""
---
# Spoken Voice Results
## Correlation Speaking Styles
""")
return
@app.cell
def _(S, data, top3_voices):
ss_or, choice_map_or = S.get_ss_orange_red(data)
ss_gb, choice_map_gb = S.get_ss_green_blue(data)
# Combine the data
ss_all = ss_or.join(ss_gb, on='_recordId')
_d = ss_all.collect()
choice_map = {**choice_map_or, **choice_map_gb}
# print(_d.head())
# print(choice_map)
ss_long = utils.process_speaking_style_data(ss_all, choice_map)
df_style = utils.process_speaking_style_data(ss_all, choice_map)
vscales = S.get_voice_scale_1_10(data)[0]
df_scale_long = utils.process_voice_scale_data(vscales)
joined_scale = df_style.join(df_scale_long, on=["_recordId", "Voice"], how="inner")
df_ranking = utils.process_voice_ranking_data(top3_voices)
joined_ranking = df_style.join(df_ranking, on=['_recordId', 'Voice'], how='inner')
return joined_ranking, joined_scale
@app.cell
def _(joined_ranking):
joined_ranking.head()
return
@app.cell
def _():
mo.md(r"""
### Colors vs Scale 1-10
""")
return
@app.cell
def _(S, joined_scale):
# Transform to get one row per color with average correlation
color_corr_scale, _ = utils.transform_speaking_style_color_correlation(joined_scale, SPEAKING_STYLES)
S.plot_speaking_style_color_correlation(
data=color_corr_scale,
title="Correlation: Speaking Style Colors and Voice Scale 1-10"
)
return
@app.cell
def _():
mo.md(r"""
### Colors vs Ranking Points
""")
return
@app.cell
def _(S, joined_ranking):
color_corr_ranking, _ = utils.transform_speaking_style_color_correlation(
joined_ranking,
SPEAKING_STYLES,
target_column="Ranking_Points"
)
S.plot_speaking_style_color_correlation(
data=color_corr_ranking,
title="Correlation: Speaking Style Colors and Voice Ranking Points"
)
return
@app.cell
def _():
mo.md(r"""
### Individual Traits vs Scale 1-10
""")
return
@app.cell
def _(S, joined_scale):
_content = """"""
for _style, _traits in SPEAKING_STYLES.items():
# print(f"Correlation plot for {style}...")
_fig = S.plot_speaking_style_correlation(
data=joined_scale,
style_color=_style,
style_traits=_traits,
title=f"Correlation: Speaking Style {_style} and Voice Scale 1-10",
)
_content += f"""
#### Speaking Style **{_style}**:
{mo.ui.altair_chart(_fig)}
"""
mo.md(_content)
return
@app.cell(hide_code=True)
def _():
mo.md(r"""
### Individual Traits vs Ranking Points
""")
return
@app.cell
def _(S, joined_ranking):
_content = """"""
for _style, _traits in SPEAKING_STYLES.items():
# print(f"Correlation plot for {style}...")
_fig = S.plot_speaking_style_ranking_correlation(
data=joined_ranking,
style_color=_style,
style_traits=_traits,
title=f"Correlation: Speaking Style {_style} and Voice Ranking Points",
)
_content += f"""
#### Speaking Style **{_style}**:
{mo.ui.altair_chart(_fig)}
"""
mo.md(_content)
return
@app.cell(hide_code=True)
def _():
mo.md(r"""
## Correlations when "Best Brand Character" is chosen
Select only the traits that fit with that character
""")
return
@app.cell
def _(BEST_CHOSEN_CHARACTER):
from reference import ORIGINAL_CHARACTER_TRAITS
chosen_bc_traits = ORIGINAL_CHARACTER_TRAITS[BEST_CHOSEN_CHARACTER]
return (chosen_bc_traits,)
@app.cell
def _(chosen_bc_traits):
STYLES_SUBSET = utils.filter_speaking_styles(SPEAKING_STYLES, chosen_bc_traits)
return (STYLES_SUBSET,)
@app.cell(hide_code=True)
def _():
mo.md(r"""
### Individual Traits vs Ranking Points
""")
return
@app.cell
def _(BEST_CHOSEN_CHARACTER, S, STYLES_SUBSET, joined_ranking):
_content = ""
for _style, _traits in STYLES_SUBSET.items():
_fig = S.plot_speaking_style_ranking_correlation(
data=joined_ranking,
style_color=_style,
style_traits=_traits,
title=f"""Brand Character "{BEST_CHOSEN_CHARACTER.replace('_', ' ').title()}" - Correlation: Speaking Style {_style} and Voice Ranking Points"""
)
_content += f"""
{mo.ui.altair_chart(_fig)}
"""
mo.md(_content)
return
@app.cell(hide_code=True)
def _():
mo.md(r"""
### Individual Traits vs Scale 1-10
""")
return
@app.cell
def _(BEST_CHOSEN_CHARACTER, S, STYLES_SUBSET, joined_scale):
_content = """"""
for _style, _traits in STYLES_SUBSET.items():
# print(f"Correlation plot for {style}...")
_fig = S.plot_speaking_style_correlation(
data=joined_scale,
style_color=_style,
style_traits=_traits,
title=f"""Brand Character "{BEST_CHOSEN_CHARACTER.replace('_', ' ').title()}" - Correlation: Speaking Style {_style} and Voice Scale 1-10""",
)
_content += f"""
{mo.ui.altair_chart(_fig)}
"""
mo.md(_content)
return
@app.cell(hide_code=True)
def _():
mo.md(r"""
### Colors vs Scale 1-10 (Best Character)
""")
return
@app.cell
def _(BEST_CHOSEN_CHARACTER, S, STYLES_SUBSET, joined_scale):
# Transform to get one row per color with average correlation
_color_corr_scale, _ = utils.transform_speaking_style_color_correlation(joined_scale, STYLES_SUBSET)
S.plot_speaking_style_color_correlation(
data=_color_corr_scale,
title=f"""Brand Character "{BEST_CHOSEN_CHARACTER.replace('_', ' ').title()}" - Correlation: Speaking Style Colors and Voice Scale 1-10"""
)
return
@app.cell(hide_code=True)
def _():
mo.md(r"""
### Colors vs Ranking Points (Best Character)
""")
return
@app.cell
def _(BEST_CHOSEN_CHARACTER, S, STYLES_SUBSET, joined_ranking):
_color_corr_ranking, _ = utils.transform_speaking_style_color_correlation(
joined_ranking,
STYLES_SUBSET,
target_column="Ranking_Points"
)
S.plot_speaking_style_color_correlation(
data=_color_corr_ranking,
title=f"""Brand Character "{BEST_CHOSEN_CHARACTER.replace('_', ' ').title()}" - Correlation: Speaking Style Colors and Voice Ranking Points"""
)
return
if __name__ == "__main__":
app.run()

View File

@@ -1,6 +1,6 @@
import marimo
__generated_with = "0.19.2"
__generated_with = "0.19.7"
app = marimo.App(width="medium")
with app.setup:
@@ -21,28 +21,24 @@ def _():
@app.cell
def _():
TAG_SOURCE = Path('data/reports/Perception-Research-Report.pptx')
TAG_TARGET = Path('data/reports/Perception-Research-Report_tagged.pptx')
TAG_IMAGE_DIR = Path('figures/OneDrive_2026-01-28/')
return TAG_IMAGE_DIR, TAG_SOURCE, TAG_TARGET
@app.cell
def _(TAG_IMAGE_DIR, TAG_SOURCE, TAG_TARGET):
utils.update_ppt_alt_text(ppt_path=TAG_SOURCE, image_source_dir=TAG_IMAGE_DIR, output_path=TAG_TARGET)
return
@app.cell
def _():
utils._calculate_file_sha1('figures/OneDrive_2026-01-28/All_Respondents/most_prominent_personality_traits.png')
return
TAG_SOURCE = Path('data/reports/VOICE_Perception-Research-Report_4-2-26_19-30.pptx')
# TAG_TARGET = Path('data/reports/Perception-Research-Report_2-2_tagged.pptx')
TAG_IMAGE_DIR = Path('figures/debug')
return TAG_IMAGE_DIR, TAG_SOURCE
@app.cell
def _():
utils._calculate_perceptual_hash('figures/Picture.png')
def _(TAG_IMAGE_DIR, TAG_SOURCE):
utils.update_ppt_alt_text(
ppt_path=TAG_SOURCE,
image_source_dir=TAG_IMAGE_DIR,
# output_path=TAG_TARGET
)
return
@@ -56,26 +52,21 @@ def _():
@app.cell
def _():
REPLACE_SOURCE = Path('data/test_replace_source.pptx')
REPLACE_TARGET = Path('data/test_replace_target.pptx')
return REPLACE_SOURCE, REPLACE_TARGET
REPLACE_SOURCE = Path('data/reports/VOICE_Perception-Research-Report_4-2-26_19-30.pptx')
# REPLACE_TARGET = Path('data/reports/Perception-Research-Report_2-2_updated.pptx')
app._unparsable_cell(
r"""
IMAGE_FILE = Path('figures/OneDrive_2026-01-28/Cons-Early_Professional/cold_distant_approachable_familiar_warm.png'
""",
name="_"
)
NEW_IMAGES_DIR = Path('figures/2-4-26')
return NEW_IMAGES_DIR, REPLACE_SOURCE
@app.cell
def _(IMAGE_FILE, REPLACE_SOURCE, REPLACE_TARGET):
utils.pptx_replace_named_image(
presentation_path=REPLACE_SOURCE,
target_tag=utils.image_alt_text_generator(IMAGE_FILE),
new_image_path=IMAGE_FILE,
save_path=REPLACE_TARGET)
def _(NEW_IMAGES_DIR, REPLACE_SOURCE):
# get all files in the image source directory and subdirectories
results = utils.pptx_replace_images_from_directory(
REPLACE_SOURCE, # Source presentation path,
NEW_IMAGES_DIR, # Source directory with new images
# REPLACE_TARGET # Output path (optional, defaults to overwrite)
)
return

README.md

@@ -1,5 +1,239 @@
# Voice Branding Quantitative Analysis
## Running Marimo Notebooks
Running on Ct-105 for shared access:
```bash
uv run marimo run 02_quant_analysis.py --headless --port 8080
```
---
## Batch Report Generation
The quant report can be run with different filter combinations via CLI or automated batch processing.
### Single Filter Run (CLI)
Run the report script directly with JSON-encoded filter arguments:
```bash
# Single consumer segment
uv run python 03_quant_report.script.py --consumer '["Starter"]'
# Single age group
uv run python 03_quant_report.script.py --age '["18 to 21 years"]'
# Multiple filters combined
uv run python 03_quant_report.script.py --age '["18 to 21 years", "22 to 24 years"]' --gender '["Male"]'
# All respondents (no filters = defaults to all options selected)
uv run python 03_quant_report.script.py
```
Available filter arguments:
- `--age` — JSON list of age groups
- `--gender` — JSON list of genders
- `--ethnicity` — JSON list of ethnicities
- `--income` — JSON list of income groups
- `--consumer` — JSON list of consumer segments
### Batch Runner (All Combinations)
Run all single-filter combinations automatically with progress tracking:
```bash
# Preview all combinations without running
uv run python run_filter_combinations.py --dry-run
# Run all combinations (shows progress bar)
uv run python run_filter_combinations.py
# Or use the registered CLI entry point
uv run quant-report-batch
uv run quant-report-batch --dry-run
```
This generates reports for:
- All Respondents (no filters)
- Each age group individually
- Each gender individually
- Each ethnicity individually
- Each income group individually
- Each consumer segment individually
Output figures are saved to `figures/<export_date>/<filter_slug>/`.
### Jupyter Notebook Debugging
The script auto-detects Jupyter/IPython environments. When running in VS Code's Jupyter extension, CLI args default to `None` (all options selected), so you can debug cell-by-cell normally.
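The environment detection boils down to probing for IPython's `get_ipython` builtin, which only exists inside an IPython/Jupyter kernel. A minimal sketch of the pattern the report scripts use (the helper name here is illustrative):

```python
def in_jupyter() -> bool:
    """Return True when running under IPython/Jupyter (e.g. VS Code cells)."""
    try:
        get_ipython()  # defined only inside IPython kernels  # noqa: F821
        return True
    except NameError:
        # Plain `python script.py` invocation: parse CLI args instead.
        return False
```

When this returns `True`, the scripts skip `argparse` entirely and fall back to `None` for every filter, which `filter_data()` treats as "all options selected".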
---
## Adding Custom Filter Combinations
To add new filter combinations to the batch runner, edit `run_filter_combinations.py`:
### Checklist
1. **Open** `run_filter_combinations.py`
2. **Find** the `get_filter_combinations()` function
3. **Add** your combination to the list before the `return` statement:
```python
# Example: Add a specific age + consumer cross-filter
combinations.append({
'name': 'Age-18to24_Consumer-Starter', # Used for output folder naming
'filters': {
'age': ['18 to 21 years', '22 to 24 years'],
'consumer': ['Starter']
}
})
```
4. **Filter keys** must match CLI argument names (defined in `FILTER_CONFIG` in `03_quant_report.script.py`):
- `age` — values from `survey.options_age`
- `gender` — values from `survey.options_gender`
- `ethnicity` — values from `survey.options_ethnicity`
- `income` — values from `survey.options_income`
- `consumer` — values from `survey.options_consumer`
5. **Check available values** by running:
```python
from utils import QualtricsSurvey
S = QualtricsSurvey('data/exports/2-2-26/...Labels.csv', 'data/exports/.../....qsf')
S.load_data()
print(S.options_age)
print(S.options_consumer)
# etc.
```
6. **Test** with dry-run first:
```bash
uv run python run_filter_combinations.py --dry-run
```
### Example: Adding Multiple Cross-Filters
```python
# In get_filter_combinations(), before return:
# Young professionals
combinations.append({
'name': 'Young_Professionals',
'filters': {
'age': ['22 to 24 years', '25 to 34 years'],
'consumer': ['Early Professional']
}
})
# High income males
combinations.append({
'name': 'High_Income_Male',
'filters': {
'income': ['$150,000 - $199,999', '$200,000 or more'],
'gender': ['Male']
}
})
```
### Notes
- **Empty filters dict** = all respondents (no filtering)
- **Omitted filter keys** = all options for that dimension selected
- **Output folder names** are auto-generated from active filters by `QualtricsSurvey.filter_data()`
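Since the report script expects JSON-encoded values, the batch runner ultimately has to turn each combination dict into CLI arguments. A minimal sketch of that translation, under the assumption that omitted keys simply produce no argument (the helper name is illustrative, not the actual `run_filter_combinations.py` implementation):

```python
import json

def combination_to_cli_args(combination: dict) -> list:
    """Translate a filter combination into report-script CLI arguments.

    Keys omitted from 'filters' emit no argument at all, so the report
    script defaults that dimension to "all options selected".
    """
    args = ['--filter-name', combination['name']]
    for key, values in combination['filters'].items():
        args += [f'--{key}', json.dumps(values)]
    return args

combination_to_cli_args({'name': 'High_Income_Male',
                         'filters': {'gender': ['Male']}})
# → ['--filter-name', 'High_Income_Male', '--gender', '["Male"]']
```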
---
## Adding a New Filter Dimension
To add an entirely new filter dimension (e.g., a new demographic question), you need to update several files:
### Checklist
1. **Update `utils.py` — `QualtricsSurvey.__init__()`** to initialize the filter state attribute:
```python
# In __init__(), add after existing filter_ attributes (around line 758):
self.filter_region:list = None # QID99
```
2. **Update `utils.py` — `load_data()`** to populate the `options_*` attribute:
```python
# In load_data(), add after existing options:
self.options_region = sorted(df['QID99'].drop_nulls().unique().to_list()) if 'QID99' in df.columns else []
```
3. **Update `utils.py` — `filter_data()`** to accept and apply the filter:
```python
# Add parameter to function signature:
def filter_data(self, q: pl.LazyFrame, ..., region:list=None) -> pl.LazyFrame:
# Add filter logic in function body:
self.filter_region = region
if region is not None:
q = q.filter(pl.col('QID99').is_in(region))
```
4. **Update `plots.py` — `_get_filter_slug()`** to include the filter in directory slugs:
```python
# Add to the filters list:
('region', 'Reg', getattr(self, 'filter_region', None), 'options_region'),
```
5. **Update `plots.py` — `_get_filter_description()`** for human-readable descriptions:
```python
# Add to the filters list:
('Region', getattr(self, 'filter_region', None), 'options_region'),
```
6. **Update `03_quant_report.script.py` — `FILTER_CONFIG`**:
```python
FILTER_CONFIG = {
'age': 'options_age',
'gender': 'options_gender',
# ... existing filters ...
'region': 'options_region', # ← New filter
}
```
This **automatically**:
- Adds `--region` CLI argument
- Includes it in Jupyter mode (defaults to all options)
- Passes it to `S.filter_data()`
- Writes it to the `.txt` filter description file
7. **Update `run_filter_combinations.py`** to generate combinations (optional):
```python
# Add after existing filter loops:
for region in survey.options_region:
combinations.append({
'name': f'Region-{region}',
'filters': {'region': [region]}
})
```
### Currently Available Filters
| CLI Argument | Options Attribute | QID Column | Description |
|--------------|-------------------|------------|-------------|
| `--age` | `options_age` | QID1 | Age groups |
| `--gender` | `options_gender` | QID2 | Gender |
| `--ethnicity` | `options_ethnicity` | QID3 | Ethnicity |
| `--income` | `options_income` | QID15 | Income brackets |
| `--consumer` | `options_consumer` | Consumer | Consumer segments |
| `--business_owner` | `options_business_owner` | QID4 | Business owner status |
| `--employment_status` | `options_employment_status` | QID13 | Employment status |
| `--personal_products` | `options_personal_products` | QID14 | Personal products |
| `--ai_user` | `options_ai_user` | QID22 | AI user status |
| `--investable_assets` | `options_investable_assets` | QID16 | Investable assets |
| `--industry` | `options_industry` | QID17 | Industry |
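Because CLI values must match the survey's option strings exactly, it can pay to validate them against the `options_*` attribute before kicking off a long batch run. A hypothetical pre-flight check (not part of `utils.py`):

```python
def check_filter_values(survey, options_attr: str, values: list) -> list:
    """Return the requested values that are NOT valid survey options."""
    available = set(getattr(survey, options_attr, []) or [])
    return [v for v in values if v not in available]

# Any non-empty return value means a typo or stale option string;
# an empty list means all requested values are safe to pass on the CLI.
```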


@@ -0,0 +1,263 @@
"""Extra analyses of the traits"""
# %% Imports
import utils
import polars as pl
import argparse
import json
import re
from pathlib import Path
from validation import check_straight_liners
# %% Fixed Variables
RESULTS_FILE = 'data/exports/2-4-26/JPMC_Chase Brand Personality_Quant Round 1_February 4, 2026_Labels.csv'
QSF_FILE = 'data/exports/OneDrive_2026-01-21/Soft Launch Data/JPMC_Chase_Brand_Personality_Quant_Round_1.qsf'
# %% CLI argument parsing for batch automation
# When run as script: uv run XX_statistical_significance.script.py --age '["18 to 21 years"]'
# Central filter configuration - add new filters here only
# Format: 'cli_arg_name': 'QualtricsSurvey.options_* attribute name'
FILTER_CONFIG = {
'age': 'options_age',
'gender': 'options_gender',
'ethnicity': 'options_ethnicity',
'income': 'options_income',
'consumer': 'options_consumer',
'business_owner': 'options_business_owner',
'ai_user': 'options_ai_user',
'investable_assets': 'options_investable_assets',
'industry': 'options_industry',
}
def parse_cli_args():
parser = argparse.ArgumentParser(description='Generate quant report with optional filters')
# Dynamically add filter arguments from config
for filter_name in FILTER_CONFIG:
parser.add_argument(f'--{filter_name}', type=str, default=None, help=f'JSON list of {filter_name} values')
parser.add_argument('--filter-name', type=str, default=None, help='Name for this filter combination (used for .txt description file)')
parser.add_argument('--figures-dir', type=str, default=f'figures/traits-likert-analysis/{Path(RESULTS_FILE).parts[2]}', help='Override the default figures directory')
# Only parse if running as script (not in Jupyter/interactive)
try:
# Check if running in Jupyter by looking for ipykernel
get_ipython() # noqa: F821 # type: ignore
# Return namespace with all filters set to None
no_filters = {f: None for f in FILTER_CONFIG}
# Use the same default as argparse
default_fig_dir = f'figures/traits-likert-analysis/{Path(RESULTS_FILE).parts[2]}'
return argparse.Namespace(**no_filters, filter_name=None, figures_dir=default_fig_dir)
except NameError:
args = parser.parse_args()
# Parse JSON strings to lists
for filter_name in FILTER_CONFIG:
val = getattr(args, filter_name)
setattr(args, filter_name, json.loads(val) if val else None)
return args
cli_args = parse_cli_args()
# %%
S = utils.QualtricsSurvey(RESULTS_FILE, QSF_FILE, figures_dir=cli_args.figures_dir)
data_all = S.load_data()
# %% Build filtered dataset based on CLI args
# CLI args: None means "no filter applied" - filter_data() will skip None filters
# Build filter values dict dynamically from FILTER_CONFIG
_active_filters = {filter_name: getattr(cli_args, filter_name) for filter_name in FILTER_CONFIG}
_d = S.filter_data(data_all, **_active_filters)
# Write filter description file if filter-name is provided
if cli_args.filter_name and S.fig_save_dir:
# Get the filter slug (e.g., "All_Respondents", "Cons-Starter", etc.)
_filter_slug = S._get_filter_slug()
_filter_slug_dir = S.fig_save_dir / _filter_slug
_filter_slug_dir.mkdir(parents=True, exist_ok=True)
# Build filter description
_filter_desc_lines = [
f"Filter: {cli_args.filter_name}",
"",
"Applied Filters:",
]
_short_desc_parts = []
for filter_name, options_attr in FILTER_CONFIG.items():
all_options = getattr(S, options_attr)
values = _active_filters[filter_name]
display_name = filter_name.replace('_', ' ').title()
# None means no filter applied (same as "All")
if values is not None and values != all_options:
_short_desc_parts.append(f"{display_name}: {', '.join(values)}")
_filter_desc_lines.append(f" {display_name}: {', '.join(values)}")
else:
_filter_desc_lines.append(f" {display_name}: All")
# Write detailed description INSIDE the filter-slug directory
# Sanitize filter name for filename usage (replace / and other chars)
_safe_filter_name = re.sub(r'[^\w\s-]', '_', cli_args.filter_name)
_filter_file = _filter_slug_dir / f"{_safe_filter_name}.txt"
_filter_file.write_text('\n'.join(_filter_desc_lines))
# Append to summary index file at figures/<export_date>/filter_index.txt
_summary_file = S.fig_save_dir / "filter_index.txt"
_short_desc = "; ".join(_short_desc_parts) if _short_desc_parts else "All Respondents"
_summary_line = f"{_filter_slug} | {cli_args.filter_name} | {_short_desc}\n"
# Append or create the summary file
if _summary_file.exists():
_existing = _summary_file.read_text()
# Avoid duplicate entries for same slug
if _filter_slug not in _existing:
with _summary_file.open('a') as f:
f.write(_summary_line)
else:
_header = "Filter Index\n" + "=" * 80 + "\n\n"
_header += "Directory | Filter Name | Description\n"
_header += "-" * 80 + "\n"
_summary_file.write_text(_header + _summary_line)
# Save to logical variable name for further analysis
data = _d
data.collect()
# %% Voices per trait
ss_or, choice_map_or = S.get_ss_orange_red(data)
ss_gb, choice_map_gb = S.get_ss_green_blue(data)
# Combine the data
ss_all = ss_or.join(ss_gb, on='_recordId')
_d = ss_all.collect()
choice_map = {**choice_map_or, **choice_map_gb}
# print(_d.head())
# print(choice_map)
ss_long = utils.process_speaking_style_data(ss_all, choice_map)
# %% Create plots
for i, trait in enumerate(ss_long.select("Description").unique().to_series().to_list()):
trait_d = ss_long.filter(pl.col("Description") == trait)
S.plot_speaking_style_trait_scores(trait_d, title=trait.replace(":", ""), height=550, color_gender=True)
# %% Filter out straight-liner (PER TRAIT) and re-plot to see if any changes
# Save with different filename suffix so we can compare with/without straight-liners
print("\n--- Straight-lining Checks on TRAITS ---")
sl_report_traits, sl_traits_df = check_straight_liners(ss_all, max_score=5)
sl_traits_df
# %%
if sl_traits_df is not None and not sl_traits_df.is_empty():
sl_ids = sl_traits_df.select(pl.col("Record ID").unique()).to_series().to_list()
n_sl_groups = sl_traits_df.height
print(f"\nExcluding {n_sl_groups} straight-lined question blocks from {len(sl_ids)} respondents.")
# Create key in ss_long to match sl_traits_df for anti-join
# Question Group key in sl_traits_df is like "SS_Orange_Red__V14"
# ss_long has "Style_Group" and "Voice"
ss_long_w_key = ss_long.with_columns(
(pl.col("Style_Group") + "__" + pl.col("Voice")).alias("Question Group")
)
# Prepare filter table: Record ID + Question Group
sl_filter = sl_traits_df.select([
pl.col("Record ID").alias("_recordId"),
pl.col("Question Group")
])
# Anti-join to remove specific question blocks that were straight-lined
ss_long_clean = ss_long_w_key.join(sl_filter, on=["_recordId", "Question Group"], how="anti").drop("Question Group")
# Re-plot with suffix in title
print("Re-plotting traits (Cleaned)...")
for i, trait in enumerate(ss_long_clean.select("Description").unique().to_series().to_list()):
trait_d = ss_long_clean.filter(pl.col("Description") == trait)
# Modify title to create unique filename (and display title)
title_clean = trait.replace(":", "") + " (Excl. Straight-Liners)"
S.plot_speaking_style_trait_scores(trait_d, title=title_clean, height=550, color_gender=True)
else:
print("No straight-liners found on traits.")
# %% Compare All vs Cleaned
if sl_traits_df is not None and not sl_traits_df.is_empty():
print("Generating Comparison Plots (All vs Cleaned)...")
# Always apply the per-question-group filtering here to ensure consistency
# (Matches the logic used in the re-plotting section above)
print("Applying filter to remove straight-lined question blocks...")
ss_long_w_key = ss_long.with_columns(
(pl.col("Style_Group") + "__" + pl.col("Voice")).alias("Question Group")
)
sl_filter = sl_traits_df.select([
pl.col("Record ID").alias("_recordId"),
pl.col("Question Group")
])
ss_long_clean = ss_long_w_key.join(sl_filter, on=["_recordId", "Question Group"], how="anti").drop("Question Group")
sl_ids = sl_traits_df.select(pl.col("Record ID").unique()).to_series().to_list()
# --- Verification Prints ---
print(f"\n--- Verification of Filter ---")
print(f"Original Row Count: {ss_long.height}")
print(f"Number of Straight-Liner Question Blocks: {sl_traits_df.height}")
print(f"Sample IDs affected: {sl_ids[:5]}")
print(f"Cleaned Row Count: {ss_long_clean.height}")
print(f"Rows Removed: {ss_long.height - ss_long_clean.height}")
# Verify removal
# Re-construct key to verify
ss_long_check = ss_long.with_columns(
(pl.col("Style_Group") + "__" + pl.col("Voice")).alias("Question Group")
)
sl_filter_check = sl_traits_df.select([
pl.col("Record ID").alias("_recordId"),
pl.col("Question Group")
])
should_be_removed = ss_long_check.join(sl_filter_check, on=["_recordId", "Question Group"], how="inner").height
print(f"Discrepancy Check (Should be 0): { (ss_long.height - ss_long_clean.height) - should_be_removed }")
# Show what was removed (the straight lining behavior)
print("\nSample of Straight-Liner Data (Values that caused removal):")
print(sl_traits_df.head(5))
print("-" * 30 + "\n")
# ---------------------------
for i, trait in enumerate(ss_long.select("Description").unique().to_series().to_list()):
# Get data for this trait from both datasets
trait_d_all = ss_long.filter(pl.col("Description") == trait)
trait_d_clean = ss_long_clean.filter(pl.col("Description") == trait)
# Plot comparison
title_comp = trait.replace(":", "") + " (Impact of Straight-Liners)"
S.plot_speaking_style_trait_scores_comparison(
trait_d_all,
trait_d_clean,
title=title_comp,
height=600 # Slightly taller for grouped bars
)

XX_quant_report.script.py

@@ -0,0 +1,849 @@
__generated_with = "0.19.7"
# %%
import marimo as mo
import polars as pl
from pathlib import Path
import argparse
import json
import re
from validation import check_progress, duration_validation, check_straight_liners
from utils import QualtricsSurvey, combine_exclusive_columns, calculate_weighted_ranking_scores
import utils
from speaking_styles import SPEAKING_STYLES
# %% Fixed Variables
RESULTS_FILE = 'data/exports/2-4-26/JPMC_Chase Brand Personality_Quant Round 1_February 4, 2026_Labels.csv'
# RESULTS_FILE = 'data/exports/debug/JPMC_Chase Brand Personality_Quant Round 1_February 2, 2026_Labels.csv'
QSF_FILE = 'data/exports/OneDrive_2026-01-21/Soft Launch Data/JPMC_Chase_Brand_Personality_Quant_Round_1.qsf'
# %%
# CLI argument parsing for batch automation
# When run as script: python 03_quant_report.script.py --age '["18 to 21 years"]' --consumer '["Starter"]'
# When run in Jupyter: args will use defaults (all filters = None = all options selected)
# Central filter configuration - add new filters here only
# Format: 'cli_arg_name': 'QualtricsSurvey.options_* attribute name'
FILTER_CONFIG = {
'age': 'options_age',
'gender': 'options_gender',
'ethnicity': 'options_ethnicity',
'income': 'options_income',
'consumer': 'options_consumer',
'business_owner': 'options_business_owner',
'ai_user': 'options_ai_user',
'investable_assets': 'options_investable_assets',
'industry': 'options_industry',
}
def parse_cli_args():
parser = argparse.ArgumentParser(description='Generate quant report with optional filters')
# Dynamically add filter arguments from config
for filter_name in FILTER_CONFIG:
parser.add_argument(f'--{filter_name}', type=str, default=None, help=f'JSON list of {filter_name} values')
parser.add_argument('--filter-name', type=str, default=None, help='Name for this filter combination (used for .txt description file)')
parser.add_argument('--figures-dir', type=str, default=f'figures/{Path(RESULTS_FILE).parts[2]}', help='Override the default figures directory')
parser.add_argument('--best-character', type=str, default="the_coach", help='Slug of the best chosen character (default: "the_coach")')
parser.add_argument('--sl-threshold', type=int, default=None, help='Exclude respondents who straight-lined >= N question groups (e.g. 3 removes anyone with 3+ straight-lined groups)')
parser.add_argument('--voice-ranking-filter', type=str, default=None, choices=['only-missing', 'exclude-missing'], help='Filter by voice ranking completeness: "only-missing" keeps only respondents missing QID98 ranking data, "exclude-missing" removes them')
# Only parse if running as script (not in Jupyter/interactive)
try:
# Check if running in Jupyter by looking for ipykernel
get_ipython() # noqa: F821 # type: ignore
# Return namespace with all filters set to None
no_filters = {f: None for f in FILTER_CONFIG}
        return argparse.Namespace(**no_filters, filter_name=None, figures_dir=f'figures/{Path(RESULTS_FILE).parts[2]}', best_character="the_coach", sl_threshold=None, voice_ranking_filter=None)
except NameError:
args = parser.parse_args()
# Parse JSON strings to lists
for filter_name in FILTER_CONFIG:
val = getattr(args, filter_name)
setattr(args, filter_name, json.loads(val) if val else None)
return args
cli_args = parse_cli_args()
BEST_CHOSEN_CHARACTER = cli_args.best_character
# %%
S = QualtricsSurvey(RESULTS_FILE, QSF_FILE, figures_dir=cli_args.figures_dir)
try:
data_all = S.load_data()
except NotImplementedError as e:
mo.stop(True, mo.md(f"**⚠️ {str(e)}**"))
# %% Build filtered dataset based on CLI args
# CLI args: None means "no filter applied" - filter_data() will skip None filters
# Build filter values dict dynamically from FILTER_CONFIG
_active_filters = {filter_name: getattr(cli_args, filter_name) for filter_name in FILTER_CONFIG}
# %% Apply filters
_d = S.filter_data(data_all, **_active_filters)
# Write filter description file if filter-name is provided
if cli_args.filter_name and S.fig_save_dir:
# Get the filter slug (e.g., "All_Respondents", "Cons-Starter", etc.)
_filter_slug = S._get_filter_slug()
_filter_slug_dir = S.fig_save_dir / _filter_slug
_filter_slug_dir.mkdir(parents=True, exist_ok=True)
# Build filter description
_filter_desc_lines = [
f"Filter: {cli_args.filter_name}",
"",
"Applied Filters:",
]
_short_desc_parts = []
for filter_name, options_attr in FILTER_CONFIG.items():
all_options = getattr(S, options_attr)
values = _active_filters[filter_name]
display_name = filter_name.replace('_', ' ').title()
# None means no filter applied (same as "All")
if values is not None and values != all_options:
_short_desc_parts.append(f"{display_name}: {', '.join(values)}")
_filter_desc_lines.append(f" {display_name}: {', '.join(values)}")
else:
_filter_desc_lines.append(f" {display_name}: All")
# Write detailed description INSIDE the filter-slug directory
# Sanitize filter name for filename usage (replace / and other chars)
_safe_filter_name = re.sub(r'[^\w\s-]', '_', cli_args.filter_name)
_filter_file = _filter_slug_dir / f"{_safe_filter_name}.txt"
_filter_file.write_text('\n'.join(_filter_desc_lines))
# Append to summary index file at figures/<export_date>/filter_index.txt
_summary_file = S.fig_save_dir / "filter_index.txt"
_short_desc = "; ".join(_short_desc_parts) if _short_desc_parts else "All Respondents"
_summary_line = f"{_filter_slug} | {cli_args.filter_name} | {_short_desc}\n"
# Append or create the summary file
if _summary_file.exists():
_existing = _summary_file.read_text()
# Avoid duplicate entries for same slug
if _filter_slug not in _existing:
with _summary_file.open('a') as f:
f.write(_summary_line)
else:
_header = "Filter Index\n" + "=" * 80 + "\n\n"
_header += "Directory | Filter Name | Description\n"
_header += "-" * 80 + "\n"
_summary_file.write_text(_header + _summary_line)
# %% Apply straight-liner threshold filter (if specified)
# Removes respondents who straight-lined >= N question groups across
# speaking style and voice scale questions.
if cli_args.sl_threshold is not None:
_sl_n = cli_args.sl_threshold
S.sl_threshold = _sl_n # Store on Survey so filter slug/description include it
print(f"Applying straight-liner filter: excluding respondents with ≥{_sl_n} straight-lined question groups...")
_n_before = _d.select(pl.len()).collect().item()
# Extract question groups with renamed columns for check_straight_liners
_sl_ss_or, _ = S.get_ss_orange_red(_d)
_sl_ss_gb, _ = S.get_ss_green_blue(_d)
_sl_vs, _ = S.get_voice_scale_1_10(_d)
_sl_all_q = _sl_ss_or.join(_sl_ss_gb, on='_recordId').join(_sl_vs, on='_recordId')
_, _sl_df = check_straight_liners(_sl_all_q, max_score=5)
if _sl_df is not None and not _sl_df.is_empty():
# Count straight-lined question groups per respondent
_sl_counts = (
_sl_df
.group_by("Record ID")
.agg(pl.len().alias("sl_count"))
.filter(pl.col("sl_count") >= _sl_n)
.select(pl.col("Record ID").alias("_recordId"))
)
# Anti-join to remove offending respondents
_d = _d.collect().join(_sl_counts, on="_recordId", how="anti").lazy()
# Update filtered data on the Survey object so sample size is correct
S.data_filtered = _d
_n_after = _d.select(pl.len()).collect().item()
    print(f"  Removed {_n_before - _n_after} respondents ({_n_before} → {_n_after})")
else:
print(" No straight-liners detected — no respondents removed.")
# %% Apply voice-ranking completeness filter (if specified)
# Keeps only / excludes respondents who are missing the explicit voice
# ranking question (QID98) despite completing the top-3 selection (QID36).
if cli_args.voice_ranking_filter is not None:
S.voice_ranking_filter = cli_args.voice_ranking_filter # Store on Survey so filter slug/description include it
_vr_missing = S.get_top_3_voices_missing_ranking(_d)
_vr_missing_ids = _vr_missing.select('_recordId')
_n_before = _d.select(pl.len()).collect().item()
if cli_args.voice_ranking_filter == 'only-missing':
print(f"Voice ranking filter: keeping ONLY respondents missing QID98 ranking data...")
_d = _d.collect().join(_vr_missing_ids, on='_recordId', how='inner').lazy()
elif cli_args.voice_ranking_filter == 'exclude-missing':
print(f"Voice ranking filter: EXCLUDING respondents missing QID98 ranking data...")
_d = _d.collect().join(_vr_missing_ids, on='_recordId', how='anti').lazy()
S.data_filtered = _d
_n_after = _d.select(pl.len()).collect().item()
    print(f"  {_n_before} → {_n_after} respondents ({_vr_missing_ids.height} missing ranking data)")
# Save to logical variable name for further analysis
data = _d
data.collect()
# %%
# Check if all business owners are missing a 'Consumer type' in demographics
# assert all([a is None for a in data_all.filter(pl.col('QID4') == 'Yes').collect()['Consumer'].unique()]) , "Not all business owners are missing 'Consumer type' in demographics."
# %%
mo.md(r"""
# Demographic Distributions
""")
# %%
demo_plot_cols = [
'Age',
'Gender',
# 'Race/Ethnicity',
'Bussiness_Owner',
'Consumer'
]
# %%
_content = """
"""
for c in demo_plot_cols:
_fig = S.plot_demographic_distribution(
data=S.get_demographics(data)[0],
column=c,
title=f"{c.replace('Bussiness', 'Business').replace('_', ' ')} Distribution of Survey Respondents"
)
_content += f"""{mo.ui.altair_chart(_fig)}\n\n"""
mo.md(_content)
# %%
mo.md(r"""
---
# Brand Character Results
""")
# %%
mo.md(r"""
## Best performing: Original vs Refined frankenstein
""")
# %%
char_refine_rank = S.get_character_refine(data)[0]
# print(char_rank.collect().head())
print(char_refine_rank.collect().head())
# %%
mo.md(r"""
## Character ranking points
""")
# %%
mo.md(r"""
## Character ranking 1-2-3
""")
# %%
char_rank = S.get_character_ranking(data)[0]
# %%
char_rank_weighted = calculate_weighted_ranking_scores(char_rank)
S.plot_weighted_ranking_score(char_rank_weighted, title="Most Popular Character - Weighted Popularity Score<br>(1st=3pts, 2nd=2pts, 3rd=1pt)", x_label='Voice')
# %%
S.plot_top3_ranking_distribution(char_rank, x_label='Character Personality', title='Character Personality: Rankings Top 3')
# %%
mo.md(r"""
### Statistical Significance Character Ranking
""")
# %%
# _pairwise_df, _meta = S.compute_ranking_significance(char_rank)
# # print(_pairwise_df.columns)
# mo.md(f"""
# {mo.ui.altair_chart(S.plot_significance_heatmap(_pairwise_df, metadata=_meta))}
# {mo.ui.altair_chart(S.plot_significance_summary(_pairwise_df, metadata=_meta))}
# """)
# %%
mo.md(r"""
## Character Ranking: times 1st place
""")
# %%
S.plot_most_ranked_1(char_rank, title="Most Popular Character<br>(Number of Times Ranked 1st)", x_label='Character Personality')
# %%
mo.md(r"""
## Prominent predefined personality traits wordcloud
""")
# %%
top8_traits = S.get_top_8_traits(data)[0]
S.plot_traits_wordcloud(
data=top8_traits,
column='Top_8_Traits',
title="Most Prominent Personality Traits",
)
# %%
mo.md(r"""
## Trait frequency per brand character
""")
# %%
char_df = S.get_character_refine(data)[0]
# %%
from theme import ColorPalette
# Assuming you already have char_df (your data from get_character_refine or similar)
characters = ['Bank Teller', 'Familiar Friend', 'The Coach', 'Personal Assistant']
character_colors = {
'Bank Teller': (ColorPalette.CHARACTER_BANK_TELLER, ColorPalette.CHARACTER_BANK_TELLER_HIGHLIGHT),
'Familiar Friend': (ColorPalette.CHARACTER_FAMILIAR_FRIEND, ColorPalette.CHARACTER_FAMILIAR_FRIEND_HIGHLIGHT),
'The Coach': (ColorPalette.CHARACTER_COACH, ColorPalette.CHARACTER_COACH_HIGHLIGHT),
'Personal Assistant': (ColorPalette.CHARACTER_PERSONAL_ASSISTANT, ColorPalette.CHARACTER_PERSONAL_ASSISTANT_HIGHLIGHT),
}
# Build consistent sort order (by total frequency across all characters)
all_trait_counts = {}
for char in characters:
freq_df, _ = S.transform_character_trait_frequency(char_df, char)
for row in freq_df.iter_rows(named=True):
all_trait_counts[row['trait']] = all_trait_counts.get(row['trait'], 0) + row['count']
consistent_sort_order = sorted(all_trait_counts.keys(), key=lambda x: -all_trait_counts[x])
_content = """"""
# Generate 4 plots (one per character)
for char in characters:
freq_df, _ = S.transform_character_trait_frequency(char_df, char)
main_color, highlight_color = character_colors[char]
chart = S.plot_single_character_trait_frequency(
data=freq_df,
character_name=char,
bar_color=main_color,
highlight_color=highlight_color,
trait_sort_order=consistent_sort_order,
)
_content += f"""
{mo.ui.altair_chart(chart)}
"""
mo.md(_content)
# %%
mo.md(r"""
## Statistical significance best characters
see chat
> example: if nos. 1 and 2 don't differ significantly from each other but both differ from no. 3, for instance, that would also be a good result. Just thinking along about how I can present it, you know what I mean? :)
>
""")
# %%
mo.md(r"""
---
# Spoken Voice Results
""")
# %%
COLOR_GENDER = True
# %%
mo.md(r"""
## Top 8 Most Chosen out of 18
""")
# %%
v_18_8_3 = S.get_18_8_3(data)[0]
# %%
S.plot_voice_selection_counts(v_18_8_3, title="Top 8 Voice Selection from 18 Voices", x_label='Voice', color_gender=COLOR_GENDER)
# %%
mo.md(r"""
## Top 3 most chosen out of 8
""")
# %%
S.plot_top3_selection_counts(v_18_8_3, title="Top 3 Voice Selection Counts from 8 Voices", x_label='Voice', color_gender=COLOR_GENDER)
# %%
mo.md(r"""
## Voice Ranking Weighted Score
""")
# %%
top3_voices = S.get_top_3_voices(data)[0]
top3_voices_weighted = calculate_weighted_ranking_scores(top3_voices)
# %%
S.plot_weighted_ranking_score(top3_voices_weighted, title="Most Popular Voice - Weighted Popularity Score<br>(1st = 3pts, 2nd = 2pts, 3rd = 1pt)", color_gender=COLOR_GENDER)
# %%
mo.md(r"""
## Which voice is ranked best in the ranking question for top 3?
(not best 3 out of 8 question)
""")
# %%
S.plot_ranking_distribution(top3_voices, x_label='Voice', title="Distribution of Top 3 Voice Rankings (1st, 2nd, 3rd)", color_gender=COLOR_GENDER)
# %%
mo.md(r"""
### Statistical significance for voice ranking
""")
# %%
# print(top3_voices.collect().head())
# %%
# _pairwise_df, _metadata = S.compute_ranking_significance(
# top3_voices,alpha=0.05,correction="none")
# # View significant pairs
# # print(pairwise_df.filter(pl.col('significant') == True))
# # Create heatmap visualization
# _heatmap = S.plot_significance_heatmap(
# _pairwise_df,
# metadata=_metadata,
# title="Weighted Voice Ranking Significance<br>(Pairwise Comparisons)"
# )
# # Create summary bar chart
# _summary = S.plot_significance_summary(
# _pairwise_df,
# metadata=_metadata
# )
# mo.md(f"""
# {mo.ui.altair_chart(_heatmap)}
# {mo.ui.altair_chart(_summary)}
# """)
# %%
## Voice Ranked 1st the most
# %%
S.plot_most_ranked_1(top3_voices, title="Most Popular Voice<br>(Number of Times Ranked 1st)", x_label='Voice', color_gender=COLOR_GENDER)
# %%
mo.md(r"""
## Voice Scale 1-10
""")
# %%
# Get your voice scale data (from notebook)
voice_1_10, _ = S.get_voice_scale_1_10(data)
S.plot_average_scores_with_counts(voice_1_10, x_label='Voice', domain=[1,10], title="Voice General Impression (Scale 1-10)", color_gender=COLOR_GENDER)
# %%
mo.md(r"""
### Statistical Significance (Scale 1-10)
""")
# %%
# Compute pairwise significance tests
# pairwise_df, metadata = S.compute_pairwise_significance(
# voice_1_10,
# test_type="mannwhitney", # or "ttest", "chi2", "auto"
# alpha=0.05,
# correction="bonferroni" # or "holm", "none"
# )
# # View significant pairs
# # print(pairwise_df.filter(pl.col('significant') == True))
# # Create heatmap visualization
# _heatmap = S.plot_significance_heatmap(
# pairwise_df,
# metadata=metadata,
# title="Voice Rating Significance<br>(Pairwise Comparisons)"
# )
# # Create summary bar chart
# _summary = S.plot_significance_summary(
# pairwise_df,
# metadata=metadata
# )
# mo.md(f"""
# {mo.ui.altair_chart(_heatmap)}
# {mo.ui.altair_chart(_summary)}
# """)
# %%
mo.md(r"""
## Ranking points for Voice per Chosen Brand Character
**missing mapping**
""")
# %%
mo.md(r"""
## Correlation Speaking Styles
""")
# %%
ss_or, choice_map_or = S.get_ss_orange_red(data)
ss_gb, choice_map_gb = S.get_ss_green_blue(data)
# Combine the data
ss_all = ss_or.join(ss_gb, on='_recordId')
_d = ss_all.collect()
choice_map = {**choice_map_or, **choice_map_gb}
# print(_d.head())
# print(choice_map)
df_style = utils.process_speaking_style_data(ss_all, choice_map)
ss_long = df_style  # alias; avoids processing the same data twice
vscales = voice_1_10  # reuse the voice scale data loaded above
df_scale_long = utils.process_voice_scale_data(vscales)
joined_scale = df_style.join(df_scale_long, on=["_recordId", "Voice"], how="inner")
df_ranking = utils.process_voice_ranking_data(top3_voices)
joined_ranking = df_style.join(df_ranking, on=['_recordId', 'Voice'], how='inner')
# %%
joined_ranking.head()
# %%
mo.md(r"""
### Colors vs Scale 1-10
""")
# %%
# Transform to get one row per color with average correlation
color_corr_scale, _ = utils.transform_speaking_style_color_correlation(joined_scale, SPEAKING_STYLES)
S.plot_speaking_style_color_correlation(
data=color_corr_scale,
title="Correlation: Speaking Style Colors and Voice Scale 1-10"
)
# %%
mo.md(r"""
### Colors vs Ranking Points
""")
# %%
color_corr_ranking, _ = utils.transform_speaking_style_color_correlation(
joined_ranking,
SPEAKING_STYLES,
target_column="Ranking_Points"
)
S.plot_speaking_style_color_correlation(
data=color_corr_ranking,
title="Correlation: Speaking Style Colors and Voice Ranking Points"
)
# %%
# Gender-filtered correlation plots (Male vs Female voices)
from reference import VOICE_GENDER_MAPPING
MALE_VOICES = [v for v, g in VOICE_GENDER_MAPPING.items() if g == "Male"]
FEMALE_VOICES = [v for v, g in VOICE_GENDER_MAPPING.items() if g == "Female"]
# Filter joined data by voice gender
joined_scale_male = joined_scale.filter(pl.col("Voice").is_in(MALE_VOICES))
joined_scale_female = joined_scale.filter(pl.col("Voice").is_in(FEMALE_VOICES))
joined_ranking_male = joined_ranking.filter(pl.col("Voice").is_in(MALE_VOICES))
joined_ranking_female = joined_ranking.filter(pl.col("Voice").is_in(FEMALE_VOICES))
# Colors vs Scale 1-10 (grouped by voice gender)
S.plot_speaking_style_color_correlation_by_gender(
data_male=joined_scale_male,
data_female=joined_scale_female,
speaking_styles=SPEAKING_STYLES,
target_column="Voice_Scale_Score",
title="Correlation: Speaking Style Colors and Voice Scale 1-10 (by Voice Gender)",
filename="correlation_speaking_style_and_voice_scale_1-10_by_voice_gender_color",
)
# Colors vs Ranking Points (grouped by voice gender)
S.plot_speaking_style_color_correlation_by_gender(
data_male=joined_ranking_male,
data_female=joined_ranking_female,
speaking_styles=SPEAKING_STYLES,
target_column="Ranking_Points",
title="Correlation: Speaking Style Colors and Voice Ranking Points (by Voice Gender)",
filename="correlation_speaking_style_and_voice_ranking_points_by_voice_gender_color",
)
# %%
mo.md(r"""
### Individual Traits vs Scale 1-10
""")
# %%
_content = """"""
for _style, _traits in SPEAKING_STYLES.items():
# print(f"Correlation plot for {style}...")
_fig = S.plot_speaking_style_scale_correlation(
data=joined_scale,
style_color=_style,
style_traits=_traits,
title=f"Correlation: Speaking Style {_style} and Voice Scale 1-10",
)
_content += f"""
#### Speaking Style **{_style}**:
{mo.ui.altair_chart(_fig)}
"""
mo.md(_content)
# %%
mo.md(r"""
### Individual Traits vs Ranking Points
""")
# %%
_content = """"""
for _style, _traits in SPEAKING_STYLES.items():
# print(f"Correlation plot for {style}...")
_fig = S.plot_speaking_style_ranking_correlation(
data=joined_ranking,
style_color=_style,
style_traits=_traits,
title=f"Correlation: Speaking Style {_style} and Voice Ranking Points",
)
_content += f"""
#### Speaking Style **{_style}**:
{mo.ui.altair_chart(_fig)}
"""
mo.md(_content)
# %%
# Individual Traits vs Scale 1-10 (grouped by voice gender)
_content = """### Individual Traits vs Scale 1-10 (by Voice Gender)\n\n"""
for _style, _traits in SPEAKING_STYLES.items():
    _fig = S.plot_speaking_style_scale_correlation_by_gender(
        data_male=joined_scale_male,
        data_female=joined_scale_female,
        style_color=_style,
        style_traits=_traits,
        title=f"Correlation: Speaking Style {_style} and Voice Scale 1-10 (by Voice Gender)",
        filename=f"correlation_speaking_style_and_voice_scale_1-10_by_voice_gender_{_style.lower()}",
    )
    _content += f"""
#### Speaking Style **{_style}**:
{mo.ui.altair_chart(_fig)}
"""
mo.md(_content)
# %%
# Individual Traits vs Ranking Points (grouped by voice gender)
_content = """### Individual Traits vs Ranking Points (by Voice Gender)\n\n"""
for _style, _traits in SPEAKING_STYLES.items():
    _fig = S.plot_speaking_style_ranking_correlation_by_gender(
        data_male=joined_ranking_male,
        data_female=joined_ranking_female,
        style_color=_style,
        style_traits=_traits,
        title=f"Correlation: Speaking Style {_style} and Voice Ranking Points (by Voice Gender)",
        filename=f"correlation_speaking_style_and_voice_ranking_points_by_voice_gender_{_style.lower()}",
    )
    _content += f"""
#### Speaking Style **{_style}**:
{mo.ui.altair_chart(_fig)}
"""
mo.md(_content)
# %%
mo.md(r"""
## Correlations when "Best Brand Character" is Chosen
For each of the 4 brand characters, filter the dataset to only those respondents
who selected that character as their #1 choice.
""")
# %%
# Prepare character-filtered data subsets
char_rank_for_filter = S.get_character_ranking(data)[0].collect()
CHARACTER_FILTER_MAP = {
    'Familiar Friend': 'Character_Ranking_Familiar_Friend',
    'The Coach': 'Character_Ranking_The_Coach',
    'Personal Assistant': 'Character_Ranking_The_Personal_Assistant',
    'Bank Teller': 'Character_Ranking_The_Bank_Teller',
}
def get_filtered_data_for_character(char_name: str) -> tuple[pl.DataFrame, pl.DataFrame, int]:
    """Filter joined_scale and joined_ranking to respondents who ranked char_name #1."""
    col = CHARACTER_FILTER_MAP[char_name]
    respondents = char_rank_for_filter.filter(pl.col(col) == 1).select('_recordId')
    n = respondents.height
    filtered_scale = joined_scale.join(respondents, on='_recordId', how='inner')
    filtered_ranking = joined_ranking.join(respondents, on='_recordId', how='inner')
    return filtered_scale, filtered_ranking, n
def _char_filename(char_name: str, suffix: str) -> str:
    """Generate filename for character-filtered plots (without n-value).

    Format: bc_ranked_1_{suffix}__{char_slug}
    This groups all plot types together in directory listings.
    """
    char_slug = char_name.lower().replace(' ', '_')
    return f"bc_ranked_1_{suffix}__{char_slug}"
# %%
# ### Voice Weighted Ranking Score (by Best Character)
for char_name in CHARACTER_FILTER_MAP:
    _, _, n = get_filtered_data_for_character(char_name)
    # Get top-3 voices for this character subset using _recordIds
    respondents = char_rank_for_filter.filter(
        pl.col(CHARACTER_FILTER_MAP[char_name]) == 1
    ).select('_recordId')
    # Collect top3_voices if it's a LazyFrame, then join
    top3_df = top3_voices.collect() if isinstance(top3_voices, pl.LazyFrame) else top3_voices
    filtered_top3 = top3_df.join(respondents, on='_recordId', how='inner')
    weighted = calculate_weighted_ranking_scores(filtered_top3)
    S.plot_weighted_ranking_score(
        data=weighted,
        title=f'"{char_name}" Ranked #1 (n={n})<br>Most Popular Voice - Weighted Score (1st=3pts, 2nd=2pts, 3rd=1pt)',
        filename=_char_filename(char_name, "voice_weighted_ranking_score"),
        color_gender=COLOR_GENDER,
    )
# %%
# ### Voice Scale 1-10 Average Scores (by Best Character)
for char_name in CHARACTER_FILTER_MAP:
    _, _, n = get_filtered_data_for_character(char_name)
    # Get voice scale data for this character subset using _recordIds
    respondents = char_rank_for_filter.filter(
        pl.col(CHARACTER_FILTER_MAP[char_name]) == 1
    ).select('_recordId')
    # Collect voice_1_10 if it's a LazyFrame, then join
    voice_1_10_df = voice_1_10.collect() if isinstance(voice_1_10, pl.LazyFrame) else voice_1_10
    filtered_voice_1_10 = voice_1_10_df.join(respondents, on='_recordId', how='inner')
    S.plot_average_scores_with_counts(
        data=filtered_voice_1_10,
        title=f'"{char_name}" Ranked #1 (n={n})<br>Voice General Impression (Scale 1-10)',
        filename=_char_filename(char_name, "voice_scale_1-10"),
        x_label='Voice',
        domain=[1, 10],
        color_gender=COLOR_GENDER,
    )
# %%
# ### Speaking Style Colors vs Scale 1-10 (only for Best Character)
for char_name in CHARACTER_FILTER_MAP:
    if char_name.lower().replace(' ', '_') != BEST_CHOSEN_CHARACTER:
        continue
    filtered_scale, _, n = get_filtered_data_for_character(char_name)
    color_corr, _ = utils.transform_speaking_style_color_correlation(filtered_scale, SPEAKING_STYLES)
    S.plot_speaking_style_color_correlation(
        data=color_corr,
        title=f'"{char_name}" Ranked #1 (n={n})<br>Correlation: Speaking Style Colors vs Voice Scale 1-10',
        filename=_char_filename(char_name, "colors_vs_voice_scale_1-10"),
    )
# %%
# ### Speaking Style Colors vs Ranking Points (only for Best Character)
for char_name in CHARACTER_FILTER_MAP:
    if char_name.lower().replace(' ', '_') != BEST_CHOSEN_CHARACTER:
        continue
    _, filtered_ranking, n = get_filtered_data_for_character(char_name)
    color_corr, _ = utils.transform_speaking_style_color_correlation(
        filtered_ranking, SPEAKING_STYLES, target_column="Ranking_Points"
    )
    S.plot_speaking_style_color_correlation(
        data=color_corr,
        title=f'"{char_name}" Ranked #1 (n={n})<br>Correlation: Speaking Style Colors vs Voice Ranking Points',
        filename=_char_filename(char_name, "colors_vs_voice_ranking_points"),
    )
# %%
# ### Individual Traits vs Scale 1-10 (only for Best Character)
for _style, _traits in SPEAKING_STYLES.items():
    print(f"--- Speaking Style: {_style} ---")
    for char_name in CHARACTER_FILTER_MAP:
        if char_name.lower().replace(' ', '_') != BEST_CHOSEN_CHARACTER:
            continue
        filtered_scale, _, n = get_filtered_data_for_character(char_name)
        S.plot_speaking_style_scale_correlation(
            data=filtered_scale,
            style_color=_style,
            style_traits=_traits,
            title=f'"{char_name}" Ranked #1 (n={n})<br>Correlation: {_style} vs Voice Scale 1-10',
            filename=_char_filename(char_name, f"{_style.lower()}_vs_voice_scale_1-10"),
        )
# %%
# ### Individual Traits vs Ranking Points (only for Best Character)
for _style, _traits in SPEAKING_STYLES.items():
    print(f"--- Speaking Style: {_style} ---")
    for char_name in CHARACTER_FILTER_MAP:
        if char_name.lower().replace(' ', '_') != BEST_CHOSEN_CHARACTER:
            continue
        _, filtered_ranking, n = get_filtered_data_for_character(char_name)
        S.plot_speaking_style_ranking_correlation(
            data=filtered_ranking,
            style_color=_style,
            style_traits=_traits,
            title=f'"{char_name}" Ranked #1 (n={n})<br>Correlation: {_style} vs Voice Ranking Points',
            filename=_char_filename(char_name, f"{_style.lower()}_vs_voice_ranking_points"),
        )
# %%


@@ -0,0 +1,370 @@
"""Extra statistical significance analyses for quant report."""
# %% Imports
import utils
import polars as pl
import argparse
import json
import re
from pathlib import Path
# %% Fixed Variables
RESULTS_FILE = 'data/exports/2-4-26/JPMC_Chase Brand Personality_Quant Round 1_February 4, 2026_Labels.csv'
QSF_FILE = 'data/exports/OneDrive_2026-01-21/Soft Launch Data/JPMC_Chase_Brand_Personality_Quant_Round_1.qsf'
# %% CLI argument parsing for batch automation
# When run as script: uv run XX_statistical_significance.script.py --age '["18
# Central filter configuration - add new filters here only
# Format: 'cli_arg_name': 'QualtricsSurvey.options_* attribute name'
FILTER_CONFIG = {
'age': 'options_age',
'gender': 'options_gender',
'ethnicity': 'options_ethnicity',
'income': 'options_income',
'consumer': 'options_consumer',
'business_owner': 'options_business_owner',
'ai_user': 'options_ai_user',
'investable_assets': 'options_investable_assets',
'industry': 'options_industry',
}
def parse_cli_args():
    parser = argparse.ArgumentParser(description='Generate quant report with optional filters')
    # Dynamically add filter arguments from config
    for filter_name in FILTER_CONFIG:
        parser.add_argument(f'--{filter_name}', type=str, default=None, help=f'JSON list of {filter_name} values')
    parser.add_argument('--filter-name', type=str, default=None, help='Name for this filter combination (used for .txt description file)')
    parser.add_argument('--figures-dir', type=str, default=f'figures/statistical_significance/{Path(RESULTS_FILE).parts[2]}', help='Override the default figures directory')
    # Only parse if running as script (not in Jupyter/interactive)
    try:
        # Check if running in Jupyter by looking for ipykernel
        get_ipython()  # noqa: F821 # type: ignore
        # Return namespace with all filters set to None
        no_filters = {f: None for f in FILTER_CONFIG}
        # Use the same default as argparse
        default_fig_dir = f'figures/statistical_significance/{Path(RESULTS_FILE).parts[2]}'
        return argparse.Namespace(**no_filters, filter_name=None, figures_dir=default_fig_dir)
    except NameError:
        args = parser.parse_args()
        # Parse JSON strings to lists
        for filter_name in FILTER_CONFIG:
            val = getattr(args, filter_name)
            setattr(args, filter_name, json.loads(val) if val else None)
        return args
cli_args = parse_cli_args()
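The JSON-list convention for filter arguments can be illustrated in a self-contained snippet; the filter names come from `FILTER_CONFIG`, while the example values are hypothetical survey labels:

```python
import argparse
import json

# Build a tiny parser with two of the dynamic filter flags
parser = argparse.ArgumentParser()
for name in ("age", "gender"):
    parser.add_argument(f"--{name}", type=str, default=None)

# Simulate: --age '["18 to 21 years", "22 to 24 years"]' (no --gender flag)
args = parser.parse_args(["--age", '["18 to 21 years", "22 to 24 years"]'])

# Decode each JSON string into a Python list; None stays None (no filter applied)
for name in ("age", "gender"):
    val = getattr(args, name)
    setattr(args, name, json.loads(val) if val else None)
```

After decoding, `args.age` is a list of labels ready for `filter_data()`, and `args.gender` remains `None`, which the filter step skips.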
# %%
S = utils.QualtricsSurvey(RESULTS_FILE, QSF_FILE, figures_dir=cli_args.figures_dir)
data_all = S.load_data()
# %% Build filtered dataset based on CLI args
# CLI args: None means "no filter applied" - filter_data() will skip None filters
# Build filter values dict dynamically from FILTER_CONFIG
_active_filters = {filter_name: getattr(cli_args, filter_name) for filter_name in FILTER_CONFIG}
_d = S.filter_data(data_all, **_active_filters)
# Write filter description file if filter-name is provided
if cli_args.filter_name and S.fig_save_dir:
    # Get the filter slug (e.g., "All_Respondents", "Cons-Starter", etc.)
    _filter_slug = S._get_filter_slug()
    _filter_slug_dir = S.fig_save_dir / _filter_slug
    _filter_slug_dir.mkdir(parents=True, exist_ok=True)
    # Build filter description
    _filter_desc_lines = [
        f"Filter: {cli_args.filter_name}",
        "",
        "Applied Filters:",
    ]
    _short_desc_parts = []
    for filter_name, options_attr in FILTER_CONFIG.items():
        all_options = getattr(S, options_attr)
        values = _active_filters[filter_name]
        display_name = filter_name.replace('_', ' ').title()
        # None means no filter applied (same as "All")
        if values is not None and values != all_options:
            _short_desc_parts.append(f"{display_name}: {', '.join(values)}")
            _filter_desc_lines.append(f" {display_name}: {', '.join(values)}")
        else:
            _filter_desc_lines.append(f" {display_name}: All")
    # Write detailed description INSIDE the filter-slug directory
    # Sanitize filter name for filename usage (replace / and other chars)
    _safe_filter_name = re.sub(r'[^\w\s-]', '_', cli_args.filter_name)
    _filter_file = _filter_slug_dir / f"{_safe_filter_name}.txt"
    _filter_file.write_text('\n'.join(_filter_desc_lines))
    # Append to summary index file at figures/<export_date>/filter_index.txt
    _summary_file = S.fig_save_dir / "filter_index.txt"
    _short_desc = "; ".join(_short_desc_parts) if _short_desc_parts else "All Respondents"
    _summary_line = f"{_filter_slug} | {cli_args.filter_name} | {_short_desc}\n"
    # Append or create the summary file
    if _summary_file.exists():
        _existing = _summary_file.read_text()
        # Avoid duplicate entries for same slug
        if _filter_slug not in _existing:
            with _summary_file.open('a') as f:
                f.write(_summary_line)
    else:
        _header = "Filter Index\n" + "=" * 80 + "\n\n"
        _header += "Directory | Filter Name | Description\n"
        _header += "-" * 80 + "\n"
        _summary_file.write_text(_header + _summary_line)
# Save to logical variable name for further analysis
data = _d
data.collect()  # materialise once so any filter/schema errors surface early
# %% Is "The Coach" character ranked significantly higher than the others?
char_rank = S.get_character_ranking(data)[0]
_pairwise_df, _meta = S.compute_ranking_significance(
char_rank,
alpha=0.05,
correction="none",
)
# %% [markdown]
"""
### Methodology Analysis
**Input Data (`char_rank`)**:
* Generated by `S.get_character_ranking(data)`.
* Contains the ranking values (1st, 2nd, 3rd, 4th) assigned by each respondent to the four options ("The Coach", etc.).
* Columns represent the characters; rows represent individual respondents; values are the numerical rank (1 = Top Choice).
**Processing**:
* The function `compute_ranking_significance` aggregates these rankings to find the **"Rank 1 Share"** (the percentage of respondents who picked that character as their #1 favorite).
* It builds a contingency table of how many times each character was ranked 1st vs. not 1st (or 1st vs. 2nd vs. 3rd).
**Statistical Test**:
* **Test Used**: Pairwise Z-test for two proportions (uncorrected).
* **Comparison**: It compares the **Rank 1 Share** of every pair of characters.
* *Example*: "Is the 42% of people who chose 'Coach' significantly different from the 29% who chose 'Familiar Friend'?"
* **Significance**: A result of `p < 0.05` means the difference in popularity (top-choice preference) is statistically significant and not due to random chance.
"""
# %% Plot heatmap of pairwise significance
S.plot_significance_heatmap(_pairwise_df, metadata=_meta, title="Statistical Significance: Character Top Choice Preference")
# %% Plot summary of significant differences (e.g., which characters are significantly higher than others)
# S.plot_significance_summary(_pairwise_df, metadata=_meta)
# %% [markdown]
"""
# Analysis: Significance of "The Coach"
**Parameters**: `alpha=0.05`, `correction='none'`
* **Rationale**: No correction was applied to allow for detection of all potential pairwise differences (uncorrected p < 0.05). If strict control for family-wise error rate were required (e.g., Bonferroni), the significance threshold would be lower (p < 0.0083).
**Results**:
"The Coach" is the top-ranked option (42.0% Rank 1 share) and shows strong separation from the field.
* **Vs. Bottom Two**: "The Coach" is significantly higher than both "The Bank Teller" (26.9%, p < 0.001) and "Familiar Friend" (29.4%, p < 0.001).
* **Vs. Runner-Up**: "The Coach" is preferred over "The Personal Assistant" (33.4%). The difference of **8.6 percentage points** is statistically significant (p = 0.017) at the standard 0.05 level.
* *Note*: While p=0.017 is significant in isolation, it would not meet the stricter Bonferroni threshold (0.0083). However, the effect size (+8.6%) is commercially meaningful.
**Conclusion**:
Yes, "The Coach" is statistically significantly preferred over the other options. It is clearly ahead of the bottom two and holds a significant lead over the runner-up ("The Personal Assistant") in the direct comparison.
"""
# %% Mentions significance analysis
char_pairwise_df_mentions, _meta_mentions = S.compute_mentions_significance(
char_rank,
alpha=0.05,
correction="none",
)
S.plot_significance_heatmap(
char_pairwise_df_mentions,
metadata=_meta_mentions,
title="Statistical Significance: Character Total Mentions (Top 3 Visibility)"
)
# %% voices analysis
top3_voices = S.get_top_3_voices(data)[0]
_pairwise_df_voice, _metadata = S.compute_ranking_significance(
    top3_voices, alpha=0.05, correction="none")
S.plot_significance_heatmap(
_pairwise_df_voice,
metadata=_metadata,
title="Statistical Significance: Voice Top Choice Preference"
)
# %% Total Mentions Significance (Rank 1+2+3 Combined)
# This tests "Quantity" (Visibility) instead of "Quality" (Preference)
_pairwise_df_mentions, _meta_mentions = S.compute_mentions_significance(
top3_voices,
alpha=0.05,
correction="none"
)
S.plot_significance_heatmap(
_pairwise_df_mentions,
metadata=_meta_mentions,
title="Statistical Significance: Voice Total Mentions (Top 3 Visibility)"
)
# %% Male Voices Only Analysis
import reference
def filter_voices_by_gender(df: pl.DataFrame, target_gender: str) -> pl.DataFrame:
    """Filter ranking columns to keep only those matching target gender."""
    cols_to_keep = []
    # Always keep identifier if present
    if '_recordId' in df.columns:
        cols_to_keep.append('_recordId')
    for col in df.columns:
        # Check if column is a voice column (contains Vxx)
        # Format is typically "Top_3_Voices_ranking__V14"
        if '__V' in col:
            voice_id = col.split('__')[1]
            if reference.VOICE_GENDER_MAPPING.get(voice_id) == target_gender:
                cols_to_keep.append(col)
    return df.select(cols_to_keep)
# Get full ranking data as DataFrame
df_voices = top3_voices.collect()
# Filter for Male voices
df_male_voices = filter_voices_by_gender(df_voices, 'Male')
# 1. Male Voices: Top Choice Preference (Rank 1)
_pairwise_male_pref, _meta_male_pref = S.compute_ranking_significance(
df_male_voices,
alpha=0.05,
correction="none"
)
S.plot_significance_heatmap(
_pairwise_male_pref,
metadata=_meta_male_pref,
title="Male Voices Only: Top Choice Preference Significance"
)
# 2. Male Voices: Total Mentions (Visibility)
_pairwise_male_vis, _meta_male_vis = S.compute_mentions_significance(
df_male_voices,
alpha=0.05,
correction="none"
)
S.plot_significance_heatmap(
_pairwise_male_vis,
metadata=_meta_male_vis,
title="Male Voices Only: Total Mentions Significance"
)
# %% Male Voices (Excluding Bottom 3: V88, V86, V81)
# Start with the male voices dataframe from the previous step
voices_to_exclude = ['V88', 'V86', 'V81']
def filter_exclude_voices(df: pl.DataFrame, exclude_list: list[str]) -> pl.DataFrame:
    """Filter ranking columns to exclude specific voices."""
    cols_to_keep = []
    # Always keep identifier if present
    if '_recordId' in df.columns:
        cols_to_keep.append('_recordId')
    for col in df.columns:
        # Check if column is a voice column (contains Vxx)
        if '__V' in col:
            voice_id = col.split('__')[1]
            if voice_id not in exclude_list:
                cols_to_keep.append(col)
    return df.select(cols_to_keep)
df_male_top = filter_exclude_voices(df_male_voices, voices_to_exclude)
# 1. Male Top Candidates: Top Choice Preference
_pairwise_male_top_pref, _meta_male_top_pref = S.compute_ranking_significance(
df_male_top,
alpha=0.05,
correction="none"
)
S.plot_significance_heatmap(
_pairwise_male_top_pref,
metadata=_meta_male_top_pref,
title="Male Voices (Excl. Bottom 3): Top Choice Preference Significance"
)
# 2. Male Top Candidates: Total Mentions
_pairwise_male_top_vis, _meta_male_top_vis = S.compute_mentions_significance(
df_male_top,
alpha=0.05,
correction="none"
)
S.plot_significance_heatmap(
_pairwise_male_top_vis,
metadata=_meta_male_top_vis,
title="Male Voices (Excl. Bottom 3): Total Mentions Significance"
)
# %% [markdown]
"""
# Rank 1 Selection Significance (Voice Level)
Similar to the Total Mentions significance analysis above, but counting
only how many times each voice was ranked **1st** (out of all respondents).
This isolates first-choice preference rather than overall top-3 visibility.
"""
# %% Rank 1 Significance: All Voices
_pairwise_df_rank1, _meta_rank1 = S.compute_rank1_significance(
top3_voices,
alpha=0.05,
correction="none",
)
S.plot_significance_heatmap(
_pairwise_df_rank1,
metadata=_meta_rank1,
title="Statistical Significance: Voice Rank 1 Selection"
)
# %% Rank 1 Significance: Male Voices Only
_pairwise_df_rank1_male, _meta_rank1_male = S.compute_rank1_significance(
df_male_voices,
alpha=0.05,
correction="none",
)
S.plot_significance_heatmap(
_pairwise_df_rank1_male,
metadata=_meta_rank1_male,
title="Male Voices Only: Rank 1 Selection Significance"
)
# %%

XX_straight_liners.py Normal file

@@ -0,0 +1,267 @@
"""Extra analyses of the straight-liners"""
# %% Imports
import utils
import polars as pl
import argparse
import json
import re
from pathlib import Path
from validation import check_straight_liners
# %% Fixed Variables
RESULTS_FILE = 'data/exports/2-4-26/JPMC_Chase Brand Personality_Quant Round 1_February 4, 2026_Labels.csv'
QSF_FILE = 'data/exports/OneDrive_2026-01-21/Soft Launch Data/JPMC_Chase_Brand_Personality_Quant_Round_1.qsf'
# %% CLI argument parsing for batch automation
# When run as script: uv run XX_straight_liners.py --age '["18
# Central filter configuration - add new filters here only
# Format: 'cli_arg_name': 'QualtricsSurvey.options_* attribute name'
FILTER_CONFIG = {
'age': 'options_age',
'gender': 'options_gender',
'ethnicity': 'options_ethnicity',
'income': 'options_income',
'consumer': 'options_consumer',
'business_owner': 'options_business_owner',
'ai_user': 'options_ai_user',
'investable_assets': 'options_investable_assets',
'industry': 'options_industry',
}
def parse_cli_args():
    parser = argparse.ArgumentParser(description='Generate quant report with optional filters')
    # Dynamically add filter arguments from config
    for filter_name in FILTER_CONFIG:
        parser.add_argument(f'--{filter_name}', type=str, default=None, help=f'JSON list of {filter_name} values')
    parser.add_argument('--filter-name', type=str, default=None, help='Name for this filter combination (used for .txt description file)')
    parser.add_argument('--figures-dir', type=str, default=f'figures/straight-liner-analysis/{Path(RESULTS_FILE).parts[2]}', help='Override the default figures directory')
    # Only parse if running as script (not in Jupyter/interactive)
    try:
        # Check if running in Jupyter by looking for ipykernel
        get_ipython()  # noqa: F821 # type: ignore
        # Return namespace with all filters set to None
        no_filters = {f: None for f in FILTER_CONFIG}
        # Use the same default as argparse
        default_fig_dir = f'figures/straight-liner-analysis/{Path(RESULTS_FILE).parts[2]}'
        return argparse.Namespace(**no_filters, filter_name=None, figures_dir=default_fig_dir)
    except NameError:
        args = parser.parse_args()
        # Parse JSON strings to lists
        for filter_name in FILTER_CONFIG:
            val = getattr(args, filter_name)
            setattr(args, filter_name, json.loads(val) if val else None)
        return args
cli_args = parse_cli_args()
# %%
S = utils.QualtricsSurvey(RESULTS_FILE, QSF_FILE, figures_dir=cli_args.figures_dir)
data_all = S.load_data()
# %% Build filtered dataset based on CLI args
# CLI args: None means "no filter applied" - filter_data() will skip None filters
# Build filter values dict dynamically from FILTER_CONFIG
_active_filters = {filter_name: getattr(cli_args, filter_name) for filter_name in FILTER_CONFIG}
_d = S.filter_data(data_all, **_active_filters)
# Write filter description file if filter-name is provided
if cli_args.filter_name and S.fig_save_dir:
    # Get the filter slug (e.g., "All_Respondents", "Cons-Starter", etc.)
    _filter_slug = S._get_filter_slug()
    _filter_slug_dir = S.fig_save_dir / _filter_slug
    _filter_slug_dir.mkdir(parents=True, exist_ok=True)
    # Build filter description
    _filter_desc_lines = [
        f"Filter: {cli_args.filter_name}",
        "",
        "Applied Filters:",
    ]
    _short_desc_parts = []
    for filter_name, options_attr in FILTER_CONFIG.items():
        all_options = getattr(S, options_attr)
        values = _active_filters[filter_name]
        display_name = filter_name.replace('_', ' ').title()
        # None means no filter applied (same as "All")
        if values is not None and values != all_options:
            _short_desc_parts.append(f"{display_name}: {', '.join(values)}")
            _filter_desc_lines.append(f" {display_name}: {', '.join(values)}")
        else:
            _filter_desc_lines.append(f" {display_name}: All")
    # Write detailed description INSIDE the filter-slug directory
    # Sanitize filter name for filename usage (replace / and other chars)
    _safe_filter_name = re.sub(r'[^\w\s-]', '_', cli_args.filter_name)
    _filter_file = _filter_slug_dir / f"{_safe_filter_name}.txt"
    _filter_file.write_text('\n'.join(_filter_desc_lines))
    # Append to summary index file at figures/<export_date>/filter_index.txt
    _summary_file = S.fig_save_dir / "filter_index.txt"
    _short_desc = "; ".join(_short_desc_parts) if _short_desc_parts else "All Respondents"
    _summary_line = f"{_filter_slug} | {cli_args.filter_name} | {_short_desc}\n"
    # Append or create the summary file
    if _summary_file.exists():
        _existing = _summary_file.read_text()
        # Avoid duplicate entries for same slug
        if _filter_slug not in _existing:
            with _summary_file.open('a') as f:
                f.write(_summary_line)
    else:
        _header = "Filter Index\n" + "=" * 80 + "\n\n"
        _header += "Directory | Filter Name | Description\n"
        _header += "-" * 80 + "\n"
        _summary_file.write_text(_header + _summary_line)
# Save to logical variable name for further analysis
data = _d
data.collect()  # materialise once so any filter/schema errors surface early
# %% Determine straight-liner repeat offenders
# Extract question groups with renamed columns that check_straight_liners expects.
# The raw `data` has QID-based column names; the getter methods rename them to
# patterns like SS_Green_Blue__V14__Choice_1, Voice_Scale_1_10__V48, etc.
ss_or, _ = S.get_ss_orange_red(data)
ss_gb, _ = S.get_ss_green_blue(data)
vs, _ = S.get_voice_scale_1_10(data)
# Combine all question groups into one wide LazyFrame (joined on _recordId)
all_questions = ss_or.join(ss_gb, on='_recordId').join(vs, on='_recordId')
# Run straight-liner detection across all question groups
# max_score=5 catches all speaking-style straight-lining (1-5 scale)
# and voice-scale values ≤5 on the 1-10 scale
# Note: sl_threshold is NOT set on S here — this script analyses straight-liners,
# it doesn't filter them out of the dataset.
print("Running straight-liner detection across all question groups...")
sl_report, sl_df = check_straight_liners(all_questions, max_score=5)
# %% Quantify repeat offenders
# sl_df has one row per (Record ID, Question Group) that was straight-lined.
# Group by Record ID to count how many question groups each person SL'd.
if sl_df is not None and not sl_df.is_empty():
    total_respondents = data.select(pl.len()).collect().item()
    # Per-respondent count of straight-lined question groups
    respondent_sl_counts = (
        sl_df
        .group_by("Record ID")
        .agg(pl.len().alias("sl_count"))
        .sort("sl_count", descending=True)
    )
    max_sl = respondent_sl_counts["sl_count"].max()
    print(f"\nTotal respondents: {total_respondents}")
    print(f"Respondents who straight-lined at least 1 question group: "
          f"{respondent_sl_counts.height}")
    print(f"Maximum question groups straight-lined by one person: {max_sl}")
    print()
    # Build cumulative distribution: for each threshold N, count respondents
    # who straight-lined >= N question groups
    cumulative_rows = []
    for threshold in range(1, max_sl + 1):
        count = respondent_sl_counts.filter(
            pl.col("sl_count") >= threshold
        ).height
        pct = (count / total_respondents) * 100
        cumulative_rows.append({
            "threshold": threshold,
            "count": count,
            "pct": pct,
        })
        print(
            f"{threshold} question groups straight-lined: "
            f"{count} respondents ({pct:.1f}%)"
        )
    cumulative_df = pl.DataFrame(cumulative_rows)
    print(f"\n{cumulative_df}")
    # %% Save cumulative data to CSV
    _filter_slug = S._get_filter_slug()
    _csv_dir = Path(S.fig_save_dir) / _filter_slug
    _csv_dir.mkdir(parents=True, exist_ok=True)
    _csv_path = _csv_dir / "straight_liner_repeat_offenders.csv"
    cumulative_df.write_csv(_csv_path)
    print(f"Saved cumulative data to {_csv_path}")
    # %% Plot the cumulative distribution
    S.plot_straight_liner_repeat_offenders(
        cumulative_df,
        total_respondents=total_respondents,
    )
    # %% Per-question straight-lining frequency
    # Build human-readable question group names from the raw keys
    def _humanise_question_group(key: str) -> str:
        """Convert an internal question group key to a readable label.

        Examples:
            SS_Green_Blue__V14 → Green/Blue V14
            SS_Orange_Red__V48 → Orange/Red V48
            Voice_Scale_1_10 → Voice Scale (1-10)
        """
        if key.startswith("SS_Green_Blue__"):
            voice = key.split("__")[1]
            return f"Green/Blue {voice}"
        if key.startswith("SS_Orange_Red__"):
            voice = key.split("__")[1]
            return f"Orange/Red {voice}"
        if key == "Voice_Scale_1_10":
            return "Voice Scale (1-10)"
        # Fallback: replace underscores
        return key.replace("_", " ")
    per_question_counts = (
        sl_df
        .group_by("Question Group")
        .agg(pl.col("Record ID").n_unique().alias("count"))
        .sort("count", descending=True)
        .with_columns(
            (pl.col("count") / total_respondents * 100).alias("pct")
        )
    )
    # Add human-readable names
    per_question_counts = per_question_counts.with_columns(
        pl.col("Question Group").map_elements(
            _humanise_question_group, return_dtype=pl.Utf8
        ).alias("question")
    )
    print("\n--- Per-Question Straight-Lining Frequency ---")
    print(per_question_counts)
    # Save per-question data to CSV
    _csv_path_pq = _csv_dir / "straight_liner_per_question.csv"
    per_question_counts.write_csv(_csv_path_pq)
    print(f"Saved per-question data to {_csv_path_pq}")
    # Plot
    S.plot_straight_liner_per_question(
        per_question_counts,
        total_respondents=total_respondents,
    )
    # %% Show the top repeat offenders (respondents with most SL'd groups)
    print("\n--- Top Repeat Offenders ---")
    print(respondent_sl_counts.head(20))
else:
    print("No straight-liners detected in the dataset.")

File diff suppressed because one or more lines are too long

BIN
docs/README.pdf Normal file

Binary file not shown.


@@ -0,0 +1,104 @@
# Appendix: Quantitative Analysis Plots - Folder Structure Manual
This folder contains all the quantitative analysis plots, sorted by the filters applied to the dataset. Each folder corresponds to a specific demographic cut.
## Folder Overview
* `All_Respondents/`: Analysis of the full dataset (no filters).
* `filter_index.txt`: A master list of every folder code and its corresponding demographic filter.
* **Filter Folders**: All other folders represent specific demographic cuts (e.g., `Age-18to21years`, `Gen-Woman`).
## How to Navigate
Each folder contains the same set of charts generated for that specific filter.
## Directory Reference Table
Below is the complete list of folder names. Each name encodes the filter(s) applied to the dataset; the codes are kept stable so results can be cross-referenced consistently across the analysis.
| Directory Code | Filter Description |
| :--- | :--- |
| All_Respondents | All Respondents |
| Age-18to21years | Age: 18 to 21 years |
| Age-22to24years | Age: 22 to 24 years |
| Age-25to34years | Age: 25 to 34 years |
| Age-35to40years | Age: 35 to 40 years |
| Age-41to50years | Age: 41 to 50 years |
| Age-51to59years | Age: 51 to 59 years |
| Age-60to70years | Age: 60 to 70 years |
| Age-70yearsormore | Age: 70 years or more |
| Gen-Man | Gender: Man |
| Gen-Prefernottosay | Gender: Prefer not to say |
| Gen-Woman | Gender: Woman |
| Eth-6_grps_c64411 | Ethnicity: All options containing 'Alaska Native or Indigenous American' |
| Eth-6_grps_8f145b | Ethnicity: All options containing 'Asian or Asian American' |
| Eth-8_grps_71ac47 | Ethnicity: All options containing 'Black or African American' |
| Eth-7_grps_c5b3ce | Ethnicity: All options containing 'Hispanic or Latinx' |
| Eth-BlackorAfricanAmerican<br>MiddleEasternorNorthAfrican<br>WhiteorCaucasian+<br>MiddleEasternorNorthAfrican | Ethnicity: Middle Eastern or North African |
| Eth-AsianorAsianAmericanBlackorAfricanAmerican<br>NativeHawaiianorOtherPacificIslander+<br>NativeHawaiianorOtherPacificIslander | Ethnicity: Native Hawaiian or Other Pacific Islander |
| Eth-10_grps_cef760 | Ethnicity: All options containing 'White or Caucasian' |
| Inc-100000to149999 | Income: $100,000 to $149,999 |
| Inc-150000to199999 | Income: $150,000 to $199,999 |
| Inc-200000ormore | Income: $200,000 or more |
| Inc-25000to34999 | Income: $25,000 to $34,999 |
| Inc-35000to54999 | Income: $35,000 to $54,999 |
| Inc-55000to79999 | Income: $55,000 to $79,999 |
| Inc-80000to99999 | Income: $80,000 to $99,999 |
| Inc-Lessthan25000 | Income: Less than $25,000 |
| Cons-Lower_Mass_A+Lower_Mass_B | Consumer: Lower_Mass_A, Lower_Mass_B |
| Cons-MassAffluent_A+MassAffluent_B | Consumer: MassAffluent_A, MassAffluent_B |
| Cons-Mass_A+Mass_B | Consumer: Mass_A, Mass_B |
| Cons-Mix_of_Affluent_Wealth__<br>High_Net_Woth_A+<br>Mix_of_Affluent_Wealth__<br>High_Net_Woth_B | Consumer: Mix_of_Affluent_Wealth_&_High_Net_Woth_A, Mix_of_Affluent_Wealth_&_High_Net_Woth_B |
| Cons-Early_Professional | Consumer: Early_Professional |
| Cons-Lower_Mass_B | Consumer: Lower_Mass_B |
| Cons-MassAffluent_B | Consumer: MassAffluent_B |
| Cons-Mass_B | Consumer: Mass_B |
| Cons-Mix_of_Affluent_Wealth__<br>High_Net_Woth_B | Consumer: Mix_of_Affluent_Wealth_&_High_Net_Woth_B |
| Cons-Starter | Consumer: Starter |
| BizOwn-No | Business Owner: No |
| BizOwn-Yes | Business Owner: Yes |
| AI-Daily | AI User: Daily |
| AI-Lessthanonceamonth | AI User: Less than once a month |
| AI-Morethanoncedaily | AI User: More than once daily |
| AI-Multipletimesperweek | AI User: Multiple times per week |
| AI-Onceamonth | AI User: Once a month |
| AI-Onceaweek | AI User: Once a week |
| AI-RarelyNever | AI User: Rarely/Never |
| AI-Daily+<br>Morethanoncedaily+<br>Multipletimesperweek | AI User: Daily, More than once daily, Multiple times per week |
| AI-4_grps_d4f57a | AI User: Once a week, Once a month, Less than once a month, Rarely/Never |
| InvAsts-0to24999 | Investable Assets: $0 to $24,999 |
| InvAsts-150000to249999 | Investable Assets: $150,000 to $249,999 |
| InvAsts-1Mto4.9M | Investable Assets: $1M to $4.9M |
| InvAsts-25000to49999 | Investable Assets: $25,000 to $49,999 |
| InvAsts-250000to499999 | Investable Assets: $250,000 to $499,999 |
| InvAsts-50000to149999 | Investable Assets: $50,000 to $149,999 |
| InvAsts-500000to999999 | Investable Assets: $500,000 to $999,999 |
| InvAsts-5Mormore | Investable Assets: $5M or more |
| InvAsts-Prefernottoanswer | Investable Assets: Prefer not to answer |
| Ind-Agricultureforestryfishingorhunting | Industry: Agriculture, forestry, fishing, or hunting |
| Ind-Artsentertainmentorrecreation | Industry: Arts, entertainment, or recreation |
| Ind-Broadcasting | Industry: Broadcasting |
| Ind-Construction | Industry: Construction |
| Ind-EducationCollegeuniversityoradult | Industry: Education College, university, or adult |
| Ind-EducationOther | Industry: Education Other |
| Ind-EducationPrimarysecondaryK-12 | Industry: Education Primary/secondary (K-12) |
| Ind-Governmentandpublicadministration | Industry: Government and public administration |
| Ind-Hotelandfoodservices | Industry: Hotel and food services |
| Ind-InformationOther | Industry: Information Other |
| Ind-InformationServicesanddata | Industry: Information Services and data |
| Ind-Legalservices | Industry: Legal services |
| Ind-ManufacturingComputerandelectronics | Industry: Manufacturing Computer and electronics |
| Ind-ManufacturingOther | Industry: Manufacturing Other |
| Ind-Notemployed | Industry: Not employed |
| Ind-Otherindustrypleasespecify | Industry: Other industry (please specify) |
| Ind-Processing | Industry: Processing |
| Ind-Publishing | Industry: Publishing |
| Ind-Realestaterentalorleasing | Industry: Real estate, rental, or leasing |
| Ind-Retired | Industry: Retired |
| Ind-Scientificortechnicalservices | Industry: Scientific or technical services |
| Ind-Software | Industry: Software |
| Ind-Telecommunications | Industry: Telecommunications |
| Ind-Transportationandwarehousing | Industry: Transportation and warehousing |
| Ind-Utilities | Industry: Utilities |
| Ind-Wholesale | Industry: Wholesale |


@@ -0,0 +1,428 @@
# Statistical Significance Testing Guide
A beginner-friendly reference for choosing the right statistical test and correction method for your Voice Branding analysis.
---
## Table of Contents
1. [Quick Decision Flowchart](#quick-decision-flowchart)
2. [Understanding Your Data Types](#understanding-your-data-types)
3. [Available Tests](#available-tests)
4. [Multiple Comparison Corrections](#multiple-comparison-corrections)
5. [Interpreting Results](#interpreting-results)
6. [Code Examples](#code-examples)
---
## Quick Decision Flowchart
```
What kind of data do you have?
├─► Continuous scores (1-10 ratings, averages)
│ │
│ └─► Use: compute_pairwise_significance()
│ │
│ ├─► Data normally distributed? → test_type="ttest"
│ └─► Not sure / skewed data? → test_type="mannwhitney" (safer choice)
└─► Ranking data (1st, 2nd, 3rd place votes)
└─► Use: compute_ranking_significance()
(automatically uses proportion z-test)
```
---
## Understanding Your Data Types
### Continuous Data
**What it looks like:** Numbers on a scale with many possible values.
| Example | Data Source |
|---------|-------------|
| Voice ratings 1-10 | `get_voice_scale_1_10()` |
| Speaking style scores | `get_ss_green_blue()` |
| Any averaged scores | Custom aggregations |
```
shape: (5, 3)
┌───────────┬─────────────────┬─────────────────┐
│ _recordId │ Voice_Scale__V14│ Voice_Scale__V04│
│ str │ f64 │ f64 │
├───────────┼─────────────────┼─────────────────┤
│ R_001 │ 7.5 │ 6.0 │
│ R_002 │ 8.0 │ 7.5 │
│ R_003 │ 6.5 │ 8.0 │
```
### Ranking Data
**What it looks like:** Discrete ranks (1, 2, 3) or null if not ranked.
| Example | Data Source |
|---------|-------------|
| Top 3 voice rankings | `get_top_3_voices()` |
| Character rankings | `get_character_ranking()` |
```
shape: (5, 3)
┌───────────┬──────────────────┬──────────────────┐
│ _recordId │ Top_3__V14 │ Top_3__V04 │
│ str │ i64 │ i64 │
├───────────┼──────────────────┼──────────────────┤
│ R_001 │ 1 │ null │ ← V14 was ranked 1st
│ R_002 │ 2 │ 1 │ ← V04 was ranked 1st
│ R_003 │ null │ 3 │ ← V04 was ranked 3rd
```
### ⚠️ Aggregated Data (Cannot Test!)
**What it looks like:** Already summarized/totaled data.
```
shape: (3, 2)
┌───────────┬────────────────┐
│ Character │ Weighted Score │ ← ALREADY AGGREGATED
│ str │ i64 │ Lost individual variance
├───────────┼────────────────┤ Cannot do significance tests!
│ V14 │ 209 │
│ V04 │ 180 │
```
**Solution:** Go back to the raw data before aggregation.
---
## Available Tests
### 1. Mann-Whitney U Test (Default for Continuous)
**Use when:** Comparing scores/ratings between groups
**Assumes:** Nothing about distribution shape (non-parametric)
**Best for:** Most survey data, Likert scales, ratings
```python
pairwise_df, meta = S.compute_pairwise_significance(
voice_data,
test_type="mannwhitney" # This is the default
)
```
**Pros:**
- Works with any distribution shape
- Robust to outliers
- Safe choice when unsure
**Cons:**
- Slightly less powerful than t-test when data IS normally distributed
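For intuition, this is roughly what the helper runs under the hood. A minimal sketch using `scipy.stats.mannwhitneyu` on synthetic rating arrays (the voice names, means, and sample sizes here are made up for illustration):

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)
v14 = rng.normal(7.5, 1.0, size=120)  # hypothetical 1-10 ratings for V14
v04 = rng.normal(6.8, 1.0, size=120)  # hypothetical 1-10 ratings for V04

# Two-sided test: are the two rating distributions shifted relative to each other?
stat, p = mannwhitneyu(v14, v04, alternative="two-sided")
print(f"U = {stat:.0f}, p = {p:.2g}")
```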
---
### 2. Independent t-Test
**Use when:** Comparing means between groups
**Assumes:** Data is approximately normally distributed
**Best for:** Large samples (n > 30 per group), truly continuous data
```python
pairwise_df, meta = S.compute_pairwise_significance(
voice_data,
test_type="ttest"
)
```
**Pros:**
- Most powerful when assumptions are met
- Well-understood, commonly reported
**Cons:**
- Can give misleading results if data is skewed
- Sensitive to outliers
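A comparable sketch with scipy. Welch's variant (`equal_var=False`) is a reasonable default since rating groups rarely have identical variances (the data below is synthetic, for illustration only):

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)
a = rng.normal(7.5, 1.2, size=200)  # hypothetical ratings, voice A
b = rng.normal(6.8, 1.2, size=200)  # hypothetical ratings, voice B

# Welch's t-test: does not assume equal group variances
t, p = ttest_ind(a, b, equal_var=False)
print(f"t = {t:.2f}, p = {p:.2g}")
```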
---
### 3. Chi-Square Test
**Use when:** Comparing frequency distributions
**Assumes:** Expected counts ≥ 5 in each cell
**Best for:** Count data, categorical comparisons
```python
pairwise_df, meta = S.compute_pairwise_significance(
count_data,
test_type="chi2"
)
```
**Pros:**
- Designed for count/frequency data
- Tests if distributions differ
**Cons:**
- Needs sufficient sample sizes
- Less informative about direction of difference
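The same test can be run directly with `scipy.stats.chi2_contingency` on a contingency table; a sketch with hypothetical rank-vote counts:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical counts: rows = voices, columns = Rank 1 / Rank 2 / Rank 3 votes
observed = np.array([
    [60, 45, 30],  # V14
    [35, 40, 50],  # V04
])

# Tests whether the two voices' rank distributions differ
chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.4f}")
```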
---
### 4. Two-Proportion Z-Test (For Rankings)
**Use when:** Comparing ranking vote proportions
**Automatically used by:** `compute_ranking_significance()`
```python
pairwise_df, meta = S.compute_ranking_significance(ranking_data)
```
**What it tests:** "Does Voice A get a significantly different proportion of Rank 1 votes than Voice B?"
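The pooled two-proportion z-test can be written out in a few lines. A sketch (the helper's exact implementation may differ, and the vote counts below are hypothetical):

```python
from math import sqrt
from scipy.stats import norm

def two_proportion_ztest(x1, n1, x2, n2):
    """Pooled two-proportion z-test (illustrative, not the helper's exact code)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)                      # pooled success proportion
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))  # standard error under H0
    z = (p1 - p2) / se
    p_value = 2 * norm.sf(abs(z))                        # two-sided p-value
    return z, p_value

# Hypothetical: V14 got 60/200 Rank-1 votes, V04 got 35/200
z, p = two_proportion_ztest(60, 200, 35, 200)
print(f"z = {z:.2f}, p = {p:.4f}")
```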
---
## Multiple Comparison Corrections
### Why Do We Need Corrections?
When you compare many groups, you run many tests. At α = 0.05, each individual test has up to a 5% chance of a false positive when no real difference exists. With 17 voices, that adds up quickly:
| Comparisons | Expected False Positives (no correction) |
|-------------|------------------------------------------|
| 136 pairs | ~7 false "significant" results! |
**Corrections adjust p-values to account for this.**
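The numbers in the table follow directly from the pair count:

```python
from math import comb

n_voices = 17
pairs = comb(n_voices, 2)            # unordered pairs of voices
expected_fp = pairs * 0.05           # expected false positives at alpha = 0.05
print(pairs, round(expected_fp, 1))  # 136 pairs, ~7 spurious "significant" results
```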
---
### Bonferroni Correction (Conservative)
**Formula:** `p_adjusted = p_value × number_of_comparisons`
```python
pairwise_df, meta = S.compute_pairwise_significance(
data,
correction="bonferroni" # This is the default
)
```
**Use when:**
- You want to be very confident about significant results
- False positives are costly (publishing, major decisions)
- You have few comparisons (< 20)
**Trade-off:** May miss real differences (more false negatives)
---
### Holm-Bonferroni Correction (Less Conservative)
**Formula:** Step-down procedure that's less strict than Bonferroni
```python
pairwise_df, meta = S.compute_pairwise_significance(
data,
correction="holm"
)
```
**Use when:**
- You have many comparisons
- You want better power to detect real differences
- Exploratory analysis where missing a real effect is costly
**Trade-off:** Slightly higher false positive risk than Bonferroni
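To see how the two corrections differ, here is a minimal from-scratch sketch (illustrative only; `compute_pairwise_significance` presumably delegates to a library routine):

```python
def bonferroni(pvals):
    """Multiply every p-value by the number of tests, capped at 1."""
    m = len(pvals)
    return [min(p * m, 1.0) for p in pvals]

def holm(pvals):
    """Step-down Holm: the k-th smallest p-value is scaled by (m - k)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        running_max = max(running_max, min((m - rank) * pvals[i], 1.0))
        adjusted[i] = running_max  # keep adjusted p-values monotone
    return adjusted

raw = [0.001, 0.01, 0.02, 0.04]
print(bonferroni(raw))  # ≈ [0.004, 0.04, 0.08, 0.16] → only 2 pass alpha = 0.05
print(holm(raw))        # ≈ [0.004, 0.03, 0.04, 0.04] → all 4 pass alpha = 0.05
```

With the same four raw p-values, Bonferroni keeps two results significant while Holm keeps all four, which is exactly the power difference described above.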
---
### No Correction
**Not recommended for final analysis**, but useful for exploration.
```python
pairwise_df, meta = S.compute_pairwise_significance(
data,
correction="none"
)
```
**Use when:**
- Initial exploration only
- You'll follow up with specific hypotheses
- You understand and accept the inflated false positive rate
---
### Correction Method Comparison
| Method | Strictness | Best For | Risk |
|--------|------------|----------|------|
| Bonferroni | Most strict | Few comparisons, high stakes | Miss real effects |
| Holm | Moderate | Many comparisons, balanced approach | Slightly more false positives |
| None | No control | Exploration only | Many false positives |
**Recommendation for Voice Branding:** Use **Holm** for exploratory analysis, **Bonferroni** for final reporting.
---
## Interpreting Results
### Key Output Columns
| Column | Meaning |
|--------|---------|
| `p_value` | Raw probability of seeing a difference this large by chance if there is no real effect |
| `p_adjusted` | Corrected p-value (use this for decisions!) |
| `significant` | TRUE if p_adjusted < alpha (usually 0.05) |
| `effect_size` | How big is the difference (practical significance) |
### What the p-value Means
| p-value | Interpretation |
|---------|----------------|
| < 0.001 | Very strong evidence of difference |
| < 0.01 | Strong evidence |
| < 0.05 | Moderate evidence (traditional threshold) |
| 0.05 - 0.10 | Weak evidence, "trending" |
| > 0.10 | No significant evidence |
### Statistical vs Practical Significance
**Statistical significance** (p < 0.05) means the difference is unlikely due to chance.
**Practical significance** (effect size) means the difference matters in the real world.
| Effect Size (Cohen's d) | Interpretation |
|-------------------------|----------------|
| < 0.2 | Negligible (may not matter practically) |
| 0.2 - 0.5 | Small |
| 0.5 - 0.8 | Medium |
| > 0.8 | Large |
**Example:** A p-value of 0.001 with effect size of 0.1 means "we're confident there's a difference, but it's tiny."
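Cohen's d itself is just the mean difference scaled by the pooled standard deviation. A small sketch (the helper may compute effect sizes differently; the numbers are made up):

```python
import numpy as np

def cohens_d(a, b):
    """Mean difference divided by the pooled standard deviation."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    pooled_var = (
        (len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1)
    ) / (len(a) + len(b) - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

print(round(cohens_d([1, 2, 3, 4, 5], [2, 3, 4, 5, 6]), 3))  # -0.632: a medium effect
```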
---
## Code Examples
### Example 1: Voice Scale Ratings
```python
# Get the raw rating data
voice_data, _ = S.get_voice_scale_1_10(data)
# Test for significant differences
pairwise_df, meta = S.compute_pairwise_significance(
voice_data,
test_type="mannwhitney", # Safe default for ratings
alpha=0.05,
correction="bonferroni"
)
# Check overall test first
print(f"Overall test: {meta['overall_test']}")
print(f"Overall p-value: {meta['overall_p_value']:.4f}")
# If overall is significant, look at pairwise
if meta['overall_p_value'] < 0.05:
sig_pairs = pairwise_df.filter(pl.col('significant') == True)
print(f"Found {sig_pairs.height} significant pairwise differences")
# Visualize
S.plot_significance_heatmap(pairwise_df, metadata=meta)
```
### Example 2: Top 3 Voice Rankings
```python
# Get the raw ranking data (NOT the weighted scores!)
ranking_data, _ = S.get_top_3_voices(data)
# Test for significant differences in Rank 1 proportions
pairwise_df, meta = S.compute_ranking_significance(
ranking_data,
alpha=0.05,
correction="holm" # Less conservative for many comparisons
)
# Check chi-square test
print(f"Chi-square p-value: {meta['chi2_p_value']:.4f}")
# View contingency table (Rank 1, 2, 3 counts per voice)
for voice, counts in meta['contingency_table'].items():
print(f"{voice}: R1={counts[0]}, R2={counts[1]}, R3={counts[2]}")
# Find significant pairs
sig_pairs = pairwise_df.filter(pl.col('significant') == True)
print(sig_pairs)
```
### Example 3: Comparing Demographic Subgroups
```python
# Filter to specific demographics
S.filter_data(data, consumer=['Early Professional'])
early_pro_data, _ = S.get_voice_scale_1_10(data)
S.filter_data(data, consumer=['Established Professional'])
estab_pro_data, _ = S.get_voice_scale_1_10(data)
# Test each group separately, then compare results qualitatively
# (For direct group comparison, you'd need a different test design)
```
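For a direct between-group comparison of a single voice, one option is to run a Mann-Whitney U test on the two filtered columns yourself. A hypothetical sketch (the column name `Voice_Scale__V14` follows the pattern shown earlier; the lists stand in for the extracted frame columns):

```python
from scipy.stats import mannwhitneyu

# In practice these would come from the filtered frames, e.g.
#   early = early_pro_data["Voice_Scale__V14"].drop_nulls().to_list()
early = [7, 8, 6, 9, 7, 8, 7, 6, 8, 9]  # hypothetical Early Professional ratings
estab = [6, 5, 7, 6, 5, 6, 7, 5, 6, 6]  # hypothetical Established Professional ratings

stat, p = mannwhitneyu(early, estab, alternative="two-sided")
print(f"U = {stat}, p = {p:.4f}")
```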
---
## Common Mistakes to Avoid
### ❌ Using Aggregated Data
```python
# WRONG - already summarized, lost individual variance
weighted_scores = calculate_weighted_ranking_scores(ranking_data)
S.compute_pairwise_significance(weighted_scores) # Will fail!
```
### ✅ Use Raw Data
```python
# RIGHT - use raw data before aggregation
ranking_data, _ = S.get_top_3_voices(data)
S.compute_ranking_significance(ranking_data)
```
### ❌ Ignoring Multiple Comparisons
```python
# WRONG - with no correction, ~5% of pairs (about 7 of the 136) will be "significant" by chance alone!
S.compute_pairwise_significance(data, correction="none")
```
### ✅ Apply Correction
```python
# RIGHT - corrected p-values control false positives
S.compute_pairwise_significance(data, correction="bonferroni")
```
### ❌ Only Reporting p-values
```python
# WRONG - statistical significance isn't everything
print(f"p = {p_value}") # Missing context!
```
### ✅ Report Effect Sizes Too
```python
# RIGHT - include practical significance
print(f"p = {p_value}, effect size = {effect_size}")
print(f"Mean difference: {mean1 - mean2:.2f} points")
```
---
## Quick Reference Card
| Data Type | Function | Default Test | Recommended Correction |
|-----------|----------|--------------|------------------------|
| Ratings (1-10) | `compute_pairwise_significance()` | Mann-Whitney U | Bonferroni |
| Rankings (1st/2nd/3rd) | `compute_ranking_significance()` | Proportion Z | Holm |
| Count frequencies | `compute_pairwise_significance(test_type="chi2")` | Chi-square | Bonferroni |
| Scenario | Correction |
|----------|------------|
| Publishing results | Bonferroni |
| Client presentation | Bonferroni |
| Exploratory analysis | Holm |
| Quick internal check | Holm or None |
---
## Further Reading
- [Statistics for Dummies Cheat Sheet](https://www.dummies.com/article/academics-the-arts/math/statistics/statistics-for-dummies-cheat-sheet-208650/)
- [Choosing the Right Statistical Test](https://stats.oarc.ucla.edu/other/mult-pkg/whatstat/)
- [Multiple Comparisons Problem (Wikipedia)](https://en.wikipedia.org/wiki/Multiple_comparisons_problem)

2252
plots.py

File diff suppressed because it is too large


@@ -0,0 +1,3 @@
- V46 not in scale 1-10. Qualtrics
- Straightliners
- V45 good in qual but bad in quant


@@ -7,6 +7,7 @@ requires-python = ">=3.12"
dependencies = [
"altair>=6.0.0",
"imagehash>=4.3.1",
"jupyter>=1.1.1",
"marimo>=0.18.0",
"matplotlib>=3.10.8",
"modin[dask]>=0.37.1",
@@ -22,9 +23,14 @@ dependencies = [
"python-pptx>=1.0.2",
"pyzmq>=27.1.0",
"requests>=2.32.5",
"scipy>=1.14.0",
"taguette>=1.5.1",
"tqdm>=4.66.0",
"vl-convert-python>=1.9.0.post1",
"wordcloud>=1.9.5",
]
[project.scripts]
quant-report-batch = "run_filter_combinations:main"

59
reference.py Normal file

@@ -0,0 +1,59 @@
ORIGINAL_CHARACTER_TRAITS = {
"the_familiar_friend": [
"Warm",
"Friendly",
"Approachable",
"Familiar",
"Casual",
"Appreciative",
"Benevolent",
],
"the_coach": [
"Empowering",
"Encouraging",
"Caring",
"Positive",
"Optimistic",
"Guiding",
"Reassuring",
],
"the_personal_assistant": [
"Forward-thinking",
"Progressive",
"Cooperative",
"Intentional",
"Resourceful",
"Attentive",
"Adaptive",
],
"the_bank_teller": [
"Patient",
"Grounded",
"Down-to-earth",
"Stable",
"Formal",
"Balanced",
"Efficient",
]
}
VOICE_GENDER_MAPPING = {
"V14": "Female",
"V04": "Female",
"V08": "Female",
"V77": "Female",
"V48": "Female",
"V82": "Female",
"V89": "Female",
"V91": "Female",
"V34": "Male",
"V69": "Male",
"V45": "Male",
"V46": "Male",
"V54": "Male",
"V74": "Male",
"V81": "Male",
"V86": "Male",
"V88": "Male",
"V16": "Male",
}

306
run_filter_combinations.py Normal file

@@ -0,0 +1,306 @@
#!/usr/bin/env python
"""
Batch runner for quant report with different filter combinations.
Runs 03_quant_report.script.py for each single-filter combination:
- Each age group (with all others active)
- Each gender (with all others active)
- Each ethnicity (with all others active)
- Each income group (with all others active)
- Each consumer segment (with all others active)
Usage:
uv run python run_filter_combinations.py
uv run python run_filter_combinations.py --dry-run # Preview combinations without running
uv run python run_filter_combinations.py --category age # Only run age combinations
uv run python run_filter_combinations.py --category consumer # Only run consumer segment combinations
"""
import subprocess
import sys
import json
from pathlib import Path
from tqdm import tqdm
from utils import QualtricsSurvey
# Default data paths (same as in 03_quant_report.script.py)
RESULTS_FILE = 'data/exports/2-2-26/JPMC_Chase Brand Personality_Quant Round 1_February 2, 2026_Labels.csv'
QSF_FILE = 'data/exports/OneDrive_2026-01-21/Soft Launch Data/JPMC_Chase_Brand_Personality_Quant_Round_1.qsf'
REPORT_SCRIPT = Path(__file__).parent / '03_quant_report.script.py'
def get_filter_combinations(survey: QualtricsSurvey, category: str | None = None) -> list[dict]:
"""
Generate all single-filter combinations.
Each combination isolates ONE filter value while keeping all others at "all selected".
Args:
survey: QualtricsSurvey instance with loaded data
category: Optional filter category to limit combinations to.
Valid values: 'all', 'age', 'gender', 'ethnicity', 'income', 'consumer',
'business_owner', 'ai_user', 'investable_assets', 'industry'
If None or 'all', generates all combinations.
Returns:
List of dicts with filter kwargs for each run.
"""
combinations = []
# Add "All Respondents" run (no filters = all options selected)
if not category or category in ['all_filters', 'all']:
combinations.append({
'name': 'All_Respondents',
'filters': {} # Empty = use defaults (all selected)
})
# Age groups - one at a time
if not category or category in ['all_filters', 'age']:
for age in survey.options_age:
combinations.append({
'name': f'Age-{age}',
'filters': {'age': [age]}
})
# Gender - one at a time
if not category or category in ['all_filters', 'gender']:
for gender in survey.options_gender:
combinations.append({
'name': f'Gender-{gender}',
'filters': {'gender': [gender]}
})
# Ethnicity - grouped by individual values
if not category or category in ['all_filters', 'ethnicity']:
# Ethnicity options are comma-separated (e.g., "White or Caucasian, Hispanic or Latino")
# Create filters that include ALL options containing each individual ethnicity value
ethnicity_values = set()
for ethnicity_option in survey.options_ethnicity:
# Split by comma and strip whitespace
values = [v.strip() for v in ethnicity_option.split(',')]
ethnicity_values.update(values)
for ethnicity_value in sorted(ethnicity_values):
# Find all options that contain this value
matching_options = [
opt for opt in survey.options_ethnicity
if ethnicity_value in [v.strip() for v in opt.split(',')]
]
combinations.append({
'name': f'Ethnicity-{ethnicity_value}',
'filters': {'ethnicity': matching_options}
})
# Income - one at a time
if not category or category in ['all_filters', 'income']:
for income in survey.options_income:
combinations.append({
'name': f'Income-{income}',
'filters': {'income': [income]}
})
# Consumer segments - combine _A and _B options, and also include standalone
if not category or category in ['all_filters', 'consumer']:
# Group options by base name (removing _A/_B suffix)
consumer_groups = {}
for consumer in survey.options_consumer:
# Check if ends with _A or _B
if consumer.endswith('_A') or consumer.endswith('_B'):
base_name = consumer[:-2] # Remove last 2 chars (_A or _B)
if base_name not in consumer_groups:
consumer_groups[base_name] = []
consumer_groups[base_name].append(consumer)
else:
# Not an _A/_B option, keep as-is
consumer_groups[consumer] = [consumer]
# Add combined _A+_B options
for base_name, options in consumer_groups.items():
if len(options) > 1: # Only combine if there are multiple (_A and _B)
combinations.append({
'name': f'Consumer-{base_name}',
'filters': {'consumer': options}
})
# Add standalone options (including individual _A and _B)
for consumer in survey.options_consumer:
combinations.append({
'name': f'Consumer-{consumer}',
'filters': {'consumer': [consumer]}
})
# Business Owner - one at a time
if not category or category in ['all_filters', 'business_owner']:
for business_owner in survey.options_business_owner:
combinations.append({
'name': f'BusinessOwner-{business_owner}',
'filters': {'business_owner': [business_owner]}
})
# AI User - one at a time
if not category or category in ['all_filters', 'ai_user']:
for ai_user in survey.options_ai_user:
combinations.append({
'name': f'AIUser-{ai_user}',
'filters': {'ai_user': [ai_user]}
})
# Combined group: Daily, More than once daily, and Multiple times per week = frequent AI users
combinations.append({
'name': 'AIUser-Frequent',
'filters': {'ai_user': [
'Daily', 'More than once daily', 'Multiple times per week'
]}
})
combinations.append({
'name': 'AIUser-RarelyNever',
'filters': {'ai_user': [
'Once a month', 'Less than once a month', 'Once a week', 'Rarely/Never'
]}
})
# Investable Assets - one at a time
if not category or category in ['all_filters', 'investable_assets']:
for investable_assets in survey.options_investable_assets:
combinations.append({
'name': f'Assets-{investable_assets}',
'filters': {'investable_assets': [investable_assets]}
})
# Industry - one at a time
if not category or category in ['all_filters', 'industry']:
for industry in survey.options_industry:
combinations.append({
'name': f'Industry-{industry}',
'filters': {'industry': [industry]}
})
# Voice ranking completeness filter
# These use a special flag rather than demographic filters, so we store
# the mode in a dedicated key that run_report passes as --voice-ranking-filter.
if not category or category in ['all_filters', 'voice_ranking']:
combinations.append({
'name': 'VoiceRanking-OnlyMissing',
'filters': {},
'voice_ranking_filter': 'only-missing',
})
combinations.append({
'name': 'VoiceRanking-ExcludeMissing',
'filters': {},
'voice_ranking_filter': 'exclude-missing',
})
return combinations
def run_report(filters: dict, name: str | None = None, dry_run: bool = False, sl_threshold: int | None = None, voice_ranking_filter: str | None = None) -> bool:
"""
Run the report script with given filters.
Args:
filters: Dict of filter_name -> list of values
name: Name for this filter combination (used for .txt description file)
dry_run: If True, just print command without running
sl_threshold: If set, exclude respondents with >= N straight-lined question groups
voice_ranking_filter: If set, filter by voice ranking completeness.
'only-missing' keeps only respondents missing QID98 data,
'exclude-missing' removes them.
Returns:
True if successful, False otherwise
"""
cmd = [sys.executable, str(REPORT_SCRIPT)]
# Add filter-name for description file
if name:
cmd.extend(['--filter-name', name])
# Pass straight-liner threshold if specified
if sl_threshold is not None:
cmd.extend(['--sl-threshold', str(sl_threshold)])
# Pass voice ranking filter if specified
if voice_ranking_filter is not None:
cmd.extend(['--voice-ranking-filter', voice_ranking_filter])
for filter_name, values in filters.items():
if values:
cmd.extend([f'--{filter_name}', json.dumps(values)])
if dry_run:
print(f" Would run: {' '.join(cmd)}")
return True
try:
result = subprocess.run(
cmd,
capture_output=True,
text=True,
cwd=Path(__file__).parent
)
if result.returncode != 0:
print(f"\n ERROR: {result.stderr[:500]}")
return False
return True
except Exception as e:
print(f"\n ERROR: {e}")
return False
def main():
import argparse
parser = argparse.ArgumentParser(description='Run quant report for all filter combinations')
parser.add_argument('--dry-run', action='store_true', help='Preview combinations without running')
parser.add_argument(
'--category',
choices=['all_filters', 'all', 'age', 'gender', 'ethnicity', 'income', 'consumer', 'business_owner', 'ai_user', 'investable_assets', 'industry', 'voice_ranking'],
default='all_filters',
help='Filter category to run combinations for (default: all_filters)'
)
parser.add_argument('--sl-threshold', type=int, default=None, help='Exclude respondents who straight-lined >= N question groups (passed to report script)')
args = parser.parse_args()
# Load survey to get available filter options
print("Loading survey to get filter options...")
survey = QualtricsSurvey(RESULTS_FILE, QSF_FILE)
survey.load_data() # Populates options_* attributes
# Generate combinations for specified category
combinations = get_filter_combinations(survey, category=args.category)
category_desc = f" for category '{args.category}'" if args.category not in ('all', 'all_filters') else ''
print(f"Generated {len(combinations)} filter combinations{category_desc}")
if args.sl_threshold is not None:
print(f"Straight-liner threshold: excluding respondents with ≥{args.sl_threshold} straight-lined question groups")
if args.dry_run:
print("\nDRY RUN - Commands that would be executed:")
for combo in combinations:
print(f"\n{combo['name']}:")
run_report(combo['filters'], name=combo['name'], dry_run=True, sl_threshold=args.sl_threshold, voice_ranking_filter=combo.get('voice_ranking_filter'))
return
# Run each combination with progress bar
successful = 0
failed = []
for combo in tqdm(combinations, desc="Running reports", unit="filter"):
tqdm.write(f"Running: {combo['name']}")
if run_report(combo['filters'], name=combo['name'], sl_threshold=args.sl_threshold, voice_ranking_filter=combo.get('voice_ranking_filter')):
successful += 1
else:
failed.append(combo['name'])
# Summary
print(f"\n{'='*50}")
print(f"Completed: {successful}/{len(combinations)} successful")
if failed:
print(f"Failed: {', '.join(failed)}")
if __name__ == '__main__':
main()

File diff suppressed because one or more lines are too long


@@ -19,11 +19,32 @@ class ColorPalette:
# Neutral color for unhighlighted comparison items
NEUTRAL = "#D3D3D3" # Light Grey
# Character-specific colors (for individual character plots)
# Each character has a main color and a lighter highlight for original traits
CHARACTER_BANK_TELLER = "#004C6D" # Dark Blue
CHARACTER_BANK_TELLER_HIGHLIGHT = "#669BBC" # Light Steel Blue
CHARACTER_FAMILIAR_FRIEND = "#008493" # Teal
CHARACTER_FAMILIAR_FRIEND_HIGHLIGHT = "#A8DADC" # Pale Cyan
CHARACTER_COACH = "#5AAE95" # Sea Green
CHARACTER_COACH_HIGHLIGHT = "#A8DADC" # Pale Cyan
CHARACTER_PERSONAL_ASSISTANT = "#457B9D" # Steel Blue
CHARACTER_PERSONAL_ASSISTANT_HIGHLIGHT = "#669BBC" # Light Steel Blue
# General UI elements
TEXT = "black"
GRID = "lightgray"
BACKGROUND = "white"
# Statistical significance colors (for heatmaps/annotations)
SIG_STRONG = "#004C6D" # p < 0.001 - Dark Blue (highly significant)
SIG_MODERATE = "#0077B6" # p < 0.01 - Medium Blue (significant)
SIG_WEAK = "#5AAE95" # p < 0.05 - Sea Green (marginally significant)
SIG_NONE = "#E8E8E8" # p >= 0.05 - Light Grey (not significant)
SIG_DIAGONAL = "#FFFFFF" # White for diagonal (self-comparison)
# Extended palette for categorical charts (e.g., pie charts with many categories)
CATEGORICAL = [
"#0077B6", # PRIMARY - Medium Blue
@@ -38,6 +59,37 @@ class ColorPalette:
"#457B9D", # Steel Blue
]
# Gender-based colors (Male = Blue tones, Female = Pink tones)
# Primary colors by gender
GENDER_MALE = "#0077B6" # Medium Blue (same as PRIMARY)
GENDER_FEMALE = "#B6007A" # Medium Pink
# Ranking colors by gender (Darkest -> Lightest)
GENDER_MALE_RANK_1 = "#004C6D" # Dark Blue
GENDER_MALE_RANK_2 = "#0077B6" # Medium Blue
GENDER_MALE_RANK_3 = "#669BBC" # Light Steel Blue
GENDER_FEMALE_RANK_1 = "#6D004C" # Dark Pink
GENDER_FEMALE_RANK_2 = "#B6007A" # Medium Pink
GENDER_FEMALE_RANK_3 = "#BC669B" # Light Pink
# Neutral colors by gender (for non-highlighted items)
GENDER_MALE_NEUTRAL = "#B8C9D9" # Grey-Blue
GENDER_FEMALE_NEUTRAL = "#D9B8C9" # Grey-Pink
# Gender colors for correlation plots (green/red indicate +/- correlation)
# Male = darker shade, Female = lighter shade
CORR_MALE_POSITIVE = "#1B5E20" # Dark Green
CORR_FEMALE_POSITIVE = "#81C784" # Light Green
CORR_MALE_NEGATIVE = "#B71C1C" # Dark Red
CORR_FEMALE_NEGATIVE = "#E57373" # Light Red
# Speaking Style Colors (named after the style quadrant colors)
STYLE_GREEN = "#2E7D32" # Forest Green
STYLE_BLUE = "#1565C0" # Strong Blue
STYLE_ORANGE = "#E07A00" # Burnt Orange
STYLE_RED = "#C62828" # Deep Red
def jpmc_altair_theme():
"""JPMC brand theme for Altair charts."""

1347
utils.py

File diff suppressed because it is too large

1455
uv.lock generated

File diff suppressed because it is too large