Luigi Maiorano 6ba30ff041 add copilot instructions and rename classes

2026-02-02 17:21:57 +01:00

Altair Migration Plan: Plotly → Altair for QualtricsPlotsMixin

Date: January 28, 2026
Status: Not Started
Objective: Migrate all plotting methods from Plotly to Altair to solve filter annotation overlap issues and ensure proper Marimo reactivity.

Background

Problem

The current Plotly implementation has a critical layout issue: filter annotations overlap long, rotated x-axis labels. Plotly does not reserve layout space for annotations (no true bounding boxes), so elements overflow their assigned subplot areas.

Why Altair?

Better layout control - Vega-Lite (Altair's backend) properly calculates space for all elements
Marimo reactivity - Marimo documentation states reactive plots require Altair or Plotly; Altair is preferred
Clean separation - vconcat() creates true vertical stacking without overflow
Already installed - Altair >=6.0.0 is already a dependency (unused)

Current System Analysis

File Structure

plots.py - Contains QualtricsPlotsMixin class with 10 plotting methods
theme.py - Contains ColorPalette class with all styling constants
utils.py - Contains QualtricsSurvey class that mixes in QualtricsPlotsMixin

Color Palette (from theme.py)

class ColorPalette:
    PRIMARY = "#0077B6"    # Medium Blue
    RANK_1 = "#004C6D"     # Dark Blue
    RANK_2 = "#008493"     # Teal
    RANK_3 = "#5AAE95"     # Sea Green
    RANK_4 = "#9E9E9E"     # Grey
    NEUTRAL = "#D3D3D3"    # Light Grey
    TEXT = "black"
    GRID = "lightgray"
    BACKGROUND = "white"

Current Plot Methods Inventory

Method Name	Chart Type	Input Format	Special Features
`plot_average_scores_with_counts`	Vertical Bar	Wide DF (score columns)	Text inside bars (count)
`plot_top3_ranking_distribution`	Stacked Vertical Bar	Wide DF (rank values 1-3)	3-layer stack, legend
`plot_ranking_distribution`	Stacked Vertical Bar	Wide DF (rank values 1-4)	4-layer stack, legend
`plot_most_ranked_1`	Vertical Bar	Wide DF (ranking columns)	Top 3 highlighted
`plot_weighted_ranking_score`	Vertical Bar	`Character`, `Weighted Score`	Text inside bars
`plot_voice_selection_counts`	Vertical Bar	Comma-separated string col	Explode strings, Top 8 highlight
`plot_top3_selection_counts`	Vertical Bar	Comma-separated string col	Explode strings, Top 3 highlight
`plot_speaking_style_trait_scores`	Horizontal Bar	`Voice`, `score`, anchors	Text annotations at bottom
`plot_speaking_style_correlation`	Vertical Bar	Correlation data	Red/Green conditional
`plot_speaking_style_ranking_correlation`	Vertical Bar	Correlation data	Red/Green conditional

Filter System Components

_get_filter_slug() - Generates directory name from filters (e.g., Age-22to24_Gen-Man)
_get_filter_description() - Generates HTML text (e.g., Filters: Age: 22-24 Gender: Man)
_add_filter_footnote(fig) - Currently creates 2-row Plotly subplot, adds annotation
_save_plot(fig, title) - Adds footer, saves to figures/{slug}/{filename}.png
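
The slug format in the example above can be illustrated with a small standalone sketch. The function name `filter_slug`, the `ABBREVIATIONS` map, the dict-of-filters input, and the "All" fallback are assumptions made for illustration; the real `_get_filter_slug()` may work differently.

```python
# Hypothetical sketch: reproduces the example slug "Age-22to24_Gen-Man".
# The abbreviation map and the input shape are assumptions, not the project's code.
ABBREVIATIONS = {"Age": "Age", "Gender": "Gen"}


def filter_slug(filters: dict[str, str]) -> str:
    """Build a directory-safe slug from the active filters."""
    if not filters:
        return "All"  # assumed fallback name when no filter is active
    parts = []
    for name, value in filters.items():
        key = ABBREVIATIONS.get(name, name[:3])
        # "22-24" becomes "22to24" so '-' can separate key from value.
        parts.append(f"{key}-{value.replace('-', 'to')}")
    return "_".join(parts)


print(filter_slug({"Age": "22-24", "Gender": "Man"}))
```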

Common Styling Pattern

All plots use:

Height: 500px (default, can override)
Width: 1000px (default, can override)
Background: white
Grid: light gray
Font size: 11
X-axis: 45° rotated labels
Legends (where applicable): Horizontal, positioned above plot

Prerequisites

Dependencies to Add

uv add vl-convert-python  # For PNG export from Altair

Dependencies Already Present

altair>=6.0.0 ✅
polars>=1.37.1 ✅
pandas>=2.3.3 ✅

Migration Tasks

TASK 1: Create Altair Theme in theme.py

Location: /home/luigi/Documents/VoiceBranding/JPMC/Phase-3/theme.py

Action: Add an Altair theme function and register it at the end of the file.

Code to add:

def jpmc_altair_theme():
    """JPMC brand theme for Altair charts."""
    return {
        'config': {
            'view': {
                'continuousWidth': 1000,
                'continuousHeight': 500,
                'strokeWidth': 0
            },
            'background': ColorPalette.BACKGROUND,
            'axis': {
                'grid': True,
                'gridColor': ColorPalette.GRID,
                'labelAngle': -45,  # Default rotated labels
                'labelFontSize': 11,
                'titleFontSize': 12,
                'labelColor': ColorPalette.TEXT,
                'titleColor': ColorPalette.TEXT
            },
            'axisX': {
                'labelAngle': -45
            },
            'axisY': {
                'labelAngle': 0
            },
            'legend': {
                'orient': 'top',
                'direction': 'horizontal',
                'titleFontSize': 11,
                'labelFontSize': 11
            },
            'title': {
                'fontSize': 14,
                'color': ColorPalette.TEXT,
                'anchor': 'start'
            },
            'bar': {
                'color': ColorPalette.PRIMARY
            }
        }
    }

# Register theme (add at end of file)
# Note: alt.themes is deprecated since Altair 5.5; alt.theme is its replacement.
try:
    import altair as alt
    alt.theme.register('jpmc', enable=True)(jpmc_altair_theme)
except ImportError:
    pass  # Altair not installed

Verification:

Function jpmc_altair_theme() exists
Theme is registered as 'jpmc'
Theme is enabled by default
Import error is handled gracefully
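
The theme sets `labelAngle` in the generic `axis` block and again in `axisX` and `axisY`. In Vega-Lite config, the axis-specific blocks take precedence over the generic one. The sketch below is a simplified model of that precedence using plain dictionaries. It is not how Vega-Lite resolves config internally, and `effective_axis` is a hypothetical helper.

```python
# Simplified model of Vega-Lite config precedence: axis-specific blocks
# ('axisX', 'axisY') override the generic 'axis' block. Values mirror the theme above.
config = {
    "axis": {"labelAngle": -45, "labelFontSize": 11},
    "axisX": {"labelAngle": -45},
    "axisY": {"labelAngle": 0},
}


def effective_axis(config: dict, channel: str) -> dict:
    """Merge generic axis settings with the channel-specific overrides."""
    specific = config.get(f"axis{channel.upper()}", {})
    return {**config.get("axis", {}), **specific}


print(effective_axis(config, "x"))
print(effective_axis(config, "y"))
```

Under this model the `axisX` entry is redundant with `axis`, while `axisY` is what keeps y-axis labels horizontal.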

TASK 2: Update imports in plots.py

Location: /home/luigi/Documents/VoiceBranding/JPMC/Phase-3/plots.py (lines 1-8)

Action: Replace Plotly imports with Altair imports.

Current code:

import plotly.graph_objects as go
from plotly.subplots import make_subplots

Replace with:

import altair as alt

Keep these imports:

import re
from pathlib import Path
import polars as pl
from theme import ColorPalette

Verification:

import altair as alt present
No Plotly imports remain
All other imports unchanged

TASK 3: Rewrite `_add_filter_footnote` for Altair

Location: /home/luigi/Documents/VoiceBranding/JPMC/Phase-3/plots.py (currently lines ~120-212)

Action: Replace entire _add_filter_footnote method with Altair version.

New implementation:

def _add_filter_footnote(self, chart: alt.Chart) -> alt.Chart:
    """Add a footnote with active filters to the chart.
    
    Creates a vconcat with main chart on top and filter text chart below.
    Returns the combined chart (or original if no filters).
    """
    filter_text = self._get_filter_description()
    
    # Skip if no filters active - return original chart
    if not filter_text:
        return chart
    
    # Convert <br> to newlines first, then strip the remaining HTML tags
    # (Altair doesn't support HTML in mark_text)
    plain_text = re.sub(r'<br\s*/?>', '\n', filter_text, flags=re.IGNORECASE)
    plain_text = re.sub(r'<[^>]+>', '', plain_text)
    
    # Create a text-only chart for the footer
    # Use a dummy dataframe with one row
    import pandas as pd
    footer_df = pd.DataFrame([{'text': plain_text, 'x': 0, 'y': 0}])
    
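    footer_chart = alt.Chart(footer_df).mark_text(
        align='left',
        baseline='top',
        fontSize=9,
        color='gray',
        lineBreak='\n',  # Render newlines as separate lines
        dx=5,  # Small left padding
        dy=5   # Small top padding
    ).encode(
        x=alt.value(0),  # Pin to the left edge (default is the view center)
        y=alt.value(0),  # Pin to the top edge
        text='text:N'
    ).properties(
        height=60,  # Fixed height for footer
        # chart.width is alt.Undefined when unset, so check for a real number
        width=chart.width if isinstance(getattr(chart, 'width', None), (int, float)) else 1000
    )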
    
    # Combine with vconcat
    combined = alt.vconcat(chart, footer_chart, spacing=10)
    
    return combined

Verification:

Method signature changed from fig: go.Figure to chart: alt.Chart
Returns alt.Chart instead of go.Figure
Uses vconcat for vertical stacking
HTML tags are stripped from filter text
Footer has fixed height
Spacing between chart and footer is set
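
The order of the two text-cleaning steps matters: a generic tag-stripping regex also matches `<br>`, so line breaks must be converted before the remaining tags are removed. A minimal standalone sketch, with a made-up filter description (the real markup produced by `_get_filter_description()` may differ):

```python
import re


def html_to_plain(text: str) -> str:
    """Turn <br> into newlines first, then drop any remaining tags."""
    text = re.sub(r"<br\s*/?>", "\n", text, flags=re.IGNORECASE)
    return re.sub(r"<[^>]+>", "", text)


# Hypothetical filter description, for illustration only.
sample = "<b>Filters:</b> Age: 22-24<br>Gender: Man"
print(html_to_plain(sample))

# Stripping tags first would lose the line break entirely.
print(re.sub(r"<[^>]+>", "", sample))
```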

TASK 4: Rewrite `_save_plot` for Altair

Location: /home/luigi/Documents/VoiceBranding/JPMC/Phase-3/plots.py (currently lines ~214-234)

Action: Replace _save_plot method for Altair chart saving.

New implementation:

def _save_plot(self, chart: alt.Chart, title: str) -> alt.Chart:
    """Save chart to PNG file if fig_save_dir is set.
    
    Returns the (potentially modified) chart with filter footnote added.
    """
    # Add filter footnote - returns combined chart if filters active
    chart = self._add_filter_footnote(chart)

    if hasattr(self, 'fig_save_dir') and self.fig_save_dir:
        path = Path(self.fig_save_dir)
        
        # Add filter slug subfolder
        filter_slug = self._get_filter_slug()
        path = path / filter_slug
        
        path.mkdir(parents=True, exist_ok=True)
            
        filename = f"{self._sanitize_filename(title)}.png"
        
        # Save using vl-convert backend
        chart.save(str(path / filename), format='png', scale_factor=2.0)
    
    return chart

Verification:

Method signature changed from fig: go.Figure to chart: alt.Chart
Uses chart.save() instead of fig.write_image()
PNG format specified
Path handling unchanged (filter slug subdirectories)
Returns modified chart
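
The output location follows the pattern figures/{slug}/{filename}.png. The sketch below shows one way the path could be composed without touching the filesystem. `sanitize_filename` is a hypothetical stand-in for `_sanitize_filename`, whose actual rules are not shown in this plan, and `build_plot_path` is an illustrative helper.

```python
import re
from pathlib import Path


def sanitize_filename(title: str) -> str:
    """Hypothetical stand-in for _sanitize_filename: keep letters, digits, '-' and '_'."""
    cleaned = re.sub(r"[^A-Za-z0-9_-]+", "_", title)
    return cleaned.strip("_")


def build_plot_path(save_dir: str, filter_slug: str, title: str) -> Path:
    """Compose {save_dir}/{slug}/{filename}.png without creating anything on disk."""
    return Path(save_dir) / filter_slug / f"{sanitize_filename(title)}.png"


path = build_plot_path(
    "figures",
    "Age-22to24_Gen-Man",
    "Most Popular Choice\n(Number of Times Ranked 1st)",
)
print(path.as_posix())
```

Titles in this plan contain newlines and parentheses, so whatever sanitizer is used needs to handle both.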

TASK 5: Migrate `plot_average_scores_with_counts`

Location: /home/luigi/Documents/VoiceBranding/JPMC/Phase-3/plots.py (currently lines ~248-313)

Action: Rewrite using Altair bar chart with text overlay.

Current behavior:

Input: Wide DataFrame with score columns
Output: Vertical bar chart with average scores, count text inside bars

New implementation:

def plot_average_scores_with_counts(
    self,
    data: pl.LazyFrame | pl.DataFrame | None = None,
    title: str = "General Impression (1-10)\nPer Voice with Number of Participants Who Rated It",
    x_label: str = "Stimuli",
    y_label: str = "Average General Impression Rating (1-10)",
    color: str = ColorPalette.PRIMARY,
    height: int | None = None,
    width: int | None = None,
) -> alt.Chart:
    """Create a bar plot showing average scores and count of non-null values for each column."""
    df = self._ensure_dataframe(data)

    # Calculate stats for each column (exclude _recordId)
    stats = []
    for col in [c for c in df.columns if c != '_recordId']:
        avg_score = df[col].mean()
        non_null_count = df[col].drop_nulls().len()
        # Extract voice ID from column name
        label = col.split('__')[-1] if '__' in col else col
        stats.append({
            'voice': label,
            'average': avg_score,
            'count': non_null_count
        })

    # Convert to pandas for Altair (sort by average descending)
    stats_df = pl.DataFrame(stats).sort('average', descending=True).to_pandas()

    # Base bar chart
    bars = alt.Chart(stats_df).mark_bar(color=color).encode(
        x=alt.X('voice:N', title=x_label, sort='-y'),
        y=alt.Y('average:Q', title=y_label, scale=alt.Scale(domain=[0, 10])),
        tooltip=[
            alt.Tooltip('voice:N', title='Voice'),
            alt.Tooltip('average:Q', title='Average', format='.2f'),
            alt.Tooltip('count:Q', title='Count')
        ]
    )

    # Text overlay for counts
    text = alt.Chart(stats_df).mark_text(
        dy=-5,  # Slight offset above bar
        color='black',
        fontSize=10
    ).encode(
        x=alt.X('voice:N', sort='-y'),
        y=alt.Y('average:Q'),
        text=alt.Text('count:Q')
    )

    # Combine layers
    chart = (bars + text).properties(
        title=title,
        width=width or getattr(self, 'plot_width', 1000),
        height=height or getattr(self, 'plot_height', 500)
    )

    chart = self._save_plot(chart, title)
    return chart

Verification:

Returns alt.Chart instead of go.Figure
Data transformed to long format (pandas DataFrame)
Bar chart created with mark_bar()
Text overlay added with mark_text()
Layers combined with + operator
Sorting preserved (by average descending)
Y-axis scale set to [0, 10]
Tooltip includes voice, average, count
Width/height properties set
_save_plot called at end
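
The per-column statistics can be checked by hand with a dependency-free sketch. The column names and values below are made up; only the `prefix__label` naming pattern and the `_recordId` exclusion come from the code above. Like the Polars `mean()`, the sketch ignores missing values.

```python
from statistics import mean

# Toy wide table: one list per score column, None marks a missing rating.
wide = {
    "_recordId": ["r1", "r2", "r3"],
    "Voice_Scale__V14": [8, None, 6],
    "Voice_Scale__V04": [9, 7, 8],
}

stats = []
for col, values in wide.items():
    if col == "_recordId":
        continue
    present = [v for v in values if v is not None]
    stats.append({
        "voice": col.split("__")[-1] if "__" in col else col,
        "average": mean(present) if present else None,
        "count": len(present),
    })

# Sort by average, highest first (columns without ratings go last).
stats.sort(key=lambda row: (row["average"] is None, -(row["average"] or 0)))
print(stats)
```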

TASK 6: Migrate `plot_most_ranked_1`

Location: /home/luigi/Documents/VoiceBranding/JPMC/Phase-3/plots.py (currently lines ~537-613)

Action: Rewrite using Altair with conditional coloring (top 3 highlighted).

Current behavior:

Input: Wide DataFrame with ranking columns
Output: Vertical bar chart, top 3 bars in PRIMARY color, rest in NEUTRAL

New implementation:

def plot_most_ranked_1(
    self,
    data: pl.LazyFrame | pl.DataFrame | None = None,
    title: str = "Most Popular Choice\n(Number of Times Ranked 1st)",
    x_label: str = "Item",
    y_label: str = "Count of 1st Place Rankings",
    height: int | None = None,
    width: int | None = None,
) -> alt.Chart:
    """Create a bar chart showing which item was ranked #1 the most. Top 3 highlighted."""
    df = self._ensure_dataframe(data)

    stats = []
    ranking_cols = [c for c in df.columns if c != '_recordId']

    for col in ranking_cols:
        count_rank_1 = df.filter(pl.col(col) == 1).height
        # Clean label
        label = col.replace('Character_Ranking_', '').replace('Top_3_Voices_ranking__', '').replace('_', ' ').strip()
        stats.append({'item': label, 'count': count_rank_1})

    # Convert and sort
    stats_df = pl.DataFrame(stats).sort('count', descending=True)
    
    # Add rank column for coloring (1-3 vs 4+)
    stats_df = stats_df.with_row_index('rank_index')
    stats_df = stats_df.with_columns(
        pl.when(pl.col('rank_index') < 3)
        .then(pl.lit('Top 3'))
        .otherwise(pl.lit('Other'))
        .alias('category')
    ).to_pandas()

    # Bar chart with conditional color
    chart = alt.Chart(stats_df).mark_bar().encode(
        x=alt.X('item:N', title=x_label, sort='-y'),
        y=alt.Y('count:Q', title=y_label),
        color=alt.Color('category:N',
                       scale=alt.Scale(domain=['Top 3', 'Other'],
                                     range=[ColorPalette.PRIMARY, ColorPalette.NEUTRAL]),
                       legend=None),
        tooltip=[
            alt.Tooltip('item:N', title='Item'),
            alt.Tooltip('count:Q', title='1st Place Votes')
        ]
    ).properties(
        title=title,
        width=width or getattr(self, 'plot_width', 1000),
        height=height or getattr(self, 'plot_height', 500)
    )

    chart = self._save_plot(chart, title)
    return chart

Verification:

Returns alt.Chart
Counts rank 1 occurrences per column
Adds category column for top 3 vs others
Uses conditional color via alt.Color() with custom scale
Tooltip shows item and count
Sorted by count descending
Legend hidden (color is self-explanatory)
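
The top-3 highlighting reduces to sorting by count and tagging rows by position. A standalone sketch with made-up counts (`label_top_n` is a hypothetical helper, not part of the mixin):

```python
def label_top_n(counts: dict[str, int], n: int = 3) -> list[dict]:
    """Sort items by count (descending) and tag the first n as 'Top n'."""
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return [
        {"item": item, "count": count, "category": f"Top {n}" if index < n else "Other"}
        for index, (item, count) in enumerate(ranked)
    ]


# Made-up counts for illustration only.
rows = label_top_n({"A": 5, "B": 12, "C": 7, "D": 1, "E": 9})
print(rows)
```

Because the category depends on row position, items tied at the cutoff are split by sort order rather than grouped together.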

TASK 7: Migrate `plot_weighted_ranking_score`

Location: /home/luigi/Documents/VoiceBranding/JPMC/Phase-3/plots.py (currently lines ~615-662)

Action: Rewrite simple bar chart with text overlay.

Current behavior:

Input: DataFrame with Character and Weighted Score columns
Output: Vertical bar chart with score text inside bars

New implementation:

def plot_weighted_ranking_score(
    self,
    data: pl.LazyFrame | pl.DataFrame | None = None,
    title: str = "Weighted Popularity Score\n(1st=3pts, 2nd=2pts, 3rd=1pt)",
    x_label: str = "Character Personality",
    y_label: str = "Total Weighted Score",
    color: str = ColorPalette.PRIMARY,
    height: int | None = None,
    width: int | None = None,
) -> alt.Chart:
    """Create a bar chart showing the weighted ranking score for each character."""
    weighted_df = self._ensure_dataframe(data).to_pandas()

    # Bar chart
    bars = alt.Chart(weighted_df).mark_bar(color=color).encode(
        x=alt.X('Character:N', title=x_label),
        y=alt.Y('Weighted Score:Q', title=y_label),
        tooltip=[
            alt.Tooltip('Character:N'),
            alt.Tooltip('Weighted Score:Q', title='Score')
        ]
    )

    # Text overlay (anchored just below the bar top so white text stays inside the bar)
    text = bars.mark_text(
        baseline='top',
        dy=5,
        color='white',
        fontSize=11
    ).encode(
        text='Weighted Score:Q'
    )

    chart = (bars + text).properties(
        title=title,
        width=width or getattr(self, 'plot_width', 1000),
        height=height or getattr(self, 'plot_height', 500)
    )

    chart = self._save_plot(chart, title)
    return chart

Verification:

Returns alt.Chart
Uses input columns as-is (Character, Weighted Score)
Text overlay with white color inside bars
Tooltip shows character and score
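
This method expects the weighted scores to be computed upstream. The weighting in the default title (1st=3pts, 2nd=2pts, 3rd=1pt) could be applied as in the sketch below. The response structure and the character names are made up, and how the project actually builds this table is not covered by this plan.

```python
# Points per rank, taken from the chart title: 1st=3, 2nd=2, 3rd=1.
POINTS = {1: 3, 2: 2, 3: 1}

# Made-up rankings: one dict per respondent, mapping character -> rank.
responses = [
    {"Friendly": 1, "Expert": 2, "Calm": 3},
    {"Expert": 1, "Calm": 2, "Friendly": 3},
    {"Friendly": 1, "Calm": 2, "Expert": 3},
]

scores: dict[str, int] = {}
for response in responses:
    for character, rank in response.items():
        scores[character] = scores.get(character, 0) + POINTS.get(rank, 0)

# Rows shaped like the expected input: 'Character' and 'Weighted Score'.
rows = sorted(
    ({"Character": c, "Weighted Score": s} for c, s in scores.items()),
    key=lambda row: row["Weighted Score"],
    reverse=True,
)
print(rows)
```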

TASK 8: Migrate `plot_top3_ranking_distribution`

Location: /home/luigi/Documents/VoiceBranding/JPMC/Phase-3/plots.py (currently lines ~315-415)

Action: Rewrite stacked bar chart (3 ranks).

Current behavior:

Input: Wide DataFrame with ranking columns (values 1, 2, 3)
Output: Stacked bar chart with 3 layers (Rank 1, 2, 3), horizontal legend

New implementation:

def plot_top3_ranking_distribution(
    self,
    data: pl.LazyFrame | pl.DataFrame | None = None,
    title: str = "Top 3 Rankings Distribution\nCount of 1st, 2nd, and 3rd Place Votes per Voice",
    x_label: str = "Voices",
    y_label: str = "Number of Mentions in Top 3",
    height: int | None = None,
    width: int | None = None,
) -> alt.Chart:
    """Create a stacked bar chart showing how often each voice was ranked 1st, 2nd, or 3rd."""
    df = self._ensure_dataframe(data)

    # Calculate stats per column
    stats = []
    for col in [c for c in df.columns if c != '_recordId']:
        rank1 = df.filter(pl.col(col) == 1).height
        rank2 = df.filter(pl.col(col) == 2).height
        rank3 = df.filter(pl.col(col) == 3).height
        total = rank1 + rank2 + rank3

        if total > 0:
            label = col.split('__')[-1] if '__' in col else col
            # Add 3 rows (one per rank)
            stats.append({'voice': label, 'rank': 'Rank 1 (1st Choice)', 'count': rank1, 'total': total})
            stats.append({'voice': label, 'rank': 'Rank 2 (2nd Choice)', 'count': rank2, 'total': total})
            stats.append({'voice': label, 'rank': 'Rank 3 (3rd Choice)', 'count': rank3, 'total': total})

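    # Handle empty data (same guard as plot_ranking_distribution)
    if not stats:
        return alt.Chart().mark_text(text="No data")

    # Rows are already in long format; bar order is set by the x-encoding sort below
    stats_df = pl.DataFrame(stats).to_pandas()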
    
    # Create stacked bar chart
    chart = alt.Chart(stats_df).mark_bar().encode(
        x=alt.X('voice:N', title=x_label, sort=alt.EncodingSortField(field='total', op='sum', order='descending')),
        y=alt.Y('count:Q', title=y_label, stack='zero'),
        color=alt.Color('rank:N',
                       scale=alt.Scale(domain=['Rank 1 (1st Choice)', 'Rank 2 (2nd Choice)', 'Rank 3 (3rd Choice)'],
                                     range=[ColorPalette.RANK_1, ColorPalette.RANK_2, ColorPalette.RANK_3]),
                       legend=alt.Legend(orient='top', direction='horizontal', title=None)),
        tooltip=[
            alt.Tooltip('voice:N', title='Voice'),
            alt.Tooltip('rank:N', title='Rank'),
            alt.Tooltip('count:Q', title='Count')
        ]
    ).properties(
        title=title,
        width=width or getattr(self, 'plot_width', 1000),
        height=height or getattr(self, 'plot_height', 500)
    )

    chart = self._save_plot(chart, title)
    return chart

Verification:

Returns alt.Chart
Data converted to long format (one row per voice-rank combo)
Stacked with stack='zero' in y encoding
Custom color scale for 3 ranks
Sorted by total (sum of all ranks per voice)
Horizontal legend at top
Tooltip shows voice, rank, count
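
The wide-to-long reshaping is the step that makes stacking possible: each voice contributes one row per rank, and Altair stacks rows that share an x value. A dependency-free sketch with made-up data (column names follow the `Top_3_Voices_ranking__` prefix seen elsewhere in this plan):

```python
from collections import Counter

# Made-up wide data: each column holds the rank (1-3) a respondent gave, or None.
wide = {
    "Top_3_Voices_ranking__V14": [1, 2, None, 1],
    "Top_3_Voices_ranking__V04": [2, 1, 3, None],
    "Top_3_Voices_ranking__V08": [None, None, None, None],
}
RANK_LABELS = {1: "Rank 1 (1st Choice)", 2: "Rank 2 (2nd Choice)", 3: "Rank 3 (3rd Choice)"}

long_rows = []
for col, values in wide.items():
    tally = Counter(v for v in values if v in RANK_LABELS)
    total = sum(tally.values())
    if total == 0:
        continue  # voices never ranked in the top 3 are dropped
    voice = col.split("__")[-1]
    for rank, label in RANK_LABELS.items():
        long_rows.append({"voice": voice, "rank": label, "count": tally[rank], "total": total})

print(long_rows)
```

Repeating `total` on every row of a voice is what lets the x encoding sort by it.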

TASK 9: Migrate `plot_ranking_distribution`

Location: /home/luigi/Documents/VoiceBranding/JPMC/Phase-3/plots.py (currently lines ~417-536)

Action: Rewrite stacked bar chart (4 ranks) - very similar to Task 8.

Current behavior:

Input: Wide DataFrame with ranking columns (values 1, 2, 3, 4)
Output: Stacked bar chart with 4 layers

New implementation:

def plot_ranking_distribution(
    self,
    data: pl.LazyFrame | pl.DataFrame | None = None,
    title: str = "Rankings Distribution\n(1st to 4th Place)",
    x_label: str = "Item",
    y_label: str = "Number of Votes",
    height: int | None = None,
    width: int | None = None,
) -> alt.Chart:
    """Create a stacked bar chart showing the distribution of rankings (1st to 4th)."""
    df = self._ensure_dataframe(data)

    stats = []
    ranking_cols = [c for c in df.columns if c != '_recordId']

    for col in ranking_cols:
        r1 = df.filter(pl.col(col) == 1).height
        r2 = df.filter(pl.col(col) == 2).height
        r3 = df.filter(pl.col(col) == 3).height
        r4 = df.filter(pl.col(col) == 4).height
        total = r1 + r2 + r3 + r4

        if total > 0:
            label = col.replace('Character_Ranking_', '').replace('Top_3_Voices_ranking__', '').replace('_', ' ').strip()
            stats.append({'item': label, 'rank': 'Rank 1 (Best)', 'count': r1, 'rank1': r1})
            stats.append({'item': label, 'rank': 'Rank 2', 'count': r2, 'rank1': r1})
            stats.append({'item': label, 'rank': 'Rank 3', 'count': r3, 'rank1': r1})
            stats.append({'item': label, 'rank': 'Rank 4 (Worst)', 'count': r4, 'rank1': r1})

    if not stats:
        return alt.Chart().mark_text(text="No data")

    stats_df = pl.DataFrame(stats).to_pandas()

    chart = alt.Chart(stats_df).mark_bar().encode(
        x=alt.X('item:N', title=x_label, sort=alt.EncodingSortField(field='rank1', order='descending')),
        y=alt.Y('count:Q', title=y_label, stack='zero'),
        color=alt.Color('rank:N',
                       scale=alt.Scale(domain=['Rank 1 (Best)', 'Rank 2', 'Rank 3', 'Rank 4 (Worst)'],
                                     range=[ColorPalette.RANK_1, ColorPalette.RANK_2, ColorPalette.RANK_3, ColorPalette.RANK_4]),
                       legend=alt.Legend(orient='top', direction='horizontal', title=None)),
        tooltip=[
            alt.Tooltip('item:N', title='Item'),
            alt.Tooltip('rank:N', title='Rank'),
            alt.Tooltip('count:Q', title='Count')
        ]
    ).properties(
        title=title,
        width=width or getattr(self, 'plot_width', 1000),
        height=height or getattr(self, 'plot_height', 500)
    )

    chart = self._save_plot(chart, title)
    return chart

Verification:

Returns alt.Chart
4 ranks supported
Sorted by Rank 1 count (added rank1 field for sorting)
Custom color scale for 4 ranks
Empty data handled (returns text mark)

TASK 10: Migrate `plot_voice_selection_counts`

Location: /home/luigi/Documents/VoiceBranding/JPMC/Phase-3/plots.py (currently lines ~664-737)

Action: Rewrite with Polars data transformation + conditional coloring.

Current behavior:

Input: DataFrame with comma-separated string column (8_Combined)
Process: Split strings, explode, count occurrences
Output: Bar chart, top 8 bars in PRIMARY, rest in NEUTRAL

New implementation:

def plot_voice_selection_counts(
    self,
    data: pl.LazyFrame | pl.DataFrame | None = None,
    target_column: str = "8_Combined",
    title: str = "Most Frequently Chosen Voices\n(Top 8 Highlighted)",
    x_label: str = "Voice",
    y_label: str = "Number of Times Chosen",
    height: int | None = None,
    width: int | None = None,
) -> alt.Chart:
    """Create a bar plot showing the frequency of voice selections."""
    df = self._ensure_dataframe(data)

    if target_column not in df.columns:
        return alt.Chart().mark_text(text=f"Column '{target_column}' not found")

    # Process data: split, explode, count
    stats_df = (
        df.select(pl.col(target_column))
        .drop_nulls()
        .with_columns(pl.col(target_column).str.split(","))
        .explode(target_column)
        .with_columns(pl.col(target_column).str.strip_chars())
        .filter(pl.col(target_column) != "")
        .group_by(target_column)
        .agg(pl.len().alias("count"))
        .sort("count", descending=True)
        .with_row_index('rank_index')
        .with_columns(
            pl.when(pl.col('rank_index') < 8)
            .then(pl.lit('Top 8'))
            .otherwise(pl.lit('Other'))
            .alias('category')
        )
        .to_pandas()
    )

    chart = alt.Chart(stats_df).mark_bar().encode(
        x=alt.X(f'{target_column}:N', title=x_label, sort='-y'),
        y=alt.Y('count:Q', title=y_label),
        color=alt.Color('category:N',
                       scale=alt.Scale(domain=['Top 8', 'Other'],
                                     range=[ColorPalette.PRIMARY, ColorPalette.NEUTRAL]),
                       legend=None),
        tooltip=[
            alt.Tooltip(f'{target_column}:N', title='Voice'),
            alt.Tooltip('count:Q', title='Selections')
        ]
    ).properties(
        title=title,
        width=width or getattr(self, 'plot_width', 1000),
        height=height or getattr(self, 'plot_height', 500)
    )

    chart = self._save_plot(chart, title)
    return chart

Verification:

Returns alt.Chart
Polars chain: split → explode → strip → group → count
Top 8 categorization logic correct
Conditional coloring applied
Sorted by count descending
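
The split, explode, strip, and count chain can be sanity-checked against a plain-Python equivalent. The sample cells are made up and include the edge cases the Polars chain guards against (nulls, stray spaces, trailing commas, empty strings).

```python
from collections import Counter

# Made-up responses: each cell is a comma-separated list of chosen voices.
cells = ["V14, V04,V08", None, "V04,  V14", "V04,", ""]

counts = Counter(
    voice.strip()
    for cell in cells
    if cell is not None           # drop_nulls
    for voice in cell.split(",")  # split + explode
    if voice.strip()              # strip + drop empty strings
)

# most_common() sorts by count, highest first.
print(counts.most_common())
```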

TASK 11: Migrate `plot_top3_selection_counts`

Location: /home/luigi/Documents/VoiceBranding/JPMC/Phase-3/plots.py (currently lines ~739-808)

Action: Identical to Task 10, but default column is 3_Ranked and top 3 highlighted.

New implementation:

def plot_top3_selection_counts(
    self,
    data: pl.LazyFrame | pl.DataFrame | None = None,
    target_column: str = "3_Ranked",
    title: str = "Most Frequently Chosen Top 3 Voices\n(Top 3 Highlighted)",
    x_label: str = "Voice",
    y_label: str = "Count of Mentions in Top 3",
    height: int | None = None,
    width: int | None = None,
) -> alt.Chart:
    """Question: Which 3 voices are chosen the most out of 18?"""
    df = self._ensure_dataframe(data)

    if target_column not in df.columns:
        return alt.Chart().mark_text(text=f"Column '{target_column}' not found")

    stats_df = (
        df.select(pl.col(target_column))
        .drop_nulls()
        .with_columns(pl.col(target_column).str.split(","))
        .explode(target_column)
        .with_columns(pl.col(target_column).str.strip_chars())
        .filter(pl.col(target_column) != "")
        .group_by(target_column)
        .agg(pl.len().alias("count"))
        .sort("count", descending=True)
        .with_row_index('rank_index')
        .with_columns(
            pl.when(pl.col('rank_index') < 3)
            .then(pl.lit('Top 3'))
            .otherwise(pl.lit('Other'))
            .alias('category')
        )
        .to_pandas()
    )

    chart = alt.Chart(stats_df).mark_bar().encode(
        x=alt.X(f'{target_column}:N', title=x_label, sort='-y'),
        y=alt.Y('count:Q', title=y_label),
        color=alt.Color('category:N',
                       scale=alt.Scale(domain=['Top 3', 'Other'],
                                     range=[ColorPalette.PRIMARY, ColorPalette.NEUTRAL]),
                       legend=None),
        tooltip=[
            alt.Tooltip(f'{target_column}:N', title='Voice'),
            alt.Tooltip('count:Q', title='In Top 3')
        ]
    ).properties(
        title=title,
        width=width or getattr(self, 'plot_width', 1000),
        height=height or getattr(self, 'plot_height', 500)
    )

    chart = self._save_plot(chart, title)
    return chart

Verification:

Default target_column is "3_Ranked"
Top 3 categorization (not top 8)
Otherwise identical to Task 10

TASK 12: Migrate `plot_speaking_style_trait_scores`

Location: /home/luigi/Documents/VoiceBranding/JPMC/Phase-3/plots.py (currently lines ~810-926)

Action: Rewrite horizontal bar chart with text annotations.

Current behavior:

Input: DataFrame with Voice, score, Left_Anchor, Right_Anchor columns
Output: Horizontal bar chart with anchor labels at bottom

New implementation:

def plot_speaking_style_trait_scores(
    self,
    data: pl.LazyFrame | pl.DataFrame | None = None,
    trait_description: str | None = None,
    left_anchor: str | None = None,
    right_anchor: str | None = None,
    title: str = "Speaking Style Trait Analysis",
    height: int | None = None,
    width: int | None = None,
) -> alt.Chart:
    """Plot scores for a single speaking style trait across multiple voices."""
    df = self._ensure_dataframe(data)

    if df.is_empty():
        return alt.Chart().mark_text(text="No data")
        
    required_cols = ["Voice", "score"]
    if not all(col in df.columns for col in required_cols):
        return alt.Chart().mark_text(text="Missing required columns")

    # Calculate stats: Mean, Count
    stats = (
        df.filter(pl.col("score").is_not_null())
        .group_by("Voice")
        .agg([
            pl.col("score").mean().alias("mean_score"),
            pl.col("score").count().alias("count")
        ])
        .sort("mean_score", descending=False)  # Ascending here; display order is set by sort='-x' on the y encoding
        .to_pandas()
    )

    # Extract anchors from data if not provided
    if (left_anchor is None or right_anchor is None) and "Left_Anchor" in df.columns:
        head = df.filter(pl.col("Left_Anchor").is_not_null()).head(1)
        if not head.is_empty():
            if left_anchor is None:
                left_anchor = head["Left_Anchor"][0]
            if right_anchor is None:
                right_anchor = head["Right_Anchor"][0]

    if trait_description is None:
        if left_anchor and right_anchor:
            trait_description = f"{left_anchor.split('|')[0]} vs. {right_anchor.split('|')[0]}"
        elif "Description" in df.columns:
            head = df.filter(pl.col("Description").is_not_null()).head(1)
            trait_description = head["Description"][0] if not head.is_empty() else ""
        else:
            trait_description = ""

    # Horizontal bar chart
    bars = alt.Chart(stats).mark_bar(color=ColorPalette.PRIMARY).encode(
        x=alt.X('mean_score:Q', title='Average Score (1-5)', scale=alt.Scale(domain=[1, 5])),
        y=alt.Y('Voice:N', title='Voice', sort='-x'),
        tooltip=[
            alt.Tooltip('Voice:N'),
            alt.Tooltip('mean_score:Q', title='Average', format='.2f'),
            alt.Tooltip('count:Q', title='Count')
        ]
    )

    # Count text inside bars (right-aligned just inside the bar end)
    text = bars.mark_text(
        align='right',
        baseline='middle',
        dx=-8,
        color='white',
        fontSize=16
    ).encode(
        text='count:Q'
    )

    # Combine
    chart = (bars + text).properties(
        title={
            "text": title,
            "subtitle": [trait_description, "(Numbers on bars indicate respondent count)"]
        },
        width=width or getattr(self, 'plot_width', 1000),
        height=height or getattr(self, 'plot_height', 500)
    )

    # Note: Anchor annotations at bottom would require separate text marks
    # positioned at fixed coordinates - can add if needed

    chart = self._save_plot(chart, title)
    return chart

Verification:

Returns alt.Chart
Horizontal orientation (x=score, y=voice)
X-axis domain set to [1, 5]
Count text displayed inside bars (white, large font)
Title includes subtitle with trait description
Sorted by mean score (ascending for bottom-to-top)
Anchor label annotations (optional - commented in code)

TASK 13: Migrate `plot_speaking_style_correlation`

Location: /home/luigi/Documents/VoiceBranding/JPMC/Phase-3/plots.py (currently lines ~928-1018)

Action: Rewrite with red/green conditional coloring based on sign.

Current behavior:

Input: DataFrame with correlation data
Process: Calculate Pearson correlation per trait
Output: Bar chart, positive correlations green, negative red

New implementation:

def plot_speaking_style_correlation(
    self,
    style_color: str,
    style_traits: list[str],
    data: pl.LazyFrame | pl.DataFrame | None = None,
    title: str | None = None,
) -> alt.Chart:
    """Plots correlation between Speaking Style Trait Scores (1-5) and Voice Scale (1-10)."""
    df = self._ensure_dataframe(data)

    if title is None:
        title = f"Speaking style {style_color} and voice scale 1-10 correlations"

    trait_correlations = []
    
    # Calculate correlations
    for i, trait in enumerate(style_traits):
        subset = df.filter(pl.col("Right_Anchor") == trait)
        valid_data = subset.select(["score", "Voice_Scale_Score"]).drop_nulls()
        
        if valid_data.height > 1:
            corr_val = valid_data.select(pl.corr("score", "Voice_Scale_Score")).item()
            # Handle trait text - wrap at '|' for display
            trait_display = trait.replace('|', '\n')
            trait_correlations.append({
                "trait_display": trait_display,
                "trait_index": f"Trait {i+1}",
                "correlation": corr_val if corr_val is not None else 0.0
            })
    
    if not trait_correlations:
        return alt.Chart().mark_text(text=f"No data for {style_color} Style")
        
    plot_df = pl.DataFrame(trait_correlations).to_pandas()

    # Conditional color based on sign
    chart = alt.Chart(plot_df).mark_bar().encode(
        x=alt.X('trait_display:N', title=None, axis=alt.Axis(labelAngle=0)),
        y=alt.Y('correlation:Q', title='Correlation', scale=alt.Scale(domain=[-1, 1])),
        color=alt.condition(
            alt.datum.correlation >= 0,
            alt.value('green'),
            alt.value('red')
        ),
        tooltip=[
            alt.Tooltip('trait_display:N', title='Trait'),
            alt.Tooltip('correlation:Q', format='.2f')
        ]
    ).properties(
        title=title,
        width=1000,
        height=400
    )

    chart = self._save_plot(chart, title)
    return chart

Verification:

Returns alt.Chart
Pearson correlation calculated via pl.corr()
Conditional coloring: green if positive, red if negative
Y-axis domain [-1, 1]
Trait text wrapped at '|' for display
Tooltip shows trait and correlation value

TASK 14: Migrate `plot_speaking_style_ranking_correlation`

Location: /home/luigi/Documents/VoiceBranding/JPMC/Phase-3/plots.py (currently lines ~1020-1105)

Action: Almost identical to Task 13, but correlates with Ranking_Points instead of Voice_Scale_Score.

New implementation:

def plot_speaking_style_ranking_correlation(
    self,
    style_color: str,
    style_traits: list[str],
    data: pl.LazyFrame | pl.DataFrame | None = None,
    title: str | None = None,
) -> alt.Chart:
    """Plots correlation between Speaking Style Trait Scores (1-5) and Voice Ranking Points (0-3)."""
    df = self._ensure_dataframe(data)

    if title is None:
        title = f"Speaking style {style_color} and voice ranking points correlations"
    
    trait_correlations = []
    
    for i, trait in enumerate(style_traits):
        subset = df.filter(pl.col("Right_Anchor") == trait)
        valid_data = subset.select(["score", "Ranking_Points"]).drop_nulls()
        
        if valid_data.height > 1:
            corr_val = valid_data.select(pl.corr("score", "Ranking_Points")).item()
            trait_display = trait.replace('|', '\n')
            trait_correlations.append({
                "trait_display": trait_display,
                "trait_index": f"Trait {i+1}",
                "correlation": corr_val if corr_val is not None else 0.0
            })
    
    if not trait_correlations:
        return alt.Chart().mark_text(text=f"No data for {style_color} Style")
        
    plot_df = pl.DataFrame(trait_correlations).to_pandas()

    chart = alt.Chart(plot_df).mark_bar().encode(
        x=alt.X('trait_display:N', title=None, axis=alt.Axis(labelAngle=0)),
        y=alt.Y('correlation:Q', title='Correlation', scale=alt.Scale(domain=[-1, 1])),
        color=alt.condition(
            alt.datum.correlation >= 0,
            alt.value('green'),
            alt.value('red')
        ),
        tooltip=[
            alt.Tooltip('trait_display:N', title='Trait'),
            alt.Tooltip('correlation:Q', format='.2f')
        ]
    ).properties(
        title=title,
        width=1000,
        height=400
    )

    chart = self._save_plot(chart, title)
    return chart

Verification:

Returns alt.Chart
Uses Ranking_Points column instead of Voice_Scale_Score
Otherwise identical to Task 13

TASK 15: Install vl-convert-python

Action: Add vl-convert-python to project dependencies for PNG export.

Command:

cd /home/luigi/Documents/VoiceBranding/JPMC/Phase-3
uv add vl-convert-python

Verification:

vl-convert-python appears in pyproject.toml dependencies
Installation successful (no errors)

TASK 16: Remove Plotly dependencies (optional cleanup)

Action: Remove unused Plotly packages.

Command:

cd /home/luigi/Documents/VoiceBranding/JPMC/Phase-3
uv remove plotly kaleido

Verification:

plotly and kaleido removed from pyproject.toml
No other code depends on Plotly (grep check)
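
The grep check above could also be scripted so it can run as part of the test notebook. The following is a minimal sketch using only the standard library; the function names and the choice to skip a `.venv` directory are assumptions, not part of the project.

```python
import re
from pathlib import Path

# Matches "import plotly", "import plotly.graph_objects as go" and
# "from plotly... import ...", but not commented-out lines or "plotlyx".
PLOTLY_IMPORT = re.compile(r"^\s*(?:import|from)\s+plotly(?:\.|\s|$)", re.MULTILINE)


def has_plotly_import(source: str) -> bool:
    """Return True if the given Python source text imports plotly."""
    return PLOTLY_IMPORT.search(source) is not None


def files_importing_plotly(root: str = ".") -> list[Path]:
    """List .py files under root that still import plotly."""
    hits = []
    for path in Path(root).rglob("*.py"):
        # Assumption: installed packages live in .venv and should be ignored.
        if ".venv" in path.parts:
            continue
        if has_plotly_import(path.read_text(encoding="utf-8", errors="ignore")):
            hits.append(path)
    return sorted(hits)
```

An empty list from `files_importing_plotly()` run at the project root would suggest the packages are safe to remove.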

TASK 17: Test all plot methods in Marimo notebook

Action: Create a test notebook to verify all 10 plotting methods work correctly.

Test checklist per plot:

Chart renders without errors
Chart has correct dimensions (width/height)
Colors match ColorPalette constants
Data is displayed correctly (bars, stacks, etc.)
Text overlays render (counts, scores)
Tooltips show correct information
Filter annotation appears below chart (if filters active)
PNG export works (check figures/ directory)
No overlap between chart elements and filter text

Create test file: /home/luigi/Documents/VoiceBranding/JPMC/Phase-3/test_altair_migration.py

Test template:

import marimo as mo
import polars as pl
from utils import QualtricsSurvey

# Load sample data
survey = QualtricsSurvey()
survey.load_data('path/to/data')
survey.fig_save_dir = 'figures/altair_test'

# Test each plot method
mo.md("## Testing Altair Migration")

# 1. Test plot_average_scores_with_counts
chart1 = survey.plot_average_scores_with_counts(...)
chart1

# 2. Test plot_most_ranked_1
chart2 = survey.plot_most_ranked_1(...)
chart2

# ... repeat for all 10 methods

Final Verification Checklist

After completing all tasks, verify the following:

Code Quality

No Plotly imports remain in plots.py
All methods return alt.Chart instead of go.Figure
No syntax errors (python -m py_compile plots.py)
Type hints updated (if any reference go.Figure)
Docstrings updated (if any mention Plotly)

Theme & Styling

jpmc_altair_theme() function exists in theme.py
Theme is registered and enabled
All charts use ColorPalette constants
Chart dimensions match original (width=1000, height=500 defaults)
Font sizes match original (11pt for labels, 14pt for titles)

Data Handling

All methods handle empty data gracefully
Wide-to-long transformations correct (stacked bars, selection counts)
Sorting preserved (by average, count, rank1, etc.)
Column filtering works (_recordId excluded)
String processing works (comma-split, strip, explode)

Visual Features

Bar charts render correctly (vertical and horizontal)
Stacked bars have correct layer order
Text overlays positioned correctly (inside/outside bars)
Conditional coloring works (top N highlighting, red/green by sign)
Tooltips show correct fields with proper formatting
Legends positioned correctly (top horizontal for stacked bars)
X-axis labels rotated at -45° by default
Grid lines visible

Filter System

_get_filter_slug() unchanged (still works)
_get_filter_description() unchanged (still works)
_add_filter_footnote() uses vconcat approach
Filter text appears at bottom of combined chart
No overlap between chart and filter text
Filter text is left-aligned
HTML tags stripped from filter text (Altair doesn't support HTML)
Filter subdirectories created correctly
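
The HTML-stripping item above can be illustrated with a small standard-library sketch. The real output format of `_get_filter_description()` is not shown in this section, so the tags handled here (`<br>`, `<b>`) are assumptions based on what Plotly annotations commonly contain, and `strip_html` is a hypothetical name.

```python
import html
import re

BR_TAG = re.compile(r"<br\s*/?>", re.IGNORECASE)  # <br>, <br/>, <BR />
ANY_TAG = re.compile(r"<[^>]+>")                  # any remaining tag


def strip_html(text: str) -> str:
    """Convert Plotly-style HTML annotation text to plain text (sketch)."""
    text = BR_TAG.sub("\n", text)   # keep line breaks as newlines
    text = ANY_TAG.sub("", text)    # drop every other tag
    return html.unescape(text).strip()  # decode entities such as &amp;
```

One caveat of this regex approach: a filter value that contains both a literal `<` and a later `>` would be treated as a tag and removed, so filter text with comparison operators would need a stricter pattern.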

PNG Export

vl-convert-python installed
chart.save() method works
PNG files created in correct subdirectories
PNG files have correct filenames (sanitized titles)
Image quality acceptable (scale_factor=2.0)
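
The "PNG files created" check above could be automated by confirming that every exported file starts with the 8-byte PNG signature. This is a sketch under the assumption that exports land somewhere below a single root directory such as figures/; `invalid_pngs` is a hypothetical helper name.

```python
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every valid PNG file


def invalid_pngs(root: str) -> list[Path]:
    """Return .png files under root that are empty or lack the PNG signature."""
    bad = []
    for path in sorted(Path(root).rglob("*.png")):
        with path.open("rb") as fh:
            if fh.read(8) != PNG_SIGNATURE:
                bad.append(path)
    return bad
```

After running all plot methods, `invalid_pngs("figures")` returning an empty list indicates that each export at least produced a well-formed PNG header; it does not verify image content or resolution.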

Marimo Integration

Charts render in Marimo notebooks
Charts are reactive (update when data changes)
No JavaScript console errors
Interactive features work (tooltips, pan, zoom if enabled)

All 10 Plot Methods

plot_average_scores_with_counts - vertical bar + text
plot_top3_ranking_distribution - stacked bar (3 ranks)
plot_ranking_distribution - stacked bar (4 ranks)
plot_most_ranked_1 - vertical bar + conditional color
plot_weighted_ranking_score - vertical bar + text
plot_voice_selection_counts - vertical bar + conditional color
plot_top3_selection_counts - vertical bar + conditional color
plot_speaking_style_trait_scores - horizontal bar + text
plot_speaking_style_correlation - vertical bar + red/green
plot_speaking_style_ranking_correlation - vertical bar + red/green

Edge Cases

Empty DataFrame handled gracefully
Missing columns detected and reported
Zero counts/values don't break charts
Single data point renders correctly
Very long labels don't cause layout issues
Many categories don't cause overcrowding

Regression Testing

Existing Marimo notebooks still work
Data filtering still works (filter_data())
QualtricsSurvey class initialization unchanged
No breaking changes to public API

Documentation

This migration plan marked as "Complete"
Any new dependencies documented
Any breaking changes documented
Example usage updated (if applicable)

Troubleshooting

Issue: Charts don't render

Check Altair version: python -c "import altair; print(altair.__version__)"
Check vl-convert: python -c "import vl_convert; print(vl_convert.__version__)"
Check for JavaScript errors in browser console

Issue: PNG export fails

Verify vl-convert-python is installed: uv pip show vl-convert-python (or pip show vl-convert-python if the environment is not managed by uv)
Check write permissions on figures/ directory
Try saving as HTML first: chart.save('test.html')

Issue: Colors don't match theme

Verify theme is enabled: print(alt.theme.active) (use alt.themes.active on Altair versions before 5.5)
Check color scale definitions in each plot method
Ensure ColorPalette imported correctly

Issue: Filter text overlaps chart

Increase spacing parameter in vconcat(chart, footer, spacing=20)
Increase footer chart height property
Check if footer chart is actually created (debug with print())

Issue: Data not displaying

Check DataFrame format (wide vs long)
Verify column names match encoding specs
Check for null values (.drop_nulls())
Print intermediate DataFrames for debugging

Notes

Backup: Before starting, create a backup of plots.py: cp plots.py plots.py.plotly_backup
Incremental testing: Test each plot method immediately after migration
Marimo restart: May need to restart Marimo kernel after major changes
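Performance: By default Altair raises MaxRowsError when a chart embeds more than 5,000 rows. The plot methods here aggregate with Polars before plotting, so this should rarely trigger; if it does, aggregate or .sample() first, or call alt.data_transformers.disable_max_rows()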

Completion Status

All tasks (1-17) completed
All verification checks passed
Existing notebooks tested and working
Migration documented
Ready for production use

Migration completed on: _________________
Tested by: _________________
Sign-off: _________________
