Altair Migration Plan: Plotly → Altair for QualtricsPlotsMixin

Date: January 28, 2026
Status: Not Started
Objective: Migrate all plotting methods from Plotly to Altair to solve filter annotation overlap issues and ensure proper Marimo reactivity.


Background

Problem

The current Plotly implementation has a critical layout issue: filter annotations overlap long, rotated x-axis labels. Plotly does not reserve a true bounding box for each element, so annotations and tick labels can overflow their assigned subplot areas.

Why Altair?

  1. Better layout control - Vega-Lite (Altair's backend) properly calculates space for all elements
  2. Marimo reactivity - Marimo documentation states reactive plots require Altair or Plotly; Altair is preferred
  3. Clean separation - vconcat() creates true vertical stacking without overflow
  4. Already installed - Altair >=6.0.0 is already a dependency (unused)

Current System Analysis

File Structure

  • plots.py - Contains QualtricsPlotsMixin class with 10 plotting methods
  • theme.py - Contains ColorPalette class with all styling constants
  • utils.py - Contains QualtricsSurvey class that mixes in QualtricsPlotsMixin

Color Palette (from theme.py)

class ColorPalette:
    PRIMARY = "#0077B6"    # Medium Blue
    RANK_1 = "#004C6D"     # Dark Blue
    RANK_2 = "#008493"     # Teal
    RANK_3 = "#5AAE95"     # Sea Green
    RANK_4 = "#9E9E9E"     # Grey
    NEUTRAL = "#D3D3D3"    # Light Grey
    TEXT = "black"
    GRID = "lightgray"
    BACKGROUND = "white"

Current Plot Methods Inventory

| Method Name | Chart Type | Input Format | Special Features |
|---|---|---|---|
| plot_average_scores_with_counts | Vertical Bar | Wide DF (score columns) | Text inside bars (count) |
| plot_top3_ranking_distribution | Stacked Vertical Bar | Wide DF (rank values 1-3) | 3-layer stack, legend |
| plot_ranking_distribution | Stacked Vertical Bar | Wide DF (rank values 1-4) | 4-layer stack, legend |
| plot_most_ranked_1 | Vertical Bar | Wide DF (ranking columns) | Top 3 highlighted |
| plot_weighted_ranking_score | Vertical Bar | Character, Weighted Score | Text inside bars |
| plot_voice_selection_counts | Vertical Bar | Comma-separated string col | Explode strings, Top 8 highlight |
| plot_top3_selection_counts | Vertical Bar | Comma-separated string col | Explode strings, Top 3 highlight |
| plot_speaking_style_trait_scores | Horizontal Bar | Voice, score, anchors | Text annotations at bottom |
| plot_speaking_style_correlation | Vertical Bar | Correlation data | Red/Green conditional |
| plot_speaking_style_ranking_correlation | Vertical Bar | Correlation data | Red/Green conditional |

Filter System Components

  1. _get_filter_slug() - Generates directory name from filters (e.g., Age-22to24_Gen-Man)
  2. _get_filter_description() - Generates HTML text (e.g., Filters: <b>Age:</b> 22-24<br><b>Gender:</b> Man)
  3. _add_filter_footnote(fig) - Currently creates 2-row Plotly subplot, adds annotation
  4. _save_plot(fig, title) - Adds footer, saves to figures/{slug}/{filename}.png
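
The plan does not show how the slug and description are built, so the following is only a sketch that reproduces the two examples above. The function names, the three-letter key abbreviation, the "to" range rule, and the `no_filters` fallback are all assumptions made for illustration, not the project's actual helpers.

```python
import re


def filter_slug(filters: dict) -> str:
    """Hypothetical slug builder (not the project's _get_filter_slug)."""
    if not filters:
        return "no_filters"  # assumed fallback name, not taken from the plan
    parts = []
    for key, value in filters.items():
        abbrev = key[:3]  # assumption: "Gender" -> "Gen", "Age" -> "Age"
        # Numeric ranges such as "22-24" become "22to24".
        value = re.sub(r"(\d)\s*-\s*(\d)", r"\1to\2", value)
        # Drop anything that is not safe in a directory name.
        value = re.sub(r"[^A-Za-z0-9]+", "", value)
        parts.append(f"{abbrev}-{value}")
    return "_".join(parts)


def filter_description(filters: dict) -> str:
    """Hypothetical HTML description builder (not _get_filter_description)."""
    if not filters:
        return ""  # empty string lets callers skip the footnote
    body = "<br>".join(f"<b>{key}:</b> {value}" for key, value in filters.items())
    return f"Filters: {body}"


active = {"Age": "22-24", "Gender": "Man"}
print(filter_slug(active))
print(filter_description(active))
```

Returning an empty description when no filters are active matches how `_add_filter_footnote` later in this plan decides whether to add a footer at all.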

Common Styling Pattern

All plots use:

  • Height: 500px (default, can override)
  • Width: 1000px (default, can override)
  • Background: white
  • Grid: light gray
  • Font size: 11
  • X-axis: 45° rotated labels
  • Legends (where applicable): Horizontal, positioned above plot

Prerequisites

Dependencies to Add

uv add vl-convert-python  # For PNG export from Altair

Dependencies Already Present

  • altair>=6.0.0
  • polars>=1.37.1
  • pandas>=2.3.3

Migration Tasks

TASK 1: Create Altair Theme in theme.py

Location: /home/luigi/Documents/VoiceBranding/JPMC/Phase-3/theme.py

Action: Add an Altair theme function and register it at the end of the file.

Code to add:

def jpmc_altair_theme():
    """JPMC brand theme for Altair charts."""
    return {
        'config': {
            'view': {
                'continuousWidth': 1000,
                'continuousHeight': 500,
                'strokeWidth': 0
            },
            'background': ColorPalette.BACKGROUND,
            'axis': {
                'grid': True,
                'gridColor': ColorPalette.GRID,
                'labelAngle': -45,  # Default rotated labels
                'labelFontSize': 11,
                'titleFontSize': 12,
                'labelColor': ColorPalette.TEXT,
                'titleColor': ColorPalette.TEXT
            },
            'axisX': {
                'labelAngle': -45
            },
            'axisY': {
                'labelAngle': 0
            },
            'legend': {
                'orient': 'top',
                'direction': 'horizontal',
                'titleFontSize': 11,
                'labelFontSize': 11
            },
            'title': {
                'fontSize': 14,
                'color': ColorPalette.TEXT,
                'anchor': 'start'
            },
            'bar': {
                'color': ColorPalette.PRIMARY
            }
        }
    }

# Register theme (add at end of file)
try:
    import altair as alt
    # alt.theme is the current registry; alt.themes is deprecated since Altair 5.5
    alt.theme.register('jpmc', enable=True)(jpmc_altair_theme)
except ImportError:
    pass  # Altair not installed

Verification:

  • Function jpmc_altair_theme() exists
  • Theme is registered as 'jpmc'
  • Theme is enabled by default
  • Import error is handled gracefully

TASK 2: Update imports in plots.py

Location: /home/luigi/Documents/VoiceBranding/JPMC/Phase-3/plots.py (lines 1-8)

Action: Replace Plotly imports with Altair imports.

Current code:

import plotly.graph_objects as go
from plotly.subplots import make_subplots

Replace with:

import altair as alt

Keep these imports:

import re
from pathlib import Path
import polars as pl
from theme import ColorPalette

Verification:

  • import altair as alt present
  • No Plotly imports remain
  • All other imports unchanged

TASK 3: Rewrite _add_filter_footnote for Altair

Location: /home/luigi/Documents/VoiceBranding/JPMC/Phase-3/plots.py (currently lines ~120-212)

Action: Replace entire _add_filter_footnote method with Altair version.

New implementation:

def _add_filter_footnote(self, chart: alt.Chart) -> alt.Chart:
    """Add a footnote with active filters to the chart.
    
    Creates a vconcat with main chart on top and filter text chart below.
    Returns the combined chart (or original if no filters).
    """
    filter_text = self._get_filter_description()
    
    # Skip if no filters active - return original chart
    if not filter_text:
        return chart
    
    # Convert <br> to newlines first, then strip the remaining HTML tags
    # (Altair doesn't support HTML in mark_text)
    plain_text = re.sub(r'<br\s*/?>', '\n', filter_text)
    plain_text = re.sub(r'<[^>]+>', '', plain_text)
    
    # Create a text-only chart for the footer
    # Use a dummy dataframe with one row
    import pandas as pd
    footer_df = pd.DataFrame([{'text': plain_text}])
    
    # Match the main chart's width when it was set explicitly
    chart_width = getattr(chart, 'width', None)
    footer_width = chart_width if isinstance(chart_width, (int, float)) else 1000
    
    footer_chart = alt.Chart(footer_df).mark_text(
        align='left',
        baseline='top',
        fontSize=9,
        color='gray',
        lineBreak='\n',  # Render each filter on its own line
        dx=5,  # Small left padding
        dy=5   # Small top padding
    ).encode(
        x=alt.value(0),  # Pin the text to the top-left corner of the footer
        y=alt.value(0),
        text='text:N'
    ).properties(
        height=60,  # Fixed height for footer
        width=footer_width
    )
    
    # Combine with vconcat
    combined = alt.vconcat(chart, footer_chart, spacing=10)
    
    return combined

Verification:

  • Method signature changed from fig: go.Figure to chart: alt.Chart
  • Returns alt.Chart instead of go.Figure
  • Uses vconcat for vertical stacking
  • HTML tags are stripped from filter text
  • Footer has fixed height
  • Spacing between chart and footer is set
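
Order matters when turning the HTML filter description into plain text: a generic tag-stripping regex also matches `<br>`, so line breaks have to be converted before the remaining tags are removed. The standalone sketch below uses only `re` to show both orders side by side; `html_to_plain` is an illustrative helper name, not part of the existing mixin.

```python
import re


def html_to_plain(filter_html: str) -> str:
    """Convert the HTML filter description to plain text with newlines."""
    # Step 1: turn <br>, <br/> and <br /> into newlines.
    text = re.sub(r"<br\s*/?>", "\n", filter_html, flags=re.IGNORECASE)
    # Step 2: drop every remaining tag, such as <b> and </b>.
    return re.sub(r"<[^>]+>", "", text)


example = "Filters: <b>Age:</b> 22-24<br><b>Gender:</b> Man"

# Correct order: the line break survives.
good = html_to_plain(example)

# Reversed order: <br> is removed together with the other tags,
# so the later replace() finds nothing and the two filters run together.
bad = re.sub(r"<[^>]+>", "", example).replace("<br>", "\n")

print(repr(good))
print(repr(bad))
```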

TASK 4: Rewrite _save_plot for Altair

Location: /home/luigi/Documents/VoiceBranding/JPMC/Phase-3/plots.py (currently lines ~214-234)

Action: Replace _save_plot method for Altair chart saving.

New implementation:

def _save_plot(self, chart: alt.Chart, title: str) -> alt.Chart:
    """Save chart to PNG file if fig_save_dir is set.
    
    Returns the (potentially modified) chart with filter footnote added.
    """
    # Add filter footnote - returns combined chart if filters active
    chart = self._add_filter_footnote(chart)

    if hasattr(self, 'fig_save_dir') and self.fig_save_dir:
        path = Path(self.fig_save_dir)
        
        # Add filter slug subfolder
        filter_slug = self._get_filter_slug()
        path = path / filter_slug
        
        # exist_ok=True already makes this a no-op when the folder exists
        path.mkdir(parents=True, exist_ok=True)
            
        filename = f"{self._sanitize_filename(title)}.png"
        
        # Save using vl-convert backend
        chart.save(str(path / filename), format='png', scale_factor=2.0)
    
    return chart

Verification:

  • Method signature changed from fig: go.Figure to chart: alt.Chart
  • Uses chart.save() instead of fig.write_image()
  • PNG format specified
  • Path handling unchanged (filter slug subdirectories)
  • Returns modified chart
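
The output location follows the pattern `figures/{slug}/{filename}.png`. Since `_sanitize_filename` is not shown in this plan, the sketch below uses a hypothetical stand-in to illustrate how a multi-line chart title could map to a file path; the real helper may apply different rules.

```python
import re
from pathlib import Path


def sanitize_filename(title: str) -> str:
    """Hypothetical stand-in for _sanitize_filename (actual rules may differ)."""
    # Collapse whitespace, including the "\n" used in chart titles, to underscores.
    name = re.sub(r"\s+", "_", title.strip())
    # Keep only characters that are safe on common filesystems.
    return re.sub(r"[^A-Za-z0-9_.-]", "", name)


def output_path(save_dir: str, filter_slug: str, title: str) -> Path:
    """Build <save_dir>/<filter_slug>/<sanitized title>.png without touching disk."""
    return Path(save_dir) / filter_slug / f"{sanitize_filename(title)}.png"


path = output_path(
    "figures",
    "Age-22to24_Gen-Man",
    "Most Popular Choice\n(Number of Times Ranked 1st)",
)
print(path.as_posix())
```

Keeping path construction separate from `chart.save()` makes the naming rule testable without rendering a PNG.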

TASK 5: Migrate plot_average_scores_with_counts

Location: /home/luigi/Documents/VoiceBranding/JPMC/Phase-3/plots.py (currently lines ~248-313)

Action: Rewrite using Altair bar chart with text overlay.

Current behavior:

  • Input: Wide DataFrame with score columns
  • Output: Vertical bar chart with average scores, count text inside bars

New implementation:

def plot_average_scores_with_counts(
    self,
    data: pl.LazyFrame | pl.DataFrame | None = None,
    title: str = "General Impression (1-10)\nPer Voice with Number of Participants Who Rated It",
    x_label: str = "Stimuli",
    y_label: str = "Average General Impression Rating (1-10)",
    color: str = ColorPalette.PRIMARY,
    height: int | None = None,
    width: int | None = None,
) -> alt.Chart:
    """Create a bar plot showing average scores and count of non-null values for each column."""
    df = self._ensure_dataframe(data)

    # Calculate stats for each column (exclude _recordId)
    stats = []
    for col in [c for c in df.columns if c != '_recordId']:
        avg_score = df[col].mean()
        non_null_count = df[col].drop_nulls().len()
        # Extract voice ID from column name
        label = col.split('__')[-1] if '__' in col else col
        stats.append({
            'voice': label,
            'average': avg_score,
            'count': non_null_count
        })

    # Convert to pandas for Altair (sort by average descending)
    stats_df = pl.DataFrame(stats).sort('average', descending=True).to_pandas()

    # Base bar chart
    bars = alt.Chart(stats_df).mark_bar(color=color).encode(
        x=alt.X('voice:N', title=x_label, sort='-y'),
        y=alt.Y('average:Q', title=y_label, scale=alt.Scale(domain=[0, 10])),
        tooltip=[
            alt.Tooltip('voice:N', title='Voice'),
            alt.Tooltip('average:Q', title='Average', format='.2f'),
            alt.Tooltip('count:Q', title='Count')
        ]
    )

    # Text overlay for counts, drawn just inside the top of each bar
    text = alt.Chart(stats_df).mark_text(
        baseline='top',
        dy=5,  # Slight offset below the top edge of the bar
        color='white',
        fontSize=10
    ).encode(
        x=alt.X('voice:N', sort='-y'),
        y=alt.Y('average:Q'),
        text=alt.Text('count:Q')
    )

    # Combine layers
    chart = (bars + text).properties(
        title=title,
        width=width or getattr(self, 'plot_width', 1000),
        height=height or getattr(self, 'plot_height', 500)
    )

    chart = self._save_plot(chart, title)
    return chart

Verification:

  • Returns alt.Chart instead of go.Figure
  • Data aggregated to one row per voice (pandas DataFrame)
  • Bar chart created with mark_bar()
  • Text overlay added with mark_text()
  • Layers combined with + operator
  • Sorting preserved (by average descending)
  • Y-axis scale set to [0, 10]
  • Tooltip includes voice, average, count
  • Width/height properties set
  • _save_plot called at end
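
The aggregation step can be checked independently of Polars and Altair. The following plain-Python sketch mirrors the per-column logic (skip `_recordId`, average the non-null scores, count them, sort by average descending) and could serve as a reference when validating the migrated method. The column names in the example are hypothetical.

```python
from statistics import mean


def column_stats(columns: dict) -> list:
    """Per-column average and non-null count, highest average first."""
    rows = []
    for name, values in columns.items():
        if name == "_recordId":
            continue  # identifier column, not a score
        present = [v for v in values if v is not None]
        rows.append({
            # Same label rule as the method: keep the part after the last "__".
            "voice": name.split("__")[-1],
            "average": mean(present) if present else None,
            "count": len(present),
        })
    # Columns with no ratings sort last; the rest sort by average descending.
    return sorted(rows, key=lambda r: (r["average"] is None, -(r["average"] or 0)))


wide = {
    "_recordId": ["r1", "r2", "r3"],
    "Score__V1": [8, None, 6],   # hypothetical column names
    "Score__V2": [9, 9, None],
}
stats = column_stats(wide)
for row in stats:
    print(row)
```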

TASK 6: Migrate plot_most_ranked_1

Location: /home/luigi/Documents/VoiceBranding/JPMC/Phase-3/plots.py (currently lines ~537-613)

Action: Rewrite using Altair with conditional coloring (top 3 highlighted).

Current behavior:

  • Input: Wide DataFrame with ranking columns
  • Output: Vertical bar chart, top 3 bars in PRIMARY color, rest in NEUTRAL

New implementation:

def plot_most_ranked_1(
    self,
    data: pl.LazyFrame | pl.DataFrame | None = None,
    title: str = "Most Popular Choice\n(Number of Times Ranked 1st)",
    x_label: str = "Item",
    y_label: str = "Count of 1st Place Rankings",
    height: int | None = None,
    width: int | None = None,
) -> alt.Chart:
    """Create a bar chart showing which item was ranked #1 the most. Top 3 highlighted."""
    df = self._ensure_dataframe(data)

    stats = []
    ranking_cols = [c for c in df.columns if c != '_recordId']

    for col in ranking_cols:
        count_rank_1 = df.filter(pl.col(col) == 1).height
        # Clean label
        label = col.replace('Character_Ranking_', '').replace('Top_3_Voices_ranking__', '').replace('_', ' ').strip()
        stats.append({'item': label, 'count': count_rank_1})

    # Convert and sort
    stats_df = pl.DataFrame(stats).sort('count', descending=True)
    
    # Add rank column for coloring (1-3 vs 4+)
    stats_df = stats_df.with_row_index('rank_index')
    stats_df = stats_df.with_columns(
        pl.when(pl.col('rank_index') < 3)
        .then(pl.lit('Top 3'))
        .otherwise(pl.lit('Other'))
        .alias('category')
    ).to_pandas()

    # Bar chart with conditional color
    chart = alt.Chart(stats_df).mark_bar().encode(
        x=alt.X('item:N', title=x_label, sort='-y'),
        y=alt.Y('count:Q', title=y_label),
        color=alt.Color('category:N',
                       scale=alt.Scale(domain=['Top 3', 'Other'],
                                     range=[ColorPalette.PRIMARY, ColorPalette.NEUTRAL]),
                       legend=None),
        tooltip=[
            alt.Tooltip('item:N', title='Item'),
            alt.Tooltip('count:Q', title='1st Place Votes')
        ]
    ).properties(
        title=title,
        width=width or getattr(self, 'plot_width', 1000),
        height=height or getattr(self, 'plot_height', 500)
    )

    chart = self._save_plot(chart, title)
    return chart

Verification:

  • Returns alt.Chart
  • Counts rank 1 occurrences per column
  • Adds category column for top 3 vs others
  • Uses conditional color via alt.Color() with custom scale
  • Tooltip shows item and count
  • Sorted by count descending
  • Legend hidden (color is self-explanatory)
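
The highlight rule is purely positional: after sorting by count, the first three rows are labelled 'Top 3' and everything else 'Other'. A minimal plain-Python sketch of that rule follows, with made-up item names and counts. Note that a tie at the cut-off is split by position, which is also how the `rank_index < 3` test behaves.

```python
def label_top_n(counts: dict, n: int = 3) -> list:
    """Sort items by count (descending) and tag the first n as 'Top n'."""
    # sorted() is stable, so tied items keep their original relative order.
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return [
        {
            "item": item,
            "count": count,
            "category": f"Top {n}" if index < n else "Other",
        }
        for index, (item, count) in enumerate(ranked)
    ]


# Illustrative data only.
rows = label_top_n({"A": 12, "B": 30, "C": 21, "D": 21, "E": 5})
for row in rows:
    print(row)
```

The same helper with `n=8` covers the Top 8 highlighting used for voice selection counts.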

TASK 7: Migrate plot_weighted_ranking_score

Location: /home/luigi/Documents/VoiceBranding/JPMC/Phase-3/plots.py (currently lines ~615-662)

Action: Rewrite simple bar chart with text overlay.

Current behavior:

  • Input: DataFrame with Character and Weighted Score columns
  • Output: Vertical bar chart with score text inside bars

New implementation:

def plot_weighted_ranking_score(
    self,
    data: pl.LazyFrame | pl.DataFrame | None = None,
    title: str = "Weighted Popularity Score\n(1st=3pts, 2nd=2pts, 3rd=1pt)",
    x_label: str = "Character Personality",
    y_label: str = "Total Weighted Score",
    color: str = ColorPalette.PRIMARY,
    height: int | None = None,
    width: int | None = None,
) -> alt.Chart:
    """Create a bar chart showing the weighted ranking score for each character."""
    weighted_df = self._ensure_dataframe(data).to_pandas()

    # Bar chart
    bars = alt.Chart(weighted_df).mark_bar(color=color).encode(
        x=alt.X('Character:N', title=x_label),
        y=alt.Y('Weighted Score:Q', title=y_label),
        tooltip=[
            alt.Tooltip('Character:N'),
            alt.Tooltip('Weighted Score:Q', title='Score')
        ]
    )

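    # Text overlay, drawn just inside the top of each bar
    # (white text above the bar would be invisible on the white background)
    text = bars.mark_text(
        baseline='top',
        dy=5,
        color='white',
        fontSize=11
    ).encode(
        text='Weighted Score:Q'
    )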

    chart = (bars + text).properties(
        title=title,
        width=width or getattr(self, 'plot_width', 1000),
        height=height or getattr(self, 'plot_height', 500)
    )

    chart = self._save_plot(chart, title)
    return chart

Verification:

  • Returns alt.Chart
  • Uses input columns as-is (Character, Weighted Score)
  • Text overlay with white color inside bars
  • Tooltip shows character and score
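
This method expects the weighted scores to be computed upstream. Going by the default title (1st=3pts, 2nd=2pts, 3rd=1pt), the arithmetic might look like the sketch below. The input shape (one list of ranks per character) and the example data are assumptions, and the project's real calculation may live elsewhere.

```python
# Points per rank, taken from the chart's default title.
POINTS = {1: 3, 2: 2, 3: 1}


def weighted_scores(rankings: dict) -> list:
    """Sum rank points per character; unranked (None) or lower ranks score 0."""
    rows = []
    for character, ranks in rankings.items():
        score = sum(POINTS.get(rank, 0) for rank in ranks if rank is not None)
        rows.append({"Character": character, "Weighted Score": score})
    return sorted(rows, key=lambda r: r["Weighted Score"], reverse=True)


# Illustrative data: each list holds one respondent's rank per position.
scores = weighted_scores({
    "A": [1, 2, None, 3],   # 3 + 2 + 1 = 6
    "B": [2, 1, 1, None],   # 2 + 3 + 3 = 8
    "C": [3, None, 2, 4],   # 1 + 2 + 0 = 3
})
for row in scores:
    print(row)
```

The output keys match the `Character` and `Weighted Score` column names that the chart encodings reference.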

TASK 8: Migrate plot_top3_ranking_distribution

Location: /home/luigi/Documents/VoiceBranding/JPMC/Phase-3/plots.py (currently lines ~315-415)

Action: Rewrite stacked bar chart (3 ranks).

Current behavior:

  • Input: Wide DataFrame with ranking columns (values 1, 2, 3)
  • Output: Stacked bar chart with 3 layers (Rank 1, 2, 3), horizontal legend

New implementation:

def plot_top3_ranking_distribution(
    self,
    data: pl.LazyFrame | pl.DataFrame | None = None,
    title: str = "Top 3 Rankings Distribution\nCount of 1st, 2nd, and 3rd Place Votes per Voice",
    x_label: str = "Voices",
    y_label: str = "Number of Mentions in Top 3",
    height: int | None = None,
    width: int | None = None,
) -> alt.Chart:
    """Create a stacked bar chart showing how often each voice was ranked 1st, 2nd, or 3rd."""
    df = self._ensure_dataframe(data)

    # Calculate stats per column
    stats = []
    for col in [c for c in df.columns if c != '_recordId']:
        rank1 = df.filter(pl.col(col) == 1).height
        rank2 = df.filter(pl.col(col) == 2).height
        rank3 = df.filter(pl.col(col) == 3).height
        total = rank1 + rank2 + rank3

        if total > 0:
            label = col.split('__')[-1] if '__' in col else col
            # Add 3 rows (one per rank)
            stats.append({'voice': label, 'rank': 'Rank 1 (1st Choice)', 'count': rank1, 'total': total})
            stats.append({'voice': label, 'rank': 'Rank 2 (2nd Choice)', 'count': rank2, 'total': total})
            stats.append({'voice': label, 'rank': 'Rank 3 (3rd Choice)', 'count': rank3, 'total': total})

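    if not stats:
        return alt.Chart().mark_text(text="No data")

    # Rows are already in long format; ordering by total is handled in the x encoding
    stats_df = pl.DataFrame(stats).to_pandas()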
    
    # Create stacked bar chart
    chart = alt.Chart(stats_df).mark_bar().encode(
        x=alt.X('voice:N', title=x_label, sort=alt.EncodingSortField(field='total', op='sum', order='descending')),
        y=alt.Y('count:Q', title=y_label, stack='zero'),
        color=alt.Color('rank:N',
                       scale=alt.Scale(domain=['Rank 1 (1st Choice)', 'Rank 2 (2nd Choice)', 'Rank 3 (3rd Choice)'],
                                     range=[ColorPalette.RANK_1, ColorPalette.RANK_2, ColorPalette.RANK_3]),
                       legend=alt.Legend(orient='top', direction='horizontal', title=None)),
        tooltip=[
            alt.Tooltip('voice:N', title='Voice'),
            alt.Tooltip('rank:N', title='Rank'),
            alt.Tooltip('count:Q', title='Count')
        ]
    ).properties(
        title=title,
        width=width or getattr(self, 'plot_width', 1000),
        height=height or getattr(self, 'plot_height', 500)
    )

    chart = self._save_plot(chart, title)
    return chart

Verification:

  • Returns alt.Chart
  • Data converted to long format (one row per voice-rank combo)
  • Stacked with stack='zero' in y encoding
  • Custom color scale for 3 ranks
  • Sorted by total (sum of all ranks per voice)
  • Horizontal legend at top
  • Tooltip shows voice, rank, count
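
Altair's stacked bars need one row per voice-rank pair, while the survey data arrives with one column per voice. The reshaping can be sketched in plain Python as follows; the column prefix is borrowed from the label-cleaning code earlier in this plan, and the values are invented for illustration.

```python
RANK_LABELS = {
    1: "Rank 1 (1st Choice)",
    2: "Rank 2 (2nd Choice)",
    3: "Rank 3 (3rd Choice)",
}


def wide_to_long(columns: dict) -> list:
    """Turn one column per voice into one row per (voice, rank) pair."""
    rows = []
    for name, values in columns.items():
        if name == "_recordId":
            continue
        counts = {rank: sum(1 for v in values if v == rank) for rank in RANK_LABELS}
        total = sum(counts.values())
        if total == 0:
            continue  # voice never placed in anyone's top 3
        voice = name.split("__")[-1]
        for rank, label in RANK_LABELS.items():
            # 'total' is repeated on every row so the chart can sort by it.
            rows.append({"voice": voice, "rank": label, "count": counts[rank], "total": total})
    return rows


long_rows = wide_to_long({
    "_recordId": ["r1", "r2", "r3", "r4"],
    "Top_3_Voices_ranking__V14": [1, None, 2, 1],
    "Top_3_Voices_ranking__V04": [None, None, None, None],
    "Top_3_Voices_ranking__V08": [3, 1, None, 2],
})
for row in long_rows:
    print(row)
```

Because `total` appears on all three rows of a voice, a `sum` sort over it yields three times the true total, which still preserves the intended ordering.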

TASK 9: Migrate plot_ranking_distribution

Location: /home/luigi/Documents/VoiceBranding/JPMC/Phase-3/plots.py (currently lines ~417-536)

Action: Rewrite stacked bar chart (4 ranks) - very similar to Task 8.

Current behavior:

  • Input: Wide DataFrame with ranking columns (values 1, 2, 3, 4)
  • Output: Stacked bar chart with 4 layers

New implementation:

def plot_ranking_distribution(
    self,
    data: pl.LazyFrame | pl.DataFrame | None = None,
    title: str = "Rankings Distribution\n(1st to 4th Place)",
    x_label: str = "Item",
    y_label: str = "Number of Votes",
    height: int | None = None,
    width: int | None = None,
) -> alt.Chart:
    """Create a stacked bar chart showing the distribution of rankings (1st to 4th)."""
    df = self._ensure_dataframe(data)

    stats = []
    ranking_cols = [c for c in df.columns if c != '_recordId']

    for col in ranking_cols:
        r1 = df.filter(pl.col(col) == 1).height
        r2 = df.filter(pl.col(col) == 2).height
        r3 = df.filter(pl.col(col) == 3).height
        r4 = df.filter(pl.col(col) == 4).height
        total = r1 + r2 + r3 + r4

        if total > 0:
            label = col.replace('Character_Ranking_', '').replace('Top_3_Voices_ranking__', '').replace('_', ' ').strip()
            stats.append({'item': label, 'rank': 'Rank 1 (Best)', 'count': r1, 'rank1': r1})
            stats.append({'item': label, 'rank': 'Rank 2', 'count': r2, 'rank1': r1})
            stats.append({'item': label, 'rank': 'Rank 3', 'count': r3, 'rank1': r1})
            stats.append({'item': label, 'rank': 'Rank 4 (Worst)', 'count': r4, 'rank1': r1})

    if not stats:
        return alt.Chart().mark_text(text="No data")

    stats_df = pl.DataFrame(stats).to_pandas()

    chart = alt.Chart(stats_df).mark_bar().encode(
        x=alt.X('item:N', title=x_label, sort=alt.EncodingSortField(field='rank1', order='descending')),
        y=alt.Y('count:Q', title=y_label, stack='zero'),
        color=alt.Color('rank:N',
                       scale=alt.Scale(domain=['Rank 1 (Best)', 'Rank 2', 'Rank 3', 'Rank 4 (Worst)'],
                                     range=[ColorPalette.RANK_1, ColorPalette.RANK_2, ColorPalette.RANK_3, ColorPalette.RANK_4]),
                       legend=alt.Legend(orient='top', direction='horizontal', title=None)),
        tooltip=[
            alt.Tooltip('item:N', title='Item'),
            alt.Tooltip('rank:N', title='Rank'),
            alt.Tooltip('count:Q', title='Count')
        ]
    ).properties(
        title=title,
        width=width or getattr(self, 'plot_width', 1000),
        height=height or getattr(self, 'plot_height', 500)
    )

    chart = self._save_plot(chart, title)
    return chart

Verification:

  • Returns alt.Chart
  • 4 ranks supported
  • Sorted by Rank 1 count (added rank1 field for sorting)
  • Custom color scale for 4 ranks
  • Empty data handled (returns text mark)

TASK 10: Migrate plot_voice_selection_counts

Location: /home/luigi/Documents/VoiceBranding/JPMC/Phase-3/plots.py (currently lines ~664-737)

Action: Rewrite with Polars data transformation + conditional coloring.

Current behavior:

  • Input: DataFrame with comma-separated string column (8_Combined)
  • Process: Split strings, explode, count occurrences
  • Output: Bar chart, top 8 bars in PRIMARY, rest in NEUTRAL

New implementation:

def plot_voice_selection_counts(
    self,
    data: pl.LazyFrame | pl.DataFrame | None = None,
    target_column: str = "8_Combined",
    title: str = "Most Frequently Chosen Voices\n(Top 8 Highlighted)",
    x_label: str = "Voice",
    y_label: str = "Number of Times Chosen",
    height: int | None = None,
    width: int | None = None,
) -> alt.Chart:
    """Create a bar plot showing the frequency of voice selections."""
    df = self._ensure_dataframe(data)

    if target_column not in df.columns:
        return alt.Chart().mark_text(text=f"Column '{target_column}' not found")

    # Process data: split, explode, count
    stats_df = (
        df.select(pl.col(target_column))
        .drop_nulls()
        .with_columns(pl.col(target_column).str.split(","))
        .explode(target_column)
        .with_columns(pl.col(target_column).str.strip_chars())
        .filter(pl.col(target_column) != "")
        .group_by(target_column)
        .agg(pl.len().alias("count"))
        .sort("count", descending=True)
        .with_row_index('rank_index')
        .with_columns(
            pl.when(pl.col('rank_index') < 8)
            .then(pl.lit('Top 8'))
            .otherwise(pl.lit('Other'))
            .alias('category')
        )
        .to_pandas()
    )

    chart = alt.Chart(stats_df).mark_bar().encode(
        x=alt.X(f'{target_column}:N', title=x_label, sort='-y'),
        y=alt.Y('count:Q', title=y_label),
        color=alt.Color('category:N',
                       scale=alt.Scale(domain=['Top 8', 'Other'],
                                     range=[ColorPalette.PRIMARY, ColorPalette.NEUTRAL]),
                       legend=None),
        tooltip=[
            alt.Tooltip(f'{target_column}:N', title='Voice'),
            alt.Tooltip('count:Q', title='Selections')
        ]
    ).properties(
        title=title,
        width=width or getattr(self, 'plot_width', 1000),
        height=height or getattr(self, 'plot_height', 500)
    )

    chart = self._save_plot(chart, title)
    return chart

Verification:

  • Returns alt.Chart
  • Polars chain: split → explode → strip → group → count
  • Top 8 categorization logic correct
  • Conditional coloring applied
  • Sorted by count descending
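
The split → explode → strip → group → count chain has a direct plain-Python equivalent, which can be handy for checking the Polars result on a small sample. This sketch uses `collections.Counter`; the cell values are invented.

```python
from collections import Counter


def selection_counts(cells: list) -> list:
    """Count individual voices across comma-separated cells, most frequent first."""
    counter = Counter()
    for cell in cells:
        if cell is None:
            continue  # same effect as drop_nulls()
        for token in cell.split(","):
            token = token.strip()
            if token:  # skip empty fragments such as "V08, ,V04"
                counter[token] += 1
    return counter.most_common()


counts = selection_counts(["V14, V04,V08", None, "V04", "V08, ,V04", ""])
print(counts)
```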

TASK 11: Migrate plot_top3_selection_counts

Location: /home/luigi/Documents/VoiceBranding/JPMC/Phase-3/plots.py (currently lines ~739-808)

Action: Identical to Task 10, but default column is 3_Ranked and top 3 highlighted.

New implementation:

def plot_top3_selection_counts(
    self,
    data: pl.LazyFrame | pl.DataFrame | None = None,
    target_column: str = "3_Ranked",
    title: str = "Most Frequently Chosen Top 3 Voices\n(Top 3 Highlighted)",
    x_label: str = "Voice",
    y_label: str = "Count of Mentions in Top 3",
    height: int | None = None,
    width: int | None = None,
) -> alt.Chart:
    """Question: Which 3 voices are chosen the most out of 18?"""
    df = self._ensure_dataframe(data)

    if target_column not in df.columns:
        return alt.Chart().mark_text(text=f"Column '{target_column}' not found")

    stats_df = (
        df.select(pl.col(target_column))
        .drop_nulls()
        .with_columns(pl.col(target_column).str.split(","))
        .explode(target_column)
        .with_columns(pl.col(target_column).str.strip_chars())
        .filter(pl.col(target_column) != "")
        .group_by(target_column)
        .agg(pl.len().alias("count"))
        .sort("count", descending=True)
        .with_row_index('rank_index')
        .with_columns(
            pl.when(pl.col('rank_index') < 3)
            .then(pl.lit('Top 3'))
            .otherwise(pl.lit('Other'))
            .alias('category')
        )
        .to_pandas()
    )

    chart = alt.Chart(stats_df).mark_bar().encode(
        x=alt.X(f'{target_column}:N', title=x_label, sort='-y'),
        y=alt.Y('count:Q', title=y_label),
        color=alt.Color('category:N',
                       scale=alt.Scale(domain=['Top 3', 'Other'],
                                     range=[ColorPalette.PRIMARY, ColorPalette.NEUTRAL]),
                       legend=None),
        tooltip=[
            alt.Tooltip(f'{target_column}:N', title='Voice'),
            alt.Tooltip('count:Q', title='In Top 3')
        ]
    ).properties(
        title=title,
        width=width or getattr(self, 'plot_width', 1000),
        height=height or getattr(self, 'plot_height', 500)
    )

    chart = self._save_plot(chart, title)
    return chart

Verification:

  • Default target_column is "3_Ranked"
  • Top 3 categorization (not top 8)
  • Otherwise identical to Task 10

TASK 12: Migrate plot_speaking_style_trait_scores

Location: /home/luigi/Documents/VoiceBranding/JPMC/Phase-3/plots.py (currently lines ~810-926)

Action: Rewrite horizontal bar chart with text annotations.

Current behavior:

  • Input: DataFrame with Voice, score, Left_Anchor, Right_Anchor columns
  • Output: Horizontal bar chart with anchor labels at bottom

New implementation:

def plot_speaking_style_trait_scores(
    self,
    data: pl.LazyFrame | pl.DataFrame | None = None,
    trait_description: str | None = None,
    left_anchor: str | None = None,
    right_anchor: str | None = None,
    title: str = "Speaking Style Trait Analysis",
    height: int | None = None,
    width: int | None = None,
) -> alt.Chart:
    """Plot scores for a single speaking style trait across multiple voices."""
    df = self._ensure_dataframe(data)

    if df.is_empty():
        return alt.Chart().mark_text(text="No data")
        
    required_cols = ["Voice", "score"]
    if not all(col in df.columns for col in required_cols):
        return alt.Chart().mark_text(text="Missing required columns")

    # Calculate stats: Mean, Count
    stats = (
        df.filter(pl.col("score").is_not_null())
        .group_by("Voice")
        .agg([
            pl.col("score").mean().alias("mean_score"),
            pl.col("score").count().alias("count")
        ])
        .sort("mean_score", descending=False)  # Row order only; bar order is set by sort='-x' in the y encoding
        .to_pandas()
    )

    # Extract anchors from data if not provided
    if (left_anchor is None or right_anchor is None) and "Left_Anchor" in df.columns:
        head = df.filter(pl.col("Left_Anchor").is_not_null()).head(1)
        if not head.is_empty():
            if left_anchor is None:
                left_anchor = head["Left_Anchor"][0]
            if right_anchor is None:
                right_anchor = head["Right_Anchor"][0]

    if trait_description is None:
        if left_anchor and right_anchor:
            trait_description = f"{left_anchor.split('|')[0]} vs. {right_anchor.split('|')[0]}"
        elif "Description" in df.columns:
            head = df.filter(pl.col("Description").is_not_null()).head(1)
            trait_description = head["Description"][0] if not head.is_empty() else ""
        else:
            trait_description = ""

    # Horizontal bar chart
    bars = alt.Chart(stats).mark_bar(color=ColorPalette.PRIMARY).encode(
        x=alt.X('mean_score:Q', title='Average Score (1-5)', scale=alt.Scale(domain=[1, 5])),
        y=alt.Y('Voice:N', title='Voice', sort='-x'),
        tooltip=[
            alt.Tooltip('Voice:N'),
            alt.Tooltip('mean_score:Q', title='Average', format='.2f'),
            alt.Tooltip('count:Q', title='Count')
        ]
    )

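    # Count text inside bars, right-aligned against the end of each bar
    text = bars.mark_text(
        align='right',
        baseline='middle',
        dx=-8,  # Pull the label inside the bar so white text stays readable
        color='white',
        fontSize=16
    ).encode(
        text='count:Q'
    )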

    # Combine
    chart = (bars + text).properties(
        title={
            "text": title,
            "subtitle": [trait_description, "(Numbers on bars indicate respondent count)"]
        },
        width=width or getattr(self, 'plot_width', 1000),
        height=height or getattr(self, 'plot_height', 500)
    )

    # Note: Anchor annotations at bottom would require separate text marks
    # positioned at fixed coordinates - can add if needed

    chart = self._save_plot(chart, title)
    return chart

Verification:

  • Returns alt.Chart
  • Horizontal orientation (x=score, y=voice)
  • X-axis domain set to [1, 5]
  • Count text displayed inside bars (white, large font)
  • Title includes subtitle with trait description
  • Sorted by mean score (highest at top, via sort='-x' in the y encoding)
  • Anchor label annotations (optional - commented in code)

TASK 13: Migrate plot_speaking_style_correlation

Location: /home/luigi/Documents/VoiceBranding/JPMC/Phase-3/plots.py (currently lines ~928-1018)

Action: Rewrite with red/green conditional coloring based on sign.

Current behavior:

  • Input: DataFrame with correlation data
  • Process: Calculate Pearson correlation per trait
  • Output: Bar chart, positive correlations green, negative red

New implementation:

def plot_speaking_style_correlation(
    self,
    style_color: str,
    style_traits: list[str],
    data: pl.LazyFrame | pl.DataFrame | None = None,
    title: str | None = None,
) -> alt.Chart:
    """Plots correlation between Speaking Style Trait Scores (1-5) and Voice Scale (1-10)."""
    df = self._ensure_dataframe(data)

    if title is None:
        title = "Speaking style and voice scale 1-10 correlations"

    trait_correlations = []
    
    # Calculate correlations
    for i, trait in enumerate(style_traits):
        subset = df.filter(pl.col("Right_Anchor") == trait)
        valid_data = subset.select(["score", "Voice_Scale_Score"]).drop_nulls()
        
        if valid_data.height > 1:
            corr_val = valid_data.select(pl.corr("score", "Voice_Scale_Score")).item()
            # pl.corr can return NaN (not None) when a column has zero variance;
            # NaN is the only value that is not equal to itself
            if corr_val is None or corr_val != corr_val:
                corr_val = 0.0
            # Handle trait text - wrap at '|' for display
            trait_display = trait.replace('|', '\n')
            trait_correlations.append({
                "trait_display": trait_display,
                "trait_index": f"Trait {i+1}",
                "correlation": corr_val
            })
    
    if not trait_correlations:
        return alt.Chart().mark_text(text=f"No data for {style_color} Style")
        
    plot_df = pl.DataFrame(trait_correlations).to_pandas()

    # Conditional color based on sign
    chart = alt.Chart(plot_df).mark_bar().encode(
        x=alt.X('trait_display:N', title=None, axis=alt.Axis(labelAngle=0)),
        y=alt.Y('correlation:Q', title='Correlation', scale=alt.Scale(domain=[-1, 1])),
        color=alt.condition(
            alt.datum.correlation >= 0,
            alt.value('green'),
            alt.value('red')
        ),
        tooltip=[
            alt.Tooltip('trait_display:N', title='Trait'),
            alt.Tooltip('correlation:Q', format='.2f')
        ]
    ).properties(
        title=title,
        width=1000,
        height=400
    )

    chart = self._save_plot(chart, title)
    return chart

Verification:

  • Returns alt.Chart
  • Pearson correlation calculated via pl.corr()
  • Conditional coloring: green if positive, red if negative
  • Y-axis domain [-1, 1]
  • Trait text wrapped at '|' for display
  • Tooltip shows trait and correlation value

TASK 14: Migrate plot_speaking_style_ranking_correlation

Location: /home/luigi/Documents/VoiceBranding/JPMC/Phase-3/plots.py (currently lines ~1020-1105)

Action: Almost identical to Task 13, but correlates with Ranking_Points instead of Voice_Scale_Score.

New implementation:

def plot_speaking_style_ranking_correlation(
    self,
    style_color: str,
    style_traits: list[str],
    data: pl.LazyFrame | pl.DataFrame | None = None,
    title: str | None = None,
) -> alt.Chart:
    """Plots correlation between Speaking Style Trait Scores (1-5) and Voice Ranking Points (0-3)."""
    df = self._ensure_dataframe(data)

    if title is None:
        title = f"Speaking style {style_color} and voice ranking points correlations"
    
    trait_correlations = []
    
    for i, trait in enumerate(style_traits):
        subset = df.filter(pl.col("Right_Anchor") == trait)
        valid_data = subset.select(["score", "Ranking_Points"]).drop_nulls()
        
        if valid_data.height > 1:
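            corr_val = valid_data.select(pl.corr("score", "Ranking_Points")).item()
            # pl.corr can return NaN (not None) when a column has zero variance;
            # NaN is the only value that is not equal to itself
            if corr_val is None or corr_val != corr_val:
                corr_val = 0.0
            trait_display = trait.replace('|', '\n')
            trait_correlations.append({
                "trait_display": trait_display,
                "trait_index": f"Trait {i+1}",
                "correlation": corr_val
            })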
    
    if not trait_correlations:
        return alt.Chart().mark_text(text=f"No data for {style_color} Style")
        
    plot_df = pl.DataFrame(trait_correlations).to_pandas()

    chart = alt.Chart(plot_df).mark_bar().encode(
        x=alt.X('trait_display:N', title=None, axis=alt.Axis(labelAngle=0)),
        y=alt.Y('correlation:Q', title='Correlation', scale=alt.Scale(domain=[-1, 1])),
        color=alt.condition(
            alt.datum.correlation >= 0,
            alt.value('green'),
            alt.value('red')
        ),
        tooltip=[
            alt.Tooltip('trait_display:N', title='Trait'),
            alt.Tooltip('correlation:Q', format='.2f')
        ]
    ).properties(
        title=title,
        width=1000,
        height=400
    )

    chart = self._save_plot(chart, title)
    return chart

Verification:

  • Returns alt.Chart
  • Uses Ranking_Points column instead of Voice_Scale_Score
  • Otherwise identical to Task 13

TASK 15: Install vl-convert-python

Action: Add vl-convert-python to project dependencies for PNG export.

Command:

cd /home/luigi/Documents/VoiceBranding/JPMC/Phase-3
uv add vl-convert-python

Verification:

  • vl-convert-python appears in pyproject.toml dependencies
  • Installation successful (no errors)

TASK 16: Remove Plotly dependencies (optional cleanup)

Action: Remove unused Plotly packages.

Command:

cd /home/luigi/Documents/VoiceBranding/JPMC/Phase-3
uv remove plotly kaleido

Verification:

  • plotly and kaleido removed from pyproject.toml
  • No other code depends on Plotly (grep check)
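
As an alternative to a plain grep, the dependency check could be done on parsed imports so that comments and strings mentioning Plotly are ignored. This is a minimal sketch using only the standard library. The function name find_plotly_imports and the decision to skip .venv directories are assumptions for illustration.

```python
import ast
import tempfile
from pathlib import Path


def find_plotly_imports(root: str) -> list[tuple[str, int]]:
    """Return (file, line) pairs for every import of plotly under `root`."""
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        if ".venv" in path.parts:
            continue  # ignore installed packages in a local virtualenv
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"))
        except (SyntaxError, UnicodeDecodeError):
            continue  # skip files that cannot be parsed
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                names = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom):
                names = [node.module or ""]
            else:
                continue
            if any(n == "plotly" or n.startswith("plotly.") for n in names):
                hits.append((str(path), node.lineno))
    return hits


# Small self-check against two throwaway files.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "old_plots.py").write_text("import plotly.graph_objects as go\n")
    Path(tmp, "new_plots.py").write_text("import altair as alt\n")
    demo_hits = find_plotly_imports(tmp)
```

Running find_plotly_imports(".") from the project root before uv remove should return an empty list once the migration is complete.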

TASK 17: Test all plot methods in Marimo notebook

Action: Create a test notebook to verify all 10 plotting methods work correctly.

Test checklist per plot:

  • Chart renders without errors
  • Chart has correct dimensions (width/height)
  • Colors match ColorPalette constants
  • Data is displayed correctly (bars, stacks, etc.)
  • Text overlays render (counts, scores)
  • Tooltips show correct information
  • Filter annotation appears below chart (if filters active)
  • PNG export works (check figures/ directory)
  • No overlap between chart elements and filter text

Create test file: /home/luigi/Documents/VoiceBranding/JPMC/Phase-3/test_altair_migration.py

Test template:

import marimo as mo
import polars as pl
from utils import QualtricsSurvey

# Load sample data
survey = QualtricsSurvey()
survey.load_data('path/to/data')
survey.fig_save_dir = 'figures/altair_test'

# Test each plot method
mo.md("## Testing Altair Migration")

# 1. Test plot_average_scores_with_counts
chart1 = survey.plot_average_scores_with_counts(...)
chart1

# 2. Test plot_most_ranked_1
chart2 = survey.plot_most_ranked_1(...)
chart2

# ... repeat for all 10 methods

Final Verification Checklist

After completing all tasks, verify the following:

Code Quality

  • No Plotly imports remain in plots.py
  • All methods return Altair chart objects (alt.Chart, or alt.LayerChart / alt.VConcatChart for layered or concatenated charts) instead of go.Figure
  • No syntax errors (python -m py_compile plots.py)
  • Type hints updated (if any reference go.Figure)
  • Docstrings updated (if any mention Plotly)

Theme & Styling

  • jpmc_altair_theme() function exists in theme.py
  • Theme is registered and enabled
  • All charts use ColorPalette constants
  • Chart dimensions match original (width=1000, height=500 defaults)
  • Font sizes match original (11pt for labels, 14pt for titles)

Data Handling

  • All methods handle empty data gracefully
  • Wide-to-long transformations correct (stacked bars, selection counts)
  • Sorting preserved (by average, count, rank1, etc.)
  • Column filtering works (_recordId excluded)
  • String processing works (comma-split, strip, explode)

Visual Features

  • Bar charts render correctly (vertical and horizontal)
  • Stacked bars have correct layer order
  • Text overlays positioned correctly (inside/outside bars)
  • Conditional coloring works (top N highlighting, red/green by sign)
  • Tooltips show correct fields with proper formatting
  • Legends positioned correctly (top horizontal for stacked bars)
  • X-axis labels rotated at -45° by default
  • Grid lines visible

Filter System

  • _get_filter_slug() unchanged (still works)
  • _get_filter_description() unchanged (still works)
  • _add_filter_footnote() uses vconcat approach
  • Filter text appears at bottom of combined chart
  • No overlap between chart and filter text
  • Filter text is left-aligned
  • HTML tags stripped from filter text (Altair doesn't support HTML)
  • Filter subdirectories created correctly
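
For the HTML-stripping item, a small standard-library function is probably enough. The sketch below assumes the filter description uses only simple inline tags such as <b> and <br>, as Plotly annotations typically do. The name strip_html is hypothetical.

```python
import html
import re


def strip_html(text: str) -> str:
    """Convert simple HTML-formatted filter text to plain text."""
    # Keep line breaks by turning <br> variants into newlines first.
    text = re.sub(r"<br\s*/?>", "\n", text, flags=re.IGNORECASE)
    # Drop any remaining tags such as <b> or <i>.
    text = re.sub(r"<[^>]+>", "", text)
    # Decode entities like &amp; and trim surrounding whitespace.
    return html.unescape(text).strip()


example = strip_html("<b>Filters:</b> Age: 18-24<br>Gender: Female &amp; Male")
```

The newline that replaces <br> can then be rendered by the footer text mark, for example through its lineBreak property.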

PNG Export

  • vl-convert-python installed
  • chart.save() method works
  • PNG files created in correct subdirectories
  • PNG files have correct filenames (sanitized titles)
  • Image quality acceptable (scale_factor=2.0)
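
To make the filename and subdirectory checks concrete, here is one way the output path could be derived from a title. This only illustrates the idea and may not match the sanitization rules in the plan's own _save_plot. The function png_path and its parameters are hypothetical.

```python
import re
from pathlib import Path


def png_path(title: str, save_dir: str, filter_slug: str = "") -> Path:
    """Build a PNG path from a chart title (illustrative rules only)."""
    # Collapse every run of non-alphanumeric characters into one underscore.
    slug = re.sub(r"[^A-Za-z0-9]+", "_", title).strip("_").lower()
    # An empty filter_slug leaves the file directly in save_dir.
    return Path(save_dir) / filter_slug / f"{slug}.png"


path = png_path("Speaking style: Green (1-5)", "figures", "age_18_24")

# Saving would then look roughly like this (requires vl-convert-python):
# path.parent.mkdir(parents=True, exist_ok=True)
# chart.save(str(path), scale_factor=2.0)
```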

Marimo Integration

  • Charts render in Marimo notebooks
  • Charts are reactive (update when data changes)
  • No JavaScript console errors
  • Interactive features work (tooltips, pan, zoom if enabled)

All 10 Plot Methods

  1. plot_average_scores_with_counts - vertical bar + text
  2. plot_top3_ranking_distribution - stacked bar (3 ranks)
  3. plot_ranking_distribution - stacked bar (4 ranks)
  4. plot_most_ranked_1 - vertical bar + conditional color
  5. plot_weighted_ranking_score - vertical bar + text
  6. plot_voice_selection_counts - vertical bar + conditional color
  7. plot_top3_selection_counts - vertical bar + conditional color
  8. plot_speaking_style_trait_scores - horizontal bar + text
  9. plot_speaking_style_correlation - vertical bar + red/green
  10. plot_speaking_style_ranking_correlation - vertical bar + red/green

Edge Cases

  • Empty DataFrame handled gracefully
  • Missing columns detected and reported
  • Zero counts/values don't break charts
  • Single data point renders correctly
  • Very long labels don't cause layout issues
  • Many categories don't cause overcrowding

Regression Testing

  • Existing Marimo notebooks still work
  • Data filtering still works (filter_data())
  • QualtricsSurvey class initialization unchanged
  • No breaking changes to public API

Documentation

  • This migration plan marked as "Complete"
  • Any new dependencies documented
  • Any breaking changes documented
  • Example usage updated (if applicable)

Troubleshooting

Issue: Charts don't render

  • Check Altair version: python -c "import altair; print(altair.__version__)"
  • Check vl-convert: python -c "import vl_convert; print(vl_convert.__version__)"
  • Check for JavaScript errors in browser console

Issue: PNG export fails

  • Verify vl-convert-python installed: uv pip show vl-convert-python
  • Check write permissions on figures/ directory
  • Try saving as HTML first: chart.save('test.html')

Issue: Colors don't match theme

  • Verify theme is enabled: print(alt.theme.active) (alt.themes.active on Altair versions before 5.5)
  • Check color scale definitions in each plot method
  • Ensure ColorPalette imported correctly

Issue: Filter text overlaps chart

  • Increase spacing parameter in vconcat(chart, footer, spacing=20)
  • Increase footer chart height property
  • Check if footer chart is actually created (debug with print())

Issue: Data not displaying

  • Check DataFrame format (wide vs long)
  • Verify column names match encoding specs
  • Check for null values (.drop_nulls())
  • Print intermediate DataFrames for debugging

Notes

  • Backup: Before starting, create a backup of plots.py: cp plots.py plots.py.plotly_backup
  • Incremental testing: Test each plot method immediately after migration
  • Marimo restart: May need to restart Marimo kernel after major changes
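  • Performance: Altair embeds the data in the chart spec, so large datasets slow rendering, and by default it raises MaxRowsError above 5,000 rows; aggregate with Polars before plotting, or use .sample() or alt.data_transformers.disable_max_rows() if needed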

Completion Status

  • All tasks (1-17) completed
  • All verification checks passed
  • Existing notebooks tested and working
  • Migration documented
  • Ready for production use

Migration completed on: _________________
Tested by: _________________
Sign-off: _________________