Altair Migration Plan: Plotly → Altair for QualtricsPlotsMixin
Date: January 28, 2026
Status: Not Started
Objective: Migrate all plotting methods from Plotly to Altair to solve filter annotation overlap issues and ensure proper Marimo reactivity.
Background
Problem
Current Plotly implementation has a critical layout issue: filter annotations overlap with long rotated x-axis labels because Plotly doesn't support true bounding boxes. Elements overflow their assigned subplot areas.
Why Altair?
- Better layout control - Vega-Lite (Altair's backend) properly calculates space for all elements
- Marimo reactivity - Marimo documentation states reactive plots require Altair or Plotly; Altair is preferred
- Clean separation - `vconcat()` creates true vertical stacking without overflow
- Already installed - Altair >=6.0.0 is already a dependency (unused)
Current System Analysis
File Structure
- `plots.py` - Contains `QualtricsPlotsMixin` class with 10 plotting methods
- `theme.py` - Contains `ColorPalette` class with all styling constants
- `utils.py` - Contains `QualtricsSurvey` class that mixes in `QualtricsPlotsMixin`
Color Palette (from theme.py)
class ColorPalette:
    PRIMARY = "#0077B6"     # Medium Blue
    RANK_1 = "#004C6D"      # Dark Blue
    RANK_2 = "#008493"      # Teal
    RANK_3 = "#5AAE95"      # Sea Green
    RANK_4 = "#9E9E9E"      # Grey
    NEUTRAL = "#D3D3D3"     # Light Grey
    TEXT = "black"
    GRID = "lightgray"
    BACKGROUND = "white"
Current Plot Methods Inventory
| Method Name | Chart Type | Input Format | Special Features |
|---|---|---|---|
| `plot_average_scores_with_counts` | Vertical Bar | Wide DF (score columns) | Text inside bars (count) |
| `plot_top3_ranking_distribution` | Stacked Vertical Bar | Wide DF (rank values 1-3) | 3-layer stack, legend |
| `plot_ranking_distribution` | Stacked Vertical Bar | Wide DF (rank values 1-4) | 4-layer stack, legend |
| `plot_most_ranked_1` | Vertical Bar | Wide DF (ranking columns) | Top 3 highlighted |
| `plot_weighted_ranking_score` | Vertical Bar | `Character`, `Weighted Score` columns | Text inside bars |
| `plot_voice_selection_counts` | Vertical Bar | Comma-separated string col | Explode strings, Top 8 highlight |
| `plot_top3_selection_counts` | Vertical Bar | Comma-separated string col | Explode strings, Top 3 highlight |
| `plot_speaking_style_trait_scores` | Horizontal Bar | `Voice`, `score`, anchors | Text annotations at bottom |
| `plot_speaking_style_correlation` | Vertical Bar | Correlation data | Red/Green conditional |
| `plot_speaking_style_ranking_correlation` | Vertical Bar | Correlation data | Red/Green conditional |
Filter System Components
- `_get_filter_slug()` - Generates directory name from filters (e.g., `Age-22to24_Gen-Man`)
- `_get_filter_description()` - Generates HTML text (e.g., `Filters: <b>Age:</b> 22-24<br><b>Gender:</b> Man`)
- `_add_filter_footnote(fig)` - Currently creates 2-row Plotly subplot, adds annotation
- `_save_plot(fig, title)` - Adds footer, saves to `figures/{slug}/{filename}.png`
Common Styling Pattern
All plots use:
- Height: 500px (default, can override)
- Width: 1000px (default, can override)
- Background: white
- Grid: light gray
- Font size: 11
- X-axis: 45° rotated labels
- Legends (where applicable): Horizontal, positioned above plot
Prerequisites
Dependencies to Add
uv add vl-convert-python # For PNG export from Altair
Dependencies Already Present
- `altair>=6.0.0` ✅
- `polars>=1.37.1` ✅
- `pandas>=2.3.3` ✅
Migration Tasks
TASK 1: Create Altair Theme in theme.py
Location: /home/luigi/Documents/VoiceBranding/JPMC/Phase-3/theme.py
Action: Add an Altair theme function and register it at the end of the file.
Code to add:
def jpmc_altair_theme():
    """JPMC brand theme for Altair charts."""
    return {
        'config': {
            'view': {
                'continuousWidth': 1000,
                'continuousHeight': 500,
                'strokeWidth': 0
            },
            'background': ColorPalette.BACKGROUND,
            'axis': {
                'grid': True,
                'gridColor': ColorPalette.GRID,
                'labelAngle': -45,  # Default rotated labels
                'labelFontSize': 11,
                'titleFontSize': 12,
                'labelColor': ColorPalette.TEXT,
                'titleColor': ColorPalette.TEXT
            },
            'axisX': {
                'labelAngle': -45
            },
            'axisY': {
                'labelAngle': 0
            },
            'legend': {
                'orient': 'top',
                'direction': 'horizontal',
                'titleFontSize': 11,
                'labelFontSize': 11
            },
            'title': {
                'fontSize': 14,
                'color': ColorPalette.TEXT,
                'anchor': 'start'
            },
            'bar': {
                'color': ColorPalette.PRIMARY
            }
        }
    }
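# Register theme (add at end of file)
# Note: since Altair 5.5 the `alt.themes` registry is deprecated in favor of `alt.theme`;
# the project pins altair>=6.0.0, so the newer API is used here.
try:
    import altair as alt

    # register() returns a decorator; enable=True also activates the theme
    alt.theme.register('jpmc', enable=True)(jpmc_altair_theme)
except ImportError:
    pass  # Altair not installed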
Verification:
- Function `jpmc_altair_theme()` exists
- Theme is registered as 'jpmc'
- Theme is enabled by default
- Import error is handled gracefully
TASK 2: Update imports in plots.py
Location: /home/luigi/Documents/VoiceBranding/JPMC/Phase-3/plots.py (lines 1-8)
Action: Replace Plotly imports with Altair imports.
Current code:
import plotly.graph_objects as go
from plotly.subplots import make_subplots
Replace with:
import altair as alt
Keep these imports:
import re
from pathlib import Path
import polars as pl
from theme import ColorPalette
Verification:
- `import altair as alt` present
- No Plotly imports remain
- All other imports unchanged
TASK 3: Rewrite _add_filter_footnote for Altair
Location: /home/luigi/Documents/VoiceBranding/JPMC/Phase-3/plots.py (currently lines ~120-212)
Action: Replace entire _add_filter_footnote method with Altair version.
New implementation:
def _add_filter_footnote(self, chart: alt.Chart) -> alt.Chart:
    """Add a footnote with active filters to the chart.

    Creates a vconcat with main chart on top and filter text chart below.
    Returns the combined chart (or original if no filters).
    """
    filter_text = self._get_filter_description()

    # Skip if no filters active - return original chart
    if not filter_text:
        return chart

    # Replace <br> with newlines first; stripping all tags first would delete them
    plain_text = re.sub(r'<br\s*/?>', '\n', filter_text)
    # Remove remaining HTML tags (Altair doesn't support HTML in mark_text)
    plain_text = re.sub(r'<[^>]+>', '', plain_text)

    # Create a text-only chart for the footer
    # Use a dummy dataframe with one row
    import pandas as pd
    footer_df = pd.DataFrame([{'text': plain_text, 'x': 0, 'y': 0}])

    footer_chart = alt.Chart(footer_df).mark_text(
        align='left',
        baseline='top',
        fontSize=9,
        color='gray',
        lineBreak='\n',  # Render embedded newlines as separate lines
        dx=5,  # Small left padding
        dy=5   # Small top padding
    ).encode(
        x=alt.value(0),  # Pin to top-left; without x/y the text is placed at the view center
        y=alt.value(0),
        text='text:N'
    ).properties(
        height=60,  # Fixed height for footer
        width=chart.width if hasattr(chart, 'width') and chart.width else 1000
    )

    # Combine with vconcat
    combined = alt.vconcat(chart, footer_chart, spacing=10)
    return combined
Verification:
- Method signature changed from `fig: go.Figure` to `chart: alt.Chart`
- Returns `alt.Chart` instead of `go.Figure`
- Uses `vconcat` for vertical stacking
- HTML tags are stripped from filter text
- Footer has fixed height
- Spacing between chart and footer is set
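The tag-stripping step is order-sensitive: `<br>` has to become a newline before the generic tag removal runs, otherwise the line breaks are lost. A minimal standalone sketch of that conversion, assuming the filter description looks like the example given under Filter System Components (`html_to_plain` is a hypothetical helper name, not part of the existing code):

```python
import re


def html_to_plain(filter_html: str) -> str:
    """Convert the HTML filter description into plain text with newlines."""
    # Step 1: turn <br>, <br/> and <br /> into newlines before any tag stripping.
    text = re.sub(r"<br\s*/?>", "\n", filter_html, flags=re.IGNORECASE)
    # Step 2: drop every remaining tag such as <b> and </b>.
    return re.sub(r"<[^>]+>", "", text)


example = "Filters: <b>Age:</b> 22-24<br><b>Gender:</b> Man"
plain = html_to_plain(example)
print(plain)
```

Reversing the two steps would produce a single line, because the generic pattern `<[^>]+>` also matches `<br>`.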
TASK 4: Rewrite _save_plot for Altair
Location: /home/luigi/Documents/VoiceBranding/JPMC/Phase-3/plots.py (currently lines ~214-234)
Action: Replace _save_plot method for Altair chart saving.
New implementation:
def _save_plot(self, chart: alt.Chart, title: str) -> alt.Chart:
    """Save chart to PNG file if fig_save_dir is set.

    Returns the (potentially modified) chart with filter footnote added.
    """
    # Add filter footnote - returns combined chart if filters active
    chart = self._add_filter_footnote(chart)

    if hasattr(self, 'fig_save_dir') and self.fig_save_dir:
        path = Path(self.fig_save_dir)

        # Add filter slug subfolder
        filter_slug = self._get_filter_slug()
        path = path / filter_slug

        if not path.exists():
            path.mkdir(parents=True, exist_ok=True)

        filename = f"{self._sanitize_filename(title)}.png"

        # Save using vl-convert backend
        chart.save(str(path / filename), format='png', scale_factor=2.0)

    return chart
Verification:
- Method signature changed from `fig: go.Figure` to `chart: alt.Chart`
- Uses `chart.save()` instead of `fig.write_image()`
- PNG format specified
- Path handling unchanged (filter slug subdirectories)
- Returns modified chart
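The path handling can be checked without Altair. The sketch below reproduces the `figures/{slug}/{filename}.png` layout with `pathlib`; `sanitize_filename` is a hypothetical stand-in for the existing `_sanitize_filename` helper, whose real rules are not shown in this plan, and `build_png_path` is likewise an illustrative name.

```python
import re
from pathlib import Path


def sanitize_filename(title: str) -> str:
    """Hypothetical stand-in for _sanitize_filename (assumed rules)."""
    # Collapse whitespace (including the newline in multi-line titles) to "_".
    collapsed = re.sub(r"\s+", "_", title.strip())
    # Keep only letters, digits, "-" and "_".
    return re.sub(r"[^A-Za-z0-9_-]", "", collapsed)


def build_png_path(save_dir: str, filter_slug: str, title: str) -> Path:
    """Compose <save_dir>/<filter_slug>/<sanitized title>.png."""
    return Path(save_dir) / filter_slug / f"{sanitize_filename(title)}.png"


path = build_png_path(
    "figures",
    "Age-22to24_Gen-Man",
    "Most Popular Choice\n(Number of Times Ranked 1st)",
)
print(path.as_posix())
```

Because plot titles contain `\n`, whatever the real sanitizer does, it has to remove or replace newlines before the title is used as a file name.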
TASK 5: Migrate plot_average_scores_with_counts
Location: /home/luigi/Documents/VoiceBranding/JPMC/Phase-3/plots.py (currently lines ~248-313)
Action: Rewrite using Altair bar chart with text overlay.
Current behavior:
- Input: Wide DataFrame with score columns
- Output: Vertical bar chart with average scores, count text inside bars
New implementation:
def plot_average_scores_with_counts(
    self,
    data: pl.LazyFrame | pl.DataFrame | None = None,
    title: str = "General Impression (1-10)\nPer Voice with Number of Participants Who Rated It",
    x_label: str = "Stimuli",
    y_label: str = "Average General Impression Rating (1-10)",
    color: str = ColorPalette.PRIMARY,
    height: int | None = None,
    width: int | None = None,
) -> alt.Chart:
    """Create a bar plot showing average scores and count of non-null values for each column."""
    df = self._ensure_dataframe(data)

    # Calculate stats for each column (exclude _recordId)
    stats = []
    for col in [c for c in df.columns if c != '_recordId']:
        avg_score = df[col].mean()
        non_null_count = df[col].drop_nulls().len()

        # Extract voice ID from column name
        label = col.split('__')[-1] if '__' in col else col

        stats.append({
            'voice': label,
            'average': avg_score,
            'count': non_null_count
        })

    # Convert to pandas for Altair (sort by average descending)
    stats_df = pl.DataFrame(stats).sort('average', descending=True).to_pandas()

    # Explicit category order shared by both layers; this avoids relying on
    # sort='-y' being resolved consistently when the layers' scales are merged
    voice_order = stats_df['voice'].tolist()

    # Base bar chart
    bars = alt.Chart(stats_df).mark_bar(color=color).encode(
        x=alt.X('voice:N', title=x_label, sort=voice_order),
        y=alt.Y('average:Q', title=y_label, scale=alt.Scale(domain=[0, 10])),
        tooltip=[
            alt.Tooltip('voice:N', title='Voice'),
            alt.Tooltip('average:Q', title='Average', format='.2f'),
            alt.Tooltip('count:Q', title='Count')
        ]
    )

    # Text overlay for counts
    text = alt.Chart(stats_df).mark_text(
        dy=-5,  # Slight offset above bar
        color='black',
        fontSize=10
    ).encode(
        x=alt.X('voice:N', sort=voice_order),
        y=alt.Y('average:Q'),
        text=alt.Text('count:Q')
    )

    # Combine layers
    chart = (bars + text).properties(
        title=title,
        width=width or getattr(self, 'plot_width', 1000),
        height=height or getattr(self, 'plot_height', 500)
    )

    chart = self._save_plot(chart, title)
    return chart
Verification:
- Returns `alt.Chart` instead of `go.Figure`
- Data transformed to long format (pandas DataFrame)
- Bar chart created with `mark_bar()`
- Text overlay added with `mark_text()`
- Layers combined with `+` operator
- Sorting preserved (by average descending)
- Y-axis scale set to [0, 10]
- Tooltip includes voice, average, count
- Width/height properties set
- `_save_plot` called at end
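The per-column statistics can be verified separately from the charting code. This is a plain-Python sketch of the same reduction (mean of non-null values, non-null count, label taken from the part after `__`); the column names and voice IDs in the example are made up for illustration, and `column_stats` is a hypothetical helper.

```python
def column_stats(columns: dict) -> list:
    """Return one {'voice', 'average', 'count'} row per score column."""
    rows = []
    for name, values in columns.items():
        if name == "_recordId":
            continue  # identifier column, not a score
        present = [v for v in values if v is not None]
        rows.append({
            "voice": name.split("__")[-1],
            "average": sum(present) / len(present) if present else None,
            "count": len(present),
        })
    # Sort by average descending; columns without any rating go last.
    return sorted(rows, key=lambda r: (r["average"] is None, -(r["average"] or 0.0)))


# Illustrative wide data: one list per column, None marks a missing rating.
wide = {
    "_recordId": ["r1", "r2", "r3"],
    "Voice_Scale__V14": [8, None, 6],
    "Voice_Scale__V04": [9, 10, None],
}
stats = column_stats(wide)
for row in stats:
    print(row)
```

Comparing these rows against the `stats_df` built in the method is a quick way to confirm that nulls are excluded from both the mean and the count.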
TASK 6: Migrate plot_most_ranked_1
Location: /home/luigi/Documents/VoiceBranding/JPMC/Phase-3/plots.py (currently lines ~537-613)
Action: Rewrite using Altair with conditional coloring (top 3 highlighted).
Current behavior:
- Input: Wide DataFrame with ranking columns
- Output: Vertical bar chart, top 3 bars in PRIMARY color, rest in NEUTRAL
New implementation:
def plot_most_ranked_1(
    self,
    data: pl.LazyFrame | pl.DataFrame | None = None,
    title: str = "Most Popular Choice\n(Number of Times Ranked 1st)",
    x_label: str = "Item",
    y_label: str = "Count of 1st Place Rankings",
    height: int | None = None,
    width: int | None = None,
) -> alt.Chart:
    """Create a bar chart showing which item was ranked #1 the most. Top 3 highlighted."""
    df = self._ensure_dataframe(data)

    stats = []
    ranking_cols = [c for c in df.columns if c != '_recordId']

    for col in ranking_cols:
        count_rank_1 = df.filter(pl.col(col) == 1).height

        # Clean label
        label = col.replace('Character_Ranking_', '').replace('Top_3_Voices_ranking__', '').replace('_', ' ').strip()
        stats.append({'item': label, 'count': count_rank_1})

    # Convert and sort
    stats_df = pl.DataFrame(stats).sort('count', descending=True)

    # Add rank column for coloring (1-3 vs 4+)
    stats_df = stats_df.with_row_index('rank_index')
    stats_df = stats_df.with_columns(
        pl.when(pl.col('rank_index') < 3)
        .then(pl.lit('Top 3'))
        .otherwise(pl.lit('Other'))
        .alias('category')
    ).to_pandas()

    # Bar chart with conditional color
    chart = alt.Chart(stats_df).mark_bar().encode(
        x=alt.X('item:N', title=x_label, sort='-y'),
        y=alt.Y('count:Q', title=y_label),
        color=alt.Color('category:N',
                        scale=alt.Scale(domain=['Top 3', 'Other'],
                                        range=[ColorPalette.PRIMARY, ColorPalette.NEUTRAL]),
                        legend=None),
        tooltip=[
            alt.Tooltip('item:N', title='Item'),
            alt.Tooltip('count:Q', title='1st Place Votes')
        ]
    ).properties(
        title=title,
        width=width or getattr(self, 'plot_width', 1000),
        height=height or getattr(self, 'plot_height', 500)
    )

    chart = self._save_plot(chart, title)
    return chart
Verification:
- Returns `alt.Chart`
- Counts rank 1 occurrences per column
- Adds `category` column for top 3 vs others
- Uses conditional color via `alt.Color()` with custom scale
- Tooltip shows item and count
- Sorted by count descending
- Legend hidden (color is self-explanatory)
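The top-N highlighting is a positional rule: after sorting by count, the first three rows are labeled `Top 3` and everything else `Other`. A small plain-Python sketch of that rule, useful for checking the `category` column independently of Polars (`label_top_n` and the sample counts are illustrative, not taken from the project data):

```python
def label_top_n(counts: dict, n: int = 3) -> list:
    """Sort items by count (descending) and tag the first n rows."""
    # sorted() is stable, so tied items keep their original relative order.
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return [
        {
            "item": item,
            "count": count,
            "category": f"Top {n}" if index < n else "Other",
        }
        for index, (item, count) in enumerate(ranked)
    ]


rows = label_top_n({"A": 5, "B": 12, "C": 7, "D": 1, "E": 9})
for row in rows:
    print(row)
```

Note that the rule is based on row position, not on the count value, so an item tied with the third-placed one still falls into `Other`. The same holds for the `rank_index < 3` expression in the method.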
TASK 7: Migrate plot_weighted_ranking_score
Location: /home/luigi/Documents/VoiceBranding/JPMC/Phase-3/plots.py (currently lines ~615-662)
Action: Rewrite simple bar chart with text overlay.
Current behavior:
- Input: DataFrame with `Character` and `Weighted Score` columns
- Output: Vertical bar chart with score text inside bars
New implementation:
def plot_weighted_ranking_score(
    self,
    data: pl.LazyFrame | pl.DataFrame | None = None,
    title: str = "Weighted Popularity Score\n(1st=3pts, 2nd=2pts, 3rd=1pt)",
    x_label: str = "Character Personality",
    y_label: str = "Total Weighted Score",
    color: str = ColorPalette.PRIMARY,
    height: int | None = None,
    width: int | None = None,
) -> alt.Chart:
    """Create a bar chart showing the weighted ranking score for each character."""
    weighted_df = self._ensure_dataframe(data).to_pandas()

    # Bar chart
    bars = alt.Chart(weighted_df).mark_bar(color=color).encode(
        x=alt.X('Character:N', title=x_label),
        y=alt.Y('Weighted Score:Q', title=y_label),
        tooltip=[
            alt.Tooltip('Character:N'),
            alt.Tooltip('Weighted Score:Q', title='Score')
        ]
    )

    # Text overlay, placed just below the top edge so it sits inside the bar
    # (white text above the bar would be invisible on the white background)
    text = bars.mark_text(
        baseline='top',
        dy=5,
        color='white',
        fontSize=11
    ).encode(
        text='Weighted Score:Q'
    )

    chart = (bars + text).properties(
        title=title,
        width=width or getattr(self, 'plot_width', 1000),
        height=height or getattr(self, 'plot_height', 500)
    )

    chart = self._save_plot(chart, title)
    return chart
Verification:
- Returns `alt.Chart`
- Uses input columns as-is (`Character`, `Weighted Score`)
- Text overlay with white color inside bars
- Tooltip shows character and score
TASK 8: Migrate plot_top3_ranking_distribution
Location: /home/luigi/Documents/VoiceBranding/JPMC/Phase-3/plots.py (currently lines ~315-415)
Action: Rewrite stacked bar chart (3 ranks).
Current behavior:
- Input: Wide DataFrame with ranking columns (values 1, 2, 3)
- Output: Stacked bar chart with 3 layers (Rank 1, 2, 3), horizontal legend
New implementation:
def plot_top3_ranking_distribution(
    self,
    data: pl.LazyFrame | pl.DataFrame | None = None,
    title: str = "Top 3 Rankings Distribution\nCount of 1st, 2nd, and 3rd Place Votes per Voice",
    x_label: str = "Voices",
    y_label: str = "Number of Mentions in Top 3",
    height: int | None = None,
    width: int | None = None,
) -> alt.Chart:
    """Create a stacked bar chart showing how often each voice was ranked 1st, 2nd, or 3rd."""
    df = self._ensure_dataframe(data)

    # Calculate stats per column
    stats = []
    for col in [c for c in df.columns if c != '_recordId']:
        rank1 = df.filter(pl.col(col) == 1).height
        rank2 = df.filter(pl.col(col) == 2).height
        rank3 = df.filter(pl.col(col) == 3).height
        total = rank1 + rank2 + rank3

        if total > 0:
            label = col.split('__')[-1] if '__' in col else col
            # Add 3 rows (one per rank)
            stats.append({'voice': label, 'rank': 'Rank 1 (1st Choice)', 'count': rank1, 'total': total})
            stats.append({'voice': label, 'rank': 'Rank 2 (2nd Choice)', 'count': rank2, 'total': total})
            stats.append({'voice': label, 'rank': 'Rank 3 (3rd Choice)', 'count': rank3, 'total': total})

    # Convert to long format, sort by total
    stats_df = pl.DataFrame(stats).to_pandas()

    # Create stacked bar chart
    chart = alt.Chart(stats_df).mark_bar().encode(
        x=alt.X('voice:N', title=x_label, sort=alt.EncodingSortField(field='total', op='sum', order='descending')),
        y=alt.Y('count:Q', title=y_label, stack='zero'),
        color=alt.Color('rank:N',
                        scale=alt.Scale(domain=['Rank 1 (1st Choice)', 'Rank 2 (2nd Choice)', 'Rank 3 (3rd Choice)'],
                                        range=[ColorPalette.RANK_1, ColorPalette.RANK_2, ColorPalette.RANK_3]),
                        legend=alt.Legend(orient='top', direction='horizontal', title=None)),
        tooltip=[
            alt.Tooltip('voice:N', title='Voice'),
            alt.Tooltip('rank:N', title='Rank'),
            alt.Tooltip('count:Q', title='Count')
        ]
    ).properties(
        title=title,
        width=width or getattr(self, 'plot_width', 1000),
        height=height or getattr(self, 'plot_height', 500)
    )

    chart = self._save_plot(chart, title)
    return chart
Verification:
- Returns `alt.Chart`
- Data converted to long format (one row per voice-rank combo)
- Stacked with `stack='zero'` in y encoding
- Custom color scale for 3 ranks
- Sorted by total (sum of all ranks per voice)
- Horizontal legend at top
- Tooltip shows voice, rank, count
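The wide-to-long reshaping is the part of this task most likely to go wrong silently, so it may help to check it in isolation. The sketch below performs the same conversion in plain Python: one output row per (voice, rank) pair, with columns that received no votes dropped. The `Top_3_Voices_ranking__` prefix comes from the label-cleaning code in this plan; the voice IDs and values are invented, and `ranks_to_long` is a hypothetical helper.

```python
def ranks_to_long(columns: dict, ranks=(1, 2, 3)) -> list:
    """Turn {column: [rank or None, ...]} into long rows for a stacked bar chart."""
    rows = []
    for name, values in columns.items():
        if name == "_recordId":
            continue
        # Count how often each rank value occurs in this column.
        counts = {rank: sum(1 for v in values if v == rank) for rank in ranks}
        total = sum(counts.values())
        if total == 0:
            continue  # voice never ranked: no bar at all
        label = name.split("__")[-1]
        for rank in ranks:
            rows.append({"voice": label, "rank": f"Rank {rank}",
                         "count": counts[rank], "total": total})
    return rows


wide = {
    "Top_3_Voices_ranking__V14": [1, 2, None, 1],
    "Top_3_Voices_ranking__V04": [None, None, None, None],
    "Top_3_Voices_ranking__V08": [3, None, 2, None],
}
long_rows = ranks_to_long(wide)
for row in long_rows:
    print(row)
```

Each voice that survives contributes exactly `len(ranks)` rows, including zero-count rows, which keeps the color legend complete even when a rank is absent for some voices.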
TASK 9: Migrate plot_ranking_distribution
Location: /home/luigi/Documents/VoiceBranding/JPMC/Phase-3/plots.py (currently lines ~417-536)
Action: Rewrite stacked bar chart (4 ranks) - very similar to Task 8.
Current behavior:
- Input: Wide DataFrame with ranking columns (values 1, 2, 3, 4)
- Output: Stacked bar chart with 4 layers
New implementation:
def plot_ranking_distribution(
    self,
    data: pl.LazyFrame | pl.DataFrame | None = None,
    title: str = "Rankings Distribution\n(1st to 4th Place)",
    x_label: str = "Item",
    y_label: str = "Number of Votes",
    height: int | None = None,
    width: int | None = None,
) -> alt.Chart:
    """Create a stacked bar chart showing the distribution of rankings (1st to 4th)."""
    df = self._ensure_dataframe(data)

    stats = []
    ranking_cols = [c for c in df.columns if c != '_recordId']

    for col in ranking_cols:
        r1 = df.filter(pl.col(col) == 1).height
        r2 = df.filter(pl.col(col) == 2).height
        r3 = df.filter(pl.col(col) == 3).height
        r4 = df.filter(pl.col(col) == 4).height
        total = r1 + r2 + r3 + r4

        if total > 0:
            label = col.replace('Character_Ranking_', '').replace('Top_3_Voices_ranking__', '').replace('_', ' ').strip()
            stats.append({'item': label, 'rank': 'Rank 1 (Best)', 'count': r1, 'rank1': r1})
            stats.append({'item': label, 'rank': 'Rank 2', 'count': r2, 'rank1': r1})
            stats.append({'item': label, 'rank': 'Rank 3', 'count': r3, 'rank1': r1})
            stats.append({'item': label, 'rank': 'Rank 4 (Worst)', 'count': r4, 'rank1': r1})

    if not stats:
        return alt.Chart().mark_text(text="No data")

    stats_df = pl.DataFrame(stats).to_pandas()

    chart = alt.Chart(stats_df).mark_bar().encode(
        x=alt.X('item:N', title=x_label, sort=alt.EncodingSortField(field='rank1', order='descending')),
        y=alt.Y('count:Q', title=y_label, stack='zero'),
        color=alt.Color('rank:N',
                        scale=alt.Scale(domain=['Rank 1 (Best)', 'Rank 2', 'Rank 3', 'Rank 4 (Worst)'],
                                        range=[ColorPalette.RANK_1, ColorPalette.RANK_2, ColorPalette.RANK_3, ColorPalette.RANK_4]),
                        legend=alt.Legend(orient='top', direction='horizontal', title=None)),
        tooltip=[
            alt.Tooltip('item:N', title='Item'),
            alt.Tooltip('rank:N', title='Rank'),
            alt.Tooltip('count:Q', title='Count')
        ]
    ).properties(
        title=title,
        width=width or getattr(self, 'plot_width', 1000),
        height=height or getattr(self, 'plot_height', 500)
    )

    chart = self._save_plot(chart, title)
    return chart
Verification:
- Returns `alt.Chart`
- 4 ranks supported
- Sorted by Rank 1 count (added `rank1` field for sorting)
- Custom color scale for 4 ranks
- Empty data handled (returns text mark)
TASK 10: Migrate plot_voice_selection_counts
Location: /home/luigi/Documents/VoiceBranding/JPMC/Phase-3/plots.py (currently lines ~664-737)
Action: Rewrite with Polars data transformation + conditional coloring.
Current behavior:
- Input: DataFrame with comma-separated string column (`8_Combined`)
- Process: Split strings, explode, count occurrences
- Output: Bar chart, top 8 bars in PRIMARY, rest in NEUTRAL
New implementation:
def plot_voice_selection_counts(
    self,
    data: pl.LazyFrame | pl.DataFrame | None = None,
    target_column: str = "8_Combined",
    title: str = "Most Frequently Chosen Voices\n(Top 8 Highlighted)",
    x_label: str = "Voice",
    y_label: str = "Number of Times Chosen",
    height: int | None = None,
    width: int | None = None,
) -> alt.Chart:
    """Create a bar plot showing the frequency of voice selections."""
    df = self._ensure_dataframe(data)

    if target_column not in df.columns:
        return alt.Chart().mark_text(text=f"Column '{target_column}' not found")

    # Process data: split, explode, count
    stats_df = (
        df.select(pl.col(target_column))
        .drop_nulls()
        .with_columns(pl.col(target_column).str.split(","))
        .explode(target_column)
        .with_columns(pl.col(target_column).str.strip_chars())
        .filter(pl.col(target_column) != "")
        .group_by(target_column)
        .agg(pl.len().alias("count"))
        .sort("count", descending=True)
        .with_row_index('rank_index')
        .with_columns(
            pl.when(pl.col('rank_index') < 8)
            .then(pl.lit('Top 8'))
            .otherwise(pl.lit('Other'))
            .alias('category')
        )
        .to_pandas()
    )

    chart = alt.Chart(stats_df).mark_bar().encode(
        x=alt.X(f'{target_column}:N', title=x_label, sort='-y'),
        y=alt.Y('count:Q', title=y_label),
        color=alt.Color('category:N',
                        scale=alt.Scale(domain=['Top 8', 'Other'],
                                        range=[ColorPalette.PRIMARY, ColorPalette.NEUTRAL]),
                        legend=None),
        tooltip=[
            alt.Tooltip(f'{target_column}:N', title='Voice'),
            alt.Tooltip('count:Q', title='Selections')
        ]
    ).properties(
        title=title,
        width=width or getattr(self, 'plot_width', 1000),
        height=height or getattr(self, 'plot_height', 500)
    )

    chart = self._save_plot(chart, title)
    return chart
Verification:
- Returns `alt.Chart`
- Polars chain: split → explode → strip → group → count
- Top 8 categorization logic correct
- Conditional coloring applied
- Sorted by count descending
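To validate the Polars chain against a known answer, the same split, explode, strip, and count steps can be reproduced with the standard library. The following is only a reference sketch: `selection_counts` is a hypothetical helper and the sample cells are invented.

```python
from collections import Counter


def selection_counts(cells: list, top_n: int = 8) -> list:
    """Count voices across comma-separated cells and tag the top_n rows."""
    counter = Counter()
    for cell in cells:
        if cell is None:
            continue  # mirrors drop_nulls()
        for part in cell.split(","):  # mirrors str.split(",") + explode
            voice = part.strip()      # mirrors str.strip_chars()
            if voice:                 # mirrors the != "" filter
                counter[voice] += 1
    # most_common() returns (voice, count) pairs sorted by count descending.
    return [
        {"voice": voice, "count": count,
         "category": f"Top {top_n}" if index < top_n else "Other"}
        for index, (voice, count) in enumerate(counter.most_common())
    ]


cells = ["V1, V2,V3", None, "V2, V3", "V3,"]
result = selection_counts(cells, top_n=2)
for row in result:
    print(row)
```

The trailing comma in the last cell produces an empty fragment, which is why the empty-string filter in the Polars chain matters.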
TASK 11: Migrate plot_top3_selection_counts
Location: /home/luigi/Documents/VoiceBranding/JPMC/Phase-3/plots.py (currently lines ~739-808)
Action: Identical to Task 10, but default column is 3_Ranked and top 3 highlighted.
New implementation:
def plot_top3_selection_counts(
    self,
    data: pl.LazyFrame | pl.DataFrame | None = None,
    target_column: str = "3_Ranked",
    title: str = "Most Frequently Chosen Top 3 Voices\n(Top 3 Highlighted)",
    x_label: str = "Voice",
    y_label: str = "Count of Mentions in Top 3",
    height: int | None = None,
    width: int | None = None,
) -> alt.Chart:
    """Question: Which 3 voices are chosen the most out of 18?"""
    df = self._ensure_dataframe(data)

    if target_column not in df.columns:
        return alt.Chart().mark_text(text=f"Column '{target_column}' not found")

    stats_df = (
        df.select(pl.col(target_column))
        .drop_nulls()
        .with_columns(pl.col(target_column).str.split(","))
        .explode(target_column)
        .with_columns(pl.col(target_column).str.strip_chars())
        .filter(pl.col(target_column) != "")
        .group_by(target_column)
        .agg(pl.len().alias("count"))
        .sort("count", descending=True)
        .with_row_index('rank_index')
        .with_columns(
            pl.when(pl.col('rank_index') < 3)
            .then(pl.lit('Top 3'))
            .otherwise(pl.lit('Other'))
            .alias('category')
        )
        .to_pandas()
    )

    chart = alt.Chart(stats_df).mark_bar().encode(
        x=alt.X(f'{target_column}:N', title=x_label, sort='-y'),
        y=alt.Y('count:Q', title=y_label),
        color=alt.Color('category:N',
                        scale=alt.Scale(domain=['Top 3', 'Other'],
                                        range=[ColorPalette.PRIMARY, ColorPalette.NEUTRAL]),
                        legend=None),
        tooltip=[
            alt.Tooltip(f'{target_column}:N', title='Voice'),
            alt.Tooltip('count:Q', title='In Top 3')
        ]
    ).properties(
        title=title,
        width=width or getattr(self, 'plot_width', 1000),
        height=height or getattr(self, 'plot_height', 500)
    )

    chart = self._save_plot(chart, title)
    return chart
Verification:
- Default `target_column` is `"3_Ranked"`
- Top 3 categorization (not top 8)
- Otherwise identical to Task 10
TASK 12: Migrate plot_speaking_style_trait_scores
Location: /home/luigi/Documents/VoiceBranding/JPMC/Phase-3/plots.py (currently lines ~810-926)
Action: Rewrite horizontal bar chart with text annotations.
Current behavior:
- Input: DataFrame with `Voice`, `score`, `Left_Anchor`, `Right_Anchor` columns
- Output: Horizontal bar chart with anchor labels at bottom
New implementation:
def plot_speaking_style_trait_scores(
    self,
    data: pl.LazyFrame | pl.DataFrame | None = None,
    trait_description: str | None = None,
    left_anchor: str | None = None,
    right_anchor: str | None = None,
    title: str = "Speaking Style Trait Analysis",
    height: int | None = None,
    width: int | None = None,
) -> alt.Chart:
    """Plot scores for a single speaking style trait across multiple voices."""
    df = self._ensure_dataframe(data)

    if df.is_empty():
        return alt.Chart().mark_text(text="No data")

    required_cols = ["Voice", "score"]
    if not all(col in df.columns for col in required_cols):
        return alt.Chart().mark_text(text="Missing required columns")

    # Calculate stats: Mean, Count
    stats = (
        df.filter(pl.col("score").is_not_null())
        .group_by("Voice")
        .agg([
            pl.col("score").mean().alias("mean_score"),
            pl.col("score").count().alias("count")
        ])
        .sort("mean_score", descending=False)  # Ascending for bottom-to-top display
        .to_pandas()
    )

    # Extract anchors from data if not provided
    if (left_anchor is None or right_anchor is None) and "Left_Anchor" in df.columns:
        head = df.filter(pl.col("Left_Anchor").is_not_null()).head(1)
        if not head.is_empty():
            if left_anchor is None:
                left_anchor = head["Left_Anchor"][0]
            if right_anchor is None:
                right_anchor = head["Right_Anchor"][0]

    if trait_description is None:
        if left_anchor and right_anchor:
            trait_description = f"{left_anchor.split('|')[0]} vs. {right_anchor.split('|')[0]}"
        elif "Description" in df.columns:
            head = df.filter(pl.col("Description").is_not_null()).head(1)
            trait_description = head["Description"][0] if not head.is_empty() else ""
        else:
            trait_description = ""

    # Horizontal bar chart
    bars = alt.Chart(stats).mark_bar(color=ColorPalette.PRIMARY).encode(
        x=alt.X('mean_score:Q', title='Average Score (1-5)', scale=alt.Scale(domain=[1, 5])),
        y=alt.Y('Voice:N', title='Voice', sort='-x'),
        tooltip=[
            alt.Tooltip('Voice:N'),
            alt.Tooltip('mean_score:Q', title='Average', format='.2f'),
            alt.Tooltip('count:Q', title='Count')
        ]
    )

    # Count text inside bars
    text = bars.mark_text(
        align='center',
        baseline='middle',
        color='white',
        fontSize=16
    ).encode(
        text='count:Q'
    )

    # Combine
    chart = (bars + text).properties(
        title={
            "text": title,
            "subtitle": [trait_description, "(Numbers on bars indicate respondent count)"]
        },
        width=width or getattr(self, 'plot_width', 1000),
        height=height or getattr(self, 'plot_height', 500)
    )

    # Note: Anchor annotations at bottom would require separate text marks
    # positioned at fixed coordinates - can add if needed

    chart = self._save_plot(chart, title)
    return chart
Verification:
- Returns `alt.Chart`
- Horizontal orientation (x=score, y=voice)
- X-axis domain set to [1, 5]
- Count text displayed inside bars (white, large font)
- Title includes subtitle with trait description
- Sorted by mean score (ascending for bottom-to-top)
- Anchor label annotations (optional - commented in code)
TASK 13: Migrate plot_speaking_style_correlation
Location: /home/luigi/Documents/VoiceBranding/JPMC/Phase-3/plots.py (currently lines ~928-1018)
Action: Rewrite with red/green conditional coloring based on sign.
Current behavior:
- Input: DataFrame with correlation data
- Process: Calculate Pearson correlation per trait
- Output: Bar chart, positive correlations green, negative red
New implementation:
def plot_speaking_style_correlation(
    self,
    style_color: str,
    style_traits: list[str],
    data: pl.LazyFrame | pl.DataFrame | None = None,
    title: str | None = None,
) -> alt.Chart:
    """Plots correlation between Speaking Style Trait Scores (1-5) and Voice Scale (1-10)."""
    df = self._ensure_dataframe(data)

    if title is None:
        # Include the style color, matching the ranking-correlation plot's default title
        title = f"Speaking style {style_color} and voice scale 1-10 correlations"

    trait_correlations = []

    # Calculate correlations
    for i, trait in enumerate(style_traits):
        subset = df.filter(pl.col("Right_Anchor") == trait)
        valid_data = subset.select(["score", "Voice_Scale_Score"]).drop_nulls()

        if valid_data.height > 1:
            corr_val = valid_data.select(pl.corr("score", "Voice_Scale_Score")).item()

            # Handle trait text - wrap at '|' for display
            trait_display = trait.replace('|', '\n')

            trait_correlations.append({
                "trait_display": trait_display,
                "trait_index": f"Trait {i+1}",
                "correlation": corr_val if corr_val is not None else 0.0
            })

    if not trait_correlations:
        return alt.Chart().mark_text(text=f"No data for {style_color} Style")

    plot_df = pl.DataFrame(trait_correlations).to_pandas()

    # Conditional color based on sign
    chart = alt.Chart(plot_df).mark_bar().encode(
        x=alt.X('trait_display:N', title=None, axis=alt.Axis(labelAngle=0)),
        y=alt.Y('correlation:Q', title='Correlation', scale=alt.Scale(domain=[-1, 1])),
        color=alt.condition(
            alt.datum.correlation >= 0,
            alt.value('green'),
            alt.value('red')
        ),
        tooltip=[
            alt.Tooltip('trait_display:N', title='Trait'),
            alt.Tooltip('correlation:Q', format='.2f')
        ]
    ).properties(
        title=title,
        width=1000,
        height=400
    )

    chart = self._save_plot(chart, title)
    return chart
Verification:
- Returns `alt.Chart`
- Pearson correlation calculated via `pl.corr()`
- Conditional coloring: green if positive, red if negative
- Y-axis domain [-1, 1]
- Trait text wrapped at '|' for display
- Tooltip shows trait and correlation value
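For spot-checking the values that `pl.corr()` returns, a hand-rolled Pearson coefficient is enough. The sketch below is a reference implementation under the same rules as the method (rows with a null on either side are dropped, fewer than two pairs yields no value) plus the sign-based color rule; `pearson` and `bar_color` are hypothetical helper names and the numbers are illustrative.

```python
import math


def pearson(xs: list, ys: list):
    """Pearson correlation over pairs where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return None  # not enough data, as in the height > 1 check
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        return None  # a constant column has no defined correlation
    return cov / math.sqrt(var_x * var_y)


def bar_color(correlation: float) -> str:
    """Green for zero or positive correlations, red for negative ones."""
    return "green" if correlation >= 0 else "red"


r = pearson([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(r, bar_color(r))
```

The constant-column case is worth testing against the real data as well, since a trait where every respondent gave the same score has no defined correlation and should not be drawn as a meaningful bar.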
TASK 14: Migrate plot_speaking_style_ranking_correlation
Location: /home/luigi/Documents/VoiceBranding/JPMC/Phase-3/plots.py (currently lines ~1020-1105)
Action: Almost identical to Task 13, but correlates with Ranking_Points instead of Voice_Scale_Score.
New implementation:
def plot_speaking_style_ranking_correlation(
    self,
    style_color: str,
    style_traits: list[str],
    data: pl.LazyFrame | pl.DataFrame | None = None,
    title: str | None = None,
) -> alt.Chart:
    """Plots correlation between Speaking Style Trait Scores (1-5) and Voice Ranking Points (0-3)."""
    df = self._ensure_dataframe(data)

    if title is None:
        title = f"Speaking style {style_color} and voice ranking points correlations"

    trait_correlations = []

    for i, trait in enumerate(style_traits):
        subset = df.filter(pl.col("Right_Anchor") == trait)
        valid_data = subset.select(["score", "Ranking_Points"]).drop_nulls()

        if valid_data.height > 1:
            corr_val = valid_data.select(pl.corr("score", "Ranking_Points")).item()
            trait_display = trait.replace('|', '\n')

            trait_correlations.append({
                "trait_display": trait_display,
                "trait_index": f"Trait {i+1}",
                "correlation": corr_val if corr_val is not None else 0.0
            })

    if not trait_correlations:
        return alt.Chart().mark_text(text=f"No data for {style_color} Style")

    plot_df = pl.DataFrame(trait_correlations).to_pandas()

    chart = alt.Chart(plot_df).mark_bar().encode(
        x=alt.X('trait_display:N', title=None, axis=alt.Axis(labelAngle=0)),
        y=alt.Y('correlation:Q', title='Correlation', scale=alt.Scale(domain=[-1, 1])),
        color=alt.condition(
            alt.datum.correlation >= 0,
            alt.value('green'),
            alt.value('red')
        ),
        tooltip=[
            alt.Tooltip('trait_display:N', title='Trait'),
            alt.Tooltip('correlation:Q', format='.2f')
        ]
    ).properties(
        title=title,
        width=1000,
        height=400
    )

    chart = self._save_plot(chart, title)
    return chart
Verification:
- Returns `alt.Chart`
- Uses `Ranking_Points` column instead of `Voice_Scale_Score`
- Otherwise identical to Task 13
TASK 15: Install vl-convert-python
Action: Add vl-convert-python to project dependencies for PNG export.
Command:
cd /home/luigi/Documents/VoiceBranding/JPMC/Phase-3
uv add vl-convert-python
Verification:
- `vl-convert-python` appears in `pyproject.toml` dependencies
- Installation successful (no errors)
TASK 16: Remove Plotly dependencies (optional cleanup)
Action: Remove unused Plotly packages.
Command:
cd /home/luigi/Documents/VoiceBranding/JPMC/Phase-3
uv remove plotly kaleido
Verification:
- `plotly` and `kaleido` removed from `pyproject.toml`
- No other code depends on Plotly (grep check)
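As an alternative to a text grep, the dependency check can be done on the parsed source, which ignores comments and strings that merely mention Plotly. This is a sketch using the standard `ast` module; `find_plotly_imports` is a hypothetical helper, and the two sample sources mirror the before and after import blocks from Task 2.

```python
import ast

BANNED = {"plotly", "kaleido"}


def find_plotly_imports(source: str) -> list:
    """Return the names of imported modules whose top-level package is banned."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import plotly.graph_objects as go" -> alias.name is the dotted path
            found += [a.name for a in node.names if a.name.split(".")[0] in BANNED]
        elif isinstance(node, ast.ImportFrom) and node.module:
            # "from plotly.subplots import make_subplots" -> node.module
            if node.module.split(".")[0] in BANNED:
                found.append(node.module)
    return sorted(found)


before = "import plotly.graph_objects as go\nfrom plotly.subplots import make_subplots\nimport re\n"
after = "import altair as alt\nimport re\n# plotly used to live here\n"
print(find_plotly_imports(before))
print(find_plotly_imports(after))
```

To apply it to the project, the source of each `.py` file would be read with `Path.read_text()` and passed to the function; an empty list for every file means the packages can be removed safely.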
TASK 17: Test all plot methods in Marimo notebook
Action: Create a test notebook to verify all 10 plotting methods work correctly.
Test checklist per plot:
- Chart renders without errors
- Chart has correct dimensions (width/height)
- Colors match ColorPalette constants
- Data is displayed correctly (bars, stacks, etc.)
- Text overlays render (counts, scores)
- Tooltips show correct information
- Filter annotation appears below chart (if filters active)
- PNG export works (check `figures/` directory)
- No overlap between chart elements and filter text
Create test file: /home/luigi/Documents/VoiceBranding/JPMC/Phase-3/test_altair_migration.py
Test template:
import marimo as mo
import polars as pl
from utils import QualtricsSurvey
# Load sample data
survey = QualtricsSurvey()
survey.load_data('path/to/data')
survey.fig_save_dir = 'figures/altair_test'
# Test each plot method
mo.md("## Testing Altair Migration")
# 1. Test plot_average_scores_with_counts
chart1 = survey.plot_average_scores_with_counts(...)
chart1
# 2. Test plot_most_ranked_1
chart2 = survey.plot_most_ranked_1(...)
chart2
# ... repeat for all 10 methods
Final Verification Checklist
After completing all tasks, verify the following:
Code Quality
- No Plotly imports remain in plots.py
- All methods return alt.Chart instead of go.Figure
- No syntax errors (python -m py_compile plots.py)
- Type hints updated (if any reference go.Figure)
- Docstrings updated (if any mention Plotly)
Theme & Styling
- jpmc_altair_theme() function exists in theme.py
- Theme is registered and enabled
- All charts use ColorPalette constants
- Chart dimensions match original (width=1000, height=500 defaults)
- Font sizes match original (11pt for labels, 14pt for titles)
Data Handling
- All methods handle empty data gracefully
- Wide-to-long transformations correct (stacked bars, selection counts)
- Sorting preserved (by average, count, rank1, etc.)
- Column filtering works (_recordId excluded)
- String processing works (comma-split, strip, explode)
Visual Features
- Bar charts render correctly (vertical and horizontal)
- Stacked bars have correct layer order
- Text overlays positioned correctly (inside/outside bars)
- Conditional coloring works (top N highlighting, red/green by sign)
- Tooltips show correct fields with proper formatting
- Legends positioned correctly (top horizontal for stacked bars)
- X-axis labels rotated at -45° by default
- Grid lines visible
Filter System
- _get_filter_slug() unchanged (still works)
- _get_filter_description() unchanged (still works)
- _add_filter_footnote() uses vconcat approach
- Filter text appears at bottom of combined chart
- No overlap between chart and filter text
- Filter text is left-aligned
- HTML tags stripped from filter text (Altair doesn't support HTML)
- Filter subdirectories created correctly
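Because Altair text marks render plain strings, any markup left over from the Plotly-era filter description has to be removed before it reaches the footer. The following is a minimal sketch of one way to do that; the function name `strip_html` and the sample filter string are assumptions for illustration, since the actual output format of _get_filter_description() is not shown in this section.

```python
import html
import re

_BR_RE = re.compile(r"<br\s*/?>", flags=re.IGNORECASE)
_TAG_RE = re.compile(r"<[^>]+>")


def strip_html(text: str) -> str:
    """Convert an HTML-annotated filter description into plain text."""
    # Replace line breaks with a separator so adjacent filters do not fuse together
    text = _BR_RE.sub("; ", text)
    # Drop every remaining tag (<b>, <i>, <span ...>, ...)
    text = _TAG_RE.sub("", text)
    # Decode entities such as &amp; or &lt;
    text = html.unescape(text)
    # Collapse whitespace and trim any dangling separator
    return re.sub(r"\s+", " ", text).strip(" ;")


print(strip_html("<b>Age:</b> 18-24<br>Gender: &amp; all"))
```

A regular expression is adequate here only because the input is short, self-generated annotation text; it is not a general-purpose HTML parser.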
PNG Export
- vl-convert-python installed
- chart.save() method works
- PNG files created in correct subdirectories
- PNG files have correct filenames (sanitized titles)
- Image quality acceptable (scale_factor=2.0)
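To make the filename and subdirectory checks concrete, here is a hedged sketch of how a chart title could be turned into a PNG path. The helpers `sanitize_title` and `png_path` are hypothetical, and the exact sanitization rules used by _save_plot() may differ; treat this as an illustration of what to verify, not as the project's implementation.

```python
import re
from pathlib import Path


def sanitize_title(title: str, max_len: int = 100) -> str:
    """Reduce a chart title to lowercase letters, digits, and underscores."""
    slug = re.sub(r"[^A-Za-z0-9]+", "_", title).strip("_").lower()
    # Fall back to a fixed name when nothing usable is left
    return slug[:max_len] or "untitled"


def png_path(base_dir: str, filter_slug: str, title: str) -> Path:
    """Build <base_dir>/<filter_slug>/<sanitized title>.png (assumed layout)."""
    return Path(base_dir) / filter_slug / f"{sanitize_title(title)}.png"


print(png_path("figures", "all_respondents", "Speaking Style: Green / Red (Top 3)"))
```

With a path like this in hand, the export itself would be a call such as chart.save(str(path), scale_factor=2.0) after creating the parent directory.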
Marimo Integration
- Charts render in Marimo notebooks
- Charts are reactive (update when data changes)
- No JavaScript console errors
- Interactive features work (tooltips, pan, zoom if enabled)
All 10 Plot Methods
- plot_average_scores_with_counts - vertical bar + text
- plot_top3_ranking_distribution - stacked bar (3 ranks)
- plot_ranking_distribution - stacked bar (4 ranks)
- plot_most_ranked_1 - vertical bar + conditional color
- plot_weighted_ranking_score - vertical bar + text
- plot_voice_selection_counts - vertical bar + conditional color
- plot_top3_selection_counts - vertical bar + conditional color
- plot_speaking_style_trait_scores - horizontal bar + text
- plot_speaking_style_correlation - vertical bar + red/green
- plot_speaking_style_ranking_correlation - vertical bar + red/green
Edge Cases
- Empty DataFrame handled gracefully
- Missing columns detected and reported
- Zero counts/values don't break charts
- Single data point renders correctly
- Very long labels don't cause layout issues
- Many categories don't cause overcrowding
Regression Testing
- Existing Marimo notebooks still work
- Data filtering still works (filter_data())
- QualtricsSurvey class initialization unchanged
- No breaking changes to public API
Documentation
- This migration plan marked as "Complete"
- Any new dependencies documented
- Any breaking changes documented
- Example usage updated (if applicable)
Troubleshooting
Issue: Charts don't render
- Check Altair version: python -c "import altair; print(altair.__version__)"
- Check vl-convert: python -c "import vl_convert; print(vl_convert.__version__)"
- Check for JavaScript errors in browser console
Issue: PNG export fails
- Verify vl-convert-python installed: pip show vl-convert-python
- Check write permissions on figures/ directory
- Try saving as HTML first: chart.save('test.html')
Issue: Colors don't match theme
- Verify theme is enabled: print(alt.theme.active) (Altair 5.5 and later; older releases use alt.themes.active)
- Check color scale definitions in each plot method
- Ensure ColorPalette imported correctly
Issue: Filter text overlaps chart
- Increase spacing parameter in vconcat(chart, footer, spacing=20)
- Increase footer chart height property
- Check if footer chart is actually created (debug with print())
Issue: Data not displaying
- Check DataFrame format (wide vs long)
- Verify column names match encoding specs
- Check for null values (.drop_nulls())
- Print intermediate DataFrames for debugging
Notes
- Backup: Before starting, create a backup of plots.py: cp plots.py plots.py.plotly_backup
- Incremental testing: Test each plot method immediately after migration
- Marimo restart: May need to restart Marimo kernel after major changes
- Performance: Altair may be slower for very large datasets, and its default data transformer raises MaxRowsError above 5,000 rows; aggregate first or use .sample() if needed
Completion Status
- All tasks (1-17) completed
- All verification checks passed
- Existing notebooks tested and working
- Migration documented
- Ready for production use
Migration completed on: _________________
Tested by: _________________
Sign-off: _________________