# Altair Migration Plan: Plotly → Altair for JPMCPlotsMixin

**Date:** January 28, 2026

**Status:** Not Started

**Objective:** Migrate all plotting methods from Plotly to Altair to fix the filter-annotation overlap and ensure proper Marimo reactivity.

---

## Background

### Problem

The current Plotly implementation has a critical layout issue. Filter annotations overlap long rotated x-axis labels because Plotly does not reserve a true bounding box for annotations, so elements overflow their assigned subplot areas.

### Why Altair?

1. **Better layout control** - Vega-Lite (Altair's backend) reserves space for axes, labels, legends and titles when it lays out a view, so elements stay inside their own area
2. **Marimo reactivity** - Marimo's documentation states that reactive plots require Altair or Plotly; this plan prefers Altair
3. **Clean separation** - `alt.vconcat()` stacks views vertically, each in its own layout box, so nothing overflows into its neighbour
4. **Already installed** - `altair>=6.0.0` is already a dependency (currently unused)

---

## Current System Analysis

### File Structure

- **`plots.py`** - Contains the `JPMCPlotsMixin` class with 10 plotting methods
- **`theme.py`** - Contains the `ColorPalette` class with all styling constants
- **`utils.py`** - Contains the `JPMCSurvey` class, which mixes in `JPMCPlotsMixin`

### Color Palette (from theme.py)

```python
class ColorPalette:
    PRIMARY = "#0077B6"     # Medium Blue
    RANK_1 = "#004C6D"      # Dark Blue
    RANK_2 = "#008493"      # Teal
    RANK_3 = "#5AAE95"      # Sea Green
    RANK_4 = "#9E9E9E"      # Grey
    NEUTRAL = "#D3D3D3"     # Light Grey
    TEXT = "black"
    GRID = "lightgray"
    BACKGROUND = "white"
```

### Current Plot Methods Inventory

| Method Name | Chart Type | Input Format | Special Features |
|-------------|-----------|--------------|------------------|
| `plot_average_scores_with_counts` | Vertical Bar | Wide DF (score columns) | Text inside bars (count) |
| `plot_top3_ranking_distribution` | Stacked Vertical Bar | Wide DF (rank values 1-3) | 3-layer stack, legend |
| `plot_ranking_distribution` | Stacked Vertical Bar | Wide DF (rank values 1-4) | 4-layer stack, legend |
| `plot_most_ranked_1` | Vertical Bar | Wide DF (ranking columns) | Top 3 highlighted |
| `plot_weighted_ranking_score` | Vertical Bar | `Character`, `Weighted Score` | Text inside bars |
| `plot_voice_selection_counts` | Vertical Bar | Comma-separated string col | Explode strings, Top 8 highlight |
| `plot_top3_selection_counts` | Vertical Bar | Comma-separated string col | Explode strings, Top 3 highlight |
| `plot_speaking_style_trait_scores` | Horizontal Bar | `Voice`, `score`, anchors | Text annotations at bottom |
| `plot_speaking_style_correlation` | Vertical Bar | Correlation data | Red/Green conditional |
| `plot_speaking_style_ranking_correlation` | Vertical Bar | Correlation data | Red/Green conditional |

### Filter System Components

1. **`_get_filter_slug()`** - Generates a directory name from the active filters (e.g., `Age-22to24_Gen-Man`)
2. **`_get_filter_description()`** - Generates HTML text (e.g., `Filters: Age: 22-24<br>Gender: Man`)
3. **`_add_filter_footnote(fig)`** - Currently creates a 2-row Plotly subplot and adds the annotation
4. **`_save_plot(fig, title)`** - Adds the footer and saves to `figures/{slug}/{filename}.png`

### Common Styling Pattern

All plots use:
- Height: 500px (default, can override)
- Width: 1000px (default, can override)
- Background: white
- Grid: light gray
- Font size: 11
- X-axis: 45° rotated labels
- Legends (where applicable): horizontal, positioned above the plot

---

## Prerequisites

### Dependencies to Add

```bash
uv add vl-convert-python  # For PNG export from Altair
```

### Dependencies Already Present

- `altair>=6.0.0` ✅
- `polars>=1.37.1` ✅
- `pandas>=2.3.3` ✅

---

## Migration Tasks

### TASK 1: Create Altair Theme in theme.py

**Location:** `/home/luigi/Documents/VoiceBranding/JPMC/Phase-3/theme.py`

**Action:** Add an Altair theme function at the end of the file, then register and enable it.

**Code to add:**

```python
try:
    import altair as alt
except ImportError:  # Altair not installed: theme.py still provides ColorPalette
    alt = None


def jpmc_altair_theme() -> dict:
    """JPMC brand theme for Altair charts."""
    return {
        'config': {
            'view': {
                'continuousWidth': 1000,
                'continuousHeight': 500,
                'strokeWidth': 0,
            },
            'background': ColorPalette.BACKGROUND,
            'axis': {
                'grid': True,
                'gridColor': ColorPalette.GRID,
                'labelFontSize': 11,
                'titleFontSize': 12,
                'labelColor': ColorPalette.TEXT,
                'titleColor': ColorPalette.TEXT,
            },
            'axisX': {'labelAngle': -45},  # Default rotated category labels
            'axisY': {'labelAngle': 0},
            'legend': {
                'orient': 'top',
                'direction': 'horizontal',
                'titleFontSize': 11,
                'labelFontSize': 11,
            },
            'title': {
                'fontSize': 14,
                'color': ColorPalette.TEXT,
                'anchor': 'start',
            },
            'bar': {'color': ColorPalette.PRIMARY},
        }
    }


# Register and enable the theme (alt.theme API; alt.themes is deprecated since Altair 5.5)
if alt is not None:
    alt.theme.register('jpmc', enable=True)(jpmc_altair_theme)
```

**Verification:**
- [ ] Function `jpmc_altair_theme()` exists
- [ ] Theme is registered as 'jpmc' via `alt.theme.register` (not the deprecated `alt.themes`)
- [ ] Theme is enabled by default
- [ ] Import error is handled gracefully

---

### TASK 2: Update imports in plots.py

**Location:** `/home/luigi/Documents/VoiceBranding/JPMC/Phase-3/plots.py` (lines 1-8)

**Action:** Replace the Plotly imports with the Altair import.

**Current code:**

```python
import plotly.graph_objects as go
from plotly.subplots import make_subplots
```

**Replace with:**

```python
import altair as alt
```

**Keep these imports:**

```python
import re
from pathlib import Path

import polars as pl

from theme import ColorPalette
```

**Verification:**
- [ ] `import altair as alt` present
- [ ] No Plotly imports remain
- [ ] All other imports unchanged

---

### TASK 3: Rewrite `_add_filter_footnote` for Altair

**Location:** `/home/luigi/Documents/VoiceBranding/JPMC/Phase-3/plots.py` (currently lines ~120-212)

**Action:** Replace the entire `_add_filter_footnote` method with the Altair version.

**New implementation:**

```python
def _add_filter_footnote(
    self, chart: alt.Chart | alt.LayerChart
) -> alt.Chart | alt.LayerChart | alt.VConcatChart:
    """Add a footnote listing the active filters below the chart.

    Returns a vconcat with the main chart on top and a text-only chart below,
    or the original chart unchanged if no filters are active.
    """
    import pandas as pd

    filter_text = self._get_filter_description()

    # Skip if no filters are active - return the original chart
    if not filter_text:
        return chart

    # Convert <br> to newlines FIRST, then strip the remaining HTML tags
    # (Altair's mark_text does not render HTML)
    plain_text = re.sub(r'<br\s*/?>', '\n', filter_text, flags=re.IGNORECASE)
    plain_text = re.sub(r'<[^>]+>', '', plain_text)

    # Match the main chart's width when it is an explicit integer
    chart_width = getattr(chart, 'width', None)
    if not isinstance(chart_width, int):
        chart_width = 1000

    # Text-only footer chart built from a one-row DataFrame
    footer_df = pd.DataFrame([{'text': plain_text}])
    footer_chart = alt.Chart(footer_df).mark_text(
        align='left',
        baseline='top',
        lineBreak='\n',  # Render '\n' as real line breaks
        fontSize=9,
        color='gray',
        dx=5,  # Small left padding
        dy=5,  # Small top padding
    ).encode(
        x=alt.value(0),  # Pin the text to the top-left corner (pixels)
        y=alt.value(0),
        text='text:N',
    ).properties(
        height=60,  # Fixed height for the footer
        width=chart_width,
    )

    # Stack vertically: each view gets its own layout box, so nothing overlaps
    return alt.vconcat(chart, footer_chart, spacing=10)
```

**Verification:**
- [ ] Method signature changed from `fig: go.Figure` to `chart: alt.Chart | alt.LayerChart`
- [ ] Returns an `alt.VConcatChart` when filters are active, otherwise the original chart
- [ ] Uses `vconcat` for vertical stacking
- [ ] `<br>` converted to newlines before the other HTML tags are stripped
- [ ] Footer text uses `lineBreak='\n'` and is pinned to the top-left corner
- [ ] Footer has fixed height
- [ ] Spacing between chart and footer is set

---

### TASK 4: Rewrite `_save_plot` for Altair

**Location:** `/home/luigi/Documents/VoiceBranding/JPMC/Phase-3/plots.py` (currently lines ~214-234)

**Action:** Replace the `_save_plot` method so it saves Altair charts.

**New implementation:**

```python
def _save_plot(
    self, chart: alt.Chart | alt.LayerChart, title: str
) -> alt.Chart | alt.LayerChart | alt.VConcatChart:
    """Add the filter footnote and save the chart to PNG if fig_save_dir is set.

    Returns the (potentially modified) chart with the filter footnote added.
    """
    # Add filter footnote - returns a combined chart if filters are active
    chart = self._add_filter_footnote(chart)

    if getattr(self, 'fig_save_dir', None):
        # Save into a per-filter subfolder, e.g. figures/Age-22to24_Gen-Man/
        path = Path(self.fig_save_dir) / self._get_filter_slug()
        path.mkdir(parents=True, exist_ok=True)

        filename = f"{self._sanitize_filename(title)}.png"
        # PNG export uses the vl-convert backend (vl-convert-python)
        chart.save(str(path / filename), format='png', scale_factor=2.0)

    return chart
```

**Verification:**
- [ ] Method signature changed from `fig: go.Figure` to `chart: alt.Chart | alt.LayerChart`
- [ ] Uses `chart.save()` instead of `fig.write_image()`
- [ ] PNG format specified
- [ ] Path handling unchanged (filter slug subdirectories)
- [ ] Returns modified chart

---

### TASK 5: Migrate `plot_average_scores_with_counts`

**Location:** `/home/luigi/Documents/VoiceBranding/JPMC/Phase-3/plots.py` (currently lines ~248-313)

**Action:** Rewrite using an Altair bar chart with a text overlay.
**Current behavior:**
- Input: Wide DataFrame with score columns
- Output: Vertical bar chart with average scores and count text inside the bars

**New implementation:**

```python
def plot_average_scores_with_counts(
    self,
    data: pl.LazyFrame | pl.DataFrame | None = None,
    title: str = "General Impression (1-10)\nPer Voice with Number of Participants Who Rated It",
    x_label: str = "Stimuli",
    y_label: str = "Average General Impression Rating (1-10)",
    color: str = ColorPalette.PRIMARY,
    height: int | None = None,
    width: int | None = None,
) -> alt.Chart | alt.LayerChart | alt.VConcatChart:
    """Create a bar plot showing average scores and count of non-null values for each column."""
    df = self._ensure_dataframe(data)

    # Calculate stats for each column (exclude _recordId)
    stats = []
    for col in [c for c in df.columns if c != '_recordId']:
        avg_score = df[col].mean()
        non_null_count = df[col].drop_nulls().len()
        # Extract voice ID from column name
        label = col.split('__')[-1] if '__' in col else col
        stats.append({'voice': label, 'average': avg_score, 'count': non_null_count})

    # Convert to pandas for Altair (sort by average descending)
    stats_df = pl.DataFrame(stats).sort('average', descending=True).to_pandas()

    # Base bar chart
    bars = alt.Chart(stats_df).mark_bar(color=color).encode(
        x=alt.X('voice:N', title=x_label, sort='-y'),
        y=alt.Y('average:Q', title=y_label, scale=alt.Scale(domain=[0, 10])),
        tooltip=[
            alt.Tooltip('voice:N', title='Voice'),
            alt.Tooltip('average:Q', title='Average', format='.2f'),
            alt.Tooltip('count:Q', title='Count'),
        ],
    )

    # Count text just inside the top of each bar
    text = bars.mark_text(
        baseline='top',
        dy=5,  # Shift down into the bar
        color='white',
        fontSize=10,
    ).encode(text=alt.Text('count:Q'))

    # Combine layers; Vega-Lite draws a multi-line title from a list of strings
    chart = (bars + text).properties(
        title=title.split('\n'),
        width=width or getattr(self, 'plot_width', 1000),
        height=height or getattr(self, 'plot_height', 500),
    )

    return self._save_plot(chart, title)
```

**Verification:**
- [ ] Returns an Altair chart (`alt.LayerChart`, or `alt.VConcatChart` when filters are active) instead of `go.Figure`
- [ ] Per-column stats collected into a pandas DataFrame
- [ ] Bar chart created with `mark_bar()`
- [ ] Count text overlay added with `mark_text()`, inside the bars
- [ ] Layers combined with the `+` operator
- [ ] Sorting preserved (by average descending)
- [ ] Y-axis scale set to [0, 10]
- [ ] Tooltip includes voice, average, count
- [ ] Width/height properties set
- [ ] Multi-line title passed as a list of strings
- [ ] `_save_plot` called at end

---

### TASK 6: Migrate `plot_most_ranked_1`

**Location:** `/home/luigi/Documents/VoiceBranding/JPMC/Phase-3/plots.py` (currently lines ~537-613)

**Action:** Rewrite using Altair with conditional coloring (top 3 highlighted).

**Current behavior:**
- Input: Wide DataFrame with ranking columns
- Output: Vertical bar chart; the top 3 bars in PRIMARY color, the rest in NEUTRAL

**New implementation:**

```python
def plot_most_ranked_1(
    self,
    data: pl.LazyFrame | pl.DataFrame | None = None,
    title: str = "Most Popular Choice\n(Number of Times Ranked 1st)",
    x_label: str = "Item",
    y_label: str = "Count of 1st Place Rankings",
    height: int | None = None,
    width: int | None = None,
) -> alt.Chart | alt.VConcatChart:
    """Create a bar chart showing which item was ranked #1 the most. Top 3 highlighted."""
    df = self._ensure_dataframe(data)

    stats = []
    ranking_cols = [c for c in df.columns if c != '_recordId']
    for col in ranking_cols:
        count_rank_1 = df.filter(pl.col(col) == 1).height
        # Clean label
        label = (
            col.replace('Character_Ranking_', '')
            .replace('Top_3_Voices_ranking__', '')
            .replace('_', ' ')
            .strip()
        )
        stats.append({'item': label, 'count': count_rank_1})

    # Convert and sort
    stats_df = pl.DataFrame(stats).sort('count', descending=True)

    # Add category column for coloring (first 3 rows vs. the rest)
    stats_df = stats_df.with_row_index('rank_index')
    stats_df = stats_df.with_columns(
        pl.when(pl.col('rank_index') < 3)
        .then(pl.lit('Top 3'))
        .otherwise(pl.lit('Other'))
        .alias('category')
    ).to_pandas()

    # Bar chart with conditional color
    chart = alt.Chart(stats_df).mark_bar().encode(
        x=alt.X('item:N', title=x_label, sort='-y'),
        y=alt.Y('count:Q', title=y_label),
        color=alt.Color(
            'category:N',
            scale=alt.Scale(domain=['Top 3', 'Other'],
                            range=[ColorPalette.PRIMARY, ColorPalette.NEUTRAL]),
            legend=None,
        ),
        tooltip=[
            alt.Tooltip('item:N', title='Item'),
            alt.Tooltip('count:Q', title='1st Place Votes'),
        ],
    ).properties(
        title=title.split('\n'),
        width=width or getattr(self, 'plot_width', 1000),
        height=height or getattr(self, 'plot_height', 500),
    )

    return self._save_plot(chart, title)
```

**Verification:**
- [ ] Returns an Altair chart instead of `go.Figure`
- [ ] Counts rank 1 occurrences per column
- [ ] Adds `category` column for top 3 vs others
- [ ] Uses conditional color via `alt.Color()` with custom scale
- [ ] Tooltip shows item and count
- [ ] Sorted by count descending
- [ ] Legend hidden (color is self-explanatory)

---

### TASK 7: Migrate `plot_weighted_ranking_score`

**Location:** `/home/luigi/Documents/VoiceBranding/JPMC/Phase-3/plots.py` (currently lines ~615-662)

**Action:** Rewrite as a simple bar chart with a text overlay.
**Current behavior:**
- Input: DataFrame with `Character` and `Weighted Score` columns
- Output: Vertical bar chart with the score text inside the bars

**New implementation:**

```python
def plot_weighted_ranking_score(
    self,
    data: pl.LazyFrame | pl.DataFrame | None = None,
    title: str = "Weighted Popularity Score\n(1st=3pts, 2nd=2pts, 3rd=1pt)",
    x_label: str = "Character Personality",
    y_label: str = "Total Weighted Score",
    color: str = ColorPalette.PRIMARY,
    height: int | None = None,
    width: int | None = None,
) -> alt.Chart | alt.LayerChart | alt.VConcatChart:
    """Create a bar chart showing the weighted ranking score for each character."""
    weighted_df = self._ensure_dataframe(data).to_pandas()

    # Bar chart
    bars = alt.Chart(weighted_df).mark_bar(color=color).encode(
        x=alt.X('Character:N', title=x_label),
        y=alt.Y('Weighted Score:Q', title=y_label),
        tooltip=[
            alt.Tooltip('Character:N'),
            alt.Tooltip('Weighted Score:Q', title='Score'),
        ],
    )

    # Score text just inside the top of each bar. With dy=-5 the white text
    # would sit above the bar, invisible on the white background.
    text = bars.mark_text(
        baseline='top',
        dy=5,
        color='white',
        fontSize=11,
    ).encode(text='Weighted Score:Q')

    chart = (bars + text).properties(
        title=title.split('\n'),
        width=width or getattr(self, 'plot_width', 1000),
        height=height or getattr(self, 'plot_height', 500),
    )

    return self._save_plot(chart, title)
```

**Verification:**
- [ ] Returns an Altair chart instead of `go.Figure`
- [ ] Uses input columns as-is (`Character`, `Weighted Score`)
- [ ] White text overlay placed inside the bars
- [ ] Tooltip shows character and score

---

### TASK 8: Migrate `plot_top3_ranking_distribution`

**Location:** `/home/luigi/Documents/VoiceBranding/JPMC/Phase-3/plots.py` (currently lines ~315-415)

**Action:** Rewrite as a stacked bar chart (3 ranks).
**Current behavior:**
- Input: Wide DataFrame with ranking columns (values 1, 2, 3)
- Output: Stacked bar chart with 3 layers (Rank 1, 2, 3) and a horizontal legend

**New implementation:**

```python
def plot_top3_ranking_distribution(
    self,
    data: pl.LazyFrame | pl.DataFrame | None = None,
    title: str = "Top 3 Rankings Distribution\nCount of 1st, 2nd, and 3rd Place Votes per Voice",
    x_label: str = "Voices",
    y_label: str = "Number of Mentions in Top 3",
    height: int | None = None,
    width: int | None = None,
) -> alt.Chart | alt.VConcatChart:
    """Create a stacked bar chart showing how often each voice was ranked 1st, 2nd, or 3rd."""
    df = self._ensure_dataframe(data)

    # Calculate stats per column
    stats = []
    for col in [c for c in df.columns if c != '_recordId']:
        rank1 = df.filter(pl.col(col) == 1).height
        rank2 = df.filter(pl.col(col) == 2).height
        rank3 = df.filter(pl.col(col) == 3).height
        total = rank1 + rank2 + rank3

        if total > 0:
            label = col.split('__')[-1] if '__' in col else col
            # Add 3 rows (one per rank)
            stats.append({'voice': label, 'rank': 'Rank 1 (1st Choice)', 'count': rank1, 'total': total})
            stats.append({'voice': label, 'rank': 'Rank 2 (2nd Choice)', 'count': rank2, 'total': total})
            stats.append({'voice': label, 'rank': 'Rank 3 (3rd Choice)', 'count': rank3, 'total': total})

    if not stats:
        return alt.Chart().mark_text(text="No data")

    # Long format: one row per voice-rank combination (x-axis sort orders by total)
    stats_df = pl.DataFrame(stats).to_pandas()

    # Create stacked bar chart
    chart = alt.Chart(stats_df).mark_bar().encode(
        # 'total' repeats on each of a voice's rows, so op='max' recovers it
        x=alt.X('voice:N', title=x_label,
                sort=alt.EncodingSortField(field='total', op='max', order='descending')),
        y=alt.Y('count:Q', title=y_label, stack='zero'),
        color=alt.Color(
            'rank:N',
            scale=alt.Scale(
                domain=['Rank 1 (1st Choice)', 'Rank 2 (2nd Choice)', 'Rank 3 (3rd Choice)'],
                range=[ColorPalette.RANK_1, ColorPalette.RANK_2, ColorPalette.RANK_3],
            ),
            legend=alt.Legend(orient='top', direction='horizontal', title=None),
        ),
        tooltip=[
            alt.Tooltip('voice:N', title='Voice'),
            alt.Tooltip('rank:N', title='Rank'),
            alt.Tooltip('count:Q', title='Count'),
        ],
    ).properties(
        title=title.split('\n'),
        width=width or getattr(self, 'plot_width', 1000),
        height=height or getattr(self, 'plot_height', 500),
    )

    return self._save_plot(chart, title)
```

**Verification:**
- [ ] Returns an Altair chart instead of `go.Figure`
- [ ] Data converted to long format (one row per voice-rank combo)
- [ ] Stacked with `stack='zero'` in y encoding
- [ ] Custom color scale for 3 ranks
- [ ] Sorted by each voice's total across all ranks
- [ ] Horizontal legend at top
- [ ] Tooltip shows voice, rank, count
- [ ] Empty data handled (returns text mark)

---

### TASK 9: Migrate `plot_ranking_distribution`

**Location:** `/home/luigi/Documents/VoiceBranding/JPMC/Phase-3/plots.py` (currently lines ~417-536)

**Action:** Rewrite as a stacked bar chart (4 ranks). This is very similar to Task 8.

**Current behavior:**
- Input: Wide DataFrame with ranking columns (values 1, 2, 3, 4)
- Output: Stacked bar chart with 4 layers

**New implementation:**

```python
def plot_ranking_distribution(
    self,
    data: pl.LazyFrame | pl.DataFrame | None = None,
    title: str = "Rankings Distribution\n(1st to 4th Place)",
    x_label: str = "Item",
    y_label: str = "Number of Votes",
    height: int | None = None,
    width: int | None = None,
) -> alt.Chart | alt.VConcatChart:
    """Create a stacked bar chart showing the distribution of rankings (1st to 4th)."""
    df = self._ensure_dataframe(data)

    stats = []
    ranking_cols = [c for c in df.columns if c != '_recordId']
    for col in ranking_cols:
        r1 = df.filter(pl.col(col) == 1).height
        r2 = df.filter(pl.col(col) == 2).height
        r3 = df.filter(pl.col(col) == 3).height
        r4 = df.filter(pl.col(col) == 4).height
        total = r1 + r2 + r3 + r4

        if total > 0:
            label = (
                col.replace('Character_Ranking_', '')
                .replace('Top_3_Voices_ranking__', '')
                .replace('_', ' ')
                .strip()
            )
            # 'rank1' is repeated on every row so the x-axis can sort by it
            stats.append({'item': label, 'rank': 'Rank 1 (Best)', 'count': r1, 'rank1': r1})
            stats.append({'item': label, 'rank': 'Rank 2', 'count': r2, 'rank1': r1})
            stats.append({'item': label, 'rank': 'Rank 3', 'count': r3, 'rank1': r1})
            stats.append({'item': label, 'rank': 'Rank 4 (Worst)', 'count': r4, 'rank1': r1})

    if not stats:
        return alt.Chart().mark_text(text="No data")

    stats_df = pl.DataFrame(stats).to_pandas()

    chart = alt.Chart(stats_df).mark_bar().encode(
        x=alt.X('item:N', title=x_label,
                sort=alt.EncodingSortField(field='rank1', op='max', order='descending')),
        y=alt.Y('count:Q', title=y_label, stack='zero'),
        color=alt.Color(
            'rank:N',
            scale=alt.Scale(
                domain=['Rank 1 (Best)', 'Rank 2', 'Rank 3', 'Rank 4 (Worst)'],
                range=[ColorPalette.RANK_1, ColorPalette.RANK_2,
                       ColorPalette.RANK_3, ColorPalette.RANK_4],
            ),
            legend=alt.Legend(orient='top', direction='horizontal', title=None),
        ),
        tooltip=[
            alt.Tooltip('item:N', title='Item'),
            alt.Tooltip('rank:N', title='Rank'),
            alt.Tooltip('count:Q', title='Count'),
        ],
    ).properties(
        title=title.split('\n'),
        width=width or getattr(self, 'plot_width', 1000),
        height=height or getattr(self, 'plot_height', 500),
    )

    return self._save_plot(chart, title)
```

**Verification:**
- [ ] Returns an Altair chart instead of `go.Figure`
- [ ] 4 ranks supported
- [ ] Sorted by Rank 1 count (via the added `rank1` field)
- [ ] Custom color scale for 4 ranks
- [ ] Empty data handled (returns text mark)

---

### TASK 10: Migrate `plot_voice_selection_counts`

**Location:** `/home/luigi/Documents/VoiceBranding/JPMC/Phase-3/plots.py` (currently lines ~664-737)

**Action:** Rewrite with a Polars data transformation plus conditional coloring.
**Current behavior:**
- Input: DataFrame with a comma-separated string column (`8_Combined`)
- Process: Split strings, explode, count occurrences
- Output: Bar chart; the top 8 bars in PRIMARY, the rest in NEUTRAL

**New implementation:**

```python
def plot_voice_selection_counts(
    self,
    data: pl.LazyFrame | pl.DataFrame | None = None,
    target_column: str = "8_Combined",
    title: str = "Most Frequently Chosen Voices\n(Top 8 Highlighted)",
    x_label: str = "Voice",
    y_label: str = "Number of Times Chosen",
    height: int | None = None,
    width: int | None = None,
) -> alt.Chart | alt.VConcatChart:
    """Create a bar plot showing the frequency of voice selections."""
    df = self._ensure_dataframe(data)

    if target_column not in df.columns:
        return alt.Chart().mark_text(text=f"Column '{target_column}' not found")

    # Process data: split, explode, count
    stats_df = (
        df.select(pl.col(target_column))
        .drop_nulls()
        .with_columns(pl.col(target_column).str.split(","))
        .explode(target_column)
        .with_columns(pl.col(target_column).str.strip_chars())
        .filter(pl.col(target_column) != "")
        .group_by(target_column)
        .agg(pl.len().alias("count"))
        .sort("count", descending=True)
        .with_row_index('rank_index')
        .with_columns(
            pl.when(pl.col('rank_index') < 8)
            .then(pl.lit('Top 8'))
            .otherwise(pl.lit('Other'))
            .alias('category')
        )
        .to_pandas()
    )

    if stats_df.empty:
        return alt.Chart().mark_text(text="No data")

    chart = alt.Chart(stats_df).mark_bar().encode(
        x=alt.X(f'{target_column}:N', title=x_label, sort='-y'),
        y=alt.Y('count:Q', title=y_label),
        color=alt.Color(
            'category:N',
            scale=alt.Scale(domain=['Top 8', 'Other'],
                            range=[ColorPalette.PRIMARY, ColorPalette.NEUTRAL]),
            legend=None,
        ),
        tooltip=[
            alt.Tooltip(f'{target_column}:N', title='Voice'),
            alt.Tooltip('count:Q', title='Selections'),
        ],
    ).properties(
        title=title.split('\n'),
        width=width or getattr(self, 'plot_width', 1000),
        height=height or getattr(self, 'plot_height', 500),
    )

    return self._save_plot(chart, title)
```

**Verification:**
- [ ] Returns an Altair chart instead of `go.Figure`
- [ ] Polars chain: split → explode → strip → group → count
- [ ] Top 8 categorization logic correct
- [ ] Conditional coloring applied
- [ ] Sorted by count descending
- [ ] Empty result handled (returns text mark)

---

### TASK 11: Migrate `plot_top3_selection_counts`

**Location:** `/home/luigi/Documents/VoiceBranding/JPMC/Phase-3/plots.py` (currently lines ~739-808)

**Action:** Identical to Task 10, except that the default column is `3_Ranked` and the top 3 are highlighted.

**New implementation:**

```python
def plot_top3_selection_counts(
    self,
    data: pl.LazyFrame | pl.DataFrame | None = None,
    target_column: str = "3_Ranked",
    title: str = "Most Frequently Chosen Top 3 Voices\n(Top 3 Highlighted)",
    x_label: str = "Voice",
    y_label: str = "Count of Mentions in Top 3",
    height: int | None = None,
    width: int | None = None,
) -> alt.Chart | alt.VConcatChart:
    """Question: Which 3 voices are chosen the most out of 18?"""
    df = self._ensure_dataframe(data)

    if target_column not in df.columns:
        return alt.Chart().mark_text(text=f"Column '{target_column}' not found")

    stats_df = (
        df.select(pl.col(target_column))
        .drop_nulls()
        .with_columns(pl.col(target_column).str.split(","))
        .explode(target_column)
        .with_columns(pl.col(target_column).str.strip_chars())
        .filter(pl.col(target_column) != "")
        .group_by(target_column)
        .agg(pl.len().alias("count"))
        .sort("count", descending=True)
        .with_row_index('rank_index')
        .with_columns(
            pl.when(pl.col('rank_index') < 3)
            .then(pl.lit('Top 3'))
            .otherwise(pl.lit('Other'))
            .alias('category')
        )
        .to_pandas()
    )

    if stats_df.empty:
        return alt.Chart().mark_text(text="No data")

    chart = alt.Chart(stats_df).mark_bar().encode(
        x=alt.X(f'{target_column}:N', title=x_label, sort='-y'),
        y=alt.Y('count:Q', title=y_label),
        color=alt.Color(
            'category:N',
            scale=alt.Scale(domain=['Top 3', 'Other'],
                            range=[ColorPalette.PRIMARY, ColorPalette.NEUTRAL]),
            legend=None,
        ),
        tooltip=[
            alt.Tooltip(f'{target_column}:N', title='Voice'),
            alt.Tooltip('count:Q', title='In Top 3'),
        ],
    ).properties(
        title=title.split('\n'),
        width=width or getattr(self, 'plot_width', 1000),
        height=height or getattr(self, 'plot_height', 500),
    )

    return self._save_plot(chart, title)
```

**Verification:**
- [ ] Default `target_column` is `"3_Ranked"`
- [ ] Top 3 categorization (not top 8)
- [ ] Otherwise identical to Task 10

---

### TASK 12: Migrate `plot_speaking_style_trait_scores`

**Location:** `/home/luigi/Documents/VoiceBranding/JPMC/Phase-3/plots.py` (currently lines ~810-926)

**Action:** Rewrite as a horizontal bar chart with text annotations.

**Current behavior:**
- Input: DataFrame with `Voice`, `score`, `Left_Anchor`, `Right_Anchor` columns
- Output: Horizontal bar chart with anchor labels at the bottom

**New implementation:**

```python
def plot_speaking_style_trait_scores(
    self,
    data: pl.LazyFrame | pl.DataFrame | None = None,
    trait_description: str | None = None,
    left_anchor: str | None = None,
    right_anchor: str | None = None,
    title: str = "Speaking Style Trait Analysis",
    height: int | None = None,
    width: int | None = None,
) -> alt.Chart | alt.LayerChart | alt.VConcatChart:
    """Plot scores for a single speaking style trait across multiple voices."""
    df = self._ensure_dataframe(data)

    if df.is_empty():
        return alt.Chart().mark_text(text="No data")

    required_cols = ["Voice", "score"]
    if not all(col in df.columns for col in required_cols):
        return alt.Chart().mark_text(text="Missing required columns")

    # Calculate stats: mean and count per voice
    # (row order is irrelevant: the y-axis sort below sets the display order)
    stats = (
        df.filter(pl.col("score").is_not_null())
        .group_by("Voice")
        .agg([
            pl.col("score").mean().alias("mean_score"),
            pl.col("score").count().alias("count"),
        ])
        .to_pandas()
    )

    # Extract anchors from data if not provided
    if (left_anchor is None or right_anchor is None) and "Left_Anchor" in df.columns:
        head = df.filter(pl.col("Left_Anchor").is_not_null()).head(1)
        if not head.is_empty():
            if left_anchor is None:
                left_anchor = head["Left_Anchor"][0]
            if right_anchor is None:
                right_anchor = head["Right_Anchor"][0]

    if trait_description is None:
        if left_anchor and right_anchor:
            trait_description = f"{left_anchor.split('|')[0]} vs. {right_anchor.split('|')[0]}"
        elif "Description" in df.columns:
            head = df.filter(pl.col("Description").is_not_null()).head(1)
            trait_description = head["Description"][0] if not head.is_empty() else ""
        else:
            trait_description = ""

    # Horizontal bar chart; sort='-x' puts the highest mean at the top
    bars = alt.Chart(stats).mark_bar(color=ColorPalette.PRIMARY).encode(
        x=alt.X('mean_score:Q', title='Average Score (1-5)', scale=alt.Scale(domain=[1, 5])),
        y=alt.Y('Voice:N', title='Voice', sort='-x'),
        tooltip=[
            alt.Tooltip('Voice:N'),
            alt.Tooltip('mean_score:Q', title='Average', format='.2f'),
            alt.Tooltip('count:Q', title='Count'),
        ],
    )

    # Count text just inside the right end of each bar
    text = bars.mark_text(
        align='right',
        baseline='middle',
        dx=-5,
        color='white',
        fontSize=16,
    ).encode(text='count:Q')

    # Combine
    chart = (bars + text).properties(
        title={
            "text": title.split('\n'),
            "subtitle": [trait_description, "(Numbers on bars indicate respondent count)"],
        },
        width=width or getattr(self, 'plot_width', 1000),
        height=height or getattr(self, 'plot_height', 500),
    )

    # Note: anchor annotations at the bottom would need separate text marks
    # positioned at fixed coordinates - can add if needed

    return self._save_plot(chart, title)
```

**Verification:**
- [ ] Returns an Altair chart instead of `go.Figure`
- [ ] Horizontal orientation (x=score, y=voice)
- [ ] X-axis domain set to [1, 5]
- [ ] Count text displayed inside the bar ends (white, large font)
- [ ] Title includes subtitle with trait description
- [ ] Sorted by mean score, highest at the top (`sort='-x'`)
- [ ] Anchor label annotations (optional - noted in code)

---

### TASK 13: Migrate `plot_speaking_style_correlation`

**Location:** `/home/luigi/Documents/VoiceBranding/JPMC/Phase-3/plots.py` (currently lines ~928-1018)

**Action:** Rewrite with red/green conditional coloring based on the sign of the correlation.
**Current behavior:** - Input: DataFrame with correlation data - Process: Calculate Pearson correlation per trait - Output: Bar chart, positive correlations green, negative red **New implementation:** ```python def plot_speaking_style_correlation( self, style_color: str, style_traits: list[str], data: pl.LazyFrame | pl.DataFrame | None = None, title: str | None = None, ) -> alt.Chart: """Plots correlation between Speaking Style Trait Scores (1-5) and Voice Scale (1-10).""" df = self._ensure_dataframe(data) if title is None: title = f"Speaking style and voice scale 1-10 correlations" trait_correlations = [] # Calculate correlations for i, trait in enumerate(style_traits): subset = df.filter(pl.col("Right_Anchor") == trait) valid_data = subset.select(["score", "Voice_Scale_Score"]).drop_nulls() if valid_data.height > 1: corr_val = valid_data.select(pl.corr("score", "Voice_Scale_Score")).item() # Handle trait text - wrap at '|' for display trait_display = trait.replace('|', '\n') trait_correlations.append({ "trait_display": trait_display, "trait_index": f"Trait {i+1}", "correlation": corr_val if corr_val is not None else 0.0 }) if not trait_correlations: return alt.Chart().mark_text(text=f"No data for {style_color} Style") plot_df = pl.DataFrame(trait_correlations).to_pandas() # Conditional color based on sign chart = alt.Chart(plot_df).mark_bar().encode( x=alt.X('trait_display:N', title=None, axis=alt.Axis(labelAngle=0)), y=alt.Y('correlation:Q', title='Correlation', scale=alt.Scale(domain=[-1, 1])), color=alt.condition( alt.datum.correlation >= 0, alt.value('green'), alt.value('red') ), tooltip=[ alt.Tooltip('trait_display:N', title='Trait'), alt.Tooltip('correlation:Q', format='.2f') ] ).properties( title=title, width=1000, height=400 ) chart = self._save_plot(chart, title) return chart ``` **Verification:** - [ ] Returns `alt.Chart` - [ ] Pearson correlation calculated via `pl.corr()` - [ ] Conditional coloring: green if positive, red if negative - [ ] Y-axis 
domain [-1, 1] - [ ] Trait text wrapped at '|' for display - [ ] Tooltip shows trait and correlation value --- ### TASK 14: Migrate `plot_speaking_style_ranking_correlation` **Location:** `/home/luigi/Documents/VoiceBranding/JPMC/Phase-3/plots.py` (currently lines ~1020-1105) **Action:** Almost identical to Task 13, but correlates with `Ranking_Points` instead of `Voice_Scale_Score`. **New implementation:** ```python def plot_speaking_style_ranking_correlation( self, style_color: str, style_traits: list[str], data: pl.LazyFrame | pl.DataFrame | None = None, title: str | None = None, ) -> alt.Chart: """Plots correlation between Speaking Style Trait Scores (1-5) and Voice Ranking Points (0-3).""" df = self._ensure_dataframe(data) if title is None: title = f"Speaking style {style_color} and voice ranking points correlations" trait_correlations = [] for i, trait in enumerate(style_traits): subset = df.filter(pl.col("Right_Anchor") == trait) valid_data = subset.select(["score", "Ranking_Points"]).drop_nulls() if valid_data.height > 1: corr_val = valid_data.select(pl.corr("score", "Ranking_Points")).item() trait_display = trait.replace('|', '\n') trait_correlations.append({ "trait_display": trait_display, "trait_index": f"Trait {i+1}", "correlation": corr_val if corr_val is not None else 0.0 }) if not trait_correlations: return alt.Chart().mark_text(text=f"No data for {style_color} Style") plot_df = pl.DataFrame(trait_correlations).to_pandas() chart = alt.Chart(plot_df).mark_bar().encode( x=alt.X('trait_display:N', title=None, axis=alt.Axis(labelAngle=0)), y=alt.Y('correlation:Q', title='Correlation', scale=alt.Scale(domain=[-1, 1])), color=alt.condition( alt.datum.correlation >= 0, alt.value('green'), alt.value('red') ), tooltip=[ alt.Tooltip('trait_display:N', title='Trait'), alt.Tooltip('correlation:Q', format='.2f') ] ).properties( title=title, width=1000, height=400 ) chart = self._save_plot(chart, title) return chart ``` **Verification:** - [ ] Returns `alt.Chart` 
- [ ] Uses `Ranking_Points` column instead of `Voice_Scale_Score`
- [ ] Otherwise identical to Task 13

---

### TASK 15: Install vl-convert-python

**Action:** Add vl-convert-python to the project dependencies for PNG export.

**Command:**
```bash
cd /home/luigi/Documents/VoiceBranding/JPMC/Phase-3
uv add vl-convert-python
```

**Verification:**
- [ ] `vl-convert-python` appears in `pyproject.toml` dependencies
- [ ] Installation successful (no errors)

---

### TASK 16: Remove Plotly dependencies (optional cleanup)

**Action:** Remove unused Plotly packages.

**Command:**
```bash
cd /home/luigi/Documents/VoiceBranding/JPMC/Phase-3
uv remove plotly kaleido
```

**Verification:**
- [ ] `plotly` and `kaleido` removed from `pyproject.toml`
- [ ] No other code depends on Plotly (grep check, e.g. `grep -rn "plotly" --include="*.py" --exclude-dir=.venv .` returns nothing)

---

### TASK 17: Test all plot methods in Marimo notebook

**Action:** Create a test notebook to verify that all 10 plotting methods work correctly.

**Test checklist per plot:**
- [ ] Chart renders without errors
- [ ] Chart has correct dimensions (width/height)
- [ ] Colors match ColorPalette constants
- [ ] Data is displayed correctly (bars, stacks, etc.)
- [ ] Text overlays render (counts, scores)
- [ ] Tooltips show correct information
- [ ] Filter annotation appears below chart (if filters active)
- [ ] PNG export works (check `figures/` directory)
- [ ] No overlap between chart elements and filter text

**Create test file:** `/home/luigi/Documents/VoiceBranding/JPMC/Phase-3/test_altair_migration.py`

**Test template:**
```python
import marimo as mo
import polars as pl
from utils import JPMCSurvey

# Load sample data
survey = JPMCSurvey()
survey.load_data('path/to/data')
survey.fig_save_dir = 'figures/altair_test'

# Test each plot method.
# Marimo only displays the last expression of a cell, so put each
# chartN expression at the end of its own cell.
mo.md("## Testing Altair Migration")

# 1. Test plot_average_scores_with_counts
chart1 = survey.plot_average_scores_with_counts(...)
chart1

# 2. Test plot_most_ranked_1
chart2 = survey.plot_most_ranked_1(...)
chart2

# ... repeat for all 10 methods
```

---

## Final Verification Checklist

After completing all tasks, verify the following:

### Code Quality
- [ ] No Plotly imports remain in `plots.py`
- [ ] All methods return `alt.Chart` instead of `go.Figure`
- [ ] No syntax errors (`python -m py_compile plots.py`)
- [ ] Type hints updated (if any reference `go.Figure`)
- [ ] Docstrings updated (if any mention Plotly)

### Theme & Styling
- [ ] `jpmc_altair_theme()` function exists in `theme.py`
- [ ] Theme is registered and enabled
- [ ] All charts use ColorPalette constants
- [ ] Chart dimensions match original (width=1000, height=500 defaults)
- [ ] Font sizes match original (11pt for labels, 14pt for titles)

### Data Handling
- [ ] All methods handle empty data gracefully
- [ ] Wide-to-long transformations correct (stacked bars, selection counts)
- [ ] Sorting preserved (by average, count, rank1, etc.)
- [ ] Column filtering works (`_recordId` excluded)
- [ ] String processing works (comma-split, strip, explode)

### Visual Features
- [ ] Bar charts render correctly (vertical and horizontal)
- [ ] Stacked bars have correct layer order
- [ ] Text overlays positioned correctly (inside/outside bars)
- [ ] Conditional coloring works (top N highlighting, red/green by sign)
- [ ] Tooltips show correct fields with proper formatting
- [ ] Legends positioned correctly (top horizontal for stacked bars)
- [ ] X-axis labels rotated at -45° by default
- [ ] Grid lines visible

### Filter System
- [ ] `_get_filter_slug()` unchanged (still works)
- [ ] `_get_filter_description()` unchanged (still works)
- [ ] `_add_filter_footnote()` uses `vconcat` approach
- [ ] Filter text appears at bottom of combined chart
- [ ] No overlap between chart and filter text
- [ ] Filter text is left-aligned
- [ ] HTML tags stripped from filter text (Vega-Lite text marks render plain text, not HTML)
- [ ] Filter subdirectories created correctly

### PNG Export
- [ ] `vl-convert-python` installed
- [ ] `chart.save()` method works
- [ ] PNG files created in correct subdirectories
- [ ] PNG files have correct filenames (sanitized titles)
- [ ] Image quality acceptable (scale_factor=2.0)

### Marimo Integration
- [ ] Charts render in Marimo notebooks
- [ ] Charts are reactive (update when data changes; wrap in `mo.ui.altair_chart(chart)` where chart selections must feed back into Python)
- [ ] No JavaScript console errors
- [ ] Interactive features work (tooltips, pan, zoom if enabled)

### All 10 Plot Methods
1. [ ] `plot_average_scores_with_counts` - vertical bar + text
2. [ ] `plot_top3_ranking_distribution` - stacked bar (3 ranks)
3. [ ] `plot_ranking_distribution` - stacked bar (4 ranks)
4. [ ] `plot_most_ranked_1` - vertical bar + conditional color
5. [ ] `plot_weighted_ranking_score` - vertical bar + text
6. [ ] `plot_voice_selection_counts` - vertical bar + conditional color
7. [ ] `plot_top3_selection_counts` - vertical bar + conditional color
8. [ ] `plot_speaking_style_trait_scores` - horizontal bar + text
9. [ ] `plot_speaking_style_correlation` - vertical bar + red/green
10. [ ] `plot_speaking_style_ranking_correlation` - vertical bar + red/green

### Edge Cases
- [ ] Empty DataFrame handled gracefully
- [ ] Missing columns detected and reported
- [ ] Zero counts/values don't break charts
- [ ] Single data point renders correctly
- [ ] Very long labels don't cause layout issues
- [ ] Many categories don't cause overcrowding

### Regression Testing
- [ ] Existing Marimo notebooks still work
- [ ] Data filtering still works (`filter_data()`)
- [ ] `JPMCSurvey` class initialization unchanged
- [ ] No breaking changes to public API

### Documentation
- [ ] This migration plan marked as "Complete"
- [ ] Any new dependencies documented
- [ ] Any breaking changes documented
- [ ] Example usage updated (if applicable)

---

## Troubleshooting

### Issue: Charts don't render
- Check Altair version: `python -c "import altair; print(altair.__version__)"`
- Check vl-convert: `python -c "import vl_convert; print(vl_convert.__version__)"`
- Check for JavaScript errors in the browser console

### Issue: PNG export fails
- Verify vl-convert-python is installed: `uv pip show vl-convert-python`
- Check write permissions on the `figures/` directory
- Try saving as HTML first: `chart.save('test.html')` (HTML output does not depend on vl-convert, so this isolates the PNG step)

### Issue: Colors don't match theme
- Verify the theme is enabled: `print(alt.theme.active)` (Altair 5.5+ API; older code used the now-deprecated `alt.themes`)
- Check color scale definitions in each plot method
- Ensure `ColorPalette` is imported correctly

### Issue: Filter text overlaps chart
- Increase the `spacing` parameter in `alt.vconcat(chart, footer, spacing=20)`
- Increase the footer chart's `height` property
- Check that the footer chart is actually created (debug with `print()`)

### Issue: Data not displaying
- Check DataFrame format (wide vs long)
- Verify column names match encoding specs
- Check for null values (`.drop_nulls()`)
- Print intermediate DataFrames for debugging

---

## Notes

- **Backup:** Before starting, create a backup of `plots.py`: `cp plots.py plots.py.plotly_backup`
- **Incremental testing:** Test each plot method immediately after migrating it
- **Marimo restart:** You may need to restart the Marimo kernel after major changes
- **Performance:** Altair embeds the data in the chart spec and, by default, raises `MaxRowsError` above 5,000 rows; aggregate in Polars before plotting, use `.sample()`, or call `alt.data_transformers.disable_max_rows()` if needed

---

## Completion Status

- [ ] All tasks (1-17) completed
- [ ] All verification checks passed
- [ ] Existing notebooks tested and working
- [ ] Migration documented
- [ ] Ready for production use

**Migration completed on:** _________________

**Tested by:** _________________

**Sign-off:** _________________