# Altair Migration Plan: Plotly → Altair for JPMCPlotsMixin

**Date:** January 28, 2026
**Status:** Not Started
**Objective:** Migrate all plotting methods from Plotly to Altair to solve filter annotation overlap issues and ensure proper Marimo reactivity.


---

## Background

### Problem

The current Plotly implementation has a critical layout issue: filter annotations overlap long, rotated x-axis labels because Plotly doesn't support true bounding boxes, so elements overflow their assigned subplot areas.

### Why Altair?

1. **Better layout control** - Vega-Lite (Altair's backend) calculates the space every element needs before laying out the chart
2. **Marimo reactivity** - The Marimo documentation states that reactive plots require Altair or Plotly; Altair is preferred
3. **Clean separation** - `vconcat()` creates true vertical stacking without overflow
4. **Already installed** - Altair >=6.0.0 is already a dependency (currently unused)


---

## Current System Analysis

### File Structure

- **`plots.py`** - Contains `JPMCPlotsMixin` class with 10 plotting methods
- **`theme.py`** - Contains `ColorPalette` class with all styling constants
- **`utils.py`** - Contains `JPMCSurvey` class that mixes in `JPMCPlotsMixin`

### Color Palette (from theme.py)

```python
class ColorPalette:
    PRIMARY = "#0077B6"  # Medium Blue
    RANK_1 = "#004C6D"  # Dark Blue
    RANK_2 = "#008493"  # Teal
    RANK_3 = "#5AAE95"  # Sea Green
    RANK_4 = "#9E9E9E"  # Grey
    NEUTRAL = "#D3D3D3"  # Light Grey
    TEXT = "black"
    GRID = "lightgray"
    BACKGROUND = "white"
```

### Current Plot Methods Inventory

| Method Name | Chart Type | Input Format | Special Features |
|-------------|-----------|--------------|------------------|
| `plot_average_scores_with_counts` | Vertical Bar | Wide DF (score columns) | Text inside bars (count) |
| `plot_top3_ranking_distribution` | Stacked Vertical Bar | Wide DF (rank values 1-3) | 3-layer stack, legend |
| `plot_ranking_distribution` | Stacked Vertical Bar | Wide DF (rank values 1-4) | 4-layer stack, legend |
| `plot_most_ranked_1` | Vertical Bar | Wide DF (ranking columns) | Top 3 highlighted |
| `plot_weighted_ranking_score` | Vertical Bar | `Character`, `Weighted Score` | Text inside bars |
| `plot_voice_selection_counts` | Vertical Bar | Comma-separated string col | Explode strings, Top 8 highlight |
| `plot_top3_selection_counts` | Vertical Bar | Comma-separated string col | Explode strings, Top 3 highlight |
| `plot_speaking_style_trait_scores` | Horizontal Bar | `Voice`, `score`, anchors | Text annotations at bottom |
| `plot_speaking_style_correlation` | Vertical Bar | Correlation data | Red/Green conditional |
| `plot_speaking_style_ranking_correlation` | Vertical Bar | Correlation data | Red/Green conditional |

### Filter System Components

1. **`_get_filter_slug()`** - Generates directory name from filters (e.g., `Age-22to24_Gen-Man`)
2. **`_get_filter_description()`** - Generates HTML text (e.g., `Filters: <b>Age:</b> 22-24<br><b>Gender:</b> Man`)
3. **`_add_filter_footnote(fig)`** - Currently creates 2-row Plotly subplot, adds annotation
4. **`_save_plot(fig, title)`** - Adds footer, saves to `figures/{slug}/{filename}.png`

### Common Styling Pattern

All plots use:

- Height: 500px (default, can override)
- Width: 1000px (default, can override)
- Background: white
- Grid: light gray
- Font size: 11
- X-axis: 45° rotated labels
- Legends (where applicable): Horizontal, positioned above plot


---

## Prerequisites

### Dependencies to Add

```bash
uv add vl-convert-python  # For PNG export from Altair
```

### Dependencies Already Present

- `altair>=6.0.0` ✅
- `polars>=1.37.1` ✅
- `pandas>=2.3.3` ✅


---

## Migration Tasks

### TASK 1: Create Altair Theme in theme.py

**Location:** `/home/luigi/Documents/VoiceBranding/JPMC/Phase-3/theme.py`

**Action:** Add an Altair theme function and register it at the end of the file.

**Code to add:**

```python
def jpmc_altair_theme():
    """JPMC brand theme for Altair charts."""
    return {
        'config': {
            'view': {
                'continuousWidth': 1000,
                'continuousHeight': 500,
                'strokeWidth': 0
            },
            'background': ColorPalette.BACKGROUND,
            'axis': {
                'grid': True,
                'gridColor': ColorPalette.GRID,
                'labelAngle': -45,  # Default rotated labels
                'labelFontSize': 11,
                'titleFontSize': 12,
                'labelColor': ColorPalette.TEXT,
                'titleColor': ColorPalette.TEXT
            },
            'axisX': {
                'labelAngle': -45
            },
            'axisY': {
                'labelAngle': 0
            },
            'legend': {
                'orient': 'top',
                'direction': 'horizontal',
                'titleFontSize': 11,
                'labelFontSize': 11
            },
            'title': {
                'fontSize': 14,
                'color': ColorPalette.TEXT,
                'anchor': 'start'
            },
            'bar': {
                'color': ColorPalette.PRIMARY
            }
        }
    }


# Register theme (add at end of file)
try:
    import altair as alt

    # alt.themes.register/enable are deprecated since Altair 5.5;
    # alt.theme.register registers and enables the theme in one call.
    alt.theme.register('jpmc', enable=True)(jpmc_altair_theme)
except ImportError:
    pass  # Altair not installed
```

**Verification:**

- [ ] Function `jpmc_altair_theme()` exists
- [ ] Theme is registered as 'jpmc'
- [ ] Theme is enabled by default
- [ ] Import error is handled gracefully


---

### TASK 2: Update imports in plots.py

**Location:** `/home/luigi/Documents/VoiceBranding/JPMC/Phase-3/plots.py` (lines 1-8)

**Action:** Replace Plotly imports with Altair imports.

**Current code:**

```python
import plotly.graph_objects as go
from plotly.subplots import make_subplots
```

**Replace with:**

```python
import altair as alt
```

**Keep these imports:**

```python
import re
from pathlib import Path
import polars as pl
from theme import ColorPalette
```

**Verification:**

- [ ] `import altair as alt` present
- [ ] No Plotly imports remain
- [ ] All other imports unchanged


---

### TASK 3: Rewrite `_add_filter_footnote` for Altair

**Location:** `/home/luigi/Documents/VoiceBranding/JPMC/Phase-3/plots.py` (currently lines ~120-212)

**Action:** Replace the entire `_add_filter_footnote` method with an Altair version.

**New implementation:**

```python
def _add_filter_footnote(self, chart: alt.Chart) -> alt.Chart:
    """Add a footnote with active filters to the chart.

    Creates a vconcat with main chart on top and filter text chart below.
    Returns the combined chart (or original if no filters).
    """
    filter_text = self._get_filter_description()

    # Skip if no filters active - return original chart
    if not filter_text:
        return chart

    # Replace <br> with newlines first; stripping all tags first would remove them
    plain_text = re.sub(r'<br\s*/?>', '\n', filter_text)
    # Remove remaining HTML tags (Altair doesn't support HTML in mark_text)
    plain_text = re.sub(r'<[^>]+>', '', plain_text)

    # Create a text-only chart for the footer
    # Use a dummy dataframe with one row
    import pandas as pd
    footer_df = pd.DataFrame([{'text': plain_text}])

    # chart.width may be unset (alt.Undefined) rather than a number
    chart_width = getattr(chart, 'width', None)
    footer_width = chart_width if isinstance(chart_width, (int, float)) else 1000

    footer_chart = alt.Chart(footer_df).mark_text(
        align='left',
        baseline='top',
        fontSize=9,
        color='gray',
        lineBreak='\n',  # Break the text at each newline
        dx=5,  # Small left padding
        dy=5   # Small top padding
    ).encode(
        x=alt.value(0),  # Pin text to the top-left corner of the footer
        y=alt.value(0),
        text='text:N'
    ).properties(
        height=60,  # Fixed height for footer
        width=footer_width
    )

    # Combine with vconcat
    combined = alt.vconcat(chart, footer_chart, spacing=10)

    return combined
```

**Verification:**

- [ ] Method signature changed from `fig: go.Figure` to `chart: alt.Chart`
- [ ] Returns an Altair chart instead of `go.Figure`
- [ ] Uses `vconcat` for vertical stacking
- [ ] `<br>` is converted to newlines and remaining HTML tags are stripped from filter text
- [ ] Footer has fixed height
- [ ] Spacing between chart and footer is set

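
When the HTML filter description is flattened to plain text, the order of the two substitutions matters: if every tag is stripped first, the `<br>` markers are already gone and the line breaks are lost. A minimal standard-library sketch of that conversion, assuming the description format shown under Filter System Components; `html_to_plain_text` is a hypothetical helper name, not an existing function in `plots.py`:

```python
import re


def html_to_plain_text(filter_html: str) -> str:
    """Convert an HTML filter description into newline-separated plain text."""
    # Turn <br>, <br/> and <br /> into newlines before any other tag is removed.
    text = re.sub(r"<br\s*/?>", "\n", filter_html, flags=re.IGNORECASE)
    # Drop every remaining tag and keep only the text between them.
    return re.sub(r"<[^>]+>", "", text)


example = "Filters: <b>Age:</b> 22-24<br><b>Gender:</b> Man"
print(html_to_plain_text(example))
```

Each filter ends up on its own line, which is the shape the footer text mark needs.
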

---

### TASK 4: Rewrite `_save_plot` for Altair

**Location:** `/home/luigi/Documents/VoiceBranding/JPMC/Phase-3/plots.py` (currently lines ~214-234)

**Action:** Replace the `_save_plot` method so that it saves Altair charts.

**New implementation:**

```python
def _save_plot(self, chart: alt.Chart, title: str) -> alt.Chart:
    """Save chart to PNG file if fig_save_dir is set.

    Returns the (potentially modified) chart with filter footnote added.
    """
    # Add filter footnote - returns combined chart if filters active
    chart = self._add_filter_footnote(chart)

    if hasattr(self, 'fig_save_dir') and self.fig_save_dir:
        path = Path(self.fig_save_dir)

        # Add filter slug subfolder
        filter_slug = self._get_filter_slug()
        path = path / filter_slug

        # exist_ok=True makes a separate path.exists() check unnecessary
        path.mkdir(parents=True, exist_ok=True)

        filename = f"{self._sanitize_filename(title)}.png"

        # Save using vl-convert backend
        chart.save(str(path / filename), format='png', scale_factor=2.0)

    return chart
```

**Verification:**

- [ ] Method signature changed from `fig: go.Figure` to `chart: alt.Chart`
- [ ] Uses `chart.save()` instead of `fig.write_image()`
- [ ] PNG format specified
- [ ] Path handling unchanged (filter slug subdirectories)
- [ ] Returns modified chart

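
The save location follows the `figures/{slug}/{filename}.png` layout. The sketch below shows how such a path could be assembled without touching the filesystem. Both function names are hypothetical, and `sanitize_filename` is only an assumed stand-in: the real `_sanitize_filename` is not shown in this plan and may behave differently.

```python
import re
from pathlib import PurePosixPath


def sanitize_filename(title: str) -> str:
    """Assumed behavior: whitespace becomes '_', other unsafe characters are dropped."""
    collapsed = re.sub(r"\s+", "_", title.strip())
    return re.sub(r"[^A-Za-z0-9_-]", "", collapsed)


def build_plot_path(save_dir: str, filter_slug: str, title: str) -> PurePosixPath:
    """Assemble {save_dir}/{filter_slug}/{sanitized title}.png."""
    return PurePosixPath(save_dir) / filter_slug / f"{sanitize_filename(title)}.png"


path = build_plot_path(
    "figures",
    "Age-22to24_Gen-Man",
    "Most Popular Choice\n(Number of Times Ranked 1st)",
)
print(path)
```

Because plot titles contain newlines and parentheses, some form of sanitizing is needed before the title can serve as a filename.
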

---

### TASK 5: Migrate `plot_average_scores_with_counts`

**Location:** `/home/luigi/Documents/VoiceBranding/JPMC/Phase-3/plots.py` (currently lines ~248-313)

**Action:** Rewrite using Altair bar chart with text overlay.

**Current behavior:**

- Input: Wide DataFrame with score columns
- Output: Vertical bar chart with average scores, count text inside bars

**New implementation:**

```python
def plot_average_scores_with_counts(
    self,
    data: pl.LazyFrame | pl.DataFrame | None = None,
    title: str = "General Impression (1-10)\nPer Voice with Number of Participants Who Rated It",
    x_label: str = "Stimuli",
    y_label: str = "Average General Impression Rating (1-10)",
    color: str = ColorPalette.PRIMARY,
    height: int | None = None,
    width: int | None = None,
) -> alt.Chart:
    """Create a bar plot showing average scores and count of non-null values for each column."""
    df = self._ensure_dataframe(data)

    # Calculate stats for each column (exclude _recordId)
    stats = []
    for col in [c for c in df.columns if c != '_recordId']:
        avg_score = df[col].mean()
        non_null_count = df[col].drop_nulls().len()
        # Extract voice ID from column name
        label = col.split('__')[-1] if '__' in col else col
        stats.append({
            'voice': label,
            'average': avg_score,
            'count': non_null_count
        })

    # Convert to pandas for Altair (sort by average descending)
    stats_df = pl.DataFrame(stats).sort('average', descending=True).to_pandas()

    # Base bar chart
    bars = alt.Chart(stats_df).mark_bar(color=color).encode(
        x=alt.X('voice:N', title=x_label, sort='-y'),
        y=alt.Y('average:Q', title=y_label, scale=alt.Scale(domain=[0, 10])),
        tooltip=[
            alt.Tooltip('voice:N', title='Voice'),
            alt.Tooltip('average:Q', title='Average', format='.2f'),
            alt.Tooltip('count:Q', title='Count')
        ]
    )

    # Text overlay for counts
    text = alt.Chart(stats_df).mark_text(
        dy=-5,  # Slight offset above bar
        color='black',
        fontSize=10
    ).encode(
        x=alt.X('voice:N', sort='-y'),
        y=alt.Y('average:Q'),
        text=alt.Text('count:Q')
    )

    # Combine layers
    chart = (bars + text).properties(
        title=title,
        width=width or getattr(self, 'plot_width', 1000),
        height=height or getattr(self, 'plot_height', 500)
    )

    chart = self._save_plot(chart, title)
    return chart
```

**Verification:**

- [ ] Returns `alt.Chart` instead of `go.Figure`
- [ ] Data transformed to long format (pandas DataFrame)
- [ ] Bar chart created with `mark_bar()`
- [ ] Text overlay added with `mark_text()`
- [ ] Layers combined with `+` operator
- [ ] Sorting preserved (by average descending)
- [ ] Y-axis scale set to [0, 10]
- [ ] Tooltip includes voice, average, count
- [ ] Width/height properties set
- [ ] `_save_plot` called at end

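
The per-column statistics behind this chart are an average over the non-null ratings plus the number of participants who gave one. A plain-Python sketch of that aggregation, useful for checking expected values by hand; `column_stats` and the sample column names are illustrative assumptions, not part of the codebase:

```python
from __future__ import annotations

from statistics import mean


def column_stats(columns: dict[str, list]) -> list[dict]:
    """Average and non-null count per score column, sorted by average descending."""
    rows = []
    for name, values in columns.items():
        if name == "_recordId":
            continue  # Identifier column, not a score
        present = [v for v in values if v is not None]
        if not present:
            continue  # Nobody rated this voice
        rows.append({
            "voice": name.split("__")[-1],  # Keep only the part after the last '__'
            "average": mean(present),
            "count": len(present),
        })
    rows.sort(key=lambda row: row["average"], reverse=True)
    return rows


sample = {
    "_recordId": ["r1", "r2", "r3"],
    "Voice_Scale__V14": [8, None, 6],
    "Voice_Scale__V04": [9, 9, None],
}
for row in column_stats(sample):
    print(row)
```

Nulls reduce the count without pulling the average down, which is why both values appear on the chart.
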

---

### TASK 6: Migrate `plot_most_ranked_1`

**Location:** `/home/luigi/Documents/VoiceBranding/JPMC/Phase-3/plots.py` (currently lines ~537-613)

**Action:** Rewrite using Altair with conditional coloring (top 3 highlighted).

**Current behavior:**

- Input: Wide DataFrame with ranking columns
- Output: Vertical bar chart, top 3 bars in PRIMARY color, rest in NEUTRAL

**New implementation:**

```python
def plot_most_ranked_1(
    self,
    data: pl.LazyFrame | pl.DataFrame | None = None,
    title: str = "Most Popular Choice\n(Number of Times Ranked 1st)",
    x_label: str = "Item",
    y_label: str = "Count of 1st Place Rankings",
    height: int | None = None,
    width: int | None = None,
) -> alt.Chart:
    """Create a bar chart showing which item was ranked #1 the most. Top 3 highlighted."""
    df = self._ensure_dataframe(data)

    stats = []
    ranking_cols = [c for c in df.columns if c != '_recordId']

    for col in ranking_cols:
        count_rank_1 = df.filter(pl.col(col) == 1).height
        # Clean label
        label = col.replace('Character_Ranking_', '').replace('Top_3_Voices_ranking__', '').replace('_', ' ').strip()
        stats.append({'item': label, 'count': count_rank_1})

    # Convert and sort
    stats_df = pl.DataFrame(stats).sort('count', descending=True)

    # Add rank column for coloring (1-3 vs 4+)
    stats_df = stats_df.with_row_index('rank_index')
    stats_df = stats_df.with_columns(
        pl.when(pl.col('rank_index') < 3)
        .then(pl.lit('Top 3'))
        .otherwise(pl.lit('Other'))
        .alias('category')
    ).to_pandas()

    # Bar chart with conditional color
    chart = alt.Chart(stats_df).mark_bar().encode(
        x=alt.X('item:N', title=x_label, sort='-y'),
        y=alt.Y('count:Q', title=y_label),
        color=alt.Color('category:N',
                        scale=alt.Scale(domain=['Top 3', 'Other'],
                                        range=[ColorPalette.PRIMARY, ColorPalette.NEUTRAL]),
                        legend=None),
        tooltip=[
            alt.Tooltip('item:N', title='Item'),
            alt.Tooltip('count:Q', title='1st Place Votes')
        ]
    ).properties(
        title=title,
        width=width or getattr(self, 'plot_width', 1000),
        height=height or getattr(self, 'plot_height', 500)
    )

    chart = self._save_plot(chart, title)
    return chart
```

**Verification:**

- [ ] Returns `alt.Chart`
- [ ] Counts rank 1 occurrences per column
- [ ] Adds `category` column for top 3 vs others
- [ ] Uses conditional color via `alt.Color()` with custom scale
- [ ] Tooltip shows item and count
- [ ] Sorted by count descending
- [ ] Legend hidden (color is self-explanatory)

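
The top-3 highlight depends on each row's position after sorting, not on its count, so tied items at the cut-off are split by sort order. A small plain-Python sketch of that labeling; `label_top_n` and the item names are hypothetical and serve only to make the tie behavior easy to test:

```python
from __future__ import annotations


def label_top_n(counts: dict[str, int], n: int = 3) -> list[dict]:
    """Sort items by count (descending) and label the first n positions."""
    # sorted() is stable, so tied items keep their original relative order.
    ranked = sorted(counts.items(), key=lambda pair: pair[1], reverse=True)
    return [
        {"item": item, "count": count, "category": f"Top {n}" if index < n else "Other"}
        for index, (item, count) in enumerate(ranked)
    ]


for row in label_top_n({"Bold": 5, "Calm": 12, "Warm": 9, "Direct": 5}):
    print(row)
```

In this sample `Bold` and `Direct` both have 5 votes, but only `Bold` is labeled `Top 3`. If ties should be highlighted together, the category would have to be derived from the count threshold instead of the row index.
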

---

### TASK 7: Migrate `plot_weighted_ranking_score`

**Location:** `/home/luigi/Documents/VoiceBranding/JPMC/Phase-3/plots.py` (currently lines ~615-662)

**Action:** Rewrite as a simple bar chart with text overlay.

**Current behavior:**

- Input: DataFrame with `Character` and `Weighted Score` columns
- Output: Vertical bar chart with score text inside bars

**New implementation:**

```python
def plot_weighted_ranking_score(
    self,
    data: pl.LazyFrame | pl.DataFrame | None = None,
    title: str = "Weighted Popularity Score\n(1st=3pts, 2nd=2pts, 3rd=1pt)",
    x_label: str = "Character Personality",
    y_label: str = "Total Weighted Score",
    color: str = ColorPalette.PRIMARY,
    height: int | None = None,
    width: int | None = None,
) -> alt.Chart:
    """Create a bar chart showing the weighted ranking score for each character."""
    weighted_df = self._ensure_dataframe(data).to_pandas()

    # Bar chart
    bars = alt.Chart(weighted_df).mark_bar(color=color).encode(
        x=alt.X('Character:N', title=x_label),
        y=alt.Y('Weighted Score:Q', title=y_label),
        tooltip=[
            alt.Tooltip('Character:N'),
            alt.Tooltip('Weighted Score:Q', title='Score')
        ]
    )

    # Text overlay: anchor at the bar top and shift down so the
    # white text sits inside the bar instead of on the white background
    text = bars.mark_text(
        baseline='top',
        dy=5,
        color='white',
        fontSize=11
    ).encode(
        text='Weighted Score:Q'
    )

    chart = (bars + text).properties(
        title=title,
        width=width or getattr(self, 'plot_width', 1000),
        height=height or getattr(self, 'plot_height', 500)
    )

    chart = self._save_plot(chart, title)
    return chart
```

**Verification:**

- [ ] Returns `alt.Chart`
- [ ] Uses input columns as-is (`Character`, `Weighted Score`)
- [ ] Text overlay with white color inside bars
- [ ] Tooltip shows character and score

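
The plot receives precomputed scores, so the weighting named in the default title (1st=3pts, 2nd=2pts, 3rd=1pt) happens upstream. The sketch below is a worked example of that arithmetic under the assumption that each character has one rank per respondent, with `None` for unranked; `weighted_scores` and the sample data are illustrative only:

```python
from __future__ import annotations

# Points per rank as stated in the plot title; any other rank scores 0.
WEIGHTS = {1: 3, 2: 2, 3: 1}


def weighted_scores(rankings: dict[str, list]) -> dict[str, int]:
    """Sum the rank weights for every character."""
    return {
        character: sum(WEIGHTS.get(rank, 0) for rank in ranks)
        for character, ranks in rankings.items()
    }


sample = {
    "Calm": [1, 1, 2, None],  # 3 + 3 + 2 + 0
    "Bold": [3, None, 1, 2],  # 1 + 0 + 3 + 2
    "Warm": [2, 3, None, 4],  # 2 + 1 + 0 + 0
}
print(weighted_scores(sample))
```
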

---

### TASK 8: Migrate `plot_top3_ranking_distribution`

**Location:** `/home/luigi/Documents/VoiceBranding/JPMC/Phase-3/plots.py` (currently lines ~315-415)

**Action:** Rewrite stacked bar chart (3 ranks).

**Current behavior:**

- Input: Wide DataFrame with ranking columns (values 1, 2, 3)
- Output: Stacked bar chart with 3 layers (Rank 1, 2, 3), horizontal legend

**New implementation:**

```python
def plot_top3_ranking_distribution(
    self,
    data: pl.LazyFrame | pl.DataFrame | None = None,
    title: str = "Top 3 Rankings Distribution\nCount of 1st, 2nd, and 3rd Place Votes per Voice",
    x_label: str = "Voices",
    y_label: str = "Number of Mentions in Top 3",
    height: int | None = None,
    width: int | None = None,
) -> alt.Chart:
    """Create a stacked bar chart showing how often each voice was ranked 1st, 2nd, or 3rd."""
    df = self._ensure_dataframe(data)

    # Calculate stats per column
    stats = []
    for col in [c for c in df.columns if c != '_recordId']:
        rank1 = df.filter(pl.col(col) == 1).height
        rank2 = df.filter(pl.col(col) == 2).height
        rank3 = df.filter(pl.col(col) == 3).height
        total = rank1 + rank2 + rank3

        if total > 0:
            label = col.split('__')[-1] if '__' in col else col
            # Add 3 rows (one per rank)
            stats.append({'voice': label, 'rank': 'Rank 1 (1st Choice)', 'count': rank1, 'total': total})
            stats.append({'voice': label, 'rank': 'Rank 2 (2nd Choice)', 'count': rank2, 'total': total})
            stats.append({'voice': label, 'rank': 'Rank 3 (3rd Choice)', 'count': rank3, 'total': total})

    # Long-format rows; ordering by total is applied in the x encoding below
    stats_df = pl.DataFrame(stats).to_pandas()

    # Create stacked bar chart
    chart = alt.Chart(stats_df).mark_bar().encode(
        x=alt.X('voice:N', title=x_label, sort=alt.EncodingSortField(field='total', op='sum', order='descending')),
        y=alt.Y('count:Q', title=y_label, stack='zero'),
        color=alt.Color('rank:N',
                        scale=alt.Scale(domain=['Rank 1 (1st Choice)', 'Rank 2 (2nd Choice)', 'Rank 3 (3rd Choice)'],
                                        range=[ColorPalette.RANK_1, ColorPalette.RANK_2, ColorPalette.RANK_3]),
                        legend=alt.Legend(orient='top', direction='horizontal', title=None)),
        tooltip=[
            alt.Tooltip('voice:N', title='Voice'),
            alt.Tooltip('rank:N', title='Rank'),
            alt.Tooltip('count:Q', title='Count')
        ]
    ).properties(
        title=title,
        width=width or getattr(self, 'plot_width', 1000),
        height=height or getattr(self, 'plot_height', 500)
    )

    chart = self._save_plot(chart, title)
    return chart
```

**Verification:**

- [ ] Returns `alt.Chart`
- [ ] Data converted to long format (one row per voice-rank combo)
- [ ] Stacked with `stack='zero'` in y encoding
- [ ] Custom color scale for 3 ranks
- [ ] Sorted by total (sum of all ranks per voice)
- [ ] Horizontal legend at top
- [ ] Tooltip shows voice, rank, count

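
Altair's stacked bars expect long-format data: one row per voice and rank rather than one column per voice. A plain-Python sketch of that reshaping under the assumption that each ranking column holds values 1-3 or `None`; `to_long_rows` and the sample voice IDs are hypothetical:

```python
from __future__ import annotations

from collections import Counter

RANK_LABELS = {
    1: "Rank 1 (1st Choice)",
    2: "Rank 2 (2nd Choice)",
    3: "Rank 3 (3rd Choice)",
}


def to_long_rows(wide: dict[str, list]) -> list[dict]:
    """Turn {column: [ranks...]} into one row per (voice, rank) pair."""
    rows = []
    for column, ranks in wide.items():
        tally = Counter(rank for rank in ranks if rank in RANK_LABELS)
        total = sum(tally.values())
        if total == 0:
            continue  # Voice never appeared in anyone's top 3
        voice = column.split("__")[-1]
        for rank, label in RANK_LABELS.items():
            rows.append({"voice": voice, "rank": label, "count": tally[rank], "total": total})
    return rows


sample = {
    "Top_3_Voices_ranking__V14": [1, 2, None, 1],
    "Top_3_Voices_ranking__V04": [None, 3, None, None],
    "Top_3_Voices_ranking__V08": [None, None, None, None],
}
for row in to_long_rows(sample):
    print(row)
```

Repeating `total` on every row of a voice is what lets the x-axis sort by the overall number of mentions.
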

---

### TASK 9: Migrate `plot_ranking_distribution`

**Location:** `/home/luigi/Documents/VoiceBranding/JPMC/Phase-3/plots.py` (currently lines ~417-536)

**Action:** Rewrite stacked bar chart (4 ranks) - very similar to Task 8.

**Current behavior:**

- Input: Wide DataFrame with ranking columns (values 1, 2, 3, 4)
- Output: Stacked bar chart with 4 layers

**New implementation:**

```python
def plot_ranking_distribution(
    self,
    data: pl.LazyFrame | pl.DataFrame | None = None,
    title: str = "Rankings Distribution\n(1st to 4th Place)",
    x_label: str = "Item",
    y_label: str = "Number of Votes",
    height: int | None = None,
    width: int | None = None,
) -> alt.Chart:
    """Create a stacked bar chart showing the distribution of rankings (1st to 4th)."""
    df = self._ensure_dataframe(data)

    stats = []
    ranking_cols = [c for c in df.columns if c != '_recordId']

    for col in ranking_cols:
        r1 = df.filter(pl.col(col) == 1).height
        r2 = df.filter(pl.col(col) == 2).height
        r3 = df.filter(pl.col(col) == 3).height
        r4 = df.filter(pl.col(col) == 4).height
        total = r1 + r2 + r3 + r4

        if total > 0:
            label = col.replace('Character_Ranking_', '').replace('Top_3_Voices_ranking__', '').replace('_', ' ').strip()
            stats.append({'item': label, 'rank': 'Rank 1 (Best)', 'count': r1, 'rank1': r1})
            stats.append({'item': label, 'rank': 'Rank 2', 'count': r2, 'rank1': r1})
            stats.append({'item': label, 'rank': 'Rank 3', 'count': r3, 'rank1': r1})
            stats.append({'item': label, 'rank': 'Rank 4 (Worst)', 'count': r4, 'rank1': r1})

    if not stats:
        return alt.Chart().mark_text(text="No data")

    stats_df = pl.DataFrame(stats).to_pandas()

    chart = alt.Chart(stats_df).mark_bar().encode(
        x=alt.X('item:N', title=x_label, sort=alt.EncodingSortField(field='rank1', order='descending')),
        y=alt.Y('count:Q', title=y_label, stack='zero'),
        color=alt.Color('rank:N',
                        scale=alt.Scale(domain=['Rank 1 (Best)', 'Rank 2', 'Rank 3', 'Rank 4 (Worst)'],
                                        range=[ColorPalette.RANK_1, ColorPalette.RANK_2, ColorPalette.RANK_3, ColorPalette.RANK_4]),
                        legend=alt.Legend(orient='top', direction='horizontal', title=None)),
        tooltip=[
            alt.Tooltip('item:N', title='Item'),
            alt.Tooltip('rank:N', title='Rank'),
            alt.Tooltip('count:Q', title='Count')
        ]
    ).properties(
        title=title,
        width=width or getattr(self, 'plot_width', 1000),
        height=height or getattr(self, 'plot_height', 500)
    )

    chart = self._save_plot(chart, title)
    return chart
```

**Verification:**

- [ ] Returns `alt.Chart`
- [ ] 4 ranks supported
- [ ] Sorted by Rank 1 count (added `rank1` field for sorting)
- [ ] Custom color scale for 4 ranks
- [ ] Empty data handled (returns text mark)


---

### TASK 10: Migrate `plot_voice_selection_counts`

**Location:** `/home/luigi/Documents/VoiceBranding/JPMC/Phase-3/plots.py` (currently lines ~664-737)

**Action:** Rewrite with Polars data transformation + conditional coloring.

**Current behavior:**

- Input: DataFrame with comma-separated string column (`8_Combined`)
- Process: Split strings, explode, count occurrences
- Output: Bar chart, top 8 bars in PRIMARY, rest in NEUTRAL

**New implementation:**

```python
def plot_voice_selection_counts(
    self,
    data: pl.LazyFrame | pl.DataFrame | None = None,
    target_column: str = "8_Combined",
    title: str = "Most Frequently Chosen Voices\n(Top 8 Highlighted)",
    x_label: str = "Voice",
    y_label: str = "Number of Times Chosen",
    height: int | None = None,
    width: int | None = None,
) -> alt.Chart:
    """Create a bar plot showing the frequency of voice selections."""
    df = self._ensure_dataframe(data)

    if target_column not in df.columns:
        return alt.Chart().mark_text(text=f"Column '{target_column}' not found")

    # Process data: split, explode, count
    stats_df = (
        df.select(pl.col(target_column))
        .drop_nulls()
        .with_columns(pl.col(target_column).str.split(","))
        .explode(target_column)
        .with_columns(pl.col(target_column).str.strip_chars())
        .filter(pl.col(target_column) != "")
        .group_by(target_column)
        .agg(pl.len().alias("count"))
        .sort("count", descending=True)
        .with_row_index('rank_index')
        .with_columns(
            pl.when(pl.col('rank_index') < 8)
            .then(pl.lit('Top 8'))
            .otherwise(pl.lit('Other'))
            .alias('category')
        )
        .to_pandas()
    )

    chart = alt.Chart(stats_df).mark_bar().encode(
        x=alt.X(f'{target_column}:N', title=x_label, sort='-y'),
        y=alt.Y('count:Q', title=y_label),
        color=alt.Color('category:N',
                        scale=alt.Scale(domain=['Top 8', 'Other'],
                                        range=[ColorPalette.PRIMARY, ColorPalette.NEUTRAL]),
                        legend=None),
        tooltip=[
            alt.Tooltip(f'{target_column}:N', title='Voice'),
            alt.Tooltip('count:Q', title='Selections')
        ]
    ).properties(
        title=title,
        width=width or getattr(self, 'plot_width', 1000),
        height=height or getattr(self, 'plot_height', 500)
    )

    chart = self._save_plot(chart, title)
    return chart
```

**Verification:**

- [ ] Returns `alt.Chart`
- [ ] Polars chain: split → explode → strip → group → count
- [ ] Top 8 categorization logic correct
- [ ] Conditional coloring applied
- [ ] Sorted by count descending

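
The split → explode → strip → count chain can be checked against a small reference implementation before trusting the chart. The sketch below reproduces the same steps with `collections.Counter`; `count_selections` and the sample voice IDs are assumptions made for illustration:

```python
from __future__ import annotations

from collections import Counter


def count_selections(cells: list) -> list[tuple[str, int]]:
    """Count how often each voice appears across comma-separated cells."""
    counter = Counter(
        part.strip()
        for cell in cells
        if cell is not None          # drop_nulls
        for part in cell.split(",")  # split + explode
        if part.strip()              # strip, then discard empty fragments
    )
    # most_common() returns (voice, count) pairs sorted by count descending.
    return counter.most_common()


sample = ["V14, V04,V08", None, "V04", "V14,,V04 "]
print(count_selections(sample))
```

Stray whitespace and empty fragments such as `"V14,,V04 "` are the cases most likely to produce duplicate or blank bars if the strip and filter steps are skipped.
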

---

### TASK 11: Migrate `plot_top3_selection_counts`

**Location:** `/home/luigi/Documents/VoiceBranding/JPMC/Phase-3/plots.py` (currently lines ~739-808)

**Action:** Identical to Task 10, except that the default column is `3_Ranked` and the top 3 are highlighted.

**New implementation:**

```python
def plot_top3_selection_counts(
    self,
    data: pl.LazyFrame | pl.DataFrame | None = None,
    target_column: str = "3_Ranked",
    title: str = "Most Frequently Chosen Top 3 Voices\n(Top 3 Highlighted)",
    x_label: str = "Voice",
    y_label: str = "Count of Mentions in Top 3",
    height: int | None = None,
    width: int | None = None,
) -> alt.Chart:
    """Question: Which 3 voices are chosen the most out of 18?"""
    df = self._ensure_dataframe(data)

    if target_column not in df.columns:
        return alt.Chart().mark_text(text=f"Column '{target_column}' not found")

    stats_df = (
        df.select(pl.col(target_column))
        .drop_nulls()
        .with_columns(pl.col(target_column).str.split(","))
        .explode(target_column)
        .with_columns(pl.col(target_column).str.strip_chars())
        .filter(pl.col(target_column) != "")
        .group_by(target_column)
        .agg(pl.len().alias("count"))
        .sort("count", descending=True)
        .with_row_index('rank_index')
        .with_columns(
            pl.when(pl.col('rank_index') < 3)
            .then(pl.lit('Top 3'))
            .otherwise(pl.lit('Other'))
            .alias('category')
        )
        .to_pandas()
    )

    chart = alt.Chart(stats_df).mark_bar().encode(
        x=alt.X(f'{target_column}:N', title=x_label, sort='-y'),
        y=alt.Y('count:Q', title=y_label),
        color=alt.Color('category:N',
                        scale=alt.Scale(domain=['Top 3', 'Other'],
                                        range=[ColorPalette.PRIMARY, ColorPalette.NEUTRAL]),
                        legend=None),
        tooltip=[
            alt.Tooltip(f'{target_column}:N', title='Voice'),
            alt.Tooltip('count:Q', title='In Top 3')
        ]
    ).properties(
        title=title,
        width=width or getattr(self, 'plot_width', 1000),
        height=height or getattr(self, 'plot_height', 500)
    )

    chart = self._save_plot(chart, title)
    return chart
```

**Verification:**

- [ ] Default `target_column` is `"3_Ranked"`
- [ ] Top 3 categorization (not top 8)
- [ ] Otherwise identical to Task 10


---

### TASK 12: Migrate `plot_speaking_style_trait_scores`

**Location:** `/home/luigi/Documents/VoiceBranding/JPMC/Phase-3/plots.py` (currently lines ~810-926)

**Action:** Rewrite horizontal bar chart with text annotations.

**Current behavior:**

- Input: DataFrame with `Voice`, `score`, `Left_Anchor`, `Right_Anchor` columns
- Output: Horizontal bar chart with anchor labels at bottom

**New implementation:**

```python
def plot_speaking_style_trait_scores(
    self,
    data: pl.LazyFrame | pl.DataFrame | None = None,
    trait_description: str | None = None,
    left_anchor: str | None = None,
    right_anchor: str | None = None,
    title: str = "Speaking Style Trait Analysis",
    height: int | None = None,
    width: int | None = None,
) -> alt.Chart:
    """Plot scores for a single speaking style trait across multiple voices."""
    df = self._ensure_dataframe(data)

    if df.is_empty():
        return alt.Chart().mark_text(text="No data")

    required_cols = ["Voice", "score"]
    if not all(col in df.columns for col in required_cols):
        return alt.Chart().mark_text(text="Missing required columns")

    # Calculate stats: Mean, Count
    stats = (
        df.filter(pl.col("score").is_not_null())
        .group_by("Voice")
        .agg([
            pl.col("score").mean().alias("mean_score"),
            pl.col("score").count().alias("count")
        ])
        # Display order is set by sort='-x' in the y encoding below
        .sort("mean_score", descending=False)
        .to_pandas()
    )

    # Extract anchors from data if not provided
    if (left_anchor is None or right_anchor is None) and "Left_Anchor" in df.columns:
        head = df.filter(pl.col("Left_Anchor").is_not_null()).head(1)
        if not head.is_empty():
            if left_anchor is None:
                left_anchor = head["Left_Anchor"][0]
            if right_anchor is None:
                right_anchor = head["Right_Anchor"][0]

    if trait_description is None:
        if left_anchor and right_anchor:
            trait_description = f"{left_anchor.split('|')[0]} vs. {right_anchor.split('|')[0]}"
        elif "Description" in df.columns:
            head = df.filter(pl.col("Description").is_not_null()).head(1)
            trait_description = head["Description"][0] if not head.is_empty() else ""
        else:
            trait_description = ""

    # Horizontal bar chart
    bars = alt.Chart(stats).mark_bar(color=ColorPalette.PRIMARY).encode(
        x=alt.X('mean_score:Q', title='Average Score (1-5)', scale=alt.Scale(domain=[1, 5])),
        y=alt.Y('Voice:N', title='Voice', sort='-x'),
        tooltip=[
            alt.Tooltip('Voice:N'),
            alt.Tooltip('mean_score:Q', title='Average', format='.2f'),
            alt.Tooltip('count:Q', title='Count')
        ]
    )

    # Count text inside bars: right-align at the bar end and shift left
    # so the white text stays on the bar
    text = bars.mark_text(
        align='right',
        baseline='middle',
        dx=-8,
        color='white',
        fontSize=16
    ).encode(
        text='count:Q'
    )

    # Combine
    chart = (bars + text).properties(
        title={
            "text": title,
            "subtitle": [trait_description, "(Numbers on bars indicate respondent count)"]
        },
        width=width or getattr(self, 'plot_width', 1000),
        height=height or getattr(self, 'plot_height', 500)
    )

    # Note: Anchor annotations at bottom would require separate text marks
    # positioned at fixed coordinates - can add if needed

    chart = self._save_plot(chart, title)
    return chart
```

**Verification:**

- [ ] Returns `alt.Chart`
- [ ] Horizontal orientation (x=score, y=voice)
- [ ] X-axis domain set to [1, 5]
- [ ] Count text displayed inside bars (white, large font)
- [ ] Title includes subtitle with trait description
- [ ] Sorted by mean score (ascending for bottom-to-top)
- [ ] Anchor label annotations (optional - commented in code)

---
|
|
|
|
### TASK 13: Migrate `plot_speaking_style_correlation`
|
|
|
|
**Location:** `/home/luigi/Documents/VoiceBranding/JPMC/Phase-3/plots.py` (currently lines ~928-1018)
|
|
|
|
**Action:** Rewrite with red/green conditional coloring based on sign.
|
|
|
|
**Current behavior:**
|
|
- Input: DataFrame with correlation data
|
|
- Process: Calculate Pearson correlation per trait
|
|
- Output: Bar chart, positive correlations green, negative red
|
|
|
|
**New implementation:**
|
|
```python
def plot_speaking_style_correlation(
    self,
    style_color: str,
    style_traits: list[str],
    data: pl.LazyFrame | pl.DataFrame | None = None,
    title: str | None = None,
) -> alt.Chart:
    """Plots correlation between Speaking Style Trait Scores (1-5) and Voice Scale (1-10)."""
    df = self._ensure_dataframe(data)

    if title is None:
        title = f"Speaking style {style_color} and voice scale 1-10 correlations"

    trait_correlations = []

    # Calculate correlations
    for i, trait in enumerate(style_traits):
        subset = df.filter(pl.col("Right_Anchor") == trait)
        valid_data = subset.select(["score", "Voice_Scale_Score"]).drop_nulls()

        if valid_data.height > 1:
            corr_val = valid_data.select(pl.corr("score", "Voice_Scale_Score")).item()
            # Handle trait text - wrap at '|' for display
            trait_display = trait.replace('|', '\n')
            trait_correlations.append({
                "trait_display": trait_display,
                "trait_index": f"Trait {i+1}",
                "correlation": corr_val if corr_val is not None else 0.0
            })

    if not trait_correlations:
        # A top-level chart needs a data source, so supply a single empty row
        return alt.Chart(alt.Data(values=[{}])).mark_text(text=f"No data for {style_color} Style")

    plot_df = pl.DataFrame(trait_correlations).to_pandas()

    # Conditional color based on sign
    chart = alt.Chart(plot_df).mark_bar().encode(
        x=alt.X('trait_display:N', title=None, axis=alt.Axis(labelAngle=0)),
        y=alt.Y('correlation:Q', title='Correlation', scale=alt.Scale(domain=[-1, 1])),
        color=alt.condition(
            alt.datum.correlation >= 0,
            alt.value('green'),
            alt.value('red')
        ),
        tooltip=[
            alt.Tooltip('trait_display:N', title='Trait'),
            alt.Tooltip('correlation:Q', format='.2f')
        ]
    ).properties(
        title=title,
        width=1000,
        height=400
    )

    chart = self._save_plot(chart, title)
    return chart
```

**Verification:**
- [ ] Returns `alt.Chart`
- [ ] Pearson correlation calculated via `pl.corr()`
- [ ] Conditional coloring: green if the correlation is >= 0, red if negative
- [ ] Y-axis domain [-1, 1]
- [ ] Trait text wrapped at '|' for display
- [ ] Tooltip shows trait and correlation value
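
To cross-check the `pl.corr()` item, the Pearson coefficient can be computed by hand on a small sample and compared with what the plot reports. The sketch below is plain Python and independent of Polars. `pearson` and `bar_color` are hypothetical helper names, and returning `None` for fewer than two pairs or for a constant column is an assumption about how undefined correlations should be treated (the plan's code substitutes `0.0` when the value is `None`).

```python
from __future__ import annotations

import math


def pearson(xs: list[float], ys: list[float]) -> float | None:
    """Pearson correlation of two equal-length samples, or None if undefined."""
    n = len(xs)
    if n != len(ys):
        raise ValueError("xs and ys must have the same length")
    if n < 2:
        return None  # mirrors the `valid_data.height > 1` guard in the plan
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None  # a constant column has no defined correlation
    return cov / math.sqrt(var_x * var_y)


def bar_color(correlation: float) -> str:
    """Same rule as the alt.condition in the plan: green for >= 0, red otherwise."""
    return "green" if correlation >= 0 else "red"


# Made-up sample: trait scores (1-5) against voice scale scores (1-10).
scores = [1, 2, 3, 4, 5]
voice_scale = [2, 4, 6, 8, 10]
r = pearson(scores, voice_scale)
print(r, bar_color(r))
```

Feeding the same columns to the plot method and comparing the tooltip value against `pearson(...)` gives a quick regression check. The constant-column case is worth testing explicitly, since an undefined correlation should not silently appear as a zero-height bar.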

---

### TASK 14: Migrate `plot_speaking_style_ranking_correlation`

**Location:** `/home/luigi/Documents/VoiceBranding/JPMC/Phase-3/plots.py` (currently lines ~1020-1105)

**Action:** Almost identical to Task 13, but correlates with `Ranking_Points` instead of `Voice_Scale_Score`.

**New implementation:**
```python
def plot_speaking_style_ranking_correlation(
    self,
    style_color: str,
    style_traits: list[str],
    data: pl.LazyFrame | pl.DataFrame | None = None,
    title: str | None = None,
) -> alt.Chart:
    """Plots correlation between Speaking Style Trait Scores (1-5) and Voice Ranking Points (0-3)."""
    df = self._ensure_dataframe(data)

    if title is None:
        title = f"Speaking style {style_color} and voice ranking points correlations"

    trait_correlations = []

    for i, trait in enumerate(style_traits):
        subset = df.filter(pl.col("Right_Anchor") == trait)
        valid_data = subset.select(["score", "Ranking_Points"]).drop_nulls()

        if valid_data.height > 1:
            corr_val = valid_data.select(pl.corr("score", "Ranking_Points")).item()
            trait_display = trait.replace('|', '\n')
            trait_correlations.append({
                "trait_display": trait_display,
                "trait_index": f"Trait {i+1}",
                "correlation": corr_val if corr_val is not None else 0.0
            })

    if not trait_correlations:
        # A top-level chart needs a data source, so supply a single empty row
        return alt.Chart(alt.Data(values=[{}])).mark_text(text=f"No data for {style_color} Style")

    plot_df = pl.DataFrame(trait_correlations).to_pandas()

    chart = alt.Chart(plot_df).mark_bar().encode(
        x=alt.X('trait_display:N', title=None, axis=alt.Axis(labelAngle=0)),
        y=alt.Y('correlation:Q', title='Correlation', scale=alt.Scale(domain=[-1, 1])),
        color=alt.condition(
            alt.datum.correlation >= 0,
            alt.value('green'),
            alt.value('red')
        ),
        tooltip=[
            alt.Tooltip('trait_display:N', title='Trait'),
            alt.Tooltip('correlation:Q', format='.2f')
        ]
    ).properties(
        title=title,
        width=1000,
        height=400
    )

    chart = self._save_plot(chart, title)
    return chart
```

**Verification:**
- [ ] Returns `alt.Chart`
- [ ] Uses `Ranking_Points` column instead of `Voice_Scale_Score`
- [ ] Otherwise identical to Task 13

---

### TASK 15: Install vl-convert-python

**Action:** Add vl-convert-python to project dependencies for PNG export.

**Command:**
```bash
cd /home/luigi/Documents/VoiceBranding/JPMC/Phase-3
uv add vl-convert-python
```

**Verification:**
- [ ] `vl-convert-python` appears in `pyproject.toml` dependencies
- [ ] Installation successful (no errors)
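
One way to confirm the first verification item without reading `pyproject.toml` by eye is to compare normalized package names, since `vl-convert-python` and `vl_convert_python` refer to the same distribution. The sketch below is illustrative only: `normalize`, `requirement_name`, `has_dependency`, and `load_dependencies` are hypothetical helpers that are not part of the project, and the example list with its version pins stands in for the real `[project].dependencies` table.

```python
from __future__ import annotations

import re
from pathlib import Path


def normalize(name: str) -> str:
    """Normalize a distribution name as PEP 503 does (lowercase, runs of -_. become '-')."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_name(requirement: str) -> str:
    """Extract the distribution name from a requirement string such as 'altair>=6.0.0'."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement)
    if match is None:
        raise ValueError(f"Cannot parse requirement: {requirement!r}")
    return normalize(match.group(1))


def has_dependency(requirements: list[str], name: str) -> bool:
    """True if `name` is among the requirements, ignoring version specifiers and extras."""
    return normalize(name) in {requirement_name(r) for r in requirements}


def load_dependencies(pyproject: Path) -> list[str]:
    """Read [project].dependencies from a pyproject.toml (tomllib needs Python 3.11+)."""
    import tomllib

    with pyproject.open("rb") as fh:
        return tomllib.load(fh).get("project", {}).get("dependencies", [])


# Illustrative dependency list; the version pins are placeholders for this example.
example = ["altair>=6.0.0", "polars>=1.0", "vl_convert_python>=1.0"]
print(has_dependency(example, "vl-convert-python"))
print(has_dependency(example, "plotly"))
```

In the project itself the call would be `has_dependency(load_dependencies(Path("pyproject.toml")), "vl-convert-python")`. The same helper works in reverse for Task 16, where `plotly` and `kaleido` should be absent.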

---

### TASK 16: Remove Plotly dependencies (optional cleanup)

**Action:** Remove unused Plotly packages.

**Command:**
```bash
cd /home/luigi/Documents/VoiceBranding/JPMC/Phase-3
uv remove plotly kaleido
```

**Verification:**
- [ ] `plotly` and `kaleido` removed from `pyproject.toml`
- [ ] No other code depends on Plotly (grep check)
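
The grep check can also be done with Python's `ast` module, which ignores comments and strings that merely mention Plotly and reports only real import statements. This is a minimal sketch under stated assumptions: `plotly_imports` and `scan_tree` are hypothetical names, and skipping `.venv` is a guess about the project layout.

```python
from __future__ import annotations

import ast
from pathlib import Path


def plotly_imports(source: str) -> list[tuple[int, str]]:
    """Return (line number, module) for every import of plotly or one of its submodules."""
    hits: list[tuple[int, str]] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            modules = [node.module or ""]
        else:
            continue
        for module in modules:
            if module == "plotly" or module.startswith("plotly."):
                hits.append((node.lineno, module))
    return sorted(hits)


def scan_tree(root: Path) -> dict[str, list[tuple[int, str]]]:
    """Scan every .py file under `root`, skipping the virtual environment."""
    report: dict[str, list[tuple[int, str]]] = {}
    for path in sorted(root.rglob("*.py")):
        if ".venv" in path.parts:
            continue
        found = plotly_imports(path.read_text(encoding="utf-8"))
        if found:
            report[str(path)] = found
    return report


sample = "import polars as pl\nimport plotly.graph_objects as go\nfrom plotly import express\n"
print(plotly_imports(sample))
```

Running `scan_tree(Path("."))` from the project root should return an empty dict before `uv remove plotly kaleido` is executed. Files that do not parse raise `SyntaxError`, which is itself a useful signal during a migration.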

---

### TASK 17: Test all plot methods in Marimo notebook

**Action:** Create a test notebook to verify all 10 plotting methods work correctly.

**Test checklist per plot:**
- [ ] Chart renders without errors
- [ ] Chart has correct dimensions (width/height)
- [ ] Colors match ColorPalette constants
- [ ] Data is displayed correctly (bars, stacks, etc.)
- [ ] Text overlays render (counts, scores)
- [ ] Tooltips show correct information
- [ ] Filter annotation appears below chart (if filters active)
- [ ] PNG export works (check `figures/` directory)
- [ ] No overlap between chart elements and filter text

**Create test file:** `/home/luigi/Documents/VoiceBranding/JPMC/Phase-3/test_altair_migration.py`

**Test template:**
```python
import marimo as mo
import polars as pl
from utils import JPMCSurvey

# Load sample data
survey = JPMCSurvey()
survey.load_data('path/to/data')
survey.fig_save_dir = 'figures/altair_test'

# Test each plot method
mo.md("## Testing Altair Migration")

# 1. Test plot_average_scores_with_counts
chart1 = survey.plot_average_scores_with_counts(...)
chart1

# 2. Test plot_most_ranked_1
chart2 = survey.plot_most_ranked_1(...)
chart2

# ... repeat for all 10 methods
```
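
Alongside the visual checks in the notebook, a small runner can call every plot method and report failures in one pass instead of stopping at the first exception. The sketch below is a hypothetical helper, not part of the project: `run_smoke_tests` takes zero-argument callables, and `FakeChart` and `broken_plot` are stand-ins so the example runs without survey data.

```python
from __future__ import annotations

from typing import Any, Callable


def run_smoke_tests(plot_calls: dict[str, Callable[[], Any]]) -> dict[str, str]:
    """Call each zero-argument plot callable and record 'ok' or the error message."""
    results: dict[str, str] = {}
    for name, call in plot_calls.items():
        try:
            chart = call()
        except Exception as exc:  # keep going so one failure does not hide the rest
            results[name] = f"{type(exc).__name__}: {exc}"
            continue
        # Altair charts expose to_dict(); calling it forces schema validation.
        if hasattr(chart, "to_dict"):
            try:
                chart.to_dict()
            except Exception as exc:
                results[name] = f"invalid spec: {exc}"
                continue
        results[name] = "ok"
    return results


class FakeChart:
    """Stand-in for an Altair chart so the sketch runs without real data."""

    def to_dict(self) -> dict:
        return {"mark": "bar"}


def broken_plot() -> FakeChart:
    raise KeyError("Voice")


report = run_smoke_tests({
    "plot_most_ranked_1": FakeChart,
    "plot_voice_selection_counts": broken_plot,
})
for name, status in report.items():
    print(f"{name}: {status}")
```

With a real survey object, each entry would wrap a method call, for example `lambda: survey.plot_most_ranked_1(data)`, where `data` is whatever input that method expects.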

---

## Final Verification Checklist

After completing all tasks, verify the following:

### Code Quality
- [ ] No Plotly imports remain in `plots.py`
- [ ] All methods return `alt.Chart` instead of `go.Figure`
- [ ] No syntax errors (`python -m py_compile plots.py`)
- [ ] Type hints updated (if any reference `go.Figure`)
- [ ] Docstrings updated (if any mention Plotly)

### Theme & Styling
- [ ] `jpmc_altair_theme()` function exists in `theme.py`
- [ ] Theme is registered and enabled
- [ ] All charts use ColorPalette constants
- [ ] Chart dimensions match original (width=1000, height=500 defaults)
- [ ] Font sizes match original (11pt for labels, 14pt for titles)

### Data Handling
- [ ] All methods handle empty data gracefully
- [ ] Wide-to-long transformations correct (stacked bars, selection counts)
- [ ] Sorting preserved (by average, count, rank1, etc.)
- [ ] Column filtering works (`_recordId` excluded)
- [ ] String processing works (comma-split, strip, explode)

### Visual Features
- [ ] Bar charts render correctly (vertical and horizontal)
- [ ] Stacked bars have correct layer order
- [ ] Text overlays positioned correctly (inside/outside bars)
- [ ] Conditional coloring works (top N highlighting, red/green by sign)
- [ ] Tooltips show correct fields with proper formatting
- [ ] Legends positioned correctly (top horizontal for stacked bars)
- [ ] X-axis labels rotated at -45° by default
- [ ] Grid lines visible

### Filter System
- [ ] `_get_filter_slug()` unchanged (still works)
- [ ] `_get_filter_description()` unchanged (still works)
- [ ] `_add_filter_footnote()` uses `vconcat` approach
- [ ] Filter text appears at bottom of combined chart
- [ ] No overlap between chart and filter text
- [ ] Filter text is left-aligned
- [ ] HTML tags stripped from filter text (Altair doesn't support HTML)
- [ ] Filter subdirectories created correctly
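
Stripping HTML from the filter text can be sketched with the standard library alone. The example assumes the existing filter description uses only simple tags such as `<b>` and `<br>`, as Plotly annotations allow; the actual format produced by `_get_filter_description()` is not shown in this plan, and `plain_filter_text` is a hypothetical helper name.

```python
import html
import re

_BREAK = re.compile(r"<br\s*/?>", flags=re.IGNORECASE)
_TAG = re.compile(r"<[^>]+>")


def plain_filter_text(description: str) -> list[str]:
    """Convert an HTML-flavoured filter description into plain text lines."""
    # Treat <br> as a line separator before removing the remaining tags.
    with_breaks = _BREAK.sub("\n", description)
    stripped = _TAG.sub("", with_breaks)
    # Decode entities such as &amp; and drop empty lines.
    lines = (html.unescape(line).strip() for line in stripped.split("\n"))
    return [line for line in lines if line]


print(plain_filter_text("<b>Filters:</b> Age: 18-24<br>Gender: Female &amp; Male"))
```

Returning a list rather than one string keeps the line breaks explicit, which suits a text mark that accepts a list of strings for multi-line output. The regex is deliberately simple and would also remove literal text that looks like a tag, so it is only appropriate while filter values contain no angle brackets.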

### PNG Export
- [ ] `vl-convert-python` installed
- [ ] `chart.save()` method works
- [ ] PNG files created in correct subdirectories
- [ ] PNG files have correct filenames (sanitized titles)
- [ ] Image quality acceptable (scale_factor=2.0)
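
The "sanitized titles" item can be checked against a reference rule. The sketch below shows one possible rule, not the one `_save_plot` actually uses (that code is not shown in this section): `sanitize_title` and `png_path` are hypothetical helpers, and the `all_respondents` slug is a made-up example value.

```python
import re
from pathlib import Path


def sanitize_title(title: str, max_length: int = 100) -> str:
    """Turn a chart title into a filesystem-safe file stem."""
    # Replace each run of characters other than letters, digits, '-' or '_' with one underscore.
    stem = re.sub(r"[^A-Za-z0-9_-]+", "_", title).strip("_")
    return stem[:max_length] or "untitled"


def png_path(save_dir: str, filter_slug: str, title: str) -> Path:
    """Build <save_dir>/<filter_slug>/<sanitized title>.png without touching the disk."""
    return Path(save_dir) / filter_slug / f"{sanitize_title(title)}.png"


target = png_path("figures", "all_respondents", "Speaking style Green: voice scale 1-10 correlations")
print(target)
```

Comparing `target.name` with the files that appear under `figures/` after a test run shows quickly whether titles containing colons, slashes, or newlines produce the expected names.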

### Marimo Integration
- [ ] Charts render in Marimo notebooks
- [ ] Charts are reactive (update when data changes)
- [ ] No JavaScript console errors
- [ ] Interactive features work (tooltips, pan, zoom if enabled)

### All 10 Plot Methods
1. [ ] `plot_average_scores_with_counts` - vertical bar + text
2. [ ] `plot_top3_ranking_distribution` - stacked bar (3 ranks)
3. [ ] `plot_ranking_distribution` - stacked bar (4 ranks)
4. [ ] `plot_most_ranked_1` - vertical bar + conditional color
5. [ ] `plot_weighted_ranking_score` - vertical bar + text
6. [ ] `plot_voice_selection_counts` - vertical bar + conditional color
7. [ ] `plot_top3_selection_counts` - vertical bar + conditional color
8. [ ] `plot_speaking_style_trait_scores` - horizontal bar + text
9. [ ] `plot_speaking_style_correlation` - vertical bar + red/green
10. [ ] `plot_speaking_style_ranking_correlation` - vertical bar + red/green

### Edge Cases
- [ ] Empty DataFrame handled gracefully
- [ ] Missing columns detected and reported
- [ ] Zero counts/values don't break charts
- [ ] Single data point renders correctly
- [ ] Very long labels don't cause layout issues
- [ ] Many categories don't cause overcrowding

### Regression Testing
- [ ] Existing Marimo notebooks still work
- [ ] Data filtering still works (`filter_data()`)
- [ ] `JPMCSurvey` class initialization unchanged
- [ ] No breaking changes to public API

### Documentation
- [ ] This migration plan marked as "Complete"
- [ ] Any new dependencies documented
- [ ] Any breaking changes documented
- [ ] Example usage updated (if applicable)

---

## Troubleshooting

### Issue: Charts don't render
- Check Altair version: `python -c "import altair; print(altair.__version__)"`
- Check vl-convert: `python -c "import vl_convert; print(vl_convert.__version__)"`
- Check for JavaScript errors in browser console
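
The version checks above can be combined into one report that also covers packages that are missing altogether. This is a small sketch using `importlib.metadata` from the standard library; `installed_versions` is a hypothetical helper, and the package list is simply the set of libraries this plan mentions.

```python
from __future__ import annotations

from importlib import metadata


def installed_versions(packages: list[str]) -> dict[str, str | None]:
    """Map each distribution name to its installed version, or None if it is missing."""
    versions: dict[str, str | None] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


report = installed_versions(["altair", "vl-convert-python", "marimo", "polars"])
for name, version in report.items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

Because it reads package metadata instead of importing the modules, the report still works when one of the packages fails at import time.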

### Issue: PNG export fails
- Verify vl-convert-python installed: `uv pip show vl-convert-python`
- Check write permissions on `figures/` directory
- Try saving as HTML first: `chart.save('test.html')`

### Issue: Colors don't match theme
- Verify theme is enabled: `print(alt.theme.active)` (`alt.themes` is deprecated since Altair 5.5)
- Check color scale definitions in each plot method
- Ensure ColorPalette imported correctly

### Issue: Filter text overlaps chart
- Increase `spacing` parameter in `alt.vconcat(chart, footer, spacing=20)`
- Increase footer chart `height` property
- Check if footer chart is actually created (debug with `print()`)

### Issue: Data not displaying
- Check DataFrame format (wide vs long)
- Verify column names match encoding specs
- Check for null values (`.drop_nulls()`)
- Print intermediate DataFrames for debugging

---

## Notes

- **Backup:** Before starting, create a backup of `plots.py`: `cp plots.py plots.py.plotly_backup`
- **Incremental testing:** Test each plot method immediately after migration
- **Marimo restart:** May need to restart Marimo kernel after major changes
- **Performance:** Altair embeds the data in the chart spec and, by default, raises `MaxRowsError` for datasets above 5,000 rows; aggregate before plotting, or use `.sample()` if needed

---

## Completion Status

- [ ] All tasks (1-17) completed
- [ ] All verification checks passed
- [ ] Existing notebooks tested and working
- [ ] Migration documented
- [ ] Ready for production use

**Migration completed on:** _________________
**Tested by:** _________________
**Sign-off:** _________________