architecture overview for alignment + reference

Model_Selection_Reference.py (new file, 153 lines)
import marimo

__generated_with = "0.18.0"
app = marimo.App(width="medium")


@app.cell
def _():
    import marimo as mo
    return (mo,)


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    # LLM Model Selection Reference

    A reference guide for choosing models for interview transcript thematic analysis.

    ---
    """)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    ## Infrastructure

    | Resource | Specification |
    |----------|---------------|
    | **GPU** | NVIDIA H100 (80GB VRAM) |
    | **VM** | `hiperf-gpu` via Tailscale |
    | **API** | Ollama Python client |
    """)
    return
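Driving the Ollama Python client listed above might look like the sketch below. The prompt wording, helper name, and host address are illustrative assumptions, not part of this setup; only the `ollama.Client`/`chat` calls are the library's actual API.

```python
def build_chat_request(model: str, transcript: str) -> dict:
    """Assemble keyword arguments for an Ollama chat call (hypothetical prompt)."""
    return {
        "model": model,
        "messages": [
            {"role": "system",
             "content": "You are a qualitative researcher performing thematic analysis."},
            {"role": "user",
             "content": f"Identify the main themes in this interview transcript:\n\n{transcript}"},
        ],
    }

# Live call (requires `pip install ollama` and a reachable server;
# the Tailscale host name and default port 11434 are assumptions):
#
#   import ollama
#   client = ollama.Client(host="http://hiperf-gpu:11434")
#   response = client.chat(**build_chat_request("llama3.3:70b", transcript_text))
#   print(response["message"]["content"])
```

Keeping the request assembly separate from the network call makes it easy to swap models from the tables below without touching the prompt.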


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    ## Recommended Models for Thematic Analysis

    ### Primary Recommendation: `llama3.3:70b`

    | Aspect | Value |
    |--------|-------|
    | **Context Window** | 128K tokens |
    | **VRAM Usage** | ~45GB |
    | **Architecture** | Dense (70B always active) |
    | **Strengths** | Excellent instruction following, proven reliability, great for long documents |

    ### Alternatives

    | Model | Context | VRAM | Best For |
    |-------|---------|------|----------|
    | `qwen3:30b` | 256K | ~19GB | Fast iteration, huge context window |
    | `qwen3:32b` | 40K | ~20GB | Balance of speed and quality |
    | `qwen3:235b` | 256K | ~142GB (needs quantization) | Maximum quality (MoE: 22B active) |
    | `deepseek-r1:70b` | 64K | ~45GB | Reasoning transparency (shows thinking) |
    """)
    return
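The trade-offs in the tables above can be filtered programmatically. A minimal sketch with the tables' figures hard-coded (the dict layout and helper name are illustrative, not a real API):

```python
# Approximate figures from the tables above (VRAM at default quantization).
MODELS = {
    "llama3.3:70b":    {"context_k": 128, "vram_gb": 45},
    "qwen3:30b":       {"context_k": 256, "vram_gb": 19},
    "qwen3:32b":       {"context_k": 40,  "vram_gb": 20},
    "qwen3:235b":      {"context_k": 256, "vram_gb": 142},
    "deepseek-r1:70b": {"context_k": 64,  "vram_gb": 45},
}

def candidates(vram_budget_gb: int = 80, min_context_k: int = 32) -> list[str]:
    """Models that fit the GPU's VRAM budget and cover the needed context window."""
    return [name for name, spec in MODELS.items()
            if spec["vram_gb"] <= vram_budget_gb and spec["context_k"] >= min_context_k]

print(candidates())  # qwen3:235b drops out: ~142GB exceeds the H100's 80GB
```

Raising `min_context_k` shows why context rarely decides here: every model that fits the 80GB budget already clears the ~10K tokens a transcript needs.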


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    ## Context Window Considerations

    ### For 1-Hour Interview Transcripts

    - **Estimated size**: ~8,000-10,000 tokens
    - **Requirement**: any model with a 32K+ context window is sufficient
    - **Recommendation**: `llama3.3:70b` (128K) handles full transcripts easily

    ### When Larger Context Helps ✅

    - Full document fits without chunking
    - Model can connect themes across the entire transcript
    - Simpler preprocessing pipeline

    ### When Larger Context Can Hurt ⚠️

    | Issue | Explanation |
    |-------|-------------|
    | **"Lost in the middle"** | LLMs attend most to the beginning and end of the context, losing track of the middle |
    | **Slower inference** | Attention cost scales quadratically with sequence length |
    | **Diluted attention** | Key information gets drowned out by less relevant content |

    ### Key Insight

    Research shows models often perform **worse** with very long contexts than with strategically selected shorter contexts. For ~10K-token transcripts, **context window size doesn't matter** — choose based on model quality and speed.
    """)
    return
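The ~8,000-10,000 token estimate above can be sanity-checked with the common rule of thumb of roughly 4 characters per token for English prose. A sketch of the heuristic (not a real tokenizer; exact counts need the model's own tokenizer):

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic for English prose: ~4 characters per token.
    # Use the target model's tokenizer when an exact count matters.
    return len(text) // 4

# A 1-hour interview runs roughly 9,000 words; "word " is 5 characters,
# so the sample below is ~45,000 characters:
transcript = "word " * 9000
print(estimate_tokens(transcript))  # 11250 — the right order of magnitude
```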


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    ## Document Chunking

    ### When You Need Chunking

    | Model Context | 30-min Transcript (~5K tokens) | 1-hour Transcript (~10K tokens) |
    |---------------|-------------------------------|--------------------------------|
    | 4K-8K (7B models) | ⚠️ May need chunking | ❌ Needs chunking |
    | 32K-40K | ✅ Fits | ✅ Fits |
    | 128K+ | ✅ Fits easily | ✅ Fits easily |

    ### Chunking Strategies (if needed)

    1. **By speaker turns** — split at natural conversation boundaries
    2. **By time segments** — 10-15 minute chunks
    3. **By token count** — fixed-size chunks with overlap
    4. **Hierarchical** — summarize chunks, then analyze the summaries
    """)
    return
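Strategies 1 and 3 above can be sketched as follows. The ALL-CAPS speaker-label regex and the chunk/overlap sizes are assumptions about the transcript format, not requirements:

```python
import re

def split_by_speaker(transcript: str) -> list[str]:
    """Strategy 1: split at speaker turns like 'INTERVIEWER:' / 'PARTICIPANT:'."""
    turns = re.split(r"\n(?=[A-Z][A-Z ]+:)", transcript)
    return [t.strip() for t in turns if t.strip()]

def chunk_with_overlap(tokens: list, size: int = 4000, overlap: int = 200) -> list[list]:
    """Strategy 3: fixed-size chunks that overlap, so a theme spanning a
    chunk boundary still appears whole in at least one chunk."""
    step = size - overlap
    return [tokens[i:i + size] for i in range(0, len(tokens), step)]

text = "INTERVIEWER: How did it start?\nPARTICIPANT: Slowly at first.\nINTERVIEWER: Go on."
print(len(split_by_speaker(text)))                              # 3 turns
print([len(c) for c in chunk_with_overlap(list(range(10000)))])  # [4000, 4000, 2400]
```

In practice the two combine: split at speaker turns first, then pack whole turns into chunks up to the token budget so no utterance is cut mid-sentence.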


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    ## Model Comparison Summary

    ```
    Quality: qwen3:235b > llama3.3:70b ≈ qwen3:30b > qwen3:32b
    Speed:   qwen3:30b > qwen3:32b > llama3.3:70b > qwen3:235b
    Context: qwen3:235b (256K) > qwen3:30b (256K) > llama3.3:70b (128K) > qwen3:32b (40K)
    ```

    ### Final Recommendation

    **Use `llama3.3:70b`** for this project:

    - 128K context is more than sufficient for 1-hour transcripts
    - Excellent quality for thematic analysis
    - Well-tested and reliable
    - Good balance of speed and quality on the H100
    """)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    ---

    *Last updated: December 2025*
    """)
    return


if __name__ == "__main__":
    app.run()