architecture clarification

2025-12-03 12:12:23 +01:00
parent b21f402e1e
commit 98202ac3f2


@@ -78,7 +78,9 @@ def _(mo):
 **Goal:** Convert unstructured text into a structured dataset.
-1. **Input:** All 26 Transcripts + `master_codebook.json`.
+This will be a dedicated notebook, and be run per transcript.
+1. **Input:** Transcript + `master_codebook.json`.
 2. **Process:**
 * The LLM analyzes each transcript segment-by-segment.
 * It extracts specific quotes that match a Theme Definition.
@@ -86,8 +88,9 @@ def _(mo):
 * **Granular Sentiment Analysis:** For each quote, the model identifies:
 * **Subject:** The specific topic/object being discussed (e.g., "Login Flow", "Brand Tone").
 * **Sentiment:** Positive / Neutral / Negative.
-3. **Output:** `coded_segments.csv`
+3. **Output:** `<transcript_name>_coded_segments.csv`
 * Columns: `Source_File`, `Speaker`, `Theme`, `Quote`, `Subject`, `Sentiment`, `Context`.
+* Each transcript produces its own CSV-file, which can be reviewed and adjusted before moving to the next stage
 """)
 return
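The per-transcript output described in this change could be written along the following lines. This is a minimal sketch, not code from the notebook: it assumes the LLM step has already returned its extracted quotes, and `CodedSegment` and `write_coded_segments` are hypothetical names. It also assumes `<transcript_name>` means the transcript's file name without its extension. Only the seven column names and the three sentiment labels come from the notebook text.

```python
import csv
from dataclasses import asdict, dataclass, fields
from pathlib import Path

# Sentiment labels listed in the notebook text.
SENTIMENTS = {"Positive", "Neutral", "Negative"}


@dataclass
class CodedSegment:
    """One extracted quote (hypothetical type); field names match the CSV columns."""
    Source_File: str
    Speaker: str
    Theme: str
    Quote: str
    Subject: str
    Sentiment: str
    Context: str

    def __post_init__(self) -> None:
        # Reject labels outside Positive / Neutral / Negative early,
        # so a bad LLM response fails before anything is written.
        if self.Sentiment not in SENTIMENTS:
            raise ValueError(f"unexpected sentiment: {self.Sentiment!r}")


# Column order follows the dataclass field order above.
COLUMNS = [f.name for f in fields(CodedSegment)]


def write_coded_segments(transcript: Path, segments: list[CodedSegment], out_dir: Path) -> Path:
    """Write one CSV per transcript (hypothetical helper).

    Assumption: <transcript_name> is the transcript file's stem,
    e.g. "interview_01.txt" -> "interview_01_coded_segments.csv".
    """
    out_path = out_dir / f"{transcript.stem}_coded_segments.csv"
    with out_path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=COLUMNS)
        writer.writeheader()
        for seg in segments:
            writer.writerow(asdict(seg))
    return out_path
```

Because each call writes a separate file, a transcript's CSV can be opened, reviewed, and corrected on its own before the next stage, which matches the per-transcript workflow the updated cell describes.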