architecture clarification
@@ -78,7 +78,9 @@ def _(mo):

 **Goal:** Convert unstructured text into a structured dataset.

-1. **Input:** All 26 Transcripts + `master_codebook.json`.
+This will be a dedicated notebook, and be run per transcript.
+
+1. **Input:** Transcript + `master_codebook.json`.
 2. **Process:**
 * The LLM analyzes each transcript segment-by-segment.
 * It extracts specific quotes that match a Theme Definition.
@@ -86,8 +88,9 @@ def _(mo):
 * **Granular Sentiment Analysis:** For each quote, the model identifies:
 * **Subject:** The specific topic/object being discussed (e.g., "Login Flow", "Brand Tone").
 * **Sentiment:** Positive / Neutral / Negative.
-3. **Output:** `coded_segments.csv`
+3. **Output:** `<transcript_name>_coded_segments.csv`
 * Columns: `Source_File`, `Speaker`, `Theme`, `Quote`, `Subject`, `Sentiment`, `Context`.
+* Each transcript produces its own CSV-file, which can be reviewed and adjusted before moving to the next stage
 """)
 return

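The per-transcript output described in this change could be written roughly as follows. This is only a minimal sketch, not the notebook's actual code: the helper names `output_path_for` and `write_coded_segments`, and the assumption that the LLM coding step hands back a list of dicts, are hypothetical. Only the column names, the three sentiment values, and the `<transcript_name>_coded_segments.csv` naming pattern come from the text of the commit.

```python
import csv
from pathlib import Path

# Column order taken from the notebook text in the diff.
COLUMNS = ["Source_File", "Speaker", "Theme", "Quote", "Subject", "Sentiment", "Context"]
ALLOWED_SENTIMENTS = {"Positive", "Neutral", "Negative"}


def output_path_for(transcript_path, out_dir):
    """Build `<transcript_name>_coded_segments.csv` from the transcript's file stem."""
    stem = Path(transcript_path).stem
    return Path(out_dir) / f"{stem}_coded_segments.csv"


def write_coded_segments(transcript_path, coded_rows, out_dir="."):
    """Write one CSV for one transcript and return its path.

    `coded_rows` is assumed to be a list of dicts already produced by the
    LLM coding step (hypothetical shape; the real notebook may differ).
    """
    # Validate before opening the file so a bad row never leaves a partial CSV.
    for row in coded_rows:
        if row["Sentiment"] not in ALLOWED_SENTIMENTS:
            raise ValueError(f"Unexpected sentiment: {row['Sentiment']!r}")

    out_path = output_path_for(transcript_path, out_dir)
    source_name = Path(transcript_path).name
    with open(out_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=COLUMNS)
        writer.writeheader()
        for row in coded_rows:
            # Source_File is set here so every row points back to its transcript.
            writer.writerow({**row, "Source_File": source_name})
    return out_path
```

Because each transcript gets its own file, a reviewer can open and correct a single CSV before the next stage reads it, which seems to be the motivation for moving from one combined `coded_segments.csv` to per-transcript outputs.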