Added functionality to load keywords from Excel file

2025-12-16 22:25:12 -08:00
parent e81961b819
commit e90b41f648
4 changed files with 181 additions and 91 deletions


```diff
@@ -48,38 +48,39 @@ def ollama_keyword_extraction(content, tag, client: Client, model) -> list:
     """
     # Construct prompt for Ollama model
-    prompt = f"""
-### Role
-You are a qualitative data analyst. Your task is to extract keywords from a user quote to build a semantic word cluster.
+    # Prompt optimized for small models (Llama 3.2):
+    # - Fewer rules, prioritized by importance
+    # - Explicit verbatim instruction (prevents truncation errors)
+    # - Examples that reinforce exact copying
+    # - Positive framing (do X) instead of negative (don't do Y)
+    # - Minimal formatting overhead
+    prompt = f"""Extract keywords from interview quotes for thematic analysis.
-### Guidelines
-1. **Quantity:** Extract **1-5** high-value keywords. If the quote only contains 1 valid insight, return only 1 keyword. Do not force extra words.
-2. **Specificity:** Avoid vague, single nouns (e.g., "tech", "choice", "system"). Instead, capture the descriptor (e.g., "tech-forward", "payment choice", "legacy system").
-3. **Adjectives:** Standalone adjectives are acceptable if they are strong descriptors (e.g., "reliable", "trustworthy", "professional").
-4. **Normalize:** Convert verbs to present tense and nouns to singular.
-5. **Output Format:** Return a single JSON object with the key "keywords" containing a list of strings.
+RULES (in priority order):
+1. Extract only keywords RELEVANT to the given context. Ignore off-topic content. Do NOT invent keywords.
+2. Use words from the quote, but generalize for clustering (e.g., "not youthful" → "traditional").
+3. Extract 1-5 keywords or short phrases that capture key themes.
+4. Prefer descriptive phrases over vague single words (e.g., "tech forward" not "tech").
-### Examples
+EXAMPLES:
-**Input Context:** Chase as a Brand
-**Input Quote:** "I would describe it as, you know, like the next big thing, like, you know, tech forward, you know, customer service forward, and just hating that availability."
-**Output:** {{ "keywords": ["tech forward", "customer service focused", "availability"] }}
+Context: Chase as a Brand
+Quote: "It's definitely not, like, youthful or trendy."
+Output: {{"keywords": ["traditional", "established"]}}
-**Input Context:** App Usability
-**Input Quote:** "There are so many options when I try to pay, it's confusing."
-**Output:** {{ "keywords": ["confusing", "payment options"] }}
+Context: App Usability
+Quote: "There are so many options when I try to pay, it's confusing."
+Output: {{"keywords": ["confusing", "overwhelming options"]}}
-**Input Context:** Investment Tools
-**Input Quote:** "It is just really reliable."
-**Output:** {{ "keywords": ["reliable"] }}
+Context: Brand Perception
+Quote: "I would say reliable, trustworthy, kind of old-school."
+Output: {{"keywords": ["reliable", "trustworthy", "old-school"]}}
-### Input Data
-**Context/Theme:** {tag}
-**Quote:** "{content}"
+NOW EXTRACT KEYWORDS:
-### Output
-```json
-"""
+Context: {tag}
+Quote: "{content}"
+Output:"""
     max_retries = 3
     for attempt in range(max_retries):
```
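The hunk stops at the head of the retry loop (`max_retries = 3`), so the code that sends the prompt and reads the reply is not shown. The sketch below is a hypothetical illustration under stated assumptions, not the code from this commit. `build_prompt`, `parse_keywords`, `extract_with_retries` and the `generate` callable are invented names, and `generate` stands in for whatever call the real function makes through its Ollama `client`. The sketch fills an abbreviated copy of the new template. Literal JSON braces are doubled as `{{ }}` because the template is an f-string. It then accepts a reply only if it contains a JSON object whose `"keywords"` value is a non-empty list of strings, trims that list to the 5 keywords the prompt allows, and retries otherwise. Returning an empty list once retries run out is also an assumption.

```python
import json
from typing import Callable, List, Optional


def build_prompt(tag: str, content: str) -> str:
    """Fill an abbreviated copy of the new few-shot template (hypothetical helper)."""
    # Literal JSON braces are doubled ({{ }}) because this is an f-string;
    # only {tag} and {content} are substituted.
    return f"""Extract keywords from interview quotes for thematic analysis.
RULES (in priority order):
1. Extract only keywords RELEVANT to the given context. Ignore off-topic content. Do NOT invent keywords.
2. Extract 1-5 keywords or short phrases that capture key themes.
EXAMPLES:
Context: Brand Perception
Quote: "I would say reliable, trustworthy, kind of old-school."
Output: {{"keywords": ["reliable", "trustworthy", "old-school"]}}
NOW EXTRACT KEYWORDS:
Context: {tag}
Quote: "{content}"
Output:"""


def parse_keywords(raw: str, max_keywords: int = 5) -> Optional[List[str]]:
    """Return up to max_keywords keyword strings from a model reply, or None."""
    # Small models may wrap the JSON in prose or a code fence, so take the
    # text from the first "{" to the last "}" and try to decode that.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        data = json.loads(raw[start : end + 1])
    except json.JSONDecodeError:
        return None
    keywords = data.get("keywords") if isinstance(data, dict) else None
    if not isinstance(keywords, list):
        return None
    # Keep non-empty strings only, stripped and de-duplicated in order.
    cleaned: List[str] = []
    for kw in keywords:
        if isinstance(kw, str) and kw.strip() and kw.strip() not in cleaned:
            cleaned.append(kw.strip())
    return cleaned[:max_keywords] or None


def extract_with_retries(
    generate: Callable[[str], str], tag: str, content: str, max_retries: int = 3
) -> List[str]:
    """Ask the model until a reply parses, up to max_retries attempts."""
    prompt = build_prompt(tag, content)
    for _attempt in range(max_retries):
        keywords = parse_keywords(generate(prompt))
        if keywords is not None:
            return keywords
    # Assumption: give up with an empty list instead of raising.
    return []


if __name__ == "__main__":
    # Stub standing in for the Ollama call: the first reply contains no JSON,
    # the second wraps valid JSON in a code fence.
    replies = iter([
        "Sure! Here are the keywords:",
        '```json\n{"keywords": ["reliable", "trustworthy"]}\n```',
    ])
    print(extract_with_retries(lambda prompt: next(replies), "Brand Perception",
                               "I would say reliable, trustworthy."))
    # prints ['reliable', 'trustworthy']
```

The extraction step uses the first `{` and the last `}` rather than decoding the whole reply. This tolerates the preamble text or ```` ```json ```` fence that small models often add, which the old prompt's trailing ```` ```json ```` line invited. A reply that still fails to decode simply counts as a failed attempt.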