How to Use LLM Orchestration

This guide covers the AI-assisted workflow for analyzing experiments.

Overview

LLM Orchestration is available in API-enhanced mode (requires Anthropic API key). The system analyzes the experiment and recommends processing strategies for each document.

Prerequisites

  • Anthropic API key configured in Settings (a quick sanity check appears below)
  • An experiment with associated documents
  • Documents with text content already extracted
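
Before starting a large run, it can help to confirm the key itself works. The snippet below is a minimal sanity check using the official anthropic Python SDK, run outside the app; it assumes the key is exported as an environment variable, and the model name is only an example.

```python
# Minimal sanity check that an Anthropic API key is valid.
# Assumes the key is exported as ANTHROPIC_API_KEY; the model
# name is an example and may need updating.
import os
import anthropic

client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
reply = client.messages.create(
    model="claude-3-5-haiku-latest",
    max_tokens=16,
    messages=[{"role": "user", "content": "ping"}],
)
print(reply.content[0].text)
```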

The 5-Stage Workflow

LLM Orchestration follows a structured workflow with human review at each decision point:

Stage 1: Analyze

The LLM examines the experiment to understand:

  • Research goals and scope
  • Document characteristics (length, format, historical period)
  • Focus terms and their domains
  • Temporal range of the corpus

Stage 2: Recommend

Based on the analysis, the system recommends:

  • Which tools to apply to each document
  • Processing order and dependencies
  • Confidence scores for each recommendation
  • Rationale explaining the choices

Example recommendations:

Document                  Recommended Tools                             Confidence
Historical paper (1910)   Entity extraction, Definition extraction      0.92
Modern technical paper    Semantic segmentation, Entity extraction      0.88
Legal dictionary entry    Definition extraction, Temporal extraction    0.95
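
Each row can be pictured as a small structured record. The field names below are illustrative, not the application's actual schema:

```python
# Illustrative shape of a single Stage 2 recommendation.
# Field names are hypothetical, not the app's actual schema.
recommendation = {
    "document": "Historical paper (1910)",
    "tools": ["entity_extraction", "definition_extraction"],
    "confidence": 0.92,
    "rationale": (
        "Pre-1950 text with domain terminology; entity and "
        "definition extraction suit the research goals."
    ),
}
```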

Stage 3: Review

Recommendations are reviewed before execution:

  • Approve recommendations as-is
  • Modify tool selections for specific documents
  • Add processing notes
  • Reject and request re-analysis

All LLM recommendations require human approval before execution.

[Screenshot: Orchestration Review]

Stage 4: Execute

After approval, the system processes documents:

  • Tools run using local NLP libraries (spaCy, NLTK, sentence-transformers)
  • Progress tracked in real-time
  • Results stored as ProcessingArtifacts with PROV-O provenance (see the sketch below)
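
A ProcessingArtifact's provenance follows the W3C PROV-O vocabulary. As a rough sketch of what such a record can look like in JSON-LD form (the identifiers and exact fields are illustrative; only the prov: terms come from PROV-O itself):

```python
# Minimal PROV-O provenance record in JSON-LD form.
# Identifiers and field selection are illustrative; the prov:
# terms themselves are from the W3C PROV-O vocabulary.
artifact_provenance = {
    "@context": {"prov": "http://www.w3.org/ns/prov#"},
    "@id": "artifact:entities-doc-42",
    "@type": "prov:Entity",
    "prov:wasGeneratedBy": {
        "@id": "activity:entity-extraction-run-7",
        "@type": "prov:Activity",
        "prov:used": {"@id": "document:42"},
        "prov:wasAssociatedWith": {"@id": "agent:spacy-pipeline"},
    },
}
```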

Available Processing Tools:

  • Entity Extraction (spaCy): Named entities (PERSON, ORG, GPE) + noun phrase concepts (see the sketch after this list)
  • Temporal Extraction (spaCy + regex): Dates, periods, historical markers, relative expressions
  • Definition Extraction (pattern matching):
      ◦ Pattern matching for 8 definition types (explicit, copula, acronym, appositive, etc.)
      ◦ Strict acronym validation requiring first-letter matching
      ◦ Quality filters to reject citations, reference lists, and nonsense patterns
  • Text Segmentation: Structure-aware document splitting
  • Embedding Generation (sentence-transformers): Period-aware semantic vectors
  • LLM Text Cleanup (Claude): Modernize OCR errors while preserving historical terminology
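
As a concrete example of the first tool, the sketch below mirrors the kind of extraction spaCy performs: named entities plus noun-phrase concepts. It is an illustration, not the app's exact pipeline, and it assumes the small English model is installed (python -m spacy download en_core_web_sm).

```python
# Sketch of spaCy-based entity extraction: named entities
# plus noun-phrase concepts, as the Entity Extraction tool does.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("In 1905, the Royal Society met in London to discuss wireless telegraphy.")

entities = [(ent.text, ent.label_) for ent in doc.ents
            if ent.label_ in {"PERSON", "ORG", "GPE"}]
concepts = [chunk.text for chunk in doc.noun_chunks]

print(entities)  # e.g. [('the Royal Society', 'ORG'), ('London', 'GPE')]
print(concepts)  # noun phrases such as 'wireless telegraphy'
```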

Stage 5: Synthesize

The LLM analyzes results across all documents:

  • Identifies patterns and themes
  • Generates term cards with frequency data (see the sketch below)
  • Organizes findings by temporal period
  • Does not interpret results; interpretation is left to the researcher
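
A term card can be pictured as a small record of where and when a term appears across the corpus. The field names below are hypothetical, not the app's actual schema:

```python
# Illustrative shape of a synthesized term card; field names
# are hypothetical, not the app's actual schema.
term_card = {
    "term": "wireless telegraphy",
    "frequency": 14,                      # occurrences across the corpus
    "documents": ["doc:12", "doc:17"],    # documents containing the term
    "periods": {"1900-1919": 9, "1920-1939": 5},
    "note": "",  # left for the researcher; the LLM does not interpret
}
```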

Accessing LLM Orchestration

  1. Go to Experiments and select an experiment
  2. Click Document Pipeline
  3. Select LLM mode (toggle at top of page)
  4. Click LLM Analyze to begin Stage 1

Workflow States

State               Description
not_started         Orchestration not yet initiated
analyzing           Stage 1 in progress
awaiting_approval   Recommendations ready for review
executing           Processing documents
synthesizing        Generating cross-document insights
completed           All stages finished
error               Processing encountered an error
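
For anyone working with the workflow programmatically, the states map naturally onto an enumeration. This is a sketch; the application's actual representation may differ.

```python
# Sketch of the workflow states as a Python enum; the app's
# actual representation may differ.
from enum import Enum

class OrchestrationState(str, Enum):
    NOT_STARTED = "not_started"
    ANALYZING = "analyzing"
    AWAITING_APPROVAL = "awaiting_approval"
    EXECUTING = "executing"
    SYNTHESIZING = "synthesizing"
    COMPLETED = "completed"
    ERROR = "error"
```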

Manual Alternative

To process documents without LLM orchestration:

  1. Go to Document Pipeline in the experiment
  2. Select documents manually
  3. Choose processing operations
  4. Click Run Selected Tools

Manual selections are recorded with the same PROV-O provenance structure.

When to Use LLM Orchestration

  • Large document collections (10+ documents)
  • Mixed document types requiring different tools
  • A need for AI-generated synthesis of patterns across documents

When to Process Manually

  • Small experiments (< 5 documents)
  • When specific tools are already determined
  • When API costs are a concern

Troubleshooting

Orchestration stuck

  • Check that the Celery worker is running (see the sketch below)
  • Verify the API key is valid and has remaining quota
  • Review application logs for errors
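
One way to check the worker from Python is Celery's control API, which pings any running workers. The import path below is a placeholder for the project's actual Celery application.

```python
# Ping running Celery workers. The import path is a placeholder
# for the project's actual Celery application module.
from myproject.celery import app  # hypothetical module path

replies = app.control.ping(timeout=2.0)
print(replies or "No workers responded - is the Celery worker running?")
```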

Recommendations seem wrong

  • Ensure documents have metadata (especially publication date)
  • Check that document text was extracted correctly
  • Try providing a more specific experiment description