Case Study: A Simple LLM Harness

Document Exploration Through Code Execution

An open-source agent loop that lets LLMs explore documents by writing Python code — no vector database, no chunking, no retrieval pipeline. The model reads documents directly, cites its sources, and the system verifies every citation.

Try It

This isn't a demo in a screenshot. The assistant on this site's homepage runs on the harness — ask it anything about AppSimple and watch it explore, cite, and answer in real time.

Want to try it with your own documents? Upload them on the Document Explorer and see how the agent handles your content.

The Bet

Most document Q&A systems use RAG: chunk documents, embed them in a vector database, retrieve the closest chunks, and pass them to the model. It works, but the model never actually reads the documents. It sees fragments chosen by a similarity algorithm.

The harness takes a different approach. Give the model one tool — run_python — and let it read the documents itself. It writes Python to open files, search for patterns, cross-reference data, and compute answers. The model reasons about the documents rather than pattern-matching against embeddings.
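For a concrete sense of what this looks like, here is a sketch of the kind of exploration code the model might emit. The workspace and filename are hypothetical, created inline so the snippet is self-contained; in the real harness the documents are already mounted in the sandbox.

```python
import re
from pathlib import Path

# Hypothetical setup so the sketch runs standalone; in the real harness
# the workspace files are already present in the sandbox.
ws = Path("workspace")
ws.mkdir(exist_ok=True)
(ws / "report.txt").write_text(
    "Total revenue grew to $4 million in 2023. Costs fell slightly."
)

# Step 1: list the available documents
for path in sorted(ws.glob("*.txt")):
    print(path.name, path.stat().st_size, "bytes")

# Step 2: search the full text for a pattern, rather than relying on
# whichever fragments a similarity search happened to retrieve
text = (ws / "report.txt").read_text()
matches = [m.group(0) for m in re.finditer(r"revenue[^.]*\.", text, re.I)]
print(matches)
```

Because the model sees whole files, it can follow up with further code: widening the search, cross-referencing a second document, or computing an aggregate before answering.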

The trade-off is latency (multiple tool calls vs. one vector search). The payoff is dramatically better reasoning — the model can filter, compare, calculate, and verify in ways that chunked retrieval cannot.

How It Works

The harness is a minimal agent loop: call the LLM, execute its code in a sandboxed container, feed the results back, repeat until the model has an answer.
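The loop above can be sketched in a few lines. This is a simplification under stated assumptions: `call_llm` stands in for the LLM client and `run_in_sandbox` for the sandboxed execution; both names, and the reply shape, are illustrative rather than the harness's actual API.

```python
# Minimal agent-loop sketch. call_llm and run_in_sandbox are hypothetical
# stand-ins for the real LLM client and the sandboxed code executor.
def agent_loop(question, call_llm, run_in_sandbox, max_turns=10):
    messages = [{"role": "user", "content": question}]
    for _ in range(max_turns):
        reply = call_llm(messages)            # may contain text and/or a tool call
        messages.append({"role": "assistant", "content": reply["content"]})
        code = reply.get("tool_call")         # Python the model wants to run
        if code is None:
            return reply["content"]           # no tool call: final answer
        result = run_in_sandbox(code)         # execute in the sandboxed container
        messages.append({"role": "tool", "content": result})
    return "Turn limit reached"
```

The turn cap bounds cost and latency; everything else is just message passing between the model and the sandbox.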

Agent Loop

  • Model calls run_python to read files, search content, and compute results in an E2B cloud sandbox
  • Streaming delivers text deltas in real time via Server-Sent Events — answers type out live
  • Nudge logic ensures the model consults the workspace rather than answering from memory

Citations

  • The model cites evidence inline as [filename: "quoted passage"]
  • Server-side regex extracts citations and verifies each quote against the source file using three-tier fuzzy matching (exact → ellipsis segments → 5-word sliding window)
  • Citations render as clean superscript footnotes with a collapsible source list
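A minimal sketch of that three-tier matcher, assuming whitespace-normalized, case-insensitive comparison; the function names are illustrative, not the harness's actual API.

```python
import re

def _norm(s):
    # Collapse whitespace and lowercase so formatting differences don't
    # fail a legitimate quote (an assumption about the real normalization)
    return re.sub(r"\s+", " ", s).strip().lower()

def verify_quote(quote, source_text):
    src = _norm(source_text)
    q = _norm(quote)
    # Tier 1: exact (whitespace-insensitive) substring match
    if q in src:
        return "exact"
    # Tier 2: quote contains "..." -- every segment must appear, in order
    segments = [_norm(p) for p in re.split(r"\.\.\.|…", quote) if p.strip()]
    if len(segments) > 1:
        pos = 0
        for seg in segments:
            idx = src.find(seg, pos)
            if idx == -1:
                break
            pos = idx + len(seg)
        else:
            return "ellipsis"
    # Tier 3: any 5-word window of the quote found verbatim in the source
    words = q.split()
    for i in range(max(1, len(words) - 4)):
        if " ".join(words[i:i + 5]) in src:
            return "window"
    return None  # unverified citation
```

The cascade degrades gracefully: a lightly paraphrased quote can still be grounded by tier 3, while a fabricated one returns None and is flagged.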

Evaluation

  • Code-based assertions across 10 question categories (e.g. fact lookup, multi-doc synthesis, comparison, trend analysis, cited analysis)
  • Five diverse document workspaces (SEC filings, Federalist Papers, Sherlock Holmes, world data, Darwin)
  • Traces stored for every question — full tool call chain, token counts, citation match rates
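A code-based assertion against a stored trace might look like the following sketch. The trace fields shown are assumptions for illustration, not the harness's actual schema: the idea is to assert on substance and on grounding, not on exact wording.

```python
# Hypothetical eval check for a fact-lookup question; the trace dict
# fields ("final_answer", "citations", "matched") are illustrative.
def check_fact_lookup(trace):
    answer = trace["final_answer"]
    # Assert on the substance of the answer, not its exact phrasing
    assert "4.2" in answer, "wrong or missing figure"
    # Require at least one verified citation backing the claim
    verified = [c for c in trace["citations"] if c["matched"]]
    assert verified, "answer was not grounded in a source file"

trace = {
    "final_answer": "Revenue was $4.2M in 2023.",
    "citations": [{"file": "report.txt", "matched": True}],
}
check_fact_lookup(trace)
print("fact-lookup check passed")
```

Because each trace records the full tool call chain and citation match results, the same pattern extends to harder categories: a synthesis check can assert that more than one file was read, a cited-analysis check that every numeric claim has a verified quote.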

Architecture

The system spans three repos with a shared frontend:

  • Harness library (~1,300 lines Python) — agent loop, citation parsing, telemetry, trace viewer. Imported by both apps.
  • Assistant app — HuggingFace Space (Docker SDK) with persona prompt, curated workspace, daily rate limiting. Powers the chat on this site.
  • Explorer app — HuggingFace Space (Docker SDK) with file upload, server-side sessions, access token auth. Powers the Document Explorer.
  • Shared frontend — vanilla JS/CSS consumed by both apps. Markdown rendering, citation display, SSE streaming.

The harness is model-agnostic (via litellm), sandbox-agnostic (Docker or E2B), and has no opinions about how you display results. Citation processing is the only presentation-adjacent feature, and it returns structured data — the app decides how to render it.
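Model-agnosticism via litellm mostly means the model is just a string. A sketch of what that looks like, with a hypothetical `make_llm` wrapper (litellm's `completion(model=..., messages=...)` is the real entry point; the wrapper is an assumption):

```python
# Sketch: bind a provider/model string once, get back a plain callable.
# Swapping providers is then a config change, not a code change.
def make_llm(model, completion_fn):
    """Return a call_llm function bound to one litellm model string."""
    def call_llm(messages):
        resp = completion_fn(model=model, messages=messages)
        return resp["choices"][0]["message"]["content"]
    return call_llm

# In the real harness, roughly:
#   from litellm import completion
#   call_llm = make_llm("gpt-4o", completion)
# Any litellm-supported model string works in place of "gpt-4o".
```

The sandbox backend is swapped the same way: the loop only needs a "run this code, give me the output" callable, which Docker or E2B can each provide.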

Results

  • 99% citation match rate across 20 eval questions
  • 19 workspace files (docs + website pages)
  • ~7s average response time with streaming
  • Live on appsimple.io — the assistant on the homepage uses the harness to answer questions about Charles's work, citing curated documents and raw website pages
  • Open source — the harness library is publicly available for anyone to use with any LLM provider
  • Viable RAG alternative — demonstrates that code-as-tool preserves full reasoning capability while providing verifiable, cited answers

Explore the Project

Try the assistant, upload your own documents, or dig into the source code.