Potato: The Portable Annotation Tool

Potato is a free, self-hosted annotation platform for NLP, Agentic, and GenAI research. Annotate text, audio, video, images, documents, agent traces, and more — configured entirely through YAML. No coding required.

Try the live demo on HuggingFace Spaces — no installation needed.

Quick Start

pip install potato-annotation

# List available templates
potato list all

# Get a template and start annotating
potato get sentiment_analysis
potato start sentiment_analysis

Or run from source:

git clone https://github.com/davidjurgens/potato.git
cd potato && pip install -r requirements.txt
python potato/flask_server.py start examples/classification/single-choice/config.yaml -p 8000

Open http://localhost:8000 and start annotating.

What Can You Annotate?

Potato handles the full spectrum of annotation tasks — from traditional NLP labeling to evaluating the latest AI agent systems.

Data Types

Modality	Capabilities
Text	Classification, span labeling, entity linking, coreference, pairwise comparison (docs)
Agent Traces	Step-by-step evaluation of LLM agents, tool calls, ReAct chains, and multi-agent systems (docs)
Web Agents	Screenshot-based review with SVG click/scroll overlays, or live browsing with automatic trace recording (docs)
RAG Pipelines	Retrieval relevance, answer faithfulness, citation accuracy, hallucination detection
Audio	Waveform visualization, segment labeling, ELAN-style tiered annotation (docs)
Video	Frame-by-frame labeling, temporal segments, playback sync (docs)
Images	Bounding boxes, polygons, landmarks, classification (docs)
Dialogue	Turn-level annotation, conversation trees, interactive chat evaluation
Documents	PDF, Word, Markdown, code, and spreadsheets with coordinate mapping (docs)

Annotation Schemes

Scheme	Use Case
Radio / Checkbox / Likert	Classification, multi-label, rating scales
Span annotation	NER, highlighting, hallucination marking
Pairwise comparison	A/B testing, best-worst scaling
Per-step ratings	Evaluate individual agent actions or dialogue turns
Free text	Open-ended responses with validation
Triage	Rapid accept/reject/skip curation (docs)
Conditional logic	Adaptive forms that respond to prior answers (docs)

Agent & LLM Evaluation

Potato provides purpose-built tooling for evaluating AI agents at every level of granularity.

Trace Formats

Import traces from any major agent framework with the built-in converter:

python -m potato.trace_converter --input traces.json --input-format openai --output data.jsonl

Supported formats: OpenAI, Anthropic/Claude, ReAct, LangChain, LangFuse, WebArena, SWE-bench, OpenTelemetry, CrewAI/AutoGen/LangGraph, MCP, and more. Auto-detection is available with --auto-detect.

Evaluation Levels

Level	What You Annotate	Example
Trajectory	Overall task success, efficiency, safety	"Did the agent complete the task?"
Step	Individual action correctness, reasoning quality	Per-turn Likert ratings on each agent step
Span	Specific text segments within agent output	Highlight hallucinated claims, factual errors
Comparison	Side-by-side A/B agent evaluation	"Which agent performed better?"

Web Agent Viewer

An interactive viewer for GUI agent traces — navigate step-by-step through screenshots with SVG overlays showing clicks, bounding boxes, mouse paths, and scroll actions. Annotators rate each step with inline controls while a filmstrip bar provides quick navigation.

Ready-to-Use Agent Examples

Example	What It Evaluates
agent-trace-evaluation	Text agent traces with MAST error taxonomy + hallucination spans
visual-agent-evaluation	GUI agents with screenshot grounding accuracy
agent-comparison	Side-by-side A/B agent comparison
rag-evaluation	RAG retrieval relevance and citation accuracy
openai-evaluation	OpenAI Chat API traces with tool calls
anthropic-evaluation	Claude messages with tool_use blocks
swebench-evaluation	Coding agents with patch correctness ratings
multi-agent-evaluation	Multi-agent coordination (CrewAI, AutoGen, LangGraph)
web-agent-review	Pre-recorded web traces with step-by-step overlay viewer
web-agent-creation	Live web browsing with automatic trace recording

AI-Powered Annotation

LLM Label Suggestions

Integrate any LLM provider to pre-annotate instances and suggest labels. Annotators review and correct — dramatically faster than labeling from scratch.

Supported backends: OpenAI, Anthropic, Ollama, vLLM, Gemini, HuggingFace, OpenRouter

Active Learning

Potato reorders your annotation queue based on model uncertainty so annotators label the most informative instances first. Supports uncertainty sampling, BADGE, BALD, diversity, and hybrid strategies (docs).

Solo Mode

A human-LLM collaborative workflow where the system learns from annotator feedback and progressively transitions to autonomous LLM labeling as agreement improves (docs).

Chat Assistant

An LLM-powered sidebar where annotators can ask questions about difficult instances. The AI provides guidance informed by your task description and annotation guidelines — helping annotators think through decisions without auto-labeling (docs).

Quality Control & Workflows

Quality Assurance

Feature	Description
Attention checks	Automatically inserted known-answer items to verify engagement
Gold standards	Track annotator accuracy against expert labels
Inter-annotator agreement	Built-in Krippendorff's alpha and Cohen's kappa
Training phase	Practice annotations with feedback before the real task
Behavioral tracking	Timing, click patterns, and annotation change history

Annotation Workflows

Workflow	Description
Multi-annotator	Multiple annotators per item with overlap control and agreement metrics
Adjudication	Expert review of annotator disagreements to produce gold labels (docs)
Solo mode	Human-LLM collaboration with progressive automation (docs)
Crowdsourcing	Prolific and MTurk integration with platform-specific auth (docs)
Triage	Rapid accept/reject/skip for data curation (docs)

Authentication & Deployment

Potato supports multiple authentication methods, from passwordless quick-start to enterprise SSO:

Method	Use Case
In-memory	Local development, quick studies
Password + file persistence	Team annotation with shared credential files (docs)
Database	Production deployments with SQLite or PostgreSQL (docs)
OAuth / SSO	Google, GitHub, or institutional OIDC login (docs)
Passwordless	Low-stakes tasks where ease of access matters (docs)

Passwords are hashed with per-user PBKDF2-SHA256 salts. Admins can reset passwords via CLI (potato reset-password) or REST API. Self-service token-based reset is also available.

Example Projects

Ready-to-use templates organized by type in examples/:

Category	Examples
Classification	Radio, checkbox, Likert, slider, pairwise comparison
Span	NER, span linking, coreference, entity linking
Agent Traces	LLM agents, web agents, RAG, multi-agent, code agents
Audio	Waveform annotation, classification, ELAN-style tiered
Video	Frame-level labeling, temporal segments
Image	Bounding boxes, PDF/document annotation
Advanced	Solo mode, adjudication, quality control, conditional logic
AI-Assisted	LLM suggestions, Ollama integration
Custom Layouts	Content moderation, dialogue QA, medical review

Research Showcase

The Potato Showcase contains annotation projects from published research — sentiment analysis, dialogue evaluation, summarization, and more.

potato list all          # Browse available projects
potato get <project>     # Download one

Documentation

Topic	Link
Quick Start	docs/quick-start.md
Configuration Reference	docs/configuration.md
Schema Gallery	docs/schemas_and_templates.md
Agent Trace Evaluation	docs/agent_traces.md
Web Agent Annotation	docs/web_agent_annotation.md
AI Support	docs/ai_support.md
Active Learning	docs/active_learning_guide.md
Solo Mode	docs/solo_mode.md
Quality Control	docs/quality_control.md
Password Management	docs/password_management.md
SSO & OAuth	docs/sso_authentication.md
Admin Dashboard	docs/admin_dashboard.md
Crowdsourcing	docs/crowdsourcing.md
Export Formats	docs/export_formats.md
Full Documentation Index	docs/index.md

Development

# Run tests
pytest tests/ -v

# By category
pytest tests/unit/ -v        # Unit tests (fast)
pytest tests/server/ -v      # Integration tests
pytest tests/selenium/ -v    # Browser tests

# With coverage
pytest --cov=potato --cov-report=html

Support

Issues: GitHub Issues
Questions: jurgens@umich.edu
Docs: potatoannotator.readthedocs.io

License

Potato is licensed under Polyform Shield. Non-commercial applications can use Potato however they want. Commercial applications can use Potato to annotate all they want, but cannot integrate Potato into a commercial product.

License FAQ

Use Case	Allowed?
Academic research	Yes
Company annotation	Yes
Fork for personal development	Yes
Integration in open-source pipelines	Yes
Commercial annotation service	Contact us
Competing annotation platform	Contact us

Citation

@inproceedings{pei2022potato,
  title={POTATO: The Portable Text Annotation Tool},
  author={Pei, Jiaxin and Ananthasubramaniam, Aparna and Wang, Xingyao and Zhou, Naitian and Dedeloudis, Apostolos and Sargent, Jackson and Jurgens, David},
  booktitle={Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: System Demonstrations},
  year={2022}
}

Name		Name	Last commit message	Last commit date
Latest commit History 847 Commits
.github/workflows		.github/workflows
demo		demo
deployment/huggingface-spaces		deployment/huggingface-spaces
docs		docs
examples		examples
node		node
potato		potato
scripts		scripts
tests		tests
.gitignore		.gitignore
.readthedocs.yaml		.readthedocs.yaml
CHANGELOG.md		CHANGELOG.md
LICENSE		LICENSE
MANIFEST.in		MANIFEST.in
MIGRATION.md		MIGRATION.md
README.md		README.md
SECURITY.md		SECURITY.md
mkdocs.yml		mkdocs.yml
potato_logo_dark.jpg		potato_logo_dark.jpg
potato_logo_light.jpg		potato_logo_light.jpg
pytest.ini		pytest.ini
requirements-test.txt		requirements-test.txt
requirements.txt		requirements.txt
run_tests.py		run_tests.py
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Potato: The Portable Annotation Tool

Quick Start

What Can You Annotate?

Data Types

Annotation Schemes

Agent & LLM Evaluation

Trace Formats

Evaluation Levels

Web Agent Viewer

Ready-to-Use Agent Examples

AI-Powered Annotation

LLM Label Suggestions

Active Learning

Solo Mode

Chat Assistant

Quality Control & Workflows

Quality Assurance

Annotation Workflows

Authentication & Deployment

Example Projects

Research Showcase

Documentation

Development

Support

License

Citation

About

Uh oh!

Releases 11

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Potato: The Portable Annotation Tool

Quick Start

What Can You Annotate?

Data Types

Annotation Schemes

Agent & LLM Evaluation

Trace Formats

Evaluation Levels

Web Agent Viewer

Ready-to-Use Agent Examples

AI-Powered Annotation

LLM Label Suggestions

Active Learning

Solo Mode

Chat Assistant

Quality Control & Workflows

Quality Assurance

Annotation Workflows

Authentication & Deployment

Example Projects

Research Showcase

Documentation

Development

Support

License

Citation

About

Topics

Resources

License

Security policy

Uh oh!

Stars

Watchers

Forks

Releases 11

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages