What is TOON? A Complete Guide to Token-Oriented Object Notation
In the era of Large Language Models, every token counts. If you're building AI chatbots, data analytics tools, or LLM-powered applications, you've probably noticed that token costs add up fast. That's the problem TOON was designed to solve.
What is TOON Format?
TOON (Token-Oriented Object Notation) is a compact, human-readable data format specifically designed for Large Language Models. Unlike JSON, which was created for web APIs in the early 2000s, TOON was built for the age of AI.
The core issue TOON addresses is redundancy in JSON. When you have an array of 1000 user objects, field names like "id", "name", and "role" get repeated 1000 times. That's wasteful, especially when you're paying per token. TOON solves this by declaring field names once and presenting data in a tabular format.
Key Features
30-60% Token Savings
TOON typically reduces token consumption by 30-60% compared to JSON for uniform data structures. This translates to:
- Lower API costs for GPT-4, Claude, and other LLMs
- Faster processing times
- More room in context windows for actual content
Better LLM Accuracy
In benchmark testing of LLM data-retrieval tasks, TOON achieved 73.9% accuracy versus 69.7% for JSON. The explicit length and field headers and the tabular layout appear to help models parse relationships in the data more reliably.
100% JSON Compatible
TOON is fully compatible with the JSON data model. You can convert between formats seamlessly without losing information. All JSON data types are supported: objects, arrays, strings, numbers, booleans, and null values.
The conversion is lossless and bidirectional. Your existing JSON data can be converted to TOON for LLM prompts, and you can convert TOON responses back to JSON for storage or API responses.
Human-Readable Structure
Despite being optimized for tokens, TOON remains intuitive. It combines the best aspects of CSV (for arrays) and YAML (for nested objects), and developers find it easy to read and debug even without prior experience with the format.
How TOON Works: A Simple Example
Let's compare JSON and TOON with a real example:
JSON (126 tokens)
{
  "users": [
    {
      "id": 1,
      "name": "Alice",
      "role": "admin"
    },
    {
      "id": 2,
      "name": "Bob",
      "role": "user"
    }
  ]
}
TOON (49 tokens) - 61% savings
users[2]{id,name,role}:
  1,Alice,admin
  2,Bob,user
What Changed?
- Field declaration: {id,name,role} declares all fields once instead of repeating them per object
- Array length: [2] explicitly states the array size for validation
- Tabular format: data rows contain only values, not property names
- Minimal punctuation: no repetitive brackets, quotes, or commas around keys
The token count drops from 126 to 49. Scale that up to 1000 users, and you're looking at significant cost savings.
When to Use TOON
TOON works best in specific scenarios:
Ideal Use Cases
Uniform arrays of objects: Employee records, user lists, transaction logs. When you have multiple objects with the same fields, TOON's tabular format shines.
Time-series data: Analytics data, sensor readings, stock prices. The CSV-like structure is perfect for temporal data.
API responses: REST API results with consistent structure. If your API returns arrays of similar objects, TOON can cut the token count dramatically.
Database query results: SQL query outputs are already tabular, making them perfect for TOON conversion.
E-commerce catalogs: Product lists with standard fields (id, name, price, category) compress well.
CSV-style datasets: Any data that's naturally tabular benefits from TOON's format.
Less Ideal Use Cases
Deeply nested configurations: Complex config files with varying structures don't benefit as much. JSON might be clearer.
Non-uniform data: Objects with varying field sets lose TOON's main advantage. The tabular format requires consistent fields.
Small datasets: Arrays of fewer than roughly 10 objects won't show much benefit. The overhead of the field declaration isn't worth it.
Pure tabular data: If you're just sending tabular data with no nesting, plain CSV is even more compact.
TOON Syntax Basics
Simple Objects
id: 123
name: Alice
active: true
Just key-value pairs, one per line. Similar to YAML but simpler.
Arrays of Primitives
tags[3]: admin,ops,dev
Declare the length, then comma-separate the values.
Tabular Arrays (Most Efficient)
employees[3]{id,name,department,salary}:
  101,Alice Johnson,Engineering,120000
  102,Bob Smith,Marketing,95000
  103,Carol White,Sales,105000
This is where TOON really shines. The fields are declared once in the header ({id,name,department,salary}), and each row is just the values.
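To make the header-plus-rows idea concrete, here is a minimal sketch of how a uniform array could be encoded by hand. This is illustrative only: encodeTabular is a hypothetical helper, and the official library additionally handles quoting, nesting, and type preservation.

```typescript
// Minimal sketch of tabular TOON encoding for a uniform array of objects.
type Row = Record<string, string | number | boolean | null>;

function encodeTabular(key: string, rows: Row[]): string {
  if (rows.length === 0) return `${key}[0]:`;
  const fields = Object.keys(rows[0]);                 // field names declared once
  const header = `${key}[${rows.length}]{${fields.join(",")}}:`;
  const body = rows.map(r => "  " + fields.map(f => String(r[f])).join(","));
  return [header, ...body].join("\n");
}

const toon = encodeTabular("users", [
  { id: 1, name: "Alice", role: "admin" },
  { id: 2, name: "Bob", role: "user" },
]);
console.log(toon);
// users[2]{id,name,role}:
//   1,Alice,admin
//   2,Bob,user
```

Note that the `[2]` length marker comes for free from the array itself, giving the model (or a validator) a cheap consistency check on the row count.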
Nested Objects
company:
  name: TechCorp
  employees[2]{id,name}:
    1,Alice
    2,Bob
You can nest objects and arrays naturally. Indentation shows the structure.
Using TOON in Production
TOON is production-ready with several implementations:
Official TypeScript/JavaScript SDK
Available on npm as @toon-format/toon. Install it with:
npm install @toon-format/toon
Quick Start Example
import { encode, decode } from '@toon-format/toon';

// JSON to TOON
const data = {
  users: [
    { id: 1, name: "Alice", role: "admin" },
    { id: 2, name: "Bob", role: "user" }
  ]
};

const toonString = encode(data);
// users[2]{id,name,role}:
//   1,Alice,admin
//   2,Bob,user

// TOON to JSON
const jsonData = decode(toonString);
// Back to the original structure
The API is straightforward: encode() converts JSON to TOON, decode() converts TOON back to JSON.
Other Languages
Community libraries for Python, Go, and Rust are in development. The specification is open, so you can implement it yourself if needed.
Real-World Impact
Cost Savings Example
Let's say you process 1 million API requests per month with GPT-4:
- With JSON: ~500M tokens/month at $0.03/1K = $15,000/month
- With TOON: ~200M tokens/month at $0.03/1K = $6,000/month
- Savings: $9,000/month (60% reduction)
For a high-volume application, that's real money.
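The arithmetic above can be checked directly. The token counts and the per-token rate are the example's assumed figures, not measurements:

```typescript
// Back-of-envelope cost model using the assumed figures above.
const pricePerK = 0.03;             // USD per 1K input tokens (assumed rate)
const jsonTokens = 500_000_000;     // monthly tokens with JSON
const toonTokens = 200_000_000;     // monthly tokens with TOON (60% fewer)

const jsonCost = (jsonTokens / 1000) * pricePerK; // 15000
const toonCost = (toonTokens / 1000) * pricePerK; // 6000
console.log(jsonCost - toonCost);                 // 9000
```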
Context Window Optimization
Take an 8K-token context window (the original GPT-4 limit). If your dataset takes 6,000 tokens as JSON:
- JSON: 6,000 tokens for data → 2,000 tokens left for your prompt
- TOON: 2,400 tokens for data → 5,600 tokens left for your prompt
That's 2.8x more space for instructions, examples, and multi-turn conversation history.
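Spelled out with the same example figures (a 60% reduction is assumed; actual savings vary by dataset):

```typescript
// Context-window arithmetic with the example figures above.
const contextWindow = 8000;
const jsonData = 6000;                        // tokens consumed by JSON payload
const toonData = Math.round(jsonData * 0.4);  // assumed 60% fewer tokens → 2400

const leftWithJson = contextWindow - jsonData; // 2000
const leftWithToon = contextWindow - toonData; // 5600
console.log(leftWithToon / leftWithJson);      // 2.8
```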
Getting Started
Here's how to try TOON:
- Use our free online converter: paste your JSON, get TOON instantly with token counts
- Explore the playground: compare TOON with JSON, YAML, and XML side by side
- Read the documentation: complete syntax guide and API reference
- Install the npm package: npm install @toon-format/toon
Common Questions
Is TOON a replacement for JSON?
No. TOON is optimized for LLM input, not general data interchange. Use JSON for APIs and storage, then convert to TOON when sending data to LLMs. Think of it as a transformation layer, not a replacement.
Does TOON work with all LLMs?
Yes. TOON works with GPT-4, Claude, Gemini, LLaMA, and any text-based LLM. The format is intuitive enough that models understand it with minimal prompting. In some cases, you don't even need to explain it—just include it in a code block and the model figures it out.
Can I convert TOON back to JSON?
Absolutely. TOON is 100% lossless and bidirectional. Use decode() to convert TOON back to JSON at any time.
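For intuition, decoding a tabular block can be sketched in a few lines. This is illustrative only: decodeTabular is a hypothetical helper, and the official decode() supports the full format and restores value types, where this sketch leaves every value as a string.

```typescript
// Minimal sketch of decoding a tabular TOON block back to objects.
function decodeTabular(toon: string): Record<string, unknown>[] {
  const [header, ...rows] = toon.trim().split("\n");
  const m = header.match(/^(\w+)\[(\d+)\]\{([^}]+)\}:$/);
  if (!m) throw new Error("not a tabular TOON block");
  const fields = m[3].split(",");
  if (rows.length !== Number(m[2])) throw new Error("length mismatch"); // [N] validates row count
  return rows.map(row => {
    const values = row.trim().split(",");
    return Object.fromEntries(fields.map((f, i) => [f, values[i]]));
  });
}

const users = decodeTabular(`users[2]{id,name,role}:
  1,Alice,admin
  2,Bob,user`);
// [{ id: "1", name: "Alice", role: "admin" },
//  { id: "2", name: "Bob", role: "user" }]
```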
What about performance?
TOON encoding and decoding is fast—typically sub-millisecond for datasets under 1MB. The token savings far outweigh any processing overhead. For most applications, the conversion time is negligible compared to the LLM API call latency.
Do I need to explain TOON to the LLM?
Usually, a brief explanation helps: "Here's data in TOON format (fields declared once in the header)." After that, models understand it well. Some models even recognize it without explanation if you use code blocks.
The Bottom Line
TOON was built to solve a specific problem: reducing token waste when sending structured data to LLMs. It does this by:
- Eliminating field name repetition
- Using a tabular format for arrays
- Minimizing punctuation overhead
- Providing explicit validation markers
The result is typically 30-60% fewer tokens, which means lower costs, better accuracy, and more efficient use of context windows.
If you're working with LLMs and sending structured data in your prompts, TOON is worth trying. Start with our free converter to see the impact on your actual data.
Ready to reduce your LLM costs?
Try our free JSON to TOON converter and see instant token savings