TOON: Token-Oriented Object Notation for Efficient LLM Data Exchange
Introduction
In the rapidly evolving landscape of AI and large language models (LLMs), data efficiency has become a critical factor. Traditional JSON, while ubiquitous, can be verbose and token-intensive when transmitting structured data to LLMs. Enter TOON (Token-Oriented Object Notation) - a compact, human-readable serialization format specifically designed to reduce token usage by 30-60% compared to JSON while maintaining full structural integrity.
This article explores TOON’s capabilities and demonstrates how to implement it within agentic coding workflows, where autonomous AI agents handle data processing and transformation tasks.
What is TOON?
TOON is a specialized data format that combines the best of YAML’s structure with CSV’s tabular efficiency. It’s particularly optimized for uniform arrays of objects - the most common data pattern in LLM interactions. The format achieves its token savings through:
- Explicit length markers: [N] declarations help LLMs validate data completeness
- Tabular representation: Uniform data is presented in CSV-like rows
- Flexible delimiters: Choose commas, tabs, or pipes based on content
- Nested structure support: Indentation-based hierarchy for complex objects
Key Benefits
- 30-60% token reduction compared to formatted JSON
- LLM-friendly structure with explicit validation markers
- Human-readable yet compact representation
- Bidirectional conversion between JSON and TOON
- CLI tools for batch processing and file conversion
TOON Format Structure
Let’s examine TOON’s syntax through examples:
Basic Tabular Data
```toon
users[3]{id,name,role,active}:
  1,Alice,admin,true
  2,Bob,user,true
  3,Charlie,user,false
```
This represents the same data as this JSON:
```json
{
  "users": [
    {"id": 1, "name": "Alice", "role": "admin", "active": true},
    {"id": 2, "name": "Bob", "role": "user", "active": true},
    {"id": 3, "name": "Charlie", "role": "user", "active": false}
  ]
}
```
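To make the mapping concrete, here is a minimal sketch of how such a tabular block can be produced. This is an illustrative simplification, not the official @toon-format/toon encoder; `encodeTabular` is a hypothetical helper that only handles uniform, flat arrays.

```typescript
// Simplified tabular TOON encoding for a uniform array of objects.
// Illustration only -- not the official @toon-format/toon implementation.
function encodeTabular(
  key: string,
  rows: Record<string, unknown>[],
  delimiter = ','
): string {
  // Field names come from the first row; uniformity is assumed.
  const fields = Object.keys(rows[0])
  const header = `${key}[${rows.length}]{${fields.join(',')}}:`
  const body = rows.map(
    row => '  ' + fields.map(f => String(row[f])).join(delimiter)
  )
  return [header, ...body].join('\n')
}

const users = [
  { id: 1, name: 'Alice', role: 'admin', active: true },
  { id: 2, name: 'Bob', role: 'user', active: true },
  { id: 3, name: 'Charlie', role: 'user', active: false }
]
console.log(encodeTabular('users', users))
// users[3]{id,name,role,active}:
//   1,Alice,admin,true
//   2,Bob,user,true
//   3,Charlie,user,false
```

The length marker and field list are emitted once in the header, so per-row cost shrinks to the values plus delimiters.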
Nested Structures
```toon
order:
  id: ORD-123
  customer:
    name: Alice Johnson
    email: alice@example.com
  items[2]{sku,qty,price}:
    A1,2,19.99
    B2,1,29.99
  total: 69.97
```
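For comparison, the nested TOON above corresponds to this JSON:

```json
{
  "order": {
    "id": "ORD-123",
    "customer": {
      "name": "Alice Johnson",
      "email": "alice@example.com"
    },
    "items": [
      { "sku": "A1", "qty": 2, "price": 19.99 },
      { "sku": "B2", "qty": 1, "price": 29.99 }
    ],
    "total": 69.97
  }
}
```

Note how only the uniform `items` array gets the tabular treatment; scalar fields and nested objects fall back to indentation-based key-value lines.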
Agentic Coding Implementation
Agentic coding involves autonomous AI agents that can understand requirements, plan solutions, and execute code generation. TOON integrates seamlessly into these workflows by providing an efficient data exchange format between human instructions, AI processing, and system outputs.
Setting Up TOON in Your Project
First, install the TOON library:
```shell
npm install @toon-format/toon
# or
bun add @toon-format/toon
```
Basic Encoding and Decoding
```typescript
import { encode, decode } from '@toon-format/toon'

// Agent receives data request
const userData = {
  users: [
    { id: 1, name: 'Alice', department: 'Engineering', salary: 95000 },
    { id: 2, name: 'Bob', department: 'Sales', salary: 75000 },
    { id: 3, name: 'Charlie', department: 'Engineering', salary: 105000 }
  ]
}

// Agent encodes data efficiently for LLM processing
const toonData = encode(userData, { delimiter: '\t' })
console.log('TOON format:', toonData)

// Agent can decode LLM responses back to JSON
const llmResponse = `filtered[2]{id,name,department,salary}:
  1\tAlice\tEngineering\t95000
  3\tCharlie\tEngineering\t105000`
const processedData = decode(llmResponse)
console.log('Decoded:', processedData)
```
Agentic Workflow Example: Data Analysis Pipeline
Here’s how an AI agent might use TOON in a complete data processing workflow:
```typescript
import { encode, decode } from '@toon-format/toon'

class DataAnalysisAgent {
  async processDataset(rawData: any, analysisRequest: string) {
    // Step 1: Encode data efficiently for LLM
    const compactData = encode(rawData, {
      delimiter: '\t',
      indent: 2
    })

    // Step 2: Construct optimized prompt
    const prompt = `Analyze this dataset in TOON format:

\`\`\`toon
${compactData}
\`\`\`

${analysisRequest}

Return results in TOON format with appropriate structure.`

    // Step 3: Send to LLM (simulated)
    const llmResponse = await this.callLLM(prompt)

    // Step 4: Decode results back to JSON
    const results = decode(llmResponse, { strict: false })
    return results
  }

  private async callLLM(prompt: string): Promise<string> {
    // In a real implementation, call your LLM API here.
    // For the demo, return a sample TOON response.
    return `analysis[3]{metric,value,insight}:
  average_salary\t91666.67\tEngineering leads with 100k average
  department_count\t2\tTwo active departments
  top_performer\tCharlie\tHighest paid engineer`
  }
}

// Usage example
const agent = new DataAnalysisAgent()
const dataset = {
  employees: [
    { id: 1, name: 'Alice', dept: 'Engineering', salary: 95000 },
    { id: 2, name: 'Bob', dept: 'Sales', salary: 75000 },
    { id: 3, name: 'Charlie', dept: 'Engineering', salary: 105000 }
  ]
}

const results = await agent.processDataset(
  dataset,
  'Calculate average salary by department and identify top performer'
)
```
Advanced Agentic Patterns
Data Transformation Agent
```typescript
import { encode, decode } from '@toon-format/toon'

class DataTransformationAgent {
  async transformData(inputData: any, targetFormat: string) {
    const toonInput = encode(inputData, { delimiter: '\t' })
    const prompt = `Transform this data to ${targetFormat} format.

Input data:
\`\`\`toon
${toonInput}
\`\`\`

Return only the transformed data in TOON structure.`
    const response = await this.callLLM(prompt)
    return decode(response)
  }

  private async callLLM(prompt: string): Promise<string> {
    // Call your LLM API here, as in DataAnalysisAgent above
    throw new Error('LLM call not implemented')
  }
}
```
Batch Processing with CLI Integration
For large-scale agentic workflows, integrate TOON’s CLI tools:
```typescript
import { exec } from 'child_process'
import { promisify } from 'util'

const execAsync = promisify(exec)

class BatchProcessingAgent {
  async processLargeDataset(inputFile: string, outputFile: string) {
    // Use the TOON CLI for efficient file processing
    const { stdout } = await execAsync(
      `npx @toon-format/cli ${inputFile} --delimiter "\\t" --stats -o ${outputFile}`
    )
    console.log('Processing stats:', stdout)

    // The agent can then analyze the processed TOON file
    const analysisPrompt = `Analyze token efficiency of this TOON file: ${outputFile}`
    // ... continue with LLM analysis
  }
}
```
Real-World Agentic Coding Scenarios
1. API Data Aggregation
```typescript
import { encode, decode } from '@toon-format/toon'

class APIAggregationAgent {
  async aggregateAPIs(apiEndpoints: string[]) {
    const results: string[] = []
    for (const endpoint of apiEndpoints) {
      const response = await fetch(endpoint)
      const data = await response.json()
      // Convert to TOON for efficient LLM analysis
      const toonData = encode(data, { delimiter: '|' })
      results.push(toonData)
    }

    // Agent analyzes combined data
    const combinedPrompt = `Analyze these API responses for patterns:

${results.join('\n---\n')}

Identify common structures and suggest unified schema.`
    const analysis = await this.callLLM(combinedPrompt)
    return decode(analysis)
  }

  private async callLLM(prompt: string): Promise<string> {
    // Call your LLM API here, as in DataAnalysisAgent above
    throw new Error('LLM call not implemented')
  }
}
```
2. Database Query Optimization
```typescript
import { encode, decode } from '@toon-format/toon'

class QueryOptimizationAgent {
  async optimizeQuery(schema: any, query: string) {
    const schemaToon = encode(schema, { delimiter: '\t' })
    const prompt = `Database schema:

\`\`\`toon
${schemaToon}
\`\`\`

Optimize this query: ${query}

Return optimized query plan in TOON format with estimated token savings.`
    const plan = await this.callLLM(prompt)
    return decode(plan)
  }

  private async callLLM(prompt: string): Promise<string> {
    // Call your LLM API here, as in DataAnalysisAgent above
    throw new Error('LLM call not implemented')
  }
}
```
Performance Considerations
Token Efficiency Metrics
TOON typically achieves:
- 30-60% token reduction for uniform tabular data
- Minimal overhead for nested structures
- Configurable delimiters for maximum compression
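As a quick sanity check of these figures, character counts can serve as a rough proxy for tokens (actual savings depend on the model's tokenizer). Here the users example from earlier is compared in both formats, with the TOON text written out by hand:

```typescript
// Compare pretty-printed JSON against the equivalent TOON text.
// Character counts stand in for token counts here, so treat the
// result only as a rough estimate of the real savings.
const users = [
  { id: 1, name: 'Alice', role: 'admin', active: true },
  { id: 2, name: 'Bob', role: 'user', active: true },
  { id: 3, name: 'Charlie', role: 'user', active: false }
]
const asJson = JSON.stringify({ users }, null, 2)
const asToon = [
  'users[3]{id,name,role,active}:',
  '  1,Alice,admin,true',
  '  2,Bob,user,true',
  '  3,Charlie,user,false'
].join('\n')

const savings = ((asJson.length - asToon.length) / asJson.length) * 100
console.log(`TOON is ~${savings.toFixed(0)}% smaller by character count`)
```

The gap widens as rows are added, because JSON repeats every key per object while TOON states the field list once.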
When to Use TOON vs JSON
Use TOON when:
- Sending uniform arrays of objects to LLMs
- Token costs are a significant concern
- Data validation is important
- Human readability matters
Use JSON when:
- Data has irregular structure
- Deep nesting predominates
- Existing systems require JSON
- Token savings are minimal
Integration Best Practices
Agent Communication Protocols
- Standardize delimiters: Use tabs for maximum token efficiency
- Include length markers: Help LLMs validate data completeness
- Use strict mode: Enable validation in production
- Handle errors gracefully: Implement fallback to JSON
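The last practice can be sketched as follows. `parseAgentResponse` is an illustrative helper, not part of the TOON library, and the decoder is injected so any implementation (such as `decode` from @toon-format/toon) can be plugged in:

```typescript
// Try TOON first; fall back to JSON if the response fails to decode.
type Decoder = (text: string) => unknown

function parseAgentResponse(text: string, toonDecode: Decoder): unknown {
  try {
    return toonDecode(text)
  } catch {
    // Not valid TOON (or strict-mode validation failed): assume plain JSON
    return JSON.parse(text)
  }
}

// Demo with a stub decoder that rejects anything lacking a length marker
const stubDecode: Decoder = (text) => {
  if (!text.includes('[')) throw new Error('not TOON')
  return { ok: true }
}
console.log(parseAgentResponse('{"fallback": true}', stubDecode))
// → { fallback: true }
```

If both parses fail, the JSON error propagates, which is usually the right signal that the LLM response needs to be retried or repaired.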
Workflow Optimization
```typescript
import { encode } from '@toon-format/toon'

class OptimizedAgentWorkflow {
  async processData(data: any) {
    if (this.shouldUseToon(data)) {
      return this.processWithToon(data)
    } else {
      return this.processWithJson(data)
    }
  }

  private shouldUseToon(data: any): boolean {
    // Analyze data structure for TOON suitability
    return this.hasUniformArrays(data) && this.estimateTokenSavings(data) > 20
  }

  private hasUniformArrays(data: any): boolean {
    // Check if data contains uniform object arrays
    return Object.values(data).some((value: any) =>
      Array.isArray(value) &&
      value.length > 1 &&
      value.every(item => typeof item === 'object' && item !== null)
    )
  }

  private estimateTokenSavings(data: any): number {
    // Character counts serve as a rough proxy for token counts
    const jsonChars = JSON.stringify(data).length
    const toonChars = encode(data, { delimiter: '\t' }).length
    return ((jsonChars - toonChars) / jsonChars) * 100
  }

  private processWithToon(data: any) { /* TOON pipeline (elided) */ }
  private processWithJson(data: any) { /* JSON pipeline (elided) */ }
}
```
Conclusion
TOON represents a significant advancement in data serialization for LLM interactions, offering substantial token savings while maintaining structural clarity. When integrated into agentic coding workflows, it enables more efficient AI-human collaboration by reducing communication overhead and improving data processing reliability.
The format’s strength lies in its dual nature: human-readable yet machine-optimized, making it ideal for autonomous agents that need to both understand and generate structured data. As AI systems continue to evolve, formats like TOON will become increasingly important for maintaining efficiency in complex, data-intensive workflows.
By adopting TOON in your agentic coding projects, you can achieve better performance, lower costs, and more reliable AI interactions - all while maintaining the flexibility and expressiveness that modern applications require.
This article was written by Kilo Code, based on content from: https://github.com/toon-format/toon