ZS provides formal constructs for describing reasoning processes — not as rigid instructions, but as composable cognitive operations with variables, control flow, and result formatting.
Think of it as SQL for thinking: you define what cognitive steps to take; the LLM decides how to execute them.
Scripts are executed by an LLM acting as the interpreter: the model reads a .zobr file, executes operations step by step, tracks variables, follows control flow, and produces structured output. The language also provides variables, for/if/loop control flow, user-defined functions (define), yield, imports, and @last/@N step references.
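The @last/@N mechanism can be illustrated with a small sketch. Since the ZS grammar itself is not shown here, the class below is a hypothetical model of how an interpreter might resolve these references: every executed operation appends its result to a history, and references index into it (1-indexed, as the @N notation suggests).

```python
class StepHistory:
    """Hypothetical model of @last/@N resolution in a ZS interpreter."""

    def __init__(self):
        self.results = []  # result of step 1 lives at index 0

    def record(self, result):
        self.results.append(result)

    def resolve(self, ref):
        # "@last" -> most recent result; "@3" -> result of step 3 (1-indexed)
        if ref == "@last":
            return self.results[-1]
        if ref.startswith("@"):
            return self.results[int(ref[1:]) - 1]
        raise ValueError(f"not a reference: {ref}")

history = StepHistory()
history.record("survey: three stakeholder groups identified")
history.record("doubt: the second group may be overcounted")

assert history.resolve("@last") == "doubt: the second group may be overcounted"
assert history.resolve("@1") == "survey: three stakeholder groups identified"
```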
ZS is a reasoning amplifier, not a capability test. It doesn’t make weak models strong — it makes all models structured. A Haiku execution of a ZS script produces more useful output than a free-form Haiku response to the same question, because the script forces the model to decompose reasoning, show its work, and format conclusions. The benchmark confirms: even the smallest model follows ZS scripts with 92.5% structural fidelity.
When reasoning structure is provided externally by the script, the model’s job shifts from organizing thought to filling containers with content. This is why Sonnet achieves near-parity with Opus (9.3 vs 9.4) — structured scripts compress the capability gap between tiers.
Encode your best analytical workflow once as a .zobr script,
then apply it to new inputs. A political news analysis script works on any article.
A due diligence script works on any company. The reasoning pattern is reusable —
the content changes.
Example: news-analysis.zobr runs the same 6-phase pipeline
(ground → stakeholders → motives → narrative gap → cui bono → blind spots)
on every article, ensuring nothing is missed.
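The reuse claim can be sketched in ordinary Python: the six phases are fixed, only the article changes. `run_phase` below is a hypothetical stand-in for one LLM call executing a single ZS operation.

```python
PHASES = ["ground", "stakeholders", "motives", "narrative gap", "cui bono", "blind spots"]

def run_phase(phase, article, context):
    # Hypothetical stand-in for an LLM call executing one ZS operation.
    return f"[{phase}] analysis of {article!r} given {len(context)} prior results"

def analyze(article):
    # The same six phases run for every article, so no phase can be skipped.
    results = {}
    for phase in PHASES:
        results[phase] = run_phase(phase, article, results)
    return results

report = analyze("election-coverage.txt")
assert list(report) == PHASES  # every phase ran, in order
```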
ZS scripts make reasoning auditable. Instead of a black-box LLM response,
you get labeled operations ([doubt], [contrast])
with visible variable flow. You can verify that the model actually considered
counterarguments, not just generated a one-sided summary.
Critical for compliance, legal analysis, medical reasoning — anywhere you need to show how a conclusion was reached, not just what it is.
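Auditing becomes mechanical once operations are labeled. The sketch below assumes output lines of the form `[doubt] ...` (the label format is inferred from the examples above, not from a specification) and checks that a counterargument step actually appears in the transcript.

```python
import re

def audit(transcript, required=("doubt", "contrast")):
    """Return the operation labels found, and whether all required ones ran."""
    labels = re.findall(r"^\[(\w+)\]", transcript, flags=re.MULTILINE)
    return labels, all(op in labels for op in required)

transcript = """\
[survey] Two positions exist on the proposed regulation.
[doubt] The cost estimate may understate enforcement overhead.
[contrast] The strongest counter: compliance costs fall over time.
[synthesize] Net effect depends on the enforcement model chosen.
"""

labels, ok = audit(transcript)
assert ok and labels == ["survey", "doubt", "contrast", "synthesize"]
```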
The benchmark shows that different tasks need different models: use Haiku for structural tasks (surveys, fact extraction) at 2.5× speed, Sonnet for most analytical work, and Opus only for deep dialectical reasoning. ZS scripts make this routing explicit: the same script runs on any model.
Generate scripts with Sonnet (best architecturally), execute with Haiku at scale — valid structured reasoning at a fraction of the cost.
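That split is a two-step loop: one expensive generation call, many cheap execution calls. `call_model` below is a hypothetical client stub, not a real API; the model names mirror the benchmark tiers.

```python
def call_model(model, prompt):
    # Hypothetical client stub; in practice this would be one API call.
    return f"<{model} output for: {prompt[:40]}...>"

def build_script(task_description):
    # Pay for architectural quality once.
    return call_model("sonnet", f"Write a .zobr script for: {task_description}")

def run_at_scale(script, inputs):
    # Execute the fixed script cheaply over many inputs.
    return [call_model("haiku", f"Execute this ZS script on the input:\n{script}\n{i}")
            for i in inputs]

script = build_script("due diligence on a company")
reports = run_at_scale(script, ["acme.json", "globex.json", "initech.json"])
assert len(reports) == 3
```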
When an agent produces exceptional reasoning in a conversation,
the reasoning pattern can be distilled into a .zobr script —
a reusable artifact. The benchmark proves all three models can generate
valid, parameterized scripts (Task 05: 0 errors across all models).
Dual-purpose: humans write scripts as tasks for LLMs;
agents export their reasoning as .zobr files for future use.
ZS externalizes the structure of rigorous thinking: survey before asserting, doubt your own claims, contrast with the strongest counter, synthesize — don’t summarize. Students and analysts can learn these patterns by reading and writing scripts.
A dialectical.zobr template teaches iterative thesis refinement
better than a textbook paragraph about dialectics.
ZS scripts can serve as shared protocols between agents.
One agent runs survey and ground,
another runs doubt and contrast,
a third synthesizes the results.
The script defines the workflow; agents fill the operations.
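A hypothetical sketch of that division of labor: the workflow maps each ZS operation to an agent, and each agent sees only the results produced so far. `dispatch` stands in for handing one operation to one agent.

```python
WORKFLOW = [
    ("survey", "agent_a"), ("ground", "agent_a"),
    ("doubt", "agent_b"), ("contrast", "agent_b"),
    ("synthesize", "agent_c"),
]

def dispatch(agent, operation, prior):
    # Hypothetical stand-in for handing one operation to one agent.
    return f"{agent} ran [{operation}] over {len(prior)} earlier results"

def run_protocol(workflow):
    # The script defines the order; each agent fills in its operations.
    results = []
    for operation, agent in workflow:
        results.append(dispatch(agent, operation, results))
    return results

trace = run_protocol(WORKFLOW)
assert len(trace) == 5 and trace[-1].startswith("agent_c ran [synthesize]")
```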
Part of the Black Zobr federated co-thinking ecosystem.
The custom-functions task exercises steelman and devils_advocate functions with prompt, dot access (attack.damage_level), and if/else branching. The reflection task has each model generate a .zobr script
encoding the reasoning pattern and validate it with zobr-check, testing both content quality and ZS code generation.
Methodology: claude -p (headless mode) with --effort high for consistent thinking depth; scoring via evaluate-benchmark.zobr — a ZS script evaluating ZS results (meta-evaluation). 15 runs total (5 tasks × 3 models), 0 failures. Total benchmark time: ~48 minutes.
| Task | Dimension | Opus 4.6 | Sonnet 4.6 | Haiku 4.5 |
|---|---|---|---|---|
| 01 — Simple pipeline | Structural | 10 | 10 | 9 |
| | Content | 9 | 8 | 7 |
| | Composite | 9.5 | 9.0 | 8.0 |
| 02 — Dialectical | Structural | 10 | 10 | 9 |
| | Content | 9 | 9 | 6 |
| | Composite | 9.5 | 9.5 | 7.5 |
| 03 — Custom functions | Structural | 10 | 10 | 9 |
| | Content | 9 | 9 | 7 |
| | Composite | 9.5 | 9.5 | 8.0 |
| 04 — News analysis | Structural | 10 | 10 | 10 |
| | Content | 9 | 9 | 7 |
| | Composite | 9.5 | 9.5 | 8.5 |
| 05 — Reflection | Content | 9 | 9 | 7 |
| | Generation | 9 | 9 | 8 |
| | Composite | 9.0 | 9.0 | 7.5 |
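The Composite rows are the plain average of the two dimension scores, which can be checked directly against the table (Task 01 shown here):

```python
# (structural, content, composite) per model for Task 01, read from the table
task_01 = [
    (10, 9, 9.5),  # Opus
    (10, 8, 9.0),  # Sonnet
    (9, 7, 8.0),   # Haiku
]
for structural, content, composite in task_01:
    assert (structural + content) / 2 == composite
```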
All three models follow ZS scripts with high fidelity (9.25–10.0). Operations executed in order, variables tracked, control flow followed. The 0.75-point gap is cosmetic, not semantic.
The Opus–Haiku gap peaks at 3 points on Task 02 (iterative refinement, domain knowledge, emergent synthesis). Structural tasks show smaller gaps. ZS amplifies reasoning where it’s hardest.
Structured scripts reduce the capability gap between tiers. When reasoning structure is externalized, the model’s job shifts to filling containers with content — and Sonnet fills them nearly as well.
All three reflection.zobr files pass zobr-check with 0 errors. Generation capability scales with interpretation — no “generation penalty.” ZS script generation is a practical workflow.
| Use case | Model | Why |
|---|---|---|
| Structural tasks (extract, classify, survey) | Haiku | 1.7× faster than Opus; structural compliance ~perfect |
| Dialectical reasoning (doubt, contrast, reframe loops) | Opus | Content depth gap largest on iterative reasoning |
| News / political analysis | Sonnet / Opus | Both expert-level; Sonnet adds source critique |
| Script generation | Sonnet | Most architecturally sophisticated; fully generalizable |
| High-volume batch processing | Haiku | 2.5× faster than Sonnet; valid reasoning at scale |
| Philosophy / deep analysis | Opus | Broadest references; most original framings |
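The routing table reduces to a lookup. The keys below are hypothetical shorthand for the use-case rows above; the point is that routing lives outside the script, so the same .zobr file runs on whichever model the task warrants.

```python
MODEL_FOR = {
    "structural": "haiku",         # extract, classify, survey
    "dialectical": "opus",         # doubt/contrast/reframe loops
    "news_analysis": "sonnet",     # or opus; sonnet adds source critique
    "script_generation": "sonnet", # most architecturally sophisticated
    "batch": "haiku",              # high-volume processing
    "philosophy": "opus",          # broadest references
}

def route(use_case):
    # Sonnet as the general-purpose default for unlisted tasks.
    return MODEL_FOR.get(use_case, "sonnet")

assert route("structural") == "haiku"
assert route("some unlisted task") == "sonnet"
```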