Fable Is Gone — Don't Give Up: OpenRouter Fusion + Chinese LLMs + Review Layer

小村 豪

doi:10.5281/zenodo.21614692

Fable Is Gone — Don't Give Up: OpenRouter Fusion + Chinese LLMs + Review Layer

Jun 14, 2026 00:00 · Updated: Jul 5, 2026 · 小村豪 · AI, LLM, OpenRouter, Claude, ChatGPT, DeepSeek, Prompt Engineering, Developer Productivity, Tools

Revision history (first version, published Jun 14, 2026)

Jun 14, 2026: First published

Cite this article(DOI: 10.5281/zenodo.21614692)

This article is archived on Zenodo. Below are both the DOI that always resolves to the latest version and the DOI pinned to the version you are reading.

小村豪 (2026). Fable Is Gone — Don't Give Up: OpenRouter Fusion + Chinese LLMs + Review Layer. KomuraSoft LLC. https://doi.org/10.5281/zenodo.21614692 https://comcomponent.com/en/blog/2026/06/14/000-opencode-fusion-chinese-llms/

DOI (latest version): 10.5281/zenodo.21614692
DOI (this version): 10.5281/zenodo.21614693

1. Fable Is Dead. Now What?

Let’s be honest: nothing comes close to Fable.

As of this writing, Anthropic’s AI coding agent “Fable” has been discontinued. That accuracy, that speed, that sense of integration — there is no equivalent replacement.

But you can’t just sit around. The alternatives look like this:

Claude Code / Claude Sonnet: Smart. But alone it doesn’t reach Fable-level accuracy, and costs add up.
GPT-5 / GPT-5.5: Surprisingly mediocre. Runs out of steam fast on long codebases. And it’s expensive.
Cursor / Windsurf: Great experience, but you’re at the mercy of the backend model. No freedom to customize the setup.
Local LLMs (Ollama, etc.): Fast. But accuracy drops hard once you hand it a real repository.

After trying everything, here’s where I landed: generate code with OpenRouter’s Fusion + Chinese LLMs, then finish with a GPT-5.5-Pro review or a Codex PR review. It doesn’t touch Fable, but it’s substantially better than bare GPT-5.5. It’s usable.

One caveat: these are Chinese LLMs, so feeding them production code is genuinely scary. This is strictly for hobby projects and personal development.

2. What Is OpenRouter Fusion?

OpenRouter is an API gateway that unifies access to multiple LLM providers. Claude, GPT, Gemini, DeepSeek — all through the same interface.

OpenRouter has a feature called Fusion.

In short: one prompt goes to multiple models in parallel, their responses are analyzed, and a single model synthesizes the final output. Think of it as a model committee.

Here’s the actual flow:

The user submits a prompt
All models in analysis_models answer the same prompt simultaneously
The model specified in model (DeepSeek V4 Pro in this setup) compares their responses — mapping out agreements, contradictions, gaps, and blind spots (= the judge role)
The same model then writes the final answer, drawing on that structured analysis

One important technical note: you cannot assign different roles or prompts to individual analysis models. All models receive the identical prompt, answer independently, and the judge evaluates the differences. That’s the whole mechanism — simple, not magical.

opencode, a CLI coding agent, has native plugin support for Fusion. Drop a config into opencode.json and you get a multi-model collaborative coding agent.

3. The Actual Config

Here’s the core of my current opencode.json:

{
  "model": "openrouter/openrouter/fusion",
  "provider": {
    "openrouter": {
      "models": {
        "openrouter/fusion": {
          "name": "OpenRouter Fusion (Custom DeepSeek V4 Pro)",
          "options": {
            "plugins": [
              {
                "id": "fusion",
                "analysis_models": [
                  "xiaomi/mimo-v2.5-pro",
                  "z-ai/glm-5.1",
                  "deepseek/deepseek-v4-pro",
                  "moonshotai/kimi-k2.7-code",
                  "minimax/minimax-m3"
                ],
                "model": "deepseek/deepseek-v4-pro",
                "max_tool_calls": 8
              }
            ]
          }
        }
      }
    }
  }
}

5 analysis models and 1 execution model.

Analysis models (all receive the same prompt, answer in parallel)

Model	Developer	Strengths
`xiaomi/mimo-v2.5-pro`	Xiaomi	From one of China’s largest smartphone makers. Exceptional cost-performance.
`z-ai/glm-5.1`	Zhipu AI	Tsinghua University spin-out. Consistently tops Chinese-language benchmarks. Strong logical reasoning.
`deepseek/deepseek-v4-pro`	DeepSeek	The well-known disruptive LLM from China. Outstanding coding performance.
`moonshotai/kimi-k2.7-code`	Moonshot AI	Code-generation specialist. Strong reputation for long-context processing.
`minimax/minimax-m3`	MiniMax	Rising star in multimodal and long-context processing.

Execution model (writes the final answer based on the analysis)

deepseek/deepseek-v4-pro: Takes the structured analysis from the panel’s responses and produces the actual code and file edits.

Why Chinese LLMs?

Simple: the cost-performance is overwhelming.

As of 2026, Chinese LLMs offer API pricing at 1/5 to 1/20 of Anthropic or OpenAI rates, with comparable or better performance. DeepSeek V4 Pro’s coding ability, in particular, feels at least on par with Claude Sonnet and GPT-5.5 in real-world use — often better.

4. How It Works — And Why Fusion Alone Isn’t Enough

The Fusion-only flow:

User submits a prompt
5 analysis models independently generate responses
The execution model (DeepSeek V4 Pro) compares those responses and produces structured analysis — agreement points, contradictions, gaps, and blind spots
Drawing on that analysis, the same execution model (DeepSeek V4 Pro) writes the final code

This mirrors human team development: five engineers review the same spec and share their takes, then one synthesizes everything into an implementation.

A note on context window limits: Fusion’s context window is 128K tokens. 128K might seem tight, and it fills up faster than you’d expect — it has to hold the full conversation history plus all five panel responses. Don’t panic, though. Even if the 128K window overflows and the synthesis step fails, each analysis model has already finished processing the prompt independently with its own (much larger) context window — DeepSeek V4 Pro at 1M tokens, Kimi K2.7 at ~256K. opencode also retains the full session history. The models’ “memory” isn’t lost. In practice, it’s not a serious obstacle.

But this alone doesn’t come close to Fable. Fable was a single model that outperformed multi-model committees. That’s Anthropic’s engineering for you.

So I add one more layer: a review pass. Two options.

Option A: GPT-5.5-Pro code review

Generate code with Fusion
Feed the output to GPT-5.5-Pro with “review this and propose fixes”
Apply those fixes through the execution model

GPT-5.5-Pro is mediocre at generating code from scratch, but oddly good at reviewing existing code.

Option B: GitHub Codex PR review

Generate code with Fusion
Open a PR and run it through GitHub Codex PR review
Address the bugs and design concerns Codex flags

Codex reviews with GitHub context — issues, past PRs, project structure — giving it a different lens than a generic LLM review.

Pick one. Fusion (5-model committee) → review (GPT-5.5-Pro or Codex). Two layers. Can’t match Fable’s one-shot accuracy, so we compensate with process.

Library version mismatches and hallucinated APIs — the kind of nonsense bare GPT-5.5 pulls — dropped significantly with this setup.

5. Speed and Practicality

Speed-wise, it’s comparable to a single model. All 5 analysis models run in parallel, so raw response time is within tolerance.

The issue is accuracy. Where Fable would nail it in one shot, this setup requires some back-and-forth. Net development time is clearly slower than Fable. You pay for every mistake with rework.

But it’s substantially better than bare GPT-5.5. Fusion + review produces code that’s reasonably close on the first attempt. Far better than the generate → fix → fix → fix death spiral you get with GPT-5.5 alone.

The bottom line: speed is fine. Accuracy is nowhere near Fable, but several notches above bare GPT-5.5. A workable compromise.

6. Bonus: Review Layer and Other opencode Settings

Review layer: GPT-5.5-Pro or Codex PR review

Fusion alone feels insufficient, so I use one of these as a review layer.

GPT-5.5-Pro: Point opencode’s task sub-agent at it with “review this code and propose fixes.” Awkward at writing code from scratch, weirdly good at finding flaws in code someone else wrote.

GitHub Codex: Trigger Codex PR review when opening a PR. Reviews with full project context — different perspective from a generic LLM review.

Whichever I pick depends on the day. Both would be overkill and too slow.

Other settings

A few practical additions in opencode.json beyond Fusion:

{
  "permission": {
    "read": "allow",
    "glob": "allow",
    "grep": "allow",
    "task": "allow",
    "webfetch": "allow",
    "websearch": "allow",
    "lsp": "allow",
    "edit": "allow",
    "bash": {
      "*": "allow",
      "Remove-Item *": "deny",
      "del *": "deny",
      "rm *": "deny",
      "rmdir *": "deny",
      "rd *": "deny",
      "erase *": "deny",
      "git clean *": "deny"
    }
  },
  "experimental": {
    "primary_tools": ["task"]
  }
}

Two points:

1. Blocking destructive commands

Remove-Item, del, rm, rmdir, rd, erase, and git clean are explicitly denied. I don’t trust an AI agent with file deletion. If something needs deleting, I’ll do it myself.

2. primary_tools: ["task"]

Prioritizes sub-agent parallel exploration. When searching or reading across multiple files in a large codebase, the task tool is significantly faster.

7. Bottom Line

Losing Fable sucks. It really sucks. Nothing out there replaces it.

But you still need to ship code, so here’s my coping strategy: OpenRouter Fusion + Chinese LLMs, with a review pass from either GPT-5.5-Pro or Codex PR review.

The formula:

OpenRouter Fusion for 5-model “committee” code generation
Analysis models: Xiaomi Mimo / GLM / DeepSeek / Kimi / MiniMax
Execution model: DeepSeek V4 Pro
Review layer: GPT-5.5-Pro or Codex PR review (pick one)
Speed is comparable to a single model; slower than Fable only due to accuracy rework
Far better than bare GPT-5.5. Nowhere near Fable, but usable.
Chinese LLMs, so stick to hobby/personal projects

If you’re suffering from Fable withdrawal or finding bare GPT-5.5 too unreliable to get real work done, this is worth trying as a stopgap.

All you need is an OpenRouter account. You can start today.

References

Recent articles sharing the same tags. Deepen your understanding with closely related topics.

Frequently Asked Questions

Common questions about the topic of this article.

What is OpenRouter Fusion and how does it work?: OpenRouter is an API gateway that gives you unified access to multiple LLM providers, and Fusion is one of its features. A single prompt is sent to every model listed in analysis_models in parallel, then the model specified in the model field compares their responses — mapping agreements, contradictions, gaps, and blind spots — and the same model writes the final answer based on that analysis. All analysis models receive the identical prompt; you cannot assign different roles or prompts to individual models. The opencode CLI coding agent has native plugin support for Fusion, so you only need a configuration block in opencode.json.
Why use Chinese LLMs like DeepSeek and Kimi for coding?: The main reason is cost-performance. As of 2026, Chinese LLMs offer API pricing at roughly 1/5 to 1/20 of Anthropic or OpenAI rates, with comparable or better performance, and DeepSeek V4 Pro's coding ability in particular feels at least on par with Claude Sonnet and GPT-5.5 in real-world use. The significant caveat is trust: feeding production code to these models is genuinely risky, so this setup is strictly for hobby projects and personal development.
Is OpenRouter Fusion alone enough to replace a top-tier coding agent?: No. Fusion output alone doesn't come close to what Fable delivered as a single model, so the setup adds a second layer: a review pass using either a GPT-5.5-Pro code review or a GitHub Codex PR review. GPT-5.5-Pro is mediocre at generating code from scratch but notably good at finding flaws in existing code, while Codex reviews with full GitHub context such as issues and past PRs. With this two-layer process, library version mismatches and hallucinated APIs dropped significantly compared to bare GPT-5.5, though accuracy still requires more back-and-forth than Fable did.
Does Fusion's 128K context window limit cause problems in practice?: Less than you might expect. The 128K window has to hold the full conversation history plus all five panel responses, so it does fill up quickly. However, even if the window overflows and the synthesis step fails, each analysis model has already processed the prompt independently with its own much larger context window — DeepSeek V4 Pro at 1M tokens, Kimi K2.7 at around 256K — and opencode retains the full session history. In practice it isn't a serious obstacle.

Author Profile

Profile page for the article author.

Go Komura

Representative of KomuraSoft LLC

Focused on Windows software development, technical consulting, and investigations into failures that are difficult to reproduce.

View Profile Contact

Public links

GitHub LinkedIn X COM_BLAS COM_BigDecimal

Back to the Blog

Fable Is Gone — Don't Give Up: OpenRouter Fusion + Chinese LLMs + Review Layer

1. Fable Is Dead. Now What?

2. What Is OpenRouter Fusion?

3. The Actual Config

Analysis models (all receive the same prompt, answer in parallel)

Execution model (writes the final answer based on the analysis)

Why Chinese LLMs?

4. How It Works — And Why Fusion Alone Isn’t Enough

5. Speed and Practicality

6. Bonus: Review Layer and Other opencode Settings

Review layer: GPT-5.5-Pro or Codex PR review

Other settings

7. Bottom Line

References

Information Security 10 Major Threats 2026 — How to Read the Ranking, and What SMEs Should Actually Guard Against

Best Practices for Designing Chatbots That Actually Help in Business

Power Automate Error Handling and Retry Design — Preventing 'The Flow Was Working, Then It Just Stopped'

PowerShell Error Handling and Retry Design — From the try/catch Trap to Exit Codes and Retry Best Practices

Designing Scheduled Flows in Power Automate — Month-End Processing, Business-Day Checks, and Reminders in Practice

Related Topics

Windows Technical Topics

Frequently Asked Questions

Author Profile

Go Komura

1. Fable Is Dead. Now What?

2. What Is OpenRouter Fusion?

3. The Actual Config

Analysis models (all receive the same prompt, answer in parallel)

Execution model (writes the final answer based on the analysis)

Why Chinese LLMs?

4. How It Works — And Why Fusion Alone Isn’t Enough

5. Speed and Practicality

6. Bonus: Review Layer and Other opencode Settings

Review layer: GPT-5.5-Pro or Codex PR review

Other settings

7. Bottom Line

References

Related Articles

Information Security 10 Major Threats 2026 — How to Read the Ranking, and What SMEs Should Actually Guard Against

Best Practices for Designing Chatbots That Actually Help in Business

Power Automate Error Handling and Retry Design — Preventing 'The Flow Was Working, Then It Just Stopped'

PowerShell Error Handling and Retry Design — From the try/catch Trap to Exit Codes and Retry Best Practices

Designing Scheduled Flows in Power Automate — Month-End Processing, Business-Day Checks, and Reminders in Practice

Related Topics

Windows Technical Topics

Frequently Asked Questions

Author Profile

Go Komura