Fable Is Gone — Don't Give Up: OpenRouter Fusion + Chinese LLMs + Review Layer

· · AI, LLM, OpenRouter, Claude, ChatGPT, DeepSeek, Prompt Engineering, Developer Productivity, Tools

1. Fable Is Dead. Now What?

Let’s be honest: nothing comes close to Fable.

As of this writing, Anthropic’s AI coding agent “Fable” has been discontinued. That accuracy, that speed, that sense of integration — there is no equivalent replacement.

But you can’t just sit around. The alternatives look like this:

  • Claude Code / Claude Sonnet: Smart. But alone it doesn’t reach Fable-level accuracy, and costs add up.
  • GPT-5 / GPT-5.5: Surprisingly mediocre. Runs out of steam fast on long codebases. And it’s expensive.
  • Cursor / Windsurf: Great experience, but you’re at the mercy of the backend model. No freedom to customize the setup.
  • Local LLMs (Ollama, etc.): Fast. But accuracy drops hard once you hand it a real repository.

After trying everything, here’s where I landed: generate code with OpenRouter’s Fusion + Chinese LLMs, then finish with a GPT-5.5-Pro review or a Codex PR review. It doesn’t touch Fable, but it’s substantially better than bare GPT-5.5. It’s usable.

One caveat: these are Chinese LLMs, so feeding them production code is genuinely scary. This is strictly for hobby projects and personal development.


2. What Is OpenRouter Fusion?

OpenRouter is an API gateway that unifies access to multiple LLM providers. Claude, GPT, Gemini, DeepSeek — all through the same interface.

OpenRouter has a feature called Fusion.

In short: one prompt goes to multiple models in parallel, their responses are analyzed, and a single model synthesizes the final output. Think of it as a model committee.

Here’s the actual flow:

  1. The user submits a prompt
  2. All models in analysis_models answer the same prompt simultaneously
  3. The model specified in model (DeepSeek V4 Pro in this setup) compares their responses — mapping out agreements, contradictions, gaps, and blind spots (= the judge role)
  4. The same model then writes the final answer, drawing on that structured analysis

One important technical note: you cannot assign different roles or prompts to individual analysis models. All models receive the identical prompt, answer independently, and the judge evaluates the differences. That’s the whole mechanism — simple, not magical.

opencode, a CLI coding agent, has native plugin support for Fusion. Drop a config into opencode.json and you get a multi-model collaborative coding agent.


3. The Actual Config

Here’s the core of my current opencode.json:

{
  "model": "openrouter/openrouter/fusion",
  "provider": {
    "openrouter": {
      "models": {
        "openrouter/fusion": {
          "name": "OpenRouter Fusion (Custom DeepSeek V4 Pro)",
          "options": {
            "plugins": [
              {
                "id": "fusion",
                "analysis_models": [
                  "xiaomi/mimo-v2.5-pro",
                  "z-ai/glm-5.1",
                  "deepseek/deepseek-v4-pro",
                  "moonshotai/kimi-k2.7-code",
                  "minimax/minimax-m3"
                ],
                "model": "deepseek/deepseek-v4-pro",
                "max_tool_calls": 8
              }
            ]
          }
        }
      }
    }
  }
}

5 analysis models and 1 execution model.

Analysis models (all receive the same prompt, answer in parallel)

Model Developer Strengths
xiaomi/mimo-v2.5-pro Xiaomi From one of China’s largest smartphone makers. Exceptional cost-performance.
z-ai/glm-5.1 Zhipu AI Tsinghua University spin-out. Consistently tops Chinese-language benchmarks. Strong logical reasoning.
deepseek/deepseek-v4-pro DeepSeek The well-known disruptive LLM from China. Outstanding coding performance.
moonshotai/kimi-k2.7-code Moonshot AI Code-generation specialist. Strong reputation for long-context processing.
minimax/minimax-m3 MiniMax Rising star in multimodal and long-context processing.

Execution model (writes the final answer based on the analysis)

  • deepseek/deepseek-v4-pro: Takes the structured analysis from the panel’s responses and produces the actual code and file edits.

Why Chinese LLMs?

Simple: the cost-performance is overwhelming.

As of 2026, Chinese LLMs offer API pricing at 1/5 to 1/20 of Anthropic or OpenAI rates, with comparable or better performance. DeepSeek V4 Pro’s coding ability, in particular, feels at least on par with Claude Sonnet and GPT-5.5 in real-world use — often better.


4. How It Works — And Why Fusion Alone Isn’t Enough

The Fusion-only flow:

  1. User submits a prompt
  2. 5 analysis models independently generate responses
  3. The execution model (DeepSeek V4 Pro) compares those responses and produces structured analysis — agreement points, contradictions, gaps, and blind spots
  4. Drawing on that analysis, the same execution model (DeepSeek V4 Pro) writes the final code

This mirrors human team development: five engineers review the same spec and share their takes, then one synthesizes everything into an implementation.

A note on context window limits: Fusion’s context window is 128K tokens. 128K might seem tight, and it fills up faster than you’d expect — it has to hold the full conversation history plus all five panel responses. Don’t panic, though. Even if the 128K window overflows and the synthesis step fails, each analysis model has already finished processing the prompt independently with its own (much larger) context window — DeepSeek V4 Pro at 1M tokens, Kimi K2.7 at ~256K. opencode also retains the full session history. The models’ “memory” isn’t lost. In practice, it’s not a serious obstacle.

But this alone doesn’t come close to Fable. Fable was a single model that outperformed multi-model committees. That’s Anthropic’s engineering for you.

So I add one more layer: a review pass. Two options.

Option A: GPT-5.5-Pro code review

  1. Generate code with Fusion
  2. Feed the output to GPT-5.5-Pro with “review this and propose fixes”
  3. Apply those fixes through the execution model

GPT-5.5-Pro is mediocre at generating code from scratch, but oddly good at reviewing existing code.

Option B: GitHub Codex PR review

  1. Generate code with Fusion
  2. Open a PR and run it through GitHub Codex PR review
  3. Address the bugs and design concerns Codex flags

Codex reviews with GitHub context — issues, past PRs, project structure — giving it a different lens than a generic LLM review.

Pick one. Fusion (5-model committee) → review (GPT-5.5-Pro or Codex). Two layers. Can’t match Fable’s one-shot accuracy, so we compensate with process.

Library version mismatches and hallucinated APIs — the kind of nonsense bare GPT-5.5 pulls — dropped significantly with this setup.


5. Speed and Practicality

Speed-wise, it’s comparable to a single model. All 5 analysis models run in parallel, so raw response time is within tolerance.

The issue is accuracy. Where Fable would nail it in one shot, this setup requires some back-and-forth. Net development time is clearly slower than Fable. You pay for every mistake with rework.

But it’s substantially better than bare GPT-5.5. Fusion + review produces code that’s reasonably close on the first attempt. Far better than the generate → fix → fix → fix death spiral you get with GPT-5.5 alone.

The bottom line: speed is fine. Accuracy is nowhere near Fable, but several notches above bare GPT-5.5. A workable compromise.


6. Bonus: Review Layer and Other opencode Settings

Review layer: GPT-5.5-Pro or Codex PR review

Fusion alone feels insufficient, so I use one of these as a review layer.

GPT-5.5-Pro: Point opencode’s task sub-agent at it with “review this code and propose fixes.” Awkward at writing code from scratch, weirdly good at finding flaws in code someone else wrote.

GitHub Codex: Trigger Codex PR review when opening a PR. Reviews with full project context — different perspective from a generic LLM review.

Whichever I pick depends on the day. Both would be overkill and too slow.

Other settings

A few practical additions in opencode.json beyond Fusion:

{
  "permission": {
    "read": "allow",
    "glob": "allow",
    "grep": "allow",
    "task": "allow",
    "webfetch": "allow",
    "websearch": "allow",
    "lsp": "allow",
    "edit": "allow",
    "bash": {
      "*": "allow",
      "Remove-Item *": "deny",
      "del *": "deny",
      "rm *": "deny",
      "rmdir *": "deny",
      "rd *": "deny",
      "erase *": "deny",
      "git clean *": "deny"
    }
  },
  "experimental": {
    "primary_tools": ["task"]
  }
}

Two points:

1. Blocking destructive commands

Remove-Item, del, rm, rmdir, rd, erase, and git clean are explicitly denied. I don’t trust an AI agent with file deletion. If something needs deleting, I’ll do it myself.

2. primary_tools: ["task"]

Prioritizes sub-agent parallel exploration. When searching or reading across multiple files in a large codebase, the task tool is significantly faster.


7. Bottom Line

Losing Fable sucks. It really sucks. Nothing out there replaces it.

But you still need to ship code, so here’s my coping strategy: OpenRouter Fusion + Chinese LLMs, with a review pass from either GPT-5.5-Pro or Codex PR review.

The formula:

  • OpenRouter Fusion for 5-model “committee” code generation
  • Analysis models: Xiaomi Mimo / GLM / DeepSeek / Kimi / MiniMax
  • Execution model: DeepSeek V4 Pro
  • Review layer: GPT-5.5-Pro or Codex PR review (pick one)
  • Speed is comparable to a single model; slower than Fable only due to accuracy rework
  • Far better than bare GPT-5.5. Nowhere near Fable, but usable.
  • Chinese LLMs, so stick to hobby/personal projects

If you’re suffering from Fable withdrawal or finding bare GPT-5.5 too unreliable to get real work done, this is worth trying as a stopgap.

All you need is an OpenRouter account. You can start today.


References

Recent articles sharing the same tags. Deepen your understanding with closely related topics.

These topic pages place the article in a broader service and decision context.

Author Profile

Profile page for the article author.

Go Komura

Representative of KomuraSoft LLC

Focused on Windows software development, technical consulting, and investigations into failures that are difficult to reproduce.

Back to the Blog