Codex Mojibake on Windows: How to Stop Garbled Text

When you have Codex work with files containing Japanese text on Windows, the first thing that actually helps is not aligning every editor and shell setting — it is explicitly telling Codex how to read, how to write, and where to stop.

The situations that cause the most trouble look like this.

UTF-8, CP932, and UTF-16-family files coexist
The text looks readable on screen, but the interpretation of the actual bytes is off
You only meant to tweak an existing file, but it gets re-saved in a different encoding
Breakage happens in “non-code” files: CSV, TXT, logs, Markdown, configuration files
A throwaway script or raw shell output gets saved as-is, and the accident becomes permanent

OpenAI’s Codex is more stable when you treat it less like a one-off chat partner and more like a teammate you use continuously, with settings and working rules. In particular, if your workflow has Codex read AGENTS.md, encoding rules belong there permanently rather than being repeated verbally every time.

This article lays out, from a practitioner’s perspective, the instructions most worth giving Codex up front so that it handles Japanese files on Windows safely.

1. The Conclusion First

The single most effective way to reduce Codex mojibake accidents on Windows is to pin down the encoding procedure in advance.

These are the rules that help the most.

For existing files containing Japanese, have it check the likely encoding, BOM presence, and newline style before reading
For files where mojibake is suspected, do not let it save until it is confident
For existing files, have it preserve the original encoding, BOM, and newlines
For new files, steer toward UTF-8 per repository convention
For writes, only allow methods where the encoding can be made explicit
After saving, have it re-read the file and verify representative Japanese lines

Condensed for day-to-day use, it boils down to this.

Check before reading
No saving when in doubt
Preserve existing files; UTF-8 only for new ones
Ban ambiguous write paths
Re-read and verify at the end

Conversely, these are the dangerous kinds of instructions.

“Fix the mojibake”
“Convert everything to UTF-8”
“Output a CSV”
“Just make it match”
“Save it for now and let’s see”

None of these say at which point Codex should stop. For mojibake prevention, you need to specify not only what to do but also where to stop short of saving.

2. Why Mojibake Accidents Are So Common on Windows

The real problem is not that Codex is weak at Japanese — it is that on the Windows asset side, multiple encodings and multiple write paths coexist.

In practice, this kind of mixture is not unusual.

Newer sources and Markdown are UTF-8
Older CSVs, TXT files, logs, and configs are CP932-family
Some outputs and tool-generated artifacts are UTF-16-family
Save paths vary across editors, shells, and Excel-derived output
Newlines are also mixed between LF and CRLF

In this state, if Codex misinterprets the bytes even once, it can proceed to the next edit treating strings it failed to read as if they had been read correctly. And if it then saves, the problem is no longer a display issue — it becomes fixed as corruption of the file itself.

That is why mojibake prevention ultimately comes down to how you manage the I/O procedure.
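To make the failure mode concrete, here is a minimal Python sketch of a CP932 file being read as UTF-8 and then saved. The sample string and the variable names are illustrative assumptions, not taken from any real repository.

```python
# Bytes as they would sit on disk in a legacy CP932 file.
original = "日本語".encode("cp932")

# A strict decode with the wrong encoding fails loudly, which is the safe outcome.
try:
    original.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# A lenient decode "succeeds" but substitutes U+FFFD for every unreadable byte.
garbled = original.decode("utf-8", errors="replace")

# Saving the garbled text writes different bytes; the original bytes are gone.
saved = garbled.encode("utf-8")
recovered = saved.decode("utf-8")

print("strict decode failed:", strict_failed)
print("contains U+FFFD:", "\ufffd" in garbled)
print("text recoverable after save:", recovered == "日本語")
```

Up to the lenient decode the damage is only in memory. It is the save that makes it permanent, which is why the rules that follow focus on when saving is allowed.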

3. The Rules to Pin Down for Codex First

3.1 Have it check the likely encoding, BOM, and newlines before reading

The first rule is this.

Before reading an existing file that contains Japanese, check its likely encoding, BOM presence, and newline style, and if anything looks suspicious, do not proceed to interpreting the content as-is.

The point is to change the workflow so that Codex inspects the file’s basic properties (encoding, BOM, newlines) before it interprets the text.
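As a rough illustration of what such a pre-read check could look like, the Python sketch below inspects raw bytes for a BOM, tries strict decodes against a short candidate list, and classifies the newline style. The function name `probe`, the candidate list, and the returned dictionary shape are assumptions made for this example.

```python
import codecs

# Longest BOMs first so UTF-32 LE is not mistaken for UTF-16 LE.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def probe(data: bytes, candidates=("utf-8", "cp932")) -> dict:
    """Report BOM, plausible encodings, and newline style without changing anything."""
    bom = next((name for mark, name in BOMS if data.startswith(mark)), None)
    plausible = []
    for enc in ([bom] if bom else candidates):
        try:
            data.decode(enc)  # strict: any invalid byte rejects the candidate
            plausible.append(enc)
        except UnicodeDecodeError:
            pass
    # Count newlines on decoded text when possible, because UTF-16 bytes hide "\r\n".
    sample = data.decode(plausible[0]) if plausible else data.decode("latin-1")
    crlf = sample.count("\r\n")
    lf = sample.count("\n") - crlf
    newline = "mixed" if crlf and lf else "crlf" if crlf else "lf" if lf else "none"
    return {"bom": bom, "plausible": plausible, "newline": newline}

print(probe("日本語\r\n".encode("cp932")))
```

A successful strict decode only means the bytes are valid in that encoding, not that the encoding is right. An empty or multi-entry `plausible` list is a reason to stop and report rather than pick one.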

3.2 Do not let it save a file with suspected mojibake based on guesswork

This one is especially important.

When mojibake is suspected, treat the file as read-only during investigation and prohibit overwriting until the encoding interpretation is credible.

The same applies to humans: never save a file you have not actually been able to read. Saving on the strength of “it looks a bit broken, but this is probably right” makes the guess permanent.

3.3 Preserve existing files; default to UTF-8 only for new files

In the context of mojibake prevention, “unify everything to UTF-8” is surprisingly dangerous.

Eventually deciding to move the whole repo to UTF-8 is a legitimate call, but it is safer to do that as a separate task while reviewing the diff and the blast radius. For everyday maintenance, this workflow is the stable one.

When editing an existing file, preserve its original encoding
When adding a new file, create it as UTF-8 per repo convention
If an existing file needs conversion, keep that separate from normal functional fixes
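One way to picture "preserve the original encoding" is an edit that works on bytes end to end, sketched below in Python. The helper name `edit_preserving` is hypothetical, the encoding is assumed to have been established beforehand, and only a UTF-8 BOM is handled.

```python
import codecs

def edit_preserving(data: bytes, encoding: str, old: str, new: str) -> bytes:
    """Apply a text replacement while keeping encoding, BOM, and newlines as they were."""
    had_bom = data.startswith(codecs.BOM_UTF8)
    body = data[len(codecs.BOM_UTF8):] if had_bom else data
    text = body.decode(encoding)  # strict: fail rather than guess
    # Guard: refuse to edit if untouched text would not round-trip byte for byte.
    if text.encode(encoding) != body:
        raise ValueError("round trip is not byte-identical; stop and report")
    edited = text.replace(old, new)  # existing "\r\n" sequences are left alone
    out = edited.encode(encoding)    # strict: fail if a character is not representable
    return (codecs.BOM_UTF8 if had_bom else b"") + out

src = "名前,住所\r\n山田,東京\r\n".encode("cp932")
out = edit_preserving(src, "cp932", "東京", "大阪")
print(out == "名前,住所\r\n山田,大阪\r\n".encode("cp932"))
```

Working on bytes rather than text-mode file handles avoids any implicit newline translation, and the round-trip guard catches files whose bytes would change even where no edit was intended.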

3.4 Do not let it use ambiguous write paths by default

What multiplies accidents on Windows is the attitude of “it’s just a small output, so a quick write from the shell will do.”

Dumping output directly via redirection
Saving directly with a convenience command
Promoting a temporary artifact straight into a production file

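These paths often have no explicit encoding, which makes them a breeding ground for accidents. For example, the `>` redirection operator in Windows PowerShell 5.1 writes UTF-16 LE by default, while PowerShell 7 writes UTF-8 without a BOM, so the same command can produce different bytes on different machines. It is therefore safest to also pin down, for Codex, how write mechanisms are chosen.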
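For contrast, a write path with an explicit encoding might look like the Python sketch below. The helper name `write_text_explicit` and the temp-file-then-replace approach are assumptions chosen for illustration, not a prescribed method.

```python
import os
import tempfile
from pathlib import Path

def write_text_explicit(path: Path, text: str, encoding: str) -> None:
    """Write text with a stated encoding; never rely on the platform default."""
    # Strict encode first: UnicodeEncodeError is raised before the disk is touched.
    payload = text.encode(encoding)
    fd, tmp = tempfile.mkstemp(dir=path.parent)
    try:
        with os.fdopen(fd, "wb") as f:  # binary mode: no newline translation
            f.write(payload)
        os.replace(tmp, path)           # swap in only after a complete write
    except BaseException:
        os.unlink(tmp)
        raise

target = Path(tempfile.mkdtemp()) / "settings.txt"
write_text_explicit(target, "設定\r\n", "cp932")
print(target.read_bytes() == "設定\r\n".encode("cp932"))
```

Encoding before opening the file means a character that the target encoding cannot represent stops the write instead of being replaced with `?`.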

3.5 After saving, have it re-read and verify representative Japanese lines

“It saved successfully” and “it is not broken” are not the same thing.

What matters is having it read representative Japanese lines again after saving and check points like these.

No replacement characters U+FFFD have crept in
No unnatural increase in ?
The diff is not a huge BOM-only or newline-only change
Japanese text that was not supposed to change is still intact
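A post-save check along these lines can be sketched in a few lines of Python. The function name `verify_saved` and its return format are hypothetical; the representative strings are whatever the task owner supplies.

```python
def verify_saved(data: bytes, encoding: str, must_contain: list[str]) -> list[str]:
    """Re-decode saved bytes and list anything that looks like mojibake damage."""
    try:
        text = data.decode(encoding)  # strict: invalid bytes are themselves a finding
    except UnicodeDecodeError as exc:
        return [f"not valid {encoding}: {exc}"]
    problems = []
    if "\ufffd" in text:
        problems.append("contains U+FFFD replacement characters")
    for expected in must_contain:
        if expected not in text:
            problems.append(f"missing representative string: {expected!r}")
    return problems

good = "件名: 請求書\r\n".encode("cp932")
print(verify_saved(good, "cp932", ["請求書"]))
```

An empty list means none of these checks fired, not that the file is proven intact. The diff-level checks (BOM-only or newline-only changes) need the bytes from before the edit as well.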

3.6 When warning signs appear, have it report before fixing

In encoding accidents, you limit the damage better by having Codex stop and report rather than forcing a fix.

For example, if any of these signs appear, it is safer to treat the situation as abnormal for the moment.

An increase in U+FFFD
An increase in ?
An unexpected BOM change
A large newline-only diff
Only the Japanese lines changing unnaturally and substantially
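Several of these signs can be detected mechanically by comparing the file's bytes before and after an edit. The Python sketch below is one possible shape for such a check; the function name `warning_signs` and the wording of the messages are assumptions, and only the UTF-8 BOM is considered.

```python
import codecs

def warning_signs(before: bytes, after: bytes, encoding: str) -> list[str]:
    """Compare a file's bytes before and after an edit and name any red flags."""
    old = before.decode(encoding, errors="replace")
    new = after.decode(encoding, errors="replace")
    signs = []
    if new.count("\ufffd") > old.count("\ufffd"):
        signs.append("U+FFFD increased")
    if new.count("?") > old.count("?"):
        signs.append("'?' increased")
    if before.startswith(codecs.BOM_UTF8) != after.startswith(codecs.BOM_UTF8):
        signs.append("BOM changed")
    # Same text once newlines are normalized, but a different number of CRLFs.
    same_text = old.replace("\r\n", "\n") == new.replace("\r\n", "\n")
    if same_text and old.count("\r\n") != new.count("\r\n"):
        signs.append("newline-only change")
    return signs

base = "値=テスト\n".encode("utf-8")
print(warning_signs(base, "値=???\n".encode("utf-8"), "utf-8"))
```

A non-empty result is meant as a cue to stop and report, not as input to an automatic repair.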

4. A Short Instruction to Attach to Tasks

If you want a short version to attach to each task, this much is already quite effective.

In this task, avoiding encoding accidents is the top priority.

- For existing files containing Japanese, check the likely encoding, BOM presence, and newline style before reading
- Do not save files with suspected mojibake based on guesswork
- Preserve the original encoding / BOM / newlines of existing files
- Create new files as UTF-8 per repo convention
- Only use write methods where the encoding can be made explicit
- After saving, re-read the file and confirm that representative Japanese lines are intact
- Report as abnormal any increase in `U+FFFD` or `?`, BOM / newline accidents, or large diffs

If the target files are already known, adding this one line stabilizes things considerably.

Target files: <paths> / Representative strings: "<examples>"

Providing representative strings is remarkably effective. It gives Codex a concrete watch point: “this Japanese text must not break.”

5. A Template Worth Keeping in `AGENTS.md`

Rather than repeating the same warnings over and over, put them in AGENTS.md. Below is a practice-oriented template for repos that handle Japanese files on Windows.

# Text Encoding Rules

## Scope
This repository may contain Japanese text and mixed legacy encodings.
Avoid mojibake and accidental re-encoding above all else.

## Mandatory Rules
- Before reading or editing an existing text file that may contain Japanese, first determine:
  - likely encoding
  - BOM presence
  - newline style
- If mojibake is suspected, do not save the file until the encoding interpretation is credible.
- Preserve the original encoding, BOM, and newline style for existing files.
- Treat "convert to UTF-8" as a separate, explicit task.
- New files should follow repository convention. If there is no clear rule, prefer UTF-8 and state whether BOM is used.
- Do not use ambiguous write paths by default, such as shell redirection or convenience commands without explicit encoding control.
- After writing, reopen the file and verify representative Japanese lines.
- If any of the following appears, stop and report:
  - replacement characters
  - unexpected `?`
  - unintended BOM change
  - unintended newline conversion
  - whole-file diffs without a business reason

## Reporting Format
For each changed text file, report:
- path
- detected or preserved encoding
- BOM presence
- newline style
- how verification was performed
- whether representative Japanese text remained intact

The strength of this template is that it pins down not just how to edit but how not to break things. In particular, these two lines pull a lot of weight:

If mojibake is suspected, do not save ...
Treat "convert to UTF-8" as a separate, explicit task.

6. Bad Instructions vs. Good Instructions

In mojibake prevention, the granularity of your instructions strongly shapes the outcome.

Bad instruction	Good instruction
Fix the mojibake	First determine whether the file itself is corrupted or it is only a display-side issue, and do not save based on guesswork
Convert everything to UTF-8	Preserve the original encoding of existing files; only create new files as UTF-8 per repo convention. Make converting existing files a separate task
Output a CSV	Match the encoding used in existing operations, make the encoding explicit when writing, and re-read the Japanese columns after output to verify
Fix whatever you can read	Do not save anything you are unsure about; report candidates and your reasoning instead
Just make it match	Do not change the BOM, newlines, or encoding on your own; make sure the diff contains only the business change

The point is to always spell out both the checks to run before touching anything and the verification to run after saving.
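Taking the CSV row as an example, the "good instruction" could translate into something like the following Python sketch. The helper name `write_csv_checked` is hypothetical, and the encoding passed in is assumed to match what the downstream consumer of the CSV already expects.

```python
import csv
import tempfile
from pathlib import Path

def write_csv_checked(path: Path, rows: list[list[str]], encoding: str) -> None:
    """Write rows with an explicit encoding, then re-read and compare every cell."""
    # newline="" lets the csv module control line endings (CRLF by default).
    with open(path, "w", encoding=encoding, errors="strict", newline="") as f:
        csv.writer(f).writerows(rows)
    with open(path, "r", encoding=encoding, newline="") as f:
        if list(csv.reader(f)) != rows:
            raise ValueError(f"{path}: re-read rows differ from what was written")

out_path = Path(tempfile.mkdtemp()) / "out.csv"
rows = [["名前", "備考"], ["山田", "東京, 港区"]]
write_csv_checked(out_path, rows, "cp932")
print(out_path.read_bytes() == '名前,備考\r\n山田,"東京, 港区"\r\n'.encode("cp932"))
```

The re-read step also exercises quoting, so a cell that contains a comma is checked along with the Japanese text.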

7. A Checklist for Review Time

After Codex has done the work, it also helps to settle on a fixed set of checkpoints for the human reviewer.

Is the encoding / BOM / newline handling reported for each changed file?
Have only the Japanese lines changed unnaturally and substantially?
Are there large newline-only diffs?
Has U+FFFD or ? increased?
Are there whole-file diffs unrelated to the business change?
Have columns or quoting broken in CSVs or logs?

What matters in mojibake prevention is stopping suspicious diffs early, more than accumulating successful diffs.

8. Summary

When you have Codex handle Japanese files on Windows, the first thing that helps is not perfecting your machine’s setup — it is explicitly giving Codex the encoding work procedure.

The five points worth remembering:

Have it check encoding / BOM / newlines before reading
If mojibake is suspected, do not let it save on guesswork
Preserve existing files; steer only new files toward UTF-8
Ban ambiguous write paths
Have it re-read after saving and verify representative Japanese lines

And if you find yourself saying it every time, put it in AGENTS.md. That is the most practical move.

The core of mojibake prevention is not asking it to “handle Japanese properly” — it is writing down the conditions under which saving is allowed and the conditions under which it must stop. Once you have that in writing, Codex becomes far easier to work with even on Windows.



Frequently Asked Questions

Common questions about the topic of this article.

Why does Codex garble Japanese text on Windows?

The root cause is usually not that Codex is weak at Japanese, but that Windows assets mix multiple encodings and write paths. Newer sources and Markdown tend to be UTF-8 while older CSVs, TXT files, logs, and configs are CP932-family, with some UTF-16 artifacts and mixed LF/CRLF newlines. If Codex misinterprets the bytes even once and then saves, the problem stops being a display issue and becomes permanent corruption of the file itself. That is why prevention comes down to managing the I/O procedure.

What prompting rules prevent mojibake accidents with AI coding tools?

Five rules do most of the work: have the tool check the likely encoding, BOM presence, and newline style before reading any existing file with Japanese text; forbid saving a file with suspected mojibake based on guesswork; preserve the original encoding, BOM, and newlines of existing files while creating new files as UTF-8; ban ambiguous write paths such as shell redirection without explicit encoding; and re-read the file after saving to verify representative Japanese lines are intact.

Why is telling an AI to convert everything to UTF-8 dangerous?

Because for everyday maintenance it silently re-encodes files that other tools and workflows still expect in their original encoding. Moving a whole repository to UTF-8 can be a legitimate decision, but it is safer done as a separate, explicit task where you review the diff and the blast radius. Day to day, the stable workflow is to preserve the original encoding when editing existing files and only default to UTF-8 for newly created files.

Where should encoding rules for Codex be written down?

If your workflow has Codex read AGENTS.md, encoding rules belong there permanently rather than being repeated verbally in every session. The rules should cover checking encoding, BOM, and newlines before reading, prohibiting saves when mojibake is suspected, preserving existing files, using only write methods with explicit encoding, and stopping to report warning signs such as replacement characters or unexpected BOM changes. For individual tasks, adding target file paths and representative Japanese strings to watch is remarkably effective.

Author Profile

Go Komura

Representative of KomuraSoft LLC

Focused on Windows software development, technical consulting, and investigations into failures that are difficult to reproduce.
