Do UUIDs Really Not Collide? Operational and Implementation Patterns That Cause Duplicates
You picked UUID as your primary key, and one day a duplicate key error shows up. The first reaction is usually, “So UUIDs do collide after all.”
In practice, however, most UUID duplications in real systems are not caused by the UUID specification itself. They come from implementations and operations that quietly break the assumptions the specification depends on.
1. The short version: a list of dangerous patterns
| Pattern | What goes wrong | First thing to do |
|---|---|---|
| Hand-rolled UUIDv4 using a fixed seed or weak PRNG | The same sequence reproduces across processes or nodes | Use the OS / runtime’s standard UUID API |
| Generation state carried over after fork, VM snapshot, or container clone | Random and counter state rolls back, producing duplicates | Reseed after fork; reinitialize after clone |
| Treating UUIDv3/v5 as “a fresh ID every time” | The same input always regenerates the same UUID | Treat them as deterministic IDs and limit their use |
| Custom implementations of UUIDv1/v6/v7/v8 | Duplicates surface under high throughput or across nodes | Use existing libraries; avoid homemade generators |
| Truncating UUIDs partway through | You throw away the original 128 bits of uniqueness yourself | Store and compare at full length |
| No UNIQUE / PRIMARY KEY at the DB layer | Duplicates slip in silently and are hard to trace | Keep a uniqueness constraint at the storage layer |
2. What each UUID version actually guarantees
| Version | Approach | Things to watch for |
|---|---|---|
| UUIDv4 | Random-based (122 random bits) | Must use a CSPRNG |
| UUIDv7 | Timestamp + random / counter | Sortable; counter design matters under high-rate generation |
| UUIDv3 / v5 | Name-based (deterministic) | Same input always produces the same UUID. Not for issuing fresh IDs. |
| UUIDv8 | Experimental / vendor-defined | Uniqueness is implementation-defined. Do not assume it. |
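As a rough illustration of the table above, the version is encoded in the UUID itself, so it can be checked when an ID enters your system. This is a minimal sketch using Python's standard uuid module; the describe helper and its labels are assumptions made for this example, not part of any library.

```python
import uuid

# Labels mirror the table above; the wording is illustrative only.
KINDS = {
    3: "name-based (MD5), deterministic",
    4: "random",
    5: "name-based (SHA-1), deterministic",
    7: "timestamp + random / counter",
    8: "vendor-defined, uniqueness not implied",
}


def describe(value: str) -> str:
    """Return a short label for the version encoded in a UUID string."""
    parsed = uuid.UUID(value)  # raises ValueError on malformed input
    if parsed.version is None:
        return "not an RFC-variant UUID"
    return f"v{parsed.version}: {KINDS.get(parsed.version, 'other')}"


print(describe(str(uuid.uuid4())))
print(describe(str(uuid.uuid5(uuid.NAMESPACE_URL, "https://example.com/"))))
```

A check like this does not prove how an ID was generated. It only tells you which contract the generator claimed to follow.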
3. Pattern 1: Hand-rolling UUIDv4 with a weak PRNG
A common variant: pull 128 bits from a general-purpose PRNG comparable to Math.random(), and seed it at startup with time() or the PID. The output looks like a UUID, but if the random source is weak, the same sequence reproduces in another process.
RFC 9562 says implementations SHOULD use a CSPRNG so that generated values are both unlikely to collide and hard to predict. Python’s documentation likewise describes uuid.uuid4() as generating a random UUID in a cryptographically secure way.
What to do: don’t roll your own UUID. Don’t tweak the random seed by hand. Use the standard library as is.
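The failure mode can be reproduced in a few lines. The sketch below is an assumption-laden illustration: weak_uuid4 is a hypothetical hand-rolled generator, and the shared seed stands in for two processes that both seed from the same second or PID.

```python
import random
import uuid


def weak_uuid4(rng: random.Random) -> uuid.UUID:
    """Anti-pattern: a v4-shaped UUID built from a seedable, non-cryptographic PRNG."""
    return uuid.UUID(int=rng.getrandbits(128), version=4)


# Two "processes" that happen to start with the same seed value.
process_a = random.Random(1700000000)
process_b = random.Random(1700000000)

ids_a = [weak_uuid4(process_a) for _ in range(3)]
ids_b = [weak_uuid4(process_b) for _ in range(3)]
print(ids_a == ids_b)  # True: same seed, same "random" UUIDs

# The standard API draws from the OS random source on each call.
print(uuid.uuid4() == uuid.uuid4())  # False
```

Both lists contain well-formed version 4 UUIDs, which is why this bug tends to survive code review: nothing about the output looks wrong.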
4. Pattern 2: Rolling back generator state via fork, snapshot, or clone
- Restoring multiple instances from the same VM snapshot
- A custom generator starting from the same initial state every time a container image launches
- Sharing PRNG or counter state across forked workers
In these setups, the UUID generation sequence can be unintentionally replayed.
What to do:
- Don’t keep your own UUID generation state alive for long
- Reinitialize immediately after fork / clone / restore
- Where possible, lean on implementations that pull from OS-provided randomness on every call
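The replay can be simulated without a real snapshot. In this sketch, StatefulGenerator is a hypothetical in-house generator, and copy.deepcopy stands in for whatever duplicates process memory (a fork, a VM restore, a cloned container).

```python
import copy
import random
import uuid


class StatefulGenerator:
    """Anti-pattern: holds PRNG state in memory, so any clone replays it."""

    def __init__(self) -> None:
        self._rng = random.Random()  # seeded once at construction

    def next_id(self) -> uuid.UUID:
        return uuid.UUID(int=self._rng.getrandbits(128), version=4)

    def reinitialize(self) -> None:
        # Call right after fork / clone / restore to discard inherited state.
        self._rng.seed()  # no argument: reseeds from the OS source if available


original = StatefulGenerator()
clone = copy.deepcopy(original)  # stands in for a snapshot restore or fork

first_from_original = original.next_id()
first_from_clone = clone.next_id()
print(first_from_original == first_from_clone)  # True: the state was replayed

clone.reinitialize()
second_from_original = original.next_id()
second_from_clone = clone.next_id()
print(second_from_original == second_from_clone)  # False after reseeding
```

On Unix, os.register_at_fork(after_in_child=...) is one place such a reinitialization hook could be attached. Calling uuid.uuid4() directly avoids the problem for the fork case, because it keeps no generator state of its own in the process.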
5. Pattern 3: Treating UUIDv3/v5 as fresh-issued IDs
UUIDv3/v5 are not “non-colliding ID issuers.” They are “same input, same ID.” RFC 9562 also states that UUIDs derived from the same namespace and name must be equal.
Wrong usage examples:
- Calling uuid5(NAMESPACE_URL, "https://example.com/users/42") and treating it as fresh ID issuance every time
- Issuing IDs from a single global namespace plus email, with no tenant in the namespace
What to do: understand UUIDv3/v5 as deterministic IDs and limit their use. Don’t leave the namespace design vague.
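A short sketch of both points, using Python's uuid.uuid5. The APP_NAMESPACE value, the "tenant:" prefix, and the email normalization rule are assumptions chosen for this example; a real system would document its own namespace design.

```python
import uuid

url = "https://example.com/users/42"
first = uuid.uuid5(uuid.NAMESPACE_URL, url)
second = uuid.uuid5(uuid.NAMESPACE_URL, url)
print(first == second)  # True: same namespace and name, same UUID

# Hypothetical root namespace for one application.
APP_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "app.example.com")


def tenant_namespace(tenant_id: str) -> uuid.UUID:
    """Derive a per-tenant namespace so equal names in different tenants differ."""
    return uuid.uuid5(APP_NAMESPACE, f"tenant:{tenant_id}")


def user_id(tenant_id: str, email: str) -> uuid.UUID:
    """Deterministic user ID; the normalization here is an assumed policy."""
    return uuid.uuid5(tenant_namespace(tenant_id), email.strip().lower())


print(user_id("acme", "a@example.com") == user_id("globex", "a@example.com"))  # False
```

The deterministic behavior is the feature. It is useful for idempotent imports and deduplication, and harmful only when the caller expects a new ID per call.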
6. Pattern 4: Hand-rolling time-based UUIDs
UUIDv1/v6/v7/v8 are dangerous to mimic at the surface level.
- v1 / v6: don’t assume “MAC address therefore unique.” RFC 9562 explicitly notes that the rise of virtual machines and containers means MAC address uniqueness is no longer guaranteed.
- v7: an implementation that issues many IDs within the same millisecond without a counter design, or that keeps generating when the clock moves backwards, is dangerous.
- v8: an in-house UUID built as “timestamp + shard id + a bit of randomness” makes its own design document the uniqueness specification. Shipping it without review is risky.
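To make the v7 concerns concrete, here is a sketch of what a generator has to handle: many IDs in one millisecond, counter overflow, and a clock that steps backwards. It is an illustration of the bookkeeping involved, not a recommendation to ship your own. The class name, the 12-bit counter in the rand_a field, and the "keep the last timestamp on rollback" policy are assumptions made for this example.

```python
import os
import threading
import time
import uuid
from typing import Callable


class UUIDv7Sketch:
    """Illustrative UUIDv7-style generator with a 12-bit per-millisecond counter."""

    def __init__(self, clock_ms: Callable[[], int] = lambda: time.time_ns() // 1_000_000) -> None:
        self._clock_ms = clock_ms
        self._last_ms = -1
        self._counter = 0
        self._lock = threading.Lock()

    def next_id(self) -> uuid.UUID:
        with self._lock:
            now = self._clock_ms()
            if now > self._last_ms:
                self._last_ms = now
                # Random start with the top bit clear leaves headroom to count up.
                self._counter = int.from_bytes(os.urandom(2), "big") & 0x7FF
            else:
                # Same millisecond, or the clock moved backwards:
                # keep the last timestamp and advance the counter instead.
                self._counter += 1
                if self._counter > 0xFFF:
                    self._last_ms += 1  # counter exhausted: borrow the next millisecond
                    self._counter = 0
            rand_b = int.from_bytes(os.urandom(8), "big") & ((1 << 62) - 1)
            value = (
                ((self._last_ms & ((1 << 48) - 1)) << 80)  # 48-bit Unix time in ms
                | (0x7 << 76)                              # version 7
                | (self._counter << 64)                    # 12-bit counter (rand_a)
                | (0b10 << 62)                             # RFC variant
                | rand_b                                   # 62 random bits
            )
            return uuid.UUID(int=value)


gen = UUIDv7Sketch()
batch = [gen.next_id() for _ in range(1000)]
print(len(set(batch)) == len(batch))  # True
```

Even this sketch only covers a single process. Uniqueness across nodes still rests on the random bits, which is one more reason to prefer a reviewed library implementation.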
7. Pattern 5: Shortening UUIDs in flight
- Using only the leading 8 characters as a foreign key
- Squeezing a 128-bit UUID into a 64-bit integer
- A string column that’s too short, so the tail is silently truncated
- Treating the abbreviated form used in logs or UI as a uniqueness key
Changing the representation itself is fine. Stripping hyphens, normalizing case, or storing as 16 binary bytes are all conversions that preserve all 128 bits. The dangerous move is dropping the bits that uniqueness depends on.
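The difference between converting and truncating can be checked directly. The sketch below uses the standard birthday approximation to estimate collision probability; collision_probability is a helper written for this example, and the ID counts are arbitrary illustrative figures.

```python
import math
import uuid

original = uuid.uuid4()

# Lossless conversions: all 128 bits survive the round trip.
print(uuid.UUID(hex=original.hex.upper()) == original)  # True
print(uuid.UUID(bytes=original.bytes) == original)      # True


def collision_probability(n_ids: int, bits: int) -> float:
    """Birthday approximation: chance of at least one collision among n_ids values."""
    return -math.expm1(-n_ids * (n_ids - 1) / (2 * 2**bits))


# Leading 8 hex characters keep only 32 bits.
print(collision_probability(100_000, 32))         # roughly 0.69
# Squeezing into a 64-bit integer keeps at most 64 bits.
print(collision_probability(1_000_000_000, 64))
# Full UUIDv4 has 122 random bits.
print(collision_probability(1_000_000_000, 122))
```

With only 32 bits, a table of one hundred thousand rows is already more likely than not to contain a collision, while the full-length figure stays negligibly small at a billion IDs.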
8. Pattern 6: No uniqueness constraint at the database
Even if UUIDs are statistically very unlikely to collide, if you genuinely cannot tolerate a duplicate, the storage layer needs a uniqueness constraint too. RFC 9562 itself states that while UUIDs can offer practically sufficient uniqueness, true global uniqueness can never be absolutely guaranteed.
The practical baseline:
- Treat UUIDs as collision-resistant IDs
- Keep a UNIQUE / PRIMARY KEY in the DB as the last line of defense
- Design retry, idempotency, and incident logging for the duplicate case
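One way this baseline might look, sketched with Python's built-in sqlite3 module. The orders table, the insert_order helper, and the retry limit are assumptions for this example; other databases raise their own duplicate-key errors.

```python
import logging
import sqlite3
import uuid
from typing import Callable

conn = sqlite3.connect(":memory:")
# The PRIMARY KEY is the actual uniqueness guarantee, stored as 16 raw bytes.
conn.execute("CREATE TABLE orders (id BLOB PRIMARY KEY, note TEXT NOT NULL)")


def insert_order(
    conn: sqlite3.Connection,
    note: str,
    new_id: Callable[[], uuid.UUID] = uuid.uuid4,
    max_attempts: int = 3,
) -> uuid.UUID:
    """Insert a row, retrying with a fresh ID if the constraint rejects a duplicate."""
    for attempt in range(1, max_attempts + 1):
        candidate = new_id()
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT INTO orders (id, note) VALUES (?, ?)",
                    (candidate.bytes, note),
                )
            return candidate
        except sqlite3.IntegrityError:
            # Never swallow this silently: a duplicate points at a generator problem.
            logging.warning("duplicate order id %s (attempt %d)", candidate, attempt)
    raise RuntimeError("could not allocate a unique order id")


print(insert_order(conn, "first order"))
```

The log line matters as much as the retry. A duplicate that is retried away without a trace hides exactly the generator or state-management bug this article is about. Note that sqlite3.IntegrityError also covers other constraint violations, so production code would want to distinguish them.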
9. A practical checklist
- Confirm you are not generating UUIDs yourself: fall back to the standard API wherever possible
- Pin the UUID version as part of the spec: v4 is random, v7 is timestamp plus random / counter, v3/v5 are deterministic, v8 is custom
- Audit how seeds and generator state are handled: never inherit the same state after fork, snapshot, or clone
- Verify that storage keeps the full length: don’t promote an abbreviated display form to the canonical key
- Put UNIQUE / PRIMARY KEY at the DB layer: UUIDs lower the probability; they are not the constraint
- Make duplicates observable: never swallow a duplicate key; keep it traceable
Wrap-up
UUID collision incidents almost never start from UUIDs being mathematically weak. They start from implementations and operations that break the assumptions UUIDs are built on. When a duplicate shows up, the first thing to question isn’t the math of UUIDs but the generator, the state management, the storage format, and the constraint design.