When UUIDs Collide in Practice: Bad Operational and Implementation Patterns
When a table uses UUID as its primary key and a duplicate key error still appears, the first reaction is often simple:
“So UUIDs do collide after all.”
In practice, though, most UUID collision incidents are not caused by the UUID specification itself. They are caused by implementation and operational choices that break the assumptions the specification relies on. RFC 9562 defines UUIDv4 with 122 random bits, UUIDv7 with a time field plus 74 bits used for uniqueness, and UUIDv8 as explicitly implementation-specific. RFC 9562 is also very clear that UUIDv8 uniqueness must not be assumed. Python’s standard uuid documentation likewise says uuid4() generates its value with a cryptographically secure method. In other words, when teams use the standard forms correctly, the built-in assumptions are already quite strong.[1][2][3][4]
This article focuses on the bad operational and implementation patterns that turn UUID collisions into a real production problem.
The discussion is based on RFC 9562, Python documentation, and PostgreSQL documentation that could be verified as of March 2026.[5][4][6]
Contents
- 1. The short answer
- 2. Start by checking the version and generation model
- 3. Pattern 1: A weak or fixed-seed PRNG is pretending to be UUIDv4
- 4. Pattern 2: Generator state is replayed after fork, snapshot, or clone
- 5. Pattern 3: UUIDv3 or UUIDv5 is being used like a fresh ID allocator
- 6. Pattern 4: Time-based UUIDs or UUIDv8 are implemented by hand
  - 6.1 UUIDv1 / v6 with careless node or clock-sequence handling
  - 6.2 UUIDv7 without proper rollover and rollback logic
  - 6.3 UUIDv8 treated as “the newer UUID”
- 7. Pattern 5: The UUID is shortened or collapsed later
- 8. Pattern 6: The database has no uniqueness backstop
- 9. Practical checklist
- 10. Wrap-up
- 11. Related articles
- 12. References
1. The short answer
These are the failure patterns worth suspecting first.
| Pattern | What actually goes wrong | First fix |
|---|---|---|
| Building “UUIDv4-like” values with a weak or fixed-seed PRNG | Different processes or nodes reproduce the same sequence | Use the runtime or OS UUID API directly |
| Reusing generator state after fork, VM snapshot, or container cloning | Random or counter state gets replayed | Reseed after state changes and reinitialize cloned generators |
| Using UUIDv3 / UUIDv5 as if they were always-fresh IDs | Same namespace + same name yields the same UUID by design | Treat them as deterministic IDs and use them intentionally |
| Hand-rolling UUIDv1 / v6 / v7 / v8 logic | Clock rollback, node reuse, or bad counters create duplicates | Remove custom generators where possible |
| Truncating or collapsing UUIDs later | You throw away the 128-bit uniqueness you thought you had | Store and compare the full value |
| Omitting UNIQUE / PRIMARY KEY constraints | Duplicates enter silently and diagnosis arrives late | Keep a storage-layer uniqueness guard |
The common theme is simple: the collision is often not happening because UUID is weak. It is happening because the system stops preserving the uniqueness properties UUID depends on.
2. Start by checking the version and generation model
A lot of confusion comes from talking about “UUID” as if all UUIDs behaved the same way.
- UUIDv4 is random-based. RFC 9562 defines 122 bits for random data after version and variant bits are reserved.[1]
- UUIDv7 is time-ordered, with a Unix-millisecond timestamp and the remaining bits used for uniqueness through random data and, optionally, carefully seeded counters.[2]
- UUIDv3 / v5 are name-based and deterministic. Same namespace plus same canonical name is supposed to produce the same UUID.[7]
- UUIDv8 is for experimental or vendor-specific layouts, and its uniqueness is implementation-specific.[3]
So before debugging a collision report, verify what is actually being generated:
- a standard-library uuid4()
- a library uuid7()
- a custom timestamp + random formatter
- a uuid5(namespace, name) call
- an in-house “UUIDv8-style” format
Those are not operationally equivalent.
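As a first triage step, the version and variant can be read directly from an observed value. The following is a minimal sketch using Python's standard uuid module. The describe helper and the wording of its labels are hypothetical and only meant for illustration.

```python
import uuid

# Hypothetical triage labels; the wording is illustrative, not quoted from RFC 9562.
GENERATION_MODEL = {
    1: "time-based (timestamp + clock sequence + node)",
    3: "name-based, MD5 (deterministic)",
    4: "random",
    5: "name-based, SHA-1 (deterministic)",
    6: "time-based, reordered v1 layout",
    7: "time-ordered (Unix ms timestamp + random data or counter)",
    8: "custom layout (uniqueness is implementation-specific)",
}

def describe(value: str) -> str:
    """Return a short description of the version field embedded in a UUID string."""
    parsed = uuid.UUID(value)
    if parsed.variant != uuid.RFC_4122:
        return "not an RFC 4122 / RFC 9562 variant UUID"
    model = GENERATION_MODEL.get(parsed.version, "unknown version")
    return f"v{parsed.version}: {model}"

print(describe(str(uuid.uuid4())))
print(describe(str(uuid.uuid5(uuid.NAMESPACE_DNS, "example.com"))))
```

The version field only reports what the generator claims. A custom formatter can stamp version 4 onto weak random data, so this check narrows the search but does not replace reading the generator code.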
flowchart TD
A[Duplicate UUID is observed] --> B{Where was sameness introduced?}
B --> C[Weak generator]
B --> D[State rollback or cloning]
B --> E[Name-based UUID misuse]
B --> F[Truncation or serialization loss]
B --> G[Missing DB uniqueness guard]
C --> H[Implementation or operational failure]
D --> H
E --> H
F --> H
G --> H
In real incidents, this branch-based view is much faster than debating UUID probability in the abstract.
3. Pattern 1: A weak or fixed-seed PRNG is pretending to be UUIDv4
This is the most common implementation mistake.
Examples include:
- building 128 bits from a general-purpose PRNG
- seeding once from time or PID
- formatting 32 hex characters into a UUID-like string
The output may look like UUID, but if the random source is weak, the sequence can repeat across processes, machines, or restarts.
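The failure is easy to reproduce. The sketch below assumes a hypothetical in-house helper, weak_uuid4, that formats output from Python's general-purpose random.Random as a version-4 UUID, and two processes that happen to seed from the same startup second.

```python
import random
import uuid

def weak_uuid4(rng: random.Random) -> uuid.UUID:
    """Hypothetical in-house helper: formats PRNG output as a version-4 UUID."""
    return uuid.UUID(int=rng.getrandbits(128), version=4)

# Two "processes" that seed from the same value, e.g. the same startup second.
seed = 1_700_000_000
process_a = random.Random(seed)
process_b = random.Random(seed)

ids_a = [weak_uuid4(process_a) for _ in range(3)]
ids_b = [weak_uuid4(process_b) for _ in range(3)]

# Every value carries a valid version-4 marker, yet both sequences are identical.
print(ids_a == ids_b)  # True

# The standard API draws from the OS entropy source instead of a seeded PRNG.
safe_id = uuid.uuid4()
```

Nothing in the output format reveals the problem: both sequences parse as well-formed UUIDv4 values.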
RFC 9562 recommends using a CSPRNG to get both low collision likelihood and low predictability. It also specifically says CSPRNG state should be properly reseeded after process-state changes such as forks.[8]
Python’s documentation mirrors the practical recommendation by stating that uuid.uuid4() generates a random UUID using a cryptographically secure method.[4]
The practical rule is straightforward:
- do not hand-roll UUID generation
- do not manually manage randomness unless you truly have to
- prefer standard runtime UUID APIs over “lightweight” in-house helpers
When teams keep a custom UUID generator because it feels small and harmless, they are usually creating future incident debt.
4. Pattern 2: Generator state is replayed after fork, snapshot, or clone
The next dangerous class of failures is state replay.
RFC 9562 explicitly calls out two relevant ideas:
- CSPRNG state should be reseeded after process forks[8]
- when an implementation lacks stable generator state, the frequency of regenerating clock sequence, counters, or random data increases, which increases the probability of duplicates[9]
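From that, an important operational inference follows: generator state can be replayed whenever a running system is copied or resumed. Typical situations are: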
- a VM snapshot is taken and restored multiple times
- a container image boots many replicas with the same generator initialization path
- worker processes are forked while carrying shared random or counter state
In all of those cases, the ID generation path can replay more state than the team expects.
RFC 9562 does not literally say “VM snapshots will cause UUID collisions.” That part is an inference. But it is a strong and practical one, because the RFC is very clear about reseeding after process-state changes and about the risks of poor generator-state handling.[8][9]
The fix is usually operational, not mathematical:
- avoid long-lived custom generator state where possible
- reinitialize generators after fork, restore, or clone
- prefer UUID implementations that ask the OS for entropy instead of preserving too much internal state
- document what “cold start” and “restored start” mean for ID generation
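The replay effect can be reproduced without a real fork or VM snapshot. The sketch below simulates it by saving and restoring the state of a non-cryptographic PRNG. The StatefulIdGenerator class and its methods are hypothetical stand-ins for an in-house generator, not an API from any library.

```python
import os
import random
import uuid

class StatefulIdGenerator:
    """Hypothetical in-house generator that keeps PRNG state in process memory."""

    def __init__(self) -> None:
        self._rng = random.Random(os.urandom(16))

    def next_id(self) -> uuid.UUID:
        return uuid.UUID(int=self._rng.getrandbits(128), version=4)

    def snapshot(self):
        # Stands in for a VM snapshot, image clone, or fork copying memory.
        return self._rng.getstate()

    def restore(self, state) -> None:
        self._rng.setstate(state)

    def reseed(self) -> None:
        # Pull fresh entropy from the OS instead of trusting restored state.
        self._rng.seed(os.urandom(16))

gen = StatefulIdGenerator()
snapshot = gen.snapshot()
original_run = [gen.next_id() for _ in range(3)]

gen.restore(snapshot)  # a restored clone resumes from the captured state
replayed_run = [gen.next_id() for _ in range(3)]
print(original_run == replayed_run)  # True: the clone replays the same IDs

gen.restore(snapshot)
gen.reseed()  # reinitialize after restore
reseeded_run = [gen.next_id() for _ in range(3)]
print(original_run == reseeded_run)
```

On POSIX systems, os.register_at_fork(after_in_child=...) is one place where such a reseed hook could be attached for forked workers. Snapshot and clone events usually need an equivalent step in the platform's startup path.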
5. Pattern 3: UUIDv3 or UUIDv5 is being used like a fresh ID allocator
UUIDv3 and UUIDv5 are not “collision-resistant new ID each time” functions.
They are deterministic name-to-ID mappings.
RFC 9562 says that UUIDs generated from the same name, in the same namespace, using the same canonical format, must be equal.[7] That means duplicates are expected when teams do things like:
- using uuid5(namespace, url) as if it were a fresh event ID
- generating IDs from customer email without including tenant scope in the namespace design
- retrying the same name-based generation while assuming a new UUID should appear
The opposite problem also happens: if canonicalization is inconsistent, the same logical item can receive different UUIDs. RFC 9562 spends a surprising amount of effort clarifying canonical name representation for exactly this reason.[7][10]
The practical lessons are:
- UUIDv3 / v5 are deterministic, not random allocators
- namespace design must be explicit
- canonicalization rules must be treated as part of the ID specification
If the same input should always map to the same ID, v3 or v5 can be correct. If the goal is “give me a new unique value every time,” they are the wrong tool.
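Both failure directions, and one way to make namespace design explicit, can be shown with the standard uuid module. In this sketch, APP_NAMESPACE, customer_id, the example.com names, and the choice to trim and lowercase email addresses are all assumptions made for illustration.

```python
import uuid

# Hypothetical application-level namespace, derived once and then treated as a constant.
APP_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "ids.example.com")

# Deterministic by design: a retry with the same name yields the same UUID.
url = "https://example.com/orders/42"
first = uuid.uuid5(uuid.NAMESPACE_URL, url)
retry = uuid.uuid5(uuid.NAMESPACE_URL, url)
print(first == retry)  # True

# Inconsistent canonicalization: one logical customer, two different UUIDs.
raw = uuid.uuid5(APP_NAMESPACE, "Alice@Example.com")
lowered = uuid.uuid5(APP_NAMESPACE, "alice@example.com")
print(raw == lowered)  # False

def customer_id(tenant: str, email: str) -> uuid.UUID:
    """Tenant-scoped deterministic ID with an explicit canonicalization rule."""
    tenant_namespace = uuid.uuid5(APP_NAMESPACE, tenant)
    canonical_email = email.strip().lower()  # assumed rule; document it with the ID spec
    return uuid.uuid5(tenant_namespace, canonical_email)
```

With this shape, the same email in two tenants maps to two different IDs, while differently cased spellings inside one tenant map to the same ID.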
6. Pattern 4: Time-based UUIDs or UUIDv8 are implemented by hand
UUIDv1, v6, v7, and v8 are especially dangerous when teams copy only the shape and ignore the behavioral rules.
6.1 UUIDv1 / v6 with careless node or clock-sequence handling
RFC 9562 explains UUIDv6 as a reordered UUIDv1 for improved database locality, while preserving clock sequence and node concepts.[11] It also discusses generator state, distributed generation, and node collision resistance in detail.[9][12]
One especially important point appears early in the RFC motivation section: with the rise of virtual machines and containers, MAC address uniqueness can no longer be assumed.[5]
That makes several patterns dangerous:
- assuming MAC address means globally unique enough
- cloning machine images with embedded node IDs
- resetting clock sequence to a fixed value at each boot
6.2 UUIDv7 without proper rollover and rollback logic
UUIDv7 is very practical, but RFC 9562 is careful about monotonicity and counters. It explicitly says implementations must not knowingly return duplicates due to counter rollover, and it discusses rollback handling for clock and counter state.[2][13]
So these implementations are risky:
- generating many UUIDs per millisecond without a proper counter plan
- ignoring clock rollback entirely
- running multiple processes that each reinitialize the same internal generator logic independently
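To make the first two risks concrete, the sketch below shows roughly what a single-process generator has to track: a per-millisecond counter, a rule for counter rollover, and a rule for clock rollback. It is an illustration of the bookkeeping, not a recommended implementation; a maintained uuid7() from a runtime or library is still the better choice. The Uuid7Sketch class, its injectable clock parameter, and the use of the 12 rand_a bits as the counter are assumptions of this example.

```python
import os
import threading
import time
import uuid

class Uuid7Sketch:
    """Illustrative UUIDv7-style generator with a 12-bit counter in rand_a."""

    def __init__(self, clock=None) -> None:
        # clock returns Unix time in milliseconds; injectable for testing.
        self._clock = clock or (lambda: time.time_ns() // 1_000_000)
        self._lock = threading.Lock()
        self._last_ms = -1
        self._counter = 0

    def _fresh_counter(self) -> int:
        # Random start with the top bit clear, leaving headroom before rollover.
        return int.from_bytes(os.urandom(2), "big") & 0x7FF

    def generate(self) -> uuid.UUID:
        with self._lock:
            now_ms = self._clock()
            if now_ms > self._last_ms:
                self._last_ms = now_ms
                self._counter = self._fresh_counter()
            else:
                # Same millisecond or clock rollback: keep the last timestamp.
                self._counter += 1
                if self._counter > 0xFFF:
                    # Counter rollover: advance the timestamp instead of reusing values.
                    self._last_ms += 1
                    self._counter = self._fresh_counter()
            rand_b = int.from_bytes(os.urandom(8), "big") & ((1 << 62) - 1)
            value = (
                ((self._last_ms & ((1 << 48) - 1)) << 80)  # unix_ts_ms
                | (0x7 << 76)                              # version
                | (self._counter << 64)                    # rand_a used as counter
                | (0b10 << 62)                             # variant
                | rand_b
            )
            return uuid.UUID(int=value)

# A frozen clock forces every ID into the same millisecond.
frozen = Uuid7Sketch(clock=lambda: 1_700_000_000_000)
batch = [frozen.generate() for _ in range(10_000)]
print(len(set(batch)) == len(batch), batch == sorted(batch))  # True True
```

This state lives in one process. It does nothing for the third risk in the list above: separate processes that each run the same logic still depend on the random bits alone to stay apart, and a very large clock rollback would need a deliberate policy that this sketch does not include.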
6.3 UUIDv8 treated as “the newer UUID”
UUIDv8 is often misunderstood. RFC 9562 says the uniqueness of UUIDv8 is implementation specific and must not be assumed.[3]
That means a format such as:
- timestamp bits
- shard bits
- business-category bits
- “whatever random is left”
is not automatically safe because it still looks like UUID.
At that point, the real uniqueness contract is your design document, not the RFC.
7. Pattern 5: The UUID is shortened or collapsed later
Sometimes generation is correct, but the system destroys uniqueness later.
Typical examples:
- using only the first 8 characters as an external key
- folding a 128-bit UUID into a 64-bit integer
- storing UUID text in a column that is too short
- treating a display-friendly short form as the real unique identifier
The important distinction is that not all representation changes are bad.
These are usually fine because they preserve the full 128 bits:
- removing hyphens
- normalizing hex case
- storing the raw 16-byte binary form
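A quick round-trip check confirms that these forms are lossless. The sketch uses only Python's standard uuid module.

```python
import uuid

original = uuid.uuid4()

no_hyphens = original.hex       # 32 hex characters, hyphens removed
upper = str(original).upper()   # case-normalized text form
raw = original.bytes            # raw 16-byte binary form

# Each representation parses back to exactly the same 128-bit value.
print(uuid.UUID(hex=no_hyphens) == original)  # True
print(uuid.UUID(upper) == original)           # True
print(uuid.UUID(bytes=raw) == original)       # True
```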
The dangerous operations are the ones that discard information.
Once the system starts comparing prefixes, shortened forms, or lossy hashes, it is no longer operating on the same uniqueness guarantee.
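The size of the loss can be estimated with the usual birthday-bound approximation, p ≈ 1 − exp(−n(n−1) / 2^(b+1)) for n values drawn uniformly from b bits. The sketch below assumes the retained bits are uniformly random, which holds for a UUIDv4 prefix but not for a UUIDv7 prefix, where the leading bits are a timestamp. The function name and the 100,000-row example are illustrative.

```python
import math

def collision_probability(bits: int, count: int) -> float:
    """Approximate chance of at least one collision among `count` uniform `bits`-bit values."""
    exponent = -count * (count - 1) / 2 ** (bits + 1)
    return -math.expm1(exponent)  # 1 - exp(exponent), accurate for tiny probabilities

ROWS = 100_000
for label, bits in [
    ("first 8 hex chars", 32),
    ("folded to 64 bits", 64),
    ("full UUIDv4 (122 random bits)", 122),
]:
    print(f"{label:>30}: {collision_probability(bits, ROWS):.3e}")
```

For 100,000 rows, the 8-character prefix gives a collision chance of roughly two in three, while the full value stays far below anything observable in practice.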
8. Pattern 6: The database has no uniqueness backstop
Even when UUID generation is strong, storage should still defend itself if duplicates are unacceptable.
PostgreSQL’s documentation states that a unique constraint ensures the values in a column or group of columns are unique across the whole table, and that a primary key is both unique and not null.[6]
RFC 9562 makes the broader point too: UUIDs can provide practical uniqueness guarantees, but true global uniqueness cannot be absolutely guaranteed without shared knowledge, and applications should weigh the impact of collisions in context.[14]
So the practical baseline is:
- use UUIDs as low-collision identifiers
- keep UNIQUE or PRIMARY KEY constraints in the database
- design duplicate handling, retry logic, and idempotency intentionally
Using UUID does not remove the need for a uniqueness constraint. It reduces how often the constraint should fire.
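A small sketch of the backstop in action follows. It uses Python's built-in sqlite3 module with an in-memory database purely so the example runs anywhere; that is a substitution for illustration, and the same idea applies to a PRIMARY KEY or UNIQUE constraint in PostgreSQL. The orders table, the insert_order helper, and the deliberately broken generator are hypothetical.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id BLOB PRIMARY KEY, payload TEXT NOT NULL)")

def insert_order(payload: str, new_id=uuid.uuid4, max_attempts: int = 3) -> uuid.UUID:
    """Insert a row; on a duplicate key, report it and retry with a fresh ID."""
    for attempt in range(1, max_attempts + 1):
        candidate = new_id()
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT INTO orders (id, payload) VALUES (?, ?)",
                    (candidate.bytes, payload),
                )
            return candidate
        except sqlite3.IntegrityError:
            # Make the duplicate observable instead of letting it pass silently.
            print(f"duplicate id {candidate} rejected on attempt {attempt}")
    raise RuntimeError("could not allocate a unique id")

# Simulate a broken generator that returns the same value twice.
fixed = uuid.UUID("00000000-0000-4000-8000-000000000001")
broken = iter([fixed, fixed, uuid.uuid4()])

first = insert_order("order A", new_id=lambda: next(broken))
second = insert_order("order B", new_id=lambda: next(broken))
print(first != second)  # True: the constraint forced a new ID for the second row
```

Retrying with a new ID only makes sense for freshly allocated random identifiers. For deterministic or client-supplied IDs, a duplicate is more likely an idempotency signal and should be handled as such.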
9. Practical checklist
This is the condensed version that works well for audits and design reviews.
- Check whether UUID generation is custom
  If it can be replaced with uuid4() or uuid7() from a mature runtime or library, replace it first.
- Write down which UUID version is being used and why
  v4/v7 are random-oriented, v3/v5 are deterministic, and v8 is custom by definition.
- Audit seed and generator-state behavior
  Include forks, worker restarts, VM restores, and container cloning.
- Verify that the full UUID is preserved in storage and comparison
  Display shortcuts should stay display-only.
- Keep UNIQUE or PRIMARY KEY constraints at the storage layer
  UUID is not a substitute for constraints.
- Make duplicates observable
  When a duplicate happens, the team should be able to tell which generator, which node, and which deployment path produced it.
10. Wrap-up
Most UUID collision incidents are not really about UUID probability in the abstract. They are about breaking the conditions that make UUID generation safe in practice.
- weak randomness
- replayed generator state
- misuse of name-based UUIDs
- careless custom v7 or v8 designs
- truncation during storage or transport
- missing uniqueness constraints
So when duplicate UUIDs appear, the first question should usually not be “Did UUID fail?”
It should be:
Which part of our generator, state handling, serialization, or storage design removed the uniqueness guarantee we thought we had?
That question tends to lead to the real fix much faster.
11. Related articles
- How to Use FileSystemWatcher Safely - Lost Events, Duplicate Notifications, and the Traps Around Completion Detection
- Safe File Integration Locking - Best Practices for File Locks, Atomic Claims, and Idempotent Processing
12. References
[1] IETF RFC 9562, Section 5.4 UUID Version 4. Defines the random-bit layout used by UUIDv4.
[2] IETF RFC 9562, Section 5.7 UUID Version 7. Covers the timestamp plus uniqueness layout of UUIDv7.
[3] IETF RFC 9562, Section 5.8 UUID Version 8. Explains that UUIDv8 uniqueness is implementation specific and must not be assumed.
[4] Python 3.14 documentation, uuid module. Describes uuid4() as cryptographically secure and documents uuid5(), uuid7(), and uuid8() behavior.
[5] IETF RFC 9562, Universally Unique IDentifiers (UUIDs). The main standards document for UUID layouts and best practices.
[6] PostgreSQL documentation, Constraints. Defines UNIQUE constraints and PRIMARY KEY behavior.
[7] IETF RFC 9562, Section 6.5 Name-Based UUID Generation. Explains equality rules for same namespace and same canonical name.
[8] IETF RFC 9562, Section 6.9 Unguessability. Recommends CSPRNG use and proper reseeding after forks.
[9] IETF RFC 9562, Section 6.3 UUID Generator States. Covers stable generator state and duplicate-risk implications.
[10] IETF RFC 9562, Section 5.5 UUID Version 5. Defines namespace-plus-name generation and canonical input handling.
[11] IETF RFC 9562, Section 5.6 UUID Version 6. Covers UUIDv6 structure, node, and clock sequence behavior.
[12] IETF RFC 9562, Section 6.4 Distributed UUID Generation. Discusses node collision resistance in distributed systems.
[13] IETF RFC 9562, Section 6.2 Monotonicity and Counters. Covers counter rollover, clock rollback, and batch generation concerns.
[14] IETF RFC 9562, Sections 6.7 and 6.8. Covers collision resistance and practical uniqueness guarantees.