Do UUIDs Really Not Collide? Operational and Implementation Patterns That Cause Duplicates

UUID, Identifiers, Distributed Systems, Data Modeling, Software Design

You picked UUIDs as your primary key, and one day a duplicate key error shows up. The first reaction is usually, “So UUIDs do collide after all.”

In practice, however, most UUID duplications in real systems are not caused by the UUID specification itself. They come from implementations and operations that quietly break the assumptions the specification depends on.

1. The short version: a list of dangerous patterns

Pattern | What goes wrong | First thing to do
Hand-rolled UUIDv4 using a fixed seed or weak PRNG | The same sequence reproduces across processes or nodes | Use the OS / runtime’s standard UUID API
Generation state carried over after fork, VM snapshot, or container clone | Random and counter state roll back, producing duplicates | Reseed after fork; reinitialize after clone
Treating UUIDv3/v5 as “a fresh ID every time” | The same input always regenerates the same UUID | Treat them as deterministic IDs and limit their use
Custom implementations of UUIDv1/v6/v7/v8 | Duplicates surface under high throughput or across nodes | Use existing libraries; avoid homemade generators
Truncating UUIDs partway through | You discard the very bits that uniqueness depends on | Store and compare at full length
No UNIQUE / PRIMARY KEY at the DB layer | Duplicates slip in silently and are hard to trace | Keep a uniqueness constraint at the storage layer

2. What each UUID version actually guarantees

Version | Approach | Things to watch for
UUIDv4 | Random-based (122 random bits) | Draw the bits from a CSPRNG
UUIDv7 | Unix millisecond timestamp + random / counter bits | Sortable; counter design matters under high-rate generation
UUIDv3 / v5 | Name-based (deterministic) | Same input always produces the same UUID. Not for issuing fresh IDs.
UUIDv8 | Experimental / vendor-defined | Uniqueness is implementation-defined. Do not assume it.

3. Pattern 1: Hand-rolling UUIDv4 with a weak PRNG

A common variant: pull 128 bits from a general-purpose PRNG comparable to Math.random(), seeded at startup with time() or the PID. The output looks like a UUID, but any process that starts from the same seed reproduces the same sequence: two nodes booting in the same second get the same time() value, and containers often run their application under the same PID.

RFC 9562 says implementations SHOULD use a CSPRNG, so that UUIDs are both hard to guess and unlikely to collide. Python’s uuid.uuid4() is also documented as being generated cryptographically.

What to do: don’t roll your own UUID. Don’t tweak the random seed by hand. Use the standard library as is.
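A minimal Python sketch of the failure mode, assuming two workers that seed Python’s random module (Mersenne Twister, a general-purpose PRNG comparable to Math.random()) with the same startup second. weak_uuid4 is a hypothetical helper, not a library API; the boot_second value is illustrative.

```python
import random
import uuid


def weak_uuid4(rng: random.Random) -> uuid.UUID:
    # ANTI-PATTERN (hypothetical helper): 128 bits from Mersenne Twister, not a CSPRNG.
    # version=4 only stamps the version/variant bits, so the result *looks* like a valid UUIDv4.
    return uuid.UUID(int=rng.getrandbits(128), version=4)


# Two workers that booted in the same second and both seeded with int(time.time()).
boot_second = 1_700_000_000  # illustrative value
worker_a = random.Random(boot_second)
worker_b = random.Random(boot_second)

ids_a = [weak_uuid4(worker_a) for _ in range(3)]
ids_b = [weak_uuid4(worker_b) for _ in range(3)]
print(ids_a == ids_b)  # True: the whole sequence is reproduced

# The fix: no seed and no custom state; the standard API draws from OS randomness.
safe_ids = {uuid.uuid4() for _ in range(3)}
```

Every ID from weak_uuid4 passes a version check, which is why this bug tends to survive review: the output is well-formed, only its source is wrong.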

4. Pattern 2: Rolling back generator state via fork, snapshot, or clone

  • Restoring multiple instances from the same VM snapshot
  • A custom generator starting from the same initial state every time a container image launches
  • Sharing PRNG or counter state across forked workers

In these setups, the UUID generation sequence can be unintentionally replayed.

What to do:

  • Don’t keep your own UUID generation state alive for long
  • Reinitialize immediately after fork / clone / restore
  • Where possible, lean on implementations that pull from OS-provided randomness on every call
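The replay is easy to reproduce. In this sketch, PrefixCounterGen is a hypothetical in-house generator (a random prefix chosen once, plus an in-memory counter), copy.deepcopy stands in for restoring two instances from the same VM snapshot or container clone, and os.register_at_fork (Python 3.7+, POSIX only) is one way to reinitialize in a forked child. Not keeping such state at all, as the list above recommends, remains the better fix.

```python
import copy
import os
import uuid


class PrefixCounterGen:
    """HYPOTHETICAL in-house generator: a random prefix chosen once, plus an in-memory counter."""

    def __init__(self) -> None:
        self.reinit()

    def reinit(self) -> None:
        # Draw a fresh 64-bit prefix from OS randomness and reset the counter.
        self._prefix = int.from_bytes(os.urandom(8), "big")
        self._counter = 0

    def next_id(self) -> uuid.UUID:
        self._counter += 1
        # Custom layout with no version/variant bits: not a standard UUID version.
        return uuid.UUID(int=(self._prefix << 64) | self._counter)


gen = PrefixCounterGen()
gen.next_id()

# Simulate two instances restored from the same snapshot: identical state in memory.
restored_a = copy.deepcopy(gen)
restored_b = copy.deepcopy(gen)
print(restored_a.next_id() == restored_b.next_id())  # True: the sequence replays

# The fix: reinitialize each instance immediately after restore / clone.
restored_a.reinit()
restored_b.reinit()
print(restored_a.next_id() == restored_b.next_id())  # False, barring a 1-in-2**64 prefix match

# For forked workers, reseed automatically in the child process (POSIX only).
if hasattr(os, "register_at_fork"):
    os.register_at_fork(after_in_child=gen.reinit)
```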

5. Pattern 3: Treating UUIDv3/v5 as fresh-issued IDs

UUIDv3/v5 are not “non-colliding ID issuers.” They are “same input, same ID.” RFC 9562 states that UUIDs generated from the same namespace and name, with the same hash function, must be equal.

Wrong usage examples:

  • Calling uuid5(NAMESPACE_URL, "https://example.com/users/42") and treating it as fresh ID issuance every time
  • Issuing IDs from a single global namespace plus an email address, with no tenant in the namespace, so the same email in two tenants maps to the same ID

What to do: understand UUIDv3/v5 as deterministic IDs and limit their use. Don’t leave the namespace design vague.
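A minimal sketch with Python’s standard uuid module. The URL comes from the example above; APP_ROOT, tenant_namespace, user_key, and the tenant names are hypothetical, showing one way to put the tenant into the namespace instead of leaving it implicit.

```python
import uuid

url = "https://example.com/users/42"
first = uuid.uuid5(uuid.NAMESPACE_URL, url)
again = uuid.uuid5(uuid.NAMESPACE_URL, url)
print(first == again)  # True: same namespace + name always yields the same UUID

# HYPOTHETICAL namespace design: one namespace per tenant, derived from an app-owned root.
APP_ROOT = uuid.uuid5(uuid.NAMESPACE_DNS, "example.com")


def tenant_namespace(tenant_id: str) -> uuid.UUID:
    return uuid.uuid5(APP_ROOT, f"tenant:{tenant_id}")


def user_key(tenant_id: str, email: str) -> uuid.UUID:
    # Deterministic lookup key, scoped by tenant.
    return uuid.uuid5(tenant_namespace(tenant_id), email)


acme_alice = user_key("acme", "alice@example.com")
globex_alice = user_key("globex", "alice@example.com")
print(acme_alice == globex_alice)  # False: the tenant is part of the namespace

# Genuinely new records still get a fresh, random ID.
order_id = uuid.uuid4()
```

The design choice is to treat uuid5 output as a derived key you can recompute for lookups, and to keep uuid4 (or v7) for anything that must be new on every call.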

6. Pattern 4: Hand-rolling time-based UUIDs

UUIDv1/v6/v7/v8 are dangerous to imitate by copying only their bit layout.

  • v1 / v6: don’t assume “MAC address therefore unique.” RFC 9562 explicitly notes that the rise of virtual machines and containers means MAC address uniqueness is no longer guaranteed.
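  • v7: an implementation that issues many IDs within the same millisecond without a counter design, or that keeps issuing IDs with an older timestamp when the clock moves backwards, is dangerous: the first can exhaust or reuse counter values, and the second can reissue timestamp and counter combinations that were already handed out.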
  • v8: an in-house UUID built as “timestamp + shard id + a bit of randomness” makes its own design document the uniqueness specification. Shipping it without review is risky.
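To show what the v7 counter and clock-rollback handling actually involve, here is a minimal sketch of the RFC 9562 v7 bit layout that uses the 12-bit rand_a field as a counter, loosely following the RFC’s discussion of monotonicity and counters. UUIDv7Sketch is a hypothetical class for illustration only; in production, prefer a maintained library (Python 3.14 and later ship uuid.uuid7()).

```python
import os
import threading
import time
import uuid


class UUIDv7Sketch:
    """HYPOTHETICAL UUIDv7 generator: 48-bit ms timestamp, 12-bit counter, 62 random bits."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._last_ms = -1
        self._counter = 0

    def _fresh_counter(self) -> int:
        # Random start with the top bit clear, leaving at least 2048 increments per millisecond.
        return int.from_bytes(os.urandom(2), "big") & 0x7FF

    def new(self) -> uuid.UUID:
        with self._lock:
            now_ms = time.time_ns() // 1_000_000
            if now_ms > self._last_ms:
                # Clock moved forward: new millisecond, new counter.
                self._last_ms = now_ms
                self._counter = self._fresh_counter()
            else:
                # Same millisecond OR clock moved backwards: never reuse an older timestamp;
                # keep _last_ms and bump the counter instead.
                self._counter += 1
                if self._counter > 0xFFF:
                    # Counter exhausted: borrow the next millisecond instead of wrapping around.
                    self._last_ms += 1
                    self._counter = self._fresh_counter()
            rand_b = int.from_bytes(os.urandom(8), "big") & ((1 << 62) - 1)
            value = (self._last_ms & ((1 << 48) - 1)) << 80  # unix_ts_ms
            value |= 0x7 << 76                               # version 7
            value |= self._counter << 64                     # rand_a used as counter
            value |= 0b10 << 62                              # RFC variant
            value |= rand_b
            return uuid.UUID(int=value)


gen = UUIDv7Sketch()
ids = [gen.new() for _ in range(1000)]
print(ids == sorted(ids) and len(set(ids)) == len(ids))  # True: strictly increasing
```

One trade-off to note: after a large backwards clock jump, this sketch keeps stamping the last seen timestamp until real time catches up, trading timestamp accuracy for monotonicity.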

7. Pattern 5: Shortening UUIDs in flight

  • Using only the leading 8 characters as a foreign key
  • Squeezing a 128-bit UUID into a 64-bit integer
  • A string column that’s too short, so the tail is silently truncated
  • Treating the abbreviated form used in logs or UI as a uniqueness key

Changing the representation itself is fine. Stripping hyphens, normalizing case, or storing as 16 binary bytes are all conversions that preserve all 128 bits. The dangerous move is dropping the bits that uniqueness depends on.
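A small Python sketch of the distinction: the hex, case-normalized, and 16-byte forms all round-trip to the same UUID, while an 8-character prefix keeps only 32 bits. birthday_collision_probability is a hypothetical helper using the standard birthday approximation 1 - exp(-n(n-1) / 2^(b+1)) for n uniformly random b-bit values. For UUIDv7 the loss is worse than the math suggests: the first 8 hex characters are the top 32 bits of the millisecond timestamp, so every ID created within the same window of about 65 seconds shares that prefix.

```python
import math
import uuid

u = uuid.uuid4()

# Lossless representations: each converts back to the same 128-bit value.
assert uuid.UUID(hex=u.hex) == u           # hyphens stripped (32 hex chars)
assert uuid.UUID(str(u).upper()) == u      # case normalized
assert uuid.UUID(bytes=u.bytes) == u       # 16 binary bytes, e.g. a BINARY(16) column

# Lossy: keeping only the first 8 hex chars leaves 32 bits.
short = str(u)[:8]


def birthday_collision_probability(n_ids: int, bits: int) -> float:
    """Approximate P(at least one duplicate) among n_ids uniformly random `bits`-bit values."""
    return -math.expm1(-n_ids * (n_ids - 1) / (2 * 2.0 ** bits))


print(short, birthday_collision_probability(100_000, 32))  # 8-hex-char prefix of a v4
print(birthday_collision_probability(100_000, 122))        # full v4 random bits
```

With 100,000 IDs, the 32-bit prefix gives roughly a 69% chance of at least one duplicate, while the full 122 random bits of a v4 give a probability on the order of 10^-27.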

8. Pattern 6: No uniqueness constraint at the database

Even if UUIDs are statistically very unlikely to collide, if you genuinely cannot tolerate a duplicate, the storage layer needs a uniqueness constraint too. RFC 9562 itself states that while UUIDs can offer practically sufficient uniqueness, true global uniqueness can never be absolutely guaranteed.

The practical baseline:

  • Treat UUIDs as collision-resistant IDs
  • Keep a UNIQUE / PRIMARY KEY in the DB as the last line of defense
  • Design retry, idempotency, and incident logging for the duplicate case
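A minimal sketch of that baseline using Python’s built-in sqlite3 with an in-memory database. The orders table, the insert_order helper, and the constant “stuck” generator are hypothetical. The PRIMARY KEY rejects duplicates, the CHECK rejects truncated keys, and a duplicate is logged and retried with a new ID rather than swallowed. Retrying with a new ID only makes sense for freshly issued IDs; for deterministic v3/v5 keys, a duplicate means “already exists” and belongs to your idempotency handling instead.

```python
import logging
import sqlite3
import uuid
from typing import Callable

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("id-issuance")

conn = sqlite3.connect(":memory:")
# Full 16-byte BLOB key: PRIMARY KEY rejects duplicates, CHECK rejects truncated IDs.
conn.execute(
    "CREATE TABLE orders ("
    " id BLOB NOT NULL PRIMARY KEY CHECK (length(id) = 16),"
    " payload TEXT NOT NULL)"
)


def insert_order(payload: str,
                 new_id: Callable[[], uuid.UUID] = uuid.uuid4,
                 max_attempts: int = 3) -> uuid.UUID:
    for attempt in range(1, max_attempts + 1):
        candidate = new_id()
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute("INSERT INTO orders (id, payload) VALUES (?, ?)",
                             (candidate.bytes, payload))
            return candidate
        except sqlite3.IntegrityError as exc:
            # Make the duplicate observable instead of swallowing it.
            log.error("duplicate key %s (attempt %d/%d): %s",
                      candidate, attempt, max_attempts, exc)
    raise RuntimeError("repeated duplicate keys; suspect the ID generator or its state")


insert_order("ok")  # normal path with uuid.uuid4

# HYPOTHETICAL broken generator that always returns the same value.
stuck = uuid.UUID("00000000-0000-4000-8000-000000000001")
insert_order("first", new_id=lambda: stuck)
try:
    insert_order("second", new_id=lambda: stuck)
except RuntimeError as err:
    log.warning("gave up: %s", err)
```

Raising after a bounded number of attempts is deliberate: repeated duplicates point at a broken generator, and looping forever would hide exactly the signal you want to investigate.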

9. A practical checklist

  1. Confirm you are not generating UUIDs yourself: fall back to the standard API wherever possible
  2. Pin the UUID version as part of the spec: v4 is random-based, v7 is time-ordered with random bits, v3/v5 are deterministic, v8 is custom
  3. Audit how seeds and generator state are handled: never inherit the same state after fork, snapshot, or clone
  4. Verify that storage keeps the full length: don’t promote an abbreviated display form to the canonical key
  5. Put UNIQUE / PRIMARY KEY at the DB layer: UUIDs lower the probability of a duplicate; they do not enforce uniqueness
  6. Make duplicates observable: never swallow a duplicate key; keep it traceable
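Items 2 and 4 can also be enforced where IDs enter the system. A minimal Python sketch, assuming your spec pins v4 and v7 for newly issued IDs; ALLOWED_VERSIONS and audit_stored_id are hypothetical names to adapt to your own spec.

```python
import uuid

# Assumption: the spec pins these versions for newly issued IDs; adjust to your own spec.
ALLOWED_VERSIONS = frozenset({4, 7})


def audit_stored_id(raw: str) -> uuid.UUID:
    """Reject IDs that are truncated, malformed, or of an unexpected version."""
    parsed = uuid.UUID(raw)  # raises ValueError unless the string encodes a full 128-bit UUID
    if parsed.variant != uuid.RFC_4122:
        raise ValueError(f"unexpected variant: {raw}")
    if parsed.version not in ALLOWED_VERSIONS:
        raise ValueError(f"version {parsed.version} not allowed: {raw}")
    return parsed


print(audit_stored_id(str(uuid.uuid4())).version)  # 4

# A truncated display form and a deterministic v5 ID are both rejected.
for bad in ("3f2504e0", str(uuid.uuid5(uuid.NAMESPACE_URL, "https://example.com/users/42"))):
    try:
        audit_stored_id(bad)
    except ValueError as err:
        print("rejected:", err)
```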

Wrap-up

UUID collision incidents almost never start from UUIDs being mathematically weak. They start from implementations and operations that break the assumptions UUIDs are built on. When a duplicate shows up, the first thing to question isn’t the math of UUIDs but the generator, the state management, the storage format, and the constraint design.
