Why MongoDB ObjectIds are only “almost” monotonic (and why it matters)
A deep dive into how assuming strict ordering can lead to subtle bugs.
MongoDB ObjectIds look monotonic and are often treated as such. In practice, they are only “almost” monotonic, and that subtle difference can lead to missed events in systems that rely on ordering. This post explains why that happens and how we adjusted our design to handle it correctly.
The Structure of a MongoDB ObjectId
A MongoDB ObjectId is a 12-byte value composed of three parts: a 4-byte timestamp (seconds since the Unix epoch), a 5-byte random value generated once per process, and a 3-byte incrementing counter initialized to a random value. This structure is designed to provide uniqueness across different machines and processes.
[ timestamp (4 bytes) ][ random / machine (5 bytes) ][ counter (3 bytes) ]
The timestamp provides coarse ordering with one-second resolution, while the random identifier and counter ensure that ObjectIds are unique, even when generated in the same second.
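To make the layout concrete, here is a minimal Python sketch that packs and unpacks the three fields by hand. It is not the driver's implementation; the helper names `make_object_id` and `parse_object_id` are hypothetical and exist only for illustration.

```python
import os
import struct
import time


def make_object_id(timestamp: int, process_random: bytes, counter: int) -> bytes:
    """Pack the three fields into 12 bytes (hypothetical helper, not a driver API)."""
    if len(process_random) != 5:
        raise ValueError("process_random must be exactly 5 bytes")
    # 4-byte big-endian timestamp, 5 random bytes, low 3 bytes of the counter.
    return (
        struct.pack(">I", timestamp)
        + process_random
        + (counter & 0xFFFFFF).to_bytes(3, "big")
    )


def parse_object_id(oid: bytes) -> tuple[int, bytes, int]:
    """Split a 12-byte value back into (timestamp, random bytes, counter)."""
    timestamp = struct.unpack(">I", oid[:4])[0]
    return timestamp, oid[4:9], int.from_bytes(oid[9:12], "big")


# Example: one ID for "now", with random process bytes and an arbitrary counter.
oid = make_object_id(int(time.time()), os.urandom(5), 1)
print(oid.hex())
```

Because every field is stored big-endian, comparing the raw bytes compares the timestamp first, then the random value, then the counter.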
How ObjectIds are Compared
ObjectIds are compared lexicographically, byte-by-byte, in the order of `timestamp → machine/random → counter`. If the timestamps differ, the ordering is clear. However, if two ObjectIds are generated within the same second by different processes, the ordering is decided by each process's random bytes, which have nothing to do with time. This has an important implication: within a single second, the sort order of ObjectIds from different processes need not match the order in which they were created.
For example, suppose Process A generates an ObjectId `[T][A][100]` and, a fraction of a second later, Process B generates `[T][B][1]`. If B's random bytes happen to sort before A's, the ObjectId from Process B sorts before the one from Process A even though it was created later. ObjectIds are therefore only approximately time-ordered, not strictly monotonic globally.
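The comparison can be reproduced with plain byte strings. This is a sketch under assumptions: the random bytes `0xaa…` and `0x0b…` are made-up values chosen so that the later process sorts first, and `make_object_id` is a hypothetical helper rather than a driver function.

```python
import struct


def make_object_id(timestamp: int, process_random: bytes, counter: int) -> bytes:
    """Hypothetical helper: 4-byte timestamp + 5 random bytes + 3-byte counter."""
    return struct.pack(">I", timestamp) + process_random + counter.to_bytes(3, "big")


T = 1_700_000_000  # both IDs fall within this one second

# Process A creates its ID first; Process B creates one slightly later.
id_a = make_object_id(T, b"\xaa" * 5, 100)
id_b = make_object_id(T, b"\x0b" * 5, 1)

# Python compares bytes lexicographically, byte by byte.
print(id_b < id_a)  # True: B sorts first although it was created later

# An ID from the next second sorts after both, whatever its random bytes are.
id_next = make_object_id(T + 1, b"\x00" * 5, 0)
print(id_a < id_next and id_b < id_next)  # True
```

The counter never gets a say here: the random bytes differ, so the comparison is settled before the last three bytes are reached.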
Designing for Accurate Event Aggregation
When designing our analytics pipeline for event aggregation, we considered two primary approaches for handling event ordering. One common approach is to rely on the database's natural ordering of ObjectIds. An aggregator would periodically fetch new events using a query like `_id > lastProcessedId`, assuming that ObjectIds always increase over time.
However, this approach has a subtle flaw. Because ObjectIds are not strictly monotonic across different processes, events generated in the same second can sort in an order that differs from the order in which they were created or inserted. A late-arriving event could have a lexicographically smaller ObjectId than the `lastProcessedId` checkpoint, causing it to be permanently skipped by the aggregator.
Why This Can Cause Missed Events
Consider a simple scenario with two workers generating events within the same second:
- Worker 1 generates `ObjectId_1` at T + 0.1s
- Worker 2 generates `ObjectId_2` at T + 0.2s
Because ObjectIds include a random component, it is possible that `ObjectId_2 < ObjectId_1` even though it was generated later.
If the aggregator runs after `ObjectId_1` is inserted but before `ObjectId_2` arrives, it processes `ObjectId_1` and stores it as the checkpoint. The next query using `_id > ObjectId_1` will then skip `ObjectId_2` entirely. This illustrates how relying on ObjectId ordering alone can lead to missed events.
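The scenario can be simulated without a database. The sketch below is a toy model, not our production aggregator: `Aggregator`, `make_object_id`, and the in-memory `collection` list are hypothetical stand-ins, and the random bytes are chosen so that the second ID sorts before the first.

```python
import struct


def make_object_id(timestamp: int, process_random: bytes, counter: int) -> bytes:
    """Hypothetical helper: 4-byte timestamp + 5 random bytes + 3-byte counter."""
    return struct.pack(">I", timestamp) + process_random + counter.to_bytes(3, "big")


class Aggregator:
    """Toy poller that remembers only the largest _id it has processed."""

    def __init__(self) -> None:
        self.last_processed_id = None
        self.processed = []

    def poll(self, collection):
        # Stands in for a query of the form `_id > lastProcessedId`, sorted by _id.
        checkpoint = self.last_processed_id
        batch = sorted(
            oid for oid in collection if checkpoint is None or oid > checkpoint
        )
        if batch:
            self.processed.extend(batch)
            self.last_processed_id = batch[-1]
        return batch


T = 1_700_000_000
object_id_1 = make_object_id(T, b"\xaa" * 5, 100)  # Worker 1, T + 0.1s
object_id_2 = make_object_id(T, b"\x0b" * 5, 1)    # Worker 2, T + 0.2s

collection = []
aggregator = Aggregator()

collection.append(object_id_1)
first_batch = aggregator.poll(collection)   # checkpoint becomes object_id_1

collection.append(object_id_2)
second_batch = aggregator.poll(collection)  # object_id_2 < checkpoint, so not returned

missed = [oid for oid in collection if oid not in aggregator.processed]
print(len(first_batch), len(second_batch), len(missed))  # 1 0 1
```

No later poll recovers the missed event, because the checkpoint only ever moves forward.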
Our Approach: Making Ordering Explicit
To ensure accurate event processing, we adopted an ingestion pipeline using a partitioned queue combined with ordered processing. Events are grouped using a stable key and routed to the same shard. Each shard is handled by a dedicated worker, ensuring events from the same source are processed sequentially.
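As a rough illustration of the routing step, the sketch below maps a stable key to a shard with a hash and appends events to per-shard queues. The shard count, the key names, and the `shard_for` helper are assumptions made for this example; a real deployment would typically rely on the partitioning built into its queueing system.

```python
import hashlib
from collections import defaultdict

NUM_SHARDS = 4  # hypothetical shard count


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a stable key to a shard index.

    hashlib is used instead of the built-in hash(), which is randomized
    per process for strings and would route the same key inconsistently.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# (key, payload) pairs in arrival order; the keys are made-up source names.
events = [
    ("site-a", "pageview"),
    ("site-b", "click"),
    ("site-a", "click"),
    ("site-a", "scroll"),
]

# One FIFO queue per shard; each would be drained by a single dedicated worker.
queues = defaultdict(list)
for key, payload in events:
    queues[shard_for(key)].append((key, payload))

# Every event for a given key lands in one queue, in arrival order.
site_a = [payload for key, payload in queues[shard_for("site-a")] if key == "site-a"]
print(site_a)  # ['pageview', 'click', 'scroll']
```

Since one worker drains each queue sequentially, the relative order of events that share a key is fixed before anything reaches the database.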
By making ordering a property of the ingestion pipeline rather than relying on database behavior, we ensure a predictable and stable flow of events into the system. This allows incremental aggregation based on `_id > lastProcessedId` to remain reliable and accurate even at scale.
Key Takeaway
MongoDB ObjectIds are excellent for ensuring uniqueness and providing approximate time-ordering. However, for systems that rely on incremental processing, it is safer to enforce ordering at the ingestion layer rather than assuming it from ObjectIds.
Closing Thoughts
In a privacy-first analytics system, event ordering directly impacts data accuracy. Understanding the subtle behaviors of components like ObjectIds is crucial for building robust and reliable systems. This level of attention to detail is central to how we approach building Planosys.