Architecture · 8 min read · June 28, 2025

Event Sourcing in Practice: Lessons From Production Systems

Event sourcing is elegant in theory and demanding in practice. Here are the real lessons from building and operating event-sourced systems in production.

James Ross Jr.

Strategic Systems Architect & Enterprise Software Developer

From Theory to Reality

Event sourcing has a compelling pitch: instead of storing the current state of your data, store the sequence of events that produced that state. Every change is recorded as an immutable event. The current state is derived by replaying events. You get a complete audit history, temporal queries, and a natural foundation for event-driven integration.

The theory is sound. The conceptual foundations of event sourcing and CQRS are well-documented. But the gap between understanding event sourcing conceptually and operating it in production is substantial. This post covers the practical lessons — the things that aren't in the conference talks — from building and running event-sourced systems that handle real business data.


Event Design: The Decisions You Can't Undo

Events are immutable. Once published, an event becomes part of the permanent historical record. This makes event design the most consequential decision in an event-sourced system — mistakes in event design live forever.

Granularity matters. An OrderUpdated event that contains the entire order state is technically event sourcing but practically useless. It doesn't tell you what changed or why. Prefer specific events: OrderItemAdded, OrderShippingAddressChanged, OrderDiscountApplied. Each event represents a single business action with clear semantics.

But don't go too granular either. An event for every field change on a form submission creates noise without adding meaning. The right granularity is the business action level: the events should correspond to things the business cares about, not to database column updates.
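As a minimal sketch of this granularity principle, the specific order events mentioned above might look like the following (the field names and class shapes are illustrative assumptions, not a prescribed schema):

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Business-action-level events: each records one thing that happened,
# with enough fields to understand it on its own.
# frozen=True enforces immutability — events never change after creation.
@dataclass(frozen=True)
class OrderItemAdded:
    order_id: str
    sku: str
    quantity: int
    unit_price_cents: int
    occurred_at: datetime

@dataclass(frozen=True)
class OrderShippingAddressChanged:
    order_id: str
    new_address: str
    occurred_at: datetime

event = OrderItemAdded("ord-123", "SKU-42", 2, 1999, datetime.now(timezone.utc))
```

Contrast this with a single catch-all `OrderUpdated` carrying the whole order: the specific event names and fields tell you what changed and why without diffing state.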

Event naming is a public contract. Name events in past tense — they represent things that have already happened. PaymentCaptured, not CapturePayment. The name should be meaningful to domain experts, not just developers. A business person should be able to read the event stream for an order and understand what happened without technical translation.

Include enough context. Each event should contain all the information needed to understand what happened without looking up additional data. An InvoiceCreated event should include the invoice total, not just a reference to the invoice record. When you're replaying events months later to rebuild a projection, the data in the event is all you have — the entity state that existed when the event was published might have been modified by subsequent events.

Version your events from day one. You will change event schemas. New fields will be added. Existing fields will be reinterpreted. A version number on each event tells downstream consumers which schema to expect. Without versioning, you'll end up with implicit versioning based on the presence or absence of fields, which is fragile and error-prone.
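One common way to handle this is an upcasting step on read: consumers parse the version number and upgrade older schemas to the current one before applying the event. The sketch below assumes a hypothetical `InvoiceCreated` schema change (v1 stored a float dollar total, v2 stores integer cents):

```python
import json

def parse_invoice_created(raw: str) -> dict:
    """Parse an InvoiceCreated event, upcasting old schema versions to the latest."""
    event = json.loads(raw)
    version = event.get("version", 1)  # events written before versioning => v1
    if version == 1:
        # v1 stored the total as a float in dollars; v2 uses integer cents.
        event["total_cents"] = round(event.pop("total") * 100)
        event["version"] = 2
    return event

v1 = '{"type": "InvoiceCreated", "invoice_id": "inv-7", "total": 49.99}'
print(parse_invoice_created(v1)["total_cents"])  # 4999
```

Note the fallback in `event.get("version", 1)` — that is exactly the implicit, presence-of-fields versioning you are stuck with if version numbers arrive late; starting with them on day one avoids it.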


Projections: The Part That Requires the Most Engineering

In an event-sourced system, the event store is the source of truth, but it's not what your application queries. Applications query projections — read models built by processing the event stream and materializing the results into queryable structures (database tables, search indexes, cache entries).

Every query pattern needs a projection. Want to list orders by customer? That's a projection. Want to search products by name? That's a projection. Want to show a dashboard of revenue by month? That's a projection. Each projection is a function that processes events and maintains a read model optimized for a specific query.

This means that adding a new query to your application often means building a new projection and populating it by replaying the event history. For a system with years of event history, this replay can take hours. Build projection replay tooling early — it's not optional infrastructure, it's core infrastructure.
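To make "each projection is a function that processes events" concrete, here is a minimal in-memory sketch of an orders-by-customer projection (event shapes are assumptions for illustration; a production read model would persist to a table or index):

```python
from collections import defaultdict

class OrdersByCustomerProjection:
    """Read model answering 'list orders for a customer', built from the event stream."""

    def __init__(self):
        self.orders_by_customer = defaultdict(list)

    def apply(self, event: dict) -> None:
        # Handle only the events this projection cares about; ignore the rest.
        if event["type"] == "OrderPlaced":
            self.orders_by_customer[event["customer_id"]].append(event["order_id"])

    def replay(self, history) -> None:
        # Rebuilding from scratch is just replaying the full history.
        for event in history:
            self.apply(event)

history = [
    {"type": "OrderPlaced", "customer_id": "cust-1", "order_id": "ord-1"},
    {"type": "PaymentCaptured", "order_id": "ord-1"},
    {"type": "OrderPlaced", "customer_id": "cust-1", "order_id": "ord-2"},
]
proj = OrdersByCustomerProjection()
proj.replay(history)
print(proj.orders_by_customer["cust-1"])  # ['ord-1', 'ord-2']
```

The `replay` method is the piece that becomes expensive at scale — over years of history it is the hours-long rebuild that the tooling above needs to manage.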

Projection consistency. Projections are eventually consistent with the event store. An event is published, and projections update asynchronously. The lag is usually sub-second, but it exists. Your application needs to handle this. After a user creates an order, they should see it in their order list immediately — but the projection might not have processed the event yet. Patterns for handling this include read-your-own-writes (routing the creator's queries to include unprojected events) and optimistic UI updates (showing the expected result immediately and correcting if the projection disagrees).

Projection failures. A projection consumer that crashes mid-processing needs to resume from where it left off, not from the beginning. Track the last successfully processed event position per projection. When the consumer restarts, it picks up from that position. This is essentially the same consumer group offset tracking that Kafka provides, and it's equally important for custom event store implementations.
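The checkpointing loop can be sketched in a few lines (here with a dict standing in for a persisted checkpoint store, and a list standing in for the event stream — both assumptions for illustration):

```python
def run_consumer(events, apply, checkpoints, name):
    """Process events starting from the saved position, checkpointing as we go."""
    start = checkpoints.get(name, 0)  # 0 = beginning of the stream
    for position in range(start, len(events)):
        apply(events[position])
        checkpoints[name] = position + 1  # persist after each successful event

seen = []
checkpoints = {"orders": 1}  # event 0 was processed before a crash
run_consumer(["e0", "e1", "e2"], seen.append, checkpoints, "orders")
print(seen)  # ['e1', 'e2'] — event 0 is not reprocessed
```

In a real system the checkpoint write and the projection update should share a transaction (or the projection must be idempotent), since a crash between the two otherwise reprocesses one event.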


Event Store Operations

The event store is the most critical piece of infrastructure in an event-sourced system. If the event store loses data, you've lost your source of truth — unlike a traditional database where you might recover from backups, event loss in an event-sourced system means lost business history.

Storage growth is predictable but relentless. Events are append-only and never deleted. The event store grows monotonically. For a system processing 10,000 events per day, that's about 3.65 million events per year. Plan storage capacity accordingly, and implement partitioning by stream or by time to keep query performance manageable.

Snapshots are a performance optimization, not a feature. Replaying thousands of events to reconstruct an entity's current state is slow. Snapshots periodically capture the entity's current state so that reconstruction only needs to replay events since the last snapshot. Implement snapshots when entity event counts make reconstruction noticeably slow — typically when an entity has more than a few hundred events.
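Reconstruction with snapshots reduces to "start from the snapshot, replay only what came after." A minimal sketch, with an assumed account entity and deposit event:

```python
def load_entity(snapshot, events_since_snapshot, apply):
    """Reconstruct current state from the latest snapshot plus newer events."""
    state = dict(snapshot)  # copy: start from the snapshot, not from event zero
    for event in events_since_snapshot:
        state = apply(state, event)
    return state

def apply(state, event):
    if event["type"] == "FundsDeposited":
        state["balance"] += event["amount_cents"]
    return state

snapshot = {"account_id": "acc-1", "balance": 10_000}  # captured at, say, event 500
recent = [{"type": "FundsDeposited", "amount_cents": 2_500}]
print(load_entity(snapshot, recent, apply)["balance"])  # 12500
```

Because the snapshot is itself derived from events, it can always be discarded and rebuilt — which is why it is an optimization, not part of the source of truth.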

Event store technology choices. Dedicated event stores (EventStoreDB) provide purpose-built features: projections, subscriptions, partitioning. PostgreSQL with an append-only events table works well for systems that don't need the scale of a dedicated event store. Kafka can serve as an event store but has limitations around event retrieval by aggregate ID. Choose based on your scale, your operational capacity, and your query patterns.

Archival and compaction. For long-lived systems, consider an archival strategy for old events. Events older than a certain threshold can be moved to cold storage while maintaining their availability for replay if needed. Some systems implement event compaction — reducing the event history for an entity to a single snapshot event — but this permanently loses the detailed history, which may conflict with audit requirements.


When to Walk Away

Event sourcing is not the right choice for every system. After building several event-sourced systems, I have a clearer picture of when the investment is justified and when it's not.

Justified when: The domain has genuine audit and compliance requirements that demand a complete, immutable history of every change. Financial systems, healthcare records, and regulatory compliance systems genuinely benefit. The system needs temporal queries — "what was the state of this account on March 15th?" — as a core requirement, not a nice-to-have. The event stream is a natural integration point for multiple downstream systems that need to react to domain events.

Not justified when: The application is primarily CRUD with simple query patterns. The team doesn't have experience with eventual consistency and the operational complexity of managing projections. The audit requirements can be satisfied with a simpler approach — append-only audit tables that record changes alongside a traditional state-based model. This simpler approach provides 80% of the audit benefit with 20% of the complexity.

The practical alternative for many systems is what I call "event-inspired architecture": use domain events for integration and communication between system components, maintain an audit log of changes, but store current state in a traditional database as the source of truth. You get the decoupling and integration benefits of events without the complexity of deriving all state from the event stream.

If you're evaluating event sourcing for your system, let's discuss whether it's the right fit.
