Webhooks are the backbone of real-time communication in Shopify integrations. They fire instantly when something happens: an order is placed, inventory updates, a customer signs up. But what happens when your server is down, your endpoint times out, or a bug swallows the payload?
You lose that event. And with it, you lose data.
A Shopify webhook replay system solves this exact problem. It gives you the ability to recover missed events, reprocess failed deliveries, and keep every connected system in sync. This guide explains how replay systems work, why they matter, and how to build one that holds up in production.
What Is a Shopify Webhook Replay System?
A webhook replay system is infrastructure that stores incoming webhook events and allows you to reprocess them later. Instead of a webhook being a one-shot delivery, it becomes a recoverable record.
When Shopify sends a webhook and your server fails to process it, the replay system catches the raw payload, stores it safely, and retries it automatically or on demand.
Think of it as a DVR for your event stream. You can rewind, replay, and reprocess any event that was missed.
This pairs naturally with Shopify’s broader webhook architecture, which you should understand before building a replay layer on top of it.
Why Webhook Delivery Fails
Before building a replay system, understand what causes delivery failures in the first place.
| Failure Cause | Description |
|---|---|
| Server downtime | Your endpoint is offline when Shopify fires the event |
| Timeout | Your handler takes too long and Shopify marks it failed |
| Code errors | A bug in your handler throws an exception |
| Rate limiting | Your system is overloaded and rejects the request |
| Database unavailability | The DB is down when the handler tries to write |
| Deployment gaps | A deploy restarts your server at the exact wrong moment |
| Network issues | DNS, TLS, or routing problems block the request |
Shopify retries failed webhooks up to 19 times over 48 hours. After that, the webhook endpoint gets flagged as failing. If the issue persists, Shopify stops sending events altogether.
A replay system gives you control beyond that 48-hour window.
How a Shopify Webhook Replay System Works
A well-designed Shopify webhook replay system has four core stages:
1. Ingestion Layer
All incoming webhooks hit a lightweight ingestion endpoint first. This endpoint does one thing: acknowledge receipt immediately with a 200 OK and store the raw payload.
It does not process the event. Processing happens separately.
This is critical because Shopify only waits five seconds for a response. Any logic you run inline risks a timeout.
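A minimal sketch of that acknowledge-and-store step, assuming a hypothetical `ingestWebhook` helper and a caller-supplied `saveRawEvent` function that performs the durable write (neither name comes from Shopify's libraries):

```javascript
// Hypothetical ingestion step: persist the raw body, then acknowledge.
// `saveRawEvent` stands in for whatever durable write you use (assumption).
function ingestWebhook(headers, rawBody, saveRawEvent) {
  try {
    // Store the payload untouched; no parsing or business logic happens here.
    saveRawEvent({ headers, rawBody, receivedAt: new Date().toISOString() });
  } catch (err) {
    // The write failed, so do not acknowledge. A non-2xx response
    // leaves Shopify's own retry mechanism in play.
    return { statusCode: 500 };
  }
  // Acknowledge immediately; processing happens in a separate worker.
  return { statusCode: 200 };
}
```

Returning a non-2xx status when the write fails is a design choice in this sketch: an event that was never stored cannot be replayed, so it is safer to let Shopify redeliver it.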
2. Persistent Event Store
Every received webhook gets written to a durable store before anything else happens. This is your replay source.
Each record should include:
- The raw JSON payload
- The webhook topic (e.g., orders/create)
- The Shopify X-Shopify-Webhook-Id header
- Arrival timestamp
- Processing status (pending, succeeded, failed)
- Retry count
- Last error message
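The fields above could be assembled into a record like this. It is only a sketch: `buildEventRecord` and the property names are hypothetical, and the header keys are lowercase because Node.js lowercases incoming header names.

```javascript
// Hypothetical record builder for the event store (names are illustrative).
function buildEventRecord(headers, rawBody, now = new Date()) {
  return {
    payload: rawBody,                            // raw JSON string, kept unparsed
    topic: headers["x-shopify-topic"],           // e.g. "orders/create"
    webhookId: headers["x-shopify-webhook-id"],  // used later for deduplication
    receivedAt: now.toISOString(),               // arrival timestamp
    status: "pending",                           // pending | succeeded | failed
    retryCount: 0,
    lastError: null,
  };
}
```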
3. Async Processing Worker
A background worker picks up events from the store and processes them. If processing fails, it marks the record as failed and schedules a retry.
This is exactly the pattern covered in queue-based Shopify webhook processing, where a message queue decouples ingestion from execution.
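The worker behavior described above can be sketched as a single pass over stored records. This assumes an in-memory array in place of a real store or queue, and `processPendingEvents`, `retryDelayMs`, and `nextAttemptAt` are hypothetical names:

```javascript
// Hypothetical worker pass over event records held in memory.
// `handler` is the business logic and is expected to throw on failure.
function processPendingEvents(events, handler, retryDelayMs = 30000, now = Date.now()) {
  for (const event of events) {
    if (event.status !== "pending") continue;
    try {
      handler(event);
      event.status = "succeeded";
      event.processedAt = new Date(now).toISOString();
    } catch (err) {
      event.status = "failed";
      event.retryCount += 1;
      event.lastError = err instanceof Error ? err.message : String(err);
      // Schedule the retry by recording when the event becomes eligible again.
      event.nextAttemptAt = new Date(now + retryDelayMs).toISOString();
    }
  }
  return events;
}
```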
4. Replay Trigger
This is what makes the system a “replay” system. You can trigger reprocessing of any stored event, a filtered set, or an entire time range.
You replay events when:
- Your handler had a bug and you’ve since deployed a fix
- A downstream system was unavailable
- You onboard a new integration and need to backfill historical events
Replaying Shopify Events: The Core Mechanics
Replaying Shopify events is not the same as Shopify resending them. Shopify does not natively support replaying specific past events on demand (beyond its built-in retry window). You control your own replay.
Here is how the replay flow works:
[Event Store] → [Replay Trigger] → [Processing Queue] → [Handler] → [Update Status]
Querying for Replay
You query your event store by status, topic, or time range:
    SELECT *
    FROM webhook_events
    WHERE status = 'failed'
      AND topic = 'orders/create'
      AND created_at > NOW() - INTERVAL '24 hours'
    ORDER BY created_at ASC;
Enqueuing for Reprocessing
Each matched record gets enqueued back into your processing queue. Your handler treats it identically to a new event.
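As a rough illustration of the query-then-enqueue flow, the sketch below filters an in-memory list the same way the SQL query does and hands each match to a queue. `replayEvents`, the `filter` shape, and the `enqueue` callback are assumptions made for this example:

```javascript
// Hypothetical replay trigger: select stored events, reset them, re-enqueue them.
function replayEvents(events, filter, enqueue) {
  const sinceMs = filter.since ? Date.parse(filter.since) : -Infinity;
  const matches = events
    .filter((e) => !filter.status || e.status === filter.status)
    .filter((e) => !filter.topic || e.topic === filter.topic)
    .filter((e) => Date.parse(e.createdAt) > sinceMs)
    // Oldest first, so events are reprocessed in arrival order.
    .sort((a, b) => Date.parse(a.createdAt) - Date.parse(b.createdAt));

  for (const event of matches) {
    event.status = "pending"; // the handler sees it like a new event
    enqueue(event);
  }
  return matches.length;
}
```

A call such as `replayEvents(events, { status: "failed", topic: "orders/create", since: "2024-01-01T00:00:00Z" }, queue.push.bind(queue))` would re-enqueue only failed order-creation events newer than the given timestamp.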
Idempotency Is Non-Negotiable
When you replay an event, your handler may run more than once for the same payload, for example when the original attempt partially succeeded before failing. You must ensure your logic is idempotent: processing the same event twice must produce the same result as processing it once.
Check the idempotency strategies used in Shopify systems to implement this correctly. Using the X-Shopify-Webhook-Id as a deduplication key is the standard approach.
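A minimal sketch of deduplication by webhook ID, assuming an in-memory `Set` as the record of processed IDs. In production this would more likely be a unique constraint or lookup in your database. `createIdempotentHandler` is a hypothetical name:

```javascript
// Hypothetical wrapper that skips events whose ID was already processed.
function createIdempotentHandler(handler, processedIds = new Set()) {
  return function handleOnce(event) {
    if (processedIds.has(event.webhookId)) {
      return { skipped: true }; // already handled; replay is a no-op
    }
    handler(event); // may throw; the ID is recorded only after success
    processedIds.add(event.webhookId);
    return { skipped: false };
  };
}
```

Recording the ID only after the handler succeeds means a failed attempt stays eligible for retry or replay.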
Building the Event Store
Your event store is the foundation of the entire system. It needs to be:
- Durable: Survives server restarts and crashes
- Fast writes: Ingestion cannot block
- Queryable: You need to filter by topic, status, and time
Recommended Schema
    CREATE TABLE webhook_events (
      id           UUID PRIMARY KEY DEFAULT gen_random_uuid(),
      shopify_id   VARCHAR(255) UNIQUE,  -- X-Shopify-Webhook-Id
      topic        VARCHAR(100) NOT NULL,
      payload      JSONB NOT NULL,
      hmac         VARCHAR(255),
      status       VARCHAR(20) DEFAULT 'pending',
      retry_count  INT DEFAULT 0,
      last_error   TEXT,
      created_at   TIMESTAMPTZ DEFAULT NOW(),
      processed_at TIMESTAMPTZ
    );

    CREATE INDEX idx_webhook_events_status ON webhook_events(status);
    CREATE INDEX idx_webhook_events_topic ON webhook_events(topic);
    CREATE INDEX idx_webhook_events_created_at ON webhook_events(created_at);
Storage Options
| Option | Best For |
|---|---|
| PostgreSQL | Full SQL queries, JSONB indexing, reliable transactions |
| Redis Streams | High-throughput ingestion, built-in consumer groups |
| AWS SQS + DynamoDB | Managed, scalable, no ops overhead |
| Kafka | High-volume event sourcing at enterprise scale |
For most Shopify apps, PostgreSQL is the right default. It handles volume well and gives you full query flexibility when you need to replay specific subsets.
Retry Strategies for Webhook Recovery
Not all failures are equal. Your retry strategy should match the failure type.
Exponential Backoff
Retry with increasing delays. This prevents hammering a struggling downstream system.
| Retry Attempt | Delay |
|---|---|
| 1 | 30 seconds |
| 2 | 2 minutes |
| 3 | 10 minutes |
| 4 | 1 hour |
| 5+ | 6 hours |
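The schedule in the table can be expressed as a small lookup. This is a sketch using the delays listed above; `retryDelaySeconds` is a hypothetical helper name:

```javascript
// Delays from the table above, in seconds: 30s, 2m, 10m, 1h, 6h.
const RETRY_DELAYS_SECONDS = [30, 120, 600, 3600, 21600];

// `attempt` is 1-based; attempts 5 and beyond share the final delay.
function retryDelaySeconds(attempt) {
  const clamped = Math.min(Math.max(attempt, 1), RETRY_DELAYS_SECONDS.length);
  return RETRY_DELAYS_SECONDS[clamped - 1];
}
```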
Dead Letter Queue
Events that exceed your max retry count should move to a dead letter queue (DLQ). This keeps your active queue clean while preserving the event for manual review.
Read the detailed breakdown on dead letter queues for Shopify webhooks to understand how to handle terminal failures properly.
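One way to sketch the routing decision, assuming plain arrays stand in for the retry queue and the DLQ. The cap of 5 retries and the name `routeFailedEvent` are illustrative assumptions, not recommendations from Shopify:

```javascript
// Hypothetical routing of a failed event: retry it or park it in the DLQ.
function routeFailedEvent(event, retryQueue, deadLetterQueue, maxRetries = 5) {
  if (event.retryCount >= maxRetries) {
    event.status = "dead"; // terminal state; kept for manual review
    deadLetterQueue.push(event);
    return "dlq";
  }
  retryQueue.push(event);
  return "retry";
}
```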
Jitter
Add random jitter to retry delays to prevent synchronized retry storms when multiple events fail at once.
const delay = baseDelay * Math.pow(2, retryCount) + Math.random() * 1000;
Verifying Webhook Integrity Before Replay
Every Shopify webhook includes an HMAC signature in the X-Shopify-Hmac-Sha256 header. Verify this signature at ingestion time and store it alongside the payload.
During replay, you have two options:
- Skip re-verification: You already verified it at ingestion. Trust the stored payload.
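- Re-verify against stored HMAC: Extra safety for high-security workflows. This only works if you stored the exact raw request body, because the signature is computed over the raw bytes and a parsed or JSONB-normalized copy will not match.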
Never replay an event that failed HMAC verification at ingestion. This is a security risk, not a data loss scenario.
This fits into the broader topic of building reliable Shopify webhook consumers, where signature verification is the first line of defense.
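A sketch of the signature check using Node's built-in `crypto` module. It assumes `secret` is your app's shared secret and `rawBody` is the unparsed request body. `verifyShopifyHmac` is a hypothetical helper name:

```javascript
const crypto = require("node:crypto");

// Compare the base64 HMAC-SHA256 of the raw body with the header value.
function verifyShopifyHmac(rawBody, hmacHeader, secret) {
  const expected = crypto.createHmac("sha256", secret).update(rawBody).digest();
  const received = Buffer.from(hmacHeader || "", "base64");
  // timingSafeEqual throws on length mismatch, so check lengths first.
  return received.length === expected.length &&
    crypto.timingSafeEqual(received, expected);
}
```

The constant-time comparison avoids leaking how many leading bytes of a forged signature were correct.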
Handling Shopify Event Reprocessing at Scale
At low volume, replaying events is straightforward. At scale, it gets complex fast.
Batched Replay
Do not enqueue 50,000 events at once. Process in batches of 100-500 with a configurable concurrency limit. This prevents overwhelming your handlers and your downstream systems.
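A simple way to sketch batching is to wait for each batch to finish before starting the next, which caps in-flight work at the batch size. `replayInBatches` and the `enqueue` callback are hypothetical names for this example:

```javascript
// Hypothetical batched replay: at most `batchSize` enqueue calls run at once.
async function replayInBatches(events, enqueue, batchSize = 100) {
  let enqueued = 0;
  for (let i = 0; i < events.length; i += batchSize) {
    const batch = events.slice(i, i + batchSize);
    // Wait for the whole batch before moving on to the next one.
    await Promise.all(batch.map((event) => enqueue(event)));
    enqueued += batch.length;
  }
  return enqueued;
}
```

This couples batch size and concurrency into one setting. If you need them to vary independently, a dedicated concurrency limiter inside each batch is the usual next step.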
Priority Queuing
Give recent failures higher priority than old ones. An order from five minutes ago matters more than one from three days ago.
Rate Limit Awareness
If your handlers call the Shopify Admin API during processing, replay can trigger API rate limit errors. Use a token bucket or respect the Retry-After header.
For high-traffic scenarios, review the strategies in scaling Shopify apps to millions of requests to avoid rate limit cascades during large replays.
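The token bucket mentioned above can be sketched in a few lines. The capacity and refill rate are whatever you configure and the values in any usage are illustrative. `TokenBucket` and `tryRemove` are hypothetical names, and the clock is passed in so the behavior is easy to test:

```javascript
// Hypothetical token bucket: each outgoing API call must take one token.
class TokenBucket {
  constructor(capacity, refillPerSecond, now = Date.now()) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity; // start full
    this.lastRefill = now;
  }

  // Returns true if a call may proceed now, false if it should wait.
  tryRemove(now = Date.now()) {
    const elapsedSeconds = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

A replay worker would call `tryRemove()` before each Admin API request and delay the event when it returns false.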
Concurrency Control
Limit concurrent workers per topic. Processing ten orders/paid events simultaneously can cause race conditions in your fulfillment logic.
The race conditions guide for Shopify order processing covers exactly how to prevent this.
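A per-topic limit could look like the sketch below. It only counts active workers per topic and is not a complete answer to race conditions. `createTopicLimiter` and its methods are hypothetical names:

```javascript
// Hypothetical per-topic concurrency limiter.
function createTopicLimiter(maxPerTopic) {
  const active = new Map(); // topic -> number of workers currently running

  return {
    // Returns true if a worker may start on this topic right now.
    tryAcquire(topic) {
      const count = active.get(topic) || 0;
      if (count >= maxPerTopic) return false;
      active.set(topic, count + 1);
      return true;
    },
    // Call when a worker finishes, whether it succeeded or failed.
    release(topic) {
      const count = active.get(topic) || 0;
      if (count > 0) active.set(topic, count - 1);
    },
  };
}
```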
Monitoring Your Replay System
A replay system you cannot observe is a system you cannot trust.
Track these metrics:
| Metric | What It Tells You |
|---|---|
| Events received per topic | Volume baseline |
| Events in pending status > 5 min | Processing lag |
| Events in failed status | Active problems |
| DLQ depth | Critical failures needing manual review |
| Replay success rate | Health of your handler logic |
| Average processing time | Performance benchmark |
Alert immediately when the DLQ depth grows or when failed events exceed a threshold. These are not informational signals. They represent real data that has not reached its destination.
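Several of these metrics can be derived directly from the event store. The sketch below computes three of them over an in-memory list. `computeMetrics` and the field names are assumptions, and the five-minute threshold mirrors the table above:

```javascript
// Hypothetical metrics pass over stored event records.
function computeMetrics(events, now = Date.now(), lagThresholdMs = 5 * 60 * 1000) {
  const metrics = { receivedByTopic: {}, stalePending: 0, failed: 0 };
  for (const event of events) {
    // Volume baseline per topic.
    metrics.receivedByTopic[event.topic] = (metrics.receivedByTopic[event.topic] || 0) + 1;
    // Pending for longer than the threshold indicates processing lag.
    if (event.status === "pending" && now - Date.parse(event.createdAt) > lagThresholdMs) {
      metrics.stalePending += 1;
    }
    if (event.status === "failed") metrics.failed += 1;
  }
  return metrics;
}
```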
Shopify Analytics can help you validate event counts against what your system processed. If orders in Shopify do not match what your system recorded, your replay system has a gap.
Replay System Architecture Diagram
A production-ready Shopify webhook replay system looks like this:
Shopify
|
v
[Ingestion Endpoint] ←── validates HMAC, returns 200 immediately
|
v
[Event Store (DB)] ←── stores raw payload, topic, status
|
v
[Processing Queue] ←── background worker picks up events
|
v
[Handler] ←── idempotent processing logic
|
├── Success → mark event as 'succeeded'
└── Failure → increment retry_count → re-enqueue with backoff
|
└── Max retries reached → move to Dead Letter Queue
This architecture follows the fault-tolerant Shopify integration pattern, where every stage is designed to fail safely without losing data.
When to Use Manual vs. Automatic Replay
| Scenario | Replay Type |
|---|---|
| Server was down for 2 hours | Automatic retry handles it |
| Bug deployed and fixed | Manual replay by time range |
| New integration added | Manual replay to backfill history |
| Downstream API was down | Automatic retry with backoff |
| Data migration | Manual replay by topic and date range |
| Suspicious processing gap | Manual investigation then selective replay |
Automatic retry handles transient failures. Manual replay handles logic failures and integration onboarding.
Build both. Use automatic retry for reliability and manual replay for recovery.
Common Mistakes to Avoid
Several mistakes consistently cause replay systems to fail.
Not storing the raw payload. If you only store processed data, you cannot replay. Always store the full original payload.
Processing in the ingestion endpoint. This causes timeouts and lost events. Always decouple ingestion from processing.
Missing idempotency. Replaying a non-idempotent handler creates duplicate orders, double charges, and corrupted inventory. Solve this before building replay.
Unlimited retries. Without a max retry cap, failed events cycle forever and consume all your worker capacity.
No DLQ. Events that exhaust retries need a place to land. Without a DLQ, they either get deleted or keep retrying forever.
Not monitoring DLQ depth. A DLQ that grows silently means data loss you are not seeing.
These mistakes mirror the common Shopify technical mistakes that teams make when building integrations under pressure.
Tools and Libraries
You do not need to build everything from scratch. Several tools accelerate development:
| Tool | Role |
|---|---|
| BullMQ (Node.js) | Queue management with retry and backoff built in |
| Sidekiq (Ruby) | Background jobs with retry logic |
| AWS SQS + Lambda | Managed queue and serverless processing |
| Temporal | Durable workflow execution with replay built in |
| Redis Streams | Lightweight event sourcing |
| PostgreSQL + pg-boss | Database-backed job queue |
For apps that pair a Hydrogen storefront with serverless backend infrastructure, AWS Lambda with SQS is a natural fit for replay processing.
Conclusion
A Shopify webhook replay system transforms your integration from fragile to resilient. Instead of losing events when things go wrong, you capture them, store them, and process them when you are ready.
The core principle is simple: receive fast, process safely, retry intelligently, and replay on demand.
Build the ingestion layer first. Then add persistent storage. Then add retry logic. Then add manual replay. Each layer compounds the reliability of your integration.
When combined with event-driven architecture for Shopify apps and solid queue infrastructure, a replay system gives you a foundation that handles outages, bugs, and scaling challenges without data loss.
Your Shopify store generates events constantly. Make sure none of them disappear.
Frequently Asked Questions
Q1: Does Shopify have a built-in webhook replay feature?
No. Shopify retries failed webhooks up to 19 times over 48 hours, but it does not offer on-demand replay of past events. You must build and manage your own replay system.
Q2: How long should I store webhook events for replay?
Store them for at least 30 days. This covers most business recovery scenarios. For compliance-sensitive workflows, 90 days or longer may be appropriate.
Q3: What is the difference between webhook retry and webhook replay?
Retry is automatic redelivery of a recently failed event. Replay is deliberate reprocessing of stored events, often triggered manually after fixing a bug or onboarding a new system.
Q4: How do I prevent duplicate processing when replaying webhooks?
Use the X-Shopify-Webhook-Id as a unique deduplication key. Before processing any event, check if that ID has already been successfully processed and skip it if so.
Q5: Can I replay webhooks for a specific topic only?
Yes. Your event store should index by topic. You can query for orders/paid events in a specific time window and replay only those, leaving other topics untouched.
Q6: What happens to events in the dead letter queue?
DLQ events require manual review. Investigate the root cause, fix the handler or configuration, and then selectively replay those events. Do not auto-retry DLQ events without understanding why they failed.
Q7: Is a webhook replay system necessary for small Shopify stores?
For small stores with simple integrations, Shopify’s built-in retry window may be sufficient. As integrations grow in complexity and data dependency, a replay system becomes essential to prevent data inconsistencies.
