- Exponential Backoff: Increasing delays between retries
- Smart Retry Logic: Retry only transient failures
- Dead Letter Queue: Capture permanently failed jobs
- Failure Monitoring: Track and alert on failures
Retry Configuration
Configure how jobs should be retried when they fail. By default, jobs are retried up to 3 times with exponential backoff.
```ts
import { platform } from '@/lib/platform'

const job = await platform.jobs.schedule({
  url: 'https://myapp.com/api/jobs/process',
  body: { data: 'value' },
  retries: {
    maxAttempts: 5,
    backoff: 'exponential',
    initialDelay: 60, // 1 minute
    maxDelay: 3600, // 1 hour max between retries
    retryOn: [500, 502, 503, 504], // Only retry server errors
  },
})
```

Retry Options
| Property | Type | Default | Description |
|---|---|---|---|
| `maxAttempts` | `number` | `3` | Maximum number of retry attempts (1-10) |
| `backoff` | `"linear" \| "exponential"` | `"exponential"` | Backoff strategy between retries |
| `initialDelay` | `number` | `60` | Initial delay in seconds before the first retry |
| `maxDelay` | `number` | `3600` | Maximum delay in seconds between retries |
| `retryOn` | `number[]` | `[500, 502, 503, 504]` | HTTP status codes that trigger a retry |
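The backoff option also accepts "linear", which the examples below don't otherwise show. A brief sketch using the same schedule call as above; the exact linear progression is platform-defined, so the timeline comment is illustrative (a common convention is delay = initialDelay * attempt, capped at maxDelay):

```ts
// Linear backoff: the gap between retries grows by a fixed step
// rather than doubling. Timeline comment below assumes
// delay = initialDelay * attempt, capped at maxDelay.
const job = await platform.jobs.schedule({
  url: 'https://myapp.com/api/jobs/process',
  body: { data: 'value' },
  retries: {
    maxAttempts: 3,
    backoff: 'linear',
    initialDelay: 60,
    maxDelay: 600,
  },
})

// Illustrative timeline: wait 60s, then 120s, then 180s
```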
Exponential Backoff
Exponential backoff increases the delay between retries, giving failing services time to recover while avoiding overwhelming them with requests.
```ts
// Exponential backoff formula:
// delay = min(initialDelay * 2^(attempt - 1), maxDelay)
// where "attempt" is the retry number, starting at 1
//
// Example with initialDelay=60, maxDelay=3600:
// Attempt 1: 60 seconds (1 min)
// Attempt 2: 120 seconds (2 min)
// Attempt 3: 240 seconds (4 min)
// Attempt 4: 480 seconds (8 min)
// Attempt 5: 960 seconds (16 min)
// Attempt 6: 1920 seconds (32 min)
// Attempt 7: 3600 seconds (60 min, capped at maxDelay)
```

```ts
// Best for most use cases - prevents thundering herd
const job = await platform.jobs.schedule({
  url: 'https://myapp.com/api/jobs/webhook',
  body: { event: 'order.created' },
  retries: {
    maxAttempts: 5,
    backoff: 'exponential',
    initialDelay: 30, // Start at 30 seconds
    maxDelay: 1800, // Cap at 30 minutes
  },
})

// Retry timeline:
// Fail #1 -> wait 30s -> Attempt #2
// Fail #2 -> wait 60s -> Attempt #3
// Fail #3 -> wait 120s -> Attempt #4
// Fail #4 -> wait 240s -> Attempt #5
// Fail #5 -> move to DLQ
```

Jitter

Adding random jitter to each retry delay spreads retries out, so that many jobs failing at the same moment do not all come back at exactly the same time.
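The retry options documented above don't include a jitter setting, so if you need it (for example when you re-schedule work yourself), a common approach is full jitter: pick a random delay up to the exponential cap. A minimal sketch, not part of the platform API:

```ts
// Full jitter: pick a random delay between 0 and the exponential cap.
// "attempt" is the retry number, starting at 1.
function backoffWithJitter(attempt: number, initialDelay = 60, maxDelay = 3600): number {
  const exponential = Math.min(initialDelay * 2 ** (attempt - 1), maxDelay)
  return Math.floor(Math.random() * exponential)
}

// Example: delays for five retries (values vary from run to run)
for (let attempt = 1; attempt <= 5; attempt++) {
  console.log(`Attempt ${attempt}: wait ${backoffWithJitter(attempt)}s`)
}
```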
Max Retry Limits
Set appropriate limits based on the criticality and nature of your job.
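How far to push maxAttempts depends on the job. For best-effort work such as refreshing a cache, one or two attempts with no DLQ alerting is usually enough; the endpoint below is made up for illustration. Critical jobs, shown next, go the other way:

```ts
// Low-criticality job - fine to give up quickly
const job = await platform.jobs.schedule({
  url: 'https://myapp.com/api/jobs/refresh-cache', // hypothetical best-effort endpoint
  body: { key: 'homepage' },
  retries: {
    maxAttempts: 2, // one retry is plenty for best-effort work
    backoff: 'exponential',
    initialDelay: 30,
  },
})
```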
```ts
// Critical jobs that must eventually succeed
const job = await platform.jobs.schedule({
  url: 'https://myapp.com/api/jobs/payment-webhook',
  body: { paymentId: 'pay_123' },
  retries: {
    maxAttempts: 10, // Try hard to deliver
    backoff: 'exponential',
    initialDelay: 60,
    maxDelay: 3600,
  },
  // Also configure DLQ alerting
  deadLetterQueue: {
    enabled: true,
    webhookUrl: 'https://myapp.com/api/alerts/dlq',
  },
})
```

Dead Letter Queue (DLQ)
Jobs that fail after all retry attempts are moved to the Dead Letter Queue. This allows you to investigate and manually reprocess failed jobs.
```ts
import { platform } from '@/lib/platform'

// Configure DLQ behavior when scheduling
const job = await platform.jobs.schedule({
  url: 'https://myapp.com/api/jobs/important-task',
  body: { taskId: 'task_123' },
  retries: {
    maxAttempts: 5,
    backoff: 'exponential',
  },
  deadLetterQueue: {
    enabled: true,
    webhookUrl: 'https://myapp.com/api/webhooks/dlq-alert',
    retentionDays: 30,
  },
})
```

DLQ Options
| Property | Type | Default | Description |
|---|---|---|---|
| `enabled` | `boolean` | `true` | Enable the Dead Letter Queue for failed jobs |
| `webhookUrl` | `string` | | URL to notify when a job moves to the DLQ |
| `retentionDays` | `number` | `30` | Days to retain failed jobs in the DLQ |
| `maxSize` | `number` | `1000` | Maximum jobs in the DLQ before the oldest are purged |
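If you set webhookUrl, you also need an endpoint to receive the notification. The notification payload isn't documented in this section, so the handler below only verifies and logs the raw body; it assumes DLQ notifications can be verified with platform.jobs.verifyRequest like ordinary job requests, which you should confirm for your setup:

```ts
// app/api/webhooks/dlq-alert/route.ts
import { platform } from '@/lib/platform'
import { NextRequest } from 'next/server'

export async function POST(req: NextRequest) {
  // Assumption: DLQ notifications are signed like regular job requests
  const isValid = await platform.jobs.verifyRequest(req)
  if (!isValid) {
    return new Response('Unauthorized', { status: 401 })
  }

  // Log the raw notification and fan it out to your alerting tool of choice
  const payload = await req.json()
  console.error('Job moved to DLQ', payload)

  return new Response('OK', { status: 200 })
}
```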
Working with DLQ
```ts
import { platform } from '@/lib/platform'

// List all jobs in the Dead Letter Queue
const dlqJobs = await platform.jobs.listDLQ({
  limit: 50,
  offset: 0,
})

for (const job of dlqJobs.items) {
  console.log({
    id: job.id,
    url: job.url,
    failedAt: job.failedAt,
    attempts: job.attempts,
    lastError: job.lastError,
  })
}
```
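The intro above mentions manually reprocessing failed jobs. This section doesn't document a dedicated requeue call, so a straightforward approach is to schedule a DLQ entry again with its original url and body. A sketch, assuming dlqJobs comes from the listing above and that each entry exposes the original request body (not shown in the listed fields):

```ts
// Reprocess DLQ entries by scheduling them again.
// Assumption: each DLQ entry exposes the original request body as `job.body`;
// if it does not, rebuild the payload from your own records instead.
for (const job of dlqJobs.items) {
  // Example filter: only requeue jobs that failed during a known outage
  if (String(job.lastError).includes('503')) {
    await platform.jobs.schedule({
      url: job.url,
      body: job.body,
      retries: {
        maxAttempts: 3,
        backoff: 'exponential',
      },
    })
  }
}
```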
Handling Transient vs Permanent Failures

Distinguish between transient failures (which should be retried) and permanent failures (which should not).
```ts
// app/api/jobs/process-order/route.ts
import { platform } from '@/lib/platform'
import { NextRequest } from 'next/server'

// getOrder and processOrder are your own application code
export async function POST(req: NextRequest) {
  const isValid = await platform.jobs.verifyRequest(req)
  if (!isValid) {
    return new Response('Unauthorized', { status: 401 })
  }

  const { orderId } = await req.json()

  try {
    const order = await getOrder(orderId)

    // Permanent failure - order doesn't exist
    if (!order) {
      return new Response(JSON.stringify({
        error: 'Order not found',
        code: 'ORDER_NOT_FOUND',
      }), {
        status: 400, // 4xx = no retry
      })
    }

    // Permanent failure - invalid state
    if (order.status === 'cancelled') {
      return new Response(JSON.stringify({
        error: 'Order is cancelled',
        code: 'ORDER_CANCELLED',
      }), {
        status: 400, // 4xx = no retry
      })
    }

    await processOrder(order)
    return new Response('OK', { status: 200 })
  } catch (error: any) {
    // Transient failure - external service down
    if (error.code === 'ECONNREFUSED') {
      return new Response(JSON.stringify({
        error: 'Payment service unavailable',
        code: 'SERVICE_UNAVAILABLE',
      }), {
        status: 503, // 5xx = will retry
      })
    }

    // Transient failure - rate limited
    if (error.status === 429) {
      return new Response(JSON.stringify({
        error: 'Rate limited',
        code: 'RATE_LIMITED',
        retryAfter: error.retryAfter,
      }), {
        status: 503, // 5xx = will retry
        headers: {
          'Retry-After': String(error.retryAfter),
        },
      })
    }

    // Unknown error - retry to be safe
    return new Response(JSON.stringify({
      error: 'Internal error',
      code: 'INTERNAL_ERROR',
    }), {
      status: 500, // 5xx = will retry
    })
  }
}
```

Return Appropriate Status Codes
- 2xx: Success - job completed
- 4xx: Permanent failure - do not retry (except 408, 429)
- 5xx: Transient failure - will retry
Monitoring Failed Jobs
Set up monitoring and alerting for job failures to catch issues early.
```ts
import { platform } from '@/lib/platform'

// Get job failure statistics
const stats = await platform.jobs.getStats({
  period: '24h',
})

console.log({
  totalJobs: stats.total,
  completedJobs: stats.completed,
  failedJobs: stats.failed,
  failureRate: stats.failureRate, // fraction of jobs that failed, e.g. 0.05 = 5%
  averageRetries: stats.averageRetries,
  dlqSize: stats.dlqSize,
})

// Get failure stats by URL
const urlStats = await platform.jobs.getStats({
  period: '24h',
  groupBy: 'url',
})

for (const stat of urlStats) {
  if (stat.failureRate > 0.1) { // > 10% failure rate
    console.warn(`High failure rate for ${stat.url}: ${stat.failureRate}`)
  }
}
```

Best Practices
- Make Jobs Idempotent: Jobs may run multiple times due to retries. Use idempotency keys to prevent duplicate processing.
- Return Proper Status Codes: Use 4xx for permanent failures (no retry) and 5xx for transient failures (will retry).
- Set Appropriate Timeouts: Configure timeouts shorter than your retry delay to prevent overlapping executions (see the sketch after this list).
- Monitor the DLQ: Set up alerts for DLQ size and regularly review failed jobs to identify systemic issues.
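The schedule options documented here don't include a timeout field, so this sketch bounds execution inside the handler instead: the outbound call is aborted well before the shortest retry delay used in the examples above, so a hung request isn't still in flight when the retry arrives. AbortSignal.timeout is a standard Web API; the payment service URL is made up for illustration.

```ts
// Inside a job handler: keep slow downstream calls bounded so the handler
// finishes (or fails fast) long before the platform retries the job.
async function callPaymentService(orderId: string) {
  const res = await fetch('https://payments.example.com/charge', { // hypothetical service
    method: 'POST',
    body: JSON.stringify({ orderId }),
    // Abort after 10s - well under the 30s initialDelay used earlier
    signal: AbortSignal.timeout(10_000),
  })
  if (!res.ok) {
    throw new Error(`Payment service responded with ${res.status}`)
  }
  return res.json()
}
```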
Idempotency Pattern
```ts
// Use idempotency keys in your job handler
// ("db" is your database client, e.g. Prisma; processOrder is your own application code)
import { NextRequest } from 'next/server'

export async function POST(req: NextRequest) {
  const { orderId, idempotencyKey } = await req.json()

  // Check if already processed
  const existing = await db.processedJobs.findUnique({
    where: { idempotencyKey },
  })
  if (existing) {
    // Already processed - return success without reprocessing
    return new Response('Already processed', { status: 200 })
  }

  // Process the job
  await processOrder(orderId)

  // Record that we processed it
  // (a unique constraint on idempotencyKey guards against concurrent retries)
  await db.processedJobs.create({
    data: { idempotencyKey, processedAt: new Date() },
  })

  return new Response('OK', { status: 200 })
}
```