n8n Error Handling & Debugging – Retry Logic & Logging

Step-by-Step Instructions

Step 1 — Enable Error Handling on Every HTTP Request Node

By default, when an HTTP Request node fails (network error, 4xx/5xx response), n8n stops the entire workflow and marks it as failed. For production workflows, you need more control.

In every HTTP Request node, click the three-dot menu → "Settings" and enable "Continue on Fail" (recent n8n versions expose this as the "On Error" option set to "Continue"). This tells n8n to continue to the next node even when this one fails, and passes the error information downstream.

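After the HTTP Request node, add an IF node that checks whether {{ $json.error }} is set or whether {{ $json.statusCode >= 400 }} is true. The statusCode field is typically only present when the node is configured to return the full response, so check which fields your n8n version actually outputs. The "true" branch handles errors; the "false" branch continues normal processing.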

For the error branch: log the error, send an alert, and either retry or gracefully skip this item. Never let errors silently disappear.
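If the check outgrows an IF node expression, the same test can live in a Code node. The following is a minimal sketch, assuming a failed item carries either an `error` field or a numeric `statusCode`, as in the IF check above. `isFailedResponse` is a hypothetical helper name, not part of n8n.

```javascript
// Hypothetical helper: decides whether an item coming out of the HTTP Request
// node should go down the error branch. The field names `error` and
// `statusCode` are assumptions about the item shape.
function isFailedResponse(json) {
  if (json == null) return true; // no data at all is treated as a failure
  if (json.error) return true; // populated when the node failed but continued
  const status = Number(json.statusCode);
  return Number.isFinite(status) && status >= 400;
}
```

Inside a Code node, one option is to map over `$input.all()` and add a flag such as `_failed` to each item's `json`, then let the IF node branch on that flag.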

Step 2 — Implement Retry Logic with Wait Nodes

Many errors are transient — rate limits, brief server downtime, network hiccups. A retry mechanism handles these automatically.

Build a retry loop with a Code node that tracks attempt count:

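// Code Node: Retry Counter (mode: Run Once for All Items)
const maxRetries = 3;
const data = $input.first().json;
const currentAttempt = (data._retry_count || 0) + 1;

if (currentAttempt > maxRetries) {
  // Retries exhausted: keep the incoming data so it can be logged or queued.
  return [{
    json: {
      ...data,
      _retry_count: currentAttempt,
      _should_retry: false,
      _max_retries_exceeded: true
    }
  }];
}

// Still within budget: mark the item for another attempt.
return [{
  json: {
    ...data,
    _retry_count: currentAttempt,
    _should_retry: true
  }
}];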

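Connect the retry loop: Error detected → Code node (increment counter) → IF node (_should_retry is true?) → Wait node (exponential backoff: attempt 1 = 5s, attempt 2 = 15s, attempt 3 = 45s) → back to HTTP Request node. When _should_retry is not true, route the item to logging and alerting instead of the Wait node.

To get that schedule, set the Wait node's unit to seconds and its amount to the expression {{ 5 * Math.pow(3, $json._retry_count - 1) }}, which triples the delay on each attempt.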
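The 5s / 15s / 45s schedule is a base delay of 5 seconds multiplied by 3 on each attempt. The sketch below shows that arithmetic as a plain function. `backoffSeconds` and the `maxSeconds` cap are illustrative assumptions, not n8n features.

```javascript
// Delay before retry number `attempt` (1-based), in seconds.
// baseSeconds = 5 and factor = 3 reproduce the 5s / 15s / 45s schedule.
// maxSeconds is an assumed safety cap so long retry chains cannot wait forever.
function backoffSeconds(attempt, baseSeconds = 5, factor = 3, maxSeconds = 300) {
  if (!Number.isInteger(attempt) || attempt < 1) {
    throw new RangeError("attempt must be a positive integer");
  }
  const delay = baseSeconds * Math.pow(factor, attempt - 1);
  return Math.min(delay, maxSeconds);
}

console.log([1, 2, 3].map((n) => backoffSeconds(n))); // [ 5, 15, 45 ]
```

A multiplicative schedule like this spreads retries out quickly, which gives a rate-limited API time to recover. A formula based on the square of the attempt number grows more slowly and does not produce the same delays.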

Step 3 — Build a Structured Logging System

Production workflows need structured logs so you can investigate any failure after the fact. Build a logging sub-workflow that all your main workflows call.

Create a MySQL table for logs:

CREATE TABLE workflow_logs (
  id BIGINT AUTO_INCREMENT PRIMARY KEY,
  workflow_name VARCHAR(255),
  execution_id VARCHAR(255),
  status ENUM('success', 'warning', 'error', 'retry'),
  node_name VARCHAR(255),
  message TEXT,
  payload JSON,
  error_details JSON,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  INDEX idx_workflow_status (workflow_name, status),
  INDEX idx_created_at (created_at)
);

Create a reusable "Log" sub-workflow that starts with an Execute Workflow Trigger and is called from other workflows via the "Execute Workflow" node. Every time you need to log something, call this sub-workflow with the log data. This centralizes logging across all your workflows.
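A sub-workflow like this is easier to maintain if every caller sends the same field names. The sketch below shows one way to normalize a log record in a Code node before the MySQL insert. `buildLogEntry` and its camelCase input names are hypothetical; the output keys mirror the workflow_logs columns.

```javascript
const ALLOWED_STATUSES = ["success", "warning", "error", "retry"];

// Hypothetical helper: shapes one log record so its keys match the
// workflow_logs columns. payload and error_details are serialized to
// JSON strings for the JSON columns; missing values become NULL.
function buildLogEntry({ workflowName, executionId, status, nodeName, message, payload, errorDetails }) {
  if (!ALLOWED_STATUSES.includes(status)) {
    throw new Error(`Unknown log status: ${status}`);
  }
  return {
    workflow_name: String(workflowName ?? "unknown"),
    execution_id: String(executionId ?? ""),
    status,
    node_name: nodeName ?? null,
    message: message ?? "",
    payload: payload === undefined ? null : JSON.stringify(payload),
    error_details: errorDetails === undefined ? null : JSON.stringify(errorDetails),
  };
}
```

Rejecting unknown statuses up front keeps a mistyped value from failing later at the ENUM column, where the cause is harder to trace.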

Step 4 — Set Up Error Trigger Workflows

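n8n's Error Trigger node starts a workflow when another workflow fails, but only for workflows that select it as their Error Workflow in Workflow Settings, and only for automatic (production) executions, not manual test runs. Create a dedicated "Error Notifier" workflow that starts with an Error Trigger node, then select it as the Error Workflow in every workflow you want covered.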

In the Error Notifier, extract the failure information and route alerts to the right channel:

// Error data available from Error Trigger (field names as documented by n8n; verify against your version):
$json.workflow.name               → which workflow failed
$json.execution.id                → execution ID for replaying
$json.execution.url               → direct link to the failed execution
$json.execution.error.message     → error message
$json.execution.lastNodeExecuted  → which node failed

Build routing logic: critical workflow failures → PagerDuty + Slack + Email. Non-critical failures → Slack only. Log all failures to MySQL regardless of severity.
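The routing rule can be sketched as a small function for a Code node placed before the alert nodes. `routeAlert` and the `criticalWorkflows` list are assumptions for illustration: n8n has no built-in severity flag, so the list of critical workflow names is one you maintain yourself.

```javascript
// Hypothetical routing helper. `criticalWorkflows` is an assumed,
// hand-maintained list of workflow names that should page someone.
function routeAlert(workflowName, criticalWorkflows) {
  const isCritical = criticalWorkflows.includes(workflowName);
  return {
    severity: isCritical ? "critical" : "warning",
    channels: isCritical ? ["pagerduty", "slack", "email"] : ["slack"],
    logToDatabase: true, // every failure is logged regardless of severity
  };
}
```

A downstream Switch or IF node can then branch on `severity`, and the MySQL node runs on both branches.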

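Include the n8n execution URL in your alert so your team can click directly to the failed execution and replay it. The Error Trigger supplies it as {{ $json.execution.url }} when the execution was saved; otherwise build it from your instance's base URL and the execution ID, for example https://your-n8n.domain/execution/{{ $json.execution.id }} (the exact path depends on your n8n version).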

Step 5 — Debugging Techniques for Production Workflows

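Pin data for isolated debugging: Run the node (or load a past execution's data into the editor), then pin its output from the node's output panel. You can now modify downstream nodes and run them against identical data without triggering real webhooks or re-calling external APIs. Pinned data applies only to manual test runs, not to production executions.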

Use the "Debug Helper" Code node pattern: Add temporary Code nodes named "DEBUG: Check data here" that print the incoming items to the browser console during manual test runs and pass them through unchanged. Remove them before deploying to production.

// Debug Helper (remove before production)
const items = $input.all();
console.log("=== DEBUG ===");
console.log(JSON.stringify(items.map((item) => item.json), null, 2));
console.log("=== END DEBUG ===");
return items; // pass every item through, not just the first

Name your nodes descriptively: "HTTP: POST → HubSpot Contact" is far more useful than "HTTP Request3" when investigating a failure.

Use Execution Filters: In n8n's execution history, filter by status (failed) + workflow name + date range to quickly find and diagnose production issues.

Sample Workflow Diagram

PRODUCTION-GRADE ERROR HANDLING SYSTEM
─────────────────────────────────────────────────────────
[Trigger: Webhook or Schedule]
        │
        ▼
[Set: Initialize Execution State]
  _execution_id: {{ $execution.id }}
  _retry_count: 0
  _start_time: {{ new Date().toISOString() }}
        │
        ▼
[HTTP: Call External API]  ← Continue on Fail = ON
        │
        ▼
[IF: $json.error OR statusCode >= 400]
        │                          │
      ERROR                     SUCCESS
        │                          │
        ▼                          ▼
[Code: Check Retry Count]   [Continue normal flow...]
  _retry_count < 3?
    │            │
   YES           NO
    │            │
    ▼            ▼
 [Wait:       [MySQL: Log Failure]
  Exp           status = "error"
  Backoff]         │
    │              ▼
    │         [Slack: Alert Team]
    │         [Email: Error Report]
    │
    └──► [HTTP: Retry Call] (loops back up)
─────────────────────────────────────────────────────────
ERROR TRIGGER WORKFLOW (separate, always running)
─────────────────────────────────────────────────────────
[Error Trigger: Any Workflow Fails]
        │
        ▼
[MySQL: Log Error]
  workflow, node, message, execution_id, timestamp
        │
        ▼
[IF: Critical Workflow?]
    │            │
   YES           NO
    │            ▼
    ▼       [Slack: Warning Alert]
[PagerDuty + Slack + Email]

Real-World Automation Example

Real-World Example: Production Email Campaign with Bulletproof Error Handling

A SaaS company sends weekly digest emails to 10,000 users via SendGrid. Intermittent SendGrid rate-limit errors used to go unhandled, so 15-20% of sends were silently lost. With proper error handling, they now achieve 99.8% delivery.

Schedule trigger fires every Monday at 9 AM.

MySQL query fetches all active subscribers in batches of 200.

HTTP Request node calls SendGrid with "Continue on Fail" enabled.

IF node checks for error response (429 rate limit, 5xx server error).

On rate limit (429): Wait 60 seconds before the first retry and lengthen the wait on each later attempt (exponential backoff), retrying up to 3 times. Log each retry to the workflow_logs table.

On persistent failure after 3 retries: log to MySQL, add subscriber to a "retry_queue" table, send Slack alert.

A separate daily workflow replays all records in retry_queue from the previous day.

Success rate improved from ~82% to 99.8%. Full audit trail in MySQL for compliance.
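Two pieces of this flow are easy to sketch in a Code node: splitting subscribers into batches and deciding which responses deserve a retry. The helpers below are illustrative. `chunk` and `isRetryable` are hypothetical names, and treating other 4xx codes as non-retryable is an assumption consistent with the IF check described above.

```javascript
// Split a list into fixed-size batches (the example uses batches of 200).
function chunk(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// 429 (rate limit) and 5xx (server error) are treated as transient.
// Assumption: other 4xx responses indicate a bad request and are not retried.
function isRetryable(statusCode) {
  return statusCode === 429 || (statusCode >= 500 && statusCode <= 599);
}
```

Responses that are not retryable can skip the Wait node and go straight to logging and the retry_queue table, so a malformed request does not use up three attempts.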

Frequently Asked Questions

What is the difference between "Continue on Fail" and the Error Trigger node?

"Continue on Fail" handles errors at the individual node level: the workflow keeps running and you handle the error inline. The Error Trigger node starts a separate workflow after another workflow has failed and stopped. Use Continue on Fail for expected, recoverable errors (rate limits, not-found responses). Use Error Trigger for catching unexpected failures that need immediate team notification.

How many retries should I configure?

3 retries with exponential backoff is a common default for most APIs. Use 1 retry with a 30-second wait for time-sensitive workflows. Use 5 retries for critical financial or notification workflows where failure has high business impact. Never retry indefinitely: always set a maximum and route records that exceed it to a dead letter queue or human review.
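The "always set a maximum" rule can be sketched as a bounded retry loop in plain JavaScript. This is a sketch under stated assumptions rather than an n8n feature: `retryWithLimit` is a hypothetical name, the `sleep` function is injectable so the loop can be tested without real waiting, and in an actual workflow the Wait node and the retry counter play these roles.

```javascript
// Runs `task` once, then retries up to `maxRetries` times with a growing
// delay. Returns a result object instead of throwing so the caller can
// route exhausted records to a dead letter queue.
async function retryWithLimit(task, options = {}) {
  const {
    maxRetries = 3,
    baseSeconds = 5,
    factor = 3,
    sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
  } = options;

  let lastError;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      const value = await task(attempt);
      return { ok: true, value, retries: attempt };
    } catch (err) {
      lastError = err;
      if (attempt < maxRetries) {
        // Delays with the defaults: 5s, 15s, 45s.
        await sleep(baseSeconds * Math.pow(factor, attempt) * 1000);
      }
    }
  }
  // Retries exhausted: flag the record for a dead letter queue or human review.
  return { ok: false, error: lastError, retries: maxRetries, deadLetter: true };
}
```

Returning `deadLetter: true` instead of throwing keeps the failure visible to the caller, which is the point of the rule: a record that runs out of retries is routed somewhere explicit.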

Can I replay a failed execution in n8n?

Yes. In n8n's execution history, find the failed execution, open it, and use its retry option. n8n can retry with either the original workflow or the currently saved version, and the run reuses the input data that caused the original failure. Retrying with the currently saved workflow lets you fix the root cause and immediately test the fix against the exact failure scenario.
