By default, when an HTTP Request node fails (network error, 4xx/5xx response), n8n stops the entire workflow and marks it as failed. For production workflows, you need more control.
In every HTTP Request node, click the three-dot menu → "Settings". Enable "Continue on Fail" (in recent n8n versions, set "On Error" to "Continue" instead). This tells n8n to continue to the next node even when this one fails, and passes the error information downstream.
After the HTTP Request node, add an IF node that checks: {{ $json.error }} or {{ $json.statusCode >= 400 }}. The "true" branch handles errors; the "false" branch continues normal processing.
For the error branch: log the error, send an alert, and either retry or gracefully skip this item. Never let errors silently disappear.
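When the condition outgrows a single IF expression, the same check can live in a Code node. The following is a minimal sketch in plain JavaScript, not an n8n built-in: `isFailedItem` and `splitByOutcome` are hypothetical helper names, and the sketch assumes each item carries an `error` field and/or a `statusCode`, as in the expressions above.

```javascript
// Hypothetical helper: decides whether an HTTP Request result belongs on the
// error branch. Assumes the item JSON carries `error` and/or `statusCode`.
function isFailedItem(json) {
  if (json == null) return true; // nothing came back at all
  if (json.error) return true;   // error info passed downstream by "Continue on Fail"
  const status = Number(json.statusCode);
  return Number.isFinite(status) && status >= 400;
}

// Splits items into the two branches the IF node would produce.
function splitByOutcome(items) {
  const failed = [];
  const ok = [];
  for (const json of items) {
    (isFailedItem(json) ? failed : ok).push(json);
  }
  return { failed, ok };
}
```

Keeping the rule in one function makes it easy to extend later, for example to treat a 404 as a skip rather than a failure.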
Many errors are transient — rate limits, brief server downtime, network hiccups. A retry mechanism handles these automatically.
Build a retry loop with a Code node that tracks attempt count:
// Code Node: Retry Counter (mode: Run Once for All Items)
const maxRetries = 3;
const data = $input.first().json;
const currentAttempt = (data._retry_count || 0) + 1;
// Out of attempts: stop retrying, but keep the payload for logging and alerts
if (currentAttempt > maxRetries) {
  return [{
    json: {
      ...data,
      _retry_count: currentAttempt,
      _max_retries_exceeded: true,
      _should_retry: false
    }
  }];
}
// Attempts remain: flag the item so the IF node routes it to the Wait node
return [{
  json: {
    ...data,
    _retry_count: currentAttempt,
    _should_retry: true
  }
}];
Connect the retry loop: Error detected → Code node (increment counter) → IF node (should_retry?) → Wait node (exponential backoff: attempt 1 = 5s, attempt 2 = 15s, attempt 3 = 45s) → back to HTTP Request node.
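Compute the Wait node delay with an exponential expression: {{ 5 * Math.pow(3, $json._retry_count - 1) }} with the wait unit set to seconds. This yields 5s, 15s, and 45s for attempts 1 to 3. (Squaring the retry count instead would give 5s, 20s, 45s, which is quadratic growth, not exponential.)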
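To sanity-check the delays before wiring up the Wait node, the schedule can be computed in plain JavaScript. This is a minimal sketch, assuming a 5-second base delay that triples on each attempt, which reproduces the 5s / 15s / 45s schedule described above. `backoffSeconds`, its parameters, and the 300-second cap are illustrative choices, not n8n built-ins.

```javascript
// Illustrative helper (not an n8n built-in): delay before a given retry attempt.
// Assumes a base delay of 5 seconds that triples with every attempt.
function backoffSeconds(attempt, baseSeconds = 5, factor = 3, maxSeconds = 300) {
  if (!Number.isInteger(attempt) || attempt < 1) {
    throw new RangeError("attempt must be a positive integer");
  }
  const delay = baseSeconds * Math.pow(factor, attempt - 1);
  // Cap the delay so a long outage cannot stall the workflow for hours
  return Math.min(delay, maxSeconds);
}

const schedule = [1, 2, 3].map((n) => backoffSeconds(n));
console.log(schedule); // [ 5, 15, 45 ]
```

The cap matters only if you raise the retry limit; with three attempts the delays never reach it.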
Production workflows need structured logs so you can investigate any failure after the fact. Build a logging sub-workflow that all your main workflows call.
Create a MySQL table for logs:
CREATE TABLE workflow_logs (
  id BIGINT AUTO_INCREMENT PRIMARY KEY,
  workflow_name VARCHAR(255),
  execution_id VARCHAR(255),
  status ENUM('success', 'warning', 'error', 'retry'),
  node_name VARCHAR(255),
  message TEXT,
  payload JSON,
  error_details JSON,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  INDEX idx_workflow_status (workflow_name, status),
  INDEX idx_created_at (created_at)
);
Create a reusable "Log" sub-workflow triggered via the "Execute Workflow" node. Every time you need to log something, call this sub-workflow with the log data. This centralizes logging across all your workflows.
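One way to keep calls to the Log sub-workflow consistent is to shape the record in a Code node before handing it over. The sketch below is an assumption about how that could look: `buildLogEntry` is a hypothetical helper, its output keys simply mirror the columns of the table above, and the workflow and node names in the usage example are made up.

```javascript
// Hypothetical helper: builds one row for the workflow_logs table.
// Output keys mirror the CREATE TABLE columns; everything else is illustrative.
const ALLOWED_STATUSES = ["success", "warning", "error", "retry"];

function buildLogEntry({ workflowName, executionId, status, nodeName, message, payload, error }) {
  if (!ALLOWED_STATUSES.includes(status)) {
    throw new Error(`Unknown status: ${status}`);
  }
  return {
    workflow_name: workflowName,
    execution_id: String(executionId),
    status,
    node_name: nodeName ?? null,
    message: message ?? "",
    // JSON columns are sent as strings; null when there is nothing to record
    payload: payload === undefined ? null : JSON.stringify(payload),
    error_details: error ? JSON.stringify({ message: error.message, stack: error.stack }) : null,
  };
}

// Usage with made-up values
const entry = buildLogEntry({
  workflowName: "Weekly Digest",
  executionId: 231,
  status: "retry",
  nodeName: "HTTP: POST → SendGrid",
  message: "Rate limited, retrying",
  payload: { attempt: 2 },
});
```

Rejecting unknown statuses in the helper surfaces typos immediately, instead of leaving them to fail at the ENUM column during the insert.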
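n8n's Error Trigger node fires when a workflow that has this workflow selected as its error workflow fails. Create a dedicated "Error Notifier" workflow that starts with an Error Trigger node, then select it under "Error Workflow" in the settings of every workflow you want covered. The trigger fires for production (automatic) executions, not for manual test runs.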
In the Error Notifier, extract the failure information and route alerts to the right channel:
// Error data available from Error Trigger:
$json.workflow.name → which workflow failed
$json.execution.id → execution ID for replaying
$json.execution.error.message → error message
$json.execution.lastNodeExecuted → which node failed
$json.execution.startedAt → when it started
Build routing logic: critical workflow failures → PagerDuty + Slack + Email. Non-critical failures → Slack only. Log all failures to MySQL regardless of severity.
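The routing rule can be expressed as a small function in a Code node placed right after the Error Trigger. This is a sketch only: the `CRITICAL_WORKFLOWS` list, the workflow names in it, and the channel identifiers are placeholders to adapt, and the function reads nothing but `workflow.name` from the trigger data.

```javascript
// Placeholder list: names of workflows whose failure should page someone.
const CRITICAL_WORKFLOWS = new Set(["Payment Sync", "Order Fulfilment"]);

// Returns the alert channels for one failure. Every failure is also logged to MySQL.
function routeFailure(errorData) {
  const name = errorData?.workflow?.name ?? "unknown workflow";
  const critical = CRITICAL_WORKFLOWS.has(name);
  return {
    workflow: name,
    severity: critical ? "critical" : "non-critical",
    channels: critical ? ["pagerduty", "slack", "email"] : ["slack"],
    logToMysql: true,
  };
}

const example = routeFailure({ workflow: { name: "Payment Sync" } });
console.log(example.channels.join(", ")); // pagerduty, slack, email
```

A Switch node keyed on `severity` can then fan out to the individual notification nodes.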
Include the n8n execution URL in your alert so your team can click directly to the failed execution and replay it: https://your-n8n.domain/execution/{{ $json.execution.id }} (the Error Trigger also supplies a ready-made link in {{ $json.execution.url }}).
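A short sketch of assembling that alert text follows. `buildAlertText` and its `baseUrl` argument are illustrative names; the URL pattern is the one shown above, and the failure details are passed in as plain arguments so the function does not depend on any particular trigger payload. The values in the usage example are made up.

```javascript
// Illustrative helper: formats a failure alert with a clickable execution link.
// `baseUrl` is your own n8n address; the /execution/<id> pattern follows the text above.
function buildAlertText({ baseUrl, workflowName, executionId, errorMessage, nodeName }) {
  const root = baseUrl.replace(/\/+$/, ""); // drop trailing slashes to avoid "//execution"
  const link = `${root}/execution/${encodeURIComponent(executionId)}`;
  return [
    `Workflow failed: ${workflowName}`,
    `Node: ${nodeName ?? "unknown"}`,
    `Error: ${errorMessage ?? "no message"}`,
    `Execution: ${link}`,
  ].join("\n");
}

// Usage with made-up values
const alertText = buildAlertText({
  baseUrl: "https://your-n8n.domain/",
  workflowName: "Weekly Digest",
  executionId: "231",
  errorMessage: "429 Too Many Requests",
  nodeName: "HTTP: POST → SendGrid",
});
```

Putting the link on its own final line keeps it clickable in Slack, email, and PagerDuty notes alike.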
Pin data for isolated debugging: Right-click any node in execution history → "Pin Data". Now you can modify downstream nodes and run them against identical data without triggering real webhooks or re-calling external APIs.
Use the "Debug Helper" Code node pattern: Add temporary Code nodes named "DEBUG: Check data here" that log the incoming items to the console and pass them through unchanged. During manual runs the output appears in the browser's developer console. Remove them before deploying to production.
// Debug Helper (remove before production)
console.log("=== DEBUG ===");
console.log(JSON.stringify($input.all().map((item) => item.json), null, 2));
console.log("=== END DEBUG ===");
return $input.all(); // pass every item through, not just the first
Name your nodes descriptively: "HTTP: POST → HubSpot Contact" is infinitely more useful than "HTTP Request3" when investigating a failure.
Use Execution Filters: In n8n's execution history, filter by status (failed) + workflow name + date range to quickly find and diagnose production issues.
A SaaS company sends weekly digest emails to 10,000 users via SendGrid. Random SendGrid rate limit errors used to silently fail, losing 15-20% of sends. With proper error handling, they now achieve 99.8% delivery.