Production Jekyll deployments require sophisticated error handling and monitoring to ensure reliability and quick issue resolution. By combining Ruby's exception handling capabilities with Cloudflare's monitoring tools and GitHub Actions' workflow tracking, you can build a robust observability system. This guide explores advanced error handling patterns, distributed tracing, alerting systems, and performance monitoring specifically tailored for Jekyll deployments across the GitHub-Cloudflare pipeline.
A comprehensive error handling architecture spans the entire deployment pipeline from local development to production edge delivery. The system must capture, categorize, and handle errors at each stage while maintaining context for debugging.
The architecture implements a layered approach with error handling at the build layer (Ruby/Jekyll), deployment layer (GitHub Actions), and runtime layer (Cloudflare Workers/Pages). Each layer captures errors with appropriate context and forwards them to a centralized error aggregation system. The system supports error classification, automatic recovery attempts, and context preservation for post-mortem analysis.
# Error Handling Architecture:
# 1. Build Layer Errors:
# - Jekyll build failures (template errors, data validation)
# - Ruby gem dependency issues
# - Asset compilation failures
# - Content validation errors
#
# 2. Deployment Layer Errors:
# - GitHub Actions workflow failures
# - Cloudflare Pages deployment failures
# - DNS configuration errors
# - Environment variable issues
#
# 3. Runtime Layer Errors:
# - 4xx/5xx errors from Cloudflare edge
# - Worker runtime exceptions
# - API integration failures
# - Cache invalidation errors
#
# 4. Monitoring Layer:
# - Error aggregation and deduplication
# - Alert routing and escalation
# - Performance anomaly detection
# - Automated recovery procedures
# Error Classification:
# - Fatal: Requires immediate human intervention
# - Recoverable: Automatic recovery can be attempted
# - Transient: Temporary issues that may resolve themselves
# - Warning: Non-critical issues for investigation
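This classification can be expressed as a small lookup that downstream tooling consults when deciding whether to retry, alert, or abort. The module below is an illustrative sketch of that mapping and is not part of the handler code that follows.
# lib/error_classification.rb (illustrative sketch of the classification above)
module ErrorClassification
  SEVERITY_ACTIONS = {
    fatal:       { retry: false, alert: true,  abort: true  },
    recoverable: { retry: true,  alert: true,  abort: false },
    transient:   { retry: true,  alert: false, abort: false },
    warning:     { retry: false, alert: false, abort: false }
  }.freeze

  # Unknown severities fall back to the most conservative handling
  def self.action_for(severity)
    SEVERITY_ACTIONS.fetch(severity.to_sym) { SEVERITY_ACTIONS[:fatal] }
  end
end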
Ruby's exception handling can be extended for Jekyll deployments with automatic recovery, error context preservation, and retry logic.
# lib/deployment_error_handler.rb
require 'time'
module DeploymentErrorHandler
class Error < StandardError
attr_reader :context, :severity
# recovery_attempts is incremented by the Handler, so it needs a writer
attr_accessor :recovery_attempts
def initialize(message, context = {}, severity = :error)
super(message)
@context = context
@severity = severity
@recovery_attempts = 0
end
def to_h
{
message: message,
backtrace: backtrace,
context: @context,
severity: @severity.to_s,
timestamp: Time.now.utc.iso8601,
recovery_attempts: @recovery_attempts
}
end
end
class BuildError < Error
def initialize(message, file: nil, line: nil, template: nil)
super(message, {
file: file,
line: line,
template: template,
stage: 'build'
}, :critical)
end
end
class DeploymentError < Error
def initialize(message, deployment_id: nil, stage: nil)
super(message, {
deployment_id: deployment_id,
stage: stage || 'deployment',
environment: ENV['JEKYLL_ENV'] || 'production'
}, :critical)
end
end
# Error handler with recovery logic
class Handler
def initialize(config = {})
@config = config
@error_store = ErrorStore.new(config[:error_store] || {})
@recovery_strategies = load_recovery_strategies
@notifiers = load_notifiers
end
def handle(error, context = {})
# Add additional context
error.context.merge!(context)
# Store error for analysis
@error_store.record(error)
# Attempt recovery for recoverable errors
if recoverable?(error)
attempt_recovery(error)
end
# Notify based on severity
notify(error) if should_notify?(error)
# Re-raise fatal errors
raise error if fatal?(error)
end
def attempt_recovery(error)
error.recovery_attempts += 1
@recovery_strategies.each do |strategy|
if strategy.applies_to?(error)
begin
strategy.recover(error)
log_recovery_success(error, strategy)
return true
rescue => recovery_error
log_recovery_failure(error, strategy, recovery_error)
end
end
end
false
end
def with_error_handling(context = {}, &block)
begin
block.call
rescue Error => e
handle(e, context)
raise e
rescue => e
# Convert generic errors to typed errors
typed_error = classify_error(e, context)
handle(typed_error, context)
raise typed_error
end
end
end
# Recovery strategies for common errors
class RecoveryStrategy
def applies_to?(error)
false
end
def recover(error)
raise NotImplementedError
end
end
class GemInstallationRecovery < RecoveryStrategy
def applies_to?(error)
error.is_a?(BuildError) &&
  (error.message.include?('Gem::LoadError') ||
   error.message.include?('bundle install'))
end
def recover(error)
# Attempt to clear bundle cache and retry
system('bundle clean --force')
system('bundle install')
# Verify recovery
raise 'Recovery failed' unless system('bundle check')
end
end
class CloudflareDeploymentRecovery < RecoveryStrategy
def applies_to?(error)
error.is_a?(DeploymentError) &&
error.message.include?('Cloudflare')
end
def recover(error)
# Extract deployment ID from error context
deployment_id = error.context[:deployment_id]
if deployment_id
# Attempt to retry deployment
client = Cloudflare::Client.new(ENV['CLOUDFLARE_API_TOKEN'])
client.retry_deployment(deployment_id)
else
# Trigger new deployment
trigger_new_deployment
end
end
end
# Error store with aggregation
class ErrorStore
def initialize(store_config)
@store = case store_config[:type]
when :redis then Redis.new(url: store_config[:url])
when :file then FileStore.new(store_config[:path])
when :cloudflare then CloudflareStore.new(store_config[:token])
else MemoryStore.new
end
@aggregator = ErrorAggregator.new
end
def record(error)
error_data = error.to_h
# Aggregate similar errors
fingerprint = @aggregator.fingerprint(error)
if @store.exists?(fingerprint)
# Update existing error count
existing = @store.get(fingerprint)
existing[:count] += 1
existing[:last_occurrence] = Time.now.utc.iso8601
@store.set(fingerprint, existing)
else
# Store new error
error_data[:fingerprint] = fingerprint
error_data[:count] = 1
error_data[:first_occurrence] = Time.now.utc.iso8601
@store.set(fingerprint, error_data)
end
end
end
# Jekyll plugin for error handling
class JekyllErrorHandler
def initialize(site)
@site = site
@handler = DeploymentErrorHandler::Handler.new(
site.config['error_handling'] || {}
)
end
def handle_build_errors
@handler.with_error_handling(stage: 'jekyll_build') do
yield
end
end
def validate_site
@handler.with_error_handling(stage: 'site_validation') do
validate_configuration
validate_content
validate_urls
end
end
end
end
# Integrate with Jekyll build process
Jekyll::Hooks.register :site, :pre_render do |site|
error_handler = DeploymentErrorHandler::JekyllErrorHandler.new(site)
error_handler.validate_site
end
Jekyll::Hooks.register :site, :post_write do |site|
error_handler = DeploymentErrorHandler::JekyllErrorHandler.new(site)
error_handler.handle_build_errors do
# Post-build validation
validate_build_output(site)
end
end
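The Handler above calls recoverable?, fatal?, should_notify?, and classify_error, which are elided for brevity. A minimal sketch of those predicates, reopening the class under the same assumptions (the Liquid and Jekyll exception classes are only examples of what a classifier might map):
# Elided Handler predicates (one possible implementation)
module DeploymentErrorHandler
  class Handler
    def fatal?(error)
      %i[fatal critical].include?(error.severity)
    end

    # Recoverable if a strategy claims the error and retries remain
    def recoverable?(error)
      error.recovery_attempts < (@config[:max_recovery_attempts] || 3) &&
        @recovery_strategies.any? { |strategy| strategy.applies_to?(error) }
    end

    def should_notify?(error)
      error.severity != :warning || @config[:notify_on_warnings]
    end

    # Wrap untyped exceptions in the module's own error classes
    def classify_error(error, context)
      case error
      when Liquid::Error, Jekyll::Errors::FatalException
        BuildError.new(error.message, file: context[:file], line: context[:line])
      else
        Error.new(error.message, context.merge(original_class: error.class.name))
      end
    end
  end
end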
Cloudflare provides comprehensive analytics and error tracking through its dashboard and API. Advanced monitoring integrates these capabilities with custom error tracking for Jekyll deployments.
# lib/cloudflare_monitoring.rb
require 'time'
module CloudflareMonitoring
class AnalyticsCollector
def initialize(api_token, zone_id)
@client = Cloudflare::Client.new(api_token)
@zone_id = zone_id
@cache = {}
@last_fetch = nil
end
def fetch_errors(time_range = 'last_24_hours')
# Fetch error analytics from Cloudflare
data = @client.analytics(
@zone_id,
metrics: ['requests', 'status_4xx', 'status_5xx', 'status_403', 'status_404'],
dimensions: ['clientCountry', 'path', 'status'],
time_range: time_range
)
process_error_data(data)
end
def fetch_performance(time_range = 'last_hour')
# Fetch performance metrics
data = @client.analytics(
@zone_id,
metrics: ['pageViews', 'bandwidth', 'visits', 'requests'],
dimensions: ['path', 'referer'],
time_range: time_range,
granularity: 'hour'
)
process_performance_data(data)
end
def detect_anomalies
# Detect anomalies in traffic patterns
current = fetch_performance('last_hour')
historical = fetch_historical_baseline
anomalies = []
current.each do |metric, value|
baseline = historical[metric]
if baseline && anomaly_detected?(value, baseline)
anomalies << {
metric: metric,
current: value,
baseline: baseline,
deviation: calculate_deviation(value, baseline),
timestamp: Time.now.utc.iso8601
}
end
end
anomalies
end
private
def process_error_data(data)
errors = []
data['results'].each do |result|
if result['status'].to_i >= 400
errors << {
status: result['status'],
path: result['path'],
count: result['requests'],
country: result['clientCountry'],
timestamp: Time.now.utc.iso8601
}
end
end
errors.sort_by { |e| -e[:count] }
end
def fetch_historical_baseline
# Fetch historical data for comparison
@cache[:historical_baseline] ||= begin
data = @client.analytics(
@zone_id,
metrics: ['requests', 'bandwidth', 'visits'],
time_range: 'last_30_days',
granularity: 'day'
)
calculate_baseline(data)
end
end
def calculate_baseline(data)
# Calculate average and standard deviation
metrics = Hash.new { |h, k| h[k] = [] }
data['results'].each do |result|
metrics['requests'] << result['requests']
metrics['bandwidth'] << result['bandwidth']
metrics['visits'] << result['visits']
end
baseline = {}
metrics.each do |metric, values|
baseline[metric] = {
average: values.sum / values.size.to_f,
std_dev: standard_deviation(values),
min: values.min,
max: values.max
}
end
baseline
end
end
class ErrorTracker
def initialize(api_token, account_id)
@client = Cloudflare::Client.new(api_token)
@account_id = account_id
end
def track_error(error_data, severity = :error)
# Send error to Cloudflare Logs
log_entry = {
message: error_data[:message],
severity: severity,
timestamp: error_data[:timestamp] || Time.now.utc.iso8601,
context: error_data[:context] || {},
environment: ENV['JEKYLL_ENV'] || 'production'
}
@client.send_logs(@account_id, 'jekyll-errors', [log_entry])
end
def get_error_summary(time_range = 'last_24_hours')
# Fetch error summary from logs
query = '| filter severity in ("error", "critical")
| summarize count() by bin(timestamp, 1h), severity'
@client.query_logs(@account_id, query, time_range)
end
def create_alert_policy(conditions, notifications = [])
# Create alert policy for specific error conditions
policy = {
name: "Jekyll Deployment Alerts",
enabled: true,
alert_type: "stream",
conditions: conditions,
filters: {
source: "worker_logs",
service: "jekyll-deployment"
},
notifications: notifications
}
@client.create_alert_policy(@account_id, policy)
end
end
end
# Worker for error tracking
// workers/error-tracker.js
export default {
async fetch(request, env, ctx) {
const url = new URL(request.url)
if (url.pathname === '/api/errors' && request.method === 'POST') {
return handleErrorReport(request, env, ctx)
}
if (url.pathname === '/api/errors/summary') {
return getErrorSummary(env)
}
return new Response('Not found', { status: 404 })
}
}
async function handleErrorReport(request, env, ctx) {
const errorData = await request.json()
// Validate error data
if (!errorData.message || !errorData.timestamp) {
return new Response('Invalid error data', { status: 400 })
}
// Store error in KV
const errorId = generateErrorId()
await env.ERRORS_KV.put(
`error:${errorId}`,
JSON.stringify({
...errorData,
id: errorId,
received: new Date().toISOString()
}),
{ expirationTtl: 604800 } // 7 days
)
// Update error aggregation
await updateErrorAggregation(errorData, env)
// Trigger alerts if needed
if (errorData.severity === 'critical') {
await triggerCriticalAlert(errorData, env, ctx)
}
return new Response(JSON.stringify({ id: errorId }), {
headers: { 'Content-Type': 'application/json' }
})
}
async function updateErrorAggregation(errorData, env) {
const hour = Math.floor(Date.now() / 3600000) * 3600000
const key = `aggregate:${hour}:${errorData.type || 'unknown'}`
const current = await env.ERRORS_KV.get(key, { type: 'json' }) || {
count: 0,
first_seen: errorData.timestamp,
last_seen: errorData.timestamp
}
current.count += 1
current.last_seen = errorData.timestamp
await env.ERRORS_KV.put(key, JSON.stringify(current), {
expirationTtl: 172800 // 48 hours
})
}
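On the Ruby side, a small reporter can push build and deployment errors into this worker over HTTP. The sketch below assumes the worker is routed at errors.yourdomain.com and that ERROR_REPORT_TOKEN holds an optional bearer token; adjust both to your setup.
# lib/error_reporter.rb (sketch: posts errors to the error-tracker worker above)
require 'net/http'
require 'uri'
require 'json'
require 'time'

class ErrorReporter
  def initialize(endpoint = 'https://errors.yourdomain.com/api/errors', token: ENV['ERROR_REPORT_TOKEN'])
    @uri = URI.parse(endpoint)
    @token = token
  end

  def report(error, severity: 'error', context: {})
    payload = {
      message: error.message,
      severity: severity,
      timestamp: Time.now.utc.iso8601,
      context: context.merge(backtrace: Array(error.backtrace).first(5))
    }
    http = Net::HTTP.new(@uri.host, @uri.port)
    http.use_ssl = @uri.scheme == 'https'
    request = Net::HTTP::Post.new(@uri.path)
    request['Content-Type'] = 'application/json'
    request['Authorization'] = "Bearer #{@token}" if @token
    request.body = payload.to_json
    http.request(request)
  end
end
A build script can then call ErrorReporter.new.report(e, severity: 'critical', context: { stage: 'jekyll_build' }) from a rescue block.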
GitHub Actions provides extensive workflow monitoring capabilities that can be enhanced with custom Ruby scripts for deployment tracking and alerting.
# .github/workflows/monitoring.yml
name: Deployment Monitoring
on:
workflow_run:
workflows: ["Deploy to Production"]
types:
- completed
- requested
schedule:
- cron: '*/5 * * * *' # Check every 5 minutes
jobs:
monitor-deployment:
runs-on: ubuntu-latest
steps:
- name: Check workflow status
id: check_status
run: |
ruby .github/scripts/check_deployment_status.rb
- name: Send alerts if needed
if: steps.check_status.outputs.status != 'success'
run: |
ruby .github/scripts/send_alert.rb \
--status "${{ steps.check_status.outputs.status }}" \
--workflow "${{ github.workflow }}" \
--run-id "${{ github.run_id }}"
- name: Update deployment dashboard
run: |
ruby .github/scripts/update_dashboard.rb \
--run-id "${{ github.run_id }}" \
--status "${{ steps.check_status.outputs.status }}" \
--duration "${{ steps.check_status.outputs.duration }}"
health-check:
runs-on: ubuntu-latest
steps:
- name: Run comprehensive health check
run: |
ruby .github/scripts/health_check.rb
- name: Report health status
if: always()
run: |
ruby .github/scripts/report_health.rb \
--exit-code $
# .github/scripts/check_deployment_status.rb
#!/usr/bin/env ruby
require 'octokit'
require 'json'
require 'time'
class DeploymentMonitor
def initialize(token, repository)
@client = Octokit::Client.new(access_token: token)
@repository = repository
end
def check_workflow_run(run_id)
run = @client.workflow_run(@repository, run_id)
{
status: run.status,
conclusion: run.conclusion,
duration: calculate_duration(run),
artifacts_url: run.artifacts_url,
jobs: fetch_jobs(run_id),
created_at: run.created_at,
updated_at: run.updated_at
}
end
def check_recent_deployments(limit = 5)
runs = @client.workflow_runs(
@repository,
workflow_file_name: 'deploy.yml',
per_page: limit
)
runs.workflow_runs.map do |run|
{
id: run.id,
status: run.status,
conclusion: run.conclusion,
created_at: run.created_at,
head_branch: run.head_branch,
head_sha: run.head_sha
}
end
end
def deployment_health_score
recent = check_recent_deployments(10)
successful = recent.count { |r| r[:conclusion] == 'success' }
total = recent.size
return 100 if total == 0
(successful.to_f / total * 100).round(2)
end
private
def calculate_duration(run)
if run.status == 'completed' && run.conclusion == 'success'
start_time = Time.parse(run.created_at.to_s)
end_time = Time.parse(run.updated_at.to_s)
(end_time - start_time).round(2)
else
nil
end
end
def fetch_jobs(run_id)
jobs = @client.workflow_run_jobs(@repository, run_id)
jobs.jobs.map do |job|
{
name: job.name,
status: job.status,
conclusion: job.conclusion,
started_at: job.started_at,
completed_at: job.completed_at,
steps: job.steps.map { |s| { name: s.name, conclusion: s.conclusion } }
}
end
end
end
if __FILE__ == $0
token = ENV['GITHUB_TOKEN']
repository = ENV['GITHUB_REPOSITORY']
run_id = ARGV[0] || ENV['GITHUB_RUN_ID']
monitor = DeploymentMonitor.new(token, repository)
if run_id
result = monitor.check_workflow_run(run_id)
# Write step outputs for GitHub Actions (read as steps.check_status.outputs.*)
File.open(ENV.fetch('GITHUB_OUTPUT', '/dev/stdout'), 'a') do |f|
  f.puts "status=#{result[:conclusion] || result[:status]}"
  f.puts "duration=#{result[:duration] || 0}"
end
# JSON output
File.write('deployment_status.json', JSON.pretty_generate(result))
else
# Check deployment health
score = monitor.deployment_health_score
puts "health_score=#{score}"
if score < 80
puts "Health check failed: #{score}% success rate"
exit 1
end
end
end
# .github/scripts/send_alert.rb
#!/usr/bin/env ruby
require 'net/http'
require 'uri'
require 'json'
require 'time'
class AlertSender
def initialize(config)
@config = config
@notifiers = build_notifiers
end
def send_alert(alert_data)
alert_data[:timestamp] = Time.now.utc.iso8601
@notifiers.each do |notifier|
begin
notifier.send(alert_data)
rescue => e
log("Failed to send alert via #{notifier.class}: #{e.message}")
end
end
# Store alert for audit
store_alert(alert_data)
end
private
def build_notifiers
notifiers = []
if @config[:slack_webhook]
notifiers << SlackNotifier.new(@config[:slack_webhook])
end
if @config[:discord_webhook]
notifiers << DiscordNotifier.new(@config[:discord_webhook])
end
if @config[:pagerduty_key]
notifiers << PagerDutyNotifier.new(@config[:pagerduty_key])
end
notifiers
end
def store_alert(alert_data)
# Store in Cloudflare KV via Worker
uri = URI.parse('https://alerts.yourdomain.com/api/alerts')
http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl = true
request = Net::HTTP::Post.new(uri.path)
request['Authorization'] = "Bearer #{@config[:alert_token]}"
request['Content-Type'] = 'application/json'
request.body = alert_data.to_json
http.request(request)
end
end
class SlackNotifier
def initialize(webhook_url)
@webhook_url = webhook_url
end
def send(alert_data)
payload = {
text: format_message(alert_data),
attachments: [
{
color: alert_color(alert_data[:severity]),
fields: format_fields(alert_data),
ts: Time.now.to_i
}
]
}
uri = URI.parse(@webhook_url)
http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl = true
request = Net::HTTP::Post.new(uri.path)
request['Content-Type'] = 'application/json'
request.body = payload.to_json
http.request(request)
end
private
def format_message(alert_data)
emoji = case alert_data[:severity]
when 'critical' then '🚨'
when 'error' then '❌'
when 'warning' then '⚠️'
else 'ℹ️'
end
"#{emoji} *#{alert_data[:title]}*\n#{alert_data[:message]}"
end
end
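The health-check job in the monitoring workflow above runs .github/scripts/health_check.rb, which is not shown earlier. A minimal sketch that probes a handful of production URLs (the hostnames are placeholders) might look like this.
# .github/scripts/health_check.rb (sketch)
#!/usr/bin/env ruby
require 'net/http'
require 'uri'

CHECKS = [
  'https://www.yourdomain.com/',
  'https://www.yourdomain.com/feed.xml',
  'https://www.yourdomain.com/sitemap.xml'
].freeze

# Keep only the URLs that fail to return a 2xx response
failures = CHECKS.reject do |url|
  response = Net::HTTP.get_response(URI.parse(url))
  response.is_a?(Net::HTTPSuccess)
rescue StandardError
  false
end

if failures.empty?
  puts 'All health checks passed'
else
  failures.each { |url| warn "Health check failed: #{url}" }
  exit 1
end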
Distributed tracing provides end-to-end visibility across the deployment pipeline, connecting errors and performance issues across different systems and services.
# lib/distributed_tracing.rb
require 'securerandom'
require 'time'

module DistributedTracing
class Trace
attr_reader :trace_id, :spans, :metadata
def initialize(trace_id = nil, metadata = {})
@trace_id = trace_id || generate_trace_id
@spans = []
@metadata = metadata
@start_time = Time.now.utc
end
def start_span(name, attributes = {})
span = Span.new(
name: name,
trace_id: @trace_id,
span_id: generate_span_id,
parent_span_id: current_span_id,
attributes: attributes,
start_time: Time.now.utc
)
@spans << span
span
end
def finish_span(span, status = :ok, error = nil)
span.finish(status, error)
end
def export
{
trace_id: @trace_id,
metadata: @metadata,
spans: @spans.map(&:to_h),
duration: (Time.now.utc - @start_time).round(3),
start_time: @start_time.iso8601,
end_time: Time.now.utc.iso8601
}
end
def send_to_collector(collector_url)
exporter = Exporter.new(collector_url)
exporter.export(self)
end
private
def generate_trace_id
SecureRandom.hex(16)
end
def generate_span_id
SecureRandom.hex(8)
end
def current_span_id
@spans.last&.span_id
end
end
class Span
attr_reader :name, :trace_id, :span_id, :parent_span_id, :attributes
attr_reader :start_time, :end_time, :status, :error
def initialize(name:, trace_id:, span_id:, parent_span_id:, attributes:, start_time:)
@name = name
@trace_id = trace_id
@span_id = span_id
@parent_span_id = parent_span_id
@attributes = attributes
@start_time = start_time
@events = []
end
def add_event(name, attributes = {})
@events << {
name: name,
attributes: attributes,
timestamp: Time.now.utc.iso8601
}
end
def finish(status = :ok, error = nil)
@end_time = Time.now.utc
@status = status
@error = error
@duration = (@end_time - @start_time).round(6)
end
def to_h
{
name: @name,
trace_id: @trace_id,
span_id: @span_id,
parent_span_id: @parent_span_id,
attributes: @attributes,
start_time: @start_time.iso8601,
end_time: @end_time&.iso8601,
duration: @duration,
status: @status,
error: @error&.message,
events: @events
}
end
end
# Jekyll build tracing
class JekyllTracer
def initialize(trace, site = nil)
  @trace = trace
  @site = site
  @current_span = nil
end
def trace_build(&block)
  @current_span = @trace.start_span('jekyll_build', {
    environment: ENV['JEKYLL_ENV'],
    site_source: @site&.source,
    site_dest: @site&.dest
  })
begin
result = block.call
@trace.finish_span(@current_span, :ok)
result
rescue => e
@current_span.add_event('build_error', { error: e.message })
@trace.finish_span(@current_span, :error, e)
raise e
end
end
def trace_generation(generator_name, &block)
span = @trace.start_span("generate_#{generator_name}", {
generator: generator_name
})
begin
result = block.call
@trace.finish_span(span, :ok)
result
rescue => e
span.add_event('generation_error', { error: e.message })
@trace.finish_span(span, :error, e)
raise e
end
end
end
# GitHub Actions workflow tracing
class WorkflowTracer
def initialize(trace_id, run_id)
@trace = Trace.new(trace_id, {
workflow_run_id: run_id,
repository: ENV['GITHUB_REPOSITORY'],
actor: ENV['GITHUB_ACTOR']
})
end
def trace_job(job_name, &block)
span = @trace.start_span("job_#{job_name}", {
job: job_name,
runner: ENV['RUNNER_NAME']
})
begin
result = block.call
@trace.finish_span(span, :ok)
result
rescue => e
span.add_event('job_failed', { error: e.message })
@trace.finish_span(span, :error, e)
raise e
end
end
end
# Cloudflare Pages deployment tracing
class DeploymentTracer
def initialize(trace_id, deployment_id)
@trace = Trace.new(trace_id, {
deployment_id: deployment_id,
project: ENV['CLOUDFLARE_PROJECT_NAME'],
environment: ENV['CLOUDFLARE_ENVIRONMENT']
})
end
def trace_stage(stage_name, &block)
span = @trace.start_span("deployment_#{stage_name}", {
stage: stage_name,
timestamp: Time.now.utc.iso8601
})
begin
result = block.call
@trace.finish_span(span, :ok)
result
rescue => e
span.add_event('stage_failed', {
error: e.message,
retry_attempt: @retry_count || 0
})
@trace.finish_span(span, :error, e)
raise e
end
end
end
end
# Integration with Jekyll
Jekyll::Hooks.register :site, :after_reset do |site|
trace_id = ENV['TRACE_ID'] || SecureRandom.hex(16)
tracer = DistributedTracing::JekyllTracer.new(
DistributedTracing::Trace.new(trace_id, {
site_config: site.config.keys,
jekyll_version: Jekyll::VERSION
}),
site
)
site.data['_tracer'] = tracer
end
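Trace#send_to_collector above delegates to an Exporter class that is never defined; a minimal sketch, assuming it simply POSTs the exported trace to the collector worker shown next:
# lib/distributed_tracing/exporter.rb (sketch)
require 'net/http'
require 'uri'
require 'json'

module DistributedTracing
  class Exporter
    def initialize(collector_url)
      @uri = URI.parse(collector_url)
    end

    # Serialize the trace and submit it to the collector's /api/traces endpoint
    def export(trace)
      http = Net::HTTP.new(@uri.host, @uri.port)
      http.use_ssl = @uri.scheme == 'https'
      request = Net::HTTP::Post.new(@uri.path)
      request['Content-Type'] = 'application/json'
      request.body = trace.export.to_json
      http.request(request)
    end
  end
end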
# Worker for trace collection
// workers/trace-collector.js
export default {
async fetch(request, env, ctx) {
const url = new URL(request.url)
if (url.pathname === '/api/traces' && request.method === 'POST') {
return handleTraceSubmission(request, env, ctx)
}
return new Response('Not found', { status: 404 })
}
}
async function handleTraceSubmission(request, env, ctx) {
const trace = await request.json()
// Validate trace
if (!trace.trace_id || !trace.spans) {
return new Response('Invalid trace data', { status: 400 })
}
// Store trace
await storeTrace(trace, env)
// Process for analytics
await processTraceAnalytics(trace, env, ctx)
return new Response(JSON.stringify({ received: true }))
}
async function storeTrace(trace, env) {
const traceKey = `trace:${trace.trace_id}`
// Store full trace
await env.TRACES_KV.put(traceKey, JSON.stringify(trace), {
metadata: {
start_time: trace.start_time,
duration: trace.duration,
span_count: trace.spans.length
}
})
// Index spans for querying
for (const span of trace.spans) {
const spanKey = `span:${trace.trace_id}:${span.span_id}`
await env.SPANS_KV.put(spanKey, JSON.stringify(span))
// Index by span name; include the trace and span IDs so entries don't overwrite each other
const indexKey = `index:span_name:${span.name}:${trace.trace_id}:${span.span_id}`
await env.SPANS_KV.put(indexKey, JSON.stringify({
trace_id: trace.trace_id,
span_id: span.span_id,
start_time: span.start_time
}))
}
}
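Putting the pieces together, a build script can wrap the Jekyll build in a trace and ship the result to the collector once the build finishes. A usage sketch, with a placeholder collector URL:
# Example: trace a full Jekyll build and submit the trace
trace = DistributedTracing::Trace.new(ENV['TRACE_ID'], { pipeline: 'jekyll-deploy' })
tracer = DistributedTracing::JekyllTracer.new(trace)

begin
  tracer.trace_build do
    system('bundle exec jekyll build') || raise('jekyll build failed')
  end
ensure
  # Ship the trace even when the build fails
  trace.send_to_collector('https://traces.yourdomain.com/api/traces')
end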
An intelligent alerting system categorizes issues, routes them appropriately, and provides context for quick resolution while avoiding alert fatigue.
# lib/alerting_system.rb
require 'securerandom'
require 'digest'
require 'time'

module AlertingSystem
class AlertManager
def initialize(config)
@config = config
@routing_rules = load_routing_rules
@escalation_policies = load_escalation_policies
@alert_history = AlertHistory.new
@deduplicator = AlertDeduplicator.new
end
def create_alert(alert_data)
# Deduplicate similar alerts
fingerprint = @deduplicator.fingerprint(alert_data)
if @deduplicator.recent_duplicate?(fingerprint)
log("Duplicate alert suppressed: #{fingerprint}")
return nil
end
# Create alert with context
alert = Alert.new(alert_data.merge(fingerprint: fingerprint))
# Determine routing
route = determine_route(alert)
# Apply escalation policy
escalation = determine_escalation(alert)
# Store alert
@alert_history.record(alert)
# Send notifications
send_notifications(alert, route, escalation)
alert
end
def resolve_alert(alert_id, resolution_data = {})
alert = @alert_history.find(alert_id)
if alert
alert.resolve(resolution_data)
@alert_history.update(alert)
# Send resolution notifications
send_resolution_notifications(alert)
end
end
private
def determine_route(alert)
@routing_rules.find do |rule|
rule.matches?(alert)
end || default_route
end
def determine_escalation(alert)
policy = @escalation_policies.find { |p| p.applies_to?(alert) }
policy || default_escalation_policy
end
def send_notifications(alert, route, escalation)
# Send to primary channels
route.channels.each do |channel|
send_to_channel(alert, channel)
end
# Schedule escalation if needed
if escalation.enabled?
schedule_escalation(alert, escalation)
end
end
def send_to_channel(alert, channel)
notifier = NotifierFactory.create(channel.type, channel.config)
notifier.send(alert.formatted_for(channel.format))
rescue => e
log("Failed to send to #{channel.type}: #{e.message}")
end
end
class Alert
attr_reader :id, :fingerprint, :severity, :status, :created_at, :resolved_at
attr_accessor :context, :assignee, :notes
def initialize(data)
@id = SecureRandom.uuid
@fingerprint = data[:fingerprint]
@title = data[:title]
@description = data[:description]
@severity = data[:severity] || :error
@status = :open
@context = data[:context] || {}
@created_at = Time.now.utc
@updated_at = @created_at
@resolved_at = nil
@assignee = nil
@notes = []
@notifications = []
end
def resolve(resolution_data = {})
@status = :resolved
@resolved_at = Time.now.utc
@resolution = resolution_data[:resolution] || 'manual'
@resolution_notes = resolution_data[:notes]
@updated_at = @resolved_at
add_note("Alert resolved: #{@resolution}")
end
def add_note(text, author = 'system')
@notes << {
text: text,
author: author,
timestamp: Time.now.utc.iso8601
}
@updated_at = Time.now.utc
end
def formatted_for(format = :slack)
case format
when :slack
format_for_slack
when :email
format_for_email
when :webhook
format_for_webhook
else
to_h
end
end
private
def format_for_slack
{
text: "*#{@title}*",
attachments: [
{
color: severity_color,
fields: [
{
title: "Description",
value: @description,
short: false
},
{
title: "Severity",
value: @severity.to_s.upcase,
short: true
},
{
title: "Status",
value: @status.to_s.upcase,
short: true
}
],
ts: @created_at.to_i
}
]
}
end
def severity_color
case @severity
when :critical then "#FF0000"
when :error then "#FF6B6B"
when :warning then "#FFA726"
when :info then "#42A5F5"
else "#78909C"
end
end
end
class AlertDeduplicator
def initialize(window_minutes = 5)
@window = window_minutes * 60
@recent_alerts = {}
end
def fingerprint(alert_data)
# Create fingerprint from alert characteristics
components = [
alert_data[:title],
alert_data[:severity],
alert_data.dig(:context, :source),
alert_data.dig(:context, :error_type)
].compact.map(&:to_s).join('|')
Digest::SHA256.hexdigest(components)
end
def recent_duplicate?(fingerprint)
  last_seen = @recent_alerts[fingerprint]
  if last_seen && (Time.now.utc - last_seen) < @window
    true
  else
    # Record (or refresh) the fingerprint so the suppression window restarts
    @recent_alerts[fingerprint] = Time.now.utc
    cleanup_old_alerts
    false
  end
end
private
def cleanup_old_alerts
cutoff = Time.now.utc - @window
@recent_alerts.delete_if { |_, timestamp| timestamp < cutoff }
end
end
# Integration with deployment errors
class DeploymentAlerting
def initialize(alert_manager)
@alert_manager = alert_manager
end
def handle_deployment_error(error, deployment_context = {})
alert_data = {
title: "Deployment Failed: #{deployment_context[:stage]}",
description: error.message,
severity: determine_severity(error),
context: {
source: 'deployment',
stage: deployment_context[:stage],
deployment_id: deployment_context[:id],
environment: deployment_context[:environment],
error_type: error.class.name,
backtrace: error.backtrace&.first(5)
}
}
@alert_manager.create_alert(alert_data)
end
def handle_build_error(error, build_context = {})
alert_data = {
title: "Build Failed: #{build_context[:component]}",
description: error.message,
severity: :error,
context: {
source: 'build',
component: build_context[:component],
file: build_context[:file],
line: build_context[:line],
jekyll_env: ENV['JEKYLL_ENV'],
error_type: error.class.name
}
}
@alert_manager.create_alert(alert_data)
end
private
def determine_severity(error)
case error
when DeploymentErrorHandler::BuildError
:critical
when DeploymentErrorHandler::DeploymentError
:error
when Cloudflare::APIError
error.message.include?('rate limit') ? :warning : :error
else
:error
end
end
end
end
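AlertManager above loads routing rules and escalation policies through helpers that are elided. The shape of a routing rule is implied by determine_route and send_notifications; a minimal sketch consistent with that usage (the attribute names are assumptions, not a prescribed API):
# Illustrative routing rule consumed by AlertManager#determine_route
module AlertingSystem
  class RoutingRule
    Channel = Struct.new(:type, :format, :config)

    attr_reader :channels

    def initialize(severities:, channels:, sources: nil)
      @severities = Array(severities).map(&:to_sym)
      @sources = sources && Array(sources).map(&:to_s)
      @channels = channels
    end

    # Match on severity and, optionally, on the alert's context source
    def matches?(alert)
      @severities.include?(alert.severity.to_sym) &&
        (@sources.nil? || @sources.include?(alert.context[:source].to_s))
    end
  end
end
Keeping rules as plain objects like this makes them easy to build from a YAML config and to unit-test in isolation.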
# Rake task for alert testing
namespace :alerts do
desc 'Test alerting system'
task :test do
require_relative 'lib/alerting_system'
alert_manager = AlertingSystem::AlertManager.new(
config_file: 'config/alerting.yml'
)
# Test critical alert
alert_manager.create_alert(
title: 'Test Critical Alert',
description: 'This is a test of the alerting system',
severity: :critical,
context: {
source: 'test',
test_id: '12345'
}
)
puts 'Test alert sent successfully'
end
end
This comprehensive error handling and monitoring system provides enterprise-grade observability for Jekyll deployments. By combining Ruby's error handling capabilities with Cloudflare's monitoring tools and GitHub Actions' workflow tracking, you can achieve rapid detection, diagnosis, and resolution of deployment issues while maintaining high reliability and performance.