# Node-RED update logging for Grafana

This guide adds structured update-event logging to your existing Node-RED + Telegraf + Prometheus + Grafana stack without introducing Loki.

## Goal

Track the latest update attempts from Node-RED and surface them in Grafana, including:

- when an update attempt started,
- target container/project,
- success/failure,
- optional failure reason,
- elapsed duration.

## 1) Add a reusable logger function in Node-RED

Create a **Function** node named `Build update log event` and use:

```javascript
// Timestamp of this log event (not of the update start).
const nowIso = new Date().toISOString();
// Set at update start (see section 2); falls back to "now", which gives a duration of 0.
const startedAt = msg.update_started_at || Date.now();
const durationMs = Math.max(0, Date.now() - startedAt);

const payload = msg.payload || {};
const labels = payload.labels || {};
const status = (msg.update_status || payload.status || "unknown").toString().toLowerCase();

// Numeric 0/1 flags so the outcome can be exported as metric fields.
const success = status === "success" ? 1 : 0;
const failed = status === "failed" ? 1 : 0;

const event = {
  ts: nowIso,
  flow: "docker-updates",
  event: msg.update_event || "attempt",
  container: msg.container || labels.container || "unknown",
  project: labels.com_docker_compose_project || msg.project || "unknown",
  host: msg.host || "unknown",
  status,
  success,
  failed,
  duration_ms: durationMs,
  // Non-numeric or missing exit codes are recorded as 0.
  code: Number.isFinite(Number(payload.code)) ? Number(payload.code) : 0,
  // Truncate long error text to keep each log line small.
  error: (msg.update_error || payload.error || "").toString().slice(0, 300)
};

// One JSON line per event for file output.
msg.payload = JSON.stringify(event);
return msg;
```

### Wiring recommendation

Use the same logger function in these branches:

- before a pull/update command (`update_status=started`, `update_event=attempt`),
- success path (`update_status=success`, `update_event=completed`),
- failure path (`update_status=failed`, `update_event=completed`, and include `msg.update_error`).
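To see what each branch writes, the logger body can be exercised outside Node-RED. The following is a minimal sketch, not part of the flow itself: `buildUpdateLogEvent` is a hypothetical wrapper name, the clock is passed in so the output is repeatable, and the container and project names are made up.

```javascript
// Standalone copy of the Function node logic, wrapped so it runs under plain Node.js.
// `buildUpdateLogEvent` is a hypothetical name used only for this sketch.
function buildUpdateLogEvent(msg, now = Date.now()) {
  const payload = msg.payload || {};
  const labels = payload.labels || {};
  const status = (msg.update_status || payload.status || "unknown").toString().toLowerCase();
  const startedAt = msg.update_started_at || now;
  return JSON.stringify({
    ts: new Date(now).toISOString(),
    flow: "docker-updates",
    event: msg.update_event || "attempt",
    container: msg.container || labels.container || "unknown",
    project: labels.com_docker_compose_project || msg.project || "unknown",
    host: msg.host || "unknown",
    status,
    success: status === "success" ? 1 : 0,
    failed: status === "failed" ? 1 : 0,
    duration_ms: Math.max(0, now - startedAt),
    code: Number.isFinite(Number(payload.code)) ? Number(payload.code) : 0,
    error: (msg.update_error || payload.error || "").toString().slice(0, 300)
  });
}

// Hypothetical messages for the three branches (names and timings are invented).
const t0 = Date.UTC(2024, 0, 1, 12, 0, 0);
const base = { container: "example-app", project: "example-stack", update_started_at: t0 };
const lines = [
  buildUpdateLogEvent({ ...base, update_status: "started", update_event: "attempt" }, t0),
  buildUpdateLogEvent(
    { ...base, update_status: "success", update_event: "completed", payload: { code: 0 } },
    t0 + 4200
  ),
  buildUpdateLogEvent(
    { ...base, update_status: "failed", update_event: "completed",
      update_error: "pull failed", payload: { code: 1 } },
    t0 + 900
  )
];

// Each entry is one NDJSON line, as the File node would append it.
lines.forEach((line) => console.log(line));
```

Passing the clock in as a parameter is only a convenience for this sketch; inside the Function node, `Date.now()` is read directly.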
Then route each branch into a **File** node configured as:

- Filename: `/data/update-events.ndjson`
- Action: append to file
- Add newline: enabled

## 2) Make update state explicit in existing update flow

In your current update flow (already present in `flows.json`), add or adjust **Change** nodes around your shell/docker nodes. `$millis()` is a JSONata function, so set that rule's value type to a JSONata expression.

- At update start:
  - `msg.update_started_at = $millis()`
  - `msg.update_status = "started"`
  - `msg.update_event = "attempt"`
- At success:
  - `msg.update_status = "success"`
  - `msg.update_event = "completed"`
- At failure:
  - `msg.update_status = "failed"`
  - `msg.update_event = "completed"`
  - `msg.update_error = msg.payload.stderr` (or equivalent error field)

## 3) Let Telegraf ingest Node-RED event logs

Append this to `monitoring/telegraf/telegraf.conf`:

```toml
[[inputs.tail]]
  files = ["/var/log/node-red/update-events.ndjson"]
  from_beginning = false
  name_override = "node_red_update_event"
  data_format = "json_v2"

  [[inputs.tail.json_v2]]
    measurement_name = "node_red_update_event"

    [[inputs.tail.json_v2.tag]]
      path = "flow"
    [[inputs.tail.json_v2.tag]]
      path = "event"
    [[inputs.tail.json_v2.tag]]
      path = "container"
    [[inputs.tail.json_v2.tag]]
      path = "project"
    [[inputs.tail.json_v2.tag]]
      path = "host"
    [[inputs.tail.json_v2.tag]]
      path = "status"

    [[inputs.tail.json_v2.field]]
      path = "success"
      type = "int"
    [[inputs.tail.json_v2.field]]
      path = "failed"
      type = "int"
    [[inputs.tail.json_v2.field]]
      path = "duration_ms"
      type = "int"
    [[inputs.tail.json_v2.field]]
      path = "code"
      type = "int"
```

The `ts` and `error` keys are not listed here, so they stay in the log file only and are not exported as metrics.

Then mount the Node-RED data directory into Telegraf (read-only) in `monitoring/prometheus/docker-compose.yml` under `telegraf.volumes`:

```yaml
- ${PROJECT_ROOT}/monitoring/node-red/data:/var/log/node-red:ro
```

## 4) Prometheus scrape (already in place)

No Prometheus scrape change is required as long as it already scrapes Telegraf (`telegraf:9273`).
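To get a feel for what should then appear on the Telegraf endpoint, the sketch below turns one log line into Prometheus-style sample lines. It is an approximation under one assumption: that the metric name is the measurement name plus `_` plus the field name, with tags exported as labels. The helper name `toExpositionLines` and the sample values are hypothetical, and the real endpoint may carry additional labels that Telegraf adds itself.

```javascript
// Tag and field names copied from the json_v2 configuration in section 3.
const TAGS = ["flow", "event", "container", "project", "host", "status"];
const FIELDS = ["success", "failed", "duration_ms", "code"];

// Approximate the samples one event produces:
// metric name = measurement + "_" + field, tags become labels.
// `toExpositionLines` is a hypothetical helper for illustration only.
function toExpositionLines(ndjsonLine, measurement = "node_red_update_event") {
  const event = JSON.parse(ndjsonLine);
  const labels = TAGS
    .filter((tag) => event[tag] !== undefined)
    .sort()
    .map((tag) => {
      // Escape backslashes and double quotes inside label values.
      const value = String(event[tag]).replace(/\\/g, "\\\\").replace(/"/g, '\\"');
      return `${tag}="${value}"`;
    })
    .join(",");
  return FIELDS
    .filter((field) => Number.isFinite(event[field]))
    .map((field) => `${measurement}_${field}{${labels}} ${event[field]}`);
}

// Hypothetical failed-update event, shaped like the logger output from section 1.
const sampleLine = JSON.stringify({
  ts: "2024-01-01T12:00:00.900Z",
  flow: "docker-updates",
  event: "completed",
  container: "example-app",
  project: "example-stack",
  host: "unknown",
  status: "failed",
  success: 0,
  failed: 1,
  duration_ms: 900,
  code: 1,
  error: "pull failed"
});

const expositionLines = toExpositionLines(sampleLine);
console.log(expositionLines.join("\n"));
```

Because `status` and `event` are tags, every distinct status/event combination becomes its own series, which is worth keeping in mind when writing queries against these metrics.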
## 5) Grafana queries to start with

Use your Prometheus data source and try:

- Latest success/failure by container:
  - `last_over_time(node_red_update_event_success[24h])`
  - `last_over_time(node_red_update_event_failed[24h])`
- Failed updates in the last 24h:
  - `sum by (container, project) (increase(node_red_update_event_failed[24h]))`
- Average duration of completed updates in the last 24h (filtered to `event="completed"` so that `attempt` events, whose duration is near zero, do not skew the average):
  - `avg by (container, project) (avg_over_time(node_red_update_event_duration_ms{event="completed"}[24h]))`

Note that `status` is exported as a label, so each `node_red_update_event_failed` series holds a constant value (1 for `status="failed"`, 0 otherwise), and `increase()` over a constant series reports 0. If the failed-updates query stays at 0 despite known failures, `max_over_time(node_red_update_event_failed[24h])` at least shows whether a failure was recorded in the window.

Recommended panels:

- **Table**: container, project, status (last value), duration_ms (last value)
- **Time series**: failed count over time
- **Stat**: total failed updates in last 24h

## 6) Validation checklist

1. Trigger a known update path (including one failure if possible).
2. Check the Node-RED log file:
   - `tail -n 20 monitoring/node-red/data/update-events.ndjson`
3. Check the Telegraf metrics endpoint (`telegraf:9273`) for `node_red_update_event_` metrics.
4. Confirm Grafana panel values match the latest Node-RED run.

## Optional next step

If you want searchable raw log text and a richer log UX, add Loki + Promtail later. Keep this structured metrics path for high-signal alerting even after adding logs.
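For step 2 of the validation checklist, a quick structural check of the log file can catch malformed lines before they reach Telegraf. This is a minimal sketch: `findProblems` is a hypothetical helper, the required keys are taken from the logger in section 1, and the sample text is invented.

```javascript
// Keys the logger in section 1 writes on every event.
const REQUIRED_KEYS = ["ts", "flow", "event", "container", "project", "host",
  "status", "success", "failed", "duration_ms", "code", "error"];

// Return human-readable problems found in NDJSON text; an empty list means none were found.
// `findProblems` is a hypothetical helper for illustration only.
function findProblems(ndjsonText) {
  const problems = [];
  ndjsonText.split("\n").forEach((line, index) => {
    if (line.trim() === "") return; // skip blank lines such as the trailing newline
    let event;
    try {
      event = JSON.parse(line);
    } catch (err) {
      problems.push(`line ${index + 1}: not valid JSON`);
      return;
    }
    if (event === null || typeof event !== "object" || Array.isArray(event)) {
      problems.push(`line ${index + 1}: not a JSON object`);
      return;
    }
    for (const key of REQUIRED_KEYS) {
      if (!(key in event)) problems.push(`line ${index + 1}: missing "${key}"`);
    }
    if (event.success === 1 && event.failed === 1) {
      problems.push(`line ${index + 1}: success and failed are both 1`);
    }
  });
  return problems;
}

// Invented sample: one complete event, one broken line, one incomplete event.
const sampleText = [
  JSON.stringify({
    ts: "2024-01-01T12:00:04.200Z", flow: "docker-updates", event: "completed",
    container: "example-app", project: "example-stack", host: "unknown",
    status: "success", success: 1, failed: 0, duration_ms: 4200, code: 0, error: ""
  }),
  "not json",
  JSON.stringify({ ts: "2024-01-01T12:05:00.000Z", status: "failed" }),
  ""
].join("\n");

const problems = findProblems(sampleText);
problems.forEach((problem) => console.log(problem));
```

To run it against the real file, the text could be read with `require("fs").readFileSync("monitoring/node-red/data/update-events.ndjson", "utf8")` and passed to `findProblems`.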