Node-RED update logging for Grafana
This guide adds structured update-event logging to your existing Node-RED + Telegraf + Prometheus + Grafana stack without introducing Loki.
Goal
Track and surface (in Grafana) the latest update attempts from Node-RED, including:
- when an update attempt started,
- target container/project,
- success/failure,
- optional failure reason,
- elapsed duration.
1) Add a reusable logger function in Node-RED
Create a Function node named Build update log event and use:
// Event timestamp, plus elapsed time since the update started (0 if no start time was recorded).
const nowIso = new Date().toISOString();
const startedAt = msg.update_started_at || Date.now();
const durationMs = Math.max(0, Date.now() - startedAt);
// Tolerate a missing payload or missing labels.
const payload = msg.payload || {};
const labels = payload.labels || {};
// Normalize the status and derive 0/1 flags that can be exported as numeric fields.
const status = (msg.update_status || payload.status || "unknown").toString().toLowerCase();
const success = status === "success" ? 1 : 0;
const failed = status === "failed" ? 1 : 0;
msg.payload = {
    ts: nowIso,
    flow: "docker-updates",
    event: msg.update_event || "attempt",
    container: msg.container || labels.container || "unknown",
    project: labels.com_docker_compose_project || msg.project || "unknown",
    host: msg.host || "unknown",
    status,
    success,
    failed,
    duration_ms: durationMs,
    // Non-numeric or missing exit codes fall back to 0.
    code: Number.isFinite(Number(payload.code)) ? Number(payload.code) : 0,
    // Error text is capped at 300 characters to keep each log line short.
    error: (msg.update_error || payload.error || "").toString().slice(0, 300)
};
// one JSON line per event for file output
msg.payload = JSON.stringify(msg.payload);
return msg;
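To sanity-check the logic outside Node-RED, the same steps can be wrapped in a plain function and fed a hand-built message. This is a sketch only: `buildUpdateLogEvent` and the sample values (container name, project name, error text) are hypothetical, and the clock is passed in as a parameter so the result is reproducible.

```javascript
// Standalone sketch of the Function node body above, for testing outside Node-RED.
// buildUpdateLogEvent and the sample message are illustrative, not part of the flow.
function buildUpdateLogEvent(msg, nowMs = Date.now()) {
  const startedAt = msg.update_started_at || nowMs;
  const payload = msg.payload || {};
  const labels = payload.labels || {};
  const status = (msg.update_status || payload.status || "unknown").toString().toLowerCase();
  const event = {
    ts: new Date(nowMs).toISOString(),
    flow: "docker-updates",
    event: msg.update_event || "attempt",
    container: msg.container || labels.container || "unknown",
    project: labels.com_docker_compose_project || msg.project || "unknown",
    host: msg.host || "unknown",
    status,
    success: status === "success" ? 1 : 0,
    failed: status === "failed" ? 1 : 0,
    duration_ms: Math.max(0, nowMs - startedAt),
    code: Number.isFinite(Number(payload.code)) ? Number(payload.code) : 0,
    error: (msg.update_error || payload.error || "").toString().slice(0, 300)
  };
  // One JSON line per event, as in the Function node.
  return JSON.stringify(event);
}

// Hypothetical failed update that started 1500 ms before "now".
const sampleLine = buildUpdateLogEvent(
  {
    update_started_at: 1700000000000,
    update_status: "FAILED",
    update_event: "completed",
    update_error: "pull access denied",
    container: "example-app",
    payload: { code: 1, labels: { com_docker_compose_project: "example-stack" } }
  },
  1700000001500
);
console.log(sampleLine);
```

Because the status is lowercased before comparison, a message carrying `FAILED` still sets `failed` to 1, and fields that are not supplied (here `host`) fall back to `"unknown"`.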
Wiring recommendation
Use the same logger function in these branches:
- before a pull/update command (update_status=started, update_event=attempt),
- success path (update_status=success, update_event=completed),
- failure path (update_status=failed, update_event=completed, and include msg.update_error).
Then route each branch into a File node configured as:
- Filename: /data/update-events.ndjson
- Action: append to file
- Add newline: enabled
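If you want to produce test lines without running the flow (for example, to exercise the downstream pipeline), the File node settings above amount to appending one JSON string plus a newline per event. A minimal Node.js sketch, assuming a hypothetical helper name `appendEventLine` and a path you choose yourself:

```javascript
const fs = require("fs");

// Append one event as a single line, mirroring "append to file" plus "add newline".
// appendEventLine is a hypothetical helper, not something Node-RED provides.
function appendEventLine(filePath, jsonLine) {
  fs.appendFileSync(filePath, jsonLine + "\n", "utf8");
}
```

Each call adds exactly one line, so the file stays valid NDJSON as long as every `jsonLine` is a single-line JSON string, which is what `JSON.stringify` produces by default.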
2) Make update state explicit in existing update flow
In your current update flow (already present in flows.json), add/change Change nodes around your shell/docker nodes:
- At update start:
  - msg.update_started_at = $millis()
  - msg.update_status = "started"
  - msg.update_event = "attempt"
- At success:
  - msg.update_status = "success"
  - msg.update_event = "completed"
- At failure:
  - msg.update_status = "failed"
  - msg.update_event = "completed"
  - msg.update_error = msg.payload.stderr (or equivalent error field)
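If you would rather set these properties in Function nodes than in Change nodes, the three transitions can be sketched in JavaScript as follows. The helper names (`markStarted`, `markSucceeded`, `markFailed`) are hypothetical, and reading the error from `msg.payload.stderr` is an assumption about how your shell/docker node reports failures.

```javascript
// Hypothetical helpers mirroring the three Change-node configurations above.
function markStarted(msg, nowMs = Date.now()) {
  msg.update_started_at = nowMs; // same role as $millis() in a Change node
  msg.update_status = "started";
  msg.update_event = "attempt";
  return msg;
}

function markSucceeded(msg) {
  msg.update_status = "success";
  msg.update_event = "completed";
  return msg;
}

function markFailed(msg) {
  msg.update_status = "failed";
  msg.update_event = "completed";
  // Assumption: the failing node exposes stderr on msg.payload.stderr.
  // Replace this with the equivalent error field in your flow.
  msg.update_error = (msg.payload && msg.payload.stderr) || "";
  return msg;
}
```

Inside a Function node, the body would then be a single line such as `return markStarted(msg);`, with the helper pasted above it. Keeping `update_started_at` on the message is what lets the logger compute `duration_ms` on the success and failure paths.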
3) Let Telegraf ingest Node-RED event logs
Append this to monitoring/telegraf/telegraf.conf:
[[inputs.tail]]
  files = ["/var/log/node-red/update-events.ndjson"]
  from_beginning = false
  name_override = "node_red_update_event"
  data_format = "json_v2"
  [[inputs.tail.json_v2]]
    measurement_name = "node_red_update_event"
    [[inputs.tail.json_v2.tag]]
      path = "flow"
    [[inputs.tail.json_v2.tag]]
      path = "event"
    [[inputs.tail.json_v2.tag]]
      path = "container"
    [[inputs.tail.json_v2.tag]]
      path = "project"
    [[inputs.tail.json_v2.tag]]
      path = "host"
    [[inputs.tail.json_v2.tag]]
      path = "status"
    [[inputs.tail.json_v2.field]]
      path = "success"
      type = "int"
    [[inputs.tail.json_v2.field]]
      path = "failed"
      type = "int"
    [[inputs.tail.json_v2.field]]
      path = "duration_ms"
      type = "int"
    [[inputs.tail.json_v2.field]]
      path = "code"
      type = "int"
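Then mount the Node-RED data directory into the Telegraf container (read-only), so that the file Node-RED writes as /data/update-events.ndjson appears at the path Telegraf tails. In monitoring/prometheus/docker-compose.yml, add under telegraf.volumes: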
- ${PROJECT_ROOT}/monitoring/node-red/data:/var/log/node-red:ro
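To preview what this configuration should pull out of a log line, the tag/field split can be mirrored in a few lines of JavaScript. This only approximates the mapping so you can check your NDJSON output; it is not Telegraf's parser, and `splitTagsAndFields` and `prometheusMetricNames` are hypothetical helpers.

```javascript
// Keys copied from the json_v2 tag and field entries in the Telegraf config above.
const TAG_KEYS = ["flow", "event", "container", "project", "host", "status"];
const INT_FIELD_KEYS = ["success", "failed", "duration_ms", "code"];

// Split one NDJSON line into tags (strings) and integer fields.
// Keys not listed above (ts, error) are not mapped in this sketch.
function splitTagsAndFields(ndjsonLine) {
  const record = JSON.parse(ndjsonLine);
  const tags = {};
  const fields = {};
  for (const key of TAG_KEYS) {
    if (record[key] !== undefined) tags[key] = String(record[key]);
  }
  for (const key of INT_FIELD_KEYS) {
    if (record[key] !== undefined) fields[key] = Math.trunc(Number(record[key]));
  }
  return { measurement: "node_red_update_event", tags, fields };
}

// Metric names as they appear in the Grafana queries in this guide:
// measurement name, underscore, field name.
function prometheusMetricNames(point) {
  return Object.keys(point.fields).map((field) => `${point.measurement}_${field}`);
}
```

Tags become Prometheus labels and fields become separate metrics, which is why the queries later in this guide refer to names such as `node_red_update_event_failed` and group by `container` and `project`.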
4) Prometheus scrape (already in place)
No Prometheus scrape change is required as long as it already scrapes Telegraf (telegraf:9273).
5) Grafana queries to start with
Use your Prometheus data source and try:
- Latest success/failure by container:
last_over_time(node_red_update_event_success[24h])
last_over_time(node_red_update_event_failed[24h])
- Failed updates in the last 24h:
sum by (container, project) (increase(node_red_update_event_failed[24h]))
- Average update duration in last 24h:
avg by (container, project) (avg_over_time(node_red_update_event_duration_ms[24h]))
Recommended panels:
- Table: container, project, status (last value), duration_ms (last value)
- Time series: failed count over time
- Stat: total failed updates in last 24h
6) Validation checklist
- Trigger a known update path (including one failure if possible).
- Check Node-RED log file:
tail -n 20 monitoring/node-red/data/update-events.ndjson
- Check the Telegraf metrics endpoint for node_red_update_event_* metrics.
- Confirm Grafana panel values match the latest Node-RED run.
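To get reference numbers for comparing Grafana against the log file, a small Node.js script can summarize completed events per container and project. This is a sketch under assumptions: `summarizeUpdateEvents` is a hypothetical helper, it only counts lines with `event` set to `completed`, and it silently skips lines that are not valid JSON.

```javascript
// Summarize completed update events per "container/project" from NDJSON text.
function summarizeUpdateEvents(ndjsonText) {
  const summary = new Map();
  for (const line of ndjsonText.split("\n")) {
    if (line.trim() === "") continue;
    let record;
    try {
      record = JSON.parse(line);
    } catch (err) {
      continue; // skip partial or corrupt lines
    }
    if (record.event !== "completed") continue;
    const key = `${record.container}/${record.project}`;
    const entry = summary.get(key) ||
      { completed: 0, failed: 0, totalDurationMs: 0, lastStatus: "unknown" };
    entry.completed += 1;
    entry.failed += record.failed === 1 ? 1 : 0;
    entry.totalDurationMs += Number(record.duration_ms) || 0;
    entry.lastStatus = record.status; // later lines overwrite earlier ones
    summary.set(key, entry);
  }
  return summary;
}

// Usage from the repository root (path taken from the checklist above):
// const fs = require("fs");
// const text = fs.readFileSync("monitoring/node-red/data/update-events.ndjson", "utf8");
// for (const [key, e] of summarizeUpdateEvents(text)) {
//   console.log(key, e.lastStatus, e.failed, e.totalDurationMs / e.completed);
// }
```

Prometheus only sees these values at scrape time, so treat the script's output as a reference for the latest run (last status, last duration) rather than an exact match for range-based queries.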
Optional next step
If you want searchable raw log text and richer log UX, add Loki + Promtail later. Keep this structured metrics path for high-signal alerting even after adding logs.