docker/monitoring/node-red/UPDATE_LOGGING_GRAFANA.md

Node-RED update logging for Grafana

This guide adds structured update-event logging to your existing Node-RED + Telegraf + Prometheus + Grafana stack without introducing Loki.

Goal

Track and surface (in Grafana) the latest update attempts from Node-RED, including:

  • when an update attempt started,
  • target container/project,
  • success/failure,
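  • optional failure reason (written to the log file; the Telegraf config in step 3 exports only numeric fields, so the text itself stays in the file),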
  • elapsed duration.

1) Add a reusable logger function in Node-RED

Create a Function node named "Build update log event" with this body:

// Event timestamp and elapsed time since the update began.
// If msg.update_started_at was never set, the duration is reported as 0.
const nowMs = Date.now();
const startedAt = Number(msg.update_started_at) || nowMs;
const durationMs = Math.max(0, nowMs - startedAt);

// Upstream nodes may leave payload unset; fall back to empty objects.
const payload = msg.payload || {};
const labels = payload.labels || {};

// Normalize the status so the 0/1 flags are derived consistently.
const status = (msg.update_status || payload.status || "unknown").toString().toLowerCase();
const success = status === "success" ? 1 : 0;
const failed = status === "failed" ? 1 : 0;

const event = {
  ts: new Date(nowMs).toISOString(),
  flow: "docker-updates",
  event: msg.update_event || "attempt",
  container: msg.container || labels.container || "unknown",
  project: labels.com_docker_compose_project || msg.project || "unknown",
  host: msg.host || "unknown",
  status,
  success,
  failed,
  duration_ms: durationMs,
  // Non-numeric or missing exit codes are recorded as 0.
  code: Number.isFinite(Number(payload.code)) ? Number(payload.code) : 0,
  // Cap the error text so a single event stays on one short line.
  error: (msg.update_error || payload.error || "").toString().slice(0, 300)
};

// One JSON line per event; the File node appends the newline.
msg.payload = JSON.stringify(event);
return msg;

Wiring recommendation

Use the same logger function in these branches:

  • before a pull/update command (update_status=started, update_event=attempt),
  • success path (update_status=success, update_event=completed),
  • failure path (update_status=failed, update_event=completed, and include msg.update_error).

Then route each branch into a File node configured as:

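  • Filename: /data/update-events.ndjson (path inside the Node-RED container, assuming /data is bind-mounted from monitoring/node-red/data as the Telegraf mount in step 3 expects)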
  • Action: append to file
  • Add newline: enabled
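
To preview what ends up in the file, the logger logic can be exercised outside Node-RED. The following is a minimal sketch, not part of the flow: buildUpdateLogEvent is a hypothetical wrapper around the Function node body, and the container and host names are made up for illustration.

```javascript
// Standalone copy of the logger logic so it can run with plain Node.js.
// buildUpdateLogEvent is a hypothetical name; in Node-RED this body lives in the Function node.
function buildUpdateLogEvent(msg, now = Date.now()) {
  const startedAt = msg.update_started_at || now;
  const payload = msg.payload || {};
  const labels = payload.labels || {};
  const status = String(msg.update_status || payload.status || "unknown").toLowerCase();
  return JSON.stringify({
    ts: new Date(now).toISOString(),
    flow: "docker-updates",
    event: msg.update_event || "attempt",
    container: msg.container || labels.container || "unknown",
    project: labels.com_docker_compose_project || msg.project || "unknown",
    host: msg.host || "unknown",
    status,
    success: status === "success" ? 1 : 0,
    failed: status === "failed" ? 1 : 0,
    duration_ms: Math.max(0, now - startedAt),
    code: Number.isFinite(Number(payload.code)) ? Number(payload.code) : 0,
    error: String(msg.update_error || payload.error || "").slice(0, 300),
  });
}

// Sample messages for the "started" and "failed" branches (illustrative values).
const t0 = Date.UTC(2024, 0, 1, 12, 0, 0);
const started = {
  container: "grafana",
  host: "nas",
  update_status: "started",
  update_event: "attempt",
  update_started_at: t0,
};
const failedMsg = {
  ...started,
  update_status: "failed",
  update_event: "completed",
  update_error: "pull access denied",
  payload: { code: 1 },
};

// Two NDJSON lines: one at start, one 4.2 seconds later on the failure path.
const lines = [
  buildUpdateLogEvent(started, t0),
  buildUpdateLogEvent(failedMsg, t0 + 4200),
];
console.log(lines.join("\n"));
```

Note that the "started" line carries duration_ms 0 by construction, which matters when averaging durations later.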

2) Make update state explicit in existing update flow

In your current update flow (already present in flows.json), add or adjust Change nodes around your shell/Docker nodes:

  • At update start:
    • msg.update_started_at = $millis() (value type: JSONata expression)
    • msg.update_status = "started"
    • msg.update_event = "attempt"
  • At success:
    • msg.update_status = "success"
    • msg.update_event = "completed"
  • At failure:
    • msg.update_status = "failed"
    • msg.update_event = "completed"
    • msg.update_error = msg.payload.stderr (or equivalent error field)
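
If you prefer Function nodes over Change nodes, the same state can be set in JavaScript. This is a sketch under assumptions: markStart, markSuccess, and markFailure are hypothetical helper names, and the failure helper assumes the error text arrives either as msg.payload.stderr or as a plain string payload.

```javascript
// Hypothetical helpers mirroring the Change-node rules listed above.
function markStart(msg) {
  msg.update_started_at = Date.now(); // epoch milliseconds, like $millis() in a Change node
  msg.update_status = "started";
  msg.update_event = "attempt";
  return msg;
}

function markSuccess(msg) {
  msg.update_status = "success";
  msg.update_event = "completed";
  return msg;
}

function markFailure(msg) {
  msg.update_status = "failed";
  msg.update_event = "completed";
  const p = msg.payload;
  // Assumption: stderr is either a property of an object payload or the payload itself.
  if (typeof p === "string") {
    msg.update_error = p;
  } else if (p && typeof p === "object" && p.stderr !== undefined) {
    msg.update_error = String(p.stderr);
  } else {
    msg.update_error = "";
  }
  return msg;
}

// Example: a failure message as it might leave a shell node.
const example = markFailure(markStart({ container: "grafana", payload: { stderr: "boom" } }));
console.log(example.update_status, example.update_error);
```

Inside a Function node, each helper body would be used directly, ending with return msg.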

3) Let Telegraf ingest Node-RED event logs

Append this to monitoring/telegraf/telegraf.conf:

[[inputs.tail]]
  files = ["/var/log/node-red/update-events.ndjson"]
  from_beginning = false
  name_override = "node_red_update_event"
  data_format = "json_v2"

  [[inputs.tail.json_v2]]
    measurement_name = "node_red_update_event"

    [[inputs.tail.json_v2.tag]]
      path = "flow"
    [[inputs.tail.json_v2.tag]]
      path = "event"
    [[inputs.tail.json_v2.tag]]
      path = "container"
    [[inputs.tail.json_v2.tag]]
      path = "project"
    [[inputs.tail.json_v2.tag]]
      path = "host"
    [[inputs.tail.json_v2.tag]]
      path = "status"

    [[inputs.tail.json_v2.field]]
      path = "success"
      type = "int"
    [[inputs.tail.json_v2.field]]
      path = "failed"
      type = "int"
    [[inputs.tail.json_v2.field]]
      path = "duration_ms"
      type = "int"
    [[inputs.tail.json_v2.field]]
      path = "code"
      type = "int"

Then mount the Node-RED data directory read-only into the Telegraf container by adding this entry under telegraf.volumes in monitoring/prometheus/docker-compose.yml:

      - ${PROJECT_ROOT}/monitoring/node-red/data:/var/log/node-red:ro
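
To make the tag/field split in the Telegraf config concrete, the sketch below mimics how one event line would show up on the metrics endpoint. It is an illustration only, not Telegraf's implementation; toMetricLines is a hypothetical name, and it assumes metrics are named measurement_field, as the Grafana queries in this guide do.

```javascript
// Keys configured as tags (become labels) and as integer fields (become metric values).
const TAG_KEYS = ["flow", "event", "container", "project", "host", "status"];
const FIELD_KEYS = ["success", "failed", "duration_ms", "code"];

// Escape backslashes and double quotes for use inside a label value.
function escapeLabel(value) {
  return String(value).replace(/\\/g, "\\\\").replace(/"/g, '\\"');
}

// Turn one NDJSON event line into exposition-style lines, one per configured field.
// Keys not listed above (ts, error) are ignored, as in the config.
function toMetricLines(ndjsonLine, measurement = "node_red_update_event") {
  const event = JSON.parse(ndjsonLine);
  const labels = TAG_KEYS
    .filter((key) => event[key] !== undefined)
    .sort()
    .map((key) => `${key}="${escapeLabel(event[key])}"`)
    .join(",");
  return FIELD_KEYS
    .filter((key) => Number.isInteger(event[key]))
    .map((key) => `${measurement}_${key}{${labels}} ${event[key]}`);
}

// Illustrative event (values are made up).
const sampleLine = JSON.stringify({
  ts: "2024-01-01T12:00:04.200Z",
  flow: "docker-updates",
  event: "completed",
  container: "grafana",
  project: "monitoring",
  host: "nas",
  status: "failed",
  success: 0,
  failed: 1,
  duration_ms: 4200,
  code: 1,
  error: "pull access denied",
});

const metricLines = toMetricLines(sampleLine);
console.log(metricLines.join("\n"));
```

Because status is a tag, each status value produces its own series, which is worth keeping in mind when writing the Grafana queries.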

4) Prometheus scrape (already in place)

No Prometheus scrape change is required as long as Prometheus already scrapes Telegraf's prometheus_client output (telegraf:9273).

5) Grafana queries to start with

Use your Prometheus data source and try:

  • Latest success/failure by container:
    • last_over_time(node_red_update_event_success[24h])
    • last_over_time(node_red_update_event_failed[24h])
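  • Containers with at least one failed update in the last 24h (failed is a 0/1 gauge rather than a counter, so increase() would not count events):
    • max by (container, project) (max_over_time(node_red_update_event_failed[24h]))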
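  • Average update duration in the last 24h (completed events only, since "started" events report a duration of about 0):
    • avg by (container, project) (avg_over_time(node_red_update_event_duration_ms{event="completed"}[24h]))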
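
Before building panels, a query can be tried directly against the Prometheus HTTP API. This is a sketch with assumptions: the base URL http://prometheus:9090 is a guess at your service name and port, and buildQueryUrl and runQuery are hypothetical helper names.

```javascript
// Assumption: Prometheus is reachable at this base URL from where the script runs.
const PROMETHEUS_BASE = "http://prometheus:9090";

// Build an instant-query URL for the Prometheus HTTP API (/api/v1/query).
function buildQueryUrl(promql, base = PROMETHEUS_BASE) {
  const url = new URL("/api/v1/query", base);
  url.searchParams.set("query", promql);
  return url.toString();
}

// Run the query and return the result list; needs Node 18+ for the global fetch.
async function runQuery(promql, base = PROMETHEUS_BASE) {
  const res = await fetch(buildQueryUrl(promql, base));
  if (!res.ok) throw new Error(`Prometheus returned HTTP ${res.status}`);
  const body = await res.json();
  if (body.status !== "success") throw new Error(body.error || "query failed");
  return body.data.result;
}

const latestFailed = "last_over_time(node_red_update_event_failed[24h])";
console.log(buildQueryUrl(latestFailed));
// To execute it against a live server:
// runQuery(latestFailed).then(console.log, console.error);
```

An empty result here usually means no event has been scraped in the window yet, so trigger an update first.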

Recommended panels:

  • Table: container, project, status (last value), duration_ms (last value)
  • Time series: failed count over time
  • Stat: number of containers with a failed update in the last 24h

6) Validation checklist

  1. Trigger a known update path (including one failure if possible).
  2. Check Node-RED log file:
    • tail -n 20 monitoring/node-red/data/update-events.ndjson
  3. Check the Telegraf metrics endpoint (telegraf:9273/metrics) for node_red_update_event_* metrics.
  4. Confirm Grafana panel values match the latest Node-RED run.
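
The log-file check in step 2 can be partly automated. The sketch below verifies that every line parses as JSON and carries the keys the logger function writes; checkNdjson is a hypothetical helper, and the sample text stands in for the real file contents.

```javascript
// Keys written by the "Build update log event" function.
const REQUIRED_KEYS = [
  "ts", "flow", "event", "container", "project", "host",
  "status", "success", "failed", "duration_ms", "code", "error",
];

// Return a list of human-readable problems; an empty list means the text looks fine.
function checkNdjson(text) {
  const problems = [];
  text.split("\n").forEach((line, index) => {
    if (line.trim() === "") return; // ignore blank lines, including the trailing one
    let event;
    try {
      event = JSON.parse(line);
    } catch (err) {
      problems.push(`line ${index + 1}: not valid JSON`);
      return;
    }
    if (event === null || typeof event !== "object" || Array.isArray(event)) {
      problems.push(`line ${index + 1}: not a JSON object`);
      return;
    }
    const missing = REQUIRED_KEYS.filter((key) => !(key in event));
    if (missing.length > 0) {
      problems.push(`line ${index + 1}: missing ${missing.join(", ")}`);
    }
  });
  return problems;
}

// Sample input: one complete event followed by a truncated line.
const goodEvent = {
  ts: "2024-01-01T12:00:00.000Z", flow: "docker-updates", event: "attempt",
  container: "grafana", project: "monitoring", host: "nas", status: "started",
  success: 0, failed: 0, duration_ms: 0, code: 0, error: "",
};
const sampleText = JSON.stringify(goodEvent) + "\n" + '{"ts": "2024-01-01T12:0' + "\n";
const problems = checkNdjson(sampleText);
console.log(problems);

// Against the real file (path from the checklist above):
// const fs = require("node:fs");
// console.log(checkNdjson(fs.readFileSync("monitoring/node-red/data/update-events.ndjson", "utf8")));
```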

Optional next step

If you want searchable raw log text and richer log UX, add Loki + Promtail later. Keep this structured metrics path for high-signal alerting even after adding logs.