# Node-RED update logging for Grafana
This guide adds structured update-event logging to your existing Node-RED + Telegraf + Prometheus + Grafana stack without introducing Loki.
## Goal
Track and surface (in Grafana) the latest update attempts from Node-RED, including:
- when an update attempt started,
- the target container/project,
- success/failure,
- an optional failure reason,
- elapsed duration.
## 1) Add a reusable logger function in Node-RED
Create a **Function** node named `Build update log event` and use:
```javascript
const nowIso = new Date().toISOString();
const startedAt = msg.update_started_at || Date.now();
const durationMs = Math.max(0, Date.now() - startedAt);

const payload = msg.payload || {};
const labels = payload.labels || {};

const status = (msg.update_status || payload.status || "unknown").toString().toLowerCase();
const success = status === "success" ? 1 : 0;
const failed = status === "failed" ? 1 : 0;

msg.payload = {
    ts: nowIso,
    flow: "docker-updates",
    event: msg.update_event || "attempt",
    container: msg.container || labels.container || "unknown",
    project: labels.com_docker_compose_project || msg.project || "unknown",
    host: msg.host || "unknown",
    status,
    success,
    failed,
    duration_ms: durationMs,
    code: Number.isFinite(Number(payload.code)) ? Number(payload.code) : 0,
    error: (msg.update_error || payload.error || "").toString().slice(0, 300)
};

// One JSON line per event for file output.
msg.payload = JSON.stringify(msg.payload);
return msg;
```
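The same logic can be sanity-checked outside Node-RED with plain Node.js. This is a sketch that wraps the Function node body in a standalone function; the `msg` object and the `"grafana"` container in the usage example are stand-ins for a real Node-RED message:

```javascript
// Standalone sketch of the Function node logic, for testing outside Node-RED.
// The msg argument mimics a Node-RED message object.
function buildUpdateLogEvent(msg) {
  const nowIso = new Date().toISOString();
  const startedAt = msg.update_started_at || Date.now();
  const durationMs = Math.max(0, Date.now() - startedAt);

  const payload = msg.payload || {};
  const labels = payload.labels || {};

  const status = (msg.update_status || payload.status || "unknown").toString().toLowerCase();

  return JSON.stringify({
    ts: nowIso,
    flow: "docker-updates",
    event: msg.update_event || "attempt",
    container: msg.container || labels.container || "unknown",
    project: labels.com_docker_compose_project || msg.project || "unknown",
    host: msg.host || "unknown",
    status,
    success: status === "success" ? 1 : 0,
    failed: status === "failed" ? 1 : 0,
    duration_ms: durationMs,
    code: Number.isFinite(Number(payload.code)) ? Number(payload.code) : 0,
    error: (msg.update_error || payload.error || "").toString().slice(0, 300)
  });
}

// Simulate a failed update on a hypothetical "grafana" container,
// started 1.5 seconds ago:
const line = buildUpdateLogEvent({
  update_started_at: Date.now() - 1500,
  update_status: "failed",
  update_event: "completed",
  update_error: "pull access denied",
  container: "grafana",
  payload: { code: 1 }
});
console.log(line);
```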
### Wiring recommendation

Use the same logger function in these branches:
- before a pull/update command (`update_status=started`, `update_event=attempt`),
- success path (`update_status=success`, `update_event=completed`),
- failure path (`update_status=failed`, `update_event=completed`, and include `msg.update_error`).
Then route each branch into a **File** node configured as:
Then route each branch into a **File** node configured as:

- Filename: `/data/update-events.ndjson`
- Action: append to file
- Add newline: enabled
## 2) Make update state explicit in existing update flow
In your current update flow (already present in `flows.json`), add or adjust **Change** nodes around your shell/docker nodes:
- At update start:
  - `msg.update_started_at = $millis()`
  - `msg.update_status = "started"`
  - `msg.update_event = "attempt"`
- At success:
  - `msg.update_status = "success"`
  - `msg.update_event = "completed"`
- At failure:
  - `msg.update_status = "failed"`
  - `msg.update_event = "completed"`
  - `msg.update_error = msg.payload.stderr` (or your equivalent error field)
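With these branches in place, a single successful run appends two NDJSON lines to the file, one per logger invocation. The values below are illustrative (the `grafana` container, `monitoring` project, and timestamps are made up):

```jsonl
{"ts":"2024-01-01T00:00:00.000Z","flow":"docker-updates","event":"attempt","container":"grafana","project":"monitoring","host":"unknown","status":"started","success":0,"failed":0,"duration_ms":0,"code":0,"error":""}
{"ts":"2024-01-01T00:00:12.345Z","flow":"docker-updates","event":"completed","container":"grafana","project":"monitoring","host":"unknown","status":"success","success":1,"failed":0,"duration_ms":12345,"code":0,"error":""}
```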
## 3) Let Telegraf ingest Node-RED event logs
Append this to `monitoring/telegraf/telegraf.conf`:
```toml
[[inputs.tail]]
  files = ["/var/log/node-red/update-events.ndjson"]
  from_beginning = false
  name_override = "node_red_update_event"
  data_format = "json_v2"

  [[inputs.tail.json_v2]]
    measurement_name = "node_red_update_event"

    [[inputs.tail.json_v2.tag]]
      path = "flow"
    [[inputs.tail.json_v2.tag]]
      path = "event"
    [[inputs.tail.json_v2.tag]]
      path = "container"
    [[inputs.tail.json_v2.tag]]
      path = "project"
    [[inputs.tail.json_v2.tag]]
      path = "host"
    [[inputs.tail.json_v2.tag]]
      path = "status"

    [[inputs.tail.json_v2.field]]
      path = "success"
      type = "int"
    [[inputs.tail.json_v2.field]]
      path = "failed"
      type = "int"
    [[inputs.tail.json_v2.field]]
      path = "duration_ms"
      type = "int"
    [[inputs.tail.json_v2.field]]
      path = "code"
      type = "int"
```
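Assuming Telegraf exposes metrics to Prometheus via its `prometheus_client` output on `:9273` (as in section 4), each numeric field becomes a series named `node_red_update_event_<field>`, with the tags above attached as labels. Illustrative exposition output (label values depend on your actual events):

```
node_red_update_event_success{container="grafana",event="completed",flow="docker-updates",host="unknown",project="monitoring",status="success"} 1
node_red_update_event_duration_ms{container="grafana",event="completed",flow="docker-updates",host="unknown",project="monitoring",status="success"} 12345
```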
And mount the Node-RED data directory into Telegraf (read-only) in `monitoring/prometheus/docker-compose.yml` under `telegraf.volumes`:
```yaml
- ${PROJECT_ROOT}/monitoring/node-red/data:/var/log/node-red:ro
```
## 4) Prometheus scrape (already in place)
No Prometheus scrape change is required as long as it already scrapes Telegraf (`telegraf:9273`).
## 5) Grafana queries to start with
Use your Prometheus data source and try:
- Latest success/failure by container:
  - `last_over_time(node_red_update_event_success[24h])`
  - `last_over_time(node_red_update_event_failed[24h])`
- Failed updates in the last 24h:
  - `sum by (container, project) (increase(node_red_update_event_failed[24h]))`
- Average update duration over the last 24h:
  - `avg by (container, project) (avg_over_time(node_red_update_event_duration_ms[24h]))`
Recommended panels:
- **Table**: container, project, status (last value), duration_ms (last value)
- **Time series**: failed count over time
- **Stat**: total failed updates in the last 24h
## 6) Validation checklist
1. Trigger a known update path (including one failure, if possible).
2. Check the Node-RED log file:
   - `tail -n 20 monitoring/node-red/data/update-events.ndjson`
3. Check the Telegraf metrics endpoint (`telegraf:9273/metrics`) for `node_red_update_event_` metrics.
4. Confirm that the Grafana panel values match the latest Node-RED run.
## Optional next step
If you want searchable raw log text and a richer log UX, add Loki + Promtail later. Keep this structured metrics path for high-signal alerting even after adding logs.
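Since the `failed` counter already flows through Prometheus, an alert rule is a natural endpoint for this pipeline. A sketch, assuming your Prometheus instance loads rule files (the group name, threshold, and severity label are all choices, not requirements):

```yaml
groups:
  - name: node_red_updates
    rules:
      - alert: NodeRedUpdateFailed
        # Fires if any container logged a failed update in the last hour.
        expr: sum by (container, project) (increase(node_red_update_event_failed[1h])) > 0
        labels:
          severity: warning
        annotations:
          summary: "Update failed for {{ $labels.container }} ({{ $labels.project }})"
```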