Pipelines and Pizza 🍕

Tag: observability

All the articles with the tag "observability".

First 100 Days of LGTM on Nutanix: What We Built, What We Got Wrong, What's Next

29 May, 2026

A hundred days into running the LGTM stack on Nutanix — what's actually deployed, the bugs that bit us, the surprises that didn't, and what's still on the roadmap. No fabricated war stories; every fix below is in our changelog.
Dashboards That Actually Get Used: 118 Across 18 Folders, Organized by Domain

26 May, 2026

The dashboard set we maintain: 118 dashboards across 18 domain-organized folders, a multi-vendor storage section covering NetApp Harvest, Pure FlashArray, and Nutanix, the ConfigMap-sidecar GitOps pattern, and the rules that keep the set from sprawling.
Grafana 13 on CloudNativePG: The Real Upgrade Walkthrough

22 May, 2026

Running Grafana on Kubernetes backed by CloudNativePG instead of SQLite, plus the real Grafana 12.4.2 → 13.0.1 upgrade we ran last month, including the irreversible moment that makes the pre-upgrade Postgres backup load-bearing.
Mimir on Kubernetes: 620K Active Series on Nutanix Objects

19 May, 2026

Deploying Grafana Mimir in SimpleScalable mode on Kubernetes with Nutanix Objects as the S3 backend — the real config running ~620K active series at 365-day retention, including why three of our Mimir components run as singletons on purpose.
Loki in Production: Labels, Per-Stream Retention, and the LogQL Alerts We Run

15 May, 2026

The production side of Loki — the label set we run, the 14-rule per-stream retention table, the LogQL alerts we actually rely on (audit, syslog, firewall), and the ingestion-rate gotcha that bit us early.
Deploying Loki on Kubernetes: SimpleScalable on Nutanix Objects

12 May, 2026

Deploying Grafana Loki in SimpleScalable mode on Kubernetes with Nutanix Objects as the S3 backend — the real values we run, schema configuration, and why we picked SimpleScalable over Distributed.
Alloy in Production: The DaemonSet Config Running The Conveyor's Observability

8 May, 2026

The real Alloy DaemonSet configuration running across the fleet (pod logs, kube-audit, node metrics, kubelet, cAdvisor, etcd, apiserver, CoreDNS, CNPG, and synthetic probes), plus the production lessons that shaped it.
Grafana Alloy on Kubernetes: Three Deployments, One Collector

5 May, 2026

How we deploy Grafana Alloy on Kubernetes using three separate topologies — a DaemonSet for pod logs and node metrics, a Deployment with a MetalLB VIP for syslog and SNMP, and a Deployment for OTLP traces. Plus why Telegraf earned a permanent seat at the table.
Nutanix Objects as the Storage Backend for Loki and Mimir

1 May, 2026

How we configured Nutanix Objects as the S3-compatible storage backend for Grafana Loki, Mimir, and Tempo — bucket architecture across two data centers, F5 GSLB for cross-DC HA, credential management with External Secrets Operator, and a dual-retention strategy that survives compactor mistakes.
Building an LGTM Observability Stack on Nutanix: Why We Did It and What It Looks Like

28 Apr, 2026

Architecture overview of a self-hosted LGTM observability stack (Loki, Grafana, Tempo, Mimir) running on Nutanix — why we built it with zero software budget on recycled hardware, what each component does, and the honest trade-offs five months in.