Observability stack

Infrahub ships an observability stack that runs alongside it. The stack includes:

  • Grafana Alloy — collects logs and metrics
  • Loki — log storage
  • Prometheus — metric storage
  • Tempo — distributed tracing
  • Grafana — dashboards and visualization
  • Prefect exporter — task-manager metrics

You can deploy it with Docker Compose (Infrahub Enterprise) or on Kubernetes with the infrahub-observability Helm chart. Grafana comes provisioned with pre-built dashboards and data sources for Infrahub, Neo4j, RabbitMQ, and Prefect.

infrahub-observability Helm chart: https://github.com/opsmill/infrahub-helm/tree/stable/charts/infrahub-observability

Docker Compose

The Infrahub Enterprise Docker Compose deployment can include the observability stack. Add the ?observability=true query parameter when you fetch the compose file:

curl "https://infrahub.opsmill.io/enterprise?observability=true" > docker-compose.yml
docker compose -p infrahub up -d

It combines with other parameters such as a sizing preset:

curl "https://infrahub.opsmill.io/enterprise?size=small&observability=true" > docker-compose.yml

Once running, Grafana is available at http://localhost:3500 with the default credentials admin / admin.
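
The containers can take a little while to start, so it may help to wait for Grafana before opening the browser. The sketch below polls Grafana's /api/health endpoint at the address given above. The helper name wait_for_grafana and the GRAFANA_URL variable are illustrative choices, not part of Infrahub.

```shell
# Address from this guide; override GRAFANA_URL if you remapped the port.
GRAFANA_URL="${GRAFANA_URL:-http://localhost:3500}"

# wait_for_grafana [attempts] [delay_seconds]
# Hypothetical helper: poll the health endpoint until it answers or attempts run out.
wait_for_grafana() {
  attempts="${1:-30}"
  delay="${2:-2}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl return non-zero on HTTP errors; -s hides progress output.
    if curl -fs "${GRAFANA_URL}/api/health" >/dev/null; then
      echo "Grafana is up at ${GRAFANA_URL}"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "Grafana did not respond at ${GRAFANA_URL}" >&2
  return 1
}
```

For example, wait_for_grafana 30 2 tries for roughly a minute before giving up.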

Enable request tracing

Infrahub can export OpenTelemetry traces so you can follow a single request as it moves across the API server, task workers, and the database. Traces are sent to the bundled Tempo instance and surfaced in Grafana under the Tempo data source. To enable it, add the following to a .env file alongside the compose file:

INFRAHUB_TRACE_ENABLE=true
INFRAHUB_TRACE_EXPORTER_TYPE=otlp
INFRAHUB_TRACE_EXPORTER_PROTOCOL=grpc
INFRAHUB_TRACE_EXPORTER_ENDPOINT=http://infrahub-tempo:4317
INFRAHUB_TRACE_INSECURE=true
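
A missing or misspelled variable is an easy mistake to make here. As a rough check, the sketch below lists which of the five variables above are absent from an env file. The helper name missing_trace_vars is hypothetical, and the check only looks for the variable names at the start of a line, not at their values.

```shell
# missing_trace_vars [env_file]
# Hypothetical helper: print each tracing variable that the env file does not define.
missing_trace_vars() {
  env_file="${1:-.env}"
  for name in INFRAHUB_TRACE_ENABLE \
              INFRAHUB_TRACE_EXPORTER_TYPE \
              INFRAHUB_TRACE_EXPORTER_PROTOCOL \
              INFRAHUB_TRACE_EXPORTER_ENDPOINT \
              INFRAHUB_TRACE_INSECURE; do
    # A variable counts as present when a line starts with NAME=.
    grep -q "^${name}=" "$env_file" 2>/dev/null || echo "$name"
  done
}
```

An empty result from missing_trace_vars .env means all five names are present. Re-running docker compose -p infrahub up -d afterwards should typically recreate the containers with the new settings.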

For the full Infrahub Enterprise Compose install, see the Enterprise install guide.

Kubernetes with Helm

On Kubernetes the stack is the infrahub-observability Helm chart, which wraps the upstream Grafana and Prometheus community charts. Deploy it bundled with your Infrahub release or as a standalone release.

Prerequisites

  • A Kubernetes cluster (version 1.24 or later)
  • Helm (version 3 or later) installed on your system
  • A persistent volume provisioner in the cluster. Loki, Prometheus, Tempo, and Grafana enable persistence by default.

The stack installs and runs on its own. To collect Infrahub's own metrics and Prefect task data, install it in the same namespace as an Infrahub or Infrahub Enterprise release (or set global.infrahubReleaseName when the release name differs). Without a reachable Infrahub, the stack still starts, but the Prefect exporter stays unhealthy and the Infrahub scrape targets report no data.

Deploy

Deploy the stack either bundled with your Infrahub release or as its own release.

Bundled with Infrahub

The infrahub and infrahub-enterprise charts package the observability stack as a subchart, gated by infrahub-observability.enabled (default false). Enable it in your Infrahub values to deploy the stack as part of the same release and namespace — no separate install.

For the infrahub (Community) chart:

# infrahub values
infrahub-observability:
  enabled: true

For the infrahub-enterprise chart, nest the key under infrahub:

# infrahub-enterprise values
infrahub:
  infrahub-observability:
    enabled: true

Apply the change with helm upgrade on your Infrahub release. Enabling the bundled stack also turns tracing on automatically: Infrahub emits traces to the bundled Tempo at <release>-tempo:4317 (where <release> is your Infrahub release name) with no further configuration.
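
As a sketch of that upgrade step, the hypothetical helper below wraps helm upgrade. The release name, chart reference, and values file are all passed in because this page does not state them for the Infrahub charts; use the same chart reference you installed Infrahub with. The infrahub namespace is assumed from the other examples in this guide.

```shell
# upgrade_infrahub <release> <chart-reference> <values-file> [namespace]
# Hypothetical helper: re-apply an Infrahub release with an updated values file.
upgrade_infrahub() {
  release="$1"
  chart="$2"
  values="$3"
  namespace="${4:-infrahub}"   # assumed namespace, as used elsewhere in this guide
  helm upgrade "$release" "$chart" --namespace "$namespace" --values "$values"
}
```

For example, upgrade_infrahub infrahub <chart-reference> values.yaml. If your release was installed with several values files, pass all of them to helm upgrade, since values not supplied again revert to chart defaults.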

Standalone release

To manage the stack on its own release lifecycle — for example, to add it next to an existing Infrahub release without changing that release — install the chart directly. The --create-namespace flag creates the target namespace if it does not already exist; drop it when installing into the namespace where Infrahub already runs:

helm install obs oci://registry.opsmill.io/opsmill/chart/infrahub-observability --version 0.1.0 -n infrahub --create-namespace

The release name (obs above) prefixes the service names referenced throughout this guide.

A standalone release does not wire Infrahub's tracing for you. To send traces to its Tempo, set global.tracing on the Infrahub release:

# infrahub values
global:
  tracing:
    enabled: true
    endpoint: "obs-tempo:4317"
    protocol: grpc
    insecure: true

Services and access

Every component is exposed through a ClusterIP service, reachable only from inside the cluster:

Service | Purpose
<release>-grafana | Grafana UI
<release>-loki | Log storage (Alloy pushes logs here)
<release>-prometheus-server | Metric storage (Alloy remote-writes here)
<release>-tempo | Trace storage (OTLP receiver on port 4317)
<release>-prometheus-node-exporter | Host metrics
<release>-infrahub-observability-prefect-exporter | Prefect metrics (scraped by Alloy on port 8000)

<release> is your Infrahub release name when bundled, or the observability release name (obs above) when standalone. Run kubectl get svc -n infrahub to list the exact service names for your release.

The log, metric, and trace services only need to be reachable from inside the cluster, so keep them as ClusterIP. Grafana is the only component intended for people to browse.

To open Grafana, forward its service to your machine (replace obs with your Infrahub release name if you bundled the stack):

kubectl port-forward svc/obs-grafana 3000:80 -n infrahub

Then browse to http://localhost:3000 and sign in with the default credentials admin / admin.

To reach Grafana without port-forwarding, set grafana.service.type to LoadBalancer or NodePort, or enable an ingress with grafana.ingress.enabled.

Warning

Change the default Grafana credentials before exposing it outside the cluster. Set grafana.admin.existingSecret to a secret you manage rather than relying on the default admin / admin.
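
One way to create such a secret is sketched below. It assumes the wrapped chart keeps the upstream Grafana chart's default key names, admin-user and admin-password; check the chart's values.yaml if sign-in fails. The helper name is hypothetical, and the secret name grafana-admin matches the values example later in this guide.

```shell
# create_grafana_admin_secret <user> <password> [namespace]
# Hypothetical helper: create the secret that grafana.admin.existingSecret points to.
create_grafana_admin_secret() {
  user="$1"
  password="$2"
  namespace="${3:-infrahub}"   # assumed namespace, as used elsewhere in this guide
  # Key names follow the upstream Grafana chart defaults (an assumption for this chart).
  kubectl create secret generic grafana-admin \
    --namespace "$namespace" \
    --from-literal=admin-user="$user" \
    --from-literal=admin-password="$password"
}
```

Passing the password as an argument leaves it in your shell history, so you may prefer to read it from a file or a secret manager instead.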

Configure the stack

The component values below set the size, retention, and exposure of each part of the stack:

# Turn off components you don't need
tempo:
  enabled: false # disable tracing

# Persistence and retention
loki:
  singleBinary:
    persistence:
      size: 20Gi
prometheus:
  server:
    retention: 7d
    persistentVolume:
      size: 50Gi

# Expose Grafana through an ingress and use a managed admin secret
grafana:
  ingress:
    enabled: true
  admin:
    existingSecret: grafana-admin

Where you place these values depends on how you deploy:

  • Standalone — set them at the top level of the observability values.yml.
  • Bundled — nest the same blocks under infrahub-observability: (or infrahub.infrahub-observability: for Enterprise) in your Infrahub values:

# infrahub values (bundled)
infrahub-observability:
  tempo:
    enabled: false
  grafana:
    ingress:
      enabled: true

global values are shared, not nested

global.* values are Helm global values, shared across Infrahub and the bundled subchart, so they always stay at the top level of your values — never nest them under infrahub-observability:. The same applies to the Infrahub chart's global.tracing. When bundled, the stack resolves the Infrahub release name and namespace from the release itself, so global.infrahubReleaseName and global.infrahubNamespace are only needed for a standalone release that points at a separately named or located Infrahub.

Option | Default | Description
<component>.enabled | true | Toggle any of alloy, loki, tempo, grafana, prometheus, prometheus-node-exporter, prefectExporter
component persistence size | Loki 10Gi, Tempo 10Gi, Prometheus 20Gi, Grafana 5Gi | Persistent volume size per component
loki.loki.limits_config.retention_period | 24h | Log retention
tempo.tempo.retention, prometheus.server.retention | 96h | Trace and metric retention
grafana.service.type, grafana.ingress.enabled | ClusterIP, false | How Grafana is exposed
grafana.adminPassword, grafana.admin.existingSecret | admin | Grafana credentials
alloy.cadvisor.enabled | true | Scrape per-container metrics from the kubelet cAdvisor endpoint. Disable where cluster policy forbids nodes/proxy access
tempo.tempo.metricsGenerator.enabled | false | Generate request metrics from spans. Requires tempo.tempo.metricsGenerator.remoteWriteUrl
prefectExporter.enabled | true | Deploy the Prefect metrics exporter
global.infrahubReleaseName, global.infrahubNamespace | infrahub, release namespace | Standalone only (top-level, never nested): point the release at a separately named or located Infrahub. Auto-resolved when bundled

For the complete list of values, see the chart's values.yaml.

infrahub-observability values.yaml: https://github.com/opsmill/infrahub-helm/blob/stable/charts/infrahub-observability/values.yaml

Verify the deployment

Check that the stack's pods are running:

kubectl get pods -n infrahub

You should see an Alloy pod on each node (deployed as a DaemonSet), single-instance Loki, Tempo, Prometheus, and Grafana pods, a node exporter on each node, and the Prefect exporter. The Prefect exporter only becomes healthy once it can reach the Infrahub task manager, so deploy it alongside an Infrahub release. Once Grafana is reachable, open it and confirm that the Infrahub dashboards render and that the Prometheus and Loki data sources connect successfully.
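
On a busy namespace the pod list can be long. The sketch below filters the kubectl get pods output down to pods that look unhealthy: those whose status is neither Running nor Completed, and Running pods whose READY count is short. The helper name unready_pods is hypothetical, and the filter relies on the default column order (NAME, READY, STATUS).

```shell
# unready_pods [namespace]
# Hypothetical helper: print NAME, READY, and STATUS for pods that need attention.
unready_pods() {
  namespace="${1:-infrahub}"   # assumed namespace, as used elsewhere in this guide
  kubectl get pods --namespace "$namespace" --no-headers | awk '{
    split($2, ready, "/")   # READY column, for example 0/1
    not_running = ($3 != "Running" && $3 != "Completed")
    short = ($3 == "Running" && ready[1] != ready[2])
    if (not_running || short) print $1, $2, $3
  }'
}
```

No output means every pod is either fully ready or completed. A Prefect exporter pod that stays listed here usually points to the reachability requirement described above.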

To upgrade the stack later, see Upgrade the observability stack.