
Cloud-native, managed or self-hosted observability?

Three operating models for the same problem; the right one depends on where you sit on the build-vs-buy spectrum.

The verdict

Pick the cloud-native bundle (CloudWatch, Azure Monitor, GCP Cloud Monitoring, OCI Observability) when you live mostly inside one hyperscaler and want near-zero operational overhead; you accept vendor lock-in in exchange for time-to-value. Pick a managed platform (Datadog, Grafana Cloud, New Relic, Honeycomb, Dynatrace) when you need a unified view across multiple clouds, high-cardinality querying, or top-tier UX, and the usage-based pricing (per host, per event, or per GB) fits your budget. Pick a self-hosted OSS stack (Prometheus, the Grafana stack, ELK, OpenSearch, Jaeger, VictoriaMetrics) when telemetry volume makes managed pricing punitive or when data must stay inside your perimeter, provided the team has the SRE bandwidth to operate it. The deciding factor is your team’s capacity to operate observability infrastructure versus the dollar cost of having someone else operate it.
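The capacity-versus-cost trade-off can be put into rough numbers. The sketch below is only an illustration: the function names are hypothetical, every dollar figure is a made-up placeholder rather than a quote from any price list, and the 0.25 FTE default for a managed platform is taken from the head-to-head table.

```python
def yearly_total(external_bill: float, infra_cost: float,
                 ops_fte: float, fte_cost: float) -> float:
    """Yearly cost of one option: outside bill + own infrastructure + people time."""
    return external_bill + infra_cost + ops_fte * fte_cost


def self_hosting_break_even(infra_cost: float, fte_cost: float,
                            self_hosted_fte: float,
                            managed_fte: float = 0.25) -> float:
    """Yearly vendor bill above which self-hosting becomes the cheaper option.

    Derived by setting the two totals equal:
      bill + managed_fte * fte_cost == infra_cost + self_hosted_fte * fte_cost
    """
    return infra_cost + (self_hosted_fte - managed_fte) * fte_cost


# Placeholder inputs (assumptions, not real prices):
# 50k/year of compute and storage, 200k/year per engineer, 2 FTE to self-host.
break_even = self_hosting_break_even(
    infra_cost=50_000, fte_cost=200_000, self_hosted_fte=2.0
)
print(f"Self-hosting wins once the vendor bill exceeds {break_even:,.0f} per year")
```

With these placeholder inputs the vendor bill has to pass 400,000 per year before self-hosting pays off, which is why headcount tends to dominate the decision at small scale.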

Head to head
| Criterion | Cloud-native | Managed platform | Self-hosted OSS |
| --- | --- | --- | --- |
| Who operates it | The hyperscaler | The vendor | You |
| Time to first dashboard | Minutes: already on by default | Hours: sign-up, agent install, dashboards | Days to weeks: Helm, storage, retention policy, alerting |
| Pricing shape | Per GB ingest + per metric; bundled in cloud bill | Per host / per million events / per GB; separate vendor bill | Underlying compute + storage cost; no licence fee on the OSS distribution |
| Cost at high cardinality | Punishing: custom metric pricing scales linearly | Punishing: most vendors price per cardinality dimension | Best: VictoriaMetrics, Mimir, ClickHouse-backed stacks designed for it |
| Multi-cloud unified view | Limited: each cloud is a silo | Yes: the headline reason most teams choose this | Yes: you ship telemetry from anywhere |
| Vendor lock-in | High: telemetry tied to one cloud | Medium: OTel mitigates if you instrument that way | Low: you own the data, choose the storage |
| Data sovereignty | Stays in the cloud region you chose | Goes to the vendor; verify their data-residency posture | Stays wherever you run it |
| Ops headcount required | Zero dedicated | ~0.25 FTE: agent maintenance, dashboard hygiene | 1–3 FTE for a serious deployment: storage, scaling, upgrades, on-call |
| Best for | Single-cloud teams, simple workloads, lean ops | Multi-cloud teams, growth-stage, premium UX needs | Large scale, regulated environments, OSS-culture teams |
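The "cost at high cardinality" row is easier to judge with a quick estimate of how many time series a metric can produce. As a rough sketch, the upper bound is the product of the distinct values of each label; the label names and counts below are invented for illustration.

```python
from math import prod


def series_count(label_values: dict[str, int]) -> int:
    """Upper bound on time series for one metric.

    Each label multiplies the count by its number of distinct values;
    a metric with no labels is a single series (prod of nothing is 1).
    """
    return prod(label_values.values())


# Hypothetical HTTP request counter.
labels = {"service": 20, "endpoint": 50, "status_code": 5}
print(series_count(labels))  # 20 * 50 * 5

# Adding one per-pod label multiplies the total by the number of pods.
labels_with_pod = {**labels, "pod": 100}
print(series_count(labels_with_pod))
```

Under pricing that scales linearly with series count, that single extra label multiplies the bill for this metric by 100, which is the effect the table calls punishing.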
When to pick which

Pick cloud-native when

  • Your stack lives on one cloud and you have no plans to leave.
  • You want observability on by default with no extra contract or agent install.
  • The team is small and unwilling to dedicate any FTE to operating telemetry infrastructure.
  • Custom-metric and log-ingest volume is modest enough that the cloud’s pricing is acceptable.

Pick a managed platform when

  • You operate across multiple clouds and need a single pane of glass.
  • You need premium UX features such as session replay, Honeycomb’s BubbleUp, or Datadog’s Watchdog.
  • The business is willing to pay the per-host or per-event premium for time saved.
  • You instrument with OpenTelemetry from day one so you can swap vendors later.

Pick a self-hosted OSS stack when

  • Telemetry volume makes managed-platform pricing untenable (typically: very high cardinality, high log volume, or both).
  • Data must stay inside your perimeter for regulatory or contractual reasons.
  • You have SRE bandwidth to operate it and the team is comfortable with Helm, object storage, and storage-tier decisions.
  • You want the freedom to compose — Prometheus for metrics + Loki for logs + Tempo for traces, or whichever combination fits.
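The three checklists above can be condensed into a small decision helper. This is a sketch, not a definitive rule set: the `Team` fields, the order in which the rules are checked, and the 1.0 FTE threshold (the lower bound from the head-to-head table) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Team:
    clouds: int                          # number of cloud providers in use
    data_must_stay_in_perimeter: bool    # regulatory or contractual constraint
    volume_breaks_managed_pricing: bool  # very high cardinality and/or log volume
    sre_fte_available: float             # engineers free to operate telemetry infra
    needs_premium_ux: bool               # e.g. session replay


def recommend(team: Team) -> str:
    # Assumption: a perimeter requirement is a hard constraint,
    # so staffing has to follow the decision rather than drive it.
    if team.data_must_stay_in_perimeter:
        return "self-hosted"
    # Cost pressure only justifies self-hosting if someone can run it.
    if team.volume_breaks_managed_pricing and team.sre_fte_available >= 1.0:
        return "self-hosted"
    if team.clouds > 1 or team.needs_premium_ux:
        return "managed"
    return "cloud-native"


small_team = Team(clouds=1, data_must_stay_in_perimeter=False,
                  volume_breaks_managed_pricing=False,
                  sre_fte_available=0.0, needs_premium_ux=False)
print(recommend(small_team))  # cloud-native
```

A real decision would weigh these factors rather than short-circuit on the first match; the point is only that the checklists reduce to a handful of inputs a team can answer quickly.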
The hybrid reality

In practice, most production teams run two of the three: cloud-native for the always-on basics (CloudWatch alarms on managed services, GKE metrics) plus either a managed platform or a self-hosted stack for the application layer. The decision is rarely either-or; it is about where the application telemetry goes once you instrument it with OpenTelemetry. The cloud-native layer comes almost free with the cloud bill, so the real question is what you bolt on top.
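One way to picture "where the application telemetry goes" is that, with OpenTelemetry instrumentation, the destination becomes deployment configuration rather than application code. The sketch below builds the standard OTLP exporter environment variables for a chosen backend. The variable names come from the OpenTelemetry specification; the `BACKENDS` table, its endpoints, the header name, and the `otlp_env` helper are hypothetical.

```python
# Hypothetical destinations; real endpoints and auth headers depend on
# your vendor or your own collector deployment.
BACKENDS = {
    "managed": {
        "endpoint": "https://otlp.vendor.example:4318",
        "headers": {"api-key": "<token>"},
    },
    "self-hosted": {
        "endpoint": "http://otel-collector.internal.example:4318",
        "headers": {},
    },
}


def otlp_env(service_name: str, backend: str) -> dict[str, str]:
    """Environment variables that point an OTel SDK at the chosen backend."""
    target = BACKENDS[backend]
    env = {
        "OTEL_SERVICE_NAME": service_name,
        "OTEL_EXPORTER_OTLP_ENDPOINT": target["endpoint"],
        "OTEL_EXPORTER_OTLP_PROTOCOL": "http/protobuf",
    }
    if target["headers"]:
        # The spec's header format is a comma-separated list of key=value pairs.
        env["OTEL_EXPORTER_OTLP_HEADERS"] = ",".join(
            f"{key}={value}" for key, value in target["headers"].items()
        )
    return env


for name, value in otlp_env("checkout", "self-hosted").items():
    print(f"{name}={value}")
```

Switching the application layer from a managed platform to a self-hosted stack then means changing the `backend` argument and redeploying, while the cloud-native layer underneath stays as it is.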

Sources
  1. OpenTelemetry — vendor-neutral instrumentation — https://opentelemetry.io/docs/
  2. AWS CloudWatch pricing — https://aws.amazon.com/cloudwatch/pricing/
  3. Datadog pricing — https://www.datadoghq.com/pricing/
  4. Grafana Cloud pricing — https://grafana.com/pricing/
  5. Mimir — high-cardinality metrics — https://grafana.com/docs/mimir/latest/
  6. VictoriaMetrics — high-cardinality TSDB — https://docs.victoriametrics.com/