Three operating models for the same problem: where you sit on the build-vs-buy spectrum.
Pick the cloud-native bundle (CloudWatch, Azure Monitor, GCP Cloud Monitoring, OCI Observability) when you live mostly inside one hyperscaler and want zero operational overhead — you trade vendor lock-in for time-to-value. Pick a managed platform (Datadog, Grafana Cloud, New Relic, Honeycomb, Dynatrace) when you need a unified view across multiple clouds, high cardinality, or top-tier UX, and the per-host pricing fits your budget. Pick a self-hosted OSS stack (Prometheus, Grafana stack, ELK, OpenSearch, Jaeger, VictoriaMetrics) when ingest volume makes per-host pricing punitive, when data must stay inside your perimeter, or when the team has the SRE bandwidth to operate it. The deciding factor is your team’s capacity to operate observability infrastructure versus the dollar cost of having someone else operate it.
| Criterion | Cloud-native | Managed platform | Self-hosted OSS |
|---|---|---|---|
| Who operates it | The hyperscaler | The vendor | You |
| Time to first dashboard | Minutes — already on by default | Hours — sign-up, agent install, dashboards | Days to weeks — Helm, storage, retention policy, alerting |
| Pricing shape | Per GB ingest + per metric — bundled in cloud bill | Per host / per million events / per GB — separate vendor bill | Underlying compute + storage cost; no licence fee on the OSS distribution |
| Cost at high cardinality | Punishing — custom metric pricing scales linearly | Punishing — most vendors price per cardinality dimension | Best — VictoriaMetrics, Mimir, ClickHouse-backed stacks designed for it |
| Multi-cloud unified view | Limited — each cloud silo | Yes — the headline reason most teams choose this | Yes — you ship telemetry from anywhere |
| Vendor lock-in | High — telemetry tied to one cloud | Medium — OTel mitigates if you instrument that way | Low — you own the data, choose the storage |
| Data sovereignty | Stays in the cloud region you chose | Goes to the vendor — verify their data-residency posture | Stays wherever you run it |
| Ops headcount required | Zero dedicated | ~0.25 FTE — agent maintenance, dashboard hygiene | 1–3 FTE for a serious deployment — storage, scaling, upgrades, on-call |
| Best for | Single-cloud teams, simple workloads, lean ops | Multi-cloud teams, growth-stage, premium UX needs | Large scale, regulated environments, OSS-culture teams |
In practice, most production teams run two of the three: cloud-native for the always-on basics (CloudWatch alarms on managed services, GKE metrics) plus a managed platform OR a self-hosted stack for the application layer. The decision is rarely either-or — it’s about where the application telemetry goes once you instrument it with OpenTelemetry. The cloud-native layer is almost free with the cloud bill; the question is what you bolt on top.