GreptimeDB + Grafana: Build a Modern Observability Stack from Scratch

Set up GreptimeDB as your time-series database and connect it to Grafana for real-time dashboards. A practical tutorial covering installation, data ingestion, PromQL queries, and production dashboards.


Most teams use Prometheus for metrics, but Prometheus was designed as a monitoring tool, not a database. It struggles with long-term storage, high cardinality, and horizontal scaling. GreptimeDB is a cloud-native time-series database that speaks PromQL natively — meaning you can replace Prometheus's storage with GreptimeDB and keep your existing Grafana dashboards and alerting rules unchanged.

What is GreptimeDB?

GreptimeDB is an open-source, cloud-native time-series database written in Rust. It's designed for metrics, logs, and events at scale. Think of it as "what Prometheus storage would look like if redesigned from scratch in 2024."

GreptimeDB vs Prometheus TSDB

Prometheus TSDB:
  • Single-node only (no horizontal scaling)
  • 15-day default retention
  • High cardinality = OOM crashes
  • Local disk storage only

GreptimeDB:
  • Distributed, horizontally scalable
  • Unlimited retention (object storage)
  • Handles high cardinality well
  • S3/GCS/ADLS for long-term storage

Architecture

GreptimeDB + Grafana Stack (top layer first)

  • Grafana (Dashboards + Alerts): Visualise metrics, create dashboards, set up alert rules. Connects via the Prometheus data source.
  • GreptimeDB (Query Engine + Storage): PromQL + SQL query interface. Stores data in columnar format. Handles aggregation.
  • Data Ingestion (Prometheus remote_write / OTLP / SQL): Accepts data via Prometheus remote write, OpenTelemetry, gRPC, or direct SQL INSERT.
  • Data Sources (Your Services + Infrastructure): Application metrics, Kubernetes metrics, node metrics, custom business metrics.

Step 1: Install GreptimeDB

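# Option 1: Docker (quickest)
# The *-addr flags bind the server to all interfaces inside the container;
# with the default 127.0.0.1 binding the published ports are not reachable.
# (--rpc-bind-addr was called --rpc-addr in older releases.)
docker run -d --name greptimedb \
  -p 4000:4000 \
  -p 4001:4001 \
  -p 4002:4002 \
  -p 4003:4003 \
  greptime/greptimedb:latest standalone start \
  --http-addr 0.0.0.0:4000 \
  --rpc-bind-addr 0.0.0.0:4001 \
  --mysql-addr 0.0.0.0:4002 \
  --postgres-addr 0.0.0.0:4003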

# Ports:
# 4000 = HTTP API (SQL, PromQL, writes)
# 4001 = gRPC (high-performance ingestion)
# 4002 = MySQL protocol (connect with mysql CLI!)
# 4003 = PostgreSQL protocol (connect with psql!)

# Verify it's running: an HTTP 200 response means the server is up
# (the body is typically just an empty JSON object, {})
curl -i http://localhost:4000/health

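# Option 2: Kubernetes
# Note: this chart runs a single node. For a distributed production setup,
# the same Helm repo also provides a greptimedb-cluster chart.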
helm repo add greptime https://greptimeteam.github.io/helm-charts/
helm install greptimedb greptime/greptimedb-standalone \
  --namespace monitoring --create-namespace

# Option 3: Binary
# The install script downloads the greptime binary into the current directory
curl -fsSL https://raw.githubusercontent.com/GreptimeTeam/greptimedb/main/scripts/install.sh | sh
./greptime standalone start
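
If you script the setup, it helps to wait for the health endpoint before creating tables or sending data. The following is a minimal sketch of such a readiness poll, not an official client. The helper names (http_ok, wait_until_healthy) are made up for this example, and the probe is injectable so the loop can be exercised without a running server.

```python
import time
import urllib.request


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers connection refused, timeouts, and HTTP error statuses.
        return False


def wait_until_healthy(url, probe=http_ok, attempts=30, delay=1.0, sleep=time.sleep):
    """Poll `url` until `probe` succeeds.

    Returns the attempt number that succeeded, or 0 if every attempt failed.
    """
    for attempt in range(1, attempts + 1):
        if probe(url):
            return attempt
        sleep(delay)
    return 0
```

With the Docker port mapping from Option 1, calling wait_until_healthy("http://localhost:4000/health") should return a small positive number once the server is up, and 0 if it never answers within the allotted attempts.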

Step 2: Create Tables and Insert Data

# GreptimeDB speaks SQL! Connect via MySQL protocol:
mysql -h 127.0.0.1 -P 4002

-- Create a metrics table
CREATE TABLE IF NOT EXISTS http_requests (
    host STRING,
    method STRING,
    path STRING,
    status_code INT,
    duration_ms DOUBLE,
    ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    TIME INDEX (ts),
    PRIMARY KEY (host, method, path)
)
ENGINE = mito
WITH (
    'storage' = 'File'
);

-- Insert sample metrics
INSERT INTO http_requests (host, method, path, status_code, duration_ms, ts) VALUES
    ('api-1', 'GET', '/users', 200, 45.2, '2025-07-15 10:00:00'),
    ('api-1', 'POST', '/orders', 201, 120.5, '2025-07-15 10:00:01'),
    ('api-2', 'GET', '/users', 200, 38.7, '2025-07-15 10:00:02'),
    ('api-1', 'GET', '/users', 500, 5023.1, '2025-07-15 10:00:03');

-- Query with SQL
SELECT host, AVG(duration_ms) as avg_latency, COUNT(*) as requests
FROM http_requests
WHERE ts > '2025-07-15 00:00:00'
GROUP BY host
ORDER BY avg_latency DESC;

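-- Query with PromQL (yes, SQL and PromQL in the same database!)
-- In PromQL the table name acts as the metric name, and the __field__ matcher selects a value column.
-- start/end are Unix timestamps covering 2025-07-15 UTC, so they match the sample rows above.
-- Via HTTP:
-- curl -G 'http://localhost:4000/v1/promql' \
--   --data-urlencode 'query=avg(http_requests{__field__="duration_ms"})' \
--   -d start=1752537600 -d end=1752624000 -d step=60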
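
The PromQL-over-HTTP call is easy to get wrong by hand, because the query must be URL-encoded and the time range must be in Unix seconds. Here is a small sketch that builds the URL for the /v1/promql endpoint used above. It assumes the sample timestamps are stored as UTC and that the table is queried through GreptimeDB's __field__ matcher; the function names are illustrative only.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def epoch(ts: str) -> int:
    """Convert 'YYYY-MM-DD HH:MM:SS' (interpreted as UTC) to Unix seconds."""
    dt = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    return int(dt.timestamp())


def promql_range_url(base: str, query: str, start: str, end: str, step: int) -> str:
    """Build a range-query URL for the /v1/promql endpoint."""
    params = {"query": query, "start": epoch(start), "end": epoch(end), "step": step}
    return f"{base}/v1/promql?{urlencode(params)}"


url = promql_range_url(
    "http://localhost:4000",
    # __field__ picks one value column of a multi-field table (GreptimeDB-specific).
    'avg(http_requests{__field__="duration_ms"})',
    "2025-07-15 00:00:00",
    "2025-07-16 00:00:00",
    60,
)
print(url)
```

The printed URL can be pasted into curl or a browser. Building it in code avoids the most common mistake here, which is a time range that does not overlap the data you inserted.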

Step 3: Connect Prometheus (Remote Write)

# prometheus.yml — Send metrics to GreptimeDB
global:
  scrape_interval: 15s

remote_write:
  - url: "http://greptimedb:4000/v1/prometheus/write"
    # GreptimeDB accepts standard Prometheus remote_write protocol
    # All your existing scrape configs continue working!

scrape_configs:
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['node-exporter:9100']

  - job_name: 'kube-state-metrics'
    static_configs:
      - targets: ['kube-state-metrics:8080']

  - job_name: 'my-app'
    static_configs:
      - targets: ['my-app:8080']
    metrics_path: '/metrics'

# Prometheus scrapes as usual, but writes to GreptimeDB for long-term storage
# GreptimeDB handles retention, compression, and querying
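
Once Prometheus is remote-writing, a quick sanity check is to list the tables GreptimeDB has created, since remote-written metrics should show up as tables named after the metric. The sketch below builds a request for GreptimeDB's HTTP SQL endpoint (/v1/sql) using only the standard library. It assumes the default public database and no authentication, and the helper names are hypothetical.

```python
import json
import urllib.request
from urllib.parse import urlencode


def sql_request(base: str, sql: str, db: str = "public") -> urllib.request.Request:
    """Build (but do not send) a POST to the HTTP SQL endpoint."""
    body = urlencode({"sql": sql}).encode()
    return urllib.request.Request(
        f"{base}/v1/sql?{urlencode({'db': db})}",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def run_sql(base: str, sql: str) -> dict:
    """Send the statement and return the decoded JSON response."""
    with urllib.request.urlopen(sql_request(base, sql), timeout=5) as resp:
        return json.load(resp)


# Building the request needs no server; run_sql() is what actually sends it.
req = sql_request("http://localhost:4000", "SHOW TABLES")
print(req.get_method(), req.full_url, req.data)
```

Against a running instance, run_sql("http://localhost:4000", "SHOW TABLES") should return a JSON document listing the tables. If the names of the metrics you scrape are missing, check the remote_write URL and the Prometheus logs first.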

Step 4: Set Up Grafana

# Docker Compose: Full stack
version: '3.8'
services:
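  greptimedb:
    image: greptime/greptimedb:latest
    # Bind to 0.0.0.0 so Grafana and Prometheus can reach it over the Compose network
    command: >-
      standalone start
      --http-addr 0.0.0.0:4000
      --rpc-bind-addr 0.0.0.0:4001
      --mysql-addr 0.0.0.0:4002
    ports:
      - "4000:4000"
      - "4001:4001"
      - "4002:4002"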

  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    environment:
      GF_SECURITY_ADMIN_PASSWORD: admin
    volumes:
      - grafana-data:/var/lib/grafana

  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.retention.time=1d'  # Short retention, GreptimeDB handles long-term

volumes:
  grafana-data:

# After starting:
# 1. Open Grafana: http://localhost:3000 (admin/admin)
# 2. Add Data Source:
#    - Type: Prometheus
#    - URL: http://greptimedb:4000/v1/prometheus
#    - Access: Server
#    - Save & Test
# 3. Import a dashboard or create your own!
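
If you would rather not click through the UI, the same data source can be created through Grafana's HTTP API (POST /api/datasources). The sketch below only builds the request. It assumes the admin/admin credentials and the service names from the Compose file above, and the helper names are invented for this example. In the API, the UI's "Server" access mode is spelled "proxy".

```python
import base64
import json
import urllib.request


def datasource_payload(name: str, url: str) -> dict:
    """Payload for POST /api/datasources; 'proxy' is the API name for Server access."""
    return {"name": name, "type": "prometheus", "url": url, "access": "proxy", "isDefault": True}


def create_datasource_request(grafana: str, user: str, password: str, payload: dict):
    """Build (but do not send) the authenticated request."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"{grafana}/api/datasources",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="POST",
    )


payload = datasource_payload("GreptimeDB", "http://greptimedb:4000/v1/prometheus")
req = create_datasource_request("http://localhost:3000", "admin", "admin", payload)
# To actually create the data source: urllib.request.urlopen(req)
print(req.full_url)
print(json.dumps(payload, indent=2))
```

Note that the data source URL uses the Compose service name (greptimedb), because Grafana resolves it from inside the Compose network, while the Grafana URL uses localhost because the script runs on your machine.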

Step 5: Build a Dashboard

# Grafana dashboard panels using PromQL (via GreptimeDB):

# Panel 1: Request rate
rate(http_requests_total[5m])

# Panel 2: P99 latency (sum the buckets by le so the quantile covers all series)
histogram_quantile(0.99, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))

# Panel 3: Error rate
sum(rate(http_requests_total{status_code=~"5.."}[5m]))
  /
sum(rate(http_requests_total[5m]))

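# Panel 4: Top endpoints by average latency
# (total time divided by total requests per path, rather than an average of per-series averages)
topk(10,
  sum by (path) (rate(http_request_duration_seconds_sum[5m]))
    /
  sum by (path) (rate(http_request_duration_seconds_count[5m]))
)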

# These are standard PromQL queries.
# GreptimeDB supports most of PromQL, so they should run unchanged against either backend.
# The difference: GreptimeDB can keep months or years of data on object storage,
# while Prometheus is bounded by local disk and its retention setting.
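
To make the error-rate panel concrete, here is the arithmetic behind it with made-up counter readings. This is a simplified sketch: the real rate() function also handles counter resets and extrapolates to the window edges, which this version ignores.

```python
def simple_rate(first: float, last: float, seconds: float) -> float:
    """Per-second increase of a counter between two samples (no reset handling)."""
    return (last - first) / seconds


window = 300  # seconds, matching the [5m] range used in the panels

# Hypothetical counter readings at the start and end of the window.
total_rate = simple_rate(first=12_000, last=15_000, seconds=window)  # all requests
error_rate = simple_rate(first=40, last=70, seconds=window)          # 5xx requests only

# Same shape as Panel 3: errors per second divided by requests per second.
error_ratio = error_rate / total_rate
print(f"{total_rate:.1f} req/s, {error_rate:.1f} err/s, ratio {error_ratio:.2%}")
```

With these numbers, 3,000 requests and 30 errors in five minutes give an error ratio of 1%. Because both sides use the same window, the ratio stays meaningful even when traffic volume changes.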

Why GreptimeDB Over Alternatives?

Time-Series Database Comparison

Feature          | GreptimeDB             | Prometheus  | InfluxDB             | TimescaleDB
Language         | Rust                   | Go          | Go + Rust            | C (PostgreSQL)
Query languages  | SQL + PromQL           | PromQL only | InfluxQL + SQL       | SQL only
Horizontal scale | Yes                    | No          | Enterprise only      | Limited
Object storage   | S3, GCS, ADLS          | Local only  | Cloud only           | Local only
Grafana compat   | Native (Prometheus DS) | Native      | Native               | Native (PostgreSQL DS)
License          | Apache 2.0             | Apache 2.0  | MIT (v3 proprietary) | Apache 2.0

When to Use GreptimeDB

  • Long-term metrics storage: Prometheus for scraping, GreptimeDB for months/years of retention on S3.
  • High cardinality: Kubernetes labels, per-pod metrics, per-endpoint tracking — GreptimeDB handles it without OOM.
  • SQL on metrics: Join metrics with business data. "Show me latency for premium customers" — impossible with pure PromQL.
  • Multi-cloud observability: One GreptimeDB instance stores metrics from AWS, GCP, and on-prem — all queryable together.
  • IoT / edge: Lightweight Rust binary, low resource usage — runs on edge nodes and aggregates to cloud.
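
The "SQL on metrics" point is easiest to see with a join. The sketch below uses Python's built-in sqlite3 as a stand-in so it runs anywhere. The customers table and the customer_id column are hypothetical and are not part of the http_requests schema from Step 2. The SELECT itself is plain SQL, and a similar join should work in GreptimeDB if your metrics table carries a matching key.

```python
import sqlite3

# SQLite stands in for GreptimeDB here; only the SELECT is the point.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE http_requests (host TEXT, path TEXT, customer_id TEXT, duration_ms REAL, ts TEXT);
CREATE TABLE customers (customer_id TEXT PRIMARY KEY, plan TEXT);
INSERT INTO http_requests VALUES
    ('api-1', '/users',  'c1',  40.0, '2025-07-15 10:00:00'),
    ('api-1', '/orders', 'c1', 120.0, '2025-07-15 10:00:01'),
    ('api-2', '/users',  'c2',  30.0, '2025-07-15 10:00:02');
INSERT INTO customers VALUES ('c1', 'premium'), ('c2', 'free');
""")

# Average latency and request count per customer plan.
rows = conn.execute("""
    SELECT c.plan, AVG(r.duration_ms) AS avg_latency_ms, COUNT(*) AS requests
    FROM http_requests AS r
    JOIN customers AS c ON c.customer_id = r.customer_id
    GROUP BY c.plan
    ORDER BY c.plan
""").fetchall()

for plan, avg_latency_ms, requests in rows:
    print(plan, avg_latency_ms, requests)
```

PromQL has no way to bring in a table of customer plans, which is why this kind of question is usually answered with SQL.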

GreptimeDB is not a replacement for Prometheus — it's a complement. Prometheus scrapes. GreptimeDB stores and queries. Grafana visualises. Together, they form a modern observability stack that scales from a hobby project to a multi-region enterprise deployment.
