v1.0 — now with pipeline orchestration

A Datalake
for the Home

Collect, store, and query every data point in your life. Sensor readings, media libraries, finances, health metrics — all in one beautifully simple CLI.

$ brew install pond

Get Started →
pond — home datalake
# initialize a new datalake in your home directory
$ pond init --name my-home
Datalake created at ~/.pond/my-home

# ingest sensor data from Home Assistant
$ pond ingest --source hass --format json
Ingested 2,847 records in 0.3s

# query temperature over the last 7 days
$ pond query --from -7d "sensor.temperature > 72"
{ "avg": 73.4, "max": 78.1, "min": 68.2, "rows": 3201 }
3,201 results (0.02s)

12k+

GitHub Stars

50M+

Records Ingested Daily

0ms

Config Required

100%

Local-First

Everything you need.
Nothing you don't.

pond is a single binary with zero dependencies. Ships with built-in connectors, a query engine, and a pipeline orchestrator.

⚡

Blazing Fast Ingest

Stream millions of records per second. Automatic schema detection, compression, and partitioning — zero configuration needed.

🔌

20+ Built-in Connectors

Home Assistant, Grafana, Prometheus, Tesla API, Withings, Oura, Fitbit, Stripe, Plaid, YouTube, Plex, and more. Just point and ingest.
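As a sketch of what pulling from two of the connectors above could look like (the `--source` values match the connector list; the `--since` and `--entities` flags are illustrative, not confirmed pond options):

```shell
# Pull last night's sleep data, then a single Home Assistant entity
# (flags shown here are illustrative)
$ pond ingest --source oura --since 24h
$ pond ingest --source hass --entities sensor.grid_consumption
```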

🔐

Local-First & Encrypted

Your data never leaves your machine unless you want it to. AES-256 encryption at rest, with optional cloud sync to any S3-compatible backend.
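A cloud-sync section of the config might look like the fragment below. The keys (`sync`, `backend`, `endpoint`, `bucket`, `encryption`) are assumptions for illustration; pond's actual schema may differ:

```yaml
# ~/.pond/config.yaml — hypothetical sync section
sync:
  backend: s3
  endpoint: https://s3.us-east-1.amazonaws.com
  bucket: my-pond-backup
  encryption: aes-256   # records are encrypted before upload
```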

🧠

SQL Query Engine

Full SQL support with window functions, CTEs, and vector search. Query years of data in milliseconds. Export to CSV, JSON, or Parquet.
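A window-function query through the CLI might look like this sketch. The heredoc style and the `--export` flag are illustrative; the SQL itself uses standard window-function syntax:

```shell
# 7-day moving average of daily temperature, exported to Parquet
# (--export flag is illustrative)
$ pond query --export parquet <<'SQL'
SELECT day,
       avg(temp) OVER (ORDER BY day ROWS 6 PRECEDING) AS temp_7d_avg
FROM (
  SELECT date_trunc('day', ts) AS day, avg(value) AS temp
  FROM raw
  WHERE entity = 'sensor.temperature'
  GROUP BY 1
)
SQL
```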

🔁

Pipeline Orchestration

Define ingest → transform → alert pipelines in YAML. Schedule with cron syntax. Get Slack/Discord notifications on anomalies.

📊

Beautiful CLI Output

Auto-generated sparklines, histograms, and summary stats. Because staring at raw numbers shouldn't be the only option.

Three commands. Endless insight.

From zero to queried in under 60 seconds. No servers, no config files, no yak shaving.

1

Initialize your datalake

Run pond init and point it at your data sources. pond auto-discovers schemas and sets up a columnar store optimized for time-series analytics.

2

Ingest continuously

Use pond ingest to pull data on a schedule or stream it in real-time. Backfills historical data automatically with deduplication.
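The backfill-then-stream workflow could look like the following sketch (the `--backfill` and `--follow` flags are assumptions used for illustration):

```shell
# One-off historical backfill, then a live stream
$ pond ingest --source hass --backfill 90d   # overlapping records are deduplicated
$ pond ingest --source hass --follow         # stream new records as they arrive
```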

3

Query and build

Write SQL against your entire life's data. Build dashboards, set alerts, or pipe results into your own apps via the REST API.
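Piping results into your own apps over the REST API might look like this; the port, path, and request body shape are hypothetical, not a documented pond interface:

```shell
# POST a query to the local REST API (port and path are hypothetical)
$ curl -s 'http://localhost:8333/api/v1/query' \
    --data '{"sql": "SELECT count(*) FROM raw"}'
```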

Define powerful data pipelines in YAML

Set up automated ingestion, transformation, and alerting with a simple config file. pond handles scheduling, retries, and notifications.

  • Declarative YAML configuration
  • Built-in data transformations
  • Slack, Discord, email alerts
  • Cron-based scheduling
  • Automatic retries with backoff
# ~/.pond/pipelines/energy.yaml
source:
  type: hass
  url: http://homeassistant:8123
  entities:
    - sensor.grid_consumption
    - sensor.solar_production

transform:
  sql: >
    SELECT time_bucket('1h', ts) AS hour,
           sum(solar_production) -
           sum(grid_consumption) AS net_energy
    FROM raw GROUP BY 1

alert:
  channel: slack
  condition: net_energy < -5.0
  message: "High energy deficit!"

schedule: "0 */6 * * *"

Ready to dive in?

Join thousands of home operators who are finally making sense of their data.