v1.0 — now with pipeline orchestration

A Datalake
for the Home

Collect, store, and query every data point in your life. Sensor readings, media libraries, finances, health metrics — all in one beautifully simple CLI.

$ brew install pond

Get Started →
pond — home datalake
# initialize a new datalake in your home directory
$ pond init --name my-home
Datalake created at ~/.pond/my-home

# ingest sensor data from Home Assistant
$ pond ingest --source hass --format json
Ingested 2,847 records in 0.3s

# query temperature over the last 7 days
$ pond query --from -7d "sensor.temperature > 72"
{ "avg": 73.4, "max": 78.1, "min": 68.2, "rows": 3201 }
3,201 results (0.02s)

12k+

GitHub Stars

50M+

Records Ingested Daily

0ms

Config Required

100%

Local-First

Everything you need.
Nothing you don't.

pond is a single binary with zero dependencies. Ships with built-in connectors, a query engine, and a pipeline orchestrator.

⚡

Blazing Fast Ingest

Stream millions of records per second. Automatic schema detection, compression, and partitioning — zero configuration needed.

🔌

20+ Built-in Connectors

Home Assistant, Grafana, Prometheus, Tesla API, Withings, Oura, Fitbit, Stripe, Plaid, YouTube, Plex, and more. Just point and ingest.
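As a sketch of what pulling from two of the connectors above could look like (the `--source` values match the connector list; the `--since` and `--entities` flags are illustrative, not confirmed pond options):

```shell
# Pull last night's sleep data, then a single Home Assistant entity
# (flags shown here are illustrative)
$ pond ingest --source oura --since 24h
$ pond ingest --source hass --entities sensor.grid_consumption
```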

🔐

Local-First & Encrypted

Your data never leaves your machine unless you want it to. AES-256 encryption at rest, with optional cloud sync to any S3-compatible backend.
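A cloud-sync section of the config might look like the fragment below. The keys (`sync`, `backend`, `endpoint`, `bucket`, `encryption`) are assumptions for illustration; pond's actual schema may differ:

```yaml
# ~/.pond/config.yaml — hypothetical sync section
sync:
  backend: s3
  endpoint: https://s3.us-east-1.amazonaws.com
  bucket: my-pond-backup
  encryption: aes-256   # records are encrypted before upload
```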

🧠

SQL Query Engine

Full SQL support with window functions, CTEs, and vector search. Query years of data in milliseconds. Export to CSV, JSON, or Parquet.
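A window-function query through the CLI might look like this sketch. The heredoc style and the `--export` flag are illustrative; the SQL itself uses standard window-function syntax:

```shell
# 7-day moving average of daily temperature, exported to Parquet
# (--export flag is illustrative)
$ pond query --export parquet <<'SQL'
SELECT day,
       avg(temp) OVER (ORDER BY day ROWS 6 PRECEDING) AS temp_7d_avg
FROM (
  SELECT date_trunc('day', ts) AS day, avg(value) AS temp
  FROM raw
  WHERE entity = 'sensor.temperature'
  GROUP BY 1
)
SQL
```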

🔁

Pipeline Orchestration

Define ingest → transform → alert pipelines in YAML. Schedule with cron syntax. Get Slack/Discord notifications on anomalies.

📊

Beautiful CLI Output

Auto-generated sparklines, histograms, and summary stats. Because staring at raw numbers shouldn't be the only option.

Three commands. Endless insight.

From zero to queried in under 60 seconds. No servers, no config files, no yak shaving.

1

Initialize your datalake

Run pond init and point it at your data sources. pond auto-discovers schemas and sets up a columnar store optimized for time-series analytics.

2

Ingest continuously

Use pond ingest to pull data on a schedule or stream it in real-time. Backfills historical data automatically with deduplication.
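The backfill-then-stream workflow could look like the following sketch (the `--backfill` and `--follow` flags are assumptions used for illustration):

```shell
# One-off historical backfill, then a live stream
$ pond ingest --source hass --backfill 90d   # overlapping records are deduplicated
$ pond ingest --source hass --follow         # stream new records as they arrive
```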

3

Query and build

Write SQL against your entire life's data. Build dashboards, set alerts, or pipe results into your own apps via the REST API.
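Piping results into your own apps over the REST API might look like this; the port, path, and request body shape are hypothetical, not a documented pond interface:

```shell
# POST a query to the local REST API (port and path are hypothetical)
$ curl -s 'http://localhost:8333/api/v1/query' \
    --data '{"sql": "SELECT count(*) FROM raw"}'
```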

Define powerful data pipelines in YAML

Set up automated ingestion, transformation, and alerting with a simple config file. pond handles scheduling, retries, and notifications.

  • Declarative YAML configuration
  • Built-in data transformations
  • Slack, Discord, email alerts
  • Cron-based scheduling
  • Automatic retries with backoff
# ~/.pond/pipelines/energy.yaml
source:
  type: hass
  url: http://homeassistant:8123
  entities:
    - sensor.grid_consumption
    - sensor.solar_production

transform:
  sql: >
    SELECT time_bucket('1h', ts) AS hour,
           sum(solar_production) -
           sum(grid_consumption) AS net_energy
    FROM raw GROUP BY 1

alert:
  channel: slack
  condition: net_energy < -5.0
  message: "High energy deficit!"

schedule: "0 */6 * * *"

Ready to dive in?

Join thousands of home operators who are finally making sense of their data.