NEW v0.9 beta — now with streaming ingest

your data,
all in one pond.

the datalake for the home — unify smart sensors, cameras, energy meters, and everything else into a single queryable lake. no cloud. no subscriptions. just data.

Install Now Explore Features →
pond — ingest
$ pond init --path ~/pond-data
✓ created lake at /home/user/pond-data
✓ registered 3 sources: thermostats, cameras, power-meters
$ pond ingest --all
⟡ streaming 1.2M events/day from 47 devices
⟡ parquet partitions rotating every 15min
$ pond query "SELECT avg(temp) FROM thermostats WHERE ts > now() - interval '1 hour'"
→ 72.4°F
$
Features

built for the data-rich home

everything you need to collect, store, query, and visualize your home data — without sending it to someone else's cloud.

streaming ingest

real-time ingestion from MQTT, Zigbee, Z-Wave, Home Assistant, and custom APIs. zero-config auto-discovery for 200+ device types.
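the core of any ingest pipeline is flattening raw device messages into uniform event records. a minimal sketch of that step for an MQTT payload — the topic convention (home/&lt;source&gt;/&lt;device_id&gt;) and field names here are illustrative assumptions, not pond's actual schema:

```python
import json
from datetime import datetime, timezone

def normalize_mqtt(topic: str, payload: bytes) -> dict:
    """Flatten one raw MQTT message into a single event record.

    Assumes a hypothetical topic layout home/<source>/<device_id>;
    pond's real auto-discovery may map topics differently.
    """
    _, source, device_id = topic.split("/", 2)
    fields = json.loads(payload)
    return {
        "source": source,
        "device_id": device_id,
        "ts": datetime.now(timezone.utc).isoformat(),
        **fields,                     # merge sensor fields into the record
    }

event = normalize_mqtt("home/thermostats/living-room", b'{"temp": 72.4}')
```

records shaped like this are what land in the lake's columnar partitions downstream.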

🪣

columnar storage

data lands in Apache Parquet on local disk — compressed, partitioned, and ready for analytical queries. no external DB needed.
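time-partitioned Parquet with 15-minute rotation boils down to bucketing each event's timestamp into a window path. a sketch of that mapping — the directory layout (source/date/window) is a hypothetical example, not pond's documented on-disk format:

```python
from datetime import datetime, timezone

def partition_path(source: str, ts: datetime, rotate_minutes: int = 15) -> str:
    """Map an event timestamp to a time-bucketed Parquet partition path.

    Buckets minutes down to the nearest rotation window, so all events
    in the same 15-minute span land in the same file.
    """
    bucket = ts.minute - ts.minute % rotate_minutes
    return (
        f"{source}/date={ts:%Y-%m-%d}/"
        f"window={ts.hour:02d}-{bucket:02d}.parquet"
    )

ts = datetime(2024, 6, 1, 14, 37, tzinfo=timezone.utc)
print(partition_path("thermostats", ts))
# → thermostats/date=2024-06-01/window=14-30.parquet
```

because partitions are plain Parquet files, anything that reads Parquet can open them directly.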

🔍

SQL on your stuff

full SQL engine (DataFusion-powered) runs directly against your lake. joins across device types, time-range filters, aggregates — it just works.
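the shape of a cross-device join, runnable here against an in-memory sqlite3 stand-in for the lake (pond's actual engine is DataFusion, whose dialect differs slightly; table and column names below are made up for the demo):

```python
import sqlite3

# toy stand-in tables; with pond you'd run the same shape of query
# via `pond query` or lake.sql()
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE thermostats (device_id TEXT, room TEXT, temp REAL);
    CREATE TABLE energy_meters (room TEXT, watts REAL);
    INSERT INTO thermostats VALUES ('t1', 'living', 78.0), ('t2', 'office', 69.0);
    INSERT INTO energy_meters VALUES ('living', 450.0), ('office', 120.0);
""")

# cross-device join: which warm rooms are also drawing power?
rows = con.execute("""
    SELECT t.room, t.temp, e.watts
    FROM thermostats t
    JOIN energy_meters e ON e.room = t.room
    WHERE t.temp > 75
""").fetchall()
print(rows)
# → [('living', 78.0, 450.0)]
```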

🔒

fully local, fully yours

zero outbound telemetry. your data never leaves your network. runs on a Pi, an N100 mini-PC, or that old laptop collecting dust.

📊

built-in dashboards

point your browser at localhost:9900 and get live dashboards. no Grafana config hell. drag, drop, done.

🔌

plugin ecosystem

write custom sources, transforms, and sinks in Python or Rust. hot-reload them without restarting your lake.
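a custom source is conceptually just an async generator that yields event dicts. a minimal sketch of that contract, independent of pond's actual plugin hooks (the `@pond.source` decorator shown in the developer API section would wrap something like this):

```python
import asyncio
from typing import AsyncIterator

async def fake_sensor(samples: list[float]) -> AsyncIterator[dict]:
    """Yield one event dict per reading.

    `samples` stands in for real device I/O; a real source would await
    a driver or network call here instead.
    """
    for pm25 in samples:
        await asyncio.sleep(0)          # yield control, as real I/O would
        yield {"pm25": pm25}

async def collect() -> list[dict]:
    return [event async for event in fake_sensor([12.0, 35.5])]

events = asyncio.run(collect())
print(events)
# → [{'pm25': 12.0}, {'pm25': 35.5}]
```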

How It Works

three commands. that's it.

no infrastructure to provision. no schemas to define. pond figures out your data and makes it queryable.

1

initialize your lake

run pond init to create a datalake in any directory. pond auto-detects device sources on your local network and sets up streaming connections.

2

ingest everything

pond ingest --all starts collecting from every discovered source. data streams into columnar Parquet files, partitioned by time and device type. compaction runs automatically.

3

query & visualize

pond query opens an interactive SQL shell. or hit the dashboard at :9900. cross-device joins, time windows, anomaly detection — all local, all fast.
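the kind of anomaly detection this enables can be as simple as a z-score pass over a time window. a sketch in plain Python — pond's built-in detection may work differently; the threshold and data here are illustrative:

```python
from statistics import mean, stdev

def anomalies(readings: list[float], z: float = 2.0) -> list[float]:
    """Flag readings more than z sample standard deviations from the mean."""
    mu, sigma = mean(readings), stdev(readings)
    return [r for r in readings if abs(r - mu) > z * sigma]

temps = [71.8, 72.1, 72.4, 71.9, 72.0, 95.0]
print(anomalies(temps))
# → [95.0]
```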

Developer API

first-class code experience

pond isn't just a tool — it's a platform. embed it, script it, extend it.

import pond

# connect to your local lake
lake = pond.connect("~/pond-data")

# query with SQL
df = lake.sql("""
    SELECT device_id, avg(watts) AS power
    FROM energy_meters
    WHERE ts > now() - interval '1 hour'
    GROUP BY device_id
    ORDER BY power DESC
""")

# or use the dataframe API
readings = lake.table("thermostats") \
    .filter(lambda r: r.temp > 80) \
    .select(["device_id", "temp", "ts"])

# register a custom source
@pond.source("my-sensor")
async def read_sensor():
    yield {"pm25": read_air_quality(), "ts": now()}
200+
Device Integrations
0ms
Cloud Latency
~50MB
Binary Size
1.2M
Events/sec Ingest

start building your pond

one command to install. zero config to start. your home data has been waiting for this.

curl -fsSL https://get.pond.dev | sh
View on GitHub Read the Docs →