NEW v0.9 beta — now with streaming ingest

your data,
all in one pond.

the datalake for the home — unify smart sensors, cameras, energy meters, and everything else into a single queryable lake. no cloud. no subscriptions. just data.

Install Now Explore Features →
pond — ingest
$ pond init --path ~/pond-data
✓ created lake at /home/user/pond-data
✓ registered 3 sources: thermostats, cameras, power-meters
$ pond ingest --all
⟡ streaming 1.2M events/day from 47 devices
⟡ parquet partitions rotating every 15min
$ pond query "SELECT avg(temp) FROM thermostats WHERE ts > now() - interval '1 hour'"
→ 72.4°F
$
Features

built for the data-rich home

everything you need to collect, store, query, and visualize your home data — without sending it to someone else's cloud.

streaming ingest

real-time ingestion from MQTT, Zigbee, Z-Wave, Home Assistant, and custom APIs. zero-config auto-discovery for 200+ device types.
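the core of any ingest pipeline is flattening raw device messages into uniform event records. a minimal sketch of that step for an MQTT payload — the topic convention (home/&lt;source&gt;/&lt;device_id&gt;) and field names here are illustrative assumptions, not pond's actual schema:

```python
import json
from datetime import datetime, timezone

def normalize_mqtt(topic: str, payload: bytes) -> dict:
    """Flatten one raw MQTT message into a single event record.

    Assumes a hypothetical topic layout home/<source>/<device_id>;
    pond's real auto-discovery may map topics differently.
    """
    _, source, device_id = topic.split("/", 2)
    fields = json.loads(payload)
    return {
        "source": source,
        "device_id": device_id,
        "ts": datetime.now(timezone.utc).isoformat(),
        **fields,                     # merge sensor fields into the record
    }

event = normalize_mqtt("home/thermostats/living-room", b'{"temp": 72.4}')
```

records shaped like this are what land in the lake's columnar partitions downstream.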

🪣

columnar storage

data lands in Apache Parquet on local disk — compressed, partitioned, and ready for analytical queries. no external DB needed.
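time-partitioned Parquet with 15-minute rotation boils down to bucketing each event's timestamp into a window path. a sketch of that mapping — the directory layout (source/date/window) is a hypothetical example, not pond's documented on-disk format:

```python
from datetime import datetime, timezone

def partition_path(source: str, ts: datetime, rotate_minutes: int = 15) -> str:
    """Map an event timestamp to a time-bucketed Parquet partition path.

    Buckets minutes down to the nearest rotation window, so all events
    in the same 15-minute span land in the same file.
    """
    bucket = ts.minute - ts.minute % rotate_minutes
    return (
        f"{source}/date={ts:%Y-%m-%d}/"
        f"window={ts.hour:02d}-{bucket:02d}.parquet"
    )

ts = datetime(2024, 6, 1, 14, 37, tzinfo=timezone.utc)
print(partition_path("thermostats", ts))
# → thermostats/date=2024-06-01/window=14-30.parquet
```

because partitions are plain Parquet files, anything that reads Parquet can open them directly.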

🔍

SQL on your stuff

full SQL engine (DataFusion-powered) runs directly against your lake. joins across device types, time-range filters, aggregates — it just works.
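the shape of a cross-device join, runnable here against an in-memory sqlite3 stand-in for the lake (pond's actual engine is DataFusion, whose dialect differs slightly; table and column names below are made up for the demo):

```python
import sqlite3

# toy stand-in tables; with pond you'd run the same shape of query
# via `pond query` or lake.sql()
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE thermostats (device_id TEXT, room TEXT, temp REAL);
    CREATE TABLE energy_meters (room TEXT, watts REAL);
    INSERT INTO thermostats VALUES ('t1', 'living', 78.0), ('t2', 'office', 69.0);
    INSERT INTO energy_meters VALUES ('living', 450.0), ('office', 120.0);
""")

# cross-device join: which warm rooms are also drawing power?
rows = con.execute("""
    SELECT t.room, t.temp, e.watts
    FROM thermostats t
    JOIN energy_meters e ON e.room = t.room
    WHERE t.temp > 75
""").fetchall()
print(rows)
# → [('living', 78.0, 450.0)]
```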

🔒

fully local, fully yours

zero outbound telemetry. your data never leaves your network. runs on a Pi, an N100 mini-PC, or that old laptop collecting dust.

📊

built-in dashboards

point your browser at localhost:9900 and get live dashboards. no Grafana config hell. drag, drop, done.

🔌

plugin ecosystem

write custom sources, transforms, and sinks in Python or Rust. hot-reload them without restarting your lake.
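a custom source is conceptually just an async generator that yields event dicts. a minimal sketch of that contract, independent of pond's actual plugin hooks (the `@pond.source` decorator shown in the developer API section would wrap something like this):

```python
import asyncio
from typing import AsyncIterator

async def fake_sensor(samples: list[float]) -> AsyncIterator[dict]:
    """Yield one event dict per reading.

    `samples` stands in for real device I/O; a real source would await
    a driver or network call here instead.
    """
    for pm25 in samples:
        await asyncio.sleep(0)          # yield control, as real I/O would
        yield {"pm25": pm25}

async def collect() -> list[dict]:
    return [event async for event in fake_sensor([12.0, 35.5])]

events = asyncio.run(collect())
print(events)
# → [{'pm25': 12.0}, {'pm25': 35.5}]
```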

How It Works

three commands. that's it.

no infrastructure to provision. no schemas to define. pond figures out your data and makes it queryable.

1

initialize your lake

run pond init to create a datalake in any directory. pond auto-detects device sources on your local network and sets up streaming connections.

2

ingest everything

pond ingest --all starts collecting from every discovered source. data streams into columnar Parquet files, partitioned by time and device type. compaction runs automatically.

3

query & visualize

pond query opens an interactive SQL shell. or hit the dashboard at :9900. cross-device joins, time windows, anomaly detection — all local, all fast.
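the kind of anomaly detection this enables can be as simple as a z-score pass over a time window. a sketch in plain Python — pond's built-in detection may work differently; the threshold and data here are illustrative:

```python
from statistics import mean, stdev

def anomalies(readings: list[float], z: float = 2.0) -> list[float]:
    """Flag readings more than z sample standard deviations from the mean."""
    mu, sigma = mean(readings), stdev(readings)
    return [r for r in readings if abs(r - mu) > z * sigma]

temps = [71.8, 72.1, 72.4, 71.9, 72.0, 95.0]
print(anomalies(temps))
# → [95.0]
```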

Developer API

first-class code experience

pond isn't just a tool — it's a platform. embed it, script it, extend it.

import pond

# connect to your local lake
lake = pond.connect("~/pond-data")

# query with SQL
df = lake.sql("""
    SELECT device_id, avg(watts) AS power
    FROM energy_meters
    WHERE ts > now() - interval '1 hour'
    GROUP BY device_id
    ORDER BY power DESC
""")

# or use the dataframe API
readings = lake.table("thermostats") \
    .filter(lambda r: r.temp > 80) \
    .select(["device_id", "temp", "ts"])

# register a custom source
@pond.source("my-sensor")
async def read_sensor():
    yield {"pm25": read_air_quality(), "ts": now()}
200+
Device Integrations
0ms
Cloud Latency
~50MB
Binary Size
1.2M
Events/sec Ingest

start building your pond

one command to install. zero config to start. your home data has been waiting for this.

curl -fsSL https://get.pond.dev | sh
View on GitHub Read the Docs →