# v0.15.0 — External Benchmarks, Bulk API, and dbt Hub Preparation

> **Full technical details:** [v0.15.0.md-full.md](v0.15.0.md-full.md)

**Status: ✅ Released** | **Scope: Medium** (~4 weeks)

> Validation against the Nexmark streaming benchmark, a bulk API for
> creating many stream tables at once, parser architecture improvements,
> watermark hold-back for late-arriving data, and preparation for
> listing on the dbt Hub.

---

## What problem does this solve?

TPC-H validates analytical query correctness, but streaming workloads (event streams, IoT data, activity feeds) have different patterns. Nexmark is the industry-standard streaming benchmark. Operators managing dozens of stream tables needed a way to create them in bulk. And dbt users wanted to find and install the pg_trickle dbt package from the standard dbt Hub marketplace.

---

## Nexmark Streaming Benchmark

**Nexmark** is a benchmark suite designed specifically for streaming systems, with queries modelling auction activity: bids, auctions, and persons. It tests patterns like:

- Event-time windowing (aggregate over a sliding time window)
- Join with late-arriving events (an auction result arriving after the bid)
- Top-N per category (highest bids per auction)

pg_trickle's differential engine is now validated against the Nexmark query set, demonstrating that it handles streaming event patterns — not just analytical batch queries.

---

## Bulk Create API

`pgtrickle.create_stream_tables_from_json(definitions)` accepts a JSON array of stream table definitions and creates all of them in a single call.
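As a sketch, a bulk call might look like the following. The JSON field names here (`name`, `query`) are illustrative assumptions, not the documented schema:

```sql
-- Hypothetical sketch: create two stream tables in one call.
-- The JSON field names ("name", "query") are assumptions;
-- consult the extension's documentation for the actual schema.
SELECT pgtrickle.create_stream_tables_from_json('[
  {"name": "bids_per_auction",
   "query": "SELECT auction_id, count(*) AS n FROM bids GROUP BY auction_id"},
  {"name": "max_bid_per_auction",
   "query": "SELECT auction_id, max(price) AS top FROM bids GROUP BY auction_id"}
]');
```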
This is useful for:

- Infrastructure-as-code deployments
- dbt post-run hooks that create many stream tables from model definitions
- Migration scripts that need to set up a complete stream table configuration

---

## Parser Modularisation

The internal query parser and differential SQL generator — which analyses a SQL query and produces the incremental update logic — was split into four focused modules:

- `types.rs` — the abstract syntax representation
- `validation.rs` — checks whether a query is supported
- `rewrites.rs` — SQL transformation passes
- `sublinks.rs` — subquery extraction logic

This makes the parser easier to extend and reduces the risk that a change to one aspect of the parser accidentally affects another.

---

## Watermark Hold-Back for Late-Arriving Data

In streaming workloads, events sometimes arrive late — a sensor reading from 2 minutes ago arrives now. If the stream table for that time window has already been refreshed and "closed", the late event would be missed.

**Watermark hold-back** allows you to configure a delay on a stream table's watermark, keeping the window open for late events. For example, a 5-minute hold-back means the stream table will not close a time window until 5 minutes after the window's end time, accommodating events up to 5 minutes late.

---

## Delta Cost Estimation

A new cost estimator predicts how expensive a differential refresh will be *before* running it, by examining the change buffer size and the complexity of the defining query. AUTO mode uses this estimate to pre-emptively choose FULL refresh when the differential is predicted to be slower, rather than waiting to observe the actual performance.

---

## dbt Hub Preparation

The dbt-pgtrickle package was prepared for submission to the **dbt Hub** — the official package registry for dbt. This includes package metadata, documentation, and integration tests that run as part of the dbt Hub certification process.
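Once listed, installation would follow the standard dbt Hub flow. A sketch of a consumer's `packages.yml`, with an assumed package slug and version range:

```yaml
# packages.yml — the slug and version range below are assumptions;
# use the coordinates shown on the dbt Hub listing once it is published.
packages:
  - package: pg_trickle/dbt_pgtrickle
    version: [">=0.15.0", "<0.16.0"]
```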
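The watermark hold-back described above would typically be set per stream table. The call below is a hypothetical sketch: both the function and parameter names (`alter_stream_table`, `watermark_delay`) are assumptions, not documented API:

```sql
-- Hypothetical: keep each time window open for 5 extra minutes so
-- events arriving up to 5 minutes late still land in their window.
SELECT pgtrickle.alter_stream_table(
  'bids_per_auction',
  watermark_delay => interval '5 minutes'
);
```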
---

## ORM Integration Guides

Documentation guides for using pg_trickle with common ORMs:

- **SQLAlchemy** (Python)
- **ActiveRecord** (Ruby / Rails)
- **Diesel** (Rust)
- **Prisma** (Node.js / TypeScript)

Each guide shows how to query stream tables from the ORM and how to trigger refreshes from application code.

---

## Scope

v0.15.0 broadens the validation coverage to streaming workloads (Nexmark), improves the ergonomics of bulk deployments, and prepares the dbt integration for the wider dbt ecosystem. The watermark hold-back feature addresses a fundamental challenge in streaming analytics: late-arriving data.