# PLAN: OFFSET Support (ORDER BY + LIMIT + OFFSET) **Status:** Complete — Core implementation done. Clippy clean. Upgrade SQL and CHANGELOG shipped in v0.2.2. **Effort:** ~4–8 hours (leverages existing TopK infrastructure) --- ## Problem Statement `OFFSET` is currently rejected in all defining-query contexts: ```sql -- ❌ Rejected today SELECT * FROM orders ORDER BY created_at DESC LIMIT 10 OFFSET 20 -- ❌ Also rejected SELECT * FROM orders ORDER BY created_at DESC OFFSET 10 ``` The original rationale (documented in the FAQ and PLAN_ORDER_BY_LIMIT_OFFSET) was that OFFSET "destroys boundary stability" — every source change can shift which rows fall inside the window. However, this argument applies to **delta-based incremental maintenance**, not to **scoped recomputation**. Since TopK already uses scoped recomputation (re-execute the full query and MERGE the result), adding OFFSET to the query has **zero additional complexity** in the refresh path. --- ## Analysis ### Why OFFSET was originally rejected The FAQ states: > TopK without OFFSET has a stable boundary: only rows that cross the N-th > position threshold change membership. OFFSET destroys this stability. This is true for delta-based IVM (Option D in PLAN_ORDER_BY_LIMIT_OFFSET — the overflow-buffer approach). But TopK was implemented as Option B — scoped recomputation — which doesn't reason about boundaries at all. It simply: 1. Re-executes `SELECT ... ORDER BY ... LIMIT N` against source tables 2. MERGEs the result into the stream table The MERGE is agnostic to what the query returns. Whether it's `LIMIT 10` or `LIMIT 10 OFFSET 20`, the flow is identical. ### Impact on future optimizations PLAN_ORDER_BY_LIMIT_OFFSET describes two future optimizations: | Optimization | Impact of OFFSET | |---|---| | **Option C — Boundary tracking** | With `LIMIT N` alone, you track one boundary (the N-th row's sort key). With `OFFSET M LIMIT N`, you need two boundaries (the M-th and M+N-th). This doubles the boundary-check logic but doesn't fundamentally break it. | | **Option D — Overflow buffer** | More complex: the overflow buffer must cover positions before the OFFSET too. Would need rethinking. Unlikely to be implemented for OFFSET. | Neither optimization is implemented or planned near-term. The scoped recomputation path handles OFFSET correctly with no additional cost. ### Three OFFSET patterns to consider | Pattern | Example | Bounded? | Useful? | |---|---|---|---| | **ORDER BY + LIMIT + OFFSET** | `ORDER BY score DESC LIMIT 10 OFFSET 20` | Yes (N rows) | Yes — "page" or "window" into ranked results | | **ORDER BY + OFFSET** (no LIMIT) | `ORDER BY score DESC OFFSET 10` | No (all rows minus M) | Marginal — large, unbounded result set | | **OFFSET without ORDER BY** | `OFFSET 10` | No | No — non-deterministic, already rejected | ### Recommendation Support **`ORDER BY + LIMIT + OFFSET`** (the "Paged TopK" pattern). Reject the other two: - **`ORDER BY + OFFSET` without LIMIT** materializes a potentially huge result set (total rows minus M). This negates the TopK benefit of bounded storage and bounded refresh cost. Reject with a clear message suggesting adding LIMIT. - **`OFFSET` without ORDER BY** remains non-deterministic and is already rejected. --- ## Use Cases | Use Case | Query Pattern | |---|---| | Paginated dashboards | `ORDER BY created_at DESC LIMIT 20 OFFSET 40` — "page 3" | | Excluding outliers | `ORDER BY score DESC LIMIT 50 OFFSET 5` — top 50, skipping top 5 | | Windowed leaderboards | `ORDER BY points DESC LIMIT 10 OFFSET 10` — "second tier" | | Cascading pages | Multiple stream tables with different OFFSETs for different pages | ### Caveat for users When source data changes, the "page" shifts — a row on page 3 may move to page 2 or 4. This is semantically correct (the stream table always reflects the current state of the page) but may surprise users who expect row stability. This should be documented clearly. --- ## Implementation Plan ### Step 1: Parser — accept ORDER BY + LIMIT + OFFSET **File:** `src/dvm/parser.rs` **Current behavior in `detect_topk_pattern()`:** ```rust if !select.limitOffset.is_null() { return Err(PgTrickleError::UnsupportedOperator( "OFFSET is not supported with LIMIT in defining queries. ..." )); } ``` **Change:** When ORDER BY + LIMIT + OFFSET are all present, extract the OFFSET value (must be a constant non-negative integer, same as LIMIT): ```rust let offset_value = if !select.limitOffset.is_null() { match unsafe { extract_const_int_from_node(select.limitOffset) } { Some(v) if v >= 0 => Some(v), Some(_) => return Err(PgTrickleError::UnsupportedOperator( "OFFSET must be a non-negative integer.".into() )), None => return Err(PgTrickleError::UnsupportedOperator( "OFFSET in defining queries must be a constant integer.".into() )), } } else { None }; ``` Extend `TopKInfo` to include the offset: ```rust pub struct TopKInfo { pub limit_value: i64, pub offset_value: Option, // NEW pub full_query: String, pub base_query: String, pub order_by_sql: String, } ``` **`reject_limit_offset()`** — currently rejects OFFSET unconditionally. Change to only reject OFFSET when ORDER BY + LIMIT are not present (the non-TopK path): ```rust // OFFSET without ORDER BY + LIMIT is not supported. // If ORDER BY + LIMIT + OFFSET → handled by detect_topk_pattern(). if !select.limitOffset.is_null() { // Only reject if this is not a TopK candidate. // TopK detection runs first and validates OFFSET there. return Err(PgTrickleError::UnsupportedOperator( "OFFSET is not supported in defining queries without ORDER BY + LIMIT. \ Use ORDER BY + LIMIT + OFFSET for paged TopK stream tables." .into(), )); } ``` Since `detect_topk_pattern()` runs before `reject_limit_offset()` in the API flow, and TopK tables bypass the DVM pipeline (which has its own reject_limit_offset in step 7), no additional changes are needed in the rejection flow. --- ### Step 2: Catalog — store offset metadata **File:** `src/catalog.rs` Add a new nullable column to `pgtrickle.pgt_stream_tables`: ```sql ALTER TABLE pgtrickle.pgt_stream_tables ADD COLUMN topk_offset INT; -- NULL = no OFFSET (same as OFFSET 0) ``` Add field to `StreamTableMeta`: ```rust pub topk_offset: Option, ``` Update `insert()`, `extract_meta()`, and all SELECT queries to include the new column. **Migration SQL** (for `pg_trickle--{current}--{next}.sql`): ```sql ALTER TABLE pgtrickle.pgt_stream_tables ADD COLUMN topk_offset INT; ``` --- ### Step 3: Refresh — include OFFSET in reconstructed query **File:** `src/refresh.rs` — `execute_topk_refresh()` Current query reconstruction: ```rust let topk_query = format!( "{} ORDER BY {} LIMIT {}", st.defining_query, topk_order_by, topk_limit ); ``` Change to: ```rust let topk_query = if let Some(offset) = st.topk_offset { format!( "{} ORDER BY {} LIMIT {} OFFSET {}", st.defining_query, topk_order_by, topk_limit, offset ) } else { format!( "{} ORDER BY {} LIMIT {}", st.defining_query, topk_order_by, topk_limit ) }; ``` No other refresh logic changes needed — the MERGE is agnostic to the source query shape. --- ### Step 4: API — pass offset through **File:** `src/api.rs` When calling `StreamTableMeta::insert()`, pass `topk_info.offset_value`: ```rust StreamTableMeta::insert( // ... existing args ... topk_info.as_ref().map(|t| t.limit_value as i32), topk_info.as_ref().map(|t| t.order_by_sql.as_str()), topk_info.as_ref().and_then(|t| t.offset_value.map(|v| v as i32)), // NEW // ... ); ``` --- ### Step 5: `strip_order_by_and_limit` → also strip OFFSET **File:** `src/dvm/parser.rs` The existing `strip_order_by_and_limit()` function removes ORDER BY and LIMIT from the query to produce the `base_query`. It must also strip OFFSET. Rename to `strip_order_by_limit_offset()` or extend the existing implementation. --- ### Step 6: Tests **E2E tests** (add to `tests/e2e_topk_tests.rs`): | Test | Scenario | |------|----------| | `test_topk_with_offset_create` | `ORDER BY score DESC LIMIT 10 OFFSET 5` — verify 10 rows stored, starting from position 6 | | `test_topk_with_offset_differential` | Insert/delete rows, verify page shifts correctly after refresh | | `test_topk_with_offset_zero` | `OFFSET 0` — equivalent to no offset | | `test_topk_offset_without_limit_rejected` | `ORDER BY x OFFSET 10` (no LIMIT) → error | | `test_topk_offset_without_order_by_rejected` | `LIMIT 10 OFFSET 5` (no ORDER BY) → error | | `test_topk_offset_non_constant_rejected` | `OFFSET (SELECT ...)` → error | | `test_topk_offset_negative_rejected` | `OFFSET -1` → error | | `test_topk_offset_with_aggregates` | `GROUP BY dept ORDER BY SUM(x) DESC LIMIT 5 OFFSET 2` | | `test_topk_offset_catalog_metadata` | Verify `topk_offset` stored correctly in catalog | | `test_topk_offset_fetch_syntax` | `FETCH NEXT 10 ROWS ONLY OFFSET 5` | **Unit tests** in `parser.rs`: - `detect_topk_pattern` with OFFSET returns correct `offset_value` - `strip_order_by_limit_offset` removes all three clauses --- ### Step 7: Documentation updates | File | Change | |------|--------| | `docs/FAQ.md` | Update "Why is OFFSET not supported with TopK?" → explain it's now supported. Add caveat about page shifting. | | `docs/SQL_REFERENCE.md` | Update create_stream_table docs to mention OFFSET support with TopK. | | `docs/DVM_OPERATORS.md` | Update line 12: `LIMIT / OFFSET` → clarify OFFSET is allowed with ORDER BY + LIMIT. | | `README.md` | Update the SQL support table row for LIMIT/OFFSET. | | `CHANGELOG.md` | Add entry for OFFSET support. | --- ## Task Checklist - [x] Parser: accept OFFSET in `detect_topk_pattern()` (Step 1) - [x] Parser: update `reject_limit_offset()` flow (Step 1) - [x] Parser: extend `strip_order_by_and_limit()` to strip OFFSET (Step 5) - [x] Parser: extend `TopKInfo` with `offset_value` field (Step 1) - [x] Catalog: add `topk_offset` column to schema DDL in `lib.rs` (Step 2) - [x] Catalog: update `StreamTableMeta` struct + CRUD (Step 2) - [x] Refresh: include OFFSET in query reconstruction (Step 3) - [x] API: pass offset through to catalog insert (Step 4) - [x] E2E tests: 8 new tests + 2 updated rejection tests (Step 6) - [x] Documentation: FAQ, SQL_REFERENCE, DVM_OPERATORS, README (Step 7) - [x] `cargo fmt -- --check` clean - [x] `just lint` (clippy) — clean as of v0.2.2 - [ ] `just test-e2e` — requires Docker E2E image build - [x] Upgrade migration SQL (`sql/pg_trickle--0.2.1--0.2.2.sql`) — topk_offset pre-provisioned in 0.2.1; 0.2.2 script handles function signature updates - [ ] Unit tests: parser tests for OFFSET handling in `parser.rs` `#[cfg(test)]` - [x] CHANGELOG entry — added in v0.2.2 --- ## What Remains (Prioritized) | # | Item | Priority | Notes | |---|------|----------|-------| | 1 | ~~**clippy validation**~~ | ~~P0~~ | ✅ Done — clean in v0.2.2 | | 2 | **E2E test run** | P0 | `just build-e2e-image && just test-e2e` to validate all 9 OFFSET tests | | 3 | ~~**Upgrade migration SQL**~~ | ~~P1~~ | ✅ Done — `topk_offset` column pre-provisioned in 0.2.1 | | 4 | ~~**CHANGELOG entry**~~ | ~~P1~~ | ✅ Done — added in v0.2.2 | | 5 | **Unit tests** | P2 | Parser-level tests for `detect_topk_pattern` with OFFSET; nice-to-have since E2E covers the full path | --- ## Complexity Assessment | Component | Effort | Risk | |---|---|---| | Parser changes | 1–2 hours | Low — extends existing TopK detection | | Catalog schema + migration | 1 hour | Low — single nullable column | | Refresh path | 30 min | Very low — one conditional format string | | API plumbing | 30 min | Low — pass-through | | Tests | 2–3 hours | Low — follows existing TopK test patterns | | Documentation | 1 hour | Low | | **Total** | **~4–8 hours** | **Low** | The low effort is because this is purely an extension of the existing TopK infrastructure. No new operators, no new refresh strategies, no query rewriting — just one more clause in the reconstructed query. --- ## What We're NOT Doing - **`ORDER BY + OFFSET` without LIMIT** — rejected because the result set is unbounded. Users should add LIMIT. - **Delta-based OFFSET maintenance** — not needed. Scoped recomputation handles it correctly. - **Boundary tracking optimization (Option C) for OFFSET** — deferred. Can be added later as a pure optimization. Would need to track two sort-key boundaries instead of one. - **IMMEDIATE mode + OFFSET** — rejected (same as TopK without OFFSET). Scoped recomputation is incompatible with within-transaction IVM. --- ## Further Reading - [PLAN_ORDER_BY_LIMIT_OFFSET.md](PLAN_ORDER_BY_LIMIT_OFFSET.md) — Original TopK plan and alternatives analysis - [FAQ: TopK section](../../docs/FAQ.md#topk-order-by--limit) — Current user documentation