# Design & Historical Implementation Plan

This document preserves the initial phased implementation plan and design considerations for `pgproto`.

## 🏗️ Architecture (Historical)

### 1. Internal Storage
Protobuf messages are binary. We store them internally using a Postgres `varlena` (variable length) structure.

```c
typedef struct {
    int32 length;                        // varlena header (total size incl. header); set via SET_VARSIZE, read via VARSIZE
    char  data[FLEXIBLE_ARRAY_MEMBER];   // Serialized Protobuf bytes
} ProtobufData;
```

### 2. Schema Registry (Dynamic Reflection)
To understand what fields are in a binary blob, the extension needs the schema. We will use the **Schema-Registered** model.

1.  **Registry Table:** A system table (or extension-owned table) will store `FileDescriptorSet` blobs generated by `protoc`.
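2.  **Caching (Session Memory):** To avoid parsing the schema on every row access, we will cache parsed descriptors in a hash table allocated in Postgres' `TopMemoryContext`, so they live for the duration of the session. This memory is private to each backend process, so every session builds its own cache.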
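
The parse-once behaviour described above can be sketched in plain C. This is an illustration only: in the extension the table would more likely be a Postgres dynahash (`hash_create`) allocated under `TopMemoryContext`. The names `SchemaCacheEntry`, `schema_lookup`, `parse_schema`, and the fixed slot count are assumptions made for this sketch, not part of the plan.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CACHE_SLOTS 64   // illustrative fixed capacity (assumption)

// Stand-in for a parsed descriptor; the real entry would hold a protobuf-c pointer.
typedef struct {
    char name[64];       // schema name, used as the cache key
    int  in_use;
    int  parsed_id;      // placeholder for the parsed descriptor
} SchemaCacheEntry;

static SchemaCacheEntry schema_cache[CACHE_SLOTS];  // process-lifetime storage
int parse_calls = 0;                                // counts expensive parses

// Hypothetical expensive step: fetch the blob from the registry and parse it.
static int parse_schema(const char *name)
{
    (void) name;
    return ++parse_calls;
}

// FNV-1a hash of the schema name.
static uint32_t hash_name(const char *s)
{
    uint32_t h = 2166136261u;
    while (*s) {
        h ^= (unsigned char) *s++;
        h *= 16777619u;
    }
    return h;
}

// Return the cached entry, parsing only on the first lookup of a name.
// Returns NULL if the name is too long or the table is full.
const SchemaCacheEntry *schema_lookup(const char *name)
{
    size_t len = strlen(name);
    if (len >= sizeof schema_cache[0].name)
        return NULL;

    uint32_t start = hash_name(name) % CACHE_SLOTS;
    for (uint32_t i = 0; i < CACHE_SLOTS; i++) {
        SchemaCacheEntry *e = &schema_cache[(start + i) % CACHE_SLOTS];
        if (e->in_use && strcmp(e->name, name) == 0)
            return e;                         // hit: no parsing needed
        if (!e->in_use) {                     // miss: parse once and remember
            memcpy(e->name, name, len + 1);
            e->parsed_id = parse_schema(name);
            e->in_use = 1;
            return e;
        }
    }
    return NULL;
}
```

Looking up the same schema name twice returns the same entry and increments `parse_calls` only once, which is the property the cache is meant to provide per session.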

---

## 📅 Phased Implementation Plan

### Phase 0: Toolchain Setup (Docker)
Establish the development environment inside an isolated Docker container to avoid polluting the host machine.
-   **Base Environment:** A `Dockerfile` based on the official `postgres:18` image (Latest Stable).
-   **System Dependencies:** `build-essential`, `postgresql-server-dev-18`, `libprotobuf-c-dev`, `protobuf-c-compiler`.

### Phase 1: Varlena Infrastructure & Field-Tag Extraction
Establish the custom type and the C build environment.
-   **Required Files:** `pgproto.control`, `Makefile` (PGXS), `pgproto--1.0.sql`, `pgproto.c`.
-   **Internal Custom Type:** `protobuf`, stored as a varlena structure (`vl_len_` and `vl_dat`).
-   **I/O Handlers:** `protobuf_in` and `protobuf_out` using Hex encoding.
-   **Target Functions:** `pb_get_int32(protobuf, tag_number)`.
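
Tag-based extraction needs no schema, because the Protobuf wire format prefixes every field with its number and wire type. Below is a minimal sketch of the scan that a function like `pb_get_int32` could perform on the raw bytes. The helper names (`read_varint`, `skip_field`, `pb_get_int32_raw`) are hypothetical, and the Postgres glue (`PG_FUNCTION_ARGS`, detoasting) is left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Decode one base-128 varint at buf[*pos] and advance *pos.
// Returns false if the input is truncated or longer than 10 bytes.
static bool read_varint(const uint8_t *buf, size_t len, size_t *pos, uint64_t *out)
{
    uint64_t value = 0;
    for (int shift = 0; shift < 64 && *pos < len; shift += 7) {
        uint8_t byte = buf[(*pos)++];
        value |= (uint64_t) (byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) {
            *out = value;
            return true;
        }
    }
    return false;
}

// Skip the payload of a field whose key has already been consumed.
static bool skip_field(const uint8_t *buf, size_t len, size_t *pos, unsigned wire_type)
{
    uint64_t n;
    switch (wire_type) {
    case 0:  // varint
        return read_varint(buf, len, pos, &n);
    case 1:  // fixed 64-bit
        if (len - *pos < 8) return false;
        *pos += 8;
        return true;
    case 2:  // length-delimited
        if (!read_varint(buf, len, pos, &n) || n > len - *pos) return false;
        *pos += (size_t) n;
        return true;
    case 5:  // fixed 32-bit
        if (len - *pos < 4) return false;
        *pos += 4;
        return true;
    default: // groups (wire types 3 and 4) are not handled in this sketch
        return false;
    }
}

// Find a varint field by number. If it occurs more than once, the last
// occurrence wins, as is usual for scalar fields. Returns false if the
// field is absent or the message is malformed.
bool pb_get_int32_raw(const uint8_t *buf, size_t len, uint32_t field_number, int32_t *result)
{
    size_t pos = 0;
    bool found = false;
    while (pos < len) {
        uint64_t key, value;
        if (!read_varint(buf, len, &pos, &key)) return false;
        unsigned wire_type = (unsigned) (key & 7);
        if ((key >> 3) == field_number && wire_type == 0) {
            if (!read_varint(buf, len, &pos, &value)) return false;
            *result = (int32_t) (uint32_t) value;  // int32 is the low 32 bits
            found = true;
        } else if (!skip_field(buf, len, &pos, wire_type)) {
            return false;
        }
    }
    return found;
}
```

For the three-byte message `08 96 01` (field 1, varint), calling `pb_get_int32_raw` with field number 1 yields 150.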

### Phase 2: Schema Registry & Dynamic Reflection
Transition from hardcoded tag numbers to named query paths.
-   **Schema Table:** `pb_schemas` storing `FileDescriptorSet` binary blobs.
-   **Caching Architecture:** Cache parsed descriptors in a session-wide hash table (allocated in `TopMemoryContext`) to avoid re-parsing on every row fetch.
-   **Target Functions:** `pb_get_string(protobuf, 'schema_name.MessageName', 'field.subfield')`.
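
Named paths ultimately have to be translated back into field numbers before the wire bytes can be scanned. The sketch below shows that translation against a tiny hand-written descriptor table. `FieldDesc`, `MsgDesc`, `pb_resolve_path`, and the `Customer`/`Address` example schema are all hypothetical stand-ins for whatever the parsed `FileDescriptorSet` would provide.

```c
#include <stddef.h>
#include <string.h>

// Hypothetical stand-ins for descriptors parsed from a FileDescriptorSet.
typedef struct MsgDesc MsgDesc;

typedef struct {
    const char    *name;     // field name as written in the .proto file
    int            number;   // field number used on the wire
    const MsgDesc *message;  // non-NULL when the field is a nested message
} FieldDesc;

struct MsgDesc {
    const char      *name;
    const FieldDesc *fields;
    size_t           n_fields;
};

// Resolve a dotted path such as "address.city" into field numbers.
// Returns the number of path segments, or -1 if the path is invalid.
int pb_resolve_path(const MsgDesc *msg, const char *path, int *numbers, int max_depth)
{
    int depth = 0;
    while (*path != '\0') {
        size_t seg_len = strcspn(path, ".");
        const FieldDesc *match = NULL;

        // Fail on empty segments, paths that are too deep, or a step into a scalar.
        if (msg == NULL || depth == max_depth || seg_len == 0)
            return -1;

        for (size_t i = 0; i < msg->n_fields; i++) {
            const FieldDesc *f = &msg->fields[i];
            if (strlen(f->name) == seg_len && strncmp(f->name, path, seg_len) == 0) {
                match = f;
                break;
            }
        }
        if (match == NULL)
            return -1;

        numbers[depth++] = match->number;
        msg = match->message;       // NULL for scalars: further segments will fail
        path += seg_len;
        if (*path == '.') {
            path++;
            if (*path == '\0')
                return -1;          // trailing dot
        }
    }
    return depth > 0 ? depth : -1;
}

// Example schema: message Customer { int32 id = 1; Address address = 3; }
//                 message Address  { string street = 1; string city = 2; }
static const FieldDesc address_fields[]  = { {"street", 1, NULL}, {"city", 2, NULL} };
static const MsgDesc   address_desc      = { "Address", address_fields, 2 };
static const FieldDesc customer_fields[] = { {"id", 1, NULL}, {"address", 3, &address_desc} };
static const MsgDesc   customer_desc     = { "Customer", customer_fields, 2 };
```

With this table, resolving `"address.city"` against `customer_desc` produces the field numbers 3 and 2, while `"id.city"` is rejected because `id` is a scalar.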

### Phase 3: Optimizations & Lazy Parsing
Improve performance of reading large protobuf messages.
-   **Core Logic:** Instead of fully deserializing each message, skip over the bytes of unrelated fields. Use `protobuf-c` pointer skipping or raw wire-format tag jumps.
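
One way to picture the raw wire-format approach: walk the top-level fields, jump over everything that is not on the requested path, and narrow a borrowed view into the original buffer at each nesting level, so nothing is copied or allocated. The sketch below assumes hypothetical names (`PbSpan`, `pb_find_bytes`, `pb_descend`). It takes the last occurrence of a field and does not merge submessages that are split across several occurrences.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Borrowed view into the serialized message; never owns or copies memory.
typedef struct {
    const uint8_t *ptr;
    size_t         len;
} PbSpan;

// Read one varint from the front of the span and shrink the span.
static bool pb_varint(PbSpan *s, uint64_t *out)
{
    uint64_t v = 0;
    for (int shift = 0; shift < 64 && s->len > 0; shift += 7) {
        uint8_t b = *s->ptr++;
        s->len--;
        v |= (uint64_t) (b & 0x7F) << shift;
        if (!(b & 0x80)) {
            *out = v;
            return true;
        }
    }
    return false;
}

// Jump over n bytes without looking at them.
static bool pb_advance(PbSpan *s, uint64_t n)
{
    if (n > s->len) return false;
    s->ptr += n;
    s->len -= (size_t) n;
    return true;
}

// Locate a length-delimited field (string, bytes, or submessage).
// On success *out points at the payload inside the original buffer.
bool pb_find_bytes(PbSpan in, uint32_t field_number, PbSpan *out)
{
    bool found = false;
    while (in.len > 0) {
        uint64_t key, n;
        if (!pb_varint(&in, &key)) return false;
        switch (key & 7) {
        case 0:  // varint: decode only to learn where it ends
            if (!pb_varint(&in, &n)) return false;
            break;
        case 1:  // fixed 64-bit
            if (!pb_advance(&in, 8)) return false;
            break;
        case 5:  // fixed 32-bit
            if (!pb_advance(&in, 4)) return false;
            break;
        case 2:  // length-delimited: record the span if it matches, then jump
            if (!pb_varint(&in, &n) || n > in.len) return false;
            if ((key >> 3) == field_number) {
                out->ptr = in.ptr;
                out->len = (size_t) n;
                found = true;
            }
            pb_advance(&in, n);
            break;
        default: // groups are not handled in this sketch
            return false;
        }
    }
    return found;
}

// Follow a path of field numbers, narrowing the view one level at a time.
bool pb_descend(PbSpan msg, const uint32_t *path, size_t depth, PbSpan *out)
{
    for (size_t i = 0; i < depth; i++) {
        if (!pb_find_bytes(msg, path[i], &msg))
            return false;
    }
    *out = msg;
    return true;
}
```

The cost of a lookup is proportional to the number of fields walked at each level rather than to the size of the fully materialized message tree, which is the saving this phase is after.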

### Phase 4: Query Polish (TOAST, Operators)
Improve developer ergonomics.
-   **TOAST Support:** Mark storage as `extended` so Postgres can automatically compress large protobuf messages and move them out-of-line.
-   **Operators:** Shorthand syntaxes like `protobuf -> 'field'` and `protobuf #> '{path,to_field}'`.

### Phase 5: Purge JSONB (Strict Native Purity)
The final objective is to eliminate any reliance on JSONB.
-   **Removals:** Strip any `pb_to_jsonb` utilities or internal `jsonb` conversion pathways used as bridges.
-   **Custom Indexing:** Implement direct indexing using custom C operator classes rather than relying on JSONB indices.

---

## 💻 API Draft (Initial)

### Custom Types
-   `protobuf`: The custom type for storing serialized bytes.

### Functions
-   `pb_to_jsonb(protobuf, text schema_name)` returns `jsonb`
-   `pb_get_string(protobuf, text schema_name, text path)` returns `text`
-   `pb_get_int(protobuf, text schema_name, text path)` returns `int4`

### Operators
-   `protobuf -> path` (Shorthand for extraction).