# pgorca

[![build](https://github.com/quantumiodb/pgorca/actions/workflows/build.yml/badge.svg?branch=main)](https://github.com/quantumiodb/pgorca/actions/workflows/build.yml)

A PostgreSQL 18 extension that plugs the **ORCA query optimizer** (originally from Greenplum / Apache Cloudberry) into a standard single-node PostgreSQL instance.

## Overview

ORCA is a cost-based, rule-driven query optimizer that operates on an intermediate representation called DXL (Data eXchange Language). It was designed for massively-parallel processing (MPP) databases but contains a powerful optimization engine that is useful in single-node mode as well.

This project extracts ORCA's four core libraries and the PostgreSQL integration layer from Apache Cloudberry, adapts them for PG18, and packages the result as a `CREATE EXTENSION`-installable plugin.

### Components

| Directory | Description |
|-----------|-------------|
| `libgpos/` | ORCA memory pool, error handling, concurrency primitives |
| `libnaucrates/` | DXL parser/serializer, metadata abstractions |
| `libgpopt/` | Core optimizer: search, transformation rules, cost model |
| `libgpdbcost/` | GPDB-specific cost model implementation |
| `gpopt/` | PostgreSQL ↔ DXL translation layer (relcache, planner bridge) |
| `compat/` | Stub headers replacing MPP-only Cloudberry types |
| `pg_orca.cpp` | Extension entry point, `planner_hook`, GUC definitions |

## Requirements

- PostgreSQL 18 (commit 8a431b6d)
- [xerces-c](https://xerces.apache.org/xerces-c/) (XML parsing for DXL)
- CMake ≥ 3.16
- C++17 compiler (clang or gcc)

On macOS with Homebrew:
```bash
brew install xerces-c cmake
```

## Build & Install

### 1. Locate pg_config

CMake needs `pg_config` from your PostgreSQL 18 installation to find headers,
libraries, and install paths. Either add the PG `bin/` directory to `PATH`:

```bash
export PATH="/path/to/pg18/bin:$PATH"
# verify
pg_config --version   # should print "PostgreSQL 18.x"
```

Or pass it explicitly on the CMake command line with `-DPG_CONFIG=...` (see below).

### 2. Configure & Build

Two generators are supported. Pick whichever suits your environment.

#### CMake (Unix Makefiles — no extra tools needed)

```bash
mkdir build && cd build

# pg_config is on PATH
cmake .. -DCMAKE_BUILD_TYPE=Debug

# pg_config NOT on PATH — pass it explicitly
cmake .. -DPG_CONFIG=/path/to/pg18/bin/pg_config -DCMAKE_BUILD_TYPE=Debug
```

```bash
cmake --build . --target install -j$(nproc)
```

To rebuild after source changes:

```bash
cmake --build build -j$(nproc)
```

#### Ninja (faster incremental builds)

```bash
mkdir build && cd build
cmake .. -DPG_CONFIG=/path/to/pg18/bin/pg_config -DCMAKE_BUILD_TYPE=Debug -GNinja
```

```bash
ninja -j$(nproc)
ninja install
```

To rebuild after source changes:

```bash
ninja -j$(nproc) -C build
```

### Release build

Replace `Debug` with `Release` in either generator. Example with Ninja:

```bash
mkdir build-release && cd build-release
cmake .. -DPG_CONFIG=/path/to/pg18/bin/pg_config -DCMAKE_BUILD_TYPE=Release -GNinja
ninja -j$(nproc)
ninja install
```

Or with plain CMake:

```bash
mkdir build-release && cd build-release
cmake .. -DPG_CONFIG=/path/to/pg18/bin/pg_config -DCMAKE_BUILD_TYPE=Release
cmake --build . --target install -j$(nproc)
```

Release mode enables `-O3 -DNDEBUG` and disables ORCA internal assertions (`GPOS_DEBUG`).

## Usage

```sql
-- 1. Install the extension in the target database.  This LOADs the
--    shared library into the current session so pg_orca.* GUCs and
--    the planner_hook are live immediately:
CREATE EXTENSION pg_orca;

-- 2. (Recommended) Have every new connection to this database
--    auto-load pg_orca.  Per-database scope; no restart, no cluster
--    GUC change, takes effect for subsequent connections:
ALTER DATABASE mydb SET session_preload_libraries = 'pg_orca';

-- 3. Enable ORCA per session (or persistently via
--    ALTER DATABASE mydb SET pg_orca.enable_orca = on):
SET pg_orca.enable_orca = on;

-- 4. Run a query — ORCA optimizes it.
EXPLAIN SELECT * FROM t WHERE id > 100;
```

Existing sessions are unaffected by step 2 until they reconnect.  If
ORCA cannot handle a query (unsupported feature or internal error) it
falls back to the standard PostgreSQL planner automatically.

Alternative scopes for the preload setting:

```sql
-- Cluster-wide (every database, every role):
ALTER SYSTEM SET session_preload_libraries = 'pg_orca';
SELECT pg_reload_conf();

-- Single role only:
ALTER ROLE bench SET session_preload_libraries = 'pg_orca';
```

`ALTER DATABASE ... SET` and `ALTER SYSTEM SET` overwrite the value;
if a sibling library was already present (`pg_stat_statements`, etc.),
include it explicitly:

```sql
ALTER DATABASE mydb SET session_preload_libraries = 'pg_orca,pg_stat_statements';
```

Roll back:

```sql
ALTER DATABASE mydb RESET session_preload_libraries;
DROP EXTENSION pg_orca;
```

## GUC Parameters

| Parameter | Default | Description |
|-----------|---------|-------------|
| `pg_orca.enable_orca` | `off` | Enable ORCA optimizer |
| `pg_orca.trace_fallback` | `off` | Log a message on fallback to standard planner |
| `optimizer_segments` | `1` | Number of segments for cost estimation |
| `optimizer_sort_factor` | `1.0` | Cost scaling factor for sort operations |
| `optimizer_metadata_caching` | `on` | Cache relation metadata between calls |
| `optimizer_mdcache_size` | `16384` | Metadata cache size (KB) |
| `optimizer_search_strategy_path` | `""` | Path to custom search strategy XML (empty = built-in) |


## Testing

See **[testing.md](testing.md)** for the full guide. Quick start:

```bash
export PG_CONFIG=/Users/jianghua/pg-install/bin/pg_config
export PG_REGRESS_SQL=/Users/jianghua/code/postgresql/src/test/regress

# pg_orca's own regression tests
test/test.sh --orca-tests

# PostgreSQL standard suite with ORCA loaded (--ignore-plans suppresses plan-shape diffs)
test/test.sh --pg-tests --ignore-plans
```

## Architecture Notes

### MPP Stubs

Cloudberry's translation layer references many MPP-only types (`Motion`, `PlanSlice`, `GpPolicy`, `DynamicSeqScan`, etc.). These are stubbed in `compat/cdb/cdb_plan_nodes.h` so the code compiles. ORCA will never generate these nodes in single-node mode.

### GPDB GUCs

Many ORCA configuration knobs were GPDB-specific GUCs. They are re-defined as real GUCs in `pg_orca.cpp` under the `optimizer.*` prefix, so existing ORCA code referencing them continues to work.