# Wiki JOB Report Structure Documentation

This document describes the standardized structure for Join Order Benchmark (JOB) test reports published to the pg_track_optimizer wiki. The jo-bench.yml workflow must generate reports consistent with this document.

## Report File Naming Convention

**Pattern**: `report-YYYY-MM-DD-HHMMSS.md`

**Example**: `report-2026-01-05-084656.md`

The timestamp corresponds to the test execution date/time in UTC.

## Artifact Directory Structure

All test artifacts and generated scripts are organized in subdirectories within the wiki repository:

```
wiki/
└── job-pass/
    └── YYYY-MM-DD-HHMMSS/
        ├── pg_track_optimizer_jobench_results_pass1.zip
        ├── pg_track_optimizer_jobench_results_pass2.zip
        ├── postgresql_log.zip
        └── schema.sql
```

**Example**:

```
wiki/job-pass/2026-01-05-084656/
```

### Artifact Files

1. **`pg_track_optimizer_jobench_results_pass1.zip`**
   - Dump of the extension's statistical data for later analysis; copied into the wiki repository immediately after the JOB pass finishes.
   - Contains: CSV file with Pass 1 results (without extra indexes)
   - Generated from: `/tmp/pg_track_optimizer_results_pass1.csv`

2. **`pg_track_optimizer_jobench_results_pass2.zip`**
   - Dump of the extension's statistical data for later analysis; copied into the wiki repository immediately after the JOB pass finishes.
   - Contains: CSV file with Pass 2 results (with extra indexes)
   - Generated from: `/tmp/pg_track_optimizer_results_pass2.csv`

3. **`postgresql_log.zip`**
   - Saves the EXPLAINs logged by the extension so they can be matched to execution results; copied into the wiki repository immediately after the JOB pass finishes.
   - Contains: PostgreSQL server log with EXPLAIN ANALYZE dumps
   - Source: `~/pgdata/logfile.log`
   - Each query execution includes a queryId for matching to tracker records (the extension's print format provides this itself; no extra actions are needed in the workflow)

4. **`schema.sql`**
   - Auto-generated CREATE TABLE statement
   - Includes an example COPY command for restoring data from a state dump.
   - Schema matches the current `pg_track_optimizer` view definition
   - Used for importing CSV artifacts into an analysis database

## Report Markdown Structure

### 1. Title

```markdown
# Join Order Benchmark Test Results
```

**Note**: Use "Join Order Benchmark" (full name), not "JO-Bench".

### 2. Test Environment Section

```markdown
## Test Environment

- **Test Date**: YYYY-MM-DD HH:MM:SS UTC
- **PostgreSQL Version**: [full version string from SELECT version()]
- **PostgreSQL Commit**: [``](https://github.com/postgres/postgres/commit/)
- **Extension Commit**: [``](https://github.com/danolivo/pg_track_optimizer/commit/)
- **Test Configuration**: Two-pass testing (Pass 1: no extra indexes, Pass 2: with [extra indexes](extra_indexing))
- **Benchmark Source**: [](https://github.com/danolivo/jo-bench/commit/)
```

**Fields**:

- **Test Date**: UTC timestamp of test execution
- **PostgreSQL Version**: Full output from `SELECT version()`
- **PostgreSQL Commit**: Link to postgres/postgres commit (7-char short hash shown, full hash in URL)
- **Extension Commit**: Link to pg_track_optimizer commit
- **Test Configuration**: Standard two-pass description with link to `extra_indexing` wiki page
- **Benchmark Source**: Link to jo-bench repository commit used for test data/queries
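As an illustration, the Test Environment block can be emitted with a shell heredoc. This is a minimal sketch, assuming the commit hashes have already been captured into `PG_COMMIT` and `EXT_COMMIT` (illustrative names; only `BENCH_COMMIT` is defined later in this document, under section E) and `TIMESTAMP` is generated as described under section A:

```bash
# Illustrative sketch only: PG_COMMIT and EXT_COMMIT are assumed to be
# captured elsewhere in the workflow; they are not a documented contract.
TEST_DATE=$(date -u '+%Y-%m-%d %H:%M:%S')
PG_VERSION=$(psql -At -c 'SELECT version();')
cat >> "report-${TIMESTAMP}.md" <<EOF
## Test Environment

- **Test Date**: ${TEST_DATE} UTC
- **PostgreSQL Version**: ${PG_VERSION}
- **PostgreSQL Commit**: [\`${PG_COMMIT:0:7}\`](https://github.com/postgres/postgres/commit/${PG_COMMIT})
- **Extension Commit**: [\`${EXT_COMMIT:0:7}\`](https://github.com/danolivo/pg_track_optimizer/commit/${EXT_COMMIT})
- **Test Configuration**: Two-pass testing (Pass 1: no extra indexes, Pass 2: with [extra indexes](extra_indexing))
- **Benchmark Source**: [\`${BENCH_COMMIT:0:7}\`](https://github.com/danolivo/jo-bench/commit/${BENCH_COMMIT})
EOF
```

The `TEST_DATE` variable here is the same one referenced in sections H and I below.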
### 3. Results Section - Pass 1

```markdown
## Top Queries by Error Metrics (Pass 1 - basic Join-Order-Benchmark)

This table shows queries that appear in the top 10 for **all** error metrics (avg_avg, rms_avg, twa_avg, and wca_avg). These are the queries with consistently poor estimation across all criteria.

**Note**: These results are from **Pass 1** (without extra indexes).
```

**Example**:

```
       queryid        |     query     | error | nj_min | nj_max
----------------------+---------------+-------+--------+--------
  8437282107574130241 | /* 30c.sql */ |  4.35 |      3 |      3
 -8317260399803614106 | /* 28c.sql */ |  4.07 |      4 |      4
 -1799126654562256384 | /* 29c.sql */ |  3.74 |      2 |      5
  3901602676749870732 | /* 7c.sql */  |  3.63 |      3 |      3
(4 rows)
```

### 4. Results Section - Pass 2

```markdown
## Top Queries by Error Metrics (Pass 2 - with extra indexes)

This table shows queries that appear in the top 10 for **all** error metrics (avg_avg, rms_avg, twa_avg, and wca_avg). These are the queries with consistently poor estimation across all criteria.

**Note**: These results are from **Pass 2** (with extra indexes). Pass 1 results (without extra indexes) are available as a separate artifact for comparison.
```

**Example**:

```
       queryid        |     query     | error | nj_min | nj_max
----------------------+---------------+-------+--------+--------
  8437282107574130241 | /* 30c.sql */ |  4.35 |      3 |      3
 -8317260399803614106 | /* 28c.sql */ |  4.07 |      4 |      4
 -1799126654562256384 | /* 29c.sql */ |  3.74 |      2 |      5
  3901602676749870732 | /* 7c.sql */  |  3.63 |      3 |      3
(4 rows)
```

**Table Format**:

- **5 columns**: queryid, query, error, nj_min, nj_max
- **Query column**: Shows the first 32 characters of the SQL (e.g., `/* 30c.sql */`).
- **Error column**: Single aggregated error metric (the mean of the simple-average RStats error column, exposed as `avg_avg`).
- **nj_min/nj_max columns**: Minimum and maximum number of JOIN nodes across executions. If these differ, it indicates plan variability for the same queryId.
- Use PostgreSQL psql output format (not a markdown table)
- Include the row count footer: `(N rows)`

**SQL Query** to generate this table (approximate):

```sql
WITH top_avg AS (SELECT queryid FROM pg_track_optimizer ORDER BY avg_avg DESC LIMIT 10),
     top_rms AS (SELECT queryid FROM pg_track_optimizer ORDER BY rms_avg DESC LIMIT 10),
     top_twa AS (SELECT queryid FROM pg_track_optimizer ORDER BY twa_avg DESC LIMIT 10),
     top_wca AS (SELECT queryid FROM pg_track_optimizer ORDER BY wca_avg DESC LIMIT 10),
     intersection AS (
       SELECT queryid FROM top_avg
       INTERSECT
       SELECT queryid FROM top_rms
       INTERSECT
       SELECT queryid FROM top_twa
       INTERSECT
       SELECT queryid FROM top_wca
     )
SELECT v.queryid, LEFT(v.query, 32), ROUND(v.avg_avg::numeric, 2) AS error,
       v.nj_min::int, v.nj_max::int
FROM pg_track_optimizer v
WHERE v.queryid IN (SELECT queryid FROM intersection)
ORDER BY error DESC;
```

**Note**: The `ROUND()` function requires its first argument to be `numeric`, so always cast this argument.
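The report embeds this table as raw psql output. A minimal sketch of how the workflow might capture it, assuming the benchmark database is named `imdb` and the query above is saved as `top_queries.sql` (both names are illustrative, not mandated by this document):

```bash
# Illustrative only: database name and file paths are assumptions.
# psql's default aligned output already includes the "(N rows)" footer.
psql -d imdb -f top_queries.sql > /tmp/top_queries_pass1.txt
```

The captured text is then pasted into the report inside a plain triple-backtick block.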
### 5. Column Description Section

```markdown
## Column Description

The results above include the following metrics for each query:

- **queryid**: Internal PostgreSQL query identifier
- **query**: The SQL query text (normalised, with literals replaced by placeholders; truncated to the first 32 characters)
- **error**: Simple average error across plan nodes
- **nj_min**: Minimum number of JOIN nodes across executions
- **nj_max**: Maximum number of JOIN nodes across executions (a value different from nj_min indicates plan variability)

Only queries appearing in the top 10 of **every** error metric are shown, representing the most consistently problematic queries.
```

**Note**: Simplified from the full pg_track_optimizer view schema. It documents only the 5 columns shown in the results table.

### 6. Workflow Artifacts Section

````markdown
## How to Use Workflow Artifacts

The workflow performs **two passes** and produces two CSV artifacts for comparison:

- **Pass 1** (`pg_track_optimizer_jobench_results_pass1`): [results](job-pass/YYYY-MM-DD-HHMMSS/pg_track_optimizer_jobench_results_pass1.zip) **without** extra indexes.
- **Pass 2** (`pg_track_optimizer_jobench_results_pass2`): [results](job-pass/YYYY-MM-DD-HHMMSS/pg_track_optimizer_jobench_results_pass2.zip) **with** extra indexes from the [Extra-Indexing](extra_indexing) wiki page.

Also, explore the [log file](job-pass/YYYY-MM-DD-HHMMSS/postgresql_log.zip), which contains EXPLAIN ANALYZE dumps for each executed query. You can match them to tracker records by the queryId attached to each EXPLAIN.

Artifacts can also be downloaded from the workflow run and imported into your own PostgreSQL database for comparative analysis.

### 1. Create the tracking table

The schema is auto-generated to match the current `pg_track_optimizer` view definition. Download the schema file: [schema](job-pass/YYYY-MM-DD-HHMMSS/schema.sql)

### 2. Import the CSV artifact

```
\copy job_tracking_data FROM 'pg_track_optimizer_results.csv' CSV HEADER;
```

### 3. Example analysis queries

Find queries with the highest average error:

```sql
SELECT queryid, LEFT(query, 80) AS query_preview, avg_avg, rms_avg, twa_avg, wca_avg
FROM job_tracking_data
ORDER BY avg_avg DESC
LIMIT 10;
```

Find queries appearing in the top 10 of all error metrics (most consistently problematic):

```sql
WITH top_avg AS (SELECT queryid FROM job_tracking_data ORDER BY avg_avg DESC LIMIT 10),
     top_rms AS (SELECT queryid FROM job_tracking_data ORDER BY rms_avg DESC LIMIT 10),
     top_twa AS (SELECT queryid FROM job_tracking_data ORDER BY twa_avg DESC LIMIT 10),
     top_wca AS (SELECT queryid FROM job_tracking_data ORDER BY wca_avg DESC LIMIT 10),
     intersection AS (
       SELECT queryid FROM top_avg
       INTERSECT
       SELECT queryid FROM top_rms
       INTERSECT
       SELECT queryid FROM top_twa
       INTERSECT
       SELECT queryid FROM top_wca
     )
SELECT t.*
FROM job_tracking_data t
WHERE t.queryid IN (SELECT queryid FROM intersection)
ORDER BY t.avg_avg DESC;
```
````

**Artifact Links Format**:

- Use relative paths from the wiki root
- Pattern: `job-pass/YYYY-MM-DD-HHMMSS/filename.ext`
- All links must be enclosed in markdown link format: `[text](path)`

### 7. Footer

```markdown
---
_Automatically generated by [JO-Bench workflow](https://github.com/danolivo/pg_track_optimizer/actions/workflows/jo-bench.yml)_
```

## Workflow Implementation Requirements

The `jo-bench.yml` workflow and `generate-wiki-page.sh` script must:

### A. Generate Timestamp Variable

```bash
TIMESTAMP=$(date -u '+%Y-%m-%d-%H%M%S')  # e.g., 2026-01-05-084656
```

### B. Create Artifact Directory

```bash
ARTIFACT_DIR="${WIKI_DIR}/job-pass/${TIMESTAMP}"
mkdir -p "${ARTIFACT_DIR}"
```

### C. Copy and Compress Artifacts

```bash
# Compress CSV results
zip "${ARTIFACT_DIR}/pg_track_optimizer_jobench_results_pass1.zip" \
    /tmp/pg_track_optimizer_results_pass1.csv
zip "${ARTIFACT_DIR}/pg_track_optimizer_jobench_results_pass2.zip" \
    /tmp/pg_track_optimizer_results_pass2.csv

# Compress PostgreSQL log
zip "${ARTIFACT_DIR}/postgresql_log.zip" ~/pgdata/logfile.log

# Copy schema SQL
cp "${SCHEMA_FILE}" "${ARTIFACT_DIR}/schema.sql"
```

### D. Generate Report with Correct Links

All artifact links in the markdown must use the format:

```markdown
[text](job-pass/YYYY-MM-DD-HHMMSS/filename.ext)
```

### E. Extract Benchmark Commit

```bash
# Get the jo-bench repository commit
cd ~/jo-bench
BENCH_COMMIT=$(git rev-parse HEAD)
```

Include in the report metadata:

```markdown
- **Benchmark Source**: [`${BENCH_COMMIT:0:7}`](https://github.com/danolivo/jo-bench/commit/${BENCH_COMMIT})
```

### F. Simplified Results Query

The workflow should execute a simplified query that produces the 5 columns only:

- `queryid`
- `query` (truncated/formatted with the query filename comment)
- `error` (simple average error)
- `nj_min` and `nj_max` (JOIN-node counts across executions)

Format as psql output (not a markdown table) and wrap in triple backticks.

### G. Wiki Page Naming

Create the page with the name pattern:

```bash
WIKI_PAGE="report-${TIMESTAMP}.md"
```

### H. Update Home Page Link

Add a link to Home.md in the JO-Bench results section:

```bash
sed -i "/## JO-Bench Performance Test Results/a\\- [${TEST_DATE}](report-${TIMESTAMP})" Home.md
```

### I. Git Operations

```bash
cd "${WIKI_DIR}"
git add "report-${TIMESTAMP}.md" \
        "job-pass/${TIMESTAMP}/" \
        "Home.md"
git commit -m "Add JO-Bench results for ${TEST_DATE}"
git push origin master
```

## Validation Checklist

Before publishing a report, verify:

- [ ] Report filename matches the `report-YYYY-MM-DD-HHMMSS.md` pattern
- [ ] Artifact directory exists: `job-pass/YYYY-MM-DD-HHMMSS/`
- [ ] All 4 artifact files are present and properly compressed/formatted
- [ ] Report title is "Join Order Benchmark Test Results" (not "JO-Bench")
- [ ] Test Environment has all 6 metadata fields
- [ ] PostgreSQL Version shows the full version string
- [ ] All commit links display the 7-char short hash, with the full hash in the URL
- [ ] Both Pass 1 and Pass 2 result sections are present
- [ ] Results table uses PostgreSQL psql format (not a markdown table)
- [ ] Results table has exactly 5 columns (queryid, query, error, nj_min, nj_max)
- [ ] Column Description documents only the 5 shown columns
- [ ] Artifact links use relative paths: `job-pass/YYYY-MM-DD-HHMMSS/...`
- [ ] All artifact links are enclosed in markdown format: `[text](path)`
- [ ] Footer contains the workflow attribution link
- [ ] Home.md is updated with a link to the new report
- [ ] Benchmark Source commit is included in the metadata
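Several of these checks lend themselves to automation before the git push. A minimal sketch, assuming it runs from the wiki root with `TIMESTAMP` already set; the exact set of checks shown here is illustrative, not a required part of the workflow:

```bash
#!/usr/bin/env bash
# Illustrative pre-publish checks; abort on the first failure.
set -e
REPORT="report-${TIMESTAMP}.md"
DIR="job-pass/${TIMESTAMP}"

# Report file and all four artifact files exist
[ -f "${REPORT}" ] || { echo "missing ${REPORT}"; exit 1; }
for f in pg_track_optimizer_jobench_results_pass1.zip \
         pg_track_optimizer_jobench_results_pass2.zip \
         postgresql_log.zip schema.sql; do
    [ -f "${DIR}/${f}" ] || { echo "missing ${DIR}/${f}"; exit 1; }
done

# Correct title and both pass sections are present
grep -q '^# Join Order Benchmark Test Results' "${REPORT}"
grep -q 'Pass 1' "${REPORT}"
grep -q 'Pass 2' "${REPORT}"
```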
## Future Improvements

- [ ] Implement Pass 1 results generation (currently shows "TODO")
- [ ] Add comparative analysis between Pass 1 and Pass 2
- [ ] Consider adding a summary statistics section
- [ ] Explore visualization options for error metrics comparison

# Extra Requirements for Claude

- Follow this document as strictly as possible.
- Always ask questions if you're not sure about your decision.
- Keep the report page as concise as possible. Prefer to fold wordy material into an attachment.
- Check workflows for consistency with this document.
- Re-check the consistency of this document before using it.
- Never change this document automatically.