Coding Conventions
==================

This chapter collects the small style and naming conventions ProvSQL
follows. Most of them are not enforced by tooling, so reviewers rely on
contributors to apply them by convention. When in doubt, prefer to look
at how nearby code is written and match that.

Languages and File Layout
-------------------------

ProvSQL is a mixed C and C++ codebase, with the boundary chosen
deliberately:

- **C** for everything that touches the PostgreSQL extension API: the
  planner hook, SQL-callable functions, GUC declarations, the mmap
  background worker shell, type I/O for ``agg_token``, cross-version
  compatibility shims. C is what PostgreSQL itself is written in and
  what its headers expect.

- **C++17** for everything else: circuit data structures, semiring
  evaluators, knowledge compilation, tree decomposition, the standalone
  ``tdkc`` tool. C++ buys us templates, ``std::`` containers, RAII, and
  exceptions.

When a file needs to call into the other language, the boundary goes
through plain C function declarations marked ``extern "C"``. See
:cfile:`provsql_utils_cpp.h` and :cfile:`c_cpp_compatibility.h` for the
helper headers that mediate this.

Files are named after the main type or feature they implement
(:cfile:`BooleanCircuit.cpp`, :cfile:`provsql_mmap.c`). The component
map in :doc:`architecture` lists every source file with a one-line
description.

Naming
------

- C functions follow ``snake_case``; many of the planner-hook internals
  additionally start with a verb (``make_``, ``replace_``,
  ``transform_``, ``rewrite_``).
- C++ types use ``PascalCase`` (``BooleanCircuit``, ``Aggregator``).
- C++ methods use ``camelCase`` (``getGate``,
  ``probabilityEvaluation``).
- Macros are ``UPPER_CASE``.
- Public symbols that are part of ProvSQL's "API surface" but live
  outside any particular type get a ``provsql_`` prefix
  (``provsql_error``, ``provsql_planner``, ``provsql_interrupted``).
- Gate-type enum values are ``gate_xxx`` (lowercase, see
  :cfunc:`gate_type`).

Error Reporting
---------------

Use the convenience macros from :cfile:`provsql_error.h` rather than
calling ``elog()`` directly:

- :cfunc:`provsql_error` -- aborts the current transaction with the
  message prefixed by ``ProvSQL:``. Never returns.
- :cfunc:`provsql_warning`, :cfunc:`provsql_notice`,
  :cfunc:`provsql_log` -- non-aborting variants for the ``WARNING``,
  ``NOTICE``, and ``LOG`` levels.

The macros require a *literal* format string (the prefix is concatenated
at compile time). If you need to format a runtime string, build it first
and pass it as ``"%s"``.

Inside C++ code, prefer raising a :cfunc:`CircuitException` (or a
purpose-built subclass like :cfunc:`SemiringException` or
``TreeDecompositionException``) and let the SQL-callable wrapper catch
it and call :cfunc:`provsql_error`. Throwing across the C/C++ boundary
is undefined behaviour, so the catch must happen inside the C++ side.

Memory Management
-----------------

C code uses ``palloc`` / ``pfree`` and lives inside PostgreSQL's memory
contexts: most allocations get reclaimed automatically when the
surrounding query or transaction ends. Do not call ``malloc`` from C
code.

C++ code uses ordinary heap allocation, ``std::unique_ptr`` and
``std::shared_ptr``, and STL containers. These do not interact with
PostgreSQL's memory contexts, so a long-lived C++ object that holds a
lot of memory should be released explicitly when no longer needed.

When a function takes ownership of a pointer it should say so in its
Doxygen comment, and the caller should not free it; when it borrows, it
should say *that*, and the caller is responsible for freeing.

Doxygen Comments
----------------

ProvSQL's API reference is auto-generated by Doxygen from comments
embedded in the source. Every public (or otherwise non-trivial)
function, type, class, SQL function, file, and global variable should
carry a documentation comment.
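
As an illustration of such a documentation comment, the sketch below
documents two helpers in JavaDoc-style Doxygen and states pointer
ownership in the way the Memory Management section asks for. The names
``sum_values`` and ``consume_and_sum`` are hypothetical and are not part
of ProvSQL; only the comment layout is the point.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

/**
 * @brief Sum the values stored in a buffer.
 *
 * Borrows @p values: the buffer is only read and never freed here, so
 * the caller remains responsible for releasing it.
 *
 * @param values pointer to the first element (borrowed)
 * @param count  number of elements readable through @p values
 * @return the sum of the @p count elements
 */
double sum_values(const double *values, std::size_t count)
{
  double total = 0.0;
  for (std::size_t i = 0; i < count; ++i)
    total += values[i];
  return total;
}

/**
 * @brief Sum a vector of values and release it.
 *
 * Takes ownership of @p values: the vector is destroyed when this
 * function returns, and the caller must not free or reuse it.
 *
 * @param values owning pointer to the vector (must not be null)
 * @return the sum of all elements of the vector
 */
double consume_and_sum(std::unique_ptr<std::vector<double>> values)
{
  assert(values != nullptr);
  return sum_values(values->data(), values->size());
}
```

Passing a ``std::unique_ptr`` by value makes the ownership transfer
visible in the signature as well as in the comment; a raw-pointer
parameter would rely on the comment alone.
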
The prevailing style is JavaDoc (``/** ... */``) using the standard
Doxygen tags:

- ``@brief`` -- one-line summary (required);
- ``@param`` / ``@return`` -- parameters and return value;
- ``@file`` -- at the top of each source and header file;
- ``@defgroup`` / ``@ingroup`` -- used in :sqlfile:`provsql.sql` to
  organise SQL functions into logical groups.

Existing files provide good templates: :cfile:`provsql.c` for C code,
:cfile:`BooleanCircuit.h` for |cpp| classes, and :sqlfile:`provsql.sql`
for SQL functions. Note that :sqlfile:`provsql.sql` is generated by the
Makefile from the ``sql/provsql.*.sql`` sources (see
:doc:`build-system`) -- edit those, not the generated file.

When you cross-reference an SQL function from prose, use the ``sqlfunc``
role and add a corresponding entry to ``_SQL_FUNC_MAP`` in
``doc/source/conf.py``. Same for the ``cfunc`` role and ``_C_FUNC_MAP``
on the C/|cpp| side, and the ``cfile`` / ``sqlfile`` roles for
filenames. The coherence checker (``check-doc-links.py``) run at the end
of ``make docs`` will fail the build if a referenced function does not
have a map entry, if a map entry is unused, or if a map entry points at
a nonexistent Doxygen anchor.

The coherence checker does **not** enforce that newly added code carries
Doxygen comments at all -- adding them is a convention the project
relies on developers to uphold.

Wiring a New SQL-Callable C Function
------------------------------------

Adding a new SQL-callable C function requires touching a handful of
files. The standard pattern:

1. **Implement the C function** with the
   ``Datum function_name(PG_FUNCTION_ARGS)`` signature, registered via
   ``PG_FUNCTION_INFO_V1(function_name)``. Use the ``PG_GETARG_*`` and
   ``PG_RETURN_*`` macros to convert between ``Datum`` and native types.
   The ``PG_FUNCTION_INFO_V1`` macro takes care of exporting the symbol;
   no extra ``PGDLLEXPORT`` is needed.

2. **Declare the SQL function** in :sqlfile:`provsql.sql` (i.e., in
   either ``sql/provsql.common.sql`` or ``sql/provsql.14.sql``):

   .. code-block:: plpgsql

      CREATE OR REPLACE FUNCTION function_name(arg type, ...)
        RETURNS rettype
        AS 'MODULE_PATHNAME', 'function_name'
        LANGUAGE C STRICT;

   The ``MODULE_PATHNAME`` placeholder is rewritten by PostgreSQL at
   install time to point at the extension shared object.

3. **Add a Doxygen comment** to the SQL declaration -- the SQL layer is
   also documented through Doxygen via the perl filter in
   ``plpgsql_filter.pl``. Use ``@ingroup`` to put the function in the
   right SQL API group.

4. **Reference it from the user docs** with the ``sqlfunc`` role and add
   the corresponding entry to ``_SQL_FUNC_MAP`` in
   ``doc/source/conf.py``. The coherence checker enforces both sides.

Adding a New Test
-----------------

See :doc:`testing` for the full procedure. In short: drop a ``.sql``
file under ``test/sql/`` and a matching ``.out`` under
``test/expected/``, register it in ``test/schedule.common`` (or
``test/schedule.14``), and make sure the test does not depend on random
UUIDs or unstable orderings of symbolic representations (:doc:`testing`
has the patterns for normalising both).

Pitfalls to Avoid
-----------------

A non-exhaustive list of things to avoid because they have caused real
bugs in the past:

- **Calling C++ code from C without an** ``extern "C"`` **boundary
  function.** Names get mangled and linking will silently pick the wrong
  overload (or fail in confusing ways).
- **Throwing C++ exceptions across the C boundary.** Always catch in the
  outermost C++ wrapper and convert to a :cfunc:`provsql_error` call.
- **Calling** ``palloc`` **from a thread other than the PostgreSQL
  backend.** PostgreSQL's memory contexts are not thread-safe. Workers
  and external tools that allocate must use the C++ side.
- **Editing** ``sql/provsql.sql`` **or** ``sql/provsql--.sql``
  **directly.** Both are generated by the Makefile from
  ``sql/provsql.common.sql`` and ``sql/provsql.14.sql``; edit those
  instead and rebuild.
- **Editing** ``test/schedule`` **directly.** Same story: it is
  generated from ``test/schedule.common`` (and optionally
  ``test/schedule.14``).
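
The first two pitfalls can be sketched as a single boundary function.
This is a minimal illustration rather than ProvSQL code:
``evaluate_or_throw`` and ``example_evaluate`` are hypothetical names,
``std::runtime_error`` stands in for :cfunc:`CircuitException`, and the
wrapper returns an error code and message instead of calling
:cfunc:`provsql_error`, because that macro is only available inside the
extension.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <stdexcept>

// Hypothetical C++ routine that may throw; it stands in for code that
// would raise CircuitException or one of its subclasses.
static double evaluate_or_throw(int gate_count)
{
  if (gate_count < 0)
    throw std::runtime_error("negative gate count");
  return gate_count * 0.5;
}

// Boundary function: extern "C" keeps the name unmangled so that C code
// can link against it, and the try/catch guarantees that no exception
// escapes to the C caller.
// Returns 0 and fills *result on success; returns -1 and copies a
// message into errbuf on failure. The C caller could then report that
// message through its error macro using a literal "%s" format.
extern "C" int example_evaluate(int gate_count, double *result,
                                char *errbuf, std::size_t errbuf_size)
{
  assert(result != nullptr && errbuf != nullptr && errbuf_size > 0);
  try {
    *result = evaluate_or_throw(gate_count);
    return 0;
  } catch (const std::exception &e) {
    std::snprintf(errbuf, errbuf_size, "%s", e.what());
    return -1;
  } catch (...) {
    // Catch-all so that even non-standard exceptions stay on the C++ side.
    std::snprintf(errbuf, errbuf_size, "%s", "unknown C++ exception");
    return -1;
  }
}
```

On the C side, a matching plain declaration such as
``int example_evaluate(int, double *, char *, size_t);`` in a shared
header is all that is needed to call the function.
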