# pg_biscuit A PostgreSQL Index Access Method (IAM) for high-performance pattern matching on text columns. Biscuit indexes are specifically designed to accelerate `LIKE` queries with arbitrary wildcards. ## Overview `pg_biscuit` implements a novel indexing approach that maintains character position information using compressed bitmaps (Roaring bitmaps). Unlike traditional B-tree or trigram (pg_trgm) indexes, Biscuit can efficiently handle complex pattern queries including prefix, suffix, substring, and multi-part patterns. ### Key Features - **Pattern-optimized indexing**: Accelerates `LIKE` queries with `%` and `_` wildcards - **Full CRUD support**: Efficient insert, update, and delete operations with lazy deletion - **Memory-resident architecture**: Index lives in shared memory for fast access - **Compressed bitmaps**: Uses Roaring bitmaps for space-efficient storage - **Automatic cleanup**: Tombstone-based deletion with batch cleanup on threshold - **PostgreSQL 16+ compatible**: Implements standard IAM interface ## Installation ### Prerequisites - PostgreSQL 16 or later - C compiler (gcc or clang) - Optional: Roaring bitmap library for better compression ### Build from Source ```bash # Clone or download the extension cd pg_biscuit # Build and install make sudo make install # Enable in your database psql -d your_database -c "CREATE EXTENSION pg_biscuit;" ``` ## Usage ### Creating a Biscuit Index ```sql -- Basic index on a text column CREATE INDEX idx_username ON users USING biscuit(username); -- Index on varchar or other text types CREATE INDEX idx_email ON emails USING biscuit(email_address); -- Partial index (only index active records) CREATE INDEX idx_active_usernames ON users USING biscuit(username) WHERE status = 'active'; -- Case-insensitive index (use LOWER function) CREATE INDEX idx_username_lower ON users USING biscuit(LOWER(username)); -- Multi-columnar indexing CREATE INDEX idx_user ON users USING biscuit(username, email_address); ``` ### Query Examples Biscuit indexes automatically accelerate these query patterns: ```sql -- Prefix match: 'john%' SELECT * FROM users WHERE username LIKE 'john%'; -- Suffix match: '%@gmail.com' SELECT * FROM users WHERE email LIKE '%@gmail.com'; -- Substring match: '%admin%' SELECT * FROM users WHERE username LIKE '%admin%'; -- Complex pattern: '%a%b%c%' SELECT * FROM logs WHERE message LIKE '%error%database%'; -- Exact match: 'johndoe' SELECT * FROM users WHERE username LIKE 'johndoe'; -- Case-insensitive (requires lowercase index) SELECT * FROM users WHERE LOWER(username) LIKE '%admin%'; -- Multiple filter based querying SELECT * FROM users WHERE username LIKE 'john%' and email_address like '%gmail%'; ``` ### Index Maintenance ```sql -- View all Biscuit indexes in database SELECT * FROM biscuit_indexes; -- Get detailed statistics for an index SELECT biscuit_index_stats('idx_username'::regclass::oid); -- Rebuild index if needed REINDEX INDEX idx_username; -- Clean up deleted records VACUUM ANALYZE users; ``` ## Architecture ### Memory-Resident Design Biscuit indexes are stored entirely in PostgreSQL's shared memory: 1. **Index Build**: Scans heap once during `CREATE INDEX` 2. **Storage**: Lives in index relation's `rd_amcache` 3. **Persistence**: Not written to disk; rebuilds on restart (amortized) using full heap scan 4. **Updates**: Maintained incrementally via INSERT/UPDATE/DELETE hooks ### Data Structures - **Position Index**: Character → Position → Bitmap of record IDs - **Negative Index**: Character → Negative offset → Bitmap (for suffix queries) - **Length Bitmaps**: Precomputed bitmaps for length-based filtering - **Tombstones**: Lazy deletion with bitmap tracking - **Roaring Bitmaps**: Compressed bitmap representation ### Query Optimization The engine includes several optimizations: 1. **Query planning**: In case of multiple queries, it intelligently determines the optimal order of execution 2. **Early Termination**: Stops on empty intersection 3. **Single-Part Fast Path**: Avoids recursion for simple patterns 4. **TID Sorting**: Orders results for sequential heap access 5. **Batch Operations**: Bulk bitmap operations for better performance ## Limitations 1. **Memory-Resident**: Index rebuilds on database restart (not persisted to disk) 3. **Max String Length**: Limited to 256 characters (configurable via `MAX_POSITIONS`) 4. **Case Sensitivity**: Case-insensitive searches require function index with `LOWER()` 5. **No Full-Text Search**: Not a replacement for PostgreSQL's text search features ## Configuration No configuration is required. The extension automatically: - Allocates memory in the index context - Performs cleanup when tombstones reach 1000 (configurable via `TOMBSTONE_CLEANUP_THRESHOLD`) - Rebuilds length bitmaps as needed ## Development ### Code Structure - `pg_biscuit.c`: Main IAM implementation - `pg_biscuit--1.0.sql`: SQL installation script ## Contributing This is an academic/research project demonstrating a novel indexing approach. Contributions are welcome: - Bug reports and fixes - Performance improvements - Documentation enhancements - Benchmark additions ## License PostgreSQL License (similar to BSD/MIT) ## Acknowledgments - Uses Roaring bitmap library (optional dependency) - Implements PostgreSQL Index Access Method interface ## Contributors BISCUIT is developed and maintained by [Sivaprasad Murali](https://linkedin.com/in/sivaprasad-murali) . ## Support and Contact **Issues:** https://github.com/crystallinecore/pg_biscuit/issues **Discussions:** https://github.com/crystallinecore/pg_biscuit/discussions ## **When pg_trgm feels half-baked, grab a pg_BISCUIT 🍪** ---