# pg_biscuit

A PostgreSQL Index Access Method (IAM) for high-performance pattern matching on text columns. Biscuit indexes are specifically designed to accelerate `LIKE` queries with arbitrary wildcards.

## Overview

`pg_biscuit` implements a novel indexing approach that maintains character position information using compressed bitmaps (Roaring bitmaps). Unlike traditional B-tree or trigram (pg_trgm) indexes, Biscuit can efficiently handle complex pattern queries including prefix, suffix, substring, and multi-part patterns.

### Key Features

- **Pattern-optimized indexing**: Accelerates `LIKE` queries with `%` and `_` wildcards
- **Full CRUD support**: Efficient insert, update, and delete operations with lazy deletion
- **Memory-resident architecture**: Index lives in shared memory for fast access
- **Compressed bitmaps**: Uses Roaring bitmaps for space-efficient storage
- **Automatic cleanup**: Tombstone-based deletion with batch cleanup on threshold
- **PostgreSQL 16+ compatible**: Implements standard IAM interface

## Installation

### Prerequisites

- PostgreSQL 16 or later
- C compiler (gcc or clang)
- Optional: Roaring bitmap library for better compression

### Build from Source

```bash
# Clone or download the extension
cd pg_biscuit

# Build and install
make
sudo make install

# Enable in your database
psql -d your_database -c "CREATE EXTENSION pg_biscuit;"
```

## Usage

### Creating a Biscuit Index

```sql
-- Basic index on a text column
CREATE INDEX idx_username ON users USING biscuit(username);

-- Index on varchar or other text types
CREATE INDEX idx_email ON emails USING biscuit(email_address);

-- Partial index (only index active records)
CREATE INDEX idx_active_usernames ON users USING biscuit(username)
WHERE status = 'active';

-- Case-insensitive index (use LOWER function)
CREATE INDEX idx_username_lower ON users USING biscuit(LOWER(username));

-- Multi-columnar indexing
CREATE INDEX idx_user ON users USING biscuit(username, email_address);
```

### Query Examples

Biscuit indexes automatically accelerate these query patterns:

```sql
-- Prefix match: 'john%'
SELECT * FROM users WHERE username LIKE 'john%';

-- Suffix match: '%@gmail.com'
SELECT * FROM users WHERE email LIKE '%@gmail.com';

-- Substring match: '%admin%'
SELECT * FROM users WHERE username LIKE '%admin%';

-- Complex pattern: '%a%b%c%'
SELECT * FROM logs WHERE message LIKE '%error%database%';

-- Exact match: 'johndoe'
SELECT * FROM users WHERE username LIKE 'johndoe';

-- Case-insensitive (requires lowercase index)
SELECT * FROM users WHERE LOWER(username) LIKE '%admin%';

-- Multiple filter based querying
SELECT * FROM users WHERE username LIKE 'john%' and email_address like '%gmail%';
```

### Index Maintenance

```sql
-- View all Biscuit indexes in database
SELECT * FROM biscuit_indexes;

-- Get detailed statistics for an index
SELECT biscuit_index_stats('idx_username'::regclass::oid);

-- Rebuild index if needed
REINDEX INDEX idx_username;

-- Clean up deleted records
VACUUM ANALYZE users;
```

## Architecture

### Memory-Resident Design

Biscuit indexes are stored entirely in PostgreSQL's shared memory:

1. **Index Build**: Scans heap once during `CREATE INDEX`
2. **Storage**: Lives in index relation's `rd_amcache`
3. **Persistence**: Not written to disk; rebuilds on restart (amortized) using full heap scan
4. **Updates**: Maintained incrementally via INSERT/UPDATE/DELETE hooks

### Data Structures

- **Position Index**: Character → Position → Bitmap of record IDs
- **Negative Index**: Character → Negative offset → Bitmap (for suffix queries)
- **Length Bitmaps**: Precomputed bitmaps for length-based filtering
- **Tombstones**: Lazy deletion with bitmap tracking
- **Roaring Bitmaps**: Compressed bitmap representation

### Query Optimization

The engine includes several optimizations:

1. **Query planning**: In case of multiple queries, it intelligently determines the optimal order of execution
2. **Early Termination**: Stops on empty intersection
3. **Single-Part Fast Path**: Avoids recursion for simple patterns
4. **TID Sorting**: Orders results for sequential heap access
5. **Batch Operations**: Bulk bitmap operations for better performance

## Limitations

1. **Memory-Resident**: Index rebuilds on database restart (not persisted to disk)
3. **Max String Length**: Limited to 256 characters (configurable via `MAX_POSITIONS`)
4. **Case Sensitivity**: Case-insensitive searches require function index with `LOWER()`
5. **No Full-Text Search**: Not a replacement for PostgreSQL's text search features

## Configuration

No configuration is required. The extension automatically:
- Allocates memory in the index context
- Performs cleanup when tombstones reach 1000 (configurable via `TOMBSTONE_CLEANUP_THRESHOLD`)
- Rebuilds length bitmaps as needed

## Development

### Code Structure

- `pg_biscuit.c`: Main IAM implementation
- `pg_biscuit--1.0.sql`: SQL installation script

## Contributing

This is an academic/research project demonstrating a novel indexing approach. Contributions are welcome:

- Bug reports and fixes
- Performance improvements
- Documentation enhancements
- Benchmark additions

## License

PostgreSQL License (similar to BSD/MIT)

## Acknowledgments

- Uses Roaring bitmap library (optional dependency)
- Implements PostgreSQL Index Access Method interface
## Contributors

BISCUIT is developed and maintained by [Sivaprasad Murali](https://linkedin.com/in/sivaprasad-murali) .


## Support and Contact


**Issues:** https://github.com/crystallinecore/pg_biscuit/issues

**Discussions:** https://github.com/crystallinecore/pg_biscuit/discussions

##

**When pg_trgm feels half-baked, grab a pg_BISCUIT 🍪**

---