--- title: Quickstart --- This guide will walk you through the following steps to get started with ParadeDB: 1. Full text search over a single table 2. Full text search over a JOIN 3. Similarity (i.e. vector) search ## Single Table Search ParadeDB comes with a helpful procedure that creates a table populated with mock data to help you get started. Once connected with `psql`, run the following commands to create and inspect this table. ```sql CALL paradedb.create_bm25_test_table( schema_name => 'public', table_name => 'mock_items' ); SELECT description, rating, category FROM mock_items LIMIT 3; ``` ```csv description | rating | category --------------------------+--------+------------- Ergonomic metal keyboard | 4 | Electronics Plastic Keyboard | 4 | Electronics Sleek running shoes | 5 | Footwear (3 rows) ``` Next, let's create a BM25 index called `search_idx` on this table. A BM25 index is a covering index, which means that multiple columns can be included in the same index. The following code block demonstrates the various Postgres types that can be combined inside a single index. ```sql CREATE INDEX search_idx ON mock_items USING bm25 (id, description, category, rating, in_stock, created_at, metadata, weight_range) WITH (key_field='id'); ``` Note the mandatory `key_field` option. Every BM25 index needs a `key_field`, which should be the name of a column that will function as a row's unique identifier within the index. Additionally, the `key_field` must be the first field in the list of columns. See [choosing a key field](/documentation/indexing/create_index#choosing-a-key-field) for more details. We're now ready to execute a full-text search. We'll look for the first five rows with a `rating` greater than `2` where `description` matches `shoes` or `category` matches `footwear`. ```sql SELECT description, rating, category FROM mock_items WHERE description @@@ 'shoes' OR category @@@ 'footwear' AND rating @@@ '>2' ORDER BY description LIMIT 5; ``` ``` csv description | rating | category ----------------------+--------+---------- Comfortable slippers | 3 | Footwear Generic shoes | 4 | Footwear Sleek running shoes | 5 | Footwear Sturdy hiking boots | 4 | Footwear White jogging shoes | 3 | Footwear (5 rows) ``` The ParadeDB-specific `@@@` operator instructs ParadeDB to execute a full-text search using the BM25 index. Its job is to return matching rows as quickly as possible in no particular order. To sort by relevance, the `paradedb.score` function generates BM25 scores for each row. ```sql SELECT description, rating, category, paradedb.score(id) FROM mock_items WHERE description @@@ 'shoes' OR category @@@ 'footwear' AND rating @@@ '>2' ORDER BY score DESC, description LIMIT 5; ``` ``` csv description | rating | category | score ----------------------+--------+----------+----------- Generic shoes | 4 | Footwear | 5.8135376 Sleek running shoes | 5 | Footwear | 5.4211845 White jogging shoes | 3 | Footwear | 5.4211845 Comfortable slippers | 3 | Footwear | 2.9362776 Sturdy hiking boots | 4 | Footwear | 2.9362776 (5 rows) ``` The result table shows that rows matching on both `description` and `category` scored more highly than rows matching only one of the fields. Finally, let's see how ParadeDB handles a phrase query like `white shoes`. Let's also surface results even if there is a word between `white` and `shoes` using the `~` slop operator. ```sql SELECT description, rating, category FROM mock_items WHERE description @@@ '"white shoes"~1' LIMIT 5; ``` ``` csv description | rating | category ---------------------+--------+---------- White jogging shoes | 3 | Footwear (1 row) ``` In addition to simple string queries, ParadeDB supports advanced [query builder functions](/documentation/advanced) similar to the Elastic DSL. ## Joined Search ParadeDB supports full text search over JOINs, which is crucial for database schemas that store data in a normalized fashion. To demonstrate, let's create a table called `orders` that references `mock_items`. ```sql CALL paradedb.create_bm25_test_table( schema_name => 'public', table_name => 'orders', table_type => 'Orders' ); ALTER TABLE orders ADD CONSTRAINT foreign_key_product_id FOREIGN KEY (product_id) REFERENCES mock_items(id); SELECT * FROM orders LIMIT 3; ``` ```csv order_id | product_id | order_quantity | order_total | customer_name ----------+------------+----------------+-------------+--------------- 1 | 1 | 3 | 99.99 | John Doe 2 | 2 | 1 | 49.99 | Jane Smith 3 | 3 | 5 | 249.95 | Alice Johnson (3 rows) ``` Next, let's create a BM25 index over the `orders` table. ```sql CREATE INDEX orders_idx ON orders USING bm25 (order_id, customer_name) WITH (key_field='order_id'); ``` The following query searches for rows where `customer_name` matches `Johnson` and `description` matches `shoes`. ```sql SELECT o.order_id, o.customer_name, m.description FROM orders o JOIN mock_items m ON o.product_id = m.id WHERE o.customer_name @@@ 'Johnson' AND m.description @@@ 'shoes' ORDER BY order_id LIMIT 5; ``` ```csv order_id | customer_name | description ----------+---------------+--------------------- 3 | Alice Johnson | Sleek running shoes 6 | Alice Johnson | White jogging shoes 36 | Alice Johnson | White jogging shoes (3 rows) ``` ## Similarity Search For vector similarity search, let's first generate a vector embeddings column. For the sake of this tutorial, we'll randomly generate these embeddings. ```sql ALTER TABLE mock_items ADD COLUMN embedding vector(3); UPDATE mock_items m SET embedding = ('[' || ((m.id + 1) % 10 + 1)::integer || ',' || ((m.id + 2) % 10 + 1)::integer || ',' || ((m.id + 3) % 10 + 1)::integer || ']')::vector; SELECT description, rating, category, embedding FROM mock_items LIMIT 3; ``` ``` csv description | rating | category | embedding --------------------------+--------+-------------+----------- Ergonomic metal keyboard | 4 | Electronics | [3,4,5] Plastic Keyboard | 4 | Electronics | [4,5,6] Sleek running shoes | 5 | Footwear | [5,6,7] (3 rows) ``` Next, let's create an HNSW index on the `embedding` column of our table. While not required, an HNSW index can drastically improve query performance over very large datasets. ```sql CREATE INDEX on mock_items USING hnsw (embedding vector_cosine_ops); ``` Next, let's query our table with a vector and order the results by cosine distance: ```sql SELECT description, category, rating, embedding FROM mock_items ORDER BY embedding <=> '[1,2,3]', description LIMIT 3; ``` ```csv description | category | rating | embedding --------------------------+------------+--------+----------- Artistic ceramic vase | Home Decor | 4 | [1,2,3] Designer wall paintings | Home Decor | 5 | [1,2,3] Handcrafted wooden frame | Home Decor | 5 | [1,2,3] (3 rows) ``` That's it! Next, let's [load your data](/documentation/getting-started/load) to start running real queries.