# BedquiltDB Conceptual Overview This document attempts to describe BedquiltDB at a high level. ## BedquiltDB Architecture BedquiltDB is divided into two components: - The `bedquilt` extension for PostgreSQL - A set of client "driver" libraries Once the `bedquilt` extension is installed on a PostgreSQL server, a driver library can be connected to the server and used to read and write JSON data. The driver proivides an API that feels native to the language it is written in and manages conversion from language-native data structures to JSON and back again. The following diagram illustrates how a web application written in python might use the `pybedquilt` driver to interface with a database which has `bedquilt` installed: ![BedquiltDB Architecture](/images/bedquilt_architecture.png) As we can see from the diagram, the drivers `find` method is really a thin wrapper around an SQL statement which uses a SQL function called `bq_find`, which is provided by the `bedquilt` extension. All of the functionality of BedquiltDB is implemented in this way, with all the "smart stuff" implemented inside the database, behind custom SQL functions. This approach provides several advantages over simply writing wrapper logic around SQL in a specific language: - The logic of BedquiltDB is performed inside the database, close to the data - Driver libraries become very simple to implement and test For more information on setting up BedquiltDB, see [Installation](installation.md). ## Drivers To use BedquiltDB, the programmer will need to import a BedquiltDB driver for their favourite programming language and use it to connect to the PostgreSQL/BedquiltDB server. Example, with the [pybedquilt](pybedquilt.readthedocs.org) driver: ```python import pybdequilt db = pybedquilt.BedquiltClient('dbname=test') ``` The `db` object holds a connection to the server, and provides an api for the collections in the BedquiltDB database. ## Collections In a BedquiltDB database, JSON data is stored in collections. You write data into collections, then read it back out later, like so: ```python # create a collection called 'users' db.create_collection('users') # get a Collection object, referencing the new 'users' collection users = db['users'] # How many users do you think we have? print users.count() ``` ## Documents Collections contain documents. A document is essentially a single JSON object. The BedquiltDB driver handles converting from native data-structures to JSON and back again. A document can have practically any structure you could want, as long as it's valid JSON, with one exception: all documents must have an `_id` field, with a string value. If a document without an `_id` field is written to a collection, then a random string will be generated and set as the `_id` value. The `_id` field is used as the unique primary-key in the collection. If two documents are saved with the same `_id`, then the second one will over-write the first. Here we see an example of saving a python dictionary to a BedquiltDB collection as a JSON object: ```python users.insert({ "_id": "john@example.com", "name": "John", "age": 45, "address": { "street": "Elm Row", "city": "Edinburgh" } }) ``` We can read that same document out later: ```python john = users.find_one_by_id("john@example.com") ``` Or retrieve it as part of a more general query: ``` edinburgh_users = users.find({"address": {"city": "Edinburgh"}}) ``` ## Writing Data There are two operations which write JSON data to a collection: `insert` and `save`. The `insert` operation takes a JSON document and inserts it into the collection, generating an `_id` value if needed. Regardless, the `insert` operation always returns the `_id` of the inserted document: ```python print pets.insert({"name": "Snuffles", "species": "dog"}) # => "ba40513444b760b7eb2684d8" print pets.insert({"_id": "some_meaningful_identifier", "name": "Larry", "species": "cat"}) # => "some_meaningful_identifier" ``` The `save` operation also takes a JSON document, but it first checks if the document has an `_id` field. If it does, and a document with that same `_id` exists in the collection, then the old document will be overwritten by the new one. Otherwise, `save` behaves the same as `insert`: if there are no documents in the collection with the same `_id` then the document is simply inserted into the collection, and the `_id` returned to the caller: ```python john = users.find_one_by_id('john@example.com') john['age'] = 46 result = users.save(john) print result # => "john@example.com" ``` ## Reading Data There are three operations which read json documents out of collections: `find`, `find_one` and`find_one_by_id`. The `find` operation takes a "query document" and compares it to the documents in the collection, returning the set of documents which match the query. A document is considered a match if the query matches some subset of the document. For example, we could find all active users: ```python cool_people = db['users'].find({ 'active': true }) ``` or, we could find all active users who are living in Edinburgh: ```python cool_people = db['users'].find({ 'active': true, 'address': { 'city': 'Edinburgh' } }) ``` or all active users in Edinburgh who have both `"icecream"` and `"code"` in their list of `likes`: ```python cool_people = db['users'].find({ 'active': true, 'address': { 'city': 'Edinburgh' }, 'likes': ['icecream', 'code'] }) ``` We can also just query for all documents in the collection, by suppling an empty query document: ```python cool_people = db['users'].find({}) ``` For some BedquiltDB drivers, the result of a `find` operation will be a `Cursor` of results, rather than an Array. Generally the driver should use the languages equivalent of lazy sequences to prepresent query result sets. This is so that the results can be streamed from the PostgreSQL server to the client as needed, rather than being eagerly materialised in memory: ```python print db['users'].find({...}) # => ``` We can iterate over the cursor, pulling in results as they are needed: ```python result = db['users'].find({...}) for doc in result: print doc ``` Or we can just turn the result into a list: ```python result = list( db['users'].find({}) ) print type(result) # => list ``` The `find_one` operation also takes a query document, just like `find`, but it only returns at most a single result, or `null` if there were no matching documents: ```python print db['users'].find_one({'email': 'user@example.com'}) # => {_id: '...', 'email': '...', ...} print db['users'].find_one({'this': 'matches': {'nothing'}}) # => None ``` As if that weren't enough, the `find_one_by_id` operation takes a string `id` instead of a query document, and returns the document with the matching `_id` field, or `null` if there are no documents with that `_id`. ```python print db['users'].find_one_by_id('400241') # => {_id: '400241', ...} ``` If we have a list of document ids, we can use `find_many_by_ids` to get them all in one query, rather than using `find_one_by_id` multiple times: ```python print db['orders'].find_many_by_ids(['X2242', 'X5373', 'X1762']) ``` We can also get a list of the distinct values we may have under a given key, with the `distinct` operation. ```python print db['users'].distinct('address.city') ``` ## Advanced Queries All of the examples so far have queried for documents which match a query document. In other words, the query document should be a sub-set of the matched document. BedquiltDB also supports more advanced query operations. For example, we can test that a certain field does _not equal_ to a given value: ```python print db['orders'].find({ 'address': { 'city': { '$noteq': 'Glasgow' } } }) ``` Or that a numeric field is greater than a certain value: ```python print db['articles'].find_one({ 'upvotes': { '$gt': 4 } }) ``` Of course, we can intermingle these special queries with ordinary matching queries too: ```python print db['orders'].find({ 'processed': True', 'address': { 'city': { '$noteq': 'Glasgow' }, 'address1': 'Church Street' } }) ``` See the [BedquiltDB Spec](../spec.md#aside-advanced-query-operations) for full documentation on advanced query operations. ## Skip, Limit and Sort The `find` operation takes a few extra, optional parameters which allow you to control the number of documents that are returned from the query. The `limit` option limits the result set to the desired size: ```python db['users'].find({active: true}, limit=10) ``` The `skip` option omits a number of documents from the start of the result set: ```python db['users'].find({active: true}, limit=10, skip=4) ``` The `sort` option allows you to specify how the result set should be sorted: ```python # sort by age ascending, then by name descending db['users'].find({active: true}, limit=10, skip=4, sort=[{'age': 1}, {'name': -1}] ) ``` Naturally, the `skip`, `limit` and `sort` options to `find` can be used in any combination. If no sort order is specified, the result set is likely to be sorted naturally in the order the documents were written to the collection. However, this behaviour is not guaranteed, so if you care about ordering you should sort by a document field which has meaning to your data. ## Removing Data Removing data from a collection can be accomplished with the `remove`, `remove_one` and `remove_one_by_id` operations. `remove` and `remove_one` take a query document and remove any documents which match the query, while `remove_one_by_id` takes a string `id` and removes the document in the collection with the same `_id`. All of the `remove*` operations return an integer indicating the number of documents that were removed. Beware: the `remove*` operations will permanantly delete data. There is no way to recover data removed in this way. ## Updating Data At the moment, the only way to update a document in a collection is to use the `save` operation detailed above.