Narugn
======

Narugn is a lightweight distributed computer, composed of one or more
cells that are connected locally. It requires PostgreSQL with the
PL/Proxy extension.

Overview
--------

The Narugn cluster is composed of cells. Each cell is a PostgreSQL
database. Cells belong to one or more PostgreSQL database servers.

The Narugn cluster is distributed, in the sense that there is no
hierarchy between cells: the user interacts via any cell.

The Narugn server is locally connected: each server talks only with
neighbouring servers. Therefore a Narugn cluster can have a large
number of cells, connected by a large number of small networks.

See "Cluster, Server, Cell" below for more details on how cells are
arranged.

Usage
-----

The Narugn cluster is accessed via a standard PostgreSQL connection,
so it can be used from the command line as well as by an application.

Extension, Logic and State
--------------------------

There are three categories of database objects in a Narugn cell.

"Extension" denotes objects that are created with the Narugn
extension; they constitute a small core which is not supposed to
change frequently.

Extension code includes primitives to implement connectivity inside
the Narugn cluster, as well as the ability to load additional database
objects and distribute them among all the cells in the cluster. We
call such additional objects "Logic".

Logic can be updated frequently; new Logic entirely replaces any
existing Logic. We provide a simple Logic management script which
manages dependencies between different Logic fragments and combines
them into a single self-contained unit, effectively implementing a
basic package system.

Some tables or sequences which are part of Logic can be marked as
"State", meaning that their contents will be preserved when old Logic
is replaced with new Logic.

Three kinds of functions
------------------------

There are three groups of functions in Narugn:

- Internal functions

  Not for user interaction. All part of Extension.

- API functions

  For user interaction with the Narugn cluster. All part of Extension.

- Cell functions

  To be run on cells via API functions. Extension contains five cell
  functions; additional ones can be added as Logic. Cell functions are
  prefixed with "cell_" and all have the same input and output
  parameters.

API functions
-------------

- execute_sync, execute_sync_abs

  Two ways to start a distributed computation (very similar; they
  differ only in the output format).

- configure_cell (2 variants)

  Completes the configuration of a newly created Narugn cell. The
  first variant creates a cell from scratch; the second variant copies
  its configuration from an existing cell on the same server.

- api_connect

  Connects to a cell in a neighbouring Narugn server, attempting to
  merge the two Narugn clusters.

- state_table, state_sequence

  Mark tables and sequences in Logic as carrying State. This means
  that their contents are preserved on upgrades, provided that the
  same objects exist in the new Logic.

Cell functions
--------------

The following five "core" cell functions are implemented by the Narugn
extension. Additional cell functions can be added as part of Logic.

- cell_ping

  This is the simplest possible distributed computation; each cell
  just returns 'OK'.

- cell_version

  Each cell returns version strings for PostgreSQL, the local OS, and
  the local Narugn extension.

- cell_rescan

  Refreshes the list of neighbours for each reachable cell, and
  repeats until no new cells are discovered.

- cell_logic

  Replaces the existing Logic with the one that is passed as an
  argument.

- cell_new_server

  This cell function is for internal use only (by the api_connect API
  function).

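As a rough illustration of how Logic could add a cell function, the
sketch below defines a hypothetical `cell_hello`. Its parameter list is
copied from the `cell_ping` signature given under "Distributed
execution details". The name `cell_hello`, the function body and the
choice of PL/pgSQL are assumptions made for illustration only; they are
not part of Narugn. The sketch also assumes that the `cdt` type created
by the extension is visible in the current schema.

```sql
-- Hypothetical cell function (not part of Narugn). The parameter list
-- mirrors the cell_ping signature documented in this README.
CREATE OR REPLACE FUNCTION cell_hello
( payload IN text[]
, walked IN cdt[]
, z OUT bigint
, t OUT timestamp with time zone
, output OUT text
) RETURNS SETOF RECORD
LANGUAGE plpgsql
AS $BODY$
BEGIN
    -- Emit a single row: its order within this cell, the time at
    -- which it was produced, and a short text.
    z := 1;
    t := clock_timestamp();
    output := 'hello; payload has '
           || coalesce(array_length(payload, 1), 0)
           || ' item(s)';
    RETURN NEXT;
END;
$BODY$;
```

Assuming the function has been distributed to every cell as Logic, it
would presumably be started without the "cell_" prefix, passing the
payload as variadic arguments:

```sql
SELECT * FROM execute_sync('hello', 'a', 'b');
```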
Distributed execution details
-----------------------------

The function

    execute_sync
    ( cell_function IN text
    , payload VARIADIC text[] DEFAULT '{}'
    , c OUT cds
    , z OUT bigint
    , dt OUT interval
    , output OUT text
    ) RETURNS SETOF RECORD

crawls the currently known cells, executes the given cell function on
each cell, and then returns the results. An optional variadic payload
can be specified, and will be transmitted to the cell function.

An absolute version "execute_sync_abs" is available, where dt is
replaced by a timestamp "t".

The function named in "cell_function" must be a Narugn cell function;
the "cell_" prefix is automatically added to its name. For example, by
specifying

    cell_function := 'ping'

the following function will be launched on every cell:

    CREATE OR REPLACE FUNCTION cell_ping
    ( payload IN text[]
    , walked IN cdt[]
    , z OUT bigint
    , t OUT timestamp with time zone
    , output OUT text
    ) RETURNS SETOF RECORD

The second input parameter "walked" is the path that has been walked
from the starting cell to reach that cell. It is provided to the cell
function in case it is needed (e.g. by a traceroute-like function).
The "cdt" data type contains cell coordinates plus a timestamp with
time zone.

The output of execute_sync is a set of rows, obtained as the union of
all the sets of rows produced by each cell.
The first column denotes which cell that row comes from, while the
other columns are passed directly as produced by the cell function:

* c : cds = coordinates of the cell that produced this row
* z : bigint = order of this row among all the rows produced by the
  same cell
* dt : interval = when the row was produced, relative to the timestamp
  when the command was issued on the originating cell
* output : text = contents of the row

Example:

    narugn_cell_2_2=# select * from execute_sync('version');
       c   | z |       dt        |                                         output
    -------+---+-----------------+----------------------------------------------------------------------------------------
     (2,2) | 1 | 00:00:00.177874 | PostgreSQL 9.3rc1 on i686-pc-linux-gnu, compiled by gcc (Debian 4.7.2-5) 4.7.2, 32-bit
     (2,2) | 2 | 00:00:00.177976 | Narugn 0.2.0
     (3,2) | 1 | 00:00:00.27331  | PostgreSQL 9.3rc1 on i686-pc-linux-gnu, compiled by gcc (Debian 4.7.2-5) 4.7.2, 32-bit
     (3,2) | 2 | 00:00:00.273416 | Narugn 0.2.0
    (4 rows)

Security
--------

At present there are no privileges or user profiles. This is
acceptable since Narugn is still a prototype, suitable for running
experiments.

The distributed nature of the Narugn cluster makes privileges
complicated and requires a separate analysis in order to introduce
them without increasing complexity too much.

From version 0.3.0, the "narugn" user has no special privileges. All
functions are SECURITY INVOKER.

Cluster, Server, Cell
---------------------

A Narugn cell is a PostgreSQL database satisfying certain conditions.

A cell has global coordinates, a pair of integers; cells are placed on
a square grid.

There is an adjacency notion between cells, which depends on
coordinates: cell (x,y) is adjacent to cell (x',y') exactly when
|x-x'| + |y-y'| = 1. In particular, each cell can have up to four
neighbours.

Each cell belongs to a Narugn server.

A Narugn server is a PostgreSQL database server, hosting zero or more
Narugn cells.

For each server S there is a polygon p(S).
If a server hosts a cell (x,y), then (x,y) is contained inside p(S).

Each server belongs to a cluster.

Two servers S1, S2 are said to be adjacent if there are cells (x1,y1)
in p(S1) and (x2,y2) in p(S2) such that (x1,y1) is adjacent to
(x2,y2).

A Narugn cluster is a collection C={S_1,...,S_k} of one or more Narugn
servers, satisfying two conditions:

1. (connected) for each i there is j such that S_i is adjacent to S_j;
2. (disjoint) for each distinct i and j, S_i does not overlap with
   S_j.

It is possible to merge two existing clusters, provided that the
servers do not overlap; this is implemented via the "api_connect" API
function.

See also
--------

doc/QUICKSTART.md for a guided tour.

Author
------

Gianni Ciolli, 2ndQuadrant Italia

Copyright and License
---------------------

Copyright (C) 2012, 2013 Gianni Ciolli.

Narugn is distributed under the terms of the GNU General Public
License version 3 or later, which is available both in the enclosed
COPYING file and at http://www.gnu.org/copyleft/gpl.html .