README
------------

Christopher Browne
cbbrowne@ca.afilias.info
2006-10-20
------------

The script configure-replication.sh is intended to allow the gentle
user to readily configure replication using the Slony-I replication
system for PostgreSQL.  This script is based on the configuration
approach taken by the testbed scripts in slony1-engine/tests.  A
customized version of this has been implemented for LedgerSMB
<http://ledgersmb.org/>.

For more general details about Slony-I, see <http://slony.info/>.

This script uses a number of environment variables to determine the
shape of the configuration.  In many cases, the defaults should be at
least nearly OK...

Global:
  CLUSTER - Name of Slony-I cluster
  NUMNODES - Number of nodes to set up

  PGUSER - name of PostgreSQL superuser controlling replication
  PGPORT - default port number
  PGDATABASE - default database name

  TABLES - a list of fully qualified table names (i.e. complete with
           namespace, such as public.my_table)
  SEQUENCES - a list of fully qualified sequence names (i.e. complete
           with namespace, such as public.my_sequence)

  SLONIKCONFDIR - a directory in which to place the config files.  If
     not set, a unique directory under /tmp will be created for this
     purpose.

Defaults are provided for ALL of these values, so that if you run
configure-replication.sh without setting any environment variables,
you will get a set of slonik scripts.  They may not correspond, of
course, to any database you actually want to use...
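
As an illustration, a run for a three-node cluster might be prepared
roughly as follows.  Every value here (cluster name, table and
sequence names, directory) is made up, and the sketch assumes that
the TABLES and SEQUENCES lists are whitespace-separated and that the
script sits in the current directory.

```shell
# All values below are hypothetical examples; substitute your own.
export CLUSTER=mycluster
export NUMNODES=3
export PGUSER=slony
export PGPORT=5432
export PGDATABASE=mydb

# Assumption: list entries are separated by whitespace.
export TABLES="public.my_table public.my_other_table"
export SEQUENCES="public.my_sequence"

# Keep the generated slonik files somewhere predictable.
export SLONIKCONFDIR="${TMPDIR:-/tmp}/slonik-mycluster"
mkdir -p "$SLONIKCONFDIR"

# Run the generator only if it is actually present here.
if [ -x ./configure-replication.sh ]; then
    ./configure-replication.sh
else
    echo "configure-replication.sh not found in $(pwd); nothing generated"
fi
```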

For each node, there are also four parameters; for node 1:
  DB1 - database to connect to
  USER1 - superuser to connect as
  PORT1 - port
  HOST1 - host

It is quite likely that DB*, USER*, and PORT* should be drawn from the
default PGDATABASE, PGUSER, and PGPORT values above; that sort of
uniformity is usually a good thing.

In contrast, HOST* values should be set explicitly for HOST1, HOST2,
..., as you don't get much benefit from the redundancy replication
provides if all your databases are on the same server!
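
The fallback described above can be sketched with ordinary shell
parameter expansion.  This is only an illustration of the idea, not
the script's own code; the host names and default values are
hypothetical.

```shell
# Hypothetical defaults shared by every node.
PGDATABASE=mydb
PGUSER=slony
PGPORT=5432

# Hosts are set explicitly, one per node (example host names).
HOST1=db1.example.com
HOST2=db2.example.com

# DB*, USER* and PORT* fall back to the shared defaults unless
# they were already set in the environment.
DB1=${DB1:-$PGDATABASE}; USER1=${USER1:-$PGUSER}; PORT1=${PORT1:-$PGPORT}
DB2=${DB2:-$PGDATABASE}; USER2=${USER2:-$PGUSER}; PORT2=${PORT2:-$PGPORT}

export HOST1 HOST2 DB1 DB2 USER1 USER2 PORT1 PORT2
echo "node 1: dbname=$DB1 host=$HOST1 user=$USER1 port=$PORT1"
echo "node 2: dbname=$DB2 host=$HOST2 user=$USER2 port=$PORT2"
```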

slonik config files are generated in a temp directory under /tmp
(or, if you set SLONIKCONFDIR, in the directory named by that
environment variable).  The files are used as follows:

1.  preamble.slonik

    This is a "preamble" containing connection info used by the other
    scripts.

    Verify the info in this one closely; you may want to keep it
    permanently to use with other maintenance you may want to do on
    the cluster.
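
For orientation, a slonik preamble for two nodes generally looks
something like the file written below.  The cluster name, hosts,
database, and user are made-up values, and the file the generator
actually produces may differ in detail.

```shell
# Write an illustrative preamble into a scratch directory.
# All connection details are hypothetical examples.
confdir=$(mktemp -d)
cat > "$confdir/preamble.slonik" <<'EOF'
cluster name = mycluster;
node 1 admin conninfo = 'dbname=mydb host=db1.example.com user=slony port=5432';
node 2 admin conninfo = 'dbname=mydb host=db2.example.com user=slony port=5432';
EOF
cat "$confdir/preamble.slonik"
```

When checking the real file, the things to look at are the same:
one "admin conninfo" line per node, each pointing at the right host,
port, database, and user.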

2.  create_nodes.slonik

    This is the first script to run; it sets up the requested nodes as
    being Slony-I nodes, adding in some Slony-I-specific config tables
    and such.

You can, and should, start slon processes any time after this step
has run.
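
The sequence at this point might look like the commands printed by
the sketch below.  It only prints them; nothing is executed.  The
directory, cluster name, and connection details are hypothetical,
and each slon is given the cluster name plus the conninfo of the
node it serves.

```shell
# Example values; in practice these match what the generator was given.
CLUSTER=mycluster
confdir=/tmp/slonik-mycluster

# Build the list of commands: first the slonik script, then one
# slon process per node, each started in the background.
cmds=$(
    echo "slonik $confdir/create_nodes.slonik"
    for host in db1.example.com db2.example.com; do
        echo "slon $CLUSTER 'dbname=mydb host=$host user=slony port=5432' &"
    done
)
echo "$cmds"
```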

3.  store_paths.slonik

    This is the second script to run; it indicates how the slons
    should intercommunicate.  It assumes that all slons can talk to
    all nodes, which may not be a valid assumption in a
    complexly-firewalled environment.
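
To make the "everyone talks to everyone" assumption concrete, the
sketch below prints one STORE PATH statement for every ordered pair
of distinct nodes, which is what a full mesh amounts to.  It is an
illustration rather than the generator's own code; the conninfo
values are hypothetical.

```shell
NUMNODES=3

# One path per (server, client) pair with server != client.
# The conninfo is how the client's slon reaches the server's database.
paths=$(
    server=1
    while [ "$server" -le "$NUMNODES" ]; do
        client=1
        while [ "$client" -le "$NUMNODES" ]; do
            if [ "$server" -ne "$client" ]; then
                echo "store path (server = $server, client = $client, conninfo = 'dbname=mydb host=db$server.example.com user=slony port=5432');"
            fi
            client=$((client + 1))
        done
        server=$((server + 1))
    done
)
echo "$paths"
```

With N nodes this yields N * (N - 1) paths.  If a firewall blocks
some of those connections, the corresponding statements are the ones
to remove or adjust in store_paths.slonik.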

4.  create_set.slonik

    This sets up the replication set consisting of the whole bunch of
    tables and sequences that make up your application's database
    schema.

    When you run this script, all that happens is that triggers are
    added on the origin node (node #1) that start collecting updates;
    replication won't start until step 5.

    There are two assumptions in this script that could be invalidated
    by circumstances:

     1.  That all of the tables and sequences have been included.

         This becomes invalid if new tables get added to your schema
         and don't get added to the TABLES list in the generator
         script.

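     2.  That all tables have been defined with primary keys.

         This *should* be the case soon if not already.  Slony-I
         needs a primary key (or a suitable candidate key) on each
         table in order to identify the rows it replicates.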
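
The relationship between the TABLES and SEQUENCES lists and the
replication set can be sketched as follows: each entry turns into
one "set add" statement, so anything missing from the lists is
simply not replicated.  This is an illustration, not the generator's
own code; the object names and the set comment are hypothetical, and
the lists are assumed to be whitespace-separated.

```shell
TABLES="public.my_table public.my_other_table"
SEQUENCES="public.my_sequence"

set_script=$(
    echo "create set (id = 1, origin = 1, comment = 'example replication set');"
    # Tables get consecutive ids within the set.
    id=1
    for t in $TABLES; do
        echo "set add table (set id = 1, origin = 1, id = $id, fully qualified name = '$t');"
        id=$((id + 1))
    done
    # Sequences are numbered separately from tables.
    id=1
    for s in $SEQUENCES; do
        echo "set add sequence (set id = 1, origin = 1, id = $id, fully qualified name = '$s');"
        id=$((id + 1))
    done
)
echo "$set_script"
```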

5.  subscribe_set_2.slonik

     And likewise subscribe_set_3.slonik, subscribe_set_4.slonik, and
     so on, if you set the number of nodes higher.

     This is the step that "fires up" replication.

     The assumption that the script generator makes is that all the
     subscriber nodes will want to subscribe directly to the origin
     node.  If you plan to have "sub-clusters", perhaps where there is
     something of a "master" location at each data centre, you may
     need to revise that.

     The slon processes really ought to be running by the time you
     attempt running this step.  To do otherwise would be rather
     foolish.
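
The "everyone subscribes to the origin" layout can be sketched as
one file per subscriber, each naming node 1 as its provider.  This
is an illustration only: the real generated files also need the
connection preamble, and "forward = yes" is an assumption made here
so that any subscriber could later act as a provider.

```shell
NUMNODES=4
confdir=$(mktemp -d)

# Nodes 2..NUMNODES each subscribe directly to node 1.
node=2
while [ "$node" -le "$NUMNODES" ]; do
    echo "subscribe set (id = 1, provider = 1, receiver = $node, forward = yes);" \
        > "$confdir/subscribe_set_$node.slonik"
    node=$((node + 1))
done
ls "$confdir"
```

For a "sub-cluster" arrangement, the provider in some of these
statements would be a node other than 1.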

Once all of these slonik scripts have been run, replication may be
expected to continue to run as long as slons stay running.

If you have an outage, where a database server or a server hosting
slon processes falls over, and it's not so serious that a database
gets mangled, then no big deal: just restart the postmaster and
restart slon processes, and replication should pick up.

If something does get mangled, then actions get more complicated:

1 - If the failure was of the "origin" database, then you probably want
   to use FAILOVER to shift the "master" role to another system.

2 - If a subscriber failed, and other nodes were drawing data from it,
   then you could submit SUBSCRIBE SET requests to point those other
   nodes to some node that is "less mangled."  That is not a real big
   deal; note that this does NOT require that they get re-subscribed
   from scratch; they can pick up (hopefully) whatever data they
   missed and simply catch up by using a different data source.
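
As a rough sketch, the slonik statements for the two cases might
look like the files written below.  The node numbers are
hypothetical, and in practice each file would also need the
connection preamble ahead of these statements.

```shell
confdir=$(mktemp -d)

# Case 1: the origin (node 1) is lost; promote node 2 (example ids).
cat > "$confdir/failover.slonik" <<'EOF'
failover (id = 1, backup node = 2);
EOF

# Case 2: subscriber node 3 failed while feeding node 4;
# point node 4 at node 2 instead (example ids).
cat > "$confdir/resubscribe.slonik" <<'EOF'
subscribe set (id = 1, provider = 2, receiver = 4, forward = yes);
EOF

cat "$confdir/failover.slonik" "$confdir/resubscribe.slonik"
```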

Once you have reacted to the loss by reconfiguring the surviving nodes
to satisfy your needs, you may want to recreate the mangled node.  See
the Slony-I Administrative Guide for more details on how to do that.
It is not overly profound; you need to drop out the mangled node, and
recreate it anew, which is not all that different from setting up
another subscriber.
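
Dropping the mangled node is a single slonik statement, sketched
below with hypothetical node numbers (node 3 is dropped, and the
request is submitted via surviving node 1).  As with the other
scripts, the connection preamble would need to come first, and the
Administrative Guide remains the reference for the full procedure.

```shell
confdir=$(mktemp -d)

# Remove failed node 3 from the cluster configuration (example ids).
cat > "$confdir/drop_node.slonik" <<'EOF'
drop node (id = 3, event node = 1);
EOF
cat "$confdir/drop_node.slonik"
```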