README
------------
Christopher Browne
cbbrowne@ca.afilias.info
2006-10-20
------------
The script configure-replication.sh is intended to allow the gentle
user to readily configure replication using the Slony-I replication
system for PostgreSQL. This script is based on the configuration
approach taken by the testbed scripts in slony1-engine/tests. A
customized version of this has been implemented for LedgerSMB.
For more general details about Slony-I, see the Slony-I Administrative Guide.
This script uses a number of environment variables to determine the
shape of the configuration. In many cases, the defaults should be at
least nearly OK...
Global:
CLUSTER - Name of Slony-I cluster
NUMNODES - Number of nodes to set up
PGUSER - name of PostgreSQL superuser controlling replication
PGPORT - default port number
PGDATABASE - default database name
TABLES - a list of fully qualified table names (i.e. complete with
namespace, such as public.my_table)
SEQUENCES - a list of fully qualified sequence names (i.e. complete with
namespace, such as public.my_sequence)
SLONIKCONFDIR - a directory in which to place the config files. If
not set, a unique directory under /tmp will be created for this
purpose.
Defaults are provided for ALL of these values, so that if you run
configure-replication.sh without setting any environment variables,
you will get a set of slonik scripts. They may not correspond, of
course, to any database you actually want to use...
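
As a minimal sketch of overriding the defaults, the variables can be exported before the script is run. Every value below is hypothetical, and the space-separated form of TABLES and SEQUENCES is an assumption; check the script itself if it expects a different separator.

```shell
# Hypothetical values; substitute your own cluster, database, and tables.
export CLUSTER=my_cluster
export NUMNODES=3
export PGUSER=postgres
export PGPORT=5432
export PGDATABASE=mydb
# Assumed to be space-separated lists of schema-qualified names.
export TABLES="public.my_table public.other_table"
export SEQUENCES="public.my_sequence"
# Keep the generated slonik scripts somewhere predictable.
export SLONIKCONFDIR="${TMPDIR:-/tmp}/slonik-my_cluster"
mkdir -p "$SLONIKCONFDIR"

# Run the generator only if it is present in the current directory.
if [ -x ./configure-replication.sh ]; then
    ./configure-replication.sh
fi
```
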
For each node, there are also four parameters; for node 1:
DB1 - database to connect to
USER1 - superuser to connect as
PORT1 - port
HOST1 - host
It is quite likely that DB*, USER*, and PORT* should be drawn from the
default PGDATABASE, PGUSER, and PGPORT values above; that sort of
uniformity is usually a good thing.
In contrast, HOST* values should be set explicitly for HOST1, HOST2,
..., as you don't get much benefit from the redundancy replication
provides if all your databases are on the same server!
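
The sketch below illustrates that convention: only HOST1 and HOST2 are set explicitly, and the other per-node values fall back to the global defaults. The host names are hypothetical, the helper show_node_settings is not part of configure-replication.sh, and the fallback to localhost for an unset host is an assumption made only for this illustration.

```shell
# Global defaults (hypothetical values used when nothing is set).
PGDATABASE=${PGDATABASE:-mydb}
PGUSER=${PGUSER:-postgres}
PGPORT=${PGPORT:-5432}
NUMNODES=${NUMNODES:-2}

# Hosts are set explicitly, one per node.
HOST1=db1.example.com
HOST2=db2.example.com
export HOST1 HOST2

# Print the effective connection settings for each node.
# DBn, USERn, and PORTn fall back to the global defaults when unset.
show_node_settings() {
    n=1
    while [ "$n" -le "$NUMNODES" ]; do
        eval "db=\${DB$n:-\$PGDATABASE}"
        eval "user=\${USER$n:-\$PGUSER}"
        eval "port=\${PORT$n:-\$PGPORT}"
        eval "host=\${HOST$n:-localhost}"   # localhost fallback is an assumption
        echo "node $n: dbname=$db host=$host user=$user port=$port"
        n=$((n + 1))
    done
}

show_node_settings
```
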
The slonik scripts are generated in a temporary directory under /tmp
or, if SLONIKCONFDIR is set, in the directory it names. They are
used as follows:
1. preamble.slonik is a "preamble" containing connection info used by
the other scripts.
Verify the info in this one closely; you may want to keep this
permanently to use with other maintenance you may want to do on the
cluster.
2. create_nodes.slonik
This is the first script to run; it sets up the requested nodes as
being Slony-I nodes, adding in some Slony-I-specific config tables
and such.
You can/should start slon processes any time after this step has run.
3. store_paths.slonik
This is the second script to run; it indicates how the slons
should intercommunicate. It assumes that all slons can talk to
all nodes, which may not be a valid assumption in a
complexly-firewalled environment.
4. create_set.slonik
This sets up the replication set consisting of the whole bunch of
tables and sequences that make up your application's database
schema.
When you run this script, all that happens is that triggers are
added on the origin node (node #1) that start collecting updates;
replication won't start until step 5 (subscribe_set_2.slonik).
There are two assumptions in this script that could be invalidated
by circumstances:
1. That all of the tables and sequences have been included.
This becomes invalid if new tables get added to your schema
and don't get added to the TABLES list in the generator
script.
2. That all tables have been defined with primary keys.
This *should* be the case soon if not already.
5. subscribe_set_2.slonik
Likewise subscribe_set_3.slonik, subscribe_set_4.slonik, and so on,
if you set the number of nodes higher...
This is the step that "fires up" replication.
The assumption that the script generator makes is that all the
subscriber nodes will want to subscribe directly to the origin
node. If you plan to have "sub-clusters", perhaps where there is
something of a "master" location at each data centre, you may
need to revise that.
The slon processes really ought to be running by the time you
attempt running this step. To do otherwise would be rather
foolish.
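
The order of the steps above can be sketched as a dry run that prints the slonik invocations rather than executing them. The helper list_slonik_steps and the fallback directory /tmp/slonik-config are hypothetical, and the sketch assumes each generated script pulls in preamble.slonik on its own; if yours do not, feed the preamble to slonik ahead of each script.

```shell
# Print the generated script names in the order they should be run.
list_slonik_steps() {
    numnodes=$1
    echo create_nodes.slonik
    echo store_paths.slonik
    echo create_set.slonik
    # One subscription script per subscriber node (nodes 2..N).
    n=2
    while [ "$n" -le "$numnodes" ]; do
        echo "subscribe_set_$n.slonik"
        n=$((n + 1))
    done
}

confdir=${SLONIKCONFDIR:-/tmp/slonik-config}   # hypothetical fallback path

# Dry run: show the commands instead of running them.
# Remember to start the slon processes after create_nodes.slonik
# and before the first subscribe_set script.
for script in $(list_slonik_steps "${NUMNODES:-2}"); do
    echo "slonik $confdir/$script"
done
```
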
Once all of these slonik scripts have been run, replication may be
expected to continue to run as long as slons stay running.
If you have an outage, where a database server or a server hosting
slon processes falls over, and it's not so serious that a database
gets mangled, then no big deal: Just restart the postmaster and
restart slon processes, and replication should pick up.
If something does get mangled, then actions get more complicated:
1 - If the failure was of the "origin" database, then you probably want
to use FAIL OVER to shift the "master" role to another system.
2 - If a subscriber failed, and other nodes were drawing data from it,
then you could submit SUBSCRIBE SET requests to point those other
nodes to some node that is "less mangled." That is not a real big
deal; note that this does NOT require that they get re-subscribed
from scratch; they can pick up (hopefully) whatever data they
missed and simply catch up by using a different data source.
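
For illustration, the two recovery actions might be written as small slonik scripts like the ones this sketch prints. The helper functions, the preamble path, the node numbers, and set id 1 are all hypothetical; check the FAILOVER and SUBSCRIBE SET entries in the slonik command reference for your Slony-I version before running anything like this.

```shell
# Emit a slonik script that fails over from a dead origin to a backup node.
write_failover() {
    preamble=$1
    failed=$2
    backup=$3
    cat <<EOF
include <$preamble>;
failover (id = $failed, backup node = $backup);
EOF
}

# Emit a slonik script that points a receiver at a different provider.
write_resubscribe() {
    preamble=$1
    set_id=$2
    provider=$3
    receiver=$4
    cat <<EOF
include <$preamble>;
subscribe set (id = $set_id, provider = $provider, receiver = $receiver, forward = yes);
EOF
}

# Hypothetical case 1: origin node 1 is lost, node 2 takes over.
write_failover /tmp/slonik-config/preamble.slonik 1 2

# Hypothetical case 2: node 4 used to feed from failed node 3;
# re-point it at node 2 for set 1.
write_resubscribe /tmp/slonik-config/preamble.slonik 1 2 4
```
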
Once you have reacted to the loss by reconfiguring the surviving nodes
to satisfy your needs, you may want to recreate the mangled node. See
the Slony-I Administrative Guide for more details on how to do that.
It is not overly profound; you need to drop out the mangled node, and
recreate it anew, which is not all that different from setting up
another subscriber.