Doing switchover and failover with Slony-I

Foreword

Slony-I is an asynchronous replication system. Because of that, it is almost certain that at the moment the current origin of a set fails, the last transactions committed on it have not yet propagated to the subscribers. Systems always fail under heavy load, and you know it. Thus the goal is to prevent the main server from failing, and the best way to do that is frequent maintenance.

Opening the case of a running server is not exactly what we all consider professional system maintenance. And interestingly, those users who use replication for backup and failover purposes are usually the ones with a very low tolerance for words like "downtime". To meet these requirements, Slony-I has not only failover capabilities, but also controlled master role transfer features.

It is assumed in this document that the reader is familiar with the slonik utility and knows at least how to set up a simple 2-node replication system with Slony-I.

Switchover

We assume a current "origin" node1 (AKA master) with one "subscriber" node2 (AKA slave). A web application on a third server is accessing the database on node1. Both databases are up and running and replication is more or less in sync.

Step 1) At the time of this writing, switchover to another server requires the application to reconnect to the database. So in order to avoid any complications, we simply shut down the web server. Users who use pgpool for the application's database connections can shut down the pool only.

Step 2) A small slonik script executes the following commands (a complete, runnable sketch follows at the end of this section):

    lock set (id = 1, origin = 1);
    wait for event (origin = 1, confirmed = 2);
    move set (id = 1, old origin = 1, new origin = 2);
    wait for event (origin = 1, confirmed = 2);

After these commands, the origin (master role) of data set 1 is on node2. It is not simply transferred; it is done in a fashion such that node1 becomes a fully synchronized subscriber, actively replicating the set. So the two nodes have completely switched roles.

Step 3) After reconfiguring the web application (or pgpool) to connect to the database on node2 instead, the web server is restarted and resumes normal operation.

Done in one shell script that performs the shutdown, slonik, config file moves and startup all together, this entire procedure takes less than 10 seconds. Sketches of both the slonik script and such a wrapper follow.
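The Step 2 commands above do not run on their own; slonik needs a preamble naming the cluster and giving admin conninfo strings for the nodes involved. A minimal sketch, in which the cluster name "mycluster" and the connection strings are assumptions to be replaced with your own:

    #!/bin/sh
    # Sketch of the Step 2 slonik script. Cluster name and conninfo
    # strings are placeholders; substitute your real ones.
    slonik <<_EOF_
        cluster name = mycluster;
        node 1 admin conninfo = 'dbname=mydb host=node1 user=slony';
        node 2 admin conninfo = 'dbname=mydb host=node2 user=slony';

        lock set (id = 1, origin = 1);
        wait for event (origin = 1, confirmed = 2);
        move set (id = 1, old origin = 1, new origin = 2);
        wait for event (origin = 1, confirmed = 2);
    _EOF_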
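And the whole switchover wrapped in one shell script might look like the following sketch. The service commands, file paths and the idea of keeping one pre-built application config file per node are assumptions about the environment, not something Slony-I provides:

    #!/bin/sh
    # Sketch of the complete switchover: stop the pool, move the set,
    # point the application at node2, start the pool again.
    # All paths, file names and service commands are placeholders.

    /etc/init.d/pgpool stop                  # or shut down the web server

    slonik /usr/local/etc/move_set.slonik    # the slonik script sketched above

    # Swap in the application config that points at node2.
    mv /usr/local/etc/webapp.conf /usr/local/etc/webapp.conf.node1
    mv /usr/local/etc/webapp.conf.node2 /usr/local/etc/webapp.conf

    /etc/init.d/pgpool start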
It is now possible to simply shut down node1 and do whatever is required. When node1 is restarted later, it will start replicating again and eventually catch up. At that point the whole procedure is executed again with the node IDs exchanged, and the original configuration is restored.

Failover

Because committed transactions that have not yet been replicated can be lost, failover is the worst thing that can happen in a master-slave replication scenario. If there is any possibility of bringing back the failed server, even if only for a few minutes, we strongly recommend that you follow the switchover procedure above.

Slony-I does not provide any automatic detection for failed systems. Abandoning committed transactions is a business decision that cannot be made by a database. If someone wants to put the commands below into a script executed automatically from the network monitoring system, well ... it's your data.

Step 1) The slonik command

    failover (id = 1, backup node = 2);

causes node2 to assume the ownership (origin) of all sets that have node1 as their current origin. If there were more nodes, all direct subscribers of node1 would be instructed that this is happening. Slonik would also query all direct subscribers to figure out which node has the highest replication status (latest committed transaction) for each set, and the configuration would be changed so that node2 first applies those last-minute changes before actually allowing write access to the tables.

In addition, all nodes that subscribed directly from node1 will now use node2 as their data provider for the set. This means that after the failover command has succeeded, no node in the entire replication setup will receive anything from node1 any more.

Step 2) Reconfigure and restart the application (or pgpool) to cause it to reconnect to node2.

Step 3) After the failover is complete and node2 accepts write operations against the tables, remove all remnants of node1's configuration information with the slonik command

    drop node (id = 1, event node = 2);

A runnable sketch of these three steps appears at the end of this document.

After failover, getting back node1

After the above failover, the data stored on node1 must be considered out of sync with the rest of the nodes. Therefore, the only way to get node1 back and eventually transfer the master role to it is to rebuild it from scratch as a slave, let it catch up, and then follow the switchover procedure. A sketch of the rebuild follows as well.
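For reference, here is the failover procedure in runnable form, reusing the hypothetical cluster name and conninfo strings from the switchover sketch. The drop node command is issued as a separate slonik invocation because Step 2, repointing the application, happens in between:

    #!/bin/sh
    # Failover sketch: node1 is dead, node2 takes over its sets.
    # Cluster name and conninfo strings are placeholders.

    # Step 1: make node2 the origin of all sets node1 owned.
    slonik <<_EOF_
        cluster name = mycluster;
        node 1 admin conninfo = 'dbname=mydb host=node1 user=slony';
        node 2 admin conninfo = 'dbname=mydb host=node2 user=slony';
        failover (id = 1, backup node = 2);
    _EOF_

    # Step 2: repoint the application (or pgpool) at node2 here.

    # Step 3: once node2 accepts writes, drop the dead node.
    slonik <<_EOF_
        cluster name = mycluster;
        node 1 admin conninfo = 'dbname=mydb host=node1 user=slony';
        node 2 admin conninfo = 'dbname=mydb host=node2 user=slony';
        drop node (id = 1, event node = 2);
    _EOF_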
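Rebuilding node1 from scratch as a slave is then the same as adding a brand new node. A minimal sketch with the same hypothetical names; it assumes a freshly created database on node1 that already contains the table definitions but no data:

    #!/bin/sh
    # Sketch of re-adding node1 as a subscriber after a failover.
    # Names, IDs and conninfo strings are placeholders.
    slonik <<_EOF_
        cluster name = mycluster;
        node 1 admin conninfo = 'dbname=mydb host=node1 user=slony';
        node 2 admin conninfo = 'dbname=mydb host=node2 user=slony';

        # Re-create node1 and the communication paths in both directions.
        store node (id = 1, comment = 'rebuilt former origin', event node = 2);
        store path (server = 1, client = 2, conninfo = 'dbname=mydb host=node1 user=slony');
        store path (server = 2, client = 1, conninfo = 'dbname=mydb host=node2 user=slony');

        # Subscribe node1 to set 1 with node2 as its provider; the initial
        # data copy starts once a slon daemon is running for node1.
        subscribe set (id = 1, provider = 2, receiver = 1, forward = yes);
    _EOF_

    # Then start a slon daemon for the rebuilt node, for example:
    #   slon mycluster 'dbname=mydb host=node1 user=slony'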