Gentoo - PostgreSQL replication (native)

By: John McFarlane <john.mcfarlane@rockfloat.com>
Last updated: 02/14/2007 @ 10:00

Abstract:
This document will go thru a step by step implementation of native PostgreSQL replication.



1. Introduction

Database replication is awesome. I've heard some describe replication as a nightmare that's only used when the database wasn't designed properly in the first place. I think that's rubbish. Before you move on I really recommend that you read up on replication and more specifically database replication . Be sure to take note that replication is very different than clustering. Replication is a master-slave technology while true clustering is multi-master. For the time being, if you really need clustering you should prolly look towards Oracle Rac.

This tutorial assumes that you want to explore PostgreSQL's native implementation of replication, which is probably one of the easier to understand and implement. I belive it was a project named "DBMirror" that was moved into the official PostgreSQL Contrib beginning with 7.3.1. There are all sorts of more sophisticated replication packages out there, if you need something more feature rich:

  1. Slony-1
  2. PGreplication
  3. PGCluster
  4. PGPool
  5. DBBalancer
  6. Open Minds (Commercial)
REFERENCE

I'm finished with this step

2. Create a directory to hold the configuration

This howto will have you create a few files. Some are only used and then can be deleted, a few of them are actually used as config files for running the replication.

user# mkdir cluster # Make this wherever you want it
user# cd cluster
    	
I'm finished with this step

3. Installation

Because the replication features are actually part of PostgreSQL all you really need is to make sure you have postgresql and the PostgreSQL Perl module pgperl. For us Gentoo cats this means:

root# emerge --sync
root# emerge -a postgresql  #Answer "N" if you already have the latest
root# emerge -a pgperl
    	
It's important to know the location of a few files as we'll need them later. Here's a list of what you'll need:

user# equery files postgresql | grep \
	-e DBMirror \
	-e MirrorSetup \
	-e AddTrigger \
	-e slaveDatabase \
	-e pending.so
	
# Should return a list like this:
/usr/bin/DBMirror.pl
/usr/lib/postgresql/pending.so  # This is what provides the love
/usr/share/postgresql/contrib/AddTrigger.sql
/usr/share/postgresql/contrib/MirrorSetup.sql
/usr/share/postgresql/contrib/slaveDatabase.conf

    	
Tip If you don't get this list of files Stop! You don't have everything you need. If you happen to be running another distribution you might need to install the contrib package. For example on Debian you should: apt-get install postgresql-contrib-8.x
I will always use full paths when referring to the above files, but it helps to know that the postgresql install actually installed them.
I'm finished with this step

4. Create the master/slave databases

First you need to create two databases, then you need to populate their schemas.
  1. 
    user# createdb -U postgres -E UTF-8 -O postgres cluster-master
    user# createdb -U postgres -E UTF-8 -O postgres cluster-slave01
    user# createdb -U postgres -E UTF-8 -O postgres cluster-slave02
        			
  2. Create a simple schema file for use with populating all three:
    
    user# nano -w cluster.sql
    
    # Make the file have this content:
    CREATE TABLE car(
    	uid SERIAL PRIMARY KEY,
    	name VARCHAR,
    	make VARCHAR,
    	model VARCHAR
    );
        			
  3. Populate the schema for all of the clustered db(s):
    
    user# psql -U postgres cluster-master < cluster.sql
    user# psql -U postgres cluster-slave01 < cluster.sql
    user# psql -U postgres cluster-slave02 < cluster.sql
        			
I'm finished with this step

5. Configure the master for replication

Now we need to update the schema of the master so it can understand the types of questions the slaves will be asking of it:

user# psql -U postgres cluster-master < /usr/share/postgresql/contrib/MirrorSetup.sql
    	
The master also needs a trigger that does the work of exposing the desired tables to the slaves. This means you need one trigger per table that you want replicated. This makes it easy to only replicate the certain tables. This makes it hard if you want to replicate everything, deal with it :)

user# cp /usr/share/postgresql/contrib/AddTrigger.sql AddTrigger.sql
user# nano -w AddTrigger.sql

# Make it look like this
-- Adjust this setting to control where the objects get created.
SET search_path = public;

CREATE TRIGGER "mirror_car"
AFTER INSERT OR DELETE OR UPDATE ON "car"
FOR EACH ROW EXECUTE PROCEDURE "recordchange" ();

--Uncomment this if you also want to replicate the "boat" table:
--CREATE TRIGGER "mirror_boat"
--AFTER INSERT OR DELETE OR UPDATE ON "boat"
--FOR EACH ROW EXECUTE PROCEDURE "recordchange" ();
    	
Now we need to actually create the trigger(s), this step is similar to when we populated the schema;

user# psql -U postgres cluster-master < AddTrigger.sql
    	
From a schema perspective, the master is now ready to replicate.
I'm finished with this step

6. Configure the slave(s) for replication

The slave configuration is a bit different. You don't change the schema, you simply create a text file per slave. We will make one file per slave:

user# cp /usr/share/postgresql/contrib/slaveDatabase.conf slave01.conf
user# cp /usr/share/postgresql/contrib/slaveDatabase.conf slave02.conf    	
    	
You'll now need to edit both files to specify the details of each slave. Here's what slave01 should look like, you can model slave02.conf after it:

#########################################################################
# Config file for DBMirror.pl
# This file contains a sample configuration file for DBMirror.pl
# It contains configuration information to mirror data from 
# the master database to a single slave system.
#
# $PostgreSQL: pgsql/contrib/dbmirror/slaveDatabase.conf,v 1.3 2004/09/10 04:31:06 neilc Exp $
#######################################################################

$masterHost = "localhost";
$masterDb = "cluster-master";
$masterUser = "postgres";
$masterPassword = "postgrespassword";

# Where to email Error messages to
# $errorEmailAddr = "me@mydomain.com";

$slaveInfo->{"slaveName"} = "db-slave01";
$slaveInfo->{"slaveHost"} = "localhost";
$slaveInfo->{"slaveDb"} = "cluster-slave01";
$slaveInfo->{"slavePort"} = 5432;
$slaveInfo->{"slaveUser"} = "postgres";
$slaveInfo->{"slavePassword"} = "postgrespassword";
# If uncommented then text files with SQL statements are generated instead
# of connecting to the slave database directly. 
# slaveDb should then be commented out.
# $slaveInfo{"TransactionFileDirectory"} = '/tmp';

#
# The number of seconds dbmirror should sleep for between checking to see
# if more data is ready to be mirrored.
$sleepInterval = 60;
    	
I'm finished with this step

7. Final magic to bring it all together

Currently the master and slave(s) all support replication, yet the master doesn't know who it's supposed to replicate to. We need to essentially tell the master who her children are:

user# psql -U postgres cluster-master

-- We're now in the psql subsystem
cluster-master=# INSERT INTO dbmirror_mirrorhost VALUES (1, 'db-slave01');
INSERT 0 1
cluster-master=# INSERT INTO dbmirror_mirrorhost VALUES (2, 'db-slave02');
INSERT 0 1

# Exit the psql subsystem:
cluster-master=# \q
    	
I'm finished with this step

8. Startup replication

Ok, here we go. We are going to fire up two replication services, one for each slave:

user# /usr/bin/DBMirror.pl slave01.conf >/dev/null 2>/dev/null &
user# /usr/bin/DBMirror.pl slave02.conf >/dev/null 2>/dev/null &
    	
Now the slaves will check every 60 seconds to see if there is anything they should know about. You can update this time in slave0x.conf.
I'm finished with this step

9. Test to make sure it's really working

We will perform all of this testing by doing database queries. Since we'll need to jump around from db to db, I'll do things in an awkward way that happens to lend it self to cutting and pasting. You do like to cut and paste don't you?

# Validate all three db's are in fact empty:
user# psql -U postgres cluster-master -c 'SELECT * FROM car;'

 uid | name | make | model 
-----+------+------+-------
(0 rows)

user# psql -U postgres cluster-slave01 -c 'SELECT * FROM car;'

 uid | name | make | model 
-----+------+------+-------
(0 rows)

user# psql -U postgres cluster-slave02 -c 'SELECT * FROM car;'

 uid | name | make | model 
-----+------+------+-------
(0 rows)


# Insert a new car into our master database:
user# psql -U postgres cluster-master -c "INSERT INTO car(name, make, model) VALUES('foo', 'Honda', 'Accord');"
INSERT 0 1

# See what's in the first slave:
user# psql -U postgres cluster-slave01 -c 'SELECT * FROM car;'

 uid | name | make  | model  
-----+------+-------+--------
   1 | foo  | Honda | Accord
(1 row)


# Sweet mother!, is it in slave02 too?:
user# psql -U postgres cluster-slave02 -c 'SELECT * FROM car;'

 uid | name | make  | model  
-----+------+-------+--------
   1 | foo  | Honda | Accord
(1 row)

    	
Well, there you have it. One master database that's replicating to two cute little children. Good job, get yourself a cookie.
I'm finished with this step

10. How to replicate accross a network

This is pretty easy, just update the slave0x.conf files to specify the actual hostname on the network (not localhost) and make the corresponding change(s) when you run the INSERT statements. You also need to run the slave commands on... the slaves :)

When I get time I'll flesh this out in more detail

I'm finished with this step

Changelog: Date Description
02/14/2007 @ 10:00 Initial creation

This document was originally created on 02/14/2007


Conventions and tips for this howto document:
  1. This howto is intended for >=Postgres-7.3.1
  2. This howto assumes you have portage-utils installed
  3. This howto assumes your db superuser is: postgres
  4. This howto assumes you have postgres configured to support TCP/IP

Disclaimer:
This page is not endorsed by gentoo.org or any other cool cats. Any information provided in this document is to be used at your own risk.