Slony-I actually does a lot of its necessary maintenance itself, in a "cleanup" thread:
Deletes old data from various tables in the Slony-I cluster's namespace, notably entries in sl_log_1, sl_log_2, and sl_seqlog.
Vacuums certain tables used by Slony-I. As of 1.0.5, this includes pg_listener; in earlier versions, you must vacuum that table heavily yourself, otherwise you will find replication slowing down, because Slony-I raises plenty of events, which leaves that table with plenty of dead tuples.
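Should you need to vacuum it by hand (on those earlier versions, or if the cleanup thread falls behind), a plain VACUUM of the catalog table does the job; a minimal sketch:

-- Reclaim dead tuples left in pg_listener by Slony-I's heavy event traffic.
VACUUM ANALYZE pg_catalog.pg_listener;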
In some versions (1.1 for certain, possibly 1.0.5) there is the option of not bothering to vacuum any of these tables if you are using something like pg_autovacuum to handle that vacuuming for you. Unfortunately, it has been quite possible for pg_autovacuum not to vacuum quite frequently enough, so you may prefer to use the internal vacuums. Vacuuming pg_listener "too often" is nowhere near as hazardous as not vacuuming it frequently enough.
Unfortunately, if you have long-running transactions, vacuums cannot clear out dead tuples that are newer than the eldest transaction that is still running. This will most notably lead to pg_listener growing large and will slow replication.
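To check whether a long-running transaction is holding vacuums back, you can look at pg_stat_activity; a minimal sketch (column names vary across PostgreSQL versions; this assumes a pre-9.0-era server with procpid and current_query):

-- List the oldest transactions still open; dead tuples newer than these
-- cannot be reclaimed by VACUUM until they finish.
SELECT procpid, usename, xact_start, current_query
  FROM pg_stat_activity
 WHERE xact_start IS NOT NULL
 ORDER BY xact_start
 LIMIT 5;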
The Duplicate Key Violation bug has helped track down a number of rather obscure PostgreSQL race conditions, so that in modern versions of Slony-I and PostgreSQL, there should be little to worry about.
As of version 1.2, "log switching"
functionality is in place; every so often (by default, once per week,
though you may induce it by calling the stored
function logswitch_start()
), it seeks to switch
between storing data in sl_log_1 and sl_log_2 so that it may seek
to TRUNCATE the "elder" data.
That means that on a regular basis, these tables are completely cleared out, so that you will not suffer from them having grown to some significant size, due to heavy load, after which they are incapable of shrinking back down
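For instance, to request a log switch by hand on a cluster named MyCluster (the cluster name here is just an example), you would call the function in the cluster's schema:

-- Ask Slony-I to begin switching between sl_log_1 and sl_log_2; once the
-- retired log table's remaining entries are no longer needed, it is truncated.
SELECT "_MyCluster".logswitch_start();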
In version 2.0, DELETE is no longer used to clear out data in sl_log_1 and sl_log_2; instead, the log switch logic is induced frequently, every time the cleanup loop does not find a switch in progress, and these tables are purely cleared out via TRUNCATE. This eliminates the need to vacuum these tables.
Recent versions of PostgreSQL support an "autovacuum" process which notices when tables are modified, thereby creating dead tuples, and vacuums those tables, "on demand." It has been observed that this can interact somewhat negatively with Slony-I's own vacuuming policies on its own tables.
Slony-I requests vacuums on its tables immediately after completing the transactions that are expected to clean out old data, which should be the ideal time to do so. It appears that autovacuum may notice the changes a bit earlier and attempt vacuuming while those transactions are still incomplete, rendering the work fairly useless. It therefore seems preferable to configure autovacuum not to process the Slony-I-managed configuration tables.
The following query (change the cluster name to match your local configuration) on Postgres 8 versions up to and not including 8.4 will identify the tables that autovacuum should be configured not to process:
mycluster=# select oid, relname from pg_class where relnamespace = (select oid from pg_namespace where nspname = '_' || 'MyCluster') and relhasindex;
  oid  |   relname
-------+--------------
 17946 | sl_nodelock
 17963 | sl_setsync
 17994 | sl_trigger
 17980 | sl_table
 18003 | sl_sequence
 17937 | sl_node
 18034 | sl_listen
 18017 | sl_path
 18048 | sl_subscribe
 17951 | sl_set
 18062 | sl_event
 18069 | sl_confirm
 18074 | sl_seqlog
 18078 | sl_log_1
 18085 | sl_log_2
(15 rows)
The following query (for PostgreSQL versions up to and not including 8.4) will populate pg_catalog.pg_autovacuum with suitable configuration information:

INSERT INTO pg_catalog.pg_autovacuum
       (vacrelid, enabled, vac_base_thresh, vac_scale_factor,
        anl_base_thresh, anl_scale_factor, vac_cost_delay,
        vac_cost_limit, freeze_min_age, freeze_max_age)
SELECT oid, 'f', -1, -1, -1, -1, -1, -1, -1, -1
  FROM pg_catalog.pg_class
 WHERE relnamespace = (SELECT oid FROM pg_namespace
                        WHERE nspname = '_' || 'MyCluster')
   AND relhasindex;
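On PostgreSQL 8.4 and later, pg_catalog.pg_autovacuum no longer exists; the analogous effect can presumably be had with per-table autovacuum storage parameters, along these lines (the cluster name is just an example, and you would repeat this for each table returned by the query above):

-- Disable autovacuum on individual Slony-I tables (PostgreSQL 8.4+).
ALTER TABLE "_MyCluster".sl_log_1 SET (autovacuum_enabled = false);
ALTER TABLE "_MyCluster".sl_log_2 SET (autovacuum_enabled = false);
ALTER TABLE "_MyCluster".sl_seqlog SET (autovacuum_enabled = false);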
There are a couple of "watchdog" scripts available that monitor things, and restart the slon processes should they happen to die for some reason, such as a network "glitch" that causes loss of connectivity.
You might want to run them...
The "best new way" of managing slon(1) processes is via the combination of Section 7.1.2, which creates a configuration file for each node in a cluster, and Section 7.1.4, which uses those configuration files.
This approach is preferable to the older "watchdog" systems in that you can very precisely "nail down," in each config file, the exact desired configuration for each node, and need not be concerned with what options the watchdog script may or may not give you. This is particularly important if you are using log shipping, where forgetting the -a option could ruin your log shipped node, and thereby your whole day.
In the tools directory, you will find Section 4.3.1 scripts called test_slony_state.pl and test_slony_state-dbi.pl. One uses the Perl/DBI interface; the other uses the Pg interface.
Both do essentially the same thing, namely to connect to a Slony-I node (you can pick any one), and from that, determine all the nodes in the cluster. They then run a series of queries (read only, so this should be quite safe to run) which examine various Slony-I tables, looking for a variety of sorts of conditions suggestive of problems, including:
Bloating of tables like pg_listener, sl_log_1, sl_log_2, sl_seqlog
Listen paths
Analysis of Event propagation
Analysis of Event confirmation propagation
If communications are a little broken, replication may happen, but confirmations may not get back, which prevents nodes from clearing out old events and old replication data.
Running this once an hour or once a day can help you detect symptoms of problems early, before they lead to performance degradation.
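As a rough illustration of the confirmation-propagation check, the following query (assuming a cluster named MyCluster) reports how long ago each node last confirmed events from each origin; large ages suggest confirmations are not making it back:

-- Age of the newest confirmation from each (origin, receiver) pair.
SELECT con_origin, con_received,
       max(con_seqno)             AS last_confirmed_event,
       now() - max(con_timestamp) AS confirmation_age
  FROM "_MyCluster".sl_confirm
 GROUP BY con_origin, con_received
 ORDER BY confirmation_age DESC;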
In the directory tools may be found four scripts that may be used to do monitoring of Slony-I instances:
test_slony_replication is a Perl script to which you can pass connection information to get to a Slony-I node. It then queries sl_path and other information on that node in order to determine the shape of the requested replication set.
It then injects some test queries to a test table called slony_test, which is defined as follows, and which needs to be added to the set of tables being replicated:

CREATE TABLE slony_test (
    description text,
    mod_date timestamp with time zone,
    "_Slony-I_testcluster_rowID" bigint
        DEFAULT nextval('"_testcluster".sl_rowid_seq'::text) NOT NULL
);
The last column in that table was added by Slony-I because the table lacks a primary key of its own.
This script generates a line of output for each Slony-I node that is active for the requested replication set in a file called cluster.fact.log.
There is an additional finalquery
option that allows
you to pass in an application-specific SQL query that can determine
something about the state of your application.
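For example, a finalquery might report how fresh the newest application transaction is; this is purely illustrative, reusing the hypothetical transaction_table from the examples further below:

-- Illustrative application-specific check: age of the newest transaction.
SELECT now() - max(txn_time) AS newest_txn_age
  FROM transaction_table;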
log.pm is a Perl module that manages logging for the Perl scripts.
run_rep_tests.sh is a "wrapper" script that runs test_slony_replication.
If you have several Slony-I clusters, you might set up configuration in this file to connect to all those clusters.
nagios_slony_test is a script constructed to query the log files, so that you can run the replication tests every so often (we run them every 6 minutes) and have a system monitoring tool such as Nagios use this script to query the state indicated in those logs.
It seemed rather more efficient to have a cron job run the tests and have Nagios check the results rather than having Nagios run the tests directly. The tests can exercise the whole Slony-I cluster at once rather than Nagios invoking updates over and over again.
The methodology of the previous section is designed with a view to minimizing the cost of submitting replication test queries; on a busy cluster, supporting hundreds of users, the cost associated with running a few queries is likely to be pretty irrelevant, and the setup cost to configure the tables and data injectors is pretty high.
Three other methods for analyzing the state of replication have stood out:
For an application-oriented test, it has been useful to set up a view on some frequently updated table that pulls application-specific information.
For instance, one might look either at some statistics about a most recently created application object, or an application transaction. For instance:
create view replication_test as select now() - txn_time as age, object_name from transaction_table order by txn_time desc limit 1;
create view replication_test as select now() - created_on as age, object_name from object_table order by id desc limit 1;
There is a downside: This approach requires that you have regular activity going through the system that will lead to there being new transactions on a regular basis. If something breaks down with your application, you may start getting spurious warnings about replication being behind, despite the fact that replication is working fine.
The Slony-I-defined view sl_status provides information as to how up to date different nodes are. Its contents are only really interesting on origin nodes, as the events generated on other nodes are generally ignorable.
See also the Section 4.3.3 discussion.
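A simple check against that view, run on an origin node (assuming a cluster named MyCluster and the usual st_lag_* columns), might look like:

-- How far behind is each subscriber, in events and in time?
SELECT st_origin, st_received, st_lag_num_events, st_lag_time
  FROM "_MyCluster".sl_status
 ORDER BY st_lag_time DESC;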
slon(1) daemons generate some more-or-less verbose log files, depending on what debugging level is turned on. You might assortedly wish to:
Use a log rotator like Apache rotatelogs to have a sequence of log files so that no one of them gets too big;
Purge out old log files, periodically.
Create a slon service directory for use with svscan from daemontools; there is a script in the tools area that sets this up. It uses multilog in a pretty basic way, which seems to be standard for daemontools / multilog setups. If you want clever logging, see logrep below. Currently this script has very limited error handling capabilities.
For non-interactive use, set the following environment variables: BASEDIR, SYSUSR, PASSFILE, DBUSER, HOST, PORT, DATABASE, CLUSTER, SLON_BINARY. If any of the above are not set, the script asks for configuration information interactively.
BASEDIR: where you want the service directory structure for the slon to be created. This should not be the /var/service directory.
SYSUSR: the unix user under which the slon (and multilog) process should run.
PASSFILE: location of the .pgpass file to be used (default: ~sysusr/.pgpass).
DBUSER: the postgres user the slon should connect as (default: slony).
HOST: which database server to connect to (default: localhost).
PORT: which port to connect to (default: 5432).
DATABASE: which database to connect to (default: dbuser).
CLUSTER: the name of your Slony-I cluster (default: database).
SLON_BINARY: the full path name of the slon binary (default: which slon).
This uses tail -F to pull data from log files allowing you to use multilog filters (by setting the CRITERIA) to create special purpose log files. The goal is to provide a way to monitor log files in near realtime for "interesting" data without either hacking up the initial log file or wasting CPU/IO by re-scanning the same log repeatedly.
For non-interactive use, set the following environment variables: BASEDIR, SYSUSR, SOURCE, EXTENSION, CRITERIA. If any of the above are not set, the script asks for configuration information interactively.
BASEDIR: where you want the service directory structure for the logrep to be created. This should not be the /var/service directory.
SYSUSR: the unix user under which the service should run.
SOURCE: name of the service with the log you want to follow.
EXTENSION: a tag to differentiate this logrep from others using the same source.
CRITERIA: the multilog filter you want to use.
A trivial example of this would be to provide a log file of all slon ERROR messages, which could be used to trigger a Nagios alarm.
EXTENSION='ERRORS'
CRITERIA="'-*' '+* * ERROR*'"
(Reset the monitor by rotating the log using svc -a $svc_dir.)
A more interesting application is a subscription progress log.
EXTENSION='COPY'
CRITERIA="'-*' '+* * ERROR*' '+* * WARN*' '+* * CONFIG enableSubscription*' '+* * DEBUG2 remoteWorkerThread_* prepare to copy table*' '+* * DEBUG2 remoteWorkerThread_* all tables for set * found on subscriber*' '+* * DEBUG2 remoteWorkerThread_* copy*' '+* * DEBUG2 remoteWorkerThread_* Begin COPY of table*' '+* * DEBUG2 remoteWorkerThread_* * bytes copied for table*' '+* * DEBUG2 remoteWorkerThread_* * seconds to*' '+* * DEBUG2 remoteWorkerThread_* set last_value of sequence*' '+* * DEBUG2 remoteWorkerThread_* copy_set*'"
If you have a subscription log, then it is easy to determine whether a given slon is in the process of handling copies or other subscription activity. If the log isn't empty and doesn't end with a "CONFIG enableSubscription: sub_set:1" (or whatever set number you have subscribed to), then the slon is currently in the middle of initial copies.
If you happen to be monitoring the mtime of your primary slony logs to determine whether your slon has gone brain-dead, checking this is a good way to avoid mistakenly clobbering it in the middle of a subscribe. As a bonus, recall that since the slons are running under svscan, you only need to kill the slon (via the svc interface) and let svscan start it up again later. I've also found the COPY logs handy for following subscribe activity interactively.