Backup Solutions for Postgres

I am running Postgres 9.1 on a remote Ubuntu 12.04 LTS server. This is the only program running on this server.

The server has two HDDs and I am planning on backing up the server to the second HDD.

I found a program “Backup” in the System Settings, which I believe is Deja Dup. This seems like an easy way to back up the server. It is also an incremental solution, which would not have to back up the entire system each night - only what has changed.

I could also do a pg_dump to the second disk, but I would have to figure out a way to schedule this daily. The database is currently at 8GB and I am not sure how long a pg_dump would take, or whether, as the db grows, it would be completed during the late-night hours.

I am looking for opinions as to what might be a good way to perform an automated and reliable backup.

Thanks.

@David Schlam I actually discussed this at the Atlanta PostgreSQL Users Group a few months ago, so I will give you a high-level overview of my recommendations.

Since this is a remote server (and not “in house”), I would do this:

  • Write a script (perl/python/ruby/bash) to do a pg_dump to a file, with each table in its own file (see the sketch after this list).
  • Give each backup a date-timestamp in the name.
  • Cron the script to run every # hours (at least once a day).
  • Encrypt the backups (pgp/gpg on the file is fine).
  • Get the backups off the server (rsync to another server, upload them to S3, something). Without doing this, if the server crashes, you lose all your backups. And you want to make sure they are encrypted before they leave the server.
  • Use something like logrotate (or write your own) to get rid of old backups.
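
To make those steps concrete, here is a minimal bash sketch of that workflow. The database name, table list, backup directory, GPG recipient, and S3 bucket are all placeholders I made up, so adjust everything for your own setup:

[code]#!/bin/bash
# Rough sketch: dump each table to its own timestamped file, encrypt it,
# ship it off the server, and prune old local copies. All names/paths
# below are placeholders.
set -euo pipefail

DB="mydb"
TABLES="customers orders invoices"        # assumed table names
BACKUP_DIR="/var/backups/postgres"        # assumed local backup directory
GPG_RECIPIENT="backups@example.com"       # assumed GPG key
STAMP=$(date +%Y%m%d%H%M%S)

mkdir -p "$BACKUP_DIR"

for t in $TABLES; do
    OUT="$BACKUP_DIR/${DB}_${t}_${STAMP}.sql"
    # one file per table, with a date-timestamp in the name
    pg_dump -U postgres -t "$t" "$DB" > "$OUT"
    # encrypt before the file ever leaves the server
    gpg --yes -r "$GPG_RECIPIENT" -e "$OUT" && rm "$OUT"
done

# get the encrypted backups off the server (s3cmd is one option)
s3cmd put "$BACKUP_DIR"/*_"$STAMP".sql.gpg "s3://my-backup-bucket/$(hostname)/"

# drop local copies older than 14 days
find "$BACKUP_DIR" -name '*.sql.gpg' -mtime +14 -delete
[/code]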

If you would like more details about this, please let me know.

The more info, the better. Thanks.

Curious about a few things.

pg_dump - any specific reason why each table is exported separately? Less chance of corruption?

I have no experience with scripting. If starting from scratch, would bash be the logical language choice?

Does the pg_dump slow down database functions? Can this be done during normal business hours? Or, if done overnight, is there a way to calculate how long the process would take?

Two reasons:

  • You can restore a single table if you mess up a single table. If everything is in one file, you have to restore every table OR edit the backups to restore the single table. And you don’t want to be editing the backups when you are down.
  • If you need to restore several tables, you can do it in parallel with multiple files. If it is all in a single file, you are restoring one table at a time in a serial fashion, which is slower (see the example after this list).
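
For example, with one file per table, fixing a single broken table means replaying just that file, and several tables can be restored side by side. The file names below are hypothetical:

[code]# restore just the one table you messed up (file names are hypothetical)
psql -U postgres mydb < mydb_orders_20150601.sql

# or restore several tables in parallel by backgrounding each psql
for f in mydb_orders_20150601.sql mydb_invoices_20150601.sql; do
    psql -U postgres mydb < "$f" &
done
wait
[/code]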

Bash is fine. I prefer Perl as I have done Perl for over 20 years now (crap, I am old!). It really doesn’t matter which scripting language you use, as long as it is loaded on the dbase server.

I have considered writing a generic script that would query the dbase about what tables exist and then back them up one at a time. My current script(s) are very dbase specific and not generic.
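
A rough, untested sketch of what that generic version could look like, pulling the table list out of pg_tables (the database name and schema here are assumptions):

[code]#!/bin/bash
# Sketch of a generic per-table backup: ask Postgres which tables exist,
# then dump each one. "mydb" and the public schema are assumptions.
DB="mydb"
STAMP=$(date +%Y%m%d%H%M%S)

TABLES=$(psql -U postgres -At -d "$DB" \
    -c "SELECT tablename FROM pg_tables WHERE schemaname = 'public';")

for t in $TABLES; do
    pg_dump -U postgres -t "public.$t" "$DB" > "${DB}_${t}_${STAMP}.sql"
done
[/code]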

Technically yes, but 99% of the time you would never notice it. It has to do “queries” (I put those in quotes for a reason) against the dbase/table it is backing up. The 1% of the time you would notice it, you have other performance issues and pg_dump is just another straw on the performance camel’s back.

It depends… my clients hate it when I use that phrase. For most people you can do it during business hours; I just try to avoid the peak hours of the day, which is me being paranoid. If you are doing one pg_dump at a time (serial vs. parallel), performance-wise it is like one more customer hitting the database. I have a good-sized customer that backs up certain tables once per hour, all day and night, because they can’t lose more than an hour’s worth of data, and to them the performance hit during the backups is not noticeable.

There is no real way to calculate how long the process will take without just doing it. I can say that pg_dump is extremely fast; it could take seconds (for a small table) or minutes (for a large table). It is hard to tell because it depends on how large each record (row in the table) is and how many records (rows) there are.
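
The simplest way to get a real number is to time a dump against the actual database, something like this (the database name is a placeholder):

[code]# wrap the dump in `time` to see how long it takes; /dev/null avoids writing a file
time pg_dump -U postgres mydb > /dev/null
[/code]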

Thanks. I appreciate the logic behind the process. I think it helps make things less painful.

I think this script in this post does exactly what you just described:

link text

I have written a simple script to do the backups. It is PG_Backup. As I find the time I will flesh it out more. It is AS-IS.

The link is bad in the previous post and I can’t edit it.

corrected link: PG_Backups

[quote=186653:@scott boss]@David Schlam I actually discussed this at the Atlanta PostgreSQL Users Group a few months ago, so I will give you a high-level overview of my recommendations. […][/quote]

This is more or less the same scheme I use. I run the script below from a cron job once a day.

[code]#!/bin/bash
# Dump every database with pg_dumpall, compress, encrypt against a public
# cert, and push the result to S3.

BACKUP_FILE="/var/postgresql/backups/dbf_backup_$(date +'%Y%m%d%H%M%S').sql.gz.enc"
BACKUP_URL="s3://backup/db_backups/$(hostname)/$(basename "$BACKUP_FILE")"
S3_CMD="/usr/local/bin/s3cmd"

echo "Backing up postgresql databases to $BACKUP_FILE."
su - postgres -c "/usr/bin/pg_dumpall" | gzip | openssl smime -encrypt -binary -aes-256-cbc -out "$BACKUP_FILE" -outform DER /etc/ssl/localcerts/db_backup.pem
echo "Backup complete."

# Only upload if the encrypted dump file was actually created.
if [[ -e "$BACKUP_FILE" ]]; then
    echo "Sending backup to $BACKUP_URL."
    $S3_CMD put "$BACKUP_FILE" "$BACKUP_URL"
    $S3_CMD info "$BACKUP_URL"
fi[/code]
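
For what it is worth, a restore from a backup made this way would be roughly the reverse of that pipeline. The private key path below is my assumption (the key matching db_backup.pem), and the file name is a placeholder:

[code]# decrypt with the matching private key, unzip, and feed the SQL to psql
openssl smime -decrypt -binary -inform DER \
    -inkey /etc/ssl/localcerts/db_backup.key \
    -in dbf_backup_YYYYMMDDHHMMSS.sql.gz.enc \
  | gunzip \
  | psql -U postgres postgres
[/code]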

[quote=186747:@scott boss]The link is bad in the previous post and I can’t edit it.

corrected link: PG_Backups [/quote]
Not sure if this is correct, but the link takes me to bitbucket which tells me “You do not have access to this repository.”

it should take you to bitbucket… but it should be open to all… I will check it now.

you should have access now. I marked it private by accident.

Thanks. That fixed it.

Great!!

Just stumbled across this link:

Just wanted to post it here because I didn’t realize Postgres had automated backups already handled.

Sorry, that was pgAdmin.

If you or anyone else has issues with the script or enhancements you want, add a ticket/issue at Bitbucket and I can see what I can do.

I was able to set up the bash script from the Postgres wiki to back up to a second HDD on the server.

I know this doesn’t encrypt the files; however, I discovered that you can transfer the files securely to your local Mac using SSH.

This was actually pretty easy to do and gave me a remote backup of the entire directory.
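
For anyone wanting to do the same, a one-line rsync over SSH run from the Mac side is enough; the hostname and paths here are made up:

[code]# run on the local Mac: pull the server's backup directory down over SSH
rsync -avz -e ssh username@your.server.com:/var/backups/postgres/ ~/Backups/postgres/
[/code]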

The only remaining issue was setting up the cron job to back up the server. I set up the cron job using Gnome-schedule; the GUI interface made it a little easier to work with. I launched Gnome-schedule as root in a terminal:

sudo gnome-schedule

and then ran the cron job as me with the following command:

sudo -u <username> /usr/local/bin/pg_backup_rotated.sh

(Of course, replace <username> with your own username.)
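
For anyone who would rather skip Gnome-schedule, the equivalent plain crontab entry (added with `crontab -e` as that user) might look like this; the 1 a.m. time and log file are just examples:

[code]# run the backup script every night at 1:00 a.m. and append output to a log
0 1 * * * /usr/local/bin/pg_backup_rotated.sh >> $HOME/pg_backup.log 2>&1
[/code]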