Skip to content

Background reading: Locking (1970)

I’ve been reading some old papers about locking and MVCC in preparation for writing about MVCC in PostgreSQL, and for giving a talk at CouchCamp next week!

I just finished “Locking“, by Jim Gray. He discusses on semaphores, and makes the argument for implementing a locking scheduler to handle errors and deadlocks (which he calls interlocks, or a “deadly embrace” – a term I’m sad we’ve stopped using).

An example from the start of the paper illustrates the power of MVCC:

The classic example is an accounting file. Processes reading the file may share it
concurrently. However, a process requesting write access to the file blocks until all processes currently reading have released the file.

A lovely thing about Postgres’ MVCC is that readers (SELECT) don’t require this type of lock, and most writers don’t block readers. For SELECT, the only statements that will block it are those that make changes to tables which move all rows physically around (VACUUM FULL, CLUSTER, REINDEX, TRUNCATE), or make changes to table structure (ALTER TABLE, DROP TABLE).

Have a look at the explicit locking docs for more detail on the lock modes automatically used by PostgreSQL.

Explaining MVCC in Postgres: system defined columns

I’m playing around with some diagrams for explaining MVCC that I’ll be posting here over the next few days. Not sure if I’ll end up giving up on slides and just use a whiteboard for the talk. I made an illustrated shared buffers deck to go along with Greg Smith’s excellent talk on shared buffers a while back. This is the beginning of a talk that I hope will emulate that.

Here are my first few slides, showing the system-defined columns. The next few slides will describe optimizations PostgreSQL has for managing the side effects of our pessimistic rollback strategy, and reducing IO during vacuuming and index updates.

Variable substitution with psql

Updated: Thanks @johto for s/:bar/:foo/. :)

A coworker asked about variable substitution with psql using \set, and so I looked into it a bit further.

You definitely can do things like this:

16:55 sdeckelmann@[local]:5432|postgres=> \set test 'select * from :foo limit 10;'
16:56 sdeckelmann@[local]:5432|postgres=> \set foo 'test'
16:56 sdeckelmann@[local]:5432|postgres=> :test
myint
-------
1
2
3
4
5
6
7
8
9
10
(10 rows)

But, what about something like this:


=> \set test 'select * from :var limit 10;'
=> :test mytable

Unfortunately, this isn’t supported.

The best you could do is something pathological like:

=> \set s 'select * from '
=> \set pr ' limit 10;'
=> :s mytable :pr
=> :s test :pr
myint
-------
1
2
3
4
5
6
7
8
9
10
(10 rows)

Using logger with pg_standby

Piping logs to syslog is pretty useful for automating log rotation and forwarding lots of different logs to a central log server.

To that end, the command-line utility ‘logger’ is nice for piping output from utilities like pg_standby without having to add syslogging code to the utility itself. Another thing is that logger comes by default with modern packages of syslog.

Here’s an easy way to implement this:


restore_command = 'pg_standby -d -s 2 -t /pgdata/trigger /shared/wal_archive/ %f %p %r 2>&1 | logger -p local3.info -t pgstandby'

Online aggregation paper from 1997 and PSU’s database reading group

A couple weeks ago, Mark Wong and I took a field trip over to the Database Reading Group at Portland State University. It’s a group of students and professors that meet weekly throughout the school year to go over research papers. The papers are picked by the participants, and vary in topic from obscure to very practical.

This week’s paper reading was led by Professor Len Shapiro, and titled “Online Aggregation“. The paper is considered a foundational paper about SQL aggregates (like COUNT() or AVERAGE), and was published in 1997 by researchers from UC Berkeley and IBM. It’s also precursor to research into query parallelization and streaming databases. It was also awarded the SIGMOD “Test of Time” award in 2007, and is cited by over 170 other papers in the ACM archive.

The basic idea behind the paper centered around how to improve user experience in reporting results of aggregate queries – asking questions about how to solve three key problems when solving aggregates: blocking, fairness and control (from a user’s perspective). Roughly: Blocking is what happens when some part of the system waits and doesn’t return results to the user as a result of the waiting. Fairness concerns whether certain types of operations prevent certain groups of data from being processed (the example given had to do with GROUP BY and groups being processed one at a time). Control concerns whether or not a user can exert control over the speed of computation applied to a group (example given being a lever that “speeds up” processing of a set).

One insight from the paper is how online aggregates should be treated differently than traditional query processing – which might favor expensive plans involving sorts so that the output is ordered. When you’re dealing with online aggregates, you prefer unordered, or ideally random order, because your intermediate results will be more representative of the ultimate result. I guess that’s probably obvious once you think about it, but the paper provided some concrete examples.

Another interesting thought experiment involving the planner is how you pick plans that favor non-blocking, fairness and user control. Each of those properties is not narrowly defined, and changes based on individual user expectation. Professor Kristen Tufte mentioned that she’d be interested in how the ideas presented in this paper would be applied today, and Professor David Meier brought up that we might most be interested in applications involving managing Hadoop.

Prof Meier also brought up an interesting paper involving alternating nested loop joins during a discussion about optimizing JOIN algorithms for online aggregates. Another cool thing about the paper is that it involved modifications to Postgres! Granted, it was Postgres95, which doesn’t resemble the modern PostgreSQL 9.0 very much. But it was nice to revisit research that used Postgres that’s still relevant today.

Customizing the RPMs from pgrpms.org

To pick up where Devrim left off in customizing RPMs, here are some more tips for getting your very own RPMs built:

  • Create a VM with your favorite operating system (I’m using versions of CentOS). I need both 32-bit OS and 64-bit OS. This is much easier to manage with separate, local VMs.
  • Install spectool (available here), and SVN
  • The other dependancies were: gcc glibc-devel bison flex python-devel tcl-devel readline-devel zlib-devel openssl-devel krb5-devel e2fsprocs-devel libxml2-devel libxslt-devel pam-devel
  • Edit the postgresql-$VERSION.spec file to your liking: If you’re adding patches, you need to add them in TWO places – first in the Patch#: group, and then again below where the %patch# series starts. Finally, if you’re adding an entirely new package (say in 8.2, pg_standby in contrib), you’ll need to also add the binary (or library, or whatever) to the appropriate %files clause later in the spec file. It’s also a good idea to modify ‘Release’. Here’s a sample diff of my spec file:


--- postgresql-8.2.spec (revision 188)
+++ postgresql-8.2.spec (working copy)
@@ -74,7 +74,7 @@
Summary: PostgreSQL client programs and libraries
Name: postgresql
Version: 8.2.17
-Release: 1PGDG%{?dist}
+Release: 1test%{?dist}
License: BSD
Group: Applications/Databases
Url: http://www.postgresql.org/
@@ -95,7 +95,9 @@
Patch4: postgresql-test.patch
Patch6: postgresql-perl-rpath.patch
Patch8: postgresql-prefer-ncurses.patch
+Patch7: postgresql-pgstat-dir.patch
Patch9: postgresql-use-zoneinfo.patch
+Patch10: pg_standby.patch

Buildrequires: perl glibc-devel bison flex
Requires: /sbin/ldconfig initscripts
@@ -282,7 +284,9 @@
%patch4 -p1
%patch6 -p1
%patch8 -p1
+%patch7 -p1
%patch9 -p1
+%patch10 -p1

pushd doc
tar -zcf postgres.tar.gz *.html stylesheet.css
@@ -604,6 +608,7 @@
%{_bindir}/pg_controldata
%{_bindir}/pg_ctl
%{_bindir}/pg_resetxlog
+%{_bindir}/pg_standby
%{_bindir}/postgres
%{_bindir}/postmaster
%{_mandir}/man1/initdb.*

How have you customized RPMs using this repo? Share your .spec files!

PDXPUG Day 2010: 9.0 and all sorts of good stuff

I’m nearly ready for PDXPUG Day 2010.

We’ve got some fun talks lined up from leaders in the Postgres community in town for OSCON. It’s free to attend. We’ll at the Oregon Convention Center, room D131.

Stop by sometime during the day if you’re in town!

Here’s our schedule:

10am: What’s new in 9.0 – Selena Deckelmann
11am: ORMs and Their Discontents – Christophe Pettus
1:30pm: Mining Your Logs For Fun and Profit – Josh Berkus
2:30pm: PORTAL – Dan Colish
3:30pm: An Introduction to Managing and Troubleshooting PostgreSQL on Windows – Tim Bruce
4:30pm: PostgreSQL in Brazilian Army and Air Force – Luis Dosso

6pm-10pm: Party at the Gotham Tavern!

Foocamp 2010: lovely, expectant, reflective

This has been sitting in my edit queue for too long… here goes.

So, I was invited to Foocamp this year. Back in February, I attended KiwiFoo at the invitation of Nat Torkington. I realize now (sorry, Nat!) that I never blogged about going there. I have some rough notes around that I may try to edit down.

I’ll avoid any direct comparisons for a moment, and just talk about what it was like to dive into a slice of Silicon Valley, dislocated into the countryside for a weekend.

When I arrived, I felt immediately at home, welcomed and appreciated. Sara Winge was so helpful and easy-going about the Tesla Coil Josh brought. (Thanks for the ride, Josh!) I spent a lot of evening time educating people about avoiding touching the sparks (it may be a pretty toy, but it is in fact a dangerous one!) and also playing around with a metal glove that was rigged up so you could get a bit closer in to the coil.

I spent a couple delightful hours considering claude glass with Roberta Cairney. Over and over, I absorbed the positive energy tumbling out of Sumana. Once again, I spoke with Kiwis whose humor and affectionate swearing reminded me of home.

I held one session – about forgetting and the ethics of shifting our culture to assume everything may be on your permanent record, and the ways in which people try to opt out or game the system. I was surprised and overwhelmed by the people who attended. Rita King, Scott Berkun, Biella Coleman and danah boyd were all part of the discussion and I was able to talk about ideas I’ve had tumbling around in my head for years. I believe sysadmins/devops must have conversations about the ethics of the default choices made by developers around configuration and long-term management of log data. People asked provocative questions, and we had a real debate about the ethics. It was a wonderful experience and I came up with at least one good idea that I hope Jesse Robbins and I are going to act on together.

I also ran into several people who are just starting to work on user group and community issues in their geographic areas. Seattle came up over and over – and I’m looking forward to helping Ben Huh and the open gov folks who want to do grassroots organizing in their tech communities. I also met some amazing women there, and hope that we’re able to continue our discussions about business and tech in the future.

I camped, and was in a tent under the stars, and early morning fog. I enjoyed running into a fellow PostgreSQL community member, Paul Ramsey, among the early risers.

Apart from the immediate things to collaborate on, and an incredibly long list of new ideas and connection points, I came away inspired and mentally refreshed. I relished the relative lack of device obsession. The people that I wanted to have conversations with tended to have put their devices away for a few hours, and were focusing on the people in front of them.

The Ignite talks were my favorite talks. Jake Applebaum‘s meditation on wikileaks was particularly inspiring, reminding me to seek out opportunities to change the world for the better.

Q&A about Hot Standby

Updated!: See below.

Here are some questions that came up from trying to use the current PostgreSQL hot standby documentation:

Q: If you set hot_standby = off after having it on, what happens?

A: This change requires a database restart on the hot standby (or replica) server. The database goes into “warm standby” mode, and you can no longer issue queries against it. You can change this right back by setting the parameter to ‘on’ and restarting again.

Q: Can you use hot standby with only a single schema or database?

A: No. Hot Standby is all-or-nothing for a particular PostgreSQL database cluster. A cluster is made up of all the databases that live in a particular $PGDATA instance, and Hot Standby is currently not capable of distinguishing between changes occurring on different particular databases or schemas.

Q: Is the process for setting up hot standby any different for empty databases vs. populated databases?

A: No. The setup process is the same – you must create a base backup.

Q: How do I bring my hot standby out of standby mode?

A: If you’re using something like the following in your recovery.conf file:

restore_command = 'cp xxxx'
standby_mode = 'on'

Change: standby_mode = 'off' and restart your hot standby postgresql instance.

Q: Where did my recovery.conf file go? (after your database came out of warm/hot standby)

A: PostgreSQL automatically changes the name of the file to recovery.done when recovery completes. This helps prevent accidents.

Q: What happens if my archive_timeout = 60 (which creates a 16mb file every minute) and I flood the database with so much activity that my standby falls behind?

A: This is possible, and you may be interested in trying Streaming Replication. However, for the majority of users, a delay in restoring data is acceptable (and possibly desirable). Eventually the standby server will catch up. You can monitor how delayed the server is using functions like txid_current_snapshot().

Q: Are schema changes (like CREATE TABLE or ALTER TABLE) replicated to the standby?

A: Yes! All changes to the database cluster are copied to the standby. This includes any DDL operations, new rows, the effects of autovacuum — any change to the data store on the master is copied to the standby.

Quick start on Hot Standby

Updated.

We could have some better end-user documentation around creating a warm or hot standby system for basic postgresql replication.

To this end, I created a Quick Start doc on the wiki, but it could use more help. Maybe we should create some setup recipes for common situations?

Also – I wrote the following script during a hot standby bugbash PDXPUG had today:


#!/bin/sh

BINPATH=/usr/local/pg90/bin
CP=/bin/cp
PGCTL=${BINPATH}/pg_ctl
PSQL=${BINPATH}/psql
INITDB=${BINPATH}/initdb

sudo mkdir -p /var/tmp/archive
sudo chown ${USER} /var/tmp/archive

${INITDB} hotstandby1

echo 'wal_level = hot_standby' >> hotstandby1/postgresql.conf
echo 'archive_mode = on' >> hotstandby1/postgresql.conf
echo "archive_command = 'cp %p /var/tmp/archive/%f'" >> hotstandby1/postgresql.conf
echo "archive_timeout = 60" >> hotstandby1/postgresql.conf
echo "port = 6543" >> hotstandby1/postgresql.conf

${PGCTL} -D hotstandby1 start -l hotstandby1.log
sleep 5

${PSQL} -p 6543 postgres -c "select pg_start_backup('backup')"
${CP} -pR hotstandby1/ hotstandby2
${PSQL} -p 6543 postgres -c "select pg_stop_backup()"
rm hotstandby2/postmaster.pid
rm hotstandby2/pg_xlog/*

echo 'hot_standby = on' >> hotstandby2/postgresql.conf
echo 'port = 6544' >> hotstandby2/postgresql.conf
echo "standby_mode = 'on'" >> hotstandby2/recovery.conf
echo "restore_command = 'cp -i /var/tmp/archive/%f %p'" >> hotstandby2/recovery.conf

${PGCTL} -D hotstandby2 start -l hotstandby2.log

* Added port specification in case you’ve already got postgres running. Added a BINPATH for custom install directories.