Skip to content

PDXPUG Day 2010: 9.0 and all sorts of good stuff

I’m nearly ready for PDXPUG Day 2010.

We’ve got some fun talks lined up from leaders in the Postgres community in town for OSCON. It’s free to attend. We’ll at the Oregon Convention Center, room D131.

Stop by sometime during the day if you’re in town!

Here’s our schedule:

10am: What’s new in 9.0 – Selena Deckelmann
11am: ORMs and Their Discontents – Christophe Pettus
1:30pm: Mining Your Logs For Fun and Profit – Josh Berkus
2:30pm: PORTAL – Dan Colish
3:30pm: An Introduction to Managing and Troubleshooting PostgreSQL on Windows – Tim Bruce
4:30pm: PostgreSQL in Brazilian Army and Air Force – Luis Dosso

6pm-10pm: Party at the Gotham Tavern!

Foocamp 2010: lovely, expectant, reflective

This has been sitting in my edit queue for too long… here goes.

So, I was invited to Foocamp this year. Back in February, I attended KiwiFoo at the invitation of Nat Torkington. I realize now (sorry, Nat!) that I never blogged about going there. I have some rough notes around that I may try to edit down.

I’ll avoid any direct comparisons for a moment, and just talk about what it was like to dive into a slice of Silicon Valley, dislocated into the countryside for a weekend.

When I arrived, I felt immediately at home, welcomed and appreciated. Sara Winge was so helpful and easy-going about the Tesla Coil Josh brought. (Thanks for the ride, Josh!) I spent a lot of evening time educating people about avoiding touching the sparks (it may be a pretty toy, but it is in fact a dangerous one!) and also playing around with a metal glove that was rigged up so you could get a bit closer in to the coil.

I spent a couple delightful hours considering claude glass with Roberta Cairney. Over and over, I absorbed the positive energy tumbling out of Sumana. Once again, I spoke with Kiwis whose humor and affectionate swearing reminded me of home.

I held one session – about forgetting and the ethics of shifting our culture to assume everything may be on your permanent record, and the ways in which people try to opt out or game the system. I was surprised and overwhelmed by the people who attended. Rita King, Scott Berkun, Biella Coleman and danah boyd were all part of the discussion and I was able to talk about ideas I’ve had tumbling around in my head for years. I believe sysadmins/devops must have conversations about the ethics of the default choices made by developers around configuration and long-term management of log data. People asked provocative questions, and we had a real debate about the ethics. It was a wonderful experience and I came up with at least one good idea that I hope Jesse Robbins and I are going to act on together.

I also ran into several people who are just starting to work on user group and community issues in their geographic areas. Seattle came up over and over – and I’m looking forward to helping Ben Huh and the open gov folks who want to do grassroots organizing in their tech communities. I also met some amazing women there, and hope that we’re able to continue our discussions about business and tech in the future.

I camped, and was in a tent under the stars, and early morning fog. I enjoyed running into a fellow PostgreSQL community member, Paul Ramsey, among the early risers.

Apart from the immediate things to collaborate on, and an incredibly long list of new ideas and connection points, I came away inspired and mentally refreshed. I relished the relative lack of device obsession. The people that I wanted to have conversations with tended to have put their devices away for a few hours, and were focusing on the people in front of them.

The Ignite talks were my favorite talks. Jake Applebaum’s meditation on wikileaks was particularly inspiring, reminding me to seek out opportunities to change the world for the better.

Q&A about Hot Standby

Updated!: See below.

Here are some questions that came up from trying to use the current PostgreSQL hot standby documentation:

Q: If you set hot_standby = off after having it on, what happens?

A: This change requires a database restart on the hot standby (or replica) server. The database goes into “warm standby” mode, and you can no longer issue queries against it. You can change this right back by setting the parameter to ‘on’ and restarting again.

Q: Can you use hot standby with only a single schema or database?

A: No. Hot Standby is all-or-nothing for a particular PostgreSQL database cluster. A cluster is made up of all the databases that live in a particular $PGDATA instance, and Hot Standby is currently not capable of distinguishing between changes occurring on different particular databases or schemas.

Q: Is the process for setting up hot standby any different for empty databases vs. populated databases?

A: No. The setup process is the same – you must create a base backup.

Q: How do I bring my hot standby out of standby mode?

A: If you’re using something like the following in your recovery.conf file:

restore_command = 'cp xxxx'
standby_mode = 'on'

Change: standby_mode = 'off' and restart your hot standby postgresql instance.

Q: Where did my recovery.conf file go? (after your database came out of warm/hot standby)

A: PostgreSQL automatically changes the name of the file to recovery.done when recovery completes. This helps prevent accidents.

Q: What happens if my archive_timeout = 60 (which creates a 16mb file every minute) and I flood the database with so much activity that my standby falls behind?

A: This is possible, and you may be interested in trying Streaming Replication. However, for the majority of users, a delay in restoring data is acceptable (and possibly desirable). Eventually the standby server will catch up. You can monitor how delayed the server is using functions like txid_current_snapshot().

Q: Are schema changes (like CREATE TABLE or ALTER TABLE) replicated to the standby?

A: Yes! All changes to the database cluster are copied to the standby. This includes any DDL operations, new rows, the effects of autovacuum — any change to the data store on the master is copied to the standby.

Quick start on Hot Standby

Updated.

We could have some better end-user documentation around creating a warm or hot standby system for basic postgresql replication.

To this end, I created a Quick Start doc on the wiki, but it could use more help. Maybe we should create some setup recipes for common situations?

Also – I wrote the following script during a hot standby bugbash PDXPUG had today:


#!/bin/sh

BINPATH=/usr/local/pg90/bin
CP=/bin/cp
PGCTL=${BINPATH}/pg_ctl
PSQL=${BINPATH}/psql
INITDB=${BINPATH}/initdb

sudo mkdir -p /var/tmp/archive
sudo chown ${USER} /var/tmp/archive

${INITDB} hotstandby1

echo 'wal_level = hot_standby' >> hotstandby1/postgresql.conf
echo 'archive_mode = on' >> hotstandby1/postgresql.conf
echo "archive_command = 'cp %p /var/tmp/archive/%f'" >> hotstandby1/postgresql.conf
echo "archive_timeout = 60" >> hotstandby1/postgresql.conf
echo "port = 6543" >> hotstandby1/postgresql.conf

${PGCTL} -D hotstandby1 start -l hotstandby1.log
sleep 5

${PSQL} -p 6543 postgres -c "select pg_start_backup('backup')"
${CP} -pR hotstandby1/ hotstandby2
${PSQL} -p 6543 postgres -c "select pg_stop_backup()"
rm hotstandby2/postmaster.pid
rm hotstandby2/pg_xlog/*

echo 'hot_standby = on' >> hotstandby2/postgresql.conf
echo 'port = 6544' >> hotstandby2/postgresql.conf
echo "standby_mode = 'on'" >> hotstandby2/recovery.conf
echo "restore_command = 'cp -i /var/tmp/archive/%f %p'" >> hotstandby2/recovery.conf

${PGCTL} -D hotstandby2 start -l hotstandby2.log

* Added port specification in case you’ve already got postgres running. Added a BINPATH for custom install directories.

Forgetting: Logging as an ethical choice

I have kind of a weird idea for a database person.

Forgetting should be built into our applications by default. I just spent the weekend at FooCamp, and I held a session to discuss this idea, and some of the possible consequences if it were implemented.

To explain why I think this, I’m going to take an extreme stance for a moment and argue a position that I’d like to see rebutted. So, please have at it! :)

For too long we have allowed decisions made by developers – default application settings – to determine what ultimately become surveillance levels.

There are notable counter examples: 4chan intentionally expires postings every few days. Riseup keeps no logs. The EFF documents what we do and do not legally need to keep. These, however, are the efforts of a tiny minority when considered against the rest of the web.

Over time, our conception of what is reasonable has changed around logging and accounting for vast periods of our activities. Never before would a silly recording taken by a 15-year old be stored indefinitely, and then be documented as a watershed event because of how many times it was viewed in a vast global network, rather than for the content of the cultural artifact itself. The log of views themselves were the cultural artifact, and it is celebrated.

Fading away isn’t evil. But we act like it is when we pipe what once was ephemeral into archive.org indefinite storage.

Why have we decided to participate in this social experiment? It really wasn’t a collective decision. Some software developers and investors decided that archival on a massive scale was important or profitable. We started calling these things “part of history” and just storing them without thinking about it. Saving became default.

I’m not saying that archiving the internet, search robots or “opting in” are bad things. But those who least understand archiving’s effect on personal privacy may be the ones most likely to suffer in the future.

The ripple effects of the decision to move from “default expire” to “default save” are vast. Consider for a moment if we were to call the ability to intentionally forget on the internet a human right.

Instead, what we’ve done is to say to millions of people – you do not have the right to forget. Companies will take your locations and status updates, and never delete them. And privacy is rapidly becoming a privilege of those who can afford to buy it.

For the sake of argument, consider the difference between narrative historical documentation and collections of “facts.” The narrative is an aggregation, full of embellishments and forgetting and kernels of truth. Facts are collected, supposedly objectively. Both approaches to capturing historical thought suffer from the fallacy that historical “fact” is fixed and doesn’t evolve based on the viewer and reteller over time. How much worse is this effect when our collections of facts are now ballooning to include every blog post, photo, tweet and web access log you’ve ever made?

The point is not that individuals wish to change history or even obscure events which may reflect poorly on them. (Even though we all do!)

We need to give people a real choice – not a set of ACLs and rules. Choice about what is archived about them, control over that process and a clear delineation between personal artifact and public property.

Kathy Sierra deleted her twitter stream and was accused of removing a piece of history, and possibly the worse internet offense – taking away conversations. Taken at face value, isn’t that the point of conversation? That it is ephemeral?

Conversations leave echos in changed thoughts and light or deep impressions in the minds of the participants. Just because Twitter has by default chosen to retain these conversations indefinitely doesn’t change the nature of conversation itself. No one would argue that just because we share our thoughts that we are obligated to share every thought.

In the same way, we are not obligated to maintain a record of our sharing. And if we do maintain and share a record of our own end of a conversation, we still have the right to ultimately destroy it.

Once shared, of course, an artifact of a conversation can’t be taken away from those that have copies. But authors and owners of the original work must always retain the right to destroy.

So, that brings me to what is ethical in our applications. When we say: “we’re keeping your data forever” and “delete means your account will still be here when you come back”, application developers and companies are making an ethical choice. They are saying, “your shared thoughts aren’t your own – to remember or forget. We are going to remember all of this for you, and you no longer have the right to remove them.”

Connectedness is not the same as openness. Storing vast logs of data related to individuals which connect thousands of facts over the course of their lives should be presented as the ethical choice it is, rather than a technical choice about “defaults”. Picking what we decide to log and store is an ethical and political decision. And it should also be possible for it to be a personal decision.

Starting at Emma

Today, I start at Emma.

I have some clues about what I’ll be up to – working on some big PostgreSQL databases, tearing into the infrastructure and discovering what makes their small company tick. Emma’s work with small businesses, and a focus on humane communication and consensus building completely drew me in.

And I’m looking forward to riding my bike everyday over on the office on Burnside!

Setting up a PDXAPI instance

Today I spent a little time setting up a PDXAPI instance of the CivicApps data. There are a few different tools out there for grabbing the data and loading it up, and so I’m documenting the basic steps here for setting up a spatial SQLite using @lokkju’s python projects.

hg clone https://pyspatialite.googlecode.com/hg/ pyspatialite

cd pyspatialite
mv setup.cfg.OSX setup.cfg
python setup.py build
sudo python setup.py install

cd ..
cd pyod

hg clone https://pyod.googlecode.com/hg/ pyod

# unsatisfied dependency!
sudo easy_install pyyaml

pypython fetcher.py

This creates a 1 GB sqlite database called ‘test.sqlite’.

Next, I’ll be testing out loading this into a CouchDB instance and maybe playing with Max Ogden’s initial PostGIS export.

PgCon 2010 – PL/Parrot, Simulated Annealing, Exclusion Constraints, Postgres-XC

PgCon this year was full of bold ideas, delivered in the quiet manner typical of the Postgres community. Talks by Jonathan Leto, Jan Urbanski and Jeff Davis all presented new features and ideas that show there is so much room yet in Postgres as a project to contribute, and innovate. I was also delighted to see Postgres-XC (touted as a “Postgres RAC”) release code, and give a great presentation on the high-level details.

Jonathan Leto presented work on PL/Parrot, along with David Fetter. Parrot is a dynamic language virtual machine, allowing implementation of multiple dynamic languages which can then share classes (from the docs: “In theory, you will be able to write a class in Perl, subclass it in Python and then instantiate and use that subclass in a Tcl program.”). The project is to embed Parrot in PostgreSQL, and eventually, implement dynamic languages inside the virtual machine. Advantages to doing this are that it will make implementing new dynamic languages in Postgres much easier, because the language implementers won’t have to learn the PL interface. Another useful feature in PL/Parrot is the implementation of a security opcode in Parrot which essentially controls access to open(), a key to implementing a secure procedural language in Postgres. (I’m sure Jonathan will correct me if I didn’t describe this properly :D )

Jan Urbanski gave a talk on join ordering via Simulated Annealing, called Replacing GEQO. The approach was pretty interesting, involved math that required me to scratch my head a bit, and the initial performance improvements for many-join queries made it seem appealing. The original -hackers posting from Jan, includes a few hairy queries from Andres Freund which confound the GEQO referenced later in the thread. Jan’s posted the code, and I’m looking forward to seeing how it develops this year.

Jeff Davis presented exclusion constraints, which are part of 9.0. He is continuing his work on temporal data types with a clever and very useful generalization of UNIQUE. UNIQUE constrains equality, while exclusion constraints allow other operators (in the most cited example, Jeff demonstrates “overlaps” in the PERIOD datatype).

Postgres-XC was officially presented and released. For efforts in Postgres clustering, releasing the code is a huge step forward toward mainstreaming work in the community on clustering. This release solidifies community work that started last year, with NTT and the support of the Japanese PostgreSQL User Group in having a clustering summit back in November 2009.

I was disappointed to miss a few talks (like hypothetical indexes, pg_statsinfo, CB’s pgMQ) but looking forward to hearing the recordings as they are published!

Lightning talks from PgCon 2010

Thanks again to all the folks that volunteered to give five minute talks during the Lightning talks session at PgCon!

Our lightning talks this year were:

  • PostgreSQL Developer Meeting in Five Minutes – Bruce Momjian
  • Slony 1 => 2.0 – Steve Singer
  • PostgreSQL and Twisted – Jan Urbanski
  • The FlyMine Project – Matthew Wakeling
  • Enhanced Tools for PostgreSQL – Tomonari Katsumata
  • Servoy – Christophe Pettus
  • Tail_n_mail – Greg Sabino Mullane
  • GSOC – Robert Treat
  • Pg Staging – Dimitri Fontaine
  • Serializable Transaction Isolation – Kevin Grittner
  • 10 ways to wreck your database – Josh Berkus

All presentations are downloadable from: http://wiki.postgresql.org/wiki/PgCon_2010_Lightning_talks

Image Copyright © 2010 Richard Guy Briggs from here and used by permission.

Need reviewers – preparing for the first commitfest for 9.1

Now is the perfect time to get involved in Postgres development!

Starting June 15, we’re going to have a “reviewfest”, as a prelude to the first commitfest in July. We’ve already got 28 patches in the queue, and all need reviewers.

Think you’re not qualified to review patches? Think again!

From the Reviewing a Patch documentation:

If you can apply a patch and you can use the new feature, you’re already qualified to start reviewing it.

We will formally kick off a reviewfest on June 15.

We will assign reviewers and get all the patches that are queued up for 9.1 reviewed while the final touches are being applied to the 9.0 release. Have a look at Review A Patch guidelines for information about how to conduct your review. We also have a mailing list to track and recruit reviewers – pgsql-rrreviewers. (The extra R’s are for ’round-robin’)

Please subscribe to the list, and post if there is a particular patch you are interested in reviewing!