Headed to PgConf.EU

I’m headed to Amsterdam for PgConf.EU and very excited about my first European Postgres conference.

I’m giving two talks – Managing Terabytes and Mistakes Were Made. Both are cautionary tales about what can go terribly wrong in database management and systems operations. My goal with these talks is to start a conversation about what we can learn from failure.

I encourage everyone to share their stories about what fails. Not only are they great “campfire stories” for entertainment, but they help us all learn faster, and they teach us what ultimately works when everything is failing.

In the same vein, UpdatePDX is putting on another “tales of failure” set of short talks the following week back in Portland. I’ll be leading the charge with a short story of my own, followed by at least two other tales of failure.

Update releases for 9.1.1, 9.0.5, 8.4.9, 8.3.16 and 8.2.22

Today the PostgreSQL Global Development Group released branch updates for all supported versions. You can go ahead and download them now!

There were quite a few fixes for somewhat obscure crashes, fixes for memory leaks discovered by some valgrind testing, and a couple of big fixes for GiST indexes, like this one:

* Fix memory leak at end of a GiST index scan

gistendscan() forgot to free so->giststate.

This oversight led to a massive memory leak — upwards of 10KB per tuple
— during creation-time verification of an exclusion constraint based on a
GIST index. In most other scenarios it’d just be a leak of 10KB that would
be recovered at end of query, so not too significant; though perhaps the
leak would be noticeable in a situation where a GIST index was being used
in a nestloop inner indexscan. In any case, it’s a real leak of long
standing, so patch all supported branches. Per report from Harald Fuchs.
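
For context, the scenario the commit message describes looks something like the following – a hypothetical sketch (database, table and data are all made up here), not the reporter’s actual schema. Adding an exclusion constraint to an already-populated table triggers the creation-time verification scan where the leak showed up:

    # Hypothetical sketch: populate a table, then add an exclusion
    # constraint backed by a GiST index, forcing a verification scan.
    # On unpatched servers, that scan leaked roughly 10KB per tuple.
    psql -d testdb <<'SQL'
    CREATE TABLE circles (c circle);
    -- non-overlapping circles, so the constraint check itself passes
    INSERT INTO circles
        SELECT circle(point(2 * g, 0), 0.5) FROM generate_series(1, 100000) g;
    ALTER TABLE circles ADD EXCLUDE USING gist (c WITH &&);
    SQL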

There were a few fixes for catalog and catalog-index corruption, and fixes to avoid buffer overflows that could crash a backend. There were also a few fixes that will improve the performance of VACUUM over time.

The release notes have all the details. Many of the fixes were already committed to 9.1 (there are only 11 new commits in 9.1.1), so users of 8.2 through 9.0 are about to experience a great many bugfixes.

Another thing to note – 8.2 reaches end-of-life in 2011! You ought to upgrade anyway, just to get HOT and to put yourself in a position to use pg_upgrade for future upgrades. But now you’ve got extra incentive.
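
For what it’s worth, pg_upgrade ships with 9.0 and later and can upgrade clusters from 8.3 onward, so an 8.2 cluster needs one last dump/restore first. After that, a future in-place major-version upgrade looks roughly like this – a hedged sketch, with example paths (adjust for your packaging), and both clusters shut down first:

    # Example binary and data directory paths only -- adjust for your install.
    pg_upgrade \
        -b /usr/pgsql-8.4/bin -B /usr/pgsql-9.1/bin \
        -d /var/lib/pgsql/8.4/data -D /var/lib/pgsql/9.1/data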

My Postgres Performance Checklist

I am asked fairly frequently to give health assessments of Postgres databases. Below is the process I’ve used and continue to refine.

The list isn’t exhaustive, but it covers the main issues a DBA needs to address.

  1. Run boxinfo.pl on a system
    Fetch the script from http://bucardo.org/wiki/Boxinfo. Run as the postgres user on the system (or a user that has access to the postgres config).
  2. Check network.
    What is the network configuration of the system? What is the network topology between database and application servers? Any errors?
  3. Check hardware.
    How many disks? What is the RAID level? What is the SLA for disk replacement? How many spares? What is the SLA for providing data to the application? Can we meet that with the hardware we have?
  4. Check operating system.
    IO scheduler set to ‘noop’ or ‘deadline’, swappiness set to 0 (http://www.pythian.com/news/1913/what-exactly-is-swappiness/). See the sketch after this list.
  5. Check filesystems.
    Which filesystem is being used? What parameters are used with the filesystem? Typical things: noatime, ‘tune2fs -m 0 /dev/sdXY‘ (get rid of root-reserved space on the database partition), and readahead – set to at least 1MB; 8MB might be better. See the sketch after this list.
  6. Check partitions.
    What are the partition sizes? Are the /, pg_xlog and pgdata directories separated? Are they of sufficient size for production, SLAs, error management, backups?
  7. Check Postgres.
    What is the read/write mix of the application? What is our available memory? What are the anticipated transactions per second? Where are stats being written (tmpfs)? See the sketch after this list.
  8. Check connection pooler.
    Which connection pooler is being used? Which system is it running on? Where will clients connect from? Which connection style (single statement, single transaction, multi-transaction)?
  9. Backups, disaster recovery, HA
    Big issues. Must be tailored to each situation.
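
For items 4, 5 and 7 above, here’s a rough sketch of the checks I have in mind. Device names, partitions and mount points are examples only – substitute your own:

    # Item 4: IO scheduler and swappiness (sda is an example device)
    cat /sys/block/sda/queue/scheduler       # current scheduler shown in [brackets]
    echo deadline > /sys/block/sda/queue/scheduler
    sysctl vm.swappiness                     # check the current value
    sysctl -w vm.swappiness=0                # set now; persist it in /etc/sysctl.conf

    # Item 5: filesystem parameters (sdb1 and /pgdata are example names)
    mount | grep pgdata                      # confirm noatime is in the mount options
    tune2fs -m 0 /dev/sdb1                   # drop root-reserved space (ext3/ext4)
    blockdev --getra /dev/sdb                # readahead, in 512-byte sectors
    blockdev --setra 2048 /dev/sdb           # 2048 sectors = 1MB; 16384 = 8MB

    # Item 7: write stats to tmpfs (example path; point stats_temp_directory
    # in postgresql.conf at the same location)
    mount -t tmpfs -o size=64M tmpfs /var/run/pg_stats_tmp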

What’s your checklist for analyzing a system?

Seeking: Database Disaster Stories

I’m going to give another “Mistakes Were Made” talk at PgConf.EU next month.

I have many disaster stories of my own, but I am always looking for more! Stories of data destruction and tales of unexpected failure are welcome.

You can leave them in the comments, or email me.

The talk focuses on the ways in which systems fail, and the typical kinds of failure we find in web operations. Types of failure I focus on are:

* Failure to Document
* Failure to Test
* Failure to Verify
* Failure to Imagine
* Failure to Implement

Stories that fall outside those categories are especially welcome.

I look forward to your tales of woe!

9.1 presentation at Windy City Perl Mongers

I recently updated my PostgreSQL 9.1 slides for a presentation at the Windy City Perl Mongers.

We discussed 10 features that the Postgres community decided to emphasize in our press releases. The crowd was primarily people who had never used Postgres before, which was a bit of a different audience for me.

It was great to be able to compare notes with folks who are supporting Oracle and SQL Server, and see a lot of excitement for trying out 9.1.

When I’m traveling around, I’ll be looking for more non-Postgres user groups where I can give talks like this. Let me know if you’d like me to come speak at yours!

Postgres Open: next year (!), resources, video

Postgres Open is over!

I wanted to share a few resources, and remind attendees to fill out our survey. I really appreciate the detailed comments I’ve been getting! Keep them coming.

I wanted to especially thank our program committee:

Robert Haas
Josh Berkus
Gavin Roy
Greg Smith

They were the people who put together and edited the website, found sponsors, recruited speakers, voted on talks, gave talks and tutorials, and executed the many tasks needed to make the conference a success. We plan to make key members of the Postgres community part of the operation of the conference going forward. We’re really just emulating the way that PgCon is run.

I have some more thoughts about what makes a conference “community-operated”, and once my budget numbers are settled, I’m going to share what running the conference costs, both in terms of my time and in dollars. It’s important to understand the costs involved and how much of my time is required, and what that means for you as a sponsor, speaker, attendee or volunteer supporting what we are doing.

NEXT YEAR: September 17-19, 2012

I’m pleased to announce that next year’s conference will be held September 17-19, 2012 at the Westin Michigan Avenue. So mark your calendars now!

The conference will continue to be operated as a non-profit, with proceeds going toward operation of the following year’s event, and a very small percentage going to Technocation, Inc. – our fiscal sponsor and a 501(c)(3) organization dedicated to developing educational opportunities and resources for software professionals.

We had fantastic support from our sponsors this year, and hope to expand that next year.

In particular, support from 2ndQuadrant, EnterpriseDB, Heroku and VMware was instrumental in pulling this event together. We really only started planning in May. It feels good to now have a whole year ahead of us!

With greater sponsor support, we can help fund some of the things that attendees asked for, like soda (which costs $8/soda – I feel as though we should get some kind of gold plating for this), conference t-shirts, and a closing party.

Please get in touch if you or a company you know is interested in sponsorship for 2012!

Slides:

Speakers are uploading or linking their slides to the PostgreSQL wiki. If the slides you’re looking for aren’t there, please ping the speaker or me.

Streaming Video:

Streaming content will be available for about 30 days.

I will be getting all the video on flash drives this week. My plan is to upload it to either Vimeo or YouTube. I don’t really have the resources to provide individual copies of the videos, but if we find a location for raw data upload, I’ll pass that along to you all.

Looking toward Chicago: Postgres Open, local user groups, parties and on to October!

I’ve been incredibly busy this past month, and not blogging – being a free agent has possibly made me busier than I was before!

Postgres Open’s schedule is in near-final state. We’ve started adding talks to our Demo room on Thursday, and are looking forward to a keynote from Charles Fan, SVP at VMware, about recent developments in VMware’s cloud offerings for Postgres.

We’ll also be getting a more in-depth look at Heroku’s new postgres.heroku.com on-demand database service, as well as an open source tool they wrote called WAL-E.

Thanks to Heroku, we’ll be streaming much of the content from the conference live, so you’ll be able to catch the keynotes and many of the talks, even if you’re not there. And we’ll be sharing the videos after.

I believe we’re the first Postgres conference to do this! Someone correct me if I’m wrong. 🙂

While I’m in Chicago, I’m planning to drop by the Windy City Perl Mongers for a reprise of my 9.1 talk from OSCON.

We’re also planning a couple of parties for Postgres Open, and hope to invite a few of the local user groups to join us.

After that, I’m headed in October to PostgreSQL Conference Europe, where I’ll give a talk about terabyte-scale Postgres databases (and the problems you run into with them) and a database-specific “Mistakes Were Made” talk about operations and the tools we need to help us make fewer mistakes.

The importance of doing things badly

Update: added “code review” to the list of things we’re doing well, below.

There were a couple themes for me from OSCON last week. One is transitions and change. I’ve got a whole slew of thoughts on this, particularly from my experience leaving the management team of Open Source Bridge.

But the other is the importance of doing things badly. In particular, the importance of doing things badly in open source.

In an interview he did with Cliff Moon last fall, Tim Anglade (at about 41:10) says he thinks the reason open source companies make money is that open source is kind of shitty. So, on one hand, there’s a Money Making Opportunity. Probably not the one that we’d all prefer, but it is what it is.

When he said that, I immediately thought about the other things that we do badly (other than documentation) and the discussions I’d been having with people last week.

Basically, we had a problem in the Postgres community of experienced developers solving every small bug at nearly the moment it was reported. It’s sort of like a cat sitting at the entrance of the only mousehole.

The effect on the code is amazing – we have clearly documented, concise and consistent code. But the effect on the community is that we don’t have mid-level developers, and it is very difficult for inexperienced developers to build up a portfolio of small projects based on bugs.

I don’t have a ready solution for this problem. And I do not mean this as a criticism of the thousands of hours our core teams have devoted to fixing bugs. We all benefit from the dedication. I am just pointing out that our system had a clear tradeoff – fewer contributors.

What we could do a bit worse (to address the point of this blog post) is lengthen our response time for fixing bugs and let less experienced developers respond to the bug queue. This probably involves creating a bug tracker and holding the tension a bit longer on fixes.

Our committers have made efforts toward spreading the load around more – with commitfests (meaning greater support for code review), with Tom’s recent presentations about the planner, and with our wiki-fied Todo list. And there are many more examples of our committers putting real effort into mentoring, tutoring and finding ways of bringing more people in.

The thing that’s missing from all of those efforts, however, is urgency. That’s what bug-fixing is great for. That’s why we have people who remain in operations work even if they hate being woken up at 3am. Urgent work is worthwhile work (mostly).

I’m sure there are other particular areas where we could do things worse, and thus invite more people to contribute. I’ll be thinking about this more in regard to our project event planning, as I think there’s a bit of a disconnect there, and a huge opportunity to involve more people.

I’m reminded again of David Eaves’ talks about how community management is the core competency of open source, not technology. I struggle with that thought every day, but it rings truer the more I try to work on the significant problems facing any particular open source project.

OSCON: We’re at the end…

I’m finally getting to blog, and here are a few highlights:

* “Mistakes were made” was a great time. Thank you everyone who shared stories. And those of you who attended, please connect with me – email or whatever, and let’s continue our discussions about failure.
* I have a little bit of editing left to do on the Harder, Better, Faster, Stronger slides. Talk ratings have been very high (thank you, audience! 🙂). Should have those up tomorrow!
* Not having a booth at OSCON was a real bummer for Postgres. We need to figure out a way to make this happen for us every year.
* Great having the time to connect with old friends in the hallways this week.
* Thanks O’Reilly for supporting our open source community.
* Thanks Google Open Source Programs office for bringing together open source leaders yet again this year for some important conversations.

Thank you everyone from the Postgres community who contributed to the Postgres day just before OSCON. All the speakers and their talks are listed here.

We need to keep having adjunct events like this! I think LCA has it right in scheduling Mini-BoFs to provide networking opportunities for distinct groups. I think OSCON should formalize this next year, and figure out a way of facilitating those groups in a more structured way.

I have another blog post brewing about difficult conversations… but that’s going to have to wait until after I enjoy the Brewers Fest!