FSM, visibility map and new VACUUM awesomeness


Heikki Linnakangas, listening as Simon Riggs sketches on the chalkboard.

Update: Heikki’s slides are here!

Heikki Linnakangas gave a presentation this past Sunday at FOSDEM about the improved free space map (FSM), which tracks unused space inside the database, and the new visibility map, a bitmap that indicates which data pages can be skipped during a partial VACUUM. This performance enhancement will affect all users of the upcoming 8.4 software release. You can see what the new FSM implementation looked like back in October on depesz’s blog.

Despite Heikki’s modest claim during the talk that the performance tests were inconclusive, the consensus among Postgres contributors is that this feature will result in a substantial improvement in the performance of VACUUM for tables that are large but see few UPDATEs.

The new free space map and visibility map (in 8.4) and autovacuum (enabled by default starting in version 8.3) are huge administrative usability improvements to version 8 of Postgres. Prior to version 8.1, VACUUM had to be scheduled outside of the database system. Autovacuum has been part of the core Postgres distribution for over two years, and is tunable via several global configuration parameters.

The visibility map enables partial VACUUMs — meaning that VACUUM no longer has to examine every tuple to update the FSM. The new FSM implementation eliminates two configuration parameters, effectively automating a formerly manual configuration process.

The new FSM is stored on disk in separate files inside of $PGDATA/base/, and is cached in shared_buffers. The result is that the max_fsm_* configuration parameters are gone in 8.4: Postgres is able to track and adjust this data structure without user intervention.

A few critical features of the new FSM are:

* Now a binary tree structure
* Constructed using 1 byte per heap page
* The top level shows the maximum amount of contiguous space available
* The data structure is auto-repairing and can be reconstructed from the bottom

Previously, every time that VACUUM was run, the free space map had to be reconstructed from scratch. Now, individual nodes in the map may be updated (aka “retail” updates).
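To make the tree idea concrete, here is a toy Python sketch of a max-tree over one byte per heap page. It is only an illustration of the properties listed above, not the actual 8.4 on-disk layout: the class name, the method names, and the choice to treat each byte as a coarse 0-255 measure of free space are all assumptions of mine.

```python
class FreeSpaceTree:
    """Toy max-tree: each leaf is one byte describing one heap page.

    Hypothetical illustration only; not PostgreSQL's real FSM format.
    """

    def __init__(self, n_pages):
        self.n_pages = n_pages
        # Round the leaf count up to a power of two so the tree is complete.
        self.n_leaves = 1
        while self.n_leaves < n_pages:
            self.n_leaves *= 2
        # nodes[1] is the root; leaves start at index n_leaves.
        self.nodes = bytearray(2 * self.n_leaves)

    def set_free(self, page, category):
        """'Retail' update: change one leaf, then fix only its ancestors."""
        if not 0 <= page < self.n_pages:
            raise IndexError("no such page")
        i = self.n_leaves + page
        self.nodes[i] = category
        i //= 2
        while i >= 1:
            self.nodes[i] = max(self.nodes[2 * i], self.nodes[2 * i + 1])
            i //= 2

    def max_free(self):
        """The top of the tree holds the largest value found in any leaf."""
        return self.nodes[1]

    def find_page(self, needed):
        """Return a page with at least `needed` free space, or None."""
        if self.nodes[1] < needed:
            return None
        i = 1
        while i < self.n_leaves:
            # Descend into whichever child can still satisfy the request.
            i = 2 * i if self.nodes[2 * i] >= needed else 2 * i + 1
        return i - self.n_leaves

    def rebuild(self):
        """Repair: recompute every inner node from the leaves, bottom-up."""
        for i in range(self.n_leaves - 1, 0, -1):
            self.nodes[i] = max(self.nodes[2 * i], self.nodes[2 * i + 1])
```

In this sketch, a single `set_free` call touches only one leaf and its ancestors, which is the "retail" update described above, while `rebuild` shows why the upper levels can always be regenerated from the bottom.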

The visibility map is a bitmap over heap pages. A set bit means that every tuple on that page is visible to all transactions, and therefore that the page holds nothing for VACUUM to clean up.

Previously, when VACUUM ran, it *had* to look at every tuple in a table, because there was no information about which pages had not been updated since the last VACUUM. With the visibility map, VACUUM will now be able to perform partial scans of table data, skipping pages which are marked as fully visible. Partial scans mean fewer I/O operations for VACUUM, and happier database administrators.
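As a rough picture of how a partial scan could use such a bitmap, here is a small Python sketch. The names (`VisibilityMap`, `pages_to_vacuum`) are hypothetical and the logic is deliberately simplified; it assumes one bit per heap page and that any change to a page clears its bit.

```python
class VisibilityMap:
    """Toy bitmap: one bit per heap page (hypothetical, simplified)."""

    def __init__(self, n_pages):
        self.n_pages = n_pages
        self.bits = bytearray((n_pages + 7) // 8)

    def set_all_visible(self, page):
        # Set once every tuple on the page is visible to all transactions.
        self.bits[page // 8] |= 1 << (page % 8)

    def clear(self, page):
        # Assumed rule: any modification of the page clears its bit.
        self.bits[page // 8] &= ~(1 << (page % 8)) & 0xFF

    def is_all_visible(self, page):
        return bool(self.bits[page // 8] & (1 << (page % 8)))


def pages_to_vacuum(vm):
    """A partial scan visits only pages whose bit is not set."""
    return [p for p in range(vm.n_pages) if not vm.is_all_visible(p)]
```

For a large table where only a handful of pages have changed since the last VACUUM, the list returned by `pages_to_vacuum` stays short, which is where the I/O savings come from.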

Simon Riggs just rocked my world.

I’m in Brussels for the FOSDEM conference, hanging out at the PostgreSQL booth, meeting my European colleagues, and running into friends.

PostgreSQL has a developer’s room and Simon Riggs just wrapped up a talk about Replication. I sincerely hope that the video of the talk turned out well, because it was the most inspiring and technically interesting talk I have seen in a very long time. Unfortunately, I don’t have a copy of the slides at the moment, but word is that they will be posted on the BSD wiki soon.

Simon focused on new features in 8.4 that affect file-based replication, also mentioning streaming, synchronous replication — which will not be included in 8.4, but is being actively worked on. He explained his rationale for objecting to the inclusion of the synchronous replication patches, mostly, I think, based on the complexity of the WAL archiving required as it was implemented.

Then, Simon launched into an in-depth tour of the issues and solutions brought about during his team’s work on Hot Standby. Hot Standby allows read-only queries to be run against a Postgres standby server that uses file-based replication (called Point-in-Time Recovery and WAL shipping in the Postgres documentation).

Simon started work on PITR-related patches about five years ago, and continues that work with others today.

One fascinating aspect of the hot standby patches is that they ultimately caused performance improvements in sub-transactions across the board – and will likely cause up to 5% improvement in that code path. There were other performance improvements, but I’ll wait for the slides to mention those. At several times during the talk, Simon pointed out features that Postgres has that no other database has — such as multiple options for dealing with conflicts in hot standby (freezing, conflict resolution and timeout).

At the end of the talk, Simon spent a few minutes talking about how Postgres is capable of being the best database, not just the best open source database. And how all the people in the room were capable of contributing as he had. He claimed that prioritization and aiming to work on the biggest, most interesting problem you can are all you need. And he claimed that all that made him different was that he was a little more persistent about solving problems.

Rock on, Simon.

What are you waiting for? Get your PgCon talks in now!


Yes, that’s me, with Tom Lane. You, too, might be able to get your picture with Tom!

Like Josh Berkus said yesterday:

As of today, you have 2 weeks left to submit talk proposals to PGCon.

You know you want to. PGCon is the international conference for PostgreSQL hackers, sysadmins, application developers, SQL geeks and other Smart People. Submit your talk! Be a Smart Person too!

PGCon will be happening May 21-22 in Ottawa, Canada, with tutorials on May 19 and 20. Some financial help is often available for speakers, but none is available for non-speakers. So submit, submit!

We particularly could use some talks on the new 8.4 features, really creative PostgreSQL applications, massive Postgres scaling, PostGIS, BioPostgres, and a few case studies. This means you.

I attended PgCon last year for the first time. Not only were the presentations top notch, but Dan Langille’s hospitality laid the groundwork for yet another fantastic community-building experience, like the ones PostgreSQL community members had at the 2006 Anniversary Summit in Toronto and again in 2007 at the first PgCon.

We had plenty of outstanding socializing and hacking opportunities. Last year’s conference started with a gathering of committers that was fodder for great pub and hallway track conversation all week. Great talks I saw included Andrew Sullivan’s Idle thoughts on PostgreSQL Project Management, Greg Sabino Mullane’s Bucardo talk about this multi-master replication tool, and Magnus Hagander’s walk through how search.postgresql.org was implemented.

Ottawa was beautiful last year, and I can’t wait to go back this May!

A year of PDXPUG

Last year was the third year that PDXPUG has been operating in Portland, and I decided to look back at our year of meetings. Here goes:

January 11 – 10 things you can use in PostgreSQL 8.3
February 26 – Extreme Database Makeover: RT
March 20 – Managing Internet Services: Using the right tool for the job
April 17 – Rails on PostgreSQL
May 15 – PostgreSQL for Pythoneers
June 19 – The relational model
July 20 – PDXPUG DAY!, and the schedule
August 21 – Tsearch2 and Materialized Views (Guest speaker from Seattle!!)
September 18 – The Visual Planner
October 16 – Point In Time Recovery
November 20 – Reviewed 8.4 features with the help of depesz’s blog
December – Coder’s Social

Thanks everyone who gave talks and attended meetings! User groups are only as good as the people who participate in them, and this list shows just how talented, diverse and fun the Postgres community is in Portland. I love you guys!

Looking forward – once again, we’ve already scheduled talks through the next four months! I feel like the group is running on its own momentum, and that is a fabulous feeling. We have a data visualization talk, another Extreme Database Makeover, and hopefully a presentation about teaching database theory with PostgreSQL.

Our next meeting is on January 15, 7pm with Stephen Jazdzewski traveling all the way from Eugene to present SplendidCRM, a formerly Microsoft SQL-only system that is now compatible with PostgreSQL. I am happy to see more of our Microsoft colleagues joining and presenting to the user group communities, as I’ve always felt they are underrepresented in our groups. Also, I’m happy to host another out-of-town presenter here in Portland! Hope to see you on the 15th.

Open Source Bridge

wordle rocks

There’s going to be a new conference in Portland next July.

We’re calling it Open Source Bridge.

Our goal is this:

Create a completely volunteer-run, community conference to connect developers working with open source.

Let me explain with a little background:

My first tech conference was LISA in San Diego in 1997. I ran into Linus Torvalds in the hallway with my friend Steve, and we were both star-struck. I was still a student at the time, and loved every minute I spent rubbing elbows with people that were the pop-stars of the UNIXy world.

Since then, I have attended LISA a few more times, OSCON, and countless user group meetings for Perl and PostgreSQL. The last two years have been filled with local unconferences (BarCampPortland and WhereCampPDX to name just two) and travel to incredible community conferences like PgCon, LUG Radio Live, SCALE, Northwest Linux Fest, the Linux Plumbers Conference and last weekend’s Mentor Summit. And while on the board of the Legion of Tech, I’ve met and connected with more people than I ever thought I could know in Portland.

I love conferences. And I love Portland. Maybe you can guess what’s coming next.

During an intense brainstorming session at Side Project To Startup, a group of concerned Portlanders drew together a plan for a new conference. We packed a tiny room, and had a heated discussion about what we wanted, what Portland needed, and how we might do it. By the end of the session, Audrey Eschright and I agreed to co-chair. And with the support of Portland’s incredible tech community, we knew we could make it happen.

We called a few people, and I invited everyone over to talk about what to do next. We were: Audrey, Reid Beels, Professor Bart Massey, Rick Turoczy, Jake Kuramoto, Dawn Foster, Kelly Guimont, Adam Duvander.

We looked at the giant pieces of paper we’d scribbled notes on a few weeks before, and ate dinner together on a warm fall evening. And we decided to have a Town Hall.

town hall meeting, Oct 30, 2008, 7.30pm, Cubespace

Since then, we’ve been joined by Ward Cunningham (AboutUs), Irene Schwarting (Companies By Design), Harvey Mathews (SAO) and Clay Neal (City of Portland).

But enough with the history lesson!

Open Source Bridge will bring together the diverse tech communities of the greater Portland area and showcase our unique and thriving open source environment.

Open Source Bridge will have curated, discussion-focused conference sessions, mini-conferences for critical topics and will include unconference sessions.

We will show how well Portland does open source and share our best practices for development, community and connectedness with the rest of the world.

Lots of ideas are buzzing around in our heads, and we’d love to talk about them with you! If you’d like to contribute to the effort, stop by the town hall event October 30, 2008 at Cubespace. We’ll have another meeting November 6th, and it will be announced on Calagator.

At the town hall, you’ll have a chance to meet the members of the core organizing committee, and pick up a responsibility or two. We’ll be breaking off into teams for each of the major areas requiring organization, and distributing the work across many people. We will create a mailing list after this first meeting for those who just want to hear about what we’re up to, or participate in some other way.

Thanks for your interest, and we hope to see you tomorrow night!

Mentor Summit Report for PostgreSQL

mentor summit

Update: Fixed the etherboot wiki link.

I attended the Google Summer of Code Mentor Summit this past weekend on behalf of PostgreSQL. We met at the Google campus in Mountain View.

This event was an unconference, so none of the sessions were determined in advance.

Some of the highlights were:

  • Leslie Hawthorn and Chris DiBona went into some detail with the whole group about the selection process for GSOC. This session made me feel as though PostgreSQL had relatively good chances for being accepted again next year. Google, however, does not pre-announce projects/products, so there is no sure thing about our (or any other project’s) involvement.
  • I met the MusicBrainz guys and was pleased to receive many bars of chocolate, which they asked me to distribute to SFPUG and PDXPUG members as thanks for making a great database.
  • Attended three sessions concerning recruitment and retention of students. This is a topic that many people were interested in, but that few people feel they have a proper strategy for.

I also led a session on recruitment and retention of students to open source projects. Some of the ideas that came out of that and the related sessions were:

  • Determine what makes you personally need to be part of Postgres (joy of learning, scratching a technical itch, making a tool for your job, fame). Find out which of those things your student also needs or wants and try to give that or help your student achieve that thing.
  • Have a clearly defined method for students to keep journals. Several projects simply used MediaWiki and templates.
  • Use git (or other distributed revision control), and have students commit early and often to a branch that mentors have access to.
  • The Etherboot project has a great system: http://etherboot.org/wiki/soc/2008/start
  • Hold weekly meetings over IRC. These can be brief, but help get students accustomed to your project’s culture and way of doing things.
  • Ask the student: “are you on track?”, ask the mentor: “do you think the student is on track?” on a weekly basis
  • If you want students to stick around, find incremental responsibilities to assign that are driven by their enthusiasm.
  • Interview on the phone all your students ahead of time, not just the ones you think might be a problem.
  • Require a phone number on the application for the student.
  • Require a secondary contact so that if the student “disappears” there’s a backup person to contact. (and contact that person BEFORE SoC starts)

I made good connections with members of Git, Parrot, WorldForge, Ruby and many other community leaders. I was particularly impressed by the ideas and stories from the current Debian project leader, Steve McIntyre and Gentoo council member Donnie Berkholz. Donnie recommended some books about recruitment that I plan to read and review in the next few weeks.

The issue of mailing list moderation and the number of people required to keep mailing lists functioning properly came up frequently. If you know a moderator for a Postgres mailing list, please consider thanking them for doing a very tedious, extremely important and often thankless job.

I also spent some time discussing with Leslie Hawthorn and Cat Allman how to increase the total number of women mentors and students next year. Leslie and I shared some ideas and I offered to help implement them next year. One thing the crowd asked for was explicit training on how to recruit and manage female students. Realistically, this information will apply to all students, and I hope this training helps us recruit more students overall.

I thought the conference went quite well. I hope PostgreSQL is accepted next year, and that one of our mentors is able to attend this conference. And, if you go, be sure to register for the hotel early, and stay at the Wild Palms.

User Groups redux

lousy cup!
actually, i love this cup. thanks, eric! 🙂

It’s a bit late for an “announcement”, but Gabrielle and I are re-presenting the User Groups talk to the Portland Linux Users Group tonight. We’re all about audience participation, and so we’re going to focus on helping PLUG pick a few topics and presenters for upcoming meetings. And whatever else they want to talk about 🙂

Meeting starts at 7pm and here’s where:

Fariborz Maseeh College of Engineering & Computer Science Building
Room FAB 86-01 (This is in the basement.)
The building is on SW 4th across from SW College Street.
See location H-10 on map at http://pdxLinux.org/campus_map.jpg

Beer afterward at Jax!

Jax Bar And Restaurant
826 SW 2nd Avenue

Leaving US PostgreSQL Assoc. – what’s next for me?

A smiling pug
image credit to bugbunnybambam

A few weeks ago, I decided to resign from the United States PostgreSQL Association board. Shortly after, I left for a long vacation where I thought about what I wanted to do next – both professionally and in a volunteer capacity.

Looking back, I started volunteering for PostgreSQL two years ago. I’ve led PDXPUG, staffed many conference booths, given nearly a dozen talks and run conferences. Of the work I’ve done, I’ve been most surprised by the creation of the PUGS website and all the user groups that followed.

This may sound silly – but I was so incredibly proud to see user groups in Oklahoma, Toronto, Los Angeles and the D.C.-area (BWPUG) hold meetings, share their experiences and publish fantastic presentation slideshows. All while I was out of the country!

That’s a true sign of success to me: groups of people leading themselves and sharing their knowledge with each other. It’s open community, with minimal bureaucracy, and (I hope) maximum fun.

With that in mind, I’ve decided to make this next year’s volunteer work focused on a simple idea:

Enable people to connect and learn directly from each other.

So what you can expect from me over the next year is more of the same, but now with that end goal in mind: more PostgreSQL user groups (for as long as the postgresql.org folks would like me to stay), more ways to connect people directly to each other, more authentic community building through un-conferences, and more contributions – through code, testing and presenting of that work.

To give you an idea — here’s what I’m up to over the next couple of months:

  • Linux Plumber’s Conference, September 17-19 – with Gabrielle Roth, we’ll be presenting information about databases (PostgreSQL specifically) and filesystem performance using data gathered from the recently installed PostgreSQL performance lab.
  • PostgreSQL Conference West, October 10-12 – I’m not organizing this year, but I’m organizing a session on hacking PostgreSQL, led by some PostgreSQL hackers!
  • WhereCampPDX, October 17-19 – I’m helping organize this un-conference for geography-specific tech – practitioners, professionals, enthusiasts, artists! We’ve got some great ideas and hope to publish details in the next week about the awesome folks involved, the venue and the parties!

Hope to see you at these events!

I haven’t talked about my work much in this blog, and probably will continue not to do that much here – but I also wanted to share that I’ve taken a position with End Point Corporation, a fantastic company that works on open source software, and provides support for PostgreSQL. I’ll be focusing on PostgreSQL, and doing a little Perl development here and there.

Pluggable architecture, not just for code

hands

photo from Chris Zakorchemny

One OSCON session that made me think was “Does Open Source need to be organic?” The panel contained Brian Aker (MySQL), Rob Lanphier (Linden Lab), Stephen O’Grady (Redmonk), Theodore Ts’o (Linux Foundation). The session was less about business vs. community, and more about how to increase community involvement in your projects.

Brian Aker mentioned Launchpad, and the way that it handles code forks. Forks are integrated into the system using a new revision control system – Bazaar. The forks are front and center – allowing all developers on the project to add forks and update them, incorporating them in with the primary code distribution point. This model reinforces the idea that forks are natural and can be positive evolutions in open source projects.

My big take-away: If you want to increase community contribution to open source projects, provide public and easy-to-use interfaces. Publish your API early and create pluggable interfaces! Let developers add functionality and publish their add-ons easily, both in your project’s development space and on their own.

The same principle can be applied to the people side of open source projects. In your organization, make roles, tasks and responsibilities transparent. Let everyone – inside AND outside the project – know what they could be doing to get things done. The mistake that many projects make is assuming that people know what they could be doing.

Think of the people-side of projects the same way as you think about the code. Documented APIs are the same as public mailing lists, blog entries and wikis that reveal what your organization is actually doing, and how new people can get involved. Roles and titles that are meaningful let people know who they should bring their ideas to. And that lowers barriers to participation.

Leadership is not just telling people what to do – it’s inspiring, facilitating and then getting out of the way of people who are willing and capable of doing things on their own. Communities grown from inspiration, and then fed by encouragement, fun and recognition of accomplishment, are the ones that last. And these communities are the ones that I want to be part of.

Running a Successful User Group

running a successful user group

After the People For Geeks talk, I presented “Running a Successful User Group” with Gabrielle Roth on Wednesday. You can find our slides and our presentation handout over on Bacon and Tech. The handout is pretty cool, take a minute and print it out!