Where PostgreSQL succeeds and what to do next

The response to my earlier post about meritocracy was overwhelming.

Also, Robert posted a response focusing on code, and on how the PostgreSQL project organizes its work around Commitfest.

Addressing some criticisms

I talked to Bruce Momjian about a few things that I said toward the end of my earlier post, things that may have offended people in our community.

We focused mainly on the fact that I brought a discussion of the outside world into the microcosm of PostgreSQL, and that I brought two things together: the intrinsic ability of an individual to succeed, and the value of an individual’s contribution to Postgres itself.

I talked about a world that is filled with people who are poor, uneducated or disenfranchised who we, as a project, probably just can’t reach. And that by mentioning these facts, which Bruce and I agreed were facts, I was confusing and insulting people who contribute so much time and hard work to our project.

What PostgreSQL does well

To clarify, PostgreSQL does an admirable job of promoting and encouraging the work of the people who step up and contribute code. Robert’s post about Commitfest shows how much effort goes into finding and encouraging the type of people that we’d like to contribute more code, review our code and document it.

As a project, we also do pretty well with encouraging non-code contributions. In particular, I think we do very well with conferences: finding creative ways of sponsoring them, seeking out and developing new speakers, and helping start user groups. The focus has always been on finding the right people, in the communities that we see growing, and encouraging them. Today, we see conferences and great Postgres representation in Japan, Russia, Canada, the US, China, Cuba, the European Union and Brazil. And there are more.

So, I think that we (Postgres) are succeeding, and growing.

I brought up my criticisms in the context of Robert’s original post, and a request that I lay out my concerns about invoking meritocracy. The concerns I expressed are more about the outside world, how that world impacts Postgres, and how Postgres can impact the rest of the world.

I do think we can do more to create structures that encourage participation, the Commitfest being a great example of how to implement and succeed in the future. I’ve seen a few people step up and offer help in the last couple weeks, and I’ll encourage them in their work. And hopefully talk about their successes here.

What do we do next?

What I wanted to do was provoke a larger discussion about what we could be doing. I didn’t offer any particular solutions. I just asked that we think for a moment about what we might be able to do.

And that, magically, happened.

David Fetter asked: “Which of those barriers do you see as important to address first?”

I’d like to connect Postgres more with the people in regions that our community doesn’t yet reach.

So, I’ve put up a survey for people who live in high-population regions that our community doesn’t really serve at all – most of Africa and the Middle East.

Please take a moment and let us know how you use Postgres, and what ways the Postgres community can connect with you.

The plan over the next six months is to both find ways of getting Postgres experts to give talks in those regions, and to find ways of supporting more people who want to be advocates for Postgres.

Where meritocracy fails

Robert wrote about patches and rejection today, and quoted me from some tweets I made about meritocracy. I think Robert made some good points in his post, and I’m going to make some suggestions about patch review.

But first, I want to address my irritation about meritocracy.

The first thing that I’ll say is that I’m not sure exactly what people mean when they mention meritocracy. A definition of it is “Meritocracy…is a system of government or other administration (such as business administration) wherein appointments are made and responsibilities assigned to individuals based upon their “merits”, namely intelligence, credentials, and education, determined through evaluations or examinations.”

My assumption was that Ed was saying, “Postgres is awesome because our community is meritocratic.” I don’t believe that’s our strongest value, or quality, as a community. And it’s not something that I think embodies what is awesome about Postgres.

Our strongest quality is our ability to create great code.

We consistently produce readable, reliable and robust code amongst geographically diverse people who have very strong, divergent opinions about a great many things. We find common ground in the production of database software between people who are rhetorically violent even in agreement.

The code quality arises from a commitment by Postgres hackers to discuss in public decisions that many developers prefer to make in private. We are committed to a kind of radical transparency about our code that, at least in our shared Postgres myth, is embodied in Tom Lane’s example. He overwhelmingly gifts to us his time and passion, in the form of methodical reviews of code. And that’s not to say that our reviews are perfect in tone or fact, but just that we consistently do them.

When I think about our review process as it has evolved through Commitfest, it seems so undeniably humane and personal. I know at the same time that it’s still frightening… Just last week a developer talked to me about how much he feared someone tearing into *him* and his code, picking apart decisions he’d made and the bits he knew needed more work. Anyone who shares a creative work knows how this feels – whether it’s a painting, poetry, music or code.

But I don’t think that Commitfest, or the direct reviews fellow hackers still provide to each other, have produced a meritocracy. And I don’t think that we should pursue meritocratic organization much more than we already have.

What we have is something that largely works, and produces a product we feel good about endorsing and improving. There are elements of “promotion through merit”. We pay closer attention now to giving commit access to people who it seems really ought to have it. And we recognize individual efforts where it is appropriate in our commit logs – something many projects fail to do.

At the same time, the operation of the project is dominated by people who fit into a very specific profile. And that’s something like:

  • are in the top 1% of the world in terms of salary,
  • are male,
  • had parents that were mostly successful (aren’t in jail for violent offenses for example), and
  • either don’t have kids, or have a partner or paid helper that does most of the childcare during the work day.

I count myself among you, with the exception that I’m not male, and I don’t have kids. But I guarantee you that if I did have kids, either my partner would provide the bulk of childcare during the work day, or we would pay someone to do it for us.

I bring this up because in a truly meritocratic organization, privilege wouldn’t matter. Anyone could join us. But the truth is, not everyone can join the Postgres project. And that’s why bringing up the myth, and applying it to an organization I contribute to, annoys me.

I try to think regularly about my own privilege, and the place of open source software like Postgres in the world. I consider how to contribute to an organization that not only is excellent in terms of what it produces, but is also something to be proud of because of the way that people treat and care for each other.

So, I don’t think more, or purer meritocracy helps us have better relationships or treat people well.

We are still small enough at our core (somewhere around 300 people at any point in time), that we can operate like the best businesses do. We rely on good relationships between small groups who tend to appoint leaders to communicate between teams. Our teams seem to often be pairs, or small businesses, which fits our project’s need for deep understanding of each feature.

But apart from the practicality of avoiding further pursuit of meritocracy, I don’t believe that it helps us with talents that we need as a project now. What matters is not that someone is the best at something, but that they have the time to put some effort in, which will then motivate others. That someone out there has a few minutes to write a review, file a bug report or fix a typo on our websites.

What we have to do is create structures that invite people to give what they can, when they can give it. This is what we enable with our extensive comments and thorough documentation. We probably could use someone devoting Tom Lane’s singular attention and time to our web site, but I think we could make better use of 10 people who could devote a fraction of that time, consistently and with good humor.

So, ending the pursuit of a mythical meritocracy doesn’t mean that we start accepting code which doesn’t meet high standards, or that all of a sudden we’re going to include more code from people in the bottom 1% of the world in terms of salary. It means that we take a look at different aspects of our project and see what is within our means to open up and make accessible to people who aren’t exactly like us.

Report from first day at PgEast and hoping for another tool to be opened up

I wrote up some quick notes from talks and conversations over at the Emma Tech blog.

The most exciting talk I’ve sat in on so far today was about an Oracle PL/SQL to Postgres PL/pgSQL translation tool that I’m hoping the company that created it will open source. We’ll see. Fortunately, a fellow conference-goer had an inspirational story to share about open sourcing another tool for Postgres, which led to incredible adoption in our community in just a few months.

Not every project will see that kind of immediate benefit and growth from open sourcing, but there is a certain class of project – where most people can complete 80% of a useful tool, but don’t bother to put in the additional effort to get the remaining 20% of the features that they’d really like to have.

But when someone does finally release a tool that provides that extra 20% of features, adopting the new tool is a no-brainer, particularly if it is open source. I think this PL/SQL conversion tool falls into this sweet spot.

Now I’m sitting in the Foreign Data Wrappers talk and very excited to see what Andrew is announcing. Great to see people creating things that make the crowd here clap, smile and celebrate.

First day in NYC for #pgeast

I’m here in NYC today, and looking over the schedule. I also posted the wifi keys below for the conference if you’re looking!

It’s not easy to link to individual talks, but here’s my short list of talks and people I’m going to try to connect with over the next three days:

  • True serializable transactions are here! – Kevin Grittner
  • Building your first MongoDB application – Brendan W McAdams
  • Defense against the dark arts: protecting your data from ORMs – Vanessa Hurst
  • pgbouncer: A practical implementation of a multiserver database farm behind the firewall – Lou Picciano
  • Range Types – Jeff Davis
  • The Write Stuff – Greg Smith
  • Getting started with PL/Proxy – Peter Eisentraut
  • Streaming databases: stepping outside of Postgres – Theo Schlossnagle
  • Data-driven cache invalidation – Magnus Hagander
  • Creating and Using Foreign Data Wrappers – Andrew Dunstan
  • PostgreSQL Performance Pitfalls – Greg Smith
  • View Triggers – David Fetter
  • Introduction to Write Ahead Logging – Robert Haas
  • Experiences with MongoDB as a queue and dict server – Tejaswi Nadahalli
  • Monitoring and Managing MongoDB and Postgres Applications with ClearStone – Tim Sneed
  • Comparing the Apache Cassandra Architecture to PostgreSQL – Jake Luciani

And if you’re searching for the wifi keys for the conference:

SkyTop: conf181pa
PennTopSouth: conf182pa
PennTopNorth: conf183pa
Madison: conf184pa
6th floor (Executive6fl): conf060pa

Also, I’m here to talk to folks about working at Emma. We’re hiring! Find me if you want to chat. 🙂

GSoC 2011, accepting submissions starting March 28!

The PostgreSQL project has been accepted into the Google Summer of Code 2011.

Students may begin submitting proposals on March 28. The submission period closes on April 8.

Development work runs from May 23 through August 15. For students, suggested projects, ideas and details are at:
http://wiki.postgresql.org/wiki/GSoC_2011
Our GSoC landing page is at:
http://www.google-melange.com/gsoc/org/show/google/gsoc2011/postgresql

We encourage students to contact project admins – me, Josh Berkus and Robert Treat this year – if they have questions. Once students have a proposal in mind, we will encourage them to engage with pgsql-hackers to flesh out their proposals and get feedback the same way that all contributors do. For those of you who have been around for previous GSoCs, this should be familiar to you. 🙂

Many thanks to the 15 volunteer mentors and admins this year (in no particular order):

  • Dave Page – Past mentor – pgAdmin, Windows, Packaging, Infrastructure
  • Heikki Linnakangas – Postgres Committer
  • Magnus Hagander – Postgres Committer, pgAdmin
  • Guillaume Lelarge – pgAdmin
  • Jehan-Guillaume de Rorthais – phpPgAdmin
  • Joe Abbate – Python-related, catalog-related projects
  • David E. Wheeler – Perl-related, extensions, PGXN
  • Mark Wong – benchmarking, monitoring, performance
  • Tatsuo Ishii – Postgres Committer, pgpool-II
  • Stephen Frost – Postgres contributor
  • Devrim Gündüz – Administration related software (dashboard)
  • Josh Berkus – auto-configuration, performance testing
  • Selena Deckelmann – configuration, testing
  • Andreas Scherbaum – performance, configuration, testing
  • Robert Treat – Past mentor 2x, co-admin, Mentor Summit attendee.

We can always accept more mentors! Actual assignment to projects depends greatly on the proposals from students. Please contact me if you are interested.

Google Summer of Code 2011 application started! Looking for mentors.

PostgreSQL is applying for GSoC again this year. We’re looking for:

* Mentors
* Project ideas

Are you a PostgreSQL community member, and would you like to mentor? Please let me know! Our application deadline is Friday, March 11, 2011, so please contact me *before* Friday.

I’ve started a wiki page: http://wiki.postgresql.org/wiki/GSoC_2011

It’s seeded with last year’s todo lists and information. We need to add project ideas for students to it.

The wiki pages for 2008 and 2010 are available, including links to the original student proposals:

http://wiki.postgresql.org/wiki/GSoC_2010
http://wiki.postgresql.org/wiki/GSoC_2008

Broken windows, broken code, broken systems

A few days ago, I asked:

I spend a lot of time thinking about the little details in systems – like the number of ephemeral ports consumed, number of open file descriptors and per-process memory utilization over time. Small changes across 50 machines can add up to a large overall change in performance.
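As a rough illustration of watching those little details, here is a minimal Python sketch for two of them: open file descriptors and memory use for the current process. It assumes a Linux or macOS host where /dev/fd lists one entry per open descriptor. The function names are my own for this example, not from any particular monitoring tool.

```python
import os
import resource


def open_fd_count(fd_dir="/dev/fd"):
    """Count this process's open file descriptors.

    Assumes fd_dir has one entry per open descriptor (true on Linux and
    macOS). Listing the directory briefly opens a descriptor itself, so
    the result is one higher than the steady-state count.
    """
    return len(os.listdir(fd_dir))


def peak_memory():
    """Return the peak resident set size of this process.

    Units are platform dependent: kilobytes on Linux, bytes on macOS.
    """
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


if __name__ == "__main__":
    print("open file descriptors:", open_fd_count())
    print("peak resident memory:", peak_memory())
```

Sampling numbers like these on a schedule, on every machine, is what makes a small drift visible before it adds up.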

And then, today, I saw this article:

One of the more telling comments I received was the idea that since the advent of virtualization, there’s no point in trying to fix anything anymore. If a weird error pops up, just redeploy the original template and toss the old VM on the scrap heap. Similar ideas revolved around re-imaging laptops and desktops rather than fixing the problem. OK. Full stop. A laptop or desktop is most certainly not a server, and servers should not be treated that way. But even that’s not the full reality of the situation.

I’m starting to think that current server virtualization technologies are contributing to the decline of real server administration skills.

There definitely has been a shift – “real server administration skills” are now more about packaging, software selection and managing dramatic shifts in utilization. It’s less important now to know exactly how to manage M4 with sendmail, and more important that you know you should probably use postfix instead. I don’t spend much time convincing clients that they need connection pooling; I debug the connection pooler that was chosen.

The available software for web development and operations is quite broad – the version of Linux you select, whether you are vendor supported or not, and the volume of open source tools to support applications.

Inevitably, the industry has shifted to configuration management, rather than configuration. And, honestly, the shift started about 15 years ago with cfengine.

Now we call this DevOps, the idea that systems management should be programmable. Burgess called this “Computer Immunology”. DevOps is a much better marketing term, but I think the core ideas remain the same: Make programmatic interfaces to manage systems and automate.

But, back to the broken window thing! I did some searching for development and broken windows and found that in 2007, a developer talked about Broken Window Theory:

People are reluctant to break something that works, but not so much when it doesn’t. If the build is already broken, then people won’t spend much time making sure their change doesn’t break it (well, break it further). But if the build is pristine green, then they will be very careful about it.

In 2005, Jeff Atwood mentioned the original source, and said “Maybe we should be sweating the small stuff.”

That stuck with me because I admit that I focus on the little details first. I try to fix and automate where I can, but for political or practical reasons, I often am unable to make the comprehensive system changes I’d like to see.

So, given that most of us live in the real world where some things are just left undone, where do we draw the line? What do we consider a bit of acceptable street litter, and what do we consider a broken window? When is it ok to just reboot the system, and when do you really need to figure out exactly what went wrong?

This decision making process is often the difference between a productive work day, and one filled with frustration.

The strategies that we use to make this choice are probably the most important aspects of system administration and devops today. There, of course, is never a single right answer for every business. But I’m sure there are some themes.

For example:

James posted “Rules for Infrastructure” just the other day, which is a repost of the original gist. What I like about this is that they are phrased philosophically: here are the lines in the sand, and the definitions that we’re all going to agree to.

Where do you draw the line? And how do you communicate to your colleagues where the line is?

Intro to PostgreSQL class starts March 7!

Remember that class I announced about a month ago?

Well, it’s happening for real. We’re starting March 7th and going for 6 weeks. Sign up now if you want to join us for this first edition of the class.

I’m planning to do screencasts for a lot of the content, and have just started playing around with ScreenFlow.

The first couple weeks are primarily about using psql and learning key features of PostgreSQL, with some history sprinkled in. The next two weeks dive into features like: full text search, built-in functions, our many datatypes, indexing and transactional DDL. I’ll be surveying students as we go along to add detail where I can on key features they’re interested in. The last few weeks go into administration, maintenance and configuration. I’ll also be throwing in details about the PostgreSQL community – people, the best places to go for help, and hopefully some cameos from Postgres community members.

So, don’t forget to sign up today! Especially because this pudding says so:

Image courtesy of @thesethings