Broken windows, broken code, broken systems

A few days ago, I asked:

I spend a lot of time thinking about the little details in systems – like the number of ephemeral ports consumed, number of open file descriptors and per-process memory utilization over time. Small changes across 50 machines can add up to a large overall change in performance.

And then, today, I saw this article:

One of the more telling comments I received was the idea that since the advent of virtualization, there’s no point in trying to fix anything anymore. If a weird error pops up, just redeploy the original template and toss the old VM on the scrap heap. Similar ideas revolved around re-imaging laptops and desktops rather than fixing the problem. OK. Full stop. A laptop or desktop is most certainly not a server, and servers should not be treated that way. But even that’s not the full reality of the situation.

I’m starting to think that current server virtualization technologies are contributing to the decline of real server administration skills.

There definitely has been a shift – “real server administration skills” are now more about packaging, software selection and managing dramatic shifts in utilization. It’s less important to know exactly how to manage M4 with sendmail, and more important to know that you should probably use postfix instead. I don’t spend much time convincing clients that they need connection pooling; I debug the connection pooler that was chosen.

The range of choices in web development and operations is quite broad – the version of Linux you select, whether you are vendor-supported or not, and the sheer volume of open source tools available to support applications.

Inevitably, the industry has shifted to configuration management, rather than configuration. And, honestly, the shift started about 15 years ago with cfengine.

Now we call this DevOps, the idea that systems management should be programmable. Burgess called this “Computer Immunology”. DevOps is a much better marketing term, but I think the core ideas remain the same: Make programmatic interfaces to manage systems and automate.

But, back to the broken window thing! I did some searching for development and broken windows and found that in 2007, a developer talked about Broken Window Theory:

People are reluctant to break something that works, but not so much when it doesn’t. If the build is already broken, then people won’t spend much time making sure their change doesn’t break it (well, break it further). But if the build is pristine green, then they will be very careful about it.

In 2005, Jeff Atwood mentioned the original source, and said “Maybe we should be sweating the small stuff.”

That stuck with me because I admit that I focus on the little details first. I try to fix and automate where I can, but for political or practical reasons, I often am unable to make the comprehensive system changes I’d like to see.

So, given that most of us live in the real world where some things are just left undone, where do we draw the line? What do we consider a bit of acceptable street litter, and what do we consider a broken window? When is it ok to just reboot the system, and when do you really need to figure out exactly what went wrong?

This decision-making process is often the difference between a productive work day and one filled with frustration.

The strategies that we use to make this choice are probably the most important aspects of system administration and DevOps today. There is, of course, never a single right answer for every business. But I’m sure there are some themes.

For example:

James posted “Rules for Infrastructure” just the other day, which is a repost of the original gist. What I like about these rules is that they are phrased philosophically: here are the lines in the sand, and the definitions that we’re all going to agree to.

Where do you draw the line? And how do you communicate to your colleagues where the line is?

Customizing the RPMs from pgrpms.org

To pick up where Devrim left off in customizing RPMs, here are some more tips for getting your very own RPMs built:

  • Create a VM with your favorite operating system (I’m using versions of CentOS). I need both a 32-bit and a 64-bit OS. This is much easier to manage with separate, local VMs.
  • Install spectool (available here), and SVN
  • The other dependencies were: gcc glibc-devel bison flex python-devel tcl-devel readline-devel zlib-devel openssl-devel krb5-devel e2fsprogs-devel libxml2-devel libxslt-devel pam-devel
  • Edit the postgresql-$VERSION.spec file to your liking. If you’re adding patches, you need to add them in TWO places – first in the Patch#: group, and then again further down, where the %patch# series starts. Finally, if you’re adding an entirely new package (say in 8.2, pg_standby in contrib), you’ll also need to add the binary (or library, or whatever) to the appropriate %files clause later in the spec file. It’s also a good idea to modify ‘Release’ so your build is distinguishable from the stock one. Here’s a sample diff of my spec file:


--- postgresql-8.2.spec (revision 188)
+++ postgresql-8.2.spec (working copy)
@@ -74,7 +74,7 @@
Summary: PostgreSQL client programs and libraries
Name: postgresql
Version: 8.2.17
-Release: 1PGDG%{?dist}
+Release: 1test%{?dist}
License: BSD
Group: Applications/Databases
Url: http://www.postgresql.org/
@@ -95,7 +95,9 @@
Patch4: postgresql-test.patch
Patch6: postgresql-perl-rpath.patch
Patch8: postgresql-prefer-ncurses.patch
+Patch7: postgresql-pgstat-dir.patch
Patch9: postgresql-use-zoneinfo.patch
+Patch10: pg_standby.patch

Buildrequires: perl glibc-devel bison flex
Requires: /sbin/ldconfig initscripts
@@ -282,7 +284,9 @@
%patch4 -p1
%patch6 -p1
%patch8 -p1
+%patch7 -p1
%patch9 -p1
+%patch10 -p1

pushd doc
tar -zcf postgres.tar.gz *.html stylesheet.css
@@ -604,6 +608,7 @@
%{_bindir}/pg_controldata
%{_bindir}/pg_ctl
%{_bindir}/pg_resetxlog
+%{_bindir}/pg_standby
%{_bindir}/postgres
%{_bindir}/postmaster
%{_mandir}/man1/initdb.*
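
The "TWO places" rule for patches is easy to get wrong, so a quick sanity check before building can save a failed build. Here is a minimal Python sketch of such a check. It is not part of the pgrpms tooling: the function name check_patches is made up for illustration, and it assumes the spec file uses the numbered Patch#: / %patch# style shown in the diff above (it does not understand other ways of applying patches).

```python
import re

# Matches declarations such as "Patch10: pg_standby.patch"
PATCH_DECL = re.compile(r"^Patch(\d+):", re.MULTILINE)
# Matches applications such as "%patch10 -p1"
PATCH_APPLY = re.compile(r"^%patch(\d+)\b", re.MULTILINE)


def check_patches(spec_text):
    """Compare declared patches with applied patches in a spec file.

    Returns a tuple of two sorted lists of patch numbers:
    (declared but never applied, applied but never declared).
    """
    declared = {int(m.group(1)) for m in PATCH_DECL.finditer(spec_text)}
    applied = {int(m.group(1)) for m in PATCH_APPLY.finditer(spec_text)}
    return sorted(declared - applied), sorted(applied - declared)


def describe_problems(spec_text):
    """Turn the result of check_patches into human-readable messages."""
    unapplied, undeclared = check_patches(spec_text)
    messages = []
    for number in unapplied:
        messages.append(f"Patch{number} is declared but has no %patch{number} line")
    for number in undeclared:
        messages.append(f"%patch{number} is applied but has no Patch{number}: line")
    return messages
```

To use it, read your spec file into a string (for example with open("postgresql-8.2.spec").read()) and print whatever describe_problems returns. An empty list means every declared patch is also applied, and vice versa. It won't tell you whether the new binary made it into %files, so that part still needs a manual look.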

How have you customized RPMs using this repo? Share your .spec files!

Snow Leopard and PostgreSQL: installation help links


A few reports of issues have been raised on the mailing lists around upgrading to Snow Leopard. There have been some good tutorials and hints posted on blogs that aren’t in the planet.postgresql.org roll, so here are a few things that might help you out:

Photo courtesy of yvonne_n_1968, under a Creative Commons license