The Final Crontab: an introduction to crontabber

I gave a talk at Monitorama today about crontabber. (slides)

My coworker tells me that I left the “why you should care” part about crontabber out of my first few slides. So here’s a list:

  • Retries jobs on failure automatically
  • Dependency-aware, and won’t execute child jobs that depend on parents that have failed
  • Nagios integration including support for WARNINGs and CRITICALs, and configurable escalation from WARNING to CRITICAL (e.g. 3 WARNINGS == CRITICAL).

Those three are probably the top features that sysadmins unhappy with how cron manages jobs wish they had.
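To make the dependency and escalation behavior in the list above concrete, here is a minimal sketch in Python. It is not crontabber's code: the function names, the shape of the job dictionary, and the default threshold of 3 are assumptions made for illustration. The exit codes 0, 1 and 2 are the standard Nagios plugin codes for OK, WARNING and CRITICAL.

```python
# Illustrative sketch only; not crontabber's actual implementation.

OK, WARNING, CRITICAL = 0, 1, 2  # standard Nagios plugin exit codes


def skipped_jobs(depends_on, failed):
    """Return the jobs to skip because something they depend on failed.

    depends_on maps each job name to the list of jobs it depends on.
    failed is the set of job names that have failed. Failures propagate
    through the chain: a grandchild of a failed job is skipped too.
    """
    skipped = set()
    changed = True
    while changed:  # repeat until no new job gets skipped
        changed = False
        for job, parents in depends_on.items():
            if job in skipped:
                continue
            if any(p in failed or p in skipped for p in parents):
                skipped.add(job)
                changed = True
    return sorted(skipped)


def nagios_state(warnings, criticals, escalate_at=3):
    """Map failure counts to a Nagios state, escalating repeated WARNINGs."""
    if criticals > 0 or warnings >= escalate_at:
        return CRITICAL
    if warnings > 0:
        return WARNING
    return OK
```

With a chain where b depends on a and c depends on b, a failure in a means both b and c are skipped, while an unrelated job still runs.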

Crontabber needs at least Python 2.6 and Postgres 9.2, is FOSS, and is being used in production. We’ve used a version of the code since February 2013, and the Python module version, which you can install with pip install crontabber, is currently running in our stage environment.

Let us know what you think!

VPN Problems and Ubuntu: killing off the dnsmasq zombie

I’ve been having problems with VPN, DNS and Ubuntu for a year. But, I’m also pretty lazy when it comes to spending time on configuration. And configuring VPNs is like last on my list of ways I’d like to spend my time.

In short, I’d rather reboot than figure out exactly why my networking just stopped working.

REBOOT.

Fortunately, I had an easy (for me) work-around for most of my VPN needs: use SSH and a jump-host for getting to servers. I found it annoying when I wanted to look at a website on protected network space, or had a service on an unusual port that I wanted to test things against. I would work around with SSH tunnels, or I would fire up my Mac, whose VPN settings worked flawlessly.

That all said, I thought today, a sunny, lovely fall day in Portland, I would fix my VPN.

And so, my buddy @uberj_ helped me get things sorted.

The root cause of all my VPN heartache was the dnsmasq daemon controlling my DNS. And, related, network-manager. There are a few places that document exactly how to disable dnsmasq:

  • DNS in Ubuntu 12.04 http://www.stgraber.org/2012/02/24/dns-in-ubuntu-12-04/
  • Disabling dnsmasq as your local DNS server in Ubuntu http://mark.orbum.net/2012/05/14/disabling-dnsmasq-as-your-local-dns-server-in-ubuntu/

However, they leave out one important step: killing off the existing dnsmasq process. For the unlucky, restarting network-manager does not kill off dnsmasq.

So, to find and kill dnsmasq, do the following:

 sudo service network-manager stop
 # dnsmasq runs as another user, so kill needs sudo too
 sudo kill $(ps -C dnsmasq -o pid=)
 sudo service network-manager start

Then, start your VPN and check the contents of /etc/resolv.conf. If all went well, you’ve got nameserver addresses other than 127.0.0.1 in the file.
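If you would rather script that check than eyeball the file, here is a minimal sketch in Python. The function name is made up for this post, and it only looks at plain "nameserver" lines, treating 127.0.0.1 as the local dnsmasq address mentioned above.

```python
def non_local_nameservers(resolv_conf_text):
    """Return nameserver addresses from resolv.conf text, excluding 127.0.0.1."""
    servers = []
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        # Only "nameserver <address>" lines matter; commented lines are skipped
        # because their first field is not exactly "nameserver".
        if len(fields) >= 2 and fields[0] == "nameserver":
            if fields[1] != "127.0.0.1":
                servers.append(fields[1])
    return servers
```

Pass it the contents of /etc/resolv.conf; an empty list means dnsmasq is still the only resolver listed.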

Yay!

Sadly, this was not the end of my story.

After a few minutes, NetworkManager started dnsmasq up again!

Zombie dnsmasq

So, like any reasonable sysadmin, I opened up the /etc/NetworkManager/NetworkManager.conf file, uncommented the dns=dnsmasq line, and replaced it with dns=/dev/null. My guess was that you can probably put just about anything other than dnsmasq into that line to permanently disable the plugin.

I ran sudo service network-manager restart, checked /etc/resolv.conf and felt pretty smug.

I also tried uninstalling the dnsmasq-base package, but unfortunately that takes out a number of other packages I appear to need. So, I left /dev/null in my NetworkManager.conf, and updated this blog post.

But wait...

While editing this blog post, dnsmasq took over my DNS settings again.

A clue as to what was happening was in /var/log/syslog:

Oct 18 10:20:10 localhost dnsmasq[30535]: started, version 2.59 cache disabled
Oct 18 10:20:10 localhost dnsmasq[30535]: compile time options: IPv6 GNU-getopt DBus i18n DHCP TFTP conntrack IDN
Oct 18 10:20:10 localhost dnsmasq[30535]: DBus support enabled: connected to system bus
Oct 18 10:20:10 localhost dnsmasq[30535]: warning: no upstream servers configured

It turns out that dnsmasq was still getting revived by NetworkManager. Why NetworkManager doesn’t seem to care about configuration settings was beyond my willingness to investigate today. So, I did some more searching about truly killing off dnsmasq for good.

And I found this thread, and this sample configuration file. The output for the dnsmasq process from ps shows that it also reads configuration from /etc/NetworkManager/dnsmasq.d (the --conf-dir option):

nobody   30777 30759  0 10:21 ?        00:00:00 /usr/sbin/dnsmasq --no-resolv --keep-in-foreground --no-hosts --bind-interfaces --pid-file=/var/run/sendsigs.omit.d/network-manager.dnsmasq.pid --listen-address=127.0.0.1 --conf-file=/var/run/nm-dns-dnsmasq.conf --cache-size=0 --proxy-dnssec --enable-dbus --conf-dir=/etc/NetworkManager/dnsmasq.d

I dug into the thread, and the suggestion was to set port=0 in the config. I created a file called custom in /etc/NetworkManager/dnsmasq.d containing that line, and ran sudo service network-manager restart.

And then I got this in my syslog:

Oct 18 10:21:10 localhost dnsmasq[30777]: started, version 2.59 DNS disabled

FINALLY.

FINALLY!
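Since dnsmasq came back twice while I wasn't looking, it is handy to check the log rather than trust my smugness. Here is a minimal sketch in Python; the function name is made up, and it relies only on the "started, version" line format shown in the syslog excerpts above.

```python
def dnsmasq_dns_disabled(syslog_lines):
    """Return True if the most recent dnsmasq start line says 'DNS disabled'.

    Returns None when no dnsmasq start line is found at all.
    """
    latest = None
    for line in syslog_lines:
        # Remember the last start-up line seen; later lines win.
        if "dnsmasq[" in line and "started, version" in line:
            latest = line
    if latest is None:
        return None
    return "DNS disabled" in latest
```

Feed it the lines of /var/log/syslog; a result of False means dnsmasq has started again with DNS enabled.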

Setting up HBase for Socorro

Setting up HBase for use with Socorro is a bit of a bear! The default Vagrant config sets up a VM with filesystem-only storage. For those who want to try out the HBase support, or are on a path toward setting up a production instance, these instructions might help you along the way.

You may also be interested in Lars’ recent blog posts about Socorro:

Here’s how I got it all working on an Ubuntu Precise (12.04) system, along with some scripts for launching important processes and putting test crashes into the system so you can tell that it is working. Ultimately, my goal is to incorporate all of this into some setup scripts to help new users out.

Set up HBase and Thrift

Socorro uses the Thrift API to insert new crashes and retrieve them through the middleware layer. These Quickstart instructions are pretty helpful for getting HBase installed.

Then, you need to edit /etc/hosts: remove the ‘127.0.1.1’ entry, and add your hostname to the localhost ‘127.0.0.1’ line. Given the defaults, it’s also helpful to add ‘crash-stats’ and ‘crash-reports’ as host aliases. Your final config line for localhost would look like:

127.0.0.1       localhost wuzetian crash-reports crash-stats

(where wuzetian is your hostname)
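The same edit can be sketched as a small Python function, which is roughly what I'd want in a setup script. The function name and signature are made up for illustration, and it assumes the 127.0.0.1 line has no trailing comment.

```python
def rewrite_hosts(hosts_text, hostname, aliases=("crash-reports", "crash-stats")):
    """Drop the 127.0.1.1 entry and put hostname plus aliases on the 127.0.0.1 line.

    Assumes the 127.0.0.1 line carries no trailing comment.
    """
    out = []
    for line in hosts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "127.0.1.1":
            continue  # remove the 127.0.1.1 hostname entry
        if fields and fields[0] == "127.0.0.1":
            names = fields[1:]
            # Append the hostname and aliases, skipping any already present.
            for name in (hostname,) + tuple(aliases):
                if name not in names:
                    names.append(name)
            line = "127.0.0.1       " + " ".join(names)
        out.append(line)
    return "\n".join(out) + "\n"
```

It returns the new file contents as a string, so you can review the result before writing it back to /etc/hosts.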

You also need to add configuration for HBase. Here’s an example:


<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file:///var/tmp/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/var/tmp/zookeeper</value>
  </property>
</configuration>

That sets the locations for your HBase and ZooKeeper files. This setup is for testing, so I put the directories in a location I can easily clear out.
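To double-check which locations the XML above actually configures, for example before clearing them out, here is a minimal sketch in Python using the standard library's ElementTree. The function name is made up for illustration.

```python
import xml.etree.ElementTree as ET


def hbase_properties(site_xml_text):
    """Return a dict of property name -> value from HBase configuration XML text."""
    root = ET.fromstring(site_xml_text)
    props = {}
    # Each <property> element holds one <name>/<value> pair.
    for prop in root.findall("property"):
        props[prop.findtext("name")] = prop.findtext("value")
    return props
```

For the example configuration, the result contains hbase.rootdir and hbase.zookeeper.property.dataDir with the two /var/tmp locations.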

Then, to start HBase and Thrift up:

/etc/init.d/hadoop-hbase-master start
/etc/init.d/hadoop-hbase-thrift start

Setting up processor tools

The processor that looks at raw crashes runs two tools by default: minidump_stackwalk and exploitable.

You can build these from the socorro source tree with:

make minidump_stackwalk

Then make install should put these files into a useful location.

You can also just copy the binaries: one is in the stackwalk/bin directory and the other is exploitable/exploitable.

The paths for these are configured in config/processor.ini: exploitability_tool_pathname and minidump_stackwalk_pathname

There’s also a symbols resolver configured, but I am not setting this up in my test.

Disable LZO compression for HBase (unless you have it configured)

Our hbase schema is configured to use LZO compression by default. Change that to ‘NONE’ and load the schema into hbase:

/bin/cat /home/socorro/dev/socorro/analysis/hbase_schema | sed 's/LZO/NONE/g' | /usr/bin/hbase shell

Set up crashmover

Update two lines in scripts/config/collectorconfig.py:

localFS.default = '/home/socorro/primaryCrashStore'
fallbackFS.default = '/home/socorro/fallback'

Set those to directories where you can store crash dumps.

Configure processor and monitor to use HBase

You need to set the processor up to use HBase instead of local crash storage.

The easiest way to do this is as follows:

PYTHONPATH=. python socorro/processor/processor_app.py --admin.conf=./config/processor.ini --source.crashstorage_class=socorro.external.hbase.crashstorage.HBaseCrashStorage --admin.dump_conf=config/processor2.ini
PYTHONPATH=. python socorro/processor/monitor_app.py --admin.conf=./config/monitor.ini --source.crashstorage_class=socorro.external.hbase.crashstorage.HBaseCrashStorage --admin.dump_conf=config/monitor2.ini

Then edit both files to reflect your HBase configuration.

Starting up

The docs suggest starting up four daemons in screen sessions. I mocked up a shell script and a screenrc to get you started.

And that’s it! You should now have a working system, with crashes being submitted and stashed into HBase, and the monitor and processor picking up crashes as they arrive and running the stackwalk and exploitable tools against the crashes.

Please let me know if these instructions work, or don’t work, for you.

Updates on my Lenovo X230 situation: Skype, screencap work; Vidyo not so much

Here was my wish list from before:

  • Camera working: Done! The trick was ‘uvcvideo’, which I eventually built as a kernel module.
  • A Skitch replacement: Mostly done. I was given Shutter Project as a recommendation. I haven’t had a look at it yet. PrtSc actually takes pics of my visible desktop and I added a Firefox Addon called “Awesome Screenshot”. That solves my problems for now.
  • Vidyo working: Not working. I can now get video, and audio OUT, but I can’t hear other people. I need to dig into and troubleshoot this more. Skype, however, does work well. It does tend to flake out (slow video, loss of audio) far more on the Lenovo than on the Mac.
  • A package for my .bash_profile, .ssh and .gpg directories that I can install in any new system: Not done.
  • A better driver for the touchpad that doesn’t let my mouse jump around while I’m typing: Not done.
  • Change configuration to have the mouse behave like the latest OS X (reverse scrolling): Not done.

Overall, I feel much more comfortable on my Linux laptop now than on my Mac. The mousing in particular is frustrating without buttons on the Mac.

I still switch back and forth because of Vidyo. I’m hoping in the next week or so to figure out what’s wrong with my audio and get it solved for good.

The nicest productivity improvements have been around test servers like HBase and Thrift, and being able to recompile my kernel at a moment’s notice for new features.

I am a feminist hacker: Reflections on the first AdaCamp

I had a wonderful time at the first AdaCamp, held in Melbourne, Australia on January 14, 2012.

I didn’t take notes during most of the sessions, and spent a lot of time listening and thinking.

The two important things I took away from the first AdaCamp were about context – my context, and the camp itself.

Puppet Faces: defaults and ‘puppet node clean’

Puppet Faces are an extendable API for tricking out your Puppet instances. (“Faces” is just short for “Interfaces”.) Just a couple days ago I wrote about my survey of puppet + ec2 provisioning tools.

The problem I’m trying to solve, which I don’t feel like I’ve solved well, is how to give a type to a new system at bootstrap time, without using DNS. The type variable maps to a node manifest group, and determines the personality of a host – is it a database, webserver or development instance?

Going from Vagrant and Puppet into EC2: A short survey of 5 tools (and two I didn’t bother trying)

I thought this would be easy.

I started using Vagrant, and was productive with it in about a day. Really a couple hours. Most of my time was spent downloading the correct version of VirtualBox, looking for starter images and then a small amount of time experimenting with the Vagrantfile scripting language (for multiple VMs).

And we made some Puppet configs.

Day 2 at PgConf.EU: hallway track and the marketing of Postgres

The hallway track is always my favorite part of the conference. I had to give a full-length and a lightning talk today, so much of my time was spent making sure I was really prepared and then giving the talks!

But between talks, I got to chat with Heroku, 2ndQuadrant and EnterpriseDB folks about what they think is coming next in the world of enterprise development and Postgres.

One topic that I touched on in those conversations and my lightning talk (Postgres needs an aircraft carrier) was that our plan for world domination needs to get quite a bit more specific and actionable.

For the open source community, the right question is not “are we ready to tackle the enterprise?” — the right question is: Which market segment and customer group are we going to target for complete market domination?

One area that we definitely already dominate is online poker. We have had a few blog posts about it, but not a whole lot else. Another is GIS through PostGIS.

I created a survey to try and capture some scenarios from the developers who work with customers every day solving problems. We need to know more about the people using Postgres and the way that they use the database.

If we can get 30 responses, I’ll publish the results. It’s a bit long, and requires some thought, so I imagine it will take some time to get them all.

If you have a customer that you think represents a good target market for Postgres, take 10 minutes and fill out the survey for us!