The Final Crontab: an introduction to crontabber

I gave a talk at Monitorama today about crontabber. (slides)

My coworker tells me that my first few slides left out why you should care about crontabber. So here’s a list:

  • Retries jobs on failure automatically
  • Dependency-aware, and won’t execute child jobs that depend on parents that have failed
  • Nagios integration including support for WARNINGs and CRITICALs, and configurable escalation from WARNING to CRITICAL (e.g. 3 WARNINGS == CRITICAL).

Those three are probably the features that sysadmins unhappy with how cron manages jobs most wish they had.
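To make the list a bit more concrete, here is a toy Python sketch of the three ideas: retrying, skipping children of failed parents, and escalating from WARNING to CRITICAL. This is not crontabber's API or its implementation; run_jobs, nagios_level, and the (name, callable, parents) tuple layout are hypothetical names I made up for illustration.

```python
def run_jobs(jobs, max_attempts=2):
    """Run jobs in listed order; skip any job whose parent did not succeed.

    `jobs` is a list of (name, callable, [parent names]) tuples, listed so
    that parents come before their children (a hypothetical layout).
    """
    results = {}
    for name, func, parents in jobs:
        if any(results.get(p) != "ok" for p in parents):
            results[name] = "skipped"  # a parent failed or was itself skipped
            continue
        results[name] = "failed"
        for _ in range(max_attempts):  # retry on failure
            try:
                func()
            except Exception:
                continue
            results[name] = "ok"
            break
    return results


def nagios_level(consecutive_failures, critical_after=3):
    """Map a failure streak to a Nagios-style state (e.g. 3 WARNINGs == CRITICAL)."""
    if consecutive_failures == 0:
        return "OK"
    if consecutive_failures >= critical_after:
        return "CRITICAL"
    return "WARNING"
```

In this sketch a job that keeps failing marks every job that lists it as a parent as "skipped", and a streak of three failures is reported as CRITICAL instead of WARNING.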

Crontabber needs at least Python 2.6 and Postgres 9.2, is FOSS, and is being used in production. We’ve used a version of the code since February 2013, and the Python module version, which you can install with pip install crontabber, is currently running in our stage environment.

Let us know what you think!

Setting up HBase for Socorro

Setting up HBase for use with Socorro is a bit of a bear! The default Vagrant config sets up a VM with filesystem-only storage. For those who want to try out the HBase support, or are on a path toward setting up a production instance, these instructions might help you along the way.

You may also be interested in Lars’ recent blog posts about Socorro.

Here’s how I got it all working on an Ubuntu Precise (12.04) system, along with some scripts for launching important processes and putting test crashes into the system so you can tell that it is working. Ultimately, my goal is to incorporate all of this into some setup scripts to help new users out.

Set up HBase and Thrift

Socorro uses the Thrift API to insert new crashes and retrieve them through the middleware layer. These Quickstart instructions are pretty helpful for getting HBase installed.

Then, you need to edit /etc/hosts: remove the ‘127.0.1.1’ entry and add your hostname to the localhost ‘127.0.0.1’ line. The defaults also expect ‘crash-stats’ and ‘crash-reports’ as host aliases, so it’s helpful to add those too. Your final config line for localhost would look like:

127.0.0.1       localhost wuzetian crash-reports crash-stats

(where wuzetian is your hostname)
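If you would rather script the edit than do it by hand, here is a rough Python sketch of the same change. It only transforms the text of a hosts file and leaves reading and writing /etc/hosts (which needs root) to you; rewrite_hosts is a hypothetical helper name, not part of Socorro.

```python
def rewrite_hosts(text, hostname, aliases=("crash-reports", "crash-stats")):
    """Return hosts-file text with 127.0.1.1 removed and names added to 127.0.0.1."""
    out = []
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] == "127.0.1.1":
            continue  # drop the 127.0.1.1 entry entirely
        if fields and fields[0] == "127.0.0.1":
            names = fields[1:]
            # append the hostname and aliases, skipping any already present
            for name in (hostname,) + tuple(aliases):
                if name not in names:
                    names.append(name)
            line = "127.0.0.1       " + " ".join(names)
        out.append(line)
    return "\n".join(out) + "\n"
```

Running it a second time on its own output changes nothing, since names that are already on the line are not added again.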

You also need to add configuration for HBase. Here’s an example:


<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file:///var/tmp/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/var/tmp/zookeeper</value>
  </property>
</configuration>

That sets the locations for your HBase files and for ZooKeeper. This setup is for testing, so I put the directories in a location I can easily clear out.
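A quick way to double-check a config like this is to parse it and look at the property names and values. The sketch below uses Python's standard xml.etree.ElementTree; read_hbase_properties is a hypothetical helper name, and the embedded XML is just the example from this post.

```python
import xml.etree.ElementTree as ET

EXAMPLE = """<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file:///var/tmp/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/var/tmp/zookeeper</value>
  </property>
</configuration>
"""


def read_hbase_properties(xml_text):
    """Return a {name: value} dict for every <property> in the configuration."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value") for p in root.iter("property")}


if __name__ == "__main__":
    for name, value in read_hbase_properties(EXAMPLE).items():
        print(name, "=", value)
```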

Then, to start HBase and Thrift up:

/etc/init.d/hadoop-hbase-master start
/etc/init.d/hadoop-hbase-thrift start
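To confirm that Thrift actually came up, you can check that something is listening on its port. This sketch assumes the HBase Thrift server is on its usual default port, 9090, on localhost; if your setup differs, change the arguments. port_open is a hypothetical helper name.

```python
import socket


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 9090 is an assumption about your Thrift port; adjust as needed.
    print("Thrift reachable:", port_open("localhost", 9090))
```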

Setting up processor tools

The processor that looks at raw crashes runs two tools by default: minidump_stackwalk and exploitable.

You can build these from the socorro source tree with:

make minidump_stackwalk

Then make install should put these files into a useful location.

You can also just copy the binaries directly: one is in the stackwalk/bin directory and the other is exploitable/exploitable.

The paths for these are configured in config/processor.ini, under exploitability_tool_pathname and minidump_stackwalk_pathname.
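Before starting the processor, it can save some head-scratching to verify that the two configured paths point at real, executable files. A minimal sketch follows; missing_tools is a hypothetical helper, and you pass in whatever paths you put in config/processor.ini (I am not parsing the ini file here).

```python
import os


def missing_tools(paths):
    """Return the subset of `paths` that are not existing, executable files."""
    return [p for p in paths
            if not (os.path.isfile(p) and os.access(p, os.X_OK))]
```

An empty list back means both tools were found and are executable by the current user.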

There’s also a symbols resolver configured, but I am not setting this up in my test.

Disable LZO compression for HBase (unless you have it configured)

Our HBase schema is configured to use LZO compression by default. Change that to ‘NONE’ and load the schema into HBase:

/bin/cat /home/socorro/dev/socorro/analysis/hbase_schema | sed 's/LZO/NONE/g' | /usr/bin/hbase shell
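For anyone scripting their setup in Python, the same pipeline can be sketched as below. It does what the sed command does (a plain textual replacement of LZO with NONE) and then feeds the result to hbase shell on stdin. disable_lzo and load_schema are hypothetical helper names, and the default hbase path is taken from the command above.

```python
import subprocess


def disable_lzo(schema_text):
    """Replace every occurrence of LZO with NONE, like sed 's/LZO/NONE/g'."""
    return schema_text.replace("LZO", "NONE")


def load_schema(schema_path, hbase_bin="/usr/bin/hbase"):
    """Pipe the rewritten schema into `hbase shell` (requires HBase installed)."""
    with open(schema_path) as f:
        script = disable_lzo(f.read())
    subprocess.run([hbase_bin, "shell"], input=script, text=True, check=True)
```

Calling load_schema("/home/socorro/dev/socorro/analysis/hbase_schema") would be the equivalent of the one-liner.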

Set up crashmover

Update two lines in scripts/config/collectorconfig.py:

localFS.default = '/home/socorro/primaryCrashStore'
fallbackFS.default = '/home/socorro/fallback'

Set those to directories in which you can store crash dumps.
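The directories need to exist and be writable by the user running the collector. Here is a small sketch that creates them if needed and complains if they are not writable; ensure_dirs is a hypothetical helper name, and the paths in the usage line are just the ones from my config above.

```python
import os


def ensure_dirs(paths):
    """Create each directory if missing and verify it is writable."""
    for path in paths:
        os.makedirs(path, exist_ok=True)
        if not os.access(path, os.W_OK):
            raise PermissionError("not writable: " + path)
```

For example: ensure_dirs(["/home/socorro/primaryCrashStore", "/home/socorro/fallback"]).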

Configure processor and monitor to use HBase

You need to set the processor up to use HBase instead of local crash storage.

The easiest way to do this is as follows:

PYTHONPATH=. python socorro/processor/processor_app.py --admin.conf=./config/processor.ini --source.crashstorage_class=socorro.external.hbase.crashstorage.HBaseCrashStorage --admin.dump_conf=config/processor2.ini
PYTHONPATH=. python socorro/processor/monitor_app.py --admin.conf=./config/monitor.ini --source.crashstorage_class=socorro.external.hbase.crashstorage.HBaseCrashStorage --admin.dump_conf=config/monitor2.ini

Then edit both generated files (config/processor2.ini and config/monitor2.ini) to reflect your HBase configuration.
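Since the two commands differ only in the app script and the config file names, they are easy to generate. The sketch below builds the argument lists and nothing more; dump_conf_command is a hypothetical helper, and the flags and class name are copied from the commands above.

```python
HBASE_STORAGE = "socorro.external.hbase.crashstorage.HBaseCrashStorage"


def dump_conf_command(app_script, conf_in, conf_out, storage_class=HBASE_STORAGE):
    """Build the argv list that dumps a new config using HBase crash storage."""
    return [
        "python", app_script,
        "--admin.conf=" + conf_in,
        "--source.crashstorage_class=" + storage_class,
        "--admin.dump_conf=" + conf_out,
    ]
```

To actually run one, pass the list to subprocess.run from the socorro checkout with PYTHONPATH set to ".", for example subprocess.run(cmd, env=dict(os.environ, PYTHONPATH="."), check=True).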

Starting up

The docs suggest starting up four daemons in screen sessions. I mocked up a shell script and a screenrc to get you started.

And that’s it! You should now have a working system, with crashes being submitted and stashed into HBase, and the monitor and processor picking up crashes as they arrive and running the stackwalk and exploitable tools against the crashes.

Please let me know if these instructions work, or don’t work, for you.