Setting up HBase for Socorro

Setting up HBase for use with Socorro is a bit of a bear! The default Vagrant config sets up a VM with filesystem-only crash storage. For those who want to try out the HBase support, or are on a path toward setting up a production instance, these instructions might help you along the way.

You may also be interested in Lars’ recent blog posts about Socorro:

Here’s how I got it all working on an Ubuntu Precise (12.04) system, along with some scripts for launching important processes and putting test crashes into the system so you can tell that it is working. Ultimately, my goal is to incorporate all of this into some setup scripts to help new users out.

Set up HBase and Thrift

Socorro uses the Thrift API to insert new crashes and retrieve them through the middleware layer. These Quickstart instructions are pretty helpful for getting HBase installed.

Then, you need to edit


and remove the ‘’ entry, and add your hostname to the localhost ‘’ line. Also, to match the default configs, it helps to add ‘crash-stats’ and ‘crash-reports’ as host aliases. Your final config line for localhost would look like:

      localhost wuzetian crash-reports crash-stats

(where wuzetian is your hostname)

You also need to add configuration for HBase. Here’s an example:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

That sets the location for your HBase files and for ZooKeeper. This setup is for testing, so I put the directories in a location I can easily clear out.
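As a rough sketch of what such a file can contain for a throwaway standalone setup: the snippet below prints a minimal hbase-site.xml that sets the two standard properties hbase.rootdir and hbase.zookeeper.property.dataDir. The helper name render_hbase_site and the /tmp/hbase-test directory are made up for illustration; treat the values as assumptions and adjust them for your machine.

```shell
# Hypothetical helper: print a minimal standalone hbase-site.xml to stdout.
# The single argument is a scratch directory that is easy to wipe between tests.
render_hbase_site() {
    data_dir="$1"
    cat <<EOF
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file://${data_dir}/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>${data_dir}/zookeeper</value>
  </property>
</configuration>
EOF
}

# Assumed scratch location; redirect the output into your HBase conf directory.
render_hbase_site /tmp/hbase-test
```

Printing to stdout rather than writing the file directly lets you look over the result before it replaces anything in your real configuration.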

Then, to start HBase and Thrift up:

/etc/init.d/hadoop-hbase-master start
/etc/init.d/hadoop-hbase-thrift start
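
A quick way to confirm that Thrift actually came up is to check whether anything is accepting connections on its port. This is only a sketch: port_open is a hypothetical helper, it relies on bash's /dev/tcp feature and the coreutils timeout command, and it assumes Thrift is on its usual default port of 9090.

```shell
# Hypothetical helper: succeed if something accepts TCP connections on host:port.
# Inside the bash -c script, $0 is the host and $1 is the port.
port_open() {
    timeout 2 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# 9090 is assumed here; change it if your Thrift server listens elsewhere.
if port_open localhost 9090; then
    echo "Thrift is listening"
else
    echo "Thrift is not reachable on localhost:9090"
fi
```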

Setting up processor tools

The processor that looks at raw crashes runs two tools by default: minidump_stackwalk and exploitable.

You can build these from the socorro source tree with:

make minidump_stackwalk

Then make install should put these files into a useful location.

You can also just copy the binaries: minidump_stackwalk is in the stackwalk/bin directory, and the other is exploitable/exploitable.

The paths for these are configured in config/processor.ini: exploitability_tool_pathname and minidump_stackwalk_pathname
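
Before starting the processor, it can save some head-scratching to confirm that both configured paths point at executable files. A minimal sketch follows; check_tool is a hypothetical helper, and the /usr/local/bin paths are examples only, not where make install necessarily puts things.

```shell
# Hypothetical helper: report whether a configured tool path is an executable file.
check_tool() {
    if [ -f "$1" ] && [ -x "$1" ]; then
        echo "ok: $1"
    else
        echo "missing or not executable: $1"
        return 1
    fi
}

# Example paths only; use whatever you set in config/processor.ini.
check_tool /usr/local/bin/minidump_stackwalk || true
check_tool /usr/local/bin/exploitable || true
```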

There’s also a symbols resolver configured, but I am not setting this up in my test.

Disable LZO compression for HBase (unless you have it configured)

Our HBase schema is configured to use LZO compression by default. Change that to ‘NONE’ and load the schema into HBase:

/bin/cat /home/socorro/dev/socorro/analysis/hbase_schema | sed 's/LZO/NONE/g' | /usr/bin/hbase shell
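
If you want to see what the substitution does before piping anything into the HBase shell, you can run the same sed expression on a sample line. The table definition below is a hypothetical example for illustration, not a line copied from analysis/hbase_schema, and disable_lzo is just a made-up wrapper name.

```shell
# Same substitution as the pipeline above, wrapped so it can be tried on its own.
disable_lzo() {
    sed 's/LZO/NONE/g'
}

# Hypothetical schema line; the real schema file will look different.
printf '%s\n' "create 'example_table', {NAME => 'raw', COMPRESSION => 'LZO'}" | disable_lzo
# prints: create 'example_table', {NAME => 'raw', COMPRESSION => 'NONE'}
```

Note that the expression replaces every occurrence of the string LZO, so it is worth eyeballing the output once before loading it.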

Set up crashmover

Update two lines in scripts/config/

localFS.default = '/home/socorro/primaryCrashStore'
fallbackFS.default = '/home/socorro/fallback'

Set those to directories where you can store crash dumps.
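
The directories need to exist and be writable by the user running the crashmover. A small sketch of creating and checking them, demonstrated under a scratch directory so it is safe to run anywhere; ensure_store_dir is a hypothetical helper, and in practice you would pass the two paths from the config.

```shell
# Hypothetical helper: create a crash store directory and confirm it is writable.
ensure_store_dir() {
    mkdir -p "$1" && [ -w "$1" ]
}

# Scratch location for demonstration; substitute your configured paths.
base="$(mktemp -d)"
ensure_store_dir "$base/primaryCrashStore" && echo "primary store ready"
ensure_store_dir "$base/fallback" && echo "fallback store ready"
```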

Configure processor and monitor to use HBase

You need to set the processor up to use HBase instead of local crash storage.

The easiest way to do this is as follows:

PYTHONPATH=. python socorro/processor/ --admin.conf=./config/processor.ini --source.crashstorage_class=socorro.external.hbase.crashstorage.HBaseCrashStorage --admin.dump_conf=config/processor2.ini
PYTHONPATH=. python socorro/processor/ --admin.conf=./config/monitor.ini --source.crashstorage_class=socorro.external.hbase.crashstorage.HBaseCrashStorage --admin.dump_conf=config/monitor2.ini

Then edit both files to reflect your HBase configuration.

Starting up

The docs suggest starting up four daemons in screen sessions. I mocked up a shell script and a screenrc to get you started.

And that’s it! You should now have a working system, with crashes being submitted and stashed into HBase, and the monitor and processor picking up crashes as they arrive and running the stackwalk and exploitable tools against the crashes.

Please let me know if these instructions work, or don’t work, for you.

Updates on my Lenovo X230 situation: Skype, screencap work; Vidyo not so much

Here was my wish list from before:

  • Camera working: Done! The trick was ‘uvcvideo‘, which I eventually built as a kernel module.
  • A Skitch replacement: Mostly done. I was given Shutter Project as a recommendation. I haven’t had a look at it yet. PrtSc actually takes pics of my visible desktop and I added a Firefox Addon called “Awesome Screenshot”. That solves my problems for now.
  • Vidyo working: Not working. I can now get video, and audio OUT, but I can’t hear other people. I need to dig into and troubleshoot this more. Skype, however, does work well. It does tend to flake out (slow video, loss of audio) far more on the Lenovo than on the Mac.
  • A package for my .bash_profile, .ssh and .gpg directories that I can install in any new system: Not done.
  • A better driver for the touchpad that doesn’t let my mouse jump around while I’m typing: Not done.
  • Change configuration to have the mouse behave like the latest OS X (reverse scrolling): Not done.

Overall, I feel much more comfortable on my Linux laptop now than my Mac. The mousing in particular is frustrating without buttons on the Mac.

I still switch back and forth because of Vidyo. I’m hoping in the next week or so to figure out what’s wrong with my audio and get it solved for good.

The nicest productivity improvements have been around test servers like HBase and Thrift, and being able to recompile my kernel at a moment’s notice for new features.

Abstract for PSU Tech Talk, Feb 1, 4pm

I’m doing a tech talk at PSU about open source community:

Collaborative chaos: what it means to write code, manage projects and work with people in open source communities

Working in software and with computers means wildly different things depending on who you talk to. In open source, the work spans every aspect of software development, from marketing and documentation to troubleshooting end-user systems.

The “community manager” or “organizer” role in open source communities is probably the least well-defined in our industry, but is seen as a crucial part of open source software development.

Selena will talk about her work as a serial user group starter, open source conference circuit speaker, conference organizer and contributor to PostgreSQL — all roles considered part of community management. She’ll also talk about other kinds of community management roles available at small and large companies, or as a volunteer in an open source project. 

Selena is a major contributor to PostgreSQL, founded and runs the Postgres Open conference, and keeps chickens. Selena has been working with open source software for over 15 years. She’s keynoted at SCALE, DjangoCon and LISA, and regularly gives technical talks about Postgres, open source and trolling. She is currently a data architect at Mozilla, makers of the Firefox browser.

Current status: little victories

I’ve got a lot going on right now.

Nothing feels momentous about any particular thing. I’m trying a lot of new ideas and work, struggling, failing and trying again. The transition from the last couple of years of insane travel and starting a business to development work and staying closer to home has been a very good one.


For those that have asked about my work status recently:

Mozilla is great. You can see a lot of the work I do in the Socorro commit feed. Or, in my bug feed. I hang out in #breakpad, #db and a few other channels on And I’m going to give a talk about Postgres and Backups on February 6th, based on the research I’ve been doing into open source solutions for binary backups.

It’s wonderful to be working in public. I love how much time I have to write software and think about database architecture. I’ve been digging out of a backlog of application and DBA-related work and just coming up to speed on Socorro for a couple months, and that’s starting to pay off.

It’s also wonderful to have coworkers, working on the same things. Most of my work life has been solitary, both in physical proximity and the work itself. Now, all my code is reviewed and I work closely with developers and engineers, daily, on everything.


I’ve been organizing PyLadies meetups with Flora Worley and a few others. We now have more than 60 people who have joined the Meetup, and over 20 women show up to every workshop and hackathon. It feels quite unreal to have 20 women I didn’t know a month ago showing up, forking repos and sending me commits every day. I ask newcomers to send me a commit that links them to our github landing page.

Travel/Speaking in 2013

I’m giving a talk at Portland State University on Feb 1. I’ll be in Mountain View Feb 4-8.

I’m confirmed to be speaking at PyCon March 16 about K-12 teachers and what we in the open source community can do to help them.

I’ll be speaking at a conference in Taiwan in April, and another in the US in May.

Recent talks

My most recent talk was a plenary session at LISA 2012, a USENIX conference in San Diego. It was about the false dichotomy of Education vs Training, and what we can do to improve education of sysadmins. Specifically, I gave shout outs to!


So many other little things are going on. I restarted my sourdough and I’m reorganizing my house, one room at a time. We’re remodeling bits of the basement. We replaced a terrible light fixture in the house, and got an ESPN subscription with cable (which I love and hate at the same time). I’m reading and re-reading some lovely science fiction, at a pace of about 2 books a week. I’m walking more, catching up with family and planning things all the way into 2014.

I’m saying “no” a lot recently to doing more things, volunteering for conferences, and travel. Which is hard.

Of all the stuff I’m working on right now, PyLadies is the hardest and the most rewarding. So, I’m making space in my life for that, for the little bits of teaching I get to do, and for connecting more women with each other and the open source communities that I love.