Weekly Feminist Work in Tech by Mozillians roundup – Week of March 3, 2014

We have a ton of individual work done by MoFo and MoCo employees related to feminism, feminist activism and the larger technology community. So much is happening, I can barely keep track!

I’ve reached out to a few people I work with to get some highlights and spread the word about interesting projects we’re all working on. If you are a Mozillian and occasionally or regularly work on feminist issues in the tech community, please let me know! My plan is to ping people every Friday morning and post a blog post about what’s happened in the last week.

Without further ado:

Dispatch from me, Selena Deckelmann:

  • I’m presenting at SF Github HQ on Thurs March 13, 7pm as part of the Passion Projects series (Julie Horvath’s project). I’ll be talking about teaching beginners how to code and contribute to open source, specifically through my work with PyLadies. I’m giving a similar talk this afternoon at Portland State University to their chapter of the ACM.
  • Just wrapped up a Git workshop for PyLadiesPDX and am gearing up for a test-run of a “make a Flask blog in 80-lines of code” workshop! Course materials are available here for “intro to git” workshops.
  • Lukas, Liz, me and others (I’m not sure who all else!!) are coordinating a Geekfeminism and feminist hackerspace meetup at PyCon 2014. The details aren’t published yet, so stay tuned!
  • PyLadies PyCon 2014 lunch is happening again!
  • PyLadies will also be holding a Mani-Pedi party just like in 2013. Stay tuned for details!
  • Brownbags for the most recent GNOME Outreach Program for Women contributors are scheduled for next Friday March 14, 10am and 2pm. (thanks Larissa!!) Tune in at http://air.mozilla.com. One of the GNOME Outreach Program for Women contributors is Jennie Rose Halperin, and another is Sabina Brown.

Dispatch from Liz Henry:

  • I’m doing a lot of work to support Double Union feminist hackerspace, a nonprofit in San Francisco. We are hosting tech and arts workshops, and establishing connections with other hackerspaces in the US and around the world. Lukas is also involved with this effort! We have over 100 members now using the space.
  • For PyCon I would like to host fairly informal sessions in our Feminist Hacker Lounge, on QA, bug triaging, and running/writing WebQA automated tests with pytest and selenium.
  • I’m hoping to have funding for an OPW intern for this upcoming round to work on the back end of a QA community facilitating tool, using Python and various APIs for Mozilla tools like Bugzilla, Moztrap, and the Mozillians profiles.

Dispatch from Lukas Blakk:

  • Just held the Lesbians Who Tech hackathon at the Mozilla SF space and it was an amazing weekend of networking, recruiting for Mozilla, doing a stump speech on the radical/political possibilities of open source, and also just a lot of social fun.
  • I’m nearing the point of Project Kick Off for The Ascend Project which will be a 6 week training course for underrepresented in current tech mainstream (and underemployed/unpaid) persons who will learn how to write automatable tests for MozMill. This first one will take place at the Portland office in Sept/Oct 2014 (Starts on Sept 8th). There’s so much more here, but this is just a sound bite.
  • I’m trying to determine what budget I can get agreement on to put towards women in tech outreach this year.
  • PyCon – yes! Such Feminist, So Hackerspace, Much gathering of geek feminists!

Dispatch from Larissa Shapiro:

  • OPW wrapup and next session – we’re wrapping up the current round, scheduling brownbags for two of the current interns, etc. Funding is nearly secured for the next round and we have like 6 willing mentors. w00t.
  • I’m also providing space for/speaking at an upcoming event in the Mountain View office: last year’s African Techwomen emerging leaders were part of a documentary and the Diaspora African Women’s Network is holding a screening and a planning session for how to support next year’s ELs and other African and African-American bay area women in tech both through this and other projects, March 29. Open to Mozilla folks, let me know if you’re interested.

Anything else that’s come up in the last week, or that you’d like Mozillians to know about? Let me know in the comments!

Printing flashcards on 3×5 index cards

I’m making a few sets of these for a meetup tomorrow night:

Camera_2_18_14_4_58_PM

Camera_2_18_14_4_57_PM-4

In the past, I’d printed out text from a spreadsheet tool and then used a lot of tape and scissors to make the magic happen.

I wanted to step up my game a little and print things directly onto index cards. Printing is a little tricky on most printers because you won’t be able to print double-sided unless you have a very fancy printer. I have a Cannon MG6220 (mostly for printing photos).

Here’s how to set up a word processing app to print out cards (I used Pages):

  1. Configure the page layout to be 3″ x 5″. Here’s the Page Setup dialog box for Pages: Screenshot 2014-02-18 17.18.04
  2. Add your flashcards to the document! Add the cards with the front and back of each flashcard as single, adjacent pages. A sample PDF of what this looks like is here.
  3. To make the printing work, I set up 2 printing presets – one called ‘front flashcards’ and the other called ‘back flashcards’. The ‘front’ is set up to print odd pages only, and ‘back’ is set up to print even pages only and in reverse order.
  4. Print the front side of the cards using your ‘front flashcards’ preset.
  5. Flip the stack of cards over, face down, and put them back into the paper feeder (YMMV with this in the event that your printer is set up differently than mine).
  6. Print the other side of the flashcards using your ‘back flashcards’ preset.

That’s it!

PyLadies Meeting notes from “Negotiating the job market: a panel discussion”

Flora Worley organized a fantastic PyLadies PDX meeting called “Negotiating the job market: a panel discussion“.

The meeting was organized in three parts:

  1. Experiences (good, bad and ugly) from four women who entered the software industry in the last 1-2 years.
  2. Managing expectations and setting boundaries from three people, two recent entrants into the industry, and me.
  3. Negotiating the application/interview/offers process (which we turned into a group discussion, led by one panelist)

We kicked things off by asking people to get into groups of four and talk to each other about why they came and what they were hoping to get out of the meeting.

Some of the comments from the meeting and feedback after included:

  • On what was good prep for interviewing: Attending PyLadies and Python User Group meetups to learn new skills, hear about what modules and techniques people are using. (from Amy Boyle, a local developer)
  • Attending PyLadies helped fill in gaps in knowledge useful as a working programmer, even after having a CS degree.
  • “I love being a Pylady, and if it weren’t for this community, I honestly don’t know that I would have continued learning to code.”

Below are my notes from the first panel, anonymized and edited a bit.

How did you find your first job in the industry and know it was the right place for you?

  • I knew a founder of the company from college
  • Knew someone and they invited me to apply. Wrote a great cover letter and got an interview even though they thought I didn’t have the skills for the exact job I applied for.
  • Got the job by going to talks and staying and talking to the speakers.
  • Decided I was more interested in data than my major! Looked around and found a company that was doing a lot with data.

At what point do you say you “know” a programming language?

  • I shy away from saying I “know” something — seems presumptuous to say that the same way it seems weird to say “I’m a writer.” If you’re getting paid to do a thing, though, then you get to call yourself the “thinger”. Coworker has been asking me for help with python and I know the answers to his questions so…
  • Finding ways to help others with things is a good way to boost your confidence.
  • Once I give a talk about something, I have to say I know how to do it because people will come to me for help. It’s a way to force yourself to cram, find out what you know and really don’t know. On my resume I don’t say “I know” — I say “I have experience with XYZ; I have managed to learn these things to get the job done.” Most technical interviews ask more general questions.. not exact syntax of a language.
  • Interviews seem to be trying to figure out if you can learn whatever it happens to be that they need you for at the job.
  • We have to support many languages. So many languages at the same time can get overwhelming.
  • I don’t say I’m an IOS programmer, but I help people write and improve their IOS code all the time.

What was your interview like?

  • Shared projects I made. We just went through my github repo. Tell me about this project, what did you do, what did you use.
  • Was interviewed over Skype
  • Interviewed with 4 people in a marathon. None of those interviews were super technical.
  • The interviews seemed to be gauging whether you would be able to just talk about whatever comes up.
  • They asked me about a lot of command line skills and brought lots of people in from different parts of the company.
  • The interview had me sit down with a program — there’s a bug in this program solve it. What I learned is what matters is the process of how you work on bugs — and being able to communicate that while you’re doing it, not that you actually fix it.
  • Most valued skill is resilience rather than a technical background. You can learn, compensate, fill in gaps. Need willingness to learn, capacity to be frustrated/despair and just move on. People don’t want to hear you freaking out, just want you to do it. Even if it takes you a long time.

Were there classes or resources that prepared you for interviewing?

  • Go out and make something for yourself. Nothing better than finding a thing you want to make, and then making it. Going back to old code, really fulfilling to see progress and wanting to make old code better. You have something to share!
  • Coming to meetings like PyLadies and the Python User Groups. I took some CS classes, but there was lots that I missed. You don’t realize how much you don’t know! Get to meetups to hear about what others are using, what is current in the field. Having a sense of with what the latest stuff is, what are good blogs, best practices, helps out. (seconded by another panelist)
  • Very helpful to go to PyLadies and talk with people around my same experience level. In conversation, people would explain stuff to you in a way that made sense, not just a bunch of jargon.
  • Learning that ecosystem and what tools people use is huge. Helps you figure out what a job entails and what do you need to know to build it.
  • For most of the stuff you’ll build, you’re going to use other libraries. It helps having experience mushing things together. Iterate on it. Every little thing you learn, you’ll find ways of improving stuff.

What if your coding style is really different than other people’s? How do you handle that?

  • Get good at giving constructive feedback
  • “days of spaghetti code is behind us” at the mercy of other people’s crappy code — tech companies deal with this a lot less.
  • Don’t be afraid to say “no idea what this function ” learned a lot about better code by reviewing a lot of code

What kinds of positions can you get? And where do you want to go?

  • Really overwhelming to learn everything I needed to know to support product. It’s an all guys team, learned to love them all, took on a mother role.
  • Teaching stuff all guy team can be intimidating, but most of the guys I work with are college educated CS guys, me having no CS background was intimidating until I realized I knew and could learn this stuff as quickly as they could. No CS didn’t matter.
  • I’m doing tech support, wish I could do more coding. It’s a mixed bag doig support, you learn a lot about system. Found that management wanted everyone to level up in tech support and not go to other parts of the company. I’ve had to come to terms with managers trying to keep me in that role. Need to see how long they expect you to stay in that role before you can move on — ask up front.
  • Had similar experience — resistance to moving from tech support into other roles. Most people in support are looking to move into another role. Make sure you’re on the same page with manager rather than be surprised by it later.
  • On working with guys: my communication style was not really effective when I started. Sometimes had customers that said “can I talk to tech support?” “can I talk to an engineer?”. Had to learn how to be kind of cocky — “This is how this works.” If I was wrong, I was wrong, and had to fix it but that was much more effective at communicating with customers and coworkers.
  • Had to learn that nothing wrong with asking for help. Saves a lot of time.
  • Everything takes me longer than I think it will. Triple your estimates!
  • I needed to learn its ok to ask questions, ask questions confidently rather than despair of never being a programmer. We all tell ourselves crap like: “i’m just generally stupid and can’t learn anything” or “can’t learn javascript in a day, so clearly i’m an idiot”. But turning that around is important for yourself.
  • Be dilligent in the way you ask questions: 1) first google it, stack overflow, check reference book; 2) write out the question fully (rubber ducking) — articulate the problem fully and you might solve it for yourself — what did i expect, what happened, what else can i expect….
  • I’ve had people say to me: “I don’t know why pyladies needs to exist.” I said: “I think its nice its a safe place where that you can articulate your questions without fear.” I don’t know what to do about it other than say that.
  • Men might be made clear to you that they believe we are post-sexist, post-racist. If you have too many of those people around, find a new job.
  • “you need to be careful about where you are” when expressing opinions about feminism.
  • Be open and honest, and they tend to understand.
  • Sometimes it may be the case you have to figure out if an opinion is being expressed because of privilege or malice. If privilege, it can be worked on.
  • Try to internalizing confidence. Tell yourself: “I know how to program” more. Once I really believed that, my programming got better. Because I know I will solve the problem I’m working on!

Do you find that as a woman you communicate differently and you are interpreted differently?

  • Here’s an example: when someone says: “Can you pass the salt?” One person might respond by thinking, “I understand you’re being polite and phrasing that request as a question rather than a command. Sure, here’s the salt!” Or another person might respond with, “I could pass the salt to you. Is that actually what you want me to do?” Pretty different communication styles.
  • Try to be flexible and ride out conflict.
  • I question myself a lot — did i interrupt too many times, was i too aggressive. I worry I am too aggressive.
  • Be yourself. Take time to figure out if you are happy or not.
  • Try to compromise and avoid “truth bombs” where you “explain how the world is”. Take conversations case by case and try not to take it personally.
  • If it gets to be too much, get out of there!

Everyday Postgres: Describing an “ideal” Postgres Operational Environment

I spent some time thinking about what things in the Postgres environment (and specifically for crash-stats.mozilla.com) make me happy, and which things bother me so much that I feel like something is pretty wrong until they are fixed or monitored.

Here’s what I came up with:

I’m planning to go through each of these items and talk about how we address them in the Web Engineering team, and that will include implementing some new things over the next couple of quarters that we haven’t had in the past.

One thing that didn’t surprise me about this list was how much documentation is needed to keep environments running smoothly. By smoothly, I mean that other people on the team can jump in and fix things, not just a single domain expert.

Sometimes docs come in the form of scripts or code. However, some prose and explanation of the thinking behind the way things works is often also necessary. I frequently underestimate how much domain knowledge I have that I really aught to be sharing for the sake of my team.

Everyday Postgres: Specifying all your INSERT columns

Postgres has so many convenient features, including the ability to not provide a list of columns to an INSERT.

For example:

CREATE TABLE temp_product_versions ( LIKE product_versions );
INSERT INTO temp_product_versions ( SELECT * from product_versions ); 

That’s pretty badass.

However, you may encounter trouble in paradise later if you use this kind of shortcut in production code.

See if you can spot the error in this code sample below.

Here’s the error message:

ERROR:  column "is_rapid_beta" is of type boolean but expression is of type citext
LINE 10:     repository
             ^
HINT:  You will need to rewrite or cast the expression.
QUERY:  INSERT INTO releases_recent
SELECT 'MetroFirefox',
    version,
    beta_number,
    build_id
    update_channel,
    platform,
    is_rapid,
    is_rapid_beta,
    repository
FROM releases_recent
    JOIN products
        ON products.product_name = 'MetroFirefox'
WHERE releases_recent.product_name = 'Firefox'
    AND major_version_sort(releases_recent.version)
        >= major_version_sort(products.rapid_release_version)
CONTEXT:  PL/pgSQL function update_product_versions(integer) line 102 at SQL statement

And here’s the code (long!)


I’m sure quite a few of you found the problem right away. For the rest of us…

Here’s the error message you get if you specify the columns for the INSERT:

ERROR:  INSERT has more target columns than expressions
LINE 10:     repository
             ^
QUERY:  INSERT INTO releases_recent (
    product_name,
    version,
    beta_number,
    build_id,
    update_channel,
    platform,
    is_rapid,
    is_rapid_beta,
    repository
)
SELECT 'MetroFirefox',
    version,
    beta_number,
    build_id
    update_channel,
    platform,
    is_rapid,
    is_rapid_beta,
    repository
FROM releases_recent
    JOIN products
        ON products.product_name = 'MetroFirefox'
WHERE releases_recent.product_name = 'Firefox'
    AND major_version_sort(releases_recent.version)
        >= major_version_sort(products.rapid_release_version)
CONTEXT:  PL/pgSQL function update_product_versions(integer) line 112 at SQL statement

Now, it should be completely obvious. There’s a missing comma after build_id.

Implicit columns for INSERT are a convenient feature when you’re getting work done quickly, they are definitely not a best practice when writing production code. If you know of a linting tool for plpgsql that calls this kind of thing out, I’d love to hear about it and use it.

My nerd story: it ran in the family, but wasn’t inevitable

This is about how I came to identify as a hacker. It was inspired by Crystal Beasley’s post. This is unfortunately also related to recent sexist comments from a Silicon Valley VC about the lack of women hackers. I won’t bother to hate-link, as others have covered his statements fully elsewhere.

I’ve written and talked about my path into the tech industry before, and my thoughts about how to get more women involved. But I didn’t really ever start the story where it probably should have been started: in my grandfather’s back yard.

grandpa-our wedding

I spent the first few years of my life in Libby, MT. Home of the Libby Dam, an asbestos mine and loggers. My grandfather, Bob, was a TV repairman in this very remote part of Montana. He was also a bit of a packrat.

I can still picture the backyard in my mind — a warren of pathways through busted up TVs, circuit boards, radios, transistors, metal scrap, wood and hundreds of discarded appliances that Grandpa would find broken and eventually would fix.

His garage was similarly cramped — filled with baby jars, coffee cans and bizarre containers of electronic stuff, carefully sorted. Grandpa was a Ham, so was my uncle and Grandma. I don’t remember exactly when it happened, but at some point my uncle taught me Morse code. I can remember writing notes full of dots and dashes and being incredibly excited to learn a code and to have the ability to write secret messages.

I remember soldering irons and magnifying lens attachments to glasses. We had welding equipment and so many tools. And tons of repaired gadgets, rescued from people who thought they were dead for good.

Later, we had a 286 and then a 386 in the house. KayPro, I think, was the model. I’d take off the case and peer at the dust bunnies and giant motherboard of that computer. I had no idea what the parts were back then, but it looked interesting, a lot like the TV boards I’d seen in piles when I was little.

I never really experimented with software or hardware and computers as a kid. I was an avid user of software. Which, is something of a prelude to my first 10 years of my professional life as a sysadmin. I’m a programmer now, but troubleshooting and dissecting other people’s software problems still feels the most natural.

Every floppy disk we had was explored. I played Dig Dug and Mad Libs on an Apple IIe like a champ. I mastered PrintShop and 8-in-1, the Office-equivalent suite we had at the time. And if something went wrong with our daisy wheel printer, I was on it – troubleshooting paper jams, ribbon outages and stuck keys.

My stepdad was a welder, and he tried to get me interested in mechanical things (learning how to change the oil in my car was a point of argument in our family). But, to be honest, I really wasn’t that into it beyond fixing paper jams so that I could get a huge banner printed out.

I was good at pretty much all subjects in school apart from spelling and gym class. I LOVED to read — spending all my spare time at the local library for many years, and then consuming books (sometimes two a day) in high school. My focus was: get excellent grades, get a scholarship, get out of Montana.

And there were obstacles. We moved around pretty often, and my new schools didn’t always believe that I’d taken certain classes, or that grades I’d gotten were legit. I had school counselors tell me that I shouldn’t take math unless I planned “to become a mathematician.” I was required to double up on math classes to “make up” algebra and pre-algebra I’d taken one and two years earlier. I gave and got a lot of shit from older kids in classes because I was immature and a smartass. I got beat up on buses, again because I was a smartass and had a limited sense of self-preservation.

The two kids I knew who were into computers in high school were really, really nerdy. I was awkward, in Orchestra, and didn’t wear the kind of cool girl clothes you need to make it socially. I wasn’t exactly at the bottom of the social heap, but I was pretty close for most of high school. And avoiding those kids who hung out after school in the computer lab was something I knew to do to save myself social torture.

Every time I hear people say that all we need to do is offer computer science classes to get more girls coding, I remember myself at that age, and exactly what I knew would happen to me socially if I openly showed an interest in computers.

I lucked out my first year in college. I met a guy in my dorm who introduced me to HTML and the web in 1994. He spent hours telling me story after story about kids hacking hotel software and stealing long distance. He introduced me to Phrack and 2600. He and his friends helped me build my first computer. I remember friends saying stuff to me like, “You really did that?” at the time. Putting things together seemed perfectly natural and fun, given my childhood spent around family doing exactly the same thing.

It took two more years before I decided that I wanted to learn how to program, and that I maybe wanted to get a job doing computer-related work after college. What I do now has everything to do with those first few months in 1994 when I just couldn’t tear myself away from the early web, or the friends I made who all did the same thing.

Jen posted an awesome chart of her nerd story. I made one sorta like it, but couldn’t manage to make it as terse.

Whether it was to try and pursue a music performance career after my violin teacher encouraged me, starting out as a chemistry major after a favorite science teacher in high school told me I should, or moving into computer science after several years of loving, mischievous fun with a little band of hackers; what made the difference in each of these major life decisions was mentorship and guidance from people I cared about.

Maybe that’s a no-brainer to people reading this. I say it because sometimes people think that we come to our decisions about what we do with our lives in a vacuum. That destiny or natural affinity are mostly at work. I’m one anecdata point against destiny being at work, and I have heard lots of stories like mine. Especially from new PyLadies.

Not everyone is like me, but I think plenty of people are. And those people who are like me — who could have picked one of many careers, who like computers but weren’t in love or obsessed with them at a young age — could really use role models, fun projects and social environments that are low-risk in middle school, high school and college. And as adults.

Making that happen isn’t easy, but it’s worth sharing our enthusiasm and creating spaces safe for all kinds of people to explore geeky stuff.

Thanks to an early childhood enjoyment of electronics and the thoughtfulness of a few young men in 1994, I became the open source hacker I am today.

Everyday Postgres: INSERT with SELECT

This is a continuation of a series of posts about how I use Postgres everyday.

One of the most pleasant aspects of working with Postgres is coming across features that save me lots of typing. Whenever I see repetitive SQL queries, I now tend to assume there is a feature available that will help me out.

One such feature is INSERT using a SELECT, and beyond that, using the output of a SELECT statement in place of VALUES.

Take for example:

INSERT into foo_bar (foo_id, bar_id) VALUES ((select id from foo where name = 'selena'), (select id from bar where type = 'name'));    
INSERT into foo_bar (foo_id, bar_id) VALUES ((select id from foo where name = 'funny'), (select id from bar where type = 'name'));
INSERT into foo_bar (foo_id, bar_id) VALUES ((select id from foo where name = 'chip'), (select id from bar where type = 'name'));

I think a lot of people know that this is possible. There are a few problems with it – like if the result of the WHERE clause isn’t unique in both cases, you’d get an error. In this case, id in both tables were surrogate keys, with both name and type being unique.

What some people don’t realize is that you can SELECT, and then directly insert that into a table:

INSERT into foo_bar (foo_id, bar_id) ( 
  SELECT foo.id, bar.id FROM foo CROSS JOIN bar 
    WHERE type = 'name' AND name IN ('selena', 'funny', 'chip') 
);

If the values you wanted to take from the table bar were not all the same, the query would be considerably more complex. Given that I only am interested in a single value from bar, and I want it joined with a series of explicitly selected values from foo, this version of the query saves me a lot of typing.

The bigger picture, however, was pointed out in the comments by Marko:

VALUES is just a special type of SELECT and that INSERT writes the
result of an arbitrary SELECT statement into the table. Consider:

SELECT 1; vs. VALUES (1);

SELECT * FROM (SELECT 1) sq; vs. SELECT * FROM (VALUES (1)) sq;

INSERT INTO quix VALUES (1); vs. INSERT INTO quix SELECT 1;

The reason VALUES is often used with INSERT is that many RDMBSs don’t
support SELECT without a FROM clause, so using VALUES is more
convenient. It’s also handy if you have a list of data you want to
SELECT, e.g. VALUES (..), (..), (..);

I may have referenced this feature a few times when breaking down functions used for reports in Socorro. It’s super convenient and saves quite a bit of typing! You can put any valid SQL query in there, including CTEs. The documentation for INSERT provides a few more examples.

Everyday Postgres: Tuning a brand-new server (the 10-minute edition)

Server tuning is a topic that consumes many books, blog posts and wiki pages.

Below is some practical advice for getting low-hanging fruit out of the way if you’re new to tuning Postgres and just want something that will likely work well-enough on low volume systems. I’d say looking at this list and making changes on a new system should take 10 minutes or less.

Run pgtune

Greg Smith open sourced a utility for making a first pass at tuning Postgres for a local system with pgtune. This tool is easy to run – just copy it to a target system and then point it at your existing Postgres config. It puts its changes into a new file at the very bottom.

Use XFS

Filesystem choice matters. Greg Smith goes into some detail on why ext3 is a terrible performance choice for a database filesystem in his talk Righting Your Writes. At this point, XFS is the filesystem that should be your default choice. If you want to explore ext4 or zfs (if that’s an option for you), that may be worth looking at. It is “safe” however to choose XFS. Depending on your disk situation, recreating your filesystem might take a bit longer than 10 minutes, but hopefully this will save you time and bad performance in the future!

Increase your readahead buffer

On Linux, the readahead buffer (brief explanation) is set way to small for most database systems. Increase this to about 1 MB with blockdev -setra 2048 [device].

For further performance analysis

I wrote this performance checklist a while back for assessing a system’s health. I’d say a review of all the things on that list would take probably half a day. Following up and making the changes could take a day or more. These kinds of analysis are worth exploring periodically to ensure you haven’t missed important changes in your environment or your application over time.

Everyday Postgres: How I write queries using psql: Common Table Expressions

This this series of posts about using Postgres every day. The last post was about \ commands in psql.

I’m now going to share in a series of posts my workflow for writing queries, and some of the things about working with Postgres that I take for granted in writing queries.

Shortcuts I can’t live without

Three important shortcuts you should learn are:

  • \e: Pulls the last query you executed into a buffer in your favorite editor
  • \df+ [function]: This displays [function] information, and the + dumps the function itself to STDOUT
  • \ef [function]: This pulls [function] into a buffer in your favorite editor. This is the most convenient way to grab a copy of an individual function for me.
  • \ef: This opens your favorite editor and puts a template for a function (in any supported procedural language) in a buffer

I’ll talk about writing functions in a future post.

Thinking in CTEs

In searching through my recent psql history, I found quite a few WITH queries. These are Common Table Expressions, a useful feature supported by many databases that allows you to embed subqueries in your SQL in a very readable format. CTEs have a lot more interesting features and properties, like RECURSIVE.

However, I tend to just use CTEs as a more convenient form of a subquery. This allows me to break apart long queries into smaller, testable chunks. I usually will write a subquery so that it’s in my command history, generate some fake data for testing, and go back to that query in my history to test edge cases.

I iterate on the smaller tables until I have a set of understandable “paragraphs” of SQL. This makes it easier for me to explain the logic of the query with others, and makes testing each piece easier in the event that something breaks. Usually, when a CTE breaks, I’ve made an assumption about incoming data that’s incorrect.

The composability of SQL is often terrible. CTEs help break apart the complexity visually. There’s some warnings about CTEs not performing well under certain circumstances. My approach is to design with CTEs and optimize for performance only if needed.

Other advantages of CTEs

In case you’re not yet convinced CTEs are worth learning, I made a bullet list of advantages from some useful comments about how others are using CTEs:

  • Alternative to throwaway VIEWs and temporary tables when querying replicas (comment from bma)
  • Variable declaration – to emulate DECLARE in SQL Server, for example
  • Easier to understand queries and faster development time (ME)

An example of the kinds of queries I write

Something you’d see a lot in my command history are queries that look like this:

WITH crashes AS (                                                               
    SELECT uptime_string AS category                                                      
        , sum(report_count) AS report_count                                     
    FROM signature_summary_uptime                                               
        JOIN signatures USING (signature_id)                                       
    WHERE                                                                           
        signatures.signature = 'Fake Signature #1'                                             
        AND report_date >= '2013-08-05T00:00:00+00:00'::timestamptz             
        AND report_date < '2013-08-12T00:00:00+00:00'::timestamptz              
        AND product_name IN ('Firefox')  AND version_string IN ('1')            
    GROUP BY category                                                           
),                                                                              
totals AS (                                                                     
    SELECT                                                                      
        category                                                                
        , report_count                                                          
        , sum(report_count) OVER () as total_count                              
    FROM crashes                                                                
)                                                                               
SELECT category                                                                 
    , report_count                                                              
    , round((report_count * 100::numeric)/total_count,3)::TEXT                  
as percentage                                                                   
FROM totals                                                                     
ORDER BY report_count DESC                                                      
;

You’ll see that I have one or more WITH clauses, and then a query that performs a final summary query using the data from the CTEs.

This query probably was asked for something like this:

Please provide counts of crashes with the same uptime, for Firefox version 1, and the signature ‘Fake Signature #1’ for the last week, including a percentage of all of the sampled crashes.

While I’m sure there are better ways to write the query above, I wanted to show how I have made a pattern for myself to speed up query writing. I’m not always interested in the best possible query. Hopefully, the Postgres planner makes up for many of my sins as a developer!

What I am interested in is finding answers to problems quickly for my coworkers.

In answering the question I was asked, I first dig out an appropriate summary table (we have quite a few in Socorro). I found the signature_summary_uptime table, and fortunately it has product_name and version_string available in the table. I only need to join signatures to fulfill the request. (Yay for denormalized data that supports the kinds of queries we often run!)

Next, I see that I’m being asked for a total percentage, so I need to calculate a sum across all the rows that I retrieve. That can be very slow, so I create a second CTE that uses data from the first CTE (rather than doing two full table scans to calculate the total). I use a window function instead of SUM() here because I’ve done experiments to see which tends to be faster.

And, finally once I have all the data together, I run my final query using my two CTE tables.

How CTEs and breaking down this process have helped me

So, I’ve had about a year to practice. A query like this today takes me 10-15 minutes to assemble and test. They are typically slightly more complex — with more dependencies, and maybe 2-3 more tables involved in JOINs. But they follow the same basic pattern.

Most queries on my data sets conform to recognizable patterns.

After a few months, we recognized that moving JSON for crash data into Postgres also would be a win, and was easy to process using very similar queries.

That’s all helped make finding answers about Firefox crashes easier and faster!