{"id":5068,"date":"2013-11-12T08:46:34","date_gmt":"2013-11-12T16:46:34","guid":{"rendered":"http:\/\/www.chesnok.com\/daily\/?p=5068"},"modified":"2013-11-13T13:00:19","modified_gmt":"2013-11-13T21:00:19","slug":"how-i-write-queries-using-psql-ctes","status":"publish","type":"post","link":"https:\/\/www.chesnok.com\/daily\/2013\/11\/12\/how-i-write-queries-using-psql-ctes\/","title":{"rendered":"Everyday Postgres: How I write queries using psql: Common Table Expressions"},"content":{"rendered":"<p>This this series of posts about using Postgres every day. The last post was about <a href=\"http:\/\/www.chesnok.com\/daily\/2013\/11\/06\/top-10-psql-commands-i-use\/\"><code>\\<\/code> commands in psql<\/a>.<\/p>\n<p>I&#8217;m now going to share in a series of posts my workflow for writing queries, and some of the things about working with Postgres that I take for granted in writing queries.<\/p>\n<h2>Shortcuts I can&#8217;t live without<\/h2>\n<p>Three important shortcuts you should learn are:<\/p>\n<ul>\n<li><code>\\e<\/code>: Pulls the last query you executed into a buffer in your favorite editor <\/li>\n<li><code>\\df+ [function]<\/code>: This displays <code>[function]<\/code> information, and the <code>+<\/code> dumps the function itself to STDOUT<\/li>\n<li><code>\\ef [function]<\/code>: This pulls <code>[function]<\/code> into a buffer in your favorite editor. This is the most convenient way to grab a copy of an individual function for me.<\/li>\n<li><code>\\ef<\/code>: This opens your favorite editor and puts a template for a function (in any supported procedural language) in a buffer<\/li>\n<\/ul>\n<p>I&#8217;ll talk about writing functions in a future post.<\/p>\n<h2>Thinking in CTEs<\/h2>\n<p>In searching through my recent <code>psql<\/code> history, I found quite a few <code>WITH<\/code> queries. These are <a href=\"http:\/\/www.postgresql.org\/docs\/current\/static\/queries-with.html\">Common Table Expressions<\/a>, a useful feature supported by many databases that allows you to embed subqueries in your SQL in a very readable format. CTEs have a lot more interesting features and properties, like <code>RECURSIVE<\/code>.<\/p>\n<p>However, I tend to just use CTEs as a more convenient form of a <strong>subquery<\/strong>. This allows me to break apart long queries into smaller, testable chunks. I usually will write a subquery so that it&#8217;s in my command history, generate some fake data for testing, and go back to that query in my history to test edge cases.<\/p>\n<p>I iterate on the smaller tables until I have a set of understandable &#8220;paragraphs&#8221; of SQL. This makes it easier for me to explain the logic of the query with others, and makes testing each piece easier in the event that something breaks. Usually, when a CTE breaks, I&#8217;ve made an assumption about incoming data that&#8217;s incorrect.<\/p>\n<p>The composability of SQL is often terrible. CTEs help break apart the complexity visually. There&#8217;s some warnings about CTEs not performing well under certain circumstances. My approach is to design with CTEs and optimize for performance only if needed.<\/p>\n<h2>Other advantages of CTEs<\/h2>\n<p>In case you&#8217;re not yet convinced CTEs are worth learning, I made a bullet list of advantages from some useful comments about how others are using CTEs:<\/p>\n<ul>\n<li>Alternative to throwaway VIEWs and temporary tables when querying replicas (<a href=\"http:\/\/www.chesnok.com\/daily\/2013\/11\/12\/how-i-write-queries-using-psql-ctes\/comment-page-1\/#comment-1121281986\">comment from bma<\/a>)<\/li>\n<li><a href=\"http:\/\/www.chesnok.com\/daily\/2013\/11\/12\/how-i-write-queries-using-psql-ctes\/comment-page-1\/#comment-1120950143\">Variable declaration<\/a> &#8211; to emulate DECLARE in SQL Server, for example<\/li>\n<li>Easier to understand queries and faster development time (ME)<\/li>\n<\/ul>\n<h2>An example of the kinds of queries I write<\/h2>\n<p>Something you&#8217;d see a lot in my command history are queries that look like this:<\/p>\n<pre><code>WITH crashes AS (                                                               \n    SELECT uptime_string AS category                                                      \n        , sum(report_count) AS report_count                                     \n    FROM signature_summary_uptime                                               \n        JOIN signatures USING (signature_id)                                       \n    WHERE                                                                           \n        signatures.signature = 'Fake Signature #1'                                             \n        AND report_date &gt;= '2013-08-05T00:00:00+00:00'::timestamptz             \n        AND report_date &lt; '2013-08-12T00:00:00+00:00'::timestamptz              \n        AND product_name IN ('Firefox')  AND version_string IN ('1')            \n    GROUP BY category                                                           \n),                                                                              \ntotals AS (                                                                     \n    SELECT                                                                      \n        category                                                                \n        , report_count                                                          \n        , sum(report_count) OVER () as total_count                              \n    FROM crashes                                                                \n)                                                                               \nSELECT category                                                                 \n    , report_count                                                              \n    , round((report_count * 100::numeric)\/total_count,3)::TEXT                  \nas percentage                                                                   \nFROM totals                                                                     \nORDER BY report_count DESC                                                      \n;\n<\/code><\/pre>\n<p>You&#8217;ll see that I have one or more <code>WITH<\/code> clauses, and then a query that performs a final summary query using the data from the CTEs.<\/p>\n<p>This query probably was asked for something like this:<\/p>\n<blockquote>\n<p>Please provide counts of crashes with the same uptime, for Firefox version 1, and the signature &#8216;Fake Signature #1&#8217; for the last week, including a percentage of all of the sampled crashes.<\/p>\n<\/blockquote>\n<p>While I&#8217;m sure there are better ways to write the query above, I wanted to show how I have made a pattern for myself to speed up query writing. I&#8217;m not always interested in the best possible query. Hopefully, the Postgres planner makes up for many of my sins as a developer!<\/p>\n<p>What I am interested in is finding answers to problems quickly for my coworkers.<\/p>\n<p>In answering the question I was asked, I first dig out an appropriate summary table (we have quite a few in Socorro). I found the <code>signature_summary_uptime<\/code> table, and fortunately it has <code>product_name<\/code> and <code>version_string<\/code> available in the table. I only need to join <code>signatures<\/code> to fulfill the request. (Yay for denormalized data that supports the kinds of queries we often run!)<\/p>\n<p>Next, I see that I&#8217;m being asked for a <strong>total percentage<\/strong>, so I need to calculate a sum across all the rows that I retrieve. That can be very slow, so I create a second CTE that uses data from the first CTE (rather than doing two full table scans to calculate the total). I use a <a href=\"http:\/\/www.postgresql.org\/docs\/9.3\/static\/functions-window.html\">window function<\/a> instead of <code>SUM()<\/code> here because I&#8217;ve <a href=\"http:\/\/www.chesnok.com\/daily\/2013\/08\/05\/why-use-over-instead-of-a-cross-join\/\">done experiments to see which tends to be faster<\/a>.<\/p>\n<p>And, finally once I have all the data together, I run my final query using my two CTE tables.<\/p>\n<h2>How CTEs and breaking down this process have helped me<\/h2>\n<p>So, I&#8217;ve had about a year to practice. A query like this today takes me 10-15 minutes to assemble and test. They are typically slightly more complex &#8212; with more dependencies, and maybe 2-3 more tables involved in JOINs. But they follow the same basic pattern.<\/p>\n<p>Most queries on my data sets conform to recognizable patterns.<\/p>\n<p>After a few months, we recognized that moving JSON for crash data into Postgres also would be a win, and was easy to process using very similar queries.<\/p>\n<p>That&#8217;s all helped make finding answers about <a href=\"http:\/\/crash-stats.mozilla.com\">Firefox crashes<\/a> easier and faster!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This this series of posts about using Postgres every day. The last post was about \\ commands in psql. I&#8217;m now going to share in a series of posts my workflow for writing queries, and some of the things about &hellip; <a href=\"https:\/\/www.chesnok.com\/daily\/2013\/11\/12\/how-i-write-queries-using-psql-ctes\/\">Continue reading &rarr;<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[9],"tags":[603,604],"class_list":["post-5068","post","type-post","status-publish","format-standard","hentry","category-postgresql","tag-ctes","tag-everyday-postgres"],"_links":{"self":[{"href":"https:\/\/www.chesnok.com\/daily\/wp-json\/wp\/v2\/posts\/5068","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.chesnok.com\/daily\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.chesnok.com\/daily\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.chesnok.com\/daily\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.chesnok.com\/daily\/wp-json\/wp\/v2\/comments?post=5068"}],"version-history":[{"count":10,"href":"https:\/\/www.chesnok.com\/daily\/wp-json\/wp\/v2\/posts\/5068\/revisions"}],"predecessor-version":[{"id":5085,"href":"https:\/\/www.chesnok.com\/daily\/wp-json\/wp\/v2\/posts\/5068\/revisions\/5085"}],"wp:attachment":[{"href":"https:\/\/www.chesnok.com\/daily\/wp-json\/wp\/v2\/media?parent=5068"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.chesnok.com\/daily\/wp-json\/wp\/v2\/categories?post=5068"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.chesnok.com\/daily\/wp-json\/wp\/v2\/tags?post=5068"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}