Red Green Repeat Adventures of a Spec Driven Junkie

Using Subqueries for better SQL queries

After spending time in PostgreSQL’s console, I want to document some of my learning.

As much as I want to spend time in Ruby and have ActiveRecord handle all the querying, SQL is very powerful and can do a lot of work!

One thing I have grown to love: subtables or subqueries. They are results which can be used for another query.

I like PostgreSQL’s approach using the WITH subquery format. The basic syntax in PostgreSQL is:

WITH subquery AS (
  SELECT *
  FROM source ...
),
another_subquery AS (
  SELECT *
  FROM another_source ...
)

SELECT *
  FROM       subquery
  INNER JOIN another_subquery
  WHERE      subquery.join_attribute = another_subquery.join_attribute

Similar queries can be applied to other SQL databases from within the WHERE clause. I like the WITH format as subqueries can be separated from the main SELECT statement.

Example

Let’s have an example with a bit more details: joining results from multiple statements to create a report. In this example: social media clicks in an interval.

Table 1: Articles Table

  • keep track of articles details
  • columns:
author title content article_slug created_at updated_at

Table 2: Attribution link table

  • this table keeps track of all the ‘shares’ an article has through its unique slug, which service it originated, and time it was clicked at
  • columns:
article_slug uniq_hash service_source click_time

A question that can be answered is: how many clicks does an author get on a particular social media service over a course of a day?

An easy way to answer this with subtables is for each author, create a subtable of their clicks and have the main query join that result with an interval table.

WITH social_media_clicks AS (
  SELECT     articles.author, attribution_link.click_time click_time
             COUNT(attribution_link.service_source = 'facebook') facebook_count
             COUNT(attribution_link.service_source = 'twitter')  twitter_count
  FROM       articles
  INNER JOIN attribution_link
  WHERE      articles.article_slug = attribution_link.article_slug
),
five_min_intervals AS (
  SELECT
   (SELECT min(click_time)::date FROM social_media_clicks) + ( n    || '  minutes')::interval start_time,
   (SELECT min(click_time)::date FROM social_media_clicks) + ((n+5) || '  minutes')::interval end_time
  FROM generate_series(0, (24*60), 5) n
)

SELECT     articles.author, facebook_count, twitter_count
FROM       social_media_clicks s
INNER JOIN five_min_intervals  f
WHERE      s.click_time >= f.start_time and s.click_time < f.end_time

The main query can use a subtable as its source and even perform a join on it as if it was a real table.

Thanks to Cat Recall for the generate series statement!

Overall

I’m not a SQL expert, but whenever parts can be reused to build bigger things, I am very happy. Subtables and subqueries enable this and WITH helps keep organize the SQL statement into cleaner parts.

Go forth and (sub)query!