Using Subqueries for better SQL queries
After spending time in PostgreSQL’s console, I want to document some of my learning.
As much as I want to spend time in Ruby and have ActiveRecord handle all the querying, SQL is very powerful and can do a lot of work!
One thing I have grown to love: subtables or subqueries. They are results which can be used for another query.
I like PostgreSQL’s approach using the WITH subquery
format. The basic syntax in PostgreSQL is:
WITH subquery AS (
SELECT *
FROM source ...
),
another_subquery AS (
SELECT *
FROM another_source ...
)
SELECT *
FROM subquery
INNER JOIN another_subquery
WHERE subquery.join_attribute = another_subquery.join_attribute
Similar queries can be applied to other SQL databases from within the
WHERE
clause. I like the WITH
format as subqueries can be
separated from the main SELECT
statement.
Example
Let’s have an example with a bit more details: joining results from multiple statements to create a report. In this example: social media clicks in an interval.
Table 1: Articles Table
- keep track of articles details
- columns:
author | title | content | article_slug | created_at | updated_at |
Table 2: Attribution link table
- this table keeps track of all the ‘shares’ an article has through its unique slug, which service it originated, and time it was clicked at
- columns:
article_slug | uniq_hash | service_source | click_time |
A question that can be answered is: how many clicks does an author get on a particular social media service over a course of a day?
An easy way to answer this with subtables is for each author, create a subtable of their clicks and have the main query join that result with an interval table.
WITH social_media_clicks AS (
SELECT articles.author, attribution_link.click_time click_time
COUNT(attribution_link.service_source = 'facebook') facebook_count
COUNT(attribution_link.service_source = 'twitter') twitter_count
FROM articles
INNER JOIN attribution_link
WHERE articles.article_slug = attribution_link.article_slug
),
five_min_intervals AS (
SELECT
(SELECT min(click_time)::date FROM social_media_clicks) + ( n || ' minutes')::interval start_time,
(SELECT min(click_time)::date FROM social_media_clicks) + ((n+5) || ' minutes')::interval end_time
FROM generate_series(0, (24*60), 5) n
)
SELECT articles.author, facebook_count, twitter_count
FROM social_media_clicks s
INNER JOIN five_min_intervals f
WHERE s.click_time >= f.start_time and s.click_time < f.end_time
The main query can use a subtable as its source and even perform a join on it as if it was a real table.
Thanks to Cat Recall for the generate series statement!
Overall
I’m not a SQL expert, but whenever parts can be reused to build bigger
things, I am very happy. Subtables and subqueries enable this and
WITH
helps keep organize the SQL statement into cleaner parts.
Go forth and (sub)query!