Red Green Repeat Adventures of a Spec Driven Junkie

Reference Queries for PostgreSQL's Document Type from Ruby on Rails

I want to document my explorations in querying PostgreSQL’s document column types as a reference because I always create a query that puts the Ruby on Rails’ PostgreSQL adapter in a failed state.

I will build from my previous articles on building out a Ruby on Rails system with PostgreSQL and use the system to document the results of my practice.

If you’re looking for help to constructing queries for PostgreSQL’s document column types (hstore, JSON, and JSONB), you will find some reference for matching, greater/less than, dates, match array items, etc.

This article will take you about five minutes to read.

Square Piano source and more information

Introduction

I love having a document column in my relational database system so I can store both data types with ease.

Querying normalized data is easy, especially through an object relational mapper, such as Ruby on Rails’ ActiveRecord.

Querying the document column has been a hit or a miss for me and I always get stuck.

I want to have a set of reference queries so wheneve I do have to query PostgreSQL’s document columns from Ruby on Rails, I can just look here.

Factory Setup

I use factories to generate data in such a way that there’s realistic data that queries give varying results, to make it challenging.

Also, with FactoryBot’s create_list method, I can create items I need to test out queries. Make sure the queries do work as intended to return (or not return) items based on the query.

To create items, I have a variety of data generated, using sample to randomize here and there with values.

The following is the factory file I created to practice my queries.

FactoryBot.define do
  factory :user do
    transient do
      past_date { 3.days.ago }
    end

    transient do
      array_data {
        [
          "blues",
	  # ...
          "soul",
        ].sample(3)
      }
    end

    transient do
      nested_data {
        {
          gender:
            [
              "male",
              "female",
              "did_not_answer"
            ].sample,
          marital_status:
            [
              :divorced,
		  # ...
              :single,
              married: [
                :no_children,
                children: rand(1..10)
              ].sample
            ].sample,
          age: rand(18..65),
          profile: {
            setting: {
              value: [
                :set,
			# ...
                :false,
              ].sample(3)
            }
          }
        }
      }
    end

    # Mirror data in all three types
    sequence :hstore do |n|
      {
        id: n,
        date: past_date,
        music_preference: array_data,
        survey: nested_data.to_json
      }
    end

    sequence :json do |n|
      {
        id: n,
        date: past_date,
        music_preference: array_data,
        survey: nested_data
      }
    end

    sequence :jsonb do |n|
      {
        id: n,
        date: past_date,
        music_preference: array_data,
        survey: nested_data
      }
    end
  end
end

I explicitly set all the columns to have the same values and to experiment with the differences between how PostgreSQL handles the column types.

Blowing Up

The main reason for doing all this work to setup a new Ruby on Rails application with PostgreSQL, FactoryBot, and Faker is to practice queries.

Even as I wrote this article, I miscontructed the SQL statement and kept putting the adapter in a failed state, causing all queries to fail:

[1] pry(#<RSpec::ExampleGroups::User>)> User.where("(hstore -> 'id') = ?", 1)
=> #<User::ActiveRecord_Relation:0x5979d48>
[2] pry(#<RSpec::ExampleGroups::User>)> User.first
ActiveRecord::StatementInvalid: PG::InFailedSqlTransaction: ERROR:  current transaction is aborted, commands ignored until end of transaction block

from /home/vagrant/.rvm/gems/ruby-2.5.8/gems/activerecord-6.1.3/lib/active_record/connection_adapters/postgresql/database_statements.rb:19:in `exec'
Caused by PG::InFailedSqlTransaction: ERROR:  current transaction is aborted, commands ignored until end of transaction block

from /home/vagrant/.rvm/gems/ruby-2.5.8/gems/activerecord-6.1.3/lib/active_record/connection_adapters/postgresql/database_statements.rb:19:in `exec'

Imagine doing this on production?!

Exact Match

The basic query on the documnet column is for an exact match of a value.

For my factory, the queries are against the id part of the document, which supposed to be an integer type.

Hstore

User.where("hstore ->> 'id' = ?", "integer")
User.where("CAST (hstore ->> 'id' AS INTEGER) = ?", 1)
User.where("(hstore ->> 'id')::int) = ?", 1)

JSON

User.where("json ->> 'id' = ?", "integer")
User.where("CAST ( json ->> 'id' AS INTEGER) = ?", 1)
User.where("(json ->> 'id')::int) = ?", 1)

JSONB

User.where("jsonb -> 'id' = ?", "1")
User.where("jsonb -> 'id' = ?", 1)
User.where("CAST (jsonb -> 'id' AS INTEGER) = ?", 1)
User.where("(jsonb -> 'id')::int) = ?", 1)

There are some nuances. Casting helps, unless the query is fine to match against "1" vs. 1. JSONB will match using -> OR ->> while JSON will only match using ->>.

Nested object

Matching a value in a nested object, this I could only get going with the JSONB column type.

User.where("jsonb -> 'survey' ->> 'marital_status' = ?", 'widowed' )

Greater-Less Than Integers

To query for values greater-less than, casting the value to an integer is essential for the operator.

Hstore

User.where("CAST (hstore ->> 'id' AS INTEGER) > ?", 1)

JSON

User.where("CAST (json ->> 'id' AS INTEGER) > ?", 1)

JSONB

User.where("jsonb -> 'id' > ?", 1)

Greater-Less-Equal Than Dates

There’s a different section for time and date as this shows a slight difference in the serialization of the document column types show.

Hstore and JSON store their items as text and requires casting and the input time to be in iso8601 format while JSONB can handle time directly without the need for a cast.

Hstore

User.where("CAST ( hstore -> 'date' AS DATE) <= ?", Time.now.iso8601)

JSON

User.where("CAST (json ->> 'date' AS DATE) < ?", Time.now.iso8601)

JSONB

User.where("jsonb ->> 'date' < ?", Time.now)

Object Matching

The following query matches I could only get working with the JSONB column. Having the same values in all document column types was nice because it kept me from blowing up when the query kept failing.

Match any value in array

Given an array, match against any value in that array.

User.where("jsonb -> 'music_preference' ? :style", style: 'pop')

Match Object

To create a query to match a specific object:

User.where("jsonb -> 'survey' -> 'marital_status' = ?", { married: "no_children" }.to_json )

Match Array

A query to match a specific array:

User.where("jsonb -> 'survey' -> 'profile' -> 'setting' -> 'value' = :item", item: ['set', 'unset', 'true'].to_json)

Conclusion

Above is my mini-reference card for querying PostgreSQL’s document type through Ruby on Rails’ ActiveRecord interface. Creating a whole environment to practice has improved my understanding on how to query.

At the same time, PostgreSQL’s syntax is subtle and if I did not have this reference here, I would still bang my head whenever my query would put the adapter in a failed state.