The Power of PostgreSQL functions

Alexey Soshin
Engineering at Depop
5 min readJan 31, 2021

--

Photo by Paola Blašković on Unsplash

At Depop we utilise PostgreSQL in most of our microservices. In addition to common data types, such as integers, floats and strings, PostgreSQL also supports array data types and has a whole set of functions that operate on them.

Here’s just one of the problems those capabilities helped us solve.

Let’s imagine that this is how we select multiple products with plain JDBC:

// productIds is a Seq[Int]
val query = sql"""SELECT product_id, likes FROM product_likes WHERE product_id IN (${productIds.mkString(",")})""";

When invoking SQL from your code, the last thing you want to do is to concatenate or interpolate SQL strings. It’s basically welcoming SQL injection.

It would be much better to use bind variables:

val query = sql"""SELECT product_id, likes FROM product_likes WHERE product_id IN (?)""";

But, when you have an unknown number of variables, such as is the case when receiving an array of elements, this may create a problem. You simply don’t know how many variables you need to bind.

So, you may resort to something like:

val query = sql"""SELECT product_id, likes FROM product_likes WHERE product_id IN (${List.fill(productIds.length)("?").mkString(",")})""";

And then binding those variables in a loop. Which is safer, but much more verbose.

It would be useful if we would have a SQL function that given a string, could represent it as separate rows.

Then our query would remain constant:

val query = sql"""SELECT product_id, likes FROM product_likes WHERE product_id IN (func(?))""";

If you check the PostgreSQL documentation, you’ll quickly find that there’s theunnest function for that.

It receives an array and returns you a virtual table, where each element of an array is a row.

So, if we had the following string:

'{1, 2, 3}'

We could have created the following table out of it:

+--------------------+
|col1 |
+--------------------+
|1 |
|2 |
|3 |
+--------------------+

And now our query is as simple as:

val query = sql"""SELECT product_id, likes FROM product_likes WHERE product_id IN (unnest(?::int[]))""";

Now it’s a good time to clarify, that although the above examples discuss plain JDBC, at Depop we’re using the Slick library to communicate between our Scala microservices and PostgreSQL.

While Slick is a very powerful tool, it still has its limitations, and sometimes you have to resort to literal values:

val query = sql"""SELECT product_id, likes FROM product_likes WHERE product_id IN (#${productIds.mkString(",")})""";

Using # before a variable tells Slick to insert it as-is into our query.

And that’s the equivalent of the first query we’ve discussed:

val query = sql"""SELECT product_id, likes FROM product_likes WHERE product_id IN (${productIds.mkString(",")})""";

The above example converted a collection into a virtual table with a single column.

What if you need a virtual table with multiple columns instead?

Let’s say we receive as input a JSON string:

val input = "[[3,2,82.38,null,'World'], [4,1,43.12,'Hello',null]]"

And want to represent it as the following table:

+----+----+-----+-----+-----+
|col1|col2|col3 |col4 |col5 |
+----+----+-----+-----+-----+
|3 |2 |82.38|null |World|
|4 |1 |43.12|Hello|null |
+----+----+-----+-----+-----+

How can we do that?

So, if we had the following string:

'{"3,2,82.38,null,World", "4,1,43.12,'Hello',null"}'

We could have created the following table out of it using unnest:

+--------------------+
|col1 |
+--------------------+
|3,2,82.38,null,World|
|4,1,43.12,Hello,null|
+--------------------+

Getting the required string is trivial: we just need to replace the first [ and last ] with { and }, all other [] are to become “” , and finally, ‘’ needs to go.

We could do that in our Scala code, but let’s see how we can use PostgreSQL functions for that instead.

There are multiple ways to achieve that: replace, translate or even regexp_replace. We’ll use a combination of replace and translate for that:

translate(
replace(replace('[[3,2,82.38,null,''World''], [4,1,43.12,''Hello'',null]]', '[[', '{"'),'', ''),
'[, ], ''', '", ",')

The interesting part here is how translate works. It says the following:

Replace this:
[ with "
] with "
'' with nothing

So, [ becomes , ] becomes , and‘’, which is an escaped string is removed. Exactly as we wanted.

Next, let’s create a table from this string.

It’s as easy as casting it to text[] and invoking unnest on the result:

select unnest('{"3,2,82.38,null,World","4,1,43.12,Hello,null"}'::text[]) as col1

Results in:

+--------------------+
|col1 |
+--------------------+
|3,2,82.38,null,World|
|4,1,43.12,Hello,null|
+--------------------+

Half of the work done. We converted a single string into multiple rows, just as we did in the previous unnest example, but this time each row is a comma separated string.

How do we separate them into columns now, though?

For that, we can use another function, string_to_array

We’ll invoke it on each row:

select string_to_array(nest.l, ',') as a
from (
select unnest('{"3,2,82.38,null,World","4,1,43.12,Hello,null"}'::text[]) as l
) nest

And that results in the following:

+----------------------+
|col1 |
+----------------------+
|{3,2,82.38,null,World}|
|{4,1,43.12,Hello,null}|
+----------------------+

Notice that it’s still a table with a single column, though. What has changed is that the type of this column is text[] now.

The function string_to_array is different from the cast we used previously, because it supports any type of separator, and not only commas.

What’s left is to turn elements of an array into columns:

And this results in:

+----+----+-----+-----+-----+
|col1|col2|col3 |col4 |col5 |
+----+----+-----+-----+-----+
|3 |2 |82.38|null |World|
|4 |1 |43.12|Hello|null |
+----+----+-----+-----+-----+

Exactly as we wanted.

Note that we’re casting the columns to the correct types: a[3]::float , for example. That’s because, by default, those are all text

Here’s an extra question:

Why even use string_to_array , when we could simply unnest the array?

The reason is that unnest flattens the array completely, including nested arrays.

So this command:

select unnest('{{3,2,82.38,null,World},{4,1,43.12,Hello,null}}'::text[]) as l

Will result in the following table:

+-----+
|l |
+-----+
|3 |
|2 |
|82.38|
|NULL |
|World|
|4 |
|1 |
|43.12|
|Hello|
|NULL |
+-----+

Which is not what we wanted.

Conclusions

PostgreSQL has a wide array of useful functions, including functions that turn strings into arrays, and arrays into table rows and table columns.

These PostgreSQL capabilities to create virtual tables can be useful in many different cases.

Virtual tables can be joined, used in updates or even saved using create table as

Don’t ignore these capabilities. By utilising them, you can make your application more secure and more performant.

--

--

Solutions Architect @Depop, author of “Kotlin Design Patterns and Best Practices” book and “Pragmatic System Design” course