The Lateral Join Chronicles: Unleash the Power of PostgreSQL!
Unleashing the power of PostgreSQL: A deep dive into Lateral Joins.
Hey there! My name is Ashutosh Muley and currently I'm a Member of Technical Staff 2 at Fyle. In this beginner's guide, we'll explore a unique feature in PostgreSQL that can take your data querying to the next level. Are you ready to level up your PostgreSQL querying skills? Let's dive into the fascinating world of lateral joins! Say goodbye to complex queries and hello to simplified and efficient queries!
What are LATERAL JOINs ?
At its core, a LATERAL JOIN in PostgreSQL allows us to reference columns from the preceding tables in a query. This feature comes in handy when we need to incorporate correlated subqueries or functions. Unlike regular joins, LATERAL JOIN
s give us the ability to access and use the results of a subquery for each row of the main query.
The true power of LATERAL JOINs becomes evident in situations where there are data dependencies between different parts of the query. Whether we need to calculate running totals, perform self-joins, or navigate hierarchical data structures, LATERAL JOINs provide a dynamic and efficient solution.
But enough with the definitions, let's see how LATERAL JOINs work in practice...
How do I use LATERAL JOINs, show me how?
To help us understand this better, let's take a simple example. Let's assume we have a sample database with the sales data of a sales firm.
Table "states"
Column | Type |
--------+------------------------+
id | integer |
name | character varying(255) |
Table "salespeople"
Column | Type |
---------------+------------------------+
id | integer |
full_name | character varying(255) |
home_state_id | integer |
Table "products"
Column | Type |
-------------+------------------------+
id | integer |
name | character varying(255) |
description | text |
Table "sales"
Column | Type |
----------------+---------+
id | integer |
amount | numeric |
product_id | integer |
salesperson_id | integer |
state_id | integer |
Our goal here is to find the total sales achieved by the salespeople in their respective home states.
Now, the simplest way to approach this would be (pseudo-code):
for salesperson in salespeople:
loop
let sum = 0;
for sale in sales:
if (
sale.state == salesperson.home_state
and sale.salesperson = salesperson
) then:
sum += sale.amount;
end if;
end loop;
end loop;
One thing to note here is that we need 2 loops for solving this. Firstly, we iterate through the list of salespeople. For each salesperson
, we iterate through the sales
table and find the sum of the sales in the salesperson's home state.
Before using LATERAL JOIN
for this, let's write a query to perform the desired operation with a simple CTE.
WITH state_wise_sales_for_each_salesperson as (
SELECT
s.salesperson_id,
s.state_id,
sum(s.amount) as total_sales
FROM sales s
GROUP BY s.salesperson_id, s.state_id
)
SELECT
sp.id salesperson_id,
sp.full_name,
sp.home_state_id,
st.name as home_state_name,
swsfes.total_sales
FROM state_wise_sales_for_each_salesperson swsfes
JOIN salespeople sp
ON sp.id = swsfes.salesperson_id
AND sp.home_state_id = swsfes.state_id
JOIN states st on st.id = sp.home_state_id;
Now let's break it down. In the above query, we start by creating a Common Table Expression (CTE) named state_wise_sales_for_each_salesperson
that calculates the total sales achieved by each salesperson in every state. This is achieved by grouping the results of the sales
table by salesperson_id
and state_id
and calculating the sum
of the amount for each group of sales.
Next, we select the salesperson_id
, home_state_id
, and total_sales
columns from the state_wise_sales_for_each_salesperson
CTE and join the salespeople
table to obtain the sales for each salesperson's home state, and we have our result. We also join with the salespeople
table and states
table to show the full_name
and home_state_name
respectively.
Now that we have our query all set, let's take a peek at the query plan:
Now, did you notice the number of Sequential scans, Sorts (even though we never used order by 😅), and aggregates that the PostgreSQL engine is juggling? It feels like a bit too much, don't you think?
But fret not! Let us simplify this query. Drumroll, please... Let's rewrite the same query but now with using a LATERAL JOIN
!
SELECT
sp.id salesperson_id,
sp.full_name,
sp.home_state_id,
st.name as home_state_name,
home_state_sales.total_sales
FROM salespeople sp
JOIN states st on sp.home_state_id = st.id
JOIN LATERAL (
SELECT sum(amount) as total_sales
FROM sales s
WHERE s.salesperson_id = sp.id
AND s.state_id = sp.home_state_id
) home_state_sales ON TRUE
;
By leveraging the power of lateral joins, we have made our query more readable and maintainable.
Let's now attempt to understand this.
In this query, we used a LATERAL JOIN
with a subquery. The subquery (enclosed in parentheses) helps us calculate the total sales for each salesperson in their home state.
The LATERAL JOIN
allows us to reference columns from preceding tables, in this case, the salespeople
table, inside the subquery. The use of LATERAL
treats the subquery as if it is part of the main query and allows us to use values from the preceding tables in the subquery.
And just like that, we have it 🙃: the total sales achieved by the salespeople in their respective home states. Voilà!
"I get it, you're probably thinking, 'Why didn't I know about this before?”
Let's quickly take a peek at the query plan of our new query:
And just like that, we can witness a magical transformation! The PostgreSQL engine performs fewer operations to run our query, making everything seem clearer and simpler (just like a spell from a wizard).
Looking back, we can see that the FROM-clause is like the "outer loop" in our pseudo code, and the LATERAL can be seen as the "inner loop."
Conclusion
Summarizing, learning about LATERAL JOIN in PostgreSQL introduces a useful tool for enhancing query flexibility. It allows you to refer to columns from tables that appear before or after in the query, facilitating the retrieval of related data from different tables. This feature is particularly beneficial for handling complex calculations and aggregations, leading to improved query performance.
However, it is important to exercise caution when using LATERAL JOIN, as it can increase query complexity and make them more challenging to manage. To utilize it effectively, you need to have a clear understanding of your data structure and table relationships.
This beginner's guide showed us a basic understanding of LATERAL JOIN
and demonstrated how it can simplify and optimize our queries.
Happy querying!