Correlated Query Error in Netezza

Standard
Beta... you know... for correlation?

Beta… you know… for correlation?

Fun and interesting error today. Here’s the actual error text:

Error 2: this form of correlated query is not supported – consider rewriting

I’d never heard of this “correlated query” business before so I had to look it up to sort out what was going on. Turns out that you can reference a table in the outside part of a query from within a subquery by calling the alias… Or, rather, you can’t in Netezza.

Tip of the day:

Check for alias references in your subquery and get rid of them

Being that this code wasn’t actually meant to attempt this crazy bit of semi-circular logic, I ended up finding the invalid references and removing them. But it did get me thinking about alternatives to using correlated subqueries.

The example over on the Wikipedia page had something about finding all records indicating a salary higher than the average for the department. It gives an alternative that breaks this into two steps:

  1. Create a view that captures the department and its average salary
  2. Join the view with averages to the main table using a NATURAL JOIN and a where clause to limit

I’d like to take this moment to say OUCH!!! My brain hurts from the very thought of using a natural join in real life. Here are a couple of (less painful to see) alternatives for accomplishing the same task.

-- Using two steps to pull this list

-- First step same as Wikipedia
create view dept_avg as
select department, avg(salary) as department_average
from employees
group by department;

select e.*
from employees e
left join
dept_avg da
on e.department = da.department
where e.salary > da.department_average;

This effectively does the same step but uses a left join instead of a natural join. You could also use a join with multiple criteria in the second step, such as:

select e.*
from employees e, dept_avg da
where e.department = da.department
and e.salary > da.department_average;

For those enamored of more elegant code, here’s a way of doing it all in one, albeit more complex, step:

select *
from
(select employee_id, salary, department, avg(salary) over (partition by department) as department_average
from employees) as windowedinsanity
where salary > department_average;

I did a quick test using a similar structure to find customers with a sale that did not represent their total sales amount to compare the three methodologies above. The windowed version, the last one presented, was about 50% faster than the other two. I did not bother trying this with a natural join or with the correlated subquery because, you know, it was in Netezza.

How would you solve this problem?

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.