Common Table Expressions versus Temp Tables in Netezza

Standard

keep-calm-with-or-without-youThis seems to be either a controversial or overly-technical topic: should you use WITH (a common table expression a.k.a. CTE) or a TEMP TABLE. Both can serve similar purposes but each has their own strengths and weaknesses in how they work with other aspects of your query or procedure.

So today we’ll take a look at both without going into crazy detail but covering at least the basics.

WITH Statement

The WITH statement is used to create an in-line set of data similar to a sub-query. It is placed at the start of the query with an AS statement to specify the construction of the set(s).

Pros

  • Cleaner Code: The query of interest (that producing the output results) looks less cluttered using a WITH statement than a sub-query
  • Limited Construction Time: While the WITH statement does create a temporary set, its I/O footprint may be lower than that of a temp table by not requiring separate statistics to be generated
  • Single Statement: If you are in an environment requiring that all of your code must be in a single statement (such as referencing from other programming languages via pass-through code), then WITH is a better option as it is part of the same query.

Cons

  • Single Query Use: A WITH statement is associated ONLY with the query to which it is affixed – it cannot be used through multiple queries without its being restated each time.
  • Limited Optimization Capability: While WITH does create a temporary data set, it is not one that can be specifically distributed across key values. I also cannot have its own statistics generated.

TEMP TABLE

The TEMP TABLE is used to create a data set that functions just like a table but is dropped once the transaction or session is complete.

Pros

  • Optimizable: Because temp tables function basically as a table, you can distribute and/or organize them on the SPUs. You can also generate statistics on them for ease of calculation. This can significantly speed up processing of other steps.
  • Flexibility: Temp tables may be altered as needed during the course of processing. This allows for the inclusion of additional information as it is calculated.
  • Persistence: The temp table lasts until the transaction or session is terminated or until specifically dropped. As such, it may be used in multiple query steps without the need for restatement.

Cons

  • Permissions: You must have write permissions in order to execute a CREATE TEMP TABLE command. If you have only read permissions to the environment in which you are working, then you cannot use temp tables.
  • Management: If you are doing a substantial amount of processing or are creating very large temp tables, it may be necessary to do manual clean-up as you go. Once the steps using the temp table are complete, you can drop them just like regular tables, but this means extra management steps.

Net Result

It’s kinda’ a draw, which means it’s programmer’s preference. I personally tend to use temp tables more than CTEs because it’s more frequent that I will create a data set that needs to be used in multiple steps. Also, with the volumes of data in some of these queries, being able to distribute or generate statistics to improve performance is a big benefit. That said, if you don’t need those things and prefer not to have subqueries in the middle of your SQL statement, try a CTE.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.