Netezza Stored Procedures and Optional Arguments

Standard

netezzaLet’s say you have a process that requires a start date, but may or may not have an end date, as a parameter feeding in. How can you accomplish this in Netezza via a stored procedure? The internet has not had great answers to this question but it’s not as hard as some other programmers make it out to be. So today we tackle optional arguments – in easy mode.

A Note About Arguments… and Tools

There are three ways to build stored procedures:

  1. Without any arguments – you can still declare parameters but they cannot be passed into the call
  2. With set arguments – you specify each argument expected and its data type
  3. With VARARGS – you allow any number of parameters with any data type to be passed

In this example, we will use option 2 from above – set arguments. While it is possible to use option 3, it gets messy really fast. And, like I said before, this is easy mode.

Also, if you’re developing in NZSQL, you should really consider using the Aginity Workbench. It’s free and provides a TON of good features – like setting up the template of a stored procedure with a few quick entries. I say this not only because I work for the company, but also because I totally just used it to create the outline of the little test stored procedure I’ll use for this example.

On to the Example!

Let’s say, for the sake of argument (get it? argument? I crack myself up!) that I want to count the number of distinct cookie IDs to each web page on my site during a specific period of time. (And no, I’m not dropping cookies on you… unless I am…) I might write a quick stored procedure that allows me to specify both the start date and the end date as such:

CREATE OR REPLACE PROCEDURE SP_TEST_OPTARGS(DATE, DATE)
RETURNS INTEGER
EXECUTE AS OWNER
LANGUAGE NZPLSQL AS
BEGIN_PROC
DECLARE
Start_date ALIAS FOR $1;
End_date ALIAS FOR $2;
BEGIN

create temp table tmp_optargs_out as
select web_page, count(distinct cookie_id) as cookies
from WEB_VISITS
where visit_timestamp between start_date and end_date
group by web_page;

END;
END_PROC;

Simple enough. When called, the user enters the start date and end date in proper format, including single quotes, and separated by a comma. No big deal. Unless I want the user to be able to not specify an end date… then we might have something like this:

CREATE OR REPLACE PROCEDURE SP_TEST_OPTARGS(DATE)
RETURNS INTEGER
EXECUTE AS OWNER
LANGUAGE NZPLSQL AS
BEGIN_PROC
DECLARE
Start_date ALIAS FOR $1;
BEGIN

create temp table tmp_optargs_out as
select web_page, count(distinct cookie_id) as cookies
from WEB_VISITS
where visit_timestamp >= start_date
group by web_page;

END;
END_PROC;

Screen Shot 2015-05-27 at 11.19.19 PMNow, if I were to execute both of these statements, Netezza would create essentially two different stored procedures with the same name. One would show that it requires two date arguments and the other would require only one argument. Ta-da! You now have an optional argument.

What About that VARARGS Thing?

VARARGS is a kinda nutty deal but it can serve a purpose. Basically, it allows you to put in up to 64 arguments, in no particular order, and assumes that you’ll figure out what to do with them in the stored procedure later on. Ick.

The other thing it does is force you to specify your data types when entering your arguments provided that they aren’t simply numeric or character varying. So if you want to use something as a date later on, you’ve got to add an in-line cast to it such as ‘2015-01-01’::date. If there are only so many things that you want your code to handle, it’s probably best to use the option above. But if you’re appalled at the idea of having to support two different stored procedures that do the same thing, this may be the option for you.

Here is the code to do the same thing as above except using VARARGS:

CREATE OR REPLACE PROCEDURE SP_TEST_VARARGS(VARARGS)
RETURNS INTEGER
EXECUTE AS OWNER
LANGUAGE NZPLSQL AS
BEGIN_PROC

DECLARE
idx int4;
typ oid;
min_date date;
max_date date;
form_type varchar(255);

BEGIN

create temp table tmp_args(arg_no int4, type_no bigint, arg_value varchar(255));

for i IN 0 .. PROC_ARGUMENT_TYPES.count - 1 LOOP

idx := i::int + 1;
typ:= PROC_ARGUMENT_TYPES(i);

insert into tmp_args
select idx as arg_no
, typ as type_no
, $idx::varchar(255) as arg_value
;

END LOOP;

min_date := min(arg_value::date) from tmp_args where type_no = 1082;

max_date := max(arg_value::date) from tmp_args where type_no = 1082;

CALL UT_DROP_TABLE_IF_EXISTS('tmp_varargs_out');

create table tmp_varargs_out as
select web_page, count(distinct cookie_id) as cookies
from WEB_VISITS
where visit_timestamp between min_date and decode(max_date=min_date,TRUE,current_date,max_date)
group by web_page;

END;
END_PROC;

Here’s the Step-by-Step

  1. Create your stored procedure and use VARARGS instead of specifying arguments to be entered. Note that VARARGS essentially creates an array, which will have a starting position of 0 rather than 1.
  2. Declare variables for the index position of the arguments, the OID of the data type applied to the argument, and any arguments you’re actually looking to use the values of.
  3. Create a temp table to store the argument values along with their positions and data types. This is useful for referencing later or assigning to the variables you want to use.
  4. Loop through the arguments to capture their associated information in the temp table from Step 3.
  5. Assign the variables you want to use based on the information captured from the arguments. In the example, this is done by looking for the minimum and maximum dates, as denoted by a data type OID of 1082.
  6. Do the rest of whatever you were trying to do with the variables in the first place.

That’s much more complex. It does get the job done but it takes a lot more effort to get there. And there’s the possibility of user error if someone forgets to properly specify the data type in the argument list provided. For instance, someone calling the VARARGS stored procedure with CALL SP_TEST_VARARGS(‘2015-05-27’,‘2015-12-31’); the following call would have NO date arguments. Both of the dates would be read as varchar and assigned to OID 705. The resulting min and max dates would then be null and no data would be returned in the table created.

Moral of the story – stick to the easy way. Much easier to code and error-proof. Plus, VARARGS are annoying.

Side note: if you need to know the what the weird OID numbers mean, you can query for them using the code below. A given OID may have multiple records from this due to differences in precision/scale or character length.

SELECT DISTINCT FORMAT_TYPE, ATTTYPID
FROM _V_RELATION_COLUMN;

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.