Netezza is a super-fast platform for databases… once you have data on it. Somehow, getting the data to the server always seems like a bit of a hassle (admittedly, not as big a hassle as old school punchcards). If you’re using Netezza, you’re probably part of a large organization that may also have some hefty ETL tools that can do the transfer. But if you’re not personally part of the team that does ETL, yet still need to put data onto Netezza, you’ve got to find another way. The EXTERNAL TABLE functionality may just be the solution for you.
You have some sort of basic file on a non-server drive that you need to load into a table in Netezza. This is usually an unencrypted, delimited file. The delimiter can be whatever you like but standard might be comma- or pipe-delimited.
There are several ways to accomplish this task. If you are using the Aginity Workbench for Netezza, you can use the Import function. The wizard steps will ask you to specify field formats and a table name. This process does add a little bit of overhead and may take longer than you’d like. You can also use Workbench to create a table in the structure you need and then Import to it directly. However, if your file has any incongruous lines, the process is likely to error.
The EXTERNAL TABLE Solution
External tables basically form a link from the Netezza server to your local environment (desktop or shared drive) but do not immediately import the data. You just specify how it would. If you were to query the external table, it would read from the file as needed but would not store a copy of the data on the server.
Technically, this is explicit external table schema definition with options set. There are more streamlined ways to do this but, if you’re fussy about getting the schema absolutely right, I recommend using the method below. It also allows for better error checking and handling if you run into problems.
create external table EXTERNALTABLENAMEHERE ( tbl_id bigint, date_entered timestamp, name_of_stuff varchar(100), spend_amt numeric(16,2) ) USING ( DATAOBJECT('C:\Data Sources\some_random_file.txt') REMOTESOURCE 'odbc' DELIMITER '~' SKIPROWS 1 MAXERRORS 1000 LOGDIR 'C:\\' ); create table PERMANENTTABLENAMEHERE as select * from EXTERNALTABLENAMEHERE;
The first part of EXTERNAL TABLE looks largely like a normal CREATE TABLE statement. You give the external table a name and provide the DDL. The second part is where all the fun stuff happens.
The USING statement indicates that it will be reading from a different source. You then state the source by referencing the file location as the DATAOBJECT and the source connection as the REMOTESOURCE. If the file is already located on the Netezza server somewhere (just not in a table in a database), you can skip the REMOTESOURCE piece. REMOTESOURCE can take one of three values: odbc, jdbc, or ole-db. Use the one appropriate for your connection type.
The DELIMITER allows you to set a custom field delimiter. It defaults to pipe if you don’t put anything in so be sure to check this before running your code.
Next in the example above is SKIPROWS. This is useful if you have headers in your first record (or however many). Just provide the number of records to ignore at the top.
MAXERRORS is the number of erred records that can be written to the log before the process will fail. This does not happen when you execute the CREATE EXTERNAL TABLE step. It will happen if you attempt to read something from the EXTERNAL TABLE either by running a SELECT statement or using CREATE TABLE AS to pull the data from the external table into a normal table.
Make sure that you specify a LOGDIR if you use MAXERRORS. This is the directory to which the log file and errors will be written (which means you need to have write access there). Any errors will show up in a file that gets written to the TABLENAME.DATABASE.NZBAD. You will also see TABLENAME.DATABASE.NZLOG as the log file written, whether or not any errors were encountered.
The final step is to CREATE TABLE from the external table. This reads the file from the link in the external table definition, reads the external table DDL, and combines the two into a regular table.
Other Options with External Tables
There are a bunch of options that are configurable during your CREATE EXTERNAL TABLE procedure. I recommend looking at the IBM External Table Load Options documentation, which also goes through what default behaviors are for this command. Some of the ones I’ve used before are below with explanation.
BOOLSTYLE Used to specify how BOOLEAN fields are encoded in your file. For example, if you have ‘True’ and ‘False’ as your values, you would use True_False as your option command.
FILLRECORD Sometimes you have a file with fields missing at the end (not optimal, but it happens). In this case, use FILLRECORD TRUE to allow for records with missing fields to be entered into the table.
ENCODING Files with character encoding not aligned to that used by the database must be specified. You can state Latin9 or UTF8 as the encoding structure.
MAXROWS If you are worried that you may not have it quite right yet and want to test load a sample, use the MAXROWS command to give a number of rows to load in. This will take less time than loading the whole file if you are still in development.
REQUIREQUOTES and QUOTEDVALUE These kind of go together to deal with quoting of string values. If you set REQUIREQUOTES TRUE in your options, that means any strings not enclosed in quotes will error. It also means that you have to specify the QUOTEDVALUE as YES, SINGLE or DOUBLE. YES and SINGLE are roughly synonymous. DOUBLE means that strings are enclosed in double-quotes.
TIMESTYLE This allows you to specify 24HOUR or 12HOUR to denote how times should be read. It defaults to 24HOUR so if you are using 12HOUR (with A.M. and P.M.), you need to state 12HOUR.
Again, these are only a handful of the available options. If you’re having trouble with a given file, check the documentation and/or leave a comment below.