Let’s say you have a process that takes a required start date and an optional end date as parameters. How can you accomplish this in Netezza via a stored procedure? The internet has not had great answers to this question, but it’s not as hard as some other programmers make it out to be. So today we tackle optional arguments – in easy mode.
I’ve recently come to love (read: be obsessed with) windowing functions in my coding. They’re just so useful and practical.
For those who haven’t experienced the joys of windowing, here’s the deal. They allow you to do calculations across multiple rows without actually having to group, thereby storing aggregate info on each record. That means you keep all the data associated with the row and can add calculated fields that rely on interaction with other rows. Pretty swiffy, huh?
Below are just a few funky functions that I’ve found helpful. I’m not saying that these aren’t resource intensive, but they may just save you from having to join to some crazy aggregation sub-queries and then export to Excel for further manipulation to get the same result.
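To make the idea concrete, here’s a minimal sketch of a windowed aggregate. It uses Python’s built-in sqlite3 module as a stand-in for Netezza or PostgreSQL (it assumes SQLite 3.25 or newer, which added window functions), and the `sales` table and its columns are made up for illustration.

```python
import sqlite3

# In-memory database with a hypothetical sales table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("ann", 10), ("ann", 30), ("bob", 5)],
)

# SUM(...) OVER (PARTITION BY ...) attaches each customer's total to
# every one of their rows, so no GROUP BY collapses the detail.
rows = conn.execute(
    """
    SELECT customer,
           amount,
           SUM(amount) OVER (PARTITION BY customer) AS customer_total
    FROM sales
    ORDER BY customer, amount
    """
).fetchall()

for row in rows:
    print(row)
```

Every row keeps its own `amount` and also carries the total for its customer, which is the "aggregate info on each record" described above.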
I did an interview a couple of weeks ago with Loyalty360 and now they’ve put out an article from that. It’s about Data Science and suchness. I guess I’m actually getting to be somebody… or something.
You can read the article here. It may require that you register but it should be free of charge.
A few days ago, I got in touch with the parent of a friend of mine from high school who is a college counselor with Kelleher Cohen Associates in the Boston area. Her job is to help high school students find colleges that fit their personality and academic needs, apply for financial aid, and complete the application process for the schools.
During the course of conversation, she asked about what I was doing for work. When I started describing marketing analytics to her, she got even more inquisitive. Turns out she is working with a female student who is very interested in math. It’s not the student’s best subject, but it’s the one that she looks forward to every day. As this student nears college age, she has expressed that she will likely not pursue math further. When I asked why the student would abandon a subject that she enjoys, the counselor said that, according to the student, “Pretty girls don’t do math.”
Imagine, if you will, being able to calculate the time between visits to a website, transactions in a store, logs from a punch-clock, etc. in just one step. Well, I have found the way!
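One common way to get the time between events in a single pass is the `LAG` window function. This excerpt doesn’t say which approach the post goes on to use, so treat the following as a sketch under that assumption. It runs on Python’s sqlite3 module (SQLite 3.25 or newer), and the `visits` table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (visitor TEXT, visit_date TEXT)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?)",
    [
        ("ann", "2024-01-01"),
        ("ann", "2024-01-04"),
        ("ann", "2024-01-10"),
        ("bob", "2024-02-01"),
    ],
)

# LAG() looks back one row within each visitor's date-ordered visits.
# julianday() is SQLite's date-to-number helper; other databases have
# their own date arithmetic. The first visit has no previous row, so
# its gap is NULL (None in Python).
gaps = conn.execute(
    """
    SELECT visitor,
           visit_date,
           CAST(
               julianday(visit_date)
               - julianday(LAG(visit_date) OVER (
                     PARTITION BY visitor ORDER BY visit_date))
               AS INTEGER
           ) AS days_since_prev
    FROM visits
    ORDER BY visitor, visit_date
    """
).fetchall()

for row in gaps:
    print(row)
```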
Fun and interesting error today. Here’s the actual error text:
Error 2: this form of correlated query is not supported – consider rewriting
I’d never heard of this “correlated query” business before so I had to look it up to sort out what was going on. Turns out that you can reference a table in the outside part of a query from within a subquery by calling the alias… Or, rather, you can’t in Netezza.
Tip of the day:
Check for alias references in your subquery and get rid of them
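Here’s a sketch of what that rewrite can look like. The first query reaches out to the outer alias `o` from inside the subquery; the second gets the same answer by joining to a pre-aggregated derived table instead. SQLite (via Python’s sqlite3 module) happily runs both, so this only shows the two forms are equivalent; it can’t reproduce the Netezza error, and exactly which correlated forms Netezza rejects isn’t covered here. The `orders` table is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (customer TEXT, order_date TEXT, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("ann", "2024-01-01", 10),
        ("ann", "2024-03-01", 25),
        ("bob", "2024-02-01", 7),
    ],
)

# Correlated form: the subquery references the outer alias "o".
correlated = conn.execute(
    """
    SELECT o.customer, o.amount
    FROM orders o
    WHERE o.order_date = (SELECT MAX(i.order_date)
                          FROM orders i
                          WHERE i.customer = o.customer)
    ORDER BY o.customer
    """
).fetchall()

# Rewritten form: aggregate once in a derived table, then join to it.
# Nothing inside the subquery refers to the outer query.
rewritten = conn.execute(
    """
    SELECT o.customer, o.amount
    FROM orders o
    JOIN (SELECT customer, MAX(order_date) AS last_date
          FROM orders
          GROUP BY customer) m
      ON m.customer = o.customer
     AND m.last_date = o.order_date
    ORDER BY o.customer
    """
).fetchall()

print(correlated)
print(rewritten)
```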
Most English-speakers are familiar with Murphy’s Law, which states:
Anything that can go wrong, will go wrong.
Realistically, this should read:
Anything that can go wrong, will go wrong… in the worst possible way at the most inconvenient time.
There are a lot of opportunities for Murphy’s Law to prove itself in the world of data analysis. Here are some of the ones that most often occur in my traipsing through the data.
Slicing data into manageable chunks for viewing is crucial when you start dealing with more records than will fit in something like Excel (without PowerPivot, of course). One of the most common ways to look at data in a more easily-digestible manner is to use percentiles, or some derivative thereof, to group records based on a ranking. This allows you to then compare equal-sized groups to one another in order to form conclusions as to relative behavior.
Most SQL variants have workarounds for how to accomplish this task that may or may not actually cover 100% of your data (may drop a few records here or there when trying to round a percentage to a whole number of records to pull). PostgreSQL, on the other hand, has a handy function built in for doing this sort of thing without having to worry about getting full coverage on your table.
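The excerpt doesn’t name the function, but `NTILE` is the usual window function for this job in PostgreSQL, so the sketch below assumes that’s the one meant. It runs on Python’s sqlite3 module (SQLite 3.25 or newer also provides `NTILE`), with a hypothetical `customers` table of ten rows split into quartiles.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, spend INTEGER)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(i, i * 10) for i in range(1, 11)],
)

# NTILE(4) ranks rows by spend and assigns each one a bucket from 1 to 4.
# Every row gets a bucket, so no records fall through rounding cracks.
ranked = conn.execute(
    """
    SELECT id,
           spend,
           NTILE(4) OVER (ORDER BY spend DESC) AS quartile
    FROM customers
    ORDER BY spend DESC
    """
).fetchall()

# Count how many rows landed in each bucket.
bucket_sizes = Counter(quartile for _, _, quartile in ranked)
print(sorted(bucket_sizes.items()))
```

Ten rows don’t divide evenly by four, so the buckets come out as close to equal as possible, with the larger buckets first, and all ten rows are accounted for.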
There is a whole set of fields in the databases I’m using here that are tilde-delimited (~) varchar strings with a mess of key-value pairs, the values from which I really need. Unfortunately, since the values vary in length and the pairs appear in no set order within the field, it is impossible to substring your way efficiently through them. Thankfully, there is a RegEx genius on my team who produced a handy chunk of code that pgSQL can easily recognize, parse and process for pulling precisely what I need.
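The actual regex from my teammate isn’t reproduced in this excerpt, so here’s an illustrative sketch of the general idea in Python. It assumes each pair looks like `key=value` (the `=` separator is my assumption), and the `get_value` helper and sample string are hypothetical.

```python
import re
from typing import Optional


def get_value(field: str, key: str) -> Optional[str]:
    """Return the value for `key` in a tilde-delimited key=value string."""
    # Anchor the key to the start of the string or a preceding tilde so
    # that "ku" does not accidentally match inside "sku". The value is
    # everything up to the next tilde (or the end of the string).
    pattern = r"(?:^|~)" + re.escape(key) + r"=([^~]*)"
    match = re.search(pattern, field)
    return match.group(1) if match else None


raw = "color=red~size=XL~sku=12345"
print(get_value(raw, "size"))
print(get_value(raw, "sku"))
print(get_value(raw, "weight"))
```

Because the pattern keys off the name rather than a position, it doesn’t matter where in the string a pair sits or how long its value is.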
Fun new conundrum in dealing with data formats/displays in PostgreSQL! A client recently requested that we provide a data extract that included percentages as a three-character number, left zero-padded, rather than a decimal/numeric. There’s not a great way in Postgres to show a numeric with leading zeros (actually, I’ve yet to find a data type that does this consistently as a built-in to any platform). Instead, you have to do a little bit of work to get to your end result as a character (or text) field.
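The logic boils down to: scale, round to a whole number, convert to text, then left-pad with zeros. Here’s a minimal Python sketch of those steps rather than the Postgres solution itself. It assumes the source value is a fraction (0.07 meaning 7%) and that halves round up; both are my assumptions, and `pct_to_padded` is a hypothetical helper.

```python
from decimal import Decimal, ROUND_HALF_UP


def pct_to_padded(fraction: str) -> str:
    """Turn a fraction such as "0.07" into a zero-padded percent, "007"."""
    # Decimal avoids binary float surprises; pass the value as a string.
    percent = (Decimal(fraction) * 100).quantize(
        Decimal("1"), rounding=ROUND_HALF_UP
    )
    # The result is text, not a number, which is the point: numeric
    # types do not carry leading zeros.
    return str(percent).zfill(3)


for value in ["0.07", "0.255", "1"]:
    print(value, pct_to_padded(value))
```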