A few days ago, I got in touch with the parent of a high school friend of mine; she is a college counselor with Kelleher Cohen Associates in the Boston area. Her job is to help high school students find colleges that fit their personality and academic needs, apply for financial aid, and complete the application process for those schools.
During the course of conversation, she asked what I was doing for work. When I started describing marketing analytics to her, she got even more inquisitive. It turns out she is working with a female student who is very interested in math. It’s not the student’s best subject, but it is the one that she looks forward to every day. As this student nears college age, she has said that she will likely not pursue math further. When I asked why she would abandon a subject that she enjoys, the counselor said that, according to the student, “Pretty girls don’t do math.”
Imagine, if you will, being able to calculate the time between visits to a website, transactions in a store, logs from a punch-clock, etc. in just one step. Well, I have found the way!
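The excerpt doesn't show the method itself, so here is only a minimal sketch of one way this is commonly done in a single query, assuming a database with window functions (PostgreSQL syntax shown). The `visits` table and its columns are hypothetical.

```sql
-- Hypothetical table: visits(customer_id integer, visit_ts timestamp)
SELECT
    customer_id,
    visit_ts,
    -- Subtract the previous visit for the same customer;
    -- the result is NULL on each customer's first visit
    visit_ts - LAG(visit_ts) OVER (
        PARTITION BY customer_id
        ORDER BY visit_ts
    ) AS time_since_last_visit
FROM visits
ORDER BY customer_id, visit_ts;
```

The same pattern should work for transactions or punch-clock logs by swapping in the relevant key and timestamp columns.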
Beta… you know… for correlation?
Fun and interesting error today. Here’s the actual error text:
Error 2: this form of correlated query is not supported – consider rewriting
I’d never heard of this “correlated query” business before so I had to look it up to sort out what was going on. Turns out that you can reference a table in the outside part of a query from within a subquery by calling the alias… Or, rather, you can’t in Netezza.
Tip of the day:
Check for alias references in your subquery and get rid of them
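To illustrate what an alias reference inside a subquery looks like, and one way to get rid of it, here is a rough sketch. The `customers` and `orders` tables are hypothetical, and I don't know which exact query triggered the error above, so the first form is only an example of a correlated reference, not necessarily one that Netezza rejects.

```sql
-- Hypothetical tables: customers(customer_id), orders(customer_id, order_total)

-- Correlated form: the subquery refers to the outer alias "c"
SELECT c.customer_id,
       (SELECT MAX(o.order_total)
        FROM orders o
        WHERE o.customer_id = c.customer_id) AS largest_order
FROM customers c;

-- Uncorrelated rewrite: aggregate once in a derived table, then join
SELECT c.customer_id,
       agg.largest_order
FROM customers c
LEFT JOIN (
    SELECT customer_id, MAX(order_total) AS largest_order
    FROM orders
    GROUP BY customer_id
) agg ON agg.customer_id = c.customer_id;
```

The LEFT JOIN keeps customers with no orders, so both versions return NULL for them.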
Damn you, Murphy! … Wrong Murphy? Damn you, anyway!
Most English-speakers are familiar with Murphy’s Law, which states:
Anything that can go wrong, will go wrong.
Realistically, this should read:
Anything that can go wrong, will go wrong… in the worst possible way at the most inconvenient time.
There are a lot of opportunities for Murphy’s Law to prove itself in the world of data analysis. Here are some of the ones that most often occur in my traipsing through the data.
Slicing data into manageable chunks for viewing is crucial when you start dealing with more records than will fit in something like Excel (without PowerPivot, of course). One of the most common ways to look at data in a more easily-digestible manner is to use percentiles, or some derivative thereof, to group records based on a ranking. This allows you to then compare equal-sized groups to one another in order to form conclusions as to relative behavior.
Most SQL variants have workarounds for this task that may or may not actually cover 100% of your data (they may drop a few records here or there when rounding a percentage to a whole number of records to pull). PostgreSQL, on the other hand, has a handy built-in function for doing this sort of thing without having to worry about getting full coverage on your table.
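The excerpt doesn't name the function, so treat this as an assumption: the usual PostgreSQL tool for this kind of grouping is the `ntile` window function, which assigns every row to one of N buckets whose sizes differ by at most one row. A minimal sketch with a hypothetical `customer_spend` table:

```sql
-- Hypothetical table: customer_spend(customer_id, total_spend)
SELECT
    customer_id,
    total_spend,
    -- 10 buckets = deciles; bucket 1 holds the highest spenders
    NTILE(10) OVER (ORDER BY total_spend DESC) AS spend_decile
FROM customer_spend;
```

Changing the argument to 4 or 100 would give quartiles or percentiles in the same way.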
There is a whole set of fields in the databases I’m using here that are tilde-delimited (~) varchar strings with a mess of key-value pairs, the values from which I really need. Unfortunately, since the values vary in length and the pairs appear in no particular order within the field, it is impossible to substring your way efficiently through them. Thankfully, there is a RegEx genius on my team who produced a handy chunk of code that pgSQL can easily recognize, parse and process for pulling precisely what I need.
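My teammate's actual code isn't shown in this excerpt, so the following is just a sketch of the general idea using PostgreSQL's regex form of `substring`. The `events` table, the `attrs` column, the `size` key, and the `=` separator between key and value are all assumptions for illustration.

```sql
-- Hypothetical table: events(event_id, attrs)
-- where attrs looks like 'color=red~size=XL~sku=12345'
SELECT
    event_id,
    -- Match "size=" at the start of the string or right after a tilde,
    -- then capture everything up to the next tilde.
    -- substring() returns the first capturing group, or NULL if no match.
    substring(attrs from '(?:^|~)size=([^~]*)') AS size_value
FROM events;
```

Anchoring on `(?:^|~)` keeps a key like `size` from matching inside a longer key such as `shoesize`.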
Fun new conundrum in dealing with data formats/displays in PostgreSQL! A client recently requested that we provide a data extract that included percentages as a three-character number, left zero-padded, rather than a decimal/numeric. There’s not a great way in Postgres to show a numeric with leading zeros (actually, I’ve yet to find a data type that does this consistently as a built-in to any platform). Instead, you have to do a little bit of work to get to your end result as a character (or text) field.
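Two ways this conversion could be sketched in PostgreSQL, assuming the percentage is stored as a fraction between 0 and 1 (that storage format is my assumption, as are the sample values):

```sql
SELECT
    pct,
    -- 'FM' suppresses the leading space to_char reserves for the sign
    to_char(round(pct * 100), 'FM000')   AS pct_padded,
    -- Alternative: cast to text, then left-pad with zeros to 3 characters
    lpad(round(pct * 100)::text, 3, '0') AS pct_padded_alt
FROM (VALUES (0.07), (0.5), (1.0)) AS t(pct);
```

For these sample values, both columns should come back as `007`, `050`, and `100`.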
Recently, in speaking with the President of Kobie regarding my team, I used the term “data munging” to describe a lot of the work that we do. He laughed, thinking I had said “data munching” (mmmm, tasty!) and asked if that was a technical term. The short answer is that yes, it is another term for data wrangling (which, incidentally, is one of my favorite terms in the industry).
In one of the PostgreSQL systems we work with often, there is only one street address field. To get around the need for a change to the table structure, our IT folks simply use a new line character (\n) to denote that there is a second line of address information. The problem becomes, when we go to generate a mailing list, how can we export a list to send to the mailhouse with all address information on one line?
The answer is to use the SPLIT_PART function. See the example below:
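A minimal sketch of the SPLIT_PART approach, assuming the field holds a real newline character (written `E'\n'` in PostgreSQL) and not a literal backslash followed by "n". The `customers` table and column names are hypothetical.

```sql
-- Hypothetical table: customers(customer_id, street_address)
SELECT
    customer_id,
    split_part(street_address, E'\n', 1) AS address_line_1,
    -- Returns an empty string when there is no second line
    split_part(street_address, E'\n', 2) AS address_line_2,
    -- Single-line version for the mailhouse; trim() drops the
    -- trailing space left behind when there is no second line
    trim(split_part(street_address, E'\n', 1) || ' ' ||
         split_part(street_address, E'\n', 2)) AS address_one_line
FROM customers;
```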
Hit a rather irksome problem recently in developing a WordPress website on the StudioPress Genesis framework. I could not find anywhere to change the default sort order (Newest to Oldest by time of publication). Nothing in the functions. Nothing I could find in the lib settings. Nothing in the theme options (which I believe should be changed).
Finally, after hearing back from the developers and having them point me to a nice, clean function for supposedly resolving this, I gave it a try. It had the unfortunate side effect of knocking out my static homepage because it went into the loop, thought the homepage had a loop, and decided to simply sort all the posts in the database into that page. At least it showed them in the specified order.
Thankfully, I came up with a quick way to resolve this pesky problem. All you have to do is wrap the function with a check to determine if it is a page. Here’s the code: