Epoch time is both a blessing and a curse. It is super-convenient for counting seconds (and doing calculations based on them) but can also be a pain to try to get into something readable as, or comparable to, a recognizable date. So today we’ll get into and out of epoch to show its flexibility without our brains having to be contortionists too.
I’ve recently come to love (read:be obsessed with) windowing functions in my coding. They’re just so useful and practical.
For those who haven’t experienced the joys of windowing, here’s the deal. They allow you to do calculations across multiple rows without actually having to group, thereby storing aggregate info on each record. That means you keep all the data associated with the row and can add calculated fields that rely on interaction with other rows. Pretty swiffy, huh?
Below are just a few funky functions that I’ve found helpful. I’m not saying that these aren’t resource intensive, but they may just save you from having to join to some crazy aggregation sub-queries and then export to Excel for further manipulation to get the same result.
Imagine, if you will, begin able to calculate the time between visits to a website, transactions in a store, logs from a punch-clock, etc. in just one step. Well, I have found the way!
Slicing data into manageable chunks for viewing is crucial when you start dealing with more records than will fit in something like Excel (without PowerPivot, of course). One of the most common ways to look at data in a more easily-digestible manner is to use percentiles, or some derivative thereof, to group records based on a ranking. This allows you to then compare equal-sized groups to one another in order to form conclusions as to relative behavior.
Most SQL variants have workarounds for how to accomplish this task that may or may not actually cover 100% of your data (may drop a few records here or there when trying to round a percentage to a whole number of records to pull). PostgreSQL, on the other hand, has a handy function built in for doing this sort of thing without having to worry about getting full coverage on your table.
There are a whole set of fields in the databases I’m using here that are tilde-delimited (~) varchar strings with a mess of key-value pairs, the values from which I really need. Unfortunately, since they are varying character lengths, in no particular set order within that field, it is impossible to substring your way efficiently through them. Thankfully, there is a RegEx genius on my team who produced a handy chuck of code that pgSQL can easily recognize, parse and process for pulling precisely what I need.
Fun new conundrum in dealing with data formats/displays in PostgreSQL! A client recently requested that we provide a data extract that included percentages as a three character number, left zero-padded, rather than a decimal/numeric. There’s not a great way in Postgres to show a numeric with leading zeros (actually, I’ve yet to find a data type that does this consistently as a built-in to any platform). Instead, you have to do a little bit of work to get to your end result as a character (or text) field.
In one of the PostgreSQL systems we work with often, there is only one street address field. To get around the need for a change to the table structure, our IT folks simply use a new line character (\n) to denote that there is a second line of address information. The problem becomes, when we go to generate a mailing list, how can we export a list to send to the mailhouse with all address information on one line?
The answer is to use the SPLIT_PART function. See the example below:
If you’ve worked with datetime formats at all in PostgreSQL (or any other SQL version) before, you’ve probably dealt with date_part (or extract). In pgSQL, this will pull out the numeric value of the part of the date that you have specified. For instance:
/* Pull the Month from the timestamp without time zone */ date_part('month','2012-03-13 12:45:22') /*Yields*/ 3
Another common one if you’re trying to eliminate the time portion is to truncate the date using date_trunc.
/* Truncate date so all timestamps map to midnight */ date_trunc('day','2012-03-13 12:45:22') /* Yields */ '2012-03-13 00:00:00'
Earlier today, I went looking for a similar construct to extract and isolate only the time portion and ran across a rather cool little feature. Seems there is a built-in function housed in the pg_catalog schema that allows for very quick parsing. It’s called, unsurprisingly, time.
Extracting Time Part from Timestamp Example
select pg_catalog.time('2012-03-13 12:45:22'); /* Yields */ '12:45:22'
It may not look like much code (which, to be fair, I greatly appreciated), but this works like a charm. You can then use the time syntax for any grouping or editing you want to do. For instance, this was being used to chunk up the day into meal periods for a client analysis so we could state that transaction_time between ’12:00:00′ and ’15:00:00′ is lunch with a simple case statement.
After finding this awesome little tidbit, I did some more research on this mysterious pg_catalog schema and it turns out that there are TONS of functions hidden away in there (including one simply called ‘date’ that may suit better than date_trunc if you want only the date portion). Found a pretty comprehensive schema of pg_catalog here and will be testing out the functions as needed.
Do you have a better way of handling different components of dates and timestamps? I’d love to hear about it. Please leave a comment below.
One of the most basic calculations on dates is to tell the time elapsed between two dates. Often it is more helpful to show the date as a number of months rather than a number of days. In PostgreSQL, subtracting one date from another will yield a number of days that then takes a tremendously complex formula to manipulate into months.
The best way I have found to get around this is to use the built in AGE function. The age function calculates the difference between two dates and returns an interval. This may not seem like we’ve gotten very far in calculating the number of months between the two dates, but stick with me on this one. You can then EXTRACT the pieces of the interval needed to calculate months and use them in a simple equation.
EXTRACT(year FROM age(end_date,start_date))*12 + EXTRACT(month FROM age(end_date,start_date))