Date Intervals in HiveQL

Standard

black-beehive-wigThis picture is definitely not me. But as this is my first post regarding Hive, I felt the need to include a photo of a ridiculous beehive hairdo.

As is the case with many other Data Scientists, I am being pulled increasingly into the world of Hadoop and all the technologies associated with it. Lately this has meant trying to sort out how to do certain functions in HiveQL that I’ve grown familiar and comfortable with in various types of SQL.

Today’s conundrum was trying to determine if someone is at least 18 years old based on their birthday. Normally, I would use one of the following:

date_of_birth <= current_date - interval('18 years') -- check for True condition OR extract(years from age(current_date,date_of_birth)) --check for >= 18

There are a few functions that can be used here: DATE_ADD, DATE_SUB, MONTHS_BETWEEN, or ADD_MONTHS.

The DATE_ADD and DATE_SUB are roughly synonymous except that one adds days and the other subtracts them. I suppose that you could add a negative number of days though, if you wanted to just learn one of the two. That might look something like this:

date_of_birth <= date_add(current_date,(-18*365))

The number of days being added is calculated using 365 as a nice round number for dealing with years. However, it does not take into account that there could a few leap years in the mix. The DATE_ADD function does not allow decimals in the number of days to add so it cannot be used as -18*365.25 or something. This is not my preferred method. I like more precision.

Next up is the ADD_MONTHS function, which is sort of like using the interval except that you have to calculate the interval based on months rather than years.

date_of_birth <= add_months(current_date,(-18*12))

I prefer this method because it accounts for the leap year stuff by ignoring the actual number of days and just changing the months.

Similarly, the MONTHS_BETWEEN could be used along with division to get to the number of years.

months_between(current_date,date_of_birth)/12.0 --check for >= 18

Epic Epoch Time in Netezza and PostgreSQL

Standard

EpochTimeBeganEpoch time is both a blessing and a curse. It is super-convenient for counting seconds (and doing calculations based on them) but can also be a pain to try to get into something readable as, or comparable to, a recognizable date. So today we’ll get into and out of epoch to show its flexibility without our brains having to be contortionists too. Continue reading

Calculating Months Between Two Dates in PostgreSQL

Standard

Calendar Page EOMOne of the most basic calculations on dates is to tell the time elapsed between two dates. Often it is more helpful to show the date as a number of months rather than a number of days. In PostgreSQL, subtracting one date from another will yield a number of days that then takes a tremendously complex formula to manipulate into months.

The best way I have found to get around this is to use the built in AGE function. The age function calculates the difference between two dates and returns an interval. This may not seem like we’ve gotten very far in calculating the number of months between the two dates, but stick with me on this one. You can then EXTRACT the pieces of the interval needed to calculate months and use them in a simple equation.

EXTRACT(year FROM age(end_date,start_date))*12 + EXTRACT(month FROM age(end_date,start_date))

Continue reading