Calculating Percentiles in PostgreSQL

Standard

Slicing data into manageable chunks for viewing is crucial when you start dealing with more records than will fit in something like Excel (without PowerPivot, of course). One of the most common ways to look at data in a more easily-digestible manner is to use percentiles, or some derivative thereof, to group records based on a ranking. This allows you to then compare equal-sized groups to one another in order to form conclusions as to relative behavior.

Most SQL variants have workarounds for how to accomplish this task that may or may not actually cover 100% of your data (may drop a few records here or there when trying to round a percentage to a whole number of records to pull). PostgreSQL, on the other hand, has a handy function built in for doing this sort of thing without having to worry about getting full coverage on your table.

-- Select percentiles within member_type
ntile(100) OVER(PARTITION BY member_type ORDER BY member_type, netsales_past12mo DESC) as percentile

-- Select deciles across full table
ntile(10) OVER(ORDER BY netsales_past12mo DESC) as decile

Here’s the Step-by-Step:

Percentiles within Member Type Example

  1. Use the NTILE function to split the data into 100 roughly even sized groups
  2. Specify the PARTITION that will specify that the percentiles be calculated within each distinct member type
  3. Specify the ORDER BY to rank the customers within each partition – this part is REALLY important if you want the ntile to be meaningful
  4. This is coded as though being selected into a new variable, hence the field alias

Deciles across the Full Table Example

  1. Use the NTILE function to split the data into 10 roughly even sized groups
  2. Specify the ORDER BY to rank the customers within each partition – this part is REALLY important if you want the ntile to be meaningful
  3. This is coded as though being selected into a new variable, hence the field alias

Caveats:

  • You’ll note that the main difference between the two examples above is the use of the partition. If you leave out the partition, the entire source will sort as one list. Partitions are used to window data so that the system kinda’ treats them as mini-lists and you can do some other really cool stuff with that. More on windowing in another post soon.

4 thoughts on “Calculating Percentiles in PostgreSQL

    • admin

      Thanks for the catch! I’ve updated it everywhere I could find it. This is a particularly interesting choice of capitalization given that PostgreSQL stands for “Post-INGRES SQL” where INGRES is an acronym. So, grammatically, it should be Post-GRES SQL. But I’ll give it to them since it’s now a proper name. Thanks for reading the post with such an eye for detail. 😉

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.