A few days ago, I got in touch with a college counselor at Kelleher Cohen Associates in the Boston area, who is also the parent of a friend of mine from high school. Her job is to help high school students find colleges that fit their personality and academic needs, apply for financial aid, and complete the application process.
During our conversation, she asked what I was doing for work. When I started describing marketing analytics to her, she got even more inquisitive. It turns out she is working with a female student who is very interested in math. It’s not the student’s best subject, but it is the one she looks forward to every day. As this student nears college age, however, she has said that she will likely not pursue math further. When I asked why she would abandon a subject she enjoys, the counselor told me that, according to the student, “Pretty girls don’t do math.”
Slicing data into manageable chunks for viewing is crucial once you start dealing with more records than will fit in something like Excel (without PowerPivot, of course). One of the most common ways to make data more easily digestible is to use percentiles, or a derivative such as deciles or quartiles, to group records based on a ranking. You can then compare these equal-sized groups with one another and draw conclusions about relative behavior.
Most SQL variants rely on workarounds for this task that may not actually cover 100% of your data: they can drop a few records here or there when a percentage is rounded to a whole number of records to pull. PostgreSQL, on the other hand, has a handy built-in function for this sort of thing, so you don’t have to worry about getting full coverage of your table.
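To make the coverage problem concrete, here is a minimal Python sketch, not the post’s own SQL. It assumes the PostgreSQL function in question is the `ntile()` window function, which divides rows into a given number of buckets as equally as possible. The helper names `naive_percent_buckets` and `ntile` are hypothetical and exist only for this illustration.

```python
"""Compare a naive percentage-based split with an ntile-style split.

Hypothetical sketch: `naive_percent_buckets` and `ntile` are illustrative
helpers, not a library API.
"""
from typing import List, Sequence


def naive_percent_buckets(rows: Sequence[int], n: int) -> List[List[int]]:
    """Pull round(len/n) rows per bucket, like a percentage-based workaround.

    Any rows left over after n buckets are silently dropped.
    """
    size = round(len(rows) / n)
    return [list(rows[i * size:(i + 1) * size]) for i in range(n)]


def ntile(rows: Sequence[int], n: int) -> List[List[int]]:
    """Split rows into n buckets as evenly as possible.

    Bucket sizes differ by at most one; the first (len % n) buckets each get
    one extra row, so every row is assigned to some bucket.
    """
    base, extra = divmod(len(rows), n)
    buckets, start = [], 0
    for i in range(n):
        size = base + (1 if i < extra else 0)
        buckets.append(list(rows[start:start + size]))
        start += size
    return buckets


if __name__ == "__main__":
    ranked = sorted(range(1, 24), reverse=True)  # 23 records, best score first
    naive = naive_percent_buckets(ranked, 10)
    tiles = ntile(ranked, 10)
    print(sum(len(b) for b in naive))  # 20: three records were dropped
    print(sum(len(b) for b in tiles))  # 23: every record is covered
    print([len(b) for b in tiles])     # [3, 3, 3, 2, 2, 2, 2, 2, 2, 2]
```

With 23 records and 10 deciles, rounding 2.3 down to 2 records per group leaves three records out, while the even split keeps all 23 by giving the first three groups one extra record each.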
Recently, in speaking with the President of Kobie about my team, I used the term “data munging” to describe a lot of the work that we do. He laughed, thinking I had said “data munching” (mmmm, tasty!), and asked if that was a technical term. The short answer is yes: it is another term for data wrangling (which, incidentally, is one of my favorite terms in the industry).
This one has been bugging me for a while now. In a lot of analyses, it is useful to select multiple random groups. Usually, this involves picking a bunch of numbers out of your head and trying them as the seed values (I like using phone numbers without the area codes; then I can call the person and tell them they rocked my randomization).
But today, while struggling to pull a multitude of sample sets, I decided to come up with a more elegant way to generate random number seed values. Behold the random loop.
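The post’s own loop is not part of this excerpt, so the following is only a hypothetical Python sketch of the general idea: derive many seed values from a single master seed instead of picking them by hand. The names `MASTER_SEED`, `make_seeds`, and `draw_samples` are illustrative assumptions, not the author’s code.

```python
"""Hypothetical sketch: derive many reproducible seeds from one master seed.

`MASTER_SEED`, `make_seeds`, and `draw_samples` are illustrative names; the
original post's "random loop" is not reproduced here.
"""
import random
from typing import Dict, List, Sequence

MASTER_SEED = 20240101  # assumption: any fixed integer works here


def make_seeds(master_seed: int, count: int) -> List[int]:
    """Generate `count` distinct seed values from a single master seed."""
    rng = random.Random(master_seed)
    # sample() draws without replacement, so the seeds are guaranteed distinct
    return rng.sample(range(1, 2**31 - 1), count)


def draw_samples(population: Sequence[int], sample_size: int,
                 seeds: Sequence[int]) -> Dict[int, List[int]]:
    """Draw one random sample per seed; each is reproducible from its seed."""
    return {seed: random.Random(seed).sample(list(population), sample_size)
            for seed in seeds}


if __name__ == "__main__":
    customers = range(1, 1001)  # stand-in for 1,000 customer IDs
    seeds = make_seeds(MASTER_SEED, 5)
    for seed, ids in draw_samples(customers, 50, seeds).items():
        print(seed, ids[:5])
```

Keeping one seed per sample set means any single group can be regenerated on its own later, while the master seed alone is enough to reproduce the whole batch.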