Randomizing Random Selection

This one had been bugging me for a while now. There are a lot of analyses where it is useful to select multiple random groups. Usually, this would involve picking a bunch of numbers out of your head and trying them as the seed values (I like using phone numbers without the area codes; then I can call the person and tell them they rocked my randomization).

But today, as I struggled with pulling a multitude of sample sets, I decided to come up with a more elegant solution for generating random number seed values. Behold the random loop:

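%macro loopdiloop;
  %do i = 1 %to 250;

    /* RAND returns a decimal between 0 and 1; scale and round it to an integer seed */
    %let seedno = %sysfunc(round(%sysfunc(rand(UNIFORM))*1000000,1));

    %put &seedno;

    /* Stratified sample (systematic selection within each stratum);
       tochoose supplies the sample sizes */
    proc surveyselect data=basepopulation noprint
      out = outputset
      method = sys
      seed = &seedno
      sampsize = tochoose;
      strata stratify;
    run;

    /* Optional: log the seed and the iteration number */
    %put &seedno;
    %put &i;

    /*Processing things here*/

    /* Store the seed and the key metrics for this trial */
    proc sql;
      insert into trials
      select &seedno as seedno, stuff as outputvar, things as othervar
      from outputset;
    quit;
  %end;
%mend;

%loopdiloop;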
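
One thing the loop takes for granted is that the trials table already exists before the first INSERT runs. A minimal sketch of setting it up, assuming all three columns are numeric (the post does not say what types stuff and things are, so adjust to match your own metrics):

```sas
/* Create an empty results table before calling %loopdiloop.          */
/* Assumption: all three columns are numeric; change the types if     */
/* your metrics are character values.                                 */
proc sql;
  create table trials
    (seedno    num,
     outputvar num,
     othervar  num);
quit;
```

Because INSERT ... SELECT fills columns by position, the column order here should match the order of the SELECT list in the loop.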

Here’s the Step-by-Step:

  1. Declare your MACRO and name it.
  2. Set your DO loop for as many iterations as you want. In this example, it is set to 250.
  3. Set your seed value to a random calculation. Note that, because RAND outputs a decimal between 0 and 1, you need to multiply it by something large and round it to an integer.
  4. Use the SURVEYSELECT procedure to select samples. Here, I have chosen to select a stratified random sample using the STRATA statement.
  5. Display the current seed value and iteration using %PUT statements. This is totally optional, but I find it nice to see where I am.
  6. Do whatever processing you need to do to the sample.
  7. APPEND or INSERT a summary of information into a table for later viewing. The “Trials” table here contains the seed value, so you can look it up later if you need to reprocess, check anything, or store out the sample it selected. It also holds the key metrics needed, here called stuff and things.
  8. END the looping iterations and the macro! (Exclamation mark because it’s that important)
  9. Call the macro using its name.
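
If you want the whole run to be repeatable, one variation on step 3 is to build all the seeds up front from a single master seed and look them up by iteration. This is only a sketch: the seeds table, the trial column, and the master seed value are all made up for illustration and are not part of the original loop.

```sas
/* Generate 250 seeds from one master seed so the full set can be rebuilt. */
/* The master seed below is an arbitrary, hypothetical value.              */
data seeds;
  call streaminit(8675309);
  do trial = 1 to 250;
    /* RAND('UNIFORM') is strictly between 0 and 1, so CEIL gives 1..1000000 */
    seedno = ceil(rand('UNIFORM') * 1000000);
    output;
  end;
run;

/* Look up the seed for one iteration and store it in a macro variable.  */
/* Inside the %do loop, the literal 1 would be replaced with &i.         */
proc sql noprint;
  select seedno into :seedno trimmed
  from seeds
  where trial = 1;
quit;

%put &seedno;
```

Using CEIL instead of ROUND also keeps the seed from ever landing on 0, which SURVEYSELECT treats as a request to seed from the clock.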

Caveats

  • In this example, there are a number of elements that were previously processed and referenced in the code. For instance:
    • tochoose was a set of numbers sampled from a control population so that the strata could be applied appropriately.
    • The STRATA were already calculated and a variable created in both the test and control data sets on which to stratify the selection.
  • When using the %LET statement, you have to wrap each function call (here, both RAND and ROUND) in %SYSFUNC in order to get it to evaluate. If you do not, the macro variable will hold the text of what was meant to be a function call, which is not helpful as a random value.
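
To see the %SYSFUNC caveat in action, here is a small sketch you can run on its own. The macro variable names literal and draw are just for illustration.

```sas
/* Without %SYSFUNC, the macro variable simply holds the text. */
%let literal = rand(UNIFORM)*1000000;
%put literal=&literal;   /* the log shows the unevaluated text */

/* With %SYSFUNC, RAND actually runs and returns a decimal between 0 and 1. */
%let draw = %sysfunc(rand(UNIFORM));
%put draw=&draw;

/* A second %SYSFUNC call scales and rounds that value to an integer seed. */
%let seedno = %sysfunc(round(&draw*1000000, 1));
%put seedno=&seedno;
```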

2 thoughts on “Randomizing Random Selection”

  1. Joe

    Hi Lexy,

    I think your site is great!

    Curious…how would you pull a repeatable random sample of data from Netezza using a SAS passthrough wrapper? Setseed?

    • admin

      I have tried several permutations of using setseed and none seem to generate consistent resulting values from randomization. If you can append a randomized value to the data stored in Netezza, that might be your best bet. That way the random token remains on the database. Sorry I can’t be of more help on this. I’ll let you know if I come up with a repeatable method.
