Random Sampling Methods

I love random sampling :)

Simple random samples are, as the name implies, a very basic way of drawing samples. Every item in the group has exactly the same chance of being drawn. If you have a bin of coins and you take one out at random, every coin has the same chance of being picked. If you have a lotto sphere full of balls and one comes out at random, every ball in the sphere had the same chance.
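Here is a minimal sketch of the coin-bin idea in Python, using the standard library's random module. The bin of 100 coins and the function name simple_random_sample are made up for illustration.

```python
import random

def simple_random_sample(population, k, seed=None):
    """Draw k items without replacement; every item has the same chance."""
    rng = random.Random(seed)  # a seed makes the draw repeatable
    return rng.sample(population, k)

# Hypothetical bin of 100 coins
coins = [f"coin_{i}" for i in range(1, 101)]
picked = simple_random_sample(coins, 5, seed=42)
print(picked)
```

Leave the seed off if you want a different draw every time.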

I use random numbers to create my iChing coin game results :)

http://www.bellaonline.com/code/iching/

With systematic samples, you break the big group into smaller blocks of equal size. Then you choose a random position in the first block and hop down the list by the block size each time. That gives you a set of items spread evenly throughout the whole list.

This makes the most sense when you look at things that have some order to them, like the editors in my BellaOnline system ordered by name. Let's say my figures end up grouping them into blocks of 30. I choose a random one in the first block of 30, let's say the 10th editor. Then I take the one 30 down from that - person 40. Then the editor 30 down from that - person 70. This gives me editors spread evenly throughout the group, starting from a random point.
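The editor example can be sketched in a few lines of Python. The list of 300 editor names is invented, and systematic_sample is just a name I picked for the helper.

```python
import random

def systematic_sample(items, interval, seed=None):
    """Pick a random start in the first block, then take every interval-th item."""
    rng = random.Random(seed)
    start = rng.randrange(interval)  # 0-based position inside the first block
    return items[start::interval]

# Hypothetical list of 300 editors, already ordered by name
editors = [f"editor_{i}" for i in range(1, 301)]
sample = systematic_sample(editors, 30, seed=1)
print(sample)
```

With 300 editors and blocks of 30, this always returns 10 editors, each one exactly 30 places after the previous one.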

The catch is that your items have to be in some sort of order for this to work, and that order can trip you up. If it's something fairly uniform, like plastic forks coming off an assembly line, then it's probably fine to grab every 100th one and check it for problems. However, let's say it's something where the order matters a fair bit. Let's say I take Boston police data and put it in order by time of the event, and then I set up my selection so it picks something every 24 hours. If the first pick lands at 2am, then every pick after that lands at 2am too. All of my samples could be drunk driving arrests, because that's what happens at 2am! Even though the rest of the day is quiet, my sample keeps hitting at the same point in the daily cycle, so it only ever sees the kind of event that is going on at that hour.
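To see the 2am problem in action, here is a small Python sketch. The hourly log and the event pattern are entirely made up; they are not real Boston police data.

```python
def event_for_hour(hour):
    """Made-up daily pattern, one event per hour."""
    if hour == 2:
        return "drunk driving arrest"
    if 6 <= hour < 9:
        return "area check"
    if 14 <= hour < 16:
        return "school bus watch"
    return "quiet"

# One week of hourly entries, ordered by time
log = [(day, hour, event_for_hour(hour)) for day in range(7) for hour in range(24)]

# Start at 2am and step by exactly one day: every pick is at 2am
every_24 = log[2::24]
print({event for _, _, event in every_24})

# Step by 25 hours instead: the picks drift through different hours
every_25 = log[2::25]
print([hour for _, hour, _ in every_25])
```

The 25-hour step is only there to show the contrast. It drifts away from 2am, but any fixed step can still line up with some other pattern hiding in the data.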

Next come stratified samples. These try to account for the above situation of "similar types of data". You divide your data INTO those groups. So you put all the drunk driving arrests together, and you put all the 'area check' entries together. Now you draw randomly from each of those groups. That way you know you get a little of each type of item that should be represented.

As an example with my BellaOnline editors, I could put them in categories based on how long they've been with us: under 6 months, 6-12 months, and so on. Then if I checked how well they're writing, I could compare how a brand new, actively-being-taught editor writes for her site vs how a long term editor does.
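Here is a rough Python sketch of the tenure idea. The editor records are invented, and lumping everyone past a year into a single "over 12 months" band is my simplification for the example.

```python
import random
from collections import defaultdict

def stratified_sample(items, stratum_of, per_stratum, seed=None):
    """Group items by stratum, then draw randomly inside each group."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in items:
        strata[stratum_of(item)].append(item)
    return {
        name: rng.sample(members, min(per_stratum, len(members)))
        for name, members in strata.items()
    }

def tenure_band(editor):
    """Hypothetical tenure bands, in months."""
    months = editor["months"]
    if months < 6:
        return "under 6 months"
    if months <= 12:
        return "6-12 months"
    return "over 12 months"

# Made-up editors with tenures from 0 to 35 months
editors = [{"name": f"editor_{i}", "months": i % 36} for i in range(180)]
picked = stratified_sample(editors, tenure_band, per_stratum=3, seed=0)
for band, chosen in picked.items():
    print(band, [e["name"] for e in chosen])
```

Every band gets the same number of picks here. You could instead draw in proportion to each band's size if you want the sample to mirror the makeup of the whole group.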

Cluster samples take on the task in a different way. Instead of grouping the data so each group has one special trait, a cluster sample groups the data so each group represents everything in the total population. So let's take the police example again. Rather than putting all the drunk driving arrests into a group - which is what the stratified sample would have done - you instead put a full day into a group. That way the group has a little of everything - it has the area checks in the early morning, the drunk driving at night, the schoolbus watch in the afternoon, and so on. You then pick one or more of these clusters at random and look at the items inside. The items in a chosen cluster will hopefully be about as representative as items drawn from the entire population, since the cluster is sort of a "smaller version" of the whole.
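One common way to do this is to pick whole clusters at random and keep everything inside them. Here is a minimal Python sketch along those lines, using a made-up 30-day hourly log where each day is one cluster. The names cluster_sample and log are just for illustration.

```python
import random
from collections import defaultdict

def cluster_sample(records, cluster_of, n_clusters, seed=None):
    """Group records into clusters, pick whole clusters at random, keep all their records."""
    rng = random.Random(seed)
    clusters = defaultdict(list)
    for record in records:
        clusters[cluster_of(record)].append(record)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {key: clusters[key] for key in chosen}

# Hypothetical log: 30 days with one entry per hour
log = [{"day": day, "hour": hour} for day in range(1, 31) for hour in range(24)]
picked = cluster_sample(log, lambda record: record["day"], n_clusters=3, seed=7)
for day, entries in picked.items():
    print("day", day, "has", len(entries), "entries")
```

Each chosen day comes back with all 24 of its hours, so the sample covers the whole daily cycle rather than one slice of it.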
