# Binomial Distribution

The Binomial Distribution, also called the Binomial Probability Distribution, has four properties:

1) A fixed number of observations, n, in the sample. So let's say 10 of my trainees from the past year.

2) Each observation can only fall into one of two mutually exclusive and collectively exhaustive categories. So we can say the trainee graduated or was let go. Those options exclude each other, and they are the only two options.

3) The probability of an observation falling into the category we're interested in is constant from observation to observation. So everyone has, overall, the same chance of graduating or not graduating.

4) All observations are independent from all other observations.
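When all four properties hold, the probability of seeing exactly k observations in the category of interest is `P(X = k) = C(n, k) * p^k * (1 - p)^(n - k)`, where `C(n, k)` counts the ways to choose k of the n observations. Here is a minimal Python sketch of that formula using the trainee example. The graduation probability of 0.8 is a made-up number for illustration, and `binomial_pmf` is just a helper name I picked, not something from a library.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials,
    each with the same success probability p."""
    if not 0 <= k <= n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n = 10   # fixed number of observations (property 1)
p = 0.8  # hypothetical graduation probability, assumed constant (property 3)

# Probability that exactly 8 of the 10 sampled trainees graduated
print(f"P(X = 8) = {binomial_pmf(8, n, p):.4f}")

# The probabilities across every possible outcome (0 through n) add up to 1
total = sum(binomial_pmf(k, n, p) for k in range(n + 1))
print(f"Sum over k = 0..{n}: {total:.4f}")
```

With these assumed numbers, exactly 8 graduates out of 10 comes out at roughly a 30% chance. Swapping in a different `p` or `n` shows how quickly the shape of the distribution changes.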

#1 seems fairly easy. You have a population of data, and you decide what sample size you want to work with, whether for ease of polling, data gathering, or some other reason.

#2 also seems fairly easy. Hopefully even before you begin you know what it is you want to evaluate. It's the reason you started on the project.

#3 is challenging. Let's say we had a trainer issue in Oct-Dec and trainees were dropping out of the system like flies as a result. If I then take my entire sample from that time period, I could easily see a very high failure rate. I have to figure out a way to take my sample fairly across all situations to get a fair overview of what the entire year of training was like. So I have to be aware of issues like that and know how to account for them. Or let's say that, for some reason, younger trainees were 99% successful and elderly trainees were only 10% successful. I would want to make sure I sampled across age groups. If I only sampled people under the age of 25, I could get skewed results.
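To get a feel for how much a one-sided sample can skew things, here is a rough Python simulation of the age example. The 99% and 10% success rates come from the scenario above. Everything else is an assumption made up for the sketch: the 700/300 split between young and elderly trainees, the sample size of 50, and the helper name `graduation_rate`.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population of (age_group, graduated) records.
# Success rates follow the example; the 700/300 split is assumed.
population = (
    [("young", rng.random() < 0.99) for _ in range(700)]
    + [("elderly", rng.random() < 0.10) for _ in range(300)]
)


def graduation_rate(records):
    """Fraction of records marked as graduated."""
    return sum(graduated for _, graduated in records) / len(records)


# Skewed sample: drawn only from the young trainees
young_only = [record for record in population if record[0] == "young"]
biased_sample = rng.sample(young_only, 50)

# Simple random sample drawn across the whole population
fair_sample = rng.sample(population, 50)

print(f"Whole population:  {graduation_rate(population):.2f}")
print(f"Young-only sample: {graduation_rate(biased_sample):.2f}")
print(f"Random sample:     {graduation_rate(fair_sample):.2f}")
```

The young-only sample should land near 99%, far above the rate for the population as a whole, while the random sample should sit much closer to it. Drawing at random from everyone is what gives each observation the same overall chance of success.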

#4 seems relatively easy to me. This data is already done and set. Drawing an observation isn't going to affect any other observation. The data records don't know or care which one I chose. Now this *could* be different if I were in a live training room and pulled a physical human being out of the room to interrogate them, and they came back to the room shaking and in tears. The next person I pulled out would be affected by that first interaction :) But I don't think the fear factor happens to data records.
