# Type I Error vs Type II Error

A type I error is when you reject the null hypothesis even though it's true. Its probability, given that the null hypothesis really is true, is called alpha. Let's say your null hypothesis is that the average age of a Northeastern online student is 35 years old. You draw a sample of 10 students to work with, and somehow you manage to draw the only 10 students in the entire population who are 90 or older. Based on that sample, you reject your null hypothesis. However, you have made a type I error by rejecting it. The hypothesis was actually true, and you just happened to draw an odd set of sample observations.

A type II error is when you fail to reject the null hypothesis even though it's false. Its probability, given that the null hypothesis really is false, is called beta. (People often say "accept" here, but strictly speaking a test never proves the null hypothesis; it only fails to find enough evidence against it.) Using the same situation, let's say you somehow thought the average age of Northeastern online students was 90 years old. You then drew your sample of 10 students, and they all turned out to be 90 years old. Even though they were the only 90-year-old students in the entire system, you now have "confirmation" that your null hypothesis is true. You think your results are all set and match what you expected. You don't realize that the average age of the entire population is really 35 years old.
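To make alpha and beta concrete, here is a minimal simulation sketch. It rests on several assumptions that are not part of the example above: ages are normally distributed with a known standard deviation of 10, the test is a two-sided z-test at alpha = 0.05, and the false null hypothesis is an average age of 40 rather than 90 (a null of 90 would be rejected almost every time with honest random sampling, so there would be nothing to see). The function names are made up for illustration.

```python
import random
from statistics import NormalDist, mean


def rejects_null(sample, null_mean, sigma, alpha=0.05):
    """Two-sided z-test: True if this sample leads us to reject the null mean."""
    z = (mean(sample) - null_mean) / (sigma / len(sample) ** 0.5)
    critical = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    return abs(z) > critical


def rejection_rate(true_mean, null_mean, sigma=10.0, n=10, trials=20_000, seed=0):
    """Fraction of random samples of size n that reject the null hypothesis."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        if rejects_null(sample, null_mean, sigma):
            rejections += 1
    return rejections / trials


# Null is TRUE (the average really is 35): every rejection is a type I error.
type_1_rate = rejection_rate(true_mean=35, null_mean=35)

# Null is FALSE (we hypothesized 40, the average is really 35):
# every time we fail to reject, that is a type II error.
type_2_rate = 1 - rejection_rate(true_mean=35, null_mean=40)

print(f"type I rate (alpha): {type_1_rate:.3f}")
print(f"type II rate (beta): {type_2_rate:.3f}")
```

Under these assumptions the type I rate should land close to the chosen alpha of 0.05. The type II rate comes out much higher, because a sample of only 10 students has a hard time telling an average of 35 apart from an average of 40. Raising `n` in `rejection_rate` shrinks beta while alpha stays where you set it.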

A researcher's hope is to draw a sample large enough that it isn't made up of nothing but outliers, and of course to make sure the sample really is drawn randomly. For example, if they accidentally sorted their student list by birthdate and then took all the students at the bottom of the list, they could easily get this kind of result.
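The sorted-list mistake is easy to reproduce. The sketch below uses a made-up roster of 1,000 students with ages centered on 35 (the roster, the spread of ages, and the seed are all assumptions for illustration), and it sorts by age as a stand-in for sorting by birthdate.

```python
import random
from statistics import mean

rng = random.Random(42)

# Hypothetical roster: 1,000 student ages centered on 35, kept between 18 and 95.
ages = [min(95, max(18, round(rng.gauss(35, 12)))) for _ in range(1000)]

# Biased draw: sort the list, then take the 10 entries at the bottom.
sorted_ages = sorted(ages)
bottom_of_list = sorted_ages[-10:]

# Random draw: every student has the same chance of being picked.
random_sample = rng.sample(ages, 10)

population_mean = mean(ages)
biased_mean = mean(bottom_of_list)
random_mean = mean(random_sample)

print(f"population mean:     {population_mean:.1f}")
print(f"bottom-of-list mean: {biased_mean:.1f}")
print(f"random sample mean:  {random_mean:.1f}")
```

The bottom-of-list sample contains only the oldest students on the roster, so its mean sits far above the population mean, while the random sample stays much closer to it.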

It's critically important to make sure your sample really is as random as you can make it, so you don't accidentally create a pattern.

For example, let's say you are figuring out which age group goes to see a movie. You wait until the movie's over and decide to randomly grab 10 people from the exiting crowd to talk to. However, your technique is to wait until most of the people are gone, so it's quieter, and then grab some people to talk to in that quiet hallway. You figure this is still random. But because you're waiting until most are gone, you might now be selecting from the oldest people in the movie theater, because they are moving the slowest. So now your results might skew and say everyone who saw that movie was over 70.

In other words, you might not even realize that your "random" sampling was already skewed by something you were doing.
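Here is a rough sketch of the movie theater situation. Everything in it is an assumption made for illustration: an audience of 200 people with ages from 10 to 85, and an exit time that tends to grow with age but also has a good deal of random noise, so older viewers are only more likely, not certain, to leave last.

```python
import random
from statistics import mean

rng = random.Random(7)

# Hypothetical audience: 200 people with ages between 10 and 85.
audience_ages = [rng.randint(10, 85) for _ in range(200)]

# Assumed model: exit time rises with age, plus random noise per person.
exits = sorted((age * 0.5 + rng.gauss(0, 10), age) for age in audience_ages)

# "Wait until the hallway is quiet": interview the last 10 people out.
last_to_leave = [age for _, age in exits[-10:]]

# A properly random pick of 10 people from the whole audience.
random_ten = rng.sample(audience_ages, 10)

print(f"audience mean age:      {mean(audience_ages):.1f}")
print(f"last-to-leave mean age: {mean(last_to_leave):.1f}")
print(f"random ten mean age:    {mean(random_ten):.1f}")
```

Even though the link between age and exit time is noisy in this model, the last people out still skew noticeably older than the audience as a whole. The interviewer never chose anyone by age, yet the waiting strategy did it for them.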
