# Confidence Interval Estimate

A confidence interval estimate is a range of values calculated from a sample, together with the probability that the stated range encompasses a certain population parameter, such as the mean. The larger the sample size you work with, the narrower and more precise the range will be.

Example #1:
Say that you are dealing with the mean weight loss on a low carb diet after 3 months. The population is all low carb dieters. It would be extremely expensive to try to talk to every single low carb dieter around the world and find out how much weight they lost after 3 months. So instead you get a sample of 100 people who have been on a low carb diet for 3 months. The confidence interval estimate might state that, based on this sample, low carbers lost on average between 40 and 50 pounds in 3 months, and that we are 90% sure that the overall population mean is in this range.
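To make the arithmetic concrete, here is a minimal sketch of how such a range could be computed. The sample mean (45 pounds) and sample standard deviation (30.4 pounds) are made-up numbers, chosen only so that the result lands near the 40 to 50 pound range in the example. The function name `mean_confidence_interval` is also just illustrative, and the sketch assumes a normal approximation is reasonable for a sample of 100.

```python
from math import sqrt
from statistics import NormalDist


def mean_confidence_interval(sample_mean, sample_sd, n, confidence):
    """Return (low, high) for a normal-approximation interval around a mean."""
    # Two-sided multiplier: for 90% confidence this is about 1.645.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * sample_sd / sqrt(n)
    return sample_mean - margin, sample_mean + margin


# Hypothetical summary numbers for the 100 dieters (not real data).
low, high = mean_confidence_interval(
    sample_mean=45.0, sample_sd=30.4, n=100, confidence=0.90
)
print(f"90% interval: {low:.1f} to {high:.1f} pounds")
```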

What this means is that we could have chosen a group of 100 people who fairly mirror the overall group of low carbers. But there is also, of course, a chance that we happened to grab 100 people who, for whatever reason, are not following the low carb diet properly and are not losing a lot of weight. It could be that across the overall population people lose on average 80 pounds in 3 months, and the particular group of 100 people we chose as our sample happened to be all light-losers. That is what the confidence interval estimate is saying: we're 90% sure that we *do* have the overall population mean included in our range - but there is a chance that we don't.
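One way to see what "90% sure" means is to simulate it. The sketch below invents a population with a known mean, draws many samples of 100, builds a 90% interval from each, and counts how often the interval contains the true mean. The population values (mean 45, standard deviation 30), the normal shape, and the number of trials are all assumptions made for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev

TRUE_MEAN = 45.0   # hypothetical population mean (pounds)
TRUE_SD = 30.0     # hypothetical population standard deviation
SAMPLE_SIZE = 100
TRIALS = 2000
Z_90 = NormalDist().inv_cdf(0.95)  # about 1.645 for a two-sided 90% interval

rng = random.Random(42)  # fixed seed so the run is repeatable
hits = 0
for _ in range(TRIALS):
    sample = [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(SAMPLE_SIZE)]
    center = mean(sample)
    margin = Z_90 * stdev(sample) / sqrt(SAMPLE_SIZE)
    # Count the trial as a hit when the interval covers the true mean.
    if center - margin <= TRUE_MEAN <= center + margin:
        hits += 1

coverage = hits / TRIALS
print(f"{coverage:.1%} of the intervals contained the true mean")
```

Roughly nine out of ten intervals should catch the true mean. The remaining ones are the unlucky samples, like the group of light-losers described above.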

The larger our sample size, the more likely it is that we include enough variety in our sample to account for outliers and strange values and other issues, and average everything out to get a more accurate representation.

Example #2:
Let's say I tracked all police incidents in Boston for an entire year. I decide to look at all the OUI traffic stops, and let's say there were 1000 of them over the year. I don't feel like examining all 1000, so I pull out a sample of only 10. I try to figure out what time of day they're happening. In my sample of 10 I happen to hit a few at 2am and a few at noon, so I might decide that OUIs happen at all hours with equal frequency. However, let's say that I increase my sample size to 100 observations. I now realize that only 3 of the 100 are at noon, so those are rare anomalies. The other 97 entries are all at 2am. Now my range is far more accurate because I have a larger sample size. If I state that they tend to happen between 1am and 3am, I could have a much higher probability, maybe even 95%, that the overall population mean is in that range. The confidence interval estimate states what the range is for *this sample set*, and the probability that the range includes the mean of the overall population.

In the section above I talked about how a larger sample size causes the width of the confidence interval estimate to narrow, because more observations are being taken into account. When you look at the formula for the margin of error (a multiplier times the standard deviation, divided by the square root of the sample size N), you can see how that is represented mathematically, because the sample size N is in the denominator. Even though N is being square rooted, the denominator still grows as N grows, which means the overall number is shrinking.
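The formula itself is not reproduced in this text. Assuming it is the standard interval for a mean, which matches the description of N under a square root in the denominator and the standard deviation in the numerator, it would read:

```latex
\bar{x} \pm z \cdot \frac{\sigma}{\sqrt{N}}
```

Here $\bar{x}$ is the sample mean, $\sigma$ is the standard deviation, $N$ is the sample size, and $z$ is the multiplier for the chosen confidence level (about 1.645 for 90% and about 1.96 for 95%). The symbols are the conventional ones and may differ from the notation the original formula used.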

Now let's say we start playing with the standard deviation's value. The standard deviation is in the numerator. So as that standard deviation grows, the overall range is going to grow.
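A small sketch can show both effects side by side: the margin shrinking as N grows, and growing as the standard deviation grows. The helper name `margin_of_error`, the 95% confidence level, and the specific values of N and standard deviation are all assumptions picked for illustration.

```python
from math import sqrt
from statistics import NormalDist

Z_95 = NormalDist().inv_cdf(0.975)  # about 1.96 for a two-sided 95% interval


def margin_of_error(sd, n, z=Z_95):
    """Half-width of the interval: z * sd / sqrt(n)."""
    return z * sd / sqrt(n)


# Larger N (denominator) shrinks the margin; the standard deviation is held at 2.
by_n = {n: margin_of_error(sd=2.0, n=n) for n in (10, 100, 1000)}

# Larger standard deviation (numerator) grows the margin; N is held at 100.
by_sd = {sd: margin_of_error(sd=sd, n=100) for sd in (1.0, 2.0, 3.0)}

for n, m in by_n.items():
    print(f"N={n:>4}, sd=2.0 -> +/- {m:.3f}")
for sd, m in by_sd.items():
    print(f"N= 100, sd={sd} -> +/- {m:.3f}")
```

Because of the square root, multiplying N by 100 only cuts the margin by a factor of 10, while tripling the standard deviation triples the margin.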

Here is a graph that shows the same general group of data but one set has a standard deviation of 1, one has a standard deviation of 2, and one has a standard deviation of 3:

See how the group with a standard deviation of 1 is tall and thin and is all centered closely around its mean? In comparison the group with a standard deviation of 3 is much more spread out - much more data exists further away from that mean.
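The same "tall and thin" versus "spread out" contrast can be put into numbers. This sketch assumes the three groups are normal (bell-shaped) curves sharing a mean of 0, which is a guess at what the graph shows. For each one it reports the height of the curve at its center and the share of values that fall within 1 unit of the mean.

```python
from statistics import NormalDist

MEAN = 0.0  # hypothetical center shared by all three curves

summary = {}
for sd in (1.0, 2.0, 3.0):
    curve = NormalDist(mu=MEAN, sigma=sd)
    peak = curve.pdf(MEAN)  # height of the curve at its center
    near = curve.cdf(MEAN + 1) - curve.cdf(MEAN - 1)  # share within 1 of the mean
    summary[sd] = (peak, near)
    print(f"sd={sd}: peak height {peak:.3f}, {near:.0%} of values within 1 of the mean")
```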

So because the data is already spread out to begin with, the confidence interval will also get wider as that standard deviation gets larger.

So let me think of a real life example. Let's say I'm tracking the number of birds that come to my backyard feeder every day, which I've done. Let's say that the data looks like the blue line - a standard deviation of 1 - where pretty much every day I get 15 birds. Sometimes I get 14 and sometimes 16 but most of the time it's right around 15. This means that even if I look at the data over five years and grab samples to work with, the range will be pretty narrow. I know it'll be around 15. The standard deviation is small, so the width of any confidence intervals will also be narrow.

However, let's say I move to a different neighborhood where the numbers fluctuate wildly. Some days I get 5 birds and some days 25 birds. Now the curve of my data is short and fat - the standard deviation is far larger. I have the same *mean* - in both cases the numbers are centering around 15. But the data in the new house is much more varied. Because the standard deviation is larger, it means that the range of my confidence interval is also larger.
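Here is a rough sketch of the two backyards. The daily counts are invented for illustration, and both lists average exactly 15. The interval uses the normal multiplier for 95% confidence. With only 10 days of data a t-based multiplier would usually be more appropriate, so treat the exact widths as approximate.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Hypothetical daily bird counts; both lists average 15.
old_yard = [15, 14, 16, 15, 15, 14, 16, 15, 16, 14]
new_yard = [5, 25, 10, 20, 15, 7, 23, 12, 18, 15]

Z_95 = NormalDist().inv_cdf(0.975)  # about 1.96 for a two-sided 95% interval


def interval(counts):
    """Normal-approximation interval for the mean of the daily counts."""
    center = mean(counts)
    margin = Z_95 * stdev(counts) / sqrt(len(counts))
    return center - margin, center + margin


old_interval = interval(old_yard)
new_interval = interval(new_yard)

for name, counts, (low, high) in (
    ("old yard", old_yard, old_interval),
    ("new yard", new_yard, new_interval),
):
    print(f"{name}: sd {stdev(counts):.2f}, interval {low:.1f} to {high:.1f}")
```

Both intervals are centered on 15, but the one for the new yard comes out several times wider, purely because the counts there vary more.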
