# Normal Distribution

I have to comment first that I absolutely adore charts and graphs and am thrilled by this chapter :) It's so useful for all the sites I run!

A normal distribution is a symmetrical, bell-shaped curve. Values near the center are the most likely, and values become less likely the further you get from the center. The tallest part of the curve is at the center, where the mean and median sit, and it tapers off evenly on either side. I see normal distributions all the time in my graphs. For example, editors on my site write an average of 4 articles a month. Some write more and some write less, but there's a big "bump" at 4, and the curve tapers off fairly evenly in either direction.

Here's an example of normal distributions.
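To see that bell shape without a charting tool, here's a minimal sketch that simulates monthly article counts and prints a text histogram. The mean of 4 comes from the example above, but the standard deviation of 1.5, the 2,000 simulated editors, and the helper names `simulate_articles` and `text_histogram` are hypothetical choices for illustration, not real BellaOnline data.

```python
import random
import statistics

# Hypothetical parameters: mean of 4 articles/month (from the text) and an
# assumed standard deviation of 1.5. Not real site data.
MEAN, SD = 4.0, 1.5

def simulate_articles(n_editors, seed=42):
    """Draw n_editors monthly article counts from a normal distribution,
    rounded to whole articles and clipped at zero (no negative counts)."""
    rng = random.Random(seed)
    return [max(0, round(rng.gauss(MEAN, SD))) for _ in range(n_editors)]

def text_histogram(values):
    """Return one line per distinct value: the value, then one '#' per 20 occurrences."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    return [f"{k:>2} | {'#' * (counts[k] // 20)}" for k in sorted(counts)]

articles = simulate_articles(2000)
for line in text_histogram(articles):
    print(line)
print("mean:", round(statistics.mean(articles), 2))
print("median:", statistics.median(articles))
```

The tallest row lands at 4 and the rows shrink fairly evenly above and below it, and the mean and median both come out close to 4.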

A uniform distribution is one where every possible outcome is equally likely. So if I were playing Dungeons and Dragons and tracking how many times my 20-sided die came up with each number over the course of a campaign, I'd hope that over time each number would come up about equally often.
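Here's a small sketch of that campaign-long tally, assuming a perfectly fair die and an arbitrary 20,000 rolls (far more than any real campaign). The function name `roll_d20` is just a label for this example.

```python
import random
from collections import Counter

def roll_d20(n_rolls, seed=7):
    """Simulate n_rolls of a fair 20-sided die; return a Counter of faces."""
    rng = random.Random(seed)
    return Counter(rng.randint(1, 20) for _ in range(n_rolls))

rolls = roll_d20(20_000)
expected = 20_000 / 20  # 1,000 per face if the die is fair
for face in range(1, 21):
    print(f"{face:>2}: {rolls[face]:>5}  ({rolls[face] / expected:.2f}x expected)")
```

Every face lands near 1,000 rolls, so a bar chart of these counts is essentially flat, which is what a uniform distribution looks like.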

The exponential distribution is one of my favorites. It's heavily skewed toward one end: there are a whole lot of values at the low end, then fewer and fewer as you move toward higher values. It isn't a bell centered on a mean that tails off on either side. Instead there's a huge count at one end and a small count at the other. An example might be the number of editors we have to let go for plagiarism at BellaOnline in a given month. In the vast majority of months that number is zero, so the bar at zero is extremely high. Having to let 1 editor go would be much less frequent, having to let 2 editors go even rarer, and so on. The numbers drop precipitously as we move along the graph. (Strictly speaking, the exponential distribution describes continuous quantities such as waiting times; a whole-number count like this follows a discrete counterpart, but the steadily falling shape is the same.)
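To see that precipitous drop, here's a sketch that draws values from an exponential distribution and counts how many fall into each whole-number bucket. The rate of 1.0, the sample size, and the name `exponential_buckets` are assumptions for illustration. With a rate of 1, each bucket should hold roughly 37% (1/e) as many values as the one before it.

```python
import random
from collections import Counter

def exponential_buckets(n, rate=1.0, seed=3):
    """Draw n values from an exponential distribution with the given rate
    and count how many land in each whole-number bucket [k, k+1)."""
    rng = random.Random(seed)
    return Counter(int(rng.expovariate(rate)) for _ in range(n))

buckets = exponential_buckets(10_000)
for k in range(6):
    print(f"[{k}, {k + 1}): {buckets[k]}")
```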

Someone asked how I choose how to lay out a given graph.

As a person primarily interested in mentoring rather than scientific analysis, I make graphs to help my visitors and editors understand something easily. I am making them for an "average lay person" who knows little about math or statistics but who needs to understand something. So my main goal is to make the graph easy to understand. Some common practices, like starting the vertical axis at a value other than zero, would baffle them. So I choose whichever graph helps them see the data most easily.

So the real question is: why are they looking at the graph? What do they want to learn? Let's say I'm graphing an editor's site traffic month by month. They would be looking at it because they want to know whether their traffic is going up, going down, or holding steady. If I gave them a pie chart, it would be hard to see how the traffic changed from month to month. So say I show them this:

An editor looking at that cannot easily answer their question. But say I show them this instead:

Now the editor can answer the questions they have about their site traffic more easily.

So the key is to know what the viewer wants to know about the data. What do they need to understand? Then you lay out the data to help them understand that.
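As a rough sketch of a month-by-month view that answers "is my traffic going up or down?", here's a text bar chart with each month's change from the month before. The traffic numbers are made up for illustration, and `text_bar_chart` is a hypothetical helper. The bars are scaled from zero, in keeping with the point above about not starting the axis at some other number.

```python
# Hypothetical monthly page views for one editor's site (made-up numbers).
traffic = {
    "Jan": 1200, "Feb": 1350, "Mar": 1280, "Apr": 1500, "May": 1620, "Jun": 1580,
}

def text_bar_chart(series, width=40):
    """Return one line per month: a bar scaled from zero so the largest month
    fills `width`, the raw value, and the change from the previous month."""
    top = max(series.values())
    previous = None
    lines = []
    for month, views in series.items():
        bar = "#" * round(views / top * width)
        change = "" if previous is None else f"{views - previous:+d}"
        lines.append(f"{month} {bar:<{width}} {views:>5} {change}")
        previous = views
    return lines

for line in text_bar_chart(traffic):
    print(line)
```

Reading down the bars and the plus/minus column, the editor can see at a glance that traffic has mostly climbed, with small dips in March and June.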

There's a great book called *The Long Tail* by Chris Anderson. Long tails are common on the web and quite fascinating. In essence, the highest traffic goes to a small number of popular pages. For example, on my low carb site I get a ton of traffic on my low carb beer pages. The numbers then drop from the highs of the super-popular pages, through the middle pages, and taper off into all the hardly-hit pages, but every one of those hardly-hit pages still gets some traffic. The more pages you have, the more all those small numbers add up.

For example, I've read that every single song in the iTunes library gets downloaded at least once a month. Every single one! No entry, no matter how obscure, lingers untouched. Someone out there in the web universe is interested in it. So with 14 million songs in their library, they are guaranteed at least 14 million downloads every month. Yes, some songs get far more downloads; those make up the big end of the graph. But all of those little "long tail" downloads add up to a huge impact too.

So when you make these kinds of graphs, look at the big front end, but also look at the long tail and examine that area too! These graphs are immensely useful.
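To get a feel for how much the tail adds up, here's a sketch that assumes, purely hypothetically, that page popularity follows a 1/rank pattern (a Zipf-like curve, often used as a rough model of long tails). It reports what share of total traffic lands outside the top 100 pages as the number of pages grows. The 1/rank model, the head size of 100, and the name `tail_share` are all assumptions, not measurements from my sites.

```python
def tail_share(n_pages, head_size=100):
    """Fraction of total traffic landing outside the top `head_size` pages,
    assuming (hypothetically) that page k gets traffic proportional to 1/k."""
    weights = [1 / rank for rank in range(1, n_pages + 1)]
    total = sum(weights)
    return sum(weights[head_size:]) / total

for n in (1_000, 10_000, 100_000):
    print(f"{n:>7} pages: {tail_share(n):.0%} of traffic is outside the top 100")
```

Under this model the tail's share keeps growing as pages are added, which is the "the more pages you have, the more those small numbers add up" effect described above.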

## Nearly Normal Distributions

I see nearly normal distributions all the time on my various websites; they're the usual way things graph out.

A normal distribution looks like a bell when you graph it out. It's high in the middle and lower on both ends, in a symmetrical way.

The mean and median both match up because of that symmetry.
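One way to check that symmetry claim is to compare a symmetric sample with a skewed one. This sketch draws a normal sample and an exponential sample that both have a mean of 4 (assumed parameters for illustration only). For the bell, the mean and median land in nearly the same spot. For the skewed exponential, the long right side pulls the mean well above the median (in theory the median is 4 × ln 2, about 2.77).

```python
import random
import statistics

rng = random.Random(11)
# Illustrative samples: a symmetric normal and a right-skewed exponential,
# both with a mean of 4 (assumed parameters, not real site data).
normal_sample = [rng.gauss(4, 1.5) for _ in range(50_000)]
skewed_sample = [rng.expovariate(1 / 4) for _ in range(50_000)]

for name, sample in (("normal", normal_sample), ("exponential", skewed_sample)):
    print(f"{name:>11}: mean={statistics.mean(sample):.2f} "
          f"median={statistics.median(sample):.2f}")
```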

Technically, a normal distribution extends infinitely in both directions, although in practice my nearly normal charts never do. They usually start at zero (although sometimes there are no zero values; with article traffic, for example, it's extremely rare for an article to get ZERO traffic in a given month). They usually go up to some top value, which is generally predictable from the history of whatever I'm tracking.
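Those infinite tails are usually so thin that cutting the chart off at zero loses almost nothing. This sketch uses Python's built-in `statistics.NormalDist` to measure how much of a theoretical bell centered on 4 falls below zero for a few spreads. The center of 4 and the standard deviations of 1.0, 1.5, and 2.0 are assumptions chosen for illustration.

```python
from statistics import NormalDist

# Hypothetical: a bell centered on 4 (e.g. articles/month) with different spreads.
for sd in (1.0, 1.5, 2.0):
    dist = NormalDist(mu=4, sigma=sd)
    below_zero = dist.cdf(0)  # area of the theoretical curve left of zero
    print(f"sd={sd}: {below_zero:.4%} of the curve lies below zero")
```

Only when the bell is wide relative to its center (here, a standard deviation of 2) does the part below zero reach a couple of percent.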

Theoretically, the middle 50% of the bell lies within about 2/3 of a standard deviation on either side of the mean, a band roughly 1 1/3 standard deviations wide. This is more technical than I tend to get with my graphs, but I agree with the visual idea behind it: the bulk of the bell is tucked right around the center point. My graphs that are nearly normal distributions don't tend to be squashed, very wide bells. They tend to be "sturdy thick" bells, which is what these numbers seem to indicate. When making graphs, I do pay attention to how "sturdy thick" a bell is, and if one looks very squashed, that makes me take a closer look.
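The "about 1 1/3 standard deviations" figure can be checked directly with `statistics.NormalDist` on the standard normal curve (mean 0, standard deviation 1). This sketch finds the 25th and 75th percentiles, which bound the middle 50%, and also the share of the curve within one standard deviation of the mean.

```python
from statistics import NormalDist

std = NormalDist()  # standard normal: mean 0, standard deviation 1

# The 25th and 75th percentiles bound the middle 50% of the curve.
q1, q3 = std.inv_cdf(0.25), std.inv_cdf(0.75)
print(f"middle 50% runs from {q1:.3f} to {q3:.3f} SD, a band {q3 - q1:.3f} SD wide")

# Share of the curve within one standard deviation either side of the mean.
within_one_sd = std.cdf(1) - std.cdf(-1)
print(f"within +/-1 SD of the mean: {within_one_sd:.1%}")
```

The band comes out about 1.35 standard deviations wide, and roughly 68% of the curve sits within one standard deviation of the mean.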
