Friday, 27 January 2012

Statistics, statistics.

It really, really is amazing what one can learn (and has to learn) just because of some spinning. I have learned on Thursday last week that Excel will find the median of a list of numbers by... choosing the value in the middle of the list. Which means that instead of finding the statistical median (half of the values are larger than this number and half of the values are smaller than it), it finds the number in the middle of a list. Thank you, Excel, I could have done that myself. By placing a simple link to the cell in the middle of the list. Because yes, I can count! So I worked on a little more, using the average instead, and grinding my teeth (figuratively speaking only, though).

In case you do not have your own version of a spinning experiment that you need to evaluate and analyse, and have no clue what I'm talking about: the average is made by summing up all numbers in a dataset and then dividing it by the amount of numbers. The median is the value in the middle of the dataset. Why do I want the median instead of the average? Median is way less vulnerable to outliers in the data. Case in point - let's assume we have a list of values that goes 1, 2, 3, 4, 100. Average is (1+2+3+4+100) divided by 5, which is 22. In contrast to this, the median of the same dataset is 3.

And Friday morning, I had a little private time with Mr. Google again, and I found out that yes, you can force Excel to get the statistical median. You can tell it to find the x-largest number in a bunch of cells. Which means that if you know how big your list of data is (I do) and see if it's an odd or even number of data (I do) and, if it's even, can divide said number giving the length by 2 (I do) and can make Excel calculate an average (I do) you can tell it to calculate the average between the two numbers in the middle of the list. Gotcha.

Oh, and I need the median... because after some fiddling and trying out and looking at data and getting brain-dead and trying something else, the most patient husband of them all and me came up with a possibility to give relative variations of thread thickness. Which uses a simple formula built around either the median (good) or the average (not quite as good). And now please excuse me while I try to get my now nicely-sized histogram bins into a good graph form.

Which will include giving my computer a split personality - because the programme I plot to do my plotting with is only running on Linux machines.


Phiala said...

Excel is notoriously bad at statistics, to the point that it gives wrong answers for some very common tasks. And for others, as you've found, doesn't give the results you'd expect.

I use R for everything (statistics and graphs), but that's a substantial learning curve if all you need is the median and a few histograms.

Anonymous said...

If I might suggest, R with the package R Commander makes simple calculations (as well as some slightly more difficult ones) easy. It's a menu-driven means of getting basic statistics. It's also a great entry into R because the code for each calculation is reported back to the user, allowing you to learn as you go.