OK, I have been doing some statistical analysis of values in arrays lately (because it’s fun!) and wanted to pass on some of the more popular equations, along with what they mean and what they are meant to tell you.
Mean – Many of you no doubt know the mean as the average of a set of numbers. You calculate it by summing the values in the dataset and dividing the sum by the number of values. This seems straightforward, but the concepts below all rely on the mean, so it is crucial to calculate it first. Ext.Array (http://docs.sencha.com/ext-js/4-1/#!/api/Ext.Array-method-mean) has a handy way of calculating the mean value of a dataset stored in, what else, an array.
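In the world of probability, the mean is also known as the expected value: the long-run average you would expect if you kept drawing values from the same source. It gives a rough idea of where the next value will fall, although how rough depends on how spread out the data is.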
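To make the arithmetic concrete, here is a minimal plain-JavaScript sketch of the mean, with no Ext JS required. The helper name mean and the sample numbers are made up for illustration; they are not part of any library.

```javascript
// Hypothetical helper: arithmetic mean of an array of numbers.
function mean(values) {
    if (!Array.isArray(values) || values.length === 0) {
        return NaN; // no meaningful mean for an empty or non-array input
    }
    var sum = 0;
    for (var i = 0; i < values.length; i++) {
        sum += values[i];
    }
    return sum / values.length;
}

var sample = [2, 4, 4, 4, 5, 5, 7, 9]; // made-up sample data
console.log(mean(sample)); // 5 (the values sum to 40, and there are 8 of them)
```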
Variance – Now that you have the mean value of the array, you can start to look at the variance in your dataset. The variance is a measure of how far the numbers in your dataset are spread out. The baseline for this is the mean, which is also known as the expected value! To calculate the variance, subtract the mean from each value in your dataset and square the difference. Wait! You are only halfway there! Once you complete that task, you should have a new dataset composed of squared differences. To get the variance, just calculate the mean of that dataset of squared differences.
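Here is a small worked example of those two steps in plain JavaScript, as a sketch only. The sample numbers and variable names are made up for illustration. Note that dividing by the number of values, as described above, gives what is usually called the population variance; the sample variance divides by one less than the number of values instead.

```javascript
var sampleValues = [2, 4, 4, 4, 5, 5, 7, 9]; // made-up sample data

// Step 0: the mean of the original values.
var sampleMean = sampleValues.reduce(function (sum, x) {
    return sum + x;
}, 0) / sampleValues.length; // 5

// Step 1: the squared difference between each value and the mean.
var squaredDiffs = sampleValues.map(function (x) {
    return Math.pow(x - sampleMean, 2);
}); // [9, 1, 1, 1, 0, 0, 4, 16]

// Step 2: the mean of the squared differences is the (population) variance.
var populationVariance = squaredDiffs.reduce(function (sum, x) {
    return sum + x;
}, 0) / squaredDiffs.length;

console.log(populationVariance); // 4 (the squared differences sum to 32)
```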
Standard Deviation – OK, now that we have worked out the mean and the variance, what is standard deviation, and what do we plan to do with it? Before I explain, let’s take a look at the chart below:
In this chart you see the Greek symbols for mean (μ) and standard deviation (σ). The type of curve represented by the chart above is named the bell curve. So you may be asking yourself what standard deviation is, right? Well, it is the square root of the variance, and it shows how spread out a set of numbers is in the same units as the numbers themselves. In the bell curve you can see that 34.1% of the dataset falls within one standard deviation below the mean and 34.1% falls within one standard deviation above it, so roughly 68.2% of the values lie within one standard deviation of the mean. A dataset that follows this curve is said to be normally distributed.
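As a rough illustration, the sketch below computes the standard deviation of a small array and the share of values that lie within one standard deviation of the mean. The function name stdDevSummary and the sample data are hypothetical, and the population form of the variance (divide by the number of values) is assumed.

```javascript
// Hypothetical helper: mean, standard deviation, and the share of values
// that fall within one standard deviation of the mean.
function stdDevSummary(values) {
    var n = values.length;
    var avg = values.reduce(function (s, x) { return s + x; }, 0) / n;
    var variance = values.reduce(function (s, x) {
        return s + Math.pow(x - avg, 2);
    }, 0) / n;
    var sd = Math.sqrt(variance);
    var within = values.filter(function (x) {
        return Math.abs(x - avg) <= sd;
    }).length;
    return { mean: avg, sd: sd, shareWithinOneSd: within / n };
}

var summary = stdDevSummary([2, 4, 4, 4, 5, 5, 7, 9]); // made-up sample data
console.log(summary.mean);             // 5
console.log(summary.sd);               // 2
console.log(summary.shareWithinOneSd); // 0.75 (6 of the 8 values lie between 3 and 7)
```

For data that really does follow a bell curve, that last figure should land near 0.68; a tiny hand-picked sample like this one will not match it exactly.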
Frequency – Now let’s talk about frequency, which is commonly expressed as a histogram: a chart that divides the range of values into ‘bins’ and shows the number of items that fall into each ‘bin’.
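To show what counting into bins looks like, here is a minimal plain-JavaScript sketch. The function name countIntoBins, its parameters, and the sample data are all made up for illustration. It assumes equal-width bins spanning the lowest to the highest value, with the highest value counted in the last bin.

```javascript
// Hypothetical helper: count how many values fall into each of
// binCount equal-width bins between the lowest and highest value.
function countIntoBins(values, binCount) {
    var min = Math.min.apply(null, values),
        max = Math.max.apply(null, values),
        width = (max - min) / binCount,
        counts = [],
        i, index;
    for (i = 0; i < binCount; i++) {
        counts[i] = 0;
    }
    for (i = 0; i < values.length; i++) {
        // If every value is identical the width is 0, so use the first bin.
        index = width === 0 ? 0 : Math.floor((values[i] - min) / width);
        // The highest value sits on the upper edge; keep it in the last bin.
        if (index > binCount - 1) {
            index = binCount - 1;
        }
        counts[index]++;
    }
    return counts;
}

// Three bins over 1..10 cover [1, 4), [4, 7) and [7, 10].
console.log(countIntoBins([1, 2, 2, 3, 5, 8, 9, 10], 3)); // [4, 1, 3]
```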
Now, how do you go about calculating the values above given a set of values in a JavaScript array?
Luckily, I stumbled across some handy JavaScript snippets from Larry Battle (http://bateru.com/news/2011/03/javascript-standard-deviation-variance-average-functions/), and of course when you couple those with some Ext JS Array singleton functions you get an easy-to-use list of stats functions.
Average (mean)
var avg = Ext.mean(yourArray);
Variance
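// Returns the variance of an array of numbers
// (the mean of the squared differences from the mean)
function variance(arr) {
    // Make sure that your input is of type array
    if (!Ext.isArray(arr)) {
        return false;
    }
    var avg = Ext.mean(arr),
        i = arr.length,
        v = 0;
    // Sum the squared difference between each value and the mean
    while (i--) {
        v += Math.pow(arr[i] - avg, 2);
    }
    // Divide by the number of values to get the mean of the squared differences
    v /= arr.length;
    return v;
}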
Standard Deviation
function stdev(arr){
// Make sure that your input is of type array
if (!Ext.isArray(arr)) {
return false;
}
var sd = Math.sqrt(variance(arr));
return sd;
}
Frequency
function histogram(arrayOfNumbers) {
    var bins = 10,
        histogramDataValues = arrayOfNumbers, // Source values (left unmodified)
        i,
        thisBin,
        binContent = [], // Becomes the data array: one count per bin
        categories = [], // Lower bound of each bin (handy as axis labels; not returned)
        lowestNumber = Ext.Array.min(histogramDataValues),
        highestNumber = Ext.Array.max(histogramDataValues),
        delta = Math.abs((highestNumber - lowestNumber) / bins); // Width of one bin
    // Set the initial value of all bins to zero
    // and populate the categories data based on the deltas
    for (i = 0; i < bins; i++) {
        binContent[i] = 0;
        categories[i] = lowestNumber + Math.round(100 * i * delta) / 100;
    }
    // Populate the bins with the number of values that fall into each one
    for (i = 0; i < histogramDataValues.length; i++) {
        // If every value is identical, delta is 0, so use the first bin
        thisBin = delta === 0 ? 0 : Math.floor((histogramDataValues[i] - lowestNumber) / delta);
        // The highest value sits on the upper edge, so count it in the last bin
        thisBin = Math.min(thisBin, bins - 1);
        binContent[thisBin]++;
    }
    return binContent;
}
