R Functions For Statistics
These are some useful first R functions for statistics, collected as I needed them.
One of the first things you need to learn is how to create vectors. We are often given a data set of numbers to work with, and those numbers need to be stored in a vector to take advantage of the power of R.
Squaring A Number
Squaring a number looks like this in your R console:
11.17^2
Square Root Of A Number
To take the square root of a number, use this function in your R console:
sqrt(36)
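Both operations also work on a whole set of numbers at once. Here is a small sketch of that; the names values, squares, and roots are just ones I made up for illustration.

```r
# ^ and sqrt() are applied element by element when given several numbers
values = c(2, 3, 11.17)
squares = values^2        # each value squared
roots = sqrt(squares)     # square roots bring back the original non-negative values
print(squares)
print(roots)
```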
Creating Vectors
Enter the code below into RGui, RStudio, or whatever R environment you are using, and press Enter.
Here is the basic form, using the c() function, which combines values into a vector:
x=c()
We are usually given a data set to work with. Let us start with something basic. I will just enter the numbers one through nine, with repeats, so we have something to work with.
x=c(2,2,1,3,3,3,4,5,6,7,7,8,8,8,9,9,9,9,2,3,4,1,1,2,2,6,6,7,7,8,8,9,9)
Now, we have a vector with data.
Ordering Data
Let us now order the data.
order(x)
Called on its own, order() does not return the sorted values. It returns the positions of the values, listed in the order that would sort the data.
[1] 3 22 23 1 2 19 24 25 4 5 6 20 7 21 8 9 26 27 10 11 28 29 12 13 14 30 31 15 16 17 18 32 33
The smallest value is 1 and it is in the 3rd, 22nd, and 23rd positions.
If you want the numbers in ascending order, do this:
x[order(x)]
[1] 1 1 1 2 2 2 2 2 3 3 3 3 4 4 5 6 6 6 7 7 7 7 8 8 8 8 8 9 9 9 9 9 9
Indexing x with the result of order() returns the values themselves in ascending order.
If you want the data set in descending order then do this:
x[order(x,decreasing=TRUE)]
You will get this:
[1] 9 9 9 9 9 9 8 8 8 8 8 7 7 7 7 6 6 6 5 4 4 3 3 3 3 2 2 2 2 2 1 1 1
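If you only want the sorted values and not the positions, base R also has a sort() function. This is a minimal sketch of that shortcut; the names ascending and descending are just for illustration.

```r
# The 33 values used in this section
x = c(2,2,1,3,3,3,4,5,6,7,7,8,8,8,9,9,9,9,2,3,4,1,1,2,2,6,6,7,7,8,8,9,9)

ascending = sort(x)                      # same values as x[order(x)]
descending = sort(x, decreasing = TRUE)  # same values as x[order(x, decreasing = TRUE)]

# Check that both approaches agree
print(identical(ascending, x[order(x)]))
print(identical(descending, x[order(x, decreasing = TRUE)]))
```

order() is still the one to reach for when you need the positions, for example to sort one vector by the values of another.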
Calculating A Sum
To calculate the sum of some numbers, put them into a vector like we did above.
Importing or copying/pasting works just fine.
x=c(1,2,3,4,5)
Then just use the sum() function.
sum(x)
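Imported data often has missing values, and sum() returns NA if any value is missing unless you tell it otherwise. Here is a short sketch; the vector y and its missing value are made up for illustration.

```r
x = c(1, 2, 3, 4, 5)
total = sum(x)                     # 1 + 2 + 3 + 4 + 5
print(total)

# A made-up vector with one missing value
y = c(1, 2, NA, 4)
print(sum(y))                      # NA, because one value is missing
total_y = sum(y, na.rm = TRUE)     # na.rm = TRUE drops missing values before adding
print(total_y)
```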
Calculating The Range
First, we need to get some data into a vector.
x=c(55,22,87,14,64,62,94,91,61,44,11)
Next, we order the data.
x[order(x)]
To find the range, subtract the min from the max.
\(94-11=83\)
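You can also let R do the subtraction, which helps when the data set is too long to read by eye. A minimal sketch, where data_range is just a name I picked:

```r
x = c(55, 22, 87, 14, 64, 62, 94, 91, 61, 44, 11)

data_range = max(x) - min(x)   # largest value minus smallest value
print(data_range)

# range() returns the minimum and maximum together,
# and diff() subtracts the first from the second
print(range(x))
print(diff(range(x)))
```

Note that range() by itself gives the two end points, not the single number we usually call the range in statistics.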
Calculating the Mean
To calculate the mean of a data set, we use the mean() function.
It accepts a vector as input.
We will work on the first data set, the 33 values we ordered in the Ordering Data section. If x has been reassigned since then, enter those values into x again first.
To use the mean() function, we pass the variable name we created as its argument. So:
mean(x)
[1] 5.393939
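The mean is the sum of the values divided by how many there are, so we can check the result by hand. This sketch uses the 33 values from the Ordering Data section; manual_mean is just an illustrative name.

```r
x = c(2,2,1,3,3,3,4,5,6,7,7,8,8,8,9,9,9,9,2,3,4,1,1,2,2,6,6,7,7,8,8,9,9)

total = sum(x)                 # adds up to 178
n = length(x)                  # 33 values
manual_mean = total / n        # 178 / 33

print(manual_mean)
print(all.equal(manual_mean, mean(x)))   # agrees with mean()
```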
Finding the Median
If we had 3 numbers then finding the median would be pretty quick. However, when we have 3000 numbers it is a different story.
The median() function also accepts a vector as input. We use it in the same way as above.
To find the median of a data set quickly we do this:
median(x)
[1] 6
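To see what median() is doing, here is a sketch of the by-hand method: sort the values, then take the middle one, or the average of the two middle ones when the count is even. It uses the same 33 values as the mean example; the names sorted, n, and manual_median are just for illustration.

```r
x = c(2,2,1,3,3,3,4,5,6,7,7,8,8,8,9,9,9,9,2,3,4,1,1,2,2,6,6,7,7,8,8,9,9)

sorted = sort(x)
n = length(sorted)

if (n %% 2 == 1) {
  # Odd count: the single middle value
  manual_median = sorted[(n + 1) / 2]
} else {
  # Even count: average of the two middle values
  manual_median = mean(sorted[c(n / 2, n / 2 + 1)])
}

print(manual_median)
```

With 33 values, the middle one is the 17th value in sorted order.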
Finding the Mode
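R has no built-in function for the statistical mode. The built-in mode() function reports how an object is stored (for example, "numeric"), not its most frequent value.
So we have to create our own function. It accepts a vector as input and returns the most frequently occurring value, or values, in the data set.
In any data set, there can be no mode, one mode, or multiple modes.
Note that naming our function mode hides the built-in one for the rest of the session.
Type it like this: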
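mode=function(x)
{
  u=unique(x)                 # the distinct values in x
  tab=tabulate(match(x,u))    # how many times each distinct value occurs
  u[tab==max(tab)]            # the value(s) with the highest count
}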
Then we just do:
mode(x)
[1] 9
This also works on a character vector.
If we have:
letter=c('a','s','s','s','d','d','d','f','f','f','g','g','h','h','h','h','h','j','j','j','j','k','k','k','k','k','k','k','k','l','l','l','l','l','l')
mode(letter)
[1] "k"
The mode of this character vector is “k” because it occurs more than any other letter.
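Here is a sketch of how the same logic behaves when there is a tie. I have used a different, hypothetical name, stat_mode, so that R's built-in mode() stays available; the test vectors are made up for illustration.

```r
# Same logic as the function above, under a name that does not hide base R's mode()
stat_mode = function(v) {
  u = unique(v)                   # distinct values
  tab = tabulate(match(v, u))     # count of each distinct value
  u[tab == max(tab)]              # every value that reaches the highest count
}

tied = stat_mode(c(1, 1, 2, 2, 3))    # 1 and 2 both occur twice
print(tied)

single = stat_mode(c("a", "b", "b"))  # works on character vectors too
print(single)

# The built-in mode() describes the storage type instead
print(base::mode(c(1, 2, 3)))
```

When every value occurs the same number of times, this function returns all of them, so it is up to you to read that as "no mode".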
Calculating The Standard Deviation
To calculate standard deviation, use the sd() function.
Get data into a vector.
x=c(55,22,87,14,64,62,94,91,61,44,11)
Use the sd() function.
sd(x)
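To see where the number comes from, here is a sketch of the sample standard deviation worked out step by step. sd() divides by n - 1, not n. The names deviations and manual_sd are just for illustration.

```r
x = c(55, 22, 87, 14, 64, 62, 94, 91, 61, 44, 11)

n = length(x)
deviations = x - mean(x)                # distance of each value from the mean
sum_sq = sum(deviations^2)              # sum of squared deviations
manual_sd = sqrt(sum_sq / (n - 1))      # sample standard deviation

print(manual_sd)
print(all.equal(manual_sd, sd(x)))      # agrees with sd()
```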
Calculating The Variance
To calculate the variance of some numbers, we square the standard deviation.
Get data into a vector.
x=c(55,22,87,14,64,62,94,91,61,44,11)
Use the sd() function to find the standard deviation.
sd(x)
[1] 29.72205
Square the result, using the unrounded value so that rounding error does not carry into the variance.
sd(x)^2
[1] 883.4
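Base R also has a var() function that returns the sample variance directly. A short sketch, where variance is just a name I picked:

```r
x = c(55, 22, 87, 14, 64, 62, 94, 91, 61, 44, 11)

variance = var(x)                           # sample variance, dividing by n - 1
print(variance)

# Squaring the unrounded standard deviation gives the same number
print(all.equal(variance, sd(x)^2))
```

Because var() works from the data and not from a rounded standard deviation, its result can differ slightly from squaring a value like 29.7 by hand.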
Calculating The Coefficient Of Variation
The coefficient of variation is:
\[cv = \frac{\text{standard deviation}}{\text{mean}}\]
Let us start with getting data into a vector.
x=c(86,70,62,68,69,54,66,55,81,68,61,62,98,54,62)
Use the mean() function to find the mean of the data.
mean(x)
Now, use the sd() function to find the standard deviation.
sd(x)
Next, use the formula from above.
cv = sd(x) / mean(x)
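If you need this for more than one data set, the formula can be wrapped in a small function. This is only a sketch; coef_var is a hypothetical name, not a base R function.

```r
# Hypothetical helper: standard deviation divided by mean
coef_var = function(v) {
  sd(v) / mean(v)
}

x = c(86, 70, 62, 68, 69, 54, 66, 55, 81, 68, 61, 62, 98, 54, 62)

cv_x = coef_var(x)
print(cv_x)
print(cv_x * 100)   # the same ratio expressed as a percentage
```

The coefficient of variation only makes sense when the mean is not zero, so check that before relying on the result.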