Creating sample fractions in R
Random sampling is an important part of data analysis. In R, there are several ways to create a random sample, including the built-in sample() function and the sample_frac() function from the dplyr package.
In this tutorial, we will explore both methods and demonstrate how to create sample fractions in R.
1. The sample() Function
The sample() function is built into R and draws a random sample from a vector. To sample rows of a data frame, draw random row indices and use them to subset the data, for example:
mtcars[sample(1:nrow(mtcars), size = 3), ]
Here, 1:nrow(mtcars) creates a sequence of integers from 1 to the number of rows in the mtcars dataset.
> 1:nrow(mtcars)
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
[27] 27 28 29 30 31 32
Then sample(1:nrow(mtcars), size = 3) randomly draws 3 integers from that sequence (without replacement by default), and those integers are used as row indices to subset mtcars:
> mtcars[sample(1:nrow(mtcars), size = 3), ]
mpg cyl disp hp drat wt qsec vs am gear carb
Cadillac Fleetwood 10.4 8 472 205 2.93 5.250 17.98 0 0 3 4
Pontiac Firebird 19.2 8 400 175 3.08 3.845 17.05 0 0 3 2
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
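The same base R approach can also draw a fraction of the rows rather than a fixed count. The following is a minimal sketch, not part of the original example: the seed value, the 0.23 fraction, and the variable names frac, n_rows, idx and sampled are illustrative assumptions.

```r
# Fix the random seed so the same rows are drawn on every run.
set.seed(42)  # 42 is an arbitrary choice

# Fraction of rows to keep (illustrative value).
frac <- 0.23

# Convert the fraction into a row count: 0.23 * 32 rows = 7.36, rounded to 7.
n_rows <- round(frac * nrow(mtcars))

# seq_len() builds 1..n safely; sample() draws without replacement by default.
idx <- sample(seq_len(nrow(mtcars)), size = n_rows)

# Keep only the sampled rows.
sampled <- mtcars[idx, ]
nrow(sampled)
```

Calling set.seed() before sample() makes the draw repeatable; without it, each run returns a different set of rows.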
2. The sample_frac() Function
Let's now see how to keep a certain percentage of observations using sample_frac() from the dplyr package. We'll create a random sample of 23% of the observations in the mtcars data set. Since mtcars has 32 rows, 23% works out to 7.36, so the result contains 7 rows:
> library(dplyr)
> sample_frac(mtcars, 0.23)
mpg cyl disp hp drat wt qsec vs am gear carb
Merc 280 19.2 6 167.6 123 3.92 3.44 18.30 1 0 4 4
Merc 280C 17.8 6 167.6 123 3.92 3.44 18.90 1 0 4 4
Hornet Sportabout 18.7 8 360.0 175 3.15 3.44 17.02 0 0 3 2
Duster 360 14.3 8 360.0 245 3.21 3.57 15.84 0 0 3 4
Fiat 128 32.4 4 78.7 66 4.08 2.20 19.47 1 1 4 1
Valiant 18.1 6 225.0 105 2.76 3.46 20.22 1 0 3 1
Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.25 17.98 0 0 3 4
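In dplyr 1.0.0 and later, sample_frac() is marked as superseded in favour of slice_sample(), which takes the fraction through its prop argument. The sketch below assumes such a dplyr version is installed; the seed value and the variable name sampled are illustrative.

```r
library(dplyr)

# Arbitrary seed, used only to make the draw repeatable.
set.seed(123)

# prop is the fraction of rows to keep; sampling is without replacement by default.
sampled <- slice_sample(mtcars, prop = 0.23)

# Number of rows actually kept.
nrow(sampled)
```

For this data set both functions keep 7 rows. The two functions may round the row count differently for other fractions, so it is worth checking nrow() when the exact sample size matters.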
Happy random sampling!