Understanding data objects in R: one-dimensional, two-dimensional, and multi-dimensional objects
R is a convenient programming language for data analytics and offers a wide range of data structures to work with. These objects can be classified by their dimensionality, which determines how they can be indexed, used and analysed.
In this tutorial, we will explore the three main types of data objects in R: one-dimensional, two-dimensional, and multi-dimensional data objects.
One-Dimensional objects: vector
In R, the most basic data object is the vector.
A vector is a one-dimensional data structure whose elements all share the same data type, such as numbers, character strings, or logical values. Vectors are commonly used for storing and manipulating a single column of data. Although a vector prints horizontally, like a row, it has no rows or columns of its own; when vectors are combined into a data frame or a matrix (for example with data.frame() or cbind()), each vector becomes a column. Vectors can be of different types, including numeric, integer, character, logical, and complex.
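The following minimal sketch shows the vector types listed above. It also shows that a plain vector has a length but no row or column dimensions. It uses the base functions typeof(), length() and dim(); the variable names are illustrative only.

```r
# One example vector of each common atomic type (illustrative names)
num_vec <- c(1.5, 2, 3)          # numeric, stored as double
int_vec <- c(1L, 2L, 3L)         # integer: the L suffix marks integer literals
chr_vec <- c("a", "b", "c")      # character
lgl_vec <- c(TRUE, FALSE, TRUE)  # logical
cpl_vec <- c(1+2i, 3-1i, 0+1i)   # complex

# typeof() reports the internal storage type of each vector
types <- sapply(list(num_vec, int_vec, chr_vec, lgl_vec, cpl_vec), typeof)
print(types)  # "double" "integer" "character" "logical" "complex"

# A plain vector has a length but no dim attribute (neither row nor column)
length(num_vec)  # 3
dim(num_vec)     # NULL
```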
# Creating a vector
numbers <- c(1, 2, 3, 4, 5)
> numbers
[1] 1 2 3 4 5
characters <- c("apple", "banana", "cherry")
> characters
[1] "apple" "banana" "cherry"
Note that you can pass values of different types to c(), but a vector can only hold one type, so R coerces all elements to a common type. For example, a vector created from both strings and numbers will convert all elements to strings:
> characters <- c("apple", "banana", "cherry", 12, 34)
> class(characters)
[1] "character"
Here, c("apple", "banana", "cherry", 12, 34) mixes strings and numbers, so R converts every element to the most flexible type present, which is character. The numbers 12 and 34 are stored as the strings "12" and "34".
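The conversion follows a fixed order: logical, then integer, then double, then complex, then character. The result takes the highest type present. The short sketch below uses typeof() to show this. It also shows that a coerced value can be converted back explicitly with as.numeric().

```r
# Each c() call mixes types; the result takes the most flexible one present
typeof(c(TRUE, 2L))            # "integer" (TRUE becomes 1L)
typeof(c(TRUE, 2L, 3.5))       # "double"
typeof(c(TRUE, 2L, 3.5, "x"))  # "character"

mixed <- c("apple", "banana", "cherry", 12, 34)
mixed[4]                       # "12", a string, not the number 12

# Coercion can be reversed explicitly where the strings hold numbers
as.numeric(mixed[4:5])         # 12 34
```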
Numeric vectors can be used with regular arithmetic and summary functions, and comparisons such as == are applied element by element:
> sum(numbers)
[1] 15
> mean(numbers)
[1] 3
> sqrt(numbers)
[1] 1.000000 1.414214 1.732051 2.000000 2.236068
> numbers == c(1, 12, 3, 4, 23)
[1] TRUE FALSE TRUE TRUE FALSE
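Arithmetic operators work element by element, just like the == comparison above. The sketch below combines them with summary functions. It relies on the fact that TRUE counts as 1 when summed, so sum() can count the elements that meet a condition.

```r
numbers <- c(1, 2, 3, 4, 5)

# Arithmetic operators are applied element by element
numbers * 2                      # 2 4 6 8 10
numbers + c(10, 20, 30, 40, 50)  # 11 22 33 44 55

# A comparison returns a logical vector of the same length...
big <- numbers > 2               # FALSE FALSE TRUE TRUE TRUE
# ...and TRUE counts as 1 in a sum, so this counts elements above 2
sum(big)                         # 3
```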
Vector indexing: elements in a vector can be accessed using square brackets [] and an index value. Indexing in R starts at 1, not at 0 as in Python.
> numbers[3]
[1] 3
> characters[3]
[1] "cherry"
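Square brackets accept more than a single position. This sketch shows several common forms: a vector of positions, a negative index, a logical condition, and a name. The named vector fruit_prices is hypothetical example data.

```r
numbers <- c(1, 2, 3, 4, 5)

numbers[c(1, 3)]      # several positions at once: 1 3
numbers[-1]           # a negative index drops that element: 2 3 4 5
numbers[numbers > 3]  # a logical index keeps elements where TRUE: 4 5
numbers[10]           # a position past the end gives NA, not an error

# Elements can also be named and selected by name (hypothetical data)
fruit_prices <- c(apple = 0.5, banana = 0.25)
fruit_prices["banana"]
```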
However, note what happens when we try arithmetic on a vector with non-numeric values:
> characters * 2
Error in characters * 2 : non-numeric argument to binary operator
The error occurs because arithmetic operators such as * require numeric (or logical) operands. Because of the coercion described earlier, characters is a character vector. Even the elements entered as 12 and 34 are now the strings "12" and "34", so R cannot multiply them.
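If a character vector holds values that really are numbers, you have to convert them explicitly before doing arithmetic. The sketch below uses as.numeric(). Entries that cannot be parsed become NA; R normally warns about this, and suppressWarnings() silences that warning here.

```r
characters <- c("apple", "banana", "cherry", 12, 34)

# Convert to numeric; "apple", "banana" and "cherry" cannot be parsed and become NA
as_numbers <- suppressWarnings(as.numeric(characters))
as_numbers                       # NA NA NA 12 34

# Arithmetic now works, and NA propagates through the calculation
as_numbers * 2                   # NA NA NA 24 68

# Remove the missing values when only the real numbers matter
sum(as_numbers, na.rm = TRUE)    # 46
```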
Two-Dimensional objects: matrices and data frames
Two-dimensional data objects include matrices, data frames and tibbles (a data frame variant provided by the tibble package of the tidyverse). A matrix is a two-dimensional array of elements, all of which must be of the same data type. Data frames are similar to matrices, but each column can hold a different data type; the values within any one column still share a single type.
matrix_example <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)
> matrix_example
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
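The printed matrix above shows that matrix() fills its values column by column by default. The sketch below shows how the byrow argument changes that, and how dim() reports the number of rows and columns. It also shows that, as with vectors, one string is enough to turn the whole matrix into character data.

```r
# Default: values fill down each column in turn
by_col <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)

# byrow = TRUE fills across each row instead
by_row <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3, byrow = TRUE)
by_row
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6

dim(by_row)                          # 2 3 (rows, columns)

# A matrix holds a single type: one string makes every element character
typeof(matrix(c(1, "a"), nrow = 1))  # "character"
```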
# Creating a data frame
data_frame_example <- data.frame(
  Name = c("John", "Jane", "Bob"),
  Age = c(30, 25, 35),
  Gender = c("Male", "Female", "Male")
)
> data_frame_example
  Name Age Gender
1 John  30   Male
2 Jane  25 Female
3  Bob  35   Male
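Each column of a data frame is a vector with its own type. This sketch checks the column types with sapply() and class(), reports the size with dim(), and prints a compact summary with str(). It assumes R 4.0 or later. Older versions converted strings to factors by default, so Name and Gender would show as "factor" there.

```r
data_frame_example <- data.frame(
  Name = c("John", "Jane", "Bob"),
  Age = c(30, 25, 35),
  Gender = c("Male", "Female", "Male")
)

# Each column is a vector with its own type
col_types <- sapply(data_frame_example, class)
col_types                  # Name "character", Age "numeric", Gender "character"

# Size of the table
dim(data_frame_example)    # 3 3 (rows, columns)

# str() gives a compact overview of the structure and column types
str(data_frame_example)
```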
A matrix or data frame is considered two-dimensional because it has rows and columns. To access a single element, we need to specify both the row and the column index, in the form [row, column].
For example, to access the element in the first row and second column of matrix_example (the value 3), we use:
> matrix_example[1, 2]
[1] 3
> data_frame_example[2,3]
[1] "Female"
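Note that asking for a column that does not exist in this way returns NULL rather than an error. A row that does not exist, such as data_frame_example[4, 3], returns NA instead.
> data_frame_example[2,4]
NULL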
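Leaving one index empty selects a whole row or column. Data frame columns can also be selected by name. The sketch below shows both, plus filtering rows with a logical condition. It recreates the objects from this section so that it runs on its own.

```r
matrix_example <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)
data_frame_example <- data.frame(
  Name = c("John", "Jane", "Bob"),
  Age = c(30, 25, 35),
  Gender = c("Male", "Female", "Male")
)

# An empty index selects the whole row or column
matrix_example[1, ]          # first row: 1 3 5
matrix_example[, 2]          # second column: 3 4

# Data frame columns can be selected by name
data_frame_example$Name      # "John" "Jane" "Bob"
data_frame_example[["Age"]]  # 30 25 35

# A logical condition in the row position filters rows
data_frame_example[data_frame_example$Age > 28, ]  # the rows for John and Bob
```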
Multidimensional objects: arrays and lists
Arrays are similar to matrices in that all elements must be of the same data type, but they can have more than two dimensions (a matrix is simply an array with exactly two dimensions):
# Creating an array
> array(c(1:12), dim = c(2, 3, 2))
, , 1

     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6

, , 2

     [,1] [,2] [,3]
[1,]    7    9   11
[2,]    8   10   12
array() is a function in R used to create arrays, and 1:12 creates a vector containing the numbers 1 through 12 (wrapping it in c(), as above, is harmless but not needed).
dim = c(2, 3, 2) specifies the dimensions of the array. In this case, the array will have 2 rows, 3 columns, and 2 "layers" (depth), and the values are filled column by column, one layer at a time.
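Indexing an array works like indexing a matrix, with one index per dimension. This sketch picks out a single value and a whole layer, checks the dimensions, and confirms that a matrix counts as an array. The name arr is illustrative.

```r
arr <- array(1:12, dim = c(2, 3, 2))

# One index per dimension: row, column, layer
arr[1, 2, 2]   # 9 (layer 2 holds 7 to 12, filled column by column)

# Leaving indices empty extracts a whole slice; here layer 1 as a matrix
arr[, , 1]

dim(arr)       # 2 3 2

# A matrix is the two-dimensional case of an array
is.array(matrix(1:6, nrow = 2))  # TRUE
```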
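Lists are the most versatile data structure, as they can hold elements of different data types, including other data structures such as vectors, matrices, and even other lists. Strictly speaking, a list is one-dimensional: it has a length but no dimensions. Because its elements can themselves be multidimensional, though, it is often used to hold complex, nested data. Lists are created using the list() function: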
# Creating a list
list_example <- list(
  numbers = c(1, 2, 3),
  characters = c("apple", "banana", "cherry"),
  matrix = matrix(c(1, 2, 3, 4), nrow = 2, ncol = 2)
)
> list_example
$numbers
[1] 1 2 3

$characters
[1] "apple" "banana" "cherry"

$matrix
     [,1] [,2]
[1,]    1    3
[2,]    2    4
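List elements are retrieved with [[ ]] or $. The sketch below shows both. It also shows that single brackets return a smaller list rather than the element itself, and that a list has a length but no dim attribute. It recreates list_example so that it runs on its own.

```r
list_example <- list(
  numbers = c(1, 2, 3),
  characters = c("apple", "banana", "cherry"),
  matrix = matrix(c(1, 2, 3, 4), nrow = 2, ncol = 2)
)

# [[ ]] and $ extract a single element in its own type
list_example[["numbers"]]       # the numeric vector 1 2 3
list_example$characters[2]      # "banana"
list_example$matrix[2, 2]       # 4

# Single brackets return a list containing the element
class(list_example["numbers"])  # "list"

# A list has a length but no dim attribute
length(list_example)            # 3
dim(list_example)               # NULL
```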