8 Data Structures

8.1 Overview

There are 5 main data structures in R:

Data Structure	Dimensionality	Contents	Notes
Vector	1D	homogeneous	the “base” object
Matrix	2D	homogeneous	a vector with 2 dimensions
Array	ND	homogeneous	a vector with N dimensions
List	1D; can be nested	heterogeneous	a collection of any R objects, each of any length
Data frame	2D	heterogeneous	a special kind of list: a collection of (column) vectors of any type, all of the same length

Vectors are homogeneous data structures, which means all of their elements must be of the same type (see Chapter 7), e.g. integer, double, character, logical.

Matrices and arrays are vectors with more dimensions, and as such, are also homogeneous.

Lists are the most flexible. Their elements can be any R objects, including lists, and therefore can be nested.

Data frames are a special kind of list. Their elements are one or more vectors, which can be of any type, and form columns. Therefore a data.frame is a two-dimensional data structure where rows typically correspond to cases (e.g. individuals) and columns represent variables. As such, data.frames are the most common data structure for statistical analysis.

R Data Structure summary - Best to read through this chapter first and then refer back to this figure

Tip

Check object class with class().

Check object class and contents’ types with str().

Caution

Many errors in R occur because a variable is, or gets coerced to, the wrong type or class by accident. That’s why it is essential to be able to:

check the type of a variable using typeof() or class()
convert (coerce) between types or classes using as.* functions
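
The check-then-coerce workflow above can be sketched as follows. This is a minimal illustration, and the variable names age and age_num are made up for the example; it assumes numbers that were accidentally stored as text:

```r
# Hypothetical example: numbers accidentally stored as character strings
age <- c("34", "41", "29")
typeof(age)       # "character"

# Coerce with an as.* function, then check the type again
age_num <- as.numeric(age)
typeof(age_num)   # "double"
class(age_num)    # "numeric"
```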

8.2 Vectors

A vector is the most basic and fundamental data structure in R. Other data structures are made up of one or more vectors.

x <- c(1, 3, 5, 7)
x

[1] 1 3 5 7

class(x)

[1] "numeric"

typeof(x)

[1] "double"

A vector has length() but no dim(), e.g.

length(x)

[1] 4

dim(x)

NULL

8.2.1 Initializing a vector

See Initializing vectors

8.3 Matrices

A matrix is a vector with 2 dimensions.

To create a matrix, you pass a vector to the matrix() function and specify number of rows using nrow and/or number of columns using ncol:

x <- matrix(21:50,
            nrow = 10, ncol = 3)
x

      [,1] [,2] [,3]
 [1,]   21   31   41
 [2,]   22   32   42
 [3,]   23   33   43
 [4,]   24   34   44
 [5,]   25   35   45
 [6,]   26   36   46
 [7,]   27   37   47
 [8,]   28   38   48
 [9,]   29   39   49
[10,]   30   40   50

class(x)

[1] "matrix" "array"

A matrix has length (length(x)) equal to the number of all (i, j) elements or nrow * ncol (if i is the row index and j is the column index) and dimensions (dim(x)) as expected:

length(x)

[1] 30

dim(x)

[1] 10  3

nrow(x)

[1] 10

ncol(x)

[1] 3
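
Since nrow and ncol can be given together or separately, a short sketch may help show that one of them is enough when the vector length is a multiple of it. The names a and b are arbitrary:

```r
# Only nrow is given; ncol is inferred as 30 / 10 = 3
a <- matrix(21:50, nrow = 10)

# Only ncol is given; nrow is inferred as 30 / 3 = 10
b <- matrix(21:50, ncol = 3)

dim(a)   # 10 3
dim(b)   # 10 3

# The length of a matrix is the product of its dimensions
length(a) == nrow(a) * ncol(a)   # TRUE
```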

8.3.1 Construct by row vs. by column

By default, matrices are filled by column (byrow = FALSE), e.g.

x <- matrix(1:20, nrow = 10, ncol = 2, byrow = FALSE)
x

      [,1] [,2]
 [1,]    1   11
 [2,]    2   12
 [3,]    3   13
 [4,]    4   14
 [5,]    5   15
 [6,]    6   16
 [7,]    7   17
 [8,]    8   18
 [9,]    9   19
[10,]   10   20

You can set the byrow argument to TRUE to fill the matrix by row instead:

x <- matrix(1:20, nrow = 10, ncol = 2, byrow = TRUE)
x

      [,1] [,2]
 [1,]    1    2
 [2,]    3    4
 [3,]    5    6
 [4,]    7    8
 [5,]    9   10
 [6,]   11   12
 [7,]   13   14
 [8,]   15   16
 [9,]   17   18
[10,]   19   20
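
One way to see the relationship between the two fill orders is that filling by row gives the transpose of a by-column fill with the dimensions swapped. The following is a small sketch with arbitrary variable names:

```r
# Fill a 10 x 2 matrix by row
by_row <- matrix(1:20, nrow = 10, ncol = 2, byrow = TRUE)

# Fill a 2 x 10 matrix by column (the default), then transpose it
by_col <- matrix(1:20, nrow = 2, ncol = 10)

all(by_row == t(by_col))   # TRUE
```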

8.3.2 Initialize a matrix

You can initialize a matrix with some constant value, e.g. 0:

x <- matrix(0, nrow = 6, ncol = 4)
x

     [,1] [,2] [,3] [,4]
[1,]    0    0    0    0
[2,]    0    0    0    0
[3,]    0    0    0    0
[4,]    0    0    0    0
[5,]    0    0    0    0
[6,]    0    0    0    0

Note

To initialize a matrix with NA values, it is most efficient to use NA of the appropriate type, e.g. NA_real_ for a numeric matrix, NA_character_ for a character matrix, etc. See NA types.

For example, to initialize a numeric matrix with NA values:

x <- matrix(NA_real_, nrow = 6, ncol = 4)
x

     [,1] [,2] [,3] [,4]
[1,]   NA   NA   NA   NA
[2,]   NA   NA   NA   NA
[3,]   NA   NA   NA   NA
[4,]   NA   NA   NA   NA
[5,]   NA   NA   NA   NA
[6,]   NA   NA   NA   NA
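
To illustrate why the NA type matters, the sketch below compares the storage type of matrices initialized with different NA constants. A plain NA is logical, so a matrix built from it would need to be converted when numbers or strings are later assigned to it:

```r
m_lgl <- matrix(NA, nrow = 2, ncol = 2)
m_dbl <- matrix(NA_real_, nrow = 2, ncol = 2)
m_chr <- matrix(NA_character_, nrow = 2, ncol = 2)

typeof(m_lgl)   # "logical"
typeof(m_dbl)   # "double"
typeof(m_chr)   # "character"
```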

8.3.3 Bind vectors by column or by row

Use cbind (“column-bind”) to convert a set of input vectors to columns of a matrix. The vectors should be of the same length (shorter vectors are recycled):

x <- cbind(1:10, 11:20, 41:50)
x

      [,1] [,2] [,3]
 [1,]    1   11   41
 [2,]    2   12   42
 [3,]    3   13   43
 [4,]    4   14   44
 [5,]    5   15   45
 [6,]    6   16   46
 [7,]    7   17   47
 [8,]    8   18   48
 [9,]    9   19   49
[10,]   10   20   50

class(x)

[1] "matrix" "array"

Similarly, you can use rbind (“row-bind”) to convert a set of input vectors to rows of a matrix. The vectors should again be of the same length:

x <- rbind(1:10, 11:20, 41:50)
x

     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    1    2    3    4    5    6    7    8    9    10
[2,]   11   12   13   14   15   16   17   18   19    20
[3,]   41   42   43   44   45   46   47   48   49    50

class(x)

[1] "matrix" "array"
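
As a small addition to the examples above, passing the vectors as named arguments gives the resulting matrix column names (with cbind) or row names (with rbind). The names a and b below are arbitrary:

```r
# Named arguments become column names with cbind ...
xc <- cbind(a = 1:3, b = 4:6)
colnames(xc)   # "a" "b"
dim(xc)        # 3 2

# ... and row names with rbind
xr <- rbind(a = 1:3, b = 4:6)
rownames(xr)   # "a" "b"
dim(xr)        # 2 3
```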

8.3.4 Combine matrices

cbind() and rbind() can also be used to combine two or more matrices, or vectors and matrices:

cbind(matrix(1, nrow = 5, ncol = 2), matrix(2, nrow = 5, ncol = 4))

     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    1    2    2    2    2
[2,]    1    1    2    2    2    2
[3,]    1    1    2    2    2    2
[4,]    1    1    2    2    2    2
[5,]    1    1    2    2    2    2
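
The vector-plus-matrix case can be sketched like this, using arbitrary variable names. The vector is treated as an extra column by cbind() or as an extra row by rbind():

```r
m <- matrix(0, nrow = 5, ncol = 2)

# Add a vector as a new first column: result is 5 x 3
m_cols <- cbind(1:5, m)
dim(m_cols)   # 5 3

# Add a vector as a new last row: result is 6 x 2
m_rows <- rbind(m, c(9, 9))
dim(m_rows)   # 6 2
```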

8.4 Arrays

Arrays are vectors with dimensions.
You can have 1D, 2D or any number of dimensions, i.e. ND arrays.

8.4.1 One-dimensional (“1D”) array

A 1D array is just like a vector but of class array and with dim(x) equal to length(x). Remember, vectors have only length(x) and undefined dim(x).

x <- 1:10
xa <- array(1:10, dim = 10)
class(x)

[1] "integer"

is.vector(x)

[1] TRUE

length(x)

[1] 10

dim(x)

NULL

class(xa)

[1] "array"

is.vector(xa)

[1] FALSE

length(xa)

[1] 10

dim(xa)

[1] 10

It is rather unlikely you will need to use a 1D array instead of a vector.

8.4.2 Two-dimensional (“2D”) array

A 2D array is a matrix:

x <- array(1:40, dim = c(10, 4))
class(x)

[1] "matrix" "array"

dim(x)

[1] 10  4

8.4.3 Multi-dimensional (“ND”) array

You can build an N-dimensional array:

x <- array(1:60, dim = c(5, 4, 3))
x

, , 1

     [,1] [,2] [,3] [,4]
[1,]    1    6   11   16
[2,]    2    7   12   17
[3,]    3    8   13   18
[4,]    4    9   14   19
[5,]    5   10   15   20

, , 2

     [,1] [,2] [,3] [,4]
[1,]   21   26   31   36
[2,]   22   27   32   37
[3,]   23   28   33   38
[4,]   24   29   34   39
[5,]   25   30   35   40

, , 3

     [,1] [,2] [,3] [,4]
[1,]   41   46   51   56
[2,]   42   47   52   57
[3,]   43   48   53   58
[4,]   44   49   54   59
[5,]   45   50   55   60

class(x)

[1] "array"

You can provide names for each dimension using the dimnames argument. It accepts a list where each element is a character vector of length equal to the length of the corresponding dimension. Using the same example as above, we pass three character vectors of length 5, 4, and 3 to match the lengths of the dimensions:

x <- array(1:60,
            dim = c(5, 4, 3),
            dimnames = list(letters[1:5],
                            c("alpha", "beta", "gamma", "delta"),
                            c("x", "y", "z")))
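
Once dimension names are set, they can be inspected with dimnames() and used to pick out elements by name instead of by position. The following sketch rebuilds the same array so that it runs on its own:

```r
x <- array(1:60,
           dim = c(5, 4, 3),
           dimnames = list(letters[1:5],
                           c("alpha", "beta", "gamma", "delta"),
                           c("x", "y", "z")))

# Names of the second dimension
dimnames(x)[[2]]      # "alpha" "beta" "gamma" "delta"

# Indexing by name and by position returns the same element
x["a", "beta", "z"]   # 46
x[1, 2, 3]            # 46
```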

3D arrays can be used to represent color images. Here, just for fun, we use rasterImage() to show how you would visualize such an image:

x <- array(sample(0:255, size = 12 * 12 * 3, replace = TRUE), dim = c(12, 12, 3))
par("pty")

[1] "m"

par(pty = "s")
plot(NULL, NULL,
     xlim = c(0, 100), ylim = c(0, 100),
     axes = FALSE, ann = FALSE, pty = "s")
rasterImage(x / 255, xleft = 0, ybottom = 0, xright = 100, ytop = 100)

8.5 Lists

To define a list, we use list() to pass any number of objects.
If these objects are passed as named arguments, the names will be used as element names:

x <- list(one = 1:4,
          two = sample(seq(0, 100, by = .1), size = 10),
          three = c("mango", "banana", "tangerine"),
          four = median)
class(x)

[1] "list"

str(x)

List of 4
 $ one  : int [1:4] 1 2 3 4
 $ two  : num [1:10] 60.5 49.8 11.6 79.4 4.7 84.1 54 33.8 59.8 28.1
 $ three: chr [1:3] "mango" "banana" "tangerine"
 $ four :function (x, na.rm = FALSE, ...)

length(x)

[1] 4

8.5.1 Nested lists

Since each element can be any object, we can build nested lists:

x <- list(alpha = letters[sample(26, size = 4)],
          beta = sample(12),
          gamma = list(i = rnorm(10),
                       j = runif(10),
                       k = seq(0, 1000, length.out = 10)))
x

$alpha
[1] "m" "y" "i" "s"

$beta
 [1]  7 12  4  3  2  9 10 11  8  5  1  6

$gamma
$gamma$i
 [1] -0.49950479  0.76008816  0.66291184  1.57208812 -0.32377761 -1.54783749
 [7]  0.85863480 -0.42780248 -0.01446595  1.22413819

$gamma$j
 [1] 0.4918345 0.6308624 0.4534827 0.6671955 0.6290164 0.5113424 0.6185237
 [8] 0.6215820 0.4426468 0.2028927

$gamma$k
 [1]    0.0000  111.1111  222.2222  333.3333  444.4444  555.5556  666.6667
 [8]  777.7778  888.8889 1000.0000

In the example above, alpha, beta, and gamma are x’s elements. Notice how the length of the list refers to the number of these top-level elements:

length(x)

[1] 3
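
To look one level deeper, length() can be applied to a nested element, and lengths() reports the length of each top-level element. The sketch below uses fixed values in place of the random ones above so that the results are reproducible:

```r
# Same structure as above, with fixed instead of random values
x <- list(alpha = c("m", "y", "i", "s"),
          beta = 1:12,
          gamma = list(i = 1:10, j = 1:10, k = 1:10))

length(x)         # 3: top-level elements only
length(x$gamma)   # 3: elements of the nested list
lengths(x)        # alpha 4, beta 12, gamma 3
```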

8.5.2 Initialize a list

When setting up experiments, it can be very convenient to set up an empty list where results will be stored (e.g. using a for-loop):

x <- vector("list", length = 4)
x

[[1]]
NULL

[[2]]
NULL

[[3]]
NULL

[[4]]
NULL

length(x)

[1] 4
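
A minimal sketch of the for-loop pattern mentioned above follows. The name results and the squared values are placeholders standing in for whatever each iteration of an experiment would produce:

```r
# Pre-allocate a list with one slot per iteration
results <- vector("list", length = 4)

for (i in seq_along(results)) {
  # Placeholder computation; store the result in slot i
  results[[i]] <- i^2
}

results[[4]]   # 16
```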

8.5.3 Add element to a list

You can add new elements to a list by assigning directly to an element that doesn’t yet exist, which will cause it to be created:

x <- list(a = 1:10, b = rnorm(10))
x

$a
 [1]  1  2  3  4  5  6  7  8  9 10

$b
 [1] -0.6794488 -1.1206218  2.0765351  1.5620331 -1.7668615 -0.1810182
 [7]  0.3050708  0.8432511  0.2231490  0.7249860

x$c <- 30:21
x

$a
 [1]  1  2  3  4  5  6  7  8  9 10

$b
 [1] -0.6794488 -1.1206218  2.0765351  1.5620331 -1.7668615 -0.1810182
 [7]  0.3050708  0.8432511  0.2231490  0.7249860

$c
 [1] 30 29 28 27 26 25 24 23 22 21

8.5.4 Combine lists

You can combine lists with c(), just like vectors:

l1 <- list(q = 11:14, r = letters[11:14])
l2 <- list(s = LETTERS[21:24], t = 100:97)
x <- c(l1, l2)
x

$q
[1] 11 12 13 14

$r
[1] "k" "l" "m" "n"

$s
[1] "U" "V" "W" "X"

$t
[1] 100  99  98  97

length(x)

[1] 4
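
For contrast, wrapping the two lists in list() instead of c() does not merge them: it creates a new list with two elements, each of which is itself a list. A short sketch, where nested is an arbitrary name:

```r
l1 <- list(q = 11:14, r = letters[11:14])
l2 <- list(s = LETTERS[21:24], t = 100:97)

# c() concatenates the elements of both lists
length(c(l1, l2))    # 4

# list() nests the two lists instead
nested <- list(l1, l2)
length(nested)       # 2
names(nested[[1]])   # "q" "r"
```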

8.6 Combining different types with `c()`

It’s best to use c() to either combine elements of the same type into a vector, or to combine lists.

As we’ve seen, if all arguments passed to c() are of a single type, you get a vector of that type:

x <- c(12.9, 94.67, 23.74, 46.901)
x

[1] 12.900 94.670 23.740 46.901

class(x)

[1] "numeric"

If arguments passed to c() are a mix of numeric and character, they all get coerced to character.

(x <- c(23.54, "mango", "banana", 75))

[1] "23.54"  "mango"  "banana" "75"

class(x)

[1] "character"
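
The same kind of coercion applies to other mixes of atomic types: c() converts all elements to the most general type present, in the order logical, integer, double, character. A brief sketch with arbitrary variable names:

```r
lgl_int <- c(TRUE, 2L)    # logical + integer
int_dbl <- c(2L, 3.5)     # integer + double
dbl_chr <- c(3.5, "a")    # double + character

typeof(lgl_int)   # "integer" (TRUE becomes 1L)
typeof(int_dbl)   # "double"
typeof(dbl_chr)   # "character"
```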

If you pass objects that cannot be coerced to a common atomic type (here, a function), you get a list, since it is the only structure that can hold all of them together:

(x <- c(42, mean, "potatoes"))

[[1]]
[1] 42

[[2]]
function (x, ...) 
UseMethod("mean")
<bytecode: 0x14121f358>
<environment: namespace:base>

[[3]]
[1] "potatoes"

class(x)

[1] "list"

8.7 Data frames

Note

A data frame is a special type of list where each element has the same length and forms a column, resulting in a 2D structure. Unlike matrices, each column can contain a different data type.

data.frames are usually created with named elements:

x <- data.frame(Feat_1 = 1:5,
                Feat_2 = rnorm(5),
                Feat_3 = paste0("rnd_", sample(seq(100), size = 5)))
x

  Feat_1      Feat_2 Feat_3
1      1  0.77235502 rnd_88
2      2 -0.03160052 rnd_97
3      3 -0.25616570 rnd_89
4      4  0.39471538  rnd_2
5      5  0.95879235 rnd_37

class(x)

[1] "data.frame"

str(x)

'data.frame':   5 obs. of  3 variables:
 $ Feat_1: int  1 2 3 4 5
 $ Feat_2: num  0.7724 -0.0316 -0.2562 0.3947 0.9588
 $ Feat_3: chr  "rnd_88" "rnd_97" "rnd_89" "rnd_2" ...

class(x$Feat_1)

[1] "integer"

Note

Unlike a matrix, the elements of a data.frame are its columns, not the individual values in each position. Therefore the length of a data.frame is equal to the number of columns.

mat <- matrix(1:100, nrow = 10)
length(mat)

[1] 100

df <- as.data.frame(mat)
length(df)

[1] 10
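
The list nature of a data frame can also be checked directly. The sketch below uses a non-square matrix (an assumption made here only so that rows and columns are easy to tell apart) and arbitrary variable names:

```r
mat2 <- matrix(1:20, nrow = 10)   # 10 rows, 2 columns
dat <- as.data.frame(mat2)

is.list(dat)                  # TRUE: a data frame is a list of columns
length(mat2)                  # 20: number of individual values
length(dat)                   # 2: number of columns
length(dat) == ncol(dat)      # TRUE
nrow(dat)                     # 10
```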

Just like with lists, you can add new columns to a data.frame using assignment to a new element, i.e. column:

x <- data.frame(PIDN = sample(8001:9000, size = 10, replace = TRUE),
                Age = rnorm(10, mean = 48, sd = 2.9))
x

   PIDN      Age
1  8139 49.57915
2  8834 45.64672
3  8607 49.54766
4  8543 46.75052
5  8272 42.48355
6  8775 50.98014
7  8428 48.90046
8  8612 47.83906
9  8322 47.71173
10 8116 47.01615

x$Weight <- rnorm(10, mean = 84, sd = 1.5)
x

   PIDN      Age   Weight
1  8139 49.57915 80.62033
2  8834 45.64672 80.91312
3  8607 49.54766 82.99770
4  8543 46.75052 85.53435
5  8272 42.48355 83.88932
6  8775 50.98014 81.48952
7  8428 48.90046 85.75131
8  8612 47.83906 83.07899
9  8322 47.71173 83.23572
10 8116 47.01615 82.05945
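
Because all columns of a data frame must have the same length, assigning a new column of an incompatible length fails. The following sketch uses made-up values and wraps the assignment in tryCatch() so that the error is captured instead of stopping the script:

```r
x <- data.frame(PIDN = 8001:8004,
                Age = c(49, 45, 50, 47))

# Try to add a column with 3 values to a data frame with 4 rows
res <- tryCatch({
  x$Weight <- c(80, 81, 82)
  "assigned"
}, error = function(e) "error")

res       # "error"
ncol(x)   # 2: the data frame is unchanged
```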

8.8 Generating sequences

Other than assigning individual elements explicitly with c(), there are multiple ways to create numeric sequences.

Colon notation allows generating a simple integer sequence:

x <- 1:5
x

[1] 1 2 3 4 5

typeof(x)

[1] "integer"

seq(from, to, by)

seq(1, 10, by = .5)

 [1]  1.0  1.5  2.0  2.5  3.0  3.5  4.0  4.5  5.0  5.5  6.0  6.5  7.0  7.5  8.0
[16]  8.5  9.0  9.5 10.0

seq(from, to, length.out = n)

seq(-5, 12, length.out = 11)

 [1] -5.0 -3.3 -1.6  0.1  1.8  3.5  5.2  6.9  8.6 10.3 12.0

seq(object) generates a sequence of length equal to length(object)

seq(iris)

[1] 1 2 3 4 5

seq_along(object) is the optimized version of seq(object):

seq_along(iris)

[1] 1 2 3 4 5

seq(n) is equivalent to 1:n

seq(12)

 [1]  1  2  3  4  5  6  7  8  9 10 11 12

# same output as
1:12

 [1]  1  2  3  4  5  6  7  8  9 10 11 12

seq_len(n) is an optimized version of seq(n):

seq_len(12)

 [1]  1  2  3  4  5  6  7  8  9 10 11 12
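
One practical difference is worth a short sketch: when n is 0, 1:n counts down from 1 to 0, while seq_len(n) returns an empty vector. This matters in loops that should not run at all when there is nothing to iterate over:

```r
n <- 0

1:n                  # 1 0: two values, counting down
seq_len(n)           # integer(0): empty

length(1:n)          # 2
length(seq_len(n))   # 0
```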

8.9 Naming object elements

All objects’ elements can be named.

8.9.1 Vectors

You can create a vector with named elements:

SBP <- c(before = 179, after = 118)
SBP

before  after 
   179    118

Use names() to get a vector’s elements’ names:

names(SBP)

[1] "before" "after"

You can add names to an existing, unnamed, vector:

N <- c(112, 120)
names(N)

NULL

names(N) <- c("Cases", "Controls")
N

   Cases Controls 
     112      120

Matrices and data frames can have column names (colnames) and row names (rownames):

xm <- matrix(1:15, nrow = 5)
xdf <- as.data.frame(xm)
colnames(xm)

NULL

colnames(xdf)

[1] "V1" "V2" "V3"

rownames(xm)

NULL

colnames(xm) <- colnames(xdf) <- paste0("Feature", seq(3))
rownames(xm) <- rownames(xdf) <- paste0("Case", seq(5))
xm

      Feature1 Feature2 Feature3
Case1        1        6       11
Case2        2        7       12
Case3        3        8       13
Case4        4        9       14
Case5        5       10       15

xdf

      Feature1 Feature2 Feature3
Case1        1        6       11
Case2        2        7       12
Case3        3        8       13
Case4        4        9       14
Case5        5       10       15

Lists are vectors, so their elements can have names. These can be defined when a list is created using name-value pairs, or added or changed at any time.

x <- list(HospitalName = "CaliforniaGeneral",
          ParticipatingDepartments = c("Neurology", "Psychiatry", "Neurosurgery"),
          PatientIDs = 1001:1018)
names(x)

[1] "HospitalName"             "ParticipatingDepartments"
[3] "PatientIDs"

Add/Change names:

names(x) <- c("Hospital", "Departments", "PIDs")
x

$Hospital
[1] "CaliforniaGeneral"

$Departments
[1] "Neurology"    "Psychiatry"   "Neurosurgery"

$PIDs
 [1] 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015
[16] 1016 1017 1018

Remember that a data frame is a special type of list. Therefore in data frames colnames and names are equivalent:

colnames(iris)

[1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"  "Species"

names(iris)

[1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"  "Species"

Tip

As we saw, matrices have colnames() and rownames(). Using names() on a matrix will assign names to individual elements, as if it were a long vector.
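
A small sketch of this behavior, using an arbitrary 2 x 2 matrix: the names are attached to the four individual values, while the column names remain unset.

```r
m <- matrix(1:4, nrow = 2)

# names() labels the individual elements, in column-major order
names(m) <- c("a", "b", "c", "d")

is.matrix(m)   # TRUE: still a matrix
names(m)       # "a" "b" "c" "d"
colnames(m)    # NULL: column names are not affected
```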

8.10 Initialize - coerce - test data structures

The following table lists the functions to initialize, coerce (=convert), and test the core data structures, which are shown in more detail in the following paragraphs:

Initialize	Coerce	Test
`matrix(NA, nrow = x, ncol = y)`	`as.matrix(x)`	`is.matrix(x)`
`array(NA, dim = c(x, y, z))`	`as.array(x)`	`is.array(x)`
`vector(mode = "list", length = x)`	`as.list(x)`	`is.list(x)`
`data.frame(matrix(NA, x, y))`	`as.data.frame(x)`	`is.data.frame(x)`
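
The functions in the table can be tried out together as follows. This is a brief sketch with arbitrary object names and dimensions:

```r
# Initialize and test a matrix
m <- matrix(NA, nrow = 2, ncol = 3)
is.matrix(m)        # TRUE

# Coerce the matrix to a data frame
d <- as.data.frame(m)
is.data.frame(d)    # TRUE
is.matrix(d)        # FALSE

# Coerce the data frame to a plain list of its columns
l <- as.list(d)
is.list(l)          # TRUE
is.data.frame(l)    # FALSE

# A 3D array is an array but not a matrix
a <- array(NA, dim = c(2, 3, 4))
is.array(a)         # TRUE
is.matrix(a)        # FALSE
```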

8.11 Attributes

R objects may have some built-in attributes, but you can also add arbitrary attributes to any R object. These are used to store additional information, sometimes called metadata.

8.11.1 Print all attributes

To print an object’s attributes, use attributes:

attributes(iris)

$names
[1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"  "Species"     

$class
[1] "data.frame"

$row.names
  [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
 [19]  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36
 [37]  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54
 [55]  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72
 [73]  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90
 [91]  91  92  93  94  95  96  97  98  99 100 101 102 103 104 105 106 107 108
[109] 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126
[127] 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144
[145] 145 146 147 148 149 150

This returns a named list. In this case we got names, class, and row.names of the iris data frame.

8.11.2 Get or set specific attributes

You can assign new attributes using attr:

(x <- c(1:10))

 [1]  1  2  3  4  5  6  7  8  9 10

attr(x, "name") <- "Very special vector"

Printing the vector after adding a new attribute prints the attribute name and value underneath the vector itself:

x

 [1]  1  2  3  4  5  6  7  8  9 10
attr(,"name")
[1] "Very special vector"

Our trusty str function will print attributes as well:

str(x)

 int [1:10] 1 2 3 4 5 6 7 8 9 10
 - attr(*, "name")= chr "Very special vector"
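
The same attr() function also retrieves a single attribute, and assigning NULL removes it. A minimal sketch follows; the attribute name "units" is hypothetical and used only to show what happens when an attribute has not been set:

```r
x <- 1:10
attr(x, "name") <- "Very special vector"

# Get one attribute by name
attr(x, "name")    # "Very special vector"

# An attribute that was never set returns NULL
attr(x, "units")   # NULL

# Remove the attribute by assigning NULL
attr(x, "name") <- NULL
attributes(x)      # NULL
```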

8.11.2.1 A matrix is a vector - a closer look

Let’s see how a matrix is literally just a vector with assigned dimensions.
Start with a vector of length 20:

x <- 1:20
x

 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20

The vector has no attributes - yet:

attributes(x)

NULL

To convert to a matrix, we would normally pass our vector to the matrix() function and define number of rows and/or columns:

xm <- matrix(x, nrow = 5)
xm

     [,1] [,2] [,3] [,4]
[1,]    1    6   11   16
[2,]    2    7   12   17
[3,]    3    8   13   18
[4,]    4    9   14   19
[5,]    5   10   15   20

attributes(xm)

$dim
[1] 5 4

Just for demonstration, let’s instead directly add a dimension attribute to our vector:

attr(x, "dim") <- c(5, 4)
x

     [,1] [,2] [,3] [,4]
[1,]    1    6   11   16
[2,]    2    7   12   17
[3,]    3    8   13   18
[4,]    4    9   14   19
[5,]    5   10   15   20

class(x)

[1] "matrix" "array"

Just like that, we have created a matrix.
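
The dim() replacement function offers an equivalent and more common way to set the same attribute, and setting it to NULL turns the matrix back into a plain vector. A short sketch:

```r
x <- 1:20

# Equivalent to attr(x, "dim") <- c(5, 4)
dim(x) <- c(5, 4)
is.matrix(x)   # TRUE

# Removing the dim attribute gives back a plain vector
dim(x) <- NULL
is.matrix(x)   # FALSE
is.vector(x)   # TRUE
```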
