A data.frame is a special kind of list: it is rectangular. Each element (column) of the list has same length, and where each row has a "row name". Each column has its own class, but the class of one column can be different from the class of another column (unlike a matrix, where all elements must have the same class).
In principle, a data.frame could have no rows and no columns:
> structure(list(character()), class = "data.frame")
NULL
<0 rows> (or 0-length row.names)
But this is unusual. It is more common for a data.frame to have many columns and many rows. Here is a data.frame with three rows and two columns (a
is numeric class and b
is character class):
> structure(list(a = 1:3, b = letters[1:3]), class = "data.frame")
[1] a b
<0 rows> (or 0-length row.names)
In order for the data.frame to print, we will need to supply some row names. Here we use just the numbers 1:3:
> structure(list(a = 1:3, b = letters[1:3]), class = "data.frame", row.names = 1:3)
a b
1 1 a
2 2 b
3 3 c
Now it becomes obvious that we have a data.frame with 3 rows and 2 columns. You can check this using nrow()
, ncol()
, and dim()
:
> x <- structure(list(a = numeric(3), b = character(3)), class = "data.frame", row.names = 1:3)
> nrow(x)
[1] 3
> ncol(x)
[1] 2
> dim(x)
[1] 3 2
R provides two other functions (besides structure()
) that can be used to create a data.frame. The first is called, intuitively, data.frame()
. It checks to make sure that the column names you supplied are valid, that the list elements are all the same length, and supplies some automatically generated row names. This means that the output of data.frame()
might now always be exactly what you expect:
> str(data.frame("a a a" = numeric(3), "b-b-b" = character(3)))
'data.frame': 3 obs. of 2 variables:
$ a.a.a: num 0 0 0
$ b.b.b: Factor w/ 1 level "": 1 1 1
The other function is called as.data.frame()
. This can be used to coerce an object that is not a data.frame into being a data.frame by running it through data.frame()
. As an example, consider a matrix:
> m <- matrix(letters[1:9], nrow = 3)
> m
[,1] [,2] [,3]
[1,] "a" "d" "g"
[2,] "b" "e" "h"
[3,] "c" "f" "i"
And the result:
> as.data.frame(m)
V1 V2 V3
1 a d g
2 b e h
3 c f i
> str(as.data.frame(m))
'data.frame': 3 obs. of 3 variables:
$ V1: Factor w/ 3 levels "a","b","c": 1 2 3
$ V2: Factor w/ 3 levels "d","e","f": 1 2 3
$ V3: Factor w/ 3 levels "g","h","i": 1 2 3