4 Basic data types
We have already seen that logical values can be used to subset a data frame, and that all the values in a given column of a data frame must be of the same type, or class. But what does this mean?
4.1 Understanding class
R has the following basic data classes:
- numeric (includes integer and double)
- character
- logical
- complex
- raw
Generally, in bioinformatics, values belong to one of the first three classes. Read more about the complex and raw data types here.
## [1] "numeric"
## [1] "character"
## [1] "logical"
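The three outputs above come from R's class() function. As a minimal sketch, the block below calls class() on literal values rather than on columns of the birthweight data. The variable names x_num, x_int, x_chr and x_lgl are hypothetical. It also shows that a whole number written with an L suffix reports its class as "integer", although it still counts as numeric:

```r
# A few literal values (illustrative, not taken from the birthweight data frame)
x_num <- 3.23
x_int <- 31L        # the L suffix stores a whole number as an integer
x_chr <- "Memorial"
x_lgl <- TRUE

class(x_num)        # "numeric"
class(x_int)        # "integer"
class(x_chr)        # "character"
class(x_lgl)        # "logical"
is.numeric(x_int)   # TRUE: integers also belong to the numeric category
```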
The numeric category is fairly self-explanatory. What are character and logical?
Character values are exactly what they sound like: strings of stored characters (letters, numbers, or both). In the birthweight table, the “birth.date” and “location” columns contain character values.
## [1] "General" "Silver Hill" "Silver Hill" "Silver Hill" "Memorial"
## [6] "Memorial"
Characters are recognizable by the quotation marks that appear around them in the output. R cannot perform mathematical operations on numbers stored as characters.
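The sketch below shows that limitation using the first three birth weights from the table, typed in as text. The names weight_chr, msg and weight_num are hypothetical. tryCatch() captures the error so the example keeps running, and as.numeric() converts the text to numbers before any arithmetic:

```r
# Birth weights stored as text (first three values from the birthweight column)
weight_chr <- c("3.23", "3.03", "3.35")
class(weight_chr)   # "character"

# Arithmetic on character values fails; capture the error message instead of stopping
msg <- tryCatch(weight_chr + 1, error = function(e) conditionMessage(e))
msg                 # "non-numeric argument to binary operator" (English locale)

# Converting to numeric first makes arithmetic possible
weight_num <- as.numeric(weight_chr)
weight_num + 1      # 4.23 4.03 4.35
```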
Logical values are TRUE, FALSE, or NA (missing). They are most often produced by comparing one value to another with a relational operator.
The relational operators in R are:

| Operator | Meaning |
|---|---|
| > | greater than |
| >= | greater than or equal to |
| < | less than |
| <= | less than or equal to |
| == | equal to |
| != | not equal to |
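Each relational operator compares its inputs element by element and returns a logical vector of the same length. That is what makes these operators useful for choosing rows. The minimal sketch below uses the first four head circumference values from the birthweight table and a few hospital names. The variable names head_circ and location are hypothetical:

```r
# First four head.circumference values from the birthweight table
head_circ <- c(36, 35, 33, 37)

head_circ > 35    # TRUE FALSE FALSE TRUE
head_circ >= 35   # TRUE TRUE FALSE TRUE
head_circ == 33   # FALSE FALSE TRUE FALSE
head_circ != 33   # TRUE TRUE FALSE TRUE

# The same operators also work on character values
location <- c("General", "Silver Hill", "Memorial")
location == "Memorial"   # FALSE FALSE TRUE
```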
birthweight[birthweight$head.circumference > 35, c("length", "weeks.gestation", "maternal.height", "paternal.height")]
| | length | weeks.gestation | maternal.height | paternal.height |
---|---|---|---|---|
1 | 52 | 38 | 164 | NA |
4 | 53 | 41 | 161 | 175 |
7 | 52 | 40 | 170 | 181 |
15 | 53 | 40 | 171 | 183 |
16 | 53 | 40 | 170 | 185 |
18 | 49 | 40 | 152 | 170 |
20 | 58 | 41 | 173 | 180 |
21 | 54 | 38 | 172 | 172 |
23 | 52 | 39 | 170 | 178 |
25 | 51 | 38 | 165 | NA |
31 | 58 | 41 | 172 | 185 |
33 | 51 | 40 | 168 | 181 |
34 | 51 | 39 | 157 | NA |
35 | 54 | 42 | 175 | 184 |
42 | 53 | 44 | 174 | 189 |
| | location | maternal.age | paternal.age |
---|---|---|---|
11 | Memorial | 20 | 22 |
14 | Memorial | 19 | 20 |
15 | Silver Hill | 19 | 19 |
16 | Memorial | 20 | 24 |
21 | Silver Hill | 18 | 20 |
22 | Silver Hill | 20 | 23 |
26 | General | 20 | 23 |
28 | General | 20 | 20 |
37 | Silver Hill | 20 | 20 |
39 | General | 19 | NA |
42 | Silver Hill | 20 | 26 |
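Notice that when R compares a number with a missing value, the result is itself a missing value (NA). When a comparison like this is used to subset a data frame, each NA produces a row filled with NAs, as in the output below.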
| | ID | paternal.age | paternal.education | paternal.cigarettes | paternal.height |
---|---|---|---|---|---|
NA | NA | NA | NA | NA | NA |
NA.1 | NA | NA | NA | NA | NA |
7 | 365 | 30 | 10 | 25 | 181 |
24 | 321 | 39 | 10 | 0 | 171 |
NA.2 | NA | NA | NA | NA | NA |
26 | 1360 | 23 | 10 | 35 | 179 |
28 | 1363 | 20 | 10 | 35 | 185 |
NA.3 | NA | NA | NA | NA | NA |
36 | 1191 | 21 | 10 | 25 | 185 |
37 | 431 | 20 | 10 | 35 | 180 |
NA.4 | NA | NA | NA | NA | NA |
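The NA rows come from the missing values in the compared column. The sketch below rebuilds that behavior on a hypothetical five-row data frame, fathers, whose ID and paternal.education values are copied from the birthweight table. It also shows one common workaround: wrapping the comparison in which(), which returns only the positions that are TRUE and so drops the NA rows. This is one option, not the only one:

```r
# Hypothetical five-row data frame; values copied from the birthweight table
fathers <- data.frame(
  ID = c(1107, 1636, 365, 321, 1369),
  paternal.education = c(NA, NA, 10, 10, 16)
)

fathers$paternal.education == 10
# NA NA TRUE TRUE FALSE: comparing with NA gives NA

# Using that vector as a row index returns a row of NAs for every NA
nrow(fathers[fathers$paternal.education == 10, ])          # 4

# which() keeps only the positions that are TRUE, dropping the NAs
which(fathers$paternal.education == 10)                    # 3 4
nrow(fathers[which(fathers$paternal.education == 10), ])   # 2
```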
## [1] 38 39 41 41 39 39 34 38 38 38 41 37 39 41 38 35 39 37 38 44 41 37 41 41 35
## [26] 39 42 42 33 33 39 45 44
| | ID | birth.date | location | length | birthweight | head.circumference | weeks.gestation | smoker | maternal.age | maternal.cigarettes | maternal.height | maternal.prepregnant.weight | paternal.age | paternal.education | paternal.cigarettes | paternal.height | low.birthweight | geriatric.pregnancy |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 1107 | 1/25/1967 | General | 52 | 3.23 | 36 | 38 | no | 31 | 0 | 164 | 57 | NA | NA | NA | NA | 0 | FALSE |
17 | 820 | 10/7/1967 | General | 52 | 3.77 | 34 | 40 | no | 24 | 0 | 157 | 50 | 31 | 16 | 0 | 173 | 0 | FALSE |
18 | 752 | 10/19/1967 | General | 49 | 3.32 | 36 | 40 | yes | 27 | 12 | 152 | 48 | 37 | 12 | 25 | 170 | 0 | FALSE |
26 | 1360 | 2/16/1968 | General | 56 | 4.55 | 34 | 44 | no | 20 | 0 | 162 | 57 | 23 | 10 | 35 | 179 | 0 | FALSE |
28 | 1363 | 4/2/1968 | General | 48 | 2.37 | 30 | 37 | yes | 20 | 7 | 163 | 47 | 20 | 10 | 35 | 185 | 1 | FALSE |
33 | 1088 | 7/24/1968 | General | 51 | 3.27 | 36 | 40 | no | 24 | 0 | 168 | 53 | 29 | 16 | 0 | 181 | 0 | FALSE |
36 | 1191 | 9/7/1968 | General | 53 | 3.65 | 33 | 42 | no | 21 | 0 | 165 | 61 | 21 | 10 | 25 | 185 | 0 | FALSE |
39 | 1600 | 10/9/1968 | General | 53 | 2.90 | 34 | 39 | no | 19 | 0 | 165 | 57 | NA | NA | NA | NA | 0 | FALSE |
40 | 532 | 10/25/1968 | General | 53 | 3.59 | 34 | 40 | yes | 31 | 12 | 163 | 49 | 41 | 12 | 50 | 191 | 0 | FALSE |
41 | 223 | 12/11/1968 | General | 50 | 3.87 | 33 | 45 | yes | 28 | 25 | 163 | 54 | 30 | 16 | 0 | 183 | 0 | FALSE |
Many of R’s functions also return logical values.
## [1] TRUE
## [1] FALSE
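The code behind the TRUE and FALSE above is not shown. As an illustration, each base R function call below returns a logical value. The vector ages and the result missing_age are hypothetical names:

```r
is.character("Memorial")   # TRUE
is.character(3.23)         # FALSE

# is.na() returns one logical value per element
ages <- c(31, NA, 27)
missing_age <- is.na(ages)
missing_age                # FALSE TRUE FALSE

# %in% tests whether a value appears in a set of values
"General" %in% c("General", "Memorial", "Silver Hill")   # TRUE
```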
4.2 Coercion: converting between classes
The birthweight data frame has three columns that should probably be logical values: “smoker”, “low.birthweight”, and “geriatric.pregnancy”. Each of these records the answer to a yes/no question, which can be stored as TRUE/FALSE. However, only “geriatric.pregnancy” is stored as a logical value. Storing “smoker” and “low.birthweight” as logical values would be more useful, because a logical column can be used directly to subset the data frame.
Changing the class of data is known as coercion.
## [1] FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
## [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE
## [25] FALSE FALSE FALSE TRUE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE
## [37] TRUE TRUE FALSE FALSE FALSE FALSE
## [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## [26] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
The as.logical() function converted “low.birthweight” to a logical vector, but could not convert “smoker”. Instead, it returned a vector of missing values, denoted by NA. Why is this?
The coercion rule in R is as follows:
logical < integer < numeric < complex < character
These classes are ordered from least to most general. R can convert logical values to integers, store integers as the more general numeric type, or represent numeric data as characters. However, these coercion operations cannot always be reversed without losing information.
## [1] 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0
## [39] 0 0 0 0
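The rule can be seen with literal values. In this sketch, c() coerces mixed inputs to the most general class present. Converting back toward the specific end produces NA for anything R cannot interpret. as.logical() recognises only the strings "TRUE", "true", "True", "T" and their FALSE counterparts, so values such as “yes” and “no” become NA:

```r
# c() combines values into one vector of the most general class present
class(c(TRUE, 2L))     # "integer"   (TRUE becomes 1L)
class(c(2L, 3.5))      # "numeric"
class(c(3.5, "yes"))   # "character" (3.5 becomes "3.5")

# Moving toward the general end always works
as.character(as.numeric(TRUE))   # "1"

# Moving back toward the specific end can lose information
as.logical(c("TRUE", "false", "T", "yes", "no"))   # TRUE FALSE TRUE NA NA
suppressWarnings(as.numeric("Memorial"))           # NA (normally with a warning)
```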
The as.logical() function only converts “low.birthweight” the way we want because the data was encoded as 0s and 1s. as.logical() turns 0 into FALSE and every other number into TRUE, so if any other numbers had been used, the results might be unexpected.
## [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [16] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [31] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
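The minimal sketch below shows the rule that produces results like that: as.logical() maps 0 to FALSE and any other number to TRUE. The vector ids is a hypothetical name holding the first three ID values from the table:

```r
# 0 becomes FALSE; every other number, including negatives and fractions, becomes TRUE
as.logical(c(0, 1, 2, -1, 0.5))   # FALSE TRUE TRUE TRUE TRUE

# A column of non-zero codes, such as IDs, therefore converts to all TRUE
ids <- c(1107, 697, 1683)
as.logical(ids)                   # TRUE TRUE TRUE
```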
Let’s convert the “low.birthweight” column to logical.
ID | birth.date | location | length | birthweight | head.circumference | weeks.gestation | smoker | maternal.age | maternal.cigarettes | maternal.height | maternal.prepregnant.weight | paternal.age | paternal.education | paternal.cigarettes | paternal.height | low.birthweight | geriatric.pregnancy |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1107 | 1/25/1967 | General | 52 | 3.23 | 36 | 38 | no | 31 | 0 | 164 | 57 | NA | NA | NA | NA | FALSE | FALSE |
697 | 2/6/1967 | Silver Hill | 48 | 3.03 | 35 | 39 | no | 27 | 0 | 162 | 62 | 27 | 14 | 0 | 178 | FALSE | FALSE |
1683 | 2/14/1967 | Silver Hill | 53 | 3.35 | 33 | 41 | no | 27 | 0 | 164 | 62 | 37 | 14 | 0 | 170 | FALSE | FALSE |
27 | 3/9/1967 | Silver Hill | 53 | 3.55 | 37 | 41 | yes | 37 | 25 | 161 | 66 | 46 | NA | 0 | 175 | FALSE | TRUE |
1522 | 3/13/1967 | Memorial | 50 | 2.74 | 33 | 39 | yes | 21 | 17 | 156 | 53 | 24 | 12 | 7 | 179 | FALSE | FALSE |
569 | 3/23/1967 | Memorial | 50 | 2.51 | 35 | 39 | yes | 22 | 7 | 159 | 52 | 23 | 14 | 25 | NA | TRUE | FALSE |
365 | 4/23/1967 | Memorial | 52 | 3.53 | 37 | 40 | yes | 26 | 25 | 170 | 62 | 30 | 10 | 25 | 181 | FALSE | FALSE |
808 | 5/5/1967 | Silver Hill | 48 | 2.92 | 33 | 34 | no | 26 | 0 | 167 | 64 | 25 | 12 | 25 | 175 | FALSE | FALSE |
1369 | 6/4/1967 | Silver Hill | 49 | 3.18 | 34 | 38 | yes | 31 | 25 | 162 | 57 | 32 | 16 | 50 | 194 | FALSE | FALSE |
1023 | 6/7/1967 | Memorial | 52 | 3.00 | 35 | 38 | yes | 30 | 12 | 165 | 64 | 38 | 14 | 50 | 180 | FALSE | FALSE |
822 | 6/14/1967 | Memorial | 50 | 3.42 | 35 | 38 | no | 20 | 0 | 157 | 48 | 22 | 14 | 0 | 179 | FALSE | FALSE |
1272 | 6/20/1967 | Memorial | 53 | 2.75 | 32 | 40 | yes | 37 | 50 | 168 | 61 | 31 | 16 | 0 | 173 | FALSE | TRUE |
1262 | 6/25/1967 | Silver Hill | 53 | 3.19 | 34 | 41 | yes | 27 | 35 | 163 | 51 | 31 | 16 | 25 | 185 | FALSE | FALSE |
575 | 7/12/1967 | Memorial | 50 | 2.78 | 30 | 37 | yes | 19 | 7 | 165 | 60 | 20 | 14 | 0 | 183 | FALSE | FALSE |
1016 | 7/13/1967 | Silver Hill | 53 | 4.32 | 36 | 40 | no | 19 | 0 | 171 | 62 | 19 | 12 | 0 | 183 | FALSE | FALSE |
792 | 9/7/1967 | Memorial | 53 | 3.64 | 38 | 40 | yes | 20 | 2 | 170 | 59 | 24 | 12 | 12 | 185 | FALSE | FALSE |
820 | 10/7/1967 | General | 52 | 3.77 | 34 | 40 | no | 24 | 0 | 157 | 50 | 31 | 16 | 0 | 173 | FALSE | FALSE |
752 | 10/19/1967 | General | 49 | 3.32 | 36 | 40 | yes | 27 | 12 | 152 | 48 | 37 | 12 | 25 | 170 | FALSE | FALSE |
619 | 11/1/1967 | Memorial | 52 | 3.41 | 33 | 39 | yes | 23 | 25 | 181 | 69 | 23 | 16 | 2 | 181 | FALSE | FALSE |
1764 | 12/7/1967 | Silver Hill | 58 | 4.57 | 39 | 41 | yes | 32 | 12 | 173 | 70 | 38 | 14 | 25 | 180 | FALSE | FALSE |
1081 | 12/14/1967 | Silver Hill | 54 | 3.63 | 38 | 38 | no | 18 | 0 | 172 | 50 | 20 | 12 | 7 | 172 | FALSE | FALSE |
516 | 1/8/1968 | Silver Hill | 47 | 2.66 | 33 | 35 | yes | 20 | 35 | 170 | 57 | 23 | 12 | 50 | 186 | TRUE | FALSE |
272 | 1/10/1968 | Memorial | 52 | 3.86 | 36 | 39 | yes | 30 | 25 | 170 | 78 | 40 | 16 | 50 | 178 | FALSE | FALSE |
321 | 1/21/1968 | Silver Hill | 48 | 3.11 | 33 | 37 | no | 28 | 0 | 158 | 54 | 39 | 10 | 0 | 171 | FALSE | FALSE |
1636 | 2/2/1968 | Silver Hill | 51 | 3.93 | 38 | 38 | no | 29 | 0 | 165 | 61 | NA | NA | NA | NA | FALSE | FALSE |
1360 | 2/16/1968 | General | 56 | 4.55 | 34 | 44 | no | 20 | 0 | 162 | 57 | 23 | 10 | 35 | 179 | FALSE | FALSE |
1388 | 2/22/1968 | Memorial | 51 | 3.14 | 33 | 41 | yes | 22 | 7 | 160 | 53 | 24 | 16 | 12 | 176 | FALSE | FALSE |
1363 | 4/2/1968 | General | 48 | 2.37 | 30 | 37 | yes | 20 | 7 | 163 | 47 | 20 | 10 | 35 | 185 | TRUE | FALSE |
1058 | 4/24/1968 | Silver Hill | 53 | 3.15 | 34 | 40 | no | 29 | 0 | 167 | 60 | 30 | 16 | NA | 182 | FALSE | FALSE |
755 | 4/25/1968 | Memorial | 53 | 3.20 | 33 | 41 | no | 21 | 0 | 155 | 55 | 25 | 14 | 25 | 183 | FALSE | FALSE |
462 | 6/19/1968 | Silver Hill | 58 | 4.10 | 39 | 41 | no | 35 | 0 | 172 | 58 | 31 | 16 | 25 | 185 | FALSE | TRUE |
300 | 7/18/1968 | Silver Hill | 46 | 2.05 | 32 | 35 | yes | 41 | 7 | 166 | 57 | 37 | 14 | 25 | 173 | TRUE | TRUE |
1088 | 7/24/1968 | General | 51 | 3.27 | 36 | 40 | no | 24 | 0 | 168 | 53 | 29 | 16 | 0 | 181 | FALSE | FALSE |
57 | 8/12/1968 | Memorial | 51 | 3.32 | 38 | 39 | yes | 23 | 17 | 157 | 48 | NA | NA | NA | NA | FALSE | FALSE |
553 | 8/17/1968 | Silver Hill | 54 | 3.94 | 37 | 42 | no | 24 | 0 | 175 | 66 | 30 | 12 | 0 | 184 | FALSE | FALSE |
1191 | 9/7/1968 | General | 53 | 3.65 | 33 | 42 | no | 21 | 0 | 165 | 61 | 21 | 10 | 25 | 185 | FALSE | FALSE |
431 | 9/16/1968 | Silver Hill | 48 | 1.92 | 30 | 33 | yes | 20 | 7 | 161 | 50 | 20 | 10 | 35 | 180 | TRUE | FALSE |
1313 | 9/27/1968 | Silver Hill | 43 | 2.65 | 32 | 33 | no | 24 | 0 | 149 | 45 | 26 | 16 | 0 | 169 | TRUE | FALSE |
1600 | 10/9/1968 | General | 53 | 2.90 | 34 | 39 | no | 19 | 0 | 165 | 57 | NA | NA | NA | NA | FALSE | FALSE |
532 | 10/25/1968 | General | 53 | 3.59 | 34 | 40 | yes | 31 | 12 | 163 | 49 | 41 | 12 | 50 | 191 | FALSE | FALSE |
223 | 12/11/1968 | General | 50 | 3.87 | 33 | 45 | yes | 28 | 25 | 163 | 54 | 30 | 16 | 0 | 183 | FALSE | FALSE |
1187 | 12/19/1968 | Silver Hill | 53 | 4.07 | 38 | 44 | no | 20 | 0 | 174 | 68 | 26 | 14 | 25 | 189 | FALSE | FALSE |
Note that as.logical() returns a new vector and does not change the data frame. The output of as.logical(birthweight$low.birthweight) must be assigned back to the “low.birthweight” column in order for the values in the column to change.
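The sketch below shows the difference using a hypothetical three-row data frame, bw, whose ID and low.birthweight values are copied from the table. Calling as.logical() on its own only prints a result. Assigning it back with <- is what changes the column, after which the logical column can be used directly as a row index:

```r
# Hypothetical stand-in for the birthweight data frame
bw <- data.frame(
  ID = c(1107, 569, 516),
  low.birthweight = c(0, 1, 1)
)

as.logical(bw$low.birthweight)   # FALSE TRUE TRUE, but bw itself is unchanged
class(bw$low.birthweight)        # still "numeric"

# Assigning the result back replaces the column
bw$low.birthweight <- as.logical(bw$low.birthweight)
class(bw$low.birthweight)        # "logical"

# A logical column can now be used directly to subset rows
bw[bw$low.birthweight, ]         # the rows for IDs 569 and 516
```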
4.3 Exercise 2: converting “smoker” from character to logical
Simple coercion with as.logical() will not convert the “smoker” column from character to logical.
How can you solve this problem?