You’ve already been exposed to a few examples of relational and boolean operations in earlier tutorials. A formal exploration of these techniques follows.
Relational operations play an important role in data manipulation. Any time you subset a dataset based on one or more criteria, you are making use of a relational operation. The relational operators (also known as logical binary operators) include `==`, `!=`, `<`, `<=`, `>` and `>=`. The output of a condition is a logical vector whose elements are `TRUE` or `FALSE`.
Relational operator | Syntax | Example |
---|---|---|
Exact equality | `==` | `3 == 4` -> `FALSE` |
Exact inequality | `!=` | `3 != 4` -> `TRUE` |
Less than | `<` | `3 < 4` -> `TRUE` |
Less than or equal | `<=` | `4 <= 4` -> `TRUE` |
Greater than | `>` | `3 > 4` -> `FALSE` |
Greater than or equal | `>=` | `4 >= 4` -> `TRUE` |
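Relational operators are vectorized, so applying one to a vector yields one logical value per element, and that logical vector can then be used to subset. The following is a minimal sketch; the vector `x` and the name `keep` are made up for illustration and are not part of the tutorial's data.

```r
# Hypothetical example vector (not from the tutorial's data)
x <- c(3, 8, 1, 12)

# The comparison is applied to each element in turn
keep <- x > 4
keep
# [1] FALSE  TRUE FALSE  TRUE

# The logical vector selects the elements where the condition is TRUE
x[keep]
# [1]  8 12
```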
Boolean operations can be used to piece together multiple evaluations.
R has three boolean operators: the AND operator, `&`; the NOT operator, `!`; and the OR operator, `|`.
The `&` operator requires that the conditions on both sides of the boolean operator be satisfied. You would normally use this operator when addressing a question along the lines of “`x` must be satisfied AND `y` must be satisfied”.
The `|` operator requires that at least one condition be met on either side of the boolean operator. You would normally use this operator when addressing a question along the lines of “`x` must be satisfied OR `y` must be satisfied”. Note that the output will also be `TRUE` if both conditions are met.
The `!` operator is a negation operator. It will reverse the outcome of a condition. It can be interpreted as “I do NOT want `x` to be true”. So if the outcome of an expression is `TRUE`, preceding that expression with `!` will reverse the outcome to `FALSE`, and vice versa.
Boolean operator | Syntax | Example | Outcome |
---|---|---|---|
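AND | `&` | `4 == 3 & 1 == 1` | `FALSE` |
AND | `&` | `4 == 4 & 1 == 1` | `TRUE` |
OR | `\|` | `4 == 4 \| 1 == 1` | `TRUE` |
OR | `\|` | `4 == 3 \| 1 == 1` | `TRUE` |
OR | `\|` | `4 == 3 \| 1 == 2` | `FALSE` |
NOT | `!` | `! (4 == 3)` | `TRUE` |
NOT | `!` | `! (4 == 4)` | `FALSE` |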
The following table breaks down all possible Boolean outcomes, where `T` = `TRUE` and `F` = `FALSE`:
Boolean operation | Outcome |
---|---|
`T & T` | `TRUE` |
`T & F` | `FALSE` |
`F & F` | `FALSE` |
`T \| T` | `TRUE` |
`T \| F` | `TRUE` |
`F \| F` | `FALSE` |
`! T` | `FALSE` |
`! F` | `TRUE` |
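Like the relational operators, `&`, `|` and `!` work element by element on logical vectors, which is what makes them useful for combining conditions. A small sketch follows; the vector `x` and the names `in_range` and `at_edges` are assumptions chosen for illustration.

```r
# Hypothetical example vector (not from the tutorial's data)
x <- c(2, 7, 11, 15)

# AND: both conditions must hold for an element to be TRUE
in_range <- x > 5 & x < 12
in_range
# [1] FALSE  TRUE  TRUE FALSE

# OR: at least one condition must hold
at_edges <- x < 5 | x > 12
at_edges
# [1]  TRUE FALSE FALSE  TRUE

# NOT: flips every element
!in_range
# [1]  TRUE FALSE FALSE  TRUE
```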
If the input values to a boolean operation are numeric vectors and not logical vectors, the numeric values will be interpreted as `FALSE` if zero and `TRUE` otherwise. For example:
1 & 2
[1] TRUE
1 & 0
[1] FALSE
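The same zero/non-zero rule can be seen directly by converting numbers with `as.logical()`. This is only an illustrative sketch; the name `flags` is made up here.

```r
# Zero maps to FALSE; every other number, including negatives, maps to TRUE
flags <- as.logical(c(0, 1, 2, -1))
flags
# [1] FALSE  TRUE  TRUE  TRUE
```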
Note that the operation `a == (3 | 4)` is not the same as `(a == 3) | (a == 4)`. If `a` is 3, the former returns `FALSE` whereas the latter returns `TRUE`. This is because the Boolean operator evaluates each side of its expression as a separate logical outcome (i.e. a `T` or `F` value). In the latter case, the Boolean expression is asking “is `a` equal to `3` OR is `a` equal to `4`?”. Since one of the conditions is true, the expression ends up evaluating `TRUE | FALSE`, which returns `TRUE` (see the table above).
a <- 3
b <- 4
(a == 3) | (a == 4)
[1] TRUE
In the former expression, the boolean operator `|` is evaluating `3` OR `4` inside the parentheses. As mentioned earlier, numeric values are interpreted as `FALSE` if zero and `TRUE` otherwise, so when evaluating `3 | 4`, it’s really seeing `TRUE | TRUE` which, according to the table of Boolean outcomes, will output `TRUE`.
3 | 4
[1] TRUE
So in the end, the expression `a == (3 | 4)` is really evaluating the condition `a == TRUE`, which returns `FALSE`: `TRUE` is coerced to the number 1 for the comparison, and 3 is not equal to 1.
a == (3 | 4)
[1] FALSE
The relational operators compare objects element by element (i.e. one element at a time). If you want to compare two objects as a whole (e.g. multi-element vectors or data frames), use the `identical()` function. For example:
a <- c(1, 5, 6, 10)
b <- c(1, 5, 6)
identical(a, a)
[1] TRUE
identical(a, b)
[1] FALSE
identical(mtcars, mtcars)
[1] TRUE
Notice that `identical` returns a single logical value, regardless of the input object’s dimensions.
Note that the data structure must match as well as its element values. For example, if `d` is a list and `a` is an atomic vector, the output of `identical` will be `FALSE` even if the internal values match.
d <- list( c(1, 5, 6, 10) )
identical(a, d)
[1] FALSE
If we convert `d` from a list to an atomic vector using the `unlist` function (thus matching data structures), we get:
identical(a, unlist(d))
[1] TRUE
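To make the contrast with `==` concrete, the sketch below runs both on the same values. It also shows that the storage type matters to `identical()`: `1:3` is stored as integer while `c(1, 2, 3)` is stored as double. The names `elementwise`, `whole` and `mixed_types` are made up for illustration.

```r
a <- c(1, 5, 6, 10)

# == compares element by element and returns one value per element
elementwise <- a == c(1, 5, 6, 10)
elementwise
# [1] TRUE TRUE TRUE TRUE

# identical() collapses the comparison to a single value
whole <- identical(a, c(1, 5, 6, 10))
whole
# [1] TRUE

# Storage type counts too: 1:3 is integer, c(1, 2, 3) is double
mixed_types <- identical(1:3, c(1, 2, 3))
mixed_types
# [1] FALSE
```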
%in%
The match operator `%in%` compares two vectors and assesses whether each element on the left-hand side of `%in%` matches any of the elements on the right-hand side of the operator. For each element in the left-hand vector, R returns `TRUE` if the value is present among the right-hand side elements or `FALSE` if not.
For example, given the following vectors:
v1 <- c( "a", "b", "cd", "fe")
v2 <- c( "b", "e")
find the elements in `v1` that match any of the values in `v2`.
v1 %in% v2
[1] FALSE TRUE FALSE FALSE
The operator checks whether each element in `v1` has a matching value in `v2`. For example, element `"a"` in `v1` is compared to elements `"b"` and `"e"` in `v2`. No matches are found and a `FALSE` is returned. The next element in `v1`, `"b"`, is compared to both elements in `v2`. This time, there is a match (`v2` has an element `"b"`) and `TRUE` is returned. This process is repeated for all elements in `v1`.
The logical vector output has the same length as the input vector `v1` (four in this example).
If we swap the vector objects, we get a two-element logical vector since we are now comparing each element in `v2` to any matching elements in `v1`.
v2 %in% v1
[1] TRUE FALSE
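Because `%in%` returns a logical vector, it can be used directly for subsetting, and it offers a compact alternative to chaining `==` comparisons with `|`. The sketch below reuses `v1` and `v2` from above; the names `matched`, `compact` and `chained` are made up for illustration.

```r
v1 <- c("a", "b", "cd", "fe")
v2 <- c("b", "e")

# Keep only the elements of v1 that also appear in v2
matched <- v1[v1 %in% v2]
matched
# [1] "b"

# %in% gives the same answer as chaining == with |
a <- 3
compact <- a %in% c(3, 4)
chained <- (a == 3) | (a == 4)
compact
# [1] TRUE
chained
# [1] TRUE
```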
NA
When assessing if a value is equal to `NA`, the following evaluation may behave unexpectedly.
a <- c(3, 67, 4, NA, 10)
a == NA
[1] NA NA NA NA NA
The output is not the vector of `TRUE` and `FALSE` values we would expect from an evaluation: any comparison with `NA` is itself `NA`. Instead, you must make use of the `is.na()` function:
is.na(a)
[1] FALSE FALSE FALSE TRUE FALSE
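The logical vector returned by `is.na()` can be summarized or used for subsetting like any other. A short sketch follows, assuming the same vector `a`; the names `n_missing`, `where_missing` and `a_clean` are made up for illustration.

```r
a <- c(3, 67, 4, NA, 10)

# TRUE counts as 1, so summing the logical vector counts the missing values
n_missing <- sum(is.na(a))
n_missing
# [1] 1

# which() returns the positions of the TRUE elements
where_missing <- which(is.na(a))
where_missing
# [1] 4

# Drop the missing values by negating the test
a_clean <- a[!is.na(a)]
a_clean
# [1]  3 67  4 10
```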
As another example, if we want to keep all rows in data frame `d` where `z` is `NA`, we would type:
d <- data.frame(x = c(1,4,2,5,2,3,NA),
                y = c(3,2,5,3,8,1,1),
                z = c(NA,NA,4,9,7,8,3))
d[is.na(d$z), ]
x y z
1 1 3 NA
2 4 2 NA
You can, of course, use the `!` operator to reverse the evaluation and omit all rows where `z` is `NA`:
d[!is.na(d$z), ]
x y z
3 2 5 4
4 5 3 9
5 2 8 7
6 3 1 8
7 NA 1 3
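The `is.na()` test combines with the boolean operators covered earlier. As a sketch, the following keeps only the rows where neither `x` nor `z` is missing; the name `complete_rows` is made up for illustration.

```r
d <- data.frame(x = c(1,4,2,5,2,3,NA),
                y = c(3,2,5,3,8,1,1),
                z = c(NA,NA,4,9,7,8,3))

# Keep rows where x is not NA AND z is not NA
complete_rows <- d[!is.na(d$x) & !is.na(d$z), ]
complete_rows
#   x y z
# 3 2 5 4
# 4 5 3 9
# 5 2 8 7
# 6 3 1 8
```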