You’ve already been exposed to a few examples of relational and boolean operations in earlier tutorials. A formal exploration of these techniques follow.

Relational operations

Relational operations play an important role in data manipulation. Anytime you subset a dataset based on one or more criterion, you are making use of a relational operation. The relational operators (also known as logical binary operators) include ==, !=, <, <=, > and >=. The output of a condition is a logical vector TRUE or FALSE.

Relational operator Syntax Example
Exact equality == 3 == 4 -> FALSE
Exact inequality != 3 != 4 -> TRUE
Less than < 3 < 4 -> TRUE
Less than or equal <= 4 <= 4 -> TRUE
Greater than > 3 > 4 -> FALSE
Greater than or equal >= 4 >= 4 -> TRUE

Boolean operations

Boolean operations can be used to piece together multiple evaluations.

R has three boolean operators: The AND operator, &; The NOT operator, !; And the OR operator, |.

The & operator requires that the conditions on both sides of the boolean operator be satisfied. You would normally use this operator when addressing a question along the lines of x must be satisfied AND y must be satisfied”.

The | operator requires that at least one condition be met on either side of the boolean operator. You would normally use this operator when addressing a question along the lines of “x must be satisfied OR y must be satisfied”. Note that the output will also be TRUE if both conditions are met.

The ! operator is a negation operator. It will reverse the outcome of a condition. It can be interpreted as “I do NOT want x to be true”. So if the outcome of an expression is TRUE, preceding that expression with ! will reverse the outcome to FALSE and vice-versa.

Boolean operator Syntax Example Outcome
AND & 4 == 3 & 1 == 1
4 == 4 & 1 == 1
FALSE
TRUE
OR | 4 == 4 | 1 == 1
4 == 3 | 1 == 1
4 == 3 | 1 == 2
TRUE
TRUE
FALSE
NOT ! !(4 == 3)
!(4 == 4)
TRUE
FALSE

The following table breaks down all possible Boolean outcomes where T = TRUE and F = FALSE:

Boolean operation Outcome
T & T TRUE
T & F FALSE
F & F FALSE
T | T TRUE
T | F TRUE
F | F FALSE
!T FALSE
!F TRUE

If the input values to a boolean operation are numeric vectors and not logical vectors, the numeric values will be interpreted as FALSE if zero and TRUE otherwise. For example:

1 & 2
[1] TRUE
1 & 0
[1] FALSE

A word of caution

Note that the operation a == (3 | 4) is not the same as (a == 3) | (a == 4). The former will return FALSE whereas the latter will return TRUE if a = 3. This is because the Boolean operator evaluates both sides of its expression as separate logical outcomes (i.e. T and F values). In the latter case, the Boolean expression is asking “is a equal to 3 OR is a equal to 4. Since one of the conditions is true, the expression ends up evaluating TRUE | FALSE which returns TRUE (see above table).

a <- 3
b <- 4
(a == 3) | (a == 4)
[1] TRUE

In the former expression, the boolean operator | is evaluating 3 OR 4 on its right-hand side. As mentioned in the previous section, logical values take on a value of 0 for FALSE and any non-zero value for TRUE, so when evaluating 3 | 4, it’s really seeing TRUE | TRUE which, according to the aforementioned table will output TRUE.

3 | 4
[1] TRUE

So in the end, the expression a == (3 | 4) is really evaluating the condition a == TRUE which returns false (since 3 is not equal to the logical value TRUE).

a == (3 | 4)
[1] FALSE

Comparing multidimensional objects

The relational operators are used to compare single elements (i.e. one element at a time). If you want to compare two objects as a whole (e.g. multi-element vectors or data frames), use the identical() function. For example:

a <- c(1, 5, 6, 10)
b <- c(1, 5, 6)
identical(a, a)
[1] TRUE
identical(a, b)
[1] FALSE
identical(mtcars, mtcars)
[1] TRUE

Notice that identical returns a single logical vector, regardless the input object’s dimensions.

Note that the data structure must match as well as its element values. For example, if d is a list and a is an atomic vector, the output of identical will be false even if the internal values match.

d <- list( c(1, 5, 6, 10) )
identical(a, d)
[1] FALSE

If we convert d from a list to an atomic vector using the unlist function (thus matching data structures), we get:

identical(a, unlist(d))
[1] TRUE

The match operator %in%

The match operator %in% compares two sets of vectors and assesses if an element on the left-hand side of %in% matches any of the elements on the right-hand side of the operator. For each element in the left-hand vector, R returns TRUE if the value is present in any of the right-hand side elements or FALSE if not.

For example, given the following vectors:

v1 <- c( "a", "b", "cd", "fe")
v2 <- c( "b", "e")

find the elements in v1 that match any of the values in v2.

v1 %in% v2
[1] FALSE  TRUE FALSE FALSE

The function checks whether each element in v1 has a matching value in v2. For example, element "a" in v1 is compared to elements "b" and "e" in v2. No matches are found and a FALSE is returned. The next element in v1, "b", is compared to both elements in v2. This time, there is a match (v2 has an element "b") and TRUE is returned. This process is repeated for all elements in v1.

The logical vector output has the same length as the input vector v1 (four in this example).

If we swap the vector objects, we get a two element logical vector since we are now comparing each element in v2 to any matching elements in v1.

v2 %in% v1
[1]  TRUE FALSE

Checking if a value is NA

When assessing if a value is equal to NA the following evaluation may behave unexpectedly.

a <- c (3, 67, 4, NA, 10)
a == NA
[1] NA NA NA NA NA

The output is not a logical data type we would expect from an evaluation. Instead, you must make use of the is.na() function:

is.na(a)
[1] FALSE FALSE FALSE  TRUE FALSE

As another example, if we want to keep all rows in dataframe d where z = NA, we would type:

d <- data.frame(x = c(1,4,2,5,2,3,NA), 
                y = c(3,2,5,3,8,1,1), 
                z = c(NA,NA,4,9,7,8,3))

d[ is.na(d$z), ]
  x y  z
1 1 3 NA
2 4 2 NA

You can, of course, use the ! operator to reverse the evaluation and omit all rows where z = NA,

d[ !is.na(d$z), ]
   x y z
3  2 5 4
4  5 3 9
5  2 8 7
6  3 1 8
7 NA 1 3