Let's say I have a data.frame with an age variable, and I don't notice that it is stored as a character variable. I want to keep the rows where age is at least 12, but filtering yields the following:
library(dplyr)
mydf <- data.frame(age=c("9", "10", "11", "12", "13"))
filter(mydf, age >= 12)
#> age
#> 1 9
#> 2 12
#> 3 13
Created on 2025-04-04 with reprex v2.1.1
This is because 12 is converted to a character behind the scenes, so the comparison is made on strings rather than on numbers.
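To make the coercion visible, here is a minimal base R sketch using the same values as above (the names `ages`, `as_strings` and `as_numbers` are just for illustration). It compares the character vector against 12 directly and then after an explicit `as.numeric()`:

```r
# R coerces the number 12 to the string "12" and then compares
# the strings character by character, so "9" sorts after "12".
ages <- c("9", "10", "11", "12", "13")

as_strings <- ages >= 12              # silently treated as ages >= "12"
as_numbers <- as.numeric(ages) >= 12  # explicit numeric comparison

ages[as_strings]
#> [1] "9"  "12" "13"
ages[as_numbers]
#> [1] "12" "13"
```

No warning or message is emitted by the first comparison, which is why the result in the reprex looks plausible at a glance.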
Is there a way to know that this conversion happened?
As a reference, SAS generates a message in this situation, such as: "Character values have been converted to numeric values at the places given by: (Line):(Column). 40:12"
> mydf <- data.frame(age=c("9", "10", "11", "12", "13"))
> filter(mydf, as.numeric(age) >= 12)
age
1 12
2 13
> mydf <- data.frame(age=c("9", "10", "11", "B", "12", "13"))
> filter(mydf, as.numeric(age) >= 12)
age
1 12
2 13
Warning message:
There was 1 warning in `filter()`.
ℹ In argument: `as.numeric(age) >= 12`.
Caused by warning:
! NAs introduced by coercion
Let's pretend someone didn't notice that age was stored as a character and blindly used the inequality.
You can check if age is numeric and, if not, convert it to numeric, something like:
if (!is.numeric(mydf$age)) {
  # Signal the conversion so it does not happen silently
  warning("age is not numeric, converting to numeric")
  mydf$age <- as.numeric(mydf$age)
}
I don't know whether that would conflict with your data formats...
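If you would rather have the comparison fail loudly than convert anything automatically, another option is a small guard function. This is only a sketch: `at_least()` is a hypothetical helper, not something provided by dplyr or base R, and it assumes R >= 4.0, where `data.frame()` keeps strings as character rather than factor.

```r
# Hypothetical helper (not part of dplyr or base R):
# compare only when x is already numeric, otherwise stop with an error.
at_least <- function(x, threshold) {
  if (!is.numeric(x)) {
    stop("expected a numeric vector, got <", class(x)[1], ">", call. = FALSE)
  }
  x >= threshold
}

mydf <- data.frame(age = c("9", "10", "11", "12", "13"))

# Character column: the comparison is refused instead of silently coerced.
msg <- tryCatch(
  at_least(mydf$age, 12),
  error = function(e) conditionMessage(e)
)
msg
#> [1] "expected a numeric vector, got <character>"

# After an explicit conversion, the same call works as intended.
mydf$age <- as.numeric(mydf$age)
kept <- mydf[at_least(mydf$age, 12), , drop = FALSE]
kept$age
#> [1] 12 13
```

The same helper should also work inside `filter()`, e.g. `filter(mydf, at_least(age, 12))`, since the expression is evaluated with the column as a plain vector, although I have only sketched the base R version here.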
Hope this helps

Having introduced R to students in undergraduate economics courses, I think the "Let's pretend..." scenario is not unlikely.
Yes, this is based on something that happened in real code. We figured out the issue, but it was far from obvious: we only caught it because some counts that should have matched didn't, across tables made by different folks.