Subset doesn't take all values from DF

I'm pretty new to coding, and it seems my subset is missing values; I'm wondering what I am doing wrong. I have a data frame called «df_envel» with 4 columns: Elevation, distance, profil, date. I am trying to subset this data frame to keep only the rows where Elevation equals -0.1 m. I have tried several subsetting methods, but each of them misses some -0.1 values and returns some NAs instead. All of them return the same number of rows.

Here is my code:

f<- df_envel[which(df_envel$Elevation=='-0.1'),]

f<- df_envel %>% filter(Elevation == '-0.1')

f<- subset(df_envel, Elevation %in% '-0.1')

Does anybody know what I might be doing wrong?

So Elevation is a character string? I would have thought it would be stored as a numeric.

If Elevation is a character, all of these should work, although you may need to remove leading and trailing spaces.
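A quick way to settle the question is to check how the column is actually stored. The sketch below uses a made-up data frame (`toy_df` is hypothetical, not the poster's data) in which Elevation was read in as text with stray spaces, and shows one way to clean and convert it:

```r
# Hypothetical stand-in for df_envel, with Elevation stored as text
toy_df <- data.frame(
  Elevation = c(" -0.1", "-0.1 ", "0.3"),
  distance  = c(1, 2, 3),
  stringsAsFactors = FALSE
)

class(toy_df$Elevation)   # "character" here; "numeric" for a numeric column

# Strip leading/trailing spaces, then convert the text to numbers
toy_df$Elevation <- as.numeric(trimws(toy_df$Elevation))

class(toy_df$Elevation)   # "numeric"

# Now an unquoted numeric comparison can be used
matched <- toy_df[toy_df$Elevation == -0.1, ]
```

Running `str(df_envel)` on the real data gives the same information for every column at once.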

Elevation is numeric. Is my code okay for a numeric column?

No, you should not compare a numeric to a value in quote marks ('), because the quote marks make the value a character string, and the comparison is then done on character strings rather than on numbers.
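To illustrate what the quotes change, here is a small sketch with a made-up vector `x` (not the poster's data). When one side of `==` is a string, R converts the numeric side to text and compares the strings:

```r
x <- c(1, -0.1)

typeof(x)          # "double"
typeof("-0.1")     # "character"

# With a quoted value, x is converted to text and the strings are compared
as.character(x)    # "1" "-0.1"
x == "-0.1"        # FALSE TRUE
x == "-0.10"       # FALSE FALSE: "-0.1" and "-0.10" are different strings
x == -0.10         # FALSE TRUE: numerically the same value
```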

Oh, my bad. So should f <- df_envel[which(df_envel$Elevation == -0.1), ] work?

It should work if you expect exact values of -0.1 to be present.
Otherwise, you might consider a tolerance and look for values within +/- that tolerance of -0.1.

Alright, with the tolerance it worked! Thanks a lot. For the record, here is the line of code I used:
df_envel[ which(df_envel$Elevation < -0.05 & df_envel$Elevation >-0.15),]

good job :slight_smile:
For your convenience, you could set a tolerance parameter early in your script and then use it wherever it's useful.

mytol <- 0.05


df_envel[which(df_envel$Elevation < (-0.1 + mytol) & df_envel$Elevation > (-0.1 - mytol)), ]

Oh yeah, good idea! Thank you :slight_smile:

For future reference, the way the R documentation recommends comparing two numeric values for equality is this.

isTRUE(all.equal(x, y))

This is because decimals cannot always be exactly represented as floating point numbers in the computer (which uses binary storage). For example, 0.1 cannot be exactly represented.
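A short sketch of the effect, using made-up values rather than the poster's data. Note that all.equal() compares two whole objects and returns a single answer, so it suits a one-off check more than row-by-row filtering:

```r
a <- 0.1 + 0.2
b <- 0.3

a == b                      # FALSE: the two doubles differ in the last bits
isTRUE(all.equal(a, b))     # TRUE: equal within all.equal()'s default tolerance
abs(a - b) < 1e-8           # TRUE

# For filtering, an element-wise test against a tolerance is needed instead
v <- c(0.1 + 0.2, 0.5)
abs(v - 0.3) < 1e-8         # TRUE FALSE
```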


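Also, the usual way of doing it with a tolerance is this: take the difference, then use abs() to make it positive. I added a pair of brackets around (-0.1) to make it clearer for you. Also, you don't need which() in your example as long as Elevation contains no NA values; if it does, plain logical indexing returns extra all-NA rows, so keep which() in that case.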

mytol <- 0.05

df_envel[abs(df_envel$Elevation - (-0.1)) < mytol, ]
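Here is a self-contained sketch of that approach on a made-up data frame (`df_demo`, `target`, and the values are assumptions for illustration only). It includes one NA to show how the result differs with and without which():

```r
# Hypothetical data: two values at (or very near) -0.1, one NA
df_demo <- data.frame(
  Elevation = c(-0.1, -0.3 + 0.2, -0.25, NA, 0.4),
  distance  = c(10, 20, 30, 40, 50)
)
target <- -0.1
mytol  <- 0.05

# Element-wise test: TRUE TRUE FALSE NA FALSE
near <- abs(df_demo$Elevation - target) < mytol

with_na    <- df_demo[near, ]          # 3 rows: the NA produces an all-NA row
without_na <- df_demo[which(near), ]   # 2 rows: which() drops the NA
```

If the NA rows seen in the original question come from missing Elevation values, the which() form (or adding `& !is.na(df_demo$Elevation)` to the condition) should avoid them.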
