Subset doesn't take all values from DF

I'm pretty new to coding, but it seems that my subset is missing values and I'm wondering what I am doing wrong. I have a data frame called «df_envel» with 4 columns: Elevation, distance, profil, date. I am trying to subset this data frame to keep only the rows where Elevation equals -0.1 m. I have tried several subsetting methods, but they all miss some -0.1 values and return some NAs instead. Here are the lines of code I tried, which all return the same number of values:

f<- df_envel[which(df_envel$Elevation=='-0.1'),]

f<- df_envel %>% filter(Elevation == '-0.1')

f<- subset(df_envel, Elevation %in% '-0.1')

Does anybody know what I might be doing wrong?

So Elevation is a character string? I would have thought it would be stored as a numeric.

If Elevation is a character column, all of these should work, although you may need to remove leading and trailing spaces.

Elevation is numeric. Is my code okay for a numeric column?

No, you should not compare a numeric to a value written in quote marks ('), because the quote marks make that value a character string, and R then converts the numeric column to character for the comparison.
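A small sketch of what the quotes change, using made-up values rather than the poster's data (the vector `elev` is hypothetical). It only shows the types involved and that both forms of the comparison run; it is not a claim about what df_envel contains.

```r
# Made-up numeric values for illustration; not taken from df_envel.
elev <- c(-0.1, -0.25, NA)

# The unquoted value is numeric, the quoted one is a character string.
num_type  <- class(-0.1)     # "numeric"
char_type <- class("-0.1")   # "character"

# With a string on one side, R converts elev to character first,
# so this compares text such as "-0.1" and "-0.25".
text_cmp <- elev == "-0.1"

# With a number on both sides, this is a numeric comparison.
num_cmp <- elev == -0.1
```

Both comparisons agree on these tidy values, but the text version depends on how each number happens to be printed, which is why the numeric form is the safer habit.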

Oh, my bad. So should f <- df_envel[which(df_envel$Elevation == -0.1), ] work?

It should work if you expect exact values of -0.1 to be present.
Otherwise, you might consider a tolerance and look for values within plus or minus that tolerance of -0.1.

Alright, with the tolerance it worked! Thanks a lot. For the record, here is the line of code I used:
df_envel[ which(df_envel$Elevation < -0.05 & df_envel$Elevation >-0.15),]

Good job :slight_smile:
For convenience, you could set a tolerance parameter early in your script and then use it wherever it is useful.

mytol <- 0.05

...

df_envel[ which(df_envel$Elevation < (-0.1 + mytol) & df_envel$Elevation > (-0.1 - mytol)),]

Oh yeah, good idea! Thank you :slight_smile:

For future reference, the official R way to compare 2 numeric values for equality is this.

isTRUE(all.equal(x, y))

https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/all.equal

This is because decimals cannot always be exactly represented as floating point numbers in the computer (which uses binary storage). For example, 0.1 cannot be exactly represented.

https://www.exploringbinary.com/why-0-point-1-does-not-exist-in-floating-point/
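To make the floating point issue concrete, here is a minimal sketch in base R. The values are invented for illustration, and the 1e-8 tolerance in the last line is an arbitrary choice, not something from the thread.

```r
# A value produced by arithmetic: mathematically this is -0.1.
x <- 0.3 - 0.4

# Exact comparison fails because x is not stored as exactly the same
# binary number as the literal -0.1.
exact <- x == -0.1

# all.equal() allows a small tolerance (about 1.5e-8 by default);
# isTRUE() turns its result into a plain TRUE/FALSE.
close <- isTRUE(all.equal(x, -0.1))

# all.equal() compares whole objects and gives one answer, not one per
# element, so for row-wise filtering an element-wise test is used instead.
elev <- c(x, -0.1, -0.2)
near_target <- abs(elev - (-0.1)) < 1e-8
```

Here `exact` is FALSE while `close` is TRUE, and `near_target` flags the first two elements only.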

Also, the usual way of doing it with a tolerance is this. Take the difference then use abs() to make it positive. I added a pair of brackets around (-0.1) to make it clearer for you. Also, you don't need which in your example.

mytol <- 0.05

df_envel[abs(df_envel$Elevation - (-0.1)) < mytol, ]
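One caveat, sketched under the assumption that Elevation may contain missing values (the original post mentions NAs showing up in the result): plain logical indexing returns a row of NAs wherever the test itself is NA, while which() drops those rows. The data frame `df_demo` below is a hypothetical stand-in, since the real df_envel is not shown in the thread.

```r
# Hypothetical stand-in for df_envel, with one missing Elevation.
df_demo <- data.frame(
  Elevation = c(-0.1, -0.12, NA, -0.3, -0.08),
  distance  = c(1, 2, 3, 4, 5)
)

target <- -0.1
mytol  <- 0.05

# Element-wise tolerance test; the result is NA where Elevation is NA.
keep <- abs(df_demo$Elevation - target) < mytol

# Logical indexing: the NA in keep produces a row filled with NAs.
with_na <- df_demo[keep, ]

# which() keeps only the positions that are TRUE, so no NA row appears.
without_na <- df_demo[which(keep), ]
```

If the column has no missing values, the two forms return the same rows, so which() is only needed when NAs are possible.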
