filter() does not recognize the name of a column

I have this tibble

> DS90_2
# A tibble: 50 x 3
   Parametro           Tabla   Valor
   <chr>               <chr>   <dbl>
 1 Aluminio total (Al) Tabla 1   5  
 2 Aluminio total (Al) Tabla 2  10  
 3 Aluminio total (Al) Tabla 3   1  
 4 Aluminio total (Al) Tabla 4   1  
 5 Aluminio total (Al) Tabla 5  10  
 6 Arsénico total (As) Tabla 1   0.5
 7 Arsénico total (As) Tabla 2   1  
 8 Arsénico total (As) Tabla 3   0.1
 9 Arsénico total (As) Tabla 4   0.2
10 Arsénico total (As) Tabla 5   0.5
# ... with 40 more rows

When I use filter() to column "Parametro", does not work:

> filter(DS90_2,Parametro== "Aluminio total (Al)")
# A tibble: 0 x 3
# ... with 3 variables: Parametro <chr>, Tabla <chr>, Valor <dbl>

But, when I change the column, it works:

> filter(DS90_2,Tabla== "Tabla 1")
# A tibble: 10 x 3
   Parametro            Tabla   Valor
   <chr>                <chr>   <dbl>
 1 Aluminio total (Al)  Tabla 1  5   
 2 Arsénico total (As)  Tabla 1  0.5 
 3 Cadmio total (Cd)    Tabla 1  0.01
 4 Cinc total (Zn)      Tabla 1  3   
 5 Cobre total (Cu)     Tabla 1  1   
 6 Cromo total (Cr)     Tabla 1 NA   
 7 Manganeso total (Mn) Tabla 1  0.3 
 8 Níquel total (Ni)    Tabla 1  0.2 
 9 Plomo total (Pb)     Tabla 1  0.05
10 Selenio total (Se)   Tabla 1  0.01

I have checked spaces, letters and copied and pasted the name of the column and the selection to avoid making character errors.

Hello there,

Do a full reprex and I can have a look to see what is going on here for you (FAQ: How to do a minimal reproducible example ( reprex ) for beginners)

1 Like

1.- Minimal Dataset (Sample Data)

1.1.- Imported data from excel:

> head(DS90)
# A tibble: 6 x 6
  Parametro           `Tabla 1` `Tabla 2` `Tabla 3` `Tabla 4` `Tabla 5`
  <chr>                   <dbl>     <dbl>     <dbl>     <dbl>     <dbl>
1 Aluminio total (Al)      5         10        1         1         10  
2 Arsénico total (As)      0.5        1        0.1       0.2        0.5
3 Cadmio total (Cd)        0.01       0.3      0.02      0.02       0.5
4 Cinc total (Zn)          3         20        5         5          5  
5 Cobre total (Cu)         1          3        0.1       1          3  
6 Cromo total (Cr)        NA         NA        2.5       2.5       10 

1.2.- Handled table

> head(DS90_2,10)
# A tibble: 10 x 3
   Parametro           Tabla   Valor
   <chr>               <chr>   <dbl>
 1 Aluminio total (Al) Tabla 1   5  
 2 Aluminio total (Al) Tabla 2  10  
 3 Aluminio total (Al) Tabla 3   1  
 4 Aluminio total (Al) Tabla 4   1  
 5 Aluminio total (Al) Tabla 5  10  
 6 Arsénico total (As) Tabla 1   0.5
 7 Arsénico total (As) Tabla 2   1  
 8 Arsénico total (As) Tabla 3   0.1
 9 Arsénico total (As) Tabla 4   0.2
10 Arsénico total (As) Tabla 5   0.5

2.- Runnable Code

2.1.- Import table from excel

> library(readxl)
> DS90 <- read_excel("DS90.xlsx")

2.2.- Manipulate columns and create another table

> library(tidyverse)
> library(dplyr)
> DS90_2<-DS90%>%pivot_longer(`Tabla 1`:`Tabla 5`,names_to="Tabla",values_to="Valor")
> head(DS90_2,8)
# A tibble: 8 x 3
  Parametro           Tabla   Valor
  <chr>               <chr>   <dbl>
1 Aluminio total (Al) Tabla 1   5  
2 Aluminio total (Al) Tabla 2  10  
3 Aluminio total (Al) Tabla 3   1  
4 Aluminio total (Al) Tabla 4   1  
5 Aluminio total (Al) Tabla 5  10  
6 Arsénico total (As) Tabla 1   0.5
7 Arsénico total (As) Tabla 2   1  
8 Arsénico total (As) Tabla 3   0.1

2.3.- Using filter() with "Parametro" column (without results):

> filter(DS90_2,Parametro=="Aluminio total (Al)")
# A tibble: 0 x 3
# ... with 3 variables: Parametro <chr>, Tabla <chr>, Valor <dbl>

Doing the same but with another column (the results are fine):

> filter(DS90_2,Tabla=="Tabla 1")
# A tibble: 10 x 3
   Parametro            Tabla   Valor
   <chr>                <chr>   <dbl>
 1 Aluminio total (Al)  Tabla 1  5   
 2 Arsénico total (As)  Tabla 1  0.5 
 3 Cadmio total (Cd)    Tabla 1  0.01
 4 Cinc total (Zn)      Tabla 1  3   
 5 Cobre total (Cu)     Tabla 1  1   
 6 Cromo total (Cr)     Tabla 1 NA   
 7 Manganeso total (Mn) Tabla 1  0.3 
 8 Níquel total (Ni)    Tabla 1  0.2 
 9 Plomo total (Pb)     Tabla 1  0.05
10 Selenio total (Se)   Tabla 1  0.01

I hope that reprex has been understood and used correctly

Yes, I understand, thank you very much. As the data obtained from reprex was presented below, I thought it would be possible to work with it.

Here are the data obtained with dput(DS90_prueba)

structure(list(Parametro = c("Aluminio total (Al)", "Arsénico total (As)", 
"Cadmio total (Cd)", "Cinc total (Zn)", "Cobre total (Cu)", "Cromo total (Cr)", 
"Manganeso total (Mn)", "Níquel total (Ni)", "Plomo total (Pb)", 
"Selenio total (Se)"), `Tabla 1` = c(5, 0.5, 0.01, 3, 1, NA, 
0.3, 0.2, 0.05, 0.01), `Tabla 2` = c(10, 1, 0.3, 20, 3, NA, 3, 
3, 0.5, 0.1), `Tabla 3` = c(1, 0.1, 0.02, 5, 0.1, 2.5, 0.5, 0.5, 
0.2, 0.01), `Tabla 4` = c(1, 0.2, 0.02, 5, 1, 2.5, 2, 2, 0.2, 
0.01), `Tabla 5` = c(10, 0.5, 0.5, 5, 3, 10, 4, 4, 1, 0.03)), row.names = c(NA, 
-10L), class = c("tbl_df", "tbl", "data.frame"))

I'm not able to reproduce your issue with the sample data you have provided

# library calls
library(dplyr)
library(tidyr)

# Sample data on a copy/paste friendly format
DS90_prueba <- structure(list(Parametro = c("Aluminio total (Al)", "Arsénico total (As)", 
                                            "Cadmio total (Cd)", "Cinc total (Zn)", "Cobre total (Cu)", "Cromo total (Cr)", 
                                            "Manganeso total (Mn)", "Níquel total (Ni)", "Plomo total (Pb)", 
                                            "Selenio total (Se)"), `Tabla 1` = c(5, 0.5, 0.01, 3, 1, NA, 
                                                                                 0.3, 0.2, 0.05, 0.01), `Tabla 2` = c(10, 1, 0.3, 20, 3, NA, 3, 
                                                                                                                      3, 0.5, 0.1), `Tabla 3` = c(1, 0.1, 0.02, 5, 0.1, 2.5, 0.5, 0.5, 
                                                                                                                                                  0.2, 0.01), `Tabla 4` = c(1, 0.2, 0.02, 5, 1, 2.5, 2, 2, 0.2, 
                                                                                                                                                                            0.01), `Tabla 5` = c(10, 0.5, 0.5, 5, 3, 10, 4, 4, 1, 0.03)), row.names = c(NA, 
                                                                                                                                                                                                                                                        -10L), class = c("tbl_df", "tbl", "data.frame"))

# Relevant code
DS90_prueba %>%
    pivot_longer(`Tabla 1`:`Tabla 5`,
                 names_to = "Tabla",
                 values_to = "Valor") %>% 
    filter(Parametro == "Aluminio total (Al)")
#> # A tibble: 5 x 3
#>   Parametro           Tabla   Valor
#>   <chr>               <chr>   <dbl>
#> 1 Aluminio total (Al) Tabla 1     5
#> 2 Aluminio total (Al) Tabla 2    10
#> 3 Aluminio total (Al) Tabla 3     1
#> 4 Aluminio total (Al) Tabla 4     1
#> 5 Aluminio total (Al) Tabla 5    10

Created on 2020-10-03 by the reprex package (v0.3.0)
Try reinstalling/updating dplyr and tidyr, if the issues persist, share a proper reproducible example (like the one I have posted).

2 Likes

Thank you very much.

Regards,

Try

dplyr::filter(DS90_2,Parametro== "Aluminio total (Al)").

You might be using stats::filter, which does not handle data frames this way.

Hi, thank you, but but did not deliver any results. Finally I used:

DS90_2_df <- DS90_2 %>% as.data.frame()
nombres <- distinct(DS90_2_df, Parametro)
Al_DS90<-DS90_2_df%>%filter(Parametro==nombres[1,])

Regards,

Thank you very much!

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.

Respondí abajo. Muchas gracias.

library(readxl)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(tidyverse)
DS90_prueba <- read_excel("C:\\Users\\Camilo Montes\\Documents\\R\\APPS\\EA_PRUEBA\\EA-SMA-02-20\\DS90.xlsx")
head(DS90_prueba)
#> # A tibble: 6 x 6
#>   Parametro           `Tabla 1` `Tabla 2` `Tabla 3` `Tabla 4` `Tabla 5`
#>   <chr>                   <dbl>     <dbl>     <dbl>     <dbl>     <dbl>
#> 1 Aluminio total (Al)      5         10        1         1         10  
#> 2 Arsénico total (As)      0.5        1        0.1       0.2        0.5
#> 3 Cadmio total (Cd)        0.01       0.3      0.02      0.02       0.5
#> 4 Cinc total (Zn)          3         20        5         5          5  
#> 5 Cobre total (Cu)         1          3        0.1       1          3  
#> 6 Cromo total (Cr)        NA         NA        2.5       2.5       10
DS90_2<-DS90_prueba%>%pivot_longer(`Tabla 1`:`Tabla 5`,names_to="Tabla",values_to="Valor")
head(DS90_2)
#> # A tibble: 6 x 3
#>   Parametro           Tabla   Valor
#>   <chr>               <chr>   <dbl>
#> 1 Aluminio total (Al) Tabla 1   5  
#> 2 Aluminio total (Al) Tabla 2  10  
#> 3 Aluminio total (Al) Tabla 3   1  
#> 4 Aluminio total (Al) Tabla 4   1  
#> 5 Aluminio total (Al) Tabla 5  10  
#> 6 Arsénico total (As) Tabla 1   0.5
filter(DS90_2,Parametro=="Aluminio total (Al)")
#> # A tibble: 0 x 3
#> # ... with 3 variables: Parametro <chr>, Tabla <chr>, Valor <dbl>
Created on 2020-10-03 by the reprex package (v0.3.0)

The code is not runnable since we don't have access to your local files and you haven't provided sample data on a copy/paste friendly format, the reproducible example is supposed to be self contained so you have to provide code that runs with the sample data, not with a file in your computer.

Maybe if you read the reprex guide in Spanish?

3 Likes

The guide advises you not to do this.
You should be aware that you alone have your xlsx file, we dont. so no data is being communicated in a reproducible format.

try dput(DS90_prueba) , that will provide a copy/paste-able format.