Select multiple columns values under a boolean condition

Hello everyone,

I'm facing a little problem right now, I hope some of you will be able to help me with this !

Here's a reproducible example for my data:

I am currently working with a large dataset of multiple rows which correspond to various dataframes each. Each rows/dataframe information in common for certain variables. That's what I would like to know.

I would like to return for each couple of rows the variable that are in common (both TRUE).

I did something like that which is not effective :


data |>
 slice(1,2) |>                                          # select row 1 & 2 
    select(  where (rownames (1) == rownames (2) ) |> 
        as.vector()                                     # to export the result in vector

Thanks a lot in advance !

Maybe like this?

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

df1 <- tibble::tribble(
  ~table,~Var.X,~Var.Y,~Var.Z,~Var.Q,
"TABLE1",T,T,T,F,
"TABLE2",T,F,T,T,
"TABLE3",T,F,T,F,
"TABLE4",T,F,T,T 
)

varnames <- setdiff (names(df1),"table")

# combine all rows to compare table variables
df1 <- df1 |>
  mutate(seqnr=row_number())
df1a <-df1
names(df1a) <- paste0("a",names(df1a))
df1b <-df1
names(df1b) <- paste0("b",names(df1b))

df2 <- tidyr::crossing(df1a,df1b) |>
  filter(aseqnr>bseqnr) |>
  select(-c(aseqnr,bseqnr)) |>
  rename(table1=atable,table2=btable) 

# variable exists in two tables
anames <- paste0("a",varnames)
bnames <- paste0("b",varnames)
df3 <- cbind(df2[,c("table1","table2")],(df2[,anames]==T) & (df2[,bnames]==T))
names(df3) <- c("table1","table2",varnames)

# copy variable name
df4 <- df3 |>
  mutate(across(
    .cols=!any_of(c("table1","table2")),
    .fns = \(x) {cc <-c("",cur_column())[1+x] ; return(cc)}
  ))

# combine variable names in column m
df4 |>
  rowwise() |>
  mutate(m=paste(c_across(!any_of(c("table1","table2"))),collapse = " ")) |>
  select(table1,table2,m) |>
  mutate(m=stringr::str_squish(m)) |>
  print()
#> # A tibble: 6 × 3
#> # Rowwise: 
#>   table1 table2 m                
#>   <chr>  <chr>  <chr>            
#> 1 TABLE2 TABLE1 Var.X Var.Z      
#> 2 TABLE3 TABLE1 Var.X Var.Z      
#> 3 TABLE3 TABLE2 Var.X Var.Z      
#> 4 TABLE4 TABLE1 Var.X Var.Z      
#> 5 TABLE4 TABLE2 Var.X Var.Z Var.Q
#> 6 TABLE4 TABLE3 Var.X Var.Z
1 Like

Well thanks ! It's very useful and work on the example !

Unfortunately, I do not understand why it does not work in my data,

The Error message i have got :

# When trying to run 

df2 <- tidyr::crossing(df1a,df1b) |>
  filter(aseqnr>bseqnr) |>
  select(-c(aseqnr,bseqnr)) |>
  rename(table1=aVariables,table2=bVariables) 

Error in `df2[, anames]`:
! Can't subset columns that don't exist.
x Column `aVariables` doesn't exist.

We can see the difference of coding techniques, which yours seems to be far greater than me ahaha

Hello @Thibaud ,

show us the code that you use before the error occurs and also the 'names' of your columns (data fields).

Well ok, thank you very much !

Here's my data structure :

First the column of the table naame, secondly the 'intitulé' to explain what is it about, after the structure, and finally all other columns are variables

Thanks in advance !

can you justify why you typed aVariables
the image of your data shows that HanOostdijk's assumed table column is Tables in yours
so I would have expect you to write aTables on that basis

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.