subsetting data.frame columns by rownames AND values

OK guys, I tried this one for a while but got nowhere. I have a data frame (df) as follows:

df <- data.frame(
  one = c(2,1,2,0,0,1),
  two = c(4,5,3,0,1,3),
  three = c(1,0,2,0,7,4),
  four = c(3,2,1,0,0,0)
)

row.names(df) <- c('mm1','mm2','mm3', 'GC1', 'GC2', 'GC3')
df

    one two three four
mm1   2   4     1    3
mm2   1   5     0    2
mm3   2   3     2    1
GC1   0   0     0    0
GC2   0   1     7    0
GC3   1   3     4    0

I want to remove every column that has a value greater than 0 in any row whose name matches ^GC. So in this case I would remove columns 1, 2, and 3 (one, two, and three). The result would look like this:

    four
mm1    3
mm2    2
mm3    1
GC1    0
GC2    0
GC3    0

Then after this is done, I would like to remove all rows of ^GC (which should now all be 0's across all columns). The final result would look like this:

    four
mm1    3
mm2    2
mm3    1

Maybe there is a simpler way to combine these two steps, but I want to be sure that I eliminate the column from the dataset that has a value in one of the ^GC rows.

!?!?!?

Thanks!

Instead of rownames, you'd be better off making a new column (say, r) with those values. Then split r into the GC part and the number part. Then use dplyr::group_by and dplyr::filter. Sorry, I can't give you the exact code at the moment.

Hi woodward, no worries. I'm not sure I understand exactly what you are saying!

Hm, tricky. You can do it by finding the indices of the cells you want to keep.

df <- data.frame(
  one = c(2,1,2,0,0,1),
  two = c(4,5,3,0,1,3),
  three = c(1,0,2,0,7,4),
  four = c(3,2,1,0,0,0)
)
row.names(df) <- c('mm1','mm2','mm3', 'GC1', 'GC2', 'GC3')

library(stringr)
library(tibble)
library(dplyr)
i <- which(str_detect(row.names(df), "^GC"))  # indices of the GC rows
j <- which(names(df) %in% c("one", "two", "three", "four"))  # columns to check
k <- which(colSums(df[i, j, drop = FALSE]) == 0)  # checked columns that sum to 0 over the GC rows
df %>% 
  rownames_to_column("temp") %>% # save rownames
  slice(-i) %>%  # drop the GC rows
  select(all_of(names(k)), "temp") %>%  # select by name, so the extra "temp" column cannot shift positions
  column_to_rownames("temp")
#>     four
#> mm1    3
#> mm2    2
#> mm3    1

Created on 2019-11-13 by the reprex package (v0.3.0)

Thanks woodward, I'm testing this right now. However, I'm not sure how to get around this line:

j <- which(names(df) %in% c("one", "two", "three", "four"))  # columns to check

I need to test every column of the matrix. There are more than 5000 of them, and they have random names.

j is just the indices of the columns you need to check. Set it however you want.
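
For the "check every column" case, here is a minimal base R sketch that reuses the i, j, k names from the code above. It assumes every column is numeric, and it counts values greater than 0 instead of comparing column sums to 0 (my substitution, so that negative values could never cancel out). The name result is only for illustration.

```r
df <- data.frame(
  one = c(2,1,2,0,0,1),
  two = c(4,5,3,0,1,3),
  three = c(1,0,2,0,7,4),
  four = c(3,2,1,0,0,0),
  row.names = c('mm1','mm2','mm3', 'GC1', 'GC2', 'GC3')
)

i <- grepl("^GC", rownames(df))                # TRUE for the GC rows
j <- seq_along(df)                             # every column, no names needed
k <- colSums(df[i, j, drop = FALSE] > 0) == 0  # TRUE where no GC row has a value > 0
result <- df[!i, j[k], drop = FALSE]           # drop the GC rows, keep the clean columns
result
```

The drop = FALSE arguments keep the result a data frame even when only one column survives, as happens here.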

Actually you might be able to do a lot with the subset function.

j <- which(names(df) %in% names(subset(df,,one:four)))

Yes it's easier with subset.

i <- str_detect(row.names(df), "^GC")  # rows with GC
j <- names(df) %in% names(subset(df, TRUE, one:four))  # columns to check
k <- colSums(df[i, j]) == 0  # find columns which are all 0
subset(df, !i, k)

Another option

library(tidyverse)

df <- data.frame(
    one = c(2,1,2,0,0,1),
    two = c(4,5,3,0,1,3),
    three = c(1,0,2,0,7,4),
    four = c(3,2,1,0,0,0), 
    row.names = c('mm1','mm2','mm3', 'GC1', 'GC2', 'GC3')
)

keep_columns <- df %>% 
    rownames_to_column() %>% 
    filter(str_detect(rowname, "^GC")) %>%  # only the GC rows
    summarise_if(is.numeric, sum) %>%       # per-column total over the GC rows
    select_if(~.x == 0) %>%                 # columns whose GC total is 0
    names()

df %>% 
    rownames_to_column() %>% 
    filter(!str_detect(rowname, "^GC")) %>%  # drop the GC rows
    column_to_rownames() %>% 
    select(all_of(keep_columns))
#>     four
#> mm1    3
#> mm2    2
#> mm3    1

Hey Andres, how would I change this if instead of a data frame, I had an object, and inside that object was a matrix and not a data frame?

If you want to apply the tidyverse-based solution, you would have to convert the matrix to a data frame first and back to a matrix at the end. There must be a base R solution that works directly with a matrix, but I suspect it would be hard to read and ugly to write (at least for me).
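
For what it's worth, here is a rough sketch of how a base R version working directly on a matrix might look. The container obj and its element name counts are hypothetical, since the thread does not say what kind of object holds the matrix. It also assumes the matrix is numeric and has row names.

```r
# Hypothetical container: a plain list holding the matrix under the name "counts"
obj <- list(
  counts = matrix(
    c(2,1,2,0,0,1,
      4,5,3,0,1,3,
      1,0,2,0,7,4,
      3,2,1,0,0,0),
    nrow = 6,  # filled column by column
    dimnames = list(
      c('mm1','mm2','mm3', 'GC1', 'GC2', 'GC3'),
      c('one', 'two', 'three', 'four')
    )
  )
)

m <- obj$counts
gc_rows <- grepl("^GC", rownames(m))                       # TRUE for the GC rows
keep_cols <- colSums(m[gc_rows, , drop = FALSE] > 0) == 0  # no value > 0 in any GC row
obj$counts <- m[!gc_rows, keep_cols, drop = FALSE]         # still a matrix afterwards
obj$counts
```

With drop = FALSE the result stays a matrix even if a single column is left, so whatever expects a matrix inside the object should keep working.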
