How to subset a data frame by a row value

I have a data frame (data.frame) that looks like this (simplified):

      [1] [2]  [3] 
[A]    2    4    3 
[B]    1    5    7 
[C]    2    3    4

I want to keep only the columns whose value in row A is >= 3. If I did that, the result would look like this:

      [2]  [3] 
[A]    4    3 
[B]    5    7 
[C]    3    4

I can't get this for the life of me. I tried:

Test <- data.frame[, data.frame$"A" >= 3]

And I get a 3x0 data frame back.
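Why that attempt returns 3x0 can be reproduced on the simplified data. A minimal sketch, assuming the data frame is called `df` (a hypothetical stand-in for the poster's object): `$` looks up a column, not a row, so `df$"A"` is `NULL`, and the column index becomes a zero-length logical vector that selects nothing.

```r
# Simplified data from the question; `df` is a hypothetical stand-in name
df <- data.frame(
  one = c(2, 1, 2),
  two = c(4, 5, 3),
  three = c(3, 7, 4),
  row.names = c("A", "B", "C")
)

# `$` looks up a *column* called "A"; there is none, so the result is NULL
df$"A"
#> NULL

# Comparing NULL with a number gives a zero-length logical vector
df$"A" >= 3
#> logical(0)

# Using logical(0) as the column index selects no columns: 3 rows, 0 columns
dim(df[, df$"A" >= 3])
#> [1] 3 0
```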


Hi, there's a tidy solution to filter rows of data frames with dplyr::filter

new_df <- old_df %>% filter(A < 3)

So I ran:

new_df <-old_df %>% dplyr::filter("A" > 3)

and I got back the same 3x3 data frame (none of the columns were removed), and now the row names are erased and replaced with 1, 2, 3!
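The unchanged result can also be explained. In filter("A" > 3), "A" is a string literal rather than a reference to row A, so R coerces 3 to "3" and compares the two strings; in common locales digits sort before letters, so the condition is a single TRUE that applies to every row. On top of that, filter() keeps or drops rows, not columns. A minimal sketch using base R's subset(), which, like dplyr::filter(), selects rows (used here only to avoid a dplyr dependency):

```r
df <- data.frame(
  one = c(2, 1, 2),
  two = c(4, 5, 3),
  three = c(3, 7, 4),
  row.names = c("A", "B", "C")
)

# "A" is just a character string; 3 is coerced to "3" and the strings are
# compared, giving a single TRUE (digits sort before letters in common locales)
"A" > 3

# subset(), like dplyr::filter(), chooses rows; a lone TRUE keeps every row
# and every column, so the data frame comes back unchanged
kept <- subset(df, "A" > 3)
dim(kept)
#> [1] 3 3
```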


I think if you resort to the t() function (transpose) in base R, it will get you a version of what you want.

You may have to reassign names at the end.

See the example below.

library(tidyverse)

df <- data.frame(
  one = c(2,1,2),
  two = c(4,5,3),
  three = c(3,7,4)
)

row.names(df) <- c('a','b','c')
df
#>   one two three
#> a   2   4     3
#> b   1   5     7
#> c   2   3     4

t(df)
#>       a b c
#> one   2 1 2
#> two   4 5 3
#> three 3 7 4

t(df) %>% as.data.frame() %>% 
  filter(a >=3) %>% 
  t() %>% 
  as.data.frame() 
#>   V1 V2
#> a  4  3
#> b  5  7
#> c  3  4

Created on 2019-10-29 by the reprex package (v0.3.0)

I just tried this, and the output was the original data frame with the columns now labeled V1, V2, etc.

My actual data frame is 5x526: the 5 row names are gene names ("B2m", "Isg15", etc.) and the columns are cell identifiers.

When I run that code, or the one I tried before, it's still 5x526.

I'm not understanding why none of this is working!

Then it's time for a reproducible example, called a reprex, since I clearly misunderstood what you're trying to do.

While creating a reprex, I got it to work with phiggins' example. I'm not sure what happened; I must have made a mistake before.

Is there a way to retain the column names instead of having them erased and replaced with V1, V2, V3, etc.?

Also, this is a stupid question, but the example code just prints the output of as.data.frame().

How do I save it to a new variable?
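On keeping the column names with the transpose approach: the V1, V2 names appear to come from filter() dropping the row names of the transposed data frame (at least with the dplyr version used in this thread), so transposing back leaves no names to restore. A minimal sketch that keeps them by filtering the transposed *matrix* with base `[` instead, since matrix indexing preserves dimnames; `tm`, `keep` and `new_df` are hypothetical names:

```r
df <- data.frame(
  one = c(2, 1, 2),
  two = c(4, 5, 3),
  three = c(3, 7, 4),
  row.names = c("a", "b", "c")
)

# Transpose: the original columns (one, two, three) become rows
tm <- t(df)

# TRUE for rows of tm whose value in column "a" (original row a) is >= 3
keep <- tm[, "a"] >= 3

# Subset the matrix, transpose back and convert; drop = FALSE keeps a matrix
# even if only one row passes, and the dimnames survive throughout
new_df <- as.data.frame(t(tm[keep, , drop = FALSE]))
new_df
#>   two three
#> a   4     3
#> b   5     7
#> c   3     4
```

Assigning with `<-`, as in the `new_df <- ...` line, is also how a result is saved to a new variable instead of only being printed.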


There are no stupid questions on this forum; every one of us, whether asking or reading, has the potential to learn from any question. Sok?

@phiggins has a workable suggestion in his post:

library(tidyverse)

df <- data.frame(
  one = c(2,1,2),
  two = c(4,5,3),
  three = c(3,7,4)
)
row.names(df) <- c('a','b','c')
t(df)
t(df) %>% as.data.frame() %>% 
  filter(a >=3) %>% 
  t() %>% 
  as.data.frame() 

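The result of the pipeline can be assigned to a name with the assignment operator <-, for example by overwriting df:

df <- t(df) %>% as.data.frame() %>% 
  filter(a >= 3) %>% 
  t() %>% 
  as.data.frame()

Then you need to decide whether you want a, b, c as row names or as a variable. If row names, you're done; if a variable: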

df <- tibble::rownames_to_column(df)
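If the tibble package is not available, a base-R equivalent of tibble::rownames_to_column() is sketched below. The column name `rowname` mirrors tibble's default, and `res`/`out` are hypothetical names for the filtered result and its converted copy:

```r
# Filtered result from the thread, with a, b, c as row names
res <- data.frame(
  two = c(4, 5, 3),
  three = c(3, 7, 4),
  row.names = c("a", "b", "c")
)

# Move the row names into a leading column; row.names = NULL resets
# the row names to the automatic 1, 2, 3
out <- data.frame(rowname = rownames(res), res, row.names = NULL)
out
#>   rowname two three
#> 1       a   4     3
#> 2       b   5     7
#> 3       c   3     4
```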

To get more precise guidance, we still need some rows of your actual data frame in a reproducible example, called a reprex.


This is one of those cases where base R makes things much simpler; see this alternative solution:

df <- data.frame(
    one = c(2,1,2),
    two = c(4,5,3),
    three = c(3,7,4)
)

row.names(df) <- c('a','b','c')

new_df <- df[,df["a",] >= 3]
new_df
#>   two three
#> a   4     3
#> b   5     7
#> c   3     4
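A slightly more defensive version of the same base-R idea, as a sketch: df["a", ] is a one-row data frame, so unlist() turns it into a plain named vector before the comparison, and drop = FALSE keeps the result a data frame even when only one column passes (without it, `[` returns a bare vector in that case). `row_a` and `one_col` are hypothetical names:

```r
df <- data.frame(
  one = c(2, 1, 2),
  two = c(4, 5, 3),
  three = c(3, 7, 4),
  row.names = c("a", "b", "c")
)

# Row "a" as a named numeric vector: one = 2, two = 4, three = 3
row_a <- unlist(df["a", ])

# Keep the columns whose value in row "a" is >= 3
new_df <- df[, row_a >= 3, drop = FALSE]
new_df
#>   two three
#> a   4     3
#> b   5     7
#> c   3     4

# With a threshold of 4 only "two" passes; drop = FALSE keeps a data frame
one_col <- df[, row_a >= 4, drop = FALSE]
one_col
#>   two
#> a   4
#> b   5
#> c   3
```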

Thanks @andresrcs, this worked like gangbusters.

You guys are the best, always so helpful.

@andresrcs has the best solution, but I'll offer a hint to simplify the data.frame creation by specifying row names directly:

df <- data.frame(
  one = c(2,1,2),
  two = c(4,5,3),
  three = c(3,7,4),
  row.names = c('A','B','C')
)

new_df <- df[,df["A",] >= 3]
new_df
#>   two three
#> A   4     3
#> B   5     7
#> C   3     4

