Using tidyverse to count times a string appears in a table.

Pioneer82 · October 21, 2019, 2:42pm

raytong, thanks for your suggestion. I'll test your code and let you know if it works or not. Thanks.

Pioneer82 · October 21, 2019, 3:35pm

Leon, I ran your test and it produced the desired output. I'm going to run it against an isolated portion of my live data and see how it fares.

Leon · October 21, 2019, 4:18pm

Let me know how it goes

Pioneer82 · October 21, 2019, 8:04pm

Leon, it runs well. It gets the results I seek. One more question: the resulting table places the file names on the y-axis and the clones on the x-axis. What function could I use if I wanted to reverse that, i.e. the clones on the y-axis and the file names on the x-axis? It would work even better because I have several thousand clones in each file, so it would be easier to review the data.

Leon · October 22, 2019, 9:07am

I understand your question, @Pioneer82, but... I would highly recommend you get more familiar with the Tidyverse by working through this book (https://r4ds.had.co.nz/). Trust me, it'll be well worth the effort.

There, you'll be introduced to the concept of tidy data. Briefly, observations are rows, variables are columns, and each cell holds one value and one value only. The Tidyverse tools are set up to work on this data format. Therefore, depending on whether you want to view the clones as observations and the sequences as variables or vice versa, you should set up your tibble accordingly. Don't do calculations row-wise; do them column-wise.
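To make the tidy-data idea a bit more concrete, here is a minimal sketch. The table `obs` and its columns `file` and `clone` are made up for illustration (they are not from your data), and I'm assuming one row per observed clone in a file:

```r
library(dplyr)

# Hypothetical tidy table: one row per observation (a clone seen in a file)
obs <- tibble(
  file  = c("a.txt", "a.txt", "b.txt", "b.txt", "b.txt"),
  clone = c("c1", "c2", "c1", "c1", "c3")
)

# Column-wise work: count how often each clone appears in each file.
# count() adds a column named `n` holding the number of rows per group.
clone_counts <- obs %>%
  count(file, clone)

clone_counts
```

With the data in this shape, the result stays long (one row per file/clone pair), which tends to be easier to scan when there are thousands of clones.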

Hope it makes sense. If not, there is a lot more information in the book I keep mentioning (for a reason).

But to answer your question, transposing a tibble can be done like so:

Load libraries

library('tidyverse')

Create example data

set.seed(24816)
n = 10
d = tibble(id = sample(LETTERS, n),
           x = rnorm(n),
           y = rnorm(n),
           z = rnorm(n))
d

# A tibble: 10 x 4
   id         x       y       z
   <chr>  <dbl>   <dbl>   <dbl>
 1 Q      0.128 -0.484   0.769 
 2 K     -0.320  1.32   -0.0408
 3 J     -1.35   0.426  -0.946 
 4 G     -1.60   1.22    0.0126
 5 R      1.04   0.867   1.30  
 6 Y     -1.56   0.422  -0.948 
 7 V      1.42  -0.0327 -0.941 
 8 O     -1.17  -0.607   1.39  
 9 N     -2.30   0.121   0.0360
10 Z      1.84  -1.19    1.06

Transpose Tibble

d %>%
  # Stack x, y and z into (var_name, value) pairs, keeping id
  gather(key = var_name, value = value, x:z) %>%
  # Spread the id values out into columns, one per original row
  spread(key = id, value = value)

# A tibble: 3 x 11
  var_name       G      J       K       N      O      Q     R       V      Y     Z
  <chr>      <dbl>  <dbl>   <dbl>   <dbl>  <dbl>  <dbl> <dbl>   <dbl>  <dbl> <dbl>
1 x        -1.60   -1.35  -0.320  -2.30   -1.17   0.128 1.04   1.42   -1.56   1.84
2 y         1.22    0.426  1.32    0.121  -0.607 -0.484 0.867 -0.0327  0.422 -1.19
3 z         0.0126 -0.946 -0.0408  0.0360  1.39   0.769 1.30  -0.941  -0.948  1.06
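As an alternative to the gather-then-spread step above, the same transpose can probably be written with `pivot_longer()` and `pivot_wider()`. This is only a sketch and assumes tidyr 1.0.0 or later; the tibble `d2` is a small made-up stand-in with the same shape as `d`:

```r
library(dplyr)
library(tidyr)

# Small hypothetical tibble: an id column plus numeric variables
d2 <- tibble(id = c("Q", "K", "J"),
             x  = c(1, 2, 3),
             y  = c(4, 5, 6))

# Transpose: lengthen every non-id column, then widen by id
d2_t <- d2 %>%
  pivot_longer(cols = -id, names_to = "var_name", values_to = "value") %>%
  pivot_wider(names_from = id, values_from = value)

d2_t
```

One difference to be aware of: `pivot_wider()` keeps the new columns in the order the ids first appear (Q, K, J here), whereas `spread()` sorts them alphabetically.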

Pioneer82 · October 22, 2019, 3:01pm

Leon, thank you for your explanation. It makes a lot of sense. I got hold of a copy of the book, so I'm reading it during my train commute :-p I'll be working plenty with R, so I'm hoping the book will give me a boost up the learning curve. Again, thank you very much for your help.

Leon · October 22, 2019, 3:27pm

You're very welcome! Please remember to mark the solution to your question

Happy learning!
