Using tidyverse to count times a string appears in a table.

raytong, thanks for your suggestion. I'll test your code and let you know if it works or not. Thanks.

Leon, I ran your test and it produced the desired output. I'm going to run it against an isolated portion of my live data and see how it fares.


Let me know how it goes :slightly_smiling_face:

Leon, it runs well and gets the results I'm after. One more question: the resulting table places the file names on the y-axis and the clones on the x-axis. What function could I use if I wanted to reverse that, i.e. the clones on the y-axis and the file names on the x-axis? That would work even better, because I have several thousand clones in each file and the data would be easier to review that way.

I understand your question, @Pioneer82, but I would highly recommend getting more familiar with the Tidyverse by working through this book. Trust me, it'll be well worth the effort.

There, you'll be introduced to the concept of tidy data. Briefly: observations are rows, variables are columns, and each cell holds one value and one value only. The Tidyverse tools are set up to work on this data format. So, depending on whether you want to treat the clones as observations and the sequences as variables or vice versa, you should set up your tibble accordingly. Don't do calculations row-wise; do them column-wise.
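To make the tidy layout concrete, here is a minimal sketch. The `file` and `clone` columns and their values are made up for illustration (the real data isn't shown in this thread), and the `name` argument of `count()` assumes a reasonably recent dplyr:

```r
library(tidyverse)

# Hypothetical tidy layout: one row per observed (file, clone) pair
clones <- tibble(
  file  = c("file_1", "file_1", "file_1", "file_2", "file_2"),
  clone = c("clone_A", "clone_B", "clone_A", "clone_A", "clone_C")
)

# Column-wise counting: how often each clone appears in each file
clone_counts <- clones %>%
  count(file, clone, name = "n_seen")

clone_counts
```

Because each variable lives in its own column, the counting is a single grouped operation rather than a loop over rows.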

Hope it makes sense, if not then there is a lot more information in the book I keep mentioning (for a reason :wink:)

But to answer your question, transposing a tibble can be done like so:

Load libraries

library(tidyverse)

Create example data

n = 10
d = tibble(id = sample(LETTERS, n),
           x = rnorm(n),
           y = rnorm(n),
           z = rnorm(n))
# A tibble: 10 x 4
   id         x       y       z
   <chr>  <dbl>   <dbl>   <dbl>
 1 Q      0.128 -0.484   0.769 
 2 K     -0.320  1.32   -0.0408
 3 J     -1.35   0.426  -0.946 
 4 G     -1.60   1.22    0.0126
 5 R      1.04   0.867   1.30  
 6 Y     -1.56   0.422  -0.948 
 7 V      1.42  -0.0327 -0.941 
 8 O     -1.17  -0.607   1.39  
 9 N     -2.30   0.121   0.0360
10 Z      1.84  -1.19    1.06  

Transpose Tibble

# Stack the x, y and z columns into long format,
# then spread the id values out as new columns
d %>%
  gather(key = var_name, value = value, x:z) %>%
  spread(key = id, value = value)
# A tibble: 3 x 11
  var_name       G      J       K       N      O      Q     R       V      Y     Z
  <chr>      <dbl>  <dbl>   <dbl>   <dbl>  <dbl>  <dbl> <dbl>   <dbl>  <dbl> <dbl>
1 x        -1.60   -1.35  -0.320  -2.30   -1.17   0.128 1.04   1.42   -1.56   1.84
2 y         1.22    0.426  1.32    0.121  -0.607 -0.484 0.867 -0.0327  0.422 -1.19
3 z         0.0126 -0.946 -0.0408  0.0360  1.39   0.769 1.30  -0.941  -0.948  1.06
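As a side note, the gather/spread family used above has been superseded in newer versions of tidyr by `pivot_longer()` and `pivot_wider()`. A rough equivalent, assuming tidyr 1.0.0 or later, might look like the sketch below. The `set.seed()` call and the `d_t` name are my additions, there only to keep the example reproducible and easy to refer to:

```r
library(tidyverse)

set.seed(1)  # assumption: fixed seed, only so the sketch is reproducible
n <- 10
d <- tibble(id = sample(LETTERS, n),
            x = rnorm(n),
            y = rnorm(n),
            z = rnorm(n))

# Long format first (one row per id/variable pair),
# then widen again with the ids as column names
d_t <- d %>%
  pivot_longer(cols = x:z, names_to = "var_name", values_to = "value") %>%
  pivot_wider(names_from = id, values_from = value)

d_t
```

One difference to be aware of: `pivot_wider()` keeps the id columns in order of first appearance, whereas the spread-based version sorts them alphabetically.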

Leon, thank you for your explanation. It makes a lot of sense. I got hold of a copy of the book, so I'm reading it during my train commute :-p I'll be working plenty with R, so I'm hoping the book will give me a boost up the learning curve. Again, thank you very much for your help.

You're very welcome! Please remember to mark the solution to your question :+1:

Happy learning!
