Sum of the number of occurrences R array

juudd · May 9, 2019, 3:55pm

Hello

How can I sum the number of occurrences of each letter at each position of a fragment?
In the table, the abscissa holds the possible letters.
The ordinate holds the positions (position 1, position 2, ..., position 8).
Each cell gives the number of occurrences of that letter at that position.

For fragment 1, TATLQAKA, I get: 5 + 2 + 3 + 4 + 2 + 1 + 1 + 4 = 22
(T was found 5 times at position 1, A was found 2 times at position 2, T was found 3 times at position 3, and so on.)

For QCLKMLET: 3 + 2 + 5 + 3 + 1 + 7 + 3 + 3 = 27

How can I do this in R rather than by hand?

Thank you very much if you have an idea!

[Image: Screenshot from 2019-05-09 11-03-06]
[Image: Screenshot from 2019-05-03 17-02-43]

Yarnabrina · May 9, 2019, 5:41pm

Welcome to the community!

I believe the following does what you want.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(stringr)

df <- data.frame(Segments = c("TATLQAKA", "QCLKMLET", "EARKIFRF", "SKKELKEL", "CPVGFLGN", "VNTQPGFH", "GDGRGDIC", "IPSNFVSP", "EFFRGLNI", "SCTYQLAR"),
                 stringsAsFactors = FALSE)

str_split_fixed(string = df$Segments,
                pattern = "",
                n = 8) %>%
  as.data.frame() %>%
  mutate_all(.funs = list(counts = ~ table(.)[.])) %>%
  select(ends_with("_counts")) %>%
  mutate(scores = rowSums(x = .),
         strings = df$Segments) %>%
  select(strings, scores)
#>     strings scores
#> 1  TATLQAKA     12
#> 2  QCLKMLET     14
#> 3  EARKIFRF     11
#> 4  SKKELKEL     10
#> 5  CPVGFLGN     13
#> 6  VNTQPGFH     10
#> 7  GDGRGDIC     10
#> 8  IPSNFVSP     10
#> 9  EFFRGLNI     14
#> 10 SCTYQLAR     16

Created on 2019-05-09 by the reprex package (v0.2.1)
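The key step in the pipeline is the `table(.)[.]` idiom, which may be easier to follow on a single column. Here is a minimal sketch using the letters found at position 1 of the ten segments above; the names `first_letters`, `letter_counts` and `per_element` are only illustrative.

```r
# Letters seen at position 1 of the ten example segments.
first_letters <- c("T", "Q", "E", "S", "C", "V", "G", "I", "E", "S")

# table() counts each distinct letter; the names of the result are the letters.
letter_counts <- table(first_letters)

# Indexing the table by the original vector looks up, for every element,
# how often that element's letter occurs in the column.
per_element <- as.vector(letter_counts[first_letters])
per_element
```

Each segment therefore gets, for every position, the count of its own letter at that position, and `rowSums()` adds those counts up per segment.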

juudd · May 10, 2019, 8:47am

Hello,

Is it possible to replace

data.frame(Segments = c("TATLQAKA", "QCLKMLET", "EARKIFRF", "SKKELKEL", "CPVGFLGN", "VNTQPGFH", "GDGRGDIC", "IPSNFVSP", "EFFRGLNI", "SCTYQLAR"),
                stringsAsFactors = FALSE)
with a list of segments read from a file such as "segment.txt"?

Thank you

Yarnabrina · May 10, 2019, 1:59pm

I'm not sure I understand what you're asking.

I'm guessing that you meant that you have the strings in a comma separated text file. Is that it?

If so, then it can be done easily. First, suppose there's a text file named segment.txt in the working directory. Its contents are in this form:

"Segments"
"TATLQAKA"
"QCLKMLET"
"EARKIFRF"
"SKKELKEL"
"CPVGFLGN"
"VNTQPGFH"
"GDGRGDIC"
"IPSNFVSP"
"EFFRGLNI"
"SCTYQLAR"

Then read that file into R with read.csv and run the rest of the code on the resulting data frame.
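As a quick check of the reading step, here is a minimal sketch that writes a small file in the layout shown above to a temporary directory and reads it back. The temporary path and the three-segment contents are assumptions made only for the demonstration; in practice you would point `read.csv` at your own segment.txt.

```r
# Write a small example file in the same layout (header line, one segment per line).
path <- file.path(tempdir(), "segment.txt")
writeLines(c('"Segments"', '"TATLQAKA"', '"QCLKMLET"', '"EARKIFRF"'), con = path)

# read.csv() treats the first line as the column name by default.
dataset <- read.csv(file = path, stringsAsFactors = FALSE)

# The segments are now available as a character vector.
dataset$Segments
```

If your file has no header line, passing `header = FALSE` to `read.csv` keeps the first segment from being used as the column name; the column is then called `V1` instead of `Segments`.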

By the way, I found that strsplit can be used here instead of stringr::str_split_fixed. Hence, being biased, let me post that solution too (the two are essentially the same).

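library(package = "dplyr")

# read the segments; the file's first line ("Segments") becomes the column name
dataset <- read.csv(file = "segment.txt",
                    stringsAsFactors = FALSE)

# split every segment into single letters and stack them into a matrix
# (one row per segment, one column per position)
do.call(what = rbind,
        args = strsplit(x = dataset$Segments,
                        split = "")) %>%
  as.data.frame() %>%
  # replace each letter by its number of occurrences in that position
  mutate_all(.funs = ~ table(.)[.]) %>%
  # `.` is the data frame of counts piped in, so rowSums(.) sums all positions
  mutate(strings = dataset$Segments,
         scores = rowSums(.))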
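If you would rather keep an explicit occurrence table, like the one in your screenshot, and look the counts up in it, the following is a minimal base R sketch of that idea. The names `letter_matrix`, `count_matrix` and `score_segment` are hypothetical, and the table here is built from the ten example segments only, so the scores match the output above rather than the 22 and 27 from your own table.

```r
segments <- c("TATLQAKA", "QCLKMLET", "EARKIFRF", "SKKELKEL", "CPVGFLGN",
              "VNTQPGFH", "GDGRGDIC", "IPSNFVSP", "EFFRGLNI", "SCTYQLAR")

# One row per segment, one column per position.
letter_matrix <- do.call(rbind, strsplit(segments, split = ""))

# Occurrence table: rows are positions ("1".."8"), columns are letters.
count_matrix <- unclass(table(position = col(letter_matrix),
                              letter = letter_matrix))

# Score one segment by looking up (position, letter) pairs in the table.
# Note: a letter that is not a column of `counts` raises a subscript error.
score_segment <- function(segment, counts) {
  chars <- strsplit(segment, split = "")[[1]]
  sum(counts[cbind(as.character(seq_along(chars)), chars)])
}

scores <- vapply(segments, score_segment, numeric(1), counts = count_matrix)
scores
```

With this layout you could swap in a `count_matrix` computed from a larger set of fragments and score new segments against it, as long as its row names are the positions and its column names are the letters.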

Now, if your question is answered, will you please consider marking this thread as solved?

If you don't know how to do it, please take a look at this thread:

juudd · May 10, 2019, 2:58pm

Hi, Yarnabrina

Thank you very much!
