If you use the tidyverse, I don't think add_column() is the best fit, since you always have to reference the original data frame (file in your example). It's more natural to use mutate():
library(tidyverse)
read_csv("Marker, Alleles, Line1, Line2, Line3, Line4, Line5, Line6, Line7
1, C/G, C, Y, C, C, G, Y, N
2, A/T, A, T, T, N, K, T, A
3, G/A, A, N, G, G, G, A, X") %>%
  mutate(Allele_A = substring(Alleles, 1, 1),
         Allele_B = substring(Alleles, 3, 3),
         Allele_missing = "N")
Then, for Allele_H, it's a bit more complicated. If I understand correctly, you want to take all the values in Line1 to Line7, keep the unique() ones, and concatenate them with paste0(.x, collapse=""). Since you don't want to name each column manually, you will want to select them with e.g. starts_with("Line"), which means you need a function that accepts selection helpers, such as across(). The additional difficulty is that you need to work on rows: a standard mutate() works column by column, so it would collapse each whole column instead of each row. Combining rowwise() with c_across(), the row-wise counterpart of across(), gives this somewhat big expression:
library(tidyverse)
read_csv("Marker, Alleles, Line1, Line2, Line3, Line4, Line5, Line6, Line7
1, C/G, C, Y, C, C, G, Y, N
2, A/T, A, T, T, N, K, T, A
3, G/A, A, N, G, G, G, A, X") %>%
  mutate(Allele_A = substring(Alleles, 1, 1),
         Allele_B = substring(Alleles, 3, 3),
         Allele_missing = "N") %>%
  rowwise() %>%
  mutate(Allele_H = paste0(unique(c_across(starts_with("Line"))), collapse = ""))
The resulting Allele_H column is:
... %>%
  pull(Allele_H)
#> [1] "CYGN" "ATNK" "ANGX"
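To see what that expression computes for a single row, here is a minimal sketch that runs the same two calls on a plain character vector. The values are those of Marker 1 in the example data; the names row_values and allele_h are made up for illustration.

```r
# Values of Line1..Line7 for Marker 1, typed in by hand for illustration
row_values <- c("C", "Y", "C", "C", "G", "Y", "N")

# unique() keeps the first occurrence of each value, in order of appearance;
# paste0(collapse = "") then glues them into a single string
allele_h <- paste0(unique(row_values), collapse = "")
allele_h
#> [1] "CYGN"
```

rowwise() plus c_across() simply hands mutate() one such vector per row.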
Of course, you can move the code that generates Allele_H into its own function, especially if you want to add sorting or some more complex operation:
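# Meant to be called inside a rowwise mutate();
# vec is a column selection such as starts_with("Line")
get_H <- function(vec) {
  c_across(all_of(vec)) %>%
    unique() %>%
    sort() %>%
    paste0(collapse = "")
}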
read_csv("Marker, Alleles, Line1, Line2, Line3, Line4, Line5, Line6, Line7
1, C/G, C, Y, C, C, G, Y, N
2, A/T, A, T, T, N, K, T, A
3, G/A, A, N, G, G, G, A, X") %>%
  mutate(Allele_A = substring(Alleles, 1, 1),
         Allele_B = substring(Alleles, 3, 3),
         Allele_missing = "N") %>%
  rowwise() %>%
  mutate(Allele_H = get_H(starts_with("Line"))) %>%
  pull(Allele_H)
#> [1] "CGNY" "AKNT" "AGNX"
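If you would rather keep the column selection out of the helper, one possible variant is to let the helper take an ordinary vector and call c_across() in the mutate() itself. This is only a sketch: collapse_unique and markers are hypothetical names, and the small data frame is a cut-down stand-in for the example data (three Line columns only), not the original file.

```r
library(dplyr)

# Hypothetical helper: takes a plain character vector, no column selection inside
collapse_unique <- function(x) {
  paste0(sort(unique(x)), collapse = "")
}

# Cut-down stand-in for the example data (assumption: first three Line columns only)
markers <- data.frame(
  Marker = 1:3,
  Line1 = c("C", "A", "A"),
  Line2 = c("Y", "T", "N"),
  Line3 = c("C", "T", "G")
)

result <- markers %>%
  rowwise() %>%
  # c_across() builds one vector per row from the selected columns
  mutate(Allele_H = collapse_unique(c_across(starts_with("Line")))) %>%
  ungroup()

result$Allele_H
```

On this stand-in data the three values should be "CY", "AT" and "AGN". Because the helper only sees a vector, it can also be tested on its own, outside of any data frame.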