how to do iterative match over dataframe rows in R?

Hi to all
i have data like this

df <- read.delim(sep="|", text=c("
M1 M2
A/A G/G
T/T NA
NA C/C
A/A G/G
A/A A/A
"))

now I would like to match row1 of M1 column with itself and with other rows of the same column (iterative matching) if it matches then 0 otherwise 1 but any row of M1 column has NA I do not want include that row in matches and I do not match this row with other rows of M1 column and I would like to repeat this with M2 column too and finally I will sum column at bottom (6 for M1 and 10 for M2). any help in this regard is highly appreciated
finally my results looks like this

M1 M2
0 0
1 0
0 1
0 0
0 1
1 0
0 0
0 0
1 0
1 0
0 1
0 0
0 0
0 1
0 1
0 0
1 0
0 1
0 0
0 1
0 1
1 0
0 1
0 1
0 0
6 10
Thanks in advance
Genetist

Hi,

Welcome to the RStudio community!

Your data and explanation are a bit confusing. You show a very small example of input and output, but then you start talking about B3 - B5 which is mentioned nowhere in the data.

I suggest you create a reprex so we can try and help you out. A reprex consists of the minimal code and data needed to recreate the issue/question you're having. You can find instructions how to build and share one here:

Good luck,
PJ

Hi @genetist,
I suggest you first split your columns into "first_letter" and "second_letter", and then do the comparisons to see if they are the same:

# Add to your data to show various extra data combinations
df <- read.delim(sep="|", text=c("
|IND|M1|M2|
|1|A/A|G/G|
|2|T/T|NA|
|3|NA|C/C|
|4|T/C|A/T|
|5|G/T|A/A|
"))

df
#>    X IND   M1   M2 X.1
#> 1 NA   1  A/A  G/G  NA
#> 2 NA   2  T/T <NA>  NA
#> 3 NA   3 <NA>  C/C  NA
#> 4 NA   4  T/C  A/T  NA
#> 5 NA   5  G/T  A/A  NA

suppressPackageStartupMessages({
  library(tidyverse)
})

df %>% 
  select(!c(1,5)) %>% 
  separate(col=M1, sep="/", into=c("M1_1", "M1_2")) %>% 
  separate(col=M2, sep="/", into=c("M2_1", "M2_2")) %>%
  mutate(m1 = ifelse(M1_1 == M1_2, 1, 0),
         m2 = ifelse(M2_1 == M2_2, 1, 0)) -> out.df
out.df
#>   IND M1_1 M1_2 M2_1 M2_2 m1 m2
#> 1   1    A    A    G    G  1  1
#> 2   2    T    T <NA> <NA>  1 NA
#> 3   3 <NA> <NA>    C    C NA  1
#> 4   4    T    C    A    T  0  0
#> 5   5    G    T    A    A  0  1

out.df %>% 
  select(!c(2,3,4,5))
#>   IND m1 m2
#> 1   1  1  1
#> 2   2  1 NA
#> 3   3 NA  1
#> 4   4  0  0
#> 5   5  0  1

Created on 2021-08-06 by the reprex package (v2.0.0)

I hope this is what you wanted since your description was unclear as @pieterjanvc rightly pointed out.

Dear PJ
Good Afternoon
Sorry my bad post and thanks for your valuable time, Now I edited it I hope It will be better now. I would like to upload my out file being a new member i do not have permission to do that.
Thanking you very much

Dear DavoWW
Good Afternoon
Thanks for your kind reply and for your valuable time to help me. Now i edited my post hope it will be better now. sorry for my bad post :slight_smile:

I don't believe you can omit the NA's on a per column basis and still end up with a rectangular data.frame necessarily. So i've had to keep them in.

df <- read.delim(sep=" ", text=c("
M1 M2
A/A G/G
T/T NA
NA C/C
A/A G/G
A/A A/A
"))

names(df)

library(tidyverse)

process_a_column <- function(x){
  expand.grid(x,x) %>% 
    mutate(check = as.integer(Var1 != Var2))  %>%
    pull(check) 
}

base_df <- map_dfc(names(df),~process_a_column(df[[.]])) %>% set_names(names(df))
(df_with_sum <- bind_rows(base_df,summarise_all(base_df,sum,na.rm=TRUE)) %>% as.data.frame)

Dear nirgrahamuk
Good Morning
Thank you very much for reply. No problem even if we keep NA in final data, i am much interested in sum row i.e. 26th Row in base_df output and your code working like i expected :slight_smile:
thanking you very much for your valuable time spent to help on my problem
Regards

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.