Creating new data frame from 2 separate existing data frames based on a condition

Curry · February 27, 2023, 2:47pm

Hi,

I am fairly new to coding and am having a bit of trouble at the moment with this concept. I currently have 2 df with similar information, except one has a much greater data count. I need to try and create a new df3 which has the information from both df1 and df2. I need to filter the information based on df1 rows and df2 column names, and have them come out in rows. This is currently what I have done and am finding it difficult to continue.

df3 <- for (i in 1:ncol(df2)
if(df1$row == colnames(df2))

)

I'm unsure if this is even the right approach. Apologies if this is the wrong tag.

Many thanks

FJCC · February 27, 2023, 3:09pm

Sorry, I don't understand your goal. Can you write code to make three small data frames as examples of df1, df2, and df3? You can use the data.frame() function to make them and post that code between lines of three back ticks, like this
```
df1 <- data.frame(A = 1:3, ....)
df2 <- data.frame(....)
df3 <- data.frame)...)
```

Curry · February 27, 2023, 5:22pm

Hi,

Apologies for the poor explanation. Hopefully I can clarify a bit more now.

here is the format of df2 -

I tried to put df1 also but I cannot put 2 screenshots in as a new user. However, df1 has only two columns. Column 1 has 160 rows with the same information that the column names has for df1, and column two of df1 has mutation types.

df1 has roughly 160 entries of patients while df2 has roughly 60,000+ entries. I need to utilise df2 but it has too much information. I only need the same entries from df1. So basically, if df2 = df1 then extract df2 and put into df3. It's important that I keep all of df2 information on the extracted patients. I hope this explains it a little better.

Thank you

FJCC · February 27, 2023, 6:06pm

You lost me with the above sentences.
Please post the output of

dput(head(df1))

Curry · February 27, 2023, 6:15pm

As you can see df1 only has two columns. The patient data is also not in the same order as in df2

FJCC · February 27, 2023, 6:37pm

Here is a toy example of what I think you want to do. I keep the columns of df2 whose name appears in the Patient column of df1.

df1 <- data.frame(Patient = c("ABC","ERT","HJI"), TP53 = c("MUT","MUT","WT"))
df1
#>   Patient TP53
#> 1     ABC  MUT
#> 2     ERT  MUT
#> 3     HJI   WT

df2 <- data.frame(ID = c("ENSG1","ENSG2","ENSG3","ENSG4"),
                  XCV = 1:4, ERT = 2:5, UIN = 3:6, HJI = 4:7, 
                  TYU = 5:8, ABC = 6:9)
df2
#>      ID XCV ERT UIN HJI TYU ABC
#> 1 ENSG1   1   2   3   4   5   6
#> 2 ENSG2   2   3   4   5   6   7
#> 3 ENSG3   3   4   5   6   7   8
#> 4 ENSG4   4   5   6   7   8   9

df3 <- df2[, c("ID", df1$Patient)]
df3
#>      ID ABC ERT HJI
#> 1 ENSG1   6   2   4
#> 2 ENSG2   7   3   5
#> 3 ENSG3   8   4   6
#> 4 ENSG4   9   5   7

^{Created on 2023-02-27 with reprex v2.0.2}

Curry · February 27, 2023, 8:08pm

That worked wonderfully thank you very much. I was initially trying to do the same thing with an if and for statement. Trying to compare the two and if true then place it into another df. Your method works perfectly thank you.

As you can see I rearranged the column to be the rows and vice versa. However I am now trying to remove the V1,V2,V3.... and have the first row be the header row. I should be able to accomplish this on my own. Many thanks again for your help

Curry · February 27, 2023, 8:10pm

Wrong screenshot my apologies. I suppose now you can see the difference I meant in my previous reply.

FJCC · February 27, 2023, 8:22pm

You can use the pivot_ functions from tidyr to reshape your data.

df1 <- data.frame(Patient = c("ABC","ERT","HJI"), TP53 = c("MUT","MUT","WT"))
df1
#>   Patient TP53
#> 1     ABC  MUT
#> 2     ERT  MUT
#> 3     HJI   WT

df2 <- data.frame(ID = c("ENSG1","ENSG2","ENSG3","ENSG4"),
                  XCV = 1:4, ERT = 2:5, UIN = 3:6, HJI = 4:7, 
                  TYU = 5:8, ABC = 6:9)
df2
#>      ID XCV ERT UIN HJI TYU ABC
#> 1 ENSG1   1   2   3   4   5   6
#> 2 ENSG2   2   3   4   5   6   7
#> 3 ENSG3   3   4   5   6   7   8
#> 4 ENSG4   4   5   6   7   8   9

df3 <- df2[, c("ID", df1$Patient)]
df3
#>      ID ABC ERT HJI
#> 1 ENSG1   6   2   4
#> 2 ENSG2   7   3   5
#> 3 ENSG3   8   4   6
#> 4 ENSG4   9   5   7
library(tidyr)
df3 <- df3 |> pivot_longer(-ID, names_to = "Patient") |> 
  pivot_wider(names_from = "ID")
df3
#> # A tibble: 3 × 5
#>   Patient ENSG1 ENSG2 ENSG3 ENSG4
#>   <chr>   <int> <int> <int> <int>
#> 1 ABC         6     7     8     9
#> 2 ERT         2     3     4     5
#> 3 HJI         4     5     6     7

^{Created on 2023-02-27 with reprex v2.0.2}

Curry · March 4, 2023, 1:49pm

Your help is much appreciated! Thank you it worked perfectly.

system · April 15, 2023, 1:49pm

This topic was automatically closed 42 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.