Hi everyone,
I am a beginner in R and I have to do a project, please if someone knows about how to combine two data frames with different number of rows and columns.
Thank you, I will appreciate very much.
Regards,
Milagros
Hi everyone,
I am a beginner in R and I have to do a project, please if someone knows about how to combine two data frames with different number of rows and columns.
Thank you, I will appreciate very much.
Regards,
Milagros
If they have a common "key" you can use a "join" function
https://dplyr.tidyverse.org/reference/join.html
If you need more specific help, please provide a proper REPRoducible EXample (reprex) illustrating your issue.
glimpse(df1)
glimpse(df1)
Observations: 10,170
Variables: 5
Patient <int> 1369, 1410, 1156, 663, 1198, 740, 574, 787, 623, 1116, 956, 568, 522, 14… Weight 94.89, 64.56, 119.04, 61.56, 50.42, 59.68, 51.01, 75.96, 101.26, 94.55, …
Height <dbl> 1.59, 1.67, 1.49, 1.86, 1.57, 1.86, 1.84, 1.62, 1.84, 1.46, 1.74, 1.55, … IMC 37.53, 23.15, 53.62, 17.79, 20.46, 17.25, 15.07, 28.94, 29.91, 44.36, 16…
$ Date 2013-01-01, 2013-01-01, 2013-01-01, 2013-01-01, 2013-01-01, 2013-01-01,…
glimpse(df2)
Observations: 10,167
Variables: 6
Patient <int> 1369, 1410, 1156, 663, 1198, 740, 574, 787, 623, 1116, 956, 568,… Systolic 113, 91, 91, 114, 100, 96, 111, 132, 141, 107, 84, 94, 116, 121,…
Diastolic <int> 93, 87, 58, 73, 60, 59, 87, 56, 82, 75, 55, 76, 63, 51, 63, 74, … AvBloodPressure 93, 110, 92, 121, 99, 92, 109, 133, 83, 91, 90, 121, 109, 131, 8…
HeartRate <int> 109, 99, 93, 62, 61, 106, 53, 101, 78, 93, 56, 110, 109, 100, 55… Date 2013-01-01, 2013-01-01, 2013-01-01, 2013-01-01, 2013-01-01, 201…
Hi,
Dataframe df1 has 10,170 and df2 has 10,167 with different variables. The only common variables are Patient and Date.
Please, how do I combine these two data frames?
Please read the link I gave you before and try to provide data on a copy/paste friendly format (also if this is real data, please make sure it is anonymized)
#Abrir archivos
BloodPressure <- read.csv(file.choose())
View(BloodPressure)
Glucose <- read.csv(file.choose())
View(Glucose)
Oximetry <- read.csv(file.choose())
View(Oximetry)
Weight_Height <- read.csv(file.choose())
View(Weight_Height)
unique(BloodPressure)
View(BloodPressure)
unique(Weight_Height)
View(Weight_Height)
#Join
dfnewq<-right_join(df1, df2, by = NULL, copy = FALSE, suffix = c("Patient","Date"))
View(dfnewq)
dfnewq<-merge(df1, df2, by = c("Patient","Date"))
View(dfnewq)
na.omit
df11<-na.omit(df1)
df22<-na.omit(df2)
#Boxplot
boxplot(datosWeight$Weight,
main = "Weight",
boxwex = 0.5,col="blue")
#cbind
df<-cbind.data.frame(df1,df2)
View(df)
#Glucose con formato de fecha y ordenado
Glucose$Date<-as.Date(as.character(Glucose$Date), format="%m/%d/%Y")
View(Glucose)
df3<-arrange(Glucose,Patient,Date)
View(df3)
#Oximetry con formato de fecha y ordenado
Oximetry$Date<-as.Date(as.character(Oximetry$Date), format="%m/%d/%Y")
View(Oximetry)
df4<-arrange(Oximetry,Patient,Date)
View(df4)
#Esta integracion no esta bien
df5 = merge(df1, df2, by.x=c("Patient", "Date"))
View(df5)
df5<-unique(df5)
View(df5)
newdf <- df2 %>% right_join(df1, by=c("Patient","Date"))
View(newdf)
m3 <-merge(df1, df2, by.x = "Patient", by.y = "Patient", all.x = FALSE)
View(m3)
m3 <-full_join(df1, df2, by= "Patient")
View(m3)
total <- merge(df1,df2,by=c("Patient","Date"))
View(total)
total <- left_join(df2,df1,by=c("Patient","Date"),all.x=TRUE)
View(total)
df1 is attached below.
Please let me know how to join these both data frames.
All I can see is a screenshot of a dataframe, I can't copy data from that, please read the guide and use a copy/paste friendly format.
Here is the Spanish version in case you prefer
Thank you very much, I am sending data:
datapasta::df_paste(head(df1, 10)[, c("Patient","Date","Weight","Height")])data.frame(
Patient = c(1L,
1L,1L,1L,1L,2L,
2L,2L,2L,2L),
Date = c("2014-02-24",
"2014-11-13","2015-04-30",
"2014-08-29",
"2015-04-30","2015-01-09",
"2015-01-01",
"2015-01-13","2015-01-05",
"2013-03-13"),
Weight = c(47.6,
48.1,49.2,49.7,
76,46.06,46.07,
46.53,46.85,110.53),
Height = c(1.57,
1.57,1.52,1.55,
1.54,1.58,1.58,
1.58,1.58,1.82)
)
datapasta::df_paste(head(df2, 10)[, c("Patient","Date","Systolic","Diastolic","HeartRate")])data.frame(
Patient = c(1L,1L,
1L,1L,2L,2L,2L,
2L,2L,2L),
Date = c("2014-02-24","2014-11-13",
"2015-04-30",
"2014-08-29","2015-01-13",
"2013-03-05",
"2015-01-05",
"2015-01-01","2013-03-01",
"2015-01-09"),
Systolic = c(93L,96L,
98L,108L,80L,
86L,109L,112L,118L,
137L),
Diastolic = c(69L,73L,
74L,76L,50L,86L,
55L,50L,61L,91L),
HeartRate = c(95L,
100L,87L,81L,84L,
106L,80L,107L,59L,
67L)
)
Is this what you are trying to do?
library(dplyr)
# Sample data on a copy/paste friendly format
df1 <- data.frame(
Patient = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L),
Weight = c(47.6, 48.1, 49.2, 49.7, 76, 46.06, 46.07, 46.53, 46.85, 110.53),
Height = c(1.57, 1.57, 1.52, 1.55, 1.54, 1.58, 1.58, 1.58, 1.58, 1.82),
Date = as.factor(c("2014-02-24",
"2014-11-13","2015-04-30","2014-08-29","2015-04-30",
"2015-01-09","2015-01-01","2015-01-13","2015-01-05",
"2013-03-13"))
)
df2 <- data.frame(
Patient = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L),
Systolic = c(93L, 96L, 98L, 108L, 80L, 86L, 109L, 112L, 118L, 137L),
Diastolic = c(69L, 73L, 74L, 76L, 50L, 86L, 55L, 50L, 61L, 91L),
HeartRate = c(95L, 100L, 87L, 81L, 84L, 106L, 80L, 107L, 59L, 67L),
Date = as.factor(c("2014-02-24",
"2014-11-13","2015-04-30","2014-08-29","2015-01-13",
"2013-03-05","2015-01-05","2015-01-01","2013-03-01",
"2015-01-09"))
)
df1 %>%
full_join(df2, by = c("Patient", "Date")) %>%
select(Patient, Date, everything())
#> Patient Date Weight Height Systolic Diastolic HeartRate
#> 1 1 2014-02-24 47.60 1.57 93 69 95
#> 2 1 2014-11-13 48.10 1.57 96 73 100
#> 3 1 2015-04-30 49.20 1.52 98 74 87
#> 4 1 2014-08-29 49.70 1.55 108 76 81
#> 5 1 2015-04-30 76.00 1.54 98 74 87
#> 6 2 2015-01-09 46.06 1.58 137 91 67
#> 7 2 2015-01-01 46.07 1.58 112 50 107
#> 8 2 2015-01-13 46.53 1.58 80 50 84
#> 9 2 2015-01-05 46.85 1.58 109 55 80
#> 10 2 2013-03-13 110.53 1.82 NA NA NA
#> 11 2 2013-03-05 NA NA 86 86 106
#> 12 2 2013-03-01 NA NA 118 61 59
Created on 2020-02-01 by the reprex package (v0.3.0)
Hi andresrcs, thank you for your help.
Yes, I've tried with this function but If I apply to the entire data in df1 (10,170 obs.) and df2 (10,167 obs.), I will get in total 25,000 obs.
If I join df1 and df2 I should get in total something about 10,170 obs.
That would not be the case if you have different combinations of Patient and Date in both datasets (like in your sample data) because of the use of full_join()
, this function keeps all combinations and fills missing data with NA
s thus resulting in more rows. If you only want to keep combinations from one particular data frame then you should use left_join()
or right_join()
or if you only want to keep matching combinations among both data frames use inner_join()
(this will result in fewer rows).
Thank you very much Andres
This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.