merging datasets where observation appears more than once

Erica · May 29, 2020, 6:42pm

Hi

I want to merge one secondary df into my core df and preserve the format of my core df, creating a new column where the data for each observation is separated by a comma, and leaving blank cells in the new column for observations of the core df that don't appear in the secondary one

ie.

# core data set
obs_names <- c("county 1", "county 2", "county 3", "county 4")
col1 <- c("value 1", "value 2", "value 3", "value 4")
df1 <- data.frame(obs_names, col1)

#secondary data set 

obs_names <- c("area 1", "area 2", "area 3")
col1 <- c("county 1", "county 2", "county 1")
df2 <- data.frame(obs_names, col1)

how can I merge to create something like:
New Dataset

      names      col1           col2
      county 1  value 1       area 1, area 3
      county 2  value 2       area 2
      county 3  value 3   
      county 4  value 4

phil_hummel · May 29, 2020, 7:23pm

I would advise against having multiple values in one column, it makes it really difficult to do statistics, plotting, and just about everything else from here on out. Would an end result that is "tidy" meet your needs?

 names    col1   col2
1 country 1 value 1 area 1
2 country 1 value 1 area 3
3  county 2 value 2 area 2
4  county 3 value 3   <NA>
5  county 4 value 4   <NA>

If you have two existing data frames this can be produced with join in dplyr. If the data is in text, you can reorganize it to build a tidy data frame from the start.

system · June 19, 2020, 7:23pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.