How to create new cols based on other and replicate the rows

learning1 · June 8, 2023, 2:58pm

Hello everybody,
I have the following situation. I'am working on time series database and there are data about wastewater plant, days, and gene measure(df_1)

I would like to create a new column (city) based on other (plant). The particularity is that each plant is reflecting the behaviour of one, two, three or more cities (df_2). I have tried to create the column based on another (mutate(city=(case_when ...) but I don't know how to automatically replicate the rows so that plants that join one, two or three cities can be completed

My data set initial:
df1

The dataset that I would like to create with the column "corresponding city":
df2

I will try to simplify my initial and final database (df_1 and df_2) with the following example.
Thank yoy for your help

#barcelona with barcelona
#madrid with madrid, toledo
#valencia with valencia, alboraya, altea

df_1<-data.frame(plant= c("barcelona", "madrid", "valencia",
                    "barcelona", "madrid", "valencia",
                    "barcelona", "madrid", "valencia"),
           day= c(1,1,1,
                  2,2,2,
                  3,3,3),
           measure= c(10, 15, 18,
                      12, 14, 13,
                      21, 25, 10))
df_1

#Dataset that I want to create.
df_2<-data.frame(plant= c("barcelona", "madrid", "madrid", "valencia","valencia","valencia",
                    "barcelona", "madrid", "madrid", "valencia","valencia","valencia",
                    "barcelona", "madrid", "madrid", "valencia","valencia","valencia"),
           day= c(1,1,1,1,1,1,
                  2,2,2,2,2,2,
                  3,3,3,3,3,3),
           
           measure= c(10, 15, 15, 18, 18, 18,
                      12, 14, 14, 13, 13, 13,
                      21, 25, 25, 10, 10, 10),
           
           corresponding_city= c("barcelona", "madrid", "toledo", "valencia", "alboraya","altea",
                                 "barcelona", "madrid", "toledo", "valencia", "alboraya","altea",
                                 "barcelona", "madrid", "toledo", "valencia", "alboraya","altea"))
df_2

nirgrahamuk · June 8, 2023, 3:36pm

#barcelona with barcelona
#madrid with madrid, toledo
#valencia with valencia, alboraya, altea

df_1<-data.frame(plant= c("barcelona", "madrid", "valencia",
                          "barcelona", "madrid", "valencia",
                          "barcelona", "madrid", "valencia"),
                 day= c(1,1,1,
                        2,2,2,
                        3,3,3),
                 measure= c(10, 15, 18,
                            12, 14, 13,
                            21, 25, 10))
df_1


(lkup <- data.frame(
  plant = c("barcelona",rep("madrid",2),rep("valencia",3)),
  corresponding_city = c("barcelona","madrid","toledo","valencia","alboraya","altea")))

# base r
(br_1 <- merge(df_1,lkup))
(br_2 <- br_1[order(br_1$day,br_1$plant),])

#dplyr 
(require(dplyr))
(dp_1 <- inner_join(df_1,lkup, relationship="many-to-many"))
(dp_2 <- arrange(dp_1,day,plant))

learning1 · June 12, 2023, 9:07am

Thank you for your help

system · June 19, 2023, 9:08am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.