How to subset a data frame to the rows whose character column values also appear in another data frame?

DC7 · January 5, 2021, 7:22am

Hi, I have two data frames. Reproducible examples:
Data frame 1:


data.frame(
  stringsAsFactors = FALSE,
          Bacteria = c("Acidaminococcus_intestini",
                       "Akkermansia_muciniphila","Alistipes_indistinctus",
                       "Alistipes_inops","Alistipes_putredinis","Alistipes_shahii"),
          estimate = c(-0.366771565396125,
                       0.553631680677047,0.508522403813429,0.374210897216381,
                       0.469470874555095,0.542879858475023),
                se = c(0.172956113013377,
                       0.140510650137267,0.139978598302408,0.139596632663061,
                       0.139801942041036,0.141394831899225),
           country = c("metaanalysis","metaanalysis",
                       "metaanalysis","metaanalysis","metaanalysis",
                       "metaanalysis")
)

Data frame 2:

data.frame(
  stringsAsFactors = FALSE,
          Bacteria = c("Acidaminococcus_intestini",
                       "Actinomyces_odontolyticus","Actinomyces_sp_HPA0247",
                       "Actinomyces_sp_ICM47","Adlercreutzia_equolifaciens",
                       "Agathobaculum_butyriciproducens"),
          estimate = c(-0.19433268440639,
                       0.117836468006733,-0.256617985612771,-0.431658784453513,
                       -0.0479365516909252,-0.139033057208901),
                se = c(0.0832977360987506,
                       0.0830681230310939,0.0835678091594368,0.0847262371945899,
                       0.0829567045675912,0.0831204764715223),
           country = c("erp002469","erp002469",
                       "erp002469","erp002469","erp002469","erp002469")
)

The two data frames differ except for a few Bacteria names that appear in both. I want a subset of the second data frame that contains only the Bacteria also present in the first data frame.
In this example, that means the output should be only the first row of the second data frame:

                   Bacteria   estimate         se   country 
1 Acidaminococcus_intestini -0.1943327 0.08329774 erp002469

Can anyone please help me?

thanks,
DC

technocrat · January 5, 2021, 7:49am

suppressPackageStartupMessages({
  library(dplyr)
})
df1 <- data.frame(
  stringsAsFactors = FALSE,
  Bacteria = c("Acidaminococcus_intestini",
               "Akkermansia_muciniphila","Alistipes_indistinctus",
               "Alistipes_inops","Alistipes_putredinis","Alistipes_shahii"),
  estimate = c(-0.366771565396125,
               0.553631680677047,0.508522403813429,0.374210897216381,
               0.469470874555095,0.542879858475023),
  se = c(0.172956113013377,
         0.140510650137267,0.139978598302408,0.139596632663061,
         0.139801942041036,0.141394831899225),
  country = c("metaanalysis","metaanalysis",
              "metaanalysis","metaanalysis","metaanalysis",
              "metaanalysis")
)



df2 <- data.frame(
  stringsAsFactors = FALSE,
  Bacteria = c("Acidaminococcus_intestini",
               "Actinomyces_odontolyticus","Actinomyces_sp_HPA0247",
               "Actinomyces_sp_ICM47","Adlercreutzia_equolifaciens",
               "Agathobaculum_butyriciproducens"),
  estimate = c(-0.19433268440639,
               0.117836468006733,-0.256617985612771,-0.431658784453513,
               -0.0479365516909252,-0.139033057208901),
  se = c(0.0832977360987506,
         0.0830681230310939,0.0835678091594368,0.0847262371945899,
         0.0829567045675912,0.0831204764715223),
  country = c("erp002469","erp002469",
              "erp002469","erp002469","erp002469","erp002469")
)

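# semi_join(x, y) returns the rows of x that have a match in y,
# so df2 goes first to get the subset of the second data frame
semi_join(df2, df1, by = "Bacteria")
#>                    Bacteria   estimate         se   country
#> 1 Acidaminococcus_intestini -0.1943327 0.08329774 erp002469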

Created on 2021-01-04 by the reprex package (v0.3.0.9001)
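
For readers who would rather avoid the dplyr dependency, the same filtering can be sketched in base R with `%in%`. This is a minimal sketch rather than the method used above; the two small data frames are trimmed, hypothetical stand-ins for the ones in the question, and `common` is a name introduced only for illustration. Note that with `semi_join(x, y)` the rows returned always come from `x`, so the data frame you want to subset goes first; the base R version makes that explicit, because the frame being indexed is the one whose rows are kept.

```r
# Trimmed stand-ins for the data frames in the question (illustrative only).
df1 <- data.frame(
  Bacteria = c("Acidaminococcus_intestini", "Akkermansia_muciniphila"),
  country  = c("metaanalysis", "metaanalysis"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  Bacteria = c("Acidaminococcus_intestini", "Actinomyces_odontolyticus"),
  country  = c("erp002469", "erp002469"),
  stringsAsFactors = FALSE
)

# Keep the rows of df2 whose Bacteria value also occurs in df1.
# drop = FALSE keeps the result a data frame even if one column remains.
common <- df2[df2$Bacteria %in% df1$Bacteria, , drop = FALSE]
common
```

Like `semi_join()`, this keeps only the columns of `df2` and never duplicates rows, even when a Bacteria name appears more than once in `df1`.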

DC7 · January 5, 2021, 8:57am

Thanks a lot @technocrat. It works.
