How to subset a data frame that the households that have member of a specific ethnic group?

There is a data frame I want to subset the households that have member of specific ethnic group.
There are five main variables that need to consider.

  1. Variables (Number, Number_2, Number_3) are represent the households and have to group them.
  2. variable Ethnic represent the ethnic group.
  3. ** variable PERSNUM** represent the family or household member. e.g 1 is for husband, 2 is for wife, 3 is for first child, 4 is for second child , 5 is for father, 6 is for mother and so on.

I want to subset the data that households have the 11 ethnic group member. It could have one member or two and more or all of member of ethnic 11.

here is the sample of data:

df <- data.frame(
        Village = c(rep("1", "30")),
        Number = c(33,  33, 33, 33, 33, 33, 33, 1,  1,  30, 30, 30, 30, 30, 30, 30,
                   31,  31, 31, 31, 36, 36, 36, 36, 62, 62, 62, 62, 69, 69),
        Number_1 = c(183,   183,    183,    183,    183,    183,    183,    151,    151,    255,    255,    255,    255,    255,    255,
                     255,   31, 31, 31, 31, 111,    111,    111, 111, 287, 287, 287,287, 219, 219),
        Number_3 = c(137,   137,    137,    137,    137,    137,    137,    113,    113,    191,    191,    191,    191,    191,    191,
                     191,   23, 23, 23, 23, 83, 83, 83, 83, 215, 215, 215, 215, 164, 164),
        PERSNUM = c(1,  2,  3,  4,  5,  6,  7,  1,  2,  3,  1,  2,  3,  4,  5,  6,
                    1,  2,  3,  1,  2,  3,  4, 5,  1, 2, 3, 4, 1, 2),
        Ethnic= c(33,  33, 33, 33, 33, 33, 33, 1,  1,  1,  1,  1,  1,  0,  11,
                       11,  11, 11, 11, 11, 0,  0,  11, 11, 11, 11, 11, 11, 11, 11)) 

I assumed that a household is defined by the values of the columns Village, Number, Number_1, and Number_3. Does this give you what you want?

df <- data.frame(
  Village = c(rep("1", "30")),
  Number = c(33,  33, 33, 33, 33, 33, 33, 1,  1,  30, 30, 30, 30, 30, 30, 30,
             31,  31, 31, 31, 36, 36, 36, 36, 62, 62, 62, 62, 69, 69),
  Number_1 = c(183,   183,    183,    183,    183,    183,    183,    151,    151,    255,    255,    255,    255,    255,    255,
               255,   31, 31, 31, 31, 111,    111,    111, 111, 287, 287, 287,287, 219, 219),
  Number_3 = c(137,   137,    137,    137,    137,    137,    137,    113,    113,    191,    191,    191,    191,    191,    191,
               191,   23, 23, 23, 23, 83, 83, 83, 83, 215, 215, 215, 215, 164, 164),
  PERSNUM = c(1,  2,  3,  4,  5,  6,  7,  1,  2,  3,  1,  2,  3,  4,  5,  6,
              1,  2,  3,  1,  2,  3,  4, 5,  1, 2, 3, 4, 1, 2),
  Ethnic= c(33,  33, 33, 33, 33, 33, 33, 1,  1,  1,  1,  1,  1,  0,  11,
            11,  11, 11, 11, 11, 0,  0,  11, 11, 11, 11, 11, 11, 11, 11)) 
library(dplyr)

Eth11 <- df |> filter(Ethnic == 11) |> select(-PERSNUM, -Ethnic) |> distinct()
Eth11
#>   Village Number Number_1 Number_3
#> 1       1     30      255      191
#> 2       1     31       31       23
#> 3       1     36      111       83
#> 4       1     62      287      215
#> 5       1     69      219      164
df_Filtered <- semi_join(df, Eth11, by = c("Village", "Number","Number_1", "Number_3"))
df_Filtered
#>    Village Number Number_1 Number_3 PERSNUM Ethnic
#> 1        1     30      255      191       3      1
#> 2        1     30      255      191       1      1
#> 3        1     30      255      191       2      1
#> 4        1     30      255      191       3      1
#> 5        1     30      255      191       4      0
#> 6        1     30      255      191       5     11
#> 7        1     30      255      191       6     11
#> 8        1     31       31       23       1     11
#> 9        1     31       31       23       2     11
#> 10       1     31       31       23       3     11
#> 11       1     31       31       23       1     11
#> 12       1     36      111       83       2      0
#> 13       1     36      111       83       3      0
#> 14       1     36      111       83       4     11
#> 15       1     36      111       83       5     11
#> 16       1     62      287      215       1     11
#> 17       1     62      287      215       2     11
#> 18       1     62      287      215       3     11
#> 19       1     62      287      215       4     11
#> 20       1     69      219      164       1     11
#> 21       1     69      219      164       2     11

Created on 2023-05-19 with reprex v2.0.2

1 Like

Thank you @FJCC It has worked.

@FJCC , how to mutate a new variable when a household members are one Ethnic?
if all household members are one ethnic should be yes otherwise no.
Thank you

I think this does what you want.

library(dplyr)
df <- df |> group_by(Village, Number, Number_1, Number_3) |> 
  mutate(SingleGrp = ifelse(length(unique(Ethnic)) == 1, "Yes","No"))

It looks correct but when I did run I got jut No. But there must be yes too. I don't know why.
Thank you,

Using the data originally posted, I get a mix of Yes and No.

df <- data.frame(
  Village = c(rep("1", "30")),
  Number = c(33,  33, 33, 33, 33, 33, 33, 1,  1,  30, 30, 30, 30, 30, 30, 30,
             31,  31, 31, 31, 36, 36, 36, 36, 62, 62, 62, 62, 69, 69),
  Number_1 = c(183,   183,    183,    183,    183,    183,    183,    151,    151,    255,    255,    255,    255,    255,    255,
               255,   31, 31, 31, 31, 111,    111,    111, 111, 287, 287, 287,287, 219, 219),
  Number_3 = c(137,   137,    137,    137,    137,    137,    137,    113,    113,    191,    191,    191,    191,    191,    191,
               191,   23, 23, 23, 23, 83, 83, 83, 83, 215, 215, 215, 215, 164, 164),
  PERSNUM = c(1,  2,  3,  4,  5,  6,  7,  1,  2,  3,  1,  2,  3,  4,  5,  6,
              1,  2,  3,  1,  2,  3,  4, 5,  1, 2, 3, 4, 1, 2),
  Ethnic= c(33,  33, 33, 33, 33, 33, 33, 1,  1,  1,  1,  1,  1,  0,  11,
            11,  11, 11, 11, 11, 0,  0,  11, 11, 11, 11, 11, 11, 11, 11)) 
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
df <- df |> group_by(Village, Number, Number_1, Number_3) |> 
  mutate(SingleGrp = ifelse(length(unique(Ethnic)) == 1, "Yes","No"))

table(df$SingleGrp)
#> 
#>  No Yes 
#>  11  19

Created on 2023-05-22 with reprex v2.0.2

@FJCC , Thank you, There was something in my computer when I restarted it has fixed.

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.