How to subset a data frame that the households that have member of a specific ethnic group?

Shadan · May 20, 2023, 3:49am

There is a data frame I want to subset the households that have member of specific ethnic group.
There are five main variables that need to consider.

Variables (Number, Number_2, Number_3) are represent the households and have to group them.
variable Ethnic represent the ethnic group.
** variable PERSNUM** represent the family or household member. e.g 1 is for husband, 2 is for wife, 3 is for first child, 4 is for second child , 5 is for father, 6 is for mother and so on.

I want to subset the data that households have the 11 ethnic group member. It could have one member or two and more or all of member of ethnic 11.

here is the sample of data:

df <- data.frame(
        Village = c(rep("1", "30")),
        Number = c(33,  33, 33, 33, 33, 33, 33, 1,  1,  30, 30, 30, 30, 30, 30, 30,
                   31,  31, 31, 31, 36, 36, 36, 36, 62, 62, 62, 62, 69, 69),
        Number_1 = c(183,   183,    183,    183,    183,    183,    183,    151,    151,    255,    255,    255,    255,    255,    255,
                     255,   31, 31, 31, 31, 111,    111,    111, 111, 287, 287, 287,287, 219, 219),
        Number_3 = c(137,   137,    137,    137,    137,    137,    137,    113,    113,    191,    191,    191,    191,    191,    191,
                     191,   23, 23, 23, 23, 83, 83, 83, 83, 215, 215, 215, 215, 164, 164),
        PERSNUM = c(1,  2,  3,  4,  5,  6,  7,  1,  2,  3,  1,  2,  3,  4,  5,  6,
                    1,  2,  3,  1,  2,  3,  4, 5,  1, 2, 3, 4, 1, 2),
        Ethnic= c(33,  33, 33, 33, 33, 33, 33, 1,  1,  1,  1,  1,  1,  0,  11,
                       11,  11, 11, 11, 11, 0,  0,  11, 11, 11, 11, 11, 11, 11, 11))

FJCC · May 20, 2023, 4:11am

I assumed that a household is defined by the values of the columns Village, Number, Number_1, and Number_3. Does this give you what you want?

df <- data.frame(
  Village = c(rep("1", "30")),
  Number = c(33,  33, 33, 33, 33, 33, 33, 1,  1,  30, 30, 30, 30, 30, 30, 30,
             31,  31, 31, 31, 36, 36, 36, 36, 62, 62, 62, 62, 69, 69),
  Number_1 = c(183,   183,    183,    183,    183,    183,    183,    151,    151,    255,    255,    255,    255,    255,    255,
               255,   31, 31, 31, 31, 111,    111,    111, 111, 287, 287, 287,287, 219, 219),
  Number_3 = c(137,   137,    137,    137,    137,    137,    137,    113,    113,    191,    191,    191,    191,    191,    191,
               191,   23, 23, 23, 23, 83, 83, 83, 83, 215, 215, 215, 215, 164, 164),
  PERSNUM = c(1,  2,  3,  4,  5,  6,  7,  1,  2,  3,  1,  2,  3,  4,  5,  6,
              1,  2,  3,  1,  2,  3,  4, 5,  1, 2, 3, 4, 1, 2),
  Ethnic= c(33,  33, 33, 33, 33, 33, 33, 1,  1,  1,  1,  1,  1,  0,  11,
            11,  11, 11, 11, 11, 0,  0,  11, 11, 11, 11, 11, 11, 11, 11)) 
library(dplyr)

Eth11 <- df |> filter(Ethnic == 11) |> select(-PERSNUM, -Ethnic) |> distinct()
Eth11
#>   Village Number Number_1 Number_3
#> 1       1     30      255      191
#> 2       1     31       31       23
#> 3       1     36      111       83
#> 4       1     62      287      215
#> 5       1     69      219      164
df_Filtered <- semi_join(df, Eth11, by = c("Village", "Number","Number_1", "Number_3"))
df_Filtered
#>    Village Number Number_1 Number_3 PERSNUM Ethnic
#> 1        1     30      255      191       3      1
#> 2        1     30      255      191       1      1
#> 3        1     30      255      191       2      1
#> 4        1     30      255      191       3      1
#> 5        1     30      255      191       4      0
#> 6        1     30      255      191       5     11
#> 7        1     30      255      191       6     11
#> 8        1     31       31       23       1     11
#> 9        1     31       31       23       2     11
#> 10       1     31       31       23       3     11
#> 11       1     31       31       23       1     11
#> 12       1     36      111       83       2      0
#> 13       1     36      111       83       3      0
#> 14       1     36      111       83       4     11
#> 15       1     36      111       83       5     11
#> 16       1     62      287      215       1     11
#> 17       1     62      287      215       2     11
#> 18       1     62      287      215       3     11
#> 19       1     62      287      215       4     11
#> 20       1     69      219      164       1     11
#> 21       1     69      219      164       2     11

^{Created on 2023-05-19 with reprex v2.0.2}

Shadan · May 20, 2023, 9:26am

Thank you @FJCC It has worked.

Shadan · May 22, 2023, 9:05am

@FJCC , how to mutate a new variable when a household members are one Ethnic?
if all household members are one ethnic should be yes otherwise no.
Thank you

FJCC · May 22, 2023, 2:46pm

I think this does what you want.

library(dplyr)
df <- df |> group_by(Village, Number, Number_1, Number_3) |> 
  mutate(SingleGrp = ifelse(length(unique(Ethnic)) == 1, "Yes","No"))

Shadan · May 22, 2023, 4:30pm

It looks correct but when I did run I got jut No. But there must be yes too. I don't know why.
Thank you,

FJCC · May 22, 2023, 4:35pm

Using the data originally posted, I get a mix of Yes and No.

df <- data.frame(
  Village = c(rep("1", "30")),
  Number = c(33,  33, 33, 33, 33, 33, 33, 1,  1,  30, 30, 30, 30, 30, 30, 30,
             31,  31, 31, 31, 36, 36, 36, 36, 62, 62, 62, 62, 69, 69),
  Number_1 = c(183,   183,    183,    183,    183,    183,    183,    151,    151,    255,    255,    255,    255,    255,    255,
               255,   31, 31, 31, 31, 111,    111,    111, 111, 287, 287, 287,287, 219, 219),
  Number_3 = c(137,   137,    137,    137,    137,    137,    137,    113,    113,    191,    191,    191,    191,    191,    191,
               191,   23, 23, 23, 23, 83, 83, 83, 83, 215, 215, 215, 215, 164, 164),
  PERSNUM = c(1,  2,  3,  4,  5,  6,  7,  1,  2,  3,  1,  2,  3,  4,  5,  6,
              1,  2,  3,  1,  2,  3,  4, 5,  1, 2, 3, 4, 1, 2),
  Ethnic= c(33,  33, 33, 33, 33, 33, 33, 1,  1,  1,  1,  1,  1,  0,  11,
            11,  11, 11, 11, 11, 0,  0,  11, 11, 11, 11, 11, 11, 11, 11)) 
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
df <- df |> group_by(Village, Number, Number_1, Number_3) |> 
  mutate(SingleGrp = ifelse(length(unique(Ethnic)) == 1, "Yes","No"))

table(df$SingleGrp)
#> 
#>  No Yes 
#>  11  19

^{Created on 2023-05-22 with reprex v2.0.2}

Shadan · May 23, 2023, 4:30am

@FJCC , Thank you, There was something in my computer when I restarted it has fixed.

system · May 30, 2023, 4:31am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.