I am trying to attach `demographics` data (`full_name` and `personality`) to a `dataset` of texts written by individuals (reprex provided below). There are several issues:
- In `dataset`, whenever two people share the same first name in the same year, both names (in the `name` column) include a differentiating last-name initial (e.g., Adam A. and Adam B.).
- In `demographics`, however, when such duplicates exist, the first person does not get a surname initial (e.g., Adam and Adam B.).
- To standardize, I created an `initials` column that adds the surname initial to everyone, with a view to aiding the join later.
- The same person can also appear in different years, but if there is no first-name duplicate in that year, there will be no initial (e.g., Laura C. in year 3 vs. just Laura in year 6).
- Two different people can also share the same initials (Adam Apple vs. Adam Another: both are Adam in `name` and Adam A. in `initials`). The only way to tell them apart is by year.
This is what I am trying to achieve by joining `demographics` to `dataset`:

Note: there is a typo for `name` in the second-to-last row; it should be Claudia J. instead of just Claudia.
I thought about doing multiple left joins: first for the names without initials, using `by = c("name" = "name")`, and then for those who share a name in the same year (and thus have initials), using `by = c("name" = "initials")`. However, I got very funky results and a different number of rows.
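For illustration, here is a minimal sketch of how the two join keys could be combined into a single join instead of two passes. It assumes dplyr and tibble are installed, and that within a year no `name` or `initials` value refers to two different people. The objects `dataset_small`, `demographics_small`, `lookup` and `joined` are hypothetical names, and the rows are a subset of the reprex data chosen to cover each case. It has not been checked against anything beyond those rows.

```r
library(dplyr)
library(tibble)

# Subset of the reprex rows, enough to cover each matching case.
dataset_small <- tribble(
  ~year, ~name,        ~text,
  3L,    "Adam A.",    "fubar",
  3L,    "Laura C.",   "brown hairball",
  3L,    "Laura C.",   "black hairball",
  3L,    "John",       "quick brown fox",
  6L,    "Adam",       "different person same initials",
  6L,    "Claudia J.", "foo"
)

demographics_small <- tribble(
  ~year, ~full_name,     ~name,      ~initials,    ~personality,
  3L,    "John Green",   "John",     "John G.",    "INTP",
  3L,    "Adam Apple",   "Adam",     "Adam A.",    "INTJ",
  3L,    "Laura Caley",  "Laura C.", "Laura C.",   "ISFP",
  6L,    "Adam Another", "Adam",     "Adam A.",    "ESTP",
  6L,    "Claudia Jane", "Claudia",  "Claudia J.", "ISFP"
)

# Stack both candidate keys (name and initials) into one long lookup table.
# distinct() drops the rows where name and initials are identical.
lookup <- distinct(bind_rows(
  select(demographics_small, year, key = name, full_name, personality),
  select(demographics_small, year, key = initials, full_name, personality)
))

# Guard: a (year, key) pair that maps to two people would duplicate rows
# in the join, so stop early if the assumption does not hold.
stopifnot(anyDuplicated(lookup[c("year", "key")]) == 0)

# Including year in the join keeps Adam Apple (year 3) apart from
# Adam Another (year 6).
joined <- left_join(dataset_small, lookup, by = c("year", "name" = "key"))
```

Because each `(year, key)` pair is unique in `lookup`, the left join cannot add rows, so `nrow(joined)` equals `nrow(dataset_small)`. If the guard fails on the full data, the keys are ambiguous within a year and would need a further tie-breaker.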
Reprex data:
```r
dataset <- tibble::tribble(
  ~year, ~name, ~text,
  3L, "Adam A.", "fubar",
  3L, "Adam B.", "asdsdasd",
  3L, "Laura B.", "blah",
  3L, "Laura C.", "brown hairball",
  3L, "Laura C.", "black hairball",
  3L, "John", "quick brown fox",
  3L, "Zeke", "over lazy dog",
  6L, "Adam", "different person same initials",
  6L, "Jack", "birds are cool",
  6L, "Laura", "appear again",
  6L, "Claudia J.", "foo",
  6L, "Claudia M.", "bar"
)

# the initials column is one I created (first name plus surname initial)
demographics <- tibble::tribble(
  ~year, ~full_name, ~name, ~initials, ~personality,
  3L, "John Green", "John", "John G.", "INTP",
  3L, "Adam Apple", "Adam", "Adam A.", "INTJ",
  3L, "Adam Banana", "Adam B.", "Adam B.", "ESFJ",
  3L, "Laura Bosch", "Laura", "Laura B.", "ESFJ",
  3L, "Laura Caley", "Laura C.", "Laura C.", "ISFP",
  3L, "Zeke Wong", "Zeke", "Zeke W.", "ENFP",
  6L, "Adam Another", "Adam", "Adam A.", "ESTP",
  6L, "Jack Sparrow", "Jack", "Jack S.", "INTJ",
  6L, "Laura Caley", "Laura", "Laura C.", "ISFP",
  6L, "Abi-Maria", "Abi-Maria", "Abi-Maria", "ENFJ",
  6L, "Douglas Orange", "Douglas", "Douglas O.", "ISFJ",
  6L, "Claudia Jane", "Claudia", "Claudia J.", "ISFP",
  6L, "Claudia Miley", "Claudia M.", "Claudia M.", "INFP"
)
```
Any help is greatly appreciated!