Hello once again Posit Community.
I have some data that has unfortunately come to me as rather ugly character variables that I need to convert into a data frame. I think I have something of a solution, but I would like to see whether it's possible without resorting to a loop, and whether the presence of lists is going to be an issue.
First, the data looks something like this:
df <- data.frame(
Participant = c('Greg', 'Greg', 'Donna', 'Donna','Jonathan','Jonathan','Lewis','Lewis'),
Rating = c("[(4, 0.04), (5, 0.05)]", "[(1, 0.01), (2, 0.02), (3, 0.3), (4, 0.04), (5, 0.05)]", "[(4, 0.04), (5, 0.05)]", "[(2, 0.02), (3, 0.3), (4, 0.04)]", "[(1, 0.01), (4, 0.04), (5, 0.05)]", "[(3, 0.3), (4, 0.04)]", "[(3, 0.3), (4, 0.04), (5, 0.05)]", "[(4, 0.04), (5, 0.05)]")
)
Participant | Rating |
---|---|
Greg | [(4, 0.04), (5, 0.05)] |
Greg | [(1, 0.01), (2, 0.02), (3, 0.3), (4, 0.04), (5, 0.05)] |
Donna | [(4, 0.04), (5, 0.05)] |
Donna | [(2, 0.02), (3, 0.3), (4, 0.04)] |
Jonathan | [(1, 0.01), (4, 0.04), (5, 0.05)] |
Jonathan | [(3, 0.3), (4, 0.04)] |
Lewis | [(3, 0.3), (4, 0.04), (5, 0.05)] |
Lewis | [(4, 0.04), (5, 0.05)] |
And it needs to look more like the following:
Participant | Rating_01 | Rating_02 | Rating_03 | Rating_04 | Rating_05 |
---|---|---|---|---|---|
Greg | 0 | 0 | 0 | 0.04 | 0.05 |
Greg | 0.01 | 0.02 | 0.3 | 0.04 | 0.05 |
Donna | 0 | 0 | 0 | 0.04 | 0.05 |
Donna | 0 | 0.02 | 0.3 | 0.04 | 0 |
Jonathan | 0.01 | 0 | 0 | 0.04 | 0.05 |
Jonathan | 0 | 0 | 0.3 | 0.04 | 0 |
Lewis | 0 | 0 | 0.3 | 0.04 | 0.05 |
Lewis | 0 | 0 | 0 | 0.04 | 0.05 |
I have written a small function to split the data and assign names:
library(stringr)  # provides str_replace_all() and str_pad()
ratersplit <- function(x) {
  # Strip the brackets and parentheses, leaving e.g. "4, 0.04, 5, 0.05"
  cleaner <- str_replace_all(x, "[\\[\\]()]", "")
  # Split once; the pieces alternate between rating number and rating value
  parts <- unlist(strsplit(cleaner, ", "))
  nameout <- paste0("Rating_", str_pad(parts[c(TRUE, FALSE)], 2, pad = "0"))
  varout <- as.numeric(parts[c(FALSE, TRUE)])
  list(ratcols = nameout, ratvars = varout)
}
This returns the column names and values for whichever row it is given:
> ratersplit(df$Rating[1])
$ratcols
[1] "Rating_04" "Rating_05"
$ratvars
[1] 0.04 0.05
I have also appended the rating columns and filled them with 0s so that they can be populated more easily:
df[, setdiff(paste0("Rating_", str_pad(1:5, 2, pad = "0")), names(df))] <- 0
Participant | Rating | Rating_01 | Rating_02 | Rating_03 | Rating_04 | Rating_05 |
---|---|---|---|---|---|---|
Greg | [(4, 0.04), (5, 0.05)] | 0 | 0 | 0 | 0 | 0 |
Greg | [(1, 0.01), (2, 0.02), (3, 0.3), (4, 0.04), (5, 0.05)] | 0 | 0 | 0 | 0 | 0 |
Donna | [(4, 0.04), (5, 0.05)] | 0 | 0 | 0 | 0 | 0 |
Donna | [(2, 0.02), (3, 0.3), (4, 0.04)] | 0 | 0 | 0 | 0 | 0 |
Jonathan | [(1, 0.01), (4, 0.04), (5, 0.05)] | 0 | 0 | 0 | 0 | 0 |
Jonathan | [(3, 0.3), (4, 0.04)] | 0 | 0 | 0 | 0 | 0 |
Lewis | [(3, 0.3), (4, 0.04), (5, 0.05)] | 0 | 0 | 0 | 0 | 0 |
Lewis | [(4, 0.04), (5, 0.05)] | 0 | 0 | 0 | 0 | 0 |
But I can't think of a way to use rowwise() with something like mutate_at(c(x$ratcols) = x$ratvars) to fill in the corresponding values that won't either error or fill the columns with the lists instead of the raw values.
And I fear that using a loop will have the same problem.
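For reference, here is a rough base R sketch of the kind of loop-free fill I am after. It sidesteps rowwise() entirely by building a matrix of zeros and filling it through (row, column) matrix indexing. It assumes the rating slots are always the integers 1 to 5 and that each string contains nothing but number/value pairs; the names n_slots, wide, row_id, col_id and vals are only illustrative, and I have not checked how it behaves on malformed strings.

```r
# Sample data from above
df <- data.frame(
  Participant = c("Greg", "Greg", "Donna", "Donna",
                  "Jonathan", "Jonathan", "Lewis", "Lewis"),
  Rating = c("[(4, 0.04), (5, 0.05)]",
             "[(1, 0.01), (2, 0.02), (3, 0.3), (4, 0.04), (5, 0.05)]",
             "[(4, 0.04), (5, 0.05)]",
             "[(2, 0.02), (3, 0.3), (4, 0.04)]",
             "[(1, 0.01), (4, 0.04), (5, 0.05)]",
             "[(3, 0.3), (4, 0.04)]",
             "[(3, 0.3), (4, 0.04), (5, 0.05)]",
             "[(4, 0.04), (5, 0.05)]")
)

# Pull every number out of each string; they alternate slot, value, slot, value...
nums <- lapply(regmatches(df$Rating, gregexpr("[0-9.]+", df$Rating)), as.numeric)

# Matrix of zeros: one row per input row, one column per rating slot
n_slots <- 5  # assumption: slots are always 1..5
wide <- matrix(0, nrow = nrow(df), ncol = n_slots,
               dimnames = list(NULL, sprintf("Rating_%02d", seq_len(n_slots))))

# One (row, column) index pair per parsed tuple, plus the value to store there
row_id <- rep(seq_along(nums), lengths(nums) / 2)
col_id <- unlist(lapply(nums, function(v) v[c(TRUE, FALSE)]))
vals   <- unlist(lapply(nums, function(v) v[c(FALSE, TRUE)]))

# A two-column index matrix assigns all cells in one vectorised step
wide[cbind(row_id, col_id)] <- vals

result <- cbind(df["Participant"], as.data.frame(wide))
print(result)
```

Because the fill happens on a plain numeric matrix, the columns should come out as ordinary numeric vectors rather than list columns, but I would still appreciate a more idiomatic tidyverse version if there is one.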
Would happily welcome any suggestions you would be willing to offer.
Thank you in advance!