Hello,
I want to compare df_a against df_b and determine the overlap. The name identifies the product, while the value is the quantity you have. The values in each data frame sum to 75. The overlap needs to be weighted by value: even though both sets might contain e, the value might be 2 in one and 3 in the other, which means the overlap for e is not 100%. If both sets contain, say, f and the value is 4 in both cases, the overlap for f is complete.
Is there a quick way to measure this overlap, or a package that can help? Note that neither set contains all the elements, and the two sets are not the same length.
df_a <-
  data.frame(
    stringsAsFactors = FALSE,
    name = c("a","b","c","d","e",
             "f","g","h","i","j","k","l","m","n","o","p",
             "q","r","s","t","u","v","w","x","y","z","aa",
             "bb","cc"),
    value = c(1,4,4,4,1,1,1,
              1,1,2,4,1,3,2,3,3,4,2,2,4,4,
              3,4,4,1,3,3,2,3)
  )
df_b <-
  data.frame(
    stringsAsFactors = FALSE,
    name = c("e","f","g","h","i","j",
             "k","l","m","n","o","p","q","r","s","t","u","v",
             "w","x","y","z","aa","bb","cc","dd","ee"),
    value = c(1,4,4,4,1,4,1,4,
              1,2,3,1,3,2,3,4,1,2,2,3,4,3,
              4,4,4,3,3)
  )
sum(df_a$value)
#> [1] 75
sum(df_b$value)
#> [1] 75
Created on 2021-04-24 by the reprex package (v0.3.0)
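To make the question concrete, here is a minimal base R sketch of one possible definition of weighted overlap. It is an assumption on my part, not necessarily the right measure: for each name present in both sets, the overlap is taken to be the smaller of the two values. The helper name weighted_overlap and the toy data frames toy_a and toy_b are hypothetical and only serve as illustration.

```r
# Hypothetical helper: weighted overlap between two data frames that each
# have a `name` column and a numeric `value` column.
# Assumption: the overlap for a shared name is the smaller of its two values.
weighted_overlap <- function(a, b) {
  # Inner join: keeps only the names present in both data frames.
  shared <- merge(a, b, by = "name", suffixes = c("_a", "_b"))
  # Per-name overlap is the quantity both sides have in common.
  shared$overlap <- pmin(shared$value_a, shared$value_b)
  total <- sum(shared$overlap)
  list(
    per_name   = shared,
    total      = total,
    share_of_a = total / sum(a$value),  # fraction of a's total that overlaps
    share_of_b = total / sum(b$value)   # fraction of b's total that overlaps
  )
}

# Toy data mirroring the example in the text: e is 2 vs 3, f is 4 in both.
toy_a <- data.frame(name = c("d", "e", "f"), value = c(4, 2, 4),
                    stringsAsFactors = FALSE)
toy_b <- data.frame(name = c("e", "f", "g"), value = c(3, 4, 1),
                    stringsAsFactors = FALSE)

res <- weighted_overlap(toy_a, toy_b)
res$total       # 2 (from e) + 4 (from f) = 6
res$share_of_a  # 6 / 10 = 0.6
res$share_of_b  # 6 / 8  = 0.75
```

The same call, weighted_overlap(df_a, df_b), should work on the data above. Since both data frames sum to 75, the two shares would coincide there.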