weighted mean with NA in weight vector

hendrixl114 · September 14, 2021, 5:43pm

Hello,
I need to obtain weighted means for a set of subjects with multiple measures over time. I am running a MAIC (Matching-Adjusted Indirect Comparison) and am weighting my individual patient data with aggregate data from another study. That isn't really important, though; the main thing is that I have missing data at each time point, and the weighting won't work if there are NAs.

The MAIC package creates a vector with weights for each subject. All have baseline data, so I've used this 'full' dataset to create the weights, so that all subjects have a weight (wtr_stsbltx). This works fine. I have 679 subjects and the weight vector length is 679.

The problem comes when I try to apply the vector of weights (n=679) to a set of subjects with missing data for a subsequent time point. I get NAs for the means and SDs:

wk2ststx.adj = data.frame(noeds1tx %>%
  mutate(wtr_stsbltx) %>%
  summarise(
    N2 = sum(wtr_stsbltx),
    STS_week2.mean = weighted.mean(STS_week2, wtr_stsbltx),
    STS_week2.sd = sqrt(sum(wtr_stsbltx / sum(wtr_stsbltx) * (STS_week2 - STS_week2.mean)^2)),
    WEEK_2_CFB_STS.mean = weighted.mean(WEEK_2_CFB_STS, wtr_stsbltx),
    WEEK_2_CFB_STS.se = sqrt((blststx.adj$BASELINE_STS.sd^2 / blststx.adj$NBL) + (STS_week2.sd^2 / sum(wtr_stsbltx)))
  ))

My solution has been to get separate weights for each time point so that I don't have to deal with missing data. However, this now seems wrong to me, as I get different weights for the same subjects at different time points.

What I need to be able to do is locate the weights for the subjects with missing data and remove them. Or, if there is another way of thinking about this, I'd be happy to hear it.

Many thanks in advance!

HanOostdijk · September 14, 2021, 7:07pm

Please provide a small example of your case with e.g. 3 patients and 4 time-points that illustrates your case.
See reprex for how this could be done.

And probably you are aware that functions like sum and mean have an argument that can exclude NA values.
But I don't know if that would help in your case. Therefore my request for a reprex.
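
As a minimal illustration of the argument mentioned above (the vector x is made up for this sketch and is not taken from the thread), na.rm = TRUE tells sum() and mean() to drop missing values before computing:

```r
# Hypothetical values; the third measurement is missing
x <- c(3, 5, NA, 4)

total_with_na <- sum(x)                # NA, because the missing value propagates
total_obs     <- sum(x, na.rm = TRUE)  # 12
mean_obs      <- mean(x, na.rm = TRUE) # 4, the mean of the three observed values
```

Whether this is enough depends on the calculation: it only affects the vector passed to that particular call.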

startz · September 14, 2021, 7:12pm

weighted.mean() does have an na.rm = TRUE argument as an option.
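
A small sketch of what that option does, using made-up scores and weights (not the poster's data): with na.rm = TRUE, weighted.mean() drops the missing values of x together with their weights. Missing values in the weights themselves are not removed and still give NA.

```r
# Hypothetical scores and weights for three subjects; subject 2 has no score
score  <- c(7, NA, 4)
weight <- c(0.5, 1.0, 1.5)

wm_na  <- weighted.mean(score, weight)                # NA
wm_obs <- weighted.mean(score, weight, na.rm = TRUE)  # 4.75
# wm_obs equals (7 * 0.5 + 4 * 1.5) / (0.5 + 1.5)

# An NA in the weights is not handled by na.rm and still returns NA
wm_bad_w <- weighted.mean(c(7, 4), c(0.5, NA), na.rm = TRUE)
```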

hendrixl114 · September 15, 2021, 6:39pm

I haven't been able to get it to work for all of the calculations I need done. It works for simple weighted means but I haven't been able to figure out the syntax for more complex calculations. Can you tell me the syntax for where to put the na.rm for each of the following (not weighted.mean) calculations please?

wk2ststx.adj = data.frame(wk2ststx %>%
  mutate(wtr_ststxwk2) %>%
  summarise(
    N2 = sum(wtr_ststxwk2),
    STS_week2.mean = weighted.mean(STS_week2, wtr_ststxwk2, na.rm = TRUE),
    STS_week2.sd = sqrt(sum(wtr_ststxwk2 / sum(wtr_ststxwk2) * (STS_week2 - STS_week2.mean)^2)),
    WEEK_2_CFB_STS.mean = weighted.mean(WEEK_2_CFB_STS, wtr_ststxwk2, na.rm = TRUE),
    WEEK_2_CFB_STS.se = sqrt((blststx.adj$BASELINE_STS.sd^2 / blststx.adj$NBL) + (STS_week2.sd^2 / sum(wtr_ststxwk2))),
    WEEK_2_CFB_STS.sd = sqrt(sum(wtr_ststxwk2 / sum(wtr_ststxwk2) * (WEEK_2_CFB_STS - WEEK_2_CFB_STS.mean)^2))
  ))

startz · September 15, 2021, 7:11pm

Not really sure what you want to do, but sum() also has the na.rm option. sqrt() and ^ propagate NAs.
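
One possible way to apply this to a weighted SD of the form used in the question, sketched with made-up vectors x and w (hypothetical names, not the poster's columns). The point to watch is that the weights contain no NAs, so na.rm = TRUE on sum(w) changes nothing; the denominator has to be restricted to subjects whose value is observed.

```r
# Hypothetical data: subject 2 is missing the week-2 value
x <- c(7, NA, 4)
w <- c(0.5, 1.0, 1.5)

m <- weighted.mean(x, w, na.rm = TRUE)   # 4.75

# Only the weights of subjects with an observed value belong in the denominator
w_obs <- sum(w[!is.na(x)])               # 2, not sum(w) = 3

# na.rm = TRUE drops the NA term contributed by the missing subject
wsd <- sqrt(sum(w * (x - m)^2, na.rm = TRUE) / w_obs)   # about 1.299
```

Dividing by sum(w) instead would include the weight of the subject with no value and make the SD too small.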

hendrixl114 · September 15, 2021, 7:24pm

Here is a bit of data:
'CFB'=change from baseline.
I want to obtain the weighted mean and SD for each time point, then calculate the weighted mean change from baseline and its SE at each time point. When I run the code now, if any subject has an NA at a time point, an NA is returned instead of the mean/SE. I'd like to either remove the weights at the positions corresponding to the NAs, or make the functions ignore the NAs. Or, if someone has an entirely different way of approaching this problem, I would love to hear it!

tribble(
~id, ~BASELINE, ~DAY_1_CFB, ~WEEK_2_CFB, ~WEEK_4_CFB, ~WEEK_8_CFB, ~WEEK_12_CFB,
2 , 3 , 5 , 7 , 12 , NA, NA,
44 , 5 , -1 , 4 , 0 , 1 , 0 ,
429 , 3 , 4 , NA, 5 , 4 , NA)

weights=as.vector(c(0.7736762, 0.6889595, 0.8251115, 0.7411247, 0.8452947, 0.8750179, 1.1460260,
1.0778067, 0.8804010, 0.8923645, 0.9545200, 0.7158591, 0.8908700, 0.8867936,
0.9149441, 1.7827734, 0.9617388, 0.9031469, 0.9069772, 0.9239883, 0.9024339))

Does this help?

startz · September 15, 2021, 8:03pm

I'm not sure this will help, but

> z <- c(1,2,NA)
> z
[1]  1  2 NA
> z[!is.na(z)]
[1] 1 2

This removes NAs, but one would want to take care that the same positions are removed from all your variables.
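
A sketch of that precaution, assuming one value and one weight per subject in the same order (x, w and keep are illustrative names): build the logical index once and use it to subset both vectors, so they cannot drift out of alignment.

```r
# Hypothetical values and weights, one element per subject
x <- c(7, NA, 4)
w <- c(0.5, 1.0, 1.5)

keep  <- !is.na(x)   # TRUE where the subject has an observed value
x_obs <- x[keep]
w_obs <- w[keep]

same_length <- length(x_obs) == length(w_obs)  # TRUE, the vectors stay aligned
wm    <- weighted.mean(x_obs, w_obs)           # 4.75
n_eff <- sum(w_obs)                            # 2, the sum of weights at this time point
```

In a dplyr pipeline, the analogous step would be filter(!is.na(STS_week2)) before summarise(), provided the weights are stored as a column of the same data frame.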

hendrixl114 · September 16, 2021, 3:57pm

That is helpful! I think that might be exactly what I need, I will give it a try! Thank you!
