Missing data - Margin plot interpretation

neisse · November 2, 2018, 11:14pm

I have some doubts about how to interpret margin plots for missing data.

The VIM package provides the marginplot function, which outputs the following graph:

[Image: MarginPlot]

Basically, for each variable it displays the distribution of its values in the cases where the other variable is missing (red), as well as the distribution of its values in the cases where the other variable is observed (blue).
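To make the red/blue split concrete, here is a minimal sketch in plain Python (not VIM itself) that builds the two groups summarized in the margin for one variable. The helper name `split_by_missingness` and the toy data are made up for illustration, and missing values are assumed to be coded as `None`.

```python
from statistics import median

def split_by_missingness(values, other):
    """Split `values` by whether the paired entry in `other` is missing (None).

    Returns (red, blue): `red` holds observed values whose partner is missing,
    `blue` holds observed values whose partner is also observed.
    """
    red, blue = [], []
    for v, o in zip(values, other):
        if v is None:
            continue  # the value itself is missing, so there is nothing to place
        (red if o is None else blue).append(v)
    return red, blue

# Hypothetical toy data, not taken from the plot above.
var1 = [1.0, 2.0, 3.0, 4.0, 5.0, None, 7.0]
var2 = [10.0, None, 30.0, None, 50.0, 60.0, None]

red, blue = split_by_missingness(var1, var2)
print("Var1 where Var2 is missing (red):", red, "median", median(red))
print("Var1 where Var2 is observed (blue):", blue, "median", median(blue))
```

Comparing the two groups (medians, quartiles, range) is roughly what the paired red and blue box plots in the margin let you do by eye.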

My current decision rule is the following:

If the distribution of the missing data matches the distribution of the non-missing data, I can use one variable to impute the other.
However, if the distributions differ (Var2 in the image), I could not use the other variable (Var1) to impute values of the first (Var2); doing so would mean extrapolating predictions.
If values missing in Var2 are also missing in Var1 (as in the image), I can't use Var2 to impute Var1, because the data could be MNAR (missing not at random), in which case I would be over- or underestimating the imputed values.
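The last two rules can also be checked numerically rather than only by eye. This is a rough sketch in plain Python, assuming missing values are coded as `None`. The function names and toy data are hypothetical, and a simple range check is only a crude stand-in for comparing the full distributions.

```python
def extrapolation_rows(predictor, target):
    """Return predictor values whose imputation would require extrapolation.

    A model target ~ predictor can only be fitted on rows where both are
    observed. Rows where the target is missing are imputed from their
    predictor value; if that value lies outside the fitted range, the
    imputation is an extrapolation.
    """
    fitted = [p for p, t in zip(predictor, target)
              if p is not None and t is not None]
    if not fitted:
        return []  # no complete rows, so there is no fitted range to compare with
    lo, hi = min(fitted), max(fitted)
    return [p for p, t in zip(predictor, target)
            if t is None and p is not None and not (lo <= p <= hi)]

def jointly_missing(a, b):
    """Count rows where both variables are missing."""
    return sum(1 for x, y in zip(a, b) if x is None and y is None)

# Hypothetical toy data.
var1 = [1.0, 2.0, 3.0, None, None, None]
var2 = [10.0, 20.0, 30.0, 25.0, 90.0, None]

outside = extrapolation_rows(predictor=var2, target=var1)
both = jointly_missing(var1, var2)
print("Var2 values outside the fitted range:", outside)
print("Rows missing in both variables:", both)
```

Rows counted by `jointly_missing` carry no information in either variable, so neither one can be used to impute the other for those rows, whatever the missingness mechanism is.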

I'm concerned about whether these decision rules are correct.

englianhu · November 3, 2018, 4:04am

You might try referring to mine (the study is written in Chinese, but the code is in English and the output is numeric). I use the imputeTS, tidyr and Amelia packages.