Linear Regression on Gini Coefficient and Other Bounded Dependent Variables

Yarnabrina · July 23, 2019, 2:16pm

I'd like to refer to one more related thread here, where I suggested my approach (I hadn't searched before posting it, but it now seems similar to what the links I share below describe). That is where Richard pointed out that if sample values lie exactly on the boundaries, we run into a problem. Of course, he is right, although for a continuous response this happens with probability 0. Still, I understand that the model should be foolproof, and hence there must be a better way.
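
As a small illustration of the boundary issue, here is a minimal Python sketch. It assumes the transformation in question is the logit, log(y / (1 - y)); that is an assumption made for illustration only, and the `logit` helper is hypothetical rather than taken from either thread.

```python
import math


def logit(y):
    """Logit transform log(y / (1 - y)) for y in [0, 1].

    The two boundary values are handled explicitly because the
    transform diverges there: it tends to -inf at 0 and +inf at 1.
    """
    if y == 0.0:
        return -math.inf
    if y == 1.0:
        return math.inf
    # For y strictly between 0 and 1 the transform is finite.
    return math.log(y / (1.0 - y))


for value in (0.0, 0.25, 0.5, 1.0):
    print(value, logit(value))
```

Any response that is exactly 0 or 1 maps to an infinite value, so a least-squares fit on the transformed scale is not defined for those rows.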

Regarding linear regression, I think the main problem is that one can't guarantee that the predictions will stay within a given range, as Fernando pointed out. I haven't tried to reproduce it, but the fitted model is a line, right? So, unless it's parallel to the x-axis (which happens with probability 0), it will certainly cross both bounds on the y-axis for extreme enough values of x.
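
A quick sketch of this argument in Python, on synthetic data: fit a straight line to a response that lies in (0, 1), then solve for the x values at which the fitted line leaves that interval. The data, the seed, and the `fit_line` and `bound_crossings` helpers are all made up for illustration.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def bound_crossings(slope, intercept, lower=0.0, upper=1.0):
    """Return the x values where slope * x + intercept hits each bound."""
    if slope == 0:
        raise ValueError("a horizontal line never crosses the bounds")
    return (lower - intercept) / slope, (upper - intercept) / slope


# Synthetic response kept inside (0, 1) by clipping (an assumption).
rng = random.Random(0)
xs = [rng.uniform(0.0, 10.0) for _ in range(200)]
ys = [min(max(0.2 + 0.06 * x + rng.gauss(0.0, 0.05), 0.01), 0.99) for x in xs]

slope, intercept = fit_line(xs, ys)
x_at_lower, x_at_upper = bound_crossings(slope, intercept)

# The slope is positive for this data, so one unit to the left of
# x_at_lower the prediction is below 0, and one unit to the right of
# x_at_upper it is above 1.
below = slope * (x_at_lower - 1.0) + intercept
above = slope * (x_at_upper + 1.0) + intercept
print(x_at_lower, below)
print(x_at_upper, above)
```

The crossings may lie outside the observed range of x, in which case the problem only shows up when extrapolating, but they always exist when the slope is nonzero.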

[You may have covered this point in the last part of the thread, but I didn't really follow it. I didn't understand how the slope terminates somewhere. What do you mean by \ni? I couldn't tell which set contains which x. (Or is it \exists?)]

Also, it seems to me that the assumptions of linear regression may not be satisfied in this case. In particular, homoscedasticity probably does not hold. As @whuber suggested in the shared link below, the random components corresponding to Gini coefficients near the boundaries (both the 0 end and the 1 end) are unlikely to have the same extent of variation as those corresponding to Gini values in the middle region (close to 0.5).
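
To make the variance point concrete, here is a minimal simulation sketch in Python. It assumes, purely for illustration, that the bounded response follows a Beta distribution with a fixed precision; nothing here says Gini coefficients are actually Beta distributed, and the `simulated_variance` helper and its parameter values are hypothetical.

```python
import random
import statistics


def simulated_variance(mean, precision=20.0, n=20000, seed=0):
    """Sample variance of Beta draws with the given mean and precision.

    Uses the mean/precision parameterisation alpha = mean * precision,
    beta = (1 - mean) * precision, for which the theoretical variance
    is mean * (1 - mean) / (precision + 1).
    """
    rng = random.Random(seed)
    alpha = mean * precision
    beta = (1.0 - mean) * precision
    draws = [rng.betavariate(alpha, beta) for _ in range(n)]
    return statistics.variance(draws)


for mean in (0.05, 0.5, 0.95):
    theoretical = mean * (1.0 - mean) / (20.0 + 1.0)
    print(mean, simulated_variance(mean), theoretical)
```

Under this assumed model, the spread is largest for means around 0.5 and shrinks towards both ends, which is the pattern that conflicts with a constant error variance.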

Here are the two relevant links:

https://stats.idre.ucla.edu/stata/faq/how-does-one-do-regression-when-the-dependent-variable-is-a-proportion/