Log-transforming a moderately positively skewed independent variable in lme

Jolien · September 9, 2021, 1:29pm

Hi there all!

I am running into trouble. I fitted an association model, but now that I look more closely at one of the independent variables (meandailylegactivitystepsperday), I can definitely see a (moderate?) positive skew. I am trying to transform it using log() in the model, the same way I did for a positively skewed dependent variable (ketosis).

I've read online that independent variables do not need to be normally distributed for a linear model. However, another study carried out by my teacher, using partly the same dataset and the same meandailylegactivitystepsperday variable, also noticed the positively skewed data and log-transformed it in the models.

So I feel that I'll have to do this too, but I can't seem to get it to work (using double () or just single () gave different errors).

Could someone help me out? Thanks in advance!

startz · September 9, 2021, 1:35pm

You are correct that there is no statistical reason to transform the independent variable. Of course, either linear or log might be a better functional form. For that matter, if you have enough data you can try including both.

Jolien · September 9, 2021, 1:38pm

Hi there startz! Thanks for taking the time for your quick reply!!

By including both, I assume you mean two models, one with log('steps....') and one with just 'steps'? So I would run the same models twice?

And can you see anything wrong with my log() transformation? I honestly don't see it.

Thanks in advance!

startz · September 9, 2021, 1:41pm

You can run two separate regressions, but decide in advance how you will judge which one is better. (If you do this, be sure that the two regressions run on identical observations.) But I was actually suggesting including both the linear and the log term as independent variables in one regression and seeing whether it's obvious which one "works."
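To make the "include both terms in one regression" idea concrete, here is a minimal sketch in plain Python. It is only a stand-in for the fixed-effects part of the model: it is ordinary least squares, not nlme::lme, so the random factors (farm, parity) are left out. The data are synthetic, built so that the true relationship is logarithmic, and the names `ols`, `steps` and `outcome` are hypothetical. In R the fixed-effects formula would be something like log(ketosis) ~ steps + log(steps).

```python
import math

def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    a = [xtx[i] + [xty[i]] for i in range(p)]  # augmented matrix [X'X | X'y]
    # Gaussian elimination with partial pivoting.
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, p):
            f = a[r][col] / a[col][col]
            for c in range(col, p + 1):
                a[r][c] -= f * a[col][c]
    # Back-substitution.
    b = [0.0] * p
    for i in reversed(range(p)):
        b[i] = (a[i][p] - sum(a[i][j] * b[j] for j in range(i + 1, p))) / a[i][i]
    return b

# Synthetic data: steps in thousands per day, outcome built as 2 + 0.5*log(steps).
steps = [1.5, 2.0, 3.2, 4.1, 5.5, 6.3, 8.0, 9.7, 12.0, 15.0]
outcome = [2.0 + 0.5 * math.log(s) for s in steps]

# One regression with an intercept, the linear term and the log term side by side.
X = [[1.0, s, math.log(s)] for s in steps]
intercept, b_linear, b_log = ols(X, outcome)
print(f"intercept={intercept:.3f}  steps={b_linear:.3f}  log(steps)={b_log:.3f}")
```

On these noise-free data the linear coefficient comes out essentially zero and the log coefficient recovers 0.5. With real, noisy data, steps and log(steps) are strongly correlated, so both coefficients can be imprecise. You would judge which term "works" from their standard errors and t-values (for example in summary() of the R fit) rather than from the point estimates alone.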

As a guess, does steps sometimes equal zero? That would be a problem for taking logs, of course, since log(0) is undefined (in R it evaluates to -Inf).
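The thread itself uses R. Below is a minimal Python sketch of checking for zeros before taking logs and keeping one filtered set of rows, so that the linear and log versions of the model later run on identical observations. The record layout and the names `records`, `safe_log` and `cow_id` are hypothetical, and the values are made up.

```python
import math

# Hypothetical rows of (cow_id, steps, log_ketosis); names and values are made up.
records = [
    ("c1", 5200.0, -0.22),
    ("c2", 0.0, 0.10),
    ("c3", 7400.0, -0.51),
    ("c4", 3100.0, 0.34),
    ("c5", 0.0, -0.11),
]

def safe_log(x):
    """Return log(x), or None when x <= 0, where the logarithm is undefined."""
    return math.log(x) if x > 0 else None

# Python's math.log(0) raises ValueError; R's log(0) returns -Inf.
# Either way, a row with steps == 0 cannot take part in a log(steps) model.
dropped = [r for r in records if r[1] <= 0]
kept = [r for r in records if r[1] > 0]
print(f"dropping {len(dropped)} of {len(records)} rows with steps <= 0")

# Both the linear and the log model should then be fitted on `kept`,
# so that they are compared on identical observations.
steps_linear = [r[1] for r in kept]
steps_log = [safe_log(r[1]) for r in kept]
```

Filtering in code rather than editing the CSV by hand leaves the raw file untouched and records exactly which rows were removed.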

Jolien · September 9, 2021, 1:59pm

Whoaaa, you're amazing! Yes, 'steps' contains 5 zeros! It obviously shouldn't, but alright! Thanks so much, I am gonna try deleting the rows with 0 and then see what happens!

Exciteeeed, this might work!

That's a lot of exclamation marks, but hopefully you'll notice my gratitude.

As for the other part, I don't know how to put the same independent variable into the model twice and then see which one works. However, I was thinking of making two models for log(ketosis) and steps: one being log(ketosis) ~ steps (-3, -2 and -1) and the other being log(ketosis) ~ log(steps) (-3, -2 and -1), and then using AIC to check which one is better?

Does that make as much sense as I think it does, or is this the misinterpretation of someone who was trained to figure out what is wrong with animals rather than with data?

Then, if I can use the AIC scores, can I just do one comparison (one model with log(steps) and one with untransformed steps) and say: hey, the AIC for (for example) the untransformed-steps model is lower than for the model with log(steps), so based on this I can use untransformed steps in all the models? (I have several, with different random factors and different outcome variables.) (If it wasn't obvious by now, English is not my native language and I am struggling to explain myself properly.)

Thanks!

startz · September 9, 2021, 2:14pm

Using the AIC is perfectly reasonable. But remember that if you eliminate observations with steps==0 from the log equation, you have to eliminate them from the linear version as well before making the comparison.
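Here is a minimal sketch of that comparison in plain Python. The synthetic data are hypothetical and include two zero-step rows, and the model is single-predictor least squares rather than nlme::lme. The zeros are dropped once, and both the linear and the log model are then scored on the same remaining rows. AIC is computed as -2 logLik + 2k with the Gaussian maximum-likelihood variance RSS/n and k = 3 (intercept, slope, residual variance). That is the usual form for a least-squares fit. One assumption to check in the real analysis: with nlme::lme, models that differ in their fixed effects are normally compared after fitting with method = "ML", because the default REML likelihoods are not comparable across different fixed-effect structures.

```python
import math

def gaussian_aic(x, y):
    """AIC of y = a + b*x fitted by least squares, using the ML variance RSS/n.

    k = 3 estimated parameters: intercept, slope and residual variance.
    """
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = ybar - b * xbar
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    log_lik = -0.5 * n * (math.log(2 * math.pi) + math.log(rss / n) + 1)
    return -2 * log_lik + 2 * 3

# Synthetic rows: steps in thousands (two zeros) and a hypothetical log-ketosis outcome.
steps = [0.0, 1.5, 2.0, 3.2, 4.1, 0.0, 5.5, 6.3, 8.0, 9.7, 12.0, 15.0]
noise = [0.0, 0.05, -0.03, 0.02, -0.04, 0.0, 0.01, 0.03, -0.02, -0.01, 0.04, -0.05]
y = [1.0 + 0.7 * math.log(s) + e if s > 0 else 0.3 + e for s, e in zip(steps, noise)]

# Drop steps == 0 once, then use the SAME rows for both models.
rows = [(s, yi) for s, yi in zip(steps, y) if s > 0]
x_lin = [s for s, _ in rows]
x_log = [math.log(s) for s, _ in rows]
y_kept = [yi for _, yi in rows]

aic_linear = gaussian_aic(x_lin, y_kept)
aic_log = gaussian_aic(x_log, y_kept)
print(f"n={len(y_kept)}  AIC linear={aic_linear:.2f}  AIC log={aic_log:.2f}")
```

Because these data were generated from log(steps), the log model gets the lower AIC here. The point of the sketch is the bookkeeping (one filter, identical n for both fits), not that result.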

(Your English is pretty darn good!)

Jolien · September 9, 2021, 2:26pm

Thanks again startz! You are making my day!

I just eliminated the 0 values in the original CSV dataset in Excel! The safest way for me, hehe. I re-uploaded the dataset and am running all the 'steps' models again, so neither the log nor the linear version has any zeros anymore.

I guess the biggest question left for me is: do I need to rerun every model containing 'steps' separately with log(steps), or would a single comparison suffice? Or maybe two, since I am making association models for both behaviour - calcium (with or without the random factors farm, parity, and farm & parity) and behaviour - ketosis (same random-factor configurations).

I've attached screenshots below to explain the issue better (visualization is key, isn't it?). I can imagine there is a far more efficient way to run the different models, as this was just repeated work every time (and then the same again for calcium instead of ketosis, and again for eating, ruminating, lying and walking instead of steps... go figure the time, haha). But I am already very happy that all of these actually run without errors and that I understand them.
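One way to avoid copying the same model by hand for every outcome and behaviour is to loop over them. The minimal Python sketch below loops over outcomes, behaviours and predictor forms (linear vs log) and stores one AIC per combination. The data dictionary, the variable names and the values are hypothetical. The fit is single-predictor least squares, so the random factors (farm, parity) that the real lme models include are not represented. Treat it as an illustration of the looping pattern only.

```python
import math

def gaussian_aic(x, y):
    """AIC of y = a + b*x by least squares (ML variance RSS/n, k = 3 parameters)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = ybar - b * xbar
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return n * (math.log(2 * math.pi) + math.log(rss / n) + 1) + 2 * 3

# Hypothetical data set: one value per cow, zero-step rows already removed.
data = {
    "steps": [5.2, 7.4, 3.1, 6.0, 9.8, 4.4, 8.1],
    "lying": [11.2, 12.5, 10.1, 13.0, 12.2, 9.8, 11.7],
    "ketosis": [-0.22, -0.51, 0.34, -0.30, -0.70, 0.12, -0.45],
    "calcium": [2.31, 2.45, 2.10, 2.38, 2.52, 2.20, 2.40],
}
behaviours = ["steps", "lying"]
outcomes = ["ketosis", "calcium"]
transforms = {"linear": lambda v: v, "log": math.log}

# Fit every outcome x behaviour x transform combination on the same rows.
results = {}
for outcome in outcomes:
    for behaviour in behaviours:
        for name, f in transforms.items():
            x = [f(v) for v in data[behaviour]]
            results[(outcome, behaviour, name)] = gaussian_aic(x, data[outcome])

for key, aic in sorted(results.items()):
    print(key, round(aic, 2))
```

The same pattern applies in R: build each model specification inside a loop and store the fits, instead of writing each one out by hand.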

I hope this question is clear!

Thanks again!

P.S.: The log transformation works now that I have eliminated the zeros!

startz · September 9, 2021, 2:50pm

There isn't a clear statistical answer. Since you know about animals, you might use your judgement as to whether the effects of the independent variables are likely to be similar for different outcomes!

Jolien · September 9, 2021, 3:44pm

That will do! Thanks a lot startz, you've been of great help.

I am going to think long and hard, and then I think I will judge that, as there is no statistical reason to transform the independent variable, I will keep it this way ;).

Really, thanks again for all the tips!

Have a nice day!
