Transform data for a simple linear regression in RStudio

Hi, I have a set of numeric data with two variables: I'm comparing the biomass of fish (g/ha) and the biomass of invertebrates (g/ha).
The problem is that my data are not normal, so I can't fit a simple regression. Log-transforming the data doesn't work either. I've read about the Box-Cox transformation but can't really understand how it works or how to code it in RStudio.
Any help with Box-Cox, or any other idea that could be useful?
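A minimal sketch of how Box-Cox is usually coded in R, shown on R's built-in `trees` dataset as a stand-in (not the fish data). For a strictly positive response y, Box-Cox uses (y^lambda - 1)/lambda, or log(y) when lambda = 0. `boxcox()` from the MASS package (shipped with R) computes the profile log-likelihood of a linear model over a grid of lambda values, and the best lambda is the one with the highest log-likelihood. The helper `box_cox` is hypothetical, written here only to make the formula explicit.

```r
library(MASS)  # MASS ships with R and provides boxcox()

# Hypothetical helper: Box-Cox transform of a strictly positive vector y
#   (y^lambda - 1) / lambda   if lambda != 0
#   log(y)                    if lambda == 0
box_cox <- function(y, lambda) {
  stopifnot(all(y > 0))
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

# Stand-in data: R's built-in trees dataset (not the fish data)
# y = TRUE stores the response in the fit, so boxcox() does not refit it
fit <- lm(Volume ~ Girth, data = trees, y = TRUE)

# Profile log-likelihood over a grid of lambda values, without plotting
bc <- boxcox(fit, lambda = seq(-2, 2, by = 0.05), plotit = FALSE)
best_lambda <- bc$x[which.max(bc$y)]
best_lambda

# Refit on the transformed response and re-check residual normality
trees_bc <- transform(trees, Volume_bc = box_cox(Volume, best_lambda))
fit_bc <- lm(Volume_bc ~ Girth, data = trees_bc)
shapiro.test(residuals(fit_bc))
```

With `plotit = TRUE` (the default), `boxcox()` instead draws the log-likelihood curve with a 95% confidence interval for lambda, which is often more informative than the single best value.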

thanks so much!

I don't think it's possible to help directly, since we don't know the data, but I tend to be a fan of not transforming the data (other than scaling).
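To illustrate why scaling is harmless in this sense: `scale()` only subtracts the mean and divides by the standard deviation, which is a linear change, so the Pearson correlation and the p-value of the regression slope stay the same. This sketch uses the three site values the original poster shares further down the thread (decimal commas written as points).

```r
# Site values from the thread (decimal commas written as points)
fish   <- c(2114661.85, 57285.68, 39322.94)     # fish biomass, g/ha
invert <- c(6772.69516, 8365.58797, 4527.99555) # invertebrate biomass, g/ha

# scale() centres and divides by the standard deviation: a linear change
fish_z   <- as.numeric(scale(fish))
invert_z <- as.numeric(scale(invert))

# The Pearson correlation is unchanged by the linear rescaling
r_raw    <- cor(fish, invert)
r_scaled <- cor(fish_z, invert_z)

# So is the p-value of the slope (row 2, column 4 of the coefficient table)
p_raw    <- summary(lm(fish ~ invert))$coefficients[2, 4]
p_scaled <- summary(lm(fish_z ~ invert_z))$coefficients[2, 4]

c(r_raw = r_raw, r_scaled = r_scaled, p_raw = p_raw, p_scaled = p_scaled)
```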
Here are a couple of references on variable transformation issues, focusing on ecological research:

As a simple example of how misleading even simple data transformations can be, assume one sampling plot has half the biomass of another (20 vs 40):

20/40
[1] 0.5
log(20)/log(40)
[1] 0.8120982
sqrt(20)/sqrt(40)
[1] 0.7071068
exp(20)/exp(40)
[1] 2.061154e-09

Finding the right transformation may be harder than it looks. Of these examples, the only one that can easily be 'back-transformed' is sqrt(20)/sqrt(40), which goes back to 0.5 when squared.
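The back-transformation point can be checked directly in R. A minimal sketch: squaring the square-root ratio recovers 0.5, while exponentiating the ratio of logs does not. On the log scale, the comparison that maps back cleanly is a difference, log(20) - log(40), because that equals log(20/40).

```r
a <- 20  # biomass in one plot
b <- 40  # twice as much biomass in the other plot

# Square-root scale: squaring the transformed ratio gives back a / b
ratio_sqrt <- sqrt(a) / sqrt(b)
ratio_sqrt^2

# Log scale: a ratio of logs does not map back to a / b ...
ratio_log <- log(a) / log(b)
exp(ratio_log)

# ... but a difference of logs does, because log(a) - log(b) == log(a / b)
exp(log(a) - log(b))
```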

It's better to check the standards in your area of research (from what I know, fish biomass, like growth, is normally modelled as non-linear).



thanks, I'll check the links!
These are the data; they are very simple. Each value comes from a different sampling site on the same river.
Fish biomass (g/ha, weight):
site 1: 2114661,85
site 2: 57285,68
site 3: 39322,94
Invertebrate biomass (g/ha, weight):
site 1: 6772,69516
site 2: 8365,58797
site 3: 4527,99555
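These values use a comma as the decimal separator, which R does not accept in numbers: typing `2114661,85` at the prompt is a syntax error, inside `c()` it is silently read as two separate numbers, and `as.numeric("2114661,85")` returns `NA`. A minimal sketch of getting them into a data frame, assuming they arrive as text; the helper `to_num` is hypothetical. For a file, `read.csv2()` already assumes `;` between fields and `,` as the decimal mark.

```r
# Values exactly as posted, with decimal commas
fish_txt   <- c("2114661,85", "57285,68", "39322,94")
invert_txt <- c("6772,69516", "8365,58797", "4527,99555")

# Hypothetical helper: swap the decimal comma for a point, then convert
to_num <- function(x) as.numeric(sub(",", ".", x, fixed = TRUE))

biomass <- data.frame(
  site   = 1:3,
  fish   = to_num(fish_txt),    # fish biomass, g/ha
  invert = to_num(invert_txt)   # invertebrate biomass, g/ha
)

biomass
```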

I just want to find the correlation between these two variables and whether the regression is significant or not. The problem is that when I plot these data, the value 2114661,85 makes the plot non-linear, when it should be linear to fit a linear model. So I don't think any transformation will help with that, and I don't know what else to do.
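A minimal sketch of the two things asked for, the correlation and the significance of the regression, on the three sites above. Spearman's rank correlation is included as one option that does not assume normality and treats site 1 only as "the largest value". Keep in mind that with three sites the regression has a single residual degree of freedom: |r| must exceed roughly 0.997 for p < 0.05, and the smallest possible two-sided Spearman p-value with n = 3 is 1/3, so no transformation can make this test informative on its own.

```r
# Site values from the thread (decimal commas written as points)
biomass <- data.frame(
  fish   = c(2114661.85, 57285.68, 39322.94),     # g/ha
  invert = c(6772.69516, 8365.58797, 4527.99555)  # g/ha
)

# Pearson correlation test; its p-value equals that of the regression slope
pearson <- cor.test(biomass$fish, biomass$invert, method = "pearson")
pearson
fit <- lm(fish ~ invert, data = biomass)
summary(fit)

# Rank-based alternative: no normality assumption, and site 1 counts
# only as the largest value, not by how large it is
spearman <- cor.test(biomass$fish, biomass$invert, method = "spearman")
spearman

# Log-log regression, one option for biomass spanning orders of magnitude
# (still only one residual degree of freedom)
fit_log <- lm(log10(fish) ~ log10(invert), data = biomass)
summary(fit_log)
df.residual(fit_log)
```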
