The reason I used abs() is that some values in my variables are negative:
> summary(MFGCOST)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
-3900.00    13.72    33.29    65.78    78.05 53138.51
> summary(QtySold)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
-40.000   1.000   1.000   2.806   3.000 499.000
> summary(MarginDollars)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
-2222.00     6.43    16.95    28.77    37.62 24316.27
I am using a log transformation because some values in my variables are very large; taking the log scales them down and helps me see the correlation more clearly.
I am getting this error message:
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : NA/NaN/Inf in 'x'
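The exact model code isn't shown in the thread, so the following is only a minimal sketch that assumes the variables were wrapped in log(abs(...)). It uses made-up toy values (not the poster's data) to show that abs() removes the sign but leaves zeros at zero, so log() turns them into -Inf and lm() stops with the same error.

```r
# Toy values invented for illustration; not the poster's actual data.
cost <- c(-3900, 0, 13.72, 33.29, 53138.51)
qty  <- c(40, 1, 2, 3, 499)

# abs() makes the negative value positive, but abs(0) is still 0,
# and log(0) is -Inf, so the second element below is -Inf.
log_cost <- log(abs(cost))
print(log_cost)

# lm() refuses non-finite predictor values, so this fails with
# "NA/NaN/Inf in 'x'"; tryCatch() captures the message instead of stopping.
err <- tryCatch(
  lm(log(qty) ~ log_cost),
  error = function(e) conditionMessage(e)
)
print(err)
```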
There are likely NA, NaN, or Inf values in the data you are passing to lm(). Checking with all() only tells you whether every data point is NA. You should use any() instead, which tells you whether there are any NA values in the dataset. You can also use which() to find their locations.
Here is a toy example showing the difference:
dummy <- c(1, 3, 45, 3, 5, NA_real_)
# only returns TRUE if all elements are NA
all(is.na(dummy))
#> [1] FALSE
# returns TRUE if any elements are NA
any(is.na(dummy))
#> [1] TRUE
# gives you the index of which elements are NA
which(is.na(dummy))
#> [1] 6
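One caveat on the toy example above: is.na() flags NA and NaN but not Inf or -Inf, and with the default na.action, lm() drops rows containing NA before fitting, so an infinite value is often the actual culprit behind this error. A small sketch with a made-up vector, using the base R functions is.finite() and is.infinite() (not part of the original answer):

```r
# Made-up vector containing NA, NaN and -Inf (from log(0)).
dummy2 <- c(1, NA_real_, NaN, log(0), 5)

is.na(dummy2)        # TRUE for NA and NaN, FALSE for -Inf
is.infinite(dummy2)  # TRUE only for the -Inf element

# !is.finite() catches all three kinds of bad value at once.
bad <- which(!is.finite(dummy2))
print(bad)
#> [1] 2 3 4
```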
Yes, in R, log(0) returns -Inf. This StackOverflow discussion might help:
Basically, if your data have meaningful zeroes, then a log transformation is not appropriate because the natural logarithm is only defined for x > 0. If the zeroes are really just missing data, then they need to be encoded and dealt with as missing data. There are other transformations (such as square root) that might be more appropriate for data like yours.
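A short illustration of the point about zeros, using made-up values: log() sends 0 to -Inf, while sqrt() maps it to 0. If the zeros really stand for missing data, one option is to recode them to NA before transforming; whether that is appropriate depends on what the zeros mean in your data.

```r
x <- c(0, 1, 100)  # made-up example values

log(x)   # -Inf for the zero
sqrt(x)  # 0, 1, 10: defined at zero

# Only if zeros really mean "missing": recode them as NA first,
# so log() yields NA (dropped by lm()'s default na.action) instead of -Inf.
x_missing <- x
x_missing[x_missing == 0] <- NA
log_x <- log(x_missing)
print(log_x)
```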
@jcblum
So much to learn about Data Science every day. I love it!
Also, I do have some negative values in my variables (customer returns, etc.), so I don't think square root will work either.
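A quick check of that concern, with made-up values similar in range to the QtySold summary above: sqrt() is undefined for negative numbers in real arithmetic, and R returns NaN along with a warning.

```r
x <- c(-40, 0, 4, 499)  # made-up values, including a negative

# sqrt() of a negative number gives NaN plus an "NaNs produced" warning;
# suppressWarnings() just keeps this example's output quiet.
r <- suppressWarnings(sqrt(x))
print(r)

# Count how many values would be lost to NaN.
sum(is.nan(r))
```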
Where can I find reading material on the square root transformation?
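The function log1p() computes log(1 + x) for a numeric vector x, so log1p(0) is equivalent to log(1), which is 0. Note that it uses the natural logarithm (base e), not base 10. It is defined for any x > -1, so it works well for non-negative x, but it returns -Inf at x = -1 and NaN (with a warning) for values below -1.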
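A quick sketch of how base R's log1p() behaves, using made-up values: it matches log(1 + x), uses the natural logarithm, and is defined only for x > -1. If you specifically want a base-10 version, log10(1 + x) is one option (an alternative, not something log1p() does).

```r
x <- c(0, 1, 9, 99)  # made-up non-negative values

y_e <- log1p(x)                # natural log of 1 + x; log1p(0) is 0
all.equal(y_e, log(1 + x))     # TRUE: same values as log(1 + x)
log1p(exp(1) - 1)              # approximately 1, confirming base e

y_10 <- log10(1 + x)           # base-10 alternative; log10(100) is 2

# Values below -1 are outside the domain: NaN (warning suppressed here).
# log1p(-1) is -Inf, which would again trip lm().
neg <- suppressWarnings(log1p(c(-40, -1)))
print(neg)
```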