I want to run a linear model, so I visualized the outliers of the trade variable by creating a box plot, which shows that there are outliers in the variable: multiple outliers are visible as individual points above the upper whisker.
• The box plot shows a right-skewed (positively skewed) distribution for the trade variable, as evidenced by:
• The median line (thick horizontal line inside the box) is closer to the bottom of the box.
• The upper whisker is longer than the lower whisker.
• The presence of high outliers.
• Central Tendency: The median appears to be around 50 units on the trade scale.
• Spread: The interquartile range (represented by the height of the box) spans from approximately 35 to 70 units on the trade scale.
• Range: Excluding outliers, the data ranges from about 10 (lower whisker) to 100 (upper whisker) on the trade scale.
• Outlier Magnitude: Some outliers extend beyond 120 on the trade scale, which is significantly higher than the upper quartile.
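The values read off the plot above can also be computed directly rather than estimated by eye. This is only a minimal sketch: the data frame `dat` and its contents are simulated stand-ins (an assumption, since the real data are not shown), so the numbers will not match the ones listed above.

```r
# Simulated right-skewed data standing in for the real trade variable (assumption)
set.seed(1)
dat <- data.frame(trade = rlnorm(200, meanlog = 4, sdlog = 0.5))

# boxplot.stats() returns the five numbers that boxplot() draws:
# lower whisker end, lower hinge, median, upper hinge, upper whisker end
bs <- boxplot.stats(dat$trade)
bs$stats   # whisker ends, hinges and median
bs$out     # points drawn individually beyond the whiskers

# Quartiles and the upper fence of the default 1.5 * IQR rule
# (quantile() can differ slightly from the hinges that boxplot() uses)
q <- quantile(dat$trade, c(0.25, 0.75))
upper_fence <- q[[2]] + 1.5 * IQR(dat$trade)

boxplot(dat$trade, ylab = "trade")
```

With the real data, replacing `dat$trade` by the actual column should give the median, box limits, and whisker ends described in the bullets.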
Usually, you should not do anything to points that are outside the whiskers of a box plot. Unless you have a specific reason to exclude a data point, you should keep it. The whiskers do not mark the limits of possible data; they mark the likely extent of a small, normally distributed data set. If the data are strongly skewed, or if the data set is very large, points beyond the whiskers are to be expected.
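A quick simulation can illustrate why points beyond the whiskers are expected. This is a sketch with made-up data; the helper `frac_outside()` is a hypothetical name introduced here, and it relies on the default 1.5 × IQR whisker rule of `boxplot.stats()`. For a normal distribution that rule puts roughly 0.7% of values outside the whiskers, so a large sample will show many such points even though nothing is wrong with it.

```r
set.seed(42)

# Fraction of points falling outside the whiskers (default 1.5 * IQR rule)
frac_outside <- function(x) length(boxplot.stats(x)$out) / length(x)

small_normal <- rnorm(30)       # small, normally distributed sample
large_normal <- rnorm(100000)   # large, normally distributed sample
skewed       <- rlnorm(1000)    # right-skewed (log-normal) sample

frac_outside(small_normal)   # often zero or one point in a sample this small
frac_outside(large_normal)   # a small fraction, but hundreds of points in total
frac_outside(skewed)         # a much larger fraction, all on the upper side
```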
An adequate reason to exclude a data point is that it is physically impossible or very unlikely. For example, if the height of a person is recorded as a negative number or as 5 meters, it would be reasonable to exclude the value. Another example would be measuring a voltage, then checking the meter calibration and finding that it is not working properly.
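As a sketch of that kind of exclusion, the impossible heights can be marked as missing instead of being deleted. The values and the plausibility limits below are invented for illustration.

```r
# Hypothetical height measurements in metres, including two impossible values
height <- c(1.72, 1.65, -1.80, 5.00, 1.81, 1.58)

# Plausibility limits are an assumption chosen for this example
plausible <- height > 0.4 & height < 2.8

# Set implausible values to NA so the exclusion stays visible in the data
height_clean <- height
height_clean[!plausible] <- NA

which(!plausible)                  # positions of the excluded values
mean(height_clean, na.rm = TRUE)   # summary computed without them
```

Using `NA` rather than dropping the rows keeps a record of how many values were excluded and why.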
I don't see the issue with RStudio, but anyway ...
Note that in (Pearson) correlation analysis, which is not exactly the same thing as linear regression, both variables should follow a normal distribution, which does not seem to be the case here. I'm also not sure that we can talk about outliers in correlation analysis, as @FJCC said.
However, in linear regression analysis, for instance with lm(), the residuals of the fit should follow a normal distribution and, in that case, residuals that are very unlikely (let's say 1% or even less) could be flagged as outliers, provided the fitted model and the weighting of the observations are trustworthy.
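One way this could look in practice is sketched below. The data are simulated with one deliberately planted error, and the choice of standardized residuals from `rstandard()` with a two-sided 1% normal cutoff is an assumption for illustration, not the only reasonable rule.

```r
# Simulated data with a known linear relationship (assumption)
set.seed(123)
n <- 200
dat <- data.frame(x = runif(n, 0, 10))
dat$y <- 2 + 3 * dat$x + rnorm(n)
dat$y[5] <- dat$y[5] + 15   # plant one gross error in row 5

fit <- lm(y ~ x, data = dat)

# Visual check of the normality of the residuals
qqnorm(resid(fit))
qqline(resid(fit))

# Standardized residuals, compared with a two-sided 1% normal cutoff
r_std   <- rstandard(fit)
cutoff  <- qnorm(1 - 0.01 / 2)
flagged <- which(abs(r_std) > cutoff)

dat[flagged, ]   # rows to inspect, not to delete automatically
```

With 200 observations and a 1% cutoff, about two residuals would be flagged by chance alone even if the model were perfect, so flagged rows are candidates for a closer look and not proof of an error.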