How to insert a monotonic curve in geom_smooth instead of a linear model or loess model

PaoloEmilio99 · March 12, 2024, 1:15pm

Hi, my name is Paolo. I'm doing some Spearman's correlation analysis for a project with a big dataset (240k obs.). To justify using Spearman's instead of Pearson's, I have to produce plots showing that the two variables don't have a linear relationship before I calculate the correlation coefficients. I'm using ggplot2.

Here are my lines, where s2 is the dataset and x and y are two quantitative variables (expenditures in euro).

s2 %>%
       ggplot(aes(x = `Farm Net Value Added`,
                  y = `NPK+Difesa`))+
       geom_point()+
       geom_smooth(method = "lm", se = F,aes(color = "blue"))+
       geom_smooth(se = F, aes(color = "red"))+
       scale_color_identity(name = "legend",
                            labels = c("linear", "i need it monotonic"),
                            guide = "legend")+
       labs(x = "Variable 1 (€)",
            y = "Variable 2 (€)",
            title = "plot Variable 1,2")+
            theme_bw()

This is my output, where the blue line is the linear model smooth and the red curve is the default smooth that I'm trying to replace with a monotonic function:

Question: is there a function in ggplot2 for plotting a monotonic decreasing curve, or an argument for doing that with geom_smooth? I tried method = "loess", but the output is similar to the red curve (non-monotonic).

Thanks in advance to everyone who responds to my post.

AlexisW · March 12, 2024, 4:03pm

Seeing your data, I would start with plotting it on a log scale to see if the relationship becomes more obvious.

For your question, to fit a monotonic increasing curve I think you need to specify a model. The easiest are probably something like Variable2 ~ log(Variable1) or Variable2 ~ sqrt(Variable1).
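A minimal sketch of that suggestion, assuming simulated data with strictly positive x (the data frame d and its columns x and y are hypothetical placeholders, not the original dataset). Passing a formula to geom_smooth lets the lm be fitted on log(x) while the axis stays on the original scale:

```r
library(ggplot2)

# Simulated stand-in data; x must be > 0 for log(x) to be defined
set.seed(1)
d <- data.frame(x = runif(500, 1, 1000))
d$y <- 50 * log(d$x) + rnorm(500, sd = 20)

p <- ggplot(d, aes(x = x, y = y)) +
  geom_point(alpha = 0.3) +
  # straight-line fit, for comparison
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, colour = "blue") +
  # linear model in log(x): a monotonic curve in x by construction
  geom_smooth(method = "lm", formula = y ~ log(x), se = FALSE, colour = "red")
p
```

Swapping log(x) for sqrt(x) in the formula gives the second model mentioned above.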

nirgrahamuk · March 12, 2024, 4:44pm

I took a stab at it:

# dplyr for the data wrangling, ggplot2 for the plot
library(dplyr)
library(ggplot2)

i2 <- filter(iris,
             Species=="versicolor") |> 
  select(starts_with("Sepal")) |> distinct() |> 
  rename(x=Sepal.Length,
         y=Sepal.Width) |> arrange(x,y)

(iso_model <- isoreg(x=i2$x,    y=i2$y))

i2$iso_pred <- iso_model$yf

iso_smoothed <- smooth.spline(x=i2$x,
                          y = iso_model$yf,spar=.5)
i2$iso_smoothed_pred <- predict(iso_smoothed,x=i2$x)$y

i2 |>
  ggplot(aes(x = x,
             y = y))+
  geom_point()+
  geom_smooth(method = "lm", se = F,aes(color = "blue"))+
  geom_smooth(se = F, aes(color = "red"),linetype="dashed")+
  geom_line(aes(y=iso_pred,color="green"))+
  geom_line(aes(y=iso_smoothed_pred,color="purple"),linewidth=1)+
  scale_color_identity(name = "legend",
                       labels = c("linear", "iso","smoothed iso (monotonic)","i need it monotonic"),
                       guide = "legend")+
  labs(x = "Variable 1 (€)",
       y = "Variable 2 (€)",
       title = "plot Variable 1,2")+
  theme_bw() + theme(legend.position = "bottom")

PaoloEmilio99 · March 12, 2024, 4:55pm

Hi Alexis, thank you for your reply. I'll try transforming the variables. Instead of creating new columns in the dataset, is it possible to transform directly when plotting? I mean something like:

ggplot(aes(log(x = The variable I need to transform),
                       y = the other variable))+ ...

PaoloEmilio99 · March 12, 2024, 5:26pm

Hi, thank you for your reply. I tried your code, renaming the variables of my dataset:

i2 <- filter(s2 |> 
   rename(x=`Farm Net Value Added`,
          y=`NPK+Difesa`) |> arrange(x,y))
 
 (iso_model <- isoreg(x=i2$x,    y=i2$y))
 
 i2$iso_pred <- iso_model$yf
 
 iso_smoothed <- smooth.spline(x=i2$x,
                               y = iso_model$yf,spar=.5)
 i2$iso_smoothed_pred <- predict(iso_smoothed,x=i2$x)$y
 
 i2 |>
   ggplot(aes(x = x,
              y = y))+
   geom_point()+
   geom_smooth(method = "lm", se = F,aes(color = "blue"))+
   geom_smooth(se = F, aes(color = "red"),linetype="dashed")+
   geom_line(aes(y=iso_pred,color="green"))+
   geom_line(aes(y=iso_smoothed_pred,color="purple"),linewidth=1)+
   scale_color_identity(name = "legend",
                        labels = c("linear", "iso","smoothed iso (monotonic)","i need it monotonic"),
                        guide = "legend")+
   labs(x = "Variable 1 (€)",
        y = "Variable 2 (€)",
        title = "plot Variable 1,2")+
   theme_bw() + theme(legend.position = "bottom")

Here's the result:

AlexisW · March 12, 2024, 5:36pm

yes, just keep the x= outside of the log:

ggplot(aes(x = log(Variable1),
                       y = log(Variable2)))+ ...

Or even better, just add:

ggplot(aes(...)) +
 scale_x_log10() +
 scale_y_log10()

nirgrahamuk · March 12, 2024, 5:37pm

The green line is an isotonic regression, so it's monotonic in the way you want, but it's not smooth. Adjust the spar=.5 of smooth.spline: a higher value gives a smoother curve, while a lower value makes the purple curve follow the green line more closely. Pick a value that smooths the steps without bending below the green line.
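To see what spar does, here is a small sketch on simulated data (all names and values are illustrative assumptions). It smooths the same isotonic fit with two spar values, measures how far each curve strays from the step function, and checks whether each one is still non-decreasing. Note that smoothing an isotonic fit does not by itself guarantee a monotonic curve, so the check is worth running.

```r
set.seed(1)
x <- sort(runif(200, 0, 10))
y <- sqrt(x) + rnorm(200, sd = 0.3)

# Non-decreasing step fit (x is already sorted, so yf lines up with x)
iso_fit <- isoreg(x, y)$yf

# Smooth the step fit with a low and a high spar, evaluated at the data points
sm_low  <- predict(smooth.spline(x, iso_fit, spar = 0.2), x)$y
sm_high <- predict(smooth.spline(x, iso_fit, spar = 0.9), x)$y

# Residual sum of squares of each smoothed curve against the isotonic fit
rss_low  <- sum((sm_low  - iso_fit)^2)
rss_high <- sum((sm_high - iso_fit)^2)

# Is each smoothed curve still non-decreasing (up to numerical noise)?
mono_low  <- all(diff(sm_low)  >= -1e-8)
mono_high <- all(diff(sm_high) >= -1e-8)
```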

PaoloEmilio99 · March 12, 2024, 5:54pm

Thanks for the tips. I tried spar=.2 and then .1, but the result looks the same. Also, in parallel with transforming the variables as @AlexisW suggested, what would I have to do to check whether a monotonic decreasing curve, rather than an increasing one, fits my plot?

nirgrahamuk · March 12, 2024, 5:57pm

If a monotonic decreasing function might be a good fit for you, surely the lm would have a negative slope?
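To test the decreasing direction directly, one option is sketched below on simulated data (variable names are placeholders). isoreg only fits non-decreasing curves, so the trick assumed here is to fit -y and flip the sign back, then compare both directions by residual sum of squares:

```r
set.seed(1)
x <- sort(runif(200, 0, 10))
y <- 5 - sqrt(x) + rnorm(200, sd = 0.3)   # simulated decreasing trend

# Fitting -y and negating the result gives a non-increasing least-squares fit
dec_fit <- -isoreg(x, -y)$yf

# Ordinary (non-decreasing) isotonic fit, for comparison
inc_fit <- isoreg(x, y)$yf

# The direction with the smaller residual sum of squares describes the data better
rss_dec <- sum((y - dec_fit)^2)
rss_inc <- sum((y - inc_fit)^2)
```

On real data, i2$iso_pred in the code above could be replaced by the dec_fit equivalent, provided the rows are sorted by x first.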

PaoloEmilio99 · March 12, 2024, 6:08pm

Just from looking at the plots, I assumed that Y decreases as X increases. But the problem is that there are 239,790 observations, and maybe I only led myself astray because many observations are concentrated near the origin of the axes while the outliers are at the margins (high X with low Y, low X with high Y). That arrangement of dots makes me think of that trend.

PaoloEmilio99 · March 12, 2024, 6:14pm

Edit: I mean the relationship might not necessarily be linear, but decreasing.

nirgrahamuk · March 12, 2024, 6:17pm

If the general trend were ambiguous, your lm slope would be close to flat (horizontal). Your non-monotonic smooth tells you where the relationship is increasing or decreasing over different regions.

You can fit the lm a second time after you remove outliers.

Looking at the image you shared, it seems clearly increasing where the greatest mass of dots is.
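Refitting after removing outliers could be sketched as follows. The outlier rule here is an assumption (Tukey's 1.5 × IQR fences applied to each variable), the data are simulated, and inside_fences is a hypothetical helper written for this example:

```r
set.seed(1)
d <- data.frame(x = c(rnorm(1000, 50, 10), runif(30, 300, 1000)))
d$y <- 2 * d$x + rnorm(nrow(d), sd = 10)

# TRUE for values inside the k * IQR fences around the quartiles
inside_fences <- function(v, k = 1.5) {
  q <- unname(quantile(v, c(0.25, 0.75), na.rm = TRUE))
  fence <- k * (q[2] - q[1])
  v >= q[1] - fence & v <= q[2] + fence
}

keep <- inside_fences(d$x) & inside_fences(d$y)

fit_all     <- lm(y ~ x, data = d)          # all observations
fit_trimmed <- lm(y ~ x, data = d[keep, ])  # outliers removed

n_dropped <- sum(!keep)
coef(fit_all)
coef(fit_trimmed)
```

Comparing the two slopes, and n_dropped, shows how much the fit depends on the points outside the fences.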

PaoloEmilio99 · March 12, 2024, 6:23pm

Without outliers, using the interquartile range method (Tukey, 1977), I lost a lot of observations (50k):

nirgrahamuk · March 12, 2024, 6:32pm

I would look at that zoomed-in region again, this time without removing the outliers. coord_cartesian can do that.
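A short sketch of that zoom, on simulated data with arbitrary, assumed axis limits. coord_cartesian only changes the visible window, so the smooth is still fitted to every row, whereas xlim()/ylim() drop out-of-range rows before fitting:

```r
library(ggplot2)

set.seed(1)
# Bulk of points near the origin plus a few far-out values
d <- data.frame(x = c(rnorm(1000, 50, 20), runif(20, 500, 5000)))
d$y <- 0.5 * d$x + rnorm(nrow(d), sd = 15)

base <- ggplot(d, aes(x, y)) +
  geom_point(alpha = 0.3) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Zoom the view only: all rows are kept for the fit
zoomed <- base + coord_cartesian(xlim = c(0, 150), ylim = c(-50, 150))
zoomed
```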

Generally I'd advise you to adjust your scales so that values can be usefully read off from them ...

Finally, analysis-wise: yes, it seems that the smoothed red dashed line is giving potentially useful information about a discontinuity around that low positive value, whatever it is.

You could probably fit a segmented lm.
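As a rough illustration of a segmented lm, the sketch below uses base R only and assumes the breakpoint k is already known (here an arbitrary value of 20 on simulated data). In practice the breakpoint would have to be estimated, for example with a dedicated package such as segmented:

```r
set.seed(1)
x <- runif(300, -50, 150)
# Simulated data whose slope changes from 0.1 to 0.9 at x = 20
y <- 10 + 0.1 * x + 0.8 * pmax(x - 20, 0) + rnorm(300, sd = 5)

k <- 20  # assumed breakpoint, not estimated from the data

# The hinge term pmax(x - k, 0) lets the slope change at k while keeping the line continuous
seg_fit <- lm(y ~ x + pmax(x - k, 0))

slope_left  <- unname(coef(seg_fit)[2])                     # slope for x < k
slope_right <- unname(coef(seg_fit)[2] + coef(seg_fit)[3])  # slope for x > k
```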

PaoloEmilio99 · March 12, 2024, 6:38pm

The aim of this plot is to justify the use of Spearman's instead of Pearson's because, after analysing summary stats of the variables and Q-Q plots, I noticed that the values are not normally distributed. For that reason Spearman's is a more robust indicator of correlation. Maybe I can demonstrate that by plotting columns of ranks instead of columns of the variables? Like they said here: plotting spearman correlation (with geom_smooth?) - #3 by Matthias
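The rank idea can be sketched like this, on simulated skewed data (d, x and y are placeholders). Spearman's rho is Pearson's r computed on the ranks, so a straight line through the rank-rank scatter is the visual counterpart of the coefficient:

```r
library(ggplot2)

set.seed(1)
d <- data.frame(x = rexp(500, rate = 1 / 1000))   # skewed, non-Gaussian values
d$y <- sqrt(d$x) * exp(rnorm(500, sd = 0.5))

# The two numbers below agree: Spearman's rho equals Pearson's r on the ranks
rho_spearman <- cor(d$x, d$y, method = "spearman")
rho_on_ranks <- cor(rank(d$x), rank(d$y))

# Rank-rank scatter with a linear fit
p <- ggplot(d, aes(x = rank(x), y = rank(y))) +
  geom_point(alpha = 0.3) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(x = "Rank of x", y = "Rank of y")
p
```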
