How to plot columns with a huge number of observations from two different datasets?

red_devil · June 21, 2023, 8:08am

Hi, I am facing issues in plotting two columns with lots of observations using ggplot2 in R. It's getting clustered and no matter what I do to the code chunk, it just doesn't get better.

technocrat · June 21, 2023, 8:46am

It's not really possible to give a good answer without a reprex. See the FAQ: How to do a minimal reproducible example (reprex) for beginners. It's going to depend on the type of graph, the number of aesthetics to be applied, and the number of data points. In the simple case, with a really large number of observations, sampling might work.

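library(ggplot2)

# 10,000 random points: too many to read as a scatterplot
d <- data.frame(
  x = sample(1:10000, replace = TRUE),
  y = sample(1:10000, replace = TRUE))

d |> ggplot(aes(x, y)) + geom_point()

# Sample 100 whole rows so each x stays paired with its y
# (assigning a 100-value sample to d$x would be recycled to all 10,000 rows)
s <- d[sample(nrow(d), 100), ]
s |> ggplot(aes(x, y)) + geom_point()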

Created on 2023-06-21 with reprex v2.0.2
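Besides sampling, two other common ways to handle overplotting are semi-transparent points and 2D binning. Here is a minimal sketch on simulated data. The data frame `d` and the values chosen for `alpha`, `size`, and `bins` are illustrative assumptions, not anything taken from the poster's data, and would need tuning on a real dataset.

```r
library(ggplot2)

set.seed(1)  # simulated data for illustration only
d <- data.frame(
  x = rnorm(10000),
  y = rnorm(10000))

# Option 1: semi-transparent points, so dense regions appear darker
p_alpha <- ggplot(d, aes(x, y)) +
  geom_point(alpha = 0.1, size = 0.5)

# Option 2: count points in rectangular bins and fill each bin by its count
p_bins <- ggplot(d, aes(x, y)) +
  geom_bin2d(bins = 50)

p_alpha
p_bins
```

Transparency keeps every observation visible, while binning trades individual points for a density summary. Which one reads better depends on how many points overlap.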

red_devil · June 21, 2023, 8:56am

I'm sharing the code chunk so you can better understand my problem:

library(ggplot2)
ggplot(data = overall_standing) +
  geom_point(mapping = aes(x = P, y = Club)) +
  geom_point(data = overall_attendance, mapping = aes(x = stadium, y = average))

The columns Club and Stadium each contain many observations, and even when I plot them on the y-axis, the plot still remains clustered.

nirgrahamuk · June 21, 2023, 9:27am

What relation does P have to Stadium (the proposed x-axis), and what relation does Club have to average (the proposed y-axis)? Do they share common units?
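If the two datasets turn out not to share units, one option is to give each its own plot rather than layering both on a single pair of axes. The sketch below uses small made-up stand-ins for `overall_standing` and `overall_attendance`. The club names, stadium names, and values are hypothetical, and it assumes `P` and `average` are numeric while `Club` and `stadium` are labels.

```r
library(ggplot2)

# Hypothetical stand-ins for the poster's data; the real columns may differ
overall_standing <- data.frame(
  Club = c("Club A", "Club B", "Club C"),
  P    = c(89, 84, 75))
overall_attendance <- data.frame(
  stadium = c("Stadium A", "Stadium B", "Stadium C"),
  average = c(52000, 41000, 30500))

# One plot per dataset, each with its own axes and units
p_standing <- ggplot(overall_standing, aes(x = P, y = Club)) +
  geom_point()

# reorder() sorts the stadiums by attendance so the labels are easier to scan
p_attendance <- ggplot(overall_attendance,
                       aes(x = average, y = reorder(stadium, average))) +
  geom_point() +
  labs(y = "stadium")

p_standing
p_attendance
```

Putting the labels on the y-axis keeps long names readable, and sorting them by value usually makes a crowded categorical axis easier to follow.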
