`data` must be a <data.frame>, or an object coercible by `fortify()`, not a double vector


# Read the data file
countries <- read_excel("C:/Users/Tiago/3D Objects/Projeto/countries of the world.xlsx")

# Filter out NA values from the 'Service' column
data <- na.omit(countries$Service)

# Use Sturges' rule to determine the number of bins
num_bins <- ceiling(1 + log2(length(data)))

# Create the histogram with cumulative relative frequencies
ggplot(data, aes(x=Service)) +
  geom_histogram(aes(y=..density..), bins=num_bins, fill="skyblue", color="black", cumulative=TRUE) +
  geom_density(aes(y=..density..), color="red", cumulative=TRUE) +
  labs(title="Histogram of Cumulative Relative Frequencies with Density Plot",
       y="Cumulative Relative Frequency") +

The object data created in the line

data <- na.omit(countries$Service)

is just a vector, the content of one column of countries. If you want to filter out the rows of countries that have NA in the Service column, try

 data <- countries[!is.na(countries$Service), ]

Then data will be a data frame.
In the next line

num_bins <- ceiling(1 + log2(length(data)))

you will want to replace length(data) with nrow(data), because length() of a data frame returns the number of columns, not the number of rows.
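To make the difference concrete, here is a minimal sketch using a small made-up data frame in place of the real countries file. The column values are invented for illustration, and service_vec is just a name chosen for this example.

```r
# Toy stand-in for `countries`; the values are made up for illustration.
countries <- data.frame(
  Country = c("A", "B", "C", "D", "E"),
  Service = c(0.38, NA, 0.55, 0.61, NA)
)

# na.omit() on a single column returns a bare numeric vector,
# which ggplot() cannot use as its `data` argument.
service_vec <- na.omit(countries$Service)
is.data.frame(service_vec)   # FALSE

# Subsetting rows keeps the data frame structure.
data <- countries[!is.na(countries$Service), ]
is.data.frame(data)          # TRUE

length(data)   # 2: the number of columns
nrow(data)     # 3: the number of rows without NA in Service

# Sturges' rule based on the number of observations (rows).
num_bins <- ceiling(1 + log2(nrow(data)))
```

With the real data, only the read_excel() call and the row filter are needed; the toy data frame is there so the snippet runs on its own.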

Thank you, that solved that part. However, when I run the code the graph is not cumulative, and two warning messages appear:
1: In geom_histogram(aes(y = ..density..), bins = num_bins, fill = "skyblue", :
Ignoring unknown parameters: cumulative
2: In geom_density(aes(y = ..density..), color = "red", cumulative = TRUE) :
Ignoring unknown parameters: cumulative

What could be causing this?

Instead of using geom_histogram(), I think you should use stat_ecdf():

ggplot(data, aes(x=Service)) + stat_ecdf()

I don't understand how a cumulative density would be different from that.
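As the warnings indicate, neither geom_histogram() nor geom_density() has a cumulative parameter, so the argument is dropped. If cumulative bars are still wanted alongside the ECDF, one possible sketch is to accumulate the bin counts inside after_stat(). This assumes ggplot2 3.3.0 or later (where after_stat() replaces the older ..density.. notation) and a single group and panel; the Service values below are randomly generated stand-ins, not the real data.

```r
library(ggplot2)

# Made-up values standing in for the non-missing Service column.
set.seed(1)
data <- data.frame(Service = runif(200, min = 0.1, max = 0.9))

# Sturges' rule on the number of rows.
num_bins <- ceiling(1 + log2(nrow(data)))

p <- ggplot(data, aes(x = Service)) +
  # Bars: running total of the bin counts divided by the overall total,
  # so the last bar reaches 1.
  geom_histogram(aes(y = after_stat(cumsum(count) / sum(count))),
                 bins = num_bins, fill = "skyblue", color = "black") +
  # Step line: the empirical cumulative distribution function.
  stat_ecdf(color = "red") +
  labs(title = "Histogram of Cumulative Relative Frequencies with ECDF",
       y = "Cumulative Relative Frequency")

print(p)
```

The bars and the step line show the same quantity at different resolutions: the bars jump once per bin, while stat_ecdf() steps at every observation.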
