install.packages("ggplot2")
install.packages("readxl")
library(ggplot2)
library(readxl)
# Read the data file
countries <- read_excel("C:/Users/Tiago/3D Objects/Projeto/countries of the world.xlsx")
# Filter out NA values from the 'Service' column
data <- na.omit(countries$Service)
# Use Sturges' rule to determine the number of bins
num_bins <- ceiling(1 + log2(length(data)))
# Create the histogram with cumulative relative frequencies
ggplot(data, aes(x=Service)) +
  geom_histogram(aes(y=..density..), bins=num_bins, fill="skyblue", color="black", cumulative=TRUE) +
  geom_density(aes(y=..density..), color="red", cumulative=TRUE) +
  labs(title="Histogram of Cumulative Relative Frequencies with Density Plot",
       x="Service",
       y="Cumulative Relative Frequency") +
  theme_minimal()
```
The object `data` created by the line `data <- na.omit(countries$Service)` is just a vector: the contents of one column of `countries`. `ggplot()` needs a data frame, so if you want to filter out the rows of `countries` that have NA in the `Service` column, try `data <- countries[!is.na(countries$Service), ]`. Then `data` will be a data frame.

In the next line, `num_bins <- ceiling(1 + log2(length(data)))`, you will want to replace `length(data)` with `nrow(data)`, because `length()` of a data frame returns the number of columns, not the number of rows.
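To make the difference concrete, here is a minimal sketch in base R. The small `countries` data frame below is made up for illustration and only stands in for the real spreadsheet; the names `as_vector` and `as_frame` are likewise hypothetical.

```r
# Hypothetical toy data standing in for the real spreadsheet
countries <- data.frame(
  Country = c("A", "B", "C", "D"),
  Service = c(0.38, NA, 0.55, 0.62)
)

# na.omit() on a single column gives back a plain numeric vector
as_vector <- na.omit(countries$Service)

# Row-wise filtering keeps the data frame structure (and the other columns)
as_frame <- countries[!is.na(countries$Service), ]

is.data.frame(as_vector)  # FALSE
is.data.frame(as_frame)   # TRUE

length(as_frame)  # 2: the number of columns
nrow(as_frame)    # 3: the number of rows without NA in Service

# Sturges' rule should use the number of observations (rows)
num_bins <- ceiling(1 + log2(nrow(as_frame)))
```

With the data frame version, `aes(x=Service)` can find the `Service` column, which it cannot do when `data` is a bare vector.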
Thank you, that solved that part. However, when I run the code the graph is not cumulative, and two warning messages appear:
1: In geom_histogram(aes(y = ..density..), bins = num_bins, fill = "skyblue", :
Ignoring unknown parameters: cumulative
2: In geom_density(aes(y = ..density..), color = "red", cumulative = TRUE) :
Ignoring unknown parameters: cumulative
What could be causing this?
Instead of using `geom_histogram()`, I think you should use `stat_ecdf()`:
ggplot(data, aes(x=Service)) + stat_ecdf()
I don't see how a cumulative density would be different from that.
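As the warnings indicate, neither `geom_histogram()` nor `geom_density()` has a `cumulative` parameter, so it is silently dropped. Below is a rough sketch of two ways to get a cumulative plot. It assumes ggplot2 3.3.0 or later (for `after_stat()`, which also replaces the older `..density..` notation) and uses simulated values in place of the real `Service` column; the object names `p_ecdf` and `p_hist` are made up for the example.

```r
library(ggplot2)

# Simulated stand-in for the filtered countries data (an assumption)
set.seed(1)
data <- data.frame(Service = runif(200))
num_bins <- ceiling(1 + log2(nrow(data)))

# Option 1: empirical cumulative distribution drawn as a step function
p_ecdf <- ggplot(data, aes(x = Service)) +
  stat_ecdf(geom = "step", color = "red") +
  labs(title = "Empirical Cumulative Distribution",
       x = "Service",
       y = "Cumulative Relative Frequency") +
  theme_minimal()

# Option 2: histogram bars whose heights are running totals of the
# bin counts divided by the total count
p_hist <- ggplot(data, aes(x = Service)) +
  geom_histogram(aes(y = after_stat(cumsum(count) / sum(count))),
                 bins = num_bins, fill = "skyblue", color = "black") +
  labs(title = "Histogram of Cumulative Relative Frequencies",
       x = "Service",
       y = "Cumulative Relative Frequency") +
  theme_minimal()

print(p_ecdf)
print(p_hist)
```

The `cumsum()` approach in Option 2 is only meant for a single group and a single panel; with facets or a grouping aesthetic the running total would need to be computed per group.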