Hi guys,
I'm trying to plot a histogram using ggplot2. I want to visualize two factor variables (vote and psppipla). However, I can't figure out how to remove the NAs so that they don't show up in the histogram. Can anyone tell me how to remove them?
library(ggplot2)
# Recode the non-substantive answers to NA
data_Austria$vote[data_Austria$vote == "Not eligible to vote"] <- NA
data_Austria$vote[data_Austria$vote == "Refusal"] <- NA
data_Austria$vote[data_Austria$vote == "Don't know"] <- NA
data_Austria$vote[data_Austria$vote == "No answer"] <- NA
data_Austria$psppipla[data_Austria$psppipla == "Refusal"] <- NA
data_Austria$psppipla[data_Austria$psppipla == "Don't know"] <- NA
data_Austria$psppipla[data_Austria$psppipla == "No answer"] <- NA
# Both conditions go into subset()'s second argument, joined with &
# (a third positional argument is treated as `select`, not as a row filter)
ggplot(data = subset(data_Austria, !is.na(psppipla) & !is.na(vote))) +
  geom_bar(mapping = aes(x = psppipla, fill = vote))
To help us help you, could you please prepare a reproducible example (reprex) illustrating your issue, including sample data? Please have a look at this guide to see how to create one:
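One common way to include sample data in a reprex is base R's dput(), which prints R code that recreates an object exactly. A minimal sketch, assuming a small data frame shaped like the one in this thread (the values are illustrative, not taken from the ESS data):

```r
# Illustrative toy data standing in for the real survey extract
data_Austria <- data.frame(
  psppipla = c("Not at all", "Some", "No answer"),
  vote = c("Yes", "No", "Refusal")
)

# dput() prints code that rebuilds the object; paste its output into your post
dput(data_Austria)

# For a large data frame, share only the first few rows
dput(head(data_Austria, 10))
```

The reprex package (reprex::reprex()) can then run the snippet and format both code and output for posting.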
BTW, geom_bar() does not produce a histogram; it produces a bar plot.
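To illustrate the difference: geom_bar() counts the rows in each category of a discrete variable, while geom_histogram() groups a continuous numeric variable into bins. A minimal sketch with made-up data (the columns party and age are hypothetical, not ESS variables):

```r
library(ggplot2)

# Hypothetical toy data: one categorical and one numeric variable
set.seed(1)
df <- data.frame(
  party = sample(c("A", "B", "C"), 100, replace = TRUE),
  age = round(runif(100, min = 18, max = 90))
)

# Bar plot: one bar per category, height = number of rows in that category
p_bar <- ggplot(df, aes(x = party)) + geom_bar()

# Histogram: numeric values grouped into bins 10 years wide
p_hist <- ggplot(df, aes(x = age)) + geom_histogram(binwidth = 10)

print(p_bar)
print(p_hist)
```

Since vote and psppipla are both categorical, a bar plot is the appropriate chart for them; geom_histogram() expects a continuous x.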
Thanks for your reply! This is my first time working with RStudio and I'm still trying to figure everything out. I'm working with data from the European Social Survey (2016), but I'm only looking at the data from Austria. I tried to prepare a reproducible example as you asked, and I hope I understood you correctly. I want to ignore "No answer", "Refusal" and "Don't know" so that they don't show up on the bar plot.
psppipla <- c("Not at all", "Very little", "Some", "A lot", "A great deal", "No answer")
vote <- c("Yes", "No", "Not eligible to vote", "Refusal", "Don't know", "No answer")
data_Austria <- data.frame(psppipla, vote)
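With this sample data, one possible approach is to recode the non-answers to real NAs and then keep only the complete rows. A minimal sketch using dplyr and ggplot2 (both part of the tidyverse); non_answers is a hypothetical helper vector, and following the recoding in the original post it also includes "Not eligible to vote":

```r
library(dplyr)
library(ggplot2)

psppipla <- c("Not at all", "Very little", "Some", "A lot", "A great deal", "No answer")
vote <- c("Yes", "No", "Not eligible to vote", "Refusal", "Don't know", "No answer")
data_Austria <- data.frame(psppipla, vote)

# Answers to treat as missing (hypothetical helper vector)
non_answers <- c("Not eligible to vote", "Refusal", "Don't know", "No answer")

data_clean <- data_Austria %>%
  # replace() turns every non-answer into a real NA (works for character or factor columns)
  mutate(
    vote = replace(vote, vote %in% non_answers, NA),
    psppipla = replace(psppipla, psppipla %in% non_answers, NA)
  ) %>%
  # Keep only the rows where both variables have a value
  filter(!is.na(vote), !is.na(psppipla))

ggplot(data_clean) +
  geom_bar(mapping = aes(x = psppipla, fill = vote))
```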
Thank you so much for your help. I would really appreciate it if you could answer one more, very similar question. After creating a bar plot, I'm now trying to create a histogram with the same data, but I still can't figure out how to remove the NAs from vote this time, because filter() did not work.
Your sample data contains no NAs. Remember that NA stands for "Not Available" and means there is no value at all, so you can't compare something to nothing: vote == NA always returns NA rather than TRUE or FALSE. The correct way is to use the is.na() function; see this example, where I've added some NAs to your sample data.
library(tidyverse)
# Sample data with some real NAs added to vote
psppipla <- c("Not at all", "Very little", "Some", "A lot", "A great deal", "No answer")
vote <- c("Yes", "No", NA, "Refusal", NA, "No answer")
data_Austria <- data.frame(psppipla, vote)
# Keep only the rows where vote is not missing, then plot
data_Austria %>%
  filter(!is.na(vote)) %>%
  ggplot() +
  geom_bar(mapping = aes(x = psppipla, fill = vote))
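To see why comparing with NA fails, here is a small sketch using the same vote vector. filter() keeps only the rows where its condition is TRUE, so a condition that is NA everywhere keeps nothing:

```r
library(dplyr)

vote <- c("Yes", "No", NA, "Refusal", NA, "No answer")

# Any comparison with NA gives NA, never TRUE or FALSE
vote == NA
#> [1] NA NA NA NA NA NA

# is.na() is the correct test for a missing value
is.na(vote)
#> [1] FALSE FALSE  TRUE FALSE  TRUE FALSE

# An all-NA condition makes filter() drop every row
nrow(filter(data.frame(vote), vote != NA))
#> [1] 0

# Using is.na() keeps the four non-missing rows
nrow(filter(data.frame(vote), !is.na(vote)))
#> [1] 4
```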
Also, keep in mind that dplyr functions like filter() do not modify data in place; they return a new data frame instead. If you want the changes to persist, you have to explicitly assign the result to a variable (it can be the same one if you want to overwrite it).
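A minimal sketch of that point, reusing the vote sample from above: calling filter() alone leaves the original data frame untouched, and only assigning the result keeps the change.

```r
library(dplyr)

vote <- c("Yes", "No", NA, "Refusal", NA, "No answer")
data_Austria <- data.frame(vote)

# filter() returns a new data frame; data_Austria itself is unchanged
filter(data_Austria, !is.na(vote))
nrow(data_Austria)
#> [1] 6

# Assign the result to keep the change (here overwriting the original)
data_Austria <- data_Austria %>% filter(!is.na(vote))
nrow(data_Austria)
#> [1] 4
```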