Plotting babynames with geom_col()

Hi. Sorry to bother you again, but life is full of questions. I did get the code below to run:

babynames %>% 
  group_by(sex) %>%
  top_n(5,n) %>%
  ungroup() %>%
  select(sex, name, year, n) %>% 
  arrange(sex, desc(n)) %>%
  ggplot(aes(x = name, y = n)) + geom_col()

Now I'm trying to use ggplot and geom_col() to plot the names, and the result just looks weird. Can you please help me check what's wrong with my code? Thank you very much!


Thanks for the reprex. It's just missing the library calls:

library(babynames)
library(dplyr)
library(ggplot2)

Not a biggie.

Here's the data going to ggplot

suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(babynames)) 
babynames %>% 
  group_by(sex) %>%
  top_n(5,n) %>%
  ungroup() %>%
  select(sex, name, year, n) %>% 
  arrange(sex, desc(n))
#> # A tibble: 10 x 4
#>    sex   name     year     n
#>    <chr> <chr>   <dbl> <int>
#>  1 F     Linda    1947 99686
#>  2 F     Linda    1948 96209
#>  3 F     Linda    1949 91016
#>  4 F     Linda    1950 80432
#>  5 F     Mary     1921 73982
#>  6 M     James    1947 94756
#>  7 M     Michael  1957 92695
#>  8 M     Robert   1947 91642
#>  9 M     Michael  1956 90620
#> 10 M     Michael  1958 90520

Created on 2020-03-02 by the reprex package (v0.3.0)

The plot with geom_col() is about as condensed a representation of the data as possible: rows that share a name are stacked into a single bar, so the year no longer shows. It does answer the question:

For each name, how many occurrences?

So, the question for the analyst is what else a plot should draw attention to. The rank change over years? Which sex is more consistently in the top five?

From the question comes the plot.

What is the question?
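If the question turns out to be how these names changed over the years, one possible sketch is below. It assumes you want the full history of every name that appears in the top-five rows, and the names top_pairs and trends are made up for illustration. This is only one way to read the question, not the answer to it.

```r
library(ggplot2)
library(dplyr, warn.conflicts = FALSE)
library(babynames)

# The distinct sex/name pairs behind the top-five rows
# (top_pairs is a hypothetical helper name)
top_pairs <- babynames %>%
  group_by(sex) %>%
  top_n(5, n) %>%
  ungroup() %>%
  distinct(sex, name)

# Every year of data for just those sex/name pairs
trends <- babynames %>%
  semi_join(top_pairs, by = c("sex", "name"))

# One line per name, one panel per sex
p <- ggplot(trends, aes(x = year, y = n, colour = name)) +
  geom_line() +
  facet_wrap(vars(sex))
p
```

A line per name keeps the year on the x axis, which is exactly the dimension the original bar chart hides.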

As @technocrat says, it really depends on what you're trying to show with the plot. I think the reason you're getting unexpected results is that geom_col() stacks every row that shares a name into one bar, so all of the years for each name are combined.
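To see the combining in numbers, here is a minimal sketch that rebuilds the ten rows from the reprex output above by hand (so it runs without the babynames package) and sums n per name. The name bar_heights is just an illustrative choice. The totals should match the bar heights in the original plot, since geom_col() stacks by default.

```r
library(dplyr, warn.conflicts = FALSE)

# The ten rows from the reprex output, typed in by hand
top_names <- tibble(
  sex  = c(rep("F", 5), rep("M", 5)),
  name = c("Linda", "Linda", "Linda", "Linda", "Mary",
           "James", "Michael", "Robert", "Michael", "Michael"),
  year = c(1947, 1948, 1949, 1950, 1921, 1947, 1957, 1947, 1956, 1958),
  n    = c(99686L, 96209L, 91016L, 80432L, 73982L,
           94756L, 92695L, 91642L, 90620L, 90520L)
)

# Rows sharing an x value are stacked, so each bar is the sum of n
# over all rows with that name
bar_heights <- top_names %>%
  group_by(name) %>%
  summarise(total = sum(n), rows = n()) %>%
  arrange(desc(total))

bar_heights
```

Linda's bar is built from four rows and Michael's from three, which is why those two tower over the single-row names.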

If you just want to plot the number of children with each top name per year, you could combine name and year into a single label (here with paste(name, year)) and do something like this. I also decided to split the plot by sex into female and male panels. You could do a lot more to clean this up, but it should be a good start for you!

library(ggplot2)
library(dplyr, warn.conflicts = FALSE)
library(babynames)

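# Keep the five largest name-year rows within each sex
# (one name can appear several times, once per year)
top_names <- babynames %>% 
  group_by(sex) %>%
  top_n(5, n) %>%
  ungroup() %>%
  select(sex, name, year, n) %>% 
  arrange(sex, desc(n))

# Use "name year" as the x value so every row gets its own bar;
# facet by sex with free x scales so each panel shows only its own labels
top_names %>% 
  ggplot(aes(x = paste(name, year), y = n)) + 
  geom_col() +
  facet_wrap(vars(sex), scales = "free_x") +
  labs(x = "")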

Created on 2020-03-02 by the reprex package (v0.3.0)
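As one example of the kind of clean-up mentioned above, here is a rough sketch that stores the label in a name_year column, orders the bars by count, and draws them horizontally so the labels stay readable. The axis label text is my own choice, and mapping a discrete variable to y in geom_col() assumes ggplot2 3.3.0 or later. Treat it as a starting point rather than the definitive version.

```r
library(ggplot2)
library(dplyr, warn.conflicts = FALSE)
library(babynames)

top_names <- babynames %>%
  group_by(sex) %>%
  top_n(5, n) %>%
  ungroup() %>%
  # One label per row, turned into a factor ordered by n
  mutate(name_year = reorder(paste(name, year), n))

# Horizontal bars: counts on x, the ordered labels on y
p <- ggplot(top_names, aes(x = n, y = name_year)) +
  geom_col() +
  facet_wrap(vars(sex), scales = "free_y") +
  labs(x = "Number of babies", y = NULL)
p
```

Using scales = "free_y" plays the same role as "free_x" in the plot above: each panel lists only its own five labels.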

Thank you both very much!
