Creating a Barplot to Visualize the Relationship between Gender and Political Identity

For context, I want to create a barplot showing the relationship between respondents' gender and their political identity (left or right only). Something like:
[Screenshot 2023-04-18 at 11.13.05: example of the kind of barplot I want]
This is what I've done so far:

# create dataframe with only those respondents who are either left or right and who have a strong partisan identity, then create a new column labelling them as left (1) or right (2)
 
strongId_frame <- data.frame(bes_strongId) # make dataframe out of only those who say they have strong partisan identity 

strongId_frame$generalElectionVote <- as.character(strongId_frame$generalElectionVote) # change variable showing how respondents voted into character data type (was originally factor variable)

strongestId_frame <- strongId_frame[complete.cases(strongId_frame$generalElectionVote), ] #remove the rows with missing values in columns which record no response for how someone voted

partisan_identity <- ifelse(strongId_frame$generalElectionVote %in% c("Labour", "Green Party", "Liberal Democrat"), 1,
                     ifelse(strongId_frame$generalElectionVote %in% c("Conservative"), 2, NA)) #make new column where all left-voting respondents become 1 and right-voting respondents 2


# what % of women are left identifying?
women_w_strongId <- strongestId_frame[strongestId_frame$gender == "Female",]

women_w_strongId %>%
  count(strongestId_frame$partisan_identity == 1) %>%
  mutate(percent = n/sum(n)*100)

Possible problem #1: maybe the partisan_identity column isn't being created

I also tried:

partisan_identity <- ifelse(strongId_frame$generalElectionVote %in% c(2, 3, 7), 1,
                            ifelse(strongId_frame$generalElectionVote %in% c(1), 2, NA))

partisan_identity <- ifelse(match(strongId_frame$generalElectionVote, c("Labour", "Green Party", "Liberal Democrat")) > 0, 1,
                            ifelse(match(strongId_frame$generalElectionVote, c("Conservative")) > 0, 2, NA))
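Two details may explain why these variants do no better. After `as.character()`, `generalElectionVote` holds party names, so `%in% c(2, 3, 7)` compares names with numbers and finds no matches. And `match()` returns `NA`, not 0, when a value is absent, so `match(...) > 0` is `NA` for every non-matching row and `ifelse()` passes that `NA` straight through (even for "Conservative", since the outer condition is already `NA`). A small sketch with made-up votes standing in for the real column:

```r
# Made-up votes standing in for strongId_frame$generalElectionVote
votes <- c("Labour", "Conservative", "Other")
left_parties <- c("Labour", "Green Party", "Liberal Democrat")

# Party names never equal the numeric codes, so nothing matches
coded <- votes %in% c(2, 3, 7)
coded

# match() gives NA, not 0, for values that are not found
positions <- match(votes, left_parties)
positions

# An NA condition makes ifelse() return NA for that element
via_match <- ifelse(positions > 0, 1, 2)
via_match

# %in% always returns TRUE or FALSE, so every row gets a label
via_in <- ifelse(votes %in% left_parties, 1, 2)
via_in
```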

I can't find partisan_identity listed at the end of colnames(strongestId_frame) or names(strongestId_frame).

I get the following using summary(strongestId_frame$partisan_identity):

Length  Class   Mode 
     0   NULL   NULL

Possible problem #2: maybe the problem has to do with missing data

??

I'm pretty sure all the column and value names are right.
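Going by the code above, the `NULL` summary probably isn't a missing-data problem. `partisan_identity <- ifelse(...)` creates a free-standing vector in the workspace rather than a column, and it is computed from `strongId_frame`, not `strongestId_frame`, so `strongestId_frame$partisan_identity` does not exist. The `count()` call also reaches into `strongestId_frame$...` instead of using the piped-in `women_w_strongId`. A minimal sketch of a column-based version, using a made-up five-row data frame in place of the BES data:

```r
library(dplyr)

# Made-up stand-in for strongestId_frame (not the real BES data)
strongestId_frame <- data.frame(
  gender = c("Female", "Female", "Male", "Female", "Male"),
  generalElectionVote = c("Labour", "Conservative", "Green Party",
                          "Liberal Democrat", "Conservative")
)

# Assigning to a bare name leaves the data frame unchanged
partisan_identity <- ifelse(strongestId_frame$generalElectionVote == "Conservative", 2, 1)
is.null(strongestId_frame$partisan_identity)  # TRUE: no such column yet

# Store the labels as a column of the same data frame used later
strongestId_frame$partisan_identity <- ifelse(
  strongestId_frame$generalElectionVote %in% c("Labour", "Green Party", "Liberal Democrat"), 1,
  ifelse(strongestId_frame$generalElectionVote == "Conservative", 2, NA)
)

# Subset women only after the column exists
women_w_strongId <- strongestId_frame[strongestId_frame$gender == "Female", ]

# Inside the pipe, refer to the column by its bare name
women_summary <- women_w_strongId %>%
  count(partisan_identity) %>%
  mutate(percent = n / sum(n) * 100)
women_summary
```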

Hi there,

Will this provide you with some pointers?

# Load Libraries ----------------------------------------------------------
library("tidyverse")


# Make some data ----------------------------------------------------------
my_data <- tibble(
  identifies_as = sample(c("Male", "Female"),
                         size = 100,
                         replace = TRUE),
  political_orientation = sample(c("Left", "Right"),
                                 size = 100,
                                 replace = TRUE))


# Wrangle data ------------------------------------------------------------
my_data_summarised <- my_data %>%
  group_by(identifies_as) %>% 
  summarise(
    pct_identifies_as_left = sum(political_orientation == "Left")/n() * 100)


# Visualise ---------------------------------------------------------------
my_data_summarised %>% 
  ggplot(aes(x = factor(identifies_as, levels = c("Male", "Female")),
             y = pct_identifies_as_left)) +
  geom_col(colour = "black", fill = "white") +
  geom_vline(xintercept = 0) +
  geom_hline(yintercept = 0) +
  scale_y_continuous(limits = c(0, 100)) +
  theme_minimal() +
  labs(x = "", y = "% that identifies as left") +
  theme(axis.title.y = element_text(angle = 0, vjust = 0.5),
        panel.grid.major.x = element_blank())
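In the `summarise()` step, `sum(political_orientation == "Left")` counts the TRUE values and dividing by `n()` turns that count into a share of the group. Because R treats TRUE as 1 and FALSE as 0, `mean()` gives the same share in one call. A quick check on a few made-up rows with the same columns:

```r
library(dplyr)

# A few made-up rows with the same columns as my_data above
check_data <- data.frame(
  identifies_as = c("Male", "Male", "Female", "Female", "Female"),
  political_orientation = c("Left", "Right", "Left", "Left", "Right")
)

# Both columns should hold the same percentage per group
pct <- check_data %>%
  group_by(identifies_as) %>%
  summarise(
    via_sum = sum(political_orientation == "Left") / n() * 100,
    via_mean = mean(political_orientation == "Left") * 100
  )
pct
```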

Hi Leon! Many thanks for your time.

This is how I understood I might use this. Apologies if it's confusing; I renamed a few of your variables for consistency with other work I have done.

# delete missing data in gender
sum(is.na(bes_strongId$gender))
bes_strongId_clean <- bes_strongId[complete.cases(bes_strongId$gender), ]

First, I checked whether there was missing data in the gender column, as I was worried it might cause problems with the barplot, and deleted those rows.

# Make sure the object is a plain data frame
bes_strongId_clean <- as.data.frame(bes_strongId_clean)

# Create new partisan_identity column -------------------------------------
library(dplyr)
bes_strongId_clean <- bes_strongId_clean %>%
  mutate(generalElectionVote = case_when(
    generalElectionVote %in% c("Labour", "Green Party", "Liberal Democrat") ~ "Left",
    generalElectionVote %in% c("Conservative") ~ "Right",
    TRUE ~ "Other"
  ))

Here, I think I create a new column called partisan_identity to merge the voting choices in the generalElectionVote column into three possible values for someone's identity: right, left, and other. I don't need to do this for gender because it's already a factor variable.
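Note that, as written, this `mutate()` call does not create `partisan_identity`: the name on the left of `=` is the column that `mutate()` writes, so it overwrites `generalElectionVote` with "Left"/"Right"/"Other". Naming the new column keeps the original vote column and adds the label alongside it. A sketch on three made-up rows:

```r
library(dplyr)

# Three made-up rows; the real data has many more columns
votes <- data.frame(
  gender = c("Female", "Male", "Female"),
  generalElectionVote = c("Labour", "Conservative", "Other")
)

# The name before `=` is the column mutate() writes, so this adds
# partisan_identity and leaves generalElectionVote untouched
votes <- votes %>%
  mutate(partisan_identity = case_when(
    generalElectionVote %in% c("Labour", "Green Party", "Liberal Democrat") ~ "Left",
    generalElectionVote %in% c("Conservative") ~ "Right",
    TRUE ~ "Other"
  ))
names(votes)
```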

# Wrangle data ------------------------------------------------------------
my_data_summarised <- bes_strongId_clean %>%
  group_by(bes_strongId_clean$gender) %>% 
  summarise(
    pct_identifies_as_left = sum(partisan_identity == "Left")/n() * 100)

# Visualise ---------------------------------------------------------------
my_data_summarised %>% 
  ggplot(aes(x = factor(gender, levels = c("Male", "Female")),
             y = pct_identifies_as_left)) +
  geom_col(colour = "black", fill = "white") +
  geom_vline(xintercept = 0) +
  geom_hline(yintercept = 0) +
  scale_y_continuous(limits = c(0, 100)) +
  theme_minimal() +
  labs(x = "", y = "% that identifies as left") +
  theme(axis.title.y = element_text(angle = 0, vjust = 0.5),
        panel.grid.major.x = element_blank())

I don't run into any problems until I run the plot.

1. If I run the plot command as it is, I'm told:

Error in `geom_col()`:
! Problem while computing aesthetics.
ℹ Error occurred in the 1st layer.
Caused by error in `check_aesthetics()`:
! Aesthetics must be either length 1 or the same as the data (2)
✖ Fix the following mappings: `x`

2. If I run the plot command having deleted geom_col(), I get:
[Screenshot 2023-04-19 at 11.38.17: output of the plot command without geom_col()]

3. Other times when I run the plot, I'm told object 'gender' not found.

I could see why this was an error: your code works because you created a gender column (identifies_as) in your data frame and used that as the x value.

So I tried referencing the data frame I created directly, with x = factor(bes_strongId_clean$gender, levels = c("Male", "Female")). This gave me the same error message as in 1.

Problems appear to be caused by the gender variable. Any thoughts?
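A likely cause for the "object 'gender' not found" error and the length error in the last attempt: `group_by(bes_strongId_clean$gender)` groups by an expression, and dplyr names the resulting column after that expression's text, so the summary has no column called `gender`. Switching to `bes_strongId_clean$gender` inside `aes()` hands ggplot the full-length vector from the unsummarised data, while the summary has only 2 rows, which is what "the same as the data (2)" refers to. Grouping by the bare column name avoids both. A sketch with a made-up four-row data frame:

```r
library(dplyr)

# Made-up stand-in for bes_strongId_clean
bes_demo <- data.frame(
  gender = factor(c("Male", "Female", "Female", "Male"), levels = c("Male", "Female")),
  partisan_identity = c("Left", "Left", "Left", "Right")
)

# Grouping by an expression: the column is named after the expression text
by_expr <- bes_demo %>%
  group_by(bes_demo$gender) %>%
  summarise(n = n())
names(by_expr)

# Grouping by the bare column name keeps a column called `gender` for aes()
by_name <- bes_demo %>%
  group_by(gender) %>%
  summarise(pct_identifies_as_left = sum(partisan_identity == "Left") / n() * 100)
by_name
```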


Without access to your data, I can only suggest comparing the example data I created with yours to identify discrepancies.

Please post the output of

dput(my_data_summarised)

Sure! Here we are:

structure(list(`bes_strongId_clean$gender` = structure(1:2, levels = c("Male", 
"Female"), class = "factor"), pct_identifies_as_left = c(NA_real_, 
NA_real_)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-2L))

Well, that is this data:

  bes_strongId_clean$gender pct_identifies_as_left
1                      Male                     NA
2                    Female                     NA

You cannot plot this, since there is no data (other than NAs).
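An all-NA summary fits a `partisan_identity` that contains missing values. Possibly dplyr is even picking up the leftover free-standing `partisan_identity` vector from the first attempt in the workspace, since the data frame itself may not have such a column; that is a guess, not something the output confirms. Either way, comparing `NA` with "Left" gives `NA`, and `sum()` returns `NA` as soon as one element is `NA` unless `na.rm = TRUE` is set. A minimal illustration with made-up labels:

```r
# Made-up labels with one missing value
party <- c("Left", "Right", NA, "Left")

# Comparing NA with "Left" gives NA, not FALSE
matches <- party == "Left"
matches                            # TRUE FALSE NA TRUE

# One NA is enough to make sum() return NA
with_na <- sum(matches)
with_na                            # NA

# na.rm = TRUE drops the NA before adding up
without_na <- sum(matches, na.rm = TRUE)
without_na                         # 2

# Dividing by all 4 rows (as n() would) treats the NA row as "not left"
without_na / length(party) * 100   # 50
```

With `na.rm = TRUE`, the NA rows leave the numerator but still count in `n()`; filtering them out first is the alternative if they should not be in the denominator either.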

Yes, I got the same output...

I tried doing the whole thing from the beginning again today:

# delete missing data in gender
bes_clean <- bes[complete.cases(bes$gender), ]

# clean up party identification variable 
bes_clean <- bes_clean[complete.cases(bes_clean$partyIdStrength), ]
exclude_values <- c("Don't know", "Not very strong")
bes_clean <- subset(bes_clean, !(partyIdStrength %in% exclude_values))

# clean up election vote variable
bes_clean <- bes_clean[complete.cases(bes_clean$generalElectionVote), ] 
exclude_values <- c("Don't know", "Brexit Party/Reform UK", "Plaid Cymru", "Scottish National Party (SNP)", "I would/did not vote", "Other" )
bes_clean <- subset(bes_clean, !(generalElectionVote %in% exclude_values))
bes_clean$generalElectionVote <- factor(bes_clean$generalElectionVote)

I think we're good, up till here.

# make new column labelling each respondent as left or right
bes_clean <- bes_clean %>%
  mutate(partisan_identity = if_else(generalElectionVote %in% c("Labour", "Green Party", "Liberal Democrat"), "left",
                                   if_else(generalElectionVote == "Conservative", "right", NA_character_)))

So, unlike before, partisan_identity now shows up as a new column in the dataset. unique(), class(), and summary() showed that it contains 12406 correctly labelled character values.

At this point, I tried to run the plot and got something that looked good, but without any responses filled in:

# Wrangle data---------------------------------------------------------------
my_data_summarised <- bes_clean %>%
  group_by(gender) %>% 
  summarise(
    pct_identifies_as_left = sum(partisan_identity == "Left")/n() * 100)

# Visualise ---------------------------------------------------------------
library(ggplot2)
my_data_summarised %>% 
  ggplot(aes(x = factor(gender, levels = c("Male", "Female")),
             y = pct_identifies_as_left)) +
  geom_col(colour = "black", fill = "white") +
  geom_vline(xintercept = 0) +
  geom_hline(yintercept = 0) +
  scale_y_continuous(limits = c(0, 100)) +
  theme_minimal() +
  labs(x = "", y = "% that identifies as left") +
  theme(axis.title.y = element_text(angle = 0, vjust = 0.5),
        panel.grid.major.x = element_blank())

[Screenshot 2023-04-20 at 21.29.48: the plot renders, but with no bars filled in]

So, I converted the partisan_identity variables into factor variables: bes_clean$partisan_identity <- factor(bes_clean$partisan_identity).

Unfortunately, I got the same graph plot!

Any clue?
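One more thing worth checking, based only on the code shown: the `if_else()` call fills the new column with lowercase "left" and "right", but the summary compares against "Left". String comparison in R is case-sensitive, so `sum(partisan_identity == "Left")` counts zero matches and every bar has height 0; converting the column to a factor does not change that. A sketch on made-up rows labelled the same way:

```r
library(dplyr)

# Made-up rows labelled the same way as the if_else() call (lowercase)
demo <- data.frame(
  gender = c("Male", "Female", "Female"),
  partisan_identity = c("left", "right", "left")
)

# "left" == "Left" is FALSE, so every group gets 0
wrong <- demo %>%
  group_by(gender) %>%
  summarise(pct_identifies_as_left = sum(partisan_identity == "Left") / n() * 100)
wrong

# Comparing with the label exactly as stored gives the intended shares
fixed <- demo %>%
  group_by(gender) %>%
  summarise(pct_identifies_as_left = sum(partisan_identity == "left", na.rm = TRUE) / n() * 100)
fixed
```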

If

dput(my_data_summarised)

once again shows NA values for pct_identifies_as_left, try

my_data_summarised <- bes_clean %>%
  group_by(gender) %>% 
  summarise(
    pct_identifies_as_left = sum(partisan_identity == "Left", na.rm = TRUE)/n() * 100)

Well, this could work; I'm just guessing based on what you've shared so far. The goal should be to get the variables into the shape and class you need. Chiefly:

  • x is a character variable for gender
  • y is a numeric variable indicating percentage

Once you've got the data in the correct shape you can then go about asking your questions and visualizing the data.
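One way to act on this is to check the summary before calling ggplot(), so a wrong class or a column full of NAs fails loudly instead of producing an empty plot. A sketch using base R's stopifnot(); the column names follow the thread, the values are placeholders, and a factor is accepted for gender as well, since the thread uses one to fix the bar order:

```r
# Placeholder summary in the shape described above (values are made up)
my_data_summarised <- data.frame(
  gender = c("Male", "Female"),
  pct_identifies_as_left = c(40, 60)
)

# Stop with an error if the summary is not plot-ready
stopifnot(
  "gender" %in% names(my_data_summarised),
  is.character(my_data_summarised$gender) || is.factor(my_data_summarised$gender),
  is.numeric(my_data_summarised$pct_identifies_as_left),
  !anyNA(my_data_summarised$pct_identifies_as_left)
)
```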

# package libraries I used
library(tidyverse)
library(janitor)
library(scales)

# sample data
set.seed(123)
strongID_frame <- tibble(
  id = seq(1, 100, 1),
  generalElectionVote = sample(
    x = c("Labour", "Green Party", "Liberal Democrat", "Conservative", "Other", NA_character_),
    size = 100,
    replace = TRUE
  ),
  gender = sample(
    x = c("Female", "Male"),
    size = 100,
    replace = TRUE
  )
)

# view
strongID_frame
#> # A tibble: 100 × 3
#>       id generalElectionVote gender
#>    <dbl> <chr>               <chr> 
#>  1     1 Liberal Democrat    Male  
#>  2     2 <NA>                Male  
#>  3     3 Liberal Democrat    Male  
#>  4     4 Green Party         Male  
#>  5     5 Green Party         Male  
#>  6     6 <NA>                Female
#>  7     7 Liberal Democrat    Male  
#>  8     8 Other               Male  
#>  9     9 Conservative        Male  
#> 10    10 <NA>                Female
#> # ℹ 90 more rows

# tidy and transform to match the need
strongID_frame <- strongID_frame %>%
  # remove the rows with missing values in columns which record no response for how someone voted
  drop_na(generalElectionVote) %>%
  # new variable to flag for partisan left
  mutate(
    partisan_identity_left = if_else(
      condition = generalElectionVote %in% c("Labour", "Green Party", "Liberal Democrat"),
      true = 1,
      false = 2
    )
  ) %>%
  # new variable to flag for partisan left + female
  mutate(
    women_w_strongID_left = if_else(
      condition = partisan_identity_left == 1 & gender == "Female",
      true = 1,
      false = 2
    )
  )

# what % of women are left identifying?
strongID_frame %>%
  tabyl(gender, women_w_strongID_left) %>%
  adorn_percentages() %>%
  adorn_pct_formatting()
#>  gender     1      2
#>  Female 48.6%  51.4%
#>    Male  0.0% 100.0%

# data viz
strongID_frame %>%
  # data tidy
  tabyl(gender, partisan_identity_left) %>%
  adorn_percentages() %>%
  as_tibble() %>%
  select(gender, `1`) %>%
  ggplot() +
  geom_col(
    mapping = aes(
      x = gender,
      y = `1`
    ),
    fill = "white",
    color = "black",
    linewidth = 1
  ) +
  scale_y_continuous(labels = percent) +
  labs(
    y = "% that identify as left",
    x = "Gender"
  )

Created on 2023-04-20 with reprex v2.0.2
