Greetings!
I am looking to create two graphs, one for women and another for men. Within each graph, I would like to compare Whites and African Americans.
I searched for existing solutions but couldn't find one that fits my data.
My data contains:
gender <- dat$gender               # gender column has values female and male
race <- dat$race                   # race column has values white and non-white
callback <- dat$received_callback  # callback column has values 0 and 1
I get the following error:
Error in plot.window(...) : invalid 'xlim' value
In addition: Warning message:
In xy.coords(x, y, xlabel, ylabel, log) : NAs introduced by coercion
I get this error now:
Error in plot.xy(xy, type, ...) : invalid plot type
In addition: Warning message:
In xy.coords(x, y, xlabel, ylabel, log) : NAs introduced by coercion
I tried using na.omit() to remove NAs, but the error persists.
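The warning "NAs introduced by coercion" usually means that text values were handed to a routine that expects numbers. The sketch below illustrates that mechanism under the assumption that race is stored as character; the vector race_chr is made up for illustration and is not the poster's data. If this is the cause, na.omit() on the data would not help, because the NAs are created during the conversion rather than being present in the data.

```r
# Hypothetical values standing in for dat$race (assumed to be character)
race_chr <- c("white", "black", "white", "black")

# Converting text to numbers yields NA for every element, with a warning
race_num <- suppressWarnings(as.numeric(race_chr))
print(race_num)

# A factor stores integer codes, which can serve as x positions on a plot
race_fac <- factor(race_chr)
race_codes <- as.integer(race_fac)
print(levels(race_fac))
print(race_codes)
```

Levels are sorted alphabetically by default, so the integer codes follow that order rather than the order in which values appear.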
The csv file you linked, as it stands, cannot be used to easily produce a plot like you posted above. Here is a quick examination of the data set.
DF <- read.csv("~/R/Play/resume.csv")
#Pick the 3 columns of interest so that the following summary() is easy to read
DF_reduced <- DF[, c("received_callback", "race", "gender")]
summary(DF_reduced)
 received_callback      race              gender
 Min.   :0.00000   Length:4870        Length:4870
 1st Qu.:0.00000   Class :character   Class :character
 Median :0.00000   Mode  :character   Mode  :character
 Mean   :0.08049
 3rd Qu.:0.00000
 Max.   :1.00000
table(DF_reduced$received_callback)
   0    1
4478  392
table(DF_reduced$race)
black white
 2435  2435
table(DF_reduced$gender)
   f    m
3746 1124
The values of received_callback are 0 or 1, meaning, I suppose, No and Yes. The race column has only two values, white and black. If you plot received_callback versus race for one gender, you will see only four points: (black, 0), (black, 1), (white, 0), (white, 1). Rather than simply show how I would handle the data, let me ask some questions:
Why are you doing this? Is it homework for a class? Is it self study?
How would you describe the number you want to see on the y axis? How would you calculate it?
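The four-point observation can be checked by counting the distinct (race, received_callback) pairs within one gender. This is a minimal sketch on made-up rows; the data frame toy is hypothetical and only mimics the column names of the real file.

```r
# Hypothetical rows for a single gender, mimicking the real column names
toy <- data.frame(
  received_callback = c(0, 1, 0, 0, 1, 0, 0, 1),
  race = c("black", "black", "white", "white",
           "white", "black", "white", "black"),
  gender = "f"
)

# However many rows there are, only the distinct pairs are visible
# as separate points on a scatter plot
pts <- unique(toy[, c("race", "received_callback")])
print(pts)
print(nrow(pts))
```

All rows that share a pair are drawn on top of each other, which is why the raw plot says little about how often callbacks occur.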
Since it is for a class, I'll avoid directly giving you the answer. How would you calculate the fraction of Yes (1) received_callback responses for each combination of gender and race? That is, how would you make a table like
gender race Frac
f black 0.xxx
m black 0.yyy
f white 0.zzz
m white 0.www
I was able to make a graph using the plot() function, but it seems I needed to add something to make the plot realistic and more representative of the data.
Now you can share the code, as the due date has passed.
Sorry, this slipped my mind yesterday.
Here are two plotting methods to make individual plots for females and males. The first uses base graphics and the second uses ggplot2. I spent no time polishing the appearance of the plots.
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
DF <- read.csv("~/R/Play/resume.csv")
Summary <- DF |> group_by(gender, race) |>
  summarize(Frac = mean(received_callback))
#> `summarise()` has grouped output by 'gender'. You can override using the `.groups` argument.
Summary
#> # A tibble: 4 x 3
#> # Groups: gender [2]
#> gender race Frac
#> <chr> <chr> <dbl>
#> 1 f black 0.0663
#> 2 f white 0.0989
#> 3 m black 0.0583
#> 4 m white 0.0887
Summary$race <- factor(Summary$race)
# Using the base plotting method
par(mfrow = c(1, 2))
tmp <- subset(Summary, gender == "f")
plot.default(x = tmp$race, y = tmp$Frac, ylab = "callback", xaxt = "n",
             xlab = "race", type = "b", main = "Female")
axis(side = 1, at = c(1, 2), labels = tmp$race)
tmp <- subset(Summary, gender == "m")
plot.default(x = tmp$race, y = tmp$Frac, ylab = "callback", xaxt = "n",
             xlab = "race", type = "b", main = "Male")
axis(side = 1, at = c(1, 2), labels = tmp$race)
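
For anyone who would rather not load dplyr, the same kind of summary can probably be built with aggregate() from base R. This is a sketch on a made-up data frame (toy is hypothetical and only borrows the column names from the real file), not the method used above. It relies on the mean of a 0/1 column being the fraction of 1s.

```r
# Hypothetical data with the same column names as the real file
toy <- data.frame(
  gender = c(rep("f", 8), rep("m", 4)),
  race = c(rep("black", 4), rep("white", 4),
           rep("black", 2), rep("white", 2)),
  received_callback = c(0, 0, 0, 1,  0, 1, 0, 1,  0, 0,  1, 0)
)

# Mean of received_callback within each gender/race combination
frac <- aggregate(received_callback ~ gender + race, data = toy, FUN = mean)
names(frac)[names(frac) == "received_callback"] <- "Frac"
print(frac)
```

The result has one row per gender and race combination, so it should be usable in place of Summary in the plotting code above once race is converted to a factor.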
#Using ggplot
library(ggplot2)