Hi, I'm new to R and am having trouble plotting a public data set. I'm using the HIV/AIDS Diagnoses by Neighborhood, Sex, and Race/Ethnicity data (https://data.cityofnewyork.us/Health/HIV-AIDS-Diagnoses-by-Neighborhood-Sex-and-Race-Et/ykvb-493p/about_data) but can't seem to get a graph. I used: ggplot(data = HIV_AIDS_Diagnoses_by_Neighborhood_Sex_and_Race_Ethnicity_20240131, mapping = aes(x = "Total Number of HIV Diagnoses", y ="Total Number of AIDS Diagnoses" ))+ geom_point() + geom_line()
The two columns you are trying to plot have a few rows containing an asterisk. This forces the whole column to be of the data type character. I used as.numeric() to convert them to numbers. I also manually edited the original file to shorten those two column names, just to save typing.
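As a minimal sketch of the conversion described above, the following uses a small made-up data frame in place of the downloaded file. The short column names total_hiv and total_aids are hypothetical stand-ins for the manually shortened names. Note also that a quoted string inside aes() is treated as a constant rather than a column, so column names go in unquoted (or in backticks if they contain spaces).

```r
# Toy stand-in for the downloaded file; the real data set has many more rows and columns.
hiv <- data.frame(
  year       = c(2016, 2016, 2017, 2017),
  total_hiv  = c("12", "*", "7", "30"),   # hypothetical shortened column names
  total_aids = c("5", "*", "3", "11"),
  stringsAsFactors = FALSE
)

# "*" cannot be parsed as a number, so as.numeric() turns it into NA.
# suppressWarnings() hides the "NAs introduced by coercion" warning.
hiv$total_hiv  <- suppressWarnings(as.numeric(hiv$total_hiv))
hiv$total_aids <- suppressWarnings(as.numeric(hiv$total_aids))

str(hiv)

# Plot only if ggplot2 is installed; column names are bare inside aes().
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(hiv, ggplot2::aes(x = total_hiv, y = total_aids)) +
    ggplot2::geom_point(na.rm = TRUE)
  print(p)
}
```

Rows that held an asterisk end up as NA and are simply left out of the plot.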
I took a slightly different approach to that of FJCC in dealing with the names issue and renamed the whole dataset. Table_names gives a list of equivalencies.
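One way such a renaming could look, as a sketch only: the toy data frame below carries just the two long column names from the question, and the short names are illustrative. Table_names here is a small lookup table built by hand, which may differ from how it was built in the reply above.

```r
# Toy data frame using the two long column names from the question.
dat <- data.frame(
  `Total Number of HIV Diagnoses`  = c("12", "*"),
  `Total Number of AIDS Diagnoses` = c("5", "3"),
  check.names = FALSE,          # keep the spaces in the names
  stringsAsFactors = FALSE
)

# Lookup table of original and short names (short names are illustrative).
Table_names <- data.frame(
  original = names(dat),
  short    = c("totalhiv", "totalaids"),
  stringsAsFactors = FALSE
)

# Rename by matching each current name against the lookup table.
names(dat) <- Table_names$short[match(names(dat), Table_names$original)]

Table_names
names(dat)
```

Keeping the lookup table around makes it easy to trace a short name back to the original column heading.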
All the variables we would expect to be numeric are coming in as character, except year. Luckily FJCC spotted the problem. I just converted all those variables to numeric.
This code, using {data.table}, will produce the same plot that FJCC produced, but I think you have more data quality problems. Look at the summaries and the DT[, table(race)] output.
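A base R sketch of converting several columns in one step, assuming a toy data frame and a hand-picked list of count columns (both are illustrative, not taken from the real file):

```r
# Toy data: every column except year arrives as character.
dat <- data.frame(
  year      = c(2016L, 2017L),
  race      = c("Black", "White"),
  totalhiv  = c("12", "*"),
  totalaids = c("5", "3"),
  stringsAsFactors = FALSE
)

# Columns expected to hold counts (hypothetical selection for this example).
count_cols <- c("totalhiv", "totalaids")

# Convert each one; non-numeric entries such as "*" become NA.
dat[count_cols] <- lapply(
  dat[count_cols],
  function(x) suppressWarnings(as.numeric(x))
)

# Confirm the resulting column types.
sapply(dat, class)
```

Genuinely categorical columns such as race are left as character here, since converting them would turn every value into NA.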
It looks like you have some serious outliers in totalhiv & totalaids that almost certainly are typos, and I would question having two separate categories, Other/Unknown & Unknown, in race. Again I suspect a data-entry error.
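A quick way to run this kind of check in base R, sketched on made-up data with one planted outlier. The 1.5 * IQR rule used to flag suspects is a common rule of thumb chosen here for illustration, not something used in the replies above.

```r
# Toy data with a planted outlier and both "unknown" race categories.
dat <- data.frame(
  race      = c("Black", "White", "Other/Unknown", "Unknown", "Black", "White"),
  totalhiv  = c(12, 7, 3, 2, 9, 5000),   # 5000 is the planted outlier
  totalaids = c(5, 3, 1, 1, 4, 6)
)

# Numeric summary: a maximum far above the third quartile is a warning sign.
summary(dat$totalhiv)

# Category counts: near-duplicate labels show up side by side.
race_counts <- table(dat$race)
race_counts

# Flag values more than 1.5 * IQR above the third quartile.
q <- unname(quantile(dat$totalhiv, c(0.25, 0.75), na.rm = TRUE))
upper <- q[2] + 1.5 * (q[2] - q[1])
suspects <- dat[!is.na(dat$totalhiv) & dat$totalhiv > upper, ]
suspects
```

Flagged rows are worth checking against the data set's documentation before dropping them, since they may turn out to be something other than typos.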