mark gndr=9 as NA

Dear community,

I am working with a dataset in which gender is coded as "9" when the respondent did not answer the gender question. I want to exclude these 9s from my data. The problem is that I do not want the 9s in all the other columns to be excluded.

The missing values of all the other variables are coded as "66", "77", "88", "99", and "999". I marked them as NA when importing the data. But I don't know how to do this for the "9", since I don't want the 9s in the other columns (e.g. jbstfy, happy) to be removed.

I hope you can help me out.

PS: Should I store gender as a factor or as a character?

This is the data:

This is my code until now:

data <- read.csv2(file.choose(), header=T, sep = ",", dec=",", na.strings= c("NA", "66", "77", "88", "99", "999"))

Two basic ways to go about this:

  1. You can handle NA for a specific column:
is_unknown_gender <- data[["gndr"]] == 9
data[["gndr"]][is_unknown_gender] <- NA
  2. (My recommendation) Replace the numeric codes for non-numeric data with factors.
data[["gndr"]] <- factor(
  data[["gndr"]],
  levels = c(1, 2), # 9 is not included, so those values become NA
  labels = c("male", "female")
)
Factors have the benefit of making it easier to read the dataset and code.
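
Here is a minimal sketch of both approaches on made-up data, in case it helps to see them run. The column names `gndr` and `happy` come from the question, but the values are invented, and the coding 1 = male, 2 = female is an assumption on my part that you should check against your codebook.

```r
# Toy data: values are invented for illustration.
# 9 in gndr means "no answer"; 9 in happy is a legitimate score.
toy <- data.frame(
  gndr  = c(1, 2, 9, 2, 1),
  happy = c(9, 7, 9, 3, 8)
)

# Approach 1: set only the 9s in gndr to NA.
approach1 <- toy
is_unknown_gender <- approach1[["gndr"]] %in% 9  # %in% never returns NA
approach1[["gndr"]][is_unknown_gender] <- NA

# Approach 2: convert gndr to a factor.
# Codes that are not listed in levels (here: 9) become NA.
approach2 <- toy
approach2[["gndr"]] <- factor(
  approach2[["gndr"]],
  levels = c(1, 2),
  labels = c("male", "female")  # assumed coding, check your codebook
)

# The happy column keeps its 9s in both cases.
print(approach1)
print(approach2)
summary(approach2[["gndr"]])
```

Either way, only `gndr` is changed. As for the PS, a factor is usually the more convenient choice for a categorical variable like gender, since functions such as `summary()` and `table()` then report counts per labelled level.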

Forgive me if the column names or code values are wrong. I didn't download your data because of internet security habits.

Wow, thank you so much for your quick help!!!

Hi @Stephan95, I think it is good practice to accept an answer as a solution if you think it solved your problem :slight_smile:

