Hey all, I'm curious what code I could use to add a column based on a value in another column. For example, I want to add a column that labels samples as "diseased" or "healthy", depending on whether their dataset id contains a "D" or an "H". Something like this (commas separate the columns):
dataset_id,bacteria,(would like to add a column here)
Site4H,268,healthy
Site4D,479,diseased
Site8H,345,healthy
Site8D,567,diseased
Hey @livjos! You might be interested in some of the functions in the dplyr package:
dplyr::recode() can directly translate values of a column (e.g. "D" becomes "Diseased") (oops, I thought you had H or D in a separate column), and
dplyr::case_when() can create values for a column based on conditions in one or more other columns.
If you take a peek at the documentation for those functions, they'll show you some great examples of what you want to do. In this case, you might want to use case_when() with the base endsWith() function, which returns TRUE or FALSE for each value depending on whether it ends with the supplied suffix.
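Here's a rough sketch of how that could look with the example data from the question. The data frame name `samples` and the new column name `status` are just placeholders I made up, and it assumes the H or D is always the last character of dataset_id:

```r
library(dplyr)

# Example data copied from the question; `samples` is a made-up name
samples <- data.frame(
  dataset_id = c("Site4H", "Site4D", "Site8H", "Site8D"),
  bacteria   = c(268, 479, 345, 567),
  stringsAsFactors = FALSE  # endsWith() needs character vectors, not factors
)

# Add a `status` column (hypothetical name) based on the last character of dataset_id
samples <- samples %>%
  mutate(status = case_when(
    endsWith(dataset_id, "H") ~ "healthy",
    endsWith(dataset_id, "D") ~ "diseased",
    TRUE ~ NA_character_  # anything that ends in neither letter stays NA
  ))

print(samples)
```

Note that endsWith() is case-sensitive, so an id ending in a lowercase "h" or "d" would fall through to NA here.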