First, I've really tried to avoid asking this and I sincerely want to teach myself how to do this, but after hitting my head against a wall for a few days, I've only wasted time and have nothing to show for my work. I can't even put out a reprex (I read how to do that, I'm so confused I can't even guess what code to start with).
My goal is to create a new variable in an existing dataset. I will call this new variable ha_rescue. Options for this will be "yes" or "no". I have 5 other columns, called "new_drug_1" up to "new_drug_5". If any of 6 words are in any of these columns, then I would want "yes" in my new variable "ha_rescue". If there is nothing or any other word in those columns, I would want "no" in variable ha_rescue.
My dataset is called "ha".
From what I have gathered, I should be using the dplyr package and mutate? I've seen examples of people creating new variables from numbers, but not from characters. The examples I see on YouTube or other help sites have a lot of code, and without starting their project from scratch I get pretty confused. I am new to RStudio and used it for my first semester in a clinical research master's program, so I'm still a novice, but I want to become proficient.
TLDR: Is dplyr the best way to create a new variable with specific character requirements from 5 other character variables?
I think this will help you. I first made an example dataset with 5 columns, new_drug_1 through new_drug_5, and saved it as ha (you already have something that looks like this). Then I made a vector of the words I'm searching for. Finally, I created the ha_rescue variable by checking whether the value in each column is in the word list. The | symbol is OR in R.
library(tidyverse)
# Make an example dataset
allwords <- c("apple", "banana", "carrot", "cucumber", "lettuce", "tomato")
ha <- tibble(
  new_drug_1 = sample(allwords, 200, replace = TRUE),
  new_drug_2 = sample(allwords, 200, replace = TRUE),
  new_drug_3 = sample(allwords, 200, replace = TRUE),
  new_drug_4 = sample(allwords, 200, replace = TRUE),
  new_drug_5 = sample(allwords, 200, replace = TRUE)
)
ha
#> # A tibble: 200 x 5
#> new_drug_1 new_drug_2 new_drug_3 new_drug_4 new_drug_5
#> <chr> <chr> <chr> <chr> <chr>
#> 1 carrot banana banana banana tomato
#> 2 tomato lettuce apple carrot banana
#> 3 carrot cucumber lettuce tomato cucumber
#> 4 lettuce carrot banana carrot apple
#> 5 banana cucumber banana apple cucumber
#> 6 apple apple cucumber banana banana
#> 7 carrot cucumber lettuce cucumber apple
#> 8 lettuce cucumber apple carrot cucumber
#> 9 apple lettuce cucumber lettuce cucumber
#> 10 lettuce tomato apple lettuce tomato
#> # ... with 190 more rows
# This would be a vector of the 6 words you are searching for
# In my case, it just has 2 words
wordlist <- c("apple", "banana")
ha %>%
  mutate(ha_rescue = if_else(
    new_drug_1 %in% wordlist | new_drug_2 %in% wordlist | new_drug_3 %in% wordlist |
      new_drug_4 %in% wordlist | new_drug_5 %in% wordlist,
    "yes",
    "no"
  ))
#> # A tibble: 200 x 6
#> new_drug_1 new_drug_2 new_drug_3 new_drug_4 new_drug_5 ha_rescue
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 carrot banana banana banana tomato yes
#> 2 tomato lettuce apple carrot banana yes
#> 3 carrot cucumber lettuce tomato cucumber no
#> 4 lettuce carrot banana carrot apple yes
#> 5 banana cucumber banana apple cucumber yes
#> 6 apple apple cucumber banana banana yes
#> 7 carrot cucumber lettuce cucumber apple yes
#> 8 lettuce cucumber apple carrot cucumber yes
#> 9 apple lettuce cucumber lettuce cucumber yes
#> 10 lettuce tomato apple lettuce tomato yes
#> # ... with 190 more rows
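If typing out all five columns feels repetitive, the same check can probably be written more compactly. This is only a sketch, assuming a dplyr version that provides if_any() (1.0.4 or later); the three-row data frame below is made up to stand in for ha, and ha_flagged is just a name chosen for the result.

```r
library(dplyr)

# Made-up stand-in for `ha` (three rows, illustrative values only)
ha <- data.frame(
  new_drug_1 = c("carrot", "tomato", "carrot"),
  new_drug_2 = c("banana", "lettuce", "cucumber"),
  new_drug_3 = c("banana", "apple", "lettuce"),
  new_drug_4 = c("banana", "carrot", "tomato"),
  new_drug_5 = c("tomato", "banana", "cucumber"),
  stringsAsFactors = FALSE
)
wordlist <- c("apple", "banana")

# if_any() is TRUE for a row when the condition holds in at least one
# of the selected columns, which replaces the chain of | comparisons
ha_flagged <- ha %>%
  mutate(ha_rescue = if_else(
    if_any(starts_with("new_drug_"), ~ .x %in% wordlist),
    "yes",
    "no"
  ))

ha_flagged$ha_rescue
#> [1] "yes" "yes" "no"
```

The explicit | version above does the same thing and works on older dplyr versions, so use whichever reads more clearly to you.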
Load the reprex package by using the command library(reprex).
Copy the code you want to run into reprex. Don't copy what went to the console: it shouldn't have the > prompt at the beginning of lines, just the code. Note that you need to include everything, including reading in data, because the reprex runs in a standalone session and won't use what is in your environment.
Run the command reprex(). This will run whatever you have copied.
@StatSteph Thanks again, I watched the video and tried to recreate her example, even using the basic code in the first 2 minutes, and it didn't work with that either. Here is a screenshot.
Thanks, the error is informative. R can't find the function %>%, which comes from the magrittr package. You didn't copy your entire reprex into the window, so I can't tell which packages you did load, but I would suggest loading the entire tidyverse. You'll see that the code in my first response does that.
I reinstalled reprex and tidyverse, here is the reprex w/ that information in it.
library(reprex)
#> Warning: package 'reprex' was built under R version 3.6.2
library(tidyverse)
#> Warning: package 'tidyverse' was built under R version 3.6.2
wordlist <- c("DROPERIDOL", "METOCLOPRAMIDE", "OLANZAPINE", "PROCHLORPERAZINE", "ELETRIPTAN", "SUMATRIPTAN")
ha %>%
  mutate(ha_rescue = if_else(
    new_drug_1 %in% wordlist | new_drug_2 %in% wordlist | new_drug_3 %in% wordlist |
      new_drug_4 %in% wordlist | new_drug_5 %in% wordlist,
    "yes",
    "no"
  ))
#> Error in eval(lhs, parent, parent): object 'ha' not found
A reprex (short for reproducible example) has to be, by definition, reproducible. We can't reproduce your code because you are not providing sample data (i.e. the ha data frame) in a copy/paste-friendly format, which gets us into an endless back and forth.
Please try to follow this guide and make a proper reproducible example illustrating your issue.
That error means you don't have an object named ha in your workspace. That data frame must be read in somehow. I suggest you take a look at some intro to R materials, maybe the R for Data Science book. https://r4ds.had.co.nz/
Yeah, I've been reading that today. I know how to load a data frame; it shows as loaded in the top right area. What's curious is, I read the tutorial recommended to me by @andresrcs and realized that most reprexes should have some data in them. My dataset has 102 columns, which wouldn't work for a reprex, so I created a new df by selecting just the columns new_drug_1 through new_drug_5. Running that actually worked, but it wasn't reflected in the reprex I wanted to share with you all.
Reprex below
ha_reprex_df %>%
  mutate(ha_rescue = if_else(
    new_drug_1 %in% wordlist | new_drug_2 %in% wordlist | new_drug_3 %in% wordlist |
      new_drug_4 %in% wordlist | new_drug_5 %in% wordlist,
    "yes",
    "no"
  ))
#> Error in ha_reprex_df %>% mutate(ha_rescue = if_else(new_drug_1 %in% wordlist | : could not find function "%>%"
Read the guide more carefully. A reprex must be self-contained, so it has to include the library calls and the creation of the sample data in the code itself, regardless of whether the data exists in your current environment (the code being reprexed is run in an independent, clean R session).
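To make that concrete, here is a rough sketch of what a self-contained version of the code above could look like. The rows are made up: only the names in wordlist come from this thread, while "IBUPROFEN", the missing values, and the result name are placeholders for illustration. Nothing in it depends on objects already sitting in your environment.

```r
# Everything the code needs is loaded or created right here
library(dplyr)

wordlist <- c("DROPERIDOL", "METOCLOPRAMIDE", "OLANZAPINE",
              "PROCHLORPERAZINE", "ELETRIPTAN", "SUMATRIPTAN")

# Made-up sample data with the same column names as the real dataset;
# "IBUPROFEN" and the NAs are hypothetical filler, not real patient data
ha_reprex_df <- data.frame(
  new_drug_1 = c("SUMATRIPTAN", "IBUPROFEN", "IBUPROFEN"),
  new_drug_2 = c(NA_character_, "OLANZAPINE", NA_character_),
  new_drug_3 = c(NA_character_, NA_character_, NA_character_),
  new_drug_4 = c(NA_character_, NA_character_, NA_character_),
  new_drug_5 = c(NA_character_, NA_character_, NA_character_),
  stringsAsFactors = FALSE
)

# %in% returns FALSE (not NA) for missing values, so empty cells count as "no"
result <- ha_reprex_df %>%
  mutate(ha_rescue = if_else(
    new_drug_1 %in% wordlist | new_drug_2 %in% wordlist | new_drug_3 %in% wordlist |
      new_drug_4 %in% wordlist | new_drug_5 %in% wordlist,
    "yes",
    "no"
  ))

result$ha_rescue
#> [1] "yes" "yes" "no"
```

Copying a block like this and then running reprex() should avoid both the "object not found" and the "could not find function" errors, since the library call and the data creation travel with the code.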
So the point of this is to find the error in my coding, but what I did in the reprex works. When I convert my reprex friendly code to what I was trying to do before (on the original dataset with all the variables I will need), still no success, but I can't share that reprex since the original dataset has 102 columns and 6000 rows. I'm proud of myself for getting my reprex to work but I feel like I'm where I started?
Try to find a subset of your data that reproduces the issue, or you could share a link (Dropbox, Google Drive, Box, etc) to your actual dataset so we can take a look.
I'll keep at it, it has patient sensitive information in it so I can't share it, even if I delete the sensitive stuff (knowing me, I'd still find a way to screw that up). While I need to get this done, can't lose my job over it...
So I figured it out! I was expecting a new variable to pop up on top and it did not. So I assigned a new name to the code you gave me and can pull it up that way. Thank you all for your help, especially on Christmas Eve / Christmas Day :).
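For anyone who lands here with the same problem: mutate() returns a new data frame rather than changing the original in place, so the result has to be assigned to a name before the new column shows up. A minimal sketch with made-up two-column data (the values and the cut-down column set are illustrative, not the real dataset):

```r
library(dplyr)

wordlist <- c("SUMATRIPTAN", "OLANZAPINE")

# Made-up stand-in for the real dataset
ha <- data.frame(
  new_drug_1 = c("SUMATRIPTAN", "IBUPROFEN"),
  new_drug_2 = c(NA_character_, "IBUPROFEN"),
  stringsAsFactors = FALSE
)

# Without assignment the result is only printed; `ha` itself is unchanged
ha %>%
  mutate(ha_rescue = if_else(
    new_drug_1 %in% wordlist | new_drug_2 %in% wordlist, "yes", "no"
  ))
has_column_before <- "ha_rescue" %in% names(ha)
has_column_before
#> [1] FALSE

# Assigning the result (to `ha` again, or to a new name) keeps the new column
ha <- ha %>%
  mutate(ha_rescue = if_else(
    new_drug_1 %in% wordlist | new_drug_2 %in% wordlist, "yes", "no"
  ))
has_column_after <- "ha_rescue" %in% names(ha)
has_column_after
#> [1] TRUE
```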