How to rename specific recorded observation values in a single variable

I'm trying to rename the recorded observation values for a specific variable, "B36". The data is a household survey, and question/variable B36 asks respondents to record their answer as one of eight numeric values (1, 2, 3, 4, 5, 6, 7, 99). I want to replace each number with the actual name of the agency associated with it. I can find a lot of stuff online about renaming whole variables, but not so much about renaming observation values.

Here is the whole code I was running - and I can get everything to work except when I plug in the mutate function, which is probably wrong.

 HND_plotlevel_Sec_B_df |>
  mutate(B36 = recode(B36, 1 = "Cosecha Anterior/Previous Harvest", 
                      2 = "Otro Agricultor/Other Farmer", 
                      3 = "Comprada como Grano/Purchased as Grain", 
                      4 = "Semilla Envasada Comprada/Purchased Packaged Seed", 
                      5 = "Recibido de ONG", 6 =  "DICTA/SAG", 7 = "CIAL",
                      99 = "Otra/Other")),
  ggplot(aes(as.factor(B36))) +
  geom_bar() +
  labs(x="Fuente de Semilla/Seed Source", 
       y="Productores Individuales/Individual Producers") +

You are missing an opening " at 6 = DICTA/SAG".

Yes, thank you for identifying that. I reran the code with that fix and still nothing. This is what the error message/code says:

Error: unexpected '=' in:
"HND_plotlevel_Sec_B_df |>
mutate(B36 = recode(B36, 1 ="

I'm thinking that the mutate function is wrong for my purposes here?

It looks like mutate should work. :confused:
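For what it's worth, the "unexpected '='" error is most likely coming from the bare numbers on the left of each `=` rather than from mutate itself. R does not accept a name that starts with a digit unless it is wrapped in backticks, so `1 = "..."` fails to parse. A minimal sketch of the same recode with backticked names, assuming {dplyr} is installed and using a small made-up data frame (`survey` is a hypothetical stand-in for HND_plotlevel_Sec_B_df):

```r
library(dplyr)

# Hypothetical stand-in for the B36 column of the survey data
survey <- data.frame(B36 = c(3L, 1L, 2L, 99L))

survey <- survey |>
  mutate(B36 = recode(B36,
                      `1`  = "Cosecha Anterior/Previous Harvest",
                      `2`  = "Otro Agricultor/Other Farmer",
                      `3`  = "Comprada como Grano/Purchased as Grain",
                      `4`  = "Semilla Envasada Comprada/Purchased Packaged Seed",
                      `5`  = "Recibido de ONG",
                      `6`  = "DICTA/SAG",
                      `7`  = "CIAL",
                      `99` = "Otra/Other"))

survey$B36
```

In the posted code, the comma after the mutate() call would also need to be a pipe (`|>`) before ggplot(), and the plot cannot end on a trailing `+`.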

An alternative, slightly drastic, approach would be to switch to {data.table}.

I think this will work.

library(data.table)
DT <- as.data.table(HND_plotlevel_Sec_B_df)

DT[,B36 := fcase(B36 == 1, "Cosecha Anterior/Previous Harvest", 
B36 == 2,  "Otro Agricultor/Other Farmer", 
B36 == 3, "Comprada como Grano/Purchased as Grain", 
B36 == 4, "Semilla Envasada Comprada/Purchased Packaged Seed", 
B36 == 5,  "Recibido de ONG", 
B36 == 6,  "DICTA/SAG", 
B36 == 7, "CIAL",
B36 == 99,"Otra/Other")]

Hey thank you so much for this! I super appreciate your help. It really sheds a great light on things. In particular, I didn't know about the data.table package, and I can see that this is really a better fit since my data is so relatively big.

I plugged in all of what you advised

DT |>
  ggplot(aes(as.factor(B36))) +
  geom_bar() +

Please see how the labels are right on top of the axis. Do you have any idea how to get them to wrap below the axis (at the 45 degree angle we're telling it to)?

Off the top of my head no. My guess is that the labels are just too long but I am not sure.

However I have the feeling that {data.table} may not always play well with {ggplot2} so you might want to convert DT into a data.frame. The easiest way to do this would be
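One possibility for that conversion, offered as an assumption rather than the only way, is base R's as.data.frame(), which returns a plain data.frame copy and leaves the data.table untouched. A minimal sketch with a made-up two-row table (`df` is a hypothetical name):

```r
library(data.table)

# Hypothetical stand-in for the recoded table
DT <- data.table(B36 = c("CIAL", "Otra/Other"))

df <- as.data.frame(DT)  # plain data.frame copy; DT itself is unchanged

class(DT)
class(df)
```

{data.table} also provides setDF(), which converts the object in place instead of copying it.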


Otherwise can you give us some sample data?
A handy way to supply some sample data is the dput() function. In the case of a large dataset something like dput(head(mydata, 100)) should supply the data we need. Just do dput(mydata) where mydata is your data. Copy the output and paste it here between


Do you need vertical bars? If you flipped the plot so the names are on the vertical axis you might be okay

I "think" this will work

DT |>
  ggplot(aes(as.factor(B36))) +
  geom_bar() + coord_flip()

I thought there were better ways and there are. Try these:

 p  <-  ggplot(DT, aes(as.factor(B36))) 
 p +  geom_bar() + scale_x_discrete(labels = scales::label_wrap(10))
 p + geom_bar() + coord_flip() +  scale_x_discrete(labels = scales::label_wrap(10))

When I typed in the dput() code that you suggested, it gave an output that was way too big. The data is a household survey about Honduran small farmer bean producers, so there are like a hundred variables and 500 observations. But I typed in the code below instead, and it produced these results for the B36 variable we are investigating.

> dput(HND_plotlevel_Sec_B_df[1:10, c(36) ])  
c(3L, 1L, 2L, 1L, 3L, 99L, 1L, 1L, 2L, 1L)

Is that what you were thinking? I'm happy to redo it. I really appreciate your interest and assistance!
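Those ten values are enough for a small reproducible check of the recode. A minimal sketch, assuming {data.table} is installed and using only the values from the dput() output above (the one-column table is built by hand here, not read from the real survey file):

```r
library(data.table)

# The ten B36 values from the dput() output
DT <- data.table(B36 = c(3L, 1L, 2L, 1L, 3L, 99L, 1L, 1L, 2L, 1L))

# Replace each numeric code with its label, by reference
DT[, B36 := fcase(B36 == 1,  "Cosecha Anterior/Previous Harvest",
                  B36 == 2,  "Otro Agricultor/Other Farmer",
                  B36 == 3,  "Comprada como Grano/Purchased as Grain",
                  B36 == 4,  "Semilla Envasada Comprada/Purchased Packaged Seed",
                  B36 == 5,  "Recibido de ONG",
                  B36 == 6,  "DICTA/SAG",
                  B36 == 7,  "CIAL",
                  B36 == 99, "Otra/Other")]

# Count rows per label to confirm nothing was left unmatched
counts <- DT[, .N, by = B36]
counts
```

Any code that matches none of the conditions would come back as NA, so an NA row in the counts is a quick sign that a value was missed.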

Yes, this is all awesome. You're right - great intuition. There was no reasoning why the bar chart needed to be vert. Horizontal is kinda better given that the variable names are long. This all becomes so clean and elegant, I really appreciate it. I'm so grateful for your insights!!! Here is what the chart is looking like now:

Thank you so much for introducing me to the scales package, and even giving me a few examples to compare the syntax and output. I am spending a few hours researching this and trying to learn a few things, so thank you for this direction. One thing I cannot decipher, though (going back to the horizontal bars with the bar/factor names wrapped horizontally): what are you telling the code with the 10 part of this line?

scale_x_discrete(labels = scales::label_wrap(10))

Well, I'm still trying to figure it out, but it seems to be setting some kind of character length. I have not read enough of the {scales} documentation to really get a feel for it. It does not seem to be setting an absolute length, which is what I first thought it was doing.
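As far as the {scales} documentation goes, label_wrap() passes each label to base R's strwrap(), so the 10 is a target line width in characters. Breaks only happen at spaces, which means a single word longer than the width is left whole, and that would explain why it does not behave like an absolute length. A small sketch to see it outside of a plot, assuming {scales} is installed (`wrap10` and `labels` are names made up for the example):

```r
library(scales)

# label_wrap() returns a function that inserts "\n" at word boundaries
wrap10 <- label_wrap(10)

labels  <- c("Otro Agricultor/Other Farmer", "CIAL")
wrapped <- wrap10(labels)

# Print the first label with its line breaks rendered
cat(wrapped[1], "\n")

# Width of each resulting line; long unbroken words can exceed 10
nchar(strsplit(wrapped[1], "\n", fixed = TRUE)[[1]])
```

Trying a few widths this way (say 10, 20, 30) is a quick way to pick one before putting it back into scale_x_discrete().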

I hear you. Ok. Thanks again so much! :v:

Hey, one last question. I tried to recreate the steps you did for a new question: alter the observation names in a new data table and then run the plot on the revised table (this time the question is just answered as either "1" or "2"). It seems to me that I followed your steps to create the new table (DV), and it's even simpler this time, with only two observation answers to convert. But when I ran the code, it told me "error in DV". It created the new data table "DV," but it failed to convert the values in X7 from "1" and "2" into their new names. And the error message says that it's reading DV as a function... right? Any idea why it wants to read DV as a function and not a table, even though the second line of code clearly creates the table, and I can confirm that (it's in the environment pane right under "DT", just without the obs values changed on X7)?

> library(data.table)
> DV  <-
> DV(,X7 := fcase(X7 == 1, "CIAL/DICTA", 
+                 X7 == 2, "Otra/Other",))
Error in DV(, `:=`(X7, fcase(X7 == 1, "CIAL/DICTA", X7 == 2, "Otra/Other",  : 
  could not find function "DV"

Data.table has weird syntax compared to much of R.

Not tested but I think you want

DV[   ,X7 := fcase(X7 == 1, "CIAL/DICTA", 
                  X7 == 2, "Otra/Other")] 

The expression must begin and end with square brackets

DV[           ] 

Also, you had an extra comma in there.

You might find A data.table and dplyr tour useful.

Gotcha, ok, ya - in fact, I had originally used the brackets, but then just started playing around with it when it gave me an error, and that was when I switched it over. But I'll switch it back. In fact, let's switch the data set around so this is a lot more straightforward.

DQ <-
DQ [ ,cyl  :=  fcase(cyl == 4, "Four Cylinder",
                                     cyl == 6, "Six Cylinder",
                                     cyl == 8, "Eight Cylinder",    )  ]
Error in fcase(cyl == 4, "Four Cylinder", cyl == 6, "Six Cylinder", cyl ==  : 
  Received 7 inputs; please supply an even number of arguments in ..., consisting of logical condition, resulting value pairs (in that order). Note that the default argument must be named explicitly, e.g., default=0

It's weird that the error is saying it's seeing a seventh input. I'm seeing only six inputs: logical condition 1 / value 1 / logical condition 2 / value 2 / logical condition 3 / value 3. There are no other values in the mtcars data set for that "cyl" variable besides those three numeric values (i.e. no default "99" or "0") - at least not that I can see.

Trailing comma.

DQ <-
DQ [ ,cyl  :=  fcase(cyl == 4, "Four Cylinder",
                     cyl == 6, "Six Cylinder",
                     cyl == 8, "Eight Cylinder"    )  ]
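The reason the count comes out as seven is that R treats whatever follows the last comma as one more (empty) argument, so fcase() sees six real inputs plus a blank one. A tiny base R sketch of that behaviour (`count_args` is a throwaway helper made up for this example, not part of {data.table}):

```r
# Helper that reports how many arguments it was called with
count_args <- function(...) nargs()

without_comma <- count_args(1, "a", 2, "b")
with_comma    <- count_args(1, "a", 2, "b", )  # trailing comma adds an empty argument

without_comma
with_comma
```

Here the second call reports one more argument than the first, which is the same off-by-one that fcase() complained about.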

Omg you got it. That had me running around for like hours. That was the reason I switched the square brackets in data.table over to parentheses - just pulling random levers lol. THANKS!!

And then to get it on the bar plot would you do this:

DQ <-
DQ [ ,cyl  :=  fcase(cyl == 4, "Four Cylinder",
                     cyl == 6, "Six Cylinder",
                     cyl == 8, "Eight Cylinder")]|>
  ggplot(aes(as.factor(cyl))) +

It looks fine to me. As far as I can figure out one cannot chain a data.table directly into ggplot2 using the data.table method so a pipe seems like the best method.

The alternative is :

ggplot(DQ, aes(as.factor(cyl))) +
  geom_bar()