Help using pmap and mutate

Hello, everyone.
I'm trying to use pmap.
I can already write simple map or map_df calls.
I just want to use mutate to convert some variables to factors.
I have each variable's levels and labels stored.

This is how I do it step by step. As you will see, I write the same call three times and only change the vectors passed to factor() inside mutate:

library(tidyverse)
set.seed(753963)
N  <- 1000
enquete  <- data.frame( 
                  country = sample(1:4, size = N, replace = TRUE), 
                  sex = sample(1:3, size = N, replace = TRUE), 
                  code = sample(1:3, size = N, replace = TRUE)) 

enquete


country_lab<-c("Chad","Nigeria","Egypt","Senegal")
country_lev<-c(1:4)


sex_lab<-c("man","woman","no binary")
sex_lev<-c(1:3)

code_lab<-c("work","stud","ret")
code_lev<-c(1:3)


enquete %>% 
  mutate( 
    country=factor(country,country_lev,country_lab),
    sex=factor(sex,sex_lev,sex_lab),
    code=factor(code,code_lev,code_lab)
        )



So I thought of using pmap, because I have a list of 3 variables, a list of the 3 variables' levels, and a list of the 3 variables' labels.
I am printing the first variable and its levels/labels just to illustrate.

levels_<-list(country_lev,sex_lev,code_lev )
labels_<-list(country_lab,sex_lab,code_lab )
variables_<-c("country","sex","code")

variables_[1]
levels_[1]
labels_[1]

pmap(list(variables_,levels_,labels_), 
     .f=function(variables_, levels_, labels_)
     {enquete %>% mutate(variables_=factor(variables_,levels =levels_,labels = labels_)  )   }
     )

When I run the pmap lines, I notice that no change is applied.
I think I am not too far away from the solution.

As always, thanks for your time and interest.
Have a nice weekend.

I think this gets you what you want. The key is that a data frame is a list with some extra features. The map() functions will iterate over a data frame, acting on each column in turn.

library(tidyverse)
set.seed(753963)
N  <- 1000
enquete  <- data.frame( 
  country = sample(1:4, size = N, replace = TRUE), 
  sex = sample(1:3, size = N, replace = TRUE), 
  code = sample(1:3, size = N, replace = TRUE)) 

country_lab<-c("Chad","Nigeria","Egypt","Senegal")
country_lev<-c(1:4)


sex_lab<-c("man","woman","no binary")
sex_lev<-c(1:3)

code_lab<-c("work","stud","ret")
code_lev<-c(1:3)

levels_<-list(country_lev,sex_lev,code_lev )
labels_<-list(country_lab,sex_lab,code_lab )

NewDF <- pmap_dfc(list(enquete, levels_,labels_), 
     .f=function(Col, lev, lab) {factor(Col,levels =lev,labels = lab) }
)
str(NewDF)
#> tibble [1,000 × 3] (S3: tbl_df/tbl/data.frame)
#>  $ country: Factor w/ 4 levels "Chad","Nigeria",..: 1 3 3 4 4 1 3 4 4 3 ...
#>  $ sex    : Factor w/ 3 levels "man","woman",..: 3 3 3 1 1 1 3 1 1 1 ...
#>  $ code   : Factor w/ 3 levels "work","stud",..: 2 2 3 3 3 2 3 2 2 3 ...

Created on 2023-04-07 with reprex v2.0.2
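
To illustrate the point that a data frame is a list of columns, here is a minimal sketch. The data frame `toy` and its contents are made up for illustration and are not part of the survey data above.

```r
library(purrr)

# A small, made-up data frame (hypothetical, for illustration only)
toy <- data.frame(id = 1:3, name = c("a", "b", "c"))

# A data frame is a list whose elements are its columns
is.list(toy)
length(toy)   # number of columns, not number of rows

# map_chr() visits each column in turn and returns one value per column
col_classes <- map_chr(toy, class)
col_classes
```

This is why passing the data frame as the first element of the list given to pmap lines up column 1 with the first set of levels and labels, column 2 with the second, and so on.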

Thanks. The code is very fast and does what I needed on the example data.
I don't know why you didn't declare the variables to mutate. Again, I'm not a purrr expert, and I'm a total newbie with pmap.
I added the variable city:

enquete  <- data.frame( 
                  country = sample(1:4, size = N, replace = TRUE), 
                  sex = sample(1:3, size = N, replace = TRUE), 
                  city = sample(1:3, size = N, replace = TRUE), 
                  code = sample(1:3, size = N, replace = TRUE)) 

And I ran the code to test it. It failed.
I'll try to adapt it so that I can declare the variables to process.
One last question: your code has 3 objects, and they are the data frame, the labels, and the levels?
Thanks, FJCC. Nice code/work.
Have a nice week.

One reason I did not use mutate() is that it returns an entire tibble (data frame). As pmap iterates over the levels and labels, you don't want a tibble returned at each step; you want a single column.
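
A minimal sketch of what happens when mutate() is called inside pmap, assuming a small made-up data frame `toy` and the helper names `vars`, `levs`, and `labs` (all hypothetical). The column is named dynamically with `!!v :=` and read with `.data[[v]]`, since a plain `v = factor(v, ...)` would create a column literally called `v`.

```r
library(dplyr)
library(purrr)

# Made-up data and lookup lists (hypothetical, for illustration only)
toy  <- data.frame(country = c(1L, 2L, 4L), sex = c(3L, 1L, 2L))
vars <- c("country", "sex")
levs <- list(1:4, 1:3)
labs <- list(c("Chad", "Nigeria", "Egypt", "Senegal"),
             c("man", "woman", "no binary"))

# Each iteration returns the WHOLE data frame with just one column converted
res <- pmap(list(vars, levs, labs), function(v, lev, lab) {
  mutate(toy, !!v := factor(.data[[v]], levels = lev, labels = lab))
})

length(res)                   # one full data frame per variable
is.factor(res[[1]]$country)   # converted in the first result
is.factor(res[[1]]$sex)       # still untouched in the first result
```

The result is a list of separate data frames rather than one data frame with every column converted, which is why returning a single column per step is easier to combine.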

The code should be adaptable to four columns. If you get stuck, post your code and we will figure it out.

Yes, my code has three objects and they are the data frame, labels, and levels.

I added city. City is not a factor and I don't want to process it.
So I would like to declare, as a list, the columns I want to process inside pmap_dfc.

 enquete2  <- data.frame( 
   country = sample(1:4, size = N, replace = TRUE), 
   sex = sample(1:3, size = N, replace = TRUE), 
   city = sample(1:3, size = N, replace = TRUE), 
   code = sample(1:3, size = N, replace = TRUE)) 
 
levels_<-list(country_lev,sex_lev,code_lev )
labels_<-list(country_lab,sex_lab,code_lab )
variables_<-list('country','sex','code')
   
 pmap_dfc(list(enquete2, variables_,levels_,labels_), 
          .f=function(variables_, lev, lab) {factor(Col,levels =lev,labels = lab) }
 )
  

As you can see, variables_ is a list with the columns I want to "mutate".
This is the error message:

Error in `pmap()`:
! Can't recycle `.l[[1]]` (size 4) to match `.l[[2]]` (size 3).
Run `rlang::last_trace()` to see where the error occurred.

It's an issue with the dimensions: enquete2 has 4 columns, while the remaining lists inside pmap have 3 elements.

I think one way to solve it is to use select(-city) and store the resulting data frame as enquete2_alone, then run the earlier code on enquete2_alone and save the result as enquete2_avec.
After that, using bind_cols would fix this issue.

bind_cols(city, enquete2_avec) would be fine.
I'm not sure what is more efficient now.
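
One possible way to avoid the split-and-rebind step is to pass only the named columns to pmap and assign the result back into the same columns. This is a sketch under assumptions: the data frame `toy` and the names `vars`, `levs`, and `labs` are made up for illustration, and this is not necessarily the most efficient option.

```r
library(purrr)

set.seed(1)
# Made-up data with one column (city) that should be left alone
toy <- data.frame(
  country = sample(1:4, 10, replace = TRUE),
  sex     = sample(1:3, 10, replace = TRUE),
  city    = sample(1:3, 10, replace = TRUE),
  code    = sample(1:3, 10, replace = TRUE)
)

vars <- c("country", "sex", "code")   # columns to convert
levs <- list(1:4, 1:3, 1:3)
labs <- list(c("Chad", "Nigeria", "Egypt", "Senegal"),
             c("man", "woman", "no binary"),
             c("work", "stud", "ret"))

# Subsetting to the 3 chosen columns makes all three inputs the same length,
# which avoids the "Can't recycle" error; city is never passed to pmap
toy[vars] <- pmap(list(as.list(toy[vars]), levs, labs),
                  function(col, lev, lab) factor(col, levels = lev, labels = lab))

str(toy)
```

Because the result is assigned back with `toy[vars] <-`, the column order is preserved and city keeps its original integer type.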

I would skip one or more columns by putting an if() in the function inside of pmap and using a special value in the levels_ and labels_ lists to mark those columns.

enquete2  <- data.frame( 
  country = sample(1:4, size = N, replace = TRUE), 
  sex = sample(1:3, size = N, replace = TRUE), 
  city = sample(1:3, size = N, replace = TRUE), 
  code = sample(1:3, size = N, replace = TRUE)) 

levels_<-list(country_lev,sex_lev, "None", code_lev )
labels_<-list(country_lab,sex_lab,"None", code_lab )

OUT2 <- pmap_dfc(list(enquete2, levels_,labels_), 
         .f=function(Col, lev, lab) {
           if(lev[1] == "None") Col else factor(Col,levels =lev,labels = lab) }
)

Here's a way to achieve the same result with across() rather than map().
I'm relying on structuring the lookup information in a particular way, and making use of the names attribute as well.

library(tidyverse)
N <- 20
enquete2  <- data.frame( 
  country = sample(1:4, size = N, replace = TRUE), 
  sex = sample(1:3, size = N, replace = TRUE), 
  city = sample(1:3, size = N, replace = TRUE), 
  code = sample(1:3, size = N, replace = TRUE)) 

variables_ <- list(
  "country" = structure(1:4, names = c("Chad", "Nigeria", "Egypt", "Senegal")),
  "sex" = structure(1:3, names = c("man", "woman", "no binary")),
  "code" = structure(1:3, names = c("work", "stud", "ret"))
)

mutate(enquete2,
       across(.cols = names(variables_),
              .fns = \(x){
                factor(x,
                       levels = variables_[[cur_column()]],
                       labels = names(variables_[[cur_column()]]))}
              )) |> tibble()