Renaming function issue

Slavek · November 17, 2021, 11:49am

Hi,
I have this simple df:

source <- data.frame(
  stringsAsFactors = FALSE,
  URN = c("aaa","bbb","ccc",
          "ddd","eee","fff","ggg","hhh"),
  Name = c("xxx","xxx","yyy",
           "yyy","yyy","zzzz","abcde","zzzz"),
  A1 = c("None.",NA,
             "No comments related to this exercise","Na",
             "N/A","Interesting comment","abc", "whatever is fine"),
  A2 = c("Nothing",
             "I have nothing in common","NA",NA,
             "Another comment","....?","xxxx", "All fine"),
  B1 = c("Service","All good",
             "aa"," I don't know",
             "The final comment about that","Nothing.","na","Everything"),
  B2 = c("aaa","Nothing ",
             "None","My final comments are ok", "I don't know ",
             "Nothing.","Another comment","really"),
  Q4 = c(2019,2020,2020,2019,
         2020,2021,2021,2019)
)

where I applied this function:

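library(dplyr)

# Prefix "Comm" to every character column containing a string longer than 15 characters
renamed <- rename_with(source,
            .fn=~paste0("Comm",.),
            .cols=where(~is.character(.x) & 
                          any(nchar(.x) > 15)))
renamed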

It works well, so now I am trying to apply it to my real, large df (too large to post here) with the following variables:

'data.frame':	27641 obs. of  131 variables:
 $ BranchID                       : int  35049916 35049916 35049916 35049916 35049916 35049916 35049916 35049916 35049916 35049916 ...
 $ Country_Name                   : chr  "xxx" "xxx" "xxx" "xxx" ...
 $ RegistrationDate               : POSIXct, format: NA NA NA NA ...
 $ Event_Date                     : POSIXct, format: "2021-06-29" "2021-08-04" "2021-07-30" "2021-07-28" ...
 $ InterviewDate                  : Date, format: "2021-07-14" "2021-08-31" "2021-08-14" "2021-08-13" ...
 $ ModelCode                      : logi  NA NA NA NA NA NA ...
 $ Data_Type_ID                   : Factor w/ 4 levels "xxx","yyy",..: 2 2 2 2 2 2 2 2 2 2 ...
 $ VehicleTypeID                  : int  1 NA 1 1 2 1 2 1 1 1 ...
 $ URN                            : chr  "21GB017097" "21GB00110440" "21GB00109687" "21GB0326133" ...
 $ QA1                            : int  10 10 10 10 10 10 10 10 10 10 ...
 $ QA31                           : int  10 10 10 10 10 10 10 10 10 10 ...
 $ QN3a                           : chr  "Everything explained well, from start to finish was a good experience " NA NA "Helpful advised on model and wating times " ...
 $ QN5a                           : chr  NA NA NA "Time it took to hand over " ...
 $ QN5b                           : chr  NA NA NA NA ...
 $ QN3b                           : chr  NA NA NA NA ...
 $ QF70                           : int  10 10 10 10 8 10 10 10 10 10 ...
 $ Q11                            : Factor w/ 4 levels "Yes, it was offered spontaneously",..: 1 4 2 1 4 1 4 1 1 1 ...
 $ QF71                           : chr  NA NA NA NA ...

and I can see this error:

Error: 'nchar()' requires a character vector

What can I change in my code? Can you help?

nirgrahamuk · November 17, 2021, 12:26pm

I can reproduce your error by converting one of your variables to a factor. I can then avoid the error by changing the single ampersand to a double ampersand, which short-circuits the logic check: when is.character(.x) is FALSE, nchar(.x) is never evaluated.

rename_with(source %>% mutate(B3=factor(B2)),
                       .fn=~paste0("Comm",.),
                       .cols=where(~is.character(.x) & 
                                       any(nchar(.x) > 15)))

rename_with(source %>% mutate(B3=factor(B2)),
                       .fn=~paste0("Comm",.),
                       .cols=where(~is.character(.x) && 
                                       any(nchar(.x) > 15)))
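
The difference between the two operators can be seen outside of rename_with(). The following is a minimal base R sketch; the factor f is a made-up stand-in for any non-character column, and the names single and double are only for illustration.

```r
# A factor stands in for the non-character columns of the large data set.
f <- factor(c("a", "b"))

# `&` evaluates both sides, so nchar() is called on the factor and fails.
single <- tryCatch(
  is.character(f) & any(nchar(f) > 15),
  error = function(e) conditionMessage(e)
)
single

# `&&` stops as soon as the left side is FALSE, so nchar() is never called.
double <- is.character(f) && any(nchar(f) > 15)
double
```

Here single holds the captured error message, while double is simply FALSE.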

Slavek · November 17, 2021, 12:58pm

Thank you, but now I see:

Error: `where()` must be used with functions that return `TRUE` or `FALSE`.

I thought the function selects only character variables with long content:

is.character(.x) & any(nchar(.x) > 15)

I don't know why it does not work on my large data with the variables listed above...

nirgrahamuk · November 17, 2021, 2:23pm

Either:
a) consider again how you might provide a representative example dataset (perhaps with dplyr::slice), or
b) run your criteria against all columns and see whether any of them give a result that is neither TRUE nor FALSE:

mutate(source,across(.cols=everything(),.fns = ~is.character(.x) && 
                         any(nchar(.x) > 15))) %>% distinct()
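
The same check can also be run once per column in base R, which makes a column that yields NA easy to spot. This is only a sketch: the data frame toy and its columns are invented for illustration, and the threshold of 15 is taken from the code above. The idea is that nchar() returns NA for a missing string, so any() returns NA when a column has missing values and no string over the threshold.

```r
# Hypothetical toy data: `short` has NAs and no long strings,
# `long` has NAs plus one long string.
toy <- data.frame(
  id    = 1:3,
  short = c(NA, "abc", NA),
  long  = c(NA, "a comment longer than fifteen characters", NA),
  stringsAsFactors = FALSE
)

# Evaluate the predicate once per column; each result must be a single logical.
check <- vapply(
  toy,
  function(col) is.character(col) && any(nchar(col) > 15),
  logical(1)
)
check

# Columns where the predicate is NA rather than TRUE or FALSE.
names(check)[is.na(check)]
```

In this toy case only the column short comes back as NA, which is the kind of result where() rejects.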

Slavek · November 17, 2021, 4:19pm

I think there is an easier solution but I cannot remember how it can be done.
I know we can easily create a new df where we select only string variables from the original one.
Then I could apply my code. Do you know how we could do that?

Slavek · November 18, 2021, 9:50pm

Ok, I selected string variables only using my original source file and:

library(dplyr)
source <- select_if(original.file, is.character)

and the first records are as follows:

source <- data.frame(
           stringsAsFactors = FALSE,
               Country_Name = c("United Kingdom","United Kingdom","United Kingdom",
                                "United Kingdom"),
                    OrgCode = c("00007P", "GBC500", "GBC210", "01519Q"),
                        VIN = c("VF12S8763",
                                "W0VM1000","WVZ604259","VF3CW0497"),
                      Model = c("dfdfsdf", "sdv", "sdv", "sdsasd"),
                 FamilyName = c("abd", "fsf", "sfd", "sd"),
             CommercialName = c(NA, "E2JO", NA, NA),
                 EngineType = c("Other", "Other", "Other", "Other"),
                  ModelName = c("aaa", "bbb", "ccc", "ddd"),
                        URN = c("21G9234901025","21G9100813605","21GB021349733",
                                "21GB012802731"),
                       QN3a = c(NA,NA,
                                "Your sales representative, was most helpful",
                                "No pressure from Salesmen.  Safe environment."),
                       QN5a = c(NA, NA, "Nothing to dislike.", "None"),
                       QN5b = c(NA,
                                "When I went to pick up my new car it had a dent on the rear panel",NA,NA),
                       QN3b = c(NA,
                                "Not much really what should have been a good experience for me",NA,NA),
                        QF3 = c(NA,
                                "They were friendly enough but the problems overshadowed everything",NA,NA),
                        QF5 = c(NA,
                                "Again the damage in the car overshadowed anything that was said by the staff",NA,NA),
                     QF11_1 = c(NA, "Damaged from my first view", NA, NA),
                       QF13 = c(NA,
                                "They missed the damage in both occasions",NA,NA),
                        QG2 = c(NA, "Only a brief overview was given.", NA, NA),
                    OrgName = c("aaa", "bbb", "ccc", "ddd"),
   ParentBranch_Description = c("104", "5", "4", "102"),
  Parent_Branch_Description = c("SALES REGION : NORTH","2","1","SALES REGION 1: NORT")
)


renamed <- rename_with(source,
                       .fn=~paste0("Comm",.),
                       .cols=where(~is.character(.x) &&
                                     any(nchar(.x) > 10)))

Unfortunately, this error appears after renaming:


Error: `where()` must be used with functions that return `TRUE` or `FALSE`.
Run `rlang::last_error()` to see where the error occurred.

What am I doing wrong?
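
One likely cause, given the data above: a column such as CommercialName contains NA values and no string longer than the threshold, so any(nchar(.x) > 10) evaluates to NA instead of FALSE, and where() rejects it. A possible fix, sketched under that assumption, is to pass na.rm = TRUE to any(). The data frame demo below is a hypothetical cut-down version of the data, not the real file.

```r
library(dplyr)

# Hypothetical cut-down version of the data posted above.
demo <- data.frame(
  CommercialName = c(NA, "E2JO", NA, NA),
  QN5a = c(NA, NA, "Nothing to dislike.", "None"),
  OrgName = c("aaa", "bbb", "ccc", "ddd"),
  stringsAsFactors = FALSE
)

# na.rm = TRUE makes any() ignore the NA lengths that nchar() returns for
# missing strings, so the predicate is always TRUE or FALSE.
renamed <- rename_with(
  demo,
  .fn = ~ paste0("Comm", .x),
  .cols = where(~ is.character(.x) && any(nchar(.x) > 10, na.rm = TRUE))
)
names(renamed)
```

With this change only QN5a, which has a string longer than 10 characters, gets the "Comm" prefix; columns with only missing or short values are left as they are.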
