Recoding multiple integer variables in one go

Hi, I have my data frame with multiple int variables (B1, C1, D1 etc) and they contain numbers from 1 to 10 (and NAs). Is any way of multiplying all of the responses by 10 in one go? 1s should become 10s, 2s-20s...10s-100s.

I also need to recode all of these int variables into new three level variables (B13L, C13L, D13L) where 1-6 (or 10 to 60 after recoding) =1, 7-8 (or 70-80)=2 and 9=10 (90-100)=3.

How can I do it in one go?

Can you help?

Hi @Slavek,

The mutate_if() and mutate_at() functions—part of dplyr—can help you here! They both allow you to modify a number of columns in the same way, but they allow you to choose the target columns differently: mutate_if() lets you target columns that pass a test (called a predicate function), while mutate_at() lets you specify the columns (based on the their name) using a collection of helper functions. For example:

# mutate all the numeric functions
df %>% mutate_if(is.numeric, ~ .x * 10)

# mutate all the columns whose names end in '3L'
df %>% mutate_at(vars(ends_with('3L')), ~ .x * 10)

Do those examples work for you?

1 Like

Thank you very much. That is an excellent solution. I have tried that for B1, C1 etc. using

df %>% mutate_at(vars(ends_with('1')), ~ .x * 10)

and it worked (king of) but results were not saved. I found it because I realised that one variable ending with 2 did not change:

A1 A2 B1 C1
90 9 90 90

I used this code:

df %>% mutate_at(vars(ends_with('2')), ~ .x * 10)

and I can see that only this variable changed this time but others are unchanged:

A1 A2 B1 C1
9 90 9 9

How can I fix it? I need all of them to be recoded and saved in the data frame.

I also need to recode all of these int variables into new three level variables (B13L, C13L, D13L) where 1-6 (or 10 to 60 after recoding) =1, 7-8 (or 70-80)=2 and 9=10 (90-100)=3.

How can I do it in one go?

library(readxl)
#> Warning: package 'readxl' was built under R version 3.4.4
df <- read_excel("C:/Users/sdanilowicz/Documents/Test File for recodes.xlsx", sheet = "Sales")
df
#> # A tibble: 14 x 11
#>    URN   InterviewDate          A2    B1    C1    D1    E1    F1    G1
#>    <chr> <dttm>              <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1 9ES0~ 2017-10-15 08:42:00     5    10    10    10    10    10    10
#>  2 9ES0~ 2017-10-17 16:12:00    10    10    10    10    10    10    10
#>  3 9ES0~ 2017-11-02 12:46:00    10    10     1    10     8    10     7
#>  4 9ES0~ 2018-11-09 09:53:00     2    10     4    10    10    10     6
#>  5 9ES0~ 2018-11-19 08:19:00     4    10    10    10    10    10    10
#>  6 9ES0~ 2018-11-20 14:58:00     4     5    10    10     7    10    10
#>  7 9ES0~ 2018-12-07 19:20:00     1     7    10    10    10    10    10
#>  8 9ES0~ 2018-12-10 07:35:00     1    10    10    10    10    10    10
#>  9 9ES0~ 2018-12-14 13:52:00     3    10    10     0    10     9    10
#> 10 9ES0~ 2018-12-17 14:22:00    10    10     9     9    10     9     0
#> 11 9ES0~ 2018-12-23 07:20:00     1    10    10     0     6    10    10
#> 12 9ES0~ 2018-12-19 06:19:00     1    10    10    10    10     4    10
#> 13 9ES0~ 2018-12-27 18:13:00     2    10    10    10    10    10    10
#> 14 9ES0~ 2018-12-31 09:51:00     5    10    10    10    10    10    10
#> # ... with 2 more variables: H1 <dbl>, I1 <dbl>

library(dplyr)
#> Warning: package 'dplyr' was built under R version 3.4.4
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

df %>% mutate_at(vars(ends_with('1')), ~ .x * 10)
#> Warning: package 'bindrcpp' was built under R version 3.4.4
#> # A tibble: 14 x 11
#>    URN   InterviewDate          A2    B1    C1    D1    E1    F1    G1
#>    <chr> <dttm>              <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1 9ES0~ 2017-10-15 08:42:00     5   100   100   100   100   100   100
#>  2 9ES0~ 2017-10-17 16:12:00    10   100   100   100   100   100   100
#>  3 9ES0~ 2017-11-02 12:46:00    10   100    10   100    80   100    70
#>  4 9ES0~ 2018-11-09 09:53:00     2   100    40   100   100   100    60
#>  5 9ES0~ 2018-11-19 08:19:00     4   100   100   100   100   100   100
#>  6 9ES0~ 2018-11-20 14:58:00     4    50   100   100    70   100   100
#>  7 9ES0~ 2018-12-07 19:20:00     1    70   100   100   100   100   100
#>  8 9ES0~ 2018-12-10 07:35:00     1   100   100   100   100   100   100
#>  9 9ES0~ 2018-12-14 13:52:00     3   100   100     0   100    90   100
#> 10 9ES0~ 2018-12-17 14:22:00    10   100    90    90   100    90     0
#> 11 9ES0~ 2018-12-23 07:20:00     1   100   100     0    60   100   100
#> 12 9ES0~ 2018-12-19 06:19:00     1   100   100   100   100    40   100
#> 13 9ES0~ 2018-12-27 18:13:00     2   100   100   100   100   100   100
#> 14 9ES0~ 2018-12-31 09:51:00     5   100   100   100   100   100   100
#> # ... with 2 more variables: H1 <dbl>, I1 <dbl>
df
#> # A tibble: 14 x 11
#>    URN   InterviewDate          A2    B1    C1    D1    E1    F1    G1
#>    <chr> <dttm>              <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1 9ES0~ 2017-10-15 08:42:00     5    10    10    10    10    10    10
#>  2 9ES0~ 2017-10-17 16:12:00    10    10    10    10    10    10    10
#>  3 9ES0~ 2017-11-02 12:46:00    10    10     1    10     8    10     7
#>  4 9ES0~ 2018-11-09 09:53:00     2    10     4    10    10    10     6
#>  5 9ES0~ 2018-11-19 08:19:00     4    10    10    10    10    10    10
#>  6 9ES0~ 2018-11-20 14:58:00     4     5    10    10     7    10    10
#>  7 9ES0~ 2018-12-07 19:20:00     1     7    10    10    10    10    10
#>  8 9ES0~ 2018-12-10 07:35:00     1    10    10    10    10    10    10
#>  9 9ES0~ 2018-12-14 13:52:00     3    10    10     0    10     9    10
#> 10 9ES0~ 2018-12-17 14:22:00    10    10     9     9    10     9     0
#> 11 9ES0~ 2018-12-23 07:20:00     1    10    10     0     6    10    10
#> 12 9ES0~ 2018-12-19 06:19:00     1    10    10    10    10     4    10
#> 13 9ES0~ 2018-12-27 18:13:00     2    10    10    10    10    10    10
#> 14 9ES0~ 2018-12-31 09:51:00     5    10    10    10    10    10    10
#> # ... with 2 more variables: H1 <dbl>, I1 <dbl>

df %>% mutate_at(vars(ends_with('2')), ~ .x * 10)
#> # A tibble: 14 x 11
#>    URN   InterviewDate          A2    B1    C1    D1    E1    F1    G1
#>    <chr> <dttm>              <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1 9ES0~ 2017-10-15 08:42:00    50    10    10    10    10    10    10
#>  2 9ES0~ 2017-10-17 16:12:00   100    10    10    10    10    10    10
#>  3 9ES0~ 2017-11-02 12:46:00   100    10     1    10     8    10     7
#>  4 9ES0~ 2018-11-09 09:53:00    20    10     4    10    10    10     6
#>  5 9ES0~ 2018-11-19 08:19:00    40    10    10    10    10    10    10
#>  6 9ES0~ 2018-11-20 14:58:00    40     5    10    10     7    10    10
#>  7 9ES0~ 2018-12-07 19:20:00    10     7    10    10    10    10    10
#>  8 9ES0~ 2018-12-10 07:35:00    10    10    10    10    10    10    10
#>  9 9ES0~ 2018-12-14 13:52:00    30    10    10     0    10     9    10
#> 10 9ES0~ 2018-12-17 14:22:00   100    10     9     9    10     9     0
#> 11 9ES0~ 2018-12-23 07:20:00    10    10    10     0     6    10    10
#> 12 9ES0~ 2018-12-19 06:19:00    10    10    10    10    10     4    10
#> 13 9ES0~ 2018-12-27 18:13:00    20    10    10    10    10    10    10
#> 14 9ES0~ 2018-12-31 09:51:00    50    10    10    10    10    10    10
#> # ... with 2 more variables: H1 <dbl>, I1 <dbl>
df
#> # A tibble: 14 x 11
#>    URN   InterviewDate          A2    B1    C1    D1    E1    F1    G1
#>    <chr> <dttm>              <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1 9ES0~ 2017-10-15 08:42:00     5    10    10    10    10    10    10
#>  2 9ES0~ 2017-10-17 16:12:00    10    10    10    10    10    10    10
#>  3 9ES0~ 2017-11-02 12:46:00    10    10     1    10     8    10     7
#>  4 9ES0~ 2018-11-09 09:53:00     2    10     4    10    10    10     6
#>  5 9ES0~ 2018-11-19 08:19:00     4    10    10    10    10    10    10
#>  6 9ES0~ 2018-11-20 14:58:00     4     5    10    10     7    10    10
#>  7 9ES0~ 2018-12-07 19:20:00     1     7    10    10    10    10    10
#>  8 9ES0~ 2018-12-10 07:35:00     1    10    10    10    10    10    10
#>  9 9ES0~ 2018-12-14 13:52:00     3    10    10     0    10     9    10
#> 10 9ES0~ 2018-12-17 14:22:00    10    10     9     9    10     9     0
#> 11 9ES0~ 2018-12-23 07:20:00     1    10    10     0     6    10    10
#> 12 9ES0~ 2018-12-19 06:19:00     1    10    10    10    10     4    10
#> 13 9ES0~ 2018-12-27 18:13:00     2    10    10    10    10    10    10
#> 14 9ES0~ 2018-12-31 09:51:00     5    10    10    10    10    10    10
#> # ... with 2 more variables: H1 <dbl>, I1 <dbl>

head(df)
#> # A tibble: 6 x 11
#>   URN   InterviewDate          A2    B1    C1    D1    E1    F1    G1    H1
#>   <chr> <dttm>              <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 9ES0~ 2017-10-15 08:42:00     5    10    10    10    10    10    10    10
#> 2 9ES0~ 2017-10-17 16:12:00    10    10    10    10    10    10    10    10
#> 3 9ES0~ 2017-11-02 12:46:00    10    10     1    10     8    10     7     3
#> 4 9ES0~ 2018-11-09 09:53:00     2    10     4    10    10    10     6    10
#> 5 9ES0~ 2018-11-19 08:19:00     4    10    10    10    10    10    10    10
#> 6 9ES0~ 2018-11-20 14:58:00     4     5    10    10     7    10    10    10
#> # ... with 1 more variable: I1 <dbl>

reprex()
#> Error in reprex(): could not find function "reprex"

Created on 2019-01-25 by the reprex package (v0.2.1)

1 Like

I know I can recode variables one by one using car library

library(readxl)
#> Warning: package 'readxl' was built under R version 3.4.4
df <- read_excel("C:/Users/sdanilowicz/Documents/Test File for recodes.xlsx", sheet = "Sales")
df
#> # A tibble: 14 x 11
#>    URN   InterviewDate          A2    B1    C1    D1    E1    F1    G1
#>    <chr> <dttm>              <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1 9ES0~ 2017-10-15 08:42:00     5    10    10    10    10    10    10
#>  2 9ES0~ 2017-10-17 16:12:00    10    10    10    10    10    10    10
#>  3 9ES0~ 2017-11-02 12:46:00    10    10     1    10     8    10     7
#>  4 9ES0~ 2018-11-09 09:53:00     2    10     4    10    10    10     6
#>  5 9ES0~ 2018-11-19 08:19:00     4    10    10    10    10    10    10
#>  6 9ES0~ 2018-11-20 14:58:00     4     5    10    10     7    10    10
#>  7 9ES0~ 2018-12-07 19:20:00     1     7    10    10    10    10    10
#>  8 9ES0~ 2018-12-10 07:35:00     1    10    10    10    10    10    10
#>  9 9ES0~ 2018-12-14 13:52:00     3    10    10     0    10     9    10
#> 10 9ES0~ 2018-12-17 14:22:00    10    10     9     9    10     9     0
#> 11 9ES0~ 2018-12-23 07:20:00     1    10    10     0     6    10    10
#> 12 9ES0~ 2018-12-19 06:19:00     1    10    10    10    10     4    10
#> 13 9ES0~ 2018-12-27 18:13:00     2    10    10    10    10    10    10
#> 14 9ES0~ 2018-12-31 09:51:00     5    10    10    10    10    10    10
#> # ... with 2 more variables: H1 <dbl>, I1 <dbl>


library(car)
#> Warning: package 'car' was built under R version 3.4.4
#> Loading required package: carData
#> Warning: package 'carData' was built under R version 3.4.4
df$A2<-recode(df$A2,"1=10;2=20;3=30;4=40;5=50;6=60;7=70;8=80;9=90;10=100")
df$B1<-recode(df$B1,"1=10;2=20;3=30;4=40;5=50;6=60;7=70;8=80;9=90;10=100")
df
#> # A tibble: 14 x 11
#>    URN   InterviewDate          A2    B1    C1    D1    E1    F1    G1
#>    <chr> <dttm>              <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1 9ES0~ 2017-10-15 08:42:00    50   100    10    10    10    10    10
#>  2 9ES0~ 2017-10-17 16:12:00   100   100    10    10    10    10    10
#>  3 9ES0~ 2017-11-02 12:46:00   100   100     1    10     8    10     7
#>  4 9ES0~ 2018-11-09 09:53:00    20   100     4    10    10    10     6
#>  5 9ES0~ 2018-11-19 08:19:00    40   100    10    10    10    10    10
#>  6 9ES0~ 2018-11-20 14:58:00    40    50    10    10     7    10    10
#>  7 9ES0~ 2018-12-07 19:20:00    10    70    10    10    10    10    10
#>  8 9ES0~ 2018-12-10 07:35:00    10   100    10    10    10    10    10
#>  9 9ES0~ 2018-12-14 13:52:00    30   100    10     0    10     9    10
#> 10 9ES0~ 2018-12-17 14:22:00   100   100     9     9    10     9     0
#> 11 9ES0~ 2018-12-23 07:20:00    10   100    10     0     6    10    10
#> 12 9ES0~ 2018-12-19 06:19:00    10   100    10    10    10     4    10
#> 13 9ES0~ 2018-12-27 18:13:00    20   100    10    10    10    10    10
#> 14 9ES0~ 2018-12-31 09:51:00    50   100    10    10    10    10    10
#> # ... with 2 more variables: H1 <dbl>, I1 <dbl>

head(df)
#> # A tibble: 6 x 11
#>   URN   InterviewDate          A2    B1    C1    D1    E1    F1    G1    H1
#>   <chr> <dttm>              <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 9ES0~ 2017-10-15 08:42:00    50   100    10    10    10    10    10    10
#> 2 9ES0~ 2017-10-17 16:12:00   100   100    10    10    10    10    10    10
#> 3 9ES0~ 2017-11-02 12:46:00   100   100     1    10     8    10     7     3
#> 4 9ES0~ 2018-11-09 09:53:00    20   100     4    10    10    10     6    10
#> 5 9ES0~ 2018-11-19 08:19:00    40   100    10    10    10    10    10    10
#> 6 9ES0~ 2018-11-20 14:58:00    40    50    10    10     7    10    10    10
#> # ... with 1 more variable: I1 <dbl>

reprex()
#> Error in reprex(): could not find function "reprex"

Created on 2019-01-25 by the reprex package (v0.2.1)

but hat is not the point.
I would like to apply this recoding to all int variables or specified variables.

A sidenote here. I see you trying to create reprexes all around the site, but none of them are actually reprexes.
The main issue seems to be that you don't even load the package. As a result, take a look at your last line of pasted data, it says: Error in reprex(): could not find function "reprex".

What you're providing here (and elsewhere) is the copy of your console output. We can look at it to make judgments, but it doesn't help us answer your question any better, as we can't take your code, add a couple of lines and give it back to you thus solving your problem: we only can talk about your problem in abstract terms.

2 Likes

just to build on the last comment, the main limitation to helping you is that you are not providing sample data in a suitable format, you have already used datapasta on a previous topic, try to use it again.

3 Likes

Re reprex

I 2nd (or 3rd) the reprex request. It'll make the life much easier for people who want to help you.


Re:

the line ...mutate_at(vars(ends_with('2')), ~ .x * 10)... will only apply a function to any variable in your table that ends with '2'. Since A1, B1, C1 that seems to be working as expected.

Similar logic for vars(ends_with('2')

Note also that you can pipe together multiple mutate_iffunction calls together to have a single code block that sort your various issues.

for example, something like

library(dplyr)
df <- tribble(
  ~colA, ~colB, ~A2,
  "a",   1,     3,
  "b",   2,     5,
  "c",   3,     6
)
df <- df %>% 
  mutate_at(vars(ends_with('B')), ~ .x * 10) %>% 
  mutate_at(vars(ends_with('2')), ~ .x * 10)

  df
#> # A tibble: 3 x 3
#>   colA   colB    A2
#>   <chr> <dbl> <dbl>
#> 1 a        10    30
#> 2 b        20    50
#> 3 c        30    60

Created on 2019-01-25 by the reprex package (v0.2.1)

2 Likes

That is absolutely brilliant!!! Thank you!
I managed to use this code to recode my variables

library(dplyr)

df %>% mutate_if(is.numeric, ~ .x * 10)
df <- df %>% 
 mutate_at(vars(starts_with('A')), ~ .x * 10) %>% 
  mutate_at(vars(starts_with('B')), ~ .x * 10) %>% 
  mutate_at(vars(starts_with('C')), ~ .x * 10) %>% 
  mutate_at(vars(starts_with('D')), ~ .x * 10) %>% 
  mutate_at(vars(starts_with('E')), ~ .x * 10) %>% 
  mutate_at(vars(starts_with('F')), ~ .x * 10) %>% 
  mutate_at(vars(starts_with('G')), ~ .x * 10) %>% 
  mutate_at(vars(ends_with('2')), ~ .x * 10) %>% 
  mutate_at(vars(ends_with('3')), ~ .x * 10) %>% 
  mutate_at(vars(ends_with('4')), ~ .x * 10) %>% 
  mutate_at(vars(ends_with('5')), ~ .x * 10) %>% 
  mutate_at(vars(ends_with('6')), ~ .x * 10) %>% 
  mutate_at(vars(ends_with('7')), ~ .x * 10)

Now, I would like to recode all values of -10 to be NAs. I know I could do it one buy one using a code like this:

df$A1[df$A1==-10] <- NA
df$A2[df$A2==-10] <- NA
df$B1[df$B1==-10] <- NA
...
df$D7[df$D7==-10] <- NA

but I hope that I could use mutate-if function again somehow because recoding applies to all the variables above.

Finally, I would like to do a clever recoding into new variables.
Is it possible to recode all numeric variables above into new variables with following logic: 10-60=0, 70-80=50 and 90-100=100? I need new set of recoded variables added to my df with _3 indicator like B2_3, C2_3, D3_3 as they represent 3 levels of responses (1=0, 2=50 and 3=100).

How can I do that?

Ooops. I can see that mutate function does not change the data permanently. Would transform function better? I don't know how to convert mutate into transform but I need to have these changes saved in the df...

This begs for a function to be written on top of it, BTW.

1 Like

What do you mean? Can you elaborate please?

Some regular expressions could help too, but it's hard to give more specific answers with out sample data.

df <- df %>% 
    mutate_at(vars(matches("[[:alpha:]]+[[:digit:]]+")), ~ .x * 10)
1 Like

Well, my understanding of mutate is that this function changes values only temporarly because when I use summary() or head() or anything like that after using this function, I still can see initial, unchanged numbers.
I need to recode values of these variables permanently so when I run summary() before the recoding I can see Min:1, Max: 10. After the operation I should see Min:10, Max:100 for all recoded variables...

I would like to avoid using this:

library(car)
df$A2<-recode(df$A2,"1=10;2=20;3=30;4=40;5=50;6=60;7=70;8=80;9=90;10=100")

prepared for each variable manually...

In general all dplyr functions don't modify dataframes in place, they create a new modified dataframe instead, if you want your changes to persist, then you need to explicitly assign them to the original dataframe (or a new one) with the assign operator <-

library(dplyr)

# This is made up data, you have to use your real data instead
df <- tribble(
    ~colA, ~A1, ~B2,
    "a",   1,     3,
    "b",   2,     5,
    "c",   3,     6
)
df <- df %>% 
    mutate_at(vars(matches("[[:alpha:]]+[[:digit:]]+")), ~ .x * 10)
df
#> # A tibble: 3 x 3
#>   colA     A1    B2
#>   <chr> <dbl> <dbl>
#> 1 a        10    30
#> 2 b        20    50
#> 3 c        30    60
3 Likes

Thank you very much for your help! :+1:

1 Like

Sorry I wasn't able to loop back to help you, @Slavek! Thesis submission looms :sob: I'm glad @andresrcs and @taras were able to get you over the line!

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.