Recoding multiple integer variables in one go

Slavek · January 25, 2019, 11:46am

Hi, I have my data frame with multiple int variables (B1, C1, D1 etc) and they contain numbers from 1 to 10 (and NAs). Is any way of multiplying all of the responses by 10 in one go? 1s should become 10s, 2s-20s...10s-100s.

I also need to recode all of these int variables into new three level variables (B13L, C13L, D13L) where 1-6 (or 10 to 60 after recoding) =1, 7-8 (or 70-80)=2 and 9=10 (90-100)=3.

How can I do it in one go?

Can you help?

rensa · January 25, 2019, 12:29pm

Hi @Slavek,

The mutate_if() and mutate_at() functions—part of dplyr—can help you here! They both allow you to modify a number of columns in the same way, but they allow you to choose the target columns differently: mutate_if() lets you target columns that pass a test (called a predicate function), while mutate_at() lets you specify the columns (based on the their name) using a collection of helper functions. For example:

# mutate all the numeric functions
df %>% mutate_if(is.numeric, ~ .x * 10)

# mutate all the columns whose names end in '3L'
df %>% mutate_at(vars(ends_with('3L')), ~ .x * 10)

Do those examples work for you?

Slavek · January 25, 2019, 12:47pm

Thank you very much. That is an excellent solution. I have tried that for B1, C1 etc. using

df %>% mutate_at(vars(ends_with('1')), ~ .x * 10)

and it worked (king of) but results were not saved. I found it because I realised that one variable ending with 2 did not change:

A1 A2 B1 C1
90 9 90 90

I used this code:

df %>% mutate_at(vars(ends_with('2')), ~ .x * 10)

and I can see that only this variable changed this time but others are unchanged:

A1 A2 B1 C1
9 90 9 9

How can I fix it? I need all of them to be recoded and saved in the data frame.

I also need to recode all of these int variables into new three level variables (B13L, C13L, D13L) where 1-6 (or 10 to 60 after recoding) =1, 7-8 (or 70-80)=2 and 9=10 (90-100)=3.

How can I do it in one go?

Slavek · January 25, 2019, 5:21pm

library(readxl)
#> Warning: package 'readxl' was built under R version 3.4.4
df <- read_excel("C:/Users/sdanilowicz/Documents/Test File for recodes.xlsx", sheet = "Sales")
df
#> # A tibble: 14 x 11
#>    URN   InterviewDate          A2    B1    C1    D1    E1    F1    G1
#>    <chr> <dttm>              <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1 9ES0~ 2017-10-15 08:42:00     5    10    10    10    10    10    10
#>  2 9ES0~ 2017-10-17 16:12:00    10    10    10    10    10    10    10
#>  3 9ES0~ 2017-11-02 12:46:00    10    10     1    10     8    10     7
#>  4 9ES0~ 2018-11-09 09:53:00     2    10     4    10    10    10     6
#>  5 9ES0~ 2018-11-19 08:19:00     4    10    10    10    10    10    10
#>  6 9ES0~ 2018-11-20 14:58:00     4     5    10    10     7    10    10
#>  7 9ES0~ 2018-12-07 19:20:00     1     7    10    10    10    10    10
#>  8 9ES0~ 2018-12-10 07:35:00     1    10    10    10    10    10    10
#>  9 9ES0~ 2018-12-14 13:52:00     3    10    10     0    10     9    10
#> 10 9ES0~ 2018-12-17 14:22:00    10    10     9     9    10     9     0
#> 11 9ES0~ 2018-12-23 07:20:00     1    10    10     0     6    10    10
#> 12 9ES0~ 2018-12-19 06:19:00     1    10    10    10    10     4    10
#> 13 9ES0~ 2018-12-27 18:13:00     2    10    10    10    10    10    10
#> 14 9ES0~ 2018-12-31 09:51:00     5    10    10    10    10    10    10
#> # ... with 2 more variables: H1 <dbl>, I1 <dbl>

library(dplyr)
#> Warning: package 'dplyr' was built under R version 3.4.4
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

df %>% mutate_at(vars(ends_with('1')), ~ .x * 10)
#> Warning: package 'bindrcpp' was built under R version 3.4.4
#> # A tibble: 14 x 11
#>    URN   InterviewDate          A2    B1    C1    D1    E1    F1    G1
#>    <chr> <dttm>              <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1 9ES0~ 2017-10-15 08:42:00     5   100   100   100   100   100   100
#>  2 9ES0~ 2017-10-17 16:12:00    10   100   100   100   100   100   100
#>  3 9ES0~ 2017-11-02 12:46:00    10   100    10   100    80   100    70
#>  4 9ES0~ 2018-11-09 09:53:00     2   100    40   100   100   100    60
#>  5 9ES0~ 2018-11-19 08:19:00     4   100   100   100   100   100   100
#>  6 9ES0~ 2018-11-20 14:58:00     4    50   100   100    70   100   100
#>  7 9ES0~ 2018-12-07 19:20:00     1    70   100   100   100   100   100
#>  8 9ES0~ 2018-12-10 07:35:00     1   100   100   100   100   100   100
#>  9 9ES0~ 2018-12-14 13:52:00     3   100   100     0   100    90   100
#> 10 9ES0~ 2018-12-17 14:22:00    10   100    90    90   100    90     0
#> 11 9ES0~ 2018-12-23 07:20:00     1   100   100     0    60   100   100
#> 12 9ES0~ 2018-12-19 06:19:00     1   100   100   100   100    40   100
#> 13 9ES0~ 2018-12-27 18:13:00     2   100   100   100   100   100   100
#> 14 9ES0~ 2018-12-31 09:51:00     5   100   100   100   100   100   100
#> # ... with 2 more variables: H1 <dbl>, I1 <dbl>
df
#> # A tibble: 14 x 11
#>    URN   InterviewDate          A2    B1    C1    D1    E1    F1    G1
#>    <chr> <dttm>              <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1 9ES0~ 2017-10-15 08:42:00     5    10    10    10    10    10    10
#>  2 9ES0~ 2017-10-17 16:12:00    10    10    10    10    10    10    10
#>  3 9ES0~ 2017-11-02 12:46:00    10    10     1    10     8    10     7
#>  4 9ES0~ 2018-11-09 09:53:00     2    10     4    10    10    10     6
#>  5 9ES0~ 2018-11-19 08:19:00     4    10    10    10    10    10    10
#>  6 9ES0~ 2018-11-20 14:58:00     4     5    10    10     7    10    10
#>  7 9ES0~ 2018-12-07 19:20:00     1     7    10    10    10    10    10
#>  8 9ES0~ 2018-12-10 07:35:00     1    10    10    10    10    10    10
#>  9 9ES0~ 2018-12-14 13:52:00     3    10    10     0    10     9    10
#> 10 9ES0~ 2018-12-17 14:22:00    10    10     9     9    10     9     0
#> 11 9ES0~ 2018-12-23 07:20:00     1    10    10     0     6    10    10
#> 12 9ES0~ 2018-12-19 06:19:00     1    10    10    10    10     4    10
#> 13 9ES0~ 2018-12-27 18:13:00     2    10    10    10    10    10    10
#> 14 9ES0~ 2018-12-31 09:51:00     5    10    10    10    10    10    10
#> # ... with 2 more variables: H1 <dbl>, I1 <dbl>

df %>% mutate_at(vars(ends_with('2')), ~ .x * 10)
#> # A tibble: 14 x 11
#>    URN   InterviewDate          A2    B1    C1    D1    E1    F1    G1
#>    <chr> <dttm>              <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1 9ES0~ 2017-10-15 08:42:00    50    10    10    10    10    10    10
#>  2 9ES0~ 2017-10-17 16:12:00   100    10    10    10    10    10    10
#>  3 9ES0~ 2017-11-02 12:46:00   100    10     1    10     8    10     7
#>  4 9ES0~ 2018-11-09 09:53:00    20    10     4    10    10    10     6
#>  5 9ES0~ 2018-11-19 08:19:00    40    10    10    10    10    10    10
#>  6 9ES0~ 2018-11-20 14:58:00    40     5    10    10     7    10    10
#>  7 9ES0~ 2018-12-07 19:20:00    10     7    10    10    10    10    10
#>  8 9ES0~ 2018-12-10 07:35:00    10    10    10    10    10    10    10
#>  9 9ES0~ 2018-12-14 13:52:00    30    10    10     0    10     9    10
#> 10 9ES0~ 2018-12-17 14:22:00   100    10     9     9    10     9     0
#> 11 9ES0~ 2018-12-23 07:20:00    10    10    10     0     6    10    10
#> 12 9ES0~ 2018-12-19 06:19:00    10    10    10    10    10     4    10
#> 13 9ES0~ 2018-12-27 18:13:00    20    10    10    10    10    10    10
#> 14 9ES0~ 2018-12-31 09:51:00    50    10    10    10    10    10    10
#> # ... with 2 more variables: H1 <dbl>, I1 <dbl>
df
#> # A tibble: 14 x 11
#>    URN   InterviewDate          A2    B1    C1    D1    E1    F1    G1
#>    <chr> <dttm>              <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1 9ES0~ 2017-10-15 08:42:00     5    10    10    10    10    10    10
#>  2 9ES0~ 2017-10-17 16:12:00    10    10    10    10    10    10    10
#>  3 9ES0~ 2017-11-02 12:46:00    10    10     1    10     8    10     7
#>  4 9ES0~ 2018-11-09 09:53:00     2    10     4    10    10    10     6
#>  5 9ES0~ 2018-11-19 08:19:00     4    10    10    10    10    10    10
#>  6 9ES0~ 2018-11-20 14:58:00     4     5    10    10     7    10    10
#>  7 9ES0~ 2018-12-07 19:20:00     1     7    10    10    10    10    10
#>  8 9ES0~ 2018-12-10 07:35:00     1    10    10    10    10    10    10
#>  9 9ES0~ 2018-12-14 13:52:00     3    10    10     0    10     9    10
#> 10 9ES0~ 2018-12-17 14:22:00    10    10     9     9    10     9     0
#> 11 9ES0~ 2018-12-23 07:20:00     1    10    10     0     6    10    10
#> 12 9ES0~ 2018-12-19 06:19:00     1    10    10    10    10     4    10
#> 13 9ES0~ 2018-12-27 18:13:00     2    10    10    10    10    10    10
#> 14 9ES0~ 2018-12-31 09:51:00     5    10    10    10    10    10    10
#> # ... with 2 more variables: H1 <dbl>, I1 <dbl>

head(df)
#> # A tibble: 6 x 11
#>   URN   InterviewDate          A2    B1    C1    D1    E1    F1    G1    H1
#>   <chr> <dttm>              <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 9ES0~ 2017-10-15 08:42:00     5    10    10    10    10    10    10    10
#> 2 9ES0~ 2017-10-17 16:12:00    10    10    10    10    10    10    10    10
#> 3 9ES0~ 2017-11-02 12:46:00    10    10     1    10     8    10     7     3
#> 4 9ES0~ 2018-11-09 09:53:00     2    10     4    10    10    10     6    10
#> 5 9ES0~ 2018-11-19 08:19:00     4    10    10    10    10    10    10    10
#> 6 9ES0~ 2018-11-20 14:58:00     4     5    10    10     7    10    10    10
#> # ... with 1 more variable: I1 <dbl>

reprex()
#> Error in reprex(): could not find function "reprex"

Created on 2019-01-25 by the reprex package (v0.2.1)

Slavek · January 25, 2019, 5:39pm

I know I can recode variables one by one using car library

library(readxl)
#> Warning: package 'readxl' was built under R version 3.4.4
df <- read_excel("C:/Users/sdanilowicz/Documents/Test File for recodes.xlsx", sheet = "Sales")
df
#> # A tibble: 14 x 11
#>    URN   InterviewDate          A2    B1    C1    D1    E1    F1    G1
#>    <chr> <dttm>              <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1 9ES0~ 2017-10-15 08:42:00     5    10    10    10    10    10    10
#>  2 9ES0~ 2017-10-17 16:12:00    10    10    10    10    10    10    10
#>  3 9ES0~ 2017-11-02 12:46:00    10    10     1    10     8    10     7
#>  4 9ES0~ 2018-11-09 09:53:00     2    10     4    10    10    10     6
#>  5 9ES0~ 2018-11-19 08:19:00     4    10    10    10    10    10    10
#>  6 9ES0~ 2018-11-20 14:58:00     4     5    10    10     7    10    10
#>  7 9ES0~ 2018-12-07 19:20:00     1     7    10    10    10    10    10
#>  8 9ES0~ 2018-12-10 07:35:00     1    10    10    10    10    10    10
#>  9 9ES0~ 2018-12-14 13:52:00     3    10    10     0    10     9    10
#> 10 9ES0~ 2018-12-17 14:22:00    10    10     9     9    10     9     0
#> 11 9ES0~ 2018-12-23 07:20:00     1    10    10     0     6    10    10
#> 12 9ES0~ 2018-12-19 06:19:00     1    10    10    10    10     4    10
#> 13 9ES0~ 2018-12-27 18:13:00     2    10    10    10    10    10    10
#> 14 9ES0~ 2018-12-31 09:51:00     5    10    10    10    10    10    10
#> # ... with 2 more variables: H1 <dbl>, I1 <dbl>


library(car)
#> Warning: package 'car' was built under R version 3.4.4
#> Loading required package: carData
#> Warning: package 'carData' was built under R version 3.4.4
df$A2<-recode(df$A2,"1=10;2=20;3=30;4=40;5=50;6=60;7=70;8=80;9=90;10=100")
df$B1<-recode(df$B1,"1=10;2=20;3=30;4=40;5=50;6=60;7=70;8=80;9=90;10=100")
df
#> # A tibble: 14 x 11
#>    URN   InterviewDate          A2    B1    C1    D1    E1    F1    G1
#>    <chr> <dttm>              <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1 9ES0~ 2017-10-15 08:42:00    50   100    10    10    10    10    10
#>  2 9ES0~ 2017-10-17 16:12:00   100   100    10    10    10    10    10
#>  3 9ES0~ 2017-11-02 12:46:00   100   100     1    10     8    10     7
#>  4 9ES0~ 2018-11-09 09:53:00    20   100     4    10    10    10     6
#>  5 9ES0~ 2018-11-19 08:19:00    40   100    10    10    10    10    10
#>  6 9ES0~ 2018-11-20 14:58:00    40    50    10    10     7    10    10
#>  7 9ES0~ 2018-12-07 19:20:00    10    70    10    10    10    10    10
#>  8 9ES0~ 2018-12-10 07:35:00    10   100    10    10    10    10    10
#>  9 9ES0~ 2018-12-14 13:52:00    30   100    10     0    10     9    10
#> 10 9ES0~ 2018-12-17 14:22:00   100   100     9     9    10     9     0
#> 11 9ES0~ 2018-12-23 07:20:00    10   100    10     0     6    10    10
#> 12 9ES0~ 2018-12-19 06:19:00    10   100    10    10    10     4    10
#> 13 9ES0~ 2018-12-27 18:13:00    20   100    10    10    10    10    10
#> 14 9ES0~ 2018-12-31 09:51:00    50   100    10    10    10    10    10
#> # ... with 2 more variables: H1 <dbl>, I1 <dbl>

head(df)
#> # A tibble: 6 x 11
#>   URN   InterviewDate          A2    B1    C1    D1    E1    F1    G1    H1
#>   <chr> <dttm>              <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 9ES0~ 2017-10-15 08:42:00    50   100    10    10    10    10    10    10
#> 2 9ES0~ 2017-10-17 16:12:00   100   100    10    10    10    10    10    10
#> 3 9ES0~ 2017-11-02 12:46:00   100   100     1    10     8    10     7     3
#> 4 9ES0~ 2018-11-09 09:53:00    20   100     4    10    10    10     6    10
#> 5 9ES0~ 2018-11-19 08:19:00    40   100    10    10    10    10    10    10
#> 6 9ES0~ 2018-11-20 14:58:00    40    50    10    10     7    10    10    10
#> # ... with 1 more variable: I1 <dbl>

reprex()
#> Error in reprex(): could not find function "reprex"

Created on 2019-01-25 by the reprex package (v0.2.1)

but hat is not the point.
I would like to apply this recoding to all int variables or specified variables.

taras · January 25, 2019, 5:47pm

A sidenote here. I see you trying to create reprexes all around the site, but none of them are actually reprexes.
The main issue seems to be that you don't even load the package. As a result, take a look at your last line of pasted data, it says: Error in reprex(): could not find function "reprex".

What you're providing here (and elsewhere) is the copy of your console output. We can look at it to make judgments, but it doesn't help us answer your question any better, as we can't take your code, add a couple of lines and give it back to you thus solving your problem: we only can talk about your problem in abstract terms.

andresrcs · January 25, 2019, 5:53pm

just to build on the last comment, the main limitation to helping you is that you are not providing sample data in a suitable format, you have already used datapasta on a previous topic, try to use it again.

EconomiCurtis · January 25, 2019, 6:10pm

Re reprex

I 2nd (or 3rd) the reprex request. It'll make the life much easier for people who want to help you.

Re:

Slavek:

df %>% mutate_at(vars(ends_with('1')), ~ .x * 10)
and it worked (king of) but results were not saved. I found it because I realised that one variable ending with 2 did not change:

A1 A2 B1 C1
90 9 90 90

the line ...mutate_at(vars(ends_with('2')), ~ .x * 10)... will only apply a function to any variable in your table that ends with '2'. Since A1, B1, C1 that seems to be working as expected.

Similar logic for vars(ends_with('2')

Note also that you can pipe together multiple mutate_iffunction calls together to have a single code block that sort your various issues.

for example, something like

library(dplyr)
df <- tribble(
  ~colA, ~colB, ~A2,
  "a",   1,     3,
  "b",   2,     5,
  "c",   3,     6
)
df <- df %>% 
  mutate_at(vars(ends_with('B')), ~ .x * 10) %>% 
  mutate_at(vars(ends_with('2')), ~ .x * 10)

  df
#> # A tibble: 3 x 3
#>   colA   colB    A2
#>   <chr> <dbl> <dbl>
#> 1 a        10    30
#> 2 b        20    50
#> 3 c        30    60

^{Created on 2019-01-25 by the reprex package (v0.2.1)}

Slavek · January 28, 2019, 12:55pm

That is absolutely brilliant!!! Thank you!
I managed to use this code to recode my variables

library(dplyr)

df %>% mutate_if(is.numeric, ~ .x * 10)
df <- df %>% 
 mutate_at(vars(starts_with('A')), ~ .x * 10) %>% 
  mutate_at(vars(starts_with('B')), ~ .x * 10) %>% 
  mutate_at(vars(starts_with('C')), ~ .x * 10) %>% 
  mutate_at(vars(starts_with('D')), ~ .x * 10) %>% 
  mutate_at(vars(starts_with('E')), ~ .x * 10) %>% 
  mutate_at(vars(starts_with('F')), ~ .x * 10) %>% 
  mutate_at(vars(starts_with('G')), ~ .x * 10) %>% 
  mutate_at(vars(ends_with('2')), ~ .x * 10) %>% 
  mutate_at(vars(ends_with('3')), ~ .x * 10) %>% 
  mutate_at(vars(ends_with('4')), ~ .x * 10) %>% 
  mutate_at(vars(ends_with('5')), ~ .x * 10) %>% 
  mutate_at(vars(ends_with('6')), ~ .x * 10) %>% 
  mutate_at(vars(ends_with('7')), ~ .x * 10)

Now, I would like to recode all values of -10 to be NAs. I know I could do it one buy one using a code like this:

df$A1[df$A1==-10] <- NA
df$A2[df$A2==-10] <- NA
df$B1[df$B1==-10] <- NA
...
df$D7[df$D7==-10] <- NA

but I hope that I could use mutate-if function again somehow because recoding applies to all the variables above.

Finally, I would like to do a clever recoding into new variables.
Is it possible to recode all numeric variables above into new variables with following logic: 10-60=0, 70-80=50 and 90-100=100? I need new set of recoded variables added to my df with _3 indicator like B2_3, C2_3, D3_3 as they represent 3 levels of responses (1=0, 2=50 and 3=100).

How can I do that?

Slavek · January 28, 2019, 2:32pm

Ooops. I can see that mutate function does not change the data permanently. Would transform function better? I don't know how to convert mutate into transform but I need to have these changes saved in the df...

taras · January 28, 2019, 2:42pm

Slavek:

That is absolutely brilliant!!! Thank you!
I managed to use this code to recode my variables

library(dplyr)

df %>% mutate_if(is.numeric, ~ .x * 10)
df <- df %>% 
 mutate_at(vars(starts_with('A')), ~ .x * 10) %>% 
  mutate_at(vars(starts_with('B')), ~ .x * 10) %>% 
  mutate_at(vars(starts_with('C')), ~ .x * 10) %>% 
  mutate_at(vars(starts_with('D')), ~ .x * 10) %>% 
  mutate_at(vars(starts_with('E')), ~ .x * 10) %>% 
  mutate_at(vars(starts_with('F')), ~ .x * 10) %>% 
  mutate_at(vars(starts_with('G')), ~ .x * 10) %>% 
  mutate_at(vars(ends_with('2')), ~ .x * 10) %>% 
  mutate_at(vars(ends_with('3')), ~ .x * 10) %>% 
  mutate_at(vars(ends_with('4')), ~ .x * 10) %>% 
  mutate_at(vars(ends_with('5')), ~ .x * 10) %>% 
  mutate_at(vars(ends_with('6')), ~ .x * 10) %>% 
  mutate_at(vars(ends_with('7')), ~ .x * 10)

This begs for a function to be written on top of it, BTW.

taras · January 28, 2019, 2:42pm

What do you mean? Can you elaborate please?

andresrcs · January 28, 2019, 2:47pm

Some regular expressions could help too, but it's hard to give more specific answers with out sample data.

df <- df %>% 
    mutate_at(vars(matches("[[:alpha:]]+[[:digit:]]+")), ~ .x * 10)

Slavek · January 28, 2019, 3:11pm

Well, my understanding of mutate is that this function changes values only temporarly because when I use summary() or head() or anything like that after using this function, I still can see initial, unchanged numbers.
I need to recode values of these variables permanently so when I run summary() before the recoding I can see Min:1, Max: 10. After the operation I should see Min:10, Max:100 for all recoded variables...

Slavek · January 28, 2019, 3:17pm

I would like to avoid using this:

library(car)
df$A2<-recode(df$A2,"1=10;2=20;3=30;4=40;5=50;6=60;7=70;8=80;9=90;10=100")

prepared for each variable manually...

andresrcs · January 28, 2019, 3:35pm

In general all dplyr functions don't modify dataframes in place, they create a new modified dataframe instead, if you want your changes to persist, then you need to explicitly assign them to the original dataframe (or a new one) with the assign operator <-

library(dplyr)

# This is made up data, you have to use your real data instead
df <- tribble(
    ~colA, ~A1, ~B2,
    "a",   1,     3,
    "b",   2,     5,
    "c",   3,     6
)
df <- df %>% 
    mutate_at(vars(matches("[[:alpha:]]+[[:digit:]]+")), ~ .x * 10)
df
#> # A tibble: 3 x 3
#>   colA     A1    B2
#>   <chr> <dbl> <dbl>
#> 1 a        10    30
#> 2 b        20    50
#> 3 c        30    60

Slavek · January 28, 2019, 5:14pm

Thank you very much for your help!

rensa · January 29, 2019, 12:30am

Sorry I wasn't able to loop back to help you, @Slavek! Thesis submission looms I'm glad @andresrcs and @taras were able to get you over the line!

system · February 5, 2019, 12:30am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.