Adding a new column based on different ranges of number stored in vectors

Dallak · February 11, 2022, 4:31am

I have this dataset, I want to add a new column for different ranges of number based on day. The ranges of number I am interested on are stored in v1 and v2.

ex <- data.frame('id'= seq(1:26),
                 'day'= c(105:115, 1:12,28:30),
                 'letter' = LETTERS[1:26],
                 s = rep(1:26, each = 3, len = 26) )

v1 <- c(107,112,10)
v2 <- c(109,115,28)

word <- c("pen","desk","light")

The column I want to add is stored in word. From day 107:109 I want to add the word "pen", 112:115 "desk", and 10:28 "light". For the rest of the values that fall outside of these ranges, I want to use the word "undetermined".

I am aware of this post Adding Values to a New Column in a Dataframe by Index - General - RStudio Community, but not sure how to extend it to my case.

Thank you in advance!

FJCC · February 11, 2022, 6:03am

Is this what you want?

ex <- data.frame('id'= seq(1:26),
                 'day'= c(105:115, 1:12,28:30),
                 'letter' = LETTERS[1:26],
                 s = rep(1:26, each = 3, len = 26) )

v1 <- c(107,112,10)
v2 <- c(109,115,28)

word <- c("pen","desk","light")
library(dplyr)

ex <- ex |> mutate(NewColumn = case_when(
  day >= v1[1] & day <= v2[1] ~ word[1],
  day >= v1[2] & day <= v2[2] ~ word[2],
  day >= v1[3] & day <= v2[3] ~ word[3],
  TRUE ~ "undetermined"
))
ex
#>    id day letter s    NewColumn
#> 1   1 105      A 1 undetermined
#> 2   2 106      B 1 undetermined
#> 3   3 107      C 1          pen
#> 4   4 108      D 2          pen
#> 5   5 109      E 2          pen
#> 6   6 110      F 2 undetermined
#> 7   7 111      G 3 undetermined
#> 8   8 112      H 3         desk
#> 9   9 113      I 3         desk
#> 10 10 114      J 4         desk
#> 11 11 115      K 4         desk
#> 12 12   1      L 4 undetermined
#> 13 13   2      M 5 undetermined
#> 14 14   3      N 5 undetermined
#> 15 15   4      O 5 undetermined
#> 16 16   5      P 6 undetermined
#> 17 17   6      Q 6 undetermined
#> 18 18   7      R 6 undetermined
#> 19 19   8      S 7 undetermined
#> 20 20   9      T 7 undetermined
#> 21 21  10      U 7        light
#> 22 22  11      V 8        light
#> 23 23  12      W 8        light
#> 24 24  28      X 8        light
#> 25 25  29      Y 9 undetermined
#> 26 26  30      Z 9 undetermined

^{Created on 2022-02-10 by the reprex package (v2.0.1)}

Dallak · February 11, 2022, 6:47am

Yes, it is @FJCC! Thank you so much.
Since there is a bit of manual work, is there a way to replicate this string from 1:100, for example:

day >= v1[1] & day <= v2[1] ~ word[1]

The output I am looking for is something like:

day >= v1[1] & day <= v2[1] ~ word[1],
day >= v1[2] & day <= v2[2] ~ word[2],
day >= v1[3] & day <= v2[3] ~ word[3],
day >= v1[4] & day <= v2[4] ~ word[4],
day >= v1[5] & day <= v2[5] ~ word[5],
.
.
.
day >= v1[00] & day <= v2[100] ~ word[100],

Thank you, again!

Dallak · February 11, 2022, 8:38am

I guess I figured it out and it is working.

a <- seq(1:100)
a1 <- seq(1:100)
a2 <- seq(1:100)
sprintf("day >= v1[%s] & day <= v2[%s] ~ word[%s],", a,a1,a2 )

Thank you again, @FJCC!

FJCC · February 11, 2022, 3:04pm

Here is a more elegant method that uses fuzzyjoin.

ex <- data.frame('id'= seq(1:26),
                  'day'= c(105:115, 1:12,28:30),
                  'letter' = LETTERS[1:26],
                  s = rep(1:26, each = 3, len = 26) )
v1 <- c(107,112,10)
v2 <- c(109,115,28)
word <- c("pen","desk","light")
Ranges <- data.frame(v1,v2,word)
library(dplyr)
library(fuzzyjoin)
library(tidyr)
ex2 <- ex |> fuzzy_left_join(Ranges, by = c(day = "v1", day = "v2"),
                              match_fun = list(`>=`, `<=`)) |> 
   mutate(word = replace_na(word, "undetermined")) |> 
   select(-v1, -v2)
ex2
   id day letter s         word
1   1 105      A 1 undetermined
2   2 106      B 1 undetermined
3   3 107      C 1          pen
4   4 108      D 2          pen
5   5 109      E 2          pen
6   6 110      F 2 undetermined
7   7 111      G 3 undetermined
8   8 112      H 3         desk
9   9 113      I 3         desk
10 10 114      J 4         desk
11 11 115      K 4         desk
12 12   1      L 4 undetermined
13 13   2      M 5 undetermined
14 14   3      N 5 undetermined
15 15   4      O 5 undetermined
16 16   5      P 6 undetermined
17 17   6      Q 6 undetermined
18 18   7      R 6 undetermined
19 19   8      S 7 undetermined
20 20   9      T 7 undetermined
21 21  10      U 7        light
22 22  11      V 8        light
23 23  12      W 8        light
24 24  28      X 8        light
25 25  29      Y 9 undetermined
26 26  30      Z 9 undetermined

Dallak · February 12, 2022, 4:35am

Thank you, @FJCC! This is very helpful. Much appreciated!

system · February 19, 2022, 4:35am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.