How to scale up/down columns in a dataset conditionally

Ellenz · May 14, 2019, 3:03pm

If I have two datasets, DF1 and DF2, with the same column names but different values for part of the columns (the temp columns in this example). My question is how to change the values in one dataset (DF2) according to the values of a column in another dataset (DF1). The example datasets are as follows.

DF1
ID elevation temp1 temp2 temp3
101 -10 2 3.5 2.2
102 2000 3.1 3.0 5.1
103 1400 4.0 2.8 4.1
...

DF2
ID elevation temp1 temp2 temp3
101 350 1.8 3.0 2.9
102 2206 2.1 4.0 4.1
103 1450 1.0 2.3 6.1
...

The two datasets have the similar dimension, and ID column, but different values for elevation, temp1, temp2, and temp3.
Assume that the values in DF1 are correct, while I need to scale up/down values of "temp1", "temp2", and "temp3" columns in DF2 according to differences between "elevation" column of the two datasets.

Specifically, if for each row, | DF1[i,]$elevation - DF2[i,]$elevation | < 100, then move to the next row (i.e. not consider for the case that elevation differences are less than 100), else, if for each row i, | DF1[i,]$elevation - DF2[i,]$elevation | >= 100, then there may be the two following conditions and scaling up/down processes accordingly.
(1) DF2[i,]$elevation < DF1[i,]$elevation, DF2[i,3:5] = DF2[i,3:5]- (6.5/1000)x(DF1[i,]$elevation - DF2[i,]$elevation)
(2) DF2[i,]$elevation > DF1[i,]$elevation, DF2[i,3:5]= DF2[i,3:5]+ (6.5/1000)x(DF2[i,]$elevation - DF1[i,]$elevation)

How to do this in a loop in R? Thanks.

FJCC · May 14, 2019, 9:50pm

I would rotate the data into a long format, join the two data sets, do the calculation and then rotate back to the wide format.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(tidyr)
DF1 <- data.frame(ID = c(101,102,103), elevation = c(-10, 2000, 1400),
                  temp1 = c(2,3.1,4.0), temp2 = c(3.5, 3.0, 2.8),
                  temp3 = c(2.2, 5.1, 4.1))
DF2 <- data.frame(ID = c(101,102,103), elevation = c(350, 2206, 1450),
                  temp1 = c(1.8, 2.1, 1.0), temp2 = c(3.0, 4.0, 2.3),
                  temp3 = c(2.9, 4.1, 6.1))
DF1_lng <- DF1 %>% gather(key = TempSeq, value = Value, temp1:temp3)
DF2_lng <- DF2 %>% gather(key = TempSeq, value = Value, temp1:temp3)
DFjoin <- inner_join(DF1_lng, DF2_lng, by = c("ID", "TempSeq"), suffix = c("_1", "_2"))
DFjoin <- DFjoin %>% mutate(Value2adj = ifelse(abs(elevation_1 - elevation_2) >= 100,
                            Value_2 + (elevation_2 - elevation_1) * 6.5/1000, Value_2))
DFjoin
#>    ID elevation_1 TempSeq Value_1 elevation_2 Value_2 Value2adj
#> 1 101         -10   temp1     2.0         350     1.8     4.140
#> 2 102        2000   temp1     3.1        2206     2.1     3.439
#> 3 103        1400   temp1     4.0        1450     1.0     1.000
#> 4 101         -10   temp2     3.5         350     3.0     5.340
#> 5 102        2000   temp2     3.0        2206     4.0     5.339
#> 6 103        1400   temp2     2.8        1450     2.3     2.300
#> 7 101         -10   temp3     2.2         350     2.9     5.240
#> 8 102        2000   temp3     5.1        2206     4.1     5.439
#> 9 103        1400   temp3     4.1        1450     6.1     6.100
DF2adj <- DFjoin %>% select(ID, elevation = elevation_2, TempSeq, Value2adj) %>% 
  spread(key = TempSeq, value = Value2adj)
DF2adj                            
#>    ID elevation temp1 temp2 temp3
#> 1 101       350 4.140 5.340 5.240
#> 2 102      2206 3.439 5.339 5.439
#> 3 103      1450 1.000 2.300 6.100

^{Created on 2019-05-14 by the reprex package (v0.2.1)}

Ellenz · May 15, 2019, 2:02am

Thanks, it works. Now I have a small question.
If the column names are c('jan','feb','mar','apr','may','jun','jul','aug','sept','oct','nov','dec') instead of ('temp1','temp2','temp3'),
I use the similar approach and get DF2adj, but the column names are in the order c("apr" "aug" "dec" "feb" "jan" "jul" "jun" "mar" "may", "nov" "oct" "sept"). How to reorder it as "jan":"dec", like the 12 monthly order?

Ellenz · May 15, 2019, 2:22am

I just tried to reorder the column names manually, and it worked.

DF2adj = DF2adj[c("ID", "elevation", "jan", "feb", "mar", "apr", "may", "jun", "jul", "aug", "sept", "oct", "nov", "dec")]

FJCC · May 15, 2019, 2:23am

You can also make the column an ordered factor, though manually reordering is an equally good solution.

library(tidyr)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
df <- data.frame(jan = 1, feb = 2, mar = 3, apr = 4, may = 5, jun = 6,
                 jul = 7, aug = 8, sep = 9, oct = 10, nov = 11, dec = 12)
df
#>   jan feb mar apr may jun jul aug sep oct nov dec
#> 1   1   2   3   4   5   6   7   8   9  10  11  12
df2 <- df %>% gather(key = Month, value = Value, jan:dec)
str(df2)
#> 'data.frame':    12 obs. of  2 variables:
#>  $ Month: chr  "jan" "feb" "mar" "apr" ...
#>  $ Value: num  1 2 3 4 5 6 7 8 9 10 ...

df3 <- df2 %>% spread(key = Month, value = Value)
df3
#>   apr aug dec feb jan jul jun mar may nov oct sep
#> 1   4   8  12   2   1   7   6   3   5  11  10   9

df4 <- df2 %>% 
  mutate(Month = factor(Month, 
                        levels = c("jan", "feb", "mar", "apr", "may", "jun", 
                                   "jul", "aug", "sep", "oct", "nov", "dec"), 
                        ordered = TRUE)) %>% 
  spread(key = Month, value = Value)
df4
#>   jan feb mar apr may jun jul aug sep oct nov dec
#> 1   1   2   3   4   5   6   7   8   9  10  11  12

^{Created on 2019-05-14 by the reprex package (v0.2.1)}

Ellenz · May 15, 2019, 8:02am

Thanks, it works.
I tried to keep the decimal places of columns jan:dec to 2, but couldn't get it correct. What is the problem? Thanks.

DF2adj <- mutate_at(vars(temp1:temp3), list(~round(.,2)))
Error in UseMethod("tbl_vars") :
no applicable method for 'tbl_vars' applied to an object of class "quosures"

FJCC · May 15, 2019, 1:59pm

You seem to have left out the .tbl argument of mutate_at(). Try using

DF2adj <- DF2adj %>% mutate_at(.vars = vars(temp1:temp3), 
   .funs = round, digits = 2)

Ellenz · May 16, 2019, 4:04pm

Thanks. I found that I can only handle some basic R functions. Now I have need to do some variations of the dataframes and I don't know how to adapt the code. For example, if I have DF1, DF2 as in the initial question. But now I have DF3, etc. until DF5, which have the same dimension, format, and column names, but different values for temp1:temp3. I want to create a new dataframe as DF.new, with column names: ID, elevation, temp1, temp2, temp3. I want to calculate average values of temp1 from DF1 to DF5, average values of temp2 from DF1 to DF5, and so on, as temp1:temp3 in DF.new. How to do this? Could you could give me a hint on how to combine DF1 to DF5 to DF.new to achieve this goal? Should I still use "mutate_at" or "mutate" function?
In my example, I tried to create an empty list and put all dataframes in, and then calculate the average values, but I couldn't get it to work. Thanks for your help.

DF1 <- data.frame(ID = c(101,102,103), elevation = c(-10, 2000, 1400),
temp1 = c(2,3.1,4.0), temp2 = c(3.5, 3.0, 2.8),
temp3 = c(2.2, 5.1, 4.1))
DF2 <- data.frame(ID = c(101,102,103), elevation = c(350, 2206, 1450),
temp1 = c(1.8, 2.1, 1.0), temp2 = c(3.0, 4.0, 2.3),
temp3 = c(2.9, 4.1, 6.1))
DF3 <- DF1
DF4 <- DF2
DF5 <- DF1

DF.list = list()
for(i in 1:5){
DF.list[[1]] <- DF1
DF.list[[2]] <- DF2
DF.list[[3]] <- DF3
DF.list[[4]] <- DF4
DF.list[[5]] <- DF5
}

DF.new = as.data.frame(ID=DF.list[[1]]$ID, elevation=DF.list[[1]]$elevation, ...)

FJCC · May 16, 2019, 4:25pm

dplyr has the bind_rows() function that will combine a list of data frames. Its .id parameter can be used to make a column showing which data frame in the list was the origin of the data. Though that is not necessary in this case, I think it makes it easier to follow what is going on. I broke out the code steps into several intermediate data frames to make it easier to inspect each step of the process.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(tidyr)
DF1 <- data.frame(ID = c(101,102,103), elevation = c(-10, 2000, 1400),
                  temp1 = c(2,3.1,4.0), temp2 = c(3.5, 3.0, 2.8),
                  temp3 = c(2.2, 5.1, 4.1))
DF2 <- data.frame(ID = c(101,102,103), elevation = c(350, 2206, 1450),
                  temp1 = c(1.8, 2.1, 1.0), temp2 = c(3.0, 4.0, 2.3),
                  temp3 = c(2.9, 4.1, 6.1))
DF3 <- DF1
DF4 <- DF2
DF5 <- DF1

DF.list = list()
for(i in 1:5){
  DF.list[[1]] <- DF1
  DF.list[[2]] <- DF2
  DF.list[[3]] <- DF3
  DF.list[[4]] <- DF4
  DF.list[[5]] <- DF5
}
DF.New <- bind_rows(DF.list, .id = "DF")
DF_lng <- DF.New %>% gather(key = TEMP, value = Value, temp1:temp3)
DF_Means <- DF_lng %>% group_by(ID, TEMP) %>% summarize(Avg = mean(Value)) 
DF_Means_Wide <- DF_Means %>% spread(key = TEMP, value = Avg)
DF_Means_Wide
#> # A tibble: 3 x 4
#> # Groups:   ID [3]
#>      ID temp1 temp2 temp3
#>   <dbl> <dbl> <dbl> <dbl>
#> 1   101  1.92  3.3   2.48
#> 2   102  2.7   3.4   4.70
#> 3   103  2.8   2.60  4.90

^{Created on 2019-05-16 by the reprex package (v0.2.1)}

system · May 23, 2019, 4:25pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.