range for multiple columns per observation

medici · February 3, 2021, 2:01pm

Hello everyone! I hope you are doing well! I have a medical dataset with multiple temperature measurements per patient (every 6 hours for 48 hours, so 9 measurements per patient).

names(TempDataset)
[1] "Date" "MRN"
[3] "Age" "Gender"
[5] "Exclude" "Etiology"
[7] "Surivived to decannulation" "Survived"
[9] "Timetodecannul" "TimetoICUdischarge"
[11] "Temperature0hs" "Temperature6hs"
[13] "Temperature12hs" "Temperature18hs"
[15] "Temperature24hs" "Temperature30hs"
[17] "Temperature36hs" "Temperature42hs"
[19] "Temperature48hs"

All the temperature columns [11:19] are double. I want to create additional variables that hold summary statistics per patient (temperature range per patient, mean per patient, etc.). Is that possible?

Taking the range as an example, I tried to use the range function inside mutate to create a Temperaturerange variable, but it did not work. Any ideas?

TempDataset %>%

mutate(Temperaturerange = range(TempDataset$Temperature0hs,TempDataset$Temperature6hs,TempDataset$Temperature12hs,TempDataset$Temperature18hs
,TempDataset$Temperature24hs, TempDataset$Temperature30hs, TempDataset$Temperature36hs,TempDataset$Temperature42hs, TempDataset$Temperature48hs))
Error: Problem with mutate() input Temperaturerange.
x Input Temperaturerange can't be recycled to size 217.
i Input Temperaturerange is range(...).
i Input Temperaturerange must be size 217 or 1, not 2.
Run rlang::last_error() to see where the error occurred.

I am quite new to R so please excuse me if the question is considered simple or naive.

pieterjanvc · February 3, 2021, 3:03pm

Hi,

Welcome to the RStudio community!

What you'd like to do is not that difficult once you get to know the functions in the tidyverse, but there can be a bit of a learning curve in the beginning, so bear with me here.

Let me first show you a possible solution:

library(tidyverse)

#Generate data
set.seed(1)
myData = tibble(
  MRN = 1:5,
  AGE = sample(18:65, 5, replace = TRUE),
  Temperature0hs = runif(5, 36, 40),
  Temperature24hs = runif(5, 36, 40),
  Temperature48hs = runif(5, 36, 40)
)
myData
#> # A tibble: 5 x 5
#>     MRN   AGE Temperature0hs Temperature24hs Temperature48hs
#>   <int> <int>          <dbl>           <dbl>           <dbl>
#> 1     1    21           38.6            38.7            40.0
#> 2     2    56           38.5            37.5            37.5
#> 3     3    18           36.2            39.1            39.1
#> 4     4    51           36.8            38.0            39.7
#> 5     5    40           36.7            38.9            36.8

#Make the format long
myData = myData %>% 
  pivot_longer(-c(MRN, AGE), names_to = "time", 
               values_to = "temp") 
myData
#> # A tibble: 15 x 4
#>      MRN   AGE time             temp
#>    <int> <int> <chr>           <dbl>
#>  1     1    21 Temperature0hs   38.6
#>  2     1    21 Temperature24hs  38.7
#>  3     1    21 Temperature48hs  40.0
#>  4     2    56 Temperature0hs   38.5
#>  5     2    56 Temperature24hs  37.5
#>  6     2    56 Temperature48hs  37.5
#>  7     3    18 Temperature0hs   36.2
#>  8     3    18 Temperature24hs  39.1
#>  9     3    18 Temperature48hs  39.1
#> 10     4    51 Temperature0hs   36.8
#> 11     4    51 Temperature24hs  38.0
#> 12     4    51 Temperature48hs  39.7
#> 13     5    40 Temperature0hs   36.7
#> 14     5    40 Temperature24hs  38.9
#> 15     5    40 Temperature48hs  36.8

#Do analyses
myData %>% 
  group_by(MRN) %>% 
  summarise(
    minTemp = min(temp),
    maxTemp = max(temp),
    meanTemp = mean(temp),
    rangeTemp = maxTemp - minTemp
    )
#> `summarise()` ungrouping output (override with `.groups` argument)
#> # A tibble: 5 x 5
#>     MRN minTemp maxTemp meanTemp rangeTemp
#>   <int>   <dbl>   <dbl>    <dbl>     <dbl>
#> 1     1    38.6    40.0     39.1     1.32 
#> 2     2    37.5    38.5     37.9     0.996
#> 3     3    36.2    39.1     38.1     2.86 
#> 4     4    36.8    39.7     38.2     2.91 
#> 5     5    36.7    38.9     37.5     2.16

Created on 2021-02-03 by the reprex package (v0.3.0)

So I started by recreating a slimmed-down version of your dataset, using only 3 temperature time points and 2 other variables.

The key step to make this data easier to work with is changing the format from wide to long. This is done using the pivot_longer() function and telling it that all columns apart from MRN and AGE need to be changed into a long format. You can see the before and after in the example.
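
With the full dataset there are more non-temperature columns to exclude, so it may be easier to select the temperature columns by name instead. Below is a minimal sketch of that idea. The small TempDataset here is made-up stand-in data that only borrows column names from the original post, and the names longTemps, hours and temp are just illustrative.

```r
library(dplyr)
library(tidyr)

# Made-up stand-in for TempDataset: two patients, three of the nine time points
TempDataset <- tibble(
  MRN = c(101, 102),
  Age = c(54, 61),
  Temperature0hs = c(36.5, 38.2),
  Temperature6hs = c(37.1, 38.0),
  Temperature12hs = c(37.9, 37.4)
)

longTemps <- TempDataset %>%
  pivot_longer(
    starts_with("Temperature"),    # only the temperature columns are pivoted
    names_to = "hours",
    names_prefix = "Temperature",  # strip the common prefix from the old names
    values_to = "temp"
  ) %>%
  # turn "0hs", "6hs", ... into the integers 0, 6, ...
  mutate(hours = as.integer(sub("hs$", "", hours)))

longTemps
```

Storing the time point as a number is optional, but it can make later sorting or plotting by time more convenient.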

Once you have the data in this long format, it's much easier to use the other dplyr functions to get the analyses you'd like. I provided a few examples.

Please read the documentation on the various functions if you'd like to learn more about them, and let us know if you get stuck again!
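
For completeness, the original mutate() attempt failed because range() returns two values (the minimum and the maximum) over everything passed to it, rather than one value per row. If the wide format needs to be kept, one possible alternative is a row-wise calculation with rowwise() and c_across() (available from dplyr 1.0.0). This is only a sketch on made-up stand-in data. The names withStats, meanTemp and rangeTemp are illustrative, and na.rm = TRUE is an assumption about how missing measurements should be handled.

```r
library(dplyr)

# Made-up stand-in for TempDataset: two patients, three of the nine time points
TempDataset <- tibble(
  MRN = c(101, 102),
  Temperature0hs = c(36.5, 38.2),
  Temperature6hs = c(37.1, 38.0),
  Temperature12hs = c(37.9, 37.4)
)

# range() gives back a length-2 vector, which mutate() cannot fit into one column
length(range(c(36.5, 37.1, 37.9)))

withStats <- TempDataset %>%
  rowwise() %>%   # treat every row (patient) as its own group
  mutate(
    meanTemp  = mean(c_across(starts_with("Temperature")), na.rm = TRUE),
    # max minus min of the temperatures within this row
    rangeTemp = diff(range(c_across(starts_with("Temperature")), na.rm = TRUE))
  ) %>%
  ungroup()

withStats
```

Note that the new columns deliberately do not start with "Temperature". A name like Temperaturerange would itself be matched by starts_with("Temperature") in any later step.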

Hope this helps,
PJ

medici · February 3, 2021, 8:12pm

Thanks a lot! It worked perfectly well!
