Calculate the change for each patient with a given ID

Elijawaa · December 9, 2019, 9:02pm

Hello Everyone,

I need help with a project I am working on. I created a new data from the Iris data which is similar to what I am working on.,

For each patient ID, I want to calculate the change that has occurred. For example, for Sepal Length Change(ie L.Change), I have the value to be (5.0-4.9-4.0-5.1=-9). I will then put this value (-9) on the last row for that patient ID number (thus 1 in this case) and then do same for the rest of the patients.

I was hoping the code below could work then I will think of how to turn it into a function, but I am getting the error below.
I am wondering if anyone can help me with a function to solve this.I have over 900 unique ID numbers(observations) and more than 30 variables.

Error: object 'L.Change' not found

L.Change[4,]==irisn$Sepal.Length[4,]-irisn$Sepal.Length[3,]-irisn$Sepal.Length[2,]-irisn$Sepal.Length[1,]

ID Sepal.Length L.Change Sepal.Width W.Change
1 5.1 NA 3.5 NA
1 4.0 NA 4.0 NA
1 4.9 NA 5.0 NA
1 5.0 NA 9.0 NA
2 6.5 NA 7.0 NA
2 7.0 NA 6.0 NA
2 3.9 NA 8.5 NA
2 7.0 NA 7.5 NA
3 4.7 NA 3.5 NA
3 7.0 NA 2.6 NA

FJCC · December 10, 2019, 2:38am

The problem with the line of code you posted is that L.Change does not exist outside of irisn. You have to refer to it as irisn$L.Change. Also, == compares two objects for equality, it does not assign a value to the left hand side.
I do not think that is a workable approach to get your values, though maybe someone else can see a way to do it along those lines. I suggest staying away from placing summary calculations in a special column at the end of a group of rows. That is common in spreadsheets but it does not match well with the way R works. I took the liberty of removing the two "Change" columns and summarizing the data into a new data frame using typical R functions for grouping rows and summarizing results. Does the code below help you at all? I realize I do not know what you need to do next, so this could be useless to you.

library(tidyr)
library(dplyr)
DF <- read.csv("/home/fjcc/R/Play/Sepal.txt", sep = " ")
DF
#>    ID Sepal.Length Sepal.Width
#> 1   1          5.1         3.5
#> 2   1          4.0         4.0
#> 3   1          4.9         5.0
#> 4   1          5.0         9.0
#> 5   2          6.5         7.0
#> 6   2          7.0         6.0
#> 7   2          3.9         8.5
#> 8   2          7.0         7.5
#> 9   3          4.7         3.5
#> 10  3          7.0         2.6
Calc <- function(x, y) {
  MaxRow <- max(y)
  x[y == MaxRow] - sum(x[y < MaxRow])
}
DF2 <- DF %>% gather(key = Feature, value = "Value", Sepal.Length:Sepal.Width) %>%  
  group_by(ID, Feature) %>%  
  mutate(RowID = row_number()) %>%  
  summarize(Change = Calc(Value, RowID))
DF2
#> # A tibble: 6 x 3
#> # Groups:   ID [3]
#>      ID Feature       Change
#>   <int> <chr>          <dbl>
#> 1     1 Sepal.Length  -9    
#> 2     1 Sepal.Width   -3.5  
#> 3     2 Sepal.Length -10.4  
#> 4     2 Sepal.Width  -14    
#> 5     3 Sepal.Length   2.3  
#> 6     3 Sepal.Width   -0.900

^{Created on 2019-12-09 by the reprex package (v0.2.1)}

Elijawaa · December 10, 2019, 3:55pm

Hi FJCC,

Thanks for your quick response to my question. I guess my calculation earlier on deviated from what I needed in the actual data. So what I intend to do in the actual data is to sum up the difference at each time points which in this case will be the sum of (4.0-5.1)+(4.9-4.0)+(5.0-4.9) = (-1.1)+(0.9)+(0.1)=-0.1

So with the last row, since I did not want to create a new row, what I do is to take the difference/ change that has occurred at that time and add it to the sum of the previous change values. If a new row will help us to do this, then that will be fine.

Because the focus of this analysis is on the sum of these changes for each ID, I would like to create a new column called the change(#just beside each variable). Or create this change value in excel before importing into R ( as I have done)

I will take only the ID and the new values(thus the sum of the change) for the analysis.

Thanks and I hope you can help me again.

FJCC · December 10, 2019, 7:25pm

If you only need the new values and the IDs for the analysis, I would calculate them like this. The idea is to shift the values of each ID/Feature combination by one row in a new column and calculate the sum of the differences between the original data and the shifted data. Note that the data set I used only has the first two values of ID = 3.


library(tidyr)
library(dplyr)
DF <- read.csv("/home/fjcc/R/Play/Sepal.txt", sep = " ")
DF
#>    ID Sepal.Length Sepal.Width
#> 1   1          5.1         3.5
#> 2   1          4.0         4.0
#> 3   1          4.9         5.0
#> 4   1          5.0         9.0
#> 5   2          6.5         7.0
#> 6   2          7.0         6.0
#> 7   2          3.9         8.5
#> 8   2          7.0         7.5
#> 9   3          4.7         3.5
#> 10  3          7.0         2.6

DF2 <- DF %>% gather(key = Feature, value = "Value", Sepal.Length:Sepal.Width) %>%  
  group_by(ID, Feature) %>%  
  mutate(LeadVal = lead(Value)) %>%  
  summarize(Change = sum(LeadVal - Value, na.rm = TRUE))
DF2
#> # A tibble: 6 x 3
#> # Groups:   ID [3]
#>      ID Feature       Change
#>   <int> <chr>          <dbl>
#> 1     1 Sepal.Length -0.1000
#> 2     1 Sepal.Width   5.5   
#> 3     2 Sepal.Length  0.5   
#> 4     2 Sepal.Width   0.5   
#> 5     3 Sepal.Length  2.3   
#> 6     3 Sepal.Width  -0.900
DF2 <- DF2 %>% mutate(Feature = paste(Feature, "Chg", sep = "_")) %>% 
                 spread(key = Feature, value = Change)
DF2
#> # A tibble: 3 x 3
#> # Groups:   ID [3]
#>      ID Sepal.Length_Chg Sepal.Width_Chg
#>   <int>            <dbl>           <dbl>
#> 1     1          -0.1000           5.5  
#> 2     2           0.5              0.5  
#> 3     3           2.3             -0.900

^{Created on 2019-12-10 by the reprex package (v0.2.1)}

Elijawaa · December 12, 2019, 2:31pm

@FJCC Thank you so much for your help!!!

Elijawaa · December 17, 2019, 9:10pm

Hi FJCC,

I am trying to model a longitudinal data over time. Some of the variables are continues and others are categorical. Can you give me a guide on how to write the code.

Thanks

FJCC · December 17, 2019, 9:25pm

Please start a new thread since this is a different question. Also, I do not have experience modeling longitudinal data but I am sure other people on the forum can help you. If you prepare a Reproducible Example it will be much easier for others to help you.

FAQ: How to do a minimal reproducible example ( reprex ) for beginners Guides & FAQs

A minimal reproducible example consists of the following items: A minimal dataset, necessary to reproduce the issue The minimal runnable code necessary to reproduce the issue, which can be run on the given dataset, and including the necessary information on the used packages. Let's quickly go over each one of these with examples: Minimal Dataset (Sample Data) You need to provide a data frame that is small enough to be (reasonably) pasted on a post, but big enough to reproduce your issue. Let's say, as an example, that you are working with the iris data frame head(iris) #> Sepal.Length Sepal.Width Petal.Length Petal.Width Species #> 1 5.1 3.5 1.4 0.…

Elijawaa · December 18, 2019, 9:11am

Hello FJCC,

I was using the first code you helped me with on a larger data set of about 3000 rows and more than 100 columns and I had this result. I have NA’s at each column too but I am not sure if that’s the reason I am getting this.

Thanks

FJCC · December 18, 2019, 3:38pm

To see if the NAs are the problem, add na.rm = TRUE to the sum function.

x[y == MaxRow] - sum(x[y < MaxRow], na.rm = TRUE)

However, if the value in x[y == MaxRow] is NA, I do not think that will help. Does it make sense to replace the NAs with zero in this data set?

Elijawaa · December 18, 2019, 4:34pm

I will use na.rm=True so that I can account for the missing values.

This code works perfectly fine but I tried editing to take the difference between the first and last rows only but it did not work.

DF2 <- DF %>% gather(key = Feature, value = "Value", Sepal.Length:Sepal.Width) %>%
group_by(ID, Feature) %>%
mutate(LeadVal = lead(Value)) %>%
summarize(Change = LeadVal - last(Value), na.rm = TRUE))
DF2

Can the lag function also be used for the difference?

FJCC · December 19, 2019, 3:02am

DF <- read.csv("/home/fjcc/R/Play/Sepal.txt", sep = " ")
DF
#>    ID Sepal.Length Sepal.Width
#> 1   1          5.1         3.5
#> 2   1          4.0         4.0
#> 3   1          4.9         5.0
#> 4   1          5.0         9.0
#> 5   2          6.5         7.0
#> 6   2          7.0         6.0
#> 7   2          3.9         8.5
#> 8   2          7.0         7.5
#> 9   3          4.7         3.5
#> 10  3          7.0         2.6

library(dplyr)

library(tidyr)
DF2 <- DF %>% gather(key = Feature, value = "Value", Sepal.Length:Sepal.Width) %>%  
  group_by(ID, Feature) %>% 
  mutate(Row = row_number())

MaxRows <- DF2 %>% summarize(MaxRow = max(Row))

FirstRow = DF2 %>% filter(Row == 1)

LastRow <- DF2 %>% ungroup() %>% semi_join(MaxRows, by = c("ID", "Feature", "Row" = "MaxRow"))

Diffs <- inner_join(FirstRow, LastRow, by = c("ID", "Feature"), suffix = c(".First", ".Last"))
Diffs <- mutate(Diffs, Difference = Value.Last - Value.First)
Diffs
#> # A tibble: 6 x 7
#> # Groups:   ID, Feature [6]
#>      ID Feature      Value.First Row.First Value.Last Row.Last Difference
#>   <int> <chr>              <dbl>     <int>      <dbl>    <int>      <dbl>
#> 1     1 Sepal.Length         5.1         1        5          4    -0.1000
#> 2     2 Sepal.Length         6.5         1        7          4     0.5   
#> 3     3 Sepal.Length         4.7         1        7          2     2.3   
#> 4     1 Sepal.Width          3.5         1        9          4     5.5   
#> 5     2 Sepal.Width          7           1        7.5        4     0.5   
#> 6     3 Sepal.Width          3.5         1        2.6        2    -0.900

^{Created on 2019-12-18 by the reprex package (v0.2.1)}

system · December 26, 2019, 3:02am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.