how to run a function for each subject within a dataset?

Beavet82 · May 3, 2025, 5:54pm

Hi, I am looking for some advice.

I am very basic user and writing functions or loops is outside of what I feel comfortable.

I am working on a dataset with repeated measurements for several subjects.
I have come across a function that someone wrote which I can use with my database, tested and worked.

The issue I am having is that it only works if I filter out of my dataset the data for each individual before running the function.

I am sure this can be resolved adding some code to extract from the dataset each of the individuals to generate the value calculated by the function, and creating a new dataset... but I have no idea how to do it and after watching several videos I am no closer to a solution than 48h ago...

Any help will be appreciated.
Thanks

FJCC · May 3, 2025, 7:12pm

It's hard to give specific advice without seeing some of your data and the function you want to apply to it. Can out post the data for two subjects and the function? To post data, make the subset then run the dput function and post its output. If your data set with two subjects is called DF. Run

dput(DF)

and post the output.
When posting either function output or the code of the function you want to run, put lines with three back ticks before and after text, like this
```
function output or code goes here
```

prubin · May 3, 2025, 7:19pm

I'll demonstrate one approach using the iris dataset (pretend each species is a subject) and the mean function (substituting for your function).

library(dplyr)
results <- iris |> group_by(Species) |> summarize(SLmean = mean(Sepal.Length), PLmean = mean(Petal.Length))
results
#> # A tibble: 3 × 3
#>   Species    SLmean PLmean
#>   <fct>       <dbl>  <dbl>
#> 1 setosa       5.01   1.46
#> 2 versicolor   5.94   4.26
#> 3 virginica    6.59   5.55

^{Created on 2025-05-03 with reprex v2.1.1}

Beavet82 · May 5, 2025, 8:24am

Good morning,

thanks for the feedback. I needed to refresh myself how to insert a code in Posit to be reproduceable.

This is an example dataset:

require(datasets)
Theoph
dat <- Theoph
dat$time <- dat$Time

This is the code I want to use (with thanks to Ace and Helmut, Bioequivalence and Bioavailability Forum • Calculation of time above MIC):

# Code for getting time over MIC of a **thereshold** of x (thereshold  labelled as th):

f2 <- function(dat,th,logarithmic=FALSE) {
  above = 0
  w <- dat$conc > th
  w[w == FALSE] <- 0
  w[w == TRUE] <- 1
  w2 <- which(abs(diff(w)) == 1)
  if (logarithmic == FALSE) {
    for (i in w2) {
      n1 <- diff(w)[i] * -1*(dat$time[i+1] + ((dat$time[i]-dat$time[i+1]) *
                                                (th-dat$conc[i+1])/(dat$conc[i]-dat$conc[i+1])))
      above <- above + n1
    }
  }
  if (logarithmic == TRUE) {
    for (i in w2) {
      if (diff(w)[i] == 1) {
        n1 <- (dat$time[i+1] + ((dat$time[i]-dat$time[i+1]) *
                                  (th-dat$conc[i+1])/(dat$conc[i]-dat$conc[i+1])))
        above <- above - n1
      }
      if (diff(w)[i] == -1) {
        n1 <- (dat$time[i+1] + ((dat$time[i]-dat$time[i+1]) *
                                  (log(th)-log(dat$conc[i+1]))/(log(dat$conc[i])-log(dat$conc[i+1]))))
        above <- above + n1
      }
    }
  }
  return(above)
}

# Determine what concentration MIC thereshold (th = 4):

f2(dat, th = 4, logarithmic = TRUE)
f2(dat, th = 4, logarithmic = FALSE)

The above gives me the time in hours when the MIC is over 4, but for all subjects combined of the dataset.

What i would like to get is the individual times, and so far I know how to do it only one-by-one:

# Extract one subjet and apply re-run function f2.

dat <- dat %>%
  filter(Subject ==1)

f2 <- function(dat,th,logarithmic=FALSE) {
  above = 0
  w <- dat$conc > th
  w[w == FALSE] <- 0
  w[w == TRUE] <- 1
  w2 <- which(abs(diff(w)) == 1)
  if (logarithmic == FALSE) {
    for (i in w2) {
      n1 <- diff(w)[i] * -1*(dat$time[i+1] + ((dat$time[i]-dat$time[i+1]) *
                                                (th-dat$conc[i+1])/(dat$conc[i]-dat$conc[i+1])))
      above <- above + n1
    }
  }
  if (logarithmic == TRUE) {
    for (i in w2) {
      if (diff(w)[i] == 1) {
        n1 <- (dat$time[i+1] + ((dat$time[i]-dat$time[i+1]) *
                                  (th-dat$conc[i+1])/(dat$conc[i]-dat$conc[i+1])))
        above <- above - n1
      }
      if (diff(w)[i] == -1) {
        n1 <- (dat$time[i+1] + ((dat$time[i]-dat$time[i+1]) *
                                  (log(th)-log(dat$conc[i+1]))/(log(dat$conc[i])-log(dat$conc[i+1]))))
        above <- above + n1
      }
    }
  }
  return(above)
}

f2(dat, th = 4, logarithmic = TRUE)
f2(dat, th = 4, logarithmic = FALSE)

The above give me the results I need but i have to run the code for the n individuals making my dataset.
Also the result appears only in the console rather than integrating it to the dataset as a new variable or making a new dataset with the results for each of the individuals.

Hope this post makes my question more clear.

many thanks
Beatriz

StatSteph · May 5, 2025, 1:14pm

Here's a solution:

require(datasets)
Theoph
#>     Subject   Wt Dose  Time  conc
#> 1         1 79.6 4.02  0.00  0.74
#> 2         1 79.6 4.02  0.25  2.84
#> 3         1 79.6 4.02  0.57  6.57
#> 4         1 79.6 4.02  1.12 10.50
#> 5         1 79.6 4.02  2.02  9.66
#> 6         1 79.6 4.02  3.82  8.58
#> 7         1 79.6 4.02  5.10  8.36
#> 8         1 79.6 4.02  7.03  7.47
#> 9         1 79.6 4.02  9.05  6.89
#> 10        1 79.6 4.02 12.12  5.94
#> 11        1 79.6 4.02 24.37  3.28
#> 12        2 72.4 4.40  0.00  0.00
#> 13        2 72.4 4.40  0.27  1.72
#> 14        2 72.4 4.40  0.52  7.91
#> 15        2 72.4 4.40  1.00  8.31
#> 16        2 72.4 4.40  1.92  8.33
#> 17        2 72.4 4.40  3.50  6.85
#> 18        2 72.4 4.40  5.02  6.08
#> 19        2 72.4 4.40  7.03  5.40
#> 20        2 72.4 4.40  9.00  4.55
#> 21        2 72.4 4.40 12.00  3.01
#> 22        2 72.4 4.40 24.30  0.90
#> 23        3 70.5 4.53  0.00  0.00
#> 24        3 70.5 4.53  0.27  4.40
#> 25        3 70.5 4.53  0.58  6.90
#> 26        3 70.5 4.53  1.02  8.20
#> 27        3 70.5 4.53  2.02  7.80
#> 28        3 70.5 4.53  3.62  7.50
#> 29        3 70.5 4.53  5.08  6.20
#> 30        3 70.5 4.53  7.07  5.30
#> 31        3 70.5 4.53  9.00  4.90
#> 32        3 70.5 4.53 12.15  3.70
#> 33        3 70.5 4.53 24.17  1.05
#> 34        4 72.7 4.40  0.00  0.00
#> 35        4 72.7 4.40  0.35  1.89
#> 36        4 72.7 4.40  0.60  4.60
#> 37        4 72.7 4.40  1.07  8.60
#> 38        4 72.7 4.40  2.13  8.38
#> 39        4 72.7 4.40  3.50  7.54
#> 40        4 72.7 4.40  5.02  6.88
#> 41        4 72.7 4.40  7.02  5.78
#> 42        4 72.7 4.40  9.02  5.33
#> 43        4 72.7 4.40 11.98  4.19
#> 44        4 72.7 4.40 24.65  1.15
#> 45        5 54.6 5.86  0.00  0.00
#> 46        5 54.6 5.86  0.30  2.02
#> 47        5 54.6 5.86  0.52  5.63
#> 48        5 54.6 5.86  1.00 11.40
#> 49        5 54.6 5.86  2.02  9.33
#> 50        5 54.6 5.86  3.50  8.74
#> 51        5 54.6 5.86  5.02  7.56
#> 52        5 54.6 5.86  7.02  7.09
#> 53        5 54.6 5.86  9.10  5.90
#> 54        5 54.6 5.86 12.00  4.37
#> 55        5 54.6 5.86 24.35  1.57
#> 56        6 80.0 4.00  0.00  0.00
#> 57        6 80.0 4.00  0.27  1.29
#> 58        6 80.0 4.00  0.58  3.08
#> 59        6 80.0 4.00  1.15  6.44
#> 60        6 80.0 4.00  2.03  6.32
#> 61        6 80.0 4.00  3.57  5.53
#> 62        6 80.0 4.00  5.00  4.94
#> 63        6 80.0 4.00  7.00  4.02
#> 64        6 80.0 4.00  9.22  3.46
#> 65        6 80.0 4.00 12.10  2.78
#> 66        6 80.0 4.00 23.85  0.92
#> 67        7 64.6 4.95  0.00  0.15
#> 68        7 64.6 4.95  0.25  0.85
#> 69        7 64.6 4.95  0.50  2.35
#> 70        7 64.6 4.95  1.02  5.02
#> 71        7 64.6 4.95  2.02  6.58
#> 72        7 64.6 4.95  3.48  7.09
#> 73        7 64.6 4.95  5.00  6.66
#> 74        7 64.6 4.95  6.98  5.25
#> 75        7 64.6 4.95  9.00  4.39
#> 76        7 64.6 4.95 12.05  3.53
#> 77        7 64.6 4.95 24.22  1.15
#> 78        8 70.5 4.53  0.00  0.00
#> 79        8 70.5 4.53  0.25  3.05
#> 80        8 70.5 4.53  0.52  3.05
#> 81        8 70.5 4.53  0.98  7.31
#> 82        8 70.5 4.53  2.02  7.56
#> 83        8 70.5 4.53  3.53  6.59
#> 84        8 70.5 4.53  5.05  5.88
#> 85        8 70.5 4.53  7.15  4.73
#> 86        8 70.5 4.53  9.07  4.57
#> 87        8 70.5 4.53 12.10  3.00
#> 88        8 70.5 4.53 24.12  1.25
#> 89        9 86.4 3.10  0.00  0.00
#> 90        9 86.4 3.10  0.30  7.37
#> 91        9 86.4 3.10  0.63  9.03
#> 92        9 86.4 3.10  1.05  7.14
#> 93        9 86.4 3.10  2.02  6.33
#> 94        9 86.4 3.10  3.53  5.66
#> 95        9 86.4 3.10  5.02  5.67
#> 96        9 86.4 3.10  7.17  4.24
#> 97        9 86.4 3.10  8.80  4.11
#> 98        9 86.4 3.10 11.60  3.16
#> 99        9 86.4 3.10 24.43  1.12
#> 100      10 58.2 5.50  0.00  0.24
#> 101      10 58.2 5.50  0.37  2.89
#> 102      10 58.2 5.50  0.77  5.22
#> 103      10 58.2 5.50  1.02  6.41
#> 104      10 58.2 5.50  2.05  7.83
#> 105      10 58.2 5.50  3.55 10.21
#> 106      10 58.2 5.50  5.05  9.18
#> 107      10 58.2 5.50  7.08  8.02
#> 108      10 58.2 5.50  9.38  7.14
#> 109      10 58.2 5.50 12.10  5.68
#> 110      10 58.2 5.50 23.70  2.42
#> 111      11 65.0 4.92  0.00  0.00
#> 112      11 65.0 4.92  0.25  4.86
#> 113      11 65.0 4.92  0.50  7.24
#> 114      11 65.0 4.92  0.98  8.00
#> 115      11 65.0 4.92  1.98  6.81
#> 116      11 65.0 4.92  3.60  5.87
#> 117      11 65.0 4.92  5.02  5.22
#> 118      11 65.0 4.92  7.03  4.45
#> 119      11 65.0 4.92  9.03  3.62
#> 120      11 65.0 4.92 12.12  2.69
#> 121      11 65.0 4.92 24.08  0.86
#> 122      12 60.5 5.30  0.00  0.00
#> 123      12 60.5 5.30  0.25  1.25
#> 124      12 60.5 5.30  0.50  3.96
#> 125      12 60.5 5.30  1.00  7.82
#> 126      12 60.5 5.30  2.00  9.72
#> 127      12 60.5 5.30  3.52  9.75
#> 128      12 60.5 5.30  5.07  8.57
#> 129      12 60.5 5.30  7.07  6.59
#> 130      12 60.5 5.30  9.03  6.11
#> 131      12 60.5 5.30 12.05  4.57
#> 132      12 60.5 5.30 24.15  1.17
dat <- Theoph
dat$time <- dat$Time

# Code for getting time over MIC of a **thereshold** of x (thereshold  labelled as th):

f2 <- function(dat,th,logarithmic=FALSE) {
  above = 0
  w <- dat$conc > th
  w[w == FALSE] <- 0
  w[w == TRUE] <- 1
  w2 <- which(abs(diff(w)) == 1)
  if (logarithmic == FALSE) {
    for (i in w2) {
      n1 <- diff(w)[i] * -1*(dat$time[i+1] + ((dat$time[i]-dat$time[i+1]) *
                                                (th-dat$conc[i+1])/(dat$conc[i]-dat$conc[i+1])))
      above <- above + n1
    }
  }
  if (logarithmic == TRUE) {
    for (i in w2) {
      if (diff(w)[i] == 1) {
        n1 <- (dat$time[i+1] + ((dat$time[i]-dat$time[i+1]) *
                                  (th-dat$conc[i+1])/(dat$conc[i]-dat$conc[i+1])))
        above <- above - n1
      }
      if (diff(w)[i] == -1) {
        n1 <- (dat$time[i+1] + ((dat$time[i]-dat$time[i+1]) *
                                  (log(th)-log(dat$conc[i+1]))/(log(dat$conc[i])-log(dat$conc[i+1]))))
        above <- above + n1
      }
    }
  }
  return(above)
}

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(tidyr)
library(purrr)

dat |>
  group_by(Subject) |>
  nest() |>
  mutate(
    Time=map_dbl(data, ~f2(.x, th=4, logarithmic = TRUE))
  )
#> # A tibble: 12 × 3
#> # Groups:   Subject [12]
#>    Subject data               Time
#>    <ord>   <list>            <dbl>
#>  1 1       <tibble [11 × 5]> 19.9 
#>  2 2       <tibble [11 × 5]>  9.57
#>  3 3       <tibble [11 × 5]> 11.0 
#>  4 4       <tibble [11 × 5]> 11.9 
#>  5 5       <tibble [11 × 5]> 12.6 
#>  6 6       <tibble [11 × 5]>  6.34
#>  7 7       <tibble [11 × 5]>  9.48
#>  8 8       <tibble [11 × 5]>  9.41
#>  9 9       <tibble [11 × 5]>  8.93
#> 10 10      <tibble [11 × 5]> 16.3 
#> 11 11      <tibble [11 × 5]>  7.86
#> 12 12      <tibble [11 × 5]> 12.7

^{Created on 2025-05-05 with reprex v2.1.1}

Beavet82 · May 8, 2025, 10:03am

Thanks a lot @StatSteph , this update in the code worked for my data-set.

Kind regards
Beatriz

system · May 15, 2025, 10:03am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.