How to eliminate outlier trajectories from a dataset?

Hello everyone,

I would like to find a way to eliminate outliers that fall below the 25th or above the 75th percentile of my dataset. The issue is that each row of my dataset represents a trajectory, and I would like to remove not only individual values but the whole trajectory if it is an outlier at least at one point along its duration (columns F11 to F110).

Here is a sample of my data:

I would appreciate any help! If you have any questions or if I did not express myself clearly, please do not hesitate to ask.

I interpreted the 25/75 test as applying to the entire data frame rather than to each row separately. Applied row by row, the test would flag every trajectory, because each row's minimum is always at or below that row's own 25th percentile. The approach is to move from the initial data frame to its two subsets, outliers and keepers, through a series of "what" questions: what must be done to bring the initial object closer to the desired object, and which function does that.
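A minimal sketch of why the whole-data-frame interpretation matters, using a hypothetical 3 x 4 matrix `toy` (made-up numbers, not the asker's data; `toy`, `per_row` and `whole` are illustrative names). Comparing each row with its own quartiles flags every row, while comparing all rows with one shared pair of quartiles singles out only the extreme trajectories.

```r
# Hypothetical toy data: 3 trajectories (rows) x 4 time points (columns)
toy <- matrix(c(10, 11, 12, 13,
                20, 21, 22, 23,
                30, 31, 32, 33),
              nrow = 3, byrow = TRUE)
probs <- c(0.25, 0.75)

# Row by row: each trajectory is compared with its own quartiles.
# Its minimum is always <= its own 25th percentile, so every row is flagged.
per_row <- apply(toy, 1, function(v) {
  row_q <- quantile(v, probs)
  any(v <= row_q[1] | v >= row_q[2])
})
print(per_row)  # TRUE TRUE TRUE

# Whole data set: one pair of quartiles (12.75 and 30.25) shared by all rows
q_all <- quantile(toy, probs)
whole <- apply(toy <= q_all[1] | toy >= q_all[2], 1, any)
print(whole)    # TRUE FALSE TRUE
```

Only the middle trajectory passes the shared-quartile test, which is the behavior the question asks for.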

# data: one trajectory per row, observed in columns F10 to F110
d <- data.frame(
  C    = c("w", "w", "w", "w", "w", "l", "l", "l", "w", "w"),
  F10  = c(858, 831, 614, 802, 782, 472, 449, 629, 560, 565),
  F11  = c(864, 825, 615, 750, 738, 446, 454, 510, 565, 567),
  F12  = c(872, 812, 618, 654, 680, 430, 453, 474, 556, 558),
  F13  = c(898, 772, 621, 563, 642, 428, 457, 472, 561, 544),
  F14  = c(853, 718, 621, 529, 625, 438, 452, 481, 558, 531),
  F15  = c(691, 677, 617, 515, 626, 482, 465, 491, 543, 519),
  F16  = c(642, 642, 615, 533, 576, 506, 494, 503, 569, 512),
  F17  = c(639, 619, 615, 566, 611, 511, 515, 512, 549, 512),
  F18  = c(630, 603, 614, 605, 627, 507, 562, 576, 582, 517),
  F19  = c(630, 590, 617, 640, 630, 514, 622, 610, 580, 527),
  F110 = c(645, 579, 624, 630, 606, 562, 648, 673, 597, 540)
)

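# functions
# TRUE for each value of trajectory x (columns F10 to F110) that lies at or
# beyond the data-frame-wide quartiles q
df_spot_outliers <- function(x) {
  v <- unlist(d[x, 2:12])
  v <= q[1] | v >= q[2]
}
# row-by-row helpers (not used below): quartiles of trajectory x, and the test
# against per-row quartiles r <- t(sapply(seq_len(nrow(d)), my_quantile))
my_quantile <- function(x) quantile(unlist(d[x, 2:12]), probs = the_probs)
spot_outliers <- function(x) {
  v <- unlist(d[x, 2:12])
  v <= r[x, 1] | v >= r[x, 2]
}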

# main

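the_probs <- c(0.25, 0.75)

# outliers are judged against the quartiles of all values in the data frame
# (columns F10 to F110, the same columns df_spot_outliers tests)

(q <- quantile(unlist(d[2:12]), probs = the_probs))
#>   25%   75% 
#> 515.5 630.0

# m[i, j] is TRUE if trajectory i is an outlier at time point j
m <- matrix(nrow = nrow(d), ncol = 11)
for (i in seq_len(nrow(d))) m[i, ] <- df_spot_outliers(i)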
(outliers <- d[which(rowMeans(m) != 0),])
#>    C F10 F11 F12 F13 F14 F15 F16 F17 F18 F19 F110
#> 1  w 858 864 872 898 853 691 642 639 630 630  645
#> 2  w 831 825 812 772 718 677 642 619 603 590  579
#> 4  w 802 750 654 563 529 515 533 566 605 640  630
#> 5  w 782 738 680 642 625 626 576 611 627 630  606
#> 6  l 472 446 430 428 438 482 506 511 507 514  562
#> 7  l 449 454 453 457 452 465 494 515 562 622  648
#> 8  l 629 510 474 472 481 491 503 512 576 610  673
#> 10 w 565 567 558 544 531 519 512 512 517 527  540
(keepers <- d[which(rowMeans(m) == 0),])
#>   C F10 F11 F12 F13 F14 F15 F16 F17 F18 F19 F110
#> 3 w 614 615 618 621 621 617 615 615 614 617  624
#> 9 w 560 565 556 561 558 543 569 549 582 580  597
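The loop that fills `m` can also be written as a single vectorized comparison. A minimal sketch, assuming (as in the answer) that the quartiles are taken over all numeric columns F10 to F110; the names `vals`, `flag` and `is_outlier` are my own, and `rowSums(flag) > 0` plays the role of `rowMeans(m) != 0`.

```r
# Same sample data as above: one trajectory per row
d <- data.frame(
  C    = c("w", "w", "w", "w", "w", "l", "l", "l", "w", "w"),
  F10  = c(858, 831, 614, 802, 782, 472, 449, 629, 560, 565),
  F11  = c(864, 825, 615, 750, 738, 446, 454, 510, 565, 567),
  F12  = c(872, 812, 618, 654, 680, 430, 453, 474, 556, 558),
  F13  = c(898, 772, 621, 563, 642, 428, 457, 472, 561, 544),
  F14  = c(853, 718, 621, 529, 625, 438, 452, 481, 558, 531),
  F15  = c(691, 677, 617, 515, 626, 482, 465, 491, 543, 519),
  F16  = c(642, 642, 615, 533, 576, 506, 494, 503, 569, 512),
  F17  = c(639, 619, 615, 566, 611, 511, 515, 512, 549, 512),
  F18  = c(630, 603, 614, 605, 627, 507, 562, 576, 582, 517),
  F19  = c(630, 590, 617, 640, 630, 514, 622, 610, 580, 527),
  F110 = c(645, 579, 624, 630, 606, 562, 648, 673, 597, 540)
)

# numeric part only (drop the label column C)
vals <- as.matrix(d[, -1])

# one pair of quartiles for the whole data set
q <- quantile(vals, probs = c(0.25, 0.75))

# TRUE where a single observation lies at or beyond the quartiles
flag <- vals <= q[1] | vals >= q[2]

# a trajectory is an outlier if it is flagged at least once
is_outlier <- rowSums(flag) > 0

outliers <- d[is_outlier, ]
keepers  <- d[!is_outlier, ]
```

With this sample data it keeps trajectories 3 and 9, the same rows as the loop above.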

Thank you very much, that helped me a lot!

