Hello all,
I am looking to compare pollution episodes to find events that overlap. I have 3 lists of pollution episodes for different species, all of different lengths.
I want to be able to pick out the date times where the overlap occurs.
Here is the code that I have so far (I've no idea how to even attempt this and previous attempts revealed a long list of true/false answers that I didn't know how to parse.
Files can be found here .
library(tidyverse)
library(purrr)
library(dplyr)
library(lubridate)
library(intervals)
knitr::opts_chunk$set(warning = FALSE, message = FALSE)
CO <- read.csv("file1.csv")
CO$CO.Time.Start<- ymd_hm(CO$CO.Time.Start)
CO$CO.Time.End <- ymd_hm(CO$CO.Time.End)
CO <- CO %>% mutate(interval = interval(start = CO.Time.Start, end = CO.Time.End))
NO <- read.csv("file2.csv")
NO$NO.Start.Time <- ymd_hm(NO$NO.Start.Time)
NO$NO.Stop.Time <- ymd_hm(NO$NO.Stop.Time)
NO <- NO %>% mutate(interval = interval(start = NO.Start.Time, end = NO.Stop.Time))
PM <- read.csv("C:\\file3.csv")
PM$PM.Start.Time <- ymd_hm(PM$PM.Start.Time)
PM$PM.Stop.Time <- ymd_hm(PM$PM.Stop.Time)
PM <- PM %>% mutate(interval = interval(start = PM.Start.Time, end = PM.Stop.Time))
CO / NO Episodes
NO$Event[which(CO$interval %within% NO$interval) ]
CO / PM2.5 Episodes
PM$Event[which(CO$interval %within% PM$interval)]
NO / PM2.5 Episodes
PM$Event[which(NO$interval %within% PM$interval)]
CO / NO / PM2.5 Episodes
PM$Event[which(NO$interval %within% PM$interval & CO$interval %within% PM$interval)]
There should be overlap with PM and CO on July 5th, for example, and that doesn't get reported.
Number of episodes for each species, if it matters:
CO = 67
NO = 76
PM = 86
I think you are misreading what the {intervals} package is doing. See Overview of the intervals package .
I think what you may need is the "interval" and "int_overlaps" commands in {lubridate} package though, at the moment I an not sure what the unequal N's are doing.
Here is what I am getting so far. Note I have drastically shortened your file names and renamed your variables to cut down on typing and to remove the spaces in your variable names. R really does not like something like CO Time Start as a variable name.
Does this look like any you want?
suppressMessages(library(data.table))
suppressMessages(library(tidyverse))
CO <- fread("CO.csv")
NO <- fread("NO.csv")
PM <- fread("PM.csv")
# Rename
names(CO) <- names(NO) <- names(PM) <- c("event", "begn", "fin")
# Convert character data to POSIXct format
CO[, c("begn", "fin") := lapply(.SD, ymd_hm),
.SDcols = c("begn", "fin")]
NO[, c("begn", "fin") := lapply(.SD, ymd_hm),
.SDcols = c("begn", "fin")]
PM[, c("begn", "fin") := lapply(.SD, ymd_hm),
.SDcols = c("begn", "fin")]
# Get intervals
COInt <- CO[, interval(begn, fin)]
NOInt <- NO[, interval(begn, fin)]
PMInt <- PM[, interval(begn, fin)]
# Look for overlaps
int_overlaps(COInt, NOInt)
int_overlaps(COInt, PMInt)
Sort of - I had tried this method beforehand and was gifted a very long list of TRUE/FALSE.
To save time/manually counting, I want to be able to pick out the event numbers only.
My other gripe with int_overlaps is that it can only check between two intervals - I'm looking for the intersection of all 3 as well. Saving the result to a variable isn't helpful in this instance because it's a logical vector and doesn't check again.
Yes, it is returning a logical vector so it's doing what it is intended to do. I think I have a bit better idea of what you want though I am not sure of how to go about it. I'll poke around a bit and see what I can do.
With luck someone with a better grasp of {lubridate} data formats may be able to come up with some ideas.
FJCC
February 22, 2024, 3:58pm
5
The join_by() function of dplyr with its within() and overlaps() helpers may be what you need.
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#>
#> date, intersect, setdiff, union
NO <- read.csv("~/R/Play/NO.csv")
CO <- read.csv("~/R/Play/CO.csv")
PM <- read.csv("~/R/Play/PM.csv")
NO <- NO |> mutate(across(2:3, ymd_hm))
CO <- CO |> mutate(across(2:3, ymd_hm))
PM <- PM |> mutate(across(2:3, ymd_hm))
#> Warning: There was 1 warning in `mutate()`.
#> ℹ In argument: `across(2:3, ymd_hm)`.
#> Caused by warning:
#> ! 2 failed to parse.
#which NO envents started and ended during a CO event?
NO_CO <- inner_join(NO, CO, join_by(within(NO_Start, NO_End, CO_Start, CO_End)))
head(NO_CO)
#> Event.x NO_Start NO_End Event.y CO_Start
#> 1 9 2023-07-22 07:00:00 2023-07-22 13:00:00 12 2023-07-21 06:00:00
#> 2 12 2023-08-06 05:00:00 2023-08-06 13:00:00 19 2023-08-05 09:00:00
#> 3 15 2023-08-12 21:00:00 2023-08-13 02:00:00 23 2023-08-12 12:00:00
#> 4 20 2023-08-28 03:00:00 2023-08-28 13:00:00 33 2023-08-27 17:00:00
#> 5 23 2023-09-07 09:00:00 2023-09-07 14:00:00 37 2023-09-06 15:00:00
#> 6 24 2023-09-07 20:00:00 2023-09-08 02:00:00 37 2023-09-06 15:00:00
#> CO_End
#> 1 2023-07-22 23:00:00
#> 2 2023-08-07 02:00:00
#> 3 2023-08-13 10:00:00
#> 4 2023-08-28 13:00:00
#> 5 2023-09-08 11:00:00
#> 6 2023-09-08 11:00:00
#which NO events overlap CO events
NO_CO_2 <- inner_join(NO, CO, join_by(overlaps(NO_Start, NO_End, CO_Start, CO_End)))
head(NO_CO_2)
#> Event.x NO_Start NO_End Event.y CO_Start
#> 1 5 2023-07-09 02:00:00 2023-07-09 13:00:00 8 2023-07-08 07:00:00
#> 2 9 2023-07-22 07:00:00 2023-07-22 13:00:00 12 2023-07-21 06:00:00
#> 3 10 2023-07-24 06:00:00 2023-07-24 13:00:00 13 2023-07-23 03:00:00
#> 4 12 2023-08-06 05:00:00 2023-08-06 13:00:00 19 2023-08-05 09:00:00
#> 5 13 2023-08-07 02:00:00 2023-08-07 11:00:00 19 2023-08-05 09:00:00
#> 6 15 2023-08-12 21:00:00 2023-08-13 02:00:00 23 2023-08-12 12:00:00
#> CO_End
#> 1 2023-07-09 10:00:00
#> 2 2023-07-22 23:00:00
#> 3 2023-07-24 06:00:00
#> 4 2023-08-07 02:00:00
#> 5 2023-08-07 02:00:00
#> 6 2023-08-13 10:00:00
Created on 2024-02-22 with reprex v2.0.2
1 Like
FJCC:
NO_CO <- inner_join(NO, CO, join_by(within(NO_Start, NO_End, CO_Start, CO_End)))
head(NO_CO)
Thank you! This worked. Been breaking my head with this all week.
Lovely, I was completely forgetting a join in this situation.
system
Closed
February 29, 2024, 4:15pm
8
This topic was automatically closed 7 days after the last reply. New replies are no longer allowed. If you have a query related to it or one of the replies, start a new topic and refer back with a link.