How to aggregate a column grouping by time

Panagiotis_Emmanouil · December 7, 2023, 5:31pm

Hi all,
I am working on the Bellabeat public dataset and I want to aggregate heart rate values by seconds(60 secons) using the Time column which I created:
heart_rate<- heart_rate%>%
mutate(Time=as_hms(Time))
Does anyone have an idea how I can do it?
Thanks
Panos

jrkrideau · December 7, 2023, 10:44pm

The Bellabeat public dataset seems to have ~9 .csv files. Which one are you using?

Can you also show us the code you are using Just copy the code and paste it between these tickmarks.
```

```

Thanks.

Panagiotis_Emmanouil · December 8, 2023, 8:06am

heart_rate<- read.csv("heartrate_seconds_merged.csv")
heart_rate$Time=as.POSIXct(heart_rate$Time, format="%m/%d/%Y %I:%M:%S %p", tz=Sys.timezone())
glimpse(heart_rate)
heart_rate<- heart_rate%>%
separate(Time,c("Date","Time"),sep=" ")
glimpse(heart_rate)
heart_rate<- heart_rate%>%
mutate(Date=as.Date(Date))
glimpse(heart_rate)
heart_rate<- heart_rate%>%
mutate(Time=as_hms(Time))
Up to that point I have worked I split the Date column to Date and Time(hms). I want to aggegate values every 60 seconds

jrkrideau · December 8, 2023, 12:30pm

Thanks for the code.

I downloaded the file archive.zip from kaggle . When the archive is unpacked I get two folders Bellabeat Case Study data and Bellabeat Case Study pivot tables but I cannot see heartrate_seconds_merged.csv.
Below is a screenshot of the contents of Bellabeat Case Study data.

Have I downloaded the wrong dataset? If so, can you point me to a better source or. perhaps, you could just post some sample data here? A handy way to supply some sample data is the dput() function. In the case of a large dataset something like dput(head(mydata, 100)) should supply the data we need. Just do dput(mydata) where mydata is your data. Copy the output and paste it here between
```

```

Panagiotis_Emmanouil · December 8, 2023, 12:49pm

Hi,
the original file name was "

Bellabeat-Case-Study/FitBit Dataset/Fitabase Data 4.12.16-5.12.16.zip at main · sayantanbagchi/Bellabeat-Case-Study · GitHub

jrkrideau · December 8, 2023, 1:24pm

Thanks. Your link took me to Fitabase Data 4.12.16-5.12.16.zip which when downloaded and unpacked gives me heartrate_seconds_merged.csv. I do not see a .xlsx file. Should I assume that heartrate_seconds_merged.csv is the .csv equivalent?

Panagiotis_Emmanouil · December 8, 2023, 4:01pm

I added a code my analysis:
t<- heart_rate %>%
mutate(Time = as.POSIXct(paste(Date, Time))) %>%
group_by(Time = cut(Time, breaks="1 min"),Id) %>%
mutate(HeartRate = mean(Value))
However the output is not what i wanted:

jrkrideau · December 8, 2023, 4:35pm

Does something like this do you any good?
(shamelessly stolen from R aggregate second data to minutes more efficient)

library(data.table)
DT1 <- fread("~/RJunk/datasets/heartrate_seconds_merged.csv")
DT1[, Time := as.POSIXct(Time, format="%m/%d/%Y %I:%M:%S %p", tz=Sys.timezone())]
DT1[, agg := sum(Value), by = .(timestamp = cut(Time, breaks= "1 min"))]

jrkrideau · December 8, 2023, 4:53pm

OOPS. that should read

library(data.table)
DT1 <- fread("~/RJunk/datasets/heartrate_seconds_merged.csv")
DT1[, Time := as.POSIXct(Time, format="%m/%d/%Y %I:%M:%S %p", tz=Sys.timezone())]
DT1[, agg :=  mean(Value), by = .(timestamp = cut(Time, breaks= "1 min"))]
DT

jrkrideau · December 11, 2023, 1:27pm

I finally found some time to get back to this problem.

I think this may do what you want.

library(data.table)
library(lubridate)
DT1<- fread("~/RJunk/datasets/heartrate_seconds_merged.csv")
DT1[, Time := as.POSIXct(Time, format="%m/%d/%Y %I:%M:%S %p", tz=Sys.timezone())]
DT1[, agg := mean(Value),  by = .(timestamp = cut(Time, breaks= "1 min"))]
DT1[, Time := floor_date(Time, "minute")]

DT2  <- DT1[, .SD[1], by= Time]
DT2[1:5,]

Panagiotis_Emmanouil · December 11, 2023, 5:16pm

Hi,
a quick question. I haven't understood why you replace the DT1 frame with the DT2 data frame. DT2 <- DT1[, .SD[1], by= Time]
SD is the function?
Many thanks for all your support. You are a star

jrkrideau · December 11, 2023, 5:43pm

DT1 is still the complete 2,483,658 row dataset while DT2 is only 44,181 rows.

DT1[, .N]
DT2[, .N]

I did not have to create a new data.table. It was just easier when I was debugging plus I like to have a complete dataset handy if I need to compare results. It's probably cleaner not to have DT2 in the final code.

SD is a miraculous data.table function that does all sorts of amazing things. Someday, I may even master a few of them.How to use .SD in the R data.table package or better yet, Making .SD your best friend. Some of How to use .SD in the R data.table package is dated and the code does not run without editing.

system · January 1, 2024, 5:44pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.