Analyze ridership data by type and weekday

Yomi_Opackz · July 18, 2023, 6:41pm

I tried to analyze the data by user type and weekday, which also involves creating a weekday field.

Group by usertype and weekday

Calculate the number of rides and average duration.

This is the syntax:

```r
  mutate(weekday = wday(started_at, label = TRUE)) %>%
  group_by(member_casual, weekday) %>%
  summarise(number_of_rides = n(),
            average_duration = mean(ride_length)) %>%
  arrange(member_casual, weekday)
```

# The error message reads as follows:
#> Error in `mutate()`:
ℹ In argument: `weekday = wday(started_at, label = TRUE)`.
Caused by error in `wday()`:
! could not find function "wday"
Run `rlang::last_trace()` to see where the error occurred.

#> rlang::last_trace()
<error/dplyr:::mutate_error>
Error in `mutate()`:
ℹ In argument: `weekday = wday(started_at, label = TRUE)`.
Caused by error in `wday()`:
! could not find function "wday"

Here is the head of the data:
#> # A tibble: 5 × 15
  ride_id  started_at          ended_at            rideable_type start…¹ start…²
  <chr>    <dttm>              <dttm>              <chr>           <dbl> <chr>  
1 22178529 2019-04-01 00:02:22 2019-04-01 00:09:48 6251               81 Daley …
2 22178530 2019-04-01 00:03:02 2019-04-01 00:20:30 6226              317 Wood S…
3 22178531 2019-04-01 00:11:07 2019-04-01 00:15:19 5649              283 LaSall…
4 22178532 2019-04-01 00:13:01 2019-04-01 00:18:58 4151               26 McClur…
5 22178533 2019-04-01 00:19:26 2019-04-01 00:36:13 3270              202 Halste…
# … with 9 more variables: end_station_id <dbl>, end_station_name <chr>,
#   member_casual <chr>, date <date>, month <chr>, day <chr>, year <chr>,
#   day_of_week <chr>, ride_length <dbl>, and abbreviated variable names
#   ¹start_station_id, ²start_station_name
# ℹ Use `colnames()` to see all variable names

# Here is the data.frame
data.frame(
  stringsAsFactors = FALSE,
     member_casual = c("member", "member", "member", "member", "member"),
               day = c("01", "01", "01", "01", "01"),
       ride_length = c(446, 1048, 252, 357, 1007)
)

technocrat · July 18, 2023, 7:11pm

library(lubridate)

Yomi_Opackz · July 18, 2023, 8:16pm

Thank you. I'm using R version 4.2.2; I hope it works.

Here is what popped up when I loaded "lubridate":

Attaching package: ‘lubridate’

The following objects are masked from ‘package:base’:

date, intersect, setdiff, union

Warning message:
package ‘lubridate’ was built under R version 4.2.3

jrkrideau · July 18, 2023, 8:24pm

Nothing to worry about. It just means that base R and lubridate have functions with the same name. R will use the lubridate functions as the default in this session. If you need to use the base functions instead (very unlikely), you can specify the function with

base::date
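
As a small sketch of what the masking message means (assuming lubridate is installed), both versions of `date` stay reachable through their namespace prefix. The variable names here are only for illustration:

```r
library(lubridate)

# base::date() takes no arguments and returns the current date-time as text
base_now <- base::date()

# lubridate::date() extracts the date part of a date-time
lub_date <- lubridate::date(ymd_hms("2019-04-01 00:02:22"))

class(base_now)  # a character string
lub_date         # a Date object
```

Calling plain `date(x)` after `library(lubridate)` gives the lubridate behaviour, which is what the message is announcing.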

Yomi_Opackz · July 18, 2023, 8:30pm

I'm using RDesktop

I tried the syntax again and this is the output error:

#> Error in group_by():
! Must group by variables found in .data.
Column weekday is not found.
Run rlang::last_trace() to see where the error occurred.

jrkrideau · July 18, 2023, 8:48pm

I think you need to supply the data.frame. If we call your data "dat1":

dat1 %>%
  mutate(weekday = wday(started_at, label = TRUE)) %>%
  group_by(member_casual, weekday) %>%
  summarise(number_of_rides = n(),
            average_duration = mean(ride_length)) %>%
  arrange(member_casual, weekday)
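
A minimal runnable sketch of that idea, assuming dplyr and lubridate are both installed. Here `dat1` is a hypothetical stand-in with made-up rows, not the real trip data, and `ride_summary` is just an illustrative name:

```r
library(dplyr)      # provides %>%, mutate(), group_by(), summarise(), arrange()
library(lubridate)  # provides ymd_hms() and wday()

# Hypothetical sample data with the three columns the pipeline needs
dat1 <- data.frame(
  started_at = ymd_hms(c("2019-04-01 00:02:22", "2019-04-01 00:03:02",
                         "2019-04-02 08:15:00", "2019-04-02 09:30:00")),
  member_casual = c("member", "member", "casual", "member"),
  ride_length = c(446, 1048, 252, 357)
)

ride_summary <- dat1 %>%
  mutate(weekday = wday(started_at, label = TRUE)) %>%
  group_by(member_casual, weekday) %>%
  summarise(number_of_rides = n(),
            average_duration = mean(ride_length),
            .groups = "drop") %>%
  arrange(member_casual, weekday)

ride_summary
```

The pipeline has to start from a data frame; without one, `mutate()` has nothing to add the `weekday` column to, and `group_by()` cannot find it afterwards.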

Yomi_Opackz · July 18, 2023, 9:03pm

Thank you @jrkrideau. I will give it a shot and let you know the outcome.

technocrat · July 18, 2023, 9:03pm

Usually not a problem when you see this type of message

Yomi_Opackz · July 18, 2023, 9:28pm

I will need to start the whole process all over again.

Could not pull the analysis out. I appreciate your professional input.

Yomi_Opackz · July 18, 2023, 9:32pm

Thank you. The syntax popped this output:

#> Error in all_trips_v2 %>% mutate(weekday = wday(started_at, label = TRUE)) %>% :
could not find function "%>%"

I will start the process from scratch again. Your input is appreciated.

technocrat · July 18, 2023, 9:36pm

library(dplyr)

You were on the right track to start

library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union

the_datetime <- ymd_hms("2019-04-01 00:02:22")
wday(the_datetime, label = TRUE)
#> [1] Mon
#> Levels: Sun < Mon < Tue < Wed < Thu < Fri < Sat

Created on 2023-07-18 with reprex v2.0.2

Yomi_Opackz · July 22, 2023, 8:55pm

Thank you for your support. I started the cleaning process again from scratch.

Despite installing and loading all the required packages, the ridership data by type and weekday could not be analyzed.

Here is the syntax:

```r
  mutate(weekday = wday(started_at, label = TRUE)) %>%
  group_by(member_casual, weekday) %>%
  summarise(number_of_rides = n(),
            average_duration = mean(ride_length)) %>%
  arrange(member_casual, weekday)
```

#> And here is the error prompt:
Error in `mutate()`:
ℹ In argument: `weekday = wday(started_at, label = TRUE)`.
Caused by error in `wday()`:
! unused argument (label = TRUE)
Run `rlang::last_trace()` to see where the error occurred.

# Your professional guidance is appreciated
# Thank you!

technocrat · July 22, 2023, 9:20pm

`started_at` is the key: what kind of date representation is this?

library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union
# ISO formatted date string works
wday("2023-07-22",label = TRUE)
#> [1] Sat
#> Levels: Sun < Mon < Tue < Wed < Thu < Fri < Sat
# system date works
wday(Sys.Date(),label = TRUE)
#> [1] Sat
#> Levels: Sun < Mon < Tue < Wed < Thu < Fri < Sat
# epochal date seems to partly work
wday(system("date"))
#> [1] 0
# but not really, since it is day 0
# and 1970-01-01 was a Thursday, day 5
as.Date(system("date"))
#> [1] "1970-01-01"
wday(as.Date(system("date")))
#> [1] 5
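
A separate check that may be worth trying, offered as an assumption rather than a diagnosis: `unused argument (label = TRUE)` means the `wday()` that R found does not accept `label`, so it is probably not lubridate's. Other packages also export a `wday()` (data.table's takes only `x`, for example), and the one attached last masks the others. The sketch below shows how to see which one is active and how to call lubridate's version explicitly; `wday_source` is just an illustrative name:

```r
library(lubridate)

# Which package does the currently visible wday() come from?
wday_source <- environmentName(environment(wday))
wday_source

# List every attached package that defines a wday()
find("wday")

# The namespace prefix reaches lubridate's version whatever else is loaded
lubridate::wday(as.Date("2023-07-22"), label = TRUE)
```

If `wday_source` is anything other than `"lubridate"`, writing `lubridate::wday(started_at, label = TRUE)` inside `mutate()` sidesteps the clash.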

Yomi_Opackz · July 24, 2023, 9:33pm

I'm an R newbie. Could you please let me know what you discovered?

Are you particular about the date format?

I appreciate your help. Could you please explain it to me in simple terms?

Thank you!

technocrat · July 24, 2023, 11:25pm

Cut and paste a chunk of your unprocessed date column

Yomi_Opackz · July 26, 2023, 12:14pm

This is the data.frame:

```r
data.frame(
  stringsAsFactors = FALSE,
  date = c("2019-04-01", "2019-04-01", "2019-04-01", "2019-04-01",
           "2019-04-01", "2019-04-01", "2019-04-01", "2019-04-01",
           "2019-04-01", "2019-04-01"),
  day_of_week = c("Monday", "Monday", "Monday", "Monday", "Monday",
                  "Monday", "Monday", "Monday", "Monday", "Monday")
)
```

These are the first ten observations in the date column.

Yomi_Opackz · July 26, 2023, 1:14pm

It is important that you also know that at one point in the data processing I created a new version of the dataframe (v2).

As indicated in the guide note for the data cleaning (`# We will create a new version of the dataframe (v2) since data is being removed`), this syntax was used to create the new version because bad data is being removed: `all_trips_v2 <- all_trips[!(all_trips$start_station_name == "HQ QR" | all_trips$ride_length<0),]`

I hope you will find it useful as you try to help me.

Here is the dataframe (v2):

```r
data.frame(
  date = c("2019-04-01", "2019-04-01", "2019-04-01", "2019-04-01",
           "2019-04-01", "2019-04-01", "2019-04-01", "2019-04-01",
           "2019-04-01", "2019-04-01"),
  day_of_week = c("Monday", "Monday", "Monday", "Monday", "Monday",
                  "Monday", "Monday", "Monday", "Monday", "Monday")
)
```

technocrat · July 26, 2023, 7:02pm

OK, those are ISO date strings, so

library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union

d <- data.frame(
  date = c(
    "2019-04-01", "2019-04-01", "2019-04-01",
    "2019-04-01", "2019-04-01", "2019-04-01", "2019-04-01",
    "2019-04-01", "2019-04-01", "2019-04-01"
  ),
  day_of_week = c(
    "Monday", "Monday", "Monday", "Monday",
    "Monday", "Monday", "Monday", "Monday", "Monday", "Monday"
  )
)
d$date <- ymd(d$date)
d$dow <- wday(d$date)
d
#>          date day_of_week dow
#> 1  2019-04-01      Monday   2
#> 2  2019-04-01      Monday   2
#> 3  2019-04-01      Monday   2
#> 4  2019-04-01      Monday   2
#> 5  2019-04-01      Monday   2
#> 6  2019-04-01      Monday   2
#> 7  2019-04-01      Monday   2
#> 8  2019-04-01      Monday   2
#> 9  2019-04-01      Monday   2
#> 10 2019-04-01      Monday   2
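
As a follow-up sketch, the computed weekday can be compared against the existing `day_of_week` column by asking `wday()` for full labels. This assumes an English locale for the day names, and `d_check` is a hypothetical shortened data frame with one extra made-up Tuesday row:

```r
library(lubridate)

d_check <- data.frame(
  date = c("2019-04-01", "2019-04-02"),
  day_of_week = c("Monday", "Tuesday")
)

d_check$date <- ymd(d_check$date)
# label = TRUE returns an ordered factor; abbr = FALSE spells the name out
d_check$dow_label <- wday(d_check$date, label = TRUE, abbr = FALSE)

# TRUE when every computed label matches the stored text
all(as.character(d_check$dow_label) == d_check$day_of_week)
```

Note that `d` (or `d_check` here) must be created first by running the `data.frame(...)` lines; the later lines fail with "object not found" otherwise.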

Yomi_Opackz · July 27, 2023, 11:42am

@technocrat Thank you! I appreciate your effort and prompt response.

Here is what happened after trying the syntax:

```
Error in lapply(list(...), .num_to_date) : object 'd' not found
> d$dow <- wday(d$date)
Error in wday(d$date) : object 'd' not found
```

Now that you know what the data.frame looks like, what would you suggest as the best way to analyze the ridership data by type and weekday?