Receiving NA when using max/min function

Hi everyone,

I am getting NA in my R Console when there are values in my datasets. I have double checked if there are empty cells in spreadsheets as well.

When I use head(all_trips), there are data in the trip_time columns.
Then when I use my max function, I keep getting return as NA.

Is there something wrong with the way I code? Please kindly assist. Thank you in advanced.

First off - it's much easier for your readers if you cut and paste code/output into a preformatted text box. Or, even better, use reprex.

I can't see anything wrong (but then neither can I copy the code to run myself)
Try using a different column name (ie not max)

Hi, this is my code, I'm actually using csv files which I have edited in excel before linking to variable in R.


title: "R Notebook"
author: "Edwin Tay"
date: "26/7/2023"
output: html_notebook

Case Study for Cyclistic

Introduction

Cyclists is a bike-share offering company that launched since 2016 and has grown to a fleet of 5824 bicycles that are geotracked and locked into a network of 692 stations across Chicago.
They offer different pricing plans for both casual riders and annual members.
Their finance analysts have concluded that annual members are much more profitable than casual riders.

STEP 1: ASK

Business Task/Question:

How do annual members and casual riders use Cyclistic bikes differently?

Stakeholders:

  • Lily Moreno (Director of Marketing and Manager)
  • Cyclistic Executive Team
# Packages required for this analysis
install.packages("tidyverse") # for data import
install.packages("lubridate") # for date functions
install.packages("ggplot2") # for visualization
library(tidyverse)
library(lubridate)
library(ggplot2)
library(dplyr)
library(readr)

STEP 2: PREPARE

qw months of data were collected (Jul 2022 to Jun 2023) instead of 12.
R Studio keeps crashing if I uploaded too much datasets even with 6 datasets.
The data collected is first-party type as they were collected by Cyclistic themselves.
The data collected is also under license by Lyft Bikes and Scooters, LLC ("Bikeshare").
Therefore, the data is reliable.

Data preparation and cleaning will be done in spreadsheet.

STEP 3: PROCESS

R Programming was chosen to handle the process as it can help to break down large data and show great visualization.
Compared between the original and new datasets and the changelog to ensure that the data is clean and properly documented.
Verified that the data is clean (unnecessary data filtered out).

Data Cleaning done in Spreadsheets before importing into R Studio

First change of datasets: Removed lattitude and longitude columns of the datasets (irrelevant in our analysis)
Second change of datasets: Removed station names/id columns from datasets (there are majority of missing data in some months)
Third change of datasets: Change the time format to dd/mm/yyyy hh:mm
Fourth change of datasets: Added a new column for the trip_duration where end time - start time in hh:mm format

Linking to datasets

d202207 <- read_csv("C:/Users/edwin/Desktop/Case_Studies/001/Data/202207-divvy-tripdata-remove.csv")
d202208 <- read_csv("C:/Users/edwin/Desktop/Case_Studies/001/Data/202208-divvy-tripdata-remove.csv")
d202209 <- read_csv("C:/Users/edwin/Desktop/Case_Studies/001/Data/202209-divvy-tripdata-remove.csv")
d202210 <- read_csv("C:/Users/edwin/Desktop/Case_Studies/001/Data/202210-divvy-tripdata-remove.csv")
d202211 <- read_csv("C:/Users/edwin/Desktop/Case_Studies/001/Data/202211-divvy-tripdata-remove.csv")
d202212 <- read_csv("C:/Users/edwin/Desktop/Case_Studies/001/Data/202212-divvy-tripdata-remove.csv")
d202301 <- read_csv("C:/Users/edwin/Desktop/Case_Studies/001/Data/202301-divvy-tripdata-remove.csv")
d202302 <- read_csv("C:/Users/edwin/Desktop/Case_Studies/001/Data/202302-divvy-tripdata-remove.csv")
d202303 <- read_csv("C:/Users/edwin/Desktop/Case_Studies/001/Data/202303-divvy-tripdata-remove.csv")
d202304 <- read_csv("C:/Users/edwin/Desktop/Case_Studies/001/Data/202304-divvy-tripdata-remove.csv")
d202305 <- read_csv("C:/Users/edwin/Desktop/Case_Studies/001/Data/202305-divvy-tripdata-remove.csv")
d202306 <- read_csv("C:/Users/edwin/Desktop/Case_Studies/001/Data/202306-divvy-tripdata-remove.csv")

STEP 4: ANALYZE

Datasets will be formatted to match each other in order to combine them.

# Checking the columns of all datasets (must be same)
colnames(d202207)
colnames(d202208)
colnames(d202209)
colnames(d202210)
colnames(d202211)
colnames(d202212)
colnames(d202301)
colnames(d202302)
colnames(d202303)
colnames(d202304)
colnames(d202305)
colnames(d202306)
# Check if there is any abnormalities
str(d202207)
str(d202208)
str(d202209)
str(d202210)
str(d202211)
str(d202212)
str(d202301)
str(d202302)
str(d202303)
str(d202304)
str(d202305)
str(d202306)
# Convert ride_id and rideable_type to character type
d202207 <- mutate(d202207, ride_id=as.character(ride_id), rideable_type=as.character(rideable_type))
d202208 <- mutate(d202208, ride_id=as.character(ride_id), rideable_type=as.character(rideable_type))
d202209 <- mutate(d202209, ride_id=as.character(ride_id), rideable_type=as.character(rideable_type))
d202210 <- mutate(d202210, ride_id=as.character(ride_id), rideable_type=as.character(rideable_type))
d202211 <- mutate(d202211, ride_id=as.character(ride_id), rideable_type=as.character(rideable_type))
d202212 <- mutate(d202212, ride_id=as.character(ride_id), rideable_type=as.character(rideable_type))
d202301 <- mutate(d202301, ride_id=as.character(ride_id), rideable_type=as.character(rideable_type))
d202302 <- mutate(d202302, ride_id=as.character(ride_id), rideable_type=as.character(rideable_type))
d202303 <- mutate(d202303, ride_id=as.character(ride_id), rideable_type=as.character(rideable_type))
d202304 <- mutate(d202304, ride_id=as.character(ride_id), rideable_type=as.character(rideable_type))
d202305 <- mutate(d202305, ride_id=as.character(ride_id), rideable_type=as.character(rideable_type))
d202306 <- mutate(d202306, ride_id=as.character(ride_id), rideable_type=as.character(rideable_type))
# Combine the datasets together
all_trips <- bind_rows(d202207, d202208, d202209, d202210, d202211, d202212, d202301, d202302, d202303, d202304, d202305, d202306)
all_trips <-mutate(all_trips, trip_time=as.numeric(trip_time))

Inspecting the new table for any irregularities or abnormalities.

colnames(all_trips)
nrow(all_trips)
dim(all_trips)
head(all_trips)
str(all_trips)
summary(all_trips)

# Checking if there are abnormalities in member_casual and trip_duration columns.
check_trips <- all_trips %>%
  group_by(member_casual) %>%
  summarize(max = max(trip_time))
head(check_trips)

If there is one single NA in the column the result will be NA as well.
It's good practise to add "na.rm = TRUE" to these functions (max, min, mean...)

numbers = c(1, 3, 5, 7, 9, NA)

max(numbers)
max(numbers, na.rm = TRUE)

Sorry, I am still pretty new to R programming. Can you let me know where should I add "na.rm = TRUE" in my code?

check_trips <- all_trips %>%
group_by(member_casual) %>%
summarize(max = max(trip_time))
head(check_trips)

here:

... %>%
summarize(max = max(trip_time, na.rm=TRUE))

There are values now! Thank you so much! Really appreciate all the help!

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.