How do I predict customer conversions in R? Need help walking through it

AyomideA · March 7, 2024, 5:00am

Hello All,

I am trying to predict, in R, how likely it is that casual customers will convert to memberships. I know I can use either logistic regression or a decision tree, but I need help with the steps after I load the dataset into the console.
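For a yes/no outcome such as member versus casual, logistic regression in base R is `glm()` with `family = binomial`. The sketch below runs on simulated data: `ride_minutes`, `weekend`, and `member` are hypothetical columns, not fields from the Divvy file, and the coefficients used to simulate them are made up. A model like this estimates the probability that a given ride was taken by a member; following an individual casual rider's conversion would require a rider identifier in the data.

```r
# Minimal logistic-regression sketch on simulated (hypothetical) data.
set.seed(42)
n <- 500
rides <- data.frame(
  ride_minutes = rexp(n, rate = 1 / 20),  # hypothetical ride length in minutes
  weekend      = rbinom(n, 1, 0.3)        # hypothetical flag: 1 = weekend ride
)

# Simulate the outcome (1 = member, 0 = casual) for illustration only
p_member <- plogis(1.5 - 0.05 * rides$ride_minutes - 0.8 * rides$weekend)
rides$member <- rbinom(n, 1, p_member)

# family = binomial makes glm() fit a logistic regression
fit <- glm(member ~ ride_minutes + weekend, data = rides, family = binomial)
print(summary(fit))

# type = "response" returns probabilities instead of log-odds
new_rides <- data.frame(ride_minutes = c(5, 45), weekend = c(0, 1))
probs <- predict(fit, newdata = new_rides, type = "response")
print(probs)
```

A decision tree could be fitted to the same data frame with rpart, a recommended package distributed with R, for example `rpart::rpart(factor(member) ~ ride_minutes + weekend, data = rides, method = "class")`.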

Here is what I have so far:

read.csv("/cloud/project/Divvy_Trips_2020_Q1-Divvy_Trips_2020_Q1 (2).csv", header=FALSE)

Error message:
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
cannot open file '/cloud/project/Divvy_Trips_2020_Q1-Divvy_Trips_2020_Q1 (2).csv': No such file or directory

Why does it say there is no such file or directory if I have already uploaded and imported it?

andresrcs · March 7, 2024, 5:03am

Using absolute file paths is usually a bad idea in R. The error message you are getting can only mean that the file path is not specified correctly. We can't see your environment, so we can't tell you what the correct path is, but I think you should look into using a relative file path instead.
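A quick way to see which path R expects is to compare `getwd()` with `list.files()`. The sketch below is a minimal, self-contained illustration: it writes a small hypothetical file, `example_trips.csv`, into a temporary folder rather than touching your project. It also uses `header = TRUE`, because with `header = FALSE` (as in the original call) the line of column names would be read as a row of data.

```r
# Sketch: check where R is looking before reading a file.
# A temporary folder stands in for the project folder, and
# "example_trips.csv" is a hypothetical file written only for this demo.
old_wd <- setwd(tempdir())
write.csv(
  data.frame(ride_id = c("A1", "B2"), user_type = c("casual", "member")),
  "example_trips.csv", row.names = FALSE
)

print(getwd())                          # folder that relative paths start from
print(list.files(pattern = "\\.csv$"))  # CSV files R can actually see there

path <- "example_trips.csv"             # relative path: just the file name
stopifnot(file.exists(path))            # stop early with a clear message if missing
trips <- read.csv(path, header = TRUE)  # first row holds the column names
print(nrow(trips))

setwd(old_wd)                           # restore the previous working directory
```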

AyomideA · March 7, 2024, 5:07am

I'm sorry, I'm fairly new to R and I don't understand. I thought I had to use the exact file name to import the dataset before I could start working on it. I have included a screenshot of the Environment pane. Thank you for explaining.

andresrcs · March 7, 2024, 5:11am

You should use a relative file path to the CSV file, like in the first line of code you are showing, but maybe you are dealing with a problematic file name. Try renaming the file to something safer (without parentheses). Or, even better, use the UI tools to import the file through a dialog box; that way you don't need to type anything manually.
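If you prefer to rename the file from R rather than from the Files pane, `file.rename()` does it. In this sketch the messy name `Divvy_Trips (2).csv` is hypothetical and the file is created in a temporary folder; in your own session you would point `folder` at wherever the download actually lives (an assumption about your setup).

```r
# Sketch: give a downloaded file a simpler name before reading it.
# The messy file name below is hypothetical and created in a temporary folder.
folder <- tempdir()
messy  <- file.path(folder, "Divvy_Trips (2).csv")
write.csv(data.frame(ride_id = c("A1", "B2")), messy, row.names = FALSE)

safe    <- file.path(folder, "Bike_Trips_2020.csv")
renamed <- file.rename(messy, safe)   # TRUE if the rename succeeded
print(renamed)
print(list.files(folder, pattern = "\\.csv$"))
```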

AyomideA · March 7, 2024, 6:05am

Great, thank you! I renamed it to "Bike_Trips_2020" and it now loads:

data1 <- read_csv("Bike_Trips_2020.csv")
Rows: 426887 Columns: 14
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (7): ride_id, rideable_type, started_at, ended_at, start_station_name, ...
dbl (6): start_station_id, end_station_id, start_lat, start_lng, end_lat, e...
time (1): ride_length

Use spec() to retrieve the full column specification for this data.
Specify the column types or set show_col_types = FALSE to quiet this message.
Warning message:
One or more parsing issues, call problems() on your data frame for details,
e.g.:
dat <- vroom(...)
problems(dat)
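The parsing warning means readr met values it could not convert to the column type it guessed, and `problems()` lists them. The sketch below, assuming the readr package is installed, recreates the situation with a hypothetical three-line CSV in which one `ride_length` value is not a valid time. It also spells out `col_types` explicitly, as the message suggests.

```r
library(readr)

# Hypothetical three-line CSV: the second ride_length value is not a valid time
path <- tempfile(fileext = ".csv")
writeLines(c("ride_id,ride_length", "A1,00:10:00", "B2,not_a_time"), path)

# Spelling out col_types documents what you expect and quiets the column message
trips <- read_csv(
  path,
  col_types = cols(ride_id = col_character(), ride_length = col_time())
)
print(trips)  # the unparseable value becomes NA

# problems() lists each value readr could not parse (row, column, expected, actual)
issues <- problems(trips)
print(issues)
```

On the real data, the same check is `problems(data1)`.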

Here's a screenshot so you can better see everything.

AyomideA · March 9, 2024, 10:30pm

Hello all, my next step is setting up a linear regression model in R.
Ultimately my goal is to predict how many casual customers will convert to memberships in 2 years.

First, I entered the linear regression like this:

library(readr)
usertypeandridelength <- read_csv("Bike_Trips_2020.csv") # upload the data
lmRide_Length =lm(Ride_length~User_Type, data = usertypeandridelength) #Create linear regression
summary(lmRide_Length) #Review the results

I received an error message:
library(readr)

usertypeandridelength <- read_csv("Bike_Trips_2020_csv") # upload the data
Error: 'Bike_Trips_2020_csv' does not exist in current working directory ('/cloud/project').
library(readr)
usertypeandridelength <- read_csv("Bike_Trips_2020.csv") # upload the data
Rows: 426887 Columns: 15
── Column specification ────────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr (7): ride_id, rideable_type, started_at, ended_at, start_station_name, end_station_name, user_type
dbl (7): start_station_id, end_station_id, start_lat, start_lng, end_lat, end_lng, day_of_week
time (1): ride_length

Use spec() to retrieve the full column specification for this data.
Specify the column types or set show_col_types = FALSE to quiet this message.
Warning message:
One or more parsing issues, call problems() on your data frame for details, e.g.:
dat <- vroom(...)
problems(dat)

lmRide_Length =lm(Ride_length~User_Type, data = usertypeandridelength) #Create linear regression
Error in eval(predvars, data, env) : object 'Ride_length' not found

Where am I going wrong? I am following the instructions on DataCamp verbatim.

Can anyone assist with this?
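The error means `lm()` cannot find a column called `Ride_length`. R names are case-sensitive, and the column specification printed by `read_csv()` lists `ride_length` and `user_type` in lowercase. A minimal sketch with a hypothetical stand-in data frame; it assumes `ride_length` is already numeric (in seconds), whereas in the real file it was read as a time column, which `as.numeric()` would convert to seconds. Note that this model describes how ride length differs by user type rather than predicting conversion.

```r
# Hypothetical stand-in for the trips data, using the lowercase
# column names that read_csv() reported (user_type, ride_length)
trips <- data.frame(
  user_type   = c("casual", "member", "member", "casual", "member", "casual"),
  ride_length = c(1800, 600, 540, 2400, 720, 1500)  # ride length in seconds
)

print(names(trips))                     # exact spelling and case of every column
print("Ride_length" %in% names(trips))  # FALSE: R names are case-sensitive

# Use the names exactly as they appear in the data
fit <- lm(ride_length ~ user_type, data = trips)
print(summary(fit))
```

Here the intercept is the mean ride length for `casual` (1900 seconds) and `user_typemember` is the difference for members (-1280 seconds).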

Warm regards,

Here's a screenshot as well:

AyomideA · March 10, 2024, 1:24am

Hello, I have successfully loaded the dataset I would like to use; I just do not know how to move on to the linear regression model.

For context, I am trying to predict how many casual customers will convert into memberships in R.
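Since the target here is yes/no (member or casual), logistic regression via `glm(..., family = binomial)` is the usual tool rather than `lm()`; the DataCamp tutorial mentioned in this thread is also about logistic regression. This base-R sketch shows the steps after loading: hold out part of the data, fit, predict probabilities, and check accuracy. The columns `birth_year`, `gender`, and `member` and all their values are simulated and hypothetical, and the 25% hold-out and 0.5 cut-off are arbitrary choices.

```r
# Sketch of a fit-and-check workflow on simulated (hypothetical) rider data.
set.seed(1)
n <- 1000
riders <- data.frame(
  birth_year = sample(1950:2002, n, replace = TRUE),          # hypothetical
  gender     = sample(c("Female", "Male"), n, replace = TRUE)  # hypothetical
)
# Simulated target: 1 = member, 0 = casual (illustration only)
riders$member <- rbinom(n, 1, plogis(-0.5 + 0.03 * (2002 - riders$birth_year)))

# Hold out 25% of rows to check the model on data it has not seen
test_rows <- sample(n, size = n / 4)
train <- riders[-test_rows, ]
test  <- riders[test_rows, ]

fit <- glm(member ~ birth_year + gender, data = train, family = binomial)

# Predicted probability of being a member for each held-out rider
test$p_member <- predict(fit, newdata = test, type = "response")

# Classify with a 0.5 cut-off and compare with the actual labels
test$pred <- as.integer(test$p_member >= 0.5)
print(table(actual = test$member, predicted = test$pred))
accuracy <- mean(test$pred == test$member)
print(accuracy)

# Summing probabilities gives an expected count of members among these rows
print(sum(test$p_member))
```

Summing predicted probabilities is one way to turn per-row probabilities into a "how many" figure; projecting it two years ahead would need further assumptions about future riders.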

Here is my code:

install.packages("tidyverse")
install.packages("tidymodels")
library(tidymodels)

#Read the dataset and convert the variable to a factor
dataset2 <- read_csv("Bike_Trips_2019.csv")
dataset2$y = as.factor(dataset2$y)

# plot the gender and birth year against the target variable
ggplot(dataset2, aes(gender, fill = y))+
geom_bar(x=gender,y=user_type)
coord_flip()

I am following this DataCamp article: https://www.datacamp.com/tutorial/logistic-regression-R
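In ggplot2, layers are joined with `+` at the end of each line, and `geom_bar()` counts rows itself, so it takes only an `x` aesthetic rather than both `x` and `y`. A minimal sketch, assuming ggplot2 is installed and using a small hypothetical stand-in for `dataset2` with `gender` and `user_type` columns:

```r
library(ggplot2)

# Hypothetical stand-in for dataset2; column names and values are assumptions
dataset2 <- data.frame(
  gender    = c("Male", "Female", "Male", "Female", "Male", "Female", "Male"),
  user_type = c("Subscriber", "Customer", "Subscriber", "Subscriber",
                "Customer", "Customer", "Subscriber")
)

# Each layer is added with `+`; geom_bar() counts rows per gender,
# and fill splits each bar by user type
p <- ggplot(dataset2, aes(x = gender, fill = user_type)) +
  geom_bar(position = "dodge") +
  coord_flip()

print(p)
```

Birth year is numeric, so a box plot such as `ggplot(dataset2, aes(x = user_type, y = birthyear)) + geom_boxplot()` may suit it better than a bar chart (the `birthyear` column name is an assumption).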

Error message:
Use spec() to retrieve the full column specification for this data.
Specify the column types or set show_col_types = FALSE to quiet this message.

dataset2$y = as.factor(dataset2$y)
Error in $<-:
! Assigned data as.factor(dataset2$y) must be compatible with existing data.
Existing data has 365069 rows.
Assigned data has 0 rows.
Only vectors of size 1 are recycled.
Caused by error in vectbl_recycle_rhs_rows():
! Can't recycle input of size 0 to size 365069.
Run rlang::last_trace() to see where the error occurred.
Warning message:
Unknown or uninitialised column: y.
rlang::last_trace()
<error/tibble_error_assign_incompatible_size>
Error in $<-:
! Assigned data as.factor(dataset2$y) must be compatible with existing data.
Existing data has 365069 rows.
Assigned data has 0 rows.
Only vectors of size 1 are recycled.
Caused by error in vectbl_recycle_rhs_rows():
! Can't recycle input of size 0 to size 365069.

Backtrace:
    ▆
 ├─base::$<-(*tmp*, y, value = <fct>)
 └─tibble:::$<-.tbl_df(*tmp*, y, value = <fct>)
   └─tibble:::tbl_subassign(...)
     └─tibble:::vectbl_recycle_rhs_rows(value, fast_nrow(xo), i_arg = NULL, value_arg, call)

I am not sure where I am going wrong. Can anyone help?

AyomideA · March 10, 2024, 1:30am

Here is a screenshot so you can view my dataset. y is the user type, which I am measuring against demographic data (attributes of each user type).
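The warning `Unknown or uninitialised column: y` in the output above shows that `dataset2` has no column named `y`, so `dataset2$y` is `NULL` and `as.factor()` returns a zero-length factor, which is the "0 rows" in the error. One option is to create `y` from the column that actually holds the user type. A sketch with a hypothetical stand-in data frame; the `user_type` column name is an assumption, so check yours with `names(dataset2)`:

```r
# Hypothetical stand-in for dataset2: no column is literally called "y"
dataset2 <- data.frame(
  gender    = c("Male", "Female", "Male", "Female"),
  user_type = c("Subscriber", "Customer", "Subscriber", "Customer")
)

print("y" %in% names(dataset2))  # FALSE, which is why dataset2$y had 0 rows

# Create y from the column that actually holds the user type
dataset2$y <- as.factor(dataset2$user_type)
print(levels(dataset2$y))
print(table(dataset2$y))
```

With dplyr, `rename(dataset2, y = user_type)` renames the column instead of copying it.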

system · March 31, 2024, 1:31am

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.