Hi. I'm new to R and am looking for help. I want to create a scatterplot of two variables (danceability and popularity). However, when I run the code, the plot includes all of the data in the CSV, even though I have defined the x and y variables. Below is the code:
#Section 1 - 1.1
df <- read.csv("spotify_dataset.csv")
print(df)
#Section 1 - 1.2
set.seed(1008444239)
sample(1:112,1)
#The number I received from the sample is 49
unique(df$track_genre)
my_regression_df <- df[df$track_genre == "hardcore",]
#Section 1 - 1.3
sample(1:112,2)
#The two numbers that I received from the sample are 22 and 47
unique(df$track_genre)
my_group_diff_df <- df[df$track_genre == "dancehall" | df$track_genre == "happy",]
#Section 2 - 2.1
my_regression_df <- read.csv("spotify_dataset.csv")
install.packages("ggplot2") #Only needs to be run once per R installation
library(ggplot2)
plot1 <- ggplot(data=df, aes(x=popularity, y=danceability)) + geom_point()
plot1
plot1_x <- ggplot(data=my_regression_df, aes(x=popularity, y=danceability)) + geom_point() +
  xlab("popularity index")
plot1_x
plot1_y <- ggplot(data=my_regression_df, aes(x=popularity, y=danceability)) + geom_point() +
  xlab("popularity index") + ylab("danceability index")
plot1_y
plot1_title <- ggplot(data=my_regression_df, aes(x=popularity, y=danceability)) + geom_point() +
  xlab("popularity index") + ylab("danceability index") + ggtitle("popularity vs. danceability")
plot1_title
plot1_title
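
Judging only from the code shown, a likely cause is that Section 2.1 calls read.csv() again and assigns the result to my_regression_df, which replaces the "hardcore" subset from Section 1.2 with the whole file, and plot1 is built from df rather than from the subset. Below is a minimal sketch of the filter-then-plot pattern. The small data frame is a made-up stand-in for spotify_dataset.csv (its values are illustrative, not taken from the real file); the column names are the ones used in the question.

```r
library(ggplot2)

# Hypothetical stand-in for read.csv("spotify_dataset.csv"); values are made up.
df <- data.frame(
  track_genre  = c("hardcore", "hardcore", "hardcore", "dancehall", "happy"),
  popularity   = c(20, 35, 50, 60, 45),
  danceability = c(0.45, 0.52, 0.61, 0.80, 0.70)
)

# Keep only the rows for the sampled genre. Do not assign to
# my_regression_df again afterwards, or the subset is lost.
my_regression_df <- df[df$track_genre == "hardcore", ]

# Build the plot from the filtered data frame, not from df.
plot1 <- ggplot(data = my_regression_df, aes(x = popularity, y = danceability)) +
  geom_point() +
  xlab("popularity index") +
  ylab("danceability index") +
  ggtitle("popularity vs. danceability")
plot1
```

With the real data, the data.frame() call would be replaced by the original read.csv() line. Running nrow(my_regression_df) and unique(my_regression_df$track_genre) just before plotting is a quick way to confirm that only the intended genre is left.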