jak123
February 7, 2022, 9:49am
1
Hi R community,
I'm using this dataset: Billboard "The Hot 100" Songs | Kaggle
and running this code:
music_df <- billboard100 %>%
select(date:artist, weeks_popular = "weeks.on.board")
library(lubridate)
library(stringr)
music_df %>%
mutate(date = ymd(date)) %>%
distinct(date) %<%
mutate(month = floor_date(date,"month"))
music_df$artist <- as.character(music_df$artist)
music_df %>%
mutate(date = ymd(date)) %>%
primary_artist = ifelse(str_detect(artist, "Featuring"),
str_match(artist, "(.*)\\sFeaturing")[,2],
artist) %>%
select(artist, primary_artist)
I want to split the artist column into the primary artist and the featured artist, but I'm getting an error:
Error in stri_detect_regex(string, pattern, negate = negate, opts_regex = opts(pattern)) :
object 'artist' not found
Thanks!
Try
select(date, artist, weeks_popular = "weeks.on.board")
substituting a comma for the colon.
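To make the difference between the two forms concrete, here is a minimal sketch. It assumes dplyr is installed and uses a two-row stand-in for billboard100 built from the column names in the original post. With a colon, select() takes the whole range of columns from date through artist; with a comma it takes only the two named columns.

```r
library(dplyr)

# Two-row stand-in for billboard100 (column names as in the original post)
billboard100 <- data.frame(
  date = c("2021-11-06", "2021-11-06"),
  rank = 1:2,
  song = c("Easy On Me", "Stay"),
  artist = c("Adele", "The Kid LAROI & Justin Bieber"),
  weeks.on.board = c(3L, 16L)
)

# Colon: every column from date through artist, plus the renamed column
range_cols <- billboard100 %>%
  select(date:artist, weeks_popular = "weeks.on.board")

# Comma: only date and artist, plus the renamed column
two_cols <- billboard100 %>%
  select(date, artist, weeks_popular = "weeks.on.board")

names(range_cols)
names(two_cols)
```

Note that the artist column is present in both results.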
To save me from downloading an 18 MB file from Kaggle, can you please share a small sample of the data in a forum-friendly way? That is, share the result of
dput(head(billboard100))
jak123
February 7, 2022, 1:29pm
5
dput(head(music_df,5)) gives me 75000 characters
jak123
February 7, 2022, 1:30pm
6
same for dput(head(music_df))
What does str(music_df) give you? I get the impression that R is not reading a delimiter properly, so instead of several columns of data you are getting a single vector.
What about the original dataset though, i.e. not music_df?
The dput output is 'too large' because the columns are stored as factors where characters would do, so try:
dput(head(mutate(billboard100,across(where(is.factor),as.character))))
jak123
February 7, 2022, 3:56pm
11
structure(list(date = c("2021-11-06", "2021-11-06", "2021-11-06",
"2021-11-06", "2021-11-06", "2021-11-06"), rank = 1:6, song = c("Easy On Me",
"Stay", "Industry Baby", "Fancy Like", "Bad Habits", "Way 2 Sexy"
), artist = c("Adele", "The Kid LAROI & Justin Bieber", "Lil Nas X & Jack Harlow",
"Walker Hayes", "Ed Sheeran", "Drake Featuring Future & Young Thug"
), last.week = 1:6, peak.rank = c(1L, 1L, 1L, 3L, 2L, 1L), weeks.on.board = c(3L,
16L, 14L, 19L, 18L, 8L)), row.names = c(NA, 6L), class = "data.frame")
jak123
February 7, 2022, 4:59pm
12
So everything was a factor before, but how does that make the dput output longer? And thanks.
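On the factor question: a factor stores its full set of levels as an attribute, and head() keeps all of them even when only six rows remain, so dput() has to print every level. A minimal base R sketch with made-up data (the 1,000 artist names and the helper dput_chars() are hypothetical, purely for illustration):

```r
# Hypothetical toy data: 1,000 distinct strings stored as a factor
df <- data.frame(artist = factor(paste0("artist_", 1:1000)))

# head() keeps six rows, but the factor still carries all 1,000 levels
small <- head(df)
n_levels <- length(levels(small$artist))

# Helper (hypothetical name): number of characters dput() prints for an object
dput_chars <- function(x) sum(nchar(capture.output(dput(x))))

# Same six rows with the factor converted to character
small_chr <- small
small_chr$artist <- as.character(small_chr$artist)

n_factor <- dput_chars(small)      # includes every level
n_chr    <- dput_chars(small_chr)  # only the six values
c(levels = n_levels, factor = n_factor, character = n_chr)
```

Calling droplevels() on the subset should have a similar effect to the as.character() conversion, since it discards the unused levels.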
billboard100 <- structure(list(date = c("2021-11-06", "2021-11-06", "2021-11-06",
"2021-11-06", "2021-11-06", "2021-11-06"), rank = 1:6, song = c("Easy On Me",
"Stay", "Industry Baby", "Fancy Like", "Bad Habits", "Way 2 Sexy"
), artist = c("Adele", "The Kid LAROI & Justin Bieber", "Lil Nas X & Jack Harlow",
"Walker Hayes", "Ed Sheeran", "Drake Featuring Future & Young Thug"
), last.week = 1:6, peak.rank = c(1L, 1L, 1L, 3L, 2L, 1L), weeks.on.board = c(3L,
16L, 14L, 19L, 18L, 8L)), row.names = c(NA, 6L), class = "data.frame")
library(tidyverse)
library(lubridate)
billboard100 %>%
  select(-last.week, -peak.rank) %>%
  mutate(date = ymd(date),
         split_artist = str_split_fixed(artist, "Featuring", 2))
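For comparison, the code in the first post appears to fail because primary_artist = ifelse(...) is not wrapped in mutate(), so artist is looked up as a standalone object rather than as a column. A minimal sketch of that approach with the mutate() call added, assuming dplyr and stringr are installed and using three rows from the sample data above (the featured_artist column is an addition for illustration):

```r
library(dplyr)
library(stringr)

# Three rows taken from the dput() sample posted above
billboard100 <- data.frame(
  song = c("Easy On Me", "Stay", "Way 2 Sexy"),
  artist = c("Adele",
             "The Kid LAROI & Justin Bieber",
             "Drake Featuring Future & Young Thug"),
  stringsAsFactors = FALSE
)

result <- billboard100 %>%
  mutate(
    # Text before " Featuring", or the whole string when nobody is featured
    primary_artist = ifelse(str_detect(artist, "Featuring"),
                            str_match(artist, "(.*)\\sFeaturing")[, 2],
                            artist),
    # Text after "Featuring "; NA when nobody is featured
    featured_artist = str_match(artist, "Featuring\\s(.*)")[, 2]
  ) %>%
  select(artist, primary_artist, featured_artist)

result
```

Unlike str_split_fixed(), this gives two ordinary character columns and leaves no trailing space on the primary artist.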