tidy list of 120 csv files


i've got a problem with getting the data from https://www.kaggle.com/groffo/ads16-dataset into a usable form.
Here is an example file: https://www.dropbox.com/s/p5bfg3tssec0ow8/U0001-RT.csv?dl=0

i put all 120 *RT.csv files into one folder. One file contains 300 ad ratings.
I imported them with:

temp = list.files(pattern="*.csv")
myfiles = lapply(temp, read.delim, stringsAsFactors = FALSE, sep = ";")

my goal is to get a row per person and the 300 ad ratings as columns (maybe there is some way to use the already existing labels [from Cat0 to Cat19] , but divide it into 15 sub-categories [Cat01, Cat02...Cat11, Cat12]).

Maybe someone got a few tips for me (playing around for one day already - getting serious headache now). That would be great!


The code I am providing is not very beautiful nor elegant, but it should do the trick and hopefully relieve you from your headache:)


temp = list.files(pattern="*.csv")
myfiles <- lapply(temp, function(filename){
  dat <- read.csv(filename, stringsAsFactors = FALSE, header = TRUE, sep = ";")
  dat$ID <- as.character(filename) # this adds an ID to each element of the list

transform_fun <- function(x){
  x %>% t() %>% as.data.frame() %>%  mutate(Cat = rownames(x %>% t() %>% as.data.frame())) %>%
    mutate(ID = x[1,21]) %>% 
    filter(Cat != "ID") %>% 
    separate(col = 2, into= c(as.character(1:15)),sep = ",") %>% 
    gather(subcategory, Value,2:16) %>% 
    mutate(Category = paste(Cat, subcategory, sep = "_")) %>% 
    select(Category, ID, Value) %>% 
    spread(Category, Value)

df <- lapply(myfiles, transform_fun) %>% do.call(bind_rows,.)

Hope that helps!

Edit: I changed rbind to bind_rows

wow, this is more than i expected. Thank you very, very much for your time!
i will analyze the code and hopefully i can learn to do it by myself the next time.

Hi Markez, the first part indeed was a great help, but I am unable to follow your second part. Now myfiles is a large list with each element as a dataframe. I want to filter certain columns for each dataframe (there were 70 files) in all.

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.