Dear R users,
I have the following excel table (it spans about 2 years worth of data):
I want to take the England level data (last row) for columns ‘total road accidents’, total accidents attended’, ‘emergency accidents’ and ‘emergency accidents attended’ and transpose it into a data frame in R, with 5 columns (date, total road accidents, total accidents attended, emergency accidents, and emergency accidents attended).
I have managed this (see solution below) but in what feels like an extremely clunky way. I really want to improve my R skills in terms of loading in and transposing data as I am working with a lot of excel data that is not properly formatted for R.
Please note I also posted this on stackoverflow. I am aware of your cross posting policy and so I have reformulated here. However, as some images may be helpful, I will leave the link to the stackoverflow post. I tried to paste all the images in here but I believe users are only allowed to embed 1 image
Transposing excel table into data frame in R - Stack Overflow
Here is my solution at present:
Start by generating list of dates for date column (this proved to be simpler than reading the dates from the excel file)
start_date <- as.Date("2021-10-25")
end_date <- as.Date("2021-11-31") #in my code this includes dates up to 2023 but the reprex below only includes first two weeks
date_range <- as.character(seq(start_date, end_date, "weeks"))
Read in the data (an excel file in my code, substituted here with a reprex including the first two weeks of data)
accidents <- structure(list(`Week Commencing` = c("City", "Birmingham", "Bradford",
"Brighton and Hove", "Bristol", "Cambridge", "Canterbury", "Carlisle",
"Chelmsford", "Chester", "Chichester", "Colchester", "Coventry",
"Derby", "Doncaster", "Durham", "ENGLAND"), `44494...2` = c("Total road accidents",
"1058", "369", "471", "1025", "717", "1495", "3350", "1006",
"1696", "2194", "73", "66", "1221", "128", "629", "15498"), `44494...3` = c("Total accidents Attended",
"856", "310", "391", "818", "599", "1193", "2530", "834", "1347",
"1703", "54", "52", "996", "106", "223", "12012"), `44494...4` = c("Total accidents Conveyed",
"616", "234", "298", "577", "458", "798", "1965", "609", "1035",
"1198", "49", "44", "660", "81", "140", "8762"), `44494...5` = c("Emergency accidents",
"880", "270", "376", "936", "626", "1243", "2815", "815", "1496",
"1892", "33", "34", "989", "63", "615", "13083"), `44494...6` = c("Emergency accidents attended",
"671", "214", "327", "692", "489", "948", "2070", "661", "1071",
"1364", "26", "28", "758", "54", "203", "9576"), `44494...7` = c("Emergency accidents conveyed",
"484", "148", "245", "442", "357", "591", "1532", "452", "765",
"946", "19", "25", "422", "40", "130", "6598"), `44501...8` = c("Total road accidents",
"974", "339", "461", "1027", "709", "1323", "3214", "1135", "1636",
"2146", "62", "68", "1092", "119", "614", "14919"), `44501...9` = c("Total accidents Attended",
"780", "289", "379", "812", "581", "1063", "2516", "900", "1300",
"1665", "44", "49", "895", "94", "213", "11580"), `44501...10` = c("Total accidents Conveyed",
"578", "218", "305", "533", "445", "730", "1901", "652", "1008",
"1180", "38", "44", "573", "73", "141", "8419"), `44501...11` = c("Emergency accidents",
"974", "297", "427", "1047", "758", "1481", "2848", "818", "1504",
"1929", "30", "37", "1012", "62", "557", "13781"), `44501...12` = c("Emergency accidents attended",
"713", "250", "351", "752", "560", "1086", "2159", "697", "1151",
"1418", "24", "28", "793", "46", "164", "10192"), `44501...13` = c("Emergency accidents conveyed",
"494", "164", "263", "476", "392", "664", "1559", "470", "867",
"944", "18", "24", "442", "35", "109", "6921")), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -17L))
select England level data and remove first column
england_accidents <- accidents[17, -1]
Extract the columns desired for analysis:
total_accidents <- (england_accidents[, c(1, seq(7, ncol(england_accidents),
by = 6))])
total_accidents_attended <- (england_accidents[,
seq(2, ncol(england_accidents), 6)])
total_emergencies <- (england_accidents[,
seq(4, ncol(england_accidents), 6)])
total_emergencies_attended <- (england_accidents[,
seq(5, ncol(england_accidents), 6)])
Transpose the vectors from row to column format:
total_accidents <- t(total_accidents)
total_accidents_attended <- t(total_accidents_attended)
total_emergencies <- t(total_emergencies)
total_emergencies_attended <- t(total_emergencies_attended)
make df
accidents_df <- as.data.frame(cbind(date_range, total_accidents,
total_accidents_attended, total_emergencies,
total_emergencies_attended))
final changes to fix column/row names
accidents_columns <- c("total_accidents",
"total_accidents_attended" ,"total_emergencies",
"total_emergencies_attended")
names(accidents_df)[2:5] <-accidents_columns
rownames(accidents_df) <- 1:nrow(accidents_df)