I'm trying to transform an excel file and was "handed down" an Rscript to transform specific files. I am very new to R so apologies in my lack of understanding.
this is my code:
# libraries
library(dplyr)
library(tidyr)
library(readxl)
library(openxlsx) # Load on each new session
# Use this for leavers survey
# change the file path "<path>/NEW - leavers data 2019 2020.xlsx"
inputFilepath <- "C:/Users/bangathj/OneDrive - Centrica/Documents/Projects/Automation/Excel R Transformation Project/Leavers Survey/SCH0132 - Exit Interview Responses 2022-07-15 04_30 BST.xlsx"
outputFilepath <- "C:/Users/bangathj/OneDrive - Centrica/Documents/Projects/Automation/Excel R Transformation Project/Leavers Survey/SCH0132 - Exit Interview Responses 2022-07-15 04_30 BST Reshaped.xlsx"
# C:/Users/bangathj/OneDrive - Centrica/Documents/Projects/Automation/Excel R Transformation Project/Leavers Survey/SCH0132 - Exit Interview Responses 2022-07-15 04_30 BST.xlsx
df = read_excel(inputFilepath, sheet="Sheet1")
# Step 1: Remove Duplicates
dfDuplicatesRemoved = df %>%
group_by("Employee ID", "Question Body") %>%
filter(row_number() == n()) %>% # keep last row
ungroup()
# Step 2: Pivot Data
dfTransformed = dfDuplicatesRemoved %>%
select("Employee ID", "Question Body", "Questionnaire Answer") %>%
pivot_wider(id_cols="Employee ID", names_from="Question Body", values_from="Questionnaire Answer")
# Step 3: Obtain an Employee Profile by fetching all data except for question related data.
dfEmployeeProfile = df %>%
dplyr::select(-c("Question Body", "Questionnaire Answer", "Question", "Questionnaire")) %>%
group_by("Employee ID") %>%
filter(row_number() == n()) %>% # keep last row
ungroup()
# Step 4: Join transformed data with Employee Profile to have a complete view
dfTranformedWithEmployeeProfile = dfTransformed %>% right_join(dfEmployeeProfile)
write_xlsx(dfTranformedWithEmployeeProfile, outputFilepath)
rm(list = ls())
when I ran it initially, I ran into an error due to backticks, where, in "Step 1" columns "Employee ID" and "Question Body" would not be detected. I fixed this by replacing the backticks with apostrophes, as, in the earlier code, the line was written as:
This was a mistake and sent you down the wrong track.
You could use feature of library(rlang) to interpret the strings as symbols for group_by, or use group_by_at that expects to get strings.
You've chosen to try something that I did not demonstrate to you....
Was there a reason you didn't choose to try one of the options that I showed as working ?
This is an aside but when using !!sym() you process one at a time so it would be group_by(!!sym("Employee ID"), !!sym("Question Body"))
however, I expect this would give the same error as your other code with group_by_at.
The interpretation from the error message is clear. Despite your belief that there is a variable "Employee ID" in the df, there is no such variable.
I recommend you use R to report to you the columnames of 'df' so you can base your manipulations of it from that , sorting them into alphabetical order, might help here.
Its clear your excel has preliminary rows that arent intented to be the table to read in and make df out of.
you will have to count the records to skip and use the skip parameter of read_excel to make a table having skipped the initial rows. Read xls and xlsx files — read_excel • readxl (tidyverse.org)
It worked! thank you, it was an issue with the excel file itself. All that's left is to replace the old code which writes the transformed data frame into excel, I'm guessing write_xlsx is not a used function anymore.