I am working on combining multiple Excel files of statistics about several countries. The excel file already comes in a standard format where the excel file name is Country1.xlsx, Country2.xlsx, Country3.xlsx ... and so on.
Here's an example of how one of the files look like for Country1.xlsx
, that is Jordan:
The name of the country in all of the files sets always on cell A1 in Excel. I use this code to combine the excel files into one data frame which works well:
library(tidyverse)
library(dplyr)
library(readxl)
library(purrr)
library(stringr)
Excel_dir <- "<MyExcelDirectoryPath>"
df_combined_long <-
dir_ls(Excel_dir, regex = "[.]xlsx$") %>%
set_names(path_ext_remove(path_file(.))) %>%
map_dfr(read_excel, .id = "CountryName")
.id = "CountryName"
will set the first column of the data frame to the name of the excel file for each combined file.
However, I would like the observations of this column to be the country name from each file; the one setting in cell A1. Manually, I can rename each excel file to the perspective country, but this would be for a large number of files and this is how they're formatted from their original source.
This is how I would like the final data frame to look like, as an example:
df_combined_long
CountryName Year Population Migrants Median Age
Jordan 2020 10203134 10220 23.8
Jordan 2019 10101694 10220 22.4
... ... ... ... ...
... ... ... ... ...
Palestine 2020 1235461 6325 20.6
Palestine 2019 1267891 6035 20.9
... ... ... ... ...
... ... ... ... ...
Is there a way to set the first column in the data frame based on a value in a certain cell from the excel sheet?