I have a script with a function that cleans data in a batch of files in a folder. One important column in the dataframes is in number format that start with 0, where the 0 is important. Importing into R with extra code helps the zeros display correctly only if it is manually saved in the csv a certain way. When opened in excel, saving the column format as Custom 00000000000000 ( it needs 14 spaces) manually does the trick.
I would like to know if there is a way to add this as part of the function when saving/writing the csv files.
In order to preserve the leading zeros, one approach is to save/read that column as character strings. The following example shows the basics:
my_df1 <- data.frame(xx=letters[1:3],
yy=c("00000000000987",
"00000000123456",
"00000000000022"))
my_df1
write.csv(my_df1, row.names=FALSE) # Show output in console
write.csv(my_df1, file="my_df.csv", row.names=FALSE) # Save to disk
my_df2 <- read.csv("my_df.csv") # Default read drops the leading zeros!
my_df2
my_df3 <- read.csv("my_df.csv",
colClasses=c(NA,"character")) # Capture leading zeros from column 2
my_df3
Alternatively, you could read and process the column as numbers but pad with zeros
just before outputting as strings:
library(stringr)
str_pad(my_df2$yy, 14, pad = "0")
Or, my preferred option, read and process the column as numbers, then create and format the column directly in an Excel file (xlsx not csv) on-the-fly: