Hello Everyone,
As a coding newbie, I am finding difficult to add a specific column from almost 3500 different CSV files. Each csv files contains 4 columns (ID, x (Long), y (Lat), and Value (Population)). I want to add the "value" column of 3500 csv files and make separate CSV indicating their sum and also its corresponding csv file name.
CSV file Structure:
|ID|X(Lon)|Y(lat)|value|
|4|76.54881139|9.02487862|2588.940186|
|5|76.55790669|9.024840158|2604.013184|
|6|76.56700194|9.024801471|2826.956543|
|7|76.57609716|9.02476256|2546.4375|
|8|76.58519233|9.024723423|2185.00708|
|9|76.59428747|9.024684061|2459.988525|
|10|76.60338256|9.024644474|2826.956787|
|11|76.61247761|9.024604662|2826.957031|
|12|76.62157262|9.024564625|2826.956543|
I'd use purrr
for this:
Here's an example (note that I've put my three files in a folder in My Documents called "csv-files").
library(tidyverse)
# see the files
read_csv("~/csv-files/file1.csv", show_col_types = F)
#> # A tibble: 5 x 2
#> x y
#> <dbl> <dbl>
#> 1 1 5
#> 2 2 10
#> 3 3 15
#> 4 4 20
#> 5 5 25
read_csv("~/csv-files/file2.csv", show_col_types = F)
#> # A tibble: 5 x 2
#> x y
#> <dbl> <dbl>
#> 1 1 2
#> 2 2 4
#> 3 3 6
#> 4 4 8
#> 5 5 10
read_csv("~/csv-files/file3.csv", show_col_types = F)
#> # A tibble: 5 x 2
#> x y
#> <dbl> <dbl>
#> 1 1 3
#> 2 2 6
#> 3 3 9
#> 4 4 12
#> 5 5 15
# function to summarise file
sum_file = function(path){
dat = read_csv(path, show_col_types = F)
tibble(file = path,
sum = sum(dat$y, na.rm = T))
}
# list files
files = list.files("csv-files", pattern = ".csv", full.names = T)
# summarise all
summary = map_dfr(files, sum_file)
summary
#> # A tibble: 3 x 2
#> file sum
#> <chr> <dbl>
#> 1 csv-files/file1.csv 75
#> 2 csv-files/file2.csv 30
#> 3 csv-files/file3.csv 45
write_csv(summary, "~/summary_file.csv")
Created on 2022-01-07 by the reprex package (v2.0.1)
Thank you so much @JackDavison. God Bless you. How to save the extracted map_dfr values into a single csv file.
Ah yes, I forgot about that part!
I'd do something like:
summary = map_dfr(files, sum_file)
readr::write_csv(summary, "~/summary_file.csv")
Where "~/summary_file.csv" can be replaced with some other path to wherever you want to save your csv file.
Thank you again @JackDavison . I have a mix of NA and numerical values in some of the CSV files . Do I need to make any changes in the above code?
Yes you would, instead of:
sum(dat$y)
You'd write:
sum(dat$y, na.rm = TRUE)
This tells sum()
to ignore the NAs in your data.
You are a lifesaver.
This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.
If you have a query related to it or one of the replies, start a new topic and refer back with a link.