Issue repeating col_names when appending with write_csv

bryanrt · July 8, 2021, 1:43am

Issue

I am trying to automate a workflow which at one stage will append a csv file with updated values. I need to maintain col_names but I find that if I have append = T and col_names = T in my write_csv statement I end up with reapeats of the colnames mixed into my values. I assume this is a common enough issue that there is a simple fix I am unaware of. Any help is much appreaciated!

Reprex

This is just the lines that write the object and then write the object to the csv. If you run this once and then rerun the write_csv line a few times you will see what I am talking about.

NMTSO_test <- readr::read_csv(file = "https://gitlab.com/Bryanrt-geophys/sac2eqtransformr/-/raw/master/sample_data/NMTSO_meta.csv", col_names = T)

STEAD_csv <- NMTSO_test %>%
  dplyr::mutate(
    network_code = dplyr::if_else(network_code != "NULL", as.character(network_code), ""),
    receiver_code = dplyr::if_else(receiver_code != "NULL", as.character(receiver_code), ""),
    receiver_type = dplyr::if_else(receiver_type != "NULL",as.character(receiver_type), ""),      # manually defined, not sure if BH is correct though.
    receiver_latitude = dplyr::if_else(receiver_latitude != "NULL",as.character(receiver_latitude), ""),
    receiver_longitude = dplyr::if_else(receiver_longitude != "NULL",as.character(receiver_longitude), ""),
    receiver_elevation_m = dplyr::if_else(receiver_elevation_m != "NULL",as.character(receiver_elevation_m), ""),
    p_arrival_sample = dplyr::if_else(p_arrival_sample != "NULL",as.character(p_arrival_sample), ""),
    p_status = dplyr::if_else(p_status != "NULL",as.character(p_status), ""),
    p_weight = dplyr::if_else(p_weight != "NULL",as.character(p_weight), ""),
    p_travel_sec = dplyr::if_else(p_travel_sec != "NULL",as.character(p_travel_sec), ""),
    s_arrival_sample = dplyr::if_else(s_arrival_sample != "NULL",as.character(s_arrival_sample), ""),
    s_status = dplyr::if_else(s_status != "NULL",as.character(s_status), ""),
    s_weight = dplyr::if_else(s_weight != "NULL",as.character(s_weight), ""),
    source_id = dplyr::if_else(source_id != "NULL",as.character(source_id), ""),     # is this a user defined index, or does this come from IRIS?
    source_origin_time = dplyr::if_else(source_origin_time != "NULL",as.character(source_origin_time), ""),
    source_origin_uncertainty_sec = dplyr::if_else(source_origin_uncertainty_sec != "NULL",as.character(source_origin_uncertainty_sec), ""),
    source_latitude = dplyr::if_else(source_latitude != "NULL",as.character(source_latitude), ""),
    source_longitude = dplyr::if_else(source_longitude != "NULL",as.character(source_longitude), ""),
    source_error_sec = dplyr::if_else(source_error_sec != "NULL",as.character(source_error_sec), ""),
    source_gap_deg = dplyr::if_else(source_gap_deg != "NULL", as.character(source_gap_deg), ""),          # check this variables source to see the unit. deg or m?
    source_horizontal_uncertainty_km = dplyr::if_else(source_horizontal_uncertainty_km != "NULL",as.character(source_horizontal_uncertainty_km), ""),   # is this an std_1 value from statistical eval? May be in the dat files.
    source_depth_km = dplyr::if_else(source_depth_km != "NULL",as.character(source_depth_km), ""),
    source_depth_uncertainty_km = dplyr::if_else(source_depth_uncertainty_km != "NULL",as.character(source_depth_uncertainty_km), ""),     # is this an std_1 value from statistical eval? May be in the dat files.
    source_magnitude = dplyr::if_else(source_magnitude != "NULL",as.character(source_magnitude), ""),
    source_magnitude_type = dplyr::if_else(source_magnitude_type != "NULL",as.character(source_magnitude_type), ""),
    source_magnitude_author = dplyr::if_else(source_magnitude_author != "NULL",as.character(source_magnitude_author), ""),
    source_mechanism_strike_dip_rake = dplyr::if_else(source_mechanism_strike_dip_rake != "NULL",as.character(source_mechanism_strike_dip_rake), ""),
    source_distance_deg = dplyr::if_else(source_distance_deg != "NULL",as.character(source_distance_deg), ""),
    source_distance_km = dplyr::if_else(source_distance_km != "NULL",as.character(source_distance_km), ""),
    back_azimuth_deg = dplyr::if_else(back_azimuth_deg != "NULL",as.character(back_azimuth_deg), ""),
    snr_db = dplyr::if_else(snr_db != "NULL",as.character(snr_db) %>% stringr::str_remove_all(., c(",")) %>% stringr::str_remove_all(c("c")) %>% gsub("[(]", "[", .) %>% gsub("[)]", "]", .), ""),
    coda_end_sample = dplyr::if_else(is.na(coda_end_sample) == FALSE, paste0("[[",ceiling(coda_end_sample), ".]]"), ""),
    trace_start_time = dplyr::if_else(trace_start_time != "NULL",as.character(trace_start_time), ""),
    trace_category = dplyr::if_else(trace_category != "NULL",as.character(trace_category), ""),
    trace_name = dplyr::if_else(trace_name != "NULL",as.character(trace_name), "")
  )
# %>% dplyr::arrange(trace_name)

STEAD_csv$snr_db[which(is.na(NMTSO_test$snr_db %>% lapply(`[[`, 1)) == T)] = "[True True True]"

readr::write_csv(x = STEAD_csv, file = here::here("NMTSO_test.csv"), append = TRUE, col_names = TRUE)

Edits

Failed Fix

I tried using the base R function if.exists() to control an if statement to run either a write with col_names or without colnames:

    dplyr::if_else(file.exists(here::here("NMTSO_meta.csv")) == F,
                   readr::write_csv(x = STEAD_csv, file = here::here("NMTSO_meta.csv"), append = TRUE, col_names = TRUE),
                   readr::write_csv(x = STEAD_csv, file = here::here("NMTSO_meta.csv"), append = TRUE, col_names = F)
                   )

This only output the fallowing error:

Error: Can't use NA as column index with `[` at position 1.

lucasgraybuck · July 8, 2021, 4:43am

How about a standard if/else instead of using dplyr for that step?

out_file <- here::here("NMTSO_meta.csv")

if(file.exists(out_file)) {
  readr::write_csv(
    x = STEAD_csv, 
    file = out_file, 
    append = TRUE, 
    col_names = FALSE)
} else {
  readr::write_csv(
    x = STEAD_csv, 
    file = out_file, 
    append = FALSE, 
    col_names = TRUE)
}

HanOostdijk · July 8, 2021, 10:30am

Hello @bryanrt ,

I think that readr::write with only append=T or append=F should be sufficient.
Why your file.exists logic does not handle that correctly, I do not know.

Some suggestions in case of further questions from your site:

please use a minimal (not-) working example of your problem: reprex so that we can redo the relevant part of your code
don't repeat code when not necessary: use a function for the tests on NULL
(Are you sure that those fields really contain the characters NULL?)
There are constructions in dplyr to apply this function to all or a selection of the table columns e.g. across
Assign the result of here::here("NMTSO_meta.csv") to a variable and reuse this variable

jimhester · July 8, 2021, 11:25am

Setting col_names = TRUE always includes the column names, even if append = TRUE.

However it defaults to !append, so if you append only if a file already exists I think it will have the behavior you are looking for. e.g.

library("readr")

mt <- head(mtcars)

out_file <- "mtcars.csv"

for (i in 1:3) {
  write_csv(mt, out_file, append = file.exists(out_file))
}

readLines("mtcars.csv")
#>  [1] "mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb"
#>  [2] "21,6,160,110,3.9,2.62,16.46,0,1,4,4"         
#>  [3] "21,6,160,110,3.9,2.875,17.02,0,1,4,4"        
#>  [4] "22.8,4,108,93,3.85,2.32,18.61,1,1,4,1"       
#>  [5] "21.4,6,258,110,3.08,3.215,19.44,1,0,3,1"     
#>  [6] "18.7,8,360,175,3.15,3.44,17.02,0,0,3,2"      
#>  [7] "18.1,6,225,105,2.76,3.46,20.22,1,0,3,1"      
#>  [8] "21,6,160,110,3.9,2.62,16.46,0,1,4,4"         
#>  [9] "21,6,160,110,3.9,2.875,17.02,0,1,4,4"        
#> [10] "22.8,4,108,93,3.85,2.32,18.61,1,1,4,1"       
#> [11] "21.4,6,258,110,3.08,3.215,19.44,1,0,3,1"     
#> [12] "18.7,8,360,175,3.15,3.44,17.02,0,0,3,2"      
#> [13] "18.1,6,225,105,2.76,3.46,20.22,1,0,3,1"      
#> [14] "21,6,160,110,3.9,2.62,16.46,0,1,4,4"         
#> [15] "21,6,160,110,3.9,2.875,17.02,0,1,4,4"        
#> [16] "22.8,4,108,93,3.85,2.32,18.61,1,1,4,1"       
#> [17] "21.4,6,258,110,3.08,3.215,19.44,1,0,3,1"     
#> [18] "18.7,8,360,175,3.15,3.44,17.02,0,0,3,2"      
#> [19] "18.1,6,225,105,2.76,3.46,20.22,1,0,3,1"

^{Created on 2021-07-08 by the reprex package (v2.0.0)}

bryanrt · July 8, 2021, 2:42pm

This is a super clean solution, I don't know how I didn't see this logic myself!

Thank you kindly @lucasgraybuck, @HanOostdijk, and @jimhester for the great suggestions and guidance.

system · July 15, 2021, 2:43pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.