Problem with the mutate function?

remsssteack · August 5, 2021, 12:34pm

Hello,

A few months ago I asked for help importing a file, and someone gave me some impressive code to do so using the mutate function.
Today I need to run it again, but it appears that the mutate function makes my R session crash.

Does anyone have the same issue?

gueyenono · August 5, 2021, 1:30pm

Hi @remsssteack

I'm curious to know how you realized that the mutate function (from the dplyr package) was the reason why your R session crashed.

remsssteack · August 5, 2021, 2:08pm

Well, I ran my code step by step, and it was a segment containing mutate that crashed the session. The line was: read_delim() %>% setNames() %>% mutate()
I then ran just read_delim() %>% setNames() and it didn't crash.
But I doubt it's the function itself that causes the crash, because I can see output in the console. I wonder whether it's a problem with the versions of RStudio, R, or the packages, but I don't know.

gueyenono · August 5, 2021, 2:40pm

Maybe you should try reinstalling the things you suspect are causing the issue (e.g. packages, RStudio, ...)

remsssteack · August 5, 2021, 6:22pm

Here is a link to the code that used to work :

Also @arthur.t, sorry for pinging, but do you have any idea?

Best regards,

remsssteack · August 6, 2021, 9:01am

Update:
I don't think the problem is with the mutate function but with the IDE, since the log file says: "ERROR system error 10053",
which looks related to this post: RStudio crashing when navigating through file explorer (system error 10053)
I thought it was related to the versions I had installed, so I updated everything, but the crash is still there.
I can provide an .mp4 video that shows the crash via PM if someone is interested, since I can't post it here (format not accepted).

I can't think of any solution.

arthur.t · August 6, 2021, 11:49am

Sorry, I don't know how to fix crashes. mutate is one of the most commonly used functions in all of R, so I think most likely the issue is something else.

nirgrahamuk · August 6, 2021, 11:57am

How about attempting a reprex?
If you have a data frame before you send it to the problematic mutate,
you can use standard methods (dput()) to provide an example of the data frame at that point.
I recommend you also run sessionInfo() to make clear what your R and package versions are.

remsssteack · August 6, 2021, 12:07pm

The thing is that to produce a reprex I need a working R session:

reprex()
i Rendering reprex...
Error: This reprex appears to crash R
Call `reprex()` again with `std_out_err = TRUE` to get more info
Run `rlang::last_error()` to see where the error occurred.

Also, with std_out_err = TRUE:

This reprex appears to crash R.
See standard output and standard error for more details.

Standard output and error

-- nothing to show --

About the sessionInfo():

sessionInfo()
#> R version 4.1.0 (2021-05-18)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19043)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=French_France.1252  LC_CTYPE=French_France.1252   
#> [3] LC_MONETARY=French_France.1252 LC_NUMERIC=C                  
#> [5] LC_TIME=French_France.1252    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> loaded via a namespace (and not attached):
#>  [1] knitr_1.33        magrittr_2.0.1    rlang_0.4.11      fansi_0.5.0      
#>  [5] stringr_1.4.0     styler_1.5.1      highr_0.9         tools_4.1.0      
#>  [9] xfun_0.24         utf8_1.2.2        withr_2.4.2       htmltools_0.5.1.1
#> [13] ellipsis_0.3.2    yaml_2.2.1        digest_0.6.27     tibble_3.1.3     
#> [17] lifecycle_1.0.0   crayon_1.4.1      purrr_0.3.4       vctrs_0.3.8      
#> [21] fs_1.5.0          glue_1.4.2        evaluate_0.14     rmarkdown_2.9    
#> [25] reprex_2.0.1      stringi_1.7.3     compiler_4.1.0    pillar_1.6.2     
#> [29] backports_1.2.1   pkgconfig_2.0.3

Created on 2021-08-06 by the reprex package (v2.0.1)

Is there anything else helpful to print out?

nirgrahamuk · August 6, 2021, 12:09pm

But you told us that you have a working R session until you mutate, and I'm asking for the data before that point.
Or does it indeed crash earlier?

remsssteack · August 6, 2021, 12:20pm

Oh, so I use mutate to import the data. Let me show you the input of the mutate:

library(tidyverse)
###################################################################################### p 1000 t1 #################
# read each row of text individually so we can parse out the information manually
election0 <- 
  read_delim(
    "~/MASTER_1/stage/travail/R/donnees/2014/2014_t1+1000.txt", 
    "\n",
    col_names = FALSE,locale = locale(encoding = "ISO-8859-1")) %>%
  setNames("line_text")
#> Rows: 4956 Columns: 1
#> -- Column specification --------------------------------------------------------
#> Delimiter: "\n"
#> chr (1): X1
#> 
#> i Use `spec()` to retrieve the full column specification for this data.
#> i Specify the column types or set `show_col_types = FALSE` to quiet this message.

dput(election0) also crashes the session, so here are the first few rows:

head(election0)
#> # A tibble: 6 x 1
#>   line_text                                                                     
#>   <chr>                                                                         
#> 1 "Date de l'export;Code du département;Type de scrutin;Libellé du département;~
#> 2 "25/03/2014 12:50:21;01;LI2;AIN;004;Ambérieu-en-Bugey;00008198;00003422;41,74~
#> 3 "\n25/03/2014 12:50:21;01;LI2;AIN;007;Ambronay;00001770;00000511;28,87;000012~
#> 4 "\n25/03/2014 12:50:21;01;LI2;AIN;014;Arbent;00002167;00001061;48,96;00001106~
#> 5 "\n25/03/2014 12:50:21;01;LI2;AIN;022;Artemare;00000857;00000237;27,65;000006~
#> 6 "\n25/03/2014 12:50:21;01;LI2;AIN;025;Bâgé-la-Ville;00002166;00001051;48,52;0~

Basically, it's a raw import of a CSV file that the mutate function is supposed to split into variables:

election0 <- 
  read_delim(
    "~/MASTER_1/stage/travail/R/donnees/2014/2014_t1+1000.txt", 
    "\n",
    col_names = FALSE,locale = locale(encoding = "ISO-8859-1")) %>%
  setNames("line_text") %>%
  mutate(
    # split by delimiter
    split_text  = strsplit(line_text, ";"),
    # assume the first 17 elements are common
    split_df    = map(split_text, ~.[1:17]),
    # and everything past this is repeating 11
    split_names = map(split_text, ~.[-c(1:17)]),
    columns     = map_dbl(split_text, length),
    # the number of repeating 11 name data elements
    n_names     = (columns - 17)/11)
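
The layout assumed in the comments above (17 common fields followed by repeating blocks of 11) can be checked on its own, without mutate. The following is only a minimal sketch in base R; `make_line()` and the toy `line_text` vector are hypothetical stand-ins for the real column, not part of the original code.

```r
# Hypothetical helper: builds a toy line with 17 common fields, n_blocks
# blocks of 11 fields, optional extra fields, and a trailing ";".
make_line <- function(n_blocks, extra = 0) {
  paste0(paste(seq_len(17 + 11 * n_blocks + extra), collapse = ";"), ";")
}

# Toy stand-in for election0$line_text; the third line breaks the layout.
line_text <- c(make_line(1), make_line(3), make_line(2, extra = 4))

# Number of fields per line (strsplit drops the empty field after the last ";").
n_fields <- lengths(strsplit(line_text, ";", fixed = TRUE))

# Rows whose field count does not fit the 17 + 11 * k layout.
irregular <- which(n_fields < 17 | (n_fields - 17) %% 11 != 0)
irregular
```

On the real data, any row reported in `irregular` would be a candidate for closer inspection, although a row can fit the layout and still cause trouble for other reasons.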

nirgrahamuk · August 6, 2021, 12:23pm

Does

dput(head(election0))

crash the session?
If not, then please post that.

remsssteack · August 6, 2021, 12:26pm

Here :

library(tidyverse)

election0 <- 
  read_delim(
    "~/MASTER_1/stage/travail/R/donnees/2014/2014_t1+1000.txt", 
    "\n",
    col_names = FALSE,locale = locale(encoding = "ISO-8859-1")) %>%
  setNames("line_text")
#> Rows: 4956 Columns: 1
#> -- Column specification --------------------------------------------------------
#> Delimiter: "\n"
#> chr (1): X1
#> 
#> i Use `spec()` to retrieve the full column specification for this data.
#> i Specify the column types or set `show_col_types = FALSE` to quiet this message.

dput(head(election0))
#> structure(list(line_text = c("Date de l'export;Code du département;Type de scrutin;Libellé du département;Code de la commune;Libellé de la commune;Inscrits;Abstentions;% Abs/Ins;Votants;% Vot/Ins;Blancs et nuls;% BlNuls/Ins;% BlNuls/Vot;Exprimés;% Exp/Ins;% Exp/Vot;Code Nuance;Sexe;Nom;Prénom;Liste;Sièges / Elu;Sièges Secteur;Sièges CC;Voix;% Voix/Ins;% Voix/Exp;", 
#> "25/03/2014 12:50:21;01;LI2;AIN;004;Ambérieu-en-Bugey;00008198;00003422;41,74;00004776;58,26;00000191;2,33;4,00;00004585;55,93;96,00;LDVG;F;EXPOSITO;Josiane;AMBERIEU AMBITION;0;0;0;00000954;11,64;20,81;LDVG;F;PIDOUX;Catherine;VIVONS NOTRE VILLE;0;0;0;00000822;10,03;17,93;LUMP;M;FORTIN;Christophe;AMBERIEU RENOUVEAU;0;0;0;00001383;16,87;30,16;LDVD;M;FABRE;Daniel;PAROLE  AUX AMBARROIS;0;0;0;00001426;17,39;31,10;", 
#> "\n25/03/2014 12:50:21;01;LI2;AIN;007;Ambronay;00001770;00000511;28,87;00001259;71,13;00000068;3,84;5,40;00001191;67,29;94,60;LDVG;F;LEVRAT;Gisèle;AMBRONAY POUR TOUS;0;0;0;00000552;31,19;46,35;LDVD;M;FOURNIER;Gabriel;AMBRONAY Demain;0;0;0;00000178;10,06;14,95;LDVD;M;MANCUSO;Vincent;AGIR ENSEMBLE POUR L'AVENIR D'AMBRONAY;0;0;0;00000461;26,05;38,71;", 
#> "\n25/03/2014 12:50:21;01;LI2;AIN;014;Arbent;00002167;00001061;48,96;00001106;51,04;00000176;8,12;15,91;00000930;42,92;84,09;LUMP;F;MAISSIAT;Liliane;POUR L'AVENIR DE TOUS, CONTINUONS ENSEMBLE;23;0;3;00000930;42,92;100,00;", 
#> "\n25/03/2014 12:50:21;01;LI2;AIN;022;Artemare;00000857;00000237;27,65;00000620;72,35;00000027;3,15;4,35;00000593;69,19;95,65;LDVD;F;CHARMONT-MUNET;Mireille;AVEC VOUS, GARDONS LE CAP;13;0;2;00000386;45,04;65,09;LDVD;M;LESEUR;Philippe;ARTEMARE, UNE DYNAMIQUE POUR L'AVENIR;2;0;0;00000207;24,15;34,91;", 
#> "\n25/03/2014 12:50:21;01;LI2;AIN;025;Bâgé-la-Ville;00002166;00001051;48,52;00001115;51,48;00000290;13,39;26,01;00000825;38,09;73,99;LDIV;M;REPIQUET;Dominique;Préparons l'avenir;23;0;5;00000825;38,09;100,00;"
#> )), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"
#> ))

Created on 2021-08-06 by the reprex package (v2.0.1)

nirgrahamuk · August 6, 2021, 12:34pm

This runs for me, but I'm still using R 4.0.5. Maybe someone else with 4.1 can try it.

election0 <- structure(list(line_text = c("Date de l'export;Code du département;Type de scrutin;Libellé du département;Code de la commune;Libellé de la commune;Inscrits;Abstentions;% Abs/Ins;Votants;% Vot/Ins;Blancs et nuls;% BlNuls/Ins;% BlNuls/Vot;Exprimés;% Exp/Ins;% Exp/Vot;Code Nuance;Sexe;Nom;Prénom;Liste;Sièges / Elu;Sièges Secteur;Sièges CC;Voix;% Voix/Ins;% Voix/Exp;", 
                                          "25/03/2014 12:50:21;01;LI2;AIN;004;Ambérieu-en-Bugey;00008198;00003422;41,74;00004776;58,26;00000191;2,33;4,00;00004585;55,93;96,00;LDVG;F;EXPOSITO;Josiane;AMBERIEU AMBITION;0;0;0;00000954;11,64;20,81;LDVG;F;PIDOUX;Catherine;VIVONS NOTRE VILLE;0;0;0;00000822;10,03;17,93;LUMP;M;FORTIN;Christophe;AMBERIEU RENOUVEAU;0;0;0;00001383;16,87;30,16;LDVD;M;FABRE;Daniel;PAROLE  AUX AMBARROIS;0;0;0;00001426;17,39;31,10;", 
                                          "\n25/03/2014 12:50:21;01;LI2;AIN;007;Ambronay;00001770;00000511;28,87;00001259;71,13;00000068;3,84;5,40;00001191;67,29;94,60;LDVG;F;LEVRAT;Gisèle;AMBRONAY POUR TOUS;0;0;0;00000552;31,19;46,35;LDVD;M;FOURNIER;Gabriel;AMBRONAY Demain;0;0;0;00000178;10,06;14,95;LDVD;M;MANCUSO;Vincent;AGIR ENSEMBLE POUR L'AVENIR D'AMBRONAY;0;0;0;00000461;26,05;38,71;", 
                                          "\n25/03/2014 12:50:21;01;LI2;AIN;014;Arbent;00002167;00001061;48,96;00001106;51,04;00000176;8,12;15,91;00000930;42,92;84,09;LUMP;F;MAISSIAT;Liliane;POUR L'AVENIR DE TOUS, CONTINUONS ENSEMBLE;23;0;3;00000930;42,92;100,00;", 
                                          "\n25/03/2014 12:50:21;01;LI2;AIN;022;Artemare;00000857;00000237;27,65;00000620;72,35;00000027;3,15;4,35;00000593;69,19;95,65;LDVD;F;CHARMONT-MUNET;Mireille;AVEC VOUS, GARDONS LE CAP;13;0;2;00000386;45,04;65,09;LDVD;M;LESEUR;Philippe;ARTEMARE, UNE DYNAMIQUE POUR L'AVENIR;2;0;0;00000207;24,15;34,91;", 
                                          "\n25/03/2014 12:50:21;01;LI2;AIN;025;Bâgé-la-Ville;00002166;00001051;48,52;00001115;51,48;00000290;13,39;26,01;00000825;38,09;73,99;LDIV;M;REPIQUET;Dominique;Préparons l'avenir;23;0;5;00000825;38,09;100,00;"
)), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"
))
library(tidyverse)
mutate(election0,
       # split by delimiter
       split_text  = strsplit(line_text, ";"),
       # assume the first 17 elements are common
       split_df    = map(split_text, ~.[1:17]),
       # and everything past this is repeating 11
       split_names = map(split_text, ~.[-c(1:17)]),
       columns     = map_dbl(split_text, length),
       # the number of repeating 11 name data elements
       n_names     = (columns - 17)/11)

My sessionInfo is:

R version 4.0.5 (2021-03-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19043)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252    LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C                            LC_TIME=English_United Kingdom.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] forcats_0.5.1   stringr_1.4.0   dplyr_1.0.7     purrr_0.3.4     readr_2.0.0     tidyr_1.1.3     tibble_3.1.2    ggplot2_3.3.5  
[9] tidyverse_1.3.1

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.4.6     cellranger_1.1.0 pillar_1.6.1     compiler_4.0.5   dbplyr_2.1.1     tools_4.0.5      jsonlite_1.7.2  
 [8] lubridate_1.7.10 lifecycle_1.0.0  gtable_0.3.0     pkgconfig_2.0.3  rlang_0.4.10     reprex_2.0.0     cli_3.0.0       
[15] rstudioapi_0.13  DBI_1.0.0        haven_2.3.1      xml2_1.3.2       withr_2.2.0      httr_1.4.2       fs_1.5.0        
[22] generics_0.1.0   vctrs_0.3.8      hms_1.1.0        grid_4.0.5       tidyselect_1.1.0 glue_1.4.1       R6_2.4.1        
[29] fansi_0.4.1      readxl_1.3.1     tzdb_0.1.1       modelr_0.1.8     magrittr_2.0.1   backports_1.1.7  scales_1.1.1    
[36] ellipsis_0.3.2   rvest_1.0.0      assertthat_0.2.1 colorspace_1.4-1 utf8_1.1.4       stringi_1.4.6    munsell_0.5.0   
[43] broom_0.7.8      crayon_1.4.1

Perhaps you should check whether this truncated data and code work or still crash for you.
That is, perhaps it is fine for you on these records, but the true dataset has special characters somewhere further down that are the problem.
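
One way to look for problematic characters without printing the text itself is to test each line for valid UTF-8 and then inspect only byte counts and raw bytes. This is a minimal sketch in base R; the `line_text` vector here is a hypothetical stand-in for election0$line_text, and whether an encoding problem is actually the cause of the crash is an assumption, not something established in this thread.

```r
# Hypothetical stand-in for the real column: two clean lines and one line
# containing a byte (0xff) that is not valid UTF-8.
line_text <- c("a;b;c", "d;e;f", "g;\xff;h")

# validUTF8() returns FALSE for elements that are not valid UTF-8.
bad_rows <- which(!validUTF8(line_text))
bad_rows

# Inspect sizes and raw bytes rather than printing the strings.
nchar(line_text[bad_rows], type = "bytes")
charToRaw(line_text[bad_rows[1]])
```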

remsssteack · August 6, 2021, 12:42pm

This code runs fine in my session too.
I don't know whether this can be caused by special characters. Also, the code worked fine 2 months ago; there is no reason such a trivial function shouldn't work now. I really don't understand what's happening.

nirgrahamuk · August 6, 2021, 1:03pm

That shows the reprex lacks value for debugging, as it doesn't reproduce the error on any machine.
You may have to make an effort to find the problematic row.
How many rows are there in total?

Here is one approach: you can use this on your election0 frame, then check your working directory for loggingfile.txt and find the last OK row. The row after it will be the one with the problem.

library(tidyverse)


# Run the mutate on a single row and log the row number once it succeeds
do_a_row <- function(x, row) {
  print(row)
  # browser()
  mutate(x %>% slice(row),
         # split by delimiter
         split_text  = strsplit(line_text, ";"),
         # assume the first 17 elements are common
         split_df    = map(split_text, ~.[1:17]),
         # and everything past this is repeating 11
         split_names = map(split_text, ~.[-c(1:17)]),
         columns     = map_dbl(split_text, length),
         # the number of repeating 11 name data elements
         n_names     = (columns - 17)/11)
  write(x = paste0("row ", row, " ok"), file = "loggingfile.txt", append = TRUE)
}

# Start from an empty log on every run
if (file.exists("loggingfile.txt"))
  file.remove("loggingfile.txt")

walk(seq_len(nrow(election0)),
     ~do_a_row(election0,.x))
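
After a crash and a restart of R, the log can be read back to find the first row that did not complete. This is a minimal sketch in base R that assumes the "row N ok" format written by the function above; for illustration it writes a hypothetical log to a temporary file rather than reading loggingfile.txt from the working directory.

```r
# Hypothetical log contents, written to a temporary file for illustration.
log_path <- tempfile(fileext = ".txt")
writeLines(paste0("row ", 1:5, " ok"), log_path)

# Extract the row numbers from lines of the form "row N ok".
log_lines <- readLines(log_path)
ok_rows   <- as.integer(sub("^row ([0-9]+) ok$", "\\1", log_lines))

# The row after the last logged one is the first that did not complete.
last_ok     <- max(ok_rows)
first_fails <- last_ok + 1L
first_fails
```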

remsssteack · August 6, 2021, 1:13pm

Interesting! The session crashed at the 1304th row; the last OK row was 1303. The frame has 4956 rows in total.

remsssteack · August 6, 2021, 1:15pm

Also, when trying to print the row with election0[1304,1],
the session crashes. Let me have a look at the CSV.

remsssteack · August 6, 2021, 1:20pm

Well, the row seems normal, and the file hasn't changed since the first time I ran the code.
I tried dput(slice(election0,1300:1305)),
but it crashed too.
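
A possible cross-check is to read the same lines with base R only, which takes read_delim out of the picture and shows whether a narrow window of rows can be inspected at all. This is a minimal sketch, assuming the file is Latin-1 (ISO-8859-1) encoded as in the read_delim call; the temporary file and `sample_lines` are hypothetical stand-ins for the real export.

```r
# Hypothetical stand-in for the real export: a tiny Latin-1 encoded file.
path <- tempfile(fileext = ".txt")
sample_lines <- c("Code;Libell\u00e9", "01;Amb\u00e9rieu-en-Bugey")
writeLines(iconv(sample_lines, from = "UTF-8", to = "latin1"),
           path, useBytes = TRUE)

# Read the raw lines with base R, then convert from Latin-1 to UTF-8.
raw_lines  <- readLines(path, warn = FALSE)
utf8_lines <- iconv(raw_lines, from = "latin1", to = "UTF-8")

# Inspect a narrow window of rows (for the real file, e.g. 1300:1305).
window <- 1:2
nchar(raw_lines[window], type = "bytes")
strsplit(utf8_lines[window], ";", fixed = TRUE)
```

If the base R route handles the rows around 1304 without trouble, that would suggest the problem lies in how the data was read rather than in the file or in mutate, though this is only a heuristic.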

remsssteack · August 6, 2021, 1:24pm

Here is the file source : if you want to try.