I have a dataframe called data_base that looks like that:
The dataframe is this Data Base. Exists more variables I showed that necesaries to understand the problem.
Cod_Corto Cuenca Temporada Alcalinidad Carb_ Bicarb_ Fluoruros Cloruros Nitratos Nitratos_N Sulfatos Nitritos Nitritos_N
<fct> <fct> <fct> <dbl> <dbl> <dbl> <chr> <dbl> <chr> <chr> <dbl> <chr> <chr>
1 GW-026 Tumbes Avenida 283. 0.3 346. 0.495 572. <0.031 <0.007 901. <0.003 <0.001
2 GW-007 Zarumilla Avenida 146. 0.3 179. 0.24399999999… 279. <0.031 <0.007 274. <0.003 <0.001
3 GW-008 Zarumilla Avenida 108. 0.3 132. 0.314 62.8 2.22900… 0.504 61.0 <0.003 <0.001
4 GW-009 Zarumilla Avenida 138. 0.3 168. 0.184 584. <0.031 <0.007 151. <0.003 <0.001
5 SW-056 Tumbes Avenida 156. 0.3 191. 0.20200000000… 68.6 0.151 3.4000000… 124. <0.003 <0.001
6 GW-005 Tumbes Avenida 211 0.3 257. 0.157 64.9 1.04499… 0.2359999… 146. <0.003 <0.001
7 GW-004 Tumbes Avenida 278. 0.3 339 0.17 28.8 5.57800… 1.26 127. <0.003 <0.001
8 GW-003 Tumbes Avenida 311. 0.3 380 0.09 19.8 0.56699… 0.128 19.4 <0.003 <0.001
9 SW-032 Tumbes Avenida 44.9 0.3 54.7 4.59999999999… 1.45 1.768 0.4 20.7 <0.003 <0.001
10 SW-036 Tumbes Avenida 44.7 0.3 54.5 5.29999999999… 1.51 2.323 0.5250000… 18.6 <0.003 <0.001
# … with 89 more rows, and 101 more variables: Fosfatos <chr>, Fosfatos_P <chr>, Bromuro <chr>, Ag_dis <chr>, Al_dis <chr>,
# As_dis <chr>, B_dis <chr>, Ba_dis <chr>, Be_dis <chr>, Bi_dis <chr>, Ca_dis <dbl>, Cd_dis <chr>, Ce_dis <chr>, Co_dis <chr>,
# Cr_dis <chr>, Cs_dis <chr>, Cu_dis <chr>, Fe_dis <chr>, Ga_dis <chr>, Ge_dis <chr>, Hf_dis <chr>, Hg_dis <chr>, K_dis <dbl>,
# La_dis <chr>, Li_dis <chr>, Lu_dis <chr>, Mg_dis <dbl>, Mn_dis <chr>, Mo_dis <chr>, Na_dis <dbl>, Nb_dis <chr>, Ni_dis <chr>,
# P_dis <chr>, Pb_dis <chr>, Rb_dis <chr>, Sb_dis <chr>, Se_dis <chr>, Si_dis <dbl>, SiO2_dis <dbl>, Sn_dis <chr>, Sr_dis <chr>,
# Ta_dis <chr>, Te_dis <chr>, Th_dis <chr>, Ti_dis <chr>, Tl_dis <chr>, U_dis <chr>, V_dis <chr>, W_dis <chr>, Yb_dis <chr>,
# Zn_dis <chr>, Zr_dis <chr>, Ag_tot <chr>, Al_tot <chr>, As_tot <chr>, B_tot <chr>, Ba_tot <dbl>, Be_tot <chr>, Bi_tot <chr>, …
So I need to achieve one object (I quite complex I think):
- Completed the data below the detection limit in a one new column (many new columns because I have many elemnts) named "element_com", this element is variable (can be Cd, Zn, Cu, and so on) and depend of two factors to be completed espace (Cuenca, Subcuenca or Microcuenca) and time factor (Temporada or time intervals).
In order to achieve this objects, I create the following algorithm:
#### Input values in water Below Detection (B.D) ####
library(tidyverse)
library(NADA)
library(DT)
#Filter to detect some B.D
datatable(data_base[ ,c("Zn_dis", "Zn_tot")], filter = "top")
#Elements that I need to complete
elementos <- colnames(data_base[ ,70:180])
#Variables that I need to complete
data_total <- data_base %>% select("Cuenca","Temporada", any_of(elementos))
# First Block
# Filter accord to Cuenca, Subcuenca or Microcuenca (or spatial coherence)
Tumbes <- data_base %>% filter(Cuenca=="Tumbes")
str(Tumbes)
# Second Block
#After one run, start here with Avenida:
z <- Tumbes[Tumbes$Temporada=="Avenida", ] #Do the same for Estiaje, change Avenida by Estiaje!
#Preparing Data for Analysis:
val0 <- unique(grep("<", z$Zn_dis, value = TRUE))
z$var0 <- z$Zn_dis
z$ND_var0 <- rep(0, length(z$var0))
indcero0 <- which(z$var0==val0)
z$var0[indcero0] <- substr(val0,2,nchar(val0))
z$var0 <- as.numeric(z$var0)
z$ND_var0[indcero0] <- 1
z$ND_var0 <- as.logical(z$ND_var0)
#Sort the Data Acorder to Study:
indna0 <- is.na(z$var0)
yn0 <- z$var0[which(indna0==FALSE)]
cyn0 <- z$ND_var0[which(indna0==FALSE)]
yn0 <- sort(yn0,index.return=TRUE)
cyn0 <- cyn0[yn0$ix]
#Apply the ROS (REGRESSION IN ORDER STATISTICS)
elemento <- ros(yn0$x,cyn0,forwardT = "log", reverseT = "exp")
elemento <- as.data.frame(elemento)
id <- which(elemento$censored==TRUE)
elemento <- elemento[id, ]
# Third Block
#DO IT WITH CARE!!!! (only the first time)
#Create new column of completed values.
Tumbes$Zn_com <- Tumbes$Zn_dis #omit this step after first time run
#Loop to input values:
id <- which(Tumbes$Zn_com==val0)
for(i in 1:length(id)){
replace <- elemento$modeled[i]
Tumbes[id, ]$Zn_com[i] <- replace
}
Tumbes$Zn_com <- as.numeric(Tumbes$Zn_com)
# Testing if works
#After run in Avenida and Estiaje
knitr::kable(Tumbes[ ,c("Zn_dis","Zn_com")])
write.csv(x = Tumbes, file = "Tumbes.csv")
At the begginig I subset the database to Tumbes Cuenca (First Block), then I work in the Avenida Temporada (Second and Third Block), after that again with Estiaje Temporada, the initial line of the third block only runs once. Finally, I run the Testing if works part.
Important:
In this case the database does not have Estiaje, but sometimes exist. Maybe if you runs with estiaje exist an error.
All that runs good but my problem is that I have to run hundreds of time, because in this case I have 3 Cuencas, 2 Temporadas (The data from estiaje is not yet here) and 101 elements.
I hope someone can help me. Additionally, I am creating a rpackage to automatically works with dataframes that contain non-detected values in geology water resource and complete by using Hesel methodology from the United State Geological Survey associate with spatio-temporal coherence, moreover many analisis in water resource so who help with these I will appreciate a lot. Thank so much!!!