This post is long!!! I am putting the excruciating details of the question at the end of this thread. I feel like I am so close and still miles away to getting this working. This is really hard for me to convey, but...
In short, I have the table and program below. I run it and it processes the data from the file I choose, deletes a bunch of data never used, then a pop-up comes up and I can choose multiple selections. In an example using the data below, I choose Site 1 and 3 and then I select all the choices in the next two pop-ups. I was hoping to see all of the data from Sites 1 and 3. But it erases, for example, Analyte 6 from Site 1 on the 4th of March. I am almost positive this is from my i values, but just can't see the problem and can't find anything similar in my help searches
Data Example (A little long, but wanted representative data):
Project | sampletype | SpecificMethod | Analyte | ClientMatrix | Sampled | tResult |
---|---|---|---|---|---|---|
Site 1 | Long | Method 1 | Analyte 1 | Air | 4/3/2020 | ND |
Site 2 | Long | Method 1 | Analyte 1 | Air | 4/3/2020 | ND |
Site 3 | Long | Method 1 | Analyte 1 | Air | 4/3/2020 | ND |
Site 4 | Long | Method 1 | Analyte 1 | Air | 4/3/2020 | ND |
Site 1 | Long | Method 1 | Analyte 2 | Air | 4/3/2020 | ND |
Site 2 | Long | Method 1 | Analyte 2 | Air | 4/3/2020 | ND |
Site 3 | Long | Method 1 | Analyte 2 | Air | 4/3/2020 | ND |
Site 4 | Long | Method 1 | Analyte 2 | Air | 4/3/2020 | ND |
Site 1 | Long | Method 1 | Analyte 3 | Air | 4/3/2020 | 16.6 |
Site 2 | Long | Method 1 | Analyte 3 | Air | 4/3/2020 | 76 |
Site 3 | Long | Method 1 | Analyte 3 | Air | 4/3/2020 | 27.8 |
Site 4 | Long | Method 1 | Analyte 3 | Air | 4/3/2020 | 21 |
Site 1 | Long | Method 3 | Analyte 4 | Air | 4/3/2020 | 0.133 |
Site 2 | Long | Method 3 | Analyte 4 | Air | 4/3/2020 | 0.236 |
Site 3 | Long | Method 3 | Analyte 4 | Air | 4/3/2020 | 0.179 |
Site 4 | Long | Method 3 | Analyte 4 | Air | 4/3/2020 | 0.192 |
Site 1 | Long | Method 3 | Analyte 5 | Air | 4/3/2020 | 0.404 |
Site 2 | Long | Method 3 | Analyte 5 | Air | 4/3/2020 | 0.619 |
Site 3 | Long | Method 3 | Analyte 5 | Air | 4/3/2020 | 0.57 |
Site 4 | Long | Method 3 | Analyte 5 | Air | 4/3/2020 | 0.565 |
Site 1 | Long | Method 3 | Analyte 6 | Air | 4/3/2020 | ND |
Site 2 | Long | Method 3 | Analyte 6 | Air | 4/3/2020 | 0.0529 |
Site 3 | Long | Method 3 | Analyte 6 | Air | 4/3/2020 | 0.0428 |
Site 4 | Long | Method 3 | Analyte 6 | Air | 4/3/2020 | ND |
Site 1 | Long | Method 3 | Analyte 7 | Air | 4/3/2020 | 0.664 |
Site 2 | Long | Method 3 | Analyte 7 | Air | 4/3/2020 | 0.784 |
Site 3 | Long | Method 3 | Analyte 7 | Air | 4/3/2020 | 1.32 |
Site 4 | Long | Method 3 | Analyte 7 | Air | 4/3/2020 | 1.17 |
Site 1 | Long | Method 3 | Analyte 4 | Air | 4/9/2020 | 0.511 |
Site 2 | Long | Method 3 | Analyte 4 | Air | 4/9/2020 | 1.43 |
Site 3 | Long | Method 3 | Analyte 4 | Air | 4/9/2020 | 0.381 |
Site 4 | Long | Method 3 | Analyte 4 | Air | 4/9/2020 | 0.314 |
Site 1 | Long | Method 3 | Analyte 5 | Air | 4/9/2020 | 1.93 |
Site 2 | Long | Method 3 | Analyte 5 | Air | 4/9/2020 | 2.38 |
Site 3 | Long | Method 3 | Analyte 5 | Air | 4/9/2020 | 1.35 |
Site 4 | Long | Method 3 | Analyte 5 | Air | 4/9/2020 | 1.06 |
Site 1 | Long | Method 3 | Analyte 6 | Air | 4/9/2020 | 0.229 |
Site 2 | Long | Method 3 | Analyte 6 | Air | 4/9/2020 | 0.241 |
Site 3 | Long | Method 3 | Analyte 6 | Air | 4/9/2020 | 0.119 |
Site 4 | Long | Method 3 | Analyte 6 | Air | 4/9/2020 | 0.0825 |
Site 1 | Long | Method 3 | Analyte 7 | Air | 4/9/2020 | 1.99 |
Site 2 | Long | Method 3 | Analyte 7 | Air | 4/9/2020 | 2.54 |
Site 3 | Long | Method 3 | Analyte 7 | Air | 4/9/2020 | 1.72 |
Site 4 | Long | Method 3 | Analyte 7 | Air | 4/9/2020 | 1.61 |
Site 1 | Long | Method 3 | Analyte 4 | Air | 4/15/2020 | 0.198 |
Site 2 | Long | Method 3 | Analyte 4 | Air | 4/15/2020 | 0.269 |
Site 3 | Long | Method 3 | Analyte 4 | Air | 4/15/2020 | 0.249 |
Site 4 | Long | Method 3 | Analyte 4 | Air | 4/15/2020 | 0.259 |
Site 1 | Long | Method 3 | Analyte 5 | Air | 4/15/2020 | 1.12 |
Site 2 | Long | Method 3 | Analyte 5 | Air | 4/15/2020 | 1.12 |
Site 3 | Long | Method 3 | Analyte 5 | Air | 4/15/2020 | 1.17 |
Site 4 | Long | Method 3 | Analyte 5 | Air | 4/15/2020 | 0.753 |
Site 1 | Long | Method 3 | Analyte 6 | Air | 4/15/2020 | 0.169 |
Site 2 | Long | Method 3 | Analyte 6 | Air | 4/15/2020 | 0.13 |
Site 3 | Long | Method 3 | Analyte 6 | Air | 4/15/2020 | 0.128 |
Site 4 | Long | Method 3 | Analyte 6 | Air | 4/15/2020 | 0.0661 |
Site 1 | Long | Method 3 | Analyte 7 | Air | 4/15/2020 | 1.04 |
Site 2 | Long | Method 3 | Analyte 7 | Air | 4/15/2020 | 1.15 |
Site 3 | Long | Method 3 | Analyte 7 | Air | 4/15/2020 | 1.2 |
Site 4 | Long | Method 3 | Analyte 7 | Air | 4/15/2020 | 1.2 |
The program:
library(readxl)
library(ggplot2)
library(lubridate)
fileIn <- file.choose(new = FALSE)
# Pull data from excel data and make dataframe (df) named, in this case, Elem
Elem <- read_xls(paste0(fileIn), col_names = TRUE, na = c("void", "VOID"), col_types = NULL)
# tResult is a character. This makes it a number
Elem$tResult <- as.numeric(Elem$tResult)
# Sampled is a character. This makes it a date in local time
Elem$Sampled <- parse_date_time(Elem$Sampled, '%m/%d/%Y', tz = 'Etc/GMT-8')
# Sort the data
Elem[with(Elem, order("Project", "Sampled", "SpecificMethod", "Analyte")), ]
# Remove rows with specific criteria, ex. Blanks
Elem <- Elem[!(Elem$ClientMatrix=="PM 10 - HV" | Elem$sampletype=="Blank - Equipment"),]
Elem <- Elem[!(Elem$sampletype=="Pre-deployment Check" | Elem$Project=="Special Projects" | Elem$Project=="Sample Media Lot Blanks"),]
Elem <- Elem[!(Elem$Project=="Predeployment Equipment Check" | Elem$Project=="Project"),]
Elem <- Elem[rowSums(is.na(Elem)) != ncol(Elem),]
# Make new value named "Location" from the column named "Project" and choose one or multiple sites (with CTRL or Shift)
# Assign integer values to the locations (sites) you chose
Location <- unique(Elem$Project)
x <- select.list(Location, preselect = NULL, multiple = T, title = "Locations? Hold SHIFT or CTRL for multiple", graphics = TRUE)
i <- with(Elem, match(x, unique(Location)))
# Remove all rows of data that are NOT at a site you chose
Elem <- Elem[!(Elem$Project != Location[i]),]
Elem <- Elem[rowSums(is.na(Elem)) != ncol(Elem),]
# Make new value named "Method" from the column named "SpecificMethod" and choose multiple Methods (with CTRL or Shift)
Method <- unique(Elem$SpecificMethod)
y <- select.list(Method, preselect = NULL, multiple = T, title = "Method? Hold SHIFT or CTRL for multiple", graphics = TRUE)
j <- with(Elem, match(y, unique(SpecificMethod)))
# Remove all data that are NOT at an analysis method you chose
Elem <- Elem[!(Elem$SpecificMethod != Method[j]),]
Elem <- Elem[rowSums(is.na(Elem)) != ncol(Elem),]
# Make new value named "Pollutant" from the column named "Analyte" and choose multiple pollutants (with CTRL or Shift)
Pollutant <- unique(Elem$Analyte)
z <- select.list(Pollutant, preselect = NULL, multiple = T, title = "Pollutant? Hold SHIFT or CTRL for multiple", graphics = TRUE)
k <- with(Elem, match(z, unique(Analyte)))
# Remove all data that are NOT a pollutant you chose
Elem <- Elem[!(Elem$Analyte != Pollutant[k]),]
Elem <- Elem[rowSums(is.na(Elem)) != ncol(Elem),]
# Display df
View(Elem)
When I do this in excel, I filter the data and then copy/paste data into a new sheet. I want to instead automate this with R in a simple series of steps. I am not an expert by any means in R, and I have piecemealed this program from multiple sources so it is a little uneven. I started with using the menu() function in selecting my criteria (like my specific location), but it only allows for one selection.
The problems:
-My raw data comes out as an excel file and I have to do a lot of post-processing in Excel to make the data shareable.
-In R, I want to be able to select different criteria each time based on the Project, Method and Analyte.
-I want to be able to select multiple criteria when filtering the data (ex. Choose Site 1 and Site 2, but exclude the other sites while keeping all of the data, including the ND data).
The errors:
-When I run the program and select the first subset of data (based on Location), I lose a lot of my data associated with that selection. If I choose two sites (out of 4) for example, it deletes a large part of the data from those sites along with all the other sites I didn't choose (the latter being what I am shooting for, the former being a huge issue). I have a feeling it has to do with my "i" values, but am not totally sure.
-I get the dreaded "In Elem$Project != x : longer object length is not a multiple of shorter object length" error when I am trying to delete rows not associated with my Location choices.