Code help for novice trying to customize output of xls based data frame

chemgoof · April 15, 2022, 7:38pm

This post is long!!! I am putting the excruciating details of the question at the end of this thread. I feel like I am so close and still miles away to getting this working. This is really hard for me to convey, but...

In short, I have the table and program below. I run it and it processes the data from the file I choose, deletes a bunch of data never used, then a pop-up comes up and I can choose multiple selections. In an example using the data below, I choose Site 1 and 3 and then I select all the choices in the next two pop-ups. I was hoping to see all of the data from Sites 1 and 3. But it erases, for example, Analyte 6 from Site 1 on the 4th of March. I am almost positive this is from my i values, but just can't see the problem and can't find anything similar in my help searches

Data Example (A little long, but wanted representative data):

Project	sampletype	SpecificMethod	Analyte	ClientMatrix	Sampled	tResult
Site 1	Long	Method 1	Analyte 1	Air	4/3/2020	ND
Site 2	Long	Method 1	Analyte 1	Air	4/3/2020	ND
Site 3	Long	Method 1	Analyte 1	Air	4/3/2020	ND
Site 4	Long	Method 1	Analyte 1	Air	4/3/2020	ND
Site 1	Long	Method 1	Analyte 2	Air	4/3/2020	ND
Site 2	Long	Method 1	Analyte 2	Air	4/3/2020	ND
Site 3	Long	Method 1	Analyte 2	Air	4/3/2020	ND
Site 4	Long	Method 1	Analyte 2	Air	4/3/2020	ND
Site 1	Long	Method 1	Analyte 3	Air	4/3/2020	16.6
Site 2	Long	Method 1	Analyte 3	Air	4/3/2020	76
Site 3	Long	Method 1	Analyte 3	Air	4/3/2020	27.8
Site 4	Long	Method 1	Analyte 3	Air	4/3/2020	21
Site 1	Long	Method 3	Analyte 4	Air	4/3/2020	0.133
Site 2	Long	Method 3	Analyte 4	Air	4/3/2020	0.236
Site 3	Long	Method 3	Analyte 4	Air	4/3/2020	0.179
Site 4	Long	Method 3	Analyte 4	Air	4/3/2020	0.192
Site 1	Long	Method 3	Analyte 5	Air	4/3/2020	0.404
Site 2	Long	Method 3	Analyte 5	Air	4/3/2020	0.619
Site 3	Long	Method 3	Analyte 5	Air	4/3/2020	0.57
Site 4	Long	Method 3	Analyte 5	Air	4/3/2020	0.565
Site 1	Long	Method 3	Analyte 6	Air	4/3/2020	ND
Site 2	Long	Method 3	Analyte 6	Air	4/3/2020	0.0529
Site 3	Long	Method 3	Analyte 6	Air	4/3/2020	0.0428
Site 4	Long	Method 3	Analyte 6	Air	4/3/2020	ND
Site 1	Long	Method 3	Analyte 7	Air	4/3/2020	0.664
Site 2	Long	Method 3	Analyte 7	Air	4/3/2020	0.784
Site 3	Long	Method 3	Analyte 7	Air	4/3/2020	1.32
Site 4	Long	Method 3	Analyte 7	Air	4/3/2020	1.17
Site 1	Long	Method 3	Analyte 4	Air	4/9/2020	0.511
Site 2	Long	Method 3	Analyte 4	Air	4/9/2020	1.43
Site 3	Long	Method 3	Analyte 4	Air	4/9/2020	0.381
Site 4	Long	Method 3	Analyte 4	Air	4/9/2020	0.314
Site 1	Long	Method 3	Analyte 5	Air	4/9/2020	1.93
Site 2	Long	Method 3	Analyte 5	Air	4/9/2020	2.38
Site 3	Long	Method 3	Analyte 5	Air	4/9/2020	1.35
Site 4	Long	Method 3	Analyte 5	Air	4/9/2020	1.06
Site 1	Long	Method 3	Analyte 6	Air	4/9/2020	0.229
Site 2	Long	Method 3	Analyte 6	Air	4/9/2020	0.241
Site 3	Long	Method 3	Analyte 6	Air	4/9/2020	0.119
Site 4	Long	Method 3	Analyte 6	Air	4/9/2020	0.0825
Site 1	Long	Method 3	Analyte 7	Air	4/9/2020	1.99
Site 2	Long	Method 3	Analyte 7	Air	4/9/2020	2.54
Site 3	Long	Method 3	Analyte 7	Air	4/9/2020	1.72
Site 4	Long	Method 3	Analyte 7	Air	4/9/2020	1.61
Site 1	Long	Method 3	Analyte 4	Air	4/15/2020	0.198
Site 2	Long	Method 3	Analyte 4	Air	4/15/2020	0.269
Site 3	Long	Method 3	Analyte 4	Air	4/15/2020	0.249
Site 4	Long	Method 3	Analyte 4	Air	4/15/2020	0.259
Site 1	Long	Method 3	Analyte 5	Air	4/15/2020	1.12
Site 2	Long	Method 3	Analyte 5	Air	4/15/2020	1.12
Site 3	Long	Method 3	Analyte 5	Air	4/15/2020	1.17
Site 4	Long	Method 3	Analyte 5	Air	4/15/2020	0.753
Site 1	Long	Method 3	Analyte 6	Air	4/15/2020	0.169
Site 2	Long	Method 3	Analyte 6	Air	4/15/2020	0.13
Site 3	Long	Method 3	Analyte 6	Air	4/15/2020	0.128
Site 4	Long	Method 3	Analyte 6	Air	4/15/2020	0.0661
Site 1	Long	Method 3	Analyte 7	Air	4/15/2020	1.04
Site 2	Long	Method 3	Analyte 7	Air	4/15/2020	1.15
Site 3	Long	Method 3	Analyte 7	Air	4/15/2020	1.2
Site 4	Long	Method 3	Analyte 7	Air	4/15/2020	1.2

The program:

library(readxl)
library(ggplot2)
library(lubridate)

fileIn <- file.choose(new = FALSE)

# Pull data from excel data and make dataframe (df) named, in this case,  Elem
Elem <- read_xls(paste0(fileIn), col_names = TRUE, na = c("void", "VOID"), col_types = NULL)

# tResult is a character. This makes it a number
Elem$tResult <- as.numeric(Elem$tResult)

# Sampled is a character. This makes it a date in local time
Elem$Sampled <- parse_date_time(Elem$Sampled, '%m/%d/%Y', tz = 'Etc/GMT-8')

# Sort the data 
Elem[with(Elem, order("Project", "Sampled", "SpecificMethod", "Analyte")), ]

# Remove rows with specific criteria, ex. Blanks
Elem <- Elem[!(Elem$ClientMatrix=="PM 10 - HV" | Elem$sampletype=="Blank - Equipment"),]
Elem <- Elem[!(Elem$sampletype=="Pre-deployment Check" | Elem$Project=="Special Projects" | Elem$Project=="Sample Media Lot Blanks"),]
Elem <- Elem[!(Elem$Project=="Predeployment Equipment Check" | Elem$Project=="Project"),]
Elem <- Elem[rowSums(is.na(Elem)) != ncol(Elem),]

# Make new value named "Location" from the column named "Project" and choose one or multiple sites (with CTRL or Shift)
# Assign integer values to the locations (sites) you chose
Location <- unique(Elem$Project)
x <- select.list(Location, preselect = NULL, multiple = T, title = "Locations? Hold SHIFT or CTRL for multiple",  graphics = TRUE)
i <- with(Elem, match(x, unique(Location)))

# Remove all rows of data that are NOT at a site you chose
Elem <- Elem[!(Elem$Project != Location[i]),]
Elem <- Elem[rowSums(is.na(Elem)) != ncol(Elem),]

# Make new value named "Method" from the column named "SpecificMethod" and choose multiple Methods (with CTRL or Shift)
Method <- unique(Elem$SpecificMethod)
y <- select.list(Method, preselect = NULL, multiple = T, title = "Method? Hold SHIFT or CTRL for multiple",  graphics = TRUE)
j <- with(Elem, match(y, unique(SpecificMethod)))

# Remove all data that are NOT at an analysis method you chose
Elem <- Elem[!(Elem$SpecificMethod != Method[j]),]
Elem <- Elem[rowSums(is.na(Elem)) != ncol(Elem),]

# Make new value named "Pollutant" from the column named "Analyte" and choose multiple pollutants (with CTRL or Shift)
Pollutant <- unique(Elem$Analyte)
z <- select.list(Pollutant, preselect = NULL, multiple = T, title = "Pollutant? Hold SHIFT or CTRL for multiple",  graphics = TRUE)
k <- with(Elem, match(z, unique(Analyte)))

# Remove all data that are NOT a pollutant you chose
Elem <- Elem[!(Elem$Analyte != Pollutant[k]),]
Elem <- Elem[rowSums(is.na(Elem)) != ncol(Elem),]

# Display df
View(Elem)

When I do this in excel, I filter the data and then copy/paste data into a new sheet. I want to instead automate this with R in a simple series of steps. I am not an expert by any means in R, and I have piecemealed this program from multiple sources so it is a little uneven. I started with using the menu() function in selecting my criteria (like my specific location), but it only allows for one selection.

The problems:
-My raw data comes out as an excel file and I have to do a lot of post-processing in Excel to make the data shareable.
-In R, I want to be able to select different criteria each time based on the Project, Method and Analyte.
-I want to be able to select multiple criteria when filtering the data (ex. Choose Site 1 and Site 2, but exclude the other sites while keeping all of the data, including the ND data).

The errors:
-When I run the program and select the first subset of data (based on Location), I lose a lot of my data associated with that selection. If I choose two sites (out of 4) for example, it deletes a large part of the data from those sites along with all the other sites I didn't choose (the latter being what I am shooting for, the former being a huge issue). I have a feeling it has to do with my "i" values, but am not totally sure.
-I get the dreaded "In Elem$Project != x : longer object length is not a multiple of shorter object length" error when I am trying to delete rows not associated with my Location choices.

FJCC · April 15, 2022, 9:06pm

I cannot test this at the moment but try replacing blocks of code like this

# Make new value named "Location" from the column named "Project" and choose one or multiple sites (with CTRL or Shift)
# Assign integer values to the locations (sites) you chose
Location <- unique(Elem$Project)
x <- select.list(Location, preselect = NULL, multiple = T, title = "Locations? Hold SHIFT or CTRL for multiple",  graphics = TRUE)
i <- with(Elem, match(x, unique(Location)))

# Remove all rows of data that are NOT at a site you chose
Elem <- Elem[!(Elem$Project != Location[i]),]
Elem <- Elem[rowSums(is.na(Elem)) != ncol(Elem),]

with

# Make new value named "Location" from the column named "Project" and choose one or multiple sites (with CTRL or Shift)
# Assign integer values to the locations (sites) you chose
Location <- unique(Elem$Project)
x <- select.list(Location, preselect = NULL, multiple = T, title = "Locations? Hold SHIFT or CTRL for multiple",  graphics = TRUE)

# Remove all rows of data that are NOT at a site you chose
Elem <- Elem[Elem$Project %in% x,]
Elem <- Elem[rowSums(is.na(Elem)) != ncol(Elem),]

chemgoof · April 15, 2022, 9:52pm

Looks like that worked. Thank you!

system · April 22, 2022, 9:52pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.