Hello, I am a GIS project manager in a large American city for a public housing agency. I constantly deal with 1000s of construction project data records related to rooftop ventilation assets in R Studio and Microsoft Excel.
I am out of my depth in my goal to better consolidate my most important set of information. My data repository consists of 262 records of unique housing developments. These developments are made of 1,396 records of unique buildings. In addition to this data are 10, 757 unique records of building rooftop ventilation assets.
What I am attempting to do, through nesting and map() functions is to create a pair of nested rooftop ventilation datasets that is readable in Excel format.
My goal is for one of these datasets to be at the development level with each unique building record concatenated in a nested list within one field; matching with those building record fields are unique building roof fans also in a concatenated list.
The other dataset will be at the building level with unique building code information matched to fields of unique building rooftop fans nested in a concatenated list.
As mentioned before, these all need to be readable when transforming this data to a Microsoft Excel format.
I have been successful in this goal before about a year ago for a similar operation. I have the code but it is so poorly commented that I am unsuccessful in re-tracing my steps and repeating my accomplishment.
Here is the reprex
#Read in Mar 04 2022 roof fan and ventilation systems upgrade and modernization project business analysis data-set (building data)
library(dplyr)
Attaching package: ‘dplyr’
The following objects are masked from ‘package:stats’:
filter, lag
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
library(tidyverse)
-- Attaching packages ------------------------------------------------------------------------ tidyverse 1.3.1 --
v ggplot2 3.3.5 v purrr 0.3.4
v tibble 3.1.2 v stringr 1.4.0
v tidyr 1.1.3 v forcats 0.5.1
v readr 2.1.2
-- Conflicts --------------------------------------------------------------------------- tidyverse_conflicts() --
x dplyr::filter() masks stats::filter()
x dplyr::lag() masks stats::lag()
Warning message:
package ‘readr’ was built under R version 4.1.2
library(readxl)
library(writexl)
library(purrr)#Read in updated version of business analysis individual roof fan asset data
IndivVentAsset_04mar2022 <- read_excel("SpreadsheetData/Copy of Roof Fan Ventilation Tracking_04mar2022.xlsx",
- sheet = "Individual Fan Data", col_types = c("text", "numeric", "text", "text", "text", "text", "text", "numeric", "numeric",
"numeric", "text", "text", "text",
"text", "numeric", "numeric", "numeric",
"text", "numeric", "numeric", "numeric", "numeric", "numeric", "text", "numeric", "text", "numeric", "numeric", "text", "text", "text", "numeric", "text", "text", "text", "text", "text", "numeric",
"text", "text", "date", "text", "text",
"date", "text", "text", "date", "text",
"text", "text", "text", "numeric",
"numeric", "numeric", "numeric",
"numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric",
"text", "text", "text", "text", "text", "text", "text"))
There were 50 or more warnings (use warnings() to see the first 50)
colnames(IndivVentAsset_04mar2022)
[1] "Roof Fan Asset ID" "Dummy" "Borough Grouping"
[4] "Managed By" "Development" "Managed By (Proper)"
[7] "Development (Proper)" "Capital Flag" "RAD Flag"
[10] "Private Flag" "Location Bypassed" "Reason for Bypassing"
[13] "Reconciled" "Building ID" "Building Text Check"
[16] "Building Number" "Building Fan Count" "Building Address"
[19] "Development Fan Count" "Unique Consolidation Count" "Unique Development Count"
[22] "Unique Development Building Count" "TDS_num" "TDS_text"
[25] "TDS Text Check" "Asset Verified" "Time Clock Present"
[28] "Clock Bypassed" "Engineering Labelling" "Engineering Rank"
[31] "Replacement Roof Fan Model" "Attachment/Extension Required?" "Fan to Be Retained"
[34] "Retention Type" "Roll-Out Order" "Phase"
[37] "Phase Detail" "Development Status Rank" "Fan Status"
[40] "Fan Status_Full" "Fan Install Date" "Install Age"
[43] "Install Quarter" "Fan Installation Month" "Engineer Assigned"
[46] "Asbestos Filing Status" "Fan Delivery Date" "Contractor Name"
[49] "Secondary Contractor" "Asbestos Flag" "Asbestos Tested"
[52] "Asbestos Tested #" "Asbestos Flag #" "In Procurement"
[55] "Bypass #" "Not Yet Assigned" "Asbestos"
[58] "Initial Asbestos" "Non-Residential" "To Be Retained"
[61] "Preliminary Preparation" "Order Initiated" "Fans Ordered"
[64] "Fans Installed" "Outside Initiative" "Not on Manufacturer List"
[67] "Validation" "Assigned to Engineer" "Site Visit Completed"
[70] "Engineering Docs Received" "Contractor Assigned" "Fan Order Initiated"
[73] "Fan Order Confirmed" "Fans Installed2"#Group asset data based on NYCHA development
DvlmtDataGrouping<- IndivVentAsset_04mar2022%>%
- group_by(Development,
Managed By
,Development (Proper)
,Borough Grouping
,Development Fan Count
, TDS_num, TDS_text, Phase,Phase Detail
)%>% - summarise(n())
summarise()
has grouped output by 'Development', 'Managed By', 'Development (Proper)', 'Borough Grouping', 'Development Fan Count', 'TDS_num', 'TDS_text', 'Phase'. You can override using the.groups
argument.
#Prepare dataset of roof fan assets into nested format, grouped by development
DvlmtNestedDataC <- IndivVentAsset_04mar2022%>%
- group_by(Development, TDS_text,
Managed By
,Borough Grouping
, Phase,Phase Detail
,Development Fan Count
)%>% - nest()
#Bind datasets of nested development data and summarized development data
DvlmtDataNested_Bldgs <- cbind(DvlmtDataGrouping, DvlmtNestedDataC)
New names:
- Development -> Development...1
Managed By
->Managed By...2
Borough Grouping
->Borough Grouping...4
Development Fan Count
->Development Fan Count...5
- TDS_text -> TDS_text...7
- ...
After some brief data-cleaning and massaging:
> #Print list of typical rooftop fan number codes for first development record in nested dataset
> print(DvlmtDataNestJoin[[11]][[1]]$`Roof Fan Asset ID`)
[1] "180.01.01" "180.01.03" "180.01.02" "180.01.04" "180.01.05" "180.01.06" "180.01.07" "180.01.12" "180.01.08"
[10] "180.01.16" "180.01.09" "180.01.11" "180.01.10"
> #Print list of typical building number codes for first development record in nested dataset
> #which are uniform to previous printout of rooftop fan codes
> print(DvlmtDataNestJoin[[11]][[1]]$`Building ID`)
[1] "180.01" "180.01" "180.01" "180.01" "180.01" "180.01" "180.01" "180.01" "180.01" "180.01" "180.01" "180.01"
[13] "180.01"
Thank you for your time and attention. I appreciate any and all hints to get this project moving in the right direction.