I can message over the code
We're all tutors here. See FAQ: What's a reproducible example (`reprex`) and how do I do one? and post the code and a list of where you're stuck, confused or just don't understand the results. After a while you'll find what many of us do, which is that it's a lot easier to spot errors in someone else's code than your own. Besides, other people probably have similar questions, so yours can help build the knowledge base here.
## 1. Read in the necessary data.
# setwd("~")
ReceivedOp <- list()
for (i in 1:14) { ## csv files - T/F for every group - based on whether or not they received an operation
ReceivedOp[[i]] <- read.csv(paste("./receivedOperationByYear/receivedOperation", (2002:2015)[i], ".csv", sep = ""))
}
rm(i)
filePathsDCODEs <- paste("./RDS AY ", 2002:2015, "/RDS_DCODE.csv", sep = "") ## The names of each file path
D_CODES <- list()
for (i in 1:14) { ## Diagnostic codes (IP or EP)
D_CODES[[i]] <- read.csv(filePathsDCODEs[i])
}
rm(i)
filePathsDEMOs <- paste("./RDS AY ", 2002:2015, "/RDS_DEMO.csv", sep = "") ## The names of each file path
DEMO <- list()
for (i in 1:14) {
DEMO[[i]] <- read.csv(filePathsDEMOs[i])
} ## Demographic data (i.e. ages)
rm(i)
DISCHARGE <- list()
for (i in 1:9) { ## Mortality information by INC_KEY
DISCHARGE[[i]] <- read.csv(paste("./RDS AY ", (2007:2015)[i], "/DISCHARGE", (2007:2015)[i], ".csv", sep = ""))
}
rm(i)
for (i in 6:14) { ## Convert ages recorded in days or months to years (list elements 6:14 are 2007-2015)
DEMO[[i]]$AGE[DEMO[[i]]$AGEU == "Days"] <- DEMO[[i]]$AGE[DEMO[[i]]$AGEU == "Days"] / 365
DEMO[[i]]$AGE[DEMO[[i]]$AGEU == "Months"] <- DEMO[[i]]$AGE[DEMO[[i]]$AGEU == "Months"] / 12
DEMO[[i]]$AGEU[DEMO[[i]]$AGEU == "Days" | DEMO[[i]]$AGEU == "Months"] <- "Years"
}
## 2. Collect data of interest
RuptureTypeList <- list()
InteractionTerms <- list()
DemoList <- list()
AgexRuptureTypexOperationTypeInteractions <- list()
AgexOperationTypeInteractions <- list()
AgexRuptureTypexMortalityInteractions <- list()
OrganizedDischarge <- list()
AgexRupturexOpxMort <- list()
pelvicFractures <- list()
for (i in 1:14) { ## Pre-allocate memory
RuptureType <- rep(NA, nrow(ReceivedOp[[i]]))
RuptureTypeList[[i]] <- RuptureType
InteractionTerms[[i]] <- RuptureType
DemoList[[i]] <- RuptureType
AgexRuptureTypexOperationTypeInteractions[[i]] <- RuptureType
AgexOperationTypeInteractions[[i]] <- RuptureType
AgexRuptureTypexMortalityInteractions[[i]] <- RuptureType
OrganizedDischarge[[i]] <- RuptureType
AgexRupturexOpxMort[[i]] <- RuptureType
pelvicFractures[[i]] <- matrix(rep(RuptureType, 16), ncol = 16)
}
rm(RuptureType, i)
for (i in 1:14) { ## Determine whether each patient has an EP or IP rupture. Converting to numeric before checking removes the difference between 867 and 867.0.
for (j in 1:nrow(ReceivedOp[[i]])) { ## Going through by INC_KEY in the ReceivedOp
if (867.1 %in% suppressWarnings(as.numeric(as.character(D_CODES[[i]]$DCODE[D_CODES[[i]]$INC_KEY == ReceivedOp[[i]]$INC_KEY[j]])))) {
RuptureTypeList[[i]][j] <- 867.1
}
else if (867 %in% suppressWarnings(as.numeric(as.character(D_CODES[[i]]$DCODE[D_CODES[[i]]$INC_KEY == ReceivedOp[[i]]$INC_KEY[j]])))) {
RuptureTypeList[[i]][j] <- 867
}
}
}
rm(i, j)
for (i in 1:14) { ## Whether they had a bladder operation and whether their rupture was EP or IP.
InteractionTerms[[i]] <- interaction(ReceivedOp[[i]]$BLADDEROP, RuptureTypeList[[i]], drop = T) ## The interaction function is very useful.
}
rm(i)
for (i in 1:14) { ## Transfer each patient's age
for (j in 1:nrow(ReceivedOp[[i]])) {
if (!is.na(DEMO[[i]]$AGE[DEMO[[i]]$INC_KEY == ReceivedOp[[i]]$INC_KEY[j]][1] > 0) & DEMO[[i]]$AGE[DEMO[[i]]$INC_KEY == ReceivedOp[[i]]$INC_KEY[j]][1] > (1/12)) {
## Condition as such after examining the data outside the admissable range and finding all values were either 0 or negative (the latter representing missing data)
DemoList[[i]][j] <- DEMO[[i]]$AGE[DEMO[[i]]$INC_KEY == ReceivedOp[[i]]$INC_KEY[j]][1]
}
}
}
for (i in 1:14) { ## Convert ages into categories
DemoList[[i]] <- ifelse(17 >= DemoList[[i]], "Child", "Adult")
}
for (i in 1:14) { ## Contains information about age, EP vs IP, and whether they received an operation.
AgexRuptureTypexOperationTypeInteractions[[i]] <- interaction(ReceivedOp[[i]]$BLADDEROP, RuptureTypeList[[i]], DemoList[[i]], drop = T)
}
for (i in 1:14) { ## Whether they received an operation and their age
AgexOperationTypeInteractions[[i]] <- interaction(ReceivedOp[[i]]$BLADDEROP, DemoList[[i]], drop = T)
}
xFromNA <- function(item, x) {
if (is.na(item)) {
item <- x
}
item
}
for (i in 1:9) { ## Records mortality by INC_KEY for the years that have mortality data; the outside loop goes by year (DISCHARGE[[i]] pairs with ReceivedOp[[i+5]]).
for (j in 1:nrow(ReceivedOp[[i+5]])) { ## The inside loop goes by INC_KEY from the ReceivedOp list; it must count the rows of year i+5, the same element that is indexed below
if (sum(DISCHARGE[[i]]$DECEASED[DISCHARGE[[i]]$INC_KEY == ReceivedOp[[i+5]]$INC_KEY[j]], na.rm = T) > 0) {
OrganizedDischarge[[i+5]][j] <- TRUE
}
else if ((xFromNA(sum(DISCHARGE[[i]]$DECEASED[DISCHARGE[[i]]$INC_KEY == ReceivedOp[[i+5]]$INC_KEY[j]]), -1) == 0)) {
OrganizedDischarge[[i+5]][j] <- FALSE
}
}
}
for (i in 1:9) { ## What age group, whether they had an EP or IP rupture, and whether they died.
AgexRuptureTypexMortalityInteractions[[i]] <- interaction(DemoList[[i+5]], RuptureTypeList[[i+5]], OrganizedDischarge[[i+5]])
}
for (i in 1:9) {
AgexRupturexOpxMort[[i]] <- interaction(OrganizedDischarge[[i+5]], AgexRuptureTypexOperationTypeInteractions[[i+5]])
}
pelvicDCODES <- c(808.0, 808.1, 808.2, 808.3, 808.4, 808.41, 808.42, 808.43, 808.49, 808.5, 808.51, 808.52, 808.59, 808.8, 808.9)
for (i in 1:14) { ## Flag which pelvic fracture codes each patient has. Converting to numeric before checking removes the difference between e.g. 808 and 808.0.
for (j in 1:nrow(ReceivedOp[[i]])) { ## Going through by INC_KEY in the ReceivedOp
for (k in seq_along(pelvicDCODES)) { ## There are 15 codes, so only columns 1:15 of pelvicFractures are filled (column 16 is never used later)
if (pelvicDCODES[k] %in% suppressWarnings(as.numeric(as.character(D_CODES[[i]]$DCODE[D_CODES[[i]]$INC_KEY == ReceivedOp[[i]]$INC_KEY[j]])))) { ## Try to see what happens if you mess around with this condition in a separate file so you understand why it is the way it is.
pelvicFractures[[i]][j, k] <- TRUE
}
else {
pelvicFractures[[i]][j, k] <- FALSE
}
}
}
}
pelvisInteractions <- rep(list(rep(list(), 14)), 15)
pelvicDCODES <- as.character(pelvicDCODES)
for (j in 1:14) { ## Pre-allocate memory
RuptureType <- rep(NA, nrow(ReceivedOp[[j]]))
for (i in 1:15) {
pelvisInteractions[[i]][[j]] <- RuptureType
}
}
names(pelvisInteractions) <- pelvicDCODES
for (i in 1:15) {
for (j in 1:9) {
pelvisInteractions[[i]][[j+5]] <- interaction(AgexRuptureTypexOperationTypeInteractions[[j+5]], as.logical(pelvicFractures[[j+5]][ , i]))
}
}
pelvisInteractions <- lapply(pelvisInteractions, function(x) 'names<-'(x, 2002:2015))
pelvisInteractions <- rapply(pelvisInteractions, table, how = "replace")
## 3. Collect counts
EPvIP <- t(data.frame(lapply(RuptureTypeList, table)))[-(1+(1:13)*2), ] ## Calculate frequencies by year, same for all the rest
colnames(EPvIP) <- EPvIP[1, ]
EPvIP <- EPvIP[-1, ]
rownames(EPvIP) <- 2002:2015 ## Put into nice format for writing to file, same for all the rest
ReceivedOperationAndRuptureType <- t(data.frame(lapply(InteractionTerms, table)))[-(1+(1:13)*2), ]
colnames(ReceivedOperationAndRuptureType) <- ReceivedOperationAndRuptureType[1, ]
ReceivedOperationAndRuptureType <- ReceivedOperationAndRuptureType[-1, ]
rownames(ReceivedOperationAndRuptureType) <- 2002:2015
ReceivedOperationAndRuptureTypeAndAge <- t(data.frame(lapply(AgexRuptureTypexOperationTypeInteractions, table)))[-(1+(1:13)*2), ]
colnames(ReceivedOperationAndRuptureTypeAndAge) <- ReceivedOperationAndRuptureTypeAndAge[1, ]
ReceivedOperationAndRuptureTypeAndAge <- ReceivedOperationAndRuptureTypeAndAge[-1, ]
rownames(ReceivedOperationAndRuptureTypeAndAge) <- 2002:2015
ReceivedOperationAndAge <- t(data.frame(lapply(AgexOperationTypeInteractions, table)))[-(1+1:13*2), ]
colnames(ReceivedOperationAndAge) <- ReceivedOperationAndAge[1, ]
ReceivedOperationAndAge <- ReceivedOperationAndAge[-1, ]
rownames(ReceivedOperationAndAge) <- 2002:2015
AgeAndRuptureTypeAndMortality <- t(data.frame(lapply(AgexRuptureTypexMortalityInteractions[1:9], table)))[-(1+1:9*2), ]
colnames(AgeAndRuptureTypeAndMortality) <- AgeAndRuptureTypeAndMortality[1, ]
AgeAndRuptureTypeAndMortality <- AgeAndRuptureTypeAndMortality[-1, ]
rownames(AgeAndRuptureTypeAndMortality) <- 2007:2015
Mortality <- t(data.frame(lapply(OrganizedDischarge[6:14], table)))[-(1+1:9*2), ]
colnames(Mortality) <- Mortality[1, ]
Mortality <- Mortality[-1, ]
rownames(Mortality) <- 2007:2015
ReceivedAgexRupturexOpxMort <- t(data.frame(lapply(AgexRupturexOpxMort[1:9], table)))[-(1+1:9*2), ]
colnames(ReceivedAgexRupturexOpxMort) <- ReceivedAgexRupturexOpxMort[1, ]
ReceivedAgexRupturexOpxMort <- ReceivedAgexRupturexOpxMort[-1, ]
rownames(ReceivedAgexRupturexOpxMort) <- 2007:2015
pelvisInteractions <- lapply(pelvisInteractions, function(x) x[-(1:5)])
for (i in 1:length(pelvisInteractions)) {
for (j in 1:length(pelvisInteractions[[i]])) {
pelvisResults <- t(data.frame(pelvisInteractions[[i]]))[-(1+(1:13)*2), ]
colnames(pelvisResults) <- pelvisResults[1, ]
pelvisResults <- pelvisResults[-1, ]
rownames(pelvisResults) <- 2007:2015
}
write.csv(pelvisResults, paste0("pelvisdcode", names(pelvisInteractions)[i], "xEverything.csv"))
}
## 4. Write to files
write.csv(EPvIP, "resultsEPvIP.csv")
write.csv(ReceivedOperationAndRuptureType, "resultsOperationAndRupture.csv")
write.csv(ReceivedOperationAndRuptureTypeAndAge, "resultsOperationAndRuptureAndAge.csv")
write.csv(ReceivedOperationAndAge, "resultsOperationAndAge.csv")
write.csv(Mortality, "resultsMortality.csv")
write.csv(AgeAndRuptureTypeAndMortality, "resultsAgeRuptureTypeMortality.csv")
write.csv(ReceivedAgexRupturexOpxMort, "resultsOperationAndRuptureAndAgeAndMortality.csv")
Hi @kylemcat!
As @technocrat mentioned, people are going to need more explanation of what you’re doing and what your problem is in order to be able to help. Is this code possibly part of a class you’re taking? If so, please start by reading this: FAQ: Homework Policy
Otherwise, the best way for you to get help quickly is to:
- Identify what your specific problem is (are you getting an error message from a particular part of your code? Are you not sure how to make a specific part of it work the way you want?)
- Explain that problem
- Include a small, self-contained code example that shows your problem
If you can’t figure out step 3, at least start with steps 1 and 2. People here may be able to help you at least somewhat from there. I’ll tell you up front that it’s going to be tough for people to give detailed help if they don’t have an example to work from or can’t run your code themselves. Your full code depends on data files only you have, so it’s not possible for anyone else to run it right now. Normally, we might ask you to share a sample of the data, but if you are working with patient data that is not possible. No sensitive data should be shared here. One option is to create some synthetic data to use as an example in your question.
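A rough sketch of what fabricated stand-in data could look like, in base R. The column names (`INC_KEY`, `BLADDEROP`, `DCODE`, `AGE`, `AGEU`) are taken from the posted script; the values, the row counts and the `make_fake_year()` helper are invented for illustration and carry no real patient information.

```r
# Hypothetical helper: builds one year's worth of fake tables.
# All values are random; only the column names mirror the posted script.
make_fake_year <- function(n = 20, seed = 1) {
  set.seed(seed)
  keys <- seq_len(n) + 1000L                      # made-up incident keys
  received_op <- data.frame(
    INC_KEY   = keys,
    BLADDEROP = sample(c(TRUE, FALSE), n, replace = TRUE)
  )
  # Two diagnosis rows per patient: one bladder code, one other code
  d_codes <- data.frame(
    INC_KEY = rep(keys, each = 2),
    DCODE   = as.vector(rbind(
      sample(c("867.0", "867.1"), n, replace = TRUE),
      sample(c("808.0", "808.2", "E812.0"), n, replace = TRUE)
    )),
    stringsAsFactors = FALSE
  )
  demo <- data.frame(
    INC_KEY = keys,
    AGE     = sample(1:90, n, replace = TRUE),
    AGEU    = "Years",
    stringsAsFactors = FALSE
  )
  list(ReceivedOp = received_op, D_CODES = d_codes, DEMO = demo)
}

fake <- make_fake_year()
str(fake, max.level = 1)
```

A small object like `fake` can be pasted into a question without exposing anything sensitive, and it gives other people something they can actually run.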
Without a bit of representative data (even if fabricated), I can't check for any errors easily, but I can offer some general comments.
- You obviously have prior programming experience in an imperative/procedural language such as `C++`.
- You're applying those concepts in `R`, which is overwhelmingly a functional language, with only light reliance on control structures. That's not *wrong*, it's simply doing it the hard way.
- Illustration: Let's start at the bottom. Did your saved objects result in csv files with the form you were looking for? Perhaps there was a missing column name for rows? Any encoding problems? If there were, it's because there are optional arguments to adjust that. Almost everything in `R` is a function with at least one and often multiple arguments, some of which are mandatory and others optional, in which case they have a default.
- You're off to a good start in creating a list of csv files to be read into the namespace. There's an implicit assumption that they have identical structures and don't need variations in optional arguments to `read.csv`.
- It's a great rule of thumb that almost anything you need to do in `R` has a package containing a set of functions to do it. In this forum are many worshippers of the church of the `tidyverse`. Part of that package of packages is `readr`. In place of the control loop, and assuming that the csv files are identically structured, the more idiomatic way to do this in `R` would be
library(readr)
library(dplyr)
comb_data = lapply(filePathsDEMOs, read_csv) %>% bind_rows()
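One thing a plain row-bind loses is which year each row came from. Here is a hedged sketch of keeping that information, written in base R so it runs without extra packages. The temporary files stand in for the real `RDS_DEMO.csv` files, the values are fabricated, and the `YEAR` column and `read_one()` helper are additions that are not part of the original script.

```r
# Write two tiny stand-in csv files to a temporary folder (fabricated values)
tmp <- tempfile("demo_")
dir.create(tmp)
years <- 2002:2003
paths <- file.path(tmp, paste0("RDS_DEMO_", years, ".csv"))
write.csv(data.frame(INC_KEY = 1:2, AGE = c(34, 7)), paths[1], row.names = FALSE)
write.csv(data.frame(INC_KEY = 3:5, AGE = c(61, 15, 48)), paths[2], row.names = FALSE)

# Hypothetical helper: read one file and tag its rows with the year
read_one <- function(path, year) {
  out <- read.csv(path, stringsAsFactors = FALSE)
  out$YEAR <- year          # YEAR is an added column, not in the source files
  out
}

# Map() walks over paths and years in parallel; rbind stacks the results
demo_all <- do.call(rbind, Map(read_one, paths, years))
rownames(demo_all) <- NULL  # drop the file-path row names that rbind creates

demo_all
```

With `dplyr`, `bind_rows()` has an `.id` argument that serves a similar purpose when the list being bound is named.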
- Now that you have a largish data object, `comb_data`, you have some date-related fields that you want to adjust. Here's a similar snippet with some toy data
> dates
days months
[1,] 1215 128
> dates <- as_tibble(dates)
> dates %>% mutate(days = days/365, months = months/12, years = days + months)
# A tibble: 1 x 3
days months years
<dbl> <dbl> <dbl>
1 3.33 10.7 14.0
# to save separately
dated <- dates %>% mutate(days = days/365, months = months/12, years = days + months)
# to write back
dates <- dates %>% mutate(days = days/365, months = months/12, years = days + months)
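For comparison, the posted script converts each age according to its unit column (`AGEU` of "Days" or "Months") rather than adding columns together. A vectorized sketch of that step in base R, on fabricated values; the `divisor` lookup vector is an illustrative name, not something from the script.

```r
# Fabricated ages in mixed units
demo <- data.frame(
  AGE  = c(730, 18, 42),
  AGEU = c("Days", "Months", "Years"),
  stringsAsFactors = FALSE
)

# Named lookup: how many of each unit make one year
divisor <- c(Days = 365, Months = 12, Years = 1)

# Index the lookup by each row's unit, then divide the whole column at once
demo$AGE  <- demo$AGE / unname(divisor[demo$AGEU])
demo$AGEU <- "Years"

demo
```

Any unit that is not in `divisor` would turn into `NA` here, which is a useful signal that the data contain a value the conversion did not anticipate.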
- In collecting your data of interest, assuming you have a number of uninteresting columns
selected_data <- comb_data %>% select(RuptureTypeList, InteractionTerms, DemoList, AgexRuptureTypexOperationTypeInteractions, AgexOperationTypeInteractions, AgexRuptureTypexMortalityInteractions, OrganizedDischarge, AgexRupturexOpxMort, pelvicFractures)
- `R` is lazy, and once vectorized functions replace the loops there's no need to pre-allocate memory.
- There are analogous idioms that minimize or eliminate entirely the need for looping and index slicing. This is all predicated on the notion of a `tidy` data structure with variables as columns and observations as rows. Not to worry, there's a transpose feature.
- You can filter rows/observations on values in one or more columns/variables with booleans.
- Finally, you have all of your observations of interest in a single data object, which keeps a clean workspace.
- Wickham & Grolemund's new O'Reilly title R For Data Science will give you a thorough grounding in applying these principles. To me the key is to think of `R` as algebra, not programming.
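To make the "no loops" point concrete, here is a sketch of how the per-patient rupture lookup in the posted script might be written with boolean filtering and `%in%` instead of nested `for` loops. The data are fabricated, the object names (`has_8671`, `has_867`) are made up for this example, and the precedence (867.1 wins over 867) is an assumption read off the `if` / `else if` order in the script.

```r
# Fabricated diagnosis codes, several rows per patient
d_codes <- data.frame(
  INC_KEY = c(1, 1, 2, 3, 3, 4),
  DCODE   = c("867.1", "808.2", "867.0", "867.0", "867.1", "E812.0"),
  stringsAsFactors = FALSE
)
received_op <- data.frame(
  INC_KEY   = c(1, 2, 3, 4, 5),
  BLADDEROP = c(TRUE, FALSE, TRUE, FALSE, FALSE)
)

# Convert once; non-numeric codes such as "E812.0" become NA
code_num <- suppressWarnings(as.numeric(d_codes$DCODE))

# Boolean filters pick out the patients carrying each code
has_8671 <- unique(d_codes$INC_KEY[!is.na(code_num) & code_num == 867.1])
has_867  <- unique(d_codes$INC_KEY[!is.na(code_num) & code_num == 867])

# 867.1 takes precedence over 867; patients with neither code stay NA
received_op$RuptureType <- ifelse(
  received_op$INC_KEY %in% has_8671, 867.1,
  ifelse(received_op$INC_KEY %in% has_867, 867, NA)
)

received_op
table(received_op$BLADDEROP, received_op$RuptureType,
      dnn = c("BLADDEROP", "RuptureType"))
```

The whole column is filled in one pass, so there is nothing to pre-allocate and no row index to keep track of.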
Let me (and everyone else) know if you have specific questions.
Hi technocrat,
Thank you for your advice. I am actually not a data scientist, merely a researcher. I had someone else do the coding for me and I only have a basic understanding of R. I have since been unable to contact the aforementioned programmer and I’m kinda in a bind now.
I am trying to prove that the code does what I want it to do but I have no way of doing so.
I can post the working directory, if that works. But I agree with what you said. Other people who have seen the code have called it “old fashioned” with the loops. But I’m too much of a novice to implement the changes that you have recommended.
What do you recommend that I do? Thank you again!!
Hi @kylemcat! As the others have mentioned, some context would be really helpful here. What does your code do? (Or, rather, what are you hoping it does?)
Let's try divide and conquer. If we can test a portion of your dataset whether it's live or fabricated data (live, if it protects anything private, is best), there will be something to work with that you can assess and then replicate.
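One way to check a single piece of the script is to run its rule on a handful of fabricated inputs whose answers can be worked out by hand. The sketch below does that for the age step only. `age_group()` is a hypothetical wrapper written for this example, and it assumes the rule as it reads in the posted code: ages that are missing or not above 1/12 of a year become `NA`, 17 and under is "Child", everything else is "Adult".

```r
# Hypothetical wrapper around the age rule in the posted script
age_group <- function(age) {
  keep <- !is.na(age) & age > 1/12   # same admissible range as the script
  age[!keep] <- NA
  ifelse(age <= 17, "Child", "Adult")
}

# Fabricated inputs with answers worked out by hand, including edge cases
input    <- c(0.5, 17, 18, 64, 0, -1, NA)
expected <- c("Child", "Child", "Adult", "Adult", NA, NA, NA)

result <- age_group(input)

# stopifnot() stays silent when the check passes and errors when it fails
stopifnot(identical(result, expected))
data.frame(input, expected, result)
```

Repeating this for each step (rupture type, mortality, pelvic codes) gives a small set of checks that show whether the code does what is intended, without touching the real data.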
It sounds like each of these is pretty hefty in its own right. What I'd like to try first is to see what one of these looks like in terms of layout (rows = observations, columns = variables or v.v.?) and how data is being imported. Are things that look like numbers actual numbers like 3.14 or strings like "3.14"? Are strings being imported as factors (probably, but that can be fixed)?
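A quick sketch of how to inspect what an import produced, using a fabricated three-row csv held in a string. It also shows why the posted script wraps codes in `as.numeric(as.character(...))`. Note that the default for `stringsAsFactors` changed to `FALSE` in R 4.0.0, so whether strings arrive as factors depends on the R version unless the argument is set explicitly.

```r
# Fabricated csv content; "E812.0" stands in for a non-numeric code
csv_text <- "INC_KEY,DCODE\n1,867.1\n2,867.0\n3,E812.0\n"

as_factor <- read.csv(text = csv_text, stringsAsFactors = TRUE)
as_string <- read.csv(text = csv_text, stringsAsFactors = FALSE)

str(as_factor)   # DCODE is a factor
str(as_string)   # DCODE is character

# Converting a factor straight to numeric returns the internal level numbers,
# not the codes. Going through as.character() first gives the real values,
# with NA for anything that is not a number.
wrong <- as.numeric(as_factor$DCODE)
right <- suppressWarnings(as.numeric(as.character(as_factor$DCODE)))

wrong
right
```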
You've already identified items of interest. Let's take one, say `pelvisdcode`. It seems like you want to classify into some sort of diagnostic code? Anyway, pick one, describe how you need it transformed and what properly formed output should look like. Some of it has sort of the flavor of cross tabulations.
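For reference, a cross tabulation of year against a combination of flags can be sketched in a few lines of base R. The data below are fabricated, and `PELVIC` is a made-up column name standing in for "has a given pelvic fracture code"; `BLADDEROP` and the use of `interaction()` follow the posted script.

```r
# Fabricated patient-level flags
patients <- data.frame(
  YEAR      = c(2007, 2007, 2007, 2008, 2008),
  BLADDEROP = c(TRUE, FALSE, TRUE, TRUE, FALSE),
  PELVIC    = c(TRUE, TRUE, FALSE, FALSE, FALSE)
)

# Combine the two flags into one factor, then count by year
combo  <- interaction(patients$BLADDEROP, patients$PELVIC)
counts <- table(YEAR = patients$YEAR, combo = combo)
counts

# One row per year, one column per combination, ready for write.csv()
wide <- as.data.frame.matrix(counts)
wide
```

Each column name reads `BLADDEROP.PELVIC`, so `TRUE.FALSE` counts patients who had the operation but not the fracture code.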
Finally, am I correct that these are destined for importing into `Excel`?
Since someone else did the coding (which isn't so much old-fashioned as just not the way we do things around here), you're at a double disadvantage, not being able to understand how your goals are being implemented either in that style or idiomatically.
And don't let the BS term `data scientist` rattle you. It's applied statistics on one end and computer engineering on the other (making it run fast and reliably with really lots and lots of data). Many of us never get involved in the engineering aspect. If you're dealing with health data this complex you're a data scientist researcher. You just need some time to get up to speed is all.