Any R tutors available here? I need someone to look at my code

kylemcat · December 9, 2018, 8:50pm

I can message over the code

technocrat · December 9, 2018, 10:12pm

We're all tutors here. See FAQ: What's a reproducible example (`reprex`) and how do I do one? and post the code and a list of where you're stuck, confused or just don't understand the results. After a while you'll find what many of us do, which is that it's a lot easier to spot errors in someone else's code than your own. Besides, other people probably have similar questions, so yours can help build the knowledge base here.

kylemcat · December 10, 2018, 3:42am

## 1. Read in the necessary data.

# setwd("~")

ReceivedOp <- list()

for (i in 1:14) { ## csv files - T/F for every group - based on whether or not they received an operation
  ReceivedOp[[i]] <- read.csv(paste("./receivedOperationByYear/receivedOperation", (2002:2015)[i], ".csv", sep = ""))
}

rm(i)

filePathsDCODEs <- paste("./RDS AY ", 2002:2015, "/RDS_DCODE.csv", sep = "") ## The names of each file path

D_CODES <- list()

for (i in 1:14) { ## Diagnostic codes (IP or EP)
  D_CODES[[i]] <- read.csv(filePathsDCODEs[i])
}

rm(i)

filePathsDEMOs <- paste("./RDS AY ", 2002:2015, "/RDS_DEMO.csv", sep = "") ## The names of each file path

DEMO <- list()

for (i in 1:14) {
  DEMO[[i]] <- read.csv(filePathsDEMOs[i])
} ## Demographic data (i.e. ages)

rm(i)

DISCHARGE <- list()

for (i in 1:9) { ## Mortality information by INC_KEY
  DISCHARGE[[i]] <- read.csv(paste("./RDS AY ", (2007:2015)[i], "/DISCHARGE", (2007:2015)[i], ".csv", sep = ""))
}

rm(i)

for (i in 6:14) { ## Convert ages recorded in days or months to years
  DEMO[[i]]$AGE[DEMO[[i]]$AGEU == "Days"] <- DEMO[[i]]$AGE[DEMO[[i]]$AGEU == "Days"] / 365
  DEMO[[i]]$AGE[DEMO[[i]]$AGEU == "Months"] <- DEMO[[i]]$AGE[DEMO[[i]]$AGEU == "Months"] / 12
  DEMO[[i]]$AGEU[DEMO[[i]]$AGEU == "Days" | DEMO[[i]]$AGEU == "Months"] <- "Years"
}

## 2. Collect data of interest

RuptureTypeList <- list()
InteractionTerms <- list()
DemoList <- list()
AgexRuptureTypexOperationTypeInteractions <- list()
AgexOperationTypeInteractions <- list()
AgexRuptureTypexMortalityInteractions <- list()
OrganizedDischarge <- list()
AgexRupturexOpxMort <- list()
pelvicFractures <- list()

for (i in 1:14) { ## Pre-allocate memory
  RuptureType <- rep(NA, nrow(ReceivedOp[[i]]))
  RuptureTypeList[[i]] <- RuptureType
  InteractionTerms[[i]] <- RuptureType
  DemoList[[i]] <- RuptureType
  AgexRuptureTypexOperationTypeInteractions[[i]] <- RuptureType
  AgexOperationTypeInteractions[[i]] <- RuptureType
  AgexRuptureTypexMortalityInteractions[[i]] <- RuptureType
  OrganizedDischarge[[i]] <- RuptureType
  AgexRupturexOpxMort[[i]] <- RuptureType
  pelvicFractures[[i]] <- matrix(rep(RuptureType, 16), ncol = 16)
}

rm(RuptureType, i)

for (i in 1:14) { ## The code here is going to determine whether there is an EP or IP rupture; by converting to numeric before checking, we remove the difference between 867 and 867.0.
  for (j in 1:nrow(ReceivedOp[[i]])) { ## Going through by INC_KEY in the ReceivedOp
    if (867.1 %in% suppressWarnings(as.numeric(as.character(D_CODES[[i]]$DCODE[D_CODES[[i]]$INC_KEY == ReceivedOp[[i]]$INC_KEY[j]])))) { 
      RuptureTypeList[[i]][j] <- 867.1
    }
    else if (867 %in% suppressWarnings(as.numeric(as.character(D_CODES[[i]]$DCODE[D_CODES[[i]]$INC_KEY == ReceivedOp[[i]]$INC_KEY[j]])))) {
      RuptureTypeList[[i]][j] <- 867
    }
  }
}

rm(i, j)

for (i in 1:14) { ## Whether they had a bladder operation and whether their rupture was EP or IP.
  InteractionTerms[[i]] <- interaction(ReceivedOp[[i]]$BLADDEROP, RuptureTypeList[[i]], drop = T) ## The interaction function is very useful.
}

rm(i)

for (i in 1:14) { ## Transfer each patient's age
  for (j in 1:nrow(ReceivedOp[[i]])) {
    if (!is.na(DEMO[[i]]$AGE[DEMO[[i]]$INC_KEY == ReceivedOp[[i]]$INC_KEY[j]][1] > 0) & DEMO[[i]]$AGE[DEMO[[i]]$INC_KEY == ReceivedOp[[i]]$INC_KEY[j]][1] > (1/12)) {
      ## Condition as such after examining the data outside the admissable range and finding all values were either 0 or negative (the latter representing missing data)
      DemoList[[i]][j] <- DEMO[[i]]$AGE[DEMO[[i]]$INC_KEY == ReceivedOp[[i]]$INC_KEY[j]][1]
    }
  }
}

for (i in 1:14) { ## Convert ages into categories
  DemoList[[i]] <- ifelse(17 >= DemoList[[i]], "Child", "Adult")
}

for (i in 1:14) { ## Contains information about age, EP vs IP, and whether they received an operation.
  AgexRuptureTypexOperationTypeInteractions[[i]] <- interaction(ReceivedOp[[i]]$BLADDEROP, RuptureTypeList[[i]], DemoList[[i]], drop = T)
}

for (i in 1:14) { ## Whether they received an operation and their age
  AgexOperationTypeInteractions[[i]] <- interaction(ReceivedOp[[i]]$BLADDEROP, DemoList[[i]], drop = T)
}

xFromNA <- function(item, x) {
  if (is.na(item)) {
    item <- x
  }
  item
}

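for (i in 1:9) {  ## Orders INC_KEYs for those who have mortality data according to whether they received an operation or not; the outside loop goes by year (2007:2015, stored at list positions 6:14).
  for (j in 1:nrow(ReceivedOp[[i+5]])) { ## The inside loop goes by INC_KEY from the ReceivedOp list; the row count must come from year i+5, the same year that is indexed below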
    if (sum(DISCHARGE[[i]]$DECEASED[DISCHARGE[[i]]$INC_KEY == ReceivedOp[[i+5]]$INC_KEY[j]], na.rm = T) > 0) {
      OrganizedDischarge[[i+5]][j] <- TRUE
    }
    else if ((xFromNA(sum(DISCHARGE[[i]]$DECEASED[DISCHARGE[[i]]$INC_KEY == ReceivedOp[[i+5]]$INC_KEY[j]]), -1) == 0)) {
      OrganizedDischarge[[i+5]][j] <- FALSE
    }
  }
}

for (i in 1:9) { ## What age group, whether they had an EP or IP rupture, and whether they died.
  AgexRuptureTypexMortalityInteractions[[i]] <- interaction(DemoList[[i+5]], RuptureTypeList[[i+5]], OrganizedDischarge[[i+5]])
}

for (i in 1:9) {
  AgexRupturexOpxMort[[i]] <- interaction(OrganizedDischarge[[i+5]], AgexRuptureTypexOperationTypeInteractions[[i+5]])
}

pelvicDCODES <- c(808.0, 808.1, 808.2, 808.3, 808.4, 808.41, 808.42, 808.43, 808.49, 808.5, 808.51, 808.52, 808.59, 808.8, 808.9)

for (i in 1:14) { ## Flag which pelvic fracture codes each patient has; converting to numeric before checking removes the difference between, e.g., 808 and 808.0.
  for (j in 1:nrow(ReceivedOp[[i]])) { ## Going through by INC_KEY in the ReceivedOp
    for (k in seq_along(pelvicDCODES)) { ## 15 codes; the 16th matrix column is never used later
      if (pelvicDCODES[k] %in% suppressWarnings(as.numeric(as.character(D_CODES[[i]]$DCODE[D_CODES[[i]]$INC_KEY == ReceivedOp[[i]]$INC_KEY[j]])))) { ## Try to see what happens if you mess around with this condition in a separate file so you understand why it is the way it is.
        pelvicFractures[[i]][j, k] <- TRUE
      }
      else {
        pelvicFractures[[i]][j, k] <- FALSE
      }
    }
  }
}

pelvisInteractions <- rep(list(rep(list(), 14)), 15)

pelvicDCODES <- as.character(pelvicDCODES)

for (j in 1:14) { ## Pre-allocate memory
  RuptureType <- rep(NA, nrow(ReceivedOp[[j]]))
  for (i in 1:15) {
    pelvisInteractions[[i]][[j]] <- RuptureType
  }
}

names(pelvisInteractions) <- pelvicDCODES

for (i in 1:15) {
  for (j in 1:9) {
    pelvisInteractions[[i]][[j+5]] <- interaction(AgexRuptureTypexOperationTypeInteractions[[j+5]], as.logical(pelvicFractures[[j+5]][ , i]))
  }
}

pelvisInteractions <- lapply(pelvisInteractions, function(x) 'names<-'(x, 2002:2015))

pelvisInteractions <- rapply(pelvisInteractions, table, how = "replace")

## 3. Collect counts

EPvIP <- t(data.frame(lapply(RuptureTypeList, table)))[-(1+(1:13)*2), ] ## Calculate frequencies by year, same for all the rest
colnames(EPvIP) <- EPvIP[1, ]
EPvIP <- EPvIP[-1, ]
rownames(EPvIP) <- 2002:2015 ## Put into nice format for writing to file, same for all the rest

ReceivedOperationAndRuptureType <- t(data.frame(lapply(InteractionTerms, table)))[-(1+(1:13)*2), ]
colnames(ReceivedOperationAndRuptureType) <- ReceivedOperationAndRuptureType[1, ]
ReceivedOperationAndRuptureType <- ReceivedOperationAndRuptureType[-1, ]
rownames(ReceivedOperationAndRuptureType) <- 2002:2015

ReceivedOperationAndRuptureTypeAndAge <- t(data.frame(lapply(AgexRuptureTypexOperationTypeInteractions, table)))[-(1+(1:13)*2), ]
colnames(ReceivedOperationAndRuptureTypeAndAge) <- ReceivedOperationAndRuptureTypeAndAge[1, ]
ReceivedOperationAndRuptureTypeAndAge <- ReceivedOperationAndRuptureTypeAndAge[-1, ]
rownames(ReceivedOperationAndRuptureTypeAndAge) <- 2002:2015

ReceivedOperationAndAge <- t(data.frame(lapply(AgexOperationTypeInteractions, table)))[-(1+1:13*2), ]
colnames(ReceivedOperationAndAge) <- ReceivedOperationAndAge[1, ]
ReceivedOperationAndAge <- ReceivedOperationAndAge[-1, ]
rownames(ReceivedOperationAndAge) <- 2002:2015

AgeAndRuptureTypeAndMortality <- t(data.frame(lapply(AgexRuptureTypexMortalityInteractions[1:9], table)))[-(1+1:9*2), ]
colnames(AgeAndRuptureTypeAndMortality) <- AgeAndRuptureTypeAndMortality[1, ]
AgeAndRuptureTypeAndMortality <- AgeAndRuptureTypeAndMortality[-1, ]
rownames(AgeAndRuptureTypeAndMortality) <- 2007:2015

Mortality <- t(data.frame(lapply(OrganizedDischarge[6:14], table)))[-(1+1:9*2), ]
colnames(Mortality) <- Mortality[1, ]
Mortality <- Mortality[-1, ]
rownames(Mortality) <- 2007:2015

ReceivedAgexRupturexOpxMort <- t(data.frame(lapply(AgexRupturexOpxMort[1:9], table)))[-(1+1:9*2), ]
colnames(ReceivedAgexRupturexOpxMort) <- ReceivedAgexRupturexOpxMort[1, ]
ReceivedAgexRupturexOpxMort <- ReceivedAgexRupturexOpxMort[-1, ]
rownames(ReceivedAgexRupturexOpxMort) <- 2007:2015

pelvisInteractions <- lapply(pelvisInteractions, function(x) x[-(1:5)])

for (i in 1:length(pelvisInteractions)) {
  for (j in 1:length(pelvisInteractions[[i]])) {
    pelvisResults <- t(data.frame(pelvisInteractions[[i]]))[-(1+(1:13)*2), ]
    colnames(pelvisResults) <- pelvisResults[1, ]
    pelvisResults <- pelvisResults[-1, ]
    rownames(pelvisResults) <- 2007:2015
  }
  write.csv(pelvisResults, paste0("pelvisdcode", names(pelvisInteractions)[i], "xEverything.csv"))
}

## 4. Write to files

write.csv(EPvIP, "resultsEPvIP.csv")
write.csv(ReceivedOperationAndRuptureType, "resultsOperationAndRupture.csv")
write.csv(ReceivedOperationAndRuptureTypeAndAge, "resultsOperationAndRuptureAndAge.csv")
write.csv(ReceivedOperationAndAge, "resultsOperationAndAge.csv")
write.csv(Mortality, "resultsMortality.csv")
write.csv(AgeAndRuptureTypeAndMortality, "resultsAgeRuptureTypeMortality.csv")
write.csv(ReceivedAgexRupturexOpxMort, "resultsOperationAndRuptureAndAgeAndMortality.csv")

jcblum · December 10, 2018, 4:05am

Hi @kylemcat!

As @technocrat mentioned, people are going to need more explanation of what you’re doing and what your problem is in order to be able to help. Is this code possibly part of a class you’re taking? If so, please start by reading this: FAQ: Homework Policy

Otherwise, the best way for you to get help quickly is to:

1. Identify what your specific problem is (are you getting an error message from a particular part of your code? Are you not sure how to make a specific part of it work the way you want?)
2. Explain that problem
3. Include a small, self-contained code example that shows your problem

If you can’t figure out step 3, at least start with steps 1 and 2. People here may be able to help you at least somewhat from there. I’ll tell you up front that it’s going to be tough for people to give detailed help if they don’t have an example to work from or can’t run your code themselves. Your full code depends on data files only you have, so it’s not possible for anyone else to run it right now. Normally, we might ask you to share a sample of the data, but if you are working with patient data that is not possible. No sensitive data should be shared here. One option is to create some synthetic data to use as an example in your question.
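As a rough sketch of what such synthetic data could look like: the column names `INC_KEY`, `BLADDEROP` and `DCODE` are taken from the posted code, but every value below is invented, and the object names `received_op` and `d_codes` are hypothetical stand-ins for one year of the real files.

```r
# Fabricated stand-ins for one year of data; no real patient records.
# Column names follow the posted code; all values are invented.
received_op <- data.frame(
  INC_KEY   = 1001:1006,
  BLADDEROP = c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE)
)

# Several diagnosis codes per patient, stored as text the way a CSV
# import might deliver them (note "867.0" and a non-numeric code).
d_codes <- data.frame(
  INC_KEY = c(1001, 1001, 1002, 1003, 1003, 1004, 1005, 1006),
  DCODE   = c("867.0", "808.2", "867.1", "867.1", "E812.0",
              "808.41", "867.0", "805.4"),
  stringsAsFactors = FALSE
)

str(received_op)
str(d_codes)
```

A handful of rows like this is usually enough for others to run one step of the code and compare the result with what you expect.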

technocrat · December 10, 2018, 5:19am

Without a bit of representative data (even if fabricated), I can't check for any errors easily, but I can offer some general comments.

You obviously have prior programming experience in an imperative/procedural language such as C++.
You're applying those concepts in R, which is overwhelmingly a functional language, with only light reliance on control structures. That's not wrong, it's simply doing it the hard way.
Illustration: Let's start at the bottom. Did your saved objects result in csv files with the form you were looking for? Perhaps there was a missing column name for rows? Any encoding problems? If there were, it's because there are optional arguments to adjust that. Almost everything in R is a function with at least one and often multiple arguments, some of which are mandatory and others optional, in which case they usually have a default.
You're off to a good start in creating a list of csv files to be read into the workspace. There's an implicit assumption that they have identical structures and don't need variations in the optional arguments to read.csv.
It's a great rule of thumb that almost anything you need to do in R has a package containing a set of functions to do it. In this forum are many worshippers of the church of the tidyverse. Part of that package of packages is readr. In place of the control loop, and assuming that the csv files are identically structured, the more idiomatic way to do this in R would be

library(readr)
library(dplyr)
comb_data = lapply(filePathsDEMOs, read_csv) %>% bind_rows()
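A variation worth considering, sketched here in base R so it runs without extra packages: stacking the files directly loses track of which year each row came from, so this version tags every row with its year first. The temporary file names and the `YEAR` column are assumptions made for the example, not part of the original data.

```r
# Write three tiny fabricated yearly files to a temporary folder so the
# example runs anywhere; the real files live under "./RDS AY <year>/".
tmp <- tempfile("demo_")
dir.create(tmp)
years <- 2002:2004
for (yr in years) {
  write.csv(
    data.frame(INC_KEY = yr * 10 + 1:2, AGE = c(34, 8)),
    file.path(tmp, paste0("RDS_DEMO_", yr, ".csv")),
    row.names = FALSE
  )
}

paths <- file.path(tmp, paste0("RDS_DEMO_", years, ".csv"))

# Read every file, tag each row with its year, then stack the pieces.
pieces <- Map(function(path, yr) {
  d <- read.csv(path, stringsAsFactors = FALSE)
  d$YEAR <- yr   # hypothetical column recording the source year
  d
}, paths, years)

comb_data <- do.call(rbind, pieces)
rownames(comb_data) <- NULL
comb_data
```

With dplyr, `bind_rows()` has an `.id` argument that serves a similar purpose when the list of data frames is named.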

Now that you have a largish data object, comb_data, you have some date-related fields that you want to adjust. Here's a similar snippet with some toy data:

> dates
     days months
[1,] 1215    128
> dates <- as_tibble(dates)
> dates %>% mutate(days = days/365, months = months/12, years = days + months)
# A tibble: 1 x 3
   days months years
  <dbl>  <dbl> <dbl>
1  3.33   10.7  14.0
# to save separately
dated <- dates %>% mutate(days = days/365, months = months/12, years = days + months)
# to write back
dates <- dates %>% mutate(days = days/365, months = months/12, years = days + months)
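Closer to the posted code, where an `AGE` column is accompanied by an `AGEU` unit column, the same conversion could be sketched in base R as follows. The data are fabricated, `per_year` and `AGE_YEARS` are hypothetical names, and treating non-positive ages as missing is an assumption based on the comment in the posted code.

```r
demo <- data.frame(
  INC_KEY = 1:5,
  AGE     = c(40, 730, 18, 6, -99),
  AGEU    = c("Years", "Days", "Months", "Months", "Years"),
  stringsAsFactors = FALSE
)

# Divisor for each unit; a unit not listed here would give NA
# instead of a silent guess.
per_year <- c(Years = 1, Months = 12, Days = 365)

# Look up the divisor for every row at once; no loop over rows needed.
demo$AGE_YEARS <- unname(demo$AGE / per_year[demo$AGEU])

# Assumption: zero or negative ages stand for missing data.
demo$AGE_YEARS[demo$AGE <= 0] <- NA
demo
```

Writing the result to a new column keeps the original values available for checking.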

In collecting your data of interest, assuming you have a number of uninteresting columns

selected_data <- comb_data %>% select(RuptureTypeList, InteractionTerms, DemoList, AgexRuptureTypexOperationTypeInteractions, AgexOperationTypeInteractions, AgexRuptureTypexMortalityInteractions, OrganizedDischarge, AgexRupturexOpxMort, pelvicFractures)

When you work with vectorized functions like these, there's no need to pre-allocate memory yourself.
There are analogous idioms that minimize or eliminate entirely the need for looping and index slicing. This is all predicated on the notion of a tidy data structure with variables as columns and observations as rows. Not to worry, there's a transpose feature.
You can filter rows/observations on values in one or more columns/variables with logical conditions.
Finally, you have all of your observations of interest in a single data object, which keeps a clean workspace.
Wickham & Grolemund's new O'Reilly title R for Data Science will give you a thorough grounding in applying these principles. To me the key is to think of R as algebra, not programming.
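As one illustration of a loop-free idiom, the step in the posted code that assigns 867.1 or 867 to each patient could be sketched like this. The data frames are fabricated, and `has_8671`, `has_867` and `RuptureType` are names made up for the example; it is meant to mirror the logic of the loop, not to replace it untested.

```r
received_op <- data.frame(INC_KEY = c(1, 2, 3, 4))
d_codes <- data.frame(
  INC_KEY = c(1, 1, 2, 3, 3, 4),
  DCODE   = c("867.0", "808.2", "867.1", "867", "867.1", "E812.0"),
  stringsAsFactors = FALSE
)

# Non-numeric codes such as "E812.0" become NA, as in the posted loop.
code <- suppressWarnings(as.numeric(d_codes$DCODE))

# One vectorized membership test per code replaces the row-by-row loop.
has_8671 <- received_op$INC_KEY %in% d_codes$INC_KEY[code %in% 867.1]
has_867  <- received_op$INC_KEY %in% d_codes$INC_KEY[code %in% 867]

# 867.1 takes priority over 867, matching the if / else if order.
received_op$RuptureType <- ifelse(has_8671, 867.1,
                                  ifelse(has_867, 867, NA))
received_op
```

Patient 3 carries both codes and ends up as 867.1, while patient 4 has neither and stays `NA`, which is what the nested `if` / `else if` in the loop would produce for the same input.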

Let me (and everyone else) know if you have specific questions.

kylemcat · December 10, 2018, 5:41am

Hi technocrat,

Thank you for your advice. I am actually not a data scientist and merely a researcher. I had someone else do the coding for me and I only have a basic understanding of R. I have since been unable to contact the aforementioned programmer and I’m kinda in a bind now.

I am trying to prove that the code does what I want it to do but I have no way of doing so.

I can post up the working directory, if that works. But I agree with what you had said. Other people that have seen the code have called it “old fashioned” with the loops. But I’m too much of a novice to implement the changes that you have recommended.

What do you recommend that I do? Thank you again!!

rensa · December 10, 2018, 5:49am

Hi @kylemcat! As the others have mentioned, some context would be really helpful here. What does your code do? (Or, rather, what are you hoping it does?)

technocrat · December 10, 2018, 6:33am

Let's try divide and conquer. If we can test a portion of your dataset, whether it's live or fabricated data (live is best, as long as it exposes nothing private), there will be something to work with that you can assess and then replicate.
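One way to make "divide and conquer" concrete is to lift a single step out of the script, wrap it in a function, and run it on a few cases whose answers were worked out by hand. The sketch below does that for the age-category step; `age_group` and the `cases` table are hypothetical names, and the expected values simply record what the posted condition `17 >= age` implies.

```r
# The age-category step from the posted code, wrapped so it can be
# tested on its own.
age_group <- function(age_years) {
  ifelse(17 >= age_years, "Child", "Adult")
}

# Hand-picked cases, including the boundary and a missing age.
cases <- data.frame(
  age      = c(0.5, 17, 17.5, 18, 64, NA),
  expected = c("Child", "Child", "Adult", "Adult", "Adult", NA),
  stringsAsFactors = FALSE
)

cases$actual <- age_group(cases$age)
cases$ok <- mapply(identical, cases$actual, cases$expected)
cases
```

A table like this also surfaces decisions that deserve a second look: as written, an age of 17.5 years is classed as "Adult", which may or may not be what the study intends.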

It sounds like each of these is pretty hefty in its own right. What I'd like to try first is to see what one of these looks like in terms of layout (rows = observations, columns = variables or v.v.?) and how the data is being imported. Are things that look like numbers actual numbers like 3.14 or strings like "3.14"? Are strings being imported as factors? (Probably, but that can be fixed.)
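A quick way to answer those questions for one file might look like the sketch below. The file is a tiny fabricated stand-in for one `RDS_DCODE.csv`, written to a temporary path, so the values shown are not from the real data.

```r
# A tiny fabricated file standing in for one RDS_DCODE.csv.
path <- tempfile(fileext = ".csv")
writeLines(c("INC_KEY,DCODE",
             "1001,867.0",
             "1001,E812.0",
             "1002,867.1"), path)

# stringsAsFactors = FALSE keeps text as character in any R version.
d <- read.csv(path, stringsAsFactors = FALSE)

str(d)            # layout: one row per diagnosis code, one column per variable
sapply(d, class)  # DCODE is character because "E812.0" is not a number

# Which codes would be lost if the column were converted to numeric?
as_num <- suppressWarnings(as.numeric(d$DCODE))
d$DCODE[is.na(as_num)]
```

In R versions before 4.0, `read.csv` converts strings to factors by default, which is presumably why the posted code wraps the codes in `as.numeric(as.character(...))`.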

You've already identified items of interest. Let's take one, say pelvisdcode. It seems like you want to classify into some sort of diagnostic code? Anyway, pick one, describe how you need it transformed and what properly formed output should look like. Some of it has sort of the flavor of cross tabulations.
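For reference, a cross tabulation of this kind can be sketched in base R as below. The `patients` data frame is fabricated, and `AgeGroup` and `op_by_rupture` are hypothetical names; only `BLADDEROP` and the 867 / 867.1 codes come from the posted code.

```r
patients <- data.frame(
  BLADDEROP   = c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE),
  RuptureType = c(867, 867.1, 867, 867, 867.1, 867),
  AgeGroup    = c("Adult", "Adult", "Child", "Adult", "Adult", "Child")
)

# Two-way table: operation status by rupture code.
op_by_rupture <- table(Operation = patients$BLADDEROP,
                       Rupture   = patients$RuptureType)
op_by_rupture

# The posted code gets similar counts via interaction(), which pastes
# the levels together (for example "TRUE.867") before counting.
table(interaction(patients$BLADDEROP, patients$RuptureType))

# A third variable splits the table into one layer per age group.
xtabs(~ BLADDEROP + RuptureType + AgeGroup, data = patients)
```

The two-way table keeps the variables on separate axes, which tends to be easier to read and to check by hand than the combined labels that `interaction()` produces.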

Finally, am I correct that these are destined for importing into Excel?

Since someone else did the coding (which isn't so much old-fashioned as just not the way we do things around here), you're at a double disadvantage, not being able to understand how your goals are being implemented either in that style or idiomatically.

And don't let the BS term data scientist rattle you. It's applied statistics on one end and computer engineering on the other (making it run fast and reliably with really lots and lots of data). Many of us never get involved in the engineering aspect. If you're dealing with health data this complex you're a data scientist researcher. You just need some time to get up to speed is all.

system · December 30, 2018, 3:05pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.