Create dataframe with different length columns

Attila1 · September 24, 2023, 3:26pm

I have csv files named 5-1-1_RRIntervals, 5-1-2_RRIntervals, 5-1-3_RRIntervals, ... 15-15-4_RRIntervals, 15-15-5_RRIntervals.
They look like this:

timestamp, rr, since_start

1680519258168,725,372433
1680519258175,662,373095
1680519259158,681,373776
1680519261138,698,374474
1680519261145,678,375152
1680519261154,651,375803
1680519262127,666,376469
1680519263078,699,377168
1680519263084,746,377914
1680519264108,714,378628
...
I have the following code:

big_table <- data.frame()

firsts <- c("5")
for (first in firsts) {
  for (third in 1:5) {
    SI_combined <- c()
    for (second in 1:15) {
      file_name <- paste(first, second, third, sep = '-')

      if (file.exists(paste(file_name, "RRIntervals.csv", sep = '_'))) {
        data <- read.csv(paste(file_name, "RRIntervals.csv", sep = '_'))
        middle_col <- adattar[2]
        rr_intervals_raw <- adatok[-1,]
        rr_intervals <- round(rr_intervals_elso, digits = -1)

        mxdmn <- 0
        mo <- 0

        bins <- seq(0, 1500, by = 50)
        SI_values <- c()
        summation <- c()

        for (i in 1:length(rr_intervals)) {
          summation <- c(summation, rr_intervals[i])

          if (sum(summation) >= 30000) {
            mxdmn <- max(summation) - min(summation)
            mo <- Mode(summation)

            data_binned <- cut(summation, breaks = bins, right = TRUE, include.lowest = TRUE)
            bin_counts <- table(data_binned)

            x_bin <- cut(mo, breaks = bins, right = TRUE, include.lowest = TRUE)
            helyes_bin <- bin_counts[x_bin]
            összes_száma <- length(osszeg)
            Amo <- (helyes_bin / összes_száma) * 100
            SI <- Amo * 1000000 / (2 * mo * mxdmn)

            SI_values <- c(SI_values, SI)

            summation <- c()
          }
        }
        SI_combined <- c(SI_combined, SI_values)
      }
    }
  }
}

I want to add the five SI_combined vectors to the big_table data frame as columns, but they have different lengths. The shorter ones should be padded with NA, but I could not manage it.
I also want to name the columns according to the "third" value (the variable of the second for loop).

Thanks for the help.

AlexisW · September 26, 2023, 7:16pm

I can't follow your code: several objects are not defined here (e.g. adattar, adatok, ...), and you never actually use data. I would also say it's not typical R-style code (which doesn't mean it's incorrect!).

You can't create a data frame with different column lengths, and this is by design. In general, the point of data frames is to store data where each row is a sample and each column a variable. So the columns have to be the same lengths so that the values in the rows match.
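To see the constraint in action, here is a minimal sketch with two made-up vectors (the names a and b are only for illustration). Note that data.frame() will recycle a shorter column when its length divides the longer one evenly, so the example uses lengths 3 and 2, which cannot be recycled:

```r
# Two hypothetical vectors of different lengths
a <- c(1, 2, 3)
b <- c(10, 20)

# data.frame() refuses columns whose lengths cannot be recycled to match;
# tryCatch() captures the error message instead of stopping the script
result <- tryCatch(
  data.frame(a = a, b = b),
  error = function(e) conditionMessage(e)
)
print(result)
```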

You have basically two solutions. From your code, I suspect you may be familiar with programming in a different language than R. In that case you can continue with the same logic, and you can use a list to store data: by construction a data.frame is just a list of columns, with the added constraint that the columns are the same lengths. So using a list you can do what you intended to do, but you will not have access to the more powerful data frame-oriented functions that R offers.
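As a rough sketch of the list approach, and of how the NA padding asked for in the question could be layered on top of it: the values and the names si_list, third_1, etc. below are made up for illustration and are not taken from the original data.

```r
# Hypothetical SI vectors, one per value of `third`, with different lengths
si_list <- list(
  third_1 = c(120.5, 98.2, 143.0),
  third_2 = c(110.1, 87.6),
  third_3 = c(95.4, 101.3, 99.9, 130.2)
)

# Length of the longest vector
max_len <- max(lengths(si_list))

# Pad every vector with NA up to max_len;
# assigning a larger length to a vector fills the new slots with NA
padded <- lapply(si_list, function(x) {
  length(x) <- max_len
  x
})

# All elements now have equal length, so they can become columns
big_table <- as.data.frame(padded)
print(big_table)
```

In a loop like the one in the question, one option would be to store each result with something like si_list[[paste0("third_", third)]] <- SI_combined at the end of every "third" iteration, and to build the data frame once after the loops have finished.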

The other solution is to take a more R-style approach. As I don't understand your code, I can't give you specific guidance, but one idea might be to start with:

SI_combined <- expand.grid(first = 5, second = 1:15, third = 1:5)
SI_combined$file_name <- paste0(SI_combined$first, "-", SI_combined$second, "-", SI_combined$third, "_", "RRIntervals.csv")

(just to give you an idea of a column-oriented logic)
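To push that idea one step further, here is a hedged sketch of how the table of file names could drive the reading step. Everything beyond expand.grid() and paste0() is an assumption for illustration: the temporary directory, the two tiny example files (built from the first rows shown in the question), and the names files, path and rr_by_third are all made up.

```r
# Set up a temporary directory with two small example files (illustrative only)
dir <- tempfile("rr_")
dir.create(dir)
sample_rr <- data.frame(
  timestamp   = c(1680519258168, 1680519258175),
  rr          = c(725, 662),
  since_start = c(372433, 373095)
)
write.csv(sample_rr, file.path(dir, "5-1-1_RRIntervals.csv"), row.names = FALSE)
write.csv(sample_rr, file.path(dir, "5-2-1_RRIntervals.csv"), row.names = FALSE)

# One row per candidate file, as in the snippet above
files <- expand.grid(first = 5, second = 1:15, third = 1:5)
files$file_name <- paste0(files$first, "-", files$second, "-", files$third,
                          "_", "RRIntervals.csv")
files$path <- file.path(dir, files$file_name)

# Keep only the files that actually exist
files <- files[file.exists(files$path), ]

# For each value of `third`, read the rr column of every file and concatenate
rr_by_third <- lapply(split(files$path, files$third), function(paths) {
  unlist(lapply(paths, function(p) read.csv(p)$rr))
})
print(rr_by_third)
```

The result is a named list with one vector per "third" value, which can hold vectors of different lengths; the SI computation would then be applied to each element instead of being nested inside three for loops.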

If you want, a good R-style approach to data manipulation is described in detail in the book R for Data Science.
