Using rbind() to add rows to an empty data frame

Dear colleagues,
I am trying to add rows to a data frame in R without the column names being modified. Here is an example that illustrates the problem. Since the data set ships with the mlbench package, the example should be easy to reproduce:

library(mlbench)
data("Soybean")
library(tidyverse)
var_unique <- data.frame(col_name=character(0),unique_cnt=integer(0))
col_names <- c(names(Soybean))
cnt <- 0
for (i in 2:5){
  variable <- col_names[i]
  cnt <- length(unique(Soybean[,variable]))
  #print (cnt)
  vec <- c(variable, cnt)
  #print(vec)
  var_unique <- rbind(var_unique,vec)
}

As you can see, I'm trying to build a data frame with the number of unique values for each variable. I'd appreciate help with two questions:

  • First, when the loop executes, I get the following result:
> var_unique
      X.date. X.8.
1        date    8
2 plant.stand    3
3      precip    4
4        temp    4

I am not sure why the columns got renamed. Is it possible to keep the column names col_name and unique_cnt?

  • Second, I defined unique_cnt as an integer column (integer(0)), but because cnt is combined with the character string variable in vec <- c(variable, cnt), it gets coerced to character, and the final column type turns out to be character as well. Is it possible to keep the data type that was set when the data frame was first defined?
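For illustration, here is a minimal base-R sketch of a single loop iteration, with `variable` and `cnt` hard-coded (these values are assumptions standing in for the first Soybean column). It shows that `c()` turns the integer count into a string, while a one-row `data.frame` keeps a separate type per column. Because both arguments to `rbind()` are then data frames with the same names, the columns are matched by name and `col_name`/`unique_cnt` are kept.

```r
# Hard-coded values standing in for one iteration of the loop (illustrative only)
variable <- "date"
cnt <- 8L   # length(unique(...)) returns an integer

# c() needs one common type, so the integer count becomes the string "8"
vec <- c(variable, cnt)
print(class(vec))   # "character"

# A one-row data.frame keeps each column's own type
row <- data.frame(col_name = variable, unique_cnt = cnt,
                  stringsAsFactors = FALSE)

# Binding two data frames with matching names keeps names and types
var_unique <- data.frame(col_name = character(0), unique_cnt = integer(0),
                         stringsAsFactors = FALSE)
var_unique <- rbind(var_unique, row)
str(var_unique)
```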

Help is greatly appreciated. Thanks in advance.

Since you already know what the resulting data.frame should look like, I suggest you pre-allocate it and fill it with values. You shouldn't have problems then.

Could you kindly elaborate with an example? I'm sorry, but I'm not able to follow you.

Yes, of course, but your code is quite convoluted. You start with a data.frame that already has the right names, but each rbind() call overwrites that object, and the names get lost along the way. To keep it short, what about replacing the rbind() line with

var_unique[i, ] <- vec
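A minimal sketch of the pre-allocation idea, assuming a small hypothetical data frame `soy` in place of `mlbench::Soybean` so that it runs without extra packages. Two details matter: with `i` running from 2 to 5, `var_unique[i, ] <- vec` would fill rows 2 to 5 and leave row 1 as NA, and because `vec` is a character vector it would still turn `unique_cnt` into a character column. The sketch therefore uses a separate row index `k` and assigns each column on its own.

```r
# Hypothetical stand-in for mlbench::Soybean (a few factor columns)
soy <- data.frame(
  Class       = factor(c("a", "b", "a", "c")),
  date        = factor(c("1", "2", "3", "1")),
  plant.stand = factor(c("0", "1", "0", "0")),
  precip      = factor(c("0", "1", "2", NA)),
  temp        = factor(c("0", "1", "1", "0"))
)

cols <- names(soy)[2:5]   # the columns to summarise, as in the question

# Pre-allocate one row per column, with the final types already in place
var_unique <- data.frame(col_name   = character(length(cols)),
                         unique_cnt = integer(length(cols)),
                         stringsAsFactors = FALSE)

# Fill row k (starting at 1), assigning each column separately
# so that unique_cnt stays integer
for (k in seq_along(cols)) {
  var_unique$col_name[k]   <- cols[k]
  var_unique$unique_cnt[k] <- length(unique(soy[[cols[k]]]))
}
var_unique
```

On the real data you would use `Soybean` instead of `soy`.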

A tale of two syntaxes:

library(mlbench)
data("Soybean")
library(tidyverse)
var_unique <- data.frame(col_name=character(0),unique_cnt=integer(0))
col_names <- c(names(Soybean))
cnt <- 0
for (i in 2:5){
  variable <- col_names[i]
  cnt <- length(unique(Soybean[,variable]))
  vec <- data.frame(col_name = variable, unique_cnt=cnt)
  var_unique <- rbind(var_unique,vec)
}
var_unique

# contrast

(soyd <- summarise_all(Soybean, ~length(unique(.))) %>%
   pivot_longer(cols = everything(),
                names_to = "col_name",
                values_to = "unique_cnt") %>%
   slice(2:5))
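If you'd rather not depend on the tidyverse, the same table can be built in base R with `vapply()`. This is a swapped-in alternative, not the approach from the answer above, and the sketch again uses a hypothetical stand-in `soy` for `Soybean`. Like `length(unique(.))`, it counts `NA` as one distinct value.

```r
# Hypothetical stand-in for mlbench::Soybean
soy <- data.frame(
  Class       = factor(c("a", "b", "a", "c")),
  date        = factor(c("1", "2", "3", "1")),
  plant.stand = factor(c("0", "1", "0", "0")),
  precip      = factor(c("0", "1", "2", NA)),
  temp        = factor(c("0", "1", "1", "0"))
)

# One integer per selected column; vapply() checks the result type
counts <- vapply(soy[2:5], function(x) length(unique(x)), integer(1))

# Turn the named vector into a two-column data frame
soyd <- data.frame(col_name = names(counts), unique_cnt = unname(counts),
                   stringsAsFactors = FALSE)
soyd
```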

Thanks for your clarification and for helping me learn something new.

Thanks for providing two solutions; the one you suggested is quite elegant. Could you let me know whether the second one is less time-consuming?

library(mlbench)
data("Soybean")
library(tidyverse)

microbenchmark::microbenchmark(
  first = {
    var_unique <- data.frame(col_name = character(0), unique_cnt = integer(0))
    col_names <- c(names(Soybean))
    cnt <- 0
    for (i in 2:5) {
      variable <- col_names[i]
      cnt <- length(unique(Soybean[, variable]))
      vec <- data.frame(col_name = variable, unique_cnt = cnt)
      var_unique <- rbind(var_unique, vec)
    }
  },
  second = {
    soyd <- summarise_all(Soybean, ~length(unique(.))) %>%
      pivot_longer(cols = everything(),
                   names_to = "col_name",
                   values_to = "unique_cnt") %>%
      slice(2:5)
  },
  unit = "s"
)
Unit: seconds
   expr       min         lq       mean     median        uq       max neval
  first 0.0093508 0.01260855 0.01612717 0.01476915 0.0186775 0.0387230   100
 second 0.0207036 0.02757900 0.03348093 0.03274780 0.0385336 0.0876993   100
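These timings cover only four columns, so both versions finish in milliseconds. As a rule of thumb, calling `rbind()` inside a loop copies the growing data frame on every iteration, so that pattern tends to slow down as the number of rows increases. The sketch below (using a hypothetical stand-in `soy`, with function names chosen only for illustration) wraps three variants in functions: growing with `rbind()`, collecting one-row data frames in a list and binding once with `do.call()`, and a single `vapply()` call. It only checks that they agree; no timings are claimed.

```r
# Hypothetical stand-in for mlbench::Soybean
soy <- data.frame(
  Class       = factor(c("a", "b", "a", "c")),
  date        = factor(c("1", "2", "3", "1")),
  plant.stand = factor(c("0", "1", "0", "0")),
  precip      = factor(c("0", "1", "2", NA)),
  temp        = factor(c("0", "1", "1", "0"))
)
cols <- names(soy)[2:5]

# 1. Grow the result with rbind() inside the loop (as in the first answer)
grow_rbind <- function(df, cols) {
  out <- data.frame(col_name = character(0), unique_cnt = integer(0),
                    stringsAsFactors = FALSE)
  for (v in cols) {
    out <- rbind(out, data.frame(col_name = v,
                                 unique_cnt = length(unique(df[[v]])),
                                 stringsAsFactors = FALSE))
  }
  out
}

# 2. Build a list of one-row data frames and bind them once at the end
bind_once <- function(df, cols) {
  rows <- lapply(cols, function(v)
    data.frame(col_name = v, unique_cnt = length(unique(df[[v]])),
               stringsAsFactors = FALSE))
  do.call(rbind, rows)
}

# 3. One vapply() call over the selected columns
vectorised <- function(df, cols) {
  counts <- vapply(df[cols], function(x) length(unique(x)), integer(1))
  data.frame(col_name = names(counts), unique_cnt = unname(counts),
             stringsAsFactors = FALSE)
}

r1 <- grow_rbind(soy, cols)
r2 <- bind_once(soy, cols)
r3 <- vectorised(soy, cols)
```

To time them on the real data, the three calls could be passed to `microbenchmark::microbenchmark()` in the same way as above, with `Soybean` and `names(Soybean)[2:5]` as arguments.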

I appreciate your help. I learned something new: how to compare execution times.
