Fixing a for loop or looking for another solution on creating a new column with values based on values from two other columns?

jubejube · June 17, 2020, 6:11am

Hello,

I tried writing a for loop that finds the minimum value of one column (named stimulus) within EACH group of another column (named block, which has 8 'blocks') for each subject. I want to create a new column called blockprocedure whose value is the minimum stimulus value of each 'block'.

I created a mini dummy version of how I would like the data to look:


subject <- c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2)
stimulus <- c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 1, 1, 1, 1, 1, 1)
block <- c(3, 3, 3, 7, 7, 7, 4, 4, 4, 8, 8, 8, 1, 1, 1, 5, 5, 5, 2, 2, 2, 6, 6, 6, 3, 3, 3, 7, 7, 7, 4, 4, 4, 8, 8, 8, 2, 2, 2, 6, 6, 6)
blockprocedure <- c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1)
stimtype <- c('bd', 'nd', 'nm', 'bd', 'nd', 'nm', 'bd', 'nd', 'nm', 'bd', 'nd', 'nm', 'bd', 'nd', 'nm', 'bd', 'nd', 'nm', 'bd', 'nd', 'nm', 'bd', 'nd', 'nm', 'bd', 'nd', 'nm', 'bd', 'nd', 'nm', 'bd', 'nd', 'nm', 'bd', 'nd', 'nm', 'bd', 'nd', 'nm', 'bd', 'nd', 'nm')

dummy <- data.frame(subject, stimulus, block, stimtype, blockprocedure)

As you can see, for subject 1, stimuli '1' and '5' appeared in blocks 3 and 7. The minimum stimulus value was '1' for blocks 3 and 7 and '2' for blocks 4 and 8, so the blockprocedure value should be '1' for blocks 3 and 7 and '2' for blocks 4 and 8.

I hope this all makes sense!

This was my attempt at making a for loop to achieve this:

data$blockprocedure = NA #create empty column for blockprocedure
for(row in 1:nrow(data)){
  blockval <- data[row, "block"] #1, 2, 3, 4, 5, 6, 7, 8
  subjval <- data[row, "subject"] #up to 22
  minval <- min(subset(data, subjval == "subject" && blockval == "block")$stimulus)#1, 2, 3, 4
  data[row, "blockprocedure"] <- minval
}

## for EACH subject, look at the minimum stimulus number of EACH block (block 1-8) (min stimulus value should be either 1, 2, 3 or 4) and that min stimulus value will determine the blockprocedure (corresponds to the block's min. stim value)
## therefore blocks 1-8 will be paired to either blockprocedure 1, 2, 3 or 4

However, we receive an error:

Error: Assigned data `minval` must be compatible with existing data.
ℹ Error occurred for column `blockprocedure`.
x Lossy cast from `value` <double> to `x` <logical>.
Locations: 1.
Run `rlang::last_error()` to see where the error occurred.
In addition: Warning message:
In min(subset(data, subjval == "subject" && blockval == "block")$stimulus) :
  no non-missing arguments to min; returning Inf

If anyone has any insight on how to fix the for loop, or knows of an alternative solution, it would be much appreciated!

nirgrahamuk · June 17, 2020, 7:57am

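library(dplyr)

# Minimum stimulus for each subject/block combination
mins_df <- group_by(dummy, subject, block) %>%
  summarise(min_stimulus_per_subject_per_block = min(stimulus))

# Attach the per-group minimum to every row of the original data
result <- left_join(dummy, mins_df, by = c("subject", "block"))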

Leon · June 17, 2020, 8:35am

In essence similar to @nirgrahamuk's solution, but avoiding the join:

# Load libraries ----------------------------------------------------------
library("tidyverse")

# Define example data -----------------------------------------------------
my_data <- tibble(
  subject = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
              1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2,
              2, 2),
  stimulus = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4,
               4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 1, 1, 1, 1,
               1, 1),
  block = c(3, 3, 3, 7, 7, 7, 4, 4, 4, 8, 8, 8, 1, 1, 1, 5, 5, 5, 2, 2, 2,
            6, 6, 6, 3, 3, 3, 7, 7, 7, 4, 4, 4, 8, 8, 8, 2, 2, 2, 6, 6, 6),
  blockprocedure = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3,
                     4, 4, 4, 4, 4, 4, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2,
                     1, 1, 1, 1, 1, 1),
  stimtype = c("bd", "nd", "nm", "bd", "nd", "nm", "bd", "nd", "nm", "bd",
               "nd", "nm", "bd", "nd", "nm", "bd", "nd", "nm", "bd", "nd",
               "nm", "bd", "nd", "nm", "bd", "nd", "nm", "bd", "nd", "nm",
               "bd", "nd", "nm", "bd", "nd", "nm", "bd", "nd", "nm", "bd",
               "nd", "nm")
)

# Wrangle data ------------------------------------------------------------
my_data %>% 
  group_by(subject, block) %>% 
  mutate(min_subject_block_stim = min(stimulus)) %>% 
  ungroup()
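
For readers who would rather not load the tidyverse, the same grouped minimum can probably be sketched in base R with `ave()`, which applies a function within each group and returns a vector as long as its input. The data frame `toy` and the helper variable `group` below are made up for illustration and are not the full example data. The grouping is built with `interaction(..., drop = TRUE)` so that `min()` is never called on an empty subject/block combination.

```r
# Hypothetical mini data set (an assumption, not the data from the question)
toy <- data.frame(
  subject  = c(1, 1, 1, 1, 2, 2),
  stimulus = c(1, 5, 2, 6, 1, 1),
  block    = c(3, 3, 4, 4, 2, 2)
)

# One grouping factor per subject/block pair; drop = TRUE removes
# combinations that never occur, so min() only sees non-empty groups
group <- interaction(toy$subject, toy$block, drop = TRUE)

# ave() computes min(stimulus) within each group and repeats the
# result on every row that belongs to that group
toy$blockprocedure <- ave(toy$stimulus, group, FUN = min)

toy
```

Like the `mutate()` version, this keeps one row per original observation, so no join is needed.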

Hope it helps

...and a word of advice: as a rule of thumb, if you start solving a data-wrangling problem with a for loop, stop and think. Most likely, there is a better solution.
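
That said, the loop from the question can be repaired, and doing so may help explain the error. In the original, `subjval == "subject"` compares the current row's value with the literal string "subject" rather than with the column, so the subset is empty and `min()` returns `Inf` with a warning. The error itself appears to come from assigning that double into a column created as a logical `NA`. Below is a minimal sketch of a corrected loop, assuming a small made-up data frame `toy` in place of the real `data`; `same_group` is a helper name introduced here.

```r
# Hypothetical mini data set (an assumption, not the data from the question)
toy <- data.frame(
  subject  = c(1, 1, 1, 1, 2, 2),
  stimulus = c(1, 5, 2, 6, 1, 1),
  block    = c(3, 3, 4, 4, 2, 2)
)

# Start with a numeric NA so that doubles can be assigned later
toy$blockprocedure <- NA_real_

for (row in seq_len(nrow(toy))) {
  blockval <- toy$block[row]
  subjval  <- toy$subject[row]
  # Compare the columns with the current row's values;
  # `&` is the element-wise AND, `&&` only handles single values
  same_group <- toy$subject == subjval & toy$block == blockval
  toy$blockprocedure[row] <- min(toy$stimulus[same_group])
}

toy
```

The loop recomputes the group minimum for every row, so the grouped `mutate()` above remains the shorter and faster option on larger data.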

jubejube · June 17, 2020, 7:54pm

Thank you so much for all of your solutions!

@Leon thank you for the word of advice, this really was such a simple fix!
