Randomly Deleting Parts of a Row

omario · May 28, 2022, 5:54am

I am working with the R programming language. I have the following data frame:

id = 1:100
weight_time_1 = rnorm(100,100,10)
weight_time_2 = rnorm(100,100,10)
weight_time_3 = rnorm(100,100,10)
weight_time_4 = rnorm(100,100,10)
weight_time_5 = rnorm(100,100,10)
weight_time_6 = rnorm(100,100,10)
weight_time_7 = rnorm(100,100,10)
weight_time_8 = rnorm(100,100,10)
weight_time_9 = rnorm(100,100,10)
weight_time_10 = rnorm(100,100,10)
state_time_1 = sample.int(5, 100, replace = TRUE)
state_time_2 = sample.int(5, 100, replace = TRUE)
state_time_3 = sample.int(5, 100, replace = TRUE)
state_time_4 = sample.int(5, 100, replace = TRUE)
state_time_5 = sample.int(5, 100, replace = TRUE)
state_time_6 = sample.int(5, 100, replace = TRUE)
state_time_7 = sample.int(5, 100, replace = TRUE)
state_time_8 = sample.int(5, 100, replace = TRUE)
state_time_9 = sample.int(5, 100, replace = TRUE)
state_time_10 = sample.int(5, 100, replace = TRUE)


my_data = data.frame(id, weight_time_1, state_time_1, weight_time_2, state_time_2, weight_time_3, state_time_3, 
weight_time_4, state_time_4, weight_time_5, state_time_5, weight_time_6, state_time_6, weight_time_7, state_time_7, 
weight_time_8, state_time_8, weight_time_9, state_time_9, weight_time_10, state_time_10)

head(my_data)
  id weight_time_1 state_time_1 weight_time_2 state_time_2 weight_time_3 state_time_3 weight_time_4 state_time_4 weight_time_5 state_time_5 weight_time_6 state_time_6 weight_time_7 state_time_7 weight_time_8 state_time_8
1  1      119.3852            2     111.30729            5      99.11912            5      97.06366            1     103.73559            4     100.53940            3      90.98888            2      95.10628            3
2  2      124.5046            3      86.74208            4      96.87224            3      88.84019            2      92.39560            4      96.83324            3     108.60610            1      90.24227            3
3  3       98.3621            2     114.60002            1      91.61257            3     121.88707            2     103.78418            2      96.77586            2     103.58945            3     102.08050            3
4  4      102.8222            3      95.72920            5      92.51412            4     107.94097            4     105.07041            3     116.22625            1     100.52621            5     102.88718            1
5  5      114.0140            5      94.04442            2     112.10150            2     111.40825            4      90.93852            4      83.81637            3     118.08578            5      84.64170            3
6  6      113.0468            2      96.90621            1     102.99961            4      89.28867            1     107.19814            2      99.29141            1      79.91099            1     106.01940            1
  weight_time_9 state_time_9 weight_time_10 state_time_10
1     105.34245            5      106.61219             4
2      93.87486            4       95.14339             1
3      99.22730            1      108.46509             4
4      88.78866            1      114.68032             5
5      93.28602            5       91.50742             1
6     104.14194            4       98.67597             2

I want to randomly delete "parts of each row", starting from the left and continuing up to some column, with the deleted entries replaced by NA.

I thought of the following way to do this:

Step 1: First, randomly select which ids will be eligible for deletions

# 1 = delete, 2 = no delete
id = 1:100
delete_or_not_delete = sample.int(2, 100, replace = TRUE)
deleted_ids = data.frame(id, delete_or_not_delete)

Step 2: For the ids selected for deletion, randomly pick how many columns to delete (excluding the "id" column, e.g. 2 = first 2 columns deleted, 4 = first 4 columns deleted, etc.)

col_delete = c(2,4,6,8,10, 12, 14, 16, 18)
col_delete = sample(col_delete, 100, replace = TRUE)
deleted_ids$col_delete = col_delete
deleted_ids$final_number_of_col_delete = ifelse(deleted_ids$delete_or_not_delete == "1", deleted_ids$col_delete, "NONE")
deleted_ids$col_delete = NULL
deleted_ids$delete_or_not_delete = NULL

In the end, I have something like this:

  id final_number_of_col_delete
1  1                       NONE
2  2                       NONE
3  3                         14
4  4                         14
5  5                         12
6  6                       NONE
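
A side note on the table above: because ifelse() mixes numbers with the string "NONE", final_number_of_col_delete ends up as a character column. The following is a minimal sketch, assuming a count of 0 is an acceptable stand-in for "NONE", that keeps the column numeric so it can be used directly as a column count later. The name deleted_ids_numeric and the seed are illustrative additions, not part of the original code.

```r
set.seed(1)  # assumption: seed added only so the sketch is reproducible

id <- 1:100
# 1 = delete, 2 = no delete (same coding as in Step 1)
delete_or_not_delete <- sample.int(2, 100, replace = TRUE)
col_delete <- sample(c(2, 4, 6, 8, 10, 12, 14, 16, 18), 100, replace = TRUE)

# Use 0 instead of "NONE" so the column stays numeric
deleted_ids_numeric <- data.frame(
  id,
  final_number_of_col_delete = ifelse(delete_or_not_delete == 1, col_delete, 0)
)

str(deleted_ids_numeric)
```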

Based on this data frame (deleted_ids), I would like to do the following to "my_data":

delete nothing from the row corresponding to id = 1
delete nothing from the row corresponding to id = 2
delete the first 14 columns (excluding the id column) from the row corresponding to id = 3
delete the first 14 columns (excluding the id column) from the row corresponding to id = 4
delete the first 12 columns (excluding the id column) from the row corresponding to id = 5
delete nothing from the row corresponding to id = 6
etc.

Can someone please show me how to do this?

Thanks!

Note: "Delete" here means "replace entries with NA".
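
To make the intended result concrete, here is a small sketch on a made-up data frame. The name toy and its values are purely illustrative, and the deletions are hard-coded rather than random.

```r
# Toy data: an id column followed by four value columns
toy <- data.frame(
  id = 1:3,
  a = c(10, 20, 30),
  b = c(1, 2, 3),
  c = c(11, 21, 31),
  d = c(4, 5, 6)
)

# Row 1: delete nothing
# Row 2: "delete" the first 2 columns after id (columns 2 to 3)
toy[2, 2:3] <- NA
# Row 3: "delete" the first 4 columns after id (columns 2 to 5)
toy[3, 2:5] <- NA

print(toy)
```

Row 1 is left untouched, row 2 loses a and b, and row 3 loses all four value columns, while id is never changed.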

Yarnabrina · May 28, 2022, 3:57pm

Here's a solution using a plain for loop.

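# 1 = row gets deletions, 2 = row is left untouched
contains_missing_flag <- sample.int(2, 100, replace = TRUE)
col_delete <- c(2, 4, 6, 8, 10, 12, 14, 16, 18)
# Number of leading non-id columns to blank in each row (0 = none)
missing_elements_counts <- ifelse(contains_missing_flag == 2, 0, sample(col_delete, 100, replace = TRUE))
for (i in seq_along(missing_elements_counts)) {
  if (missing_elements_counts[i] >= 2) {
    # Column 1 is id, so the first n non-id columns are columns 2 to n + 1
    my_data[i, 2:(missing_elements_counts[i] + 1)] <- NA
  }
}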
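
As an alternative to the loop, the same masking can be done without iterating over rows. This is a minimal sketch, assuming the deletion counts are held in a numeric vector where 0 means "delete nothing". The names demo, n_delete, value_cols and mask are stand-ins for illustration, not objects from this thread.

```r
set.seed(42)  # assumption: seed added only so the sketch is reproducible

# Small stand-in for my_data: an id column plus 6 value columns (X1..X6)
demo <- data.frame(id = 1:5, matrix(rnorm(30, 100, 10), nrow = 5))

# Hypothetical per-row counts of leading non-id columns to blank (0 = none)
n_delete <- c(0, 2, 6, 4, 0)

# Work on the non-id columns only
value_cols <- demo[-1]

# mask[i, j] is TRUE when column j falls within the first n_delete[i] columns
mask <- outer(n_delete, seq_len(ncol(value_cols)), ">=")

# Logical-matrix indexing sets every masked cell to NA in one step
value_cols[mask] <- NA
demo[-1] <- value_cols

print(demo)
```

With the real data, n_delete would have one entry per row of my_data, and the number of NA values in each row can be checked with rowSums(is.na(demo[-1])).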
