I am working with the R programming language. I have the following data frame:
id = 1:100
weight_time_1 = rnorm(100,100,10)
weight_time_2 = rnorm(100,100,10)
weight_time_3 = rnorm(100,100,10)
weight_time_4 = rnorm(100,100,10)
weight_time_5 = rnorm(100,100,10)
weight_time_6 = rnorm(100,100,10)
weight_time_7 = rnorm(100,100,10)
weight_time_8 = rnorm(100,100,10)
weight_time_9 = rnorm(100,100,10)
weight_time_10 = rnorm(100,100,10)
state_time_1 = sample.int(5, 100, replace = TRUE)
state_time_2 = sample.int(5, 100, replace = TRUE)
state_time_3 = sample.int(5, 100, replace = TRUE)
state_time_4 = sample.int(5, 100, replace = TRUE)
state_time_5 = sample.int(5, 100, replace = TRUE)
state_time_6 = sample.int(5, 100, replace = TRUE)
state_time_7 = sample.int(5, 100, replace = TRUE)
state_time_8 = sample.int(5, 100, replace = TRUE)
state_time_9 = sample.int(5, 100, replace = TRUE)
state_time_10 = sample.int(5, 100, replace = TRUE)
my_data = data.frame(id, weight_time_1, state_time_1, weight_time_2, state_time_2, weight_time_3, state_time_3,
weight_time_4, state_time_4, weight_time_5, state_time_5, weight_time_6, state_time_6, weight_time_7, state_time_7,
weight_time_8, state_time_8, weight_time_9, state_time_9, weight_time_10, state_time_10)
head(my_data)
id weight_time_1 state_time_1 weight_time_2 state_time_2 weight_time_3 state_time_3 weight_time_4 state_time_4 weight_time_5 state_time_5 weight_time_6 state_time_6 weight_time_7 state_time_7 weight_time_8 state_time_8
1 1 119.3852 2 111.30729 5 99.11912 5 97.06366 1 103.73559 4 100.53940 3 90.98888 2 95.10628 3
2 2 124.5046 3 86.74208 4 96.87224 3 88.84019 2 92.39560 4 96.83324 3 108.60610 1 90.24227 3
3 3 98.3621 2 114.60002 1 91.61257 3 121.88707 2 103.78418 2 96.77586 2 103.58945 3 102.08050 3
4 4 102.8222 3 95.72920 5 92.51412 4 107.94097 4 105.07041 3 116.22625 1 100.52621 5 102.88718 1
5 5 114.0140 5 94.04442 2 112.10150 2 111.40825 4 90.93852 4 83.81637 3 118.08578 5 84.64170 3
6 6 113.0468 2 96.90621 1 102.99961 4 89.28867 1 107.19814 2 99.29141 1 79.91099 1 106.01940 1
weight_time_9 state_time_9 weight_time_10 state_time_10
1 105.34245 5 106.61219 4
2 93.87486 4 95.14339 1
3 99.22730 1 108.46509 4
4 88.78866 1 114.68032 5
5 93.28602 5 91.50742 1
6 104.14194 4 98.67597 2
I want to randomly delete "parts of each row", starting from the leftmost column (after "id") and continuing up to some column. The result should look something like this (the "red line" marks deleted entries, i.e. entries replaced with NA):
I thought of the following way to do this:
Step 1: First, randomly select which ids are eligible for deletions
#1 = delete, 2 = no delete
id = 1:100
delete_or_not_delete = sample.int(2, 100, replace = TRUE)
deleted_ids = data.frame(id,delete_or_not_delete)
Step 2: For the ids that were selected, randomly pick how many columns to delete, not counting the "id" column (e.g. 2 = first 2 columns deleted, 4 = first 4 columns deleted, etc.)
col_delete = c(2,4,6,8,10, 12, 14, 16, 18)
col_delete = sample(col_delete, 100, replace = TRUE)
deleted_ids$col_delete = col_delete
deleted_ids$final_number_of_col_delete = ifelse(deleted_ids$delete_or_not_delete == 1, deleted_ids$col_delete, "NONE")
deleted_ids$col_delete = NULL
deleted_ids$delete_or_not_delete = NULL
In the end, I have something like this:
id final_number_of_col_delete
1 1 NONE
2 2 NONE
3 3 14
4 4 14
5 5 12
6 6 NONE
Based on this data frame (deleted_ids), I would like to do the following in "my_data":
- delete nothing from the row corresponding to id = 1
- delete nothing from the row corresponding to id = 2
- delete the first 14 columns (excluding the id column) from the row corresponding to id = 3
- delete the first 14 columns (excluding the id column) from the row corresponding to id = 4
- delete the first 12 columns (excluding the id column) from the row corresponding to id = 5
- delete nothing from the row corresponding to id = 6
- etc.
Can someone please show me how to do this?
Thanks!
Note: "Delete" here means "replace entries with NA".
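To make the goal concrete, here is a minimal sketch of one possible approach in base R. It is not necessarily the best way to do this. It assumes the same column layout as above (id first, then alternating weight/state columns) and that "NONE" means zero columns to delete. The names n_delete, value_cols, mask and result are hypothetical helpers introduced only for this illustration, and set.seed is there only to make the sketch reproducible.

```r
set.seed(123)  # assumption: fixed seed, only for reproducibility of the sketch
n <- 100

# Rebuild a data frame with the same layout as my_data:
# id, then weight_time_k / state_time_k pairs for k = 1..10
cols <- lapply(1:10, function(k) {
  setNames(
    data.frame(rnorm(n, 100, 10), sample.int(5, n, replace = TRUE)),
    paste0(c("weight_time_", "state_time_"), k)
  )
})
my_data <- data.frame(id = 1:n, do.call(cbind, cols))

# Same format as deleted_ids above: a character column holding a count or "NONE"
deleted_ids <- data.frame(
  id = 1:n,
  final_number_of_col_delete = ifelse(
    sample.int(2, n, replace = TRUE) == 1,
    sample(seq(2, 18, by = 2), n, replace = TRUE),
    "NONE"
  )
)

# Turn the column into a numeric count, treating "NONE" as 0 columns
x <- as.character(deleted_ids$final_number_of_col_delete)
n_delete <- as.numeric(replace(x, x == "NONE", "0"))

# Line the counts up with the rows of my_data by id
n_delete <- n_delete[match(my_data$id, deleted_ids$id)]

# Columns that may be blanked: everything except id, in left-to-right order
value_cols <- setdiff(names(my_data), "id")

# mask[i, j] is TRUE when column position j falls within the first n_delete[i] columns
mask <- outer(n_delete, seq_along(value_cols), ">=")

# Replace the masked entries with NA, leaving id untouched
result <- my_data
result[value_cols][mask] <- NA

head(result)
```

The idea is to build a logical matrix with one row per id and one column per non-id column, and then use it to assign NA in a single step instead of looping over rows. Matching on id, rather than relying on row order, is a deliberate choice in case the two data frames are sorted differently.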