Dear all,
I have already read some similar posts in here but my case has not been addressed here, I guess.
I have this dataset which has many duplicates. This data comes from two subjects/Filenames
each has 3 different word
. I want to end up with two rows for each word
within subjects/Filenames
based on the unduplicated value in m1:vowel_dur
.
For example, all words extends to 10 rows because of duplicates. The duplicated values occur in two columns m1
and intensity_onset
.
I want to delete any duplicated value with no corresponding (numeric) values in the same row. That is, for the word "3aadil", there are two duplicates in m1
, and I want to delete the second duplicate because it stands alone in the row (with no corresponding value as opposed to to the first duplicate).
The same is true for the intensity_onset
, I want to delete the second duplicate because it stands alone in the row. For some words such as "3uud" and "faatiH", the duplicated values occur first.
ID Filename word position Label m1 intensity_onset ZCrossing_f Standard.deviation..Hz._f fric_dur vowel_dur
<int> <chr> <chr> <chr> <chr> <dbl> <chr> <dbl> <dbl> <dbl> <dbl>
1 1 didi 3aadil ini 3- 23.4 NA NA NA 40 NA
2 2 didi 3aadil ini 3- 23.4 NA NA NA NA NA
3 3 didi 3aadil ini 3- 23.9 NA NA NA 50 NA
4 4 didi 3aadil ini 3- 23.9 NA NA NA NA NA
5 5 didi 3aadil ini a: NA 67.12023408 NA NA NA 70
6 6 didi 3aadil ini a: NA 67.12023408 NA NA NA NA
7 7 didi 3aadil ini a: NA 66.44877898 NA NA NA 65.4
8 8 didi 3aadil ini a: NA 66.44877898 NA NA NA NA
9 9 didi 3aadil ini 3- NA NA 15.5 392. NA NA
10 10 didi 3aadil ini 3- NA NA 15.6 542. NA NA
11 1 didi 3uud ini 3- 18.1 NA NA NA 59.8 NA
12 2 didi 3uud ini 3- 18.1 NA NA NA NA NA
13 3 didi 3uud ini 3- 21.4 NA NA NA 55.0 NA
14 4 didi 3uud ini 3- 21.4 NA NA NA NA NA
15 5 didi 3uud ini u: NA 64.32472991 NA NA NA NA
16 6 didi 3uud ini u: NA 64.32472991 NA NA NA 152.
17 7 didi 3uud ini u: NA 62.05730297 NA NA NA NA
18 8 didi 3uud ini u: NA 62.05730297 NA NA NA 147.
19 9 didi 3uud ini 3- NA NA 27.9 901. NA NA
20 10 didi 3uud ini 3- NA NA 26.2 558. NA NA
21 1 didi faatiH ini f 7.70 NA NA NA NA NA
22 2 didi faatiH ini f 7.70 NA NA NA 100 NA
23 3 didi faatiH ini f 7.40 NA NA NA NA NA
24 4 didi faatiH ini f 7.40 NA NA NA 70 NA
25 5 didi faatiH ini a: NA 64.89497288 NA NA NA 110
26 6 didi faatiH ini a: NA 64.89497288 NA NA NA NA
27 7 didi faatiH ini a: NA 64.45085401 NA NA NA 90.0
28 8 didi faatiH ini a: NA 64.45085401 NA NA NA NA
29 9 didi faatiH ini f NA NA 121. 3417. NA NA
30 10 didi faatiH ini f NA NA 137. 3269. NA NA
After deleting the duplicates I assume the data (for one subject) would look something like the following:
ID Filename word position Label m1 intensity_onset ZCrossing_f Standard.deviation..Hz._f fric_dur vowel_dur
1 1 didi 3aadil ini 3- 23.4 NA NA NA 40 NA
2 3 didi 3aadil ini 3- 23.9 NA NA NA 50 NA
3 5 didi 3aadil ini a: NA 67.12023408 NA NA NA 70
4 7 didi 3aadil ini a: NA 66.44877898 NA NA NA 65.4
5 9 didi 3aadil ini 3- NA NA 15.5 392. NA NA
6 10 didi 3aadil ini 3- NA NA 15.6 542. NA NA
7 1 didi 3uud ini 3- 18.1 NA NA NA 59.8 NA
8 3 didi 3uud ini 3- 21.4 NA NA NA 55.0 NA
9 6 didi 3uud ini u: NA 64.32472991 NA NA NA 152.
10 8 didi 3uud ini u: NA 62.05730297 NA NA NA 147.
11 9 didi 3uud ini 3- NA NA 27.9 901. NA NA
12 10 didi 3uud ini 3- NA NA 26.2 558. NA NA
13 2 didi faatiH ini f 7.70 NA NA NA 100 NA
14 4 didi faatiH ini f 7.40 NA NA NA 70 NA
15 5 didi faatiH ini a: NA 64.89497288 NA NA NA 110
16 7 didi faatiH ini a: NA 64.45085401 NA NA NA 90.0
17 9 didi faatiH ini f NA NA 121. 3417. NA NA
18 10 didi faatiH ini f NA NA 137. 3269. NA NA
However, I now want to group the data by Filename
, position
, word
and combine the rows. The output would be similar to this. Label
within the grouping variables is less relevant to me; it does not matter which one to include as long as I end up with similar output.
ID Filename word position Label m1 intensity_onset ZCrossing_f Standard.deviation..Hz._f fric_dur vowel_dur
1 didi 3aadil ini 3- 23.4 67.12023408 15.5 392. 40 70
2 didi 3aadil ini 3- 23.9 66.44877898 15.6 542. 50 65.4
1 didi 3uud ini 3- 18.1 64.32472991 27.9 901. 59.8 152.
2 didi 3uud ini 3- 21.4 62.05730297 26.2 558. 55.0 147.
1 didi faatiH ini f 7.70 64.89497288 121. 3417. 100 110
2 didi faatiH ini f 7.40 64.45085401 137. 3269. 70 90.0
Here is the data
structure(list(ID = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L,
1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 1L, 2L, 3L, 4L, 5L,
6L, 7L, 8L, 9L, 10L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L,
1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 1L, 2L, 3L, 4L, 5L,
6L, 7L, 8L, 9L, 10L), Filename = c("didi", "didi", "didi", "didi",
"didi", "didi", "didi", "didi", "didi", "didi", "didi", "didi",
"didi", "didi", "didi", "didi", "didi", "didi", "didi", "didi",
"didi", "didi", "didi", "didi", "didi", "didi", "didi", "didi",
"didi", "didi", "me", "me", "me", "me", "me", "me", "me", "me",
"me", "me", "me", "me", "me", "me", "me", "me", "me", "me", "me",
"me", "me", "me", "me", "me", "me", "me", "me", "me", "me", "me"
), word = c("3aadil", "3aadil", "3aadil", "3aadil", "3aadil",
"3aadil", "3aadil", "3aadil", "3aadil", "3aadil", "3uud", "3uud",
"3uud", "3uud", "3uud", "3uud", "3uud", "3uud", "3uud", "3uud",
"faatiH", "faatiH", "faatiH", "faatiH", "faatiH", "faatiH", "faatiH",
"faatiH", "faatiH", "faatiH", "3aadil", "3aadil", "3aadil", "3aadil",
"3aadil", "3aadil", "3aadil", "3aadil", "3aadil", "3aadil", "3uud",
"3uud", "3uud", "3uud", "3uud", "3uud", "3uud", "3uud", "3uud",
"3uud", "faatiH", "faatiH", "faatiH", "faatiH", "faatiH", "faatiH",
"faatiH", "faatiH", "faatiH", "faatiH"), position = c("ini",
"ini", "ini", "ini", "ini", "ini", "ini", "ini", "ini", "ini",
"ini", "ini", "ini", "ini", "ini", "ini", "ini", "ini", "ini",
"ini", "ini", "ini", "ini", "ini", "ini", "ini", "ini", "ini",
"ini", "ini", "ini", "ini", "ini", "ini", "ini", "ini", "ini",
"ini", "ini", "ini", "ini", "ini", "ini", "ini", "ini", "ini",
"ini", "ini", "ini", "ini", "ini", "ini", "ini", "ini", "ini",
"ini", "ini", "ini", "ini", "ini"), Label = c("3-", "3-", "3-",
"3-", "a:", "a:", "a:", "a:", "3-", "3-", "3-", "3-", "3-", "3-",
"u:", "u:", "u:", "u:", "3-", "3-", "f", "f", "f", "f", "a:",
"a:", "a:", "a:", "f", "f", "3-", "3-", "3-", "3-", "a:", "a:",
"a:", "a:", "3-", "3-", "3-", "3-", "3-", "3-", "u:", "u:", "u:",
"u:", "3-", "3-", "f", "f", "f", "f", "a:", "a:", "a:", "a:",
"f", "f"), m1 = c(23.384, 23.384, 23.8514444444444, 23.8514444444444,
NA, NA, NA, NA, NA, NA, 18.0954761904762, 18.0954761904762, 21.4317894736842,
21.4317894736842, NA, NA, NA, NA, NA, NA, 7.70211428571429, 7.70211428571429,
7.39548, 7.39548, NA, NA, NA, NA, NA, NA, 34.5868076923077, 34.5868076923077,
30.7383076923077, 30.7383076923077, NA, NA, NA, NA, NA, NA, 15.5986666666667,
15.5986666666667, 18.5721428571429, 18.5721428571429, NA, NA,
NA, NA, NA, NA, 5.23757142857143, 5.23757142857143, 1.79862068965517,
1.79862068965517, NA, NA, NA, NA, NA, NA), intensity_onset = c(NA,
NA, NA, NA, "67.12023408", "67.12023408", "66.44877898", "66.44877898",
NA, NA, NA, NA, NA, NA, "64.32472991", "64.32472991", "62.05730297",
"62.05730297", NA, NA, NA, NA, NA, NA, "64.89497288", "64.89497288",
"64.45085401", "64.45085401", NA, NA, NA, NA, NA, NA, "69.96847194",
"69.96847194", "68.92231773", "68.92231773", NA, NA, NA, NA,
NA, NA, "66.19878876", "66.19878876", "65.94411944", "65.94411944",
NA, NA, NA, NA, NA, NA, "62.94039775", "62.94039775", "66.36752717",
"66.36752717", NA, NA), ZCrossing_f = c(NA, NA, NA, NA, NA, NA,
NA, NA, 15.5, 15.6, NA, NA, NA, NA, NA, NA, NA, NA, 27.94, 26.19,
NA, NA, NA, NA, NA, NA, NA, NA, 120.6, 136.71, NA, NA, NA, NA,
NA, NA, NA, NA, 26.36, 25.74, NA, NA, NA, NA, NA, NA, NA, NA,
25.6, 28.5, NA, NA, NA, NA, NA, NA, NA, NA, 149, 167.07), Standard.deviation..Hz._f = c(NA,
NA, NA, NA, NA, NA, NA, NA, 391.8329, 542.3985, NA, NA, NA, NA,
NA, NA, NA, NA, 901.3963, 558.205, NA, NA, NA, NA, NA, NA, NA,
NA, 3417.1561, 3269.0159, NA, NA, NA, NA, NA, NA, NA, NA, 513.2463,
507.5912, NA, NA, NA, NA, NA, NA, NA, NA, 483.0832, 472.0597,
NA, NA, NA, NA, NA, NA, NA, NA, 3149.5155, 3022.7919), fric_dur = c(40,
NA, 50, NA, NA, NA, NA, NA, NA, NA, 59.7810000000172, NA, 54.9839999999967,
NA, NA, NA, NA, NA, NA, NA, NA, 100, NA, 70, NA, NA, NA, NA,
NA, NA, 75.122000000003, NA, 74.9919999999984, NA, NA, NA, NA,
NA, NA, NA, 50, NA, 40, NA, NA, NA, NA, NA, NA, NA, NA, 80, NA,
84.695000000007, NA, NA, NA, NA, NA, NA), vowel_dur = c(NA, NA,
NA, NA, 70, NA, 65.36107601, NA, NA, NA, NA, NA, NA, NA, NA,
152.2170884, NA, 147.4494709, NA, NA, NA, NA, NA, NA, 110, NA,
89.98453674, NA, NA, NA, NA, NA, NA, NA, 94.87799913, NA, 85.00816518,
NA, NA, NA, NA, NA, NA, NA, NA, 80, NA, 100, NA, NA, NA, NA,
NA, NA, 102.0144636, NA, 97.79336994, NA, NA, NA)), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -60L), groups = structure(list(
Filename = c("didi", "didi", "didi", "didi", "didi", "didi",
"didi", "didi", "didi", "me", "me", "me", "me", "me", "me",
"me", "me", "me"), word = c("3aadil", "3aadil", "3aadil",
"3uud", "3uud", "3uud", "faatiH", "faatiH", "faatiH", "3aadil",
"3aadil", "3aadil", "3uud", "3uud", "3uud", "faatiH", "faatiH",
"faatiH"), m1 = c(23.384, 23.8514444444444, NA, 18.0954761904762,
21.4317894736842, NA, 7.39548, 7.70211428571429, NA, 30.7383076923077,
34.5868076923077, NA, 15.5986666666667, 18.5721428571429,
NA, 1.79862068965517, 5.23757142857143, NA), .rows = structure(list(
1:2, 3:4, 5:10, 11:12, 13:14, 15:20, 23:24, 21:22, 25:30,
33:34, 31:32, 35:40, 41:42, 43:44, 45:50, 53:54, 51:52,
55:60), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), row.names = c(NA, -18L), class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE))
Thank you for your time, patience, and help in advance!