Hi everyone.
I have a large dataset which looks like TABLE 1.
Source refers to data sources. There are two data sources. The letters are individuals.
For each data source, individuals can report different codes. There are n no. of codes.
And I want to produce TABLE 2 where:
Code refers to the all the codes reported on Code_1 and Code_2 columns.
Count_in_Code_1 refers to the number of persons reporting each code under Code_1 column. For example Code 100 was reported by A and E. Hence, there are 2 counts. But only A appeared on both sources who reported Code 100. Hence, under the Match_Sources column, there is only 1 count for Code 100.
Similarly, Code 102 was reported by C,E, F and G under Code_1 column (hence the 4 in Table 2). But only C, F and G appeared on both sources who reported Code 102.
Really appreciate your help on this.
Yoyong
TABLE 1
Source_1 | Code_1 | Source_2 | Code_2 |
---|---|---|---|
A | 100;101 | A | 100;101 |
B | B | 100;102 | |
C | 101;102;103 | C | 101 |
D | 103;104 | D | 102;103;104 |
E | 100;102 | E | |
F | 101;102 | F | 101;102;103;104 |
G | 102;103;104 | G | 102;103;105 |
TABLE 2
Code | Count_in_Code_1 | Match_Sources |
---|---|---|
100 | 2 | 1 |
101 | 3 | 3 |
102 | 4 | 3 |
103 | 3 | 2 |
104 | 2 | 1 |
105 | 0 | 0 |