I have two data frames. Applying the same dcast() call to each gives me differently formatted output. Both datasets have the same structure but different sizes. The first one has more than 950 rows:
> trans_matrix_complete
channel_from channel_to transition_probability
1 (start) MANAGER_SASWP 0.005154639
2 (start) GROUPDIRECTOR/CXO_LIVEWEBEXR 0.001030928
3 (start) GROUPDIRECTOR/CXO_SUG 0.011340206
4 (start) DIRECTOR_3RDLIVE 0.041237113
The code I apply is:
trans_matrix_complete <- mod_attrib$transition_matrix
is.factor(trans_matrix_complete$channel_from)
trans_matrix_complete <- rbind(trans_matrix_complete, df_dummy)
is.vector(trans_matrix_complete$channel_from)
trans_matrix_complete$channel_to <- factor(trans_matrix_complete$channel_to,
                                           levels = c(levels(trans_matrix_complete$channel_to)))
trans_matrix_complete <- dcast(trans_matrix_complete,
                               channel_from ~ channel_to,
                               value.var = 'transition_probability')
And the trans_matrix_complete output I get is the following:
> dcast(trans_matrix_complete,
+ channel_from ~ channel_to,
+ value.var = 'transition_probability')
channel_from (conversion) (null) _3RDLIVE _3RDWP _AR _CHAT _CR
1: (start) NA NA 0.01134021 0.001030928 0.002061856 0.01649485 0.04845361
2: _3RDLIVE 0.6666667 0.06666667 NA NA NA NA 0.06666667
3: _3RDWP 0.3333333 0.33333333 NA NA NA NA NA
_CRSR _DMCR _EBOOK _EPCR _IC _OOTR _OTHR _PEV _SASCON
1: 0.001030928 0.001030928 NA 0.001030928 0.009278351 0.004123711 NA 0.001030928 NA
2: NA 0.066666667 NA NA NA NA NA NA NA
3: NA NA NA NA NA NA NA NA NA
Something is not working as it should, because with the smaller data frame of just a few lines I get the following output:
> trans_matrix_complete
channel_from (start) channel_0 channel_1 channel_3 channel_4 channel_5 channel_7 (conversion)
1 (start) 0 0.2299571 0.1409748 0.08477536 0.1663756 0.2863153 0.09160184 NA
2 channel_0 NA NA 0.1399532 0.08181362 0.1766773 0.2748871 0.09277229 0.03003179
3 channel_1 NA 0.2025543 NA 0.07164751 0.1547893 0.2656450 0.08301405 0.02707535
4 channel_3 NA 0.1995104 0.1226030 NA 0.1476948 0.2443900 0.07343941 0.03080375
5 channel_4 NA 0.2196648 0.1231438 0.07392872 NA 0.2734408 0.08305049 0.03277471
6 channel_5 NA 0.2355895 0.1392602 0.08586180 0.1793620 NA 0.09703657 0.03550463
7 channel_7 NA 0.2009948 0.1197494 0.07074429 0.1378040 0.2560796 NA 0.03021371
8 (conversion) NA NA NA NA NA NA NA 1.00000000
9 (null) NA NA NA NA NA NA NA NA
(null)
1 NA
2 0.2038648
3 0.1952746
4 0.1815585
5 0.1939966
6 0.2273852
7 0.1844141
8 NA
9 1.0000000
The two outputs differ in the following ways:
a) the row numbers are displayed differently: in the first case each row number is followed by a colon (1:, 2:, ...), and I'm not sure why
b) trying to assign row names to the data frame with
row.names(trans_matrix_complete) <- trans_matrix_complete$channel_from
does not work for the large data frame: despite the row.names call, the data frame is printed exactly as in the first output, without names assigned to the rows.
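A quick check that might help narrow this down, sketched with made-up data rather than my real tables. The colon after each row number is how the data.table package prints rows, so it may be worth comparing class() on both objects. The names df, dt and df2 below are hypothetical, and the idea that the large object is a data.table is only an assumption based on the printed output.

```r
# Minimal sketch with made-up data; requires the data.table package.
library(data.table)

# Hypothetical stand-in for the small, plain data frame
df <- data.frame(channel_from = c("(start)", "channel_0"),
                 channel_0    = c(0.23, NA),
                 stringsAsFactors = FALSE)

# Hypothetical stand-in for the large object (assumed to be a data.table)
dt <- as.data.table(df)

print(class(df))  # "data.frame"
print(class(dt))  # "data.table" "data.frame"

# Convert to a plain data frame before assigning row names
df2 <- as.data.frame(dt)
row.names(df2) <- df2$channel_from
print(df2)
```

If class() on the large object also reports "data.table", that would at least explain why the two outputs are printed differently.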
Any idea about this weird behavior?