dcast() gives different outcomes on two similarly structured data frames

I have two data frames. Applying the same dcast() call to both gives me differently formatted output. Both datasets have the same structure but different sizes. The first one has more than 950 rows:

> trans_matrix_complete
                    channel_from                   channel_to transition_probability
1                        (start)                MANAGER_SASWP            0.005154639
2                        (start) GROUPDIRECTOR/CXO_LIVEWEBEXR            0.001030928
3                        (start)        GROUPDIRECTOR/CXO_SUG            0.011340206
4                        (start)             DIRECTOR_3RDLIVE            0.041237113

The code I apply is:

trans_matrix_complete <- mod_attrib$transition_matrix

# Check the column type before binding
is.factor(trans_matrix_complete$channel_from)

# df_dummy holds extra rows to append (its definition is not shown here)
trans_matrix_complete <- rbind(trans_matrix_complete, df_dummy)

is.vector(trans_matrix_complete$channel_from)

trans_matrix_complete$channel_to <- factor(trans_matrix_complete$channel_to,
                                           levels = levels(trans_matrix_complete$channel_to))

# Reshape from long (one row per transition) to wide (one column per channel_to)
trans_matrix_complete <- dcast(trans_matrix_complete,
                               channel_from ~ channel_to,
                               value.var = 'transition_probability')
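One possible explanation, assuming the object passed to dcast() is a data.table (the 1:, 2: row labels in the output that follows are how data.table prints): dcast() is a generic, and data.table's method returns a data.table, while reshape2::dcast() returns a plain data.frame. Running class(trans_matrix_complete) right before the dcast() call would confirm which case applies. The sketch below uses a small made-up table (hypothetical values, not the data from the post) to show that the class carries through the reshape:

```r
library(data.table)

# Hypothetical toy transition table with the same three columns as in the post
trans <- data.table(
  channel_from = c("(start)", "(start)", "channel_0", "channel_0"),
  channel_to   = c("channel_0", "channel_1", "channel_1", "(conversion)"),
  transition_probability = c(0.6, 0.4, 0.7, 0.3)
)

# Check the class of the object right before reshaping
class(trans)

# Called on a data.table, dcast() dispatches to data.table's method,
# so the wide result is a data.table as well
wide <- dcast(trans, channel_from ~ channel_to,
              value.var = "transition_probability")
class(wide)
is.data.table(wide)
```

If the smaller example was reshaped from a plain data.frame (its output has no colons after the row numbers), that alone could account for the different-looking results.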

And the trans_matrix_complete output I get is the following:

> dcast(trans_matrix_complete,
+                                channel_from ~ channel_to,
+                                value.var = 'transition_probability')
        channel_from (conversion)     (null)   _3RDLIVE      _3RDWP         _AR      _CHAT        _CR
  1:         (start)           NA         NA 0.01134021 0.001030928 0.002061856 0.01649485 0.04845361
  2:        _3RDLIVE    0.6666667 0.06666667         NA          NA          NA         NA 0.06666667
  3:          _3RDWP    0.3333333 0.33333333         NA          NA          NA         NA         NA
           _CRSR       _DMCR _EBOOK       _EPCR         _IC       _OOTR _OTHR        _PEV _SASCON
  1: 0.001030928 0.001030928     NA 0.001030928 0.009278351 0.004123711    NA 0.001030928      NA
  2:          NA 0.066666667     NA          NA          NA          NA    NA          NA      NA
  3:          NA          NA     NA          NA          NA          NA    NA          NA      NA

Something does not seem to be working as it should, because with the smaller data frame of just a few rows I get the following output:

> trans_matrix_complete
  channel_from (start) channel_0 channel_1  channel_3 channel_4 channel_5  channel_7 (conversion)
1      (start)       0 0.2299571 0.1409748 0.08477536 0.1663756 0.2863153 0.09160184           NA
2    channel_0      NA        NA 0.1399532 0.08181362 0.1766773 0.2748871 0.09277229   0.03003179
3    channel_1      NA 0.2025543        NA 0.07164751 0.1547893 0.2656450 0.08301405   0.02707535
4    channel_3      NA 0.1995104 0.1226030         NA 0.1476948 0.2443900 0.07343941   0.03080375
5    channel_4      NA 0.2196648 0.1231438 0.07392872        NA 0.2734408 0.08305049   0.03277471
6    channel_5      NA 0.2355895 0.1392602 0.08586180 0.1793620        NA 0.09703657   0.03550463
7    channel_7      NA 0.2009948 0.1197494 0.07074429 0.1378040 0.2560796         NA   0.03021371
8 (conversion)      NA        NA        NA         NA        NA        NA         NA   1.00000000
9       (null)      NA        NA        NA         NA        NA        NA         NA           NA
     (null)
1        NA
2 0.2038648
3 0.1952746
4 0.1815585
5 0.1939966
6 0.2273852
7 0.1844141
8        NA
9 1.0000000

Where:

a) the row numbering is different: I'm not sure why the row numbers in the first case are followed by a colon (1:, 2:, ...);

b) also, assigning row names to the data frame with

row.names(trans_matrix_complete) <- trans_matrix_complete$channel_from

does not work for the large data frame: despite the row.names() assignment, the data frame prints exactly as in the first output, with no names assigned to the rows.
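Both a) and b) are consistent with the reshaped object being a data.table rather than a data.frame: data.table's print method labels rows as 1:, 2:, ..., and data.table does not set or use row names. A minimal sketch, assuming that is the case and using made-up toy values, that converts the result with as.data.frame() and then assigns the row names:

```r
library(data.table)

# Hypothetical toy data with the same columns as in the post (values are made up)
trans_dt <- data.table(
  channel_from = c("(start)", "(start)", "channel_0"),
  channel_to   = c("channel_0", "channel_1", "(conversion)"),
  transition_probability = c(0.6, 0.4, 0.3)
)

wide_dt <- dcast(trans_dt, channel_from ~ channel_to,
                 value.var = "transition_probability")
print(wide_dt)  # data.table printing: row labels end with a colon

# Convert to a plain data.frame (setDF(wide_dt) would convert it in place instead)
wide_df <- as.data.frame(wide_dt)

# Row names behave as expected on a plain data.frame
row.names(wide_df) <- wide_df$channel_from
wide_df$channel_from <- NULL  # optional: drop the now-redundant column
print(wide_df)  # rows are now labelled "(start)" and "channel_0"

# Look up a single transition probability by row and column name
wide_df["(start)", "channel_1"]
```

Keeping channel_from as an ordinary column of the data.table and subsetting on it is the alternative if the data.table class is needed downstream.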

Any idea about this weird behavior?
