Issues in ranking a numeric column in R dataframe

Hi,

I am seeking to replicate all the operations using R code since I will be working with a large data. I have provided an example with an data chunk. Briefly, In this, column A contains various symbols, columns labeled "CD" and "CK," represent the primary numeric respectively (Input data), then in the (Expected data) these are then floored if value < 1 to 1. The floor function is implemented using the following formulas in MS Excel: [Floor_CD =IF(B2<1,1,B2)] and [Floor_CK=IF(D2<1,1,D2)]. Subsequently, column F is created to capture the difference between CK and CD (DIFF CK-CD), computed as E2-C2. Column G is then ranked based on the differences in column F using the formula [Rank Diff =RANK(F2,$F$2:$F$10)]. Next, column H (FC CK ov CD) is determined using the formula [FC CK ov CD =IF(E2>C2,E2/C2,-1/(E2/C2))]. Following this, column "I" is ranked based on column H using the formula [Rank FC=RANK(H2,$H$2:$H$10)]. Finally, column "J" is calculated as the sum of columns I and G (overall rank) [overall rank =SUM(I2,G2)].

Apologies if the description seems a bit confusing, in summary, I am looking to get my result same as Expected data, but after I run R code, I see especially the Rank_Diff and Rank_FC columns did not rank as I expected and output that I obtained from MS Excel.

Input data:

#>    Symbol        CD         CK
#> 1     ENG 26586.003 150571.165
#> 2   ITGA4     0.200      0.300
#> 3   PTPRK    14.500   1183.333
#> 4  P2RY14  4991.333  17288.500
#> 5    LRP1  4991.333  17288.500
#> 6    GPC1     0.500  85246.833
#> 7   CXCR3   101.000    100.000
#> 8  LILRA6   150.000      0.000
#> 9 LincRNA     0.000      0.000

Expected data:

#>    Symbol        CD  Floor_CD         CK   Floor_CK DIFF CK-CD Rank Diff
#> 1     ENG 26586.003 26586.003 150571.165 150571.165 123985.162         1
#> 2   ITGA4     0.200     1.000      0.300      1.000      0.000         6
#> 3   PTPRK    14.500    14.500   1183.333   1183.333   1168.833         5
#> 4  P2RY14  4991.333  4991.333  17288.500  17288.500  12297.167         3
#> 5    LRP1  4991.333  4991.333  17288.500  17288.500  12297.167         3
#> 6    GPC1     0.500     1.000  85246.833  85246.833  85245.833         2
#> 7   CXCR3   101.000   101.000    100.000    100.000     -1.000         8
#> 8  LILRA6   150.000   150.000      0.000      1.000   -149.000         9
#> 9 LincRNA     0.000     1.000      0.000      1.000      0.000         6
#>    FC CK ov CD Rank FC Overall_Rank
#> 1     5.663550       3            4
#> 2    -1.000000       6           12
#> 3    81.609195       2            7
#> 4     3.463704       4            7
#> 5     3.463704       4            7
#> 6 85246.833300       1            3
#> 7    -1.010000       8           16
#> 8  -150.000000       9           18
#> 9    -1.000000       6           12

R code:


library(dplyr)
library(readxl)

# Read the Excel file into a data frame
df <- read_excel("./LR_Calculations_Template.xlsx", sheet = 2)

# Perform calculations similar to Excel operations
# Column C: Floor_CD
df$Floor_CD <- ifelse(df$CD < 1, 1, df$CD)


# Column E: Floor_CK
df$Floor_CK <- ifelse(df$CK < 1, 1, df$CK)

# Column F: DIFF CK-CD
df$DIFF_CK_CD <- df$Floor_CK - df$Floor_CD

# Column G: Rank Diff
df$Rank_Diff <- rank(df$DIFF_CK_CD, ties.method = "min")

# Column H: FC CK ov CD
df$FC_CK_ov_CD <- ifelse(df$Floor_CK > df$Floor_CD, df$Floor_CK / df$Floor_CD, -1 / (df$Floor_CK / df$Floor_CD))

# Column I: Rank FC
df$Rank_FC <- rank(df$FC_CK_ov_CD, ties.method = "min")

# Column J: Overall Rank
df$Overall_Rank <- df$Rank_Diff + df$Rank_FC

# Print the resulting data frame
print(df)

Output from R Code:

#>    Symbol        CD         CK  Floor_CD   Floor_CK DIFF_CK_CD Rank_Diff
#> 1     ENG 26586.003 150571.165 26586.003 150571.165 123985.162         9
#> 2   ITGA4     0.200      0.300     1.000      1.000      0.000         3
#> 3   PTPRK    14.500   1183.333    14.500   1183.333   1168.833         5
#> 4  P2RY14  4991.333  17288.500  4991.333  17288.500  12297.167         6
#> 5    LRP1  4991.333  17288.500  4991.333  17288.500  12297.167         6
#> 6    GPC1     0.500  85246.833     1.000  85246.833  85245.833         8
#> 7   CXCR3   101.000    100.000   101.000    100.000     -1.000         2
#> 8  LILRA6   150.000      0.000   150.000      1.000   -149.000         1
#> 9 LincRNA     0.000      0.000     1.000      1.000      0.000         3
#>    FC_CK_ov_CD Rank_FC Overall_Rank
#> 1     5.663550       7           16
#> 2    -1.000000       3            6
#> 3    81.609195       8           13
#> 4     3.463704       5           11
#> 5     3.463704       5           11
#> 6 85246.833300       9           17
#> 7    -1.010000       2            4
#> 8  -150.000000       1            2
#> 9    -1.000000       3            6

Created on 2023-12-26 with reprex v2.0.2

Best Regards,
Toufiq

Coded as min, shown as max

Try using the min_rank() function from dplyr.

library(dplyr)

DF <- read.csv("~/R/Play/Dummy.csv")
DF
#>    Symbol        CD         CK
#> 1     ENG 26586.003 150571.165
#> 2   ITGA4     0.200      0.300
#> 3   PTPRK    14.500   1183.333
#> 4  P2RY14  4991.333  17288.500
#> 5    LRP1  4991.333  17288.500
#> 6    GPC1     0.500  85246.833
#> 7   CXCR3   101.000    100.000
#> 8  LILRA6   150.000      0.000
#> 9 LincRNA     0.000      0.000
DF |> mutate(CD_Flr = ifelse(CD < 1, 1, CD),
             CK_Flr = ifelse(CK < 1, 1, CK),
             Diff_CK_CD = CK_Flr - CD_Flr,
             Rank_Diff = min_rank(desc(Diff_CK_CD)),
             CK_ov_CD = ifelse(CK_Flr > CD_Flr, CK_Flr/CD_Flr, -1/(CK_Flr/CD_Flr)),
             Rank_FC = min_rank(desc(CK_ov_CD)),
             Overall_rank = Rank_Diff + Rank_FC) 
#>    Symbol        CD         CK    CD_Flr     CK_Flr Diff_CK_CD Rank_Diff
#> 1     ENG 26586.003 150571.165 26586.003 150571.165 123985.162         1
#> 2   ITGA4     0.200      0.300     1.000      1.000      0.000         6
#> 3   PTPRK    14.500   1183.333    14.500   1183.333   1168.833         5
#> 4  P2RY14  4991.333  17288.500  4991.333  17288.500  12297.167         3
#> 5    LRP1  4991.333  17288.500  4991.333  17288.500  12297.167         3
#> 6    GPC1     0.500  85246.833     1.000  85246.833  85245.833         2
#> 7   CXCR3   101.000    100.000   101.000    100.000     -1.000         8
#> 8  LILRA6   150.000      0.000   150.000      1.000   -149.000         9
#> 9 LincRNA     0.000      0.000     1.000      1.000      0.000         6
#>       CK_ov_CD Rank_FC Overall_rank
#> 1     5.663550       3            4
#> 2    -1.000000       6           12
#> 3    81.609172       2            7
#> 4     3.463704       4            7
#> 5     3.463704       4            7
#> 6 85246.833000       1            3
#> 7    -1.010000       8           16
#> 8  -150.000000       9           18
#> 9    -1.000000       6           12

Created on 2023-12-26 with reprex v2.0.2

1 Like

@FJCC Perfect, thank you very much.

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.