Hi,
I am seeking to replicate all the operations using R code since I will be working with a large data. I have provided an example with an data chunk. Briefly, In this, column A contains various symbols, columns labeled "CD" and "CK," represent the primary numeric respectively (Input data
), then in the (Expected data
) these are then floored if value < 1 to 1. The floor function is implemented using the following formulas in MS Excel: [Floor_CD =IF(B2<1,1,B2)] and [Floor_CK=IF(D2<1,1,D2)]. Subsequently, column F is created to capture the difference between CK and CD (DIFF CK-CD), computed as E2-C2. Column G is then ranked based on the differences in column F using the formula [Rank Diff =RANK(F2,$F$2:$F$10)]. Next, column H (FC CK ov CD) is determined using the formula [FC CK ov CD =IF(E2>C2,E2/C2,-1/(E2/C2))]. Following this, column "I" is ranked based on column H using the formula [Rank FC=RANK(H2,$H$2:$H$10)]. Finally, column "J" is calculated as the sum of columns I and G (overall rank) [overall rank =SUM(I2,G2)].
Apologies if the description seems a bit confusing, in summary, I am looking to get my result same as Expected data
, but after I run R code, I see especially the Rank_Diff
and Rank_FC
columns did not rank as I expected and output that I obtained from MS Excel.
Input data:
#> Symbol CD CK
#> 1 ENG 26586.003 150571.165
#> 2 ITGA4 0.200 0.300
#> 3 PTPRK 14.500 1183.333
#> 4 P2RY14 4991.333 17288.500
#> 5 LRP1 4991.333 17288.500
#> 6 GPC1 0.500 85246.833
#> 7 CXCR3 101.000 100.000
#> 8 LILRA6 150.000 0.000
#> 9 LincRNA 0.000 0.000
Expected data:
#> Symbol CD Floor_CD CK Floor_CK DIFF CK-CD Rank Diff
#> 1 ENG 26586.003 26586.003 150571.165 150571.165 123985.162 1
#> 2 ITGA4 0.200 1.000 0.300 1.000 0.000 6
#> 3 PTPRK 14.500 14.500 1183.333 1183.333 1168.833 5
#> 4 P2RY14 4991.333 4991.333 17288.500 17288.500 12297.167 3
#> 5 LRP1 4991.333 4991.333 17288.500 17288.500 12297.167 3
#> 6 GPC1 0.500 1.000 85246.833 85246.833 85245.833 2
#> 7 CXCR3 101.000 101.000 100.000 100.000 -1.000 8
#> 8 LILRA6 150.000 150.000 0.000 1.000 -149.000 9
#> 9 LincRNA 0.000 1.000 0.000 1.000 0.000 6
#> FC CK ov CD Rank FC Overall_Rank
#> 1 5.663550 3 4
#> 2 -1.000000 6 12
#> 3 81.609195 2 7
#> 4 3.463704 4 7
#> 5 3.463704 4 7
#> 6 85246.833300 1 3
#> 7 -1.010000 8 16
#> 8 -150.000000 9 18
#> 9 -1.000000 6 12
R code:
library(dplyr)
library(readxl)
# Read the Excel file into a data frame
df <- read_excel("./LR_Calculations_Template.xlsx", sheet = 2)
# Perform calculations similar to Excel operations
# Column C: Floor_CD
df$Floor_CD <- ifelse(df$CD < 1, 1, df$CD)
# Column E: Floor_CK
df$Floor_CK <- ifelse(df$CK < 1, 1, df$CK)
# Column F: DIFF CK-CD
df$DIFF_CK_CD <- df$Floor_CK - df$Floor_CD
# Column G: Rank Diff
df$Rank_Diff <- rank(df$DIFF_CK_CD, ties.method = "min")
# Column H: FC CK ov CD
df$FC_CK_ov_CD <- ifelse(df$Floor_CK > df$Floor_CD, df$Floor_CK / df$Floor_CD, -1 / (df$Floor_CK / df$Floor_CD))
# Column I: Rank FC
df$Rank_FC <- rank(df$FC_CK_ov_CD, ties.method = "min")
# Column J: Overall Rank
df$Overall_Rank <- df$Rank_Diff + df$Rank_FC
# Print the resulting data frame
print(df)
Output from R Code:
#> Symbol CD CK Floor_CD Floor_CK DIFF_CK_CD Rank_Diff
#> 1 ENG 26586.003 150571.165 26586.003 150571.165 123985.162 9
#> 2 ITGA4 0.200 0.300 1.000 1.000 0.000 3
#> 3 PTPRK 14.500 1183.333 14.500 1183.333 1168.833 5
#> 4 P2RY14 4991.333 17288.500 4991.333 17288.500 12297.167 6
#> 5 LRP1 4991.333 17288.500 4991.333 17288.500 12297.167 6
#> 6 GPC1 0.500 85246.833 1.000 85246.833 85245.833 8
#> 7 CXCR3 101.000 100.000 101.000 100.000 -1.000 2
#> 8 LILRA6 150.000 0.000 150.000 1.000 -149.000 1
#> 9 LincRNA 0.000 0.000 1.000 1.000 0.000 3
#> FC_CK_ov_CD Rank_FC Overall_Rank
#> 1 5.663550 7 16
#> 2 -1.000000 3 6
#> 3 81.609195 8 13
#> 4 3.463704 5 11
#> 5 3.463704 5 11
#> 6 85246.833300 9 17
#> 7 -1.010000 2 4
#> 8 -150.000000 1 2
#> 9 -1.000000 3 6
Created on 2023-12-26 with reprex v2.0.2
Best Regards,
Toufiq