Creating COUNTIF equivalent in R

Forrestgump · September 20, 2019, 1:55pm

Hi there,

I have 2 columns in my data frame and I am trying to create a third column called 'ManagerCount'. In excel I can calculate using the Countif function using the column as the range and the cell as the criteria. I am to achieve this in rstudio however. Example below:-

Employee ID

EmployeeID	     ManagerID	  ManagerCount
80001	         NA	          1
81928	         54629	      1
76524	         54629   	  0
54629	         80001	      2
14973	         81928	      0

Any help would be greatly appreciated.

Regards,

Forrestgump

FJCC · September 20, 2019, 2:00pm

Perhaps I am just slow this morning but I do not understand the logic of the ManagerCount column. What determines the values of 1, 0, and 2 that appear?

Forrestgump · September 20, 2019, 2:12pm

Hi, It counts how many times the code in the 'EmployeeID' column appears in the 'ManagerID' column. So 80001 appears once in the ManagerID column, 54629 appears 2 times, 76524 and 14973 do not appear i.e. 0.

Regards,

Forrestgump

FJCC · September 20, 2019, 2:56pm

Getting the result of how many times each ID appears in the ManagerID column is easy. Making that a new column in the original data frame is a little clunky because the counts of zero appear as NA after the left_join. It may be that the data frame DF2 is all you need or it may be there is a more elegant solution.

suppressPackageStartupMessages(library(dplyr))

DF <- data.frame(EmployeeID = c("80001", "81928", "76524", "54629", "14973"),
                 ManagerID = c(NA, "54629", "54629", "80001", "81928"))

DF2 <- DF %>% group_by(ManagerID) %>% summarize(ManagerCount = n())

DF <- left_join(DF, DF2, by = c("EmployeeID" = "ManagerID")) %>% 
  mutate(ManagerCount = ifelse(is.na(ManagerCount), 0, ManagerCount))
#> Warning: Column `EmployeeID`/`ManagerID` joining factors with different
#> levels, coercing to character vector
DF
#>   EmployeeID ManagerID ManagerCount
#> 1      80001      <NA>            1
#> 2      81928     54629            1
#> 3      76524     54629            0
#> 4      54629     80001            2
#> 5      14973     81928            0

^{Created on 2019-09-20 by the reprex package (v0.2.1)}

Forrestgump · September 21, 2019, 8:39pm

I am using the code below to count how many time the id appears in the manager_id column :-

  mutate(ManagerCount = map_int(.x = WpA_Rscript$id,
                                        .f = ~ sum(WpA_Rscript$manager_id == .x,
                                                   na.rm = TRUE)))

However when I use this code i get the following error:-

Error in Ops.factor(WpA_Rscript$manager_gpid, .x) : 
  level sets of factors are different

Can anyone help me to understand what the issue is here?

KInd regards,
Forrestgump

jcblum · September 21, 2019, 8:51pm

Hi @Forrestgump! Welcome!

Welcome! I'm afraid you'll need to supply some more info in order for helpers to be able to understand your problem.

Can you please try to compose a small, self-contained reproducible example that illustrates your problem? (follow that link for instructions and explanations) A reprex makes it much easier for others to understand your issue and figure out how to help.

Since you're new here, it might also be helpful to know how to properly format code and console output that you post. Using proper code formatting makes the site easier to read, prevents confusion (unformatted code can get garbled by the forum software ), and is generally considered the polite thing to do. Check out this FAQ to find out how — it's as easy as the click of a button! :

FJCC · September 21, 2019, 8:59pm

Since you are showing only a part of your code and none of your data, this is something of a guess. The two columns you are comparing in the sum() function, id and manager_id, are factors but they do not have the same levels.
(The error message mentions the column manager_gpid. Is there a typo somewhere?)

Judging from your previous post, this makes sense, since not all employees are managers. You can fix this by making the two columns be of the type character.

WpA_Rscript %>%
  mutate(id = as.character(id), 
              manager_id = as.character(manager_id)) %>%
  mutate(ManagerCount = map_int(.x = id,
                                        .f = ~ sum(manager_id == .x,
                                                   na.rm = TRUE)))

system · October 12, 2019, 8:39pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.