Dataframe logical expressions using two variables

cook675 · April 17, 2020, 1:48am

df <- data.frame(
  A = c("Y","N","Y","Y","N","Y","Y","N","N","N","Y","Y","N"),
  B = c(1,3,2,3,1,2,2,3,1,3,1,2,3)
 )
df

   A B
1  Y 1
2  N 3
3  Y 2
4  Y 3
5  N 1
6  Y 2
7  Y 2
8  N 3
9  N 1
10 N 3
11 Y 1
12 Y 2
13 N 3

I need to test this dataframe and find out, for a given value of column B (1, 2, or 3), what the sum of all the Y's is and what the sum of all the N's is.

For example: "If B == 1, what is the sum of Y?" and "If B == 1, what is the sum of N?"

What I really need to find out is whether the sum of N or Y is 0 for each value of column B (1, 2, 3).

For instance, when B = 1, the sum of Y's is 2, and the sum of N's is 2.

I intentionally left out an N when B = 2, so the sum of N when B = 2 is 0. How do I ask the dataframe this question?

Thanks, I am lost on this one.

andresrcs · April 17, 2020, 1:53am

Is this what you mean?

library(dplyr)

df <- data.frame(
    A = c("Y","N","Y","Y","N","Y","Y","N","N","N","Y","Y","N"),
    B = c(1,3,2,3,1,2,2,3,1,3,1,2,3)
)

df %>% 
    filter(B == 1) %>% 
    group_by(A) %>% 
    summarise(B = sum(B))
#> # A tibble: 2 x 2
#>   A         B
#>   <fct> <dbl>
#> 1 N         2
#> 2 Y         2

Created on 2020-04-17 by the reprex package (v0.3.0.9001)

cook675 · April 17, 2020, 2:12am

It's not exactly correct because it sums the actual values of B. What I was looking for was a count of the number of rows where a given expression is true. Sorry, I probably used the term "sum" incorrectly here. It actually works for B == 1 because every value being summed is 1, so the sum equals the count, but for B == 2 the result for Y should be 4.

I'm having a hard time describing what I'm after. I'm running a loop where the values of column B will be the loop index, and I want an expression that checks IF the count of N's or Y's is 0 while i = (1, 2, or 3).

So IF the loop is on the second iteration and it finds that there are no N's while i = 2 (which is true in this case), then do xyz.

Does this make sense? Thanks so much for your help.
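The loop described above could be sketched in base R roughly as follows. This is only one possible reading of the question: it assumes "Y" and "N" are the only values in column A, and `missing_level` is a hypothetical name standing in for the unspecified "do xyz" step.

```r
df <- data.frame(
  A = c("Y","N","Y","Y","N","Y","Y","N","N","N","Y","Y","N"),
  B = c(1,3,2,3,1,2,2,3,1,3,1,2,3)
)

# Hypothetical holder for the values of B that lack a "Y" or an "N"
missing_level <- numeric(0)

for (i in sort(unique(df$B))) {
  # Count the rows where B equals the loop index, separately for each value of A
  n_yes <- sum(df$A == "Y" & df$B == i)
  n_no  <- sum(df$A == "N" & df$B == i)

  if (n_yes == 0 || n_no == 0) {
    # Placeholder for "do xyz": here we only record the loop index
    missing_level <- c(missing_level, i)
  }
}

missing_level
#> [1] 2
```

Summing a logical vector counts its `TRUE` entries, which is why `sum()` returns a count here rather than a total of the values in B.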

cook675 · April 17, 2020, 2:49am

I think I thought of an easier way: if I subset the rows of the dataframe where B = 2, then I can count directly, and if the count of N or Y is 0 I can exclude it.

andresrcs · April 17, 2020, 2:51am

Like this then?

library(dplyr)

df <- data.frame(
    A = c("Y","N","Y","Y","N","Y","Y","N","N","N","Y","Y","N"),
    B = c(1,3,2,3,1,2,2,3,1,3,1,2,3)
)

df %>% 
    filter(B == 2) %>% 
    count(A)
#> # A tibble: 1 x 2
#>   A         n
#>   <fct> <int>
#> 1 Y         4

I'm having a hard time following you.
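Note that `count()` on the filtered data simply omits the "N" row rather than reporting a zero. If an explicit 0 is needed for every value of B at once, one possible sketch is a base R cross-tabulation. It assumes the levels of A are known in advance to be "N" and "Y"; `counts` and `b_with_zero` are names chosen here for illustration.

```r
df <- data.frame(
  A = c("Y","N","Y","Y","N","Y","Y","N","N","N","Y","Y","N"),
  B = c(1,3,2,3,1,2,2,3,1,3,1,2,3)
)

# Fixing the levels explicitly keeps an empty A/B combination as a 0 cell
counts <- table(A = factor(df$A, levels = c("N", "Y")), B = df$B)
counts
#>    B
#> A   1 2 3
#>   N 2 0 4
#>   Y 2 4 1

# Values of B for which at least one level of A never occurs
b_with_zero <- colnames(counts)[colSums(counts == 0) > 0]
b_with_zero
#> [1] "2"
```

The column names of a table are character strings, so `b_with_zero` would need `as.numeric()` before being compared with the numeric column B.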

cook675 · April 17, 2020, 3:00am

Yes actually like that!

Can I ask you a question: how would I subset from this df all the rows where B = 2? I'm trying this:

Subset <- df[, which("B" == 2)]

I don't have the syntax quite right, but I know I'm close...

andresrcs · April 17, 2020, 3:04am

library(dplyr)

df <- data.frame(
    A = c("Y","N","Y","Y","N","Y","Y","N","N","N","Y","Y","N"),
    B = c(1,3,2,3,1,2,2,3,1,3,1,2,3)
)

# Using dplyr syntax
df %>% 
    filter(B == 2)
#>   A B
#> 1 Y 2
#> 2 Y 2
#> 3 Y 2
#> 4 Y 2

# Using base R
df[df$B==2,]
#>    A B
#> 3  Y 2
#> 6  Y 2
#> 7  Y 2
#> 12 Y 2
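As a hedged aside on why the attempt in the question returns nothing useful: `"B"` in quotes is a character string rather than the column, and the index was placed in the column position (after the comma) instead of the row position. The sketch below walks through both points; `attempt` and `fixed` are names introduced here for illustration.

```r
df <- data.frame(
  A = c("Y","N","Y","Y","N","Y","Y","N","N","N","Y","Y","N"),
  B = c(1,3,2,3,1,2,2,3,1,3,1,2,3)
)

# The string "B" is compared with 2, not the column named B
"B" == 2
#> [1] FALSE

# which() of a single FALSE gives an empty index
which("B" == 2)
#> integer(0)

# An empty index after the comma selects zero columns
attempt <- df[, which("B" == 2)]
dim(attempt)
#> [1] 13  0

# Referring to the column as df$B and indexing rows (before the comma)
fixed <- df[which(df$B == 2), ]
nrow(fixed)
#> [1] 4
```

Keeping `which()` is optional here. It gives the same rows as `df[df$B == 2, ]` for this data, and differs only when B contains `NA`, since `which()` drops those positions.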

cook675 · April 17, 2020, 3:06am

Oh, thank you so much! I can't tell you how much I appreciate your help.
