Matching columns and rows

jhodzic · January 15, 2019, 6:32pm

I have two data frames, one data frame has columns of four letter codes "ABCD, etc." that correspond to specific species codes, with the input value being how many of these species were present in a given plot observation.

The other data frame has these exact codes but as rows. However, one of the columns specifies a property (big, small, medium) of the species code.

My problem is I need to fig
ure out how many four letter codes that are "big" are in the first matrix.
So I need to somehow match the columns of one data frame with the rows of another, so I can know on average how many "big species" each plot had.

(see screenshot for a simplified example; the two data frames are much larger and have other columns as well)

I've been battling at this for awhile, trying to use merge, trying to use the pipe operator, etc. but can't seem to figure it out. There is probably some really elegant way, but I'm new to this. Any advice is much appreciated.

andresrcs · January 15, 2019, 6:56pm

For future reference this is the correct way of sharing sample data (pasting screen captures is not a good thing to do here).

I think this is what you want

library(dplyr)
library(tidyr)

table_1 <- data.frame(stringsAsFactors = FALSE,
                      Plot = c('Plot1', 'Plot2', 'Plot3'),
                      ABCD = c(1, 3, 2),
                      EFGH = c(2, 3, 0),
                      IJKL = c(2, 3, 1),
                      MNOP = c(3, 2, 1)
)
table_2 <- data.frame(stringsAsFactors = FALSE,
                      Code = c('ABCD', 'EFGH', 'IJKL', 'MNOP'),
                      Size = c('Big', 'Small', 'Medium', 'Big')
)

table_1 %>% 
    gather(Code, Count, ABCD:MNOP) %>% 
    left_join(table_2, by = 'Code') %>% 
    group_by(Plot, Size) %>% 
    summarise(n = sum(Count))
#> # A tibble: 9 x 3
#> # Groups:   Plot [?]
#>   Plot  Size       n
#>   <chr> <chr>  <dbl>
#> 1 Plot1 Big        4
#> 2 Plot1 Medium     2
#> 3 Plot1 Small      2
#> 4 Plot2 Big        5
#> 5 Plot2 Medium     3
#> 6 Plot2 Small      3
#> 7 Plot3 Big        3
#> 8 Plot3 Medium     1
#> 9 Plot3 Small      0

^{Created on 2019-01-15 by the reprex package (v0.2.1)}

jhodzic · January 15, 2019, 7:06pm

thanks for your help and for the comment about best practices; I'll make sure to do that next time. I'll try your code.

jhodzic · January 16, 2019, 4:20am

I understand your thinking here but I'm hoping to streamline it a little bit if possible. I have 67 Plots, and 100s of species, so manually entering everything would be very hard.

andresrcs · January 16, 2019, 4:22am

What exactly you would be entering manually? I don't understand

jhodzic · January 16, 2019, 4:24am

table_1 <- data.frame(stringsAsFactors = FALSE,
                      Plot = c('Plot1', 'Plot2', 'Plot3'),
                      ABCD = c(1, 3, 2),
                      EFGH = c(2, 3, 0),
                      IJKL = c(2, 3, 1),
                      MNOP = c(3, 2, 1)

The values for the sizes

andresrcs · January 16, 2019, 4:26am

That is just the data you gave as example, the code below that, is what you have to apply to your real data

table_1 %>% 
    gather(Code, Count, -Plot) %>% 
    left_join(table_2, by = 'Code') %>% 
    group_by(Plot, Size) %>% 
    summarise(n = sum(Count))

jhodzic · January 16, 2019, 6:47am

Ah, duh. Thanks ; I didn't know about the gather() function.

cderv · January 16, 2019, 10:04pm

If your question's been answered (even by you!), would you mind choosing a solution? It helps other people see which questions still need help, or find solutions if they have similar problems. Here’s how to do it:

system · January 23, 2019, 10:04pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.