I've only recently started coding and am totally stuck!
I have a large list (Large.List.Df) that consists of 50+ arrays (each with 1000+ rows and 5+ columns). These arrays are all listed in double square brackets (e.g. [[A]] ) in a drop down menu when you open the dataframe Large.List.Df
I would like to use group_by() on name.of.column in each of the 50+ arrays so that I can mutate(name.of.new.column = 1:n()). I have used this combination of group_by(name.of.column) and mutate(name.of.new.column = 1:n()) on a normal dataframe (so just one element of the large list) and it works perfectly. But, if I run:
Are you looking for something like the following? If not, please post an example of your data as a Reproducible Example.
library(purrr)
#> Warning: package 'purrr' was built under R version 3.5.3
library(dplyr)
LIST <- list(A = data.frame(D = 1:6, B = rep(LETTERS[1:3], 2)),
             C = data.frame(E = 2:7, B = rep(LETTERS[1:3], 2)))
LIST
#> $A
#> D B
#> 1 1 A
#> 2 2 B
#> 3 3 C
#> 4 4 A
#> 5 5 B
#> 6 6 C
#>
#> $C
#> E B
#> 1 2 A
#> 2 3 B
#> 3 4 C
#> 4 5 A
#> 5 6 B
#> 6 7 C
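# Number the rows within each level of B, then sort by B
MyFunc <- function(DF) {
  DF %>%
    group_by(B) %>%              # one group per distinct value of B
    mutate(NewCol = 1:n()) %>%   # running index within each group
    arrange(B)                   # sort so each group's rows sit together
}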
LIST2 <- map(LIST, MyFunc)
LIST2
#> $A
#> # A tibble: 6 x 3
#> # Groups: B [3]
#> D B NewCol
#> <int> <fct> <int>
#> 1 1 A 1
#> 2 4 A 2
#> 3 2 B 1
#> 4 5 B 2
#> 5 3 C 1
#> 6 6 C 2
#>
#> $C
#> # A tibble: 6 x 3
#> # Groups: B [3]
#> E B NewCol
#> <int> <fct> <int>
#> 1 2 A 1
#> 2 5 A 2
#> 3 3 B 1
#> 4 6 B 2
#> 5 4 C 1
#> 6 7 C 2
This was exactly what I needed - thank you so very much @FJCC !!! Please could you explain what the function(DF) does? I'm a total newbie and haven't actually written any functions yet.
The above part of my code defines a new function that takes one argument named DF. It passes DF through the steps within the braces: grouping by B, adding NewCol with mutate(), and sorting by B. It then returns the result of that process. It would have been clearer if I had written
After running that code, I can pass a data frame that has a column named B into MyFunc and get back a data frame with the additional NewCol. Below is an example of MyFunc acting on the first element of the LIST I defined in my previous post.
library(dplyr)
LIST <- list(A = data.frame(D = 1:6, B = rep(LETTERS[1:3], 2)),
             C = data.frame(E = 2:7, B = rep(LETTERS[1:3], 2)))
LIST
#> $A
#> D B
#> 1 1 A
#> 2 2 B
#> 3 3 C
#> 4 4 A
#> 5 5 B
#> 6 6 C
#>
#> $C
#> E B
#> 1 2 A
#> 2 3 B
#> 3 4 C
#> 4 5 A
#> 5 6 B
#> 6 7 C
MyFunc <- function(DF) {
  DF %>%
    group_by(B) %>%
    mutate(NewCol = 1:n()) %>%
    arrange(B)
}
subList <- MyFunc(LIST[[1]])
subList
#> # A tibble: 6 x 3
#> # Groups: B [3]
#> D B NewCol
#> <int> <fct> <int>
#> 1 1 A 1
#> 2 4 A 2
#> 3 2 B 1
#> 4 5 B 2
#> 5 3 C 1
#> 6 6 C 2
MyFunc is no different than a standard R function like mean() that returns the average of whatever is passed to it, except that MyFunc is very simple, with no error handling or flexibility.
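To give an idea of what "error handling and flexibility" could look like, here is a rough sketch of a variant. The name MyFunc2 and the group_col argument are made up for illustration, and the code assumes dplyr 1.0 or later (for across() and all_of()). Unlike MyFunc, it also drops the grouping at the end, so it returns an ungrouped tibble.

```r
library(dplyr)

# Hypothetical variant of MyFunc: the grouping column is passed as a string,
# and the inputs are checked before any work is done.
MyFunc2 <- function(DF, group_col = "B") {
  if (!is.data.frame(DF)) {
    stop("DF must be a data frame")
  }
  if (!group_col %in% names(DF)) {
    stop("Column '", group_col, "' not found in DF")
  }
  DF %>%
    group_by(across(all_of(group_col))) %>%  # group by the named column
    mutate(NewCol = row_number()) %>%        # same idea as 1:n()
    ungroup() %>%                            # drop the grouping (differs from MyFunc)
    arrange(.data[[group_col]])              # sort by the named column
}

DF1 <- data.frame(D = 1:6, B = rep(LETTERS[1:3], 2))
res <- MyFunc2(DF1, group_col = "B")
res
```

Calling MyFunc2(DF1, group_col = "Z") should stop with a message about the missing column instead of failing somewhere inside group_by().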
I coupled MyFunc with map(). map() applies the function given as its second argument to each element of the list given as its first argument, and returns the results as a list.
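If it helps to see what map() is doing, here is a small sketch that compares it with an explicit for loop. The names small_list and add_id are invented for this illustration and are not part of the code above.

```r
library(purrr)

# A small illustrative list of data frames (hypothetical data)
small_list <- list(A = data.frame(x = 1:3),
                   C = data.frame(x = 4:5))

# A simple function that takes one data frame and returns a modified copy
add_id <- function(DF) {
  DF$id <- seq_len(nrow(DF))
  DF
}

# map() calls add_id once per element and collects the results in a list
via_map <- map(small_list, add_id)

# Roughly the same thing written as a loop
via_loop <- vector("list", length(small_list))
names(via_loop) <- names(small_list)
for (i in seq_along(small_list)) {
  via_loop[[i]] <- add_id(small_list[[i]])
}

# Compare the two results
identical(via_map, via_loop)
```

Base R's lapply(small_list, add_id) does the same job, so map() is mostly a matter of consistency with the rest of the tidyverse.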
The call