Hello, I was directed here by Hadley to pose a question. I am working through the R for Data Science book and I am a little confused by grouped mutates and filters.
I am working with a dataset called "flights" that has 19 variables and over 300,000 rows of flight data from NYC.
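I enter the following code:
library(nycflights13)  # provides the flights data
library(dplyr)         # provides %>%, group_by(), filter(), n()
popular_dests <- flights %>%
  group_by(dest) %>%
  filter(n() > 365)
popular_dests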
The following output is produced:
Source: local data frame [332,577 x 19]
Groups: dest [77]
year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time arr_delay carrier flight tailnum origin
<int> <int> <int> <int> <int> <dbl> <int> <int> <dbl> <chr> <int> <chr> <chr>
1 2013 1 1 517 515 2 830 819 11 UA 1545 N14228 EWR
2 2013 1 1 533 529 4 850 830 20 UA 1714 N24211 LGA
3 2013 1 1 542 540 2 923 850 33 AA 1141 N619AA JFK
4 2013 1 1 544 545 -1 1004 1022 -18 B6 725 N804JB JFK
5 2013 1 1 554 600 -6 812 837 -25 DL 461 N668DN LGA
6 2013 1 1 554 558 -4 740 728 12 UA 1696 N39463 EWR
7 2013 1 1 555 600 -5 913 854 19 B6 507 N516JB EWR
8 2013 1 1 557 600 -3 709 723 -14 EV 5708 N829AS LGA
9 2013 1 1 557 600 -3 838 846 -8 B6 79 N593JB JFK
10 2013 1 1 558 600 -2 753 745 8 AA 301 N3ALAA LGA
# ... with 332,567 more rows, and 6 more variables: dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>, minute <dbl>,
# time_hour <dttm>
I don't really understand why popular_dests looks almost the same as flights (there were 336,776 rows in flights; there are 332,577 rows in popular_dests).
Is it that the code has taken the original dataset and only removed those flights whose destination has 365 or fewer flights in the year? Does it only temporarily group the data for the sake of the filter, and then ungroup again?
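To check my understanding, here is a minimal sketch on a made-up toy data frame. The names toy and popular_toy, the values in it, and the threshold of 1 are my own inventions for illustration, not from the book. It uses the same group_by() then filter(n() > ...) pattern as the code above, so that the row counts and the grouping can be inspected by hand.

```r
library(dplyr)

# Hypothetical toy data standing in for flights:
# three destinations with 5, 2 and 1 rows respectively
toy <- tibble(
  dest      = c(rep("ATL", 5), rep("BOS", 2), "ANC"),
  dep_delay = c(2, 4, -1, 0, 7, 3, -2, 10)
)

# Same pattern as the flights code, with a threshold of 1 instead of 365
popular_toy <- toy %>%
  group_by(dest) %>%
  filter(n() > 1)

nrow(toy)                # 8 rows before filtering
nrow(popular_toy)        # 7 rows after: only the single ANC row is dropped
group_vars(popular_toy)  # "dest": the result is still grouped by dest

# An explicit ungroup() drops the grouping but keeps the same rows
nrow(ungroup(popular_toy))
```

If this toy case carries over to flights, then n() is evaluated once per destination, whole destinations are either kept or dropped, and the result stays grouped until ungroup() is called, which would match the "Groups: dest [77]" line in my output.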
Thanks in advance.