I have data that is based on village name, incidence rates and maize yield (high, medium, low) for each village.
I'm trying to group this into maize yield and show the average incidence rates for the different yields (high, medium, low).
I'm very new to this and have been trying for a while but don't understand what I'm doing wrong, any tips or explanation on how to do this?
See the example below for how to do this with a built-in dataset in R.
I think you'd benefit from reading up about {dplyr}
!
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
# first lets look at some data
# note that there's three columns:
# * len (numeric)
# * dose (numeric, but distribution seems low)
# * supp (categorical)
summary(ToothGrowth)
#> len supp dose
#> Min. : 4.20 OJ:30 Min. :0.500
#> 1st Qu.:13.07 VC:30 1st Qu.:0.500
#> Median :19.25 Median :1.000
#> Mean :18.81 Mean :1.167
#> 3rd Qu.:25.27 3rd Qu.:2.000
#> Max. :33.90 Max. :2.000
# how might we get the average "len" within the
# different categories?
ToothGrowth %>%
group_by(supp) %>%
summarise(len = mean(len, na.rm = T))
#> # A tibble: 2 × 2
#> supp len
#> <fct> <dbl>
#> 1 OJ 20.7
#> 2 VC 17.0
ToothGrowth %>%
group_by(dose) %>%
summarise(len = mean(len, na.rm = T))
#> # A tibble: 3 × 2
#> dose len
#> <dbl> <dbl>
#> 1 0.5 10.6
#> 2 1 19.7
#> 3 2 26.1
ToothGrowth %>%
group_by(supp, dose) %>%
summarise(len = mean(len, na.rm = T))
#> `summarise()` has grouped output by 'supp'. You can override using the
#> `.groups` argument.
#> # A tibble: 6 × 3
#> # Groups: supp [2]
#> supp dose len
#> <fct> <dbl> <dbl>
#> 1 OJ 0.5 13.2
#> 2 OJ 1 22.7
#> 3 OJ 2 26.1
#> 4 VC 0.5 7.98
#> 5 VC 1 16.8
#> 6 VC 2 26.1
Created on 2022-08-20 by the reprex package (v2.0.1)
look at this example :
lets say we have the below data frame :
df <- data.frame(team = c('A', 'A', 'A', 'B', 'B', 'B', 'B', 'B', 'C', 'C', 'C', 'C'),
position = c('G', 'G', 'F', 'G', 'F', 'F', 'F', 'G', 'G', 'F', 'F', 'F'),
points = c(4, 13, 7, 8, 15, 15, 17, 9, 21, 22, 25, 31))
df
we will use this command :
df %>% group_by(team) %>% summarise(total=sum(points))
will group the column ( Team ) to 3 values (A,B,C) then after it will summarize the total of points :
output :
team total
A 24
B 64
C 99
you can notice how , its grouped team and showed the total summary of the points .
I hope its clear .
if you need any further clarification please let me know .
Kind Regards
Using R
benefits from applying a functional programming approach—f(x) - y, just like school algebra.
x is what is at hand, in this case a data frame with three variables, one of which is numeric and two of which are character. This organization is tidy
because each row represents one unique observation.
y is the desired object, which presents a summary statistic (in this case mean
) of the numeric variable stratified by yield.
f is the function to turn x into y. Just as x and y are composite objects, f may consist of the application of one function to one or more other functions. "Functions are first-class objects." Just like f(g(x)=y.
There's usually more than one function(s) available. @JackDavison has illustrated the use of dplyr
to do this. dplyr
is part of a suite of packages that are very popular on this site and there is a wealth of examples to find using the search tool. dplyr
has a tendency to pose solutions in a procedural
form—do this, then do that. That can be helpful but can result in a sense that each solution is unique to its problem. The focus may shift to how
and the syntax required may multiply.
Here are three ways to create y
my_cars <- mtcars
# convert the grouping variable to a factor; if the cyl variable had
# been "four," "six" and "eight", it would work
# the same way
my_cars$cyl <- as.factor(my_cars$cyl)
# using {stats} (loaded by default)
aggregate(my_cars$hp, by = list(my_cars$cyl), FUN=mean)
#> Group.1 x
#> 1 4 82.63636
#> 2 6 122.28571
#> 3 8 209.21429
# using data.table
library(data.table)
DT <- data.table(my_cars)
DT[,mean(hp), by = .(cyl)]
#> cyl V1
#> 1: 6 122.28571
#> 2: 4 82.63636
#> 3: 8 209.21429
# using gt (for when purpose of creating output is display)
library(gt)
output <- DT[,.(cyl,mpg)] |>
gt(groupname_col = "cyl") |>
summary_rows(
groups = TRUE,
columns = mpg,
fns = list(
average = "mean",
total = "sum",
SD = "sd")
)
output
#nqnhbqygif .gt_table {
display: table;
border-collapse: collapse;
margin-left: auto;
margin-right: auto;
color: #333333;
font-size: 16px;
font-weight: normal;
font-style: normal;
background-color: #FFFFFF;
width: auto;
border-top-style: solid;
border-top-width: 2px;
border-top-color: #A8A8A8;
border-right-style: none;
border-right-width: 2px;
border-right-color: #D3D3D3;
border-bottom-style: solid;
border-bottom-width: 2px;
border-bottom-color: #A8A8A8;
border-left-style: none;
border-left-width: 2px;
border-left-color: #D3D3D3;
}
#nqnhbqygif .gt_heading {
background-color: #FFFFFF;
text-align: center;
border-bottom-color: #FFFFFF;
border-left-style: none;
border-left-width: 1px;
border-left-color: #D3D3D3;
border-right-style: none;
border-right-width: 1px;
border-right-color: #D3D3D3;
}
#nqnhbqygif .gt_title {
color: #333333;
font-size: 125%;
font-weight: initial;
padding-top: 4px;
padding-bottom: 4px;
padding-left: 5px;
padding-right: 5px;
border-bottom-color: #FFFFFF;
border-bottom-width: 0;
}
#nqnhbqygif .gt_subtitle {
color: #333333;
font-size: 85%;
font-weight: initial;
padding-top: 0;
padding-bottom: 6px;
padding-left: 5px;
padding-right: 5px;
border-top-color: #FFFFFF;
border-top-width: 0;
}
#nqnhbqygif .gt_bottom_border {
border-bottom-style: solid;
border-bottom-width: 2px;
border-bottom-color: #D3D3D3;
}
#nqnhbqygif .gt_col_headings {
border-top-style: solid;
border-top-width: 2px;
border-top-color: #D3D3D3;
border-bottom-style: solid;
border-bottom-width: 2px;
border-bottom-color: #D3D3D3;
border-left-style: none;
border-left-width: 1px;
border-left-color: #D3D3D3;
border-right-style: none;
border-right-width: 1px;
border-right-color: #D3D3D3;
}
#nqnhbqygif .gt_col_heading {
color: #333333;
background-color: #FFFFFF;
font-size: 100%;
font-weight: normal;
text-transform: inherit;
border-left-style: none;
border-left-width: 1px;
border-left-color: #D3D3D3;
border-right-style: none;
border-right-width: 1px;
border-right-color: #D3D3D3;
vertical-align: bottom;
padding-top: 5px;
padding-bottom: 6px;
padding-left: 5px;
padding-right: 5px;
overflow-x: hidden;
}
#nqnhbqygif .gt_column_spanner_outer {
color: #333333;
background-color: #FFFFFF;
font-size: 100%;
font-weight: normal;
text-transform: inherit;
padding-top: 0;
padding-bottom: 0;
padding-left: 4px;
padding-right: 4px;
}
#nqnhbqygif .gt_column_spanner_outer:first-child {
padding-left: 0;
}
#nqnhbqygif .gt_column_spanner_outer:last-child {
padding-right: 0;
}
#nqnhbqygif .gt_column_spanner {
border-bottom-style: solid;
border-bottom-width: 2px;
border-bottom-color: #D3D3D3;
vertical-align: bottom;
padding-top: 5px;
padding-bottom: 5px;
overflow-x: hidden;
display: inline-block;
width: 100%;
}
#nqnhbqygif .gt_group_heading {
padding-top: 8px;
padding-bottom: 8px;
padding-left: 5px;
padding-right: 5px;
color: #333333;
background-color: #FFFFFF;
font-size: 100%;
font-weight: initial;
text-transform: inherit;
border-top-style: solid;
border-top-width: 2px;
border-top-color: #D3D3D3;
border-bottom-style: solid;
border-bottom-width: 2px;
border-bottom-color: #D3D3D3;
border-left-style: none;
border-left-width: 1px;
border-left-color: #D3D3D3;
border-right-style: none;
border-right-width: 1px;
border-right-color: #D3D3D3;
vertical-align: middle;
}
#nqnhbqygif .gt_empty_group_heading {
padding: 0.5px;
color: #333333;
background-color: #FFFFFF;
font-size: 100%;
font-weight: initial;
border-top-style: solid;
border-top-width: 2px;
border-top-color: #D3D3D3;
border-bottom-style: solid;
border-bottom-width: 2px;
border-bottom-color: #D3D3D3;
vertical-align: middle;
}
#nqnhbqygif .gt_from_md > :first-child {
margin-top: 0;
}
#nqnhbqygif .gt_from_md > :last-child {
margin-bottom: 0;
}
#nqnhbqygif .gt_row {
padding-top: 8px;
padding-bottom: 8px;
padding-left: 5px;
padding-right: 5px;
margin: 10px;
border-top-style: solid;
border-top-width: 1px;
border-top-color: #D3D3D3;
border-left-style: none;
border-left-width: 1px;
border-left-color: #D3D3D3;
border-right-style: none;
border-right-width: 1px;
border-right-color: #D3D3D3;
vertical-align: middle;
overflow-x: hidden;
}
#nqnhbqygif .gt_stub {
color: #333333;
background-color: #FFFFFF;
font-size: 100%;
font-weight: initial;
text-transform: inherit;
border-right-style: solid;
border-right-width: 2px;
border-right-color: #D3D3D3;
padding-left: 5px;
padding-right: 5px;
}
#nqnhbqygif .gt_stub_row_group {
color: #333333;
background-color: #FFFFFF;
font-size: 100%;
font-weight: initial;
text-transform: inherit;
border-right-style: solid;
border-right-width: 2px;
border-right-color: #D3D3D3;
padding-left: 5px;
padding-right: 5px;
vertical-align: top;
}
#nqnhbqygif .gt_row_group_first td {
border-top-width: 2px;
}
#nqnhbqygif .gt_summary_row {
color: #333333;
background-color: #FFFFFF;
text-transform: inherit;
padding-top: 8px;
padding-bottom: 8px;
padding-left: 5px;
padding-right: 5px;
}
#nqnhbqygif .gt_first_summary_row {
border-top-style: solid;
border-top-color: #D3D3D3;
}
#nqnhbqygif .gt_first_summary_row.thick {
border-top-width: 2px;
}
#nqnhbqygif .gt_last_summary_row {
padding-top: 8px;
padding-bottom: 8px;
padding-left: 5px;
padding-right: 5px;
border-bottom-style: solid;
border-bottom-width: 2px;
border-bottom-color: #D3D3D3;
}
#nqnhbqygif .gt_grand_summary_row {
color: #333333;
background-color: #FFFFFF;
text-transform: inherit;
padding-top: 8px;
padding-bottom: 8px;
padding-left: 5px;
padding-right: 5px;
}
#nqnhbqygif .gt_first_grand_summary_row {
padding-top: 8px;
padding-bottom: 8px;
padding-left: 5px;
padding-right: 5px;
border-top-style: double;
border-top-width: 6px;
border-top-color: #D3D3D3;
}
#nqnhbqygif .gt_striped {
background-color: rgba(128, 128, 128, 0.05);
}
#nqnhbqygif .gt_table_body {
border-top-style: solid;
border-top-width: 2px;
border-top-color: #D3D3D3;
border-bottom-style: solid;
border-bottom-width: 2px;
border-bottom-color: #D3D3D3;
}
#nqnhbqygif .gt_footnotes {
color: #333333;
background-color: #FFFFFF;
border-bottom-style: none;
border-bottom-width: 2px;
border-bottom-color: #D3D3D3;
border-left-style: none;
border-left-width: 2px;
border-left-color: #D3D3D3;
border-right-style: none;
border-right-width: 2px;
border-right-color: #D3D3D3;
}
#nqnhbqygif .gt_footnote {
margin: 0px;
font-size: 90%;
padding-left: 4px;
padding-right: 4px;
padding-left: 5px;
padding-right: 5px;
}
#nqnhbqygif .gt_sourcenotes {
color: #333333;
background-color: #FFFFFF;
border-bottom-style: none;
border-bottom-width: 2px;
border-bottom-color: #D3D3D3;
border-left-style: none;
border-left-width: 2px;
border-left-color: #D3D3D3;
border-right-style: none;
border-right-width: 2px;
border-right-color: #D3D3D3;
}
#nqnhbqygif .gt_sourcenote {
font-size: 90%;
padding-top: 4px;
padding-bottom: 4px;
padding-left: 5px;
padding-right: 5px;
}
#nqnhbqygif .gt_left {
text-align: left;
}
#nqnhbqygif .gt_center {
text-align: center;
}
#nqnhbqygif .gt_right {
text-align: right;
font-variant-numeric: tabular-nums;
}
#nqnhbqygif .gt_font_normal {
font-weight: normal;
}
#nqnhbqygif .gt_font_bold {
font-weight: bold;
}
#nqnhbqygif .gt_font_italic {
font-style: italic;
}
#nqnhbqygif .gt_super {
font-size: 65%;
}
#nqnhbqygif .gt_two_val_uncert {
display: inline-block;
line-height: 1em;
text-align: right;
font-size: 60%;
vertical-align: -0.25em;
margin-left: 0.1em;
}
#nqnhbqygif .gt_footnote_marks {
font-style: italic;
font-weight: normal;
font-size: 75%;
vertical-align: 0.4em;
}
#nqnhbqygif .gt_asterisk {
font-size: 100%;
vertical-align: 0;
}
#nqnhbqygif .gt_slash_mark {
font-size: 0.7em;
line-height: 0.7em;
vertical-align: 0.15em;
}
#nqnhbqygif .gt_fraction_numerator {
font-size: 0.6em;
line-height: 0.6em;
vertical-align: 0.45em;
}
#nqnhbqygif .gt_fraction_denominator {
font-size: 0.6em;
line-height: 0.6em;
vertical-align: -0.05em;
}
mpg | |
---|---|
6 | |
21.0 | |
21.0 | |
21.4 | |
18.1 | |
19.2 | |
17.8 | |
19.7 | |
average | 19.74 |
total | 138.20 |
SD | 1.45 |
4 | |
22.8 | |
24.4 | |
22.8 | |
32.4 | |
30.4 | |
33.9 | |
21.5 | |
27.3 | |
26.0 | |
30.4 | |
21.4 | |
average | 26.66 |
total | 293.30 |
SD | 4.51 |
8 | |
18.7 | |
14.3 | |
16.4 | |
17.3 | |
15.2 | |
10.4 | |
10.4 | |
14.7 | |
15.5 | |
15.2 | |
13.3 | |
19.2 | |
15.8 | |
15.0 | |
average | 15.10 |
total | 211.40 |
SD | 2.56 |
This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.
If you have a query related to it or one of the replies, start a new topic and refer back with a link.