Hi there,
I am new to R. I have been trying to get summary statistics for a variable in a filtered data set, using the following code:
library(dplyr)
diameter7 <- filter(nigeria6, electricity_area == 1)
diameter7 %>%
group_by(affected7) %>%
summary(electricty_area)
However, I keep getting an error saying that electricity_area is not found, even though I have checked the data and electricity_area is definitely there. I can't figure out what I am doing wrong. Please help.
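A likely cause, offered as a sketch rather than a definitive diagnosis: summary() is not a dplyr verb, so it does not look up bare column names inside the piped data frame. An extra argument is matched to summary()'s own parameters (for a data frame the second one is maxsum), so the bare name is treated as an ordinary R object rather than a column, and no such object exists. Note also that the call spells it electricty_area, and that summary() ignores group_by(). Extracting the column, or using summarise() for per-group statistics, avoids these problems. The nigeria6 below is a hypothetical stand-in with made-up values, and value is a hypothetical numeric column. Summarising electricity_area itself would not be informative here, since after filtering on electricity_area == 1 every remaining value is 1.

```r
library(dplyr)

# Hypothetical stand-in for nigeria6: the real data were not posted, so these
# values and the numeric column `value` are made up for illustration.
nigeria6 <- data.frame(
  electricity_area = c(1, 1, 1, 0, 1, 1),
  affected7 = c("yes", "no", "yes", "no", "no", "yes"),
  value = c(2.5, 3.1, 4.0, 1.2, 5.5, 3.3)
)

diameter7 <- filter(nigeria6, electricity_area == 1)

# summary() expects a vector, so extract the column first.
summary(diameter7$value)

# summary() ignores group_by(); summarise() computes statistics per group.
stats7 <- diameter7 %>%
  group_by(affected7) %>%
  summarise(
    min = min(value),
    q1 = quantile(value, 0.25),
    median = median(value),
    mean = mean(value),
    q3 = quantile(value, 0.75),
    max = max(value)
  )
stats7
```

With these made-up values, stats7 has one row per affected7 group ("no" and "yes").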
What happens if you run str(diameter7)?
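Along the same lines, here is a quick way to confirm that a column name is spelled exactly as expected. This is a sketch on a hypothetical toy data frame, not the poster's data:

```r
# Hypothetical toy data frame standing in for diameter7.
diameter7 <- data.frame(electricity_area = c(1, 1), affected7 = c("yes", "no"))

str(diameter7)                                    # column names and types
names(diameter7)                                  # exact column names
"electricity_area" %in% names(diameter7)          # TRUE: spelled exactly so
"electricty_area" %in% names(diameter7)           # FALSE: a misspelling
grep("electric", names(diameter7), value = TRUE)  # near matches by pattern
```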
We probably need to see some sample data. A handy way to supply it is the dput() function (see ?dput). If you have a very large data set, something like dput(head(myfile, 100)) will likely give us enough data to work with.
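For example, with the built-in mtcars data set standing in for your own data, the following prints R code that recreates the first few rows, ready to paste into a post. head() goes inside dput(), so only the subset is printed:

```r
# Print R code that recreates the first 3 rows of a built-in data set.
dput(head(mtcars, 3))

# The printed text can be pasted back into R to rebuild the same data frame.
txt <- capture.output(dput(head(mtcars, 3)))
rebuilt <- eval(parse(text = txt))
```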
Here are two options, using the built-in iris data set.
library(tidyverse)
# Option 1: print a summary() for each species
iris %>% split(.$Species) %>% map(summary)
#> $setosa
#> Sepal.Length Sepal.Width Petal.Length Petal.Width
#> Min. :4.300 Min. :2.300 Min. :1.000 Min. :0.100
#> 1st Qu.:4.800 1st Qu.:3.200 1st Qu.:1.400 1st Qu.:0.200
#> Median :5.000 Median :3.400 Median :1.500 Median :0.200
#> Mean :5.006 Mean :3.428 Mean :1.462 Mean :0.246
#> 3rd Qu.:5.200 3rd Qu.:3.675 3rd Qu.:1.575 3rd Qu.:0.300
#> Max. :5.800 Max. :4.400 Max. :1.900 Max. :0.600
#> Species
#> setosa :50
#> versicolor: 0
#> virginica : 0
#>
#>
#>
#>
#> $versicolor
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> Min. :4.900 Min. :2.000 Min. :3.00 Min. :1.000 setosa : 0
#> 1st Qu.:5.600 1st Qu.:2.525 1st Qu.:4.00 1st Qu.:1.200 versicolor:50
#> Median :5.900 Median :2.800 Median :4.35 Median :1.300 virginica : 0
#> Mean :5.936 Mean :2.770 Mean :4.26 Mean :1.326
#> 3rd Qu.:6.300 3rd Qu.:3.000 3rd Qu.:4.60 3rd Qu.:1.500
#> Max. :7.000 Max. :3.400 Max. :5.10 Max. :1.800
#>
#> $virginica
#> Sepal.Length Sepal.Width Petal.Length Petal.Width
#> Min. :4.900 Min. :2.200 Min. :4.500 Min. :1.400
#> 1st Qu.:6.225 1st Qu.:2.800 1st Qu.:5.100 1st Qu.:1.800
#> Median :6.500 Median :3.000 Median :5.550 Median :2.000
#> Mean :6.588 Mean :2.974 Mean :5.552 Mean :2.026
#> 3rd Qu.:6.900 3rd Qu.:3.175 3rd Qu.:5.875 3rd Qu.:2.300
#> Max. :7.900 Max. :3.800 Max. :6.900 Max. :2.500
#> Species
#> setosa : 0
#> versicolor: 0
#> virginica :50
#>
#>
#>
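If only one variable is needed per group, a base-R variation (swapped in here, not part of the reply above) gives the same per-group summary() without the extra Species counts. A sketch:

```r
# One summary() per species for a single variable, using base R only.
sl_by_species <- lapply(split(iris$Sepal.Length, iris$Species), summary)
sl_by_species

# Each element is an ordinary summary() result, indexable by statistic name.
sl_by_species$setosa[["Mean"]]
```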
# Option 2: a data frame of summary stats per species
# (the warning below suggests skimr, which I'd rather not use)
iris %>%
group_by(Species) %>%
summarize(summ = summary(Sepal.Length) %>% broom::tidy() %>% list()) %>%
unnest(summ)
#> Warning: `tidy.summaryDefault()` is deprecated. Please use `skimr::skim()`
#> instead.
#> Warning: `tidy.summaryDefault()` is deprecated. Please use `skimr::skim()`
#> instead.
#> Warning: `tidy.summaryDefault()` is deprecated. Please use `skimr::skim()`
#> instead.
#> # A tibble: 3 x 7
#> Species minimum q1 median mean q3 maximum
#> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 setosa 4.3 4.8 5 5.01 5.2 5.8
#> 2 versicolor 4.9 5.6 5.9 5.94 6.3 7
#> 3 virginica 4.9 6.22 6.5 6.59 6.9 7.9
Created on 2022-01-18 by the reprex package (v2.0.1)
Maybe this is a better solution: no deprecation warnings from broom!
library(tidyverse)
iris %>%
as_tibble() %>%
group_by(Species) %>%
summarize_all(~summary(.) %>% as_tibble_row() %>% list()) %>%
unnest(-Species, names_sep = "_")
#> # A tibble: 3 x 25
#> Species Sepal.Length_Mi~ `Sepal.Length_1~ Sepal.Length_Me~ Sepal.Length_Me~
#> <fct> <table> <table> <table> <table>
#> 1 setosa 4.3 4.800 5.0 5.006
#> 2 versicolor 4.9 5.600 5.9 5.936
#> 3 virginica 4.9 6.225 6.5 6.588
#> # ... with 20 more variables: Sepal.Length_3rd Qu. <table>,
#> # Sepal.Length_Max. <table>, Sepal.Width_Min. <table>,
#> # Sepal.Width_1st Qu. <table>, Sepal.Width_Median <table>,
#> # Sepal.Width_Mean <table>, Sepal.Width_3rd Qu. <table>,
#> # Sepal.Width_Max. <table>, Petal.Length_Min. <table>,
#> # Petal.Length_1st Qu. <table>, Petal.Length_Median <table>,
#> # Petal.Length_Mean <table>, Petal.Length_3rd Qu. <table>, ...
Created on 2022-01-18 by the reprex package (v2.0.1)
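The <table> column types in that output come from summary() returning an object of class "summaryDefault"/"table". A possible variation, assuming dplyr 1.0 or later (where an unnamed data-frame result inside summarise() is spliced into columns), strips that class with c() so the statistics come out as plain doubles. Shown for one variable as a sketch:

```r
library(dplyr)
library(tibble)

# c() drops the summaryDefault/table class but keeps the statistic names, so
# as_tibble_row() yields ordinary <dbl> columns that summarise() splices in.
sl_stats <- iris %>%
  group_by(Species) %>%
  summarise(as_tibble_row(c(summary(Sepal.Length))))
sl_stats
```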
Thank you all so much for your help. I finally solved my problem.
This topic was automatically closed by the system on February 9, 2022, 7:34am, 21 days after the last reply.