# The 'errors' data have already been loaded.
head(errors)
# Generate an object called 'totals' that contains the numbers of good and bad predictions for polls rated A- and C-
totals <- errors %>% filter(grade %in% c("A-", "C-")) %>% group_by(grade,hit) %>% summarize(num = n()) %>% spread(grade,num)
totals
# Print the proportion of hits for grade A- polls to the console
mean(hit == TRUE / A-)
# Print the proportion of hits for grade C- polls to the console
mean(hit == TRUE / C-)
#> Error: <text>:9:22: unexpected ')'
#> 8: # Print the proportion of hits for grade A- polls to the console
#> 9: mean(hit == TRUE / A-)
#> ^
What I am trying to do: filter the errors data for just polls with grades A- and C-, then calculate the proportion of times each grade of poll predicted the correct winner. I am trying to generate a 2 x 2 tibble, but I keep getting 2 x 3.
How do I calculate the number of hits that are TRUE for each of the grades A- and C-?
It'll be easier to help if you provide a reprex. The error above can be avoided by surrounding non-standard R column names with backticks: `A-`
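To illustrate the backtick point, here is a minimal base R sketch. The data frame `totals_demo` and its counts are made up for illustration; they are not the real `totals` object from the exercise.

```r
# Hypothetical stand-in for the spread table; the counts are made up.
totals_demo <- data.frame(
  hit = c(FALSE, TRUE),
  `A-` = c(2L, 8L),
  `C-` = c(5L, 15L),
  check.names = FALSE  # keep "A-" and "C-" as the column names
)

# Without backticks, A- is parsed as "A minus <something missing>".
# With backticks, R treats the whole name as a single symbol.
prop_a <- with(totals_demo, `A-`[hit] / sum(`A-`))
prop_c <- with(totals_demo, `C-`[hit] / sum(`C-`))

prop_a
prop_c
```

Here `prop_a` is the count in the TRUE row divided by the column total, which is one way to read "proportion of hits" from a table shaped like this.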
Isn't this a reprex?
No, I cannot reproduce your input object `errors`.
# The 'errors' data have already been loaded.
head(errors)
# Generate an object called 'totals' that contains the numbers of good and bad predictions for polls rated A- and C-
totals <- errors %>% filter(grade %in% c("A-", "C-")) %>% group_by(grade,hit) %>% summarize(num = n()) %>% spread(grade,num)
totals
# Print the proportion of hits for grade A- polls to the console
mean(hit == TRUE / `A-`)
# Print the proportion of hits for grade C- polls to the console
mean(hit == TRUE / `C-`)
#> Error: <text>:9:22: unexpected ')'
#> 8: # Print the proportion of hits for grade A- polls to the console
#> 9: mean(hit == TRUE / A-)
#> ^
I recently asked about generating a reprex here.
On my machine, I get this:
head(errors)
#> Error in head(errors) : object 'errors' not found
That object was predefined and preloaded into my workspace. How can I find out how it was generated, e.g. the packages and the original dataset used?
If you do not know the data source, you can run dput(errors), then copy the output from the console and paste it here.
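As a rough sketch of what `dput()` gives you, assuming a small made-up data frame `small` in place of the preloaded `errors` object:

```r
# Hypothetical small data frame standing in for a preloaded object.
small <- data.frame(
  grade = c("A-", "C-"),
  hit = c(TRUE, FALSE),
  stringsAsFactors = FALSE
)

# dput() prints R code that recreates the object; that text is what to paste.
dput(small)

# For a large object, sharing only the first rows is usually enough.
dput(head(small, 1))

# Round trip: evaluating the deparsed text rebuilds the object.
rebuilt <- eval(parse(text = paste(deparse(small), collapse = "\n")))
all.equal(rebuilt, small)
```

Anyone reading the post can then assign the pasted text to a name and run the rest of the code against it.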
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(dslabs)
data("polls_us_election_2016")
# Create a table called `polls` that filters by state, date, and reports the spread
polls <- polls_us_election_2016 %>%
  filter(state != "U.S." & enddate >= "2016-10-31") %>%
  mutate(spread = rawpoll_clinton/100 - rawpoll_trump/100)
# Create an object called `cis` that has columns for the lower and upper confidence intervals. Select the columns indicated in the instructions.
N <- polls$samplesize
cis <- polls %>% mutate(X_hat=(spread+1)/2,se=2*sqrt(X_hat*(1-X_hat)/N),lower=spread-qnorm(0.975)*se,upper=spread+qnorm(0.975)*se) %>%
  select(state,startdate,enddate,pollster,grade,spread,lower,upper)
add <- results_us_election_2016 %>% mutate(actual_spread = clinton/100 - trump/100) %>% select(state, actual_spread)
cis <- cis %>% mutate(state = as.character(state)) %>% left_join(add, by = "state")
errors <- cis %>% mutate(error = (spread - actual_spread),hit = sign(spread) == sign(actual_spread))
# The 'errors' data have already been loaded. Examine them using the `head` function.
head(errors)
#> state startdate enddate pollster grade
#> 1 New Mexico 2016-11-06 2016-11-06 Zia Poll <NA>
#> 2 Virginia 2016-11-03 2016-11-04 Public Policy Polling B+
#> 3 Iowa 2016-11-01 2016-11-04 Selzer & Company A+
#> 4 Wisconsin 2016-10-26 2016-10-31 Marquette University A
#> 5 North Carolina 2016-11-04 2016-11-06 Siena College A
#> 6 Georgia 2016-11-06 2016-11-06 Landmark Communications B
#> spread lower upper actual_spread error hit
#> 1 0.02 -0.001331221 0.0413312213 0.083 -0.063 TRUE
#> 2 0.05 -0.005634504 0.1056345040 0.054 -0.004 TRUE
#> 3 -0.07 -0.139125210 -0.0008747905 -0.094 0.024 TRUE
#> 4 0.06 0.004774064 0.1152259363 -0.007 0.067 FALSE
#> 5 0.00 -0.069295191 0.0692951912 -0.036 0.036 FALSE
#> 6 -0.03 -0.086553820 0.0265538203 -0.051 0.021 TRUE
# Generate an object called 'totals' that contains the numbers of good and bad predictions for polls rated A- and C-
totals <- errors %>% filter(grade %in% c("A-", "C-")) %>% group_by(grade,hit) %>% summarize(num = n()) %>% spread(grade,num)
#> Error in spread(., grade, num): could not find function "spread"
totals
#> Error in eval(expr, envir, enclos): object 'totals' not found
# Print the proportion of hits for grade A- polls to the console
totals %>% mean(hit == TRUE / `A-`)
#> Error in eval(lhs, parent, parent): object 'totals' not found
# Print the proportion of hits for grade C- polls to the console
totals %>% mean(hit == TRUE / `C-`)
#> Error in eval(lhs, parent, parent): object 'totals' not found
This should work. Let me know.
Is this what you are looking for?
prop.table(as.matrix(totals[, -1]), margin = 2)
#> C- A-
#> [1,] 0.1385042 0.1969697
#> [2,] 0.8614958 0.8030303
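For anyone unsure what `margin = 2` does there: it divides each column by its own column sum. A small sketch, using the counts for `totals` that are reported further down in this thread and typing them in by hand as a matrix (the name `counts` is just for illustration):

```r
# Counts of misses (FALSE) and hits (TRUE) per grade, as reported in the thread.
counts <- matrix(
  c(50L, 311L, 26L, 106L),
  nrow = 2,
  dimnames = list(hit = c("FALSE", "TRUE"), grade = c("C-", "A-"))
)

# margin = 2 divides every column by its own column sum.
props <- prop.table(counts, margin = 2)
props

# The same calculation by hand: sweep the column sums out of the counts.
by_hand <- sweep(counts, 2, colSums(counts), "/")
all.equal(props, by_hand)
```

The TRUE row of `props` then holds the proportion of hits for each grade.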
I think I need a 2 x 3 tibble with "hit", "A-", and "C-".
Was my reprex OK? Did it run on your machine?
This is a DataCamp exercise I'm working on, and the libraries and dataset are all in RStudio.
It ran OK. I see this for totals:
totals <- errors %>%
  dplyr::filter(grade %in% c("A-", "C-")) %>%
  dplyr::group_by(grade,hit) %>%
  dplyr::summarize(num = n()) %>%
  tidyr::spread(grade, num)
totals
#> # A tibble: 2 x 3
#> hit `C-` `A-`
#> <lgl> <int> <int>
#> 1 FALSE 50 26
#> 2 TRUE 311 106
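A side note on the `could not find function "spread"` error in the reprex: `spread()` lives in tidyr, so it needs either `library(tidyr)` or the `tidyr::` prefix used above. Newer tidyr versions also offer `pivot_wider()` for the same reshaping. A sketch of that alternative, with the counts above typed in by hand as a long table (`counts_long` and `totals_wide` are illustrative names, not part of the exercise):

```r
library(tidyr)

# Long-format counts as reported in the thread (one row per grade/hit pair).
counts_long <- data.frame(
  grade = c("A-", "A-", "C-", "C-"),
  hit   = c(FALSE, TRUE, FALSE, TRUE),
  num   = c(26L, 106L, 50L, 311L),
  stringsAsFactors = FALSE
)

# pivot_wider() plays the role of spread(grade, num):
# one column per grade, one row per value of hit.
totals_wide <- pivot_wider(counts_long, names_from = grade, values_from = num)
totals_wide
```

This is a swapped-in function, not what the exercise itself uses; an auto-grader may only accept `spread()`.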
I thought you wanted 2 by 2?
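If a 2 x 2 result is the goal, one possible reading is one row per grade plus a single proportion column. A minimal sketch under that assumption, using a made-up miniature `errors_demo` rather than the real `errors` data; the column name `proportion_hits` is also an assumption:

```r
library(dplyr)

# Hypothetical miniature of `errors`; the real object has more rows and columns.
errors_demo <- data.frame(
  grade = c("A-", "A-", "A-", "A-", "C-", "C-", "C-", "B"),
  hit   = c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# mean() of a logical vector is the share of TRUE values,
# so summarizing by grade gives the proportion of hits directly.
p_hits <- errors_demo %>%
  filter(grade %in% c("A-", "C-")) %>%
  group_by(grade) %>%
  summarize(proportion_hits = mean(hit))

p_hits
```

Grouping by `grade` alone skips the count-then-spread step entirely, which is why the result has two columns instead of three.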
The instructions called for 2 x 2, but who knows what the auto-grader will actually accept. I think "hit" is 1 and the grades are actually counted as 1. I was able to figure it out, though. Thanks. Catch you next time.