Hi. A couple of preliminaries: screenshots are seldom very helpful, while a reproducible example with data (actual or representative) attracts more answers; see the FAQ: What's a reproducible example (`reprex`) and how do I do one? Think of it as the human equivalent of `R`'s lazy evaluation.
Also, there's a FAQ: Homework Policy
Let's simulate your problem by reducing it to the basics.
You already have a variable, `df$avg_income`, that's been scrubbed of `NA`s. I'm going to create a proxy from some made-up data and illustrate a `tidy` solution.
require(charlatan)
#> Loading required package: charlatan
require(dplyr)
#> Loading required package: dplyr
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#>     filter, lag
#> The following objects are masked from 'package:base':
#>
#>     intersect, setdiff, setequal, union
require(tibble)
#> Loading required package: tibble
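# 50,000 made-up incomes; enframe() returns a tibble with columns name and value
phony <- enframe(ch_integer(n = 50000, min = 4940, max = 2000001))
# drop the index column and give the values a meaningful name
phony %>% select(-name) %>% rename(income = value) -> phony
mid <- median(phony$income)
# label each income relative to the median; values equal to the median count as "high"
phony %>% mutate(category = ifelse(income < mid, "low", "high"))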
#> # A tibble: 50,000 x 2
#>     income category
#>      <dbl> <chr>
#>  1 1724785 high
#>  2 1094812 high
#>  3 1322170 high
#>  4 1308088 high
#>  5 1066873 high
#>  6 1386093 high
#>  7 1616536 high
#>  8  180569 low
#>  9  123062 low
#> 10 1004668 high
#> # … with 49,990 more rows
Created on 2020-02-01 by the reprex package (v0.3.0)
`ch_integer()` doesn't respect `set.seed()`, so it will return a different result each time. And, of course, you don't need it with your real data.
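With your real data, the same idea might look something like the sketch below. Only the column name `avg_income` comes from your question; the five values in `df` are a made-up stand-in so the snippet runs on its own, and the `category` column name is just carried over from the example above.

```r
library(dplyr)

# Hypothetical stand-in for the real data frame; replace with your own df.
# Assumes avg_income has already been scrubbed of NAs, as described above.
df <- tibble(avg_income = c(12000, 48000, 51000, 250000, 73000))

# Split at the median: strictly below is "low", everything else is "high"
df <- df %>%
  mutate(category = ifelse(avg_income < median(avg_income), "low", "high"))

df$category
```

Because the comparison is strict, a value exactly equal to the median lands in "high". If `NA`s were still present, `median()` would need `na.rm = TRUE`.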