So I'm guessing from your code that you're trying to subset the big community_all data frame? I see you've loaded the tidyverse packages at the start, but you aren't (yet) using any of their tools. Here's how you would subset community_all, tidyverse-style:
select(community_all, Community_Name, Community_Code)
The output will be a data frame with just those two columns in it. To get a handle on using select() and the other dplyr data-wrangling "verbs", start here: 5 Data transformation | R for Data Science. A major benefit of using these tools is that they abstract away a bunch of the complexity I'm about to go into below!
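If you want to try select() without touching your real data, here's a minimal sketch. I don't have your community_all, so the data frame below is a made-up stand-in: only the two column names come from your post, and the values and the Population column are hypothetical.

```r
library(dplyr)

# Hypothetical stand-in for community_all (values invented for illustration)
community_all <- data.frame(
  Community_Name = c("Alder", "Birch", "Cedar"),
  Community_Code = c(101L, 102L, 103L),
  Population     = c(1200, 850, 4300),
  stringsAsFactors = FALSE
)

# Keep only the two columns of interest; the result is still a data frame
community_subset <- select(community_all, Community_Name, Community_Code)

names(community_subset)
#> [1] "Community_Name" "Community_Code"
```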
Why didn't the code you tried work?
cbind() tries to guess whether you want data frame or matrix output based on what you pass to it, and then it uses totally different code under the hood based on what it guessed. At least one argument has to be a data frame for it to use the data frame method (the only one where stringsAsFactors means anything). You might be a little surprised to hear that you didn't actually pass any data frames to cbind()!
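You can watch cbind() make that choice with a couple of toy vectors (x and y here are just made-up examples, not anything from your data):

```r
x <- 1:3
y <- c(10, 20, 30)

# Only bare vectors: cbind() falls back to the matrix method
from_vectors <- cbind(x, y)
is.matrix(from_vectors)
#> [1] TRUE

# At least one data frame: cbind() uses the data frame method instead
from_df <- cbind(data.frame(x = x), y)
is.data.frame(from_df)
#> [1] TRUE
```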
I'm not totally sure what you did pass, because your code as written shouldn't work. Based on what you posted, referring to Community_Name and Community_Code should have caused an "object not found" error. I'm guessing that you did something earlier in your session that made these objects exist (or appear to exist) on their own. You could have created new vectors with those names, or you might have used attach(community_all).
A brief tour of R subsetting, and a warning about attach()...
# Working with the built-in `mtcars` dataset...
str(mtcars)
#> 'data.frame': 32 obs. of 11 variables:
#> $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
#> $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
#> $ disp: num 160 160 108 258 360 ...
#> $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
#> $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
#> $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
#> $ qsec: num 16.5 17 18.6 19.4 17 ...
#> $ vs : num 0 0 1 1 0 1 0 1 1 1 ...
#> $ am : num 1 1 1 0 0 0 0 0 0 0 ...
#> $ gear: num 4 4 4 3 3 3 3 4 4 4 ...
#> $ carb: num 4 4 1 1 2 1 4 2 2 4 ...
# Dollar-sign subsetting extracts a vector from a data frame
str(mtcars$cyl)
#> num [1:32] 6 6 4 6 8 6 8 4 4 6 ...
# Double-bracket subsetting extracts a vector from a data frame
str(mtcars[["cyl"]])
#> num [1:32] 6 6 4 6 8 6 8 4 4 6 ...
# Single bracket subsetting preserves the data frame structure
str(mtcars["cyl"])
#> 'data.frame': 32 obs. of 1 variable:
#> $ cyl: num 6 6 4 6 8 6 8 4 4 6 ...
# Variables called on their own thanks to `attach()` act like
# dollar-sign or double-bracket subsetting (extracts a vector)
attach(mtcars)
str(cyl)
#> num [1:32] 6 6 4 6 8 6 8 4 4 6 ...
Created on 2018-10-07 by the reprex package (v0.2.1)
I advise you to avoid using attach(). It's a convenience function that lets you refer to vectors from a data frame without using the dataframe_name$ prefix. This seemed like a great idea once upon a time because it saved typing, so you see it a lot in R examples of a certain vintage. But it can cause all sorts of confusion and errors, because it effectively creates a bunch of invisible objects that you have to remember are there (they won't show up in your workspace). I strongly recommend avoiding it until you are sure you know what you're doing and understand the dangers (and by then, chances are you will have come up with your own reasons to avoid attach()).
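Here's one small illustration of the kind of confusion I mean, using a made-up data frame (df and score are hypothetical names). The attached copy doesn't follow later changes to the data frame, so the bare name quietly goes stale. I've also shown with(), which is one base R way to skip the prefix without attaching anything.

```r
df <- data.frame(score = c(1, 2, 3))

attach(df)

# Change the data frame after it has been attached
df$score <- df$score * 10

# The bare name still points at the old, attached copy
stale <- score
stale
#> [1] 1 2 3
df$score
#> [1] 10 20 30

# Clean up so the invisible copy doesn't linger on the search path
detach(df)

# with() evaluates an expression inside the data frame, no attach() needed
with(df, mean(score))
#> [1] 20
```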
One way or another, you managed to pass individual vectors to cbind(), so it used the matrix method. The matrix method doesn't know what to do with stringsAsFactors, so it assumed that was just another vector that you were passing in, one that only has FALSE values.
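You can see stringsAsFactors turn into a column of its own with a tiny example (a and b are made-up vectors, not your data):

```r
# The matrix method treats stringsAsFactors as one more vector to bind,
# recycling the single FALSE to fill the column
m <- cbind(a = 1:3, b = 4:6, stringsAsFactors = FALSE)

colnames(m)
#> [1] "a"                "b"                "stringsAsFactors"

m[, "stringsAsFactors"]
#> [1] 0 0 0
```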
Like I mentioned above, matrices can only have a single type of data, so R had to convert everything in the matrix into one type. Per the documentation, cbind() does this as follows:
The type of a matrix result determined from the highest type of any of the inputs in the hierarchy raw < logical < integer < double < complex < character < list.
So you seem to have passed cbind() a vector of factor data, a vector of integer data, and a vector of logical data. Factors are integers under the hood, so as a result of the above hierarchy, you got a matrix of integers (FALSE converts to integer 0). Subsequently converting the matrix to a data frame can't reverse that operation, so you just wind up with a data frame of integers.
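Here's a rough reconstruction of what I think happened, using made-up vectors in place of yours (name and code are hypothetical stand-ins for a factor column and an integer column):

```r
name <- factor(c("north", "south", "east"))
code <- c(11L, 12L, 13L)

# factor + integer + logical: everything lands on integer
m <- cbind(name, code, FALSE)
typeof(m)
#> [1] "integer"

# The factor labels are gone; only the internal codes survive
# (levels sort alphabetically: east = 1, north = 2, south = 3)
m[, "name"]
#> [1] 2 3 1

# Converting to a data frame afterwards can't bring the labels back
converted <- as.data.frame(m)
all(vapply(converted, is.integer, logical(1)))
#> [1] TRUE
```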
Whew! If you're still with me, one more point... While what you tried is creative, you might have already started to realize that using cbind() was sort of going out of your way if all you wanted to do was subset your data frame. Here's an example of how this is more typically done in base R: Getting a subset of a data structure
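For a quick taste of the base R approach, here's a minimal sketch. As before, this community_all is a made-up stand-in (only the two column names come from your post), so treat it as an illustration rather than a drop-in solution.

```r
# Hypothetical stand-in for community_all (values invented for illustration)
community_all <- data.frame(
  Community_Name = c("Alder", "Birch", "Cedar"),
  Community_Code = c(101L, 102L, 103L),
  Population     = c(1200, 850, 4300),
  stringsAsFactors = FALSE
)

# Single-bracket subsetting by column name keeps the data frame structure
by_name <- community_all[c("Community_Name", "Community_Code")]

# subset() does the same job and lets you write the column names unquoted
by_subset <- subset(community_all, select = c(Community_Name, Community_Code))

is.data.frame(by_name)
#> [1] TRUE
names(by_name)
#> [1] "Community_Name" "Community_Code"
```

Both versions keep each column's original type, which is exactly what the cbind() route lost.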