I am not good at R coding, and I'm having trouble subsetting a data set based on column names. In my data set, the first 14 columns have words as names, and the remaining 1000 columns have numbers as names (not in order). When I read the data in, I guess all the column names become strings. How do I subset certain columns based on the values of their names (for example, column names between 750 and 850) among those 1000 number-named columns, while still keeping the first 14 columns? Is there an easy way to do it? Your help is very much appreciated.
Column names in a data frame are always character strings, even when they look like numbers. Although you can subset by name, there's little point in this case since the names are number-like anyway; we can use the numeric column positions instead.
the_subset <- DF[, c(1:14, 243, 546, 547)]
The comma separates rows from columns. Here you want all rows, so the row index before the comma is left empty.
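Because the number-like names are not in order, a column's position and its name value usually differ. Here is a minimal sketch, assuming a data frame `DF` (toy data below with hypothetical names) whose first 14 columns have word names and whose remaining names are number-like strings. It converts those names to numbers and lets `which()` find the positions to pass to the same `DF[, ...]` indexing.

```r
# Toy data frame standing in for the real one (hypothetical names/values):
# 14 word-named columns followed by number-like names in no particular order.
DF <- as.data.frame(matrix(seq_len(3 * 20), nrow = 3))
names(DF) <- c(paste0("var", letters[1:14]),
               c("900", "760", "100", "850", "749", "800"))

# Convert only the names after the first 14 columns to numbers;
# suppressWarnings() silences the NA-coercion warning for any non-numeric name.
num_part <- suppressWarnings(as.numeric(names(DF)[-(1:14)]))

# Positions (in DF) of columns whose name value lies in [750, 850];
# which() skips NA, and adding 14 shifts back to positions in the full DF.
in_range <- 14 + which(num_part >= 750 & num_part <= 850)

# Keep the first 14 columns plus the matching ones, all rows.
the_subset <- DF[, c(1:14, in_range)]
names(the_subset)
```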
Aside from the base R solution already provided, which works perfectly well, you may want to consider the tidyverse packages as part of your toolkit:
The tidyverse is a coherent system of packages for data manipulation, exploration and visualization that share a common design philosophy.
For instance, one chapter of the excellent free online book R for Data Science introduces the select() function and describes several approaches to selecting column variables.
The book is well worth reading to get you started.
It is possible to achieve this in base R by using a regular expression.
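A minimal base R sketch of the regex route, assuming toy data with hypothetical names (2 word-named columns standing in for the 14 in the question). The pattern matches whole names from 750 to 850 and `grepl()` turns that into a logical column index.

```r
# Toy data frame (hypothetical names): word-named columns first,
# then number-like names in no particular order.
DF <- as.data.frame(matrix(seq_len(2 * 8), nrow = 2))
names(DF) <- c("id", "site", "900", "760", "100", "850", "749", "800")
n_words <- 2  # would be 14 in the question

# Whole-name match for 750-850: 75x-79x, 80x-84x, or exactly 850.
pattern <- "^(7[5-9][0-9]|8[0-4][0-9]|850)$"

# TRUE for the leading word-named columns and for names matching the pattern
keep <- seq_along(DF) <= n_words | grepl(pattern, names(DF))
the_subset <- DF[, keep]
names(the_subset)
```

Writing a regex for a numeric range gets awkward quickly; converting the names with `as.numeric()` and comparing is often simpler, while the regex is handy when names also carry text.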
Since I happen to favour the tidyverse approach, I'd still recommend having a look at section 5.4 of R for Data Science, which introduces selecting columns by pattern matching.
There are a number of helper functions you can use within select():
starts_with("abc"): matches names that begin with “abc”.
ends_with("xyz"): matches names that end with “xyz”.
contains("ijk"): matches names that contain “ijk”.
matches("(.)\\1"): selects variables that match a regular expression. This one matches any variables that contain repeated characters. Regular expressions are covered in more detail in the book's chapter on strings.
num_range("x", 1:3): matches x1, x2, and x3.
See ?select for more details.
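Applied to the question, num_range() with an empty prefix should cover number-only names. A sketch, assuming dplyr is installed and using toy data with hypothetical names (positions 1:2 stand in for the 14 word-named columns):

```r
library(dplyr)

# Toy data frame (hypothetical names)
DF <- as.data.frame(matrix(seq_len(2 * 8), nrow = 2))
names(DF) <- c("id", "site", "900", "760", "100", "850", "749", "800")

# num_range("", 750:850) builds the names "750", "751", ..., "850" and
# selects those present in DF; 1:2 keeps the word-named columns by position.
the_subset <- select(DF, 1:2, num_range("", 750:850))
names(the_subset)
```

Writing any_of(as.character(750:850)) instead of num_range() should select the same columns.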
If you're still struggling, please provide more detailed information about the column names and which ones you're trying to select. Then we can help construct the right pattern and solution in either base R or the tidyverse.
That's true if all 300 numbers are discontinuous. If they come in blocks, ranges keep the index vector short:
c(267,290:321,415:682 ...)
If the columns form a sequence without much order, the suggested regex methods based on the character representation are appropriate, but possibly just as much work, depending on the naming convention.
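One naming-convention detail worth checking, assuming the data were read with read.csv(): by default it passes header names through make.names(), so a header like 750 arrives as X750, and patterns or numeric conversions written for "750" then miss it. A sketch with a small hypothetical CSV held in a string:

```r
# Small CSV with number-like headers (hypothetical data)
csv_text <- "id,site,900,760,850\n1,a,10,20,30\n2,b,40,50,60"

# Default check.names = TRUE: make.names() turns "900" into "X900", etc.
default_names <- names(read.csv(text = csv_text))

# check.names = FALSE keeps the headers exactly as written in the file
DF <- read.csv(text = csv_text, check.names = FALSE)

# If the data were already read with the default, strip the leading "X"
# from names that are "X" followed only by digits.
cleaned <- sub("^X([0-9]+)$", "\\1", default_names)
```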