How to know the type of variable given that there are two variables?

RStudioLearner · September 4, 2018, 4:27am

I was asked to find the type of variable that has the highest mean. The data have two variables. How do I specifically get the type of the variable with the highest mean? TIA

mishabalyasin · September 4, 2018, 8:24am

Can you expand a little bit on what you've tried so far? I'm asking because your question sounds a lot like a homework question. The policy on that can be found here:

Leon · September 4, 2018, 11:47am

Can you provide us with an example?

```{r}
# Your example goes here... :-)
```

jcblum · September 5, 2018, 7:23am

Thanks for the additional context! A minor point of terminology: the “type of a variable” is a specific concept in R that means something different than what you’re talking about, so having more info really helps in clarifying what it is you’re trying to do.

Choosing a value from one variable in a data frame based on the value of another variable is a common task that falls under the topic of “subsetting”. This page shows several examples of ways to subset things in R:
http://cookbook-r.com/Basics/Getting_a_subset_of_a_data_structure/

Maybe start by taking a look at those examples to see if you can come up with some code of your own to share? It’s ok if what you try doesn’t work — people here will be happy to help you figure out how to fix it.

(If you want the whole story on subsetting in R, the “Subsetting” chapter in Advanced R, 1st ed. is pretty great: http://adv-r.had.co.nz/Subsetting.html)
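To give a feel for the general pattern without touching your assignment, here is a minimal sketch on made-up data (the `fruit` data frame and its columns are hypothetical, not from this thread). A logical test on one column is used to pick values out of another column:

```r
# Hypothetical toy data, not the data from this thread
fruit <- data.frame(
  name  = c("apple", "banana", "cherry"),
  price = c(1.20, 0.50, 3.00),
  stringsAsFactors = FALSE
)

# Logical test on one column...
is_cheapest <- fruit$price == min(fruit$price)
is_cheapest
#> [1] FALSE  TRUE FALSE

# ...used to pick values from another column
cheapest_name <- fruit$name[is_cheapest]
cheapest_name
#> [1] "banana"
```

The same idea should carry over to other summary functions, as long as the value you compare against actually occurs in the column.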

jcblum · September 5, 2018, 4:07pm

Aha, yeah a challenge with helping with classwork questions is that when there are many ways to accomplish a task, it’s almost impossible to guess which variation an instructor will have decided to focus on.

A conceptual issue here is whether you want to subset the data frame (as the code you posted does), or whether you want to pull out one of the vectors that make up the data frame and subset it directly (maybe what your instructor was expecting?). In this case, both methods have the same result:

```r
# Starting with this data frame...
str(File)
#> 'data.frame':	7 obs. of  3 variables:
#>  $ Individual: Factor w/ 2 levels "Kent","Mark": 1 2 2 1 1 1 2
#>  $ Type      : Factor w/ 2 levels "Pistol","Shotgun": 2 1 2 2 1 2 2
#>  $ SquareRoot: int  456 753 234 423 894 311 131

# Subset whole data frame (matrix-style)
subset_df <- File[which(File$SquareRoot == median(File$SquareRoot)), "Type"]
subset_df
#> [1] Shotgun
#> Levels: Pistol Shotgun

# Subset just the Type variable
subset_vec <- File$Type[File$SquareRoot == median(File$SquareRoot)]
subset_vec
#> [1] Shotgun
#> Levels: Pistol Shotgun

identical(subset_df, subset_vec)
#> [1] TRUE
```
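If anyone wants to run this themselves, the data frame can be rebuilt from the `str()` output. This is a sketch that assumes the factor codes map back to the labels as shown (the name `median_type` is just for illustration):

```r
# Rebuild the data frame from the str() output (factor codes mapped to labels)
File <- data.frame(
  Individual = factor(c("Kent", "Mark", "Mark", "Kent", "Kent", "Kent", "Mark")),
  Type       = factor(c("Shotgun", "Pistol", "Shotgun", "Shotgun",
                        "Pistol", "Shotgun", "Shotgun")),
  SquareRoot = c(456L, 753L, 234L, 423L, 894L, 311L, 131L)
)

median(File$SquareRoot)
#> [1] 423

# The == comparison only finds a row when the median is an actual value in
# the column, which holds here because the number of rows is odd
median_type <- File$Type[File$SquareRoot == median(File$SquareRoot)]
```

One thing to keep in mind: with an even number of rows the median can fall between two values, and then the `==` test would match nothing.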

The Levels information is printed because your Type variable is a factor. In R versions before 4.0.0, this is the default for text read in from a CSV using read.csv() (since R 4.0.0, text columns stay character unless you set stringsAsFactors = TRUE). A factor would also make sense in this case if you were fitting a statistical model to this data. If you must return just a character string, you need to convert the variable or the result:

```r
# Convert result to character
as.character(File$Type[File$SquareRoot == median(File$SquareRoot)])
#> [1] "Shotgun"

# Convert whole Type variable to character
File$Type <- as.character(File$Type)

File$Type[File$SquareRoot == median(File$SquareRoot)]
#> [1] "Shotgun"
```
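Another option, if the data comes from read.csv(), is to stop the conversion at import with `stringsAsFactors = FALSE`. A minimal sketch, assuming an inline CSV in place of your real file (`csv_text`, `File_chr`, and `result` are made-up names, and the rows are copied from the `str()` output above):

```r
# Hypothetical inline CSV standing in for the real file
csv_text <- "Individual,Type,SquareRoot
Kent,Shotgun,456
Mark,Pistol,753
Mark,Shotgun,234
Kent,Shotgun,423
Kent,Pistol,894
Kent,Shotgun,311
Mark,Shotgun,131"

# Keep text columns as character instead of converting them to factors
File_chr <- read.csv(text = csv_text, stringsAsFactors = FALSE)
class(File_chr$Type)
#> [1] "character"

# Same subsetting as before, now returning a plain character string
result <- File_chr$Type[File_chr$SquareRoot == median(File_chr$SquareRoot)]
result
#> [1] "Shotgun"
```

With a real file you would pass the file path as the first argument instead of `text =`.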