defining which portion of a column to 'look at'

thmsrs · March 7, 2022, 1:04am

I'm working with a dataset that I've attached with attach(), which lets me use functions like mean() and summary() by simply passing the variable name, e.g. mean(age).

I'd like to be able to define which part of the column the function looks at when returning these values.
Let me explain (see the image below). Study 1 is separated from Study 2: the last sample of Study 1 is on row 510, and the first sample of Study 2 is on row 511. I want to be able to use mean(), summary(), etc. on rows 1 to 510 and on rows 511 onward separately.

What's the best way to do this?

Current code:

# Mean house size across both studies

attach(assignment2)
summary(housesize)

# Analyzing the quantitative variable 'age'

summary(age)

ibertchen · March 7, 2022, 1:40am

dplyr (A Grammar of Data Manipulation) is your friend here.

library(dplyr)

df %>%
  group_by(study) %>%
  summarize(age_mean = mean(age))

thmsrs · March 7, 2022, 1:46am

How exactly would I implement this in my code?

FJCC · March 7, 2022, 2:35am

Let's say your data are in a data frame named DF. You can make a data frame that stores the mean age of the two studies using @ibertchen's code.

library(dplyr)

MeanAges <- DF %>%
  group_by(study) %>%
  summarize(age_mean = mean(age))
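
To connect this to the original question, here is a minimal sketch that assumes the data frame has no study column yet and that Study 1 occupies the first block of rows. The toy data frame assignment2, its values, and the name last_study1_row are illustrative stand-ins, not taken from the real dataset; the cut-off is shrunk from 510 to 3 to keep the example small.

```r
library(dplyr)

# Hypothetical stand-in for the real data (5 rows instead of 510+).
assignment2 <- data.frame(
  age       = c(20, 30, 40, 50, 70),
  housesize = c(100, 120, 140, 200, 220)
)

# Assumed cut-off: last row of Study 1 (510 in the original post, 3 here).
last_study1_row <- 3

by_study <- assignment2 %>%
  # Label each row by its position: rows up to the cut-off are Study 1.
  mutate(study = if_else(row_number() <= last_study1_row, 1, 2)) %>%
  group_by(study) %>%
  # One summary row per study.
  summarize(
    age_mean       = mean(age),
    housesize_mean = mean(housesize)
  )

print(by_study)
```

With the real data, last_study1_row would be set to 510 and assignment2 would be the existing data frame rather than the toy one.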

If you do not want to use dplyr, you could calculate the mean of Study 1 with

mean(DF[DF$study == 1, "age"])
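
A few base R variations on the same idea may help, sketched here on a made-up DF (the values and the name last_study1_row are assumptions for illustration). Extracting the column first with DF$age and then subsetting it also works if DF happens to be a tibble, where DF[rows, "age"] would return a one-column tibble rather than a vector.

```r
# Hypothetical stand-in data; the real data frame has 510+ rows.
DF <- data.frame(
  study = c(1, 1, 1, 2, 2),
  age   = c(20, 30, 40, 50, 70)
)

# Option 1: filter on the study column, extracting the vector first.
mean_study1 <- mean(DF$age[DF$study == 1])

# Option 2: select by row position instead of a study column
# (rows 1:510 and 511:nrow(DF) in the original post; 3 is the toy cut-off).
last_study1_row <- 3
summary_study1 <- summary(DF$age[1:last_study1_row])
summary_study2 <- summary(DF$age[(last_study1_row + 1):nrow(DF)])

# Option 3: compute the mean for every study in one call.
means_by_study <- tapply(DF$age, DF$study, mean)

print(summary_study1)
print(summary_study2)
print(means_by_study)
```

Option 2 is the closest match to the question as asked, since it needs only the row numbers and no study column.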

I cannot think of a handy way to do this while using attach(), but I never use attach(), so that may just be a symptom of my preferences.
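
One possible way to keep the bare-variable-name style without attach() is base R's with(), applied to a subset of the rows. This is only a sketch: the toy assignment2, the cut-off last_study1_row, and the names study1 and study2 are assumptions for illustration.

```r
# Hypothetical stand-in for the real data (5 rows instead of 510+).
assignment2 <- data.frame(
  age       = c(20, 30, 40, 50, 70),
  housesize = c(100, 120, 140, 200, 220)
)

# Assumed cut-off: last row of Study 1 (510 in the original post, 3 here).
last_study1_row <- 3

# Split the data frame by row position.
study1 <- assignment2[1:last_study1_row, ]
study2 <- assignment2[-(1:last_study1_row), ]

# with() evaluates the expression using the columns of the given data frame,
# so age and housesize can be written without the DF$ prefix.
age_summary_1    <- with(study1, summary(age))
housesize_mean_2 <- with(study2, mean(housesize))

print(age_summary_1)
print(housesize_mean_2)
```

Because each call names the data frame it reads from, there is no ambiguity about which study a given summary refers to.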
