Doing data analysis on .sav file - how to combine sub-items into the main variable name

RSLearner · June 18, 2025, 12:09pm

Hello,

Before analyzing my dataset (.sav file), I want to "extract" (meaning to view or use) specific variables which are relevant for my analysis. However, the main variable (e.g. Well) shows up as sub-variables (e.g. Well1, Well2, Well3...). Therefore, when using the combine function (i.e. c()) for "Well" and "Good" (as examples of the relevant variables that I wish to analyse), I don't get any results, as R does not know that "Well" consists of items "Well1, Well2, Well3 etc". My question: how do I get the values for one variable "Well" (consisting of the sub-items Well1, Well2, Well3...)? Do I need to create the final score for "Well" of this instrument first before being able to do the descriptive statistics for "Well" (and the other variables that are of interest to me)? If so, what's the best way to do that? Also, how do I deal with demographic data? Thanks so much.

jrkrideau · June 18, 2025, 2:10pm

It is hard to suggest anything without some idea of what the actual structure of the data looks like.

I assume that you have successfully read the data into an R data.frame or tibble. If so, can you supply us with some sample data?

The best way to supply data is to use the dput() function. Do dput(mydata) where "mydata" is the name of your dataset. For really large datasets, dput(head(mydata, 100)) is probably sufficient. Copy the output and paste it here between two lines that each contain only three backticks.

This gives us an exact copy of your data.
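
As a rough illustration of what dput() produces, here is a minimal sketch. The data frame "mydata" below is an invented stand-in (its column names and values are assumptions, not anything from the real dataset); the last lines check that the printed text can be turned back into the same object.

```r
# Hypothetical stand-in for your dataset; names and values are invented
mydata <- data.frame(id    = 1:3,
                     Well1 = c(4, 2, 5),
                     grp   = c("a", "b", "a"))

# Print R code that recreates the first two rows
dput(head(mydata, 2))

# Round trip: capture the dput() text, parse it back, and compare
txt     <- capture.output(dput(mydata))
rebuilt <- eval(parse(text = txt))
identical(rebuilt, mydata)
```

Whoever copies the dput() output can assign it to a name and work with the same columns and types that you have.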

RSLearner · June 23, 2025, 3:59pm

Hello, thanks a lot for your message. I can't share the exact data as it's confidential. That's why I tried to explain it using an example.

jrkrideau · June 23, 2025, 6:07pm

Okay.

I doubt that we need the actual data; what I think we need is a data sample that mimics the structure of your data. We need to know which variables are character, which are numeric, and so on.
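
One way to build such a sample without exposing anything confidential is sketched below. Everything in it is made up: the object name "fake", the item names Well1 to Well3 and Good1 to Good2, the demographic columns and the 1 to 5 response range are all assumptions chosen only to resemble a typical questionnaire layout.

```r
set.seed(1)  # make the fake values reproducible
n <- 6

# Every name and value here is invented; only the layout is meant to
# resemble a questionnaire with items plus a few demographic columns
fake <- data.frame(
  id    = seq_len(n),
  Well1 = sample(1:5, n, replace = TRUE),
  Well2 = sample(1:5, n, replace = TRUE),
  Well3 = sample(1:5, n, replace = TRUE),
  Good1 = sample(1:5, n, replace = TRUE),
  Good2 = sample(1:5, n, replace = TRUE),
  sex   = factor(sample(c("f", "m"), n, replace = TRUE)),
  age   = sample(18:65, n, replace = TRUE)
)

str(fake)    # shows the type of every column
dput(fake)   # contains no real responses, so it can be posted
```

If the column types in the real data differ from these (for example, items stored as character or factor), changing the fake columns to match is what makes the sample useful.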

At the moment I have no idea how to interpret 'R does not know that "Well" consists of items "Well1, Well2, Well3 etc"'. This could mean half-a-dozen different data structures.

Have a look at the very basic example below. If we just print dat1 and dat2 they look the same, but if we look at their structures using str() we see they are different, and we need to know this sort of thing to work with the data.

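# dat1: xx is explicitly stored as a factor
dat1 <- data.frame(xx = as.factor(c('a', 'b', 'c', 'd')),
                   yy = c(1, 2, 3, 4))

# dat2: xx is stored as character (the default in R >= 4.0.0)
dat2 <- data.frame(xx = c('a', 'b', 'c', 'd'),
                   yy = c(1, 2, 3, 4))

# Printed, the two data frames look identical
dat1
dat2

# str() reveals the difference in column types
str(dat1)
str(dat2)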
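
Purely as an illustration of why the structure matters: if the items turn out to be plain numeric columns named Well1, Well2, Well3 and so on, one possible way to get a single "Well" score is a row-wise mean. The sketch below uses invented data and assumes a mean is the right scoring rule; the instrument's manual may call for a sum or something else, and columns imported from SPSS with value labels may need converting to numeric first.

```r
# Invented example data: three items of a hypothetical "Well" scale
dat <- data.frame(Well1 = c(1, 4, 3),
                  Well2 = c(2, 5, NA),
                  Well3 = c(3, 3, 5),
                  Good1 = c(5, 4, 2))

# Pick every column whose name is "Well" followed by digits
well_items <- grep("^Well[0-9]+$", names(dat), value = TRUE)

# Check that all items are numeric before combining them
stopifnot(all(sapply(dat[well_items], is.numeric)))

# Scale score as the mean of the items; na.rm = TRUE ignores missing items
dat$Well <- rowMeans(dat[well_items], na.rm = TRUE)

# Descriptive statistics for the new score
summary(dat$Well)
```

The same pattern could be repeated for "Good" by changing the pattern passed to grep().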