No Error, but Wrong Results

Hello! I will start by saying that I'm new to R, so this may be a simple user error. The code below uses group_by() and summarise() for simple descriptive statistics. It 'works' in the sense that no Console or other errors are thrown, but the output is wrong (i.e. the 'min' is not the min for that grouped variable).

I'm using a simplified data set that looks as follows (but with additional rows):

  DAA1_Dataset.Tenure DAA1_Dataset.Monthly.Pay DAA1_Dataset.Gender..binary.
1                   1                      800                            0
2                   1                     1100                            0
3                   1                     1200                            1
4                   1                     1300                            1
5                   2                     1400                            1
6                   2                     1500                            1

However, when I run the following code against it, I get an output table, but it consistently lists the same results for both 'groups' even though they are clearly different.


descriptives <- Compact_DAA1 %>%
  dplyr::group_by(Compact_DAA1$DAA1_Dataset.Gender..binary.) %>%
  dplyr::summarize(Mean = mean(Compact_DAA1$DAA1_Dataset.Monthly.Pay),
                   Min = min(Compact_DAA1$DAA1_Dataset.Monthly.Pay),
                   Max = max(Compact_DAA1$DAA1_Dataset.Monthly.Pay))

I know the 'dplyr::' shouldn't be required but I tried troubleshooting based on some other common errors.

As noted the output looks 'wrong' with the table as shown:

# A tibble: 2 × 4
  `Compact_DAA1$DAA1_Dataset.Gender..binary.` Mean     Min   Max  
                                        <int> <chr>    <chr> <chr>
1                                           0 1,850.00 800   2900 
2                                           1 1,850.00 800   2900

Any idea why it's not computing the correct values?

Welcome to the community @MarchAprl! The results are the same because "Compact_DAA1$" is included in the summarise() call. An expression like "Compact_DAA1$DAA1_Dataset.Monthly.Pay" always refers to the full column of the original data frame, so the code returns the mean/min/max for the entire data set in every row instead of one value per group. Since you are piping your statements together, you can drop the "Compact_DAA1$" prefix and use the bare column names in both group_by() and summarise(). I hope this helps.


library(dplyr)  # provides %>%, group_by() and summarize()

Compact_DAA1 = data.frame(
  DAA1_Dataset.Tenure = c(1,1,1,1,2,2),
  DAA1_Dataset.Monthly.Pay = c(800, 1100, 1200, 1300, 100, 1500),
  DAA1_Dataset.Gender..binary. = c(0,0,1,1,1,1)
)

# Bare column names are evaluated within each group
descriptives <- Compact_DAA1 %>%
  dplyr::group_by(DAA1_Dataset.Gender..binary.) %>%
  dplyr::summarize(Mean = mean(DAA1_Dataset.Monthly.Pay),
                   Min = min(DAA1_Dataset.Monthly.Pay),
                   Max = max(DAA1_Dataset.Monthly.Pay))

descriptives

#> # A tibble: 2 × 4
#>   DAA1_Dataset.Gender..binary.  Mean   Min   Max
#>                          <dbl> <dbl> <dbl> <dbl>
#> 1                            0   950   800  1100
#> 2                            1  1025   100  1500

Created on 2023-01-22 with reprex v2.0.2.9000
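
If it helps to see the two forms side by side, here is a minimal sketch on the same sample data as the reply above. The object names with_dollar and bare_name are made up for illustration only. It assumes dplyr is installed, and it only computes Min to keep things short.

```r
library(dplyr)

# Same sample data as in the reply above
Compact_DAA1 <- data.frame(
  DAA1_Dataset.Tenure = c(1, 1, 1, 1, 2, 2),
  DAA1_Dataset.Monthly.Pay = c(800, 1100, 1200, 1300, 100, 1500),
  DAA1_Dataset.Gender..binary. = c(0, 0, 1, 1, 1, 1)
)

# Compact_DAA1$... pulls the full, ungrouped column every time,
# so every group gets the overall minimum.
with_dollar <- Compact_DAA1 %>%
  group_by(DAA1_Dataset.Gender..binary.) %>%
  summarise(Min = min(Compact_DAA1$DAA1_Dataset.Monthly.Pay))

# A bare column name is looked up within the current group.
bare_name <- Compact_DAA1 %>%
  group_by(DAA1_Dataset.Gender..binary.) %>%
  summarise(Min = min(DAA1_Dataset.Monthly.Pay))

with_dollar$Min  # 100 100 (overall minimum repeated)
bare_name$Min    # 800 100 (one minimum per group)
```

The first result reproduces the symptom from the original question: identical values in every row, with no error or warning.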

We need to see your data. At the moment, the data you posted and the code you posted use inconsistent column names.

A handy way to supply some sample data is the dput() function. Just run dput(mydata), where mydata is your data, then copy the output and paste it here. In the case of a large dataset, something like dput(head(mydata, 100)) should supply the data we need.
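
As a rough sketch of what that looks like in practice, the example below uses the sample data from earlier in the thread in place of mydata. The names dput_text and rebuilt are made up for illustration. The round trip at the end is only there to show that the pasted text is enough for someone else to recreate the data frame.

```r
# Sample data from earlier in the thread, standing in for mydata
Compact_DAA1 <- data.frame(
  DAA1_Dataset.Tenure = c(1, 1, 1, 1, 2, 2),
  DAA1_Dataset.Monthly.Pay = c(800, 1100, 1200, 1300, 100, 1500),
  DAA1_Dataset.Gender..binary. = c(0, 0, 1, 1, 1, 1)
)

# Print a copy-pasteable definition of the first three rows;
# this printed text is what gets pasted into the forum post.
dput(head(Compact_DAA1, 3))

# Round trip: capture that text and rebuild the data frame from it,
# which is what a reader does after pasting it into their own session.
dput_text <- capture.output(dput(head(Compact_DAA1, 3)))
rebuilt <- eval(parse(text = dput_text))

# Compare the rebuilt copy with the original rows
all.equal(rebuilt, head(Compact_DAA1, 3))
```

Unlike a pasted console table, the dput() output keeps column names and types exactly as they are in the session.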

This worked perfectly, and was so simple. I know answering newb questions is annoying but thank you so much for your kindness and time.

I hope you have a lovely day, and while I'm unlikely to be able to pay you back (being internet mostly-strangers and all), I'll do my best to pay it forward today!

jrkrideau - thank you, I was doing my best to format the question using best practices, but this is very helpful. While I hope there isn't a next time, I will definitely keep this in mind for supplying better data!

You're welcome, and I'm glad it worked! Answering newb questions is never annoying. We're all here to learn and collectively expand our knowledge, so this was just one more opportunity to do that. Have a great day!
