Optimal graphical representation

EO_1997 · August 24, 2021, 1:02pm

Hi everybody,

Sorry in advance for my English. I hope I can explain my problem clearly.

I am a beginner with RStudio, so I need your help.

My data come from the following survey:
Participants: approx. 1200
Questions: 13, each rated on a scale of 1 to 7 (1 = very unimportant, 7 = very important), one for each way to shop (for example: internet, shopping mall, catalogue, discounter, etc.)
Plus 1 question on age, ranging from 10 to 95 years.

I have to show that the importance of each way of shopping depends on age (for example: younger people prefer online shopping, older people prefer shops in the city).

Now my question:
How can I represent this graphically without it becoming cluttered?

I tried it with:
ggplot(data_xls, aes(x = D1_Age, color = factor(X4.1_Online), fill = factor(X4.1_Online))) +
  geom_histogram(binwidth = 0.5, position = "stack") + ....

But this only shows one of the 13 options.
Is there any way to show all 13 options in one or two plots?

Thanks for your help and ideas!

If you need further information, I can try to provide it.

mattbrown · August 24, 2021, 1:44pm

I'm not certain I completely understand your data. If my answer doesn't make sense you could try providing some sample data just so it's clear what you're working with - fake values are fine, we just need to see the structure.

Having said that, I think it would help to have your data in long format instead of wide, like the fake data in the example below (one row per response, with columns for score, type and age). Long format often makes ggplot easier to work with.
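For instance, here is a minimal sketch of what "long" means, using two hypothetical respondents and two made-up rating columns (`online`, `mall`). In wide format each shopping option is its own column; `pivot_longer()` from tidyr stacks those columns into one `type` column and one `score` column, giving one row per respondent per option.

```r
library(tidyr)

# Hypothetical wide data: one row per respondent, one column per shopping option
wide <- data.frame(
  id     = c(1, 2),
  online = c(7, 2),
  mall   = c(5, 6),
  age    = c(26, 60)
)

# Long data: one row per respondent per option (id and age are repeated)
long <- pivot_longer(wide,
                     cols = c(online, mall),
                     names_to = "type",
                     values_to = "score")
long
```

The result has four rows (two respondents times two options) with the columns `id`, `age`, `type` and `score`.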

The following might work for you once you have your data in long format.

put your ages into groups (10-19, 20-29, etc.)
use boxplots to show the score distributions for the age groups
use facet_wrap() or facet_grid() to see the various shopping types together
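As a sketch of the first step, base R's `cut()` can produce the same decade groups as the `case_when()` call in the example below in a single call. The breaks assume ages start at 10, as in the survey; `right = FALSE` makes each interval include its lower bound (so [20, 30) becomes "20-29"), and the last interval is open-ended.

```r
ages <- c(10, 19, 20, 45, 69, 70, 95)

age_group <- cut(ages,
                 breaks = c(seq(10, 70, by = 10), Inf),
                 right  = FALSE,   # intervals are [10,20), [20,30), ..., [70,Inf)
                 labels = c("10-19", "20-29", "30-39", "40-49",
                            "50-59", "60-69", "70+"))
as.character(age_group)
```

One difference from `case_when()`: `cut()` returns a factor whose levels are already in age order, which ggplot uses for the x axis, and any age below 10 becomes `NA` rather than "--ERROR--".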

Here is an example of what I mean.

library(dplyr)
library(ggplot2)

set.seed(314159)

# FAKE DATA CREATION
dat <- data.frame(score = sample(1:7, 1200, replace = TRUE),
                  type = sample(c("internet", 
                                  "mall", 
                                  "catalogue", 
                                  sprintf("type_%02d", 4:13)), 
                                1200, replace = TRUE),
                  age = sample(10:95, 1200, replace = TRUE))

# turn age into age GROUPS for easier plotting/comparing
dat2 <- dat %>% 
  mutate(age_group = case_when(
    age >= 70 ~ "70+",
    age >= 60 ~ "60-69",
    age >= 50 ~ "50-59",
    age >= 40 ~ "40-49",
    age >= 30 ~ "30-39",
    age >= 20 ~ "20-29",
    age >= 10 ~ "10-19",
    TRUE ~ "--ERROR--"
  ))

ggplot(dat2,
       aes(x = age_group,
           y = score)) +
  geom_boxplot() +
  facet_wrap(.~type)

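If 13 panels of boxplots still feel cluttered, one alternative (a suggestion, not something tested on the real survey) is to summarise each age group by its mean score per shopping option and show all options in a single heatmap. The sketch below rebuilds fake data in the same shape as `dat2` so it runs on its own; with real data you would start from your long table instead. Since the ratings are ordinal, medians are a defensible alternative to means; means are used here only because they separate the tiles more finely.

```r
library(dplyr)
library(ggplot2)

set.seed(314159)

# FAKE long data in the same shape as dat2: one row per response
dat_long <- data.frame(
  score = sample(1:7, 1200, replace = TRUE),
  type  = sample(c("internet", "mall", "catalogue", sprintf("type_%02d", 4:13)),
                 1200, replace = TRUE),
  age   = sample(10:95, 1200, replace = TRUE)
) %>%
  mutate(age_group = cut(age,
                         breaks = c(seq(10, 70, by = 10), Inf),
                         right  = FALSE,
                         labels = c("10-19", "20-29", "30-39", "40-49",
                                    "50-59", "60-69", "70+")))

# Mean importance and number of responses per shopping option and age group
summary_tbl <- dat_long %>%
  group_by(type, age_group) %>%
  summarise(mean_score = mean(score), n = n(), .groups = "drop")

# One tile per (age group, option); the fill colour shows the mean on the 1-7 scale
p <- ggplot(summary_tbl, aes(x = age_group, y = type, fill = mean_score)) +
  geom_tile() +
  geom_text(aes(label = round(mean_score, 1)), size = 3) +
  scale_fill_gradient(low = "white", high = "steelblue", limits = c(1, 7)) +
  labs(x = "Age group", y = NULL, fill = "Mean score")
p
```

The `n` column is worth checking before interpreting the colours: a tile based on only a handful of respondents can look extreme by chance.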

EO_1997 · August 24, 2021, 3:06pm

Thank you for your response. That looks very good.
Putting the ages into groups is a good idea.

But I have problems getting my data into your example. I don't have a separate column like "score".
The score is stored in the column for each shopping option. I'll try to provide a sample.

Here is a sample of my data (I hope it works):

X4.1_Online	X4.2_Mall	X4.3_Discounter	X4.4_market	X4.5_Outlet	X4.6_	X4.7_	X4.8_	X4.9_	X4.10_	X4.11_	X4.12_	X4.13_	D1_Age
7	5	5	6	6	1	1	1	1	5	4	7	7	26
5	4	4	5	1	7	1	1	1	6	2	2	7	40
7	2	2	4	2	3	3	2	1	5	5	4	5	25
7	5	5	5	6	6	5	4	1	5	4	6	7	56
4	1	1	4	6	1	1	1	1	1	2	4	1	28
1	4	4	4	1	1	1	1	5	2	1	1	4	72
2	6	1	7	7	2	4	1	1	1	1	5	4	60
1	3	3	2	2	3	4	5	7	4	7	5	7	60
6	1	1	4	4	1	3	1	1	4	1	3	5	23

X4.1 to X4.13 are the different shopping options, and the numbers below them are the importance ratings given by each respondent.

How can I get this into your boxplot overview?

Thank you for your help!!

mattbrown · August 24, 2021, 3:48pm

You can use the pivot_longer() function from the tidyr package.

Starting from the data you provided, you can do something like below.

library(dplyr)
library(tidyr)
library(ggplot2)

raw_data <- data.frame(
      X4.1_Online = c(7L, 5L, 7L, 7L, 4L, 1L, 2L, 1L, 6L),
        X4.2_Mall = c(5L, 4L, 2L, 5L, 1L, 4L, 6L, 3L, 1L),
  X4.3_Discounter = c(5L, 4L, 2L, 5L, 1L, 4L, 1L, 3L, 1L),
      X4.4_market = c(6L, 5L, 4L, 5L, 4L, 4L, 7L, 2L, 4L),
      X4.5_Outlet = c(6L, 1L, 2L, 6L, 6L, 1L, 7L, 2L, 4L),
            X4.6_ = c(1L, 7L, 3L, 6L, 1L, 1L, 2L, 3L, 1L),
            X4.7_ = c(1L, 1L, 3L, 5L, 1L, 1L, 4L, 4L, 3L),
            X4.8_ = c(1L, 1L, 2L, 4L, 1L, 1L, 1L, 5L, 1L),
            X4.9_ = c(1L, 1L, 1L, 1L, 1L, 5L, 1L, 7L, 1L),
           X4.10_ = c(5L, 6L, 5L, 5L, 1L, 2L, 1L, 4L, 4L),
           X4.11_ = c(4L, 2L, 5L, 4L, 2L, 1L, 1L, 7L, 1L),
           X4.12_ = c(7L, 2L, 4L, 6L, 4L, 1L, 5L, 5L, 3L),
           X4.13_ = c(7L, 7L, 5L, 7L, 1L, 4L, 4L, 7L, 5L),
           D1_Age = c(26L, 40L, 25L, 56L, 28L, 72L, 60L, 60L, 23L)
) %>% 
  # add an ID column for clarity when looking at the long data - not technically needed
  mutate(id = 1:n())


long_data <- raw_data %>% 
  # list the relevant columns (the ones that have scores in them) as a vector
  # of strings, or use a selection helper. starts_with("X4") grabs every column
  # whose name begins with "X4"; that works for your sample data, but you may
  # need to change it for the full data set.
  pivot_longer(cols = starts_with("X4"),
               names_to = "type",
               values_to = "score") %>% 
  # group the ages
  mutate(age_group = case_when(
    D1_Age >= 70 ~ "70+",
    D1_Age >= 60 ~ "60-69",
    D1_Age >= 50 ~ "50-59",
    D1_Age >= 40 ~ "40-49",
    D1_Age >= 30 ~ "30-39",
    D1_Age >= 20 ~ "20-29",
    D1_Age >= 10 ~ "10-19",
    TRUE ~ "--ERROR--"
  ))

ggplot(long_data,
       aes(x = age_group,
           y = score)) +
  geom_boxplot() +
  facet_wrap(.~type)
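One thing to watch in the plot above: `facet_wrap()` orders the panels alphabetically, so X4.10_ to X4.13_ land between X4.1_Online and X4.2_Mall. A minimal sketch of one fix, assuming every score column follows the `X4.<number>_<label>` pattern seen in the sample, is to pull the item number out of the column name with `sub()` and order the factor levels by it. The two-respondent data frame here is a hypothetical cut-down of the sample.

```r
library(dplyr)
library(tidyr)

# Hypothetical cut-down of the sample data (only some of the X4 columns)
raw <- data.frame(
  X4.1_Online = c(7L, 5L),
  X4.2_Mall   = c(5L, 4L),
  X4.10_      = c(5L, 6L),
  D1_Age      = c(26L, 40L)
)

long <- raw %>%
  pivot_longer(cols = starts_with("X4"),
               names_to = "type",
               values_to = "score") %>%
  mutate(
    # "X4.10_" -> 10, "X4.1_Online" -> 1
    item = as.integer(sub("^X4\\.(\\d+)_.*$", "\\1", type)),
    # factor levels in item order, so facets run X4.1, X4.2, ..., X4.10
    type = factor(type, levels = unique(type[order(item)]))
  )

levels(long$type)
```

Adding the same `mutate()` step to `long_data` before calling `ggplot()` should make the panels appear in questionnaire order, because `facet_wrap()` follows the factor levels.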

system · September 14, 2021, 3:49pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.
