Pivot table logic in R

VisualisationNewbie1 · March 29, 2021, 8:20am

Hello, what chart type would you recommend for this many items? The requirement is to show the quantity sold of ALL products sold in Portugal. I have the query below to pull the results, but I do not know the best way to present roughly 700 items!

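library(dplyr)
# Read the data and keep only the rows for Portugal
data <- read.csv("onlineretail1.csv")
data_portugal <- data[data$Country == "Portugal", ]
# Total quantity sold per product description (circa 700 rows)
first_plot <- data_portugal %>%
  group_by(Description) %>% summarise(Sold_Qty = sum(Quantity))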

Appreciate your help in advance.

nirgrahamuk · March 29, 2021, 8:51am

In my experience, not all "requirements" are created equal. I expect poor requirements, even when satisfied, to result in unsatisfactory outcomes. I always challenge requirements.

Part of being an expert is using your expertise to explain to your stakeholders why they shouldn't want what they first said they wanted.

It's actually better to understand the rationale and motives behind the original requirements, as that allows you to collaborate on fresh requirements that are more likely to satisfy an actual need.

VisualisationNewbie1 · March 29, 2021, 10:54am

I completely agree. Following conversations with the business about exactly how unwieldy this visual will be, we have discussed additional data categories that can be drilled down into. However, to make the point, I have still been asked to create the visual, and I am at a loss beyond a long, long bar chart.
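If the long bar chart is unavoidable, a minimal sketch of making it at least legible is to draw the bars horizontally, sort them by quantity, and save the plot to a very tall file. The data frame below is a hypothetical stand-in for the `first_plot` summary from the question (random quantities, made-up product names), and the 0.12 inches per bar and the font size are arbitrary assumptions to tune. It assumes ggplot2 3.3.0 or later, which draws horizontal bars when the discrete variable is mapped to `y`.

```r
library(ggplot2)

# Hypothetical stand-in for first_plot (Description, Sold_Qty) with 700 products
set.seed(1)
first_plot <- data.frame(
  Description = paste("Product", 1:700),
  Sold_Qty = sample(1:500, 700, replace = TRUE)
)

# Horizontal bars, sorted so the biggest sellers sit at the top
p <- ggplot(first_plot, aes(x = Sold_Qty, y = reorder(Description, Sold_Qty))) +
  geom_col(fill = "steelblue") +
  labs(x = "Quantity sold", y = NULL) +
  theme(axis.text.y = element_text(size = 4))

# Roughly 0.12 inches per bar; limitsize = FALSE allows output taller than 50 inches
out_file <- tempfile(fileext = ".pdf")
ggsave(out_file, plot = p, width = 8, height = 0.12 * nrow(first_plot),
       units = "in", limitsize = FALSE)
```

Saving to PDF keeps the product labels sharp when the reader zooms in, which matters with 700 rows.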

edgararuiz · March 29, 2021, 1:07pm

Another idea would be to check whether your top-selling products make up the lion's share of the sales. For example, you could list your top 25 products individually and then group everything else under an "Other" label.
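A minimal sketch of the "top 25 plus Other" idea with dplyr, assuming a summary table shaped like `first_plot` from the question (`Description`, `Sold_Qty`). The data here is a hypothetical random stand-in, and `top_k` is just a name chosen for illustration. It also computes the share of all units that the top products account for, which answers the "lion's share" question directly.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for first_plot: 700 products with random quantities
set.seed(1)
first_plot <- tibble(
  Description = paste("Product", 1:700),
  Sold_Qty = sample(1:500, 700, replace = TRUE)
)

top_k <- 25  # number of products to keep as individual bars

lumped <- first_plot %>%
  arrange(desc(Sold_Qty)) %>%
  # Keep the top_k best sellers by name and relabel the rest as "Other"
  mutate(Label = if_else(row_number() <= top_k, Description, "Other")) %>%
  group_by(Label) %>%
  summarise(Sold_Qty = sum(Sold_Qty), .groups = "drop")

# Fraction of all units sold that the top_k products account for
top_share <- sum(lumped$Sold_Qty[lumped$Label != "Other"]) / sum(lumped$Sold_Qty)

# Horizontal bar chart, largest bar at the top; print(p) to draw it
p <- ggplot(lumped, aes(x = Sold_Qty, y = reorder(Label, Sold_Qty))) +
  geom_col() +
  labs(x = "Quantity sold", y = NULL,
       title = sprintf("Top %d products = %.0f%% of units sold", top_k, 100 * top_share))
```

With real data, if `top_share` turns out small, that is itself a useful finding to take back to the business.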

FJCC · March 29, 2021, 1:34pm

I would use a line chart if the individual points must be shown, and a histogram if the range of values is not too large.

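library(ggplot2)
set.seed(1)  # make the simulated data reproducible
# Simulated data: 700 products with sales between 100 and 1000
DF <- data.frame(Prod = paste0("P", 1:700), Sales = runif(700, 100, 1000))
# Line chart: one point per product; x-axis labels hidden because 700 would overlap
ggplot(DF, aes(Prod, Sales, group = 1)) + geom_line() +
  theme(axis.text.x = element_blank())

# Histogram: number of products in each 100-unit band of sales
ggplot(DF, aes(Sales)) + geom_histogram(binwidth = 100, fill = "skyblue", color = "white")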

Created on 2021-03-29 by the reprex package (v0.3.0)

VisualisationNewbie1 · March 29, 2021, 2:37pm

On a similar line of thought, are you aware of any functions that could group items based on a word their descriptions have in common, for example 'mug' or 'cake'?

edgararuiz · March 29, 2021, 7:51pm

Oh man, I do! @julia's and @drob's super-cool tidytext package will let you split the words in each description into their own records. You then have a table with product_id and word variables that you can analyze to decide which product IDs you wish to group. See "1 The tidy text format" in Text Mining with R.
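A minimal sketch of that workflow, assuming the dplyr and tidytext packages. The five product rows are hypothetical toy data, and the keyword list (`"mug"`, `"cake"`) stands in for whatever words you pick after looking at `word_counts`. `unnest_tokens()` produces one row per product and word (lower-cased by default); a product takes the first keyword it matches, and anything unmatched falls into `"other"`.

```r
library(dplyr)
library(tidytext)

# Hypothetical toy data standing in for the product summary
products <- tibble(
  product_id = 1:5,
  Description = c("RED SPOTTY MUG", "BLUE SPOTTY MUG", "CHOCOLATE CAKE STAND",
                  "VINTAGE CAKE TIN", "PAPER NAPKINS"),
  Sold_Qty = c(24, 12, 6, 8, 48)
)

# One row per (product, word); the Description column is dropped
product_words <- products %>%
  unnest_tokens(word, Description)

# How many products contain each word: a guide for choosing keywords
word_counts <- product_words %>%
  count(word, sort = TRUE)

keywords <- c("mug", "cake")  # hypothetical keywords chosen from word_counts

# First matching keyword per product becomes its group
groups <- product_words %>%
  filter(word %in% keywords) %>%
  distinct(product_id, .keep_all = TRUE) %>%
  select(product_id, group = word)

grouped <- products %>%
  left_join(groups, by = "product_id") %>%
  mutate(group = coalesce(group, "other"))

# Quantity sold per keyword group
group_totals <- grouped %>%
  group_by(group) %>%
  summarise(Sold_Qty = sum(Sold_Qty))
```

On the real data you would likely also drop common words with `anti_join(stop_words, by = "word")` before counting.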

VisualisationNewbie1 · March 31, 2021, 9:47am

While reading up on this, I have also come across k-means clustering. If anyone has experience with it, would it be suitable here?
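k-means needs numeric features, so one option is to turn each description into a 0/1 word-presence vector first and then cluster those vectors. This is a minimal base-R sketch under stated assumptions: the four descriptions are hypothetical toy data, and `centers = 2` is a guess you would have to choose for the real data (for example by comparing `tot.withinss` across several values). The cluster numbers themselves are arbitrary labels.

```r
# Hypothetical toy descriptions
descriptions <- c("RED SPOTTY MUG", "BLUE SPOTTY MUG",
                  "CHOCOLATE CAKE STAND", "VINTAGE CAKE STAND")

# Split each description into lower-case words
words <- strsplit(tolower(descriptions), "[^a-z]+")
vocab <- sort(unique(unlist(words)))

# Binary product-by-word matrix: 1 if the word appears in the description
dtm <- t(vapply(words, function(w) as.numeric(vocab %in% w), numeric(length(vocab))))
colnames(dtm) <- vocab
rownames(dtm) <- descriptions

# k-means with several random starts; the best solution is kept
set.seed(42)
km <- kmeans(dtm, centers = 2, nstart = 10)

# Descriptions listed by cluster
split(descriptions, km$cluster)
```

On 700 real descriptions the vocabulary will be much larger and sparser, so the keyword grouping above may be easier to explain to the business than k-means clusters.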

system · April 21, 2021, 9:48am

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.