creating graphs with order ids

hello99999 · December 12, 2020, 10:42am

Hello,

I am working on a project with a dataset named olist_full, which contains 112650 observations. I want to focus on 2 specific columns: order_id, which is the unique order identifier (one order, one id), and order_item_id, which numbers the items within each order.
Several rows can share the same order_id, because an order with more than one item appears once per item.

What I am interested in is a graph of the quantity of orders by year, month, day of the week, and so on.
However, I used order_item_id, which biases the results: an order gets counted two or three times, because the plot reflects the items in the order (sometimes more than 1) rather than the order id itself.
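The overcounting described above can be reproduced on a tiny made-up table. This is only a sketch: the data frame `toy` and its values are hypothetical, not taken from olist_full. It compares the bar height that stacking order_item_id produces with the number of distinct order ids per year.

```r
# Hypothetical toy data: order "a" has 3 items, "b" has 1, "c" has 2
toy <- data.frame(
  order_id      = c("a", "a", "a", "b", "c", "c"),
  order_item_id = c(1, 2, 3, 1, 1, 2),
  year          = c(2017, 2017, 2017, 2017, 2018, 2018)
)

# What a stacked column of order_item_id adds up to per year
stacked_height <- tapply(toy$order_item_id, toy$year, sum)

# What is actually wanted: the number of distinct orders per year
distinct_orders <- tapply(toy$order_id, toy$year, function(x) length(unique(x)))

print(stacked_height)
print(distinct_orders)
```

In this toy case 2017 holds two orders, yet the stacked value is 7, which is the kind of inflation the plot suffers from.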

This is what I obtained by using the following code:

library(ggplot2)

# Stacks order_item_id per year, so orders with several items are counted more than once
Quantitytimeyear <- ggplot(olist_full, aes(x = factor(year), y = order_item_id)) +
  geom_col() +
  labs(title = "Quantity of orders over time (in years)", y = "Quantity of orders", x = "Order date") +
  theme(plot.title = element_text(hjust = 0.5))

What I want is to create graphs that take the order id into account rather than the items in the order, so I would have to translate the ids into counts. How can I do that?

Thank you
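
One possible way to count each order once is to count distinct order_id values per year before plotting. The sketch below assumes the dplyr and ggplot2 packages are installed and uses a small hypothetical stand-in, `olist_toy`, in place of olist_full; only the column names order_id, order_item_id and year come from the post.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for olist_full (same column names, made-up values)
olist_toy <- data.frame(
  order_id      = c("a", "a", "a", "b", "c", "c"),
  order_item_id = c(1, 2, 3, 1, 1, 2),
  year          = c(2017, 2017, 2017, 2017, 2018, 2018)
)

# One row per year, with the number of distinct order ids in that year
orders_per_year <- olist_toy %>%
  group_by(year) %>%
  summarise(n_orders = n_distinct(order_id))

# Plot the pre-computed counts instead of the raw item numbers
orders_plot <- ggplot(orders_per_year, aes(x = factor(year), y = n_orders)) +
  geom_col() +
  labs(title = "Quantity of orders over time (in years)",
       y = "Quantity of orders", x = "Order date") +
  theme(plot.title = element_text(hjust = 0.5))
```

The same grouping should carry over to a month or day-of-week column, if the data has one, by swapping it in for year in group_by() and aes().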

andresrcs · December 12, 2020, 11:39am

Can you please share a small part of the data set in a copy-paste friendly format?

In case you don't know how to do it, there are many options, which include:

If you have stored the data set in some R object, the dput() function is very handy.
In case the data set is in a spreadsheet, check out the datapasta package. Take a look at this link.
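
As a rough illustration of the dput() option, the sketch below prints a small data frame as R code that can be pasted into a post. The object `sample_rows` and its values are hypothetical; with the real data, something like the first few rows of the relevant columns would take its place.

```r
# Hypothetical small sample standing in for a few rows of the real data
sample_rows <- data.frame(
  order_id      = c("a", "a", "b"),
  order_item_id = c(1L, 2L, 1L),
  year          = c(2017L, 2017L, 2018L)
)

# Prints a structure(...) call that recreates the data frame when pasted into R
dput(sample_rows)

# Round trip: capture the printed code and evaluate it again
pasted_code <- paste(capture.output(dput(sample_rows)), collapse = "\n")
restored <- eval(parse(text = pasted_code))
```

Anyone reading the post can then assign the pasted structure(...) call to a name and run code against the same data.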

nirgrahamuk · December 12, 2020, 11:53am

Hello,
I'm sure you shared this image with the best intentions, but perhaps you didn't realise what it implies.
If someone wished to use example data to test code against, they would type it out from your screenshot...

This is very unlikely to happen, and so it reduces the likelihood you will receive the help you desire.
Therefore, please see this guide on how to reprex data. Key to this is the use of either datapasta or dput() to share your data as code.

FAQ: How to do a minimal reproducible example ( reprex ) for beginners (Guides & FAQs)

A minimal reproducible example consists of the following items:

A minimal dataset, necessary to reproduce the issue
The minimal runnable code necessary to reproduce the issue, which can be run on the given dataset, and including the necessary information on the used packages.

Let's quickly go over each one of these with examples:

Minimal Dataset (Sample Data)

You need to provide a data frame that is small enough to be (reasonably) pasted on a post, but big enough to reproduce your issue. Let's say, as an example, that you are working with the iris data frame

head(iris)
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1 5.1 3.5 1.4 0.…
