creating graphs with order ids

Hello,

I am working on a project and I have a dataset named olist_full which contains 112650 observations but I want to focus on 2 specific columns which are order_id and the order_item_id, the order_id which is the order unique identifier (one order, one id) and the order_item_id which indicates the number of items in each order.
it can be that 2 columns have the same orders id cause the same customer placed different orders.

What I am interested in is to build a graph with the quantity of orders in terms of year, months, day of the week....
However I used the order_item_id which biases the results cause it counts orders twice or three times as it does not consider the id but the number of items in the id, which may be more than 1 sometimes.

this is what I obtained by using the following code :

Quantitytimeyear <- ggplot(olist_full, aes(x=year, y=order_item_id)) +
geom_col(aes(x=factor(year), y=order_item_id), stat="identity") + labs(title ="Quantity of orders over time (in years)", y="Quantity of orders", x="Order date") +
theme(
plot.title = element_text(hjust = 0.5),
)

What I want to do is to create graphs which take into account the order id and not the items in the order, so I would have to translate the ids into numbers, how can I do that ?

Thank you

Can you please share a small part of the data set in a copy-paste friendly format?

In case you don't know how to do it, there are many options, which include:

  1. If you have stored the data set in some R object, dput function is very handy.

  2. In case the data set is in a spreadsheet, check out the datapasta package. Take a look at this link.

Hello,
I'm sure you shared this image with the best intentions, but perhaps you didnt realise what it implies.
If someone wished to use example data to test code against, they would type it out from your screenshot...

This is very unlikely to happen, and so it reduces the likelihood you will receive the help you desire.
Therefore please see this guide on how to reprex data. Key to this is use of either datapasta, or dput() to share your data as code

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.