Hello, I am new to R programming and struggling to solve what appears to be a simple problem in exploring a dataset. If anyone browsing has the knowledge and could share it with me that would be fantastic for my learning.
Here is my problem:
I am trying to sort unique values in the "article" column alphabetically. I tried using this:
bakery_sales[order(bakery_sales$article),]
This sorted my column alphabetically but I did not want to sort the entire dataset, only the unique values within the column.
So I added unique to the same code here but did not get the results I wanted:
It isn't totally clear to me whether you want to pull the article column out of the dataframe and sort the unique values, or just sort the whole dataframe according to the article column. However, neither of these is very difficult:
To get just the sorted unique values, pull the column out of the dataframe as a vector, remove the duplicate values with unique(), then sort the result with sort().
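Here is a minimal sketch of that vector approach in base R. The rows in bakery_sales below are made up so the example runs on its own; only the dataframe name and the article column come from your post, and the quantity column is purely an assumption.

```r
# Toy stand-in for the real data: the values and the quantity column
# are invented for illustration only.
bakery_sales <- data.frame(
  article  = c("CROISSANT", "BAGUETTE", "PAIN", "BAGUETTE", "CROISSANT"),
  quantity = c(2, 1, 1, 3, 1),
  stringsAsFactors = FALSE
)

# Pull the column out as a vector, drop the duplicates, then sort alphabetically.
sorted_articles <- sort(unique(bakery_sales$article))
sorted_articles
```

The original dataframe is left untouched; sorted_articles is just a character vector with one entry per distinct article.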
For the dataframe, you just use dplyr::arrange on the column you want to sort by.
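As a rough sketch of the dataframe version, assuming the dplyr package is installed. Again, the rows and the quantity column are invented; only bakery_sales and article come from your post.

```r
library(dplyr)

# Toy stand-in for the real data: the values and the quantity column
# are invented for illustration only.
bakery_sales <- data.frame(
  article  = c("CROISSANT", "BAGUETTE", "PAIN", "BAGUETTE", "CROISSANT"),
  quantity = c(2, 1, 1, 3, 1),
  stringsAsFactors = FALSE
)

# arrange() returns the whole dataframe with its rows reordered by article.
# Duplicate articles are kept; nothing is removed.
sorted_sales <- arrange(bakery_sales, article)
sorted_sales
```

This should give the same row order as your bakery_sales[order(bakery_sales$article),] line, just written in dplyr style.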
Thank you so much. I tried a different approach and got what I needed by creating a new frame with what I needed but your solution was much more succinct and cleaner.
I did this:
items <- unique(bakery_sales$article)
and simply scrolled through the values to accomplish what I needed in terms of exploration, but that led me to a new question.
Is it best practice to create a new dataframe, list, or table when doing something like this, so that you can manipulate the data within it? Or is it better to keep your data environment condensed and simple, with as little code as possible?
That's a good question. And as with all good questions, the answer is: it depends. I can spell out my general thoughts, but I would need to know a bit more about what you are trying to do. Why are you looking at this data? What questions are you trying to answer? Or what problems are you trying to solve?
Personally, I like to stay with data.frames until the very end of an analysis. So if the question you are trying to answer is "how many unique article values are there?", then I would keep a data.frame until the last step and only then pull out the unique article values. And if this is an input to another step in your analysis, then I would keep the data.frame throughout.
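To make that concrete, here is one way it could look in base R. This is only a sketch under assumptions: the rows and the quantity column are invented, and articles_df and n_articles are names I made up for the example.

```r
# Toy stand-in for the real data: the values and the quantity column
# are invented for illustration only.
bakery_sales <- data.frame(
  article  = c("CROISSANT", "BAGUETTE", "PAIN", "BAGUETTE", "CROISSANT"),
  quantity = c(2, 1, 1, 3, 1),
  stringsAsFactors = FALSE
)

# Stay in a data.frame: one row per distinct article, sorted alphabetically.
# Single brackets with a column name keep the result a data.frame, and
# drop = FALSE stops the one-column result from collapsing to a vector.
articles_df <- unique(bakery_sales["article"])
articles_df <- articles_df[order(articles_df$article), , drop = FALSE]

# Only at the very last step, reduce to the answer you actually need.
n_articles <- nrow(articles_df)
n_articles
```

Because articles_df is still a data.frame, it can be fed into a later step (a join, a filter, a plot) without converting it back.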
If you are trying to make the fastest possible solution, you want to keep as few objects in memory as possible, and the smaller they are the better (as a general rule). However, I would NOT recommend trying to optimize for speed or size until speed or size becomes a problem. If you know upfront that blazing speed is a requirement for your program, you should probably consider choosing a different language than R.