I've got some data sets (about 30 at the moment) that I'm trying to combine into 1 data table for analysis. Unfortunately several of the sets have a different number of observations from the others, so I'm trying to combine them and fill in with blanks or NA those observations missing from some data sets.
At present, each set consists of 3 columns: the first 2 are a reference code and a description, and the 3rd is the relevant data. A typical data set looks like:
Group, Element | Description                | Data
20001          | FileMetaInformationVersion | 2 bytes - 00 01
20002          | MediaStorageSOPClassUID    | 1378549247
(and so on for a varying number of rows. Anyone who works in Radiology probably has an idea of what I'm trying to do here!).
So what I'd like to do is combine the data sets so that each data set matches up with the rows that they have in common (defined by Group, Element & Description column values), but where a set is missing a row that is present in another set, that value is added in with NA.
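One way to get this in base R is a full outer join across all the sets with `merge(..., all = TRUE)`, folded over a list with `Reduce()`. This is only a sketch: the column names (`GroupElement`, `Description`, `Data`), the set names, and the placeholder values are assumptions for illustration, not the real files.

```r
# Toy data sets; column names and placeholder values are hypothetical.
set1 <- data.frame(
  GroupElement = c("20001", "20002", "20010"),
  Description  = c("FileMetaInformationVersion", "MediaStorageSOPClassUID",
                   "TransferSyntaxUID"),
  Data         = c("2 bytes - 00 01", "1378549247", "placeholder-a"),
  stringsAsFactors = FALSE
)
set2 <- data.frame(
  GroupElement = c("20001", "20002", "20012"),
  Description  = c("FileMetaInformationVersion", "MediaStorageSOPClassUID",
                   "ImplementationClassUID"),
  Data         = c("2 bytes - 00 01", "1378549247", "placeholder-b"),
  stringsAsFactors = FALSE
)
set3 <- data.frame(
  GroupElement = c("20002", "20001"),  # deliberately out of order
  Description  = c("MediaStorageSOPClassUID", "FileMetaInformationVersion"),
  Data         = c("1378549247", "2 bytes - 00 01"),
  stringsAsFactors = FALSE
)
sets <- list(set1 = set1, set2 = set2, set3 = set3)

# Rename each Data column to its set name so the columns do not collide.
renamed <- Map(function(df, nm) {
  names(df)[names(df) == "Data"] <- nm
  df
}, sets, names(sets))

# Full outer join on both key columns; rows missing from a set become NA.
combined <- Reduce(function(x, y) {
  merge(x, y, by = c("GroupElement", "Description"), all = TRUE)
}, renamed)

print(combined)
```

With about 30 sets, the same `Reduce()` call should work unchanged as long as every data frame is in the list and the key columns are named identically.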
I'm assuming it might be handy to rotate the table so Group, Value becomes the header and then each set fills in a row of observations underneath.
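Rotating the table could be done with `t()` once the sets are side by side. The sketch below assumes a hypothetical combined data frame with one column per data set (`set1`, `set2`) and made-up placeholder values; the column names are not from the real data.

```r
# Hypothetical combined table: one row per element, one column per data set.
combined <- data.frame(
  GroupElement = c("20001", "20002", "20010"),
  Description  = c("FileMetaInformationVersion", "MediaStorageSOPClassUID",
                   "TransferSyntaxUID"),
  set1         = c("2 bytes - 00 01", "1378549247", "placeholder-a"),
  set2         = c("2 bytes - 00 01", "1378549247", NA),
  stringsAsFactors = FALSE
)

# Everything that is not a key column holds the per-set values.
value_cols <- setdiff(names(combined), c("GroupElement", "Description"))

# Transpose so each data set becomes one row of observations.
rotated <- as.data.frame(t(combined[value_cols]), stringsAsFactors = FALSE)

# Use "Group, Element" plus Description as the column headers.
names(rotated) <- paste(combined$GroupElement, combined$Description)

print(rotated)
```

Note that `t()` goes through a matrix, so every value ends up with the same type (character here); numeric columns would need converting back afterwards.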
Would anyone have any tips about how to go about this?
(Note: I can't get the oro.dicom package to install, which is a shame as I think it may have functions that do this!)
Also, just for fun! I've found that some of them aren't in the same order. So while it would seem obvious that each dataset is ordered by "Group, Element" descending, some of them aren't...
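Row order should not matter for a `merge()`, since it matches on key values rather than row position. If a consistent order is still wanted, `order()` can sort each set first. A small sketch with assumed column names:

```r
# Hypothetical data set whose rows arrive out of order.
unsorted <- data.frame(
  GroupElement = c("20002", "20001"),
  Description  = c("MediaStorageSOPClassUID", "FileMetaInformationVersion"),
  Data         = c("1378549247", "2 bytes - 00 01"),
  stringsAsFactors = FALSE
)

# Sort by the key columns; the codes are kept as character strings here.
sorted <- unsorted[order(unsorted$GroupElement, unsorted$Description), ]
rownames(sorted) <- NULL  # drop the old row names left over from subsetting

print(sorted)
```

Because the codes are sorted as strings, this matches numeric order only if all codes have the same number of characters.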
I've started trying that. Unfortunately, combining 2 data frames (each 222 obs of 3 variables, with a few NA observations in each) returns a joined data frame with 4034 obs.
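A row count that balloons like this is typical of a join key that is not unique: `merge()` returns every pairing of matching rows, so a key value that appears m times in one frame and n times in the other contributes m × n rows. A toy illustration (all values are made up), plus a quick check for repeated keys:

```r
# Two hypothetical frames where the key "Unknown" appears twice in each.
a <- data.frame(
  Description = c("Unknown", "Unknown", "Modality"),
  Data        = c("a1", "a2", "a3"),
  stringsAsFactors = FALSE
)
b <- data.frame(
  Description = c("Unknown", "Unknown", "Modality"),
  Data        = c("b1", "b2", "b3"),
  stringsAsFactors = FALSE
)

# Joining on a non-unique key pairs every "Unknown" in a with every one in b.
joined <- merge(a, b, by = "Description", all = TRUE)
print(nrow(joined))  # 2 * 2 rows for "Unknown" plus 1 for "Modality"

# List the key values that occur more than once in a frame.
repeated <- unique(a$Description[duplicated(a$Description)])
print(repeated)
```

Running the `duplicated()` check on each real data frame before merging should show which key values are responsible.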
Some of the values in Description are repeated which I think is confusing it! What I might do is combine the columns "Group, Element" and "Description" into a unique identifier for each observation (that I can then hopefully merge by without lots of duplicate values!), and perhaps split the column back into 2 after.
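The combined-identifier idea could look something like the sketch below. The column names, the `"|"` separator, the helper `add_key()`, and the element codes `90001`/`90002` are all assumptions for illustration; the separator must not occur in either column for the split to work.

```r
# Hypothetical sets where Description repeats but the code differs.
set1 <- data.frame(
  GroupElement = c("20001", "90001", "90002"),
  Description  = c("FileMetaInformationVersion", "Unknown", "Unknown"),
  Data         = c("2 bytes - 00 01", "x1", "x2"),
  stringsAsFactors = FALSE
)
set2 <- data.frame(
  GroupElement = c("20001", "90001"),
  Description  = c("FileMetaInformationVersion", "Unknown"),
  Data         = c("2 bytes - 00 01", "y1"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: build a single key column and fail early on duplicates.
add_key <- function(df, value_name) {
  key <- paste(df$GroupElement, df$Description, sep = "|")
  if (anyDuplicated(key) > 0) stop("key is not unique in ", value_name)
  out <- data.frame(Key = key, stringsAsFactors = FALSE)
  out[[value_name]] <- df$Data
  out
}

# Full outer join on the single key; missing rows become NA.
merged <- merge(add_key(set1, "set1"), add_key(set2, "set2"),
                by = "Key", all = TRUE)

# Split the key back into its two original columns.
parts <- do.call(rbind, strsplit(merged$Key, "|", fixed = TRUE))
merged$GroupElement <- parts[, 1]
merged$Description  <- parts[, 2]

print(merged)
```

Passing both columns to `merge()` directly, as in `by = c("GroupElement", "Description")`, should give the same matching without the paste-and-split step; the explicit key mainly makes the uniqueness check easy.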
Looking forward to it. The first reprex is always the hardest.
And if this is your usual data, you might be interested in attending R/Medicine 2020. Check out the program for R/Medicine 2019 here