Hi all,
I am trying to sort a dataframe by several columns.
Two conditions need to be met:
- One or more columns may hold character strings that mix letters and numbers. These should be sorted alphabetically first, and then numerically within each letter group.
- Sorting by any additional column should behave the way arrange() from the dplyr package normally does: each column is sorted within the groups formed by the previous column.
Problem: arrange() only seems to sort some of the content containing letters and numbers in the right way. See the code below:
# Load the packages used below
library(dplyr)
library(gtools)
# Create data frame with three columns
treatment = c('a 75 mg', 'p 0 mg', 'b 1 mg/kg', 'b 100mg/kg',
              'a 300 mg', 'b 0 mg/kg', 'a 1000 mg', 'a 300 mg')
study = c('01', '01', '02', '02', '04', '01', '03', '01')
patients = c(1, 10, 100, 3, 14, 5, 10, 3)
myData = data.frame(treatment, study, patients, stringsAsFactors = FALSE)
> myData
   treatment study patients
1    a 75 mg    01        1
2     p 0 mg    01       10
3  b 1 mg/kg    02      100
4 b 100mg/kg    02        3
5   a 300 mg    04       14
6  b 0 mg/kg    01        5
7  a 1000 mg    03       10
8   a 300 mg    01        3
# 1. Sort only by the column that contains letters and numbers
# dplyr arrange()
b = arrange(myData, treatment)
# Treatment group 'b' is sorted correctly in ascending order, but treatment
# group 'a' seems to be sorted in descending order.
> b
   treatment study patients
1  a 1000 mg    03       10
2   a 300 mg    04       14
3   a 300 mg    01        3
4    a 75 mg    01        1
5  b 0 mg/kg    01        5
6  b 1 mg/kg    02      100
7 b 100mg/kg    02        3
8     p 0 mg    01       10
# Sorting with mixedorder() from the gtools package gives the correct output
a = myData[mixedorder(myData$treatment),]
> a
   treatment study patients
1    a 75 mg    01        1
5   a 300 mg    04       14
8   a 300 mg    01        3
7  a 1000 mg    03       10
6  b 0 mg/kg    01        5
3  b 1 mg/kg    02      100
4 b 100mg/kg    02        3
2     p 0 mg    01       10
# 2. Sort by several columns with arrange()
# Treatment group 'a' is not sorted correctly at all
# Treatment group 'b' is sorted correctly for treatment and study (ascending),
# but patients does not look sorted
c = arrange(myData, treatment, study, patients)
> c
   treatment study patients
1  a 1000 mg    03       10
2   a 300 mg    01        3
3   a 300 mg    04       14
4    a 75 mg    01        1
5  b 0 mg/kg    01        5
6  b 1 mg/kg    02      100
7 b 100mg/kg    02        3
8     p 0 mg    01       10
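One workaround I can think of, as a minimal sketch only: turn the mixed sort order of the treatment column into an integer rank, then pass that rank to base order() together with the other columns. The names treatment_rank and sorted are made up for illustration, and I have only tried the idea on the toy data above.

```r
library(gtools)

# Same toy data as above
treatment <- c('a 75 mg', 'p 0 mg', 'b 1 mg/kg', 'b 100mg/kg',
               'a 300 mg', 'b 0 mg/kg', 'a 1000 mg', 'a 300 mg')
study <- c('01', '01', '02', '02', '04', '01', '03', '01')
patients <- c(1, 10, 100, 3, 14, 5, 10, 3)
myData <- data.frame(treatment, study, patients, stringsAsFactors = FALSE)

# Rank each treatment by its position in the mixed-sorted unique values
# (treatment_rank is a hypothetical helper name)
treatment_rank <- match(myData$treatment, mixedsort(unique(myData$treatment)))

# Sort by that rank first, then by study, then by patients
sorted <- myData[order(treatment_rank, myData$study, myData$patients), ]
print(sorted)
```

I assume the same rank expression could be used inside arrange(), for example arrange(myData, match(treatment, mixedsort(unique(treatment))), study, patients), but it feels roundabout.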
Is there a straightforward, simple way to sort the way I need to,
or am I simply missing a step somewhere?