Hi,
It's a lot to ask, I know but I'm relatively new to R and I have some complicated data cleaning to do that I've been stuck on for a while.
Consider the example data that includes data for 3 different individuals:
i_pid a_lab_id a_bacteria a_source a_sample_host a_date a_amocla_mic a_amocla_result a_ampicillin_mic a_ampicillin_result
CA036HT041S 920707EMB1 ECOLI STO HUM 12/12/2020 15:52 <= 4 SUS <= 4 SUS
CA036HT041S 920707EMB2 ECOLI STO HUM 12/12/2020 15:53 8 SUS <= 4 SUS
CA036HT041S 920707EMB3 ECOLI STO HUM 12/12/2020 15:54 <= 4 SUS <= 4 SUS
CA036HT041S 920707EMB4 ECOLI STO HUM 12/12/2020 15:54 <= 4 RES <= 4 SUS
CA036HT041S 920707EMB5 ECOLI STO HUM 12/12/2020 15:55 <= 4 SUS <= 4 SUS
CA036HT041S 920707ENT1 EFAEM STO HUM 9/1/2021 12:26 2 SUS
CA036HT041S 920707ENT2 EFAEM STO HUM 9/1/2021 12:27 2 INT
CA036HT041S 920707ENT3 EFAEM STO HUM 9/1/2021 12:27 2 RES
CA036HT041S 920707ENT4 EFAEM STO HUM 9/1/2021 12:28 2 SUS
CA036HT041S 920707ENT5 EFAEM STO HUM 9/1/2021 12:28 2 SUS
CA037HT041S 920710KBS1 KPNEU STO HUM 19/12/2020 17:17 8 SUS 64 RES
CA037HT041S 920710KBS2 KPNEU STO HUM 19/12/2020 17:18 8 SUS 64 RES
CA037HT041S 920710KBS3 KPNEU STO HUM 19/12/2020 17:18 8 SUS 32 RES
CA037HT041S 920710KBS4 KPNEU STO HUM 19/12/2020 17:19 64 RES > 128 RES
CA037HT041S 920710KBS5 KPNEU STO HUM 19/12/2020 17:19 8 SUS 32 RES
CB011HA041N 930711STA1 SAURS NAS HUM 16/1/2021 14:54 1 NOCLSI
CB011HA041N 930711STA2 SAURS NAS HUM 16/1/2021 14:54 1 NOCLSI
CB011HA041N 930711STA3 SAURS NAS HUM 16/1/2021 14:53 1 NOCLSI
CB011HA041N 930711STA4 SAURS NAS HUM 16/1/2021 14:52 1 NOCLSI
CB011HA041N 930711STA5 SAURS NAS HUM 16/1/2021 14:52 1 NOCLSI
There are multiple isolates/'tests' for individuals done as repeats. I want to merge the data for each individual such that my data will show only the highest level of resistance for each measure (antibiotic) in the order of NOCLSI < SUS < INT < RES. The final data should be a significantly smaller data frame (in this case, it should be 3 rows, where the first 5 characters of 'i_pid' identifies the individual) that merges all the rows of each person into one row, with the corresponding 'mic' number to the cell with the highest level of resistance. In this dataset, every 2 columns correspond to each other - e.g. 'a_amocla_mic' corresponds to 'a_amocla_result'.
How do I do this!!!?
I barely have a code because I'm that confused, so I'm not even going to bother showing it. Thanks so much in advance, any help would be appreciated