Converting multiple rows to one row and a column for each unique field

Glove303 · August 24, 2020, 12:21pm

I have created a table in R by merging data, which currently looks like the example below. The key thing here is that I have 3 rows for every ID: AM, IP and PM.

ID	Time	Speed
101	AM	8
101	PM	6
101	IP	9
102	N/A	N/A
102	N/A	N/A
102	N/A	N/A
103	AM	7
103	IP	6
103	PM	8

I want to convert this to a table where there is only 1 row for each ID and the corresponding data is stored in columns, i.e. like this:

ID	Time.x	Speed.x	Time.y	Speed.y	Time.z	Speed.z
101	AM	8	PM	6	IP	9
102	N/A	N/A	N/A	N/A	N/A	N/A
103	AM	7	PM	6	IP	8

I have gone about this so far by filtering the table in to 3 tables, i.e. so I have an AM, an IP and a PM table. Then I have merged AM to IP to produce an "AMIP" table, which I have then merged with PM to create the final table.

The problem with this is that where I have no data, such as for ID 103, this is then lost during the filtering process and I end up with this:

ID	Time.x	Speed.x	Time.y	Speed.y	Time.z	Speed.z
101	AM	8	PM	6	IP	9
103	AM	7	PM	6	IP	8

By the way, please note that sometimes there may be data for AM but not IP or PM, or any of those combinations, so for example the following could also be possible as the starting data:

ID	Time	Speed
101	AM	8
101	PM	6
101	IP	9
102	N/A	N/A
102	PM	8
102	N/A	N/A
103	AM	7
103	IP	6
103	PM	8

If someone could please advise how best I can convert the table whilst retaining all ID's I would be very grateful.

nirgrahamuk · August 24, 2020, 1:34pm

The end result isn't quite what you asked for, as its more succinct. I prefer to have 3 columns for the 3 possible speeds at the differing times, rather than 6 columns where 3 serve as a key to label the other 3

example_df <- function(intext) {
tf <- tempfile()
writeLines(intext, con = tf)
require(tidyverse)
as_tibble(read.delim(tf))
}
(df <- example_df("
ID	Time	Speed
101	AM	8
101	PM	6
101	IP	9
102	AM	7
102	N/A	N/A
102	N/A	N/A
103	AM	7
103	IP	6
103	PM	8") %>% 
mutate(Speed=as.integer(Speed)) %>% 
filter(!is.na(Speed)) %>%
 distinct)

pivot_wider(df,id_cols = ID,
           names_from = Time,
           names_prefix = "Speed_at_time_",
            values_from = Speed
           )

# A tibble: 3 x 4
     ID Speed_at_time_AM Speed_at_time_PM Speed_at_time_IP
  <int>            <int>            <int>            <int>
1   101                8                6                9
2   102                7               NA               NA
3   103                7                8                6

Glove303 · August 24, 2020, 1:52pm

Fantastic, thanks very much for that. Yes, I did want to lose the redundant columns too - I was going to look at that afterwards, so this is even better as it does both at once.

Glove303 · August 26, 2020, 12:53pm

I'm just going through this code and could use a couple of pointers please:

What exactly are the first few lines doing? Are they creating a function called example_df that will then rewrite the table?
Do we need to include the table itself in the code, i.e. the section in red? I'm confused here because I wouldn't have thought we would, but it's not hashed out and is built into the the previous function we just created.
I still can't get my head around "%>%". So far I have used mutate such as: example_table <- mutate(example_table, Speed = (Distance/Time). So I'm having trouble understanding this part. I know that it "pipes functions", but I'm not sure what that means!
As far as I can see, pivot_wider is the part of the code that actually converts the rows to columns, I just don't understand how it relates to the previous sections as it only seems to refer to df?

Apologies for the basic questions, I'm still trying to get up to speed with R. I'm have googled a lot and tried tinkering with the code but I'm still a bit lost with these parts.

nirgrahamuk · August 26, 2020, 1:13pm

You did not the forum with a way to manipulate your data with our code.
I bridged this gap by pasting your text into my function, to turn it into a table.
obviously if you have your table already, then you don't need this...

I just means that the symbol on the left of the %>% becomes the first parameter sent to the function on the right of the %>%
i.e.
a %>% f(b)
is
f(a,b)

which helps when you would be nesting function calls
i.e.
a %>% f(b) %>% g(c)
is easier to read than
g(f(a,b),c)

df is the representation of the data that pivot_wider is to act on. Feel free to ask more questions if you are still unsure.

system · September 2, 2020, 1:13pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

ID	Time	Speed
101	AM	8
101	PM	6
101	IP	9
102	N/A	N/A
102	N/A	N/A
102	N/A	N/A
103	AM	7
103	IP	6
103	PM	8

ID	Time	Speed
101	AM	8
101	PM	6
101	IP	9
102	N/A	N/A
102	PM	8
102	N/A	N/A
103	AM	7
103	IP	6
103	PM	8

ID	Time	Speed
101	AM	8
101	PM	6
101	IP	9
102	N/A	N/A
102	N/A	N/A
102	N/A	N/A
103	AM	7
103	IP	6
103	PM	8

ID	Time	Speed
101	AM	8
101	PM	6
101	IP	9
102	N/A	N/A
102	PM	8
102	N/A	N/A
103	AM	7
103	IP	6
103	PM	8

ID	Time	Speed
101	AM	8
101	PM	6
101	IP	9
102	N/A	N/A
102	N/A	N/A
102	N/A	N/A
103	AM	7
103	IP	6
103	PM	8

ID	Time	Speed
101	AM	8
101	PM	6
101	IP	9
102	N/A	N/A
102	PM	8
102	N/A	N/A
103	AM	7
103	IP	6
103	PM	8