I have created a table in R by merging data, which currently looks like the example below. The key thing here is that I have 3 rows for every ID: AM, IP and PM.
ID
Time
Speed
101
AM
8
101
PM
6
101
IP
9
102
N/A
N/A
102
N/A
N/A
102
N/A
N/A
103
AM
7
103
IP
6
103
PM
8
I want to convert this to a table where there is only 1 row for each ID and the corresponding data is stored in columns, i.e. like this:
ID
Time.x
Speed.x
Time.y
Speed.y
Time.z
Speed.z
101
AM
8
PM
6
IP
9
102
N/A
N/A
N/A
N/A
N/A
N/A
103
AM
7
PM
6
IP
8
I have gone about this so far by filtering the table in to 3 tables, i.e. so I have an AM, an IP and a PM table. Then I have merged AM to IP to produce an "AMIP" table, which I have then merged with PM to create the final table.
The problem with this is that where I have no data, such as for ID 103, this is then lost during the filtering process and I end up with this:
ID
Time.x
Speed.x
Time.y
Speed.y
Time.z
Speed.z
101
AM
8
PM
6
IP
9
103
AM
7
PM
6
IP
8
By the way, please note that sometimes there may be data for AM but not IP or PM, or any of those combinations, so for example the following could also be possible as the starting data:
ID
Time
Speed
101
AM
8
101
PM
6
101
IP
9
102
N/A
N/A
102
PM
8
102
N/A
N/A
103
AM
7
103
IP
6
103
PM
8
If someone could please advise how best I can convert the table whilst retaining all ID's I would be very grateful.
The end result isn't quite what you asked for, as its more succinct. I prefer to have 3 columns for the 3 possible speeds at the differing times, rather than 6 columns where 3 serve as a key to label the other 3
example_df <- function(intext) {
tf <- tempfile()
writeLines(intext, con = tf)
require(tidyverse)
as_tibble(read.delim(tf))
}
(df <- example_df("
ID Time Speed
101 AM 8
101 PM 6
101 IP 9
102 AM 7
102 N/A N/A
102 N/A N/A
103 AM 7
103 IP 6
103 PM 8") %>%
mutate(Speed=as.integer(Speed)) %>%
filter(!is.na(Speed)) %>%
distinct)
pivot_wider(df,id_cols = ID,
names_from = Time,
names_prefix = "Speed_at_time_",
values_from = Speed
)
# A tibble: 3 x 4
ID Speed_at_time_AM Speed_at_time_PM Speed_at_time_IP
<int> <int> <int> <int>
1 101 8 6 9
2 102 7 NA NA
3 103 7 8 6
Fantastic, thanks very much for that. Yes, I did want to lose the redundant columns too - I was going to look at that afterwards, so this is even better as it does both at once.
I'm just going through this code and could use a couple of pointers please:
What exactly are the first few lines doing? Are they creating a function called example_df that will then rewrite the table?
Do we need to include the table itself in the code, i.e. the section in red? I'm confused here because I wouldn't have thought we would, but it's not hashed out and is built into the the previous function we just created.
I still can't get my head around "%>%". So far I have used mutate such as: example_table <- mutate(example_table, Speed = (Distance/Time). So I'm having trouble understanding this part. I know that it "pipes functions", but I'm not sure what that means!
As far as I can see, pivot_wider is the part of the code that actually converts the rows to columns, I just don't understand how it relates to the previous sections as it only seems to refer to df?
Apologies for the basic questions, I'm still trying to get up to speed with R. I'm have googled a lot and tried tinkering with the code but I'm still a bit lost with these parts.
You did not the forum with a way to manipulate your data with our code.
I bridged this gap by pasting your text into my function, to turn it into a table.
obviously if you have your table already, then you don't need this...
I just means that the symbol on the left of the %>% becomes the first parameter sent to the function on the right of the %>%
i.e. a %>% f(b)
is f(a,b)
which helps when you would be nesting function calls
i.e. a %>% f(b) %>% g(c)
is easier to read than g(f(a,b),c)
df is the representation of the data that pivot_wider is to act on. Feel free to ask more questions if you are still unsure.