Creating 3 new columns specified by their ID from another varialbe

I am working on a school project regarding forest types and some measurements. I have the Site_ID & measurement as two columns, but within Site_ID there are only 3 options. We'll call them 1, 2, 3. I just need to make three new columns, with their measurements sorted by Site_1, Site_2, Site_3. The end goal for all of this is to perform an aov().

I'm still not sure of the formatting on this site so forgive me for this, but lets say I have a table like this:

Site_ID ------ Measurement
1 ------- 34
1 --------3434
1--------34324
2 ------ 34324
2 ------ 6768
2-----4534
3 ------ 657467
3 ----- 4346
3------ 345234

I just need it to be sorted by Site_ID:

Site_1 ------- Site_2 ------ Site_3
34 ---------- 34324 ----------657467
3434 --------- 6768 -------- 4346
34324 --------- 4534 ----- 345234

All of this sounds super doable to me, as in I remember doing it in my R class, but I've scrubbed through my notes and cannot figure it out. I could've sworn it had to do with pivot_longer() or left_join() or something like that, but it's not seeming to work. Thanks!

You can try a pivot_wider using the names_from, values_from, names_prefix arguments that are appropriate to the names of your data frame.

However, I think you need another column that is a second id. There's nothing in that example that connects the 34 to the 34324 to the 657467.

2 Likes

Hi, you could use pivot_wider:

library(dplyr)

df <- tibble(site = c(rep(1, 3), rep(2, 3), rep(3,3)),
             measurement = sample(10, 9))

# A tibble: 9 × 2
   site measurement
  <dbl>       <int>
1     1           6
2     1          10
3     1           7
4     2           5
5     2           3
6     2           8
7     3           9
8     3           1
9     3           4

df %>% 
  pivot_wider(everything(), names_from = site, names_prefix = "site_", values_from = measurement) %>% 
  unnest()

# A tibble: 3 × 3
  site_1 site_2 site_3
   <int>  <int>  <int>
1      6      5      9
2     10      3      1
3      7      8      4
1 Like

Oh, those numbers are not supposed to be connected. the "----" is more like the line separating columns in a table. More like:

Site_1 | Site_2 | Site_3
34 | 34324 | 657467

Thanks for your help!

This seemed really promising, but I did get some errors. FYI, "BA_cm2_ha" is my measurement

+   pivot_wider(everything(), names_from = Site_ID, names_prefix = "site_", values_from = BA_cm2_ha) %>% 
+   unnest()
Error: In row 1, can't recycle input of size 165 to size 116.
Run `rlang::last_error()` to see where the error occurred.
In addition: Warning messages:
1: Values are not uniquely identified; output will contain list-cols.
* Use `values_fn = list` to suppress this warning.
* Use `values_fn = length` to identify where the duplicates arise
* Use `values_fn = {summary_fun}` to summarise duplicates 
2: `cols` is now required when using unnest().
Please use `cols = c(site_BL, site_Mixed_C, site_MIXED_BL_C, site_NA)` 
> 

It felt like unnest() was causing a problem, so I tried it again without:

> FDaov %>%
+   pivot_wider(everything(), names_from = Site_ID, names_prefix = "site_", values_from = BA_cm2_ha)
# A tibble: 1 x 4
  site_BL     site_Mixed_C site_MIXED_BL_C site_NA   
  <list>      <list>       <list>          <list>    
1 <dbl [165]> <dbl [116]>  <dbl [111]>     <dbl [26]>
Warning message:
Values are not uniquely identified; output will contain list-cols.
* Use `values_fn = list` to suppress this warning.
* Use `values_fn = length` to identify where the duplicates arise
* Use `values_fn = {summary_fun}` to summarise duplicates 

Is it having problems because my sites do not have an equal number of measurements?

I still think you need another column that is a second id. There's nothing to define a row.

If there is not a second id, then a data frame is not a meaningful way to store your data, and there is no "observation" to do a lm().

Are you referring to how the row data needs an Id? In this case each number is cm^2/hectare. I guess this is more of a list than a table. Maybe I am misunderstanding the aov() function.

I am basing it off of this tutorial: Anova in Excel (In Easy Steps)

I know this can be done in excel, but I'd rather do it in R and I assumed I could just make the three columns like in the excel example. I'll go back and review aov() from my class.

I get it now. You are saying that I need to have both predictors and response variables form my analysis, correct? Currently I'm trying to get 3 columns of data to see how they relate, but I have no response variable?

Actually I'm not sure if that's right either.

I'll give some input on my experiment, this may help:

We are trying to see how if tree base size is affected by the type of forest it is in. Site_1 is one type of forest with primarily one type of tree, Site_2 is another with a small mix of trees, and Site_3 is a forest with a larger mix of trees.

That's really all the data I have, and I'm not sure sure what other variables I should have in there

One last update (I think):

I figured out how to do the ANOVA with my data in excel. I got a p-value of ~0.086, which is the same p-value when I did this in R yesterday:

FD.aov <- aov(BA_cm2 ~ Site_ID, data=forest_data)

So, I'm fairly certain I did it correctly in R all along and there was no need to split the data into different columns for the aov() function in R. Thanks all!

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.