Creating 3 new columns specified by their ID from another varialbe

the_chid · December 6, 2021, 9:02pm

I am working on a school project regarding forest types and some measurements. I have the Site_ID & measurement as two columns, but within Site_ID there are only 3 options. We'll call them 1, 2, 3. I just need to make three new columns, with their measurements sorted by Site_1, Site_2, Site_3. The end goal for all of this is to perform an aov().

I'm still not sure of the formatting on this site so forgive me for this, but lets say I have a table like this:

Site_ID ------ Measurement
1 ------- 34
1 --------3434
1--------34324
2 ------ 34324
2 ------ 6768
2-----4534
3 ------ 657467
3 ----- 4346
3------ 345234

I just need it to be sorted by Site_ID:

Site_1 ------- Site_2 ------ Site_3
34 ---------- 34324 ----------657467
3434 --------- 6768 -------- 4346
34324 --------- 4534 ----- 345234

All of this sounds super doable to me, as in I remember doing it in my R class, but I've scrubbed through my notes and cannot figure it out. I could've sworn it had to do with pivot_longer() or left_join() or something like that, but it's not seeming to work. Thanks!

arthur.t · December 6, 2021, 9:19pm

You can try a pivot_wider using the names_from, values_from, names_prefix arguments that are appropriate to the names of your data frame.

However, I think you need another column that is a second id. There's nothing in that example that connects the 34 to the 34324 to the 657467.

williaml · December 6, 2021, 9:19pm

Hi, you could use pivot_wider:

library(dplyr)

df <- tibble(site = c(rep(1, 3), rep(2, 3), rep(3,3)),
             measurement = sample(10, 9))

# A tibble: 9 × 2
   site measurement
  <dbl>       <int>
1     1           6
2     1          10
3     1           7
4     2           5
5     2           3
6     2           8
7     3           9
8     3           1
9     3           4

df %>% 
  pivot_wider(everything(), names_from = site, names_prefix = "site_", values_from = measurement) %>% 
  unnest()

# A tibble: 3 × 3
  site_1 site_2 site_3
   <int>  <int>  <int>
1      6      5      9
2     10      3      1
3      7      8      4

the_chid · December 6, 2021, 9:20pm

Oh, those numbers are not supposed to be connected. the "----" is more like the line separating columns in a table. More like:

Site_1 | Site_2 | Site_3
34 | 34324 | 657467

Thanks for your help!

the_chid · December 6, 2021, 9:26pm

This seemed really promising, but I did get some errors. FYI, "BA_cm2_ha" is my measurement

+   pivot_wider(everything(), names_from = Site_ID, names_prefix = "site_", values_from = BA_cm2_ha) %>% 
+   unnest()
Error: In row 1, can't recycle input of size 165 to size 116.
Run `rlang::last_error()` to see where the error occurred.
In addition: Warning messages:
1: Values are not uniquely identified; output will contain list-cols.
* Use `values_fn = list` to suppress this warning.
* Use `values_fn = length` to identify where the duplicates arise
* Use `values_fn = {summary_fun}` to summarise duplicates 
2: `cols` is now required when using unnest().
Please use `cols = c(site_BL, site_Mixed_C, site_MIXED_BL_C, site_NA)` 
>

It felt like unnest() was causing a problem, so I tried it again without:

> FDaov %>%
+   pivot_wider(everything(), names_from = Site_ID, names_prefix = "site_", values_from = BA_cm2_ha)
# A tibble: 1 x 4
  site_BL     site_Mixed_C site_MIXED_BL_C site_NA   
  <list>      <list>       <list>          <list>    
1 <dbl [165]> <dbl [116]>  <dbl [111]>     <dbl [26]>
Warning message:
Values are not uniquely identified; output will contain list-cols.
* Use `values_fn = list` to suppress this warning.
* Use `values_fn = length` to identify where the duplicates arise
* Use `values_fn = {summary_fun}` to summarise duplicates

the_chid · December 6, 2021, 9:27pm

Is it having problems because my sites do not have an equal number of measurements?

arthur.t · December 6, 2021, 9:35pm

I still think you need another column that is a second id. There's nothing to define a row.

If there is not a second id, then a data frame is not a meaningful way to store your data, and there is no "observation" to do a lm().

the_chid · December 6, 2021, 9:51pm

Are you referring to how the row data needs an Id? In this case each number is cm^2/hectare. I guess this is more of a list than a table. Maybe I am misunderstanding the aov() function.

I am basing it off of this tutorial: Anova in Excel (In Easy Steps)

I know this can be done in excel, but I'd rather do it in R and I assumed I could just make the three columns like in the excel example. I'll go back and review aov() from my class.

the_chid · December 6, 2021, 10:00pm

I get it now. You are saying that I need to have both predictors and response variables form my analysis, correct? Currently I'm trying to get 3 columns of data to see how they relate, but I have no response variable?

Actually I'm not sure if that's right either.

I'll give some input on my experiment, this may help:

We are trying to see how if tree base size is affected by the type of forest it is in. Site_1 is one type of forest with primarily one type of tree, Site_2 is another with a small mix of trees, and Site_3 is a forest with a larger mix of trees.

That's really all the data I have, and I'm not sure sure what other variables I should have in there

the_chid · December 6, 2021, 10:15pm

One last update (I think):

I figured out how to do the ANOVA with my data in excel. I got a p-value of ~0.086, which is the same p-value when I did this in R yesterday:

FD.aov <- aov(BA_cm2 ~ Site_ID, data=forest_data)

So, I'm fairly certain I did it correctly in R all along and there was no need to split the data into different columns for the aov() function in R. Thanks all!

system · December 27, 2021, 10:15pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.