I am working on a school project regarding forest types and some measurements. I have the Site_ID & measurement as two columns, but within Site_ID there are only 3 options. We'll call them 1, 2, 3. I just need to make three new columns, with their measurements sorted by Site_1, Site_2, Site_3. The end goal for all of this is to perform an aov().
I'm still not sure of the formatting on this site so forgive me for this, but lets say I have a table like this:
All of this sounds super doable to me, as in I remember doing it in my R class, but I've scrubbed through my notes and cannot figure it out. I could've sworn it had to do with pivot_longer() or left_join() or something like that, but it's not seeming to work. Thanks!
This seemed really promising, but I did get some errors. FYI, "BA_cm2_ha" is my measurement
+ pivot_wider(everything(), names_from = Site_ID, names_prefix = "site_", values_from = BA_cm2_ha) %>%
+ unnest()
Error: In row 1, can't recycle input of size 165 to size 116.
Run `rlang::last_error()` to see where the error occurred.
In addition: Warning messages:
1: Values are not uniquely identified; output will contain list-cols.
* Use `values_fn = list` to suppress this warning.
* Use `values_fn = length` to identify where the duplicates arise
* Use `values_fn = {summary_fun}` to summarise duplicates
2: `cols` is now required when using unnest().
Please use `cols = c(site_BL, site_Mixed_C, site_MIXED_BL_C, site_NA)`
>
It felt like unnest() was causing a problem, so I tried it again without:
> FDaov %>%
+ pivot_wider(everything(), names_from = Site_ID, names_prefix = "site_", values_from = BA_cm2_ha)
# A tibble: 1 x 4
site_BL site_Mixed_C site_MIXED_BL_C site_NA
<list> <list> <list> <list>
1 <dbl [165]> <dbl [116]> <dbl [111]> <dbl [26]>
Warning message:
Values are not uniquely identified; output will contain list-cols.
* Use `values_fn = list` to suppress this warning.
* Use `values_fn = length` to identify where the duplicates arise
* Use `values_fn = {summary_fun}` to summarise duplicates
Are you referring to how the row data needs an Id? In this case each number is cm^2/hectare. I guess this is more of a list than a table. Maybe I am misunderstanding the aov() function.
I know this can be done in excel, but I'd rather do it in R and I assumed I could just make the three columns like in the excel example. I'll go back and review aov() from my class.
I get it now. You are saying that I need to have both predictors and response variables form my analysis, correct? Currently I'm trying to get 3 columns of data to see how they relate, but I have no response variable?
Actually I'm not sure if that's right either.
I'll give some input on my experiment, this may help:
We are trying to see how if tree base size is affected by the type of forest it is in. Site_1 is one type of forest with primarily one type of tree, Site_2 is another with a small mix of trees, and Site_3 is a forest with a larger mix of trees.
That's really all the data I have, and I'm not sure sure what other variables I should have in there
I figured out how to do the ANOVA with my data in excel. I got a p-value of ~0.086, which is the same p-value when I did this in R yesterday:
FD.aov <- aov(BA_cm2 ~ Site_ID, data=forest_data)
So, I'm fairly certain I did it correctly in R all along and there was no need to split the data into different columns for the aov() function in R. Thanks all!