Complete beginner: species number in a data frame

matschelarmando · December 22, 2022, 12:40pm

Hey guys,

I am completely new to RStudio and to statistical data analysis. For a new project I need to analyse data with RStudio and I am lost... I have managed to get into some parts of RStudio, like reading in data (xlsx etc.) and working with tidy data frames I got from people who had already finished their scripts, plus a few other things. But when it comes to my own data and differently structured data frames, nothing they did works...
Now I would like to count the number of insect species on different plots, which I determined myself (which means that I created the xlsx sheet myself).

I have a data frame with 576 rows and 6 columns. Column 1 is the plot ID (for example: Greece1); the following columns refer to the insects: Col 2 is family, Col 3 is subfamily, Col 4 is genus, Col 5 is subgenus and Col 6 is the species. For my question I only need the plot ID, Col 4 and Col 6 (Col 4 and Col 6 together form the insect species).

What I would like to do now is count how many species I have per plot ID... and I have no idea how...
Would you have some advice or ideas? Do I need to change the structure of my xlsx? If so, how?
All the data in the columns are names, like Greece1 for the plot, Stenolophus for the genus and teutonus for the species. Do I need to convert them with as.numeric or as.factor? I read about those functions but I did not fully understand them.

I hope I am not asking too much, but as I am lost in RStudio I have no idea how to continue...

Greetings
matschelarmando

andresrcs · December 22, 2022, 4:30pm

Hi!

To help us help you, could you please prepare a reproducible example (reprex) illustrating your issue? Please have a look at this guide, to see how to create one:

FAQ: How to do a minimal reproducible example (reprex) for beginners (Guides & FAQs)

A minimal reproducible example consists of the following items: a minimal dataset, necessary to reproduce the issue; and the minimal runnable code necessary to reproduce the issue, which can be run on the given dataset, including the necessary information on the used packages. Let's quickly go over each one of these with examples. Minimal Dataset (Sample Data): You need to provide a data frame that is small enough to be (reasonably) pasted on a post, but big enough to reproduce your issue. Let's say, as an example, that you are working with the iris data frame head(iris) #> Sepal.Length Sepal.Width Petal.Length Petal.Width Species #> 1 5.1 3.5 1.4 0.…

technocrat · December 22, 2022, 6:43pm

A good introduction to R is R for Data Science. There are now hundreds of books and online tutorials and R is so vast that one can't really "know" it. Take a good guide like this and then build out into specific topics as necessary.

To illustrate your question, we can borrow a built-in dataset that is analogous. mtcars has different content but the way columns are selected works the same.

head(mtcars)
#>                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
#> Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
#> Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
#> Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
#> Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
#> Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
#> Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
dim(mtcars)
#> [1] 32 11
portion <- mtcars[,c(1,2,6)]
head(portion)
#>                    mpg cyl    wt
#> Mazda RX4         21.0   6 2.620
#> Mazda RX4 Wag     21.0   6 2.875
#> Datsun 710        22.8   4 2.320
#> Hornet 4 Drive    21.4   6 3.215
#> Hornet Sportabout 18.7   8 3.440
#> Valiant           18.1   6 3.460
dim(portion)
#> [1] 32  3

Created on 2022-12-22 by the reprex package (v2.0.1)

The car names to the left are row identifiers, not variables, so they don't count as columns in the data frame. To begin, we had 32 rows, each of 11 variables, from which a new data frame was subset with all rows and just the first, second and sixth columns. portion can be used for tabulating.
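The same subsetting can be sketched on a data frame shaped like the one in the question. The column names and placeholder values below are assumptions made up for illustration; only the six-column layout comes from the original post. Selecting by name rather than by position tends to be easier to read and survives a change in column order.

```r
# Hypothetical stand-in for the insect data; names and values are placeholders.
insects <- data.frame(
  Site_ID   = c("Greece1", "Greece1", "Spain4"),
  Family    = c("fam1", "fam2", "fam1"),
  Subfamily = c("sub1", "sub2", "sub1"),
  Genus     = c("Stenolophus", "Apion", "Amara"),
  Subgenus  = c("sg1", "sg2", "sg3"),
  species   = c("teutonus", "rugicolle", "anthobia")
)

# Keep all rows, and only columns 1, 4 and 6, selected by position ...
by_position <- insects[, c(1, 4, 6)]

# ... or the same columns selected by name.
by_name <- insects[, c("Site_ID", "Genus", "species")]

# Both selections give the same three-column data frame.
identical(by_position, by_name)
dim(by_name)
```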

Try that with your data and come back with a reprex as @andresrcs suggests for help with tabulating.

Flm · December 23, 2022, 10:57am

Is this what you want:


library(tidyverse)

mydf <- tibble(
  ID = c("A","B","A","A", "C"),
  Col4 = c("gen1", "gen2", "gen3", "gen4", "gen5"),
  Col6 = c("spec1", "spec2", "spec3", "spec4", "spec5")
)

mydf %>% 
  group_by(ID) %>% 
  summarise(n = n())


# A tibble: 3 × 2
  ID        n
  <chr> <int>
1 A         3
2 B         1
3 C         1
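For readers new to dplyr, here is a small sketch of what the code above does: `n()` counts the rows in each group, so the result is the number of rows per `ID`, not the number of distinct species. As far as I know, `count()` is a shorthand for the same `group_by()` plus `summarise()` pattern. The name `mydf` and its values are taken from the example above.

```r
library(dplyr)

mydf <- tibble(
  ID   = c("A", "B", "A", "A", "C"),
  Col4 = c("gen1", "gen2", "gen3", "gen4", "gen5"),
  Col6 = c("spec1", "spec2", "spec3", "spec4", "spec5")
)

# Long form: group the rows by ID, then count the rows in each group.
long_form <- mydf %>%
  group_by(ID) %>%
  summarise(n = n())

# Short form: count() groups and counts rows in one step.
short_form <- mydf %>%
  count(ID)

# Both tables hold the same row counts per ID.
all(long_form$n == short_form$n)
```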

matschelarmando · December 23, 2022, 12:01pm

Hey guys,

Thanks for all the answers, and sorry that I needed some time to create the df!


data.frame(
  Site_ID = c("Greece1", "Greece1", "Greece1", "Muritz_2", "Muritz_2", "Muritz_2", "Muritz_2", "Spain4", "Spain4", "UK4"),
  Genus = c("Stenolophus", "Apion", "Cryptocephallus", "Apion", "Apion", "Apion", "Coccinula", "Microlestes", "Amara", "Amara"),
  species = c("teutonus", "rugicolle", "moreie", "pratense", "pratense", "pratense", "viridica", "minutulus", "anthobia", "aenea")
)

This is what I have. What I want to do for my 570+ rows and my numerous site IDs is to count, per site (first column), how many insect species I have. Columns 2 and 3 together form the species name. For example, "Stenolophus teutonus" is one species at the site Greece1. With this little data frame I can simply count by eye and see that the site Greece1 has 3 insect species and that Muritz_2 has 2 species.
But how do I do that for my whole big df?

I read something about the "vegan" library and the command "specnumber".

Afterwards I would also like to plot the different sites and their species number. Probably a barplot with "species number" as the dependent variable on the y-axis and the site ID as the explanatory variable on the x-axis.

barplot(s~Site.ID, data = data.frame, ylab= "species number", x-lab = "examined plot")

Would that even work once I have figured out how to calculate the species number for each Site.ID?
I think I know how to do basic plots after watching tutorials, and I have also read about ggplot, but starting out with RStudio is so difficult when you have a specific question...

Thank you very much guys
matschelarmando

matschelarmando · December 23, 2022, 12:08pm

Hey Flm,

Thanks for your reply! It would be something like that, I guess... but in your example it counts the ID only, doesn't it?
I would need it per site ID, and then, for each site ID, the respective number of species (which would be formed by gen1+spec1, gen3+spec3 and gen4+spec4 for site A in your nice example!)

I posted a comment about my problem, maybe it makes it clearer. Thank you for your help!!

matschelarmando · December 23, 2022, 12:20pm

Hello technocrat,

Thank you very much for the nice beginner's tutorial, I will try to use it! As time is always short and I am really, really slow at learning programming, I mostly try tutorial videos about specific questions and read commented scripts from colleagues (even though I don't really get everything). But I will probably need to...

Thanks for your example, but I think what you have done is count the rows and the columns, am I right? What I would need, in your example, is to count for each car (this would be my site ID, and those can repeat several times) how many different "wt" there are (those would be my species). The difference is that the species is formed by two columns together (see my reprex).

greets
matschelarmando

Flm · December 23, 2022, 12:25pm

If I understand correctly this should be a solution:

library(tidyverse)

df <- 
  data.frame(
    Site_ID = c("Greece1", "Greece1", "Greece1", "Muritz_2", "Muritz_2", "Muritz_2", "Muritz_2", "Spain4", "Spain4", "UK4"),
    Genus = c("Stenolophus", "Apion", "Cryptocephallus", "Apion", "Apion", "Apion", "Coccinula", "Microlestes", "Amara", "Amara"),
    species = c("teutonus", "rugicolle", "moreie", "pratense", "pratense", "pratense", "viridica", "minutulus", "anthobia", "aenea")
  ) %>% as_tibble()


df %>% 
  mutate(gen_spec = paste(Genus, species, sep = "_")) %>% 
  count(Site_ID, gen_spec)

# A tibble: 8 × 3
  Site_ID  gen_spec                   n
  <chr>    <chr>                  <int>
1 Greece1  Apion_rugicolle            1
2 Greece1  Cryptocephallus_moreie     1
3 Greece1  Stenolophus_teutonus       1
4 Muritz_2 Apion_pratense             3
5 Muritz_2 Coccinula_viridica         1
6 Spain4   Amara_anthobia             1
7 Spain4   Microlestes_minutulus      1
8 UK4      Amara_aenea                1

matschelarmando · December 27, 2022, 10:29am

Thank you Flm,

This is almost what I had in mind! I would now like an absolute number, so that I have the absolute number for Greece as well as for the other sites!
Is it maybe also possible to have this result in its own df or vector, so that I can go on and do other things with it? For example, calculate diversity indices?

Thank you very much
matschelarmando

Flm · December 27, 2022, 10:34am

Is this what you want?

df %>% 
  mutate(gen_spec = paste(Genus, species, sep = "_")) %>% 
  count(Site_ID, gen_spec) %>% 
  group_by(Site_ID) %>% 
  summarise(sum = sum(n))

# A tibble: 4 × 2
  Site_ID    sum
  <chr>    <int>
1 Greece1      3
2 Muritz_2     4
3 Spain4       2
4 UK4          1

You can assign it using mytable <- before the code to use the table later
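A minimal sketch of that assignment, reusing the sample data and pipeline from this thread. The name `counts` and the conversion to a named vector are additions for illustration, in case a later step expects a vector rather than a table.

```r
library(dplyr)

df <- data.frame(
  Site_ID = c("Greece1", "Greece1", "Greece1", "Muritz_2", "Muritz_2",
              "Muritz_2", "Muritz_2", "Spain4", "Spain4", "UK4"),
  Genus   = c("Stenolophus", "Apion", "Cryptocephallus", "Apion", "Apion",
              "Apion", "Coccinula", "Microlestes", "Amara", "Amara"),
  species = c("teutonus", "rugicolle", "moreie", "pratense", "pratense",
              "pratense", "viridica", "minutulus", "anthobia", "aenea")
)

# Store the result under a name so it can be reused later.
mytable <- df %>%
  mutate(gen_spec = paste(Genus, species, sep = "_")) %>%
  count(Site_ID, gen_spec) %>%
  group_by(Site_ID) %>%
  summarise(sum = sum(n))

# Optional: pull the totals out as a named vector (hypothetical name `counts`).
counts <- setNames(mytable$sum, mytable$Site_ID)

# Look up one site by name.
counts[["Muritz_2"]]
```

Note that `sum` here adds up the rows per site, so it reflects the number of records for each site.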

matschelarmando · January 17, 2023, 2:45pm

Hey Flm, sorry for the late answer.
That's almost it... With your code I now get the absolute number of individuals: we had 3 species in Greece, so I get 3 in the column, but for Muritz_2 we get 4. 4 is the number of individuals, but what I am looking for is the number of species, which would be 3 for Greece, 2 for Muritz, 2 for Spain and 1 for UK.
Do you have another idea?

Sorry for my late replies, but I am travelling for work right now and cannot check regularly.

greets
Matschelarmando

Flm · January 17, 2023, 3:14pm

Hi, try this:


df %>%
  # keep Genus too: the same species epithet can occur in different genera
  select(Site_ID, Genus, species) %>%
  unique() %>%
  count(Site_ID)


# A tibble: 4 × 2
  Site_ID      n
  <chr>    <int>
1 Greece1      3
2 Muritz_2     2
3 Spain4       2
4 UK4          1
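An alternative sketch, in case it helps: `n_distinct()` counts the distinct genus-species combinations per site in one step. Text columns can stay as character here, with no conversion via `as.numeric` or `as.factor`. The names `richness` and `n_species` are made up for this example. The plot uses base `barplot()` with the axis labels from the earlier question; the label arguments are spelled `xlab` and `ylab`.

```r
library(dplyr)

df <- data.frame(
  Site_ID = c("Greece1", "Greece1", "Greece1", "Muritz_2", "Muritz_2",
              "Muritz_2", "Muritz_2", "Spain4", "Spain4", "UK4"),
  Genus   = c("Stenolophus", "Apion", "Cryptocephallus", "Apion", "Apion",
              "Apion", "Coccinula", "Microlestes", "Amara", "Amara"),
  species = c("teutonus", "rugicolle", "moreie", "pratense", "pratense",
              "pratense", "viridica", "minutulus", "anthobia", "aenea")
)

# Number of distinct genus + species combinations per site,
# stored in its own table for later use.
richness <- df %>%
  group_by(Site_ID) %>%
  summarise(n_species = n_distinct(paste(Genus, species)))

richness

# One bar per site, bar height = number of species.
barplot(
  richness$n_species,
  names.arg = richness$Site_ID,
  xlab = "examined plot",
  ylab = "species number"
)
```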

technocrat · January 17, 2023, 5:57pm

Posing and understanding the problem is always trickiest.

Finding the different "wt" values in the mtcars example is just

unique(mtcars$wt)
#>  [1] 2.620 2.875 2.320 3.215 3.440 3.460 3.570 3.190 3.150 4.070 3.730 3.780
#> [13] 5.250 5.424 5.345 2.200 1.615 1.835 2.465 3.520 3.435 3.840 3.845 1.935
#> [25] 2.140 1.513 3.170 2.770 2.780

Created on 2023-01-17 with reprex v2.0.2

For things that feel like they should be really basic, there is almost always a function, if you can find it.
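As a rough base R sketch of the same idea: `length()` around `unique()` turns the list of distinct values into a count, and `tapply()` repeats that count within each group. Grouping by `cyl` and the small `sites` data frame are assumptions chosen for illustration; `sites` reuses a few rows of the sample data from this thread.

```r
# How many different weights are there in total?
length(unique(mtcars$wt))

# How many different weights within each cylinder group?
tapply(mtcars$wt, mtcars$cyl, function(x) length(unique(x)))

# The same pattern for species per site: genus and species are pasted
# together so that each combination counts once.
sites <- data.frame(
  Site_ID = c("Greece1", "Greece1", "Muritz_2", "Muritz_2", "Muritz_2"),
  Genus   = c("Stenolophus", "Apion", "Apion", "Apion", "Coccinula"),
  species = c("teutonus", "rugicolle", "pratense", "pratense", "viridica")
)

species_per_site <- tapply(
  paste(sites$Genus, sites$species),
  sites$Site_ID,
  function(x) length(unique(x))
)
species_per_site
```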
