InvertebratesAnalysis

Author

Trevor Harrington

Invertebrate Biodiversity in Riffle-Pool Streams: The Influence of Water Flow Characteristics.

# Load the libraries used to import, manipulate, and display the data
library(readxl)
library(tidyverse)   # also attaches ggplot2, dplyr, tidyr, readr, purrr, tibble, stringr, forcats
library(kableExtra)
library(tidymodels)  # rsample's initial_split(), training(), testing()
library(janitor)     # clean_names()
library(magrittr)

# Data are stored locally and are not accessible from other machines
invertebrates <- read_excel("Invertebrates in R/inverts_class_data.xlsx") %>%
  clean_names()

Initial Data Split

# Initial data intake: hypothesis-generating exploration on half of the rows before further investigation.
# No seed is set, so the rows drawn change every time the document is rendered.
my_data_splits <- initial_split(invertebrates, prop = 0.5)

exploratory_data <- training(my_data_splits)
test_data <- testing(my_data_splits)

exploratory_data %>%
  # Transpose so each variable becomes a row and each sampled row a column (V1, V2);
  # t() converts every value to character
  t() %>%
  as.data.frame() %>%
  kable() %>%
  kable_styling(bootstrap_options = c("hover", "striped"))
variable       V1            V2
site           2 (upstream)  1 (upstream)
riffle_pool    Pool          Pool
flow_velocity  0.18          3.00
stream_width   10            10
stream_depth   1.9           1.5
gastropoda     0             0
bivalvia       0             0
diptera        0             0
turbellaria    0             0
oliggocheata   0             0
hirundinea     0             0
decapoda       0             0
amphipod       0             0
isopod         0             0
trombidiforme  0             0
plecoptera     0             0
trichoptera    6             2
ephemroptera   2             0
megaloptera    0             0
coleoptera     0             4
hemiptera      0             1
odonta         7             4
lepidoptera    0             0
Note

Observations:

  • Several rows are 0 for both sample sites. They could be filtered out because they carry no information for any comparison.

    • Flow velocity is the most pronounced difference in this data set, which makes it the main water characteristic available for comparison against the taxa present.
  • Each taxon seems to favor one stream over the other.

    • The closest to an even split is coleoptera (9:4), which shows less preference for riffle run vs. pool.
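initial_split() draws rows at random, and no seed is set. As a result, the two exploratory rows change on every render, so observations written against one render may not match the table shown in another. With rsample, calling set.seed() immediately before initial_split() makes the split repeatable. The base-R sketch below illustrates the same seeded 50/50 split. It assumes a small hypothetical stand-in, `inverts`, transcribed from the full-data tables in this document, in place of the Excel import. The seed value is arbitrary.

```r
# `inverts` is a hypothetical stand-in for the `invertebrates` data read from Excel;
# values are transcribed from the full-data tables in this document (few columns kept).
inverts <- data.frame(
  site          = c("1 (upstream)", "1 (upstream)", "2 (upstream)", "2 (upstream)"),
  riffle_pool   = c("Riffle", "Pool", "Riffle", "Pool"),
  flow_velocity = c(3.00, 3.00, 0.18, 0.18),
  odonta        = c(10, 4, 24, 7)
)

# Fixing the RNG state makes the random draw repeatable; 2023 is an arbitrary seed.
set.seed(2023)
n_train   <- floor(0.5 * nrow(inverts))   # 2 of the 4 rows
train_idx <- sample(seq_len(nrow(inverts)), size = n_train)

# Rows drawn go to exploration; the remaining rows are held out
exploratory_data <- inverts[train_idx, ]
test_data        <- inverts[-train_idx, ]

exploratory_data
```

Re-running the block with the same seed always selects the same two rows. The narrative and the rendered tables then stay in agreement.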

exploratory_data %>%
  # Total individuals across all 18 taxon columns (gastropoda through lepidoptera)
  mutate(total_individuals_present = rowSums(across(gastropoda:lepidoptera))) %>%
  group_by(site) %>%
  summarise(
    "average invertebrates present" = mean(total_individuals_present),
    "'Unit' flow" = mean(flow_velocity),
    "Riffle/pool" = paste(unique(riffle_pool), collapse = "/")
  ) %>%
  kable() %>%
  kable_styling(bootstrap_options = c("hover", "striped"))
site          average invertebrates present  'Unit' flow  Riffle/pool
1 (upstream)  11                             3.00         Pool
2 (upstream)  15                             0.18         Pool
Note

Observations:

This data compares two sample sites. The split is random and no seed is set, so the rows drawn change between renders. The table above shows two pool rows, while the observations below were made on a draw containing the pool at site 1 and the riffle run at site 2.

  • Site 1 (pool) contained 11 invertebrates at a flow rate of 3.00 ‘units of flow’.

  • Site 2 (riffle run) contained 62 invertebrates, about 6x more than site 1 (62 / 11 ≈ 5.6), at 0.18 ‘units of flow’. That flow rate is roughly 17x slower (3.00 / 0.18 ≈ 16.7).
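The two ratios quoted above can be checked directly. This is a minimal sketch that uses only the totals and flow values reported in this section.

```r
# Totals and flow values reported in the observations above
site1_pool_total   <- 11
site2_riffle_total <- 62
site1_flow <- 3.00
site2_flow <- 0.18

# How many times more individuals the site 2 riffle holds than the site 1 pool
abundance_ratio <- site2_riffle_total / site1_pool_total
# How many times faster site 1 flows than site 2
flow_ratio <- site1_flow / site2_flow

round(c(abundance = abundance_ratio, flow = flow_ratio), 1)
```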

What can be inferred? Data-split exploration

  • In the exploratory data, the first thing to notice is that data were collected at multiple locations.

  • The data set records counts of invertebrates identified in two streams across four locations. The counts are grouped into 18 taxa, mostly orders and classes rather than individual species. Each stream was sampled at one ‘riffle run’ and one ‘pool’ section.

    • A riffle is a shallow section of a stream with rapid flow over a rocky or gravel bottom, where water running over small obstructions creates a ‘riffling’ sound. Riffles tend to form where the local slope is steeper and the water flow more energetic.

    • A pool is a deeper section of a stream with slower flow and a smoother bottom, where water flows around larger obstructions and collects. Pools tend to form where the local gradient is gentler and the water flow reduced.

Given these characteristics of riffles and pools, we can generate hypotheses about which habitat type is more conducive to which taxa.

  • It is reasonable to expect that, of the 18 invertebrate taxa recorded in this data set, some are less sensitive to water characteristics than others. Even so, one habitat type may still suit a larger proportion of them than the other.

Data analysis using initial exploratory hypotheses

invertebrates %>%
  # Transpose so each variable becomes a row and each sampled location a column (V1-V4);
  # t() converts every value to character
  t() %>%
  as.data.frame() %>%
  # Drop rows (taxa) whose count is 0 at every location
  filter(if_any(everything(), ~ .x != "0")) %>%
  kable() %>%
  kable_styling(bootstrap_options = c("hover", "striped"))
variable       V1            V2            V3            V4
site           1 (upstream)  1 (upstream)  2 (upstream)  2 (upstream)
riffle_pool    Riffle        Pool          Riffle        Pool
flow_velocity  3.00          3.00          0.18          0.18
stream_width   10            10            10            10
stream_depth   1.5           1.5           1.9           1.9
diptera        3             0             2             0
plecoptera     10            0             1             0
trichoptera    7             2             20            6
ephemroptera   14            0             6             2
megaloptera    4             0             0             0
coleoptera     5             4             9             0
hemiptera      1             1             0             0
odonta         10            4             24            7
lepidoptera    1             0             0             0
Note

Observations:

  • The riffle environment seems more conducive to invertebrate life, judging by the larger number of individuals found at the riffle sites. Does this reflect greater biodiversity, or is a single taxon especially successful?
  • Trichoptera (caddisflies), Odonata (dragonflies), and Ephemeroptera (mayflies) account for 71.3% (102 of 143) of all individuals identified across the four upstream locations.
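The share in the last observation can be reproduced from the filtered table above. The sketch below transcribes the per-location counts by hand. Column order is assumed to be site 1 riffle, site 1 pool, site 2 riffle, site 2 pool.

```r
# Counts per location (site 1 riffle, site 1 pool, site 2 riffle, site 2 pool),
# transcribed from the filtered table above
counts <- list(
  diptera      = c(3, 0, 2, 0),
  plecoptera   = c(10, 0, 1, 0),
  trichoptera  = c(7, 2, 20, 6),
  ephemroptera = c(14, 0, 6, 2),
  megaloptera  = c(4, 0, 0, 0),
  coleoptera   = c(5, 4, 9, 0),
  hemiptera    = c(1, 1, 0, 0),
  odonta       = c(10, 4, 24, 7),
  lepidoptera  = c(1, 0, 0, 0)
)

# Total individuals per taxon, then across all taxa
taxon_totals <- vapply(counts, sum, numeric(1))
grand_total  <- sum(taxon_totals)

# Individuals belonging to the three most abundant orders
top_three <- sum(taxon_totals[c("trichoptera", "odonta", "ephemroptera")])

share <- 100 * top_three / grand_total
round(share, 1)
```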

Reconsidering Hypotheses with Full Dataset

Investigation of the raw data raises questions about whether taxa prefer one habitat type over the other. It also raises the question of whether some of the four sampling locations provide an environment favorable to a wide range of taxa.

This data has significant limitations that prevent an in-depth investigation of why one taxon succeeds over another. Namely, it lacks temporal, geographical, and methodological information that could reveal other variables contributing to the counts. It also lacks measurement units for flow rate, width, and depth, so it cannot support conclusions about the actual size, depth, or total flow of the streams. Overall, this data will be most valuable for

What can be inferred from the additional data?

  • How do the characteristics of sites 1 and 2 differ? Which combination of features seems most favorable for invertebrate success?

    • Is a large count of individuals the same as biodiversity? Does the location with the most individuals also have the most distinct taxa, or is it favorable to only a few?
  • We are becoming increasingly aware of the impact of dams on aquatic habitats. Knowing whether this stream is dammed would allow comparison with other researchers' findings on how man-made obstacles affect invertebrate habitats.

    • This kind of inference would likely require a larger data set. Even so, it could provide useful insight when compared with other analyses of a similar topic.
  • The invertebrate species found in the pool sites of one stream should be more similar to those in the pool sites of the other stream compared to the riffle sites of the two streams.

invertebrates %>%
  # One row per location x taxon; gastropoda:lepidoptera are the 18 count columns (columns 6:23).
  # The data frame is piped in, so it must not be passed again as an argument.
  pivot_longer(cols = gastropoda:lepidoptera,
               names_to = "order",
               values_to = "count") %>%
  # Keep only the taxa actually observed at each location
  filter(count != 0) %>%
  kable() %>%
  kable_styling(bootstrap_options = c("hover", "striped"))
site          riffle_pool  flow_velocity  stream_width  stream_depth  order         count
1 (upstream)  Riffle       3.00           10            1.5           diptera       3
1 (upstream)  Riffle       3.00           10            1.5           plecoptera    10
1 (upstream)  Riffle       3.00           10            1.5           trichoptera   7
1 (upstream)  Riffle       3.00           10            1.5           ephemroptera  14
1 (upstream)  Riffle       3.00           10            1.5           megaloptera   4
1 (upstream)  Riffle       3.00           10            1.5           coleoptera    5
1 (upstream)  Riffle       3.00           10            1.5           hemiptera     1
1 (upstream)  Riffle       3.00           10            1.5           odonta        10
1 (upstream)  Riffle       3.00           10            1.5           lepidoptera   1
1 (upstream)  Pool         3.00           10            1.5           trichoptera   2
1 (upstream)  Pool         3.00           10            1.5           coleoptera    4
1 (upstream)  Pool         3.00           10            1.5           hemiptera     1
1 (upstream)  Pool         3.00           10            1.5           odonta        4
2 (upstream)  Riffle       0.18           10            1.9           diptera       2
2 (upstream)  Riffle       0.18           10            1.9           plecoptera    1
2 (upstream)  Riffle       0.18           10            1.9           trichoptera   20
2 (upstream)  Riffle       0.18           10            1.9           ephemroptera  6
2 (upstream)  Riffle       0.18           10            1.9           coleoptera    9
2 (upstream)  Riffle       0.18           10            1.9           odonta        24
2 (upstream)  Pool         0.18           10            1.9           trichoptera   6
2 (upstream)  Pool         0.18           10            1.9           ephemroptera  2
2 (upstream)  Pool         0.18           10            1.9           odonta        7
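The long table makes it easy to separate two quantities the questions above run together: richness (how many distinct taxa a location holds) and abundance (how many individuals). This base-R sketch uses the rows above, transcribed by hand. The location labels are hypothetical shorthand for each site and section.

```r
# Long-format rows (count > 0 only), transcribed from the pivot_longer() table above
long <- data.frame(
  location = rep(c("Site 1 Riffle", "Site 1 Pool", "Site 2 Riffle", "Site 2 Pool"),
                 times = c(9, 4, 6, 3)),
  order = c("diptera", "plecoptera", "trichoptera", "ephemroptera", "megaloptera",
            "coleoptera", "hemiptera", "odonta", "lepidoptera",
            "trichoptera", "coleoptera", "hemiptera", "odonta",
            "diptera", "plecoptera", "trichoptera", "ephemroptera", "coleoptera", "odonta",
            "trichoptera", "ephemroptera", "odonta"),
  count = c(3, 10, 7, 14, 4, 5, 1, 10, 1,
            2, 4, 1, 4,
            2, 1, 20, 6, 9, 24,
            6, 2, 7)
)

# Richness = number of distinct taxa observed; abundance = total individuals
richness  <- tapply(long$order, long$location, function(x) length(unique(x)))
abundance <- tapply(long$count, long$location, sum)

data.frame(richness = richness, abundance = abundance)
```

The site 2 riffle holds the most individuals (62) but only 6 taxa. The site 1 riffle holds fewer individuals (55) spread across 9 taxa. A large count of individuals is therefore not the same as high richness in this data.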
invertebrates %>% # simple exploratory analysis
  # Total individuals across all 18 taxon columns (gastropoda through lepidoptera)
  mutate(total_individuals_present = rowSums(across(gastropoda:lepidoptera))) %>%
  group_by(site, riffle_pool) %>%
  summarise(
    "average invertebrates present" = mean(total_individuals_present),
    "'Unit' flow" = mean(flow_velocity),
    .groups = "drop"
  ) %>%
  kable() %>%
  kable_styling(bootstrap_options = c("hover", "striped"))
site          riffle_pool  average invertebrates present  'Unit' flow
1 (upstream)  Pool         11                             3.00
1 (upstream)  Riffle       55                             3.00
2 (upstream)  Pool         15                             0.18
2 (upstream)  Riffle       62                             0.18
Note

Observations:

Hypothesis: By analyzing these four locations, we may be able to determine which of two patterns holds. Either the faster-flowing riffle run is more conducive to a greater number of distinct invertebrate taxa, or it simply contains a large number of individuals from a few well-adapted taxa.

  • What led me to this hypothesis is how invasive species affect ecosystems. While the total quantity of life may be greater, the impact can still be negative if the species present are reducing biodiversity in the environment.

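Null Hypothesis: riffle and pool sections do not differ in invertebrate diversity, and any difference seen in this data is consistent with chance. Failing to reject it would mean there is insufficient statistical evidence of a difference between riffle and pool sections.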
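To put numbers on "unique taxa versus a few well-adapted taxa", one common choice is the Shannon index H' = -Σ p_i ln p_i, where p_i is the proportion of individuals in taxon i. A second is Pielou's evenness J = H' / ln S, where S is the number of taxa present. These indices are swapped in here as an assumption; they are not part of the original analysis. The counts are transcribed from the tables above, and the column names are hypothetical.

```r
# Rows = taxa with at least one individual, columns = locations; transcribed from the tables above
counts <- matrix(
  c( 3, 0,  2, 0,   # diptera
    10, 0,  1, 0,   # plecoptera
     7, 2, 20, 6,   # trichoptera
    14, 0,  6, 2,   # ephemroptera
     4, 0,  0, 0,   # megaloptera
     5, 4,  9, 0,   # coleoptera
     1, 1,  0, 0,   # hemiptera
    10, 4, 24, 7,   # odonta
     1, 0,  0, 0),  # lepidoptera
  ncol = 4, byrow = TRUE,
  dimnames = list(
    c("diptera", "plecoptera", "trichoptera", "ephemroptera", "megaloptera",
      "coleoptera", "hemiptera", "odonta", "lepidoptera"),
    c("site1_riffle", "site1_pool", "site2_riffle", "site2_pool")
  )
)

# Shannon index H' = -sum(p_i * ln p_i), using only taxa present (p_i > 0)
shannon <- function(x) {
  p <- x[x > 0] / sum(x)
  -sum(p * log(p))
}

# Pielou's evenness J = H' / ln(S); undefined when only one taxon is present
pielou <- function(x) shannon(x) / log(sum(x > 0))

data.frame(
  richness = colSums(counts > 0),
  shannon  = round(apply(counts, 2, shannon), 2),
  evenness = round(apply(counts, 2, pielou), 2)
)
```

On these counts, the site 1 riffle has the higher Shannon index (about 1.94, versus about 1.42 for the site 2 riffle) despite holding fewer individuals. That result bears directly on the hypothesis above. With a single sample per habitat per site, these values describe the samples and do not by themselves test the null hypothesis.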

Data analysis – Answering the Hypothesis

How can we test the claims?
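One possible starting point, offered as an assumption rather than the author's method, is Fisher's exact test on a habitat-by-taxon contingency table. It asks whether individuals are distributed across taxa differently in riffles than in pools. The sketch pools both sites per habitat, with counts transcribed from the tables above. It uses a Monte Carlo p-value because the table has many small cells. It also treats every individual as an independent observation, which is a strong assumption when there are only two locations per habitat. janitor masks fisher.test(), so the stats version is called explicitly.

```r
# Individuals per taxon, pooled over sites 1 and 2, transcribed from the tables above
riffle <- c(diptera = 5, plecoptera = 11, trichoptera = 27, ephemroptera = 20,
            megaloptera = 4, coleoptera = 14, hemiptera = 1, odonta = 34, lepidoptera = 1)
pool   <- c(diptera = 0, plecoptera = 0, trichoptera = 8, ephemroptera = 2,
            megaloptera = 0, coleoptera = 4, hemiptera = 1, odonta = 11, lepidoptera = 0)

# 2 x 9 contingency table: habitat (rows) by taxon (columns)
composition <- rbind(riffle = riffle, pool = pool)

# Monte Carlo p-value with B simulated tables; the seed makes it repeatable
set.seed(2023)
result <- stats::fisher.test(composition, simulate.p.value = TRUE, B = 10000)
result$p.value
```

A small p-value would suggest that riffles and pools differ in composition, not necessarily in richness. The richness and diversity questions would still need their own comparison, ideally with more sampling locations.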

Viewing Distribution of Species Across Sites

Hypotheses:

NULL Hypothesis: Characteristics