Trevor Harrington -- RStudio Portfolio

Undergraduate repository containing RStudio workbooks created in Bioinformatics BIO-4ST1-14708

Trevor Harrington’s GitHub Homepage

This page will contain three RStudio projects, ordered by my progress with coding tools. The Palmer Penguins dataset was used to learn how to use RStudio for data cleaning and analysis. Invertebrates was a first attempt to explore an unknown dataset that was less than friendly for a new coder to manipulate and interpret. The most significant piece of work on this page is the analysis of indoor air pollution as a percentage of deaths in countries across the globe.


Statistical Analysis: Basic Functions

Introduction

This journal contains examples of how to run basic statistical functions in base R using RStudio. These workbooks were developed over the first half of ENV 220 Field Methods and Technologies. The methods described in this section include linear regression, t-tests, and ANOVA. These resources show the code and statistical formulas that can be used in RStudio to draw inferences from a dataset.

Linear Regression

T-testing

ANOVA


Palmer Penguins analysis

Abstract

This workbook contains an analysis of the Palmer Penguins dataset to identify correlations among its 8 variables, which include year, region, gender, and physical characteristics such as bill length and height, body mass, and flipper length. An exploratory dataset was generated using a random set of half the total variables to find potential correlations that could then be tested against the full dataset. The statistical analysis, presented in tables and graphs, found several interesting correlations, including those between species and island, between species and bill length, and between flipper size and bill length for Gentoo penguins.

Analysis


Invertebrate analysis

This notebook is an exercise in data cleaning and organization. The dataset was gathered by Dr. Duryea at Lamoille Creek in Lamoille, NV.

Abstract

Analysis


Indoor Air Pollution analysis

Abstract

This research paper investigates the impact of indoor air pollution on premature deaths and its correlation with factors such as GDP, population size, and region. The paper highlights indoor air pollution’s significant risk to public health, particularly in low-income countries, and the need for targeted interventions to address the issue. The analysis finds that regional factors are crucial in determining indoor air pollution prevalence and associated health risks. While economic development can help reduce indoor air pollution, it can also create new sources of pollution. The analysis emphasizes the need for a nuanced understanding of the issue and further research to develop effective policies and interventions.

Analysis


SNHU Arboretum Stream Pollution Analysis

Abstract

This study aimed to evaluate the impact of water source proximity on tree size and estimate carbon sequestration in the SNHU Arboretum. Two 5x5 meter plots were selected, one near a seasonal second-order stream and the other distant from it. Diameter at breast height (DBH) was measured for five trees in each plot, and biomass was estimated using the allometric scaling equation. Tree density was calculated to estimate the amount of CO2 sequestered by the arboretum. The t-test revealed no significant difference between the mean DBH of trees near water (mean = 37.36 cm) and those distant from water (mean = 45.32 cm) at the 95% confidence level. The small sample size may have affected the results, suggesting that a larger sample size and tree height measurements could improve accuracy. This study serves as a low-confidence example of CO2 calculation but accurately illustrates the process.
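The comparison described above can be sketched as a two-sample t-test. The abstract does not say which variant was used; R's `t.test` defaults to Welch's unequal-variance test, so this sketch assumes that variant. It is written in Python only so it can run standalone (the workbook itself uses R), the function name `welch_t` is hypothetical, and the sample values are placeholders, not the measured DBH data.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return (t statistic, approximate degrees of freedom) for Welch's t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each mean, using the sample variance (n - 1).
    se_a = statistics.variance(sample_a) / n_a
    se_b = statistics.variance(sample_b) / n_b
    diff = statistics.mean(sample_a) - statistics.mean(sample_b)
    t = diff / math.sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t, df


if __name__ == "__main__":
    # Placeholder values only: five trees per plot, as in the study design.
    near_water = [1, 2, 3, 4, 5]
    far_from_water = [2, 4, 6, 8, 10]
    t, df = welch_t(near_water, far_from_water)
    print(f"t = {t:.3f}, df = {df:.2f}")
```

With only five trees per plot the degrees of freedom stay small, so the difference in means must be large relative to the spread before the test reports significance. This fits the abstract's caution about sample size.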

Part 1: DBH Comparison

Part 2: Soil Sample Analysis


Bioinformatics: Replication Origin Analysis

As a student delving into the world of genomics, I built this RMarkdown document over the semester to showcase code that explores functions and techniques for analyzing genomic patterns. The document aims to provide a guide for understanding and identifying the replication origin of bacterial genomes using RStudio.
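One widely taught way to estimate a replication origin is to find where the running G-minus-C count (the GC skew) reaches its minimum. This page does not confirm that the workbook uses this method, so the following is a minimal sketch of the general technique, not the workbook's code. It is written in Python only so it can run standalone (the workbook itself uses R), and the function names and the randomly generated stand-in genome are assumptions for illustration.

```python
import random


def skew_prefix(genome):
    """Return running G-minus-C counts; entry i covers the first i bases."""
    skew = [0]
    for base in genome:
        if base == "G":
            skew.append(skew[-1] + 1)
        elif base == "C":
            skew.append(skew[-1] - 1)
        else:
            skew.append(skew[-1])
    return skew


def minimum_skew_positions(genome):
    """Return every position where the running skew reaches its minimum."""
    skew = skew_prefix(genome)
    lowest = min(skew)
    return [i for i, value in enumerate(skew) if value == lowest]


def random_genome(length, seed=0):
    """Build a reproducible random DNA string to exercise the functions."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(length))


if __name__ == "__main__":
    genome = random_genome(1000)
    print(minimum_skew_positions(genome))
```

A random sequence has no real origin, so the positions it reports carry no biological meaning. Its use is to check that the functions run and return consistent results before they are applied to a full genome.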

The RMarkdown document covers essential topics and functions over six chapters, which build the knowledge and tools needed to use RStudio to uncover patterns in a bacterial genome. Bacterial genomes vary in length and can easily exceed what these methods can process quickly; for this reason, randomized genomes and Rosalind challenges are used to verify that the code works. The goal of this

Workbook