Title: | North Temperate Lakes - Microbial Observatory 16S Time Series Data and Functions |
---|---|
Description: | Analyses of OTU tables produced by 16S rRNA gene amplicon sequencing, as well as example data. It contains the data and scripts used in the paper Linz, et al. (2017) "Bacterial community composition and dynamics spanning five years in freshwater bog lakes," <doi: 10.1128/mSphere.00169-17>. |
Authors: | Alexandra Linz |
Maintainer: | Alexandra Linz <[email protected]> |
License: | GPL-3 |
Version: | 1.1.2 |
Built: | 2025-02-28 05:31:54 UTC |
Source: | https://github.com/cran/OTUtable |
Contains functions for the analysis of an OTU table generated from 16S rRNA amplicon sequencing. It also includes the data from the North Temperate Lakes-Microbial Observatory used in the paper Linz, et al. (2017) "Bacterial community composition and dynamics spanning five years in freshwater bog lakes." Please cite this paper if you use OTUtable. The data and code used in this paper are available at <https://github.com/McMahonLab/North_Temperate_Lakes-Microbial_Observatory>. Three data files are included: otu_table, taxonomy, and metadata. Access these by calling them with data(). There is also a fasta file associated with this dataset that is not included in this package - it can be found on our GitHub page in Data/16S_data. This package does not include functionality for fasta files; if you need this for analyses such as calculating UniFrac distance, please see the R package "phyloseq".
Package: | OTUtable |
Type: | Package |
Version: | 1.1.2 |
Date: | 2018-05-26 |
License: | GPL-3 |
Functions include:
bog_subset
chao1
clean_shared
clean_mothur_taxonomy
clean_TaxAss_taxonomy
combine_otus
extract_date
filter_taxa
grab_group
make_do_matrix
make_temp_matrix
obs_richness
pielou
plot_column
reduce_names
remove_reps
rotate
shannon
strat_metric
year_subset
zscore
Alexandra Linz <[email protected]>
Alexandra M. Linz, Benjamin C. Crary, Ashley Shade, Sarah Owens, Jack A. Gilbert, Rob Knight, Katherine D. McMahon. Bacterial Community Composition and Dynamics Spanning Five Years in Freshwater Bog Lakes. mSphere Jun 2017, 2 (3) e00169-17; DOI: 10.1128/mSphere.00169-17
Returns an OTU table containing only samples from the identified sampling site. This function can also be used on tables of higher level taxa generated by combine_otus(), or on tables that have already been processed by year_subset().
bog_subset(bog_id, table)
bog_id |
The three letter code indicating the sampling site. The bog is represented by letters one and two; options are TB, SS, CB, NS, MA, HK, WS, and FB. The third letter indicates the layer; E for epilimnion and H for hypolimnion. The bog_id should be in quotes, and regular expressions can be used. |
table |
A table containing the relative abundances of each taxon as rows and samples as columns. Sample names must be coded in the format bog, layer, date, and replicate (example: TBE07JUN08.R2 == Trout Bog Epilimnion, collected 07Jun08, replicate 2) |
Returns a relative abundance table containing samples from the specified sampling site in columns, with taxa in rows
Alexandra Linz <[email protected]>
data(otu_table)
Trout_Bog_Epilimnion <- bog_subset("TBE", otu_table)
Hells_Kitchen_Hypolimnion <- bog_subset("HKH", otu_table)
# Include both epilimnion and hypolimnion in a single table
Trout_Bog_both_layers <- bog_subset("TB.", otu_table)
# Include all meromictic hypolimnia
meromictic_hypolimnia <- bog_subset("HKH|MAH", otu_table)
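The site selection described above can be pictured as a regular expression match against the first three characters of each sample name. The following is a minimal sketch under that assumption; bog_subset_sketch() and the toy table are hypothetical and are not the package's source code.

```r
# Hypothetical stand-in for bog_subset(); not the package's implementation.
bog_subset_sketch <- function(bog_id, table) {
  # The first three characters of a sample name hold the bog and layer code
  site_codes <- substr(colnames(table), 1, 3)
  table[, grepl(bog_id, site_codes), drop = FALSE]
}

# Toy table: 2 taxa (rows) x 4 samples (columns)
toy <- matrix(1:8, nrow = 2,
              dimnames = list(c("Otu0001", "Otu0002"),
                              c("TBE07JUN08.R1", "TBH07JUN08.R1",
                                "HKH10JUL08.R1", "MAH11JUL08.R2")))

both_layers <- bog_subset_sketch("TB.", toy)      # both Trout Bog layers
hypolimnia  <- bog_subset_sketch("HKH|MAH", toy)  # two hypolimnia
colnames(both_layers)
colnames(hypolimnia)
```

Using drop = FALSE keeps the result a matrix even when only one sample matches.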
Calculates Chao1 richness of a vector of relative abundance data. This alpha diversity metric takes into account the number of singletons and doubletons for a more accurate estimate than observed richness.
chao1(sample)
sample |
A vector of relative abundance data, typically a column in a matrix |
Returns a single number indicating the estimated richness in the tested sample based on the number of taxa appearing only once or twice
Use apply functions to calculate Chao1 richness for all samples in a matrix
Alexandra Linz <[email protected]>
data(otu_table)
chao1_richness <- apply(otu_table, 2, chao1)
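For readers unfamiliar with the estimator, here is a minimal sketch of a Chao1 calculation on count data. It uses the bias-corrected form S_obs + F1(F1 - 1) / (2(F2 + 1)), where F1 and F2 are the numbers of singletons and doubletons; whether chao1() uses this exact variant is an assumption, and chao1_sketch() is a hypothetical name.

```r
# Hypothetical sketch of a bias-corrected Chao1 estimate; the package's
# chao1() may use a different variant of the formula.
chao1_sketch <- function(counts) {
  s_obs <- sum(counts > 0)   # observed richness
  f1 <- sum(counts == 1)     # taxa seen exactly once (singletons)
  f2 <- sum(counts == 2)     # taxa seen exactly twice (doubletons)
  s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))
}

counts <- c(10, 5, 2, 2, 1, 1, 1, 0)
estimate <- chao1_sketch(counts)
estimate
```

Because the estimator counts singletons and doubletons, it only makes sense on integer read counts (such as a rarefied table), not on proportions.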
Reduces information in a mothur .taxonomy file by removing the second column, which contains the number of reads per OTU. It also checks for and removes OTUs in the taxonomy file that are not in the OTU table; this may be the case if rarefaction was performed after classification, as was done for the NTL-Microbial Observatory dataset. This function was formerly clean_taxonomy in v1.0.0.
clean_mothur_taxonomy(taxonomy_file, table, remove_bootstrap)
taxonomy_file |
A .taxonomy file output by mothur |
table |
An OTU table containing OTU numbers as row names |
remove_bootstrap |
TRUE or FALSE: if TRUE, removes bootstrap values from the classification strings |
Returns the taxonomy with OTUs as row names and seven columns containing each taxonomic level (Kingdom, Phylum, Class, Order, Lineage, Clade, and Tribe)
Alexandra Linz <[email protected]>
# Example path only:
path <- "mothur_output/bogs.taxonomy"
# table <- clean_shared("mothur_output/bogs.shared", trim.names = TRUE)
# taxonomy <- clean_mothur_taxonomy(path, table, remove_bootstrap = FALSE)
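To illustrate what removing bootstrap values means, the sketch below strips the parenthesized confidence values from mothur-style classification strings and splits them into one column per level. The example strings and the helper strip_bootstrap() are made up for illustration; this is not the package's code.

```r
# Made-up classification strings in the mothur style "Taxon(bootstrap);"
taxon_strings <- c(
  "Bacteria(100);Proteobacteria(98);Betaproteobacteria(85);",
  "Bacteria(100);Actinobacteria(100);unclassified;"
)

# Hypothetical helper: drop "(number)" wherever it appears
strip_bootstrap <- function(x) gsub("\\([0-9]+\\)", "", x)

no_bootstrap <- strip_bootstrap(taxon_strings)

# One row per OTU, one column per taxonomic level
levels_matrix <- do.call(rbind, strsplit(no_bootstrap, ";", fixed = TRUE))
levels_matrix
```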
Formats a taxonomy file output by the McMahon Lab TaxAss 16S classification workflow (github.com/McMahonLab/TaxAss) into the same format produced by clean_mothur_taxonomy(). It also checks for and removes OTUs in the taxonomy file that are not in the OTU table; this may be the case if rarefaction was performed after classification, as was done for the NTL-Microbial Observatory dataset.
clean_TaxAss_taxonomy(taxonomy_file, table, remove_bootstrap)
taxonomy_file |
A .taxonomy file output by the TaxAss workflow |
table |
An OTU table containing OTU numbers as row names |
remove_bootstrap |
TRUE or FALSE: if TRUE, removes bootstrap values from the classification strings |
Returns the taxonomy with OTUs as row names and seven columns containing each taxonomic level (Kingdom, Phylum, Class, Order, Lineage, Clade, and Tribe)
Alexandra Linz <[email protected]>
# Example path only:
path <- "TaxAss_output/bogs.taxonomy"
# table <- clean_shared("mothur_output/bogs.shared", trim.names = TRUE)
# taxonomy <- clean_TaxAss_taxonomy(path, table, remove_bootstrap = FALSE)
Sums the abundances of OTUs with the same taxonomy at a given level into a single vector for that taxonomy. This creates a new table of relative abundance data at a higher taxonomic level than OTU. This function only works with the OTU level as input, but can be used on any subset of the OTU table created by year_subset() or bog_subset(). The OTU table must have the same number of rows as the taxonomy file (do not remove rows with no reads before running combine_otus()). If bootstrap values were not removed from the taxonomy (see the remove_bootstrap argument of clean_mothur_taxonomy() and clean_TaxAss_taxonomy()), this command will likely create spurious groupings based on identical bootstrap values.
combine_otus(level, table, taxonomy)
level |
The desired level at which to combine OTUs; options are the column names from the taxonomy dataset |
table |
An OTU table containing the relative abundances of each OTU. |
taxonomy |
A taxonomy dataset in the form produced by clean_mothur_taxonomy() or clean_TaxAss_taxonomy(). |
Returns a table of relative abundance data with each row representing all OTUs of a given taxonomic assignment summed together. Row names are now the full taxonomic assignment of each row. To keep only the lowest taxonomic level in the row names, run the function reduce_names().
Alexandra Linz <[email protected]>
data(otu_table)
data(taxonomy)
example_table <- year_subset("05", otu_table)
example_table <- bog_subset("TBE", example_table)
phylum_table <- combine_otus("Phylum", example_table, taxonomy)
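The summing step can be sketched with base R's rowsum(), grouping OTUs by their taxonomy string down to the requested level. This is only an illustration under assumed inputs: combine_sketch(), the toy table, and the ";" separator in the row names are hypothetical choices, not the package's implementation.

```r
# Toy inputs: 3 OTUs x 3 samples, with a matching taxonomy table
toy_table <- matrix(c(5, 0, 3,
                      1, 2, 0,
                      4, 4, 4),
                    nrow = 3, byrow = TRUE,
                    dimnames = list(c("Otu1", "Otu2", "Otu3"),
                                    c("TBE01JUN05", "TBE08JUN05", "TBE15JUN05")))
toy_taxonomy <- data.frame(
  Kingdom = c("Bacteria", "Bacteria", "Bacteria"),
  Phylum  = c("Actinobacteria", "Actinobacteria", "Verrucomicrobia"),
  row.names = rownames(toy_table),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for combine_otus()
combine_sketch <- function(level, table, taxonomy) {
  level_index <- match(level, colnames(taxonomy))
  # Group key: every rank from the top down to the requested level
  key <- apply(taxonomy[, seq_len(level_index), drop = FALSE], 1,
               paste, collapse = ";")
  rowsum(table, group = key)  # sums rows that share a key
}

phylum_toy <- combine_sketch("Phylum", toy_table, toy_taxonomy)
phylum_toy
```

Because the key is built from the full string, leftover bootstrap values in the taxonomy would split one taxon into several groups, which is the spurious grouping the description warns about.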
The date each sample was collected is encoded in the sample ID. Extract this into R date format using this command.
extract_date(sample_ids)
sample_ids |
A vector of sample names. Samples must be labeled using the bog, layer, date, and replicate system (MAH04JUL05.R1 = Mary Lake Hypolimnion, 04Jul05, replicate 1) |
Returns a vector of dates corresponding to each sample
Alexandra Linz <[email protected]>
samples <- c("TBE01JUN09.R1", "TBE05JUN09", "TBE10JUN09.R2")
extract_date(samples)
# Extract sample dates from the OTU table
data(otu_table)
x <- extract_date(colnames(otu_table))
# Extract sample dates from the metadata
data(metadata)
x <- extract_date(metadata$Sample_Name)
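As a rough illustration of how a date can be decoded from the sample naming scheme, the sketch below reads the day, month, and year from fixed character positions. extract_date_sketch() is a hypothetical helper, it assumes all years fall in the 2000s, and it looks up months against month.abb rather than relying on locale-dependent parsing; the package's extract_date() may work differently.

```r
# Hypothetical stand-in for extract_date(); assumes names like "TBE01JUN09.R1"
extract_date_sketch <- function(sample_ids) {
  day   <- substr(sample_ids, 4, 5)
  # Look up the three-letter month in base R's English abbreviations
  month <- match(substr(sample_ids, 6, 8), toupper(month.abb))
  year  <- substr(sample_ids, 9, 10)
  # Assumption: two-digit years belong to the 2000s
  as.Date(sprintf("20%s-%02d-%s", year, month, day))
}

dates <- extract_date_sketch(c("TBE01JUN09.R1", "TBE05JUN09", "MAH04JUL05.R1"))
dates
```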
Returns a table containing only taxa that meet the imposed requirements of a minimum abundance and a minimum number of samples containing that taxon
filter_taxa(table, abundance, persistence)
table |
A table containing the relative abundances of each OTU or taxon in the form produced by clean_shared(). Can be used on the output of grab_group() or combine_otus(). |
abundance |
The minimum threshold for the percentage of reads attributed to a taxon in at least one sample. Taxa at abundances greater than or equal to this number will be retained. |
persistence |
The minimum threshold for the percentage of samples in which a taxon has been observed. Taxa observed in a percentage of samples greater than or equal to this number will be retained. |
Returns a table with all taxa that met the imposed thresholds
Thanks Juliana Dias for suggesting this function!
Alexandra Linz <[email protected]>
# To make a table containing only OTUs with at least 0.1% abundance
# in at least one sample that were observed
# (at any abundance) in at least 50% of samples:
# library(OTUtable)
# data(otu_table)
# filtered_table <- filter_taxa(otu_table, abundance = 0.1, persistence = 50)

# To make a table containing only phyla with at least 10% abundance
# in any one sample and were observed
# at any abundance in at least 10% of samples:
# data(taxonomy)
# phylum_table <- combine_otus("Phylum", otu_table, taxonomy)
# filtered_phylum_table <- filter_taxa(phylum_table, abundance = 10, persistence = 10)
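The two thresholds can be pictured as a pair of row-wise tests. The sketch below assumes the table already holds percentages; filter_sketch() and the toy data are hypothetical, and the package's filter_taxa() may convert read counts to percentages internally.

```r
# Hypothetical stand-in for filter_taxa(); assumes values are percentages
filter_sketch <- function(table, abundance, persistence) {
  max_abundance   <- apply(table, 1, max)                  # peak abundance per taxon
  percent_present <- rowSums(table > 0) / ncol(table) * 100  # % of samples with the taxon
  keep <- max_abundance >= abundance & percent_present >= persistence
  table[keep, , drop = FALSE]
}

toy <- matrix(c(12, 0, 0, 0,   # abundant once, but in only 25% of samples
                 1, 1, 1, 0,   # persistent, but never abundant
                15, 3, 0, 9),  # abundant and in 75% of samples
              nrow = 3, byrow = TRUE,
              dimnames = list(c("A", "B", "C"), paste0("S", 1:4)))

kept <- filter_sketch(toy, abundance = 10, persistence = 50)
rownames(kept)
```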
Returns a table containing only taxa from a given phylogenetic group
grab_group(group, level, table, taxonomy)
group |
The phylogenetic classification of interest (can be a regular expression) |
level |
The phylogenetic level of the group of interest (must be a column name in the taxonomy file) |
table |
A table containing the relative abundances of each OTU in the form produced by clean_shared() |
taxonomy |
A taxonomy dataset in the form produced by clean_mothur_taxonomy() or clean_TaxAss_taxonomy() |
Returns a table with all taxa of a given taxonomic assignment
This function must be run on the OTU level table. However, the output of this function can be run through combine_otus() to create a higher level table of results. Sometimes closely related groups were classified better in the Greengenes vs the freshwater database during classification of the NTL-Microbial Observatory dataset. In this case, it is necessary to search for the names generated by both datasets to get all closely related OTUs. For example, Methylophilaceae in Greengenes are named betIV in the freshwater database.
Alexandra Linz <[email protected]>
data(otu_table)
data(taxonomy)
acI <- grab_group("acI", "Clade", otu_table, taxonomy)
verruco <- grab_group("Verrucomicrobia", "Phylum", otu_table, taxonomy)
# Example where two search terms are needed due to classification with two databases
methylophilaceae <- grab_group("Methylophilaceae|betIV", "Clade", otu_table, taxonomy)
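Selecting a group amounts to matching a regular expression against one taxonomy column and keeping the corresponding OTU rows. A minimal sketch, assuming the taxonomy and table share OTU row names; grab_group_sketch() and the toy data are hypothetical.

```r
toy_table <- matrix(1:6, nrow = 3,
                    dimnames = list(c("Otu1", "Otu2", "Otu3"),
                                    c("TBE01JUN07", "TBH01JUN07")))
toy_taxonomy <- data.frame(
  Phylum = c("Actinobacteria", "Proteobacteria", "Proteobacteria"),
  Clade  = c("acI", "betIV", "Methylophilaceae"),
  row.names = rownames(toy_table),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for grab_group()
grab_group_sketch <- function(group, level, table, taxonomy) {
  hits <- grepl(group, taxonomy[[level]])       # regex match on one taxonomic level
  table[rownames(taxonomy)[hits], , drop = FALSE]
}

# Two search terms, because the same group has a different name in each database
methylo <- grab_group_sketch("Methylophilaceae|betIV", "Clade",
                             toy_table, toy_taxonomy)
rownames(methylo)
```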
Takes a given sample ID and converts the dissolved oxygen data in data(metadata) from long format into a matrix. This is useful for plotting using plot_column()
make_do_matrix(sampleID, field_data)
sampleID |
A regular expression used to select a group of samples |
field_data |
A dataset of DO profiles in long format. Column names must be the same as the metadata file provided with this package |
Also fills in NA values with the average of the depth above and below the missing value. If the value is at the bottom of the water column, the second deepest is substituted.
Returns matrix of DO data with depth in rows and date in columns
This is mainly used for generating contour plots. In general, long format is easier to work with. In the metadata file included in this package, each DO measurement is listed twice, once under the epilimnion sample name and again under the hypolimnion sample name.
Alexandra Linz <[email protected]>
data(metadata)
dissolved_oxygen <- make_do_matrix("TBE.....07", metadata)
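The gap-filling rule described above (average of the depths above and below, or the second deepest value at the bottom of the column) can be sketched for a single depth profile as follows. fill_profile_gaps() is a hypothetical helper written for illustration; it leaves a missing surface value untouched, and how the package handles that case or runs of consecutive NA values is not stated here.

```r
# Hypothetical helper: fill NA values in one depth profile (surface first)
fill_profile_gaps <- function(profile) {
  n <- length(profile)
  for (i in which(is.na(profile))) {
    if (i == n && n > 1) {
      # Bottom of the water column: substitute the second deepest value
      profile[i] <- profile[i - 1]
    } else if (i > 1 && i < n) {
      # Interior depth: average of the depths above and below
      profile[i] <- mean(c(profile[i - 1], profile[i + 1]))
    }
  }
  profile
}

do_profile <- c(8.1, 7.9, NA, 2.3, 0.4, NA)
filled <- fill_profile_gaps(do_profile)
filled
```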
Takes a given sample ID and converts temperature data of water profiles over time from long format into a matrix. This is most often useful for plotting using plot_column().
make_temp_matrix(sampleID, field_data)
sampleID |
A regular expression used to select a group of samples |
field_data |
A dataset of temperature profiles in long format. Column names must be the same as the metadata file provided with this package |
Returns matrix of temperature data with depth in rows and date in columns
This is mainly used for generating contour plots. In general, long format is easier to work with. In the included metadata file, each temperature measurement is recorded twice, once as epilimnion and once as hypolimnion.
Alexandra Linz <[email protected]>
data(metadata)
temp <- make_temp_matrix("TBE.....07", metadata)
A dataset containing temperature and oxygen profiles from the lakes in this study
data(metadata)
A dataframe with 6 columns (measured variables) and 13,607 rows (depth profiles)
Missing data are indicated by NA. Some sample dates and metadata dates may not match up exactly; if this presents an issue, please email and I will look at our written records for the right date. Epilimnion and hypolimnion samples each have an identical depth profile entry; search for just one or the other.
Alexandra Linz <[email protected]>
Calculates observed richness on a single column of relative abundance data.
obs_richness(sample)
sample |
A vector of relative abundance data, typically a single column in a matrix |
Returns a single number indicating the number of taxa in the tested sample
Use apply functions to calculate richness for all samples in a matrix
Alexandra Linz <[email protected]>
data(otu_table)
richness <- apply(otu_table, 2, obs_richness)
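Observed richness is simply the number of taxa with a non-zero abundance. A minimal sketch on toy data (obs_richness_sketch() is a hypothetical name, not the package function):

```r
# Hypothetical stand-in for obs_richness(): count taxa with abundance > 0
obs_richness_sketch <- function(sample) sum(sample > 0)

# Toy table: 4 taxa (rows) x 2 samples (columns)
toy <- cbind(S1 = c(10, 0, 5, 0),
             S2 = c(1, 1, 1, 1))
toy_richness <- apply(toy, 2, obs_richness_sketch)
toy_richness
```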
A dataset containing bacterial relative abundance data from the North Temperate Lakes Microbial Observatory. Produced from mothur output using clean_shared().
data(otu_table)
A dataframe with 1,387 columns (samples) and 6,208 rows (OTUs)
Contains replicate samples. Each column has been rarefied to 2500 reads. Sample names encode sampling site ("TB"), epilimnion or hypolimnion ("E" or "H"), sampling date ("01JUN07"), and replicate (".R2").
Alexandra Linz <[email protected]>
Calculates Pielou's evenness for a single vector of relative abundance data
pielou(sample)
sample |
A vector of relative abundance data |
Returns a single value indicating the evenness of a community
Alexandra Linz <[email protected]>
data(otu_table)
even <- apply(otu_table, 2, pielou)
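Pielou's evenness is commonly defined as J = H / ln(S), where H is the Shannon index and S is the number of observed taxa. The sketch below follows that textbook definition with natural logarithms; pielou_sketch() is a hypothetical name, and the package's pielou() may differ in details such as the log base.

```r
# Hypothetical sketch of Pielou's evenness, J = H / ln(S)
pielou_sketch <- function(sample) {
  p <- sample[sample > 0] / sum(sample)  # proportions of observed taxa
  h <- -sum(p * log(p))                  # Shannon index
  h / log(length(p))                     # divide by the maximum possible H
}

even_community   <- pielou_sketch(c(25, 25, 25, 25))
uneven_community <- pielou_sketch(c(97, 1, 1, 1))
c(even = even_community, uneven = uneven_community)
```

A sample with a single taxon gives 0 / 0 (NaN) under this definition, so such samples need separate handling.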
Takes output from make_do_matrix or make_temp_matrix and plots using filled.contour()
plot_column(data_matrix, title)
data_matrix |
A matrix output by make_do_matrix() or make_temp_matrix() |
title |
The title you would like on the plot |
Plots a filled contour plot showing the water column over time
Depends on the function rotate(). The functions make_do_matrix() and make_temp_matrix() fill in missing values with the average of the measurement at each depth above and below; however, if missing values are present in the matrix for plotting, these will appear as white space on the plot.
Alexandra Linz <[email protected]>
data(metadata)
temp <- make_temp_matrix("TBE.....07", metadata)
plot_column(temp, "Trout Bog 2007 Temperature")
Reduces the full string indicating taxonomy to the last classified level. Works on tables at levels higher than OTUs.
reduce_names(table)
table |
A table containing the relative abundances of each taxon produced by combine_otus() |
Returns the same table with shortened row names
This function is often most useful for plotting, so that the full string does not appear on the plot
Alexandra Linz <[email protected]>
data(otu_table)
data(taxonomy)
# Create a small table for the example
# example <- year_subset("05", otu_table)
# example <- bog_subset("TBE", example)
# clade_table <- combine_otus("Clade", example, taxonomy)
# clade_table <- clade_table[which(rowSums(clade_table) > 0),]
# head(rownames(clade_table))
# reduced_clades <- reduce_names(clade_table)
# head(rownames(reduced_clades))
Sometimes it is desirable to remove replicate samples (often for plotting). This command removes all samples marked as replicate 2. Please note that you should always check the similarity of replicates for your metric of interest before removing them for aesthetic purposes.
remove_reps(table)
table |
An OTU table containing the relative abundances of each OTU |
Returns an OTU table containing only one replicate for each sample
Alexandra Linz <[email protected]>
data(otu_table)
no_reps <- remove_reps(otu_table)
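Dropping replicate 2 can be sketched as a pattern match on the ".R2" suffix of the sample names. remove_reps_sketch() and the toy table are hypothetical; this is an illustration of the idea rather than the package's code.

```r
# Hypothetical stand-in for remove_reps(): drop columns ending in ".R2"
remove_reps_sketch <- function(table) {
  table[, !grepl("\\.R2$", colnames(table)), drop = FALSE]
}

toy <- matrix(1:6, nrow = 2,
              dimnames = list(c("Otu1", "Otu2"),
                              c("TBE01JUN07.R1", "TBE01JUN07.R2", "TBE08JUN07")))
toy_no_reps <- remove_reps_sketch(toy)
colnames(toy_no_reps)
```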
Rotates a matrix of data so that columns are reversed
rotate(data_matrix)
data_matrix |
Used in this package with matrix output by make_do_matrix or make_temp_matrix as part of the function plot_column(). Any matrix will work, though. |
Used to rotate the DO or temperature matrices so that depth 0 is at the top of a contour plot and the max depth is at the bottom.
Returns a matrix that has been rotated so that it reads from bottom to top
Used with make_do_matrix(), make_temp_matrix(), and plot_column(). plot_column() depends on this function.
An anonymous author on Stack Overflow
Alexandra Linz <[email protected]>
data(metadata)
temp <- make_temp_matrix("TBE.....07", metadata)
r_temp <- rotate(temp)
Calculates Shannon's Biodiversity Index on a single column of relative abundance data. This metric takes into account both richness and evenness.
shannon(sample)
sample |
A vector of relative abundance data, typically a single column in a matrix |
Returns a single number indicating the amount of biodiversity in the tested sample
Use apply functions to calculate Shannon's index for all samples in a matrix
Alexandra Linz <[email protected]>
data(otu_table)
shannon_index <- apply(otu_table, 2, shannon)
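The Shannon index is commonly written as H = -sum(p_i * ln(p_i)), where p_i is the proportion of taxon i in the sample. The sketch below implements that textbook form with natural logarithms; shannon_sketch() is a hypothetical name, and the log base used by the package's shannon() is an assumption.

```r
# Hypothetical sketch of the Shannon index, H = -sum(p * ln(p))
shannon_sketch <- function(sample) {
  p <- sample[sample > 0] / sum(sample)  # skip absent taxa to avoid log(0)
  -sum(p * log(p))
}

two_equal_taxa <- shannon_sketch(c(50, 50, 0))
one_dominant   <- shannon_sketch(c(98, 1, 1))
c(two_equal_taxa, one_dominant)
```

Two equally abundant taxa give ln(2), and the value falls as one taxon comes to dominate, which is how the index reflects both richness and evenness.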
A dataset containing the taxonomy of each OTU in the otu_table. Produced from mothur output using clean_taxonomy() (now clean_mothur_taxonomy()). Bootstrap values have been removed from this dataset, but are still available as part of the Data folder in the McMahonLab/North_Temperate_Lakes-Microbial_Observatory GitHub repo.
data(taxonomy)
A dataframe with 7 columns (taxonomic levels) and 6,208 rows (OTUs)
Classified using our Freshwater database, followed by Greengenes; for the full workflow, visit the McMahonLab GitHub 16STax-Ass repository. Some OTUs are missing; these were removed by subsampling of the OTU table. The presence of both blank (__) assignments and "unclassified" assignments is the result of the dual classification.
Alexandra Linz <[email protected]>
Takes the year value in the last two digits of the sample ID and allows selection of a single year of data. Can be performed on tables at higher taxonomic levels generated by combine_otus(), or on tables already subset by bog_subset().
year_subset(year_id, table)
table |
A table containing the relative abundances of each taxon |
year_id |
Two digit code indicating the last two digits of the year of interest (05, 07, 08, 09) surrounded by quotes. Regular expressions can be used. |
Returns an OTU table containing only samples from the specified year
Alexandra Linz <[email protected]>
data(otu_table)
seven <- year_subset("07", otu_table)
# Select two years at once
two_years <- year_subset("07|08", otu_table)
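Year selection can be pictured as a regular expression match on characters 9 and 10 of each sample name, where the two-digit year sits in the naming scheme. year_subset_sketch() and the toy table are hypothetical and only illustrate the idea.

```r
# Hypothetical stand-in for year_subset(); assumes names like "TBE07JUN07.R1"
year_subset_sketch <- function(year_id, table) {
  years <- substr(colnames(table), 9, 10)  # two-digit year in the sample name
  table[, grepl(year_id, years), drop = FALSE]
}

toy <- matrix(1:8, nrow = 2,
              dimnames = list(c("Otu1", "Otu2"),
                              c("TBE01JUN05", "TBE07JUN07.R1",
                                "TBE09JUN08.R2", "TBH15JUL09")))
toy_two_years <- year_subset_sketch("07|08", toy)
colnames(toy_two_years)
```

Restricting the match to the year characters avoids accidental hits on the day of the month, such as the "07" in "TBE07JUN08".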
Normalizes taxa abundances in a table of relative abundance data using the z-score method: z = ((abundance of one OTU in one sample) - (mean abundance of that OTU)) / (standard deviation of that OTU).
zscore(table)
table |
A table of relative abundance data with taxa in rows and samples in columns |
Returns a table with relative abundance data replaced by z-scores
There is debate on whether this method of normalization is valid for microbial communities, as their abundance distributions tend to be heavily skewed. I found it useful for plotting heatmaps and for input into network analysis.
Alexandra Linz <[email protected]>
data(otu_table)
# Create a small table for z-score normalization
example <- year_subset("05", otu_table)
example <- bog_subset("TBE", example)
# Remove OTUs that are not present in this subset
example <- example[which(rowSums(example) > 0), ]
z_otu_table <- zscore(example)
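The formula above can be sketched in base R with scale(), which standardizes columns, so the table is transposed to standardize each taxon (row) instead. zscore_sketch() is a hypothetical name, and the use of the sample standard deviation (n - 1 denominator) is an assumption about the package's behavior.

```r
# Hypothetical stand-in for zscore(): standardize each row (taxon)
zscore_sketch <- function(table) {
  # scale() works column-wise, so transpose, scale, and transpose back
  t(scale(t(table)))
}

toy <- matrix(c(1, 2, 3,
                10, 20, 30),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("Otu1", "Otu2"), c("S1", "S2", "S3")))
z <- zscore_sketch(toy)
z
```

A taxon with identical abundance in every sample has a standard deviation of zero and would produce NaN, which is one reason the example above removes OTUs that are absent from the subset before normalizing.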