SECAPR stands for Sequence Capture Processor and constitutes a bioinformatic pipeline for the processing of Illumina short read data, particularly for datasets resulting from sequence capture enrichment (synonyms: target enrichment, exon capture, hybrid enrichment). This pipeline guides the user through all necessary steps from the raw Illumina sequencing data to phylogeny estimation from Multiple Sequence Alignments (MSAs). Included in the workflow are options for allele phasing and the extraction of SNP data, which can be used for phylogenetic or population genetic inferences. The pipeline and its main functionalities are described here.
Overly imprecise or erroneous geo-references are a major issue when using species distribution data from large databases compiled from various sources, such as public data aggregators (e.g. GBIF). CoordinateCleaner automatically cleans databases of records with potentially erroneous coordinates. It accounts for errors most commonly found in biological collections, for instance invalid coordinates, coordinates in the sea, and coordinates assigned to country centroids, capitals or biodiversity institutions. Additionally, CoordinateCleaner can check for imprecise dating in fossils and identify whether records in a data set have undergone decimal rounding. CoordinateCleaner is available as an R package via CRAN or GitHub.
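To illustrate the kind of checks described above, here is a minimal Python sketch that flags invalid coordinates, zero coordinates, and records sitting on a known centroid. It is not CoordinateCleaner's code or API (the package itself is written in R); the names `Record`, `flag_record`, `haversine_km` and the default 1 km radius are assumptions made for this example only.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, used by the haversine formula


@dataclass
class Record:
    """Hypothetical occurrence record: a species name and decimal-degree coordinates."""
    species: str
    lat: float
    lon: float


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in decimal degrees."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def flag_record(rec, centroids, radius_km=1.0):
    """Return a list of flags for one record; an empty list means no problem was found.

    `centroids` maps a label (e.g. a country name) to a (lat, lon) tuple.
    """
    # Coordinates outside the valid range (or NaN) cannot be tested any further.
    if not (-90 <= rec.lat <= 90 and -180 <= rec.lon <= 180):
        return ["invalid"]
    flags = []
    # (0, 0) is a common placeholder for missing coordinates.
    if rec.lat == 0 and rec.lon == 0:
        flags.append("zero")
    # Records within radius_km of a known centroid may have been geo-referenced to it.
    for name, (clat, clon) in centroids.items():
        if haversine_km(rec.lat, rec.lon, clat, clon) <= radius_km:
            flags.append(f"centroid:{name}")
    return flags
```

A record is kept when `flag_record` returns an empty list. In practice the centroid table and the radius would come from a reference dataset rather than being hard-coded.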
Sampbias is a method and tool to 1) visualize the distribution of occurrence records and species in any user-provided dataset, 2) quantify the biasing effect of geographic features related to human accessibility, such as proximity to cities, rivers or roads, and 3) create publication-level graphs of these biasing effects in space.
We have developed software for coding species into user-defined units, e.g. for biogeographic analyses, using a combination of GIS polygons and altitudinal ranges. There are two main packages: one written in Python and another in R. The Python program, available via the link above, is described in a pre-print, and a simplified Graphical User Interface (GUI) is available from SourceForge; see more details on the package's wiki page.
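The idea of combining polygons with altitudinal ranges can be sketched in a few lines of Python. This is an illustration only, not the code of either package: the function names, the `units` dictionary layout and the half-open elevation interval are assumptions, and the point-in-polygon test is a plain ray-casting routine on planar coordinates.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: is (lon, lat) inside the polygon?

    `polygon` is a list of (lon, lat) vertices; the last vertex is
    implicitly connected back to the first.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can be crossed.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def code_record(lon, lat, elevation_m, units):
    """Return the names of all units whose polygon and elevation range contain the record.

    `units` maps a unit name to (polygon, min_elevation_m, max_elevation_m);
    the elevation interval is treated as [min, max).
    """
    return [
        name
        for name, (polygon, lo, hi) in units.items()
        if lo <= elevation_m < hi and point_in_polygon(lon, lat, polygon)
    ]


# Two hypothetical units sharing one square polygon but split by elevation.
square = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
units = {
    "lowland": (square, 0.0, 1000.0),
    "highland": (square, 1000.0, 5000.0),
}
```

Calling `code_record(5.0, 5.0, 500.0, units)` assigns the record to the hypothetical lowland unit, while the same location at 2000 m falls in the highland unit. Real polygons would normally be read from GIS files and tested with a dedicated geometry library.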
Daril Vilhena and Alexandre Antonelli recently described a network method for delimiting biogeographical regions based on species occurrence data (Vilhena and Antonelli 2015, Nature Communications). We have since made this method available for public use in Infomap Bioregions, an interactive web application that inputs species distribution data and generates bioregion maps. Species distributions may be provided as georeferenced point occurrences or range maps, and can be of local, regional or global scale. The application uses a novel adaptive resolution method to make best use of often incomplete species distribution data. The application is fully described here.
PyRate is a Python program to estimate speciation, extinction, and preservation rates from fossil occurrence data. It uses a probabilistic framework to jointly estimate species-specific times of speciation and extinction and the rates of the underlying birth-death process based on the fossil record. The rates are allowed to vary through time independently of each other, and the probability of preservation and sampling is explicitly incorporated in the model to estimate the true lifespan of each lineage. Different Bayesian algorithms are available to assess the presence of temporal rate shifts by exploring alternative diversification models. The methods and program are described here and here.
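A small simulation shows why preservation has to be modelled when estimating lifespans. The sketch below is not PyRate code: it assumes the simplest case, a constant-rate Poisson preservation process, and the function names and parameter values are made up for illustration. Fossil ages are drawn along a lineage that originates at time `ts` and goes extinct at `te` (both in million years before present).

```python
import random


def simulate_fossils(ts, te, q, rng):
    """Draw fossil ages for a lineage living from ts (origin) down to te (extinction).

    Ages are in Myr before present, so ts > te. Fossils arise as a
    homogeneous Poisson process with rate q (expected fossils per Myr).
    """
    ages = []
    t = ts
    while True:
        # Waiting times between events of a Poisson process are exponential.
        t -= rng.expovariate(q)
        if t < te:
            break
        ages.append(t)
    return ages


def observed_range(ages):
    """Time between the oldest and youngest fossil; 0.0 if fewer than two fossils."""
    if len(ages) < 2:
        return 0.0
    return max(ages) - min(ages)


rng = random.Random(42)  # fixed seed so the run is repeatable
ts, te, q = 20.0, 5.0, 0.5  # hypothetical lineage: true lifespan of 15 Myr
ages = simulate_fossils(ts, te, q, rng)
print(f"true lifespan: {ts - te:.1f} Myr, observed range: {observed_range(ages):.2f} Myr")
```

The observed range can never exceed the true lifespan, and it shrinks as the preservation rate drops. Treating first and last appearances as the actual times of origin and extinction therefore underestimates lineage durations, which is the bias that an explicit preservation model is meant to address.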
SUPERSMART (Self-Updating Platform for Estimating Rates of Speciation and Migration, Ages and Relationships of Taxa) is a platform for automated mining of molecular sequences, large-scale phylogenetic inference and fossil calibration. The methods are described here.
Download parts of GenBank and query a local database in R. Visit the restez website to find out more.
An automated pipeline for retrieving orthologous DNA sequences from GenBank in R. Visit the phylotaR website to find out more.
Install and run command-line programs (such as RAxML, MAFFT or BLAST), outside of R, inside of R! Visit the outsider website to find out more.
Towards a modular version of the SUPERSMART pipeline in R.
See the progression on GitHub.