http://king2.scs.fsu.edu/CEBProjects/awty/awty_start.php

About AWTY

AWTY is a system for graphical exploration of Markov chain Monte Carlo (MCMC) convergence in Bayesian phylogenetic inference. The graphics produced by AWTY are designed to help assess whether an MCMC analysis has run long enough, such that tree topologies are being sampled in proportion to their true posterior probability distribution. In other words, "Are We There Yet?" or AWTY for short. Admittedly, the results generated by AWTY will never be able to answer this question with a definitive yes; however, in some cases results will point confidently to the answer no. See the AWTY image gallery for some examples.
To produce plots in AWTY, a NEXUS- or NEWICK-formatted tree file representing a set of trees sampled over an MCMC run is required. To date, tree files generated by MrBayes and BAMBE have been tested. AWTY provides several graphical formats for displaying results; results may also be downloaded and analyzed using the plotting package of your choice. The online version of AWTY is written in Perl and PHP. Posterior probabilities of splits and topological tree distances are calculated by PAUP*. Graphics are generated by Gnuplot.

Citation:

Wilgenbusch J.C., Warren D.L., Swofford D.L. 2004. AWTY: A system for graphical exploration of MCMC convergence in Bayesian phylogenetic inference. http://ceb.csit.fsu.edu/awty.

http://www.pierroton.inra.fr/genetics/labo/Software/Permut/
This TURBO PASCAL program is based on the paper by Odile Pons and Rémy J. Petit (Genetics 1996, 144:1237-1245), on that of Burban et al. 1999, Mol Ecol 8, 1593-1602, and on a paper by Petit et al. in press in Forest Ecology and Management.
It computes measures of diversity and differentiation from haploid population genetic data when a measure of the distance between haplotypes is available, and tests whether these measures differ from the equivalent measures that do not take into account the distances between haplotypes (i.e., that consider all haplotypes equally divergent).
The source file should be an ASCII file (its name should have 8 characters maximum, e.g. 12345678.txt) and should include the following information:
First line: number of cytotypes, number of populations, and number of characters distinguishing the variants (for instance the number of polymorphic fragments, or of polymorphic nucleotide sites).
Then follows the number of individuals having a given cytotype (column) in a given population (row).
Finally, and without interruption, provide the table of character states for all haplotypes, where each line corresponds to one haplotype and each column to a character.
No column should be empty (no missing haplotype) and each population (row) should be composed of AT LEAST 3 individuals!
The program asks for the number of permutations to be made. See the example (inperm.txt and outperm.out).
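The input layout described above can be sketched in Python. This is only an illustration: the function name `format_permut_input` is hypothetical, and separating values with single spaces is an assumption, since the description does not state the delimiter. The example files inperm.txt and outperm.out remain the authoritative reference.

```python
def format_permut_input(counts, states):
    """Build the text of an input file in the layout described above.

    counts[p][h] is the number of individuals with haplotype h in population p.
    states[h][c] is the state of character c in haplotype h.
    Space-separated values are an assumption, not a documented requirement.
    """
    n_pops = len(counts)
    n_haps = len(states)
    n_chars = len(states[0])
    if any(len(row) != n_haps for row in counts):
        raise ValueError("every population row needs one count per haplotype")
    if any(len(row) != n_chars for row in states):
        raise ValueError("every haplotype needs one state per character")
    # Each population (row) must contain at least 3 individuals.
    if any(sum(row) < 3 for row in counts):
        raise ValueError("each population needs at least 3 individuals")
    # No haplotype (column) may be absent from every population.
    if any(sum(row[h] for row in counts) == 0 for h in range(n_haps)):
        raise ValueError("no haplotype column may be empty")
    lines = [f"{n_haps} {n_pops} {n_chars}"]
    lines += [" ".join(str(c) for c in row) for row in counts]
    lines += [" ".join(str(s) for s in row) for row in states]
    return "\n".join(lines) + "\n"


# Made-up data: 3 haplotypes, 2 populations, 2 characters.
counts = [[3, 1, 0], [0, 2, 2]]
states = [[0, 0], [1, 0], [1, 1]]
text = format_permut_input(counts, states)
```

The two checks mirror the constraints stated above, so a malformed table is rejected before the file is handed to the program.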

http://cmpg.unibe.ch/software/arlequin35/Arl35Methods.html

Arlequin ver 3.5.1.2

(released on 24.02.2010)

Implemented methods

The analyses Arlequin can perform on the data fall into two main categories: intra-population and inter-population methods. In the first category statistical information is extracted independently from each population, whereas in the second category, samples are compared to each other.

Intra-population methods

Standard indices		Some diversity measures like the number of polymorphic sites, gene diversity
Molecular diversity		Calculates several diversity indices like nucleotide diversity, different estimators of the population parameter
Mismatch distribution		The distribution of the number of pairwise differences between haplotypes, from which parameters of a demographic or spatial population expansion can be estimated
Haplotype frequency estimation		Estimates the frequency of haplotypes present in the population by maximum likelihood methods
Gametic phase estimation		Estimates the most likely gametic phase of multi-locus genotypes using a pseudo-Bayesian approach (ELB algorithm)
Linkage disequilibrium		Test of non-random association of alleles at different loci
Hardy-Weinberg equilibrium		Test of non-random association of alleles within diploid individuals
Tajima's neutrality test		Test of the selective neutrality of a random sample of DNA sequences or RFLP haplotypes under the infinite site model
Fu's neutrality test		Test of the selective neutrality of a random sample of DNA sequences or RFLP haplotypes under the infinite site model
Ewens-Watterson neutrality test		Tests of selective neutrality based on Ewens sampling theory under the infinite alleles model
Chakraborty's amalgamation test		A test of selective neutrality and population homogeneity. This test can be used when sample heterogeneity is suspected
Minimum Spanning Network (MSN)		Computes a Minimum Spanning Tree (MST) and Network (MSN) among haplotypes. This tree can also be computed for all the haplotypes found in different populations if activated under the AMOVA section

Inter-population methods

Search for shared haplotypes between populations		Comparison of population samples for their haplotypic content. All the results are then summarized in a table
AMOVA		Different hierarchical Analyses of Molecular Variance to evaluate the amount of population genetic structure
FST-Pairwise genetic distances		FST-based genetic distances for short divergence time
Pairwise molecular distances		Molecular distances between populations based on the number of pairwise differences between haplotypes
delta-mu square		Genetic distance between populations based on microsatellite data
Exact test of population differentiation		Test of non-random distribution of haplotypes into population samples under the hypothesis of panmixia
Assignment test of genotypes		Assignment of individual genotypes to particular populations according to estimated allele frequencies
Detection of loci under selection from F-statistics		Detection of loci under selection by the examination of the joint distribution of FST and heterozygosity under a hierarchical island model.

Mantel test

Correlations or partial correlations between a set of 2 or 3 matrices

Can be used to test for the presence of isolation-by-distance
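As a rough illustration of the idea behind a two-matrix Mantel test, here is a minimal Python sketch. It is not Arlequin's implementation: the function names, the one-sided test, the use of Pearson correlation, and the default of 999 permutations are all assumptions made for the example.

```python
import random
from math import sqrt


def upper(matrix, order):
    """Upper-triangle entries of a square matrix after relabelling rows/columns."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mantel(a, b, n_perm=999, seed=0):
    """One-sided Mantel test: correlation of a and b plus a permutation p-value."""
    rng = random.Random(seed)
    identity = list(range(len(a)))
    x = upper(a, identity)
    r_obs = pearson(x, upper(b, identity))
    hits = 0
    for _ in range(n_perm):
        order = identity[:]
        rng.shuffle(order)  # permute rows and columns of b together
        if pearson(x, upper(b, order)) >= r_obs:
            hits += 1
    # Count the observed arrangement as one of the permutations.
    return r_obs, (hits + 1) / (n_perm + 1)


# Made-up example: five sites on a line, genetic distance proportional to
# geographic distance (a perfect isolation-by-distance pattern).
positions = [0, 1, 3, 6, 10]
geo = [[abs(p - q) for q in positions] for p in positions]
gen = [[0.5 * d for d in row] for row in geo]
r, p = mantel(geo, gen)
```

Rows and columns are permuted together because the entries of a distance matrix are not independent observations; shuffling individual cells would break the matrix structure.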

http://sasha.stanford.edu/Download_Sasha_Code.html

SAShA 2.0:

SAShA 2.0 uses the same algorithm as the original SAShA, but we were able to greatly improve its memory efficiency so that the code can run even very large datasets: thousands of alleles with tens of thousands of samples.

The SAShA 2.0 code is not yet compiled for Windows, but we are making the MATLAB code available. As long as you have access to a copy of MATLAB, the interface is just as easy to use. Here's how you do it:


1) Download two code files: SAShA2.m and SAShA_CORE2.m, placing both files in a directory that's easy for you to find. Just right click (control-click) and save as text with the extension ".m". It's important that you use the filenames above and that you don't rename the files, because if you do, MATLAB won't be able to find them.
2) Open MATLAB.
3) Make sure the folder with the code files is on MATLAB's search path. To be absolutely sure, add the folder to the path explicitly: (a) In MATLAB, select the menu item "File: Set Path..." (b) Click "Add Folder..." in the dialog box that appears, (c) Select the folder containing the SAShA code from the Browse dialog that comes up and click "Open", (d) Click "Save" in the original "Set Path..." dialog, and then (e) Click "Close". If you're hacking with the code, be sure not to have two functions with the same name in MATLAB's search path. The results get complicated quickly, so if you want to play around, I recommend renaming the functions and files.
4) Get your data files ready. Prepare (A) a tab-delimited text allele-by-location matrix (e.g. Allele.txt) in which each row corresponds to an allele, each column to a location, and the numbers in the matrix correspond to the numbers of that allele found in that location; AND (B) a text file containing the pairwise geographic distances between your locations in symmetric or triangular form (e.g. GeoDist.txt). The units and conceptions of distance in that file are entirely up to you. Put both files in a folder that's easy to find. The simplest option is often to put them in the same folder as your code.
5) Set MATLAB's working directory to the folder with your data files. To do this: (Option A) In the MATLAB toolbar, click the "..." after "Current Folder:", and select the folder with your data files; OR (Option B) Select the menu item "Desktop: Current Folder" to open the "Current Folder Browser", and select your data file folder; OR (Option C) Use the command line command "cd" to set the current folder, as in "cd('RootPath/MyDataFileFolder/')".
6) Run SAShA2(). From the MATLAB Command Window prompt, type "SAShA2()", and answer the prompted questions. At each prompt, hitting return will select the default [shown in brackets]. Bear in mind that all filenames are case sensitive, so "allele.txt" is not the same as "Allele.txt".
7) Check out your results. SAShA2 will generate up to three figures and output up to four results files (Overall, Allele, Jack-Knife, ConsoleOutput) into the current MATLAB directory.
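For step 4, the two data files could be produced from any scripting language. The Python sketch below is one possible way to do it; the helper names are hypothetical, and it assumes (without confirmation from the SAShA documentation) that the files carry no header row or row labels and that the distance file is also tab-delimited.

```python
from pathlib import Path


def to_tab_delimited(matrix):
    """Render a list of rows as tab-delimited text, one row per line."""
    return "\n".join("\t".join(str(v) for v in row) for row in matrix) + "\n"


def check_inputs(alleles, distances):
    """Check that the two tables describe the same set of locations."""
    n_locations = len(distances)
    if any(len(row) != n_locations for row in alleles):
        raise ValueError("allele matrix needs one column per location")
    for i in range(n_locations):
        if len(distances[i]) != n_locations:
            raise ValueError("distance matrix must be square")
        for j in range(n_locations):
            if distances[i][j] != distances[j][i]:
                raise ValueError("distance matrix must be symmetric")


def write_inputs(folder, alleles, distances):
    """Write Allele.txt and GeoDist.txt (names are case sensitive) into folder."""
    check_inputs(alleles, distances)
    folder = Path(folder)
    (folder / "Allele.txt").write_text(to_tab_delimited(alleles))
    (folder / "GeoDist.txt").write_text(to_tab_delimited(distances))


# Made-up data: 2 alleles (rows) sampled at 3 locations (columns).
alleles = [[5, 2, 0], [0, 3, 4]]
distances = [[0, 12.5, 40], [12.5, 0, 30], [40, 30, 0]]
check_inputs(alleles, distances)
allele_text = to_tab_delimited(alleles)
```

Calling `write_inputs(".", alleles, distances)` from the folder that will become MATLAB's working directory would place both files where step 5 expects them.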

As we're freely distributing (as in beer) the code, feel free (as in speech) to modify, customize, and improve it as you see fit, but please cite us appropriately. If you do make the code better/more interesting, please let us know! We'd be especially excited if you can make a more usable front-end, or provide a web-accessible version.

SAShA 1.0:

For those without access to MATLAB, or those with smaller datasets, SAShA 1.0 works just fine. To run SAShA 1.0 from the Stand Alone Executable:

1) Download the three files: a) SAShA.exe, b) SAShA.ctf, c) MCRInstaller.exe

2) Put them in the same directory with

-- a) the text file containing your tab-delimited allele-by-location matrix (e.g. Allele.txt)

and

-- b) the text file containing your upper-right triangular matrix of pairwise geographic distances (e.g. GeoDist.txt)

Examples from Katharina tunicata (Kelly and Eernisse 2007, Kelly et al. in review):

3) Install the MATLAB runtime environment (MCRInstaller.exe)

4) Make sure all the files are in the same directory, and that your system did not sneakily rename any files.

[Some systems try to turn the SAShA.ctf archive into SAShA.zip.

If yours did, just change the extension back to .ctf and it'll work fine.]

5) Run SAShA.exe, and learn!

A quick word on interpreting your data:

SAShA returns two statistics which represent how different the distribution of your alleles is from what one would expect under panmixia given the same spatial sampling. In the output, you'll find a lot of numbers, but they all serve to inform you about either these two stats, or the significance thereof.

The first statistic, Dg, is the difference between the Observed Mean (OM) of all the distances between every pair of identical alleles, and the Expected Mean (EM) under panmixia, given the same pattern of sampling. As it's a simple difference, Dg can be positive or negative, implying that the observed mean can be smaller (e.g. restricted allelic dispersal) or larger (e.g. over-dispersion) than the expected mean.

Dg = EM-OM

The second statistic, Dcdf, is twice the root-mean-square difference between the observed and expected cumulative distribution functions of the pairwise distance distributions. Translating this, we turn both the observed and expected pairwise distance distributions into curves that essentially generate a cumulative sum of the frequency of each pairwise distance between identical alleles (observed) or all alleles (expected), starting at zero distance and extending to the largest pairwise distance. We then take the difference between these curves, using the root-mean-square (i.e. subtract the curves from each other, square the result, take the mean, and then take the square root). We then double this number to make Dcdf theoretically run from 0 to 1. If that's not clear to you, a picture is worth many, many words on this subject. I'd recommend taking a look at the overall plot on the welcome page and digging into the paper, and hopefully it'll make more sense.

Dg and Dcdf behave pretty similarly, but Dcdf can be particularly useful for multi-modal observed distributions (e.g. patchily distributed alleles) that clearly differ from the expectation under panmixia in ways that a difference between means has a hard time representing.
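To make the two definitions concrete, here is a small Python sketch that follows the verbal description above. It is not the SAShA code, and several details are assumptions: pairs of samples from the same location are counted at distance zero, the two cumulative curves are compared at each distinct pairwise distance, and the sign of Dg follows the formula "Dg = EM-OM" given above. The function names are hypothetical.

```python
from math import sqrt


def pair_weights(counts, dist):
    """Map each distance to the number of sample pairs separated by it."""
    weights = {}
    n = len(counts)
    for i in range(n):
        same = counts[i] * (counts[i] - 1) / 2  # pairs within one location
        if same:
            weights[dist[i][i]] = weights.get(dist[i][i], 0) + same
        for j in range(i + 1, n):
            cross = counts[i] * counts[j]  # pairs between two locations
            if cross:
                weights[dist[i][j]] = weights.get(dist[i][j], 0) + cross
    return weights


def weighted_mean(weights):
    return sum(d * w for d, w in weights.items()) / sum(weights.values())


def cdf(weights, x):
    """Fraction of pairs separated by a distance of at most x."""
    total = sum(weights.values())
    return sum(w for d, w in weights.items() if d <= x) / total


def dg_and_dcdf(alleles, dist):
    """alleles[a][i] = copies of allele a at location i; dist = distance matrix."""
    observed = {}
    for row in alleles:  # pairs of identical alleles only
        for d, w in pair_weights(row, dist).items():
            observed[d] = observed.get(d, 0) + w
    if not observed:
        raise ValueError("no allele was sampled more than once")
    totals = [sum(col) for col in zip(*alleles)]
    expected = pair_weights(totals, dist)  # all pairs, regardless of allele
    dg = weighted_mean(expected) - weighted_mean(observed)  # Dg = EM - OM
    grid = sorted(set(observed) | set(expected))
    diffs = [cdf(observed, x) - cdf(expected, x) for x in grid]
    dcdf = 2 * sqrt(sum(v * v for v in diffs) / len(diffs))
    return dg, dcdf


# Made-up example: two locations 10 units apart, each holding its own allele.
alleles = [[4, 0], [0, 4]]
dist = [[0, 10], [10, 0]]
dg, dcdf = dg_and_dcdf(alleles, dist)
```

In this toy dataset every pair of identical alleles sits at distance zero, so the observed mean is below the expected mean and Dg comes out positive under the sign convention used here.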

All p-values reported here are generated by non-parametric permutation of your Allele-by-Location dataset, and therefore the precision of your p-value estimate is a function of the number of permutations you run. The default is 1000, because this number is a pretty safe bet to be sure whether or not a dataset will pass the traditional p < 0.05 criterion. If you see a p-value of 0, all that means is that none of the permuted datasets showed a larger divergence than the observed data. However, your real estimate of the p-value is not zero, but just less than 1/(the number of permutations + the one observed value). So for 1000 permutations, your p-value would be less than 1/1001, or about 0.000999. Some folks disagree about the plus one. If you agree with them, feel free to say that your p-value is only less than 0.001.
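The arithmetic in the paragraph above can be written out as a short Python sketch. The function name `smallest_resolvable_p` and its `plus_one` switch are hypothetical, introduced only to show both conventions side by side.

```python
def smallest_resolvable_p(n_permutations, plus_one=True):
    """Upper bound to report when no permuted dataset exceeds the observed one.

    With plus_one=True the observed dataset is counted as one more
    arrangement, giving 1 / (n_permutations + 1); otherwise 1 / n_permutations.
    """
    if n_permutations < 1:
        raise ValueError("need at least one permutation")
    denominator = n_permutations + 1 if plus_one else n_permutations
    return 1 / denominator


bound_with_plus_one = smallest_resolvable_p(1000)
bound_without = smallest_resolvable_p(1000, plus_one=False)
```

Either way, the bound shrinks only as the number of permutations grows, which is why a reported p-value of 0 says more about the permutation count than about the data.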

Allele-specific measures run these same stats on each allele in isolation, which can take some time if you're running a lot of alleles and a lot of permutations. Jack-Knifing runs the overall analysis on the dataset after removing each allele, and effectively tests the robustness of the overall trend. It's conceptually similar to bootstrapping. Running the code for each allele or without each allele can take a long time, and so the code defaults to opting out of these analyses.


http://nimbletwist.com/software/ninja/download.html

NINJA is software for inferring large-scale neighbor-joining phylogenies. According to benchmark tests, at the time of release, NINJA is the fastest available tool for computing correct neighbor-joining phylogenies for inputs of more than 10,000 sequences. It is more than 10x faster than the fastest implementation of the canonical neighbor-joining algorithm (QuickTree). Details of the software are available in a paper appearing at WABI 2009 (see link below).
NINJA is available as a Mesquite package.

URL of this page		`http://nimbletwist.com/software/ninja`
Terms of Use		`LGPL`
Distribution		`ninja.tgz`
Citation		Wheeler, T.J. 2009. Large-scale neighbor-joining with NINJA. In S.L. Salzberg and T. Warnow (Eds.), Proceedings of the 9th Workshop on Algorithms in Bioinformatics. WABI 2009, pp. 375-389. Springer, Berlin.
Contact		Travis Wheeler travis _ at _ nimbletwist.com

http://bioinformatics.org/~tryphon/populations/#ancre_formats

Populations 1.2.31

Population genetic software (individuals or populations distances, phylogenetic trees)

Haploid, diploid or polyploid genotypes (see input formats)
Structured populations (see input files for structured populations)
No limit of populations, loci, alleles per loci (see input formats)
Distances between individuals (15 different methods)
Distances between populations (15 methods)
Bootstraps on loci OR individuals
Phylogenetic trees (individuals or populations), using Neighbor Joining or UPGMA (PHYLIP tree format)
Allelic diversity
Converts data files from Genepop to different formats (Genepop, Genetix, Msat, Populations...)

Programs:
Stephane Guindon.
http://compevol.auckland.ac.nz/
http://compevol.auckland.ac.nz/software/

Software

PhyTime

PhyTime is software that estimates divergence times from the analysis of DNA or protein sequences. It relies on a fast Bayesian approach that builds the posterior densities of node ages. The method and its performance are described in Guindon, 2010, Mol. Biol. Evol. PhyTime can be run online or downloaded.

DensiTree

DensiTree is a program for qualitative analysis of sets of trees that makes it possible to visually inspect the posterior distribution of an MCMC run over a tree. Many features of the posterior distribution are easily visible, such as agreement/disagreement of tree topologies, uncertainty in node heights and relative dominance of particular topologies.

Fitmodel

Fitmodel estimates the parameters of various codon-based models of substitution, including those described in Guindon, Rodrigo, Dyer and Huelsenbeck, 2004, PNAS. These models are especially useful as they accommodate site-specific switches between selection regimes without a priori knowledge of the positions in the tree where changes of selection regimes occurred.

BEAST

BEAST is an open source cross-platform program for Bayesian MCMC phylogenetic analysis of molecular sequences. The source code for BEAST is hosted at http://beast-mcmc.googlecode.com.

Geneious

Geneious is an integrated, cross-platform bioinformatics software suite for manipulating, finding, sharing, and exploring biological data such as DNA sequences or proteins, phylogenies, 3D structure information, publications, etc.

Java Evolutionary Biology Library

An open source Java library for evolutionary biology and bioinformatics, including objects representing biomolecular sequences, multiple sequence alignments and phylogenetic trees. The source code is hosted at http://sourceforge.net/projects/jebl

PAL – Phylogenetics Analysis Library

The PAL project is a collaborative effort to provide a high quality open source Java library for use in molecular evolution and phylogenetics.

PhyML

PhyML is a fast maximum likelihood phylogeny estimator from alignments of nucleotide or amino acid sequences. It provides a wide range of options that were designed to facilitate standard phylogenetic analyses. PhyML is open source distributed under the terms of the GNU GPL.

SplitsTree

SplitsTree4 is the leading application for computing evolutionary networks from molecular sequence data. Given an alignment of sequences, a distance matrix or a set of trees, the program will compute a phylogenetic tree or network using methods such as split decomposition, neighbor-net, consensus network, super networks methods or methods for computing hybridization or simple recombination networks.

http://www.stat.auckland.ac.nz/showperson?firstname=St%C3%A9phane&surname=Guindon

Senior Lecturer Stéphane Guindon
Further Information

I am a principal investigator in the Computational Evolution Group: http://compevol.auckland.ac.nz/
Software I am developing:

PhyML: estimation of maximum likelihood trees.
Fitmodel: deciphering the patterns of Darwinian selection acting on molecular sequences.
PhyTime : estimating species divergence times using a fast and accurate Bayesian approach.

Most recent publications

Guindon S. (2010). Bayesian estimation of divergence times from large sequence alignments. Molecular Biology and Evolution, In press.

Gouy M, Guindon S, & Gascuel O. (2009). SeaView version 4 : a multiplatform graphical user interface for sequence alignment and phylogenetic tree building. Molecular Biology and Evolution, 23.

Juan Blog and journal!

July 3, 2011

Dna software
