About AWTY
AWTY is a system for graphical exploration of Markov chain Monte Carlo (MCMC) convergence in Bayesian phylogenetic inference. The graphics produced by AWTY are designed to help assess whether an MCMC analysis has run long enough that tree topologies are being sampled in proportion to their true posterior probabilities. In other words, "Are We There Yet?", or AWTY for short. Admittedly, the results generated by AWTY will never be able to answer this question with a definitive yes; however, in some cases results will point confidently to the answer no. See the AWTY image gallery for some examples.
To produce plots in AWTY, a NEXUS or NEWICK formatted tree file representing a set of trees sampled over an MCMC run is required. To date, tree files generated by MrBayes and BAMBE have been tested. AWTY provides several graphical formats to display results; results may also be downloaded and analyzed using the plotting package of your choice.
The online version of AWTY is written in Perl and PHP. Posterior probabilities of splits and topological tree distances are calculated by PAUP*. Graphics are generated by Gnuplot.
Citation:
Wilgenbusch J.C., Warren D.L., Swofford D.L. 2004. AWTY: A system for graphical exploration of MCMC convergence in Bayesian phylogenetic inference. http://ceb.csit.fsu.edu/awty.
http://www.pierroton.inra.fr/genetics/labo/Software/Permut/
This TURBO PASCAL program is based on the paper by Odile Pons and Rémy J. Petit (Genetics 1996, 144:1237-1245), on that of Burban et al. 1999, Mol Ecol 8, 1593-1602, and on a paper by Petit et al. in press in Forest Ecology and Management.
It computes measures of diversity and differentiation from haploid population genetic data, when a measure of the distance between haplotypes is available, and tests whether the differentiation and diversity measures differ from the equivalent measures that do not take into account the distances between haplotypes (i.e., that consider all haplotypes equally divergent).
The source file should be an ASCII file (its name should have 8 characters maximum, e.g. 12345678.txt) and should include the following information:
First line: number of cytotypes, number of populations, and number of characters distinguishing the variants (for instance the number of polymorphic fragments, or of polymorphic nucleotide sites).
Then follows the number of individuals having a given cytotype (column) in a given population (row).
Finally, and without interruption, provide the table of character states for all haplotypes, where each line corresponds to one haplotype, and each column to a character.
No column should be empty (no missing haplotype) and each population (row) should be composed of AT LEAST 3 individuals!
The program asks for the number of permutations to be made. See the example (inperm.txt and outperm.out).
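To make the layout concrete, here is a minimal Python sketch that reads a file laid out as described above and checks the two stated constraints (no empty cytotype column, at least 3 individuals per population). It is not part of Permut. The function names are made up for illustration, and it assumes that all values, including the character states, are separated by whitespace, which the description does not actually specify.

```python
def parse_permut_input(text):
    """Split a Permut-style input into the count table and the character table.

    Assumption: every value is whitespace-separated (not confirmed by the source).
    """
    tokens = text.split()
    n_cyto, n_pop, n_char = (int(t) for t in tokens[:3])
    expected = 3 + n_pop * n_cyto + n_cyto * n_char
    if len(tokens) != expected:
        raise ValueError(f"expected {expected} values, found {len(tokens)}")
    pos = 3
    # Count table: one row per population, one column per cytotype.
    counts = []
    for _ in range(n_pop):
        counts.append([int(t) for t in tokens[pos:pos + n_cyto]])
        pos += n_cyto
    # Character table: one row per haplotype, one column per character.
    states = []
    for _ in range(n_cyto):
        states.append(tokens[pos:pos + n_char])
        pos += n_char
    return counts, states


def check_permut_counts(counts):
    """Return a list of human-readable problems (empty if the table looks fine)."""
    problems = []
    for i, row in enumerate(counts, start=1):
        if sum(row) < 3:
            problems.append(f"population {i} has fewer than 3 individuals")
    for j, col in enumerate(zip(*counts), start=1):
        if sum(col) == 0:
            problems.append(f"cytotype {j} is absent from every population")
    return problems


# Made-up example: 3 cytotypes, 2 populations, 4 characters.
example = """3 2 4
2 1 0
0 2 3
0 0 1 1
0 1 0 1
1 1 0 0
"""
counts, states = parse_permut_input(example)
problems = check_permut_counts(counts)
print(counts, problems)
```

Running a check like this before launching the program is just a convenience; the real example files (inperm.txt and outperm.out) remain the reference for the exact format.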
http://cmpg.unibe.ch/software/arlequin35/Arl35Methods.html
Arlequin ver 3.5.1.2 (released on 24.02.2010)
Implemented methods
The analyses Arlequin can perform on the data fall into two main categories: intra-population and inter-population methods. In the first category statistical information is extracted independently from each population, whereas in the second category, samples are compared to each other.
Intra-population methods
| Standard indices | Some diversity measures like the number of polymorphic sites, gene diversity |
| Molecular diversity | Calculates several diversity indices like nucleotide diversity, different estimators of the population parameter |
| Mismatch distribution | The distribution of the number of pairwise differences between haplotypes, from which parameters of a demographic or spatial population expansion can be estimated |
| Haplotype frequency estimation | Estimates the frequency of haplotypes present in the population by maximum likelihood methods |
| Gametic phase estimation | Estimates the most likely gametic phase of multi-locus genotypes using a pseudo-Bayesian approach (ELB algorithm) |
| Linkage disequilibrium | Test of non-random association of alleles at different loci |
| Hardy-Weinberg equilibrium | Test of non-random association of alleles within diploid individuals |
| Tajima's neutrality test | Test of the selective neutrality of a random sample of DNA sequences or RFLP haplotypes under the infinite site model |
| Fu's neutrality test | Test of the selective neutrality of a random sample of DNA sequences or RFLP haplotypes under the infinite site model |
| Ewens-Watterson neutrality test | Tests of selective neutrality based on Ewens sampling theory under the infinite alleles model |
| Chakraborty's amalgamation test | A test of selective neutrality and population homogeneity. This test can be used when sample heterogeneity is suspected |
| Minimum Spanning Network (MSN) | Computes a Minimum Spanning Tree (MST) and Network (MSN) among haplotypes. This tree can also be computed for all the haplotypes found in different populations if activated under the AMOVA section |
Inter-population methods
| Search for shared haplotypes between populations | Comparison of population samples for their haplotypic content. All the results are then summarized in a table |
| AMOVA | Different hierarchical Analyses of Molecular Variance to evaluate the amount of population genetic structure |
| FST-Pairwise genetic distances | FST-based genetic distances for short divergence time |
| Pairwise molecular distances | Molecular distances between populations based on the number of pairwise differences between haplotypes |
| delta-mu square | Genetic distance between populations based on microsatellite data |
| Exact test of population differentiation | Test of non-random distribution of haplotypes into population samples under the hypothesis of panmixia |
| Assignment test of genotypes | Assignment of individual genotypes to particular populations according to estimated allele frequencies |
| Detection of loci under selection from F-statistics | Detection of loci under selection by the examination of the joint distribution of FST and heterozygosity under a hierarchical island model. |
Mantel test
| Correlations or partial correlations between a set of 2 or 3 matrices | Can be used to test for the presence of isolation-by-distance |
http://sasha.stanford.edu/Download_Sasha_Code.html
SAShA 2.0:
SAShA 2.0 uses the same algorithm as the original SAShA, but we were able to greatly improve its memory efficiency so that the code can handle even very large datasets: thousands of alleles with tens of thousands of samples.
The SAShA 2.0 code is not yet compiled for Windows, but we are making the MATLAB code available. As long as you have access to a copy of MATLAB, the interface is just as easy to use. Here's how you do it:
1) Download two code files: SAShA2.m and SAShA_CORE2.m, placing both files in a directory that's easy for you to find. Just right click (control-click) and save as text with the extension ".m". It's important that you use the filenames above and that you don't rename the files, because if you do, MATLAB won't be able to find them.
2) Open MATLAB.
3) Make sure the folder with the code files is on MATLAB's search path. To be absolutely sure, add the folder to the path yourself: (a) In MATLAB, select the menu item "File: Set Path..." (b) Click "Add Folder..." in the dialog box that appears, (c) Select the folder containing the SAShA code from the Browse dialog that comes up and click "Open", (d) Click "Save" in the original "Set Path..." dialog, and then (e) Click "Close". If you're hacking on the code, be sure not to have two functions with the same name in MATLAB's search path. The results get complicated quickly, so if you want to play around, I recommend renaming the functions and files.
4) Get your data files ready. Prepare (A) a tab-delimited text allele-by-location matrix (e.g. Allele.txt) in which each row corresponds to an allele, each column to a location, and the numbers in the matrix correspond to the numbers of that allele found in that location; AND (B) a text file containing the pairwise geographic distances between your locations in symmetric or triangular form (e.g. GeoDist.txt). The units and conceptions of distance in that file are entirely up to you. Put both files in a folder that's easy to find. The simplest option is often to put them in the same folder as your code.
5) Set MATLAB's working directory to the folder with your data files. To do this: (Option A) In the MATLAB toolbar, click the "..." after "Current Folder:", and select the folder with your data files; OR (Option B) Select the menu item "Desktop: Current Folder" to open the "Current Folder Browser", and select your data file folder; OR (Option C) Use the command line command "cd" to set the current folder, as in "cd('RootPath/MyDataFileFolder/')".
6) Run SAShA2(). From the MATLAB Command Window prompt, type "SAShA2()", and answer the prompted questions. At each prompt, hitting return will select the default [shown in brackets]. Bear in mind that all filenames are case sensitive, so "allele.txt" is not the same as "Allele.txt".
7) Check out your results. SAShA2 will generate up to three figures and output up to four results files (Overall, Allele, Jack-Knife, ConsoleOutput) into the current MATLAB directory.
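If your data live somewhere other than a spreadsheet, the two input files from step 4 can also be written by a short script. The Python sketch below is only an illustration: the counts and coordinates are invented, straight-line distance is just one possible "conception of distance", and it assumes that both files are plain tab-delimited numeric matrices with no header row or column (the page only says the allele matrix is tab-delimited).

```python
import csv
import math
import os
import tempfile

# Invented example: 3 alleles (rows) sampled at 4 locations (columns).
allele_counts = [
    [5, 3, 0, 0],
    [0, 2, 4, 1],
    [1, 0, 0, 6],
]

# Invented site coordinates in arbitrary units, one (x, y) pair per location.
coords = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0), (6.0, 0.0)]


def pairwise_distances(points):
    """Full symmetric matrix of straight-line distances between points."""
    return [[math.dist(p, q) for q in points] for p in points]


def write_tab_delimited(path, rows):
    """Write rows as tab-separated text, one matrix row per line."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerows(rows)


# A temporary folder stands in for "a folder that's easy to find".
out_dir = tempfile.mkdtemp()
geo = pairwise_distances(coords)
write_tab_delimited(os.path.join(out_dir, "Allele.txt"), allele_counts)
write_tab_delimited(os.path.join(out_dir, "GeoDist.txt"), geo)
print("wrote input files to", out_dir)
```

The number of columns in Allele.txt has to match the number of rows and columns in GeoDist.txt, since both are indexed by location.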
As we're freely distributing (as in beer) the code, feel free (as in speech) to modify, customize, and improve it as you see fit, but please cite us appropriately. If you do make the code better/more interesting, please let us know! We'd be especially excited if you can make a more usable front-end, or provide a web-accessible version.
SAShA 1.0:
For those without access to MATLAB, or those with smaller datasets, SAShA 1.0 works just fine. To run SAShA 1.0 from the Stand Alone Executable:
1) Download the three files: a) SAShA.exe, b) SAShA.ctf, c) MCRInstaller.exe
2) Put them in the same directory with
-- a) the text file containing your tab-delimited allele-by-location matrix (e.g. Allele.txt)
and
-- b) the text file containing your upper-right triangular matrix of pairwise geographic distances (e.g. GeoDist.txt)
Examples from Katharina tunicata (Kelly and Eernisse 2007, Kelly et al. in review):
3) Install the MATLAB runtime environment (MCRInstaller.exe)
4) Make sure all the files are in the same directory, and that your system did not sneakily rename any files.
[Some systems try to turn the SAShA.ctf archive into SAShA.zip. If yours did, just change the extension back to .ctf and it'll work fine.]
5) Run SAShA.exe, and learn!
A quick word on interpreting your data:
SAShA returns two statistics which represent how different the distribution of your alleles is from what one would expect under panmixia given the same spatial sampling. In the output, you'll find a lot of numbers, but they all serve to inform you about either these two stats, or the significance thereof.
The first statistic, Dg, is the difference between the Observed Mean (OM) of all the distances between every pair of identical alleles, and the Expected Mean (EM) under panmixia, given the same pattern of sampling. As it's a simple difference, Dg can be positive or negative, implying that the observed mean can be smaller (e.g. restricted allelic dispersal) or larger (e.g. over-dispersion) than the expected mean.
Dg = EM-OM
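For the curious, here is one way to read that definition, sketched in Python. This is not the SAShA code. It assumes that OM is the mean distance over every pair of sampled copies of the same allele, and that EM is the same mean taken over every pair of sampled copies regardless of allele (i.e. alleles pooled within locations). The function names and the toy data are invented.

```python
def pair_distance_sum(counts, dist):
    """Sum of distances and number of pairs among copies with per-location counts.

    counts[i] is the number of copies at location i; dist[i][j] is the
    distance between locations i and j (dist[i][i] is normally 0).
    """
    total = 0.0
    n_pairs = 0
    k = len(counts)
    for i in range(k):
        within = counts[i] * (counts[i] - 1) // 2  # pairs inside location i
        total += within * dist[i][i]
        n_pairs += within
        for j in range(i + 1, k):
            between = counts[i] * counts[j]  # pairs spanning locations i and j
            total += between * dist[i][j]
            n_pairs += between
    return total, n_pairs


def dg(allele_counts, dist):
    """Return (Dg, OM, EM) with Dg = EM - OM, under the assumptions stated above."""
    obs_sum, obs_pairs = 0.0, 0
    for row in allele_counts:  # pairs of identical alleles only
        s, n = pair_distance_sum(row, dist)
        obs_sum += s
        obs_pairs += n
    pooled = [sum(col) for col in zip(*allele_counts)]  # ignore allele identity
    exp_sum, exp_pairs = pair_distance_sum(pooled, dist)
    if obs_pairs == 0 or exp_pairs == 0:
        raise ValueError("need at least one pair of sampled copies")
    om = obs_sum / obs_pairs
    em = exp_sum / exp_pairs
    return em - om, om, em


# Toy data: two locations 10 units apart, each fixed for a different allele.
toy_counts = [[4, 0], [0, 4]]
toy_dist = [[0.0, 10.0], [10.0, 0.0]]
d, om, em = dg(toy_counts, toy_dist)
print(d, om, em)
```

In the toy data every pair of identical alleles sits at a single location, so OM is 0 while EM is well above 0, and Dg comes out positive, the restricted-dispersal case.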
The second statistic, Dcdf, is twice the root mean square difference between the observed and expected cumulative distribution functions of the pairwise distance distributions. Translating this, we turn both the observed and expected pairwise distance distributions into curves that essentially generate a cumulative sum of the frequency of each pairwise distance between identical alleles (observed) or all alleles (expected), starting at zero distance and extending to the largest pairwise distance. We then take the difference between these curves, using the root-mean-square (i.e. subtract the curves from each other, square the result, take the mean, and then take the square root). We then double this number to make Dcdf theoretically run from 0 to 1. If that's not clear to you, a picture is worth many, many words on this subject. I'd recommend taking a look at the overall plot on the welcome page and digging into the paper, and hopefully it'll make more sense.
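Written out as a formula, the recipe in the previous paragraph looks roughly like this. The notation is mine, not the paper's: F_obs and F_exp are the observed and expected cumulative curves, and d_1, ..., d_K are the distances at which the two curves are compared. The text above does not say which set of distances the mean is taken over, so K and the d_k are an assumption.

```latex
D_{cdf} = 2\,\sqrt{\frac{1}{K}\sum_{k=1}^{K}\bigl[F_{obs}(d_k) - F_{exp}(d_k)\bigr]^{2}}
```

Here F_obs(d) would be the fraction of identical-allele pairs separated by a distance of at most d, and F_exp(d) the same fraction computed over all pairs of alleles.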
Dg and Dcdf behave pretty similarly, but Dcdf can be particularly useful for multi-modal observed distributions (e.g. patchily distributed alleles) that are clearly different from the expectation under panmixia in ways that a difference between means has a hard time capturing.
All p-values reported here are generated by non-parametric permutation of your Allele-by-Location dataset, and therefore the precision of your p-value estimate is a function of the number of permutations you run. The default is 1000, because this number is a pretty safe bet to be sure whether or not a dataset will pass the traditional p < 0.05 criterion. If you see a p-value of 0, all that means is that none of the permuted datasets showed a larger divergence than the observed data. However, your real estimate of the p-value is not zero, but just less than 1/(the number of permutations + the one observed value). So for 1000 permutations, your p-value would be less than 0.000999. Some folks disagree about the plus one. If you agree with them, feel free to say that your p-value is only less than 0.001.
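The arithmetic behind that bound is easy to check. The Python sketch below is a generic permutation p-value helper, not SAShA's own routine. The function name is invented, and it counts only permuted values strictly larger than the observed one, following the wording above.

```python
def permutation_p_value(observed, permuted, plus_one=True):
    """Return (p, is_upper_bound) for a one-sided permutation test.

    plus_one=True counts the observed value as one more dataset, so the
    denominator is (number of permutations + 1).
    """
    n_perm = len(permuted)
    n_larger = sum(1 for value in permuted if value > observed)
    extra = 1 if plus_one else 0
    if n_larger == 0:
        # No permuted dataset beat the observed one: report the resolution
        # limit as an upper bound rather than a p-value of zero.
        return 1 / (n_perm + extra), True
    return (n_larger + extra) / (n_perm + extra), False


# Invented example: 1000 permuted statistics, none larger than the observed one.
permuted_stats = [0.1] * 1000
p_plus, bound_plus = permutation_p_value(0.5, permuted_stats)
p_plain, bound_plain = permutation_p_value(0.5, permuted_stats, plus_one=False)
print(round(p_plus, 6), round(p_plain, 6))
```

With 1000 permutations the two conventions give bounds of 1/1001 and 1/1000, which is the 0.000999 versus 0.001 difference mentioned above.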
Allele-specific measures run these same stats on each allele in isolation, which can take some time if you're running a lot of alleles and a lot of permutations. Jack-Knifing runs the overall analysis on the dataset after removing each allele, and effectively tests the robustness of the overall trend. It's conceptually similar to bootstrapping. Running the code for each allele or without each allele can take a long time, and so the code defaults to opting out of these analyses.
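The leave-one-allele-out idea can be sketched generically in Python. This is an illustration, not the SAShA implementation: `statistic` stands for any function of the allele-by-location matrix (for instance an overall Dg computation), and the stand-in statistic used in the example is a placeholder chosen only so the output is easy to check by hand.

```python
def jackknife_by_allele(allele_counts, statistic):
    """Recompute `statistic` once per allele, each time leaving that allele out.

    allele_counts: list of rows, one row of per-location counts per allele.
    statistic: callable taking a reduced matrix and returning a number.
    """
    results = []
    for skip in range(len(allele_counts)):
        reduced = [row for i, row in enumerate(allele_counts) if i != skip]
        results.append(statistic(reduced))
    return results


def total_sample_size(matrix):
    """Placeholder statistic: total number of sampled copies."""
    return sum(sum(row) for row in matrix)


# Invented 3-allele, 2-location matrix.
example_counts = [[1, 2], [3, 4], [5, 6]]
leave_one_out = jackknife_by_allele(example_counts, total_sample_size)
print(leave_one_out)
```

If the values in the returned list all point the same way as the full-dataset statistic, no single allele is driving the overall trend. The cost is one full analysis per allele, which is why these analyses are slow on large datasets.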
http://nimbletwist.com/software/ninja/download.html
NINJA is software for inferring large-scale neighbor-joining phylogenies. According to benchmark tests, at the time of release, NINJA is the fastest available tool for computing correct neighbor-joining phylogenies for inputs of more than 10,000 sequences. It is more than 10x faster than the fastest implementation of the canonical neighbor-joining algorithm (QuickTree). Details of the software are available in a paper appearing at WABI 2009 (see link below).
NINJA is available as a Mesquite package. See details here.
| URL of this page | http://nimbletwist.com/software/ninja |
| Terms of Use | LGPL |
| Distribution | ninja.tgz |
| Citation | Wheeler, T.J. 2009. Large-scale neighbor-joining with NINJA. In S.L. Salzberg and T. Warnow (Eds.), Proceedings of the 9th Workshop on Algorithms in Bioinformatics. WABI 2009, pp. 375-389. Springer, Berlin. (LNCS webpage, preprint) |
| Contact | Travis Wheeler travis _ at _ nimbletwist.com |
http://bioinformatics.org/~tryphon/populations/#ancre_formats
Populations 1.2.31
Contents
- Haploid, diploid or polyploid genotypes (see input formats)
- Structured populations (see input files for structured populations)
- No limit on the number of populations, loci, or alleles per locus (see input formats)
- Distances between individuals (15 different methods)
- Distances between populations (15 methods)
- Bootstraps on loci OR individuals
- Phylogenetic trees (individuals or populations), using Neighbor Joining or UPGMA (PHYLIP tree format)
- Allelic diversity
- Converts data files from Genepop to different formats (Genepop, Genetix, Msat, Populations...)
Programs:
Stephane Guindon.
http://compevol.auckland.ac.nz/
http://compevol.auckland.ac.nz/software/
Software
PhyTime
DensiTree
Fitmodel
BEAST
Geneious
Java Evolutionary Biology Library
PAL – Phylogenetics Analysis Library
PhyML
SplitsTree
http://www.stat.auckland.ac.nz/showperson?firstname=St%C3%A9phane&surname=Guindon
Senior Lecturer Stéphane Guindon
Further Information
Software I am developing:
Guindon S. (2010). Bayesian estimation of divergence times from large sequence alignments. Molecular Biology and Evolution, In press.
Gouy M, Guindon S, & Gascuel O. (2009). SeaView version 4: a multiplatform graphical user interface for sequence alignment and phylogenetic tree building. Molecular Biology and Evolution, 23.