November 20, 2015

GRE: Books, Tips and Strategies

As an Ecuadorian interested in applying for a Ph.D. in the States, I had to take the GRE in order to submit my application. I am sure that my scores are not very competitive, but the rule of thumb is to score high enough to be at least at the median of the score distribution, that is, above at least 50% of the other test takers.
My scores are below:

General Test Scores

Test Date: 10/30/2015 (current format)

Section                   Scaled Score   % Below
Verbal Reasoning          150            45
Quantitative Reasoning    155            60
Analytical Writing        3.0            15











What did I use to reach this "minimum" score?
Well, I followed a general guideline: "use most of the best tools and the most recent books available".
I took a GRE course to prepare for the Math section and, concomitantly, for the Verbal section. Therefore, I can't suggest anything specific for it. Nevertheless, I used several books to score as high as possible.

I followed Magoosh's list, Best 2015 GRE Books:
In contrast to what they mention there, I used the Princeton book "Cracking the New GRE 2015". I can't say anything bad about it, though it wasn't very impressive. Its best part is the practice section: not for the quality, but for the number of exercises it gives you to keep practicing.

A rather jaded but wise piece of advice: start practicing, keep practicing, and never stop practicing. Practice makes you aware of your weaknesses, and so of where you must work harder. But try to avoid the weakness-only philosophy. You need balance, strengthening and reinforcing: do not focus only on where you make mistakes! You must still intensely reinforce the areas where you don't!

As indicated by Magoosh, the best are the official ones:
1. ETS’s Official GRE Verbal Reasoning Practice Questions
2. ETS’s Official GRE Quantitative Reasoning Practice Questions
and, unlike Magoosh: The GRE Official Guide. Why? First, because it is written by the test makers, and second, because its guidelines, though not very detailed, are what you must follow in the test. So read this first, and do the practice sections to test and pace yourself.

In addition, I used these books (which can be downloaded):

McGraw-Hill Education: GRE 2015 Premium
GRUBER’S COMPLETE GUIDE 2015 - 4th Edition
Arco - GRE 2015
Cracking the GRE 2015
Master The GRE 2015 - Peterson
Kaplan GRE Premier 2016
Princeton GRE 2015

Barron's GRE 6 Practice Tests
McGraw Hill GRE 6 Practice Tests
Kaplan GRE Math WorkBook
Peterson GRE 2014
GRUBER’S COMPLETE GUIDE 2014 - 3rd Edition

Princeton Word Smart

The 

 

















































July 4, 2011

Novel Phylogenetic Inference Software!

 http://www.metapiga.org/welcome.html
MetaPIGA 2 is a robust implementation of several stochastic heuristics for large phylogeny inference (under maximum likelihood), including a random-restart hill climbing, a simulated annealing algorithm, a classical genetic algorithm, and the metapopulation genetic algorithm (metaGA) together with complex substitution models, discrete Gamma rate heterogeneity, and the possibility to partition data. MetaPIGA 2 handles nucleic-acid and protein datasets as well as morphological (presence/absence) data. The benefits of the metaGA (Lemmon & Milinkovitch 2002; PNAS, 99: 10516-10521) are as follows: (i) it resolves the major problem inherent to classical Genetic Algorithms (i.e., the need to choose between strong selection, hence, speed, and weak selection, hence, accuracy) by maintaining high inter-population variation even under strong intra-population selection, and (ii) it can generate branch support values that approximate posterior probabilities.
The software MetaPIGA 2 also implements:

  • Simple dataset quality control (testing for the presence of identical sequences as well as for excessively ambiguous or excessively divergent sequences);
  • Automated trimming of poorly aligned regions using the trimAl algorithm;
  • The Likelihood Ratio Test, the Akaike Information Criterion, and the Bayesian Information Criterion for the easy selection of nucleotide and amino-acid substitution models that best fit the data;
  • Ancestral-state reconstruction of all nodes in the tree.
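
As a rough illustration of the information-criterion part of that model selection, the sketch below computes AIC and BIC from a log-likelihood and a parameter count and picks the lowest-scoring model. It is not MetaPIGA code: the function names, the log-likelihood values, and the choice of the number of alignment sites as the BIC sample size are all assumptions made for the example.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood, n_params, n_sites):
    """Bayesian Information Criterion: k ln(n) - 2 ln L.
    Assumption: n is taken to be the number of alignment sites."""
    return n_params * math.log(n_sites) - 2 * log_likelihood

def best_model(candidates, n_sites, criterion="aic"):
    """Return the name of the candidate with the lowest criterion value.
    candidates maps a model name to (log_likelihood, n_params)."""
    def score(item):
        _, (lnl, k) = item
        return aic(lnl, k) if criterion == "aic" else bic(lnl, k, n_sites)
    return min(candidates.items(), key=score)[0]

# Hypothetical values: (log-likelihood, free substitution-model parameters).
candidates = {
    "JC69": (-5230.4, 0),
    "HKY85": (-5105.9, 4),
    "GTR": (-5101.2, 8),
}

print(best_model(candidates, n_sites=1000, criterion="aic"))
print(best_model(candidates, n_sites=1000, criterion="bic"))
```

With these made-up numbers the two criteria disagree, because BIC penalizes the extra parameters more heavily than AIC once the alignment has more than a handful of sites.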
MetaPIGA 2 provides high customization of heuristics' and models' parameters, manual batch file and command line processing. However, it also offers an extensive and ergonomic graphical user interface and functionalities assisting the user for dataset quality testing, parameters setting, generating and running batch files, following run progress, and manipulating result trees.
MetaPIGA 2 uses standard formats for data sets and trees, is platform independent, runs on 32- and 64-bit systems, and takes advantage of multiprocessor and/or multicore computers. A version for Grid computing is in development.
 

Citing MetaPIGA 2

MetaPIGA v2.0: maximum likelihood large phylogeny estimation using the metapopulation genetic algorithm and other stochastic heuristics
Raphaël Helaers & Michel C. Milinkovitch
BMC Bioinformatics 2010, 11:379



http://bioinformatics.oxfordjournals.org/content/25/2/197.full

Phylogenetic inference under recombination using Bayesian stochastic topology selection


Abstract

Motivation: Conventional phylogenetic analysis for characterizing the relatedness between taxa typically assumes that a single relationship exists between species at every site along the genome. This assumption fails to take into account recombination which is a fundamental process for generating diversity and can lead to spurious results. Recombination induces a localized phylogenetic structure which may vary along the genome. Here, we generalize a hidden Markov model (HMM) to infer changes in phylogeny along multiple sequence alignments while accounting for rate heterogeneity; the hidden states refer to the unobserved phylogenic topology underlying the relatedness at a genomic location. The dimensionality of the number of hidden states (topologies) and their structure are random (not known a priori) and are sampled using Markov chain Monte Carlo algorithms. The HMM structure allows us to analytically integrate out over all possible changepoints in topologies as well as all the unknown branch lengths.

Results: We demonstrate our approach on simulated data and also to the genome of a suspected HIV recombinant strain as well as to an investigation of recombination in the sequences of 15 laboratory mouse strains sequenced by Perlegen Sciences. Our findings indicate that our method allows us to distinguish between rate heterogeneity and variation in phylogeny caused by recombination without being restricted to 4-taxa data.

Availability: The method has been implemented in JAVA and is available, along with data studied here, from http://www.stats.ox.ac.uk/~webb.

Contact: cholmes@stats.ox.ac.uk

Supplementary information: Supplementary data are available at Bioinformatics online.



http://www.stats.ox.ac.uk/__data/assets/pdf_file/0005/4010/large_pedigrees.pdf


http://www.cs.cmu.edu/~guestrin/Class/10701-S07/Handouts/recitations/HMM-inference.pdf


Probabilistic Phylogenetic Inference with Insertions and Deletions

Abstract

A fundamental task in sequence analysis is to calculate the probability of a multiple alignment given a phylogenetic tree relating the sequences and an evolutionary model describing how sequences change over time. However, the most widely used phylogenetic models only account for residue substitution events. We describe a probabilistic model of a multiple sequence alignment that accounts for insertion and deletion events in addition to substitutions, given a phylogenetic tree, using a rate matrix augmented by the gap character. Starting from a continuous Markov process, we construct a non-reversible generative (birth–death) evolutionary model for insertions and deletions. The model assumes that insertion and deletion events occur one residue at a time. We apply this model to phylogenetic tree inference by extending the program DNAML in PHYLIP. Using standard benchmarking methods on simulated data and a new "concordance test" benchmark on real ribosomal RNA alignments, we show that the extended program DNAMLε improves accuracy relative to the usual approach of ignoring gaps, while retaining the computational efficiency of the Felsenstein peeling algorithm.

Author Summary

We describe a computationally efficient method to use insertion and deletion events, in addition to substitutions, in phylogenetic inference. To date, many evolutionary models in probabilistic phylogenetic inference methods have only accounted for substitution events, not for insertions and deletions. As a result, not only do tree inference methods use less sequence information than they could, but also it has remained difficult to integrate phylogenetic modeling into sequence alignment methods (such as profiles and profile-hidden Markov models) that inherently require a model of insertion and deletion events. Therefore an important goal in the field has been to develop tractable evolutionary models of insertion/deletion events over time of sufficient accuracy to increase the resolution of phylogenetic inference methods and to increase the power of profile-based sequence homology searches. Our model offers a partial answer to this problem. We show that our model generally improves inference power in both simulated and real data and that it is easily implemented in the framework of standard inference packages with little effect on computational efficiency (we extended DNAML, in Felsenstein's popular PHYLIP package).



Materials and Methods

The C source code for the modified PHYLIP 3.66 package [14] that contains the program DNAMLε , the C source code for evolving sequences with the generative model (εRATE ), the modified ROSE package (version 1.3) [76], as well as all the Perl scripts and datasets used to generate the results presented in this paper are provided as a tarball in Dataset S1. The program DNAMLε uses the EASEL sequence analysis library (SRE, unpublished) which is also provided.

Roland F. Schwarz, William Fletcher, Frank Förster, Benjamin Merget, Matthias Wolf, Jörg Schultz, and Florian Markowetz
PLoS One. 2010; 5(12): e15788. Published online 2010 December 31. doi: 10.1371/journal.pone.0015788
PMCID: PMC3013127

Bhakti Dwivedi and Sudhindra R Gadagkar
BMC Evol Biol. 2009; 9: 211. Published online 2009 August 23. doi: 10.1186/1471-2148-9-211
PMCID: PMC2746219


Title: A stochastic evolution model for residue Insertion-Deletion Independent from Substitution
Author(s): Lebre S, Michel CJ
Source: COMPUTATIONAL BIOLOGY AND CHEMISTRY   Volume: 34   Issue: 5-6   Pages: 259-267   Published: DEC 2010
Times Cited: 0

Title: Genomes as documents of evolutionary history
Author(s): Boussau B, Daubin V
Source: TRENDS IN ECOLOGY & EVOLUTION   Volume: 25   Issue: 4   Pages: 224-232   Published: APR 2010
Times Cited: 2


http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2746219/?tool=pmcentrez
Phylogenetic inference under varying proportions of indel-induced alignment gaps
Bhakti Dwivedi (1) and Sudhindra R Gadagkar (1, 2; corresponding author)
(1) Department of Biology, University of Dayton, 300 College Park, Dayton, OH 46469-2320, USA
(2) Department of Natural Sciences, PO Box 1004, 1400 Brush Row Rd, Wilberforce, Ohio 45384, USA
Bhakti Dwivedi: dwivedbz@notes.udayton.edu; Sudhindra R Gadagkar: sgadagkar@centralstate.edu
Received May 11, 2009; Accepted August 23, 2009.
Background
The effect of alignment gaps on phylogenetic accuracy has been the subject of numerous studies. In this study, we investigated the relationship between the total number of gapped sites and phylogenetic accuracy, when the gaps were introduced (by means of computer simulation) to reflect indel (insertion/deletion) events during the evolution of DNA sequences. The resulting (true) alignments were subjected to commonly used gap treatment and phylogenetic inference methods.
Results
(1) In general, there was a strong – almost deterministic – relationship between the amount of gap in the data and the level of phylogenetic accuracy when the alignments were very "gappy", (2) gaps resulting from deletions (as opposed to insertions) contributed more to the inaccuracy of phylogenetic inference, (3) the probabilistic methods (Bayesian, PhyML & "MLε," a method implemented in DNAML in PHYLIP) performed better at most levels of gap percentage when compared to parsimony (MP) and distance (NJ) methods, with Bayesian analysis being clearly the best, (4) methods that treat gapped sites as missing data yielded less accurate trees when compared to those that attribute phylogenetic signal to the gapped sites (by coding them as binary character data – presence/absence, or as in the MLε method), and (5) in general, the accuracy of phylogenetic inference depended upon the amount of available data when the gaps resulted from mainly deletion events, and the amount of missing data when insertion events were equally likely to have caused the alignment gaps.
Conclusion
When gaps in an alignment are a consequence of indel events in the evolution of the sequences, the accuracy of phylogenetic analysis is likely to improve if: (1) alignment gaps are categorized as arising from insertion events or deletion events and then treated separately in the analysis, (2) the evolutionary signal provided by indels is harnessed in the phylogenetic analysis, and (3) methods that utilize the phylogenetic signal in indels are developed for distance methods too. When the true homology is known and the amount of gaps is 20 percent of the alignment length or less, the methods used in this study are likely to yield trees with 90–100 percent accuracy.
 
PICS-Ord: unlimited coding of ambiguous regions by pairwise identity and cost scores ordination
Robert Lücking, Brendan P Hodkinson, Alexandros Stamatakis, and Reed A Cartwright
BMC Bioinformatics. 2011; 12: 10. Published online 2011 January 7. doi: 10.1186/1471-2105-12-10.
PMCID: PMC3024941
Phylogenetic assessment of alignments reveals neglected tree signal in gaps
Christophe Dessimoz and Manuel Gil
Genome Biol. 2010; 11(4): R37. Published online 2010 April 6. doi: 10.1186/gb-2010-11-4-r37.
PMCID: PMC2884540



Stud Health Technol Inform. 2007;129(Pt 2):1245-9.

Enhancing the quality of phylogenetic analysis using fuzzy hidden Markov model alignments.

Source

Lab of Medical Informatics, Faculty of Medicine, Department of Electrical and Computer Engineering, Aristotle University of Thessaloniki, Greece.

Abstract

Any effective phylogeny inference based on molecular data begins by performing efficient multiple sequence alignments. So far, the Hidden Markov Model (HMM) method for multiple sequence alignment has been proved competitive to the classical deterministic algorithms with respect to phylogenetic analysis; nevertheless, its stochastic nature does not help it cope with the existing dependence among the sequence elements. This paper deals with phylogenetic analysis of protein and gene data using multiple sequence alignments produced by fuzzy profile Hidden Markov Models. Fuzzy profile HMMs are a novel type of profile HMMs based on fuzzy sets and fuzzy integrals, which generalize the classical stochastic HMM by relaxing its independence assumptions. In this paper, alignments produced by the fuzzy HMM model are used in phylogenetic analysis of protein data, enhancing the quality of phylogenetic trees. The new methodology is implemented in HPV virus phylogenetic inference. The results of the analysis are compared against those obtained by the classical profile HMM model and depict the superiority of the fuzzy profile HMM in this field.


Bioinformatics. 2005 Sep 1;21 Suppl 2:ii166-72.

Discriminating between rate heterogeneity and interspecific recombination in DNA sequence alignments with phylogenetic factorial hidden Markov models.

Source

Biomathematics and Statistics, Scotland, Edinburgh, UK. dirk@bioss.ac.uk

Abstract

MOTIVATION:

A recently proposed method for detecting recombination in DNA sequence alignments is based on the combination of hidden Markov models (HMMs) with phylogenetic trees. Although this method was found to detect breakpoints of recombinant regions more accurately than most existing techniques, it inherently fails to distinguish between recombination and rate variation. In the present paper, we propose to marry the phylogenetic tree to a factorial HMM (FHMM). The states of the first hidden chain represent tree topologies, whereas the states of the second independent hidden chain represent different global scaling factors of the branch lengths. Inference is done in terms of a hierarchical Bayesian model, where parameters and hidden states are sampled from the posterior distribution with Gibbs sampling.

RESULTS:

We have tested the proposed model on various synthetic and real-world DNA sequence alignments. The simulation results suggest that as opposed to the standard phylogenetic HMM, the phylogenetic FHMM clearly distinguishes between recombination and rate heterogeneity and thereby avoids the prediction of spurious recombinant regions.

AVAILABILITY:

The proposed method has been implemented in a MATLAB package that extends Kevin Murphy's HMM toolbox. Software and data used in our study are available from http://www.bioss.sari.ac.uk/~dirk/Supplements


July 3, 2011

DNA software

http://king2.scs.fsu.edu/CEBProjects/awty/awty_start.php

About AWTY

AWTY is a system for graphical exploration of Markov chain Monte Carlo (MCMC) convergence in Bayesian phylogenetic inference. The graphics produced by AWTY are designed to help assess whether an MCMC analysis has run long enough, such that tree topologies are being sampled in proportion to their true posterior probability distribution. In other words, "Are We There Yet?" or AWTY for short. Admittedly, the results generated by AWTY will never be able to answer this question with a definitive yes; however, in some cases results will point confidently to the answer no. See the AWTY image gallery for some examples.
To produce plots in AWTY a NEXUS or NEWICK formatted tree file representing a set of trees sampled over an MCMC run is required. To date, tree files generated by MrBayes and BAMBE have been tested. AWTY provides several graphical formats to display results or results may also be downloaded and analyzed using the plotting package of your choice. The online version of AWTY is written in Perl and PHP. Posterior probabilities of splits and topological tree distances are calculated by PAUP*. Graphics are generated by Gnuplot.

Citation:

Wilgenbusch J.C., Warren D.L., Swofford D.L. 2004. AWTY: A system for graphical exploration of MCMC convergence in Bayesian phylogenetic inference. http://ceb.csit.fsu.edu/awty.


http://www.pierroton.inra.fr/genetics/labo/Software/Permut/
This TURBO PASCAL program is based on the paper by Odile Pons and Rémy J. Petit (Genetics 1996, 144:1237-1245), on that of Burban et al. 1999, Mol Ecol 8, 1593-1602, and on a paper by Petit et al. in press in Forest Ecology and Management.
It computes measures of diversity and differentiation from haploid population genetic data when a measure of the distance between haplotypes is available, and tests whether the differentiation and diversity measures differ from the equivalent measures that do not take into account the distances between haplotypes (i.e., that consider all haplotypes equally divergent).
The source file should be an ASCII file (its name should have at most 8 characters: 12345678.txt) and should include the following information:
First line: number of cytotypes, number of populations, and number of characters distinguishing the variants (for instance, the number of polymorphic fragments or of polymorphic nucleotide sites).
Then follows the number of individuals having a given cytotype (column) in a given population (row).
Finally, and without interruption, provide the table of character states for all haplotypes, where each line corresponds to one haplotype and each column to a character.
No column should be empty (no missing haplotype) and each population (row) should be composed of AT LEAST 3 individuals!
The program asks for the number of permutations to be made. See the example (inperm.txt and outperm.out).




http://cmpg.unibe.ch/software/arlequin35/Arl35Methods.html

Arlequin ver 3.5.1.2

(released on 24.02.2010)



Implemented methods

The analyses Arlequin can perform on the data fall into two main categories: intra-population and inter-population methods. In the first category statistical information is extracted independently from each population, whereas in the second category, samples are compared to each other.

Intra-population methods

Standard indices
Some diversity measures like the number of polymorphic sites, gene diversity
Molecular diversity
Calculates several diversity indices like nucleotide diversity, different estimators of the population parameter
Mismatch distribution
The distribution of the number of pairwise differences between haplotypes, from which parameters of a demographic or spatial population expansion can be estimated
Haplotype frequency estimation
Estimates the frequency of haplotypes present in the population by maximum likelihood methods
Gametic phase estimation
Estimates the most likely gametic phase of multi-locus genotypes using a pseudo-Bayesian approach (ELB algorithm)
Linkage disequilibrium
Test of non-random association of alleles at different loci
Hardy-Weinberg equilibrium
Test of non-random association of alleles within diploid individuals
Tajima's neutrality test
Test of the selective neutrality of a random sample of DNA sequences or RFLP haplotypes under the infinite site model
Fu's neutrality test
Test of the selective neutrality of a random sample of DNA sequences or RFLP haplotypes under the infinite site model
Ewens-Watterson neutrality test
Tests of selective neutrality based on Ewens sampling theory under the infinite alleles model
Chakraborty's amalgamation test
A test of selective neutrality and population homogeneity. This test can be used when sample heterogeneity is suspected
Minimum Spanning Network (MSN)
Computes a Minimum Spanning Tree (MST) and Network (MSN) among haplotypes. This tree can also be computed for all the haplotypes found in different populations if activated under the AMOVA section

Inter-population methods

Search for shared haplotypes between populations
Comparison of population samples for their haplotypic content. All the results are then summarized in a table
AMOVA
Different hierarchical Analyses of Molecular Variance to evaluate the amount of population genetic structure
FST-Pairwise genetic distances
FST-based genetic distances for short divergence time
Pairwise molecular distances
Molecular distances between populations based on the number of pairwise differences between haplotypes
delta-mu square
Genetic distance between populations based on microsatellite data
Exact test of population differentiation
Test of non-random distribution of haplotypes into population samples under the hypothesis of panmixia
Assignment test of genotypes
Assignment of individual genotypes to particular populations according to estimated allele frequencies
Detection of loci under selection from F-statistics
Detection of loci under selection by the examination of the joint distribution of FST and heterozygosity under a hierarchical island model.

Mantel test

Correlations or partial correlations between a set of 2 or 3 matrices
Can be used to test for the presence of isolation-by-distance
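
To make the idea of a Mantel test concrete, here is a minimal sketch of a simple (two-matrix) version: it correlates the upper triangles of two distance matrices and builds a null distribution by permuting the rows and columns of one of them together. This is not Arlequin's implementation. The function names, the one-sided test for positive correlation, and the toy matrices are assumptions made for the example.

```python
import math
import random

def upper_triangle(matrix):
    """Flatten the entries above the diagonal of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]

def pearson(x, y):
    """Pearson correlation of two equal-length lists (assumes neither is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def mantel(a, b, n_perm=999, seed=0):
    """Return (observed correlation, one-sided permutation p-value)."""
    rng = random.Random(seed)
    n = len(a)
    b_flat = upper_triangle(b)
    observed = pearson(upper_triangle(a), b_flat)
    order = list(range(n))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(order)
        # Permute rows and columns of `a` with the same ordering.
        permuted = [[a[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if pearson(upper_triangle(permuted), b_flat) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Toy data: four populations on a line, genetic distance proportional to geography.
geo = [[0, 1, 2, 3], [1, 0, 1, 2], [2, 1, 0, 1], [3, 2, 1, 0]]
gen = [[0, 0.1, 0.2, 0.3], [0.1, 0, 0.1, 0.2], [0.2, 0.1, 0, 0.1], [0.3, 0.2, 0.1, 0]]
r, p = mantel(geo, gen)
print(r, p)
```

With only four populations there are just 24 possible orderings, so the p-value here is coarse; real isolation-by-distance tests need more populations to be informative.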




http://sasha.stanford.edu/Download_Sasha_Code.html

SAShA 2.0:

    SAShA 2.0 uses the same algorithm as the original SAShA, but we were able to greatly improve its memory efficiency so that the code can run even very large datasets - thousands of alleles with tens of thousands of samples.

     The SAShA 2.0 code is not yet compiled for Windows, but we are making the MATLAB code available.  As long as you have access to a copy of MATLAB, the interface is just as easy to use. Here's how you do it:


  1. Download two code files: SAShA2.m and SAShA_CORE2.m, placing both files in a directory that's easy for you to find. Just right click (control-click) and save as text with the extension ".m". It's important that you use the filenames above and that you don't rename the files, because if you do, MATLAB won't be able to find them.

  2. Open MATLAB.

  3. Put the folder with the code files into MATLAB's search path. If you want to be absolutely sure this is true, add the folder to MATLAB's path. To do this: (a) In MATLAB, select the menu item "File: Set Path..." (b) Click "Add Folder..." in the dialog box that appears, (c) Select the folder containing the SAShA code from the Browse dialog that comes up and click "Open", (d) Click "Save" in the original "Set Path..." dialog, and then (e) Click "Close". If you're hacking with the code, be sure not to have two functions with the same name in MATLAB's search path. The results get complicated quickly, so if you want to play around, I recommend renaming the functions and files.

  4. Get your data files ready. Prepare (A) a tab-delimited text allele-by-location matrix (e.g., Allele.txt) in which each row corresponds to an allele, each column to a location, and the numbers in the matrix correspond to the numbers of that allele found in that location; AND (B) a text file containing the pairwise geographic distances between your locations in symmetric or triangular form (e.g., GeoDist.txt). The units and conceptions of distance in that file are entirely up to you. Put both files in a folder that's easy to find. The simplest option is often to put them in the same folder as your code.

  5. Set MATLAB's working directory to the folder with your data files. To do this: (Option A) In the MATLAB toolbar, click the "..." after "Current Folder:", and select the folder with your data files; OR (Option B) Select the menu item "Desktop: Current Folder" to open the "Current Folder Browser", and select your data file folder; OR (Option C) Use the command line command "cd" to set the current folder, as in "cd('RootPath/MyDataFileFolder/')".

  6. Run SAShA2(). From the MATLAB Command Window prompt, type "SAShA2()", and answer the prompted questions. At each prompt, hitting return will select the default [shown in brackets]. Bear in mind that all filenames are case sensitive, so "allele.txt" is not the same as "Allele.txt".

  7. Check out your results. SAShA2 will generate up to three figures and output up to four results files (Overall, Allele, Jack-Knife, ConsoleOutput) into the current MATLAB directory.

As we're freely distributing (as in beer) the code, feel free (as in speech) to modify, customize, and improve it as you see fit, but please cite us appropriately. If you do make the code better/more interesting, please let us know! We'd be especially excited if you can make a more usable front-end, or provide a web-accessible version.


SAShA 1.0:

    For those without access to MATLAB, or those with smaller datasets, SAShA 1.0 works just fine. To run SAShA 1.0 from the Stand Alone Executable:

  1. Download the three files: a) SAShA.exe, b) SAShA.ctf, c) MCRInstaller.exe

  2. Put them in the same directory with

-- a) the text file containing your tab-delimited allele-by-location matrix (e.g., Allele.txt)

and

-- b) the text file containing your upper-right triangular matrix of pairwise geographic distances (e.g., GeoDist.txt)

Examples from Katharina tunicata (Kelly and Eernisse 2007, Kelly et al. in review):

  3. Install the MATLAB runtime environment (MCRInstaller.exe)

  4. Make sure all the files are in the same directory, and that your system did not sneakily rename any files.

[Some systems try to turn the SAShA.ctf archive into SAShA.zip.

If yours did, just change the extension back to .ctf and it'll work fine.]

  5. Run SAShA.exe, and learn!


A quick word on interpreting your data:

SAShA returns two statistics which represent how different the distribution of your alleles is from what one would expect under panmixia given the same spatial sampling. In the output, you'll find a lot of numbers, but they all serve to inform you about either these two stats, or the significance thereof.

The first statistic, Dg, is the difference between the Observed Mean (OM) of all the distances between every pair of identical alleles, and the Expected Mean (EM) under panmixia, given the same pattern of sampling. As it's a simple difference, Dg can be positive or negative, implying that the observed mean can be smaller (e.g. restricted allelic dispersal) or larger (e.g. over-dispersion) than the expected mean.

Dg = EM-OM

The second statistic, Dcdf, is twice the root mean square difference between the observed and expected cumulative density functions of the pairwise distance distributions. Translating this, we turn both the observed and expected pairwise distance distributions into curves that essentially generate a cumulative sum of the frequency of each pairwise distance between identical alleles (observed) or all alleles (expected), starting at zero distance and extending to the largest pairwise distance. We then take the difference between these curves, using the root-mean-square (i.e. subtract the curves from each other, square the result, take the mean, and then square root it). We then double this number to make Dcdf theoretically run from 0 to 1. If that's not clear to you, a picture is worth many, many words on this subject. I'd recommend taking a look at the overall plot on the welcome page and digging into the paper, and hopefully it'll make more sense.

Dg and Dcdf behave pretty similarly, but Dcdf can be particularly useful in multi-modal observed distributions (e.g. patchily distributed alleles) that are clearly different from expectation under panmixia, but that the difference between means will have a hard time representing.
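
The two definitions above can be sketched in a few lines of Python. This is only my reading of the description, not the SAShA code. In particular, I assume that pairs within the same location count as distance zero, that the expected distribution uses all pairs of sampled individuals regardless of allele, and that the two cumulative curves are compared at every distinct distance in the data; the function names are made up.

```python
import math

def pair_weights(counts, dist):
    """Map each pairwise distance to the number of pairs at that distance,
    given how many individuals were sampled at each location."""
    pairs = {}
    n = len(counts)
    for i in range(n):
        within = counts[i] * (counts[i] - 1) / 2  # same-location pairs (assumed distance 0)
        if within:
            pairs[0.0] = pairs.get(0.0, 0) + within
        for j in range(i + 1, n):
            between = counts[i] * counts[j]
            if between:
                d = float(dist[i][j])
                pairs[d] = pairs.get(d, 0) + between
    return pairs

def mean_distance(pairs):
    total = sum(pairs.values())
    return sum(d * w for d, w in pairs.items()) / total

def cdf(pairs, grid):
    """Cumulative fraction of pairs at or below each distance in grid."""
    total = sum(pairs.values())
    return [sum(w for d, w in pairs.items() if d <= g) / total for g in grid]

def sasha_like_stats(alleles, dist):
    """alleles: rows are alleles, columns are locations (counts).
    dist: symmetric matrix of distances between locations.
    Returns (Dg, Dcdf) with Dg = EM - OM as written above.
    Needs at least one allele that was sampled more than once."""
    observed = {}
    for row in alleles:  # pairs of identical alleles
        for d, w in pair_weights(row, dist).items():
            observed[d] = observed.get(d, 0) + w
    totals = [sum(col) for col in zip(*alleles)]
    expected = pair_weights(totals, dist)  # pairs of any two sampled alleles
    d_g = mean_distance(expected) - mean_distance(observed)
    grid = sorted(set(observed) | set(expected))
    diffs = [o - e for o, e in zip(cdf(observed, grid), cdf(expected, grid))]
    d_cdf = 2 * math.sqrt(sum(x * x for x in diffs) / len(diffs))
    return d_g, d_cdf

# Toy data: two alleles, each confined to its own location, 10 units apart.
alleles = [[4, 0], [0, 4]]
dist = [[0, 10], [10, 0]]
d_g, d_cdf = sasha_like_stats(alleles, dist)
print(d_g, d_cdf)
```

In this toy case every pair of identical alleles sits at distance zero, so the observed mean is 0 and Dg equals the expected mean, which is the restricted-dispersal situation described above.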

All p-values reported here are generated by non-parametric permutation of your Allele-by-Location dataset, and therefore the precision of your p-value estimate is a function of the number of permutations you run. The default is 1000, because this number is a pretty safe bet to be sure whether or not a dataset will pass the traditional p < 0.05 criterion. If you see a p-value of 0, all that means is that none of the permuted datasets showed a larger divergence than the observed data. However, your real estimate of the p-value is not zero, but just less than 1/(the number of permutations + the one observed value). So for 1000 permutations, your p-value would be less than 0.000999. Some folks disagree about the plus one. If you agree with them, feel free to say that your p-value is only less than 0.001.
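
The arithmetic of the "plus one" argument is easy to check with a small sketch. The function below is hypothetical (it is not part of SAShA) and assumes a permuted dataset counts against the observed one when its statistic is at least as large.

```python
def permutation_p_value(observed_stat, permuted_stats, plus_one=True):
    """Fraction of permuted statistics at least as large as the observed one.
    With plus_one=True the observed value is counted as one of the datasets."""
    exceed = sum(1 for s in permuted_stats if s >= observed_stat)
    n = len(permuted_stats)
    if plus_one:
        return (exceed + 1) / (n + 1)
    return exceed / n

# 1000 permutations, none as extreme as the observed statistic.
permuted = [0.0] * 1000
print(permutation_p_value(5.0, permuted))                  # 1/1001
print(permutation_p_value(5.0, permuted, plus_one=False))  # the reported "0"
```

Running more permutations shrinks the smallest p-value you can report, which is why the precision depends on the permutation count.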

Allele-specific measures run these same stats on each allele in isolation, which can take some time if you're running a lot of alleles and a lot of permutations. Jack-Knifing runs the overall analysis on the dataset after removing each allele, and effectively tests the robustness of the overall trend. It's conceptually similar to bootstrapping. Running the code for each allele or without each allele can take a long time, and so the code defaults to opting out of these analyses.






http://nimbletwist.com/software/ninja/download.html

NINJA is software for inferring large-scale neighbor-joining phylogenies. According to benchmark tests, at the time of release, NINJA is the fastest available tool for computing correct neighbor-joining phylogenies for inputs of more than 10,000 sequences. It is more than 10x faster than the fastest implementation of the canonical neighbor-joining algorithm (QuickTree). Details of the software are available in a paper appearing at WABI 2009 (see link below).
NINJA is available as a Mesquite package. See details here.
URL of this page     http://nimbletwist.com/software/ninja
Terms of Use
LGPL
Distribution
ninja.tgz
Citation
Wheeler, T.J. 2009. Large-scale neighbor-joining with NINJA. In S.L. Salzberg and T. Warnow (Eds.), Proceedings of the 9th Workshop on Algorithms in Bioinformatics. WABI 2009, pp. 375-389. Springer, Berlin. (LNCS webpage,preprint)
Contact
Travis Wheeler
travis _ at _ nimbletwist.com



http://bioinformatics.org/~tryphon/populations/#ancre_formats

Populations 1.2.31

Population genetic software (individuals or populations distances, phylogenetic trees)

Contents

  • haploid, diploid or polyploid genotypes (see input formats)
  • structured populations (see input files for structured populations)
  • No limit on the number of populations, loci, or alleles per locus (see input formats)
  • Distances between individuals (15 different methods)
  • Distances between populations (15 methods)
  • Bootstraps on loci OR individuals
  • Phylogenetic trees (individuals or populations), using Neighbor Joining or UPGMA (PHYLIP tree format)
  • Allelic diversity
  • Converts data files from Genepop to different formats (Genepop, Genetix, Msat, Populations...)


Programs:
Stephane Guindon.
http://compevol.auckland.ac.nz/
http://compevol.auckland.ac.nz/software/

Software

PhyTime

PhyTime is software that estimates divergence times from the analysis of DNA or protein sequences. It relies on a fast Bayesian approach that builds the posterior densities of node ages. The method and its performance are described in Guindon, 2010, Mol. Biol. Evol. Please click here to run PhyTime online or to download it.

DensiTree

DensiTree is a program for qualitative analysis of sets of trees that makes it possible to visually inspect the posterior distribution of an MCMC run over a tree. Many features of the posterior distribution are easily visible, such as agreement/disagreement of tree topologies, uncertainty in node heights and relative dominance of particular topologies.

Fitmodel

Fitmodel estimates the parameters of various codon-based models of substitution, including those described in Guindon, Rodrigo, Dyer and Huelsenbeck, 2004, PNAS. These models are especially useful as they accommodate site-specific switches between selection regimes without a priori knowledge of the positions in the tree where changes of selection regimes occurred. Click here to download.

BEAST

BEAST is an open source cross-platform program for Bayesian MCMC phylogenetic analysis of molecular sequences. The source code for BEAST is hosted at http://beast-mcmc.googlecode.com.

Geneious

Geneious is an integrated, cross-platform bioinformatics software suite for manipulating, finding, sharing, and exploring biological data such as DNA sequences or proteins, phylogenies, 3D structure information, publications, etc.

Java Evolutionary Biology Library

An open source Java library for evolutionary biology and bioinformatics, including objects representing biomolecular sequences, multiple sequence alignments and phylogenetic trees. The source code is hosted at http://sourceforge.net/projects/jebl

PAL – Phylogenetics Analysis Library

The PAL project is a collaborative effort to provide a high quality open source Java library for use in molecular evolution and phylogenetics.

PhyML

PhyML is a fast maximum likelihood phylogeny estimator from alignments of nucleotide or amino acid sequences. It provides a wide range of options that were designed to facilitate standard phylogenetic analyses. PhyML is open source distributed under the terms of the GNU GPL.

SplitsTree

SplitsTree4 is the leading application for computing evolutionary networks from molecular sequence data. Given an alignment of sequences, a distance matrix or a set of trees, the program will compute a phylogenetic tree or network using methods such as split decomposition, neighbor-net, consensus networks, super networks, or methods for computing hybridization or simple recombination networks.


http://www.stat.auckland.ac.nz/showperson?firstname=St%C3%A9phane&surname=Guindon

Senior Lecturer Stéphane Guindon
Further Information


I am a principal investigator in the Computational Evolution Group: http://compevol.auckland.ac.nz/
Software I am developing:
  • PhyML: estimation of maximum likelihood trees.
  • Fitmodel: deciphering the patterns of Darwinian selection acting on molecular sequences.
  • PhyTime : estimating species divergence times using a fast and accurate Bayesian approach.
Most recent publications

Guindon S. (2010). Bayesian estimation of divergence times from large sequence alignments. Molecular Biology and Evolution, In press.

Gouy M, Guindon S, & Gascuel O. (2009). SeaView version 4: a multiplatform graphical user interface for sequence alignment and phylogenetic tree building. Molecular Biology and Evolution, 23.



November 16, 2010

Introduction to Computational Genomics: A Case Studies Approach

Where did SARS come from? Have we inherited genes from Neanderthals? How do plants use their internal clock? The genomic revolution in biology enables us to answer such questions. But the revolution would have been impossible without the support of powerful computational and statistical methods that enable us to exploit genomic data. Many universities are introducing courses to train the next generation of bioinformaticians: biologists fluent in mathematics and computer science, and data analysts familiar with biology. This readable and entertaining book, based on successful taught courses, provides a roadmap to navigate entry to this field. It guides the reader through key achievements of bioinformatics, using a hands-on approach. Statistical sequence analysis, sequence alignment, hidden Markov models, gene and motif finding and more, are introduced in a rigorous yet accessible way. A companion website provides the reader with Matlab-related software tools for reproducing the steps demonstrated in the book.

Download

Megaupload

Mathematics of Evolution and Phylogeny

This book considers evolution at different scales: sequences, genes, gene families, organelles, genomes and species. The focus is on the mathematical and computational tools and concepts, which form an essential basis of evolutionary studies, indicate their limitations, and give them orientation. Recent years have witnessed rapid progress in this area, with models and methods becoming more realistic, powerful, and complex.

This book of contributed chapters is authored by renowned scientists and covers recent results in the highly topical area of mathematics in evolution and phylogeny. Each chapter is a detailed overview of a specific topic, from the underlying concepts to the latest results.

Aimed at graduates and researchers in phylogenetics, this book will be of interest to both mathematicians and biologists.
Features

* High quality contributions from renowned scientists
* Covers the latest in evolution and phylogenetics
* Much needed introductory material on phylogenetics aimed at biologists and mathematicians

Download

Megaupload 1
Megaupload 2


Bioinformatics: Sequence Alignment and Markov Models

Bioinformatics showcases the latest developments in the field along with all the foundational information you'll need. It provides in-depth coverage of a wide range of autoimmune disorders and detailed analyses of suffix trees, plus late-breaking advances regarding biochips and genomes.

Featuring helpful gene-finding algorithms, Bioinformatics offers key information on sequence alignment, HMMs, HMM applications, protein secondary structure, microarray techniques, and drug discovery and development. Helpful diagrams accompany mathematical equations throughout, and exercises appear at the end of each chapter to facilitate self-evaluation.

This thorough, up-to-date resource features:

* Worked-out problems illustrating concepts and models
* End-of-chapter exercises for self-evaluation
* Material based on student feedback
* Illustrations that clarify difficult math problems
* A list of bioinformatics-related websites

Bioinformatics covers:

* Sequence representation and alignment
* Hidden Markov models
* Applications of HMMs
* Gene finding
* Protein secondary structure prediction
* Microarray techniques
* Drug discovery and development
* Internet resources and public domain databases

Downloads

Megaupload

ANCESTRAL SEQUENCE RECONSTRUCTION

Ancestral sequence reconstruction is a technique of growing importance in molecular biology and comparative genomics. As a powerful technique both for testing evolutionary and ecological hypotheses and for uncovering the link between sequence and molecular phenotype, it has potential applications in a number of fields. Beginning with a historical overview of the field including applications, the discussion then moves into potential applications in drug discovery and the pharmaceutical industry. A section on computational methodology provides a detailed discussion of available methods for reconstructing ancestral sequences, including advantages, disadvantages, and potential pitfalls. Purely computational applications, including whole proteome reconstruction, are discussed. Another section provides a detailed discussion of taking computationally reconstructed sequences and synthesizing them in the laboratory, while the last section describes scientific questions where experimental ancestral sequence reconstruction has been applied. The book couples this overview with a computational and experimental how-to guide, while simultaneously addressing some of the hot topics in the field.

Publisher: Oxford University Press, USA; 1 edition (July 26, 2007), 272 Pages

Download
Megaupload

Mathematics of Genetic Diversity
J. F. C. Kingman


CBMS-NSF Regional Conference Series in Applied Mathematics 34

This book draws together some mathematical ideas that are useful in population genetics, concentrating on a few aspects which are both biologically relevant and mathematically interesting.

Contents

The Problem: Why Mathematics?; Genes and Their Inheritance, Selection, Mutation; Survival of the Fittest: Balanced Polymorphisms, Multi-Locus Selection, Balance Between Selection and Mutation, The House of Cards, The Diploid House of Cards, The Resistance of Polymorphisms to Mutation; The Neutral Alternative: Evolution in the Absence of Selection, A General Model for Mutation in Finite Populations, The Random Walk Case, The Frequency Spectrum, The Ewens Sampling Formula, The Poisson-Dirichlet Distribution, Partition Structures, Testing Neutrality; Selection in Finite Populations: Deleterious Mutants, The Wright-Fisher Model, Wright's Formula, The Infinite Alleles Limit.

Download Links
Megaupload

October 30, 2010

Megatextads promo codes

1solo
twogo
Promo Code 6: 2 MILLION POINTS! to use for whatever advertising you wish! - use code sixsixsix

October 27, 2010

Earn money Online!

Join these sites! They pay!!