Autarchy of the Private Cave

Tiny bits of bioinformatics, [web-]programming etc.

    Archive for the 'Science' Category

    How to cite PHYLIP

    10th January 2014

    The official PHYLIP FAQ suggests a few ways to cite the software, but I believe the best citation is the one mentioned in the Wikipedia PHYLIP article: the PubMed reference PMID 7288891. This citation seems the best because

    • it mentions the software implementing the maximum likelihood approach,
    • it is likely the earliest mention of the PHYLIP software (which has been distributed since around 1980),
    • it appeared in a journal indexed by PubMed, and
    • according to Google Scholar, it has already been cited over 6660 times :)

    Posted in Links, Science, Software | No Comments »

    GUIs for R

    17th October 2013

    I’ve briefly tried Cantor (which also supports Octave and KAlgebra as backends), RKWard, Deducer/JGR, R Commander, and RStudio.

    My personal choice is RStudio: it is good-looking, intuitive and easy to use, yet powerful.

    The next step would be finding an R equivalent of IPython’s excellent Mathematica-like Notebook web interface…


    Posted in *nix, Notepad, Programming, Science, Software | No Comments »

    MultiParanoid vs. QuickParanoid: pro et contra for each

    9th July 2013

    MultiParanoid

    Here we present a new proteome-scale analysis program called MultiParanoid that can automatically find orthology relationships between proteins in multiple proteomes. The software is an extension of the InParanoid program that identifies orthologs and inparalogs in pairwise proteome comparisons. MultiParanoid applies a clustering algorithm to merge multiple pairwise ortholog groups from InParanoid into multi-species ortholog groups.

    QuickParanoid

    QuickParanoid is a suite of programs for automatic ortholog clustering and analysis. It takes as input a collection of files produced by InParanoid and finds ortholog clusters among multiple species. For a given dataset, QuickParanoid first preprocesses each InParanoid output file and then computes ortholog clusters. It also provides a couple of programs qa1 and qa2 for analyzing the result of ortholog clustering.
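    Both descriptions come down to one basic step: turning many pairwise (two-species) InParanoid groups into multi-species clusters. Below is a minimal sketch of that idea, assuming for illustration that any pairwise groups sharing at least one protein end up in the same cluster (plain single-linkage merging via union-find). The function name and protein IDs are hypothetical, and both real programs apply their own rules on top of this, so it is not the actual algorithm of either tool.

```python
from collections import defaultdict


def merge_pairwise_groups(pairwise_groups):
    """Merge pairwise ortholog groups that share at least one protein.

    Hypothetical helper: single-linkage merging only, not the actual
    MultiParanoid or QuickParanoid algorithm.
    pairwise_groups: iterable of iterables of protein IDs.
    Returns a sorted list of sorted multi-species clusters.
    """
    parent = {}

    def find(x):
        # Walk up to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for group in pairwise_groups:
        members = list(group)
        for protein in members:
            parent.setdefault(protein, protein)
        # Link every member to the first one, so each group is one component;
        # a protein shared by two groups joins their components.
        for protein in members[1:]:
            union(members[0], protein)

    clusters = defaultdict(list)
    for protein in parent:
        clusters[find(protein)].append(protein)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical groups from human-mouse, mouse-fly and worm-yeast comparisons.
print(merge_pairwise_groups([["hsA", "mmA"], ["mmA", "dmA"], ["ceB", "scB"]]))
```

    The mouse protein links the human-mouse and mouse-fly groups into one three-species cluster, while the worm-yeast group stays separate. Pure single linkage like this can chain unrelated groups together through one promiscuous protein, which is one reason real tools add further rules.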

    So… both build on InParanoid. Are there any differences? Let me list the ones I’ve found.

    Read the rest of this entry »


    Posted in *nix, Bioinformatics, Software | 2 Comments »

    R functions for regression analysis cheat sheet

    29th May 2012

    Original PDF.
    My local copy.


    Posted in Bioinformatics, Links, Misc | No Comments »

    Information criteria for choosing best predictive models

    29th May 2012

    I usually use 10-fold (non-stratified) cross-validation (CV) to measure the predictive power of my models: it gives consistent results and is easy to perform, at least on smaller datasets.
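    For reference, here is a minimal sketch of plain (non-stratified) k-fold CV. It assumes the simplest possible model, a one-predictor least-squares line, and scores it by mean squared prediction error; the function names and the fixed shuffle seed are illustrative choices, not taken from any particular package.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def kfold_cv_mse(xs, ys, k=10, seed=0):
    """Mean squared prediction error from plain (non-stratified) k-fold CV."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)       # random split, no stratification
    folds = [idx[i::k] for i in range(k)]  # k folds of near-equal size
    sq_err, count = 0.0, 0
    for test in folds:
        test_set = set(test)
        train = [i for i in idx if i not in test_set]
        # Fit on the other k-1 folds, predict the held-out fold.
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in test:
            sq_err += (ys[i] - (a + b * xs[i])) ** 2
            count += 1
    return sq_err / count


xs = list(range(50))
ys = [2 * x + 1 + (1 if x % 3 == 0 else -0.5) for x in xs]
print(kfold_cv_mse(xs, ys))  # lower is better when comparing models
```

    Every observation is predicted exactly once by a model that never saw it. Setting `k` to the number of observations turns this into leave-one-out CV.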

    I just came across Akaike’s Information Criterion (AIC) and the Schwarz Bayesian Information Criterion (BIC). Quoting robjhyndman:

    Asymptotically, minimizing the AIC is equivalent to minimizing the CV value. This is true for any model (Stone 1977), not just linear models. It is this property that makes the AIC so useful in model selection when the purpose is prediction.

    Because of the heavier penalty, the model chosen by BIC is either the same as that chosen by AIC, or one with fewer terms. Asymptotically, for linear models minimizing BIC is equivalent to leave-v-out cross-validation when v = n[1-1/(log(n)-1)] (Shao 1997).
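    The standard definitions make the "heavier penalty" concrete. With k the number of estimated parameters, n the number of observations and \hat{L} the maximized likelihood, BIC charges ln n per parameter where AIC charges 2, so BIC penalizes more heavily as soon as n ≥ 8:

```latex
\begin{aligned}
\mathrm{AIC} &= 2k - 2\ln\hat{L}, \\
\mathrm{BIC} &= k\ln n - 2\ln\hat{L}, \\
\mathrm{BIC} - \mathrm{AIC} &= k\,(\ln n - 2) > 0 \quad \text{whenever } n > e^{2} \approx 7.39 .
\end{aligned}
```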

    I want to try AIC, and maybe BIC, on my models. Conveniently, R provides both as the functions AIC() and BIC().
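    To see what those numbers mean for an ordinary regression, here is a minimal sketch that computes AIC and BIC by hand for a linear model with normal errors, comparing an intercept-only model against a straight line. It assumes the maximum-likelihood error variance RSS/n and counts that variance as one extra parameter; to my understanding this is the convention R's logLik() uses for lm fits, but check before relying on exact agreement. All function names are hypothetical.

```python
import math


def gaussian_loglik(rss, n):
    """Maximized log-likelihood of a linear model with normal errors (sigma^2 = RSS/n)."""
    return -0.5 * n * (math.log(2 * math.pi) + math.log(rss / n) + 1)


def aic_bic(rss, n, n_coef):
    """Return (AIC, BIC); the error variance counts as one extra parameter."""
    k = n_coef + 1
    ll = gaussian_loglik(rss, n)
    return -2 * ll + 2 * k, -2 * ll + k * math.log(n)


def rss_mean_only(ys):
    """Residual sum of squares of the intercept-only model."""
    my = sum(ys) / len(ys)
    return sum((y - my) ** 2 for y in ys)


def rss_line(xs, ys):
    """Residual sum of squares of the least-squares line y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


xs = list(range(20))
ys = [3 * x + (1 if x % 2 else -1) for x in xs]
n = len(xs)
print("mean only:", aic_bic(rss_mean_only(ys), n, 1))
print("line:     ", aic_bic(rss_line(xs, ys), n, 2))
```

    The model with the lower value is preferred; here both criteria clearly favour the line. In R the corresponding calls would simply be AIC(fit) and BIC(fit) on an lm object.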


    Posted in Bioinformatics, Machine learning | No Comments »

    Academia or life?

    16th April 2011

    Worth reading: Goodbye academia, I get a life.


    Posted in Links, Science | No Comments »

    Amazonia! 6462 human microarray samples

    6th March 2011

    Amazonia! – explore the jungle of microarray results

    Paradoxically, the tremendous downpour of microarray results prevents a simple use of expression data. Therefore, we propose a thematic entry to public transcriptomes: you may for instance query a gene on a “Stem Cells page”, where you will see the expression of your favorite gene across selected microarray experiments related to stem cell biology. This selection of samples can be customized at will among the 6462 samples currently present in the database.

    Every transcriptome study results in the identification of lists of genes relevant to a given biological condition. In order to include this valuable information in any new query in the Amazonia! database, we indicate for each gene in which lists it is included. This is a straightforward and efficient way to synthesize hundreds of microarray publications.

    A special feature of Amazonia! is the field of human stem cells, notably embryonic stem cells.


    Posted in Bioinformatics, Links, Science | No Comments »