Autarchy of the Private Cave

Tiny bits of bioinformatics, [web-]programming etc


    GUIs for R

    17th October 2013

    I’ve briefly tried Cantor (which also supports Octave and KAlgebra as backends), RKWard, Deducer/JGR, R Commander, and RStudio.

    My personal choice is RStudio: it is good-looking, intuitive, and easy to use, yet powerful.

    The next step would be to find an R equivalent of IPython’s excellent Mathematica-like Notebook web interface…


    Posted in *nix, Notepad, Programming, Science, Software | No Comments »

    R functions for regression analysis cheat sheet

    29th May 2012

    Original PDF.
    My local copy.


    Posted in Bioinformatics, Links, Misc | No Comments »

    Information criteria for choosing best predictive models

    29th May 2012

    I usually use 10-fold (non-stratified) cross-validation (CV) to measure the predictive power of my models: it gives consistent results and is easy to perform (at least on smaller datasets).
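    As an illustration only, here is a minimal base-R sketch of 10-fold non-stratified CV for a linear model, scored by mean squared prediction error. The helper name cv_mse, the use of lm(), the MSE score and the fixed seed are all assumptions made for this example; the response is taken to be the first variable in the formula, so transformed responses such as log(y) ~ x are not handled.

```r
# Hypothetical helper: k-fold, non-stratified CV estimate of the mean squared
# prediction error of a linear model. Uses base R (stats) only.
cv_mse <- function(formula, data, k = 10, seed = 1) {
  set.seed(seed)  # reproducible fold assignment (note: changes the global RNG state)
  n <- nrow(data)
  # Random fold labels of (almost) equal size, ignoring the response: non-stratified
  folds <- sample(rep_len(seq_len(k), n))
  # Assumes an untransformed response, e.g. mpg ~ wt
  response <- all.vars(formula)[1]
  sq_err <- numeric(n)
  for (i in seq_len(k)) {
    test <- folds == i
    # Fit on the other k - 1 folds, predict the held-out fold
    fit <- lm(formula, data = data[!test, , drop = FALSE])
    pred <- predict(fit, newdata = data[test, , drop = FALSE])
    sq_err[test] <- (data[[response]][test] - pred)^2
  }
  mean(sq_err)
}

# Usage: compare two candidate models on the same folds (same seed)
cv_mse(mpg ~ wt, data = mtcars)
cv_mse(mpg ~ wt + hp, data = mtcars)
```

    Because the seed is fixed, both calls use identical folds, so the difference in CV error reflects the models rather than the random split.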

    I just came across Akaike’s Information Criterion (AIC) and the Schwarz Bayesian Information Criterion (BIC). To quote robjhyndman:

    Asymptotically, minimizing the AIC is equivalent to minimizing the CV value. This is true for any model (Stone 1977), not just linear models. It is this property that makes the AIC so useful in model selection when the purpose is prediction.

    Because of the heavier penalty, the model chosen by BIC is either the same as that chosen by AIC, or one with fewer terms. Asymptotically, for linear models minimizing BIC is equivalent to leave-v-out cross-validation when v = n[1-1/(log(n)-1)] (Shao 1997).
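    For reference, these are the standard definitions (they are not part of the quoted text), with L the maximized likelihood, k the number of estimated parameters and n the number of observations:

$$\mathrm{AIC} = -2\ln L + 2k, \qquad \mathrm{BIC} = -2\ln L + k\ln n$$

    BIC’s per-parameter penalty, ln n, exceeds AIC’s 2 once n > e^2 ≈ 7.4, which is where the "heavier penalty" comes from. As a worked example of Shao’s condition, assuming log means the natural logarithm and taking n = 100:

$$v = 100\left[1 - \frac{1}{\ln 100 - 1}\right] \approx 100\left(1 - \frac{1}{3.605}\right) \approx 72$$

    That is, about 72% of the observations would be held out in each split.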

    I want to try AIC, and maybe BIC, on my models. Conveniently, both are available in R as the AIC() and BIC() functions of the stats package.
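    A minimal sketch of such a comparison, using the built-in mtcars data purely as a stand-in dataset; the candidate models and the helper ic_by_hand are illustrative assumptions. The helper recomputes both criteria from logLik() to show that they differ only in the penalty term.

```r
# Candidate linear models on a stand-in dataset (mtcars ships with R)
fits <- list(
  wt      = lm(mpg ~ wt, data = mtcars),
  wt_hp   = lm(mpg ~ wt + hp, data = mtcars),
  wt_hp_q = lm(mpg ~ wt + hp + qsec, data = mtcars)
)

# AIC() and BIC() from the stats package; lower is better for both
ic <- data.frame(
  model = names(fits),
  AIC   = sapply(fits, AIC),
  BIC   = sapply(fits, BIC)
)
print(ic[order(ic$AIC), ])

# Hypothetical helper: the same criteria computed from the log-likelihood
ic_by_hand <- function(fit) {
  ll <- logLik(fit)
  k <- attr(ll, "df")  # estimated parameters (for lm this includes sigma)
  n <- nobs(fit)
  c(AIC = -2 * as.numeric(ll) + 2 * k,
    BIC = -2 * as.numeric(ll) + log(n) * k)
}
ic_by_hand(fits$wt_hp)
```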


    Posted in Bioinformatics, Machine learning | No Comments »

    Batch-retrieve EntrezGene homologs using NCBI’s HomoloGene and R’s annotationTools

    27th October 2010

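    1. Install the annotationTools R package (biocLite() was the Bioconductor installer at the time of writing; current Bioconductor releases use install.packages("BiocManager") followed by BiocManager::install("annotationTools") instead):
      source("http://bioconductor.org/biocLite.R")
      biocLite("annotationTools")
    2. Download the full HomoloGene data file (homologene.data) from ftp://ftp.ncbi.nlm.nih.gov/pub/HomoloGene/current
    3. library(annotationTools)
    4. homologene <- read.delim("homologene.data", header=FALSE)
    5. mygenes <- read.table("file with one entrez ID of the source organism per line.txt")
    6. getHOMOLOG(unlist(mygenes), taxonomy_ID_of_target_organism, homologene), where the second argument is the NCBI taxonomy ID of the target organism (e.g. 10090 for mouse) [alternatively, wrap the call to getHOMOLOG in unlist() to get a vector]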
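    If annotationTools is not available, the same lookup can be sketched in base R. This assumes that the first three columns of homologene.data are the HomoloGene cluster ID, the taxonomy ID and the Entrez Gene ID; find_homologs is a hypothetical helper, not part of annotationTools, and unlike getHOMOLOG it returns a two-column data frame (source ID, target ID) with NA where no homolog was found.

```r
# Hypothetical helper: map source Entrez Gene IDs to target-species IDs via
# shared HomoloGene cluster IDs. 'homologene' is the table read with
# read.delim("homologene.data", header = FALSE); column order is assumed to be
# cluster ID, taxonomy ID, Entrez Gene ID, ...
find_homologs <- function(gene_ids, target_tax, homologene) {
  hg <- homologene[, 1:3]
  names(hg) <- c("cluster", "tax", "gene")
  # Cluster membership of the requested source genes
  src <- hg[hg$gene %in% gene_ids, c("cluster", "gene")]
  # All target-species genes, with their clusters
  tgt <- hg[hg$tax == target_tax, c("cluster", "gene")]
  # Left join on cluster: source genes without a homolog get NA
  res <- merge(src, tgt, by = "cluster", all.x = TRUE,
               suffixes = c(".source", ".target"))
  # Return rows in the order of the input IDs
  res <- res[order(match(res$gene.source, gene_ids)),
             c("gene.source", "gene.target")]
  rownames(res) <- NULL
  res
}

# Usage, following the steps above:
# find_homologs(unlist(mygenes), 10090, homologene)
```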

    It might be easier to achieve the same result with a Perl script calling NCBI’s E-utilities.


    Posted in Bioinformatics, how-to, Notepad | 2 Comments »

    R tutorial links

    29th March 2010


    Posted in Bioinformatics, Links, Science, Systems Biology | 1 Comment »

    R script to filter probesets with log-expression values below the lowest spike-in

    27th January 2010

    Sometimes there is a need to remove all probesets whose expression values fall below the minimal spike-in intensity on an Affymetrix microarray. The reasoning behind this procedure is simple: the lowest-expressed spike-ins mark the bottom margin of microarray sensitivity, and anything below that margin cannot be reliably quantified, which also means that both the fold change and the p-value of expression variance will be unreliable for these probesets.

    Here’s a simple R script that does just that. It is abundantly commented and also contains an optional (commented-out) fragment for removing additional low-variance, low-intensity probesets.

    Read the rest of this entry »
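    The full script is behind the link above. As a rough illustration only, and not that script, here is a minimal base-R sketch of the same kind of filter. It assumes a matrix of log-expression values with probesets as rows and arrays as columns (for example, what Biobase’s exprs() returns for an RMA-normalized ExpressionSet), a caller-supplied vector of spike-in probeset IDs, a per-array threshold equal to the lowest spike-in value on that array, and a rule that keeps a probeset if it reaches the threshold on at least min_arrays arrays. The names filter_below_spikein and min_arrays are hypothetical.

```r
# Hypothetical helper: drop probesets below the lowest spike-in intensity.
# expr:       log-expression matrix, probeset IDs as row names, one column per array
# spikes:     row names of the spike-in control probesets (chosen by the caller)
# min_arrays: a probeset is kept if it reaches the threshold on this many arrays
filter_below_spikein <- function(expr, spikes, min_arrays = 1) {
  stopifnot(all(spikes %in% rownames(expr)))
  # Bottom margin of sensitivity: lowest spike-in log-intensity on each array
  threshold <- apply(expr[spikes, , drop = FALSE], 2, min)
  # TRUE where a probeset is at or above its array's threshold
  above <- sweep(expr, 2, threshold, FUN = ">=")
  # Keep sufficiently expressed probesets; drop the spike-in controls themselves
  keep <- rowSums(above) >= min_arrays & !(rownames(expr) %in% spikes)
  expr[keep, , drop = FALSE]
}
```

    Setting min_arrays to the number of arrays gives the strictest variant, in which a probeset must clear the threshold on every array.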


    Posted in Bioinformatics, Programming, Science | No Comments »

    R under Debian testing/i386: permanently set pdfviewer option

    21st October 2009

    If you get this message when opening vignettes:

    Error in openPDF(vif) :
    getOption('pdfviewer') is ''; please use 'options(pdfviewer=...)'

    and you are tired of running this command every time:

    > options(pdfviewer="okular")

    then check whether your system-wide Renviron file sets a proper PDF viewer:
    Read the rest of this entry »
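    The rest of the post is behind the link above. As a hedged, per-user alternative to editing the system-wide file: R’s documentation describes the pdfviewer option as being initialized from the R_PDFVIEWER environment variable, and environment files such as ~/.Renviron are read at startup before the profile files (see ?options and ?Startup). The sketch below relies on that understanding; the helper names pdfviewer_status and set_user_pdfviewer are hypothetical.

```r
# Hypothetical helper: show where the PDF viewer setting comes from
pdfviewer_status <- function() {
  list(
    renviron    = file.path(R.home("etc"), "Renviron"),  # system-wide Renviron
    R_PDFVIEWER = Sys.getenv("R_PDFVIEWER"),             # environment variable
    option      = getOption("pdfviewer")                 # value checked by openPDF()
  )
}

# Hypothetical helper: persist R_PDFVIEWER for the current user only,
# replacing any existing R_PDFVIEWER= line in the given environment file
set_user_pdfviewer <- function(viewer = "okular", file = "~/.Renviron") {
  line <- paste0("R_PDFVIEWER=", viewer)
  old <- if (file.exists(file)) readLines(file) else character()
  old <- old[!grepl("^R_PDFVIEWER=", old)]
  writeLines(c(old, line), file)
  invisible(line)
}

# Usage: inspect, then persist; restart R for the change to take effect
# str(pdfviewer_status())
# set_user_pdfviewer("okular")
```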


    Posted in *nix, how-to, Notepad, Software | No Comments »