# Autarchy of the Private Cave

## GUIs for R

17th October 2013

I’ve briefly tried Cantor (which also supports Octave and KAlgebra as backends), RKWard, Deducer/JGR, R Commander, and RStudio.

My personal choice was RStudio: it is good-looking, intuitive, and easy to use, yet powerful.

The next step would be to find an R equivalent of IPython’s excellent Mathematica-like Notebook web interface…

## R functions for regression analysis cheat sheet

29th May 2012

Original PDF.
My local copy.

## Information criteria for choosing best predictive models

29th May 2012

I usually use 10-fold (non-stratified) cross-validation (CV) to measure the predictive power of my models: it gives consistent results and is easy to perform (at least on smaller datasets).
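For illustration, here is a minimal sketch of 10-fold non-stratified CV in base R. The dataset (`mtcars`), the model formula, and the RMSE metric are my assumptions for the example, not the models discussed in this post.

```r
# Minimal sketch: 10-fold (non-stratified) cross-validation of a linear model.
# mtcars, the formula, and RMSE are illustrative assumptions.
set.seed(42)                      # make the random fold assignment reproducible
n <- nrow(mtcars)
k <- 10

# Randomly assign each row to one of k folds (no stratification).
fold_id <- sample(rep(seq_len(k), length.out = n))

# For each fold: fit on the other k-1 folds, predict the held-out rows.
fold_rmse <- sapply(seq_len(k), function(i) {
  train <- mtcars[fold_id != i, ]
  test  <- mtcars[fold_id == i, ]
  fit   <- lm(mpg ~ wt + hp, data = train)
  pred  <- predict(fit, newdata = test)
  sqrt(mean((test$mpg - pred)^2))
})

# Average held-out error over the folds.
cv_rmse <- mean(fold_rmse)
print(cv_rmse)
```

A lower `cv_rmse` suggests better out-of-sample predictions; comparing candidate models means repeating the loop with the same `fold_id` and a different formula.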

Just came across Akaike’s Information Criterion (AIC) and the Schwarz Bayesian Information Criterion (BIC). Citing robjhyndman,

Asymptotically, minimizing the AIC is equivalent to minimizing the CV value. This is true for any model (Stone 1977), not just linear models. It is this property that makes the AIC so useful in model selection when the purpose is prediction.

Because of the heavier penalty, the model chosen by BIC is either the same as that chosen by AIC, or one with fewer terms. Asymptotically, for linear models minimizing BIC is equivalent to leave-v-out cross-validation when v = n[1-1/(log(n)-1)] (Shao 1997).

I want to try AIC, and maybe BIC, on my models. Conveniently, R provides both as functions.
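As a rough sketch of how this could look, the snippet below compares two linear models with R’s built-in `AIC()` and `BIC()`. The dataset (`mtcars`) and both formulas are assumptions chosen only for the example.

```r
# Sketch: compare two candidate models by AIC and BIC (lower is better).
# mtcars and the two formulas are illustrative assumptions.
fit_small <- lm(mpg ~ wt, data = mtcars)
fit_large <- lm(mpg ~ wt + hp + qsec, data = mtcars)

scores <- data.frame(
  model = c("mpg ~ wt", "mpg ~ wt + hp + qsec"),
  AIC   = c(AIC(fit_small), AIC(fit_large)),
  BIC   = c(BIC(fit_small), BIC(fit_large))
)
print(scores)

# Pick the model with the smallest value of each criterion.
best_by_aic <- scores$model[which.min(scores$AIC)]
best_by_bic <- scores$model[which.min(scores$BIC)]
```

Since BIC penalizes each extra parameter by log(n) instead of 2, it can only agree with AIC or prefer the smaller model, which matches the quote above.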


## Batch-retrieve EntrezGene homologs using NCBI’s HomoloGene and R’s annotationTools

27th October 2010

1. Install the annotationTools R package:
   source("http://bioconductor.org/biocLite.R")
   biocLite("annotationTools")
2. library(annotationTools)
3. mygenes = read.table("file with one entrez ID of the source organism per line.txt")
4. getHOMOLOG(unlist(mygenes), taxonomy_ID_of_target_organism, homologene) [homologene is the HomoloGene data loaded as a data frame; wrap the call to getHOMOLOG into unlist to get a vector instead of a list]
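To make the lookup itself more concrete, here is a small base-R sketch of what retrieving homologs from a HomoloGene-style table amounts to. This is not the annotationTools implementation: `lookup_homologs` is a hypothetical helper, and the cluster and gene IDs are made up. Only the column order (cluster ID, taxonomy ID, gene ID) and the taxonomy IDs (9606 for human, 10090 for mouse) follow the real data.

```r
# Toy HomoloGene-style table: cluster ID, taxonomy ID, Entrez gene ID.
# Cluster and gene IDs are invented for illustration.
homologene_toy <- data.frame(
  cluster = c(1, 1, 2, 2, 3),
  taxid   = c(9606, 10090, 9606, 10090, 9606),
  gene    = c(101, 201, 102, 202, 103)
)

# Hypothetical helper: for each source gene, find genes of the target
# species that share a homology cluster with it (NA if there are none).
lookup_homologs <- function(genes, target_taxid, tbl) {
  lapply(genes, function(g) {
    clusters <- tbl$cluster[tbl$gene == g]
    hits <- tbl$gene[tbl$cluster %in% clusters & tbl$taxid == target_taxid]
    if (length(hits) == 0) NA else hits
  })
}

my_genes <- c(101, 102, 103)            # toy "human" gene IDs
homologs <- lookup_homologs(my_genes, 10090, homologene_toy)

# As with getHOMOLOG, the result is a list; unlist() flattens it to a vector.
homolog_vector <- unlist(homologs)
print(homolog_vector)
```

Gene 103 has no mouse entry in its cluster, so it comes back as `NA`, which is worth checking for before passing the vector on.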

It might be easier to achieve the same results with a Perl script calling NCBI’s e-utils.

29th March 2010


## R script to filter probesets with log-expression values below the lowest spike-in

27th January 2010

Sometimes you need to remove all probesets whose expression values fall below the minimal spike-in intensity on an Affymetrix microarray. The reasoning behind this procedure is simple: the lowest-expression spike-ins mark the lower limit of microarray sensitivity, and anything below that limit cannot be reliably quantified, which also means that both the fold-change and the p-value will be unreliable for these probesets.

Here’s a simple R script to do just that. It is heavily commented, and also contains an optional (commented-out) fragment which allows the removal of more low-variance, low-intensity probesets.
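The core idea can be sketched in a few lines of base R. This is not the linked script: the expression matrix is made up, the `^AFFX-Bio` pattern for spotting spike-in probesets is an assumption about naming, and "below the lowest spike-in" is taken here to mean below it on every array.

```r
# Toy log2-expression matrix: rows are probesets, columns are arrays.
# All values and the non-AFFX probeset names are invented for illustration.
expr <- rbind(
  "AFFX-BioB-3_at" = c(7.2, 7.0, 7.4),
  "AFFX-BioC-3_at" = c(9.1, 9.3, 9.0),
  "gene_A_at"      = c(10.5, 11.0, 10.8),
  "gene_B_at"      = c(6.1, 6.5, 6.9),
  "gene_C_at"      = c(6.8, 7.6, 6.9)
)
colnames(expr) <- c("array1", "array2", "array3")

# Assumption: spike-in probesets can be recognized by their name prefix.
is_spike <- grepl("^AFFX-Bio", rownames(expr))

# Lowest spike-in intensity observed on any array.
threshold <- min(expr[is_spike, ])

# Keep spike-ins, plus probesets that reach the threshold on at least one array.
keep <- is_spike | apply(expr, 1, max) >= threshold
filtered <- expr[keep, , drop = FALSE]
print(rownames(filtered))
```

Requiring the threshold on every array (`min` instead of `max` in the `apply` call) would be a stricter variant of the same filter.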


## R under Debian testing/i386: permanently set pdfviewer option

21st October 2009

If you get this message when opening vignettes:

Error in openPDF(vif) :
getOption('pdfviewer') is ''; please use 'options(pdfviewer=...)'

and you are tired of running this command every time:

> options(pdfviewer="okular")

then you should check whether your system-wide Renviron file sets a proper PDF viewer.
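A quick way to inspect the current state from within R is sketched below. It relies on R taking the default `pdfviewer` option from the `R_PDFVIEWER` environment variable; the exact location of the Renviron file on your system is an assumption derived from `R.home("etc")`.

```r
# Sketch: inspect where R would pick up its default PDF viewer.
# The Renviron location is derived from R.home("etc"); on some systems this
# directory is a symlink to a path under /etc.
renviron_path <- file.path(R.home("etc"), "Renviron")
env_viewer    <- Sys.getenv("R_PDFVIEWER")   # "" if unset
opt_viewer    <- getOption("pdfviewer")      # what vignette viewers consult

cat("Renviron file:       ", renviron_path, "\n")
cat("Renviron file exists:", file.exists(renviron_path), "\n")
cat("R_PDFVIEWER:         ", env_viewer, "\n")
cat("pdfviewer option:    ", if (is.null(opt_viewer)) "" else opt_viewer, "\n")

if (!nzchar(env_viewer)) {
  message("R_PDFVIEWER is empty; consider setting it, e.g. R_PDFVIEWER=okular")
}
```

If editing the system-wide file is not an option, putting `options(pdfviewer="okular")` into `~/.Rprofile` should give the same effect per user.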