17th October 2013
I've [briefly] tried Cantor (which also supports Octave and KAlgebra as backends), RKWard, Deducer/JGR, R Commander, and RStudio.
My personal choice was RStudio: it is good-looking, intuitive, and easy to use, yet powerful.
The next step would be to try some R equivalent of IPython's excellent Mathematica-like Notebook web interface…
Posted in *nix, Notepad, Programming, Science, Software | No Comments »
29th May 2012
Original PDF.
My local copy.
Posted in Bioinformatics, Links, Misc | No Comments »
29th May 2012
I usually use 10-fold (non-stratified) cross-validation (CV) to measure the predictive power of my models: it gives consistent results and is easy to perform (at least on smaller datasets).
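For reference, a minimal sketch of non-stratified 10-fold CV in base R. The dataset (the built-in mtcars), the linear model, and the RMSE metric are my assumptions for illustration, not the models discussed here.

```r
# Non-stratified 10-fold CV on a built-in dataset (illustrative choice)
set.seed(42)                      # reproducible fold assignment
data(mtcars)
k <- 10

# Randomly assign each row to one of k folds, ignoring the response (non-stratified)
folds <- sample(rep(seq_len(k), length.out = nrow(mtcars)))

sq_err <- numeric(nrow(mtcars))
for (i in seq_len(k)) {
  test <- folds == i
  # Fit on the other k-1 folds, predict the held-out fold
  fit <- lm(mpg ~ wt + hp, data = mtcars[!test, ])
  pred <- predict(fit, newdata = mtcars[test, ])
  sq_err[test] <- (mtcars$mpg[test] - pred)^2
}

# Cross-validated root mean squared error
cv_rmse <- sqrt(mean(sq_err))
print(cv_rmse)
```

Every row is predicted exactly once by a model that never saw it, so cv_rmse estimates out-of-sample error.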
I just came across Akaike's Information Criterion (AIC) and the Schwarz Bayesian Information Criterion (BIC). Quoting robjhyndman:
Asymptotically, minimizing the AIC is equivalent to minimizing the CV value. This is true for any model (Stone 1977), not just linear models. It is this property that makes the AIC so useful in model selection when the purpose is prediction.
…
Because of the heavier penalty, the model chosen by BIC is either the same as that chosen by AIC, or one with fewer terms. Asymptotically, for linear models minimizing BIC is equivalent to leave-v-out cross-validation when v = n[1-1/(log(n)-1)] (Shao 1997).
I want to try AIC, and maybe BIC, on my models. Conveniently, R provides both as functions: AIC() and BIC().
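A quick sketch of what that could look like, assuming two nested linear models on the built-in mtcars data (both models are made up for illustration). Lower values are better for both criteria.

```r
data(mtcars)

# Two candidate models of different size (illustrative choice)
fit_small <- lm(mpg ~ wt, data = mtcars)
fit_large <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)

# With several models, AIC() and BIC() return a data frame, one row per model
aic <- AIC(fit_small, fit_large)
bic <- BIC(fit_small, fit_large)
print(aic)
print(bic)

# AIC by hand: -2 * log-likelihood + 2 * number of estimated parameters
ll <- logLik(fit_small)
aic_manual <- -2 * as.numeric(ll) + 2 * attr(ll, "df")
```

With n = 32 observations, log(n) is greater than 2, so BIC penalises each extra parameter more heavily than AIC does, in line with the quote above.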
Posted in Bioinformatics, Machine learning | No Comments »
27th October 2010
- Install the annotationTools R package:
source("http://bioconductor.org/biocLite.R")
biocLite("annotationTools")
- Download the full HomoloGene data file from ftp://ftp.ncbi.nlm.nih.gov/pub/HomoloGene/current
- library(annotationTools)
- homologene = read.delim("homologene.data", header=FALSE)
- mygenes = read.table("file with one entrez ID of the source organism per line.txt")
- getHOMOLOG(unlist(mygenes), taxonomy_ID_of_target_organism, homologene) [alternatively, wrap the call to getHOMOLOG in unlist to get a vector]
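To make the lookup less of a black box, here is a base-R sketch of the idea on a toy table. It is not the annotationTools implementation. The helper find_homologs is hypothetical, and the gene IDs and homology groups are made up; only the column order (homology group ID, taxonomy ID, gene ID) follows the HomoloGene file.

```r
# Toy stand-in for homologene.data (made-up values for illustration)
homologene <- data.frame(
  HID    = c(1, 1, 2, 2, 3),                  # homology group ID
  taxid  = c(9606, 10090, 9606, 10090, 9606), # 9606 = human, 10090 = mouse
  geneid = c(101, 201, 102, 202, 103)         # fake gene IDs
)

# Hypothetical helper: for each source gene, return the gene IDs in the
# target species that share its homology group (NA if there are none)
find_homologs <- function(genes, target_taxid, table) {
  lapply(genes, function(g) {
    hid  <- table$HID[table$geneid == g]
    hits <- table$geneid[table$HID %in% hid & table$taxid == target_taxid]
    if (length(hits) == 0) NA else hits
  })
}

res <- find_homologs(c(101, 103), 10090, homologene)
print(unlist(res))
```

Gene 101 shares group 1 with mouse gene 201, while gene 103 sits in a group with no mouse member and maps to NA.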
It might be easier to achieve the same results with a Perl script calling NCBI’s e-utils.
Posted in Bioinformatics, how-to, Notepad | 2 Comments »
27th January 2010
Sometimes you need to remove all probesets whose expression values fall below the minimal spike-in intensity on an Affymetrix microarray. The reasoning behind this procedure is simple: the lowest-concentration spike-ins mark the lower limit of microarray sensitivity, and anything below that limit cannot be reliably quantified. This also means that both the fold change and the p-value will be unreliable for these probesets.
Here's a simple R script to do just that. It is heavily commented, and also contains an optional (commented out) fragment that removes additional low-variance, low-intensity probesets.
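The script itself is not reproduced in this excerpt. As a rough sketch of the idea only (not that script), assuming a log2 expression matrix with probesets in rows, spike-in controls recognisable by an "AFFX-" name prefix, and a probeset being kept if it reaches the threshold on at least one array:

```r
# Toy log2 expression matrix: 6 probesets x 3 arrays (made-up values)
expr <- matrix(
  c(2.0, 2.2, 1.9,   # gene_a: always below the spike-in floor
    8.1, 7.9, 8.3,   # gene_b
    5.0, 5.2, 4.9,   # lowest-concentration spike-in
    9.5, 9.4, 9.6,   # higher-concentration spike-in
    4.0, 6.0, 4.1,   # gene_c: above the floor on one array
    3.0, 3.1, 2.9),  # gene_d: always below the floor
  ncol = 3, byrow = TRUE,
  dimnames = list(
    c("gene_a", "gene_b", "AFFX-BioB-5_at", "AFFX-CreX-5_at", "gene_c", "gene_d"),
    c("s1", "s2", "s3")
  )
)

# Assumption: spike-in controls are the rows whose names start with "AFFX-"
spike <- grepl("^AFFX-", rownames(expr))

# Threshold: mean intensity of the weakest spike-in across arrays
threshold <- min(rowMeans(expr[spike, , drop = FALSE]))

# Keep the controls, plus probesets reaching the threshold on at least one array
keep <- spike | apply(expr, 1, max) >= threshold
filtered <- expr[keep, , drop = FALSE]
print(rownames(filtered))
```

On real arrays the "AFFX-" prefix also matches other control probesets, so the spike-in list would need to be spelled out explicitly.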
Read the rest of this entry »
Posted in Bioinformatics, Programming, Science | No Comments »
21st October 2009
If you get this message when opening vignettes:
Error in openPDF(vif) :
getOption('pdfviewer') is ''; please use 'options(pdfviewer=…)'
and you are tired of running this command every time:
> options(pdfviewer="okular")
then you should check whether your system-wide Renviron file has a proper PDF viewer set:
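One way to inspect this from within R, as a sketch: the "pdfviewer" option takes its default from the R_PDFVIEWER environment variable, which is normally set in the Renviron file under R's etc directory. The variable names below are mine.

```r
# Location of the system-wide Renviron file for this R installation
renviron <- file.path(R.home("etc"), "Renviron")

# Show the line(s) that set the PDF viewer, if the file is present
if (file.exists(renviron)) {
  print(grep("R_PDFVIEWER", readLines(renviron, warn = FALSE), value = TRUE))
}

# What the current session actually sees
viewer_env <- Sys.getenv("R_PDFVIEWER")
viewer_opt <- getOption("pdfviewer")
print(viewer_env)
print(viewer_opt)
```

If the variable comes back empty, a line such as R_PDFVIEWER=okular in a per-user ~/.Renviron should have the same effect without touching the system file (okular is just the viewer from the example above).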
Read the rest of this entry »
Posted in *nix, how-to, Notepad, Software | No Comments »