Autarchy of the Private Cave

Tiny bits of bioinformatics, [web-]programming etc

    Converting an existing Windows XP installation to a VirtualBox image

    29th June 2013

    This can be done in two steps, the second of which is optional:

    1. From Windows itself: use Disk2vhd to create the .vhd image (e.g. NOVA.VHD).
    2. (Optional, requires VirtualBox) Convert the VHD to a VirtualBox-native VDI with: VBoxManage clonehd NOVA.VHD nova.vdi --format VDI --variant Standard
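
    If the conversion has to be repeated for several images, step 2 can be wrapped in a small script. This is only a sketch, assuming VBoxManage is on the PATH; the function names build_clonehd_command and convert_vhd_to_vdi are made up for illustration, and the arguments are exactly those of the command above.

```python
import subprocess


def build_clonehd_command(vhd_path, vdi_path):
    """Return the argument list for the VBoxManage call from step 2."""
    return [
        "VBoxManage", "clonehd", vhd_path, vdi_path,
        "--format", "VDI",
        "--variant", "Standard",
    ]


def convert_vhd_to_vdi(vhd_path, vdi_path):
    """Run the conversion (assumes VBoxManage is installed and on PATH).

    check=True makes subprocess raise CalledProcessError if VBoxManage
    exits with a non-zero status.
    """
    subprocess.run(build_clonehd_command(vhd_path, vdi_path), check=True)


# Example call, using the file names from the steps above:
# convert_vhd_to_vdi("NOVA.VHD", "nova.vdi")
```

    Keeping the command construction separate from the call that runs it makes it easy to print or log the exact command before anything touches the disk images.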

    Posted in Software | 2 Comments »

    Free private git repository hosting

    29th August 2012

    GitHub is awesome and still improving, but sometimes I’d prefer to keep some of my repositories hidden from the public. This is not so much because of the value of the code (though that matters sometimes), but because those repositories are all “work in progress” or “short-lived” and may at times hold so much junk that publishing them would simply be too embarrassing.

    Previously, I used gitosis to set up git repository hosting on my server. I’m still using it for long-lived projects, but I’m now lazy enough to dislike the steps needed to set up a new repo (and I’m creating more and more new repos, some of which are likely to die very young). Some kind of GUI would help, but gitweb doesn’t seem that useful to me (here’s how to make it work with gitosis, and another recipe, or maybe just try gitosis-web or gitosis-web-admin).

    Another downside is that gitosis is no longer actively maintained and was even removed from the Ubuntu repositories. The suggested course of action for gitosis users is to migrate to gitolite. However, the basic design of gitolite is the same, so personally (looking for something easier to use) I see only minor gains in this migration (which I’ll have to perform sooner or later anyway).

    Another interesting self-hosted option is girocco. Too bad I have absolutely no experience with http://repo.or.cz/, so it’s hard to tell if girocco is convenient to use or not… Comments are welcome.

    Using Dropbox for git repositories (also here) seems a nice and fairly easy option, with only a few downsides: it’ll eat your Dropbox space (which is still much more than you get from free git hosts), and it isn’t that easy in a multi-user environment. Also, you’ll have to set up Dropbox on any headless servers where you want to run your code, which isn’t exactly what I’d want to do. The same arguments apply to git on Google Drive.

    An alternative to the various self-hosted systems is to use an existing service that offers free private projects. The Git wiki has a list of hosts to start with.

    Here’s a brief summary of the options I’ve found relatively attractive (see below for my experience with 3 of the listed services). (See also this recent comparison.)

    Provider          Repositories   Users       Space       Paid plans?
    BitBucket         Unlimited      5           Unlimited   +
    Assembla          Unlimited      Unlimited   1 GB        +
    GIT Enterprise    Unlimited      10          1 GB        +
    ProjectLocker     1              2           0.2 GB      +

    Initially, I found GIT Enterprise and Assembla to be the most attractive options to try. After trying both, I found Assembla faster and generally nicer to work with. It wasn’t immediately obvious how to create more than one source repository, but once I figured that out everything went smoothly.

    However, after trying BitBucket, I immediately switched all my Assembla repositories to it :) BitBucket is just like GitHub, but with free private repositories. It also has an issue tracker and a wiki. It even allows small teams to work on private repositories!

    Posted in *nix, Links, Software | 1 Comment »

    Where are you going?

    21st June 2012

    This is just a “Go to” dialog of the really good Notepad++ editor.

    Posted in Misc | No Comments »

    R functions for regression analysis cheat sheet

    29th May 2012

    Original PDF.
    My local copy.

    Posted in Bioinformatics, Links, Misc | No Comments »

    Information criteria for choosing best predictive models

    29th May 2012

    I usually use 10-fold (non-stratified) cross-validation (CV) to measure the predictive power of my models: it gives consistent results and is easy to perform (at least on smaller datasets).

    I just came across Akaike’s Information Criterion (AIC) and the Schwarz Bayesian Information Criterion (BIC). Quoting robjhyndman:

    Asymptotically, minimizing the AIC is equivalent to minimizing the CV value. This is true for any model (Stone 1977), not just linear models. It is this property that makes the AIC so useful in model selection when the purpose is prediction.

    Because of the heavier penalty, the model chosen by BIC is either the same as that chosen by AIC, or one with fewer terms. Asymptotically, for linear models minimizing BIC is equivalent to leave-v-out cross-validation when v = n[1-1/(log(n)-1)] (Shao 1997).

    I want to try AIC, and maybe BIC, on my models. Conveniently, both functions exist in R.
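
    To make the “heavier penalty” concrete, here is a minimal sketch of the arithmetic, written in Python rather than R. It assumes the standard definitions AIC = 2k - 2 ln(L) and BIC = k ln(n) - 2 ln(L), where k is the number of fitted parameters, n the number of observations and ln(L) the maximized log-likelihood. It also assumes that the log in the quoted leave-v-out formula is the natural logarithm. The function names and the numbers in the comments are made up for illustration.

```python
import math


def aic(log_likelihood, n_params):
    """AIC = 2k - 2 ln(L): each parameter costs a fixed 2."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """BIC = k ln(n) - 2 ln(L): each parameter costs ln(n)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


def shao_leave_out(n_obs):
    """v = n[1 - 1/(log(n) - 1)] from the quote, with log taken as ln."""
    return n_obs * (1 - 1 / (math.log(n_obs) - 1))


# Hypothetical model: ln(L) = -120, 3 parameters, 50 observations.
# aic(-120.0, 3)      gives 246.0
# bic(-120.0, 3, 50)  gives about 251.74
# shao_leave_out(100) gives about 72.26
```

    Since ln(n) is greater than 2 as soon as n is 8 or more, BIC charges more per parameter than AIC on any realistically sized dataset, which is why it never picks a larger model than AIC does. For n = 100 the quoted formula gives v of roughly 72, i.e. leaving out almost three quarters of the data in each split.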

    Posted in Bioinformatics, Machine learning | No Comments »

    The genetics of orchids and dandelions

    1st May 2012

    Quite an interesting article on the genetics of behavior.

    Posted in Links, Misc | No Comments »

    Beanstalkd and related tools for easy parallelizing and backgrounding

    18th February 2012

    beanstalkd: a simple, fast work queue.
    Jack and the Beanstalkd: a web-app for basic work queue administration.
    beanstalkc: a simple beanstalkd client library for Python.
    queueit: a command-line tool that helps integrate beanstalkd into shell scripts.

    Posted in Links, Programming, Python, Software | No Comments »