Autarchy of the Private Cave

Tiny bits of bioinformatics, [web-]programming etc

    Archive for the 'Programming' Category

    How to update a multisite Drupal 6/7 installation using Drush

    25th August 2014

    There are quite a lot of posts on how to do this, but mine differs a tiny little bit, so I’m saving it for my own future reference, and also for the benefit of the wider audience.

    I am updating a multisite Drupal 6 installation. To the best of my knowledge, the only difference for Drupal 7 is that it uses the maintenance_mode variable instead of D6’s site_offline variable.
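
    As a minimal sketch of that D6/D7 difference, the helper below only builds the drush command line that toggles maintenance mode for one site of a multisite installation; it does not run anything. The function name and the site URI are hypothetical, and it assumes drush’s -l (URI) option and its vset command are available in your drush version.

```python
def maintenance_command(site_uri, drupal_major, enable):
    """Build (but do not run) a drush command that toggles maintenance mode."""
    # Variable names as given in the post: site_offline (D6), maintenance_mode (D7).
    variables = {6: "site_offline", 7: "maintenance_mode"}
    if drupal_major not in variables:
        raise ValueError("only Drupal 6 and 7 are covered by this sketch")
    value = "1" if enable else "0"
    # -l selects which site of the multisite installation the command applies to.
    return ["drush", "-l", site_uri, "vset", variables[drupal_major], value]


# Hypothetical site URI, used purely for illustration.
command = maintenance_command("http://site1.example.com", 6, enable=True)
print(" ".join(command))
```

    The returned list can be handed to subprocess.run(command, check=True) once you are running as the web user described below.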

    On Debian stable and later, you can sudo aptitude install drush and then just use it immediately after that.

    Note: I recommend su webuser (or sudo -s followed by sudo -s -u webuser) before you run any non-testing drush commands, where webuser is the user which owns your web-exposed files (e.g. Debian’s default is, I think, www-data). I’ve seen a lot of recommendations to run drush as a super-user, but that does not make sense, and may actually cause problems with file ownership.

    One last thing before we start: if your drush seems to work fine but hangs when untarring modules – check this solution.

    Read the rest of this entry »


    Posted in *nix, Drupal, how-to, Notepad, PHP, Programming, Software, Web | 1 Comment »

    Alternatives to GNU make

    19th October 2013

    Right now, when I see that I often have to repeat/retype some sets and sequences of commands, I try to wrap them up into some kind of script, each time choosing the most appropriate language – shell when I need to start lots of existing command-line tools, Python when there’s some data handling and processing involved, and R when I’m invoking commands from R packages. So far I have been avoiding the fairly popular makefile-based approach to automating pipelines and workflows which rely heavily on existing tools. However, being curious, I’ve compiled a short list of modern make-like alternatives, to possibly explore… sometime later…

    • First comes make itself – the oldest and the most widely used software build tool. Stable and powerful. Still, even people who are used to make have some gripes about it. The most detailed list of gripes is probably here.
    • SCons is a build tool written in Python. I guess I like that “configuration files are Python scripts” – maybe knowing Python is enough to use SCons, which makes SCons a better choice than make for me. SCons seems to have gained some support (scroll down for comments/discussion). There were some doubts about SCons performance (1, 2, and 3); not sure where SCons is at right now in that regard.
    • waf, a Python-based framework for configuring, compiling and installing applications.
    • pyDoIt is a Python automation tool. It seems to use Python syntax. It aims at bringing the power of build-tools to execute any kind of task, where a task describes some computation to be done (actions), and contains some extra meta-data. Based on the description alone, I’m quite intrigued! I wonder if anyone has already worked with pyDoIt and can share experiences?…
    • Rake – Ruby make – is a simple build program with capabilities similar to those of make. I have seen a lot of positive feedback about this one – mostly regarding simplicity of use. Still, [py]DoIt so far looks more attractive to me personally.
    • Ruffus is a lightweight Python module for running computational pipelines. Sounds like some good competition to [py]DoIt!
    • Anduril is an open source component-based workflow framework for scientific data analysis. Sounds promising, though the latest downloadable version is over 400 MB… It probably already contains a bunch of binaries and maybe even data and complete workflows for data analysis. Probably worth a look, but may turn out a little overweight for simple pipelining.
    • snakemake is a scalable bioinformatics workflow engine. I get the feeling that Python is truly dominating the pipelines/workflows world: snakemake, as even the name suggests, is in Python, too. The front-page example is so simple and clear that snakemake immediately pushes DoIt down from the 1st place! Awesome.
    • Paver is yet another Python-based software project scripting tool along the lines of Make or Rake, designed to help out with repetitive tasks with the convenience of Python’s syntax. Sounds similar to DoIt. I have no idea how they actually compare to each other.
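
    To make the “task = actions + meta-data” idea from the DoIt entry a bit more concrete, here is a minimal sketch of what a DoIt task file (conventionally named dodo.py) might look like. The task name and both file names are hypothetical; the only things assumed about DoIt are that it picks up functions whose names start with task_ and that a task is a dictionary with keys such as actions, file_dep and targets.

```python
# dodo.py: a minimal sketch; input.txt and line_count.txt are made-up names.

def task_count_lines():
    """Count the lines of input.txt whenever that file changes."""
    return {
        # Shell command to execute for this task.
        "actions": ["wc -l input.txt > line_count.txt"],
        # The task is considered out of date when a dependency changes.
        "file_dep": ["input.txt"],
        # Files produced by the task.
        "targets": ["line_count.txt"],
    }
```

    With DoIt installed, running doit in the same directory should execute the task and skip it on later runs for as long as input.txt stays unchanged.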

    That is it for now.

    What were your experiences with automating repetitive tasks and building simple pipelines?


    Posted in *nix, Notepad, Programming, Software | No Comments »

    GUIs for R

    17th October 2013

    I’ve tried [briefly] Cantor (which also supports Octave and KAlgebra as backends), rkward, deducer/JGR, R Commander, and RStudio.

    My personal choice was RStudio: it is good-looking, intuitive, and easy to use, yet powerful.

    The next step would be using some R equivalent of IPython’s excellent Mathematica-like Notebook web interface…


    Posted in *nix, Notepad, Programming, Science, Software | No Comments »

    Migrating from Redmine to Bitbucket

    17th October 2013

    In one of my previous posts I mentioned that BitBucket is über-cool :)

    Redmine is also really cool, and is actually more feature-rich than what BitBucket has to offer, but maintaining it needs just a tiny bit more time and attention than I’m willing to spend these days. So, migration it is!

    Redmine has issue 3647 titled “Data import/export system”; it is not resolved, but has a number of links to other resources. Like the redmine exporter at, which provides a free hosted Redmine service. Redmine itself has a REST API, though I have no idea if it allows exporting all the data I may need. There’s also an XLS export plugin, but it has to be installed first, and I’m too lazy :) There’s also TaskAdapter, but they do not support BitBucket (yet?).
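
    For the REST API route, a rough sketch of how paging through issues could look is shown below. It only builds the request URLs and fetches nothing. The base URL and the helper name are hypothetical, and it assumes the /issues.json endpoint with its offset, limit and status_id parameters behaves as described in Redmine’s REST documentation; whether this covers all the data one needs is exactly the open question above.

```python
from urllib.parse import urlencode


def issue_page_urls(base_url, total_issues, page_size=100):
    """Build the URLs needed to page through all issues of a Redmine instance."""
    urls = []
    for offset in range(0, total_issues, page_size):
        query = urlencode({
            "offset": offset,
            "limit": page_size,
            # "*" asks for issues in any status, not only the open ones.
            "status_id": "*",
        })
        urls.append(f"{base_url.rstrip('/')}/issues.json?{query}")
    return urls


# Hypothetical instance with 250 issues: three pages are needed.
for url in issue_page_urls("https://redmine.example.org/", 250):
    print(url)
```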

    For the complete backup, I am thinking of using the pure-ruby redmine project data export script. To migrate issues only, I’ll consider the redmine2bitbucket script.

    P.S. Not implying anything (yet?), but my previous migration was from Trac to Redmine… At that time, Trac seemed to have fewer features than I wanted. And now I’m migrating back to “fewer features”, but with the benefit of no support required from me.


    Posted in Links, Notepad, Programming | No Comments »

    Graphs in Python

    13th July 2013

    Sooner or later, everyone has to deal with graphs. Some people have to do programming with graphs, and a subset of those do that in Python.

    NetworkX is a pure Python implementation, where any hashable object can be a node. Both nodes and edges have attributes. NetworkX supports directed graphs and multigraphs (where there are multiple edges between nodes). It might be slower than other implementations, but you may not even notice that – especially when working with smaller graphs, or when not applying computationally intensive algorithms to your graphs.

    graph-tool uses the Boost graph library (C++), so it should be really fast. It could be the only multi-threaded graph library for Python. It supports pickling the graphs, allows interactive graph drawing, and has well-illustrated documentation. If performance and efficiency are of utmost importance, it could be the best choice.

    igraph is also really fast – just like graph-tool when using 1 CPU; graph-tool only wins conclusively when it is run on multiple CPUs/cores. igraph also has an R package, with bindings to the same C core.

    Pure Python is also an option for really small cases.
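
    As an illustration of the pure Python option, here is a minimal sketch of a directed graph stored as a dictionary of neighbour sets, with a breadth-first search for reachable nodes. The function names and the example edges are made up for this sketch; it is not taken from any of the libraries above.

```python
from collections import deque


def add_edge(graph, source, target):
    """Add a directed edge, registering both endpoints as nodes."""
    graph.setdefault(source, set()).add(target)
    graph.setdefault(target, set())


def reachable(graph, start):
    """Return the set of nodes reachable from start (including start itself)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


graph = {}
for source, target in [("a", "b"), ("b", "c"), ("d", "a")]:
    add_edge(graph, source, target)

print(sorted(reachable(graph, "a")))
```

    This is fine for small graphs; once node or edge attributes and non-trivial algorithms are needed, one of the libraries above is probably the better choice.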

    Finally, there’s a discussion around Python Graph API to simplify the inter-changeability and inter-operability of various existing Python graph modules. It also has a list of some lesser-known Python graph libraries, so check it out.


    Posted in Programming, Python | No Comments »

    Beanstalkd and related tools for easy parallelizing and backgrounding

    18th February 2012

    beanstalkd: a simple, fast work queue.
    Jack and the Beanstalkd: a web-app for basic work queue administration.
    beanstalkc: a simple beanstalkd client library for Python.
    queueit: a CLI tool which helps to integrate beanstalkd into shell scripts.


    Posted in Links, Programming, Python, Software | No Comments »

    Python performance: set vs list

    15th August 2011

    Sometimes there is a need to be sure that no identifier is processed twice – for example, when parsing a file into a database, with the file potentially containing duplicate records. An obvious solution is to properly wrap the DB insertion code in a try…except block, and handle duplicate primary ID exceptions. Another, sometimes more desirable solution is to maintain a set/list of processed IDs internally, and check against it prior to attempting the insertion of anything. So is it a set or a list?
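
    A minimal sketch of the second approach is shown below, together with a rough timing of the membership check on both containers. This is an illustration only and not the test from the full post; the record layout, the function name and the sizes are made up.

```python
import timeit


def keep_first_occurrences(records):
    """Return the records whose ID was not seen before, preserving order."""
    seen = set()
    kept = []
    for record_id, payload in records:
        if record_id in seen:
            continue  # duplicate ID: skip it instead of attempting an insert
        seen.add(record_id)
        kept.append((record_id, payload))
    return kept


print(keep_first_occurrences([(1, "a"), (2, "b"), (1, "c")]))

# Rough comparison of the membership test itself.
ids_list = list(range(10_000))
ids_set = set(ids_list)
probe = 9_999  # last element: the list has to be scanned to the end

list_time = timeit.timeit(lambda: probe in ids_list, number=200)
set_time = timeit.timeit(lambda: probe in ids_set, number=200)
print(f"list: {list_time:.6f} s, set: {set_time:.6f} s")
```

    The list check scans elements one by one, while the set check is a hash lookup, so the gap is expected to grow with the number of stored IDs.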

    There are already quite a few internet resources discussing “python set vs list”, but probably the simplest yet elegant way to test that is below.
    Read the rest of this entry »


    Posted in Notepad, Programming, Python | 1 Comment »