19th October 2013
Right now, when I see that I have to often repeat/retype some sets and sequences of commands, I’m trying to wrap them up into some kind of a script, every time choosing the most appropriate language – shell when I need to start lots of existing command-line tools, Python when there’s some data handling and processing involved, and R when I’m invoking commands from R packages. So far I have been avoiding the fairly popular makefile-based approach to automating pipelines and workflows which rely heavily on existing tools. However, being curious, I’ve compiled a short list of modern make-like alternatives, to possibly explore… sometime later…
- First comes make itself – the oldest and the most widely used software build tool. Stable and powerful. Still, even people who got used to using make, have some gripes about it. The most detailed list of gripes is probably here.
- SCons is a build tool written in Python. I guess I like that “configuration files are Python scripts” – maybe knowing Python is enough to use SCons, which makes SCons a better choice than make for me. SCons seems to have gained some support (scroll down for comments/discussion). There were some doubts about SCons performance (1, 2, and 3); not sure where SCons is at right now in that regard.
- waf, a Python-based framework for configuring, compiling and installing applications.
- pyDoIt is a Python automation tool. It seems to use Python syntax. It aims at bringing the power of build-tools to execute any kind of task, where a task describes some computation to be done (actions), and contains some extra meta-data. Based on the description alone, I’m quite intrigued! I wonder if anyone had already worked with pyDoIt and can share experiences?…
- Rake – Ruby make – is a simple build program with capabilities similar to those of make. Had seen a lot of positive feedback about this one – mostly regarding simplicity of use. Still [py]DoIt so far looks more attractive to me personally.
- Ruffus is a lightweight python module for running computational pipelines. Sounds like some good competition to [py]DoIt!
- Anduril is an open source component-based workflow framework for scientific data analysis. Sounds promising, though the latest downloadable version is over 400 MBs… It probably already contains a bunch of binaries and maybe even data and complete workflows for data analysis. Probably worth a look, but may turn out a little overweight for simple pipelining.
- snakemake is a scalable bioinformatics workflow engine. I get the feeling that Python is truly dominating the pipelines/workflows world: snakemake, as even the name suggests, is in Python, too. The front-page example is so simple and clear, that snakemake immediately pushes DoIt down from the 1st place! Awesome.
- Paver is a yet-another Python-based software project scripting tool along the lines of Make or Rake, designed to help out with repetitive tasks with the convenience of Python’s syntax. Sounds similar to DoIt. Have no idea how they actually compare to each other.
That is it for now.
What were your experiences with automating repetitive tasks and building simple pipelines?