Autarchy of the Private Cave

Tiny bits of bioinformatics, [web-]programming etc

    Archive for October 17th, 2013

    GUIs for R

    17th October 2013

    I’ve tried [briefly] Cantor (which also supports Octave and KAlgebra as backends), rkward, deducer/JGR, R Commander, and RStudio.

My personal choice was RStudio: it is good-looking, intuitive, and easy to use, while still being powerful.

The next step would be finding an R equivalent of IPython’s excellent Mathematica-like Notebook web interface…


    Posted in *nix, Notepad, Programming, Science, Software | No Comments »

    Migrating from Redmine to Bitbucket

    17th October 2013

In one of my previous posts I mentioned that BitBucket is über-cool :)

Redmine is also really cool, and is actually more feature-rich than what BitBucket has to offer, but maintaining it requires just a tiny bit more time and attention than I’m willing to spend these days. So, migration it is!

Redmine has issue 3647, titled “Data import/export system”; it is not resolved, but contains a number of links to other resources, like the redmine exporter at hostedredmine.com, which provides a free hosted Redmine service. Redmine itself has a REST API, though I have no idea whether it allows exporting all the data I may need. There’s also an XLS export plugin, but it has to be installed first, and I’m too lazy :) There’s also TaskAdapter, but it does not support BitBucket (yet?).
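
    As an illustration of what the REST API offers, pulling issues as JSON might look like this (a minimal sketch: the host name and API key are placeholders, and pagination is controlled by limit/offset):
    curl -H "X-Redmine-API-Key: YOUR_API_KEY" \
      "https://redmine.example.org/issues.json?status_id=*&limit=100&offset=0"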

For a complete backup, I’m thinking of using the pure-ruby redmine project data export script. To migrate issues only, I’ll consider the redmine2bitbucket script.

P.S. Not implying anything (yet?), but my previous migration was from Trac to Redmine… Back then, Trac seemed to have fewer features than I wanted. And now I’m migrating back to “fewer features”, but with the benefit of requiring no support from me.


    Posted in Links, Notepad, Programming | No Comments »

    Resume broken scp/mc/fish transfer with rsync

    17th October 2013

    Note: this is a draft post back from 2010. As it is still useful to me, I’ve decided to publish it as is.

    I had already mused on the powers of rsync before.

This time, a reminder to self on how to resume broken scp/mc/fish transfers using rsync.

    First, an assortment of example commands.
    export RSYNC_RSH=ssh
    rsync --partial file_to_transfer user@remotehost:/path/remote_file
    rsync -av --partial --progress --inplace SRC DST
    rsync --partial --progress --rsh=ssh host:/work/source.tar.bz2 .
    rsync --partial --progress --rsh=ssh -r me@host.com:/datafiles/ ./

One could also try rsync’s --append option, which bases transfer resumption on the sizes of the two files rather than verifying that their contents match.
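
    For example, resuming a large single-file transfer with --append might look like this (a sketch with placeholder paths; --append assumes the partial destination file is an exact prefix of the source, so only use it when the partial copy cannot have diverged):
    rsync --append --progress -e ssh user@host:/remote_path/large_file.tar ./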

Now a single command line, explained in a little more detail:
    rsync -vrPtz -e ssh host:/remote_path/* /local_path/
Explained:
-e ssh: use the ssh client instead of rsh, so the data exchange is encrypted
-z: compress data during transfer
-t: preserve modification times (other attributes, such as owner and permissions, can be preserved with their respective options)
-P: resume incomplete file transfers and show progress (equivalent to --partial --progress)
-r: recurse into subdirectories
-v: verbose output

To specify a port when using ssh, you must add it to the ssh command itself.
    Example: rsync --partial --progress --rsh="ssh -p 16703" user@host:path


    Posted in Notepad | No Comments »

    The favourite file compressor: gzip, bzip2, or 7z?

    17th October 2013

    Here comes a heap of assorted web-links!

    I had personally settled on using pbzip2 for these simple reasons:

    • performance scales quasi-linearly with the number of CPU cores (until one hits an I/O bottleneck);
• when an archive is damaged, you are only guaranteed to lose the damaged block(s) of 100–900 KiB; the remaining information may still be salvageable.

Compared to pbzip2, neither gzip nor 7z (LZMA) offers quasi-linear speedups proportional to the number of CPU cores.
pigz, the parallel gzip, does parallelize compression, but gzip does not compress as well as bzip2, and its decompression is not parallelized the way it is in pbzip2.
7z is multi-threaded, but it tops out at using 2 CPU cores (see the links below for tests).
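
    For reference, a typical pbzip2 invocation might look like this (a minimal sketch with a placeholder file name: -p sets the number of processors, -b the block size in 100 KB units, -k keeps the input file, -d decompresses):
    pbzip2 -p8 -b9 -k dataset.fastq
    pbzip2 -d -p8 dataset.fastq.bz2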

    pbzip2 is also quite a good choice for FASTQ data files: even if a few blocks get lost due to data corruption, this should not noticeably affect the entire dataset.
Specialized tools for FASTQ compression do exist (see e.g. this article, as well as the Fastqz, fqzcomp, and samcomp project pages). I think I liked fastqz quite a bit, but I still have to examine data recoverability in the case of archive damage. It is possible to use external parity tools that support file repair using pre-calculated recovery files (like the Linux par2 utility, which works on bzip2 archives and any other files in general), but adding a parity file may negate the benefit of the higher compression ratio. Also, if the archive has no independent block structure, an insufficient parity file may lead to the loss of the entire archive. In other words, this still has to be tested.
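
    To illustrate the parity approach, creating and later using 10% recovery data with par2 might look like this (a hedged sketch with placeholder file names; -r sets the redundancy percentage):
    par2 create -r10 dataset.par2 dataset.fastq.bz2
    par2 verify dataset.par2
    par2 repair dataset.par2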

    Now the long-promised web-links!


    Posted in *nix, Links, Notepad, Software | 1 Comment »