Autarchy of the Private Cave

Tiny bits of bioinformatics, [web-]programming etc

    How to use mkfifo named pipes with prinseq-lite.pl

    24th February 2016

    prinseq-lite.pl is a utility written in Perl for preprocessing NGS reads, including reads in FASTQ format.
    It can read sequences both from files and from stdin (the latter only if you have a single input file).

    I wanted to use it with compressed (gzip or bzip2) FASTQ input files.
    As I do not need to store the decompressed input files, the most efficient solution is to use pipes.
    This works well for a single file, but not for 2 files (paired-end reads), because a program has only one stdin.
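For the single-file case, a minimal sketch of the ordinary (anonymous) pipe approach could look like this. The FASTQ records are made up, and awk is only a stand-in consumer that counts reads; in a real run, prinseq-lite.pl reading from stdin would take its place. gzip -dc is used here as a portable equivalent of zcat.

```shell
set -e
workdir=$(mktemp -d)

# Two made-up FASTQ records (4 lines each), gzipped for the demo.
printf '@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\nIIII\n' | gzip -c > "$workdir/R1.fastq.gz"

# Decompress to stdout and pipe straight into the consumer;
# the decompressed data never touches the disk.
# awk is a stand-in consumer: it counts FASTQ records (every 4th line is a header).
read_count=$(gzip -dc "$workdir/R1.fastq.gz" | awk 'NR % 4 == 1 { n++ } END { print n + 0 }')

echo "reads: $read_count"
rm -rf "$workdir"
```

This only works because there is a single input stream; a second compressed file has no second stdin to go to.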

    For 2 files, named pipes (also known as FIFOs) can be used.
    You can create a named pipe in Linux with the mkfifo command, for example: mkfifo R1_decompressed.fastq
    To use it, start decompressing something into it (either in a different terminal or in the background), for example: zcat R1.fastq.gz > R1_decompressed.fastq &
    We can call this the writing/generating process, because it writes into the pipe.
    (If you are writing software that uses named pipes, any process writing into them should be started in a separate thread, as it will block until all the data is consumed.)
    Now if you give R1_decompressed.fastq as a file argument to some other program, it will see the decompressed content (e.g. wc -l R1_decompressed.fastq will tell you the number of lines in the decompressed file); we can call the program reading from the named pipe the reading/consuming process.
    As soon as the consuming process has consumed (read) all of the data, the writing/generating process will finally exit.
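The same steps extend to two files, which is the case that motivated named pipes here. The following is a sketch under assumptions: the read names and sequences are invented, and paste plus awk stand in for a paired-end tool that takes two file arguments and reads them in parallel.

```shell
set -e
workdir=$(mktemp -d)

# Made-up paired-end data: two records per file, gzipped.
printf '@pair1/1\nACGT\n+\nIIII\n@pair2/1\nTTGA\n+\nIIII\n' | gzip -c > "$workdir/R1.fastq.gz"
printf '@pair1/2\nCCGA\n+\nIIII\n@pair2/2\nGGAT\n+\nIIII\n' | gzip -c > "$workdir/R2.fastq.gz"

# One named pipe per input file.
mkfifo "$workdir/R1_decompressed.fastq" "$workdir/R2_decompressed.fastq"

# Writers run in the background: each blocks until a reader opens its pipe.
gzip -dc "$workdir/R1.fastq.gz" > "$workdir/R1_decompressed.fastq" &
r1_pid=$!
gzip -dc "$workdir/R2.fastq.gz" > "$workdir/R2_decompressed.fastq" &
r2_pid=$!

# Stand-in consumer: paste reads both pipes side by side,
# awk keeps only the header line of each record.
paired_headers=$(paste "$workdir/R1_decompressed.fastq" "$workdir/R2_decompressed.fastq" | awk 'NR % 4 == 1')

# Both writers exit once their data has been fully read.
wait "$r1_pid" "$r2_pid"
echo "$paired_headers"
rm -rf "$workdir"
```

The consumer here opens each pipe exactly once and reads it to the end, which is the access pattern named pipes support.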

    This, however, does not work with prinseq-lite.pl (version 0.20.4 or earlier): it fails with a broken pipe error.

    Named pipes are very similar to regular files, with two major differences:

    • named pipes are not seekable: you cannot move the file pointer (at least not backwards, not sure about skipping forward);
    • you cannot arbitrarily close/re-open a named pipe from the consuming end: closing a pipe on the consuming end also closes it for the writing/generating process, whose next write fails with a broken pipe error.

    The reason why prinseq-lite.pl does not work with named pipes is that it performs file format checking first, by opening the file, reading the first 3 lines, and closing it.
    Closing the named pipe causes a broken pipe for the writing process, and when prinseq-lite.pl attempts to open the pipe again, no error is raised, but there is no data there anymore, so it just sits and waits for data :)
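The failure mode can be reproduced without prinseq-lite.pl. In this sketch, head -n 3 mimics the format check (open, read 3 lines, close); the input is generated filler text, deliberately much larger than a pipe buffer so the writer cannot finish before the reader goes away. How exactly the writer dies (signal or write error) may vary by system, so the sketch only records that its exit status is non-zero.

```shell
set -e
workdir=$(mktemp -d)

# Several megabytes of filler lines, gzipped; large enough that the
# writer is still mid-stream when the reader closes the pipe.
awk 'BEGIN { for (i = 1; i <= 1000000; i++) print "line " i }' | gzip -c > "$workdir/big.txt.gz"

mkfifo "$workdir/big_decompressed.txt"

# Writer in the background, as before.
gzip -dc "$workdir/big.txt.gz" > "$workdir/big_decompressed.txt" &
writer_pid=$!

# Mimic the format check: open the pipe, read the first 3 lines, close it.
first_lines=$(head -n 3 "$workdir/big_decompressed.txt")

# The writer has lost its only reader; capture its exit status
# without letting "set -e" abort the script.
writer_status=0
wait "$writer_pid" || writer_status=$?

echo "first lines read: $first_lines"
echo "writer exit status: $writer_status"

# Opening the pipe for reading again at this point would block,
# because no process is writing into it anymore.
rm -rf "$workdir"
```

A non-zero writer status after reading only 3 lines is the broken pipe described above; the rest of the data is lost to any later reader.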

    I’m ok with a quick and dirty solution, so here it is: prinseq-lite.pl patch to enable mkfifo named pipes as input files (also local prinseq-lite.pl.patch).
    WARNING: this patch simply disables file format checking!
