Autarchy of the Private Cave

Tiny bits of bioinformatics, [web-]programming etc

    • Archives

    • Recent comments

    Archive for February 24th, 2016

    How to use mkfifo named pipes with prinseq-lite.pl

    24th February 2016

    prinseq_logo_1prinseq-lite.pl is a utility written in Perl for preprocessing NGS reads, also in FASTQ format.
    It can read sequences both from files and from stdin (if you only have 1 sequence).

    I wanted to use it with compressed (gzipped/bzipped2) FASTQ input files.
    As I do not need to store decompressed input files, the most efficient solution is to use pipes.
    This works well for a single file, but not for 2 files (paired-end reads).

    For 2 files, named pipes (also known as FIFOs) can be used.
    You can create a named pipe in Linux with the help of mkfifo command, for example mkfifo R1_decompressed.fastq.
    To use it, start decompressing something into it (either in a different terminal, or in background), for example zcat R1.fastq.gz > R1_decompressed.fastq &;
    we can call this a writing/generating process, because it writes into a pipe.
    (If you are writing software to use named pipes, any processes writing into them should be started in a new thread, as they will block until all the data is consumed.)
    Now if you give the R1_decompressed.fastq as a file argument to some other program, it will see decompressed content (e.g. wc -l R1_decompressed.fastq will tell you the number of lines in the decompressed file); we can call program reading from the named pipe a reading/consuming process.
    As soon as a consuming process had consumed (read) all of the data, the writing/generating process will finally exit.

    This, however, does not work with prinseq-lite.pl (version 0.20.4 or earlier), with a broken pipe error. Read the rest of this entry »

    Share

    Posted in *nix, Bioinformatics, Software | No Comments »