How to use mkfifo named pipes with prinseq-lite.pl
24th February 2016
prinseq-lite.pl is a utility written in Perl for preprocessing NGS reads, including reads in FASTQ format.
It can read sequences from files or, if you have only one input file, from stdin.
I wanted to use it with compressed (gzip or bzip2) FASTQ input files.
As I do not need to store decompressed input files, the most efficient solution is to use pipes.
This works well for a single file, but not for 2 files (paired-end reads).
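For the single-file case, a minimal sketch of the idea looks like the following. The tiny test file and the variable names are made up for illustration, and wc -l stands in for the real consumer; gzip -dc is used instead of zcat only because it behaves the same on more systems.

```shell
# Make a tiny gzipped FASTQ file (2 reads, 8 lines) to demonstrate with.
workdir=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n' | gzip > "$workdir/R1.fastq.gz"

# Decompress on the fly: the consumer reads from stdin and the
# decompressed data is never written to disk.
single_lines=$(gzip -dc "$workdir/R1.fastq.gz" | wc -l | tr -d ' ')
echo "$single_lines"

rm -r "$workdir"
```

An ordinary pipe like this gives the consumer exactly one input stream (its stdin), which is why the approach stops working once two input files are needed.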
For 2 files, named pipes (also known as FIFOs) can be used.
You can create a named pipe in Linux with the mkfifo command, for example mkfifo R1_decompressed.fastq.
To use it, start decompressing something into it (either in a different terminal, or in the background), for example with zcat R1.fastq.gz > R1_decompressed.fastq &.
We can call this the writing/generating process, because it writes into the pipe.
(If you are writing software to use named pipes, any processes writing into them should be started in a new thread, as they will block until all the data is consumed.)
Now if you give R1_decompressed.fastq as a file argument to some other program, it will see the decompressed content (e.g. wc -l R1_decompressed.fastq will tell you the number of lines in the decompressed file); we can call the program reading from the named pipe the reading/consuming process.
As soon as the consuming process has consumed (read) all of the data, the writing/generating process will finally exit.
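Put together for two files, the steps above can be sketched roughly as follows. This is only an illustration under assumptions: the test data and variable names are invented, gzip -dc is used in place of zcat for portability, and paste is a stand-in consumer that simply reads both pipes side by side.

```shell
# Two tiny gzipped FASTQ files (2 reads each) standing in for paired-end data.
workdir=$(mktemp -d)
printf '@r1/1\nACGT\n+\nIIII\n@r2/1\nTTGA\n+\nIIII\n' | gzip > "$workdir/R1.fastq.gz"
printf '@r1/2\nTGCA\n+\nIIII\n@r2/2\nCCAT\n+\nIIII\n' | gzip > "$workdir/R2.fastq.gz"

# Create one named pipe per input file.
mkfifo "$workdir/R1_decompressed.fastq" "$workdir/R2_decompressed.fastq"

# Writing/generating processes, started in the background.
# Each one blocks until a reader opens its pipe.
gzip -dc "$workdir/R1.fastq.gz" > "$workdir/R1_decompressed.fastq" &
gzip -dc "$workdir/R2.fastq.gz" > "$workdir/R2_decompressed.fastq" &

# Reading/consuming process: it is given the pipes as if they were files.
paired_lines=$(paste "$workdir/R1_decompressed.fastq" "$workdir/R2_decompressed.fastq" | wc -l | tr -d ' ')

# The writers exit once all of their data has been read.
wait
echo "$paired_lines"

rm -r "$workdir"
```

Note that a named pipe can be read only once per writer: after the consumer has finished, the writing process has to be started again before the pipe can be reused.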
This, however, does not work with prinseq-lite.pl (version 0.20.4 or earlier), which fails with a broken pipe error.
