Autarchy of the Private Cave

Tiny bits of bioinformatics, [web-]programming etc

    Archive for the 'Science' Category

    The sugar conspiracy

    19th June 2016

    A long but interesting read: The Sugar Conspiracy.

    Posted in Links, Misc, Science, Society | No Comments »

    Practical comparison of NGS adapter trimming tools

    1st June 2016

    I used to work with sequencing providers who were giving me fairly clean data.
    It was already barcode-separated, and had no over-represented adapter sequences.
    The only thing I had to do was to (optionally) quality-trim the reads, and check for biological contamination.

    Recently, however, I came across some real-world data which contained not only contamination, but also a noticeable percentage of adapter sequences.
    I did a quick test of multiple tools to see if they fit my requirements:

    • should be easy/logical to use: no arcane/convoluted command lines or config files
    • should detect adapters automatically, either using its own database or a provided plain FASTA file
    • should be reasonably fast
    • must leave no adapter traces behind: I prefer aggressive trimming

    I have tried the following tools:

    • fastq-mcf from the ea-utils package
    • skewer
    • TrimmomaticPE
    • cutadapt: haven’t used it directly, but it is used by some of the compared tools
    • bbduk from BBMap
    • autoadapt
    • TrimGalore!

    Read the rest of this entry »

    Posted in Bioinformatics | 6 Comments »

    Nobody wants higher-quality, complete bacterial genomes

    24th May 2016

    This is a bit of a rant.

    Disclaimer

    The story, all names, characters, genomes and incidents portrayed in this blog post are fictitious.
    No identification with actual persons (living, dead or undead), places, companies, and processes is intended or should be inferred.
    No animals were harmed in the making of this blog post.

    Let’s try answering a question:

    why are there so many incomplete/draft bacterial genomes, and far fewer complete genomes?

    Read the rest of this entry »

    Posted in Bioinformatics, Rant | 2 Comments »

    Streptomyces morphogenesis regulation: overview presentation

    13th May 2016

    Note: this post is just a placeholder/draft; it will be extended later. But it can already be useful ;)
    Read the rest of this entry »

    Posted in Science | No Comments »

    Preprint servers and open journals

    28th February 2016

    Let’s start with some definitions.

    By Open Journals I mean journals with open/public peer review.
    By preprint servers I mean services which allow you to publish your manuscript with a DOI, to gauge interest and collect feedback before submission.

    I am aware of the following public peer-review journals:

    • F1000 Research: your submission is made public without any editorial pre-screening within an average of 7 days, but only gets indexed in PubMed/Scopus/Scholar after a successful public peer review. Public means that a reviewer-signed evaluation appears together with the submitted manuscript. Authors may respond to criticism, and upload revisions of their submission. I believe a submission passes peer review after two positive reviews. Note that even your initial submission receives a DOI, and is thus citable (as are all subsequent revisions). A brief examination of articles in some of the topics tells me that F1000 Research is a good place to publish, especially because it is a kind of preprint + journal in one package. You pay per submission; there are 3 tiers by word count.
    • The Winnower: submit-review-revise, but here you pay for the DOI after your submission is reviewed. Before review your submission is thus not citable (except by URL, which isn’t tracked as easily as DOI references). I haven’t formed an opinion on how attractive The Winnower is for submitting, but I did find this quite interesting story for you to enjoy :)
    • Science Open: this project encompasses 5 mostly medical journals. It lists over 11 million articles on the front page, but those are sourced from other publications; Science Open itself seems to have several hundred publications across all 5 journals. Submissions get a DOI, then can undergo public review. It is not clear to me in which direction Science Open will be moving – towards becoming an excellent research papers aggregator, or towards becoming a publishing platform, or – like now – towards both.

    I’m also aware of the following preprint servers:
    Read the rest of this entry »

    Posted in Science | No Comments »

    How to use mkfifo named pipes with prinseq-lite.pl

    24th February 2016

    prinseq-lite.pl is a utility written in Perl for preprocessing NGS reads, including reads in FASTQ format.
    It can read sequences both from files and from stdin (if you only have one input file).

    I wanted to use it with compressed (gzip/bzip2) FASTQ input files.
    As I do not need to store decompressed input files, the most efficient solution is to use pipes.
    This works well for a single file, but not for 2 files (paired-end reads).

    For 2 files, named pipes (also known as FIFOs) can be used.
    You can create a named pipe in Linux with the mkfifo command, for example mkfifo R1_decompressed.fastq.
    To use it, start decompressing something into it (either in a different terminal, or in the background), for example zcat R1.fastq.gz > R1_decompressed.fastq & (the trailing & runs the command in the background).
    We can call this the writing/generating process, because it writes into the pipe.
    (If you are writing software that uses named pipes, any process writing into them should be started in a separate thread or process, because it will block until a reader opens the pipe and consumes the data.)
    Now if you give R1_decompressed.fastq as a file argument to some other program, it will see the decompressed content (e.g. wc -l R1_decompressed.fastq will tell you the number of lines in the decompressed file); we can call the program reading from the named pipe the reading/consuming process.
    As soon as the consuming process has consumed (read) all of the data, the writing/generating process will finally exit.
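
    The steps above can be sketched for a paired-end case as a small shell script. This is only a minimal illustration under stated assumptions: the toy reads and the R2 file names are made up for the example, gzip -dc stands in for zcat (the two are equivalent for .gz files), and wc -l plays the role of the consuming program.

```shell
#!/bin/sh
set -eu

# Work in a throwaway directory so nothing is left behind.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Toy paired-end input (hypothetical reads): one 4-line FASTQ record per file.
printf '@read1/1\nACGT\n+\nIIII\n' | gzip -c > R1.fastq.gz
printf '@read1/2\nTGCA\n+\nIIII\n' | gzip -c > R2.fastq.gz

# One named pipe per input file.
mkfifo R1_decompressed.fastq R2_decompressed.fastq

# Writing/generating processes: started in the background, because each one
# blocks until a reader opens its pipe.
gzip -dc R1.fastq.gz > R1_decompressed.fastq &
gzip -dc R2.fastq.gz > R2_decompressed.fastq &

# Reading/consuming processes: wc -l reads each pipe as if it were a file.
# tr strips the padding that some wc implementations add.
lines_r1=$(wc -l < R1_decompressed.fastq | tr -d ' ')
lines_r2=$(wc -l < R2_decompressed.fastq | tr -d ' ')

# The writers exit once their data has been consumed.
wait
echo "R1 lines: $lines_r1, R2 lines: $lines_r2"
```

    A named pipe can only be read once and only sequentially, so a consumer that tries to seek in its input or to open it a second time will not work with this setup.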

    The named-pipe approach, however, does not work with prinseq-lite.pl (version 0.20.4 or earlier): it fails with a broken pipe error.
    Read the rest of this entry »

    Posted in *nix, Bioinformatics, Software | No Comments »

    Good hands-on explanation of differences between Spearman’s and Pearson’s correlation

    22nd April 2014

    Linear correlation vs. Rank order correlation: drag 11 data points around the plot and observe how both Spearman’s and Pearson’s correlation measures change. But first follow the Next button at the bottom-right for a guided tour of data manipulations.

    Posted in Links, Science | No Comments »