<?xml version="1.0" encoding="UTF-8"?> <rss
version="2.0"
xmlns:content="http://purl.org/rss/1.0/modules/content/"
xmlns:wfw="http://wellformedweb.org/CommentAPI/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:atom="http://www.w3.org/2005/Atom"
xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
> <channel><title>Autarchy of the Private Cave &#187; Science</title> <atom:link href="https://bogdan.org.ua/categories/science/feed" rel="self" type="application/rss+xml" /><link>https://bogdan.org.ua</link> <description>Tiny bits of bioinformatics, [web-]programming etc</description> <lastBuildDate>Wed, 28 Dec 2022 16:09:04 +0000</lastBuildDate> <language>en-US</language> <sy:updatePeriod>hourly</sy:updatePeriod> <sy:updateFrequency>1</sy:updateFrequency> <generator>https://wordpress.org/?v=3.8.27</generator> <item><title>The sugar conspiracy</title><link>https://bogdan.org.ua/2016/06/19/the-sugar-conspiracy.html</link> <comments>https://bogdan.org.ua/2016/06/19/the-sugar-conspiracy.html#comments</comments> <pubDate>Sun, 19 Jun 2016 10:27:14 +0000</pubDate> <dc:creator><![CDATA[Bogdan]]></dc:creator> <category><![CDATA[Links]]></category> <category><![CDATA[Misc]]></category> <category><![CDATA[Science]]></category> <category><![CDATA[Society]]></category> <category><![CDATA[conspiracy]]></category> <category><![CDATA[sugar]]></category> <guid
isPermaLink="false">http://bogdan.org.ua/?p=2446</guid> <description><![CDATA[A long but interesting read: The Sugar Conspiracy.]]></description> <content:encoded><![CDATA[<p>A long but interesting read: <a
href="https://www.theguardian.com/society/2016/apr/07/the-sugar-conspiracy-robert-lustig-john-yudkin">The Sugar Conspiracy</a>.</p>]]></content:encoded> <wfw:commentRss>https://bogdan.org.ua/2016/06/19/the-sugar-conspiracy.html/feed</wfw:commentRss> <slash:comments>0</slash:comments> </item> <item><title>Practical comparison of NGS adapter trimming tools</title><link>https://bogdan.org.ua/2016/06/01/practical-comparison-of-ngs-adapter-trimming-tools.html</link> <comments>https://bogdan.org.ua/2016/06/01/practical-comparison-of-ngs-adapter-trimming-tools.html#comments</comments> <pubDate>Wed, 01 Jun 2016 19:23:14 +0000</pubDate> <dc:creator><![CDATA[Bogdan]]></dc:creator> <category><![CDATA[Bioinformatics]]></category> <category><![CDATA[adapter]]></category> <category><![CDATA[NGS]]></category> <category><![CDATA[sequencing]]></category> <category><![CDATA[trim]]></category> <category><![CDATA[trimmer]]></category> <category><![CDATA[trimming]]></category> <guid
isPermaLink="false">http://bogdan.org.ua/?p=2428</guid> <description><![CDATA[I used to work with sequencing providers who were giving me fairly clean data. It was already barcode-separated, and had no over-represented adapter sequences. The only thing I had to do was to (optionally) quality-trim the reads, and check for biological contamination. Recently, however, I have come across some real-world data, which not only had [&#8230;]]]></description> <content:encoded><![CDATA[<p>I used to work with sequencing providers who were giving me fairly clean data.<br
/> It was already barcode-separated, and had no over-represented adapter sequences.<br
/> The only thing I had to do was to (optionally) quality-trim the reads, and check for biological contamination.</p><p>Recently, however, I have come across some <em>real-world data</em>, which not only had contamination in it, but also quite a noticeable percentage of adapters.<br
/> I did a quick test of multiple tools to see if they fit my requirements:</p><ul><li>should be easy/logical to use: no arcane/convoluted command lines or config files</li><li>should detect adapters automatically, either using its own database or a provided plain FASTA file</li><li>should be reasonably fast</li><li>must leave no adapter traces behind: I prefer aggressive trimming</li></ul><p>I have tried the following tools:</p><ul><li>fastq-mcf from the ea-utils package</li><li><a
href="https://sourceforge.net/projects/skewer/">skewer</a></li><li>TrimmomaticPE</li><li>cutadapt: haven&#8217;t used it directly, but it is used by some of the compared tools</li><li>bbduk from <a
href="https://sourceforge.net/projects/bbmap/">BBMap</a></li><li><a
href="https://github.com/optimuscoprime/autoadapt">autoadapt</a></li><li><a
href="http://www.bioinformatics.babraham.ac.uk/projects/trim_galore/">TrimGalore!</a></li></ul><p><span
id="more-2428"></span></p><p>As input, I have used 2 FASTQ files, each about 8.4 gigabytes<br
/> (or 3 785 687 KBytes together in 2 bzip2-compressed files, or 129 753 452 lines / 32 438 363 reads per file).<br
/> Time was measured with bash&#8217;s built-in <code>time</code>.<br
/> <code>all_adapters.txt</code> is a plain FASTA file that I took from the FastQC distribution a long while ago,<br
/> and to which I have possibly added some more adapter sequences scavenged from the internet.</p><p><strong>fastq-mcf</strong> (ea-utils)<br
/> <code>fastq-mcf ~/bin/all_adapters.txt -o R1.clip.fastq -o R2.clip.fastq input_R1.fastq input_R2.fastq</code></p><ul><li>non-obvious way to specify 2 outputs for 2 inputs, but not complicated either</li><li>can be given a file with dozens of adapters: will auto-identify which adapters to trim</li><li>single-threaded, uses 315M RES and 380M VIRT</li><li>83.5 minutes on a loaded system</li><blockquote><p> Reads too short after clip: 137 684<br
/> Clipped &#8216;end&#8217; reads (input_R1.fastq): Count 895 775, Mean: 24.36, Sd: 17.32<br
/> Trimmed 2 072 551 reads (input_R1.fastq) by an average of 4.46 bases on quality &lt; 7<br
/> Clipped &#8216;end&#8217; reads (input_R2.fastq): Count 850 718, Mean: 25.70, Sd: 17.19<br
/> Trimmed 8 729 083 reads (input_R2.fastq) by an average of 4.44 bases on quality &lt; 7</p></blockquote></ul><p><strong>skewer</strong><br
/> <code>skewer -x ~/bin/all_adapters.txt --mode pe --threads 8 input_R1.fastq input_R2.fastq</code></p><ul><li>looks much fancier: uses colors and has a text-mode progress bar</li><li>is multi-threaded, but appears to be extremely slow &#8211; <strong>much slower than single-threaded fastq-mcf</strong> &#8211; <ins
datetime="2016-06-15T12:24:23+00:00">update</ins>: it is incredibly fast if, instead of 96 adapters, you give it just 3 or so;</li><li>can read up to 96 adapters from the file&#8230; should be fine for most purposes</li><li>uses very little RAM (~4 megabytes RES, ~450M VIRT)</li><li>really slow: real 177m52.933s, user 1212m3.644s (7 threads)</li><blockquote><p> 32 438 363 read pairs processed; of these:<br
/> 12 339 ( 0.04%) short read pairs filtered out after trimming by size control<br
/> 94 409 ( 0.29%) empty read pairs filtered out after trimming by size control<br
/> 32 331 615 (99.67%) read pairs available; of these:<br
/> 934 379 ( 2.89%) trimmed read pairs available after processing<br
/> 31 397 236 (97.11%) untrimmed read pairs available after processing</p></blockquote></ul><p><strong>TrimmomaticPE</strong><br
/> <code>TrimmomaticPE -threads 8 -trimlog trimmomatic.log input_R1.fastq.bz2 input_R2.fastq.bz2 lane1_forward_paired.fq.gz lane1_forward_unpaired.fq.gz lane1_reverse_paired.fq.gz lane1_reverse_unpaired.fq.gz ILLUMINACLIP:/usr/share/trimmomatic/TruSeq3-PE-2.fa:2:40:15</code></p><ul><li>failed to start, with an uninformative error message, when the <em>seemingly</em> optional arguments to ILLUMINACLIP were omitted</li><li>uses 1.5+GB RES, 7.8GB VIRT, and does not fully utilize all 8 threads (CPU load only at around 500%, where 100% means 1 core)</li><li>does not seem to be I/O bound, but the log file is <em>huge</em>: it contains all read identifiers<ul><li>it might be better to disable the log file (do not specify <code>-trimlog</code>) for higher I/O speed</li></ul></li><li>comes bundled with some adapters already, but:<ul><li>does not detect adapters itself: you have to know which file to choose</li><li>adapter files are structured in a way that prevents merging them into a single file: adapter names have special meaning to Trimmomatic</li></ul></li><li>real 19m39.431s, user 71m8.600s, sys 23m44.556s: much faster than either skewer or fastq-mcf</li><blockquote><p> Input Read Pairs: 32438363<br
/> Both Surviving: 31591307 (97.39%)<br
/> Forward Only Surviving: 750772 (2.31%)<br
/> Reverse Only Surviving: 8023 (0.02%)<br
/> Dropped: 88261 (0.27%)</p></blockquote></ul><p>NOT trying <strong>cutadapt</strong>:</p><ul><li>looks great based on reading the manual</li><li>only accepts adapters on the command line, and does not come with adapter files to use</li><li>is written in Python/Python3, so it could be easier to re-use from Python programs</li></ul><p><strong>BBMap</strong><br
/> <code>bbduk.sh in=input_R1.fastq.bz2 in2=input_R2.fastq.bz2 out=bbduk_clean_1.fastq out2=bbduk_clean_2.fastq ref=~/bin/all_adapters.txt</code></p><ul><li>refused to load some JNI library:<br
/><blockquote><p>Error: Could not find or load main class utilities.bbmap.jni.</p></blockquote></li><li>changed into bbmap/jni and ran <code>export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 ; make -f makefile.linux</code>, but this didn&#8217;t help</li><li>failed to run</li></ul><p><strong>autoadapt</strong> (relies on FastQC and cutadapt)<br
/> <code>autoadapt.pl --threads=8 input_R1.fastq autoadapt_clean_1.fastq input_R2.fastq autoadapt_clean_2.fastq</code></p><ul><li>first runs FastQC to a temporary file (0.5GB RES, 4.8GB VIRT)<ul><li>fastqc is started with <code>--threads 8</code>, but only 1 file is fed to fastqc&#8230;</li></ul></li><li>auto-detected adapters, from FastQC&#8217;s output:<br
/><blockquote><p> Detected the following known contaminant sequences:<br
/> Illumina Single End PCR Primer 1 (AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT)<br
/> TruSeq Adapter, Index 7 (GATCGGAAGAGCACACGTCTGAACTCCAGTCACCAGATCATCTCGTATGCCGTCTTCTGCTTG)</p></blockquote></li><li>used <strong>over 15 GB RAM! + swap!</strong></li><li>this was too much, so I killed it and restarted with 1 thread</li><li>uses cutadapt (&lt;8M RES, &lt;31M VIRT), looking for adapters anywhere in the read (and not only at the 3' end, as Trim Galore! does); here&#8217;s a sample of the generated command:
<pre><code>cutadapt --format fastq --match-read-wildcards --times 2 --error-rate 0.2
--minimum-length 18 --quality-cutoff 20 --quality-base 33
--anywhere=GATCGGAAGAGCACACGTCTGAACTCCAGTCACCAGATCATCTCGTATGCCGTCTTCTGCTTG
--anywhere=CAAGCAGAAGACGGCATACGAGATGATCTGGTGACTGGAGTTCAGACGTGTGCTCTTCCGATC
--anywhere=AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT
--anywhere=AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT
--paired-output autoadapt/autoadapt.tmp.f_zxQr95/autoadapt_R2.fastq.tmp
-o autoadapt/autoadapt.tmp.f_zxQr95/autoadapt_R1.fastq.tmp
input_R1.fastq input_R2.fastq &#038;&#038; cutadapt --format fastq --match-read-wildcards
--times 2 --error-rate 0.2 --minimum-length 18 --quality-cutoff 20 --quality-base 33
--anywhere=GATCGGAAGAGCACACGTCTGAACTCCAGTCACCAGATCATCTCGTATGCCGTCTTCTGCTTG
--anywhere=CAAGCAGAAGACGGCATACGAGATGATCTGGTGACTGGAGTTCAGACGTGTGCTCTTCCGATC
--anywhere=AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT
--anywhere=AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT
--paired-output autoadapt_R1.fastq -o autoadapt_R2.fastq
autoadapt/autoadapt.tmp.f_zxQr95/autoadapt_R2.fastq.tmp
autoadapt/autoadapt.tmp.f_zxQr95/autoadapt_R1.fastq.tmp</code></pre></li><li>uses its own directory for intermediate/temporary files, then moves them to the destination &#8211; not good&#8230;<ul><li>the problem is that the program&#8217;s partition may not have enough space for all the intermediate data</li><li>actually, cutadapt is run twice:<ul><li>first to the temporary directory</li><li>then to the final destination, using temporary/intermediate files as inputs</li></ul></li></ul></li><li>ran out of space in /home&#8230; created a copy of autoadapt on the ~/data volume</li><li><code>/usr/bin/time -f '%C: %e s, %M Kb' ~/data/autoadapt-tmp-copy/autoadapt.pl --threads=1 input_R1.fastq autoadapt_R1.fastq input_R2.fastq autoadapt_R2.fastq</code></li><li>over 1h CPU time already, and still about half-done&#8230; should try with <code>--threads=2</code> or 4, maybe RAM use will be somewhat better?</li><li>total time 9979.87 seconds (2.8 hours), max RAM 235 480 Kb</li><li>trying in 4 threads: again 15+ GB RAM and 7+ GB swap, killed at this point;<ul><li>the problem seems to be somewhere in the read splitting code &#8211; apparently, it keeps reads in RAM (???) while splitting&#8230;</li><li>looking at the split files: they are all partial, so autoadapt.pl somehow attempts to parallel-split into all thread segments at once</li></ul></li><li>trying to edit the <code>splitFile()</code> function to use the GNU split command; hopefully, <code>mergeFile()</code> does not use gigabytes of RAM&#8230;<ul><li>for testing: hard-code tmp dir name; skip actual fastqc</li><li>this now works great!
let&#8217;s wait for merging&#8230;</li><li><code>mergeFile()</code> still eats ~2.5 GB of RES</li></ul></li><li>because of all the splitting, temporary directory size easily jumps to about 3x the original file size, or ~48 GB for ~16 GB of input files</li><li>3007.68 s (50 minutes &#8211; this does not include the initial FastQC run), 2 523 952 Kb (this is mostly the file merging operation)</li><li>it does not show any stats at the end</li></ul><p><strong>Trim Galore!</strong><br
/> <code>trim_galore --fastqc --path-to-cutadapt /usr/bin/cutadapt3 --paired input_R1.fastq input_R2.fastq</code></p><ul><li>the trim_galore perl wrapper itself consumes just a few megabytes of RAM</li><li>uses cutadapt for actual work</li><li>auto-detects adapters, although somehow the Illumina adapter found is only a substring of what was found by autoadapt/FastQC&#8230;<br
/><blockquote><p> Found perfect matches for the following adapter sequences:<br
/> Adapter type    Count   Sequence        Sequences analysed      Percentage<br
/> Illumina        17429   AGATCGGAAGAGC   1000000 1.74<br
/> Nextera 0       CTGTCTCTTATA    1000000 0.00<br
/> smallRNA        0       TGGAATTCTCGG    1000000 0.00<br
/> Using Illumina adapter for trimming (count: 17429). Second best hit was Nextera (count: 0)</p></blockquote></li><li>can run FastQC itself on the processed data, if so instructed by a command-line option</li><li>trims and summarizes each file separately</li><blockquote><p> Total reads processed:              32,438,363<br
/> Reads with adapters:                 6,878,225 (21.2%)<br
/> Reads written (passing filters):    32,438,363 (100.0%)<br
/> Total basepairs processed: 3,276,274,663 bp<br
/> Quality-trimmed:              11,132,367 bp (0.3%)<br
/> Total written (filtered):  3,226,980,229 bp (98.5%)</p></blockquote><li>cutadapt processes about 4 million reads/minute on my work PC i7</li><blockquote><p> Total reads processed:              32,438,363<br
/> Reads with adapters:                 6,030,241 (18.6%)<br
/> Reads written (passing filters):    32,438,363 (100.0%)<br
/> Total basepairs processed: 3,276,274,663 bp<br
/> Quality-trimmed:              40,530,133 bp (1.2%)<br
/> Total written (filtered):  3,199,297,597 bp (97.7%)</p></blockquote><li>length is checked after cutadapt:<br
/><blockquote><p>Number of sequence pairs removed because at least one read was shorter than the length cutoff (20 bp): 145312 (0.45%)</p></blockquote></li><li>1955.49 s (32.6 minutes), 228 592 Kb (this is likely FastQC&#8217;s top RAM use)</li></ul><p><strong>How do I evaluate the quality of trimming?</strong><br
/> Notably, all trimmers removed the &#8220;Adapters detected&#8221; section from FastQC&#8217;s output.<br
/> For now, I&#8217;m simply choosing the smallest pair of processed read files<br
/> (under the assumption that the smallest is the most aggressively trimmed).</p><p>File sizes after trimming, R1+R2<br
/> 16&#8217;750&#8217;631 trimmomatic<br
/> 16&#8217;770&#8217;603 autoadapt, threads=1<br
/> 16&#8217;771&#8217;639 autoadapt, threads=8 // after swapping Perl splitter function for GNU split<br
/> 16&#8217;924&#8217;934 trimGalore<br
/> 16&#8217;963&#8217;937 fastq-mcf<br
/> 17&#8217;057&#8217;065 skewer</p><p>Looking at the FastQC plots, the major differences are in the read length distribution (which depends on how much of the sequence tail/head was trimmed),<br
/> per-tile quality (Trimmomatic and skewer do not perform any kind of quality trimming by default, the others do), and k-mer content.<br
/> For k-mer content, Trimmomatic, Trim Galore!, and skewer look the most natural: there is a background of random-looking lesser spikes (up to 2-4),<br
/> and one or two bigger spikes (up to 12). For the other tools (autoadapt, fastq-mcf) k-mer content looks like a flat line (but likely also 2-4)<br
/> with several huge spikes (up to 35-40). In fact, only autoadapt, Trim Galore!, and skewer got a &#8220;warning&#8221; on k-mer content &#8211; all others got an &#8220;error&#8221;.</p><p>Overall, Trimmomatic and Trim Galore! appear to be the two best adapter trimmers, both by aggressiveness+FastQC reports and by speed.<br
/> However, Trim Galore! detected a significantly shorter adapter, and Trimmomatic produced a smaller, more aggressively trimmed file.<br
/> On the downside, Trimmomatic does not auto-detect adapters! This can be alleviated by first running FastQC on the input files,<br
/> then checking /usr/share/trimmomatic/ for matching adapter files &#8211; those which contain both adapters detected by FastQC.</p><p>Will use Trimmomatic for now.</p><p><ins
datetime="2016-06-10T16:47:12+00:00">Important update</ins>:</p><ul><li>It is possible to (quite easily) construct a file with all the adapters for <code>Trimmomatic</code>, and it will happily try to trim anything from that file; <code>Trimmomatic</code> is now my <em>sledgehammer</em> &#8211; give it anything, and it will crush it.</li><li><del
datetime="2016-06-11T16:37:53+00:00">I have just used <code>cutadapt</code> directly, on a peculiar case of Nextera transposon contamination throughout the length of reads. The advantage of <code>cutadapt</code> is that you can specify how many times to trim the adapter &#8211; by default it is just 1, but I&#8217;ve set it to 20 and got rid of all Nextera leftovers. <code>cutadapt</code> is now my <em>scalpel</em> &#8211; I use it in pathological cases, when I know what (and how much of it) to cut out.</del></li><li>Specifically for Nextera, I&#8217;m now using <a
href="https://github.com/sequencing/NxTrim">NxTrim</a> &#8211; a tool from Illumina, which examines the reads and splits them into several categories: proper MP, PE, single-end/overlapping reads, and <em>unknown</em>. After NxTrim, individual reads should still have other sequencing adapters clipped.</li></ul>]]></content:encoded> <wfw:commentRss>https://bogdan.org.ua/2016/06/01/practical-comparison-of-ngs-adapter-trimming-tools.html/feed</wfw:commentRss> <slash:comments>6</slash:comments> </item> <item><title>Nobody wants higher-quality, complete bacterial genomes</title><link>https://bogdan.org.ua/2016/05/24/nobody-wants-higher-quality-complete-bacterial-genomes.html</link> <comments>https://bogdan.org.ua/2016/05/24/nobody-wants-higher-quality-complete-bacterial-genomes.html#comments</comments> <pubDate>Tue, 24 May 2016 15:18:07 +0000</pubDate> <dc:creator><![CDATA[Bogdan]]></dc:creator> <category><![CDATA[Bioinformatics]]></category> <category><![CDATA[Rant]]></category> <category><![CDATA[assembly]]></category> <category><![CDATA[bacteria]]></category> <category><![CDATA[basic income]]></category> <category><![CDATA[genome]]></category> <category><![CDATA[rant]]></category> <category><![CDATA[sequencing]]></category> <guid
isPermaLink="false">http://bogdan.org.ua/?p=2424</guid> <description><![CDATA[This is a bit of a rant. Disclaimer The story, all names, characters, genomes and incidents portrayed in this blog post are fictitious. No identification with actual persons (living, dead or undead), places, companies, and processes is intended or should be inferred. No animals were harmed in the making of this blog post. Let&#8217;s try answering [&#8230;]]]></description> <content:encoded><![CDATA[<p>This is a bit of a rant.</p><p><strong>Disclaimer</strong></p><blockquote><p>The story, all names, characters, genomes and incidents portrayed in this blog post are fictitious.<br
/> No identification with actual persons (living, dead or undead), places, companies, and processes is intended or should be inferred.<br
/> No animals were harmed in the making of this blog post.</p></blockquote><p>Let&#8217;s try answering a question:</p><blockquote><p>why are there so many incomplete/draft bacterial genomes, and far fewer complete genomes?</p></blockquote><p><span
id="more-2424"></span></p><p>The answer is simple: insufficient value/cost ratio.<br
/> This can also be summarized as the <em>good enough</em> principle: if something is <em>good enough</em>, it does not get improved.</p><p><strong>Sample scenario 1</strong>.<br
/> Players: Principal Investigator (<strong>PI</strong>), Bacterial Genome (<strong>BG</strong>), Biologist (<strong>B</strong>), Sequencing Company (<strong>SC</strong>), (optional) Bioinformatician (<strong>oBI</strong>), Genomes Database (<strong>GD</strong>).</p><p><strong>B</strong> is interested in working with <strong>BG</strong>, and gets <strong>PI</strong>&#8217;s approval to sequence it.<br
/> Biomaterial is sent to <strong>SC</strong>, which sequences and even assembles the <strong>BG</strong>.<br
/> <strong>BG</strong> looks great overall and comes in just a handful of fragments.<br
/> <strong>oBI</strong> is (optionally) involved, to annotate and describe the <strong>BG</strong>.<br
/> <strong>B</strong> works happily with the <strong>BG</strong>, describing and characterizing all the interesting biosynthetic features it contains.<br
/> An article is prepared, and <strong>oBI</strong> is (optionally) involved again, to prepare and submit the <strong>BG</strong> to the <strong>GD</strong>.<br
/> While preparing the <strong>BG</strong>, <strong>oBI</strong> has to answer the question of whether this <strong>BG</strong> contains any plasmids.<br
/> Upon closer examination, <strong>oBI</strong> finds that one of the fragments is actually the complete chromosome, and all others are just unplaced fragments of it.<br
/> <strong>oBI</strong> knows that this genome could probably be merged into a single draft scaffold<br
/> using bioinformatics tools and manual examination in maybe a few days (or a week&#8230; or two? <img
src="https://bogdan.org.ua/wp-includes/images/smilies/icon_smile.gif" alt=":)" class="wp-smiley" /> ).<br
/> <strong>oBI</strong> also knows that with a little bit of <strong>B</strong>&#8217;s help (a few primer walking experiments) it should be possible to have the complete <strong>BG</strong> within a month or two.<br
/> However, <strong>BG</strong> stays a draft, and is not going to be complete any time soon.</p><p>Why?</p><p>Let&#8217;s look at the motivations of all the players, and see if any of the players <em>wants</em> the complete <strong>BG</strong>:</p><ul><li><strong>PI</strong> wants publications; spending extra time/effort to make <strong>BG</strong> complete does not present any obvious benefits;</li><li><strong>BG</strong> wants to be left alone;</li><li><strong>B</strong> wants to publish exciting new findings; they are already supported by the draft <strong>BG</strong>, so there is clearly no need for a complete <strong>BG</strong>;</li><li><strong>SC</strong> was happy to get payment on time; <strong>SC</strong> is also proud to be able to provide genome assembly as an extra service with its (primary) sequencing offers;</li><li><strong>oBI</strong> has an interest in finishing the <strong>BG</strong>: it will then be complete; however, there are 5 other BGs awaiting processing, and the backlog of semi-written manuscripts only keeps growing&#8230; finishing this specific <strong>BG</strong> will not result in a perceived benefit to <strong>oBI</strong>;</li><li><strong>GD</strong> stores genomes; it doesn&#8217;t care much if the genome submitted could have been better.</li></ul><p><em>Surprise</em>!<br
/> Looks like <strong>none of the players sees benefits in actually finishing the BG</strong>,<br
/> simply because the effort spent (or time waited) does not bring any perceived benefit to any of the players.</p><p><strong>Sample scenario 2</strong>.<br
/> Players: Bacterial Genome (<strong>BG</strong>), Biologist (<strong>B</strong>), Sequencing Company (<strong>SC</strong>), non-optional Bioinformatician (<strong>noBI</strong>), Genomes Database (<strong>GD</strong>).</p><p>This time, <strong>B</strong> (who is interested in <em>quickly</em> publishing a short genome announcement) asks for <strong>noBI</strong>&#8217;s help from the moment the <strong>BG</strong> is provided by the <strong>SC</strong>.<br
/> <strong>noBI</strong> has a cursory look at the <strong>BG</strong>, and although there is a huge discrepancy between thousands of contigs on the one hand and insanely high coverage on the other,<br
/> the <strong>BG</strong> otherwise appears <em>good enough</em> for further work, especially after scaffolding; after all, this is <em>just</em> a genome announcement, not a full-blown <em>article</em>!<br
/> There is also some weirdness about the coverage distribution of the <strong>BG</strong>, but <strong>noBI</strong> carelessly ignores that.<br
/> The <strong>BG</strong> is worked on: annotated, examined, described, prepared for submission to the <strong>GD</strong>.<br
/> Meanwhile, the announcement article is also nearly complete.<br
/> The genome is submitted, and <strong>GD</strong>&#8217;s response comes back: some scaffolds contain <em>orangutan</em> and <em>human</em> DNA, and some scaffolds contain known <em>adapter sequences</em> in the middle&#8230;<br
/> &#8220;<em>Oh crap</em>&#8221;, thinks <strong>noBI</strong>, &#8220;<em>I should have checked the raw reads for adapters and contamination, in spite of having the <strong>BG</strong> assembly already</em>&#8230;&#8221;<br
/> The <strong>GD</strong> also kindly offers an easy way out: just remove the obviously-orangutan scaffolds, and remove/mask/discard adapter sequences.<br
/> This is the <strong>easy way</strong>, leading to a <em>quicker</em> genome announcement, and a slight bump to the personal publication records of both <strong>B</strong> and <strong>noBI</strong>.</p><p>The <strong>right way</strong> is, of course, to clean adapters and contamination out of the raw reads, re-assemble, re-scaffold, re-annotate, re-describe the BG,<br
/> then prepare again for submission. This can delay the <em>quick</em> genome announcement by about a week,<br
/> but will very likely result in a more contiguous and more correct BG &#8211; although still not complete.</p><p>As we have learned from Scenario 1, the perceived benefits of going the <em>right</em> way (as opposed to the <em>easy</em> way) are nearly non-existent&#8230;</p><p>There was a genome I finalized manually a few years ago.<br
/> I had some good quality data, obtained an initial assembly of 300-something contigs,<br
/> then scaffolded and manually finalized it to about 10 scaffolds.<br
/> There was simply not enough evidence (data) to keep merging scaffolds, so I had to stop.</p><p>Nowadays, as <em>bacterial genome sequencing prices are akin to weekend supermarket shopping expenses</em>,<br
/> nobody is going the extra mile to produce a better quality, more contiguous, or even a complete genome.<br
/> And this feels sad&#8230;</p><p>On the other hand, consumer markets have functioned like that for decades.<br
/> An old water heater with a failed heating element is not repaired: it is replaced by a new water heater,<br
/> because the cost of the human time needed to repair the old one is higher than the price of a new one.</p><p>Funnily, universal basic income might change that: without the need to spend 40+ hours a week at work<br
/> (and thus being unable to repair that water heater on one&#8217;s own),<br
/> one might just order that heating element and fix it &#8211; instead of buying a new one.</p><p>Would universal basic income have the same effect on draft and incomplete bacterial genomes? I have no idea.</p><p><a
class="a2a_button_citeulike" href="https://www.addtoany.com/add_to/citeulike?linkurl=https%3A%2F%2Fbogdan.org.ua%2F2016%2F05%2F24%2Fnobody-wants-higher-quality-complete-bacterial-genomes.html&amp;linkname=Nobody%20wants%20higher-quality%2C%20complete%20bacterial%20genomes" title="CiteULike" rel="nofollow noopener" target="_blank"></a><a
class="a2a_button_pocket" href="https://www.addtoany.com/add_to/pocket?linkurl=https%3A%2F%2Fbogdan.org.ua%2F2016%2F05%2F24%2Fnobody-wants-higher-quality-complete-bacterial-genomes.html&amp;linkname=Nobody%20wants%20higher-quality%2C%20complete%20bacterial%20genomes" title="Pocket" rel="nofollow noopener" target="_blank"></a><a
class="a2a_button_kindle_it" href="https://www.addtoany.com/add_to/kindle_it?linkurl=https%3A%2F%2Fbogdan.org.ua%2F2016%2F05%2F24%2Fnobody-wants-higher-quality-complete-bacterial-genomes.html&amp;linkname=Nobody%20wants%20higher-quality%2C%20complete%20bacterial%20genomes" title="Kindle It" rel="nofollow noopener" target="_blank"></a><a
class="a2a_button_evernote" href="https://www.addtoany.com/add_to/evernote?linkurl=https%3A%2F%2Fbogdan.org.ua%2F2016%2F05%2F24%2Fnobody-wants-higher-quality-complete-bacterial-genomes.html&amp;linkname=Nobody%20wants%20higher-quality%2C%20complete%20bacterial%20genomes" title="Evernote" rel="nofollow noopener" target="_blank"></a><a
class="a2a_button_pinterest" href="https://www.addtoany.com/add_to/pinterest?linkurl=https%3A%2F%2Fbogdan.org.ua%2F2016%2F05%2F24%2Fnobody-wants-higher-quality-complete-bacterial-genomes.html&amp;linkname=Nobody%20wants%20higher-quality%2C%20complete%20bacterial%20genomes" title="Pinterest" rel="nofollow noopener" target="_blank"></a><a
class="a2a_dd addtoany_share_save addtoany_share" href="https://www.addtoany.com/share#url=https%3A%2F%2Fbogdan.org.ua%2F2016%2F05%2F24%2Fnobody-wants-higher-quality-complete-bacterial-genomes.html&#038;title=Nobody%20wants%20higher-quality%2C%20complete%20bacterial%20genomes" data-a2a-url="https://bogdan.org.ua/2016/05/24/nobody-wants-higher-quality-complete-bacterial-genomes.html" data-a2a-title="Nobody wants higher-quality, complete bacterial genomes"><img
src="https://static.addtoany.com/buttons/share_save_120_16.png" alt="Share"></a></p>]]></content:encoded> <wfw:commentRss>https://bogdan.org.ua/2016/05/24/nobody-wants-higher-quality-complete-bacterial-genomes.html/feed</wfw:commentRss> <slash:comments>2</slash:comments> </item> <item><title>Streptomyces morphogenesis regulation: overview presentation</title><link>https://bogdan.org.ua/2016/05/13/streptomyces-morphogenesis-regulation-overview-presentation.html</link> <comments>https://bogdan.org.ua/2016/05/13/streptomyces-morphogenesis-regulation-overview-presentation.html#comments</comments> <pubDate>Fri, 13 May 2016 09:18:44 +0000</pubDate> <dc:creator><![CDATA[Bogdan]]></dc:creator> <category><![CDATA[Science]]></category> <category><![CDATA[Actinobacteria]]></category> <category><![CDATA[AdpA]]></category> <category><![CDATA[BldA]]></category> <category><![CDATA[BldD]]></category> <category><![CDATA[BldH]]></category> <category><![CDATA[BldM]]></category> <category><![CDATA[hyphae]]></category> <category><![CDATA[morphogenesis]]></category> <category><![CDATA[mycelium]]></category> <category><![CDATA[regulation]]></category> <category><![CDATA[sporulation]]></category> <category><![CDATA[Streptomyces]]></category> <guid
isPermaLink="false">http://bogdan.org.ua/?p=2408</guid> <description><![CDATA[Note: this post is just a placeholder/draft, it will be extended later. But it can already be useful Streptomyces Morphogenesis Streptomyces Morphogenesis notes Morphogenesis regulation poster]]></description> <content:encoded><![CDATA[<p>Note: this post is just a placeholder/draft, it will be extended later. But it can already be useful <img
src="https://bogdan.org.ua/wp-includes/images/smilies/icon_wink.gif" alt=";)" class="wp-smiley" /><br
/> <span
id="more-2408"></span><br
/> <a
href="http://bogdan.org.ua/wp-content/uploads/2016/05/regulation-poster.png"><img
src="http://bogdan.org.ua/wp-content/uploads/2016/05/regulation-poster-500x353.png" alt="morphogenesis regulation" width="500" height="353" class="alignleft size-medium wp-image-2413" /></a><br
/> <a
href="http://bogdan.org.ua/wp-content/uploads/2016/05/Streptomyces-Morphogenesis.pdf">Streptomyces Morphogenesis</a><br
/> <a
href="http://bogdan.org.ua/wp-content/uploads/2016/05/Streptomyces-Morphogenesis-notes.pdf">Streptomyces Morphogenesis notes</a><br
/> <a
href="http://bogdan.org.ua/wp-content/uploads/2016/05/regulation-poster.pdf">Morphogenesis regulation poster</a></p>]]></content:encoded> <wfw:commentRss>https://bogdan.org.ua/2016/05/13/streptomyces-morphogenesis-regulation-overview-presentation.html/feed</wfw:commentRss> <slash:comments>0</slash:comments> </item> <item><title>Preprint servers and open journals</title><link>https://bogdan.org.ua/2016/02/28/preprint-servers-and-open-journals.html</link> <comments>https://bogdan.org.ua/2016/02/28/preprint-servers-and-open-journals.html#comments</comments> <pubDate>Sun, 28 Feb 2016 13:13:53 +0000</pubDate> <dc:creator><![CDATA[Bogdan]]></dc:creator> <category><![CDATA[Science]]></category> <category><![CDATA[journal]]></category> <category><![CDATA[open]]></category> <category><![CDATA[open access]]></category> <category><![CDATA[peer review]]></category> <category><![CDATA[preprint]]></category> <category><![CDATA[public]]></category> <guid
isPermaLink="false">http://bogdan.org.ua/?p=2365</guid> <description><![CDATA[Let&#8217;s start with some definitions. With Open Journals I&#8217;m referring to open/public peer-review journals. With preprint servers, I&#8217;m referring to services which allow you to publish your manuscript with a DOI, for pre-submission interest and feedback collection. I am aware of the following public peer-review journals: F1000 Research: your submission is made public without any [&#8230;]]]></description> <content:encoded><![CDATA[<p>Let&#8217;s start with some definitions.</p><p>With <em>Open Journals</em> I&#8217;m referring to open/public peer-review journals.<br
/> With <em>preprint servers</em>, I&#8217;m referring to services which allow you to publish your manuscript with a DOI, for pre-submission interest and feedback collection.</p><p>I am aware of the following public peer-review journals:</p><ul><li><a
href="http://f1000research.com/">F1000 Research</a>: your submission is made public within an average of 7 days, without any editorial pre-screening, but only gets indexed in PubMed/Scopus/Scholar after a successful public peer review. Public means that a reviewer-signed evaluation appears together with the submitted manuscript. Authors may respond to criticism and upload revisions of their submission. I believe a submission passes peer review after two positive reviews. Note that even your initial submission receives a DOI, and is thus citable (as are all subsequent revisions). A brief examination of articles in some of the topics tells me that F1000 Research is a good place to publish, especially because it is a kind of <em>pre-print + journal</em> in one package. You pay per submission; there are 3 tiers by word count.</li><li><a
href="https://thewinnower.com/">The Winnower</a>: submit-review-revise, but here you pay for the DOI after your submission is reviewed. Before review your submission is thus not citable (except by URL, which isn&#8217;t tracked as easily as DOI references). I haven&#8217;t formed an opinion on how attractive The Winnower is for submitting, but I did find this <a
href="https://thewinnower.com/discussions/26-what-happened-when-we-tried-to-publish-a-real-paper-investigating-time-travel" class="broken_link" rel="nofollow">quite interesting story</a> for you to enjoy <img
src="https://bogdan.org.ua/wp-includes/images/smilies/icon_smile.gif" alt=":)" class="wp-smiley" /></li><li><a
href="https://www.scienceopen.com/">Science Open</a>: this project encompasses 5 mostly medical journals. It lists over 11 million articles on the front page, but those are sourced from other publications; Science Open itself seems to have several hundred publications across all 5 journals. Submissions get a DOI and can then undergo public review. It is not clear to me in which direction Science Open will be moving &#8211; towards becoming an excellent aggregator of research papers, or towards becoming a publishing platform, or &#8211; like now &#8211; towards both.</li></ul><p>I&#8217;m also aware of the following preprint servers:<br
/> <span
id="more-2365"></span></p><ul><li><a
href="http://arxiv.org/">arXiv</a>: probably the oldest one, suitable for quantitative research. Submissions are pre-screened to meet certain minimal requirements.</li><li><a
href="http://biorxiv.org/">bioRxiv</a> (CSHL): preprint server for biology. Submissions are pre-screened to meet certain minimal requirements.</li><li><a
href="https://figshare.com/">figShare</a>: online repository for digital artifacts, including figures, datasets, tables, PDF files <em>et cetera</em>. Uploaded items get a DOI. I used to think that you have to pay for a DOI, but right now this feature is listed under <em>free account features</em>.</li><li><a
href="https://peerj.com/preprints/">PeerJ preprints</a> (and <a
href="https://peerj.com/">PeerJ journal</a>): preprints are free, and you can submit a PeerJ preprint to PeerJ with a single button click. PeerJ has two journals, PeerJ itself (Life, Bio, Health) and PeerJ Computer Science. As is common, the manuscript submitter pays for an open-access article. PeerJ has several different payment schemes, including per-article, author membership, and institutional subscription. PeerJ has approximately 1800 articles published.</li><li><a
href="http://zenodo.org/">Zenodo</a> is a DOI-providing repository similar to FigShare, powered by Horizon-2020 EU program funding and CERN&#8217;s Data Centre.</li></ul><p>So far I only had experience with BioRxiv, and it was great. I&#8217;ll consider F1000 Research or PeerJ for some of my next manuscripts &#8211; both models are quite attractive, especially F1000&#8242;s open review.</p>]]></content:encoded> <wfw:commentRss>https://bogdan.org.ua/2016/02/28/preprint-servers-and-open-journals.html/feed</wfw:commentRss> <slash:comments>0</slash:comments> </item> <item><title>How to use mkfifo named pipes with prinseq-lite.pl</title><link>https://bogdan.org.ua/2016/02/24/how-to-use-mkfifo-named-pipes-with-prinseq-lite-pl.html</link> <comments>https://bogdan.org.ua/2016/02/24/how-to-use-mkfifo-named-pipes-with-prinseq-lite-pl.html#comments</comments> <pubDate>Wed, 24 Feb 2016 11:39:37 +0000</pubDate> <dc:creator><![CDATA[Bogdan]]></dc:creator> <category><![CDATA[*nix]]></category> <category><![CDATA[Bioinformatics]]></category> <category><![CDATA[Software]]></category> <category><![CDATA[FIFO]]></category> <category><![CDATA[gist]]></category> <category><![CDATA[mkfifo]]></category> <category><![CDATA[patch]]></category> <category><![CDATA[prinseq-lite.pl]]></category> <guid
isPermaLink="false">http://bogdan.org.ua/?p=2360</guid> <description><![CDATA[prinseq-lite.pl is a utility written in Perl for preprocessing NGS reads, also in FASTQ format. It can read sequences both from files and from stdin (if you only have 1 sequence). I wanted to use it with compressed (gzipped/bzipped2) FASTQ input files. As I do not need to store decompressed input files, the most efficient [&#8230;]]]></description> <content:encoded><![CDATA[<p><img
src="http://bogdan.org.ua/wp-content/uploads/2016/02/prinseq_logo_1.png" alt="prinseq_logo_1" width="204" height="32" class="alignleft size-full wp-image-2361" /><a
href="http://prinseq.sourceforge.net/">prinseq-lite.pl</a> is a utility written in Perl for preprocessing NGS reads, including those in <a
href="https://en.wikipedia.org/wiki/FASTQ_format">FASTQ format</a>.<br
/> It can read sequences both from files and from stdin (if you only have 1 sequence file).</p><p>I wanted to use it with compressed (gzipped/bzip2-compressed) FASTQ input files.<br
/> As I do not need to store decompressed input files, the most efficient solution is to use pipes.<br
/> This works well for a single file, but not for 2 files (paired-end reads).</p><p>For 2 files, <a
href="https://en.wikipedia.org/wiki/Named_pipe">named pipes</a> (also known as <a
href="https://en.wikipedia.org/wiki/FIFO_(computing_and_electronics)">FIFO</a>s) can be used.<br
/> You can create a named pipe in Linux with the <code>mkfifo</code> command, for example <code>mkfifo R1_decompressed.fastq</code>.<br
/> To use it, start decompressing something into it (either in a different terminal, or in the background), for example <code>zcat R1.fastq.gz > R1_decompressed.fastq &#038;</code>;<br
/> we can call this a writing/generating process, because it writes into a pipe.<br
/> (If you are writing software to use named pipes, any processes writing into them should be started in a separate thread (or process), as they will block until all the data is consumed.)<br
/> Now if you give <code>R1_decompressed.fastq</code> as a file argument to some other program, it will see the decompressed content (e.g. <code>wc -l R1_decompressed.fastq</code> will tell you the number of lines in the decompressed file); we can call a program reading from the named pipe a reading/consuming process.<br
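/> The writer/reader pair described above can be sketched as a self-contained shell session. This is only an illustration: the two-read FASTQ content, the temporary directory, and the variable names are made up, and <code>gzip -dc</code> is used as a portable stand-in for <code>zcat</code>.

```shell
#!/bin/sh
# Illustration only: file contents and variable names are made up.
set -eu

workdir=$(mktemp -d)
cd "$workdir"

# A tiny gzipped FASTQ file to play with: 2 reads = 8 lines.
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n' | gzip -c > R1.fastq.gz

# The named pipe looks like a file, but never stores data on disk.
mkfifo R1_decompressed.fastq

# Writing/generating process: runs in the background, because it blocks
# until some reader opens the pipe and consumes the data.
gzip -dc R1.fastq.gz > R1_decompressed.fastq &
writer_pid=$!

# Reading/consuming process: sees the decompressed content as a file argument.
line_count=$(wc -l R1_decompressed.fastq | awk '{print $1}')

# Once the reader has consumed everything, the writer exits on its own.
wait "$writer_pid"
writer_status=$?

echo "lines: $line_count, writer exit status: $writer_status"

cd /
rm -r "$workdir"
```

With the two made-up reads above, the script should report 8 lines and a writer exit status of 0, and the decompressed data is never written to disk.<br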
/> As soon as the consuming process has consumed (read) all of the data, the writing/generating process will finally exit.</p><p>This, however, does not work with prinseq-lite.pl (version 0.20.4 or earlier), which fails with a <strong>broken pipe</strong> error.<span
id="more-2360"></span></p><p>Named pipes are very similar to regular files, with two <strong>major differences</strong>:</p><ul><li>named pipes are <strong>not seekable</strong>: you cannot move the file pointer, so data can only be read sequentially;</li><li>you <strong>cannot</strong> arbitrarily close/<strong>re-open</strong> a named pipe from the consuming end: closing a pipe on the consuming end also closes it for the writing/generating process.</li></ul><p>The reason why prinseq-lite.pl does not work with named pipes is that it performs file format checking first &#8211; by opening the file, reading the first 3 lines, and closing it.<br
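/> The open, read 3 lines, close pattern can be imitated with <code>head</code>. In this sketch <code>yes</code> stands in for a long-running decompression and the pipe name is made up; it only demonstrates the failure mode and is not what prinseq-lite.pl literally executes.

```shell
#!/bin/sh
# Illustration only: "yes" plays the role of the decompressing writer.
set -u

workdir=$(mktemp -d)
fifo="$workdir/demo.fifo"
mkfifo "$fifo"

# Writer: an endless stream of lines written into the named pipe.
yes ACGT > "$fifo" &
writer_pid=$!

# Reader: open the pipe, read the first 3 lines, then close it,
# just like a format check that only peeks at the start of the file.
first_lines=$(head -n 3 "$fifo")
reader_line_count=$(printf '%s\n' "$first_lines" | wc -l | awk '{print $1}')

# The writer has lost its only reader, so its next write fails
# and it terminates with a non-zero status.
writer_status=0
wait "$writer_pid" || writer_status=$?

echo "reader saw $reader_line_count lines"
echo "writer exit status: $writer_status"

rm -r "$workdir"
```

The reader gets its 3 lines, while the writer ends with a non-zero status (typically 141, i.e. 128 + SIGPIPE, in common shells); whatever it had not yet delivered is lost to any later reader.<br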
/> Closing a named pipe causes <strong>broken pipe</strong> for the writing process, and when prinseq-lite.pl attempts to open the pipe again &#8211; it succeeds, but there is no data there anymore, so it just sits and waits for data <img
src="https://bogdan.org.ua/wp-includes/images/smilies/icon_smile.gif" alt=":)" class="wp-smiley" /></p><p>I&#8217;m ok with a quick and dirty solution, so here it is: <a
href="https://gist.github.com/spock/7d4e46e1158e2e4a46d4">prinseq-lite.pl patch to enable mkfifo named pipes as input files</a> (also local <a
href="http://bogdan.org.ua/wp-content/uploads/2016/02/prinseq-lite.pl_.patch_.txt">prinseq-lite.pl.patch</a>).<br
/> <strong>WARNING</strong>: this patch simply disables file format checking!</p>]]></content:encoded> <wfw:commentRss>https://bogdan.org.ua/2016/02/24/how-to-use-mkfifo-named-pipes-with-prinseq-lite-pl.html/feed</wfw:commentRss> <slash:comments>0</slash:comments> </item> <item><title>Good hands-on explanation of differences between Spearman&#8217;s and Pearson&#8217;s correlation</title><link>https://bogdan.org.ua/2014/04/22/good-hands-on-explanation-of-differences-between-spearmans-and-pearsons-correlation.html</link> <comments>https://bogdan.org.ua/2014/04/22/good-hands-on-explanation-of-differences-between-spearmans-and-pearsons-correlation.html#comments</comments> <pubDate>Tue, 22 Apr 2014 10:42:10 +0000</pubDate> <dc:creator><![CDATA[Bogdan]]></dc:creator> <category><![CDATA[Links]]></category> <category><![CDATA[Science]]></category> <category><![CDATA[correlation]]></category> <category><![CDATA[pearson]]></category> <category><![CDATA[spearman]]></category> <guid
isPermaLink="false">http://bogdan.org.ua/?p=2140</guid> <description><![CDATA[Linear correlation vs. Rank order correlation: drag 11 data points around the plot and observe how both Spearman&#8217;s and Pearson&#8217;s correlation measures change. But first follow the Next button at the bottom-right for a guided tour of data manipulations.]]></description> <content:encoded><![CDATA[<p><a
href="https://www.economicsnetwork.ac.uk/statistics/pearson_spearman.htm">Linear correlation vs. Rank order correlation</a>: drag 11 data points around the plot and observe how both Spearman&#8217;s and Pearson&#8217;s correlation measures change. But first follow the <strong>Next</strong> button at the bottom-right for a <em>guided tour</em> of data manipulations.</p>]]></content:encoded> <wfw:commentRss>https://bogdan.org.ua/2014/04/22/good-hands-on-explanation-of-differences-between-spearmans-and-pearsons-correlation.html/feed</wfw:commentRss> <slash:comments>0</slash:comments> </item> <item><title>How to cite PHYLIP</title><link>https://bogdan.org.ua/2014/01/10/how-to-cite-phylip.html</link> <comments>https://bogdan.org.ua/2014/01/10/how-to-cite-phylip.html#comments</comments> <pubDate>Fri, 10 Jan 2014 15:29:07 +0000</pubDate> <dc:creator><![CDATA[Bogdan]]></dc:creator> <category><![CDATA[Links]]></category> <category><![CDATA[Science]]></category> <category><![CDATA[Software]]></category> <category><![CDATA[citation]]></category> <category><![CDATA[PHYLIP]]></category> <guid
isPermaLink="false">http://bogdan.org.ua/?p=2083</guid> <description><![CDATA[Official PHYLIP FAQ does suggest a few ways to cite the software, but I believe that the best citation is mentioned in the wikipedia PHYLIP article: pubmed reference for PMID 7288891. This PubMed citations seems the best, because it does mention the software tool implementing the maximum likelihood approach, it is likely the earliest mention [&#8230;]]]></description> <content:encoded><![CDATA[<p><a
href="http://evolution.genetics.washington.edu/phylip/faq.html#citation">Official PHYLIP FAQ</a> does suggest a few ways to cite the software, but I believe that the best citation is mentioned in the <a
href="http://en.wikipedia.org/wiki/PHYLIP">Wikipedia PHYLIP article</a>: <a
href="http://www.ncbi.nlm.nih.gov/pubmed/7288891">PubMed reference for PMID 7288891</a>. This PubMed citation seems the best, because</p><ul><li>it does mention the software tool implementing the maximum likelihood approach,</li><li>it is likely the earliest mention of the PHYLIP software (which has been distributed since around 1980),</li><li>it refers to a journal indexed by PubMed, and</li><li>according to Google Scholar, it has already been cited over 6660 times <img
src="https://bogdan.org.ua/wp-includes/images/smilies/icon_smile.gif" alt=":)" class="wp-smiley" /></li></ul>]]></content:encoded> <wfw:commentRss>https://bogdan.org.ua/2014/01/10/how-to-cite-phylip.html/feed</wfw:commentRss> <slash:comments>0</slash:comments> </item> <item><title>GUIs for R</title><link>https://bogdan.org.ua/2013/10/17/guis-for-r.html</link> <comments>https://bogdan.org.ua/2013/10/17/guis-for-r.html#comments</comments> <pubDate>Thu, 17 Oct 2013 20:59:01 +0000</pubDate> <dc:creator><![CDATA[Bogdan]]></dc:creator> <category><![CDATA[*nix]]></category> <category><![CDATA[Notepad]]></category> <category><![CDATA[Programming]]></category> <category><![CDATA[Science]]></category> <category><![CDATA[Software]]></category> <category><![CDATA[cantor]]></category> <category><![CDATA[deducer]]></category> <category><![CDATA[ipython]]></category> <category><![CDATA[notebook]]></category> <category><![CDATA[Python]]></category> <category><![CDATA[R]]></category> <category><![CDATA[rkward]]></category> <category><![CDATA[rstudio]]></category> <guid
isPermaLink="false">http://bogdan.org.ua/?p=1870</guid> <description><![CDATA[I&#8217;ve tried [briefly] Cantor (which also supports Octave and KAlgebra as backends), rkward, deducer/JGR, R Commander, and RStudio. My personal choice was RStudio: it is good-looking, intuitive, easy-to-use, while powerful. Next step would be using some R-equivalent of the excellent ipython&#8217;s Mathematica-like Notebook webinterface&#8230;]]></description> <content:encoded><![CDATA[<p>I&#8217;ve tried [briefly] Cantor (which also supports Octave and KAlgebra as backends), rkward, deducer/JGR, R Commander, and RStudio.</p><p>My personal choice was RStudio: it is good-looking, intuitive, easy-to-use, while powerful.</p><p>Next step would be using some R-equivalent of the excellent ipython&#8217;s Mathematica-like Notebook webinterface&#8230;</p>]]></content:encoded> <wfw:commentRss>https://bogdan.org.ua/2013/10/17/guis-for-r.html/feed</wfw:commentRss> <slash:comments>0</slash:comments> </item> <item><title>MultiParanoid vs. QuickParanoid: pro et contra for each</title><link>https://bogdan.org.ua/2013/07/09/multiparanoid-vs-quickparanoid-pro-et-contra-for-each.html</link> <comments>https://bogdan.org.ua/2013/07/09/multiparanoid-vs-quickparanoid-pro-et-contra-for-each.html#comments</comments> <pubDate>Tue, 09 Jul 2013 15:57:31 +0000</pubDate> <dc:creator><![CDATA[Bogdan]]></dc:creator> <category><![CDATA[*nix]]></category> <category><![CDATA[Bioinformatics]]></category> <category><![CDATA[Software]]></category> <category><![CDATA[inparanoid]]></category> <category><![CDATA[multiparanoid]]></category> <category><![CDATA[orthologs]]></category> <category><![CDATA[orthology]]></category> <category><![CDATA[orthomcl]]></category> <category><![CDATA[paralogs]]></category> <category><![CDATA[quickparanoid]]></category> <guid
isPermaLink="false">http://bogdan.org.ua/?p=1931</guid> <description><![CDATA[MultiParanoid Here we present a new proteome-scale analysis program called MultiParanoid that can automatically find orthology relationships between proteins in multiple proteomes. The software is an extension of the InParanoid program that identifies orthologs and inparalogs in pairwise proteome comparisons. MultiParanoid applies a clustering algorithm to merge multiple pairwise ortholog groups from InParanoid into multi-species [&#8230;]]]></description> <content:encoded><![CDATA[<p><a
href="http://multiparanoid.sbc.su.se/" class="broken_link" rel="nofollow">MultiParanoid</a></p><blockquote><p>Here we present a new proteome-scale analysis program called MultiParanoid that can automatically find orthology relationships between proteins in multiple proteomes. The software is an extension of the InParanoid program that identifies orthologs and inparalogs in pairwise proteome comparisons. MultiParanoid applies a clustering algorithm to merge multiple pairwise ortholog groups from InParanoid into multi-species ortholog groups.</p></blockquote><p><a
href="http://pl.postech.ac.kr/QuickParanoid/">QuickParanoid</a></p><blockquote><p>QuickParanoid is a suite of programs for automatic ortholog clustering and analysis. It takes as input a collection of files produced by InParanoid and finds ortholog clusters among multiple species. For a given dataset, QuickParanoid first preprocesses each InParanoid output file and then computes ortholog clusters. It also provides a couple of programs qa1 and qa2 for analyzing the result of ortholog clustering.</p></blockquote><p>So&#8230; both use <a
href="http://inparanoid.sbc.su.se/cgi-bin/index.cgi">InParanoid</a>&#8230; Are there any differences? Let me list those which I&#8217;ve found.</p><p><span
id="more-1931"></span></p><p><strong>MultiParanoid</strong></p><ul><li>requires <strong>all</strong> species names to be passed through the command line; I had 11, so that&#8217;s a downside (even though I got that list with 1 extra <code>ls</code> command)</li><li>is written in Perl</li><li>needs source code editing &#8211; to specify the input directory with all the InParanoid&#8217;s <code>sqltable.*</code> files, and also the output file</li><li>seemed to run fairly fast on my 11 genomes &#8211; finished in under 4 minutes</li></ul><p>Overall, MultiParanoid left a somewhat &#8220;messy&#8221; impression&#8230; But it definitely did work.</p><p><strong>QuickParanoid</strong></p><ul><li>interactive: it just asks you for a directory containing all the <code>sqltable.*</code> files, for <em>configuration file</em>, and the <em>executable prefix</em></li><li><em>configuration file</em> is just a list of all species names; this is similar to MultiParanoid, is also achieved with <code>ls</code> command (<code>ls -1 *.faa > config</code> in my case), but feels a little better than dropping those 11 filenames in the command line</li><li>written in C++</li><li>generates and compiles code! 
after collecting your input, two custom binaries are generated to actually run the analysis, which seems to have no practical utility for the end-user, but is definitely cool!</li><li>much faster than MultiParanoid &#8211; analysis itself (with the generated custom binary) took <strong>less than 5 seconds</strong>; generating that custom binary adds only a few more seconds</li><li>contains helpful <strong>qa1</strong> and <strong>qa2</strong> utilities; <strong>qa1</strong> summarizes the final clusters, and <strong>qa2</strong> compares two different results with each other (see examples below)</li><li>it had to be compiled first with <code>make qa</code>; also, <code>#include &lt;string.h&gt;</code> was missing in one of the source files&#8230;</li></ul><p>Overall, QuickParanoid left the impression of being &#8220;quick and cool&#8221;, with the minor drawback of having to add that missing <code>string.h</code> include.</p><p>With the help of QuickParanoid&#8217;s qa1, I&#8217;ve collected some stats on the clusters of orthologs in my 11 genomes.</p><p>QuickParanoid clusters:</p><blockquote><p>Number of clusters consisting of 1 species : 0<br
/> Number of clusters consisting of 2 species : 1757<br
/> Number of clusters consisting of 3 species : 975<br
/> Number of clusters consisting of 4 species : 622<br
/> Number of clusters consisting of 5 species : 463<br
/> Number of clusters consisting of 6 species : 406<br
/> Number of clusters consisting of 7 species : 325<br
/> Number of clusters consisting of 8 species : 355<br
/> Number of clusters consisting of 9 species : 448<br
/> Number of clusters consisting of 10 species : 607<br
/> Number of clusters consisting of 11 species : 2449<br
/> Total: 8407</p></blockquote><p>MultiParanoid clusters:</p><blockquote><p>Number of clusters consisting of 1 species : 0<br
/> Number of clusters consisting of 2 species : 1872<br
/> Number of clusters consisting of 3 species : 1023<br
/> Number of clusters consisting of 4 species : 637<br
/> Number of clusters consisting of 5 species : 479<br
/> Number of clusters consisting of 6 species : 418<br
/> Number of clusters consisting of 7 species : 338<br
/> Number of clusters consisting of 8 species : 358<br
/> Number of clusters consisting of 9 species : 454<br
/> Number of clusters consisting of 10 species : 605<br
/> Number of clusters consisting of 11 species : 2451<br
/> Total: 8635</p></blockquote><p>So, <strong>qa1</strong> is definitely useful. As the output format of QuickParanoid is almost the same as that of MultiParanoid, <strong>qa1</strong> also works on MultiParanoid results &#8211; one just has to add a hash (<code>#</code>) at the beginning of the very first line of the MultiParanoid results file.</p><p><strong>qa2</strong> lets you compare, e.g., MultiParanoid and QuickParanoid results. Here&#8217;s the output from the default &#8216;names-only&#8217; comparison mode:</p><blockquote><p>Checking only sequence names&#8230;<br
/> Number of clusters in multiparanoid_result.txt : 8635<br
/> Number of clusters in quickparanoid_result.txt : 8407<br
/> Number of matched clusters: 8253<br
/> Residue clusters in multiparanoid_result.txt:<br
/> 5945 4338 8374 7913 8334 8315 4564 7508 7492 6173 5922 8187 8104 4377 7003 8590 7808 5265 8064 5195 6297 8285 6849 6868 6317 8619 5983 4549 6659 4503 7500 4550 8510 7591 7776 8075 5969 6051 3857 5949 7795 6031 8005 8441 7737 7445 8507 4534 8289 6712 7294 6083 8377 8117 7344 8040 6858 8138 4559 7788 7818 7479 7100 7906 6287 8007 8552 7198 7489 7446 8522 8270 8327 6367 5278 8150 6832 8519 6332 6361 6674 8323 6156 6183 7187 8048 6328 8127 7260 7143 7794 7810 7376 8541 8397 8389 8516 5146 6311 8347 7573 7103 4391 6980 8330 8384 5918 4318 4759 4656 6061 7030 6694 8260 6180 4084 4321 7745 7875 7650 5112 7635 6976 4776 6249 7607 5208 5046 7566 4401 8487 7307 5571 4310 7322 6625 4398 5410 7401 6213 7392 4985 8369 8165 5704 8520 6244 6305 8072 8172 8143 8236 8419 5235 4746 5333 5930 4309 4349 3097 8393 5665 4265 4317 6502 7175 6386 5215 7117 4332 5182 7590 4788 5907 8396 7910 7221 7484 4807 5610 6613 4050 4410 4878 6522 4284 4245 1936 4274 5407 6963 7694 6528 8538 5800 6577 6767 5344 4069 5747 4105 7162 5047 4879 5036 6303 78 3446 5883 3891 4727 6912 4219 4724 3775 2404 7618 4171 6499 6127 3438 3844 4962 5708 4824 4657 99 160 5507 4992 6610 3683 5871 3908 4127 4725 4749 5200 5450 4138 3342 4283 4814 2210 5672 5551 6487 569 1545 2556 2950 6949 3933 4051 5084 5492 4389 1142 1303 1555 1784 2045 2490 3037 3894 4390 4668 4066 510 695 3982 1349 1922 2148 2221 4781 1065 1100 1125 1263 6489 2054 3147 951 1708 2472 3009 3458 3569 3875 4996 6470 145 1565 2196 2298 1470 2312 3868 4750 1996 3579 1029 515 716 1075 2119 4825 509 1855 1863 101 390 756 1426 4721 4912 5342 351 4734 1455 1738 1880 2367 5167 788 2750 338 3543 3745 4762 837 2662 4143 209 3175 4717 2071 2429 847 1220 1340 2046 4573 2065 2328 939 1995 97 845 1631 2076 3711 4880 1039 1214 534 1133 578 767 754 226 988 3521 1221 2689 1848 1917 3654 2906 61 435 738 1577 971 1099 926 2101 376 294 173 119<br
/> Residue clusters in quickparanoid_result.txt:<br
/> 7697 5071 8137 8292 7403 6349 6516 7093 4532 5193 3596 6608 6214 5738 5438 3606 3492 4631 4037 5155 2725 5804 5871 3930 1185 3044 4776 5064 5321 760 2653 3419 4426 4530 5281 958 3602 1158 3008 256 5551 2742 3356 3143 3626 3548 1754 1411 1718 2104 3922 3867 3087 967 1932 2261 269 3881 3692 3206 3059 1115 500 3664 3985 3180 3615 3713 748 3311 1351 1396 1409 4113 3822 1917 1816 1109 3505 1110 1826 4079 4007 201 1308 4572 471 5128 1216 3060 2697 5732 2248 2299 3590 4537 2741 1440 2563 1001 3772 206 3020 4075 4571 3147 3727 2438 1744 1365 1156 1181 1442 2379 272 3902 777 3198 350 2111 3320 2670 3415 2118 3939 326 2826 3343 3698 2066 3284 3319 2883 3487 2976 876 1576 661 942 3960 1525 2962 270 577 1266 287 1499 2643 1651 1401 2218 1991 1 1681</p></blockquote><p>Not a conclusion:</p><ul><li>thanks to the summary of <strong>qa1</strong>, I&#8217;ve decided to take MultiParanoid results &#8211; they have (in my case) larger clusters with more genes in them, which is good, and overall more clusters &#8211; which is also good</li><li>if I had 20+ genomes to compare, or if I had to re-run this type of analysis multiple times &#8211; I&#8217;d use QuickParanoid</li><li>if I had to implement yet-another-inparanoid-based orthology clustering tool, then I&#8217;d first consider the QuickParanoid&#8217;s preprocessor/code generator, which was designed in an <a
href="http://pl.postech.ac.kr/QuickParanoid/#extension">easy to extend</a> manner</li></ul><p>Initially, I had also considered OrthoMCL for multi-species ortholog clustering. However, InParanoid + Multi/QuickParanoid is much easier and quicker to set up and use, as OrthoMCL requires a database back-end for better scalability.</p><p>Then again, QuickParanoid has a test dataset with 120 species, and</p><blockquote><p>&#8230; it takes only 199.56 seconds on an Intel 2.4Ghz machine with 1 gigabyte memory to process a dataset of 120 species &#8230;</p></blockquote><p> <img
src="https://bogdan.org.ua/wp-includes/images/smilies/icon_smile.gif" alt=":)" class="wp-smiley" /></p>]]></content:encoded> <wfw:commentRss>https://bogdan.org.ua/2013/07/09/multiparanoid-vs-quickparanoid-pro-et-contra-for-each.html/feed</wfw:commentRss> <slash:comments>2</slash:comments> </item> <item><title>R functions for regression analysis cheat sheet</title><link>https://bogdan.org.ua/2012/05/29/r-functions-for-regression-analysis-cheat-sheet.html</link> <comments>https://bogdan.org.ua/2012/05/29/r-functions-for-regression-analysis-cheat-sheet.html#comments</comments> <pubDate>Tue, 29 May 2012 13:11:48 +0000</pubDate> <dc:creator><![CDATA[Bogdan]]></dc:creator> <category><![CDATA[Bioinformatics]]></category> <category><![CDATA[Links]]></category> <category><![CDATA[Misc]]></category> <category><![CDATA[R]]></category> <category><![CDATA[statistics]]></category> <guid
isPermaLink="false">http://bogdan.org.ua/?p=1838</guid> <description><![CDATA[Original PDF. My local copy.]]></description> <content:encoded><![CDATA[<p>Original <a
href="http://cran.r-project.org/doc/contrib/Ricci-refcard-regression.pdf">PDF</a>.<br
/> My local <a
href='/wp-content/uploads/2012/05/Ricci-refcard-regression.pdf'>copy</a>.</p>]]></content:encoded> <wfw:commentRss>https://bogdan.org.ua/2012/05/29/r-functions-for-regression-analysis-cheat-sheet.html/feed</wfw:commentRss> <slash:comments>0</slash:comments> </item> <item><title>Information criteria for choosing best predictive models</title><link>https://bogdan.org.ua/2012/05/29/information-criteria-for-choosing-best-predictive-models.html</link> <comments>https://bogdan.org.ua/2012/05/29/information-criteria-for-choosing-best-predictive-models.html#comments</comments> <pubDate>Tue, 29 May 2012 11:44:50 +0000</pubDate> <dc:creator><![CDATA[Bogdan]]></dc:creator> <category><![CDATA[Bioinformatics]]></category> <category><![CDATA[Machine learning]]></category> <category><![CDATA[AIC]]></category> <category><![CDATA[BIC]]></category> <category><![CDATA[cross-validation]]></category> <category><![CDATA[information criterion]]></category> <category><![CDATA[R]]></category> <category><![CDATA[statistics]]></category> <guid
isPermaLink="false">http://bogdan.org.ua/?p=1831</guid> <description><![CDATA[I usually use 10-fold (non-stratified) CV to measure the predictive power of the models: it gives consistent results and is easy to perform (at least on smaller datasets). I just came across Akaike&#8217;s Information Criterion (AIC) and the Schwarz Bayesian Information Criterion (BIC). Citing robjhyndman, Asymptotically, minimizing the AIC is equivalent to minimizing the CV value. [&#8230;]]]></description> <content:encoded><![CDATA[<p>I usually use 10-fold (non-stratified) <abbr
title="cross-validation">CV</abbr> to measure the predictive power of the models: it gives consistent results and is easy to perform (at least on smaller datasets).</p><p>I just came across Akaike&#8217;s Information Criterion (AIC) and the Schwarz Bayesian Information Criterion (BIC). Citing <a
href="http://robjhyndman.com/researchtips/crossvalidation/">robjhyndman</a>,</p><blockquote><p> Asymptotically, minimizing the AIC is equivalent to minimizing the CV value. This is true for any model (<a
href="http://www.jstor.org/stable/2984877" class="vt-p broken_link" rel="nofollow">Stone 1977</a>), not just linear models. It is this property that makes the AIC so useful in model selection when the purpose is prediction.<br
/> &#8230;<br
/> Because of the heavier penalty, the model chosen by BIC is either the same as that chosen by AIC, or one with fewer terms. Asymptotically, for linear models minimizing BIC is equivalent to leave&#8211;v&#8211;out cross-validation when v = n[1-1/(log(n)-1)] (<a
href="http://www3.stat.sinica.edu.tw/statistica/oldpdf/A7n21.pdf" class="vt-p">Shao 1997</a>).</p></blockquote><p>Want to try AIC and maybe BIC on my models. Conveniently, both functions exist in R.</p>]]></content:encoded> <wfw:commentRss>https://bogdan.org.ua/2012/05/29/information-criteria-for-choosing-best-predictive-models.html/feed</wfw:commentRss> <slash:comments>0</slash:comments> </item> <item><title>Academia or life?</title><link>https://bogdan.org.ua/2011/04/16/academia-or-life.html</link> <comments>https://bogdan.org.ua/2011/04/16/academia-or-life.html#comments</comments> <pubDate>Sat, 16 Apr 2011 10:56:42 +0000</pubDate> <dc:creator><![CDATA[Bogdan]]></dc:creator> <category><![CDATA[Links]]></category> <category><![CDATA[Science]]></category> <guid
isPermaLink="false">http://bogdan.org.ua/?p=1582</guid> <description><![CDATA[Worth reading: Goodbye academia, I get a life.]]></description> <content:encoded><![CDATA[<p>Worth reading: <a
href="http://blog.devicerandom.org/2011/02/18/getting-a-life/">Goodbye academia, I get a life</a>.</p>]]></content:encoded> <wfw:commentRss>https://bogdan.org.ua/2011/04/16/academia-or-life.html/feed</wfw:commentRss> <slash:comments>0</slash:comments> </item> <item><title>Amazonia! 6462 human microarray datasets</title><link>https://bogdan.org.ua/2011/03/06/amazonia-6462-human-microarray-datasets.html</link> <comments>https://bogdan.org.ua/2011/03/06/amazonia-6462-human-microarray-datasets.html#comments</comments> <pubDate>Sun, 06 Mar 2011 19:18:51 +0000</pubDate> <dc:creator><![CDATA[Bogdan]]></dc:creator> <category><![CDATA[Bioinformatics]]></category> <category><![CDATA[Links]]></category> <category><![CDATA[Science]]></category> <category><![CDATA[amazonia]]></category> <category><![CDATA[data]]></category> <category><![CDATA[expression]]></category> <category><![CDATA[microarray]]></category> <category><![CDATA[stem cells]]></category> <guid
isPermaLink="false">http://bogdan.org.ua/?p=1460</guid> <description><![CDATA[Amazonia! &#8211; explore the jungle of microarray results Paradoxically, the tremendous downpour of microarray results prevents a simple use of expression data. Therefore, we propose a thematic entry to public transcriptomes: you may for instance query a gene on a &#8220;Stem Cells page&#8221;, where you will see the expression of your favorite gene across selected [&#8230;]]]></description> <content:encoded><![CDATA[<p><a
href="http://amazonia.transcriptome.eu/"><img
src="http://bogdan.org.ua/wp-content/uploads/2011/03/AmaZoniaLogo.png" alt="Amazonia!" title="Amazonia!" width="357" height="100" class="alignleft size-full wp-image-1463" /></a><a
href="http://amazonia.transcriptome.eu/">Amazonia! &#8211; explore the jungle of microarray results</a></p><blockquote><p>Paradoxically, the tremendous downpour of microarray results prevents a simple use of expression data. Therefore, we propose a thematic entry to public transcriptomes: you may for instance query a gene on a &#8220;Stem Cells page&#8221;, where you will see the expression of your favorite gene across selected microarray experiments related to stem cell biology. This selection of samples can be customized at will among the 6462 samples currently present in the database.</p></blockquote><blockquote><p>Every transcriptome study results in the identification of lists of genes relevant to a given biological condition. In order to include this valuable information in any new query in the Amazonia! database, we indicate for each gene in which lists it is included. This is a straightforward and efficient way to synthesize hundreds of microarray publications.</p><p>A special feature of Amazonia! is the field of human stem cells, notably embryonic stem cells.</p></blockquote>]]></content:encoded> <wfw:commentRss>https://bogdan.org.ua/2011/03/06/amazonia-6462-human-microarray-datasets.html/feed</wfw:commentRss> <slash:comments>0</slash:comments> </item> <item><title>Introduction to Python for bioinformatics</title><link>https://bogdan.org.ua/2011/02/25/introduction-to-python-for-bioinformatics.html</link> <comments>https://bogdan.org.ua/2011/02/25/introduction-to-python-for-bioinformatics.html#comments</comments> <pubDate>Fri, 25 Feb 2011 12:03:55 +0000</pubDate> <dc:creator><![CDATA[Bogdan]]></dc:creator> <category><![CDATA[Bioinformatics]]></category> <category><![CDATA[Links]]></category> <category><![CDATA[Python]]></category> <category><![CDATA[Software]]></category> <guid
isPermaLink="false">http://bogdan.org.ua/?p=1451</guid> <description><![CDATA[This overview presentation is two years old, but still a highly valuable resource: modules and tools mentioned are alive and useful. I think this is the second presentation by Giovanni I&#8217;m embedding (first one being about GNU/make for bioinformatics). Introduction to python for bioinformatics]]></description> <content:encoded><![CDATA[<p>This overview presentation is two years old, but still a highly valuable resource: modules and tools mentioned are alive and useful.<br
/> I think this is the second presentation by Giovanni I&#8217;m embedding (first one being about GNU/make for bioinformatics).</p><div
style="width:425px" id="__ss_1320208"><strong
style="display:block;margin:12px 0 4px"><a
href="http://www.slideshare.net/giovanni/introduction-to-python-for-bioinformatics" title="Introduction to python for bioinformatics">Introduction to python for bioinformatics</a></strong><object
id="__sse1320208" width="425" height="355"><param
name="movie" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=introduction-090421044444-phpapp02&#038;stripped_title=introduction-to-python-for-bioinformatics&#038;userName=giovanni" /><param
name="allowFullScreen" value="true"/><param
name="allowScriptAccess" value="always"/><embed
name="__sse1320208" src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=introduction-090421044444-phpapp02&#038;stripped_title=introduction-to-python-for-bioinformatics&#038;userName=giovanni" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="425" height="355"></embed></object></div>]]></content:encoded> <wfw:commentRss>https://bogdan.org.ua/2011/02/25/introduction-to-python-for-bioinformatics.html/feed</wfw:commentRss> <slash:comments>0</slash:comments> </item> <item><title>How to replace newlines with commas, tabs etc (merge lines)</title><link>https://bogdan.org.ua/2010/11/16/how-to-replace-newlines-with-commas-tabs-etc-merge-lines.html</link> <comments>https://bogdan.org.ua/2010/11/16/how-to-replace-newlines-with-commas-tabs-etc-merge-lines.html#comments</comments> <pubDate>Tue, 16 Nov 2010 08:20:45 +0000</pubDate> <dc:creator><![CDATA[Bogdan]]></dc:creator> <category><![CDATA[*nix]]></category> <category><![CDATA[Bioinformatics]]></category> <category><![CDATA[how-to]]></category> <category><![CDATA[Notepad]]></category> <category><![CDATA[Software]]></category> <category><![CDATA[awk]]></category> <category><![CDATA[grep]]></category> <category><![CDATA[linux]]></category> <category><![CDATA[paste]]></category> <category><![CDATA[sed]]></category> <category><![CDATA[sort]]></category> <category><![CDATA[tr]]></category> <guid
isPermaLink="false">http://bogdan.org.ua/?p=1208</guid> <description><![CDATA[Imagine you need to get a few lines from a group of files with missing identifier mappings. I have a bunch of files with content similar to this one: ENSRNOG00000018677 1368832_at 25233 ENSRNOG00000002079 1369102_at 25272 ENSRNOG00000043451 25353 ENSRNOG00000001527 1388013_at 25408 ENSRNOG00000007390 1389538_at 25493 In the example above I need &#8216;25353&#8217;, which does not have corresponding [&#8230;]]]></description> <content:encoded><![CDATA[<p>Imagine you need to get a few lines from a group of files with missing identifier mappings. I have a bunch of files with content similar to this one:</p><blockquote><p> ENSRNOG00000018677      1368832_at      25233<br
/> ENSRNOG00000002079      1369102_at      25272<br
/> ENSRNOG00000043451                            25353<br
/> ENSRNOG00000001527      1388013_at      25408<br
/> ENSRNOG00000007390      1389538_at      25493</p></blockquote><p>In the example above I need &#8216;25353&#8217;, which has no corresponding affy_probeset_id in the 2nd column.</p><p>It is clear how to do that:</p><div
id="ig-sh-1" class="syntax_hilite"><div
class="code"><ol
class="code" style="font-family:monospace;"><li
style="font-weight: normal; vertical-align:top;"><div
style="font: normal normal 1em/1.2em monospace; margin:0; padding:0; background:none; vertical-align:top;">sort -u *_affy_ensembl.txt | grep -v '_at' | awk '{print $2}'</div></li></ol></div></div><p>This outputs a column of required IDs (EntrezGene in this example):</p><blockquote><p> 116720<br
/> 679845<br
/> 309295<br
/> 364867<br
/> 298220<br
/> 298221<br
/> 25353</p></blockquote><p>However, I need these IDs as a comma-separated list, not as a newline-separated list.</p><p>There are several ways to achieve the desired result (only the last command in the pipeline differs):</p><div
id="ig-sh-2" class="syntax_hilite"><div
class="code"><ol
class="code" style="font-family:monospace;"><li
style="font-weight: normal; vertical-align:top;"><div
style="font: normal normal 1em/1.2em monospace; margin:0; padding:0; background:none; vertical-align:top;">sort -u *_affy_ensembl.txt | grep -v '_at' | awk '{print $2}' | gawk '$1=$1' ORS=', '</div></li></ol></div></div><div
id="ig-sh-3" class="syntax_hilite"><div
class="code"><ol
class="code" style="font-family:monospace;"><li
style="font-weight: normal; vertical-align:top;"><div
style="font: normal normal 1em/1.2em monospace; margin:0; padding:0; background:none; vertical-align:top;">sort -u *_affy_ensembl.txt | grep -v '_at' | awk '{print $2}' | tr '\n' ','</div></li></ol></div></div><div
id="ig-sh-4" class="syntax_hilite"><div
class="code"><ol
class="code" style="font-family:monospace;"><li
style="font-weight: normal; vertical-align:top;"><div
style="font: normal normal 1em/1.2em monospace; margin:0; padding:0; background:none; vertical-align:top;">sort -u *_affy_ensembl.txt | grep -v '_at' | awk '{print $2}' | sed ':a;N;$!ba;s/\n/, /g'</div></li></ol></div></div><div
id="ig-sh-5" class="syntax_hilite"><div
class="code"><ol
class="code" style="font-family:monospace;"><li
style="font-weight: normal; vertical-align:top;"><div
style="font: normal normal 1em/1.2em monospace; margin:0; padding:0; background:none; vertical-align:top;">sort -u *_affy_ensembl.txt | grep -v '_at' | awk '{print $2}' | sed ':q;N;s/\n/, /g;t q'</div></li></ol></div></div><div
id="ig-sh-6" class="syntax_hilite"><div
class="code"><ol
class="code" style="font-family:monospace;"><li
style="font-weight: normal; vertical-align:top;"><div
style="font: normal normal 1em/1.2em monospace; margin:0; padding:0; background:none; vertical-align:top;">sort -u *_affy_ensembl.txt | grep -v '_at' | awk '{print $2}' | paste -s -d &quot;,&quot;</div></li></ol></div></div><p>These solutions differ in efficiency and (slightly) in output. <strong>sed</strong> reads all of the input into its buffer before replacing newlines with other separators, so it might not be best for large files. <strong>tr</strong> might be the most efficient, but I haven&#8217;t tested that; note that it also turns the final newline into a trailing comma, and the <strong>gawk</strong> variant likewise ends its output with a trailing &#8220;, &#8221;. <strong>paste</strong> cycles through its delimiter list one character at a time, so you cannot really get comma-space &#8220;, &#8221; separation with it.</p><p>Sources: <a
href="http://www.linuxquestions.org/questions/programming-9/sed-how-do-you-replace-end-of-line-with-a-space-637013/" class="broken_link" rel="nofollow">linuxquestions 1 (explains used sed commands)</a>, <a
href="http://www.linuxquestions.org/questions/programming-9/merge-lines-in-a-file-using-sed-191121/" class="broken_link" rel="nofollow">linuxquestions 2</a>, <a
href="http://www.cyberciti.biz/faq/linux-unix-sed-replace-newline/">nixcraft</a>.</p>]]></content:encoded> <wfw:commentRss>https://bogdan.org.ua/2010/11/16/how-to-replace-newlines-with-commas-tabs-etc-merge-lines.html/feed</wfw:commentRss> <slash:comments>2</slash:comments> </item> <item><title>Overlaying gene expression data onto pathways from databases</title><link>https://bogdan.org.ua/2010/11/05/overlaying-gene-expression-data-onto-pathways-from-databases.html</link> <comments>https://bogdan.org.ua/2010/11/05/overlaying-gene-expression-data-onto-pathways-from-databases.html#comments</comments> <pubDate>Fri, 05 Nov 2010 13:20:06 +0000</pubDate> <dc:creator><![CDATA[Bogdan]]></dc:creator> <category><![CDATA[Bioinformatics]]></category> <category><![CDATA[Links]]></category> <category><![CDATA[Software]]></category> <category><![CDATA[biocarta]]></category> <category><![CDATA[data]]></category> <category><![CDATA[expression]]></category> <category><![CDATA[genmapp]]></category> <category><![CDATA[KEGG]]></category> <category><![CDATA[microarray]]></category> <category><![CDATA[pathvisio]]></category> <category><![CDATA[pathway]]></category> <category><![CDATA[pathway explorer]]></category> <category><![CDATA[wikipathways]]></category> <guid
isPermaLink="false">http://bogdan.org.ua/?p=1174</guid> <description><![CDATA[Superimposing gene expression data onto pathways from databases is a common task in the final steps of microarray data analysis &#8211; that is, biological interpretation and results discussion. I have found many tools which claim to facilitate this procedure. Some of them are reviewed below (in no specific order). Pathway Explorer by Bernhard Mlecnik was [&#8230;]]]></description> <content:encoded><![CDATA[<p>Superimposing gene expression data onto pathways from databases is a common task in the final steps of microarray data analysis &#8211; that is, biological interpretation and results discussion.</p><p>I have found many tools which claim to facilitate this procedure. Some of them are reviewed below (in no specific order).<br
/> <span
id="more-1174"></span><br
/> <a
href="https://pathwayexplorer.genome.tugraz.at/" class="broken_link" rel="nofollow">Pathway Explorer</a> by Bernhard Mlecnik was last updated in 2007, but is fully functional (I believe it is being maintained without changes to the last-updated date). Both online and downloadable Java applications are available. Note that for the downloadable application you will need to obtain a license key &#8211; the procedure is well documented and was very fast for me.<br
/> <a
href="http://bogdan.org.ua/wp-content/uploads/2010/11/pe.png"><img
src="http://bogdan.org.ua/wp-content/uploads/2010/11/pe-500x371.png" alt="" title="Pathway Explorer" width="500" height="371" class="aligncenter size-medium wp-image-1176" /></a><br
/> Pathway Explorer supports import from 3 sources: KEGG XML files, BioCarta URLs, and GenMAPP URLs. Import from KEGG does work as described in the short manual, and seems functional (I had some problems exporting/saving the resulting picture, but didn&#8217;t investigate further). BioCarta import seems to work, but for some reason does not display expression levels of pathway components. I could not test the import of GenMAPP pathways, because they are not available online.</p><p>I found Pathway Explorer good, but then switched to PathVisio (reviewed next), because for some reason Pathway Explorer was recognising only a small fraction of the genes in my expression data. It could be that its identifier mappings are outdated, but this is just a guess.</p><p><a
href="http://www.pathvisio.org/">PathVisio</a> appears to be a spin-off of <a
href="http://www.genmapp.org/">GenMAPP</a> and <a
href="http://www.wikipathways.org/">WikiPathways</a>. It excels at importing/visualizing WikiPathways data, which even comes bundled with the PathVisio Java application. It is easier to use than Pathway Explorer, and it seems to recognize more genes (although still not all the genes which are present in the data). There is KEGG pathways support, but it is not always usable &#8211; many edges (links between genes/proteins) are absent, so instead of a pathway you get a bunch of nodes relevant to a pathway, but cannot really see how they are connected. PathVisio supports an insanely long list of database identifiers, so it is highly unlikely that you will have to map your data to use a different identifier. This pathway mapper exports to several formats, including PNG and PDF.</p><p><a
href="http://bogdan.org.ua/wp-content/uploads/2010/11/pv.png"><img
src="http://bogdan.org.ua/wp-content/uploads/2010/11/pv-500x302.png" alt="" title="PathVisio" width="500" height="302" class="aligncenter size-medium wp-image-1181" /></a></p><p>I could not fully test <a
href="http://lbb.ut.ac.ir/projects/Affy2KEGG/WS/affy2keggWs.jsp" class="broken_link" rel="nofollow">AffyWEB</a>, because it doesn&#8217;t list the rat arrays we used. Their barley genome example did work, so the tool is probably functional. It overlays your expression data onto KEGG pathways.</p><p><a
href="http://bogdan.org.ua/wp-content/uploads/2010/11/map00030.png"><img
src="http://bogdan.org.ua/wp-content/uploads/2010/11/map00030-500x345.png" alt="" title="AffyWEB" width="500" height="345" class="aligncenter size-medium wp-image-1177" /></a></p><p><a
href="http://www.g-language.org/data/marray/software/map2swf.cgi " class="broken_link" rel="nofollow">G-language Microarray System</a> is a comparatively simple pathway visualizer. It accepts CSV files containing a column of EntrezGene IDs and a single column of expression values normalized to the 1-100 range, fetches the requested KEGG pathway, and generates a Flash (SWF) object depicting that pathway with coloured components. It does work with the sample data. I was too lazy to normalize my expression data to the [1;100] range, and SWF is not exactly a usable format, so I haven&#8217;t tested this tool any further (you can right-click to zoom in on the Flash pathway below).<br
/> <object
classid="clsid:D27CDB6E-AE6D-11cf-96B8-444553540000"
id="fm_ecj00010_1930207474"
class="flashmovie"
width="500"
height="800"><param
name="movie" value="http://bogdan.org.ua/wp-content/uploads/2010/11/ecj00010.swf" /> <!--[if !IE]>--> <object
type="application/x-shockwave-flash"
data="http://bogdan.org.ua/wp-content/uploads/2010/11/ecj00010.swf"
name="fm_ecj00010_1930207474"
width="500"
height="800"> <!--<![endif]--><p><a
href="http://adobe.com/go/getflashplayer"><img
src="http://www.adobe.com/images/shared/download_buttons/get_flash_player.gif" alt="Get Adobe Flash player" /></a></p> <!--[if !IE]>--> </object> <!--<![endif]--> </object></p><p>If time permits (or work requires), this post may be extended with reviews of <a
href="http://www.genmapp.org/">GenMAPP</a>, <a
href="http://gepat.sourceforge.net/screenshots.htm">GEPAT</a>, <a
href="https://stat.ethz.ch/pipermail/bioconductor/2008-June/023112.html">KEGG2heatmap script</a>, <a
href="http://akt.ucsf.edu/EGAN/downloads.php" class="broken_link" rel="nofollow">EGAN</a>, <a
href="http://mapman.gabipd.org/web/guest/mapman">MapMan</a>, <a
href="http://bioinformatics.oxfordjournals.org/content/20/13/2156.abstract">Pathway Miner</a>, <a
href="http://nar.oxfordjournals.org/content/32/suppl_2/W460.full">ArrayXPath</a>, <a
href="http://nar.oxfordjournals.org/content/35/suppl_2/W625.full">VisANT</a>, <a
href="http://www.win.tue.nl/~mwestenb/spotxplore/">SpotXplore</a>, and maybe others.</p><p>Please comment to share your experience using pathway expression overlaying tools or to suggest other tools.</p>]]></content:encoded> <wfw:commentRss>https://bogdan.org.ua/2010/11/05/overlaying-gene-expression-data-onto-pathways-from-databases.html/feed</wfw:commentRss> <slash:comments>0</slash:comments> </item> <item><title>Batch-retrieve EntrezGene homologs using NCBI&#8217;s HomoloGene and R&#8217;s annotationTools</title><link>https://bogdan.org.ua/2010/10/27/batch-retrieve-entrezgene-homologs-using-ncbi-homologene-and-r.html</link> <comments>https://bogdan.org.ua/2010/10/27/batch-retrieve-entrezgene-homologs-using-ncbi-homologene-and-r.html#comments</comments> <pubDate>Wed, 27 Oct 2010 10:49:01 +0000</pubDate> <dc:creator><![CDATA[Bogdan]]></dc:creator> <category><![CDATA[Bioinformatics]]></category> <category><![CDATA[how-to]]></category> <category><![CDATA[Notepad]]></category> <category><![CDATA[annotationTools]]></category> <category><![CDATA[HomoloGene]]></category> <category><![CDATA[NCBI]]></category> <category><![CDATA[R]]></category> <guid
isPermaLink="false">http://bogdan.org.ua/?p=1168</guid> <description><![CDATA[Install the annotationTools R package: source(&quot;http://bioconductor.org/biocLite.R&quot;) biocLite(&quot;annotationTools&quot;) Download full HomoloGene data file from ftp://ftp.ncbi.nlm.nih.gov/pub/HomoloGene/current library(annotationTools) homologene = read.delim(&quot;homologene.data&quot;, header=FALSE) mygenes = read.table(&quot;file with one entrez ID of the source organism per line.txt&quot;) getHOMOLOG(unlist(mygenes), taxonomy_ID_of_target_organism, homologene) [alternatively, wrap the call to getHOMOLOG into unlist to get a vector] It might be easier to achieve the same [&#8230;]]]></description> <content:encoded><![CDATA[<ol><li>Install the <a
href="http://bioconductor.org/packages/release/bioc/html/annotationTools.html">annotationTools</a> R package:<br
/> source(&quot;http://bioconductor.org/biocLite.R&quot;)<br
/> biocLite(&quot;annotationTools&quot;)</li><li>Download full HomoloGene data file from <a
href="ftp://ftp.ncbi.nlm.nih.gov/pub/HomoloGene/current">ftp://ftp.ncbi.nlm.nih.gov/pub/HomoloGene/current</a></li><li>library(annotationTools)</li><li>homologene = read.delim(&quot;homologene.data&quot;, header=FALSE)</li><li>mygenes = read.table(&quot;file with one entrez ID of the source organism per line.txt&quot;)</li><li>getHOMOLOG(unlist(mygenes), <a
href="http://www.ncbi.nlm.nih.gov/taxonomy">taxonomy_ID_of_target_organism</a>, homologene) [alternatively, wrap the call to getHOMOLOG into unlist to get a vector]</li></ol><p>It might be easier to achieve the same results with a Perl script calling NCBI&#8217;s e-utils.</p>]]></content:encoded> <wfw:commentRss>https://bogdan.org.ua/2010/10/27/batch-retrieve-entrezgene-homologs-using-ncbi-homologene-and-r.html/feed</wfw:commentRss> <slash:comments>2</slash:comments> </item> <item><title>International salary survey in sciences (2010)</title><link>https://bogdan.org.ua/2010/10/14/international-salary-survey-in-sciences-2010.html</link> <comments>https://bogdan.org.ua/2010/10/14/international-salary-survey-in-sciences-2010.html#comments</comments> <pubDate>Thu, 14 Oct 2010 15:28:42 +0000</pubDate> <dc:creator><![CDATA[Bogdan]]></dc:creator> <category><![CDATA[Links]]></category> <category><![CDATA[Science]]></category> <category><![CDATA[2010]]></category> <category><![CDATA[nature]]></category> <category><![CDATA[npg]]></category> <category><![CDATA[salary]]></category> <category><![CDATA[survey]]></category> <guid
isPermaLink="false">http://bogdan.org.ua/?p=1144</guid> <description><![CDATA[Nature published this survey, based on responses from over 10,000 employees in science. It has lots of multi-axis data to explore, and some major trends are discussed in the special report. Highly recommended for anyone considering science career changes.]]></description> <content:encoded><![CDATA[<p>Nature published this <a
href="http://www.nature.com/naturejobs/salary/survey/2010/index.html">survey</a>, based on responses from over 10,000 employees in science. It has lots of multi-axis data to explore, and some major trends are discussed in the <a
href="http://www.nature.com/naturejobs/2010/100624/full/nj7301-1104a.html">special report</a>. Highly recommended for anyone considering science career changes.</p>]]></content:encoded> <wfw:commentRss>https://bogdan.org.ua/2010/10/14/international-salary-survey-in-sciences-2010.html/feed</wfw:commentRss> <slash:comments>0</slash:comments> </item> <item><title>Tools for conversion of IDs in genomics</title><link>https://bogdan.org.ua/2010/08/10/tools-for-conversion-of-ids-in-genomics.html</link> <comments>https://bogdan.org.ua/2010/08/10/tools-for-conversion-of-ids-in-genomics.html#comments</comments> <pubDate>Tue, 10 Aug 2010 12:31:44 +0000</pubDate> <dc:creator><![CDATA[Bogdan]]></dc:creator> <category><![CDATA[Bioinformatics]]></category> <category><![CDATA[Links]]></category> <category><![CDATA[Science]]></category> <guid
isPermaLink="false">http://bogdan.org.ua/?p=1102</guid> <description><![CDATA[Tools for conversion of IDs in genomics]]></description> <content:encoded><![CDATA[<p><a
href="http://hum-molgen.org/NewsGen/08-2009/000020.html">Tools for conversion of IDs in genomics</a></p>]]></content:encoded> <wfw:commentRss>https://bogdan.org.ua/2010/08/10/tools-for-conversion-of-ids-in-genomics.html/feed</wfw:commentRss> <slash:comments>0</slash:comments> </item> </channel> </rss>