<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Position Frequency Matrix to Position Weight Matrix (PFM2PWM)</title>
	<atom:link href="http://bogdan.org.ua/2006/09/11/position-frequency-matrix-to-position-weight-matrix-pfm2pwm.html/feed" rel="self" type="application/rss+xml" />
	<link>http://bogdan.org.ua/2006/09/11/position-frequency-matrix-to-position-weight-matrix-pfm2pwm.html</link>
	<description>Tiny bits of bioinformatics, [web-]programming etc</description>
	<lastBuildDate>Sun, 14 Mar 2010 08:55:31 +0200</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<xhtml:meta xmlns:xhtml="http://www.w3.org/1999/xhtml" name="robots" content="noindex" />
	<item>
		<title>By: Bogdan</title>
		<link>http://bogdan.org.ua/2006/09/11/position-frequency-matrix-to-position-weight-matrix-pfm2pwm.html/comment-page-1#comment-103014</link>
		<dc:creator>Bogdan</dc:creator>
		<pubDate>Tue, 08 Dec 2009 13:14:05 +0000</pubDate>
		<guid isPermaLink="false">http://www.bogdan.org.ua/2006/09/11/position-frequency-matrix-to-position-weight-matrix-pfm2pwm.html#comment-103014</guid>
		<description>Assuming your program works properly (I haven&#039;t checked that):

1. Do you really need a 50-long PWM? Your nucleotides.seq does not even distantly resemble a set of aligned conserved sequences, which are usually used to construct PFMs/PWMs. You could be mixing the concepts of &quot;input sequence&quot; and &quot;sequences to construct PWM from&quot;.

2. Assuming you really want a low-Ic (low information content) 50-long PWM: to perform a search, you will now need to read the sequence you want to search in, doing so in 50-long chunks, and score each chunk with your PWM by adding up the individual row-column scores of the matching nucleotides. You really want to see Wasserman, 2004, for a figure explaining how this is done. Then you will need to normalize the absolute score to get a number between 0 and 1, and then decide whether the currently processed chunk is above the &quot;found threshold&quot; (&quot;found cut-off&quot;).</description>
		<content:encoded><![CDATA[<p><!-- google_ad_section_start -->Assuming your program works properly (I haven&#8217;t checked that):</p>
<p>1. Do you really need a 50-long PWM? Your nucleotides.seq does not even distantly resemble a set of aligned conserved sequences, which are usually used to construct PFMs/PWMs. You could be mixing the concepts of &#8220;input sequence&#8221; and &#8220;sequences to construct PWM from&#8221;.</p>
<p>2. Assuming you really want a low-Ic (low information content) 50-long PWM: to perform a search, you will now need to read the sequence you want to search in, doing so in 50-long chunks, and score each chunk with your PWM by adding up the individual row-column scores of the matching nucleotides. You really want to see Wasserman, 2004, for a figure explaining how this is done. Then you will need to normalize the absolute score to get a number between 0 and 1, and then decide whether the currently processed chunk is above the &#8220;found threshold&#8221; (&#8220;found cut-off&#8221;).<!-- google_ad_section_end --></p>
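<p>A minimal Python sketch of the search described in point 2, offered as an illustration rather than a reference implementation. It assumes the window slides one nucleotide at a time, and it assumes the common (raw - min) / (max - min) normalization to get a score between 0 and 1. The <code>scan</code> function, the <code>toy_pwm</code> weights and the 0.9 cut-off are hypothetical and not taken from any real matrix.</p>
<pre>
```python
def scan(sequence, pwm, cutoff=0.75):
    """Return (start, window, relative_score) for every window at or above cutoff.

    pwm is a list with one dict per motif position, mapping A/C/G/T to a weight.
    """
    width = len(pwm)
    # Highest and lowest raw scores any window could reach (assumed normalization bounds).
    best = sum(max(column.values()) for column in pwm)
    worst = sum(min(column.values()) for column in pwm)
    hits = []
    # Assumption: slide the window one nucleotide at a time.
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Raw score: add up the weight of the observed nucleotide at each position.
        raw = sum(column[base] for column, base in zip(pwm, window))
        relative = (raw - worst) / (best - worst)
        if relative >= cutoff:
            hits.append((start, window, relative))
    return hits


# Hypothetical 3-position PWM that favours the word ACG.
toy_pwm = [
    {"A": 1.0, "C": -1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": 1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": 1.0, "T": -1.0},
]

print(scan("TTACGTT", toy_pwm, cutoff=0.9))
```
</pre>
<p>On the toy input this reports a single hit, the window ACG at offset 2 with a relative score of 1.0. The sketch does not handle characters other than A, C, G and T, nor a matrix whose best and worst scores are equal; a real search would also scan the reverse strand.</p>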
]]></content:encoded>
	</item>
	<item>
		<title>By: abraham</title>
		<link>http://bogdan.org.ua/2006/09/11/position-frequency-matrix-to-position-weight-matrix-pfm2pwm.html/comment-page-1#comment-103013</link>
		<dc:creator>abraham</dc:creator>
		<pubDate>Tue, 08 Dec 2009 12:49:37 +0000</pubDate>
		<guid isPermaLink="false">http://www.bogdan.org.ua/2006/09/11/position-frequency-matrix-to-position-weight-matrix-pfm2pwm.html#comment-103013</guid>
		<description>Thanks for your answer. I want to understand how to find TFBSs using a PWM. The best way for me to learn is to implement it myself, which is why I am doing the implementation step by step.

To be more specific: I wrote a C program that reads a file containing 14 lines of nucleotides. Each line is a sequence of 50 nucleotides. My C program reads the file and generates the PFM, and after that I convert the PFM to a PWM. Now what is the next step? What do I do with the PWM? How do I use it to find TFBSs? I am a bit new to the field and I am still learning. Sorry if my questions sound stupid.

I can email you the C code (ANSI C) and the file containing the sequences; maybe that will help you understand my concern.
Thanks.
Regards,

A</description>
		<content:encoded><![CDATA[<p><!-- google_ad_section_start -->Thanks for your answer. I want to understand how to find TFBSs using a PWM. The best way for me to learn is to implement it myself, which is why I am doing the implementation step by step.</p>
<p>To be more specific: I wrote a C program that reads a file containing 14 lines of nucleotides. Each line is a sequence of 50 nucleotides. My C program reads the file and generates the PFM, and after that I convert the PFM to a PWM. Now what is the next step? What do I do with the PWM? How do I use it to find TFBSs? I am a bit new to the field and I am still learning. Sorry if my questions sound stupid.</p>
<p>I can email you the C code (ANSI C) and the file containing the sequences; maybe that will help you understand my concern.<br />
Thanks.<br />
Regards,</p>
<p>A<!-- google_ad_section_end --></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Bogdan</title>
		<link>http://bogdan.org.ua/2006/09/11/position-frequency-matrix-to-position-weight-matrix-pfm2pwm.html/comment-page-1#comment-103008</link>
		<dc:creator>Bogdan</dc:creator>
		<pubDate>Tue, 08 Dec 2009 11:09:40 +0000</pubDate>
		<guid isPermaLink="false">http://www.bogdan.org.ua/2006/09/11/position-frequency-matrix-to-position-weight-matrix-pfm2pwm.html#comment-103008</guid>
		<description>I assume you know that PSSM (Position-Specific Scoring Matrix) is just an alternative name for a PWM. It appears that there is now an article on the subject &lt;a href=&quot;http://en.wikipedia.org/wiki/Position-Specific_Scoring_Matrix&quot; rel=&quot;nofollow&quot;&gt;here&lt;/a&gt;.

Generally, one should first look for existing tools to perform the needed task. You may find &lt;a href=&quot;http://nar.oxfordjournals.org/cgi/data/gkp084/DC1/1&quot; rel=&quot;nofollow&quot;&gt;this supplement&lt;/a&gt;, briefly comparing several existing tools, helpful in identifying the tool you need.

If you do not need an existing tool, but rather the algorithm/schema of the search, then reading the aforementioned review by Wasserman, 2004 (comments #2 and #6 on this page) will help.

Let me know if you have more questions.</description>
		<content:encoded><![CDATA[<p><!-- google_ad_section_start -->I assume you know that PSSM (Position-Specific Scoring Matrix) is just an alternative name for a PWM. It appears that there is now an article on the subject <a href="http://en.wikipedia.org/wiki/Position-Specific_Scoring_Matrix" rel="nofollow">here</a>.</p>
<p>Generally, one should first look for existing tools to perform the needed task. You may find <a href="http://nar.oxfordjournals.org/cgi/data/gkp084/DC1/1" rel="nofollow">this supplement</a>, briefly comparing several existing tools, helpful in identifying the tool you need.</p>
<p>If you do not need an existing tool, but rather the algorithm/schema of the search, then reading the aforementioned review by Wasserman, 2004 (comments #2 and #6 on this page) will help.</p>
<p>Let me know if you have more questions.<!-- google_ad_section_end --></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: abraham</title>
		<link>http://bogdan.org.ua/2006/09/11/position-frequency-matrix-to-position-weight-matrix-pfm2pwm.html/comment-page-1#comment-103007</link>
		<dc:creator>abraham</dc:creator>
		<pubDate>Tue, 08 Dec 2009 10:18:16 +0000</pubDate>
		<guid isPermaLink="false">http://www.bogdan.org.ua/2006/09/11/position-frequency-matrix-to-position-weight-matrix-pfm2pwm.html#comment-103007</guid>
		<description>Thanks for the formula to calculate the weight. My question is: once one converts the PFM to a PWM, how do we use the PWM to find binding sites? Everyone talks about using a PSSM to find binding sites, but how do we use the PSSM? I have managed to convert my PFM to a PWM thanks to your formula. I have my PSSM; how do I use it to locate the binding sites?

Your assistance is and will be greatly appreciated.</description>
		<content:encoded><![CDATA[<p><!-- google_ad_section_start -->Thanks for the formula to calculate the weight. My question is: once one converts the PFM to a PWM, how do we use the PWM to find binding sites? Everyone talks about using a PSSM to find binding sites, but how do we use the PSSM? I have managed to convert my PFM to a PWM thanks to your formula. I have my PSSM; how do I use it to locate the binding sites?</p>
<p>Your assistance is and will be greatly appreciated.<!-- google_ad_section_end --></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Steven Chou</title>
		<link>http://bogdan.org.ua/2006/09/11/position-frequency-matrix-to-position-weight-matrix-pfm2pwm.html/comment-page-1#comment-93293</link>
		<dc:creator>Steven Chou</dc:creator>
		<pubDate>Thu, 18 Dec 2008 03:36:46 +0000</pubDate>
		<guid isPermaLink="false">http://www.bogdan.org.ua/2006/09/11/position-frequency-matrix-to-position-weight-matrix-pfm2pwm.html#comment-93293</guid>
		<description>Thank you very much, Bogdan, for helping me find the scotoma. I am a newbie in Perl, and because I read English documents very slowly there is a long way to go; however, thanks for your information~~</description>
		<content:encoded><![CDATA[<p><!-- google_ad_section_start -->Thank you very much, Bogdan, for helping me find the scotoma. I am a newbie in Perl, and because I read English documents very slowly there is a long way to go; however, thanks for your information~~<!-- google_ad_section_end --></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Bogdan</title>
		<link>http://bogdan.org.ua/2006/09/11/position-frequency-matrix-to-position-weight-matrix-pfm2pwm.html/comment-page-1#comment-93132</link>
		<dc:creator>Bogdan</dc:creator>
		<pubDate>Wed, 17 Dec 2008 11:45:02 +0000</pubDate>
		<guid isPermaLink="false">http://www.bogdan.org.ua/2006/09/11/position-frequency-matrix-to-position-weight-matrix-pfm2pwm.html#comment-93132</guid>
		<description>Steven,

the post has links to the TFBS Perl module, which AFAIK has PFM2PWM conversion.</description>
		<content:encoded><![CDATA[<p><!-- google_ad_section_start -->Steven,</p>
<p>the post has links to the TFBS Perl module, which AFAIK has PFM2PWM conversion.<!-- google_ad_section_end --></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Bogdan</title>
		<link>http://bogdan.org.ua/2006/09/11/position-frequency-matrix-to-position-weight-matrix-pfm2pwm.html/comment-page-1#comment-93131</link>
		<dc:creator>Bogdan</dc:creator>
		<pubDate>Wed, 17 Dec 2008 11:38:12 +0000</pubDate>
		<guid isPermaLink="false">http://www.bogdan.org.ua/2006/09/11/position-frequency-matrix-to-position-weight-matrix-pfm2pwm.html#comment-93131</guid>
		<description>Hi Steven,

there is nothing wrong with you. Fortunately, there seems to be nothing wrong with me either.

For the example we are discussing:

[code]
w = log2 ( ( f + sqrt(N) * p ) / ( N + sqrt(N) ) / p )
[/code]
and
[code]
w = log2 (   1.901387819       / 16.605551275    / 0.25 )
[/code]

If I calculate that as w = log2 ( 1.901387819 / (16.605551275 / 0.25) ) - note the additional parentheses - then I&#039;d get -5.12654089.
But if I calculate exactly in the order written, as w = log2 ( (1.901387819 / 16.605551275) / 0.25 ) - parentheses added for clarity - then I get -1.12654089.

So, on the one hand, the error in your calculation was to group 16.605551275 / 0.25, which is ( N + sqrt(N) ) / p, although it is &lt;strong&gt;not&lt;/strong&gt; grouped in the formula: division is evaluated left to right, so the error stems from the incorrect order of operations. On the other hand, your finding of (13+sqrt(13))*0.25 = 4.1513878 and the further correct results make sense:

[code]
w = (x/y)/z (this is the correct order of calculations, not w = x/(y/z) ),
[/code]
[code]
w = (x/y)*1/z
[/code]
[code]
w = x/(y*z)
[/code]

So w = (x/y)/z = x/(y*z), log2 ( 1.901387819 / 16.605551275 / 0.25 ) = log2 ( 1.901387819 / (16.605551275 * 0.25) ).</description>
		<content:encoded><![CDATA[<p><!-- google_ad_section_start -->Hi Steven,</p>
<p>there is nothing wrong with you. Fortunately, there seems to be nothing wrong with me either.</p>
<p>For the example we are discussing:</p>
<div class="syntax_hilite"><span class="langName">CODE:</span>
<div id="code-1">
<div class="code">
<ol>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">w = log2 <span style="color:#006600; font-weight:bold;">&#40;</span> <span style="color:#006600; font-weight:bold;">&#40;</span> f + sqrt<span style="color:#006600; font-weight:bold;">&#40;</span>N<span style="color:#006600; font-weight:bold;">&#41;</span> * p <span style="color:#006600; font-weight:bold;">&#41;</span> / <span style="color:#006600; font-weight:bold;">&#40;</span> N + sqrt<span style="color:#006600; font-weight:bold;">&#40;</span>N<span style="color:#006600; font-weight:bold;">&#41;</span> <span style="color:#006600; font-weight:bold;">&#41;</span> / p <span style="color:#006600; font-weight:bold;">&#41;</span> </div>
</li>
</ol>
</div>
</div>
</div>
<p>
and</p>
<div class="syntax_hilite"><span class="langName">CODE:</span>
<div id="code-2">
<div class="code">
<ol>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">w = log2 <span style="color:#006600; font-weight:bold;">&#40;</span>&nbsp; &nbsp;<span style="color:#800000;color:#800000;">1</span>.<span style="color:#800000;color:#800000;">901387819</span>&nbsp; &nbsp; &nbsp; &nbsp;/ <span style="color:#800000;color:#800000;">16</span>.<span style="color:#800000;color:#800000;">605551275</span>&nbsp; &nbsp; / <span style="color:#800000;color:#800000;">0</span>.<span style="color:#800000;color:#800000;">25</span> <span style="color:#006600; font-weight:bold;">&#41;</span> </div>
</li>
</ol>
</div>
</div>
</div>
<p></p>
<p>If I calculate that as w = log2 ( 1.901387819 / (16.605551275 / 0.25) ) - note the additional parentheses - then I'd get -5.12654089.<br />
But if I calculate exactly in the order written, as w = log2 ( (1.901387819 / 16.605551275) / 0.25 ) - parentheses added for clarity - then I get -1.12654089.</p>
<p>So, on the one hand, the error in your calculation was to group 16.605551275 / 0.25, which is ( N + sqrt(N) ) / p, although it is <strong>not</strong> grouped in the formula: division is evaluated left to right, so the error stems from the incorrect order of operations. On the other hand, your finding of (13+sqrt(13))*0.25 = 4.1513878 and the further correct results make sense:</p>
<div class="syntax_hilite"><span class="langName">CODE:</span>
<div id="code-3">
<div class="code">
<ol>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">w = <span style="color:#006600; font-weight:bold;">&#40;</span>x/y<span style="color:#006600; font-weight:bold;">&#41;</span>/z <span style="color:#006600; font-weight:bold;">&#40;</span>this is the correct order of calculations, not w = x/<span style="color:#006600; font-weight:bold;">&#40;</span>y/z<span style="color:#006600; font-weight:bold;">&#41;</span> <span style="color:#006600; font-weight:bold;">&#41;</span>, </div>
</li>
</ol>
</div>
</div>
</div>
<p></p>
<div class="syntax_hilite"><span class="langName">CODE:</span>
<div id="code-4">
<div class="code">
<ol>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">w = <span style="color:#006600; font-weight:bold;">&#40;</span>x/y<span style="color:#006600; font-weight:bold;">&#41;</span>*<span style="color:#800000;color:#800000;">1</span>/z </div>
</li>
</ol>
</div>
</div>
</div>
<p></p>
<div class="syntax_hilite"><span class="langName">CODE:</span>
<div id="code-5">
<div class="code">
<ol>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">w = x/<span style="color:#006600; font-weight:bold;">&#40;</span>y*z<span style="color:#006600; font-weight:bold;">&#41;</span> </div>
</li>
</ol>
</div>
</div>
</div>
<p></p>
<p>So w = (x/y)/z = x/(y*z), log2 ( 1.901387819 / 16.605551275 / 0.25 ) = log2 ( 1.901387819 / (16.605551275 * 0.25) ).<!-- google_ad_section_end --></p>
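<p>The arithmetic above can be checked with a short Python sketch. The function name <code>weight</code> is hypothetical; the numbers are the ones from this example (f = 1, N = 13, p = 0.25), and the two extra expressions only exist to contrast the correct and the mistaken grouping.</p>
<pre>
```python
from math import log2, sqrt


def weight(f, n, p=0.25):
    """PFM cell to PWM weight, with the formula evaluated exactly as written."""
    # Python divides left to right, so a / b / p means (a / b) / p.
    return log2((f + sqrt(n) * p) / (n + sqrt(n)) / p)


w = weight(1, 13)
# Equivalent form: dividing by p last is the same as multiplying the denominator by p.
regrouped = log2((1 + sqrt(13) * 0.25) / ((13 + sqrt(13)) * 0.25))
# Mistaken grouping: dividing the denominator by p first.
misgrouped = log2((1 + sqrt(13) * 0.25) / ((13 + sqrt(13)) / 0.25))

print(round(w, 4), round(regrouped, 4), round(misgrouped, 4))
```
</pre>
<p>This prints -1.1265 -1.1265 -5.1265. The two results differ by exactly 4 because the mistaken grouping changes the argument of log2 by a factor of p squared, which is 1/16 for p = 0.25.</p>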
]]></content:encoded>
	</item>
	<item>
		<title>By: Steven Chou</title>
		<link>http://bogdan.org.ua/2006/09/11/position-frequency-matrix-to-position-weight-matrix-pfm2pwm.html/comment-page-1#comment-93124</link>
		<dc:creator>Steven Chou</dc:creator>
		<pubDate>Wed, 17 Dec 2008 09:40:09 +0000</pubDate>
		<guid isPermaLink="false">http://www.bogdan.org.ua/2006/09/11/position-frequency-matrix-to-position-weight-matrix-pfm2pwm.html#comment-93124</guid>
		<description>Hello Bogdan, I searched the internet for a tool that could translate a PFM into a PWM, and this page is useful.
But I have a small and perhaps stupid question about the formula:

w = log2 ( ( f + sqrt(N) * p ) / ( N + sqrt(N) ) / p )

and your example:

-1.1265 = log2((1+0.25*sqrt(13))/(13+sqrt(13))/0.25)

I calculated that with pen and Office Excel:

(1+0.25*sqrt(13)) = 1.9013878

(13+sqrt(13))/0.25 = 66.4222051

So
log2((1+0.25*sqrt(13))/(13+sqrt(13))/0.25) = log2(0.0286258) = -5.1265409, not -1.1265

And I found that if (13+sqrt(13))/0.25 is changed to (13+sqrt(13))*0.25

then

(13+sqrt(13))*0.25 = 4.1513878

and the final answer will be log2(1.9013878/4.1513878) = -1.1265409

I am confused about this, please tell me what&#039;s wrong with me...

Also, sorry for my poor English.</description>
		<content:encoded><![CDATA[<p><!-- google_ad_section_start -->Hello Bogdan, I searched the internet for a tool that could translate a PFM into a PWM, and this page is useful.<br />
But I have a small and perhaps stupid question about the formula:</p>
<p>w = log2 ( ( f + sqrt(N) * p ) / ( N + sqrt(N) ) / p )</p>
<p>and your example:</p>
<p>-1.1265 = log2((1+0.25*sqrt(13))/(13+sqrt(13))/0.25)</p>
<p>I calculated that with pen and Office Excel:</p>
<p>(1+0.25*sqrt(13)) = 1.9013878</p>
<p>(13+sqrt(13))/0.25 = 66.4222051</p>
<p>So<br />
log2((1+0.25*sqrt(13))/(13+sqrt(13))/0.25) = log2(0.0286258) = -5.1265409, not -1.1265</p>
<p>And I found that if (13+sqrt(13))/0.25 is changed to (13+sqrt(13))*0.25</p>
<p>then</p>
<p>(13+sqrt(13))*0.25 = 4.1513878</p>
<p>and the final answer will be log2(1.9013878/4.1513878) = -1.1265409</p>
<p>I am confused about this, please tell me what's wrong with me...</p>
<p>Also, sorry for my poor English.<!-- google_ad_section_end --></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Bogdan</title>
		<link>http://bogdan.org.ua/2006/09/11/position-frequency-matrix-to-position-weight-matrix-pfm2pwm.html/comment-page-1#comment-42475</link>
		<dc:creator>Bogdan</dc:creator>
		<pubDate>Sat, 22 Dec 2007 12:19:01 +0000</pubDate>
		<guid isPermaLink="false">http://www.bogdan.org.ua/2006/09/11/position-frequency-matrix-to-position-weight-matrix-pfm2pwm.html#comment-42475</guid>
		<description>Of course I&#039;ll check.

The page opened for me, but I do believe there can be problems - we&#039;re using two separate servers (interface &amp; workhorse), which are physically quite distant and connected only via some 13-hop public internet channels which can be slow at times. I&#039;ll try to negotiate more reliable single-server collocation at my institute.

I sent the file to you via email (or you can try again to see if it works from the site). I also significantly extended the &lt;a href=&quot;http://biomed.org.ua/COTRASIF/help.html&quot; rel=&quot;nofollow&quot;&gt;help page&lt;/a&gt;, which now explains the format and meaning of the results file, and also some other important things about the functioning of COTRASIF. Please pay attention to the &quot;duplicate lines&quot; problem described on the help page - I considered that issue resolved until I had a look at your results file. So thanks for your help!

Feedback, criticism and suggestions are welcome - you may use both my email and &lt;a href=&quot;http://bogdan.org.ua/contact/&quot; rel=&quot;nofollow&quot;&gt;contact page&lt;/a&gt; for replies.</description>
		<content:encoded><![CDATA[<p><!-- google_ad_section_start -->Of course I'll check.</p>
<p>The page opened for me, but I do believe there can be problems - we're using two separate servers (interface &#038; workhorse), which are physically quite distant and connected only via some 13-hop public internet channels which can be slow at times. I'll try to negotiate more reliable single-server collocation at my institute.</p>
<p>I sent the file to you via email (or you can try again to see if it works from the site). I also significantly extended the <a href="http://biomed.org.ua/COTRASIF/help.html" rel="nofollow">help page</a>, which now explains the format and meaning of the results file, and also some other important things about the functioning of COTRASIF. Please pay attention to the "duplicate lines" problem described on the help page - I considered that issue resolved until I had a look at your results file. So thanks for your help!</p>
<p>Feedback, criticism and suggestions are welcome - you may use both my email and <a href="http://bogdan.org.ua/contact/" rel="nofollow">contact page</a> for replies.<!-- google_ad_section_end --></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: realzhang</title>
		<link>http://bogdan.org.ua/2006/09/11/position-frequency-matrix-to-position-weight-matrix-pfm2pwm.html/comment-page-1#comment-42456</link>
		<dc:creator>realzhang</dc:creator>
		<pubDate>Sat, 22 Dec 2007 07:33:56 +0000</pubDate>
		<guid isPermaLink="false">http://www.bogdan.org.ua/2006/09/11/position-frequency-matrix-to-position-weight-matrix-pfm2pwm.html#comment-42456</guid>
		<description>to Bogdan:

Yes, the result email was delayed by about 2 days, and the &quot;submitted&quot; and &quot;finished&quot; notifications were delivered at the same time. In the end, I did receive the mail.

But the result page seems blank. Could you check it for me? http://biomed.org.ua/COTRASIF/results/PWM/10_2007-12-20.txt</description>
		<content:encoded><![CDATA[<p><!-- google_ad_section_start -->to Bogdan:</p>
<p>Yes, the result email was delayed by about 2 days, and the "submitted" and "finished" notifications were delivered at the same time. In the end, I did receive the mail.</p>
<p>But the result page seems blank. Could you check it for me? <a href="http://biomed.org.ua/COTRASIF/results/PWM/10_2007-12-20.txt" rel="nofollow">http://biomed.org.ua/COTRASIF/results/PWM/10_2007-12-20.txt</a><!-- google_ad_section_end --></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Bogdan</title>
		<link>http://bogdan.org.ua/2006/09/11/position-frequency-matrix-to-position-weight-matrix-pfm2pwm.html/comment-page-1#comment-42420</link>
		<dc:creator>Bogdan</dc:creator>
		<pubDate>Fri, 21 Dec 2007 06:51:53 +0000</pubDate>
		<guid isPermaLink="false">http://www.bogdan.org.ua/2006/09/11/position-frequency-matrix-to-position-weight-matrix-pfm2pwm.html#comment-42420</guid>
		<description>Realzhang,

did you get the link to the results file for the task you submitted? As noted on the task submission page, Gmail and Yahoo sometimes either reject emails from our processing server or delay their delivery for several days.

Your task was completed 5 minutes after submission, but I wonder if the notification got through to your inbox.</description>
		<content:encoded><![CDATA[<p><!-- google_ad_section_start -->Realzhang,</p>
<p>did you get the link to the results file for the task you submitted? As noted on the task submission page, Gmail and Yahoo sometimes either reject emails from our processing server or delay their delivery for several days.</p>
<p>Your task was completed 5 minutes after submission, but I wonder if the notification got through to your inbox.<!-- google_ad_section_end --></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Bogdan</title>
		<link>http://bogdan.org.ua/2006/09/11/position-frequency-matrix-to-position-weight-matrix-pfm2pwm.html/comment-page-1#comment-42293</link>
		<dc:creator>Bogdan</dc:creator>
		<pubDate>Wed, 19 Dec 2007 16:07:45 +0000</pubDate>
		<guid isPermaLink="false">http://www.bogdan.org.ua/2006/09/11/position-frequency-matrix-to-position-weight-matrix-pfm2pwm.html#comment-42293</guid>
		<description>Realzhang,

there&#039;s some ambiguity in your question on PWM similarity threshold.

If you are interested in comparing PWM matrices, then I&#039;d suggest &lt;a href=&quot;http://bioinformatics.oxfordjournals.org/cgi/content/abstract/21/3/307&quot; rel=&quot;nofollow&quot;&gt;Similarity of position frequency matrices for transcription factor binding sites&lt;/a&gt;.

However, if you mean matrix-to-sequence similarity (the threshold/cut-off problem, which arises when looking for TFBSs with a known PFM/PWM), then it&#039;s a complicated issue. From what I have previously seen in the literature, a similarity (relative score) of 0.75 is often used. Based on my little &lt;a href=&quot;http://biopolymers.org.ua/2007/04/BTTokovenko.pdf&quot; rel=&quot;nofollow&quot;&gt;research&lt;/a&gt; (look for Fig. 1 and the explanations in the text), for ISRE TFBSs in &lt;em&gt;Rattus norvegicus&lt;/em&gt; promoters a 0.75 similarity cut-off includes some 2/3 of all the maximal-scoring ISREs in all rat promoters. Though I used the 0.8 similarity cut-off, I now think that 0.75 (or even 0.7, given enough post-processing) is much more favourable. (Note: Fig. 1 in that PDF has some possibly important theoretical flaws, but it&#039;s a fair representation of the actual maximal matrix-sequence scores obtained for promoters and exons in rat.)

I think that a TFBS search by itself, no matter how you optimize the threshold, will not give biologically valid results. (The only exception I can think of right now is developing some algorithm which would automatically adjust the cut-off individually for each searched promoter or even each individual searched sub-sequence; however, it&#039;s unclear what the criteria for such an adjustment should be.) Thus, the best approach would be to use the lowest meaningful cut-off, and then apply a series of filters (post-processors) to refine the result set. One way to do that is to employ phylogenetic information and evolutionary sequence conservation/divergence.

The UCSC link you gave me tries to do just that. (By the way, the only interesting thing in their calculations of PWM-sequence scores is the calculation of Z-score - this I haven&#039;t met before, and that&#039;s something to evaluate.)

Actually, I&#039;m nearly done developing a genome-wide TFBS finder web-tool (&lt;a href=&quot;http://biomed.org.ua/COTRASIF/&quot; rel=&quot;nofollow&quot;&gt;COTRASIF&lt;/a&gt;), which also relies on the inter-species evolutionary conservation of sequences - but I do that somewhat differently from the method described at UCSC. I&#039;ll make an official &quot;COTRASIF opening&quot; post in the near future, when all the initial features are complete and there is a sufficient description of the tool.

Meanwhile, if you are interested, you may join the development of COTRASIF. This isn&#039;t a paying job (the project, at least currently, is not commercial), but it just might fit your interests. And there are huge and challenging plans for the future :) (including additional results filtering by the DNA 3D-structure.... but psst, I didn&#039;t say that!)</description>
		<content:encoded><![CDATA[<p><!-- google_ad_section_start -->Realzhang,</p>
<p>there's some ambiguity in your question on PWM similarity threshold.</p>
<p>If you are interested in comparing PWM matrices, then I'd suggest <a href="http://bioinformatics.oxfordjournals.org/cgi/content/abstract/21/3/307" rel="nofollow">Similarity of position frequency matrices for transcription factor binding sites</a>.</p>
<p>However, if you mean matrix-to-sequence similarity (the threshold/cut-off problem, which arises when looking for TFBSs with a known PFM/PWM), then it's a complicated issue. From what I have previously seen in the literature, a similarity (relative score) of 0.75 is often used. Based on my little <a href="http://biopolymers.org.ua/2007/04/BTTokovenko.pdf" rel="nofollow">research</a> (look for Fig. 1 and the explanations in the text), for ISRE TFBSs in <em>Rattus norvegicus</em> promoters a 0.75 similarity cut-off includes some 2/3 of all the maximal-scoring ISREs in all rat promoters. Though I used the 0.8 similarity cut-off, I now think that 0.75 (or even 0.7, given enough post-processing) is much more favourable. (Note: Fig. 1 in that PDF has some possibly important theoretical flaws, but it's a fair representation of the actual maximal matrix-sequence scores obtained for promoters and exons in rat.)</p>
<p>I think that a TFBS search by itself, no matter how you optimize the threshold, will not give biologically valid results. (The only exception I can think of right now is developing some algorithm which would automatically adjust the cut-off individually for each searched promoter or even each individual searched sub-sequence; however, it's unclear what the criteria for such an adjustment should be.) Thus, the best approach would be to use the lowest meaningful cut-off, and then apply a series of filters (post-processors) to refine the result set. One way to do that is to employ phylogenetic information and evolutionary sequence conservation/divergence.</p>
<p>The UCSC link you gave me tries to do just that. (By the way, the only interesting thing in their calculations of PWM-sequence scores is the calculation of Z-score - this I haven't met before, and that's something to evaluate.)</p>
<p>Actually, I'm nearly done developing a genome-wide TFBS finder web-tool (<a href="http://biomed.org.ua/COTRASIF/" rel="nofollow">COTRASIF</a>), which also relies on the inter-species evolutionary conservation of sequences - but I do that somewhat differently from the method described at UCSC. I'll make an official "COTRASIF opening" post in the near future, when all the initial features are complete and there is a sufficient description of the tool.</p>
<p>Meanwhile, if you are interested, you may join the development of COTRASIF. This isn't a paying job (the project, at least currently, is not commercial), but it just might fit your interests. And there are huge and challenging plans for the future <img src='http://bogdan.org.ua/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  (including additional results filtering by the DNA 3D-structure.... but psst, I didn't say that!)<!-- google_ad_section_end --></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: realzhang</title>
		<link>http://bogdan.org.ua/2006/09/11/position-frequency-matrix-to-position-weight-matrix-pfm2pwm.html/comment-page-1#comment-42292</link>
		<dc:creator>realzhang</dc:creator>
		<pubDate>Wed, 19 Dec 2007 15:11:50 +0000</pubDate>
		<guid isPermaLink="false">http://www.bogdan.org.ua/2006/09/11/position-frequency-matrix-to-position-weight-matrix-pfm2pwm.html#comment-42292</guid>
		<description>Bogdan:
Thanks for your reply. I&#039;ve seen the use of ln() at &lt;a href=&quot;http://expressome.kobic.re.kr/wita/document-1.jsp&quot; rel=&quot;nofollow&quot;&gt;WITA&lt;/a&gt;, which simply sets the pseudocount to 1. After reading your reply, I think there is no significant difference between log2 and ln() as long as all the matrices use the same base.

By the way, do you have any good ideas on how to determine the threshold for PWM similarity? &lt;a href=&quot;http://genome.ucsc.edu/cgi-bin/hgTables?db=hg18&amp;hgta_group=regulation&amp;hgta_track=tfbsConsSites&amp;hgta_table=tfbsConsSites&amp;hgta_doSchema=describe+table+schema&quot; rel=&quot;nofollow&quot;&gt;genome.UCSC&lt;/a&gt; uses an interesting statistical method; what do you think about this problem?</description>
		<content:encoded><![CDATA[<p><!-- google_ad_section_start -->Bogdan:<br />
Thanks for your reply. I've seen the use of ln() at <a href="http://expressome.kobic.re.kr/wita/document-1.jsp" rel="nofollow">WITA</a>, which simply sets the pseudocount to 1. After reading your reply, I think there is no significant difference between log2 and ln() as long as all the matrices use the same base.</p>
<p>By the way, do you have any good ideas on how to determine the threshold for PWM similarity? <a href="http://genome.ucsc.edu/cgi-bin/hgTables?db=hg18&amp;hgta_group=regulation&amp;hgta_track=tfbsConsSites&amp;hgta_table=tfbsConsSites&amp;hgta_doSchema=describe+table+schema" rel="nofollow">genome.UCSC</a> uses an interesting statistical method; what do you think about this problem?<!-- google_ad_section_end --></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Bogdan</title>
		<link>http://bogdan.org.ua/2006/09/11/position-frequency-matrix-to-position-weight-matrix-pfm2pwm.html/comment-page-1#comment-42109</link>
		<dc:creator>Bogdan</dc:creator>
		<pubDate>Tue, 18 Dec 2007 12:58:06 +0000</pubDate>
		<guid isPermaLink="false">http://www.bogdan.org.ua/2006/09/11/position-frequency-matrix-to-position-weight-matrix-pfm2pwm.html#comment-42109</guid>
		<description>Realzhang,

it should be log&lt;sub&gt;2&lt;/sub&gt;(). For an explanation of why, please see the &quot;Bioinformatics&quot; book by David W. Mount (section on PSSM information content).

In short, log&lt;sub&gt;2&lt;/sub&gt;() is used to determine the uncertainty (entropy) and information content, thus it is also used for the PFM2PWM conversion.

However, in the resources cited in the post both base 2 and base e logarithms are used (see e.g. &lt;a href=&quot;http://tfbs.genereg.net/DOC/TFBS/Matrix/PFM.html#CODE9&quot; rel=&quot;nofollow&quot;&gt;Perl modules documentation for TFBS 0.5&lt;/a&gt; and Jason&#039;s &lt;a href=&quot;http://home.cc.umanitoba.ca/~umhamlin/PLNT7690/presentation/5.html&quot; rel=&quot;nofollow&quot;&gt;slide 5&lt;/a&gt;, which, in turn, cites &lt;a href=&quot;http://www.cs.utsa.edu/~hugelab/res/20060912_ak_paper.pdf&quot; rel=&quot;nofollow&quot;&gt;Wasserman, 2004&lt;/a&gt;; actually, the only reference to using ln() is in &lt;a href=&quot;http://rsat.ulb.ac.be/rsat/help.convert-matrix.html&quot; rel=&quot;nofollow&quot;&gt;Hertz, 1999&lt;/a&gt;).</description>
		<content:encoded><![CDATA[<p><!-- google_ad_section_start -->Realzhang,</p>
<p>it should be log<sub>2</sub>(). For an explanation of why, please see the "Bioinformatics" book by David W. Mount (section on PSSM information content).</p>
<p>In short, log<sub>2</sub>() is used to determine the uncertainty (entropy) and information content, thus it is also used for the PFM2PWM conversion.</p>
<p>However, in the resources cited in the post both base 2 and base e logarithms are used (see e.g. <a href="http://tfbs.genereg.net/DOC/TFBS/Matrix/PFM.html#CODE9" rel="nofollow">Perl modules documentation for TFBS 0.5</a> and Jason's <a href="http://home.cc.umanitoba.ca/~umhamlin/PLNT7690/presentation/5.html" rel="nofollow">slide 5</a>, which, in turn, cites <a href="http://www.cs.utsa.edu/~hugelab/res/20060912_ak_paper.pdf" rel="nofollow">Wasserman, 2004</a>; actually, the only reference to using ln() is in <a href="http://rsat.ulb.ac.be/rsat/help.convert-matrix.html" rel="nofollow">Hertz, 1999</a>).<!-- google_ad_section_end --></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: realzhang</title>
		<link>http://bogdan.org.ua/2006/09/11/position-frequency-matrix-to-position-weight-matrix-pfm2pwm.html/comment-page-1#comment-42088</link>
		<dc:creator>realzhang</dc:creator>
		<pubDate>Tue, 18 Dec 2007 06:08:56 +0000</pubDate>
		<guid isPermaLink="false">http://www.bogdan.org.ua/2006/09/11/position-frequency-matrix-to-position-weight-matrix-pfm2pwm.html#comment-42088</guid>
		<description>I think it should be ln, i.e. log(e,x), not log2, shouldn&#039;t it?</description>
		<content:encoded><![CDATA[<p><!-- google_ad_section_start -->I think it should be ln, i.e. log(e,x), not log2, shouldn't it?<!-- google_ad_section_end --></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: PFM2PWM: which &#8220;nucleotide background frequency&#8221; to use &#187; Autarchy of the Private Cave</title>
		<link>http://bogdan.org.ua/2006/09/11/position-frequency-matrix-to-position-weight-matrix-pfm2pwm.html/comment-page-1#comment-5070</link>
		<dc:creator>PFM2PWM: which &#8220;nucleotide background frequency&#8221; to use &#187; Autarchy of the Private Cave</dc:creator>
		<pubDate>Thu, 03 May 2007 14:53:58 +0000</pubDate>
		<guid isPermaLink="false">http://www.bogdan.org.ua/2006/09/11/position-frequency-matrix-to-position-weight-matrix-pfm2pwm.html#comment-5070</guid>
		<description>[...] Position Frequency Matrix to Position Weight Matrix (PFM2PWM) [...]</description>
		<content:encoded><![CDATA[<p><!-- google_ad_section_start -->[...] Position Frequency Matrix to Position Weight Matrix (PFM2PWM) [...]<!-- google_ad_section_end --></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: chronos</title>
		<link>http://bogdan.org.ua/2006/09/11/position-frequency-matrix-to-position-weight-matrix-pfm2pwm.html/comment-page-1#comment-401</link>
		<dc:creator>chronos</dc:creator>
		<pubDate>Tue, 20 Mar 2007 06:40:56 +0000</pubDate>
		<guid isPermaLink="false">http://www.bogdan.org.ua/2006/09/11/position-frequency-matrix-to-position-weight-matrix-pfm2pwm.html#comment-401</guid>
		<description>Jason, citing this page is perfectly OK. &quot;Pseudocount function&quot; was also the name used for the sqrt(N) part in the source where I first encountered the formula.

Thanks for the reference to Wasserman, 2004.</description>
		<content:encoded><![CDATA[<p><!-- google_ad_section_start -->Jason, citing this page is perfectly OK. "Pseudocount function" was also the name used for the sqrt(N) part in the source where I first encountered the formula.</p>
<p>Thanks for the reference to Wasserman, 2004.<!-- google_ad_section_end --></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jason</title>
		<link>http://bogdan.org.ua/2006/09/11/position-frequency-matrix-to-position-weight-matrix-pfm2pwm.html/comment-page-1#comment-400</link>
		<dc:creator>Jason</dc:creator>
		<pubDate>Tue, 20 Mar 2007 05:57:26 +0000</pubDate>
		<guid isPermaLink="false">http://www.bogdan.org.ua/2006/09/11/position-frequency-matrix-to-position-weight-matrix-pfm2pwm.html#comment-400</guid>
		<description>Hey, thanks for the formula. You&#039;re using the common method of sqrt(N) to compensate for small sample sizes. It turns out that this is still commonly used - see Box 2 in Wasserman, 2004, &quot;Applied bioinformatics for the identification of regulatory elements&quot;. Wasserman uses the exact formula you have here but describes it as using a &quot;pseudocount function&quot;. There are other methods for compensating for small samples, plus more stuff for nucleotide frequencies in regulatory regions, but the formula on this page is... succinct. I&#039;m making a presentation for class and I&#039;ve cited your page, if that&#039;s OK.

-j</description>
		<content:encoded><![CDATA[<p><!-- google_ad_section_start -->Hey, thanks for the formula. You're using the common method of sqrt(N) to compensate for small sample sizes. It turns out that this is still commonly used - see Box 2 in Wasserman, 2004, "Applied bioinformatics for the identification of regulatory elements". Wasserman uses the exact formula you have here but describes it as using a "pseudocount function". There are other methods for compensating for small samples, plus more stuff for nucleotide frequencies in regulatory regions, but the formula on this page is... succinct. I'm making a presentation for class and I've cited your page, if that's OK.</p>
<p>-j<!-- google_ad_section_end --></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: PFM2PWM: which &#8220;nucleotide background frequency&#8221; to use &#187; Autarchy of the Private Cave</title>
		<link>http://bogdan.org.ua/2006/09/11/position-frequency-matrix-to-position-weight-matrix-pfm2pwm.html/comment-page-1#comment-18</link>
		<dc:creator>PFM2PWM: which &#8220;nucleotide background frequency&#8221; to use &#187; Autarchy of the Private Cave</dc:creator>
		<pubDate>Wed, 18 Oct 2006 20:39:20 +0000</pubDate>
		<guid isPermaLink="false">http://www.bogdan.org.ua/2006/09/11/position-frequency-matrix-to-position-weight-matrix-pfm2pwm.html#comment-18</guid>
		<description>[...] Position Frequency Matrix to Position Weight Matrix (PFM2PWM); search-hilite.php file; Selecting notebook model to suit your needs [...]</description>
		<content:encoded><![CDATA[<p><!-- google_ad_section_start -->[...] Position Frequency Matrix to Position Weight Matrix (PFM2PWM); search-hilite.php file; Selecting notebook model to suit your needs [...]<!-- google_ad_section_end --></p>
]]></content:encoded>
	</item>
</channel>
</rss>
