Autarchy of the Private Cave

Tiny bits of bioinformatics, [web-]programming etc

    Archive for September, 2006

    Mean, standard deviation, and stem-and-leaf plot

    14th September 2006

    I am doing some simple statistics at the moment and had to review a few basic concepts, such as the standard deviation.
    Here are my notes, for myself and anyone else interested.

    The mean is simply the sum of all your numerical observations divided by the number of observations. E.g., if you have measured how tall your 5 children are and got the values 1.42, 1.56, 1.05, 1.89 and 1.92, the “mean height” of your children is x = (1.42 + 1.56 + 1.05 + 1.89 + 1.92) / 5 = 7.84 / 5 = 1.568 (all values in metres).

    The mean by itself doesn’t tell you much, however. If all you had was the mean of 1.568, you would not even know the range of heights.

    The standard deviation helps with this. First, it is measured in the same units as the initial data, i.e. metres in our example. Second, it gives you an idea of how strongly the measured values in your sample differ from each other: the larger the deviation, the more widely the values are spread around the mean.
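
    As a minimal sketch, the two quantities can be computed in Python for the five heights above. The excerpt does not say whether the sum of squared deviations is divided by n or by n - 1; the sample form (n - 1) is assumed here, and the variable names are purely illustrative.

```python
import math
import statistics

heights = [1.42, 1.56, 1.05, 1.89, 1.92]  # metres, from the example above

# Mean: sum of the observations divided by their number.
mean = sum(heights) / len(heights)

# Sample standard deviation (assumption: n - 1 in the denominator).
squared_deviations = [(h - mean) ** 2 for h in heights]
sample_sd = math.sqrt(sum(squared_deviations) / (len(heights) - 1))

# The standard library agrees with the hand-written formulas.
assert math.isclose(mean, statistics.mean(heights))
assert math.isclose(sample_sd, statistics.stdev(heights))

print(f"mean = {mean:.3f} m, sample SD = {sample_sd:.3f} m")
```

    With these values the standard deviation comes out at roughly 0.36 m, which is large relative to a mean of 1.568 m and reflects how far 1.05 sits from the rest. Dividing by n instead (statistics.pstdev) gives a slightly smaller figure.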
    Read the rest of this entry »


    Posted in Science | No Comments »

    A ton of humour

    13th September 2006

    Sitting in front of my PC late at night, doing something boring, I stumbled upon the albinoblacksheep website. The two things I read were Cyber Sex (with a kind of follow-up) and the probably old but still funny Automation: A Way of the Past. I liked them, so here are the links :) .
    And if you’re a Star Wars fan, you MUST watch ASCII Star Wars, for the Force to stay with you ;-)

    I didn’t read anything else there, but it looks like a funny resource.

    Hope you enjoy it :)


    Posted in Notepad, Web | No Comments »

    PFM2PWM: which “nucleotide background frequency” to use

    13th September 2006

    As I previously mentioned, when converting a PFM to a PWM one variable, the [prior] background nucleotide frequency, was ambiguous to me. From other articles I noticed that it is usually set to 0.25 (1/4, because there are 4 nucleotides, so in a “perfectly random” sequence each of them would appear in 25% of positions). In that post I also considered using the “real” background frequency of nucleotides, calculated from the sequence to which the matrix is to be applied.

    I wrote a program to search all the 1000-basepair upstream sequences of all human, rat and mouse genes present in Ensembl database release 40 (assuming those 1 kb upstream regions to be the “promoters” of the genes). For each promoter, only the single best score was returned. I then plotted the distribution of the number of promoters (y-axis) against the best match score (x-axis). I ran the search twice: once with p(b) = 0.25, and once with p(b) set to the calculated A/C/G/T content of each promoter.
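
    For illustration only (this is not the program described above), the two choices of p(b) can be sketched in Python. The function name, the toy sequence and the decision to ignore non-A/C/G/T characters such as N are all assumptions.

```python
from collections import Counter


def background_frequencies(sequence, alphabet="ACGT"):
    """Return the fraction of each nucleotide among the A/C/G/T in `sequence`."""
    # Assumption: ambiguous characters such as N are simply skipped.
    counts = Counter(base for base in sequence.upper() if base in alphabet)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T characters")
    return {base: counts[base] / total for base in alphabet}


# Choice 1: the same p(b) = 0.25 for every nucleotide.
uniform_background = {base: 0.25 for base in "ACGT"}

# Choice 2: p(b) measured on the sequence being scanned (made-up toy promoter).
toy_promoter = "GGCGCGATGC"
measured_background = background_frequencies(toy_promoter)
print(measured_background)
```

    For a GC-rich sequence like the toy one, the measured p(G) and p(C) are well above 0.25, so G and C matches contribute less to a log-odds score than they would under the uniform background.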
    Read the rest of this entry »


    Posted in Bioinformatics | 1 Comment »

    Position Frequency Matrix to Position Weight Matrix (PFM2PWM)

    11th September 2006

    In the course of my current research, I have been dealing with searching for TFBS (Transcription Factor Binding Sites). To perform the search, one needs a position weight matrix (PWM) for each TFBS. The TRANSFAC database of transcription factors (and matrices), however, gives you a position frequency matrix (PFM), which you then need to convert into a PWM.

    I did find a couple of conversion formulas, but it took quite an effort to figure out which one was correct, as I had seen two different variations of the formula. Here I share what I found.
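
    To give an idea of what such a conversion involves, here is a minimal Python sketch of one common log-odds form with a pseudocount. It is not necessarily either of the two variations referred to above; the function name, the data layout (one dict of counts per position) and the way the pseudocount is split according to p(b) are assumptions.

```python
import math

BASES = "ACGT"


def pfm_to_pwm(pfm, background=None, pseudocount=1.0):
    """Convert per-position nucleotide counts into log2-odds weights.

    pfm         -- list of dicts, one per position, mapping base -> count
    background  -- dict mapping base -> p(b); defaults to 0.25 for each base
    pseudocount -- total pseudocount per position, split according to p(b)
    """
    if background is None:
        background = {b: 0.25 for b in BASES}
    pwm = []
    for column in pfm:
        n = sum(column[b] for b in BASES)
        weights = {}
        for b in BASES:
            # Corrected probability of base b at this position.
            p = (column[b] + pseudocount * background[b]) / (n + pseudocount)
            # Log-odds against the background frequency.
            weights[b] = math.log2(p / background[b])
        pwm.append(weights)
    return pwm


# Made-up two-position matrix: a conserved A, then a fully uninformative column.
toy_pfm = [
    {"A": 8, "C": 0, "G": 0, "T": 0},
    {"A": 2, "C": 2, "G": 2, "T": 2},
]
pwm = pfm_to_pwm(toy_pfm)
for position, weights in enumerate(pwm, start=1):
    print(position, {b: round(w, 2) for b, w in weights.items()})
```

    The pseudocount keeps zero counts from producing log2(0), and a column whose counts match the background gets weights of 0, i.e. it neither helps nor hurts a match score.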
    Read the rest of this entry »


    Posted in Bioinformatics | 25 Comments »

    PHP-Nuke 6.0/6.5 to Drupal 4.7.x/5.x migration (conversion)

    8th September 2006

    Post last updated: April 18, 2010.

    Now there is a Drupal 6.x module available. It is in no way related to the migrate script(s) below.

    The newest script version migrates from PHP-Nuke 6.5 to Drupal 5.x.
    Download the latest version of the migration script.

    In 2002 I set up a portal based on PHPNuke-6.0. Eventually it died because neither I nor my collaborators had the time to invest in and support it. Now that the time has come to revive the project, I searched around and decided to use Drupal as the base CMS for the portal.
    In order to migrate userbase from an old portal to the new Drupal-powered one, and following the topic at, I found a script and its modification.
    I used it to migrate only users, and made some cosmetic changes:

    • added options for custom phpnuke table prefixes
    • default user name is now = uname (login), not ‘temp_name’, as before
    • I replaced hard-coded links to ‘migrate.php’ with links to $_SERVER['PHP_SELF'], so that if you rename the script you don’t have any problems with that :)
    • now forum topics should not be promoted to the main page (changed 1 to 0 as hinted by Alexis)

    Finally, I would like to thank both Karthik Kumar for the original script and Alexis Bellido for the 6.0_to_4.7 modification.
    Read the rest of this entry »


    Posted in CMS, Drupal, how-to, PHP, Programming, Web | 77 Comments »

    Allow posting duplicate form-name entries with different values

    6th September 2006

    Sometimes, when writing automated processors for HTML forms, you need to post several values under the same form field name, e.g.:
    collection_gene = str_chrom_name
    collection_gene = gene_stable_id

    Repeated field names are legal in HTML forms (checkbox groups and multi-select lists are submitted this way), and Ensembl, for example, relies on them. By default, however, HTTP_Client and HTTP_Request keep only one value per name (the latest one defined, which overrides the previous ones), and I spent some time figuring out how to make them submit multiple ‘name-value’ pairs instead. The solution is extremely simple:
    Read the rest of this entry »
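
    Independently of the PEAR classes, what has to end up in the request body can be sketched in Python with the standard library. This only illustrates the encoding; it is not the HTTP_Client fix from the full post.

```python
from urllib.parse import parse_qs, urlencode

# A list of (name, value) pairs keeps both entries for the repeated name.
fields = [
    ("collection_gene", "str_chrom_name"),
    ("collection_gene", "gene_stable_id"),
]
body = urlencode(fields)
print(body)

# A plain dict allows one value per key, so the later value overrides the
# earlier one: the same symptom as described above.
as_dict = dict(fields)
print(urlencode(as_dict))

# Parsing the body on the receiving side recovers both values.
print(parse_qs(body))
```

    The first print shows the field name twice, joined by “&”; the second shows only the last pair surviving.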


    Posted in Bioinformatics, how-to, PHP, Programming, Science | No Comments »

    Avoiding out of memory fatal error when using HTTP_Client or HTTP_Request

    6th September 2006

    If you fetch large amounts of data (e.g. over 2MB per request) using HTTP_Client (or HTTP_Request), you may get “out of memory” fatal errors, especially if:

    1. memory_limit is left at its default of 8M, and
    2. you process multiple pages using a single instance of the HTTP_Client object without resetting it.

    The problem typically shows up as a fatal error after a few cycles of successful page retrieval. When run with the same parameters, it always occurs after a constant or only slightly varying number of cycles.

    In my case the problem was that HTTP_Request (a dependency of HTTP_Client) was holding in memory all the previously fetched pages of the current session (the ‘history’ feature). To force HTTP_Request to hold only the most recent page, you need to ‘disable’ history after creating the HTTP_Client or HTTP_Request object instance:

    require_once 'HTTP/Client.php';
    // plain "new": the PHP 4-era "&new" is a parse error in PHP 7 and later
    $req = new HTTP_Client($params, $headers);
    // disable history to save memory
    $req->enableHistory(false);

    Hope this helps you.


    Posted in Bioinformatics, how-to, PHP, Programming, Science | No Comments »