Autarchy of the Private Cave

Tiny bits of bioinformatics, [web-]programming etc

    Weird orthology species names in Ensembl

    30th September 2008

    For the COTRASIF tool, I’ve been using the Ensembl Compara database (since release 47) to automatically import gene orthology mappings into COTRASIF.

    However, with the E!50 release, the Compara database was dropped.

    Looking for another way to get orthologs from Ensembl (using martservice), I tried the standard query: selecting the “Homologs” group on the “Attributes” page for a single-species database, and then selecting the appropriate second species to get orthology mappings.

    Imagine my surprise when I found attribute names like “cow_ensembl_gene”, not only in the interface but also in the generated XML file :-O

    I only need 11 species at the moment. Setting aside the sufficiently unique name mappings like “zebrafish” for Danio rerio, there are a number of questionable ones: “yeast” for S. cerevisiae (could be S. pombe), “rat” for R. norvegicus (could be R. rattus), “anopheles” for A. gambiae (could be some other Anopheles). Other mappings might also be non-unique, especially for people working with different species of the same genus.
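
    One workaround is to keep an explicit lookup table on the importing side. Below is a minimal Python sketch, assuming that every homolog attribute follows the “<common name>_ensembl_gene” pattern seen above (only “cow_ensembl_gene” is confirmed in this post). The table and the function name are hypothetical and cover only the species mentioned here.

```python
# Hypothetical lookup table: Ensembl common-name prefix -> binomial name.
# Only species mentioned in this post are listed; extend as needed.
COMMON_TO_BINOMIAL = {
    "cow": "Bos taurus",
    "zebrafish": "Danio rerio",
    "yeast": "Saccharomyces cerevisiae",
    "rat": "Rattus norvegicus",
    "anopheles": "Anopheles gambiae",
}

# Assumed naming pattern for homolog attributes.
ATTRIBUTE_SUFFIX = "_ensembl_gene"


def species_from_attribute(attribute: str) -> str:
    """Resolve e.g. 'rat_ensembl_gene' to 'Rattus norvegicus'."""
    if not attribute.endswith(ATTRIBUTE_SUFFIX):
        raise ValueError(f"unexpected attribute name: {attribute!r}")
    common = attribute[: -len(ATTRIBUTE_SUFFIX)]
    if common not in COMMON_TO_BINOMIAL:
        # Fail loudly instead of guessing which species is meant.
        raise KeyError(f"no binomial name recorded for {common!r}")
    return COMMON_TO_BINOMIAL[common]
```

    Failing on an unknown prefix is deliberate: a silent guess is exactly the ambiguity this post complains about.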

    Am I missing some system in this naming “convention”, or am I the only one who finds it strange?

    Is there a way not to use “common species names” when importing orthology data from Ensembl with the help of martservice?


    Posted in Bioinformatics, Science | No Comments »

    Convert MySQL database from one encoding/collation into another

    8th February 2008

    Most frequent use: convert a database from latin1_swedish_ci to utf8_general_ci.
    Original script found at: MySQL and UTF-8.

    Update: the original script had an error: it would generate queries like this one (note the contradictory NOT NULL DEFAULT NULL at the end):

    ALTER TABLE `links` CHANGE `link_rel` `link_rel` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL DEFAULT NULL;

    This is clearly wrong syntax (and logic). I fixed this by making the comparison to NULL strict (with three equal signs instead of two):

    // Does the field default to null, a string, or nothing?
    if ($row['Default'] === NULL)
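
    The same distinction exists outside PHP. Here is a minimal Python sketch of how a DEFAULT clause could be assembled so that an empty-string default is never confused with “no default”. It is not the script from this post: the function name, its parameters, and the rule of omitting DEFAULT for NOT NULL columns are assumptions made for illustration.

```python
from typing import Optional


def default_clause(default: Optional[str], nullable: bool) -> str:
    """Build the DEFAULT part of a column definition.

    `default` is None when the column has no default (an SQL NULL in the
    Default column of SHOW FULL COLUMNS); otherwise it is a string.
    """
    if default is None:
        # Strict check: only a real NULL lands here, never '' or '0'.
        # A NOT NULL column gets no DEFAULT at all, which avoids
        # emitting "NOT NULL DEFAULT NULL".
        return " DEFAULT NULL" if nullable else ""
    # Escape backslashes and single quotes for a MySQL string literal.
    escaped = default.replace("\\", "\\\\").replace("'", "''")
    return f" DEFAULT '{escaped}'"
```

    A truthiness test such as `if not default:` would repeat the original mistake, because an empty string and “0” would both be treated like NULL.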

    Update 2: based on a comment by banesto, I modified the script; it no longer requires specifying from_collation, and it’s sufficient to specify to_collation (which will be used for all the fields and tables). The modified code is:

    if ($row['Collation'] == '' || $row['Collation'] == $convert_to)

    Update 3: the long-lasting, re-appearing NOT NULL DEFAULT NULL problem is finally fixed.

    Update 4: incorporated Russ’s fix to skip numeric fields (in order to leave autoincrement values intact).
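
    The skip logic from Updates 2 and 4 can be sketched in Python as follows. This is an illustration, not Russ’s actual fix: it relies on SHOW FULL COLUMNS reporting a NULL collation for non-character columns, and the function and parameter names are hypothetical.

```python
from typing import Optional


def needs_conversion(collation: Optional[str], convert_to: str) -> bool:
    """Return True if a column should be converted to `convert_to`.

    Non-character columns (INT, DATE, ...) report no collation, so they
    are skipped, which also leaves AUTO_INCREMENT columns untouched.
    """
    if not collation:  # None or '': not a text column
        return False
    # Columns already using the target collation need no ALTER.
    return collation != convert_to


# Example rows as (column name, collation) pairs.
columns = [
    ("id", None),
    ("link_rel", "latin1_swedish_ci"),
    ("link_name", "utf8_general_ci"),
]
to_convert = [name for name, coll in columns if needs_conversion(coll, "utf8_general_ci")]
```

    With the sample rows above, only link_rel is selected for conversion.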

    Here’s the script itself: (to copy-paste: first click the “Plain text” header)
    Read the rest of this entry »


    Posted in Links, Notepad, PHP, Programming, Web | 59 Comments »


    18th October 2007

    To start: as of the latest MySQL, the syntax presented in the title is not possible. But there are several very easy ways to accomplish what is expected using existing functionality.

    There are 3 possible solutions: using INSERT IGNORE, REPLACE, or INSERT … ON DUPLICATE KEY UPDATE.

    Imagine we have a table:

    CREATE TABLE `transcripts` (
      `ensembl_transcript_id` varchar(20) NOT NULL,
      `transcript_chrom_start` int(10) unsigned NOT NULL,
      `transcript_chrom_end` int(10) unsigned NOT NULL,
      PRIMARY KEY  (`ensembl_transcript_id`)
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1;

    Now imagine that we have an automatic pipeline importing transcript metadata from Ensembl, and that for various reasons the pipeline might break at any step of execution. Thus, we need to ensure two things: 1) repeated executions of the pipeline will not destroy our database, and 2) repeated executions will not die due to ‘duplicate primary key’ errors.
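
    For orientation, here is a minimal Python sketch that builds the three statement forms named above for this table. The helper name is hypothetical, and the %s placeholders assume a DB-API driver that uses the “format” parameter style. Note also that newer MySQL releases deprecate VALUES() inside ON DUPLICATE KEY UPDATE in favour of a row alias.

```python
from typing import Dict, Sequence


def idempotent_inserts(table: str, columns: Sequence[str], key: str) -> Dict[str, str]:
    """Return the three MySQL statement forms for a re-runnable insert."""
    cols = ", ".join(f"`{c}`" for c in columns)
    marks = ", ".join(["%s"] * len(columns))
    base = f"INTO `{table}` ({cols}) VALUES ({marks})"
    # On a key clash, overwrite every non-key column with the new value.
    updates = ", ".join(f"`{c}` = VALUES(`{c}`)" for c in columns if c != key)
    return {
        # Keeps the existing row and skips the new one.
        "insert_ignore": f"INSERT IGNORE {base}",
        # Deletes the existing row, then inserts the new one.
        "replace": f"REPLACE {base}",
        # Updates the existing row in place.
        "on_duplicate_key_update": f"INSERT {base} ON DUPLICATE KEY UPDATE {updates}",
    }


statements = idempotent_inserts(
    "transcripts",
    ["ensembl_transcript_id", "transcript_chrom_start", "transcript_chrom_end"],
    key="ensembl_transcript_id",
)
```

    All three satisfy both requirements; they differ in what happens to a row that already exists, which matters if other tables reference it.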

    Method 1: using REPLACE
    Read the rest of this entry »


    Posted in Programming | 46 Comments »