Autarchy of the Private Cave

Tiny bits of bioinformatics, [web-]programming etc

    • Archives

    • Recent comments

    Archive for the 'Programming' Category

    Convert MySQL database from one encoding/collation into another

    8th February 2008

    Most frequent use: convert database from latin1_swedish to utf8_general_ci.
    Original script found at: MySQL and UTF-8.

    Update: the original script had an error, it would generate queries likes this one (note the bold part):

    ALTER TABLE `links` CHANGE `link_rel` `link_rel` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL DEFAULT NULL;

    This is clearly wrong syntax (and logic). I fixed this by making comparison to NULL strict (with three equal signs instead of two):

    // Does the field default to null, a string, or nothing?
    if ($row['Default'] === NULL)

    Update 2: based on comment by banesto, I modified the script; now it does not require specifying the from_collation, it’s sufficient to specify to_collation (which will be used for all the fields and tables). The modified code is:

    if ($row['Collation'] == ” || $row['Collation'] == $convert_to)
    continue;

    Update 3: the long-lasting, re-appearing NOT NULL DEFAULT NULL problem is finally fixed.

    Update 4: incorporated Russ’s fix to skip numeric fields (in order to leave autoincrement values intact).

    Here’s the script itself: (to copy-paste: first click the “Plain text” header)
    Read the rest of this entry »

    Share

    Posted in Links, Notepad, PHP, Programming, Web | 60 Comments »

    MySQL: INSERT IF NOT EXISTS syntax

    18th October 2007

    To start: as of the latest MySQL, syntax presented in the title is not possible. But there are several very easy ways to accomplish what is expected using existing functionality.

    There are 3 possible solutions: using INSERT IGNORE, REPLACE, or INSERT … ON DUPLICATE KEY UPDATE.

    Imagine we have a table:

    1. CREATE TABLE `transcripts` (
    2.  `ensembl_transcript_id` varchar(20) NOT NULL,
    3.  `transcript_chrom_start` int(10) unsigned NOT NULL,
    4.  `transcript_chrom_end` int(10) unsigned NOT NULL,
    5.  PRIMARY KEY  (`ensembl_transcript_id`)
    6. ) ENGINE=InnoDB DEFAULT CHARSET=latin1;

    Now imagine that we have an automatic pipeline importing transcripts meta-data from Ensembl, and that due to various reasons the pipeline might be broken at any step of execution. Thus, we need to ensure two things: 1) repeated executions of the pipeline will not destroy our database, and 2) repeated executions will not die due to ‘duplicate primary key’ errors.

    Method 1: using REPLACE
    Read the rest of this entry »

    Share

    Posted in Programming | 46 Comments »

    GoDaddy: undocumented 20-second CPU time maximal execution limit? (python, ELF, etc)

    16th October 2007

    Today, setting up a relatively serious (in CPU resources needed) web-system, I ran into a weird problem of python scripts ending prematurely. After some investigation, it looked like any process which uses up more than 20 seconds of CPU time, is automatically killed. To verify this, I wrote an infinite loop in C,

    int main () {
    unsigned int i;

    for (i = 0; i < 2 ; i++ ) { i = 0; } return 0; } [/c] compiled it and executed several times on the GoDaddy shared hosting server. I did observe the program running for the maximum of 20 seconds of CPU time, not a second more. Please note, that 20 seconds of CPU time can be much more of “real” time, if the script isn’t using 100% of CPU, which often the case for shared hosting. Thus if you have in your php.ini max_execution_time set to, say, 60 seconds, your php script may actually execute as long as one minute; but I’m pretty sure that if your script has lots of CPU-intensive procedures, then as soon as it uses 20 seconds of CPU time, it will be terminated (however, this statement still needs checking – anyone?). To verify, I also created a cron job with the same file. It ran for 30 seconds CPU time. Strangely, this behaviour is not documented anywhere. This limit may also explain a number of other problems, if you have heavy web-applications: they just might be killed before they are finished, causing errors. I do understand the reason for this limitation, and am sure similar limitations exist in other shared hosting environments. The only important thing here is that this limit should have been documented and even put upfront somewhere in the hosting plans descriptions. I also wonder if the limit is the same for all godaddy shared hosting plans, or if it differs. 20 seconds when executed from PHP, and 30 seconds when executed as a cron job were observed on the Deluxe Linux Hosting plan. Extensions, additions and comments are welcome.

    Share

    Posted in *nix, Misc, Programming, Web | 22 Comments »

    Useful Python looping techniques

    26th September 2007

    These are all excerpts from the Python documentation.

    To synchronously and simultaneously loop over two sequences:

    1. questions = ['name', 'quest', 'favourite colour']
    2. answers = ['Lancelot', 'the holy grail', 'blue']
    3.  
    4. for q, a in zip(questions, answers):
    5.     print 'What is your %s?  It is %s.' % (q, a)

    To loop over a sequence with both key and value:
    Read the rest of this entry »

    Share

    Posted in Programming, Python | No Comments »

    How to sort Python dict (dictionary)

    26th September 2007

    Sample script (copypasted from Well House Consultants training course):
    click the PLAIN TEXT header for copy-pasteable version

    1. #!/usr/local/bin/python
    2.  
    3. author = {"php":"Rasmus Lerdorf",\
    4.     "perl":"Larry Wall",\
    5.     "tcl":"John Ousterhout",\
    6.     "awk":"Brian Kernighan",\
    7.     "java":"James Gosling",\
    8.     "parrot":"Simon Cozens",\
    9.     "python":"Guido van Rossum"}
    10.  
    11. langs = author.keys()
    12. langs.sort()
    13.  
    14. for language in langs:
    15.     print language,"is the child of",author[language]

    You can also define the Python ksort() function similar to that found in PHP:
    Read the rest of this entry »

    Share

    Posted in Programming, Python | 1 Comment »

    PHP proxy

    19th September 2007

    PHP proxy is simple but good. I converted it into a proxy-function for one of my projects.

    Do pay attention to the comments, especially these two:

    I had issues with this script (and others) returning 0 for the bytesTotal in flash. Basically, the Content-Length header was absent from the response. By simply adding
    header(“Content-length: “.strlen($response)) before the echo, it resolved the issue. I don’t know if there is a more appropriate fix to account for character encoding, etc, but it seems to work.

    @Schimmi: Well, if you can add some checks there (like who is referring your script) and allow the access to whitelisted clients (served from your domain)… I think, you can totally make it used applications from same-domain….So it would not be open to world. Yeah above script doesn’t have those things.

    Share

    Posted in Links, Notepad, PHP, Programming | No Comments »

    MySQL – Python: good MySQLdb tutorial (examples)

    7th September 2007

    Andy Dustman (used to blog at dustman.net) gave a presentation on Python and MySQL at the MySQL Users Conference 2005, Santa Clara, CA. The presentation is an excellent collection of examples for those who use the MySQLdb Python module.
    Read the rest of this entry »

    Share

    Posted in Links, Notepad, Programming, Python | No Comments »