Autarchy of the Private Cave

Tiny bits of bioinformatics, [web-]programming etc

    How to truncate git history (sample script included)

    28th March 2011

    Under a few assumptions (most importantly: you do not have any non-merged branches), it is very easy to throw away git repository commits older than an arbitrarily chosen commit.

    Here’s a sample script (call it e.g. git-truncate and put into your ~/bin or whichever location you have in PATH).


    #!/bin/bash
    git checkout --orphan temp "$1"
    git commit -m "Truncated history"
    git rebase --onto temp "$1" master
    git branch -D temp
    # The following 3 commands are optional - they keep your git repo in good shape.
    git reflog expire --expire=now --all # reflog entries still reference the old commits; expire them so prune can remove them
    git prune --progress # delete all the objects w/o references
    git gc --aggressive # aggressively collect garbage; may take a lot of time on large repos

    Invocation: cd to your repository, then git-truncate refspec, where refspec is either a commit’s SHA1 hash-id, or a tag.

    Expected result: a git repository starting with a “Truncated history” initial commit, and continuing to the tip of master (the branch name is hard-coded in the script, so change it if you need to truncate another branch).

    If you truncate repositories often, then consider adding an optional 2nd argument (truncate-commit message) and also some safeguards against improper use – currently, even if refspec is wrong, the script will not abort after a failed checkout.
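The safeguards and the optional message argument mentioned above could look roughly like the sketch below. It is only a sketch under stated assumptions: the function name git_truncate, the optional second argument, and the choice to rebase the currently checked-out branch instead of the hard-coded master are illustrative and are not part of the original script.

```shell
#!/bin/bash
# Hypothetical hardened variant of the script above, written as a shell function.
# Usage: git_truncate <commit-or-tag> [truncate-commit message]
git_truncate() {
    local start="$1"
    local message="${2:-Truncated history}"
    local branch

    if [ -z "$start" ]; then
        echo "usage: git_truncate <commit-or-tag> [message]" >&2
        return 1
    fi

    # Abort early if the argument does not name an existing commit.
    if ! git rev-parse --verify --quiet "${start}^{commit}" >/dev/null; then
        echo "not a commit: $start" >&2
        return 1
    fi

    # Remember the current branch; this fails (and aborts) on a detached HEAD.
    branch=$(git symbolic-ref --short HEAD) || return 1

    # Each step runs only if the previous one succeeded.
    git checkout --quiet --orphan temp "$start" &&
    git commit --quiet -m "$message" &&
    git rebase --quiet --onto temp "$start" "$branch" &&
    git branch --quiet -D temp
}
```

Called as git_truncate v1.0 "Start of public history" (the tag and message are placeholders), it should leave the current branch starting at the new root commit. The function deliberately leaves out the optional cleanup commands, so they can still be run by hand once the result has been checked.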

    Thanks for posting any improvements you may have.

    Source: Tekkub’s post on github discussions.
    See also: how to remove a single file from all of git’s commits.


    Posted in how-to, Notepad | 12 Comments »

    How to easily install any PyPI/easy_install Python module on Debian

    16th February 2011

    Imagine you need to install pycassa (which uses easy_install). Here are the 2 (at maximum) very simple steps to have it properly debianized and installed on your Debian/Ubuntu:

    • if you don’t have the python-stdeb package: sudo aptitude install python-stdeb
    • pypi-install pycassa

    That’s it.

    Refer to the stdeb README for more information. You will need it if the module has dependencies, which stdeb might not resolve automatically.

    Before stdeb, making a .deb from a Python module wasn’t exactly trivial.


    Posted in *nix, how-to, Notepad, Python, Software | 1 Comment »

    How to replace newlines with commas, tabs etc (merge lines)

    16th November 2010

    Imagine you need to get a few lines from a group of files with missing identifier mappings. I have a bunch of files with content similar to this one:

    ENSRNOG00000018677 1368832_at 25233
    ENSRNOG00000002079 1369102_at 25272
    ENSRNOG00000043451 25353
    ENSRNOG00000001527 1388013_at 25408
    ENSRNOG00000007390 1389538_at 25493

    In the example above I need '25353', which does not have a corresponding affy_probeset_id in the 2nd column.

    It is clear how to do that:

    sort -u *_affy_ensembl.txt | grep -v '_at' | awk '{print $2}'

    This outputs a column of required IDs (EntrezGene in this example):

    116720
    679845
    309295
    364867
    298220
    298221
    25353

    However, I need these IDs as a comma-separated list, not as a newline-separated list.

    There are several ways to achieve the desired result (only the last pipe commands differ):

    sort -u *_affy_ensembl.txt | grep -v '_at' | awk '{print $2}' | gawk '$1=$1' ORS=', '
    sort -u *_affy_ensembl.txt | grep -v '_at' | awk '{print $2}' | tr '\n' ','
    sort -u *_affy_ensembl.txt | grep -v '_at' | awk '{print $2}' | sed ':a;N;$!ba;s/\n/, /g'
    sort -u *_affy_ensembl.txt | grep -v '_at' | awk '{print $2}' | sed ':q;N;s/\n/, /g;t q'
    sort -u *_affy_ensembl.txt | grep -v '_at' | awk '{print $2}' | paste -s -d ","

    These solutions differ in efficiency and (slightly) in output: the gawk and tr variants leave a trailing separator after the last ID, while the sed and paste variants do not. sed will read all the input into its buffer to replace newlines with other separators, so it might not be best for large files. tr might be the most efficient, but I haven’t tested that. paste cycles through its delimiter list one character per join, so you cannot really get comma-space “, ” separation with it alone.
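The output differences are easy to see on a tiny sample. The sketch below uses three of the IDs listed above in place of the real pipeline output; the variable names are illustrative. The last line shows one possible workaround (not from the original sources) for getting “, ” out of paste by post-processing its output with sed.

```shell
# Three sample IDs, one per line, standing in for the output of the awk step.
ids=$(printf '%s\n' 116720 679845 25353)

# tr converts every newline, including the last one, so a trailing comma remains.
with_tr=$(printf '%s\n' "$ids" | tr '\n' ',')

# paste -s joins the lines and does not add a separator after the last one.
with_paste=$(printf '%s\n' "$ids" | paste -s -d ',' -)

# Assumed workaround: widen each comma into comma-space after joining.
with_sed=$(printf '%s\n' "$ids" | paste -s -d ',' - | sed 's/,/, /g')

printf '%s\n' "$with_tr" "$with_paste" "$with_sed"
# prints:
# 116720,679845,25353,
# 116720,679845,25353
# 116720, 679845, 25353
```

The workaround is only safe because the IDs themselves never contain commas.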

    Sources: linuxquestions 1 (explains used sed commands), linuxquestions 2, nixcraft.


    Posted in *nix, Bioinformatics, how-to, Notepad, Software | 2 Comments »

    Linux: how to label swap partition w/o losing swap UUID

    16th July 2010

    In short: sudo mkswap -L new_swap_label -U old_swap_UUID /dev/sd_swap_device.
    If you don’t care about the UUID: just sudo mkswap -L new_swap_label /dev/sd_swap_device.


    Posted in *nix, how-to | No Comments »

    Search and replace in a MySQL table

    27th October 2009

    This query performs a table-wide search-and-replace:

    UPDATE `table_name` SET `table_field` = REPLACE(`table_field`, 'string to search for and replace', 'replacement string');

    If you need a database-wide search-and-replace, you could try this script (I haven’t tested/used it myself).

    Beware of the following gotchas:

    1. a wrong query may ruin the field you are performing the replace on, so always back up first!
    2. make the “search-for” string as specific as possible, or you will get unexpected replacements (e.g. replacing mini with little will also convert all minivans into littlevans); also, use a WHERE clause when necessary to limit the number of rows modified
    3. the REPLACE function in the example is case-sensitive, so replacing all minivans with vehicles won’t replace Minivans. MySQL has no case-insensitive variant of REPLACE itself; MySQL 8.0 and later provide REGEXP_REPLACE, which can match case-insensitively

    Posted in how-to, Notepad | No Comments »

    Configuring web-server: for production and for development

    25th October 2009

    Production: see http://www.howtoforge.com/how-to-set-up-apache2-with-mod_fcgid-and-php5-on-debian-etch – it is for Debian Etch (which is old-stable), but many of the steps apply equally well to Debian Lenny (current-stable). Also, this is a very basic guide: if you are going to host multiple sites for multiple clients, you will most definitely need some hosting control panel.

    Development: see http://www.ruzee.com/blog/2009/01/apache-virtual-hosts-a-clean-setup-for-php-developers. This setup works very well, unless you need to create several virtual hosts every day – in which case necessary actions could be partially scripted.


    Posted in Links, Notepad, PHP, Programming, Software | No Comments »

    C: how to specify comparison operators floating precision

    11th June 2009

    There is no way I’m aware of to do what the title says. However…

    I’m sure you are aware that floating-point representation in any programming language is limited by the precision of the internal binary format. In other words, most values cannot be represented exactly: there is always some precision associated with the float you are working with. The simplest example is the difference in precision between the float and double types in C.

    Suppose I have the following code fragment:
    if ( result.score >= input->raw_cut_off )

    Both result.score and input->raw_cut_off are of type float, and can have positive and negative values. When they are compared with the greater-than-or-equal ( >= ) operator, two values that should be equal may still fail the test, for the precision reasons briefly mentioned above.

    As I already said, there is no precision specification for comparison operators in C. But it is quite simple to “invent” one; e.g. if I wanted to test for equality only, I could write
    if ( fabsf( result.score - input->raw_cut_off ) < 0.000001 )
    In this example (fabsf is declared in math.h), I'm effectively asking for an absolute tolerance of 0.000001, i.e. about 6 decimal places, for the equality comparison of floating-point values. Keep in mind that a float carries only about 7 significant decimal digits, so such an absolute tolerance is meaningful only for values of magnitude around 1 or smaller. Note that if you replace that 0.000001 with the actual precision limit of the floating type you are using, you will be "exactly" comparing floating-point numbers - up to that type's precision, of course :) .

    The very first example with the >= operator can be rewritten as
    if ( result.score > ( input->raw_cut_off - precision ) )
    where precision is the tolerance you are willing to accept, e.g. precision = 0.000001.


    Posted in how-to, Programming | No Comments »