how-to » Autarchy of the Private Cave

How to truncate git history (sample script included)

28th March 2011

Under a few assumptions (most importantly, that you do not have any non-merged branches), it is very easy to throw away git repository commits older than an arbitrarily chosen commit.

Here’s a sample script (call it e.g. git-truncate and put into your ~/bin or whichever location you have in PATH).

#!/bin/bash
git checkout --orphan temp $1
git commit -m "Truncated history"
git rebase --onto temp $1 master
git branch -D temp
# The following 2 commands are optional - they keep your git repo in good shape.
git prune --progress # delete all the objects w/o references
git gc --aggressive # aggressively collect garbage; may take a lot of time on large repos

Invocation: cd to your repository, then git-truncate refspec, where refspec is either a commit’s SHA1 hash-id, or a tag.

Expected result: a git repository starting with a “Truncated history” initial commit and continuing to the tip of master. The rebase step names master explicitly, so run the script on master or change the branch name in the script.

If you truncate repositories often, then consider adding an optional 2nd argument (truncate-commit message) and also some safeguards against improper use – currently, even if refspec is wrong, the script will not abort after a failed checkout.
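One possible shape for those safeguards is sketched below. It is an illustration under assumptions, not a tested replacement: the function name git_truncate, the optional second argument and the error messages are made up here, and the sketch rebases the branch you are currently on instead of hard-coding master.

```shell
#!/bin/bash
# Sketch of git-truncate with safeguards (names and messages are illustrative).
git_truncate() {
    local commit="$1"
    local message="${2:-Truncated history}"   # optional 2nd argument
    local branch

    if [ -z "$commit" ]; then
        echo "usage: git-truncate <commit-or-tag> [message]" >&2
        return 2
    fi

    # Abort early if the argument does not name a commit in this repository.
    if ! git rev-parse --verify --quiet "${commit}^{commit}" >/dev/null 2>&1; then
        echo "git-truncate: '$commit' is not a valid commit here" >&2
        return 1
    fi

    # Assumption: rebase the current branch; this fails on a detached HEAD.
    branch=$(git symbolic-ref --short HEAD) || return 1

    # Each step runs only if the previous one succeeded.
    git checkout --orphan temp "$commit" &&
    git commit -m "$message" &&
    git rebase --onto temp "$commit" "$branch" &&
    git branch -D temp
}
```

To use it as a script, save it as git-truncate and add git_truncate "$@" as the last line. The optional prune and gc commands from the original script can still be run afterwards.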

Thanks for posting any improvements you may have.

Source: Tekkub’s post on github discussions.
See also: how to remove a single file from all of git’s commits.

Posted in how-to, Notepad | 12 Comments »

How to easily install any PyPi/easy_install python module on Debian

16th February 2011

Imagine you need to install pycassa (which uses easy_install). Here are the 2 (at maximum) very simple steps to have it properly debianized and installed on your Debian/Ubuntu:

if you don’t have the python-stdeb package: sudo aptitude install python-stdeb
pypi-install pycassa

That’s it.

Refer to the stdeb readme for more information. You will need it if there are dependencies, which might not be resolved automatically by stdeb.

Before stdeb, it wasn’t exactly trivial to make a .deb from a Python module.

Posted in *nix, how-to, Notepad, Python, Software | 1 Comment »

How to replace newlines with commas, tabs etc (merge lines)

16th November 2010

Imagine you need to get a few lines from a group of files with missing identifier mappings. I have a bunch of files with content similar to this one:

ENSRNOG00000018677 1368832_at 25233
ENSRNOG00000002079 1369102_at 25272
ENSRNOG00000043451 25353
ENSRNOG00000001527 1388013_at 25408
ENSRNOG00000007390 1389538_at 25493

In the example above I need '25353', which does not have a corresponding affy_probeset_id in the 2nd column.

It is clear how to do that:

sort -u *_affy_ensembl.txt | grep -v '_at' | awk '{print $2}'

This outputs a column of required IDs (EntrezGene in this example):

116720
679845
309295
364867
298220
298221
25353

However, I need these IDs as a comma-separated list, not as a newline-separated list.

There are several ways to achieve the desired result (only the last pipe commands differ):

sort -u *_affy_ensembl.txt | grep -v '_at' | awk '{print $2}' | gawk '$1=$1' ORS=', '

sort -u *_affy_ensembl.txt | grep -v '_at' | awk '{print $2}' | tr '\n' ','

sort -u *_affy_ensembl.txt | grep -v '_at' | awk '{print $2}' | sed ':a;N;$!ba;s/\n/, /g'

sort -u *_affy_ensembl.txt | grep -v '_at' | awk '{print $2}' | sed ':q;N;s/\n/, /g;t q'

sort -u *_affy_ensembl.txt | grep -v '_at' | awk '{print $2}' | paste -s -d ","

These solutions differ in efficiency and (slightly) in output: the gawk and tr variants also write a separator after the last ID, while the sed and paste variants do not. sed will read all the input into its buffer to replace newlines with other separators, so it might not be best for large files. tr might be the most efficient, but I haven’t tested that. paste cycles through the characters of its delimiter list one at a time, so you cannot really get comma-space “, ” separation with it.
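If the comma-space form is needed without buffering everything in sed, one option is to let paste join the lines and then widen each comma with a second, line-oriented sed. The sketch below assumes the IDs themselves never contain commas; the helper names ids and join_comma_space are made up for illustration, and the sample input is three of the IDs shown above.

```shell
# Sample input: three of the IDs from the column shown above.
ids() { printf '%s\n' 116720 679845 25353; }

# Hypothetical helper: join stdin lines with ", ".
# paste -s joins with single commas; sed then turns each "," into ", ".
join_comma_space() { paste -s -d ',' - | sed 's/,/, /g'; }

ids | join_comma_space      # prints: 116720, 679845, 25353

# For comparison, tr also converts the final newline into a comma.
ids | tr '\n' ','; echo     # prints: 116720,679845,25353,
```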

Sources: linuxquestions 1 (explains used sed commands), linuxquestions 2, nixcraft.

Posted in *nix, Bioinformatics, how-to, Notepad, Software | 2 Comments »

Linux: how to label swap partition w/o losing swap UUID

16th July 2010

In short: sudo mkswap -L new_swap_label -U old_swap_UUID /dev/sd_swap_device.
If you don’t care about the UUID: just sudo mkswap -L new_swap_label /dev/sd_swap_device.
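The one-liner can be wrapped so that the old UUID is looked up automatically. This is only a sketch under assumptions: the function name relabel_swap is made up, it must run as root, blkid has to be available, and the device is assumed to be an active swap partition (swapoff fails otherwise and the function stops there).

```shell
# Hypothetical helper: relabel a swap partition while keeping its UUID.
# Usage (as root): relabel_swap /dev/sd_swap_device new_swap_label
relabel_swap() {
    dev="$1"
    label="$2"
    if [ -z "$dev" ] || [ -z "$label" ]; then
        echo "usage: relabel_swap <device> <label>" >&2
        return 2
    fi

    # Read the current UUID before mkswap rewrites the swap header.
    uuid=$(blkid -s UUID -o value "$dev") || return 1
    if [ -z "$uuid" ]; then
        echo "relabel_swap: no UUID found on $dev" >&2
        return 1
    fi

    # Disable the swap area, recreate it with the new label and the old UUID,
    # then enable it again. Each step runs only if the previous one succeeded.
    swapoff "$dev" &&
    mkswap -L "$label" -U "$uuid" "$dev" &&
    swapon "$dev"
}
```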

Step-by-step:
Read the rest of this entry »

Posted in *nix, how-to | No Comments »

Search and replace in a MySQL table

27th October 2009

This query performs a table-wide search-and-replace:

UPDATE `table_name` SET `table_field` = REPLACE(`table_field`, 'string to search for and replace', 'replacement string');

If you need a database-wide search-and-replace, you could try this script (I haven’t tested/used it myself).

Beware of the following gotchas:

wrong query syntax may ruin the field you are performing the replace on, so always back up first!
be sure to make the “search-for” string as specific as possible, or you will get unexpected replacements (e.g. replacing mini with little will also convert all minivans into littlevans); also, use a WHERE clause when necessary to limit the number of rows modified
the function in the example is case-sensitive, so replacing all minivans with vehicles won’t replace Minivans. However, I believe there exists a case-insensitive version of the REPLACE function

Posted in how-to, Notepad | No Comments »

Configuring web-server: for production and for development

25th October 2009

Production: see http://www.howtoforge.com/how-to-set-up-apache2-with-mod_fcgid-and-php5-on-debian-etch. It is written for Debian Etch (which is old-stable), but many of the steps apply equally well to Debian Lenny (current-stable). Note that this is a very basic guide: if you are going to host multiple sites for multiple clients, you will most definitely need some hosting control panel.

Development: see http://www.ruzee.com/blog/2009/01/apache-virtual-hosts-a-clean-setup-for-php-developers. This setup works very well, unless you need to create several virtual hosts every day – in which case necessary actions could be partially scripted.

Posted in Links, Notepad, PHP, Programming, Software | No Comments »

C: how to specify comparison operators floating precision

11th June 2009

There is no way I’m aware of to do what the title says. However…

I’m sure you are aware that the representation of floats in any programming language is limited by the precision of the internal binary representation. In other words, most values cannot be stored exactly: there will always be some precision associated with the float you are working with. The simplest example is the difference in precision between the float and double types in C.

Suppose I have the following code fragment:
if ( result.score >= input->raw_cut_off )

Both result.score and input->raw_cut_off are of type float, and can have positive and negative values. When they are compared with the greater-than-or-equal ( >= ) operator, the condition is not always true even when the two values should be equal, for the precision reasons briefly mentioned above.

As I already said, there is no precision specification for equality operators in C. But it is quite simple to “invent” precision specification; e.g. if I wanted to test for equality only, I could write
if ( fabsf( result.score - input->raw_cut_off ) < 0.000001 )

In this example, I'm effectively asking for 6-digit precision for the equality comparison of floating-point values. Note that if you replace that 0.000001 with the actual precision limit of the floating type you are using, you will be "exactly" comparing floating-point numbers, up to that type's precision, of course.

The very first example, with the >= operator, can be rewritten as
if ( result.score > ( input->raw_cut_off - precision ) )
where precision is exactly what it is named, e.g. precision = 0.000001.
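The same effect can be reproduced from a shell with awk, which does its arithmetic in double precision. This is only a stand-in for the C float case above: the values 0.1, 0.7 and 0.8 and the function name compare_demo are chosen purely for illustration.

```shell
# Hypothetical demo: compare a computed score against a cut-off,
# first with plain >=, then with the precision margin described above.
compare_demo() {
    awk 'BEGIN {
        score = 0.1 + 0.7      # stored slightly below 0.8 in binary floating point
        cutoff = 0.8
        precision = 0.000001

        if (score >= cutoff) print "plain: pass"; else print "plain: fail"
        if (score > cutoff - precision) print "tolerant: pass"; else print "tolerant: fail"
    }'
}

compare_demo
# prints:
#   plain: fail
#   tolerant: pass
```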

Sources used:

comment by Randy A. Ynchausti
scientific programming

Posted in how-to, Programming | No Comments »


Autarchy of the Private Cave

Tiny bits of bioinformatics, [web-]programming etc
