Autarchy of the Private Cave

Tiny bits of bioinformatics, [web-]programming etc

    Archive for November, 2010

    Beautiful aurora timelapse in HD

    26th November 2010

    Enjoy full-screen.

    Aurora Borealis timelapse HD – Tromsø 2010 from Tor Even Mathisen on Vimeo.

    Posted in Life, Links, Misc | No Comments »

    How to replace newlines with commas, tabs etc (merge lines)

    16th November 2010

    Imagine you need to get a few lines from a group of files with missing identifier mappings. I have a bunch of files with content similar to this one:

    ENSRNOG00000018677 1368832_at 25233
    ENSRNOG00000002079 1369102_at 25272
    ENSRNOG00000043451 25353
    ENSRNOG00000001527 1388013_at 25408
    ENSRNOG00000007390 1389538_at 25493

    In the example above I need ‘25353’, which has no corresponding affy_probeset_id in the 2nd column.

    It is clear how to do that:

    sort -u *_affy_ensembl.txt | grep -v '_at' | awk '{print $2}'

    This outputs a column of required IDs (EntrezGene in this example):

    116720
    679845
    309295
    364867
    298220
    298221
    25353
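
The grep -v '_at' filter matches '_at' anywhere on the line, and the awk step relies on the empty column simply collapsing. A slightly stricter variant can be sketched in awk alone. This assumes the files are whitespace-separated and that a row without a probeset ID has exactly two fields; the function name missing_ids is made up for this example.

```shell
# Print unique IDs from rows that lack a probeset ID.
# Assumption: whitespace-separated input where a complete row has 3 fields
# (Ensembl gene, Affy probeset, EntrezGene) and an incomplete row has 2.
missing_ids() {
    # NF == 2     : keep only rows with a missing middle column
    # !seen[$2]++ : print each ID once, in first-seen order
    awk 'NF == 2 && !seen[$2]++ { print $2 }' "$@"
}

# Usage (reads the named files, or stdin when no files are given):
#   missing_ids *_affy_ensembl.txt
```

Unlike sort -u, this keeps the IDs in the order they first appear rather than sorting them.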

    However, I need these IDs as a comma-separated list, not as a newline-separated one.

    There are several ways to achieve the desired result (only the last pipe commands differ):

    sort -u *_affy_ensembl.txt | grep -v '_at' | awk '{print $2}' | gawk '$1=$1' ORS=', '
    sort -u *_affy_ensembl.txt | grep -v '_at' | awk '{print $2}' | tr '\n' ','
    sort -u *_affy_ensembl.txt | grep -v '_at' | awk '{print $2}' | sed ':a;N;$!ba;s/\n/, /g'
    sort -u *_affy_ensembl.txt | grep -v '_at' | awk '{print $2}' | sed ':q;N;s/\n/, /g;t q'
    sort -u *_affy_ensembl.txt | grep -v '_at' | awk '{print $2}' | paste -s -d ","

    These solutions differ in efficiency and (slightly) in output. Both sed variants accumulate the whole input in the pattern space before replacing the newlines, so they might not be best for large files. tr might be the most efficient, but I haven’t tested that. Note that tr and gawk also convert the final newline, so their output ends with a trailing separator. paste cycles through the characters of its delimiter list, using one per join, so you cannot really get comma-space “, ” separation with it.
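
If a clean “, ” separator without a trailing comma is needed, one option is to let awk print the separator before every line except the first. This is only a sketch: join_lines is a name made up for this example, and the separator is taken from its optional first argument.

```shell
# Join the lines of stdin with a separator (default ", ").
# Streams line by line and adds no separator after the last item.
join_lines() {
    awk -v sep="${1:-, }" '
        NR > 1 { printf "%s", sep }   # separator goes before items 2..n
               { printf "%s", $0 }    # the item itself, without its newline
        END    { if (NR) print "" }   # end the output with a single newline
    '
}

printf '116720\n679845\n25353\n' | join_lines       # 116720, 679845, 25353
printf '116720\n679845\n25353\n' | join_lines ';'   # 116720;679845;25353
```

It would go at the end of the pipeline in place of the tr, sed or paste step, and since it never holds more than one line at a time it should cope with large inputs.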

    Sources: linuxquestions 1 (explains used sed commands), linuxquestions 2, nixcraft.

    Posted in *nix, Bioinformatics, how-to, Notepad, Software | 2 Comments »

    How to record Skype calls on Linux: use free Skype Call Recorder

    11th November 2010

    Just came across Skype Call Recorder, a tool for recording Skype calls that is awesome in both its functionality and its simplicity. Highly recommended!

    It worked immediately for me, and the default settings are good enough that there is no need to tweak them. Well, I know that because I did tweak a few to get more nerdiness, but normal people don’t need that.

    The SCR download page has packages for Ubuntu, Debian/i386, Xandros, RPM-based distributions and Gentoo. And as it’s free, you can of course just use the source, Luke!

    At the time of writing, a package for Debian/amd64 was not available, but it is really easy to build one.
    Here’s mine: skype-call-recorder-debian_0.8_amd64.deb

    Posted in *nix, Links, Software | 1 Comment »

    Blatant dewlance.com SEO, thrustvps, and HEAD attacks

    6th November 2010

    Update 4: there are claims that these HEAD-attacks were coming from a malicious dewlance.com customer, and have nothing to do with dewlance itself.

    Noticing weird narrow spikes in the server load graph, I decided to investigate the most recent one, at 03:50 GMT+2 on Nov. 6, 2010.

    The reason was simple: someone issued a few hundred HEAD requests over a 30-second period to a PHP-based web application.

    All the requests were coming from IP 109.169.59.139, which belongs to the IP range of thrustvps.com:

    inetnum: 109.169.58.0 - 109.169.59.255
    netname: ThrustVPS_1
    descr: Thrust::VPS
    country: US
    admin-c: RF5058-RIPE
    tech-c: RF5058-RIPE
    status: ASSIGNED PA
    mnt-by: RAPIDSWITCH-MNT

    However, it is the referrer string which is more interesting: in all those requests, decorated with varying UserAgents and even operating systems, there was only one referrer – www.dewlance.com.
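
A burst like this is easy to spot by tallying HEAD requests per client IP and referrer. Here is a rough sketch, assuming Apache's combined log format (client IP as the first space-separated field, with the request line and the referrer as the 1st and 2nd double-quoted strings). head_requests is a made-up name, and the log path in the usage comment is only an example.

```shell
# Count HEAD requests per (client IP, referrer) pair, busiest first.
# Assumption: combined log format, e.g.
#   IP - - [date] "HEAD /path HTTP/1.1" 200 - "referrer" "user agent"
head_requests() {
    awk -F'"' '
        $2 ~ /^HEAD / {                  # $2 is the quoted request line
            split($1, pre, " ")          # pre[1] is the client IP
            count[pre[1] "\t" $4]++      # $4 is the quoted referrer
        }
        END {
            for (key in count) print count[key] "\t" key
        }
    ' "$@" | sort -rn
}

# Usage (example path, adjust to your setup):
#   head_requests /var/log/apache2/access.log | head
```

Each output line holds the request count, the client IP and the referrer, separated by tabs.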

    Initially I thought that was a test of a new DoS attack – really, who would issue dozens of HEAD requests to the same page over a few seconds? However, after seeing that “referrer” string, I now think this is a cheap, blatant, poor and ugly SEO performed by dewlance. It relies on some sites displaying a box of ‘recent visitors’, sometimes including their referrer URL as a “page where this visitor came from” – this would give dewlance.com some free link-love. Or maybe dewlance.com expects administrators to investigate log files, notice that referrer string, and happily order some services from dewlance? No way :)

    I’ll file a complaint with thrustvps if I see that kind of misbehaviour again. All that started on Nov. 4, so there’s still hope people behind this dumb SEO implementation will get fired.

    Update 1: they have been doing this every 4 hours since November 4, 2010 (Thursday). This results in loads of up to 22, with ~50 Apache processes struggling for a few CPU cores:
    Read the rest of this entry »

    Posted in Misc, Web | 8 Comments »

    Overlaying gene expression data onto pathways from databases

    5th November 2010

    Superimposing gene expression data onto pathways from databases is a common task in the final steps of microarray data analysis – that is, biological interpretation and results discussion.

    I have found many tools which claim to facilitate this procedure. Some of them are reviewed below (in no specific order).
    Read the rest of this entry »

    Posted in Bioinformatics, Links, Software | No Comments »