Autarchy of the Private Cave

Science, Society, Programming and Hobbies

    Archive for the 'Bioinformatics' Category

    Bioinformatics is a general term for the application of computers and computational and mathematical methods to biology.

    Weird orthology species names in Ensembl

    30th September 2008

    For the COTRASIF tool, I’ve been using the Ensembl Compara database (since release 47) to automatically import gene orthology mappings into COTRASIF.

    However, with the E!50 release, the Compara database was dropped.

    Looking for another way to get orthologs from Ensembl (using martservice, via biomart.org), I tried the standard query: selecting the “Homologs” group on the “Attributes” page for a single-species database, and then selecting the appropriate second species to get the orthology mappings.

    Imagine my surprise when, not only in the interface but also in the generated XML file, I found attribute names like “cow_ensembl_gene” :-O

    I only need 11 species at the moment, and even excluding sufficiently unique name mappings like “zebrafish - Danio rerio”, there are a number of questionable mappings: “yeast” for S. cerevisiae (could be S. pombe), “rat” for R. norvegicus (could be R. rattus), “anopheles” for A. gambiae (could be some other Anopheles species). Other mappings might also be non-unique, especially for people working with different species of the same genus.

    Am I missing some system in this naming “convention”, or am I the only one who finds it strange?

    Is there a way to avoid “common species names” when importing orthology data from Ensembl with martservice?
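    Until the mart exposes binomial names, one client-side workaround is to translate the common-name prefixes into explicit species names using a table you maintain yourself, and to stop rather than guess when a prefix is ambiguous. Below is a minimal PHP sketch under stated assumptions: the function and constant names are hypothetical, "cow" is mapped to Bos taurus, and attribute names are assumed to end in `_ensembl_gene`, generalised from the single “cow_ensembl_gene” example (real attribute lists may follow other patterns).

```php
<?php
// Hypothetical helper: map the common-name prefix of a BioMart homolog
// attribute (e.g. "cow_ensembl_gene") to an explicit binomial name, using a
// table you maintain yourself, and refuse to guess for ambiguous prefixes.
// Assumes attribute names end in "_ensembl_gene" (from one observed example).

const SPECIES_BY_PREFIX = [
    'zebrafish' => 'Danio rerio',
    'cow'       => 'Bos taurus',
    'yeast'     => 'Saccharomyces cerevisiae', // could be S. pombe
    'rat'       => 'Rattus norvegicus',        // could be R. rattus
    'anopheles' => 'Anopheles gambiae',        // could be another Anopheles
];
const AMBIGUOUS_PREFIXES = ['yeast', 'rat', 'anopheles'];

function species_from_attribute(string $attribute, bool $allowAmbiguous = false): string
{
    $suffix = '_ensembl_gene';
    if (substr($attribute, -strlen($suffix)) !== $suffix) {
        throw new InvalidArgumentException("Unexpected attribute name: $attribute");
    }
    $prefix = substr($attribute, 0, -strlen($suffix));
    if (!array_key_exists($prefix, SPECIES_BY_PREFIX)) {
        throw new InvalidArgumentException("No species recorded for prefix: $prefix");
    }
    // Ambiguous prefixes must be confirmed explicitly by the caller
    if (!$allowAmbiguous && in_array($prefix, AMBIGUOUS_PREFIXES, true)) {
        throw new RuntimeException("Ambiguous prefix, check the species by hand: $prefix");
    }
    return SPECIES_BY_PREFIX[$prefix];
}

echo species_from_attribute('cow_ensembl_gene'), PHP_EOL; // Bos taurus
```

    The design choice is to fail loudly: an import that silently treats “rat” as R. norvegicus is exactly the problem described above.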

    Posted in Bioinformatics, Science | No Comments »

    The 9th ICSB-2008 in Gothenburg, Sweden

    12th September 2008

    It has been quite a while since my last serious, long post. On the one hand, summer is vacation time, and I’ve been on one in Turkey; on the other hand, the long-awaited ICSB-2008 conference finally took place in Gothenburg, Sweden, on August 23-27 (or 22-28, counting tutorials and workshops).

    Synopsis: in this post I present a personal-perspective report on the 9th ICSB (with a condensed ICSB-2008 photo-report in my gallery).

    Read the rest of this entry »

    Posted in Bioinformatics, Science, Systems Biology | No Comments »

    New bioinformatics term: (high-) throughputomics

    16th July 2008

    I just made it up for convenience, prompted by reading the workshop descriptions for the upcoming ICSB 2008 in Gothenburg, Sweden.

    Here is the formal definition:

    high-throughputomics
    the term is used to denote any or all of the modern high-throughput techniques (in, but not limited to, genomics, transcriptomics, proteomics, …), together with the derived or applicable data-processing approaches. All the “networks” things also conveniently fall under the high-throughputomics definition.

    As this is a general term, it might even be suitable as a conference title (but NOT for ICSB, which I’m eagerly waiting for).

    For a shorter and informal (spoken-only) term, putomics can be used:

    putomics
    spoken-only, informal short form of high-throughputomics

    Putomics is also conveniently similar to “computation” (computomics):

    computomics
    application of computer hardware and software for the analysis of massive amounts of data, obtained using high-throughput methods; this is a research sub-field of high-throughputomics

    P.S. :) ;)
    For easier citing:

    Tokovenko, Bogdan. New bioinformatics term: high-throughputomics. 2008-07-16. URL:http://bogdan.org.ua/2008/07/16/new-bioinformatics-term-high-throughputomics.html. Accessed: 2008-07-16. (Archived by WebCite® at http://www.webcitation.org/5ZMD9DITU)

    Posted in Bioinformatics, Science | No Comments »

    BGRS-2008 conference in Novosibirsk, Russia

    10th July 2008

    The International Conference on Bioinformatics of Genome Regulation and Structure (BGRS-2008) was held on June 22-28, 2008, in Akademgorodok (Novosibirsk), Russia. It was the sixth conference in the series.

    The International Conference on Bioinformatics of Genome Regulation and Structure is a biennial event. It features several bioinformatics sections, which IMO cover most bioinformatics sub-fields.

    The sixth conference, BGRS-2008, was well organized and had something to offer everyone. By far the largest section was Genomics and Transcriptomics (at least judging by the abstract book and the posters presented; the talks were distributed more evenly between sections). As I have done some work in genomics (namely, our COTRASIF tool), I had quite a load of information to digest and many potentially fruitful new contacts to establish (which I did quite successfully).

    The second section in my order of priorities was “COMPUTER ANALYSIS AND IMAGE RECOGNITION IN SYSTEMS BIOLOGY”, which featured several interesting studies in spatial/developmental modelling. There was a very good talk on model reduction (with an actual example), both for comparing different models and for decreasing model complexity without sacrificing the model-predicted outcomes.

    As for the other sections, I didn’t find them interesting enough. Fortunately, there were social events scheduled for every day, so I visited the Novosibirsk zoo and the Archaeological museum. I did not take as many pictures as I usually do at conferences/schools, because there were two photographers at the conference, and their photos can be freely seen here and here.

    Most of the conference participants could speak Russian (I’d estimate Russian speakers at 90% of all participants), even those coming from, e.g., Singapore or the USA. But the official conference language was English, so the remaining 10% were by no means disadvantaged, which fits well with the international status of the conference.

    After the conference, there was a BGRS-2008 summer school. As I stayed for some extra days, I managed to attend about 90% of the school’s events (including the guided tour of Novosibirsk ;) ). For me, the summer school was somewhat less useful than the conference, but presentations such as those on Petri nets and on SABIO-RK/Sycamore were nevertheless informative and will be used in my future work.

    There were 3 prizes for student presentations; the winners are listed at the end of the page.

    Certificates were given upon successful completion of the school. As I wasn’t registered, I can now only print out a blank certificate, which is to signify that I did not attend the last day of the school and was thus disqualified ;) .

    I just found that there are also some photos from the organizers.

    There was also a football (soccer) game between the ICG team and the school participants team. I’m a fan of neither watching nor playing football, so I skipped this event altogether.

    Posted in Bioinformatics, Misc, Science | No Comments »

    Gene regulatory network reconstruction from microarray data

    20th May 2008

    The title of this post is my current - “forthcome”, as in “done” - field of interest.

    The first article on the topic: Fast network component analysis (FastNCA) for gene regulatory network reconstruction from microarray data.

    Another one, on combining different high-throughput data sources to get higher-quality results: Uncovering signal transduction networks from high-throughput data by integer linear programming.

    I’m especially interested in time-series network reconstruction algorithms. If you have good advice to share with a newcomer to the networks field, don’t hesitate :)

    Posted in Bioinformatics, Links, Science | No Comments »

    COTRASIF: conservation-aided transcription factor binding site finder

    20th May 2008

    With this post, I’m finally announcing the opening of the (mostly) functional COTRASIF web tool, created for the genome-wide identification of promoter regulatory sequences (transcription factor binding sites, TFBS). You can learn more from the About and Help pages. For an example of use, see the Supplement page (the article is currently being prepared; as soon as it’s ready, I’ll make it available).

    If you are interested, have a look at the News page, which has information on joining the COTRASIF Google group. For non-public enquiries, please use my contact page.

    Note: the problem of identifying eukaryotic transcription factor binding sites has remained acute for many years; see, e.g., the most recent Eukaryotic transcription factor binding sites - modelling and integrative search methods.

    Posted in Bioinformatics, Links, Science, Software, Web | 1 Comment »

    Using Cytoscape from behind an HTTP proxy which requires authentication (authorization)

    14th September 2007

    Cytoscape 2.5.1 supports proxies, including HTTP proxies, but not HTTP proxies that require authentication/authorization. It is still easy to use Cytoscape behind such a proxy; below is one possible method.

    Note that exactly the same method can be used to let any software that supports proxies, but not proxies with authentication, access the internet.
    Read the rest of this entry »

    Posted in Bioinformatics, Notepad, Science, Software | No Comments »

    GeneDoc: DNA editing, alignment, analyser and shading software

    25th June 2007

    Full Featured Multiple Sequence Alignment Editor, Analyser and Shading Utility for Windows.

    Small and convenient. It can do sequence alignments (I recommend limiting alignments to about 2 kb in length).

    Latest version I found: GeneDoc 2.6.02, updated July 2001.

    Drawback: Windows only (but its sources are available under the GNU licence).

    Posted in Bioinformatics, Science, Software | No Comments »

    Choosing cell modelling software: Virtual Cell, Cytoscape, CellDesigner, E-Cell

    10th May 2007

    I’m planning to reconstruct (based on literature and some original research) a specific cellular regulatory network. For this I decided to use some specialized biological modelling software. The requirements I had were pretty simple:

    • must have SBML support. SBML appears to be the de facto standard for biological model notation;
    • must be fairly frequently updated;
    • should be feature-packed and easy to use. However, this requirement can only be checked after some use, and I was pre-selecting, not reviewing.

    The software named in the title of the post was found to be the most mature and interesting from the usage perspective. However, more tools than those mentioned are reviewed. The reviews are based primarily on information from the official websites and documentation; some tools (like VirtualCell) are reviewed somewhat more thoroughly.
    Read the rest of this entry »

    Posted in Bioinformatics, Science, Software | No Comments »

    Sampled Pattern Matching (SPM) definition

    5th May 2007

    “… consider a universal predictor based on pattern matching: Given a sequence X1, …, Xn drawn from a stationary mixing source, it predicts the next symbol Xn+1 based on selecting a context of Xn+1. The predictor, called the Sampled Pattern Matching (SPM), is a modification of the Ehrenfeucht-Mycielski pseudo random generator algorithm. It predicts the value of the most frequent symbol appearing at the so called sampled positions. These positions follow the occurrences of a fraction of the longest suffix of the original sequence that has another copy inside X1X2 … Xn. In other words, in SPM the context selection consists of taking certain fraction of the longest match. The study of the longest match for lossless data compression was initiated by [Aaron D.] Wyner and Ziv in their 1989 seminal paper.”
    Read the rest of this entry »
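    The prediction rule in the quote can be sketched in a few lines of PHP. This is a minimal illustration, not the authors' code: the fraction alpha (0.5 by default), the tie-break in the majority vote (the symbol sampled first wins), and the fallback when no suffix repeats (vote over the whole sequence) are my own assumptions; the function names are hypothetical.

```php
<?php
// Minimal SPM-style predictor sketch (not the authors' reference code).
// Assumptions: symbols are single characters, alpha defaults to 0.5,
// ties go to the symbol sampled first, and if no suffix repeats we
// fall back to the most frequent symbol of the whole sequence.

// Length of the longest suffix of $x that also occurs at an earlier
// position (overlapping copies are allowed).
function longest_repeated_suffix(string $x): int
{
    $n = strlen($x);
    for ($k = $n - 1; $k > 0; $k--) {
        $suffix = substr($x, $n - $k);
        // look for a copy that ends before the last symbol
        if (strpos(substr($x, 0, $n - 1), $suffix) !== false) {
            return $k;
        }
    }
    return 0;
}

function spm_predict(string $x, float $alpha = 0.5): ?string
{
    $n = strlen($x);
    if ($n === 0) {
        return null;
    }
    $c = (int) ceil($alpha * longest_repeated_suffix($x)); // context length
    $counts = [];
    if ($c > 0) {
        $context = substr($x, $n - $c);
        // sampled positions: the symbol right after each earlier copy of the context
        for ($i = 0; $i + $c < $n; $i++) {
            if (substr_compare($x, $context, $i, $c) === 0) {
                $sym = $x[$i + $c];
                $counts[$sym] = ($counts[$sym] ?? 0) + 1;
            }
        }
    } else {
        foreach (str_split($x) as $sym) {
            $counts[$sym] = ($counts[$sym] ?? 0) + 1;
        }
    }
    // majority vote over the sampled symbols
    $best = null;
    foreach ($counts as $sym => $cnt) {
        if ($best === null || $cnt > $counts[$best]) {
            $best = $sym;
        }
    }
    return (string) $best;
}

echo spm_predict('abcabcab'), PHP_EOL; // prints c
```

    For "abcabcab" the longest repeated suffix is "abcab" (length 5), so with alpha = 0.5 the context is the last 3 symbols, "cab"; its only earlier copy is followed by "c", which becomes the prediction.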

    Posted in Artificial Intelligence, Bioinformatics, Science | No Comments »

    An updated list of systems biology conferences

    4th April 2007

    … regularly updated since 2003: systems biology upcoming conferences

    Here’s the list as of the moment of writing:
    Read the rest of this entry »

    Posted in Bioinformatics, Science | No Comments »

    Pattern matching and prediction (part 2)

    30th March 2007

    (This series started with Pattern matching and prediction, part 1)

    For part 2, I wanted to start (and probably also end) with Cybula’s AURA (a universal pattern matcher; white paper dated 2004). AURA is said to be built around Correlation Matrix Memory (CMM). CMMs were developed (or picked up for development?) by Prof. Austin, the founder of Cybula, in 1986.

    The white paper tells us that

    The now ubiquitous neural network methods such as Kohonen Networks, Radial Basis Function networks and Kohnen networks all allow users develop good pattern matching systems for small problems, where they excel. However, when the problems grow to large datasets, and where very high performance is needed, they become limited. … The well known k-Nearest Neighbour methods (k-NN) is a relatively good pattern matching method that has been constantly shown to operate well on many problems, however, it suffers from slow operation on large data problems.

    Read the rest of this entry »
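    To make the CMM idea more concrete, here is a minimal binary correlation matrix memory in PHP. It is a textbook-style sketch under my own assumptions (binary 0/1 vectors, training by OR-ing outer products into the matrix, recall thresholded at the number of set input bits); the class name is hypothetical and this is not Cybula's AURA implementation.

```php
<?php
// Minimal binary Correlation Matrix Memory (CMM) sketch, not AURA's code.
// Assumptions: input/output vectors are arrays of 0/1, training ORs the
// outer product y * x^T into the matrix, and recall uses a Willshaw-style
// threshold equal to the number of set bits in the input.

final class BinaryCMM
{
    /** @var int[][] rows = output bits, columns = input bits */
    private array $w;

    public function __construct(int $inputSize, int $outputSize)
    {
        $this->w = array_fill(0, $outputSize, array_fill(0, $inputSize, 0));
    }

    // Store the association x -> y
    public function train(array $x, array $y): void
    {
        foreach ($y as $i => $yi) {
            if ($yi) {
                foreach ($x as $j => $xj) {
                    if ($xj) {
                        $this->w[$i][$j] = 1;
                    }
                }
            }
        }
    }

    // Sum the matrix columns selected by x, then threshold each row sum
    public function recall(array $x): array
    {
        $threshold = array_sum($x);
        $out = [];
        foreach ($this->w as $row) {
            $sum = 0;
            foreach ($x as $j => $xj) {
                $sum += $row[$j] * $xj;
            }
            $out[] = ($threshold > 0 && $sum >= $threshold) ? 1 : 0;
        }
        return $out;
    }
}

$cmm = new BinaryCMM(6, 4);
$cmm->train([1, 1, 0, 0, 0, 0], [1, 0, 0, 0]);
$cmm->train([0, 0, 1, 1, 0, 0], [0, 1, 0, 0]);
echo implode('', $cmm->recall([1, 0, 0, 0, 0, 0])), PHP_EOL; // partial key, prints 1000
```

    Recall costs one pass over the matrix regardless of how many pairs were stored, and a partial key still retrieves the stored output; this is the kind of property that distinguishes associative memories from scanning every stored example, as k-NN does.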

    Posted in Artificial Intelligence, Bioinformatics, Science | No Comments »

    Pattern matching and prediction (part 1)

    19th March 2007

    According to one of the definitions I provided earlier in the entry-level post on what artificial intelligence is, intelligence can be described as a special pattern-matching algorithm. Evidently, a universal, complicated and recurrent pattern matcher, but still just a pattern matcher :)

    I decided to find out more about present-day pattern matchers, definitely without focusing too much on regular expressions, which are of no interest to me in terms of possible applications.
    Read the rest of this entry »

    Posted in Artificial Intelligence, Bioinformatics, Science | No Comments »

    An interesting list of -informatics-related conferences

    14th March 2007

    I came across an interesting list of conferences. It is for 2007, but it appears to be regularly refreshed (judging by the availability of the 2006 and earlier lists, back to 2004).

    The conference list is divided into several sub-lists:

    • artificial intelligence
    • bioinformatics
    • data mining
    • machine learning
    • medical informatics
    • web informatics

    The list appears to be regularly updated, judging by the “deadline:” note for each conference: deadlines are added as soon as they become known.

    I would recommend that anyone interested in the topics listed above bookmark that page.

    Thanks to Li Guoliang for the list!

    Posted in Artificial Intelligence, Bioinformatics, Programming, Science | 1 Comment »

    Terminologies for Gene and Protein Similarity

    2nd February 2007

    Note: this is an excerpt (very slightly edited) from the original article Terminologies for Gene & Protein Similarity by Julius H. Jackson.

    Below, the definitions of heterologs, homologs, analogs, paralogs, xenologs and orthologs are provided.
    Read the rest of this entry »

    Posted in Bioinformatics, Science | No Comments »

    Homology and similarity

    23rd October 2006

    In bioinformatics and biology, the term “homology” is used quite often, and quite often it is misused. So what are “homology” and “similarity”, and how can one use these terms correctly?
    Read the rest of this entry »

    Posted in Bioinformatics, Science | No Comments »

    PFM2PWM: which “nucleotide background frequency” to use

    13th September 2006

    As I mentioned previously, when converting a PFM to a PWM, a single variable, the [prior] background nucleotide frequency, was ambiguous to me. From other articles I noticed that it is usually set to 0.25 (1/4, because there are 4 nucleotides, so in a “perfectly random” sequence each would appear in 25% of cases). In that post, I also considered using the “real” background frequency of nucleotides, calculated from the sequence to which the matrix is to be applied.

    I wrote a program to search all the 1000-basepair upstream sequences of all human, rat and mouse genes present in Ensembl database release 40 (assuming those 1 kb upstream regions to be the genes’ “promoters”). For each promoter, only the single best score was returned. Then I drew a graph of the number of promoters (y-axis) versus the best match score (x-axis). I did the search twice: once with p(b) = 0.25, and once with p(b) set to the A/C/G/T content calculated for each promoter.
    Read the rest of this entry »
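    The comparison described above boils down to two steps: estimate p(b) from the promoter itself, then keep the best log-odds window score under either background. Here is a minimal PHP sketch of those two steps, not the program used for the post; the function names, the toy three-position probability matrix, the add-one count that keeps every p(b) above zero, and the skipping of windows containing non-ACGT symbols are all assumptions.

```php
<?php
// Sketch: best log-odds score of a motif in one promoter, using either a
// uniform background p(b) = 0.25 or the promoter's own A/C/G/T content.
// Assumptions: the motif is a toy matrix of per-position base probabilities,
// one pseudocount per base keeps every p(b) above zero, and windows with
// non-ACGT symbols (e.g. N) are skipped.

function background_frequencies(string $seq): array
{
    $counts = ['A' => 1, 'C' => 1, 'G' => 1, 'T' => 1]; // add-one pseudocounts
    foreach (str_split(strtoupper($seq)) as $base) {
        if (isset($counts[$base])) {
            $counts[$base]++;
        }
    }
    $total = array_sum($counts);
    return array_map(fn ($c) => $c / $total, $counts);
}

function best_score(string $seq, array $motif, array $bg): ?float
{
    $seq = strtoupper($seq);
    $width = count($motif);
    $best = null;
    for ($i = 0; $i + $width <= strlen($seq); $i++) {
        $score = 0.0;
        for ($j = 0; $j < $width; $j++) {
            $base = $seq[$i + $j];
            if (!isset($bg[$base])) {
                continue 2; // window contains a non-ACGT symbol
            }
            $score += log($motif[$j][$base] / $bg[$base], 2); // log2 odds vs p(b)
        }
        if ($best === null || $score > $best) {
            $best = $score;
        }
    }
    return $best; // null if no valid window exists
}

// Toy 3-position motif (hypothetical probabilities), roughly "TAT"
$motif = [
    ['A' => 0.1, 'C' => 0.1, 'G' => 0.1, 'T' => 0.7],
    ['A' => 0.7, 'C' => 0.1, 'G' => 0.1, 'T' => 0.1],
    ['A' => 0.1, 'C' => 0.1, 'G' => 0.1, 'T' => 0.7],
];
$promoter = 'GCGCTATGC';
$uniform = ['A' => 0.25, 'C' => 0.25, 'G' => 0.25, 'T' => 0.25];

printf("uniform p(b):  %.3f\n", best_score($promoter, $motif, $uniform));
printf("promoter p(b): %.3f\n", best_score($promoter, $motif, background_frequencies($promoter)));
```

    In this toy promoter A and T are under-represented, so the promoter-specific background raises the best "TAT" score above the uniform-background score; over many promoters such shifts change the shape of the best-score distribution.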

    Posted in Bioinformatics | 1 Comment »

    Position Frequency Matrix to Position Weight Matrix (PFM2PWM)

    11th September 2006

    In the course of my current research, I have been dealing with TFBS (transcription factor binding site) search. To perform the search, one needs a position weight matrix (PWM) for each TFBS. When you refer to the TRANSFAC database of transcription factors (and matrices), you get a position frequency matrix (PFM) and need to convert it into a PWM.

    I did find a couple of conversion formulas, but it took quite an effort to figure out which one is correct, as I had seen two different variations. Here I will share what I found.
    Read the rest of this entry »
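    The formula the post settles on is in the full entry and is not reproduced here. For orientation only, below is a PHP sketch of one widely used variant, which spreads a pseudocount of sqrt(N) over the bases according to the background p(b) before taking the log2 ratio; the function name and the sqrt(N) pseudocount are assumptions, and this may or may not be the variant the post recommends.

```php
<?php
// Sketch of one common PFM -> PWM conversion (not necessarily the variant
// the post recommends):
//   w(b,i) = log2( (n(b,i) + sqrt(N) * p(b)) / (N + sqrt(N)) / p(b) )
// n(b,i): count of base b at position i; N: number of sites in that column;
// sqrt(N): pseudocount (an assumption), distributed according to p(b).

function pfm_to_pwm(array $pfm, array $bg = ['A' => 0.25, 'C' => 0.25, 'G' => 0.25, 'T' => 0.25]): array
{
    $pwm = [];
    foreach ($pfm as $i => $column) {
        $n = array_sum($column);
        if ($n <= 0) {
            throw new InvalidArgumentException("Column $i has no counts");
        }
        $pseudo = sqrt($n);
        foreach ($bg as $base => $p) {
            $count = $column[$base] ?? 0;
            $prob = ($count + $pseudo * $p) / ($n + $pseudo); // corrected probability
            $pwm[$i][$base] = log($prob / $p, 2);              // log-odds vs background
        }
    }
    return $pwm;
}

// Toy PFM with 4 sites (hypothetical counts): position 0 is always A,
// position 1 is uninformative.
$pfm = [
    ['A' => 4, 'C' => 0, 'G' => 0, 'T' => 0],
    ['A' => 1, 'C' => 1, 'G' => 1, 'T' => 1],
];
$pwm = pfm_to_pwm($pfm);
printf("w(A,0) = %.3f, w(C,0) = %.3f, w(G,1) = %.3f\n", $pwm[0]['A'], $pwm[0]['C'], $pwm[1]['G']);
```

    For this toy matrix w(A,0) = log2 3 ≈ 1.585, w(C,0) = -log2 3 ≈ -1.585, and every weight at position 1 is 0, so an uninformative column adds nothing to a site score. Passing a non-uniform $bg is where the background-frequency question from the follow-up post comes in.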

    Posted in Bioinformatics | 11 Comments »

    Allow posting duplicate form-name entries with different values

    6th September 2006

    Sometimes, when writing automated HTML form processors, you need to post several values under the same form field name, e.g.:
    collection_gene = str_chrom_name
    collection_gene = gene_stable_id

    This is unusual (although repeated names are valid in form submissions, e.g. for checkboxes sharing one name), but the approach is used, for example, by Ensembl. I spent some time figuring out how to make HTTP_Client and HTTP_Request submit multiple ‘name-value’ pairs instead of just one (the last one defined, which overrides the previous ones). The solution is extremely simple:
    Read the rest of this entry »
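    The HTTP_Request-specific fix is in the full entry. Independently of that library, the request body such a form expects is simply the same name repeated in an application/x-www-form-urlencoded string. A minimal PHP sketch follows; the helper name is hypothetical and it is not part of HTTP_Request's API.

```php
<?php
// Library-independent sketch (hypothetical helper, not HTTP_Request's API):
// build an application/x-www-form-urlencoded body in which the same field
// name appears several times, as a browser sends for checkboxes sharing a name.

function build_repeated_query(array $pairs): string
{
    $parts = [];
    foreach ($pairs as [$name, $value]) {
        // urlencode() follows form-encoding rules (space becomes "+")
        $parts[] = urlencode($name) . '=' . urlencode($value);
    }
    return implode('&', $parts);
}

$body = build_repeated_query([
    ['collection_gene', 'str_chrom_name'],
    ['collection_gene', 'gene_stable_id'],
]);
echo $body, PHP_EOL; // collection_gene=str_chrom_name&collection_gene=gene_stable_id
```

    A list of pairs is used instead of an associative array precisely because array keys must be unique. By contrast, http_build_query(['collection_gene' => ['str_chrom_name', 'gene_stable_id']]) produces PHP-style names (collection_gene%5B0%5D=...&collection_gene%5B1%5D=...), which is not what such a form expects.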

    Posted in Bioinformatics, PHP, Programming, Science | No Comments »

    Avoiding out of memory fatal error when using HTTP_Client or HTTP_Request

    6th September 2006

    If you fetch large amounts of data (e.g. over 2 MB per request) using HTTP_Client (or HTTP_Request), you may get "out of memory" fatal errors, especially if:

    1. memory_limit is left at the then-default value of 8M, and
    2. you process multiple pages using a single, non-reset instance of the HTTP_Client object.

    The problem can manifest itself as a fatal error after a few cycles of successful page retrieval; when run with the same parameters, it always occurs after a constant, or only slightly varying, number of cycles.

    In my case the problem was that HTTP_Request (a dependency of HTTP_Client) was holding in memory all the previously fetched pages of the current session (the 'history' feature). To force HTTP_Request to hold only the most recent page, you need to 'disable' history after creating the HTTP_Client or HTTP_Request object instance:

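    PHP:
    require_once 'HTTP/Client.php'; // PEAR HTTP_Client package
    // $params and $headers hold your default request parameters and headers
    $req = new HTTP_Client($params, $headers); // "=& new" is deprecated and fails to parse on PHP 7+
    // disable history to save memory
    $req->enableHistory(false);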

    Hope this helps you.

    Posted in Bioinformatics, PHP, Programming, Science | No Comments »

     