Autarchy of the Private Cave

Tiny bits of bioinformatics, [web-]programming etc.

    Archive for the 'Bioinformatics' Category

    Bioinformatics is a general term for applying computers and computational/mathematical methods to biology.

    Sampled Pattern Matching (SPM) definition

    5th May 2007

    “… consider a universal predictor based on pattern matching: Given a sequence X₁, …, Xₙ drawn from a stationary mixing source, it predicts the next symbol Xₙ₊₁ based on selecting a context of Xₙ₊₁. The predictor, called the Sampled Pattern Matching (SPM), is a modification of the Ehrenfeucht-Mycielski pseudo random generator algorithm. It predicts the value of the most frequent symbol appearing at the so called sampled positions. These positions follow the occurrences of a fraction of the longest suffix of the original sequence that has another copy inside X₁X₂ … Xₙ. In other words, in SPM the context selection consists of taking certain fraction of the longest match. The study of the longest match for lossless data compression was initiated by [Aaron D.] Wyner and Ziv in their 1989 seminal paper.”
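    To make the quoted description concrete, here is a minimal brute-force sketch in Python. It finds the longest suffix with an earlier copy, keeps a fraction alpha of it as the context (alpha = 0.5 and rounding up are assumptions, not values from the quote), and returns the most frequent symbol that follows earlier occurrences of that context. The function names are hypothetical, and the search is far slower than anything one would use on real data.

```python
import math
from collections import Counter


def longest_repeated_suffix(seq):
    """Length of the longest suffix of seq that also occurs ending at an earlier position."""
    n = len(seq)
    best = 0
    for k in range(1, n):
        suffix = seq[n - k:]
        # An earlier copy starts at j with j + k <= n - 1, i.e. it ends before the last symbol.
        if any(seq[j:j + k] == suffix for j in range(n - k)):
            best = k
        else:
            # If no copy of length k exists, no longer suffix can have one either.
            break
    return best


def spm_predict(seq, alpha=0.5):
    """Predict the next symbol of seq; alpha is the (assumed) fraction of the longest match kept."""
    n = len(seq)
    longest = longest_repeated_suffix(seq)
    if longest == 0:
        return None  # no repeated context, so nothing to sample
    k = max(1, math.ceil(alpha * longest))
    context = seq[n - k:]
    # Sampled positions: the symbols that follow earlier occurrences of the context.
    followers = Counter(seq[j + k] for j in range(n - k) if seq[j:j + k] == context)
    return followers.most_common(1)[0][0]


print(spm_predict("abababab"))  # → a
print(spm_predict("abc"))       # → None
```

    For "abababab" the longest repeated suffix is "ababab" (6 symbols); half of it, "bab", occurs twice earlier, and both times it is followed by "a".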

    Posted in Artificial Intelligence, Bioinformatics, Science | No Comments »

    An updated list of systems biology conferences

    4th April 2007

    … regularly updated since 2003: upcoming systems biology conferences

    Here’s the list at the time of writing:

    Posted in Bioinformatics, Science | No Comments »

    Pattern matching and prediction (part 2)

    30th March 2007

    (This series started with Pattern matching and prediction, part 1)

    For part 2, I wanted to start (and probably also end) with Cybula’s AURA (a universal pattern matcher; white paper dated 2004). AURA is said to be built around Correlation Matrix Memory (CMM). CMMs were developed (or picked up for development?) by Prof. Austin, the founder of Cybula, in 1986.

    The white paper tells us that

    The now ubiquitous neural network methods such as Kohonen Networks, Radial Basis Function networks and Kohnen networks all allow users develop good pattern matching systems for small problems, where they excel. However, when the problems grow to large datasets, and where very high performance is needed, they become limited. … The well known k-Nearest Neighbour methods (k-NN) is a relatively good pattern matching method that has been constantly shown to operate well on many problems, however, it suffers from slow operation on large data problems.
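    The white paper does not explain how a CMM works internally. As a rough illustration only, here is a textbook binary correlation matrix memory in the style of a Willshaw network: training ORs together the outer products of binary input/output pairs, and recall sums the matched bits on each output line and thresholds them. This is not Cybula’s AURA; the vector sizes, the threshold rule and the function names are assumptions chosen for the sketch.

```python
def train_cmm(pairs, n_in, n_out):
    """Binary CMM: OR together the outer products of each (input, output) pair of 0/1 vectors."""
    memory = [[0] * n_in for _ in range(n_out)]
    for x, y in pairs:
        for i in range(n_out):
            if y[i]:
                for j in range(n_in):
                    if x[j]:
                        memory[i][j] = 1
    return memory


def recall_cmm(memory, x):
    """Sum matching active bits per output line, then apply a Willshaw-style threshold."""
    sums = [sum(m & xj for m, xj in zip(row, x)) for row in memory]
    threshold = sum(x)  # an output fires only if every active input bit hits a stored 1
    return [1 if s >= threshold else 0 for s in sums]


pairs = [
    ([1, 1, 0, 0, 0, 0], [1, 0, 0, 0]),
    ([0, 0, 1, 1, 0, 0], [0, 1, 0, 0]),
    ([0, 0, 0, 0, 1, 1], [0, 0, 1, 1]),
]
memory = train_cmm(pairs, n_in=6, n_out=4)
print(recall_cmm(memory, [1, 1, 0, 0, 0, 0]))  # → [1, 0, 0, 0]
print(recall_cmm(memory, [1, 0, 0, 0, 0, 0]))  # → [1, 0, 0, 0]
```

    The second call shows why CMMs are attractive for pattern matching: a partial input still recalls the stored association, and recall is a single pass over the matrix rather than a comparison against every stored example, as in k-NN.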

    Posted in Artificial Intelligence, Bioinformatics, Science | 1 Comment »

    Pattern matching and prediction (part 1)

    19th March 2007

    According to one of the definitions I gave earlier in the entry-level post on what artificial intelligence is, intelligence can be described as a special pattern-matching algorithm. Admittedly, a universal, complicated and recurring pattern matcher, but still just a pattern matcher :)

    I decided to find out more about present-day pattern matchers… without focusing much on regular expressions, which are of no interest to me given the applications I have in mind.

    Posted in Artificial Intelligence, Bioinformatics, Science | No Comments »

    Terminologies for Gene and Protein Similarity

    2nd February 2007

    Note: this is an excerpt (very slightly edited) from the original article Terminologies for Gene & Protein Similarity by Julius H. Jackson.

    Below are the definitions of heterologs, homologs, analogs, paralogs, xenologs and orthologs.

    Posted in Bioinformatics, Science | No Comments »

    Homology and similarity

    23rd October 2006

    In bioinformatics and biology, the term “homology” is used very often, and just as often it is misused. So what do “homology” and “similarity” mean, and how should one use these terms correctly?

    Posted in Bioinformatics, Science | No Comments »

    PFM2PWM: which “nucleotide background frequency” to use

    13th September 2006

    As I previously mentioned, when converting a PFM to a PWM, one variable, the [prior] background nucleotide frequency, was ambiguous to me. From other articles I noticed that it is usually set to 0.25 (1/4, because there are 4 nucleotides, so in a “perfectly random” sequence each would appear at 25% of positions). In that post, I also considered using the “real” background frequency of nucleotides, calculated from the sequence to which the matrix is to be applied.
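    To see where p(b) enters the conversion, here is a minimal Python sketch. It assumes the common log-odds form w(b, i) = log2(p(b, i) / p(b)), with a pseudocount of √N (N = number of sites in the column) distributed according to the background; that is one convention from the literature and not necessarily the one used in the earlier post. The add-one smoothing of the sequence-derived background, the function names and the example matrix are all hypothetical.

```python
import math
from collections import Counter

BASES = "ACGT"
UNIFORM = {b: 0.25 for b in BASES}


def background_from_sequence(seq):
    """A/C/G/T fractions of seq, add-one smoothed (an assumption) so no base gets p(b) = 0."""
    counts = Counter(c for c in seq.upper() if c in BASES)
    total = sum(counts.values())
    return {b: (counts[b] + 1) / (total + 4) for b in BASES}


def pfm_to_pwm(pfm, background):
    """Convert a PFM {base: [counts per position]} into log2-odds weights against background."""
    width = len(pfm["A"])
    pwm = {b: [] for b in BASES}
    for i in range(width):
        n = sum(pfm[b][i] for b in BASES)  # number of sites counted in column i (must be > 0)
        pseudo = math.sqrt(n)              # assumed pseudocount total, spread by p(b)
        for b in BASES:
            p = (pfm[b][i] + pseudo * background[b]) / (n + pseudo)
            pwm[b].append(math.log2(p / background[b]))
    return pwm


pfm = {"A": [8, 0, 0], "C": [0, 10, 0], "G": [1, 0, 10], "T": [1, 0, 0]}
uniform_pwm = pfm_to_pwm(pfm, UNIFORM)
at_rich_pwm = pfm_to_pwm(pfm, background_from_sequence("AAATTTAT"))
print(round(uniform_pwm["C"][1], 2), round(at_rich_pwm["C"][1], 2))
```

    Against an AT-rich background the same C in column 2 is rarer by chance, so it receives a larger weight than with p(b) = 0.25; this is exactly why the choice of background changes the resulting scores.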

    I wrote a program to search all the 1000-basepair upstream sequences of all human, rat and mouse genes present in Ensembl database release 40 (assuming those 1 kb upstream regions to be the genes’ “promoters”). For each promoter, only the single best score was returned. Then I drew a graph of the distribution of the number of promoters (y-axis) by best match score (x-axis). I ran the search twice: once with p(b) = 0.25, and once with p(b) set to the A/C/G/T content calculated for each promoter.
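    The procedure can be sketched in Python as follows. This is a toy version under stated assumptions: only the forward strand is scanned (the post does not say whether the reverse complement was), the motif is given as hypothetical column probabilities that already include pseudocounts, the per-promoter background is add-one smoothed, and short made-up sequences stand in for the Ensembl upstream regions.

```python
import math
from collections import Counter

BASES = "ACGT"
UNIFORM = {b: 0.25 for b in BASES}


def base_composition(seq):
    """A/C/G/T fractions of seq (add-one smoothed, an assumption), ignoring N and other codes."""
    counts = Counter(c for c in seq.upper() if c in BASES)
    total = sum(counts.values())
    return {b: (counts[b] + 1) / (total + 4) for b in BASES}


def best_score(seq, probs, background):
    """Best log2-odds score over all forward-strand windows of seq, or None if no window fits."""
    width = len(probs["A"])
    seq = seq.upper()
    best = None
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(c not in BASES for c in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(math.log2(probs[c][i] / background[c]) for i, c in enumerate(window))
        if best is None or score > best:
            best = score
    return best


def best_score_distribution(promoters, probs, per_promoter_background, bin_width=1.0):
    """Map each best-score bin (x-axis) to the number of promoters falling in it (y-axis)."""
    hist = Counter()
    for seq in promoters:
        background = base_composition(seq) if per_promoter_background else UNIFORM
        score = best_score(seq, probs, background)
        if score is not None:
            hist[math.floor(score / bin_width) * bin_width] += 1
    return dict(sorted(hist.items()))


# Hypothetical 3-column motif "ACG"; column probabilities already include pseudocounts.
probs = {"A": [0.7, 0.1, 0.1], "C": [0.1, 0.7, 0.1], "G": [0.1, 0.1, 0.7], "T": [0.1, 0.1, 0.1]}
promoters = ["TTACGTT", "GGGGCCCC", "ACGNNACG"]
print(best_score_distribution(promoters, probs, per_promoter_background=False))  # → {-2.0: 1, 4.0: 2}
print(best_score_distribution(promoters, probs, per_promoter_background=True))
```

    With p(b) = 0.25 the two perfect “ACG” matches land in the same bin; with a per-promoter background they are scored against different compositions and end up in different bins, so the shape of the distribution depends on which background is used.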

    Posted in Bioinformatics | 1 Comment »