Archive for the 'Bioinformatics' Category

Bioinformatics is a general term for the application of computers and computational and mathematical methods to biology.

Sampled Pattern Matching (SPM) definition

5th May 2007

“… consider a universal predictor based on pattern matching: Given a sequence X_1, …, X_n drawn from a stationary mixing source, it predicts the next symbol X_{n+1} based on selecting a context of X_{n+1}. The predictor, called the Sampled Pattern Matching (SPM), is a modification of the Ehrenfeucht-Mycielski pseudo random generator algorithm. It predicts the value of the most frequent symbol appearing at the so called sampled positions. These positions follow the occurrences of a fraction of the longest suffix of the original sequence that has another copy inside X_1 X_2 … X_n. In other words, in SPM the context selection consists of taking certain fraction of the longest match. The study of the longest match for lossless data compression was initiated by [Aaron D.] Wyner and Ziv in their 1989 seminal paper.”
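
To make the definition above a bit more concrete, here is a minimal Python sketch of the idea as I read it from the quote. It is not the authors' implementation: the function name spm_predict, the default fraction alpha = 0.75, the rounding of the context length and the default return value are all my own assumptions, and the brute-force search is chosen for readability, not speed.

```python
from collections import Counter
from math import ceil


def spm_predict(seq, alpha=0.75, default=None):
    """Predict the next symbol of the string `seq` (illustrative SPM sketch).

    `alpha` (assumed parameter) is the fraction of the longest match kept as context.
    """
    n = len(seq)

    # Step 1: length of the longest suffix that has another copy inside seq.
    # Searching only seq[0:n-1] excludes the suffix itself; overlaps are allowed.
    longest = 0
    for length in range(1, n):
        if seq.find(seq[n - length:], 0, n - 1) == -1:
            break  # a longer suffix cannot repeat if this one does not
        longest = length
    if longest == 0:
        return default  # the last symbol never occurred before

    # Step 2: keep only a fraction alpha of the longest match as the context.
    k = max(1, ceil(alpha * longest))
    context = seq[n - k:]

    # Step 3: sampled positions are those right after each copy of the context.
    counts = Counter()
    start = seq.find(context)
    while start != -1:
        if start + k < n:
            counts[seq[start + k]] += 1
        start = seq.find(context, start + 1)

    # Step 4: predict the most frequent symbol at the sampled positions.
    return counts.most_common(1)[0][0]
```

For the periodic string "abcabcabcab" the longest repeated suffix has length 8, the context kept is "cabcab", and the only sampled position holds "c", so spm_predict("abcabcabcab") returns "c". When the last symbol has never been seen before, there is nothing to match and the sketch simply returns the default.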
Read the rest of this entry »

Posted in Artificial Intelligence, Bioinformatics, Science | No Comments »

An updated list of systems biology conferences

4th April 2007

… regularly updated since 2003: systems biology upcoming conferences

Here’s the list as of the moment of writing:
Read the rest of this entry »

Posted in Bioinformatics, Science | No Comments »

Pattern matching and prediction (part 2)

30th March 2007

(This series started with Pattern matching and prediction, part 1)

For part 2, I wanted to start (and probably also end) with Cybula’s AURA (a universal pattern matcher; the white paper is dated 2004). AURA is said to be built around Correlation Matrix Memory (CMM). CMMs were developed (or picked up for development?) by Prof. Austin, the founder of Cybula, in 1986.

The white paper tells us that

“The now ubiquitous neural network methods such as Kohonen Networks, Radial Basis Function networks and Kohonen networks all allow users [to] develop good pattern matching systems for small problems, where they excel. However, when the problems grow to large datasets, and where very high performance is needed, they become limited. … The well known k-Nearest Neighbour methods (k-NN) is a relatively good pattern matching method that has been constantly shown to operate well on many problems, however, it suffers from slow operation on large data problems.”

Read the rest of this entry »

Posted in Artificial Intelligence, Bioinformatics, Science | 1 Comment »

Pattern matching and prediction (part 1)

19th March 2007

According to one of the definitions I provided earlier in the descriptive entry-level post on what is artificial intelligence, intelligence can be described as a special pattern-matching algorithm. Evidently, it is a universal, complicated and recurring pattern matcher, but still just a pattern matcher.

I decided to find out more about present-day pattern matchers… definitely not focusing too much on regular expressions, which are of no interest to me in light of possible applications.
Read the rest of this entry »

Posted in Artificial Intelligence, Bioinformatics, Science | No Comments »

Terminologies for Gene and Protein Similarity

2nd February 2007

Note: this is an excerpt (very slightly edited) from the original article Terminologies for Gene & Protein Similarity by Julius H. Jackson.

Definitions of heterologs, homologs, analogs, paralogs, xenologs and orthologs are provided below.
Read the rest of this entry »

Posted in Bioinformatics, Science | No Comments »

Homology and similarity

23rd October 2006

In bioinformatics and biology, the term “homology” is used quite often, and quite often it is misused. So what are “homology” and “similarity”, and how can one use these terms correctly?
Read the rest of this entry »

Posted in Bioinformatics, Science | No Comments »

PFM2PWM: which “nucleotide background frequency” to use

13th September 2006

As I previously mentioned, when converting a position frequency matrix (PFM) to a position weight matrix (PWM), a single variable – the [prior] background nucleotide frequency – was ambiguous to me. From other articles I noticed that it is usually set to 0.25 (1/4 – because there are 4 nucleotides, each would appear in 25% of cases in a “perfectly random” sequence). In that post, I also considered using the “real” background frequency of nucleotides, calculated from the sequence to which the matrix is to be applied.

I wrote a program to scan the 1000-base-pair upstream sequences of all human, rat and mouse genes present in Ensembl database release 40 (treating these 1 kb upstream regions as the “promoters” of genes). For each promoter, only the single best score was returned. I then plotted the distribution of the number of promoters (y-axis) against the best match score (x-axis). I ran the search twice – once with p(b) = 0.25, and once with p(b) set to the A/C/G/T content calculated for each promoter.
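
The two variants of the search can be sketched roughly as below. This is not the program I used, only a minimal Python illustration under assumptions: the function names are made up, the log-odds formula with a single pseudocount spread according to the background is one common convention among several, and the add-one smoothing of the per-sequence background is there only to avoid zero frequencies.

```python
from collections import Counter
from math import log2

BASES = "ACGT"
UNIFORM = {b: 0.25 for b in BASES}  # the usual p(b) = 0.25


def background_from_sequence(seq):
    """Estimate p(b) from the A/C/G/T content of one sequence.

    Add-one smoothing (an assumption) keeps every frequency above zero.
    """
    counts = Counter(seq.upper())
    total = sum(counts[b] for b in BASES)
    return {b: (counts[b] + 1) / (total + 4) for b in BASES}


def pfm_to_pwm(pfm, background, pseudocount=1.0):
    """Convert a PFM (base -> list of counts per position) to log-odds weights."""
    width = len(pfm["A"])
    pwm = {b: [] for b in BASES}
    for i in range(width):
        column_total = sum(pfm[b][i] for b in BASES)
        for b in BASES:
            # Corrected frequency of base b at position i (pseudocount is assumed).
            freq = (pfm[b][i] + pseudocount * background[b]) / (column_total + pseudocount)
            pwm[b].append(log2(freq / background[b]))
    return pwm


def best_score(seq, pwm):
    """Return the single best match score of the matrix over all windows of seq."""
    width = len(pwm["A"])
    seq = seq.upper()
    best = None
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(ch not in pwm for ch in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(pwm[ch][i] for i, ch in enumerate(window))
        if best is None or score > best:
            best = score
    return best
```

With these pieces, the first run corresponds to best_score(promoter, pfm_to_pwm(pfm, UNIFORM)) and the second to best_score(promoter, pfm_to_pwm(pfm, background_from_sequence(promoter))), so in the second case the weights are recomputed for every promoter.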
Read the rest of this entry »

Posted in Bioinformatics | 1 Comment »

Autarchy of the Private Cave

Tiny bits of bioinformatics, [web-]programming etc