Pattern matching and prediction (part 1)
19th March 2007
According to one of the definitions I provided earlier in the descriptive entry-level post on what is artificial intelligence, intelligence can be described as a special pattern-matching algorithm. Evidently, universal and complicated and recurring pattern matcher, but still just a pattern matcher
I decided to find out more about pattern matchers of nowadays… definitely not focusing too much on regular expressions, which are of no interest to me in the light of possible applications.
First of all, the links directory: Pattern Matching Pointers is a long list of links, related to combinatorial pattern matching:
Combinatorial Pattern Matching addresses issues of searching and matching strings and more complicated patterns such as trees, regular expressions, graphs, point sets, and arrays. The goal is to derive non-trivial combinatorial properties for such structures and then to exploit these properties in order to achieve improved performance for the corresponding computational problem.
A steady flow of high-quality research on this subject has changed a sparse set of isolated results into a full-fledged area of algorithmics with important applications. This area is expected to grow even further due to the increasing demand for speed and efficiency that comes from molecular biology, but also from areas such as information retrieval, pattern recognition, compiling, data compression, program analysis and security.
(Further I may link to some of the resources already listed in Pattern Matching Pointers.)
The directory mentioned focuses in part on the computational/molecular biology uses of pattern matching. The following sections are represented in the Pointers:
- people who work in the area (last updated in Dec 2006, and no longer maintained)
- a list of presentations and papers from the CPM conferences (1992-2006)
- a list of thematic conferences, at the moment of writing – for the year 2007:
- Jan 3-7: PSB’07 (Wailea, Maui)
- Jan 7-9: SODA’07 (New Orleans, Lousiana)
- Jan 15-17: APCB’07 (Hong Kong)
- Mar 27-29: DCC’07 (Snowbird, UT)
- Apr 1-5: CIBCB’07 (Honolulu, Hawaii)
- Apr 21-25: RECOMB’07 (San Francisco, CA)
- Apr 26-28: SDM’07 (Minneapolis, Minnesota)
- Jul 9-11: CPM’07 (London, Western Ontario)
- Jul 16-19: COCOON’07 (Banff, Alberta)
- Aug 13-17: CBS’07 (San Diego, CA)
- Jul 21-25: ISMB/ECCB’7 (Vienna, Austria)
- Aug 12-15: KDD’07 (San Jose, CA)
- a huge list of resources on the subject:
- bibliographies
- link collections (doh!)
- interest groups
- books
- journals
- proceedings
- software
- news groups and mailing lists
After a brief review, the list appears to be primarily related to regex-like matching, which was a disappointment for me personally. Still, the list is long, and you may find more than I did.
Next thing I checked was the book Bioinformatics: The Machine Learning Approach (MITPress). Chapters 2 through 4 are devoted to machine learning and prediction, and a number of chapters are devoted to HMM and neural networks and even combined HMM-NN approaches. Unfortunately, I didn’t read the book yet (btw, it can be found on the internet in electronic downloadable form); but the contents do look very promising, both for the bioinformaticians and those who just want to learn more about some modern algorithms and approaches.
Next one at hand was “A universal predictor based on pattern matching for mixing sources” (by P. Jacquet , W. Spankowski, I. Apostol). This one is a generic algorithm for predicting the x(n+1) value in the series of x1, x2, …, x(n). I found a PostScript version on the internet, so I assume it’s legal to give you the pdf version of this text. (If you know some restrictions exist, do let me know to remove the PDF version.) The proofs in the article might be hard to follow, but the idea is understood and can be used to the benefit of your specific applications.
(I also came across “A Universal Online Caching Algorithm Based on Pattern Matching“, which may be interesting for browser and web-cache developers )
There’s also an article “Universal lossless compression via multilevel pattern matching”, which is highly similar to the above-mentioned “A universal predictor…”. However, both can be used not only for compression.
Next article I briefly scanned was “A Simple Universal Pattern-Matching Automaton” by Daniel J. Bernstein – so far the best match to what I was looking for. (You may want to read about automata – not to widen the knowledge, but only to find out how this word is used.) I didn’t manage to grasp everything from the first reading, but from what I understood it appears that the automaton described is worth attempting to employ to solve the AI-related tasks in the field of “recognizing” something (or someone).
This is the end of Part 1. Continue to Pattern matching and prediction (part 2).