Autarchy of the Private Cave

Tiny bits of bioinformatics, [web-]programming etc

    Pattern matching and prediction (part 2)

    30th March 2007

    (This series started with Pattern matching and prediction, part 1)

    For part 2, I wanted to start (and probably also end) with Cybula’s AURA (a universal pattern matcher; the white paper is dated 2004). AURA is said to be built around Correlation Matrix Memory (CMM). CMMs were developed (or picked up for development?) by Prof. Austin, the founder of Cybula, in 1986.

    The white paper tells us that

    The now ubiquitous neural network methods such as Kohonen Networks, Radial Basis Function networks and Kohnen networks all allow users develop good pattern matching systems for small problems, where they excel. However, when the problems grow to large datasets, and where very high performance is needed, they become limited. … The well known k-Nearest Neighbour methods (k-NN) is a relatively good pattern matching method that has been constantly shown to operate well on many problems, however, it suffers from slow operation on large data problems.

    The document claims that “The AURA technology has its origins in neural networks but draws upon pattern recognition methods and parallel processing for its fundamental operation”. I’m not a pro in pattern matching and NNs, but I know both approaches and roughly how they operate. In the light of that knowledge, it appears hard to complement an NN with external pattern recognition methods: an NN is itself a pattern recognition method, so this would be stacking two similar methods, which, as far as I can see, won’t improve performance.

    I was impressed by the list of data types AURA can handle:

    AURA can be applied to almost any data type. Currently, the technology has the following application components:

    • Signal Data (time varying data)
    • Text strings (strings of symbolic data)
    • Document sets
    • Form Data
    • Graphs (applicable to images and multidimensional data)

    I’ll continue citing the original white paper, as it appears to me a valuable resource, and I’m interested in the techniques used in AURA.

    The core of AURA is a storage and retrieval engine based on a Correlation Matrix Memory. This system allows large amounts of data to be saved and retrieved quickly and efficently. Unlike a database, AURA is designed first and foremost to deal with large incomplete data. (…) The power of Cybulas approach is to combine the CMM with methods that prepare the data correctly to get the best out of the network and to use the CMM as a part of a more sophisticated data access system.
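
    To make the CMM idea more concrete, here is a minimal sketch of a generic binary correlation matrix memory. It assumes the textbook Willshaw-style scheme (store by OR-ing outer products, recall by summing and thresholding); the class name, method names and the thresholding rule are my own assumptions and not Cybula's actual implementation.

```python
# Minimal binary correlation matrix memory (CMM) sketch.
# Assumption: generic Willshaw-style binary CMM, NOT AURA's real code.

class BinaryCMM:
    def __init__(self, n_inputs, n_outputs):
        self.n_inputs = n_inputs
        self.n_outputs = n_outputs
        # One row per output bit, one column per input bit, all zeros at first.
        self.matrix = [[0] * n_inputs for _ in range(n_outputs)]

    def store(self, input_bits, output_bits):
        """Associate a binary input pattern with a binary output pattern."""
        for row, out_bit in enumerate(output_bits):
            if out_bit:
                for col, in_bit in enumerate(input_bits):
                    if in_bit:
                        # Logical OR of the outer product into the matrix.
                        self.matrix[row][col] = 1

    def recall(self, input_bits, threshold=None):
        """Return output bits whose row shares >= threshold set bits with the query."""
        if threshold is None:
            # Default: every set bit of the query must be present in the row.
            threshold = sum(input_bits)
        sums = [sum(w & x for w, x in zip(row, input_bits)) for row in self.matrix]
        return [1 if s >= threshold else 0 for s in sums]


cmm = BinaryCMM(n_inputs=8, n_outputs=4)
cmm.store([1, 0, 1, 0, 0, 1, 0, 0], [1, 0, 0, 0])
cmm.store([0, 1, 0, 1, 0, 0, 1, 0], [0, 1, 0, 0])

# Full query: all three set bits are known.
exact = cmm.recall([1, 0, 1, 0, 0, 1, 0, 0])
# Incomplete query: one set bit is missing, so the threshold is lowered to 2.
partial = cmm.recall([1, 0, 1, 0, 0, 0, 0, 0], threshold=2)
```

    Storing never rewrites old associations, it only sets more bits, and an incomplete query can still reach the stored pattern by lowering the threshold. That seems to fit the "large incomplete data" emphasis of the white paper, though how AURA actually sets thresholds is not something I can confirm from the text.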

    Next, AURA’s ability to process huge loads of data is said to come from a coarse-to-fine scheme: initial pattern matching is done by approximate methods, followed by more accurate/specific ones, and recognition is completed with the most accurate methods. The process is as follows:

    1. Data to be matched
    2. Pre-processor
    3. CMM
    4. Back check
    5. Candidate Matches
    6. Final Matches
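
    The stages above can be sketched as a coarse filter followed by a slower verification step. This is only an illustration of the general idea: the bigram pre-processor, the inverted index standing in for the CMM stage, and the difflib-based back check are all my assumptions, not what AURA does internally.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def preprocess(text):
    """Hypothetical pre-processor: turn a string into a set of character bigrams."""
    text = text.lower()
    return {text[i:i + 2] for i in range(len(text) - 1)}


class CoarseIndex:
    """Stand-in for the CMM stage: an inverted index from bigram to item ids."""

    def __init__(self):
        self.items = []
        self.postings = defaultdict(set)

    def add(self, text):
        item_id = len(self.items)
        self.items.append(text)
        for feature in preprocess(text):
            self.postings[feature].add(item_id)

    def candidates(self, query, min_shared=2):
        """Cheap, approximate stage: ids of items sharing enough bigrams with the query."""
        counts = defaultdict(int)
        for feature in preprocess(query):
            for item_id in self.postings.get(feature, ()):
                counts[item_id] += 1
        return [item_id for item_id, count in counts.items() if count >= min_shared]


def back_check(index, query, candidate_ids, min_ratio=0.8):
    """Slower, more accurate stage, applied only to the coarse candidates."""
    q = query.lower()
    scored = [
        (SequenceMatcher(None, q, index.items[i].lower()).ratio(), index.items[i])
        for i in candidate_ids
    ]
    return [text for ratio, text in sorted(scored, reverse=True) if ratio >= min_ratio]


index = CoarseIndex()
for word in ["pattern", "patter", "lantern", "matching", "matrix"]:
    index.add(word)

candidate_ids = index.candidates("pattern")                       # candidate matches
final_matches = back_check(index, "pattern", candidate_ids)       # final matches
```

    Here the coarse stage lets "lantern" through as a candidate because it shares several bigrams with the query, and the back check then rejects it. The expensive comparison runs on three items instead of five, which is the whole point of the scheme on large data.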

    Multiple data types are supported in AURA thanks to type-specific preprocessors, which convert input data into a single common representation that is actually used for further pattern matching. (Presumably there are at most two such representations, to accommodate graphs with arcs joining nodes. Or there could be only one, with lists represented as the simplest unidirectional graphs.)

    The main emphasis of text searching is that it allows the user to match parts of words, i.e. the data items used by the system are composed of individual letters, rather than whole words as found in the document components.(…) In contrast to the text matching components, the document components use words as the atomic elements of the search, rather than the individual letters.
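
    The difference between letter-level and word-level items is easy to show with a small sketch. The use of character trigrams for the letter-level case is my assumption (the white paper only says that individual letters are the data items), and both function names are made up for illustration.

```python
def letter_items(text, n=3):
    """Letter-level items: character n-grams, so a query can match inside a word."""
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def word_items(text):
    """Word-level items: whole words are the atomic elements."""
    return set(text.lower().split())


document = "Correlation matrix memories store binary patterns"
query = "correl"  # only a part of a word

# Letter-level: every trigram of the query occurs in the document.
letter_hit = letter_items(query) <= letter_items(document)
# Word-level: "correl" is not one of the document's words.
word_hit = word_items(query) <= word_items(document)
```

    With these inputs the partial word is found at the letter level but not at the word level, which matches the distinction the white paper draws between the text and document components.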

    AURA is also a versatile pattern classifier, allowing the identification of an unknown item of data. (…) AURA differs from other classification methods in that it allows data to be added at any time to the classifier. No rebuilding of the classifier is required. This allows its use in on line applications where new data is constantly arriving.
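
    As a rough illustration of "data can be added at any time, no rebuilding required", here is a toy classifier where training is just setting membership of features per class. This is a hypothetical sketch of the general property, with names and scoring rule of my own choosing; it is not AURA's classification method.

```python
from collections import defaultdict


class IncrementalClassifier:
    """Toy classifier: each class accumulates the features seen in its examples."""

    def __init__(self):
        self.class_features = defaultdict(set)

    def add(self, features, label):
        # Adding an example only extends one set; nothing already stored is recomputed.
        self.class_features[label] |= set(features)

    def classify(self, features):
        """Return the label whose stored features overlap the query the most."""
        if not self.class_features:
            return None
        features = set(features)
        return max(
            self.class_features,
            key=lambda label: len(features & self.class_features[label]),
        )


clf = IncrementalClassifier()
clf.add({"has_wings", "lays_eggs"}, "bird")
clf.add({"has_fur", "gives_milk"}, "mammal")
first = clf.classify({"has_fur"})

# New data arrives later; it is usable immediately, with no retraining pass.
clf.add({"has_scales", "lays_eggs"}, "reptile")
second = clf.classify({"has_scales", "lays_eggs"})
```

    Compare this with a typical trained neural network, where a new class usually means another training run over the data.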

    Hmm, is the description too good to be true, or is AURA really a kind of “Easy to use, WYSIWYG Universal Pattern Matcher, database and classifier”? All in all, AURA does look like an attractive pattern matching solution.

    The technologies mentioned in the AURA description need an overview as well, so there might also be a “part 3” in the Pattern matching and prediction series. Hopefully, I’ll have the time and desire to document my findings.

    Please comment if you have used any “universal” pattern-matching tools in your work (other than the usual regexps, of course). This would be valuable information for me.

