I recently had an interesting conversation about classification and search with James Zubok, CFO of Brainware, Inc. Brainware is a Virginia-based company that was once part of SER Systems AG, a former German ECM company. Brainware provides products that help companies extract information from unstructured and semi-structured documents, such as invoices, order forms, and contracts, without using templates. The company also offers some interesting classification and search technology, and this is what our conversation focused on.
We discussed two different but interrelated technologies that Brainware has developed: a search engine based on n-grams and a classification engine that uses neural networks. Brainware offers both enterprise and desktop editions of each. I received a demo of the desktop version of the products and now have both running on my laptop.
A Search example
On the desktop search side, the product, called Globalbrain Personal Edition, differs from many other search products on the market in that it does not make use of keyword search. Rather, its searches are natural language based, using a patented n-gram approach. When indexing a word, the word is parsed into overlapping three-letter snippets and then a vector is created. For example, the word sample would be parsed as sam, amp, mpl, etc. According to Brainware, this three-letter snippet approach makes the search engine language independent.
The capability provided by Brainware lets users search not simply on keywords, but on whole paragraphs. For example, I have many documents (in various formats) on my desktop that deal with all of the companies I speak with. Say I want to find some documents relating to specific challenges companies faced in deploying their text analytics solutions. Rather than simply inputting “text analytics” and “challenges”, I can type in a phrase or even a paragraph with the wording I’m looking for. This returns a much more targeted set of documents.
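To make the three-letter snippet idea concrete, here is a minimal sketch of how a word or paragraph could be broken into trigrams and compared as vectors. This is not Brainware's patented method, which was not described to me in that level of detail. The function names (trigrams, trigram_vector, cosine) and the use of cosine similarity for ranking are my own assumptions, chosen only for illustration.

```python
from collections import Counter
from math import sqrt


def trigrams(text):
    """Return the overlapping three-character snippets of each word in text."""
    grams = []
    for word in text.lower().split():
        # Words shorter than three characters yield no snippets.
        grams.extend(word[i:i + 3] for i in range(len(word) - 2))
    return grams


def trigram_vector(text):
    """Count how often each snippet occurs (a sparse vector)."""
    return Counter(trigrams(text))


def cosine(a, b):
    """Cosine similarity between two snippet-count vectors (assumed metric)."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


print(trigrams("sample"))  # ['sam', 'amp', 'mpl', 'ple']
```

Under these assumptions, a query paragraph and each document would both be turned into vectors with trigram_vector, and documents would be ranked by their cosine score against the query. Because the snippets are just character sequences, nothing in the sketch depends on a dictionary or on the language of the text.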
A Classification example
On the desktop classification front, the product is very easy to use. I simply loaded the software, which provided a user interface where I could develop classes and then train my system to automatically classify documents based on a few training examples. As I mentioned, I have many documents on my desktop that deal with various technology areas and I might want to classify them in an intelligent manner for some research I’m planning. So, I created several classes: text analytics, visualization, and MDM. I then dragged documents that I felt fell into each category onto those classes and trained the system on these examples.
Brainware provides a visual interface that lets me view how “good” the learned set is via a series of points in three-dimensional space. The closer together the points from the same class are on the plot, the better the classification will be. Likewise, the more separated the points of different classes are, the better the classification. In my classification test, the visualization and the MDM documents were tightly clustered, while the text analytics information was not. In any event, I then ran the classifier over the rest of my documents (supplying a few parameters) and the system automatically classified what it could. It also gave me a list of documents that it couldn’t classify, along with suggested categories for each. I could then just drag those documents into the appropriate categories and run the classifier again. I should add that it did a good job of suggesting the right class for the documents it put in the unclassified category.
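The intuition behind that plot (tight clusters within a class, wide gaps between classes) can be put into numbers. The sketch below is only my own illustration of that idea, not how Brainware computes or draws its view. The helper names (centroid, spread, separation) and the sample coordinates are hypothetical.

```python
from math import dist
from statistics import mean


def centroid(points):
    """Average position of a class's points in 3-D space."""
    return tuple(mean(axis) for axis in zip(*points))


def spread(points):
    """Mean distance from each point to its class centroid (smaller is tighter)."""
    c = centroid(points)
    return mean(dist(p, c) for p in points)


def separation(class_a, class_b):
    """Distance between two class centroids (larger is better separated)."""
    return dist(centroid(class_a), centroid(class_b))


# Hypothetical training points, invented for illustration only.
mdm = [(0, 0, 0), (0, 0, 2)]
visualization = [(10, 0, 0), (10, 0, 2)]

print(spread(mdm))                     # 1.0
print(separation(mdm, visualization))  # 10.0
```

A class with a small spread relative to its separation from the other classes would correspond to the tightly clustered cases I saw for visualization and MDM, while a large spread would correspond to my looser text analytics class.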
Brainware on an enterprise level
The enterprise edition of the product combines the search and classification capabilities and lets users search and classify more than 400 different document types.
Now, Brainware isn’t planning to compete with Google, Yahoo!, Fast, etc. Rather, the company sees its search as a complement to these inverted index approaches. The idea would be to embed its search into other applications that deal with archiving, document management, or e-discovery, to name a few. The classification piece could also be embedded into the appropriate applications. I asked if the company was in discussions with content management providers and service providers that store emails and documents. It would seem to me that this software would be a natural complement to some of these systems. My understanding is that the company is looking for partnerships in the area. Brainware currently has a partnership with Attensity, a text analytics provider, to help classify and search documents as part of the text analytics process.
I’m interested to see what will develop with this company.