Open Source Enterprise Search
Has locating information across a multitude of systems on your corporate network finally made you consider an enterprise search appliance?
Our company has a number of systems in place designed to capture corporate knowledge and subject matter expertise. Once it became too time-consuming to find information across these systems (and we struck out with demos of search appliances like SearchBlox), we purchased the entry-level Google Mini. We’ve been happy with the appliance, but wanted to search more information formats (beyond digest-authenticated SSL web pages and SMB shares), authenticate to our central authentication system (not LDAP), and introduce additional security levels. (Word stemming would be nice too!) To avoid the costs of graduating to a Google Search Appliance (and of creating an internal Source Allies project to front-end the Google Mini XML responses with some custom XSLT), we looked to open source again.
The trend towards enterprise search consolidation (Autonomy acquiring Interwoven for $775M, Microsoft offering $1.2B for FAST) has been interrupted by strong open source Lucene-based products like Nutch and Solr, which have broken the enterprise search market segment wide open again. Nutch provides the basic web and file system crawling of a search appliance, and Solr gives us the ability to infuse structured data into the same underlying Lucene index.
We decided to deploy these technologies on our company network. In our environment, the Nutch and Solr indexes are updated on a regular basis. We use Nutch to index unstructured data such as our intranet, wikis, blogs and Subversion document repositories. Solr indexes structured data such as our corporate CRM application, a SugarCRM instance. (Incidentally, we use a separate product called OpenGrok to index our Subversion source code repositories.) Because Nutch and Solr are both open source, it was very simple to tie them into our single sign-on system (front-ending them with our CAS server). Stay tuned for a follow-up blog post highlighting the technical details of our configuration.
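To give a rough feel for the structured side, here is a minimal Python sketch of turning CRM records into Solr documents. It is not our actual configuration: the host, the core name `crm`, the field names and the record layout are all made up for illustration, and it assumes a recent Solr release whose `/update` handler accepts a JSON array of documents. The request is only built, not sent.

```python
import json
import urllib.request

# Hypothetical Solr host and core name; adjust for a real installation.
SOLR_UPDATE_URL = "http://localhost:8983/solr/crm/update?commit=true"


def crm_row_to_solr_doc(row):
    """Map one CRM account record (a plain dict) to a flat Solr document.

    The field names here are illustrative, not a real SugarCRM schema.
    """
    return {
        "id": "account-" + str(row["id"]),
        "name": row["name"],
        "industry": row.get("industry", ""),
        "source": "crm",
    }


def build_update_request(rows, url=SOLR_UPDATE_URL):
    """Build (but do not send) an HTTP POST carrying the documents as JSON."""
    docs = [crm_row_to_solr_doc(row) for row in rows]
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    sample_rows = [
        {"id": 1, "name": "Acme Corp", "industry": "Manufacturing"},
        {"id": 2, "name": "Globex"},
    ]
    request = build_update_request(sample_rows)
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
    # To actually index the documents against a running Solr instance:
    # urllib.request.urlopen(request)
```

Prefixing the `id` with the record type and tagging each document with a `source` field is one simple way to keep structured records distinguishable from other content sharing a search front end.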
Ultimately, Nutch and Solr are going to provide our company with a more flexible enterprise search solution, but it takes a fair share of Lucene/Nutch/Solr expertise to make it all happen. Now that we have commodity cloud computing, Hadoop Map/Reduce, and structured and unstructured indexing tools on top of Lucene, I’m eager to see what the open source community will do next in the enterprise search space. It doesn’t seem too far off to have an appliance that will do the normal Nutch/Google Mini web and SMB share crawling, but also actively update the index with corporate collaboration content (shared email, group chat, social media, RSS, wave protocol, video transcription, forums, KM systems, custom SQL queries). Of course, all of this is currently possible with Solr/Nutch and even Google Mini’s OneBox modules, but who will be the first to make it really easy to set up?