The Joomla! Community Portal ™

Search and Joomla! CMS 2.5
Written by Elin Waring   
Thursday, 13 October 2011 12:45

Search is a central function of any content management system, and the Development Working Group has been working in a GitHub repository on supercharging search in Joomla! 2.5. Michael Babker has taken the lead in organizing this effort, along with Chris Davenport. So far six people have contributed code. The original code for Finder in Joomla! 1.5 was written by Rob Schley, Louis Landry and Andrew Eddie and distributed by JXtended.

I’ve heard Rob say at several Joomla! Days that Finder is the favorite of the extensions he has worked on, so I asked him why that is.

“Finder is my favorite extension because I appreciate challenging problems. I think most developers would acknowledge that building an indexing search engine that runs completely within a Joomla site is very difficult. Search engines also branch out into a lot of really interesting topics and problems like natural language processing, internationalization, ranking algorithms, etc. The more you learn about search engines, the more humble you become. Compared to major search engines, Finder is a toy. Yet, even though it is a toy, it is still complicated enough to seriously challenge even the most experienced Joomla developers.”

What makes Finder different

The existing Joomla! search performs a simple text search for a matching string and returns the results in the order they are found. Finder replaces this with a model closer to that of the well-known search engines: before any search is run, it indexes your site content and stores the index in your database (not Bigtable … yet).

I asked Rob what makes Finder so much faster.

“Finder is faster than other search options because it creates an index of the text inside the content. At a high level, Finder knows exactly how many instances of any given word exist within your site’s content. This is useful because it can quickly determine whether or not a search term is relevant and where it exists. It can also use that information to rank the results by relevance, which has been a barrier for search in Joomla for a long time.”
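
The idea Rob describes can be sketched in a few lines of Python. This is only an illustration of an index that counts word occurrences per document and ranks by those counts; it is not Finder's code, and the names `tokenize`, `build_index` and `search`, as well as the sample articles, are made up for the example.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each term to a Counter of {doc_id: occurrences in that document}."""
    index = defaultdict(Counter)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term][doc_id] += 1
    return index


def search(index, query):
    """Return doc ids ranked by how often the query terms occur in them."""
    scores = Counter()
    for term in tokenize(query):
        # Look the term up directly instead of scanning every document.
        for doc_id, count in index.get(term, {}).items():
            scores[doc_id] += count
    # Highest score first; ties are broken by doc id for a stable order.
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [doc_id for doc_id, _ in ranked]


# Hypothetical site content, indexed once before any search is run.
documents = {
    "article-1": "Joomla search finds content. Search is central.",
    "article-2": "Installing a Joomla template.",
    "article-3": "Search tips: search, search, and search again.",
}
index = build_index(documents)
```

With this sample data, `search(index, "search")` lists `article-3` ahead of `article-1` because the word occurs more often there, and a term that appears nowhere returns an empty list without any document being read.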

Besides making searches faster, indexing enables useful features such as suggestions and stemming. Stemming means reducing words to their roots. For example, searching for “breaking” can also return results containing “break” and “breaks”. Indexing also makes it possible to rank results by the quality of the match and to give common words a lower weight.
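
A rough sketch of both ideas follows. The suffix list and the weighting formula are assumptions chosen for illustration, not what Finder actually does; a naive suffix stripper like this one gets many English words wrong (it turns “running” into “runn”, for instance), which is why real stemmers are considerably more involved.

```python
import math

# Illustrative suffix list only; real stemmers use far richer rules.
SUFFIXES = ("ing", "ed", "es", "s")


def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def term_weights(documents):
    """Give each stem a weight that shrinks as more documents contain it."""
    doc_stems = [{stem(word) for word in text.split()} for text in documents]
    total = len(doc_stems)
    weights = {}
    for stems in doc_stems:
        for term in stems:
            if term not in weights:
                doc_freq = sum(1 for other in doc_stems if term in other)
                # Assumed formula: log(1 + N / df), so common words score lower.
                weights[term] = math.log(1 + total / doc_freq)
    return weights


# Hypothetical sample content.
weights = term_weights(["the build is breaking", "the test breaks", "the docs"])
```

Here “breaking” and “breaks” both reduce to “break”, so they share one index entry, and “the”, which appears in every sample document, ends up with the lowest weight.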

The Finder Integration Project

I asked Michael some questions about the integration project.

Q: Why did you get involved in the Finder integration project?

A: Actually, it wasn’t even because I use Finder on my website. I’ve spent some time converting code from being Joomla! 1.5 compatible to working with the current version, so I offered to lend a hand in converting the code. Since many of JXtended’s libraries were incorporated into Joomla! 1.6, the biggest issues have been getting the code converted and finding now-unnecessary functions to remove. I would say that with the advances in Joomla! 1.6/1.7, the overall code base is a few hundred lines smaller, which makes it easier to manage than before.

Q: What has been the most interesting part of this project so far?

A: Being able to work with several other developers to help make everything work as it did before while making everything as simple as possible.  There are six different people with commits on the project’s GitHub repo, so this has been a collaborative and open process.

Q: How hard will it be for extension developers to create finder plugins?

A: Personally, I hope that creating plugins will be as easy as possible for third-party developers. During the conversion process, the old Finder plugins have all been converted to use standard Joomla! content plugin events (such as onContentChangeState), and where practical, the individual plugins have had their code cleaned up and consolidated. In converting the plugins, I’ve found that the most difficult part of development is getting the right data from the database.
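
To picture what reacting to a state-change event means for a search plugin, here is a language-neutral sketch in Python. It is an analogy only: Joomla! plugins are written in PHP, and the `Dispatcher` and `IndexerPlugin` classes, the method name `on_content_change_state` and its parameters are hypothetical stand-ins rather than the Joomla! API.

```python
class IndexerPlugin:
    """Hypothetical search plugin that keeps a tiny index for one content type."""

    def __init__(self, context):
        self.context = context  # e.g. "com_content.article" (illustrative)
        self.indexed = {}       # item id -> searchable text

    def add(self, item_id, text):
        self.indexed[item_id] = text

    def on_content_change_state(self, context, item_ids, published):
        """Remove items from the index when they are unpublished."""
        if context != self.context:
            return  # the event concerns another extension's content
        if not published:
            for item_id in item_ids:
                self.indexed.pop(item_id, None)


class Dispatcher:
    """Hypothetical event dispatcher that calls matching plugin methods."""

    def __init__(self):
        self.plugins = []

    def register(self, plugin):
        self.plugins.append(plugin)

    def trigger(self, event, *args):
        for plugin in self.plugins:
            handler = getattr(plugin, event, None)
            if handler is not None:
                handler(*args)


plugin = IndexerPlugin("com_content.article")
plugin.add(1, "Getting started")
plugin.add(2, "Draft announcement")

dispatcher = Dispatcher()
dispatcher.register(plugin)

# Unpublishing item 2 removes it from the index.
dispatcher.trigger("on_content_change_state", "com_content.article", [2], False)
# An event for a different context leaves this plugin's index untouched.
dispatcher.trigger("on_content_change_state", "com_weblinks.weblink", [1], False)
```

The point of building on standard events is visible even in this toy version: the plugin never polls for changes, it only responds when the system announces one.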

Q: What do you like about Finder?

A: It adds a lot of additional features to the core of Joomla! and is an advanced search component that has features com_search lacks. As part of the integration, I’ve been working on adding some additional features to the core of Joomla! and making some of Finder’s code reusable by developers. The first project that I’m working on is rewriting the highlighter functions so that they can be reused in all code. I’ve also introduced a new plugin event, onCategoryChangeState, that will allow developers to work with items in a category whose state has been changed.

Get Involved

Michael has set up a repository with the CMS and Finder already in place that you can download and install to help with testing or with code. There are definitely still issues, but it’s better to find them now rather than later. Use the tracker on GitHub to report anything you find.

It is also important for developers to know that, just as in earlier versions of Joomla!, you will need to write plugins to include your extension’s content in the Finder index. The repository includes plugins for the core extensions. If you follow the patterns in those and your extensions are built with the 11.x framework, it should be fairly straightforward to build one for your own extensions.

There is a lot more to talk about with Finder, including how it works with UTF-8 content (let's just say for now, it's great), other localization issues, and how to use Finder effectively in a site. I’ll have more on this in the future.