Friday, April 4, 2008

Build your own search engine in PHP

Do you know how to build your own search engine in PHP? It is not that hard, because there are already some ready-to-use pieces, like Lucene. You just have to decide exactly how you want your search engine to work and put the pieces together.


Of course, you can't do in 15 minutes what took others years, but I want to give you a few hints on how to create your own search functionality. You could use this to index the web, and if you do, let me know so I can stop using Google.

But of course, this stuff can also be used to create your own vertical or horizontal search engine. The magic word is Lucene.

By the way, I'm following the livestream of The Next Web; day 2 starts at 10:30. So read on and try writing some cool things, or watch the stream for inspiration (many startups give short presentations) and come back later to turn your idea into reality.

So what is this about? Recently, I wrote an article about using the Zend Framework in CakePHP. This is pretty easy, and I'm still trying out Zend components for use in CakePHP. I wrote a little example site (locally) where you can log in using your OpenID and view some del.icio.us stuff.

Yesterday, I tried Zend_Search_Lucene, the Zend Framework's PHP implementation of Lucene, because it piqued my curiosity.

Of course, you first have to integrate the Zend Framework into your CakePHP installation. I assume you already have a controller. In that controller, create a function search:


function search($query = "cake") {
    // Load Zend_Search_Lucene from the vendors directory.
    vendor('Zend/Search/Lucene');

    // Calling /controller/search/build (re)creates the index.
    if ($query == "build") {
        $index = Zend_Search_Lucene::create('/tmp/my-index');

        // Index the start page and store its URL with the document.
        $url = "http://cakephp.agoris.nl/";
        $doc = Zend_Search_Lucene_Document_Html::loadHTMLFile($url);
        $doc->addField(Zend_Search_Lucene_Field::Text('url', $url));
        $index->addDocument($doc);

        // Index the first ten links found on that page.
        // Note: $url.$link only works for relative links.
        $i = 0;
        foreach ($doc->getLinks() as $link) {
            $current_doc = Zend_Search_Lucene_Document_Html::loadHTMLFile($url.$link);
            $current_doc->addField(Zend_Search_Lucene_Field::Text('url', $url.$link));
            echo "{$link}\n";
            $index->addDocument($current_doc);
            $i++;
            if ($i >= 10) break;
        }
    }

    // Open the index and pass the hits for $query to the view.
    $index = Zend_Search_Lucene::open('/tmp/my-index');
    $hits = $index->find($query);
    $this->set('hits', $hits);
}


Two things: you create the index by calling /controller/search/build. The first URL is opened and added to the index. The first ten links on that page are also analyzed and added to the index.
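
One caveat with following links this way: gluing the base URL and the link together only works for relative links. The sketch below shows one possible way to normalize a link before fetching it. Note that resolveLink() is a hypothetical helper of my own, not part of the Zend Framework or CakePHP, and it deliberately ignores ports, "../" segments and query strings.

```php
<?php
// Hypothetical helper (not part of Zend Framework or CakePHP): turn an href
// into an absolute URL on the same host as $baseUrl, or return null to skip it.
function resolveLink($baseUrl, $link)
{
    $link = trim($link);
    // Skip empty links, in-page anchors, protocol-relative and non-HTTP links.
    if ($link === '' || $link[0] === '#' || substr($link, 0, 2) === '//'
        || preg_match('/^(mailto|javascript):/i', $link)) {
        return null;
    }
    $base = parse_url($baseUrl);
    $root = $base['scheme'] . '://' . $base['host'];

    if (preg_match('#^https?://#i', $link)) {
        // Already absolute: keep it only if it stays on the same host.
        $host = parse_url($link, PHP_URL_HOST);
        return (strcasecmp((string) $host, $base['host']) === 0) ? $link : null;
    }
    if ($link[0] === '/') {
        // Root-relative link.
        return $root . $link;
    }
    // Relative link: resolve against the directory part of the base path.
    $path = isset($base['path']) ? $base['path'] : '/';
    $dir = substr($path, 0, strrpos($path, '/') + 1);
    return $root . $dir . $link;
}
```

In the loop you would then call resolveLink($url, $link) and skip the link whenever the result is null, so the crawler stays on one site.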

After you have built your index, search it with a URL like /controller/search/php, which looks up the term "php".
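
The controller hands the hits to the view, which still has to display them. As a rough sketch, the function below flattens hits into escaped rows. It assumes each hit exposes score, title and url as properties (the url field is the one added in the controller, and I am assuming the HTML document stores a title). summarizeHits() and the stand-in objects are hypothetical, so the example runs without a real index.

```php
<?php
// Hypothetical helper: turn search hits into plain, HTML-escaped rows.
// Each hit is assumed to expose ->score, ->title and ->url.
function summarizeHits($hits, $limit = 10)
{
    $rows = array();
    foreach ($hits as $hit) {
        if (count($rows) >= $limit) {
            break; // show at most $limit results
        }
        $rows[] = array(
            'score' => round($hit->score, 2),
            'title' => htmlspecialchars($hit->title),
            'url'   => htmlspecialchars($hit->url),
        );
    }
    return $rows;
}

// Stand-in objects instead of real hits, so no index is needed here.
$fake = array(
    (object) array(
        'score' => 0.8312,
        'title' => 'Cake & Zend',
        'url'   => 'http://cakephp.agoris.nl/',
    ),
);
foreach (summarizeHits($fake) as $row) {
    echo $row['score'], ' ', $row['title'], ' ', $row['url'], "\n";
}
```

Escaping the title and URL matters here, because both come from crawled pages you don't control.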

So, this is a very short example of how to use CakePHP and the Zend Framework to create your own searchable index. You could use it to create your own search engine for the web, your website, or any other application with search functionality.

Want me to help implement this in your application? Just contact me!

7 comments:

  1. Nice, when are you going to update your site? :p

  2. First off, I'm a young developer, not especially knowledgeable with PHP.
    You've sparked my curiosity! I have a working CakePHP project that I have been struggling to add a search feature to. I'd be excited to find a very simple tutorial on how to implement Zend components in the CakePHP file structure and create a simple search on text fields.

    I've tried using this example on my own project and it always returns an empty $results:
    http://bakery.cakephp.org/articles/view/search-feature-to-cakephp-blog-example


    So I'm searching for a good tutorial. Do you know any good resources? I need to search first_name and last_name fields and allow for misspellings.

  3. Hi kyamry,

    Have you read http://cakephp.agoris.nl/2008/03/20/howto-use-zend-framework-in-cakephp/? It might be helpful.

    I would like to have a look at your project and the problem. Would that be possible?

    Regards,


    Steven

  4. COMMENT BY PATRICK MC

    Good article. There are lots of products out there that can help build one's own search engine. In addition to the ones you mentioned, I see a few commentators have tried to promote their own.

    In general though, instead of being locked into a proprietary "product", I like to stick with standardized, proven scripting technologies such as Perl, biterScripting, UNIX-Shell. The advantages of using these generalized scripting abilities are as follows.

    1. The scripting languages can run in both real-time and batch modes. They can even run as part of your web search portal and get real time data.
    2. Hiring, training, managing your staff will be easier, since they will be learning and using a generalized scripting language, and not some very specialized "package".
    3. Since you will be developing the scripts, you have proprietary rights over them, and you are building up sellable assets with these scripts over time.
    4. Since you develop the scripts yourself, you have full control over them, and can modify them easily as requirements change with time.
    5. Scripting languages cost way less than corresponding "products".

    Regards.

    Patrick Mc

  5. Hello there~ I need your help in implementing this~ Pls reply me at pearly_yeo90@hotmail.com .. thanks~

  6. PERL? What year is this? Why on this green earth would you code something in Perl? This is a CakePHP blog...
