
Recommendations for a site search engine?

I need recommendations for a site search engine that does the following:

  1. Spiders a site’s content X levels deep
  2. Builds a database of page titles, URLs, and content
  3. Provides a searchable front-end that is easily customizable to match an existing site

I could write my own. It’d just be a simple spider program that crawls every URL on a site (via wget, curl, or whatever), drops each URL and its content into a database, slaps a fulltext index on the title and content, and matches search queries against that data. Time is of the essence, however, and if there’s a way to avoid reinventing the wheel, I’d like to take that route.
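For reference, here is roughly what the roll-your-own version would look like. This is only a sketch, written in Python with SQLite's FTS5 standing in for PHP and a MySQL fulltext index so that it stays self-contained. The names `PageParser`, `fetch_http`, `crawl`, `search`, and the `pages` table are made up for illustration, and it assumes the SQLite build includes FTS5. It ignores robots.txt, redirects, and non-HTML responses.

```python
import re
import sqlite3
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class PageParser(HTMLParser):
    """Collects the <title>, visible text, and href links of one page."""

    def __init__(self):
        super().__init__()
        self.title, self.text, self.links = "", [], []
        self._in_title = False
        self._skip = 0  # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif not self._skip:
            self.text.append(data)


def fetch_http(url):
    """Default fetcher: plain HTTP GET, decoded leniently."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(start_url, max_depth, db, fetch=fetch_http):
    """Breadth-first crawl of one host, at most max_depth links from start_url."""
    db.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS pages "
        "USING fts5(url UNINDEXED, title, content)"
    )
    db.execute("DELETE FROM pages")  # rebuild from scratch on every run
    host = urlparse(start_url).netloc
    seen, queue = {start_url}, deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that fail to load
        parser = PageParser()
        parser.feed(html)
        content = re.sub(r"\s+", " ", " ".join(parser.text)).strip()
        db.execute(
            "INSERT INTO pages (url, title, content) VALUES (?, ?, ?)",
            (url, parser.title.strip(), content),
        )
        if depth < max_depth:
            for href in parser.links:
                link = urldefrag(urljoin(url, href)).url  # drop #fragments
                if urlparse(link).netloc == host and link not in seen:
                    seen.add(link)
                    queue.append((link, depth + 1))
    db.commit()


def search(db, query):
    """Return (url, title) pairs matching the query, best match first."""
    return db.execute(
        "SELECT url, title FROM pages WHERE pages MATCH ? ORDER BY rank",
        (query,),
    ).fetchall()
```

Calling `crawl("http://example.test/", 3, sqlite3.connect("site.db"))` from an hourly scheduled job would rebuild the index, and `search(db, "some words")` would back the front-end. The query string is passed straight to FTS5, so punctuation in user input would need escaping in anything real.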

I don’t want to use the Google API. It takes too long to update its index, and I’d like the site index to be updated hourly. I’ve already tried phpdig, tsep, and php-crawler. Phpdig was the best of them, but its templating system (and PHP code in general) is horrendous, and I’m about ready to give up on it. I’ve also heard Lucene mentioned as a great alternative to MySQL fulltext indexes, but I think it’s overkill for what I need.

I’m looking for something that uses PHP and MySQL.

Any suggestions?
