PHP Classes: Generic Database Driven Search

The following is a simple implementation of a search algorithm for searching website content. It is designed to be a quick and easy, client friendly search module you can implement into almost any database driven website.

Download the Database Driven Search Class .zip file

The class code is a little long to display the code here on the page, so instead I’ll describe the basics and link to a download of it in action with a simple example.

The nice thing about this implementation is that you can structure your database however you like. You just have to specify a query to retrieve the data you want indexed into the search and make sure the search index table is included in your database schema.

You also need to specify the field in your database which uniquely identifies your content records. By default this is expected to be an auto_increment integer in the search index but you cand change this if you need to. This implementation also makes use of my database class to connect to the DB, but feel free to change it if you want.

When you specify your query, the algorithm will pull all the data out and store it as searchable terms with the relevant term rankings to the content. The table you need to include in your DB schema to index your content is as follows:

CREATE TABLE SearchTerms (
ContentID INT NOT NULL ,
SearchTerm VARCHAR(200),
TermCount INT NOT NULL,
PRIMARY KEY(ContentID, SearchTerm)
);

Pretty simple. However to be comprehensive about this, I have also included a couple of extra tables for some additional functionality. The first is a simple search log. You can use the log to see what search queries people are performing on your site, how often and the number of results the search queries are returning.

The second is a mechanism for you to “tag” additional content that may not have been indexed. Although the indexing function will index your content quite well, we all have clients that will ring up and complain that they searched for “x’ and “y’ didn’t come up. The nice thing about tagging is that you can immediately tag the relevant content with whatever terms your client searched for which will immediately be included into the search index.

Instead of having to bugger around with tweaking an algorithm, you can just tag specific content with specific terms to quickly and effectively please your client.

The schema for these extra tables is as follows:

CREATE TABLE SearchLog (
LogID INT NOT NULL AUTO_INCREMENT ,
SearchTerm VARCHAR(250),
SearchWhen DATETIME,
NumResults INT,
IPAddr VARCHAR(60),
PRIMARY KEY(LogID)
);

CREATE TABLE SearchTaggedContent (
ContentID INT NOT NULL ,
SearchTerm VARCHAR(200),
TermCount INT NOT NULL,
PRIMARY KEY(ContentID, SearchTerm)
);

To customize the class to suit your particular setup, you just need to set a few variables at the top of the class, as the following example depicts:

private $min_term_length = 2; //the minimum amount of characters allowed in the 'word' for it to be indexed
private $include_tagged_content = true; //set to true to include manually tagged content in the search index
private $log_searches = true; //set to true to log all searches performed and set to false to stop logging searches

//if you need a multi table search, use this for the search index build query, otherwise comment it out
private $search_build_query = "SELECT p.*, c.category FROM Products p INNER JOIN Categories c ON p.category_id = c.category_id";
//or just specify the table name - the name of the table you want to build the search index from
private $table_name = "products";

//required
private $primary_id = "product_id"; //make this the primary key of the records you want indexed for searching
private $search_fields = array("product_code", "product_name", "producer", "category"); //fill this array with the field names of the content you want to index into the search

Using the class itself is easy once you’re set up. The first thing to do is build the index, which should be done every time you update your content. You don’ necessarily have to but the altered content may not come up in searches index until the index is rebuilt. Simply call build_index() to build the search index. It accepts no parameters.

To run a search query simply pass the entire search string (no manipulation necessary) to the public function perform_search($search_string). The function will return an array of content ids (the content ids you initially specified as uniquely identifying content records) ordered by relevance - the first array element will be the most relevant. An empty array is returned if nothing matched the search query.

The last two functions available are the tag_content and untag_content functions which are provided so you can manually link more search terms to your content and give them a “strength” value which affects the potential rankings. (a strength of 5 is considered very strong, 1 as minimum)

The tag_content function accepts three parameters: an array of content ids to tag, the search term to tag them with, the strength the tag will be indexed at.

the untag_content function accepts two parameters: a content id and a search term to remove from being indexed to that content record.

Hopefully that all makes some kind of sense :)

Comments (0)
+ –