How to Create a Torrent Tracker, Part 4: Search Engine

September 07, 2009 by Hangman

One of the most important parts of a torrent site is definitely the search engine. It is probably the first thing people are going to use, and a bug free search form will greatly improve user experience. In this tutorial, we are going to implement a Browse section including torrent details columns, paging, and a form which will be used to filter results based on search queries.

Torrents table

As a first step, we build an HTML table containing all of our torrents. I assume you have added the name and size fields in your xbt_files table.

<table>
<tr><th>Date</th><th>Name</th><th>Size</th><th>Seeds</th><th>Leechers</th></tr>

<?php
$q=mysql_query("SELECT name,size,seeders,leechers,ctime FROM xbt_files");
while($t=mysql_fetch_row($q))
{
?>
<tr><td><?php echo date("j M y",$t->ctime) ?></td><td><?php echo $t->name ?></td><td><?php echo $t->size ?></td><td><?php echo $t->seeds ?></td><td><?php echo $t->leechers ?></td></tr>
<?php
}
?>

</table>

This code will show a table listing all the torrent records contained in our database. It will seem good as long as we don’t have more than 25 torrents on our tracker. Otherwise, we are definitely going to need to split our table in different pages. We are going to edit our SQL query using the LIMIT clause and show a navigation bar below the table containing links to pages.

<?php
define(‘ROWS_PER_PAGE’,25);

$page=intval($_GET[‘p’]);
if($page<1) $page=1;
$offset=($page-1)*ROWS_PER_PAGE;

// main query
$q=mysql_query("SELECT SQL_CALC_FOUND_ROWS name,size,seeders,leechers,ctime FROM xbt_files LIMIT $offset,".ROWS_PER_PAGE);

$num_rows=mysql_fetch_row(mysql_query("SELECT FOUND_ROWS()"));
$max_page=ceil($num_rows/ROWS_PER_PAGE);

if($max_page > 0 && $page>$max_page)
        $page=1;

//
// … table code…
//

// build navigation bar code
$nav=;
for($p=1; $p<=$max_page; $p++)
{
        $start=(($p-1)*ROWS_PER_PAGE)+1;
        $end=$p*ROWS_PER_PAGE;
        if($end>$num_rows)
                $end=$num_rows;

        if ($p == $page)
        {
                $nav .= "{$start}{$end} | "; // no need to create a link to current page
        }
        else
        {
                $page_url=‘browse.php?p=’.$p;

                $nav .= ‘<a href="’.$page_url.‘">’."{$start}{$end}".‘</a> | ‘;
        }
}

$nav=rtrim($nav,‘ | ‘);

?>

<?php if(!empty($nav)) : ?>

<div id="page-navigation">
<?php echo $nav ?>
</div>

<?php endif ?>

Note that, after the SELECT query with the LIMIT clause, we make another query using the FOUND_ROWS() function in order to get the number of rows found with the previous query. Then we divide that value by the maximum number of rows per page we want to show, which gives us the number of total pages for this query; using this data, we are able to build a navigation bar.
As an exercise, you could add a sort parameter which will control the order of the results by using the ORDER BY clause.

Simple search form

Now that we have built a table capable of showing torrents, we are ready to implement a way to filter our entries. Here’s the code for our bare bones search form.

<form action="browse.php" method="get">
<input type="text" name="q" />
<input type="submit" value="Search" />
</form>

In our PHP script we are going to handle the q parameter. As a first example, we are going to try the simplest way to filter rows with MySQL, which is using the LIKE clause on the torrent’s name field. Let’s add this code at the beginning of our script:

<?php
$search=$_GET[‘q’];
$search=mysql_real_escape_string($search);
?>

And then we edit our main query:

<?php
$q=mysql_query("SELECT SQL_CALC_FOUND_ROWS name,size,seeders,leechers,ctime FROM xbt_files WHERE name LIKE ‘%".$search."%’ LIMIT $offset,".ROWS_PER_PAGE);
?>

With this code, searching for “pear” will show only torrents that contain the word “pear” in their names. But what if we have a torrent containing “appearance”? It will be shown as well. This is because we put the “%” symbol before and after the search term. If we didn’t do so, our query would have returned only torrents named exactly “pear”, which is definitely something we don’t want.

As we have seen, the LIKE clause is very limited and is not suitable for a real world search engine. So we are going to explore a different way to search in strings: regular expressions.

A step further: RegEx

Regular expressions are a way of identifying strings of text through search patterns written in a formal language. Think of them as a much more advanced version of wildcards. MySQL has a built-in RegEx processor which can be used through the REGEXP function.

In the following snippet we are going to modify our query in order to use regexes for searching. We will also look for spaces in our search keyword and split the words into an array. Additionally, since our query is getting long, it will be split into multiple parts which we’ll eventually concatenate into a new string, so that our code will be a bit easier to read.

<?php

$search=$_GET[‘q’];
$search=mysql_real_escape_string($search);

// split query string into an array of words
$search=explode(‘ ‘,$search);

$sql="SELECT SQL_CALC_FOUND_ROWS name,size,seeders,leechers,ctime FROM xbt_files ";

// where clause
$sql.="WHERE name LIKE ‘%".$search."%’ ";

// regexp

if(!empty($search))
{
        $sql.=‘(’;

        foreach($search as $word)
        {
                $word=trim($word);

                $sql.= "(`name` REGEXP ‘[[:<:]]".$word."[[:>:]]’ OR `description` REGEXP ‘[[:<:]]".$word."[[:>:]]’) AND ";
        }

        $sql=rtrim($sql,‘ AND ‘);
        $sql.=‘)’;
}

// limit
$sql.="LIMIT $offset,".ROWS_PER_PAGE;

$q=mysql_query($sql);

// … rest of code …

?>

This code will basically search for the specified keyword in the torrent’s name and description through MySQL’s regex engine. “[[:” and “[[:>:]]” are used for word boundaries, so that if you are looking for “lane” it won’t match “miscellaneous”.

Word stemming

There’s a new problem. What if our torrent contains the word “domain” and we are searching for “domains”? In order for this case to work, we need to use the stem of the keyword. Fortunately, this is exactly what the Porter Stemmer algorithm does.

The Porter Stemming Algorithm was developed by Martin Porter for reducing English words to their word stems. For example, the word “connections” would be reduced to its stem form “connect.”

In order to use it, first of all we have to include the library in our PHP file; then it’s simply a matter of adding this code before the call to the explode function.

<?php

require_once ‘class.stemmer.inc’;

// …

$stemmer=new Stemmer();
$search=$stemmer->stem_list($search);

?>

Filtering useless stuff

In a real world search engine, we may want to filter things such as symbols and short words.
So, right after the explode call, let’s add this if statement which will exclude words shorter than 3 characters:

<?php

$words=array();
foreach($search as $word)
{
        if(strlen($word)>=3)
                $words[]=$word;
}

$search=$words;

?>

Then we are ready to deal with useless symbols and punctuation. Let’s add this code right after the mysql_real_escape_string call:

<?php

var $symbols = array(‘/’,‘\’,’\,‘"’,‘,’,‘.’,‘<’,‘>’,
‘?’,‘;’,‘:’,‘[',']‘,‘{’,‘}’,‘|’,‘=’,‘+’,‘-’,‘_’,‘)’,‘(’,‘*’,
‘&’,‘^’,‘%’,‘$’,‘#’,‘@’,‘!’,‘~’,‘`’);

for ($i = 0; $i < sizeof($this->symbols); $i++)
{
        $search=str_replace($this->symbols[$i],‘ ‘,$search);
}

?>

Conclusion

And that’s it — we are done with our simple and bare bones search engine. In a real world scenario, you will want to add your own custom things; for example in some cases you may want to extend acronyms (”ff” to “final fantasy” and so on), filter a list of custom words and stuff like that. This is all up to you and it depends on what you need for your specific website.

See you in the next tutorial and happy coding!

Article republished with permission from Valadiléne.org - source here.