caching

Simple MySQL-Based Sphinx Caching

This is based on some sample caching code I found on the Sphinx forum and works with Sphinx V2.02 and the Sphinx PHP API. Updates I made:

I changed the mysql database to use a varchar for the request, and added an index to it.  The blob had scaling issues almost immediately.  I also added a log to watch the cache at work.  The log file needs to be writable by Apache ( or your web server ) and be located in the sphinx/var/log/ directory.  It currently lacks any cleanup of old or unused cache entries.

My Sphinx setup is using a sharded index performing 20-40 queries per second across ~50M documents at peak times.  This simple MySQL-based caching is being used as a temporary fix to some scaling issues we've encountered on a site called Industry Cortex, a research tool for the manufacturing industry.  It's working pretty well for the time being, and could be a very effective and simple caching solution on the right systems with the right needs.

How to install this:

Replace the entire RunQueries() method in the Sphinx PHP api.  Update the username and password to connect to MySQL using MySQLi on line 27 below.  Create the mysql database ( mine is called `sphinx_cache` ) and create the `sphinx_cache` table ( same name as the database ) with the sql code below.  Then create your log file and give your webserver write permissions on it.  You should now have a working Sphinx cache.

Sphinx API updates:

function RunQueries ()
{

	if ( empty($this->_reqs) )
	{
		$this->_error = "no queries defined, issue AddQuery() first";
		return false;
	}

	// mbstring workaround
	$this->_MBPush ();

	if (!( $fp = $this->_Connect() ))
	{
		$this->_MBPop ();
		return false;
	}

	// send query, get response
	$nreqs = count( $this->_reqs );
	$req = join ( "", $this->_reqs );
	$len = 8 + strlen( $req );
	$req = pack ( "nnNNN", SEARCHD_COMMAND_SEARCH, VER_COMMAND_SEARCH, $len, 0, $nreqs ) . $req; // add header

	// === BEGIN CACHE EDITS ===

	$mysqli = new mysqli( 'localhost', 'username', 'password', 'sphinx_cache' );
	$cache_log = '/data/sphinx/var/log/cache.log';

	// check for query in mysql cache
	$cache_query_start = microtime( true );
	$results = $mysqli->query( "SELECT response FROM sphinx_cache WHERE request='" . $mysqli->real_escape_string( $req ) . "'" );
	$cache_query_end = microtime( true );
	$cache_query_time = $cache_query_end - $cache_query_start;

	if( $results->num_rows > 0 ) { // there is a cached result

		// use cached result
		$row = $results->fetch_row( );
		$response = $row[0];

		// write cache hit to log
		file_put_contents( $cache_log, "  cached query served, $cache_query_time, " . date('Y-m-d h:m:s') . ', ' . $_SERVER['PHP_SELF'] . ', ' . $_SERVER['REMOTE_ADDR'] . "\n", FILE_APPEND );

	} else { // no cached result

		// query sphinx
		$sphinx_query_start = microtime( true );
		if ( !( $this->_Send ( $fp, $req, $len+8 ) ) ||
			 !( $response = $this->_GetResponse ( $fp, VER_COMMAND_SEARCH ) ) )
		{
			$this->_MBPop ( );
			return false;
		}
		$sphinx_query_end = microtime( true );
		$sphinx_query_time = $sphinx_query_end - $sphinx_query_start;

		// store new result into mysql cache
		$mysqli->query( "INSERT INTO sphinx_cache (request, response) VALUES ('" . $mysqli->real_escape_string( $req ) . "','" . $mysqli->real_escape_string( $response ) . "')");

		// write cache miss to log
		file_put_contents( $cache_log, "* sphinx query served, $sphinx_query_time, " . date('Y-m-d h:m:s') . ', ' . $_SERVER['PHP_SELF'] . ', ' . $_SERVER['REMOTE_ADDR'] . "\n", FILE_APPEND );

	}

	// === END CACHE EDITS ===

	// query sent ok; we can reset reqs now
	$this->_reqs = array ();

	// parse and return response
	return $this->_ParseSearchResponse ( $response, $nreqs );

}

SQL code to generate the caching database

My database is called sphinx_cache ... the same as the table.

CREATE TABLE `sphinx_cache` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `request` varchar(5000) NOT NULL DEFAULT '',
  `response` text NOT NULL,
  `timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  KEY `request` (`request`(333))
) ENGINE=MyISAM AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;