[nycphp-talk] Need About creating search

Rob Marscher rmarscher at beaffinitive.com
Tue Nov 27 16:37:39 EST 2007


On Nov 27, 2007, at 11:21 AM, Mitch Pirtle wrote:
> http://www.sphinxsearch.com/
>
> Basically it is the fastest search tool I have found anywhere on any
> platform. Wicked fast, and pretty decent interfaces for a variety of
> languages as well. Using it on some major projects now with millions
> of content entries submitted by users - this is a great environment
> for huge social sites.

+1.  We're using it here: http://www.heynielsen.com/search/ (sorry for  
the promo... but it seemed that people wanted to see it in action)

What I love about it is the way it uses MySQL as the source for the
index.  All you need to do is set up the sphinx.conf file and set a
cron job to periodically rebuild the indexes.  No other programming is
required to create the indexes.  They already have a PHP API class,
which is included in the server download... so searching it from PHP
is simple too.  Documentation on the API could use some help though...
you have to get the details by reading the source... maybe I should
contribute.
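
For anyone who hasn't seen one, here is a rough sketch of what a minimal sphinx.conf and cron entry for that kind of setup might look like.  The host, credentials, database, table, columns, and paths are all placeholders I made up for illustration, and a real config also needs a searchd section, so check it against the documentation for your Sphinx version:

```
# sphinx.conf fragment (sketch; every name and path below is a placeholder)
source mainSource
{
    type      = mysql
    sql_host  = localhost
    sql_user  = sphinx_user
    sql_pass  = secret
    sql_db    = mydb
    # The first column must be a unique unsigned integer document ID;
    # the text columns after it become full-text fields.
    sql_query = SELECT id, title, body FROM entries
}

index mainIndex
{
    source = mainSource
    path   = /var/data/sphinx/mainIndex
}

# crontab entry (not part of sphinx.conf): rebuild mainIndex every hour.
# --rotate tells indexer to signal the running searchd to load the new files.
0 * * * * indexer --config /etc/sphinx/sphinx.conf --rotate mainIndex
```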

You can search multiple indexes in one query by separating the index
names with a space.  That's not documented as far as I know.  I
discovered it in the sphinx forums.  You can also just search every
index you have available in one query.  Here's some code that I use.
I have a main index - "mainIndex" - that I reindex once an hour (it
indexes over 100,000 records in a couple of seconds and then sends a
SIGHUP so that the search server reloads the index).  I also have a
"delta" index that contains only the new entries since the last time
the main index was reindexed.  I rebuild the delta every couple of
minutes and the operation takes under a second.  They talk about how
to do this in the documentation.  I also created stemmed and soundex
indexes... so if no results are found in the regular index, it tries
those other indexes next:

$spx = new SphinxClient();
$spx->SetServer($host, $port);
// per-field weights, in the order the fields appear in the index
$spx->SetWeights(array(100, 1));
$spx->SetLimits(0, 250);
$spx->SetMatchMode(SPH_MATCH_ALL);
$spx->SetFilter('category', array($category));
$spx->SetSortMode(SPH_SORT_RELEVANCE);
$_rs = $spx->Query($search, 'mainIndex mainIndexDelta');
// empty() also covers a failed Query() (returns false) and a result
// that has no 'matches' key at all
if (empty($_rs['matches'])) {
	// give another try with the soundex index <- love this!! :)
	$_rs = $spx->Query($search, 'mainIndexSoundex mainIndexSoundexDelta');
}
if (empty($_rs['matches'])) {
	// still no results?  how about stemming
	$_rs = $spx->Query($search, 'mainIndexStemmed mainIndexStemmedDelta');
}
if (empty($_rs['matches'])) {
	// still no results?  how about a different match mode
	$spx->SetMatchMode(SPH_MATCH_ANY);
	$spx->SetLimits(0, 20);
	$_rs = $spx->Query($search, 'mainIndex mainIndexDelta');
}
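
If the chain of if blocks above keeps growing, one way to tidy it up is to put the attempts in a list and loop over them.  This is only a sketch: searchWithFallbacks() and the layout of $attempts are names I made up, not part of the Sphinx PHP API, and it assumes PHP 7.1 or later.  A stand-in closure takes the place of the real client so the snippet runs on its own:

```php
<?php
// Sketch only: searchWithFallbacks() and the shape of $attempts are
// hypothetical helpers, not part of the Sphinx PHP API.

/**
 * Try each attempt in order; return the first result that has matches,
 * or null if every attempt comes back empty or fails.
 */
function searchWithFallbacks(callable $search, string $query, array $attempts): ?array
{
    foreach ($attempts as $attempt) {
        $result = $search($query, $attempt);
        // Query() returns false on error, and 'matches' may be missing
        // when nothing was found, so check both.
        if (is_array($result) && !empty($result['matches'])) {
            return $result;
        }
    }
    return null;
}

$attempts = [
    ['indexes' => 'mainIndex mainIndexDelta',               'mode' => 'all'],
    ['indexes' => 'mainIndexSoundex mainIndexSoundexDelta', 'mode' => 'all'],
    ['indexes' => 'mainIndexStemmed mainIndexStemmedDelta', 'mode' => 'all'],
    ['indexes' => 'mainIndex mainIndexDelta',               'mode' => 'any'],
];

// With the real client, the callable would look something like:
//   function ($query, $attempt) use ($spx) {
//       $spx->SetMatchMode($attempt['mode'] === 'any' ? SPH_MATCH_ANY : SPH_MATCH_ALL);
//       return $spx->Query($query, $attempt['indexes']);
//   }
// Here a stand-in pretends that only the stemmed indexes find anything.
$fakeSearch = function (string $query, array $attempt) {
    if (strpos($attempt['indexes'], 'Stemmed') !== false) {
        return ['matches' => [42 => ['weight' => 1]]];
    }
    return ['total' => 0];
};

$result = searchWithFallbacks($fakeSearch, 'running', $attempts);
```

Passing the whole attempt array to the callable leaves room for per-attempt settings such as the smaller limit used with SPH_MATCH_ANY in the original code.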

The search only returns the primary keys for the matched records.  You
then have to do a separate MySQL query to get any extra details... but
you'd be surprised how fast fetching rows from MySQL by primary key
is.  You don't need any WHERE conditions beyond the list of primary
keys, because your search has already been narrowed.
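
Here is a rough sketch of that second step.  The helper names, the "entries" table, and the "id" column are made up for illustration, and it assumes PDO and PHP 7.1 or later.  One thing to watch: MySQL does not return rows from an IN (...) lookup in the order you listed the keys, so the rows are put back into the order Sphinx ranked them:

```php
<?php
// Sketch only: buildLookupSql() and orderRowsByIds() are hypothetical
// helpers; the table and column names are placeholders.

/**
 * Build a parameterized primary-key lookup for the matched IDs.
 * $table and $idColumn must be trusted identifiers, never user input.
 */
function buildLookupSql(string $table, string $idColumn, array $ids): string
{
    if (count($ids) === 0) {
        throw new InvalidArgumentException('No IDs to look up');
    }
    $placeholders = implode(', ', array_fill(0, count($ids), '?'));
    return "SELECT * FROM {$table} WHERE {$idColumn} IN ({$placeholders})";
}

/**
 * Put database rows back into the order of $ids (the search ranking).
 * IDs with no matching row are skipped.
 */
function orderRowsByIds(array $ids, array $rows, string $idColumn): array
{
    $byId = [];
    foreach ($rows as $row) {
        $byId[$row[$idColumn]] = $row;
    }
    $ordered = [];
    foreach ($ids as $id) {
        if (isset($byId[$id])) {
            $ordered[] = $byId[$id];
        }
    }
    return $ordered;
}

// Wiring it to a real search result and a PDO connection might look like:
//   $ids  = array_keys($_rs['matches']);  // document IDs are the array keys
//   $stmt = $pdo->prepare(buildLookupSql('entries', 'id', $ids));
//   $stmt->execute($ids);
//   $rows = orderRowsByIds($ids, $stmt->fetchAll(PDO::FETCH_ASSOC), 'id');
```

The array_keys() call relies on the client's default result format, where matches are keyed by document ID.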

-Rob



