Zend Search Lucene

Overview

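  • Zend Search Lucene automatically tokenizes English text, builds an index from it, and then lets you search that index quickly.
  • For Chinese, Zend_Search_Lucene_Analysis_Analyzer_Common does no processing at all, so you need a custom analyzer based on Zend_Search_Lucene_Analysis_Analyzer_Common to do the word segmentation. The simplest approach is bigram (two-character) segmentation. You can also use a separate segmentation library for the Chinese text and let Zend_Search_Lucene do nothing but the indexing.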
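
Bigram segmentation can be sketched in plain PHP as follows. The function name bigram_segment is made up for this illustration and is not part of Zend Framework, and the sketch assumes the input is a single run of Chinese characters with no spaces or punctuation.

```php
<?php
// Hypothetical helper (not part of Zend Framework): split a UTF-8 string
// into overlapping two-character tokens, e.g. ABCD -> AB, BC, CD.
// Assumes $text is one unbroken run of Chinese characters.
function bigram_segment($text)
{
    // Split the string into individual UTF-8 characters.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    $count = count($chars);
    if ($count < 2) {
        return $chars; // zero or one character: nothing to pair up
    }

    $tokens = array();
    for ($i = 0; $i < $count - 1; $i++) {
        // Pair each character with the one that follows it.
        $tokens[] = $chars[$i] . $chars[$i + 1];
    }
    return $tokens;
}

// Join the tokens with spaces before handing the text to the indexer.
echo implode(' ', bigram_segment('中文分词')), "\n"; // 中文 文分 分词
```

Bigrams need no dictionary, at the cost of a larger index and some false matches across word boundaries.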

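Chinese word segmentation

The simplest way to get Chinese word segmentation working:

  1. Set the default analyzer:
     Zend_Search_Lucene_Analysis_Analyzer::setDefault( new Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8() ); 
  2. Segment the Chinese text yourself and pass it in with the words separated by spaces.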
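
Why the spaces matter can be illustrated without Zend Framework. The sketch below assumes that the UTF-8 analyzer treats every unbroken run of Unicode letters as one token. That is an assumption about its behavior, and letter_run_tokens is a hypothetical stand-in, not a Zend Framework function.

```php
<?php
// Hypothetical stand-in for letter-run tokenization. This approximates
// (by assumption) how a letters-only UTF-8 analyzer splits text; it does
// not call into Zend Framework.
function letter_run_tokens($text)
{
    // \p{L} matches any Unicode letter, including Chinese characters.
    preg_match_all('/\p{L}+/u', $text, $matches);
    return $matches[0];
}

// Unsegmented text: the whole phrase comes out as a single token.
print_r(letter_run_tokens('中文分词实现'));

// Pre-segmented text: each space-separated word is its own token.
print_r(letter_run_tokens('中文 分词 实现'));
```

Under that assumption, unsegmented Chinese is indexed as one long term per sentence fragment, which is why step 2 is needed for individual words to be searchable.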

English tokenization

English tokenization is supported out of the box. Example:

Creating the index

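// Put the Zend Framework library directory first on the include path.
set_include_path(NW_ZEND_LUCENE_DIR . PATH_SEPARATOR . get_include_path());
require_once 'Zend/Search/Lucene.php';

// Use the UTF-8 analyzer so that non-ASCII text is tokenized correctly.
Zend_Search_Lucene_Analysis_Analyzer::setDefault(new Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8());

// Open the index; the second argument creates it if the directory does not exist yet.
$index = new Zend_Search_Lucene($this->index_dir, !is_dir($this->index_dir));

// Index the plain text only, without HTML tags.
$content = strip_tags($row['description']);

$doc = new Zend_Search_Lucene_Document();
// UnIndexed: stored and returned with hits, but not searchable.
$doc->addField(Zend_Search_Lucene_Field::UnIndexed('aid', $row['article_id']));
$doc->addField(Zend_Search_Lucene_Field::UnIndexed('tbl_name', $row['table_name']));
$doc->addField(Zend_Search_Lucene_Field::UnIndexed('db_name', $row['db_name']));
$doc->addField(Zend_Search_Lucene_Field::UnIndexed('link', $row['link']));
// Text: tokenized, indexed, and stored.
$doc->addField(Zend_Search_Lucene_Field::Text('tags', $row['tags'], 'utf-8'));
$doc->addField(Zend_Search_Lucene_Field::Text('title', $row['title'], 'utf-8'));
// UnStored: tokenized and indexed, but not stored in the index.
$doc->addField(Zend_Search_Lucene_Field::UnStored('contents', $content, 'utf-8'));
$index->addDocument($doc);

// Flush pending changes to disk.
$index->commit();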

Searching

 
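// Use the same analyzer as at index time, so that query terms are tokenized
// the same way (needed when this runs as a separate script).
Zend_Search_Lucene_Analysis_Analyzer::setDefault(new Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8());

$index = new Zend_Search_Lucene($this->index_dir, !is_dir($this->index_dir));

echo "doc count: " . $index->count() . "\n";

// Parse the search string as UTF-8.
$query = Zend_Search_Lucene_Search_QueryParser::parse($word, 'utf-8');
// Sort the hits by the stored 'aid' field, numerically, highest first.
$hits = $index->find($query, 'aid', SORT_NUMERIC, SORT_DESC);

echo "Search for \"$word\" returned " . count($hits) . " hits.\n\n";

foreach ($hits as $hit) {
    echo str_repeat('-', 80) . "\n";
    echo 'ID:    ' . $hit->id . "\n";
    echo 'Score: ' . sprintf('%.2f', $hit->score) . "\n\n";

    // Print every field stored with the matching document.
    foreach ($hit->getDocument()->getFieldNames() as $field) {
        echo "$field: \n";
        echo '    ' . trim($hit->$field) . "\n";
    }
}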
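
When the indexed Chinese text was segmented with spaces, the search string presumably has to be segmented the same way before it is parsed. A minimal sketch, assuming the words contain no query-syntax characters; build_required_query is a hypothetical helper, not a Zend Framework function:

```php
<?php
// Hypothetical helper: turn space-separated, pre-segmented words into a
// query string in which every word is required ("+word" in Lucene query
// syntax). Assumes the words contain no special query characters, so no
// escaping is done here.
function build_required_query($segmented)
{
    // Split on whitespace and drop empty pieces.
    $words = preg_split('/\s+/u', trim($segmented), -1, PREG_SPLIT_NO_EMPTY);

    $parts = array();
    foreach ($words as $word) {
        $parts[] = '+' . $word;
    }
    return implode(' ', $parts);
}

echo build_required_query('中文 分词'), "\n"; // +中文 +分词
```

The resulting string can then be passed as $word to Zend_Search_Lucene_Search_QueryParser::parse() in the search code above.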

search/zend_search_lucene.txt · Last modified: 2008/08/07 10:10 by 58.31.68.160