Solr

Installation

  1. Assuming the Solr home is /data/solr, copy the war file and the example configuration into it:
    mkdir -p /data/solr/dist
    cp apache-solr-1.4.0/dist/apache-solr-1.4.0.war /data/solr/dist
    cp -R apache-solr-1.4.0/example/solr/* /data/solr/
    
  2. Place a context file named solr.xml under tomcat/conf/Catalina/localhost:
    <Context docBase="/data/solr/dist/apache-solr-1.4.0.war" debug="0" crossContext="true" reloadable="true">  
        <Environment name="solr/home" type="java.lang.String" value="/data/solr" override="true" />  
    </Context>
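Once Tomcat has been restarted, one quick way to check the deployment is to request Solr's ping handler. The sketch below only builds the URL and offers a helper to probe it. It assumes the webapp is served under /solr (Tomcat derives the context path from the file name solr.xml) and that the /admin/ping handler from the example solrconfig.xml is still enabled; the host and port are hypothetical.

```python
import urllib.request


def ping_url(host, port, context="solr"):
    """Build the URL of the ping handler for a Solr webapp."""
    return "http://%s:%d/%s/admin/ping" % (host, port, context)


def solr_is_up(url, timeout=5):
    """Return True if the URL answers with HTTP 200, False on any network or HTTP error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError, HTTPError and socket timeouts all derive from OSError
        return False


# Hypothetical host and port; adjust to the Tomcat connector actually in use.
print(ping_url("localhost", 8080))
```

Calling solr_is_up(ping_url("localhost", 8080)) returns False rather than raising when Tomcat is not reachable, which makes it easy to use in a deployment script.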
    

Chinese word segmentation

  • Bigram segmentation: add the following to /data/solr/conf/schema.xml:
        <fieldType name="text_cn" class="solr.TextField" positionIncrementGap="100">
           <analyzer>
             <tokenizer class="solr.CJKTokenizerFactory"/>
           </analyzer>
        </fieldType>
    
  • Standard tokenization: each Chinese character is indexed as a separate single-character token
        <fieldtype name="text" class="solr.TextField">
           <analyzer>
             <tokenizer class="solr.StandardTokenizerFactory"/>
             <filter class="solr.StandardFilterFactory"/>
             <filter class="solr.LowerCaseFilterFactory"/>
             <filter class="solr.StopFilterFactory"/>
             <filter class="solr.PorterStemFilterFactory"/>
           </analyzer>
        </fieldtype>
    
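  • mmseg4j segmentation:
    1. Download mmseg4j-1.6.2 and its dictionary
    2. Copy mmseg4j-all-1.6.2.jar to /data/solr/lib
    3. Put the dictionary files in /data/solr/dic
    4. Open README.txt and copy the field type definitions it contains into schema.xml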
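To make the difference between the first two options concrete, here is a simplified model of what each tokenizer emits for Chinese text. It is only an illustration, not the Lucene implementation: it treats just the basic CJK Unified Ideographs block as Chinese and ignores everything else, and the function names are made up for this sketch.

```python
def is_cjk(ch):
    """True for characters in the basic CJK Unified Ideographs block (simplification)."""
    return "\u4e00" <= ch <= "\u9fff"


def unigrams(text):
    """Model of the standard tokenizer on Chinese text: one token per character."""
    return [ch for ch in text if is_cjk(ch)]


def bigrams(text):
    """Model of the CJK tokenizer: overlapping two-character tokens per run of Chinese."""
    tokens = []
    run = ""
    for ch in text + " ":  # the trailing space flushes the last run
        if is_cjk(ch):
            run += ch
            continue
        if len(run) == 1:
            tokens.append(run)  # a lone character is kept as a single token
        else:
            tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
        run = ""
    return tokens


print(unigrams("中文分词"))  # ['中', '文', '分', '词']
print(bigrams("中文分词"))   # ['中文', '文分', '分词']
```

Bigrams give better precision than single characters at the cost of a larger index, while a dictionary-based segmenter such as mmseg4j tries to emit real words instead.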

Solr multicore configuration

  1. Copy dist/apache-solr-1.4.0.war into the Tomcat webapps directory:
    cp /usr/local/src/apache-solr-1.4.0/dist/apache-solr-1.4.0.war /usr/local/apache-tomcat-6.0.20/webapps/
  2. Create a file named solr-cores.xml in the tomcat/conf/Catalina/localhost directory:
    <Context docBase="/data/solr/dist/apache-solr-1.4.0.war" debug="0" crossContext="true" >  
        <Environment name="solr/home" type="java.lang.String" value="/data/solr/example/multicore" override="true" />  
    </Context>

Adding a core through the API
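A minimal sketch of what such a call might look like, using Solr's CoreAdmin handler. It assumes the multicore solr.xml keeps adminPath="/admin/cores" as in the shipped example, and that the instance directory already exists with its own conf/ directory. The base URL and the core name core7 are hypothetical.

```python
from urllib.parse import urlencode


def create_core_url(solr_base, name, instance_dir):
    """Build a CoreAdmin CREATE request URL for a new core."""
    params = {"action": "CREATE", "name": name, "instanceDir": instance_dir}
    return solr_base + "/admin/cores?" + urlencode(params)


# Hypothetical base URL and core name; request this URL with a browser or an HTTP client.
print(create_core_url("http://localhost:8080/solr", "core7", "core7"))
```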

Usage

Optimize: http://211.100.42.68:8180/solr/update?optimize=true&maxSegments=10&waitFlush=false

Commit: http://211.100.42.68:8180/solr/update?commit=true

Delete (POST one of the following XML messages to /solr/update, by id or by query):

<delete><id>SP2514N</id></delete>
<delete><query>type:bbs</query></delete>
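The delete messages can also be generated instead of written by hand. The sketch below uses Python's xml.etree.ElementTree so that special characters in an id or query are escaped automatically; the helper names are made up for this illustration.

```python
import xml.etree.ElementTree as ET


def delete_by_id(doc_id):
    """Build a <delete> message that removes a single document by its unique key."""
    root = ET.Element("delete")
    ET.SubElement(root, "id").text = doc_id
    return ET.tostring(root, encoding="unicode")


def delete_by_query(query):
    """Build a <delete> message that removes every document matching a query."""
    root = ET.Element("delete")
    ET.SubElement(root, "query").text = query
    return ET.tostring(root, encoding="unicode")


print(delete_by_id("SP2514N"))      # <delete><id>SP2514N</id></delete>
print(delete_by_query("type:bbs"))  # <delete><query>type:bbs</query></delete>
```

A delete only becomes visible to searches after a commit has been sent.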

Query: http://211.100.42.68:8180/solr/select?indent=on&version=2.2&q=contents%3A%E6%90%9C%E7%8B%90&fq=&start=0&rows=10&fl=*%2Cscore&qt=standard&wt=php&explainOther=&hl=on&hl.fl=title%2Ccontents
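The percent-encoded q parameter in that URL is simply the query contents:搜狐 encoded as UTF-8. A small sketch of building such a URL programmatically follows; the function name and its defaults are illustrative, and only a subset of the parameters from the URL above is included.

```python
from urllib.parse import urlencode


def select_url(solr_base, query, start=0, rows=10):
    """Build a /select URL with highlighting enabled; urlencode handles the UTF-8 escaping."""
    params = [
        ("indent", "on"),
        ("q", query),
        ("start", start),
        ("rows", rows),
        ("wt", "php"),
        ("hl", "on"),
        ("hl.fl", "title,contents"),
    ]
    return solr_base + "/select?" + urlencode(params)


print(select_url("http://211.100.42.68:8180/solr", "contents:搜狐"))
```

Letting the library encode the parameters avoids hand-escaping Chinese text and characters such as : and , in field lists.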

Finding similar documents (MoreLikeThis)

  • Add this attribute to the field definition in schema.xml:
    termVectors="true"
  • Example:
    http://61.150.91.179:8180/solr/core6/select/?q=id:2&mlt=true&mlt.fl=title,txt&mlt.mindf=1&mlt.mintf=1&fl=id,score,url,title&mlt.match.include=true

Common problems

1. The XML must not contain a bare & character; replace every occurrence with &amp;
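Rather than replacing characters by hand, field values can be escaped with a library function while the message is built. The following sketch uses xml.sax.saxutils from the Python standard library; the helper names and the sample field values are made up for illustration.

```python
from xml.sax.saxutils import escape, quoteattr


def field(name, value):
    """Render one <field> element, escaping &, < and > in the value."""
    return "<field name=%s>%s</field>" % (quoteattr(name), escape(value))


def add_doc(fields):
    """Wrap a list of (name, value) pairs in an <add><doc> message."""
    body = "".join(field(name, value) for name, value in fields)
    return "<add><doc>" + body + "</doc></add>"


print(add_doc([("id", "1"), ("title", "Q&A <beta>")]))
```

The title value is emitted as Q&amp;A &lt;beta&gt;, so the message stays well-formed whatever the field contains.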

2. When POSTing, always send the header Content-Type: text/xml; otherwise Solr responds with a 400 error.

PHP:

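// Update handler on the same host as the usage examples above
$url = "http://211.100.42.68:8180/solr/update";
$xml = "<add><doc> ... </doc></add>";
// Without this header Solr answers with HTTP 400
$header = array("Content-Type: text/xml");

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_TIMEOUT, 5);
curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
// Return the response body from curl_exec() instead of printing it
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $xml);

$data = curl_exec($ch);
if (curl_errno($ch)) {
    echo " curl error:", curl_error($ch);
}
// Release the handle whether or not the request succeeded
curl_close($ch);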

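Python:

import urllib.request
import urllib.error

# Update handler on the same host as the usage examples above
posturl = "http://211.100.42.68:8180/solr/update"
xml = "<add><doc> ... </doc></add>"

try:
    req = urllib.request.Request(posturl, data=xml.encode("utf-8"))
    # Without this header Solr answers with HTTP 400
    req.add_header("Content-Type", "text/xml; charset=utf-8")
    f = urllib.request.urlopen(req, timeout=5)
    print("read:", f.read().decode("utf-8"))
except urllib.error.HTTPError as e:
    print(e)
except Exception as e:
    print(e)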

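3. Once the simple example is working, note that the index files are written to CWD/solr/data/index by default. To store them under solr.home/data instead, comment out dataDir in F:\apache-solr-1.3.0\example\solr\conf\solrconfig.xml, for example: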

<!-- 
<dataDir>${solr.data.dir:./solr/data}</dataDir> 
-->  


search/solr.txt · Last modified: 2009/12/31 00:58 by kenvin