====== solr ======

===== Installation =====

  - Download apache-tomcat-6.0.26.tar.gz from http://tomcat.apache.org/download-60.cgi, extract it, and move it into place with ''mv apache-tomcat-6.0.26 /usr/local/''. Then edit the configuration file: ''vi /usr/local/apache-tomcat-6.0.26/conf/server.xml''
  - Download Solr from http://www.apache.org/dyn/closer.cgi/lucene/solr/
  - Assuming the Solr home is /data/solr, copy the WAR and the example configuration into it:
    * ''mkdir -p /data/solr/dist''
    * ''cp apache-solr-1.4.0/dist/apache-solr-1.4.0.war /data/solr/dist''
    * ''cp -R apache-solr-1.4.0/example/solr/* /data/solr/''
  - Put a context configuration file named solr.xml in tomcat/conf/Catalina/localhost.

===== Chinese Word Segmentation =====

  * Bigram segmentation: add a bigram field type to /data/solr/conf/schema.xml.
  * Standard segmentation: Chinese characters are indexed as single-character tokens.
  * mmseg4j segmentation:
    - Download [[http://mmseg4j.googlecode.com/files/mmseg4j-1.8.2.zip|mmseg4j-1.8.2.zip]]
    - Put mmseg4j-all-1.8.2-with-dic.jar in /data/solr/lib
    - Put the dictionary files in /data/solr/dic
    - Open README.txt and copy the field types defined there into schema.xml

===== Solr Multicore Configuration =====

  - Copy dist/apache-solr-1.4.0.war into Tomcat's webapps directory: ''cp /usr/local/src/apache-solr-1.4.0/dist/apache-solr-1.4.0.war /usr/local/apache-tomcat-6.0.26/webapps/''
  - In tomcat/conf/Catalina/localhost, write a solr-cores.xml context file that sets the Solr home to /data/solr/multicore, and populate that directory based on apache-solr-1.4.0/example/multicore.

**Adding a core through the API**

  * Reference: http://wiki.apache.org/solr/CoreAdmin
  * Example: http://61.150.91.x:8180/solr/admin/cores?action=CREATE&name=core5&instanceDir=core5&config=/data/solr/unotice_xhdata/multicore/solrconfig.xml&schema=/data/solr/unotice_xhdata/multicore/schema.xml

===== Usage =====

Optimize: http://211.100.42.68:8180/solr/update?optimize=true&maxSegments=10&waitFlush=false

Commit: http://211.100.42.68:8180/solr/update?commit=true

Delete (POST an XML message to /solr/update):
  * by id: ''<delete><id>SP2514N</id></delete>''
  * by query: ''<delete><query>type:bbs</query></delete>''

Query: http://211.100.42.68:8180/solr/select?indent=on&version=2.2&q=contents%3A%E6%90%9C%E7%8B%90&fq=&start=0&rows=10&fl=*%2Cscore&qt=standard&wt=php&explainOther=&hl=on&hl.fl=title%2Ccontents

===== More Like This (Similar Documents) =====

  * Add termVectors="true" to the fields used for similarity.
  * Example: http://61.150.91.179:8180/solr/core6/select/?q=id:2&mlt=true&mlt.fl=title,txt&mlt.mindf=1&mlt.mintf=1&fl=id,score,url,title&mlt.match.include=true
  * Documentation: http://wiki.apache.org/solr/MoreLikeThis

===== Common Problems =====

1. XML must not contain a bare ''&''; replace every occurrence with ''&amp;''.

2.
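A POST request must carry the header ''Content-Type: text/xml''; otherwise Solr responds with a 400 error.

PHP:
<code php>
$xml = " ... ";
$header[] = "Content-Type: text/xml";
$ch = curl_init();
// $url is the Solr update URL, e.g. http://host:8180/solr/update
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_TIMEOUT, 5);
curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $xml);
// return the response body in $data instead of printing it
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$data = curl_exec($ch);
if (curl_errno($ch)) {
    echo "curl error: ", curl_error($ch);
}
// release the handle on success and on failure
curl_close($ch);
</code>

Python:
<code python>
import urllib.error
import urllib.request

# posturl is the Solr update URL; xml is the update message (a str)
req = urllib.request.Request(posturl, data=xml.encode("utf-8"),
                             headers={"Content-Type": "text/xml"})
try:
    with urllib.request.urlopen(req, timeout=5) as f:
        print("read:", f.read().decode("utf-8"))
except urllib.error.HTTPError as e:
    print(e)  # e.g. HTTP Error 400 when the header is missing
except Exception as e:
    print(e)
</code>

3. In the simple example setup, the index files are placed in CWD/solr/data/index by default (CWD being the current working directory). To keep them under solr.home/data instead, comment out the ''dataDir'' element in ''F:\apache-solr-1.3.0\example\solr\conf\solrconfig.xml''.

===== References =====

  * http://wiki.chenlb.com/solr/
  * http://www.ibm.com/developerworks/cn/java/j-solr1/
  * [[http://www.cnblogs.com/cy163/archive/2009/09/18/1569681.html|Solr installation and usage guide]]
  * http://www.jwebstar.com.cn/docs/installsolr.html
  * http://blog.chenlb.com/2009/04/solr-chinese-segment-mmseg4j-use-demo.html
  * Solr query parameters explained: http://blog.chenlb.com/2009/03/solr-query-params-explain.html
  * Writing a custom Solr query parser plugin: http://blog.chenlb.com/2009/02/use-custom-solr-queryparser.html
  * Solr Multicore: http://blog.chenlb.com/2009/01/try-solr-multicore.html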
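The Tomcat context file from the installation steps (tomcat/conf/Catalina/localhost/solr.xml) is commonly written as in the sketch below, which points Solr at its home directory through a JNDI ''solr/home'' entry. The paths reuse /data/solr from the installation steps and are assumptions to adapt. For the multicore setup, solr-cores.xml would have the same shape, with docBase pointing at the copied WAR and the ''solr/home'' value set to /data/solr/multicore.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- tomcat/conf/Catalina/localhost/solr.xml: Tomcat uses the file name as the context path (/solr) -->
<Context docBase="/data/solr/dist/apache-solr-1.4.0.war" debug="0" crossContext="true">
  <!-- JNDI entry that tells Solr where its home directory is (assumed: /data/solr) -->
  <Environment name="solr/home" type="java.lang.String" value="/data/solr" override="true"/>
</Context>
```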
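For the bigram option and the More Like This section, a schema.xml sketch could combine a field type built on ''solr.CJKTokenizerFactory'' (which emits overlapping two-character tokens for CJK text) with a field that stores term vectors. The type name ''text_cjk'' and the ''title'' field definition are hypothetical; with mmseg4j you would use the field types from its README.txt instead.

```xml
<!-- inside <types>: hypothetical bigram field type -->
<fieldType name="text_cjk" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.CJKTokenizerFactory"/>
  </analyzer>
</fieldType>

<!-- inside <fields>: termVectors="true" stores term vectors for MoreLikeThis -->
<field name="title" type="text_cjk" indexed="true" stored="true" termVectors="true"/>
```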
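The GET URLs in the Usage, CoreAdmin and More Like This sections must be URL-encoded; for example ''contents:搜狐'' becomes ''contents%3A%E6%90%9C%E7%8B%90''. A small sketch using the standard library's ''urllib.parse.urlencode'' follows. The ''solr_url'' helper and the localhost base URL are hypothetical.

```python
from urllib.parse import urlencode


def solr_url(base, handler, params):
    """Join a Solr base URL, a handler path and URL-encoded query parameters.

    params is a list of (name, value) pairs, so parameter order is preserved
    and a parameter may appear more than once.
    """
    return "%s/%s?%s" % (base.rstrip("/"), handler.strip("/"), urlencode(params))


if __name__ == "__main__":
    base = "http://localhost:8180/solr"  # hypothetical host and port
    # Query similar to the Usage section: search "contents" with highlighting
    print(solr_url(base, "select", [
        ("q", "contents:搜狐"), ("start", 0), ("rows", 10),
        ("fl", "*,score"), ("hl", "on"), ("hl.fl", "title,contents"),
    ]))
    # CoreAdmin CREATE call like the one above, without the config/schema paths
    print(solr_url(base, "admin/cores", [
        ("action", "CREATE"), ("name", "core5"), ("instanceDir", "core5"),
    ]))
```

Passing a list of pairs rather than a dict keeps the parameters in the order written, which makes the generated URLs easy to compare with the ones above.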
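A minimal Python 3 sketch of the update workflow from the Usage and Common Problems sections: it escapes field values (item 1) and posts the XML with a ''Content-Type: text/xml'' header (item 2). The helper names ''build_add_xml'', ''make_update_request'' and ''post_update'', and the UTF-8 charset added to the header, are assumptions for illustration and not part of any Solr client library.

```python
import urllib.request
from xml.sax.saxutils import escape


def build_add_xml(doc):
    """Build a Solr <add> message from a dict of field name -> value.

    escape() turns &, < and > into &amp;, &lt; and &gt; (item 1);
    double quotes are additionally escaped in the name attribute.
    """
    fields = "".join(
        '<field name="%s">%s</field>' % (escape(name, {'"': "&quot;"}), escape(str(value)))
        for name, value in doc.items()
    )
    return "<add><doc>%s</doc></add>" % fields


def make_update_request(update_url, xml):
    """Prepare a POST carrying the Content-Type header Solr expects (item 2)."""
    return urllib.request.Request(
        update_url,
        data=xml.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


def post_update(update_url, xml, timeout=5):
    """Send an update message and return Solr's response body as text.

    urlopen raises urllib.error.HTTPError for a 400 or any other error status.
    """
    with urllib.request.urlopen(make_update_request(update_url, xml), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Typical use: ''post_update(url, build_add_xml({"id": "1", "title": "A & B"}))'' followed by ''post_update(url, "<commit/>")''. A delete is sent the same way, e.g. with ''<delete><query>type:bbs</query></delete>''.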