Started learning Solr recently. Here are some of the key points (from here):
- Solr is an open source enterprise search server based on the Lucene Java search library, with XML/HTTP APIs, caching, replication, and a web administration interface.
- It's an information retrieval application.
- Index/Query Via HTTP
- Scalability – Efficient Replication To Other Solr Search Servers
- Highly Configurable And User Extensible Caching
Solr Configuration:
- schema.xml: Where the data is described
- solrconfig.xml: Where you describe how people can interact with the data
Loading Data:
- Documents can be added, deleted or replaced.
- Message Transport: HTTP POST
- Message Format: XML
- Example:
<add><doc> <field name="id">SOLR</field> <field name="name">Apache Solr</field></doc></add>
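A minimal Python sketch of building these update messages and wrapping one in an HTTP POST. The update URL `http://localhost:8983/solr/update` is an assumption (the classic example-server address), and the request is only constructed, not sent.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Assumption: a local Solr instance with the classic /update handler.
UPDATE_URL = "http://localhost:8983/solr/update"


def build_add_message(doc: dict) -> str:
    """Build an <add><doc>...</doc></add> message from a dict of field values."""
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        field = ET.SubElement(doc_el, "field", name=name)
        field.text = str(value)
    return ET.tostring(add, encoding="unicode")


def build_delete_message(doc_id: str) -> str:
    """Build a <delete><id>...</id></delete> message for a single document."""
    delete = ET.Element("delete")
    ET.SubElement(delete, "id").text = doc_id
    return ET.tostring(delete, encoding="unicode")


def make_update_request(message: str) -> urllib.request.Request:
    """Wrap an XML message in an HTTP POST request object (not sent here)."""
    return urllib.request.Request(
        UPDATE_URL,
        data=message.encode("utf-8"),  # a body makes urllib use POST
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )


if __name__ == "__main__":
    msg = build_add_message({"id": "SOLR", "name": "Apache Solr"})
    print(msg)
    req = make_update_request(msg)
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req)
```

Using ElementTree rather than string concatenation means characters like `&` or `<` in field values are escaped automatically. Added documents only become searchable after a `<commit/>` message is posted (unless auto-commit is configured).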
Querying Data:
- Transport Protocol: HTTP GET
- Example:
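A minimal sketch of building such a GET query in Python. It assumes a local Solr instance at the hypothetical base URL `http://localhost:8983/solr/select` and uses the standard `q`, `fl`, `start` and `rows` parameters; the request is built but not sent.

```python
import urllib.request
from urllib.parse import urlencode

# Assumption: a local Solr instance exposing the standard /select handler.
# Host, port and path are illustrative; adjust them for your setup.
SELECT_URL = "http://localhost:8983/solr/select"


def build_query_url(query: str, fields: str = "*", start: int = 0, rows: int = 10) -> str:
    """Encode the q/fl/start/rows parameters into a GET URL."""
    params = {"q": query, "fl": fields, "start": start, "rows": rows}
    return SELECT_URL + "?" + urlencode(params)


def make_query_request(url: str) -> urllib.request.Request:
    """Wrap the URL in a GET request object (not sent here)."""
    return urllib.request.Request(url, method="GET")


if __name__ == "__main__":
    url = build_query_url("name:solr", fields="id,name", rows=5)
    print(url)
    # http://localhost:8983/solr/select?q=name%3Asolr&fl=id%2Cname&start=0&rows=5
    # To send it: urllib.request.urlopen(make_query_request(url)).read()
```

`urlencode` percent-encodes reserved characters such as `:` and `,`, so field-qualified queries like `name:solr` survive the trip intact.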
schema.xml:
Decides the options for the various fields:
- Is it a number? A string? A date?
- Is there a default value for documents that don’t have one?
- Is it created by combining the values of other fields?
- Is it stored for retrieval?
- Is it indexed? If so, is it parsed? If so, how?
- Is it a unique identifier?
Fields:
- <field> Describes How You Deal With Specific Named Fields
- Example:
<field name="title" type="text" stored="false" />
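To tie the field questions back to actual schema.xml syntax, here is a small Python sketch that reads a schema and answers them per field: type, indexed, stored, default, whether it is the `<uniqueKey>`, and which fields are copied into it via `<copyField>`. The sample schema and its field names (`status`, `all_text`, etc.) are hypothetical, trimmed down for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down schema.xml used only for illustration.
SAMPLE_SCHEMA = """
<schema name="example" version="1.1">
  <types>
    <fieldType name="string" class="solr.StrField"/>
    <fieldType name="text" class="solr.TextField"/>
  </types>
  <fields>
    <field name="id" type="string" indexed="true" stored="true" required="true"/>
    <field name="name" type="text" indexed="true" stored="true"/>
    <field name="status" type="string" indexed="true" stored="true" default="active"/>
    <field name="title" type="text" stored="false"/>
    <field name="all_text" type="text" indexed="true" stored="false" multiValued="true"/>
  </fields>
  <uniqueKey>id</uniqueKey>
  <copyField source="name" dest="all_text"/>
  <copyField source="title" dest="all_text"/>
</schema>
"""


def describe_fields(schema_xml: str) -> dict:
    """Summarise each <field> declaration in a schema.xml string."""
    root = ET.fromstring(schema_xml.strip())
    unique_key = (root.findtext("uniqueKey") or "").strip()

    # "Is it created by combining the values of other fields?" -> copyField
    copies: dict = {}
    for cf in root.iter("copyField"):
        copies.setdefault(cf.get("dest"), []).append(cf.get("source"))

    info = {}
    for field in root.iter("field"):
        name = field.get("name")
        info[name] = {
            "type": field.get("type"),
            # None means the attribute is not set on the field itself,
            # so the setting comes from its fieldType.
            "indexed": field.get("indexed"),
            "stored": field.get("stored"),
            "default": field.get("default"),
            "unique": name == unique_key,
            "copied_from": copies.get(name, []),
        }
    return info


if __name__ == "__main__":
    for field_name, props in describe_fields(SAMPLE_SCHEMA).items():
        print(field_name, props)
```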
Field Type:
- The Underlying Storage Class (FieldType)
- The Analyzer To Use Or Parsing If It Is A Text Field
- Example:
<fieldType name="sfloat" class="solr.SortableFloatField" sortMissingLast="true" omitNorms="true" />
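`sortMissingLast="true"` asks Solr to put documents that have no value in the field after all documents that do, whichever direction you sort. A minimal Python sketch of that ordering rule, with plain dicts standing in for documents (an illustration of the behaviour, not Solr's implementation; the `price` field is hypothetical):

```python
from typing import Any, Dict, List

Doc = Dict[str, Any]


def sort_missing_last(docs: List[Doc], field: str, descending: bool = False) -> List[Doc]:
    """Sort docs by `field`; docs lacking the field always come last."""
    present = [d for d in docs if d.get(field) is not None]
    missing = [d for d in docs if d.get(field) is None]
    # Only the docs that have a value take part in the direction-dependent sort.
    present.sort(key=lambda d: d[field], reverse=descending)
    return present + missing


if __name__ == "__main__":
    docs = [{"id": "a", "price": 2.5}, {"id": "b"}, {"id": "c", "price": 1.0}]
    print([d["id"] for d in sort_missing_last(docs, "price")])                   # ['c', 'a', 'b']
    print([d["id"] for d in sort_missing_last(docs, "price", descending=True)])  # ['a', 'c', 'b']
```

Without this option, a naive sort would treat "missing" as a low or high value, so missing docs would jump from the end to the front when the direction flips.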
Analyzer:
- ‘Analyzer’ Is A Core Lucene Class For Parsing Text
- Example:
<fieldType name="text_greek" class="solr.TextField">
<analyzer class="org.apache.lucene.analysis.el.GreekAnalyzer"/>
</fieldType>
Tokenizers And TokenFilters:
- Analyzers Are Typically Made Up Of Tokenizers And TokenFilters
- Tokenizer: Controls How Your Text Is Tokenized
- TokenFilter: Mutates And Manipulates The Stream Of Tokens
- Solr Lets You Mix And Match Tokenizers and TokenFilters In Your schema.xml To Define Analyzers On The Fly
- Example:
<fieldType name="text" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" expand="true"/>
  </analyzer>
</fieldType>
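The index/query split above can be simulated in a few lines of Python to see what each chain produces. This is a simplified sketch, not Lucene's code: the tokenizer is `str.split()`, only comma-separated equivalence lines from synonyms.txt are handled (explicit `=>` mappings and multi-word synonyms are skipped), and expanded synonyms are returned as a flat list rather than stacked at the same token position as the real filter does.

```python
from typing import Dict, List


def parse_synonyms(lines: List[str]) -> Dict[str, List[str]]:
    """Map each term to its equivalence group from 'a, b, c' lines."""
    groups: Dict[str, List[str]] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or "=>" in line:
            continue  # skip blanks, comments, and explicit mappings (not handled here)
        group = [term.strip() for term in line.split(",") if term.strip()]
        for term in group:
            groups[term] = group
    return groups


def whitespace_tokenize(text: str) -> List[str]:
    """Split on runs of whitespace, roughly like WhitespaceTokenizerFactory."""
    return text.split()


def synonym_filter(tokens: List[str], synonyms: Dict[str, List[str]]) -> List[str]:
    """expand=true: a token with synonyms is replaced by its whole group."""
    out: List[str] = []
    for token in tokens:
        out.extend(synonyms.get(token, [token]))
    return out


def index_analyzer(text: str) -> List[str]:
    return whitespace_tokenize(text)


def query_analyzer(text: str, synonyms: Dict[str, List[str]]) -> List[str]:
    return synonym_filter(whitespace_tokenize(text), synonyms)


if __name__ == "__main__":
    syn = parse_synonyms(["# hypothetical synonyms.txt", "tv, television"])
    print(index_analyzer("Sony TV"))         # ['Sony', 'TV']
    print(query_analyzer("sony tv", syn))    # ['sony', 'tv', 'television']
```

Expanding only at query time keeps the index small while still letting a search for "tv" match documents that say "television". Note that neither chain lowercases, so "TV" and "tv" remain different tokens.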
solrconfig.xml:
This is where you configure options for how this Solr instance should behave:
- Low-Level Index Settings
- Performance Settings (Cache Sizes, etc…)
- Types of Updates Allowed
- Types of Queries Allowed
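The cache sizes in solrconfig.xml bound how many entries caches such as the filterCache or queryResultCache may hold; older example configs back them with `solr.LRUCache`. As a rough illustration of what a size limit means, here is a generic least-recently-used cache in Python. It is a sketch of the eviction idea only, not Solr's implementation, and the query-string keys are hypothetical.

```python
from collections import OrderedDict
from typing import Any, Hashable, Optional


class LRUCache:
    """Size-bounded cache that evicts the least recently used entry."""

    def __init__(self, size: int) -> None:
        if size <= 0:
            raise ValueError("size must be positive")
        self.size = size
        self._items: "OrderedDict[Hashable, Any]" = OrderedDict()
        self.hits = 0
        self.lookups = 0

    def get(self, key: Hashable) -> Optional[Any]:
        self.lookups += 1
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        self.hits += 1
        return self._items[key]

    def put(self, key: Hashable, value: Any) -> None:
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.size:
            self._items.popitem(last=False)  # drop the least recently used entry

    def __len__(self) -> int:
        return len(self._items)


if __name__ == "__main__":
    cache = LRUCache(size=2)
    cache.put("q=solr", [1, 2])
    cache.put("q=lucene", [3])
    cache.get("q=solr")              # touch, so "q=lucene" is now the oldest
    cache.put("q=search", [4])       # exceeds size 2, evicts "q=lucene"
    print(cache.get("q=lucene"))     # None
```

A larger size raises the hit ratio (`hits / lookups`) at the cost of memory, which is the trade-off these performance settings tune.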
Note:
- solrconfig.xml depends on schema.xml.
- schema.xml does not depend on solrconfig.xml.