Search | Developer Bubble

Started learning Solr recently. Here are some of the key points (from here):

Solr is an open source enterprise search server based on the Lucene Java search library, with XML/HTTP APIs, caching, replication, and a web administration interface.
It's an information retrieval application.
Index/Query Via HTTP
Scalability – Efficient Replication To Other Solr Search Servers
Highly Configurable And User Extensible Caching

Solr Configuration:

schema.xml: Where the data is described
solrconfig.xml: Where you describe how people can interact with the data

Loading Data:

Documents can be added, deleted or replaced.
Message Transport: HTTP POST
Message Format: XML
Example:

<add><doc> <field name="id">SOLR</field> <field name="name">Apache Solr</field></doc></add>
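If you'd rather not hand-write that XML, here is a minimal Python sketch that builds the same add message with the standard library (so values like "&" get escaped for you) and POSTs it. The update URL `http://localhost:8983/solr/update` is my assumption for a default local install, not something from the notes above, and `build_add_xml` / `post_xml` are just hypothetical helper names.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Assumed endpoint for a default local Solr install; adjust for your server.
UPDATE_URL = "http://localhost:8983/solr/update"


def build_add_xml(docs):
    """Build an <add> message with one <doc> per dict of field name -> value."""
    add = ET.Element("add")
    for fields in docs:
        doc = ET.SubElement(add, "doc")
        for name, value in fields.items():
            field = ET.SubElement(doc, "field", name=name)
            field.text = str(value)  # ElementTree escapes &, <, > when serializing
    return ET.tostring(add, encoding="unicode")


def post_xml(xml_body, url=UPDATE_URL):
    """Send an XML message to Solr with HTTP POST and return the raw response."""
    req = urllib.request.Request(
        url,
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    body = build_add_xml([{"id": "SOLR", "name": "Apache Solr"}])
    print(body)
    # post_xml(body)          # uncomment when a Solr server is running
    # post_xml("<commit/>")   # make the added document searchable
```

One thing worth knowing: added documents generally don't show up in search results until a commit, which is why the sketch also shows posting `<commit/>` to the same URL.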

Querying Data:

Transport Protocol: HTTP GET

Example:

http://solr/select?q=electronics
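Since a query is just an HTTP GET, the URL can be built with the standard library. A minimal sketch: `build_query_url` is a hypothetical helper, "solr" in the base URL is the placeholder host from the example above, and `rows` (how many results to return) is a standard Solr request parameter that the notes don't mention.

```python
from urllib.parse import urlencode

# Base URL from the example above; "solr" stands in for your host name.
SELECT_URL = "http://solr/select"


def build_query_url(q, base=SELECT_URL, **params):
    """Return a GET URL for the select handler with q and any extra params encoded."""
    query = {"q": q, **params}
    return base + "?" + urlencode(query)  # spaces become '+', other specials get %-escaped


print(build_query_url("electronics"))         # http://solr/select?q=electronics
print(build_query_url("apache solr", rows=5)) # http://solr/select?q=apache+solr&rows=5
```

Fetching the results is then a plain `urllib.request.urlopen(url)` call.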

schema.xml:

Decides the options for each field:

Is it a number? A string? A date?
Is there a default value for documents that don’t have one?
Is it created by combining the values of other fields?
Is it stored for retrieval?
Is it indexed? If so, is it parsed? If so, how?
Is it a unique identifier?
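Each of those questions ends up as a piece of schema.xml: the field's `type`, its `default`, a `<copyField>` for combined fields, the `stored` and `indexed` flags, and `<uniqueKey>` for the identifier (how text is parsed lives in the field type's analyzer, covered below). Here's a small Python sketch that emits those elements so the mapping is easy to see. `FieldSpec`, `schema_fragment` and the `all_text` field are hypothetical names for illustration, and the `string`/`text` types are assumed to be defined as fieldTypes elsewhere in the schema.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional


@dataclass
class FieldSpec:
    name: str
    type: str                      # "A number? A string? A date?" -> a fieldType name
    indexed: bool = True           # "Is it indexed?"
    stored: bool = True            # "Is it stored for retrieval?"
    default: Optional[str] = None  # "Is there a default value?"


def schema_fragment(fields, unique_key, copy_fields=()):
    """Emit <field>, <uniqueKey> and <copyField> elements as a schema.xml sketch."""
    root = ET.Element("schema", name="sketch")
    fields_el = ET.SubElement(root, "fields")
    for f in fields:
        attrs = {"name": f.name, "type": f.type,
                 "indexed": str(f.indexed).lower(), "stored": str(f.stored).lower()}
        if f.default is not None:
            attrs["default"] = f.default
        ET.SubElement(fields_el, "field", attrs)
    # "Is it a unique identifier?"
    ET.SubElement(root, "uniqueKey").text = unique_key
    # "Is it created by combining the values of other fields?"
    for source, dest in copy_fields:
        ET.SubElement(root, "copyField", source=source, dest=dest)
    return ET.tostring(root, encoding="unicode")


print(schema_fragment(
    [FieldSpec("id", "string"),
     FieldSpec("title", "text", stored=False),
     FieldSpec("all_text", "text", stored=False)],
    unique_key="id",
    copy_fields=[("title", "all_text")],
))
```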

Fields:

<field> Describes How You Deal With Specific Named Fields
Example:

<field name="title" type="text" stored="false" />

Field Type:

The Underlying Storage Class (FieldType)
The Analyzer To Use For Parsing, If It Is A Text Field
Example:

<fieldType name="sfloat" class="solr.SortableFloatField" sortMissingLast="true" omitNorms="true" />
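To make `sortMissingLast="true"` concrete: when you sort on a field of this type, documents with no value for it are placed after every document that has one, whether the sort is ascending or descending. A plain-Python model of that rule (this is not Solr code; `sort_missing_last` is a hypothetical name, and `None` stands for a missing field value):

```python
def sort_missing_last(values, descending=False):
    """Sort values, always placing None (a missing field value) at the end.

    Models sortMissingLast="true": missing values go last in both
    ascending and descending order.
    """
    present = sorted((v for v in values if v is not None), reverse=descending)
    missing = [v for v in values if v is None]
    return present + missing


prices = [2.5, None, 0.99, 10.0]
print(sort_missing_last(prices))                   # [0.99, 2.5, 10.0, None]
print(sort_missing_last(prices, descending=True))  # [10.0, 2.5, 0.99, None]
```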

Analyzer:

‘Analyzer’ Is A Core Lucene Class For Parsing Text
Example:

<fieldType name="text_greek" class="solr.TextField">
  <analyzer class="org.apache.lucene.analysis.el.GreekAnalyzer"/>
</fieldType>

Tokenizers And TokenFilters:

Analyzers Are Typically Composed Of Tokenizers And TokenFilters
Tokenizer: Controls How Your Text Is Tokenized
TokenFilter: Mutates And Manipulates The Stream Of Tokens
Solr Lets You Mix And Match Tokenizers and TokenFilters In Your schema.xml To Define Analyzers On The Fly
Example:

<fieldType name="text" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" expand="true"/>
  </analyzer>
</fieldType>
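A toy Python model of that "text" type helps me see the tokenizer/filter split: one tokenizer turns a string into tokens, then each filter rewrites the token list. This is only an illustration, not Lucene's API. The function names and the synonyms dict (standing in for synonyms.txt) are hypothetical, and unlike the real synonym filter it ignores token positions and just appends synonyms.

```python
def whitespace_tokenizer(text):
    """Split on runs of whitespace; no lowercasing or punctuation stripping."""
    return text.split()


def synonym_filter(tokens, synonyms):
    """Emit each token followed by its synonyms (a flattened expand=true)."""
    out = []
    for tok in tokens:
        out.append(tok)
        out.extend(s for s in synonyms.get(tok, []) if s != tok)
    return out


def analyze(text, tokenizer, filters):
    """Run one tokenizer, then apply each token filter in order."""
    tokens = tokenizer(text)
    for f in filters:
        tokens = f(tokens)
    return tokens


synonyms = {"TV": ["television"], "television": ["TV"]}  # hypothetical synonyms.txt
index_chain = []                                        # index analyzer: tokenizer only
query_chain = [lambda t: synonym_filter(t, synonyms)]   # query analyzer adds synonyms

print(analyze("cheap TV", whitespace_tokenizer, index_chain))  # ['cheap', 'TV']
print(analyze("cheap TV", whitespace_tokenizer, query_chain))  # ['cheap', 'TV', 'television']
```

Expanding synonyms only on the query side means the index stays small, while a query for "TV" can still match documents that were indexed with "television".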

solrconfig.xml:

This is where you configure options for how this Solr instance should behave:

Low-Level Index Settings
Performance Settings (Cache Sizes, etc…)
Types of Updates Allowed
Types of Queries Allowed

Note:

solrconfig.xml depends on schema.xml.
schema.xml does not depend on solrconfig.xml.


Solr and Lucene