I am currently working on a project where we started by setting up Solr on an on-premise VM. We then moved to a Solr as a Service provider that used Solr in the background, and finally to another Solr as a Service provider that used SolrCloud in the background. All of this terminology was difficult to understand, and it becomes even more complex when you introduce SwitchOnRebuild and Blue/Green indexes.
In this post, we will cover the basics of Solr, SolrCloud, and Solr as a Service, along with the SwitchOnRebuild configuration for each.
Solr is an open-source enterprise-search platform, written in Java, from the Apache Lucene project. It provides both a RESTful XML interface, and a JSON API with which search applications can be built.
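To make the XML and JSON interfaces a little more concrete, here is a minimal Python sketch that only builds query URLs for Solr's /select handler and switches the response format with the wt parameter. The host, port, and core name are assumptions for illustration; nothing is sent over the network.

```python
from urllib.parse import urlencode


def build_select_url(base_url, core, query, response_format="json", rows=10):
    """Build a URL for the /select handler of one core (no request is made)."""
    params = {"q": query, "wt": response_format, "rows": rows}
    return f"{base_url.rstrip('/')}/{core}/select?{urlencode(params)}"


# Hypothetical local Solr instance and core name, for illustration only.
json_url = build_select_url("http://localhost:8983/solr", "sitecore_master_index", "*:*")
xml_url = build_select_url("http://localhost:8983/solr", "sitecore_master_index", "*:*", "xml")
print(json_url)
print(xml_url)
```

Opening either URL in a browser against a running Solr instance should return the same documents, once as JSON and once as XML.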
Solr comes with a rich feature set, including:
- Advanced Full-Text Search Capabilities
- Optimized for High Volume Traffic
- Standards Based Open Interfaces – XML, JSON and HTTP
- Comprehensive Administration Interfaces
- Easy Monitoring
- Flexible and Adaptable with easy configuration
- Near Real-Time Indexing
- Extensible Plugin Architecture
Sitecore natively supports Solr, alongside Lucene, as a search provider. Please see the compatibility table below:
You can set up Solr by following https://doc.sitecore.com/developers/82/sitecore-experience-platform/en/walkthrough--setting-up-solr.html. With Sitecore 9, Solr is the default search provider, so you do not need to worry about configuring Sitecore to work with Solr.
You can also use the script by @jermdavis to download Solr and set it up as a Windows service: https://gist.github.com/jermdavis/8d8a79f680505f1074153f02f70b9105.
SwitchOnRebuild with Solr
You can set up Solr to rebuild an index in a separate core so that the rebuilding does not affect the search index that is currently used. Once the rebuilding and the optimization of the index completes, Sitecore switches the two cores, and the rebuilt and optimized index is used.
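The switch can be pictured with a small Python model. This is only a toy illustration of the idea described above, not Sitecore's implementation: the class and method names are made up for this sketch, and the "cores" are plain in-memory lists.

```python
class SwitchOnRebuildModel:
    """Toy model of two cores: one serves searches, the other receives rebuilds."""

    def __init__(self, primary, secondary):
        self.primary = primary
        self.secondary = secondary
        self.documents = {primary: [], secondary: []}

    def search(self):
        # Searches always read from whichever core is currently primary.
        return list(self.documents[self.primary])

    def rebuild(self, fresh_documents):
        # The rebuild writes only to the secondary core, so searches are unaffected.
        self.documents[self.secondary] = list(fresh_documents)
        # Once the rebuild completes, the two cores trade places.
        self.primary, self.secondary = self.secondary, self.primary


index = SwitchOnRebuildModel("sitecore_master_index", "sitecore_master_index_rebuild")
index.documents[index.primary] = ["old home page"]
index.rebuild(["new home page", "new about page"])
print(index.primary, index.search())
```

While rebuild() is filling the secondary core, search() keeps returning the old documents; visitors only see the new index after the swap.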
We mostly use the master, web, and core indexes in our applications to index and fetch data, which components then use to show content on the page.
To configure SwitchOnRebuild with Solr, follow the steps below:
- From the Solr server, copy the existing sitecore_master_index folder. Call the copy sitecore_master_index_rebuild.
- Update the sitecore_master_index_rebuild/core.properties file and set the name of the new core:
- Restart Solr and check both the primary and secondary cores.
- To use this implementation, change the type reference on a particular search index to Sitecore.ContentSearch.SolrProvider.SwitchOnRebuildSolrSearchIndex, and add the rebuildcore parameter. You can use the patch file below for this:
Note: After you have changed the configuration file, your website uses indexes from the primary core. Each time you initiate a full index rebuild, Sitecore does this in the secondary core. The secondary core then becomes the primary one after the rebuild.
You can copy the above patch file to any role without changes. On the CD server, you will have to disable all index update strategies except manual. Please see: https://sitecore.stackexchange.com/questions/3918/patching-to-remove-index-update-strategies.
Though Solr provides great features, it is sometimes difficult to achieve high availability in a production environment with Solr alone. SolrCloud comes into the picture as Solr's feature for scaling.
So, having Solr hosted in the cloud does not mean you are running SolrCloud.
Apache Solr includes the ability to set up a cluster of Solr servers that combines fault tolerance and high availability. This mode, called SolrCloud, is designed to provide a highly available, fault-tolerant environment for distributing your indexed content and query requests across multiple servers. It provides distributed indexing and search, supporting the following features:
- Central configuration for the entire cluster
- Automatic load balancing and fail-over for queries
- ZooKeeper integration for cluster coordination and configuration.
Key SolrCloud Concepts
A SolrCloud cluster consists of some “logical” concepts layered on top of some “physical” concepts.
- A Cluster can host multiple Collections of Solr Documents.
- A collection can be partitioned into multiple Shards, which contain a subset of the Documents in the Collection.
- The number of Shards that a Collection has determines:
  - The theoretical limit to the number of Documents that Collection can reasonably contain.
  - The amount of parallelization that is possible for an individual search request.
- A Cluster is made up of one or more Solr Nodes, which are running instances of the Solr server process.
- Each Node can host multiple Cores.
- Each Core in a Cluster is a physical Replica for a logical Shard.
- Every Replica uses the same configuration specified for the Collection that it is a part of.
- The number of Replicas that each Shard has determines:
  - The level of redundancy built into the Collection and how fault tolerant the Cluster can be in the event that some Nodes become unavailable.
  - The theoretical limit on the number of concurrent search requests that can be processed under heavy load.
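The relationship between Shards, Replicas, Cores, and Nodes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the round-robin placement and the core naming scheme are made up for the example, and a real cluster decides replica placement itself.

```python
from itertools import cycle


def plan_cores(collection, num_shards, replicas_per_shard, nodes):
    """Assign one core per (shard, replica) pair to nodes in round-robin order."""
    node_cycle = cycle(nodes)
    layout = []
    for shard in range(1, num_shards + 1):
        for replica in range(1, replicas_per_shard + 1):
            # Illustrative core name; actual names depend on the Solr version.
            core_name = f"{collection}_shard{shard}_replica{replica}"
            layout.append((next(node_cycle), core_name))
    return layout


# Hypothetical collection with 2 shards and 2 replicas per shard on 2 nodes.
layout = plan_cores("sitecore_web_index", num_shards=2, replicas_per_shard=2,
                    nodes=["node1", "node2"])
for node, core in layout:
    print(node, core)
```

The number of cores is always shards multiplied by replicas per shard (four here). Because each shard's two replicas land on different nodes in this example, either node can go down and every shard still has one replica available.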
Please find the compatibility table of SolrCloud with Sitecore below.
You can configure SolrCloud by following https://doc.sitecore.com/developers/90/platform-administration-and-architecture/en/walkthrough--setting-up-solrcloud.html or https://sitecore-community.github.io/docs/search/solr/Install-and-configure-SolrCloud/.
Even though SolrCloud provides fault tolerance and high availability, you still need to set up SwitchOnRebuild to keep your site up and running during an index rebuild.
SwitchOnRebuild with SolrCloud
The SwitchOnRebuild configuration seems simple, as we saw with Solr above, but with SolrCloud it is somewhat confusing. This implementation uses Solr aliases instead of collection names.
The mechanism for maintaining and switching two indexes is different when you use SolrCloud. The implementation in the SwitchOnRebuildSolrCloudSearchIndex class uses collection aliases: it uses the active alias for search and update operations and the rebuild alias for rebuild operations. When a rebuild operation finishes, the CREATEALIAS command swaps the collections the aliases reference.
You need to create two collections, one primary and one secondary, and an alias for each of them.
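As a rough sketch of what the alias setup and the swap look like at the Solr level, the Python below builds CREATEALIAS request URLs for the Solr Collections API. The Solr URL, the collection names, and the rebuild alias name are assumptions for illustration; only the $(id)MainAlias naming pattern appears in this post. The code builds URLs and sends nothing.

```python
from urllib.parse import urlencode

SOLR_URL = "http://localhost:8983/solr"  # assumed SolrCloud endpoint


def create_alias_url(alias, collection, solr_url=SOLR_URL):
    """Build a Collections API URL that points an alias at a collection."""
    params = {"action": "CREATEALIAS", "name": alias, "collections": collection}
    return f"{solr_url}/admin/collections?{urlencode(params)}"


def swap_alias_urls(main_alias, rebuild_alias, main_collection, rebuild_collection):
    """Return the two requests that point each alias at the other collection."""
    return [
        create_alias_url(main_alias, rebuild_collection),
        create_alias_url(rebuild_alias, main_collection),
    ]


# Hypothetical names for the web index.
main_collection, rebuild_collection = "sitecore_web_index", "sitecore_web_index_rebuild"
main_alias, rebuild_alias = "sitecore_web_indexMainAlias", "sitecore_web_indexRebuildAlias"

setup_urls = [
    create_alias_url(main_alias, main_collection),
    create_alias_url(rebuild_alias, rebuild_collection),
]
swap_urls = swap_alias_urls(main_alias, rebuild_alias, main_collection, rebuild_collection)
for url in setup_urls + swap_urls:
    print(url)
```

Issuing CREATEALIAS for an alias that already exists re-points it, which is why the same command serves both the initial setup and the swap after a rebuild.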
We need to follow these guidelines when we configure Sitecore to use SolrCloud:
- Only use the SwitchOnRebuildSolrCloudSearchIndex index type in combination with an active index update strategy (the index has at least one index update strategy other than manual).
- Only one Sitecore instance can use the SwitchOnRebuildSolrCloudSearchIndex index type for a particular search index. All other Sitecore instances must use the SolrSearchIndex type, and also use the main alias as the core parameter:
<index id="sitecore_web_index" type="Sitecore.ContentSearch.SolrProvider.SolrSearchIndex, Sitecore.ContentSearch.SolrProvider"> <param desc="core">$(id)MainAlias</param>
These notes are easy to miss, and missing them will cause SwitchOnRebuild to not work properly with SolrCloud.
Sitecore also provides an example config file for SwitchOnRebuild with SolrCloud. If you haven’t checked it yet, please see \App_Config\Include\Examples.
Also, it is not mentioned what to do with other indexes where SwitchOnRebuild is not required, such as sitecore_marketingdefinitions_web, and whether they need to use an alias name or not. Please see the configuration file below, which takes care of the configuration changes required per role and other required considerations.
Solr as a Service
Solr as a Service does not necessarily mean SolrCloud; the provider can be running plain Solr too.
If you do not have in-house Solr expertise and you don’t want your developers to focus on managing, maintaining, and monitoring search infrastructure, you can use Solr as a Service. A few providers offer it. Some of them are:
https://opensolr.com – It provides a GUI where you choose a region and a Solr version, and with a few clicks your node is up and running.
Note: Opensolr uses Solr as its back-end implementation, not SolrCloud. So, if you are setting up Solr using Opensolr, keep this in mind and implement all the configurations mentioned in the Solr section.
What types of Solr cloud hosting does Opensolr provide?
https://www.searchstax.com – It is a fully managed Solr as a Service offering that lets you spend more time building your search application and less time managing search infrastructure. It also provides a full GUI and the complete control we generally have over Solr on a developer machine.
Note: SearchStax mostly uses SolrCloud as its back-end implementation, though it also supports Solr. So, if you are setting up Solr using SearchStax, keep this in mind and implement the configurations mentioned in the SolrCloud section, especially SwitchOnRebuild.
What types of Solr cloud hosting does SearchStax provide?
SearchStax’s founder @maggon took part in SUGCON 2019, recently held in Bengaluru, India. You can go through the slides, World class Solr power in 30 minutes, which may answer all your questions.