Solr, SolrCloud, and Solr as a Service

I am currently working on a project where we started by setting up Solr on an on-premises VM. We then moved to a Solr as a Service provider that used Solr in the background, and finally to another Solr as a Service provider that used SolrCloud in the background. All of this terminology was difficult to understand, and it becomes even more complex when you introduce SwitchOnRebuild and Blue/Green indexes.

In this post, we will cover the basics of Solr, SolrCloud, and Solr as a Service, along with the SwitchOnRebuild configurations.

Solr

Solr is an open-source enterprise search platform, written in Java, from the Apache Lucene project. It provides both a RESTful XML interface and a JSON API with which search applications can be built.
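
As a minimal sketch of what talking to that HTTP interface looks like, the snippet below builds a query URL for Solr's /select handler and asks for a JSON response. The host, the core name, and the query are assumptions chosen for illustration, and nothing is sent over the network.

```python
from urllib.parse import urlencode


def build_select_url(base_url: str, core: str, query: str, rows: int = 10) -> str:
    """Return a URL for Solr's /select handler that requests a JSON response."""
    params = {"q": query, "rows": rows, "wt": "json"}
    return f"{base_url.rstrip('/')}/{core}/select?{urlencode(params)}"


# Hypothetical local Solr instance and a Sitecore-style core name.
url = build_select_url("http://localhost:8983/solr", "sitecore_web_index", "_name:home")
print(url)
```

Opening the printed URL in a browser, or fetching it with any HTTP client, should return the matching documents as JSON when a core with that name exists.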

Solr comes with a rich feature set, including:

  • Advanced Full-Text Search Capabilities
  • Optimized for High Volume Traffic
  • Standards Based Open Interfaces – XML, JSON and HTTP
  • Comprehensive Administration Interfaces
  • Easy Monitoring
  • Flexible and Adaptable with easy configuration
  • Near Real-Time Indexing
  • Extensible Plugin Architecture

Sitecore natively supports Solr, alongside Lucene, as a search provider. Please see the compatibility table below:

[Screenshot: Solr compatibility table, Sitecore Knowledge Base]

You can set up Solr by following https://doc.sitecore.com/developers/82/sitecore-experience-platform/en/walkthrough--setting-up-solr.html. With Sitecore 9, Solr is the default search provider, so you do not need to worry about configuring Sitecore to work with Solr.

You can also use the script by @jermdavis to download Solr and set it up as a Windows service: https://gist.github.com/jermdavis/8d8a79f680505f1074153f02f70b9105.

SwitchOnRebuild with Solr

You can set up Solr to rebuild an index in a separate core so that the rebuild does not affect the search index that is currently in use. Once the rebuild and the optimization of the index complete, Sitecore switches the two cores, and the rebuilt and optimized index is used.

We mostly use the master/web/core indexes in our applications to index and fetch data, and use that data in components to show content on the page.

To configure SwitchOnRebuild with Solr, follow the steps below:

  • On the Solr server, copy the existing sitecore_master_index folder. Name the copy sitecore_master_index_rebuild.
  • Update the sitecore_master_index_rebuild/core.properties file and set the name of the new core: name=sitecore_master_index_rebuild
  • Restart Solr and check both the primary and secondary cores.
  • To use this implementation, change the type reference on a particular search index to Sitecore.ContentSearch.SolrProvider.SwitchOnRebuildSolrSearchIndex, and add the rebuildcore parameter. You can use the patch file below for this:

Note: After you have changed the configuration file, your website uses indexes from the primary core. Each time you initiate a full index rebuild, Sitecore does this in the secondary core. The secondary core then becomes the primary one after the rebuild.

You can copy the above patch file to any role without any changes. On CD servers, you will have to disable all index update strategies except manual. Please see: https://sitecore.stackexchange.com/questions/3918/patching-to-remove-index-update-strategies.
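
The first two steps of the procedure above (copying the core folder and renaming the copy in core.properties) can be scripted. The following is only a sketch: the function name is hypothetical, it assumes core.properties is a plain key=value file containing a name= entry, and it should be run while Solr is stopped.

```python
import shutil
from pathlib import Path


def clone_core_for_rebuild(solr_home: Path, core: str, suffix: str = "_rebuild") -> Path:
    """Copy an existing core folder and point the copy's core.properties at the new name."""
    source = solr_home / core
    target = solr_home / f"{core}{suffix}"
    shutil.copytree(source, target)  # fails if the target folder already exists

    props = target / "core.properties"
    lines = props.read_text(encoding="utf-8").splitlines()
    # Drop any existing name= entry; all other properties are kept unchanged.
    kept = [line for line in lines if not line.strip().startswith("name=")]
    kept.append(f"name={target.name}")
    props.write_text("\n".join(kept) + "\n", encoding="utf-8")
    return target
```

For example, calling clone_core_for_rebuild(Path("/path/to/solr/home"), "sitecore_master_index") would create sitecore_master_index_rebuild next to the original core (the path here is a placeholder). Solr still needs a restart afterwards so that it picks up the new core.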

Though Solr provides great features, it’s sometimes difficult to achieve high availability for a production environment using Solr alone. This is where SolrCloud comes into the picture, as a way of scaling Solr.

SolrCloud

Note that hosting Solr in the cloud does not make it SolrCloud.

Apache Solr includes the ability to set up a cluster of Solr servers that combines fault tolerance and high availability. This capability is called SolrCloud. It is designed to provide a highly available, fault-tolerant environment for distributing your indexed content and query requests across multiple servers. It provides distributed indexing and search, supporting the following features:

  • Central configuration for the entire cluster
  • Automatic load balancing and fail-over for queries
  • ZooKeeper integration for cluster coordination and configuration.

Key SolrCloud Concepts

A SolrCloud cluster consists of some “logical” concepts layered on top of some “physical” concepts.

Logical Concepts

  • A Cluster can host multiple Collections of Solr Documents.
  • A collection can be partitioned into multiple Shards, which contain a subset of the Documents in the Collection.
  • The number of Shards that a Collection has determines:
    • The theoretical limit to the number of Documents that Collection can reasonably contain.
    • The amount of parallelization that is possible for an individual search request.

Physical Concepts

  • A Cluster is made up of one or more Solr Nodes, which are running instances of the Solr server process.
  • Each Node can host multiple Cores.
  • Each Core in a Cluster is a physical Replica for a logical Shard.
  • Every Replica uses the same configuration specified for the Collection that it is a part of.
  • The number of Replicas that each Shard has determines:
    • The level of redundancy built into the Collection and how fault tolerant the Cluster can be in the event that some Nodes become unavailable.
    • The theoretical limit on the number of concurrent search requests that can be processed under heavy load.
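
To make the relationship between the logical and physical concepts concrete, here is a small model of it. It is a sketch under stated assumptions: the core naming only imitates the general shape of Solr's core names, and the round-robin placement is a simplification, not Solr's actual placement logic.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Collection:
    name: str
    num_shards: int          # logical partitions of the collection
    replication_factor: int  # physical replicas (cores) per shard

    def core_names(self) -> list[str]:
        """One core per (shard, replica) pair; the naming is illustrative only."""
        return [
            f"{self.name}_shard{s}_replica_n{r}"
            for s in range(1, self.num_shards + 1)
            for r in range(1, self.replication_factor + 1)
        ]


def place_cores(collection: Collection, nodes: list[str]) -> dict[str, list[str]]:
    """Spread cores across nodes round-robin (a simplified placement)."""
    placement: dict[str, list[str]] = {node: [] for node in nodes}
    for i, core in enumerate(collection.core_names()):
        placement[nodes[i % len(nodes)]].append(core)
    return placement


web = Collection("sitecore_web_index", num_shards=2, replication_factor=2)
for node, cores in place_cores(web, ["node1", "node2"]).items():
    print(node, cores)
```

With two shards and two replicas per shard, the collection is backed by four cores. In this layout each node holds one replica of every shard, so either node can go down without losing access to any part of the collection.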

Please find the compatibility table of SolrCloud with Sitecore below.

[Screenshot: Solr compatibility table, Sitecore Knowledge Base]

You can configure SolrCloud by following https://doc.sitecore.com/developers/90/platform-administration-and-architecture/en/walkthrough--setting-up-solrcloud.html or https://sitecore-community.github.io/docs/search/solr/Install-and-configure-SolrCloud/.

Even though SolrCloud provides fault tolerance and high availability, you still need to set up SwitchOnRebuild to keep your site up and running during an index rebuild.

SwitchOnRebuild with SolrCloud

SwitchOnRebuild configuration seems simple, as we saw with Solr above. But when it comes to SolrCloud, it’s somewhat confusing. This implementation uses Solr aliases instead of collection names.

The mechanism for maintaining and switching two indexes is different when you use SolrCloud. The implementation in the SwitchOnRebuildSolrCloudSearchIndex class uses collection aliases: it uses the active alias for search and update operations and the rebuild alias for rebuild operations. When a rebuild operation finishes, the CREATEALIAS command swaps the collections the aliases reference.

You need to create two collections, one primary and one secondary, and an alias for each of them.
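
The alias swap described above can be illustrated with the Solr Collections API, where issuing CREATEALIAS for an alias that already exists repoints it. This sketch only builds the request URLs. Apart from the MainAlias suffix, the alias and collection names are assumptions, and Sitecore issues the equivalent calls itself after a rebuild.

```python
from urllib.parse import urlencode


def create_alias_url(solr_url: str, alias: str, collection: str) -> str:
    """Build a Collections API CREATEALIAS request that points alias at collection."""
    params = {"action": "CREATEALIAS", "name": alias, "collections": collection}
    return f"{solr_url.rstrip('/')}/admin/collections?{urlencode(params)}"


def swap_alias_urls(solr_url: str, main_alias: str, rebuild_alias: str,
                    active: str, rebuilt: str) -> list[str]:
    """Return the two requests that swap which collection each alias references."""
    return [
        create_alias_url(solr_url, main_alias, rebuilt),   # search now hits the fresh index
        create_alias_url(solr_url, rebuild_alias, active),  # next rebuild goes to the old one
    ]


# Hypothetical names: only the "MainAlias" suffix comes from the Sitecore guideline.
for url in swap_alias_urls(
    "http://localhost:8983/solr",
    "sitecore_web_indexMainAlias",
    "sitecore_web_indexRebuildAlias",
    active="sitecore_web_index",
    rebuilt="sitecore_web_index_rebuild",
):
    print(url)
```

Because queries always go through the alias, the swap is invisible to the site: the alias name stays the same while the collection behind it changes.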

We need to follow these guidelines when we configure Sitecore to use SolrCloud:

  • Only use the SwitchOnRebuildSolrCloudSearchIndex index type in combination with an active index update strategy (index has at least one index update strategy other than manual).
  • Only one Sitecore instance can use the SwitchOnRebuildSolrCloudSearchIndex index type for a particular search index. All other Sitecore instances must use the SolrSearchIndex type, and also use the main alias as the core parameter:
    <index id="sitecore_web_index" type="Sitecore.ContentSearch.SolrProvider.SolrSearchIndex, Sitecore.ContentSearch.SolrProvider">
      <param desc="core">$(id)MainAlias</param>
      ...
    </index>

These notes are easy to miss, and missing them will cause SwitchOnRebuild to not work properly with SolrCloud.
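
The second guideline can be expressed as a small helper that assigns an index type to each Sitecore instance. This is a sketch only: the function and instance names are hypothetical, and just the short class names are used rather than full type references.

```python
from __future__ import annotations

SWITCHING_TYPE = "SwitchOnRebuildSolrCloudSearchIndex"
PLAIN_TYPE = "SolrSearchIndex"


def assign_index_types(instances: list[str], rebuild_owner: str,
                       index_id: str) -> dict[str, dict[str, str]]:
    """Give exactly one instance the switching type; all others read via the main alias."""
    if rebuild_owner not in instances:
        raise ValueError(f"{rebuild_owner!r} is not one of the instances")
    settings: dict[str, dict[str, str]] = {}
    for name in instances:
        if name == rebuild_owner:
            settings[name] = {"type": SWITCHING_TYPE}
        else:
            # Mirrors <param desc="core">$(id)MainAlias</param> from the guideline.
            settings[name] = {"type": PLAIN_TYPE, "core": f"{index_id}MainAlias"}
    return settings


# Hypothetical topology: one CM instance owns rebuilds, two CD instances only read.
print(assign_index_types(["CM", "CD1", "CD2"], "CM", "sitecore_web_index"))
```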

Sitecore also provides an example config file for SwitchOnRebuild with SolrCloud. If you haven’t checked it yet, please see the \App_Config\Include\Examples\Sitecore.ContentSearch.SolrCloud.SwitchOnRebuild.config.example file.

Also, it’s not mentioned what to do with other indexes where SwitchOnRebuild is not required, such as sitecore_marketingdefinitions_web, and whether they need to use the alias name or not. Please see the configuration file below, which takes care of the configuration changes required per role and the other required considerations.

Solr as a Service

Solr as a Service does not necessarily mean SolrCloud; it can be plain Solr too.

If you do not have in-house Solr expertise and you don’t want your developers to spend their time managing, maintaining, and monitoring search infrastructure, you can use Solr as a Service. A few providers offer Solr as a Service. Some of them are:

Opensolr:

https://opensolr.com: It provides a GUI where you choose the region and Solr version, and with a few clicks your node will be up and running.

Note: Opensolr uses Solr, not SolrCloud, as its back-end implementation. So, if you are setting up Solr using Opensolr, keep this in mind and implement all the configurations mentioned in the Solr section.

What types of Solr Cloud hosting does Opensolr provide?
https://opensolr.freshdesk.com/support/solutions/articles/33000202124-what-types-of-solr-cloud-hosting-do-you-provide

SearchStax:

https://www.searchstax.com: It is a fully managed Solr as a Service that allows you to spend more time building your search application and less time managing search infrastructure. It also provides a full GUI and the complete control that we generally have over Solr on a developer machine.

Note: SearchStax mostly uses SolrCloud as its back-end implementation, though it also supports Solr. So, if you are setting up Solr using SearchStax, keep this in mind and implement the configurations mentioned in the SolrCloud section, especially SwitchOnRebuild.

What types of Solr Cloud hosting does SearchStax provide?
SearchStax’s founder @maggon took part in SUGCON 2019, recently held in Bengaluru, India. You can go through the slides for “World class Solr power in 30 minutes”, which may answer all your questions.

Coveo for Sitecore Hive – Understanding Placeholders

Currently, I am working on building search capabilities using Coveo. I am really impressed with the tool and the capabilities it offers while keeping things simple and close to the Sitecore context.

Why Coveo components are not getting displayed?

After installing and understanding Coveo (Coveo for Sitecore 90 4.1.729.23 with Sitecore 9.0.2), I started building a sample search page. The Coveo team has done a great job with the on-demand recordings of its labs.

While building the search page, I added the required components in the hierarchy below:

  • Body-main (Custom main container placeholder): Coveo search interface
  • UI Content: Modular Frame
  • UI Header: Coveo For Sitecore Analytics
  • Main section: Results Section
  • Results List: Coveo Results List
  • Body-bottom: Coveo Search Resources
  • UI Results Footer: Coveo Pager
  • Results Header (using results header extender): Custom rich text component
  • Main Section: Facets Section
  • Results Header: Results Header Section
  • Results Header Section: Results Sorts Section
  • Results Header: Coveo Breadcrumbs
  • Sorts: Coveo Relevancy & Field sorts

Some of the Coveo components require a datasource when you add them to the page, so you’re prompted to create a new datasource or select an existing one. I left most of the properties unchanged, as I was just playing around and wasn’t configuring them for actual requirements.

Each datasource has a DOM unique Id field with a value in the format coveo8c7b58dc. As the name suggests, whatever Id we specify here is used to uniquely identify that particular component in the HTML.

I finished adding the components and the required datasources, and everything was working fine. Results and the other components were displayed on the page until I changed the DOM unique Id of my search interface component.

What is the issue?

  • First, I checked the browser’s console for JavaScript errors or errors from requests to the Coveo REST endpoint. There wasn’t a single error in the console. All the Coveo resources were loaded successfully, and I could see the Coveo REST endpoint request and the results it returned, but nothing was displayed on the page.
  • After ruling out the other possibilities, I checked the presentation details of the search page and found that the DOM unique Id of the Coveo Search Interface component was used as part of the placeholder key. Renaming the unique Id later didn’t change the placeholder key, so none of the components added inside the search interface component were shown, because the unique Id and the placeholder key no longer matched.
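
The mismatch can be reproduced with a small simulation. It assumes the key shape seen in the presentation details, <parent>/<group>_dynamic_<uniqueId>, and both function names are hypothetical; this is not Coveo’s actual implementation.

```python
from __future__ import annotations


def coveo_placeholder_key(parent_path: str, group: str, unique_id: str) -> str:
    """Mimic the observed key shape: <parent>/<group>_dynamic_<uniqueId>."""
    return f"{parent_path}/{group}_dynamic_{unique_id}"


def rendered_children(children: dict[str, str], exposed_key: str) -> list[str]:
    """A child renders only if its stored placeholder key equals the exposed key."""
    return [name for name, key in children.items() if key == exposed_key]


# Key stored on the child rendering while the interface had its generated Id.
old_key = coveo_placeholder_key("/body-main", "coveo-ui-content", "coveo8c7b58dc")
children = {"Modular Frame": old_key}

# Key the search interface exposes after its DOM unique Id is renamed.
new_key = coveo_placeholder_key("/body-main", "coveo-ui-content", "coveo-global-search-interface")

print(rendered_children(children, old_key))  # before the rename: child is found
print(rendered_children(children, new_key))  # after the rename: nothing matches

# Updating the stored placeholder keys to the new Id brings the child back.
children = {name: new_key for name in children}
print(rendered_children(children, new_key))
```

No error is raised at any point, which matches the symptom: the page simply has nothing to render in a placeholder whose key no component refers to.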

How to solve it?

I wanted to keep a meaningful unique identifier for the search interface component. So, I kept that Id and changed the placeholder key of the nested components to match it, and everything started displaying on the page again.

Important: Whenever you add a Coveo component that requires a datasource, specify a meaningful name in DOM unique Id at the time of adding the component itself, before you start adding more components inside it, to avoid issues later.

Some more information about Coveo Placeholders

  • The dynamic placeholder key generated for Coveo components is different from the normal dynamic placeholder keys of Sitecore 9. See the difference:
    • Sitecore 9 default: /body-bottom/footer-container/footer-links-{C00378EF-EA1F-4090-A640-F8BC10403476}-0
    • Coveo: /body-main/coveo-ui-content_dynamic_coveo-global-search-interface/coveo-ui-main-section/coveo-ui-results_dynamic_coveo45196b99
  • I first checked the mvc.getDynamicPlaceholderKeys pipeline to see whether Coveo had added a custom processor that generates the dynamic placeholder key from that unique Id. I strongly believed it would be there, but I didn’t find anything.
  • I then looked at one of the views that ships with the Coveo installation, /Views/Coveo Hive/Sections/Results Section.cshtml, and found that Coveo uses its own extension method for placeholders: CoveoDynamicPlaceholder.
  • It’s defined in Coveo.UI.Components.Extensions.SitecoreHelperExtensions.
  • There are two overloads of the CoveoDynamicPlaceholder method.
  • The first method accepts a GroupName as well as a UniqueId. For all components that require a datasource, the DOM unique Id from the datasource is used as the UniqueId. When I searched for CoveoDynamicPlaceholder in all the Coveo Hive views, I found two usages of the above method:
    • Coveo Search Interface Component
      • @Html.Sitecore().CoveoDynamicPlaceholder("coveo-ui-content", @Model.Id)
    • Placeholder Section
      • @Html.Sitecore().CoveoDynamicPlaceholder(@Model.PlaceholderKey)
    • So, do not forget to give a meaningful name to the DOM unique Id/Placeholder Key field when you add either of these two components.
  • The rest of the components use the second method, where we do not specify the UniqueId ourselves; the UniqueId of the rendering is used instead.

Implement a great search experience using Sitecore and Coveo. Stay tuned for more posts on this.