How to: Exclude a Page from Sitefinity Internal Search

June 20, 2013 Digital Experience, Sitefinity
As you probably know, Sitefinity CMS makes it easy to keep external search crawlers (such as Googlebot) from indexing a page: just uncheck the page's "Allow search engines to index this page" property. However, the page is still indexed by the internal Sitefinity search engine and still appears in the search results on your website.

Use the steps below to control which pages Sitefinity indexes automatically.

1. In Visual Studio, create a class that inherits from the PageInboundPipe class in the Telerik.Sitefinity.Publishing.Pipes namespace. Override its LoadPageNodes and CanProcessItem methods:

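// Namespaces below are assumed from the Sitefinity API; adjust them if your version differs.
using System.Collections.Generic;
using System.Linq;
using Telerik.Sitefinity.Pages.Model;
using Telerik.Sitefinity.Publishing.Pipes;

public class PagePipeNoIndex : PageInboundPipe
{
    // Load only the page nodes that pass the CanProcessItem check below.
    protected override IEnumerable<PageNode> LoadPageNodes()
    {
        return base.LoadPageNodes().Where(n => this.CanProcessItem(n));
    }

    public override bool CanProcessItem(object item)
    {
        if (item == null)
            return false;

        var pageData = item as PageData;
        if (pageData != null)
        {
            // Skip backend pages.
            if (pageData.NavigationNode.IsBackend)
                return false;

            // Skip pages with "Allow search engines to index this page" unchecked.
            if (!pageData.Crawlable)
                return false;
        }

        var pageNode = item as PageNode;
        if (pageNode != null)
        {
            // Skip backend pages.
            if (pageNode.IsBackend)
                return false;

            // Only Standard and External nodes are indexed, and only when they are crawlable.
            bool indexableType = pageNode.NodeType == NodeType.Standard || pageNode.NodeType == NodeType.External;
            if (!indexableType || !pageNode.Page.Crawlable)
                return false;
        }

        // Let the default pipe apply its own checks.
        return base.CanProcessItem(item);
    }
}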

The pipe is invoked every time Sitefinity needs to update its page search index (for example, when a page is created or updated). CanProcessItem checks the Crawlable property, which reflects the state of the "Allow search engines to index this page" checkbox, and keeps the item out of the index when the checkbox is unchecked.
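To see the filtering rule in isolation, here is a minimal sketch that runs without Sitefinity. The FakePage and FakeNodeType types and the CrawlableFilter helper are hypothetical stand-ins, not Sitefinity APIs. They only mirror the checks made for a PageNode: skip backend pages, skip node types other than Standard and External, and skip pages whose Crawlable flag is off.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for Sitefinity's NodeType and PageNode (illustration only).
public enum FakeNodeType { Standard, External, Group }

public class FakePage
{
    public string Title { get; set; }
    public bool IsBackend { get; set; }
    public FakeNodeType NodeType { get; set; }
    public bool Crawlable { get; set; }
}

public static class CrawlableFilter
{
    // Mirrors the PageNode branch of CanProcessItem (without the call to the base class).
    public static bool ShouldIndex(FakePage page)
    {
        if (page == null || page.IsBackend)
            return false;

        bool indexableType = page.NodeType == FakeNodeType.Standard
                          || page.NodeType == FakeNodeType.External;

        return indexableType && page.Crawlable;
    }

    // Mirrors LoadPageNodes: keep only the pages that pass the check.
    public static List<string> IndexedTitles(IEnumerable<FakePage> pages)
    {
        return pages.Where(ShouldIndex).Select(p => p.Title).ToList();
    }
}
```

Passing a frontend Standard page with Crawlable set to true and another with Crawlable set to false to CrawlableFilter.IndexedTitles returns only the first page's title. The real pipe additionally defers to base.CanProcessItem, which this sketch leaves out.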

2. Replace the built-in page pipe with the custom pipe from step 1. This is done in the Global.asax.cs file as follows:

// Namespaces below are assumed from the Sitefinity API; adjust them if your version differs.
using System;
using Telerik.Sitefinity.Abstractions;
using Telerik.Sitefinity.Publishing;
using Telerik.Sitefinity.Publishing.Pipes;

public class Global : System.Web.HttpApplication
{
    protected void Application_Start(object sender, EventArgs e)
    {
        Bootstrapper.Initialized += Bootstrapper_Initialized;
    }

    void Bootstrapper_Initialized(object sender, Telerik.Sitefinity.Data.ExecutedEventArgs e)
    {
        // Swap the pipes only once Sitefinity has finished bootstrapping.
        if (e.CommandName == "Bootstrapped")
        {
            ReplacePagePipeWithCustomPagePipe();
        }
    }

    private void ReplacePagePipeWithCustomPagePipe()
    {
        // Remove the default page pipe.
        PublishingSystemFactory.UnregisterPipe(PageInboundPipe.PipeName);

        // Register PagePipeNoIndex under the original page pipe name,
        // so when the publishing system tries to use the page pipe, it uses the new one.
        PublishingSystemFactory.RegisterPipe(PageInboundPipe.PipeName, typeof(PagePipeNoIndex));
    }
...
}
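Why the custom pipe is registered under the original pipe name can be illustrated with a toy model. The sketch below is an assumption made for illustration: ToyPipeRegistry, DefaultPipe and CustomPipe are hypothetical and say nothing about how PublishingSystemFactory is actually implemented. It only shows the general idea that callers look a pipe up by name, so whichever type is registered under that name is the one they get.

```csharp
using System;
using System.Collections.Generic;

// Toy name-keyed registry (hypothetical; not Sitefinity's PublishingSystemFactory).
public static class ToyPipeRegistry
{
    private static readonly Dictionary<string, Type> pipes = new Dictionary<string, Type>();

    // Associates a pipe type with a name. In this toy model a name can be
    // registered only once, so an existing entry must be removed first.
    public static void RegisterPipe(string name, Type pipeType)
    {
        pipes.Add(name, pipeType);
    }

    // Removes the entry for the given name, if any.
    public static void UnregisterPipe(string name)
    {
        pipes.Remove(name);
    }

    // Callers ask for a pipe by name and never see which type was swapped in.
    public static Type Resolve(string name)
    {
        return pipes[name];
    }
}

// Hypothetical pipe types standing in for the default and the custom pipe.
public class DefaultPipe { }
public class CustomPipe : DefaultPipe { }
```

In this model, registering DefaultPipe under a name, unregistering that name, and then registering CustomPipe under the same name makes ToyPipeRegistry.Resolve return the CustomPipe type, while code that resolves by name stays unchanged.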

That's it. Build the project, and from now on, unchecking the "Allow search engines to index this page" checkbox hides the page from both external search crawlers and the internal Sitefinity search.

To learn more about the Publishing system in Sitefinity CMS, please check this blog post or the online documentation.

Veselin Vasilev