How to: Exclude a Page from Sitefinity Internal Search

by Veselin Vasilev Posted on June 20, 2013

As you probably know, Sitefinity CMS makes it easy to prevent external search engines (such as Googlebot) from indexing a page: just uncheck the page's "Allow search engines to index this page" property. However, the page will still be indexed by Sitefinity's internal search engine and will appear in the search results on your website.

Use the steps below to gain more control over which pages Sitefinity indexes automatically.

1. In Visual Studio, create a class that inherits from the PageInboundPipe class in the Telerik.Sitefinity.Publishing.Pipes namespace, and override its LoadPageNodes and CanProcessItem methods:

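using System.Collections.Generic;
using System.Linq;
using Telerik.Sitefinity.Pages.Model;
using Telerik.Sitefinity.Publishing.Pipes;

// Page pipe that keeps pages out of the internal search index when
// "Allow search engines to index this page" is unchecked
public class PagePipeNoIndex : PageInboundPipe
{
    // Run the pages the pipe loads in bulk through the same rules as CanProcessItem
    protected override IEnumerable<PageNode> LoadPageNodes()
    {
        return base.LoadPageNodes().Where(n => this.CanProcessItem(n));
    }

    public override bool CanProcessItem(object item)
    {
        if (item == null)
            return false;

        var pageData = item as PageData;
        if (pageData != null)
        {
            // Skip backend pages and pages marked as not crawlable
            if (pageData.NavigationNode.IsBackend || !pageData.Crawlable)
                return false;
        }

        var pageNode = item as PageNode;
        if (pageNode != null)
        {
            if (pageNode.IsBackend)
                return false;

            // Only standard and external nodes are indexed (group pages etc. are skipped).
            // The null check on Page is a defensive guard before reading Crawlable.
            if ((pageNode.NodeType != NodeType.Standard && pageNode.NodeType != NodeType.External)
                || pageNode.Page == null || !pageNode.Page.Crawlable)
                return false;
        }

        return base.CanProcessItem(item);
    }
}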

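CanProcessItem is invoked every time Sitefinity needs to update the search index for pages (e.g. when a new page is created or an existing page is updated). It checks the page's Crawlable property, which corresponds to the "Allow search engines to index this page" checkbox, and does not add the page to the index if the checkbox is unchecked. It also skips backend pages and any node that is not a standard or external page (for example, group pages). The LoadPageNodes override runs the pages that the pipe loads in bulk through the same check.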
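To make the decision order easier to see outside of Sitefinity, here is a minimal, self-contained sketch of the same filtering rule. SketchPage, SketchNodeType and IndexFilterSketch are hypothetical stand-ins invented for this illustration, not Telerik types, and the final call to base.CanProcessItem (which may reject further items) is deliberately left out.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for Sitefinity's page model; NOT the real Telerik types.
public enum SketchNodeType { Standard, Group, External }

public sealed class SketchPage
{
    public string Title { get; set; }
    public bool IsBackend { get; set; }
    public SketchNodeType NodeType { get; set; }
    // Mirrors the "Allow search engines to index this page" checkbox
    public bool Crawlable { get; set; }
}

public static class IndexFilterSketch
{
    // Same checks, in the same order, as PagePipeNoIndex.CanProcessItem for a page node
    // (minus the base.CanProcessItem call)
    public static bool ShouldIndex(SketchPage page)
    {
        if (page == null || page.IsBackend)
            return false;
        if (page.NodeType != SketchNodeType.Standard && page.NodeType != SketchNodeType.External)
            return false;
        return page.Crawlable;
    }

    // Same idea as the LoadPageNodes override: filter a bulk-loaded set of pages
    public static List<string> IndexableTitles(IEnumerable<SketchPage> pages)
    {
        return pages.Where(ShouldIndex).Select(p => p.Title).ToList();
    }
}
```

For example, given a crawlable standard page, a standard page with the checkbox unchecked, a group page and a backend page, only the first one survives the filter.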

2. Replace the built-in page pipe with the custom pipe from step 1. This is done in the Global.asax.cs file as follows:

using System;
using Telerik.Sitefinity.Abstractions;
using Telerik.Sitefinity.Publishing;
using Telerik.Sitefinity.Publishing.Pipes;

public class Global : System.Web.HttpApplication
{
    protected void Application_Start(object sender, EventArgs e)
    {
        // Wait until Sitefinity has finished bootstrapping before touching the publishing system
        Bootstrapper.Initialized += Bootstrapper_Initialized;
    }

    void Bootstrapper_Initialized(object sender, Telerik.Sitefinity.Data.ExecutedEventArgs e)
    {
        if (e.CommandName == "Bootstrapped")
        {
            ReplacePagePipeWithCustomPagePipe();
        }
    }

    private void ReplacePagePipeWithCustomPagePipe()
    {
        // Remove the default page pipe
        PublishingSystemFactory.UnregisterPipe(PageInboundPipe.PipeName);

        // Register PagePipeNoIndex under the original page pipe name, so that
        // whenever the publishing system asks for the page pipe, it gets the new one
        PublishingSystemFactory.RegisterPipe(PageInboundPipe.PipeName, typeof(PagePipeNoIndex));
    }
...
}

That's it. Build the project, and from now on, when you uncheck the "Allow search engines to index this page" checkbox, the page will be excluded from both external search engines and Sitefinity's internal search.
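The swap works because the rest of the publishing system looks the page pipe up by name. The sketch below is a simplified, hypothetical model of such a name-keyed registry: PipeRegistrySketch, DefaultPagePipeStub and PagePipeNoIndexStub are invented for illustration and are not Sitefinity's PublishingSystemFactory, whose internals may differ.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified pipe registry keyed by name. NOT Sitefinity's
// PublishingSystemFactory; it only illustrates why re-registering a type
// under the original name changes what every consumer of that name receives.
public static class PipeRegistrySketch
{
    private static readonly Dictionary<string, Type> Pipes = new Dictionary<string, Type>();

    public static void RegisterPipe(string name, Type pipeType)
    {
        Pipes[name] = pipeType;
    }

    public static void UnregisterPipe(string name)
    {
        Pipes.Remove(name);
    }

    // Consumers only know the name, so they get whatever type is currently registered under it
    public static Type ResolvePipe(string name)
    {
        Type pipeType;
        return Pipes.TryGetValue(name, out pipeType) ? pipeType : null;
    }
}

// Placeholder types standing in for the default and the custom page pipe
public class DefaultPagePipeStub { }
public class PagePipeNoIndexStub : DefaultPagePipeStub { }
```

Unregistering the default type and registering the custom one under the same name leaves callers untouched: they keep asking for the same name and simply receive the new type.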

To learn more about the publishing system in Sitefinity CMS, please check this blog post or the online documentation.
