Friday, 27 September 2013

Search vs. Recursive Looping: Getting a List of Sites (SPWeb's) a User Has Access to in a SharePoint Site Collection

Introduction

A question about returning all the sites (SPWeb's) "the current user" has access to in a given site collection comes up regularly in the TechNet SharePoint forums. The question usually asked is, "is there a method that returns all the sub-webs" of a site collection that a user has access to, or do we need to recursively loop through each web in the site collection, checking if the user has a specific permission to view the web?

The answer is regularly that you need to loop through the collection of webs (recursively), to determine the list of webs the user has access to.

Depending on the size of a site collection, this can be a very expensive and time consuming operation.

There is another way to achieve this requirement, using Search. This article explores using Search to generate a list of webs a user has access to, examines the performance differences between Search and Looping through collections, as well as some potential pros and cons.

Creating a Webpart to Test the Performance of Both Methods.

To compare the difference in performance and the results produced from each method, we are going to create a test webpart. The webpart is very simple, containing two main methods. One method is used for generating the list of webs by looping (calling SPWeb.GetSubwebsForCurrentUser() on each web), and the other method is using the SharePoint Search infrastructure, via the KeywordSearch class. Each of these methods is wrapped in an SPMonitoredScope block, enabling the performance of the each method to be tracked. The results can be seen in the Developer Dashboard.

The method that uses SPWeb.GetSubwebsForCurrentUser() starts at the root web for the site collection, an traverse down, calling GetSubwebsForCurrentUser() on each child web of the current web, until it finishes enumerating all the webs the current user has access to.

The search query used in the search method, queries the search engine for "ALL sites AND webs WHERE the webapplication hostname STARTS WITH the current sites hostname". You can test out the results of this search query using the SharePoint UI, via a standard Enterprise Search site. The search command would look something similar to this, if you were searching for all sub-sites you had access to on the http://corporate site collection:

(contentclass:STS_SITE OR contentclass:STS_Web) AND sitename:http://corporate

WebPart Code for Testing the Performance of Both Methods


using System;
using System.Collections;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Text;
using System.Web.UI.WebControls;
using System.Web.UI.WebControls.WebParts;
using Microsoft.Office.Server.Search.Administration;
using Microsoft.Office.Server.Search.Query;
using Microsoft.SharePoint;
using Microsoft.SharePoint.Utilities;

namespace SearchVerseLoop.GetTheSitesIHavePermissionsToSee
{
    [ToolboxItemAttribute(false)]
    public class GetTheSitesIHavePermissionsToSee : WebPart
    {
        private Label _sitesFromSearch;
        private Label _sitesFromLooping;

        protected override void CreateChildControls()
        {
            _sitesFromSearch = new Label();
            _sitesFromLooping = new Label();
            Controls.Add(_sitesFromSearch);
            Controls.Add(_sitesFromLooping);
        }

        protected override void OnPreRender(EventArgs e)
        {
            base.OnPreRender(e);
            _sitesFromLooping.Text = GetAllWebs();
        }

        private String GetAllWebs()
        {
            try
            {
                var output = new StringBuilder();
                var websFromLooping = new ArrayList();
                using (new SPMonitoredScope("using a loop"))
                {
                    GetListOfWebs(SPContext.Current.Site.RootWeb, SPContext.Current.Site.RootWeb.GetSubwebsForCurrentUser(), websFromLooping);
                    output.Append(String.Format("<p>There are {0} webs I have access to (retrieved from looping through the rootwebs sub-webs)</p>", websFromLooping.Count));
                    foreach (var web in websFromLooping)
                    {
                        output.Append(String.Format("<span>{0}</span><br/>", web));
                    }
                }

                var websFromSearching = new ArrayList();
                using (new SPMonitoredScope("using search"))
                {
                    GetListOfWebsFromSearch(websFromSearching);
                    output.Append(String.Format("<p>There are {0} webs I have access to (retrieved from search, filtering on the current site)</p>", websFromSearching.Count));
                    foreach (var web in websFromSearching)
                    {
                        output.Append(String.Format("<span>{0}</span><br/>", web));
                    }
                }

                return output.ToString();
            }
            catch (Exception e)
            {
                return e.Message;
            }
        }

        private void GetListOfWebs(SPWeb currentWeb, IEnumerable<SPWeb> webCollection, ArrayList webs)
        {
            webs.Add(currentWeb.Url);
            foreach (SPWeb web in webCollection)
            {
                if (web.GetSubwebsForCurrentUser().Count > 0)
                {
                    GetListOfWebs(web, web.GetSubwebsForCurrentUser(), webs);
                }
                else
                {
                    webs.Add(web.Url);
                }
            }
        }

        private void GetListOfWebsFromSearch(ArrayList webs)
        {
            var ssaProxy = (SearchServiceApplicationProxy)SearchServiceApplicationProxy.GetProxy(SPServiceContext.GetContext(SPContext.Current.Site));
            var keywordQuery = new KeywordQuery(ssaProxy)
                {
                    RowLimit = 500,
                    TrimDuplicates = true,
                    ResultsProvider = SearchProvider.Default
                };
            keywordQuery.SelectProperties.Clear();
            keywordQuery.SelectProperties.Add("Path");
            keywordQuery.ResultTypes |= ResultType.RelevantResults;
            keywordQuery.QueryText = String.Format("(contentclass:STS_SITE OR contentclass:STS_Web) AND sitename:{0}", SPContext.Current.Site.HostName);
            ResultTableCollection searchResults;
            try
            {
                searchResults = keywordQuery.Execute();
            }
            catch (Exception)
            {
                //"Your query is malformed. Please rephrase your query."
                return;
            }

            if (!searchResults.Exists(ResultType.RelevantResults)) return;
            var searchResult = searchResults[ResultType.RelevantResults];
            var results = new DataTable { TableName = "SearchResults" };
            results.Load(searchResult, LoadOption.OverwriteChanges);
            foreach (DataRow dataRow in results.Rows)
            {
                webs.Add(dataRow["Path"]);
            }
        }
    }
}


Using SPMonitoredScope

In the code above we have two main functions that get called during the PreRender event. Both of these functions are wrapped in an SPMonitoredScope, which will enable us to track performance information about each method, namely the time each method takes to produce the list webs the current user has access to.

Using SPMonitoredScope also allows us to see other valuable information, such as the number and type of SQL calls, and expensive object allocations, like SPRequest allocations.

Turning on the Developer Dashboard with PowerShell

To see the results, we need to enable the Developer Dashboard. There is no user interface in SharePoint for enabling the Developer Dashboard, but thankfully, it's easily enabled using PowerShell.

To enable the Developer Dashboard, logon to your (test) SharePoint server, and open the SharePoint Management Shell.

Execute the following commands to enable the Developer Dashboard.
$ds = [Microsoft.SharePoint.Administration.SPWebService]::ContentService.DeveloperDashboardSettings;
$ds.DisplayLevel = 'On';
$ds.Update();


Testing the Example Webpart

To test the performance differences, we will run the following tests.
1. User A, with the webpart on a site collection with 9 sites (all webs are indexed)
2. User A, with the webpart on a site collection with 54 sites (some webs are NOT indexed)
3. User B, who has more restricted permissions than User A, with the webpart on a site collection with 54 sites (some webs are NOT indexed)

Each test will be run three times (by refreshing the page), the results (time taken for each method) will be aggregate to produce an average time.

Test 1 User A, on a site collection with 9 sites (all webs are indexed):

The output from the webpart shows both methods return the same number of sites.


This image shows part of the Developer Dashboard output. Using it, we can see the time taken for various parts of the page to load. The highlighted section shows the time taken to execute the two methods we wrapped in the SPMonitoredScope blocks.
You can see immediately that the search method is much faster, even on a small site collection.


The results from the first page refresh.


The results from the second page refresh.


The results from the third page refresh.



From the five screen shots above, we can see that both methods returned the same number of sites, and the differences in the time taken by each method.

Average time for the Looping Method to generate the result set: 84.3ms (93.68, 90.96, 68.20)
Average time for the Search Method to generate the result set: 23.37ms (25.62, 22.04, 22.45)

From this test, we can already see that using search is considerably faster, even though we are dealing with a small site collection.

Test 2 User A, with the Webpart on a Site Collection with 54 Sites (some Webs are not Indexed):

From the output of the webpart you can see, there is a difference in the number of sites returned. The loop method returns 57 webs, but the search method only returns 54 webs.

The difference in the search results is mainly down to a setting on an SPWeb that controls if the site is included in the search index. That setting, "Allow this site to appear in search results?", is set via the site settings page of a site (SPWeb). For example: The "search center" (http://sneakpreview/searchcenter) is not returned in the result set, as this site is excluded from appearing in search results.

This is one of the caveats of using the search method, and could be seen as either a dis-benefit, or a benefit.


Looking at the Developer Dashboard for this test, we can see that the Search method clearly out-performs the loop based method. Test 2 has approximately 6 times the number of sites to retrieve. Using search, the query takes about 3 times longer than it did in the first example. Using looping, the query takes nearly 10 times longer than it did in the first test. Ouch! 


The results from the first page refresh.


 The results from the second page refresh.


The results from the third page refresh.



From the five screen shots above, we can see that both methods returned approximately the same number of sites, and the different time taken by each method. Looping returns all 57 sites the user has access to, while the Search method returns 54 sites (because some sites are excluded from the Search Index).

Average time for the Looping Method to generate the result set: 803.8ms (806.64, 806.96, 786.25, 815.11)
Average time for the Search Method to generate the result set: 64ms (69.83, 67.64, 60.80, 57.91)

In this case, where we are searching a slightly larger site collection, the differences in performance are very noticeable!

Test 3 User B, who has more Restricted Permissions than User A, with the Webpart on a Site Collection with 54 sites (some Webs are not Indexed):

In this test, we focus on some other performance statistics that are highlighted by the Developer Dashboard.

We can see that this user has access to fewer sites than the user used in the previous test (47 sites, as compared with 57 sites for the user in Test 2). While the results are similar to Test 2 (the Search method returns 3 fewer results than the looping method, and the performance time statistics are similar), we want to look at what else is going on behind the scenes.


If we have a closer look at the Developer Dashboard's output, under the Database Queries, we can that the Looping method (calling SPWeb.GetSubwebsForCurrentUser()) makes two calls to the SQL (proc_ListChildWebsFiltered, and proc_GetTpWebMetaDataAndListMetaData) database for each Web that is checked.


Further down the Developer Dashboard page, we have the SPRequest Allocations listed. Here we can see that the Looping method (which calls SPWeb.GetSubwebsForCurrentUser()) creates an SPRequest allocation for each web that is checked.



Test Summary

From the three tests above, it's clear that the Search method out performs the Looping method, and uses less resources in doing so. This makes the search method more scale-able, both in terms of simultaneous users loading the page, and in terms of how large the site collection can be.

The caveat to the Searching method is that the result set might not include all of the sites a user has access to, if one or more sites has been excluded from the Search Index. This may or may not be a problem, depending on why the sites have been excluded from the search index.

The looping method puts more load on the SharePoint infrastructure, and performance issues are bound to occur as the number of users using the code (or webpart) increases and/or the number of sites in a site collection increases.

Quick Summary of Pros and Cons

Search Pros


  • It's fast
  • It can handle a very large site collection, returning results very quickly

Search Cons


  • If a site has the "Allow this site to appear in search results?" set to No, then true to form, the site won't be returned in the search results. This could be a pro (in some scenarios) or a con.
  • There are limited properties that can returned about an SPWeb object using Search. If you need to query additional properties, for example a property from the SPWeb.Properties collection, you would need to use the looping method.

Loop (iteratively calling GetSubwebsForCurrentUser) Pros


  • You can query additional properties of each SPWeb object as you parse the  collection of webs the user has access to. For example, you could query a custom property from the SPWeb.Properties collection.

Loop (iteratively calling GetSubwebsForCurrentUser) Cons


  • As the number of webs in a site collection increases, the performance becomes a big issue, causing the page to load slower.
  • Make calls to SPWeb.GetSubwebsForCurrentUser() increases load on the SQL server. This could cause a performance problem (albeit, depending on the size of your environment, number of webs in the site collection and the frequency in which the code is called).
  • Creates a lot of SPRequest allocations.


See Also

KeywordQuery
SPWeb.GetSubwebsForCurrentUser()
Using the Developer Dashboard
Using SPMonitoredScope
SPMonitoredScope