Back to Cleverworkarounds mainpage
 

Jun 28 2013

A lesser known way to fine-tune SharePoint search precision…

Send to Kindle

Hi all

While I’d like to claim credit for the wisdom in this post, alas I cannot. One of Seven Sigma’s consultants (Daniel Wale) worked this one out and I thought that it was blog-worthy. Before I get into the issue and Daniel’s resolution, let me give you a bit of search engine theory 101 with a concept that I find is useful to help understand search optimisation.

Precision vs. recall

Each time a person searches for information, there is an underlying goal or intended outcome. While there has been considerable study of information seeking behaviours in academia and beyond, they boil down to three archetype scenarios.

  1. “I know exactly what I am looking for” – The user has a particular place in mind, either because they visited it in the past or because they assume it exists. This known as known item seeking, but is also referred to as navigational seeking or refinding.
  2. “I’m not sure what I am looking for but I’ll know it when I find it” – This is known as exploratory seeking and the purpose is to find information assumed to be available. This is characterised by
    • - Looking for more than one answer
    • - No expectation of a “right” answer
    • - Open ended
    • - Not necessarily knowing much about what is being looking for
    • - Not being able to articulate what is being looked for
  3. “Gimme gimme gimme!” – A detailed research type search known as exhaustive seeking, leaving no stone unturned in topic exploration. This is characterised by;
    • - Performing multiple searches
    • - Expressing what is being looked for in many ways

Now among other things, each of these scenarios would require different search results to meet the information seeking need. For example: If you know what you are looking for, then you would likely prefer a small, highly accurate set of search results that has the desired result at the top of the list. Conversely if you are performing an exploratory or exhaustive search, you would likely prefer a greater number of results since any of them are potentially relevant to you.

In information retrieval, the terms precision and recall are used to measure search efficiency. Google’s Tim Bray put it well when he said “recall measures how well a search system finds what you want and precision measures how well it weeds out what you do not want”. Sometimes recall is just what the doctor ordered, whereas other times, precision is preferred.

The scenario and the issue…

That said, recently, Seven Sigma worked on a knowledgebase project for a large customer contact centre. The vast majority of the users of the system are customer centre operators who deal directly with all customer enquiries and have worked there for a long time. Thus most of the search behaviours are in the known item seeking category as they know the content pretty well – it is just that there is a lot of it. Additionally, picture yourself as one of those operators and then imagine the frustration a failed or time consuming search with an equally frustrated customer on the end of the phone and a growing queue of frustrated callers waiting their turn. In this scenario, search results need to be as precise as possible.

Thus, we invested a lot of time in the search and navigation experience on this project and that investment paid off as the users were very happy with the new system and particularly happy with the search experience. Additionally, we created a mega menu solution to the current navigation that dynamically builds links from knowledgebase article metadata and a managed metadata term set. This was done via the data view web part, XSLT, JavaScript and Marc’s brilliant SPServices. We were very happy with it because there was no server side code at all, yet it was very easy to administer.

So what was the search related issue? In a nutshell, we forgot that the search crawler doesn’t differentiate between your pages content and items in your custom navigation. As a result, we had an issue where searches did not have adequate precision.

To explain the problem, and the resolution, I’ll take a step back and let Daniel continue the story… Take it away Dan…

The knowledgebase that Paul described above contained thousands of articles, and when the search crawler accessed each article page, it also saw the titles of many other articles in the dynamic menu code embedded in the page. As a result, this content also got indexed. When you think about it, the search crawler can’t tell whether content is real content versus when it is a dynamic menu that drops down/slides out when you hover over the menu entry point. The result was that when users searched for any term that appeared in the mega menu, they would get back thousands of results (a match for every page) even when the “actual content” of the page doesn’t contain any references to the searched term.

There is a simple solution however, for controlling what the SharePoint search crawler indexes and what it ignores. SharePoint knows to exclude content that exists inside of <div> HTML tags that have the class noindex added to them. Eg

<div class=”menu noindex> 
  <ul> 
    <li>Article 1</li> 
    <li>Article 2</li> 
  </ul> 
</div>

There is one really important thing to note however. If your <div class=”noindex”> contains a nested <div> tag that doesn’t contain the noindex class, everything inside of this inner <div> tag will be included by the crawler. For example:

<div class=”menu noindex> 
  <ul> 
    <li>Article 1</li> 

      <div class=”submenu>
        <ul>
          <li>Article 1.1</li>
          <li>Article 1.2</li>
        </ul>
      </div>

    <li>Article 2</li> 
  </ul> 
</div>

In the code above the nested <div> to surround the submenu items does not contain the noindex class. So the text “Article 1.1” and “Article 1.2” will be crawled, while the “Article 1” and “Article 2” text in the parent <div> will still be excluded.

Obviously the example above its greatly simplified and like our solution, your menu is possibly making use of a DataViewWebPart with an XSL transform building it out. It’s inside your XSL where you’ll need to include the <div> with the noindex class because the Web Part will generate its own <div> tags that will encapsulate your menu. (Use the browser Developer Tools and inspect the code that it inserts if you aren’t familiar with the code generated, you’ll find at least one <div> elements that is nested inside any <div class=”noindex”> you put around your web part thinking you were going to stop the custom menu being crawled).

Initially looking around for why our search results were being littered with so many results that seemed irrelevant, I found the way to exclude the custom menu using this method rather easily, I also found a lot of forum posts of people having the same issue but reporting that their use of <div> tags with the noindex class was not working. Some of these posts people had included snippets of their code, each time they had nested <div> tags and were baffled by why their code wasn’t working. I figured most people were having this problem because they simply don’t read the detail in the solutions about the nesting or simply don’t understand that the web part will generate its own HTML into their page and quite likely insert a <div> that surrounds the content they are wanting to hide. As any SharePoint developer quickly finds out a lot of knowledge in SharePoint won’t come from well set out documentation library with lots of code examples that developers get used to with other environments, you need to read blogs (like this one), read forums, talk to colleagues and just build up your own experience until these kinds of gotchas are just known to you. Even the best SharePoint developer can overlook simple things like this and by figuring them out they get that little bit better each time.

Being a SharePoint developer is really about being the master of self-learning, the master of using a search engine to find the knowledge you need and most importantly the master of knowing which information you’re reading is actually going to be helpful and what is going to lead you down the garden path. The MSDN blog post by Mark Arend (http://blogs.msdn.com/b/markarend/archive/2010/06/07/control-search-indexing-crawling-within-a-page-with-noindex.aspx) gives a clear description of the problem and the solution, he also states that it is by design that nested <div> tags are re-evaluated for the noindex class. He also mentions the product team was considering changing this…  did this create the confusion for people or was it that they read the first part of the solution and didn’t read the note about nested <div> tags? In any case it’s a vital bit of the solution that it seems a lot of people overlook still.

In case you are wondering, the built in SharePoint navigation menu’s already have the correct <div> tags with the noindex class surrounding them so they aren’t any concern. This problem only exists if you have inserted your own dynamic menu system.

Other Search Provider Considerations

It is more common that you think that some sites do not just use SharePoint Search. The <div class=”noindex”> is a SharePoint specific filter for excluding content within a page, what if you have a Google Search Appliance crawling your site as well? (Yep… we did in this project)

You’re in luck, the Google documents how to exclude content within a page from their search appliance. There are a few different options but the equivalent blanket ignore of the contents between the <div class=”noindex”> tags would be to encapsulate the section between the following two comments

<!–googleoff: all–>

and

<!–googleon: all–>

If you want to know more about the GSA googleoff/googleon tags and the various options you have here is the documentation: http://code.google.com/apis/searchappliance/documentation/46/admin_crawl/Preparing.html#pagepart

Conclusion

(… and Paul returns to the conversation).

I think Dan has highlighted an easy to overlook implication of custom designing not only navigational content, but really any type of dynamically generated content on a page. While the addition of additional content can make a page itself more intuitive and relevant, consider the implication on the search experience. Since the contextual content will be crawled along with the actual content, sometimes you might end up inadvertently sacrificing precision of search results without realising.

Hope this helps and thanks for reading (and thanks Dan for writing this up)

 

Paul Culmsee

www.sevensigma.com.au

h2bp2013

 Digg  Facebook  StumbleUpon  Technorati  Deli.cio.us  Slashdot  Twitter  Sphinn  Mixx  Google  DZone 

No Tags

Send to Kindle


Nov 14 2007

SharePoint Branding Part 7 -The ‘governance’ of it all..

Send to Kindle

Well, here we are! After delving into dark arts where everybody but metrosexual web designers fear to tread (HTML and CSS), we then delved into the areas that metrosexual web designers truly fear to tread (packaging, deployment and even some c# code!). Finally, we get to the area where everybody is interested until it happens to get in their way! (Ooh, I am a cynical old sod tonight).

That is Governance!

Continue reading “SharePoint Branding Part 7 -The ‘governance’ of it all..”

 Digg  Facebook  StumbleUpon  Technorati  Deli.cio.us  Slashdot  Twitter  Sphinn  Mixx  Google  DZone 

No Tags

Send to Kindle


Nov 10 2007

SharePoint Branding Part 6 – A "solution" to all issues?

Send to Kindle

There has been a bit of a gap in this series between part 5 and 6 – and fortunately for the both of us, I think this is the penultimate post in my series on SharePoint branding. While it has been an interesting exercise for me, I must confess each successive article is getting harder to write as my interest is shifting :-)  So many sub-disciplines within MOSS – I think I might delve into WCM soon :-).

Continue reading “SharePoint Branding Part 6 – A "solution" to all issues?”

 Digg  Facebook  StumbleUpon  Technorati  Deli.cio.us  Slashdot  Twitter  Sphinn  Mixx  Google  DZone 

No Tags

Send to Kindle


Oct 27 2007

SharePoint Branding Part 5 – Feature Improvements and Bugs

Send to Kindle

So, here we are at the fifth article in my series on SharePoint branding. By now, we have left all the master page stuff way behind, and we have created a custom feature to install our branding to a server.

To recap for those of you hitting this page first, I suggest you go back and read this series in order.

  • Part 1 dealt with the publishing feature, and some general masterpage/CSS concepts and some quirks (core.css and application.master) that have to be worked around.
  • Part 2 delved into the methods to work around the application.master and core.css issue
  • Part 3 delved further into the methods to work around the application.master and core.css issue and the option that solved a specific problem for me
  • Part 4 then changed tack and introduced how to package up your clever branding

Continue reading “SharePoint Branding Part 5 – Feature Improvements and Bugs”

 Digg  Facebook  StumbleUpon  Technorati  Deli.cio.us  Slashdot  Twitter  Sphinn  Mixx  Google  DZone 

No Tags

Send to Kindle


Oct 25 2007

SharePoint Branding Part 4 – Packaging up your masterpiece into a Feature

Send to Kindle

Welcome to the fourth article in my series on SharePoint branding. Sorry for the time it’s taken to get this one out, but a certain game called “The Legend of Zelda:  Hourglass Phantom” on NDS got in the way. I finished it yesterday and only had to cheat via google once :-). Anyway it’s out of my system now so I can get back to this.

After 3 epic articles on all that painful CSS and master page stuff (part one, two and three), we now focus on what you have to do to get your branding masterpiece deployed to the SharePoint farm the clever way. In this next set of articles, we will look at where things should go, and then how to get it there the right way.

Continue reading “SharePoint Branding Part 4 – Packaging up your masterpiece into a Feature”

 Digg  Facebook  StumbleUpon  Technorati  Deli.cio.us  Slashdot  Twitter  Sphinn  Mixx  Google  DZone 

No Tags

Send to Kindle


Oct 13 2007

SharePoint Branding – How CSS works with master pages – Part 3

Send to Kindle

Welcome to the third article (or is it a manifesto?) in my series on SharePoint branding. In this article, we continue examining methods to incorporate CSS files into master pages for clever branding. In my first article of this topic, I discussed what I think is the main issue with SharePoint branding – APPLICATION.MASTER and CORE.CSS. The previous article to this one, examined 5 methods to deal with the trial and tribulations of APPLICATION.MASTER and CORE.CSS behavior. So, to recap where we got to, let’s re-examine the original scenario and then look at the summary of the 5 different several methods with their relative merits and issues.

The Scenario

Like many organizations, my client had an existing corporate branding standard that was used in a non SharePoint environment and naturally enough, they wanted their SharePoint site to look like this branding. This was for a fully featured intranet/extranet that utilized most of the MOSS2007 features such as

  • Document collaboration
  • Infopath Forms Services
  • Workflow
  • Enterprise Search
  • Excel services
  • Business Data Catalog
  • Custom web parts
  • Event Handlers

It was *not* a public site at all.

Initial investigation soon concluded that we would need a custom master page. DEFAULT.MASTER didn’t quite have the design flexibility that was required. In fact the branding requirements were actually closer to some of the built in master pages such as BLUEGLASS.MASTER, since this was for intranet purposes, particularly collaborative document management, those master pages are unsuitable.

Continue reading “SharePoint Branding – How CSS works with master pages – Part 3″

 Digg  Facebook  StumbleUpon  Technorati  Deli.cio.us  Slashdot  Twitter  Sphinn  Mixx  Google  DZone 

No Tags

Send to Kindle


Oct 11 2007

SharePoint Branding – How CSS works with master pages – Part 2

Send to Kindle

Jeez if I had realised how long it would take to write these damn articles, I probably wouldn’t have started! In my first article of this topic, I discussed the theory behind master pages, the publishing feature, and what I think is the main issue with SharePoint branding – APPLICATION.MASTER and CORE.CSS. In this article I will now list a branding scenario that I had to deal with, and the various options you can use to deal with the challenges of APPLICATION.MASTER and CORE.CSS

The Scenario

Like many organizations, my client had an existing corporate branding standard that was used in a non SharePoint environment and naturally enough, they wanted their SharePoint site to look like this branding.

This was for a fully featured intranet/extranet that utilized most of the MOSS2007 features such as

  • Document collaboration
  • Infopath Forms Services
  • Workflow
  • Enterprise Search
  • Excel services
  • Business Data Catalog
  • Custom web parts
  • Event Handlers

It was *not* a public site at all.

Initial investigation soon concluded that we would need a custom master page. DEFAULT.MASTER didn’t quite have the design flexibility that was required. In fact the branding requirements were actually closer to some of the built in master pages such as BLUEGLASS.MASTER, since this was for intranet purposes, particularly collaborative document management, those master pages are unsuitable. (I will explain why soon).

Continue reading “SharePoint Branding – How CSS works with master pages – Part 2″

 Digg  Facebook  StumbleUpon  Technorati  Deli.cio.us  Slashdot  Twitter  Sphinn  Mixx  Google  DZone 

No Tags

Send to Kindle


Oct 08 2007

SharePoint Branding – How CSS works with master pages – Part 1

Send to Kindle

This is version 2 of this article, after I went and accidentally blew away my first masterpiece that took literally days to write. If this has ever happened to you, don’t you hate it, that your second version is never as good as your first?

Quick Links: [Part 1, Part 2, Part 3, Part 4, Part 5, Part 6, Part 7]

Anyway, this is (attempt 2 of) part 1 of a series of articles that cover SharePoint branding in some detail. Kudos has to be given Heather Solomon especially for her wonderful site and articles on this subject (and author of the book to your left :). In Addition, Andrew Connell and Mike have done some great work that helped me in this area.

So, SharePoint branding was the catalyst behind my deciding to make a blog and call it cleverworkarounds. The whole experience at times made me want to change careers, but I ultimately got there. I would go down one path, only to be stumped by a problem, and think I have the solution, only to find another quirk that needed another workaround. So the aim of cleverworkarounds is to determine the least dodgy way to implement branding of a SharePoint installation. No doubt people will disagree with the conclusions I’ve reached, but that’s expected since the cleverness of a workaround really depends on your needs.

 

So this series of articles will cover a few issues. First some basic master page theory, then I will talk about the difference between branding in WSS3 (the freebie version) and MOSS07 (the pricey one). I will then take you through the quirks of CSS and master pages. Subsequent articles will get into the details of branding techniques and finally finish off by covering the governance issues surrounding branding.

Continue reading “SharePoint Branding – How CSS works with master pages – Part 1″

 Digg  Facebook  StumbleUpon  Technorati  Deli.cio.us  Slashdot  Twitter  Sphinn  Mixx  Google  DZone 

No Tags

Send to Kindle


Oct 08 2007

A simple example of a SharePoint “feature”

Send to Kindle


If you check my introductory post, I discussed the concept of SharePoint features in a real world scenario. In this post I actually show an example of features from end to end, to illustrate that scenario.

So, first to recap the scenario, slightly simplified from my other post: A webdesigner has a new CSS file that is the new corporate branding standard. They must package it up as a ‘feature’. A misunderstood sysadmin nazi installs the feature onto the SharePoint farm once only, then sends a mail to the 35 site collection owners advising them to activate the "branding feature" on their sites.  Each site collection owner who does so has the identical configuration modified so it is all nice and consistent.

Now also before we start, this demo requires the "Office SharePoint Server Publishing Infrastructure" feature to be enabled. If this is not enabled, the "Style Library" document library that we rely on, will not exist.

Step 1: Create the Feature

Our web designer of course has a development box so they can’t kill production. On this box they navigate to the

C:\Program Files\Common Files\Microsoft Shared\web server extensions\12\TEMPLATE\FEATURES

and create a new folder called CustomBranding

C:\Program Files\Common Files\Microsoft Shared\web server extensions\12\TEMPLATE\FEATURES\CustomBranding

Inside this folder we place our CSS file (in this case, called CustomBrand.CSS) and we create a file called FEATURE.XML.

Now in real life, you will copy a FEATURE.XML from one of the many other features here and work off that. But in our case, we will just type it in. The contents of FEATURE.XML is this:

<Feature Id="01c34560-6561-11dc-8314-0800200c9a66″   
Title="Pimp my SharePoint"   
Description="This is a feature that adds a new sexy CSS"   
Version="1.0.0.0″   
Scope="Site"   
xmlns="
http://schemas.microsoft.com/sharepoint/">   
<ElementManifests>        
    <ElementManifest Location="ProvisionedFiles.xml"/>   
</ElementManifests>
</Feature>

So we have a <feature> element and inside that an <elementmanifests> element. The required parameters for the <feature> element are below (lifted straight from MSDN)

Attribute Description
Description Optional String. Returns a longer representation of what the Feature does.
Id Required Text. Contains the globally unique identifier (GUID) for the Feature.
Scope Required Text. Can contain one of the following values: Farm (farm), WebApplication (Web application), Site (site collection), Web (Web site).
Title Optional Text. Returns the title of the Feature. Limited to 255 characters.
Version Optional Text. Specifies a System.Version-compliant representation of the version of a Feature. This can be up to four numbers delimited by decimals that represent a version.

So the first thing to do is generate a GUID. You can do this a number of ways, but I typically use an online generator like the one here to do it.

My GUID from the online generator is:  01c34560-6561-11dc-8314-0800200c9a66. Feel free to use it for this example but you should substitute with your own.

Title and description parameters should be plainly obvious and version is optional, but whack it in anyway.

Scope is important, a feature can be activated at various points in the farm. "Site" means it is activated once per site collection. All sub-sites under this site collection can make use of the feature without having to activate it. This will become clear later.

Next we refer to an <element manifest>. This is a reference to another XML file that actually tells Sharepoint what to do (provisionedfiles.xml). In our case, it is going to tell SharePoint to upload the CustomBrand.CSS file to the site collection style library.

Let’s take a look.

<Elements xmlns="http://schemas.microsoft.com/sharepoint/">
    <Module Name="MyPimpedStyles" Url="Style Library" RootWebOnly="TRUE">        
        <File Url="CustomBrand.css" Type="GhostableInLibrary" />    
    </Module>
</Elements>

In this file, the top-level Elements element defines the elements comprising the Feature. In my previous post, I outlined a table of elements that can be used to install SharePoint features.

  • Content Types: Contains a definition of a SharePoint content type.
  • Content Type Binding: Actually applies a content type to a document library.
  • Control: Allows you to replace existing controls on the page, such as the search or navigation with your own custom control.
  • Custom Action: You can define custom actions such as add a new menu item in "Site Actions".
  • Feature/Site Template Association: This allows you to bind a feature to a site template so that the feature is included in new sites based on that template.
  • Field: Contains a field, or column definition that can be reused in multiple lists.
  • Hide Custom Action: Opposite to "Custom Action", where you want to hide menu items.
  • List Instance: Provisions a SharePoint site with a list which includes specific data.
  • List Template: A list definition or template, which defines a list that can be provisioned to a SharePoint site.
  • Module: Deploys files which are included when provisioning sites.
  • Receiver: Defines an event handler for a list, or document library
  • Workflow: Defines a workflow for a list, or document library. 

So as you can see in the above XML file, we have used the MODULE element to install 1 single file. Let’s examine the <MODULE> and <FILE> element in detail.

<Module Name="MyPimpedStyles" Url="Style Library" RootWebOnly="TRUE">        
<File Url="CustomBrand.css" Type="GhostableInLibrary" />    
</Module>

Module Attribute Description
Name Required Text. Contains the name of the file set.
RootWebOnly Optional Boolean. TRUE if the files specified in the module are installed only in the top-level Web site of the site collection.
Url Optional Text. Specifies the virtual path of the folder in which to place the files when a site is instantiated. If Path is not specified, the value of Url is used for the physical path. Use the Url attribute to provision a folder through the Feature.
File
Attribute
Description
IgnoreIfAlreadyExists Optional Boolean. TRUE to provision the view even if the file aready exists at the specified URL; otherwise, FALSE.
Type Optional Text. Specifies that the file be cached in memory on the front-end Web server. Possible values include Ghostable and GhostableInLibrary. Both values specify that the file be cached, but GhostableInLibrary specifies that the file be cached as part of a list whose base type is Document Library.When changes are made, for example, to the home page through the UI, only the differences from the original page definition are stored in the database, while default.aspx is cached in memory along with the schema files. The HTML page that is displayed in the browser is constructed through the combined definition resulting from the original definition cached in memory and from changes stored in the database.

So, the module section is specifying where any <file> elements be copied to. We are going to copy this to the document library called "Style Library" in the root web site for the site collection. 

 

Step 2. Installing and testing the Feature

Now that our web developer has created the feature, they test it on their development SharePoint server. Open command prompt and execute the STSADM -installfeature command. When the -name parameter is specified, SharePoint knows to look in the TEMPLTE\FEATUIRES folder already, so you do not have to specify a full path.

Step 3. Test the feature

Okay, so the feature is installed. Now what? Now we need to activate this feature on a site collection. Here is the "Style Library" of my test site. (Remember that this library will not exist unless the SharePoint Publishing Infrastructure feature has been installed). Note that at this time, there is no CSS file called CUSTOMBRAND.CSS

So now let’s Activate the feature. Browse to Site Actions>Site Settings and from the Site Collection Administration menu, choose "Site Collection Features". Lo and behold! We have our feature listed! Note the title and description is as per our FEATURE.XML file.

Click "Activate" to activate the feature (you can also do this on the command line via STSADM -o activatefeature command). Once it is marked as active, re-examine the style library. Woo freakin hoo! There is our CSS file!

Step 5. Test and deploy the Feature

In our example here, we can test this feature, by choosing to use this new CSS file in the master page settings of any site within the site collection. The Site Collection administrator navigates to site settings->look and feel->master page settings and specifies the CSS file override as shown below.

By clicking on the "Browse" button, they can select the CSS file from the style library in the site collection.

This highlights the relationship between the web designer and the farm, site collection and site owners. In a large production farm the sequence would look something like this.

  • The developer creates and tests the feature
  • The developer hands the tested and approved feature to the the SharePoint farm administrator
  • The SharePoint farm administrator copies this feature into the FEATURES folder on the web front end servers on the farm and notifies the site collection administrators that the feature has been installed
  • Each site collection administrator activates the feature and informs the site owners that the feature is now available.
  • Each site owner optinally chooses to use this new CSS installed by the feature.

Summing Up

I hope that you found this article useful. Now you are going to totally hate me, because now I am going to tell you that features are only half of the solution to SharePoint customisation. "What is the other half"? you may ask.  Well the other half of the solution is "solutions" … don’t you just love generic terminology!

 

 

 Digg  Facebook  StumbleUpon  Technorati  Deli.cio.us  Slashdot  Twitter  Sphinn  Mixx  Google  DZone 

No Tags

Send to Kindle



Today is: Wednesday 27 August 2014 |