A lesser known way to fine-tune SharePoint search precision…

Send to Kindle

Hi all

While I’d like to claim credit for the wisdom in this post, alas I cannot. One of Seven Sigma’s consultants (Daniel Wale) worked this one out and I thought that it was blog-worthy. Before I get into the issue and Daniel’s resolution, let me give you a bit of search engine theory 101 with a concept that I find is useful to help understand search optimisation.

Precision vs. recall

Each time a person searches for information, there is an underlying goal or intended outcome. While there has been considerable study of information seeking behaviours in academia and beyond, they boil down to three archetype scenarios.

  1. “I know exactly what I am looking for” – The user has a particular place in mind, either because they visited it in the past or because they assume it exists. This known as known item seeking, but is also referred to as navigational seeking or refinding.
  2. “I’m not sure what I am looking for but I’ll know it when I find it” – This is known as exploratory seeking and the purpose is to find information assumed to be available. This is characterised by
    • – Looking for more than one answer
    • – No expectation of a “right” answer
    • – Open ended
    • – Not necessarily knowing much about what is being looking for
    • – Not being able to articulate what is being looked for
  3. “Gimme gimme gimme!” – A detailed research type search known as exhaustive seeking, leaving no stone unturned in topic exploration. This is characterised by;
    • – Performing multiple searches
    • – Expressing what is being looked for in many ways

Now among other things, each of these scenarios would require different search results to meet the information seeking need. For example: If you know what you are looking for, then you would likely prefer a small, highly accurate set of search results that has the desired result at the top of the list. Conversely if you are performing an exploratory or exhaustive search, you would likely prefer a greater number of results since any of them are potentially relevant to you.

In information retrieval, the terms precision and recall are used to measure search efficiency. Google’s Tim Bray put it well when he said “recall measures how well a search system finds what you want and precision measures how well it weeds out what you do not want”. Sometimes recall is just what the doctor ordered, whereas other times, precision is preferred.

The scenario and the issue…

That said, recently, Seven Sigma worked on a knowledgebase project for a large customer contact centre. The vast majority of the users of the system are customer centre operators who deal directly with all customer enquiries and have worked there for a long time. Thus most of the search behaviours are in the known item seeking category as they know the content pretty well – it is just that there is a lot of it. Additionally, picture yourself as one of those operators and then imagine the frustration a failed or time consuming search with an equally frustrated customer on the end of the phone and a growing queue of frustrated callers waiting their turn. In this scenario, search results need to be as precise as possible.

Thus, we invested a lot of time in the search and navigation experience on this project and that investment paid off as the users were very happy with the new system and particularly happy with the search experience. Additionally, we created a mega menu solution to the current navigation that dynamically builds links from knowledgebase article metadata and a managed metadata term set. This was done via the data view web part, XSLT, JavaScript and Marc’s brilliant SPServices. We were very happy with it because there was no server side code at all, yet it was very easy to administer.

So what was the search related issue? In a nutshell, we forgot that the search crawler doesn’t differentiate between your pages content and items in your custom navigation. As a result, we had an issue where searches did not have adequate precision.

To explain the problem, and the resolution, I’ll take a step back and let Daniel continue the story… Take it away Dan…

The knowledgebase that Paul described above contained thousands of articles, and when the search crawler accessed each article page, it also saw the titles of many other articles in the dynamic menu code embedded in the page. As a result, this content also got indexed. When you think about it, the search crawler can’t tell whether content is real content versus when it is a dynamic menu that drops down/slides out when you hover over the menu entry point. The result was that when users searched for any term that appeared in the mega menu, they would get back thousands of results (a match for every page) even when the “actual content” of the page doesn’t contain any references to the searched term.

There is a simple solution however, for controlling what the SharePoint search crawler indexes and what it ignores. SharePoint knows to exclude content that exists inside of <div> HTML tags that have the class noindex added to them. Eg

<div class=”menu noindex> 
  <ul> 
    <li>Article 1</li> 
    <li>Article 2</li> 
  </ul> 
</div>

There is one really important thing to note however. If your <div class=”noindex”> contains a nested <div> tag that doesn’t contain the noindex class, everything inside of this inner <div> tag will be included by the crawler. For example:

<div class=”menu noindex> 
  <ul> 
    <li>Article 1</li> 

      <div class=”submenu>
        <ul>
          <li>Article 1.1</li>
          <li>Article 1.2</li>
        </ul>
      </div>

    <li>Article 2</li> 
  </ul> 
</div>

In the code above the nested <div> to surround the submenu items does not contain the noindex class. So the text “Article 1.1” and “Article 1.2” will be crawled, while the “Article 1” and “Article 2” text in the parent <div> will still be excluded.

Obviously the example above its greatly simplified and like our solution, your menu is possibly making use of a DataViewWebPart with an XSL transform building it out. It’s inside your XSL where you’ll need to include the <div> with the noindex class because the Web Part will generate its own <div> tags that will encapsulate your menu. (Use the browser Developer Tools and inspect the code that it inserts if you aren’t familiar with the code generated, you’ll find at least one <div> elements that is nested inside any <div class=”noindex”> you put around your web part thinking you were going to stop the custom menu being crawled).

Initially looking around for why our search results were being littered with so many results that seemed irrelevant, I found the way to exclude the custom menu using this method rather easily, I also found a lot of forum posts of people having the same issue but reporting that their use of <div> tags with the noindex class was not working. Some of these posts people had included snippets of their code, each time they had nested <div> tags and were baffled by why their code wasn’t working. I figured most people were having this problem because they simply don’t read the detail in the solutions about the nesting or simply don’t understand that the web part will generate its own HTML into their page and quite likely insert a <div> that surrounds the content they are wanting to hide. As any SharePoint developer quickly finds out a lot of knowledge in SharePoint won’t come from well set out documentation library with lots of code examples that developers get used to with other environments, you need to read blogs (like this one), read forums, talk to colleagues and just build up your own experience until these kinds of gotchas are just known to you. Even the best SharePoint developer can overlook simple things like this and by figuring them out they get that little bit better each time.

Being a SharePoint developer is really about being the master of self-learning, the master of using a search engine to find the knowledge you need and most importantly the master of knowing which information you’re reading is actually going to be helpful and what is going to lead you down the garden path. The MSDN blog post by Mark Arend (http://blogs.msdn.com/b/markarend/archive/2010/06/07/control-search-indexing-crawling-within-a-page-with-noindex.aspx) gives a clear description of the problem and the solution, he also states that it is by design that nested <div> tags are re-evaluated for the noindex class. He also mentions the product team was considering changing this…  did this create the confusion for people or was it that they read the first part of the solution and didn’t read the note about nested <div> tags? In any case it’s a vital bit of the solution that it seems a lot of people overlook still.

In case you are wondering, the built in SharePoint navigation menu’s already have the correct <div> tags with the noindex class surrounding them so they aren’t any concern. This problem only exists if you have inserted your own dynamic menu system.

Other Search Provider Considerations

It is more common that you think that some sites do not just use SharePoint Search. The <div class=”noindex”> is a SharePoint specific filter for excluding content within a page, what if you have a Google Search Appliance crawling your site as well? (Yep… we did in this project)

You’re in luck, the Google documents how to exclude content within a page from their search appliance. There are a few different options but the equivalent blanket ignore of the contents between the <div class=”noindex”> tags would be to encapsulate the section between the following two comments

<!–googleoff: all–>

and

<!–googleon: all–>

If you want to know more about the GSA googleoff/googleon tags and the various options you have here is the documentation: http://code.google.com/apis/searchappliance/documentation/46/admin_crawl/Preparing.html#pagepart

Conclusion

(… and Paul returns to the conversation).

I think Dan has highlighted an easy to overlook implication of custom designing not only navigational content, but really any type of dynamically generated content on a page. While the addition of additional content can make a page itself more intuitive and relevant, consider the implication on the search experience. Since the contextual content will be crawled along with the actual content, sometimes you might end up inadvertently sacrificing precision of search results without realising.

Hope this helps and thanks for reading (and thanks Dan for writing this up)

 

Paul Culmsee

www.sevensigma.com.au

h2bp2013

 Digg  Facebook  StumbleUpon  Technorati  Deli.cio.us  Slashdot  Twitter  Sphinn  Mixx  Google  DZone 

No Tags

Send to Kindle

More SharePoint Branding – Customisation using JavaScript Part 3

Send to Kindle


Hey there. Welcome to part 3 of my series on SharePoint customisation using JavaScript and web parts.

So here is the lowdown so far. We are trying to find an effective, repeatable way to easily customise SharePoint form pages, so that we can hide fields or form elements when we need to. The goals were to:

  • Allow hidden fields based on identity
  • Avoid use of SharePoint Designer
  • Avoid customisations to the form pages that unghosted the pages from the site definition

So how have we progressed thus far?.

  • Part 1 of this series looked at how we can use JavaScript to deal with the common request of hiding form elements from the user in lists and document libraries.
  • Part 2 wrapped this JavaScript code into a web part which has been loaded into the SharePoint web part gallery.

So let’s knock the rest of this over and pick up right from we left off…

CleverWorkArounds Coffee requirement of this post depends on how much you hate JavaScript.

Metrosexual web developer    image

Socially inept technical guy    imageimageimage

[Quick Navigation: Part 1, Part 2, Part 4, Part 5 and Part 6]

Continue reading

 Digg  Facebook  StumbleUpon  Technorati  Deli.cio.us  Slashdot  Twitter  Sphinn  Mixx  Google  DZone 

No Tags

Send to Kindle

More SharePoint Branding – Customisation using JavaScript Part 2

Send to Kindle

Hi again.

JavaScript sucks! There I said it. Despite me hating it as a programming language, I can’t deny that in SharePoint, it does have its uses.

CleverWorkArounds Coffee requirement of this post depends on how much you hate JavaScript.

Metrosexual web developer    image
Socially inept technical guy    imageimageimage
Luddite IT manager                   imageimageimageimageimageimageimageimageimage 
(sorry … why are you here anyway?)

[Quick Navigation: Part 1, Part 3, Part 4, Part 5 and Part 6]

To quickly recap the first post of this series, we looked at how we can use JavaScript to deal with the common request of hiding form elements from the user in lists and document libraries. The technique demonstrated can be used for columns, buttons and whatever else you want. The method once debugged, is fairly easy to implement with SharePoint designer with and some cut and paste.

But there are several problems with the method that prevent it from getting a better CleverWorkaround rating than “Meh”. They include:

  • One size fits all, fields are hidden for all visitors irrespective of need.
  • You need to modify the page in SharePoint designer via cut and paste of JavaScript code
  • You need to modify auto-generated pages
  • You need to modify a page from its site definition
  • Insecure, relying on client side to hide content/controls is not a secure solution

Continue reading

 Digg  Facebook  StumbleUpon  Technorati  Deli.cio.us  Slashdot  Twitter  Sphinn  Mixx  Google  DZone 

No Tags

Send to Kindle

SharePoint Branding Part 7 -The ‘governance’ of it all..

Send to Kindle

Well, here we are! After delving into dark arts where everybody but metrosexual web designers fear to tread (HTML and CSS), we then delved into the areas that metrosexual web designers truly fear to tread (packaging, deployment and even some c# code!). Finally, we get to the area where everybody is interested until it happens to get in their way! (Ooh, I am a cynical old sod tonight).

That is Governance!

Continue reading

 Digg  Facebook  StumbleUpon  Technorati  Deli.cio.us  Slashdot  Twitter  Sphinn  Mixx  Google  DZone 

No Tags

Send to Kindle

SharePoint Branding Part 5 – Feature Improvements and Bugs

Send to Kindle

So, here we are at the fifth article in my series on SharePoint branding. By now, we have left all the master page stuff way behind, and we have created a custom feature to install our branding to a server.

To recap for those of you hitting this page first, I suggest you go back and read this series in order.

  • Part 1 dealt with the publishing feature, and some general masterpage/CSS concepts and some quirks (core.css and application.master) that have to be worked around.
  • Part 2 delved into the methods to work around the application.master and core.css issue
  • Part 3 delved further into the methods to work around the application.master and core.css issue and the option that solved a specific problem for me
  • Part 4 then changed tack and introduced how to package up your clever branding

Continue reading

 Digg  Facebook  StumbleUpon  Technorati  Deli.cio.us  Slashdot  Twitter  Sphinn  Mixx  Google  DZone 

No Tags

Send to Kindle

SharePoint Branding Part 4 – Packaging up your masterpiece into a Feature

Send to Kindle

Welcome to the fourth article in my series on SharePoint branding. Sorry for the time it’s taken to get this one out, but a certain game called “The Legend of Zelda:  Hourglass Phantom” on NDS got in the way. I finished it yesterday and only had to cheat via google once :-). Anyway it’s out of my system now so I can get back to this.

After 3 epic articles on all that painful CSS and master page stuff (part one, two and three), we now focus on what you have to do to get your branding masterpiece deployed to the SharePoint farm the clever way. In this next set of articles, we will look at where things should go, and then how to get it there the right way.

Continue reading

 Digg  Facebook  StumbleUpon  Technorati  Deli.cio.us  Slashdot  Twitter  Sphinn  Mixx  Google  DZone 

No Tags

Send to Kindle

Careful with pre-requisite SharePoint Features

Send to Kindle

After a web designer applied a new master page to a site, he killed the site. We saw these messages (debug logging was enabled on the site)

Server Error in ‘/’ Application.

——————————————————————————–

Compilation Error

Description: An error occurred during the compilation of a resource required to service this request. Please review the following specific error details and modify your source code appropriately.

Continue reading

 Digg  Facebook  StumbleUpon  Technorati  Deli.cio.us  Slashdot  Twitter  Sphinn  Mixx  Google  DZone 

No Tags

Send to Kindle

SharePoint Branding – How CSS works with master pages – Part 3

Send to Kindle

Welcome to the third article (or is it a manifesto?) in my series on SharePoint branding. In this article, we continue examining methods to incorporate CSS files into master pages for clever branding. In my first article of this topic, I discussed what I think is the main issue with SharePoint branding – APPLICATION.MASTER and CORE.CSS. The previous article to this one, examined 5 methods to deal with the trial and tribulations of APPLICATION.MASTER and CORE.CSS behavior. So, to recap where we got to, let’s re-examine the original scenario and then look at the summary of the 5 different several methods with their relative merits and issues.

The Scenario

Like many organizations, my client had an existing corporate branding standard that was used in a non SharePoint environment and naturally enough, they wanted their SharePoint site to look like this branding. This was for a fully featured intranet/extranet that utilized most of the MOSS2007 features such as

  • Document collaboration
  • Infopath Forms Services
  • Workflow
  • Enterprise Search
  • Excel services
  • Business Data Catalog
  • Custom web parts
  • Event Handlers

It was *not* a public site at all.

Initial investigation soon concluded that we would need a custom master page. DEFAULT.MASTER didn’t quite have the design flexibility that was required. In fact the branding requirements were actually closer to some of the built in master pages such as BLUEGLASS.MASTER, since this was for intranet purposes, particularly collaborative document management, those master pages are unsuitable.

Continue reading

 Digg  Facebook  StumbleUpon  Technorati  Deli.cio.us  Slashdot  Twitter  Sphinn  Mixx  Google  DZone 

No Tags

Send to Kindle

SharePoint Branding – How CSS works with master pages – Part 2

Send to Kindle

Jeez if I had realised how long it would take to write these damn articles, I probably wouldn’t have started! In my first article of this topic, I discussed the theory behind master pages, the publishing feature, and what I think is the main issue with SharePoint branding – APPLICATION.MASTER and CORE.CSS. In this article I will now list a branding scenario that I had to deal with, and the various options you can use to deal with the challenges of APPLICATION.MASTER and CORE.CSS

The Scenario

Like many organizations, my client had an existing corporate branding standard that was used in a non SharePoint environment and naturally enough, they wanted their SharePoint site to look like this branding.

This was for a fully featured intranet/extranet that utilized most of the MOSS2007 features such as

  • Document collaboration
  • Infopath Forms Services
  • Workflow
  • Enterprise Search
  • Excel services
  • Business Data Catalog
  • Custom web parts
  • Event Handlers

It was *not* a public site at all.

Initial investigation soon concluded that we would need a custom master page. DEFAULT.MASTER didn’t quite have the design flexibility that was required. In fact the branding requirements were actually closer to some of the built in master pages such as BLUEGLASS.MASTER, since this was for intranet purposes, particularly collaborative document management, those master pages are unsuitable. (I will explain why soon).

Continue reading

 Digg  Facebook  StumbleUpon  Technorati  Deli.cio.us  Slashdot  Twitter  Sphinn  Mixx  Google  DZone 

No Tags

Send to Kindle

SharePoint Branding – How CSS works with master pages – Part 1

Send to Kindle

This is version 2 of this article, after I went and accidentally blew away my first masterpiece that took literally days to write. If this has ever happened to you, don’t you hate it, that your second version is never as good as your first?

Quick Links: [Part 1, Part 2, Part 3, Part 4, Part 5, Part 6, Part 7]

Anyway, this is (attempt 2 of) part 1 of a series of articles that cover SharePoint branding in some detail. Kudos has to be given Heather Solomon especially for her wonderful site and articles on this subject (and author of the book to your left :). In Addition, Andrew Connell and Mike have done some great work that helped me in this area.

So, SharePoint branding was the catalyst behind my deciding to make a blog and call it cleverworkarounds. The whole experience at times made me want to change careers, but I ultimately got there. I would go down one path, only to be stumped by a problem, and think I have the solution, only to find another quirk that needed another workaround. So the aim of cleverworkarounds is to determine the least dodgy way to implement branding of a SharePoint installation. No doubt people will disagree with the conclusions I’ve reached, but that’s expected since the cleverness of a workaround really depends on your needs.

 

So this series of articles will cover a few issues. First some basic master page theory, then I will talk about the difference between branding in WSS3 (the freebie version) and MOSS07 (the pricey one). I will then take you through the quirks of CSS and master pages. Subsequent articles will get into the details of branding techniques and finally finish off by covering the governance issues surrounding branding.

Continue reading

 Digg  Facebook  StumbleUpon  Technorati  Deli.cio.us  Slashdot  Twitter  Sphinn  Mixx  Google  DZone 

No Tags

Send to Kindle