A lesser known way to fine-tune SharePoint search precision…

Send to Kindle

Hi all

While I’d like to claim credit for the wisdom in this post, alas I cannot. One of Seven Sigma’s consultants (Daniel Wale) worked this one out and I thought that it was blog-worthy. Before I get into the issue and Daniel’s resolution, let me give you a bit of search engine theory 101 with a concept that I find is useful to help understand search optimisation.

Precision vs. recall

Each time a person searches for information, there is an underlying goal or intended outcome. While there has been considerable study of information seeking behaviours in academia and beyond, they boil down to three archetype scenarios.

  1. “I know exactly what I am looking for” – The user has a particular place in mind, either because they visited it in the past or because they assume it exists. This known as known item seeking, but is also referred to as navigational seeking or refinding.
  2. “I’m not sure what I am looking for but I’ll know it when I find it” – This is known as exploratory seeking and the purpose is to find information assumed to be available. This is characterised by
    • – Looking for more than one answer
    • – No expectation of a “right” answer
    • – Open ended
    • – Not necessarily knowing much about what is being looking for
    • – Not being able to articulate what is being looked for
  3. “Gimme gimme gimme!” – A detailed research type search known as exhaustive seeking, leaving no stone unturned in topic exploration. This is characterised by;
    • – Performing multiple searches
    • – Expressing what is being looked for in many ways

Now among other things, each of these scenarios would require different search results to meet the information seeking need. For example: If you know what you are looking for, then you would likely prefer a small, highly accurate set of search results that has the desired result at the top of the list. Conversely if you are performing an exploratory or exhaustive search, you would likely prefer a greater number of results since any of them are potentially relevant to you.

In information retrieval, the terms precision and recall are used to measure search efficiency. Google’s Tim Bray put it well when he said “recall measures how well a search system finds what you want and precision measures how well it weeds out what you do not want”. Sometimes recall is just what the doctor ordered, whereas other times, precision is preferred.

The scenario and the issue…

That said, recently, Seven Sigma worked on a knowledgebase project for a large customer contact centre. The vast majority of the users of the system are customer centre operators who deal directly with all customer enquiries and have worked there for a long time. Thus most of the search behaviours are in the known item seeking category as they know the content pretty well – it is just that there is a lot of it. Additionally, picture yourself as one of those operators and then imagine the frustration a failed or time consuming search with an equally frustrated customer on the end of the phone and a growing queue of frustrated callers waiting their turn. In this scenario, search results need to be as precise as possible.

Thus, we invested a lot of time in the search and navigation experience on this project and that investment paid off as the users were very happy with the new system and particularly happy with the search experience. Additionally, we created a mega menu solution to the current navigation that dynamically builds links from knowledgebase article metadata and a managed metadata term set. This was done via the data view web part, XSLT, JavaScript and Marc’s brilliant SPServices. We were very happy with it because there was no server side code at all, yet it was very easy to administer.

So what was the search related issue? In a nutshell, we forgot that the search crawler doesn’t differentiate between your pages content and items in your custom navigation. As a result, we had an issue where searches did not have adequate precision.

To explain the problem, and the resolution, I’ll take a step back and let Daniel continue the story… Take it away Dan…

The knowledgebase that Paul described above contained thousands of articles, and when the search crawler accessed each article page, it also saw the titles of many other articles in the dynamic menu code embedded in the page. As a result, this content also got indexed. When you think about it, the search crawler can’t tell whether content is real content versus when it is a dynamic menu that drops down/slides out when you hover over the menu entry point. The result was that when users searched for any term that appeared in the mega menu, they would get back thousands of results (a match for every page) even when the “actual content” of the page doesn’t contain any references to the searched term.

There is a simple solution however, for controlling what the SharePoint search crawler indexes and what it ignores. SharePoint knows to exclude content that exists inside of <div> HTML tags that have the class noindex added to them. Eg

<div class=”menu noindex> 
  <ul> 
    <li>Article 1</li> 
    <li>Article 2</li> 
  </ul> 
</div>

There is one really important thing to note however. If your <div class=”noindex”> contains a nested <div> tag that doesn’t contain the noindex class, everything inside of this inner <div> tag will be included by the crawler. For example:

<div class=”menu noindex> 
  <ul> 
    <li>Article 1</li> 

      <div class=”submenu>
        <ul>
          <li>Article 1.1</li>
          <li>Article 1.2</li>
        </ul>
      </div>

    <li>Article 2</li> 
  </ul> 
</div>

In the code above the nested <div> to surround the submenu items does not contain the noindex class. So the text “Article 1.1” and “Article 1.2” will be crawled, while the “Article 1” and “Article 2” text in the parent <div> will still be excluded.

Obviously the example above its greatly simplified and like our solution, your menu is possibly making use of a DataViewWebPart with an XSL transform building it out. It’s inside your XSL where you’ll need to include the <div> with the noindex class because the Web Part will generate its own <div> tags that will encapsulate your menu. (Use the browser Developer Tools and inspect the code that it inserts if you aren’t familiar with the code generated, you’ll find at least one <div> elements that is nested inside any <div class=”noindex”> you put around your web part thinking you were going to stop the custom menu being crawled).

Initially looking around for why our search results were being littered with so many results that seemed irrelevant, I found the way to exclude the custom menu using this method rather easily, I also found a lot of forum posts of people having the same issue but reporting that their use of <div> tags with the noindex class was not working. Some of these posts people had included snippets of their code, each time they had nested <div> tags and were baffled by why their code wasn’t working. I figured most people were having this problem because they simply don’t read the detail in the solutions about the nesting or simply don’t understand that the web part will generate its own HTML into their page and quite likely insert a <div> that surrounds the content they are wanting to hide. As any SharePoint developer quickly finds out a lot of knowledge in SharePoint won’t come from well set out documentation library with lots of code examples that developers get used to with other environments, you need to read blogs (like this one), read forums, talk to colleagues and just build up your own experience until these kinds of gotchas are just known to you. Even the best SharePoint developer can overlook simple things like this and by figuring them out they get that little bit better each time.

Being a SharePoint developer is really about being the master of self-learning, the master of using a search engine to find the knowledge you need and most importantly the master of knowing which information you’re reading is actually going to be helpful and what is going to lead you down the garden path. The MSDN blog post by Mark Arend (http://blogs.msdn.com/b/markarend/archive/2010/06/07/control-search-indexing-crawling-within-a-page-with-noindex.aspx) gives a clear description of the problem and the solution, he also states that it is by design that nested <div> tags are re-evaluated for the noindex class. He also mentions the product team was considering changing this…  did this create the confusion for people or was it that they read the first part of the solution and didn’t read the note about nested <div> tags? In any case it’s a vital bit of the solution that it seems a lot of people overlook still.

In case you are wondering, the built in SharePoint navigation menu’s already have the correct <div> tags with the noindex class surrounding them so they aren’t any concern. This problem only exists if you have inserted your own dynamic menu system.

Other Search Provider Considerations

It is more common that you think that some sites do not just use SharePoint Search. The <div class=”noindex”> is a SharePoint specific filter for excluding content within a page, what if you have a Google Search Appliance crawling your site as well? (Yep… we did in this project)

You’re in luck, the Google documents how to exclude content within a page from their search appliance. There are a few different options but the equivalent blanket ignore of the contents between the <div class=”noindex”> tags would be to encapsulate the section between the following two comments

<!–googleoff: all–>

and

<!–googleon: all–>

If you want to know more about the GSA googleoff/googleon tags and the various options you have here is the documentation: http://code.google.com/apis/searchappliance/documentation/46/admin_crawl/Preparing.html#pagepart

Conclusion

(… and Paul returns to the conversation).

I think Dan has highlighted an easy to overlook implication of custom designing not only navigational content, but really any type of dynamically generated content on a page. While the addition of additional content can make a page itself more intuitive and relevant, consider the implication on the search experience. Since the contextual content will be crawled along with the actual content, sometimes you might end up inadvertently sacrificing precision of search results without realising.

Hope this helps and thanks for reading (and thanks Dan for writing this up)

 

Paul Culmsee

www.sevensigma.com.au

h2bp2013

 Digg  Facebook  StumbleUpon  Technorati  Deli.cio.us  Slashdot  Twitter  Sphinn  Mixx  Google  DZone 

No Tags

Send to Kindle

High school students showing us SharePoint consultants how it’s done

Send to Kindle

Hi all

Once in a while, you can come across a case study that not only showcases innovative and brilliant solutions, but tells a much deeper story that both inspires and teaches. I am writing this post to tell you such a story – a story about genuine collaboration and what it can enable when the right conditions exist to foster it.

To explain this story, I first need to talk about the work of an academic named Richard Hackman. Here is a guy who spent most of his working life examining the factors that make teams work really well. Over the years he studied hundreds of high performing teams, trying to distil the magical ingredients that would lead to success for other teams. He would come up with theories, then create models that looked great on a whiteboard, but when applied to real teams in the real world, reality never fitted the models.

From causes to conditions…

After years of doing this, Hackman started to wonder whether he was leaning the ladder against the wrong wall. In other words, he wondered if trying to determine the causes of team efficacy by looking at successful teams retrospectively was the wrong approach. In the end, he changed his focus and asked himself a different question. What are the enabling conditions that need to exist that give rise to great teams?

He came up with six conditions arguing that irrespective of what else you did or what methodology you used, usually led to better results. I will give you a super brief summary below:

  1. A real team: Interdependence among members, clear boundaries distinguishing members from non-members and moderate stability of membership over time
  2. A compelling purpose: A purpose that is clear, challenging, and consequential. It energizes team members  and fully engages their talents
  3. Right people: People who had task expertise, self organised and skill in working collaboratively with others
  4. Clear norms of conduct: Team understands clearly what behaviours are, and are not, acceptable
  5. A supportive organisational context: The team has the resources it needs and the reward system provides recognition and positive consequences for excellent team performance
  6. Appropriate coaching: The right sort of coaching for the team was provided at the right time

Now my interest in Hackman and his conditions stemmed from reviewing the published “models” for SharePoint governance. Whether it is the 7 “pillars”, the 5 “steps”, or the 6 “focus areas”, all are developed in a retrospective way – by looking at a mythically perfect SharePoint solution and then breaking it down into all the things that need to be done to enable it. You see, for a long time now, I have deliberately not started with one of the models up front and Hackman offered me a reason why. Instead I first strive to create the conditions that Hackman lists above and develop governance as it is needed, rather than follow a fixed model.

Meet Louis Zulli Jr and his students

Earlier this year, I met Louis Zulli Jnr – a teacher out of Florida who is part of a program called the Centre of Advanced Technologies. We were co-keynoting at a conference and he came on after I had droned on about common SharePoint governance mistakes. Louis then gave a talk that blew me away, and at the same time proved Hackman completely right. The majority of Lou’s presentation showcased a whole bunch of SharePoint powered solutions that his students had written. The solutions themselves were very impressive, as this was not just regular old SharePoint customisation in terms of a pretty looking site with a few clever web parts. Instead, we were treated to examples like:

  • IOS, Android and Windows Phone  apps that leveraged SharePoint to display teacher’s assignments, school events and class times;
  • Silverlight based application providing a virtual tour of the campus;
  • Integration of SharePoint with Moodle;
  • An Academic Planner web application allowing students to plan their classes, submit a schedule, have them reviewed, track of the credits of the classes selected and whether a student’s selections meet graduation requirements;
  • An innovative campus Hall Pass system that leveraged jQuery, HTML5, CSS3, XML, JSON, REST, List Data Web Services and features integration with IOS, Windows 8 and swipe card hardware.

All of this and more was developed by 16 to 18 year olds and all at a level of quality that I know most SharePoint consultancies would be jealous of. To any of Lou’s students who read this – and I have consulted and delivered SharePoint since 2006, as well as speaking to people around the world on SharePoint – the work quality that I saw is world-class and you all have lucrative careers ahead of you in the SharePoint space and beyond.

image

So the demos themselves were impressive enough, but that is actually not what impressed me the most. In fact, what had me hooked was not on the slide deck. It was the anecdotes that Lou told about the dedication of his students to the task and how they went about getting things done. He spoke of students working during their various school breaks to get projects completed and how they leveraged each other’s various skills and other strengths. Lou’s final slide summed his talk up brilliantly, and really spoke to Hackman’s six conditions. The slide made the following points:

  • Students want to make a difference! Give them the right project and they do incredible things.
  • Make the project meaningful. Let it serve a purpose for the campus community.
  • Learn to listen. If your students have a better way, do it. If they have an idea, let them explore it.
  • Invest in success early. Make sure you have the infrastructure to guarantee uptime and have a development farm.
  • Every situation is different but there is no harm in failure. “I have not failed. I’ve found 10,000 ways that won’t work” – Thomas A. Edison

If you look at the above 5 points and think about Hackman’s conditions of compelling direction, supportive context, real team and coaching in particular, you can see that Lou ensured those conditions were present. The results of course spoke for themselves.

About halfway through Lou’s talk, I decided that whether he liked it or not, he was coming out to Australia to tell this story. So we sat down together and talked for a long while and I asked him all sorts of questions about his students, the projects, how he coached his students and how his own teaching style developed. I ended up showing him Hackman’s six conditions for great teams performance and he said “that’s what we do”.

The real lesson…

So as I write this, Lou is on his long journey home after similarly blowing away the attendees of the Australian SharePoint conference with his story about what he and his students have done. His talk was the hit of the conference and I hope that the staff and students of Lakewood High School read this post because it’s important for them to know that their story and examples were the topic of much conversation amongst attendees and highly inspiring. I also hope that people in the SharePoint community read this because CAT shows precisely why SharePoint can be such an amazing enabler within organisations when the right conditions are in place for it. Governance models are great and all, but without these enabling conditions in place, cannot deliver great outcomes on their own.

This leads me onto one final cautionary point – directed at Lou’s students, but applicable to all readers who aspire to improve collaboration in their organisations and their projects.

There are plenty of clever people in this world – in fact most IT people from my experience are intellectually very clever (IQ), but some have all the emotional maturity (EQ) of a baseball bat. IQ is what you are born with, but EQ is caught and lived. What makes great SharePoint practitioners (and in fact great leaders) is EQ, not just IQ and the CAT program shows what happens when clever people are given discretionary freedom with supportive conditions in place. My advice is to never forget that it is the conditions in which a team or organisation finds itself that a strong predictor of outcome. Take the same clever people and change the conditions (for example, from a supportive educational institution to an organisation with a blame culture and silo based fiefdoms) and you will get very different outcomes indeed.

What students may not realise is that what the CAT program is really teaching them is the experience of living those enabling conditions and therefore teaching them EQ. These students will eventually move into organisations that do not necessarily have the same enabling conditions as what exists for them now. So look past the cool API’s, the development tools, technical whitepapers, the certifications, endless debates as to whether X vs Y is the best practice, and understand the conditions like Hackman did. Strive to (re)create those conditions in all your future work and you will go further than a SharePoint laden CV ever will.

This of course, took me around 18 years of working in IT before I figured it out and have been making amends ever since. So whatever you do, wise up earlier than I did!

 

Thanks for reading

Paul Culmsee

www.cleverworkarounds.com

www.hereticsguidebooks.com

HGBP_Cover-236x300

 Digg  Facebook  StumbleUpon  Technorati  Deli.cio.us  Slashdot  Twitter  Sphinn  Mixx  Google  DZone 

No Tags

Send to Kindle