
Why can’t users find stuff on the intranet? An IBIS synthesis–Part 2

Hi all

This is the second post in a quick series that attempts to use IBIS to analyse an online discussion. Strange as it may sound, I believe that issue mapping and IBIS is one of the purest forms of information architecture you can do. This is because, as a mapper, you are creating a navigable mental model of speech as it is uttered live. This post is only semi-representative of that: I am creating an IBIS-based issue map, but I’m not interacting live with participants. Nevertheless, imagine, if you will, sitting in a room with a group of stakeholders answering the question of why users cannot find what they are looking for on the intranet. Can you see its utility in creating shared understanding of a multifaceted issue?

Where we left off…

We finished the previous discussion with a summary map that identified several reasons why it is hard to find information on intranets. In this post we will continue our examination of this topic. What you will notice in this post is that the number of nodes I capture is significantly lower than in Part 1. This is because some topics start to become saturated and people’s contributions repeat what has already been captured. In Part 1, I captured 55 nodes from the first 11 replies to the question. In this post, I capture an additional 33 nodes from the next 15 replies.

[Summary map from Part 1]

So without further ado, let’s get into it!

Suzanne Thornley • Just another few to add (sorry 5 not 3 :-):
1. Search engine is not set up correctly or used to full potential
2. Old content is not deleted and therefore too many results/documents returned
3. Documents have no naming convention and therefore it is impossible to clearly identify what they are and if they are current.
4. Not just a lack of metadata but also a lack of governance/training around metadata/meta tagging so that less relevant content may surface because the tagging and metadata is better.
5. Poor and/or low cost search engine is deployed in the mistaken belief that users will be happy/capable of finding content by navigating through a complex intranet structure.

Suzanne offered 5 additional ideas to the original map from where we last left off. She was also straight to the point, which always makes a mapper’s job of expressing it in IBIS easier. You might notice that I reversed “Old content is not deleted and therefore too many results/documents returned” in the resulting map. This is because I felt that old content not being deleted was one of a number of arguments supporting why too many results are returned.

[Map excerpt: Suzanne’s contributions added]

My first map refactor

With the addition of Suzanne’s contributions, I felt that it was a good time to take stock and adjust the map. First up, I felt that a lot of topics were starting to revolve around the notions of information architecture, governance and user experience design. So I grouped the themes of vocabulary, lack of metadata, excessive results and issues around the structure of data under a meta-theme of “information architecture”. I similarly grouped a bunch of answers into “governance” and “user experience design”. These, for me, seemed to be the three meta-themes emerging so far…

For the trainspotters, Suzanne’s comment about document naming conventions was added to the “Vocabulary and labelling issues” sub-map. You can’t see it here because I collapsed the detail so you can see the full picture of the themes as they stand at this point.

[Map: the three emerging meta-themes]

Patrik Bergman • Several of you mention the importance of adding good metadata. Since this doesn’t come natural to all employees, and the wording they use can differ – how do you establish a baseline for all regarding how to use metadata consistently? I have seen this in a KM product from Layer 2 for example, but it can of course be managed without this too, but maybe to a higher cost, or?

Patrik’s comment was a little hard to map. I captured his point that metadata does not come naturally to employees as a pro, supporting the idea that lack of metadata is an example of poor information architecture. The other points I opted to leave off, because they were not really related to the core question of why people can’t find stuff on the intranet.

[Map excerpt]

Luc de Ruijter • @Patrik. Metadata are crucial. I’ve been using them since 2005 (Tridion at that time).You can build a lot of functionality with it. And it requires standardisation. If everyone adds his own meta, this will not enable you to create solutions. You can standardize anything in any CMS. So use your CMS to include metadata. If you have a DMS the same applies. (DMS are a more logical tool for intranets, as most enterprise content exists as documenst. Software such as LiveLink can facilitate adding meta in the save as process. You just have to tick some fields before you can save a document on to the intranet.)
@Suzanne. There’s been a lot of buzz about governance. You don’t need governance over meta, you just need a sound metastructure (and a dept of function to manage it – such as library of information management). Basically a lot of ‘governance’ can be automated instead of being discussed all the time :-).

Like Patrik’s comment, much of what Luc said here wasn’t really related to the question at hand or had been captured already. But I did acknowledge his contribution to the governance debate, and he specifically argued against Suzanne’s point about lack of governance around metadata tagging.

[Map excerpt]

Next we have a series of answers, but you will notice that most of the points reiterate points that have already been made.

Patrik Bergman • Thanks Luc. It seems SharePoint gives us some basic metadata handling, but perhaps we need something strong in addition to SharePoint later.

Simon Evans • My top three?
1) The information being searched does not actually exist or exists only in an unrecognisable form and therefore cannot be found!
2) As Karen says above, info is organised by departmental function rather than focussed on end to end business process.
3) Lack of metadata as above

Mahmood Ahmad • @Simon evan. I want to also add Poor Information Structure in the list. Therefore Information Management should be an important factor.

Luc de Ruijter • @Patrik. Sharepoint 2010 is the first version that does something with it. Ms is a bit slow in pushing the possibilities with it.
@Simon @Mahmood Let’s say that information structure is the foundation for an intranet (or any website), and that a lack of metadata is only a symptom of a bad foundation?

Patrik Bergman • Good thing we use the 2010 version then 😀 I will see how good it handles it, and see if we need additional software.

Erin Dammen • I believe 1) lack of robust metadata, resulting in poor search results; 2) structure is not tailored to the way the user thinks; 3) lack of motivation on the part of contributors to make their information easy to use (we have a big problem with people just PDFing EVERYTHING instead of posting HTML pages.) I like that in SP 2010, users have the power to add their own keywords and flag pages as "I like it." Let your community do some of the legwork, I think it helps!

Simon’s first point, that the information searched for may not exist or may not be in the right format, was new, so that was captured under governance. (After all, it’s hard to architect information when it’s not there!)

[Map excerpt]

I also added Erin’s third point about lack of motivation on the part of contributors. I mulled over this and decided it was a new theme, so I added it to the root question, rather than trying to make it fit into information architecture, governance or user experience design. I also captured her point on letting the community do the legwork through user tagging (known as folksonomy).

[Map excerpt]

Luc de Ruijter • @all. The list of root causes remains small. This is not surprising (it would be really worrying if the list of causes would be a long list). And it is good to learn that we encounter the same (few but not so easy to solve) issues.
Still, in our line of work these root causes lack overall attention. What could be the reason for that? 🙂
@Erin Motivation is not the issue, I think; and facilitation is. If it is easier to PDF everything, than everyone will do so. And apparently everyone has the tools to do so. (If you don’t want people to PDF stuff, don’t offer them the quick fix.)
If another method of sharing documents is easier, then people will migrate. How easy is it to find PDF’s through search? How easy is it to add metadata to PDF’s? And are colleagues explained why consistent(!) meta is so relevant? Can employees add their own meta keywords? How do you maintain the quality and integrity of your keywords?
Of course it depends on your professional usergroup whether professionals will use "I like" buttons. Its a bit on the Facebook consumer edge if you’d ask me. Very en vogue perhaps, but in my view not so business ‘like’.

Luc, who is playing the devil’s advocate role as this discussion progresses, provides three counter-arguments to Erin’s argument around user motivation. They are all captured as cons.

[Map excerpt]

Steven Osborne • 1) Its not there and never was
2) Its there but inactive so can no longer be accessed
3) Its not where someone thought it would be or should be or its not called what they thought it was called or should be called.

Marcus Hamilton-Mills • 1) The main navigation is poor
2) The content is titled poorly (e.g internal branding, uncommon wording, not easy to differentiate from other content etc.)
3) Search can’t find it due to poor meta data

patrick c walsh • 1) Navigation breaks down because there’s too much stuff
2) There’s too much crap content hidden away because there’s just too much stuff
and
3) er…there’s just too much stuff

Mark Smith • 1. Poor navigation, information architecture and content sign-posting
2. Lack of content governance, meta-data and inconsistent taxonomy, resulting in poor search capability.
3. The content they are trying to find is out of date, cannot be trusted or isn’t even available on the intranet

Luc de Ruijter • @Steven Had a bit of a laugh there
@all Am I right in making the connection between
– the huge amount of content is an issue
– that internal branding causes confusion (in labeling and titles).
and
the fact that – in most cases – these causes can be back tracked to the owners of intranet, the comms department? They produce most content clutter.
Or am I too quick in drawing that conclusion?

Now the conversation is really starting to saturate. Most of the contributions above are captured already in the map as it is, so I only added two nodes: Patrick’s point about navigation (an information architecture issue) and too much information.

[Map excerpt]

Where are we at now?

We will end Part 2 with a summary below. Like the first post, you can click here to see the maps exported in more detail. In Part 3, the conversation gets richer again, so the maps will change once more.

Until then, thanks for reading

Paul Culmsee


www.sevensigma.com.au

[Full map as at the end of Part 2]



Why can’t users find stuff on the intranet? An IBIS synthesis–Part 1

Hi

There was an interesting discussion on the Intranet Professionals group on LinkedIn recently where Luc De Ruijter asked the question:

What are the main three reasons users cannot find the content they were looking for on intranet?

As you can imagine there were a lot of responses, and a lot more than three answers. As I read through them, I thought it might be a good exercise to use IBIS (the language behind issue mapping) to map the discussion and see what the collective wisdom of the group has to say. So in these posts, I will illustrate the utility of IBIS and Issue mapping for this work, and make some comments about the way the conversation progressed.

So what is IBIS and Issue/Dialogue Mapping?

Issue Mapping captures the rationale behind a conversation or dialogue—the emergent ideas and solutions that naturally arise from robust debate. This rationale is graphically represented using a simple, but powerful, visual structure called IBIS (Issue Based Information System). This allows all elements and rationale of a conversation, and subsequent decisions, to be captured in a manner that can be easily reflected upon.

The elements of the IBIS grammar are below. Questions give rise to ideas, or potential answers. Ideas have pros or cons arguing for or against those ideas.

[Diagram: the IBIS grammar elements]
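
To make the grammar concrete, here is a minimal sketch of IBIS as a data structure in Python. This is purely my own illustration – dedicated issue-mapping tools such as Compendium have far richer models:

```python
# A minimal sketch of the IBIS grammar: questions give rise to ideas,
# and ideas have pros and cons arguing for or against them.
from dataclasses import dataclass, field
from enum import Enum

class NodeType(Enum):
    QUESTION = "?"  # an issue to be explored
    IDEA = "!"      # a potential answer to a question
    PRO = "+"       # an argument supporting an idea
    CON = "-"       # an argument against an idea

@dataclass
class Node:
    node_type: NodeType
    text: str
    children: list = field(default_factory=list)

    def add(self, node_type: NodeType, text: str) -> "Node":
        child = Node(node_type, text)
        self.children.append(child)
        return child

    def render(self, depth: int = 0) -> None:
        print("  " * depth + f"{self.node_type.value} {self.text}")
        for child in self.children:
            child.render(depth + 1)

# Rebuild a fragment of the map from this discussion:
root = Node(NodeType.QUESTION, "Why can't users find content on the intranet?")
idea = root.add(NodeType.IDEA, "Lack of metadata")
idea.add(NodeType.PRO, "2000 documents called 'agenda and minutes' defeat any search tool")
root.render()
```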

Dialogue Mapping is essentially Issue Mapping a conversation live, where the mapper is also a facilitator. When it is done live it is powerful stuff. As participants discuss a problem, they watch the IBIS map unfold on the screen. This allows participants to build shared context, identify patterns in the dialogue and move from analysis to synthesis in complex situations. What makes this form of mapping compelling is that everything is captured. No idea, pro or con is ignored. In a group scenario, this is an extremely efficient way of meeting what social psychologist Hugh Mackay says is the first of the ten human desires that drive us – the desire to be taken seriously. Once an idea is mapped, the idea and the person who put it forth are taken seriously. This process significantly reduces “wheel spinning” in meetings where groups get caught up in a frustrating tangled mess of going over the same old ground. It also allows the dialogue to move more effectively to decision points (commitments) around a shared understanding.

In this case though, the discussion was a long one on a LinkedIn group, so we do not get the benefit of being able to map live. Instead, I will create a map to represent the conversation as it progresses and make some comments here and there…

So let’s kick off with the first reply from Bob Meier.

Bob Meier • Don’t know if these are top 3, but they’re pretty common find-ability issues:
1. Lack of metadata. If there are 2000 documents called “agenda and minutes” then a search engine, fancy intranet, or integrated social tool won’t help.
2. Inconsistent vocabulary and acronyms. If you’ve branded the expense report system with some unintuitive name (e.g. a vendor name like Concur) then I’ll scan right past a link looking for “expense reports” or some variation.
3. Easier alternatives. If it’s easier for me to use phone/email/etc. to find what I want, then I won’t take the time to learn how to use the intranet tools. Do grade schools still teach library search skills? I don’t think many companies do…

In IBIS this was fairly straightforward. Bob listed his three answers with some supporting arguments. I reworded his supporting argument for point 2, but otherwise it pretty much reflects what was said…

[Map excerpt]

Nigel Williams (LION) • I agree with Bob but I’d add to point two not speaking our user base’s language. How many companies offer a failure to find for example (i.e.if you fail to find something in a search you submit a brief form which pops up automatically stating what you were looking for and where you expected to find it? Lots of comms and intranet teams are great at telling people and assuming we help them to learn but don’t listen and learn from all levels of the business.
If I make that number 1 I’ll also add:
2) Adopting social media because everyone else is, not because our business or users need it. This then ostracises the technophobics and concerns some of our less confident regular users. They then form clans of anti-intranetters and revert to tried and tested methods pre-intranet (instant messaging, shared drives, email etc.)
3) Not making the search box available enough. I’m amazed how many users in user testing say they’ve never noticed search hidden in the top right of the banner – “ebay has their’s in the middle of the screen, so does Google. Where’s ours?” is a typical response. If you have a user group at your mercy ask them to search for an item on on Google, then eBay, then Amazon, then finally your intranet. Note whether they search in the first three and then use navigation (left hand side or top menu) when in your intranet.

Nigel starts out by supporting Bob’s answer, and I therefore add his points as pros in the map. Having done this though, I can already see some future conversational patterns. Bob’s two supporting arguments for “not using the vocabulary of users” are actually two related issues. One is about user experience and the other is about user engagement/governance. Nevertheless, I have mapped it as he stated it at this point and we will see what happens.

[Map excerpt]

Luc de Ruijter • @Bob. I recognise your first 2 points. The third however might be a symptom or result, not a cause. Or is it information skills you are refering to?
How come metadata are not used? Clearly there is a rationale to put some effort in this?
@Nigel. Is the situation in which Comm. depts don’t really listen to users a reason for not finding stuff? Or would it be a lack of rapport with users before and while building intranets? Is the cause concepetual, rather than editorial for instance?
(I’m really looking for root causes, the symptoms we all know from daily experience).
Adding more media is something we’ve seen for years indeed. Media tend to create silo’s.
Is your third point about search or about usability/design?

In following sections I will not reproduce the entire map in the blog post – just relevant sections.

In this part of the conversation, Luc doesn’t add any new answers to the root question, but queries three that have been put forward thus far. Also note that at this point I believe one of Luc’s answers is for a different question. Bob’s “easier alternatives” point was never about metadata, yet Luc asks “how come metadata is not used?”. I have added it to the map here, changing the framing from a rhetorical question to an action. Having said that, if I were facilitating this conversation, I would have clarified that point before committing it to the map.

[Map excerpt]

Luc also indicates that the issue around communications and intranet teams not listening might be due to a lack of rapport.

[Map excerpt]

Finally, he adds an additional argument for why social media may not be the utopia it is made out to be: adding more media channels creates more information silos. He also argues against the entire notion on the grounds that this is a usability issue rather than a search issue.

[Map excerpt]

Nigel Williams (LION) • Hi Luc, I think regarding Comms not listening that it is two way. If people are expecting to find something with a certain title or keyword and comms aren’t recognising this (or not providing adequate navigation to find it) then the item is unlikely to be found.
Similarly my third point is again both, it is an issue of usability but if that stops users conducting searches then it would impact daily search patterns and usage.

I interpret this reply as Nigel arguing against Luc’s assertion around lack of rapport being the reason behind intranet and comms teams not listening and learning from all levels of the user base.

[Map excerpt]

Nigel finishes by arguing that even if social media issues are usability issues, they might still impede search and the idea is therefore valid.

[Map excerpt]

Bob Meier • I really like Nigel’s point about the importance of feedback loops on Intranets, and without those it’s hard to build a system that’s continually improving. I don’t have any data on it, but I suspect most companies don’t regularly review their search analytics even if they have them enabled. Browse-type searching is harder to measure/quantify, but I’d argue that periodic usability testing can be used in place of path analysis.
I also agree with Luc – my comment on users gravitating from the Intranet to easier alternatives could be a symptom rather than a cause. However, I think it’s a self-reinforcing symptom. When you eliminate other options for finding information, then the business is forced to improve the preferred system, and in some cases that can mean user training. Not seeing a search box is a great example of something that could be fixed with a 5-minute Intranet orientation.
If I were to replace my third reason, I’d point at ambiguous or mis-placed Intranet ownership . Luc mentions Communications departments, but in my experience many of those are staffed for distributing executive announcements rather than facilitating collective publishing and consumption. I’ve seen many companies where IT or HR own the Intranet, and I think the “right” department varies by company. Communications could be the right place depending on how their role is defined.

Bob makes quite a number of points in this answer, right across various elements of the unfolding discussion. Firstly, he makes a point about analytics and the fact that a lack of feedback loops makes it hard to build a system that continually improves.

[Map excerpt]

In terms of the discussion around easier alternatives, Bob offers some strategies to mitigate the issue. He notes that there are training implications when eliminating the easier alternatives.

[Map excerpt]

Finally, Bob identifies issues around the ownership of the intranet as another answer to the original question of people not being able to find stuff on the intranet. He also lists a couple of common examples.

[Map excerpt]

Karen Glynn • I think the third one listed by Bob is an effect not a cause.
Another cause could be data being structured in ways that employees don’t understand – that might be when it is structured by departments, so that users need to know who does what before they can find it, or when it is structured by processes that employees don’t know about or understand. Don’t forget intranet navigations trends are the opposite to the web – 80% of people will try and navigate first rather than searching the intranet.

In this answer, Karen starts by agreeing with the point Luc made about “easier alternatives” being a symptom rather than a cause, so there is no need to add it to the map as it is already there. However, she provides a new answer to the original question: the structure of information (this, by the way, is called top-down information architecture – and it was bound to come out of this discussion eventually). She also makes a claim that 80% of people will navigate prior to searching on the intranet. I wonder if you can tell what will happen next? 🙂

[Map excerpt]

Luc de Ruijter • @Nigel Are (customer) keywords the real cause for not finding stuff? In my opinion this limits the chalenge (of building effective intranet/websites) to building understandable navigation patters. But is navigation the complete story? Where do navigation paths lead users to?
@Bob Doesn’t an investiment in training in order to have colleagues use the search function sound a bit like attacking the symptom? Why is search not easy to locate in the first place? I’d argue you’re looking at a (functional) design flaw (cause) for which the (where is the search?) training is a mere remedy, but not a solution.
@Karen You mention data. How does data relate to what we conventionally call content, when we need to bring structure in it?
Where did you read the 80% intranet-users navigate before searching?

Okay, so this is the first time thus far that I have done a little bit of map restructuring. In the discussion so far, we had two ideas offered around the common notion of vocabulary. In this reply, Luc asks, “Are (customer) keywords the real cause for not finding stuff?” I wasn’t sure which vocabulary issue he was referring to, so this prompted me to create a “meta idea” called “Vocabulary and labelling issues”, of which there are two examples cited thus far. This allowed me to capture the essence of Luc’s comment as a con against the core idea of issues around vocabulary and labelling.

[Map excerpt]

Luc then calls into question Bob’s suggestion of training and eliminating the easier alternatives. Prior to Luc’s counter arguments, I had structured Bob’s argument like this:

[Map excerpt]

To capture Luc’s argument effectively, I restructured the original argument and made a consolidated idea to “eliminate other options and provide training”. This allowed me to capture Luc’s counter argument as shown below.

[Map excerpt]

Finally, Luc asked Karen for the source of her contention that 80% of users navigate intranets, rather than use the search engine first up.

[Map excerpt]

In this final bit of banter for now, the next three conversations did not add too many nodes to the map, so I have grouped them below…

Karen Glynn • Luc, the info came from the Neilsen group.

Helen Bowers • @Karen Do you know if the Neilsen info is available for anyone to look at?

Karen Glynn • I don’t know to be honest – it was in one of the ‘paid for’ reports if I remember correctly.

Luc de Ruijter • @Karen. OK in that case, could you provide us with the title and page reference of the source? Than it can become usable as a footnote (in a policy for instance).Thanks
Reasons so far for not finding stuff:
1. Lack of metadata (lack of content structure).
2. Inconsistent vocabulary and acronyms (customer care words).
3. Adopting social media from a hype-driven motivation (lack of coherence)
4. Bad functional design (having to search for the search box)
5. Lack of measuring and feedback on (quality, performance of) the intranet
6. Silo’s. Site structures suiting senders instead of users

So for all that banter, here is what I added to what has already been captured.

[Map excerpt]

Where are we at?

At this point, let’s take a breath and summarise what has been discussed so far. Below is the summary map with core answers to the question so far. I have deliberately tucked away the detail into sub maps so you can see what is emerging. Please note I have not synthesised this map yet (well … not too much anyway). I’ll do that in the next post.

[Summary map: core answers to the question so far]

If you want to see the entire map as it currently stands, take a look at the final image at the very bottom of this post (click to enlarge). I have also exported the entire map so far for you to view things in more context. Please note that the map will change significantly as we continue to capture and synthesise the rationale, so expect it to change quite a bit as we unpack the discussion.

Thanks for reading

Paul Culmsee


www.sevensigma.com.au

[Full map as at the end of Part 1]



The cloud is not the problem-Part 4: Industry shakeout and playing with the big kids…

Hi all

Welcome to the fourth post about the adaptive change that cloud computing is going to have on practitioners, paradigms and organisations. The previous two posts took a look at the dodgier side of two of the industry’s biggest players, Microsoft and Amazon. While I have highlighted some dumb issues with both, I nevertheless have to acknowledge their resourcing, scalability and ability to execute. On that point of ability to execute, in this post we are going to broaden our focus to the cloud industry as a whole and the inevitable consolidation that is, and will continue to be, taking place.

Now to set the scene: many people know that in the early twentieth century there were a great number of US car manufacturers. I wonder if you can guess how many defunct car manufacturers there have been before and after that time.

…Fifty?

…One Hundred?

Not even close…

What if I told you that there were over 1700!

Here is another interesting stat. The table below shows, by decade, the number of manufacturers that went bankrupt or ceased operations, along with the average lifespan of the companies that folded in that decade.

Decade   # defunct   Avg years in operation
1870s        4        5
1880s        2        1
1890s        5        1
1900s       88        3
1910s      660        3
1920s      610        4
1930s      276        5
1940s       42        7
1950s       13       14
1960s       33       10
1970s       11       19
1980s        5       37
1990s        5       16
2000s        3       49
2010s        5       42
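
If you want to sanity-check those figures, the quick sketch below (my own arithmetic, not from the original sources) tallies the table and confirms the “over 1700” claim:

```python
# Quick sanity check of the defunct-manufacturer table above.
defunct = {
    "1870s": 4, "1880s": 2, "1890s": 5, "1900s": 88, "1910s": 660,
    "1920s": 610, "1930s": 276, "1940s": 42, "1950s": 13, "1960s": 33,
    "1970s": 11, "1980s": 5, "1990s": 5, "2000s": 3, "2010s": 5,
}

total = sum(defunct.values())
print(f"Total defunct manufacturers: {total}")  # 1762 -- hence "over 1700"

# Which decade saw the most closures?
worst = max(defunct, key=defunct.get)
print(f"Worst decade: {worst} with {defunct[worst]} closures")
```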

Now, you would expect that the bulk of closures would be Depression-era, but note that the Depression did not start until the late 1920s, and during the boom times that preceded it, 660 manufacturers went to the wall – a worse result!

The pattern of consolidation


What I think the above table shows is the classic pattern of industry consolidation after an initial phase of innovation and expansion, where over time, the many are gobbled up by the few. As the number of players consolidates, those who remain grow bigger, with more resources and economies of scale. This in turn creates barriers to entry for new participants. Accordingly, the rate of attrition slows down, but that is due more to the fact that there are fewer players left in the industry. Those that are left continue to fight their battles, but now those battles take longer. Nevertheless, as time goes on, the number of players consolidates further.

If we applied a cloud/web hosting paradigm to the above table, I would equate the dotcom bust of 2000 with the depression era of the 1920s and 1930s. I actually think with cloud computing, we are in the 1960s and onward right now. The largest of the large players have now made big bets on the cloud and have entered the market in a big, big way. For more than a decade, other companies hosted Microsoft technology, with Microsoft showing little interest beyond selling licenses via them. Now Microsoft themselves are also the hosting provider. Does that mean most of the hosting providers will suffer the fate of Netscape? Or will they manage to survive the dance with Goliath like Citrix or VMware have?

For those who are not Microsoft or Amazon…


Imagine you have been hosting SharePoint solutions for a number of years. Depending on your size, you probably own racks or a cage in someone else’s data centre, or you own a small data centre yourself. You have some high-end VMware gear to underpin your hosting offerings, and you offer both managed SharePoint (i.e. a basic site collection subscription with no custom stuff – à la Office 365) and dedicated virtual machines for those who want more control (à la Amazon). You have dutifully paid your service provider licensing to Microsoft, have IT engineers on staff, some SharePoint specialists, a helpdesk and some dodgy sales guys – all standard stuff, and life is good. You had a crack at implementing SharePoint multi-tenancy, but found it all a bit too fiddly and complex.

Then Amazon comes along and shakes things up with their IaaS offerings. They are cost competitive, have more data centres in more regions, higher capacity, more fault tolerance, a wider variety of services and can scale more than you can. Their ability to execute in terms of offering new services is impossible to keep up with. In short, they slowly but relentlessly take a chunk of the market and continue to grow. So you naturally counter by pushing the legitimate line that you specialise in SharePoint, and that as a result, customers investing in such a complex tool as SharePoint are in much more trusted hands with you than with Amazon.

But suddenly the game changes again. The very vendor you provide cloud-based SharePoint services for now bundles it with Exchange and Lync and offers Active Directory integration (yeah, yeah, I know there was BPOS, but no-one actually heard of that). Suddenly the argument that you are a safer option than Amazon is shot down by the fact that Microsoft themselves now offer what you do. So whose hands are safer? The small hosting provider with limited resources, or the multinational with billions of dollars in the bank who develops the product? Furthermore, given Microsoft’s advantage in being able to mobilise knowledge resources with deep product knowledge, they have a richer managed service offering than you can provide (i.e. they offer multi-tenancy 🙂).

This puts you in a bit of a bind, as you are getting assailed at both ends. Amazon trumps your capabilities at the IaaS end and is encroaching on your space, and Microsoft is assailing the SaaS end. How does a small fish survive in a pond with the big ones? In my opinion, the mid-tier SharePoint cloud providers will have to reinvent themselves.

The adaptive change…

So for the mid-tier SharePoint cloud provider grappling with the fact that their play area is reduced because of the big kids encroaching, there is only one option. They have to be really, really good in areas the big kids are not good at. In SharePoint terms, this means they have to go to places many don’t really want to go: they need to bolster their support offerings and move up the SharePoint stack.

You see, traditionally a SharePoint hosting provider tends to take one of two approaches. They provide a managed service where the customer cannot mess with it too much (i.e. site collection admin access only). For those who need more than that, they will offer a virtual machine and wash their hands of any maintenance or governance beyond ensuring that the infrastructure is fast and backed up. Until now, cloud providers could get away with this, and the reason they take this approach should be obvious to anyone who has implemented SharePoint. If you don’t maintain operational governance controls, things can rapidly get out of hand. Who wants to deal with all that “people crap”? Besides, that’s a different skill set from the one required to run and maintain cloud services at the infrastructure layer.

So some cloud providers will kick and scream about this, and delude themselves into thinking that hosting and cloud services are their core business. For those who think this, I have news for you. The big boys think these are their core business too and they are going to do it better than you. This is now commodity stuff and a by-product of commoditisation is that many SharePoint consultancies are now cloud providers anyway! They sign up to Microsoft or Amazon and are able to provide a highly scalable SharePoint cloud service with all the value added services further up the SharePoint stack. In short, they combine their SharePoint expertise with Microsoft/Amazon’s scale.

Now on the issue of support, Amazon has no specific SharePoint skills and they never will. They are first and foremost a compelling IaaS offering. Microsoft’s support? … go and re-read Part 2 if you want to see that. It seems that no matter how big the multinational, level 1 tech support is always level 1 tech support.

So what strategies can a mid-tier provider take to stay competitive in this rapidly commoditising space? I think one is to go premium and go niche.

  • Provide brilliant support. If I call you, day or night, I expect to speak to a SharePoint person straight away. I want to get to know them on a first name basis and I do not want to fight the defence mechanism of the support hierarchy.
  • Partner with SharePoint consultancies or acquire consulting resources. The latter allows you to do some vertical integration yourself and broaden your market and offerings. A potential KPI for any SharePoint cloud provider should be that no support person ever says “sorry that’s outside the scope of what we offer.”
  • Develop skills in the tools and systems that surround SharePoint or invest in SharePoint areas where skills are lacking. Examples include Project Server, PerformancePoint, integration with GIS, Records management and ERP systems. Not only will you develop competencies that few others have, but you can target particular vertical market segments who use these tools.
  • (Controversial?) Dump your infrastructure and use Amazon in conjunction with another IaaS provider. You just can’t compete with their scale and price point. If you use them you will likely save costs, and when combined with a second provider you can play the resiliency card. Best of all … you can offer VPC 🙂

Conclusion

In the last two posts we looked at some of the areas where both Microsoft and Amazon sometimes struggle to come to grips with the SharePoint cloud paradigm. In this post, we took a look at the other cloud providers, who have to come to grips with competing against these two giants, who are clearly looking to eke out as much value as they can from the cloud pie. Whether or not you agree with my suggested strategy (Rackspace appears to), the pattern of the auto industry serves as an interesting parallel to the cloud computing marketplace. Is the relentless consolidation a good thing? Probably not in the long term (we will tackle that issue in the last post in this series). In the next post, we are going to shift our focus away from the cloud providers themselves, and turn our gaze to internal IT departments – who, until now, have had it pretty good. As you will see, a big chunk of the irrational side of cloud computing comes from this area.

 

Thanks for reading

Paul Culmsee

www.sevensigma.com.au



The cloud is not the problem–Part 3: When silos strike back…

What can Ikea fails tell us about cloud computing?

My next door neighbour is a builder. When he moved next door, the house was an old piece of crap. Within 6 months, he completely renovated it himself, adding in two bedrooms, an underground garage and all sorts of cool stuff. On the other hand, I bought my house because it was a good location, someone had already renovated it and all we had to do was move in. The reason for this was simple: I had a new baby and more importantly, me and power tools do not mix. I just don’t have the skills, nor the time to do what my neighbour did.

You can probably imagine what would happen if I tried to renovate my house the way my neighbour did. It would turn out like the Ikea fails in the video. Many SharePoint installs tend to turn out the same way. Moral of the story? Sometimes it is better to get something pre-packaged than to do it yourself.

In the last post, we examined the “Software as a Service” (SaaS) model of cloud computing in the form of Office 365. Other popular SaaS providers include SlideShare, Salesforce, Basecamp and Tom’s Planner, to name a few. Most SaaS applications are browser based and not as feature rich or complex as their on-premise competition. The SaaS model is therefore a bit like buying a kit home. In SaaS, no user of these services ever touches the underlying cloud infrastructure used to provide the solution, nor do they have a full mandate to tweak and customise to their heart’s content. SaaS is basically predicated on the notion that someone else will do a better set-up job than you, plus the old 80/20 rule about which features of an application actually get used.

Some people may regard the restrictions of SaaS as a good thing – particularly if they have dealt with the consequences of one too many unproductive customisation efforts previously. As many SharePointers know, the more you customise SharePoint, the less resilient it gets. Thus, restricting what sort of customisations can be done might in many circumstances be a wise thing to do.

Nevertheless, this actually goes against the genetic traits of pretty much every Australian male walking the planet. The reason is simple: no matter how much our skills are lacking, or how inappropriate our tools or training, we nevertheless always want to do it ourselves. This brings me to our next cloud provider: Amazon, and their Infrastructure as a Service (IaaS) model of cloud-based services. This is the ultimate DIY solution for those of us who find that SaaS cramps our style. Let’s take a closer look, shall we?

Amazon in a nutshell

Okay, I have to admit that as an infrastructure guy, I am genetically predisposed to liking Amazon’s cloud offerings. Why? Well, as an infrastructure guy, I am like my neighbour who renovated his own house: I’d rather do it all myself because I have acquired the skills to do so. So, for any server-hugging infrastructure people out there wondering what they have been missing out on: read on… you might like what you see.

Now first up, it’s easy for new players to get a bit intimidated by Amazon’s bewildering array of offerings, with brand names that make no sense to anybody but Amazon… EC2, VPC, S3, ECU, EBS, RDS, AMIs, Availability Zones – sheesh! So I am going to ignore all of their confusing brand names and just hope that you have heard of virtual machines, and will assume that you or your tech geeks know all about VMware. The simplest way to describe Amazon is VMware on steroids. Amazon’s service essentially allows you to create virtual machines within Amazon’s “cloud” of large data centres around the world. As I stated earlier, the official cloud terminology that Amazon is traditionally associated with is Infrastructure as a Service (IaaS). This is where, instead of providing ready-made applications like SaaS, a cloud vendor provides lower-level IT infrastructure for rent. This consists of stuff like virtualised servers, storage and networking.

Put simply, utilising Amazon, one can deploy virtual servers with one’s choice of operating system, applications, memory, CPU and disk configuration. Like any good “all you can eat” buffet, one is spoilt for choice. One simply chooses an Amazon Machine Image (AMI) to use as a base for creating a virtual server. You can choose one of Amazon’s pre-built AMIs (base installs of Windows Server or Linux) or you can choose an image from the community-contributed list of over 8000 base images. Pretty much any vendor out there who sells a turn-key solution (such as those all-in-one virus scanning/security solutions) has likely created an AMI. Microsoft have also gotten in on the Amazon act and created AMIs for you, optimised by their product teams. Want SQL 2008 the way Microsoft would install it? Choose the Microsoft Optimized Base SQL Server 2008R2 AMI, which “contains scripts to install and optimize SQL Server 2008R2 and accompanying services including SQL Server Analysis services, SQL Server Reporting services, and SQL Server Integration services to your environment based on Microsoft best practices.”

The series of screenshots below shows the basic idea. After signing up, use the “Request instance wizard” to create a new virtual server by choosing an AMI first. In the example below, I have shown the default Amazon AMIs under “Quick start” as well as the community AMIs.

[Screenshot: Amazon’s default AMIs]
[Screenshot: Community contributed AMIs]

From the list above, I have chosen Microsoft’s “Optimized SQL Server 2008 R2 SP1” from the community AMIs and clicked “Select”. Now I can choose the CPU and memory configuration. Hmm, how does a 16-core server with 60 gig of RAM sound? That ought to do it… 🙂

[Screenshot: choosing CPU and memory configuration]

Now I won’t go through the full process of commissioning virtual servers, but suffice to say that you can choose which geographic location this server will reside in within Amazon’s cloud, and after 15 minutes or so, your virtual server will be ready to use. It can be assigned a public IP address, firewall restricted and then remotely managed as per any other server. This can all be done programmatically too. You can talk to Amazon via web services to start, monitor, terminate, etc. as many virtual machines as you want, which allows you to scale your infrastructure on the fly and very quickly. There are no long procurement times, and you only pay for the servers that are currently running. If you shut them down, you stop paying.
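
To give a flavour of that programmatic control, here is a minimal sketch using the modern boto3 Python library (which postdates this post – at the time you would have used the original boto library or the raw web service APIs directly). The AMI ID, key pair name and instance type are placeholders, not real values:

```python
# Launch, inspect and terminate an EC2 virtual server programmatically.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Launch a new virtual server (an "instance") from a chosen AMI.
response = ec2.run_instances(
    ImageId="ami-12345678",   # placeholder AMI ID
    InstanceType="m1.large",  # the CPU/memory configuration
    MinCount=1,
    MaxCount=1,
    KeyName="my-keypair",     # placeholder key pair for remote access
)
instance_id = response["Instances"][0]["InstanceId"]

# Monitor its state...
result = ec2.describe_instances(InstanceIds=[instance_id])
print(result["Reservations"][0]["Instances"][0]["State"]["Name"])

# ...and stop paying the moment you no longer need it.
ec2.terminate_instances(InstanceIds=[instance_id])
```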

But what makes it cool…

Now I am sure that some of you might be thinking, “big deal… any virtual machine hoster can do that.” I agree – when I first saw this capability, I just saw it as a larger-scale VMware/Xen type deployment. But what really made me sit up and take notice was Amazon’s Virtual Private Cloud (VPC) functionality. The super-duper short version of VPC is that it allows you to extend your corporate network into the Amazon cloud. It does this by allowing you to define your own private network and connecting to it via site-to-site VPN technology. To see how it works diagrammatically, check out the image below.

[Diagram: extending the corporate network into Amazon via VPC]

Let’s use an example to understand the basic idea. Let’s say your internal IP address range at your office is 10.10.10.0 to 10.10.10.255 (a /24 for the geeks). With VPC you tell Amazon, “I’d like a new IP address range of 10.10.11.0 to 10.10.11.255”. You are then prompted to tell Amazon the public IP address of your internet router. The screenshots below show what happens next:

[Screenshot: choosing the type of router at your end]

[Screenshot: the sample router configuration that is downloaded]

The first screenshot asks you to choose what type of router is at your end. Available choices are Cisco, Juniper, Yamaha, Astaro and generic. The second screenshot shows a sample configuration that is downloaded. Now, any Cisco-trained person reading this will recognise what is going on here. This is the automatically generated configuration to be added to an organisation’s edge router to create an IPSEC tunnel. In other words, we have extended our corporate network itself into the cloud. Any service can be run on such a network – not just SharePoint. For smaller organisations wanting the benefits of off-site redundancy without the costs of a separate datacentre, this is a very cost-effective option indeed.

For the Cisco geeks, the actual configuration is two GRE tunnels that are IPSEC encrypted. BGP is used for route table exchange, so Amazon can learn what routes to tunnel back to your on-premise network. Furthermore, Amazon allows you to manage firewall settings at the Amazon end too, so you have an additional layer of defence past your IPSEC router.
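
For completeness, the same setup can be driven through the API as well. Below is a hedged boto3 sketch of the steps just described – define the private range, register your on-premise router, and create the IPSEC VPN connection whose response contains that auto-generated router configuration. All IP addresses and the BGP ASN are illustrative placeholders:

```python
# Sketch: building a VPC with a site-to-site VPN back to the office.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# 1. Define the new private address range that will live inside Amazon.
vpc = ec2.create_vpc(CidrBlock="10.10.11.0/24")

# 2. Tell Amazon about your on-premise edge router.
cgw = ec2.create_customer_gateway(
    Type="ipsec.1",
    PublicIp="203.0.113.10",  # placeholder: your internet router's public IP
    BgpAsn=65000,             # placeholder ASN, used for BGP route exchange
)

# 3. Create Amazon's side of the tunnel and attach it to the VPC.
vgw = ec2.create_vpn_gateway(Type="ipsec.1")
ec2.attach_vpn_gateway(
    VpnGatewayId=vgw["VpnGateway"]["VpnGatewayId"],
    VpcId=vpc["Vpc"]["VpcId"],
)

# 4. Create the IPSEC VPN connection. The response includes the generated
#    edge router configuration described above.
vpn = ec2.create_vpn_connection(
    Type="ipsec.1",
    CustomerGatewayId=cgw["CustomerGateway"]["CustomerGatewayId"],
    VpnGatewayId=vgw["VpnGateway"]["VpnGatewayId"],
)
print(vpn["VpnConnection"]["CustomerGatewayConfiguration"][:200])
```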

This is called Virtual Private Cloud (VPC) and, when configured properly, is very powerful. Note the “P” is for private. No server deployed to this subnet is internet accessible unless you choose it to be. This allows you to extend your internal network into the cloud and gain all the provisioning, redundancy and scalability benefits without direct exposure to the internet. As an example, I did a hosted SharePoint extranet where we used SQL log shipping of the extranet content databases back to a DMZ network for redundancy. Try doing that on Office 365!

This sort of functionality shows that Amazon is a mature, highly scalable and flexible IaaS offering. They have been in the business for a long time and it shows because their full suite of offerings is much more expansive than what I can possibly cover here. Accordingly my Amazon experiences will be the subject of a more in-depth blog post or two in future. But for now I will force myself to stop so the non-technical readers don’t get too bored. 🙂

So what went wrong?

So after telling you how impressive Amazon’s offering is, what could possibly go wrong? Like the Office 365 issue covered in Part 2, absolutely nothing with the technology. To understand why, I need to explain Amazon’s pricing model.

Amazon offer a couple of ways to pay for servers (called instances in Amazon speak). An on-demand instance is charged at a per-hour price while the server is running. The more powerful the server is in terms of CPU, memory and disk, the more you pay. To give you an idea, Amazon’s pricing for a Windows box with 8 CPUs and 16GB of RAM, running in Amazon’s “US east” region, will set you back $0.96 per hour (as of 27/12/11). If you do the basic math, that equates to around $8409 per year, or $25228 over three years. (Yeah, I agree that’s high – even when you consider that you get all the trappings of a highly scalable and fault-tolerant datacentre.)

On the other hand, a reserved instance involves making a one-time payment and in turn receiving a significant discount on the hourly charge for that instance. Essentially, if you are going to run an Amazon server on a 24*7 basis for more than 18 months or so, a reserved instance makes sense as it saves considerable cost over the long term. The same server would only cost you $0.40 per hour if you pay an up-front $2800 for a 3-year term. Total cost: $13312 over three years – much better.
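
Here is the arithmetic behind those figures as a quick worked example (rates as of 27/12/11 for the 8-CPU/16GB Windows instance above):

```python
# Worked comparison of on-demand vs reserved instance pricing.
HOURS_PER_YEAR = 24 * 365
YEARS = 3

# On-demand: pay $0.96 for every hour the server runs.
on_demand_total = 0.96 * HOURS_PER_YEAR * YEARS         # $25,228.80

# Reserved: $2800 up front buys a discounted $0.40/hour for the 3-year term.
reserved_total = 2800 + 0.40 * HOURS_PER_YEAR * YEARS   # $13,312.00

print(f"On-demand, 3 years 24*7: ${on_demand_total:,.2f}")
print(f"Reserved,  3 years 24*7: ${reserved_total:,.2f}")
print(f"Saving:                  ${on_demand_total - reserved_total:,.2f}")
```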

So with that scene set, consider this scenario. Back at the start of 2011, a client of mine consolidated all of their SharePoint cloud services to Amazon from a variety of other hosting providers. They did this for a number of reasons, but it basically boiled down to the fact that they had 1) outgrown the SaaS model and 2) a growing number of clients. As a result, requirements from clients were getting more complicated and beyond what most of the hosting providers could cater for. They also received irregular and inconsistent support from their existing providers, as well as some unexpected downtime that reduced confidence. In short, they needed to consolidate their cloud offering and manage their own servers. They were developing custom SharePoint solutions, needed to support federated claims authentication and required disaster recovery assurance to mitigate the risk of going 100% cloud. Amazon’s VPC offering in particular seemed ideal, because it allowed full control of the servers in a secure way.

Now, making this change was not something we undertook lightly. We spent considerable time researching Amazon’s offerings, trying to understand all the acronyms as well as their fine print. (For what it’s worth, I used IBIS as the basis to develop an assessment, and the map of my notes can be found here.) As you are about to see though, we did not check well enough.

Back when we initially evaluated the VPC offering, it was only available in very few Amazon locations (two in the USA) and the service was still in beta. This caused us a bit of a dilemma at the time because of the risk of relying on a beta service. But we were reassured when Amazon confirmed that VPC would eventually be available in all of their datacentres. We also stress-tested the service for a few weeks; it remained stable, and we developed and tested a disaster recovery strategy involving SQL log shipping and a standby farm. We also purchased reserved instances from Amazon, since these servers were going to be there for the long haul, so we pre-paid to reduce the hourly rates. Quite a complex configuration was provisioned in only two days and we were amazed by how easy it all was.

Things hummed along for 9 months in this fashion and the world was a happy place. We were delighted when Amazon notified us that VPC had come out of beta and was now available in any of Amazon’s datacentres around the world. We only used the US datacentre because it was the only location available at the time. Now we wanted to transfer the services to Singapore. My client contacted Amazon about some finer points on such a move and was informed that they would have to pay for their reserved instances all over again!

What the?

It turns out, reserved instances are not transferable! Essentially, Amazon were telling us that although we paid for a three-year reserved instance, and only used it for 9 months, moving the servers to a new region would mean paying all over again for another 3-year reserve. According to Amazon’s documentation, each reserved instance is associated with a specific region, which is fixed for the lifetime of the reserved instance and cannot be changed.

“Okay,” we answer, “we can understand that in circumstances where people move to another cloud provider. But in our case we are not.” We had used around a third of the reserved instance. So surely Amazon should pro-rata the unused amount and offer that as a credit when we re-purchase reserved instances in Singapore? I mean, we would still be hosting with Amazon, so overall, they would not be losing any revenue at all. On the contrary, we would be paying them more, because we would have to sign up for an additional 3 years of reserve when we move the services.

So we ask Amazon whether that can be done. “Nope,” comes back the answer from Amazon’s not-so-friendly billing team, with one of those trite and grossly insulting “Sorry for any inconvenience this causes” ending sentences. After more discussions, it seems that internally within Amazon, each region, or datacentre within each region, is its own profit centre. Therefore, in typical silo fashion, the US datacentre does not want to pay money to the Singapore operation, as that would mean the revenue we paid would no longer be recognised against them.

Result? The customer is screwed, all because the Amazon fiefdoms don’t like sharing the contents of the till. But hey – the regional managers get their bonuses, right? 🙁

Conclusion

Like Part 2 of this cloud computing series, this is not a technical issue. Amazon’s cloud service in our experience has been reliable and has performed well. In this case, we are turned off by the fact that their internal accounting procedures create a situation that is not great for customers who wish to remain loyal to them. In a post about the danger of short-termism and ignoring legacy, I gave the example of how dumb it is for organisations to think they are measuring success based on how long it takes to close a helpdesk call. When such a KPI is used, those in support roles have little choice but to artificially close calls when users’ problems have not been solved, because that is how they are deemed to be performing well. The reality, though, is that rather than measuring happy customers, this KPI simply rewards the helpdesk operators who have managed to game the system by getting callers off the phone as soon as they can.

I feel that Amazon are treating this as an internal accounting issue, irrespective of client outcomes. Amazon will lose the business of my client because of this, since they host enough servers that the financial impost of paying all over again is much more than the cost of transferring to a different cloud provider. While VPC and automated provisioning of virtual servers are cool and all, at the end of the day many hosting providers can offer this if you ask them. Although it might not be as slick and fancy as Amazon’s automated configuration, it nonetheless is very doable, and the other providers are playing catch-up. Like Apple, Amazon are enjoying the benefits of being first to market with their service, but as competition heats up, others will rapidly bridge the gap.

Thanks for reading

Paul Culmsee



The cloud isn’t the problem–Part 2: When complex technology meets process…

Hi all

Welcome to my second post that delves into the irrational world of cloud computing. In the first post, I described my first foray into the world of web hosting, which started way back in 2000. Back then I was more naive than I am now (although when it comes to predicting the future, I am as naive as anybody else). I concluded Part 1 by asserting that cloud computing is an adaptive change. We are going to explore the effects of this and the challenges it poses in the next few posts.

Adaptive change occurs in a number of areas, including the companies providing a cloud application – especially if on-premise has been the basis of their existence previously. To that end, I’d like to tell you an Office 365 fail story and then see what lessons we can draw from it.

Office 365 and Software as a Service…

For those who have ignored the hype, Office 365 is known in cloud speak as “Software as a Service” (SaaS). Basically, one gets SharePoint, Exchange mail, web versions of Office applications and Lync all bundled up together. In Office 365, SharePoint is not run on-premise at all; instead it is all run from Microsoft servers in a subscription arrangement. Once a month you pay Microsoft for the number of users using the service and the world is a happy place.

Office 365, like many SaaS models, keeps much of the complexity of managing SharePoint in the hands of Microsoft. A few years back, Office 365 would have been described in hosting terms as a managed service. Like all managed services, one sacrifices a certain level of control by outsourcing the accompanying complexity. You do not manage or control the underlying cloud infrastructure including network, servers, operating systems, SharePoint farm settings or storage. Furthermore, limited custom code will run on Office 365, because developers do not have back-end access. Only sandbox solutions are available, and even then, there are some additional limitations when compared to on-premise sandbox solutions. You have limited control of SharePoint service applications too, so the best way to think about Office 365 is that your administrative control extends to the site collection level (this is not actually true but suffices for this series.)

One key reason why it’s hard to get feature parity between on-premise and SaaS equivalents is that many SaaS architectures are based around the concept of multitenancy. If you have heard this word bandied about in SharePoint land, it is because it is something that is supported in SharePoint 2010. But the concept extends to the majority of SaaS providers. To understand it, imagine a swanky office building in the up-market part of town. It has a bunch of companies that rent office space and are therefore tenants. No tenant can afford an entire building, so they all lease office space and enjoy certain economies of scale, like a great location, good parking, security and so on. This does have a trade-off though. The tenants have to abide by certain restrictions. An individual tenant can’t just go and paint the building green because it matches their branding. Since the building is a shared resource, it is unlikely the other tenants would approve.

Multi-tenancy allows the SaaS vendor to support multiple customers on a single platform. The advantage of this model is economies of scale, but the trade-off is the aforementioned loss of customisation flexibility. SaaS vendors will talk this up by telling you that SaaS applications can be updated more frequently than on-premise software, since there is less customisation complexity from each individual customer. While that’s true, it nevertheless means a loss of control or choice in areas like data security, integration with on-premise systems, latency and the flexibility of the application to accommodate change as an organisation grows.
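
To make the multi-tenancy concept a little more concrete, here is a minimal sketch of how it surfaces in on-premise SharePoint 2010, where tenancies are implemented as “site subscriptions”. The cmdlets are real SharePoint 2010 Management Shell cmdlets, but the URL, account name and farm configuration are illustrative assumptions only:

```powershell
# Create a site subscription - the container that represents one tenant.
$tenant = New-SPSiteSubscription

# Create a site collection that belongs to that tenant. Site collections
# joined to a subscription share partitioned service data, but are
# isolated from every other tenant's data on the same farm.
New-SPSite -Url "http://hosting.example.com/sites/contoso" `
           -SiteSubscription $tenant `
           -OwnerAlias "HOSTING\contosoadmin" `
           -Template "STS#0"
```

The point of the sketch is the isolation boundary: the “building” (the farm) is shared, but each “tenant” only ever sees their own floor.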

A small example of the restrictive effect of multi-tenancy is what happens when you upload a PDF into a SharePoint document library in Office 365. You cannot open the PDF in the browser; instead you are prompted to save it locally. This is because of a well-known issue with a security feature that was added to IE8. In the on-premise SharePoint world, you can modify the behaviour by changing the “Browser File Handling” option in the settings of the affected web application. But with Office 365, you have to live with it (or use a less than elegant workaround), because you do not have any access at the web application level to change the behaviour. Changing it would affect every tenant serviced by that web application.
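
For comparison, this is roughly what the on-premise fix looks like (a sketch using the SharePoint 2010 Management Shell; the web application URL is a placeholder). It is precisely this web-application-level access that an Office 365 tenant does not get:

```powershell
# Relax "Browser File Handling" for a web application so that PDFs
# open in the browser rather than forcing a save-to-disk prompt.
$webApp = Get-SPWebApplication "http://intranet.example.com"
$webApp.BrowserFileHandling = "Permissive"   # the default is "Strict"
$webApp.Update()
```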

Minor annoyances aside, if you are a small organisation or you need to mobilise quickly on a project with a geographically dispersed team, Office 365 is a very sweet offering. It is powerful and integrated, and while not fully featured compared to on-premise SharePoint, it is nonetheless impressive. One can move very quickly and be ready to go within one or two business days – that is, if you don’t make a typo…

How a typo caused the world to cave in…

A while back, I was part of a geographically dispersed, multi-organisation team that needed a collaborative portal for around a year. Given that the project team was spread across different organisations and various parts of Australia, that one of the key stakeholders had suggested SharePoint, and that Office 365 behaved much better than Google Apps behind the overly paranoid proxy servers of participating organisations, Office 365 seemed ideal and we resolved to use it. I signed up for a Microsoft Office 365 E3 service.

Now when I say sign up: Microsoft uses Telstra in Australia as their Office 365 partner, so I was directed to Telstra’s sign-up site. My first hint of trouble to come was when I was asked to re-enter my email address in the signup field. Through some JavaScript wizardry no doubt, I was unable to copy/paste my email address into the confirmation field. They actually made me re-type it. “Hmm,” I thought, “they must really be interested in data validation. At least it reduces the chance that people paste the wrong information into a critical field.” I also noted some nice JavaScript that indicated the strength of the chosen password as it was typed.

But that’s where the fun ended. Soon after entering the necessary details, and the obligatory payment details, I was asked to enter a mysterious thing listed only as an Organization Level Attribute and, more specifically, a “Microsoft Online Services Company Identifier.” Checking the question mark icon told me that it is “used to create your Microsoft Online Services account identity.”

image

I wondered if this was the domain name for the site, as there was no descriptive indicator as to the significance of this code. For all I knew, it could have been a Microsoft admin code or an accounting code. Nevertheless I assumed it was the domain name, because I just had a feeling it was. So I entered my online identity and away I went. I got a friendly email message to say things were in motion and I waited the obligatory hour or so for things to provision.

My inbox chimed and I received two emails. One told me I now had a “Telstra t-Suite account” and the other was entitled “Registration confirmation from Microsoft Online.” I was thanked for purchasing, and the email stated that “the services are managed via Microsoft Online Portal (MOP), a separate portal to the Telstra T-Suite Management Console.” I had no idea what the Telstra T-Suite Management Console was at this point, but I was invited to log into the Microsoft Online Portal with a supplied username and password.

At this point I swore… I could see from my username that I had made a typo in the Microsoft Online Services Company Identifier. Username: admin@SampleProjject.onmicrosoft.com – which means I typed “SampleProjject” instead of “SampleProject” (Aargh!)

The saga begins…

Swearing at my dyslexic typing, I logged a support call with Telstra in the faint hope that I could change this before it was too late… Below is the anonymised mail I sent:

“Hiya

In relation to the order below I accidentally set SampleProjject as the identifier when it should be SampleProject. Can this be rectified before things are commissioned?

Thanks

Paul”

Another hour passed by and my inbox chimed again with a completely unsurprising reply to my query.

“Hi Paul , sorry but company identifier can not be changed because it is used to identify the account in Office 365 database.”

Cursing once again at my own lack of checking, I could not help but shake my head: while I was forced to type in my email address twice (and with cut and paste disabled) when I signed up to Office 365, I was given no opportunity to verify the Microsoft Online Services Company Identifier (henceforth known as the MOSI) before giving the final go-ahead. Surely this identifier is just as important as the email address? So why not ask for it to be entered twice, or make its purpose visually clear? Then dumb users like me would get a second chance before opening the hellgate and unleashing forces that can never be contained.

At the end of the day though, the fault was mine; while I think Telstra could do better with their validation and with conveying the significance of the MOSI, I caused the issue.

Forces are unleashed…

So I logged into Telstra’s t-suite system and tried to locate my helpdesk call entry. The t-suite site, although not SharePoint, has a bit of a web part feel about it – only as if every web part’s height has been fixed far too small. It turned out that their site doesn’t handle IE9 well. If you look closely, the “my helpdesk cases” and “my service access” sections are collapsed to the point that I couldn’t actually see anything. So I tried Chrome and was able to operate the portal like a normal person would. My teeth gnashed once more…

image  image

Finally able to take action, I opened my support request and asked the following:

*** NOTES created by Paul Culmsee
Can I cancel this account and re provision? A typo was made when the MOSI was entered.  The domain name is incorrect for the site.

A few emails went back and forth and I received confirmation that the account was cancelled. I then returned to the Office 365 site and re-applied for an E3 service. This time I triple-checked my spelling of the MOSI and clicked “proceed.” I received an email thanking me for my application and advising that I should receive a provisioning notification within an hour or so.

So I wait…

and wait…

and wait…

and wait…

24 hours went by and I received no notification of the E3 service being provisioned. I logged into Telstra’s t-suite and logged a new call, asking when things would be provisioned. Here is what I asked…

Hi there, I have had no notification of this being provisioned from Microsoft. Surely this should be done by now?

In typical level 1 helpdesk fashion, the guy on the other end did not actually read what I wrote. He clearly missed the word “no”:

Hi Paul,

that’s affirmative. Your T-Suite order has been provisioned. As per the instructions in the welcome email you can follow the links to log in to portal.

Contact me on 1800TSUITE Option 2.3 to discuss it further. I’ll keep this case open for a week.

*sigh* – this sort of bad level 1 email support actually does a lot of damage to the reputation of an organisation, so I mailed back…

But I received no welcome email from Microsoft with the online password details… I have no means to log into the portal

This inane exchange cost me half a day, so I took Telstra’s friendly advice and contacted them “on 1800TSUITE Option 2.3 to discuss it further.” I got a pretty good tech who realised there was indeed a problem. He told me he would look into it and I thanked him for his time. Sometime later he called back and advised me that something was messed up in the provisioning process, and that the easiest thing to do was for him to delete my most recent E3 application and for me to sign up from scratch using a totally different email address and a totally new MOSI. Somehow, either Telstra’s or Microsoft’s systems had associated my email address and MOSI with the original, failed attempt to sign up (the one with the typo), and it was causing the provisioning process to throw an exception somewhere along the line.

Hearing this, I could imagine some giant PowerShell provisioning script with dodgy exception handling getting halfway through and then dying on them. So I was happy to follow the tech’s advice and went through the entire Office 365 sign-up process from the very beginning again (this was the third time). This time I used a fresh email address and quadruple-checked all of the fields before I provisioned. Eureka! This time things worked as planned. I received all the right confirmation emails and was able to sign into the Microsoft Online Portal. From there I created user accounts, provisioned a SharePoint site collection and we were ready to rock and roll. Although the entire saga ended up taking 5 business days from start to finish, I had my portal and the project team got down to business.
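
Purely for illustration – this is my imagination at work, not anything Telstra or Microsoft actually runs – the anti-pattern I was picturing looks something like this (every cmdlet name here is hypothetical):

```powershell
# An imagined provisioning script with dodgy exception handling:
# one swallowed error and the tenant is left half-created, with the
# billing record orphaned and no rollback or retry.
$mosi = "SampleProject"   # hypothetical tenant identifier

try {
    New-TenantBillingRecord  -Identifier $mosi   # hypothetical cmdlet
    New-TenantDirectory      -Identifier $mosi   # hypothetical cmdlet
    New-TenantSiteCollection -Identifier $mosi   # hypothetical cmdlet
}
catch {
    # The worst kind of handler: log nothing, clean up nothing,
    # and let the caller assume everything succeeded.
}
```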

Now for what it’s worth, it should be noted that if you are an integrator or are in the business of managing multiple Office 365 services, Telstra requires a different email address to be used for each Office 365 service you purchase. One cannot have an alias like provision@myoffice365supportprovider as the general account used to provision multiple E1-E4 services. Each needs its own t-suite account with a different email address.

Plunged into darkness…

Things hummed along for a couple of months with no hiccups. We received an invoice for the service by email and then, a couple of days later, received a mail confirming that the invoice had been automatically paid via credit card. For our purposes, Office 365 was a really terrific solution and the project team really liked it and were getting a lot of value out of it.

I then had to travel overseas, and while I was gone the project team suddenly became unable to log in to the portal. They would receive a “subscription expired” message when attempting to log in. This was pretty serious, as the project team was coming up to an important deadline and now no-one could log in. We checked the VISA records and it seemed that the latest invoice had not been deducted from the account, as there was still a balance owing. Since I was overseas, one of my colleagues immediately called Telstra support (it was now after hours in Perth), was stuck in a queue for an hour and then ended up speaking to two support people. After all of the fuss with the provisioning issues around the MOSI and my typo, it seemed that Telstra support didn’t actually know what a MOSI was in any event. This is what my colleague said:

I was asked for an account number straight away both times, and I explained that I didn’t have one, but I did have the invoice number in question, and that this was a Microsoft Office 365 subscription. They were still unable to locate the account or invoice. I then gave them the MOSI, thinking this would help. Unfortunately, they both had no idea what I was talking about! I explained that users were unable to login to the site with a ‘subscription expired’ error message. I also explained the fact that the VISA had not been processed for this period (although it was fine in the last period).

Both support staff could not access the Office 365 subscription information (even after I gave them our company name). Because I called after hours, the t-suite department was not available. The two staff I talked to could not access the account, so could not pull up any of the relevant details. It turns out that after business hours, Telstra redirects t-suite support to the mobile and phones department. The first support person passed me on to technical, but the transfer was rerouted to the original call menu – so I went through the whole thing again: press x for this, press x for that, etc. The second time round, I explained it all over again. The tech assured me that it couldn’t be a billing issue and that Telstra generally would not suspend an account over a payment that was a few days late. If that were the case, prior to suspension Telstra would send out an email to notify customers of the overdue payment. I told him that no such email had been sent. He then said that it would most likely be a technical problem and would have to be dealt with the next day, as the t-suite department would not be available until the next morning, office hours being 9am-5pm EST.

I hung up frustrated, no closer to solving the problem after two hours on the phone.

My colleague then got up early and called Telstra at 6am the next day (9am EST is 3 hours ahead of Perth time). She explained the situation to a Telstra t-suite support person all over again. Here again are the words of my colleague:

The first person who took my call (who I will call “girl one”) couldn’t give me an answer and said she’d get someone to call back, and in the meantime she’d check with another department for me. She put me on hold and during this time the call was re-routed back to the original menu you get when you first call. Rather than wait for a call that I might not receive soon – this was an emergency – I went through the menu again. This time I got “girl two” and explained the whole thing *again*. I got her to double-check that the E3 subscription was set to automatically deduct from the VISA supplied – yes, it was. She noticed that it said 0 licenses available. She told me that she was not sure what that was all about, so she would log a call with Microsoft. Girl two advised me that it could take anywhere between an hour and a few days for a response from Microsoft.

I then got a call from Telstra (girl three) on the cell phone just after I finished with girl two. This was the person who girl one had promised would call back. I told her what I’d gone through with all the support staff so far, and that girl two was going to log a call with Microsoft. Girl three, like girl two, noticed the 0 licenses available. She wasn’t sure whether that was because there were none to begin with or because there were no more available. I stated that the site had been working fine until yesterday. I explained that no one could access the site and that they all got the same message. Like girl two, girl three advised that she would also log a call with Microsoft. Again, I was told that it could take up to several days before I could get a reply.

Half an hour later, we received an email from Telstra t-suite support. It stated the following:

Case Number: xxxxxxxx-xxxxxxx

Case Subject: subscription has expired for all users

I checked your account info and invoices. The invoice xxxxx paid for 01 Oct to 01 Nov was for company ID SampleProjject not SampleProject. Please call billing department to change it for you.

With this email, we now knew that the core problem was related to billing in some way. As far as we had been told, Telstra had deleted the original two failed Office 365 subscriptions, but apparently not from their billing systems. The bill had been paid against a phantom E3 service – the deleted one called “SampleProjject”. Accordingly, the live service had expired and users were locked out of the system.

As instructed in the above email, my colleague called up t-suite billing (there was a phone number on the invoice). In her words:

Once again, the support person asked for the account number, to which I said I didn’t have one. I offered him the invoice number and the MOSI, thinking someone’s got to know what it was, since it was ‘used to identify the account in Office 365 database.’ He stated he could not ‘pull up an account with the MOSI’ and said something to the effect that he didn’t know what the MOSI ‘was all about’. He asked what company registered the service and I gave him our details. He immediately saw several ‘accounts’ in the billing system related to our company. He noted that the production E3 was a trial subscription, that the trial had now expired, and he surmised that the problem was most likely due to that fact. I queried why this was the case when the subscription was set to automatically deduct payment from the supplied VISA account. He told me that since going from trial to production was a sales thing, I would have to speak to the t-suite sales department. He also added that we were lucky, because there was a risk that the mistakenly expired E3 service could have been deleted from Office 365.

I called up sales and finally, they were able to correct the problem.

So after a long, stressful and chaotic evening and morning, Armageddon was averted and the portal users were able to log in again.

Reflections…

This whole story started from something seemingly innocuous – a typo that I made in a poorly described text box (the MOSI). From it came a chain of events that could have resulted in a production E3 service being mistakenly deleted. There were multiple failures at various levels (including my bad typing that set this whole thing off). Nevertheless, the first thing that becomes obvious is that this was a high-risk issue that had utterly nothing to do with the Office 365 service itself. As I said, the feedback from the project team has been overwhelmingly positive for Office 365. There was no bug and no extended outage caused by technical factors. Instead, it was the lack of resilience in the systems and processes that surround the Office 365 service. At the end of the day, we almost got nailed because of a billing screw-up, exacerbated by some poor technical support outcomes. Witness the number of people and departments my colleague had to go through to get a straight answer, as well as the two times she was redirected back to the main phone menu system when she was supposed to be transferred.

Now I don’t blame any of the tech support staff (okay, except the first guy who did not read my initial query). I think the tech support staff were equally hamstrung by immature process and poor integration of systems. What was truly scary about this issue was that it snuck up on us from left field. We thought the issue was resolved once the service was finally provisioned (third time lucky), and we had email receipts of paid invoices. Yet this near-fatal flaw was there all along, only manifesting some three months later when the evaluation period expired.

I think there are a number of specific aspects to this story that Microsoft needs to reflect on. I have summarised these below:

  • Why is the registration process to sign up to Office 365 via Telstra such a complete fail of the “Don’t Make Me Think” test?
  • Why is the significance of the MOSI not made clearer when you first enter it (given that you have to enter your email address twice)?
  • Why did no-one at all in Telstra support have the faintest idea what a MOSI is?
  • When you entrust your data and services to a cloud provider, how confident do you feel when tech support completely misinterprets your query and answers the opposite question?
  • How do you think customers with a critical issue feel when the company that sits between them and Microsoft says it will take “between an hour and a few days” to get a response from Microsoft? Vote of confidence?
  • How do you think customers with a critical issue feel when the company that sits between them and Microsoft redirects tech support to its cell phone division after hours?
  • How do you think customers with a critical issue feel when the company that sits between them and Microsoft passes them around from department to department to solve an issue and, along the way, re-routes them back to the main support line?
  • We were advised to delete our E3 accounts and start all over again. Why did Telstra’s systems not delete the service from their billing systems? Presumably the systems are not integrated, given that from a billing perspective the old E3 service was still there.

Now I hope that I don’t sound bitter and twisted from this experience. In fact, the experience reinforced what most in IT strategy already know: it’s not about the technology. I still like what Office 365 offers and I will continue to use and recommend it under the right circumstances. This experience was simply a sobering reality check that all of the cool features amount to naught when they can be undone by dodgy underlying support structures. I hope that Microsoft and Telstra read this and learn from it too. From a customer perspective, having to work through Telstra as a proxy for Microsoft feels like an additional layer of defence on behalf of Microsoft. Is all of this duplication really necessary? Why can’t Australian customers work directly with Microsoft like US customers can?

Moving on…

No cloud provider is immune to these sorts of stories – and for that matter, no on-premise provider is immune either. So for the Amazon fanboys out there who want to take this post as evidence to dump on Microsoft, I have some news for you too. In the next post in this series, I am going to tell you an Amazon EC2 story that, while not resulting in an outage, nevertheless represents some very short-sighted dumbass policies – as a result of which, we are literally forced to hand our business to another cloud provider.

Until then, thanks for reading and happy clouding 🙂

Paul Culmsee

www.sevensigma.com.au



The cloud is not the problem–Part 1: Has it been here all along?

Hiya

I have been meaning to write a post or three on cloud computing and its benefits, challenges and eventual legacy. I’ve finally had some time to do so. This series will span a few posts (not sure how many at this stage) and will focus mainly on SharePoint. In short, I think the cloud is a shining example of innovation combined with human irrationality, poorly thought-out process and a dash of organisational dysfunction. In this first post, I will give you a little cloud history lesson, through the eyes of a slightly jaded IT infrastructure person. To that end, I will try to do the following throughout this series:

  • Educate readers to some conceptual aspects of cloud computing and why it matters
  • Highlight aspects of cloud computing that are currently being conveniently overlooked by proponents (and opponents)
  • Look at what the real challenges are, not just for organisations utilising it, but for the organisations providing cloud services
  • Highlight what the future might look like from a couple of perspectives
  • As always, take a relatively dry topic and try to make it entertaining enough that you will want to read it through 🙂

So let’s roll the clock back a decade or so and set the scene…

In the beginning…

At the height of the dotcom boom of 2000, I took a high-paying contract position for a miner-turned-ISP. You see, back then it was all the rage for “penny stock” mining companies – who had never actually dug anything of value out of the ground – to embrace “The interweb” by becoming an Internet Service Provider. Despite having no idea whatsoever what it entailed to be an ISP, they would instantly enjoy at least a fiftyfold increase in stock price and all the adulation of those dotcom investors who actually believed that there was money to be made.

Lured from my stable job by the hubris-funded per-hour rate and a cooler job title, I designed and ran an ISP from late 1999 till late 2004, doing all things security, Linux, Cisco and Microsoft. Back then, the buzzword of choice was “hosting”. Of course, the dotcom bubble popped big time and the market collapsed back to cold hard reality pretty quickly. Like all organisations that rode the wave, we then had to survive the backwash of a pretty severe bear market. Accordingly, my hourly rate went down and our ISP sales guys dutifully sold “hosting solutions” to clients that were neither useful nor appropriate. The best example of this was when someone sold a hosted Exchange server to a company of 300 staff with no consideration whatsoever of bandwidth, security and authentication (remember that this was the era of Exchange 2000, immature Active Directory deployments and 1.5Mbps/256kbps ADSL connections).

We actually learnt a lot from dumbass stuff like this (and we went through a seemingly endless number of sales guys as a result). By the end of the journey, we did some good work and had a few success stories. The net result of riding the highs and lows of the dotcom boom was my conclusion that if you had a public IP address and a communications rack with decent air conditioning, you were pretty much a hosting provider.

Then in 2004 I took a different job with a different company. They hired me because they had just acquired a fairly well-known “hosting provider” that had gone through some tough times. I was tasked with migrating the hosting infrastructure – and the sites hosted on it – to the parent company’s premises and integrating it with the existing infrastructure. So imagine my shock when, on day one, I arrived onsite to see that the infrastructure of this hosting provider was essentially a store room full of clone PCs with panels removed, sitting in a couple of communications racks, with a cheap portable fan blowing onto it all to keep it cool and with no redundant power (in fact, one power cord was sticky-taped to the floor and led out of the room to the nearest outlet). As it happened, some very high profile websites ran on this infrastructure.

This period I describe as “my bitter and twisted days” as I had a limited time to somehow migrate this mess to the more robust infrastructure of the parent company. This was the period where I became a bit of an IT control freak and used to take a dim view of web developers who dared to ask me a dumb question. I also subsequently revised my view of hosting. I decided that if you had a public IP address and a comms rack with completely crap air conditioning, you were pretty much a hosting provider. After all, when you access a website, did you ever stop to consider where it physically might reside?

…and henceforth came “the cloud”

Before SharePoint 2010 came out, I used to do talks where I put up the SharePoint 2007 pie and asked people what buzzword was missing. Many hands would rise and the answer was always “cloud”. Cognisant of this, I redrew Microsoft’s marketing diagram to try and capture the essence of this new force in enterprise IT. I suggested that Microsoft would jump on the cloud big-time with SharePoint 2010. How do you think I did? Smile


image

As it turned out, Microsoft for some reason opted not to use my suggested logo and instead went with that blue Frisbee with fresh buzzwords to replace the 2007 ones that had reached their saturation point. Nevertheless, the picture above did turn out to be prophetic: The era of the cloud is most definitely upon us, along with the gushing praise that often accompanies any flavour of the year technology.

Now in one sense, nothing much has changed from the days of web hosting. If you have an IP address with a webserver on the end of it, you can pretty much call yourself a cloud provider. This is because, at the end of the day, we are still using the core ingredients of TCP/IP, DNS, HTTP, communications racks and supposedly good air conditioning. When you access something in “the cloud”, you have no visibility as to the quality of the infrastructure on the other end. For all you know, it could be a store room being kept cool with a dodgy fan and some sticky tape :-).

But while that’s a cynical view, it is also naively simplistic. Like all fads that come and go, things are always changed as a result. The truth is that there have been changes since the days of web hosting that will change the entire face of IT in the coming years.

The major difference between this era and the last is the advancement in technology beyond those core ingredients of TCP/IP, DNS and HTTP. Bandwidth has become significantly cheaper, faster and more reliable. Virtualisation of servers (and services) has not only gained momentum, but is now a mature technology. My own evidence for this is that I haven’t put SharePoint web front-end servers onto non-virtualised infrastructure for a couple of years now. Add to that the fact that the tools and systems we use to build web solutions are now much more powerful and sophisticated. As a result, “cloud” applications now reflect a level of sophistication and features way beyond their web-based email origins. Look at Office 365 as a case in point. Microsoft have bet big-time on this type of offering. I’m sure that most architectural diagrams currently drawn all over Microsoft whiteboards for SharePoint vNext will be all about reworking the plumbing to create feature parity between on-premise SharePoint and its cloud-based equivalent.

It’s interesting stuff indeed.

Now, perhaps because I had an ISP/hosting ringside seat, I could see all of this happening way back in 2000 – more than a decade ago. Not only could I see it, I experienced the pain of early adopters trying to do it (witness the example of the hosted Exchange 2000 “solution” I started this post with). But a decade later, cloud-based infrastructure now realises the sort of capabilities that I was able to foresee in my ISP days. We have access to near-unlimited storage and scalability. With it, I can save massive time and effort in getting complex systems up and running. In this fast-moving age we find ourselves in, being able to mobilise resources and be productive quickly is hugely important. Recognising this, companies like Amazon, Google and Microsoft leverage their incredible economies of scale, as well as the sheer depth of their technical expertise, to make some rather compelling offerings. Bean counters (i.e. CFOs and CIOs with tight budgets) suddenly realised that the cost to “jack in” to a cloud-based solution is far lower than the traditional up-front costs of hardware, licensing, procurement and configuration.

The cloud offers minimal entry cost because, for the most part, it is based on a pay-for-use model. You stop paying for it when you stop using it. Buying servers is forever, but the cloud is apparently not. Furthermore, the economies of scale that the big boys of the cloud space offer usually far exceed what can be done with internal IT resources anyway. This extends past sheer hardware scalability and includes security, reliability and performance monitoring. As a cloud provider customer, you will not just expect, but assume, that companies like Microsoft, Amazon and Google can use their deep pockets to hire the best of the best engineers, architects and security practitioners. Organisational decision makers look increasingly longingly at the cloud in the face of high internal IT costs.

Even the most traditional on-premise IT vendors are getting in on the act. Consider SAP, previously a bastion of the “on-premise” model. Their American division just shelled out US$3.4 billion to buy a cloud provider called SuccessFactors (a 50% premium to SuccessFactors’ share price). Why did they do this? According to Paul Hamerman:

“SAP’s cloud strategy has been struggling with time-to-market issues, and its core on-premise HR management software has been at competitive disadvantage with best-of-breed solutions in areas such as employee performance, succession planning and learning management. By acquiring SuccessFactors, SAP puts itself into a much stronger competitive position in human resources applications and reaffirms its commitment to software-as-a-service as a key business model.”

If that wasn’t enough, consider some of Gartner’s predictions for 2012 and beyond. One notable prediction is that by year-end 2016, more than 50 percent of Global 1000 companies will have stored customer-sensitive data in the public cloud. Closer to home for me, I have a client with a ten-year BHAG (a Big, Hairy, Audacious Goal). While I can’t tell you what this goal is, I can tell you that they have identified a key success metric that currently takes them around 12 months to achieve. Their BHAG is to reduce this time from 12 months to 4 weeks, and to achieve this within a decade. Essentially they have a time-to-market issue – similar to what Hamerman outlined with SAP. By utilising cloud technology and being able to procure the necessary scalability at the click of a button and the swipe of a credit card, I was able to save them one month almost straight away and make a massive inroad into their organisation-wide strategic goal.

So, in the rational world of key performance indicators and return on investment, and given the market trend of large, mainstream vendors going “cloud”, it would seem that we are in the midst of a revolution with unstoppable momentum. But of course, the world is not rational, is it? If it were, then someone would be able to explain to me why the US still uses the imperial system given that every other country (save for Liberia and Myanmar) has now changed to metric (yes, my US readers, the UK is actually metric).

The irrational road ahead…

In this first post I have painted a picture of the “new reality” – the realisation of what I first saw in 2000 is now upon us. While this first post might sound like gushing praise of all things cloud, rest assured that this is not the case. I deliberately titled this post “the cloud is not the problem” because we are going to dive into the seedy underbelly of this brave new cloudy world we find ourselves in. My contention is that cloud computing is an adaptive challenge which, by definition, questions certain established ways of doing things. Therefore it has an effect on the roles, beliefs, assumptions and values behind the established order. In the next post or three, we are going to explore some of the less rational sides of “the cloud” at a number of levels. Furthermore, the irrationality often tends to be dressed up as rationality, so we have to look behind the positive and negative straw-man arguments we are currently hearing, to what is really going on. Along the way I hope to develop your “cloud computing straw-man argument” radar, so you can smell manure when it’s inevitably dished out to you 🙂

The general breakdown of this series will be as follows:

I’ll start by chronicling my experience with Microsoft’s Software as a Service (SaaS) offering, Office 365, as well as Amazon’s Infrastructure as a Service offering (EC2). Both are terrific offerings, but both are let down by things that have nothing to do with the technology. From there we will move into looking at some of the existing roles and paradigms that are impacted by the move to cloud solutions, and the defence mechanisms that will be employed to counter it. I’ll end the series by taking a look at the cloud from a longer-term perspective, based on the notion of systems theory (which, despite its drop-dead boring sounding premise, is actually quite interesting).

Thanks for reading

Paul Culmsee

www.sevensigma.com.au



Why SharePoint training sometimes doesn’t deliver (and what to do about it)

image

I was surprised to see the recent SharePoint Fatigue Syndrome post get some traction on the interweb. As it happened, that particular post was kicking around in an unfinished state for months. The thing is, it’s not the only “home truth” type of post that I have sitting in my “drafts” folder. I also have one on the state of the SharePoint training market. Given that I have a training announcement to make, I thought that I would combine them.

A day in the life…

We recently worked on a SharePoint upgrade project, where the previous developers did an excellent job overall. That is…if you judge them on the SharePoint governance metrics of writing clean and maintainable code, packaging it up properly, not hacking away at system files and actually writing documentation.

Unfortunately, although they did an excellent job through that lens, the actual solution, when judged on whether users found it made their lives easier, was an epic fail. Users hated it with a passion and, like many solutions that users hate, the system was soon relegated to being a little-used legacy platform where the maintenance costs now outweighed the benefits. The organisation had invested a couple of hundred thousand dollars in this solution and saw very little value for that money. Accordingly, they took their business elsewhere… to us. After a workshop, the client had one of those inverse “aha” moments when they realised that if they had taken a little more time to understand SharePoint, the custom solution would never have been developed in the first place.

This sort of example, to me, highlights where SharePoint governance goes so wrong. The care and diligence the developers exercised was necessary, but clearly not sufficient. No matter the quality of the code, the unit testing regime and the packaging, at the end of the day a blueberry pie was baked when the client wanted an apple pie. The problem was not in the ingredients or the baking. The problem was that by the time they delivered the pie, it was clear that the wrong recipe had been used. In the above case, the developers had omitted a whole raft of critical considerations in creating the solution – none of which were covered in developer training.

Necessary but not sufficient…

image

When you think about it, the current approach to SharePoint training seems not to be about recipes, but all about ingredients. Trainees get shipped off to “boot camps” for an indoctrination in all of the ingredients in the cupboard (and SharePoint is a bloody big cupboard!). SharePoint features and components are examined in individual detail, usually with an accompanying exercise or lab to demonstrate competency in that particular component. Graduates then return with a huge list of ingredients, but still no skill in developing the right recipes.

What exacerbates this problem is that training is siloed across disciplines. As an example: An “IT Pro” bootcamp will go into meticulous detail about performance, scalability and design aspects. Any considerations around development, information architecture and user engagement are seen through the lens of the infrastructure nerd. (Ah – who am I kidding… user engagement in an IT pro bootcamp has never happened. Smile)

Now consider for a second how we design SharePoint sites. These days, it is common for people to actively discourage designing SharePoint solutions based on organisational departmental boundaries. (By organisational departmental boundaries I mean Marketing, HR, IT etc.) Why is this design approach frowned upon? Proponents claim that it tends to perpetuate the problem of information silos and doesn’t stand the test of time, given that organisations tend to restructure just when your information architecture masterpiece is ready for prime time. In fact, Jakob Nielsen’s research organisation did a study and found that task-based structures (characterised by “My…” and “I need to…”) endured better than organisationally based structures. Quoting from them:

In our study, task-based structures often endured better than intranets organized departmentally. In our user testing of intranets, we’ve also found that task-based navigation tends to facilitate ease-of-learning. Thus, the benefits for IA durability are just one more argument in favor of adopting a task-based structure for your intranet.

So what I find ironically funny is the second sentence of the Jakob Nielsen quote: “ease-of-learning.” I wonder what sort of learning they are talking about? Presumably something other than delivering a failed solution with some really nice programming governance behind it! Yet the way SharePoint training is designed and marketed actually compartmentalises SharePoint training into similar silos. The result? Students get a rose-coloured view of the SharePoint world, based on their discipline. This is because, as Ackoff brilliantly put it, “complexity is in the eye of the beholder – the other person’s job always looks simple”.

By the way, what I am highlighting is not the fault of the trainers because at the end of the day, they respond to what they think the market wants. Sadly, what the market thinks it wants is often not what it needs.

I feel that the missing link – and the most critical aspect of SharePoint training for practitioners – is not how many ingredients you know, but how you go about creating those recipes. Yet SharePoint training overly focuses on what each ingredient does in isolation – whether a job discipline or a particular component. Whilst I fully accept that knowing the ingredients is a necessity, it is clearly not sufficient. This is an airbrushed version of reality, without due consideration of how ingredients combine in unique scenarios. Accordingly, this training does nothing to teach how to achieve shared understanding between practitioners and the eventual users who have to live with the legacy of what is delivered.

When you think about it, shared understanding is what makes or breaks SharePoint success, because it is the prerequisite to shared commitment to a solution. As demonstrated by the example of great code underpinning a crap solution, lack of shared understanding and commitment will always trump any other good work performed.

What to do about it…

SharePoint is a product that often requires adaptive change on the part of users. Learning the capabilities of the product is one thing – changing entrenched collaborative practice is another altogether. In case you haven’t noticed, users tend not to be charmed by new, shiny features if they cannot see how those features will make their jobs easier. (Nerdy knowledge workers like you and me easily get seduced by shiny things, but our world view is seriously skewed compared to those who live at the coal face of organisations.) Thus, the skills required to facilitate change and align various roles require a different type of training course: one that integrates rather than compartmentalises. One that teaches how to synthesise the whole, rather than reduce it to its parts.

For such a course, no virtual machines are needed, because there are no labs to demonstrate competence in some SharePoint component that will be out of date by SharePoint vNext. Instead, such a course needs to focus on the concepts, patterns and practices that are typically not seen in the IT practitioner’s toolkit (and, for that matter, not seen in many complex mainstream IT/PM methodologies). The added bonus of such a course is that the skills and learnings it provides are applicable beyond SharePoint and even beyond IT itself. While a typical SharePoint course might give you mileage for the current version, a course like the one I describe will give you tools that you can use anywhere, irrespective of the technology and project.

Does such a class exist? (Is that the longest post you have ever read to get to such a rhetorical question? Smile )

Of course it exists – I’ve been running it around the world for a couple of years now. It’s called the SharePoint Governance and Information Architecture class (#SPGovIA). It was a year in the making and comes with lots of goodies, such as a CD with a sample performance framework, a governance plan, a SharePoint ROI calculator (spreadsheet) and sample mind maps of information architecture. The class was originally designed for Microsoft New Zealand, on behalf of 3Grow, for the Elite program that used to certify gold partners for serious SharePoint competence. Since then it’s been run in the UK, the Netherlands, the US, Australia and New Zealand. Next month I will run classes in Singapore and Hong Kong.

For my US readers, early next year I will be taking the course on the road, specifically Canada and the USA in Feb 2012. This course is not run often, because for me the US is a damn long way to travel and my time is tight these days! So I sincerely hope that if this sort of class sounds interesting to you, then you will consider being part of it. Michal Pisarek has already made an announcement for classes in Vancouver, and more details will be forthcoming for one or two US cities. I only have time for 2 classes in North America, so which city should it be?

For more detail on the class, head on over to www.spgovia.com. While there, click the Media link and watch the first half hour of the class. I look forward to seeing you there.

image

Thanks for reading

Paul Culmsee

www.sevensigma.com.au



SharePoint Fatigue Syndrome

Hiya

I have been wrong about many things – I am happy to admit that. In SharePoint land, one of my bigger naive assumptions was that, in early 2007, I figured I’d have maybe a six-month head start before the rest of the industry began to learn from its initial SharePoint deployment mistakes and started delivering SharePoint “properly.” I thought I’d better make hay while the sun shone, so to speak, as the market would tighten up as more players entered it.

Yet here we are, heading into the latter half of 2011 – some five years later. As I continue to go into organisations, whether in a SharePoint remedial capacity or a training/architect capacity, I am still seeing the legacy of really poor SharePoint outcomes. Furthermore, I am seeing other, frankly disturbing, trends that leave me both concerned and pessimistic. I now have a label for this concern: “SharePoint Fatigue Syndrome.” SharePoint Fatigue Syndrome is hard to define, yet its effects are there for all to see. I suffer from it at times, and I am certain others do too. As an example, the following topic for discussion was recently raised on the Perth SharePoint User Group on LinkedIn:

Hi folks, as you already know we have a worrying skills shortage in SharePoint Development / Architecture in Perth and things are getting worse. It’s getting to the stage where companies have to suspend or worse still abandon their SharePoint projects due to lack of available talent. As the core of the SharePoint community in Perth your suggestions are vital towards finding real solutions to this growing problem. What can be done?

Now I know that this problem is not just limited to Perth. There are consistent reports online that speak of SharePoint people being in demand. So you would think that in a “hot” sector like SharePoint, where the industry is crying out for talent, the rate of attrition would not outpace the uptake of new talent. After all – money talks, right? If you are a .NET developer with half a brain, there is serious money to be made in SharePoint development land. On top of that, there is the collective realisation in the marketplace that actually talking to people about how SharePoint could make their lives easier leads to better outcomes. Hence the emergence of the notion of a “SharePoint Architect” with a more varied skill-set than just tech or dev. This role has been further legitimised by entire conferences now catering to the business end of the market (I am thinking of the Share conferences here).

So, we have all of this newfound collective wisdom spreading through the community via various channels, in terms of the skills and roles required in SharePoint circa 2011 and beyond. We have the fat pay-packets being commanded as a result of demand for these skills. So, with that in mind, why is the attrition rate growing?

As an example, I personally know several exceptional SharePoint practitioners who are no longer in SharePoint. I’ve also had various quiet conversations with many SharePoint practitioners, right up to SharePoint MCMs, who vent their frustrations about how difficult it is to deliver truly lasting SharePoint solutions for their clients and organisations. Having reflected on the various reasons, I have come to the conclusion that SharePoint is just plain tiring. As a result, people are burning themselves out.

7 causes of SharePoint Fatigue Syndrome…

Burnout, in case you are not aware, is actually a lack of emotional attachment to what you are doing.  Quoting about.com:

The term “burnout” is a relatively new term, first coined in 1974 by Herbert Freudenberger in his book, “Burnout: The High Cost of High Achievement.” He originally defined burnout as “the extinction of motivation or incentive, especially where one’s devotion to a cause or relationship fails to produce the desired results.”

SharePoint Fatigue Syndrome is SharePoint-manifested burnout. The symptoms include feeling physically and emotionally drained, difficulty maintaining optimism and energy levels, and feeling that you have less to give as the burden of work seems overwhelming. Sound familiar?

So why does SharePoint work run this risk? I see 7 major reasons.

1: Cost pressure leading to overwork

First up, the lure of the big dollar is a double-edged sword. Not long ago I shared a beer with a SharePoint developer whose work I respect greatly, yet whom I can’t afford to hire. This is because the percentage of chargeable hours he would need to work just so I could break even is very high. This puts me (the employer) under pressure and at risk. As a result, I would need to ensure that my newly minted SharePoint employee was productive from day 1, and I would need him to work a lot of hours. But here is the irony. When I had my beer with this developer, the conversation started with him lamenting to me that he was already pulling ridiculous hours (60 to 80 per week). He was looking for a job with fewer hours and yet more money. This is simply not sustainable, for both employer and employee. The more you chase one (work hours vs. money), the more you lose the other, it seems.

2: Structures that force an inappropriate problem solving paradigm (and wicked problems of course)

Then there is the broader problem where structure influences behaviour. As a basic example: from the developers’ perspective, they have to put up with sales guys who promise the world, and project managers who then make their lives hell and force them to cut corners delivering the impossible. Project managers find that their beloved work breakdown structure gets chopped and changed when their pain-in-the-ass developers whine that they can’t make the schedule. As I have stated many times previously, SharePoint projects are likely to have wicked problem aspects to them. The structures that work well to deliver tame problems, such as Exchange, a VOIP system or a network upgrade, are much less effective for SharePoint projects. While organisations persist with approaches that consistently fail to deliver good outcomes, and don’t look at the structural issues, the attrition rate will continue. There is only so long that someone can put up with these sorts of stresses.

3: Technical complexity

SharePoint’s technical complexity plays a part too. No one person understands the product in its entirety. The closest person I know is Spence, and ages ago on Twitter he remarked that even within Microsoft no-one understood it all. As a result, it is simply too easy to make a costly mistake via an untested assumption. (I thought the User Profile Service was tough – until I did federated claims authentication and multi-tenancy, that is.) The utter myriad of features and design options, and their even greater number of caveats, mean that one can make a simple design mistake that causes the entire logical edifice of an information architecture to come crashing down. Many have experienced the feeling of having to tell someone that the project time and cost are about to blow out because nobody realised that, say, Managed Metadata has a bunch of issues that preclude its use in many circumstances. Accordingly, SharePoint architects learn pretty quickly that it is hard to give a definite “yes” to many questions, because to do so would require an answer worded like a contractual clause, to ensure it is framed with appropriate caveats. Even then, consultants know that lingering feeling in the back of the mind that they might have missed an assumption. This brings me to…

4: Pace of change

This is BIG…and becoming more acute. Remember the saying ‘The only certainties in life are death and taxes?’ Outside of that, the future is always unpredictable.

Between SharePoint 2003 and SharePoint 2007, the wave of Web 2.0 and social networking broke, forever changing how we collaborate and work with information online (and some of those effects are still to be felt). Microsoft, like any smart organisation, responds to the sentiment of its client base. Microsoft also, like most mature organisations, tends to hedge its bets in terms of marketplace strategy: it tries to get in on the act with the cool kids, yet tries not to kill the goose that laid the golden egg. Just look at Windows 8’s new interface, tablets, app stores and the cloud.

But that is one facet. Change happens in many forms and at many scales. For example, at a project level, it may mean a key team member leaves the organisation suddenly (SharePoint fatigue no doubt). At a global and organisational level, events like the odd global financial crisis force organisations to change strategic focus very quickly indeed.

I don’t know about you, but when Windows 8 was announced recently I was not excited (in fact I was not excited by SharePoint 2010 either). I thought to myself “So soon? I am still figuring out the current platform!”

As an example of the effect of the pace of change, consider all the Government 2.0 initiatives around the world. Collaboration is in vogue, baby, so information should be free and government agencies should engage with the community. While that’s nice and all, the world of compliance, security and records management takes a very different view. So we end up with market forces that push against each other, in combination with vendors’ hedging strategy of being all things to all people. It’s little wonder that SharePoint projects become very complex very fast.

By the way, it is worth checking out this post, in which Bill Brantley sums up the whole Government 2.0 issue when he says:

What exactly is the nature of the Gov 2.0 challenge? This question was inspired by Andrew Krzmarzick’s post (What Gov 2.0 Needs Now: Managers, Money and Models) and Christina Morrison’s post (What is Gov 2.0? A survey of Government IT pros) on the recent GovLoop survey about Gov 2.0. As Andrew and Christina argued, the survey demonstrates many differing perspectives on Gov 2.0 in terms of what it actually means and how to implement Gov 2.0. To me, this suggests that Gov 2.0 is the classic wicked problem

5: SharePoint Entropy

One of my clients (who you will meet in my book when it’s published) once said to me, “All good ideas eventually deteriorate into hard work.” This is a nice way to lead into the concept of SharePoint entropy, which in some ways is the inevitable outcome of the first four symptoms. The easiest way to understand entropy is to watch the awesome TV series “Wonders of the Universe.” In that show, the concept of entropy was discussed in a way that, for me, made a lot of intrinsic sense. Without getting into the detail, entropy is the notion that over time things move from an organised to a less organised state. Rather than have me waste your time trying to explain it in prose, let’s listen to the show in question. (Don’t skip the video – this is important!)

Now what does this have to do with SharePoint fatigue? Gordon Whyte saw what I am getting at with his post on entropy within organisations, especially in relation to change management.

For example, when we build a car we take raw materials such as metal, leather, plastic and glass and arrange them in a highly organised way to make a car. But if we then leave that car for long enough, the metal will rust, the glass will become brittle and break, and the leather will dry out and turn to dust. If the car is left for a very long time it will eventually disappear altogether. This thought left me wondering about the nature of organisations. If a progression from order to chaos is the natural order of the universe, then is this same pressure present in organisations and, perhaps more importantly, what is the optimum position for an organisation between the extremes of rigid inflexibility (low entropy) and complete chaos (high entropy)? This question is not as crazy as it might at first appear.

Gordon has nailed the issue in his post. Any SharePoint solution with a low entropic nature requires more energy and effort to maintain that order and control. Complex SharePoint solutions often have complex governance wrapped around them. Governance that is process and structure centric has, by definition, low entropy and accordingly needs higher effort to maintain over time. In fact, if you do not maintain that effort and energy, any SharePoint solution will usually disintegrate back into the sort of information management chaos that gave rise to SharePoint in the first place! Rather like the sandcastle in the video.

By the way, I feel that email and file shares are high entropy solutions – all failed SharePoint projects lead back to these tools because they require less structure to maintain (in the short term).

In short, if SharePoint is implemented with low entropy, more energy is needed to maintain it. Remove the energy and very quickly, things become chaotic again. Governance approaches that are not cognisant of this will never stand the test of time. The question then becomes whether people feel that the end in mind is worth the perceived extra effort that is being asked of them.

6. Social complexity

Social complexity is also somewhat a result of the first five symptoms. Most organisations have a blame culture. If they didn’t, people wouldn’t spend so much time trying to position themselves for blame avoidance. Social complexity is the result of turf wars, ideological smackdowns and all the other sorts of things that produce the cliché of “the silos”, where people in organisations are not talking to each other. SharePoint exacerbates social complexity for two main reasons.

Firstly, because it is a collaboration tool, it actually requires some collaboration to put it in! This is often easier said than done. Secondly, because it is a pervasive and disruptive technology, it almost always clashes with an established tool, process or practice whose proponents aren’t willing to change. In fact, they may not even recognise that there is a problem to solve – especially when SharePoint has been thrust upon them. (In an old post, I wrote about the notion of memeplexes and the ideological immune mechanisms they create, and why it is so hard to get shared understanding across departmental boundaries in organisations. Memetic smackdowns are the result.)

The long and the short of social complexity is that there is only so much stress people can take. We all seem to have a pathological need to seek order and safety, rather than remain in a stressful situation. Once social complexity takes hold, the merry-go-round of staff attrition really starts to bite…

7. Meaning over motivation…

Now, if I haven’t completely depressed you, let me offer you a perverse glimmer of light. For those of us who understand the preceding six fatigue symptoms, recognise them for what they are and take steps to mitigate them, there is one other symptom that contributes to SharePoint Fatigue Syndrome. This is the trickiest of all – and I am a somewhat willing victim of it.

I have spent a lot of time learning techniques to help address the symptoms I outlined here and, as it turns out, these skills are universally applicable, whether in SharePoint, IT or beyond. For years now, I have metaphorically had one foot out of the SharePoint world door and the other in the construction, health and management sectors. Hell… I have written what I think is the first business book ever by a SharePoint person that is neither a SharePoint nor an IT book. I also have clients with SharePoint deployments who do not know me as a SharePoint person at all, but only as a sensemaker (and for that I am grateful).

The point is this: While the investment in these skills enables me to counter the effects of SharePoint fatigue syndrome, it is also inexorably pulling me away from SharePoint work. It seems that once you crack this nut a little, your skills are in demand across the entire problem solving spectrum. Right now this is my coping mechanism for SharePoint Fatigue Syndrome – I get to step away from SharePoint for periods and work on something else. Eventually…inevitably…I will also be one of those attrition statistics.

Conclusion

The problem is that SharePoint Fatigue Syndrome is a negatively reinforcing cycle. As evidenced by the SharePoint attrition rate, money isn’t that great a motivator. If it were, the void of skilled resources would have been filled by now. Paying more money might give you a short term gain, but in the long term it is not going to address my seven causes of SharePoint Fatigue Syndrome.

I will leave this admittedly negative sounding post with the key to breaking this cycle. While you can attend my SharePoint Governance and Information Architecture class or Issue Mapping class to learn many ways to do so, the video below says it all. I encourage you to watch and reflect on it, because it is also the key to understanding how to do effective user engagement.

 

Thanks for reading

 

Paul Culmsee



Troubleshooting SharePoint (People) Search 101

I’ve been nerding it up lately SharePointwise, doing the geeky things that geeks like to do, like ADFS and Claims Authentication. So in between trying to get my book fully edited and ready for publishing, I might squeeze out the odd technical SharePoint post. Today I had to troubleshoot a broken SharePoint people search for the first time in a while, so I thought it was worth explaining the crawl process a little and covering the most likely ways in which it will break for you, in order of likelihood as I see it. There are articles out there on this topic, but none that I found are particularly comprehensive.

Background stuff

If you consider yourself a legendary IT pro or SharePoint god, feel free to skip this bit. If you prefer a more gentle stroll through SharePoint search land, then read on…

When you provision a Search Service Application as part of a SharePoint installation, you are asked for (among other things) a Windows account to use for the search service. Below is the point in the GUI based configuration where this is done. First up we choose to create a Search Service Application, and then we choose the account to use as the “Search Service Account”. By default, this is the account that will do the crawling of content sources.

image    image

Now the search service account is described like so: “.. the Windows Service account for the SharePoint Server Search Service. This setting affects all Search Service Applications in the farm. You can change this account from the Service Accounts page under Security section in Central Administration.”

Reading this suggests that the Windows service (“SharePoint Server Search 14”) would run under this account. In reality, the SharePoint Server Search 14 service runs as the farm account. You can see the pre and post provisioning status below. First up, I show where SharePoint has been installed and the SharePoint Server Search 14 service is disabled, with service credentials of “Local Service”.

image

The next set of pictures show the Search Service Application provisioned according to the following configuration:

  • Search service account: SEVENSIGMA\searchservice
  • Search admin web service account: SEVENSIGMA\searchadminws
  • Search query and site settings account: SEVENSIGMA\searchqueryss

You can see this in the screenshots below.

image

imageimage

Once the service has been successfully provisioned, we can clearly see that the “Default content access account” is based on the “Search service account” described in the configuration above (the first of the three accounts).

image

Finally, as you can see below, once provisioned, it is the SharePoint farm account that runs the search Windows service.

image

Once you have provisioned the Search Service Application, the default content access account (in my case SEVENSIGMA\searchservice) is granted access to all web applications via Web Application User Policies, as shown below. This way, no matter how draconian the permissions of site collections are, the crawler account will have the access it needs to crawl the content, as well as the permissions of that content. You can verify this by looking at any web application in Central Administration (except the Central Administration web application itself) and choosing “User Policy” from the ribbon. You will see in the policy screen that the “Search Crawler” account has “Full Read” access.

image

image

In case you are wondering why the search service needs to crawl the permissions of content, as well as the content itself, it is because it uses these permissions to trim search results for users who do not have access to content. After all, you don’t want to expose sensitive corporate data via search do you?

There is another, more subtle configuration change performed by the Search Service Application. Once the evilness known as the User Profile Service has been provisioned, the Search Service Application will grant the search service account a specific permission on the User Profile Service. SharePoint is smart enough to do this whether the User Profile Service Application is installed before or after the Search Service Application.

The specific permission, by the way, is the “Retrieve People Data for Search Crawlers” permission, as shown below:

image    image

Getting back to the title of this post, this is a critical permission because, without it, the search server will not be able to talk to the User Profile Service to enumerate user profile information. The effect is empty People Search results.

How people search works (a little more advanced)

Right! Now that the cool kids have joined us (those who skipped the first section), let’s take a closer look at SharePoint People Search in particular. This section delves a little deeper, but fear not, I will try to keep things relatively easy to grasp.

Once the Search Service Application has been provisioned, a default content source called – originally enough – “Local SharePoint Sites” is created. Any web applications that exist (and any that are created from here on in) will be listed here. A freshly minted SharePoint server with a single web application shows the following configuration in the Search Service Application:

image

Now hopefully http://web makes sense. Clearly this is the URL of the web application on this server. But you might be wondering what sps3://web is. I will bet that you have never visited an sps3:// site using a browser either. For good reason too, as it wouldn’t work.

This is a SharePointy thing – or more specifically, a Search Server thing. That funny protocol part of what looks like a URL refers to a connector. A connector allows Search Server to crawl other data sources that don’t necessarily use HTTP – like some native, binary data source. People can develop their own connectors if they feel so inclined, and a classic example is the Lotus Notes connector that Microsoft supplies with SharePoint. If you configure SharePoint to use its Lotus Notes connector (and by the way – it’s really tricky to do), you would see a URL in the form of:

notes://mylotusnotesbox

Make sense? The protocol part of the URL allows the search server to figure out which connector to use to crawl the content. (For what it’s worth, there are many others out of the box. If you want to see all of the connectors, check the list here.)

But the one we are interested in for this discussion is SPS3:, which accesses SharePoint user profiles and underpins people search functionality. The way this particular connector works is that when the crawler accesses the SPS3 connector, it in turn calls a special web service at the host specified. The web service is called spscrawl.asmx and, in my example configuration above, it would be http://web/_vti_bin/spscrawl.asmx

The basic breakdown of what happens next is this:

  1. Information about the web site that will be crawled is retrieved (the GetSite method is called, passing in the site from the URL – i.e. the “web” of sps3://web)
  2. Once the site details are validated, the service enumerates all of the user profiles
  3. For each profile, the GetItem method is called, which retrieves all of the user profile properties for that user. Each profile is added to the index and tagged with a contentclass of “urn:content-class:SPSPeople” (I will get to this in a moment)

Now admittedly this is the simple version of events. If you really want to be scared (or get to sleep tonight) you can read the actual SPS3 protocol specification PDF.
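Before moving on, here is a practical aside. If a people crawl is failing and you suspect the crawler cannot even reach spscrawl.asmx, you can probe the web service directly as the crawl account. Below is a minimal Python sketch of that probe (a reachability test only – this is not how the crawler itself works). It assumes the third party requests and requests-ntlm packages, and it reuses my example host and search service account; the password is obviously a placeholder.

# Probe the people search web service as the crawl account.
# Assumes: pip install requests requests-ntlm
# Host, account and password are this post's examples - use your own.
import requests
from requests_ntlm import HttpNtlmAuth

url = "http://web/_vti_bin/spscrawl.asmx"
auth = HttpNtlmAuth("SEVENSIGMA\\searchservice", "password-goes-here")

response = requests.get(url, auth=auth, timeout=10)

# 200 means the endpoint answered and the account authenticated.
# 401 suggests credential or loopback issues; 400/403 suggests
# something (like a proxy) sitting in between.
print(response.status_code)

A 200 here does not guarantee a successful crawl, but a 401 or a timeout narrows the problem down quickly, as the rest of this post will show.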

Right! Now let’s finish this discussion with the notion of contentclass. The SharePoint search crawler tags all crawled content according to its class. The name of this “tag” – or in correct terminology, “managed property” – is contentclass. By default, SharePoint has a People Search scope, which essentially limits the search to returning only content tagged with the “People” contentclass.

image

Now to make it easier for you, Dan Attis listed all of the content classes that he knew of back in the SharePoint 2007 days. I’ll list a few here, but for the full list visit his site.

  • “STS_Web” – Site
  • “STS_List_850” – Page Library
  • “STS_List_DocumentLibrary” – Document Library
  • “STS_ListItem_DocumentLibrary” – Document Library Items
  • “STS_ListItem_Tasks” – Tasks List Item
  • “STS_ListItem_Contacts” – Contacts List Item
  • “urn:content-class:SPSPeople” – People

(Why some properties follow the uniform resource name format I don’t know *sigh* – geeks huh?)
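Incidentally, you can see contentclass in action by issuing a query restricted to one of these classes. The Python sketch below is hedged: it assumes the SharePoint 2010 search RSS feed at /_layouts/srchrss.aspx is available on the site, and it reuses the example host, account and NTLM approach from earlier.

# Query the search RSS feed, restricted to people results.
# Assumes the SP2010 search RSS endpoint (_layouts/srchrss.aspx)
# is enabled, plus the requests and requests-ntlm packages.
import requests
from requests_ntlm import HttpNtlmAuth

response = requests.get(
    "http://web/_layouts/srchrss.aspx",
    params={"k": 'contentclass:"urn:content-class:SPSPeople"'},
    auth=HttpNtlmAuth("SEVENSIGMA\\searchservice", "password-goes-here"),
    timeout=10,
)

# With user profiles crawled, the feed should contain <item> entries.
# An empty feed here is the classic symptom of the missing
# "Retrieve People Data for Search Crawlers" permission.
print(response.status_code)
print(response.text[:500])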

So that was easy Paul! What can go wrong?

So now that we know that, although the protocol handler is SPS3, it ultimately uses HTTP as the underlying communication mechanism to call a web service, we can start to think of all the ways it can break on us. Let’s take a look at the common problem areas in order of commonality:

1. The Loopback issue.

This has been done to death elsewhere and most people know it. What people don’t know so well is that the loopback fix was introduced to prevent an extremely nasty security vulnerability, known as a replay attack, that came out a few years ago. Essentially, if you make an HTTP connection to your server, from that server, using a name that does not match the name of the server, the request will be blocked with a 401 error. In terms of SharePoint people search, the sps3:// handler is created when you create your first web application. If that web application happens to have a name that doesn’t match the server name, then the HTTP request to the spscrawl.asmx web service will be blocked due to this issue.

As a result your search crawl will not work and you will see an error in the logs along the lines of:

  • Access is denied: Check that the Default Content Access Account has access to the content or add a crawl rule to crawl the content (0x80041205)
  • The server is unavailable and could not be accessed. The server is probably disconnected from the network.   (0x80040d32)
  • ***** Couldn’t retrieve server http://web.sevensigma.com policy, hr = 80041205 – File:d:\office\source\search\search\gather\protocols\sts3\sts3util.cxx Line:548

There are two ways to fix this: the quick way (DisableLoopbackCheck) and the right way (BackConnectionHostNames). Both involve a registry change and a reboot, but one of them leaves you much more open to exploitation. Spence Harbar wrote about the differences between the two some time ago and I recommend you follow his advice.

(As a slightly related side note, I hit an issue with the User Profile Service a while back where it gave the error: “Exception occurred while connecting to WCF endpoint: System.ServiceModel.Security.MessageSecurityException: The HTTP request was forbidden with client authentication scheme ‘Anonymous’. —> System.Net.WebException: The remote server returned an error: (403) Forbidden”. In this case I needed to disable the loopback check even though I was using the server name with no alternative aliases or fully qualified domain names. I asked Spence about this one and it seems that the DisableLoopbackCheck registry key addresses more than the SMB replay vulnerability.)
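If you script your builds, the “right way” fix lends itself to automation. Here is a minimal Python sketch using the standard winreg module; the registry path and value name are the documented ones for the loopback workaround, while the host names are just my examples. Note that it overwrites any existing value rather than merging, it must be run elevated on the SharePoint server, and a reboot is needed afterwards.

# Set BackConnectionHostNames (the "right way" loopback fix).
# Host names below are this post's examples - use your own.
# Overwrites any existing list; run elevated and reboot afterwards.
import winreg

hosts = ["web", "web.sevensigma.com"]

key = winreg.CreateKeyEx(
    winreg.HKEY_LOCAL_MACHINE,
    r"SYSTEM\CurrentControlSet\Control\Lsa\MSV1_0",
    0,
    winreg.KEY_SET_VALUE,
)
winreg.SetValueEx(key, "BackConnectionHostNames", 0, winreg.REG_MULTI_SZ, hosts)
winreg.CloseKey(key)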

2. SSL

If you add a certificate to your site and mark the site as HTTPS (by using SSL), things change. In the example below, I installed a certificate on the site http://web, removed the binding to HTTP (port 80) and then updated SharePoint’s alternate access mappings to make things an HTTPS world.

Note that the reference to SPS3://WEB is unchanged, that there is still a reference to HTTP://WEB, and that a reference to HTTPS://WEB has been added automatically.

image

So if we were to run a crawl now, what do you think would happen? Certainly we know that HTTP://WEB will fail, but what about SPS3://WEB? Let’s run a full crawl and find out, shall we?

Checking the logs, we have the unsurprising error “the item could not be crawled because the crawler could not contact the repository”. So clearly, SPS3 isn’t smart enough to work out that the web service call to spscrawl.asmx needs to be made over SSL.

image

Fortunately, the solution is fairly easy. There is another connector, identical in function to SPS3 except that it is designed to handle secure sites: SPS3S. We simply change the configuration to use this connector (and while we are there, remove the reference to HTTP://WEB).

image

Now we retry a full crawl and check for errors… Wohoo – all good!

image

It is also worth noting that there is another SSL related issue with search: the search crawler is a little fussy with certificates. Most people have visited secure web sites that warn about a problem with the certificate, looking something like the image below:

image

Now when you think about it, a search crawler doesn’t have the luxury of asking a user whether a certificate is okay. Instead it errs on the side of security and, by default, will not crawl a site if the certificate is invalid in some way. The crawler is also fussier than a regular browser. For example, it doesn’t much like wildcard certificates, even if the certificate is trusted and valid (although all modern browsers accept them).

To alleviate this issue, you can make the following change in the settings of the Search Service Application: Farm Search Administration->Ignore SSL warnings, then tick “Ignore SSL certificate name warnings”.

image  image

image

The implication of this change is that the crawler will now accept any old certificate that encrypts website communications.
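If you would rather know in advance whether a certificate will upset the crawler, you can test it from the command line. The Python sketch below performs a standard TLS handshake with trust and host name validation, using only the standard library; the host is my example. One caveat: this check accepts wildcard certificates (like a browser does), so it will not flag the wildcard fussiness mentioned above.

# Check whether a site's certificate passes ordinary validation
# (trust chain and host name match). Host is this post's example.
import socket
import ssl

host = "web"
context = ssl.create_default_context()

try:
    with socket.create_connection((host, 443), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            print("Certificate OK:", tls.getpeercert()["subject"])
except ssl.SSLError as error:
    # Name mismatches, untrusted issuers and expired certificates all
    # land here - the crawler would likely refuse this site too.
    print("Certificate problem:", error)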

3. Permissions and Change Legacy

Let’s assume we made a configuration mistake when we provisioned the Search Service Application: the search service account (which is the default content access account) is incorrect and we need to change it to something else. Let’s see what happens.

In the Search Service Application management screen, click on the default content access account to change its credentials. In my example I have changed the account from SEVENSIGMA\searchservice to SEVENSIGMA\svcspsearch.

image

Having made this change, let’s review the effect on the Web Application User Policy and the User Profile Service Application permissions. Note that the user policy for the old search crawl account remains, while an entry has been automatically created for the new account. (Now you know why you can end up with multiple accounts with the display name “Search Crawling Account”.)

image

Now let’s check the User Profile Service Application. Here, things are different! The entry below still refers to the *old* account SEVENSIGMA\searchservice, and the new account has not been granted the required “Retrieve People Data for Search Crawlers” permission!

image

 

image

If you traipsed through the ULS logs, you would see this:

Leaving Monitored Scope (Request (GET:https://web/_vti_bin/spscrawl.asmx)). Execution Time=7.2370958438429 c2a3d1fa-9efd-406a-8e44-6c9613231974
mssdmn.exe (0x23E4) 0x2B70 SharePoint Server Search FilterDaemon e4ye High FLTRDMN: Errorinfo is "HttpStatusCode Unauthorized The request failed with HTTP status 401: Unauthorized." [fltrsink.cxx:553] d:\office\source\search\native\mssdmn\fltrsink.cxx
mssearch.exe (0x02E8) 0x3B30 SharePoint Server Search Gatherer cd11 Warning The start address sps3s://web cannot be crawled. Context: Application ‘Search_Service_Application’, Catalog ‘Portal_Content’ Details: Access is denied. Verify that either the Default Content Access Account has access to this repository, or add a crawl rule to crawl this repository. If the repository being crawled is a SharePoint repository, verify that the account you are using has "Full Read" permissions on the SharePoint Web Application being crawled. (0x80041205)

To correct this issue, manually grant the new crawler account the “Retrieve People Data for Search Crawlers” permission in the User Profile Service Application. As a reminder, this is done via the Administrators icon in the “Manage Service Applications” ribbon.

image

Once this is done, run a full crawl and verify the result in the logs.

4. Missing root site collection

A more uncommon issue that I once encountered occurs when the web application being crawled is missing a default site collection. In other words, while there are site collections defined using a managed path, such as http://WEB/SITES/SITE, there is no site collection defined at HTTP://WEB.

The crawler does not like this at all, and you get two different errors depending on whether the SPS or HTTP connector is used.

  • SPS:// – Error in PortalCrawl Web Service (0x80042617)
  • HTTP:// – The item could not be accessed on the remote server because its address has an invalid syntax (0x80041208)

image

The fix for this should be fairly obvious: create a default site collection for the web application and re-run a crawl.

5. Alternative Access Mappings and Contextual Scopes

SharePoint guru (and my squash nemesis) Nick Hadlee posted recently about a problem where contextual search scopes return no results. If you are wondering what they are, Nick explains:

Contextual scopes are a really useful way of performing searches that are restricted to a specific site or list. The “This Site: [Site Name]”, “This List: [List Name]” are the dead giveaways for a contextual scope. What’s better is contextual scopes are auto-magically created and managed by SharePoint for you so you should pretty much just use them in my opinion.

The issue is that when the alternate access mapping (AAM) settings for the default zone on a web application do not match your search content source, the contextual scopes return no results.

I came across this problem a couple of times recently and the fix is really pretty simple – check your alternate access mapping (AAM) settings and make sure the host header that is specified in your default zone is the same url you have used in your search content source. Normally SharePoint kindly creates the entry in the content source whenever you create a web application but if you have changed around any AAM settings and these two things don’t match then your contextual results will be empty. Case Closed!

Thanks Nick

6. Active Directory Policies, Proxies and Stateful Inspection

A particularly insidious way to have problems with search (and not just people search) is via Active Directory policies. For those of you who don’t know what AD policies are, they basically allow geeks to go on a power trip with users’ desktop settings. Consider the image below. Essentially, an administrator can enforce a massive array of settings for all PCs on the network. Such is the extent of what can be controlled that I can’t fit it into a single screenshot. What is listed below is but a small portion of what an anal retentive administrator has at their disposal (mwahahaha!)

image

Common uses of policies include restricting certain desktop settings to maintain consistency, as well as enforcing Internet Explorer settings, such as the proxy server and security settings like the trusted sites list. One common issue with a policy-defined proxy server in particular is that the search service account will have its profile modified to use the proxy server.

The result is that the proxy now sits between the search crawler and the content source to be crawled, as shown below:

Crawler —–> Proxy Server —–> Content Source

Now even though the crawler does not use Internet Explorer per se, proxy settings aren’t actually specific to Internet Explorer. Internet Explorer, like the search crawler, uses wininet.dll. Wininet is a module that contains Internet-related functions used by Windows applications, and it is this component that utilises proxy settings.

Sometimes people will troubleshoot this issue by using telnet to connect to the HTTP port (i.e. “telnet web 80”). But telnet does not use the wininet component, so it is not actually a valid method for testing (see the sketch after the list below). Telnet will happily report that the web server is listening on port 80 or 443, but that matters not when the crawler tries to access the port via the proxy. Furthermore, even if the crawler and the content source are on the same server, the result is the same: as soon as the crawler attempts to index a content source, the request will be routed to the proxy server. Depending on the vendor and configuration of the proxy server, various things can happen, including:

  • The proxy server cannot handle the NTLM authentication and passes back a 400 error code to the crawler
  • The proxy server has funky stateful inspection which restricts the allowed HTTP verbs in the communications and interferes with the crawl
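To make the telnet trap concrete, here is a Python sketch contrasting the two tests: a raw TCP connect (the telnet equivalent, which bypasses any proxy) against an HTTP request forced through a proxy. It assumes the requests package, and the proxy address is purely hypothetical.

# A raw TCP connect can succeed while the proxied HTTP request fails.
# Proxy address below is hypothetical - use the one your AD policy sets.
import socket
import requests

host = "web"

# Test 1: the telnet equivalent. This bypasses wininet and the proxy,
# so success here tells us nothing about the crawl.
with socket.create_connection((host, 80), timeout=5):
    print("TCP connect OK (meaningless for crawl troubleshooting)")

# Test 2: the request routed via the proxy, as the crawler's would be.
response = requests.get(
    f"http://{host}/",
    proxies={"http": "http://proxy.example.com:8080"},
    timeout=10,
)
print("Via proxy:", response.status_code)  # e.g. 400 if the proxy chokes on NTLM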

For what it’s worth, it is not just proxy settings that can interfere with the HTTP communications between the crawler and the crawled. I have seen security software get in the way too, monitoring HTTP communications and pre-emptively terminating connections or modifying the content of the HTTP request. The effect is that the results passed back to the crawler are not what it expects, and the crawler naturally reports that it could not access the data source, with suitably weird error messages.

Now the very thing that makes this scenario hard to troubleshoot is also its tell-tale sign: nothing will be logged in the ULS logs, nor in the IIS logs for the search service. This is because the errors will be logged by the proxy server or the overly enthusiastic stateful security software.

If you suspect the problem is a proxy server issue but do not have access to the proxy server to check its logs, the best way to troubleshoot is to temporarily grant the search crawler account enough access to log on to the server interactively. Open Internet Explorer and manually check the proxy settings (or read them straight from the registry – see the sketch after the list below). If you confirm a policy based proxy setting, you might be able to temporarily disable it and retry a crawl (until the next AD policy refresh reapplies the settings). The ideal way to cure this problem is to ask your friendly Active Directory administrator to either:

  • Remove the proxy altogether from the SharePoint server (watch for certificate revocation slowness as a result)
  • Configure an exclusion in the proxy settings in the AD policy so that the content sources for crawling are not proxied
  • Create a new AD policy specifically for the SharePoint box so that the default settings apply to the rest of the domain member computers.
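As an alternative to opening Internet Explorer, the crawl account’s WinINET proxy settings can be read straight from the registry. Here is a minimal sketch using the standard winreg module; it reads the current user’s hive, so run it while logged on (or via runas) as the crawl account.

# Read the WinINET proxy settings for the current user. Run this
# while logged on as the search crawl account.
import winreg

key = winreg.OpenKey(
    winreg.HKEY_CURRENT_USER,
    r"Software\Microsoft\Windows\CurrentVersion\Internet Settings",
)
try:
    enabled, _ = winreg.QueryValueEx(key, "ProxyEnable")
    server, _ = winreg.QueryValueEx(key, "ProxyServer")
    print("Proxy enabled:", bool(enabled), "->", server)
except FileNotFoundError:
    # No proxy values present - wininet will connect directly.
    print("No proxy configured for this account")
finally:
    winreg.CloseKey(key)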

If you suspect the issue might be overly zealous stateful inspection, temporarily disable all security-type software on the server and retry a crawl. Just remember: if there are no logs on the server being crawled, chances are it’s not being crawled and you have to look elsewhere.

7. Pre-Windows 2000 Compatible Access Group

In an earlier post of mine, I hit an issue where search would yield no results for a regular user, while a domain administrator could happily search SP2010 and get results. Another symptom associated with this particular problem is certain recurring errors in the event log – Event IDs 28005 and 4625.

  • Event ID 28005 shows the message “An exception occurred while enqueueing a message in the target queue. Error: 15404, State: 19. Could not obtain information about Windows NT group/user ‘DOMAIN\someuser’, error code 0x5”.
  • The 4625 error would complain “An account failed to log on. Unknown user name or bad password status 0xc000006d, sub status 0xc0000064” or else “An Error occured during Logon, Status: 0xc000005e, Sub Status: 0x0”

If you turn up the debug logs inside SharePoint Central Administration for the “Query” and “Query Processor” functions of “SharePoint Server Search”, you will get the error: “AuthzInitializeContextFromSid failed with ERROR_ACCESS_DENIED. This error indicates that the account under which this process is executing may not have read access to the tokenGroupsGlobalAndUniversal attribute on the querying user’s Active Directory object. Query results which require non-Claims Windows authorization will not be returned to this querying user.”

image

The fix is to add your search service account to a group called “Pre-Windows 2000 Compatible Access”. The issue is that SharePoint 2010 re-introduced something that was in SP2003: an API call to a function called AuthzInitializeContextFromSid. Apparently it was not used in SP2007, but it is back for SP2010. This particular function requires a certain permission in Active Directory, and the “Pre-Windows 2000 Compatible Access” group happens to have the right required to read the “tokenGroupsGlobalAndUniversal” Active Directory attribute described in the debug error above.
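For those who prefer scripting the fix over clicking through Active Directory Users and Computers, the group membership change can be automated. A minimal sketch, assuming the built-in net.exe localgroup syntax, run on a domain controller with sufficient rights; the account name is this post’s example.

# Add the crawl account to the built-in compatibility group.
# Run on a domain controller with sufficient rights; the account
# name is this post's example - substitute your own.
import subprocess

subprocess.run(
    [
        "net", "localgroup", "Pre-Windows 2000 Compatible Access",
        r"SEVENSIGMA\svcspsearch", "/add",
    ],
    check=True,
)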

8. Bloody developers!

Finally, Patrick Lamber blogs about another cause of crawler issues. In his case, someone developed a custom web part that threw an exception when the site was crawled. For whatever reason, this exception was not thrown when the site was viewed normally via a browser. As a result, no pages or content on the site could be crawled, because all the crawler would see, no matter what it clicked, was the dreaded “An unexpected error has occurred”. When you think about it, any custom code that takes action based on browser parameters such as locale or language might cause an exception like this – and therefore cause the crawler some grief.

In Patrick’s case there was a second issue as well. His team had developed a custom HTTPModule that did some URL rewriting. As Patrick states: “The indexer seemed to hate our redirections with the Response.Redirect command. I simply removed the automatic redirection on the indexing server. Afterwards, everything worked fine”.

In this case Patrick was using a multi-server farm with a dedicated index server, allowing him to remove the HTTP module for that one server. In smaller deployments you may not have this luxury. So apart from the obvious opportunity to bag programmers :-), this example nicely shows how easy it is for a third party application or custom code to break search. What is important for developers to realise is that client web browsers are not the only things that load SharePoint pages.

If you are not aware, the User Agent string identifies the type of client accessing a resource; it is the means by which sites figure out what browser you are using. A quick look at the User Agent presented by SharePoint Server 2010 search reveals that it identifies itself as “Mozilla/4.0 (compatible; MSIE 4.01; Windows NT; MS Search 6.0 Robot)”. At the very least, test any custom user interface code such as web parts against this string, and check the crawl logs when the crawler indexes any custom developed stuff.
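Developers can catch this class of bug before the crawler does by requesting their pages with the crawler’s User Agent and comparing the response to a normal browser request. A minimal Python sketch assuming the requests package; the URL is a made-up example, and you would add NTLM auth as in the earlier sketches if the site requires it.

# Fetch a page as a browser and as the SP2010 crawler, then compare.
# If only the crawler variant returns the error page, custom code on
# that page cannot cope with the crawler. URL is a made-up example.
import requests

URL = "http://web/sites/site/default.aspx"
AGENTS = {
    "browser": "Mozilla/5.0 (Windows NT 6.1; Trident/5.0)",
    "crawler": "Mozilla/4.0 (compatible; MSIE 4.01; Windows NT; MS Search 6.0 Robot)",
}

for label, ua in AGENTS.items():
    response = requests.get(URL, headers={"User-Agent": ua}, timeout=10)
    errored = "An unexpected error has occurred" in response.text
    print(f"{label}: HTTP {response.status_code}, error page: {errored}")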

Conclusion

Well, that’s pretty much my list of gotchas. No doubt there are lots more, but hopefully this slightly more detailed exploration of them might help some people.

 

Thanks for reading

Paul Culmsee

www.sevensigma.com.au

www.spgovia.com



More classes planned and clearing the air…

Hi all

I have a couple of important community service type announcements to make.

How do I know I’m attending a legitimate Seven Sigma Class?

Sometimes the training marketplace can be confusing, with various organisations offering various courses. Ask any attendee of the SPGov+IA class and they will attest to the uniqueness of our course. Both I and some of my trusted local partners have been contacted by people asking about other SharePoint courses in the Information Architecture space, wondering if we endorse or are in any way associated with them. This has happened again recently, so it’s probably worth clearing the air here and now.

Ahem

Seven Sigma has a number of relationships with like-minded organisations around the world. In the UK, we have a terrific relationship with Andrew and Ant at 21apps. In New Zealand, we work with Chan at 3Grow and Debbie at EnvisionIT. In the US we work with Erica Toelle at FPWeb, as well as Ruven Gotz and in Brisbane recently we worked with Alpesh Nakar from Just SharePoint.

Aside from myself, Ant Clay of 21apps is the only authorised trainer of our courseware. Essentially, if Ant or I are not running the class, then it’s not my class! Visit the trainer section of www.spgovia.com for our details.

Furthermore, outside of Australia, if the course organiser is not Andrew Woodward or Ant Clay (Europe), Erica Toelle or Ruven Gotz (US), or Chan or Debbie Ireland (New Zealand), then it is not the SPGov+IA class.

www.spgovia.com is the official site for the SharePoint Governance and Information Architecture Master Class. Here you can find out about the class, read feedback from past attendees, and see schedules and registration information. It is the authoritative source for all information related to all classes, and each of the partners above will publish location specific information about the classes they plan to run.

More SPGov+IA classes for 2011 (and Issue Mapping Class is a go…)

I am proud to report that the first ever Issue Mapping Master Class, co-developed with CogNexus and run by Seven Sigma, happened in my home town of Perth last month. This has been a long time coming, and the feedback from the first attendees was immensely gratifying.

Definitely one of the best courses I have ever attended… I have already recommended to many people that they should get on the next course if possible. – Jon Gorton

This course was brilliant. The technique itself is a valuable tool for any business with multiple applications. – Leisha Velterop

So now, on top of the SharePoint Governance and Information Architecture class, we can offer a specialist course on the craft of Issue and Dialogue Mapping – something regular readers of this blog may be familiar with. For all alumni of the SPGov+IA class who put their hand up for a dedicated IBIS class: you now have your wish. The Issue Mapping class will be taken on the road for the first time too, and the plan is to run both classes in each location.

To that end, we have classes locked in for Auckland and Wellington, while Melbourne and the US East coast are being earmarked for the latter part of 2011. Here are the planned classes so far.

We will publish more details of the new class as soon as we can.

Thanks for reading

 

Paul Culmsee

www.sevensigma.com.au


