
Feb 02 2012

An obscure “failed to create the configuration database” issue…


Hi all

You would think that after years of installing SharePoint in various scenarios, I would be able to get past step 3 of the configuration wizard (the step that creates the configuration database). But today I almost got nailed by an issue that – while dead-set obvious in hindsight – was rather difficult to diagnose.

Basically it was a straightforward two server farm installation. The installer account had local admin rights on the web front end server and sysadmin rights on the SQL box. SQL was a dedicated named instance accessed via an alias. I was tutoring the install while an engineer did the driving, and as soon as we hit step 3, blammo! – the installation failed, claiming that the configuration database could not be created.

[screenshot]

Looking a little deeper into the log, the error message stated:

An error occurred while getting information about the user svc-spfarm at server mydomain.com: Access is denied

Hmm… After double-checking all the obvious things (SQL dbcreator and securityadmin rights for the farm account, group policy interference, etc.) it was clear this was something different. The configuration database was successfully created on the SQL server, although the permissions for the farm account had not been applied. This proved that the SQL permissions were appropriate. Clearly this was an issue around authentication and Active Directory.

There were very few reports of similar symptoms online, and the closest I found was a situation where the person had mistakenly run the SharePoint configuration wizard using the local machine administrator account rather than a domain account. Of course, the local account had no rights to access Active Directory, so the wizard failed because it had no way to verify the SharePoint farm account in AD and grant it permissions to the configuration database. But in this case we were definitely using a valid domain account.

As part of our troubleshooting, we opted to explicitly give the farm account “Log on as a service” rights (since this is needed for provisioning the User Profile Service later anyhow). It was then that we saw some really bizarre behaviour: we were unable to find the SharePoint farm account in Active Directory. Any attempt to add the farm account to the “Log on as a service” right would not resolve, and therefore we could not assign that right. We created another service account to test this behaviour and the same thing happened. This immediately smelt like an Active Directory replication issue – where the domain controller being accessed was not replicating with the other domain controllers. A quick repadmin check ascertained that all was fine.
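For reference, these are the sorts of checks we ran from the SharePoint box. A minimal sketch only – run as the installer account, and substitute your own farm account name (svc-spfarm is the one from the error log below):

# Summarise AD replication health across all domain controllers
repadmin /replsummary

# Perform the same lookup the configuration wizard does (NetUserGetInfo) against the domain
net user svc-spfarm /domain

# Confirm the account also resolves via an LDAP query
dsquery user -samid svc-spfarm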

Hmm…

At this point, we started experimenting with various accounts, old and new. We were able to conclude that irrespective of the age of the account, some accounts could be found in Active Directory no problem, whereas others could not be. Yet those that could not be found were valid and working on the domain.

Finally one of the senior guys in the organisation realised the problem. In their AD topology, there was an OU for all service accounts. The permissions of that OU had been modified from the default. The “Domain users” group did not have any access to that OU at all. This prevented service accounts from being enumerated by regular domain accounts (a security design they had adopted some time back). Interestingly, even service accounts that live in this OU cannot enumerate any other accounts in that OU, including themselves.
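You can spot this kind of lockdown quickly by dumping the ACL on the OU and looking for the missing read/list entries. A hedged sketch only – the OU distinguished name below is hypothetical:

# Dump the ACL on the service account OU; note the absence of generic read / list
# entries for Authenticated Users or Domain Users
dsacls "OU=Service Accounts,DC=mydomain,DC=com"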

This caused several problems with SharePoint. First the configuration wizard could not finish because it needed to assign the farm account permissions to the config and central admin databases. Additionally, the farm account would not be able to register managed accounts if those accounts were stored in this OU.

Fortunately, when they created this setup, they made a special group called “Enumerate Service Account OU”. By adding the installer account and the farm account to this group all was well.
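If you have the Active Directory PowerShell module available (Windows Server 2008 R2 or later), the fix is a one-liner. A sketch only – the group name is theirs, but the installer account name below is hypothetical:

Import-Module ActiveDirectory

# Allow the installer and farm accounts to enumerate the locked-down service account OU
Add-ADGroupMember -Identity "Enumerate Service Account OU" -Members "svc-spinstall", "svc-spfarm"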

I have to say, I thought I had seen most of the ways Active Directory configuration might trip me up – but this was a first. Anyone else seen this before?

Thanks for reading

Paul Culmsee

www.sevensigma.com.au

www.hereticsguidebooks.com

P.S. The error log detail is below…


 

Log Name:      Application

Source:        SharePoint 2010 Products Configuration Wizard

Date:          1/02/2012 2:22:52 PM

Event ID:      104

Task Category: None

Level:         Error

Keywords:      Classic

User:          N/A

Computer:      Mycomputer

Description:

Failed to create the configuration database.

An exception of type System.InvalidOperationException was thrown.  Additional exception information: An error occurred while getting information about the user svc-spfarm at server mydomain: Access is denied

System.InvalidOperationException: An error occurred while getting information about the user svc-spfarm at server mydomain

   at Microsoft.SharePoint.Win32.SPNetApi32.NetUserGetInfo1(String server, String name)

   at Microsoft.SharePoint.Administration.SPManagedAccount.GetUserAccountControl(String username)

   at Microsoft.SharePoint.Administration.SPManagedAccount.Update()

   at Microsoft.SharePoint.Administration.SPProcessIdentity.Update()

   at Microsoft.SharePoint.Administration.SPApplicationPool.Update()

   at Microsoft.SharePoint.Administration.SPWebApplication.CreateDefaultInstance(SPWebService service, Guid id, String applicationPoolId, SPProcessAccount processAccount, String iisServerComment, Boolean secureSocketsLayer, String iisHostHeader, Int32 iisPort, Boolean iisAllowAnonymous, DirectoryInfo iisRootDirectory, Uri defaultZoneUri, Boolean iisEnsureNTLM, Boolean createDatabase, String databaseServer, String databaseName, String databaseUsername, String databasePassword, SPSearchServiceInstance searchServiceInstance, Boolean autoActivateFeatures)

   at Microsoft.SharePoint.Administration.SPWebApplication.CreateDefaultInstance(SPWebService service, Guid id, String applicationPoolId, IdentityType identityType, String applicationPoolUsername, SecureString applicationPoolPassword, String iisServerComment, Boolean secureSocketsLayer, String iisHostHeader, Int32 iisPort, Boolean iisAllowAnonymous, DirectoryInfo iisRootDirectory, Uri defaultZoneUri, Boolean iisEnsureNTLM, Boolean createDatabase, String databaseServer, String databaseName, String databaseUsername, String databasePassword, SPSearchServiceInstance searchServiceInstance, Boolean autoActivateFeatures)

   at Microsoft.SharePoint.Administration.SPAdministrationWebApplication.CreateDefaultInstance(SqlConnectionStringBuilder administrationContentDatabase, SPWebService adminService, IdentityType identityType, String farmUser, SecureString farmPassword)

   at Microsoft.SharePoint.Administration.SPFarm.CreateAdministrationWebService(SqlConnectionStringBuilder administrationContentDatabase, IdentityType identityType, String farmUser, SecureString farmPassword)

   at Microsoft.SharePoint.Administration.SPFarm.CreateBasicServices(SqlConnectionStringBuilder administrationContentDatabase, IdentityType identityType, String farmUser, SecureString farmPassword)

   at Microsoft.SharePoint.Administration.SPFarm.Create(SqlConnectionStringBuilder configurationDatabase, SqlConnectionStringBuilder administrationContentDatabase, IdentityType identityType, String farmUser, SecureString farmPassword, SecureString masterPassphrase)

   at Microsoft.SharePoint.Administration.SPFarm.Create(SqlConnectionStringBuilder configurationDatabase, SqlConnectionStringBuilder administrationContentDatabase, String farmUser, SecureString farmPassword, SecureString masterPassphrase)

   at Microsoft.SharePoint.PostSetupConfiguration.ConfigurationDatabaseTask.CreateOrConnectConfigDb()

   at Microsoft.SharePoint.PostSetupConfiguration.ConfigurationDatabaseTask.Run()

   at Microsoft.SharePoint.PostSetupConfiguration.TaskThread.ExecuteTask()



Jan 13 2012

The cloud is not the problem-Part 4: Industry shakeout and playing with the big kids…


Hi all

Welcome to the fourth post about the adaptive change that cloud computing is going to have on practitioners, paradigms and organisations. The previous two posts took a look at the dodgier side of two of the industry's biggest players, Microsoft and Amazon. While I have highlighted some dumb issues with both, I nevertheless have to acknowledge their resourcing, scalability and ability to execute. On that point of ability to execute, in this post we are going to look at the cloud industry more broadly and the inevitable consolidation that is, and will continue, taking place.

Now to set the scene, a lot of people know that in the early twentieth century, there were a lot of US car manufacturers. I wonder if you can take a guess at the number of defunct car manufacturers there have been before and after that time.

…Fifty?

…One Hundred?

Not even close…

What if I told you that there were over 1700!

Here is another interesting stat. The table below shows, by decade, how many manufacturers went bankrupt or ceased operations, along with the average lifespan of the companies that folded in that decade.

Decade     # defunct     Avg years in operation
1870s          4                 5
1880s          2                 1
1890s          5                 1
1900s         88                 3
1910s        660                 3
1920s        610                 4
1930s        276                 5
1940s         42                 7
1950s         13                14
1960s         33                10
1970s         11                19
1980s          5                37
1990s          5                16
2000s          3                49
2010s          5                42

Now, you would expect that the bulk of closures would be depression era, but note that the depression did not start until the late 1920s, and during the boom times that preceded it, 660 manufacturers went to the wall – a worse result!

The pattern of consolidation

[screenshot]

What I think the above table shows is the classic pattern of industry consolidation after an initial phase of innovation and expansion, where over time the many are gobbled by the few. As the number of players consolidates, those who remain grow bigger, with more resources and economies of scale. This in turn creates barriers to entry for new participants. Accordingly, the rate of attrition slows down, but that is more due to the fact that there are fewer players in the industry. Those that are left continue to fight their battles, but now those battles take longer. Nevertheless, as time goes on, the number of players consolidates further.

If we applied a cloud/web hosting paradigm to the above table, I would equate the dotcom bust of 2000 with the depression era of the 1920s and 1930s. I actually think that with cloud computing, we are in the 1960s and onwards right now. The largest of the large players have now made big bets on the cloud and have entered the market in a big, big way. For more than a decade, other companies hosted Microsoft technology, with Microsoft showing little interest beyond selling licenses via them. Now Microsoft themselves are also the hosting provider. Does that mean most of the hosting providers will share the fate of Netscape? Or will they manage to survive the dance with Goliath like Citrix or VMware have?

For those who are not Microsoft or Amazon…

[screenshot]

Imagine you have been hosting SharePoint solutions for a number of years. Depending on your size, you probably own racks or a cage in someone else's data centre, or you own a small data centre yourself. You have some high-end VMware gear to underpin your hosting offerings, and you do both managed SharePoint (i.e. offer a basic site collection subscription with no custom stuff – a la Office 365) and dedicated virtual machines for those who want more control (a la Amazon). You have dutifully paid your service provider licensing to Microsoft, have IT engineers on staff, some SharePoint specialists, a helpdesk and some dodgy sales guys – all standard stuff and life is good. You had a crack at implementing SharePoint multi-tenancy, but found it all a bit too fiddly and complex.

Then Amazon comes along and shakes things up with their IaaS offerings. They are cost competitive, have more data centres in more regions, higher capacity, more fault tolerance, a wider variety of services and can scale more than you can. Their ability to execute in terms of offering new services is impossible to keep up with. In short, they slowly but relentlessly take a chunk of the market and continue to grow. So you naturally counter by pushing the legitimate line that you specialise in SharePoint, and that as a result customers investing in such a complex tool are in much safer hands with you than with Amazon.

But suddenly the game changes again. The very vendor whose product you provide cloud-based SharePoint services for now bundles it with Exchange and Lync, and offers Active Directory integration (yeah, yeah, I know there was BPOS, but no-one actually heard of that). Suddenly the argument that you are a safer option than Amazon is shot down by the fact that Microsoft themselves now offer what you do. So whose hands are safer? The small hosting provider with limited resources, or the multinational with billions of dollars in the bank that develops the product? Furthermore, given Microsoft's advantage in being able to mobilise knowledge resources with deep product expertise, they have a richer managed service offering than you can provide (i.e. they offer multi-tenancy :).

This puts you in a bit of a bind, as you are getting assailed at both ends. Amazon trumps your capabilities at the IaaS end and is encroaching on your space, while Microsoft is assailing the SaaS end. How does a small fish survive in a pond with the big ones? In my opinion, the mid-tier SharePoint cloud providers will have to reinvent themselves.

The adaptive change…

So for the mid-tier SharePoint cloud provider grappling with the fact that their play area is reduced because of the big kids encroaching, there is only one option. They have to be really, really good in areas the big kids are not good at. In SharePoint terms, this means they have to go to places many don’t really want to go: they need to bolster their support offerings and move up the SharePoint stack.

You see, traditionally a SharePoint hosting provider tends to take one of two approaches. They provide a managed service where the customer cannot mess with it too much (i.e. site collection admin access only). For those who need more than that, they will offer a virtual machine and wipe their hands of any maintenance or governance, beyond ensuring that the infrastructure is fast and backed up. Until now, cloud providers could get away with this, and the reason they take this approach should be obvious to anyone who has implemented SharePoint. If you don’t maintain operational governance controls, things can rapidly get out of hand. Who wants to deal with all that “people crap”? Besides, that’s a different skill set to the typical skills required to run and maintain cloud services at the infrastructure layer.

So some cloud providers will kick and scream about this, and delude themselves into thinking that hosting and cloud services are their core business. For those who think this, I have news for you. The big boys think these are their core business too and they are going to do it better than you. This is now commodity stuff and a by-product of commoditisation is that many SharePoint consultancies are now cloud providers anyway! They sign up to Microsoft or Amazon and are able to provide a highly scalable SharePoint cloud service with all the value added services further up the SharePoint stack. In short, they combine their SharePoint expertise with Microsoft/Amazon’s scale.

Now on the issue of support, Amazon has no specific SharePoint skills and never will. They are first and foremost a compelling IaaS offering. Microsoft’s support? Go and re-read part 2 if you want to see that. It seems that no matter how big the multinational, level 1 tech support is always level 1 tech support.

So what strategies can a mid-tier provider take to stay competitive in this rapidly commoditising space? I think the answer is to go premium and go niche.

  • Provide brilliant support. If I call you, day or night, I expect to speak to a SharePoint person straight away. I want to get to know them on a first name basis and I do not want to fight the defence mechanism of the support hierarchy.
  • Partner with SharePoint consultancies or acquire consulting resources. The latter allows you to do some vertical integration yourself and broaden your market and offerings. A potential KPI for any SharePoint cloud provider should be that no support person ever says “sorry that’s outside the scope of what we offer.”
  • Develop skills in the tools and systems that surround SharePoint or invest in SharePoint areas where skills are lacking. Examples include Project Server, PerformancePoint, integration with GIS, Records management and ERP systems. Not only will you develop competencies that few others have, but you can target particular vertical market segments who use these tools.
  • (Controversial?) Dump your own infrastructure and use Amazon in conjunction with another IaaS provider. You just can’t compete with their scale and price point. If you use them you will likely save costs; combined with a second provider you can play the resiliency card; and best of all… you can offer VPC :-)

Conclusion

In the last two posts we looked at some of the areas where both Microsoft and Amazon sometimes struggle to come to grips with the SharePoint cloud paradigm. In this post, we looked at the other cloud providers, who now have to compete with these two giants – both clearly looking to eke out as much value as they can from the cloud pie. Whether or not you agree with my suggested strategy (Rackspace appears to), the pattern of the auto industry serves as an interesting parallel to the cloud computing marketplace. Is the relentless consolidation a good thing? Probably not in the long term (we will tackle that issue in the last post in this series). In the next post, we are going to shift our focus away from the cloud providers themselves and turn our gaze to internal IT departments – who, until now, have had it pretty good. As you will see, a big chunk of the irrational side of cloud computing comes from this area.

 

Thanks for reading

Paul Culmsee

www.sevensigma.com.au



Jan 03 2012

The cloud is not the problem–Part 3: When silos strike back…


What can Ikea fails tell us about cloud computing?

My next door neighbour is a builder. When he moved next door, the house was an old piece of crap. Within 6 months, he completely renovated it himself, adding in two bedrooms, an underground garage and all sorts of cool stuff. On the other hand, I bought my house because it was a good location, someone had already renovated it and all we had to do was move in. The reason for this was simple: I had a new baby and more importantly, me and power tools do not mix. I just don’t have the skills, nor the time to do what my neighbour did.

You can probably imagine what would happen if I tried to renovate my house the way my neighbour did. It would turn out like the Ikea fails in the video. Similarly, many SharePoint installs tend to look similar to the video too. Moral of the story? Sometimes it is better to get something pre-packaged than to do it yourself.

In the last post, we examined the “Software as a Service” (SaaS) model of cloud computing in the form of Office 365. Other popular SaaS providers include SlideShare, Salesforce, Basecamp and Tom’s Planner, to name a few. Most SaaS applications are browser based and not as feature rich or complex as their on-premise competition. The SaaS model is therefore a bit like buying a kit home. In SaaS, no user of these services ever touches the underlying cloud infrastructure used to provide the solution, nor do they have a full mandate to tweak and customise to their heart’s content. SaaS is basically predicated on the notion that someone else will do a better set-up job than you, and on the old 80/20 rule about which features of an application actually get used.

Some people may regard the restrictions of SaaS as a good thing – particularly if they have previously dealt with the consequences of one too many unproductive customisation efforts. As many SharePointers know, the more you customise SharePoint, the less resilient it gets. Thus, restricting what sort of customisations can be done might in many circumstances be a wise thing to do.

Nevertheless, this actually goes against the genetic traits of pretty much every Australian male walking the planet. The reason is simple: no matter how lacking our skills or how inappropriate our tools and training, we nevertheless always want to do it ourselves. This brings me to our next cloud provider: Amazon, and their Infrastructure as a Service (IaaS) model of cloud-based services. This is the ultimate DIY solution for those of us who find SaaS cramps our style. Let’s take a closer look, shall we?

Amazon in a nutshell

Okay, I have to admit that as an infrastructure guy, I am genetically predisposed to liking Amazon’s cloud offerings. Why? Well, as an infrastructure guy, I am like my neighbour who renovated his own house: I’d rather do it all myself because I have acquired the skills to do so. So for any server-hugging infrastructure people out there wondering what they have been missing out on – read on… you might like what you see.

Now first up, it’s easy for new players to get a bit intimidated by Amazon’s bewildering array of offerings with brand names that make no sense to anybody but Amazon… EC2, VPC, S3, ECU, EBS, RDS, AMI’s, Availability Zones – sheesh! So I am going to ignore all of their confusing brand names, just hope that you have heard of virtual machines, and assume that you or your tech geeks know all about VMware. The simplest way to describe Amazon is VMware on steroids. Amazon’s service essentially allows you to create virtual machines within Amazon’s “cloud” of large data centres around the world. As I stated earlier, the official cloud terminology that Amazon is traditionally associated with is Infrastructure as a Service (IaaS). This is where, instead of providing ready-made applications like SaaS, a cloud vendor provides lower level IT infrastructure for rent. This consists of stuff like virtualised servers, storage and networking.

Put simply, utilising Amazon, you can deploy virtual servers with your choice of operating system, applications, memory, CPU and disk configuration. Like any good “all you can eat” buffet, you are spoilt for choice. You simply choose an Amazon Machine Image (AMI) to use as a base for creating a virtual server. You can choose one of Amazon’s pre-built AMI’s (base installs of Windows Server or Linux) or you can choose an image from the community contributed list of over 8000 base images. Pretty much any vendor out there who sells a turn-key solution (such as those all-in-one virus scanning/security solutions) has likely created an AMI. Microsoft have also gotten in on the Amazon act and created AMI’s for you, optimised by their product teams. Want SQL 2008 the way Microsoft would install it? Choose the Microsoft Optimized Base SQL Server 2008R2 AMI, which “contains scripts to install and optimize SQL Server 2008R2 and accompanying services including SQL Server Analysis services, SQL Server Reporting services, and SQL Server Integration services to your environment based on Microsoft best practices.”

The series of screen shots below shows the basic idea. After signing up, use the “Request instance wizard” to create a new virtual server by choosing an AMI first. In the example below, I have shown the default Amazon AMI’s under “Quick start” as well as the community AMI’s.

[screenshot: Amazon’s default AMI’s]

[screenshot: Community contributed AMI’s]

From the list above, I have chosen Microsoft’s “Optimized SQL Server 2008 R2 SP1” from the community AMI’s and clicked “Select”. Now I can choose the CPU and memory configurations. Hmm how does a 16 core server sound with 60 gig of RAM? That ought to do it… :-)

[screenshot]

Now I won’t go through the full description of commissioning virtual servers, but suffice to say that you can choose which geographic location within Amazon’s cloud this server will reside in, and after 15 minutes or so your virtual server will be ready to use. It can be assigned a public IP address, firewall restricted and then remotely managed as per any other server. This can all be done programmatically too. You can talk to Amazon via web services to start, monitor, terminate, etc. as many virtual machines as you want, which allows you to scale your infrastructure on the fly and very quickly. There are no long procurement times, and you only pay for the servers that are currently running. If you shut them down, you stop paying.
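To give you a flavour of the programmatic side, here is a minimal sketch using the AWS Tools for PowerShell module (an assumption on my part – any of Amazon’s web service APIs will do the same job; the keys, region and instance ID below are placeholders):

# Assumes the AWS Tools for PowerShell module is installed
Set-AWSCredential -AccessKey "AKIA_PLACEHOLDER" -SecretKey "PLACEHOLDER"
Set-DefaultAWSRegion -Region ap-southeast-1      # e.g. the Singapore region

# List current instances, then start and stop one by its ID
Get-EC2Instance
Start-EC2Instance -InstanceId "i-0123456789abcdef0"
Stop-EC2Instance -InstanceId "i-0123456789abcdef0"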

But what makes it cool…

Now I am sure that some of you might be thinking “big deal… any virtual machine hoster can do that.” I agree – and when I first saw this capability I just saw it as a larger scale VMware/Xen type deployment. But what really made me sit up and take notice was Amazon’s Virtual Private Cloud (VPC) functionality. The super-duper short version of VPC is that it allows you to extend your corporate network into the Amazon cloud. It does this by allowing you to define your own private network and connect to it via site-to-site VPN technology. To see how it works diagrammatically, check out the image below.

[screenshot]

Let’s use an example to understand the basic idea. Let’s say your internal IP address range at your office is 10.10.10.0 to 10.10.10.255 (a /24 for the geeks). With VPC you tell Amazon “I’d like a new IP address range of 10.10.11.0 to 10.10.11.255”. You are then prompted to tell Amazon the public IP address of your internet router. The screenshots below show what happens next:

[screenshot]

[screenshot]

The first screenshot asks you to choose what type of router is at your end. Available choices are Cisco, Juniper, Yamaha, Astaro and generic. The second screenshot shows a sample configuration that is downloaded. Now any Cisco-trained person reading this will recognise what is going on here: this is an automatically generated configuration to be added to an organisation’s edge router to create an IPSEC tunnel. In other words, we have extended our corporate network itself into the cloud. Any service can be run on such a network – not just SharePoint. For smaller organisations wanting the benefits of off-site redundancy without the costs of a separate datacentre, this is a very cost effective option indeed.

For the Cisco geeks, the actual configuration is two GRE tunnels that are IPSEC encrypted. BGP is used for route table exchange, so Amazon can learn what routes to tunnel back to your on-premise network. Furthermore Amazon allows you to manage firewall settings at the Amazon end too, so you have an additional layer of defence past your IPSEC router.

This is called Virtual Private Cloud (VPC) and, when configured properly, it is very powerful. Note the “P” is for private. No server deployed to this subnet is internet accessible unless you choose it to be. This allows you to extend your internal network into the cloud and gain all the provisioning, redundancy and scalability benefits without direct exposure to the internet. As an example, I did a hosted SharePoint extranet where we used SQL log shipping of the extranet content databases back to a DMZ network for redundancy. Try doing that on Office365!

This sort of functionality shows that Amazon is a mature, highly scalable and flexible IaaS offering. They have been in the business for a long time and it shows because their full suite of offerings is much more expansive than what I can possibly cover here. Accordingly my Amazon experiences will be the subject of a more in-depth blog post or two in future. But for now I will force myself to stop so the non-technical readers don’t get too bored. :-)

So what went wrong?

So after telling you how impressive Amazon’s offering is, what could possibly go wrong? Like the Office365 issue covered in part 2, absolutely nothing with the technology. To understand why, I need to explain Amazon’s pricing model.

Amazon offer a couple of ways to pay for servers (called instances in Amazon speak). An on-demand instance is calculated based on a per-hour price while the server is running. The more powerful the server is in terms of CPU, memory and disk, the more you pay. To give you an idea, Amazon’s pricing for a Windows box with 8CPU’s and 16GB of RAM, running in Amazon’s “US east” region will set you back $0.96 per hour (as of 27/12/11). If you do the basic math for that, it equates to around $8409 per year, or $25228 over three years. (Yeah I agree that’s high – even when you consider that you get all the trappings of a highly scalable and fault tolerant datacentre).

On the other hand, a reserved instance involves making a one-time payment in return for a significant discount on the hourly charge for that instance. Essentially, if you are going to run an Amazon server on a 24×7 basis for more than 18 months or so, a reserved instance makes sense as it considerably reduces cost over the long term. The same server would only cost you $0.40 per hour if you pay an up-front $2800 for a 3 year term. Total cost: $13,312 over three years – much better.
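The break-even arithmetic is easy enough to sanity check yourself. A quick sketch using the rates quoted above:

$onDemandRate    = 0.96      # $/hour, on-demand Windows instance (rate quoted above)
$reservedRate    = 0.40      # $/hour once the reservation is paid
$reservedUpfront = 2800      # one-time payment for a 3 year reserved instance
$hoursPerYear    = 24 * 365

$onDemand3yr = $onDemandRate * $hoursPerYear * 3                       # ~ $25,229
$reserved3yr = $reservedUpfront + ($reservedRate * $hoursPerYear * 3)  # ~ $13,312

"On-demand over 3 years: {0:N0}  Reserved over 3 years: {1:N0}" -f $onDemand3yr, $reserved3yr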

So with that scene set, consider this scenario: back at the start of 2011, a client of mine consolidated all of their SharePoint cloud services to Amazon from a variety of other hosting providers. They did this for a number of reasons, but it basically boiled down to the fact that they had 1) outgrown the SaaS model and 2) a growing number of clients. As a result, requirements from clients were getting more complicated and beyond what most of the hosting providers could cater for. They also received irregular and inconsistent support from their existing providers, as well as some unexpected downtime that reduced confidence. In short, they needed to consolidate their cloud offering and manage their own servers. They were developing custom SharePoint solutions, needed to support federated claims authentication and required disaster recovery assurance to mitigate the risk of going 100% cloud. Amazon’s VPC offering in particular seemed ideal, because it allowed full control of the servers in a secure way.

Now making this change was not something we undertook lightly. We spent considerable time researching Amazon’s offerings, trying to understand all the acronyms as well as the fine print. (For what it’s worth, I used IBIS as the basis to develop an assessment and the map of my notes can be found here). As you are about to see though, we did not check well enough.

Back when we initially evaluated the VPC offering, it was only available in a very few Amazon locations (two in the USA only) and the service was still in beta. This caused us a bit of a dilemma at the time because of the risk of relying on a beta service. But we were reassured when Amazon confirmed that VPC would eventually be available in all of their datacentres. We also stress tested the service for a few weeks, it remained stable, and we developed and tested a disaster recovery strategy involving SQL log shipping and a standby farm. We also purchased reserved instances from Amazon, since these servers were going to be there for the long haul, so we pre-paid to reduce the hourly rates. Quite a complex configuration was provisioned in only two days and we were amazed by how easy it all was.

Things hummed along for 9 months in this fashion and the world was a happy place. We were delighted when Amazon notified us that VPC had come out of beta and was now available in any of Amazon’s datacentres around the world. We only used the US datacentre because it was the only location available at the time. Now we wanted to transfer the services to Singapore. My client contacted Amazon about some finer points on such a move and was informed that they would have to pay for their reserved instances all over again!

What the?

It turns out that reserved instances are not transferable! Essentially, Amazon were telling us that although we had paid for a three year reserved instance and only used it for 9 months, moving the servers to a new region would mean paying all over again for another 3 year reserve. According to Amazon’s documentation, each reserved instance is associated with a specific region, which is fixed for the lifetime of the reserved instance and cannot be changed.

“Okay,” we answer, “we can understand that in circumstances where people move to another cloud provider. But in our case we were not.” We had used around a third of the reserved instance term. So surely Amazon should pro-rata the unused amount and offer it as a credit when we re-purchase reserved instances in Singapore? I mean, we will still be hosting with Amazon, so overall they will not be losing any revenue at all. On the contrary, we will be paying them more, because we will have to sign up for an additional 3 years of reserve when we move the services.

So we ask Amazon whether that can be done. “Nope,” comes back the answer from Amazon’s not-so-friendly billing team, with one of those trite and grossly insulting “Sorry for any inconvenience this causes” ending sentences. After more discussions, it seems that internally within Amazon, each region (or datacentre within each region) is its own profit centre. Therefore, in typical silo fashion, the US datacentre does not want to pay money to the Singapore operation, as that would mean the revenue we paid would no longer be recognised against them.

Result? The customer is screwed, all because the Amazon fiefdoms don’t like sharing the contents of the till. But hey – the regional managers get their bonuses, right? :-(

Conclusion

Like part 2 of this cloud computing series, this is not a technical issue. Amazon’s cloud service in our experience has been reliable and has performed well. In this case, we are turned off by the fact that their internal accounting procedures create a situation that is not great for customers who wish to remain loyal to them. In a post about the danger of short-termism and ignoring legacy, I gave the example of how dumb it is for organisations to think they are measuring success based on how long it takes to close a helpdesk call. When such a KPI is used, those in support roles have little choice but to artificially close calls even when users’ problems have not been solved, because that is how they are deemed to be performing well. The reality, though, is that rather than measuring happy customers, this KPI simply rewards the helpdesk operators who have managed to game the system by getting callers off the phone as soon as they can.

I feel that Amazon are treating this as an internal accounting issue, irrespective of client outcomes. Amazon will lose the business of my client because of this, since they have enough servers hosted that the financial impost of paying all over again is much more than the cost of transferring to a different cloud provider. While VPC and automated provisioning of virtual servers are cool and all, at the end of the day many hosting providers can offer this if you ask them. Although it might not be as slick and fancy as Amazon’s automated configuration, it nonetheless is very doable, and the other providers are playing catch-up. Like Apple, Amazon are enjoying the benefits of being first to market with their service, but as competition heats up, others will rapidly bridge the gap.

Thanks for reading

Paul Culmsee



Jul 22 2011

Troubleshooting SharePoint (People) Search 101


I’ve been nerding it up lately SharePoint-wise, doing the geeky things that geeks like to do, like ADFS and claims authentication. So in between trying to get my book fully edited and ready for publishing, I might squeeze out the odd technical SharePoint post. Today I had to troubleshoot a broken SharePoint people search for the first time in a while. I thought it was worth explaining the crawl process a little and talking about the most likely ways in which it will break for you, in order of likelihood as I see it. There are articles out there on this topic, but none that I found are particularly comprehensive.

Background stuff

If you consider yourself a legendary IT pro or SharePoint god, feel free to skip this bit. If you prefer a more gentle stroll through SharePoint search land, then read on…

When you provision a search service application as part of a SharePoint installation, you are asked for (among other things) a Windows account to use for the search service. Below is the point in the GUI-based configuration where this is done. First up we choose to create a search service application, and then we choose the account to use for the “Search Service Account”. By default this is the account that will do the crawling of content sources.

[screenshots]

Now the search service account is described as so: “.. the Windows Service account for the SharePoint Server Search Service. This setting affects all Search Service Applications in the farm. You can change this account from the Service Accounts page under Security section in Central Administration.”

Reading this suggests that the Windows service (“SharePoint Server Search 14”) would run under this account. The reality is that the SharePoint Server Search 14 service runs as the farm account. You can see the pre- and post-provisioning status below. First up, I show where SharePoint has been installed and the SharePoint Server Search 14 service is disabled, with service credentials of “Local Service”.

[screenshot]

The next set of pictures show the Search Service Application provisioned according to the following configuration:

  • Search service account: SEVENSIGMA\searchservice
  • Search admin web service account: SEVENSIGMA\searchadminws
  • Search query and site settings account: SEVENSIGMA\searchqueryss

You can see this in the screenshots below.

[screenshot]

[screenshots]

Once the service has been successfully provisioned, we can clearly see the “Default content access account” is based on the “Search service account” as described in the configuration above (the first of the three accounts).

[screenshot]

Finally, as you can see below, once provisioned, it is the SharePoint farm account that is running the search windows service.

[screenshot]

Once you have provisioned the Search Service Application, the default content access account (in my case SEVENSIGMA\searchservice) is granted “Full Read” access to all web applications via Web Application User Policies, as shown below. This way, no matter how draconian the permissions of site collections are, the crawler account will have the access it needs to crawl the content, as well as the permissions of that content. You can verify this by looking at any web application in Central Administration (except the Central Administration web application itself) and choosing “User Policy” from the ribbon. You will see in the policy screen that the “Search Crawler” account has “Full Read” access.

[screenshots]
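You can confirm the same thing from the SharePoint 2010 Management Shell. A minimal sketch, using the web application URL from my example:

# List the user policies on a web application and the policy roles bound to each
$wa = Get-SPWebApplication "http://web"
$wa.Policies | Select-Object UserName, DisplayName,
    @{Name = "Roles"; Expression = { ($_.PolicyRoleBindings | ForEach-Object { $_.Name }) -join ", " }}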

In case you are wondering why the search service needs to crawl the permissions of content, as well as the content itself, it is because it uses these permissions to trim search results for users who do not have access to content. After all, you don’t want to expose sensitive corporate data via search do you?

There is another, more subtle configuration change performed by the Search Service. Once the evilness known as the User Profile Service has been provisioned, the Search Service Application will grant the Search Service Account a specific permission on the User Profile Service. SharePoint is smart enough to do this whether the User Profile Service Application is installed before or after the Search Service Application. In other words, if you install the Search Service Application first and the User Profile Service Application afterwards, the permission will be granted regardless.

The specific permission by the way, is “Retrieve People Data for Search Crawlers” permission as shown below:

[screenshots]

Getting back to the title of this post, this is a critical permission, because without it the Search Server will not be able to talk to the User Profile Service to enumerate user profile information. The effect of this is empty People Search results.

How people search works (a little more advanced)

Right! Now that the cool kids (those who skipped the first section) have joined us, let’s take a closer look at SharePoint People Search in particular. This section delves a little deeper, but fear not, I will try and keep things relatively easy to grasp.

Once the Search Service Application has been provisioned, a default content source called – originally enough – “Local SharePoint Sites” is created. Any web applications that exist (and any that are created from here on in) will be listed here. An example of a freshly minted SharePoint server with a single web application shows the following configuration in the Search Service Application:

[screenshot]

Now hopefully http://web makes sense. Clearly this is the URL of the web application on this server. But you might be wondering what sps3://web is. I will bet that you have never visited an sps3:// site using a browser either. For good reason too, as it wouldn’t work.

This is a SharePointy thing – or more specifically, a Search Server thing. That funny protocol part of what looks like a URL refers to a connector. A connector allows Search Server to crawl other data sources that don’t necessarily use HTTP – like some native, binary data source. People can develop their own connectors if they feel so inclined, and a classic example is the Lotus Notes connector that Microsoft supply with SharePoint. If you configure SharePoint to use its Lotus Notes connector (and by the way – it’s really tricky to do), you would see a URL in the form of:

notes://mylotusnotesbox

Make sense? The protocol part of the URL allows the search server to figure out what connector to use to crawl the content. (For what its worth, there are many others out of the box. If you want to see all of the connectors then check the list here).

But the one we are interested in for this discussion is SPS3:, which accesses SharePoint user profiles to support people search functionality. The way this particular connector works is that when the crawler uses the SPS3 connector, it in turn calls a special web service at the specified host. The web service is called spscrawl.asmx and, in my example configuration above, it would be http://web/_vti_bin/spscrawl.asmx

The basic breakdown of what happens next is this:

  1. Information about the web site that will be crawled is retrieved (the GetSite method is called, passing in the site from the URL – i.e. the “web” of sps3://web)
  2. Once the site details are validated, the service enumerates all of the user profiles
  3. For each profile, the GetItem method is called, which retrieves all of the user profile properties for a given user. This is added to the index and tagged with a content class of “urn:content-class:SPSPeople” (I will get to this in a moment)

Now admittedly this is the simple version of events. If you really want to be scared (or get to sleep tonight) you can read the actual SPS3 protocol specification PDF.

Right! Now let’s finish this discussion with the notion of contentclass. The SharePoint search crawler tags all crawled content according to its class. The name of this “tag” – or in correct terminology, “managed property” – is contentclass. By default SharePoint has a People Search scope, which essentially limits the search to only returning content tagged with the “People” contentclass.

[screenshot]

Now to make it easier for you, Dan Attis listed all of the content classes that he knew of back in SharePoint 2007 days. I’ll list a few here, but for the full list visit his site.

  • “STS_Web” – Site
  • “STS_List_850″ – Page Library
  • “STS_List_DocumentLibrary” – Document Library
  • “STS_ListItem_DocumentLibrary” – Document Library Items
  • “STS_ListItem_Tasks” – Tasks List Item
  • “STS_ListItem_Contacts” – Contacts List Item
  • “urn:content-class:SPSPeople” – People

(why some properties follow the universal resource name format I don’t know *sigh* – geeks huh?)
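If you want to see what the People Search scope is doing under the hood, you can issue the equivalent query yourself. A hedged sketch using the SP2010 KeywordQuery API from the management shell (the site URL is from my example):

# Query the index for items tagged with the People content class
[void][System.Reflection.Assembly]::LoadWithPartialName("Microsoft.Office.Server.Search")

$site = Get-SPSite "http://web"
$kq = New-Object Microsoft.Office.Server.Search.Query.KeywordQuery($site)
$kq.QueryText   = "contentclass:spspeople"
$kq.ResultTypes = [Microsoft.Office.Server.Search.Query.ResultType]::RelevantResults
$kq.RowLimit    = 10

$results = $kq.Execute()
$table = New-Object System.Data.DataTable
$table.Load($results.Item([Microsoft.Office.Server.Search.Query.ResultType]::RelevantResults))
$table | Select-Object Title, Path
$site.Dispose()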

So that was easy Paul! What can go wrong?

So now we know that although the protocol handler is SPS3, it still ultimately uses HTTP as the underlying communication mechanism to call a web service. With that, we can start to think of all the ways it can break on us. Let’s now take a look at the common problem areas in order of commonality:

1. The Loopback issue.

This has been done to death elsewhere and most people know it. What people don’t know so well is that the loopback fix was introduced to prevent an extremely nasty security vulnerability known as a replay attack that came out a few years ago. Essentially, if you make an HTTP connection to your server, from that server, using a name that does not match the name of the server, the request will be blocked with a 401 error. In terms of SharePoint people search, the sps3:// handler is created when you create your first web application. If that web application happens to use a name that doesn’t match the server name, the HTTP request to the spscrawl.asmx web service will be blocked due to this issue.

As a result your search crawl will not work and you will see an error in the logs along the lines of:

  • Access is denied: Check that the Default Content Access Account has access to the content or add a crawl rule to crawl the content (0x80041205)
  • The server is unavailable and could not be accessed. The server is probably disconnected from the network.   (0x80040d32)
  • ***** Couldn’t retrieve server http://web.sevensigma.com policy, hr = 80041205 – File:d:\office\source\search\search\gather\protocols\sts3\sts3util.cxx Line:548

There are two ways to fix this: the quick way (DisableLoopbackCheck) and the right way (BackConnectionHostNames). Both involve a registry change and a reboot, but one of them leaves you much more open to exploitation. Spence Harbar wrote about the differences between the two some time ago and I recommend you follow his advice.
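For completeness, here is what each registry change looks like in PowerShell. The host names are the ones from my example, and a reboot is still required afterwards:

# The right way: whitelist the specific host names used by your web applications
New-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Control\Lsa\MSV1_0" `
    -Name "BackConnectionHostNames" -PropertyType MultiString -Value @("web", "web.sevensigma.com")

# The quick (and less secure) way: disable the loopback check entirely
New-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Control\Lsa" `
    -Name "DisableLoopbackCheck" -PropertyType DWord -Value 1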

(As a slightly related side note, I hit an issue with the User Profile Service a while back where it gave an error: “Exception occurred while connecting to WCF endpoint: System.ServiceModel.Security.MessageSecurityException: The HTTP request was forbidden with client authentication scheme ‘Anonymous’. —> System.Net.WebException: The remote server returned an error: (403) Forbidden”. In this case I needed to disable the loopback check even though I was using the server name with no alternative aliases or fully qualified domain names. I asked Spence about this one and it seems that the DisableLoopBack registry key addresses more than the SMB replay vulnerability.)

2. SSL

If you add a certificate to your site and mark the site as HTTPS (by using SSL), things change. In the example below, I installed a certificate on the site http://web, removed the binding to http (or port 80) and then updated SharePoint’s alternate access mappings to make things a HTTPS world.

Note that the reference to SPS3://WEB is unchanged, and that there is also a reference still to HTTP://WEB, as well as an automatically added reference to HTTPS://WEB

[screenshot]

So if we were to run a crawl now, what do you think will happen? Certainly we know that HTTP://WEB will fail, but what about SPS3://WEB? Let’s run a full crawl and find out, shall we?

Checking the logs, we have the unsurprising error “the item could not be crawled because the crawler could not contact the repository”. So clearly, SPS3 isn’t smart enough to work out that the web service call to spscrawl.asmx needs to be done over SSL.

[screenshot]

Fortunately, the solution is fairly easy. There is another connector, identical in function to SPS3 except that it is designed to handle secure sites: “SPS3s”. We simply change the configuration to use this connector (and while we are there, remove the reference to HTTP://WEB).
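The same change can be scripted against the default content source. A sketch, using the names from my example:

# Point the default content source at the SSL-aware start addresses
$ssa = Get-SPEnterpriseSearchServiceApplication
$cs  = Get-SPEnterpriseSearchCrawlContentSource -SearchApplication $ssa -Identity "Local SharePoint Sites"

$cs.StartAddresses.Clear()
$cs.StartAddresses.Add("sps3s://web")   # secure people crawl
$cs.StartAddresses.Add("https://web")   # secure content crawl
$cs.Update()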

[screenshot]

Now we retry a full crawl and check for errors… Wohoo – all good!

[screenshot]

It is also worth noting that there is another SSL-related issue with search. The search crawler is a little fussy with certificates. Most people have visited secure web sites that warn about a problem with the certificate, looking something like the image below:

[screenshot]

Now when you think about it, a search crawler doesn’t have the luxury of asking a user if the certificate is okay. Instead it errs on the side of security and, by default, will not crawl a site if the certificate is invalid in some way. The crawler is also fussier than a regular browser. For example, it doesn’t much like wildcard certificates, even if the certificate is trusted and valid (although all modern browsers accept them).

To alleviate this issue, you can make the following changes in the settings of the Search Service Application: Farm Search Administration->Ignore SSL warnings and tick “Ignore SSL certificate name warnings”.

[screenshots]

[screenshot]

The implication of this change is that the crawler will now accept any old certificate that encrypts website communications.

3. Permissions and Change Legacy

Let’s assume that we made a configuration mistake when we provisioned the Search Service Application. The search service account (which is the default content access account) is incorrect and we need to change it to something else. Let’s see what happens.

In the search service application management screen, click on the default content access account to change credentials. In my example I have changed the account from SEVENSIGMA\searchservice to SEVENSIGMA\svcspsearch
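You can make the same change from PowerShell if you prefer. A hedged sketch, using the account from my example:

# Change the default content access (crawl) account on the Search Service Application
$password = Read-Host "Password for SEVENSIGMA\svcspsearch" -AsSecureString
$ssa = Get-SPEnterpriseSearchServiceApplication
Set-SPEnterpriseSearchServiceApplication -Identity $ssa `
    -DefaultContentAccessAccountName "SEVENSIGMA\svcspsearch" `
    -DefaultContentAccessAccountPassword $password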

[screenshot]

Having made this change, let’s review the effect in the Web Application User Policy and User Profile Service Application permissions. Note that the user policy for the old search crawl account remains, but the new account has had an entry automatically created. (Now you know why you end up with multiple accounts with the display name of “Search Crawling Account”.)

[screenshot]

Now let’s check the User Profile Service Application. Here things are different! The entry below still refers to the *old* account SEVENSIGMA\searchservice, and the new account has not been granted the required “Retrieve People Data for Search Crawlers” permission!

[screenshots]

If you traipsed through the ULS logs, you would see this:

Leaving Monitored Scope (Request (GET:https://web/_vti_bin/spscrawl.asmx)). Execution Time=7.2370958438429 c2a3d1fa-9efd-406a-8e44-6c9613231974
mssdmn.exe (0x23E4) 0x2B70 SharePoint Server Search FilterDaemon e4ye High FLTRDMN: Errorinfo is "HttpStatusCode Unauthorized The request failed with HTTP status 401: Unauthorized." [fltrsink.cxx:553] d:\office\source\search\native\mssdmn\fltrsink.cxx
mssearch.exe (0x02E8) 0x3B30 SharePoint Server Search Gatherer cd11 Warning The start address sps3s://web cannot be crawled. Context: Application ‘Search_Service_Application’, Catalog ‘Portal_Content’ Details: Access is denied. Verify that either the Default Content Access Account has access to this repository, or add a crawl rule to crawl this repository. If the repository being crawled is a SharePoint repository, verify that the account you are using has "Full Read" permissions on the SharePoint Web Application being crawled. (0x80041205)

To correct this issue, manually grant the crawler account the “Retrieve People Data for Search Crawlers” permission in the User Profile Service. As a reminder, this is done via the Administrators icon in the “Manage Service Applications” ribbon.
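If you prefer to script it, something like the following should do the trick – a hedged sketch using the service application security cmdlets, with the account from my example:

# Grant the new crawl account "Retrieve People Data for Search Crawlers" on the User Profile Service Application
$upa = Get-SPServiceApplication | Where-Object { $_.TypeName -like "User Profile Service Application*" }
$security  = Get-SPServiceApplicationSecurity $upa -Admin
$principal = New-SPClaimsPrincipal -Identity "SEVENSIGMA\svcspsearch" -IdentityType WindowsSamAccountName

Grant-SPObjectSecurity $security $principal "Retrieve People Data for Search Crawlers"
Set-SPServiceApplicationSecurity $upa -ObjectSecurity $security -Admin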

[screenshot]

Once this is done, run a full crawl and verify the result in the logs.

4. Missing root site collection

A less common issue that I once encountered is when the web application being crawled is missing a root site collection. In other words, while there are site collections defined using a managed path, such as http://WEB/SITES/SITE, there is no site collection defined at HTTP://WEB.

The crawler does not like this at all, and you get two different errors depending on whether the SPS or HTTP connector is used.

  • SPS:// – Error in PortalCrawl Web Service (0x80042617)
  • HTTP:// – The item could not be accessed on the remote server because its address has an invalid syntax (0x80041208)

[screenshot]

The fix for this should be fairly obvious: go and create a root site collection for the web application and re-run a crawl.
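In PowerShell terms it is a one-liner. A sketch only – the owner account below is hypothetical:

# Create a root site collection so the crawler has something to talk to at the web application root
New-SPSite -Url "http://web" -OwnerAlias "SEVENSIGMA\spadmin" -Template "STS#0" -Name "Root site"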

5. Alternative Access Mappings and Contextual Scopes

SharePoint guru (and my squash nemesis) Nick Hadlee recently posted about a problem where contextual search scopes return no results. If you are wondering what they are, Nick explains:

Contextual scopes are a really useful way of performing searches that are restricted to a specific site or list. The “This Site: [Site Name]”, “This List: [List Name]” are the dead giveaways for a contextual scope. What’s better is contextual scopes are auto-magically created and managed by SharePoint for you so you should pretty much just use them in my opinion.

The issue is that when the alternate access mapping (AAM) settings for the default zone on a web application do not match your search content source, the contextual scopes return no results.

I came across this problem a couple of times recently and the fix is really pretty simple – check your alternate access mapping (AAM) settings and make sure the host header that is specified in your default zone is the same url you have used in your search content source. Normally SharePoint kindly creates the entry in the content source whenever you create a web application but if you have changed around any AAM settings and these two things don’t match then your contextual results will be empty. Case Closed!

Thanks Nick
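If you want to check this quickly from PowerShell, comparing the default zone AAM of each web application with the content source start addresses looks something like this (a sketch only):

# Default zone URLs for each web application
Get-SPWebApplication | Get-SPAlternateURL -Zone Default | Select-Object IncomingUrl, Zone

# Start addresses configured on each search content source
$ssa = Get-SPEnterpriseSearchServiceApplication
Get-SPEnterpriseSearchCrawlContentSource -SearchApplication $ssa |
    Select-Object Name, @{Name = "StartAddresses"; Expression = { $_.StartAddresses -join ", " }}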

6. Active Directory Policies, Proxies and Stateful Inspection

A particularly insidious way to have problems with search (and not just people search) is via Active Directory policies. For those of you who don’t know what AD policies are, they basically allow geeks to go on a power trip with users’ desktop settings. Consider the image below. Essentially an administrator can enforce a massive array of settings for all PCs on the network. Such is the extent of what can be controlled that I can’t fit it into a single screenshot. What is listed below is but a small portion of what an anal retentive administrator has at their disposal (mwahahaha!)

[screenshot]

Common uses of policies include restricting certain desktop settings to maintain consistency, as well as enforcing Internet Explorer settings, such as the proxy server and security settings like the trusted sites list. One common issue with a policy-defined proxy server in particular is that the search service account will have its profile modified to use the proxy server.

The result of this is that now the proxy sits between the search crawler and the content source to be crawled as shown below:

Crawler —–> Proxy Server —–> Content Source

Now even though the crawler does not use Internet Explorer per se, proxy settings aren’t actually specific to Internet Explorer. Internet Explorer, like the search crawler, uses wininet.dll. Wininet is a module that contains Internet-related functions used by Windows applications, and it is this component that utilises proxy settings.

Sometimes people will troubleshoot this issue by using telnet to connect to the HTTP port (e.g. “telnet web 80”). But telnet does not use the wininet component, so it is not actually a valid method for testing. Telnet will happily report that the web server is listening on port 80 or 443, but that matters little when the crawler tries to access that port via the proxy. Furthermore, even if the crawler and the content source are on the same server, the result is the same: as soon as the crawler attempts to index a content source, the request will be routed to the proxy server. Depending on the vendor and configuration of the proxy server, various things can happen, including:

  • The proxy server cannot handle the NTLM authentication and passes back a 400 error code to the crawler
  • The proxy server has funky stateful inspection that restricts the allowed HTTP verbs and interferes with the crawl

For what it’s worth, it is not just proxy settings that can interfere with the HTTP communications between the crawler and the crawled. I have also seen security software get in the way – software that monitors HTTP communications and pre-emptively terminates connections or modifies the content of the HTTP request. The effect is that the results passed back to the crawler are not what it expects, and the crawler naturally reports that it could not access the data source with suitably weird error messages.

Now the very thing that makes this scenario hard to troubleshoot is also the tell-tale sign for it. That is: nothing will be logged in the ULS logs, nor in the IIS logs for the search service. This is because the errors will be logged on the proxy server or by the overly enthusiastic stateful security software.

If you suspect the problem is a proxy server issue, but do not have access to the proxy server to check its logs, the best way to troubleshoot is to temporarily grant the search crawler account enough access to log onto the server interactively. Open Internet Explorer and manually check the proxy settings (a quick way to read the same settings from the registry is sketched after the list below). If you confirm a policy-based proxy setting, you might be able to temporarily disable it and retry a crawl (until the next AD policy refresh reapplies the settings). The ideal way to cure this problem is to ask your friendly Active Directory administrator to either:

  • Remove the proxy altogether from the SharePoint server (watch for certificate revocation slowness as a result)
  • Configure an exclusion in the proxy settings for the AD policy so that the content sources for crawling are not proxied
  • Create a new AD policy specifically for the SharePoint box so that the default settings apply to the rest of the domain member computers.
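
If you do log on interactively as the crawl account, you don’t even need to open Internet Explorer – the wininet proxy settings applied to that account can be read straight from its registry hive. A quick sketch (run it in a PowerShell session started as the crawl account):

# Per-user wininet proxy settings - a policy-applied proxy will show up in ProxyEnable/ProxyServer/AutoConfigURL
Get-ItemProperty "HKCU:\Software\Microsoft\Windows\CurrentVersion\Internet Settings" |
    Select-Object ProxyEnable, ProxyServer, ProxyOverride, AutoConfigURL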

If you suspect the issue might be overly zealous stateful inspection, temporarily disable all security-type software on the server and retry a crawl. Just remember that if you have no logs on the server being crawled, chances are it’s not being crawled and you have to look elsewhere.

7. Pre-Windows 2000 Compatibility Access Group

In an earlier post of mine, I hit an issue where search would yield no results for a regular user, but a domain administrator could happily search SP2010 and get results. Another symptom associated with this particular problem is certain recurring errors in the event log – Event IDs 28005 and 4625.

  • ID 28005 shows the message “An exception occurred while enqueueing a message in the target queue. Error: 15404, State: 19. Could not obtain information about Windows NT group/user ‘DOMAIN\someuser’, error code 0x5”.
  • The 4625 error would complain “An account failed to log on. Unknown user name or bad password status 0xc000006d, sub status 0xc0000064” or else “An Error occured during Logon, Status: 0xc000005e, Sub Status: 0x0”

If you turn up the debug logs inside SharePoint Central Administration for the “Query” and “Query Processor” functions of “SharePoint Server Search” you will get an error “AuthzInitializeContextFromSid failed with ERROR_ACCESS_DENIED. This error indicates that the account under which this process is executing may not have read access to the tokenGroupsGlobalAndUniversal attribute on the querying user’s Active Directory object. Query results which require non-Claims Windows authorization will not be returned to this querying user.”

image

The fix is to add your search service account to the “Pre-Windows 2000 Compatibility Access” group. The issue is that SharePoint 2010 re-introduced something that was in SP2003 – an API call to a function called AuthzInitializeContextFromSid. Apparently it was not used in SP2007, but it’s back for SP2010. This particular function requires a certain permission in Active Directory, and the “Pre-Windows 2000 Compatibility Access” group happens to have the right required to read the “tokenGroupsGlobalAndUniversal” Active Directory attribute described in the debug error above.
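
For reference, the group shows up in Active Directory under the Builtin container as “Pre-Windows 2000 Compatible Access”. A minimal sketch of adding the account with the Active Directory PowerShell module – the service account name below is made up, so use your real search service account:

Import-Module ActiveDirectory

# Add the search service account (hypothetical name) to the builtin group
Add-ADGroupMember -Identity "Pre-Windows 2000 Compatible Access" -Members "svc-spsearch"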

8. Bloody developers!

Finally, Patrick Lamber blogs about another cause of crawler issues. In his case, someone developed a custom web part that threw an exception when the site was crawled. For whatever reason, this exception did not get thrown when the site was viewed normally via a browser. As a result, no pages or content on the site could be crawled, because no matter what the crawler requested, all it would see was the dreaded “An unexpected error has occurred”. When you think about it, any custom code that takes action based on browser parameters such as locale or language might cause an exception like this – and therefore cause the crawler some grief.

In Patrick’s case there was a second issue as well. His team had developed a custom HTTPModule that did some URL rewriting. As Patrick states, “The indexer seemed to hate our redirections with the Response.Redirect command. I simply removed the automatic redirection on the indexing server. Afterwards, everything worked fine”.

In this case Patrick was using a multi-server farm with a dedicated index server, allowing him to remove the HTTP module for that one server. In smaller deployments you may not have this luxury. So apart from the obvious opportunity to bag programmers :-), this example nicely shows that it is easy for a 3rd party application or custom code to break search. What is important for developers to realise is that client web browsers are not the only thing that loads SharePoint pages.

If you are not aware, the User-Agent string identifies the type of client accessing a resource. This is the means by which sites figure out what browser you are using. A quick look at the User-Agent sent by SharePoint Server 2010 search reveals that it identifies itself as “Mozilla/4.0 (compatible; MSIE 4.01; Windows NT; MS Search 6.0 Robot)“. At the very least, test any custom user interface code such as web parts against this string, and check the crawl logs whenever the crawler indexes any custom-developed components.
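
A cheap way to smoke-test custom pages before the crawler finds them is to request the page while pretending to be the crawler and see whether anything blows up. A rough sketch, assuming a placeholder URL and an account with read access to the site:

# Request a page using the SharePoint 2010 crawler's User-Agent string
$url = "http://intranet/Pages/Default.aspx"   # placeholder - point at a page hosting your custom code
$wc = New-Object System.Net.WebClient
$wc.UseDefaultCredentials = $true
$wc.Headers.Add("User-Agent", "Mozilla/4.0 (compatible; MSIE 4.01; Windows NT; MS Search 6.0 Robot)")
try {
    $html = $wc.DownloadString($url)
    if ($html -match "An unexpected error has occurred") { "Page blows up under the crawler User-Agent" }
    else { "Page rendered without the generic error" }
}
catch { "Request failed: " + $_.Exception.Message }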

Conclusion

Well, that’s pretty much my list of gotchas. No doubt there are lots more, but hopefully this slightly more detailed exploration of them might help some people.

 

Thanks for reading

Paul Culmsee

www.sevensigma.com.au

www.spgovia.com



Mar 31 2011

Consequences of complexity–the evilness of the SharePoint 2010 User Profile Service


Hiya

A few months back I posted a relatively well behaved rant over the ridiculously complex User Profile Service Application of SharePoint 2010. I think this component in particular epitomises SharePoint 2010’s awful combination of “design by committee” clunkiness, along with real-world sheltered Microsoft product manager groupthink which seems to rate success on the number of half baked features packed in, as opposed to how well those features install logically, integrate with other products and function properly in real-world scenarios.

Now truth be told, until yesterday I had an unblemished record with the User Profile Service – I had been able to successfully provision it first time at every site I visited (and no, I did not resort to running it all as administrator). Of course, we all have Spence to thank for this with his rational guide. Nevertheless, I am strongly starting to think that I should write the irrational guide as a sort of bizarro version of Spencer’s articles, which combines his rigour with some mega-ranting ;-).

So what happened to blemish my perfect record? Bloody Active Directory policies – that’s what.

In case you didn’t know, SharePoint uses a scaled-down, pre-release version of Forefront Identity Manager. Presumably the logic was to allow more flexibility by two-way syncing to various directory services, thereby saving the SharePoint team development time and effort, as well as being able to tout yet another cool feature to the masses. Of course, the trade-off that the programmers overlooked is the insane complexity that they introduced as a result. I’m sure if you asked Microsoft’s support staff what they think of the UPS, they would tell you it has not worked out overly well. Whether that feedback has made its way back to the hallowed ground of the open-plan cubicles of SharePoint product development I can only guess. But I theorise that if Microsoft made their SharePoint devs accountable for providing front-line tech support for their components, they would suddenly understand why conspiracy theorist support and infrastructure guys act the way they do.

Anyway, I had better suppress my desire for an all-out rant and tell you the problem and the fix. The site in question was actually a fairly simple set-up: a two-server farm and a single AD forest. About the only thing of significance from an absolutely stock standard setup was that the Active Directory NETBIOS name did not match the Active Directory fully qualified domain name. But this is well known and well covered by TechNet and Spence. A quick bit of PowerShell goodness and some AD permission configuration sorts the issue.
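
For anyone who has not seen that fix before, the PowerShell half of it is flipping the NetBIOSDomainNamesEnabled flag on the User Profile Service Application (the AD half is granting the appropriate Replicate Directory Changes rights, as per the TechNet guidance and Spence). A minimal sketch, to be run before the synchronisation connection is created:

Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Tell the User Profile Service Application that the NetBIOS name differs from the FQDN
$upa = Get-SPServiceApplication | Where-Object {$_.TypeName -like "*User Profile Service*"}
$upa.NetBIOSDomainNamesEnabled = $true
$upa.Update()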

Yet when I provisioned the User Profile Service Application and then tried to start the User Profile Synchronisation Service on the server (the big, scary step that strikes fear into practitioners), I hit the sadly common “stuck on starting” error. The ULS logs told me utterly nothing of significance – even when I turned the debug juice to full throttle. The ever helpful Windows event logs showed me Event ID 3:

ForeFront Identity Manager,
Level: Error

.Net SqlClient Data Provider: System.Data.SqlClient.SqlException: HostId is not registered
at Microsoft.ResourceManagement.Data.Exception.DataAccessExceptionManager.ThrowException(SqlException innerException)
at Microsoft.ResourceManagement.Data.DataAccess.RetrieveWorkflowDataForHostActivator(Int16 hostId, Int16 pingIntervalSecs, Int32 activeHostedWorkflowDefinitionsSequenceNumber, Int16 workflowControlMessagesMaxPerMinute, Int16 requestRecoveryMaxPerMinute, Int16 requestCleanupMaxPerMinute, Boolean runRequestRecoveryScan, Boolean& doPolicyApplicationDispatch, ReadOnlyCollection`1& activeHostedWorkflowDefinitions, ReadOnlyCollection`1& workflowControlMessages, List`1& requestsToRedispatch)
at Microsoft.ResourceManagement.Workflow.Hosting.HostActivator.RetrieveWorkflowDataForHostActivator()
at Microsoft.ResourceManagement.Workflow.Hosting.HostActivator.ActivateHosts(Object source, ElapsedEventArgs e)

The most common issue with this message is the NETBIOS issue I mentioned earlier. But in my case this proved to be fruitless. I also took Spence’s advice and installed the Feb 2011 cumulative update for SharePoint 2010, but to no avail. Every time I provisioned the UPS sync service, I received the above persistent error – many, many, many times. :-(

For what it’s worth, forget googling the above error because it is a bit of a red herring and the results will likely point you to the wrong places.

In my case, the key to the resolution lay in understanding my previously documented issue with the UPS and self-signed certificate creation. This time, I noticed that the certificates were successfully created before the above error happened. MIISCLIENT showed that no configuration had been written to Forefront Identity Manager at all. Then I remembered that the SharePoint User Profile Service Application talks to Forefront over HTTPS on port 5725. As soon as I remembered that HTTP was the communication mechanism, I had a strong suspicion of where the problem was – as I have seen this sort of crap before…

I wondered if some stupid proxy setting was getting in the way. Back in the halcyon days of SharePoint 2003, I had this issue when scheduling SMIGRATE tasks – they would fail if the account used to run SMIGRATE was configured to use a proxy server. To find out if this was the case here, a quick run of the GPRESULT tool showed that there was a proxy configuration script applied at the domain level for all users. We then logged in as the farm account interactively (given that to provision the UPS it needs to be Administrator anyway, this was not a problem), disabled all proxy configuration via Internet Explorer and tried again.
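
As an aside, GPRESULT will also tell you exactly which GPO is pushing the setting, which is handy ammunition when you go asking for an exclusion. Something along these lines (the account name is a placeholder):

# Summary of applied group policy for the farm account on this box (account name is hypothetical)
gpresult /r /user MYDOMAIN\svc-spfarm

# Or a full HTML report to dig through
gpresult /h C:\temp\gpreport.html /user MYDOMAIN\svc-spfarm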

Blammo! The service provisions and we are cooking with gas! It was the bloody proxy server. Reconfigure group policy and all is good.

Conclusion

The moral of the story is this. Anytime windows components communicate with each-other via HTTP, there is always a chance that some AD induced dumbass proxy setting might get in the way. If not that, stateful security apps that check out HTTP traffic or even a corrupted cache (as happened in this case). The ULS logs will never tell you much here, because the problem is not SharePoint per se, but the registry configuration enforced by policy.

So, to ensure that you do not get affected by this, configure all SharePoint servers to be excluded from proxy access, or configure the SharePoint farm account not to use a proxy server at all. (Watch for certificate revocation related slowness if you do this though).

Finally, I called this post “consequences of complexity” because the root cause of this sort of problem is very tricky to identify. With so many variables in the mix, how the hell can people figure this sort of stuff out?

Seriously Microsoft, you need to adjust your measures of success to include resiliency of the platform!

 

Thanks for reading

Paul Culmsee

www.sevensigma.com.au



Feb 17 2011

A brief sojourn into the world of Exchange 2010


Okay so this post is going to seem way out of place because it has utterly nothing to do with SharePoint and instead focuses on Microsoft Exchange Server. To explain why I have to give you a quick history lesson.

Before I was a SharePoint guy, I was a networking, infrastructure and security guy. In fact I met and worked with Jeremy Thake before either of us were full-time SharePoint guys. If you were to ask him, I’m sure he would tell you I was a bit of an infrastructure and security nazi back then. What warms my heart though is that since then I have mellowed out and Jeremy has taken on some of those nazi tendencies (I have heard him threaten to “hunt you down if you do that” at a user group presentation – referring to some dodgy SharePoint developer practice that will hurt you later).

Anyways, I still get asked to do the odd bit of Cisco, Active Directory and Exchange work. Although my interest in these areas is practically nil, some part of me likes to have a crack at it every so often to make sure I can get on the bike again – so to speak. Each time I get on the bike, I then remember why I got off in the first place ;-( (It’s a bit like eating KFC – you swear you will never do it again but given enough time, the pain seems to fade)

This time around, I agreed to help a client extricate themselves from the evil nightmare that is Small Business Server 2008 and move to a real, grown-up Windows Server environment. They had outgrown SBS, had been taken over by a foreign company, and there was a need for AD domain trusts among many other things – something that SBS can’t do. As part of this I had to get Exchange, SharePoint, AD, WSUS, Certificate Services and various other things like RRAS, DHCP and DNS off the Small Business Server and onto real servers.

So first up my big Exchange 2010 lesson learned and then some detail on how you too can make Small Business Server 2008 history in your organisation.

My first and last Exchange post: Exchange 2010 RTM and SP1 do not play nice!

Due to the nature of this upgrade, I had to set up a temporary Exchange 2010 box to act as an interim mailbox store. Provided that any Exchange 2007 servers in the organisation are running Service Pack 1, mail can happily route between Exchange 2007 and 2010 servers. Once the migration of mailboxes was complete, we decommissioned the Small Business Server following the steps outlined in the next section. We then installed a fresh, new Win2008R2 + Exchange 2010 box as the final server – only this time with Exchange 2010 Service Pack 1 (the client used newer media this time).

All went well and the new server installed fine. So now I had two Exchange 2010 servers in the organisation, one RTM and one SP1. I was able to manage both servers from the Exchange management tools on either box and there was nothing untoward in the logs on either server.

However, when I tried to move a mailbox from the RTM box to the SP1 box, I received the following error:

Service ‘net.tcp://<servername>/Microsoft.Exchange.MailboxReplicationService’ encountered an exception. Error: MapiExceptionNoAccess: Unable to open message store. (hr=0x80070005, ec=-2147024891)

Diagnostic context:
    Lid: 18969   EcDoRpcExt2 called [length=132]

[blah blah blah skip ugly stack trace stuff]

Exception details: MapiExceptionNoAccess (80070005): MapiExceptionNoAccess: Unable to open message store. (hr=0x80070005, ec=-2147024891)

[blah blah blah skip more ugly stack trace stuff]

As you can see, not a helpful message at all. So I tried to initiate the mailbox move from the SP1 server instead of the RTM server. This time, I received a different error:

There are no available servers that are running the Mailbox Replication Service.
+ CategoryInfo : NotSpecified: (0:Int32) [New-MoveRequest], MailboxReplicationTransientException + FullyQualifiedErrorId : 5C08CF31,Microsoft.Exchange.Management.RecipientTasks.NewMoveRequest

Now as Johnny says, this error suggests that no exchange server in the organisation is running the mailbox replication service. However in my case the RTM box was running this service and it was started. Clearly something was amiss.
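
For the record, the checks themselves are simple enough from the Exchange Management Shell (the mailbox and database names below are placeholders) – in my case the service reported as running, yet the move still refused to go anywhere until both servers were on SP1:

# Confirm the Mailbox Replication Service is actually running on the CAS box
Get-Service MSExchangeMailboxReplication

# Queue the move and watch its progress (placeholder mailbox and target database names)
New-MoveRequest -Identity "jsmith" -TargetDatabase "MBX-DB-SP1"
Get-MoveRequest | Get-MoveRequestStatistics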

Google didn’t show much about this problem, and I considered calling Microsoft support, but knew full well that they would probably make me install SP1 on both boxes before investigating. So I installed SP1, following Gnawgnu’s advice about surviving an Exchange 2010 Service Pack 1 install. All went smoothly and when I reattempted the mailbox move, everything worked fine.

Moral of the story: apparently a stack trace is an appropriate error message for an incompatibility between Exchange versions. C’mon Exchange product team, you are no better than SharePoint in terms of horrible error messages. Surely a version check would be an easy use-case to test for?

How to extricate yourself from Small Business Server 2008

For what it’s worth, getting SBS2008 out of your domain is a bit like pulling teeth. It really doesn’t want to go. Nevertheless it can be done, and I largely followed this unofficial guide and can confirm that it works (I have added a couple of steps below – and remember, this is SBS2008 we are talking about so it’s bound to go wrong somewhere).

1. Upgrade the AD schema of the SBS2008 domain

  • Using your new Win2008 R2 media find the adprep utility and run: Adprep /forestprep and Adprep /domainPrep
  • On SBS 2008 ensure that the schema version is updated to 47 and *not* 44 by checking the HKLM\SYSTEM\CurrentControlSet\Services\NTDS\Parameters\Schema Version registry key
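
A quick way to check that schema version without hunting through regedit (run on the SBS box):

# 47 = Windows Server 2008 R2 schema, 44 = Windows Server 2008
reg query "HKLM\SYSTEM\CurrentControlSet\Services\NTDS\Parameters" /v "Schema Version"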

2. Install Win2008R2 on your soon to be new domain controller and add it as a member server of your SBS2008 domain

3. Install the Active Directory Domain Services role and then launch the Active Directory Domain Services Installation Wizard (dcpromo.exe).

  • On the Choose a Deployment Configuration page, click Existing forest
  • On the Additional Domain Controller Options page, make sure DNS server and Global Catalog are checked
  • Check all of your group policies for references to the original SBS server and repoint them to the new AD server (recreating shared folders where necessary)
  • Move FSMO roles to the new AD server (see the sketch after this list): http://support.microsoft.com/default.aspx?scid=kb;EN-US;255504
  • Add the AD certificate services role and backup/restore the certificate store
  • Install DHCP and backup/restore config from SBS box and then remove DHCP role from SBS2008
  • Change DHCP scopes (and statically assigned devices) so that DNS points to the new DC
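
The FSMO move mentioned above can be done through the GUI as per the KB article, or in one hit from the new Windows 2008 R2 DC with the Active Directory module (the DC name is a placeholder):

Import-Module ActiveDirectory

# Check where the five roles currently live
netdom query fsmo

# Move all five roles to the new DC (placeholder name), then re-check
Move-ADDirectoryServerOperationMasterRole -Identity "NEWDC01" -OperationMasterRole SchemaMaster,DomainNamingMaster,PDCEmulator,RIDMaster,InfrastructureMaster
netdom query fsmo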

4. Install Win2008R2 on your soon to be Exchange Server and install Exchange 2010 (with the hub, client access server and mailbox roles)

  • Patch Exchange 2007 on your SBS2008 Server to SP1 if it is not already (otherwise you cannot move mailboxes to the Exchange 2010 server)
  • Note: You need to download a special Exchange SP1 installer for Small Business Server as the default installer will refuse to install on account that a SBS box does not meet minimum conditions for install
  • Move mailboxes and public folders from SBS2008 server to the new Exchange 2010 Server
  • Export IIS certificates from the SBS2008 server to the new server and then set up client access (OWA, ActiveSync and Outlook Anywhere) with the same certificate
  • Reconfigure your router/firewall to point to the new server for OWA/ActiveSync/Outlook Anywhere

5. Uninstall Exchange 2007 from SBS 2008

6. Install Win2008R2 on your soon to be SharePoint Server and install Search Server Express 2010 or whatever SharePoint edition you have paid for

  • Create a new web application
  • Restore Companyweb site using database reattach method
  • Reconfigure companyweb search to use enterprise search template that comes with Search Server Express 2010
  • Install Microsoft fax (if you used faxing in SBS2008) and enable email based fax routing
  • Configure incoming email to SharePoint by configuring a subdomain in active directory and configuring a remote domain in Exchange 2010 Hub Transport
  • Mail enable the Faxes document library in companyweb
  • Set the destination for faxes to be the Faxes document library in companyweb

7. DCPromo down SBS 2008 and remove SBS 2008 from the network.

8. Crack open a beer and celebrate your victory

 

Thanks for reading

Paul Culmsee

www.sevensigma.com.au



Aug 15 2010

More User Profile Sync issues in SP2010: Certificate Provisioning Fun


Wow, isn’t the SharePoint 2010 User Profile Service just a barrel of laughs. Without a bit of context, when you compare it to SP2007, you can do little but shake your head in bewilderment at how complex it now appears.

I have a theory about all of this. I think that this saga started over a beer in 2008 or so.

I think that Microsoft decided that SharePoint 2010 should be able to write back to Active Directory (something that AD purists dislike but which sold Bamboo many copies of their sync tool). Presumably the SharePoint team get on really well with the Forefront Identity Manager team, and over a few Friday beers the FIM guys said “Why write your own? Use our fit-for-purpose tool that does exactly this. As an added bonus, you can sync to other directories easily too”.

“Damn, that *is* a good idea”, says the SharePoint team and the rest is history. Remember the old saying, the road to hell is paved with good intentions?

Anyways, when you provision the UPS enough times and understand what Forefront Identity Manager does, it all starts to make sense. Of course, to have it make sense requires you to mess it up in the first place, and I think that everyone universally will do this – because it is essentially impossible to get it right the first time unless you run everything as domain administrator. This is a key factor that I feel did not get enough attention within the product team. I have now visited three sites where I have had to correct issues with the user profile service. Remember, not all of us do SharePoint all day – for the humble system administrator who is also looking after the overall network, this implementation is simply too complex. Result? Microsoft support engineers are going to get a lot of calls here – and it’s going to cost Microsoft that way.

One use-case they never tested

I am only going to talk about one of the issues today because Spence has written the definitive article that will get you through if you are doing it from scratch.

I went to a client site where they had attempted to provision user profile synchronisation unsuccessfully. I have no idea what they tried because I wasn’t there, unfortunately, but I made a few changes to permissions, AD rights and local security policy as per Spencer’s post. I then provisioned user profile sync again and hit this issue – a sequence of four event log entries.

Event ID:      234
Description:
ILM Certificate could not be created: Cert step 2 could not be created: C:\Program Files\Microsoft Office Servers\14.0\Tools\MakeCert.exe -pe -sr LocalMachine -ss My -a sha1 -n CN=”ForefrontIdentityManager” -sky exchange -pe -in “ForefrontIdentityManager” -ir localmachine -is root

Event ID:      234
Description:
ILM Certificate could not be created: Cert could not be added: C:\Program Files\Microsoft Office Servers\14.0\Tools\CertMgr.exe -add -r LocalMachine -s My -c -n “ForefrontIdentityManager” -r LocalMachine -s TrustedPeople

Event ID:      234
Description:
ILM Certificate could not be created: netsh http error: netsh http add urlacl url=http://+:5725/ user=Domain\spfarm sddl=D:(A;;GA;;;S-1-5-21-2972807998-902629894-2323022004-1104)

Event ID:      234
Description:
Cannot get the self issued certificate thumbprint:

The theory

Luckily this is one of those rare times where the error message actually makes sense (well – if you have worked with PKI stuff before). Clearly something went wrong in the creation of certificates. Looking at the sequence of events, it seems that as part of provisioning Forefront Identity Manager, a self-signed certificate was created for the computer account, added to the Trusted People certificate store and then used for SSL on a web application or web service listening on port 5725.

By the way, don’t go looking for a web app listening on that port in IIS because it’s not there. Just like SQL Reporting Services, FIM likely uses very little of IIS and doesn’t need the overhead.

The way I ended up troubleshooting this issue was to take a good look at the first error in the sequence and what the command was trying to do. Note the description in the event log is important here. “ILM Certificate could not be created: Cert step 2 could not be created”. So this implies that this command is the second step in a sequence and there was a step 1 that must have worked. Below is the step 2 command that was attempted.

C:\Program Files\Microsoft Office Servers\14.0\Tools\MakeCert.exe -pe -sr LocalMachine -ss My -a sha1 -n CN=”ForefrontIdentityManager” -sky exchange -pe -in “ForefrontIdentityManager” -ir localmachine -is root

When you create a certificate, it has to have a trusted issuer. Verisign and Thawte are examples and all browsers consider them trustworthy issuers. But we are not using 3rd party issuers here. Forefront uses a self-signed certificate. In other words, it trusts itself. We can infer that step 1 is the creation of this self-trusted certificate issuer by looking at the parameters of the MakeCert command that step 2 is using.

Now I am not going to annotate every Makecert parameter here, but the English version of the command above says something like:

Make me a shiny new certificate for the local machine account and call it “ForefrontIdentityManager”, issued by a root certificate that can be found in the trusted root store also called ForeFrontIdentityManager.

So this command implies that step 1 was the creation of the root certificate that issues the other certificates. (Product team – you could have given the root issuer certificate a name different from the issued certificate.)

The root cause

Now that we have established a theory of what is going on, the next step is to run the failing MakeCert command from a prompt and see what we get back. Make sure you do this as the SharePoint farm account so you are comparing apples with apples.

C:\Program Files\Microsoft Office Servers\14.0\Tools>MakeCert.exe -pe -sr LocalMachine -ss My -a sha1 -n CN="ForefrontIdentityManager" -sky exchange -pe -in "ForefrontIdentityManager" -ir localmachine -is root

Error: There are more than one matching certificate in the issuer’s root cert store. Failed

Aha! So what do we have here? The error message states that we have more than one matching certificate in the issuer’s root certificate store.

For what it’s worth, it is the parameters “-ir localmachine -is root” that specify the certificate store to use. In this case, it is the Trusted Root certificate store on the local computer.

So let’s go and take a look. Run the Microsoft Management Console (MMC) and choose “Add/Remove Snap-in” from the File menu.

image

From the list of snap ins choose Certificates and then choose “Computer Account”

image

Now in the list of certificate stores, we need to examine the one that the command refers to: The Trusted Root Certification Authorities store. Well, look at that, the error was telling the truth!

image

Clearly the Forefront Identity Manager provisioning/unprovisioning code does not check for all circumstances. I can only theorise about what my client did to cause this situation because I wasn’t privy to what was done on this particular install before I got there, but step 1 of the provisioning process would create an issuing certificate whether one existed already or not. Step 2 then failed because it had no way to determine which of these certificates was the authoritative one.

This was further exacerbated because each re-attempt created yet another root certificate – there is no check for whether one already exists.

The cure is quite easy. Just delete all of the ForefrontIdentityManager certificates from the Trusted Root Certification Authorities and re-provision the user profile sync in SharePoint. Provided that there is no certificate in this store to begin with, step 1 will create it and step 2 will then be able to create the self signed certificate using this issuer just fine.
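
If you would rather script the clean-up than click through the MMC, a sketch along these lines does the same thing. PowerShell 2.0 can read the Cert: drive but not delete from it, hence the .NET X509Store calls – and make sure you only touch the ForefrontIdentityManager certificates:

# List the offending certificates in the local machine Trusted Root store
Get-ChildItem Cert:\LocalMachine\Root | Where-Object {$_.Subject -eq "CN=ForefrontIdentityManager"}

# Remove them all, then re-provision user profile sync so a single fresh issuer is created
$store = New-Object System.Security.Cryptography.X509Certificates.X509Store("Root", "LocalMachine")
$store.Open("ReadWrite")
$store.Certificates | Where-Object {$_.Subject -eq "CN=ForefrontIdentityManager"} | ForEach-Object { $store.Remove($_) }
$store.Close()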

Conclusion (and minor rant)

Many SharePoint pros have commented on the insane complexity of the new design of the user profile sync system. Yes I understand the increased flexibility offered by the new regime, leveraging a mature product like Forefront, but I see that with all of this flexibility comes risk that has not been well accounted for. SP2010 is twice as tough to learn as SP2007 and it is more likely that you will make a mistake than not making one. The more components added, the more points of failure and the less capable over-burdened support staff are in dealing with it when it happens.

SharePoint 2010 is barely out of nappies and I have already been in a remediation role for several sites over the user profile service alone.

I propose that Microsoft add a new program level KPI to rate how well they are doing with their SharePoint product development. That KPI should be something like % of time a system administrator can provision a feature without making a mistake or resorting to running it all as admin. The benefit to Microsoft would be tangible in terms of support calls and failed implementations. Such a KPI would force the product team to look at an example like the user profile service and think “How can we make this more resilient?”. “How can we remove the number of manual steps that are not obvious?”, “how can we make the wizard clearer to understand?” (yes they *will* use the wizard).

Right now it feels like the KPI was how many features could be crammed in, as well as how much integration between components there is. If there is indeed a KPI for that, they definitely nailed it this time around.

Don’t get me wrong – it’s all good stuff, but if Microsoft are stumping seasoned SharePoint pros with this stuff, then they definitely need to change the focus a bit in terms of what constitutes a good product.

Thanks for reading

Paul Culmsee

www.sevensigma.com.au



Aug 12 2010

Index index everywhere but not a result in sight.


I have been doing a bit more tech work than normal lately – SP2010 popularity I guess – and was asked to remediate a few issues on a problematic server that I hadn’t set up. The server in question had a number of issues (over and above the usual “let’s all run it as one account” type stuff) that had a single root cause, so I thought I’d quickly document the symptoms and the cause here.

Symptom 1: SQL Server Event ID 28005:

“An exception occurred while enqueueing a message in the target queue. Error: 15404, State: 19. Could not obtain information about Windows NT group/user ‘DOMAIN\someuser’, error code 0x5”

This error was reported in the Application log around 70 times per minute. As it happened, I had removed the account in question from running any of the SharePoint web applications, application services or Windows services, but the error still persisted. I suspect SharePoint was installed as this account because it was the db owner of many of the databases on the SQL Server. Whatever the case, SQL was whinging about it despite its lack of any actual need to be there.

Symptom 2: Event ID 4625:

At a similar rate of knots to the SQL error came 4625 errors in the security log. These were not complaining about the account that the SQL event complained about; instead, they complained about ANY account running the SQL Server instance. I tried Network Service, a domain account and a local account and saw similar errors (although the local one had a different code).

Log Name:      Security
Source:        Microsoft-Windows-Security-Auditing
Event ID:      4625
Task Category: Logon
Keywords:      Audit Failure
Description:  An account failed to log on. 

Subject:
    Security ID:        NETWORK SERVICE
    Account Name:        COMPUTER$
    Account Domain:        DOMAIN
    Logon ID:        0x3e4 

Failure Information:
    Failure Reason:        Unknown user name or bad password.
    Status:            0xc000006d
    Sub Status:        0xc0000064 

Process Information:
    Caller Process ID:    0x7e0
    Caller Process Name:    E:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\Binn\sqlservr.exe 

Detailed Authentication Information:
    Logon Process:        Authz   
    Authentication Package:    Kerberos

When using a local, rather than a domain user account the code was:

Failure Information:
    Failure Reason:        An Error occured during Logon.
    Status:            0xc000005e
    Sub Status:        0x0 

Symptom 3: Search only working for domain administrators

On top of the logs being filled with endless entries of the previous two, I had another error (the original reason why I was called in, actually). A SharePoint search would yield zip, nada, zero, no results for a regular user, but a domain administrator could happily search SP2010 and get results. (Well, actually regular users did get some results – people search worked fine.)

The crawler was fine and dandy and the default content source had not been messed with. There were no errors or logs to suggest anything untoward.

The resolution:

It was the second symptom that threw me, because I thought that the problem must have been Kerberos config. But I quickly discounted that after checking SPNs and the like (notwithstanding the fact that this was a single server install anyway!).

On a hunch (helped by the fact that I had dealt with the issue of registering managed accounts not so long ago), I concentrated on the user account that was causing SQL Server all the trouble (Event ID 28005). I loaded up Active Directory and temporarily changed the security of this user account so that “Authenticated Users” had “READ” access to it.
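
If you prefer to script that permission change rather than use the GUI, dsacls should be able to do it – the distinguished name below is made up, so point it at your actual service account object:

# Grant Authenticated Users generic read on the service account object (DN is a placeholder)
dsacls "CN=svc-sql,OU=Service Accounts,DC=mydomain,DC=com" /G "Authenticated Users:GR"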

image

As soon as I did this, both event ID 28005 and 4625 stopped.

I then checked the search (symptom 3) and it was still barfing. In this case I decided to turn up the debug juice on the “Query” and “Query Processor” functions of “SharePoint Server Search”.

image
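
The same thing can be done from PowerShell if you prefer (and remember to put it back to default afterwards):

Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Crank the two search query categories up to Verbose
Set-SPLogLevel -Identity "SharePoint Server Search:Query","SharePoint Server Search:Query Processor" -TraceSeverity Verbose

# ...reproduce the failing search, read the ULS logs, then reset everything to default
Clear-SPLogLevel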

After upping the level of verbosity, I found what I was looking for.

08/12/2010 22:13:41.00     w3wp.exe (0x2228)                           0x1F0C    SharePoint Server Search          Query Processor                   g2j3    High        AuthzInitializeContextFromSid failed with ERROR_ACCESS_DENIED. This error indicates that the account under which this process is executing may not have read access to the tokenGroupsGlobalAndUniversal attribute on the querying user’s Active Directory object. Query results which require non-Claims Windows authorization will not be returned to this querying user.    debd2c54-d6a5-41b8-bf26-c4697b36f4d4

I knew immediately that this was very likely the same issue as the first two symptoms, and when I googled this result I found that my Perth compatriot, Jeremy Thake, had hit the same issue in July. The fix is to add your search service account to the “Pre-Windows 2000 Compatibility Access” group. This group happens to have the right required to read this attribute. Whether it is the same attribute I needed for my SQL issue, or the registering managed accounts issue, I don’t know, but what I do know is that this group loosens up permissions enough to cure all four of these issues.

The little security guy in me keeps telling me I should confirm the least privilege in each of these scenarios, but hey if Microsoft are saying to put the accounts into this group, then who am I to argue?

Finally, it turns out that SP2010 re-introduced something that was in SP2003: a call to a function called AuthzInitializeContextFromSid, which seems to be the root of it all. Apparently it was not used in SP2007, but it’s sure there now. I assume that one of the many stored procedures that SharePoint calls in SQL may have been the cause of Symptom 1. When you look at Symptom 2, it now makes sense, because AD was faithfully reporting an access denied when the call to AuthzInitializeContextFromSid was made. It reported SQL Server as the culprit, I assume, because a stored proc was doing the work. It just sucks that the security event logged is a fairly stock message that doesn’t give you enough specifics to really work out what is going on.

Anyway, I hope that helps someone else – as googling the event ID details is not overly helpful.

 

Thanks for reading

Paul Culmsee

www.sevensigma.com.au



Sep 16 2008

Sometimes "Microsoft bashing" is justified


Microsoft bashing is a favourite pastime of many a nerd. Whether it is justified or not in many cases is debatable since M$ will never please everyone. But the point is, it is cathartic and in actual fact, good therapy because venting your frustrations at Bill Gates is much healthier than at your colleagues or family.

To my Microsoft employee friends reading this. Don’t feel all defensive – some of the very best Microsoft bashing I have ever heard comes from you guys anyway :-)

So although sometimes the M$ bashing is completely unjustified, long shall it continue to preserve the sanity of IT professionals around the globe.

Having said that, on occasion you will hit some Microsoft induced pain that is legitimately and frustratingly dumb. By "legitimately", I mean that you cannot say "although in hindsight it was dumb, I can actually understand why they decided to do that". Instead you get caught out and experience pain and frustration simply because of a silly Microsoft oversight.

In this case, the oversight is with the SharePoint Configuration Wizard

Continue reading “Sometimes "Microsoft bashing" is justified”



Aug 25 2008

All in the name of "security"…


Here is a recent little story about when, in the name of "security", a really dumb thing was done, and the response that said a lot about the security posture of those behind the response.

A client of mine has four servers (two for an Active Directory domain, and two for SharePoint/SQL Server) hosted with an external provider. I was commissioned to perform a fairly standard install of MOSS 2007 Enterprise.

My former life in security still influences me to this day, and thus I always build SharePoint in a fairly locked-down fashion. So, apart from some strict naming conventions among components, I used a bunch of user accounts to run the various SharePoint services and made sure that none of them had any privileges over and above what they absolutely required for SharePoint to work.

The install was fairly flawless and was over in a couple of hours, however my client called me half a day later to let me know that search was broken.

Continue reading “All in the name of "security"…”


