A few months back I posted a relatively well behaved rant over the ridiculously complex User Profile Service Application of SharePoint 2010. I think this component in particular epitomises SharePoint 2010’s awful combination of “design by committee” clunkiness, along with real-world sheltered Microsoft product manager groupthink which seems to rate success on the number of half baked features packed in, as opposed to how well those features install logically, integrate with other products and function properly in real-world scenarios.
Now truth be told, until yesterday, I had an unblemished record with the User Profile Service – being able to successfully provision it first time at all sites I have visited (and no, I did not resort to running it all as administrator). Of course, we all have Spence to thank for this with his rational guide. Nevertheless, I am strongly starting to think that I should write the irrational guide as a sort of bizarro version of Spence’s articles, which combines his rigour with some mega-ranting ;-).
So what happened to blemish my perfect record? Bloody Active Directory policies – that’s what.
In case you didn’t know, SharePoint uses a scaled down, pre-release version of Forefront Identity Manager. Presumably the logic here was to allow more flexibility, by two-way syncing to various directory services, thereby saving the SharePoint team development time and effort, as well as being able to tout yet another cool feature to the masses. Of course, the trade-off that the programmers overlooked is the insane complexity that they introduced as a result. I’m sure if you asked Microsoft’s support staff what they think of the UPS, they would tell you it has not worked out overly well. Whether that feedback has made its way back to the hallowed ground of the open-plan cubicles of SharePoint product development I can only guess. But I theorise that if Microsoft made their SharePoint devs accountable for providing front-line tech support for their components, they would suddenly understand why conspiracy theorist support and infrastructure guys act the way they do.
Anyway, I had better suppress my desire for an all out rant and tell you the problem and the fix. The site in question was actually a fairly simple set-up: a two-server farm and a single AD forest. About the only significant departure from the absolute stock standard setup was that the Active Directory NETBIOS name did not match the Active Directory fully qualified domain name. But this is actually a well known issue, and well covered by TechNet and Spence. A quick bit of PowerShell goodness and some AD permission configuration sorts the issue.
Yet when I provisioned the User Profile Service Application and then tried to start the User Profile Synchronisation Service on the server (the big, scary step that strikes fear into practitioners), I hit the sadly common “stuck on starting” error. The ULS logs told me utterly nothing of significance – even when I turned the debug juice to full throttle. The ever helpful Windows event logs showed me Event ID 3:
ForeFront Identity Manager,
.Net SqlClient Data Provider: System.Data.SqlClient.SqlException: HostId is not registered
at Microsoft.ResourceManagement.Data.Exception.DataAccessExceptionManager.ThrowException(SqlException innerException)
at Microsoft.ResourceManagement.Data.DataAccess.RetrieveWorkflowDataForHostActivator(Int16 hostId, Int16 pingIntervalSecs, Int32 activeHostedWorkflowDefinitionsSequenceNumber, Int16 workflowControlMessagesMaxPerMinute, Int16 requestRecoveryMaxPerMinute, Int16 requestCleanupMaxPerMinute, Boolean runRequestRecoveryScan, Boolean& doPolicyApplicationDispatch, ReadOnlyCollection`1& activeHostedWorkflowDefinitions, ReadOnlyCollection`1& workflowControlMessages, List`1& requestsToRedispatch)
at Microsoft.ResourceManagement.Workflow.Hosting.HostActivator.ActivateHosts(Object source, ElapsedEventArgs e)
The most common cause of this message is the NETBIOS issue I mentioned earlier. But in my case, chasing that proved to be fruitless. I also took Spence’s advice and installed the Feb 2011 cumulative update for SharePoint 2010, but to no avail. Every time I provisioned the UPS sync service, I received the above persistent error – many, many, many times.
For what it’s worth, forget googling the above error, because it is a bit of a red herring and the results will likely point you to the wrong places.
In my case, the key to the resolution lay in understanding my previously documented issue with the UPS and self-signed certificate creation. This time, I noticed that the certificates were successfully created before the above error happened. MIISCLIENT showed no configuration had been written to Forefront Identity Manager at all. Then I remembered that the SharePoint User Profile Service Application talks to Forefront over HTTPS on port 5725. As soon as I remembered that HTTP was the communication mechanism, I had a strong suspicion about where the problem was – as I have seen this sort of crap before…
I wondered if some stupid proxy setting was getting in the way. Back in the halcyon days of SharePoint 2003, I had this issue when scheduling SMIGRATE tasks: they would fail whenever the account used to run SMIGRATE was configured to use a proxy server. To find out if this was the case here, we ran the GPRESULT tool and found that a proxy configuration script was applied at the domain level for all users. We then logged in interactively as the farm account (given that it needs to be Administrator to provision the UPS anyway, this was not a problem), disabled all proxy configuration via Internet Explorer and tried again.
Blammo! The service provisions and we are cooking with gas! It was the bloody proxy server. Reconfigure group policy and all is good.
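If you would rather not click through Internet Explorer's dialogs, here is a minimal Python sketch of the same check. It is not what we ran on the day. It assumes Python is available on the server, that it runs as the account you are investigating (the farm account here), and that the per-user proxy values sit in the usual Internet Settings registry key. The function names are my own inventions for illustration, and policy can also push proxy settings machine-wide, which this sketch does not look at.

```python
import sys

# Per-user Internet Explorer / WinINET proxy settings live under this HKCU key.
INTERNET_SETTINGS_KEY = r"Software\Microsoft\Windows\CurrentVersion\Internet Settings"
VALUE_NAMES = ("ProxyEnable", "ProxyServer", "ProxyOverride", "AutoConfigURL")


def read_proxy_settings():
    """Read the current user's proxy values from the registry (Windows only)."""
    import winreg  # imported lazily so this file still loads on other platforms

    settings = {}
    with winreg.OpenKey(winreg.HKEY_CURRENT_USER, INTERNET_SETTINGS_KEY) as key:
        for name in VALUE_NAMES:
            try:
                settings[name], _value_type = winreg.QueryValueEx(key, name)
            except FileNotFoundError:
                settings[name] = None  # value is not set for this user
    return settings


def proxy_findings(settings):
    """Return readable reasons why this account's HTTP(S) traffic may be proxied."""
    findings = []
    if settings.get("AutoConfigURL"):
        findings.append("proxy auto-config script: " + settings["AutoConfigURL"])
    if settings.get("ProxyEnable") and settings.get("ProxyServer"):
        findings.append("static proxy: " + settings["ProxyServer"])
    return findings


if __name__ == "__main__" and sys.platform == "win32":
    for line in proxy_findings(read_proxy_settings()) or ["no per-user proxy configured"]:
        print(line)
```

Anything it prints other than "no per-user proxy configured" is worth chasing before you blame SharePoint.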
The moral of the story is this. Anytime Windows components communicate with each other via HTTP, there is always a chance that some AD induced dumbass proxy setting might get in the way. If not that, stateful security apps that check out HTTP traffic or even a corrupted cache (as happened in this case). The ULS logs will never tell you much here, because the problem is not SharePoint per se, but the registry configuration enforced by policy.
So, to ensure that you do not get affected by this, configure all SharePoint servers to be excluded from proxy access, or configure the SharePoint farm account not to use a proxy server at all. (Watch for certificate revocation related slowness if you do this though).
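If you go the exclusion route, it helps to sanity check that your server names really are covered by the bypass list. Below is a rough Python sketch, assuming the list is the usual semicolon-separated string (the ProxyOverride registry value) with "*" wildcards and the special "<local>" token. The function name and host names are made up for illustration, and the real Windows matching rules have more corner cases (schemes, ports, IP ranges) than this handles.

```python
from fnmatch import fnmatch


def is_bypassed(host, proxy_override):
    """True if host matches an entry in a semicolon-separated proxy bypass list."""
    entries = [e.strip().lower() for e in (proxy_override or "").split(";") if e.strip()]
    host = host.lower()
    for entry in entries:
        if entry == "<local>":
            # "<local>" is treated here as covering dotless (intranet style) names.
            if "." not in host:
                return True
        elif fnmatch(host, entry):  # "*" wildcard match against the host name
            return True
    return False


# Hypothetical farm servers checked against a hypothetical bypass list.
bypass_list = "*.corp.example;<local>"
for server in ("sp-app01", "sp-wfe01.corp.example", "www.example.org"):
    print(server, "bypasses proxy" if is_bypassed(server, bypass_list) else "goes via proxy")
```

Any farm server that comes back as "goes via proxy" is a candidate for the same stuck on starting misery.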
Finally, I called this post “consequences of complexity” because the root cause of this sort of problem is very tricky to identify. With so many variables in the mix, how the hell can people figure this sort of stuff out?
Seriously Microsoft, you need to adjust your measures of success to include resiliency of the platform!
Thanks for reading