More User Profile Sync issues in SP2010: Certificate Provisioning Fun


Wow, isn’t the SharePoint 2010 User Profile Service just a barrel of laughs? If you compare it to SP2007 without a bit of context, you can do little but shake your head in bewilderment at how complex it now appears.

I have a theory about all of this. I think that this saga started over a beer in 2008 or so.

I think that Microsoft decided that SharePoint 2010 should be able to write back to Active Directory (something that AD purists dislike but that sold Bamboo many copies of their sync tool). Presumably the SharePoint team get on really well with the Forefront Identity Manager team and over a few Friday beers, the FIM guys said “Why write your own? Use our fit-for-purpose tool that does exactly this. As an added bonus, you can sync to other directories easily too”.

“Damn, that *is* a good idea”, says the SharePoint team and the rest is history. Remember the old saying, the road to hell is paved with good intentions?

Anyways, when you provision the UPS enough times and understand what Forefront Identity Manager does, it all starts to make sense. Of course, for it to make sense you have to mess it up in the first place, and I think nearly everyone will, because it is essentially impossible to get it right the first time unless you run everything as domain administrator. This is a key factor that I feel did not get enough attention within the product team. I have now visited three sites where I have had to correct issues with the user profile service. Remember, not all of us do SharePoint all day. For the humble system administrator who is also looking after the overall network, this implementation is simply too complex. Result? Microsoft support engineers are going to get a lot of calls here, and it’s going to cost Microsoft that way.

One use-case they never tested

I am only going to talk about one of the issues today because Spence has written the definitive article that will get you through if you are doing it from scratch.

I went to a client site where they had attempted, unsuccessfully, to provision user profile synchronisation. I have no idea what they tried because I wasn’t there, but I made a few changes to permissions, AD rights and local security policy as per Spence’s post. I then provisioned user profile sync again and hit this issue: a sequence of four event log entries.

Event ID:      234
Description:
ILM Certificate could not be created: Cert step 2 could not be created: C:\Program Files\Microsoft Office Servers\14.0\Tools\MakeCert.exe -pe -sr LocalMachine -ss My -a sha1 -n CN="ForefrontIdentityManager" -sky exchange -pe -in "ForefrontIdentityManager" -ir localmachine -is root

Event ID:      234
Description:
ILM Certificate could not be created: Cert could not be added: C:\Program Files\Microsoft Office Servers\14.0\Tools\CertMgr.exe -add -r LocalMachine -s My -c -n "ForefrontIdentityManager" -r LocalMachine -s TrustedPeople

Event ID:      234
Description:
ILM Certificate could not be created: netsh http error:netsh http add urlacl url=http://+:5725/ user=Domain\spfarm sddl=D:(A;;GA;;;S-1-5-21-2972807998-902629894-2323022004-1104)

Event ID:      234
Description:
Cannot get the self issued certificate thumbprint:

The theory

Luckily this is one of those rare times where the error message actually makes sense (well, if you have worked with PKI stuff before). Clearly something went wrong in the creation of certificates. Looking at the sequence of events, it seems that as part of provisioning Forefront Identity Manager, a self-signed certificate is created for the computer account, added to the Trusted People certificate store and then used for SSL on a web application or web service listening on port 5725.

By the way, don’t go looking for the web app listening on such a port in IIS because it’s not there. Just like SQL Reporting Services, FIM likely uses very little of IIS and doesn’t need the overhead.
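The third event log entry shows the URL on port 5725 being reserved for the farm account, with the permissions expressed as an SDDL string. As a rough aid to reading it, here is a minimal Python sketch that decodes a DACL-only string of that shape. The function name `parse_simple_dacl` and the small lookup tables are my own illustration and cover only a handful of standard SDDL tokens; this is not a full SDDL parser and has nothing to do with how SharePoint or netsh process the string.

```python
import re

# Only a few standard SDDL tokens are mapped here; anything else is
# passed through unchanged rather than guessed at.
ACE_TYPES = {"A": "ACCESS_ALLOWED", "D": "ACCESS_DENIED"}
ACE_RIGHTS = {
    "GA": "GENERIC_ALL",
    "GR": "GENERIC_READ",
    "GW": "GENERIC_WRITE",
    "GX": "GENERIC_EXECUTE",
}


def parse_simple_dacl(sddl):
    """Decode a DACL-only SDDL string such as 'D:(A;;GA;;;S-1-5-...)'.

    Each ACE has six semicolon-separated fields:
    type;flags;rights;object_guid;inherit_object_guid;trustee
    """
    if not sddl.startswith("D:"):
        raise ValueError("expected an SDDL string starting with 'D:'")
    aces = []
    for body in re.findall(r"\(([^)]*)\)", sddl[2:]):
        fields = body.split(";")
        if len(fields) != 6:
            raise ValueError("an ACE must have exactly six fields: " + body)
        ace_type, _flags, rights, _obj, _inherit_obj, trustee = fields
        aces.append({
            "type": ACE_TYPES.get(ace_type, ace_type),
            "rights": ACE_RIGHTS.get(rights, rights),
            "trustee": trustee,
        })
    return aces


if __name__ == "__main__":
    # The SDDL string from the event log entry above.
    for ace in parse_simple_dacl(
        "D:(A;;GA;;;S-1-5-21-2972807998-902629894-2323022004-1104)"
    ):
        print(ace)
```

Read this way, the entry grants generic-all access on the URL reservation to a single SID, presumably that of the farm account named in the `user=` parameter.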

The way I ended up troubleshooting this issue was to take a good look at the first error in the sequence and what the command was trying to do. Note the description in the event log is important here. “ILM Certificate could not be created: Cert step 2 could not be created”. So this implies that this command is the second step in a sequence and there was a step 1 that must have worked. Below is the step 2 command that was attempted.

C:\Program Files\Microsoft Office Servers\14.0\Tools\MakeCert.exe -pe -sr LocalMachine -ss My -a sha1 -n CN="ForefrontIdentityManager" -sky exchange -pe -in "ForefrontIdentityManager" -ir localmachine -is root

When you create a certificate, it has to have a trusted issuer. Verisign and Thawte are examples and all browsers consider them trustworthy issuers. But we are not using 3rd party issuers here. Forefront uses a self-signed certificate. In other words, it trusts itself. We can infer that step 1 is the creation of this self-trusted certificate issuer by looking at the parameters of the MakeCert command that step 2 is using.

Now I am not going to annotate every MakeCert parameter here, but the English version of the command above says something like:

Make me a shiny new certificate for the local machine account and call it “ForefrontIdentityManager”, issued by a root certificate that can be found in the trusted root store and is also called “ForefrontIdentityManager”.

So this command implies that step 1 was the creation of that root certificate that issues the other certificates. (Product team: you could have given the root issuer certificate a different name from the certificate it issues.)
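For readers who do want the parameters spelled out, the sketch below splits the step 2 argument string and attaches a short description to each switch. The descriptions are my paraphrase of the MakeCert documentation, and the names `annotate` and `STEP2_ARGS` are made up for this illustration; nothing here runs MakeCert itself.

```python
import shlex

# Short paraphrases of the MakeCert switches that appear in step 2.
FLAG_MEANINGS = {
    "-pe": "mark the generated private key as exportable",
    "-sr": "store location for the new (subject) certificate",
    "-ss": "store name for the new (subject) certificate",
    "-a": "signature algorithm",
    "-n": "subject name of the new certificate",
    "-sky": "subject key type",
    "-in": "common name of the issuer certificate",
    "-ir": "store location of the issuer certificate",
    "-is": "store name of the issuer certificate",
}
FLAGS_WITHOUT_VALUE = {"-pe"}

# The arguments of the failing command, with plain ASCII quotes.
STEP2_ARGS = (
    '-pe -sr LocalMachine -ss My -a sha1 -n CN="ForefrontIdentityManager" '
    '-sky exchange -pe -in "ForefrontIdentityManager" -ir localmachine -is root'
)


def annotate(arguments):
    """Return a list of (flag, value, meaning) tuples for the argument string."""
    tokens = shlex.split(arguments)
    annotated = []
    i = 0
    while i < len(tokens):
        flag = tokens[i]
        if flag not in FLAG_MEANINGS:
            raise ValueError("switch not covered by this sketch: " + flag)
        if flag in FLAGS_WITHOUT_VALUE:
            value = None
            i += 1
        else:
            if i + 1 >= len(tokens):
                raise ValueError("missing value for " + flag)
            value = tokens[i + 1]
            i += 2
        annotated.append((flag, value, FLAG_MEANINGS[flag]))
    return annotated


if __name__ == "__main__":
    for flag, value, meaning in annotate(STEP2_ARGS):
        print(flag.ljust(5), str(value or "").ljust(30), meaning)
```

The three issuer switches at the end (`-in`, `-ir`, `-is`) are the ones that matter for this problem: they tell MakeCert to look up the issuer by name in a particular store.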

The root cause

Now that we have established a theory of what is going on, the next step is to run the failing MakeCert command from a prompt and see what we get back. Make sure you do this as the SharePoint farm account so you are comparing apples with apples.

C:\Program Files\Microsoft Office Servers\14.0\Tools>MakeCert.exe -pe -sr LocalMachine -ss My -a sha1 -n CN="ForefrontIdentityManager" -sky exchange -pe -in "ForefrontIdentityManager" -ir localmachine -is root

Error: There are more than one matching certificate in the issuer’s root cert store. Failed

Aha! So what do we have here? The error message states that we have more than one matching certificate in the issuer’s root certificate store.

For what it’s worth, it is the parameters “-ir localmachine -is root” that specify the issuer’s certificate store. In this case, it is the trusted root certificate store on the local computer.

So let’s go and take a look. Run the Microsoft Management Console (MMC) and choose “Add/Remove Snap-in” from the File menu.


From the list of snap-ins, choose Certificates and then choose “Computer account”.


Now in the list of certificate stores, we need to examine the one that the command refers to: the Trusted Root Certification Authorities store. Well, look at that, the error was telling the truth! The store holds more than one certificate named ForefrontIdentityManager.

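If you would rather check this without clicking through MMC, the same test can be scripted once you have the subject names of the certificates in the store (for example, copied from `certutil -store Root` output). The sketch below assumes you already have those subjects as a list of strings. The helper names and the sample subjects in the usage section are hypothetical, and the comma split is deliberately naive, so it will mishandle names that contain commas.

```python
from collections import Counter


def common_name(subject):
    """Return the CN value from a subject such as 'CN=Name, O=Org', or None."""
    for part in subject.split(","):
        key, _, value = part.strip().partition("=")
        if key.upper() == "CN":
            return value.strip()
    return None


def duplicated_names(subjects):
    """Map each common name that appears more than once to its count."""
    counts = Counter(cn for cn in map(common_name, subjects) if cn)
    return {name: n for name, n in counts.items() if n > 1}


if __name__ == "__main__":
    # Hypothetical store contents, for illustration only.
    sample_subjects = [
        "CN=ForefrontIdentityManager",
        "CN=Example Root CA, O=Example Org",
        "CN=ForefrontIdentityManager",
    ]
    print(duplicated_names(sample_subjects))
```

Any name reported here is one that an issuer lookup by name alone could not resolve to a single certificate.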

Clearly the Forefront Identity Manager provisioning/unprovisioning code does not check for all circumstances. I can only theorise about what my client did to cause this situation, because I wasn’t privy to what was done on this particular install before I got there, but it appears that step 1 of the provisioning process creates an issuing certificate whether one already exists or not. Step 2 then fails because it has no way to determine which of these certificates is the authoritative one.

This was further exacerbated because each re-attempt creates another root certificate because there is no check whether a certificate already exists.

The cure is quite easy. Just delete all of the ForefrontIdentityManager certificates from the Trusted Root Certification Authorities store and re-provision the user profile sync in SharePoint. Provided that there is no certificate of that name in this store to begin with, step 1 will create it and step 2 will then be able to create the self-signed certificate using this issuer just fine.
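To make the failure and the cure concrete, here is a toy Python model of the behaviour described above. It is only a sketch of my theory: the stores are plain lists, the function names are invented, and it says nothing about how the real provisioning code or MakeCert are implemented.

```python
NAME = "ForefrontIdentityManager"


class AmbiguousIssuerError(Exception):
    """Raised when more than one issuer with the requested name exists."""


def step1_create_root(root_store, name=NAME):
    # Modelled on the observed behaviour: no check for an existing root.
    root_store.append(name)


def step2_issue_cert(root_store, personal_store, name=NAME):
    matches = [cert for cert in root_store if cert == name]
    if not matches:
        raise LookupError("no issuer named " + name)
    if len(matches) > 1:
        raise AmbiguousIssuerError(
            "%d matching issuers named %s" % (len(matches), name)
        )
    personal_store.append(name)


def provision(root_store, personal_store):
    step1_create_root(root_store)
    step2_issue_cert(root_store, personal_store)


if __name__ == "__main__":
    root_store = [NAME]  # a root left behind by an earlier attempt
    personal_store = []

    for attempt in (1, 2):
        try:
            provision(root_store, personal_store)
        except AmbiguousIssuerError as err:
            print("attempt", attempt, "failed:", err)

    # The cure: remove every root with that name, then provision again.
    root_store[:] = [cert for cert in root_store if cert != NAME]
    provision(root_store, personal_store)
    print("after cleanup:", root_store, personal_store)
```

In this model each failed attempt leaves one more root behind, which is why retrying never helps until the store has been cleaned out.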

Conclusion (and minor rant)

Many SharePoint pros have commented on the insane complexity of the new design of the user profile sync system. Yes, I understand the increased flexibility offered by the new regime, leveraging a mature product like Forefront, but with all of this flexibility comes risk that has not been well accounted for. SP2010 is twice as tough to learn as SP2007 and you are more likely to make a mistake than not. The more components that are added, the more points of failure there are, and the less capable over-burdened support staff are of dealing with a failure when it happens.

SharePoint 2010 is barely out of nappies and I have already been in a remediation role for several sites over the user profile service alone.

I propose that Microsoft add a new program-level KPI to rate how well they are doing with their SharePoint product development. That KPI should be something like the percentage of the time a system administrator can provision a feature without making a mistake or resorting to running it all as admin. The benefit to Microsoft would be tangible in terms of fewer support calls and failed implementations. Such a KPI would force the product team to look at an example like the user profile service and think “How can we make this more resilient?”, “How can we reduce the number of manual steps that are not obvious?”, “How can we make the wizard clearer to understand?” (yes, they *will* use the wizard).

Right now it feels like the KPI was how many features could be crammed in, as well as how much integration between components there is. If there is indeed a KPI for that, they definitely nailed it this time around.

Don’t get me wrong, it’s all good stuff, but if Microsoft are stumping seasoned SharePoint pros with this stuff, then they definitely need to change the focus a bit in terms of what constitutes a good product.

Thanks for reading

Paul Culmsee

www.sevensigma.com.au
