Sometimes "Microsoft bashing" is justified

Microsoft bashing is a favourite pastime of many a nerd. Whether it is justified or not in many cases is debatable since M$ will never please everyone. But the point is, it is cathartic and in actual fact, good therapy because venting your frustrations at Bill Gates is much healthier than at your colleagues or family.

To my Microsoft employee friends reading this: don’t feel all defensive. Some of the very best Microsoft bashing I have ever heard comes from you guys anyway 🙂

So although sometimes the M$ bashing is completely unjustified, long shall it continue to preserve the sanity of IT professionals around the globe.

Having said that, on occasion you will hit some Microsoft induced pain that is legitimately and frustratingly dumb. By "legitimately", I mean that you cannot say "although in hindsight it was dumb, I can actually understand why they decided to do that". Instead you get caught out and experience pain and frustration simply because of a silly Microsoft oversight.

In this case, the oversight is with the SharePoint Configuration Wizard.

Domain Controllers and SharePoint

I was unfortunate enough to inherit a MOSS 2007 install that had been performed on a Domain Controller, with the farm running under Domain Admin privileges. (Now this alone is not Microsoft bashworthy, as it is an administrator decision to make.) Although you can install SharePoint onto a domain controller, most people will tell you that it is not a best practice. I’d go as far as to say it is a bloody terrible idea unless you are a really small organisation.

One of the main reasons for this is that a Domain Controller has no local security groups. There is no real concept of "local admin" access. Instead, if you are an administrator of the domain controller, by definition you are a Domain Administrator. If you are granted privileges on the domain controller, you are granted those privileges for the entire domain.

Why is this a problem? A few reasons actually, but for the purposes of this article, we only have to talk about one.

When installed onto a member server (i.e. not a Domain Controller), SharePoint creates a bunch of local security groups. This is not unusual at all, as SQL Server does exactly the same. But on a domain controller, there are no local security groups, just domain groups. Thus, the SharePoint installer figures out that it is running on a Domain Controller and changes how it creates those groups.

Normally on a member server, a SharePoint installation would create three local groups, as shown below:

[Image: the three local groups WSS_ADMIN_WPG, WSS_RESTRICTED_WPG and WSS_WPG]

In a domain controller situation, these local groups are not created because there is no concept of local security accounts. Accounts get created in Active Directory instead.

I mentioned earlier that SharePoint was running as domain administrator as well as being installed onto a DC (talk about a double whammy!). My main motivation was to change this back to use low privileged accounts. I had created a new account to run the SharePoint farm, granted it dbcreator and securityadmin privileges on the SQL Server, and added the account to the farm administrators group in SharePoint. However, when I tried to change the application pool identity from the domain administrator to this account, I received the dreaded "Cannot connect to configuration database" error.

A quick scan of the logs showed that the farm account was still missing some required permissions on the server.

887x    Unexpected    Can not access configuration database registry key: System.Security.SecurityException: Requested registry access is not allowed.     at Microsoft.Win32.RegistryKey.OpenSubKey(String name, Boolean writable)     at Microsoft.SharePoint.Administration.SPConfigurationDatabase.get_RegistryConnectionString()  The Zone of the assembly that failed was:  MyComputer     
88bl    Monitorable    An exception occured while trying to acquire the local farm: System.Security.SecurityException: Requested registry access is not allowed.     at Microsoft.Win32.RegistryKey.OpenSubKey(String name, Boolean writable)     at Microsoft.SharePoint.Administration.SPConfigurationDatabase.get_RegistryConnectionString()     at Microsoft.SharePoint.Administration.SPConfigurationDatabase.get_Local()     at Microsoft.SharePoint.Administration.SPFarm.FindLocal(SPFarm& farm, Boolean& isJoined)  The Zone of the assembly that failed was:  MyComputer     
887x    Unexpected    Can not access configuration database registry key: System.Security.SecurityException: Requested registry access is not allowed.     at Microsoft.Win32.RegistryKey.OpenSubKey(String name, Boolean writable)     at Microsoft.SharePoint.Administration.SPConfigurationDatabase.get_RegistryConnectionString()  The Zone of the assembly that failed was:  MyComputer     
88bl    Monitorable    An exception occured while trying to acquire the local farm: System.Security.SecurityException: Requested registry access is not allowed.     at Microsoft.Win32.RegistryKey.OpenSubKey(String name, Boolean writable)     at Microsoft.SharePoint.Administration.SPConfigurationDatabase.get_RegistryConnectionString()     at Microsoft.SharePoint.Administration.SPConfigurationDatabase.get_Local()     at Microsoft.SharePoint.Administration.SPFarm.FindLocal(SPFarm& farm, Boolean& isJoined)  The Zone of the assembly that failed was:  MyComputer     
8yr8    Exception    An unhandled exception has occurred. Access to the port is denied. System.UnauthorizedAccessException: Access to the port is denied.     at System.IO.Ports.InternalResources.WinIOError(Int32 errorCode, String str)     at System.Threading.Semaphore..ctor(Int32 initialCount, Int32 maximumCount, String name)     at Microsoft.SharePoint.Publishing.BlobCache.GetCacheTokenThreadStart()     at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)     at System.Threading.ThreadHelper.ThreadStart()    

"Screw this", I said to myself at this point, "I am better off *demoting* the DC back to a member server, and rerunning the SharePoint Configuration Wizard so it can reapply all of the necessary permissions required".

Demoted DC – Not so wizardry…

At this site, there was actually no reason for the SharePoint server to be a domain controller. So I decided to demote the server back to a member server, and then rerun the configuration wizard.

The domain demotion worked fine. One reboot later, I was back to being a member server.

I reran the SharePoint Configuration Wizard, choosing *not* to disconnect from the farm and to continue using this server as the Central Administration server. Pretty quickly the whole thing died at step 3 with the following error in the logs.

09/15/2008 09:20:35  9  INF                          The securityGroup value is WSS_WPG
09/15/2008 09:20:35  9  ERR                          Task secureresources has failed with an unknown exception
09/15/2008 09:20:35  9  ERR                          Exception: System.Security.Principal.IdentityNotMappedException: Some or all identity references could not be translated.
   at System.Security.Principal.NTAccount.Translate(IdentityReferenceCollection sourceAccounts, Type targetType, Boolean forceSuccess)
   at System.Security.Principal.NTAccount.Translate(Type targetType)
   at System.Security.AccessControl.CommonObjectSecurity.ModifyAccess(AccessControlModification modification, AccessRule rule, Boolean& modified)
   at System.Security.AccessControl.CommonObjectSecurity.AddAccessRule(AccessRule rule)
   at Microsoft.SharePoint.PostSetupConfiguration.ResourceAccess.SetRegistryAccessRule()
   at Microsoft.SharePoint.PostSetupConfiguration.ResourceAccess.Secure()
   at Microsoft.SharePoint.PostSetupConfiguration.SecurityTask.SecureResources()
   at Microsoft.SharePoint.PostSetupConfiguration.SecurityTask.Run()
   at Microsoft.SharePoint.PostSetupConfiguration.TaskThread.ExecuteTask()

After some pondering, I realised the problem. The first line of the above log gives a big hint. Something went awry with the group WSS_WPG.
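The pairing of the "securityGroup" line with the failure that follows it can be picked out of a long log automatically. Here is a minimal Python sketch, assuming the log lines look like the excerpts in this post; the function name `failing_groups` is made up for illustration and is not part of any SharePoint tooling.

```python
import re

# Informational line the wizard writes before it works on a group.
GROUP_LINE = re.compile(r"The securityGroup value is (\S+)")
# Line that reports a failed configuration task.
FAILED_LINE = re.compile(r"Task (\S+) has failed")


def failing_groups(log_text):
    """Pair each failed task with the most recent securityGroup line
    seen before it (None when no group was logged beforehand)."""
    last_group = None
    failures = []
    for line in log_text.splitlines():
        group = GROUP_LINE.search(line)
        if group:
            last_group = group.group(1)
            continue
        failed = FAILED_LINE.search(line)
        if failed:
            failures.append((failed.group(1), last_group))
            last_group = None  # do not blame the same group twice
    return failures


sample = """\
09/15/2008 09:20:35  9  INF                          The securityGroup value is WSS_WPG
09/15/2008 09:20:35  9  ERR                          Task secureresources has failed with an unknown exception
"""
print(failing_groups(sample))
```

A failure that comes back with `None` for the group is one where the log gave no hint about which group was involved.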

Of course! We had just demoted a domain controller back to a member server. Because SharePoint was installed when the machine was a domain controller, the local group WSS_WPG did not exist. So I created this group and reran the configuration wizard. Sure enough, we got further (but it still died, as I expected it to).

09/15/2008 09:31:18  9  INF                          The securityGroup value is WSS_ADMIN_WPG
09/15/2008 09:31:18  9  ERR                          Task secureresources has failed with an unknown exception
09/15/2008 09:31:18  9  ERR                          Exception: System.Security.Principal.IdentityNotMappedException: Some or all identity references could not be translated.

Aha, the WSS_ADMIN_WPG group also needs to be created. So I created this local group as well and ran the wizard for a third time. This time we made it all the way to step 6 before another fatal error, and the message and code were different from the first two.

09/15/2008 09:34:22  9  ERR                          Task adminvs has failed with an unknown exception
09/15/2008 09:34:22  9  ERR                          Exception: System.InvalidOperationException: 2220
   at Microsoft.SharePoint.Win32.SPNetApi32.NetLocalGroupSetMember(String groupName, String userName)
   at Microsoft.SharePoint.Administration.SPProvisioningAssistant.SetGroupMember(String username, Int32 group)
   at Microsoft.SharePoint.Administration.SPProvisioningAssistant.ProvisionProcessIdentity(String strUserName, SecureString secStrPassword, IdentityType identityType, Boolean isAdminProcess, Boolean isWindowsService, String strServiceName, Boolean dontRestartService)
   at Microsoft.SharePoint.Administration.SPProcessIdentity.ProvisionInternal(SecureString sstrPassword, Boolean isRunningInTimer)
   at Microsoft.SharePoint.Administration.SPApplicationPool.ProvisionInternal(SecureString sstrPassword)
   at Microsoft.SharePoint.Administration.SPApplicationPool.Provision()
   at Microsoft.SharePoint.Administration.SPWebServiceInstance.Provision()
   at Microsoft.SharePoint.PostSetupConfiguration.CentralAdministrationSiteTask.ProvisionAdminVs()
   at Microsoft.SharePoint.PostSetupConfiguration.CentralAdministrationSiteTask.Run()
   at Microsoft.SharePoint.PostSetupConfiguration.TaskThread.ExecuteTask()

This error proved harder to crack because, in all their wisdom, the engineers behind this part of the Configuration Wizard neglected to mention the name of the group that was causing the problem with the NetLocalGroupSetMember function. I had to look at another SharePoint install to see what was missing.
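The bare number in that exception is not entirely meaningless. In the Windows network management error codes (the NERR_* constants in the SDK header lmerr.h), 2220 is NERR_GroupNotFound. The sketch below decodes it in Python, on the assumption that SharePoint surfaces the raw return code in the exception message, which the stack trace suggests but does not prove. The lookup table is deliberately tiny and the function name `explain` is hypothetical.

```python
import re

# Two network management error codes from lmerr.h (NERR_BASE is 2100).
NERR_CODES = {
    2220: "NERR_GroupNotFound: the group name could not be found",
    2221: "NERR_UserNotFound: the user name could not be found",
}


def explain(exception_line):
    """Return a description of the numeric code in an
    InvalidOperationException log line, or None if there is no code."""
    match = re.search(r"InvalidOperationException: (\d+)", exception_line)
    if not match:
        return None
    return NERR_CODES.get(int(match.group(1)), "code not in this table")


print(explain("Exception: System.InvalidOperationException: 2220"))
```

This confirms that a group is missing, but still not which one.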

As it happened, it was the WSS_RESTRICTED_WPG group. Once I manually added this local group to the server, the Configuration Wizard successfully completed.
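Rather than discovering the missing groups one failed wizard run at a time, all three can be checked up front. The following is a rough Python sketch, assuming the three group names from this post are the complete set and that `net localgroup` lists each local group on a line starting with an asterisk. The helper names are made up, and the script only prints the `net localgroup <name> /add` commands instead of running them.

```python
import subprocess
import sys

# The three local groups named in this post.
REQUIRED_GROUPS = ("WSS_WPG", "WSS_ADMIN_WPG", "WSS_RESTRICTED_WPG")


def parse_local_groups(net_output):
    """Extract group names from `net localgroup` output, where each
    group sits on its own line with a leading asterisk."""
    return {line.strip()[1:] for line in net_output.splitlines()
            if line.strip().startswith("*")}


def missing_groups(existing, required=REQUIRED_GROUPS):
    """Return the required groups that are absent (case-insensitive)."""
    existing_upper = {name.upper() for name in existing}
    return [name for name in required if name.upper() not in existing_upper]


def creation_commands(groups):
    """Build one `net localgroup <name> /add` command per group."""
    return [["net", "localgroup", name, "/add"] for name in groups]


if __name__ == "__main__" and sys.platform == "win32":
    listing = subprocess.run(["net", "localgroup"], capture_output=True,
                             text=True, check=True).stdout
    for command in creation_commands(missing_groups(parse_local_groups(listing))):
        # Printed only; run them yourself from an elevated prompt.
        print(" ".join(command))
```

Creating the groups this way only gets the wizard past the errors above; it does not put back any permissions the installer would have granted to them.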

And it was good… Not!

Although the configuration wizard completed successfully, my troubles were not over. I created a new AD account to run the SharePoint farm. This account had no domain level privileges apart from being a regular domain user account.

On the SQL Server (thankfully a separate box) I added this account as a login, and granted it the dbcreator and securityadmin server roles (minimum privileges required for a farm account).
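For those who prefer a script to clicking through Management Studio, the same grants can be written as T-SQL. The Python sketch below only builds the statements as text. It assumes SQL Server 2005 or later (for `CREATE LOGIN ... FROM WINDOWS` and `sp_addsrvrolemember`) and uses the placeholder account name from this post; the function name `farm_login_sql` is hypothetical.

```python
# Server roles the post grants to the farm account.
FARM_ROLES = ("dbcreator", "securityadmin")


def farm_login_sql(account, roles=FARM_ROLES):
    """Build T-SQL that creates a Windows login for `account` and adds
    it to each server role. Nothing is executed here."""
    bracketed = account.replace("]", "]]")  # escape for [identifier] quoting
    quoted = account.replace("'", "''")     # escape for N'string' literals
    statements = [f"CREATE LOGIN [{bracketed}] FROM WINDOWS;"]
    for role in roles:
        statements.append(
            f"EXEC sp_addsrvrolemember @loginame = N'{quoted}', "
            f"@rolename = N'{role}';"
        )
    return statements


for statement in farm_login_sql(r"MYDOMAIN\MOSS.Farm"):
    print(statement)
```

Review the printed statements before running them against the SQL Server as a sysadmin.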

Back on the newly demoted member server, I added this account to the WSS_ADMIN_WPG and WSS_RESTRICTED_WPG groups. I then added this account to the "Farm Administrators" group inside SharePoint Central Administration (Operations -> Update Farm Administrators group).

[Image: Update Farm Administrators group]

Finally, I used the STSADM command "updatefarmcredentials" to change the farm account from the domain administrator to my newly created farm account.

C:\> stsadm -o updatefarmcredentials -userlogin MYDOMAIN\MOSS.Farm -password MyPassWord

I then browsed to the SharePoint Central Administration site.

GONG! Errors! A quick check of the event logs and what do we see? It seems the new account does not have enough rights to run SharePoint. Dammit! Despite the configuration wizard (and in particular step 3) applying various permissions to the server, it seems like we still have some work to do.

The first log was event ID 5021.

Event ID:      5021
Description:
The identity of application pool SharePoint Central Administration v3 is invalid. The user name or password that is specified for the identity may be incorrect, or the user may not have batch logon rights. If the identity is not corrected, the application pool will be disabled when the application pool receives its first request.  If batch logon rights are causing the problem, the identity in the IIS configuration store must be changed after rights have been granted before Windows Process Activation Service (WAS) can retry the logon. If the identity remains invalid after the first request for the application pool is processed, the application pool will be disabled. The data field contains the error number.

The above message talked about batch logon rights. I guess because I chose not to disconnect from the server farm when I re-ran the configuration wizard, I wasn’t asked to confirm the farm account to use. Therefore the wizard made no attempt to grant the farm account the local security policy right to log on as a batch job.

The next event log entry was 1309.

Event ID:      1309
Description:
Event code: 3005
Event message: An unhandled exception has occurred.
Event time: 15/09/2008 1:56:54 PM
Event time (UTC): 15/09/2008 5:56:54 AM
Event ID: 07afc239d8df4a13b10a19c3e026e979
Event sequence: 2
Event occurrence: 1
Event detail code: 0
Application information:
    Application domain: /LM/W3SVC/1926590645/ROOT-1-128659318128622880
    Trust level: WSS_Minimal
    Application Virtual Path: /
    Application Path: C:\inetpub\wwwroot\wss\VirtualDirectories\38138\
    Machine name: SHAREPOINT
Process information:
    Process ID: 3456
    Process name: w3wp.exe
    Account name: MYDOMAIN\MOSS.farm
Exception information:
    Exception type: COMException
    Exception message: Either a required impersonation level was not provided, or the provided impersonation level is invalid. (Exception from HRESULT: 0x80070542)

The exception message here talks about impersonation. The most logical local security policy right is "Impersonate a client after authentication". So I granted the farm account this right and the above exception went away.

In the end I compared the local security policy rights with a known good SharePoint install and noted that the farm account had the following local security policy rights.

  • Log on as a batch job
  • Log on as a service
  • Impersonate a client after authentication
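These three rights can be checked without clicking through the Local Security Policy console. `secedit /export /cfg rights.inf /areas USER_RIGHTS` writes them to a text file, where they appear under the constants SeBatchLogonRight, SeServiceLogonRight and SeImpersonatePrivilege. Below is a minimal Python sketch that reads such an export. It assumes accounts are listed by SID (which is usual), and the SID in the sample is invented for illustration. The function names are hypothetical.

```python
# Display name in Local Security Policy -> constant in a secedit export.
REQUIRED_RIGHTS = {
    "Log on as a batch job": "SeBatchLogonRight",
    "Log on as a service": "SeServiceLogonRight",
    "Impersonate a client after authentication": "SeImpersonatePrivilege",
}


def parse_privilege_rights(inf_text):
    """Return {constant: set of principals} from the [Privilege Rights]
    section of an exported security template."""
    rights = {}
    in_section = False
    for raw in inf_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            in_section = line.lower() == "[privilege rights]"
            continue
        if in_section and "=" in line:
            name, _, holders = line.partition("=")
            rights[name.strip()] = {holder.strip().lstrip("*")
                                    for holder in holders.split(",")
                                    if holder.strip()}
    return rights


def missing_rights(rights, principal):
    """List the display names of required rights `principal` lacks."""
    return [display for display, constant in REQUIRED_RIGHTS.items()
            if principal not in rights.get(constant, set())]


# Made-up sample export; the farm account SID below is not real.
sample_inf = """\
[Unicode]
Unicode=yes
[Privilege Rights]
SeBatchLogonRight = *S-1-5-32-544,*S-1-5-21-1-2-3-1105
SeImpersonatePrivilege = *S-1-5-32-544
[Version]
signature="$CHICAGO$"
"""
print(missing_rights(parse_privilege_rights(sample_inf), "S-1-5-21-1-2-3-1105"))
```

When reading a real export, open the file with `encoding="utf-16"`, since secedit normally writes Unicode templates.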

Alas I was still not done! It turned out that some file-system and registry permissions were also required. Since the WSS_ADMIN_WPG and WSS_WPG groups did not exist when SharePoint was first installed, no permission changes were made to the 12 hive, the inetpub folder, or the registry. So I had to manually put these permissions back in, utilising a known good server as a reference.

I ended up modifying permissions for WSS_ADMIN_WPG and WSS_WPG in two locations on the server.

  • Permissions on C:\inetpub\wwwroot
  • Permissions on 12 Hive (C:\program files\common files\microsoft shared\web server extensions\12)
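Comparing against a known good server is easier with a diff than by eye. The Python sketch below compares the text output of `icacls <path>` captured on each server, assuming a Windows version that ships icacls. It swaps each machine's own name for "." so that local groups line up. The entries in the sample are invented placeholders to show the mechanics and are NOT the permissions SharePoint actually needs; the function names and server names are hypothetical too.

```python
def parse_icacls(output, path, computer):
    """Return the set of ACL entries in `icacls <path>` output, with the
    local machine name replaced by '.' so two servers can be compared."""
    entries = set()
    prefix = computer.upper() + "\\"
    for line in output.splitlines():
        line = line.strip()
        if line.lower().startswith(path.lower()):
            line = line[len(path):].strip()  # first entry shares a line with the path
        if not line or line.startswith("Successfully processed"):
            continue
        if line.upper().startswith(prefix):
            line = ".\\" + line[len(prefix):]
        entries.add(line)
    return entries


def missing_entries(good, broken):
    """Entries present on the good server but absent on the broken one."""
    return sorted(good - broken)


# Placeholder output only: these entries are made up for illustration.
good_output = r"""C:\inetpub\wwwroot GOOD01\WSS_WPG:(RX)
                   BUILTIN\Administrators:(F)

Successfully processed 1 files; Failed processing 0 files
"""
broken_output = r"""C:\inetpub\wwwroot BUILTIN\Administrators:(F)

Successfully processed 1 files; Failed processing 0 files
"""
good = parse_icacls(good_output, r"C:\inetpub\wwwroot", "GOOD01")
broken = parse_icacls(broken_output, r"C:\inetpub\wwwroot", "SHAREPOINT")
print(missing_entries(good, broken))
```

The same comparison can be repeated for the 12 hive and for any subfolder whose permissions do not simply inherit from its parent.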

I also had to modify permissions on one particular registry key for WSS_ADMIN_WPG:

  • HKLM\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0\Secure\configdb

The value in particular that we needed access to was DSN. This value holds the location of the farm configuration database.
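A quick way to confirm whether an account can read that registry location is to try it while running as that account (for example from a `runas` prompt). Here is a small Python sketch using the standard library `winreg` module. It assumes the value is named `dsn`, as described in this post, and the function name `configdb_access` is made up. It reports a status only and never prints the connection string.

```python
try:
    import winreg  # standard library, available on Windows only
except ImportError:
    winreg = None

CONFIGDB_KEY = (r"SOFTWARE\Microsoft\Shared Tools\Web Server Extensions"
                r"\12.0\Secure\configdb")


def configdb_access():
    """Report whether the current user can read the dsn value."""
    if winreg is None:
        return "not windows"
    # KEY_WOW64_64KEY avoids registry redirection under 32-bit Python.
    access = winreg.KEY_READ | winreg.KEY_WOW64_64KEY
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, CONFIGDB_KEY, 0, access) as key:
            winreg.QueryValueEx(key, "dsn")
        return "readable"
    except PermissionError:
        return "access denied"
    except FileNotFoundError:
        return "key or value missing"


print(configdb_access())
```

An "access denied" result here corresponds to the "Requested registry access is not allowed" entries in the SharePoint logs shown earlier.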

Finally, after much teeth gnashing and grumbling about Bill Gates, we are working!

Let the bashing begin…

Running the SharePoint configuration wizard is possibly one of the most important activities that you will perform as an administrator of a SharePoint farm. It is not a one-off thing either. Service packs, infrastructure updates, hotfixes and the like all require the wizard to be re-run on every server in the farm.

Therefore, this thing has to be rock solid and not trip up. If you are half way through a service pack and this thing burps on something silly and trivial, then you are likely in for some serious pain.

My problems stemmed from the fact that the wizard *assumes* that the local WSS_* groups already exist and that certain permissions are already in place. In the scenario that I encountered, it is clear that this assumption does not always hold true. Therefore the wizard should detect this and rectify the situation, rather than bomb out with a runtime error and a stacktrace!

I have manually gotten things working again, but what assurance do I have that I have caught everything? The annoying thing is that this is such a trivial thing to check for and fix. If the required groups do not exist, then create them! Sheesh!

Furthermore, how about showing some meaningful information in the logs? At least the first two log entries showed me the name of the group that I was having a problem with. Not so on the third error. You would think that a developer making a call to a function called "NetLocalGroupSetMember" would realise that someone debugging it would like to know the name of the group being used. I had to make an educated guess as to the cause of that one. The first two had the friendly message "The securityGroup value is <groupname>", so I had a fighting chance. You can sooo tell when different developers have worked on different bits! Gotta love consistency!

Furthermore, the configuration wizard should check that the farm account has the appropriate local privileges on the server. In my case, despite using STSADM -o updatefarmcredentials, it seems that the local security rights needed by the new farm account weren’t set, and nor were the registry and file-level permissions. Why not? It would not be hard to make the check and rectify things if they are not set up as expected!

Conclusion

Somebody is bound to say to me, "Well, why not just reinstall SharePoint and reconnect the content databases?" On the surface of it, that seems like a fairly logical thing to do. After all, reinstalling the product to a new farm would ask you for a farm account, create the necessary local groups and set the various correct permissions.

The answer is that the SharePoint database schema changes with hotfixes and service packs. An uninstall and reinstall would only work if you knew the exact patch level of the SharePoint install that you removed and patched the reinstallation to the same level. Fail to do this and you will hit the dreaded “Your backup is from a different version of Windows SharePoint Services and cannot be restored to a server running the current version.” message. If you hit that, then you have the equally fun task of trying to determine exactly which combination of service pack and/or hotfix is required.
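If you do have the build numbers written down, comparing them is the easy part. Here is a minimal Python sketch that compares two dotted build strings. It assumes you supply both strings yourself, since the post does not cover where to read them from. The two builds in the example are the widely published ones for the original release (12.0.0.4518) and Service Pack 1 (12.0.0.6219), and the function names are hypothetical.

```python
def parse_build(text):
    """Turn a dotted build string such as '12.0.0.6219' into a tuple of
    integers so that builds compare numerically, not as text."""
    return tuple(int(part) for part in text.split("."))


def compare_builds(backup_build, farm_build):
    """Describe how the farm's build relates to the backup's build."""
    backup, farm = parse_build(backup_build), parse_build(farm_build)
    if backup == farm:
        return "same build"
    if backup < farm:
        return "farm is newer than the backup"
    return "farm is older than the backup"


print(compare_builds("12.0.0.6219", "12.0.0.4518"))
```

Comparing as integer tuples matters because plain string comparison puts "12.0.0.10000" before "12.0.0.6219".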

Think that through for a second and then consider this rhetorical question… Do you think that an administrator who installs MOSS 2007 onto a Domain controller as Domain Admin is likely to keep configuration management notes of all of the patches or service packs applied to SharePoint?

Need I answer? 🙂

Thanks for reading

Paul Culmsee

15 Comments on “Sometimes "Microsoft bashing" is justified”

  1. Ouch!

    Glad you managed to sort it in the end!

    I agree the config wizard should be rock solid. My recent SharePoint collapse came about after running the config wizard.

  2. I’ve heard support say demoting a DC where SharePoint is installed is “Not supported” and would direct you to reinstall. Not sure if that’s documented anywhere, but these troubleshooting steps are fabulous… I guess it’s a case of it’s too much work and easier to just reinstall… like saying… just reboot.

  3. Paul – great post as usual; I’ve hit similar issues as well so I feel your pain. Hilarity does not ensue!

  4. No need to think about the version of the databases. Just patch your reinstalled SharePoint environment to the latest build and attach the databases, they will get updated automatically.

    This approach might require a second thought should you have a customized environment since such environments have to be tested for compatibility with the desired updates by simulating the update process in a test environment. You can of course try the database upgrade in the test environment, this should save you a lot of time and money since time is money.

  5. Where do I change the permissions for these two groups, WSS_ADMIN_WPG and WSS_WPG? What permissions? Please elaborate.

  6. Thank you! That was so incredibly useful. Sharepoint works again!!! Yippie…

    And the bashing is justified. Sharepoint feels in general very fragile. Let’s hope the situation improves with Sharepoint 2010.

  7. Very much appreciate your impressively detailed post. It led me to a local security policy rights issue with my worker process after changing authentication providers in SP2010. Service account did not have “Impersonate a client after authentication” right. I got the same “Either a required impersonation level was not provided, or the provided impersonation level is invalid. (Exception from HRESULT: 0x80070542)” error. That fixed it. Thanks a bunch!

  8. Ok, one word: you’re a legend. It’s nice to know I’m not going insane. I’ve just suffered the same issue you had above, only I installed SharePoint with least privileges on a DC. It worked fine for a while; however, presumably when a timer job kicked in or something, kaboom, I got the error you described above. The least privilege account has always been great for me, but in this circumstance I totally agree: NEVER INSTALL IT ON A DC. Business requirements interfering with technical requirements, low budgets and tight deadlines always push you that way. *sob* *sob*

  9. It’s amazing that you dissected and maneuvered through the entire SharePoint install process. I want to tell you that I have been working 3 full days and have read every article on the internet trying to salvage a SP 2010 install on SBS 2011. I was stuck at “Cannot connect to the configuration database” after somehow the WSS_ADMIN_WPG and WSS_WPG security groups got deleted. I was left with only the “unknown account” SIDs for each group. After creating the groups and assigning permissions, all that was left for me to do was add the permission in the registry. Your install procedure will apparently also work on a domain controller such as SBS 2011. I owe you. I looked for a “donate” button on your website. Thanks.

  10. Thank you very much. Not exactly your issue, but your detailed analysis pointed me in the right direction; in my case one of the groups was missing.
