Why me? Web part errors on new web applications

Send to Kindle

Oh man, it’s just not my week. After nailing a certificate issue yesterday that killed user profile provisioning, I get an even better one today! I’ve posted it here as a lesson on how not to troubleshoot this issue!

The symptoms:

I created a brand new web application on a SP2010 farm, and irrespective of the site collection I subsequently create, I get the dreaded error "Web Part Error: This page has encountered a critical error. Contact your system administrator if this problem persists"

Below is a screenshot of a web app using the team site template. Not so good huh?

image

The swearing…

So faced with this broken site, I do what any other self respecting SharePoint consultant would do. I silently cursed Microsoft for being at the root of all the world’s evils and took a peek into that very verbose and very cryptic place known as the ULS logs. Pretty soon I found messages like:

0x3348 SharePoint Foundation         General                       8sl3 High     DelegateControl: Exception thrown while building custom control ‘Microsoft.SharePoint.SPControlElement’: This page has encountered a critical error. Contact your system administrator if this problem persists. eff89784-003b-43fd-9dde-8377c4191592

0x3348 SharePoint Foundation         Web Parts                     7935 Information http://sp:81/default.aspx – An unexpected error has been encountered in this Web Part.  Error: This page has encountered a critical error. Contact your system administrator if this problem persists.,

Okay, so that is about as helpful as a fart in an elevator, so I turned up the debug juice using that new, pretty debug juicer turner-upper (okay, the diagnostic logging section under monitoring in central admin). I turned on a variety of logs at different times including.

  • SharePoint Foundation           Configuration                   Verbose
  • SharePoint Foundation           General                         Verbose
  • SharePoint Foundation           Web Parts                       Verbose
  • SharePoint Foundation           Feature Infrastructure          Verbose
  • SharePoint Foundation           Fields                          Verbose
  • SharePoint Foundation           Web Controls                    Verbose
  • SharePoint Server               General                         Verbose
  • SharePoint Server               Setup and Upgrade               Verbose
  • SharePoint Server               Topology                        Verbose

While my logs got very big very quickly, I didn’t get much more detail apart from one gem,to me, seemed so innocuous amongst all the detail, yet so kind of.. fundamental 🙂

0x3348 SharePoint Foundation         Web Parts                     emt7 High     Error: Failure in loading assembly: Microsoft.SharePoint, Version=12.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a eff89784-003b-43fd-9dde-8377c4191592

That rather scary log message was then followed up by this one – which proved to be the clue I needed.

0x3348 SharePoint Foundation         Runtime                       6610 Critical Safe mode did not start successfully. This page has encountered a critical error. Contact your system administrator if this problem persists. eff89784-003b-43fd-9dde-8377c4191592

It was about this time that I also checked the event logs (I told you this post was about how not to troubleshoot) and I saw the same entry as above.

Log Name:      Application
Source:        Microsoft-SharePoint Products-SharePoint Foundation
Event ID:      6610
Description:
Safe mode did not start successfully. This page has encountered a critical error. Contact your system administrator if this problem persists.

I read the error message carefully. This problem was certainly persisting and I was the system administrator, so I contacted myself and resolved to search google for the “Safe mode did not start successfully” error.

The 46 minute mark epiphany

image

If you watch the TV series “House”, you will know that House always gets an epiphany around the 46 minute mark of the show, just in time to work out what the mystery illness is and save the day. Well, this is the 46 minute mark of this post!

I quickly found that others had this issue in the past, and it was the process where SharePoint checks web.config to process all of the controls marked as safe. If you have never seen this, it is the section of your SharePoint web application configuration file that looks like this:

image 

This particular version of the error is commonly seen when people deploy multiple servers in their SharePoint farm, and use a different file path for the INETPUB folder. In my case, this was a single server. So, although I knew I was on the right track, I knew this wasn’t the issue.

My next thought was to run the site in full trust mode, to see if that would make the site work. This is usually a setting that makes me mad when developers ask for it because it tells me they have been slack. I changed the entry

<trust level="WSS_Minimal" originUrl="" />

to

<trust level="Full" originUrl="" />

But to no avail. Whatever was causing this was not affected by code access security.

I reverted back to WSS_Minimal and decided to remove all of the SafeControl entries from the web.config file, as shown below. I knew the site would bleat about it, but was interested if the “Safe Mode” error would go away.

image

The result? My broken site was now less broken. It was still bitching, but now it appeared to be bitching more like what I was expecting.

image

After that, it was a matter of adding back the <safecontrol> elements and retrying the site. It didn’t take long to pinpoint the offending entry.

<SafeControl Assembly="Microsoft.SharePoint, Version=12.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a" Namespace="Microsoft.SharePoint.WebPartPages" TypeName="ContentEditorWebPart" Safe="False" />

As soon as I removed this entry the site came up fine. I even loaded up the content editor web part without this entry and it worked a treat. Therefore, how this spurious entry got there is still a mystery.

The final mystery

My colleague and I checked the web.config file in C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\14\CONFIG. This is the one that gets munged with other webconfig.* files when a new web application is provisioned.

Sure enough, its modified date was July 29 (just outside the range of the SharePoint and event logs unfortunately). When we compared against a known good file from another SharePoint site, we immediately saw the offending entry.

image

The solution store on this SharePoint server is empty and no 3rd party stuff to my knowledge has been installed here. But clearly this file has been modified. So, we did what any self respecting SharePoint consultant would do…

…we blamed the last guy.

 

Thanks for reading

Paul Culmsee

www.sevensigma.com.au

 Digg  Facebook  StumbleUpon  Technorati  Deli.cio.us  Slashdot  Twitter  Sphinn  Mixx  Google  DZone 

No Tags

Send to Kindle

More User Profile Sync issues in SP2010: Certificate Provisioning Fun

Send to Kindle

Wow, isn’t the SharePoint 2010 User Profile Service just a barrel of laughs. Without a bit of context, when you compare it to SP2007, you can do little but shake your head in bewilderment at how complex it now appears.

I have a theory about all of this. I think that this saga started over a beer in 2008 or so.

I think that Microsoft decided that SharePoint 2010 should be able to write back to Active Directory (something that AD purists dislike but sold Bamboo many copies of their sync tool). Presumably the SharePoint team get on really well with the Forefront Identify Manager team and over a few Friday beers, the FIM guys said “Why write your own? Use our fit for purpose tool that does exactly this. As an added bonus, you can sync to other directories easily too”.

“Damn, that *is* a good idea”, says the SharePoint team and the rest is history. Remember the old saying, the road to hell is paved with good intentions?

Anyways, when you provision the UPS enough times, and understand what Forefront Identity Manager does, it all starts to make sense. Of course, to have it make sense, requires you to mess it up in the first place and I think that everyone universally will do this – because it is essentially impossible to get it right the first time unless you run everything as domain administrator. This is a key factor that I feel did not get enough attention within the product team. I have now visited three sites where I have had to correct issues with the user profile service. Remember, not all of us do SharePoint all day – for the humble system administrator that is also catering with the overall network, this implementation is simply too complex. Result? Microsoft support engineers are going to get a lot of calls here – and its going to cost Microsoft that way.

One use-case they never tested

I am only going to talk about one of the issues today because Spence has written the definitive article that will get you through if you are doing it from scratch.

I went to a client site where they had attempted to provision the user profile synchronisation unsuccessfully. I have no idea of the things they tried because I wasn’t there unfortunately, but I made a few changes to permissions, AD rights and local security policy as per Spencers post. I then provisioned user profile sync again and I hit this issue. A sequence of 4 event log entries.

Event ID:      234
Description:
ILM Certificate could not be created: Cert step 2 could not be created: C:\Program Files\Microsoft Office Servers\14.0\Tools\MakeCert.exe -pe -sr LocalMachine -ss My -a sha1 -n CN=”ForefrontIdentityManager” -sky exchange -pe -in “ForefrontIdentityManager” -ir localmachine -is root

Event ID:      234
Description:
ILM Certificate could not be created: Cert could not be added: C:\Program Files\Microsoft Office Servers\14.0\Tools\CertMgr.exe -add -r LocalMachine -s My -c -n “ForefrontIdentityManager” -r LocalMachine -s TrustedPeople

Event ID:      234
Description:
ILM Certificate could not be created: netsh http error:netsh http add urlacl url=
http://+:5725/ user=Domain\spfarm sddl=D:(A;;GA;;;S-1-5-21-2972807998-902629894-2323022004-1104)

Event ID:      234
Description:
Cannot get the self issued certificate thumbprint:

The theory

Luckily this one of those rare times where the error message actually makes sense (well – if you have worked with PKI stuff before). Clearly something went wrong in the creation of certificates. Looking at the sequence of events, it seems that as part of provisioning ForeFront Identity Manager, a self signed certificate was created for the Computer Account, added to the Trusted People certificate store and then is used for SSL on a web application or web service listening on port 5725.

By the way, don’t go looking for the web app listening on such a port in IIS because its not there. Just like SQL Reporting Services, FIM likely uses very little of IIS and doesn’t need the overhead.  

The way I ended up troubleshooting this issue was to take a good look at the first error in the sequence and what the command was trying to do. Note the description in the event log is important here. “ILM Certificate could not be created: Cert step 2 could not be created”. So this implies that this command is the second step in a sequence and there was a step 1 that must have worked. Below is the step 2 command that was attempted.

C:\Program Files\Microsoft Office Servers\14.0\Tools\MakeCert.exe -pe -sr LocalMachine -ss My -a sha1 -n CN=”ForefrontIdentityManager” -sky exchange -pe -in “ForefrontIdentityManager” -ir localmachine -is root

When you create a certificate, it has to have a trusted issuer. Verisign and Thawte are examples and all browsers consider them trustworthy issuers. But we are not using 3rd party issuers here. Forefront uses a self-signed certificate. In other words, it trusts itself. We can infer that step 1 is the creation of this self-trusted certificate issuer by looking at the parameters of the MakeCert command that step 2 is using.

Now I am not going to annotate every Makecert parameter here, but the English version of the command above says something like:

Make me a shiny new certificate for the local machine account and call it “ForefrontIdentityManager”, issued by a root certificate that can be found in the trusted root store also called ForeFrontIdentityManager.

So this command implies that step 1 was the creation of that root certificate that issues the other certificates. (Product team – you could have given the name of the root issuer certificate something different to the issued certificate)

The root cause

Now that we have established a theory of what is going on, the next step is to run the failing Makecrt command from a prompt and see what we get back. Make sure you do this as the Sharepoint farm account so you are comparing apples with apples.

C:\Program Files\Microsoft Office Servers\14.0\Tools>MakeCert.exe -pe -sr LocalMachine -ss My -a sha1 -n CN=”ForefrontIdentityManager” -sky exchange -pe -in “ForefrontIdentityManager” -ir localmachine -is root

Error: There are more than one matching certificate in the issuer’s root cert store. Failed

Aha! so what do we have here? The error message states that we have more than 1 matching certificate in the issuers root certificate store.

For what its worth it is the parameters “-ir localmachine -is root” that specifies the certificate store to use. In this case, it is the trusted root certificate store on the local computer.

So lets go and take a look. Run the Microsoft Management Console (MMC) and Choose “Add/Remove Snap In” from the File Menu.

image

From the list of snap ins choose Certificates and then choose “Computer Account”

image

Now in the list of certificate stores, we need to examine the one that the command refers to: The Trusted Root Certification Authorities store. Well, look at that, the error was telling the truth!

image

Clearly the Forefront Identity Manager provisioning/unprovisioning code does not check for all circumstances. I can only theorise what my client did to cause this situation because I wasn’t privy to what was done on this particular install before I got there. but step 1 of this provisioning process would create an issuing certificate whether one existed already or not. Step 2 then failed because it had no way to determine which of these certificates is the authoritative one.

This was further exacerbated because each re-attempt creates another root certificate because there is no check whether a certificate already exists.

The cure is quite easy. Just delete all of the ForefrontIdentityManager certificates from the Trusted Root Certification Authorities and re-provision the user profile sync in SharePoint. Provided that there is no certificate in this store to begin with, step 1 will create it and step 2 will then be able to create the self signed certificate using this issuer just fine.

Conclusion (and minor rant)

Many SharePoint pros have commented on the insane complexity of the new design of the user profile sync system. Yes I understand the increased flexibility offered by the new regime, leveraging a mature product like Forefront, but I see that with all of this flexibility comes risk that has not been well accounted for. SP2010 is twice as tough to learn as SP2007 and it is more likely that you will make a mistake than not making one. The more components added, the more points of failure and the less capable over-burdened support staff are in dealing with it when it happens.

SharePoint 2010 is barely out of nappies and I have already been in a remediation role for several sites over the user profile service alone.

I propose that Microsoft add a new program level KPI to rate how well they are doing with their SharePoint product development. That KPI should be something like % of time a system administrator can provision a feature without making a mistake or resorting to running it all as admin. The benefit to Microsoft would be tangible in terms of support calls and failed implementations. Such a KPI would force the product team to look at an example like the user profile service and think “How can we make this more resilient?”. “How can we remove the number of manual steps that are not obvious?”, “how can we make the wizard clearer to understand?” (yes they *will* use the wizard).

Right now it feels like the KPI was how many features could be crammed, in as well as how much integration between components there is. If there is indeed a KPI for that they definitely nailed it this time around.

Don’t get me wrong – its all good stuff, but if Microsoft are stumping seasoned SharePoint pros with this stuff, then they definitely need to change the focus a bit in terms of what constitutes a good product.

Thanks for reading

Paul Culmsee

www.sevensigma.com.au

 Digg  Facebook  StumbleUpon  Technorati  Deli.cio.us  Slashdot  Twitter  Sphinn  Mixx  Google  DZone 

No Tags

Send to Kindle

Index index everywhere but not a result in sight.

Send to Kindle

I have been doing a bit more tech work than normal lately – SP2010 popularity I guess, and was asked to remediate a few issues on a problematic server that I hadn’t set up. The server in question had a number of issues (over and above the usual “lets all run it as one account” type stuff) that had a single root cause, so I thought I’d quickly document the symptoms and the cause here.

Symptom 1: SQL Server Event ID 28005:

“An exception occurred while enqueueing a message in the target queue. Error: 15404, State: 19. Could not obtain information about Windows NT group/user ‘DOMAIN\someuser’, error code 0x5”

This error would be reported in the Application log around 70 times per minute. As it happened, I had removed the account in question from running any of the SharePoint web, application or windows services, but this still persisted. I suspect SharePoint was installed as this account because it was the db owner of many of the databases on the SQL Server. Whatever the case, SQL was whinging about it despite its lack of actual need to be there.

Symptom 2: Event ID 4625:

At a similar rate of knots as the SQL error, was the rate of 4625 errors in the security log. These logs were not complaining about the account that the SQL event complained about, but instead it complained about ANY account running the SQL Server instance. I tried network service, a domain account and a local account and saw similar errors (although the local one had a different code).

Log Name:      Security
Source:        Microsoft-Windows-Security-Auditing
Event ID:      4625
Task Category: Logon
Keywords:      Audit Failure
Description:  An account failed to log on. 

Subject:
    Security ID:        NETWORK SERVICE
    Account Name:        COMPUTER$
    Account Domain:        DOMAIN
    Logon ID:        0x3e4 

Failure Information:
    Failure Reason:        Unknown user name or bad password.
    Status:            0xc000006d
    Sub Status:        0xc0000064 

Process Information:
    Caller Process ID:    0x7e0
    Caller Process Name:    E:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\Binn\sqlservr.exe 

Detailed Authentication Information:
    Logon Process:        Authz   
    Authentication Package:    Kerberos

When using a local, rather than a domain user account the code was:

Failure Information:
    Failure Reason:        An Error occured during Logon.
    Status:            0xc000005e
    Sub Status:        0x0 

Symptom 3: Search only working for domain administrators

On top of the logs being filled by endless entries of the previous two, I had another error (the original reason why I was called in actually). A SharePoint search would yield zip, nada, zero, no results for a regular user, but a domain administrator could happily search SP2010 and get results. (well actually regular users did get some results – people searched actually worked fine).

The crawler was fine and dandy and the default content source had not been messed with. There were no errors or logs to suggest anything untoward.

The resolution:

It was the second symptom that threw me because I thought that the problem must have been kerberos config. But I quickly discounted that after checking SPN’s and the like (notwithstanding the fact this was a single server install anyway!)

On a hunch (helped by the fact that I had dealt with the issue of registering managed accounts not so long ago), I concentrated on the user account that was causing SQL Server all the trouble (Event ID 28005). I loaded up Active Directory and temporarily changed the security of this user account so that “Authenticated Users” had “READ” access to it.

image

As soon as I did this, both event ID 28005 and 4625 stopped.

I then checked the search (symptom 3) and it was still barfing. In this case I decided to turn up the debug juice on the “Query” and “Query Processor” functions of “SharePoint Server Search”.

image

After upping the level of verbosity, I found what I was looking for.

08/12/2010 22:13:41.00     w3wp.exe (0x2228)                           0x1F0C    SharePoint Server Search          Query Processor                   g2j3    High        AuthzInitializeContextFromSid failed with ERROR_ACCESS_DENIED. This error indicates that the account under which this process is executing may not have read access to the tokenGroupsGlobalAndUniversal attribute on the querying user’s Active Directory object. Query results which require non-Claims Windows authorization will not be returned to this querying user.    debd2c54-d6a5-41b8-bf26-c4697b36f4d4

I knew immediately that this was very likely the same issue as the first two symptoms and when I googled this result, my Perth compatriot, Jeremy Thake had hit the same issue in July. The fix is to add your search service account to a group called “Pre-Windows 2000 Compatibility Access” group. This group happens to have the right required to read this attribute. Whether it is the same attribute I needed for my SQL issue, or the registering managed accounts issue I don’t know, but what I do know is that this group loosens up permissions enough to cure all four of these issues.

The little security guy in me keeps telling me I should confirm the least privilege in each of these scenarios, but hey if Microsoft are saying to put the accounts into this group, then who am I to argue?

Finally, it turns out that SP2010 re-introduced something that was in SP2003. A call to a function called AuthzInitializeContextFromSid which seems to be the root of it all. Apparently it was not used in SP2007, but its sure there now. I assume that one of the many stored procedures that SharePoint would call in SQL may have been the cause of Symptom 1. When you look at Symptom 2, it now makes sense because AD was faithfully reporting an access denied when the call to AuthzInitializeContextFromSid was made. It reported SQL Server as the culprit, I assume, because of a stored proc doing the work perhaps? It just sucks that the security event logged is a fairly stock message that doesn’t give you enough specifics to really work out what is going on.

Anyway, I hope that helps someone else – as googling the event ID details is not overly helpful

 

Thanks for reading

Paul Culmsee

www.sevensigma.com.au

 Digg  Facebook  StumbleUpon  Technorati  Deli.cio.us  Slashdot  Twitter  Sphinn  Mixx  Google  DZone 

No Tags

Send to Kindle

Also why I’ve been quiet…

Send to Kindle

I’m in an airport (again), typing this on my way back from my latest trip to New Zealand – a country I am loving more and more each time I go there. (Anywhere that I can go that uses the same power plugs as back home is a great place in my book).

image

A while back I posted about the book I am writing with Kailash Awati (Beyond Best Practices). If that project wasn’t taking enough time, dedication and brain cells, I have just finished an undertaking that has essentially consumed me for four months (some 450 man hours). This week it was delivered and the student responses far surpassed my expectations and made it all worthwhile.

I created a 4 day SharePoint 2010 Governance and Information Architecture training course as part of Microsoft New Zealand’s Elite initiative. (760 pages of SharePoint governance and IA goodness!) If you are not aware of the Elite initiative, it is a novel initiative by Microsoft in New Zealand to improve the quality of SharePoint practitioners in the Microsoft partner ecosystem. Now I tell you – Darryl Burling and his team down there at Microsoft have their ear to the ground – and really do listen to their customers. They initiated this program to allow local solution providers to take the next step beyond technical knowhow and turn it into deeper proficiency.

The SharePoint Elite Partner Initiative is designed to recognise those New Zealand Partners who have built skills excellence and a track record for success with SharePoint into their business. When it comes to SharePoint, these are the elite – the best of the best. If you are looking for a partner who can help you plan and deploy your SharePoint implementation, these are the best in the business.

This Elite program is unique in its focus and via the insight of those who conceived it, allowed me the flexibility to create a course that was a balance of technical labs, sensemaking, governance, critical thinking and user engagement. I was going through the course feedback just now and the key trend from it all was that the students really enjoyed the softer stuff that I teach, more so than the “here is a SharePoint feature and look at what it can do!” type material (they can get that sort of material anywhere).

So all in all it was a great week, which made all the effort, sweat and tears leading up to it worth it.

So thanks attendees, it was a great 4 days. For other readers, hopefully the course might come to a city near you in the not too distant future.

 

Thanks for reading

Paul Culmsee

www.sevensigma.com.au

 Digg  Facebook  StumbleUpon  Technorati  Deli.cio.us  Slashdot  Twitter  Sphinn  Mixx  Google  DZone 

No Tags

Send to Kindle

SQL Server oddities

Send to Kindle

So its Saturday night and where I should be out having fun, I am instead sitting in a training room in Wellington New Zealand, configuring a lab for a course I am running on Monday.

Each student lab setup is two virtual machines. The first being a fairly stock standard AD domain controller and the second being a SQL/SharePoint 2010 box. Someone else set up the machines, and I came in to make some changes for the labs next week. But as soon as I fired up the first student VM’s I hit a snag. I loaded SQL Server Management Studio, only to find that I was unable to connect to it as the domain administrator, despite being fairly certain that the account had been granted the sysadmin role in SQL.

clip_image002

Checking the event logs, showed an error that I had not seen before. “Token-based server access validation failed with an infrastructure error”. hmmm

Log Name: Application

Source: MSSQLSERVER

Date: 31/07/2010 6:45:37 p.m.

Event ID: 18456

Task Category: Logon

Level: Information

Keywords: Classic,Audit Failure

User: TRAINSBYDAVE\administrator

Computer: SP01.trainsbydave.com

Description:

Login failed for user ‘TRAINSBYDAVE\administrator’. Reason: Token-based server access validation failed with an infrastructure error. Check for previous errors.

[CLIENT: <local machine>]

Event Xml:

<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">

<System>

<Provider Name="MSSQLSERVER" />

<EventID Qualifiers="49152">18456</EventID>

<Level>0</Level>

<Task>4</Task>

<Keywords>0x90000000000000</Keywords>

<TimeCreated SystemTime="2010-07-31T06:45:37.000000000Z" />

<EventRecordID>8281</EventRecordID>

<Channel>Application</Channel>

<Computer>SP01.trainsbydave.com</Computer>

<Security UserID="S-1-5-21-3713613819-1395520312-4192346095-500" />

</System>

<EventData>

<Data>TRAINSBYDAVE\administrator</Data>

<Data> Reason: Token-based server access validation failed with an infrastructure error. Check for previous errors.</Data>

<Data> [CLIENT: &lt;local machine&gt;]</Data>

As it happened, whoever set it up had SQL Server in mixed mode authentication, so I was able to sign in as the sa account and have a poke around. All things considered, it should have worked. The user account in question was definitely in the logins and set with sysadmin server rights as shown below.

clip_image004

Uncle google showed a few people with the error but not as many as I expected to see since half the world gets nailed on authentication issues. I also took the event log suggestion and looked for a previous error. A big nope on that suggestion. In fact in all respects, everything looked sweet. The machine was valid on the domain and I was able to perform any other administrative task.

Finally, I removed the TRAINSBYDAVE\administrator account from the list of Logins in SQL Server. It gave the unsurprising whinge about orphaned database users, but luckily for AD accounts when you re-add the same account back it is smart enough to re-establish the link.

clip_image006

As soon as I re-added the account, all was good again. If I was actually interested enough I’d delve into why this happened, but not tonight – I have another 6 student machines to configure 🙂

Goodnight!

clip_image008

 Digg  Facebook  StumbleUpon  Technorati  Deli.cio.us  Slashdot  Twitter  Sphinn  Mixx  Google  DZone 

No Tags

Send to Kindle