Back to Cleverworkarounds mainpage
Visit - A Seven Sigma Initiative
 

Sep 25 2008

Complexity bites: When SharePoint = Risk

I think as you age, you become more and more like your parents. Not so long ago I went out paintballing with some friends and we all agreed that the 16-18 year olds who also happened to be there were all obnoxious little twerps who needed a good kick in the rear. At the same time, we also agreed that we were just as obnoxious back when we were that age. Your perspective changes as you learn and your experience grows, but you don’t forget where you came from.

I now find myself saying stuff to my kids that my parents said to me, I think today’s music is crap, I have taken a liking to drinking quality scotch. Essentially all I need now to complete the metamorphosis to being my father is for all my hair to fall out!

So when I write an article whining about an assertion that IT has a credibility issue and has gone backwards in its ability to cope with various challenges, I fear that I have now officially become my parents. I’ll sound like grandpa who always tells you that life was so much simpler back in the 1940’s.

Consequences of complexity…

Before I go and dump on IT as a discipline, how about we dump on finance as a discipline, just so you can be assured that my cynicism extends far beyond nerds.

I previously wrote about how Sarbanes Oxley legislation was designed to, yet ultimately failed to, provide assurance to investors and regulators that public companies had adequate controls over their financial risk. As I write this, we are in the midst of a once in a generation-or-two credit crisis where some seven hundred billion dollars ($700,000,000,000) of US taxpayers’ money will be used to take ownership of crap assets (foreclosed or unpaid mortgages).

Part of the problem with the credit crisis was through the use of "collateralized debt obligations". This is a fancy, yet complex, way of taking a bunch of mortgages, and turning them into an "asset" that someone else who has some spare cash invests in. If you are wondering why the hell someone would invest in such a thing, then consider people with home loans, supposedly happily paying interest on those mortgages. It is that interest that finds its way to the holder (investor) of the CDO. So a CDO is supposedly an income stream.

Now if that explanation makes your eyes glaze over then I have bad news for you: that’s supposed to be the easy part. The reality is that the CDO’s are actually extremely complex things. They can be backed by residential property, commercial property, something called mortgage backed securities, corporate loans - essentially anything that someone is paying interest on can find its way into a CDO that someone else buys into, to get the income stream from the interest paid.

To provide "assurance" that these CDO’s are "safe", ratings agencies give them a mark that investors rely upon when making their investment. So a "AAA" CDO is supposed to have been given the tick of approval by experts in debt instrument style finance.

Here’s the rub about rating agencies. Below is a news article from earlier in the year with some great quotes

http://www.nytimes.com/2008/03/23/business/23how.html?pagewanted=print

Credit rating agencies, paid by banks to grade some of the new products, slapped high ratings on many of them, despite having only a loose familiarity with the quality of the assets behind these instruments.

Even the people running Wall Street firms didn’t really understand what they were buying and selling, says Byron Wien, a 40-year veteran of the stock market who is now the chief investment strategist of Pequot Capital, a hedge fund. “These are ordinary folks who know a spreadsheet, but they are not steeped in the sophistication of these kind of models,” Mr. Wien says. “You put a lot of equations in front of them with little Greek letters on their sides, and they won’t know what they’re looking at.”

Mr. Blinder, the former Fed vice chairman, holds a doctorate in economics from M.I.T. but says he has only a “modest understanding” of complex derivatives. “I know the basic understanding of how they work,” he said, “but if you presented me with one and asked me to put a market value on it, I’d be guessing.”

What do we see here? How many people really *understand* what’s going on underneath the complexity?

Of course, we now know that many of the mortgages backing these CDO’s were made to people with poor credit history, or with a high risk of not being able to pay the loans back. Jack up the interest rate or the cost of living and people foreclose or do not pay the mortgage. When that happens en masse, we have a glut of houses for sale, forcing down prices, lowering the value of the assets, eliminating the "income stream" that CDO investors relied upon, making them pretty much worthless.

My point is that the complexity of the CDO’s were such that even a guy with a doctorate in economics only had a ‘modest understanding’ of them. Holy crap! If he doesn’t understand it then who the hell does?

Thus, the current financial crisis is a great case study in the relationship between complexity and risk.

Consequences of complexity (IT version)…

One thing about doing what I do, is that you spent a lot of time on-site. You get to see the IT infrastructure  and development at many levels. But more importantly, you also spend a lot of time talking to IT staff and organisation stakeholders with a very wide range of skills and experience. Finally and most important of all, you get to see first hand organisational maturity at work.

My conclusion? IT is completely f$%#ed up across all disciplines and many will have their own mini equivalent of the US $700 billion dollar haemorrhage. Not only that, it is far worse today than it previously was - and getting worse! IT staff are struggling with ever accelerating complexity and the "disconnect" between IT and the business is getting worse as well. To many businesses, the IT department has a credibility problem, but to IT the feeling is completely mutual :-)

You can find a nice thread about this topic on slashdot. My personal favourite quote from that thread is this one

Let me just say, after 26 years in this business, of hearing this every year, the systems just keep getting more complex and harder to maintain, rather than less and easier.

Windows NT was supposed to make it so anyone who could use Windows could manage a server.

How many MILLION MSCEs do we have in the world now?

Storage systems with Petabytes of data are complex things. Cloud computing is a complex thing. Supercomputing clusters are complex things. World-spanning networks are complex things.

No offense intended, but the only people who think things are getting easier are people who don’t know how they work in the first place

Also there is this…

There are more software tools, programming languages, databases, report writers, operating systems, networking protocols, etc than ever before. And all these tools have a lot more features than they used to. It’s getting increasingly harder to know "some" of them well. Gone are the days when just knowing DOS, UNIX, MVS, VMS, and OS/400 would basically give you knowledge of 90% of the hardware running. Or knowing just Assembly/C/Cobol/C++ would allow you to read and maintain most of the source code being used. So I would argue that the need for IT staff is going to continue to increase.

I think the "disconnect" between IT and Business has a lot more to do with the fact that business "knows" they depend on IT, but they are frustrated that IT can’t seem to deliver what they want when they want it. On the other side, IT has to deal with more and more tools and IT staff has to learn more and more skills. And to increase frustration in IT, business users frequently don’t deliver clear requirements or they "change" their mind in the middle of projects….

So it seems that I am not alone :-)

I mentioned previously that more often than not, SQL Server is poorly maintained - I see it all the time. Yet today I was speaking to a colleague who is a storage (SAN) and VMware virtualisation god. I asked him what the average VMware setup was like and his answer was similar to my SQL Server and SharePoint experience. In his experience, most of them were sub-optimally configured, poorly maintained, poorly documented and he could not provide any assurance as to the stability of the platform.

These sorts of quality assurance issues are rampant in application development too. I see the same thing most definitely in the security realm too.

As the above quote sates, "it’s increasingly harder to know *some* of them well". These days I am working with specialists who live and breathe their particular discipline (such as storage, virtualisation, security or comms). Those disciplines over time grow more complex and sub-disciplines appear.

Pity then, the poor developer/sysadmin/IT manager who is trying to keep a handle on all of this and try to provide a decent service to their organisation!

Okay, so what? IT has always been complex - I sound like a Gartner cliche. What’s this got to do with SharePoint?

Consequences of SharePoint complexity…

SharePoint, for a number of reasons, is one of those products that has a way of really laying bare any gaps that the organisation has in terms of their overall maturity around technology and strategy.

Why?

Because it is so freakin’ complex! That complexity transcends IT disciplines and goes right to the heart organisational/people issues as well.

It’s bad enough getting nerds to agree on something, let alone organisation-wide stakeholders!

Put simply, if you do a half-arsed job of putting SharePoint in, you will be punished in so many ways! The simple fact is that the odds are against you before you even start because it only takes a mistake in one particular part of the complex layers of hardware, systems, training, methodology, information architecture and governance, to devalue the whole project.

When I first started out, I was helping organizations get SharePoint installed. However lately I am visiting a lot of sites where SharePoint has already been installed, but it has not been a success. There are various reasons;I have cited them in detail in the project failure series, so I won’t rehash all that here. (I’d suggest reading parts three, four and five in particular).

I am firmly of the conclusion that much of SharePoint is more art than science, and what’s more, the organisation has to be ready to come with you. Due to differing learning styles and poor communication of strategy, this is actually pretty rare. Unfortunately, IT are not the people who are well suited to "getting the organisation ready for SharePoint."

If that wasn’t enough, then there is this question. If IT already struggle to manage the underlying infrastructure and systems that underpin SharePoint, then how can you have any assurance that IT will have a "governance epiphany" and start doing things the right way?

This translates to risk, people! I will be writing all about risk in a similar style to the CFO Return on Investment series very soon. I am very interested in methods to quantify the risk brought about by the complexity of SharePoint and the IT services it relies on. For me, I see a massive parallel from the complexity factor in the current financial crisis and I think that a lot can be learned from it. SOX was supposed to provide assurance and yet did nothing to prevent the current crisis. Therefore, SOX represents a great example of mis-focused governance where a lot of effort can be put in for no tangible gain.

A quick test of "assurance"…

Governance is like learning to play the guitar. It takes practice, and it does not give up its secrets easily and despite good intent, you will be crap at it for a while. It is easy to talk about, but putting it into practice is another thing.

Just remember this. The whole point of the exercise is to provide *assurance* to stakeholders. When you set any rule, policy, procedure, standard (or similar), just ask yourself: Does this provide me the assurance I need that gives me confidence to vouch for the service I am providing? Just because you may be adopting ITIL principles, does *not* mean that you are necessarily providing the right sort assurance that is required.

I’ll leave you with a somewhat biased, yet relatively easy litmus test that you can use to test your current level of assurance.

It might be simplistic, but if you are currently scared to apply a service pack to SharePoint, then you might have an assurance issue. :-)

 

Thanks for reading

 

Paul Culmsee

www.sevensigma.com.au

No Tags



Sep 16 2008

Sometimes "Microsoft bashing" is justified

Microsoft bashing is a favourite pastime of many a nerd. Whether it is justified or not in many cases is debatable since M$ will never please everyone. But the point is, it is cathartic and in actual fact, good therapy because venting your frustrations at Bill Gates is much healthier than at your colleagues or family.

To my Microsoft employee friends reading this. Don’t feel all defensive - some of the very best Microsoft bashing I have ever heard comes from you guys anyway :-)

So although sometimes the M$ bashing is completely unjustified, long shall it continue to preserve the sanity of IT professionals around the globe.

Having said that, on occasion you will hit some Microsoft induced pain that is legitimately and frustratingly dumb. By "legitimately", I mean that you cannot say "although in hindsight it was dumb, I can actually understand why they decided to do that". Instead you get caught out and experience pain and frustration simply because of a silly Microsoft oversight.

In this case, the oversight is with the SharePoint Configuration Wizard

Continue reading “Sometimes "Microsoft bashing" is justified”

No Tags



Aug 30 2008

SQL God? No… I just know how to do a maintenance plan

image 

I am working on an essay about IT complexity at the moment, and one thing that sprung into my mind while thinking about this, is the fact that many of my clients seem to think I am some sort of SQL Server guru.

There are two sad realities inferred by this.

Firstly, I am far from a SQL Server god. Yes I have experience with it, but the only reason people think I’m any good at it stems from the general lack of knowledge that they have about the product. Often all I have to do is waltz in with my Michael Buble like smooth charm and recommend a maintenance plan be set up and I am instantly the guy to talk to in all things SQL.

The truth of the matter is that I’m not fit to lick the boots of a skilled DBA. But like all former system administrators/infrastructure managers who have had to deal with the pressure and consequences of downtime, irrespective of the product, I developed a reflex to learn what I need to to cover my butt. Often I didn’t care less how a product operated from the end-user perspective. All I cared about was how it hung together so I could recover it when it inevitably failed in some way.

Continue reading “SQL God? No… I just know how to do a maintenance plan”

No Tags



Jul 19 2008

Good advice hidden in the Infrastructure Update

I guess the entire SharePoint world now is aware of the post SP1 "Infrastructure updates" put out by Microsoft recently. Probably the best thing about them are that the flaky "content deployment" feature has had some serious work done on it. (My advice has always been use with extreme caution or avoid it, but now I will have to reassess).

Anyway, that is not what I am writing about. Being a former IT Security consultant, I have always installed SharePoint in a "least privilege" configuration. I use different service accounts for search, farm, SSP, reporting services and web applications. The farm account in particular, is one that needs to be very carefully managed given its control over the heart of SharePoint - the configuration database.

Specifically, the farm account never should have any significant rights on any SharePoint farm server, over and above what it minimally needs (which is some SQL Server rights and some DCOM permission stuff). For what it’s worth, I do not use the Network Service account either, as it is actually more risky than a low privileged domain or local user account.

Developers on the other hand tend to run their boxes as full admin. I have no problem with this, so long as there is a QA or test server that is running least privilege.

However, as always with tightening the screws, there are some potential side effects in doing this. There are occasions when the farm account actually needs to perform a task that requires higher privileges, over and above what it has by default. Upgrading SharePoint from a standard to enterprise license is one such example, and it seems that performing the infrastructure update may be another.

If you take the time to read the installation instructions for the infrastructure update, there is this gem of advice:

To ensure that you have the correct permissions to install the software update and run the SharePoint Products and Technologies Configuration Wizard, we recommend that you add the account for the SharePoint Central Administration v3 application pool identity to the Administrators group on each of the local Web servers and application servers and then log on by using that account

You may wonder why this even matters? After all, it is exceedingly likely that you logged into the server as an administrator or domain administrator anyway. Surely you have all the permission that you need, right?

The answer is that not all update tasks are run by your logged-in account. Although you start the installation using your account, at a certain point during the install, many tasks are performed by the SharePoint Central Administration application. Thus, despite you having administrative permissions, SharePoint (using the farm account) will not have.

So the advice above essentially says, temporarily add the SharePoint farm account to the local administrators group and then log into the server as that user account. Now perform the action as instructed. That way the account that starts the installation is the same account that runs SharePoint and thus we know that we are using the correct account with the correct server privileges.

Once the installation has been completed, you can log out of the farm account and then revoke its administrator access, and we are back to the original locked down configuration.

So keep this in mind when performing farm tasks that are likely to require some privileges at the operating system level. It is a good habit to get into (provided you remember to lock down permissions afterwards :-) ).

Cheers

Paul

No Tags



Jul 09 2008

Office Server Search memory leak and stuck on "crawling"

Tags: SQL Server, Troubleshooting @ 11:18 pm

It is the typical scenario isn’t it. Site works fine for a week and then is officially launched on a Monday morning and the site breaks after an hour complaining that it cannot connect to the configuration database. Whoa?

The SQL Server was checked and confirmed to be running fine, and in checking the SharePoint web front end server I I noticed was MSSEARCH.EXE was chewing memory at a rapid rate of knots. Checking the event logs showed up a steady sequence of event ID 7888, 10036 and 3355 messages. Later I noticed that the search crawl was stuck on "Crawling".

If you search on this topic, many people recommend recreating your Shared Service Provider. In this case, this is unnecessary (not to mention drastic).

It turned out to be an unfortunate combination of factors.

  • The client was using SQL Server 2005 with SP1
  • The client had a SQL Server maintenance plan with a "rebuild index" task.

imageNon SQL reader? Maintenance plans are a "good thing". Think of it as a fitness regime for your SQL Server. You can regularly perform integrity checks, tasks to optimise performance, run backups and the like. A screenshot of a basic SQL2005 maintenance plan is to the right.

As it turns out, my client ran a maintenance plan each Sunday (hence why this broke on the Monday of go-live). It also turns out that the "rebuild index" maintenance plan task in SQL Server prior to SP2 has a teeny weenie problem. (By "rebuilding" an index, we mean that it is in effect, deleting and recreating it again).

When rebuilt, various options set on the indexes are not recreated the same way as they were originally created. As you may have guessed, this is extremely uncool, since SharePoint set various indexes in a certain way, it expects things to remain consistent.

There is a KB that you can read all about it here.

http://support.microsoft.com/kb/930887/en-us

Also, another great blog post about the issue is here:

http://blogs.vertigo.com/personal/michael/Blog/Lists/Posts/Post.aspx?ID=4

Some of effects of this dodgily recreated index are listed in the KB article, including:

  1. In Shared Services, the "indexing status" remains stuck in the "Crawling" state
  2. The number of handles that are opened by the MSSearch.exe process increases
  3. The number of TCP connections to the computer that is running SQL Server increases
  4. Several error are recorded in the SharePoint logfiles.
  • "SqlCrawl::ExecuteCommand fails Error 0×80040e2f"
  • CGathererQueueManager::FlushQueue failed with recoverable error 0×80040e2f
  • CGathererFilterSink::CommitLinks : pGatherAddLink->AddLinkComplete error=0×80040e2f

Additionally, in my case, MSSEARCH.EXE leaked memory very rapidly when viewed in Task Manager and I received the event logs as shown at the end of this post.

To resolve it, follow the instructions at Michael’s blog as the KB article is poorly written, but make sure that you do it for all tables as listed in the KB article! Michael’s post only covers 2 indexes and there actually are more in the search database as shown below.

Table Index
MSSAlertDocHistory IX_AlertDocHistory
MSSAnchorChangeLog IX_MSSAnchorChangeLog
MSSAnchorPendingChangeLog IX_MSSAnchorPendingChangeLog
MSSCrawlChangedSourceDocs IX_MSSCrawlChangedSourceDocs
MSSCrawlChangedTargetDocs IX_MSSCrawlChangedTargetDocs
MSSCrawledPropSamples IX_MSSCrawledPropSamplesByDocid
MSSCrawlErrorList IX_MSSCrawlErrorList_hrResult
MSSCrawlHostList IX_MSSCrawlHostList_Name
MSSCrawlQueue IX_MSSCrawlQueue
MSSDocSdids IX_MSSDocSdids

Hope this helps someone

Paul

Error log entries below for the google crawler. You can stop reading now! :-)

  • Event Type:      Error
  • Event Source:   Office SharePoint Server
  • Event Category:            Office Server General
  • Event ID:          7888
  • Date:                1/07/2008
  • Time:                12:03:03 PM
  • User:                N/A
  • Computer:         MyServer

Description:

Message: A connection was successfully established with the server, but then an error occurred during the pre-login handshake.  When connecting to SQL Server 2005, this failure may be caused by the fact that under the default settings SQL Server does not allow remote connections. (provider: Named Pipes Provider, error: 0 - No process is on the other end of the pipe.)

 

  • Event Type:      Error
  • Event Source:   Office Server Search
  • Event Category:            Gatherer
  • Event ID:          10036
  • Date:                1/07/2008
  • Time:                12:03:38 PM
  • User:                N/A
  • Computer:         MyServer

Description:

A database error occurred.

Source: Microsoft OLE DB Provider for SQL Server

Code: 17 occurred 1 time(s)

Description: [DBNETLIB][ConnectionOpen (Connect()).]SQL Server does not exist or access denied.

 

  • Event Type:      Error
  • Event Source:   Windows SharePoint Services 3
  • Event Category:            Database
  • Event ID:          3355
  • Date:                1/07/2008
  • Time:                12:04:08 PM
  • User:                N/A
  • Computer:         MyServer

Description:

Cannot connect to SQL Server. MyServer not found. Additional error information from SQL Server is included below.

[DBNETLIB][ConnectionOpen (Connect()).]SQL Server does not exist or access denied.

  • Event Type:      Error
  • Event Source:   Office SharePoint Server
  • Event Category:            Office Server Shared Services
  • Event ID:          6482
  • Date:                1/07/2008
  • Time:                12:05:12 PM
  • User:                N/A
  • Computer:         MyServer

Description:

Application Server Administration job failed for service instance Microsoft.Office.Server.Search.Administration.SearchServiceInstance (009e9001-5c8f-4705-9b4d-ee514c442fc7).

Reason: Exception from HRESULT: 0×80040D1B

Techinal Support Details:

System.Runtime.InteropServices.COMException (0×80040D1B): Exception from HRESULT: 0×80040D1B

   at Microsoft.Office.Server.Search.Administration.SearchServiceInstance.SynchronizeDefaultContentSource(IDictionary applications)

   at Microsoft.Office.Server.Search.Administration.SearchServiceInstance.Synchronize()

   at Microsoft.Office.Server.Administration.ApplicationServerJob.ProvisionLocalSharedServiceInstances(Boolean isAdministrationServiceJob)

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

07/01/2008 12:25:01.85 mssearch.exe (0×0E20)                          0×0E38 Search Server Common             GathererSql                               0          Monitorable       CSqlCrawl::ExecuteCommand fails Error 0×80040e2f - File:d:\office\source\search\search\gather\server\gathersql.cxx Line:407 

No Tags




Today is: Friday 21 November 2008 |