BPC 09 August Wrap-up

It’s been quite some time since I have blogged, and the months of July/August were really pretty full-on professionally. High work demands and preparation for the Best Practices Conference meant that blogging and any sort of public work really took a back-seat.

So now that it is over and I am sitting here in an airport lounge reflecting, I have one word to say about the BPC 09 in DC that I just attended.

Segways rock!

[Photo: the three of us on Segways, Capitol Hill in the background]

You have not lived until you have ridden a Segway – and I mean the new ones where you lean to control it. Even better than riding a Segway is to ride it through DC, a city that blew me away with its amazing campus-like vibe. Never before has a city really pushed my buttons like this place. As you can see above, Ruven Gotz, Andrew Woodward and I visited some of the amazing DC sights, and this photo in particular, with Capitol Hill behind us looking out toward the Lincoln Memorial, is something I’ll never forget. One thing is for sure: I am definitely coming back here for at least 2-3 weeks with the family just to absorb everything – sooner rather than later.

These best-practices events are really the conferences where the material goes beyond the tech stuff and also caters for an audience of consultants, business analysts, project managers and CIOs. Some sessions are completely strategic, yet some delve deep into the guts of tech. In short, there is something for everyone.

Since the San Diego event back in Feb 09, my fellow kindred spirits Ruven Gotz, Andrew Woodward, Dux Sy and I have been collaborating to bring our respective messages together and put a cohesiveness around what we have to say. This conference was the first peek at the fruit of that collaborative effort, and judging by the audience feedback, it went down really well. In my last session of the conference I had a little Jerry Springer moment: when I asked the audience to describe a particular slide that was part of all of our respective talks, they answered in perfect unison – hehe, the message was heard 🙂

I presented three sessions: one on my own, and two co-presented with Ruven Gotz and Peter Serzo respectively. Ruven and I have been unhappy with the current (mis)understanding of governance around SharePoint, and the culmination of our efforts over the last few months was a talk called “Governance – the other 90%” (although next time I will rename it to “Zen and the art of SharePoint Governance”). This was an umbrella talk to introduce our concepts and thinking, and it was followed up later in the conference by talks from Ruven, myself and Andrew Woodward. Andrew also took out the best overall speaker award for the conference. We are all super-excited by the fruits of this collaboration so far, and I *know* that bigger and better things are to come from it.

For what it’s worth, if you want to be “one with SharePoint governance”, remember the key takeaway – beer is the best governance tool you’ve got. 🙂

The session with Peter Serzo was also such fun – we hit upon the idea of doing a Zoolander-themed talk to introduce SQL Reporting Services to people who had not seen it. SSRS is one of those tools that even some well-known SharePoint practitioners have never used. With a title of “Reporting Services for the Really Really Good Looking”, we attracted a good turnout. We created and published a report in five minutes, explained the architecture by getting members of the audience to stand up, wear aprons and “model” the various components, and showed how, with no code, information workers could build drill-down reports as well as personalised reports via parameters and filter web parts. The very awesome Brett Lonsdale (the guru behind BDC Meta Man) interrupted proceedings, claiming that the BDC was a better option and there was only one way to settle it – he and I had to do a BDC vs SSRS “walk off”, which I *of course* won easily :-).

Brett was a great sport, and I think for a minute audience members actually thought he was serious 🙂

(Now here comes the gushing love people..)

Also this time around, I got to know more people, and the rare opportunity to hook up again and spend some quality time with various friends was really the thing that made the 36-hour transit from Perth worth it. From the BBQ at Dux Sy’s house, to some great dialogue with Ben Curry of Mindsharp, Judy Cowan of Microsoft, Evan Burfield and the Synteractive people, and my SamePage amigos (Dux, Ruven and Andrew Woodward). Ben Curry is an out-and-out genius – our CVs seem to read from the same script and I wish I had more opportunity to work with him. Evan Burfield is so scary smart that he automatically became my hero, with his amazing understanding of stuff that I am interested in yet only scratching the surface of the surface 🙂

Special callout to Laura Rogers (aka @wonderlaura). I had to look up what a muse was after she told me I was hers (I am a little sheltered over in Perth 🙂 ). Love her work – in fact, it is *her* work at endusersharepoint.com that inspired me to get back into more end-user content and try this whole screencasting thing. Laura, we are going to have to do something on EUSP together if you’d be up for it – it was a lot of fun hanging out and I think Peter Serzo would be up for something fun as well.

To the people who sat around the bar/restaurant/lobby table and supplied great laughs and great conversation: Zlatan Dzinic, who is utterly brilliant but genetically programmed to be unable to whisper in someone else’s session 🙂; Mike Ferarra from SharePointReviews.com; Dan Usher; Lori Gowin; Ben Curry (for messing with my head constantly); Paul Kolasky (the self-confessed poster boy for wicked problems, who Laura eventually discovered *is* harmless 🙂); and special mention to Paul Stork – who was labelled “that nerd who always has to jump ahead” during the Zoolander session.

Final thanks – Peter Serzo and his wife, Stacey. Peter is the most “Australian American” I have ever met; I think he must have been an Aussie in a former life. He is a tireless worker whose imagination and inventiveness always set off sparks of inspiration for me. I am going to try and find a way to get him over to Perth for a local training workshop or user group event – they’d love his sense of humour.

If I missed anybody, I apologise. Also, please send me any photos – many were taken, but I really was quite unsure whose camera was clicking at the time 🙂

Thanks for reading

Paul Culmsee

www.sevensigma.com.au


Troubleshooting SPSearch and good practices for moving large files

Every so often I get back into geek mode, roll up the sleeves and get my SharePoint hands dirty. Not so long ago I was assisting a small WSS3-based organisation with a major disk subsystem upgrade (iSCSI SAN) and locally attached storage upgrade, driven by content database growth and a need to convert some sites to site collections. By the end of it all we had a much better set-up with a much better-performing disk subsystem, but I was hit by two problems. One was WSS Search being broken and needing a fix, and the other was the appallingly slow speed of copying large files around.

So, let’s talk about fixing broken search and then talk about copying large files.

1. When SPSearch breaks…

The SharePoint install in question was a WSS3 site with SP1; Search Server 2008 had not been installed. The partition that contained the indexes (F:\) had to be moved to another partition (G:\), so to achieve this I used the command

stsadm -o spsearch -indexlocation G:\DATA\INDEX

Now, 99.9% of the time this will happily do its thing. But today was the 0.1% of the time when it decided to be difficult. Upon executing this command, I received an RPC error. Unfortunately for me, I was out of time, so I decided to completely re-provision search and start all over again.

It didn’t matter whether I tried this in Central Administration->Operations->Services on Server, or via the command line below. Neither method worked.

stsadm -o spsearch -action stop

On each attempt, search would get stuck on unprovisioning (stopping), with a sequence of events in the event log (event IDs 2457, 2429 and 2428).

===========================================================================================
Event Source: Windows SharePoint Services 3 Search 
Event Category: Gatherer 
Event ID: 2457 
Description: 

The plug-in manager <SPSearch.Indexer.1> cannot be initialized. 
Context: Application 'Search index file on the search server' 

Details: 
The system cannot find the path specified. (0x80070003) 

===========================================================================================  
Event Source: Windows SharePoint Services 3 Search 
Event Category: Gatherer 
Event ID: 2429 
Description: 

The plug-in in <SPSearch.Indexer.1> cannot be initialized. 
Context: Application '93a1818d-a5ec-40e1-82d2-ffd8081e3b6e', Catalog 'Search' 

Details: 
The specified object cannot be found. Specify the name of an existing object. (0x80040d06) 

===========================================================================================
Event Source: Windows SharePoint Services 3 Search 
Event Category: Gatherer 
Event ID: 2428 
Description: 

The gatherer object cannot be initialized. 
Context: Application 'Search index file on the search server', Catalog 'Search' 

Details: 
The specified object cannot be found. Specify the name of an existing object. (0x80040d06) 

 

So, as you can see, I was stuck. I couldn’t clear the existing configuration, and the search service would never actually stop. In the end, I started to wonder whether my failed attempt to change the index partition had perhaps not reapplied permissions to the new location. To be sure, I reapplied permissions using the following PSCONFIG command:

psconfig -cmd secureresources

This seemed to do the trick. Re-executing the stsadm spsearch stop command finally completed without an error, and the service was listed as stopped.

[Screenshot: the search service listed as stopped]

Once stopped, we repartitioned the disks accordingly and now all I had to do was start the damn thing 🙂

Through the Central Administration GUI I clicked Start and re-entered all of the configuration settings, including service accounts and the new index location (G:\DATA\INDEX). After a short time, I received the ever helpful “Unknown Error” error message.

[Screenshot: the “Unknown Error” message]

Rather than change debug settings in web.config, I simply checked the SharePoint logs and the event viewer. Now, I had a new event in the logs.

Event Type: Warning 
Event Source: Windows SharePoint Services 3 Search 
Event Category: Gatherer 
Event ID: 10035 
Description: 
Could not import the registry hive into the registry because it does not exist in the configuration database. 
Context: Application '93a1818d-a5ec-40e1-82d2-ffd8081e3b6e' 

Hmm… It suggests a registry issue, so I checked the registry.

[Screenshot: the registry, showing DataDirectory set to F:\DATA\INDEX]

 

Although the error message really made no sense to me, checking the registry turned out to be the key to solving this mystery. If you look carefully in the above screenshot, note that the registry key DataDirectory was set to “F:\DATA\INDEX”.

I was surprised at this, because I had re-provisioned the SPSearch to use the new location (G:\DATA\INDEX). I would have thought that changing the default index location would alter the value of this key. A delve into the ULS logs showed events like this.

STSADM.EXE (0x0B38) 0x0830 Search Server Common MS Search Administration 95k1 High WSS Search index move: Changing index location from ‘F:\data\index’ to ‘G:\data\index’.

STSADM.EXE (0x0B38) 0x0830 Search Server Common MS Search Administration 95k2 High WSS Search index move: Index location changed to ‘G:\data\index’.

STSADM.EXE (0x0B38) 0x0830 Search Server Common MS Search Administration 0 High CopyIndexFiles: Source directory ‘F:\data\index\93a1818d-a5ec-40e1-82d2-ffd8081e3b6e’ not found for application ’93a1818d-a5ec-40e1-82d2-ffd8081e3b6e’.

STSADM.EXE (0x0F10) 0x1558 Windows SharePoint Services Topology 8xqz Medium Updating SPPersistedObject SPSearchServiceInstance Parent=SPServer Name=DAPERWS03. Version: 218342 Ensure: 0, HashCode: 54267293, Id: 305c06d7-ec6d-465a-98be-1eafe64d8752, Stack: at Microsoft.SharePoint.Administration.SPPersistedObject.Update() at Microsoft.SharePoint.Administration.SPServiceInstance.Update() at Microsoft.SharePoint.Search.Administration.SPSearchServiceInstance.Update() at Microsoft.Search.Administration.CommandLine.ActionParameter.Run(StringBuilder& output) at Microsoft.SharePoint.Search.Administration.CommandLine.SPSearch.Execute() at Microsoft.Search.Administration.CommandLine.CommandBase.Run(String command, StringDictionary keyValues, String& output) at Microsoft.SharePoint.StsAdmin.SPStsAdmin.RunOperation(SPGlobalAdmin globalAdmin, String st…

mssearch.exe (0x1654) 0x1694 Search Server Common IDXPIPlugin 0 Monitorable CTripoliPiMgr::InitializeNew – _CopyNoiseFiles returned 0x80070003 – File:d:\office\source\search\ytrip\search\tripoliplugin\tripolipimgr.cxx Line:519

mssearch.exe (0x1654) 0x1694 Search Server Common Exceptions 0 Monitorable <Exception><HR>0x80070003</HR><eip>0000000001D4127F</eip><module>d:\office\source\search\ytrip\search\tripoliplugin\tripolipimgr.cxx</module><line>520</line></Exception>

Note the second-last ULS entry above: a function called CopyNoiseFiles returned 0x80070003. This code happens to be “The system cannot find the path specified”, so it appears something was missing.

It then dawned on me. Perhaps the SharePoint installer puts some files into the initially configured index location, and despite the index moving to another location, SharePoint still looks to this original location for some necessary files. To test this, I loaded up a blank Windows 2003 VM and installed SharePoint with SP1 *without* running the configuration wizard. When I looked in the location of the index files, sure enough – there were some files, as shown below.

[Screenshot: files present in the default index location on a fresh install]

It turned out that during our disk reconfiguration, the path of F:\DATA\INDEX no longer existed. So I recreated the path specified in the registry (F:\DATA\INDEX) and copied the contents of the CONFIG folder from my fresh VM install. I then started the search service from Central Administration and… bingo! Search finally started successfully…Wohoo!

Now that I had search successfully provisioned, I re-ran the command to change the index location to G:\DATA\INDEX and then started a full crawl.

C:\>stsadm -o spsearch -indexlocation G:\DATA\INDEX

Operation completed successfully.

C:\>stsadm -o spsearch -action fullcrawlstart

Operation completed successfully.

I then checked the event logs and now it seems we are cooking with gas!

Event Type: Information 
Event Source: Windows SharePoint Services 3 Search 
Event Category: Gatherer 
Event ID: 10045 
Description: 

Successfully imported the application configuration snapshot into the registry. 
Context: Application '93a1818d-a5ec-40e1-82d2-ffd8081e3b6e' 

Event Type: Information 
Event Source: Windows SharePoint Services 3 Search 
Event Category: Gatherer 
Event ID: 10044 
Description: 

Successfully stored the application configuration registry snapshot in the database. 
Context: Application 'Serve search queries over help content' 

As a final check, I re-examined the registry and noted that the DataDirectory key had not changed to reflect G:\. Bear that in mind when moving your index to another location. The original path may still be referred to in the configuration.
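For anyone hitting the same wall, here is the full recovery sequence in batch-file style. Treat it as a summary of the steps above rather than a script to run as-is: the F:\ and G:\ paths are specific to this environment, and the clean-install path is a placeholder for wherever you sourced the CONFIG files.

```
REM 1. Re-apply file system permissions so the stuck search service can stop
psconfig -cmd secureresources
stsadm -o spsearch -action stop

REM 2. Recreate the ORIGINAL index path from the registry and copy into it
REM    the contents of the CONFIG folder from a clean WSS3 SP1 install
REM    (source path below is a placeholder)
mkdir F:\DATA\INDEX
xcopy /E \\cleanvm\share\CONFIG\*.* F:\DATA\INDEX\

REM 3. Re-provision search (I did this via Central Administration), then
REM    move the index to its new home and kick off a full crawl
stsadm -o spsearch -indexlocation G:\DATA\INDEX
stsadm -o spsearch -action fullcrawlstart
```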

2. There are RAID cards and there are RAID cards

Since part of the upgrade work was to improve disk performance, we had to detach databases and move them around while we upgraded the disk infrastructure and repartitioned the existing disk arrays. The server had an onboard Intel RAID controller with two arrays configured: a two-disk SCSI RAID 0 and a three-disk SATA RAID 5. The performance of the RAID 5 SATA array had always been crap – even crappier than you would expect from onboard RAID 5. When I say crap, I am talking around 35 megabytes per second transfer rate – and that was even on the two-disk SCSI RAID 0 array.

Now, 35MB/sec isn’t great, but it’s not completely intolerable. What made this much, much worse, however, was the extreme slowness in copying large files (i.e. >4GB). When trying to copy files like this to the array, throughput dropped to as low as 2MB/sec.

No matter whether it was Windows Explorer drag-and-drop or a command line utility like ROBOCOPY, the behaviour was the same. Throughput would be terrific for around 60 seconds, and then it would drop as shown below.

[Screenshot: copy throughput collapsing after the initial burst]

My client called the server vendor and was advised to purchase four SCSI disks to replace the SATAs. Apparently the poor SATA performance was because SCSI and SATA were mixed on the same RAID card and bus. That was a no-no.

It sounded plausible but, of course, after replacing the RAID 5 SATAs with the SCSI disks, there was no significant improvement in disk throughput at all. Large file copies still reflected the pattern illustrated in the screenshot above.

Monitoring disk queue length on the new RAID array showed that disk queues were way beyond normal boundaries. Now, I know that some people view disk queue length as a bit of an urban myth in terms of disk performance, but copying the same files to the iSCSI SAN yielded a throughput rate of around 95MB/sec, and the disk queue value rarely spiked above 2.

Damn! My client wasn’t impressed with his well known server vendor! Not only does the onboard RAID card have average to crappy performance to begin with, RAID 5 with large files makes it positively useless.

Fun with buffers

To me, this smelt like a buffering issue. Imagine you are pouring sand into a bucket that has a hole in it. If you pour sand in at a faster rate than the hole lets it out, eventually the bucket will overflow. I suspected this sort of thing was happening here. The periods of high throughput were when the bucket was empty and the sand filled it fast. Then the bucket filled up, and things slowed to a crawl while all of that sand passed through the metaphorical hole in the bottom. Once the bucket emptied, there was another all-too-brief burst of throughput as it filled quickly again.

I soon found a terrific article from the EPS Windows Server Performance Team that explained very clearly what was going on.

Most file copy utilities like Robocopy or Xcopy call API functions that try to improve performance by keeping data in a buffer. The idea is that files that are changed or accessed frequently can be pulled from the buffer, thereby improving performance and responsiveness. But there is a trade-off: adding this cache layer introduces the overhead of creating the buffer in the first place. If you are never going to access the file again, adding it to the buffer is actually a bad idea.

Now imagine a huge file. Not only do you have the buffer overhead, but you now are also filling the buffer (and forcing it to be flushed), over and over again.

With a large file, you are actually better off avoiding the buffer altogether and doing a raw file copy. Any large file on a slow RAID card will still take time, but it’s a heck of a lot quicker than when combined with the buffering overhead.
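To make the raw-copy idea concrete, here is a minimal Python sketch of my own (this is an illustration, not what ESEUTIL or Teracopy actually do): copy the file in large sequential chunks without building any application-level cache of it. Note that on Windows the real win comes from also bypassing the OS file cache (the FILE_FLAG_NO_BUFFERING flag), which plain high-level I/O like this does not do.

```python
def raw_copy(src, dst, chunk_size=16 * 1024 * 1024):
    """Copy src to dst in large sequential chunks.

    Each chunk is read, written and discarded, so the application holds
    at most one chunk in memory at a time. Returns the bytes copied.
    """
    copied = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:  # end of file
                break
            fout.write(chunk)
            copied += len(chunk)
    return copied
```

The chunk size is a tunable: too small and you pay per-call overhead, too large and you waste memory for no extra throughput on a slow array.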

Raw file copy methods

In the aforementioned article from the Microsoft EPS Server performance team, they suggest ESEUTIL as an alternative method. I hope they don’t mind me quoting them…

For copying files around the network that are very large, my copy utility of choice is ESEUTIL which is one of the database utilities provided with Exchange.  To get ESEUTIL working on a non-Exchange server, you just need to copy the ESEUTIL.EXE and ESE.DLL from your Exchange server to a folder on your client machine.  It’s that easy.  There are x86 & x64 versions of ESEUTIL, so make sure you use the right version for your operating system.  The syntax for ESEUTIL is very simple: eseutil /y <srcfile> /d <destfile>.  Of course, since we’re using command line syntax – we can use ESEUTIL in batch files or scripts.  ESEUTIL is dependent on the Visual C++ Runtime Library which is available as a redistributable package

I found an alternative, however, that proved its worth to me. It is called Teracopy, and we tried the free edition to see what difference it would make to copy times. As it happens, the difference was significant: the time taken to transfer large files around was reduced by a factor of five. Teracopy also produced a nice running summary of the copy throughput in MB/sec. The product definitely proved its worth and, at a whopping $15, is not going to break the bank.

So, if you are doing any sort of large file copies and your underlying disk subsystem is not overly fast, then I recommend taking a look at this product. Believe me, it will save you a heap of time.

3. Test your throughput

A final note about this saga. Anyone who deals with SQL Server will likely have read articles about best-practice disk configuration: splitting data, logs and backups across different disk arrays to maximise throughput. This client had done this, but since Teracopy gave us nice throughput stats, we took the opportunity to test the read/write performance of all the disk subsystems. It turned out that putting ALL data onto the SAN array gave significantly better performance than using any of the onboard arrays.
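We used Teracopy’s running MB/sec figure for the comparison, but if you want a quick-and-dirty equivalent, a few lines of script will do it. This is a sketch of my own (not the methodology we used), and a single sequential write is only a crude indicator: the OS write cache can flatter small test sizes, so use a file larger than RAM for serious testing.

```python
import os
import time

def write_throughput_mb_s(path, size_mb=64, chunk_mb=8):
    """Sequentially write a temporary test file and return rough MB/sec.

    The file is fsync'd so the figure reflects the disk, not just the
    OS write cache, then deleted.
    """
    chunk = b"\0" * (chunk_mb * 1024 * 1024)
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(size_mb // chunk_mb):
            f.write(chunk)
        f.flush()
        os.fsync(f.fileno())  # force the data out to the device
    elapsed = time.perf_counter() - start
    os.remove(path)
    return size_mb / elapsed
```

Run it once per array (data, logs, backups, SAN) with the same parameters and compare the numbers side by side.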

This meant that the by-the-book configuration was hamstrung by a poorly performing onboard RAID controller; even though the idea of using separate disks/spindles seemed logical, the cold hard facts of direct throughput testing proved otherwise.

After reconfiguring the environment to leverage this fact, the difference in WSS response time when performing bulk uploads was immediately noticeable.

The moral of the story?

If you are a smaller organisation and can’t afford high-end server gear from the likes of Compaq/IBM, then take the time to run throughput tests before you go to production. The results may surprise you.

Thanks for reading

Paul Culmsee

www.sevensigma.com.au
