Disk and I/O Sizing for MOSS2007 – Part 2


This is part two of a two-part article that discusses techniques for sizing your disk requirements for SharePoint. It is one of a myriad of tasks that you have to perform as part of the planning process, but it is one task that you ideally want to get right the first time, as storage technology can be a potentially high capital cost. I first performed a comprehensive analysis of disk space and performance scalability requirements for a MOSS 2007 farm back in February 2007.

The first article examines some of the tools and techniques you can utilise to help you fill out Microsoft’s sizing worksheet. Part two builds upon this with a real-world example. So to recap from part 1, we have the following stats captured.

MYSERVER stats taken 10am – 30/1/2007

  • System Uptime: 551 hours
  • Total Logons since reboot: 927794
  • Logons per minute – 927794 / (551 * 60): 28
  • Total Opened Files since reboot: 114518692
  • Files opened per minute – 114518692 / (551 * 60) : 3463
  • Average Disk Queue length: 2-4
  • Reported open files in computer management: 1206
  • Number of unique users listed with opened files: 201
  • Number of open files for APPS: 664
  • Number remaining not APPS: 542
  • % of open files to DATA shares: 45%
  • Of non APPS, number of open files (as opposed to folder): 314
  • % of open files versus active listed files (314/1206): 26%

So first we will estimate I/O requirements and finish off with disk space.

Required I/O Throughput Estimation

Throughput is the number of operations that a server farm can perform per second and is measured in requests per second (RPS). Ideally the number of operations that are requested per second is lower than the number that can be performed. If the number of operations that is requested exceeds the number that can be performed, user actions and other operations take longer to complete.

From Microsoft’s capacity documentation: “RPS measurements can be converted to the total number of users by using a model of typical end-user behaviour. Like many human behaviours, there is a broad range of “typical” behaviour. The user model for Windows SharePoint Services 3.0 has the following two variables:”

  • Concurrency – The percentage of users that are using the system simultaneously.
  • Request rate — The average number of requests per hour that an active user generates.

So, referring to our figures above from examining MYSERVER:

  • Number of unique users listed with opened files: 219
  • Number of staff in Head Office: 600

Equals an estimated base user concurrency of 33%.

So now we have a user concurrency rate, let's examine the rate at which files are opened on the server and also apply a weighting.

Now, when I did this examination, the "files opened per minute" figure that I reached was 3463. This seemed a very high number, and I believed it misrepresented the true number of files opened per minute. As it happened, this server hosted an application share called APPS, where applications were run directly from the server (F:\). It was very difficult to determine how much impact the APPS share had on this figure, because the nature of the applications meant that files were accessed/opened/executed potentially much more often than static data files. Unfortunately for me, the APPS and DATA shares were on the same logical disk partition, so I had no easy way to split the I/O into quantifiable counters. So this is one estimate where I applied a best guess (see the 5%-20% below).

  • Files opened per minute – 114518692 / (551 * 60) : 3463
  • Of non F:\APPS, number of open files (as opposed to folder): 314
  • % of open files versus active listed files (314/1206): 26%
  • Non APPS share I/O weighting: (5%-20%)

So now, I used the following methodology to estimate current load.

I took the "number of files opened per minute" figure and multiplied it by the "% of open files versus active listed files" and called that "RELEVANT FILES".

  • 3463 * 26% = 900 Relevant files opened per minute

Now I took the "relevant files" figure and divided it by the number of concurrent users.

Relevant Files / Concurrent Users = Requests Per User Per Minute

  • 900 / 219 = 4.1

Next, I applied the "Non APPS share I/O weighting" figures:

  • 4.1 * 5% = 0.2 requests per user per minute
  • 4.1 * 20% = 0.82 requests per user per minute

So, my result was between 0.2 and 0.82 file requests per user per minute. If we examine Microsoft's published throughput table that helps you estimate load, we can see where we fit.

Load      Request rate                                             Supported users
Light     20 requests per hour (a request every 180 seconds)       Each response per second of throughput supports 180 simultaneous users and 1,800 total users.
Typical   36 requests per hour (a request every 100 seconds)       Each response per second of throughput supports 100 simultaneous users and 1,000 total users.
Heavy     60 requests per hour (a request every 60 seconds)        Each response per second of throughput supports 60 simultaneous users and 600 total users.
Extreme   120 requests per hour (a request every 30 seconds)       Each response per second of throughput supports 30 simultaneous users and 300 total users.

  • 0.2 * 60 = 12 requests per user per hour
  • 0.82 * 60 = 49.2 requests per user per hour

According to the above table, this represents between "Light" and just under "Heavy" load. Given the server was already under disk I/O strain (disk queue length > 2), and the fact that I applied a fudge factor when estimating the overhead of non-data files (APPS), I recommended that we size the server to accommodate "heavy" load.
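The whole chain above, from raw counters to per-user request rates, can be sketched in a few lines. This is a sketch only: the 219 concurrent users and the 5%-20% APPS weighting are the estimates discussed above, not measured facts.

```python
# Sketch of the I/O estimation chain described above. The 5%-20%
# weighting is the best-guess fudge factor for the APPS share.

files_per_minute = 114518692 / (551 * 60)   # total opened files / uptime in minutes
open_vs_listed = 314 / 1206                 # % of open files vs active listed files
concurrent_users = 219                      # users with files open at sample time

relevant_per_minute = files_per_minute * open_vs_listed        # ~900
per_user_per_minute = relevant_per_minute / concurrent_users   # ~4.1

for weighting in (0.05, 0.20):
    per_hour = per_user_per_minute * weighting * 60
    print(f"weighting {weighting:.0%}: {per_hour:.1f} requests per user per hour")
```

Running this reproduces the roughly 12 to 49 requests per user per hour that place us between "Light" and "Heavy" in Microsoft's table.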

Phew! That was a lot of variables and formulas.. and unfortunately we are not there yet!

Complex vs. Common Operations and Daily Peak Factor

Microsoft identify that different types of operations performed in SharePoint have different load requirements.

  • Common Operations typically include tasks like browsing categories and lists and hitting the home page.
  • Complex operations are document collaboration functions (including document check-in and document upload.) Microsoft suggests that these operations carry a higher weighting when estimating total operations per day. (3 * common operations)

The next step is to estimate complex versus common operations. As I mentioned at the very start of this post, one of the key requirements of SharePoint here was for document collaboration. Thus I introduced my second “best guess” estimate. That estimate was that 75% of operations were common, and 25% of operations were complex.

Now astute readers of this post may have noticed that thus far, I had not applied any weighting calculations to time of day factors. Fear not, I did not forget about it! 🙂

So far, our estimations of usage have been based on a sample taken from the file server at 10am on 30/1/2007. We do not know for sure if this represents peak usage, nor do we expect the same level of load at 10PM as at 10AM. So we make the next two assumptions:

Peak Factor is an approximate number that estimates the ratio of peak site throughput to its average throughput. Microsoft recommends a value of 1-4.

Number of hours per day estimates the hours of the day where we would expect to experience average to peak load

  • Complex operations: 25%
  • Common operations: 75%
  • Peak Factor: 2
  • Number of hours per day: 10

Previously we estimated 0.2 to 0.82 requests per user per minute. Utilising the upper figure of 0.8, this equates to 1152 theoretical requests per user per day (0.8 * 60 minutes * 24 hours).

Here is the calculation

  • Total Operations Per user Per Day: 1152
  • Common Operations (75%): 864
  • Complex Operations (25%, weight 3): 288 * 3 = 864
  • Peak Factor: 2
  • Number of hours per day: 10

Now we can use Microsoft’s “Estimate Peak Throughput Worksheet” to estimate capacity requirements to accommodate all of these factors.

[Screenshot: Estimate Peak Throughput Worksheet]
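The weighting arithmetic that feeds the worksheet can be sanity-checked with a short sketch. Note this is my reading of the calculation, not a reproduction of Microsoft's worksheet itself.

```python
# Sketch of the operations weighting feeding the worksheet (my reading
# of the calculation, not a reproduction of Microsoft's worksheet).

ops_per_user_per_day = 1152          # from 0.8 requests/min over 24 hours
common_share, complex_share = 0.75, 0.25
complex_weight = 3                   # Microsoft's suggested weighting
peak_factor = 2
hours_per_day = 10

common_ops = ops_per_user_per_day * common_share          # 864
complex_ops = ops_per_user_per_day * complex_share        # 288
weighted_ops = common_ops + complex_ops * complex_weight  # 864 + 864 = 1728

# Spread over the working day with the peak factor applied:
peak_hour_ops = weighted_ops * peak_factor / hours_per_day
print(f"Weighted ops/user/day: {weighted_ops:.0f}, peak-hour ops/user: {peak_hour_ops:.1f}")
```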

Estimate User Response Time

Response times are categorized in the following way:

  • Slow (3-5 seconds) – user response times can slow to this rate without issue.
  • Recommended (1-2 seconds) – the average user response time target.
  • Fast (<1 second) – for organizations whose businesses demand speed.

The following Microsoft table lists throughput targets based on user response times.

Total users   Slow (RPS)   Recommended (RPS)   Fast (RPS)
500           0.4          0.5                 0.7
1,000         0.7          1.0                 1.2
5,000         4.0          5.0                 6.0

There were 600+ users in the head office, so the estimated RPS to achieve FAST is around 0.875 RPS. This is simply a linear interpolation between 0.7 RPS for 500 users and 1.2 RPS for 1,000 users.
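That interpolation is straightforward to make repeatable; the helper name below is mine, for illustration only.

```python
# Linear interpolation between rows of Microsoft's response-time table,
# e.g. the "Fast" column: 0.7 RPS at 500 users, 1.2 RPS at 1,000 users.
# The helper name is hypothetical, for illustration only.

def interpolate_rps(users, lo_users, lo_rps, hi_users, hi_rps):
    """Linearly interpolate an RPS target between two table rows."""
    fraction = (users - lo_users) / (hi_users - lo_users)
    return lo_rps + fraction * (hi_rps - lo_rps)

print(f"{interpolate_rps(600, 500, 0.7, 1000, 1.2):.3f}")  # 0.800 at exactly 600 users
print(f"{interpolate_rps(675, 500, 0.7, 1000, 1.2):.3f}")  # 0.875 a little above that
```

The 0.875 figure used above corresponds to interpolating somewhat beyond the 600-user mark, which is consistent with "600+" staff.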

The following Microsoft table lists throughput targets in RPS at various concurrency rates

Total users   5%     10%    15%    25%    50%    75%    100%
500           0.25   0.5    0.75   1.25   2.5    3.75   5.0
1,000         0.5    1.0    1.5    2.5    5.0    7.5    10.0
5,000         2.5    5.0    7.5    12.5   25.0   37.5   50.0

So, we previously determined that there were 600+ users in the head office where the file server we examined resides. The concurrency factor for the sample period was 33%. So, let's double the concurrency rate to accommodate periods when a higher percentage of staff are active. In this case, the estimated RPS to achieve 66% concurrency is around 3.125.

As we saw in the earlier table that illustrated the sort of RPS figures required to accommodate "Typical" versus "Heavy" load, the RPS required to accommodate 462 simultaneous users under heavy to extreme load is between 7.2 and 14.5 RPS. Therefore, this figure should more than accommodate the response time and concurrency estimates, which are considerably lower.

Estimating Disk Space Requirements

To explain how I estimated future disk growth, I have to describe the activities of the client. They are a project-oriented service delivery company. Prior to commencing the pilot, I asked several key stakeholders of the project to broadly categorise the different project types across some basic time, concurrency and disk space criteria.

Activity Type                  Number active   Disk Size per Project   Duration
Projects (small)               4-6             50GB                    1 Year
Projects (medium)              3-4             150GB                   2 Years
Projects (large)               1-3             350GB                   3 Years
Detailed Feasibility Studies   3-4             5GB                     6-9 months
Pre-Feasibility Studies        2-3             5GB                     3-6 months
Scoping Studies                2-3             2GB                     2-3 months
Proposals                      3-4             2GB                     1-2 months

As I mentioned in Part 1, a limited pilot was conducted over a 3 month period. The pilot used a detailed feasibility study. At the time of the commencement of the Pilot, the study was 70% complete. The original data copied to the SharePoint web application was:

3,725 files, 2.97GB

As of 8 weeks later, using the Windows Explorer view against the document libraries, the total files/disk space was:

6,432 files, 3.83GB

Note: Windows Explorer View only reports the latest version of a file, and does not report recycle bin data. Therefore, we can use it to estimate organic file growth without SharePoint versioning and recycle bin implications.

So, the aforementioned study was around 70% complete at the start of the pilot and almost 90% complete by the time the study period officially concluded. Over that two-month period, the data grew by around 942MB. Based on this, we can assume that the final size of the study would probably have been around 4.35GB.

This figure was derived from the 942MB in 2 months, extrapolated to 1413MB (roughly 1.4GB) for the quarter (942 * 1.5), and added to the 2.97GB originally loaded in the pilot. What gives me some assurance that this estimate is reasonable is that the final figure of around 4.35GB fits closely with the 5GB the stakeholders originally estimated for a detailed feasibility study.
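That extrapolation, as a quick sketch of the reasoning above:

```python
# Extrapolating the pilot's organic growth per the reasoning above.

initial_gb = 2.97        # data originally copied into SharePoint
growth_mb = 942          # observed growth over the two-month pilot

quarterly_growth_mb = growth_mb * 1.5   # scale 2 months up to a 3-month quarter
projected_final_gb = initial_gb + quarterly_growth_mb / 1024

print(f"Quarterly growth: {quarterly_growth_mb:.0f}MB")           # 1413MB
print(f"Projected final study size: {projected_final_gb:.2f}GB")  # 4.35GB
```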

Activity Type                  Number active   Disk Size per Project   Duration
Detailed Feasibility Studies   3-4             5GB                     6-9 months

Now, again referring to the list of project types, examine the "number active" column. Let's assume that we are at full capacity, with the upper estimate of projects active. From this we can derive a quarterly growth rate in organic disk usage.

Activity Type                  Growth Rate per quarter
Projects (small)               12.5GB
Projects (medium)              18.75GB
Projects (large)               30GB
Detailed Feasibility Studies   1.6GB
Pre-Feasibility Studies        2.5GB
Scoping Studies                2GB
Proposals                      1GB

Now, here is the fun bit! The pilot database size was approximately 5GB after the data was initially copied into SharePoint. Remember, this was when the project was 70% complete. However, the SQL database size at the conclusion of the pilot (90% completion of the project) was 8.6GB.

That is 3.6GB for 2 months, which projected further would be 5.4GB for a quarter. This is over 3 times the amount of disk required prior to SharePoint! Microsoft says to plan for twice the data you expect; I suggest this estimate may be a little low.

So, the next step is to revisit the above tables and recalculate, assuming the growth factor seen in the pilot holds going forward.

 

Activity Type                  Number active   Disk Size   Duration
Projects (small)               4-6             169GB       1 Year
Projects (medium)              3-4             506GB       2 Years
Projects (large)               1-3             1181GB      3 Years
Detailed Feasibility Studies   3-4             17GB        6-9 months
Pre-Feasibility Studies        2-3             17GB        3-6 months
Scoping Studies                2-3             7GB         2-3 months
Proposals                      3-4             7GB         1-2 months

Activity Type                  Growth Rate per quarter in SharePoint
Projects (small)               42GB
Projects (medium)              63GB
Projects (large)               98GB
Detailed Feasibility Studies   5.6GB
Pre-Feasibility Studies        8.5GB
Scoping Studies                7GB
Proposals                      4.6GB

So now that we have some figures to work with, let's look at Microsoft's general recommendations for SQL Server.

Database log files       Disk space for log files will vary based on log settings and the number of databases. For more information, see Physical Database Storage Design (http://go.microsoft.com/fwlink/?LinkId=78853&clcid=0x409).
Configuration database   1.5GB (the configuration database will not grow past this size).
Free space               Leave at least 25% free space on each hard disk or volume.

Microsoft has recommended that you allow twice as much disk space as content to allow for versioning. Based on the figures in the previous section, the figure derived was more than 3 times the disk space of the content.

So the next step was to estimate disk usage two years out. We were not going to immediately implement SharePoint for all existing projects. Instead, new projects would use SharePoint as they came on stream. This meant there would be a ramp-up phase before we hit our "optimum" usage.

So to perform this estimate, I made the following assumptions.

  • It assumes that we are running the largest estimate of concurrent projects and studies
  • It assumes the length of these projects and studies are at the upper end of the estimation
  • It assumes an evenly varied spread of the timing of studies and projects.
    • For example: Of the 6 active small projects, 2 have recently started, 2 are halfway through and 2 are almost complete

So below is the consolidated table listing these assumptions, as well as incorporating Microsoft’s SQL recommendations.

Activity                              Point in time   Disk Space Used
Projects (small) * 2                  start           84GB
Projects (small) * 2                  middle          216GB
Projects (small) * 2                  end             338GB
Projects (medium) * 2                 start           126GB
Projects (medium) * 1                 middle          252GB
Projects (medium) * 1                 end             506GB
Projects (large) * 1                  start           392GB
Projects (large) * 1                  middle          784GB
Projects (large) * 1                  end             1181GB
Detailed Feasibility Study * 2        start           11.2GB
Detailed Feasibility Study * 1        middle          11.2GB
Detailed Feasibility Study * 1        end             16.8GB
Pre Feasibility Study * 2             start           17GB
Pre Feasibility Study * 1             end             17GB
Scoping Study * 4                                     28GB
Proposal * 4                                          18.4GB
TOTAL DISK SPACE                                      3998.6GB
Free Space Requirement (25%)                          999.65GB
Database Log Files (10% of content)                   500GB
TOTAL SQL Server Disk Space                           5.5TB
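The consolidated worksheet's arithmetic can be checked with a short script. The 10%-of-content log allowance is the assumption stated in the table (the table rounds it up to 500GB, and rounds the grand total up to 5.5TB).

```python
# Checking the consolidated worksheet's arithmetic, following Microsoft's
# 25% free space recommendation and a 10%-of-content log estimate.

content_gb = sum([
    84, 216, 338,        # small projects: start / middle / end pairs
    126, 252, 506,       # medium projects
    392, 784, 1181,      # large projects
    11.2, 11.2, 16.8,    # detailed feasibility studies
    17, 17,              # pre-feasibility studies
    28, 18.4,            # scoping studies and proposals
])

free_space_gb = content_gb * 0.25
log_files_gb = content_gb * 0.10
total_gb = content_gb + free_space_gb + log_files_gb

print(f"Content: {content_gb:.1f}GB")      # 3998.6GB
print(f"Total: {total_gb / 1024:.2f}TB")   # about 5.3TB; the worksheet rounds up to 5.5TB
```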

So, excluding log files, we have a total potential disk space requirement of over 5 terabytes! Holy Crap!! (collective intake of breath). That is even before we look at disk space for SQL backups and indexing! For backups, I personally like to have at least 3-4 days' worth of disk space, so I can hold more than 24 hours' worth of full SQL backups before I have to go to tape. However, if you do the maths, the numbers get really scary!

On top of that, indexing is supposed to take somewhere from 10-50% of the total disk space of the content. Let's go with 20% for now. Thus, in two years we would be talking an additional 1TB for the index, plus the mandatory 25% free disk space.

The Recommendation

I am not going to divulge the full details of the recommendation here, but suffice to say that anybody who takes these figures, waltzes into IT management and suggests that you need 5TB+ for one application, for what used to be 1.2TB of file storage, is likely going to be laughed out of the office. Quite rightly so. This is an estimation at the end of the day, and it all depends on your assumptions (garbage in, garbage out).

Change a percentage here and there and the result may have been totally different.

But what this does tell you is that there is the *potential* for this sort of growth. Even if the real-world figure is half this rate, you are still talking a more accelerated growth rate than ever before, putting more pressure on your disk and backup infrastructure. I simply asked the client to examine my methodology and draw their own conclusions.

So, irrespective of your assumptions and results, a detailed review of your storage infrastructure is a must, and a SAN is likely required for any SharePoint farm in a medium to large organisation. A prudent organisation will start planning now for that eventuality, rather than spend stupid amounts of cash up front.

So the company did not buy a 5TB SAN, and I never suggested they do :-), but they did opt to review both their existing SAN and backup infrastructure as a result, and found that both were sub-optimally set up and incapable of scaling to even a fraction of the estimates. Thus, when they approached vendors, they had clear, well-defined requirements. (SharePoint was not the only consideration either – Oracle/ERP also weighed heavily into this process.)

Eventually a new, fully redundant SAN was purchased and installed with 1.5TB of disk, which allowed for SQL backups as well. The old backup technology was replaced by a new architecture, and considerable time and effort was spent on disk array/logical volume design and on redundancy and performance testing.

As a side note, the factor-of-3 growth in disk consumption observed during the pilot phase continued into full-scale production 🙂

Now, many consultants will be asked to perform this sort of work as part of the planning phase of a large SharePoint implementation. If you find that these two articles helped you in your endeavours, please let me know 🙂

Thanks for reading.



