Trials or tribulation? Inside SharePoint 2013 workflows–Part 3

This entry is part 3 of 13 in the series Workflow

Hi and welcome to part 3 of my series of articles that take a peek under the hood of SharePoint 2013 workflows, while trying to answer the question of whether SharePoint 2013 workflows can enable citizen developers to go forth, solve business problems and catapult their organisations to success. In part 1, I introduced you to Megacorp Inc and their need for a controlled documents approval workflow. In part 2, we created a basic SharePoint 2013 workflow that implements the logic outlined in the picture below. The workflow is not yet finished, but we did enough to be able to run it and learn from it, which brings us to this post.

Now I will tell you straight up that our first test of this workflow is not going to work. The entire point of this series of articles is to show you *why* things do not always work and what you need to do about it. As we progress, I hope that you will learn quite a bit about the operation of workflows in SharePoint 2013, as well as about developing and troubleshooting them. After all, we all know that the best citizens are informed citizens!

[Diagram: the controlled document approval logic]

So let’s get on with testing this particular workflow. We published it to the Documents library in part 2, so now we trigger it by selecting one of the files in the library. In this example, I will select one of the documents tagged as belonging to Megacorp GM Foods. Using the filtering provided by metadata navigation, we can show just the four documents owned by the Megacorp GM Foods division.

[Screenshot: the Documents library filtered to the four Megacorp GM Foods documents]

In this example, we will update the document called Burger Additives Standard. The workflow has not yet been set to start automatically, so we will need to trigger it manually. To do so, click the ellipsis next to the Burger Additives Standard document, and then click the ellipsis in the bottom right of the properties/preview window. This will show a second drop down menu. Choose Workflows from this menu as shown below.

[Screenshot: the document callout menu with Workflows selected]

Underneath the Start a New Workflow text, you will see the workflow we published in part 2 called Process Owner Approval. Clicking it will start the workflow on this document.

[Screenshot: the workflow start page listing Process Owner Approval]

First sign of trouble…

After starting the workflow, the browser will redirect you back to the Documents library.  At this point, we see our first sign of trouble. When a workflow is published on a list or library, a column is added that is used to track workflow progress. In this case, we started our workflow, but there is nothing displayed in the workflow status column and the workflow does not appear to run. Hmm… what gives?

[Screenshot: the Documents library showing an empty workflow status column]

When a workflow does not behave as you intended, the easiest way to troubleshoot is to use the workflow status page. As it happens, you have already seen this page, because it is the same page where we started the workflow. So once again, we click on the ellipsis next to the Burger Additives Standard document, click the ellipsis in the properties window and choose Workflows from the drop down menu.

Well look at that… it says the workflow is indeed started…

[Screenshot: the workflow status page showing the workflow as Started]

Clicking on the running workflow, we can see more detail, which I have shown below. This screen also says that the workflow is started and nothing appears amiss. But then there is the little blue information symbol next to the Internal Status label. Hovering over this icon displays yet more information. This time, we see an error that would stump many people – the sort of error that an information worker would have to call the helpdesk for.

Retrying last request. Next attempt scheduled in less than one minute. Details of last request: HTTP InternalServerError to http://megacorp/iso9001/_vti_bin/client.svc/web/lists/getbyid(guid’a64bb9ec-8b00-407c-a7d9-7e8e6ef3e647′)/Items(7) Correlation Id: de95e312-e24b-42c3-9369-5bae68040219 Instance Id: 9ed3a11d-f665-4512-9b17-78850356c846

[Screenshots: the workflow status detail and the Internal Status tooltip showing the error]

Okaaaay, so that error message is about as useful as Windows Vista. It shows an internal server error and a correlation ID. Furthermore, if you wait another minute or so and then refresh the workflow status screen, its internal status is no longer Started, but has changed to Cancelled. You can see it for yourself below…

[Screenshot: the workflow status page now showing an internal status of Cancelled]

Once again, clicking the little blue info button gives us more detail. Unfortunately, the detail consists of an even scarier-looking message than the previous one. This one looks nasty enough to freak out some SharePoint admins too. Check it out – it is pretty much useless in terms of conveying any information of value.

RequestorId: 8ad4a017-7e6f-0d0f-35d2-81c56a05b37c. Details: System.ApplicationException: HTTP 500 {“Transfer-Encoding”:[“chunked”],”X-SharePointHealthScore”:[“0″],”SPClientServiceRequestDuration”:[“211″],”SPRequestGuid”:[“3d7950b2-3d9d-47d9-a5fb-588bf02b9551″],”request-id”:[“3d7950b2-3d9d-47d9-a5fb-588bf02b9551″],”X-FRAME-OPTIONS”:[“SAMEORIGIN”],”MicrosoftSharePointTeamServices”:[“15.0.0.4420″],”X-Content-Type-Options”:[“nosniff”],”X-MS-InvokeApp”:[“1; RequireReadOnly”],”Cache-Control”:[“max-age=0, private”],”Date”:[“Sun, 17 Nov 2013 22:56:19 GMT”],”Server”:[“Microsoft-IIS\/8.0″],”X-AspNet-Version”:[“4.0.30319″],”X-Powered-By”:[“ASP.NET”]} at Microsoft.Activities.Hosting.Runtime.Subroutine.SubroutineChild.Execute(CodeActivityContext context) at System.Activities.CodeActivity.InternalExecute(ActivityInstance instance, ActivityExecutor executor, BookmarkManager bookmarkManager) at System.Activities.Runtime.ActivityExecutor.ExecuteActivityWorkItem.ExecuteBody(ActivityExecutor executor, BookmarkManager bookmarkManager, Location resultLocation)

[Screenshot: the detailed error message behind the info icon]

So what the hell is going on here?

Unfortunately, both of the error messages above are virtually useless for most people and the only way to dig deeper is to delve into the SharePoint ULS logs. Of course, if you use Office 365 you don’t have that luxury, because the ULS logs are not available to you – so your best option for figuring out errors like this is trial and error. For the rest of us living in on-premises or IaaS land, a quick search of the ULS logs for the file name “Burger additives standards” reveals the issue and the root cause. Check out the errors reported below – it should be quite clear…

SharePoint Foundation             General                           8kh7    High        The file “http://megacorp/iso9001/Documents/Burger additives standards.pdf” is not checked out.  You must first check out this document before making changes.

SharePoint Foundation             General                           aix9j    High        SPRequest.AddOrUpdateItem: UserPrincipalName=i:0).w|s-1-5-21-1480876320-1123302732-2276846122-500, AppPrincipalName=I:0I.T|MS.SP.EXT|2CC54B18-3F9D-43A3-BE55-B3A81C045562@B9A66C21-F39D-42DB-B6CD-DE520B6C1C91 ,bstrUrl=http://megacorp/iso9001 ,bstrListName={A64BB9EC-8B00-407C-A7D9-7E8E6EF3E647} , bAdd=False , bSystemUpdate=False , bPreserveItemVersion=False , bPreserveItemUIVersion=False , bUpdateNoVersion=False ,pbstrNewDocId=00000000-0000-0000-0000-000000000000 , bHasNewDocId=False , bstrVersion=8 , bCheckOut=False ,bCheckin=False , bUnRestrictedUpdateInProgress=False , bMigration=False , bPublish=False , bstrFileName=<null>

SharePoint Foundation             CSOM                              ahjq1    High        Exception occured in scope Microsoft.SharePoint.SPListItem.UpdateWithFieldValues. Exception=Microsoft.SharePoint.SPException: The file “http://megacorp/iso9001/Documents/Burger additives standards.pdf” is not checked out.  You must first check out this document before making changes. —> System.Runtime.InteropServices.COMException: The file “http://megacorp/iso9001/Documents/Burger additives standards.pdf” is not checked out.  You must first check out this document before making changes.     at Microsoft.SharePoint.Library.SPRequestInternalClass.AddOrUpdateItem(String bstrUrl, String bstrListName, Boolean bAdd, Boolean bSystemUpdate, Boolean bPreserveItemVersion, Boolean bPreserveItemUIVersion, Boolean bUpdateNoVersion, Int32& plID, String& pbstrGuid, Guid pbstrNewDocId, Boolean bHasNewDo…    3b84d615-e006-4457-811a-0af089963216
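
As an aside, if you would rather not trawl the logs by eye (or want to repeat the search for other correlation IDs later), a small script can do the scanning for you. The sketch below is a rough, unofficial example that assumes the default SharePoint 2013 log folder; the search terms are just the file name and correlation ID from this incident, and on a real farm the Merge-SPLogFile cmdlet or ULS Viewer would do the same job more elegantly.

```python
import glob
import os

# A minimal sketch, assuming the default SharePoint 2013 ULS log location.
# Adjust LOG_DIR and SEARCH_TERMS for your own farm; this is no substitute
# for Merge-SPLogFile or ULS Viewer, just a quick way to grep the logs.
LOG_DIR = r"C:\Program Files\Common Files\microsoft shared\Web Server Extensions\15\LOGS"
SEARCH_TERMS = [
    "Burger additives standards",            # the document the workflow ran on
    "de95e312-e24b-42c3-9369-5bae68040219",  # the correlation ID from the status page
]

def scan_uls_logs(log_dir, terms):
    """Print every ULS log line that mentions any of the search terms."""
    for path in sorted(glob.glob(os.path.join(log_dir, "*.log"))):
        with open(path, "r", encoding="utf-8", errors="ignore") as log_file:
            for line in log_file:
                if any(term.lower() in line.lower() for term in terms):
                    print(f"{os.path.basename(path)}: {line.strip()}")

if __name__ == "__main__":
    scan_uls_logs(LOG_DIR, SEARCH_TERMS)
```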

So in case it is not clear from the above messages, we have an error stating that the file Burger additives standards is not checked out and that to make changes to the document, we need to check it out first. This raises several questions:

  • 1. Why is the document library configured to require check-out?
  • 2. Why is the workflow trying to change the document anyway? The workflow we created in part 2 performs no actions on the Documents library.
  • 3. Why were the error messages so useless (which really hurts in Office 365 scenarios)?
  • 4. How can we fix this problem?

Let’s examine each of these in turn…

1. Why is the document library configured to require check-out?

The first question is really easy to answer. When you create a site using the Document Center site template, the document library versioning settings enable the Require Check Out option as shown below. Therefore no changes can be made to this document unless the user making the change checks it out.

[Screenshot: the library versioning settings with Require Check Out enabled]

2. Why is the workflow trying to change the document anyway?

The second question is also easy to answer, although the answer is a little more involved. When a workflow is associated with a list or content type, it adds a column to track the status of the workflow. In SharePoint 2013, the default behaviour is to update this column with whichever stage of the workflow is currently being executed. Therefore, as soon as the workflow runs, and before it has run any of its actions, it attempts to write its stage to the document that triggered it. So looking at the images below, if things were working we should see the string “Stage 1” in the Process Owner Approval column for the document Burger additives standards. But since the document requires check-out before any change can be made, SharePoint prevents this from happening by design.

[Screenshot: the workflow’s Stage 1 in SharePoint Designer]

[Screenshot: the Process Owner Approval column in the Documents library]

Some readers who are experienced with SharePoint Designer workflows might be thinking “easy… just use the Check Out Item workflow action before you do anything else.” Unfortunately this won’t work, because the issue is triggered when the stage information is written to the workflow status column, which happens before any actions run.

3. Why were the error messages so useless (which really hurts in Office 365 scenarios)?

The next question is why on earth the true error wasn’t reported back to the workflow. After all, in the end, this was a clearly identified error in SharePoint, yet all we got were error messages that did not state the problem at all. Surfacing the real error would have saved the effort of delving deep into the ULS logs – which is not even possible in Office 365.

The answer to this question is a little more complex and relates to how Workflow Manager and SharePoint interact with each other. Without getting into too much detail, the gist of the issue is that Workflow Manager uses REST webservice calls to do all of its operations on SharePoint content. Each and every workflow action (like Log to History List) uses REST to talk to SharePoint to get the work done. While a detailed discussion of REST would take us too far afield, I have drawn you possibly the dodgiest diagram ever of this process to help you understand it. The main point worth mentioning is that REST embraces the key protocols of the web, so a successful or failed request is conveyed via an HTTP status code. If you have ever experienced your browser giving you a 404 page not found, then you know what I mean when I speak of HTTP status codes.

[Diagram: Workflow Manager talking to SharePoint via REST calls]

Now look again at the gory detail of the error that was logged by the workflow. Amongst all the other stuff, we see the following string: “Details: System.ApplicationException: HTTP 500”. So what happened is that when the workflow tried to update the document with its stage, the check-out requirement resulted in SharePoint sending an HTTP 500 (internal server error) back to Workflow Manager. Unfortunately for us, it did not send back the underlying cause of the error. Instead, the response details logged by the workflow contain all sorts of information about the HTTP request, but no hint of the underlying error. Sucks eh?
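
To make that a bit more concrete, here is a rough sketch of the style of REST call involved and what Workflow Manager actually gets back. This is illustrative only: it uses the Python requests library, the field name is hypothetical, and a real call from Workflow Manager also carries app authentication and a request digest, which are omitted here.

```python
import requests

# Illustrative sketch only: the URL mirrors the one from the error message above,
# the field name is hypothetical, and real Workflow Manager calls include app
# authentication headers and a request digest that are omitted here.
url = ("http://megacorp/iso9001/_vti_bin/client.svc/web/lists/"
       "getbyid(guid'a64bb9ec-8b00-407c-a7d9-7e8e6ef3e647')/Items(7)")

response = requests.post(
    url,
    json={"ProcessOwnerApproval": "Stage 1"},          # hypothetical status field update
    headers={"X-HTTP-Method": "MERGE", "IF-MATCH": "*"},
)

# All the caller gets back is a status code and some response headers - the
# "file is not checked out" detail never makes the return trip.
print(response.status_code)    # 500
print(dict(response.headers))  # SPRequestGuid, X-SharePointHealthScore, etc.
```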

4. How can we fix this problem?

The final question was what we can do about this issue. There are two relatively easy fixes, but before I get to them, let me show you what happens if you check out the document and then attempt to run the workflow. While you might think this would make the problem go away, instead we get a friendly dialog box telling us to check the document back in before we attempt to start a workflow. Given that the workflow had just failed because the item was not checked out, I had to laugh.

[Screenshot: the dialog telling us to check the document in before starting a workflow]

Anyways, the two options you have are to disable Require Check Out on the Documents library, or to change the default behaviour of the workflow so that it does not write the stage back to the item that triggered it. The first one is pretty easy – in the Versioning Settings of the Documents library, we change the “Require documents to be checked out before they can be edited?” option from Yes to No.

[Screenshot: changing the Require Check Out option to No]
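
Incidentally, if you would rather script that change than click through the UI, the same setting is exposed as the ForceCheckout property on the list and can, as far as I can tell, be flipped via the SharePoint REST API. The sketch below is an unofficial example: it assumes you already have an authenticated session and a valid request digest, and that your library really is called Documents.

```python
import requests

# Sketch only: assumes an already-authenticated requests.Session and a valid
# request digest (normally fetched from /_api/contextinfo). Placeholders below.
SITE = "http://megacorp/iso9001"
session = requests.Session()              # authentication setup omitted
form_digest = "<valid X-RequestDigest>"   # placeholder

# MERGE the list and turn off the Require Check Out behaviour (ForceCheckout).
session.post(
    f"{SITE}/_api/web/lists/getbytitle('Documents')",
    json={"__metadata": {"type": "SP.List"}, "ForceCheckout": False},
    headers={
        "Accept": "application/json;odata=verbose",
        "Content-Type": "application/json;odata=verbose",
        "X-RequestDigest": form_digest,
        "X-HTTP-Method": "MERGE",
        "IF-MATCH": "*",
    },
)
```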

The problem with the first option is that requiring check-out for controlled documents is likely to be a key requirement, so it is not an option in all circumstances. So the other option is to change the behaviour of the workflow itself so that it does not write its stage information back to the Documents library. Luckily for us, this is actually really easy to do. In SharePoint Designer 2013, there is an option to disable the updating of stage information. In the workflow settings screen, look for the option called Automatically update the workflow status to the current stage name and untick it.

[Screenshot: the workflow settings in SharePoint Designer with the stage update option unticked]

Now unticking this box will get us past this error, but the problem then is that we have no easy way to track workflows, as the status column can be used in views on the Documents library to see which workflows are at a particular stage. For a complex workflow, or one that will be running on many items, not being able to see the status of the workflow would make life difficult. One workaround is to add a Check Out Item workflow action to the start of the workflow, and then use the Set Workflow Status action to update the status column on the current item as the workflow progresses. This gives us back the ability to track workflow behaviour in the Documents library, but it means that a new version will be added to that document each time the status updates. So another workaround is to log workflow status to some other list altogether (using the Create List Item action) and use that list for tracking and reporting instead, as sketched below.
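
To illustrate that last workaround, the sketch below creates an item in a hypothetical tracking list via the SharePoint REST API – essentially what the Create List Item action does for you under the hood. The list name, column names and authentication placeholders are all assumptions for the sake of the example.

```python
import requests

# Sketch only: "Workflow Tracking" and its columns are hypothetical, and the
# session/digest placeholders stand in for real authentication plumbing.
SITE = "http://megacorp/iso9001"
session = requests.Session()              # authentication setup omitted
form_digest = "<valid X-RequestDigest>"   # placeholder

session.post(
    f"{SITE}/_api/web/lists/getbytitle('Workflow Tracking')/items",
    json={
        "__metadata": {"type": "SP.Data.Workflow_x0020_TrackingListItem"},  # assumed type name
        "Title": "Burger additives standards",
        "Stage": "Stage 1",                                                 # hypothetical column
    },
    headers={
        "Accept": "application/json;odata=verbose",
        "Content-Type": "application/json;odata=verbose",
        "X-RequestDigest": form_digest,
    },
)
```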

Conclusion

I think that the separation of SharePoint and Workflow Manager into separate products is ultimately a good thing. It’s just a pity that one of the legacies of this change is an issue like the one we covered in this post, where a relatively simple problem was exacerbated by poor reporting of errors between Workflow Manager and SharePoint. I guess this is part of the bigger price we pay – that of increased technical complexity via more moving parts. Hopefully an issue like this one can be addressed in a future service pack or update, because it is this sort of stuff that causes people to lose confidence and jump to a third-party solution perhaps prematurely. And if this issue was enough for you to think “pah – let’s go third party”, I have news for you: as you progress through this series we are going to deal with issues more complex than this one.

Anyway, the point is that we have identified the cause of this particular issue and gotten past it, so we should be able to continue testing the workflow and marvel at our awesomeness. In the next article, we will do just that and see what else SharePoint throws at us.

Thanks for reading

Paul Culmsee


www.hereticsguidebooks.com

 


The cloud isn’t the problem–Part 6: The pros and cons of patriotism

This entry is part 6 of 6 in the series Cloud

Hi all and welcome to my 6th post on the weird and wonderful world of cloud computing. The recurring theme in this series has been to point out that the technological aspects of cloud computing have never really been the key issue. Instead, I feel it is everything else around the technology, ranging from immature processes, through to the effects of the industry shakeout and consolidation, through to the adaptive change required for certain IT roles. To that end, in the last post, we had fun at the expense of server huggers and the typical defence mechanisms they use to scare the rest of the organization into fitting into their happy-place world of in-house managed infrastructure. In that post I noted that you can tell an IT FUD defence because risk-averse IT will almost always try to use their killer argument up-front to bury the discussion. For many server huggers or risk-averse IT shops, the killer defence is the US Patriot Act issue.

Now just in case you have never been hit with the “…ah but what about the Patriot Act?” line and have no idea what the Patriot Act is all about, let me give you a nice metaphor. It is basically a legislative version of the “Men in Black” movies. Why Men in Black? Because in those movies, Will Smith and Tommy Lee Jones had the ability to erase the memories of anyone who witnessed any extra-terrestrial activity with that silvery little pen-like device. With the Patriot Act, US law enforcement now has a similar instrument. Best of all, theirs doesn’t need batteries – it is all done on paper.


In short, the Patriot Act provides a means for U.S. law enforcement agencies to seek a court order allowing access to the personal records of anyone without their knowledge, provided that it is in relation to an anti-terrorism investigation. This act applies to pretty much any organisation that has any kind of presence in the USA, and the rationale behind introducing it was to make it much easier for agencies to conduct terrorism investigations and better co-ordinate their efforts. After all, in the reflection and lessons learnt from the 9/11 tragedy, the need for better inter-agency co-ordination was a recurring theme.

The implication of this act for cloud computing should be fairly clear. Imagine our friendly MIBs Will Smith (Agent J) and Tommy Lee Jones (Agent K) bursting into Google’s headquarters, all guns blazing, forcing them to hand over their customers’ data. Then when Google staff start asking too many questions, they zap them with the memory eraser gizmo. (Cue Tommy Lee Jones stating “You never saw us and you never handed over any data to us.”)

Scary huh? It’s the sort of scenario that warms the heart of the most paranoid server hugger, because surely no-one in their right mind could mount a credible counter-argument to that sort of risk to the confidentiality and integrity of an organisation’s sensitive data.

But at the end of the day, cloud computing is here to stay and will no doubt grow. Therefore we need to unpack this issue and see what lies behind the rhetoric on both sides of the debate. Thus, I decided to look into the Patriot Act a bit further to understand it better. Of course, it should be clear that I am not a lawyer, and this is just my own opinion from my research and synthesis of various articles, discussion papers and interviews. My personal conclusion is that all the hoo-hah about the Patriot Act is overblown. Yet in stating this, I also have to state that we are more or less screwed anyway (and always were). As you will see later in this post, there are great counter-arguments that pretty much dismantle any anti-cloud arguments that are FUD based, but be warned – in using these arguments, you will demonstrate just how much bigger this thing is than cloud computing, and get a sense of the broader scale of the risk.

So what is the weapon?

The first thing we have to do is understand some specifics about the Patriot Act’s memory erasing device. Within the vast scope of the act, the two areas of greatest concern in relation to data are the National Security Letter and the Section 215 Order. Both provide authorities access to certain types of data, and I need to briefly explain them:

A National Security Letter (NSL) is a type of subpoena that permits certain law enforcement agencies to compel organisations or individuals to provide certain types of information like financial and credit records, and telephone and ISP records (Internet searches, activity logs, etc). Now NSL’s existed prior to the Patriot Act, but the act loosened some of the controls that previously existed. Prior to the act, the information being sought had to be directly related to a foreign power or the agent of a foreign power – thereby protecting US citizens. Now, all agencies have to do is assert that the data being sought is relevant in some way to any international terrorism or foreign espionage investigation.

Want to see what an NSL looks like? Check out this redacted one from Wikipedia.

A Section 215 Order is similar to an NSL in that it is an instrument that law enforcement agencies can use to obtain data. It is also similar to NSL’s in that it existed prior to the Patriot Act – except back then it was called a FISA Order, named after the Foreign Intelligence Surveillance Act that enacted it. The type of data available under a Section 215 Order is more expansive than what you can eke out of an NSL, but a Section 215 Order does require a judge to let you get hold of it (i.e. there is some judicial oversight). In this case, the FBI obtains a 215 order from the Foreign Intelligence Surveillance Court, which reviews the application. What the Patriot Act did differently from the FISA Order was to broaden the definition of what information could be sought. Under the Patriot Act, a Section 215 Order can relate to “any tangible things (including books, records, papers, documents, and other items).” If these are believed to be relevant to an authorised investigation, they are fair game. The act also eased the requirements for obtaining such an order. Previously, the FBI had to present “specific articulable facts” providing evidence that the subject of an investigation was a “foreign power or the agent of a foreign power.” From my reading, there is now no requirement for evidence and the reviewing judge therefore has little discretion. If the application meets the requirements of Section 215, they will likely issue the order.

So now that we understand the two weapons that are being wielded, let’s walk through the key concerns being raised.

Concern 1: Impacted cloud providers can’t guarantee that sensitive client data won’t be turned over to the US government

CleverWorkArounds short answer:

Yes this is dead-set true and it has happened already.

CleverWorkArounds long answer:

This concern stems from the “loosening” of previous controls on both NSL’s and Section 215 Orders. NSL’s, for example, require no probable cause or judicial oversight at all, meaning that the FBI can issue these of its own volition. Now it is important to note that they could do this before the Patriot Act came into being too, but back then the parameters for usage were much stricter. Section 215 Orders, on the other hand, do have judicial oversight, but that oversight has also been watered down. Additionally, the breadth of information that can be collected is now greater. Add to that the fact that both NSL’s and Section 215 Orders almost always include a compulsory non-disclosure or “gag” order, preventing notification to the data owner that this has even happened.

This concern is not only valid, but it has happened and continues to happen. Microsoft has already stated that it cannot guarantee customers would be informed of Patriot Act requests and, furthermore, it has also disclosed that it has complied with Patriot Act requests. Amazon and Google are in the same boat. Google has also disclosed that it has handed data stored in European datacenters back to U.S. law enforcement.

Now some of you – particularly if you live or work in Europe – might be wondering how this could happen, given the European Union’s strict privacy laws. Why is it that these companies have complied with the US authorities regardless of those laws?

That’s where the gag orders come in – which brings us onto the second concern.

Concern 2: The reach of the act goes beyond US borders and bypasses foreign legislation on data protection for affected providers

CleverWorkArounds short answer:

Yes this is dead-set true and it has happened already.

CleverWorkArounds long answer:

The example of Google – a US company – handing over data in its EU datacentres to US authorities highlights that the Patriot Act is more pervasive than one might think. In terms of who the act applies to, a terrific article put out by Alex C. Lakatos put it really well when he said:

Furthermore, an entity that is subject to US jurisdiction and is served with a valid subpoena must produce any documents within its “possession, custody, or control.” That means that an entity that is subject to US jurisdiction must produce not only materials located within the United States, but any data or materials it maintains in its branches or offices anywhere in the world. The entity even may be required to produce data stored at a non-US subsidiary.

Think about that last point – “non-US subsidiary”. This gives you a hint as to how pervasive this is. So in terms of jurisdiction and whether an organisation can be compelled to hand over data and be subject to a gag order, the list is expansive. Consider these three categories:

  • – US based company? Absolutely. That alone takes out Apple, Amazon, Dell, EMC (and RSA), Facebook, Google, HP, IBM, Symantec, LinkedIn, Salesforce.com, McAfee, Adobe, Dropbox and Rackspace.
  • – Subsidiary company of a US company (incorporated anywhere else in the world)? It seems so.
  • – Non-US company that has any form of US presence? It also seems so. Now we are talking about Samsung, Sony, Nokia, RIM and countless others.

The crux of this bypassing argument is the gag order provisions. If the US company, subsidiary or regional office of a non-US company receives the order, it may be forbidden from disclosing anything about it to the rest of the organisation.

Concern 3: Potential for abuse of Patriot Act powers by authorities

CleverWorkArounds short answer:

Yes this is true and it has happened already.

CleverWorkArounds long answer:

Since the Patriot Act came into place, there has been a marked increase in the FBI’s use of National Security Letters. According to this New York Times article, there were 143,000 requests between 2003 and 2005. Furthermore, according to a report from the Justice Department’s Inspector General in March 2007, as reported by CNN, the FBI was guilty of “serious misuse” of the power to secretly obtain private information under the Patriot Act. I quote:

The audit found the letters were issued without proper authority, cited incorrect statutes or obtained information they weren’t supposed to. As many as 22% of national security letters were not recorded, the audit said. “We concluded that many of the problems we identified constituted serious misuse of the FBI’s national security letter authorities,” Inspector General Glenn A. Fine said in the report.

The Liberty and Security Coalition went into further detail on this. In a 2009 article, they list some of the specific examples of FBI abuses:

  • – FBI issued NSLs when it had not opened the investigation that is a predicate for issuing an NSL;
  • – FBI used “exigent letters” not authorized by law to quickly obtain information without ever issuing the NSL that it promised to issue to cover the request;
  • – FBI used NSLs to obtain personal information about people two or three steps removed from the subject of the investigation;
  • – FBI has used a single NSL to obtain records about thousands of individuals; and
  • – FBI retains almost indefinitely the information it obtains with an NSL, even after it determines that the subject of the NSL is not suspected of any crime and is not of any continuing intelligence interest, and it makes the information widely available to thousands of people in law enforcement and intelligence agencies.

Concern 4: Impacted cloud providers cannot guarantee continuity of service during investigations

CleverWorkArounds short answer:

Yes this is dead-set true and it has happened already.

CleverWorkArounds long answer:

An oft-overlooked side effect of all of this is that other organisations can be adversely affected. One aspect of cloud computing scalability that we talked about in part 1 is that of multitenancy. Now consider a raid on a datacenter. If cloud services are shared between many tenants, innocent tenants who had nothing whatsoever to do with the investigation can potentially be taken offline. Furthermore, the hosting provider may be gagged from explaining to these affected parties what is going on. Ouch!

An example of this happening was reported in the New York Times in mid 2011 and concerned Curbed Network, a New York blog publisher. Curbed, along with some other companies, had their service disrupted after an F.B.I. raid on their cloud provider’s datacenter. They were taken down for 24 hours because the F.B.I.’s raid on the hosting provider seized three enclosures which, unfortunately enough, included the gear they ran on.

Ouch! Is there any coming back?

As I write this post, I wonder how many readers are surprised and dismayed by my four risk areas. The little security guy in me says: if you are, then that’s good! It means I have made you more aware than you were previously, which is a good thing. I also wonder whether some readers by now are thinking to themselves that their paranoid server huggers are right.

To decide this, let’s now examine some of the counter-arguments to the Patriot Act issue.

Rebuttal 1: This is nothing new – Patriot Act is just amendments to pre-existing laws

One common rebuttal is that the Patriot Act legislation did not fundamentally alter the right of the government to access data. This line of argument was presented in August 2011 by Microsoft legal counsel Jeff Bullwinkel in Microsoft Australia’s GovTech blog. After all, it was reasoned, the areas frequently cited for concern (NSL’s and Section 215/FISA orders) were already there to begin with. Quoting from the article:

In fact, U.S. courts have long held that a company with a presence in the United States is obligated to respond to a valid demand by the U.S. government for information – regardless of the physical location of the information – so long as the company retains custody or control over the data. The seminal court decision in this area is United States v. Bank of Nova Scotia, 740 F.2d 817 (11th Cir. 1984) (requiring a U.S. branch of a Canadian bank to produce documents held in the Cayman Islands for use in U.S. criminal proceedings)

So while the Patriot Act might have made it easier in some cases for the U.S. government to gain access to certain end-user data, the right was always there. Again quoting from Bullwinkel:

The Patriot Act, for example, enabled the U.S. government to use a single search warrant obtained from a federal judge to order disclosure of data held by communications providers in multiple states within the U.S., instead of having to seek separate search warrants (from separate judges) for providers that are located in different states. This streamlined the process for U.S. government searches in certain cases, but it did not change the underlying right of the government to access the data under applicable laws and prior court decisions.

Rebuttal 2: Section 215’s are not often used and there are significant limitations on the data you can get using an NSL.

Interestingly, it appears that the more powerful Section 215 Orders have not been used that often in practice. The best article to read to understand the detail is one by Alex Lakatos. According to him, fewer than 100 applications for Section 215 Orders were made in 2010. He says:

In 2010, the US government made only 96 applications to the Foreign Intelligence Surveillance Courts for FISA Orders granting access to business records. There are several reasons why the FBI may be reluctant to use FISA Orders: public outcry; internal FBI politics necessary to obtain approval to seek FISA Orders; and, the availability of other, less controversial mechanisms, with greater due process protections, to seek data that the FBI wants to access. As a result, this Patriot Act tool poses little risk for cloud users.

So while Section 215 Orders seem to be less used, NSL’s seem to be used a dime a dozen – which I suppose is understandable, since you don’t have to deal with a pesky judge and all that annoying due process. But the downside of NSL’s from a law enforcement point of view is that the sort of data accessible via an NSL is somewhat limited. Again quoting from Lakatos (with emphasis mine):

While the use of NSLs is not uncommon, the types of data that US authorities can gather from cloud service providers via an NSL is limited. In particular, the FBI cannot properly insist via a NSL that Internet service providers share the content of communications or other underlying data. Rather [.] the statutory provisions authorizing NSLs allow the FBI to obtain “envelope” information from Internet service providers. Indeed, the information that is specifically listed in the relevant statute is limited to a customer’s name, address, and length of service.

The key point is that the FBI has no right to content via an NSL. This fact may not stop the FBI from having a try at getting that data anyway, but it seems that savvy service providers are starting to wise up to exactly what information an NSL applies to. This final quote from the Lakatos article summarises the point nicely and, at the same time, offers cloud providers a strategy to mitigate the risk to their customers.

The FBI often seeks more, such as who sent and received emails and what websites customers visited. But, more recently, many service providers receiving NSLs have limited the information they give to customers’ names, addresses, length of service and phone billing records. “Beginning in late 2009, certain electronic communications service providers no longer honored” more expansive requests, FBI officials wrote in August 2011, in response to questions from the Senate Judiciary Committee. Although cloud users should expect their service providers that have a US presence to comply with US law, users also can reasonably ask that their cloud service providers limit what they share in response to an NSL to the minimum required by law. If cloud service providers do so, then their customers’ data should typically face only minimal exposure due to NSLs.

Rebuttal 3: Too much focus on cloud data – there are other significant areas of concern

This one for me is a perverse slam-dunk counter-argument that puts the FUD defence of a server hugger back in its box. The reason it is perverse is that it opens up a debate that, for some server huggers, may mean they are already exposed to the very risks they are raising. You see, the thing to always bear in mind is that the Patriot Act applies to data, not just the cloud. This means that data, in any shape or form, is susceptible in some circumstances if a service provider exercises some degree of control over it. When you consider the applicable companies I listed earlier in the discussion, like IBM, Accenture, McAfee, EMC, RIM and Apple, you start to think about the other services where this notion of “control” might come into play.

What about if you have outsourced your IT services and management to IBM, HP or Accenture? Are they running your datacentres? Are your executives using Blackberry services? Are you using an outsourced email spam and virus scanning filter supplied by a security firm like McAfee? Using federated instant messaging? Performing B2B transactions with a US based company?

When you start to think about all of the other potential touch-points where control over data is exercised by a service provider, things start to look quite disturbing. We previously established that pretty much any organisation with a US interest (whether US owned or not) falls under Patriot Act jurisdiction and may be gagged from disclosing anything. So sure… cloud applications are a potential risk, but it may well be that any one of these companies providing services regarded as “non cloud” might receive an NSL or Section 215 Order with a gag provision, ordering them to hand over some data in their control. In the case of an outsourced IT provider, how can you be sure that the data is not straight out of your very own datacenter?

Rebuttal 4: Most other countries have similar laws

It also turns out that many other jurisdictions have similar types of laws. Canada, the UK, most countries in the EU, Japan and Australia are some good examples. If you want to dig into this, examine Clive Gringa’s article on the UK’s Regulation of Investigatory Powers Act 2000 (RIPA) and an article published by the global law firm Linklaters (a SharePoint site incidentally), on the legislation of several EU countries.

In the UK, RIPA governs the prevention and detection of acts of terrorism, serious crime and “other national security interests”. It is available to security services, police forces and authorities who investigate and detect these offences. The act regulates interception of the content of communications as well as envelope information (who, where and when). France has a bunch of acts which I won’t bore you too much with, but after 9/11 it instituted Act 2001-1062 of 15 November 2001, which strengthens the powers of French law enforcement agencies. Now agencies can order anyone to provide them with data relevant to an inquiry and, furthermore, the data may relate to a person other than the one subject to the disclosure order.

The Linklaters article covers Spain and Belgium too and the laws are similar in intent and power. They specifically cite a case study in Belgium where the shoe was very much on the other foot. US company Yahoo was fined for not co-operating with Belgian authorities.

The court considered that Yahoo! was an electronic communication services provider (ESP) within the meaning of the Belgian Code of Criminal Procedure and that the obligation to cooperate with the public prosecutor applied to all ESPs which operate or are found to operate on Belgian territory, regardless of whether or not they are actually established in Belgium

I could go on citing countries and legal cases but I think the point is clear enough. :)

Rebuttal 5: Many countries co-operate with US law enforcement under treaties

So if the previous rebuttal – that other countries have similar regimes in place – is not convincing enough, consider this one. Let’s assume that data is hosted by a major cloud services provider with absolutely zero presence in, or contacts with, the United States. There is still a possibility that this information may be accessible to the U.S. government if needed in connection with a criminal case. The means by which this can happen is international treaties relating to legal assistance, called Mutual Legal Assistance Treaties (MLATs).

As an example, the US and Australia have had a longstanding bilateral arrangement. This provides for law enforcement cooperation between the two countries and, under this arrangement, either government can potentially gain access to data located within the territory of the other. To give you an idea of what such a treaty might look like, consider the scope of the Australia–US one. The scope of assistance is wide and I have emphasised the more relevant items:

  • (a) taking the testimony or statements of persons;
  • (b) providing documents, records, and other articles of evidence;
  • (c) serving documents;
  • (d) locating or identifying persons;
  • (e) transferring persons in custody for testimony or other purposes;
  • (f) executing requests for searches and seizures and for restitution;
  • (g) immobilizing instrumentalities and proceeds of crime;
  • (h) assisting in proceedings related to forfeiture or confiscation; and
  • (i) any other form of assistance not prohibited by the laws of the Requested State.

For what it’s worth, if you are interested in the boundaries and limitations of the treaty, it states that the “Central Authority of the Requested State may deny assistance if”:

  • (a) the request relates to a political offense;
  • (b) the request relates to an offense under military law which would not be an offense under ordinary criminal law; or
  • (c) the execution of the request would prejudice the security or essential interests of the Requested State.

Interesting huh? Even if you host in a completely independent country, better check the treaties they have in place with other countries.

Rebuttal 6: Other countries are adjusting their laws to reduce the impact

The final rebuttal to the whole Patriot Act argument that I will cover is that things are moving fast and countries are moving to mitigate the issue regardless of the points and counterpoints that I have presented here. Once again I will refer to an article from Alex Lakatos, who provides a good example. Lakatos writes that the EU may re-write their laws to ensure that it would be illegal for the US to invoke the Patriot Act in certain circumstances.

It is anticipated, however, that at the World Economic Forum in January 2012, the European Commission will announce legislation to repeal the existing EU data protection directive and replace it with more a robust framework. The new legislation might, among other things, replace EU/US Safe Harbor regulations with a new approach that would make it illegal for the US government to invoke the Patriot Act on a cloud-based or data processing company in efforts to acquire data held in the European Union. The Member States’ data protection agency with authority over the company’s European headquarters would have to agree to the data transfer.

Now Lakatos cautions that this change may take a while before it actually turns into law, but it is nevertheless something that should be monitored by cloud providers and cloud consumers alike.

Conclusion

So what do you think? Are you enlightened and empowered or confused and jaded? :)

I think that the Patriot Act issue is obviously a complex one that is not well served by arguments based on fear, uncertainty and doubt. The risks are real and there are precedents that demonstrate those risks. Scarily, it doesn’t take much digging to realise that those risks are more widespread than one might initially consider. Thus, whether you are going to play the Patriot Act card for FUD reasons, or you are making a genuine effort to mitigate the risks, you need to look at all of the touch points where a service provider might exercise a degree of control. They may not be where you think they are.

In saying all of this, I think this examination highlights some strategies that can be employed by cloud providers and cloud consumers alike. Firstly, if I were a cloud provider, I would state my policy about how much data will be handed over when confronted with an NSL (since an NSL has clear limitations). Many providers may already do this, so to turn it around to the customer, it is incumbent on cloud consumers to confirm with their providers where they stand. I don’t know that there is much value in asking a cloud provider if they are exempt from the reach of the Patriot Act. Maybe it’s better to assume they are affected and instead ask them how they intend to mitigate their customers’ downstream risks.

Another obvious strategy for organisations is to encrypt data before it is stored on cloud infrastructure. While that is likely not going to be an option in a software as a service model like Office 365, it is certainly an option in the infrastructure and platform as a service models like Amazon and Azure. That would reduce the impact of a Section 215 Order being executed, as the cloud provider is unlikely to have the ability to decrypt the data.
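
As a simple illustration of that idea, here is a minimal sketch of encrypting a file client-side before it ever leaves the organisation, using Python’s cryptography library and an S3 upload as the example destination. The bucket name and file path are placeholders, and real-world use would need proper key management – the whole benefit evaporates if the key is stored alongside the data.

```python
import boto3                        # assumes AWS credentials are configured locally
from cryptography.fernet import Fernet

# Generate a symmetric key and keep it on-premises (e.g. in your own key store).
# The cloud provider only ever sees ciphertext it cannot decrypt.
key = Fernet.generate_key()
fernet = Fernet(key)

with open("controlled-document.pdf", "rb") as source:
    ciphertext = fernet.encrypt(source.read())

s3 = boto3.client("s3")
s3.put_object(
    Bucket="example-backup-bucket",      # placeholder bucket name
    Key="controlled-document.pdf.enc",
    Body=ciphertext,
)
```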

Finally (and to sound like a broken record), a little information management governance would not go astray here. Organisations need to understand what data is appropriate for what range of cloud services. This is security 101 folks and if you are prudent in this area, cloud shouldn’t necessarily be big and scary.

Thanks for reading

Paul Culmsee

www.hereticsguidebooks.com

www.sevensigma.com.au

P.S. Now do not for a second think this article is exhaustive, as this stuff moves fast. So always do your own research and do not rely on an article on some guy’s blog that may be out of date before you know it.


The cloud is not the problem-Part 4: Industry shakeout and playing with the big kids…

This entry is part 4 of 6 in the series Cloud

Hi all

Welcome to the fourth post about the adaptive change that cloud computing is going to bring to practitioners, paradigms and organisations. The previous two posts took a look at some of the dodgier sides of two of the industry’s biggest players, Microsoft and Amazon. While I have highlighted some dumb issues with both, I nevertheless have to acknowledge their resourcing, scalability, and ability to execute. On that point of ability to execute, in this post we are going to widen the lens to the cloud industry more broadly and the inevitable consolidation that is, and will continue to be, taking place.

Now to set the scene, a lot of people know that in the early twentieth century, there were a lot of US car manufacturers. I wonder if you can take a guess at the number of defunct car manufacturers there have been before and after that time.

…Fifty?

…One Hundred?

Not even close…

What if I told you that there were over 1700!

Here is another interesting stat. The table below shows, decade by decade, how many manufacturers went bankrupt or ceased operations, along with the average shelf life of the companies that folded in each decade.

Decade    # defunct    Avg years in operation
1870s         4              5
1880s         2              1
1890s         5              1
1900s        88              3
1910s       660              3
1920s       610              4
1930s       276              5
1940s        42              7
1950s        13             14
1960s        33             10
1970s        11             19
1980s         5             37
1990s         5             16
2000s         3             49
2010s         5             42

Now, you would expect that the bulk of closures would be Depression era, but note that the Depression did not start until the late 1920s, and during the boom times that preceded it, 660 manufacturers went to the wall – a worse result!

The pattern of consolidation


What I think the above table shows is the classic pattern of industry consolidation after an initial phase of innovation and expansion, where over time the many are gobbled up by the few. As the number of players consolidates, those who remain grow bigger, with more resources and economies of scale. This in turn creates barriers to entry for new participants. Accordingly, the rate of attrition slows down, but that is more due to the fact that there are fewer players in the industry. Those that are left continue to fight their battles, but now those battles take longer. Nevertheless, as time goes on, the number of players consolidates further.

If we applied a cloud/web hosting paradigm to the above table, I would equate the dotcom bust of 2000 with the depression era of the 1920s and 1930s. I actually think that with cloud computing, we are in the 1960s and onwards right now. The largest of the large players have now made big bets on the cloud and have entered the market in a big, big way. For more than a decade, other companies hosted Microsoft technology, with Microsoft showing little interest beyond selling licenses via them. Now Microsoft themselves are also the hosting provider. Does that mean most of the hosting providers will suffer the fate of Netscape? Or will they manage to survive the dance with Goliath like Citrix or VMware have?

For those who are not Microsoft or Amazon…


Imagine you have been hosting SharePoint solutions for a number of years. Depending on your size, you probably own racks or a cage in someone else’s data centre, or you own a small data centre yourself. You have some high-end VMware gear to underpin your hosting offerings and you do both managed SharePoint (i.e. a basic site collection subscription with no custom stuff – a la Office 365) and dedicated virtual machines for those who want more control (a la Amazon). You have dutifully paid your service provider licensing to Microsoft, have IT engineers on staff, some SharePoint specialists, a helpdesk and some dodgy sales guys – all standard stuff and life is good. You had a crack at implementing SharePoint multi-tenancy, but found it all a bit too fiddly and complex.

Then Amazon comes along and shakes things up with their IaaS offerings. They are cost competitive, have more data centres in more regions, higher capacity, more fault tolerance, a wider variety of services and can scale more than you can. Their ability to execute in terms of offering new services is impossible to keep up with. In short, they slowly but relentlessly take a chunk of the market and continue to grow. So you naturally counter by pushing the legitimate line that you specialise in SharePoint, and that as a result customers are in much safer hands with you than with Amazon when investing in a tool as complex as SharePoint.

But suddenly the game changes again. The very vendor whose product you provide cloud-based SharePoint services for now bundles it with Exchange and Lync and offers Active Directory integration (yeah, yeah, I know there was BPOS, but no-one actually heard of that). Suddenly the argument that you are a safer option than Amazon is shot down by the fact that Microsoft themselves now offer what you do. So whose hands are safer? The small hosting provider with limited resources, or the multinational with billions of dollars in the bank that develops the product? Furthermore, given Microsoft’s advantage in being able to mobilise knowledge resources with deep product knowledge, they have a richer managed service offering than you can provide (i.e. they offer multi-tenancy :).

This puts you in a bit of a bind, as you are getting assailed at both ends. Amazon trumps you in capabilities at the IaaS end and is encroaching on your space, and Microsoft is assailing the SaaS end. How does a small fish survive in a pond with the big ones? In my opinion, the mid-tier SharePoint cloud providers will have to reinvent themselves.

The adaptive change…

So for the mid-tier SharePoint cloud provider grappling with the fact that their play area is reduced because of the big kids encroaching, there is only one option. They have to be really, really good in areas the big kids are not good at. In SharePoint terms, this means they have to go to places many don’t really want to go: they need to bolster their support offerings and move up the SharePoint stack.

You see, traditionally a SharePoint hosting provider tends to take one of two approaches. They provide a managed service where the customer cannot mess with it too much (i.e. site collection admin access only). For those who need more than that, they will offer a virtual machine and wipe their hands of any maintenance or governance, beyond ensuring that the infrastructure is fast and backed up. Until now, cloud providers could get away with this, and the reason they take this approach should be obvious to anyone who has implemented SharePoint. If you don’t maintain operational governance controls, things can rapidly get out of hand. Who wants to deal with all that “people crap”? Besides, that’s a different skill set to the typical skills required to run and maintain cloud services at the infrastructure layer.

So some cloud providers will kick and scream about this, and delude themselves into thinking that hosting and cloud services are their core business. For those who think this, I have news for you. The big boys think these are their core business too and they are going to do it better than you. This is now commodity stuff and a by-product of commoditisation is that many SharePoint consultancies are now cloud providers anyway! They sign up to Microsoft or Amazon and are able to provide a highly scalable SharePoint cloud service with all the value added services further up the SharePoint stack. In short, they combine their SharePoint expertise with Microsoft/Amazon’s scale.

Now on the issue of support, Amazon has no specific SharePoint skills and they never will. They are first and foremost a compelling IaaS offering. Microsoft’s support? Go and re-read part 2 if you want to see that. It seems that no matter how big the multinational, level 1 tech support is always level 1 tech support.

So what strategies can a mid-tier provider adopt to stay competitive in this rapidly commoditising space? I think one is to go premium and go niche:

  • Provide brilliant support. If I call you, day or night, I expect to speak to a SharePoint person straight away. I want to get to know them on a first name basis and I do not want to fight the defence mechanism of the support hierarchy.
  • Partner with SharePoint consultancies or acquire consulting resources. The latter allows you to do some vertical integration yourself and broaden your market and offerings. A potential KPI for any SharePoint cloud provider should be that no support person ever says “sorry that’s outside the scope of what we offer.”
  • Develop skills in the tools and systems that surround SharePoint or invest in SharePoint areas where skills are lacking. Examples include Project Server, PerformancePoint, integration with GIS, Records management and ERP systems. Not only will you develop competencies that few others have, but you can target particular vertical market segments who use these tools.
  • (Controversial?) Dump your infrastructure and use Amazon in conjunction with another IaaS provider. You just can’t compete with their scale and price point. If you use them you will likely save costs; when combined with a second provider you can play the resiliency card; and best of all… you can offer VPC 🙂

Conclusion

In the last two posts we looked at some of the areas where both Microsoft and Amazon sometimes struggle to come to grips with the SharePoint cloud paradigm. In this post, we took a look at the other cloud providers, who now have to come to grips with competing against these two giants – giants who are clearly looking to eke out as much value as they can from the cloud pie. Whether or not you agree with my suggested strategy (Rackspace appears to), the pattern of the auto industry serves as an interesting parallel to the cloud computing marketplace. Is the relentless consolidation a good thing? Probably not in the long term (we will tackle that issue in the last post in this series). In the next post, we are going to shift our focus away from the cloud providers themselves and turn our gaze to internal IT departments – who, until now, have had it pretty good. As you will see, a big chunk of the irrational side of cloud computing comes from this area.

 

Thanks for reading

Paul Culmsee

www.sevensigma.com.au


The cloud is not the problem–Part 3: When silos strike back…

This entry is part 3 of 6 in the series Cloud

What can Ikea fails tell us about cloud computing?

My next door neighbour is a builder. When he moved next door, the house was an old piece of crap. Within six months, he had completely renovated it himself, adding two bedrooms, an underground garage and all sorts of cool stuff. I, on the other hand, bought my house because it was in a good location, someone had already renovated it and all we had to do was move in. The reason for this was simple: I had a new baby and, more importantly, me and power tools do not mix. I just don’t have the skills, nor the time, to do what my neighbour did.

You can probably imagine what would happen if I tried to renovate my house the way my neighbour did. It would turn out like the Ikea fails in the video. Come to think of it, many SharePoint installs tend to look like the video too. Moral of the story? Sometimes it is better to get something pre-packaged than to do it yourself.

In the last post, we examined the “Software as a Service” (SaaS) model of cloud computing in the form of Office 365. Other popular SaaS providers include SlideShare, Salesforce, Basecamp and Tom’s Planner, to name a few. Most SaaS applications are browser based and not as feature rich or complex as their on-premise competition. The SaaS model is therefore a bit like buying a kit home: no user of these services ever touches the underlying cloud infrastructure used to provide the solution, nor do they have a full mandate to tweak and customise to their heart’s content. SaaS is basically predicated on the notion that someone else will do a better set-up job than you, and on the old 80/20 rule about which application features actually get used.

Some people may regard the restrictions of SaaS as a good thing – particularly if they have previously dealt with the consequences of one too many unproductive customisation efforts. As many SharePointers know, the more you customise SharePoint, the less resilient it gets. Thus, restricting what sort of customisations can be done might, in many circumstances, be a wise thing to do.

Nevertheless, this actually goes against the genetic traits of pretty much every Australian male walking the planet. The reason is simple: no matter how lacking our skills, tools or training, we always want to do it ourselves. This brings me to our next cloud provider, Amazon, and their Infrastructure as a Service (IaaS) model of cloud based services. This is the ultimate DIY solution for those of us who find that SaaS cramps our style. Let’s take a closer look, shall we?

Amazon in a nutshell

Okay, I have to admit that as an infrastructure guy, I am genetically predisposed to liking Amazon’s cloud offerings. Why? Well, I am like my neighbour who renovated his own house: I’d rather do it all myself because I have acquired the skills to do so. So for any server-hugging infrastructure people out there wondering what they have been missing out on: read on… you might like what you see.

Now first up, it’s easy for new players to get a bit intimidated by Amazon’s bewildering array of offerings, with brand names that make no sense to anybody but Amazon… EC2, VPC, S3, ECU, EBS, RDS, AMIs, Availability Zones – sheesh! So I am going to ignore all of the confusing brand names, hope that you have heard of virtual machines, and assume that you or your tech geeks know all about VMware. The simplest way to describe Amazon is VMware on steroids. Amazon’s service essentially allows you to create virtual machines within Amazon’s “cloud” of large data centres around the world. As I stated earlier, the official cloud terminology that Amazon is traditionally associated with is Infrastructure as a Service (IaaS). This is where, instead of providing ready-made applications like SaaS, a cloud vendor provides lower level IT infrastructure for rent: virtualised servers, storage and networking.

Put simply, utilising Amazon, you can deploy virtual servers with your choice of operating system, applications, memory, CPU and disk configuration. Like any good “all you can eat” buffet, you are spoilt for choice. You simply choose an Amazon Machine Image (AMI) to use as a base for creating a virtual server. You can choose one of Amazon’s pre-built AMIs (base installs of Windows Server or Linux) or you can choose an image from the community contributed list of over 8000 base images. Pretty much any vendor out there who sells a turn-key solution (such as those all-in-one virus scanning/security appliances) has likely created an AMI. Microsoft have also gotten in on the Amazon act and created AMIs for you, optimised by their product teams. Want SQL 2008 the way Microsoft would install it? Choose the Microsoft Optimized Base SQL Server 2008R2 AMI, which “contains scripts to install and optimize SQL Server 2008R2 and accompanying services including SQL Server Analysis services, SQL Server Reporting services, and SQL Server Integration services to your environment based on Microsoft best practices.”
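As an aside for the script-inclined, the AMI catalogue can also be queried programmatically. The sketch below is only illustrative – it assumes the modern boto3 Python SDK (which postdates this post) and made-up filters – but it shows the general idea before we walk through the console wizard.

import boto3

# Connect to the EC2 API (the region name here is just an example)
ec2 = boto3.client("ec2", region_name="us-east-1")

# Ask for available Windows AMIs published by Amazon – the filters are illustrative only
response = ec2.describe_images(
    Owners=["amazon"],
    Filters=[
        {"Name": "platform", "Values": ["windows"]},
        {"Name": "state", "Values": ["available"]},
    ],
)

# Print a handful of image ids and names to choose a base image from
for image in response["Images"][:10]:
    print(image["ImageId"], image.get("Name", "<unnamed>"))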

The series of screen shots below shows the basic idea. After signing up, use the “Request instance wizard” to create a new virtual server by choosing an AMI first. In the example below, I have shown the default Amazon AMIs under “Quick start” as well as the community AMIs.

image61_thumb1
Amazon’s default AMIs
image3_thumb
Community contributed AMIs

From the list above, I have chosen Microsoft’s “Optimized SQL Server 2008 R2 SP1” from the community AMIs and clicked “Select”. Now I can choose the CPU and memory configuration. Hmm, how does a 16 core server with 60 gig of RAM sound? That ought to do it… 🙂

image13_thumb

Now I won’t go through the full description of commissioning virtual servers, but suffice to say that you can choose which geographic location within Amazon’s cloud this server will reside in, and after 15 minutes or so, your virtual server will be ready to use. It can be assigned a public IP address, firewall restricted and then remotely managed as per any other server. This can all be done programmatically too: you can talk to Amazon via web services to start, monitor and terminate as many virtual machines as you want, which allows you to scale your infrastructure on the fly and very quickly. There are no long procurement times, and you only pay for the servers that are currently running. If you shut them down, you stop paying.
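To give a feel for what “talking to Amazon via web services” looks like, here is a minimal sketch of that start/monitor/terminate cycle. It assumes the modern boto3 Python SDK rather than the tooling of the day, and the AMI id and instance type are placeholders, not recommendations.

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Launch a virtual server from a chosen AMI (hypothetical image id and size)
launched = ec2.run_instances(
    ImageId="ami-xxxxxxxx",     # placeholder AMI id
    InstanceType="m5.xlarge",   # CPU and memory are chosen via the instance type
    MinCount=1,
    MaxCount=1,
)
instance_id = launched["Instances"][0]["InstanceId"]

# Monitor it – the state moves from "pending" to "running" within a few minutes
described = ec2.describe_instances(InstanceIds=[instance_id])
print(described["Reservations"][0]["Instances"][0]["State"]["Name"])

# Terminate it when it is no longer needed, which also stops the hourly charges
ec2.terminate_instances(InstanceIds=[instance_id])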

But what makes it cool…

Now I am sure that some of you might be thinking “big deal… any virtual machine hoster can do that.” I agree – and when I first saw this capability I just saw it as a larger scale VMware/Xen type deployment. But what really made me sit up and take notice was Amazon’s Virtual Private Cloud (VPC) functionality. The super-duper short version of VPC is that it allows you to extend your corporate network into the Amazon cloud. It does this by letting you define your own private network and connect to it via site-to-site VPN technology. To see how it works diagrammatically, check out the image below.

image_thumb13

Let’s use an example to understand the basic idea. Let’s say your internal IP address range at your office is 10.10.10.0 to 10.10.10.255 (a /24 for the geeks). With VPC you tell Amazon “I’d like a new IP address range of 10.10.11.0 to 10.10.11.255”. You are then prompted to tell Amazon the public IP address of your internet router. The screenshots below show what happens next:

image_thumb14

image6_thumb

The first screenshot asks you to choose what type of router is at your end. Available choices are Cisco, Juniper, Yamaha, Astaro and generic. The second screenshot shows a sample configuration that is downloaded. Any Cisco trained person reading this will recognise what is going on here: this is an automatically generated configuration to be added to an organisation’s edge router to create an IPSEC tunnel. In other words, we have extended the corporate network itself into the cloud. Any service can be run on such a network – not just SharePoint. For smaller organisations wanting the benefits of off-site redundancy without the costs of a separate datacentre, this is a very cost effective option indeed.

For the Cisco geeks, the actual configuration is two GRE tunnels that are IPSEC encrypted. BGP is used for route table exchange, so Amazon can learn what routes to tunnel back to your on-premise network. Furthermore, Amazon allows you to manage firewall settings at the Amazon end too, so you have an additional layer of defence beyond your IPSEC router.
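For anyone who would rather script this plumbing than click through the console, here is a rough sketch of the same VPC-plus-VPN provisioning. Again it assumes the modern boto3 SDK rather than what was available at the time; the CIDR block matches the example above, while the router address and BGP ASN are placeholders.

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Carve out the 10.10.11.0/24 range inside Amazon, as per the example above
vpc = ec2.create_vpc(CidrBlock="10.10.11.0/24")
vpc_id = vpc["Vpc"]["VpcId"]

# Describe the router at the office end (public IP and ASN are placeholders)
cgw = ec2.create_customer_gateway(
    Type="ipsec.1",
    PublicIp="203.0.113.10",
    BgpAsn=65000,
)

# Create Amazon's side of the tunnel and attach it to the VPC
vgw = ec2.create_vpn_gateway(Type="ipsec.1")
ec2.attach_vpn_gateway(
    VpnGatewayId=vgw["VpnGateway"]["VpnGatewayId"],
    VpcId=vpc_id,
)

# The VPN connection is what generates the downloadable router configuration
vpn = ec2.create_vpn_connection(
    Type="ipsec.1",
    CustomerGatewayId=cgw["CustomerGateway"]["CustomerGatewayId"],
    VpnGatewayId=vgw["VpnGateway"]["VpnGatewayId"],
)
print(vpn["VpnConnection"]["VpnConnectionId"])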

This is called Virtual Private Cloud (VPC) and, when configured properly, it is very powerful. Note that the “P” is for private: no server deployed to this subnet is internet accessible unless you choose it to be. This allows you to extend your internal network into the cloud and gain all the provisioning, redundancy and scalability benefits without direct exposure to the internet. As an example, I built a hosted SharePoint extranet where we used SQL log shipping of the extranet content databases back to a DMZ network for redundancy. Try doing that on Office 365!

This sort of functionality shows that Amazon is a mature, highly scalable and flexible IaaS offering. They have been in the business for a long time and it shows: their full suite of offerings is far more expansive than I can possibly cover here. Accordingly, my Amazon experiences will be the subject of a more in-depth blog post or two in future. But for now I will force myself to stop so the non-technical readers don’t get too bored. 🙂

So what went wrong?

So after telling you how impressive Amazon’s offering is, what could possibly go wrong? Like the Office365 issue covered in part 2, absolutely nothing with the technology. To understand why, I need to explain Amazon’s pricing model.

Amazon offer a couple of ways to pay for servers (called instances in Amazon speak). An on-demand instance is charged at a per-hour price while the server is running. The more powerful the server is in terms of CPU, memory and disk, the more you pay. To give you an idea, Amazon’s pricing for a Windows box with 8 CPUs and 16GB of RAM, running in Amazon’s “US East” region, will set you back $0.96 per hour (as of 27/12/11). If you do the basic maths, that equates to around $8,409 per year, or $25,228 over three years. (Yeah, I agree that’s high – even when you consider that you get all the trappings of a highly scalable and fault tolerant datacentre.)

On the other hand, a reserved instance involves making a one-time payment and, in return, receiving a significant discount on the hourly charge for that instance. Essentially, if you are going to run an Amazon server on a 24×7 basis for more than 18 months or so, a reserved instance makes sense, as it saves considerable cost over the long term. The same server would only cost you $0.40 per hour if you pay an up-front $2,800 for a 3 year term. Total cost: $13,312 over three years – much better.
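The back-of-envelope arithmetic behind those figures is simple enough to check yourself; here it is as a small worked example using the prices quoted above.

HOURS_PER_YEAR = 24 * 365    # 8,760 hours

# On-demand: $0.96/hour for the Windows box described above (Dec 2011 pricing)
on_demand_3yr = 0.96 * HOURS_PER_YEAR * 3        # 25,228.80 – roughly $25,228

# Reserved: $2,800 up front for a 3 year term, then $0.40/hour while running
reserved_3yr = 2800 + 0.40 * HOURS_PER_YEAR * 3  # 13,312.00 – roughly $13,312

print(f"On-demand over 3 years: ${on_demand_3yr:.2f}")
print(f"Reserved over 3 years:  ${reserved_3yr:.2f}")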

So with that scene set, consider this scenario: back at the start of 2011, a client of mine consolidated all of their SharePoint cloud services to Amazon from a variety of other hosting providers. They did this for a number of reasons, but it basically boiled down to the fact that they had 1) outgrown the SaaS model and 2) a growing number of clients. As a result, requirements from clients were getting more complicated and going beyond what most hosting providers could cater for. They had also received irregular and inconsistent support from their existing providers, as well as some unexpected downtime that reduced confidence. In short, they needed to consolidate their cloud offering and manage their own servers. They were developing custom SharePoint solutions, needed to support federated claims authentication and required disaster recovery assurance to mitigate the risk of going 100% cloud. Amazon’s VPC offering in particular seemed ideal, because it allowed full control of the servers in a secure way.

Now, making this change was not something we undertook lightly. We spent considerable time researching Amazon’s offerings, trying to understand all the acronyms as well as the fine print. (For what it’s worth, I used IBIS as the basis to develop an assessment, and the map of my notes can be found here.) As you are about to see though, we did not check well enough.

Back when we initially evaluated the VPC offering, it was only available in a small number of Amazon sites (two locations, both in the USA) and the service was still in beta. This caused us a bit of a dilemma at the time because of the risk of relying on a beta service, but we were reassured when Amazon confirmed that VPC would eventually be available in all of their datacentres. We also stress tested the service for a few weeks, it remained stable, and we developed and tested a disaster recovery strategy involving SQL log shipping and a standby farm. Since these servers were going to be there for the long haul, we purchased reserved instances from Amazon and pre-paid to reduce the hourly rates. Quite a complex configuration was provisioned in only two days and we were amazed by how easy it all was.

Things hummed along in this fashion for 9 months and the world was a happy place. We were delighted when Amazon notified us that VPC had come out of beta and was now available in any of Amazon’s datacentres around the world. We had only used the US datacentre because it was the only location available at the time, and now we wanted to transfer the services to Singapore. My client contacted Amazon about some finer points of such a move and was informed that they would have to pay for their reserved instances all over again!

What the?

It turns out that reserved instances are not transferable! Essentially, Amazon were telling us that although we had paid for a three year reserved instance and only used it for 9 months, moving the servers to a new region would mean paying all over again for another 3 year reserve. According to Amazon’s documentation, each reserved instance is associated with a specific region, which is fixed for the lifetime of the reserved instance and cannot be changed.

“Okay,” we answered, “we can understand that in circumstances where people move to another cloud provider. But in our case we are not.” We had used around a third of the reserved instance term, so surely Amazon should pro-rata the unused amount and offer it as a credit when we re-purchase reserved instances in Singapore? After all, we would still be hosting with Amazon, so overall they would not be losing any revenue at all. On the contrary, we would be paying them more, because we would have to sign up for an additional 3 years of reserve when we move the services.
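A back-of-envelope version of that pro-rata argument, using the example reserved instance figures quoted earlier and the nine months of use described above (indicative only, not the client’s actual bill):

reserved_upfront = 2800   # up-front payment for a 3 year (36 month) reserved term
term_months = 36
months_used = 9           # how long the instances ran before the planned move

unused_fraction = (term_months - months_used) / term_months
suggested_credit = reserved_upfront * unused_fraction
print(f"Unused portion of the up-front payment: ${suggested_credit:.2f}")   # 2100.00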

So we asked Amazon whether that could be done. “Nope,” came back the answer from Amazon’s not so friendly billing team, complete with one of those trite and grossly insulting “Sorry for any inconvenience this causes” closing sentences. After more discussions, it seems that internally within Amazon, each region (or each datacentre within a region) is its own profit centre. Therefore, in typical silo fashion, the US datacentre does not want to pay money to the Singapore operation, as that would mean the revenue we paid would no longer be recognised against them.

Result? The customer gets screwed, all because the Amazon fiefdoms don’t like sharing the contents of the till. But hey – the regional managers get their bonuses, right? 🙁

Conclusion

Like part 2 of this cloud computing series, this is not a technical issue. Amazon’s cloud service has, in our experience, been reliable and performed well. In this case, we are turned off by the fact that their internal accounting procedures create a situation that is not great for customers who wish to remain loyal to them. In a post about the danger of short-termism and ignoring legacy, I gave the example of how dumb it is for organisations to think they are measuring success by how long it takes to close a helpdesk call. When such a KPI is used, those in support roles have little choice but to artificially close calls even when users’ problems have not been solved, because that is how they are deemed to be performing well. Rather than measuring happy customers, this KPI simply rewards the helpdesk operators who have managed to game the system by getting callers off the phone as soon as they can.

I feel that Amazon are treating this as an internal accounting issue, irrespective of client outcomes. Amazon will lose my client’s business because of it: they have enough servers hosted that the financial impost of paying all over again is greater than the cost of transferring to a different cloud provider. While VPC and automated provisioning of virtual servers are cool and all, at the end of the day many hosting providers can offer this if you ask them. It might not be as slick and fancy as Amazon’s automated configuration, but it is nonetheless very doable, and the other providers are playing catch-up. Like Apple, Amazon are enjoying the benefits of being first to market with their service, but as competition heats up, others will rapidly bridge the gap.

Thanks for reading

Paul Culmsee


