The Cost of HPC in the Cloud

To continue where we left off with the last blog, this time we are trying to understand cost. When we start considering cloud, the primary driver seems to be economic (cost). We therefore need to address any cost-related barriers to the adoption of cloud, and ensure that our expectations are honest and appropriate.

Given that we are talking about HPC, this implies that compute is important to the business in some significant way. So whether your business is in media, EDA, oil & gas, biosciences, pharmaceuticals, financial analysis, or some other computationally intensive field, figuring out how to provide HPC services more efficiently will have an impact on the delivery of the core business product and the bottom line. Because cloud is a combination of people, process, and technology, we should hold off talking about reducing the cost of hardware until later in the blog, and focus on having the appropriate amount of hardware and increasing the efficiency of usage, provisioning, management, etc. We should also focus on linking into the core business. Using EDA as an example, the cost of EDA licenses is a higher order driver that is facilitated by the infrastructure. And finally, what are the consumers of the infrastructure sacrificing because they don’t have enough capacity or resources, or lack access to a specific technology or capability?

In the big picture, cloud computing is outsourcing significant portions of what were once IT functions and resources. Some companies have been able to outsource very successfully, and have been happy with their decision and relationship. Others, however, have not had such good experiences, and we want to address those negative legacy perceptions. For people who survived a bad outsource experience, those bad memories become hurdles that we as an industry must navigate if we intend cloud computing to be successful. Those perceptions come in the form of additional complexity, higher costs, and a disconnect in responsibility.

When we talk about additional complexity, we are commonly referring to complexity inserted by contract structures. One of the smartest people I know once said that managing anything through contracts is no way to run a business. It needs to be about developing relationships, and if we resort to debating points in the contract, the relationship has already broken down, and we should look to mutually vacate that relationship. That is not to say that a contract serves no purpose. It is a statement of intent, and a testimony to the seriousness of the risk to which one or both of the parties is exposing business, reputation, and livelihood. Contracts are like tactical nukes: it is very important to have them as a mutual deterrent for bad behavior, but using them means we have all lost. Many times in outsource scenarios, the provider wants to charge for any effort not specifically defined in the contract. Technology is a fast paced, evolving world. Things are going to change. When they do, focus turns to the contract, or the “rules” and policy, and that gets in the way of what it should be about: delivering value and results to the end user.

When we talk about higher costs, we are really talking about additional cost, reduced control, and little or no benefit. In an outsource model, many times the resources are the same or similar to what was in-house originally, only there is less direct control over those resources, and we now have to pay for an additional layer of management from the outsource provider. One of the benefits of using an outsource model is that the provider is part of a larger ecosystem, and can deliver a larger variety of resources or different skillsets than the consumer company could normally provide on its own. If the provider does not deliver this, then are we really gaining anything in exchange for what we gave up? The outsource supplier needs to BE a larger ecosystem. And the filter of “could this be done better, cheaper, or faster in-house” always needs to be passed. We will discuss this point more in the blog on “changes to organizational structure”. Being part of a larger ecosystem allows benefits that would be difficult for an isolated entity to achieve:

  • Consuming top tier processors for all workloads, trading them for the next generation processor when it becomes available
  • Access to domain experts at fractional cost in non-dedicated fashion
  • Access to very large capacity, paying based on use
  • Ability to share cost burden of special resources across multiple customers at different times
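The “paying based on use” point can be made concrete with a little arithmetic. What follows is only a rough sketch under stated assumptions: the function name, the flat hourly rate, and the idea of a single fixed annual cost for owned capacity are all hypothetical simplifications, not figures or a pricing model from any provider. It computes the utilization level at which owning capacity and paying per use cost the same.

```python
HOURS_PER_YEAR = 8760  # 365 days * 24 hours


def break_even_utilization(owned_annual_cost: float,
                           on_demand_rate_per_hour: float,
                           hours_per_year: int = HOURS_PER_YEAR) -> float:
    """Fraction of the year a resource must be busy before owning it
    costs the same as paying per use.

    Both inputs are hypothetical: a single fixed annual cost for the
    owned resource, and a flat hourly rate for the pay-per-use one.
    """
    if owned_annual_cost < 0 or on_demand_rate_per_hour <= 0 or hours_per_year <= 0:
        raise ValueError("costs must be non-negative and rates positive")
    # Cost of running the pay-per-use resource for the entire year.
    full_year_on_demand = on_demand_rate_per_hour * hours_per_year
    return owned_annual_cost / full_year_on_demand


# Illustrative numbers only: an owned node costing 8760 per year
# versus a pay-per-use node at 2 per hour.
utilization = break_even_utilization(8760.0, 2.0)
print(f"Break-even utilization: {utilization:.0%}")
```

Below the break-even utilization, paying based on use is the cheaper option in this simplified model; above it, owning is. Workloads with pronounced peaks and valleys tend to sit on the low side for their peak capacity.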

The final element is the disconnect in responsibility, where the outsourcer is focused on the business of outsourcing, or the responsibilities of IT, and loses sight of the core business of the customer. Responsibility to the business is lost in translation. For the customer to feel comfortable relinquishing control of their infrastructure, they need to know that the service provider feels responsible for the success of the customer’s business and knows the intimate details of how the business works well enough to properly determine how technology could help, and then proactively drive technology to the benefit of the customer. This is an important point. It is not sufficient for the provider to know technology and IT really well while assuming applications and implementations are less relevant. If the customer needs to know the business AND drive the service provider for what the business needs from technology, then the customer will continue to own IT in-house, because that is cheaper, easier, and faster for the customer. In this new world order (cloud), the service provider needs to be a trusted domain expert advisor to the customer about the customer’s business, which means not just IT. Service providers should not bring in additional distractions, and need to have responsibility as part of the equation:

  • Maintenance is a series of constant upgrades. Design for change.
  • Infrastructure cost should be established as a run rate, not a series of one time buys
  • The Infrastructure needs to have an equal seat at the company table (see future blog on control)
  • A proper implementation of cloud computing implies changes to organizational structure for control purposes (see future blog on control)

Once we have addressed historical barriers to adoption based on cost, we also need to make sure we appropriately represent the cost benefits that cloud computing promises. Cost savings associated with cloud computing may not be what you think. Cloud is an opportunity to change people, process, and technology. If you are not open to changes on all three fronts, we will not be able to achieve the best value proposition. Just changing technology will produce the same results we have seen in the past (a large source of frustration with IT). We should not expect to get what we use today for less money, but the process of getting and using the resource can be made more efficient, and therefore the overall solution would be less expensive.

There is a long term opportunity for cost reduction, but it entails some re-education. The hardware industry has been delivering to the Moore’s Law equation (“2X more every 2 years for the same price” or “1/2 the price every 2 years for the same thing”) for the last 30 years, so customers expect solutions related to hardware to fall on that curve as well. As software becomes a larger part of the solution, we have some correcting to do to match the expectations of the customers (and probably not in the direction to make the software industry happy).
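To see how steep that expectation curve is, here is a minimal sketch of the “2X every 2 years” rule of thumb quoted above. The function names are my own, and the clean two-year doubling period is the simplification from the quote, not a measured industry figure.

```python
def relative_capability(years: float, doubling_period: float = 2.0) -> float:
    """How much more you get for the same price after `years`,
    assuming capability doubles every `doubling_period` years."""
    return 2.0 ** (years / doubling_period)


def relative_price(years: float, doubling_period: float = 2.0) -> float:
    """Fraction of the original price for the same thing after `years`,
    under the same doubling assumption."""
    return 1.0 / relative_capability(years, doubling_period)


# Print the curve at a few points, up to the 30 years mentioned above.
for years in (2, 10, 30):
    print(f"{years:>2} years: {relative_capability(years):,.0f}x the capability, "
          f"or {relative_price(years):.6f} of the price")
```

Thirty years of doubling every two years is fifteen doublings, or a factor of 32,768. That is the curve customers have been trained to expect, and it is the backdrop against which any component of the solution that does not follow it will be judged.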

Given that we are talking about HPC, we are working with a growth oriented workload, so our goal would be to get more value for the same money as the infrastructure grows and evolves out of necessity. In addition, there are lots of things that we are not getting to today that we would like to do, or should do. Having access to more capacity and different resources would go a long way toward meeting that need. Leveraging cloud can eliminate operational activities, making time for design activities. And if we standardize those designs, we improve our negotiating position. The focus should be on differentiating the business and addressing customer issues and customer-of-customer issues. More efficient execution in any portion of the process frees up resources to do more in other areas.

Delivery on cloud computing will generate another long term cost benefit. Cost reductions will not be immediate though. Costs will come down over time as the consumption process matures and the market evolves. There will also be a “smushing” effect, much like what happened with the hardware industry, where components at one time were allowed to be priced individually, but then they became component parts of a single system, and the component vendors had to compete for their piece of the pie that was the total system cost. Cloud consumption will also drive commoditization of component resources in a similar way. Over time, cloud will have the effect of reducing the cost of solutions, but not in a simplistic equation. With cloud, we will see costs abstracted to a requirements level, with consumers agreeing to pay for an expected result, and service providers absorbing the responsibility of delivering that expected result, or paying a pre-negotiated indemnity. The service provider will then be in the position to provide that service with any combination of technologies they choose, and will be held accountable for meeting the agreed-to performance metrics (what, not how). Things like the OS, virtualization technologies, monitoring, and provisioning will become part of the end solution. Customers will be buying services, not products, so name brands start to mean less. Features and capabilities are up to the cloud supplier to provide however they like, as long as the service level (requirement) is met. Customers will want a service and a price appropriate to the service level. This will take a little time because it is essentially building competitors to what exists today, and existing vendors will also begin competing at these new price points.

We will eventually achieve the cost benefit that is envisioned as the advantage of cloud computing. But we will need to make that a reality by creating the market through demand, and then holding the suppliers accountable for delivering the solutions necessary at prices that are appropriate.

Original blog post found at HPC in the Cloud.

HPC, the Cloud, and Core Competency

What does HPC have to do with cloud computing? Well, given that HPC environments are constantly growing, consume large quantities of fairly generic compute resources, and have both peaks and valleys in workload profiles, it would seem that HPC would be the perfect candidate for cloud computing, if only we could get past the barriers to adoption.

What I would like to do is present a series of blogs, intended to be a philosophical framing, not a technical roadmap, that will show why HPC is the perfect consumer of cloud computing. These blogs will be broken up into distinct topics in an attempt to create a logical progression aimed at having a common frame of reference. The initial set of blogs will address the barriers to adoption as follows:

  1. Ego – IT as a core competency
  2. Cost – getting more value for the same money
  3. Trust – a historical lesson
  4. Control – changes to organizational structure
  5. Security – perspectives on internal security
  6. Performance – realities of simultaneous optimization theory

Once we frame the barriers, we can then discuss incremental steps to get to value:

  1. Cloud enablement – transforming your environment, internal private cloud
  2. Private Clouds – external private clouds
  3. Hybrid Configurations – leveraging public clouds for appropriate workloads
  4. Public clouds – where and when they may make sense

This is the intended general direction, but I reserve the right to deviate based on input from the forum, any needed clarification, or recalibration necessary to stay true to the intent of the site.

Having said that, let’s move into the first topic of discussion, IT as a core competency.

Companies need IT to be executed competently, and to control IT direction, but IT is not the primary product of the company (IT companies aside), and therefore should not be considered a core competency. We can debate tying into the primary function (core business) of the company as a criterion to determine core competency, but I believe it goes to the investment decision process of the company leadership. The primary drivers for the business revolve around delivering product to customers, development of new markets, and customer relationship management. When given the option of where to invest critical resources and assets in the business, executive management will be driven to primarily invest in the direction of the core business, and minimize expenses around all other aspects of the running of the business. Core competency would imply sufficient investment to differentiate the business from the rest of the world.

Further reinforcement of these concepts can be seen by looking at where IT is accounted for within the business. Quite commonly, IT is accounted for as an SG&A function. This places it into the “overhead” bucket and it gets to compete with facilities, HR, accounting, purchasing, and all other groups that make up the SG&A bucket for the company in order to get resources. I only say this to frame the mindset of financial decisions. Given that companies are measured by how well they control expenses in SG&A (SG&A as a function of revenue), and that many of the components of the SG&A bucket are fixed or based on headcount, you then start to see that budgets for IT are scrutinized with a control oriented mindset, optimized on the cost variable. The R&D side is usually the “spend money to make money” side of the house, whereas SG&A is driven to control or even cut costs. Having said that, I have yet to meet anyone who can flip between these mindsets.

In order to control costs as much as possible and to get as much value out of what is spent in the area of IT, most companies limit change and hire resources with a breadth of skills rather than a depth of skills in a specific area. They will attempt to limit change in order to get maximum value out of existing assets, maximize the ability to automate, and minimize the quantity of personnel required to manage. Limiting change like this, though, defeats the ability of technology to ultimately deliver maximized value. Also, by limiting change, the organization is really promoting a philosophy of maintenance instead of development, and in doing that, many times symptoms will be addressed (just patch it up) instead of the root cause.

Additionally, by hiring generalists, the business accomplishes many things: it gains the ability to solve any problem in the environment while minimizing overhead staff, and it gains fault tolerance in personnel resources (people can take vacations, get sick, or leave for another position). The downside is that many times these generalist resources are attempting to manage the infrastructure, but lack the experience with newly introduced technologies to manage them properly (they have not had the opportunity to gain experience). Solutions that they develop or integrate are more prone to configuration or design mistakes (doing it for the first time), are many times less efficient than what is possible (not optimized), and are not designed to scale into the future with technologies that are not yet available to solve problems that have yet to surface. And finally, the complexity of the environment is growing faster than the capacity of the organization.

This is not to say that internal IT organizations are not excellent, that the personnel are not very talented, or that these organizations don’t bring great value to the companies they work for. The only point is that there is more value that could be achieved, and that the company does not (and should not) invest in this function like they do the core product(s) of the company. How many times have we all sat in on meetings listening to vendors explain to us what the “perfect solution” is, knowing that they are right because we thought of it a long time ago, but just have not had the time, funding, resources, and priority to go execute it “perfectly”? Cloud computing has the promise to grant us access to that optimized, “perfect” solution, and next time, we will talk about getting that solution for the same price we are paying for IT today…

Original blog post found at HPC in the Cloud.