Bringing Jobs to the Cloud…

I never met Steve Jobs in person, but upon hearing of his passing I sensed a loss that I could not immediately explain. It took a bit of reflection to understand the feeling, and then more to put it to words.

What I will miss most is not tied to the wonderful gadgets he willed into existence, though I am grateful for every one of them. Nor is it his irreplaceable artistry, although he will no doubt be written into history with the likes of William Shakespeare, Michelangelo, Michael Jordan and the Beatles. Rather, it is his presence, both as an inspiration and a source of guidance. Steve embodied common sense in a world overrun with personal agendas and bureaucracy.

Based on the legend of Steve originating from the Valley, I crafted an image of him in my mind. I used that image for inspiration and motivation, giving me hope about the future of the world. Having since read the many recollections of personal experiences from people who knew him well, it seems that the image I created was not far from reality. The recollections also confirmed the loss I had sensed – that one of the guiding stars in my sky had been extinguished.

The Steve I imagined had a rare quality that allowed him to be honest with himself and true to his inner being. This type of honesty is the hardest to maintain – coming from a deep understanding of personal values combined with a quest for motivation and fluency with human nature. He also possessed an inner compass that indicated which decisions were correct and recognized that the road less traveled was a harder path. He answered the harder path’s challenge with courage, resolve and intellect – simply stepping past fear in the face of uncertainty and risk, thereby redefining impossible as possible. No matter how difficult it was to follow his inner compass, he always did what he thought should be done.

The Steve I imagined also viewed respect as a term encompassing pride, honor, integrity, and dignity. He took great pride in everything he touched and believed he should be credited for his accomplishments. Above all, he executed and delivered without fail, and did not rest until everything was the way he had imagined it.

He recognized immediately that a downside to achieving his personal best was the risk of offending those around him who couldn’t keep up or were threatened by a change to the status quo. However, he viewed the cost of being polite as an unnecessary pacifier for a lesser world – only delaying the inevitable until someone displayed the courage to offend those who thought themselves the moderators of mankind. Thus, he surrounded himself with other ‘A-type’ personalities and held them accountable for reaching their potential. When their actions aligned with purpose, they generated passion. With passion, they developed focus, and when they were focused, they had power. Powerful action created great accomplishments and reshaped our world. Simply, people followed Steve because he was ahead of us all, not because he was in need of an entourage. He was headed someplace other people wanted to go, and he seemed to know how to get there. He didn’t tell us how to live. If we liked what he did, then good for us; if we didn’t, we were all free to find our own path.

What I will miss about Steve is more than the art that only he could produce. I will miss knowing that he is out there fighting for what he thinks is right. He was very much a parent figure to the technology industry, taking on the responsibility to challenge bad governance, see past short-term thinking and “Think Different”. When so much seemed to be going wrong, Steve was a beacon of rightness. I will miss the assurance of knowing he was focused on causes that needed to be championed, quietly taking on challenges without concern for personal sacrifice – all according to his personal compass. What I will miss most is the opportunity to personally tell him “Thank you for all you did.” I recognize that we had a guardian angel, and that one may be needed now more than ever.

Cloud computing is a completely new direction for businesses, from both a consumer side and a supplier side, and we must “Think Different” about everything we know. In the cloud, there are bureaucracies to be circumnavigated, impossibilities to be disproved, and a lesser world to be avoided. While Steve armed us with his philosophy and opened our minds to new possibilities, the responsibility now rests on each of us to step up and reach for our potential and, in our own way, dent the universe.

Original blog post found at HPC in the Cloud.

New Season, Same Umbrella

One of the largest perceived barriers to adoption of cloud computing is the concept of security. Based on countless discussions with companies interested in adopting a cloud model, it is clear that many want to achieve the economic promise of cloud but are struggling to figure out how to use a multi-tenant, virtual environment in a way they are comfortable with, given the security concerns of their respective companies.

From an enterprise perspective, most companies are slow to adopt change because of the amount of established process and policy built around existing solutions (change implies cost). One of the barriers getting in the way is simply how different cloud is from what most companies have today: different means companies are less confident securing the new solution, and it also means additional cost to make it work. And while we will all agree that Google and Amazon are clouds, that does not imply that cloud is Google and Amazon.

What I mean by this is that there are many definitions of what cloud is, and while the Google and Amazon offerings are both very strong representations of a cloud solution, that does not limit the definition of cloud to what Google and Amazon offer (and their offerings grow broader every day). What each consumer needs to figure out is what solution they need, what parameters they are comfortable with (this is where security sits), and what the price needs to be for the solution to be interesting.

We have had several conversations with infrastructure service providers who are more than happy to make additional infrastructure available to companies as an extension of the customer’s existing infrastructure, and to charge the customer only for the time the systems are configured on the customer’s network. They turn each system over to the customer, un-configured, and place it in a private VLAN; the customer loads their OS, loads their configuration, and integrates the system into their cluster as they see fit. Additionally, there are software packages out there (look for “hybrid cloud” keywords) that will help acquire, configure, and burst into these extra resources. Because these are complete systems and not virtual machines, customers feel more comfortable that this model is not a change from what they are doing today.
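
To make the bursting idea concrete, here is a minimal sketch of the decision logic only, written in Python. It assumes hypothetical inputs (pending job count, per-node job capacity, idle time of rented nodes) that a real deployment would pull from its own scheduler; none of the names or thresholds refer to any particular provider’s API.

    from dataclasses import dataclass

    QUEUE_DEPTH_THRESHOLD = 500   # pending jobs before bursting is considered
    MAX_BURST_NODES = 64          # cap on rented bare-metal systems
    IDLE_RELEASE_MINUTES = 30     # return idle nodes, since billing is time-based

    @dataclass
    class BurstNode:
        hostname: str
        idle_minutes: int

    def nodes_to_request(pending_jobs: int, jobs_per_node: int, in_use: int) -> int:
        """How many extra bare-metal systems to ask the provider for."""
        if pending_jobs <= QUEUE_DEPTH_THRESHOLD:
            return 0
        wanted = -(-pending_jobs // jobs_per_node)   # ceiling division
        return max(0, min(wanted, MAX_BURST_NODES - in_use))

    def nodes_to_release(burst_nodes):
        """Hostnames of rented systems that have sat idle long enough to hand back."""
        return [n.hostname for n in burst_nodes if n.idle_minutes >= IDLE_RELEASE_MINUTES]

    # Example: 1,200 pending jobs, 16 jobs per node, 4 burst nodes already rented.
    print(nodes_to_request(1200, 16, 4))                                            # -> 60
    print(nodes_to_release([BurstNode("burst-01", 45), BurstNode("burst-02", 5)]))  # -> ['burst-01']

Everything around the decision (handing over the system, loading the OS and configuration, joining the cluster) stays exactly as the provider and the customer’s existing tooling do it today, which is the point of this model.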

That would be one approach that implies very little change on the consumer side, thereby minimizing cost and additional security exposure. If there were still concerns about cloud resources, an additional step would be to classify data by security level (a very typical security practice that may already be implemented) and leverage cloud resources only for workloads that use public datasets: in other words, identify the cloud-eligible workloads.
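
As a toy illustration of that last step, the sketch below splits a list of workloads by the classification of the data they touch. The classification labels and workload names are invented for the example; any existing classification scheme would slot in the same way.

    from dataclasses import dataclass

    # Only workloads touching public data are allowed to burst in this example.
    CLOUD_ELIGIBLE_CLASSES = {"public"}

    @dataclass
    class Workload:
        name: str
        data_classification: str

    def split_by_eligibility(workloads):
        """Separate workloads that may run on cloud resources from those that stay internal."""
        eligible = [w for w in workloads if w.data_classification in CLOUD_ELIGIBLE_CLASSES]
        internal = [w for w in workloads if w.data_classification not in CLOUD_ELIGIBLE_CLASSES]
        return eligible, internal

    jobs = [
        Workload("regression-on-public-benchmarks", "public"),
        Workload("tapeout-signoff", "confidential"),
    ]
    eligible, internal = split_by_eligibility(jobs)
    print([w.name for w in eligible])   # -> ['regression-on-public-benchmarks']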

Cloud is an opportunity. Not only do companies get to realize economic benefit over time, but they also get to take advantage of emerging standards and innovations in the field of security that are evolving because of cloud. As we spend cycles adapting to cloud and retooling legacy applications into cloud-consumable footprints, those applications become eligible for the new security capabilities being designed and built for cloud. As standards are developed, certifications will become available, and then measurement and auditing will become available at the solution layer instead of at the specific implementation layer. This will help drive the cost of security lower across the industry and, even better, allow for much more security at the same cost as today.

In summary, find a solution that minimizes change. Cloud is an opportunity to improve economic position and flexibility and, over time, to improve performance and security. The more similar we can make cloud infrastructures to the enterprise infrastructures we have today, the more comfortable customers will be using cloud from a security perspective. And by minimizing change, we minimize the cost of transitioning to cloud, making it a viable solution for more customers sooner.

Original blog post found at HPC in the Cloud.

One Step Closer to Clouds for EDA Industry

Having just attended the Synopsys User Group (SNUG) conference in San Jose, I came away with a nice little surprise for those of us interested in cloud computing related to EDA. In his keynote opening on Monday morning, Synopsys CEO Aart de Geus announced that Synopsys has a cloud offering available for customers to use – TODAY.

While most of us were aware that Synopsys has been looking at cloud, I am not sure that anyone was expecting his statement “We’re open for business”.

The specifics of the offering are still sparse, but the announcement by de Geus and subsequent presentations by David Hsu (see his blog on this here) indicate that the offering is for burst capacity related to VCS workloads.

While this is a very narrowly targeted offering to begin with, it seems like an appropriate target. VCS jobs include verification regressions, which can make up 30%-40% of the overall workload executed on an EDA compute cluster. The data sets involved (input and output) are finite and reasonably sized, so this is a prime candidate for cloud use.
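
A quick back-of-envelope check makes the “finite and reasonable data sets” point concrete: a job is a good cloud candidate when moving its inputs and outputs costs only a small fraction of the compute time it offloads. The numbers and threshold below are invented purely for illustration.

    def transfer_hours(dataset_gb: float, wan_mbps: float) -> float:
        """Hours needed to move a dataset of the given size over the given WAN link."""
        return (dataset_gb * 8 * 1024) / wan_mbps / 3600

    def is_cloud_candidate(input_gb, output_gb, compute_hours, wan_mbps, max_overhead=0.10):
        """True when data movement is under max_overhead of the regression's compute time."""
        overhead = transfer_hours(input_gb + output_gb, wan_mbps) / compute_hours
        return overhead <= max_overhead

    # Example: 20 GB in, 5 GB out, 200 hours of regression compute, 1 Gbps WAN.
    print(is_cloud_candidate(20, 5, 200, 1000))   # -> True (transfer is well under 1% of compute)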

This has the potential to significantly offload the internal cluster resources, giving companies time to figure out how cloud works and what their strategy is for leveraging cloud beyond this. In addition, it gives them some breathing room on their ever-expanding datacenter crunch, which is causing a great deal of pain today.

In his presentation, Hsu detailed the process Synopsys went through to test and evaluate cloud, and the work they had done to make the extension into the cloud (using Amazon presently) as seamless to the engineers as possible. Also during the presentation, Synopsys invited Qualcomm’s Mike Broxterman to discuss his evaluation of the capability and the results the Qualcomm team saw using the solution the way a customer would. Both were well presented and seemed to be well received.

The solution looks to be well researched by both Synopsys and the customers who participated in the POC phase, and this is probably the first of many steps to move EDA into the Cloud.

Blog originally posted at HPC in the Cloud.

Key Trends on the Horizon for HPC Clouds in 2011

As we begin a new year, it’s time to take a look at the current environment and try to determine the shape of what’s to come. So to kick off 2011, let’s begin with some predictions of things to come in 2011 for HPC and the computing industry in general.

One-year predictions, however, must be about things that are already underway, so these may not be “predictions” to everyone, but they can, at the very least, serve as commentary on the shape of things to come.

While it would take a novel-length manifesto to go into depth about the following points, we can at least identify a host of key trends to watch as 2011 unfolds.

Infrastructure Management Evolution

  • Infrastructures are reaching a capacity and complexity point where manual intervention no longer makes sense. Much like we don’t write machine code to execute programs, we will generate policy, and policy will execute low-level constructs at the system level to implement changes.
  • Events that were previously not policy criteria will begin to be integrated into management schema. Storage performance tiers (IOPS, throughput, capacity), compute performance tiers (overclocked, latest generation, core width, GPGPU), network performance tiers (IB, 10GbE, 1GbE), and contiguous memory footprint are examples of criteria integrated into policy decision trees (a toy example of such a policy appears after this list).
  • Virtualization performance inefficiencies are partially addressed, allowing HPC environments to consider use of virtualization.
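
As a toy illustration of the policy-driven direction described in the list above, the sketch below maps a workload’s stated requirements to compute, storage, and network tiers; a real policy engine would then hand these decisions to lower-level provisioning tools. The tier names, thresholds, and requirement keys are all hypothetical.

    def select_compute_tier(req: dict) -> str:
        if req.get("needs_gpu"):
            return "gpgpu"
        if req.get("single_thread_bound"):
            return "overclocked"
        return "latest-generation" if req.get("priority") == "high" else "standard"

    def select_storage_tier(req: dict) -> str:
        if req.get("iops", 0) > 50_000:
            return "ssd"
        return "high-throughput" if req.get("throughput_mbps", 0) > 1_000 else "capacity"

    def select_network_tier(req: dict) -> str:
        if req.get("mpi"):
            return "infiniband"
        return "10GbE" if req.get("priority") == "high" else "1GbE"

    def apply_policy(req: dict) -> dict:
        """Turn a workload requirement into a placement decision a provisioner can act on."""
        return {
            "compute": select_compute_tier(req),
            "storage": select_storage_tier(req),
            "network": select_network_tier(req),
        }

    print(apply_policy({"mpi": True, "iops": 80_000, "priority": "high"}))
    # -> {'compute': 'latest-generation', 'storage': 'ssd', 'network': 'infiniband'}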

Network Evolution

  • WAN bandwidth approaches commodity pricing
  • Usage models evolve to thin clients with data hosted from centralized, consolidated datacenters. While this may not seem to apply directly to the HPC market (most HPC shops have been doing thin client for years), the efforts and research in this space may enable very different futures if they prove sufficiently successful.
  • Significant efforts are invested in data-centric cached distribution models.
  • Thin client software will evolve to present a more real-time experience, narrowing the gap between remote execution and local execution performance.
  • Device-aware content delivery progresses. There will be work invested in sensing the configuration of the client device to determine the quality of the experience.

Compute Evolution

  • Overclocking goes mainstream. This year we will see a couple different flavors of overclocked solutions emerge to allow a premium performance option on the compute front.
  • Larger variety of performance tiers. There will need to be accommodations on the provisioning side to allow for computational performance specification (overclocked, bin1, bin2, GPGPU) and prioritization.
  • DRAM capacities catch up (finally), but still lag on the performance side.

Storage Evolution

  • SSD technologies integrate into enterprise storage solutions. This will add a performance component that has been sorely missing in spindle-based solutions.
  • File systems start to look at the performance characteristics and capacities of components as storage decision criteria. Additional work will be invested in historical tracking of access patterns in order to fully flesh out this capability.
  • Storage solutions start taking part in policy-based solutions. Policies will enable real-time creation of cache copies of oversubscribed data sets, will constrain workload use of saturated file server resources, and will migrate data to higher-capacity, lower-performing storage at a file, directory, or volume level (see the sketch below).
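
Here is a toy sketch of that last bullet: given per-fileserver and per-dataset metrics, the policy emits a list of actions for the storage layer to execute. The metric names, thresholds, and action labels are all invented for illustration.

    def storage_actions(fileservers, datasets):
        """Return (action, target) decisions for a policy engine to carry out."""
        actions = []
        for fs in fileservers:
            # Constrain new workload placement against saturated file servers.
            if fs["utilization"] > 0.90:
                actions.append(("throttle-placement", fs["name"]))
        for ds in datasets:
            # Create read-only cache copies of oversubscribed data sets.
            if ds["concurrent_readers"] > 200:
                actions.append(("create-cache-copy", ds["path"]))
            # Migrate cold data to higher-capacity, lower-performing storage.
            if ds["days_since_access"] > 90:
                actions.append(("migrate-to-capacity-tier", ds["path"]))
        return actions

    print(storage_actions(
        fileservers=[{"name": "nfs01", "utilization": 0.95}],
        datasets=[
            {"path": "/proj/libs/std_cells", "concurrent_readers": 450, "days_since_access": 1},
            {"path": "/proj/old_tapeout", "concurrent_readers": 0, "days_since_access": 400},
        ],
    ))
    # -> [('throttle-placement', 'nfs01'), ('create-cache-copy', '/proj/libs/std_cells'),
    #     ('migrate-to-capacity-tier', '/proj/old_tapeout')]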

Business Evolution

  • IT is recognized as a business enabler. Businesses will reassess how IT is funded and staffed, and where it reports.
  • Continued growth of infrastructure drives reassessment of acquisition and management practices (getting too big, too complex with linear growth).
  • Internal IT organizations will evolve into a management function, looking to outsource significant portions of technology consumption.
  • Purpose-built clouds emerge to address specific business vectors. Over time, these clouds can be consolidated to achieve additional cost benefit, with the guidance of customer businesses.

This rounds out the list of what to watch in 2011 and provides some insight about some of the emerging trends in this rapidly-evolving space. While some of these movements may be well underway, we can expect to see greater maturation of clouds as a whole this year—for high-performance computing and beyond.

Blog originally posted at HPC in the Cloud.

Cloud Control: Outsourcing an HPC Cluster

So, thus far in this series of posts, we have discussed the following issues:

  • IT is not a core competency of the business, so we should look to outsource if we can outsource without jeopardizing the business.
  • We should look to cloud computing to bring costs under control and to deliver cost efficiencies over time, not as an immediate cost reduction activity.
  • In order to outsource IT, we must trust the suppliers and vendors involved, which means developing relationships, not better bludgeoning weapons. And we have already done an extremely similar divestiture in our past, so we have a model to look at that says it can be done successfully.

Now we need to talk about what an organization would need to look like in order to properly manage the outsourcing of your HPC cluster. So what would that look like? Well, we should assume that all technical and operational capabilities necessary to run the infrastructure are included in the outsource: the supplier is expected to provide the entirety of the technical function and carry out all operational duties. That is not to say that the customer is off the hook technically; just the opposite. The customer needs to assemble a small team of technically savvy, business-minded (specific to the core product of your company) individuals to measure and manage the outsource. This team needs to be very strong technically in order to vet and gauge available technologies for potential use, as well as to identify flaws in delivered solutions or in the methodology behind them. The size of the team will depend on the size of the company (and therefore the size of the outsource).

Functionally, the outsource management team is the control point for the outsourcing of your infrastructure. Through this group, you maintain control over your infrastructure, and therefore can have full trust in your outsource partner (because you know exactly what you want, and you know how to measure whether you are getting it). The intent of this team is to stay abreast of the constantly changing needs of the business, understand the continuously evolving capabilities of technology, and combine the two to understand how the company should be leveraging technology to maximize benefit to the business and control costs. With that combined awareness, you can now hold the outsource accountable for delivering an appropriate solution to your company’s needs.

This is not to say that all responsibility falls to the customer outsource team. The supplier will need to have a disciplined focus in the specific space that your company does business, and be innovating their solution to specifically solve the problems of that industry. If they do not, then they will probably not be a cost competitive, viable supplier long term.

You will see many functions that fall under the customer outsource team. And remember, this team needs to remain small in order to avoid paying too much for your solution. There will be a constant loop for the outsource team to:

  1. Quantitatively measure the current solution
  2. Analyze the costs and benefits of the current solution
  3. Assess best practices
  4. Revise the current solution
  5. Loop back to step 1

There will be several technical responsibilities that the outsource team will participate in jointly. The supplier should be doing most of this work for the customer, but how do you know whether the data they are presenting is 100% accurate or appropriate for your solution? When in doubt, the outsource team will generate their own data and share that data with the supplier to derive a more accurate solution. In that, the outsource team will do some amount of, but not every facet of, the following:

  • Technical and cost benchmarking
  • Technical advisory / liaison (IT industry to customer business)
  • Technical architecture – designing the architecture of applications and services that are appropriate for the company’s consumption

There are many responsibilities of the outsource team that fall into the relationship management arena. This team will be the primary point of contact and control between the customer and the supplier, and I can’t say enough how important a positive relationship with the supplier is to the quality of the product you consume and the price you pay for it. The outsource team will be responsible for communicating current and future requirements to the supplier, many of which will take the form of Service Level Agreements (SLAs), which we will talk about in a moment. The outsource team will also be responsible for how technology is being consumed by the customer company, making sure the company is getting the appropriate solution from the supplier at an appropriate price with appropriate constraints, limitations, and boundaries.

Another very important responsibility of the outsource team will be to maintain flexibility, from a quality-of-solution perspective as well as a cost perspective. In this, staying standards-based is very important. It is not an absolute requirement; there may be proprietary solutions that solve a problem much more efficiently or cost effectively. What you need to consider in that case is this: when the vendor thinks they have you locked in and starts raising the price because they believe you can’t get out of their solution, what is your plan for defeating them? So, where possible, use industry standards so that you can move from vendor to vendor without losing time, money, or critical features. Where that is not possible, have a plan for using one vendor’s proprietary solution while remaining able to migrate to another vendor’s solution without impact, in order to maintain your negotiating position.

Finally, there is the new component to infrastructure management: the outsource team will need to learn how to define and measure service level agreements (SLAs). The definition stage has several components. What is the service level expectation (the success and failure criteria)? A single solution will sometimes have many different components. Storage is a good example: is there enough capacity, do we get enough IOPS, and is there enough throughput? These are all different measurements, but all critical to a storage infrastructure for HPC.

Next, how will the service level be measured, and how often? We have all seen improper SLA measurements where IT informs the engineer that the environment had 99.997% availability, while the engineer knows there were several outages that left him or her non-productive for days at a time. So do you measure component-level availability or solution-level availability? How frequently do you poll for availability? Is availability even the right measure? This is all part of the definition.

Then, what happens when a failure criterion is met? This is where a lot of work is happening in the industry. It is not sufficient to refund a month’s colo fees when a power outage cost the company six weeks’ worth of work. There is a cost to failure, and it is usually very specific to the industry: an outage on a cluster at an EDA company has different implications than an outage on a scientific computing cluster at a university. The recourse needs to be negotiated based on impact. Does this sound familiar? Any insurance people reading this? One of the solutions the industry is exploring is having insurance policies behind the supplier. Finally, we need to look at how service levels are re-assessed over time. As the technology evolves, so should the service levels.
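
To illustrate the component-versus-solution measurement gap, here is a small, hypothetical sketch. The downtime figures are invented; the point is that averaging across many healthy components hides the single outage that actually stopped the engineers from working.

    MONTH_HOURS = 30 * 24

    def availability(downtime_hours: float) -> float:
        return 1.0 - downtime_hours / MONTH_HOURS

    # Component view: average availability across mostly-healthy components.
    component_downtime = {"login-node": 0.5, "scheduler": 0.0, "nfs01": 48.0, "compute-rack-3": 0.0}
    component_view = sum(availability(d) for d in component_downtime.values()) / len(component_downtime)

    # Solution view: with the file server down for 48 hours, engineers could not
    # work for 48 hours, no matter how healthy everything else was.
    solution_view = availability(max(component_downtime.values()))

    print(f"component-level availability: {component_view:.3%}")   # -> 98.316%
    print(f"solution-level availability:  {solution_view:.3%}")    # -> 93.333%

Whether this simplification (taking the worst single component as the solution outage) matches your environment is exactly the kind of question the definition stage has to answer.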

The fabless semiconductor industry is fairly mature in its process for outsourcing the fabrication function. They have cost models and laws (Rock’s Law for the cost of a fab over time) that help decision processes, they have a collaborative (the FSA) for arriving at better processes, and they have an established track record showing that this can be accomplished very successfully and with cost benefit. The HPC Cloud industry needs to mature in the same way. That will just take time.

Original blog post found at HPC in the Cloud.