Consolidate static cluster silos into a dynamic, shared HPC data center

Derive greater value from existing high-performance technical computing resources with IBM Platform Cluster Manager – Advanced Edition


When researchers and engineers receive funding for a special project, it often results in the purchase of new servers. However, over time this can lead to static cluster silos spread out across an organization. Fighting “cluster sprawl” can be a major challenge for organizations that have already made a significant investment in their high-performance technical computing infrastructures, but lack the technology to effectively pool them together and manage them.

What if there were a way to better leverage these investments by consolidating them? Imagine being able to dynamically and automatically provision a cluster when you need one, based on sharing plans, quotas and policies, and to do so not in weeks or days, but in minutes.

IBM® Platform Cluster Manager – Advanced Edition allows organizations to consolidate silos of cluster resources into a shared pool, creating a high-performance computing (HPC) cloud, and provides a centralized management portal to simplify administration. Platform Cluster Manager integrates with other IBM Technical Computing and analytics products to shorten the time to deployment and use. These products include IBM Platform™ LSF®, a powerful workload management platform for demanding, distributed technical computing environments, and IBM Platform Symphony, which accelerates dozens of parallel applications for faster results and better utilization of all available resources.

Cluster resources are shared by consolidating servers from multiple HPC silos into a single pool, creating a larger data center overall. Multiple clusters are deployed into this data center, and their size can be determined by request, by entitlement or dynamically by policy. Clusters can grow and shrink over time based on policies driven by metrics and calendars, and through administrator actions. The advantage is that servers are moved from cluster to cluster by automated provisioning and reprovisioning, without the need to physically move machines or change networking. Platform Cluster Manager provides the ability to dynamically assign resources where they are needed, as they are needed.
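To make the idea of policy-driven growing and shrinking concrete, here is a minimal Python sketch. It is not Platform Cluster Manager code or its API; the `Cluster` class, the `rebalance` function, the `utilization` metric and the threshold values are all hypothetical names and numbers chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    # All fields are illustrative assumptions, not product settings.
    name: str
    servers: int        # servers currently assigned to the cluster
    min_servers: int    # floor the policy never shrinks below
    utilization: float  # 0.0 to 1.0, a hypothetical load metric


def rebalance(clusters, free_pool, high=0.8, low=0.3):
    """Apply one policy pass and return the new size of the free pool.

    Under-used clusters give one server back to the shared pool;
    busy clusters then take one server each while the pool lasts.
    """
    for c in clusters:
        if c.utilization < low and c.servers > c.min_servers:
            c.servers -= 1
            free_pool += 1
    for c in clusters:
        if c.utilization > high and free_pool > 0:
            c.servers += 1
            free_pool -= 1
    return free_pool
```

In this sketch a quiet cluster releases a server and a busy one picks it up in the same pass, which mirrors the idea of reprovisioning machines between clusters rather than relocating hardware.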

Some cluster owners may be reluctant to share their HPC infrastructure because they think they will be giving up control of their precious resources. Platform Cluster Manager eliminates this concern by establishing sharing policies and resource limits to assure access to resources and enabling users to grow their cluster to larger sizes as they use available idle resources.
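One way to picture how a sharing policy can protect an owner's access while still letting others borrow idle capacity is the following Python sketch. It assumes a simple model that is not taken from the product: each account has a `guaranteed` share it can always reclaim and a `limit` it may never exceed, and anything beyond the guarantee can only come from idle servers. The function name and all parameters are hypothetical.

```python
def servers_to_grant(requested, held, guaranteed, limit, idle):
    """Return how many additional servers an account receives.

    requested  -- extra servers the account is asking for
    held       -- servers the account already holds
    guaranteed -- share the account can always get (assumed reclaimable)
    limit      -- hard cap on the account's total servers
    idle       -- servers currently idle in the shared pool
    """
    # Never let the account exceed its resource limit.
    headroom = max(limit - held, 0)
    wanted = min(requested, headroom)

    # Servers inside the guaranteed share are always granted.
    entitled = min(wanted, max(guaranteed - held, 0))

    # Anything beyond the guarantee is borrowed from idle capacity only.
    borrowed = min(wanted - entitled, max(idle - entitled, 0))
    return entitled + borrowed
```

Under these assumptions, an owner always gets back up to the guaranteed share, and can grow beyond it only while idle servers exist and the limit has not been reached.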

Setting up and managing multiple clusters is daunting for many administrators, as the task requires a specialized set of skills and can be both costly and time-consuming. To save time, IBM has created blueprints that define these clusters and accelerate their deployment. Platform Cluster Manager enables a cluster to be deployed with a few clicks and also provides users and administrators with a portal to manage their accounts, see the status of their resources and administer them.

With Platform Cluster Manager, cluster setup that used to take weeks is now fully automated, allowing clusters to be application-ready in minutes. The entire cluster configuration is captured in a blueprint that describes everything needed to deploy a fully functioning cluster, from the OS to the workload manager, applications and configuration. This process works equally well to reprovision already-deployed servers and to onboard new bare-metal machines. To illustrate, an end user or administrator can browse through a service catalog, select an application and request a number of servers. The system will verify the requestor’s entitlement, ensure the availability of the servers and then install the required operating system, middleware and user applications.
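The request flow described above (verify entitlement, confirm availability, then install the stack defined by the blueprint) can be sketched in a few lines of Python. This is an illustrative model only; the `Blueprint` class, the `handle_request` function and every field name are hypothetical and do not represent the product's blueprint format or interfaces.

```python
from dataclasses import dataclass, field


@dataclass
class Blueprint:
    # Hypothetical fields mirroring what the article says a blueprint captures.
    name: str
    operating_system: str
    workload_manager: str
    applications: list = field(default_factory=list)


def handle_request(blueprint, servers_requested, entitlement, servers_available):
    """Check a cluster request and return the provisioning plan."""
    # Step 1: verify the requestor's entitlement.
    if servers_requested > entitlement:
        raise PermissionError("request exceeds the requestor's entitlement")
    # Step 2: ensure enough servers are available in the pool.
    if servers_requested > servers_available:
        raise RuntimeError("not enough servers available")
    # Step 3: list the install steps, from OS to middleware to applications.
    steps = [
        f"install OS: {blueprint.operating_system}",
        f"install middleware: {blueprint.workload_manager}",
    ]
    steps += [f"install application: {app}" for app in blueprint.applications]
    return {"servers": servers_requested, "steps": steps}
```

A call such as `handle_request(Blueprint("demo", "example-linux", "example-workload-manager", ["example-solver"]), 4, 8, 16)` would return a plan for four servers with three install steps, while a request above the entitlement is rejected before any provisioning begins.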

Time-to-results is also enhanced with the flexibility provided by Platform Cluster Manager – Advanced Edition. Organizations can now deliver on technical computing and big data analytic computing needs from a single environment—whereas in the past, separate clusters were assigned for different types of workloads. Now, clusters can be leveraged, reprovisioned and resized based on workload requirements.

With its management portal for provisioning, monitoring, alerting and troubleshooting, Platform Cluster Manager – Advanced Edition reduces overhead and increases operational efficiency. The management portal enhances administrative and user productivity, expanding the ability to meet and exceed service levels and lower operating costs. The secure dashboard allows account holders to manage existing resources and request additional resources as needed to support their workloads, which enables users to accelerate performance.

Moreover, by allocating underutilized machines to busy workloads, organizations avoid the cost of idle cluster resources and get more from their existing investment in HPC clusters. Most importantly, the pooling and provisioning of clusters and the redistribution of machines are transparent to the end users. With Platform Cluster Manager – Advanced Edition, HPC administrators finally have an easy-to-use solution to stop cluster sprawl and boost operational efficiency while avoiding potential interdepartmental conflicts.

A lesson learned early in life applies equally to HPC clusters: sharing is good. When we share, everyone wins.

For more information, visit: