Teamwork Key to Meeting Tight Development Deadline
With over 500 HPC deployments completed, the X-ISS team knows there is only one real constant – each cluster is unique and presents its own set of challenges. A recent deployment of a new 1400-node Dell cluster was no different. The client had a tight deadline: despite the cluster's size, deployment, configuration and testing had to be completed in only seven weeks.
“We take great pride in the HPC expertise our personnel bring to every project,” said X-ISS President and CEO Deepak Khosla. “In this case, however, the ability of our expert staff to work together as a coordinated team was the key to bringing this project to fruition in the allotted time period.”
The client, a large manufacturer, selected Dell to design and build a new cluster with 1400 InfiniBand-connected compute nodes for deployment into an existing HPC environment. Although the new cluster would operate separately from the others, it would share existing storage capacity. Dell partnered with X-ISS to install and configure the cluster because of X-ISS’s reputation and skills with the technologies involved.
The assignment required validation testing before handing the system over in phases to internal client personnel tasked with installing applications. X-ISS was also asked to set up a separate testing environment where the client could experiment with new graphical user interfaces for the cluster management platform.
X-ISS assigned three network engineers to the project, each with extensive experience in Dell systems and HPC software stack technology. Although all three were rarely at the client site at the same time, they relied on best practices, standardized deployment templates and other tools to guide their work and ensure consistency, regardless of who was performing any given task.
Deploying the Cluster
Already a sophisticated user of HPC technology, the client specified xCAT software as its cluster management system of choice. Drawing on extensive experience with this robust tool, X-ISS wrote custom scripts to standardize the configuration of all the nodes. The scripts also automated and accelerated the configuration process, cutting the time required to configure the 1400 nodes by approximately 90 percent compared with manual configuration.
Ensuring consistent configuration of nodes is crucial to the efficiency of an InfiniBand-connected system. The advantage of InfiniBand is that it enables compute nodes to communicate at a much higher rate than otherwise possible. But for this data interconnect to operate with maximum efficiency, the nodes themselves must be tightly synchronized. Just one node running the wrong firmware version or loaded with different settings can slow down the entire cluster, which is why the xCAT scripts were so important.
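A minimal sketch of this kind of consistency check, in Python, with hypothetical node names and inventory data (the actual X-ISS scripts were built on xCAT and are not reproduced here):

```python
# Hypothetical sketch: flag nodes whose firmware or BIOS settings differ
# from the fleet-wide majority. The per-node reports are hard-coded here
# for illustration; in practice they would come from xCAT inventory queries.
from collections import Counter

def find_outliers(reports):
    """reports: dict of node name -> (firmware_version, settings_hash).
    Returns the nodes that differ from the most common configuration."""
    majority, _ = Counter(reports.values()).most_common(1)[0]
    return sorted(node for node, cfg in reports.items() if cfg != majority)

reports = {f"node{i:04d}": ("2.19.1", "a1b2") for i in range(1, 1401)}
reports["node0042"] = ("2.18.0", "a1b2")   # stale firmware
reports["node0911"] = ("2.19.1", "ffee")   # divergent BIOS settings

print(find_outliers(reports))  # ['node0042', 'node0911']
```

With 1400 nodes, automating this comparison is what keeps a single mismatched node from silently dragging down the InfiniBand fabric.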
Another advantage of the xCAT scripts is that they served as templates to standardize the overall deployment process. The three X-ISS network engineers assigned to the project traveled to the client site in shifts, often staying for a week or two. Regardless of who was onsite working on the deployment, the results were consistent because of the standardized details written into the scripts.
“As our engineers traveled between the client site and X-ISS headquarters in Houston, they remained in communication with each other,” said Khosla. “We set up remote access to the client cluster so our team members could assist each other directly whether they were onsite or not.”
Meeting the Deadline
Once the xCAT installation was completed, the team created deployment images for the compute and infrastructure nodes. These images allowed the compute nodes to be rapidly deployed throughout the process. Later, an additional image was created for visualization nodes.
At the request of the client, X-ISS released the cluster in phases as multiple independent compute environments. During the deployment of each environment, the team validated the BIOS and firmware versions on each compute node. A tool was used to enforce BIOS settings on the nodes, again for system-wide consistency.
Finally, the X-ISS engineers ran High Performance Linpack (HPL) benchmark tests on the cluster to help identify and resolve any issues related to misconfigurations or hardware failures, which are typical in such large setups. Many minor hardware issues occur during shipping and can be easily sorted out between the engineers and the hardware vendor during installation.
In many projects, X-ISS installs and validates application software, but this client maintained an internal team that performed that work. The internal group also ran a series of its own tests on the new system before putting it into full production mode. The deployment and configuration done by X-ISS passed all acceptance testing by the client – all within the seven-week deadline.
Download this case study: Teamwork.CaseStudy12
X-ISS Removes Bottlenecks between Windows Desktops and GlusterFS Samba Servers
A provider of seismic data processing services to the oil & gas industry was experiencing slower-than-expected data transfer rates between its Windows desktop users and the GlusterFS Samba high-availability storage solution recently installed for its new Linux HPC cluster. X-ISS quickly diagnosed the cause of the bottleneck and improved data transfer speeds by more than 10X.
The seismic processing company has built its business on the quick turnaround of completed projects. In a typical job, the company receives enormous raw data files from its customer. A single project may include 100 unprocessed data files, each in the 10 to 100 GB size range. Copying new data sets from the Windows desktops to the new cluster storage was sometimes taking days instead of hours.
“The slow transfer speed was unacceptable to the seismic processing firm because they often have to process the raw data, perform analysis and deliver end products within a few days,” said X-ISS president Deepak Khosla. “To meet client demands, data transfer rates had to be accelerated to minutes or hours.”
The firm had installed GlusterFS to provide low-latency, high-throughput primary storage for the Linux cluster while allowing Windows users access to the cluster file system. The Linux cluster routinely maintained file transfer rates of 900 Mbps, but when the Windows systems tried sending data, the transfer rate began at 25 Mbps and quickly slowed to just 2 to 5 Mbps.
X-ISS was called in to assess the problem and remove the bottleneck that was causing Windows desktop users to experience slow transfer to cluster storage. The X-ISS team first reviewed the specifications for the network hardware and configuration of the Gluster storage nodes, network switches, and Windows computers. It became readily apparent that several factors, all correctable, were contributing to the transfer slowdown. X-ISS recommended and then implemented a plan to fix the problem.
New Uplink, Reset Default Parameters
The first issue was the uplink switch between the networks. The Linux HPC cluster and the GlusterFS servers were connected to a 10 GbE switch. The 10 GbE switch was connected via a 1 GbE link to a 1 GbE switch that drove the client’s Windows network. X-ISS upgraded the 1 GbE link to 10 GbE with a 10 GbE expansion module and checked the transfer speed.
The iperf benchmarking tool measured the raw transfer speed at about 700 Mbps, but the Windows desktop transfer speed remained impaired. During the test, the desktop copied at a peak of 25 Mbps and then fell to 5 Mbps. More tweaking had to be done.
After validating the network configurations against the desired network topology, with particular attention paid to VLAN routing, the X-ISS team focused on changing default settings throughout the network, which were set to handle small volumes of data, not the huge file sizes that were the norm for the client. The settings had to be changed to take full advantage of the high-speed network.
First, X-ISS examined how the file system was being used and modified performance settings in GlusterFS to optimize memory utilization. Default settings intended for moderate file sizes were changed to minimize I/O wait time, a more optimal configuration for Gluster’s workload of large sequential files.
Next, the team updated the Samba configurations for Windows 8. The default configuration supported the lowest common denominator of Windows file transfer methods, leaving disabled several performance features that the client’s Windows 8 operating system could use. Most notably, the team increased the Samba buffers and enabled the SMB2 protocol.
X-ISS then turned its attention to tuning the Windows TCP parameters. Again, the default settings assumed a network environment similar to a home or small office, where 100 Mb Ethernet networks are common. That assumption didn’t hold for the client’s 10 Gb network. Using netsh, the team adjusted the TCP heuristics and auto-tuning parameters so the Windows desktops could respond to the speed of the network.
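Taken together, the three tuning passes described above might look something like the following sketch. The volume name, buffer sizes and cache values are illustrative assumptions, not the exact settings applied for this client:

```shell
# GlusterFS: favor large sequential I/O (hypothetical volume "gv0" and sizes)
gluster volume set gv0 performance.cache-size 1GB
gluster volume set gv0 performance.write-behind-window-size 4MB

# Samba (smb.conf): allow SMB2 and raise socket buffers (illustrative values)
#   max protocol = SMB2
#   socket options = TCP_NODELAY SO_RCVBUF=131072 SO_SNDBUF=131072

# Windows (elevated command prompt): let TCP window scaling track a fast link
netsh interface tcp set heuristics disabled
netsh interface tcp set global autotuninglevel=normal
```

Each layer ships with conservative defaults, so all three have to be tuned together before a Windows client can fill a 10 GbE pipe.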
With the fine-tuning completed, X-ISS again ran performance tests on the system. The iperf test showed raw network throughput had increased to between 800 and 900 Mbps. More importantly, the Windows desktop file copy speed to GlusterFS storage increased more than 10-fold, to a rate of 150-200 Mbps. The client’s data files were soon moving between the desktops and the cluster at the speeds needed to keep its customers happy with on-time product delivery.
Download this case study: GlusterFS.CaseStudy10
X-ISS Creates Affordable, Efficient Remote 3D Visualization Solution
Visualization of 3D graphics can be an enormous benefit for organizations that deal with large volumes of visualization data and whose users are geographically dispersed. One such example is a company that provides seismic data processing services to oil & gas clients. As a long-time HPC services provider to the oil & gas industry, X-ISS was asked to develop and implement a Remote 3D Visualization solution that was both efficient and affordable for this Houston-based client, who wanted to make it easy for customers and partners to collaborate on large image models.
HPC technology is frequently relied upon today by oil & gas companies to handle the CPU-intensive processing, modeling and analysis of large data sets. For many of these applications, especially related to seismic survey data, the output of these processing jobs is a 3D model that will be viewed by engineers, scientists, and researchers. For X-ISS’s client, the 3D models are viewed by internal staff as well as their external customers.
The output files can be huge, ranging from multiple gigabytes to several terabytes of data. Many companies that provide processing services, however, keep their HPC clusters at remote datacenters rather than at users’ offices due to the unique power, cooling and space requirements of the HPC environment. The data itself is also stored at the remote location where it will be processed, to avoid the bandwidth issues of constantly transferring large files.
The challenge is making those 3D models available for visualization by the technical staff on standard workstations at remote offices or even at customers’ sites.
3D models of this size can usually be rendered for viewing only on high-end workstations with powerful GPUs. Some organizations have installed dedicated workstations at their offices just to visualize 3D models, an expensive alternative that also requires provisioning additional bandwidth to deliver the files from the data center. This is not always possible or economical. A common solution to this bandwidth challenge is to deliver the processed 3D models on hard drives or other media, which carries obvious security risks.
Remote access is not an issue confined to 3D data. Conventional solutions related to 2D data access exist, but they typically lack the ability to effectively render 3D models in a remote session. In this case a 3D-specific solution had to be implemented.
Only a handful of 3D solutions exist, and X-ISS examined alternatives that could be used by internal personnel as well as customers. X-ISS ultimately chose the NICE EnginFrame with the VDI plugin because it creates an environment that allows users to securely log into the visualization portal from their own desktop computers using either Windows or Linux sessions.
The remote visualization sessions use NICE Desktop Cloud Virtualization (DCV) to render and compress 3D OpenGL graphics locally in the datacenter, which are then displayed as 3D images on the remote client’s screen. It’s important to note that NICE EnginFrame can accommodate other 3D remote visualization products, but for this use case, DCV was determined to be the best fit.
The key criteria for success included user experience from various locations, concurrent user sessions to the same system, and customization of security to ensure unauthorized access to data was not possible. X-ISS tested and benchmarked these against another leading product and found that in the client’s environment the NICE DCV product used about one-third the bandwidth. DCV has been optimized to deliver an improved user experience even on lower-bandwidth, higher-latency connections.
Just as important to the client, which is a large organization with many technical employees and customers, the NICE DCV product enables multiple concurrent user sessions on the same physical host.
Lastly, X-ISS performed extensive customization of the DCV solution to accommodate the particular needs of the client’s business environment. The client processes data for many customers at once, and keeping the data sets and 3D models separate was imperative. The X-ISS team set up customized access controls to ensure isolation of user sessions and data, while not impacting the production environment. This arrangement permitted the client’s customers to access their 3D models from outside the network as well.
As a result, partners and customers can collaborate securely and efficiently from remote locations. This in turn led to increased productivity for the client, as timely feedback produced quicker turnaround of results.
Download this case study: RemoteViz.CaseStudy9
X-ISS Tweaks Cluster Configuration during Setup to Speed Pump Design Simulations
Personnel at a prominent engineering company in Houston, Texas, no longer have to stay up all night to make sure their pump design simulations finish on time and produce the desired results. Thanks to a Dell HPC cluster deployed and optimized by X-ISS Inc., the company is running simulations 18 times faster than was possible on their computer workstations.
Much of what the company’s Houston office does is destined for the oil industry. One type of time-critical project for the division’s engineers is figuring out why a pump from an oil rig or pipeline has failed and coming up with a solution to the failure.
Usually when a pump breaks down in the oil patch, the flow of petroleum products ceases until the faulty equipment can be repaired or replaced. Every minute that oil isn’t being pumped costs the operator money. It’s up to the company’s engineers to examine the defective hardware and develop a better design that can be rushed to manufacturing as quickly as possible. For the company’s dedicated personnel that often meant keeping watch over computerized design simulations late into the night or early morning.
“The client uses the ANSYS Fluent software to simulate and test various pump designs,” explained X-ISS CEO Deepak Khosla. “As the simulations became more complex, they simply took too long, even on high-end computer workstations.”
By nature, design simulations are an iterative, trial-and-error process. Engineers have to choose just the right level of detail, or granularity, in the simulation to produce workable design alternatives. Using their workstations, the engineers sometimes had to run a simulation for several hours before seeing that the results would be inadequate. After tweaking the inputs, they then had to restart the simulation from the beginning.
In some situations requiring high detail, the workstation was overwhelmed by the data volume and the simulation crashed before completion. In either case, engineers often spent late nights at the office tending to the computer rather than risk coming in the next morning to find poor results or a stalled simulation.
The company contracted Dell to build a Windows 2012 HPC cluster that could speed up the simulations in ANSYS Fluent, which scales extremely well in the HPC environment. Among the many advantages, faster simulations meant the engineers could identify and tweak problems with their designs in minutes rather than hours, ultimately delivering workable solutions to the oil field customers more quickly.
Tweaking the Deployment
X-ISS worked closely with Dell and ANSYS to design and build the HPC cluster, which comprises 10 mid-level Dell servers and Cisco 10 GbE networking equipment. X-ISS personnel deployed, set up and configured the cluster onsite at the company. As is standard procedure, the X-ISS team ensured the Fluent application ran well, fine-tuning the cluster in the process so the client would get maximum speed from the new system.
A critical step in configuring the cluster was designing the networks. X-ISS created three separate networks for data transmission so that large volumes of data could move in many directions at once. The primary network carried the data needed for the nodes to run the Fluent simulations. A second was set up for cluster managers to monitor overall system operations. And the third network gave the engineers access to submit and run their simulations.
“Three networks maintain system speed and throughput,” said Khosla. “If we had set up just one network for the cluster, it would have had to connect with the company LAN, and the large volume of data would have slowed both the LAN and the cluster.”
A second recommendation made by X-ISS during configuration also contributed to keeping the cluster running fast. The cluster itself had 9 TB of data storage available in the head node. X-ISS suggested the company’s engineers move their pump design data to the cluster and keep it there, rather than moving data back and forth from workstations or remote nodes, which would have slowed jobs.
This configuration concept required buy-in from the company’s IT department because they were the ones responsible for backing up massive volumes of data from the cluster on a regular basis. These backups were required for archiving purposes in case the engineers later had to revisit one of their design simulations. Fortunately, the IT staff agreed with the suggestion, and data is stored locally on the cluster.
To further maximize the power, speed and efficiency of the new HPC cluster, X-ISS ran several validation tests on it, adjusting power and BIOS settings as needed. At the client’s request, the team also set up redundant hardware connections so the cluster could be maintained while it is still operating.
“The company’s engineers can now run a simulation in five minutes that once took 90 minutes,” said Khosla. “The engineers are elated because the simulations no longer cut into their personal time, and the pump re-design process is faster than ever.”
Download this case study: SpeedBoostII.CaseStudy8
X-ISS Customizes Monitoring System for Faster, More Focused Alerting
When a long-time client added nodes to its HPC cluster, they asked X-ISS to customize the open-source monitoring system already in place to provide faster alerts at the first sign of a critical failure. The X-ISS team streamlined the monitoring system and aggregated alerts so operations personnel could quickly pinpoint trouble without being confused by dozens of simultaneous notifications. As part of the project, X-ISS also integrated temperature and power sensors with the DecisionHPC® platform to provide greater insight into cluster operations.
“Monitoring systems can get cluttered as new servers are added over time, and that’s what happened with this client,” said X-ISS President Deepak Khosla. “By customizing their existing monitoring system, we shortened the alert time for critical failures from 15 minutes to just three minutes, and we aggregated up to 70 alerts into one.”
The client, a leader in providing advanced seismic data processing and visualization services to oil and gas clients, has thousands of compute nodes spread out over several datacenters. Already in place at the time of the most recent cluster upgrade was the Zabbix open-source monitoring solution. Although Zabbix is an excellent alerting system, the default set of system checks it performs does not scale to the thousands of nodes needed in this case. If not correctly configured, the monitoring system can become overwhelmed, which slows the notification process. In addition, the open-source solution comes with pre-configured alert settings, or templates, which may or may not satisfy the needs of all users.
Such was the case for this client. Zabbix performs a specified number of checks on each server and switch on a periodic basis. To avoid false alarms, these critical infrastructure elements have to record multiple failures during the check cycle before an alert is generated. The client had found that by the time the alert was delivered via email to operations personnel using the default monitoring settings, the failure had progressed too far for recovery to be successfully implemented.
The X-ISS team wrote custom Zabbix scripts that shortened both the time between critical system checks and, more importantly, the overall time that elapsed before an alert was emailed to the operations staff. In many cases, this ensured they could address an issue before it impacted applications and end users.
“For our clients, keeping their applications running without interruption is of crucial importance,” said Khosla. “The default settings in Zabbix had to be shortened and then made consistent across all the nodes.”
For a major failure, such as 33 nodes going down in one group, X-ISS created Zabbix scripts that send specific warning emails and texts to designated individuals. Rather than 33 alerts, as is the Zabbix default, a single alert goes out with the message, “33 nodes in group 5 have failed.” A similar email comes to X-ISS as part of the ongoing ManagedHPC® service provided to clients.
Next on the client’s customization wish list was aggregation of file system alerts. Each compute node could have 50-80 file systems mounted at one time, depending on the type of workload. By default, Zabbix sends an alert each time a single file system fills up. When multiple alerts come in at once, operations personnel are left searching through the flood of messages to find all the file systems that are actually full.
X-ISS again rewrote the scripts so that only one alert is sent when a file system on any given node exceeds the threshold. A list of file system capacities is still generated, but the report highlights which file systems are full and which others are nearing their thresholds. This gives the operators an ability to stay one step ahead of their users, adding new storage capacity before the user complains.
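The aggregation idea, collapsing many simultaneous per-node events into one summary message, can be sketched in a few lines of Python. This is an illustrative stand-in, not the actual Zabbix scripts:

```python
# Hypothetical sketch of alert aggregation: collapse many per-node "down"
# events into one summary message per node group, instead of the
# one-alert-per-node default. Names and data are illustrative only.
from collections import defaultdict

def aggregate_alerts(down_nodes):
    """down_nodes: list of (node_name, group_id) tuples.
    Returns one summary message per affected group."""
    by_group = defaultdict(list)
    for node, group in down_nodes:
        by_group[group].append(node)
    return [f"{len(nodes)} nodes in group {g} have failed"
            for g, nodes in sorted(by_group.items())]

events = [(f"node{i:03d}", 5) for i in range(1, 34)]  # 33 nodes in group 5
print(aggregate_alerts(events))  # ['33 nodes in group 5 have failed']
```

The same grouping logic applies to file system alerts: one message per node listing every mount over threshold, rather than dozens of individual notifications.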
“Operations personnel like to know about issues – and avoid a failure – before their end users know something is wrong,” said Khosla.
As part of this customization, X-ISS also wrote a special script to notify a specific individual in operations whenever a file had been added to or deleted from a file system. In addition, the team prepared scripts to gather environmental temperature and humidity data from a facility management system in the data center and server rooms. An email alert is sent via Zabbix when certain thresholds are exceeded.
For reporting and analytics, the client also uses X-ISS DecisionHPC® software, a high-performance package that delivers business insights into cluster usage via a single dashboard so managers can keep operations running efficiently. During the monitoring upgrade, X-ISS also integrated temperature, power and other data into the DecisionHPC dashboard.
This was accomplished by writing scripts that query sensors in each node to determine their internal temperature and power usage. This data is first sent to a Ganglia distributed monitoring system, where it is pulled by DecisionHPC and presented as a color-coded map on the dashboard. Operations personnel simply glance at the map to see if all colors are the same. If one or more is a different color, the operator can zoom in on that node to determine why the temperature or power usage is out of the normal range.
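A toy Python sketch of the color-coding idea, with hypothetical thresholds and readings (real values come from the per-node sensor queries and Ganglia):

```python
# Hypothetical sketch of the dashboard heat map: map each node's sensor
# reading to a color so an out-of-range node stands out at a glance.
# Thresholds and readings below are illustrative only.
def color_for(temp_c, low=18.0, high=27.0):
    if temp_c < low:
        return "blue"    # colder than normal
    if temp_c > high:
        return "red"     # hotter than normal
    return "green"       # within normal range

readings = {"node001": 22.5, "node002": 23.1, "node003": 31.4}
heat_map = {node: color_for(t) for node, t in readings.items()}
print(heat_map)  # node003 shows up red
```

An operator scanning the map only needs to notice the one node that isn't green, then zoom in to see why its temperature or power usage is out of range.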
“The bottom line advantage of customizing and aggregating alerts is that the client now has deeper insight into its HPC cluster operations,” said Khosla. “And the operations personnel know about issues before they spiral into full-blown failures that disrupt the work of their end users.”
Download this case study: StreamlineZabbix.CaseStudy6
X-ISS Helps Provide Major Speed Boost for Water Disinfection Simulations
X-ISS worked closely with simulation software developer ANSYS Inc. to design and build an HPC cluster that completes water treatment simulations in minutes rather than the hours or days once required. X-ISS created the HPC system using seven mid-level Dell servers and InfiniBand networking equipment and then performed extensive fine tuning on the cluster resulting in a 98.9% efficiency rating.
The client is a company specializing in disinfecting water using ultraviolet light technology. They treat water destined for drinking use in municipal systems as well as making waste water safe prior to discharge into the environment. The company has thousands of UV treatment installations around the world. One impressive example is a system they developed to serve a metro area, which cleans more than two billion gallons of water per day. This $1.5 billion system is the largest UV drinking water facility in the world.
Designing and building these water treatment systems requires engineering simulations to verify that all the water flowing through a treatment device is exposed to sufficient UV light to kill bacteria and destroy contaminants. The company uses ANSYS Fluent to simulate its water treatment processes. Simulating these designs on a single workstation can take hours or even a couple of days. Company engineers needed to complete their designs faster in order to keep up with market demand for their systems. The company contracted X-ISS to design and build what became the company’s first HPC cluster.
One of the remarkable achievements for X-ISS was the sheer efficiency of the cluster. Cluster performance is measured in FLOPS, or floating-point operations per second. In theory, the maximum a cluster can compute is found by multiplying these variables: the number of processors, the number of cores on each processor, the clock speed (GHz) of each processor, and the number of FLOPS each core can perform per cycle.
For this cluster, the theoretical maximum was 1,200 Giga FLOPS, or 1.2 trillion math calculations per second. Through proper design and tuning, the cluster achieved 1,187 GFLOPS on a Linpack cluster efficiency test. That’s 98.9% efficient. To put that number in a practical perspective, the cluster ran a sample simulation in nine seconds that previously took several minutes on the desktop computers.
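The arithmetic behind that efficiency figure is straightforward; the breakdown of the 1,200 GFLOPS peak into processor counts and clock speeds is not given in the case study, so the formula below is shown generically:

```python
# Efficiency calculation from the figures reported in the case study.
# peak_gflops mirrors the formula in the text: processors x cores x GHz x
# FLOPS per cycle (the actual hardware breakdown is not given here).
def peak_gflops(processors, cores_per_proc, ghz, flops_per_cycle):
    return processors * cores_per_proc * ghz * flops_per_cycle

rpeak = 1200.0   # theoretical peak from the case study, in GFLOPS
rmax = 1187.0    # measured Linpack result, in GFLOPS
efficiency = rmax / rpeak
print(f"{efficiency:.1%}")  # 98.9%
```

For comparison, well-tuned production clusters commonly land well below this on Linpack, which is what makes the 98.9% figure notable.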
“The speed of this cluster is remarkable considering it is only comprised of seven nodes,” said X-ISS President Deepak Khosla. “The cluster performs at the same speed of systems that are much larger and more expensive.”
Most clusters cannot achieve this kind of efficiency due to the sheer volume of data that must be transferred to “feed” the cluster and gather the results. Imagine a math teacher passing out 1.2 trillion math problems. Just the logistics of distributing the problems, tracking them and gathering the results would take more time and effort than the actual computation. The cluster faces the same challenge, needing to distribute trillions of data elements and gather the 1.2 trillion results. All in one second!
“Key to achieving this kind of performance was the way we ordered and configured the memory in the nodes and set up the InfiniBand topology,” said Khosla. “We also ran our own custom optimization routines on the cluster to improve the way the nodes talk to each other.”
For the company’s engineers, a more practical way to measure the cluster performance is to ask how long one step, or iteration, of a simulation takes. The cluster was able to perform iterations in 0.087 seconds, or nearly 12 iterations per second. Most simulations require hundreds, even thousands, of iterations to “solve.” But with nearly a dozen iterations completing each second, simulations requiring thousands of iterations take mere minutes to solve, not hours or days.
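The arithmetic is easy to verify; the 5,000-iteration job size below is a hypothetical example:

```python
# Quick arithmetic behind the iteration-rate claim: at 0.087 seconds per
# iteration, even a simulation needing thousands of iterations finishes
# in minutes. The 5,000-iteration job size is a hypothetical example.
seconds_per_iteration = 0.087
iterations_per_second = 1 / seconds_per_iteration   # about 11.5 per second

iterations = 5000
minutes = iterations * seconds_per_iteration / 60   # about 7.25 minutes
print(iterations_per_second, minutes)
```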
The company’s engineers are pleased with the cluster performance and expect to use it frequently to speed up their simulations and get their designs out in the hands of the customer much faster.
Download this case study: SpeedBoost.CaseStudy5
X-ISS Sets Up Diskless Windows HPC Cluster for Secure Military Environment
A Department of Defense site needed a powerful Microsoft Windows HPC cluster to run mission critical simulation applications. At 196 nodes, the cluster was relatively large, and due to security constraints, it had to be diskless.
In a diskless cluster, a central storage area network is typically loaded with a small number of physical hard drives storing files that serve as virtual hard drives to boot the compute nodes. Diskless Linux HPC systems were already relatively common at the time, but a diskless Windows HPC deployment was not.
The DoD site chose Dell to deliver this system, and since X-ISS had already been a long-time HPC-delivery partner, Dell called on X-ISS to assist with the job.
Proud of the platform-neutral reputation built over the past 15 years, X-ISS quickly dispatched a Senior Windows Analyst to the Dell integration facility to assist with building the cluster from the ground up. Specifically, X-ISS was tasked with customizing and installing the Windows cluster management system and testing the cluster.
“Starting with basic system architecture, we had to figure out how to make this work,” said Deepak Khosla. “Diskless booting with Windows is complex. It requires detailed planning to ensure all hardware is configured and set up to meet specific requirements.”
Making Diskless Windows Work
From a practical and financial perspective, diskless clusters make a lot of sense for any secure facility, Deepak Khosla explained. Organizations handling classified information deal with stringent security protocols for their computer networks. Among these is the mandated periodic DoD-grade wiping or outright destruction of disk drives containing sensitive data. For the military customer, this would have meant time-consuming and expensive cleansing – or destruction – of the 392 drives required for a standard 196-node system.
To customize the diskless Windows cluster, X-ISS interfaced extensively with Dell, Microsoft and the client.
After several conversations with Microsoft, X-ISS concluded that differencing disk technology would be key to a diskless system that met the military base’s requirement for system speed while also minimizing the number of hard drives. The differencing disks would enable the client to minimize the physical drive count and to run and modify the simulation numerous times without ever changing the master boot image. Each change, or simulation modification, is saved to a differencing disk on a virtual drive.
Rather than set up hundreds of virtual drives, each taking up 15 gigabytes of space, the team created an equivalent number of differencing disks against a single 15GB virtual drive. The savings in disk space was enormous, and the system speed was not impaired.
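A back-of-the-envelope comparison shows why; the per-node differencing-disk size below is a hypothetical figure for illustration:

```python
# Compare full per-node virtual drives against one shared parent image
# plus small per-node differencing disks (assumed diff size per node).
nodes = 196
full_drive_gb = 15
diff_gb = 0.5  # hypothetical per-node changes

full_copies = nodes * full_drive_gb              # 2940 GB of virtual drives
differencing = full_drive_gb + nodes * diff_gb   # one parent + 196 diffs
print(full_copies, differencing)
```

Under these assumptions, roughly 113 GB of storage replaces nearly 3 TB of full virtual drives, with correspondingly fewer physical disks to wipe or destroy.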
Download this case study: DisklessWindows.CaseStudy2
X-ISS Helps University of Wisconsin Focus on Supporting Researchers
At the University of Wisconsin, the Engineering Department did not have the time to hire and train a person to administer the complex HPC cluster required for research. Only a small handful of candidates who applied for the position were qualified to administer the system.
The University of Wisconsin contacted X-ISS to do software upgrades and installations along with system management.
- Freedom to focus on research instead of HPC administration
- On-site installation and set-up
- Turnkey outsource system management service
- Secure remote system monitoring
- Proactive reporting process
- X-ISS professionals available 24/7
In the world of high performance computing (HPC), the gap is growing between organizations utilizing cluster computing systems and the supply of experienced and qualified system administrators. The HPC systems of today are extremely powerful and flexible, but are also complex to install, manage and maintain.
Using cluster computing, researchers across a range of scientific fields, from astronomy to genomics, can build models and run programs that process large amounts of data for high-resolution visualization, intricate analysis and detailed simulation. Frequently, the scientists working with HPC clusters for research must also administer the system. But when research scientists have to become computer scientists, time is used inefficiently, productivity drops and frustration rises as more hours are spent managing the HPC system.
“The major architectural trends in high performance computing—from single-system-image servers to distributed clusters, and from single-core to multi-core processors—have combined to make effective system administration a much different, much more complicated challenge than it used to be,” said Addison Snell, CEO of Intersect360 Research, a consulting firm focused on the HPC industry. “To achieve optimal performance and utilization can require significant expertise in a wide array of middleware options. This has led to a shortage of qualified system administration talent for HPC markets.”
X-ISS is an answer to this shortage of experienced HPC talent. With its outsourced management service ManagedHPC, X-ISS gives clients the freedom to focus on their research while leaving the HPC system in the hands of competent administrators. Customers adopting ManagedHPC receive on-site installation and setup, turnkey outsourced system management, secure remote system monitoring, a proactive reporting process and access to the professional X-ISS team at any time of day or night.
At the University of Wisconsin, scientific research in the field of bioengineering requires the most sophisticated software and computer programs. Whether it’s biomechanics, cell or tissue engineering, or biomedical research, the university didn’t have the time to hire and train a person to administer the complex HPC cluster required for research. Additionally, only a small handful of the candidates who inquired about the position were even qualified. Instead, the university contacted X-ISS to do software upgrades and installations along with system management.
“As part of this significant investment into the Engineering Department at the University of Wisconsin, we were able to procure a 142-node cluster computer from Dell and funding for a cluster system administrator,” said David Crass, Director of Research Computing at the University of Wisconsin. “X-ISS has been able to handle getting the system up and running, software installations, and proactively handle technical issues so we could focus on working with the Engineering staff on specific code needs and department usage of this shared resource. It has allowed us to focus our attention where it was most needed.”
With an increase in HPC use, demand for X-ISS services will also increase. In addition to ManagedHPC, X-ISS has DecisionHPC, a Web-based monitoring and analytics software package to help customers maximize productivity, assist with future computing resource needs and align HPC resources with organizational goals.
With a focus on responsiveness, expertise and professionalism, X-ISS helps organizations implement cost-saving solutions; increase efficiency, keeping research staff focused on their work; increase utilization by maximizing cluster capacity; increase top-line revenue by completing analyses faster, leading to quicker business decisions; and gain peace of mind, knowing X-ISS is backed by more than 10 years of experience and success.
Download this case study: UWM.CaseStudy7