X-ISS Helps Provide Major Speed Boost for Water Disinfection Simulations

X-ISS worked closely with simulation software developer ANSYS Inc. to design and build an HPC cluster that completes water treatment simulations in minutes rather than the hours or days once required. X-ISS built the HPC system from seven mid-level Dell servers and InfiniBand networking equipment, then performed extensive fine-tuning on the cluster, resulting in a 98.9% efficiency rating.

The client is a company specializing in disinfecting water using ultraviolet light technology. It treats drinking water for municipal systems and makes wastewater safe prior to discharge into the environment. The company has thousands of UV treatment installations around the world. One impressive example is a system it developed to serve a metro area, which cleans more than two billion gallons of water per day. This $1.5 billion system is the largest UV drinking water facility in the world.

Designing and building these water treatment systems requires engineering simulations to verify that all the water flowing through a treatment device is exposed to sufficient UV light to kill bacteria and destroy contaminants. The company uses ANSYS Fluent to simulate its water treatment processes. Simulating these designs on a single workstation can take hours or even a couple of days. Company engineers needed to complete their designs faster in order to keep up with market demand for their systems. The company contracted X-ISS to design and build what became the company’s first HPC cluster.

One of the remarkable achievements for X-ISS was the sheer efficiency of the cluster. Cluster performance is measured in “FLOPS,” or floating point operations per second. In theory, the maximum a cluster can compute is found by multiplying these variables: the number of processors in the cluster, the number of cores on each processor, the clock speed of each processor in GHz, and the number of floating point operations each core can perform per clock cycle.

For this cluster, the theoretical maximum was 1,200 GFLOPS (gigaFLOPS), or 1.2 trillion math calculations per second. Through proper design and tuning, the cluster achieved 1,187 GFLOPS on a Linpack cluster efficiency test. Dividing 1,187 by 1,200 gives an efficiency of 98.9%. To put that number in a practical perspective, the cluster ran a sample simulation in nine seconds that previously took several minutes on the desktop computers.
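The arithmetic behind these figures can be sketched in a few lines of Python. This is only an illustration: the function names are invented for this example, and the hardware values passed to the peak calculation are hypothetical placeholders, since the case study does not list the cluster's processor counts or clock speeds. Only the 1,200 and 1,187 GFLOPS figures come from the text above.

```python
def theoretical_peak_gflops(processors, cores_per_processor, clock_ghz, flops_per_cycle):
    """Multiply the four variables described above to get peak GFLOPS."""
    return processors * cores_per_processor * clock_ghz * flops_per_cycle


def efficiency_percent(measured_gflops, peak_gflops):
    """Measured performance as a percentage of the theoretical peak."""
    return 100.0 * measured_gflops / peak_gflops


# Hypothetical hardware, NOT the cluster in this case study:
# 2 processors x 4 cores x 2.5 GHz x 4 operations per cycle.
print(theoretical_peak_gflops(2, 4, 2.5, 4))  # 80.0

# Figures reported in the case study: 1,187 GFLOPS measured, 1,200 GFLOPS peak.
print(round(efficiency_percent(1187, 1200), 1))  # 98.9
```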

“The speed of this cluster is remarkable considering it is only comprised of seven nodes,” said X-ISS President Deepak Khosla. “The cluster performs at the same speed of systems that are much larger and more expensive.”

Most clusters cannot achieve this kind of efficiency because of the sheer volume of data that must be transferred to “feed” the cluster and gather the results. Imagine you are a math teacher passing out 1.2 trillion math problems. Just the logistics of distributing the problems, tracking them and gathering the results would take more time and effort than the actual computation. The cluster faces the same challenge: it must distribute the input data and gather the results of 1.2 trillion calculations, all in one second!

“Key to achieving this kind of performance was the way we ordered and configured the memory in the nodes and set up the InfiniBand topology,” said Khosla. “We also ran our own custom optimization routines on the cluster to improve the way the nodes talk to each other.”

For the company’s engineers, a more practical way to measure the cluster performance is to ask how long one step or iteration of a simulation takes. The cluster completed each iteration of its simulations in 0.087 seconds, or about 11.5 iterations per second. Most simulations need hundreds or even thousands of iterations to reach a solution. At that rate, a simulation requiring thousands of iterations takes mere minutes to solve, not hours or days.
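As a rough check on that claim, the sketch below converts the reported 0.087 seconds per iteration into total solve times. The iteration counts are hypothetical examples rather than the company's actual workloads, and the estimate ignores setup, file I/O and any variation in iteration time.

```python
SECONDS_PER_ITERATION = 0.087  # figure reported in the case study


def iterations_per_second(seconds_per_iteration):
    """Invert the per-iteration time to get a rate."""
    return 1.0 / seconds_per_iteration


def solve_time_minutes(iterations, seconds_per_iteration=SECONDS_PER_ITERATION):
    """Estimated wall-clock minutes for a given number of iterations."""
    return iterations * seconds_per_iteration / 60.0


print(round(iterations_per_second(SECONDS_PER_ITERATION), 1))  # 11.5

# Hypothetical iteration counts, for illustration only.
for count in (1000, 5000):
    print(count, "iterations:", round(solve_time_minutes(count), 2), "minutes")
```

Under these assumptions, 1,000 iterations take about 1.45 minutes and 5,000 take about 7.25 minutes.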

The company’s engineers are pleased with the cluster performance and expect to use it frequently to speed up their simulations and get their designs into customers’ hands much faster.
