High-performance computers: everything (and openSUSE)

I will probably (help to) set up a high-performance computer (HPC) mainly for climate simulations in the near future. Since we are on a tight budget, we want to do as much as possible ourselves. While I can set up and build a desktop machine quite well (also with less idiot-proof distributions like Gentoo), I’ve never done anything more HPC-like. I think I know what I have to do, but not exactly how to do it. Can you recommend any literature or field reports? Anything geared more towards HPC technology like InfiniBand networking etc. would help. I would also like to stick with openSUSE. Do you see any issues?

IMO
You have to start with the application you intend to run.
Many of today’s leading and bleeding-edge applications do Big Data analysis, and today’s better solutions are clustered, which means they run on an array of computers rather than on any single one. Really big number crunching on a single machine can be too expensive and limited. That said, a distributed application that can scale horizontally can also benefit from running on bigger machines.
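
To make the “array of computers” idea concrete, here is a minimal sketch of a horizontally scaled computation using MPI, the message-passing library most climate models are built on. It assumes an MPI implementation such as Open MPI and its mpicc wrapper are installed; the program and the numbers in it are purely illustrative, not part of any real workload.

```c
/* Hedged sketch: a trivially parallel computation split across MPI ranks.
 * Assumes an MPI implementation (e.g. Open MPI) and the mpicc wrapper are
 * installed. Build: mpicc -o partial_sum partial_sum.c
 * Run (example):     mpirun -np 4 ./partial_sum
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each rank sums only its own slice of 1..N; no single machine does the whole job. */
    const long N = 100000000L;
    long chunk = N / size;
    long start = rank * chunk + 1;
    long end   = (rank == size - 1) ? N : start + chunk - 1;

    double local = 0.0;
    for (long i = start; i <= end; ++i)
        local += (double)i;

    /* Combine the partial results on rank 0. */
    double total = 0.0;
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum(1..%ld) = %.0f computed across %d ranks\n", N, total, size);

    MPI_Finalize();
    return 0;
}
```

The same binary runs on one box or across many nodes via a hostfile; that is the sense in which a distributed application “scales horizontally” without code changes.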

A really good implementation requires understanding the requirements of the solution, e.g.:
Can this be done better on a Cloud platform where you rent capability or on your own machines?
How valuable is the data? Are there security and privacy issues, and what are the consequences of a hardware or software failure?
Does the data need to be transformed for your application or use?
Is the data static or continuous (likely streamed)?
Not to be ignored are the requirements and objectives set by the non-technical folk, and non-technical constraints like your budget…

After you evaluate your application’s needs, you can inventory your own in-house skillsets and determine whether you are able to create, support, and maintain your solution, or whether you should consider outsourcing or acquiring assets.

Of course, these are just general considerations, and the devil is in the details.

TSU

Adding a little, since my prior post focused primarily on the fact that many Big Data apps today are distributed and often are, or can be, built on rather ordinary systems…

If your requirement really is HPC, then you should know that, AFAIK, RHEL is the only distro that develops and releases a kernel specifically featured and tuned for that kind of deployment. There are other kernels, like the one Google develops primarily for internal use to drive the immensely massive clusters behind its own massively scalable Search. Although it is primarily for in-house use, Google publishes that kernel for others who might want to use it, but it is so tuned for a Google deployment that it’s anyone’s guess how well it would perform for others (it’s skewed to the extreme to support heavy network connection traffic).

So, although RHEL is generally the most appropriate choice for HPC, that’s not to say it’s the only correct choice.
You can build an HPC cluster using practically any distro, including openSUSE; the packages are available to build your own, so this is still very possible for anyone who likes openSUSE/SUSE features.
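
As a rough illustration of what “build your own” looks like once an MPI stack from the openSUSE repositories is installed on each node (the package names, the hostfile, and the rank count below are assumptions to check against your own setup), a tiny C program like this can confirm that every node actually joins a job:

```c
/* Hedged sketch: a "which node am I on?" check to confirm the cluster is wired up.
 * Assumes an MPI stack (e.g. Open MPI) is installed on every node and that a
 * hostfile listing the nodes exists; adjust names and paths to your environment.
 * Build: mpicc -o nodecheck nodecheck.c
 * Run:   mpirun --hostfile hosts -np 8 ./nodecheck
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, len;
    char node[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(node, &len);

    /* Every rank reports where it landed; a missing node shows up immediately. */
    printf("rank %d of %d running on %s\n", rank, size, node);

    MPI_Finalize();
    return 0;
}
```

If a node never appears in the output, the problem is usually the hostfile, node-to-node SSH access, or the interconnect configuration rather than the code itself.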

TSU