High Performance Computing

Hi all,
I am going to build a high performance computing (HPC) setup, specifically for data mining and analytics.
The tools I plan to use are Hortonworks for data collection and Vertica as the database.
I have not selected an analytics tool yet.
What software should I use to build an HPC system based on openSUSE?
Is there any reference or manual for that?

Rocks and OSCAR are really outdated, and no improvements have been made to them for a long time.

You’ll probably want to define exactly what you want to do, then build according to the requirements of your solution.

Some generalities about openSUSE…

  • Unlike RHEL, openSUSE does not make available a kernel specially tuned for HPC, but whether that makes a difference to you will vary (YMMV). Just deploy the Default kernel instead of the Desktop kernel that is installed by default, to remove the workstation-type kernel optimizations.

  • Unlike most distros, openSUSE typically makes everything imaginable accessible from the same few repos, with options to add repos for specific reasons. It does not matter which Desktop you may or may not choose; you can install apps for any Desktop or machine configuration.
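
To check which kernel flavor a node is currently running, here is a minimal Python sketch. It assumes the kernel release string (what `uname -r` prints) ends in the flavor name, for example `-default` or `-desktop`. The `kernel_flavor` helper and the sample release string are illustrative only and do not come from any openSUSE tool.

```python
import platform


def kernel_flavor(release: str) -> str:
    """Return the trailing flavor of a kernel release string.

    Assumption: the release looks like '3.16.6-2-desktop', i.e. the
    flavor is the text after the last hyphen. Returns '' if there is
    no hyphen at all.
    """
    _, sep, flavor = release.rpartition("-")
    return flavor if sep else ""


if __name__ == "__main__":
    release = platform.release()  # same string as `uname -r` on Linux
    flavor = kernel_flavor(release)
    print(f"running kernel {release} (flavor: {flavor or 'unknown'})")
    if flavor == "desktop":
        print("Desktop kernel detected; consider the Default kernel for cluster nodes.")
```

On a distro that names its kernels differently, the reported flavor will simply be whatever follows the last hyphen, so treat the result as a hint rather than a guarantee.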

The two leading solutions for high performance data analysis, mining and search today are the popular Hadoop/Pig/Hive/etc. and Elasticsearch/Logstash/Kibana application stacks.

Personally, I work on an Elasticsearch cluster based almost entirely on openSUSE.
Elasticsearch is a competitive application stack to the industry-standard Hadoop/Pig/Hive stack, with many similarities. Since ES’s launch about 4 years ago, these two stacks have gone head to head, leapfrogging each other by introducing new features and adopting the best from the other.
The most important HPC feature of these two app stacks is that both are based on NoSQL application-level clusters, which means that

  • There is no hard limit to their capacity; to grow, just add another node.
  • There is no complex OS or system-level clustering. All clustering is done at the application level, i.e. node discovery, data and metadata distribution across nodes, data fault tolerance, node failure and recovery, and management in general. For those who have lived through configuring clusters, heartbeats and more, this is an existential dream.
  • Both require Java tuning for high performance, but that is not needed unless you are pushing the performance envelope.
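
To give a feel for what "data distribution at the application level" means, here is a minimal Python sketch of hash-based routing: each document id is hashed to a fixed shard number, and shards are then placed on whatever nodes exist. This is only similar in spirit to the hash-modulo routing described in the Elasticsearch documentation. The md5 hash, the round-robin placement rule, and the node names are all assumptions made for illustration, not what either stack actually runs.

```python
import hashlib


def shard_for(doc_id: str, num_shards: int) -> int:
    """Map a document id to a shard number with a stable hash.

    md5 is a stand-in chosen only because it is deterministic across
    processes (Python's built-in hash() is salted per run).
    """
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def assign_shards(num_shards: int, nodes: list) -> dict:
    """Spread shards over nodes round-robin (hypothetical placement rule)."""
    return {shard: nodes[shard % len(nodes)] for shard in range(num_shards)}


if __name__ == "__main__":
    nodes = ["node-a", "node-b", "node-c"]  # hypothetical node names
    placement = assign_shards(5, nodes)
    for doc_id in ("log-1", "log-2", "log-3"):
        shard = shard_for(doc_id, 5)
        print(f"{doc_id} -> shard {shard} on {placement[shard]}")
```

In this sketch, adding a node only changes where shards live; the document-to-shard mapping stays the same because the shard count is fixed. That is why "just add another node" can work without touching the data model.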

The reasons I use Elasticsearch instead of the traditional Hadoop et al. are

  • No need to learn half a dozen languages, one for each app in the stack. JSON is the standard used for data storage, communication between nodes and configuration.
  • Logstash is a very cool aggregator, parser, router and data transformer.
  • Kibana is a web frontend, although I’ve been looking at other web frontends including the ever-popular Graphite.
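
As a rough illustration of the "parse, then store as JSON" idea behind the stack, here is a minimal Python sketch that turns a raw log line into a structured event and serializes it as JSON. It is not Logstash configuration and does not use any Elasticsearch API. The log format, the field names, and the `parse_failure` tag are assumptions made up for this example.

```python
import json
import re

# Hypothetical log format: "<ISO timestamp> <LEVEL> <message>"
LINE_RE = re.compile(r"^(?P<timestamp>\S+)\s+(?P<level>[A-Z]+)\s+(?P<message>.*)$")


def parse_line(line: str) -> dict:
    """Turn one raw log line into a structured event (a plain dict)."""
    match = LINE_RE.match(line.strip())
    if match is None:
        # Keep unparseable lines, tagged, instead of silently dropping them.
        return {"message": line.strip(), "tags": ["parse_failure"]}
    return match.groupdict()


if __name__ == "__main__":
    event = parse_line("2015-06-01T12:00:00Z ERROR disk /dev/sda1 is 95% full")
    # The JSON string is what would be handed to a document store.
    print(json.dumps(event, sort_keys=True))
```

Because the event is an ordinary JSON object, the same structure can be stored, queried and displayed without translating between formats at each step.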

Unfortunately I haven’t kept my writings on ES on openSUSE up to date; everything is very old and may not work. If you want to read what I’ve posted, though, it might still give you a flavor of the major parts of the stack, how to invoke them (that part is still mostly current), and installation (today I recommend only the repo method). Or you can simply read the most current documentation at https://www.elastic.co/.


For anyone who wants to install Elasticsearch, I have updated my wiki page describing how to install using the elastic repos.

The pages I’ve written about ES may be considered far superior to what you’ll find about openSUSE on the elastic.co website.
Still, there is a tremendous amount of information on the elastic.co website that is not covered by my writings. If you are unable to make anything work, just post your question.