SUSE Linux Enterprise High Performance Computing 15 GA

Release Notes

This document provides guidance on, and an overview of, the high-level features and updates in SUSE Linux Enterprise High Performance Computing 15 GA. It also describes the capabilities and limitations of the product.

These release notes are updated periodically. The latest version is always available at https://www.suse.com/releasenotes. General documentation can be found at: https://www.suse.com/documentation/sles-15.

General documentation for SLES and SUSE Linux Enterprise High Performance Computing can be found at: http://www.suse.com/documentation/.

Publication Date: 2018-06-05, Version: 15.20180605

1 SUSE Linux Enterprise High Performance Computing 15 GA

SUSE Linux Enterprise High Performance Computing is a highly scalable, high performance open-source operating system designed to utilize the power of parallel computing for modeling, simulation and advanced analytics workloads.

SUSE Linux Enterprise High Performance Computing 15 GA provides tools and libraries related to High Performance Computing. Presently, the tools include:

  • Workload manager (Slurm)

  • Remote and parallel shells

  • Performance monitoring and measuring tools

  • Serial console monitoring tool

  • Cluster power management tool

  • Tool to discover the machine hardware topology

  • Tool to monitor memory errors

  • Tool to determine CPU model and capabilities (x86-64 only)

  • User extensible heap manager capable of distinguishing between different kinds of memory (x86-64 only)

This document only describes features and procedures specific to this module. Make sure to also review the release notes for the base product, which is SUSE Linux Enterprise Server 15 GA at https://www.suse.com/releasenotes/x86_64/SUSE-SLES/15/.

2 Availability

SUSE Linux Enterprise High Performance Computing 15 GA is available for the x86-64 and AArch64 platforms.

3 Support and Life Cycle

SUSE Linux Enterprise High Performance Computing 15 GA is supported throughout the lifecycle of SLE 15 GA. Long Term Support Service is not available. Any release is fully maintained and supported until the availability of the next release.

For more information, see the Support Policy page https://www.suse.com/support/policy.html.

4 Documentation and Other Information

Accessing the documentation on the product media:

  • Read the READMEs on the media.

  • Get the detailed change log information about a particular package from the RPM (where <FILENAME>.rpm is the name of the RPM):

    rpm --changelog -qp <FILENAME>.rpm

  • Check the ChangeLog file in the top level of the media for a chronological log of all changes made to the updated packages.

  • These Release Notes are identical across all architectures, and the most recent version is always available online at http://www.suse.com/releasenotes/. Some entries may be listed twice, if they are important and belong to more than one section.

5 How to Obtain Source Code

This SUSE product includes materials licensed to SUSE under the GNU General Public License (GPL). The GPL requires SUSE to provide the source code that corresponds to the GPL-licensed material. The source code is available for download at http://www.suse.com/download-linux/source-code.html.

Also, for up to three years after distribution of the SUSE product, upon request, SUSE will mail a copy of the source code. Requests should be sent by e-mail to mailto:sle_source_request@suse.com or as otherwise instructed at http://www.suse.com/download-linux/source-code.html. SUSE may charge a reasonable fee to recover distribution costs.

6 Support Statement for SUSE Linux Enterprise High Performance Computing

To receive support, you need an appropriate subscription with SUSE. For more information, see http://www.suse.com/products/server/services-and-support/.

The following definitions apply:

L1

Problem determination, which means technical support designed to provide compatibility information, usage support, ongoing maintenance, information gathering and basic troubleshooting using available documentation.

L2

Problem isolation, which means technical support designed to analyze data, reproduce customer problems, isolate problem area and provide a resolution for problems not resolved by Level 1 or alternatively prepare for Level 3.

L3

Problem resolution, which means technical support designed to resolve problems by engaging engineering to resolve product defects which have been identified by Level 2 Support.

For contracted customers and partners, the SUSE Linux Enterprise High Performance Computing 15 GA is delivered with L3 support for all packages, except the following:

  • Technology Previews

  • sound, graphics, fonts and artwork

  • packages that require an additional customer contract

  • development packages for libraries which are only delivered with L2 support

SUSE will only support the usage of original (that is, unchanged and un-recompiled) packages.

7 Installation and Upgrade

Since different users may want to use different components of this product, there are presently no preselected HPC packages which will be installed by default.

For the packages that are available, refer to the package list below.

7.1 Installation

This section includes information related to the initial installation of the SUSE Linux Enterprise High Performance Computing 15 GA.

7.1.1 System Roles for SUSE Linux Enterprise for High Performance Computing

With SLE HPC 15, it is possible to choose specific roles for the system based on modules selected during the installation process. There are three roles available:

  • HPC Management Server (Head Node): available when the HPC Module is selected.

  • HPC Compute Node: available when the HPC Module is selected.

  • HPC Development Node: available when the HPC Module is selected.

8 Functionality

This section comprises information about packages and their functionality, as well as additions, updates, removals and changes to the package layout of software.

8.1 Ganglia ‒ System Monitoring

Ganglia is a scalable distributed monitoring system for high-performance computing systems, such as clusters and grids. It is based on a hierarchical design targeted at federations of clusters.

To use Ganglia, install ganglia-gmetad on the management server, then start the Ganglia meta-daemon: rcgmetad start. To make sure the service is started after a reboot, run: systemctl enable gmetad. On each cluster node that you want to monitor, install ganglia-gmond, start the service with rcgmond start, and make sure it is started automatically after a reboot: systemctl enable gmond. To test whether the gmond daemon has connected to the meta-daemon, run gstat -a and check that each node to be monitored is present in the output.
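
The steps above can be sketched as a small shell script. This is a minimal sketch, not a SUSE-provided tool: the function names and the DRY_RUN variable are hypothetical, and it uses systemctl start instead of the rc* shortcuts on the assumption that both start the same service. By default, the commands are only printed.

```shell
#!/bin/sh
# Sketch only. Commands are printed unless DRY_RUN=0 is set
# (DRY_RUN is a hypothetical variable used by this example).
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Management server: install, start, and enable the meta-daemon.
setup_gmetad() {
    run zypper in ganglia-gmetad
    run systemctl start gmetad
    run systemctl enable gmetad
}

# Each monitored cluster node: install, start, and enable gmond.
setup_gmond() {
    run zypper in ganglia-gmond
    run systemctl start gmond
    run systemctl enable gmond
}

# Check that every node to be monitored appears in the output.
check_nodes() {
    run gstat -a
}
```

After sourcing the file, call setup_gmetad on the management server and setup_gmond on each node to be monitored. Set DRY_RUN=0 beforehand, and run as root, to execute the commands instead of printing them.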

When using the Btrfs file system, the monitoring data will be lost after a rollback and the service gmetad will not start again. To be able to start it again, either install the package ganglia-gmetad-skip-bcheck or create the file /etc/ganglia/no_btrfs_check.

To use the Ganglia Web interface, you must first add the "Web and Scripting Module". This can be done by running SUSEConnect -p sle-module-web-scripting/12/x86_64. Then install ganglia-web on the management server. Depending on which PHP version is used (default is PHP 5), enable it in Apache2: a2enmod php5 or a2enmod php7. Start Apache2 on this machine with rcapache2 start and make sure it is started automatically after a reboot: systemctl enable apache2. The Ganglia Web interface should then be accessible at http://<management_server>/ganglia.

8.2 pdsh host-list Plug-ins with Conflicting Options Packaged Separately

Some host-list plugins for pdsh have conflicting options. Such an option is passed to the first plugin found, and the order in which plugins are found is not well defined.

Each pdsh host-list plugin has been packaged separately. Packages containing plugins with conflicting options can no longer be installed simultaneously. Users who have been using the machines, slurm, netgroup, or dshgroup plugin now need to install it separately. To do so, run:

  • zypper in pdsh-machines to install the machines plugin

  • zypper in pdsh-slurm to install the slurm plugin

  • zypper in pdsh-netgroup to install the netgroup plugin

  • zypper in pdsh-dshgroup to install the dshgroup plugin

  • zypper in pdsh-genders to install the genders plugin
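
The mapping from plugin name to package name can be sketched as a small helper. This is an illustrative sketch only: pdsh_plugin_package is a hypothetical function, and it assumes that every host-list plugin package follows the pdsh-<plugin> naming pattern.

```shell
#!/bin/sh
# Sketch: map a pdsh host-list plugin name to its package name.
# Assumes the pdsh-<plugin> naming pattern for all five plugins.
pdsh_plugin_package() {
    case "$1" in
        machines|slurm|netgroup|dshgroup|genders)
            echo "pdsh-$1"
            ;;
        *)
            echo "unknown pdsh host-list plugin: $1" >&2
            return 1
            ;;
    esac
}
```

For example, zypper in "$(pdsh_plugin_package slurm)" would install the slurm plugin. Rejecting unknown names early avoids passing a mistyped package name to zypper.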

8.3 Support for Genders Static Cluster Configuration Database

Support for Genders has been added to the HPC module.

Genders is a static cluster configuration database used for configuration management. It allows grouping and addressing sets of hosts by attributes and is used by a variety of tools. The Genders database is a text file which is usually replicated on each node in a cluster.

Perl, Python, C, and C++ bindings are supplied with Genders. The respective packages provide man pages or other documentation describing the APIs.

To create the Genders database, follow the instructions and examples in /etc/genders and check /usr/share/doc/packages/genders-base/TUTORIAL. Testing a configuration can be done with nodeattr (for more information, see man 1 nodeattr).
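
As a rough illustration of what such a database can look like, the following sketch writes a small example file to a scratch location and queries it with nodeattr if that tool is installed. The host names and attributes are hypothetical, and GENDERS_FILE is a variable introduced for this example only. For the authoritative format, refer to the files named above.

```shell
#!/bin/sh
# Sketch: write a small example Genders database to a scratch file.
# Host names and attributes below are hypothetical.
GENDERS_FILE="${GENDERS_FILE:-$(mktemp)}"

cat > "$GENDERS_FILE" <<'EOF'
# hostname(s)    attributes
mgmt01           mgmt
node[01-04]      compute,rack=r1
EOF

# Query the scratch file only if nodeattr is installed.
if command -v nodeattr >/dev/null 2>&1; then
    nodeattr -f "$GENDERS_FILE" -k          # check the file for parse errors
    nodeattr -f "$GENDERS_FILE" -q compute  # list hosts with attribute "compute"
fi
```

Working on a scratch file first makes it possible to test a configuration before copying it to /etc/genders and replicating it to the nodes.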

List of packages:

  • genders

  • genders-base

  • genders-devel

  • python-genders

  • genders-perl-compat

  • libgenders0

  • libgendersplusplus2

9 Updated Packages

9.1 Support for Genders in pdsh

Since Genders has been added to the HPC module, the genders plugin for pdsh is now supported.

At the same time, all host-list plugins to pdsh have been packaged separately to avoid conflicts due to identical options.

Host list plugins are no longer installed automatically. If, for instance, the slurm plugin has been used so far, it must be installed separately after the update.

9.2 Lmod Has Been Updated to Version 7.6

Lmod (package lua-lmod) has been updated to version 7.6. This is the minimum version required to work with the SUSE-supplied HPC libraries.

9.3 Support for Intel Knights Mill CPUs in cpuid

cpuid has been updated to support Intel Knights Mill CPUs (x86-64).

9.4 pdsh Has Been Updated to Version 2.33

pdsh has been updated to version 2.33. For more information about the update, see the package change log.

9.5 ConMan Has Been Updated to Version 0.2.8

ConMan has been updated to version 0.2.8. For more information about the update, see the package change log.

9.6 Slurm Has Been Updated to Version 17.02.9

Slurm has been updated to version 17.02.9. This update is recommended as it contains a security update to fix CVE-2017-15566. For more information about the update, see the package change log.

To make it possible to keep older versions of these libraries installed, libslurm and libslurmdb have been split from the slurm base package in this version.

Together with the updated version, the deprecated package slurm-sched-wiki has been removed. This package was only relevant in connection with the MOAB and MAUI schedulers which were never shipped with SUSE Linux Enterprise.

The subpackage slurm-torque has been newly introduced: It provides a Torque-like set of commands to Slurm for users switching from Torque.

When updating Slurm, the configuration file needs to be updated. In /etc/slurm/slurm.conf, set:

    SlurmctldPidFile=/var/run/slurm/slurmctld.pid
    SlurmdPidFile=/var/run/slurm/slurmd.pid
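
The configuration change above could be applied with a short script along the following lines. This is a minimal sketch under stated assumptions: the helper names set_conf_key and update_slurm_pidfiles are hypothetical, GNU sed is assumed for the -i option, and only keys spelled exactly as shown at the start of a line are matched.

```shell
#!/bin/sh
# Sketch: set a "Key=value" line in a slurm.conf-style file,
# replacing an existing entry or appending a new one.
# Assumes GNU sed (-i) and keys written exactly as given.
set_conf_key() {
    file="$1"; key="$2"; value="$3"
    if grep -q "^${key}=" "$file"; then
        # "|" is used as delimiter because the values contain "/".
        sed -i "s|^${key}=.*|${key}=${value}|" "$file"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# Apply both PID file settings; the path defaults to the file named above.
update_slurm_pidfiles() {
    conf="${1:-/etc/slurm/slurm.conf}"
    set_conf_key "$conf" SlurmctldPidFile /var/run/slurm/slurmctld.pid
    set_conf_key "$conf" SlurmdPidFile /var/run/slurm/slurmd.pid
}
```

It is advisable to run update_slurm_pidfiles against a copy of the configuration file first and to compare the result with the original before replacing it.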
