SUSE Linux Enterprise High Performance Computing 15 GA
Release Notes
This document provides guidance and an overview of high-level general features and updates for SUSE Linux Enterprise High Performance Computing 15 GA. It describes the capabilities and limitations of SUSE Linux Enterprise High Performance Computing 15 GA.
These release notes are updated periodically. The latest version is always available at https://www.suse.com/releasenotes. General documentation for SLES 15 can be found at https://www.suse.com/documentation/sles-15. General documentation for SLES and SUSE Linux Enterprise High Performance Computing can be found at http://www.suse.com/documentation/.
- 1 SUSE Linux Enterprise High Performance Computing 15 GA
- 2 Availability
- 3 Support and Life Cycle
- 4 Documentation and Other Information
- 5 How to Obtain Source Code
- 6 Support Statement for SUSE Linux Enterprise High Performance Computing
- 7 Installation and Upgrade
- 8 Functionality
- 9 Updated Packages
- 10 Legal Notices
1 SUSE Linux Enterprise High Performance Computing 15 GA
SUSE Linux Enterprise High Performance Computing is a highly scalable, high performance open-source operating system designed to utilize the power of parallel computing for modeling, simulation and advanced analytics workloads.
SUSE Linux Enterprise High Performance Computing 15 GA provides tools and libraries related to High Performance Computing. Presently, the tools include:
- Workload manager (Slurm)
- Remote and parallel shells
- Performance monitoring and measuring tools
- Serial console monitoring tool
- Cluster power management tool
- Tool to discover the machine hardware topology
- Tool to monitor memory errors
- Tool to determine CPU model and capabilities (x86-64 only)
- User-extensible heap manager capable of distinguishing between different kinds of memory (x86-64 only)
This document only describes features and procedures specific to this module. Make sure to also review the release notes for the base product, which is SUSE Linux Enterprise Server 15 GA at https://www.suse.com/releasenotes/x86_64/SUSE-SLES/15/.
2 Availability
SUSE Linux Enterprise High Performance Computing 15 GA is available for the x86-64 and AArch64 platforms.
3 Support and Life Cycle
SUSE Linux Enterprise High Performance Computing 15 GA is supported throughout the lifecycle of SLE 15 GA. Long Term Support Service is not available. Any release is fully maintained and supported until the availability of the next release.
For more information, see the Support Policy page https://www.suse.com/support/policy.html.
4 Documentation and Other Information
Accessing the documentation on the product media:
- Read the READMEs on the media.
- Get the detailed change log information about a particular package from the RPM (where <FILENAME>.rpm is the name of the RPM): rpm --changelog -qp <FILENAME>.rpm
- Check the ChangeLog file in the top level of the media for a chronological log of all changes made to the updated packages.
These Release Notes are identical across all architectures, and the most recent version is always available online at http://www.suse.com/releasenotes/. Some entries may be listed twice if they are important and belong to more than one section.
5 How to Obtain Source Code
This SUSE product includes materials licensed to SUSE under the GNU General Public License (GPL). The GPL requires SUSE to provide the source code that corresponds to the GPL-licensed material. The source code is available for download at http://www.suse.com/download-linux/source-code.html.
Also, for up to three years after distribution of the SUSE product, upon request, SUSE will mail a copy of the source code. Requests should be sent by e-mail to sle_source_request@suse.com or as otherwise instructed at http://www.suse.com/download-linux/source-code.html. SUSE may charge a reasonable fee to recover distribution costs.
6 Support Statement for SUSE Linux Enterprise High Performance Computing
To receive support, you need an appropriate subscription with SUSE. For more information, see http://www.suse.com/products/server/services-and-support/.
The following definitions apply:
- L1
Problem determination, which means technical support designed to provide compatibility information, usage support, ongoing maintenance, information gathering and basic troubleshooting using available documentation.
- L2
Problem isolation, which means technical support designed to analyze data, reproduce customer problems, isolate the problem area, and provide a resolution for problems not resolved by Level 1, or alternatively prepare for Level 3.
- L3
Problem resolution, which means technical support designed to resolve problems by engaging engineering to resolve product defects which have been identified by Level 2 Support.
For contracted customers and partners, SUSE Linux Enterprise High Performance Computing 15 GA is delivered with L3 support for all packages, except for the following:
- Technology Previews
- sound, graphics, fonts and artwork
- packages that require an additional customer contract
- development packages for libraries which are only delivered with L2 support
SUSE will only support the usage of original (that is, unchanged and un-recompiled) packages.
7 Installation and Upgrade
Since different users may want to use different components of this product, there are presently no preselected HPC packages which will be installed by default.
Refer to the package list below to see which packages are available.
7.1 Installation
This section includes information related to the initial installation of SUSE Linux Enterprise High Performance Computing 15 GA.
7.1.1 System Roles for SUSE Linux Enterprise High Performance Computing
With SLE HPC 15, it is possible to choose specific roles for the system based on modules selected during the installation process. There are three roles available:
- HPC Management Server (Head Node): available when the HPC Module is selected.
- HPC Compute Node: available when the HPC Module is selected.
- HPC Development Node: available when the HPC Module are selected.
8 Functionality
This section comprises information about packages and their functionality, as well as additions, updates, removals and changes to the package layout of software.
8.1 Ganglia ‒ System Monitoring
Ganglia is a scalable distributed monitoring system for high-performance computing systems, such as clusters and grids. It is based on a hierarchical design targeted at federations of clusters.
To use Ganglia, install ganglia-gmetad on the management server, then start the Ganglia meta-daemon: rcgmetad start. To make sure the service is started after a reboot, run: systemctl enable gmetad. On each cluster node that you want to monitor, install ganglia-gmond, start the service with rcgmond start, and make sure it is started automatically after a reboot: systemctl enable gmond. To test whether the gmond daemon has connected to the meta-daemon, run gstat -a and check that each node to be monitored is present in the output.
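The steps above can be sketched as a small Bash script. This is a minimal sketch, not an official SUSE tool: the DRY_RUN switch and the helper names (run, setup_management_server, setup_monitored_node, check_nodes_present) are hypothetical, systemctl start is used in place of the rcgmetad and rcgmond shortcuts, and by default the commands are only printed.

```bash
#!/usr/bin/env bash
# Sketch of the Ganglia setup steps described above (not an official tool).
# DRY_RUN=1 (the default here) only prints commands; DRY_RUN=0 runs them.
DRY_RUN="${DRY_RUN:-1}"

# Print or execute a command, depending on DRY_RUN.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Management server: install, start and enable the Ganglia meta-daemon.
setup_management_server() {
    run zypper --non-interactive install ganglia-gmetad
    run systemctl start gmetad
    run systemctl enable gmetad
}

# Each monitored node: install, start and enable gmond.
setup_monitored_node() {
    run zypper --non-interactive install ganglia-gmond
    run systemctl start gmond
    run systemctl enable gmond
}

# Succeed only if every node name given after $1 appears as a word in $1,
# where $1 is the captured output of "gstat -a". Prints missing nodes.
check_nodes_present() {
    local output="$1" node missing=0
    shift
    for node in "$@"; do
        if ! grep -qw -- "$node" <<< "$output"; then
            echo "missing: $node"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, run DRY_RUN=0 setup_management_server as root on the management server, and afterwards check_nodes_present "$(gstat -a)" node01 node02 (node names hypothetical). The check only assumes that each host name appears somewhere in the gstat output, not any particular output format.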
When using the Btrfs file system, the monitoring data will be lost after a rollback and the service gmetad will not start again. To be able to start it again, either install the package ganglia-gmetad-skip-bcheck or create the file /etc/ganglia/no_btrfs_check.
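One way to apply the marker-file workaround only where it is needed is to check the file system type first. This sketch assumes that gmetad keeps its data below /var/lib/ganglia (the GANGLIA_DATA_DIR default is an assumption) and that GNU stat is available; the helper names and the DRY_RUN switch are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: create the no_btrfs_check marker only when it is actually needed.
# Assumptions: gmetad data lives below /var/lib/ganglia (GANGLIA_DATA_DIR),
# GNU stat is available, and DRY_RUN=1 (default here) only prints commands.
DRY_RUN="${DRY_RUN:-1}"
GANGLIA_DATA_DIR="${GANGLIA_DATA_DIR:-/var/lib/ganglia}"
MARKER_FILE="/etc/ganglia/no_btrfs_check"

# Print or execute a command, depending on DRY_RUN.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Print the file system type of a path ("btrfs" for Btrfs).
fs_type() {
    stat -f -c %T "${1:-$GANGLIA_DATA_DIR}"
}

# maybe_skip_btrfs_check TYPE: create the marker file if TYPE is btrfs.
maybe_skip_btrfs_check() {
    if [ "$1" = "btrfs" ]; then
        run touch "$MARKER_FILE"
    else
        echo "file system is $1, no marker needed"
    fi
}
```

As root, DRY_RUN=0 maybe_skip_btrfs_check "$(fs_type)" creates the marker only on Btrfs. Installing ganglia-gmetad-skip-bcheck remains the packaged alternative.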
To use the Ganglia Web interface, first add the "Web and Scripting Module". This can be done by running SUSEConnect -p sle-module-web-scripting/15/x86_64.
Install ganglia-web on the management server. Depending on which PHP version is used (default is PHP 5), enable it in Apache2: a2enmod php5 or a2enmod php7. Then start Apache2 on this machine with rcapache2 start and make sure it is started automatically after a reboot: systemctl enable apache2. The Ganglia Web interface should be accessible from http://<management_server>/ganglia.
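The Web interface steps can be collected in one function, sketched below. The module version 15 is an assumption chosen to match the SLE 15 base product (override SLE_VERSION if needed), the architecture defaults to the output of uname -m, and the function name and DRY_RUN switch are hypothetical. SUSEConnect also requires a registered system.

```bash
#!/usr/bin/env bash
# Sketch of the Ganglia Web interface setup described above.
# Assumptions: SLE_VERSION defaults to 15 (the SLE 15 base product), and
# DRY_RUN=1 (the default here) only prints the commands.
DRY_RUN="${DRY_RUN:-1}"
SLE_VERSION="${SLE_VERSION:-15}"

# Print or execute a command, depending on DRY_RUN.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# $1: PHP major version (5 or 7); $2: architecture (defaults to uname -m).
setup_ganglia_web() {
    local php="$1" arch="${2:-$(uname -m)}"
    case "$php" in
        5|7) ;;
        *) echo "unsupported PHP version: $php" >&2; return 1 ;;
    esac
    run SUSEConnect -p "sle-module-web-scripting/${SLE_VERSION}/${arch}"
    run zypper --non-interactive install ganglia-web
    run a2enmod "php${php}"
    run systemctl start apache2
    run systemctl enable apache2
}
```

On the management server, DRY_RUN=0 setup_ganglia_web 7 would register the module, install ganglia-web and enable the php7 Apache module before starting Apache2.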
8.2 pdsh host-list Plug-ins with Conflicting Options Packaged Separately
Some host-list plugins for pdsh have conflicting options. Such options are passed to whichever plugin is found first, so the behavior is not well defined.
Each pdsh host-list plugin has now been packaged separately, and packages containing plugins with conflicting options can no longer be installed simultaneously. Users who have been using the machines, slurm, netgroup, or dshgroup plugin now need to install these separately. To do so, run:
- zypper in pdsh-machines to install the machines plugin
- zypper in pdsh-slurm to install the slurm plugin
- zypper in pdsh-netgroup to install the netgroup plugin
- zypper in pdsh-dshgroup to install the dshgroup plugin
- zypper in pdsh-genders to install the genders plugin
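Installing only the plugins you actually use can be scripted as below. This sketch merely wraps the zypper commands listed above; the function name and the DRY_RUN switch are hypothetical, and zypper should report a conflict if you request plugin packages that cannot be installed together.

```bash
#!/usr/bin/env bash
# Sketch: install only the pdsh host-list plugins that are actually needed.
# DRY_RUN=1 (the default here) only prints the zypper commands.
DRY_RUN="${DRY_RUN:-1}"

# Print or execute a command, depending on DRY_RUN.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Each argument is a plugin name: machines, slurm, netgroup, dshgroup, genders.
install_pdsh_plugins() {
    local plugin
    for plugin in "$@"; do
        case "$plugin" in
            machines|slurm|netgroup|dshgroup|genders)
                run zypper --non-interactive install "pdsh-${plugin}"
                ;;
            *)
                echo "unknown pdsh host-list plugin: $plugin" >&2
                return 1
                ;;
        esac
    done
}
```

For example, DRY_RUN=0 install_pdsh_plugins slurm restores the slurm plugin after the update.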
8.3 Support for Genders Static Cluster Configuration Database
Support for Genders has been added to the HPC module.
Genders is a static cluster configuration database used for configuration management. It allows grouping and addressing sets of hosts by attributes and is used by a variety of tools. The Genders database is a text file which is usually replicated on each node in a cluster.
Perl, Python, C, and C++ bindings are supplied with Genders; the respective packages provide man pages or other documentation describing the APIs.
To create the Genders database, follow the instructions and examples in /etc/genders and check /usr/share/doc/packages/genders-base/TUTORIAL.
A configuration can be tested with nodeattr (for more information, see man 1 nodeattr).
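To illustrate the database format, the sketch below writes a small Genders file: one node per line, followed by a comma-separated list of attributes, optionally with values. All node names and attributes (mgmt, compute, gmond, rack) are hypothetical placeholders, and the final check assumes the nodeattr -k option (check the database for errors) as documented in man 1 nodeattr; it is skipped if nodeattr is not installed.

```bash
#!/usr/bin/env bash
# Sketch: write a small example Genders database and, if nodeattr is
# installed, check it for parse errors. All node names and attributes
# below are hypothetical placeholders.
write_genders_file() {
    local path="${1:?usage: write_genders_file PATH}"
    cat > "$path" <<'EOF'
# Format: <node> <attr>[=<value>][,<attr>[=<value>]...]
mgmt01  mgmt,gmetad
node01  compute,gmond,rack=a
node02  compute,gmond,rack=a
node03  compute,gmond,rack=b
EOF
    # nodeattr -k checks the database for errors (see man 1 nodeattr).
    if command -v nodeattr >/dev/null 2>&1; then
        nodeattr -f "$path" -k
    fi
}
```

Writing to a scratch file first (for example write_genders_file /tmp/genders.test) lets you query it with nodeattr -f /tmp/genders.test -c compute, which should list the nodes carrying the compute attribute, before copying it to /etc/genders.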
List of packages:
- genders
- genders-base
- genders-devel
- python-genders
- genders-perl-compat
- libgenders0
- libgendersplusplus2
9 Updated Packages
9.1 Support for Genders in pdsh
Since Genders has been added to the HPC module, the genders plugin for pdsh is now supported.
At the same time, all host-list plugins for pdsh have been packaged separately to avoid conflicts due to identical options. Host-list plugins are no longer installed automatically. If, for instance, the slurm plugin has been used so far, it must be installed separately after the update.
9.2 Lmod Has Been Updated to Version 7.6
Lmod (package lua-lmod) has been updated to version 7.6. This is the minimum version required to work with the SUSE-supplied HPC libraries.
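To check whether an installed Lmod meets the 7.6 minimum, a version comparison such as the following can be used. It is a sketch that relies on GNU sort -V for version ordering and reads the lua-lmod package version with rpm; the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: check that Lmod is at least the minimum version named above.
MIN_LMOD_VERSION="7.6"

# version_ge A B: succeed if version A >= version B (GNU sort -V ordering).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Print the installed version of the lua-lmod package.
installed_lmod_version() {
    rpm -q --queryformat '%{VERSION}\n' lua-lmod
}

# lmod_is_recent_enough VERSION: succeed if VERSION >= MIN_LMOD_VERSION.
lmod_is_recent_enough() {
    version_ge "$1" "$MIN_LMOD_VERSION"
}
```

On an installed system, lmod_is_recent_enough "$(installed_lmod_version)" && echo "Lmod is recent enough" performs the check. sort -V is used because a plain string comparison would order 7.10 before 7.6.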
9.3 Support for Intel Knights Mill CPUs in cpuid
cpuid has been updated to support Intel Knights Mill CPUs (x86-64).
9.4 pdsh Has Been Updated to Version 2.33
pdsh has been updated to version 2.33. For more information on the update, see the package change log.
9.5 ConMan Has Been Updated to Version 0.2.8
ConMan has been updated to version 0.2.8. For more information about the update, see the package change log.
9.6 Slurm Has Been Updated to Version 17.02.9
Slurm has been updated to version 17.02.9. This update is recommended as it contains a security fix for CVE-2017-15566. For more information about the update, see the package change log.
With this version, the libraries libslurm and libslurmdb have been split from the slurm base package. This makes it possible to keep older versions of these libraries installed.
Together with the updated version, the deprecated package slurm-sched-wiki has been removed. This package was only relevant in connection with the MOAB and MAUI schedulers, which were never shipped with SUSE Linux Enterprise.
The subpackage slurm-torque has been newly introduced. It provides a Torque-like set of commands for Slurm, intended for users switching from Torque.
When updating Slurm, the configuration file needs to be updated. In /etc/slurm/slurm.conf, set:
SlurmctldPidFile=/var/run/slurm/slurmctld.pid
SlurmdPidFile=/var/run/slurm/slurmd.pid
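The two settings can also be applied with a script. This sketch replaces existing uncommented SlurmctldPidFile and SlurmdPidFile lines, or appends them if they are missing, after saving a .bak copy. It assumes GNU sed and that the keys are spelled exactly as shown above; the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: apply the slurm.conf PID file settings required after the update.
# Assumes GNU sed and that keys are spelled exactly as in the release notes.

# set_conf_key FILE KEY VALUE: replace an uncommented KEY=... line,
# or append KEY=VALUE if no such line exists.
set_conf_key() {
    local conf="$1" key="$2" value="$3"
    if grep -q "^[[:space:]]*${key}[[:space:]]*=" "$conf"; then
        sed -i "s|^[[:space:]]*${key}[[:space:]]*=.*|${key}=${value}|" "$conf"
    else
        # Make sure the file ends with a newline before appending.
        if [ -s "$conf" ] && [ -n "$(tail -c 1 "$conf")" ]; then
            echo >> "$conf"
        fi
        printf '%s=%s\n' "$key" "$value" >> "$conf"
    fi
}

# update_slurm_pidfiles [FILE]: back up FILE to FILE.bak, then set both keys.
update_slurm_pidfiles() {
    local conf="${1:-/etc/slurm/slurm.conf}"
    cp -p -- "$conf" "$conf.bak" || return 1
    set_conf_key "$conf" SlurmctldPidFile /var/run/slurm/slurmctld.pid
    set_conf_key "$conf" SlurmdPidFile /var/run/slurm/slurmd.pid
}
```

Run update_slurm_pidfiles as root (the default path is /etc/slurm/slurm.conf), then compare the result with slurm.conf.bak before restarting the Slurm daemons.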
10 Legal Notices
SUSE makes no representations or warranties with respect to the contents or use of this documentation, and specifically disclaims any express or implied warranties of merchantability or fitness for any particular purpose. Further, SUSE reserves the right to revise this publication and to make changes to its content, at any time, without the obligation to notify any person or entity of such revisions or changes.
Further, SUSE makes no representations or warranties with respect to any software, and specifically disclaims any express or implied warranties of merchantability or fitness for any particular purpose. Further, SUSE reserves the right to make changes to any and all parts of SUSE software, at any time, without any obligation to notify any person or entity of such changes.
Any products or technical information provided under this Agreement may be subject to U.S. export controls and the trade laws of other countries. You agree to comply with all export control regulations and to obtain any required licenses or classifications to export, re-export, or import deliverables. You agree not to export or re-export to entities on the current U.S. export exclusion lists or to any embargoed or terrorist countries as specified in U.S. export laws. You agree to not use deliverables for prohibited nuclear, missile, or chemical/biological weaponry end uses. Refer to http://www.suse.com/company/legal/ for more information on exporting SUSE software. SUSE assumes no responsibility for your failure to obtain any necessary export approvals.
Copyright © 2010-2018 SUSE LLC. This release notes document is licensed under a Creative Commons Attribution-NoDerivs 3.0 United States License (CC-BY-ND-3.0 US, http://creativecommons.org/licenses/by-nd/3.0/us/).
SUSE has intellectual property rights relating to technology embodied in the product that is described in this document. In particular, and without limitation, these intellectual property rights may include one or more of the U.S. patents listed at http://www.suse.com/company/legal/ and one or more additional patents or pending patent applications in the U.S. and other countries.
For SUSE trademarks, see SUSE Trademark and Service Mark list (http://www.suse.com/company/legal/). All third-party trademarks are the property of their respective owners.