NERSC Weekly Email, Week of February 21, 2022

Contents

Summary of Upcoming Events and Key Dates

        February 2022
 Su  Mo  Tu  We  Th  Fr  Sa
          1   2   3   4   5
  6   7   8   9  10  11  12
 13  14  15  16  17  18  19   
 20 *21* 22 *23* 24  25  26   21 Feb         Presidents Day Holiday [1]
                              23 Feb         NVIDIA Perf Tools Training [2]
 27  28  

         March 2022
 Su  Mo  Tu  We  Th  Fr  Sa
         *1*  2   3   4   5    1 Mar         SYCL on Perlmutter Training [3]
                               1 Mar         ATPESC Applications Due [4]
  6   7   8  *9* 10  11  12    9 Mar         IDEAS-ECP Monthly Webinar [5]
 13  14  15 *16* 17  18  19   16 Mar         Cori Monthly Maintenance [6]
 20  21  22  23  24  25  26   
 27  28  29  30  31

         April 2022
 Su  Mo  Tu  We  Th  Fr  Sa
                      1   2
  3   4   5   6   7   8   9
 10  11  12  13  14  15  16
 17  18  19 *20* 21  22  23   20 Apr         Cori Monthly Maintenance [7]
 24  25  26  27  28  29  30
  1. February 21, 2022: Presidents Day Holiday (No Consulting or Account Support)
  2. February 23, 2022: NVIDIA Performance Tools Training
  3. March 1, 2022: Programming with SYCL on Perlmutter Training
  4. March 1, 2022: ATPESC Applications Due
  5. March 9, 2022: IDEAS-ECP Monthly Webinar
  6. March 16, 2022: Cori Monthly Maintenance (OS & Programming Environment Upgrade)
  7. April 20, 2022: Cori Monthly Maintenance
  8. All times are in the Pacific time zone
  • Upcoming Planned Outage Dates (see Outages section for more details)

    • Tuesday: Perlmutter
    • Wednesday: HPSS Archive (User)
  • Other Significant Dates

    • May 18, June 15, & July 20, 2022: Cori Monthly Maintenance Window
    • May 30, 2022: Memorial Day Holiday (No Consulting or Account Support)
    • June 20, 2022: Juneteenth Holiday (No Consulting or Account Support)
    • July 4, 2022: Independence Day Holiday (No Consulting or Account Support)
    • September 5, 2022: Labor Day Holiday (No Consulting or Account Support)
    • November 24-25, 2022: Thanksgiving Holiday (No Consulting or Account Support)
    • December 23, 2022-January 2, 2023: Winter Shutdown (Limited Consulting and Account Support)

(back to top)


NERSC Status

NERSC Operations Continue, with Minimal Changes

Berkeley Lab, where NERSC is located, is operating under public health restrictions. NERSC continues to remain open while following site-specific protection plans. We remain in operation, with the majority of NERSC staff working remotely, and staff essential to operations onsite. We do not expect significant changes to our operations in the next few months.

You can continue to expect regular online consulting and account support, as well as schedulable online appointments. Trainings continue to be held online. Regular maintenance on the systems continues while we minimize onsite staff presence, which could result in longer downtimes than would occur under normal circumstances.

Because onsite staffing is so minimal, we request that you continue to refrain from calling NERSC Operations except to report urgent system issues.

For current NERSC systems status, please see the online MOTD and current known issues webpages.

(back to top)


This Week's Events and Deadlines

Presidents Day Holiday Today; No Consulting or Account Support

Consulting and account support will be unavailable today, due to the Berkeley Lab-observed Presidents Day holiday. Regular consulting and account support services will resume tomorrow.

NVIDIA Performance Tools for A100 GPU Systems Training Wednesday

As part of the ALCF Developer Sessions, there will be a training this Wednesday, February 23, on NVIDIA Performance Tools for A100 GPU Systems. NVIDIA Developer Tools are available for detailed performance analysis of HPC applications running on NVIDIA A100 GPU-based systems, such as Perlmutter and ALCF's ThetaGPU. Learn to use Nsight Systems for a system-wide visualization of an application's performance, and Nsight Compute for interactive kernel profiling. In this session, several use cases of Nsight Systems and Nsight Compute will be presented via a demo with simple HPC benchmarks on ThetaGPU.

For more information please see https://www.nersc.gov/users/training/events/nvidia-performance-tools-for-a100-gpus-feb2022.

(back to top)


Perlmutter

Perlmutter Machine Status

The initial phase of the Perlmutter supercomputer is in the NERSC machine room, running user jobs.

We have added many early users to the machine and hope to add more soon. Anyone interested in using Perlmutter may apply using the Perlmutter Access Request Form.

The second phase of the machine, consisting of CPU-only nodes, will begin arriving next month. After all the new nodes arrive, all of Perlmutter will be taken out of service and integrated over a period that we anticipate could take up to 8 weeks. We are developing an integration plan that will reduce the amount of time the entire system is down, and we will let you know when that plan is finalized.

This newsletter item will be updated each week with the latest Perlmutter status.

Reduction in Perlmutter Node Availability during Cooling System Physical Maintenance

The Perlmutter Phase 1 system, which is currently in its early user pre-production stage, requires physical maintenance of the cooling system that will take up to 6 weeks to complete. Rather than shut down the entire machine, NERSC will perform the maintenance in a rolling fashion with the aim of keeping 500 or more nodes available to users. Occasionally, some jobs may see decreased GPU performance during this time. We will try to keep as much of the system available as possible, but please understand that Perlmutter is not yet a production resource with any uptime guarantees.

(back to top)


Updates at NERSC

Need Help? Check out NERSC Documentation, Send in a Ticket or Consult Your Peers!

Are you confused about setting up your MFA token? Is there something not quite right with your job script that causes the job submission filter to reject it? Are you struggling to understand the performance of your code on the KNL nodes?

There are many ways that you can get help with issues at NERSC:

  • First, we recommend the NERSC documentation (https://docs.nersc.gov/). Usually the answers for simpler issues, such as setting up your MFA token using Google Authenticator, can be found there. (The answers to more complex issues can be found in the documentation too!)
  • For more complicated issues, or issues that leave you unable to work, submitting a ticket is a good way to get help fast. NERSC's consulting team will respond within four business hours (8 am - 5 pm, Monday through Friday, except holidays). To submit a ticket, log in to https://help.nersc.gov (or, if the issue prevents you from logging in, send an email to [email protected]).
  • For queries that might require some back-and-forth, NERSC provides an appointment service. Sign up for an appointment on a variety of topics, including "NERSC 101", KNL Optimization, Containers at NERSC, NERSC File Systems, GPU Basics, GPUs in Python, and Checkpoint/Restart.
  • The NERSC Users Group Slack, while not an official channel for help, is a place where NERSC users often answer each other's questions, such as whether anyone else is seeing something strange, or how to get better job throughput. You can join the NUG Slack by following this link (login required).
  • Sometimes, a colleague can figure out the issue faster than NERSC, because they already understand your workflow. It's possible that they know what flag you need to add to your Makefile for better performance, or how to set up your job submission script just so.

Cori OS Upgrade and New Default Environment in March 2022

In order to remain in compliance with minimum requirements for support from HPE/Cray, Cori will undergo an operating system (OS) upgrade during the scheduled maintenance on March 16, 2022.

At that time, we will also update the default user programming environment on Cori for the first time since January 2020. The default Cray Developer Toolkit (CDT) version will change from 19.11 to 22.02, and the default Intel compiler will change from version 19.0.3.199 to 19.1.2.254. A detailed list of software changes (including cray-mpich, cray-libsci, cray-netcdf, cray-hdf5, gcc, cce, intel, perftools, etc.) can be found here. NERSC-supported software will be updated to be compatible with the new OS and CDT. Users will need to relink all statically compiled codes, and we highly recommend rebuilding all of your applications.
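For users unsure where to start after the upgrade, a minimal rebuild sketch follows. The module names are illustrative assumptions based on the versions above; confirm the exact names with `module avail` on Cori after the March 16 maintenance.

```shell
# Illustrative rebuild steps after the March 16 upgrade. The module names
# below are assumptions -- verify with `module avail cdt` and `module avail intel`.
module load cdt/22.02          # new default Cray Developer Toolkit
module load intel/19.1.2.254   # new default Intel compiler

# Statically linked codes must at minimum be relinked; a clean rebuild is
# the safer option and is what NERSC recommends.
make clean
make
```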

E4S Updates

E4S Versions 20.10 & 21.02 Will Be Deprecated upon Cori OS Upgrade in March

Due to the upgrade of the operating system on Cori in March, the two earliest E4S versions on Cori, 20.10 and 21.02, will be deprecated at that time. The module files for these versions have been updated to inform you of this change. We encourage you to start using newer versions of E4S at this time.

E4S Version 21.11 Available on Perlmutter

We are pleased to announce that the E4S/21.11 software stack has been rebuilt for Perlmutter using GCC version 9.3.0 and NVIDIA version 21.9. We have deployed a subset of the most commonly used elements of the software stack. It is accessible via module load e4s/21.11-tcl or module load e4s/21.11-lmod. Both point to the same spack instance but employ two different types of module trees.
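As a usage sketch (the `spack find` step is illustrative; the exact workflow is described in the E4S documentation referenced in this item):

```shell
# Load the E4S 21.11 stack via the TCL module tree; e4s/21.11-lmod points to
# the same spack instance but uses an Lmod-style module hierarchy.
module load e4s/21.11-tcl

# After loading, the stack's packages are exposed through spack; listing the
# installed packages shows the deployed subset.
spack find
```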

In addition, we have released instructions on using a containerized deployment of E4S via Shifter. The container, provided by the E4S team, includes the full E4S software stack built on Ubuntu 20.04.
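A sketch of pulling and running such a container with Shifter follows. The image name and tag are placeholders, not the actual image published by the E4S team; use the image given in the instructions.

```shell
# Pull the E4S container image into Shifter. The image name below is a
# placeholder -- substitute the image/tag from the E4S instructions.
shifterimg pull <e4s-image>:<tag>

# Start a shell inside the container, with the full E4S stack on its PATH.
shifter --image=<e4s-image>:<tag> /bin/bash
```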

For more information, please see the E4S documentation at https://docs.nersc.gov/applications/e4s/perlmutter/21.11/.

(back to top)


Calls for Participation

Argonne Training Program on Extreme-Scale Computing Applications Due March 1!

Are you a doctoral student, postdoc, or computational scientist looking for advanced training on the key skills, approaches, and tools to design, implement, and execute computational science and engineering applications on current high-end computing systems and the leadership-class computing systems of the future? If so, consider applying for the Argonne Training Program on Extreme-Scale Computing (ATPESC) program.

The core of the two-week program focuses on programming methodologies that are effective across a variety of supercomputers and applicable to exascale systems. Additional topics to be covered include computer architectures, mathematical models and numerical algorithms, approaches to building community codes for HPC systems, and methodologies and tools relevant for Big Data applications. This year's program will be held July 31-August 12 in the Chicago area. There is no cost to attend. Domestic airfare, meals, and lodging are provided.

For more information and to apply, please see https://extremecomputingtraining.anl.gov/. The application deadline is next Tuesday, March 1, 2022.

Attention Students: Apply for a Summer Internship at NERSC!

Are you an undergraduate or graduate student looking for a summer internship opportunity? Consider applying for a summer internship at NERSC! NERSC hosts a number of paid internships on a variety of topics every year.

Please check out the growing list of internship projects on our website. If you're interested in a project, reach out to the appropriate point of contact directly with your CV/resume.

(back to top)


Upcoming Training Events

Introduction to Programming with SYCL on Perlmutter & Beyond, March 1

SYCL is an open-standard programming model that allows developers to target a range of GPUs and other accelerator processors using standard C++ code. This means your standard C++ code can target NVIDIA, AMD, and Intel GPUs from a single code base. Engineers from Codeplay have partnered with national labs to bring SYCL support to Perlmutter, Polaris, and Frontier.

Join Codeplay engineers on March 1 for a half-day, hands-on workshop covering the fundamentals of SYCL programming, including practical examples and exercises to reinforce the concepts being learned. Attendees will learn how to compile their SYCL code with the DPC++ compiler to target NVIDIA GPUs such as those on Perlmutter. We'll end with useful tips for achieving good performance, including best practices for memory management, with free time for questions and discussion.

For more information and to register, please see https://www.nersc.gov/users/training/events/an-introduction-to-programming-with-sycl-on-perlmutter-and-beyond-march2022.

IDEAS-ECP Webinar on "Software Design Patterns in Research Software with Examples from OpenFOAM" March 9

The March webinar in the Best Practices for HPC Software Developers series is entitled "Software Design Patterns in Research Software with Examples from OpenFOAM", and will take place Wednesday, March 9, at 10:00 am Pacific time.

In this webinar, Tomislav Marić (TU Darmstadt) will discuss beneficial software design patterns that provide a solid basis for developing numerical methods in a modular way, drawing concrete examples from OpenFOAM, a highly modular open-source software package for Computational Fluid Dynamics.

There is no cost to attend, but registration is required. Please register here.

(back to top)


NERSC News

Come Work for NERSC!

NERSC currently has several openings for postdocs, system administrators, and more! If you are looking for new opportunities, please consider the following openings:

  • HPC Architecture and Performance Engineer: Contribute to the effort to develop a complete understanding of the issues leading to improved application and computer-system performance on extreme-scale advanced architectures.
  • Machine Learning Postdoctoral Fellow: Collaborate with computational and domain scientists to enable machine learning at scale on NERSC's Perlmutter supercomputer.
  • Scientific Data Architect: Collaborate with scientists to meet their Data, AI, and Analytics needs on NERSC supercomputers.
  • Exascale Computing Postdoctoral Fellow: Collaborate with ECP math library and scientific application teams to enable the solution of deep, meaningful problems targeted by the ECP program and other DOE/Office of Science program areas.
  • Data & Analytics Team Group Lead: Provide vision and guidance and lead a team that provides data management, analytics and AI software, support, and expertise to NERSC users.
  • Cyber Security Engineer: Help protect NERSC from malicious and unauthorized activity.
  • Machine Learning Engineer: Apply machine learning and AI to NERSC systems to improve their ability to deliver productive science output.
  • HPC Performance Engineer: Join a multidisciplinary team of computational and domain scientists to speed up scientific codes on cutting-edge computing architectures.

(Note: You can browse all our job openings on the NERSC Careers page, and all Berkeley Lab jobs at https://jobs.lbl.gov.)

We know that NERSC users can make great NERSC employees! We look forward to seeing your application.

Upcoming Outages

  • Cori
    • 03/16/22 07:00-20:00 PDT, Scheduled Maintenance
    • 04/20/22 07:00-20:00 PDT, Scheduled Maintenance
    • 05/18/22 07:00-20:00 PDT, Scheduled Maintenance
  • Perlmutter
    • 02/22/22 09:00-17:00 PST, Scheduled Maintenance
      • System will be unavailable.
  • HPSS Archive (User)
    • 02/23/22 09:00-17:00 PST, Scheduled Maintenance
      • The HPSS Archive system will be degraded due to preventative library maintenance. The system will remain available, but some file retrievals may be delayed during the maintenance window.
    • 03/02/22 09:00-13:00 PST, Scheduled Maintenance
      • System available, retrievals may be delayed due to tape drive firmware updates
    • 03/09/22 09:00-13:00 PST, Scheduled Maintenance
      • System available, retrievals may be delayed due to tape library firmware updates
  • HPSS Regent (Backup)
    • 03/23/22 09:00-17:00 PDT, Scheduled Maintenance
      • System available, retrievals may be delayed due to tape library preventative maintenance
  • DNA
    • 03/16/22 11:00-14:00 PDT, Scheduled Maintenance
      • Users may see degraded performance while we perform maintenance on DNA.

Visit http://my.nersc.gov/ for the latest status and outage information.

About this Email

You are receiving this email because you are the owner of an active account at NERSC. This mailing list is automatically populated with the email addresses associated with active NERSC accounts. In order to remove yourself from this mailing list, you must close your account, which can be done by emailing [email protected] with your request.