Initial draft of 20-architecture episode #48
base: gh-pages
Conversation
A high-performance computing (HPC) system is a tool used by computational scientists and engineers to tackle problems that require more computing resources or time than they can obtain on the personal computers available to them. HPC systems range in size from the equivalent of just a few personal computers to tens, or even hundreds of thousands, of them. They tend to be expensive to buy and operate, so they are often shared at the departmental or institutional level; there are also many regional and national HPC centers. Because of this, most HPC systems are accessed remotely, over the network.
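Since access is almost always remote, the first step on most systems is an SSH connection from your own machine. A minimal sketch; the username and hostname below are placeholders, and your HPC center will provide the real values:

```bash
# Connect to the login node of a (hypothetical) HPC system.
# Your HPC center supplies the actual hostname and your username.
ssh yourUsername@cluster.hpc-center.example
```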
HPC systems are generally constructed from many individual computers, similar in capability to personal computers. Each of these individual computers is often referred to as a **node**. HPC systems often include several different types of nodes, specialized for different purposes. **Head** (or **front-end** or **login**) nodes are where you log in to interact with the system. **Compute** nodes are where the real computing is done. **Storage** nodes provide the specialized filesystems used on HPC systems. Some HPC systems also have **service** nodes, which you don't usually interact with directly but will sometimes read about. These nodes are connected by a network (or **interconnect**), which is often designed for very high performance as well.
I don't think we need to introduce service/storage nodes. Too many terms!
I debated with myself about service nodes. On storage, I'm not yet persuaded. I can see an argument for focusing on the filesystems and not worrying about the hardware behind them. My feeling was that storage nodes might be something they would see in descriptions of machines at some facilities. I'm willing to be argued out of this feeling.
I kinda agree with @ChristinaLK... From the standpoint of HPC software use and user experience, the only kinds of nodes users will routinely deal with in conversation, in instructions, or in documentation are the compute (back-end) nodes and the login (front-end) nodes. System architecture documents are probably the only places where other kinds of nodes (storage, service, gateway, etc.) get described, so they are probably outside the scope of a novice HPC lesson.
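One concrete way to show the login/compute distinction to learners (a sketch, assuming a Slurm scheduler; other batch systems use different commands):

```bash
# Run directly on the login node: prints the login node's hostname.
hostname

# Run through the scheduler: the same command executes on a compute
# node and prints that node's hostname instead (Slurm's srun shown).
srun hostname
```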
<!-- Something like http://www.archer.ac.uk/training/course-material/2018/03/intro-hw/slides/L01_WhyHPC.pdf -->
Depending on the HPC system, the compute nodes, even individually, might be much more powerful than a typical personal computer. They often have multiple processors (each with many cores), and may have accelerators (such as GPUs) and other capabilities less common on personal computers.
But what makes things go fast is usually quantity, not quality. I think that should be clear.
Both can be important. To the extent that the nodes are much more powerful than a personal computer, users can do much more with them -- and need to in order to use the machine effectively. Example: on the Summit system now being stood up at Oak Ridge, each node has 44 CPU cores and 6 GPUs, as well as two large flash drives and 608 GB of memory. If you expect to run there only what you run on your laptop, it is a complete waste.
good point!
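A few standard Linux commands make the comparison with a personal computer concrete. A sketch; `nvidia-smi` is present only on nodes with NVIDIA GPUs and drivers installed:

```bash
# Number of CPU cores available on this node.
nproc

# Total and free memory, in human-readable units.
free -h

# Attached NVIDIA GPUs, if any (absent on nodes without GPUs).
nvidia-smi
```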
In order to share these large systems among many users, it is common to allocate subsets of the compute nodes to tasks (or **jobs**), based on requests from users. Jobs may take a long time to complete, so the mix of running jobs changes continuously. To manage the sharing of the compute nodes among all of the jobs, HPC systems use a **batch system** or **scheduler**. The batch system usually provides commands for submitting jobs, inquiring about their status, and modifying them. The HPC center defines the policies by which jobs are prioritized for execution, and the scheduler ensures that the compute nodes are not overloaded. <!-- reference to episode 30 -->
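As a sketch of the typical interaction, assuming a Slurm scheduler (the job script name and job ID below are placeholders; PBS, LSF, and other schedulers provide equivalent commands under different names):

```bash
# Submit a job script to the scheduler; it replies with a job ID.
sbatch my_job.sh

# Check the status of your own queued and running jobs.
squeue -u $USER

# Cancel a job, referring to it by ID, e.g. to modify and resubmit it.
scancel 12345
```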
The kind of computing that people do on HPC systems often involves very large files, many files, or both. Further, the files have to be accessible from all of the front-end and compute nodes on the system. So most HPC systems have specialized filesystems designed to meet these needs better than typical network filesystems, such as NFS. Frequently, these specialized filesystems are intended only for short- or medium-term storage, not permanent storage. So HPC systems often have several different filesystems available -- for example, **home** and **scratch** filesystems. It can be very important to select the right filesystem for the results you want (performance versus permanence is the typical trade-off). <!-- reference to episode 35 -->
+1
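A sketch of exploring where data lives, assuming the common convention that a variable such as `$SCRATCH` points at the fast, non-permanent filesystem (the variable name, paths, and filename below vary by center and are placeholders):

```bash
# Show which filesystem holds your home directory, and how full it is.
df -h "$HOME"

# Many centers set a variable like $SCRATCH for the scratch filesystem.
echo "$SCRATCH"

# Work in scratch for performance, then copy results you need to keep
# back to a permanent location.
cp "$SCRATCH/results.dat" "$HOME/project/"
```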
Initial draft developed at CarpentryCon2018 by @Ianvdl, @sleak-lbl, and @bernhold. Use it in good health!