From 45dc0883f770b859d9645e8736a154d580c911fc Mon Sep 17 00:00:00 2001 From: Trevor Keller Date: Thu, 2 Jun 2022 09:00:04 -0400 Subject: [PATCH] move URLs to end of files for easier reference --- _episodes/10-hpc-intro.md | 19 +++++++++--------- _episodes/11-connecting.md | 31 ++++++++++++++++++------------ _episodes/12-cluster.md | 3 ++- _episodes/13-scheduler.md | 6 +++--- _episodes/15-transferring-files.md | 18 +++++++++-------- _episodes/16-parallel.md | 18 ++++++++--------- 6 files changed, 53 insertions(+), 42 deletions(-) diff --git a/_episodes/10-hpc-intro.md b/_episodes/10-hpc-intro.md index be6fb041..78882ca2 100644 --- a/_episodes/10-hpc-intro.md +++ b/_episodes/10-hpc-intro.md @@ -74,13 +74,11 @@ separate window, then press `P` to toggle presentation mode. > > * Searching for a phrase online involves comparing your search term against > > a massive database of all known sites, looking for matches. This "query" > > operation can be straightforward, but building that database is a -> > [monumental task](https://en.wikipedia.org/wiki/MapReduce)! Servers are -> > involved at every step. +> > [monumental task][mapreduce]! Servers are involved at every step. > > * Searching for directions on a mapping website involves connecting your -> > (A) starting and (B) end points by [traversing a graph]( -> > https://en.wikipedia.org/wiki/Dijkstra%27s_algorithm) in search of -> > the "shortest" path by distance, time, expense, or another metric. -> > Converting a map into the right form is relatively simple, but +> > (A) starting and (B) end points by [traversing a graph][dijkstra] in +> > search of the "shortest" path by distance, time, expense, or another +> > metric. Converting a map into the right form is relatively simple, but > > calculating all the possible routes between A and B is expensive. > > > > Checking email could be serial: your machine connects to one server and @@ -88,10 +86,13 @@ separate window, then press `P` to toggle presentation mode. 
> > endpoints) could also be serial, in that one machine receives your query > > and returns the result. However, assembling and storing the full database > > is far beyond the capability of any one machine. Therefore, these functions -> > are served in parallel by a large, ["hyperscale"]( -> > https://en.wikipedia.org/wiki/Hyperscale_computing) collection of servers -> > working together. +> > are served in parallel by a large, ["hyperscale"][hyperscale] collection of +> > servers working together. > {: .solution} {: .challenge } {% include links.md %} + +[dijkstra]: https://en.wikipedia.org/wiki/Dijkstra%27s_algorithm +[hyperscale]: https://en.wikipedia.org/wiki/Hyperscale_computing +[mapreduce]: https://en.wikipedia.org/wiki/MapReduce diff --git a/_episodes/11-connecting.md b/_episodes/11-connecting.md index f19f7aa1..72ebada0 100644 --- a/_episodes/11-connecting.md +++ b/_episodes/11-connecting.md @@ -47,7 +47,7 @@ SSH clients are usually command-line tools, where you provide the remote machine address as the only required argument. If your username on the remote system differs from what you use locally, you must provide that as well. If your SSH client has a graphical front-end, such as PuTTY or MobaXterm, you will -set these arguments before clicking "connect." From the terminal, you'll write +set these arguments before clicking "connect." From the terminal, you'll write something like `ssh userName@hostname`, where the argument is just like an email address: the "@" symbol is used to separate the personal ID from the address of the remote machine. @@ -73,8 +73,9 @@ shell application with a Unix-like command line interface to your system. SSH keys are an alternative method for authentication to obtain access to remote computing systems. They can also be used for authentication when -transferring files or for accessing remote version control systems (such as [GitHub](https://docs.github.com/en/authentication/connecting-to-github-with-ssh)). 
In this section -you will create a pair of SSH keys: +transferring files or for accessing remote version control systems (such as +[GitHub][gh-ssh]). +In this section you will create a pair of SSH keys: * a private key which you keep on your own computer, and * a public key which can be placed on any remote system you will access. @@ -91,7 +92,8 @@ you will create a pair of SSH keys: {: .caution} Regardless of the software or operating system you use, _please_ choose a -strong password or passphrase to act as another layer of protection for your private SSH key. +strong password or passphrase to act as another layer of protection for your +private SSH key. > ## Considerations for SSH Key Passwords > @@ -131,7 +133,8 @@ produce a stronger key than the `ssh-keygen` default by invoking these flags: algorithm. `ed25519` specifies [EdDSA][wiki-dsa] with a 256-bit key; it is faster than RSA with a comparable strength. * `-f` (default is /home/user/.ssh/id_algorithm): filename to store your - private key. The public key filename will be identical, with a `.pub` extension added. + private key. The public key filename will be identical, with a `.pub` + extension added. ``` {{ site.local.prompt }} ssh-keygen -a 100 -f ~/.ssh/id_ed25519 -t ed25519 @@ -327,7 +330,7 @@ system using the terminal (if you logged in using PuTTY this will not apply because it does not offer a local terminal). This change is important because it can help you distinguish on which system the commands you type will be run when you pass them into the terminal. This change is also a small complication -that we will need to navigate throughout the workshop. Exactly what is reported +that we will need to navigate throughout the workshop. Exactly what is displayed as the prompt (which conventionally ends in `$`) in the terminal when it is connected to the local system and the remote system will typically be different for every user. 
We still need to indicate which system we are entering commands @@ -377,6 +380,10 @@ Great, we know where we are! Let's see what's in our current directory: {{ site.remote.prompt }} ls ``` {: .language-bash} +``` +id_ed25519.pub +``` +{: .output} The system administrators may have configured your home directory with some helpful files, folders, and links (shortcuts) to space reserved for you on @@ -403,11 +410,10 @@ keys and a record of authorized connections. > ## There May Be a Better Way > -> Policies and practices for handling SSH keys vary between HPC -> clusters: follow any guidance provided by the cluster -> administrators or documentation. In particular, if there is an -> online portal for managing SSH keys, use that instead of the -> directions outlined here. +> Policies and practices for handling SSH keys vary between HPC clusters: +> follow any guidance provided by the cluster administrators or +> documentation. In particular, if there is an online portal for managing SSH +> keys, use that instead of the directions outlined here. {: .callout} If you transferred your SSH public key with `scp`, you should see @@ -440,7 +446,7 @@ password for your SSH key. {: .language-bash} ``` -{{ site.local.prompt }} ssh {{ site.remote.user }}@{{ site.remote.login }} +{{ site.local.prompt }} ssh {{ site.remote.user }}@{{ site.remote.login }} ``` {: .language-bash} @@ -448,6 +454,7 @@ password for your SSH key. 
[bitwarden]: https://bitwarden.com [fshs]: https://en.wikipedia.org/wiki/Filesystem_Hierarchy_Standard +[gh-ssh]: https://docs.github.com/en/authentication/connecting-to-github-with-ssh [keepass]: https://keepass.info [putty-gen]: https://tartarus.org/~simon/putty-prerel-snapshots/htmldoc/Chapter8.html#pubkey-puttygen [putty-agent]: https://tartarus.org/~simon/putty-prerel-snapshots/htmldoc/Chapter9.html#pageant diff --git a/_episodes/12-cluster.md b/_episodes/12-cluster.md index 2186fd44..d535fb5d 100644 --- a/_episodes/12-cluster.md +++ b/_episodes/12-cluster.md @@ -266,7 +266,7 @@ connect to a shared, remote fileserver or cluster of servers. > > > you're on the same login node (or compute node, later on). > > > * Networked filesystems (beegfs, cifs, gpfs, nfs, pvfs) will be similar > > > -- but may include {{ site.remote.user }}, depending on how it -> > > is [mounted](https://en.wikipedia.org/wiki/Mount_(computing)). +> > > is [mounted][mount]. > > {: .discussion} > > > > > ## Shared Filesystems @@ -310,3 +310,4 @@ scheduler, and use it to start running our scripts and programs! {% include links.md %} [fshs]: https://en.wikipedia.org/wiki/Filesystem_Hierarchy_Standard +[mount]: https://en.wikipedia.org/wiki/Mount_(computing) diff --git a/_episodes/13-scheduler.md b/_episodes/13-scheduler.md index ca893def..54384471 100644 --- a/_episodes/13-scheduler.md +++ b/_episodes/13-scheduler.md @@ -320,12 +320,12 @@ Up to this point, we've focused on running jobs in batch mode. There are very frequently tasks that need to be done interactively. Creating an entire job script might be overkill, but the amount of resources required is too much for a login node to handle. A good example of this might be building a -genome index for alignment with a tool like -[HISAT2](https://ccb.jhu.edu/software/hisat2/index.shtml). Fortunately, we can -run these types of tasks as a one-off with `{{ site.sched.interactive }}`. +genome index for alignment with a tool like [HISAT2][hisat]. 
Fortunately, we +can run these types of tasks as a one-off with `{{ site.sched.interactive }}`. {% include {{ site.snippets }}/scheduler/using-nodes-interactively.snip %} {% include links.md %} [fshs]: https://en.wikipedia.org/wiki/Filesystem_Hierarchy_Standard +[hisat]: https://ccb.jhu.edu/software/hisat2/index.shtml diff --git a/_episodes/15-transferring-files.md b/_episodes/15-transferring-files.md index 7eeeaae9..92017db8 100644 --- a/_episodes/15-transferring-files.md +++ b/_episodes/15-transferring-files.md @@ -157,7 +157,7 @@ A trailing slash on the target directory is optional, and has no effect for > ## A Note on `rsync` > > As you gain experience with transferring files, you may find the `scp` -> command limiting. The [rsync](https://rsync.samba.org/) utility provides +> command limiting. The [rsync][rsync] utility provides > advanced features for file transfer and is typically faster compared to both > `scp` and `sftp` (see below). It is especially useful for transferring large > and/or many files and creating synced backup folders. @@ -234,14 +234,14 @@ you will have to specify it using the appropriate flag, often `-p`, `-P`, or FileZilla is a cross-platform client for downloading and uploading files to and from a remote computer. It is absolutely fool-proof and always works quite well. It uses the `sftp` protocol. You can read more about using the `sftp` -protocol in the command line [here]({{ site.baseurl }}{% link -_extras/discuss.md %}). +protocol in the command line in the +[lesson discussion]({{ site.baseurl }}{% link extras/discuss.md %}). -Download and install the FileZilla client from -[https://filezilla-project.org](https://filezilla-project.org). After -installing and opening the program, you should end up with a window with a file -browser of your local system on the left hand side of the screen. When you -connect to the cluster, your cluster files will appear on the right hand side. +Download and install the FileZilla client from <https://filezilla-project.org>. 
+After installing and opening the program, you should end up with a window with +a file browser of your local system on the left hand side of the screen. When +you connect to the cluster, your cluster files will appear on the right hand +side. To connect to the cluster, we'll just need to enter our credentials at the top of the screen: @@ -429,3 +429,5 @@ then provide a directory to compress: {: .callout} {% include links.md %} + +[rsync]: https://rsync.samba.org/ diff --git a/_episodes/16-parallel.md b/_episodes/16-parallel.md index d8b2d649..bf0693e9 100644 --- a/_episodes/16-parallel.md +++ b/_episodes/16-parallel.md @@ -166,7 +166,7 @@ The first line calculates the bytes of memory required for a single 64-bit floating point number using the `dtype` function. The second line estimates the total amount of memory required to store three variables containing `n_samples` `float64` values, converting the value into -units of [gibibytes](https://en.wikipedia.org/wiki/Byte#Multiple-byte_units). +units of [gibibytes][units]. The third line prints both the estimate of π and the estimated amount of memory used by the script. @@ -609,13 +609,12 @@ in the computer, or across multiple compute nodes, additional time is required for communication compared to all processes operating on a single CPU. -[Amdahl's Law][wiki-amdahl] is one way of -predicting improvements in execution time for a __fixed__ parallel workload. -If a workload needs 20 hours to complete on a single core, -and one hour of that time is spent on tasks that cannot be parallelized, -only the remaining 19 hours could be parallelized. -Even if an infinite number of cores were used for the parallel parts of -the workload, the total run time cannot be less than one hour. +[Amdahl's Law][amdahl] is one way of predicting improvements in execution time +for a __fixed__ parallel workload. 
If a workload needs 20 hours to complete on +a single core, and one hour of that time is spent on tasks that cannot be +parallelized, only the remaining 19 hours could be parallelized. Even if an +infinite number of cores were used for the parallel parts of the workload, the +total run time cannot be less than one hour. In practice, it's common to evaluate the parallelism of an MPI program by @@ -660,9 +659,10 @@ parallelization, see the [parallel novice lesson][parallel-novice] lesson. {% include links.md %} +[amdahl]: https://en.wikipedia.org/wiki/Amdahl's_law [cmd-line]: https://swcarpentry.github.io/python-novice-inflammation/12-cmdline/index.html [inflammation]: https://swcarpentry.github.io/python-novice-inflammation/ [np-dtype]: https://numpy.org/doc/stable/reference/generated/numpy.dtype.html [parallel-novice]: http://www.hpc-carpentry.org/hpc-parallel-novice/ [python-func]: https://swcarpentry.github.io/python-novice-inflammation/08-func/index.html -[wiki-amdahl]: https://en.wikipedia.org/wiki/Amdahl's_law +[units]: https://en.wikipedia.org/wiki/Byte#Multiple-byte_units