From 529fc1a59b9c02cfa57a80b3442f29dc7712f3fd Mon Sep 17 00:00:00 2001 From: Sara Walton Date: Tue, 16 Jul 2024 14:46:41 -0600 Subject: [PATCH] removing white spaces --- rtd/README.rst | 6 +- rtd/docs/source/conf.py | 2 +- rtd/docs/source/contributing/index.rst | 2 +- rtd/docs/source/index.rst | 4 +- rtd/docs/source/ldms-index.rst | 4 +- rtd/docs/source/ldms-quickstart.rst | 208 +++++++++--------- rtd/docs/source/ldms-streams.rst | 198 ++++++++--------- rtd/docs/source/ldmscon.rst | 10 +- .../store_man/Plugin_avro_kafka_store.rst | 44 ++-- .../store_man/Plugin_darshan_stream_store.rst | 8 +- .../store_man/Plugin_store_flatfile.rst | 8 +- .../source/store_man/Plugin_store_sos.rst | 50 ++--- .../store_man/Plugin_store_timescale.rst | 22 +- .../store_man/Plugin_store_tutorial.rst | 12 +- .../store_man/Plugin_stream_csv_store.rst | 30 +-- rtd/docs/source/store_man/index.rst | 2 +- rtd/docs/source/ug.rst | 2 +- 17 files changed, 306 insertions(+), 306 deletions(-) diff --git a/rtd/README.rst b/rtd/README.rst index ef558b8786..80b6a640fa 100644 --- a/rtd/README.rst +++ b/rtd/README.rst @@ -33,14 +33,14 @@ Instructions and documentation on how to use ReadTheDocs can be found here: > git add > git commit -m "add message" > git push - -Adding A New File + +Adding A New File ****************** For any new RST files created, please include them in rtd/docs/src/index.rst under their corresponding sections. All RST files not included in index.rst will not populate on the offical webpage (e.g. readthedocs). Paper Lock ************ This is for claiming any sections you are working on so there is no overlap. -Please USE paper.lock to indicate if you are editing an existing RST file. +Please USE paper.lock to indicate if you are editing an existing RST file. 
diff --git a/rtd/docs/source/conf.py b/rtd/docs/source/conf.py index 40df1768b6..7de1dbac67 100644 --- a/rtd/docs/source/conf.py +++ b/rtd/docs/source/conf.py @@ -22,7 +22,7 @@ intersphinx_mapping = { 'python': ('https://docs.python.org/3/', None), 'sphinx': ('https://www.sphinx-doc.org/en/master/', None), - + # Link to the "apis" of the "hpc-ovis" project and subprojects "ovis-hpc": ("https://ovis-hpc.readthedocs.io/en/latest/", None), "sos": ("https://ovis-hpc.readthedocs.io/projects/sos/en/latest/", None), diff --git a/rtd/docs/source/contributing/index.rst b/rtd/docs/source/contributing/index.rst index 2181c21556..77499458d4 100644 --- a/rtd/docs/source/contributing/index.rst +++ b/rtd/docs/source/contributing/index.rst @@ -3,7 +3,7 @@ Contributing to LDMS .. toctree:: :maxdepth: 2 - + samplerwrite storewrite docreqs diff --git a/rtd/docs/source/index.rst b/rtd/docs/source/index.rst index 2285e05e50..1c5b2b950d 100644 --- a/rtd/docs/source/index.rst +++ b/rtd/docs/source/index.rst @@ -46,7 +46,7 @@ Welcome To OVIS-HPC Documentation! Baler ASF - + Other Projects ==================================== @@ -56,7 +56,7 @@ Other Projects `sos `_ `baler `_ - + diff --git a/rtd/docs/source/ldms-index.rst b/rtd/docs/source/ldms-index.rst index 8982e31574..0c5f280333 100644 --- a/rtd/docs/source/ldms-index.rst +++ b/rtd/docs/source/ldms-index.rst @@ -1,4 +1,4 @@ -LDMS +LDMS ====== .. 
image:: images/ovis-hpc_homepage.png @@ -41,6 +41,6 @@ To join the LDMS Users Group Mailing List: https://github.com/ovis-hpc/ovis-wiki :maxdepth: 2 :caption: Contributing to LDMS - contributing/index + contributing/index diff --git a/rtd/docs/source/ldms-quickstart.rst b/rtd/docs/source/ldms-quickstart.rst index b28a24a3bf..1670aa7935 100644 --- a/rtd/docs/source/ldms-quickstart.rst +++ b/rtd/docs/source/ldms-quickstart.rst @@ -4,7 +4,7 @@ LDMS Quick Start Installation ***************** -AlmaLinux8 +AlmaLinux8 ------------ Prerequisites @@ -60,9 +60,9 @@ The following steps were ran on AlmaLinux8 arm64v8 sudo dnf install -y python38-devel sudo dnf install -y python38-Cython sudo dnf install -y python38-libs - -RHEL 9 + +RHEL 9 ------------ Prerequisites @@ -74,28 +74,28 @@ Prerequisites * libtool * python3 (or higher) * python3-devel (or higher) -* cython +* cython * bison * flex Prerequisite Installation --------------------------- -The following steps were ran on a basic RHEL 9 instance via AWS. +The following steps were ran on a basic RHEL 9 instance via AWS. .. code-block:: RST sudo yum update -y sudo yum install automake -y - sudo yum install openssl-devel -y + sudo yum install openssl-devel -y sudo yum install pkg-config -y sudo yum install libtool -y sudo yum install python3 -y sudo yum install python3-devel.x86_64 -y sudo yum install python3-Cython -y - sudo yum install make -y - sudo yum install bison -y + sudo yum install make -y + sudo yum install bison -y sudo yum install flex -y - + LDMS Source Installation Instructions -------------------------- @@ -106,31 +106,31 @@ Getting the Source * This example shows cloning into $HOME/Source/ovis-4 and installing into $HOME/ovis/4.4.2 .. code-block:: RST - + mkdir $HOME/Source mkdir $HOME/ovis cd $HOME/Source git clone -b OVIS-4.4.2 https://github.com/ovis-hpc/ovis.git ovis-4 - + Building the Source ----------------------- * Run autogen.sh .. 
code-block:: RST - cd $HOME/Source/ovis + cd $HOME/Source/ovis ./autogen.sh * Configure and Build (Builds default linux samplers. Build installation directory is prefix): .. code-block:: RST - + mkdir build cd build ../configure --prefix=$HOME/ovis/4.4.2 make make install - + Basic Configuration and Running ******************************* * Set up environment: @@ -143,12 +143,12 @@ Basic Configuration and Running export ZAP_LIBPATH=$OVIS/lib/ovis-ldms export PATH=$OVIS/sbin:$OVIS/bin:$PATH export PYTHONPATH=$OVIS/lib/python3.8/site-packages - + Sampler *********************** * Edit a new configuration file, named `sampler.conf`, to load the `meminfo` and `vmstat` samplers. For this example, it can be saved anywhere, but it will be used later to start the LDMS Daemon (`ldmsd`) -The following configuration employs generic hostname, uid, gid, component id, and permissions octal set values. +The following configuration employs generic hostname, uid, gid, component id, and permissions octal set values. Sampling intervals are set using a "microsecond" time unit (i.e., 1 sec=1e+6 µs), and are adjustable, as needed. Some suggestions include: @@ -180,7 +180,7 @@ Some suggestions include: .. code-block:: RST :linenos: - + # Meminfo Sampler Plugin using 1 second sampling interval load name=meminfo config name=meminfo producer=host1 instance=host1/meminfo component_id=1 schema=meminfo job_set=host1/jobinfo uid=12345 gid=12345 perm=0755 @@ -189,8 +189,8 @@ Some suggestions include: load name=vmstat config name=vmstat producer=host1 instance=host1/vmstat component_id=1 schema=vmstat job_set=host1/jobinfo uid=0 gid=0 perm=0755 start name=vmstat interval=10000000 offset=0 - -As an alternative to the configuration above, one may, instead, export environmental variables to set LDMS's runtime configuration by using variables to reference those values in the sampler configuration file. 
+ +As an alternative to the configuration above, one may, instead, export environmental variables to set LDMS's runtime configuration by using variables to reference those values in the sampler configuration file. The following setup will set the samplers to collect at 1 second, (i.e., 1000000 µs) intervals: @@ -213,18 +213,18 @@ The following setup will set the samplers to collect at 1 second, (i.e., 1000000 config name=vmstat producer=${HOSTNAME} instance=${HOSTNAME}/vmstat component_id=${COMPONENT_ID} schema=vmstat job_set=${HOSTNAME}/jobinfo uid=0 gid=0 perm=0755 start name=vmstat interval=${SAMPLE_INTERVAL} offset=${SAMPLE_OFFSET} -* Run a daemon using munge authentication: +* Run a daemon using munge authentication: .. code-block:: RST - + ldmsd -x sock:10444 -c sampler.conf -l /tmp/demo_ldmsd.log -v DEBUG -a munge -r $(pwd)/ldmsd.pid - + Or in non-cluster environments where munge is unavailable: .. code-block:: RST - + ldmsd -x sock:10444 -c sampler.conf -l /tmp/demo_ldmsd.log -v DEBUG -r $(pwd)/ldmsd.pid - + .. note:: For the rest of these instructions, omit the "-a munge" if you do not have munge running. This will also write out DEBUG-level information to the specified (-l) log. @@ -235,7 +235,7 @@ Or in non-cluster environments where munge is unavailable: ldms_ls -h localhost -x sock -p 10444 -a munge ldms_ls -h localhost -x sock -p 10444 -v -a munge ldms_ls -h localhost -x sock -p 10444 -l -a munge - + .. note:: Note the use of munge. Users will not be able to query a daemon launched with munge if not querying with munge. Users will only be able to see sets as allowed by the permissions in response to `ldms_ls`. @@ -244,12 +244,12 @@ Example (note permissions and update hint): .. code-block:: RST ldms_ls -h localhost -x sock -p 10444 -l -v -a munge - + Output: .. 
code-block:: RST - host1/vmstat: consistent, last update: Mon Oct 22 16:58:15 2018 -0600 [1385us] + host1/vmstat: consistent, last update: Mon Oct 22 16:58:15 2018 -0600 [1385us] APPLICATION SET INFORMATION ------ updt_hint_us : 5000000:0 METADATA -------- @@ -274,8 +274,8 @@ Output: D u64 app_id 0 D u64 nr_free_pages 32522123 ... - D u64 pglazyfree 1082699829 - host1/meminfo: consistent, last update: Mon Oct 22 16:58:15 2018 -0600 [1278us] + D u64 pglazyfree 1082699829 + host1/meminfo: consistent, last update: Mon Oct 22 16:58:15 2018 -0600 [1278us] APPLICATION SET INFORMATION ------ updt_hint_us : 5000000:0 METADATA -------- @@ -303,7 +303,7 @@ Output: D u64 MemAvailable 129556912 ... D u64 DirectMap1G 134217728 - + Aggregator Using Data Pull *********************** @@ -323,20 +323,20 @@ Aggregator Using Data Pull updtr_add name=policy_h2 interval=2000000 offset=100000 updtr_prdcr_add name=policy_h2 regex=host2 updtr_start name=policy_h2 - + * On host3, set up the environment as above and run a daemon: .. code-block:: RST ldmsd -x sock:10445 -c agg11.conf -l /tmp/demo_ldmsd.log -v ERROR -a munge - + * Run `ldms_ls` on the aggregator node to see set listing: .. code-block:: RST ldms_ls -h localhost -x sock -p 10445 -a munge - + Output: .. code-block:: RST @@ -345,12 +345,12 @@ Output: host1/vmstat host2/meminfo host2/vmstat - + You can also run `ldms_ls` to query the ldms daemon on the remote node: .. code-block:: RST - ldms_ls -h host1 -x sock -p 10444 -a munge + ldms_ls -h host1 -x sock -p 10444 -a munge Output: @@ -369,7 +369,7 @@ Aggregator Using Data Push * Make a configuration file (called agg11_push.conf) to cause the two samplers to push their data to the aggregator as they update. * Note that the prdcr configs remain the same as above but the updater_add includes the additional options: push=onchange auto_interval=false. - + * Note that the updtr_add interval has no effect in this case but is currently required due to syntax checking .. 
code-block:: RST @@ -381,20 +381,20 @@ Aggregator Using Data Push updtr_add name=policy_all interval=5000000 push=onchange auto_interval=false updtr_prdcr_add name=policy_all regex=.* updtr_start name=policy_all - - -* On host3, set up the environment as above and run a daemon: + + +* On host3, set up the environment as above and run a daemon: .. code-block:: RST ldmsd -x sock:10445 -c agg11_push.conf -l /tmp/demo_ldmsd_log -v DEBUG -a munge - + * Run ldms_ls on the aggregator node to see set listing: .. code-block:: RST - ldms_ls -h localhost -x sock -p 10445 -a munge - + ldms_ls -h localhost -x sock -p 10445 -a munge + Output: .. code-block:: RST @@ -403,9 +403,9 @@ Output: host1/vmstat host2/meminfo host2/vmstat - -Two Aggregators Configured as Failover Pairs + +Two Aggregators Configured as Failover Pairs *********************** * Use same sampler configurations as above * Make a configuration file (called agg11.conf) to aggregate from one sampler with the following contents: @@ -419,38 +419,38 @@ Two Aggregators Configured as Failover Pairs updtr_start name=policy_all failover_config host=host3 port=10446 xprt=sock type=active interval=1000000 peer_name=agg12 timeout_factor=2 failover_start - + * On host3, set up the environment as above and run two daemons as follows: .. code-block:: RST ldmsd -x sock:10445 -c agg11.conf -l /tmp/demo_ldmsd_log -v ERROR -n agg11 -a munge ldmsd -x sock:10446 -c agg12.conf -l /tmp/demo_ldmsd_log -v ERROR -n agg12 -a munge - + * Run ldms_ls on each aggregator node to see set listing: .. code-block:: RST - ldms_ls -h localhost -x sock -p 10445 -a munge + ldms_ls -h localhost -x sock -p 10445 -a munge host1/meminfo host1/vmstat ldms_ls -h localhost -x sock -p 10446 -a munge host2/meminfo host2/vmstat - + * Kill one daemon: .. code-block:: RST kill -SIGTERM - + * Make sure it died * Run ldms_ls on the remaining aggregator to see set listing: .. 
code-block:: RST - ldms_ls -h localhost -x sock -p 10446 -a munge - + ldms_ls -h localhost -x sock -p 10446 -a munge + Output: .. code-block:: RST @@ -464,60 +464,60 @@ Set Groups *********************** A set group is an LDMS set with special information to represent a group of sets inside ldmsd. A set group would appear as a regular LDMS set to other LDMS applications, but ldmsd and `ldms_ls` will treat it as a collection of LDMS sets. If ldmsd updtr updates a set group, it also subsequently updates all the member sets. Performing ldms_ls -l on a set group will also subsequently perform a long-query all the sets in the group. -To illustrate how a set group works, we will configure 2 sampler daemons with set groups and 1 aggregator daemon that updates and stores the groups in the following subsections. +To illustrate how a set group works, we will configure 2 sampler daemons with set groups and 1 aggregator daemon that updates and stores the groups in the following subsections. Creating a set group and inserting sets into it *********************** -The following is a configuration file for our s0 LDMS daemon (sampler #0) that collects sda disk stats in the s0/sda set and lo network usage in the s0/lo set. The s0/grp set group is created to contain both s0/sda and s0/lo. +The following is a configuration file for our s0 LDMS daemon (sampler #0) that collects sda disk stats in the s0/sda set and lo network usage in the s0/lo set. The s0/grp set group is created to contain both s0/sda and s0/lo. .. 
code-block:: RST ### s0.conf - load name=procdiskstats + load name=procdiskstats config name=procdiskstats device=sda producer=s0 instance=s0/sda - start name=procdiskstats interval=1000000 offset=0 - - load name=procnetdev - config name=procnetdev ifaces=lo producer=s0 instance=s0/lo - start name=procnetdev interval=1000000 offset=0 - - setgroup_add name=s0/grp producer=s0 interval=1000000 offset=0 - setgroup_ins name=s0/grp instance=s0/sda,s0/lo + start name=procdiskstats interval=1000000 offset=0 + + load name=procnetdev + config name=procnetdev ifaces=lo producer=s0 instance=s0/lo + start name=procnetdev interval=1000000 offset=0 + + setgroup_add name=s0/grp producer=s0 interval=1000000 offset=0 + setgroup_ins name=s0/grp instance=s0/sda,s0/lo -The following is the same for s1 sampler daemon, but with different devices (sdb and eno1). +The following is the same for s1 sampler daemon, but with different devices (sdb and eno1). .. code-block:: RST ### s1.conf - load name=procdiskstats + load name=procdiskstats config name=procdiskstats device=sdb producer=s1 instance=s1/sdb - start name=procdiskstats interval=1000000 offset=0 - - load name=procnetdev - config name=procnetdev ifaces=eno1 producer=s1 instance=s1/eno1 - start name=procnetdev interval=1000000 offset=0 - - setgroup_add name=s1/grp producer=s1 interval=1000000 offset=0 - setgroup_ins name=s1/grp instance=s1/sdb,s1/eno1 - -The s0 LDMS daemon is listening on port 10000 and the s1 LDMS daemon is listening on port 10001. + start name=procdiskstats interval=1000000 offset=0 + + load name=procnetdev + config name=procnetdev ifaces=eno1 producer=s1 instance=s1/eno1 + start name=procnetdev interval=1000000 offset=0 + + setgroup_add name=s1/grp producer=s1 interval=1000000 offset=0 + setgroup_ins name=s1/grp instance=s1/sdb,s1/eno1 + +The s0 LDMS daemon is listening on port 10000 and the s1 LDMS daemon is listening on port 10001. 
Perform `ldms_ls` on a group *********************** Performing `ldms_ls -v` or `ldms_ls -l` on a LDMS daemon hosting a group will perform the query on the set representing the group itself as well as iteratively querying the group's members. -Example: +Example: .. code-block:: RST ldms_ls -h localhost -x sock -p 10000 - + Output: .. code-block:: RST ldms_ls -h localhost -x sock -p 10000 -v s0/grp | grep consistent - + Output: .. code-block:: RST @@ -525,44 +525,44 @@ Output: s0/grp: consistent, last update: Mon May 20 15:44:30 2019 -0500 [511879us] s0/lo: consistent, last update: Mon May 20 16:13:16 2019 -0500 [1126us] s0/sda: consistent, last update: Mon May 20 16:13:17 2019 -0500 [1176us] - + .. code-block:: RST ldms_ls -h localhost -x sock -p 10000 -v s0/lo | grep consistent # only query lo set from set group s0 - + .. note:: - The update time of the group set is the time that the last set was inserted into the group. + The update time of the group set is the time that the last set was inserted into the group. Update / store with set group *********************** -The following is an example of an aggregator configuration to match-update only the set groups, and their members, with storage policies: +The following is an example of an aggregator configuration to match-update only the set groups, and their members, with storage policies: .. 
code-block:: RST - # Stores - load name=store_csv + # Stores + load name=store_csv config name=store_csv path=csv # strgp for netdev, csv file: "./csv/net/procnetdev" - strgp_add name=store_net plugin=store_csv container=net schema=procnetdev - strgp_prdcr_add name=store_net regex=.* - strgp_start name=store_net + strgp_add name=store_net plugin=store_csv container=net schema=procnetdev + strgp_prdcr_add name=store_net regex=.* + strgp_start name=store_net # strgp for diskstats, csv file: "./csv/disk/procdiskstats" - strgp_add name=store_disk plugin=store_csv container=disk schema=procdiskstats - strgp_prdcr_add name=store_disk regex=.* - strgp_start name=store_disk - - # Updater that updates only groups - updtr_add name=u interval=1000000 offset=500000 - updtr_match_add name=u regex=ldmsd_grp_schema match=schema - updtr_prdcr_add name=u regex=.* - updtr_start name=u - + strgp_add name=store_disk plugin=store_csv container=disk schema=procdiskstats + strgp_prdcr_add name=store_disk regex=.* + strgp_start name=store_disk + + # Updater that updates only groups + updtr_add name=u interval=1000000 offset=500000 + updtr_match_add name=u regex=ldmsd_grp_schema match=schema + updtr_prdcr_add name=u regex=.* + updtr_start name=u + Performing `ldms_ls` on the LDMS aggregator daemon exposes all the sets (including groups) .. code-block:: RST ldms_ls -h localhost -x sock -p 9000 - + Output: .. code-block:: RST @@ -573,13 +573,13 @@ Output: s0/sda s0/lo s0/grp - + Performing `ldms_ls -v` on a LDMS daemon hosting a group again but only querying the group and its members: .. code-block:: RST ldms_ls -h localhost -x sock -p 9000 -v s1/grp | grep consistent - + Output: .. 
code-block:: RST @@ -587,12 +587,12 @@ Output: s1/grp: consistent, last update: Mon May 20 15:42:34 2019 -0500 [891643us] s1/sdb: consistent, last update: Mon May 20 16:38:38 2019 -0500 [1805us] s1/eno1: consistent, last update: Mon May 20 16:38:38 2019 -0500 [1791us] - - -The following is an example of the CSV output: + + +The following is an example of the CSV output: .. code-block:: RST - + > head csv/*/* .. code-block:: RST @@ -607,7 +607,7 @@ The following is an example of the CSV output: 1558387834.001121,1121,s0,0,0,0,197797,0,9132,0,5382606,0,69312,0,522561,0,446083,0,418086168,0,966856,0,0,0,213096,0,1036080,0,1327776668,0,1380408297,0 1558387835.001179,1179,s0,0,0,0,197797,0,9132,0,5382606,0,69312,0,522561,0,446083,0,418086168,0,966856,0,0,0,213096,0,1036080,0,1327776668,0,1380408297,0 1558387835.001193,1193,s1,0,0,0,108887,0,32214,0,1143802,0,439216,0,1,0,0,0,8,0,44,0,0,0,54012,0,439240,0,1309384656,0,1166016512,0 - + ==> csv/net/procnetdev <== #Time,Time_usec,ProducerName,component_id,job_id,app_id,rx_bytes#lo,rx_packets#lo,rx_errs#lo,rx_drop#lo,rx_fifo#lo,rx_frame#lo,rx_compressed#lo,rx_multicast#lo,tx_bytes#lo,tx_packets#lo,tx_errs#lo,tx_drop#lo,tx_fifo#lo,tx_colls#lo,tx_carrier#lo,tx_compressed#lo 1558387831.001798,1798,s0,0,0,0,12328527,100865,0,0,0,0,0,0,12328527,100865,0,0,0,0,0,0 diff --git a/rtd/docs/source/ldms-streams.rst b/rtd/docs/source/ldms-streams.rst index 40818c40ba..3f4659a05c 100644 --- a/rtd/docs/source/ldms-streams.rst +++ b/rtd/docs/source/ldms-streams.rst @@ -4,7 +4,7 @@ Streams-enabled Application Data Collectors Caliper *********************** -This section covers the basic steps on how to compile, build and use the caliperConnector. +This section covers the basic steps on how to compile, build and use the caliperConnector. **What Is Caliper?** @@ -21,22 +21,22 @@ Build the Caliper program with the application you wish to analyze. No modificat One built, you will need to poin the $LD_LIBRARY_PATH to Caliper's library: .. 
code-block:: RST - + LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/lib64 Now, to enable LDMS data collection, set (or export) the following list of caliper variables to ``ldms`` when executing a program. An example is shown below: .. code-block:: RST - + CALI_LOOP_MONITOR_ITERATION_INTERVAL=10 ./caliper_example.o 400 CALI_SERVICES_ENABLE=loop_monitor,mpi,ldms -The ``CALI_LOOP_MONITOR_ITERATION_INTERVAL`` collects measurements every n loop iterations of the acpplicaiton and the ``CALI_SERVICES_ENABLE`` define which services will be combined to collect the data. +The ``CALI_LOOP_MONITOR_ITERATION_INTERVAL`` collects measurements every n loop iterations of the acpplicaiton and the ``CALI_SERVICES_ENABLE`` define which services will be combined to collect the data. Once done, you will just need to execute your program and you will have application data collected by Caliper and LDMS. .. note:: - + The MPI service (i.e., mpi) is required when enabling LDMS because it is used for associating the MPI rank data collected by LDMS. LDMS Expected Output @@ -44,7 +44,7 @@ LDMS Expected Output LDMS collects a set of runtime timeseries data of the application in parallel with Caliper. Below is an example output of the data collect, formatted into a JSON string: .. code-block:: - + {"job_id":11878171,"ProducerName":“n1","rank":0,"timestamp":1670373198.056455,"region":"init","time":33.172237 } {"job_id":11878171,"ProducerName":"n1","rank":0,"timestamp":1670373198.056455,"region":"initialization","time":33.211929 } {"job_id":11878171,"ProducerName":“n1","rank":0,"timestamp":1670373198.056455,"region":"main","time":44.147736 } @@ -60,11 +60,11 @@ Any data collected by LDMS should have the same fields as the one shown above an Darshan *********************** -This section covers basics steps on how to compile, build and use the Darshan-LDMS Integration code (i.e. darshanConnector). 
The following application tests are part of the Darshan program and can be found under ``/darshan/darshan-test/regression/test-cases/src/``. +This section covers basics steps on how to compile, build and use the Darshan-LDMS Integration code (i.e. darshanConnector). The following application tests are part of the Darshan program and can be found under ``/darshan/darshan-test/regression/test-cases/src/``. **What Is Darshan?** -A lightweight I/O characterization tool that transparently captures application I/O behavior from HPC applications with minimal overhead. +A lightweight I/O characterization tool that transparently captures application I/O behavior from HPC applications with minimal overhead. **What Is The darshanConnector?** @@ -74,7 +74,7 @@ A Darshan-LDMS functionality that utilizes LDMS Streams to collect Darshan’s o :caption: The above diagrams provieds a high level visualization of the darshanConnector. During the Darshan initialization, the connector (on the left-hand side) checks to see if darshan has been built against the ldms library and if it has it will initialize a connection to the LDMS stream daemon when the DARSHAN_LDMS_ENABLE is set. Once initialized, the connecter will know which module data we want to collect by checking which environment variables are set. For example, if MPI-IO_ENABLE_LDMS is set, that specific I/O event data will be collected. The runtime data collection and JSON message formatting is then performed in the darshan ldms connector send function. This function is triggered whenever an I/O event occurs. The data is then published to LDMS streams interface and sent to through the LDMS Transport to be stored into a database. As you can see at the very bottom left is the JSON formatted message. Meanwhile, on the right, darshan is running as usual by initializing their modules, collecting the I/O event data for these modules, aggregating and calculating the data and then outputting the information into a Darshan log file. 
As you can see, the LDMS Streams implementation does not interfere with Darshan .. note:: - + LDMS must already be installed on the system or locally. If it is not, then please following ``Getting The Source`` and ``Building The Source`` in the `LDMS Quickstart Guide `_. If the Darshan-LDMS code is already deployed on your system, please skip to `Run An LDMS Streams Daemon`_ **Metric Definitions** @@ -111,7 +111,7 @@ Below are the list of Darshan metrics that are currently being collected by the * ``cnt:`` The count of the operations ("op" field) performed per module per rank. Resets to 0 after each "close" operation. * ``seg:`` Contains the following array metrics from the operation ("op" field): - + ``pt_sel: HDF5 number of different access selections. reg_hslab: HDF5 number of regular hyperslabs. irreg_hslab: HDF5 number of irregular hyperslabs. @@ -138,9 +138,9 @@ All data fields which that not change throughout the entire application run (i.e Compile and Build with LDMS --------------------------- 1. Run the following to build Darshan and link against an existing LDMS library on the system. - + .. code-block:: RST - + git clone https://github.com/darshan-hpc/darshan.git cd darshan && mkdir build/ ./prepare.sh && cd build/ @@ -149,16 +149,16 @@ Compile and Build with LDMS --prefix=/darshan/ \ --with-JOB_ID-env= \ --enable-ldms-mod \ - --with-ldms= + --with-ldms= make && make install .. note:: - * This configuration is specific to the system. should be replaced by the compiler wrapper for your MPI Library, (e.g., ``mpicc`` for Open MPI, or ``cc`` for Cray Development Environment MPI wrappers). + * This configuration is specific to the system. should be replaced by the compiler wrapper for your MPI Library, (e.g., ``mpicc`` for Open MPI, or ``cc`` for Cray Development Environment MPI wrappers). * If running an MPI program, make sure an MPI library is installed/loaded on the system. 
For more information on how to install and build the code across various platforms, please visit `Darshan's Runtime Installation Page `_ * ``--with-jobid-env=`` expects a string that is the environment variable that the hosted job scheduler utilizes on the HPC system. (e.g., Slurm would use ``--with-jobid-env=SLURM_JOB_ID``) - -2. **OPTIONAL** To build HDF5 module for Darshan, you must first load the HDF5 modulefile with ``module load hdf5-parallel``, then run configure as follows: + +2. **OPTIONAL** To build HDF5 module for Darshan, you must first load the HDF5 modulefile with ``module load hdf5-parallel``, then run configure as follows: .. code-block:: RST @@ -167,28 +167,28 @@ Compile and Build with LDMS --prefix=/darshan/ \ --with-jobid-env= \ --enable-ldms-mod \ - --with-ldms= + --with-ldms= --enable-hdf5-mod \ - --with-hdf5= + --with-hdf5= make && make install 2a. **OPTIONAL** If you do not have HDF5 installed on your system, you may install Python's ``h5py`` package with: .. code-block:: RST - + sudo apt-get install -y hdf5-tools libhdf5-openmpi-dev openmpi-bin # we need to build h5py with the system HDF5 lib backend export HDF5_MPI="ON" CC=cc python -m pip install --no-binary=h5py h5py .. note:: - + If the HDF5 library is installed this way, you do not need to include the ``--with-hdf5`` flag during configuration. For more information on other methods and HDF5 versions to install, please visit `Darshan's Runtime Installation Page `_. - + Run an LDMS Streams Daemon --------------------------- -This section will go over how to start and configure a simple LDMS Streams deamon to collect the Darshan data and store to a CSV file. +This section will go over how to start and configure a simple LDMS Streams deamon to collect the Darshan data and store to a CSV file. If an LDMS Streams daemon is already running on the system then please skip to `Test the Darshan-LDMS Integrated Code (Multi Node)`_. 1. 
First, initialize an ldms streams daemon on a compute node as follows: @@ -202,7 +202,7 @@ If an LDMS Streams daemon is already running on the system then please skip to ` .. code-block:: RST - LDMS_INSTALL= + LDMS_INSTALL= export LD_LIBRARY_PATH="$LDMS_INSTALL/lib/:$LDMS_INSTALL/lib:$LD_LIBRARY_PATH" export LDMSD_PLUGIN_LIBPATH="$LDMS_INSTALL/lib/ovis-ldms/" export ZAP_LIBPATH="$LDMS_INSTALL/lib/ovis-ldms" @@ -214,19 +214,19 @@ If an LDMS Streams daemon is already running on the system then please skip to ` export HOSTNAME="localhost" .. note:: - + LDMS must already be installed on the system or locally. If it is not, then please follow ``Getting The Source`` and ``Building The Source`` in the `LDMS Quickstart Guide `_. 3. Next, create a file called **"darshan\_stream\_store.conf"** and add the following content to it: .. code-block:: RST - + load name=hello_sampler config name=hello_sampler producer=${HOSTNAME} instance=${HOSTNAME}/hello_sampler stream=darshanConnector component_id=${COMPONENT_ID} start name=hello_sampler interval=${SAMPLE_INTERVAL} offset=${SAMPLE_OFFSET} - + load name=stream_csv_store - config name=stream_csv_store path=./streams/store container=csv stream=darshanConnector rolltype=3 rollover=500000 + config name=stream_csv_store path=./streams/store container=csv stream=darshanConnector rolltype=3 rollover=500000 4. Next, run the LDSM Streams daemon with the following command: @@ -235,7 +235,7 @@ If an LDMS Streams daemon is already running on the system then please skip to ` ldmsd -x sock:10444 -c darshan_stream_store.conf -l /tmp/darshan_stream_store.log -v DEBUG -r ldmsd.pid .. note:: - + To check that the ldmsd daemon is connected running, run ``ps auwx | grep ldmsd | grep -v grep``, ``ldms_ls -h -x sock -p -a none -v`` or ``cat /tmp/darshan_stream_store.log``. Where is the node where the LDMS daemon exists and is the port number it is listening on. 
Test the Darshan-LDMS Integrated Code (Multi Node) @@ -252,7 +252,7 @@ Set The Environment export LD_PRELOAD=$DARSHAN_INSTALL_PATH/lib/libdarshan.so export LD_LIBRARY_PATH=$DARSHAN_INSTALL_PATH/lib:$LD_LIBRARY_PATH # optional. Please visit Darshan's webpage for more information. - export DARSHAN_MOD_ENABLE="DXT_POSIX,DXT_MPIIO" + export DARSHAN_MOD_ENABLE="DXT_POSIX,DXT_MPIIO" # uncomment if hdf5 is enabled #export C_INCLUDE_PATH=$C_INCLUDE_PATH:/usr/include/hdf5/openmpi @@ -264,22 +264,22 @@ Set The Environment export DARSHAN_LDMS_HOST= export DARSHAN_LDMS_PORT=10444 export DARSHAN_LDMS_AUTH=none - + # enable LDMS data collection. No runtime data collection will occur if this is not exported. export DARSHAN_LDMS_ENABLE= - - # determine which modules we want to publish to ldmsd - #export DARSHAN_LDMS_ENABLE_MPIIO= - #export DARSHAN_LDMS_ENABLE_POSIX= + + # determine which modules we want to publish to ldmsd + #export DARSHAN_LDMS_ENABLE_MPIIO= + #export DARSHAN_LDMS_ENABLE_POSIX= #export DARSHAN_LDMS_ENABLE_STDIO= - #export DARSHAN_LDMS_ENABLE_HDF5= + #export DARSHAN_LDMS_ENABLE_HDF5= #export DARSHAN_LDMS_ENABLE_ALL= #export DARSHAN_LDMS_VERBOSE= -.. note:: - - The ```` is set to the node name the LDMS Streams daemon is running on (e.g. the node we previous ssh'd into). Make sure the ``LD_PRELOAD`` and at least one of the ``DARSHAN_LDMS_ENABLE_*`` variables are set. If not, no data will be collected by LDMS. - +.. note:: + + The ```` is set to the node name the LDMS Streams daemon is running on (e.g. the node we previous ssh'd into). Make sure the ``LD_PRELOAD`` and at least one of the ``DARSHAN_LDMS_ENABLE_*`` variables are set. If not, no data will be collected by LDMS. + .. note:: ``DARSHAN_LDMS_VERBOSE`` outputs the JSON formatted messages sent to the LDMS streams daemon. The output will be sent to STDERR. 
@@ -289,12 +289,12 @@ Execute Test Application Now we will test the darshanConnector with Darshan's example ``mpi-io-test.c`` code by setting the following environment variables: .. code-block:: RST - + export PROG=mpi-io-test export DARSHAN_TMP=/tmp/darshan-ldms-test export DARSHAN_TESTDIR=/darshan/darshan-test/regression export DARSHAN_LOGFILE_PATH=$DARSHAN_TMP - + Now ``cd`` to the executable and test the appilcation with the darshanConnector enabled. .. code-block:: RST @@ -305,15 +305,15 @@ Now ``cd`` to the executable and test the appilcation with the darshanConnector srun ${PROG} -f $DARSHAN_TMP/${PROG}.tmp.dat Once the application is complete, to view the data please skip to `Check Results`_. - -Test the Darshan-LDMS Integrated Code (Single Node) + +Test the Darshan-LDMS Integrated Code (Single Node) ---------------------------------- The section goes over step-by-step instructions on how to compile and execute the ``mpi-io-test.c`` program under ``darshan/darshan-test/regression/test-cases/src/``, collect the data with the LDMS streams daemon and store it to a CSV file on a single login node. This section is for those who will not be running their applications on a cluster (i.e. no compute nodes). 1. Set Environment Variables for Darshan, LDMS and Darshan-LDMS Integrated code (i.e. darshanConnector). .. 
code-block:: RST - + # Darshan export DARSHAN_INSTALL_PATH= export LD_PRELOAD=/lib/libdarshan.so @@ -324,10 +324,10 @@ The section goes over step-by-step instructions on how to compile and execute th # uncomment if hdf5 is enabled #export C_INCLUDE_PATH=$C_INCLUDE_PATH:/usr/include/hdf5/openmpi #export HDF5_LIB=/libhdf5.so - + # LDMS - LDMS_INSTALL= + LDMS_INSTALL= export LD_LIBRARY_PATH="$LDMS_INSTALL/lib/:$LDMS_INSTALL/lib:$LD_LIBRARY_PATH" export LDMSD_PLUGIN_LIBPATH="$LDMS_INSTALL/lib/ovis-ldms/" export ZAP_LIBPATH="$LDMS_INSTALL/lib/ovis-ldms" @@ -337,7 +337,7 @@ The section goes over step-by-step instructions on how to compile and execute th export SAMPLE_INTERVAL="1000000" export SAMPLE_OFFSET="0" export HOSTNAME="localhost" - + # darshanConnector export DARSHAN_LDMS_STREAM=darshanConnector export DARSHAN_LDMS_XPRT=sock @@ -347,10 +347,10 @@ The section goes over step-by-step instructions on how to compile and execute th # enable LDMS data collection. No runtime data collection will occur if this is not exported. export DARSHAN_LDMS_ENABLE= - - # determine which modules we want to publish to ldmsd - #export DARSHAN_LDMS_ENABLE_MPIIO= - #export DARSHAN_LDMS_ENABLE_POSIX= + + # determine which modules we want to publish to ldmsd + #export DARSHAN_LDMS_ENABLE_MPIIO= + #export DARSHAN_LDMS_ENABLE_POSIX= #export DARSHAN_LDMS_ENABLE_STDIO= #export DARSHAN_LDMS_ENABLE_HDF5= #export DARSHAN_LDMS_ENABLE_ALL= @@ -362,13 +362,13 @@ The section goes over step-by-step instructions on how to compile and execute th 2. Generate the LDMSD Configuration File and Start the Daemon -.. code-block:: RST +.. 
code-block:: RST cat > darshan_stream_store.conf << EOF load name=hello_sampler config name=hello_sampler producer=${HOSTNAME} instance=${HOSTNAME}/hello_sampler stream=darshanConnector component_id=${COMPONENT_ID} start name=hello_sampler interval=${SAMPLE_INTERVAL} offset=${SAMPLE_OFFSET} - + load name=stream_csv_store config name=stream_csv_store path=./streams/store container=csv stream=darshanConnector rolltype=3 rollover=500000 EOF @@ -376,19 +376,19 @@ The section goes over step-by-step instructions on how to compile and execute th ldmsd -x sock:10444 -c darshan_stream_store.conf -l /tmp/darshan_stream_store.log -v DEBUG # check daemon is running ldms_ls -p 10444 -h localhost -v - + 3. Set Up Test Case Variables -.. code-block:: RST +.. code-block:: RST export PROG=mpi-io-test export DARSHAN_TMP=/tmp/darshan-ldms-test export DARSHAN_TESTDIR=/darshan/darshan-test/regression export DARSHAN_LOGFILE_PATH=$DARSHAN_TMP - + 4. Run Darshan's mpi-io-test.c program -.. code-block:: RST +.. code-block:: RST cd darshan/darshan-test/regression/test-cases/src $DARSHAN_TESTDIR/test-cases/src/${PROG}.c -o $DARSHAN_TMP/${PROG} @@ -396,10 +396,10 @@ The section goes over step-by-step instructions on how to compile and execute th ./${PROG} -f $DARSHAN_TMP/${PROG}.tmp.dat Once the application is complete, to view the data please skip to `Check Results`_. - -Pre-Installed Darshan-LDMS + +Pre-Installed Darshan-LDMS --------------------------- -If both the Darshan-LDMS integrated code (i.e., darshanConnector) and LDMS are already installed, and a system LDMS streams daemon is running, then there are two ways to enable the LDMS functionality: +If both the Darshan-LDMS integrated code (i.e., darshanConnector) and LDMS are already installed, and a system LDMS streams daemon is running, then there are two ways to enable the LDMS functionality: 1. 
Set the environment via sourcing the ``darshan_ldms.env`` script  @@ -415,20 +415,20 @@ If both the Darshan-LDMS integrated code (i.e., darshanConnector) and LDMS are a In order to enable the darshanConnector code on the system, just source the following env script: .. code-block:: RST - + module use /projects/ovis/modules/ source /projects/ovis/modules//darshan_ldms.env **OPTIONAL**: Add a "-v" when sourcing this file to enable verbose: .. code-block:: RST - + $ source /projects/ovis/modules//darshan_ldms.env -v This will output json messages collected by ldms to the terminal window. .. note:: - + The STDIO data will NOT be collected by LDMS. This is to prevent any recursive LDMS function calls.  2. Load Module @@ -437,10 +437,10 @@ This will output json messages collected by ldms to the terminal window. If you do not wish to set the environment using the env script from above, you can always load the ``darshan_ldms`` modulefile, as follows: .. code-block:: RST - + module use /projects/ovis/modules/ module load darshan_ldms - + **OPTIONAL**: If you decide to load the module, you will need to turn on verbose by setting the following environment variable in your run script: .. code-block:: RST @@ -464,8 +464,8 @@ If you want to collect all types of data then set all *_ENABLE_LDMS variables: export DARSHAN_LDMS_ENABLE_HDF5="" .. note:: - - All Darshan binary log-files (i.e. .darshan) will be saved to ``$LOGFILE_PATH_DARSHAN``, as specified at build time and exported in the user environment. + + All Darshan binary log-files (i.e. .darshan) will be saved to ``$LOGFILE_PATH_DARSHAN``, as specified at build time and exported in the user environment. .. code-block:: RST @@ -517,21 +517,21 @@ Once the module is loaded and the environment is set, you will just need to run If runtime errors or issues occur, then this is most likely due to incompatibility issues with the application build, or the Darshan-LDMS build that is using ``LD_PRELOAD``. 
You may debug the issue, as follows: - 1. Unset the ``LD_PRELOAD`` environment variable (e.g., ``unset LD_PRELOAD``), then run the application with: ``mpiexec -env LD_PRELOAD $DARSHAN_INSTALL_PATH/lib/libdarshan.so`` or ``srun --export=LD_PRELOAD=$DARSHAN_INSTALL_PATH/lib/libdarshan.so``. - For more information please see section 5.2 in `Darshan's Runtime Installation Page `_. + 1. Unset the ``LD_PRELOAD`` environment variable (e.g., ``unset LD_PRELOAD``), then run the application with: ``mpiexec -env LD_PRELOAD $DARSHAN_INSTALL_PATH/lib/libdarshan.so`` or ``srun --export=LD_PRELOAD=$DARSHAN_INSTALL_PATH/lib/libdarshan.so``. + For more information please see section 5.2 in `Darshan's Runtime Installation Page `_. - 2. If you are still running into runtime issues, please send an email to ldms@sandia.gov and provide: - a) mpi-io, hdf5, pnetcdf, compiler version (if applicable) used to build your application - b) Contents of your environment variables: $PATH, $LIBRARY_PATH, $LD_LIBRARY_PATH and $LD_PRELOAD. + 2. If you are still running into runtime issues, please send an email to ldms@sandia.gov and provide: + a) mpi-io, hdf5, pnetcdf, compiler version (if applicable) used to build your application + b) Contents of your environment variables: $PATH, $LIBRARY_PATH, $LD_LIBRARY_PATH and $LD_PRELOAD. Check Results ------------- LDMS Output //////////// -This section provides the expected output of an application run with the data published to LDMS streams daemon with a CSV storage plugin (see section `Run An LDMS Streams Daemon`_). +This section provides the expected output of an application run with the data published to LDMS streams daemon with a CSV storage plugin (see section `Run An LDMS Streams Daemon`_). -* If you are publishing to a Local Streams Daemon (compute or login nodes) to collect the Darshan data, then compare the generated ``csv`` file to the one shown below in this section. 
+* If you are publishing to a Local Streams Daemon (compute or login nodes) to collect the Darshan data, then compare the generated ``csv`` file to the one shown below in this section. * If you are publishing to a System Daemon, that aggregates the data and stores to a Scalable Object Store (SOS), please skip this section and go to the :doc:`SOS Quickstart Guide ` for more information about viewing and accessing data from this database. @@ -540,7 +540,7 @@ LDMS Log File * Once the application has completed, run ``cat /tmp/hello_stream_store.log`` in the terminal window where the ldmsd is running (compute node). You should see a similar output to the one below. .. code-block:: RST - + cat /tmp/hello_stream_store.log Fri Feb 18 11:35:23 2022: INFO : stream_type: JSON, msg: "{ "job_id":53023,"rank":3,"ProducerName":"nid00052","file":"darshan-output/mpi-io-test.tmp.dat","record_id":1601543006480890062,"module":"POSIX","type":"MET","max_byte":-1,"switches":-1,"flushes":-1,"cnt":1,"op":"opens_segment","seg":[{"data_set":"N/A","pt_sel":-1,"irreg_hslab":-1,"reg_hslab":-1,"ndims":-1,"npoints":-1,"off":-1,"len":-1,"dur":0.00,"timestamp":1645209323.082951}]}", msg_len: 401, entity: 0x155544084aa0 Fri Feb 18 11:35:23 2022: INFO : stream_type: JSON, msg: "{ "job_id":53023,"rank":3,"ProducerName":"nid00052","file":"N/A","record_id":1601543006480890062,"module":"POSIX","type":"MOD","max_byte":-1,"switches":-1,"flushes":-1,"cnt":1,"op":"closes_segment","seg":[{"data_set":"N/A","pt_sel":-1,"irreg_hslab":-1,"reg_hslab":-1,"ndims":-1,"npoints":-1,"off":-1,"len":-1,"dur":0.00,"timestamp":1645209323.083581}]}", msg_len: 353, entity: 0x155544083f60 @@ -553,7 +553,7 @@ CSV File .. 
code-block:: RST - #module,uid,ProducerName,switches,file,rank,flushes,record_id,exe,max_byte,type,job_id,op,cnt,seg:off,seg:pt_sel,seg:dur,seg:len,seg:ndims,seg:reg_hslab,seg:irreg_hslab,seg:data_set,seg:npoints,seg:timestamp,seg:total,seg:start + #module,uid,ProducerName,switches,file,rank,flushes,record_id,exe,max_byte,type,job_id,op,cnt,seg:off,seg:pt_sel,seg:dur,seg:len,seg:ndims,seg:reg_hslab,seg:irreg_hslab,seg:data_set,seg:npoints,seg:timestamp,seg:total,seg:start POSIX,99066,n9,-1,/lustre//darshan-ldms-output/mpi-io-test_lC.tmp.out,278,-1,9.22337E+18,/lustre//darshan-ldms-output/mpi-io-test,-1,MET,10697754,open,1,-1,-1,0.007415,-1,-1,-1,-1,N/A,-1,1662576527,0.007415,0.298313 MPIIO,99066,n9,-1,/lustre//darshan-ldms-output/mpi-io-test_lC.tmp.out,278,-1,9.22337E+18,/lustre//darshan-ldms-output/mpi-io-test,-1,MET,10697754,open,1,-1,-1,0.100397,-1,-1,-1,-1,N/A,-1,1662576527,0.100397,0.209427 POSIX,99066,n11,-1,/lustre//darshan-ldms-output/mpi-io-test_lC.tmp.out,339,-1,9.22337E+18,/lustre//darshan-ldms-output/mpi-io-test,-1,MET,10697754,open,1,-1,-1,0.00742,-1,-1,-1,-1,N/A,-1,1662576527,0.00742,0.297529 @@ -567,12 +567,12 @@ Compare With Darshan Log File(s) //////////////////////////////// Parse the Darshan binary file using Darshan's standard and DXT (only if the ``DXT Module`` is enabled) parsers. -.. code-block:: RST +.. code-block:: RST $DARSHAN_INSTALL_PATH/bin/darshan-parser --all $LOGFILE_PATH_DARSHAN/.darshan > $DARSHAN_TMP/${PROG}.darshan.txt $DARSHAN_INSTALL_PATH/bin/darshan-dxt-parser --show-incomplete $LOGFILE_PATH_DARSHAN/.darshan > $DARSHAN_TMP/${PROG}-dxt.darshan.txt -Now you can view the log(s) with ``cat $DARSHAN_TMP/${PROG}.darshan.txt`` or ``cat $DARSHAN_TMP/${PROG}-dxt.darshan.txt`` and compare them to the data collected by LDMS. +Now you can view the log(s) with ``cat $DARSHAN_TMP/${PROG}.darshan.txt`` or ``cat $DARSHAN_TMP/${PROG}-dxt.darshan.txt`` and compare them to the data collected by LDMS. 
The ``producerName``, file path and record_id of each job should match and, if ``dxt`` was enabled, the individual I/O statistics of each rank (i.e., start time and number of I/O operations). @@ -589,9 +589,9 @@ Setup and Configuration ---------------------- **The KokkosConnector** -A Kokkos-LDMS functionality that utilizes LDMS Streams to collect Kokkos related data during runtime. Kokkos sampler, provided by the Kokkos-tools library, controls the sampling rate and provides the option to sample data using a count-based push. It then formats the data to a JSON message and *publishes* it to an LDMS streams interface. +A Kokkos-LDMS functionality that utilizes LDMS Streams to collect Kokkos related data during runtime. Kokkos sampler, provided by the Kokkos-tools library, controls the sampling rate and provides the option to sample data using a count-based push. It then formats the data to a JSON message and *publishes* it to an LDMS streams interface. -.. warning:: +.. warning:: To use kokkosConnector, all users will need to install Kokkos-Tools. You can find their repository and instructions on installing it here: https://github.com/kokkos/kokkos-tools @@ -599,14 +599,14 @@ The following environmental variables are needed in an application's runscript t .. code-block:: RST - export KOKKOS_LDMS_HOST="localhost" - export KOKKOS_LDMS_PORT="412" + export KOKKOS_LDMS_HOST="localhost" + export KOKKOS_LDMS_PORT="412" export KOKKOS_PROFILE_LIBRARY="/kokkos-tools/common/kokkos_sampler/kp_sampler.so;/ovis/kokkosConnector/kp_kernel_ldms.so" export KOKKOS_SAMPLER_RATE=101 export KOKKOS_LDMS_VERBOSE=0 export KOKKOS_LDMS_AUTH="munge" export KOKKOS_LDMS_XPRT="sock" - + * The KOKKOS_SAMPLER_RATE variable determines the rate of messages pushed to streams and collected. Please note that it is in best practice to set this to a prime number to avoid collecting information from the same kernels. 
* The KOKKOS_LDMS_VERBOSE variable can be set to 1 for debug purposes which prints all collected kernel data to the console. @@ -614,10 +614,10 @@ How To Make A Data Connector ***************************** In order to create a data connector with LDMS to collect runtime timeseries application data, you will need to utilize LDMS's Streams Functionality. This section will provide the necessary functions and Streams API required to make the data connector. -The example (code) below is pulled from the Darshan-LDMS Integration code. +The example (code) below is pulled from the Darshan-LDMS Integration code. .. note:: - + The LDMS Streams functionality uses a push-based method to reduce memory consumed and data loss on the node. Include the following LDMS files @@ -625,7 +625,7 @@ Include the following LDMS files * First, the following libaries will need to be included in the program as these contain all the functions that the data connector will be using/calling. .. code-block:: RST - #include + #include #include #include @@ -634,7 +634,7 @@ Initialize All Necessary Variables * Next, the following variables will need to be initialized globally or accessible by the Streams API Functions described in the next section: -.. code-block:: RST +.. code-block:: RST #define SLURM_NOTIFY_TIMEOUT 5 ldms_t ldms_g; @@ -647,7 +647,7 @@ Initialize All Necessary Variables Copy "Hello Sampler" Streams API Functions ------------------------------------------ -Next, copy the ``ldms_t setup_connection`` and ``static void event_cb`` functions listed below. These functions originated from the `ldmsd_stream_subscribe.c `_ code. +Next, copy the ``ldms_t setup_connection`` and ``static void event_cb`` functions listed below. These functions originated from the `ldmsd_stream_subscribe.c `_ code. The ``setup_connection`` contains LDMS API calls that connects to the LDMS daemon and the ``static void event_cb`` is a callback function to check the connection status of the LDMS Daemon. 
@@ -731,8 +731,8 @@ Initialize and Connect to LDMSD Once the above functions have been copied, the ``setup_connection`` will need to be called in order to establish a connection an LDMS Streams Daemon. .. note:: - - The LDMS Daemon is configured with the `Streams Plugin `_ and should already be running on the node. The host is set to the node the daemon is running on and port is set to the port the daemon is listening to. Below you will find an example of the Darshan Connector for reference. + + The LDMS Daemon is configured with the `Streams Plugin `_ and should already be running on the node. The host is set to the node the daemon is running on and port is set to the port the daemon is listening to. Below you will find an example of the Darshan Connector for reference. .. code-block:: RST @@ -765,7 +765,7 @@ Once the above functions have been copied, the ``setup_connection`` will need to pthread_mutex_unlock(ln_lock); return; } - + The environment variables ``DARSHAN_LDMS_X`` are used to define the stream name (configured in the daemon), transport type (sock, ugni, etc.), host, port and authentication of the LDMSD. In this specific example, the stream name is set to "darshanConnector" so the environment variable, ``DARSHAN_LDMS_STREAM`` is exported as follows: ``export DARSHAN_LDMS_STREAM=darshanConnector`` .. note:: @@ -773,10 +773,10 @@ The environment variables ``DARSHAN_LDMS_X`` are used to define the stream name .. note:: If you run into the following error: ``error:unknown type name 'sem_t'`` then you will need to add the following libraries to your code: - + * ``#include `` * ``#include `` - + Publish Event Data to LDMSD ------------------------------------- Now we will create a function that will collect all relevent application events and publish to the LDMS Streams Daemon. In the Darshan-LDMS Integration, the following Darshan's I/O traces for each I/O event (i.e. 
open, close, read, write) are collected along with the absolute timestamp (for timeseries data) for each I/O event: @@ -809,7 +809,7 @@ Now we will create a function that will collect all relevent application events out_1: return; } - + .. note:: For more information about the various Darshan I/O traces and metrics collected, please visit `Darshan's Runtime Installation Page `_ and `Darshan LDMS Metrics Collected `_ pages. @@ -818,7 +818,7 @@ Once this function is called, it initializes a connection to the LDMS Streams Da There are various types of formats that can be used to publish the data (i.e. JSON, string, etc.) so please review the `Defining A Format`_ section for more information. -Collect Event Data +Collect Event Data ///////////////////////// To collect the application data in real time (and using the example given in this section), the ``void darshan_ldms_connector_send(arg1, arg2, arg3,....)`` will be placed in all sections of the code where we want to publish a message. From the Darshan-LDMS Integration code we would have: @@ -826,11 +826,11 @@ To collect the application data in real time (and using the example given in thi .. code-block:: RST darshan_ldms_connector_send(rec_ref->file_rec->counters[MPIIO_COLL_OPENS] + rec_ref->file_rec->counters[MPIIO_INDEP_OPENS], "open", -1, -1, -1, -1, -1, __tm1, __tm2, __ts1, __ts2, rec_ref->file_rec->fcounters[MPIIO_F_META_TIME], "MPIIO", "MET"); - -This line of code is placed within multiple macros (`MPIIO_RECORD_OPEN/READ/WRITE `_) in Darshan's MPIIO module. + +This line of code is placed within multiple macros (`MPIIO_RECORD_OPEN/READ/WRITE `_) in Darshan's MPIIO module. * Doing this will call the function everytime Darshan detects an I/O event from the application (i.e. read, write, open, close). Once called, the arguements will be passed to the function, added to the JSON formatted message and pushed to the LDMS daemon. -.. note:: - +.. 
note:: + For more information about how to store the published data from and LDMS Streams Daemon, please see the Stream CSV Store plugin man pages on a system where LDMS Docs are installed: ``man Plugin_stream_csv_store`` diff --git a/rtd/docs/source/ldmscon.rst b/rtd/docs/source/ldmscon.rst index 75b1855e23..dc06f17eb2 100644 --- a/rtd/docs/source/ldmscon.rst +++ b/rtd/docs/source/ldmscon.rst @@ -4,7 +4,7 @@ The LDMS Users Group Conferences (LDMSCON) serves as a forum for users to share About ********** -You can find the general information and previous conferences following webpage: +You can find the general information and previous conferences following webpage: `LDMS Users Group Conference`_. .. _LDMS Users Group Conference: https://sites.google.com/view/ldmscon @@ -14,7 +14,7 @@ Please go to the to stay up to date on tutorials, presentations an LDMSCON2023 ************ -The following attachment contains the scripts and commands used in the LDMCON2023 Basics powerpoint presentation. +The following attachment contains the scripts and commands used in the LDMCON2023 Basics powerpoint presentation. **Please DOWNLOAD THE FOLLOWING .ZIP FILE to easily follow along with the tutorial.** @@ -31,7 +31,7 @@ Recordings of previous presentations, tutorials and information for LDMSCON2023 :width: 200 .. note:: - + **If the file directory ``ldmscon2023`` is not extracted under ``/root/``** then please keep in mind that **any reference to ``/root/``** in the powerpoint presentation, and following files, **will need to be changed to the absolute path of ``ldmscon2023/``**. * ``../conf/e3/agg_store_csv.conf`` @@ -41,9 +41,9 @@ Recordings of previous presentations, tutorials and information for LDMSCON2023 * ``../scripts/e3/store_csv.txt`` .. note:: - All files under ``../scripts/e*`` are not used in the tutorial but rather are the commands/steps used for each exercise. They demonstrate LDMS's ability to configure and initialize it's daemons with a single bash script. 
+ All files under ``../scripts/e*`` are not used in the tutorial but rather are the commands/steps used for each exercise. They demonstrate LDMS's ability to configure and initialize it's daemons with a single bash script. -.. note:: +.. note:: These scripts must be ran in a directory that is readable and writable. Otherwise the log/data file generation will not work. LDMSCON2022 diff --git a/rtd/docs/source/store_man/Plugin_avro_kafka_store.rst b/rtd/docs/source/store_man/Plugin_avro_kafka_store.rst index 46232023cd..a847bf2405 100644 --- a/rtd/docs/source/store_man/Plugin_avro_kafka_store.rst +++ b/rtd/docs/source/store_man/Plugin_avro_kafka_store.rst @@ -8,12 +8,12 @@ Plugin_avro_kafka_store :depth: 3 .. -NAME +NAME ========================= avro_kafka_store - LDMSD avro_kafka_store plugin -SYNOPSIS +SYNOPSIS ============================= **config** **name=avro_kafka_store** **producer=PRODUCER** @@ -21,7 +21,7 @@ SYNOPSIS [ **encoding=\ AVRO** ] [ **kafka_conf=\ PATH** ] [ **serdes_conf=\ PATH** ] -DESCRIPTION +DESCRIPTION ================================ **``avro_kafka_store``** implements a decomposition capable LDMS metric @@ -37,26 +37,26 @@ When in *AVRO* mode, the plugin manages schema in cooperation with an Avro Schema Registry. The location of this registry is specified in a configuration file or optionally on the **``config``** command line. -CONFIG OPTIONS +CONFIG OPTIONS =================================== -mode +mode A string indicating the encoding mode: "JSON" will encode messages in JSON format, "AVRO" will encode messages using a schema and Avro Serdes. The default is "AVRO". The mode values are not case sensitive. -name +name Must be avro_kafka_store. -kafka_conf +kafka_conf A path to a configuration file in Java property format. This configuration file is parsed and used to configure the Kafka kafka_conf_t configuration object. 
The format of this file and the supported attributes are available here: https://github.com/edenhill/librdkafka/blob/master/CONFIGURATION.md. -serdes_conf +serdes_conf A path to a configuration file in Java property format. This configuration file is parsed and used to configure the Avro Serdes serdes_conf_t configuration object. The only supported option for @@ -75,37 +75,37 @@ The '%' character introduces a *format specifier* that will be substituted in the topic format string to create the topic name. The format specifiers are as follows: -%F +%F The format in which the message is serialized: "json" or "avro". -%S +%S The set parameter's *schema* name. -%I +%I The instance name of the set, e.g. "orion-01/meminfo". -%P +%P The set parameter's *producer* name, e.g. "orion-01." -%u +%u The user-name string for the owner of the set. If the user-name is not known on the system, the user-id is used. -%U +%U The user-id (uid_t) for the owner of the set. -%g +%g The group-name string for the group of the set. If the group-name is not known on the system, the group-id is used. -%G +%G The group-id (gid_t) for the group of the set. -%a +%a The access/permission bits for the set formatted as a string, e.g. "-rw-rw----". -%A +%A The access/permission bits for the set formatted as an octal number, e.g. 0440. @@ -283,10 +283,10 @@ schema-id to query the Schema registry for a schema. Once found, the client will construct a serdes from the schema definition and use this serdes to decode the message into Avro values. 
-EXAMPLES +EXAMPLES ============================= -kafka_conf Example File +kafka_conf Example File ------------------------ :: @@ -297,7 +297,7 @@ kafka_conf Example File # Specify the location of the Kafka broker bootstrap.server=localhost:9092 -serdes_conf Example File +serdes_conf Example File ------------------------- :: @@ -307,7 +307,7 @@ serdes_conf Example File # set to anything other than an empty string serdes.schema.url=https://localhost:9092 -Example strg_add command +Example strg_add command ------------------------- :: diff --git a/rtd/docs/source/store_man/Plugin_darshan_stream_store.rst b/rtd/docs/source/store_man/Plugin_darshan_stream_store.rst index 533135513d..6195c1611b 100644 --- a/rtd/docs/source/store_man/Plugin_darshan_stream_store.rst +++ b/rtd/docs/source/store_man/Plugin_darshan_stream_store.rst @@ -38,20 +38,20 @@ CONFIGURATION ATTRIBUTE SYNTAX | configuration line name= - | + | | This MUST be darshan_stream_store. path= - | + | | The path to the root of the SOS container store (should be created by the user) stream= - | + | | stream to which to subscribe. mode= - | + | | The container permission mode for create, (defaults to 0660). INPUT JSON FORMAT diff --git a/rtd/docs/source/store_man/Plugin_store_flatfile.rst b/rtd/docs/source/store_man/Plugin_store_flatfile.rst index 082c8376a6..71b96fe0b5 100644 --- a/rtd/docs/source/store_man/Plugin_store_flatfile.rst +++ b/rtd/docs/source/store_man/Plugin_store_flatfile.rst @@ -40,21 +40,21 @@ output files via identification of the container and schema. | ldmsd_controller strgp_add line plugin= - | + | | This MUST be store_flatfile. name= - | + | | The policy name for this strgp. container= - | + | | The container and the schema determine where the output files will be written (see path above). They also are used to match any action=custom configuration.node/meminfo. schema= - | + | | The container and schema determines where the output files will be written (see path above). 
diff --git a/rtd/docs/source/store_man/Plugin_store_sos.rst b/rtd/docs/source/store_man/Plugin_store_sos.rst index 3bc2ef191e..b4a20e59c5 100644 --- a/rtd/docs/source/store_man/Plugin_store_sos.rst +++ b/rtd/docs/source/store_man/Plugin_store_sos.rst @@ -38,11 +38,11 @@ STORE_SOS INIT CONFIGURATION ATTRIBUTE SYNTAX | ldmsd_controller configuration line name= - | + | | This MUST be store_sos. path= - | + | | The store will be put into a directory whose root is specified by the path argument. This directory must exist; the store will be created. The full path to the store will be @@ -61,27 +61,27 @@ container and schema for a store. | ldmsd_controller strgp_add line plugin= - | + | | This MUST be store_sos. name= - | + | | The policy name for this strgp. container= - | + | | The container and schema define the store as described above (see path). schema= - | + | | The container and schema define the store as described above (see path). You can have multiples of the same path and container, but with different schema (which means they will have different metrics) and they will be stored in the same store. decomposition= - | + | | Optionally use set-to-row decomposition with the specified configuration file in JSON format. See more about decomposition in ldmsd_decomposition(7). @@ -101,11 +101,11 @@ into XXX. Any commands given with no argument, will return usage info. | Create a partition. **-C** ** - | + | | Path to the container **-s** *state* - | + | | State of the new partition (case insensitive). Default is OFFLINE. Optional parameter. Valid options are: @@ -117,7 +117,7 @@ into XXX. Any commands given with no argument, will return usage info. or deleted. **part_name** - | + | | Name of the partition **sos_part_delete** @@ -126,11 +126,11 @@ into XXX. Any commands given with no argument, will return usage info. OFFLINE state to be deleted. 
**-C** ** - | + | | Path to the container **name** - | + | | Name of the parition **sos_part_modify** @@ -138,11 +138,11 @@ into XXX. Any commands given with no argument, will return usage info. | Modify the state of a partition. **-C** ** - | + | | Path to the container **-s** *state* - | + | | State of the new partition (case insensitive). Default is OFFLINE. Optional parameter. Valid options are: @@ -154,24 +154,24 @@ into XXX. Any commands given with no argument, will return usage info. or deleted. **part_name** - | + | | Name of the partition **sos_part_move** - | + | | Move a partition to another storage location. -C -p part_name **-C** ** - | + | | Path to the container **-p** ** - | + | | The new path. **part_name** - | + | | Name of the partition USING SOS COMMANDS TO LOOK AT DATA IN A PARTITION @@ -185,7 +185,7 @@ command options are below. Example usage is in the example section. | Print a directory of the schemas. **-C** ** - | + | | Path to the container **sos_cmd** @@ -193,7 +193,7 @@ command options are below. Example usage is in the example section. | Show debug information for the container **-C** ** - | + | | Path to the container **sos_cmd** @@ -201,22 +201,22 @@ command options are below. Example usage is in the example section. | Print data from a container **-C** ** - | + | | Path to the container **-q** Used to query **-S** ** - | + | | Schema querying against **-X** ** - | + | | Variable that is indexed to use in the query. **-V** ** - | + | | One or more vars to output. NOTES diff --git a/rtd/docs/source/store_man/Plugin_store_timescale.rst b/rtd/docs/source/store_man/Plugin_store_timescale.rst index d020281b51..06a846870b 100644 --- a/rtd/docs/source/store_man/Plugin_store_timescale.rst +++ b/rtd/docs/source/store_man/Plugin_store_timescale.rst @@ -44,37 +44,37 @@ STORE_TIMESCALE CONFIGURATION ATTRIBUTE SYNTAX | ldmsd_controller configuration line name= - | + | | This MUST be store_timescale. 
user= - | + | | This option is required; It will be used as the user name to connect to timescaledb. pwfile= - | + | | This option is required; The file must have content secretword=, the password will be used as the password to connect to timescaledb. hostaddr= - | + | | This option is required; It will be used as the ip addr of timescaledb to connect to. port= - | + | | This option is required; It will be used as the port number of timescaledb to connect to. dbname= - | + | | This option is required; It will be used as the timescaledb database name to connect to. measurement_limit= - | + | | This is optional; It specifies the maximum length of the sql statement to create table or insert data into timescaledb; default 8192. @@ -91,20 +91,20 @@ output files via identification of the container and schema. | ldmsd_controller strgp_add line plugin= - | + | | This MUST be store_timescale. name= - | + | | The policy name for this strgp. container= - | + | | The container and the schema determine where the output files will be written (see path above). schema= - | + | | The container and the schema determine where the output files will be written (see path above). You can have multiples of the same sampler, but with different schema (which means they will diff --git a/rtd/docs/source/store_man/Plugin_store_tutorial.rst b/rtd/docs/source/store_man/Plugin_store_tutorial.rst index 9f73d3f024..5959adff4c 100644 --- a/rtd/docs/source/store_man/Plugin_store_tutorial.rst +++ b/rtd/docs/source/store_man/Plugin_store_tutorial.rst @@ -43,11 +43,11 @@ STORE_TUTORIAL CONFIGURATION ATTRIBUTE SYNTAX | ldmsd_controller configuration line name= - | + | | This MUST be store_tutorial. path= - | + | | This option is required; the config line or the options file must supply a default value. The output files will be put into a directory whose root is specified by the path argument. This @@ -68,20 +68,20 @@ output files via identification of the container and schema. 
 | ldmsd_controller strgp_add line

 plugin=
- |
+ |
 | This MUST be store_tutorial.

 name=
- |
+ |
 | The policy name for this strgp.

 container=
- |
+ |
 | The container and the schema determine where the output files
 will be written (see path above).

 schema=
- |
+ |
 | The container and the schema determine where the output files
 will be written (see path above). You can have multiples of the
 same sampler, but with different schema (which means they will
diff --git a/rtd/docs/source/store_man/Plugin_stream_csv_store.rst b/rtd/docs/source/store_man/Plugin_stream_csv_store.rst
index ef9b246de4..0a80eedb38 100644
--- a/rtd/docs/source/store_man/Plugin_stream_csv_store.rst
+++ b/rtd/docs/source/store_man/Plugin_stream_csv_store.rst
@@ -40,35 +40,35 @@ CONFIGURATION ATTRIBUTE SYNTAX
 | configuration line

 name=
- |
+ |
 | This MUST be stream_csv_store.

 path=
- |
+ |
 | path to the directory of the csv output file

 container=
- |
+ |
 | directory of the csv output file

 stream=
- |
+ |
 | csv list of streams to which to subscribe.

 flushtime=
- |
+ |
 | Flush any file that has not received data on its stream in the
 last N sec. This is asynchonous to any buffering or rollover that
 is occuring. Min time if enabled = 120 sec. This will occur again
 at this interval if there is still no data received.

 buffer=<0/1>
- |
+ |
 | Optional buffering of the output. 0 to disable buffering, 1 to
 enable it with autosize (default)

 rolltype=
- |
+ |
 | By default, the store does not rollover and the data is written
 to a continously open filehandle. Rolltype and rollover are used
 in conjunction to enable the store to manage rollover, including
@@ -76,25 +76,25 @@ CONFIGURATION ATTRIBUTE SYNTAX
 roll occurs. Valid options are:

 1
- |
+ |
 | wake approximately every rollover seconds and roll. Rollover is
 suppressed if no data at all has been written.

 2
- |
+ |
 | wake daily at rollover seconds after midnight (>=0) and roll.
 Rollover is suppressed if no data at all has been written.
 3
- |
+ |
 | roll after approximately rollover records are written.

 4
- |
+ |
 | roll after approximately rollover bytes are written.

 5
- |
+ |
 | wake at rollover seconds after midnight (>=0) and roll, then
 repeat every rollagain (> rollover) seconds during the day. For
 example "rollagain=3600 rollover=0 rolltype=5" rolls
@@ -102,7 +102,7 @@ CONFIGURATION ATTRIBUTE SYNTAX
 been written.

 rollover=
- |
+ |
 | Rollover value controls the frequency of rollover (e.g., number
 of bytes, number of records, time interval, seconds after
 midnight). Note that these values are estimates due to the
@@ -166,7 +166,7 @@ Options for timing information are driven by #defines in the code
 source right now.

 TIMESTAMP_STORE
- |
+ |
 | Set by #define or #undef TIMESTAMP_STORE. This will write out an
 absolute timestamp in the file as the last item in the csv and is
 called 'store_recv_time' in the header. The timestamp is only
@@ -176,7 +176,7 @@ TIMESTAMP_STORE
 and json are timestamped.

 STREAM_CSV_DIAGNOSTICS
- |
+ |
 | Set by #define or #undef STREAM_CSV_DIAGNOSTICS. This will write
 out diagnostic info to the log when stream_cb is called.

diff --git a/rtd/docs/source/store_man/index.rst b/rtd/docs/source/store_man/index.rst
index fd9179a193..d96eb60139 100644
--- a/rtd/docs/source/store_man/index.rst
+++ b/rtd/docs/source/store_man/index.rst
@@ -4,5 +4,5 @@ Store Plugin Man Pages
 .. toctree::
 :maxdepth: 1
 :glob:
-
+
 *
diff --git a/rtd/docs/source/ug.rst b/rtd/docs/source/ug.rst
index 31e7df01c4..c371f1fb29 100644
--- a/rtd/docs/source/ug.rst
+++ b/rtd/docs/source/ug.rst
@@ -1,7 +1,7 @@
 LDMS User's Group
 ==============================

-LDMS User's group will meet every other Monday at Noon (Mountain time).
+LDMS User's group will meet every other Monday at Noon (Mountain time).
 Sign up for meeting announcements using the information below.

 The LDMS Mailing Lists are hosted by LLNL at listserv.llnl.gov. The current available lists are: