Master QLP generation from qlp_parallel script #912

Merged: 9 commits merged into develop on Jun 18, 2024

Conversation

awhoward commented Jun 4, 2024

Addresses PR #910

Please keep the branch open. I should have used another branch, but I'd just created this one and switched gears to work on this issue.

awhoward commented Jun 4, 2024

@bjfultn -- check out the auto throttling behavior with the --load parameter.

"""
Script Name: qlp_parallel.py

Description:
This script uses the 'parallel' utility to execute the recipe called
'recipes/quicklook_match.recipe' to generate standard Quicklook data
products. The script selects all KPF files based on their
type (L0/2D/L1/L2/master) from the standard data directory using a date
range specified by the parameters start_date and end_date. L0 files are
included if the --l0 flag is set or none of the --l0, --2d, --l1, --l2
flags are set (in which case all data types are included). The --2d,
--l1, and --l2 flags have similar functions. The script assumes that it
is being run in Docker and will return with an error message if not.
If start_date is later than end_date, the arguments will be reversed
and the files with later dates will be processed first.

The --ncpu parameter determines the maximum number of cores used. If the
--load parameter (a percentage, e.g. 90 = 90%) is set to a non-zero value,
this script will be throttled so that no new files will have QLPs
processed until the load is below that value. Note that throttling only
works in steady state; the first set of jobs can overload the system if
--ncpu is set much too high. Also, the system runs at a slightly higher
load than requested; e.g., if you want 90% load, set --load to 80.

Invoking the --print_files flag causes the script to print the file
names, but not compute Quicklook data products.

Arguments:
    start_date     Start date as YYYYMMDD, YYYYMMDD.SSSSS, or YYYYMMDD.SSSSS.SS
    end_date       End date as YYYYMMDD, YYYYMMDD.SSSSS, or YYYYMMDD.SSSSS.SS

Options:
    --l0           Select all L0 files in date range
    --2d           Select all 2D files in date range
    --l1           Select all L1 files in date range
    --l2           Select all L2 files in date range
    --master       Select all master files in date range
    --ncpu         Number of cores used for parallel processing; default=10
    --load         Maximum load (1 min average); default=0 (only activated if !=0)
    --print_files  Display file names matching criteria, but don't generate Quicklook plots
    --help         Display this message

Usage:
python qlp_parallel.py YYYYMMDD.SSSSS YYYYMMDD.SSSSS --ncpu NCPU --load LOAD --l0 --2d --l1 --l2 --master --print_files

Examples:
./scripts/qlp_parallel.py 20230101.12345.67 20230101.17 --ncpu 50 --l0 --2d
./scripts/qlp_parallel.py 20240501 20240505 --ncpu 150 --load 90
"""

awhoward commented Jun 5, 2024

@bjfultn -- let's chat about this before merging. There's a problem with the automatic load throttling. I tested it outside of Docker, where it works, but it fails inside the container because parallel can't access load information in /proc there. I tried a few fixes, but none resolved the issue.
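
As a way to check what load information is visible inside the container, the sketch below reads the 1-minute load average from Python and applies the same "start new work only below the threshold" rule that --load describes. It is not the fix adopted in this PR, and it makes no claim about how parallel measures load internally. The helper names are hypothetical, and interpreting the --load percentage as load average divided by core count is an assumption.

```python
import os


def load_percent(load_1min, ncores):
    """Express a 1-minute load average as a percentage of available cores."""
    return 100.0 * load_1min / ncores


def may_start_job(load_1min, ncores, max_load_pct):
    """Return True if a new job may start under the given load limit.

    A limit of 0 disables throttling, matching the documented default.
    """
    if max_load_pct == 0:
        return True
    return load_percent(load_1min, ncores) < max_load_pct


def current_load_1min():
    """Return the 1-minute load average, or None if it cannot be read."""
    try:
        return os.getloadavg()[0]
    except (OSError, AttributeError):
        # OSError: load average unobtainable; AttributeError: unsupported platform
        return None
```

Running `current_load_1min()` both on the host and inside the container would show whether the load average is readable in each environment; a `None` result inside Docker would be consistent with the problem described above.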

awhoward commented Jun 5, 2024

Ready to merge as per our discussion, @bjfultn.

bjfultn merged commit 72a21a6 into develop Jun 18, 2024
2 participants