RFC-2735: Add TRES Billing Weights #27

Open
wants to merge 1 commit into
base: main

Swarsel commented Nov 20, 2024

This addresses the issue that GPU and memory usage are not reflected in the billing value currently used to compute the fair-share factor in SLURM.

RFC: https://vbc.atlassian.net/wiki/spaces/ITS/pages/3334635523/RFC-2735+Add+billing+weight+for+TRES+to+SLURM

Swarsel requested review from timeu and ebirn November 20, 2024 13:13
Swarsel (Author) commented Nov 20, 2024

Here is an example output:

[root@clip-control-0 ~]# sbatch -p g --gres=gpu:1 --tasks-per-node=1 --cpus-per-task=8 --time=10 --wrap "nvidia-smi"
Submitted batch job 9213114
[root@clip-control-0 ~]# scontrol show job 9213114
JobId=9213114 JobName=wrap
   UserId=root(0) GroupId=root(0) MCS_label=N/A
   Priority=100553 Nice=0 Account=root QOS=g_short
   JobState=COMPLETED Reason=None Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   RunTime=00:00:00 TimeLimit=00:10:00 TimeMin=N/A
   SubmitTime=2024-11-20T14:27:31 EligibleTime=2024-11-20T14:27:31
   AccrueTime=2024-11-20T14:27:31
   StartTime=2024-11-20T14:27:34 EndTime=2024-11-20T14:27:34 Deadline=N/A
   SuspendTime=None SecsPreSuspend=0 LastSchedEval=2024-11-20T14:27:34 Scheduler=Main
   Partition=g AllocNode:Sid=clip-control-0:23641
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=clip-g1-4
   BatchHost=clip-g1-4
   NumNodes=1 NumCPUs=8 NumTasks=1 CPUs/Task=8 ReqB:S:C:T=0:0:*:*
   ReqTRES=cpu=8,mem=32G,node=1,billing=26,gres/gpu=1
   AllocTRES=cpu=8,mem=32G,node=1,billing=26,gres/gpu=1
   Socks/Node=* NtasksPerN:B:S:C=1:0:*:* CoreSpec=*
   MinCPUsNode=8 MinMemoryCPU=4G MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=(null)
   WorkDir=/root
   StdErr=/root/slurm-9213114.out
   StdIn=/dev/null
   StdOut=/root/slurm-9213114.out
   TresPerNode=gres/gpu:1
   TresPerTask=cpu=8
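
For readers who want to sanity-check the `billing=26` value above, here is a minimal sketch of how a weighted billing value can be derived from an `AllocTRES` string. It assumes Slurm's documented default behaviour of summing each TRES multiplied by its weight (and taking the maximum instead when `PriorityFlags=MAX_TRES` is set, ignoring global TRES such as licenses). The weights in `WEIGHTS` are hypothetical: they were picked only because they reproduce 26 for this job, and the actual weights in this PR's diff are not shown in this conversation. The helper names (`parse_tres`, `mem_to_gib`, `billing`) are illustrative, not part of any Slurm API.

```python
# AllocTRES string copied from the scontrol output above.
ALLOC_TRES = "cpu=8,mem=32G,node=1,billing=26,gres/gpu=1"

# HYPOTHETICAL weights, chosen only so the arithmetic matches billing=26.
# They are not taken from the PR; check the partition's TRESBillingWeights.
WEIGHTS = {"cpu": 1.0, "mem_per_gib": 0.25, "gres/gpu": 10.0}


def parse_tres(tres: str) -> dict[str, str]:
    """Split 'cpu=8,mem=32G,...' into {'cpu': '8', 'mem': '32G', ...}."""
    return dict(item.split("=", 1) for item in tres.split(","))


def mem_to_gib(value: str) -> float:
    """Convert a memory value such as '32G' or '4096M' to GiB.

    Assumption: a bare number without a suffix is treated as megabytes.
    """
    units = {"K": 1 / 1024**2, "M": 1 / 1024, "G": 1.0, "T": 1024.0}
    suffix = value[-1].upper()
    if suffix in units:
        return float(value[:-1]) * units[suffix]
    return float(value) / 1024


def billing(tres: str, weights: dict[str, float], max_tres: bool = False) -> float:
    """Weighted billing for the per-node TRES cpu, mem and gres/gpu."""
    alloc = parse_tres(tres)
    parts = {
        "cpu": float(alloc.get("cpu", 0)) * weights["cpu"],
        "mem": mem_to_gib(alloc.get("mem", "0G")) * weights["mem_per_gib"],
        "gres/gpu": float(alloc.get("gres/gpu", 0)) * weights["gres/gpu"],
    }
    # Default: sum of weighted TRES; with MAX_TRES: the largest single term.
    return max(parts.values()) if max_tres else sum(parts.values())


print(billing(ALLOC_TRES, WEIGHTS))                 # weighted sum
print(billing(ALLOC_TRES, WEIGHTS, max_tres=True))  # MAX_TRES variant
```

Under these assumed weights the job is billed for its CPUs, memory and GPU together, whereas without any `TRESBillingWeights` Slurm bills only the allocated CPUs (8 for this job), which is the gap described in the PR description.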

ebirn (Contributor) commented Nov 20, 2024

tracked as RFC-2735

Swarsel changed the title from "Add TRES Billing Weights" to "RFC-2735: Add TRES Billing Weights" on Nov 20, 2024