WRF detailed setup procedure #689
base: master
Thanks for all these detailed instructions, but azurehpc/apps/* (e.g. wrf) should only contain scripts and code to build, install and run applications (independent of cluster deployment). I think the best location for deploying WRF on a CycleCloud cluster would be under the experimental directory.
Would it be possible to update/add the WRF build and install scripts (including creating the WRF data) in azurehpc/apps/wrf and put the complete deployment of WRF on CycleCloud under the experimental directory?
@@ -0,0 +1,105 @@
# Install and Setup CycleCloud for a Lab environment
Don't examples/cycleserver_msi and examples/cycleserver already deploy the VNET and CycleServer automatically via a simple azurehpc config file?
It seems you are deploying the same thing, but with every step done manually?
Yes, I'm using manual steps. They can be useful in these scenarios:
- where people may not want to install it using azurehpc scripts; or
- for learning purposes, where people want to understand exactly what is installed/required.
I can add a mention of examples/cycleserver_msi and examples/cycleserver as an alternative option.
apps/wrf/readme.md
Summary of this procedure:
- Installs CycleCloud environment from scratch
- Creates NFS storage server using CycleCloud cluster template
Would Azure NetApp Files or a parallel file system (PFS) be better for production?
Yes, indeed. But this procedure is for setting up a lab environment.
I will add a note that this is a lab environment and that ANF or a PFS are options for production.
## Download azurehpc GitHub repository
cd /data
#git clone https://github.com/Azure/azurehpc.git
git clone https://github.com/marcusgaspar/azurehpc.git
Is this URL correct? You are pointing to your fork.
It is temporary; I'm currently using my fork for POCs.
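One way to avoid pointing the readme at a fork is to make the repository URL a variable that defaults to upstream. This is only a sketch: `AZUREHPC_REPO`, `CLONE_DIR` and `clone_azurehpc` are hypothetical names, not part of azurehpc.

```bash
#!/usr/bin/env bash
# Sketch: clone azurehpc from a configurable URL instead of editing the readme.
# AZUREHPC_REPO, CLONE_DIR and clone_azurehpc are hypothetical names.

# Default to the upstream repository; override with a fork URL during POCs.
AZUREHPC_REPO="${AZUREHPC_REPO:-https://github.com/Azure/azurehpc.git}"
CLONE_DIR="${CLONE_DIR:-/data/azurehpc}"

clone_azurehpc() {
  if [ -d "$CLONE_DIR/.git" ]; then
    # Re-running the procedure should not fail on an existing checkout.
    echo "azurehpc already cloned in $CLONE_DIR, skipping"
  else
    git clone "$AZUREHPC_REPO" "$CLONE_DIR"
  fi
}
```

After sourcing the file, `clone_azurehpc` clones upstream, while `AZUREHPC_REPO=https://github.com/marcusgaspar/azurehpc.git clone_azurehpc` would clone the fork for a POC without changing the documented default.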
apps/wrf/readme.md
mkdir ~/test1
cd ~/test1

qsub -l select=1:nodearray=execute1:ncpus=60:mpiprocs=60,place=scatter:excl -v "SKU_TYPE=hbv2,INPUTDIR=/apps/hbv2/wrf-openmpi/WRF-4.1.5/run" /data/azurehpc/apps/wrf/run_wrf_openmpi.pbs
For HBv2, should it be ncpus=120?
These were tests I ran to measure execution time with different configurations. I forgot to add the execution-time results; I will add a chart with them.
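If the intent is a fully populated node, the core count could be derived from the SKU instead of hard-coded (the 60-rank runs may of course be deliberate configuration tests). A minimal sketch, assuming HB120rs_v2 and HB120rs_v3 expose 120 cores and HB120-64rs_v3 exposes 64; `cores_for_sku` and `pbs_select` are hypothetical helpers, not part of the azurehpc scripts.

```bash
#!/usr/bin/env bash
# Sketch: derive ncpus/mpiprocs from the VM size so a run fills every core.
# SKU labels mirror the SKU_TYPE values used in the readme (hbv2, hbv3).

# Cores exposed to the OS for each supported size.
cores_for_sku() {
  case "$1" in
    hbv2 | hb120rs_v2 | hbv3 | hb120rs_v3) echo 120 ;;
    hb120-64rs_v3) echo 64 ;;
    *) echo "unknown SKU: $1" >&2; return 1 ;;
  esac
}

# Build the PBS "-l" resource request for N exclusive nodes of a node array.
pbs_select() {
  local nodes=$1 nodearray=$2 sku=$3 cores
  cores=$(cores_for_sku "$sku") || return 1
  echo "select=${nodes}:nodearray=${nodearray}:ncpus=${cores}:mpiprocs=${cores},place=scatter:excl"
}
```

For example, `qsub -l "$(pbs_select 1 execute1 hbv2)" -v "SKU_TYPE=hbv2,INPUTDIR=/apps/hbv2/wrf-openmpi/WRF-4.1.5/run" /data/azurehpc/apps/wrf/run_wrf_openmpi.pbs` would request all 120 cores of one HBv2 node.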
qsub -l select=1:nodearray=execute1:ncpus=60:mpiprocs=60,place=scatter:excl -v "SKU_TYPE=hbv2,INPUTDIR=/apps/hbv2/wrf-openmpi/WRF-4.1.5/run" /data/azurehpc/apps/wrf/run_wrf_openmpi.pbs
```

- Test 2
Why so many tests? Is the only difference between the tests the number of nodes (select=N)?
These were tests I ran to measure execution time with different configurations. I forgot to add the execution-time results; I will add a chart with them.
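Where the tests differ only in node count, a single loop could replace the repeated blocks. This is a sketch under assumptions: `SUBMIT`, `NODE_COUNTS`, `PPN` and the `scaling_*` directory names are hypothetical; by default it only prints the `qsub` commands so they can be reviewed before submitting.

```bash
#!/usr/bin/env bash
# Sketch: a node-count scaling sweep in one loop instead of one block per test.
# Prints the qsub commands by default; set SUBMIT=1 to actually submit.
# SUBMIT, NODE_COUNTS and PPN are hypothetical knobs for this sketch.
set -euo pipefail

SKU_TYPE="${SKU_TYPE:-hbv2}"
NODEARRAY="${NODEARRAY:-execute1}"
PPN="${PPN:-120}"   # MPI ranks per node (the readme tests used 60 and 64)
INPUTDIR="${INPUTDIR:-/apps/${SKU_TYPE}/wrf-openmpi/WRF-4.1.5/run}"
PBS_SCRIPT="${PBS_SCRIPT:-/data/azurehpc/apps/wrf/run_wrf_openmpi.pbs}"

submit_run() {
  local nodes=$1
  local args=(
    -l "select=${nodes}:nodearray=${NODEARRAY}:ncpus=${PPN}:mpiprocs=${PPN},place=scatter:excl"
    -v "SKU_TYPE=${SKU_TYPE},INPUTDIR=${INPUTDIR}"
    "$PBS_SCRIPT"
  )
  if [ "${SUBMIT:-0}" = 1 ]; then
    # One working directory per run, named after node count and ranks per node.
    local workdir="$HOME/scaling_${nodes}n_${PPN}ppn"
    mkdir -p "$workdir"
    (cd "$workdir" && qsub "${args[@]}")
  else
    # Dry run: show the command (quoting is not preserved in this preview).
    echo "qsub ${args[*]}"
  fi
}

for n in ${NODE_COUNTS:-1 2 3}; do
  submit_run "$n"
done
```

Keeping the sweep in one place also makes it easier to attach the execution-time chart to a specific, reproducible set of runs.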
apps/wrf/readme.md
mkdir ~/test5
cd ~/test5

qsub -l select=3:nodearray=execute1:ncpus=60:mpiprocs=60,place=scatter:excl -v "SKU_TYPE=hbv2,INPUTDIR=/apps/hbv2/wrf-openmpi/WRF-4.1.5/run" /data/azurehpc/apps/wrf/run_wrf_openmpi.pbs
For HB120rs_v3, should it be ncpus=120? There are also references to hbv2.
During my tests I used the hbv2 reference, and the tests ran successfully on both HBv2 and HBv3.
Do you recommend changing to the hbv3 reference when running on HBv3?
If I change it, do I need to run the WRF and WPS build again?
Yes, it would be better to use HBv3 (but it's not absolutely necessary; the latest is now HBv4, and it will keep changing).
apps/wrf/readme.md
mkdir ~/test6
cd ~/test6

qsub -l select=3:nodearray=execute1:ncpus=64:mpiprocs=64,place=scatter:excl -v "SKU_TYPE=hbv2,INPUTDIR=/apps/hbv2/wrf-openmpi/WRF-4.1.5/run" /data/azurehpc/apps/wrf/run_wrf_openmpi.pbs
This is an HB120-64rs_v3 test, but it uses hbv2 references?
Same as above. During my tests I used the hbv2 reference, and the tests ran successfully on both HBv2 and HBv3.
Do you recommend changing to the hbv3 reference when running on HBv3?
If I change it, do I need to run the WRF and WPS build again?
I think it's good practice to build on the SKU you are running on. It's confusing to run on HBv3 but reference hbv2. To simplify the documentation, I would just pick HBv3 (because it's newer than HBv2) and give a few examples running specifically on HBv3. You could then add a note stating that a very similar procedure can also be used to run WRF on HBv2.
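To keep `SKU_TYPE` (and therefore `INPUTDIR`) matched to the VM a job actually lands on, the job could look up its own VM size from the Azure Instance Metadata Service. A sketch under assumptions: the `/apps/hbv3/...` path assumes WRF was rebuilt with `SKU_TYPE=hbv3` using the same layout as the existing hbv2 build, and the helper names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: keep SKU_TYPE (and the WRF build path) matched to the VM the job runs on.
# Assumes an hbv3 build exists under /apps/hbv3, following the hbv2 layout.
# sku_type_for_vmsize, current_vmsize and wrf_rundir are hypothetical helpers.

# Map an Azure VM size, as reported by IMDS, to a SKU_TYPE label.
sku_type_for_vmsize() {
  case "$1" in
    Standard_HB120rs_v2) echo hbv2 ;;
    Standard_HB120rs_v3 | Standard_HB120-*rs_v3) echo hbv3 ;;
    *) echo "unsupported VM size: $1" >&2; return 1 ;;
  esac
}

# Query the Instance Metadata Service for this VM's size (only works inside Azure).
current_vmsize() {
  curl -s -H Metadata:true --noproxy "*" \
    "http://169.254.169.254/metadata/instance/compute/vmSize?api-version=2021-02-01&format=text"
}

# WRF run directory for a given SKU_TYPE, mirroring the readme's INPUTDIR.
wrf_rundir() {
  echo "/apps/$1/wrf-openmpi/WRF-4.1.5/run"
}
```

IMDS reports the size of the VM it is queried from, so something like `SKU_TYPE=$(sku_type_for_vmsize "$(current_vmsize)") && INPUTDIR=$(wrf_rundir "$SKU_TYPE")` would belong inside the job script on a compute node, not on the scheduler node.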
In this Pull Request I'm detailing all the setup procedures to run and test WRF v4 using CycleCloud.
The original setup procedure was not clear enough and some steps were missing. It took me a long time to figure out the missing steps and make it work.
I'm sharing this back with the community, as I believe it will be useful for everybody who wants to run a WRF v4 test on Azure using CycleCloud.