WRF detailed setup procedure #689

Open: wants to merge 77 commits into base: master

marcusgaspar

In this pull request I detail the full setup procedure to run and test WRF v4 using CycleCloud.
The original setup procedure was not clear enough and some steps were missing. It took me a long time to figure out the missing steps and make it work.
I'm sharing this back with the community, as I believe it will be useful for anyone who wants to run a WRF v4 test on Azure using CycleCloud.

@xpillons xpillons requested a review from garvct October 27, 2022 16:28
Collaborator

@garvct garvct left a comment

Thanks for all these detailed instructions, but azurehpc/apps/* (e.g. wrf) should only contain scripts and code to build, install, and run applications (independent of cluster deployment). I think the best location for deploying WRF on a CycleCloud cluster would be under the experimental directory.
Would it be possible to update/add the WRF build and install scripts (including creating the WRF data) in azurehpc/wrf and put the complete deployment of WRF on CycleCloud under the experimental directory?

@@ -0,0 +1,105 @@
# Install and Setup CycleCloud for a Lab environment
Collaborator

Don't examples/cycleserver_msi and examples/cycleserver already deploy the VNET and cycleserver automatically via a simple azurehpc config file?
It seems you are deploying the same thing, but with all the steps done manually.

Author

Yes, I'm using manual steps. These manual steps can be useful in two scenarios:

  • where people may not want to install it using the azurehpc scripts; or
  • for learning purposes, where people want to understand exactly what is installed/required.

I can add a mention about examples/cycleserver_msi and examples/cycleserver as an alternative option.


Summary of this procedure:
- Installs CycleCloud environment from scratch
- Creates NFS storage server using CycleCloud cluster template
Collaborator

@garvct garvct Oct 31, 2022

Would Azure NetApp Files or a PFS be better for production?

Author

@marcusgaspar marcusgaspar Nov 1, 2022

Yes, indeed, but this is a procedure to set up a lab environment.
I will add comments noting that this is a lab environment and that ANF or a PFS are options for production.

## Download azurehpc GitHub repository
cd /data
#git clone https://github.com/Azure/azurehpc.git
git clone https://github.com/marcusgaspar/azurehpc.git
Collaborator

Is this URL correct? (You are pointing to your fork.)

Author

It is temporary, as I'm currently using my fork during POCs.

mkdir ~/test1
cd ~/test1

qsub -l select=1:nodearray=execute1:ncpus=60:mpiprocs=60,place=scatter:excl -v "SKU_TYPE=hbv2,INPUTDIR=/apps/hbv2/wrf-openmpi/WRF-4.1.5/run" /data/azurehpc/apps/wrf/run_wrf_openmpi.pbs
Collaborator

For HBv2, should this be ncpus=120?

Author

These were tests I ran to measure the execution time with different configurations. I forgot to add the execution time results. I will add a chart with them.
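
For reference, the core count that `ncpus` and `mpiprocs` would need in order to fill a node can be read off the HB-series VM size name: HB120rs_v2 exposes 120 cores, while a constrained-core size such as HB120-64rs_v3 exposes 64. A minimal sketch, assuming that naming pattern holds; `cores_for_vm_size` is a hypothetical helper and is not part of azurehpc:

```shell
# Hypothetical helper (not part of azurehpc): derive the per-node core count
# from an Azure HB-series VM size name.
#   HB120rs_v2     -> 120  (full-size VM)
#   HB120-64rs_v3  -> 64   (constrained-core VM)
cores_for_vm_size() {
    size="${1#Standard_}"              # accept names with or without the prefix
    case "$size" in
        HB*-*rs_v*) cores="${size#HB*-}"; cores="${cores%%rs_v*}" ;;
        HB*rs_v*)   cores="${size#HB}";   cores="${cores%%rs_v*}" ;;
        *) echo "unrecognized VM size: $1" >&2; return 1 ;;
    esac
    # Reject anything that did not reduce to a plain number.
    case "$cores" in
        ''|*[!0-9]*) echo "unrecognized VM size: $1" >&2; return 1 ;;
    esac
    echo "$cores"
}

cores_for_vm_size HB120rs_v2
cores_for_vm_size Standard_HB120-64rs_v3
```

Running with fewer ranks than cores (such as the `ncpus=60` used here) remains a valid configuration to measure; the helper only reports the upper bound per node.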

qsub -l select=1:nodearray=execute1:ncpus=60:mpiprocs=60,place=scatter:excl -v "SKU_TYPE=hbv2,INPUTDIR=/apps/hbv2/wrf-openmpi/WRF-4.1.5/run" /data/azurehpc/apps/wrf/run_wrf_openmpi.pbs
```

- Test 2
Collaborator

Why so many tests? Is the only difference between the tests the number of nodes (select=N)?

Author

These were tests I ran to measure the execution time with different configurations. I forgot to add the execution time results. I will add a chart with them.
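
If the tests differ only in `select=N`, the repeated blocks could be collapsed into a single loop. The sketch below only prints the commands instead of submitting them. The PBS script path, `INPUTDIR`, and the `ncpus=60` value are copied from the commands in this PR; the node counts `1 2 3`, the sequential test directory numbering, and the helper name `wrf_sweep_cmd` are assumptions made for illustration:

```shell
# Paths and per-node rank count copied from the qsub lines in this PR.
PBS_SCRIPT=/data/azurehpc/apps/wrf/run_wrf_openmpi.pbs
INPUTDIR=/apps/hbv2/wrf-openmpi/WRF-4.1.5/run
PPN=60

# Hypothetical helper: print the qsub command for a given node count.
wrf_sweep_cmd() {
    nodes="$1"
    printf 'qsub -l select=%s:nodearray=execute1:ncpus=%s:mpiprocs=%s,place=scatter:excl -v "SKU_TYPE=hbv2,INPUTDIR=%s" %s\n' \
        "$nodes" "$PPN" "$PPN" "$INPUTDIR" "$PBS_SCRIPT"
}

# Assumed node counts; each test gets its own working directory.
test_id=1
for nodes in 1 2 3; do
    echo "mkdir -p ~/test${test_id} && cd ~/test${test_id}"
    wrf_sweep_cmd "$nodes"
    test_id=$((test_id + 1))
done
```

Printing rather than executing keeps the sketch safe to run outside a PBS cluster; piping the output to `sh` on the scheduler node would submit the jobs.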

mkdir ~/test5
cd ~/test5

qsub -l select=3:nodearray=execute1:ncpus=60:mpiprocs=60,place=scatter:excl -v "SKU_TYPE=hbv2,INPUTDIR=/apps/hbv2/wrf-openmpi/WRF-4.1.5/run" /data/azurehpc/apps/wrf/run_wrf_openmpi.pbs
Collaborator

For HB120rs_v3, should this be ncpus=120? There are also references to hbv2.

Author

During my tests, I used the hbv2 reference and was able to run the tests successfully on both HBv2 and HBv3.
Do you recommend changing to the hbv3 reference when running on HBv3?
If I change it, do I need to run the WRF and WPS builds again?

Collaborator

Yes, it would be better to use HBv3 (though it is not absolutely necessary; the latest is now HBv4, and it will keep changing).

mkdir ~/test6
cd ~/test6

qsub -l select=3:nodearray=execute1:ncpus=64:mpiprocs=64,place=scatter:excl -v "SKU_TYPE=hbv2,INPUTDIR=/apps/hbv2/wrf-openmpi/WRF-4.1.5/run" /data/azurehpc/apps/wrf/run_wrf_openmpi.pbs
Collaborator

HB120-64rs_v3 test, but hbv2 references?

Author

Same as above. During my tests, I used the hbv2 reference and was able to run the tests successfully on both HBv2 and HBv3.
Do you recommend changing to the hbv3 reference when running on HBv3?
If I change it, do I need to run the WRF and WPS builds again?

Collaborator

I think it's good practice to build on the SKU you are running on. It's confusing to run on HBv3 but reference hbv2. To simplify the documentation, I would just pick HBv3 (because it's newer than HBv2) and give a few examples running specifically on HBv3. You could then add a note stating that a very similar procedure can also be used to run WRF on HBv2.
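
As an illustration of this suggestion, the sketch below prints what HBv3-specific submissions might look like. It assumes that WRF has been rebuilt for HBv3 and installed under `/apps/hbv3/...` (a path that simply mirrors the hbv2 layout used in this PR) and that `run_wrf_openmpi.pbs` accepts `SKU_TYPE=hbv3`; neither is confirmed here, and `wrf_qsub_cmd_for_sku` is a hypothetical helper name:

```shell
# Hypothetical helper (not part of azurehpc): print a qsub command for a SKU.
# ASSUMPTION: WRF is built per SKU under /apps/<sku>/wrf-openmpi/WRF-4.1.5,
# mirroring the hbv2 path that appears in the PR.
wrf_qsub_cmd_for_sku() {
    sku="$1"; nodes="$2"; ppn="$3"
    inputdir="/apps/${sku}/wrf-openmpi/WRF-4.1.5/run"
    printf 'qsub -l select=%s:nodearray=execute1:ncpus=%s:mpiprocs=%s,place=scatter:excl -v "SKU_TYPE=%s,INPUTDIR=%s" %s\n' \
        "$nodes" "$ppn" "$ppn" "$sku" "$inputdir" \
        /data/azurehpc/apps/wrf/run_wrf_openmpi.pbs
}

wrf_qsub_cmd_for_sku hbv3 3 120   # HB120rs_v3: all 120 cores per node
wrf_qsub_cmd_for_sku hbv3 3 64    # HB120-64rs_v3: 64 cores per node
```

Keeping the SKU as a parameter means the note about HBv2 reduces to calling the same helper with `hbv2`.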
