
Potential parallel bug with solute transport in v1.5.1? #285

Open
smolins opened this issue Dec 16, 2024 · 19 comments

@smolins
Contributor

smolins commented Dec 16, 2024

I identified two issues with solute transport in ATS v1.5.1. These issues have been affecting reactive transport, but I was able to narrow them down to potential issues in transport itself. I cannot reproduce them in a 1D test simulation, and it is still unclear to me why they appear only in higher-dimensional simulations. It is also unclear whether the two issues are related.

The attached file is a transport-only version of the demo under ats-demos/13_integrated_hydro_reactive_transport/hillslope_calcite_crunch_sigmoid.xml, which is described in Molins et al. (2022, WRR). Here it is modified to include a single tracer, with an initial concentration of 1 in the domain and 0 in the rain water (hillslope_transport_sigmoid_100s.xml).

hillslope_transport_sigmoid_100s.txt

The two issues are:

  1. Parallel simulations result in concentration hotspots that repeat as many times as the number of processors used. The plot for time = 1 day is shown for a 4-processor run (parallel.png).

[Figure: parallel.png]

  2. As the water table drops in the uphill portion of the domain from its initial position due to drainage at the bottom, the tracer concentrations increase above 1, which cannot be. At the same time, tracer concentrations in the downhill portion of the domain decrease below zero as that area's liquid saturation increases so that the initial water can exit through the surface. This is apparent as early as time = 3 days (concentration.png; note: this was obtained in a serial run to avoid mixing in the parallel issues) but gets worse as time advances.

[Figure: concentration.png]

There is another issue with concentrations that appears at time = 1 day near the left boundary; it is buried by issue 2 by time = 3 days. The position of the cell with the anomalous concentration is suspiciously close to the position of the "hot" cells in the parallel runs.
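For reference, a minimal post-processing sketch for checking the over/undershoots described in issue 2. The file name, dataset layout, and field name (ats_vis_data.h5, one group per field with one dataset per dumped cycle, total_component_concentration.tracer.cell.0) are assumptions about this particular run and may need to be adjusted to whatever the visualization list in the input file actually writes:

```python
# Sketch: scan an ATS visualization HDF5 file and report the global min/max
# of the tracer at every dumped cycle; values outside [0, 1] are flagged.
# File and dataset names below are assumptions for this particular setup.
import h5py
import numpy as np

VIS_FILE = "ats_vis_data.h5"                            # assumed output file name
FIELD = "total_component_concentration.tracer.cell.0"   # assumed dataset group name

with h5py.File(VIS_FILE, "r") as f:
    group = f[FIELD]
    for cycle in sorted(group.keys(), key=int):         # one dataset per vis cycle
        data = np.asarray(group[cycle]).ravel()
        cmin, cmax = data.min(), data.max()
        flag = "  <-- outside [0, 1]" if (cmin < 0.0 or cmax > 1.0) else ""
        print(f"cycle {cycle:>6}: min={cmin:.6e} max={cmax:.6e}{flag}")
```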

@smolins
Contributor Author

smolins commented Dec 17, 2024

I am adding a figure for the last point I raised above. It seems related to the issue raised in point 1, but parallel runs may make it more apparent.

[Figure: concentration1day]

@dasvyat
Contributor

dasvyat commented Dec 17, 2024

One source of this bug is subcycling. With simple weak coupling there are no overshoots (at least on 1 core). I'm looking into this issue.

@dasvyat dasvyat changed the title Potential bugs with solute transport in v1.5.1? Potential parallel bug with solute transport in v1.5.1? Dec 19, 2024
@smolins
Contributor Author

smolins commented Jan 15, 2025

I looked a little more into this issue and have not been able to link it clearly to anything. However, I do see that the initial velocities show the same striped pattern, from some sort of parallel issue. The pattern disappears right away in the Darcy velocities but then, little by little, appears in the concentrations.

[Figure: velocity_time_zero]

Could either of you, @levuvietphong or @dasvyat, comment on whether there could be an issue connecting the initial velocities with the concentration hotspots that show up early in the transport?

This issue is not a big deal for transport, as these differences are small and are soon overwhelmed by larger concentration differences (e.g., when a front arrives), but when coupled to geochemistry it leads to problems. For example, it may change mineral or sorbed concentrations, which then remain altered for the rest of the simulation.

@dasvyat
Contributor

dasvyat commented Jan 16, 2025

@smolins, I looked into this issue and, to my surprise, it is not a transport issue, in my opinion. The discrepancies come directly from flow. I don't know the reason at the moment. I increased the nonlinear tolerance in the hope that it would help, but it didn't. I'm looking into this issue.

@ecoon
Collaborator

ecoon commented Jan 17, 2025

Is it in velocities or in fluxes? I would be very surprised if it was in fluxes, but less surprised if it was in velocities/reconstruction, since we rarely look at those. I know that most of transport uses fluxes, but are you using a velocity-dependent dispersion?

@smolins
Contributor Author

smolins commented Jan 17, 2025

These simulations do not include dispersion (i.e., dispersion is not in the input file), and diffusion uses default values (0.0, I assume).

@ecoon
Collaborator

ecoon commented Jan 17, 2025

Ok, then I'm not sure if transport uses velocity anywhere else.

I'm not sure of the magnitudes here -- is it possible that this is due to block preconditioners? Does this go away if you use e.g. Boomer AMG?

@dasvyat
Contributor

dasvyat commented Jan 17, 2025

I withdraw my previous comment. My comparison was wrong.

@levuvietphong
Contributor

Danil's PR (#290) has fixed the stripe pattern in the velocity.2 field at t=0. However, when I ran the hillslope_calcite_crunch_sigmoid.xml example again, I got a convergence failure at a specific cell (see log below). Do you have the same issue?

[Image]

Alquimia_PK:domain |  no convergence in cell: 1427
reactive transport |  Alquimia_PK:domain failed.
surface transport  |  ----------------------------------------------------------------
surface transport  |  Advancing: t0 = 252751 t1 = 252841 h = 89.5266
surface transport  |  ----------------------------------------------------------------
surface transport  |  1 sub-cycles, dt_stable=1.49211 min [sec]  dt_MPC=1.49211 min [sec]
subsurface transpo |  ----------------------------------------------------------------
subsurface transpo |  Advancing: t0 = 252751 t1 = 252841 h = 89.5266
subsurface transpo |  ----------------------------------------------------------------
inverse::PCG       |  Converged (relative RHS), itr=1 ||r||=5.6449e-19 ||f||=90.9331
inverse::PCG       |  Converged (relative RHS), itr=1 ||r||=2.09579e-14 ||f||=27949.6
inverse::PCG       |  Converged (relative RHS), itr=1 ||r||=1.16336e-14 ||f||=829934
inverse::PCG       |  Converged (relative RHS), itr=1 ||r||=9.95036e-15 ||f||=1.05614e+07
inverse::PCG       |  Converged (relative RHS), itr=1 ||r||=9.80417e-22 ||f||=1.05614
subsurface transpo |   dispersion solver ||r||=8.50849e-15 itrs=1
subsurface transpo |  1 sub-cycles, dt_stable=7.29735 min [sec]  dt_MPC=1.49211 min [sec]
Alquimia_PK:surfac |  min/avg/max Newton: 0/0/1, the maximum is in cell 98
No convergence at:            1           1           1

@smolins
Contributor Author

smolins commented Jan 22, 2025 via email

@smolins
Contributor Author

smolins commented Jan 22, 2025 via email

@levuvietphong
Contributor

I ran the hillslope_transport_sigmoid_100s.txt example again:

  • The vertical stripes are still present. I probably made a mistake in my previous post and visualized results from a serial run instead of a parallel run.
  • The stripes are in both the darcy_velocity.0 and darcy_velocity.2 fields.
  • The number of vertical stripes corresponds to N-1, where N is the number of processors used in the parallel run. So I guess @smolins used 8 processors in the plot he showed, leading to 7 vertical stripes. It looks to me like the stripes are at the halo zones of the subdomains (see the sketch after this list).
  • The stripes go away immediately in the next time step. Maybe it is a parallel problem in the initialization step only.
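A rough way to test the halo-zone hypothesis is to bin one of the striped fields by cell-centroid x and flag columns whose mean deviates strongly from the rest; if the stripes sit at subdomain halos, roughly N-1 flagged columns are expected for N ranks. This is only a sketch; it assumes the cell centroids and darcy_velocity.2 values have already been loaded into flat NumPy arrays (e.g. from the vis file) and that column-wise binning makes sense for this mesh:

```python
# Sketch: locate "stripe" columns in a cell field by binning over x and
# flagging columns that deviate strongly from the median column value.
import numpy as np

def find_stripe_columns(x_centroid, vz, nbins=100, threshold=5.0):
    """x_centroid, vz: flat arrays over cells (assumed already loaded)."""
    edges = np.linspace(x_centroid.min(), x_centroid.max(), nbins + 1)
    idx = np.clip(np.digitize(x_centroid, edges) - 1, 0, nbins - 1)
    col_mean = np.array([vz[idx == i].mean() if np.any(idx == i) else np.nan
                         for i in range(nbins)])
    # robust deviation of each column from the median column value
    dev = np.abs(col_mean - np.nanmedian(col_mean))
    scale = np.nanmedian(dev) + 1e-30
    flagged = np.where(dev / scale > threshold)[0]
    return edges, flagged

# Hypothetical usage with arrays x, vz taken from a 4-rank run (expect ~3 columns):
# edges, flagged = find_stripe_columns(x, vz)
# print(len(flagged), "suspicious columns near x =",
#       0.5 * (edges[flagged] + edges[flagged + 1]))
```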

@dasvyat
Contributor

dasvyat commented Jan 23, 2025

To investigate the parallel issue, I suggest switching to weak coupling instead of subcycling to simplify the model. Moreover, something happened between 1.5.1 and the current master: when I added parallel communication to 1.5.1 there were no stripes, but after I merged it with master they appeared. @levuvietphong, can you confirm that using amanzi-1.5.1 and dsv/test_ats-1.5.1 you don't see stripes either?

hillslope_transport_sigmoid_100s_weak.txt

@levuvietphong
Contributor

levuvietphong commented Jan 23, 2025

@dasvyat: No, using amanzi-1.5.1 and dsv/test_ats-1.5.1 I still see the stripes. This run used 4 cores.

[Image: darcy_velocity.2, 4-core run]

@dasvyat
Contributor

dasvyat commented Jan 23, 2025

What are you plotting? And what is the range?

@levuvietphong
Contributor

@dasvyat: I updated the plot. It is the darcy_velocity.2

@dasvyat
Contributor

dasvyat commented Jan 23, 2025

@levuvietphong What about the tracer concentration? Darcy velocity is not the best indicator; it is a post-processed quantity, computed from the face unknowns for visualization. After how many days are you plotting it?

@dasvyat
Contributor

dasvyat commented Jan 24, 2025

Currently, the Darcy velocity is computed at the initialization stage without a parallel update; that's why stripes are observed at t=0. It can (and should) be corrected, but it is not a big deal, since during the actual AdvanceStep this parallel update is performed and the concentration (and all other fields) are updated correctly. The pattern should not be observed in the concentration.
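To illustrate the mechanism (this is just a toy mpi4py sketch, not ATS code): each rank owns a block of cells plus one ghost cell on each side, and any rank-local quantity evaluated before the ghost exchange (the "initialization" case) is stale at the subdomain edges, producing exactly the kind of stripes seen at t=0, while after the exchange (the "AdvanceStep" case) it is consistent:

```python
# Toy mpi4py illustration (not ATS code): a derived quantity computed with
# stale ghost values jumps at every subdomain boundary; after the ghost
# exchange the same computation is smooth. Run with: mpiexec -n 4 python demo.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

n_owned = 10                                    # owned cells per rank
u = np.zeros(n_owned + 2)                       # [left ghost | owned | right ghost]
u[1:-1] = rank * n_owned + np.arange(n_owned)   # a globally smooth field
if rank == 0:
    u[0] = u[1]                                 # physical boundary: no neighbor
if rank == size - 1:
    u[-1] = u[-2]

def exchange_ghosts(u):
    left = rank - 1 if rank > 0 else MPI.PROC_NULL
    right = rank + 1 if rank < size - 1 else MPI.PROC_NULL
    # send owned edge values to neighbors, receive their edge values into ghosts
    comm.Sendrecv(u[1:2], dest=left, recvbuf=u[-1:], source=right)
    comm.Sendrecv(u[-2:-1], dest=right, recvbuf=u[0:1], source=left)

# "Initialization": derived quantity (a simple difference) with stale ghosts
grad_stale = np.diff(u)

exchange_ghosts(u)

# "AdvanceStep": the same computation after the ghost update
grad_fresh = np.diff(u)

print(f"rank {rank}: max |diff| with stale ghosts = {np.abs(grad_stale).max():.1f}, "
      f"after exchange = {np.abs(grad_fresh).max():.1f}")
```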

@levuvietphong
Contributor

The tracer concentration is consistently 1.0 across the domain and over time. Below is the total_component_quantity.tracer field.

[Image: total_component_quantity.tracer]
