PR #2995 might produce non-deterministic results for edrt simulations #3010

Closed
steffenaxer opened this issue Dec 21, 2023 · 21 comments

@steffenaxer
Collaborator

steffenaxer commented Dec 21, 2023

@michalmac with PR #2995, one of my edrt tests fails with non-deterministic results. The root cause is pretty hard to find, but I can say with 100% certainty that the same simulation behaves as expected with PR #3003.

In my test I am counting passenger pickup events. Since PR #2995, the vehicles show slightly different discharging curves from run to run. That eventually leads to different vehicle availabilities, and thus to slightly different passenger pickup events. May I kindly ask you to check this behavior? I will try to create a test. However, any assistance in the meantime would be wonderful. Maybe @sebhoerl has an idea regarding this issue?

@michalmac
Member

@steffenaxer #2995 obviously results in a different behaviour, so that is fine. But could you confirm: is the behaviour always reproducible (i.e. deterministic) or not?

If it is non-deterministic, then it would be helpful if you could provide some samples of events of two or more different runs (for the same matsim version). Thanks!

@steffenaxer
Collaborator Author

Yes, I can confirm that the simulation produces different, non-deterministic results. I will provide the requested files tomorrow!

@steffenaxer
Collaborator Author

steffenaxer commented Dec 22, 2023

Unfortunately, debugging is quite painful. I tried to strip down the code and variants as far as possible and found that the number of pickUpEvents is non-deterministic if we have two drt modes. These two modes are the only available modes. Agents are assigned to modes (no mode choice).

Both drt service specs are exactly the same. If I remove the second drt mode, everything is deterministic again. However, I also have a test without eDrt in which the remaining setup is identical. This non-eDrt test produces deterministic results after 10 iterations.
So the problem must be somehow related to the following aspects:

  1. There are two drt modes
  2. eDrt is used
  3. In the first iteration (iteration 0) the results are identical; the simulation deviates in later iterations.

@steffenaxer
Collaborator Author

steffenaxer commented Dec 22, 2023

Please find attached three simulations, executed with PR #2991 (which was merged after #2995):
output_events_1.xml.gz (pickUpEvents=2747)
output_events_2.xml.gz (pickUpEvents=2754)
output_events_3.xml.gz (pickUpEvents=2755)

@steffenaxer
Collaborator Author

Please find attached three simulations, executed with PR #3003:
output_events_1.xml.gz (pickUpEvents=2716)
output_events_2.xml.gz (pickUpEvents=2716)
output_events_3.xml.gz (pickUpEvents=2716)

With PR #3003 the simulation deterministically produces the very same number of pickUpEvents.
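
For readers who want to reproduce such counts from the attached files, here is a minimal sketch that counts events per `type` attribute in a MATSim events file. The function names and the tiny in-memory sample are illustrative and not part of the thread. The event type string for pickups is assumed to be `passenger picked up`; check that against the actual events file before relying on it.

```python
import gzip
import io
import xml.etree.ElementTree as ET
from collections import Counter


def count_event_types(stream):
    """Count events per `type` attribute while streaming a binary XML source."""
    counts = Counter()
    for _, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "event":
            counts[elem.get("type")] += 1
            elem.clear()  # drop the attributes of events already processed
    return counts


def count_event_types_in_file(path):
    """Same as above for a file on disk; .gz files are decompressed on the fly."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rb") as stream:
        return count_event_types(stream)


# Tiny in-memory stand-in for an events file (illustrative, not data from the runs).
SAMPLE = b"""<events version="1.0">
<event time="10.0" type="actend" person="p1" link="1" actType="home" />
<event time="10.0" type="departure" person="p1" link="1" legMode="walk" />
<event time="12.0" type="actend" person="p2" link="1" actType="home" />
</events>"""

sample_counts = count_event_types(io.BytesIO(SAMPLE))
print(sample_counts["actend"], sample_counts["departure"])
```

For a real run, something like `count_event_types_in_file("output_events_1.xml.gz")["passenger picked up"]` should give the pickup count, under the assumption about the type string stated above.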

@michalmac
Member

Have you tried running with #2995 but without #2991?

Looking at the events, they start to differ from the very beginning of the day. For instance, this is the first change between output events 1 and 3 (attached to #3010 (comment)):

	<event time="18010.0" type="actend" person="COMPANION_drt2_drt2652_d5cb375c-6afb-4fae-98a6-17d42f36f16f" link="10_4" actType="home"  />
	<event time="18010.0" type="departure" person="COMPANION_drt2_drt2652_d5cb375c-6afb-4fae-98a6-17d42f36f16f" link="10_4" legMode="walk" computationalRoutingMode="drt2"  />
	<event time="18010.0" type="actend" person="COMPANION_drt2_drt2652_8ed5dc20-7586-4954-8603-7591af0a22ec" link="10_4" actType="home"  />
	<event time="18010.0" type="departure" person="COMPANION_drt2_drt2652_8ed5dc20-7586-4954-8603-7591af0a22ec" link="10_4" legMode="walk" computationalRoutingMode="drt2"  />
	<event time="18010.0" type="actend" person="COMPANION_drt2_drt2652_2694435f-4624-4218-b9a8-4b5ed3f1ac39" link="10_4" actType="home"  />
	<event time="18010.0" type="departure" person="COMPANION_drt2_drt2652_2694435f-4624-4218-b9a8-4b5ed3f1ac39" link="10_4" legMode="walk" computationalRoutingMode="drt2"  />
	<event time="18010.0" type="actend" person="COMPANION_drt2_drt2652_0a99091c-bdbf-4ada-957a-0494a6164cc5" link="10_4" actType="home"  />
	<event time="18010.0" type="departure" person="COMPANION_drt2_drt2652_0a99091c-bdbf-4ada-957a-0494a6164cc5" link="10_4" legMode="walk" computationalRoutingMode="drt2"  />

vs

	<event time="18010.0" type="actend" person="COMPANION_drt2_drt2652_c082274f-6ee1-4469-affd-b3fad23d089f" link="10_4" actType="home"  />
	<event time="18010.0" type="departure" person="COMPANION_drt2_drt2652_c082274f-6ee1-4469-affd-b3fad23d089f" link="10_4" legMode="walk" computationalRoutingMode="drt2"  />
	<event time="18010.0" type="actend" person="COMPANION_drt2_drt2652_a5ba5671-1f30-42e2-9123-053f2b8ac8a8" link="10_4" actType="home"  />
	<event time="18010.0" type="departure" person="COMPANION_drt2_drt2652_a5ba5671-1f30-42e2-9123-053f2b8ac8a8" link="10_4" legMode="walk" computationalRoutingMode="drt2"  />
	<event time="18010.0" type="actend" person="COMPANION_drt2_drt2652_5a49e668-e8ea-41d3-8c05-85d0025aef18" link="10_4" actType="home"  />
	<event time="18010.0" type="departure" person="COMPANION_drt2_drt2652_5a49e668-e8ea-41d3-8c05-85d0025aef18" link="10_4" legMode="walk" computationalRoutingMode="drt2"  />
	<event time="18010.0" type="actend" person="COMPANION_drt2_drt2652_50ab4f40-7344-453b-903b-9943afe6deef" link="10_4" actType="home"  />
	<event time="18010.0" type="departure" person="COMPANION_drt2_drt2652_50ab4f40-7344-453b-903b-9943afe6deef" link="10_4" legMode="walk" computationalRoutingMode="drt2"  />

@michalmac
Member

These changes are only in the person IDs; they may or may not have an impact.
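
To look past ID-only differences like these, one option is to mask the UUID part of the person IDs before comparing two event logs line by line. The following is a minimal sketch; `mask_uuids` and `first_difference` are hypothetical helper names, and it only handles differing IDs, not events that are merely reordered within the same time step.

```python
import re
from itertools import zip_longest

# Matches the lowercase hexadecimal UUIDs seen in the companion person ids.
UUID = re.compile(r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}")


def mask_uuids(line):
    """Replace every UUID with a fixed placeholder."""
    return UUID.sub("<uuid>", line)


def first_difference(lines_a, lines_b, normalise=mask_uuids):
    """Return (line number, line_a, line_b) of the first mismatch, or None."""
    for number, (a, b) in enumerate(zip_longest(lines_a, lines_b), start=1):
        if a is None or b is None:
            return number, a, b  # one log is longer than the other
        if normalise(a.strip()) != normalise(b.strip()):
            return number, a, b
    return None


# The first differing event of the two runs quoted above.
run_1 = ['<event time="18010.0" type="actend" person="COMPANION_drt2_drt2652_d5cb375c-6afb-4fae-98a6-17d42f36f16f" link="10_4" actType="home"  />']
run_3 = ['<event time="18010.0" type="actend" person="COMPANION_drt2_drt2652_c082274f-6ee1-4469-affd-b3fad23d089f" link="10_4" actType="home"  />']

print(first_difference(run_1, run_3))                          # UUIDs masked
print(first_difference(run_1, run_3, normalise=lambda s: s))   # raw comparison
```

With masking enabled the two quoted events compare as equal, so the search continues to the next difference that is not just a changed UUID.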

@michalmac
Member

> Have you tried running with #2995 but without #2991?

I meant that maybe the non-determinism was introduced with #2991.

@steffenaxer
Collaborator Author

No, even with #2995 the problem exists.

@steffenaxer
Collaborator Author

Executed with #2995 and without companions (so no UUIDs):
output_events_1.xml.gz
output_events_2.xml.gz
output_events_3.xml.gz

@steffenaxer
Collaborator Author

I might have found a custom module that could cause this non-deterministic behavior. Maybe it is a side effect of #2995. I'll keep you posted.

@michalmac
Member

> Executed with #2995 and without companions (so no UUIDs): output_events_1.xml.gz, output_events_2.xml.gz, output_events_3.xml.gz

Thanks. This helped a lot. Comparing 1 and 2, the first meaningful difference that I spotted is:

	<event time="31752.0" type="DrtRequest submitted" mode="drt2" request="drt2_1576" person="drt24606" fromLink="42_1" toLink="4_3" unsharedRideTime="935.127999999997" unsharedRideDistance="18400.727921466456" earliestDepartureTime="31752.0" latestPickupTime="32352.0" latestDropoffTime="33661.1792"  />
	<event time="31753.0" type="PassengerRequest scheduled" mode="drt2" request="drt2_1576" person="drt24606" vehicle="drt2_32_3_3" pickupTime="31923.0" dropoffTime="33064.291"  />

vs

	<event time="31752.0" type="DrtRequest submitted" mode="drt2" request="drt2_1576" person="drt24606" fromLink="42_1" toLink="4_3" unsharedRideTime="934.8889999999969" unsharedRideDistance="18400.727921466456" earliestDepartureTime="31752.0" latestPickupTime="32352.0" latestDropoffTime="33660.8446"  />
	<event time="31753.0" type="PassengerRequest scheduled" mode="drt2" request="drt2_1576" person="drt24606" vehicle="drt2_32_3_3" pickupTime="31923.0" dropoffTime="33064.051999999996"  />

This would mean that unsharedRideTime is not the same, which may lead to different scheduling decisions downstream. So I would suggest finding the cause of this discrepancy.
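
As a quick cross-check of how the difference propagates, the logged values are consistent with a DRT travel-time constraint of the form `latestDropoffTime = earliestDepartureTime + alpha * unsharedRideTime + beta`. The values `alpha = 1.4` and `beta = 600` s used below are inferred from the numbers in the two events, not taken from the scenario config, so treat them as assumptions.

```python
import math

ALPHA = 1.4   # assumption: inferred from the logged values, not from the config
BETA = 600.0  # seconds; likewise inferred


def latest_dropoff(earliest_departure, unshared_ride_time, alpha=ALPHA, beta=BETA):
    """Latest drop-off time implied by the alpha/beta travel-time constraint."""
    return earliest_departure + alpha * unshared_ride_time + beta


# Values copied from the two "DrtRequest submitted" events for request drt2_1576.
dropoff_run_1 = latest_dropoff(31752.0, 935.127999999997)
dropoff_run_2 = latest_dropoff(31752.0, 934.8889999999969)

print(dropoff_run_1, dropoff_run_2)

# Both runs reproduce the logged latestDropoffTime under these assumptions.
assert math.isclose(dropoff_run_1, 33661.1792, abs_tol=1e-6)
assert math.isclose(dropoff_run_2, 33660.8446, abs_tol=1e-6)
```

So the 0.239 s gap in `unsharedRideTime` alone explains the 0.3346 s gap in `latestDropoffTime`, which supports looking for the source of the differing ride-time estimate.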

@steffenaxer
Collaborator Author

Thanks, good catch. I switched off all custom modules, but the error remains.

@steffenaxer
Collaborator Author

steffenaxer commented Dec 22, 2023

Well, the problem is that you are looking at the 10th iteration, where travel times could already deviate. So in my eyes this deviation in travel time is not the root cause but another symptom.

@steffenaxer
Collaborator Author

steffenaxer commented Dec 22, 2023

If I replace the DriveDischargingHandler from #2995 with the one from #3003, all results are identical. So it must be related to the DriveDischargingHandler.

@michalmac
Member

> Well the problem is, you are watching on the 10th iteration. Travel times could deviate. So this deviation in travel time is in my eyes not the root cause, it is also a symptom.

I thought it was iteration 1, so the first iteration with deviating events (as you mentioned events in iteration 0 were identical).

BTW, do you run a single-threaded qsim?

@steffenaxer
Collaborator Author

I'm running a multithreaded qsim, and I have almost finished a test branch with a multiEdrt mielec scenario. I hope I can reproduce the problem. But I have the feeling it is also a matter of scenario size. In my test I'm normally simulating 9000 drt trips with 200 vehicles. I don't know if mielec is big enough.

@steffenaxer
Collaborator Author

Finally, please check out this branch https://github.com/steffenaxer/matsim-libs/tree/edrtDeterminism and run RunEDrtScenarioIT.testMultiModeDrtDeterminism(), which fails regularly. Hopefully @michalmac can debug it. It seems that the problem is not related to a multithreaded qsim; it even produces non-deterministic results with the default settings, which use one thread.

@steffenaxer
Collaborator Author

With 9 runs the simulation produces the following sequence of counted pickUpEvents:
1928,1940,1968,1968,1928,1968,1968,1968,1972
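
A small sketch of how such a sequence can be summarised when checking for determinism: a deterministic setup would yield exactly one distinct value across all runs. The helper name `summarise_runs` is illustrative only.

```python
from collections import Counter

# pickUpEvents counts of the 9 runs listed above.
pickup_counts = [1928, 1940, 1968, 1968, 1928, 1968, 1968, 1968, 1972]


def summarise_runs(values):
    """Report how many distinct outcomes occurred and how often each one did."""
    histogram = Counter(values)
    return {
        "distinct": len(histogram),
        "deterministic": len(histogram) == 1,
        "histogram": dict(sorted(histogram.items())),
    }


summary = summarise_runs(pickup_counts)
print(summary)
```

For the sequence above this reports four distinct outcomes, with 1968 occurring in five of the nine runs.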

@steffenaxer
Collaborator Author

output_events_1.xml.gz
output_events_2.xml.gz
All good things come in threes. Now the events from it.2, which is the iteration in which the results deviate.

@steffenaxer
Collaborator Author

steffenaxer commented Dec 25, 2023

	<event time="21707.0" type="drivingEnergyConsumption" link="274" vehicle="drt_veh_3_1" energy="130789.80270128997" endCharge="5.377920119729871E7"  />
	<event time="21711.0" type="left link" link="271" vehicle="drt_veh_3_1"  />
	<event time="21711.0" type="entered link" link="256" vehicle="drt_veh_3_1"  />
	<event time="21712.0" type="drivingEnergyConsumption" link="271" vehicle="drt_veh_3_1" energy="31267.00242006006" endCharge="5.374793419487865E7"  />
	<event time="21726.0" type="left link" link="256" vehicle="drt_veh_3_1"  />
	<event time="21726.0" type="entered link" link="581" vehicle="drt_veh_3_1"  />
	<event time="21762.0" type="left link" link="581" vehicle="drt_veh_3_1"  />
	<event time="21762.0" type="entered link" link="584" vehicle="drt_veh_3_1"  />
	<event time="21763.0" type="drivingEnergyConsumption" link="581" vehicle="drt_veh_3_1" energy="253162.89461231497" endCharge="5.349477130026634E7"  />

vs

	<event time="21707.0" type="drivingEnergyConsumption" link="274" vehicle="drt_veh_3_1" energy="130789.80270128997" endCharge="5.377920119729871E7"  />
	<event time="21711.0" type="left link" link="271" vehicle="drt_veh_3_1"  />
	<event time="21711.0" type="entered link" link="256" vehicle="drt_veh_3_1"  />
	<event time="21712.0" type="drivingEnergyConsumption" link="271" vehicle="drt_veh_3_1" energy="31267.00242006006" endCharge="5.374793419487865E7"  />
	<event time="21726.0" type="left link" link="256" vehicle="drt_veh_3_1"  />
	<event time="21726.0" type="entered link" link="581" vehicle="drt_veh_3_1"  />
	<event time="21727.0" type="drivingEnergyConsumption" link="256" vehicle="drt_veh_3_1" energy="103255.10739575524" endCharge="5.36446790874829E7"  />
	<event time="21762.0" type="left link" link="581" vehicle="drt_veh_3_1"  />
	<event time="21762.0" type="entered link" link="584" vehicle="drt_veh_3_1"  />
	<event time="21763.0" type="drivingEnergyConsumption" link="581" vehicle="drt_veh_3_1" energy="253361.04458302655" endCharge="5.339131804289987E7"  />

I don't know why, but there are energy events missing, e.g. at time 21727. That is the reason for the deviation.
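
To locate such gaps systematically, here is a minimal sketch that pairs each `left link` event of a vehicle with a later `drivingEnergyConsumption` event for the same link and reports the links that never get one. It assumes that every link a vehicle leaves should produce exactly one energy event, which is what the complete excerpt suggests but is not confirmed in the thread. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def links_without_energy_event(event_lines, vehicle):
    """Return (time, link) of 'left link' events with no later energy event."""
    pending = []  # links the vehicle has left, still awaiting an energy event
    for line in event_lines:
        line = line.strip()
        if not line.startswith("<event "):
            continue  # skip the XML header, root tags and blank lines
        event = ET.fromstring(line)
        if event.get("vehicle") != vehicle:
            continue
        if event.get("type") == "left link":
            pending.append((float(event.get("time")), event.get("link")))
        elif event.get("type") == "drivingEnergyConsumption":
            for entry in pending:
                if entry[1] == event.get("link"):
                    pending.remove(entry)
                    break
    return pending


# Events copied from the excerpt in which the energy event at 21727 is absent.
FIRST_RUN = [
    '<event time="21711.0" type="left link" link="271" vehicle="drt_veh_3_1"  />',
    '<event time="21711.0" type="entered link" link="256" vehicle="drt_veh_3_1"  />',
    '<event time="21712.0" type="drivingEnergyConsumption" link="271" vehicle="drt_veh_3_1" energy="31267.00242006006" endCharge="5.374793419487865E7"  />',
    '<event time="21726.0" type="left link" link="256" vehicle="drt_veh_3_1"  />',
    '<event time="21726.0" type="entered link" link="581" vehicle="drt_veh_3_1"  />',
    '<event time="21762.0" type="left link" link="581" vehicle="drt_veh_3_1"  />',
    '<event time="21762.0" type="entered link" link="584" vehicle="drt_veh_3_1"  />',
    '<event time="21763.0" type="drivingEnergyConsumption" link="581" vehicle="drt_veh_3_1" energy="253162.89461231497" endCharge="5.349477130026634E7"  />',
]

missing = links_without_energy_event(FIRST_RUN, "drt_veh_3_1")
print(missing)
```

On this excerpt the only unmatched entry is link 256, left at time 21726, which matches the missing event noted above. On a full events file, the last link of each vehicle may also show up if its energy event falls outside the inspected window.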
