As a developer, I want to investigate the performance benefits (if any) of feature-batched retrieval #270

Open
epag opened this issue Aug 21, 2024 · 332 comments

Comments

@epag
Collaborator

epag commented Aug 21, 2024


Author Name: James (James)
Original Redmine Issue: 95867, https://vlab.noaa.gov/redmine/issues/95867
Original Date: 2021-09-03
Original Assignee: James


Given an evaluation that contains N singleton features, such as scenario703
When I consider how to evaluate it performantly
Then I want to consider feature-batched retrieval for N/M features at once


Related issue(s): #286
Redmine related issue(s): 98818, 99120, 99338, 99680, 99719, 99827, 99932, 99964, 99980, 109030



Original Redmine Comment
Author Name: James (James)
Original Date: 2021-09-03T10:55:02Z


One of the possibilities opened up by the refactoring for #95438 is to allow feature-batched retrieval. This is practically the same as multi-feature pooling, except that the batched pool is then decomposed into its constituent single-feature pools.

It is almost certainly true that retrieving time-series data for N features in one batch does not take N times as long as retrieving a single feature; the ratio is probably closer to 1:1. Thus, there may be a performance advantage for multi-feature evaluations with respect to retrieval time.

But there's no free lunch: batching would necessarily increase the risk of an out-of-memory error (OOME), in the same way that feature pooling increases that risk. The batch sizes would need to be calibrated to reflect that (edit: perhaps even dynamically if we were clever, but cleverness means extra code and more brittleness).


Original Redmine Comment
Author Name: James (James)
Original Date: 2021-09-03T10:57:00Z


This is analogous to feature-batched retrieval from NWIS or WRDS on the reading side, I guess.


Original Redmine Comment
Author Name: James (James)
Original Date: 2021-10-04T11:33:56Z


Consider this retrieval script from #96951, which contains a feature group of 130 features (not in prepared statement form).

SELECT 
    TS.timeseries_id AS series_id,
    TS.initialization_date AS reference_time,
    TS.initialization_date + INTERVAL '1' MINUTE * TSV.lead AS valid_time,
    TSV.series_value AS measurement,
    TS.measurementunit_id,
    TS.scale_period,
    TS.scale_function,
    TS.feature_id,
    COUNT(*) AS occurrences
FROM wres.TimeSeries TS
    INNER JOIN wres.TimeSeriesValue TSV
        ON TSV.timeseries_id = TS.timeseries_id
    INNER JOIN wres.ProjectSource PS
        ON PS.source_id = TS.source_id
WHERE PS.project_id = 9025
    AND TS.variable_name = 'streamflow'
    AND TS.feature_id = ANY('{232695, 232783, 232814, 232828, 232699, 232791, 232812, 232716, 232745, 232741, 232800, 232825, 232754, 232768, 232832, 232743, 232705, 232709, 232763, 232728, 232798, 232733, 232715, 232778, 232689, 232816, 232736, 232737, 232827, 232819, 232704, 232829, 232712, 232739, 232820, 232773, 232794, 232801, 232822, 232750, 232714, 232719, 232792, 232764, 232777, 232765, 232718, 232779, 232826, 232700, 232707, 232803, 232758, 232730, 232752, 232770, 232693, 232775, 232802, 232738, 232766, 232692, 232786, 232781, 232735, 232698, 232706, 232810, 232722, 232723, 232756, 232701, 232697, 232729, 232696, 232751, 232717, 232811, 232724, 232755, 232688, 232785, 232824, 232694, 232815, 232710, 232774, 232809, 232821, 232799, 232702, 232759, 232761, 232753, 232691, 232731, 232732, 232805, 232762, 232727, 232818, 232744, 232796, 232776, 232747, 232767, 232760, 232808, 232784, 232725, 232726, 232831, 232807, 232746, 232813, 232720, 232817, 232830, 232713, 232797, 232787, 232769, 232823, 232749, 232771, 232806, 232748, 232833, 232782, 232734}')
    AND PS.member = 'right'
    AND TSV.lead > 0
    AND TSV.lead <= 1080
    AND TS.initialization_date > '2017-08-08T09:00Z'
    AND TS.initialization_date <= '2017-08-08T10:00Z'
    AND TS.initialization_date + INTERVAL '1' MINUTE * TSV.lead > '2017-08-07T23:00Z'
    AND TS.initialization_date + INTERVAL '1' MINUTE * TSV.lead <= '2017-08-09T17:00Z'
GROUP BY TS.feature_id, series_id, TSV.lead, TSV.series_value
ORDER BY occurrences, TS.initialization_date, valid_time, series_id;

Warmed up, the average across five retrievals is about 60ms. Now consider the same script with a single location: the average across five retrievals after warm-up is about 40ms.

No real surprise, but this emphasizes the dramatic speed-up in retrieval times that can be expected for multi-feature evaluations from feature grouping (regardless of whether the evaluation involves feature pooling or singleton feature tuples). One batched query for 130 features takes about 60ms, whereas 130 single-feature queries at about 40ms each would take about 5.2s.
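
To put rough numbers on this, a naive linear cost model can be fitted to the two timings above (about 60ms for 130 features, about 40ms for one feature). This is only a sketch: it extrapolates from two data points, the function names are invented for illustration, and the 2000-feature evaluation and the batch sizes are hypothetical values, not measurements.

```python
# Measured above (warmed up, average of five retrievals).
T_SINGLE_MS = 40.0     # one feature per query
T_BATCH_MS = 60.0      # 130 features per query
BATCH_FEATURES = 130

# Assumed model: time(n) = fixed + per_feature * n, fitted to the two points.
PER_FEATURE_MS = (T_BATCH_MS - T_SINGLE_MS) / (BATCH_FEATURES - 1)
FIXED_MS = T_SINGLE_MS - PER_FEATURE_MS


def query_time_ms(features_per_query: int) -> float:
    """Estimated time for one query that retrieves the given number of features."""
    return FIXED_MS + PER_FEATURE_MS * features_per_query


def total_time_ms(total_features: int, batch_size: int) -> float:
    """Estimated time to retrieve all features in batches of at most batch_size."""
    full, remainder = divmod(total_features, batch_size)
    total = full * query_time_ms(batch_size)
    if remainder:
        total += query_time_ms(remainder)
    return total


if __name__ == "__main__":
    n = 2000  # hypothetical large-in-space evaluation
    for size in (1, 10, 130):
        print(f"batch size {size}: ~{total_time_ms(n, size) / 1000:.1f} s")
```

Under this model the fixed per-query cost dominates, which is why even a modest batch size recovers most of the benefit. The estimate covers a single retrieval window only and says nothing about memory.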


Original Redmine Comment
Author Name: James (James)
Original Date: 2021-10-04T11:41:51Z


In other words, think wpod-style evaluations.

With feature-batched retrieval already enabled for feature pooling, the time to implement this in a minimal way would be fairly small.

However, to avoid holding arbitrarily large evaluations in memory, we'd probably want to extend the @retriever@ API to allow retrieval from a batched retriever by feature tuple, thereby providing an interface to the underlying @ResultSet@ that acquires only some of the data from it, as needed. This is a little different from the solution to #95488, which is true streaming or deferred reading, because in this case we only want to read part of the @ResultSet@ (edit: whereas #95488 is about reading all time-series, one by one, each one generating a pull from the @ResultSet@).
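
As a rough illustration of retrieval from a batched retriever by feature, here is a minimal Python sketch. Everything in it is hypothetical: the class name, the row shape, and the `run_query` callable (a stand-in for the real database call) are assumptions, not the actual @retriever@ API. Note also that the sketch reads the whole batch eagerly and splits it in memory, whereas the idea described above is to read only part of the @ResultSet@ as needed.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical row shape: (feature_id, valid_time, value).
Row = Tuple[int, str, float]


class BatchedRetriever:
    """Runs one query for a batch of features, then hands out rows per feature."""

    def __init__(self, feature_ids: Iterable[int],
                 run_query: Callable[[List[int]], Iterable[Row]]):
        self._feature_ids = list(feature_ids)
        self._run_query = run_query  # stand-in for the real database call
        self._by_feature: Dict[int, List[Row]] = {}
        self._loaded = False

    def _load(self) -> None:
        # One round trip for the whole batch, split by feature as rows arrive.
        grouped: Dict[int, List[Row]] = defaultdict(list)
        for row in self._run_query(self._feature_ids):
            grouped[row[0]].append(row)
        self._by_feature = dict(grouped)
        self._loaded = True

    def get(self, feature_id: int) -> List[Row]:
        """Returns the rows for one feature and releases them from the cache.

        A second call for the same feature returns an empty list.
        """
        if feature_id not in self._feature_ids:
            raise KeyError(f"Feature {feature_id} is not in this batch")
        if not self._loaded:
            self._load()
        return self._by_feature.pop(feature_id, [])
```

Releasing each feature's rows once they have been handed out keeps the cached batch from outliving the pools that need it, which is one simple way to limit the extra memory that batching costs.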


Original Redmine Comment
Author Name: James (James)
Original Date: 2021-11-29T12:37:57Z


I want to look at this for a couple of reasons.

First, the wpod (large-in-space, short-in-time) evaluations are a significant bottleneck in cowres. Some of them take up to 10-12 hours and disrupt other (e.g., hefs) evaluations by interleaving ingest and retrieval, thereby reducing throughput overall and contributing to outliers. Around one-third to one-half of the total time is spent in retrieval and the rest in reading/ingest, most of which seems to be WRDS.

Second, this should be relatively easy to achieve with the way retrieval is now abstracted and now that feature pooling is implemented because it essentially amounts to feature pooling on the front-end followed by unpooling/decomposition on the backend. This type of optimization can be implemented in the @poolfactory@, which decides how best to retrieve and cache data for pooling with the aid of a @RetrieverFactory@.

Necessarily, this involves a trade-off between efficient retrieval (scripts that retrieve more features at once) and increased memory usage (more features in memory at once), so it will need to be configurable on the basis of what we actually see in production, with conservative initial assumptions. In general, large-in-space evaluations are short in time and vice versa, simply because large-in-space and long-in-time evaluations are impractical/intractable.


Original Redmine Comment
Author Name: James (James)
Original Date: 2021-11-29T12:45:34Z


The idea would be to have a threshold at which feature-batched retrieval occurs: say, an evaluation that contains 10 or more singleton feature groups (groups that are not singletons are already feature-batched). Then there would be a batch size, say 10 features per batch (or the smaller of that number and the total number of features in the evaluation, since the number would be configurable). Both parameters would be configurable in @wresconfig.xml@ and overridable with runtime system properties. My expectation is that they would be quite easy to set in practice and would rarely, if ever, get touched, but they could be in the mix with RAM increments as "what shall we tweak to make best use of this extra RAM?".
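
The threshold and batch size described here can be sketched as follows. This is a minimal illustration in Python, not the actual implementation: the function and parameter names are hypothetical, and the defaults of 10 are just the example values mentioned above.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def batch_singleton_features(features: Sequence[T],
                             min_features: int = 10,
                             batch_size: int = 10) -> List[List[T]]:
    """Groups singleton features into retrieval batches.

    Below the threshold (min_features), every feature is retrieved on its own.
    At or above it, features are retrieved in batches of at most batch_size.
    """
    if min_features < 1 or batch_size < 1:
        raise ValueError("min_features and batch_size must be positive")
    if len(features) < min_features:
        return [[f] for f in features]
    # The smaller of the configured batch size and the total number of features.
    size = min(batch_size, len(features))
    return [list(features[i:i + size]) for i in range(0, len(features), size)]
```

For example, 25 singleton features with the defaults yield three batches of 10, 10 and 5 features, while 9 features stay as 9 separate retrievals because they fall below the threshold.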


Original Redmine Comment
Author Name: James (James)
Original Date: 2021-11-29T15:17:16Z


#99040


Original Redmine Comment
Author Name: James (James)
Original Date: 2021-11-30T12:00:22Z


Made a decent amount of progress with this. Indeed, it was relatively easy to achieve. I need to do some sanity checking of the debug logging on @wres.io.retrieval@ and elsewhere to make sure the expected feature-batched retrievals are happening without any duplication. I also need to expose the two parameters in the @wresconfig.xml@ and choose initial values. Then I need to run top-level integration tests with some large-in-space evaluations to see what performance gains are accrued. I think the largest evaluation (edit: spatially) I have locally is scenario703, so I might need to get an explicit list of features for a wpod evaluation.
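
The "without any duplication" check could be approximated with something like the sketch below. It assumes the batches of feature identifiers have already been extracted from the debug logs by some means not shown here (no log format is assumed), and the function name is hypothetical.

```python
from collections import Counter
from typing import Hashable, Iterable, Set, Tuple


def check_batches(expected: Iterable[Hashable],
                  batches: Iterable[Iterable[Hashable]]) -> Tuple[Set, Set]:
    """Returns (missing, duplicated) features across the retrieved batches."""
    # Count how many times each feature was retrieved across all batches.
    counts = Counter(f for batch in batches for f in batch)
    missing = set(expected) - set(counts)
    duplicated = {f for f, n in counts.items() if n > 1}
    return missing, duplicated
```

Both returned sets should be empty when every expected feature was retrieved exactly once.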

Hank, any chance you could help with an explicit list of features from a wpod evaluation? I think there are around 2k, from memory. I'm not sure whether they use a wrds region definition or list them explicitly. Thanks!


Original Redmine Comment
Author Name: Hank (Hank)
Original Date: 2021-11-30T12:30:33Z


James,

Sure thing. I'll see if I can get to it later this morning.

Hank


Original Redmine Comment
Author Name: Hank (Hank)
Original Date: 2021-11-30T14:03:10Z


James,

The list below is what I generated to support the dStore evaluation of NWM forecasts for the performance tests. Dave/Juzer, I believe, use the WRDS tag "usgs_gages_ii_ref_headwater" and this was constructed based on that tag.

Good enough?

Hank

=============================================================

    <feature left="01013500" right="724696"/>
    <feature left="01021470" right="2677654"/>
    <feature left="01021480" right="2676222"/>
    <feature left="01027200" right="1702734"/>
    <feature left="01029200" right="1711010"/>
    <feature left="01031300" right="1721977"/>
    <feature left="01031510" right="1722933"/>
    <feature left="01037380" right="2685414"/>
    <feature left="01044550" right="1032315"/>
    <feature left="01046000" right="3318560"/>
    <feature left="01047000" right="3321976"/>
    <feature left="01047200" right="3323262"/>
    <feature left="01048220" right="3320218"/>
    <feature left="01052500" right="19334265"/>
    <feature left="01054200" right="6716129"/>
    <feature left="01054300" right="6710239"/>
    <feature left="01055000" right="6709987"/>
    <feature left="01057000" right="6711893"/>
    <feature left="01064801" right="9315863"/>
    <feature left="01067950" right="5845376"/>
    <feature left="01073000" right="5845058"/>
    <feature left="01074520" right="6728911"/>
    <feature left="01077400" right="6730525"/>
    <feature left="01078000" right="6731199"/>
    <feature left="01086000" right="6761776"/>
    <feature left="01091000" right="6741162"/>
    <feature left="01115630" right="6129235"/>
    <feature left="01117370" right="6140308"/>
    <feature left="01117468" right="6140842"/>
    <feature left="01117800" right="6139996"/>
    <feature left="01118300" right="6140826"/>
    <feature left="01121000" right="6162583"/>
    <feature left="01123000" right="6162981"/>
    <feature left="01130000" right="4593015"/>
    <feature left="01134500" right="4570675"/>
    <feature left="01137500" right="4594599"/>
    <feature left="01139000" right="4573927"/>
    <feature left="01139800" right="4573939"/>
    <feature left="01142500" right="6083547"/>
    <feature left="01150900" right="6089027"/>
    <feature left="01154000" right="10102844"/>
    <feature left="01162500" right="9343363"/>
    <feature left="01169000" right="10294784"/>
    <feature left="01170100" right="10294622"/>
    <feature left="01174565" right="7690043"/>
    <feature left="01181000" right="6096297"/>
    <feature left="01187300" right="6105951"/>
    <feature left="01193500" right="7700458"/>
    <feature left="01194000" right="7702612"/>
    <feature left="01194500" right="7702618"/>
    <feature left="01195100" right="6177558"/>
    <feature left="01198000" right="7709954"/>
    <feature left="01203805" right="7713138"/>
    <feature left="01208950" right="7733309"/>
    <feature left="01208990" right="7731677"/>
    <feature left="01333000" right="22290373"/>
    <feature left="01343060" right="22741265"/>
    <feature left="01350080" right="3247464"/>
    <feature left="0136230002" right="6189648"/>
    <feature left="01362370" right="6189644"/>
    <feature left="01362497" right="6189752"/>
    <feature left="01363382" right="6191718"/>
    <feature left="01365000" right="6200212"/>
    <feature left="01374781" right="6227370"/>
    <feature left="01409810" right="9454449"/>
    <feature left="01411300" right="9436435"/>
    <feature left="01414000" right="1748723"/>
    <feature left="01415000" right="1748589"/>
    <feature left="01422500" right="2612842"/>
    <feature left="0142400103" right="2614018"/>
    <feature left="01434017" right="4147394"/>
    <feature left="01434025" right="4147946"/>
    <feature left="01439500" right="4153168"/>
    <feature left="01440000" right="4151302"/>
    <feature left="01440400" right="4153166"/>
    <feature left="01451800" right="4187713"/>
    <feature left="01460880" right="2591267"/>
    <feature left="01471875" right="4780921"/>
    <feature left="01485000" right="8401303"/>
    <feature left="01485500" right="8401785"/>
    <feature left="01486000" right="8401391"/>
    <feature left="01487000" right="8392898"/>
    <feature left="01490000" right="8382361"/>
    <feature left="01491000" right="9407484"/>
    <feature left="01492500" right="4768920"/>
    <feature left="01493500" right="4766828"/>
    <feature left="01502500" right="8086663"/>
    <feature left="01510000" right="9420683"/>
    <feature left="01516500" right="8112305"/>
    <feature left="01518862" right="8111873"/>
    <feature left="01525981" right="8110727"/>
    <feature left="01527500" right="8118867"/>
    <feature left="01539000" right="2603023"/>
    <feature left="01542810" right="8125917"/>
    <feature left="01544500" right="8134650"/>
    <feature left="01545600" right="8134878"/>
    <feature left="01547700" right="8139414"/>
    <feature left="01549500" right="8144818"/>
    <feature left="01550000" right="8152817"/>
    <feature left="01552000" right="8152257"/>
    <feature left="01552500" right="8151091"/>
    <feature left="01557500" right="4683152"/>
    <feature left="01564500" right="4699877"/>
    <feature left="01566000" right="4697481"/>
    <feature left="01567500" right="4710396"/>
    <feature left="01571184" right="4711782"/>
    <feature left="01580000" right="4726301"/>
    <feature left="01581810" right="11687020"/>
    <feature left="01581830" right="11687078"/>
    <feature left="01581870" right="11687118"/>
    <feature left="01581960" right="11687048"/>
    <feature left="01586610" right="11688538"/>
    <feature left="01591400" right="11907272"/>
    <feature left="01596500" right="14364088"/>
    <feature left="01597000" right="14364102"/>
    <feature left="01601000" right="14362658"/>
    <feature left="01605500" right="8423458"/>
    <feature left="01606000" right="8423282"/>
    <feature left="01609000" right="8431428"/>
    <feature left="01610155" right="8431622"/>
    <feature left="01613050" right="5892356"/>
    <feature left="01613525" right="5892862"/>
    <feature left="01613900" right="5895822"/>
    <feature left="01620500" right="5909157"/>
    <feature left="01632000" right="8441037"/>
    <feature left="01632900" right="8441303"/>
    <feature left="01634500" right="8440459"/>
    <feature left="01636690" right="4505510"/>
    <feature left="01639500" right="8449082"/>
    <feature left="01643700" right="4508450"/>
    <feature left="01658500" right="4529107"/>
    <feature left="01661050" right="4530917"/>
    <feature left="01662800" right="8466041"/>
    <feature left="01665500" right="8468791"/>
    <feature left="01666500" right="8468559"/>
    <feature left="01669000" right="8479268"/>
    <feature left="01669520" right="8457352"/>
    <feature left="02011400" right="8520617"/>
    <feature left="02013000" right="8525587"/>
    <feature left="02014000" right="8525619"/>
    <feature left="02015700" right="8520749"/>
    <feature left="02017500" right="8525659"/>
    <feature left="02024915" right="8548847"/>
    <feature left="02027000" right="8547581"/>
    <feature left="02027500" right="8549853"/>
    <feature left="02028500" right="8547459"/>
    <feature left="02030500" right="8547583"/>
    <feature left="02032640" right="8566583"/>
    <feature left="02038850" right="8611825"/>
    <feature left="02046000" right="8719481"/>
    <feature left="02051000" right="8744593"/>
    <feature left="02053200" right="8746117"/>
    <feature left="02053800" right="8629303"/>
    <feature left="02055100" right="8627131"/>
    <feature left="02056900" right="8626863"/>
    <feature left="02059500" right="8627301"/>
    <feature left="02064000" right="8647998"/>
    <feature left="02065500" right="8648830"/>
    <feature left="02069700" right="8675163"/>
    <feature left="02070000" right="8673051"/>
    <feature left="02074500" right="8673535"/>
    <feature left="02077200" right="8701361"/>
    <feature left="02079640" right="12049196"/>
    <feature left="0208111310" right="10449386"/>
    <feature left="02081500" right="8760623"/>
    <feature left="02082770" right="8755825"/>
    <feature left="02082950" right="10511316"/>
    <feature left="02084160" right="3350287"/>
    <feature left="02091000" right="11573918"/>
    <feature left="02092500" right="10976591"/>
    <feature left="02096846" right="8895436"/>
    <feature left="02102908" right="8849951"/>
    <feature left="02108000" right="10525935"/>
    <feature left="02111180" right="9251726"/>
    <feature left="02111500" right="9250814"/>
    <feature left="02118500" right="9233605"/>
    <feature left="02128000" right="9210446"/>
    <feature left="02137727" right="9754776"/>
    <feature left="02140991" right="9752476"/>
    <feature left="02142000" right="9753760"/>
    <feature left="02143000" right="9745122"/>
    <feature left="02143040" right="9745600"/>
    <feature left="02147126" right="9734048"/>
    <feature left="02147500" right="9735864"/>
    <feature left="02149000" right="12036581"/>
    <feature left="02152100" right="12034181"/>
    <feature left="02157470" right="9698267"/>
    <feature left="02167450" right="9869384"/>
    <feature left="02178400" right="6267840"/>
    <feature left="02192500" right="11737579"/>
    <feature left="02193340" right="6289497"/>
    <feature left="02196000" right="11730030"/>
    <feature left="02198100" right="9978327"/>
    <feature left="02198690" right="20106819"/>
    <feature left="02202600" right="9944858"/>
    <feature left="02204130" right="6333958"/>
    <feature left="02212600" right="6335902"/>
    <feature left="02215100" right="6365072"/>
    <feature left="02216180" right="6383975"/>
    <feature left="02221525" right="1056599"/>
    <feature left="02228500" right="18258887"/>
    <feature left="02231342" right="11014543"/>
    <feature left="02235200" right="10997161"/>
    <feature left="02245500" right="16662855"/>
    <feature left="02297155" right="16812461"/>
    <feature left="02298488" right="16837004"/>
    <feature left="02298530" right="16838802"/>
    <feature left="02299950" right="16877194"/>
    <feature left="02300700" right="16918210"/>
    <feature left="02310947" right="16955670"/>
    <feature left="02314500" right="1997544"/>
    <feature left="02321000" right="2161384"/>
    <feature left="02324000" right="1984358"/>
    <feature left="02324400" right="1978648"/>
    <feature left="02326000" right="1978638"/>
    <feature left="02327033" right="10319918"/>
    <feature left="02327100" right="10365698"/>
    <feature left="02330400" right="2077651"/>
    <feature left="02338523" right="3291248"/>
    <feature left="02339495" right="3296804"/>
    <feature left="02342850" right="3433426"/>
    <feature left="02343225" right="3440880"/>
    <feature left="02343940" right="2310009"/>
    <feature left="02349900" right="6442680"/>
    <feature left="02350600" right="6458195"/>
    <feature left="02361000" right="2188031"/>
    <feature left="02362240" right="2188549"/>
    <feature left="02363000" right="2210130"/>
    <feature left="02365470" right="2240423"/>
    <feature left="02365769" right="2241851"/>
    <feature left="02366996" right="476177"/>
    <feature left="02367310" right="476941"/>
    <feature left="02369800" right="789206"/>
    <feature left="02371500" right="2377281"/>
    <feature left="02372250" right="2323396"/>
    <feature left="02373000" right="2402121"/>
    <feature left="02374500" right="933140017"/>
    <feature left="02374745" right="2413254"/>
    <feature left="02374950" right="445834"/>
    <feature left="02381600" right="6478765"/>
    <feature left="02384540" right="12193270"/>
    <feature left="02388975" right="6495832"/>
    <feature left="02390000" right="6495864"/>
    <feature left="02391840" right="6497138"/>
    <feature left="02395120" right="6498620"/>
    <feature left="02408540" right="22274612"/>
    <feature left="02415000" right="22035157"/>
    <feature left="02422500" right="21676818"/>
    <feature left="02427250" right="21457950"/>
    <feature left="02430085" right="18693151"/>
    <feature left="02430880" right="18694277"/>
    <feature left="02438000" right="18670534"/>
    <feature left="02448900" right="18604744"/>
    <feature left="02450250" right="18578829"/>
    <feature left="02450825" right="18579837"/>
    <feature left="02464000" right="18229143"/>
    <feature left="02464146" right="18227533"/>
    <feature left="02464360" right="18227431"/>
    <feature left="02465493" right="18208346"/>
    <feature left="02467500" right="18531726"/>
    <feature left="02469800" right="21640642"/>
    <feature left="02470072" right="21638314"/>
    <feature left="02472000" right="18154237"/>
    <feature left="02472500" right="18157053"/>
    <feature left="02472850" right="18156447"/>
    <feature left="02479155" right="18105412"/>
    <feature left="02479300" right="18107248"/>
    <feature left="02479560" right="18094981"/>
    <feature left="02479945" right="18094191"/>
    <feature left="02481000" right="18070104"/>
    <feature left="02481510" right="18075806"/>
    <feature left="03010655" right="8971150"/>
    <feature left="03011800" right="8974366"/>
    <feature left="03015500" right="8975242"/>
    <feature left="03017500" right="10220060"/>
    <feature left="03021350" right="9049227"/>
    <feature left="03022540" right="9052485"/>
    <feature left="03026500" right="6874093"/>
    <feature left="03028000" right="6874111"/>
    <feature left="03049800" right="11049744"/>
    <feature left="03050000" right="4353002"/>
    <feature left="03065000" right="3775187"/>
    <feature left="03065400" right="3775587"/>
    <feature left="03070500" right="3773681"/>
    <feature left="03075905" right="3808623"/>
    <feature left="03076600" right="3808829"/>
    <feature left="03078000" right="3808365"/>
    <feature left="03114500" right="15431972"/>
    <feature left="03115400" right="15431680"/>
    <feature left="03121850" right="19389922"/>
    <feature left="03140000" right="15400312"/>
    <feature left="03144000" right="15380169"/>
    <feature left="03149500" right="15380941"/>
    <feature left="03154000" right="19418679"/>
    <feature left="03158200" right="15420817"/>
    <feature left="03159540" right="19440277"/>
    <feature left="03161000" right="6892192"/>
    <feature left="03165000" right="6886804"/>
    <feature left="03170000" right="6884590"/>
    <feature left="03173000" right="6908597"/>
    <feature left="03180500" right="12103826"/>
    <feature left="03186500" right="4547810"/>
    <feature left="03187500" right="4547840"/>
    <feature left="03190000" right="4547946"/>
    <feature left="03198350" right="6929270"/>
    <feature left="03201902" right="3366826"/>
    <feature left="03207965" right="1086659"/>
    <feature left="03210000" right="886365"/>
    <feature left="03228750" right="5212897"/>
    <feature left="03237280" right="1920776"/>
    <feature left="03237500" right="1919636"/>
    <feature left="03241500" right="3930910"/>
    <feature left="03250100" right="2091855"/>
    <feature left="03250322" right="2091809"/>
    <feature left="03251200" right="2088233"/>
    <feature left="03252300" right="2057678"/>
    <feature left="03272700" right="3882696"/>
    <feature left="03280700" right="487460"/>
    <feature left="03281100" right="504810"/>
    <feature left="03282040" right="867996"/>
    <feature left="03282500" right="867892"/>
    <feature left="03285000" right="1827630"/>
    <feature left="03291780" right="10161364"/>
    <feature left="03300400" right="10302627"/>
    <feature left="03302680" right="10356472"/>
    <feature left="03310400" right="4002874"/>
    <feature left="03318800" right="11622056"/>
    <feature left="03340800" right="10207857"/>
    <feature left="03346000" right="10337448"/>
    <feature left="03357330" right="18464804"/>
    <feature left="03364500" right="18454477"/>
    <feature left="03366500" right="18451023"/>
    <feature left="03373508" right="18445546"/>
    <feature left="03384450" right="11868270"/>
    <feature left="03403910" right="10191314"/>
    <feature left="03408500" right="12154450"/>
    <feature left="03409500" right="12154278"/>
    <feature left="03413200" right="3575414"/>
    <feature left="03415000" right="10183165"/>
    <feature left="03416000" right="10181647"/>
    <feature left="03424730" right="18421273"/>
    <feature left="03427500" right="18402499"/>
    <feature left="03431800" right="18409192"/>
    <feature left="03436690" right="11883626"/>
    <feature left="03439000" right="22165090"/>
    <feature left="03441000" right="22164500"/>
    <feature left="0344894205" right="22160778"/>
    <feature left="03450000" right="22161670"/>
    <feature left="03453000" right="22161598"/>
    <feature left="03455500" right="22152669"/>
    <feature left="03456500" right="22152435"/>
    <feature left="03460000" right="22151401"/>
    <feature left="03463300" right="19490270"/>
    <feature left="03471500" right="19752335"/>
    <feature left="03479000" right="19743430"/>
    <feature left="03488000" right="19761976"/>
    <feature left="03491000" right="22178350"/>
    <feature left="03497300" right="22130697"/>
    <feature left="03500000" right="19736555"/>
    <feature left="03500240" right="19735951"/>
    <feature left="03504000" right="19736561"/>
    <feature left="03518500" right="19722609"/>
    <feature left="03535000" right="22123016"/>
    <feature left="03539778" right="19710283"/>
    <feature left="03544970" right="19679561"/>
    <feature left="03578000" right="19604615"/>
    <feature left="03578500" right="19604435"/>
    <feature left="03584045" right="19596000"/>
    <feature left="03588500" right="19577851"/>
    <feature left="03593800" right="19550100"/>
    <feature left="03597590" right="19531616"/>
    <feature left="03599450" right="19531696"/>
    <feature left="03604000" right="19504798"/>
    <feature left="04015330" right="1757696"/>
    <feature left="04024430" right="1799897"/>
    <feature left="04031000" right="6790991"/>
    <feature left="04032000" right="6790631"/>
    <feature left="04033000" right="11951527"/>
    <feature left="04040500" right="11930606"/>
    <feature left="04043050" right="11937201"/>
    <feature left="04043150" right="12027144"/>
    <feature left="04043238" right="12025490"/>
    <feature left="04043244" right="12025464"/>
    <feature left="04043275" right="12025598"/>
    <feature left="04045500" right="12186641"/>
    <feature left="04046000" right="12214445"/>
    <feature left="04056500" right="12222392"/>
    <feature left="04057510" right="272589"/>
    <feature left="04057800" right="11959338"/>
    <feature left="04059500" right="6860182"/>
    <feature left="04063700" right="6844165"/>
    <feature left="04066500" right="6847893"/>
    <feature left="04067958" right="904030532"/>
    <feature left="04074950" right="9027875"/>
    <feature left="04085200" right="13064729"/>
    <feature left="04104945" right="3473093"/>
    <feature left="04105700" right="3473077"/>
    <feature left="04115265" right="12232338"/>
    <feature left="04117000" right="12145180"/>
    <feature left="04122200" right="904060094"/>
    <feature left="04122500" right="8992044"/>
    <feature left="04124000" right="12121104"/>
    <feature left="04124500" right="12121188"/>
    <feature left="04126970" right="13057812"/>
    <feature left="04127917" right="12206226"/>
    <feature left="04127997" right="12502977"/>
    <feature left="04185440" right="15662946"/>
    <feature left="04196800" right="15613832"/>
    <feature left="04197100" right="15612400"/>
    <feature left="04197170" right="15612196"/>
    <feature left="04199155" right="15604094"/>
    <feature left="04213000" right="9841738"/>
    <feature left="04213075" right="9841436"/>
    <feature left="04216418" right="15569391"/>
    <feature left="04221000" right="15550135"/>
    <feature left="04224775" right="15547935"/>
    <feature left="04233286" right="21983581"/>
    <feature left="0423401815" right="21978911"/>
    <feature left="04237962" right="21977433"/>
    <feature left="04256000" right="15514388"/>
    <feature left="04265432" right="15476223"/>
    <feature left="04268800" right="15456660"/>
    <feature left="04273700" right="9527383"/>
    <feature left="04273800" right="9527387"/>
    <feature left="04282525" right="22220425"/>
    <feature left="04282650" right="22220501"/>
    <feature left="04282780" right="22220497"/>
    <feature left="04288230" right="4576576"/>
    <feature left="04296000" right="4599789"/>
    <feature left="05014300" right="9305916"/>
    <feature left="05056100" right="14293805"/>
    <feature left="05056200" right="14299851"/>
    <feature left="05057200" right="14269214"/>
    <feature left="05059600" right="14251875"/>
    <feature left="05062500" right="7027381"/>
    <feature left="05065500" right="14144081"/>
    <feature left="05087500" right="7069743"/>
    <feature left="05120500" right="14172539"/>
    <feature left="05123400" right="14156742"/>
    <feature left="05129115" right="7140308"/>
    <feature left="05131500" right="7171122"/>
    <feature left="05132000" right="22237092"/>
    <feature left="05212700" right="4836620"/>
    <feature left="05290000" right="4073636"/>
    <feature left="05291000" right="4085588"/>
    <feature left="05293000" right="4085656"/>
    <feature left="05317200" right="4142704"/>
    <feature left="05357335" right="13344864"/>
    <feature left="05362000" right="12981697"/>
    <feature left="05383950" right="2464389"/>
    <feature left="05385500" right="2463033"/>
    <feature left="05387440" right="13336562"/>
    <feature left="05389000" right="13211090"/>
    <feature left="05389400" right="13211298"/>
    <feature left="05393500" right="13399941"/>
    <feature left="05399500" right="14733228"/>
    <feature left="05407470" right="13624135"/>
    <feature left="05413500" right="13324626"/>
    <feature left="05414000" right="13324402"/>
    <feature left="05420680" right="6949292"/>
    <feature left="05432695" right="13409794"/>
    <feature left="05432927" right="13410000"/>
    <feature left="05444000" right="10605920"/>
    <feature left="05451210" right="6572160"/>
    <feature left="05454000" right="11915429"/>
    <feature left="05458000" right="7016531"/>
    <feature left="05464220" right="22476707"/>
    <feature left="05467000" right="6960357"/>
    <feature left="05473450" right="7003470"/>
    <feature left="05487980" right="22252001"/>
    <feature left="05488200" right="4995875"/>
    <feature left="05489000" right="4994047"/>
    <feature left="05494300" right="5799184"/>
    <feature left="05495500" right="5802188"/>
    <feature left="05498150" right="5017044"/>
    <feature left="05498700" right="4989089"/>
    <feature left="05501000" right="2925495"/>
    <feature left="05503800" right="5640210"/>
    <feature left="05506100" right="5039952"/>
    <feature left="05507600" right="4868933"/>
    <feature left="05508805" right="4867221"/>
    <feature left="05514500" right="2507315"/>
    <feature left="05525500" right="13454068"/>
    <feature left="05556500" right="14836121"/>
    <feature left="05584500" right="13802760"/>
    <feature left="05591550" right="13771102"/>
    <feature left="05592050" right="13771802"/>
    <feature left="05592575" right="13869098"/>
    <feature left="05593575" right="13873068"/>
    <feature left="05593900" right="13881906"/>
    <feature left="05595730" right="13783124"/>
    <feature left="06036805" right="3061736"/>
    <feature left="06043500" right="3855315"/>
    <feature left="06073500" right="12439987"/>
    <feature left="06078500" right="12395494"/>
    <feature left="06079000" right="12395028"/>
    <feature left="06090500" right="12493987"/>
    <feature left="06102500" right="12744853"/>
    <feature left="06187915" right="2962790"/>
    <feature left="06191000" right="2965566"/>
    <feature left="06209500" right="4264796"/>
    <feature left="06218500" right="12898110"/>
    <feature left="06221400" right="12899984"/>
    <feature left="06224000" right="12900462"/>
    <feature left="06228350" right="12889559"/>
    <feature left="06278300" right="12804142"/>
    <feature left="06280300" right="12788124"/>
    <feature left="06289000" right="12771343"/>
    <feature left="06309200" right="5351095"/>
    <feature left="06311000" right="5348813"/>
    <feature left="06332515" right="21539242"/>
    <feature left="06336600" right="13466672"/>
    <feature left="06339100" right="16247217"/>
    <feature left="06342260" right="14519511"/>
    <feature left="06342450" right="14523345"/>
    <feature left="06344600" right="16233369"/>
    <feature left="06347000" right="16224323"/>
    <feature left="06347500" right="16224859"/>
    <feature left="06350000" right="16213178"/>
    <feature left="06352000" right="21860180"/>
    <feature left="06392900" right="9385393"/>
    <feature left="06402430" right="14396988"/>
    <feature left="06404000" right="14396112"/>
    <feature left="06408700" right="17532989"/>
    <feature left="06409000" right="17533503"/>
    <feature left="06422500" right="14553227"/>
    <feature left="06424000" right="14552381"/>
    <feature left="06429500" right="5481043"/>
    <feature left="06430532" right="5478981"/>
    <feature left="06430850" right="5481901"/>
    <feature left="06440200" right="16131765"/>
    <feature left="06446700" right="16074256"/>
    <feature left="06447230" right="16072528"/>
    <feature left="06447500" right="20179529"/>
    <feature left="06453600" right="11673186"/>
    <feature left="06468170" right="12570210"/>
    <feature left="06470800" right="11468868"/>
    <feature left="06471200" right="11446137"/>
    <feature left="06477500" right="12664828"/>
    <feature left="06479215" right="9370700"/>
    <feature left="06614800" right="16000378"/>
    <feature left="06622700" right="15983485"/>
    <feature left="06623800" right="15984809"/>
    <feature left="06632400" right="15968457"/>
    <feature left="06696980" right="5239426"/>
    <feature left="06746095" right="2900149"/>
    <feature left="06775500" right="17394882"/>
    <feature left="06803510" right="17405547"/>
    <feature left="06803530" right="17399459"/>
    <feature left="06814000" right="19158238"/>
    <feature left="06821080" right="2529443"/>
    <feature left="06846500" right="8164854"/>
    <feature left="06847900" right="8364344"/>
    <feature left="06853800" right="19017621"/>
    <feature left="06869950" right="18869658"/>
    <feature left="06876700" right="3539121"/>
    <feature left="06878000" right="18880718"/>
    <feature left="06879650" right="18841318"/>
    <feature left="06885500" right="2277053"/>
    <feature left="06888000" right="3642310"/>
    <feature left="06888500" right="3643688"/>
    <feature left="06889200" right="3645652"/>
    <feature left="06891810" right="3728285"/>
    <feature left="06895000" right="4386317"/>
    <feature left="06899700" right="5108288"/>
    <feature left="06903400" right="5124308"/>
    <feature left="06906150" right="4461060"/>
    <feature left="06906800" right="5995392"/>
    <feature left="06909500" right="5156625"/>
    <feature left="06910800" right="10116766"/>
    <feature left="06911490" right="10116456"/>
    <feature left="06911900" right="10117754"/>
    <feature left="06914990" right="2992636"/>
    <feature left="06917000" right="7360005"/>
    <feature left="06918460" right="7374577"/>
    <feature left="06919500" right="7371515"/>
    <feature left="06921070" right="7388043"/>
    <feature left="06921200" right="7388709"/>
    <feature left="06927000" right="5984765"/>
    <feature left="06928000" right="7423890"/>
    <feature left="06928300" right="7411738"/>
    <feature left="06930000" right="7434497"/>
    <feature left="07010350" right="5059289"/>
    <feature left="07014000" right="5055431"/>
    <feature left="07021000" right="5029785"/>
    <feature left="07030392" right="14208484"/>
    <feature left="07048800" right="8590392"/>
    <feature left="07049000" right="8588002"/>
    <feature left="07050152" right="8585070"/>
    <feature left="07053250" right="8586574"/>
    <feature left="07053810" right="7625104"/>
    <feature left="07054080" right="7622144"/>
    <feature left="07055646" right="11819355"/>
    <feature left="07055875" right="11818809"/>
    <feature left="07056515" right="11817935"/>
    <feature left="07057500" right="7650991"/>
    <feature left="07058000" right="7650971"/>
    <feature left="07060710" right="11834118"/>
    <feature left="07061270" right="7665728"/>
    <feature left="07064440" right="7516577"/>
    <feature left="07065200" right="7519951"/>
    <feature left="07075000" right="11778001"/>
    <feature left="07083000" right="916821"/>
    <feature left="07105945" right="1530177"/>
    <feature left="07142300" right="21195628"/>
    <feature left="07144050" right="21176984"/>
    <feature left="07144780" right="21160115"/>
    <feature left="07145700" right="21166753"/>
    <feature left="07148400" right="21028210"/>
    <feature left="07149000" right="21014649"/>
    <feature left="07151500" right="20987715"/>
    <feature left="07167500" right="21517052"/>
    <feature left="07176950" right="847956"/>
    <feature left="07179700" right="20929432"/>
    <feature left="07180500" right="20920709"/>
    <feature left="07184000" right="20874950"/>
    <feature left="07188653" right="7600617"/>
    <feature left="07189100" right="7600551"/>
    <feature left="07191222" right="21770993"/>
    <feature left="07195800" right="399452"/>
    <feature left="07196900" right="400822"/>
    <feature left="07197360" right="401366"/>
    <feature left="07208500" right="20060156"/>
    <feature left="07226500" right="20026105"/>
    <feature left="07227420" right="20001250"/>
    <feature left="07233500" right="13837931"/>
    <feature left="07247250" right="6048082"/>
    <feature left="07249800" right="1540035"/>
    <feature left="07249920" right="1539341"/>
    <feature left="07250935" right="7752590"/>
    <feature left="07250974" right="7753312"/>
    <feature left="07252000" right="7752938"/>
    <feature left="07257006" right="7766271"/>
    <feature left="07260000" right="7805591"/>
    <feature left="07263295" right="22846153"/>
    <feature left="072632962" right="22846151"/>
    <feature left="072632971" right="22846055"/>
    <feature left="072632982" right="22845923"/>
    <feature left="07290650" right="19138473"/>
    <feature left="07291000" right="19115892"/>
    <feature left="07295000" right="19104913"/>
    <feature left="07299670" right="13741403"/>
    <feature left="07301410" right="13754363"/>
    <feature left="07315200" right="13660367"/>
    <feature left="07315700" right="3132401"/>
    <feature left="07325860" right="683417"/>
    <feature left="07329780" right="19959914"/>
    <feature left="07331300" right="430036"/>
    <feature left="07332390" right="698694"/>
    <feature left="07335700" right="588170"/>
    <feature left="07340300" right="3746094"/>
    <feature left="07342480" right="4300953"/>
    <feature left="07346045" right="1017865"/>
    <feature left="07351500" right="8342461"/>
    <feature left="07359610" right="22000700"/>
    <feature left="07360200" right="22698054"/>
    <feature left="07362100" right="21956120"/>
    <feature left="07362500" right="21950596"/>
    <feature left="07366200" right="15221924"/>
    <feature left="07373000" right="19376616"/>
    <feature left="07375000" right="18928210"/>
    <feature left="07375800" right="20089222"/>
    <feature left="07376500" right="20090348"/>
    <feature left="07377000" right="18985872"/>
    <feature left="08013000" right="15085941"/>
    <feature left="08014500" right="15078398"/>
    <feature left="08023080" right="9533087"/>
    <feature left="08023400" right="9533201"/>
    <feature left="08025500" right="8329634"/>
    <feature left="08029500" right="8330722"/>
    <feature left="08031000" right="8331928"/>
    <feature left="08033900" right="1149925"/>
    <feature left="08041500" right="1166409"/>
    <feature left="08050800" right="1275870"/>
    <feature left="08050840" right="1275872"/>
    <feature left="08066200" right="1487570"/>
    <feature left="08066300" right="1494036"/>
    <feature left="08068780" right="1508121"/>
    <feature left="08070000" right="1520007"/>
    <feature left="08079600" right="13698835"/>
    <feature left="08082700" right="5542148"/>
    <feature left="08086290" right="5525073"/>
    <feature left="08095300" right="5523936"/>
    <feature left="08099300" right="2567906"/>
    <feature left="08101000" right="2580511"/>
    <feature left="08103900" right="5587890"/>
    <feature left="0810464660" right="5670825"/>
    <feature left="08104900" right="5671579"/>
    <feature left="08109700" right="5570395"/>
    <feature left="08128400" right="5707852"/>
    <feature left="08131400" right="5702253"/>
    <feature left="08133250" right="5711103"/>
    <feature left="08150800" right="5770545"/>
    <feature left="08152900" right="5785479"/>
    <feature left="08155200" right="5781265"/>
    <feature left="08158700" right="5780099"/>
    <feature left="08158810" right="5781401"/>
    <feature left="08160800" right="5791670"/>
    <feature left="08163500" right="7841703"/>
    <feature left="08164300" right="7846049"/>
    <feature left="08164600" right="9349285"/>
    <feature left="08165300" right="3585678"/>
    <feature left="08166000" right="3585554"/>
    <feature left="08168932" right="1619647"/>
    <feature left="08175000" right="1623207"/>
    <feature left="08177300" right="1639209"/>
    <feature left="08185100" right="7851629"/>
    <feature left="08186500" right="3838999"/>
    <feature left="08189300" right="5289427"/>
    <feature left="08190500" right="7876116"/>
    <feature left="08194200" right="10634531"/>
    <feature left="08195000" right="10645755"/>
    <feature left="08196000" right="10644541"/>
    <feature left="08198000" right="10645747"/>
    <feature left="08200000" right="10654651"/>
    <feature left="08200977" right="10653947"/>
    <feature left="08201500" right="10653905"/>
    <feature left="08210400" right="3168874"/>
    <feature left="08212400" right="1585173"/>
    <feature left="08267500" right="17863440"/>
    <feature left="08269000" right="17864360"/>
    <feature left="08271000" right="17863078"/>
    <feature left="08277470" right="17864778"/>
    <feature left="08302500" right="17866900"/>
    <feature left="08315480" right="17835114"/>
    <feature left="08324000" right="17826714"/>
    <feature left="08340500" right="17789879"/>
    <feature left="08377900" right="20815146"/>
    <feature left="08380500" right="20815196"/>
    <feature left="08386505" right="20772466"/>
    <feature left="08400000" right="22455973"/>
    <feature left="08401200" right="22455359"/>
    <feature left="08405105" right="22458323"/>
    <feature left="09035900" right="1238533"/>
    <feature left="09058500" right="1238569"/>
    <feature left="09065500" right="1320264"/>
    <feature left="09066000" right="1320274"/>
    <feature left="09066200" right="1320244"/>
    <feature left="09081600" right="1326465"/>
    <feature left="09107000" right="1333022"/>
    <feature left="09196500" right="18325008"/>
    <feature left="09210500" right="18354249"/>
    <feature left="09217900" right="3199586"/>
    <feature left="09223000" right="3192546"/>
    <feature left="09306242" right="3240437"/>
    <feature left="09312600" right="3906361"/>
    <feature left="09329050" right="4900159"/>
    <feature left="09352900" right="17034197"/>
    <feature left="09378170" right="3272718"/>
    <feature left="09378630" right="1399324"/>
    <feature left="09386900" right="20572245"/>
    <feature left="09404208" right="20721204"/>
    <feature left="09404222" right="20682381"/>
    <feature left="09404343" right="20667088"/>
    <feature left="09404450" right="10025746"/>
    <feature left="09408195" right="20653582"/>
    <feature left="09415460" right="20635122"/>
    <feature left="09430500" right="2430698"/>
    <feature left="09430600" right="2430436"/>
    <feature left="09444200" right="21355827"/>
    <feature left="09447800" right="21327929"/>
    <feature left="09460150" right="21331123"/>
    <feature left="09470800" right="15934417"/>
    <feature left="09471310" right="15932983"/>
    <feature left="09484000" right="15893872"/>
    <feature left="09484550" right="15895582"/>
    <feature left="09484580" right="15895584"/>
    <feature left="09492400" right="20488086"/>
    <feature left="09497800" right="22440644"/>
    <feature left="09497980" right="22440682"/>
    <feature left="09505200" right="20454544"/>
    <feature left="09505350" right="20454522"/>
    <feature left="09505800" right="20435246"/>
    <feature left="09508300" right="20437386"/>
    <feature left="09510200" right="20440676"/>
    <feature left="09512280" right="20476698"/>
    <feature left="09513780" right="20415812"/>
    <feature left="09537200" right="20371805"/>
    <feature left="10023000" right="7880800"/>
    <feature left="10166430" right="10327201"/>
    <feature left="10172700" right="10396937"/>
    <feature left="10172860" right="10406554"/>
    <feature left="10172870" right="10683178"/>
    <feature left="10173450" right="10818086"/>
    <feature left="10205030" right="3506561"/>
    <feature left="10234500" right="1215135"/>
    <feature left="10242000" right="14597053"/>
    <feature left="10243260" right="10407562"/>
    <feature left="10243700" right="11338977"/>
    <feature left="10244950" right="11339045"/>
    <feature left="10249300" right="10696957"/>
    <feature left="10257600" right="22590267"/>
    <feature left="10258000" right="22593497"/>
    <feature left="10258500" right="22593537"/>
    <feature left="10259000" right="22592131"/>
    <feature left="10259200" right="22592497"/>
    <feature left="10263500" right="22684930"/>
    <feature left="10291500" right="8915907"/>
    <feature left="10308200" right="8922715"/>
    <feature left="10310500" right="8920579"/>
    <feature left="10316500" right="10786444"/>
    <feature left="10321590" right="10783380"/>
    <feature left="10329500" right="11137442"/>
    <feature left="103366092" right="8943677"/>
    <feature left="10336645" right="8941733"/>
    <feature left="10336660" right="8941693"/>
    <feature left="10336676" right="8941685"/>
    <feature left="10340500" right="8933736"/>
    <feature left="10343500" right="8933522"/>
    <feature left="10396000" right="24013585"/>
    <feature left="11015000" right="20334440"/>
    <feature left="11046300" right="20351605"/>
    <feature left="11051502" right="22558244"/>
    <feature left="11058000" right="22557854"/>
    <feature left="11058600" right="22555344"/>
    <feature left="11063510" right="22557744"/>
    <feature left="11098000" right="22514774"/>
    <feature left="11111500" right="17567911"/>
    <feature left="11114495" right="17585808"/>
    <feature left="11120500" right="17596109"/>
    <feature left="11124500" right="17611425"/>
    <feature left="11138500" right="17625379"/>
    <feature left="11141280" right="8193647"/>
    <feature left="11143000" right="8189809"/>
    <feature left="11148900" right="8209949"/>
    <feature left="11151300" right="8205487"/>
    <feature left="11153000" right="17663037"/>
    <feature left="11154700" right="17673639"/>
    <feature left="11162500" right="17688105"/>
    <feature left="11162570" right="17687965"/>
    <feature left="11169800" right="17694891"/>
    <feature left="11172945" right="2809681"/>
    <feature left="11173200" right="2809859"/>
    <feature left="11176400" right="2806807"/>
    <feature left="11180500" right="2804369"/>
    <feature left="11180960" right="2804901"/>
    <feature left="11203580" right="14930711"/>
    <feature left="11224500" right="14883269"/>
    <feature left="11253310" right="14882615"/>
    <feature left="11264500" right="21609533"/>
    <feature left="11274500" right="2828012"/>
    <feature left="11274630" right="2827982"/>
    <feature left="11274790" right="17081597"/>
    <feature left="11284400" right="17078425"/>
    <feature left="11299600" right="348419"/>
    <feature left="11336580" right="15040355"/>
    <feature left="11355500" right="7952754"/>
    <feature left="11381500" right="8019544"/>
    <feature left="11383500" right="8020924"/>
    <feature left="11427000" right="14996611"/>
    <feature left="11447360" right="15022679"/>
    <feature left="11449500" right="948020963"/>
    <feature left="11451100" right="8005975"/>
    <feature left="11467200" right="8271049"/>
    <feature left="11468000" right="2665613"/>
    <feature left="11468500" right="2665525"/>
    <feature left="11468900" right="2546355"/>
    <feature left="11473900" right="8295207"/>
    <feature left="11475560" right="8287590"/>
    <feature left="11476600" right="8284190"/>
    <feature left="11478500" right="2705477"/>
    <feature left="11480390" right="8320019"/>
    <feature left="11481200" right="8315847"/>
    <feature left="11481500" right="8319319"/>
    <feature left="11522500" right="8261865"/>
    <feature left="11523200" right="8242324"/>
    <feature left="11525530" right="8246406"/>
    <feature left="11525670" right="8245912"/>
    <feature left="11526500" right="8244332"/>
    <feature left="11528700" right="8232392"/>
    <feature left="11532500" right="22226812"/>
    <feature left="12010000" right="23864404"/>
    <feature left="12013500" right="23864616"/>
    <feature left="12020800" right="23850611"/>
    <feature left="12024000" right="23850681"/>
    <feature left="12025700" right="23850773"/>
    <feature left="12035000" right="23856727"/>
    <feature left="12039005" right="23860867"/>
    <feature left="12040500" right="23844955"/>
    <feature left="12041200" right="23838194"/>
    <feature left="12043000" right="23838568"/>
    <feature left="12043300" right="24001093"/>
    <feature left="12048000" right="23997388"/>
    <feature left="12054000" right="24287056"/>
    <feature left="12056500" right="24285534"/>
    <feature left="12060500" right="24285572"/>
    <feature left="12079000" right="23988204"/>
    <feature left="12082500" right="24282122"/>
    <feature left="12083000" right="24282076"/>
    <feature left="12092000" right="23980479"/>
    <feature left="12094000" right="23980639"/>
    <feature left="12095000" right="23980763"/>
    <feature left="12097500" right="23981161"/>
    <feature left="12108500" right="23977660"/>
    <feature left="12114500" right="24538014"/>
    <feature left="12115500" right="24537972"/>
    <feature left="12115700" right="24538136"/>
    <feature left="12117000" right="24537924"/>
    <feature left="12137290" right="23963741"/>
    <feature left="12141300" right="23970363"/>
    <feature left="12142000" right="23970575"/>
    <feature left="12143400" right="23970313"/>
    <feature left="12145500" right="23970215"/>
    <feature left="12157250" right="23990063"/>
    <feature left="12175500" right="24255219"/>
    <feature left="12178100" right="24255169"/>
    <feature left="12179900" right="24255811"/>
    <feature left="12182500" right="24254981"/>
    <feature left="12186000" right="24264875"/>
    <feature left="12201500" right="24534294"/>
    <feature left="12202300" right="24534748"/>
    <feature left="12209490" right="23956076"/>
    <feature left="12323670" right="24293902"/>
    <feature left="12323710" right="24293950"/>
    <feature left="12332000" right="24310031"/>
    <feature left="12354000" right="22937058"/>
    <feature left="12358500" right="22957049"/>
    <feature left="12359800" right="22965282"/>
    <feature left="12374250" right="24357005"/>
    <feature left="12375900" right="24356439"/>
    <feature left="12377150" right="24356265"/>
    <feature left="12381400" right="24356091"/>
    <feature left="12390700" right="22976274"/>
    <feature left="12392155" right="22977134"/>
    <feature left="12392300" right="24114491"/>
    <feature left="12411000" right="24373072"/>
    <feature left="12413875" right="23004793"/>
    <feature left="12447383" right="24383455"/>
    <feature left="12447390" right="24384943"/>
    <feature left="12451000" right="23073999"/>
    <feature left="12452800" right="23208058"/>
    <feature left="12452890" right="23208194"/>
    <feature left="12456500" right="23080732"/>
    <feature left="12458000" right="23081224"/>
    <feature left="12488500" right="24422913"/>
    <feature left="13010065" right="23123539"/>
    <feature left="13011500" right="23123373"/>
    <feature left="13011900" right="23123209"/>
    <feature left="13016305" right="24433101"/>
    <feature left="13023000" right="24432173"/>
    <feature left="13046995" right="23141955"/>
    <feature left="13083000" right="23184753"/>
    <feature left="13161500" right="23287111"/>
    <feature left="13162225" right="23284453"/>
    <feature left="13185000" right="23382201"/>
    <feature left="13235000" right="24158523"/>
    <feature left="13237920" right="24164073"/>
    <feature left="13240000" right="24177409"/>
    <feature left="13309220" right="23518785"/>
    <feature left="13310700" right="23551284"/>
    <feature left="13313000" right="23551584"/>
    <feature left="13331500" right="23436993"/>
    <feature left="13334450" right="24227993"/>
    <feature left="13337000" right="23588130"/>
    <feature left="13339500" right="23606608"/>
    <feature left="13340600" right="23630350"/>
    <feature left="14020000" right="23648622"/>
    <feature left="14020300" right="23648490"/>
    <feature left="14036860" right="23822685"/>
    <feature left="14046890" right="23686446"/>
    <feature left="14092750" right="23719653"/>
    <feature left="14096850" right="23719315"/>
    <feature left="14107000" right="23663381"/>
    <feature left="14137000" right="23735819"/>
    <feature left="14138800" right="23736071"/>
    <feature left="14138870" right="23736433"/>
    <feature left="14138900" right="23736093"/>
    <feature left="14139800" right="23736041"/>
    <feature left="14141500" right="23735991"/>
    <feature left="14150800" right="23752608"/>
    <feature left="14154500" right="23759452"/>
    <feature left="14158500" right="23773371"/>
    <feature left="14158790" right="23773393"/>
    <feature left="14159200" right="23773035"/>
    <feature left="14161500" right="23773411"/>
    <feature left="14166500" right="23763161"/>
    <feature left="14179000" right="23780701"/>
    <feature left="14180300" right="23780557"/>
    <feature left="14182500" right="23780805"/>
    <feature left="14185000" right="23785793"/>
    <feature left="14185900" right="23786019"/>
    <feature left="14187000" right="23785723"/>
    <feature left="14216000" right="24241981"/>
    <feature left="14216500" right="24242219"/>
    <feature left="14219000" right="24241873"/>
    <feature left="14222500" right="24241689"/>
    <feature left="14236200" right="24249034"/>
    <feature left="14299800" right="23872135"/>
    <feature left="14301500" right="23875925"/>
    <feature left="14303200" right="23876773"/>
    <feature left="14305500" right="23880874"/>
    <feature left="14307620" right="23889518"/>
    <feature left="14308990" right="23901309"/>
    <feature left="14309500" right="23901147"/>
    <feature left="14316495" right="23894558"/>
    <feature left="14316700" right="23894572"/>
    <feature left="14318000" right="23894004"/>
    <feature left="14320934" right="24526862"/>
    <feature left="14325000" right="23914567"/>
    <feature left="14328000" right="23923664"/>
    <feature left="14353000" right="23930882"/>
    <feature left="14353500" right="23931320"/>
    <feature left="14362250" right="23935979"/>
    <feature left="14400000" right="23949601"/>
    <feature left="402114105350101" right="13584"/>
</code>

@epag
Collaborator Author

epag commented Aug 21, 2024


Original Redmine Comment
Author Name: James (James)
Original Date: 2021-11-30T15:03:05Z


Thanks, that looks ideal. I probably don't have sufficient NWM data locally to test this for a WPOD-style evaluation (I think scenario703 has around a day of data, whereas WPOD is 30 or 60 days). Still, it should provide an indication of the speed-up, percentage-wise. We will need to UAT on the WPOD-style evaluation anyway, also to check that the batching parameters (specifically, the batch size) are reasonable.

@epag
Collaborator Author

epag commented Aug 21, 2024


Original Redmine Comment
Author Name: James (James)
Original Date: 2021-11-30T15:30:06Z


Doing some test runs now: several with no batching and several with batches of 100 features, to begin with. A 100-feature batch may not be conservative enough w/r to RAM usage, TBD (a WPOD evaluation that contains 60 days of data or so would be a better guide).

@epag
Collaborator Author

epag commented Aug 21, 2024


Original Redmine Comment
Author Name: James (James)
Original Date: 2021-11-30T15:57:53Z


The retrieval portion of this test is pretty quick, around a minute or so. To reiterate, it would be good to test with something at the other end of the spectrum too, i.e., 60 days or so. Still, the spread of the results is pretty tight, so it should give an indication, percentage-wise.

@epag
Collaborator Author

epag commented Aug 21, 2024


Original Redmine Comment
Author Name: James (James)
Original Date: 2021-11-30T16:24:48Z


Some results for the evaluation portion of each execution (which includes retrieval).

Three runs, no feature batching:

2021-11-30T15:18:37.176+0000 29664 [main] INFO wres.pipeline.PoolReporter - Statistics were created for 23347 pools, which included 974 features groups and 24 time windows. [ <SNIP> ] The time elapsed between the completion of the first and last pools was: PT1M24.7193911S.
2021-11-30T15:29:35.223+0000 27040 [main] INFO wres.pipeline.PoolReporter - Statistics were created for 23347 pools, which included 974 features groups and 24 time windows. [ <SNIP> ] The time elapsed between the completion of the first and last pools was: PT1M25.1464604S.
2021-11-30T15:37:51.870+0000 22828 [main] INFO wres.pipeline.PoolReporter - Statistics were created for 23347 pools, which included 974 features groups and 24 time windows. [ <SNIP> ] The time elapsed between the completion of the first and last pools was: PT1M29.2171735S.

Three runs, batches of 100 features:

2021-11-30T15:46:12.086+0000 28952 [main] INFO wres.pipeline.PoolReporter - Statistics were created for 23347 pools, which included 974 features groups and 24 time windows. [ <SNIP> ] The time elapsed between the completion of the first and last pools was: PT58.7603311S.
2021-11-30T15:53:47.466+0000 13660 [main] INFO wres.pipeline.PoolReporter - Statistics were created for 23347 pools, which included 974 features groups and 24 time windows. [ <SNIP> ] The time elapsed between the completion of the first and last pools was: PT54.4548247S.
2021-11-30T16:03:18.897+0000 5448 [main] INFO wres.pipeline.PoolReporter - Statistics were created for 23347 pools, which included 974 features groups and 24 time windows. [ <SNIP> ] The time elapsed between the completion of the first and last pools was: PT56.152774S.

Mean without batching: PT86.36100833S
Stdev without batching: PT2.48271155S
Mean with 100-feature batches: PT56.4559766S
Stdev with 100-feature batches: PT2.168708188S

In other words, the elapsed time is around 35% lower with 100-feature batching (a mean of about 56.5 seconds vs. 86.4 seconds) when compared to the unbatched runs, i.e., v5.15 behavior. Not bad. How that scales to heavier retrievals (same number of features) and to other batch sizes is TBD. Likewise, RAM usage is a variable/trade-off.

Given the very small coefficient of variation (around 4% at most), one run is probably good enough to test the various possibilities; no need to average over N.
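
For reference, the summary statistics above can be reproduced from the logged durations with a short script. This is only a sketch in Python, not WRES code: the helper names `to_seconds` and `summarize` are made up for illustration, and the parser handles only the minute/second ISO-8601 forms that appear in the log lines above.

```python
import re
import statistics

# Elapsed times copied from the PoolReporter log lines above (ISO-8601 durations).
UNBATCHED = ["PT1M24.7193911S", "PT1M25.1464604S", "PT1M29.2171735S"]
BATCHED_100 = ["PT58.7603311S", "PT54.4548247S", "PT56.152774S"]

# Only the 'PT[nM]n.nS' forms seen in the logs are supported (an assumption).
_DURATION = re.compile(r"^PT(?:(\d+)M)?(\d+(?:\.\d+)?)S$")


def to_seconds(duration: str) -> float:
    """Convert a 'PT[nM]n.nS' duration string to seconds."""
    match = _DURATION.match(duration)
    if match is None:
        raise ValueError(f"Unsupported duration: {duration}")
    minutes = int(match.group(1) or 0)
    return 60 * minutes + float(match.group(2))


def summarize(durations):
    """Return (mean, sample standard deviation, coefficient of variation)."""
    seconds = [to_seconds(d) for d in durations]
    mean = statistics.mean(seconds)
    stdev = statistics.stdev(seconds)  # sample (n - 1) standard deviation
    return mean, stdev, stdev / mean


mean_u, stdev_u, cv_u = summarize(UNBATCHED)
mean_b, stdev_b, cv_b = summarize(BATCHED_100)

# Fractional reduction in mean elapsed time when batching.
reduction = 1 - mean_b / mean_u

print(f"unbatched: mean={mean_u:.2f}s stdev={stdev_u:.2f}s cv={cv_u:.1%}")
print(f"batched:   mean={mean_b:.2f}s stdev={stdev_b:.2f}s cv={cv_b:.1%}")
print(f"reduction in elapsed time: {reduction:.1%}")
```

The standard deviations quoted above match the sample (n - 1) form, which is what `statistics.stdev` computes.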

@epag
Collaborator Author

epag commented Aug 21, 2024


Original Redmine Comment
Author Name: James (James)
Original Date: 2021-11-30T16:32:36Z


Trying with 50-feature batches.

@epag
Collaborator Author

epag commented Aug 21, 2024


Original Redmine Comment
Author Name: James (James)
Original Date: 2021-11-30T16:49:00Z


Around 52 seconds for 50-feature batches (2 instances), so just as good.

@epag
Collaborator Author

epag commented Aug 21, 2024


Original Redmine Comment
Author Name: James (James)
Original Date: 2021-11-30T17:27:00Z


Looking at the logging of the retrievals, it makes sense: I see the retrievals happening in clumps that correspond to the expected feature batches.

One thing to bear in mind is that, just because pool tasks are submitted to an executor service in a particular order, that doesn't mean the underlying pull on the data will happen in exactly the same order. It is therefore possible that pools could hang around in memory longer than would be strictly necessary (were they completed in exactly the order they were submitted). A pool will only become eligible for GC when there are no more references to it. This could probably be tightened up, but most likely at the expense of throughput. It is also a reason to be conservative with the batch sizes, because there will probably be some outlier scenario that uses more memory than expected.
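
The ordering point can be illustrated with a toy example. This is a sketch in Python using `concurrent.futures`, purely as an analogy for the Java executor service: `pool_task` and `completion_order` are hypothetical stand-ins, and the sleep is an artificial substitute for the data pull. It says nothing about the actual WRES pool implementation or about GC behavior.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def pool_task(pool_id: int, delay: float) -> int:
    """Hypothetical stand-in for a pool task whose data pull takes `delay` seconds."""
    time.sleep(delay)
    return pool_id


def completion_order(delays, workers=4):
    """Submit one task per delay, in order, and return the ids in completion order."""
    with ThreadPoolExecutor(max_workers=workers) as executor:
        futures = [executor.submit(pool_task, i, d) for i, d in enumerate(delays)]
        # as_completed yields futures as they finish, not as they were submitted.
        return [future.result() for future in as_completed(futures)]


# Earlier submissions get longer delays, so later submissions tend to finish first.
order = completion_order([0.2, 0.15, 0.1, 0.05])
print("submitted:", [0, 1, 2, 3])
print("completed:", order)
```

The completion order depends on thread scheduling, so it is not guaranteed from run to run. Any result that must be held until an earlier-submitted task finishes stays referenced in the meantime.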

@epag
Collaborator Author

epag commented Aug 21, 2024


Original Redmine Comment
Author Name: James (James)
Original Date: 2021-11-30T18:09:20Z


Can't detect any difference in memory profile for no batching vs. a 500-feature-batched instance of the same evaluation, but then these are very tiny retrievals, so that is not too surprising. Again, instrumentation of a 60-day evaluation will reveal more. The spaghetti after the quiescent period is the evaluation/retrieval portion.

Unbatched:

!no_batching.png!

Batched (500 features per batch):

!500_batching.png!

@epag
Copy link
Collaborator Author

epag commented Aug 21, 2024


Original Redmine Comment
Author Name: James (James)
Original Date: 2021-12-01T16:18:50Z


Adding the two parameters to the system settings.

But, in looking at the system settings, I see that many of the supplied settings are only partially validated. For example, it looks like negative thread counts would be accepted. Separate ticket.
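A rough sketch of the kind of validation that appears to be missing, assuming three integer settings. The class name and the parameter name @featureBatchThreshold@ are hypothetical and are not taken from the WRES system settings; only @featureBatchSize@ is named elsewhere in this ticket.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of range checks on a few integer settings.
final class SettingsValidationSketch
{
    /** Returns one message per invalid setting; an empty list means valid. */
    static List<String> validate( int threadCount,
                                  int featureBatchThreshold,
                                  int featureBatchSize )
    {
        List<String> errors = new ArrayList<>();

        if ( threadCount < 1 )
        {
            errors.add( "The thread count must be at least 1, but was " + threadCount );
        }

        if ( featureBatchThreshold < 0 )
        {
            errors.add( "The feature batch threshold cannot be negative, but was "
                        + featureBatchThreshold );
        }

        if ( featureBatchSize < 1 )
        {
            errors.add( "The feature batch size must be at least 1, but was "
                        + featureBatchSize );
        }

        return List.copyOf( errors );
    }

    public static void main( String[] args )
    {
        System.out.println( validate( -1, 10, 50 ) );
    }
}
```

Collecting all of the messages, rather than failing on the first one, would let a user fix every bad setting in one pass.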

@epag
Copy link
Collaborator Author

epag commented Aug 21, 2024


Original Redmine Comment
Author Name: James (James)
Original Date: 2021-12-01T16:50:09Z


System settings and overrides work fine.

@epag
Copy link
Collaborator Author

epag commented Aug 21, 2024


Original Redmine Comment
Author Name: James (James)
Original Date: 2021-12-01T17:17:35Z


Initially setting the default to 10 or more features (an evaluation with more than this number of singletons will be feature-batched) and a feature batch size of 50.

Trying the system tests.
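For readers who want to see how the two defaults interact, here is a minimal sketch, not the committed implementation. It assumes that batching applies only when the number of singleton features is strictly greater than the threshold, and that unbatched features are treated as batches of one; all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: split features into batches when there are enough of them.
final class FeatureBatchingSketch
{
    /**
     * Splits the features into batches of up to batchSize when there are more
     * than threshold features (an assumption), otherwise into singletons.
     */
    static <T> List<List<T>> batch( List<T> features, int threshold, int batchSize )
    {
        if ( batchSize < 1 )
        {
            throw new IllegalArgumentException( "The batch size must be at least 1." );
        }

        int size = features.size() > threshold ? batchSize : 1;
        List<List<T>> batches = new ArrayList<>();

        for ( int i = 0; i < features.size(); i += size )
        {
            int end = Math.min( i + size, features.size() );
            batches.add( List.copyOf( features.subList( i, end ) ) );
        }

        return batches;
    }

    public static void main( String[] args )
    {
        List<Integer> features = new ArrayList<>();
        for ( int i = 0; i < 120; i++ )
        {
            features.add( i );
        }

        // 120 features with a threshold of 10 and a batch size of 50
        System.out.println( batch( features, 10, 50 ).size() ); // 3
    }
}
```

With 120 features, the sketch produces batches of 50, 50 and 20; with 10 or fewer features, it produces one batch per feature.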

@epag
Copy link
Collaborator Author

epag commented Aug 21, 2024


Original Redmine Comment
Author Name: James (James)
Original Date: 2021-12-01T17:35:53Z


Tests pass.

@epag
Copy link
Collaborator Author

epag commented Aug 21, 2024


Original Redmine Comment
Author Name: James (James)
Original Date: 2021-12-01T17:43:05Z


Pushed in commit:wres|ef9f7d7f1c3b9894855e1e86f355c651bb50cca4.

I doubt this will make it into v5.16. Regardless, it will need UAT. The main thing is to experiment with a WPOD-style evaluation, both to see how much faster the retrieval portion of the evaluation might be and to tweak the batch size, if necessary. The experiments above suggested a performance gain of about 35% w/r to retrieval time for an evaluation that contained ~900 features, but only 24 valid times per pool; this could vary with shape of evaluation.

On hold, pending UAT.

@epag
Copy link
Collaborator Author

epag commented Aug 21, 2024


Original Redmine Comment
Author Name: James (James)
Original Date: 2021-12-03T17:48:19Z


Probably 5.17, unclear.

@epag
Copy link
Collaborator Author

epag commented Aug 21, 2024


Original Redmine Comment
Author Name: Hank (Hank)
Original Date: 2021-12-14T13:38:24Z


See #99040-223 and #99040-224.

James,

You asked for this in #99040-225:

Do you have a memory profile too?

Check_MK isn't giving me access to -ti03, so I am limited to whatever I can get from the command line. I already provided a @top@ result:

top - 12:53:16 up 4 days, 16:32,  1 user,  load average: 1.11, 1.17, 1.21
Tasks: 296 total,   1 running, 295 sleeping,   0 stopped,   0 zombie
%Cpu(s): 13.7 us,  0.2 sy,  0.0 ni, 86.1 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 32761624 total, 17973416 free,  7555440 used,  7232768 buff/cache
KiB Swap:  4194300 total,  4182160 free,    12140 used. 24782664 avail Mem 

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 30790 498       20   0 8557300   3.0g  24716 S 109.6  9.5   1893:58 java
  1336 root      20   0  285748   5824   4424 S   0.7  0.0   4:21.65 vmtoolsd
 10160 502       20   0 5602712 408820  25624 S   0.3  1.2   6:38.06 java
 10273 501       20   0 4714196 874244  28708 S   0.3  2.7  10:34.20 java
 10286 501       20   0 4714196 864536  28660 S   0.3  2.6  10:34.09 java
 10300 501       20   0 4714196 856920  28676 S   0.3  2.6   8:31.07 java
 10441 498       20   0 4874652 135132  22412 S   0.3  0.4   1:08.90 java

You can see it's using 9.5% of 32 GB, or about 3 GB. It's allocated "Max Memory: 2493MiB", per the logging I shared in #99040-223. I'm probably just doing the math wrong, but regardless it appears to be at its upper limit. Let me search for better tools to do a memory profile,

Hank

@epag
Copy link
Collaborator Author

epag commented Aug 21, 2024


Original Redmine Comment
Author Name: Hank (Hank)
Original Date: 2021-12-14T14:15:34Z


We don't have @jstat@ available on -ti03, and @pmap@ provides no information. I know we've been using some other Java tools on these processes, so I'll search the tickets or wiki for the proper command. I don't run them often enough to remember, apparently.

Given how slow it's progressing, after getting a memory profile, I think I should stop the evaluation, set the batch size to 1, and start it again to get a baseline for performance.

If that runs as before, and it should, then we can discuss deploying the batch size of 1 to production for 5.16 to get it out the door, and then do a proper study of the optimal batch size for 5.17.

Thoughts?

Hank

@epag
Copy link
Collaborator Author

epag commented Aug 21, 2024


Original Redmine Comment
Author Name: James (James)
Original Date: 2021-12-14T14:26:45Z


If you look at the jfrs under: @/mnt/wres_share/heap_dumps/wres/@, you may find a temporary directory that contains chunked files that can be loaded into jmc (edit: that cover the evaluation in progress). For a longer series, you could copy a bunch of them and merge/finalize them. At the end of a run, these files are merged into a final jfr.

Whether the slow performance is due to feature batching or something else is really tbd. A memory profile will help. The loading of the db machine looks pretty low. High memory usage and high gc on the app machine is a possibility. Tweaking the batch-size for the wpod evaluations was a recognized thing to do as part of the uat (#95867-30), so I don't think we should defer that. After getting a baseline with a batch size of 1, we can increase to 10.

How much data are we talking about here? 60 days, hourly valid times (aggregated to three-hourly), forecasts issued 4x per day, 7 members, edit: and a forecast horizon of 8.5 days (but these are pooled into 3 hours per pool, I think)? ( Then multiplied by 50 locations per retrieval, 6 pooling threads at once. )

@epag
Copy link
Collaborator Author

epag commented Aug 21, 2024


Original Redmine Comment
Author Name: Hank (Hank)
Original Date: 2021-12-14T14:34:09Z


Your numbers look right. Add to it 204 hours of lead times for each forecast.

There is no jmc on the path in -ti03. Ugh. Do we have a wiki somewhere explaining how to make use of those .jfr files? Looking,

Hank

@epag
Copy link
Collaborator Author

epag commented Aug 21, 2024


Original Redmine Comment
Author Name: James (James)
Original Date: 2021-12-14T14:35:58Z


( Right, edited to include the horizon after you read it. )

Can you do an ls in that jfr dir and report back? Do you see some subdirs that contain chunked jfrs?

@epag
Copy link
Collaborator Author

epag commented Aug 21, 2024


Original Redmine Comment
Author Name: James (James)
Original Date: 2021-12-14T14:43:56Z


Back of the envelope. 3 (valid times per lead duration pool) * 7 members * 4 (times per day) * 60 (days) * 50 (locations) * 6 (threads) = (up to) 1,512,000 forecast values in memory. 8 bytes per forecast value. All the observations on top of that as wrapped doubles, 24 bytes each. Plus some other overhead, of course (array definitions, instants etc.). Regardless, that's tiny. What factor(s) am I missing?
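The arithmetic above, written out as a small sketch so that the factors can be varied. It counts only the raw forecast values at 8 bytes each and ignores observations and object overhead; the class and method names are hypothetical.

```java
// Hypothetical sketch of the back-of-the-envelope estimate for forecast values.
final class ForecastMemorySketch
{
    /** Multiplies the factors to give an upper bound on forecast values in memory. */
    static long forecastValues( int validTimesPerPool,
                                int members,
                                int issuesPerDay,
                                int days,
                                int locationsPerBatch,
                                int poolingThreads )
    {
        return (long) validTimesPerPool * members * issuesPerDay * days
               * locationsPerBatch * poolingThreads;
    }

    public static void main( String[] args )
    {
        long values = forecastValues( 3, 7, 4, 60, 50, 6 );
        long bytes = values * Double.BYTES; // 8 bytes per primitive double

        System.out.println( values ); // 1512000
        System.out.println( bytes );  // 12096000
    }
}
```

That is roughly 12 MB of primitive forecast values, which is consistent with the conclusion that this part of the footprint is small.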

@epag
Copy link
Collaborator Author

epag commented Aug 21, 2024


Original Redmine Comment
Author Name: Hank (Hank)
Original Date: 2022-10-11T19:37:32Z


Forgot to share the declaration. With domains omitted:

<?xml version='1.0' encoding='UTF-8'?>
<project name="CNRFC - NWM vs RSA Streamflow Groups">
  <inputs>
    <left label="USGS Streamflow Observations">
      <type>observations</type>
      <source interface="usgs_nwis">https://nwis.waterservices.usgs.gov/nwis/iv</source>
      <variable>00060</variable>
    </left>
    <right label="NWM Streamflow Forecast">
      <type>single valued forecasts</type>
      <source interface="nwm_medium_range_deterministic_channel_rt_conus_hourly">https://nwcal-dstore.[domain]/nwm/2.0/</source>
      <source interface="nwm_medium_range_deterministic_channel_rt_conus_hourly">https://nwcal-dstore.[domain]/nwm/2.1/</source>
      <variable>streamflow</variable>
    </right>
    <baseline featureDimension="nws_lid" separateMetrics="true" label="AHPS Streamflow Forecast">
      <type>single valued forecasts</type>
      <source interface="wrds_ahps">https://nwcal-wrds.[domain]/api/rfc_forecast/v2.0/forecast/streamflow</source>
      <variable>QR</variable>
    </baseline>
  </inputs>
  <pair label="AHPS Leadtime Pools">
    <unit>ft3/s</unit>
    <featureService>
      <baseUrl>https://nwcal-wrds.[domain]/api/location/v3.0/metadata</baseUrl>
      <group pool="false">
        <type>rfc</type>
        <value>CNRFC</value>
      </group>
    </featureService>
    <leadHours minimum="0" maximum="120"/>
    <dates earliest="2020-10-01T00:00:00Z" latest="2021-10-01T00:00:00Z"/>
    <issuedDates earliest="2020-10-01T00:00:00Z" latest="2021-10-01T00:00:00Z"/>
    <leadTimesPoolingWindow>
      <period>6</period>
      <frequency>6</frequency>
      <unit>hours</unit>
    </leadTimesPoolingWindow>
  </pair>
  <metrics>
    <metric>
      <name>mean square error skill score</name>
    </metric>
  </metrics>
  <outputs durationFormat="hours">
    <destination type="netcdf2">
      <outputType>default</outputType>
    </destination>
  </outputs>
</project>

@epag
Copy link
Collaborator Author

epag commented Aug 21, 2024


Original Redmine Comment
Author Name: Hank (Hank)
Original Date: 2022-10-11T19:57:04Z


First, it appears that the evaluation I'm using may be a good one to use to test out batch size once we have this issue worked out. The production evaluation Anna started spent only about 5 minutes in retrieval, while the one I just performed using staging took about 18 minutes, as reported above. This is not apples to apples, but it does point to a need to do at least one run each of batch size 50 and batch size 1 for the Anna evaluation once we've figured out what's going on with the difference in outputs. Speaking of which...

This appears to be a real problem. If I take the declaration I just shared and shorten it to a day,

    <dates earliest="2020-10-01T00:00:00Z" latest="2020-10-02T00:00:00Z"/>
    <issuedDates earliest="2020-10-01T00:00:00Z" latest="2020-10-02T00:00:00Z"/>

I see this using production:

 MEAN_SQUARE_ERROR_SKILL_SCORE_THRESHOLD_1 = _, _, _, _, -1038.56172850073, 
    -10.4904416345601, _, 0.776044281122368, -5.79934962040058, 
    0.462886869319492, -12.1464713286588, 0.684894581564533, _, 
    -18.4093758338863, 0.781547184054521, _, _, -157143.859994764, 
    -1.97612811918845, -1.47504299559932, 0.873365516795392, _, 
    -1.94720012844488, _, -44.6194171311328, -2.13587316327724, 
    -1326.11550969104, _, -80.5715518233632, _, -3521.08346360351, 
    0.820702985818376, 0.424528704821101, _, _, -0.139592111763382, 
    -5.00649608232809, -4.09699997312339, -44.0293292791656, 
    -7.7181262455033, -2051543.39653202, -590.699867303647, 
    0.983297775849558, -11.7806062792636, -43.4479568643926, 
    0.882295397544749, _, -89.1801891266431, -1.54556261435071, 
    0.938513023092405, 0.777061277395567, -2.13436068662993, 
    -3951.78170792899, -0.496186651854263, -18.9953870191915, _, 
    -0.494843273420175, 0.234686147118674, -1.88131328836535, 
    -12.0696253643433, _, _, _, _, 0.753986901388963, 0.233041773884967, 
    0.981389097066239, -24.0641227123435, 0.856562471869518, 
    -2.19390514285039, -0.17875645158472, 0.0586663054545127, 
    0.283311229473245, 0.317191806612317, 0.789909123441471, 
    0.654790260460054, -997.7411028357, -6.34073001799714, 0.48552828151384, 
    -0.247352562664535, 0.504635576828222, -0.429510019567213 ;

 MEAN_SQUARE_ERROR_SKILL_SCORE_THRESHOLD_1_BASELINE = _, _, _, _, _, 
    -196.220930232558, _, -470.450000000002, -94.4148760330573, 
    -81.054976783453, -27.8479999999999, _, _, -2.02824096036517e+29, 
    -809.661157024794, _, _, _, -2.29394380853278, -41.0204714640199, 
    -10.9051865795905, _, -3.35128518971849, _, -5.19641034235826, 
    -101.56176319836, -1.45501730103809, _, -17.0164458656921, _, 
    -0.418685121107264, _, -1464.0243902439, _, _, -5.81818181818182, 
    -0.134453781512605, -25.972972972973, -1.96, -27, -0.125, -49, 
    -369.799999999998, -15.3489268024216, -8.80898876404494, -1860.5, _, 
    -10082.0000000012, -163.84, -10242.2413793103, -876.6, -125, 
    -0.0295857988165664, -188.438016528925, -21.7510204081633, _, 
    -604.547169811321, -301.370247933883, _, _, _, _, _, _, -8, _, 
    -42.4801512287335, -70.0448979591844, -2601, -721.999999999999, -8.192, 
    -13.0881557598702, -396.050000000001, -115.617769376182, 
    -94.5312499999998, -57.0672023374726, -4, _, -3.28571428571429, 
    -16.6208791208791, -56.18, -205.674567000912 ;
}


and this using the latest revision:

 MEAN_SQUARE_ERROR_SKILL_SCORE_THRESHOLD_1 = _, _, _, _, _, 
    -196.220930232558, _, -470.450000000002, -94.4148760330573, 
    -81.054976783453, -27.8479999999999, _, _, -2.02824096036517e+29, 
    -809.661157024794, _, _, _, -2.29394380853278, -41.0204714640199, 
    -10.9051865795905, _, -3.35128518971849, _, -5.19641034235826, 
    -101.56176319836, -1.45501730103809, _, -17.0164458656921, _, 
    -0.418685121107264, _, -1464.0243902439, _, _, -5.81818181818182, 
    -0.134453781512605, -25.972972972973, -1.96, -27, -0.125, -49, 
    -369.799999999998, -15.3489268024216, -8.80898876404494, -1860.5, _, 
    -10082.0000000012, -163.84, -10242.2413793103, -876.6, -125, 
    -0.0295857988165664, -188.438016528925, -21.7510204081633, _, 
    -604.547169811321, -301.370247933883, _, _, _, _, _, _, -8, _, 
    -42.4801512287335, -70.0448979591844, -2601, -721.999999999999, -8.192, 
    -13.0881557598702, -396.050000000001, -115.617769376182, 
    -94.5312499999998, -57.0672023374726, -4, _, -3.28571428571429, 
    -16.6208791208791, -56.18, -205.674567000912 ;

 MEAN_SQUARE_ERROR_SKILL_SCORE_THRESHOLD_1_BASELINE = _, _, _, _, _, _, _, _, 
    _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, 
    _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, 
    _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, 
    _, _ ;

Again, it appears as though the baseline results are overwriting the right-side results.

I'm going to report this in the 6.8 deployment ticket, since it will hold up deployment. I'll also ask which ticket this likely relates to.

Hank

@epag
Copy link
Collaborator Author

epag commented Aug 21, 2024


Original Redmine Comment
Author Name: James (James)
Original Date: 2022-10-21T15:45:20Z


To recap, the remaining work is to do some extra testing in -ti with a feature batch size of 50 versus no feature batching.

@epag
Copy link
Collaborator Author

epag commented Aug 21, 2024


Original Redmine Comment
Author Name: Hank (Hank)
Original Date: 2022-11-02T00:52:06Z


Dear Hank,

Don't forget about this!

Yours truly,

Hank

@epag
Copy link
Collaborator Author

epag commented Aug 21, 2024


Original Redmine Comment
Author Name: Hank (Hank)
Original Date: 2022-11-04T14:29:53Z


I've deployed @featureBatchSize=50@ to staging, confirmed it was 50 in a smoke test, and have started the evaluation that reproduces #108993.

Then I realized it was the retro sim evaluation that I needed to run. D'oh!

Hank

@epag
Copy link
Collaborator Author

epag commented Aug 21, 2024


Original Redmine Comment
Author Name: Hank (Hank)
Original Date: 2022-11-04T14:37:14Z


I've stopped the #108993 job.

For the #108361 job, here was the official reported retrieval/computation time:

The time elapsed between the completion of the first and last pools was: PT3H51M42.749584S.

The staging job testing with @featureBatchSize=50@ is 4300312082155091194. The earlier job was posted to production, so I may need to run this in staging with @featureBatchSize=1@, later, for a proper comparison.

Hank

@epag
Copy link
Collaborator Author

epag commented Aug 21, 2024


Original Redmine Comment
Author Name: Hank (Hank)
Original Date: 2022-11-04T15:02:02Z


I managed to post the job before my VPN connection went down. However, I won't be able to check on progress until it comes back.

Hank

@epag
Copy link
Collaborator Author

epag commented Aug 21, 2024


Original Redmine Comment
Author Name: Hank (Hank)
Original Date: 2022-11-04T19:16:02Z


The job I posted (the retro sim evaluation from #108361) failed due to OOME:

Caused by: java.lang.OutOfMemoryError: Java heap space
	at java.base/com.sun.crypto.provider.CipherCore.doFinal(CipherCore.java:937)
	at java.base/com.sun.crypto.provider.AESCipher.engineDoFinal(AESCipher.java:491)
	at java.base/javax.crypto.CipherSpi.bufferCrypt(CipherSpi.java:779)
	at java.base/javax.crypto.CipherSpi.engineDoFinal(CipherSpi.java:730)
	at java.base/javax.crypto.Cipher.doFinal(Cipher.java:2497)
	at java.base/sun.security.ssl.SSLCipher$T12GcmReadCipherGenerator$GcmReadCipher.decrypt(SSLCipher.java:1655)
	at java.base/sun.security.ssl.SSLSocketInputRecord.decodeInputRecord(SSLSocketInputRecord.java:260)
	at java.base/sun.security.ssl.SSLSocketInputRecord.decode(SSLSocketInputRecord.java:181)
	at java.base/sun.security.ssl.SSLTransport.decode(SSLTransport.java:111)
	at java.base/sun.security.ssl.SSLSocketImpl.decode(SSLSocketImpl.java:1509)
	at java.base/sun.security.ssl.SSLSocketImpl.readApplicationRecord(SSLSocketImpl.java:1476)
	at java.base/sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:1065)
	at org.postgresql.core.VisibleBufferedInputStream.readMore(VisibleBufferedInputStream.java:161)
	at org.postgresql.core.VisibleBufferedInputStream.ensureBytes(VisibleBufferedInputStream.java:128)
	at org.postgresql.core.VisibleBufferedInputStream.ensureBytes(VisibleBufferedInputStream.java:113)
	at org.postgresql.core.VisibleBufferedInputStream.read(VisibleBufferedInputStream.java:73)
	at org.postgresql.core.PGStream.receiveChar(PGStream.java:453)
	at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2120)
	at org.postgresql.core.v3.QueryExecutorImpl.fetch(QueryExecutorImpl.java:2562)
	at org.postgresql.jdbc.PgResultSet.next(PgResultSet.java:2145)
	at com.zaxxer.hikari.pool.HikariProxyResultSet.next(HikariProxyResultSet.java)
	at com.github.marschall.jfr.jdbc.JfrResultSet.next(JfrResultSet.java:82)
	at wres.io.utilities.SQLDataProvider.next(SQLDataProvider.java:138)
	at wres.io.retrieval.database.TimeSeriesRetriever.lambda$getTimeSeriesSupplier$1(TimeSeriesRetriever.java:792)
	at wres.io.retrieval.database.TimeSeriesRetriever$$Lambda$593/0x000000084007b840.get(Unknown Source)
	at java.base/java.util.stream.StreamSpliterators$InfiniteSupplyingSpliterator$OfRef.tryAdvance(StreamSpliterators.java:1360)
	at java.base/java.util.stream.ReferencePipeline.forEachWithCancel(ReferencePipeline.java:127)
	at java.base/java.util.stream.AbstractPipeline.copyIntoWithCancel(AbstractPipeline.java:502)
	at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:488)
	at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
	at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913)
	at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)

I can increase the -Xmx as an experiment on Monday, but if there is anything you want me to look at before then, let me know. I'll check it out Monday morning or so.

Have a great weekend!

Hank

@epag
Copy link
Collaborator Author

epag commented Aug 21, 2024


Original Redmine Comment
Author Name: James (James)
Original Date: 2022-11-04T19:32:54Z


No, this is just the balance to strike, batch size vs. memory. This evaluation is on the larger side per feature/pool. Although they aren't forecasts, let alone ensemble forecasts, there is also no pooling by lead duration to make the pools smaller.

@epag
Copy link
Collaborator Author

epag commented Aug 21, 2024


Original Redmine Comment
Author Name: Hank (Hank)
Original Date: 2022-11-07T13:53:11Z


Currently, the @mem_limit@ for the workers is 3584 MB. There appears to be no @-Xmx@ option specified in the Java options for the worker, but I see this in a smoke test:

2022-11-07T12:41:44.903+0000 INFO Main Processors: 8; Max Memory: 2493MiB; Free Memory: 2365MiB; Total Memory: 2493MiB;

I'm not clear on where those computations come from. Some sort of default for Java? Anyway, it seems significantly lower than the 3584 MB @mem_limit@ for the container, though some of that has to be reserved for the worker-shim, of course.
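Figures like these are typically read from @java.lang.Runtime@. The sketch below produces a similarly shaped line; whether the WRES logging is assembled in exactly this way is an assumption, and the class name is hypothetical.

```java
// Hypothetical sketch: report processors and heap figures from the running JVM.
final class MemoryReportSketch
{
    private static final long MIB = 1024L * 1024L;

    /** Builds a one-line report from the values the JVM exposes at runtime. */
    static String report()
    {
        Runtime runtime = Runtime.getRuntime();

        return "Processors: " + runtime.availableProcessors()
               + "; Max Memory: " + ( runtime.maxMemory() / MIB ) + "MiB"
               + "; Free Memory: " + ( runtime.freeMemory() / MIB ) + "MiB"
               + "; Total Memory: " + ( runtime.totalMemory() / MIB ) + "MiB;";
    }

    public static void main( String[] args )
    {
        System.out.println( report() );
    }
}
```

@Runtime.maxMemory()@ reports the maximum heap the JVM will attempt to use, so the figure reflects whatever heap limit is in effect for the process rather than the container @mem_limit@.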

The total RAM on the COWRES machines is about 32 GB. The services are assigned @mem_limit@ values as follows:

Entry Machine:

|container|mem_limit each (MB)|total (MB)|
|Persister|3072|3072|
|Tasker|1390|1390|
|Broker|720|720|
|Events Broker|2560|2560|
|Graphics (2)|1024|2048|
|TOTAL||9790|

Workers-only Machine:

|container|mem_limit each (MB)|total (MB)|
|Events Broker|2560|2560|
|Graphics (3)|1024|3072|
|TOTAL||5632|

I'm thinking about setting the @-Xmx@ to "4096m" for the workers and upping the worker container @mem_limit@ to 5120 MB, allowing for 1024 MB of overhead for the worker-shim. That will still fall well under the 32 GB RAM on the machines.

Thoughts?

Hank

@epag
Copy link
Collaborator Author

epag commented Aug 21, 2024


Original Redmine Comment
Author Name: James (James)
Original Date: 2022-11-07T14:27:27Z


Hank,

Yes, we have default jvm args set for the wres app in the @build.gradle@:

applicationDefaultJvmArgs = ['-Xms2560m', '-Xmx2560m',
                             '-XX:MaxDirectMemorySize=512m',
                             '-XX:+HeapDumpOnOutOfMemoryError',

Yes, there is an overhead for the worker shim container, including the os.

As to whether we set these higher, we can consider that, certainly, but I guess that is a separate ticket.

Of course, if you want to experiment with a larger setting for the worker in the context of this ticket in order to retain the feature batch size of 50, you could refresh the app default args in the docker options (@INNER_JAVA_OPTS@) temporarily, but we would not want to do it that way, eventually (not good to override the options in multiple contexts and rely on the last set being the set used, even though it typically is).

Cheers,

James

@epag
Copy link
Collaborator Author

epag commented Aug 21, 2024


Original Redmine Comment
Author Name: Hank (Hank)
Original Date: 2022-11-07T14:51:09Z


Just as an experiment, I wanted to override the settings and see what happens. Looks like I might have found an unexpected NPE:

    @Override
    public void run()
    {
        // To watch files, we have to jump through this hoop: get the filesystem
        FileSystem theFileSystem = FileSystems.getDefault();

        Path innerOutputDirectory = this.getOutputDirectory();

        try ( WatchService outputDirectoryWatchService = theFileSystem.newWatchService() )
        {
            // We assume that the
            this.getOutputDirectory()
                .register( outputDirectoryWatchService,
                           ENTRY_CREATE );

            while ( !this.foundInnerOutputDirectory )
            {
                // Look for the actual inner output directory to be created
                // by the WRES process (inside the one we're watching).
                WatchKey somethingFound =
                        outputDirectoryWatchService.poll( 1, TimeUnit.SECONDS );

                // Then look for files in that inner output directory
                LOGGER.debug( "Found something related to output? {}",
                              somethingFound );
...

The NPE happens when @somethingFound@ is set equal to @outputDirectoryWatchService.poll( 1, TimeUnit.SECONDS );@. I've never seen this before. I'll take a quick look to see if the problem is obvious, but otherwise, this might need to be a new ticket. Ugh.

Hank

@epag
Copy link
Collaborator Author

epag commented Aug 21, 2024


Original Redmine Comment
Author Name: Hank (Hank)
Original Date: 2022-11-07T15:02:11Z


A new ticket needs to be created in general. There needs to be a null check on @outputDirectoryWatchService@.

Hank

@epag
Copy link
Collaborator Author

epag commented Aug 21, 2024


Original Redmine Comment
Author Name: Hank (Hank)
Original Date: 2022-11-07T15:08:32Z


I'm backing out my change and trying to bring up the staging COWRES, again. I'm not sure how my change could cause the NPE, however.

Hank

@epag
Copy link
Collaborator Author

epag commented Aug 21, 2024


Original Redmine Comment
Author Name: Hank (Hank)
Original Date: 2022-11-07T15:20:12Z


Appears to be the case. I had simply added, "-Xms=4096m -Xmx=4096m" at the beginning of @INNER_JAVA_OPTS@ for the worker. I also upped the @mem_limit@ to 5120m. Why would that cause the NPE?

Hank

@epag
Copy link
Collaborator Author

epag commented Aug 21, 2024


Original Redmine Comment
Author Name: Hank (Hank)
Original Date: 2022-11-07T15:44:57Z


I'm abandoning my experiment to increase the memory in staging, because my first attempt leads to errors in the worker, and I have other work I need to focus on.

Would it be worthwhile for me to reduce the @featureBatchSize@ to, say, 20, and try it again?

Anyway, turning my attention elsewhere for the next couple of hours at least,

Hank

@epag
Copy link
Collaborator Author

epag commented Aug 21, 2024


Original Redmine Comment
Author Name: James (James)
Original Date: 2022-11-07T15:50:13Z


I don't think that exception in the worker shim is anything to worry about. If you look through the worker shim logs, you will see that the polling of the output dir is routinely interrupted. I see a ton of those interruptions when I deploy locally. It is documented in the @JobOutputMessenger@ as "routine". It may be ugly, but it doesn't mean the service is broken.
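For reference, @WatchService.poll( long, TimeUnit )@ has two outcomes besides returning a key: it returns null when nothing arrives before the timeout, and it throws @InterruptedException@ when the waiting thread is interrupted. A minimal sketch of handling both follows; it is not the @JobOutputMessenger@ code, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.Optional;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: treat a timeout and an interruption as "nothing found".
final class WatchPollSketch
{
    /** Polls once; empty if the poll timed out or the thread was interrupted. */
    static Optional<WatchKey> pollOnce( WatchService service, long timeoutMillis )
    {
        try
        {
            // Returns null if no key is queued before the timeout elapses
            return Optional.ofNullable( service.poll( timeoutMillis, TimeUnit.MILLISECONDS ) );
        }
        catch ( InterruptedException e )
        {
            // Restore the interrupt flag so that callers can see it
            Thread.currentThread().interrupt();
            return Optional.empty();
        }
    }

    /** Watches an empty temporary directory, in which no key should arrive. */
    static boolean emptyDirectoryYieldsNoKey()
    {
        try
        {
            Path directory = Files.createTempDirectory( "watch-sketch" );

            try ( WatchService service = FileSystems.getDefault().newWatchService() )
            {
                directory.register( service, StandardWatchEventKinds.ENTRY_CREATE );
                return !pollOnce( service, 200 ).isPresent();
            }
            finally
            {
                Files.deleteIfExists( directory );
            }
        }
        catch ( IOException e )
        {
            throw new UncheckedIOException( e );
        }
    }

    public static void main( String[] args )
    {
        System.out.println( emptyDirectoryYieldsNoKey() );
    }
}
```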

@epag
Copy link
Collaborator Author

epag commented Aug 21, 2024


Original Redmine Comment
Author Name: Hank (Hank)
Original Date: 2022-11-07T16:00:20Z


It was a syntax error in the @-Xmx@ and @-Xms@ settings. It doesn't use an '=' sign.
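A small check like the following could catch this class of mistake before a deployment. It is a deliberately simplified sketch: the pattern covers only a number with an optional k, m or g suffix, and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: a simplified sanity check for -Xms/-Xmx options.
final class HeapOptionSketch
{
    // The size follows the flag directly, with no '=' in between
    private static final Pattern HEAP_OPTION = Pattern.compile( "-Xm[sx]\\d+[kKmMgG]?" );

    /** Returns true if the option looks like -Xms<size> or -Xmx<size>. */
    static boolean isWellFormed( String option )
    {
        return HEAP_OPTION.matcher( option ).matches();
    }

    public static void main( String[] args )
    {
        System.out.println( isWellFormed( "-Xmx4096m" ) );  // true
        System.out.println( isWellFormed( "-Xmx=4096m" ) ); // false
    }
}
```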

To be clear, my mistake caused the WRES engine to fail to start, which likely led to the issue attempting to "observe" the output folder for files. The output messaging from the worker-shim basically told me nothing useful:

2022-11-07T15:30:22.586+0000 [main] INFO wres.worker.Worker - Waiting for work...
2022-11-07T15:30:22.587+0000 [pool-3-thread-1] WARN wres.worker.JobOutputMessenger - Interrupted while looking for innermost output directory.
java.lang.InterruptedException: null
        at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2056)
        at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2133)
        at java.base/java.util.concurrent.LinkedBlockingDeque.pollFirst(LinkedBlockingDeque.java:513)
        at java.base/java.util.concurrent.LinkedBlockingDeque.poll(LinkedBlockingDeque.java:675)
        at java.base/sun.nio.fs.AbstractWatchService.poll(AbstractWatchService.java:108)
        at wres.worker.JobOutputMessenger.run(JobOutputMessenger.java:119)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)
2022-11-07T15:30:22.593+0000 [pool-3-thread-1] INFO wres.worker.JobOutputMessenger - Finished sending output messages for job 9037171929809242336

After fixing my syntax, the smoke test passes. I'll start the test run,

Hank

@epag
Copy link
Collaborator Author

epag commented Aug 21, 2024


Original Redmine Comment
Author Name: James (James)
Original Date: 2022-11-07T16:06:04Z


OK, that makes more sense.

@epag
Copy link
Collaborator Author

epag commented Aug 21, 2024


Original Redmine Comment
Author Name: Hank (Hank)
Original Date: 2022-11-07T16:09:04Z


Yeah, I've made that mistake multiple times over the years, too. The '=' sign just seems more intuitive to me. Anyway...

Next experiment is job 224927122560749953 posted with additional memory:

2022-11-07T16:07:26.166+0000 INFO Main WRES version 20221102-54cd7db
2022-11-07T16:07:26.184+0000 INFO Main Processors: 8; Max Memory: 4029MiB; Free Memory: 3933MiB; Total Memory: 4029MiB; WRES System Settings: ...SNIP...

Hank

@epag
Copy link
Collaborator Author

epag commented Aug 21, 2024


Original Redmine Comment
Author Name: Hank (Hank)
Original Date: 2022-11-07T17:44:49Z


I accidentally interrupted the job 224927122560749953 by bringing down its worker. The job was immediately picked up by another worker and has started from scratch. That means I may not have anything to say about this job until tomorrow morning given how long it usually takes.

Hank

@epag
Copy link
Collaborator Author

epag commented Aug 21, 2024


Original Redmine Comment
Author Name: Hank (Hank)
Original Date: 2022-11-07T21:15:25Z


The job OOMEd again:

Caused by: java.lang.OutOfMemoryError: Java heap space
	at java.base/java.util.Arrays.copyOf(Arrays.java:3689)
	at java.base/java.util.ArrayList.grow(ArrayList.java:238)
	at java.base/java.util.ArrayList.grow(ArrayList.java:243)
	at java.base/java.util.ArrayList.add(ArrayList.java:486)
	at java.base/java.util.ArrayList.add(ArrayList.java:499)
	at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2348)
	at org.postgresql.core.v3.QueryExecutorImpl.fetch(QueryExecutorImpl.java:2562)
	at org.postgresql.jdbc.PgResultSet.next(PgResultSet.java:2145)
	at com.zaxxer.hikari.pool.HikariProxyResultSet.next(HikariProxyResultSet.java)
	at com.github.marschall.jfr.jdbc.JfrResultSet.next(JfrResultSet.java:82)
	at wres.io.utilities.SQLDataProvider.next(SQLDataProvider.java:138)
	at wres.io.retrieval.database.TimeSeriesRetriever.lambda$getTimeSeriesSupplier$1(TimeSeriesRetriever.java:792)
	at wres.io.retrieval.database.TimeSeriesRetriever$$Lambda$592/0x000000084067f840.get(Unknown Source)
	at java.base/java.util.stream.StreamSpliterators$InfiniteSupplyingSpliterator$OfRef.tryAdvance(StreamSpliterators.java:1360)
	at java.base/java.util.stream.ReferencePipeline.forEachWithCancel(ReferencePipeline.java:127)
	at java.base/java.util.stream.AbstractPipeline.copyIntoWithCancel(AbstractPipeline.java:502)
	at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:488)
	at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
	at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913)
	at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
	at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578)
	at wres.io.retrieval.CachingRetriever.lambda$new$0(CachingRetriever.java:57)
	at wres.io.retrieval.CachingRetriever$$Lambda$570/0x00000008400a2440.get(Unknown Source)
	at wres.io.retrieval.CachingSupplier.get(CachingSupplier.java:50)
	at wres.io.retrieval.CachingRetriever.get(CachingRetriever.java:41)
	at wres.io.retrieval.CachingRetriever.get(CachingRetriever.java:18)
	at wres.io.pooling.PoolSupplier.createPool(PoolSupplier.java:219)
	at wres.io.pooling.PoolSupplier.get(PoolSupplier.java:195)
	at wres.io.pooling.PoolSupplier.get(PoolSupplier.java:87)
	at wres.io.pooling.PoolFactory$SupplierWithPoolRequest.get(PoolFactory.java:1983)
	at wres.io.pooling.PoolFactory.lambda$decompose$17(PoolFactory.java:1580)
	at wres.io.pooling.PoolFactory$$Lambda$581/0x000000084007c040.get(Unknown Source)

More experimentation tomorrow. Thanks,

Hank

@epag
Collaborator Author

epag commented Aug 21, 2024


Original Redmine Comment
Author Name: James (James)
Original Date: 2022-11-07T21:33:58Z


As an aside, I don't think you need more than 6*50 features to reproduce an oome because there are 6 pooling threads and 50 features per thread. That might make it easier to test. Also, you can probably do some kind of back-of-the-envelope calculation based on the number of pairs per pool and the cost in memory of each pair, roughly. Otherwise, this is going to be a very tedious experiment :-)
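A rough sketch of that back-of-the-envelope calculation, in Python. The 6 pooling threads and 50 features per batch come from this thread. The pairs per pool (8640, i.e. hourly pairs across a 360-day window) and the 100 bytes per pair are placeholder assumptions, not measured values, and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BatchEstimate:
    features_in_flight: int
    pairs_in_flight: int
    bytes_in_flight: int


def estimate_batch_memory(pooling_threads, feature_batch_size,
                          pairs_per_feature_pool, bytes_per_pair):
    """Estimate the pairs held in memory when every pooling thread
    works on one full feature batch at the same time."""
    features = pooling_threads * feature_batch_size
    pairs = features * pairs_per_feature_pool
    return BatchEstimate(features, pairs, pairs * bytes_per_pair)


def max_batch_size(heap_bytes, pooling_threads,
                   pairs_per_feature_pool, bytes_per_pair, headroom=0.5):
    """Largest batch size whose pairs fit in a fraction (headroom)
    of the heap. The headroom fraction is an arbitrary assumption."""
    budget = heap_bytes * headroom
    per_feature = pairs_per_feature_pool * bytes_per_pair
    return int(budget // (pooling_threads * per_feature))


# Thread values: 6 pooling threads, 50 features per batch.
# Placeholder values: 8640 pairs per pool, 100 bytes per pair.
estimate = estimate_batch_memory(6, 50, 8640, 100)
print(estimate.features_in_flight)            # 300
print(estimate.bytes_in_flight / 1024 ** 2)   # MiB of pairs in flight

# Heap of 4096 MiB, as in -Xmx4096m.
print(max_batch_size(4096 * 1024 ** 2, 6, 8640, 100))
```

This counts pairs only. It ignores the raw time-series retrieved for each batch, database result buffers and everything else on the heap, so measured values would need to replace the placeholders before it says anything about a safe batch size.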

@epag
Collaborator Author

epag commented Aug 21, 2024


Original Redmine Comment
Author Name: James (James)
Original Date: 2022-11-07T21:35:10Z


All that said, this is probably a great evaluation for setting the limits of memory and/or batch size. It's possible we will encounter bigger ones in reality and an oome is kind of a bad outcome in the wild, so we probably want to be conservative about the batch size, but this use case will help a lot, I think.

@epag
Collaborator Author

epag commented Aug 21, 2024


Original Redmine Comment
Author Name: Hank (Hank)
Original Date: 2022-11-10T13:45:14Z


I don't want to have to refer to other tickets to find the declaration I'm using to stress test the COWRES in staging. I've pasted it below. This is the evaluation that has OOMEd in staging with a @featureBatchSize@ of 50 even with the memory upped to @-Xmx4096m@.

I plan to discuss next experiments during today's meeting.

Hank

=====================================================================

<?xml version='1.0' encoding='UTF-8'?>
<project name="NWM Retro Sim from Zarr Example">
  <inputs>
    <left label="USGS NWIS Streamflow Observations">
      <type>observations</type>
      <source interface="usgs_nwis">https://nwis.waterservices.usgs.gov/nwis/iv</source>
      <variable>00060</variable>
    </left>
    <right featureDimension="nwm_feature_id" label="MARFC RetroSim CSVs">
      <type>simulations</type>
      <source>/home/ISED/wres/nwm_2_1_retro_simulations/MARFC</source>
      <variable>streamflow</variable>
    </right>
  </inputs>
  <pair label="NWM Retro Sim Example Pair Config">
    <unit>ft3/s</unit>
    <featureService>
      <baseUrl>https://nwcal-wrds.[domain]/api/location/v3.0/metadata</baseUrl>
      <group pool="false">
        <type>rfc</type>
        <value>MARFC</value>
      </group>
    </featureService>
    <dates earliest="1980-01-01T00:00:00Z" latest="2021-01-01T00:00:00Z"/>
    <validDatesPoolingWindow>
      <period>360</period>
      <frequency>60</frequency>
      <unit>days</unit>
    </validDatesPoolingWindow>
  </pair>
  <metrics>
    <metric>
      <name>pearson correlation coefficient</name>
    </metric>
    <metric>
      <name>sample size</name>
    </metric>
    <metric>
      <name>mean error</name>
    </metric>
    <metric>
      <name>mean absolute error</name>
    </metric>
  </metrics>
  <outputs durationFormat="hours">
    <destination type="graphic"/>
    <destination type="pairs"/>
    <destination type="numeric"/>
    <destination type="csv2"/>
  </outputs>
</project>
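For sizing purposes, the number of valid-date pooling windows per feature implied by this declaration can be estimated with a short Python sketch. The dates, the 360-day period and the 60-day frequency are taken from the declaration. The windowing rule (first window starts at the earliest date, only whole windows that end on or before the latest date are counted) is an assumption and may not match how WRES generates its windows; the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def count_pooling_windows(earliest, latest, period, frequency):
    """Count the windows of length `period`, started every `frequency`,
    that fit entirely between `earliest` and `latest` (assumed rule)."""
    count = 0
    start = earliest
    while start + period <= latest:
        count += 1
        start += frequency
    return count


# Values from the <dates> and <validDatesPoolingWindow> elements.
earliest = datetime(1980, 1, 1, tzinfo=timezone.utc)
latest = datetime(2021, 1, 1, tzinfo=timezone.utc)

windows = count_pooling_windows(
    earliest, latest, timedelta(days=360), timedelta(days=60)
)
print(windows)  # 244 under the assumed rule
```

Multiplying this by the number of features returned for the MARFC group (not stated here) gives a rough count of the pools in the evaluation.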

@epag
Collaborator Author

epag commented Aug 21, 2024


Original Redmine Comment
Author Name: Hank (Hank)
Original Date: 2022-11-10T17:45:00Z


Notes from Dev Call:

This style of evaluation, being a real use case, is a reasonable one to use for assessing the balance between memory and feature batch size.

We do have significantly more memory available: we can increase the RAM per worker, making sure the container @mem_limit@ is raised accordingly. We can also reduce the feature batch size, but should keep an eye on the compute/retrieval time improvement.

The first run using a @featureBatchSize@ of 20 today errored out due to unrelated issues. I'll file a ticket before COB today.

Hank

@epag
Collaborator Author

epag commented Aug 21, 2024


Original Redmine Comment
Author Name: Hank (Hank)
Original Date: 2022-11-15T12:25:08Z


Using a @featureBatchSize@ of 20 also failed. Evidence from the start of the run:

@featureBatchThreshold=10,featureBatchSize=20@

Result:

2022-11-10T22:07:20.133+0000 ERROR Main Operation 'execute' completed unsuccessfully
wres.pipeline.InternalWresException: Could not complete project execution
	at wres.pipeline.Evaluator.evaluate(Evaluator.java:337)
	at wres.pipeline.Evaluator.evaluate(Evaluator.java:194)
	at wres.Functions.execute(Functions.java:183)
	at wres.Functions.call(Functions.java:121)
	at wres.Main.completeExecution(Main.java:171)
	at wres.Main.main(Main.java:132)
Caused by: wres.pipeline.WresProcessingException: Encountered an error while processing evaluation 'DBwlcJry9t42iyEnXLxWAr9rpTY': 
	at wres.pipeline.ProcessorHelper.processEvaluation(ProcessorHelper.java:283)
	at wres.pipeline.Evaluator.evaluate(Evaluator.java:312)
	... 5 common frames omitted
Caused by: wres.pipeline.WresProcessingException: Project failed to complete with the following error: 
	at wres.pipeline.ProcessorHelper.processProjectConfig(ProcessorHelper.java:601)
	at wres.pipeline.ProcessorHelper.processEvaluation(ProcessorHelper.java:220)
	... 6 common frames omitted
Caused by: java.util.concurrent.CompletionException: java.lang.OutOfMemoryError: Java heap space
	at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:314)
	at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:319)
	at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1702)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.OutOfMemoryError: Java heap space
	at java.base/java.util.TreeMap.put(TreeMap.java:575)
	at java.base/java.util.TreeSet.add(TreeSet.java:255)
	at java.base/java.util.AbstractCollection.addAll(AbstractCollection.java:352)
	at java.base/java.util.TreeSet.addAll(TreeSet.java:312)
	at wres.datamodel.time.TimeSeries.<init>(TimeSeries.java:175)
	at wres.datamodel.time.TimeSeries$Builder.build(TimeSeries.java:353)
	at wres.datamodel.time.TimeSeriesSlicer.transform(TimeSeriesSlicer.java:1009)
	at wres.io.pooling.PoolFactory.lambda$getSingleValuedTransformer$3(PoolFactory.java:789)
	at wres.io.pooling.PoolFactory$$Lambda$542/0x000000084009c840.apply(Unknown Source)
	at wres.io.pooling.PoolSupplier.createSeriesPairs(PoolSupplier.java:1216)
	at wres.io.pooling.PoolSupplier.createPairsPerLeftSeries(PoolSupplier.java:1031)
	at wres.io.pooling.PoolSupplier.createPairsPerFeature(PoolSupplier.java:929)
	at wres.io.pooling.PoolSupplier.createPool(PoolSupplier.java:625)
	at wres.io.pooling.PoolSupplier.createPool(PoolSupplier.java:248)
	at wres.io.pooling.PoolSupplier.get(PoolSupplier.java:198)
	at wres.io.pooling.PoolSupplier.get(PoolSupplier.java:87)
	at wres.io.pooling.PoolFactory$SupplierWithPoolRequest.get(PoolFactory.java:1994)
	at wres.io.pooling.PoolFactory.lambda$decompose$17(PoolFactory.java:1591)
	at wres.io.pooling.PoolFactory$$Lambda$575/0x000000084007b840.get(Unknown Source)
	at wres.io.retrieval.CachingSupplier.get(CachingSupplier.java:50)
	at wres.io.pooling.PoolFactory.lambda$decompose$18(PoolFactory.java:1602)
	at wres.io.pooling.PoolFactory$$Lambda$576/0x000000084007bc40.get(Unknown Source)
	at wres.io.pooling.PoolFactory$SupplierWithPoolRequest.get(PoolFactory.java:1994)
	at wres.pipeline.PoolProcessor.get(PoolProcessor.java:249)
	at wres.pipeline.PoolProcessor.get(PoolProcessor.java:39)
	at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
	... 3 common frames omitted

It did appear to complete about 100 pools before bombing out, so maybe it's not far from running. I'll try it with the increased memory amount in a bit. I have an evaluation running that will allow me to check if there is an option to select the metric displayed in the map and I don't want to interrupt it. The next experiment for this ticket should be later this morning.

Hank

@epag
Collaborator Author

epag commented Aug 21, 2024


Original Redmine Comment
Author Name: Hank (Hank)
Original Date: 2022-11-18T20:12:52Z


FYI... I just started another #108993 reproduction using the 20 feature batch size. I don't think that's really going to test much, since it had no problem at 50. Just saying.

I'll start another batch size = 20 run with more memory on Tuesday, my only workday next week, and then continue the experiments after Thanksgiving.

Hank

@epag
Collaborator Author

epag commented Aug 21, 2024


Original Redmine Comment
Author Name: Hank (Hank)
Original Date: 2022-11-23T19:07:26Z


I was supposed to start the next experimental run, batch size 20 with more memory, before I left today. That won't happen. Staging is currently busy with #110224.

Hank

@epag
Collaborator Author

epag commented Aug 21, 2024


Original Redmine Comment
Author Name: James (James)
Original Date: 2022-12-09T13:47:55Z


I think this is blocked by #110660.
