==========================================
Seasonal and Decadal Predictability Engine
==========================================
Notes in Trello
2021.05.27
Problem with this file - it doesn't have lvl_bnds fields, but instead depthT.
"/data/users/marpay/PredictabilityEngine/data_srcs/CMIP6/so_Omon_IPSL-CM5A2-INCA_historical_r1i1p1f1_gn_185001-201412.nc"
This may be a sigma coordinate problem again?
2020.08.18
2020.08.14
Restarted development by doing a complete rewrite on a new branch. The aim is to allow PredEng to incorporate alternative forms of forecast validation, and to minimise the number of intermediate files. This seems to be working ok so far.
2020.02.03
Working on 20191206 branch, removing spatial subsetting.
2019.10.25
Fixed errors with the handling of fragstacks. Looks like we can go all the way through to the end of H1 now.
2019.10.21
Think I have Stats working again - hit a problem with the number of jobs that LSF can handle (max 1024), which means that we need to manage the chunking and subsetting much more actively to ensure that we stay underneath this limit.
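A minimal sketch of the chunking idea, assuming the work can be expressed as a flat list of items (names here are illustrative, not the actual PredEng code):
    # Group work items so the number of submitted LSF jobs stays safely
    # under the scheduler's 1024-job limit
    max.jobs   <- 1000                        # headroom under the 1024 cap
    work.items <- seq_len(25000)              # stand-in for the real task list
    n.chunks   <- min(max.jobs, length(work.items))
    chunk.id   <- cut(seq_along(work.items), breaks = n.chunks, labels = FALSE)
    chunks     <- split(work.items, chunk.id) # one LSF job per chunk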
2019.08.01
Back into it. Seem to have made good progress on the NMME - it looks like it's working now. Moving on to CMIP5, which is a bit trickier and needs some larger-scale revisions. Still, am optimistic.
* Check whether NMME worked
* Do production run for BW salinity
* Build metadata for CMIP5 fragments
2019.06.13
The last few days have seen a new branch, 20190611, creating a new approach to parallelisation - large data sets such as CESM-DPLE can now be handled in chunks, rather than depending on a single processor. This is critical for working with that dataset, and can hopefully be applied to other sources as well, e.g. CMIP5, CMIP6. Such a complex set of changes was necessary because simply dividing the files up didn't really work when it came to calculating the climatologies, which need to be calculated across all files - hence the need to chunk the data processing (sketched below). Seems to be working now, and we are ready to merge back into the trunk. Will give it a test run on the Bluefin case though, just to be sure - first without CESM, then with CESM, then with a chunked version of it.
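A hedged sketch of why the chunking has to be two-pass (file layout and names are illustrative): each chunk contributes a partial sum, and the climatology is only formed once every chunk has been seen.
    library(raster)
    chunk.files <- list.files("fragments", pattern = "\\.nc$", full.names = TRUE)
    partial <- lapply(chunk.files, function(f) {
      b <- brick(f)                             # one chunk of the dataset
      list(sum = calc(b, sum), n = nlayers(b))  # partial sum and layer count
    })
    # Combine across all chunks - a climatology from any single chunk
    # alone would be biased
    clim <- Reduce(`+`, lapply(partial, `[[`, "sum")) /
              sum(sapply(partial, `[[`, "n"))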
2019.05.28
Implemented DPLE support. This required some thinking about the way that we deal with data sources and the like. However, it was possible to fit it into the CDO approach in the end, which is a great relief.
2019.05.16
Some more structural changes to the PredEng package to improve the statistics methods. Seems to have been a good idea - I have now got to the point where I can start to make spatial correlation maps.
* Start making visualisations of skill
* Extend to other metrics and allowing customisation of metrics
2019.05.14
Major rewrites around the summary statistics to support field-based results. Was a fairly major headache, but feel like I have a good solution now. It also required major redefinitions of the metadata conventions. A lot of work, and it may take some time to get the bugs out, but I feel it was worth it - we now have seamless support for spatial fields.
* See if we can find a good way to do global detection - possibly with a flag?
* Update bluefin and run it through the analysis again, to check that it has worked properly. It did.
Also removed the PredEng.list class, as it was a bit naff.
Looks like we have a fully functioning system again now. The next step is to refine the spatial skill metrics. This is probably best done by defining the skill metric in association with the statistic, and then picking up the rest from there in H3. Will probably need to separate these into field metrics and individual metrics, but could pull them together again in the summary metrics. Lead times could be irritating though.
2019.05.13
Started work on spatial statistics. Got a start, anyway. Now need to settle on a structure for configuring and defining metrics.
* Implement CRPS metrics. Distinguish between deterministic and probabilistic metrics in the skill metric calculations (ensemble CRPS estimator sketched below). Check with Johanna how best to do the calculation
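For reference, a minimal sketch of the standard ensemble CRPS estimator (not the eventual PredEng implementation); a deterministic score like RMSE would instead compare the observation against a single forecast value:
    # CRPS for one observation y and ensemble members x:
    # mean |x_i - y| minus half the mean pairwise member spread
    crps.ens <- function(x, y) {
      mean(abs(x - y)) - 0.5 * mean(abs(outer(x, x, "-")))
    }
    crps.ens(x = rnorm(25, mean = 10), y = 10.4)   # toy example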
2019.05.08
Before moving to fully spatial fields, let's check two things:
* does the Bluefin tuna still work properly? Yes, appears to work well.
* Let's get the realisations working as well.
Spent most of the morning working on problems with workload distribution - have streamlined that and removed some annoying bugs. Starting work on realisations now, but need to check how they are handled by the individual summary stats, and possibly rewrite them to support realisations.
A late-night session broke a lot of code, but it looks like we have realisations working now.
2019.05.03
Implemented habitat suitability metrics - appears to be quite successful. Will probably need to play with this a bit in the future to find a final structure that works best, but at the moment it just does linear interpolation of supplied values (see the sketch after the next step) - this should be faster than any model evaluation, especially if the model is fit in BRMS.
* Next step is to extend to the spatial aspect, to make maps of skill. This is best done using the same summary statistics, but perhaps overloading them to respond differently depending on the context. Dunno. Need to think it through.
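A minimal sketch of the linear-interpolation idea mentioned above (the temperature/suitability values are purely illustrative):
    # Suitability supplied at a handful of temperatures, interpolated
    # linearly for a set of predicted temperatures; rule = 2 clamps
    # values outside the supplied range
    suit.tbl  <- data.frame(temp = c(5, 10, 15, 20, 25),
                            suit = c(0.0, 0.3, 1.0, 0.6, 0.1))
    pred.temp <- c(8.2, 14.7, 22.1)
    approx(suit.tbl$temp, suit.tbl$suit, xout = pred.temp, rule = 2)$y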
2019.05.02
Got the visualisation code working, although it was a bit of a battle with Markdown, as ever. Next steps:
* Implement carrying capacity model
* Implement spatial tools
2019.05.01
Got the calculation of skill metrics working, after some close investigation of how the persistence calculations work, including an upgrade of C2 to be RDS-compatible. Next step is to do the visualisations of these results.
* Refine H4 visualisation code
2019.04.30
A little less effective today, but still progress. It's a little unclear how H2 should work - I had clearly been thinking about parallelising the collation process, but that is not how the script stands at the moment. Anyway, it seems to be effective. Running a full Mackerel run.
* Maybe think about some automated visualisation and processing notebooks?
2019.04.29
Initiated a significant rewrite to remove the conditional-execution logic and replace it with makefile-like functionality, i.e. detecting whether the metadata files have already been created (sketched below). So far, so good, but it requires some careful line-by-line checking. Currently up to D1. Completed.
* Next step D2, then do a full run for Mackerel
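The makefile-like test sketched out, assuming one metadata/output file per step (paths hypothetical):
    # Only rerun a step when its output is missing or older than its source
    needs.update <- function(target, source) {
      !file.exists(target) || file.mtime(target) < file.mtime(source)
    }
    if (needs.update("metadata/D1.rds", "data_srcs/input.nc")) {
      # ...rerun the D1 collation step here...
    }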
2018.12.12
Tried to restart on this, but unsure what the problem is - seems to be something with the downloading of the data, and the server possibly saturating. Need to try downloading NMME in smaller chunks.
2018.10.05
Worked on adapting the code to work with the new approach to NMME, i.e. download the whole lot and then process from there. Have managed to get E1 and E2 working (even if there is something weird about my laptop's ability to read NMME via OPeNDAP). Now need to adapt E4 - this should simply be a case of removing the remapping (which is taken care of in E2 now).
2018.09.12
Nope, the problem still seems to be there - unsure where it is coming from. I can't seem to reproduce it when running on my laptop, but it is still there even after rerunning from scratch on the HPC. This may be something related to the different versions of CDO on the two machines? Do a side-by-side comparison of the cdo sub command to check (see below).
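One way to do that check, assuming the same input files are available on both machines (file names hypothetical):
    # Run the identical cdo sub on both machines...
    system2("cdo", c("sub", "fcst.nc", "clim.nc", "anom_laptop.nc"))
    # ...then let CDO report any fields that differ between the two outputs,
    # and record the version on each machine
    system2("cdo", c("diffn", "anom_laptop.nc", "anom_hpc.nc"))
    system2("cdo", "-V")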
2018.09.05
Identified problem with NMME handling - there are a lot of NAs in the summary statistics. These have been traced back to the original NMME data that I have downloaded (have sent an email to check with NOAA about this). However, the problem may have solved itself - I am therefore rerunning the previous Bluefin analysis to see if it is still there.
2018.07.27
Have been working on adapting the code base to work with fragstacks and multiple spatial areas. I think we have it working completely now for the Decadal script, but need to check it closely. Check the new results and compare with the previous results.
2018.06.05
Found a problem in the MPI-ESM-LR with the dates - the following two files have corrupted date tags, and have been removed from the database. I remember something about this previously but didn't correct for it.
thetao_Omon_MPI-ESM-LR_decs4e1996_r5i1p1_199701-200612.nc
thetao_Omon_MPI-ESM-LR_decs4e1964_r3i1p1_196501-197412.
2018.06.01
Finally back into it on the plane to ECCWO. Made good progress implementing the SST extraction for HadISST in the fragment format, and then calculating skill metrics. Next steps:
* Go through all of the code doing debug=Inf runs to update the metadata conventions.
* Visualisation of results and preparation of talk
* Could we have foreseen the 2012/14 results?
* How does the spatial forecast look for 2018?
* Matchups with CPUE data?
2018.05.15
Good progress on the code base. Got the NMME code more or less up and running again, and am now working on getting it into a format that is compatible with the rest of the codebase.
* Make sure that CDO works with the fragmented data
2018.05.14
Restarted work on this code set by shifting everything to Git. Worked fine. Need to do a line-by-line review of the code base to make sure that we have everything under control, and to bring it into a more intuitive data framework.
. NextAction: Allow specification of the data source directory in the object, to improve transparency and avoid having a horrible array of nested directories. The IRODS store directory structure should ultimately mirror this structure
. Implement NMME code
2016.08.31
Restructured data sources to have a clear distinction between initialised and uninitialised runs
2016.08.28
Implemented (most of) the uninitialised model extraction and metric calculation - now need to calculate the correlation coefficients and handle the result properly. Also need to find a clean way for the processing chain to distinguish between initialised and uninitialised versions of the models.
* Add more uninitialised metrics to the calculations
2016-08-26
Added get.dates() functionality and passing of metadata around, to make things cleaner.
. PWW and G1 Skills need to be updated to reflect recent changes
* Add spatial correlation and probabilistic skill maps (one script each to generate for each model, one script to visualise)
* Add EN4 functionality
. Uninitialised model skill
2016-08-25
General code tidying. Introduced parent and child classes for the models to allow much more focused extraction of metadata such as the realisation member and the initialisation date (see the sketch after this list). This can be extended later to include specific processing schemes as well. Redesigned the way the spatial grid is handled - the grid is now specified as a configuration option, and everything gets interpolated onto this grid. Wrote functionality to calculate the mean over all realisations as well.
- Save meta data from D1 for easy lookup
- Modify indicator calculation script D2 to use this as well
* Extend to allow spatial skill maps etc - maybe create a "spatial metrics" set of objects in project configuration as well.
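A hedged S4 sketch of the parent/child idea (slot and class names are illustrative, not the real PredEng definitions): the child class carries model-specific functions for pulling metadata out of filenames.
    setClass("data.source",
             slots = c(name = "character", var = "character"))
    setClass("decadal.model", contains = "data.source",
             slots = c(realization.fn = "function",   # member from filename
                       init.fn        = "function"))  # start year from filename
    m <- new("decadal.model", name = "MPI-ESM-LR", var = "thetao",
             realization.fn = function(f) gsub(".*_(r\\d+i\\d+p\\d+)_.*", "\\1", f),
             init.fn        = function(f) gsub(".*decs4e(\\d{4}).*", "\\1", f))
    m@realization.fn("thetao_Omon_MPI-ESM-LR_decs4e1996_r5i1p1_199701-200612.nc")
    # "r5i1p1"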
2016-08-24
Some minor tweaks and polishes here and there to add a show() method, and to fix some problems with the OISST download etc. Have run analyses for Bluefin Sep temperatures as well.
* Run a Bluefin Aug-Sep setup as well
- Add NCEP into the mix
* Add uninitialised model skill as well.
2016-08-15 (~)
A series of steps to define a model class - this contains the specific configuration for that model, and allows for variation between models. Also implemented and ran the MPI-ESM-ER model.
* Spatial analysis of skill
- Add analysis of MPI-NCEP - will probably need to define a CDO object
* Deal with forecasts more rigorously, including spreads
* Probabilistic skill measures
* Historical uninitialised simulations as measure of ultimate skill
* Use EN4 / HadISST and a longer observational climatology
* Use RMSE skill metrics
2016-07-15
Made some big steps here implementing the full configuration system, and we are basically finished with that part of it - we can quickly reconfigure the analysis, which is great.
* Spatial analysis of skill
* Add analysis of MPI-NCEP - will probably need to define a CDO object
2016-07-14
Need to handle multiple configurations and reconfigurations more cleanly. Developed a configuration object that can be passed around easily, and started some more work on the cdo functions. Am now working my way through the processing chain with Eel as an example - it seems to be helping a lot and am making good progress.
* Next on the to-do list is B2.Calculate_CDO_metrics
2016-06-17
Generalised the MPI processing code to work with anything that is supported by CDO. Tested with IPSL decadal prediction data and it appears to work well. Renamed the project to the "predictability engine" to reflect the increasingly broad application of the tool. Got B1 working all the way through with IPSL, including on the server.
* Now check and run B2
* Check and modify D1 to incorporate IPSL data
* Run NCAR/NCEP model as well
2016-06-06
Anoms look to be calculated correctly. Have handled averaging over realisations by simply averaging the metrics - because the mean is linear, this gives an identical result for average temperature, and is probably not all that important for area. Have calculated metrics - looks like we get good skill out to four years at least.
* Add persistence and reference forecasts.
* Anomaly persistence forecasts
2016-06-03
Developed processing of MPI-OM using exclusively CDO. Awesome. Got the anomalies calculated ok. Now need to check and calculate metrics.
* Check that we have correctly calculated the anoms (do they sum to ~0? - see the check after this list)
* Decide if we need to average over realisations
* Calculate metrics
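A quick version of the sum-to-zero check (file name hypothetical): averaged over the climatology period, the anomalies should come out close to zero.
    library(raster)
    anom <- brick("anom.nc")           # anomaly file from cdo sub
    cellStats(calc(anom, mean), mean)  # expect a value near 0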
2016-06-02
Started on the MPI-OM analysis now that Daniela has provided the baseline-1 hindcasts. CDO is simply awesome - in a few lines I've already got the time-stamp/ROI/level extraction done.
2016-05-25
Lots of progress today. Implemented metric extraction code for both OISST and NMME, including a landmask, then combined it together to calculate skill metrics. I am essentially finished with the NMME predictability analysis now, with some good-looking results showing predictability out to at least a year. Some finer details could still be polished, but it's otherwise there. Also added visualisations of model forecasts, so that we have a forecast system ready to push out to the world.
* Add Persistence forecasts to plots
* Anomaly persistence forecasts?
* Taylor diagrams for model skill?
* Generate an NMME ensemble forecast
* MPI-OM forecasts and skill
2016-05-24
Refined and checked the Month of Interest extraction scripts. Appears to be fine. Implemented the climatology calculations - easier than expected - and applied them to calculate anomalies. Prepared visualisations of model shock.
* Calculate areas and then metrics.
* Combine models to produce an NMME forecast
2016-05-23
Worked on NMME processing today, with some success. Rewrote the download scripts, so that it all runs through R now, including building multiple files into one. Have extracted the August values and written into a raster compatible output file. Now need to bring it all together and calculate climatologies.
* Check that extraction has worked properly
* Calculate climatologies
* Convert to areas
2016-05-20
Defined a set of common elements. Wrote code to download and combine the OISST data into a single file using NCO. Subsequent processing to calculate a climatology and anomalies is done using raster. The processing of the OISST data is therefore more or less complete (pipeline sketched after the next step).
* Move on to NMME processing
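The pipeline sketched end to end (paths and variable name illustrative):
    # Concatenate the individual OISST files along the time axis with NCO...
    system2("ncrcat", c(list.files("OISST", pattern = "\\.nc$", full.names = TRUE),
                        "OISST_combined.nc"))
    # ...then climatology and anomalies via raster
    library(raster)
    b    <- brick("OISST_combined.nc", varname = "sst")
    clim <- calc(b, mean)    # climatological mean field
    anom <- b - clim         # anomaly at every time step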
2016-05-19
Tweaked the handling of dates to use the lubridate package, converting months-since-1960 to normal dates.
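The conversion itself is a one-liner with lubridate (%m+% avoids end-of-month rollover surprises):
    library(lubridate)
    m.since.1960 <- c(0, 6, 677)                 # example values from the time axis
    ymd("1960-01-01") %m+% months(m.since.1960)  # 1960-01-01 1960-07-01 2016-06-01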
2016-05-18
A lot of this has been made redundant, as Daniela is updating the MPI-OM outputs for me. I don't have them yet, so will need to wait and see. Wrote a metadata overview script for the NMME data set.
* Continue with NMME analysis
2016-04-15
Added Makefile implementation
2016-04-14
Initial version. After playing a bit with bash scripts, decided the best approach was just to run CDO directly from R - this is generally a much tidier approach, I think, and still achieves the fantastic speed of CDO with the user-friendliness of R (see the sketch after the next step). Next steps:
* Wrap it all up in a Makefile and run it.
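A minimal sketch of the run-CDO-from-R pattern, under the assumption that each step chains CDO operators into a single call (operators and file names illustrative):
    run.cdo <- function(args) {
      rtn <- system2("cdo", args)   # shell out to CDO
      if (rtn != 0) stop("CDO failed: cdo ", paste(args, collapse = " "))
      invisible(rtn)
    }
    # e.g. subset a region, select August, and take the field mean, chained
    run.cdo(c("-fldmean", "-selmon,8", "-sellonlatbox,-70,-10,50,80",
              "in.nc", "out.nc"))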