Surface water quality data analysis and prediction for the Potomac River, West Virginia, USA, using time series forecasting and anomaly detection: ARIMA, SARIMA, Isolation Forest, OCSVM and Gaussian Distribution
There is a pressing need for schemes to analyse continuously monitored environmental data, that is, information about aspects of the ecosystem such as the surface water quality parameters dissolved oxygen, turbidity and specific conductance, and to check these parameters for unnatural increases above predetermined standard levels in order to detect the environmental anomalies that cause such increases. These parameters reflect the state of the ecosystem of a particular geographical area, and thus help us to assess any present or future discrepancies that could lead to environmental degradation through direct or indirect human activity in that area.
This analysis is carried out using the time series forecasting techniques ARIMA and Seasonal ARIMA (SARIMA) and the anomaly detection techniques Isolation Forest, Gaussian distribution and One-Class SVM (OCSVM).
The above graph suggests that Isolation Forest may be producing many more false positives than the other approaches, or may otherwise be overestimating the proportion of anomalies. All other methods give similar results, with anomaly percentages ranging from 9 to 20%. The anomaly plots shown earlier indicate that most anomalies occur on 29 January 2017 and 22 March 2017. These anomalies are consistent with the intense rainfall recorded at the monitoring site on those dates.
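A comparison of anomaly percentages like the one described above can be sketched as follows. The example assumes Python with NumPy and scikit-learn, which the text does not name, and runs on synthetic turbidity-like readings with injected spikes rather than the study's data. The thresholds (`k`, `contamination`, `nu`) are illustrative assumptions, and the Gaussian detector is implemented here as a simple k-sigma rule, which for a single variable is equivalent to thresholding the fitted Gaussian density.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.svm import OneClassSVM

# Synthetic turbidity-like readings with 25 injected spikes (NOT real data).
rng = np.random.default_rng(42)
values = rng.normal(loc=10.0, scale=1.0, size=500)
spike_idx = rng.choice(values.size, size=25, replace=False)
values[spike_idx] += rng.uniform(8.0, 15.0, size=25)
X = values.reshape(-1, 1)  # scikit-learn expects a 2-D feature matrix


def gaussian_flags(x, k=2.0):
    """Flag points more than k standard deviations from the sample mean."""
    mu, sigma = x.mean(), x.std()
    return np.abs(x - mu) > k * sigma


# Each detector returns a boolean mask: True where a reading is anomalous.
# scikit-learn's fit_predict labels outliers as -1 and inliers as +1.
flags = {
    "Gaussian": gaussian_flags(values),
    "Isolation Forest": IsolationForest(
        contamination=0.05, random_state=0
    ).fit_predict(X) == -1,
    "One-Class SVM": OneClassSVM(
        nu=0.05, kernel="rbf", gamma="scale"
    ).fit_predict(X) == -1,
}

# Anomaly percentage per method, the quantity compared across detectors.
percentages = {name: 100.0 * mask.mean() for name, mask in flags.items()}
for name, pct in percentages.items():
    print(f"{name}: {pct:.1f}% flagged")
```

Note that `contamination` and `nu` largely determine how many points Isolation Forest and One-Class SVM flag, so a detector that reports a much higher anomaly percentage than the others may simply be configured with a looser threshold.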