From 593153ad449f017d241dd0af7a786f9cbe03a02f Mon Sep 17 00:00:00 2001 From: Quarto GHA Workflow Runner Date: Sat, 16 Dec 2023 14:13:19 +0000 Subject: [PATCH] Built site for gh-pages --- .nojekyll | 2 +- electricity.html | 2 +- gas.html | 2 +- oil.html | 2 +- post-mortem-electricity.html | 2 +- post-mortem-gas.html | 2 +- post-mortem-oil.html | 2 +- search.json | 314 +++++++++++++++++------------------ sitemap.xml | 48 +++--- 9 files changed, 188 insertions(+), 188 deletions(-) diff --git a/.nojekyll b/.nojekyll index 0af5ffd..6256871 100644 --- a/.nojekyll +++ b/.nojekyll @@ -1 +1 @@ -6a0e15b9 \ No newline at end of file +12793a4a \ No newline at end of file diff --git a/electricity.html b/electricity.html index 816802b..bd6d3ea 100644 --- a/electricity.html +++ b/electricity.html @@ -99,7 +99,7 @@ diff --git a/gas.html b/gas.html index 2501d38..13b3c3f 100644 --- a/gas.html +++ b/gas.html @@ -99,7 +99,7 @@ diff --git a/oil.html b/oil.html index 49a0da0..bd1024b 100644 --- a/oil.html +++ b/oil.html @@ -99,7 +99,7 @@ diff --git a/post-mortem-electricity.html b/post-mortem-electricity.html index 434f650..105e601 100644 --- a/post-mortem-electricity.html +++ b/post-mortem-electricity.html @@ -99,7 +99,7 @@ diff --git a/post-mortem-gas.html b/post-mortem-gas.html index b7606a9..3c0dc81 100644 --- a/post-mortem-gas.html +++ b/post-mortem-gas.html @@ -99,7 +99,7 @@ diff --git a/post-mortem-oil.html b/post-mortem-oil.html index e38a81b..e6bf574 100644 --- a/post-mortem-oil.html +++ b/post-mortem-oil.html @@ -99,7 +99,7 @@ diff --git a/search.json b/search.json index 842605b..10380b4 100644 --- a/search.json +++ b/search.json @@ -1,66 +1,52 @@ [ { - "objectID": "methodology.html", - "href": "methodology.html", - "title": "Methodology", + "objectID": "oil.html", + "href": "oil.html", + "title": "European oil and petroleum product deliveries challenge", "section": "", - "text": "Our team employed a comprehensive methodology utilizing both standard econometric models and 
machine learning models to accurately nowcast the target variables. Specifically, we utilized a combination of three econometric models, namely RegARIMA, Exponential Smoothing (ETS), and Dynamic Factor Models (DFM), as well as two machine learning models, XGBoost and Long Short-Term Memory (LSTM) models, to ensure robust and accurate forecasting. While we applied the same five model classes across all three challenges, namely GAS, OIL, and ELECTRICITY, we tailored the specific datasets and parameters to each challenge to optimize model performance. This approach allowed us to leverage the strengths of both traditional econometric models and cutting-edge ML techniques to achieve the best possible forecasting results." }, { "objectID": "methodology.html#regarima", "href": "methodology.html#regarima", "title": "Methodology", "section": "Regarima", "text": "Regarima\n\nIntroduction\nARIMA modelling is a common class of time series models used to capture the internal dynamics of a series, whether it is stationary or non-stationary. ARIMA models offer some advantages in this exercise: they are relatively simple, easy to interpret, and therefore provide a useful benchmark for more complex models. As standard ARIMA models do not include external information, we use the extended RegARIMA version with external regressors (regression model with ARIMA errors, see Chatfield and Prothero (1973)):\n\\[\ny_t = c + \\sum_{i=1}^{p}\\alpha_i x^i_{t}+u_t\n\\] \\[\n\\phi(L)(u_t) = \\theta(L)(\\varepsilon_t)\n\\]\nAdditional regressors used in this project are of two types:\n\nEconomic variables presented in the data section. 
RegARIMA models must remain parsimonious (unlike other methods implemented in this project), so relevant variables are selected through standard selection procedures or fixed a priori.\nOutlier variables such as level-shift or additive outliers to control for atypical observations.\n\nModels are applied to the first difference of the variable of interest.\n\n\nAutomatic identification and nowcasting\nWe use the RJDemetra package (the TRAMO part), which provides an easy way to identify and estimate RegARIMA models with high flexibility on parameters.\n\nwe allow for automatic detection of outliers over the estimation period in order to avoid large biases in the coefficients. Without adding outliers, “covid” points, for instance, would totally distort the coefficients. Outliers identified through this procedure can also be reused in other methods\ncomputation time is fast (a few seconds for the whole set of countries)\nexternal regressors are selected independently, a priori or through standard variable selection procedures.\n\nThe final nowcast value is provided by the projection of the model at horizon 1, 2 or 3, depending on the challenge and the position in the month.\n\n\nSeasonal adjustment for electricity\nFor electricity, the seasonal component is very strong and may differ from that of the explanatory variables. An X13 pre-treatment is applied to seasonally adjust, and correct for trading days, both the target variable and the potential explanatory variables. This treatment also provides a prediction for the seasonal coefficient of the nowcasted month.\nNext, the seasonally adjusted variables are fed into the RegARIMA model. The final step involves nowcasting the “raw value” by dividing the SA forecast by the projected seasonal coefficient." 
- }, - { - "objectID": "methodology.html#dynamic-factor-models", - "href": "methodology.html#dynamic-factor-models", - "title": "Methodology", - "section": "Dynamic Factor Models", - "text": "Dynamic Factor Models\n\nIntroduction\nDynamic Factor Models (DFM) are a powerful and versatile approach for nowcasting, which involves extracting latent factors from a large set of observed economic or financial indicators. These latent factors capture the underlying dynamics of the data and are used to generate forecasts or predictions in real-time.\nDFM are based on the idea that a small number of unobservable factors drive the behavior of a large number of observed variables. These factors represent the common underlying movements in the data and can be interpreted as representing the state of the economy or the financial system at a given point in time. By estimating these latent factors from historical data, DFM allows us to capture the relevant information embedded in the observed indicators and use it to generate accurate nowcasts.\nA standard dynamic factor model involves 2 main equations.\n\nThe factor equation (Equation 1): This equation represents the dynamics of the latent factors, which are unobservable variables that capture the common underlying movements in the observed data. The factor equation is usually specified as a dynamic system allowing the unobserved factors \\(F_t\\) to evolve according to a \\(VAR(p)\\) process, and can be written as: \\[\nF_t = \\sum_{j=1}^pA_{j}F_{t-j} + \\eta_t, \\qquad \\eta_t \\sim N(0, \\Sigma_{0})\n\\tag{1}\\] where \\(F_t\\) represents the vector of latent factors at time \\(t\\), \\(A_{j}\\) is the (state-) transition matrix capturing the dynamics of the factors at lag \\(j\\), \\(F_{t-1}\\) is the vector of factors at time \\(t-1\\), and \\(\\Sigma_{0}\\) is the (state-) covariance matrix.\nThe measurement equation (Equation 2): This equation links the latent factors to the observed variables. 
It specifies how the observed variables are generated from the latent factors and can be written as: \\[\nX_t = \\Lambda F_{t} + \\xi_t, \\qquad \\xi_t \\sim N(0, R)\n\\tag{2}\\] where \\(X_t\\) represents the vector of observed variables at time \\(t\\), \\(\\Lambda\\) is the factor loading matrix, linking the factors to the observed variables, \\(\\xi_t\\) is the vector of measurement errors at time \\(t\\) and \\(R\\) is the (measurement-) covariance matrix.\n\n\n\n\n\n\n\n\n\nMatrices\nSizes\nDescriptions\n\n\n\n\n\\(F_t\\)\n\\(r \\times 1\\)\nVector of factors at time t \\((f_{1t}, \\ldots, f_{rt})'\\)\n\n\n\\(A_j\\)\n\\(r \\times r\\)\nState transition matrix at lag \\(j\\)\n\n\n\\(\\Sigma_0\\)\n\\(r \\times r\\)\nState covariance matrix\n\n\n\\(X_t\\)\n\\(n \\times 1\\)\nVector of observed series at time t \\((x_{1t}, \\ldots, x_{nt})'\\)\n\n\n\\(\\Lambda\\)\n\\(n \\times r\\)\nFactor loading matrix\n\n\n\\(R\\)\n\\(n \\times n\\)\nMeasurement covariance matrix\n\n\n\\(r\\)\n\\(1 \\times 1\\)\nNumber of factors\n\n\n\\(p\\)\n\\(1 \\times 1\\)\nNumber of lags\n\n\n\n\n\nData used and estimation\nFor the estimation of our Dynamic Factor Models (DFM), we utilized three main sources of data to capture different aspects of economic and financial activity: Eurostat data for economic activity, financial data obtained through the Yahoo API for financial activity, and Google Trends data for capturing more recent evolutions1. Together, these sources cover signals ranging from long-term trends to short-term fluctuations and recent evolutions. 
This multi-source data approach allows us to build a more comprehensive and robust DFM model, which can provide more accurate and timely nowcasting predictions for the variables of interest.1 You can refer to the Data section for more extensive details on the data used.\nFor the estimation of our various Dynamic Factor Models (DFM), we relied on the “dfms” package in R, a powerful and user-friendly tool for estimating DFMs from time series data. Its intuitive interface and optimized algorithms make it easy to implement and customize DFM models, while its efficient computational capabilities enable us to handle large datasets with ease. It allows different types of estimation (see Doz, Giannone, and Reichlin (2011), Doz, Giannone, and Reichlin (2012) and Banbura and Modugno (2014)). We warmly thank Sebastian Krantz, the creator of this package.\nTo ensure the robustness and accuracy of our DFM estimation, we took several steps before running the estimation pipeline of the “dfms” package. Since there are no trend or intercept terms in Equation 1 and Equation 2, we made \\(X_t\\) stationary by taking a first difference. Note that \\(X_t\\) is also standardized (scaled and centered) automatically by the “dfms” package.\nWe also pay attention to data availability, to ensure that each observed series has sufficient data for estimation. If a series has inadequate data, it is removed from the estimation process to prevent biased results. We also account for potential collinearities that could occur among several economic variables. 
Highly correlated series (\\(\\rho \\geq 0.9999\\)) are removed to mitigate multicollinearity issues and improve estimation accuracy.\nTo determine the optimal values for the number of lags and factors in our DFM models, we use the Bai and Ng (2002) criteria, which provide statistical guidelines for selecting these parameters. We set a maximum limit of 4 lags and 2 factors for computational efficiency reasons.\nWe initiate the estimation process from February 2005 for all challenges, and for the ELECTRICITY challenge, we perform a seasonal adjustment on the target series prior to estimation in order to account for any seasonal patterns in the data.\n\n\nNowcasting\nOnce the DFM is estimated (\\(A_j\\), \\(\\Sigma_0\\), \\(\\Lambda\\), \\(R\\)), it can be used for nowcasting by forecasting the latent factors with the factor equation and then generating nowcasts for the observed variables with the measurement equation. The predictions can be updated in real time as new data become available, allowing for timely and accurate predictions of the variables of interest." }, { "objectID": "methodology.html#ets", "href": "methodology.html#ets", "title": "Methodology", "section": "ETS", "text": "ETS\nExponential smoothing models (Hyndman et al. (2008)) are a class of models where forecasts are linear combinations of past values, with the weights decaying exponentially as the observations get older. Therefore, the more recent the observation is, the higher the associated weight is. 
Moreover, exponential smoothing models do not require any external data.\nThe exponential smoothing models used are a combination of three components:\n\nAn error: additive (\\(A\\)) or multiplicative (\\(M\\)).\nA trend: additive (\\(A\\)), multiplicative (\\(M\\)), damped (\\(A_d\\) or \\(M_d\\)) or absent (\\(N\\)).\nA seasonality: additive (\\(A\\)), multiplicative \\(M\\) or absent (\\(N\\)). The seasonal component is only used with electricity data.\n\nSee Figure 1 for the description of the model.\n\n\n\n\n\n\n\n(a) Additive error\n\n\n\n\n\n\n\n\n\n(b) Multiplicative error\n\n\n\n\n\n\nSource: Hyndman and Athanasopoulos (2018) \n\n\nFigure 1: Exponential smoothing models.\n\n\nFor each series, the model is selected minimising the Akaike’s Information Criterion (Akaike (1974)) and parameters are estimated maximising the likelihood." + "objectID": "oil.html#next-forecast-for-a-given-country", + "href": "oil.html#next-forecast-for-a-given-country", + "title": "European oil and petroleum product deliveries challenge", + "section": "Next forecast for a given country", + "text": "Next forecast for a given country\n\nviewof country = Inputs.select(Object.values(country_map), {\n label: html`<b>Select a country:</b>`,\n placeholder: \"Enter a country name\",\n unique: true\n })\n\n\n\n\n\n\n\n\n\nPlot.plot({\n grid: true,\n y: {\n label: \"↑ Production volume in industry\",\n }, \n x: {\n label: \"Year\",\n domain: range\n },\n marks: [\n Plot.line(historical, {\n tip: true,\n x: \"date\", \n y: \"values\", \n stroke: \"black\",\n title: (d) =>\n `${d.date.toLocaleString(\"en-UK\", {\n month: \"long\",\n year: \"numeric\"\n })}\\n ${d.values} `\n }),\n Plot.dot(predictions, {\n tip: true,\n x: \"date\", \n y: \"values\",\n fill: \"model\",\n title: (d) =>\n `${d.model}\\n ${d.date.toLocaleString(\"en-UK\", {\n month: \"long\",\n year: \"numeric\"\n })} : ${d.values} `\n })\n ],\n color: {legend: true}\n})\n\n\n\n\n\n\n\n\n\n\n\ndates = {\n const data = 
historical.map(d => d.date)\n data.push(predictions.map(d => d.date)[0])\n return data \n}\n\nviewof range = offsetInterval(dates, {\n value: [ dates[dates.length-90], dates[dates.length-1] ],\n format: ([a, b]) => htl.html`<span ${{\n style: \"display: flex; justify-content: space-between\"\n }}>\n ${a.toISOString(\"en-UK\").slice(0, 10)}\n        \n        \n        \n        \n        \n ${b.toISOString(\"en-UK\").slice(0, 10)}\n </span>`\n})" }, { - "objectID": "methodology.html#xgboost", - "href": "methodology.html#xgboost", - "title": "Methodology", - "section": "XGBoost", - "text": "XGBoost\n\nIntroduction\nXGBoost is a powerful algorithm that has gained popularity in the field of machine learning due to its ability to handle complex interactions between variables and its flexibility in handling various types of data. In the context of Eurostat’s nowcasting competition, we utilized the XGBoost algorithm to predict the values of the Producer Price Index, the Producer Volume Index and the Number of nights spent at tourist accommodation establishments for most European countries. We will delve here into the technicalities of the XGBoost approach, and how we tailored it to our specific nowcasting problem.\n\n\nXGBoost Algorithm\nXGBoost (Chen and Guestrin (2016)) is a gradient boosting algorithm that is particularly well suited for regression and classification problems. It works by building a sequence of decision trees, each tree trying to correct the errors of the previous tree. During the training process, the algorithm iteratively adds decision trees to the model, where each new tree is fit on the residuals (i.e., the errors) of the previous trees. The final prediction is made by adding the output of all the trees.\nTo control for overfitting, XGBoost uses a combination of L1 and L2 regularization, also known as “lasso” and “ridge” regularization, respectively. 
These regularization methods add a penalty term to the loss function, which forces the algorithm to find simpler models. L1 regularization shrinks the less important features’ coefficients to zero, while L2 regularization encourages the coefficients to be small, but does not set them to zero. By using both methods, XGBoost is able to produce models that are both accurate and interpretable.\nAnother key feature of XGBoost is its ability to handle missing values. Rather than imputing missing values with a fixed value or mean, XGBoost assigns them a default direction in the split, allowing the algorithm to learn how to handle missing values during the training process.\nOverall, the XGBoost algorithm has proven to be a powerful tool in the field of machine learning, and its ability to handle large datasets and complex interactions between variables make it well-suited for nowcasting problems like the Eurostat competition.\n\n\nTransforming Time Series\nTo apply the XGBoost algorithm to our nowcasting problem, we first transformed the time series data into a larger dataset tailored for the algorithm. We gathered several sources of data, including financial series, macroeconomic series, and surveys, and created a dataset where each row corresponds to a value per country and per date with many explanatory variables. We added lagged versions of the target variable and some of the explanatory variables as additional features. By doing so, we captured the time series properties of the data and made it suitable for the XGBoost algorithm.\n\n\nGrid Search for Hyperparameters\nTo obtain optimal results from the XGBoost algorithm, we used a grid search technique to find the best combination of hyperparameters for each model. We experimented with various values of hyperparameters, including learning rate, maximum depth, and subsample ratio, to determine which combination of parameters resulted in the best performance. 
The grid search enabled us to identify the best hyperparameters for the model, allowing us to obtain the most accurate predictions. We did not differentiate the hyperparameters for each country as it would have likely caused even more overfitting.\n\n\nTraining XGBoost for Nowcasting\nTo predict our 3 indicators for each country, we trained an XGBoost model for each country independently. We randomly split the data into training and testing sets and trained the model on the training set using the optimal hyperparameters obtained from the grid search. We evaluated the model’s performance on the testing set using various metrics such as mean squared error and mean absolute error." + "objectID": "oil.html#country-specific-forecast-summary-by-model", + "href": "oil.html#country-specific-forecast-summary-by-model", + "title": "European oil and petroleum product deliveries challenge", + "section": "Country-specific forecast summary by model", + "text": "Country-specific forecast summary by model\n\nviewof form = Inputs.form({\n model: Inputs.checkbox(models, {value: models}),\n countries: Inputs.select([\"All\", ...Object.values(country_map)], {multiple: true, value: [\"All\"], width: 50, size: 1})\n})\n\n\n\n\n\n\n\nviewof rows = Inputs.table(summary_table,{\n rows: 25,\n maxWidth: 840,\n multiple: false,\n layout: \"fixed\"\n})\n\n\n\n\n\n\n\n\nhistorical = format_historical_data(data, country_iso)\n\n\n\n\n\n\n\npredictions = format_pred_data(pred, country_iso)\n\n\n\n\n\n\n\nsummary_table = format_summary_table_data(pred, form.countries, form.model, country_map)\n\n\n\n\n\n\n\n\ncountry_map = get_countries_per_challenge(mapping_countries_weights, challenge)\n\n\n\n\n\n\n\ncountry_iso = Object.keys(country_map).find(key => country_map[key] === country);\n\n\n\n\n\n\n\nmodels = [\"REG-ARIMA\", \"DFM\", \"ETS\", \"XGBOOST\", \"LSTM\"]\n\n\n\n\n\n\n\n\nimport { \n format_historical_data,\n format_pred_data,\n format_summary_table_data,\n get_countries_per_challenge,\n 
mapping_countries_weights,\n } from \"./utils/utils.qmd\"\n\n\n\n\n\n\n\nimport {offsetInterval} from '@mootari/offset-slider'\n\n\n\n\n\n\n\nPlot = require(\"https://cdn.jsdelivr.net/npm/@observablehq/plot@0.6.8/dist/plot.umd.min.js\")" }, { "objectID": "methodology.html#lstm", "href": "methodology.html#lstm", "title": "Methodology", "section": "LSTM", "text": "LSTM\n\nIntroduction\nLong Short-Term Memory (LSTM) networks (Hochreiter and Schmidhuber (1997)) are a particularly interesting kind of recurrent neural network (NN) when it comes to time series forecasting. They allow for learning long-term dependencies in the data without losing the ability to grasp short-term relations. They overcome the main flaw of standard recurrent NN models, namely their difficulty in capturing long-term dependencies. LSTMs cope with this problem by incorporating a cell state that stores long-term information and is updated at each step. This update involves incorporating new information but, more importantly, discarding some old information, which regulates the long-term dependence. This is achieved through a particular structure of repeated modules, each of which is composed of four layers that convey information in a specific way (see Figure 2).\n\n\n\nFigure 2: Architecture of an LSTM Unit (Credits: https://d2l.ai/chapter_recurrent-modern/lstm.html)\n\n\n\n\nLSTM model\nDuring the training process, the LSTM model is fed a sequence of inputs, with each input representing a timestep in the time series. The model then generates a corresponding output for each timestep, which is compared to the actual value to compute a loss function. 
The weights of the model are updated through backpropagation, where the gradient of the loss function is propagated backwards through the network.\nOne of the challenges of using LSTMs for time series forecasting is selecting an appropriate window size, i.e. the number of previous timesteps that the model should consider when making each prediction. A larger window size can capture longer-term trends in the data, but may also introduce more noise and complicate the training process. A smaller window size, on the other hand, may be more sensitive to short-term fluctuations in the data, but may not capture longer-term trends as effectively.\nIn the context of the Eurostat competition, the LSTM approach was used to predict the value of several economic indicators, including the Producer prices in industry, the Producer volume in industry and the Number of nights spent at tourist accommodation establishments. The time series data for each country was transformed into a format suitable for LSTM training, with each row representing a single timestep in the time series and the columns representing the various input features.\n\n\nData and estimation\nWe gathered indicators of the macroeconomic environment from different sources. These data include hard indicators (macroeconomic variables, financial indicators, prices) as well as soft indicators (economic surveys). The data are transformed into a large dataset that includes the macroeconomic series and their lags. The series are all scaled. The LSTM model is trained independently for each country, with a grid search used to find the optimal hyperparameters for each model. Overall, the LSTM approach proved to be a powerful tool for time series forecasting, and its ability to capture long-term dependencies in the data made it particularly well-suited for the nowcasting problem at hand." 
+ "objectID": "reproducibility.html", + "href": "reproducibility.html", + "title": "Reproducibility", + "section": "", + "text": "At the INSEE Innovation Team, we greatly value the concept of reproducibility in scientific research. We understand that reproducibility is crucial for ensuring the credibility and trustworthiness of scientific findings. That’s why we prioritize documenting our research methodologies, data analyses, and experimental processes in a transparent and accessible manner. Our commitment to reproducibility is driven by a desire to contribute to the advancement of scientific knowledge and promote rigorous research practices." }, { - "objectID": "methodology.html#similarities-and-differences-to-state-of-the-art-techniques", - "href": "methodology.html#similarities-and-differences-to-state-of-the-art-techniques", - "title": "Methodology", - "section": "Similarities and differences to State-of-the-Art techniques", - "text": "Similarities and differences to State-of-the-Art techniques\nA few specificities of our work:\n\nA significant portion of our database comprises classic macroeconomic indicators, including prices, surveys, and Brent. All of them are open source.\nFor low-dimensional methods, we ensure minimal control over the consistency of selected variables or signs of estimated coefficients.\nWe have compared different methods and included traditional methods in our analysis, but our research also incorporates methods that rely on recent developments.\nWe exclusively use open data sources, avoiding non-free data aggregators, which ensures high reproducibility. However, data retrieval can be more expensive, and some non-free data cannot be utilized.\nWe made a conscious effort to identify new data sources or indicators. 
We also incorporate soft data such as Google Trends in our analysis.\nThe methods we use combine data with diverse frequencies, up to weekly indicators (created from daily data), allowing us to improve the precision of our predictions up until the last day of each month." + "objectID": "reproducibility.html#getting-started", + "href": "reproducibility.html#getting-started", + "title": "Reproducibility", + "section": "Getting started ", + "text": "Getting started \nTo ensure full reproducibility of the results, the project is accompanied by a Docker image that contains all the necessary packages and dependencies. You can pull the Docker image using the following command in your terminal after installing Docker :\ndocker pull inseefrlab/esa-nowcasting-2023:latest\nAlternatively, you can use the Onyxia instance SSPCloud (Comte, Degorre, and Lesur (2022)), a datalab developed by the French National Institute of Statistics and Economic Studies (INSEE) that provides an easy-to-use interface for running the Docker image.\nTo get started with SSPCloud:\n\n\nStep 0\n\n: Go to https://datalab.sspcloud.fr/home. Click on Sign In and then Create an account with your academic or institutional email address.\n\nStep 1\n\n: Click here or on the orange badge on top of the page.\n\nStep 2\n\n: Open the service and follow the instructions regarding username and credentials.\n\nStep 3\n\n: Open a new project by clicking the following file: ~/work/ESA-Nowcasting-2023/ESA-Nowcasting-2023.Rproj.\n\nStep 4\n\n: Ensure all necessary packages are installed by executing the renv::restore() command in the console. If prompted to proceed with the installation, enter y.\n\nYou are all set!" }, { - "objectID": "lessons-learned.html", - "href": "lessons-learned.html", - "title": "Lessons Learned", - "section": "", - "text": "Retrieving data is a costly task, as we exclusively use open data sources and avoid non-free data aggregators. 
In addition to classic macroeconomic indicators that are common to most European countries, identifying interesting indicators specific to certain countries can be expensive. Unfortunately, the short duration of the competition limited our ability to acquire new data sources, such as payment card data, which could have been useful for the tourism challenge. Moreover, for the sake of reproducibility, we decided to exclude non-open-source data from our scope.\nPost-mortem analysis of errors is crucial. However, in the real-time context of nowcasting challenges, having a track record of past residuals before the start of the challenge is not always straightforward. The availability of economic variables can shift within the month, making it difficult to establish a true track record.\nDepending on the model, taking into account the impact of COVID-19 on estimation is relevant. Otherwise, coefficients could be strongly biased, with the variance of COVID-19 points dominating the total series variance.\nOur approach is mainly neutral regarding the choice of variables, with an automatic selection procedure and a focus on treating all countries in the same way. This mainly neutral approach is partly due to a lack of time, but fine-tuning country by country can also be a useful approach.\n“Soft” data, such as Google Trends, appears to provide some information for the tourism challenge, but less so for production prices and production, at least during a “stationary” period.\nUsing nowcasting techniques on disaggregated variables is an interesting option, particularly for prices, which have exhibited distinct dynamics across different products in recent times. However, implementing this approach can be expensive, as it necessitates the use of different models for each disaggregated level and appropriate re-aggregation to obtain the final nowcast value. 
Given our constraints with respect to time, we were unable to explore this approach thoroughly.\nFor most of our models, the last available value of the indicator often has a much stronger influence than we expected. Because of this, even in our most recent results we may observe a lag between the true value of the indicators and our predictions based on past data. This shows that we were not able to identify all the external factors influencing the indicators. With more resources and a larger time window, we would likely be able to identify additional explanatory variables to improve the predictions." }, { "objectID": "reproducibility.html", "href": "reproducibility.html", "title": "Reproducibility", "section": "", "text": "At the INSEE Innovation Team, we greatly value the concept of reproducibility in scientific research. We understand that reproducibility is crucial for ensuring the credibility and trustworthiness of scientific findings. That’s why we prioritize documenting our research methodologies, data analyses, and experimental processes in a transparent and accessible manner. Our commitment to reproducibility is driven by a desire to contribute to the advancement of scientific knowledge and promote rigorous research practices." }, { "objectID": "reproducibility.html#getting-started", "href": "reproducibility.html#getting-started", "title": "Reproducibility", "section": "Getting started ", "text": "Getting started \nTo ensure full reproducibility of the results, the project is accompanied by a Docker image that contains all the necessary packages and dependencies. You can pull the Docker image using the following command in your terminal after installing Docker:\ndocker pull inseefrlab/esa-nowcasting-2023:latest\nAlternatively, you can use the Onyxia instance SSPCloud (Comte, Degorre, and Lesur (2022)), a datalab developed by the French National Institute of Statistics and Economic Studies (INSEE) that provides an easy-to-use interface for running the Docker image.\nTo get started with SSPCloud:\n\n\nStep 0\n\n: Go to https://datalab.sspcloud.fr/home. Click on Sign In and then Create an account with your academic or institutional email address.\n\nStep 1\n\n: Click here or on the orange badge on top of the page.\n\nStep 2\n\n: Open the service and follow the instructions regarding username and credentials.\n\nStep 3\n\n: Open a new project by clicking the following file: ~/work/ESA-Nowcasting-2023/ESA-Nowcasting-2023.Rproj.\n\nStep 4\n\n: Ensure all necessary packages are installed by executing the renv::restore() command in the console. If prompted to proceed with the installation, enter y.\n\nYou are all set!" }, { "objectID": "reproducibility.html#codes", "href": "reproducibility.html#codes", "title": "Reproducibility", "section": "Codes", "text": "Codes\n\nFunctions\nAll functions used in the project are organized by theme in the R/ folder:\nESA-Nowcasting-2023\n└─── R\n │ data_preprocessing.R\n │ data_retrieval.R\n │ dfms_functions.R\n │ ets_functions.R\n │ lstm_functions.R\n │ post_mortem_functions.R\n │ regarima_functions.R\n │ saving_functions.R\n │ XGBoost_functions.R\n\n\n\nConfiguration files\nThe project is composed of three configuration files that enable the operation of the models and the challenges as a whole. The first file, challenges.yaml, contains information about the challenges themselves, including the countries used for each challenge and the current dates.\nThe second file, models.yaml, is the backbone of the project, as it contains all of the parameters used for all the models and challenges. This file is responsible for ensuring that the models are appropriately tuned. Any adjustments made to this file can have a significant impact on the accuracy of the models, and thus it is vital that the parameters are fine-tuned carefully.\nFinally, the data.yaml configuration file is responsible for specifying all the relevant information about the data sources used in the challenge. 
It is essential that this file be kept accurately up to date, as changes or updates to data sources can have a significant impact on the accuracy of the models.\n\n\nPipelines\nThe project relies heavily on the targets package, a tool for creating and running reproducible pipelines in R. targets is particularly useful for managing large or complex data sets, as it allows you to define each task in a pipeline as a separate function, and then run the pipeline by calling the targets::tar_make() function. This ensures that tasks are run in the correct order, and can save time by only running tasks that are out of date or have not been run before.\nThe project is decomposed into four different pipelines specified in the targets_yaml file:\n- data: `run_data.R`\n- gas: `run_gas.R`\n- oil: `run_oil.R`\n- electricity: `run_electricity.R`\nThe first pipeline retrieves all the data necessary for the different challenges, while the other three run the five models for each challenge independently. Each pipeline can be run using the following command: targets::tar_make(script = \"run_***.R\").\n\n\n\n\n\n\nSaving to s3\n\n\n\nNote that the data used for the challenges is stored in a private bucket, and writing permissions are required to run the pipeline as is. Hence, if you don’t have access to our private bucket, you have to run all 4 pipelines with the parameter SAVE_TO_S3 set to False." 
}, { - "objectID": "utils/utils.html", - "href": "utils/utils.html", - "title": "European Statistics Awards for Nowcasting", - "section": "", - "text": "mapping_countries_weights = [\n {\"Name\": \"Austria\", \"ISO2\": \"AT\", \"GAS\": 1.56, \"OIL\": 1.93, \"ELECTRICITY\": 1.90},\n {\"Name\": \"Belgium\", \"ISO2\": \"BE\", \"GAS\": 1.59, \"OIL\": 1.78, \"ELECTRICITY\": 1.99},\n {\"Name\": \"Bulgaria\", \"ISO2\": \"BG\", \"GAS\": 1.66, \"OIL\": 1.43, \"ELECTRICITY\": 1.06},\n {\"Name\": \"Cyprus\", \"ISO2\": \"CY\", \"GAS\": undefined, \"OIL\": 1.49, \"ELECTRICITY\": 0.50},\n {\"Name\": \"Czech Republic\", \"ISO2\": \"CZ\", \"GAS\": 1.46, \"OIL\": 1.95, \"ELECTRICITY\": 1.60},\n {\"Name\": \"Germany\", \"ISO2\": \"DE\", \"GAS\": 1.55, \"OIL\": 1.86, \"ELECTRICITY\": 1.99},\n {\"Name\": \"Denmark\", \"ISO2\": \"DK\", \"GAS\": 1.47, \"OIL\": 1.74, \"ELECTRICITY\": 1.69},\n {\"Name\": \"Estonia\", \"ISO2\": \"EE\", \"GAS\": 1.28, \"OIL\": 0.89, \"ELECTRICITY\": 1.12},\n {\"Name\": \"Greece\", \"ISO2\": \"EL\", \"GAS\": 1.69, \"OIL\": 1.45, \"ELECTRICITY\": 0.90},\n {\"Name\": \"Spain\", \"ISO2\": \"ES\", \"GAS\": 2.00, \"OIL\": 2.00, \"ELECTRICITY\": 1.65},\n {\"Name\": \"Finland\", \"ISO2\": \"FI\", \"GAS\": 1.62, \"OIL\": 0.50, \"ELECTRICITY\": 1.25},\n {\"Name\": \"France\", \"ISO2\": \"FR\", \"GAS\": 1.40, \"OIL\": 1.77, \"ELECTRICITY\": 0.92},\n {\"Name\": \"Croatia\", \"ISO2\": \"HR\", \"GAS\": 1.64, \"OIL\": 1.67, \"ELECTRICITY\": 1.35},\n {\"Name\": \"Hungary\", \"ISO2\": \"HU\", \"GAS\": 1.46, \"OIL\": 1.75, \"ELECTRICITY\": 2.00},\n {\"Name\": \"Ireland\", \"ISO2\": \"IE\", \"GAS\": 1.07, \"OIL\": 1.22, \"ELECTRICITY\": 1.94},\n {\"Name\": \"Italy\", \"ISO2\": \"IT\", \"GAS\": 1.66, \"OIL\": 1.77, \"ELECTRICITY\": 1.46},\n {\"Name\": \"Lithuania\", \"ISO2\": \"LT\", \"GAS\": 1.36, \"OIL\": 1.69, \"ELECTRICITY\": 1.70},\n {\"Name\": \"Luxembourg\", \"ISO2\": \"LU\", \"GAS\": 1.53, \"OIL\": 1.79, \"ELECTRICITY\": 1.42},\n {\"Name\": \"Latvia\", \"ISO2\": 
\"LV\", \"GAS\": 1.30, \"OIL\": 1.75, \"ELECTRICITY\": 1.53},\n {\"Name\": \"Malta\", \"ISO2\": \"MT\", \"GAS\": 1.47, \"OIL\": 0.50, \"ELECTRICITY\": 0.91},\n {\"Name\": \"Netherlands\", \"ISO2\": \"NL\", \"GAS\": 1.68, \"OIL\": 1.95, \"ELECTRICITY\": 1.85},\n {\"Name\": \"Poland\", \"ISO2\": \"PL\", \"GAS\": 1.76, \"OIL\": 1.91, \"ELECTRICITY\": 1.96},\n {\"Name\": \"Portugal\", \"ISO2\": \"PT\", \"GAS\": 1.88, \"OIL\": 1.72, \"ELECTRICITY\": 1.48},\n {\"Name\": \"Romania\", \"ISO2\": \"RO\", \"GAS\": 1.61, \"OIL\": 1.48, \"ELECTRICITY\": 1.70},\n {\"Name\": \"Sweden\", \"ISO2\": \"SE\", \"GAS\": 1.30, \"OIL\": 0.51, \"ELECTRICITY\": 1.08},\n {\"Name\": \"Slovenia\", \"ISO2\": \"SI\", \"GAS\": 1.70, \"OIL\": 1.52, \"ELECTRICITY\": 1.90},\n {\"Name\": \"Slovakia\", \"ISO2\": \"SK\", \"GAS\": 0.50, \"OIL\": 1.80, \"ELECTRICITY\": 1.87},\n];\n\n\n\n\n\n\n\nfunction get_weights_per_challenge(mapping, challenge) {\n return mapping.filter(d => d[challenge] != undefined).map(({ Name, ISO2, [challenge]: Weight }) => ({ Name, ISO2, Weight }))\n }\n\n\n\n\n\n\n\nfunction get_countries_per_challenge(mapping, challenge) {\n return mapping\n .filter(d => d[challenge] != undefined)\n .reduce((acc, country) => { acc[country.ISO2] = country.Name;\n return acc; }, {}\n )\n }\n\n\n\n\n\n\n\nmap_country_name = mapping_countries_weights.reduce((acc, country) => {\n acc[country.ISO2] = country.Name;\n return acc;\n}, {})\n\n\n\n\n\n\n\nfunction unique(data, accessor) {\nreturn Array.from(new Set(accessor ? 
data.map(accessor) : data));\n }\n\n\n\n\n\n\n\ndateParser = (dateString) => {\n const [year, month, day] = dateString.split('-')\n const date = new Date(Date.UTC(year, month - 1, day, 0, 0, 0))\n const timezoneOffset = date.getTimezoneOffset()\n date.setMinutes(date.getMinutes() - timezoneOffset)\n return date\n}\n\n\n\n\n\n\n\nfunction format_historical_data(data, country) {\n\n let data_typed = transpose(data).map( d => (\n {\n date: dateParser(d.time),\n values: d.values,\n geo: d.geo\n } \n )\n )\n \nreturn data_typed.filter(d => d.geo == country);\n }\n\n\n\n\n\n\n\nfunction format_pred_data(data, country) {\n\n let pred_typed = transpose(data).map( d => (\n {\n date: dateParser(d.Date),\n values: d.value,\n geo: d.Country,\n model: d.Entries\n } \n )\n )\n \n return pred_typed.filter(d => d.geo == country);\n }\n\n\n\n\n\n\n\nfunction format_summary_table_data(data, country, model, country_map) {\n let table = transpose(data).map( d => (\n {\n Date: dateParser(d.Date),\n Model: d.Entries,\n Country: d.Country,\n Forecast: d.value\n } \n )\n )\n .map(item => {\n const dateStr = item.Date;\n const date = new Date(dateStr);\n const formattedDate = date.toLocaleString('en-US', { month: 'long', year: 'numeric' });\n return { ...item, Date: formattedDate };\n })\n .map(item => {\n const geoCode = item.Country;\n const countryName = country_map[geoCode];\n return { ...item, Country: countryName };\n })\n .filter(d => model.includes(d.Model) && (country.includes(\"All\") || country.includes(d.Country)));\n\n return table\n }\n\n\n\n\n\n\n\nfunction format_errors_data(data, country) {\n\n let data_formatted = transpose(data).map( d => (\n {\n Date: dateParser(d.time),\n Model: d.Entries,\n Country: d.geo,\n Errors: d.error_squared\n } \n )\n )\n .filter(d => d.Country == country)\n\n return data_formatted.filter(d => d.Country == country);\n }\n\n\n\n\n\n\n\nfunction format_ave_errors_data(data, country) {\n return transpose(data).filter(d => d.geo == country);\n }" 
+ "objectID": "reproducibility.html#replicating-past-results", + "href": "reproducibility.html#replicating-past-results", + "title": "Reproducibility", + "section": "Replicating past results", + "text": "Replicating past results\nWe have made it a priority to ensure the full reproducibility of all our past submissions. In order to achieve this, we have taken the necessary steps to automatically save the data used for each specific submission in a publicly accessible S3 bucket. This allows anyone to easily access the exact datasets that were utilized in our analyses. In the event that there have been changes to the model codes, it is simply a matter of checking out the commit corresponding to the submission date and adjusting the relevant date variables in the challenges.yaml configuration file. By combining the code retrieval with the availability of the specific datasets, we have established a robust framework that enables the replication and verification of our past results. This commitment to transparency and reproducibility ensures that the findings and outcomes of our submissions can be reliably validated and built upon by anyone.\n\n\n\n\n\n\nAny issues?\n\n\n\nIf you encounter any difficulties or require assistance in replicating our past results, please do not hesitate to reach out to us. We understand that the replication process can sometimes be challenging, and we are here to provide support and guidance. Our team is available to answer any questions, clarify any uncertainties, and offer further explanations regarding the methodologies, data, or code used in our previous submissions." }, { "objectID": "post-mortem-electricity.html", @@ -90,55 +76,6 @@ "section": "Mean square relative error", "text": "Mean square relative error\nThis interactive graph displays the mean square relative error for each of the models used in the challenge, ranked by their performance from the least accurate to the most accurate. 
The mean square relative error is a statistical measure that provides an average of the square relative error across all the forecasts made by a given model.\nThese errors can be weighted by a factor, as was the case in the official evaluation of the challenge. The role of the weights is to reflect the difficulty of predicting the point estimate of the target variable for the corresponding country.\n\\[\nMSRE = \\frac{1}{n}\\sum\\limits_{i=1}^n\\left(\\frac{Y_i - R_i}{R_i}\\right)^2\n\\]\n\nviewof doweighted = Inputs.toggle({label: \"Weighted mean\", value: false})\n\n\n\n\n\n\n\nPlot.plot({\n x: {\n domain: d3.sort(ave_errors, d => -d.MSRE).map(d => d.Entries),\n label: null\n },\n y: {\n grid: true,\n transform: doweighted ? d => d / 1e6 * weight : d => d / 1e6\n\n },\n color: {\n legend: true\n },\n marks: [\n Plot.barY(ave_errors, {\n tip: true,\n x: \"Entries\", \n y: \"MSRE\",\n fill: \"Entries\",\n sort: {\n x: {value: \"y\", reverse: true}\n },\n title: (d) =>\n\n `${d.Entries}: \\n ${doweighted ? 
Math.round(d.MSRE) / 1e6 * weight : Math.round(d.MSRE) / 1e6} millions (n = ${d.N}) `\n }),\n Plot.ruleY([0])\n ]\n })\n\n\n\n\n\n\n\n\n\nhistorical = format_historical_data(data, country_iso)\n\n\n\n\n\n\n\npredictions = format_pred_data(pred, country_iso)\n\n\n\n\n\n\n\nerrors = format_errors_data(errors_data, country_iso)\n\n\n\n\n\n\n\nave_errors = format_ave_errors_data(ave_errors_data, country_iso)\n\n\n\n\n\n\n\n\nweight = get_weights_per_challenge(mapping_countries_weights, challenge).filter(d => d.Name == country)[0].Weight\n\n\n\n\n\n\n\ncountry_map = get_countries_per_challenge(mapping_countries_weights, challenge)\n\n\n\n\n\n\n\ncountry_iso = Object.keys(country_map).find(key => country_map[key] === country);\n\n\n\n\n\n\n\nmodels = [\"REGARIMA\", \"DFM\", \"ETS\", \"XGBOOST\", \"LSTM\"]\n\n\n\n\n\n\n\n\nimport { \n format_historical_data,\n format_pred_data,\n format_errors_data,\n format_ave_errors_data,\n mapping_countries_weights,\n get_countries_per_challenge,\n get_weights_per_challenge,\n } from \"./utils/utils.qmd\"\n\n\n\n\n\n\n\nPlot = require(\"https://cdn.jsdelivr.net/npm/@observablehq/plot@0.6.8/dist/plot.umd.min.js\")" }, - { - "objectID": "reproducibility.html", - "href": "reproducibility.html", - "title": "Reproducibility", - "section": "", - "text": "At the INSEE Innovation Team, we greatly value the concept of reproducibility in scientific research. We understand that reproducibility is crucial for ensuring the credibility and trustworthiness of scientific findings. That’s why we prioritize documenting our research methodologies, data analyses, and experimental processes in a transparent and accessible manner. Our commitment to reproducibility is driven by a desire to contribute to the advancement of scientific knowledge and promote rigorous research practices." 
- }, - { - "objectID": "reproducibility.html#getting-started", - "href": "reproducibility.html#getting-started", - "title": "Reproducibility", - "section": "Getting started ", - "text": "Getting started \nTo ensure full reproducibility of the results, the project is accompanied by a Docker image that contains all the necessary packages and dependencies. You can pull the Docker image using the following command in your terminal after installing Docker :\ndocker pull inseefrlab/esa-nowcasting-2023:latest\nAlternatively, you can use the Onyxia instance SSPCloud (Comte, Degorre, and Lesur (2022)), a datalab developed by the French National Institute of Statistics and Economic Studies (INSEE) that provides an easy-to-use interface for running the Docker image.\nTo get started with SSPCloud:\n\n\nStep 0\n\n: Go to https://datalab.sspcloud.fr/home. Click on Sign In and then Create an account with your academic or institutional email address.\n\nStep 1\n\n: Click here or on the orange badge on top of the page.\n\nStep 2\n\n: Open the service and follow the instructions regarding username and credentials.\n\nStep 3\n\n: Open a new project by clicking the following file: ~/work/ESA-Nowcasting-2023/ESA-Nowcasting-2023.Rproj.\n\nStep 4\n\n: Ensure all necessary packages are installed by executing the renv::restore() command in the console. If prompted to proceed with the installation, enter y.\n\nYou are all set!" 
- }, - { - "objectID": "reproducibility.html#codes", - "href": "reproducibility.html#codes", - "title": "Reproducibility", - "section": "Codes", - "text": "Codes\n\nFunctions\nAll functions used in the project are organized by theme in the R/ folder :\nESA-Nowcasting-2023\n└─── R\n │ data_preprocessing.R\n │ data_retrieval.R\n │ dfms_functions.R\n │ ets_functions.R\n │ lstm_functions.R\n │ post_mortem_functions.R\n │ regarima_functions.R\n │ saving_functions.R\n │ XGBoost_functions.R\n\n\n\nConfiguration files\nThe project is composed of three configuration files that enable the operation of the models and the challenges as a whole. The first file, challenges.yaml, contains information about the challenges themselves, including the countries used for each challenge and the current dates.\nThe second file, models.yaml, is the backbone of the project as it contains all of the parameters used for all the models and challenges. This file is responsible for ensuring that the models are appropriately tuned. Any adjustments made to this file can have a significant impact on the accuracy of the models, and thus it is vital that the parameters are fine-tuned carefully.\nFinally, the data.yaml configuration file is responsible for specifying all the relevant information about the data sources used in the challenge. It is essential that this file is accurately updated as changes to data sources or updates can have a significant impact on the accuracy of the models.\n\n\nPipelines\nThe project is deeply relying on the target package, which is a tool for creating and running reproducible pipelines in R. target is particularly useful for managing large or complex data sets, as it allows you to define each task in a pipeline as a separate function, and then run the pipeline by calling the targets::tar_make() function. 
This ensures that tasks are run in the correct order, and can save time by only running tasks that are out of date or have not been run before.\nThe project is decomposed into four different pipelines specified in the targets_yaml file:\n- data: `run_data.R`\n- gas: `run_gas.R`\n- oil: `run_oil.R`\n- electricity: `run_electricity.R`\nThe first pipeline retrieves all the data necessary for the different challenges, while the other three run the five models for each challenge independently. Each pipeline can be run using the following command: targets::tar_make(script = \"run_***.R\").\n\n\n\n\n\n\nSaving to s3\n\n\n\nNote that the data used for the challenges is stored in a private bucket, and writing permissions are required to run the pipeline as is. Hence, if you don’t have access to our private bucket you have to run all 4 pipelines with the parameter SAVE_TO_S3 equals to False." - }, - { - "objectID": "reproducibility.html#replicating-past-results", - "href": "reproducibility.html#replicating-past-results", - "title": "Reproducibility", - "section": "Replicating past results", - "text": "Replicating past results\nWe have made it a priority to ensure the full reproducibility of all our past submissions. In order to achieve this, we have taken the necessary steps to automatically save the data used for each specific submission in a publicly accessible S3 bucket. This allows anyone to easily access the exact datasets that were utilized in our analyses. In the event that there have been changes to the model codes, it is simply a matter of checking out the commit corresponding to the submission date and adjusting the relevant date variables in the challenges.yaml configuration file. By combining the code retrieval with the availability of the specific datasets, we have established a robust framework that enables the replication and verification of our past results. 
This commitment to transparency and reproducibility ensures that the findings and outcomes of our submissions can be reliably validated and built upon by anyone.\n\n\n\n\n\n\nAny issues?\n\n\n\nIf you encounter any difficulties or require assistance in replicating our past results, please do not hesitate to reach out to us. We understand that the replication process can sometimes be challenging, and we are here to provide support and guidance. Our team is available to answer any questions, clarify any uncertainties, and offer further explanations regarding the methodologies, data, or code used in our previous submissions." - }, - { - "objectID": "oil.html", - "href": "oil.html", - "title": "European oil and petroleum product deliveries challenge", - "section": "", - "text": "challenge = \"OIL\"\nThe monthly PVI indicator represents the production index in industry. The objective of the index is to measure changes in the volume of output at monthly intervals. It provides a measure of the volume trend in value added over a given reference period. The production index is calculated in the form of a Laspeyres type index." 
- }, - { - "objectID": "oil.html#next-forecast-for-a-given-country", - "href": "oil.html#next-forecast-for-a-given-country", - "title": "European oil and petroleum product deliveries challenge", - "section": "Next forecast for a given country", - "text": "Next forecast for a given country\n\nviewof country = Inputs.select(Object.values(country_map), {\n label: html`<b>Select a country:</b>`,\n placeholder: \"Enter a country name\",\n unique: true\n })\n\n\n\n\n\n\n\n\n\nPlot.plot({\n grid: true,\n y: {\n label: \"↑ Production volume in industry\",\n }, \n x: {\n label: \"Year\",\n domain: range\n },\n marks: [\n Plot.line(historical, {\n tip: true,\n x: \"date\", \n y: \"values\", \n stroke: \"black\",\n title: (d) =>\n `${d.date.toLocaleString(\"en-UK\", {\n month: \"long\",\n year: \"numeric\"\n })}\\n ${d.values} `\n }),\n Plot.dot(predictions, {\n tip: true,\n x: \"date\", \n y: \"values\",\n fill: \"model\",\n title: (d) =>\n `${d.model}\\n ${d.date.toLocaleString(\"en-UK\", {\n month: \"long\",\n year: \"numeric\"\n })} : ${d.values} `\n })\n ],\n color: {legend: true}\n})\n\n\n\n\n\n\n\n\n\n\n\ndates = {\n const data = historical.map(d => d.date)\n data.push(predictions.map(d => d.date)[0])\n return data \n}\n\nviewof range = offsetInterval(dates, {\n value: [ dates[dates.length-90], dates[dates.length-1] ],\n format: ([a, b]) => htl.html`<span ${{\n style: \"display: flex; justify-content: space-between\"\n }}>\n ${a.toISOString(\"en-UK\").slice(0, 10)}\n        \n        \n        \n        \n        \n ${b.toISOString(\"en-UK\").slice(0, 10)}\n </span>`\n})" - }, - { - "objectID": "oil.html#country-specific-forecast-summary-by-model", - "href": "oil.html#country-specific-forecast-summary-by-model", - "title": "European oil and petroleum product deliveries challenge", - "section": "Country-specific forecast summary by model", - "text": "Country-specific forecast summary by model\n\nviewof form = Inputs.form({\n model: Inputs.checkbox(models, {value: 
models}),\n countries: Inputs.select([\"All\", ...Object.values(country_map)], {multiple: true, value: [\"All\"], width: 50, size: 1})\n})\n\n\n\n\n\n\n\nviewof rows = Inputs.table(summary_table,{\n rows: 25,\n maxWidth: 840,\n multiple: false,\n layout: \"fixed\"\n})\n\n\n\n\n\n\n\n\nhistorical = format_historical_data(data, country_iso)\n\n\n\n\n\n\n\npredictions = format_pred_data(pred, country_iso)\n\n\n\n\n\n\n\nsummary_table = format_summary_table_data(pred, form.countries, form.model, country_map)\n\n\n\n\n\n\n\n\ncountry_map = get_countries_per_challenge(mapping_countries_weights, challenge)\n\n\n\n\n\n\n\ncountry_iso = Object.keys(country_map).find(key => country_map[key] === country);\n\n\n\n\n\n\n\nmodels = [\"REG-ARIMA\", \"DFM\", \"ETS\", \"XGBOOST\", \"LSTM\"]\n\n\n\n\n\n\n\n\nimport { \n format_historical_data,\n format_pred_data,\n format_summary_table_data,\n get_countries_per_challenge,\n mapping_countries_weights,\n } from \"./utils/utils.qmd\"\n\n\n\n\n\n\n\nimport {offsetInterval} from '@mootari/offset-slider'\n\n\n\n\n\n\n\nPlot = require(\"https://cdn.jsdelivr.net/npm/@observablehq/plot@0.6.8/dist/plot.umd.min.js\")" - }, { "objectID": "post-mortem-gas.html", "href": "post-mortem-gas.html", @@ -167,6 +104,97 @@ "section": "Mean square relative error", "text": "Mean square relative error\nThis interactive graph displays the mean square relative error for each of the models used in the challenge, ranked by their performance from the least accurate to the most accurate. The mean square relative error is a statistical measure that provides an average of the square relative error across all the forecasts made by a given model.\nThese errors can be weighted by a factor, as was the case in the official evaluation of the challenge. 
The role of the weights is to reflect the difficulty of predicting the point estimate of the target variable for the corresponding country.\n\\[\nMSRE = \\frac{1}{n}\\sum\\limits_{i=1}^n\\left(\\frac{Y_i - R_i}{R_i}\\right)^2\n\\]\n\nviewof doweighted = Inputs.toggle({label: \"Weighted mean\", value: false})\n\n\n\n\n\n\n\nPlot.plot({\n x: {\n domain: d3.sort(ave_errors, d => -d.MSRE).map(d => d.Entries),\n label: null\n },\n y: {\n grid: true,\n transform: doweighted ? d => d * weight : null\n },\n color: {\n legend: true\n },\n marks: [\n Plot.barY(ave_errors, {\n tip: true, \n x: \"Entries\", \n y: \"MSRE\",\n fill: \"Entries\",\n sort: {\n x: {value: \"y\", reverse: true}\n },\n title: (d) =>\n `${d.Entries}: \\n ${doweighted ? Math.round(d.MSRE * weight * 10000) / 10000 : Math.round(d.MSRE * 10000) / 10000} (n = ${d.N}) `\n }),\n Plot.ruleY([0])\n ]\n })\n\n\n\n\n\n\n\n\nhistorical = format_historical_data(data, country_iso)\n\n\n\n\n\n\n\npredictions = format_pred_data(pred, country_iso)\n\n\n\n\n\n\n\nerrors = format_errors_data(errors_data, country_iso)\n\n\n\n\n\n\n\nave_errors = format_ave_errors_data(ave_errors_data, country_iso)\n\n\n\n\n\n\n\n\nweight = get_weights_per_challenge(mapping_countries_weights, challenge).filter(d => d.Name == country)[0].Weight\n\n\n\n\n\n\n\ncountry_map = get_countries_per_challenge(mapping_countries_weights, challenge)\n\n\n\n\n\n\n\ncountry_iso = Object.keys(country_map).find(key => country_map[key] === country);\n\n\n\n\n\n\n\nmodels = [\"REGARIMA\", \"DFM\", \"ETS\", \"XGBOOST\", \"LSTM\"]\n\n\n\n\n\n\n\n\nimport { \n format_historical_data,\n format_pred_data,\n format_errors_data,\n format_ave_errors_data,\n mapping_countries_weights,\n get_countries_per_challenge,\n get_weights_per_challenge,\n } from \"./utils/utils.qmd\"\n\n\n\n\n\n\n\nPlot = require(\"https://cdn.jsdelivr.net/npm/@observablehq/plot@0.6.8/dist/plot.umd.min.js\")"
  },
+  {
+    "objectID": "utils/utils.html",
+    "href": "utils/utils.html",
+    "title": 
"European Statistics Awards for Nowcasting", + "section": "", + "text": "mapping_countries_weights = [\n {\"Name\": \"Austria\", \"ISO2\": \"AT\", \"GAS\": 1.56, \"OIL\": 1.93, \"ELECTRICITY\": 1.90},\n {\"Name\": \"Belgium\", \"ISO2\": \"BE\", \"GAS\": 1.59, \"OIL\": 1.78, \"ELECTRICITY\": 1.99},\n {\"Name\": \"Bulgaria\", \"ISO2\": \"BG\", \"GAS\": 1.66, \"OIL\": 1.43, \"ELECTRICITY\": 1.06},\n {\"Name\": \"Cyprus\", \"ISO2\": \"CY\", \"GAS\": undefined, \"OIL\": 1.49, \"ELECTRICITY\": 0.50},\n {\"Name\": \"Czech Republic\", \"ISO2\": \"CZ\", \"GAS\": 1.46, \"OIL\": 1.95, \"ELECTRICITY\": 1.60},\n {\"Name\": \"Germany\", \"ISO2\": \"DE\", \"GAS\": 1.55, \"OIL\": 1.86, \"ELECTRICITY\": 1.99},\n {\"Name\": \"Denmark\", \"ISO2\": \"DK\", \"GAS\": 1.47, \"OIL\": 1.74, \"ELECTRICITY\": 1.69},\n {\"Name\": \"Estonia\", \"ISO2\": \"EE\", \"GAS\": 1.28, \"OIL\": 0.89, \"ELECTRICITY\": 1.12},\n {\"Name\": \"Greece\", \"ISO2\": \"EL\", \"GAS\": 1.69, \"OIL\": 1.45, \"ELECTRICITY\": 0.90},\n {\"Name\": \"Spain\", \"ISO2\": \"ES\", \"GAS\": 2.00, \"OIL\": 2.00, \"ELECTRICITY\": 1.65},\n {\"Name\": \"Finland\", \"ISO2\": \"FI\", \"GAS\": 1.62, \"OIL\": 0.50, \"ELECTRICITY\": 1.25},\n {\"Name\": \"France\", \"ISO2\": \"FR\", \"GAS\": 1.40, \"OIL\": 1.77, \"ELECTRICITY\": 0.92},\n {\"Name\": \"Croatia\", \"ISO2\": \"HR\", \"GAS\": 1.64, \"OIL\": 1.67, \"ELECTRICITY\": 1.35},\n {\"Name\": \"Hungary\", \"ISO2\": \"HU\", \"GAS\": 1.46, \"OIL\": 1.75, \"ELECTRICITY\": 2.00},\n {\"Name\": \"Ireland\", \"ISO2\": \"IE\", \"GAS\": 1.07, \"OIL\": 1.22, \"ELECTRICITY\": 1.94},\n {\"Name\": \"Italy\", \"ISO2\": \"IT\", \"GAS\": 1.66, \"OIL\": 1.77, \"ELECTRICITY\": 1.46},\n {\"Name\": \"Lithuania\", \"ISO2\": \"LT\", \"GAS\": 1.36, \"OIL\": 1.69, \"ELECTRICITY\": 1.70},\n {\"Name\": \"Luxembourg\", \"ISO2\": \"LU\", \"GAS\": 1.53, \"OIL\": 1.79, \"ELECTRICITY\": 1.42},\n {\"Name\": \"Latvia\", \"ISO2\": \"LV\", \"GAS\": 1.30, \"OIL\": 1.75, \"ELECTRICITY\": 1.53},\n {\"Name\": \"Malta\", 
\"ISO2\": \"MT\", \"GAS\": 1.47, \"OIL\": 0.50, \"ELECTRICITY\": 0.91},\n {\"Name\": \"Netherlands\", \"ISO2\": \"NL\", \"GAS\": 1.68, \"OIL\": 1.95, \"ELECTRICITY\": 1.85},\n {\"Name\": \"Poland\", \"ISO2\": \"PL\", \"GAS\": 1.76, \"OIL\": 1.91, \"ELECTRICITY\": 1.96},\n {\"Name\": \"Portugal\", \"ISO2\": \"PT\", \"GAS\": 1.88, \"OIL\": 1.72, \"ELECTRICITY\": 1.48},\n {\"Name\": \"Romania\", \"ISO2\": \"RO\", \"GAS\": 1.61, \"OIL\": 1.48, \"ELECTRICITY\": 1.70},\n {\"Name\": \"Sweden\", \"ISO2\": \"SE\", \"GAS\": 1.30, \"OIL\": 0.51, \"ELECTRICITY\": 1.08},\n {\"Name\": \"Slovenia\", \"ISO2\": \"SI\", \"GAS\": 1.70, \"OIL\": 1.52, \"ELECTRICITY\": 1.90},\n {\"Name\": \"Slovakia\", \"ISO2\": \"SK\", \"GAS\": 0.50, \"OIL\": 1.80, \"ELECTRICITY\": 1.87},\n];\n\n\n\n\n\n\n\nfunction get_weights_per_challenge(mapping, challenge) {\n return mapping.filter(d => d[challenge] != undefined).map(({ Name, ISO2, [challenge]: Weight }) => ({ Name, ISO2, Weight }))\n }\n\n\n\n\n\n\n\nfunction get_countries_per_challenge(mapping, challenge) {\n return mapping\n .filter(d => d[challenge] != undefined)\n .reduce((acc, country) => { acc[country.ISO2] = country.Name;\n return acc; }, {}\n )\n }\n\n\n\n\n\n\n\nmap_country_name = mapping_countries_weights.reduce((acc, country) => {\n acc[country.ISO2] = country.Name;\n return acc;\n}, {})\n\n\n\n\n\n\n\nfunction unique(data, accessor) {\nreturn Array.from(new Set(accessor ? 
data.map(accessor) : data));\n }\n\n\n\n\n\n\n\ndateParser = (dateString) => {\n const [year, month, day] = dateString.split('-')\n const date = new Date(Date.UTC(year, month - 1, day, 0, 0, 0))\n const timezoneOffset = date.getTimezoneOffset()\n date.setMinutes(date.getMinutes() - timezoneOffset)\n return date\n}\n\n\n\n\n\n\n\nfunction format_historical_data(data, country) {\n\n let data_typed = transpose(data).map( d => (\n {\n date: dateParser(d.time),\n values: d.values,\n geo: d.geo\n } \n )\n )\n \nreturn data_typed.filter(d => d.geo == country);\n }\n\n\n\n\n\n\n\nfunction format_pred_data(data, country) {\n\n let pred_typed = transpose(data).map( d => (\n {\n date: dateParser(d.Date),\n values: d.value,\n geo: d.Country,\n model: d.Entries\n } \n )\n )\n \n return pred_typed.filter(d => d.geo == country);\n }\n\n\n\n\n\n\n\nfunction format_summary_table_data(data, country, model, country_map) {\n let table = transpose(data).map( d => (\n {\n Date: dateParser(d.Date),\n Model: d.Entries,\n Country: d.Country,\n Forecast: d.value\n } \n )\n )\n .map(item => {\n const dateStr = item.Date;\n const date = new Date(dateStr);\n const formattedDate = date.toLocaleString('en-US', { month: 'long', year: 'numeric' });\n return { ...item, Date: formattedDate };\n })\n .map(item => {\n const geoCode = item.Country;\n const countryName = country_map[geoCode];\n return { ...item, Country: countryName };\n })\n .filter(d => model.includes(d.Model) && (country.includes(\"All\") || country.includes(d.Country)));\n\n return table\n }\n\n\n\n\n\n\n\nfunction format_errors_data(data, country) {\n\n let data_formatted = transpose(data).map( d => (\n {\n Date: dateParser(d.time),\n Model: d.Entries,\n Country: d.geo,\n Errors: d.error_squared\n } \n )\n )\n .filter(d => d.Country == country)\n\n return data_formatted;\n }\n\n\n\n\n\n\n\nfunction format_ave_errors_data(data, country) {\n return transpose(data).filter(d => d.geo == country);\n }"
+  },
+  {
+    "objectID": "methodology.html",
+    "href": "methodology.html",
+    "title": "Methodology",
+    "section": "",
+    "text": "Our team employed a comprehensive methodology utilizing both standard econometric models and machine learning models to accurately nowcast the target variables. Specifically, we combined three econometric models, RegARIMA, Exponential Smoothing (ETS), and Dynamic Factor Models (DFM), with two machine learning models, XGBoost and Long Short-Term Memory (LSTM), to ensure robust and accurate forecasting. While we applied the same five model classes across all three challenges (GAS, OIL, and ELECTRICITY), we tailored the specific datasets and parameters to each challenge to optimize model performance. This approach allowed us to leverage the strengths of both traditional econometric models and cutting-edge ML techniques to achieve the best possible forecasting results."
+  },
+  {
+    "objectID": "methodology.html#regarima",
+    "href": "methodology.html#regarima",
+    "title": "Methodology",
+    "section": "Regarima",
+    "text": "Regarima\n\nIntroduction\nARIMA modelling is a common class of time series models used to capture the internal information of a series, whether it is stationary or non-stationary. ARIMA models offer some advantages in this exercise: they are relatively simple, easy to interpret, and therefore provide a useful benchmark for more complex models. As standard ARIMA models do not include external information, we use the extended version RegARIMA with external regressors (regression model with ARIMA errors, see Chatfield and Prothero (1973)):\n\\[\ny_t = c + \\sum_{i=1}^{p}\\alpha_i x^i_{t}+u_t\n\\] \\[\n\\phi(L)(u_t) = \\theta(L)(\\varepsilon_t)\n\\]\nAdditional regressors used in this project are of two types:\n\nEconomic variables presented in the data section. 
REGARIMA models must remain parsimonious (contrary to other methods implemented in this project), so relevant variables are selected through standard selection procedures or a priori.\nOutlier variables, such as level shifts or additive outliers, to control for atypical observations.\n\nModels are applied to the first difference of the variable of interest.\n\n\nAutomatic identification and nowcasting\nWe use the RJdemetra package (the TRAMO part), which provides an easy way to identify and estimate RegARIMA models with high flexibility on parameters.\n\nWe allow for automatic detection of outliers on the estimation period in order to avoid large biases in coefficients. Without adding outliers, “covid” points, for instance, would totally distort the coefficients. Outliers identified through this procedure can in addition be used in other methods.\nComputation time is fast (a few seconds for the whole set of countries).\nExternal regressors are selected independently, a priori or through standard variable selection procedures.\n\nThe final nowcast value is provided by the projection of the model at horizon 1, 2 or 3, depending on the challenge and the position in the month.\n\n\nSeasonal adjustment for electricity\nFor electricity, the seasonal component is very strong and may not be the same as that of the explanatory variables. An X13 pre-treatment is applied to seasonally adjust the target variable and the potential explanatory variables and to correct them for trading days. This treatment also provides a prediction for the seasonal coefficient of the nowcasted month.\nNext, the seasonally adjusted variables are put in the REGARIMA model. The final step involves nowcasting the “raw value” by dividing the SA forecast by the projected seasonal coefficient."
+  },
+  {
+    "objectID": "methodology.html#dynamic-factor-models",
+    "href": "methodology.html#dynamic-factor-models",
+    "title": "Methodology",
+    "section": "Dynamic Factor Models",
+    "text": "Dynamic Factor Models\n\nIntroduction\nDynamic Factor Models (DFM) are a powerful and versatile approach for nowcasting, which involves extracting latent factors from a large set of observed economic or financial indicators. These latent factors capture the underlying dynamics of the data and are used to generate forecasts or predictions in real time.\nDFM are based on the idea that a small number of unobservable factors drive the behavior of a large number of observed variables. These factors represent the common underlying movements in the data and can be interpreted as representing the state of the economy or the financial system at a given point in time. By estimating these latent factors from historical data, DFM allow us to capture the relevant information embedded in the observed indicators and use it to generate accurate nowcasts.\nA standard dynamic factor model involves two main equations.\n\nThe factor equation (Equation 1): This equation represents the dynamics of the latent factors, which are unobservable variables that capture the common underlying movements in the observed data. The factor equation is usually specified as a dynamic system allowing the unobserved factors \\(F_t\\) to evolve according to a \\(VAR(p)\\) process, and can be written as: \\[\nF_t = \\sum_{j=1}^pA_{j}F_{t-j} + \\eta_t, \\qquad \\eta_t \\sim N(0, \\Sigma_{0})\n\\tag{1}\\] where \\(F_t\\) represents the vector of latent factors at time \\(t\\), \\(A_{j}\\) is the (state-) transition matrix capturing the dynamics of the factors at lag \\(j\\), \\(F_{t-j}\\) is the vector of factors at time \\(t-j\\), and \\(\\Sigma_{0}\\) is the (state-) covariance matrix.\nThe measurement equation (Equation 2): This equation links the latent factors to the observed variables. 
It specifies how the observed variables are generated from the latent factors and can be written as: \[\nX_t = \Lambda F_{t} + \xi_t, \qquad \xi_t \sim N(0, R)\n\tag{2}\] where \(X_t\) represents the vector of observed variables at time \(t\), \(\Lambda\) is the factor loading matrix, linking the factors to the observed variables, \(\xi_t\) is the vector of measurement errors at time \(t\) and \(R\) is the (measurement-) covariance matrix.\n\n\n\n\n\n\n\n\n\nMatrices\nSizes\nDescriptions\n\n\n\n\n\(F_t\)\n\(r \times 1\)\nVector of factors at time t \((f_{1t}, \ldots, f_{rt})'\)\n\n\n\(A_j\)\n\(r \times r\)\nState transition matrix at lag \(j\)\n\n\n\(\Sigma_0\)\n\(r \times r\)\nState covariance matrix\n\n\n\(X_t\)\n\(n \times 1\)\nVector of observed series at time t \((x_{1t}, \ldots, x_{nt})'\)\n\n\n\(\Lambda\)\n\(n \times r\)\nFactor loading matrix\n\n\n\(R\)\n\(n \times n\)\nMeasurement covariance matrix\n\n\n\(r\)\n\(1 \times 1\)\nNumber of factors\n\n\n\(p\)\n\(1 \times 1\)\nNumber of lags\n\n\n\n\n\nData used and estimation\nFor the estimation of our Dynamic Factor Models (DFM), we utilized three main sources of data to capture different aspects of economic and financial activity. These data sources include Eurostat data for economic activity, financial data obtained using the Yahoo API for financial activity, and Google Trends data for capturing more recent evolutions1. By incorporating these three main sources of data, we aim to capture different aspects of economic and financial activity, ranging from long-term trends to short-term fluctuations and recent evolutions. 
This multi-source data approach allows us to build a more comprehensive and robust DFM model, which can provide more accurate and timely nowcasting predictions for the variables of interest.1 You can refer to the Data section for more extensive details on the data that has been used.\nFor the estimation of our various Dynamic Factor Models (DFM), we relied on the “dfms” package in R, a powerful and user-friendly tool for estimating DFMs from time series data. The “dfms” package provides convenient and efficient methods for DFM estimation, making it an invaluable resource for our nowcasting project. Its user-friendly interface and optimized algorithms make it easy to implement and customize DFM models, while its efficient computational capabilities enable us to handle large datasets with ease. It allows different types of estimation (see Doz, Giannone, and Reichlin (2011), Doz, Giannone, and Reichlin (2012) and Banbura and Modugno (2014)). We warmly thank Sebastian Krantz, the creator of this package.\nTo ensure the robustness and accuracy of our DFM estimation, we took several steps before running the estimation pipeline of the “dfms” package. Since there are no trend or intercept terms in Equation 1 and Equation 2, we made \(X_t\) stationary by taking a first difference. Note that \(X_t\) is also standardized (scaled and centered) automatically by the “dfms” package.\nWe also pay attention to the availability of the data to ensure that each observed series has sufficient data for estimation. If a series has inadequate data, it is removed from the estimation process to prevent biased results. We also account for potential collinearities that could occur among several economic variables. 
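The collinearity screening mentioned above can be sketched as follows; this is our own simplified illustration (the function names and the keep-the-first-of-each-pair rule are assumptions), not the project's actual pre-processing code.

```python
# Drop series that are almost perfectly correlated (|rho| >= threshold)
# with a series already kept, keeping the first of each collinear pair.

def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def drop_collinear(series_by_name, threshold=0.9999):
    """Return the subset of series whose pairwise |rho| stays below threshold."""
    kept = {}
    for name, values in series_by_name.items():
        if all(abs(pearson(values, kv)) < threshold for kv in kept.values()):
            kept[name] = values
    return kept
```

Here a series that is an exact multiple of another would be discarded, while merely related series survive.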
Highly correlated series (\(\rho \geq 0.9999\)) are removed to mitigate multicollinearity issues and improve estimation accuracy.\nTo determine the optimal values for the number of lags and factors in our DFM models, we use the Bai and Ng (2002) criteria, which provide statistical guidelines for selecting these parameters. We set a maximum limit of 4 lags and 2 factors for computational efficiency reasons.\nWe initiate the estimation process from February 2005 for all challenges, and for the ELECTRICITY challenge, we perform a seasonal adjustment on the target series prior to estimation in order to account for any seasonal patterns in the data.\n\n\nNowcasting\nOnce the DFM is estimated (\(A_j\), \(\Sigma_0\), \(\Lambda\), \(R\)), it can be used for nowcasting by forecasting the latent factors using the factor equation, and then generating nowcasts for the observed variables using the measurement equation. The predictions can be updated in real-time as new data becomes available, allowing for timely and accurate predictions of the variables of interest." + }, + { + "objectID": "methodology.html#ets", + "href": "methodology.html#ets", + "title": "Methodology", + "section": "ETS", + "text": "ETS\nExponential smoothing models (Hyndman et al. (2008)) are a class of models where forecasts are linear combinations of past values, with the weights decaying exponentially as the observations get older. Therefore, the more recent the observation is, the higher the associated weight is. Moreover, exponential smoothing models do not require any external data.\nThe exponential smoothing models used are a combination of three components:\n\nAn error: additive (\(A\)) or multiplicative (\(M\)).\nA trend: additive (\(A\)), multiplicative (\(M\)), damped (\(A_d\) or \(M_d\)) or absent (\(N\)).\nA seasonality: additive (\(A\)), multiplicative (\(M\)) or absent (\(N\)). 
The seasonal component is only used with electricity data.\n\nSee Figure 1 for the description of the model.\n\n\n\n\n\n\n\n(a) Additive error\n\n\n\n\n\n\n\n\n\n(b) Multiplicative error\n\n\n\n\n\n\nSource: Hyndman and Athanasopoulos (2018) \n\n\nFigure 1: Exponential smoothing models.\n\n\nFor each series, the model is selected by minimising Akaike’s Information Criterion (Akaike (1974)) and parameters are estimated by maximising the likelihood." + }, + { + "objectID": "methodology.html#xgboost", + "href": "methodology.html#xgboost", + "title": "Methodology", + "section": "XGBoost", + "text": "XGBoost\n\nIntroduction\nXGBoost is a powerful algorithm that has gained popularity in the field of machine learning due to its ability to handle complex interactions between variables and its flexibility in handling various types of data. In the context of Eurostat’s nowcasting competition, we utilized the XGBoost algorithm to predict the values of the Producer Price Index, the Producer Volume Index and the Number of nights spent at tourist accommodation establishments for most European countries. We will delve here into the technicalities of the XGBoost approach, and how we tailored it to our specific nowcasting problem.\n\n\nXGBoost Algorithm\nXGBoost (Chen and Guestrin (2016)) is a gradient boosting algorithm that is particularly well suited for regression and classification problems. It works by building a sequence of decision trees, each tree trying to correct the errors of the previous tree. During the training process, the algorithm iteratively adds decision trees to the model, where each new tree is fit on the residuals (i.e., the errors) of the previous trees. The final prediction is made by adding the output of all the trees.\nTo control for overfitting, XGBoost uses a combination of L1 and L2 regularization, also known as “lasso” and “ridge” regularization, respectively. 
These regularization methods add a penalty term to the loss function, which forces the algorithm to find simpler models. L1 regularization shrinks the less important features’ coefficients to zero, while L2 regularization encourages the coefficients to be small, but does not set them to zero. By using both methods, XGBoost is able to produce models that are both accurate and interpretable.\nAnother key feature of XGBoost is its ability to handle missing values. Rather than imputing missing values with a fixed value or mean, XGBoost assigns them a default direction in the split, allowing the algorithm to learn how to handle missing values during the training process.\nOverall, the XGBoost algorithm has proven to be a powerful tool in the field of machine learning, and its ability to handle large datasets and complex interactions between variables make it well-suited for nowcasting problems like the Eurostat competition.\n\n\nTransforming Time Series\nTo apply the XGBoost algorithm to our nowcasting problem, we first transformed the time series data into a larger dataset tailored for the algorithm. We gathered several sources of data, including financial series, macroeconomic series, and surveys, and created a dataset where each row corresponds to a value per country and per date with many explanatory variables. We added lagged versions of the target variable and some of the explanatory variables as additional features. By doing so, we captured the time series properties of the data and made it suitable for the XGBoost algorithm.\n\n\nGrid Search for Hyperparameters\nTo obtain optimal results from the XGBoost algorithm, we used a grid search technique to find the best combination of hyperparameters for each model. We experimented with various values of hyperparameters, including learning rate, maximum depth, and subsample ratio, to determine which combination of parameters resulted in the best performance. 
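A grid search of the kind described can be sketched generically; the hyperparameter names below mirror common XGBoost parameters, but the grid values and the toy scoring function are illustrative assumptions, not the values used in the project.

```python
from itertools import product

# Exhaustive grid search: score every hyperparameter combination and
# keep the best one (here, lower score = better).

def grid_search(grid, score):
    names = sorted(grid)
    best_params, best_score = None, float("inf")
    for combo in product(*(grid[n] for n in names)):
        params = dict(zip(names, combo))
        s = score(params)
        if s < best_score:
            best_params, best_score = params, s
    return best_params, best_score

grid = {
    "learning_rate": [0.05, 0.1, 0.3],
    "max_depth": [3, 6],
    "subsample": [0.8, 1.0],
}
# Toy score whose optimum sits at (0.1, 3, 1.0); in practice this would be
# a cross-validated error of the fitted model.
score = lambda p: (abs(p["learning_rate"] - 0.1)
                   + (p["max_depth"] - 3) ** 2
                   + (1.0 - p["subsample"]))
best, _ = grid_search(grid, score)
```

In a real pipeline the scoring function would refit the model for each combination, which is why grids are kept small.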
The grid search enabled us to identify the best hyperparameters for the model, allowing us to obtain the most accurate predictions. We did not differentiate the hyperparameters for each country as it would have likely caused even more overfitting.\n\n\nTraining XGBoost for Nowcasting\nTo predict our three indicators for each country, we trained an XGBoost model for each country independently. We randomly split the data into training and testing sets and trained the model on the training set using the optimal hyperparameters obtained from the grid search. We evaluated the model’s performance on the testing set using various metrics such as mean squared error and mean absolute error." + }, + { + "objectID": "methodology.html#lstm", + "href": "methodology.html#lstm", + "title": "Methodology", + "section": "LSTM", + "text": "LSTM\n\nIntroduction\nLong Short-Term Memory (LSTM) networks (Hochreiter and Schmidhuber (1997)) are a particularly interesting kind of recurrent neural network (NN) when it comes to time series forecasting. They allow learning long-term dependencies in the data without losing performance in grasping short-term relations. They overcome the main flaw of standard recurrent NN models, whose difficulty in handling long lookback time windows limits the learning of long-term dependencies. LSTMs cope with this problem thanks to the incorporation of a cell state that stores long-term information and is updated at each step. This update incorporates new information but, more importantly, discards some old information, which regulates the long-term dependence. 
This is done using a particular structure of repeated modules, each of which is composed of four layers that convey information in a particular way (see Figure 2).\n\n\n\nFigure 2: Architecture of an LSTM unit (Credits: https://d2l.ai/chapter_recurrent-modern/lstm.html)\n\n\n\n\nLSTM model\nDuring the training process, the LSTM model is fed a sequence of inputs, with each input representing a timestep in the time series. The model then generates a corresponding output for each timestep, which is compared to the actual value to compute a loss function. The weights of the model are updated through backpropagation, where the gradient of the loss function is propagated backwards through the network.\nOne of the challenges of using LSTMs for time series forecasting is selecting an appropriate window size, or the number of previous timesteps that the model should consider when making each prediction. A larger window size can capture longer-term trends in the data, but may also introduce more noise and complicate the training process. A smaller window size, on the other hand, may be more sensitive to short-term fluctuations in the data, but may not capture longer-term trends as effectively.\nIn the context of the Eurostat competition, the LSTM approach was used to predict the value of several economic indicators, including the Producer prices in industry, the Producer volume in industry and the Number of nights spent at tourist accommodation establishments. The time series data for each country was transformed into a format that was suitable for LSTM training, with each row representing a single timestep in the time series and the columns representing the various input features.\n\n\nData and estimation\nWe gathered indicators of the macroeconomic environment from different sources. These data include hard data (macroeconomic variables, financial indicators, prices) and soft indicators (economic surveys). 
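The window-size choice discussed above determines how supervised training samples are built from a series; a minimal, framework-agnostic sketch of this transformation (our illustration, not the project's exact pipeline):

```python
# Turn a univariate series into (window, target) training pairs for a
# sequence model: each sample holds `window` past values as input and
# the immediately following value as its target.

def make_windows(series, window):
    samples = []
    for i in range(len(series) - window):
        samples.append((series[i:i + window], series[i + window]))
    return samples
```

A larger `window` yields fewer samples, each carrying a longer history, which is exactly the trade-off described in the text.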
The data are transformed into a large dataset in which we include macroeconomic series and their lags. The series are all scaled. The LSTM model is trained independently for each country, with a grid search used to find the optimal hyperparameters for each model. Overall, the LSTM approach proved to be a powerful tool for time series forecasting, and its ability to capture long-term dependencies in the data made it particularly well-suited for the nowcasting problem at hand." + }, + { + "objectID": "methodology.html#similarities-and-differences-to-state-of-the-art-techniques", + "href": "methodology.html#similarities-and-differences-to-state-of-the-art-techniques", + "title": "Methodology", + "section": "Similarities and differences to State-of-the-Art techniques", + "text": "Similarities and differences to State-of-the-Art techniques\nA few specificities of our work:\n\nA significant portion of our database comprises classic macroeconomic indicators, including prices, surveys, and Brent. All of them are open source.\nFor low-dimensional methods, we keep a minimum level of control over the consistency of the selected variables and the signs of the estimated coefficients.\nWe have compared different methods and included traditional methods in our analysis, but our research also incorporates methods that rely on recent developments.\nWe exclusively use open data sources, avoiding non-free data aggregators, which ensures high reproducibility. However, data retrieval can be more expensive, and some non-free data cannot be utilized.\nWe made a conscious effort to identify new data sources or indicators. We also incorporate soft data such as Google Trends in our analysis.\nThe methods we use combine data with diverse frequencies, up to weekly indicators (created from daily data), allowing us to improve the precision of our predictions up until the last day of each month."
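The weekly indicators mentioned in the last point are created from daily data; one simple way to do this, shown here as an illustrative sketch rather than the project's actual code, is to average daily values within ISO weeks:

```python
from datetime import date
from collections import defaultdict

# Aggregate a daily series into weekly averages keyed by ISO (year, week).

def weekly_average(daily):
    """daily: list of (date, value) pairs -> {(iso_year, iso_week): mean}."""
    buckets = defaultdict(list)
    for d, v in daily:
        iso = d.isocalendar()
        buckets[(iso[0], iso[1])].append(v)
    return {k: sum(vs) / len(vs) for k, vs in buckets.items()}
```

Keying by the ISO calendar keeps weeks aligned across year boundaries, so a late-December week spilling into January stays a single observation.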
+ }, + { + "objectID": "data.html", + "href": "data.html", + "title": "Data", + "section": "", + "text": "We take great pride in our dedication to using exclusively open data sources in our modeling efforts, which was a fundamental aspect of our approach during the challenge. While some proprietary data sources may have had greater predictive power, we firmly believed that utilizing open data sources was crucial to promoting the principles of transparency and reproducibility in our modeling efforts. By leveraging publicly available data, we were able to derive nowcasts of key economic indicators while ensuring that our work can be easily replicated and validated by others in the official statistics community. This approach not only provided us with a robust foundation for our models but also served to promote the values of open science, data transparency, and reproducibility.\nDuring the challenge, we utilized three primary sources of data to inform our modeling efforts. The first source was economic data from the Eurostat database, which provided us with a comprehensive overview of the economic situation in the European Union. The second source of data was financial data, which provided us with valuable insights into the financial context surrounding each of the challenges. This data included stock prices, exchange rates, and other financial indicators that were useful in predicting economic trends and identifying potential risks. Finally, we also used Google Trends data to capture the most recent trends and shifts in consumer behavior. This data enabled us to monitor changes in search volume for specific keywords, which served as an early warning system for sudden changes in consumer sentiment and preferences. Overall, our use of these three distinct sources of data allowed us to develop a comprehensive understanding of the economic landscape and to generate nowcasts of the 3 target variables." 
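Eurostat series such as the economic indicators described above can be retrieved programmatically through Eurostat's dissemination API. The sketch below only builds a request URL (the endpoint format reflects our understanding of that API, and the dataset code and filters are illustrative assumptions) and leaves the actual HTTP call to the caller:

```python
from urllib.parse import urlencode

# Build a Eurostat dissemination-API request URL for a dataset with filters.
# The dataset code "sts_inpp_m" and the filters below are illustrative.
BASE = "https://ec.europa.eu/eurostat/api/dissemination/statistics/1.0/data"

def eurostat_url(dataset, **filters):
    query = urlencode({"format": "JSON", **filters})
    return f"{BASE}/{dataset}?{query}"

url = eurostat_url("sts_inpp_m", geo="FR", unit="I15")
```

The returned JSON would then be parsed into a time series; in our pipeline this retrieval is automated for all countries at once.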
+ }, + { + "objectID": "data.html#sec-data-eurostat", + "href": "data.html#sec-data-eurostat", + "title": "Data", + "section": "Economic data from Eurostat database", + "text": "Economic data from Eurostat database\nWe use classic macroeconomic variables provided to Eurostat by European countries and available on the Eurostat website. These series are automatically retrieved from the Eurostat database through its API.\n\nProducer prices in industry:\nProducer Price in Industry (PPI) refers to the average price that domestic producers receive for the goods and services they produce. This indicator measures changes in the price of goods and services at the producer level, and it is considered an important leading indicator of inflation.\nSpecificities:\n- The PPI challenge refers to this indicator for a particular NACE\n- Total on domestic market (target variable)\n- Division level of NACE and MIGs\nImport prices in industry:\nImport prices in industry, also known as the industrial import price index (IPI), refer to the cost of goods and services imported into a country for use in production. This indicator reflects changes in the prices of imported raw materials, intermediate goods, and capital equipment, and it is influenced by factors such as exchange rates, global commodity prices, and trade policies.\nSpecificities:\n- Total\n- Division level of CPA and MIGs\nProduction index in industry:\nThe Production Index in Industry (or Production Volume in Industry - PVI) is a measure of the physical output of the industrial sector of an economy. It tracks changes in the volume of goods produced over time, and it is considered an important indicator of the health and performance of the manufacturing sector. 
The production index can be used to assess trends in productivity, capacity utilization, and competitiveness.\nSpecificities:\n- The PVI challenge refers to this indicator for a particular NACE\n- Total\n- Intermediary goods (MIG_ING)\nHarmonised Index of Consumer Prices on a few products\nThe Harmonised Index of Consumer Prices (HICP) is a measure of inflation that is used to compare price changes across the European Union. It tracks the average change over time in the prices of goods and services that households consume, including food, housing, transportation, and healthcare. The HICP is calculated using a harmonised methodology that ensures comparability across EU member states, and it is published on a monthly basis by Eurostat. It is a key indicator of price stability.\nBusiness survey in industry:\nThe Business Survey in Industry is a survey conducted by Eurostat to gather information on the business conditions and expectations of companies in the manufacturing sector. The survey covers a range of topics, including production, new orders, inventories, employment, prices, and investment, and it is conducted on a monthly basis across the European Union. The data collected from the survey can be used to assess the current and future state of the manufacturing sector, to identify sector-specific challenges and opportunities, and to inform policymaking and business decision-making.\nCovered by the survey:\n- Industrial confidence indicator\n- Production development observed over the past 3 months\n- Production expectations over the next 3 months\n- Employment expectations over the next 3 months\n- Assessment of order-book levels\n- Assessment of the current level of stocks of finished products\n- Selling price expectations over the next 3 months\nConsumer survey in industry:\nThe Consumer Survey in Industry is a survey conducted by Eurostat to gather information on the consumer sentiment and behavior in the European Union. 
The survey covers a range of topics, including household income, savings, spending intentions, and major purchases, and it is conducted on a monthly basis. The data collected from the survey can be used to assess consumer confidence, to identify trends in consumer spending and saving patterns, and to inform policymaking and business decision-making. The Consumer Survey in Industry is an important indicator of the overall health of the economy, as consumer spending is a major driver of economic activity.\nCovered by the survey:\n- Financial situation over the last 12 months\n- Financial situation over the next 12 months\n- General economic situation over the last 12 months\n- General economic situation over the next 12 months\n- Price trends over the last 12 months\n- Price trends over the next 12 months\n- Unemployment expectations over the next 12 months\n- The current economic situation is adequate to make major purchases\n- Major purchases over the next 12 months\n- The current economic situation is adequate for savings\n- Savings over the next 12 months\n- Statement on financial situation of household\n- Consumer confidence indicator\nNumber of nights spent at tourist accommodation establishments\nThe number of nights spent at tourist accommodation establishments is a measure of the volume of tourism activity in a country. It refers to the total number of nights that guests spend in all types of tourist accommodation establishments, including hotels, campsites, holiday homes, and other types of accommodation. 
This indicator is important for assessing the contribution of tourism to the economy, as well as for monitoring the performance and competitiveness of the tourism sector.\nSpecificities:\n- The TOURISM challenge refers to this indicator" + }, + { + "objectID": "data.html#financial-data-from-yahoo-finance", + "href": "data.html#financial-data-from-yahoo-finance", + "title": "Data", + "section": "Financial data from Yahoo Finance", + "text": "Financial data from Yahoo Finance\nYahoo Finance is a popular online platform for financial information and investment tools. It provides a wide range of financial data, including real-time stock prices, historical price charts, news articles, analyst ratings, and financial statements for publicly traded companies. We used its API to get the latest financial data to improve our short-term predictions.\n\nEuro/Dollar exchange rate\nThe Euro/Dollar exchange rate represents the value of one euro in terms of US dollars. It is a widely followed currency pair in the foreign exchange market, as it reflects the relative strength of two of the world’s largest economies. Movements in the exchange rate can be influenced by a range of factors, such as interest rate differentials, inflation expectations, political developments, and global economic trends. The exchange rate can impact international trade, investment flows, and the competitiveness of exports and imports, making it a key indicator for businesses, investors, and policymakers alike.\nBrent Crude Oil Stock Price\nThe Brent Crude Oil Last Day Financial Futures Stock Price is a benchmark for the price of crude oil from the North Sea, which is used as a pricing reference for approximately two-thirds of the world’s traded crude oil. As a financial futures contract, it allows investors to trade the price of oil without actually buying or selling the physical commodity. 
The stock price reflects the market’s perception of supply and demand dynamics, geopolitical risks, and other macroeconomic factors that impact the oil market.\nS&P 500 Index Stock Price\nThe S&P 500 Index stock is a market capitalization-weighted index of 500 leading publicly traded companies in the United States. It is widely considered to be a barometer of the US stock market’s performance, providing investors with a broad-based measure of the economy’s health and direction. The S&P 500 index includes companies from a range of sectors, such as technology, healthcare, finance, and energy, making it a diversified indicator of the US equity market.\nEuro stoxx 50 Index Stock Price\nThe Euro Stoxx 50 Index stock is a market capitalization-weighted index of 50 leading blue-chip companies from 12 Eurozone countries. It is designed to reflect the performance of the Eurozone’s most liquid and largest companies across a range of industries, including banking, energy, consumer goods, and healthcare. As a widely recognized benchmark of the Eurozone equity market, the Euro Stoxx 50 Index stock is used by investors and analysts to track market trends, benchmark portfolio performance, and identify investment opportunities. Movements in the index are influenced by a range of factors, such as economic growth prospects, monetary policy decisions, geopolitical risks, and corporate earnings announcements.\nCAC40 Index Stock Price\nThe CAC 40 Index Stock is a benchmark index of the top 40 companies listed on the Euronext Paris Stock Exchange, representing a broad range of industries such as energy, finance, healthcare, and technology. It is the most widely used indicator of the French equity market’s performance and is considered one of the leading indices in Europe. The CAC 40 Index Stock is weighted by market capitalization and is closely monitored by investors and analysts as an indicator of economic health and growth prospects in France. 
Movements in the index can be influenced by a variety of factors, such as geopolitical risks, macroeconomic indicators, and company-specific news." + }, + { + "objectID": "data.html#google-trends", + "href": "data.html#google-trends", + "title": "Data", + "section": "Google Trends", + "text": "Google Trends\nGoogle Trends is a free online tool provided by Google that allows users to explore the popularity of search queries over time and across different regions and languages. It provides valuable insights into the behavior of internet users, the topics they are interested in, and the evolution of search trends over time.\nNevertheless, Google Trends data must be used carefully as a tool for economic analysis. Google Trends provides Search Volume Indices (SVI) based on search ratios, with the initial search volume for a category or topic at a given time divided by the total number of searches at that date. However, changes in the denominator (total searches) can induce biases, as internet use has evolved since 2004.\nWe implemented an approach to address this downward bias by extracting a common component from concurrent time series, applying Principal Component Analysis (PCA) to the long-term trends of the log-SVI series obtained with an HP filter. The rescaled first component obtained from the long-term log-SVIs is assumed to capture the common long-term trend, and it is subtracted from the log-SVIs. This approach can help to remove the downward bias common to all Google Trends variables and improve their economic predictive power. 
More information in this paper.\nObserved categories:\n\nManufacturing\nIndustrial Materials & Equipment\nFuel Economy & Gas Prices\nHotels & Accommodations\nTravel Agencies & Services\nVacation Offers\nMountain & Ski Resorts" + }, + { + "objectID": "data.html#other-data", + "href": "data.html#other-data", + "title": "Data", + "section": "Other data", + "text": "Other data\n\nElectricity prices\nEmber.org provides European wholesale electricity price data that can be used to analyze electricity market trends, monitor price volatility, and inform investment decisions. The data is sourced from various market operators and exchanges across Europe and covers a wide range of countries and regions.\n\n\nCalendar data\nWe retrieve the number of weekend days per month to include it as a feature to our models.\n\n\nLeading national indicators\n\nGermany developed some experimental indicators on activity. We use the daily Truck toll mileage index accessible on Destatis website.\n\n\nThe Weekly WIFO Economic Index is a measure of the real economic activity of the Austrian economy. 
We use its industrial production component.\n\n\n\nOther potential interesting data sources (not identified in open data sources)\n\nHistorical Purchasing manager’s index by country\nLondon metal exchange indices\nDaily electricity consumption by industrial firms country by country\n…" + }, { "objectID": "index.html", "href": "index.html", @@ -195,6 +223,34 @@ "section": "Acknowledgements", "text": "Acknowledgements\nOn behalf of our team, we would like to express our gratitude to:\n\nINSEE and our hierarchy for providing us with the time and resources to participate in this challenge and supporting our research efforts.\nThe IT innovation team at INSEE for providing us with the SSPCloud (Comte, Degorre, and Lesur (2022)) platform, which not only enabled us to perform our data science analysis with ease but also provided us with an efficient computing environment.\nThe various teams at INSEE whom we reached out to for their invaluable advice and guidance throughout the competition.\nThe organizers of this challenge for creating a stimulating and engaging platform to showcase our skills.\nThe creators of the RJDemetra package (Quartier-la-Tente et al. (2023) and Smyk and Tchang (2021)), which provided us with powerful tools for seasonal adjustment of time series data, enhancing the accuracy of our models.\nSebastian Krantz, the creator of the dfms package (Krantz and Bagdziunas (2022)), for openly sharing this valuable resource and providing a user-friendly and efficient tool for estimating Dynamics Factor Models (DFM) in R, which significantly facilitated our nowcasting analysis.\n\nThank you all for your support and contributions to our success in this endeavor." 
}, + { + "objectID": "gas.html", + "href": "gas.html", + "title": "European inland gas consumption challenge", + "section": "", + "text": "challenge = \"GAS\"\nProducer prices are also known as output prices and the objective of the output price index is to measure the monthly development of transaction prices of economic activities.\nThere is a general public interest in knowing the extent to which the prices of goods and services have risen. Also, it has long been customary in many countries to adjust levels of wages, pensions, and payments in long term contracts in proportion to changes in relevant prices.\nThe domestic output price index for an economic activity measures the average price development of all goods and related services resulting from that activity and sold on the domestic market between one time period and another." + }, + { + "objectID": "gas.html#next-forecast-for-a-given-country", + "href": "gas.html#next-forecast-for-a-given-country", + "title": "European inland gas consumption challenge", + "section": "Next forecast for a given country", + "text": "Next forecast for a given country\n\nviewof country = Inputs.select(Object.values(country_map), {\n label: html`<b>Select a country:</b>`,\n placeholder: \"Enter a country name\",\n unique: true\n })\n\n\n\n\n\n\n\n\n\nPlot.plot({\n grid: true,\n y: {\n label: \"↑ Production price in industry\",\n }, \n x: {\n label: \"Year\",\n domain: range\n },\n marks: [\n Plot.line(historical, {\n tip: true,\n x: \"date\", \n y: \"values\", \n stroke: \"black\",\n title: (d) =>\n `${d.date.toLocaleString(\"en-UK\", {\n month: \"long\",\n year: \"numeric\"\n })}\\n${d.values}`\n }),\n Plot.dot(predictions, {\n tip: true,\n x: \"date\", \n y: \"values\",\n fill: \"model\",\n title: (d) =>\n `${d.model}\\n ${d.date.toLocaleString(\"en-UK\", {\n month: \"long\",\n year: \"numeric\"\n })} : ${d.values} `\n })\n ],\n color: {legend: true}\n})\n\n\n\n\n\n\n\n\n\n\n\ndates = {\n const data = historical.map(d => 
d.date)\n data.push(predictions.map(d => d.date)[0])\n return data \n}\n\nviewof range = offsetInterval(dates, {\n value: [ dates[dates.length-90], dates[dates.length-1] ],\n format: ([a, b]) => htl.html`<span ${{\n style: \"display: flex; justify-content: space-between\"\n }}>\n ${a.toISOString(\"en-UK\").slice(0, 10)}\n        \n        \n        \n        \n        \n ${b.toISOString(\"en-UK\").slice(0, 10)}\n </span>`\n})" + }, + { + "objectID": "gas.html#country-specific-forecast-summary-by-model", + "href": "gas.html#country-specific-forecast-summary-by-model", + "title": "European inland gas consumption challenge", + "section": "Country-specific forecast summary by model", + "text": "Country-specific forecast summary by model\n\nviewof form = Inputs.form({\n model: Inputs.checkbox(models, {value: models}),\n countries: Inputs.select([\"All\", ...Object.values(country_map)], {multiple: true, value: [\"All\"], width: 50, size: 1})\n})\n\n\n\n\n\n\n\nviewof rows = Inputs.table(summary_table,{\n rows: 25,\n maxWidth: 840,\n multiple: false,\n layout: \"fixed\"\n})\n\n\n\n\n\n\n\n\nhistorical = format_historical_data(data, country_iso)\n\n\n\n\n\n\n\npredictions = format_pred_data(pred, country_iso)\n\n\n\n\n\n\n\nsummary_table = format_summary_table_data(pred, form.countries, form.model, country_map)\n\n\n\n\n\n\n\n\ncountry_map = get_countries_per_challenge(mapping_countries_weights, challenge)\n\n\n\n\n\n\n\ncountry_iso = Object.keys(country_map).find(key => country_map[key] === country);\n\n\n\n\n\n\n\nmodels = [\"REG-ARIMA\", \"DFM\", \"ETS\", \"XGBOOST\", \"LSTM\"]\n\n\n\n\n\n\n\n\nimport { \n format_historical_data,\n format_pred_data,\n format_summary_table_data,\n get_countries_per_challenge,\n mapping_countries_weights,\n } from \"./utils/utils.qmd\"\n\n\n\n\n\n\n\nimport {offsetInterval} from '@mootari/offset-slider'\n\n\n\n\n\n\n\nPlot = require(\"https://cdn.jsdelivr.net/npm/@observablehq/plot@0.6.8/dist/plot.umd.min.js\")" + }, + { + "objectID": 
"lessons-learned.html", + "href": "lessons-learned.html", + "title": "Lessons Learned", + "section": "", + "text": "Retrieving data is a costly task, as we exclusively use open data sources and avoid non-free data aggregators. In addition to classic macroeconomic indicators that are common to most European countries, identifying interesting indicators specific to certain countries can be expensive. Unfortunately, the short duration of the competition limited our ability to acquire new data sources, such as payment card data, which could have been useful for the tourism challenge. Moreover, for a goal of reproducibility, we decided to exclude non open source data from our scope.\nPost-mortem analysis on errors is crucial. However, in the real-time context of nowcasting challenges, having a track record of past residuals before the start of the challenge is not always straightforward. Economic variables availability can move throughout the month, making it difficult to establish a true track record.\nDepending on the model, taking into account the impact of COVID-19 on estimation is relevant. Otherwise, coefficients could be strongly biased, with the variance of COVID-19 points dominating the total series variance.\nOur approach is mainly neutral regarding the choice of variables, with an automatic selection procedure and a focus on treating all countries. This mainly neutral approach is partially due to a lack of time, but fine-tuning country by country can also be a useful approach.\n“Soft” data, such as Google Trends, appears to provide some information for the tourism challenge, but less so for production prices and production, at least during a “stationary” period.\nUsing nowcasting techniques on disaggregated variables is an interesting option, particularly for prices that have exhibited distinct dynamics across different products in recent times. 
However, implementing this approach can be costly, as it requires a separate model for each disaggregated level and an appropriate re-aggregation to obtain the final nowcast value. Given our time constraints, we were unable to explore this approach thoroughly.\nFor most of our models, the last available value of the indicator often has a much larger influence than we expected. Because of this, even in our most recent results, we may observe a lag between the true value of the indicators and our predictions based on past data. This shows that we were not able to identify all the external factors influencing the indicators. With more resources and a longer time window, we could likely identify additional explanatory variables to improve the predictions." + }, { "objectID": "post-mortem-oil.html", "href": "post-mortem-oil.html", @@ -223,34 +279,6 @@ "section": "Mean square relative error", "text": "Mean square relative error\nThis interactive graph displays the mean square relative error for each of the models used in the challenge, ranked by their performance from the least accurate to the most accurate. The mean square relative error is a statistical measure that provides an average of the square relative error across all the forecasts made by a given model.\nThese errors can be weighted by a factor, as was the case in the official evaluation of the challenge. The role of the weights is to reflect the difficulty of predicting the point estimate of the target variable for the corresponding country.\n\\[\nMSRE = \\frac{1}{n}\\sum_{i=1}^n\\left(\\frac{Y_i - R_i}{R_i}\\right)^2\n\\]\n\nviewof doweighted = Inputs.toggle({label: \"Weighted mean\", value: false})\n\n\n\n\n\n\n\nPlot.plot({\n x: {\n domain: d3.sort(ave_errors, d => -d.MSRE).map(d => d.Entries),\n label: null\n },\n y: {\n grid: true,\n transform: doweighted ? 
d => d * weight : null\n },\n color: {\n legend: true\n },\n marks: [\n Plot.barY(ave_errors, {\n tip: true,\n x: \"Entries\", \n y: \"MSRE\",\n fill: \"Entries\",\n sort: {\n x: {value: \"y\", reverse: true}\n },\n title: (d) =>\n `${d.Entries}: \\n ${doweighted ? Math.round(d.MSRE * weight * 10000) / 10000 : Math.round(d.MSRE * 10000) / 10000} (n = ${d.N}) `\n }),\n Plot.ruleY([0])\n ]\n })\n\n\n\n\n\n\n\n\nhistorical = format_historical_data(data, country_iso)\n\n\n\n\n\n\n\npredictions = format_pred_data(pred, country_iso)\n\n\n\n\n\n\n\nerrors = format_errors_data(errors_data, country_iso)\n\n\n\n\n\n\n\nave_errors = format_ave_errors_data(ave_errors_data, country_iso)\n\n\n\n\n\n\n\n\nweight = get_weights_per_challenge(mapping_countries_weights, challenge).filter(d => d.Name == country)[0].Weight\n\n\n\n\n\n\n\ncountry_map = get_countries_per_challenge(mapping_countries_weights, challenge)\n\n\n\n\n\n\n\ncountry_iso = Object.keys(country_map).find(key => country_map[key] === country);\n\n\n\n\n\n\n\nmodels = [\"REGARIMA\", \"DFM\", \"ETS\", \"XGBOOST\", \"LSTM\"]\n\n\n\n\n\n\n\n\nimport { \n format_historical_data,\n format_pred_data,\n format_errors_data,\n format_ave_errors_data,\n mapping_countries_weights,\n get_countries_per_challenge,\n get_weights_per_challenge,\n } from \"./utils/utils.qmd\"\n\n\n\n\n\n\n\nPlot = require(\"https://cdn.jsdelivr.net/npm/@observablehq/plot@0.6.8/dist/plot.umd.min.js\")" }, - { - "objectID": "LICENCE.html", - "href": "LICENCE.html", - "title": "European Statistics Awards for Nowcasting", - "section": "", - "text": "EUROPEAN UNION PUBLIC LICENCE v. 1.2\n EUPL © the European Union 2007, 2016\nThis European Union Public Licence (the ‘EUPL’) applies to the Work (as defined below) which is provided under the terms of this Licence. 
Any use of the Work, other than as authorised under this Licence is prohibited (to the extent such use is covered by a right of the copyright holder of the Work).\nThe Work is provided under the terms of this Licence when the Licensor (as defined below) has placed the following notice immediately following the copyright notice for the Work:\n Licensed under the EUPL\nor has expressed by any other means his willingness to license under the EUPL.\n\nDefinitions\n\nIn this Licence, the following terms have the following meaning:\n\n‘The Licence’: this Licence.\n‘The Original Work’: the work or software distributed or communicated by the Licensor under this Licence, available as Source Code and also as Executable Code as the case may be.\n‘Derivative Works’: the works or software that could be created by the Licensee, based upon the Original Work or modifications thereof. This Licence does not define the extent of modification or dependence on the Original Work required in order to classify a work as a Derivative Work; this extent is determined by copyright law applicable in the country mentioned in Article 15.\n‘The Work’: the Original Work or its Derivative Works.\n‘The Source Code’: the human-readable form of the Work which is the most convenient for people to study and modify.\n‘The Executable Code’: any code which has generally been compiled and which is meant to be interpreted by a computer as a program.\n‘The Licensor’: the natural or legal person that distributes or communicates the Work under the Licence.\n‘Contributor(s)’: any natural or legal person who modifies the Work under the Licence, or otherwise contributes to the creation of a Derivative Work.\n‘The Licensee’ or ‘You’: any natural or legal person who makes any usage of the Work under the terms of the Licence.\n‘Distribution’ or ‘Communication’: any act of selling, giving, lending, renting, distributing, communicating, transmitting, or otherwise making available, online or offline, copies of the Work 
or providing access to its essential functionalities at the disposal of any other natural or legal person.\n\n\nScope of the rights granted by the Licence\n\nThe Licensor hereby grants You a worldwide, royalty-free, non-exclusive, sublicensable licence to do the following, for the duration of copyright vested in the Original Work:\n\nuse the Work in any circumstance and for all usage,\nreproduce the Work,\nmodify the Work, and make Derivative Works based upon the Work,\ncommunicate to the public, including the right to make available or display the Work or copies thereof to the public and perform publicly, as the case may be, the Work,\ndistribute the Work or copies thereof,\nlend and rent the Work or copies thereof,\nsublicense rights in the Work or copies thereof.\n\nThose rights can be exercised on any media, supports and formats, whether now known or later invented, as far as the applicable law permits so.\nIn the countries where moral rights apply, the Licensor waives his right to exercise his moral right to the extent allowed by law in order to make effective the licence of the economic rights here above listed.\nThe Licensor grants to the Licensee royalty-free, non-exclusive usage rights to any patents held by the Licensor, to the extent necessary to make use of the rights granted on the Work under this Licence.\n\nCommunication of the Source Code\n\nThe Licensor may provide the Work either in its Source Code form, or as Executable Code. 
If the Work is provided as Executable Code, the Licensor provides in addition a machine-readable copy of the Source Code of the Work along with each copy of the Work that the Licensor distributes or indicates, in a notice following the copyright notice attached to the Work, a repository where the Source Code is easily and freely accessible for as long as the Licensor continues to distribute or communicate the Work.\n\nLimitations on copyright\n\nNothing in this Licence is intended to deprive the Licensee of the benefits from any exception or limitation to the exclusive rights of the rights owners in the Work, of the exhaustion of those rights or of other applicable limitations thereto.\n\nObligations of the Licensee\n\nThe grant of the rights mentioned above is subject to some restrictions and obligations imposed on the Licensee. Those obligations are the following:\nAttribution right: The Licensee shall keep intact all copyright, patent or trademarks notices and all notices that refer to the Licence and to the disclaimer of warranties. The Licensee must include a copy of such notices and a copy of the Licence with every copy of the Work he/she distributes or communicates. The Licensee must cause any Derivative Work to carry prominent notices stating that the Work has been modified and the date of modification.\nCopyleft clause: If the Licensee distributes or communicates copies of the Original Works or Derivative Works, this Distribution or Communication will be done under the terms of this Licence or of a later version of this Licence unless the Original Work is expressly distributed only under this version of the Licence — for example by communicating ‘EUPL v. 1.2 only’. 
The Licensee (becoming Licensor) cannot offer or impose any additional terms or conditions on the Work or Derivative Work that alter or restrict the terms of the Licence.\nCompatibility clause: If the Licensee Distributes or Communicates Derivative Works or copies thereof based upon both the Work and another work licensed under a Compatible Licence, this Distribution or Communication can be done under the terms of this Compatible Licence. For the sake of this clause, ‘Compatible Licence’ refers to the licences listed in the appendix attached to this Licence. Should the Licensee’s obligations under the Compatible Licence conflict with his/her obligations under this Licence, the obligations of the Compatible Licence shall prevail.\nProvision of Source Code: When distributing or communicating copies of the Work, the Licensee will provide a machine-readable copy of the Source Code or indicate a repository where this Source will be easily and freely available for as long as the Licensee continues to distribute or communicate the Work.\nLegal Protection: This Licence does not grant permission to use the trade names, trademarks, service marks, or names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the copyright notice.\n\nChain of Authorship\n\nThe original Licensor warrants that the copyright in the Original Work granted hereunder is owned by him/her or licensed to him/her and that he/she has the power and authority to grant the Licence.\nEach Contributor warrants that the copyright in the modifications he/she brings to the Work are owned by him/her or licensed to him/her and that he/she has the power and authority to grant the Licence.\nEach time You accept the Licence, the original Licensor and subsequent Contributors grant You a licence to their contributions to the Work, under the terms of this Licence.\n\nDisclaimer of Warranty\n\nThe Work is a work in progress, which is 
continuously improved by numerous Contributors. It is not a finished work and may therefore contain defects or ‘bugs’ inherent to this type of development.\nFor the above reason, the Work is provided under the Licence on an ‘as is’ basis and without warranties of any kind concerning the Work, including without limitation merchantability, fitness for a particular purpose, absence of defects or errors, accuracy, non-infringement of intellectual property rights other than copyright as stated in Article 6 of this Licence.\nThis disclaimer of warranty is an essential part of the Licence and a condition for the grant of any rights to the Work.\n\nDisclaimer of Liability\n\nExcept in the cases of wilful misconduct or damages directly caused to natural persons, the Licensor will in no event be liable for any direct or indirect, material or moral, damages of any kind, arising out of the Licence or of the use of the Work, including without limitation, damages for loss of goodwill, work stoppage, computer failure or malfunction, loss of data or any commercial damage, even if the Licensor has been advised of the possibility of such damage. However, the Licensor will be liable under statutory product liability laws as far such laws apply to the Work.\n\nAdditional agreements\n\nWhile distributing the Work, You may choose to conclude an additional agreement, defining obligations or services consistent with this Licence. 
However, if accepting obligations, You may act only on your own behalf and on your sole responsibility, not on behalf of the original Licensor or any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against such Contributor by the fact You have accepted any warranty or additional liability.\n\nAcceptance of the Licence\n\nThe provisions of this Licence can be accepted by clicking on an icon ‘I agree’ placed under the bottom of a window displaying the text of this Licence or by affirming consent in any other similar way, in accordance with the rules of applicable law. Clicking on that icon indicates your clear and irrevocable acceptance of this Licence and all of its terms and conditions.\nSimilarly, you irrevocably accept this Licence and all of its terms and conditions by exercising any rights granted to You by Article 2 of this Licence, such as the use of the Work, the creation by You of a Derivative Work or the Distribution or Communication by You of the Work or copies thereof.\n\nInformation to the public\n\nIn case of any Distribution or Communication of the Work by means of electronic communication by You (for example, by offering to download the Work from a remote location) the distribution channel or media (for example, a website) must at least provide to the public the information requested by the applicable law regarding the Licensor, the Licence and the way it may be accessible, concluded, stored and reproduced by the Licensee.\n\nTermination of the Licence\n\nThe Licence and the rights granted hereunder will terminate automatically upon any breach by the Licensee of the terms of the Licence.\nSuch a termination will not terminate the licences of any person who has received the Work from the Licensee under the Licence, provided such persons remain in full compliance with the Licence.\n\nMiscellaneous\n\nWithout prejudice of Article 9 above, the Licence 
represents the complete agreement between the Parties as to the Work.\nIf any provision of the Licence is invalid or unenforceable under applicable law, this will not affect the validity or enforceability of the Licence as a whole. Such provision will be construed or reformed so as necessary to make it valid and enforceable.\nThe European Commission may publish other linguistic versions or new versions of this Licence or updated versions of the Appendix, so far this is required and reasonable, without reducing the scope of the rights granted by the Licence. New versions of the Licence will be published with a unique version number.\nAll linguistic versions of this Licence, approved by the European Commission, have identical value. Parties can take advantage of the linguistic version of their choice.\n\nJurisdiction\n\nWithout prejudice to specific agreement between parties,\n\nany litigation resulting from the interpretation of this License, arising between the European Union institutions, bodies, offices or agencies, as a Licensor, and any Licensee, will be subject to the jurisdiction of the Court of Justice of the European Union, as laid down in article 272 of the Treaty on the Functioning of the European Union,\nany litigation arising between other parties and resulting from the interpretation of this License, will be subject to the exclusive jurisdiction of the competent court where the Licensor resides or conducts its primary business.\n\n\nApplicable Law\n\nWithout prejudice to specific agreement between parties,\n\nthis Licence shall be governed by the law of the European Union Member State where the Licensor has his seat, resides or has his registered office,\nthis licence shall be governed by Belgian law if the Licensor has no seat, residence or registered office inside a European Union Member State.\n\nAppendix\n‘Compatible Licences’ according to Article 5 EUPL are:\n\nGNU General Public License (GPL) v. 2, v. 
3\nGNU Affero General Public License (AGPL) v. 3\nOpen Software License (OSL) v. 2.1, v. 3.0\nEclipse Public License (EPL) v. 1.0\nCeCILL v. 2.0, v. 2.1\nMozilla Public Licence (MPL) v. 2\nGNU Lesser General Public Licence (LGPL) v. 2.1, v. 3\nCreative Commons Attribution-ShareAlike v. 3.0 Unported (CC BY-SA 3.0) for works other than software\nEuropean Union Public Licence (EUPL) v. 1.1, v. 1.2\nQuébec Free and Open-Source Licence — Reciprocity (LiLiQ-R) or Strong Reciprocity (LiLiQ-R+).\n\nThe European Commission may update this Appendix to later versions of the above licences without producing a new version of the EUPL, as long as they provide the rights granted in Article 2 of this Licence and protect the covered Source Code from exclusive appropriation.\nAll other changes or additions to this Appendix require the production of a new EUPL version." - }, - { - "objectID": "gas.html", - "href": "gas.html", - "title": "European inland gas consumption challenge", - "section": "", - "text": "challenge = \"GAS\"\nProducer prices are also known as output prices and the objective of the output price index is to measure the monthly development of transaction prices of economic activities.\nThere is a general public interest in knowing the extent to which the prices of goods and services have risen. Also, it has long been customary in many countries to adjust levels of wages, pensions, and payments in long term contracts in proportion to changes in relevant prices.\nThe domestic output price index for an economic activity measures the average price development of all goods and related services resulting from that activity and sold on the domestic market between one time period and another." 
- }, - { - "objectID": "gas.html#next-forecast-for-a-given-country", - "href": "gas.html#next-forecast-for-a-given-country", - "title": "European inland gas consumption challenge", - "section": "Next forecast for a given country", - "text": "Next forecast for a given country\n\nviewof country = Inputs.select(Object.values(country_map), {\n label: html`<b>Select a country:</b>`,\n placeholder: \"Enter a country name\",\n unique: true\n })\n\n\n\n\n\n\n\n\n\nPlot.plot({\n grid: true,\n y: {\n label: \"↑ Production price in industry\",\n }, \n x: {\n label: \"Year\",\n domain: range\n },\n marks: [\n Plot.line(historical, {\n tip: true,\n x: \"date\", \n y: \"values\", \n stroke: \"black\",\n title: (d) =>\n `${d.date.toLocaleString(\"en-UK\", {\n month: \"long\",\n year: \"numeric\"\n })}\\n${d.values}`\n }),\n Plot.dot(predictions, {\n tip: true,\n x: \"date\", \n y: \"values\",\n fill: \"model\",\n title: (d) =>\n `${d.model}\\n ${d.date.toLocaleString(\"en-UK\", {\n month: \"long\",\n year: \"numeric\"\n })} : ${d.values} `\n })\n ],\n color: {legend: true}\n})\n\n\n\n\n\n\n\n\n\n\n\ndates = {\n const data = historical.map(d => d.date)\n data.push(predictions.map(d => d.date)[0])\n return data \n}\n\nviewof range = offsetInterval(dates, {\n value: [ dates[dates.length-90], dates[dates.length-1] ],\n format: ([a, b]) => htl.html`<span ${{\n style: \"display: flex; justify-content: space-between\"\n }}>\n ${a.toISOString(\"en-UK\").slice(0, 10)}\n        \n        \n        \n        \n        \n ${b.toISOString(\"en-UK\").slice(0, 10)}\n </span>`\n})" - }, - { - "objectID": "gas.html#country-specific-forecast-summary-by-model", - "href": "gas.html#country-specific-forecast-summary-by-model", - "title": "European inland gas consumption challenge", - "section": "Country-specific forecast summary by model", - "text": "Country-specific forecast summary by model\n\nviewof form = Inputs.form({\n model: Inputs.checkbox(models, {value: models}),\n countries: 
Inputs.select([\"All\", ...Object.values(country_map)], {multiple: true, value: [\"All\"], width: 50, size: 1})\n})\n\n\n\n\n\n\n\nviewof rows = Inputs.table(summary_table,{\n rows: 25,\n maxWidth: 840,\n multiple: false,\n layout: \"fixed\"\n})\n\n\n\n\n\n\n\n\nhistorical = format_historical_data(data, country_iso)\n\n\n\n\n\n\n\npredictions = format_pred_data(pred, country_iso)\n\n\n\n\n\n\n\nsummary_table = format_summary_table_data(pred, form.countries, form.model, country_map)\n\n\n\n\n\n\n\n\ncountry_map = get_countries_per_challenge(mapping_countries_weights, challenge)\n\n\n\n\n\n\n\ncountry_iso = Object.keys(country_map).find(key => country_map[key] === country);\n\n\n\n\n\n\n\nmodels = [\"REG-ARIMA\", \"DFM\", \"ETS\", \"XGBOOST\", \"LSTM\"]\n\n\n\n\n\n\n\n\nimport { \n format_historical_data,\n format_pred_data,\n format_summary_table_data,\n get_countries_per_challenge,\n mapping_countries_weights,\n } from \"./utils/utils.qmd\"\n\n\n\n\n\n\n\nimport {offsetInterval} from '@mootari/offset-slider'\n\n\n\n\n\n\n\nPlot = require(\"https://cdn.jsdelivr.net/npm/@observablehq/plot@0.6.8/dist/plot.umd.min.js\")" - }, { "objectID": "electricity.html", "href": "electricity.html", @@ -273,38 +301,10 @@ "text": "Country-specific forecast summary by model\n\nviewof form = Inputs.form({\n model: Inputs.checkbox(models, {value: models}),\n countries: Inputs.select([\"All\", ...Object.values(country_map)], {multiple: true, value: [\"All\"], width: 50, size: 1})\n})\n\n\n\n\n\n\n\nviewof rows = Inputs.table(summary_table,{\n rows: 25,\n maxWidth: 840,\n multiple: false,\n layout: \"fixed\"\n})\n\n\n\n\n\n\n\n\nhistorical = format_historical_data(data, country_iso)\n\n\n\n\n\n\n\npredictions = format_pred_data(pred, country_iso)\n\n\n\n\n\n\n\nsummary_table = format_summary_table_data(pred, form.countries, form.model, country_map)\n\n\n\n\n\n\n\n\ncountry_map = get_countries_per_challenge(mapping_countries_weights, challenge)\n\n\n\n\n\n\n\ncountry_iso = 
Object.keys(country_map).find(key => country_map[key] === country);\n\n\n\n\n\n\n\nmodels = [\"REG-ARIMA\", \"DFM\", \"ETS\", \"XGBOOST\", \"LSTM\"]\n\n\n\n\n\n\n\n\nimport { \n format_historical_data,\n format_pred_data,\n format_summary_table_data,\n get_countries_per_challenge,\n mapping_countries_weights,\n } from \"./utils/utils.qmd\"\n\n\n\n\n\n\n\nimport {offsetInterval} from '@mootari/offset-slider'\n\n\n\n\n\n\n\nPlot = require(\"https://cdn.jsdelivr.net/npm/@observablehq/plot@0.6.8/dist/plot.umd.min.js\")" }, { - "objectID": "data.html", - "href": "data.html", - "title": "Data", + "objectID": "LICENCE.html", + "href": "LICENCE.html", + "title": "European Statistics Awards for Nowcasting", "section": "", - "text": "We take great pride in our dedication to using exclusively open data sources in our modeling efforts, which was a fundamental aspect of our approach during the challenge. While some proprietary data sources may have had greater predictive power, we firmly believed that utilizing open data sources was crucial to promoting the principles of transparency and reproducibility in our modeling efforts. By leveraging publicly available data, we were able to derive nowcasts of key economic indicators while ensuring that our work can be easily replicated and validated by others in the official statistics community. This approach not only provided us with a robust foundation for our models but also served to promote the values of open science, data transparency, and reproducibility.\nDuring the challenge, we utilized three primary sources of data to inform our modeling efforts. The first source was economic data from the Eurostat database, which provided us with a comprehensive overview of the economic situation in the European Union. The second source of data was financial data, which provided us with valuable insights into the financial context surrounding each of the challenges. 
This data included stock prices, exchange rates, and other financial indicators that were useful in predicting economic trends and identifying potential risks. Finally, we also used Google Trends data to capture the most recent trends and shifts in consumer behavior. This data enabled us to monitor changes in search volume for specific keywords, which served as an early warning system for sudden changes in consumer sentiment and preferences. Overall, our use of these three distinct sources of data allowed us to develop a comprehensive understanding of the economic landscape and to generate nowcasts of the 3 target variables." - }, - { - "objectID": "data.html#sec-data-eurostat", - "href": "data.html#sec-data-eurostat", - "title": "Data", - "section": "Economic data from Eurostat database", - "text": "Economic data from Eurostat database\nWe use classic macroeconomic variables provided to Eurostat by European countries and available on Eurostat website. These series are automatically retrieved from Eurostat database through its API.\n\nProducer prices in industry:\nProducer Price in Industry (PPI) refers to the average price that domestic producers receive for the goods and services they produce. This indicator measures changes in the price of goods and services at the producer level, and it is considered an important leading indicator of inflation.\nSpecificities:\n- The PPI challenge refers to this indicator for a particular NACE\n- Total on domestic market (target variable)\n- Division level of NACE and MIGs\nImport prices in industry:\nImport prices in industry, also known as industrial import price index (IPI), refer to the cost of goods and services imported into a country for use in production. 
This indicator reflects changes in the prices of imported raw materials, intermediate goods, and capital equipment, and it is influenced by factors such as exchange rates, global commodity prices, and trade policies.\nSpecificities:\n- Total\n- Division level of CPA and MIGs\nProduction index in industry :\nThe Production Index in Industry (or Production Volume in Industry - PVI) is a measure of the physical output of the industrial sector of an economy. It tracks changes in the volume of goods produced over time, and it is considered an important indicator of the health and performance of the manufacturing sector. The production index can be used to assess trends in productivity, capacity utilization, and competitiveness.\nSpecificities:\n- The PVI challenge refers to this indicator for a particular NACE\n- Total\n- Intermediary goods (MIG_ING)\nHarmonised Index of Consumer Prices on a few products\nThe Harmonised Index of Consumer Prices (HICP) is a measure of inflation that is used to compare price changes across the European Union. It tracks the average change over time in the prices of goods and services that households consume, including food, housing, transportation, and healthcare. The HICP is calculated using a harmonised methodology that ensures comparability across EU member states, and it is published on a monthly basis by Eurostat. It is a key indicator of price stability.\nBusiness survey in industry:\nThe Business Survey in Industry is a survey conducted by Eurostat to gather information on the business conditions and expectations of companies in the manufacturing sector. The survey covers a range of topics, including production, new orders, inventories, employment, prices, and investment, and it is conducted on a monthly basis across the European Union. 
The data collected from the survey can be used to assess the current and future state of the manufacturing sector, to identify sector-specific challenges and opportunities, and to inform policymaking and business decision-making.\nCovered by the survey:\n- Industrial confidence indicator\n- Production development observed over the past 3 months\n- Production expectations over the next 3 months\n- Employment expectations over the next 3 months\n- Assessment of order-book levels\n- Assessment of the current level of stocks of finished products\n- Selling price expectations over the next 3 months\nConsumer survey in industry:\nThe Consumer Survey in Industry is a survey conducted by Eurostat to gather information on the consumer sentiment and behavior in the European Union. The survey covers a range of topics, including household income, savings, spending intentions, and major purchases, and it is conducted on a monthly basis. The data collected from the survey can be used to assess consumer confidence, to identify trends in consumer spending and saving patterns, and to inform policymaking and business decision-making. 
The Consumer Survey in Industry is an important indicator of the overall health of the economy, as consumer spending is a major driver of economic activity.\nCovered by the survey:\n- Financial situation over the last 12 months\n- Financial situation over the next 12 months\n- General economic situation over the last 12 months\n- General economic situation over the next 12 months\n- Price trends over the last 12 months\n- Price trends over the next 12 months\n- Unemployment expectations over the next 12 months\n- The current economic situation is adequate to make major purchases\n- Major purchases over the next 12 months\n- The current economic situation is adequate for savings\n- Savings over the next 12 months\n- Statement on financial situation of household\n- Consumer confidence indicator\nNumber of nights spent at tourist accommodation establishments\nThe number of nights spent at tourist accommodation establishments is a measure of the volume of tourism activity in a country. It refers to the total number of nights that guests spend in all types of tourist accommodation establishments, including hotels, campsites, holiday homes, and other types of accommodation. This indicator is important for assessing the contribution of tourism to the economy, as well as for monitoring the performance and competitiveness of the tourism sector.\nSpecificities:\n- The TOURISM challenge refers to this indicator" - }, - { - "objectID": "data.html#financial-data-from-yahoo-finance", - "href": "data.html#financial-data-from-yahoo-finance", - "title": "Data", - "section": "Financial data from Yahoo Finance", - "text": "Financial data from Yahoo Finance\nYahoo Finance is a popular online platform for financial information and investment tools. It provides a wide range of financial data, including real-time stock prices, historical price charts, news articles, analyst ratings, and financial statements for publicly traded companies. 
We used its API to get the latest financial data to improve our short-term predictions.\n\nEuro/Dollar exchange rate\nThe Euro/Dollar exchange rate represents the value of one euro in terms of US dollars. It is a widely followed currency pair in the foreign exchange market, as it reflects the relative strength of two of the world’s largest economies. Movements in the exchange rate can be influenced by a range of factors, such as interest rate differentials, inflation expectations, political developments, and global economic trends. The exchange rate can impact international trade, investment flows, and the competitiveness of exports and imports, making it a key indicator for businesses, investors, and policymakers alike.\nBrent Crude Oil Stock Price\nThe Brent Crude Oil Last Day Financial Futures Stock Price is a benchmark for the price of crude oil from the North Sea, which is used as a pricing reference for approximately two-thirds of the world’s traded crude oil. As a financial futures contract, it allows investors to trade the price of oil without actually buying or selling the physical commodity. The stock price reflects the market’s perception of supply and demand dynamics, geopolitical risks, and other macroeconomic factors that impact the oil market.\nS&P 500 Index Stock Price\nThe S&P 500 Index stock is a market capitalization-weighted index of 500 leading publicly traded companies in the United States. It is widely considered to be a barometer of the US stock market’s performance, providing investors with a broad-based measure of the economy’s health and direction. The S&P 500 Index includes companies from a range of sectors, such as technology, healthcare, finance, and energy, making it a diversified indicator of the US equity market.\nEuro Stoxx 50 Index Stock Price\nThe Euro Stoxx 50 Index stock is a market capitalization-weighted index of 50 leading blue-chip companies from 12 Eurozone countries. 
It is designed to reflect the performance of the Eurozone’s most liquid and largest companies across a range of industries, including banking, energy, consumer goods, and healthcare. As a widely recognized benchmark of the Eurozone equity market, the Euro Stoxx 50 Index stock is used by investors and analysts to track market trends, benchmark portfolio performance, and identify investment opportunities. Movements in the index are influenced by a range of factors, such as economic growth prospects, monetary policy decisions, geopolitical risks, and corporate earnings announcements.\nCAC40 Index Stock Price\nThe CAC 40 Index Stock is a benchmark index of the top 40 companies listed on the Euronext Paris Stock Exchange, representing a broad range of industries such as energy, finance, healthcare, and technology. It is the most widely used indicator of the French equity market’s performance and is considered one of the leading indices in Europe. The CAC 40 Index Stock is weighted by market capitalization and is closely monitored by investors and analysts as an indicator of economic health and growth prospects in France. Movements in the index can be influenced by a variety of factors, such as geopolitical risks, macroeconomic indicators, and company-specific news." - }, - { - "objectID": "data.html#google-trends", - "href": "data.html#google-trends", - "title": "Data", - "section": "Google Trends", - "text": "Google Trends\nGoogle Trends is a free online tool provided by Google that allows users to explore the popularity of search queries over time and across different regions and languages. It provides valuable insights into the behavior of internet users, the topics they are interested in, and the evolution of search trends over time.\nNevertheless, the use of Google Trends data as a tool for economic analysis needs to be done carefully. 
Google Trends provides Search Volume Indices (SVI) based on search ratios, with the initial search volume for a category or topic at a given time divided by the total number of searches at that date. However, changes in the denominator (total searches) can induce biases as internet use has evolved since 2004.\nWe implemented an approach to address this downward bias by extracting a common component from concurrent time series: Principal Component Analysis (PCA) is applied to the long-term trends of the log-SVI series, obtained with an HP (Hodrick-Prescott) filter. The rescaled first component obtained from the long-term log-SVIs is assumed to capture the common long-term trend, and it is subtracted from the log-SVIs. This approach can help to remove the downward bias common to all Google Trends variables and improve their economic predictive power. More information in this paper.\nObserved categories:\n\nManufacturing\nIndustrial Materials & Equipment\nFuel Economy & Gas Prices\nHotels & Accommodations\nTravel Agencies & Services\nVacation Offers\nMountain & Ski Resorts" }, { "objectID": "data.html#other-data", "href": "data.html#other-data", "title": "Data", "section": "Other data", "text": "Other data\n\nElectricity prices\nEmber.org provides European wholesale electricity price data that can be used to analyze electricity market trends, monitor price volatility, and inform investment decisions. The data is sourced from various market operators and exchanges across Europe and covers a wide range of countries and regions.\n\n\nCalendar data\nWe retrieve the number of weekend days per month to include it as a feature in our models.\n\n\nLeading national indicators\n\nGermany has developed some experimental indicators of activity. We use the daily Truck toll mileage index accessible on the Destatis website.\n\n\nThe Weekly WIFO Economic Index is a measure of the real economic activity of the Austrian economy. 
We use its industrial production component.\n\n\n\nOther potential interesting data sources (not identified in open data sources)\n\nHistorical Purchasing manager’s index by country\nLondon metal exchange indices\nDaily electricity consumption by industrial firms country by country\n…" + "text": "EUROPEAN UNION PUBLIC LICENCE v. 1.2\n EUPL © the European Union 2007, 2016\nThis European Union Public Licence (the ‘EUPL’) applies to the Work (as defined below) which is provided under the terms of this Licence. Any use of the Work, other than as authorised under this Licence is prohibited (to the extent such use is covered by a right of the copyright holder of the Work).\nThe Work is provided under the terms of this Licence when the Licensor (as defined below) has placed the following notice immediately following the copyright notice for the Work:\n Licensed under the EUPL\nor has expressed by any other means his willingness to license under the EUPL.\n\nDefinitions\n\nIn this Licence, the following terms have the following meaning:\n\n‘The Licence’: this Licence.\n‘The Original Work’: the work or software distributed or communicated by the Licensor under this Licence, available as Source Code and also as Executable Code as the case may be.\n‘Derivative Works’: the works or software that could be created by the Licensee, based upon the Original Work or modifications thereof. 
This Licence does not define the extent of modification or dependence on the Original Work required in order to classify a work as a Derivative Work; this extent is determined by copyright law applicable in the country mentioned in Article 15.\n‘The Work’: the Original Work or its Derivative Works.\n‘The Source Code’: the human-readable form of the Work which is the most convenient for people to study and modify.\n‘The Executable Code’: any code which has generally been compiled and which is meant to be interpreted by a computer as a program.\n‘The Licensor’: the natural or legal person that distributes or communicates the Work under the Licence.\n‘Contributor(s)’: any natural or legal person who modifies the Work under the Licence, or otherwise contributes to the creation of a Derivative Work.\n‘The Licensee’ or ‘You’: any natural or legal person who makes any usage of the Work under the terms of the Licence.\n‘Distribution’ or ‘Communication’: any act of selling, giving, lending, renting, distributing, communicating, transmitting, or otherwise making available, online or offline, copies of the Work or providing access to its essential functionalities at the disposal of any other natural or legal person.\n\n\nScope of the rights granted by the Licence\n\nThe Licensor hereby grants You a worldwide, royalty-free, non-exclusive, sublicensable licence to do the following, for the duration of copyright vested in the Original Work:\n\nuse the Work in any circumstance and for all usage,\nreproduce the Work,\nmodify the Work, and make Derivative Works based upon the Work,\ncommunicate to the public, including the right to make available or display the Work or copies thereof to the public and perform publicly, as the case may be, the Work,\ndistribute the Work or copies thereof,\nlend and rent the Work or copies thereof,\nsublicense rights in the Work or copies thereof.\n\nThose rights can be exercised on any media, supports and formats, whether now known or later 
invented, as far as the applicable law permits so.\nIn the countries where moral rights apply, the Licensor waives his right to exercise his moral right to the extent allowed by law in order to make effective the licence of the economic rights here above listed.\nThe Licensor grants to the Licensee royalty-free, non-exclusive usage rights to any patents held by the Licensor, to the extent necessary to make use of the rights granted on the Work under this Licence.\n\nCommunication of the Source Code\n\nThe Licensor may provide the Work either in its Source Code form, or as Executable Code. If the Work is provided as Executable Code, the Licensor provides in addition a machine-readable copy of the Source Code of the Work along with each copy of the Work that the Licensor distributes or indicates, in a notice following the copyright notice attached to the Work, a repository where the Source Code is easily and freely accessible for as long as the Licensor continues to distribute or communicate the Work.\n\nLimitations on copyright\n\nNothing in this Licence is intended to deprive the Licensee of the benefits from any exception or limitation to the exclusive rights of the rights owners in the Work, of the exhaustion of those rights or of other applicable limitations thereto.\n\nObligations of the Licensee\n\nThe grant of the rights mentioned above is subject to some restrictions and obligations imposed on the Licensee. Those obligations are the following:\nAttribution right: The Licensee shall keep intact all copyright, patent or trademarks notices and all notices that refer to the Licence and to the disclaimer of warranties. The Licensee must include a copy of such notices and a copy of the Licence with every copy of the Work he/she distributes or communicates. 
The Licensee must cause any Derivative Work to carry prominent notices stating that the Work has been modified and the date of modification.\nCopyleft clause: If the Licensee distributes or communicates copies of the Original Works or Derivative Works, this Distribution or Communication will be done under the terms of this Licence or of a later version of this Licence unless the Original Work is expressly distributed only under this version of the Licence — for example by communicating ‘EUPL v. 1.2 only’. The Licensee (becoming Licensor) cannot offer or impose any additional terms or conditions on the Work or Derivative Work that alter or restrict the terms of the Licence.\nCompatibility clause: If the Licensee Distributes or Communicates Derivative Works or copies thereof based upon both the Work and another work licensed under a Compatible Licence, this Distribution or Communication can be done under the terms of this Compatible Licence. For the sake of this clause, ‘Compatible Licence’ refers to the licences listed in the appendix attached to this Licence. 
Should the Licensee’s obligations under the Compatible Licence conflict with his/her obligations under this Licence, the obligations of the Compatible Licence shall prevail.\nProvision of Source Code: When distributing or communicating copies of the Work, the Licensee will provide a machine-readable copy of the Source Code or indicate a repository where this Source will be easily and freely available for as long as the Licensee continues to distribute or communicate the Work.\nLegal Protection: This Licence does not grant permission to use the trade names, trademarks, service marks, or names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the copyright notice.\n\nChain of Authorship\n\nThe original Licensor warrants that the copyright in the Original Work granted hereunder is owned by him/her or licensed to him/her and that he/she has the power and authority to grant the Licence.\nEach Contributor warrants that the copyright in the modifications he/she brings to the Work are owned by him/her or licensed to him/her and that he/she has the power and authority to grant the Licence.\nEach time You accept the Licence, the original Licensor and subsequent Contributors grant You a licence to their contributions to the Work, under the terms of this Licence.\n\nDisclaimer of Warranty\n\nThe Work is a work in progress, which is continuously improved by numerous Contributors. 
It is not a finished work and may therefore contain defects or ‘bugs’ inherent to this type of development.\nFor the above reason, the Work is provided under the Licence on an ‘as is’ basis and without warranties of any kind concerning the Work, including without limitation merchantability, fitness for a particular purpose, absence of defects or errors, accuracy, non-infringement of intellectual property rights other than copyright as stated in Article 6 of this Licence.\nThis disclaimer of warranty is an essential part of the Licence and a condition for the grant of any rights to the Work.\n\nDisclaimer of Liability\n\nExcept in the cases of wilful misconduct or damages directly caused to natural persons, the Licensor will in no event be liable for any direct or indirect, material or moral, damages of any kind, arising out of the Licence or of the use of the Work, including without limitation, damages for loss of goodwill, work stoppage, computer failure or malfunction, loss of data or any commercial damage, even if the Licensor has been advised of the possibility of such damage. However, the Licensor will be liable under statutory product liability laws as far such laws apply to the Work.\n\nAdditional agreements\n\nWhile distributing the Work, You may choose to conclude an additional agreement, defining obligations or services consistent with this Licence. 
However, if accepting obligations, You may act only on your own behalf and on your sole responsibility, not on behalf of the original Licensor or any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against such Contributor by the fact You have accepted any warranty or additional liability.\n\nAcceptance of the Licence\n\nThe provisions of this Licence can be accepted by clicking on an icon ‘I agree’ placed under the bottom of a window displaying the text of this Licence or by affirming consent in any other similar way, in accordance with the rules of applicable law. Clicking on that icon indicates your clear and irrevocable acceptance of this Licence and all of its terms and conditions.\nSimilarly, you irrevocably accept this Licence and all of its terms and conditions by exercising any rights granted to You by Article 2 of this Licence, such as the use of the Work, the creation by You of a Derivative Work or the Distribution or Communication by You of the Work or copies thereof.\n\nInformation to the public\n\nIn case of any Distribution or Communication of the Work by means of electronic communication by You (for example, by offering to download the Work from a remote location) the distribution channel or media (for example, a website) must at least provide to the public the information requested by the applicable law regarding the Licensor, the Licence and the way it may be accessible, concluded, stored and reproduced by the Licensee.\n\nTermination of the Licence\n\nThe Licence and the rights granted hereunder will terminate automatically upon any breach by the Licensee of the terms of the Licence.\nSuch a termination will not terminate the licences of any person who has received the Work from the Licensee under the Licence, provided such persons remain in full compliance with the Licence.\n\nMiscellaneous\n\nWithout prejudice of Article 9 above, the Licence 
represents the complete agreement between the Parties as to the Work.\nIf any provision of the Licence is invalid or unenforceable under applicable law, this will not affect the validity or enforceability of the Licence as a whole. Such provision will be construed or reformed so as necessary to make it valid and enforceable.\nThe European Commission may publish other linguistic versions or new versions of this Licence or updated versions of the Appendix, so far this is required and reasonable, without reducing the scope of the rights granted by the Licence. New versions of the Licence will be published with a unique version number.\nAll linguistic versions of this Licence, approved by the European Commission, have identical value. Parties can take advantage of the linguistic version of their choice.\n\nJurisdiction\n\nWithout prejudice to specific agreement between parties,\n\nany litigation resulting from the interpretation of this License, arising between the European Union institutions, bodies, offices or agencies, as a Licensor, and any Licensee, will be subject to the jurisdiction of the Court of Justice of the European Union, as laid down in article 272 of the Treaty on the Functioning of the European Union,\nany litigation arising between other parties and resulting from the interpretation of this License, will be subject to the exclusive jurisdiction of the competent court where the Licensor resides or conducts its primary business.\n\n\nApplicable Law\n\nWithout prejudice to specific agreement between parties,\n\nthis Licence shall be governed by the law of the European Union Member State where the Licensor has his seat, resides or has his registered office,\nthis licence shall be governed by Belgian law if the Licensor has no seat, residence or registered office inside a European Union Member State.\n\nAppendix\n‘Compatible Licences’ according to Article 5 EUPL are:\n\nGNU General Public License (GPL) v. 2, v. 
3\nGNU Affero General Public License (AGPL) v. 3\nOpen Software License (OSL) v. 2.1, v. 3.0\nEclipse Public License (EPL) v. 1.0\nCeCILL v. 2.0, v. 2.1\nMozilla Public Licence (MPL) v. 2\nGNU Lesser General Public Licence (LGPL) v. 2.1, v. 3\nCreative Commons Attribution-ShareAlike v. 3.0 Unported (CC BY-SA 3.0) for works other than software\nEuropean Union Public Licence (EUPL) v. 1.1, v. 1.2\nQuébec Free and Open-Source Licence — Reciprocity (LiLiQ-R) or Strong Reciprocity (LiLiQ-R+).\n\nThe European Commission may update this Appendix to later versions of the above licences without producing a new version of the EUPL, as long as they provide the rights granted in Article 2 of this Licence and protect the covered Source Code from exclusive appropriation.\nAll other changes or additions to this Appendix require the production of a new EUPL version." } ] \ No newline at end of file diff --git a/sitemap.xml b/sitemap.xml index 8d97514..2af5717 100644 --- a/sitemap.xml +++ b/sitemap.xml @@ -1,55 +1,55 @@ - https://inseefrlab.github.io/ESA-Nowcasting-2024/methodology.html - 2023-11-30T14:28:18.990Z + https://inseefrlab.github.io/ESA-Nowcasting-2024/oil.html + 2023-12-16T14:13:19.013Z - https://inseefrlab.github.io/ESA-Nowcasting-2024/lessons-learned.html - 2023-11-30T14:28:17.662Z + https://inseefrlab.github.io/ESA-Nowcasting-2024/reproducibility.html + 2023-12-16T14:13:16.501Z - https://inseefrlab.github.io/ESA-Nowcasting-2024/utils/utils.html - 2023-11-30T14:28:15.278Z + https://inseefrlab.github.io/ESA-Nowcasting-2024/post-mortem-electricity.html + 2023-12-16T14:13:14.165Z - https://inseefrlab.github.io/ESA-Nowcasting-2024/post-mortem-electricity.html - 2023-11-30T14:28:12.602Z + https://inseefrlab.github.io/ESA-Nowcasting-2024/post-mortem-gas.html + 2023-12-16T14:13:10.253Z - https://inseefrlab.github.io/ESA-Nowcasting-2024/reproducibility.html - 2023-11-30T14:28:09.926Z + https://inseefrlab.github.io/ESA-Nowcasting-2024/utils/utils.html + 
2023-12-16T14:13:07.953Z - https://inseefrlab.github.io/ESA-Nowcasting-2024/oil.html - 2023-11-30T14:28:07.398Z + https://inseefrlab.github.io/ESA-Nowcasting-2024/methodology.html + 2023-12-16T14:13:05.057Z - https://inseefrlab.github.io/ESA-Nowcasting-2024/post-mortem-gas.html - 2023-11-30T14:28:04.578Z + https://inseefrlab.github.io/ESA-Nowcasting-2024/data.html + 2023-12-16T14:13:03.641Z https://inseefrlab.github.io/ESA-Nowcasting-2024/index.html - 2023-11-30T14:28:05.282Z + 2023-12-16T14:13:04.305Z - https://inseefrlab.github.io/ESA-Nowcasting-2024/post-mortem-oil.html - 2023-11-30T14:28:09.498Z + https://inseefrlab.github.io/ESA-Nowcasting-2024/gas.html + 2023-12-16T14:13:07.333Z - https://inseefrlab.github.io/ESA-Nowcasting-2024/LICENCE.html - 2023-11-30T14:28:10.538Z + https://inseefrlab.github.io/ESA-Nowcasting-2024/lessons-learned.html + 2023-12-16T14:13:08.273Z - https://inseefrlab.github.io/ESA-Nowcasting-2024/gas.html - 2023-11-30T14:28:14.646Z + https://inseefrlab.github.io/ESA-Nowcasting-2024/post-mortem-oil.html + 2023-12-16T14:13:12.209Z https://inseefrlab.github.io/ESA-Nowcasting-2024/electricity.html - 2023-11-30T14:28:17.354Z + 2023-12-16T14:13:16.101Z - https://inseefrlab.github.io/ESA-Nowcasting-2024/data.html - 2023-11-30T14:28:18.226Z + https://inseefrlab.github.io/ESA-Nowcasting-2024/LICENCE.html + 2023-12-16T14:13:17.089Z