Merge pull request #17 from pascalhorton/dev-pascal
CNN development
pascalhorton authored Aug 6, 2024
2 parents 02b0c33 + dc4453b commit a023c40
Showing 21 changed files with 1,104 additions and 381 deletions.
74 changes: 45 additions & 29 deletions README.md
@@ -61,34 +61,50 @@ The damages correspond to insurance claims per cell (pixel of the precipitation
The damages are managed by the `Damages` class in `swafi/damages.py` and are also handled internally as a Pandas dataframe.

There are two classes of damages:
- `DamagesMobiliar` from the file `swafi/damages_mobiliar.py`: handles the claims from the Swiss Mobiliar Insurance Company as GeoTIFF.
The dataset from the Mobiliar contains the following categories of claims:

| Name in swafi | Client | Ext/Int | Object | Flood type | Original file names |
|---------------------|---------|----------|-----------|------------|-----------------------------------|
| sme_ext_cont_pluv | SME | external | content | pluvial | Ueberschwemmung_pluvial_KMU_FH |
| sme_ext_cont_fluv | SME | external | content | fluvial | Ueberschwemmung_fluvial_KMU_FH |
| sme_ext_struc_pluv | SME | external | structure | pluvial | Ueberschwemmung_pluvial_KMU_GB |
| sme_ext_struc_fluv | SME | external | structure | fluvial | Ueberschwemmung_fluvial_KMU_GB |
| sme_int_cont | SME | internal | content | | Wasser_KMU_FH |
| sme_int_struc | SME | internal | structure | | Wasser_KMU_GB |
| priv_ext_cont_pluv | Private | external | content | pluvial | Ueberschwemmung_pluvial_Privat_FH |
| priv_ext_cont_fluv | Private | external | content | fluvial | Ueberschwemmung_fluvial_Privat_FH |
| priv_ext_struc_pluv | Private | external | structure | pluvial | Ueberschwemmung_pluvial_Privat_GB |
| priv_ext_struc_fluv | Private | external | structure | fluvial | Ueberschwemmung_fluvial_Privat_GB |
| priv_int_cont | Private | internal | content | | Wasser_Privat_FH |
| priv_int_struc | Private | internal | structure | | Wasser_Privat_GB |

- `DamagesGVZ` from the file `swafi/damages_gvz.py`: handles the claims from the GVZ (Building insurance Canton Zurich) as netCDF.
The dataset from the GVZ contains the following categories:

| Name in swafi | Original tag |
|---------------------|---------------|
| most_likely_pluvial | A |
| likely_pluvial | A, B |
| fluvial_or_pluvial | A, B, C, D, E |
| likely_fluvial | D, E |
| most_likely_fluvial | E |

#### DamagesMobiliar
`DamagesMobiliar` from the file `swafi/damages_mobiliar.py`: handles the claims from the Swiss Mobiliar Insurance Company as GeoTIFF.
The dataset from the Mobiliar contains the following categories of **exposure** (contracts):

| Name in swafi | Client | Ext/Int | Object | Original file names |
|---------------------|---------|----------|-----------|-----------------------------|
| sme_ext_cont | SME | external | content | Vertraege_KMU_ES_FH_YYYY |
| sme_ext_struc | SME | external | structure | Vertraege_KMU_ES_GB_YYYY |
| sme_int_cont | SME | internal | content | Vertraege_KMU_W_FH_YYYY |
| sme_int_struc | SME | internal | structure | Vertraege_KMU_W_GB_YYYY |
| priv_ext_cont | Private | external | content | Vertraege_Privat_ES_FH_YYYY |
| priv_ext_struc | Private | external | structure | Vertraege_Privat_ES_GB_YYYY |
| priv_int_cont | Private | internal | content | Vertraege_Privat_W_FH_YYYY |
| priv_int_struc | Private | internal | structure | Vertraege_Privat_W_GB_YYYY |

The dataset from the Mobiliar contains the following categories of **claims**:

| Name in swafi | Client | Ext/Int | Object | Flood type | Original file names |
|---------------------|---------|----------|-----------|------------|-----------------------------------|
| sme_ext_cont_pluv | SME | external | content | pluvial | Ueberschwemmung_pluvial_KMU_FH |
| sme_ext_cont_fluv | SME | external | content | fluvial | Ueberschwemmung_fluvial_KMU_FH |
| sme_ext_struc_pluv | SME | external | structure | pluvial | Ueberschwemmung_pluvial_KMU_GB |
| sme_ext_struc_fluv | SME | external | structure | fluvial | Ueberschwemmung_fluvial_KMU_GB |
| sme_int_cont | SME | internal | content | | Wasser_KMU_FH |
| sme_int_struc | SME | internal | structure | | Wasser_KMU_GB |
| priv_ext_cont_pluv | Private | external | content | pluvial | Ueberschwemmung_pluvial_Privat_FH |
| priv_ext_cont_fluv | Private | external | content | fluvial | Ueberschwemmung_fluvial_Privat_FH |
| priv_ext_struc_pluv | Private | external | structure | pluvial | Ueberschwemmung_pluvial_Privat_GB |
| priv_ext_struc_fluv | Private | external | structure | fluvial | Ueberschwemmung_fluvial_Privat_GB |
| priv_int_cont | Private | internal | content | | Wasser_Privat_FH |
| priv_int_struc | Private | internal | structure | | Wasser_Privat_GB |
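
The names in the Mobiliar claims table follow a regular pattern, which the sketch below reproduces. It is only an illustration derived from the table: the helper `mobiliar_claim_names` and the lookup dictionaries are hypothetical and not part of swafi, and it assumes that `FH` marks content and `GB` marks structure, as the rows above suggest.

```python
# Hypothetical helper, not part of swafi: rebuilds the claim category names
# and original file names listed in the table above.
CLIENTS = {'sme': 'KMU', 'priv': 'Privat'}
OBJECTS = {'cont': 'FH', 'struc': 'GB'}  # assumed: FH = content, GB = structure


def mobiliar_claim_names(client, ext_int, obj, flood_type=None):
    """Return (name in swafi, original file name) for one claim category."""
    if ext_int == 'ext':
        if flood_type not in ('pluv', 'fluv'):
            raise ValueError('External claims need a flood type: pluv or fluv.')
        flood = {'pluv': 'pluvial', 'fluv': 'fluvial'}[flood_type]
        name = f'{client}_ext_{obj}_{flood_type}'
        file_name = f'Ueberschwemmung_{flood}_{CLIENTS[client]}_{OBJECTS[obj]}'
    elif ext_int == 'int':
        # Internal (water) claims have no flood type in the table.
        name = f'{client}_int_{obj}'
        file_name = f'Wasser_{CLIENTS[client]}_{OBJECTS[obj]}'
    else:
        raise ValueError(f'Unknown ext_int value: {ext_int}')
    return name, file_name


print(mobiliar_claim_names('sme', 'ext', 'cont', 'pluv'))
print(mobiliar_claim_names('priv', 'int', 'struc'))
```

Keeping the mapping in one place like this makes it easy to check a file name against the table before loading it.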

#### DamagesGVZ
`DamagesGVZ` from the file `swafi/damages_gvz.py`: handles the claims from the GVZ (Building insurance Canton Zurich) as netCDF.
The dataset from the GVZ contains a single category of **exposure** (contracts): `all_buildings`, and the following categories of **claims**:

| Name in swafi | Original tag |
|---------------------|---------------|
| most_likely_pluvial | A |
| likely_pluvial | A, B |
| fluvial_or_pluvial | A, B, C, D, E |
| likely_fluvial | D, E |
| most_likely_fluvial | E |
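
Read row by row, the GVZ table means that one original tag can count toward several swafi categories. A minimal sketch of that reading follows; the dictionary `GVZ_CATEGORY_TAGS` and the function `gvz_categories_for_tag` are hypothetical names used for illustration, not swafi code.

```python
# Hypothetical illustration of the GVZ table above, not swafi code.
GVZ_CATEGORY_TAGS = {
    'most_likely_pluvial': {'A'},
    'likely_pluvial': {'A', 'B'},
    'fluvial_or_pluvial': {'A', 'B', 'C', 'D', 'E'},
    'likely_fluvial': {'D', 'E'},
    'most_likely_fluvial': {'E'},
}


def gvz_categories_for_tag(tag):
    """List the swafi categories whose tag set contains the original tag."""
    return [name for name, tags in GVZ_CATEGORY_TAGS.items() if tag in tags]


for tag in 'ABCDE':
    print(tag, gvz_categories_for_tag(tag))
```

For example, a claim tagged `C` only appears in `fluvial_or_pluvial`, while a claim tagged `A` appears in all three pluvial-compatible categories.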

These classes are subclasses of the `Damages` class and implement the data loading according to the corresponding file format as well as their specific classification.

@@ -273,7 +289,7 @@ All the hyperparameters of the model can be set as options of the script.
The model can be trained using the following command:

```bash
train_dl_occurrence.py [-h] [--run-id RUN_ID] [--optimize-with-optuna]
train_cnn_occurrence.py [-h] [--run-id RUN_ID] [--optimize-with-optuna]
[--target-type TARGET_TYPE]
[--factor-neg-reduction FACTOR_NEG_REDUCTION]
[--weight-denominator WEIGHT_DENOMINATOR]
6 changes: 3 additions & 3 deletions config_example.yaml
@@ -9,8 +9,8 @@ YEAR_END_MOBILIAR: 2022
YEAR_START_GVZ: 2005
YEAR_END_GVZ: 2022

# CID (cells IDs) raster path
CID_PATH: '..\files\cids.tif'
# CID (cells IDs) raster path (not needed for Switzerland)
CID_PATH: 'path/to/cids.tif'

# Contract and damage data directories
DIR_EXPOSURE_MOBILIAR: ''
@@ -19,7 +19,7 @@ DIR_EXPOSURE_GVZ: ''
DIR_CLAIMS_GVZ: ''

# Path to the events parquet file
EVENTS_PATH: ''
EVENTS_PATH: 'path/to/prec_events_2005-2023_no_smoothing.parquet'

# Path to the precipitation directory
DIR_PRECIP: ''
1 change: 1 addition & 0 deletions requirements-optional.txt
@@ -6,3 +6,4 @@ optuna
asyncpg
psycopg2
psycopg2-binary
plotly
186 changes: 186 additions & 0 deletions scripts/data_analyses/analyze_damage_data.py
@@ -0,0 +1,186 @@
"""
This script analyzes the distribution of the number of contracts and claims per cell.
"""

from swafi.config import Config
from swafi.damages_mobiliar import DamagesMobiliar
from swafi.damages_gvz import DamagesGvz
import pandas as pd
import matplotlib.pyplot as plt

config = Config(output_dir='analysis_damage_distribution')
output_dir = config.output_dir

PICKLES_DIR = config.get('PICKLES_DIR')
DATASET = 'mobiliar' # 'mobiliar' or 'gvz'

if DATASET == 'mobiliar':
    EXPOSURE_CATEGORIES = ['external']
    CLAIM_CATEGORIES = ['external', 'pluvial']
elif DATASET == 'gvz':
    EXPOSURE_CATEGORIES = ['all_buildings']
    CLAIM_CATEGORIES = ['likely_pluvial']


def main():
    if DATASET == 'mobiliar':
        damages = DamagesMobiliar(dir_exposure=config.get('DIR_EXPOSURE_MOBILIAR'),
                                  dir_claims=config.get('DIR_CLAIMS_MOBILIAR'),
                                  year_start=config.get('YEAR_START_MOBILIAR'),
                                  year_end=config.get('YEAR_END_MOBILIAR'))
    elif DATASET == 'gvz':
        damages = DamagesGvz(dir_exposure=config.get('DIR_EXPOSURE_GVZ'),
                             dir_claims=config.get('DIR_CLAIMS_GVZ'),
                             year_start=config.get('YEAR_START_GVZ'),
                             year_end=config.get('YEAR_END_GVZ'))
    else:
        raise ValueError(f'Dataset {DATASET} not recognized.')

    # Format the date of the claims
    df_claims = damages.claims
    df_claims['date_claim'] = pd.to_datetime(
        df_claims['date_claim'], errors='coerce')

    # Compute the monthly sum of claims for each month per category
    df_claims_month = df_claims.copy()
    df_claims_month['month'] = df_claims_month['date_claim'].dt.month
    df_claims_month = df_claims_month.drop(
        columns=['date_claim', 'mask_index', 'cid', 'x', 'y'])
    df_claims_month_sum = df_claims_month.groupby('month').sum()

    # Plot the monthly distribution of the total # of claims for different categories
    for category in damages.claim_categories:
        sum_claims = df_claims_month_sum[category].sum()
        plt.figure(figsize=(8, 4))
        plt.title(f'Monthly distribution of the claims for category {category} '
                  f'(total: {sum_claims})')
        plt.xlabel('Month')
        plt.ylabel('Percentage of claims [%]')
        nb_annual_claims = df_claims_month_sum[category] / sum_claims
        plt.bar(df_claims_month_sum.index, 100 * nb_annual_claims)
        plt.xticks(range(1, 13))
        plt.tight_layout()
        plt.savefig(output_dir / f'monthly_distribution_tot_claims_{category}.png')
        plt.savefig(output_dir / f'monthly_distribution_tot_claims_{category}.pdf')
        plt.close()

    # For the whole domain, aggregate by date (sum)
    df_claims_date = df_claims.copy()
    df_claims_date = df_claims_date.drop(
        columns=['mask_index', 'cid', 'x', 'y'])
    df_claims_date = df_claims_date.groupby('date_claim').sum()
    df_claims_date['date_claim'] = pd.to_datetime(df_claims_date.index, errors='coerce')
    df_claims_date['month'] = df_claims_date['date_claim'].dt.month

    # Plot the monthly distribution of the mean # of claims for different categories
    for category in damages.claim_categories:
        df_claims_date_cat = df_claims_date.copy()
        df_claims_date_cat = df_claims_date_cat[df_claims_date_cat[category] > 0]
        df_claims_date_cat = df_claims_date_cat.groupby('month').mean()
        plt.figure(figsize=(8, 4))
        plt.title(f'Monthly distribution of the mean # of claims / event for category {category}')
        plt.xlabel('Month')
        plt.ylabel('Mean number of claims')
        plt.bar(df_claims_date_cat.index, df_claims_date_cat[category])
        plt.xticks(range(1, 13))
        plt.tight_layout()
        plt.savefig(output_dir / f'monthly_distribution_mean_claims_{category}.png')
        plt.savefig(output_dir / f'monthly_distribution_mean_claims_{category}.pdf')
        plt.close()

    # Select the categories of interest
    damages.select_categories_type(EXPOSURE_CATEGORIES, CLAIM_CATEGORIES)

    # Analyze the occurrences of damages to the structure and/or content
    if DATASET == 'mobiliar':
        df_mobi = damages.claims

        # Sum priv and sme
        df_mobi['ext_struc_pluv'] = (
            df_mobi['priv_ext_struc_pluv'] + df_mobi['sme_ext_struc_pluv'])
        df_mobi['ext_cont_pluv'] = (
            df_mobi['priv_ext_cont_pluv'] + df_mobi['sme_ext_cont_pluv'])
        df_mobi['ext_both_pluv'] = (
            df_mobi[['ext_struc_pluv', 'ext_cont_pluv']].min(axis=1))
        df_mobi['ext_struc_only_pluv'] = (
            df_mobi['ext_struc_pluv'] - df_mobi['ext_both_pluv'])
        df_mobi['ext_cont_only_pluv'] = (
            df_mobi['ext_cont_pluv'] - df_mobi['ext_both_pluv'])

        nb_both = df_mobi['ext_both_pluv'].sum()
        nb_struc = df_mobi['ext_struc_only_pluv'].sum()
        nb_cont = df_mobi['ext_cont_only_pluv'].sum()
        nb_tot = nb_both + nb_struc + nb_cont
        pc_both = 100 * nb_both / nb_tot
        pc_struc = 100 * nb_struc / nb_tot
        pc_cont = 100 * nb_cont / nb_tot

        print(f"Number of claims with both structure and content: {pc_both:.2f}%")
        print(f"Number of claims with structure only: {pc_struc:.2f}%")
        print(f"Number of claims with content only: {pc_cont:.2f}%")

    # Analyze the distribution of the number of contracts and claims per cell
    df_contracts = damages.exposure
    df_contracts = df_contracts[['mask_index', 'selection', 'cid']]

    # Average the number of annual contracts per location
    df_contracts = df_contracts.groupby('mask_index').mean()

    # Plot the histogram of the number of contracts per cell
    plt.figure()
    plt.title('Histogram of the number of contracts per cell')
    plt.xlabel('Number of contracts')
    plt.ylabel('Number of cells')
    plt.hist(df_contracts['selection'], bins=100)
    plt.yscale('log')
    plt.xlim(0, None)
    plt.tight_layout()
    plt.savefig(output_dir / 'histogram_contracts.png')
    plt.savefig(output_dir / 'histogram_contracts.pdf')

    df_claims = damages.claims
    df_claims = df_claims[['mask_index', 'selection']]

    # Sum the number of claims per location and divide by the number of years
    df_claims = df_claims.groupby('mask_index').sum()
    n_years = damages.year_end - damages.year_start + 1
    df_claims['selection'] = df_claims['selection'] / n_years

    # Plot the histogram of the number of claims per cell
    plt.figure()
    plt.title('Histogram of the number of annual claims per cell')
    plt.xlabel('Number of claims')
    plt.ylabel('Number of cells')
    plt.hist(df_claims['selection'], bins=50)
    plt.yscale('log')
    plt.xlim(0, None)
    plt.tight_layout()
    plt.savefig(output_dir / 'histogram_claims.png')
    plt.savefig(output_dir / 'histogram_claims.pdf')

    # Merge the contracts and claims dataframes on the index
    df_merged = df_contracts.merge(df_claims, left_index=True,
                                   right_index=True, how='left')

    # Rename the columns
    df_merged.columns = ['contracts', 'cid', 'claims']

    # Replace nan values with 0
    df_merged.fillna(0, inplace=True)

    # Plot the relationship between the number of contracts and the number of claims
    plt.figure()
    plt.title('Relationship between the number of contracts and the claims')
    plt.xlabel('Number of contracts (mean per cell)')
    plt.ylabel('Mean number of annual claims (sum per cell)')
    plt.scatter(df_merged['contracts'], df_merged['claims'],
                facecolors='none', edgecolors='k')
    plt.xscale('log')
    plt.yscale('log')
    plt.tight_layout()
    plt.savefig(output_dir / 'scatter_contracts_claims.png')
    plt.savefig(output_dir / 'scatter_contracts_claims.pdf')


if __name__ == '__main__':
    main()
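
The structure/content breakdown in the Mobiliar branch of the script relies on a per-row minimum: the overlap of structure and content claims is taken as the smaller of the two counts, and the remainder is attributed to one object type only. The sketch below replays that arithmetic on made-up counts in plain Python (no pandas). The function name `split_claims` and the numbers are assumptions for illustration, and each row is assumed to hold the counts for one cell on one date.

```python
def split_claims(rows):
    """Split (structure, content) claim counts into both / only shares in %.

    Mirrors the script's logic: per row, 'both' is min(structure, content),
    and what is left of each count is treated as 'only' that object type.
    """
    nb_both = sum(min(struc, cont) for struc, cont in rows)
    nb_struc = sum(struc - min(struc, cont) for struc, cont in rows)
    nb_cont = sum(cont - min(struc, cont) for struc, cont in rows)
    nb_tot = nb_both + nb_struc + nb_cont
    if nb_tot == 0:
        raise ValueError('No claims in the selection.')
    return tuple(100 * nb / nb_tot for nb in (nb_both, nb_struc, nb_cont))


# Made-up counts per row: (structure claims, content claims)
rows = [(2, 1), (0, 3), (1, 1), (4, 0)]
pc_both, pc_struc, pc_cont = split_claims(rows)
print(f'both: {pc_both:.2f}%, structure only: {pc_struc:.2f}%, '
      f'content only: {pc_cont:.2f}%')
```

Because the minimum is the largest overlap the two counts allow, the "both" share is an upper bound on how many structure and content claims in a row actually belong to the same event at the same object.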