CIA WFB download code updated

ClimateImpactLab · bolliger32 · Apr 8, 2022 · Apr 9, 2022
commit ccea0034900c86fa4bf439fdc9e2c08ac6965a52
diff --git a/notebooks/create-SLIIDERS-ECON/download-sliiders-econ-input-data.ipynb b/notebooks/create-SLIIDERS-ECON/download-sliiders-econ-input-data.ipynb
@@ -21,7 +21,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "import os\n",
+    "from os import remove as osrem\n",
     "import ssl\n",
     "import subprocess\n",
     "import tarfile\n",
@@ -341,6 +341,30 @@
     "file.close()"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "2d45bbf7-2569-4365-b810-cd81c075286d",
+   "metadata": {},
+   "source": [
+    "### CIA World Factbook, versions 2000 to 2020"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "2a86932d-877b-4788-a25e-1764ba958212",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "cia_download_url = \"https://www.cia.gov/the-world-factbook/about/archives/download\"\n",
+    "cia_files = [f\"factbook-{x}.zip\" for x in range(2000, 2021)]\n",
+    "\n",
+    "for i in tqdm(cia_files):\n",
+    "    cia_req = requests.get(\"/\".join([cia_download_url, i]))\n",
+    "    cia_zip = ZipFile(BytesIO(cia_req.content))\n",
+    "    cia_zip.extractall(str(sset.DIR_CIA_RAW))"
+   ]
+  },
   {
    "cell_type": "markdown",
    "id": "a9f0b8fa-7c93-4735-9caa-e07777d150a2",
@@ -469,7 +493,7 @@
     "    file.extractall(sset.DIR_LITPOP_RAW)\n",
     "\n",
     "# clear storage for the existing tar file\n",
-    "os.remove(regular_litpop)"
+    "osrem(regular_litpop)"
    ]
   },
   {
@@ -499,7 +523,7 @@
     "\n",
     "# unzipping\n",
     "outpath = sset.DIR_GEG15_RAW / zip_path.stem\n",
-    "os.makedirs(outpath, exist_ok=True)\n",
+    "outpath.mkdir(parents=True, exist_ok=True)\n",
     "subprocess.Popen([\"unzip\", f\"{zip_path}\", \"-d\", f\"{outpath}\"])"
    ]
   },
@@ -511,7 +535,7 @@
    "outputs": [],
    "source": [
     "# remove zip file (use after unzipping)\n",
-    "os.remove(zip_path)"
+    "osrem(zip_path)"
    ]
   },
   {
@@ -672,13 +696,6 @@
     "2. Once on the page, download the dataset through your MY AVISO+ account (click on `access via MY AVISO+` link and follow the instructions).\n",
     "3. After following the instructions, you will acquire the file `mdt_cnes_cls18_global.nc.gz`. Extract the file `mdt_cnes_cls18_global.nc` from the `.gz` file and save it as `sset.PATH_GEOG_MDT_RAW`.\n",
     "\n",
-    "### CIA World Factbook (compiled by Coleman [2020])\n",
-    "\n",
-    "1. Travel to this [link](https://github.com/iancoleman/cia_world_factbook_api) (credit to Coleman [2020]), and scroll down to the `readme.md`.\n",
-    "2. In the **Data** section of the `readme.md` file, there should be a link on \"Historical\"; click on this link to travel to a `mega.nz` website having `weekly_json.7z` file.\n",
-    "3. After checking that the filename to download is `weekly_json.7z`, download the said file by clicking on the \"Download\" button.\n",
-    "4. When download is successful, import `weekly_json.7z` to the preferred directory (`sset.DIR_YPK_RAW` in this implementation).\n",
-    "\n",
     "### HydroSHEDS\n",
     "1. Go to https://hydrosheds.org/downloads\n",
     "2. Download the \"standard\" level-0 HydroBASINS files for each continent (use the Dropbox link if available--this appears as \"NOTE: you may also download data from here.\" as of 8/16/21. Download the shapefiles into the directory defined in `sset.DIR_HYDROBASINS_RAW`"
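
Review note: the new CIA World Factbook cell extracts each response without checking whether the request succeeded or whether the payload is a valid zip, so one missing year would stop the loop with a `BadZipFile` error. Below is a minimal sketch of a more defensive version of the same loop. The function names `fetch_bytes` and `download_factbook_archives` are hypothetical, and the sketch uses the standard library's `urllib` in place of `requests` only so that it has no third-party dependencies. Whether every yearly archive is still served at the URL used in the notebook is not verified here.

```python
from io import BytesIO
from pathlib import Path
from typing import Callable, Iterable, List
from urllib.request import urlopen
from zipfile import BadZipFile, ZipFile


def fetch_bytes(url: str, timeout: float = 60.0) -> bytes:
    """Return the body of `url`; urlopen raises an OSError subclass on HTTP errors."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read()


def download_factbook_archives(
    base_url: str,
    years: Iterable[int],
    out_dir: Path,
    fetch: Callable[[str], bytes] = fetch_bytes,
) -> List[str]:
    """Download and extract factbook-<year>.zip for each year.

    Returns the archive names that could not be fetched or unzipped,
    instead of aborting on the first failure.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    failed = []
    for year in years:
        name = f"factbook-{year}.zip"
        try:
            payload = fetch(f"{base_url}/{name}")
            # Extract directly from memory, as the notebook cell does.
            with ZipFile(BytesIO(payload)) as archive:
                archive.extractall(out_dir)
        except (OSError, BadZipFile):
            # Network/HTTP errors are OSError subclasses; a non-zip body
            # (for example an HTML error page) raises BadZipFile.
            failed.append(name)
    return failed
```

In the notebook this could be called as `download_factbook_archives(cia_download_url, range(2000, 2021), sset.DIR_CIA_RAW)`, followed by a check that the returned list is empty. The `fetch` parameter is there only so that the loop can be exercised without network access.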