Commit 61f3a8e (parent: 7dc7138): 14 changed files with 2,570 additions and 0 deletions.
LICENSE
MIT License

Copyright (c) 2024 Shiqin Zeng

Permission is hereby granted, free of charge, to any person obtaining a copy
of this pre-trained model and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

------------------------------------------------------------------------
The following components of this software are licensed under the MIT License:

- [3D unet]

MIT License

Copyright (c) 2018 Adrian Wolny

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
README.md
# Image Impeccable: Journey to Clarity

Official PyTorch implementation.<br>
3D Unet for 3D Seismic Data Denoising<br>
## Requirements

Python libraries: See [environment.yaml](environment.yaml) for library dependencies. The conda environment can be set up using these commands:

```bash
conda env create -f environment.yaml
conda activate seismic_Denoising
```
## Data Preparation

Open [test.ipynb](test.ipynb) and follow the instructions to download the dataset and convert it to ``.h5`` format:

```bash
!python data_prep/data_download.py
!python data_prep/data_format.py
```
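The conversion itself is handled by `data_prep/data_format.py`. As a rough sketch of what writing one noisy/clean volume pair into an HDF5 file could look like (the file names, dataset keys, and compression settings below are assumptions for illustration, not the script's actual interface):

```python
import h5py
import numpy as np

def write_pair_to_h5(noisy_path, clean_path, out_path):
    """Store one noisy/denoised volume pair in a single HDF5 file."""
    noisy = np.load(noisy_path)   # expected shape (1259, 300, 300)
    clean = np.load(clean_path)
    with h5py.File(out_path, "w") as f:
        f.create_dataset("noisy", data=noisy, compression="gzip")
        f.create_dataset("clean", data=clean, compression="gzip")

# Hypothetical usage:
# write_pair_to_h5("seismic_w_noise_vol_1.npy", "seismic_clean_vol_1.npy",
#                  "original_image-impeccable-train-data-part1.h5")
```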
## Train 3D Unet Models

Our training script is derived from [Deep Learning Semantic Segmentation for High-Resolution Medical Volumes](https://ieeexplore.ieee.org/abstract/document/9425041) and implemented based on [Accurate and Versatile 3D Segmentation of Plant Tissues at Cellular Resolution](https://doi.org/10.7554/eLife.57613). The training loss includes an edge-loss component based on the Laplacian operator, following the paper [Multi-Stage Progressive Image Restoration](https://doi.org/10.48550/arXiv.2102.02808).
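The edge loss is not spelled out in this README; a minimal sketch of a Laplacian-based edge term in PyTorch, in the spirit of the referenced paper, might look like the following (the kernel, Charbonnier epsilon, and weighting are assumptions, and the actual implementation in this repository may differ):

```python
import torch
import torch.nn.functional as F

def laplacian_kernel_3d(device, dtype):
    """6-neighbour 3D Laplacian stencil (an assumed discretization)."""
    k = torch.zeros(1, 1, 3, 3, 3, device=device, dtype=dtype)
    k[0, 0, 1, 1, 1] = -6.0
    k[0, 0, 0, 1, 1] = k[0, 0, 2, 1, 1] = 1.0
    k[0, 0, 1, 0, 1] = k[0, 0, 1, 2, 1] = 1.0
    k[0, 0, 1, 1, 0] = k[0, 0, 1, 1, 2] = 1.0
    return k

def edge_loss(pred, target, eps=1e-3):
    """Charbonnier loss between Laplacian edge maps of prediction and target.

    pred and target are (N, 1, D, H, W) patches from the seismic volumes.
    """
    k = laplacian_kernel_3d(pred.device, pred.dtype)
    edge_pred = F.conv3d(pred, k, padding=1)
    edge_true = F.conv3d(target, k, padding=1)
    return torch.sqrt((edge_pred - edge_true) ** 2 + eps ** 2).mean()

def total_loss(pred, target, edge_weight=0.05):
    # L1 reconstruction term plus a weighted edge term (the weight is illustrative)
    return F.l1_loss(pred, target) + edge_weight * edge_loss(pred, target)
```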
To test the code, we train for one epoch on a single HDF5 file (`num_epochs = 1`, `start = 1`, `end = 1`).
The `start` and `end` values correspond to the numbered dataset files.
For example, `start = 1` and `end = 2` makes the script use `original_image-impeccable-train-data-part1.h5` and `original_image-impeccable-train-data-part2.h5`; setting `start = 1` and `end = 17` uses all training data from `original_image-impeccable-train-data-part1.h5` through `original_image-impeccable-train-data-part17.h5`.
You can adjust parameters such as `batch_size`, `num_epochs`, `start`, and `end` in [config.yaml](scripts/config.yaml); an illustrative snippet is shown after the command below. Once all of the `.h5` files have been downloaded, set `start` and `end` accordingly and train on the full dataset by running the training script:

```bash
!python scripts/train_model.py
```
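For reference, the relevant fields in [config.yaml](scripts/config.yaml) might look like this (the values shown are illustrative, and the file may contain additional keys):

```yaml
batch_size: 2    # illustrative value
num_epochs: 1
start: 1         # first file: original_image-impeccable-train-data-part1.h5
end: 17          # last file:  original_image-impeccable-train-data-part17.h5
```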
## Test the pretrained model

The pre-trained model is in the [pretrained_model](pretrained_model) directory. See [test.ipynb](test.ipynb) for details.
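As a minimal sketch of loading a checkpoint for inference (the import path, constructor arguments, checkpoint file name, and state-dict layout below are all placeholders; test.ipynb shows the actual procedure):

```python
import torch

# Placeholder import: the actual model definition lives in this repository.
from scripts.model import UNet3D  # hypothetical module path

model = UNet3D(in_channels=1, out_channels=1)  # hypothetical constructor arguments
state = torch.load("pretrained_model/checkpoint.pth", map_location="cpu")  # hypothetical file name
model.load_state_dict(state)
model.eval()

with torch.no_grad():
    noisy_patch = torch.randn(1, 1, 64, 64, 64)  # stand-in for a seismic patch
    denoised_patch = model(noisy_patch)
```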
geoscience/image-impeccable/Shiqin Zeng/data_prep/data_download.py (92 additions)
# -*- coding: utf-8 -*-
"""image_impeccable_starter_notebook.ipynb

Automatically generated by Colab.

Original file is located at
    https://colab.research.google.com/github/thinkonward/challenges/blob/main/geoscience/image-impeccable/image-impeccable-starter-notebook/image_impeccable_starter_notebook.ipynb

# Image Impeccable: Journey to Clarity - Starter Notebook

Welcome to the Image Impeccable Challenge. Your mission, if you choose to accept it, is to build a deep learning model that ingests 3D seismic volumes with noise and returns a 3D volume free of noise. Please see the [challenge page](https://thinkonward.com/app/c/challenges) for more details about the rules and requirements.

### Supplied Materials:
* Starter Notebook
* Training data: 250 noisy and denoised synthetic 3D seismic volume pairs
* Test data: 15 noisy 3D seismic volumes
* `utils.py` script containing helpful functions (optional)
* `requirements.txt` for all required packages

### Imports
"""
import os
import subprocess
# import torch
import zipfile

# Create directories where the data is stored
directories = ["training_data", "test_data", "submission_files"]
for directory in directories:
    if not os.path.exists(directory):
        os.makedirs(directory)

# device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# List of URLs for training data
training_urls = [
    "https://xeek-public-287031953319-eb80.s3.amazonaws.com/image-impeccable/image-impeccable-train-data-part1.zip",
    "https://xeek-public-287031953319-eb80.s3.amazonaws.com/image-impeccable/image-impeccable-train-data-part2.zip",
    "https://xeek-public-287031953319-eb80.s3.amazonaws.com/image-impeccable/image-impeccable-train-data-part3.zip",
    "https://xeek-public-287031953319-eb80.s3.amazonaws.com/image-impeccable/image-impeccable-train-data-part4.zip",
    "https://xeek-public-287031953319-eb80.s3.amazonaws.com/image-impeccable/image-impeccable-train-data-part5.zip",
    "https://xeek-public-287031953319-eb80.s3.amazonaws.com/image-impeccable/image-impeccable-train-data-part6.zip",
    "https://xeek-public-287031953319-eb80.s3.amazonaws.com/image-impeccable/image-impeccable-train-data-part7.zip",
    "https://xeek-public-287031953319-eb80.s3.amazonaws.com/image-impeccable/image-impeccable-train-data-part8.zip",
    "https://xeek-public-287031953319-eb80.s3.amazonaws.com/image-impeccable/image-impeccable-train-data-part9.zip",
    "https://xeek-public-287031953319-eb80.s3.amazonaws.com/image-impeccable/image-impeccable-train-data-part10.zip",
    "https://xeek-public-287031953319-eb80.s3.amazonaws.com/image-impeccable/image-impeccable-train-data-part11.zip",
    "https://xeek-public-287031953319-eb80.s3.amazonaws.com/image-impeccable/image-impeccable-train-data-part12.zip",
    "https://xeek-public-287031953319-eb80.s3.amazonaws.com/image-impeccable/image-impeccable-train-data-part13.zip",
    "https://xeek-public-287031953319-eb80.s3.amazonaws.com/image-impeccable/image-impeccable-train-data-part14.zip",
    "https://xeek-public-287031953319-eb80.s3.amazonaws.com/image-impeccable/image-impeccable-train-data-part15.zip",
    "https://xeek-public-287031953319-eb80.s3.amazonaws.com/image-impeccable/image-impeccable-train-data-part16.zip",
    "https://xeek-public-287031953319-eb80.s3.amazonaws.com/image-impeccable/image-impeccable-train-data-part17.zip"
]

# List of URLs for test data
test_urls = [
    "https://xeek-public-287031953319-eb80.s3.amazonaws.com/image-impeccable/image-impeccable-test-data.zip"
]
# Function to download files using wget
def download_files(urls, output_directory):
    for url in urls:
        subprocess.run(["wget", url, "-P", output_directory], check=True)

# Download training data
download_files(training_urls, "./training_data/")

# Download test data
download_files(test_urls, "./test_data/")
# Unzip downloaded files into corresponding directories
def unzip_files(directory):
    for filename in os.listdir(directory):
        if filename.endswith(".zip"):
            file_path = os.path.join(directory, filename)
            # Create a directory with the name of the zip file (without the .zip extension)
            extract_dir = os.path.join(directory, os.path.splitext(filename)[0])
            os.makedirs(extract_dir, exist_ok=True)
            # Extract all contents into the corresponding directory
            with zipfile.ZipFile(file_path, 'r') as zip_ref:
                zip_ref.extractall(extract_dir)
            os.remove(file_path)  # Optionally remove the zip file after extraction

# Unzip training and test data
unzip_files("./training_data/")
unzip_files("./test_data/")
"""You have been provided with 250 paired synthetic seismic datasets. There are 500 total volumes, 250 volumes are the noisy seismic, and 250 volumes are the target denoised seismic. The synthetic data is delivered as Numpy arrays with a shape of `(1259,300,300)`. You are free to use any inline or crosslines from the volumes that you choose for training. The output of your model must be the same shaped volumes as those provided, `(1259,300,300)`. The test dataset will contain 15 noisy seismic volumes of the same shape as the training dataset. | ||
Enough reading, go ahead and load up some seismic data with the cells below and take a look at what we are talking about! | ||
""" |