Welcome to the Azure Machine Learning Workshop! In this hands-on session, you’ll build and deploy machine learning models for practical geoscience scenarios using Azure Machine Learning's Designer, AutoML, and Notebooks.
This workshop provides step-by-step guidance for every exercise. Follow the instructions in order to get the most out of it.
Estimated Time to Complete: 1 to 2 hours
The primary objective of this project is to predict geothermal characteristics in Colombia, with a particular focus on estimating the geothermal gradient. By leveraging machine learning techniques, you’ll aim to predict the Apparent Geothermal Gradient (°C/Km), which is crucial for geothermal exploration.
This project uses a blend of geospatial data, geophysical information, and geothermal measurements. The data, found in normalized_data_minimax.csv, includes details such as well depths, temperatures, geological features, and proximity to volcanic structures.
The dataset used in this project is normalized, ensuring that all features have been scaled to a similar range, which is crucial for effective machine learning model training. Each column in the dataset is explained below:
- Latitude: Specifies the north-south position of a point on the Earth's surface in degrees.
- Longitude: Specifies the east-west position of a point on the Earth's surface in degrees.
- Elevation (m): The height of a point above sea level, measured in meters.
- Surface Temperature (°C): The temperature at the Earth's surface at a specific location, measured in degrees Celsius.
- Apparent Geothermal Gradient (°C/Km): The rate of temperature increase with depth beneath the Earth's surface, expressed in degrees Celsius per kilometer.
- Moho Depth (m): The depth to the Mohorovičić discontinuity, the boundary between the Earth's crust and the mantle, measured in meters.
- Magnetic Anomaly (nT): The deviation of the Earth's magnetic field from the expected value, measured in nanoteslas (nT), indicating variations in the magnetic properties of underlying rocks.
- Fault: Indicates the presence (1) or absence (0) of a fault at the location.
- Strike-slip Fault: A fault type where the motion is predominantly horizontal along the fault line.
- Reverse or Thrust Fault: A fault where one block moves upwards relative to another, typically associated with compressional forces.
- Lineament: Linear features on the Earth's surface representing underlying geological structures such as faults or fractures.
- Right-lateral Fault: A type of strike-slip fault where, to an observer standing on one side, the block on the opposite side appears to move to the right.
- Normal Fault: A fault where one block moves downward relative to another, usually associated with extensional forces.
- Active Fault: A fault that has recently been active and may be prone to future earthquakes.
- Curie Depth (Km): The depth at which magnetic minerals lose their permanent magnetism due to high temperatures, measured in kilometers.
- Vertical Gravity Gradient (E): The rate of change of the gravitational field with respect to height, measured in Eötvös units (E).
- Free Air Anomaly (mGal): The difference between measured gravity at a location and theoretical gravity, corrected for elevation, measured in milligals (mGal).
- Bouguer Anomaly (mGal): The difference between measured gravity and theoretical gravity after correcting for elevation and the mass of rocks above sea level, measured in milligals (mGal).
- Nearest Basement: The depth to the basement rock beneath sedimentary deposits.
- Nearest Volcano: The distance to the nearest volcano from the given location.
- Volcanic Domain: Classification of the area based on its volcanic activity or history.
- Volcanic Weight: A weighted score representing volcanic activity in the area, often used in risk assessment models.
- Gradient Weight: A weighted value representing the influence of the geothermal gradient in predictive models.
- Sample Weight: The weight assigned to each sample in a dataset, used in machine learning models to give varying importance to samples.
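As a rough illustration of what "scaled to a similar range" can mean, here is a minimal sketch of min-max scaling, which maps each value linearly into the interval from 0 to 1. That the dataset was prepared this way is an assumption based on the file name; the function `min_max_scale` and the elevation values are hypothetical and not taken from the repository.

```python
def min_max_scale(values):
    """Scale numbers linearly so the smallest maps to 0.0 and the largest to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread; map every entry to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical elevations in meters, not taken from the workshop dataset.
elevations_m = [120.0, 860.0, 2600.0, 1480.0]
scaled = min_max_scale(elevations_m)
print(scaled)
```

After scaling, a feature measured in meters and one measured in milligals occupy the same numeric range, so neither dominates purely because of its units.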
Start by cloning the repository to your local machine:
git clone https://github.com/GitHub-Nawatech-Lab/azureml-exercise.git
- Sign in to Azure Portal: Access the Azure Portal and log in with your credentials.
- Create an Azure Machine Learning Workspace: If you don’t have one already, follow the Azure Machine Learning documentation to set up a new workspace.
- Upload the Repository: Navigate to your workspace and upload the cloned repository.
- Navigate to the Datasets section within your Azure Machine Learning workspace.
- Click on + Create Dataset and select From local files.
- Upload the normalized_data_minimax.csv file located in the data folder of the cloned repository.
- Complete the dataset registration process by providing a name and description and by selecting the correct format for the data.
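Before uploading, it can help to confirm that the file parses as a comma-separated table with a header row and no empty cells. The sketch below is one possible check using only the standard library. The header text in `TARGET_COLUMN` is an assumption based on the column list above, and `check_csv` and the sample rows are hypothetical.

```python
import csv
import io

# Assumed header name for the prediction target; adjust it to match the real file.
TARGET_COLUMN = "Apparent Geothermal Gradient (°C/Km)"


def check_csv(text, target=TARGET_COLUMN):
    """Return a list of human-readable problems found in CSV text."""
    problems = []
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    if target not in header:
        problems.append(f"missing target column: {target}")
    for row in reader:
        line_no = reader.line_num  # physical line number of the row just read
        if None in row:
            # DictReader stores surplus values under the key None.
            problems.append(f"line {line_no}: more values than header columns")
        empty = [
            name for name, value in row.items()
            if name is not None and (value is None or value.strip() == "")
        ]
        if empty:
            problems.append(f"line {line_no}: empty value in {empty}")
    return problems


# Made-up sample with one missing Longitude value.
sample = (
    "Latitude,Longitude,Apparent Geothermal Gradient (°C/Km)\n"
    "0.42,0.31,0.55\n"
    "0.40,,0.61\n"
)
print(check_csv(sample))
```

To check the real file, read its contents as UTF-8 text and pass the string to `check_csv`; an empty list means no problems were found by this particular check.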
- Create a New Pipeline: Go to the Designer section in Azure Machine Learning Studio and initiate a new pipeline.
- Drag and Drop Modules: Utilize the drag-and-drop interface to add data input, data transformation, and machine learning modules.
- Configure Modules: Set up each module according to the specific requirements of the project.
- Run the Pipeline: After configuring the pipeline, execute it to train and evaluate your model.
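A training pipeline of this kind usually holds back part of the data so the model is evaluated on rows it never saw during training. The workshop does not specify the split, so the following is only a sketch of the idea in plain Python; the function name, the 80/20 ratio, and the seed are all assumptions.

```python
import random


def train_test_split_rows(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows with a fixed seed and split off a test set."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(rows)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(10))  # stand-ins for dataset rows
train, test = train_test_split_rows(rows)
print(len(train), len(test))
```

Fixing the seed makes the split reproducible, which keeps results comparable when the pipeline is rerun with different module settings.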
- Create a New AutoML Experiment: In Azure Machine Learning Studio, navigate to the Automated ML section and start a new experiment.
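- Select Dataset: Choose the normalized_data_minimax.csv dataset you registered earlier.
- Configure Experiment: Set the target column to Apparent Geothermal Gradient (°C/Km), choose regression as the task type because the target is a continuous value, and adjust other settings as necessary.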
- Run the Experiment: Launch the AutoML experiment to automatically train and evaluate multiple models.
- Install Necessary Libraries: Open the terminal in Azure Machine Learning Studio and run the following command:
pip install -r requirements.txt
- Open the Notebook: In Azure Machine Learning Studio, open the Model_V4.ipynb notebook.
- Run Cells: Execute each cell to preprocess data, train the model, and evaluate the results.
- Analyze Results: Review the outputs and visualizations to assess the model's performance.
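When reviewing results for a regression target such as the geothermal gradient, three commonly reported metrics are mean absolute error (MAE), root mean squared error (RMSE), and the coefficient of determination (R²). The sketch below computes them from scratch so their definitions are explicit. It is not the notebook's own evaluation code, and the values are made up for illustration.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MAE, RMSE, and R² for paired true and predicted values."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)  # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(ss_res / n),
        # R² is undefined when the true values have no variance.
        "r2": 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan"),
    }


# Made-up normalized gradient values, for illustration only.
y_true = [0.2, 0.4, 0.6, 0.8]
y_pred = [0.25, 0.35, 0.65, 0.75]
metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

Because the dataset is normalized, MAE and RMSE are expressed in normalized units rather than °C/Km; R² is unitless and can be compared across models directly.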