Gravitational Wave Data Exploration: A Practical Training in Programming and Analysis

Photonnnn/GWData-Bootcamp

 
 


引力波数据探索:编程与分析实战训练营

Gravitational Wave Data Exploration: A Practical Training in Programming and Analysis

The course recordings are now available on Bilibili.

Welcome to the GitHub repository for the Gravitational Wave Data Exploration Bootcamp Series! The course is designed to build a solid foundation in programming, hands-on operational skills, and data-driven modeling, all centered on gravitational wave data analysis and research.

Training Objectives

  • Equip participants with robust programming and operational skills, and foundational training in data-driven modeling, focusing on gravitational wave data analysis and related research areas.
  • Note: The course is conducted entirely in Mandarin Chinese to cater to a wide range of Chinese-speaking students and researchers.
  • Discuss the common research methodologies combining gravitational wave data processing with AI technologies, with hands-on examples and projects for practical understanding and mastery.
  • Analyze cutting-edge deep learning models and apply them to real-world gravitational wave data analysis problems through specific case studies.

Target Audience

  • Undergraduate and graduate students interested in data analysis and algorithm development, especially those focusing on gravitational wave data processing and related research.
  • Undergraduates with a basic programming background who want to improve their data analysis skills or who are interested in gravitational wave data processing.
  • Future professionals aspiring to work in space-based gravitational wave detection projects and related research fields.

Course Design Philosophy

  • Drawing from past teaching experiences and identified knowledge gaps in student research projects, the course introduces relevant concepts and common methods to ensure comprehensive understanding and application in research.
  • Sessions are held weekly or bi-weekly and last about 3 hours each, combining in-person teaching with online participation via Tencent Meeting to keep the course interactive and practical.
  • The curriculum is expected to be offered once per semester or annually, with continual updates and enrichment based on student feedback and research demands.

Course Outline / Schedule

  • Part Zero: Motivational Introduction

    Description
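    - Course motivation and participant composition
    - Instructor introduction
    - Knowledge framework related to this course
        - Gravitational wave data analysis
        - Course outline
        - What this course is, and what it is not
    - Learning approach and teaching team
    - Assessment rules and project assignments
    - The path to self-actualization
        - How to self-study
        - How to ask questions
    - Q & A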
    
  • Part One: Programming Development Environment and Workflow

    Description
    - Linux Commands and Shell Scripting
    - Git Version Control (GitHub / GitLab)
    - SSH Remote Server Access (Shell / VSCode)
    - Containerization with Docker
    - Hands-On: Setting up Python / Jupyter Development Environment
    - Hands-On: Compiling LALsuite / LISAcode Source Code
    
  • Tech Talk: It's all about data (Guest Lecture by Xinyao Tian)

    Description
    - The origin of data
    - What is data?
    - The evolution of modern data technologies
    - Current mainstream data technologies
      - Relational databases (RDBMS)
      - Non-relational databases (NoSQL, "Not only SQL")
      - Big Data
      - Data Warehouse
      - Stream Processing
      - Data Lake
      - Data Lakehouse
    - Reflection: understanding the world from a data perspective
    - Recommended reading
    - Q & A
    
  • Part Two: Python-Based Data Analysis Fundamentals

    Description
    - Introduction to Python Programming
    - Algorithms with Numpy / Pandas / Scipy
    - Hands-On: Exploratory Data Analysis of GW Event Catalog / Glitch Data
    - Hands-On: Matched Filtering for GW150914 Data
    - Data Visualization in Python: Theory and Practice
    - Hands-On: Reproducing Figures from GWTC Papers
    
  • Sci Talk: Bayesian inference for gravitational-wave science (Guest Lecture by Junjie Zhao)

    Description
      - Brief introduction to gravitational waves
      - Part I: Bayesian inference
          - Definition of "probability"
          - Rethinking the interpretations of probability
              - Frequentist statistics
              - Bayesian statistics
          - Bayes' theorem
              - Application to gravitational wave detection
          - Bayesian inference framework
              - Parameter estimation for gravitational-wave data
              - Model selection for gravitational-wave data
      - Q & A
      - Part II: Bayesian computation
          - Markov Chain Monte Carlo (MCMC)
              - Hands-on: a tiny MCMC example
          - Nested sampling
              - Hands-on: a tiny nested-sampling example
      - Part III: All in gravitational-wave data
          - Using Bilby and Parallel Bilby in GW data analysis
          - The complete data analysis pipeline
      - The AMAZING Thomas Bayes ("Bayes' theorem for this wonderful world")
      - Q & A
    
  • Part Three: Basics of Machine Learning

    Description
    - Overview of Artificial Intelligence
    - Definitions, Objectives, and Types of Machine Learning
    - Machine Learning Project Development and Preparation
    - Hands-On: Clustering Analysis of LIGO's Glitch Data
    
    • Date: 2023/12/22 | Video recording | Slides: PDF or online
    • Homework
      • Implement a classification model for credit scoring using the sklearn library in Python.
    • Date: 2023/12/24 | Video recording | Slides: PDF or online
      • Use GravitySpy glitch metadata to build and train a classification model. (Project)
      • Train a clustering model using the time-frequency image information from GravitySpy glitches. (Project)
    • Homework
      • Evaluate the model and tune its hyperparameters on a credit scoring dataset.
  • Part Four: Introduction to Deep Learning

    Description
    - Overview of Deep Learning Technologies
    - Fundamentals of Artificial Neural Networks (ANN)
    - Convolutional Neural Networks (CNN)
    - Hands-On: Identifying Gravitational Waves from Binary Black Hole Systems using CNN
    - Frontiers of Gravitational Wave Data Analysis and AI
    
  • Tech Talk: AI Revolution: From Concept to GPT Breakthroughs (Guest Lecture by Minquan Gao)

    Description
      1. Why AI Was Proposed:
          - Exploring the historical context and reasoning behind the emergence of AI.
          - Initial challenges and needs that AI aimed to address.
      2. Earliest Form of AI and Solutions:
          - Description of the first AI systems, such as simple computational machines.
          - Early AI applications and the problems they solved.
      3. Similarities between AI and Physics Methodologies:
          - Comparing the theoretical frameworks and approaches used in both fields.
          - Identifying shared principles and methods.
      4. From Symbolic Systems to Machine Learning:
          - Evolution of AI from early symbolic and numeric systems.
          - The transition to probabilistic and statistical methods.
          - The development of machine learning technologies.
      5. Principles of Deep Learning:
          - Understanding the core concepts behind deep learning.
          - The architecture of neural networks and their functionality.
      6. Breakthroughs Brought by Deep Learning:
          - Identifying key advancements and innovations due to deep learning.
          - Impact of deep learning on various AI applications.
      7. Typical Deep Learning Scenarios:
          - Examples of deep learning applications in real-world scenarios.
          - Discussion of its effectiveness and adaptability.
      8. Pre-trained Models and Large Models:
          - The role and significance of pre-trained models in AI.
          - Characteristics and implications of large-scale AI models.
      9. Principles of GPT:
          - Explaining the foundational concepts of Generative Pre-trained Transformers.
          - Discussing its applications and impact.
      10. Breakthroughs in AIGC (AI Generated Content):
          - Overview of advancements in AI-generated content.
          - Examples and implications of these breakthroughs.
      11. Current Challenges in AI:
          - Discussing ethical, technical, and practical problems in AI.
          - Examination of ongoing debates and concerns in the field.
      12. Frontiers of AI Research:
          - Exploring cutting-edge research and future directions in AI.
          - Innovations and potential developments on the horizon.
    
  • Final Part: End-of-Camp Ceremony and Wrap-Up

    Description
      - Acknowledgements
      - Training Camp Course Review and Summary
      - Homework Completion & Competition Rankings
      - Awards Ceremony
      - Video Recording Sharing + Bilibili Channel Launch + Remember to Star the Course
      - Welcoming More Feedback and Suggestions
    

Kaggle Data Science Competition (Hackathon)

Homepage: https://www.kaggle.com/competitions/2023-gwdata-bootcamp/

Overview

Objective

  • The objective of this competition is to develop a model that can accurately identify gravitational wave signals from the provided dataset.
  • You will be given a dataset containing a mix of noise and gravitational wave signals. Your task is to develop a model that can accurately distinguish between the two.
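
As a rough illustration of the "signal versus noise" task, the sketch below classifies toy time series by correlating them with a known template, which is a crude matched-filter statistic. Everything here is an assumption for illustration: the synthetic sinusoid, the constants `N`, `CYCLES`, and `AMP`, and the helper names are hypothetical and are not taken from the competition dataset or the baseline notebook.

```python
import math
import random

# All constants below are illustrative assumptions, not competition settings.
N = 256        # samples per toy time series
CYCLES = 8     # sinusoid cycles per series
AMP = 0.5      # signal amplitude relative to unit-variance noise

# Known waveform used both to inject the toy signal and to search for it.
TEMPLATE = [math.sin(2 * math.pi * CYCLES * i / N) for i in range(N)]
TEMPLATE_NORM = math.sqrt(sum(t * t for t in TEMPLATE))


def make_sample(has_signal, rng):
    """Return unit-variance Gaussian noise, plus the scaled template if has_signal."""
    series = [rng.gauss(0.0, 1.0) for _ in range(N)]
    if has_signal:
        series = [s + AMP * t for s, t in zip(series, TEMPLATE)]
    return series


def template_score(series):
    """Correlation with the normalized template; about N(0, 1) for pure noise."""
    return sum(s * t for s, t in zip(series, TEMPLATE)) / TEMPLATE_NORM


def classify(series, threshold):
    """Label a series 1 (signal present) if its score exceeds the threshold, else 0."""
    return 1 if template_score(series) > threshold else 0


rng = random.Random(0)                      # fixed seed for repeatability
labels = [i % 2 for i in range(200)]        # alternate noise-only / signal
samples = [make_sample(label == 1, rng) for label in labels]

# Threshold halfway between the expected noise score (0) and signal score.
threshold = 0.5 * AMP * TEMPLATE_NORM
predictions = [classify(s, threshold) for s in samples]
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
print(f"toy accuracy: {accuracy:.3f}")
```

Real detector data has colored noise and unknown signal parameters, so a fixed template and threshold like this would not transfer directly; the sketch only shows the shape of the classification problem.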

Timeline

  • This competition will start at 10:00 PM (Beijing Time) on December 29, 2023, and end at 11:59 PM (Beijing Time) on January 6, 2024.

  • Please make sure to submit your solutions before the deadline.

  • Good luck and may the best team win!

Files

Anyway, just check the baseline notebook for everything!

Hall of Fame

Homework

  • Students whose committed assignments earned a total score of 6 or 7 are listed below; their submissions show the complete assignment results.

| Total Score | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Frequency | 4 | 5 | 6 | 10 | 7 | 23 | 8 |
| Top Percentage Ranking | 100.00% | 93.65% | 85.71% | 76.19% | 60.32% | 49.21% | 12.70% |
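
The "Top Percentage Ranking" row is consistent with the share of students who scored at least the given total. The short sketch below reproduces it from the "Frequency" row; the function name `top_percentages` is hypothetical, and reading the row this way is an inference from the numbers rather than a stated definition.

```python
def top_percentages(frequencies):
    """For each score (lowest first), return the percentage of students
    whose total is at least that score, rounded to two decimals."""
    total = sum(frequencies)
    remaining = total          # students with a total >= the current score
    result = []
    for count in frequencies:
        result.append(round(100.0 * remaining / total, 2))
        remaining -= count
    return result


# Counts for total scores 1..7, copied from the Frequency row.
frequencies = [4, 5, 6, 10, 7, 23, 8]
for score, pct in enumerate(top_percentages(frequencies), start=1):
    print(f"score {score}: top {pct:.2f}%")
```

For example, 8 of the 63 students reached a total of 7, which gives 8 / 63 ≈ 12.70%.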

  • Total score 7 ✅ (8 students): 高远坤, 郭印达, 刘守潘, 张徐蔚, 汤丰杰, 苏鸿, 李炳辰, 郝赵
  • Total score 6 ✅ (23 students): 黄震洋, 范钧, 邹增慧, 蒙晓锋, 刘世睿, 孟德双, 董玉豪, 王尊, 薛亚东, 刘冉, 单磊磊, 邹靓, 沈萍, 韩佩佳, 吉祥, 张嘉宝, 潘洋, 周子力, 邱智翀, 李倾城, 何禹成, 王天龙, 汪一凡

Leaderboard of Kaggle Competition

| Rank | Team | Members | Score |
| --- | --- | --- | --- |
| 1 | XAO | 黄震洋, 张徐蔚 | 0.86173 |
| 2 | UCAS Li Jiahao | 李嘉豪 | 0.86160 |
| 3 | UCAS_212x2 | 刘洋毓 | 0.86157 |
| 4 | sophiainshao | 沈萍 | 0.86145 |
| 5 | Haihao SHI | 史海浩 | 0.85954 |
| 6 | Yinda Guo | 郭印达 | 0.85832 |
| 7 | deslenlir | 温怡蓉 | 0.85753 |
| 8 | MengXiaofeng-UCAS | 蒙晓锋 | 0.85188 |
| 9 | 1500!!! | 刘冉, 刘世睿, 王天龙 | 0.84868 |
| 10 | Qinglin Yan | 王霆澜, 闫庆琳 | 0.84530 |
| 11 | Zhiqing Zhu | 朱智清 | 0.84527 |
| 12 | knnbenn | | 0.84023 |
| 13 | HIAS | 苏鸿, 张景瑞, 汤丰杰 | 0.84023 |
| 14 | HanPeijia | 韩佩佳 | 0.84003 |
| 15 | Zenghui Zou | 邹增慧, 李倾城 | 0.83883 |
| 16 | Yuanhao Zhang | 张渊皞 | 0.83317 |
| 17 | B4rRY_G | 郭意扬, 赖景祺 | 0.82729 |
| 18 | Shao dong zhao | 赵少东 | 0.82723 |
| 19 | Sparkle79 | | 0.82595 |
| 20 | Shoupan Liu | 刘守潘 | 0.82526 |
| 21 | Capoo Cat | 孙文博 | 0.82387 |
| 22 | Tian_Jun | 田军 | 0.82204 |
| 23 | Zhao_Hao | 郝赵 | 0.82047 |
| 24 | douking | 王尊 | 0.81871 |
| 25 | JunFan | 范钧 | 0.8141 |
| 26 | Phi267 | 秦戈宇 | 0.81304 |
| 27 | tastonlyjust | | 0.81038 |
| 28 | junda zhou | 周均达 | 0.80846 |
| 29 | SCU_CTP | 曹旭, 高鸿飞, 李志威 | 0.80705 |
| 30 | DESHUANGMeng | 孟德双 | 0.80462 |

Getting Started

Welcome to the course project! To get started with your programming assignments, you'll need to set up your workspace. Here's a step-by-step guide to help you through the process.

Step 1: Set Up Your GitHub Account and Fork the Repository
  1. Create a GitHub Account: If you don't already have a GitHub account, go to GitHub and sign up.
  2. Fork the Course Repository:
    • Navigate to the course's GitHub repository: GWData-Bootcamp.
    • Click on the Fork button at the top right of the page.
    • In the fork settings, make sure to uncheck the option 'Copy the main branch only' so that your fork also includes the homework branch.
  3. Clone the Forked Repository:
    • Open your terminal or Git Bash.
    • Clone the forked repository to your local machine using the following command:
      git clone git@github.com:<YourGitHubUsername>/GWData-Bootcamp.git
    • Replace <YourGitHubUsername> with your actual GitHub username.
Step 2: Set Up Your Local Workspace
  1. Switch to the homework Branch:
    • Navigate to your cloned repository's directory:
      cd GWData-Bootcamp
    • Switch to the homework branch using:
      git switch homework
  2. Create Your Personal Homework Directory:
    • Inside the GWData-Bootcamp directory, create a new directory path for your homework submissions:
      mkdir -p 2023/homework/<YourName>
      • Replace <YourName> with your name or a unique identifier.
Step 3: Submitting Your Homework
  1. Complete Your Assignments:
    • Add your completed assignments to your personal homework directory that you created in the previous step.
    • The assignments should be named as python_submit.txt, numpy_submit.txt, or pandas_submit.txt depending on the type of the assignment.
  2. Push Your Changes:
    • Stage and commit your changes. For example:
      git add .
      git commit -m "Add homework for <SpecificHomework>"
    • Push your homework branch to your forked repository:
      git push origin homework
  3. Create a Pull Request:
    • Go to your forked repository on GitHub.
    • Switch to the homework branch.
    • Click on New Pull Request.
    • Ensure the base repository is set to the original GWData-Bootcamp repository and the base branch is set to homework.
    • Complete the PR form and submit.
    • The GitHub Actions workflow will automatically check your submission (Homework) and compare it with the solution. If your submission passes the check, a merge request will be initiated. Please note that only the repository owners have the authority to merge the request.
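
Before pushing, it can help to confirm locally that the homework directory and file names match the layout described in Steps 2 and 3. The sketch below is a hypothetical local helper, not the repository's GitHub Actions check: the function name `check_homework_dir` is an assumption, and it only inspects file names, not assignment contents.

```python
from pathlib import Path

# File names listed in Step 3 of this guide.
EXPECTED_NAMES = {"python_submit.txt", "numpy_submit.txt", "pandas_submit.txt"}


def check_homework_dir(repo_root, your_name):
    """Inspect <repo_root>/2023/homework/<your_name>.

    Returns a pair (found, unexpected): sorted lists of the expected
    submission files that are present and of any other files in the directory.
    Raises FileNotFoundError if the directory does not exist.
    """
    homework_dir = Path(repo_root) / "2023" / "homework" / your_name
    if not homework_dir.is_dir():
        raise FileNotFoundError(f"Missing homework directory: {homework_dir}")
    names = {p.name for p in homework_dir.iterdir() if p.is_file()}
    found = sorted(names & EXPECTED_NAMES)
    unexpected = sorted(names - EXPECTED_NAMES)
    return found, unexpected
```

Run from the root of your clone, `check_homework_dir(".", "<YourName>")` returns the submission files it recognizes and any others, so a misspelled file name shows up in the second list before the automated check runs on the pull request.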

Important Notes

  • Do Not Modify Other Students' Work: It's crucial that you do not make any changes to other students' homework directories and contents.
  • Regular Updates: Keep your fork synchronized with the main repository to get the latest updates and assignments.
  • Automated Checks: The GitHub Actions workflow will automatically check your submission when you create a pull request. Make sure your submission passes the check before you submit it.
  • Happy Coding! 🚀👩‍💻👨‍💻

Staff

This class is co-taught by He Wang and several colleagues, including guest lecturer Junjie Zhao and industry experts Xinyao Tian and Minquan Gao. Further names will be announced as they join.

Questions

For any inquiries regarding the course, please email us at 📧 [email protected].

We look forward to your participation and contribution to this exciting field of study!

Contributing

We welcome contributions to enhance course materials. Please fork the repository, make your changes, and submit a pull request.

Collaborating Institutions

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • Contributions from the Gravitational Wave Open Science Center (gwosc.org).
  • Educational resources and datasets from renowned institutions and projects in the field.
