update

yunwei37 committed Sep 18, 2024
1 parent bb901f9 commit b51b617

Showing 18 changed files with 127,939 additions and 420 deletions.

139 changes: 87 additions & 52 deletions README.md
@@ -1,34 +1,42 @@
# Code-Survey: Uncovering Insights in Complex Systems with LLM

- Do we really know how complex systems like the Linux kernel work?
- How can we understand the high-level design choices and evolution of a super complex system, like the Linux kernel?

**Code-Survey** is `the first step` toward changing that.

> Imagine you could ask every entry-level kernel developer, or every graduate student studying the kernel, to complete a survey and answer questions about every commit. What could you find in the results?

Code-Survey helps you `explore` and `analyze` the world's largest and most intricate codebases, like the Linux kernel. By carefully **designing a survey** and **transforming** `unstructured data` such as commits and mailing lists into organized, `structured and easy-to-analyze data`, you can then run `quantitative` analysis on it. Code-Survey makes it simpler to uncover valuable insights in modern complex software systems.

Code-Survey is the first step toward bridging the gap between high-level `design`, `implementation`, `maintenance`, `reliability`, and `security` using LLMs, making complex systems more accessible.

Unlike other approaches:

- No human could do this before, but AI can.
- No chatbots, RAG document search, or code generation: **stop the stupid AI!**
- Just data such as git messages and emails: design a survey and run it with a few hundred lines of Python. Apply it to other projects or subsystems by designing your own code-survey!

**Let's do Code-Survey!**

## What can `Code-Survey` help answer?

- How do new feature introductions in a component affect software stability and performance over time?
- What identifiable phases exist in a component's lifecycle? Is it new, mature, refactored, or deprecated?
- What dependencies have emerged between features and components, and how do they affect software evolution?
- How does bug frequency correlate with feature complexity?
- What trade-offs were considered in design decisions, and how do they manifest in the system's implementation?
- How does collaboration between developers affect the consistency and coherence of feature development?
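
Several of these questions reduce to simple dataframe operations once the survey answers are in CSV form. As a rough sketch of the first question, you can track the ratio of bug-fix commits to new-feature commits over time; `commit_classification` is a real survey column (see below), while the date column name here is only an illustrative assumption:

```python
# Sketch: bug-fix vs. new-feature commits over time, as a rough proxy
# for how stability tracks feature introductions.
# 'commit_date' is a hypothetical column name; 'commit_classification'
# comes from the commit survey in this repository.
import pandas as pd

df = pd.read_csv('data/commit_survey.csv')
df['date'] = pd.to_datetime(df['commit_date'])  # assumed column name
df = df.set_index('date').sort_index()

is_fix = df['commit_classification'].str.contains('bug fix', case=False, na=False)
is_feature = df['commit_classification'].str.contains('new feature', case=False, na=False)

quarterly = pd.DataFrame({
    'fixes': is_fix.resample('QS').sum(),      # count of bug-fix commits per quarter
    'features': is_feature.resample('QS').sum(),
})
quarterly['fix_per_feature'] = quarterly['fixes'] / quarterly['features']
print(quarterly.tail())
```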

Here is an example of an analysis: **[docs/report_ebpf.md](docs/report_ebpf.md)** (not yet complete; more is being added).

## Workflow / Methodology

The core idea of Code-Survey is to treat LLMs like human participants in a survey:

- They can process data faster and more cheaply than humans, but are also prone to errors and limitations.
- By applying traditional human survey methods, we can conduct LLM-based surveys efficiently, while human experts provide oversight and validation to ensure accuracy.
- You can also let the LLM help you with survey design and data analysis.

```
[Human Experts design survey] -> [LLM Agent complete survey] -> [Human Experts (LLM) evaluate survey results samples] -> [Human Experts (LLM) give the report]
```

@@ -53,53 +61,73 @@

There are also 4 key steps to allow the LLM Agent assistant to design the survey.
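
To make the workflow concrete, here is a minimal sketch of the loop in Python. Every function name below is a hypothetical placeholder, not an API shipped by this repository:

```python
# Sketch of the Code-Survey loop: experts design, the agent answers,
# experts validate. All names here are illustrative placeholders.
import yaml

def call_llm(prompt: str) -> str:
    raise NotImplementedError("wrap your model API (e.g. chat completions) here")

def load_survey(path: str) -> dict:
    """Step 1: human experts design the survey as YAML (see survey/)."""
    with open(path) as f:
        return yaml.safe_load(f)

def agent_complete_survey(survey: dict, commit_message: str) -> dict:
    """Step 2: an LLM agent answers every survey question for one commit."""
    answers = {}
    for q in survey['questions']:
        prompt = (f"{survey['description']}\n\n"
                  f"Commit:\n{commit_message}\n\nQ: {q['question']}")
        answers[q['id']] = call_llm(prompt)
    return answers

def experts_validate(sampled_answers: list[dict]) -> bool:
    """Step 3: human experts (optionally LLM-assisted) spot-check a sample."""
    raise NotImplementedError
```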
The **Linux-bpf dataset** focuses on the eBPF subsystem and is continuously updated via CI. The dataset includes:

- **680+ expert-selected commits**: Features, commit details, types (Map, Helper, Kfunc, Prog, etc.). Human experts tagged these commits and can be analyzed by LLM Agents. [dataset here](data/feature_commit_details.csv)
- **12,000+ BPF-related commits**: LLM Agent surveys and summaries. You can download the [dataset here](data/commit_survey.csv).
- **150,000+ BPF subsystem-related emails**: LLM Agent surveys and summaries (`TODO`).

**To see more details about what we found, check the analysis in [report_ebpf.md](docs/report_ebpf.md).**

The simplest way to see how this data works is to **upload the CSV to ChatGPT** (or another platform) and ask questions to let it run the analysis for you!
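
If you prefer a local, reproducible look instead, a few lines of pandas are enough; the column names below are the ones used by the analysis scripts in this repository:

```python
# Quick local summary of the commit survey results.
import pandas as pd

df = pd.read_csv('data/commit_survey.csv')
# Distribution of commit types assigned by the LLM survey
print(df['commit_classification'].value_counts())
# Top 10 implementation components touched by the commits
print(df['major_related_implementation_component'].value_counts().head(10))
```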

Note this is just a very simple demo for now; there are hundreds of ways to improve the survey accuracy:

- It uses the GPT-4o API; the o1 model could do much better.
- You can simply run the survey multiple times to get multiple results and then aggregate them, as a real survey would (see the sketch below); the results would be much better, but this needs more time and API budget.
- More advanced agent designs with multi-step reasoning, or multiple agents.
- Better prompt engineering.
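
A minimal sketch of that repeated-runs idea, taking a majority vote over several runs of a single-choice question; `ask_llm` is a hypothetical wrapper around whatever model API you use:

```python
# Sketch: repeat a single-choice question and keep the majority answer.
from collections import Counter

def ask_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model API here")  # hypothetical

def majority_answer(prompt: str, n_runs: int = 5) -> str:
    votes = Counter(ask_llm(prompt) for _ in range(n_runs))
    answer, count = votes.most_common(1)[0]
    if count / n_runs < 0.6:
        # Low agreement is a signal the commit deserves human review.
        print(f"low agreement ({count}/{n_runs}): {prompt[:60]}...")
    return answer
```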


## Survey Example

You can find this example in [survey/commit_survey.yml](survey/commit_survey.yml), which analyzes all 10,000+ BPF commits in the Linux kernel eBPF subsystem.

```yml
# Configuration for LLM Agent in Code-survey
title: "Feature Classification Survey"
description: "A survey about the use cases and summary of feature in Linux eBPF. Note there might be some times the commit message is not related to the feature itself. If that happens you need to focus on the feature itself and ignore the commit message. Pay attention to the feature itself."
title: "Commit Classification Survey"
description: "A survey about the commit in Linux eBPF, to help better understand the design and evolution of bpf subsystem. For choice, try to be as specific as possible based on the commit message and code changes. If the commit message is not clear or does not provide enough information, you can choose the 'I'm not sure' option."
hint: "For example, when seems not related to eBPF, confirm it's a rare cases really has nothing to do with eBPF in all it's contents, such as btrfs or misspelled commit message. Do not tag subsystem changes related to eBPF as not."
questions:
  - id: summary
    type: fill_in
    question: "Please provide a summary of the commit in one short sentence not longer than 30 words. Only output one sentence."
    required: true

  - id: keywords
    type: fill_in
    question: "Please extract no more than 3 keywords from the commit. Only output 3 keywords without any special characters."
    required: true

  - id: commit_classification
    type: single_choice
    question: "What may be the main type of the commit?"
    choices:
      - value: A bug fix. It primarily resolves a bug or issue in the code.
      - value: A new feature. It adds a new capability or feature that was not previously present.
      - value: A performance optimization. It improves the performance of existing code such as reducing latency or improving throughput.
      - value: A cleanup or refactoring of the code. It involves changes to improve code readability maintainability or structure without changing its functionality.
      - value: A documentation change or typo fix. It only involves changes to documentation files or fixes a typographical error.
      - value: A test case or test infrastructure change. It adds or modifies test cases test scripts or testing infrastructure.
      - value: A build system or CI/CD change. It affects the build process continuous integration or deployment pipelines.
      - value: A security fix. It resolves a security vulnerability or strengthens security measures.
      - value: It's like a merge commit. It merges changes from another branch or repository.
      - value: It's another type of commit. It does not fit into any of the categories listed above.
      - value: I'm not sure about the type of the commit. The nature of the commit is unclear or uncertain.

  - id: major_related_implementation_component
    type: single_choice
    question: "What major implementation component is modified by the commit? It's typically where the code changes happened."
    choices:
      - value: The eBPF verifier. This component ensures that eBPF programs are safe to run within the kernel.
      - value: The eBPF JIT compiler for different architectures. It changes how eBPF bytecode is translated into machine code for different hardware architectures.
      - value: The helpers and kfuncs. It modifies or adds helpers and kernel functions that eBPF programs can call.
      - value: The syscall interface. It changes the system calls through which user-space programs interact with eBPF.
      - value: The eBPF maps. It changes how data structures shared between user-space and kernel-space (maps) are created or managed.
      - value: The libbpf library. It affects the library that simplifies interaction with eBPF from user-space applications.
      - value: The bpftool utility. It modifies the bpftool utility used for introspecting and interacting with eBPF programs and maps.
      - value: The test cases and makefiles. It adds or modifies test cases or makefile scripts used for testing or building eBPF programs.
      - value: The implementation happens in another subsystem and is related to eBPF events. e.g. probes perf events tracepoints network scheduler HID LSM etc. Note it's still related to how eBPF programs interact with these events.
      - value: It's like a merge commit. It includes significant changes across multiple components of the system.
      - value: It's not related to any above but still related to the BPF subsystem. It affects an implementation component not listed above.
      - value: It's not related to any above. It affects an implementation component totally unrelated to the BPF subsystem. It's a rare case of wrong data that needs to be removed.
      - value: I'm not sure about the implementation component of the commit. The component affected by the commit is unclear.
......
```
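
One way to drive such a survey programmatically is to render each question and its choices into a prompt and ask the model to answer with one of the listed options. A minimal sketch, assuming the official OpenAI Python client and the YAML schema shown above:

```python
# Sketch: answer one survey question for one commit with an LLM.
# Assumes `pip install openai pyyaml` and OPENAI_API_KEY in the environment.
import yaml
from openai import OpenAI

client = OpenAI()
with open('survey/commit_survey.yml') as f:
    survey = yaml.safe_load(f)

def ask(question: dict, commit_message: str) -> str:
    prompt = (f"{survey['description']}\n\nCommit message:\n{commit_message}\n\n"
              f"{question['question']}")
    if question['type'] in ('single_choice', 'multiple_choice'):
        # Constrain the model to the survey's predefined choices.
        choices = '\n'.join(f"- {c['value']}" for c in question['choices'])
        prompt += f"\nAnswer by copying the matching choice(s) from:\n{choices}"
    resp = client.chat.completions.create(
        model='gpt-4o',
        messages=[{'role': 'user', 'content': prompt}],
    )
    return resp.choices[0].message.content.strip()
```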

@@ -150,6 +178,13 @@
For a more detailed explanation and the general approach, see the [docs/best-pra

## References

Linux development:

1. [Submitting patches: the essential guide to getting your code into the kernel](https://www.kernel.org/doc/html/v4.10/process/submitting-patches.html)
2. [How to ask a question in the mailing list](https://www.linuxquestions.org/questions/linux-kernel-70/how-to-ask-question-in-maillist-4175719442/)
3. [How to Communicate When Submitting Patches: An Empirical Study of the Linux Kernel](https://dl.acm.org/doi/abs/10.1145/3359210)
4. [Differentiating Communication Styles of Leaders on the Linux Kernel Mailing List](https://dl.acm.org/doi/abs/10.1145/2957792)

AI model:

- [Introducing OpenAI o1-preview](https://openai.com/index/introducing-openai-o1-preview/): it can reason through complex tasks and solve harder problems than previous models in science, coding, and math.
21 changes: 11 additions & 10 deletions analysis/bpf/timeline_commits_6m.py
@@ -48,10 +48,11 @@ def parse_usecases(usecase_str):
# Apply the parsing function to the 'usecases_or_submodule_events' column
survey_data['parsed_usecases'] = survey_data['usecases_or_submodule_events'].apply(parse_usecases)

# Filter out 'merge' commits based on 'major_related_implementation_component',
# because some important commits might be classified as 'merge' in
# 'commit_classification' even though they touch a major component.
filter_pattern = re.compile(r'merge', re.IGNORECASE)
filtered_data = survey_data[~survey_data['major_related_implementation_component'].str.contains(filter_pattern, na=False)]

print(f"Total commits before filtering: {survey_data.shape[0]}")
print(f"Total commits after filtering out 'unrelated' and 'merge' commits: {filtered_data.shape[0]}")
@@ -121,8 +122,8 @@ def plot_frequency_timeline(field_name, title, max_labels, threshold, save_path,
print(f"\nGenerating timeline for: {title}")

# Group by 3-month intervals and category, count commits
# ('3MS' = 3-month start; 'MS'-style aliases avoid the deprecated 'M' style)
monthly_counts = filtered_data.resample('3MS')[field_name].value_counts().unstack()

# Determine significant categories
if field_name == 'usecases_or_submodule_events':
@@ -151,7 +152,7 @@ def plot_frequency_timeline(field_name, title, max_labels, threshold, save_path,
smoothed_counts = apply_moving_average(monthly_counts, window=smoothing_window)

# Plotting
fig, ax = plt.subplots(figsize=(10, 6))

# Plot each category
for column in smoothed_counts.columns:
@@ -200,8 +201,8 @@ def plot_usecases_timeline(title, save_path, max_labels=8, threshold=0.005, smoo
exploded_data = exploded_data.dropna(subset=['parsed_usecases'])

# Group by 3-month intervals and use case, count commits
# ('3MS' = 3-month start; 'MS'-style aliases avoid the deprecated 'M' style)
monthly_counts = exploded_data.resample('3MS')['parsed_usecases'].value_counts().unstack()

# Determine significant categories
significant_categories = get_significant_categories(flattened_usecases, max_labels, threshold)
@@ -226,7 +227,7 @@ def plot_usecases_timeline(title, save_path, max_labels=8, threshold=0.005, smoo
smoothed_counts = apply_moving_average(monthly_counts, window=smoothing_window)

# Plotting
fig, ax = plt.subplots(figsize=(10, 6))

# Plot each category
for column in smoothed_counts.columns:
@@ -269,7 +270,7 @@ def plot_usecases_timeline(title, save_path, max_labels=8, threshold=0.005, smoo
max_labels=settings['max_labels'],
threshold=settings['threshold'],
save_path=save_path,
smoothing_window=4 # Adjust window size as needed
)

# Generate timeline chart for use cases or submodule events with smoothing_window=4
@@ -278,7 +279,7 @@ def plot_usecases_timeline(title, save_path, max_labels=8, threshold=0.005, smoo
save_path='imgs/timeline_usecases_or_submodule_events_smoothed.png',
max_labels=12,
threshold=0.005, # Adjusted threshold for more use cases
smoothing_window=4 # Adjust window size as needed
)

print("\nAll smoothed timeline charts have been saved successfully.")
2 changes: 1 addition & 1 deletion analysis/bpf/timeline_features.py
@@ -21,7 +21,7 @@
'attach_types': 'events'
})

# df['feature_type'] = df['feature_type'].where(df['feature_type'].isin(['helper/kfunc']), 'other')

# Remove 'argument_constants'
df = df[df['feature_type'] != 'argument_constants']