-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
0 parents
commit bba2a28
Showing
361 changed files
with
25,956 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,170 @@ | ||
Default Semgrep rules for `endorctl` SAST scans reside in this repository. This includes rules authored by Endor Labs and ones from 3rd parties. | ||
|
||
__Important__: Proper attribution of rules authored by 3rd parties is ensured through | ||
- including the license and a link to the upstream repository and rule in the rule metadata, | ||
- maintaining leading comments with license and copyright information in the YAML files, and | ||
- including separate copyright notices and license files in the respective 3rd party subfolders. | ||
|
||
# Directory structure | ||
|
||
The directory structure looks as follows, whereby: | ||
- Rules and samples are kept in separate directories | ||
- Content authored by 3rd parties resides in subdirectory `3p`, whereby content from Endor Labs resides in `endor` | ||
- The directory structure for 3rd party rules follows the one from the Git repository they have been sourced from | ||
- The directory structure for rules from Endor Labs depends on | ||
- `<category>`: one of `vuln`, `malware` or `api` | ||
- `<lang>`: one of `java`, `js`, `py` or `gen` (for cross-language rules) | ||
|
||
```bash | ||
. | ||
├── rules | ||
│ ├── 3p | ||
│ │ └── <3rd-party> | ||
│ │ └── <dir-structure-from-remote-repo> | ||
│ └── endor | ||
│ └── <category> | ||
│ └── <lang> | ||
│ └── <lang>-<rule-id>.yaml | ||
└── samples | ||
├── 3p | ||
│ └── <3rd-party> | ||
└── endor | ||
└── <category> | ||
└── <lang> | ||
└── <lang>-<rule-id>.<ext> | ||
``` | ||
|
||
# Statistics | ||
|
||
![No. of rules per OWASP Top 10](stats/rules_per_language_owasp.svg) | ||
|
||
![No. of rules per language and category](stats/rules_per_language_category.svg) | ||
|
||
![No. of rules per technology](stats/rules_per_technology.svg) | ||
|
||
![No. of rules per language and confidence](stats/rules_per_language_confidence.svg) | ||
|
||
![No. of 3rd party rules per language and license](stats/3p_rules_per_language_license.svg) | ||
|
||
|
||
# Anomalies | ||
|
||
The following charts and CSV files describe anomalies and shortcomings that should be addressed to improve rule quality: | ||
|
||
| File/link | Description | | ||
| --- | --- | | ||
| todo | YAML files with more than 1 rule | | ||
| todo | vulnerability rules with identical `description` | | ||
| todo | vulnerability rules with `TODO` in `cwe` or `description` | | ||
| [rules_without_confidence.csv](stats/rules_without_confidence.csv) | rules without `confidence` | | ||
| [vuln_rules_without_cwe.csv](stats/vuln_rules_without_cwe.csv) | vulnerability rules without `cwe` | | ||
| [vuln_rules_with_many_cwes.csv](stats/vuln_rules_with_many_cwes.csv) | vulnerability rules with more than one `cwe` | | ||
| [vuln_rules_without_owasp_top10.csv](stats/vuln_rules_without_owasp_top10.csv) | vulnerability rules with a `cwe` that is not part of the OWASP Top 10 | | ||
|
||
# Adding Rules | ||
|
||
__Mandatory rule metadata__ to ensure correct processing and display: | ||
- `confidence`: the confidence in the finding (`LOW`, `MEDIUM` or `HIGH`) | ||
- `cwe`: a list of one or more strings in the form `CWE-xxx: Name` (only for category `vulnerability`) | ||
- `description`: a short, user-facing description of the rule | ||
- `endor-category`: one of `critical-api`, `malware-detection`, or `vulnerability` | ||
- `endor-rule-origin.license`: the license of a 3rd party rule (or `none` if no corresponding information can be found in the upstream repository) | ||
- `endor-rule-origin.url`: the Git URL including the commit hash that last touched the respective file in the upstream repository | ||
- `endor-targets`: always `ENDOR_TARGET_REPOSITORY` for the time being | ||
- `version`: a semantic version identifier | ||
- `technology`: should be set in case a rule targets a specific technology, library or framework (not the programming language, which would be redundant with `languages`) | ||
- Vue.js | ||
- Express | ||
- Angular | ||
- React | ||
- Spring | ||
- Spring Boot | ||
- Flask | ||
- Django | ||
|
||
__Pull Requests and CI__: | ||
A pull request needs to be raised and the CI checks have to be passed before it gets merged. The current settings require the approval of 1 reviewer for the PR to be merged. | ||
A set of checks is triggered with each commit. All checks need to pass for a PR to be merged. Those tests include: | ||
- Semgrep validation: runs checks against all rules for errors | ||
- Semgrep tests: runs all rules against the samples provided | ||
- Proto validation: runs tests to ensure that the rules adhere to the protocol buffer specification from Endor Labs as defined [here](https://docs.endorlabs.com/api/#tag/SemgrepRuleService/operation/SemgrepRuleService_CreateSemgrepRule). | ||
- Duplicate detection: runs tests to ensure that the rules don't create duplicate results. | ||
|
||
|
||
## From 3rd party | ||
|
||
3rd party rules must be sourced using the Python script `fetch-3p-rules.py`, to make sure that the above-mentioned metadata is auto-generated where possible. | ||
|
||
__Prerequisites__: | ||
``` | ||
python3 -m venv .venv | ||
source .venv/bin/activate | ||
python3 -m pip install -r bin/requirements.txt | ||
``` | ||
|
||
__Import/Update__: | ||
``` | ||
python3 bin/fetch-3p-rules.py --repo <URL of upstream repo> --clone-into .tmp --license <SPDX license identifier> --third-party <3rd-party> --repo-subdir <subdirectory in upstream repo> --copyright-notice <file in upstream repo> | ||
``` | ||
|
||
The script downloads rule and sample files to `rules/<3rd-party>/<name>` and `samples/<3rd-party>/<name>`, whereby the `<name>` is specified with option `--third-party`, and should correspond to the name of the GitHub/GitLab organization or repository name. | ||
|
||
__License and copyright__: | ||
- The open source license of the rule must be specified as [SPDX license identifier](https://spdx.org/licenses/) using `--license`. If the license identifier needed is not yet present among the choices, add it in the script. | ||
- Additionally, the file containing the original copyright notice must be included with `--copyright-notice`. It will be copied into `rules/<3rd-party>/<name>`. | ||
|
||
__Rule versioning__: The script loops over all files in the respective repo and subfolder (if any, specified with `--repo-subdir`) and checks whether the files already exist in the rules or samples subfolders of the monorepo: | ||
- If not, the file is copied and `metadata.version` is set to `v1.0.0`. | ||
- If yes, it compares the commit hash of the file in the upstream repo with the commit hash in the metadata field `endor-rule-origin.url` of the existing file. If the commits are identical, the file is not copied. If they are different, the file is copied and `metadata.version` is bumped. | ||
|
||
__CWE and description__: The rules in the upstream repository may not have CWE metadata or a proper description. In such cases, the script adds them with a `TODO` in the YAML files. Search and fix those manually to meet the above-described metadata requirements. | ||
|
||
3rd party rules from __GitLab__: | ||
|
||
``` | ||
python3 bin/fetch-3p-rules.py --repo https://gitlab.com/gitlab-org/security-products/sast-rules --clone-into .tmp --license MIT --third-party gitlab --copyright-notice LICENSE --repo-subdir c | ||
python3 bin/fetch-3p-rules.py --repo https://gitlab.com/gitlab-org/security-products/sast-rules --clone-into .tmp --license MIT --third-party gitlab --copyright-notice LICENSE --repo-subdir java | ||
python3 bin/fetch-3p-rules.py --repo https://gitlab.com/gitlab-org/security-products/sast-rules --clone-into .tmp --license MIT --third-party gitlab --copyright-notice LICENSE --repo-subdir javascript | ||
python3 bin/fetch-3p-rules.py --repo https://gitlab.com/gitlab-org/security-products/sast-rules --clone-into .tmp --license MIT --third-party gitlab --copyright-notice LICENSE --repo-subdir python | ||
python3 bin/fetch-3p-rules.py --repo https://gitlab.com/gitlab-org/security-products/sast-rules --clone-into .tmp --license LGPL-3.0-only --third-party gitlab --repo-subdir rules/lgpl/javascript --copyright-notice rules/lgpl/LICENSE | ||
``` | ||
|
||
3rd party rules from __akabe1__: | ||
``` | ||
python3 bin/fetch-3p-rules.py --repo https://github.com/akabe1/akabe1-semgrep-rules --clone-into .tmp --license GPL-3.0-or-later --copyright-notice README.md --third-party akabe1 --repo-subdir java/xxe | ||
``` | ||
|
||
3rd party rules from __chenlvtang__: | ||
|
||
``` | ||
python3 bin/fetch-3p-rules.py --repo https://github.com/chenlvtang/MySemgrepRules --clone-into .tmp --license none --third-party chenlvtang --repo-subdir file-path-traversal | ||
``` | ||
|
||
3rd party rules from __0xdea__: | ||
|
||
``` | ||
python3 bin/fetch-3p-rules.py --repo https://github.com/0xdea/semgrep-rules --clone-into .tmp --repo-subdir c --third-party 0xdea --license MIT --copyright-notice LICENSE | ||
``` | ||
|
||
## From Endor Labs | ||
|
||
What we expect with a new rule: | ||
**Do not look at external semgrep rules for reference.** | ||
**AI usage from ChatGPT or Co-pilot is completely acceptable and encouraged but it should also be human reviewed** | ||
|
||
- All commits must be signed | ||
- A new rule should be added to the appropriate category and language directory. | ||
- There should only be one Semgrep rule per YAML file. | ||
- The rule-id should be names using the following format: **\<lang\>-\<name\>**, for example: | ||
- java-http-repo | ||
- The file should be named using the following format: **\<rule-id\>.yaml**, for example: | ||
- java-http-repo.yaml | ||
- The test target file should be named in the same way with the appropriate file extension, for example: | ||
- java-http-repo.xml | ||
- The rule needs to adhere to the Semgrep syntax. [This page](https://semgrep.dev/docs/writing-rules/rule-syntax) describes the mandatory fields for a semgrep rule. | ||
- Every vulnerability-related rule must also have the metadata field `cwe` (cf. [Semgrep documentation](https://semgrep.dev/docs/contributing/contributing-to-semgrep-rules-repository#including-fields-required-by-security-category)). This CWE will be the basis for creating different categories and subcategories that can be used for selecting a subset of Semgrep rules for a given scan or in the UI. Example categories or TOP-X lists like OWASP Top-10 or CWE Top-25 (cf. [example categories](https://cwe.mitre.org/scoring/index.html#top_n_lists)). | ||
- Each rule should also adhere to the Endor Labs' supported grammar defined [here](https://docs.endorlabs.com/api/#tag/SemgrepRuleService/operation/SemgrepRuleService_CreateSemgrepRule). | ||
- The metadata field `message` must be spell-checked, to make sure it can be shown as-is in our UI. Consider [those advices](https://semgrep.dev/docs/contributing/contributing-to-semgrep-rules-repository#rule-messages) regarding high-quality rule messages. Moreover, the message must not contain any metavariables. The message should also contain descriptive but general advice how how this type of rule should impacts a user, why and how it should be resolved. | ||
|
||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
// Marco Ivaldi <[email protected]> | ||
|
||
#include <stdio.h> | ||
#include <stdlib.h> | ||
|
||
int main(int argc, char** argv) | ||
{ | ||
char cmd[CMD_MAX] = "/usr/bin/cat "; | ||
// ruleid: raptor-argv-envp-access | ||
strcat(cmd, argv[1]); | ||
system(cmd); | ||
|
||
return 0; | ||
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,53 @@ | ||
// Marco Ivaldi <[email protected]> | ||
|
||
#include <stdio.h> | ||
#include <stdlib.h> | ||
|
||
void invoke1(char *string) | ||
{ | ||
char buf[] = "uname -a; id"; | ||
|
||
// ok: raptor-command-injection | ||
system(buf); | ||
|
||
// ok: raptor-command-injection | ||
system("whoami"); | ||
|
||
// ruleid: raptor-command-injection | ||
system(string); | ||
} | ||
|
||
void invoke2(char *string) | ||
{ | ||
char buf[] = "uname -a; id"; | ||
|
||
// ok: raptor-command-injection | ||
popen(buf, "r"); | ||
|
||
// ok: raptor-command-injection | ||
popen("whoami", "r"); | ||
|
||
// ruleid: raptor-command-injection | ||
popen(string, "r"); | ||
} | ||
|
||
int send_mail(char *user) | ||
{ | ||
char buf[1024]; | ||
FILE *fp; | ||
|
||
snprintf(buf, sizeof(buf), "/usr/bin/sendmail -s \"hi\" %s", user); | ||
|
||
// ruleid: raptor-command-injection | ||
fp = popen(buf, "w"); | ||
|
||
if (fp == NULL) | ||
return 1; | ||
// ... | ||
} | ||
|
||
int main() | ||
{ | ||
printf("Hello, World!"); | ||
return 0; | ||
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,87 @@ | ||
// Marco Ivaldi <[email protected]> | ||
|
||
#include <stdio.h> | ||
#include <stdlib.h> | ||
|
||
#define MEMSIZE 256 | ||
|
||
void alloc_and_free1() | ||
{ | ||
int bailout = 1; | ||
char *ptr = (char *)malloc(MEMSIZE); | ||
|
||
// this should be catched but it isn't, due to a documented limitation in semgrep | ||
// https://semgrep.dev/docs/writing-rules/pattern-syntax/#ellipses-and-statement-blocks | ||
if (bailout) | ||
free(ptr); | ||
|
||
free(ptr); | ||
// ruleid: raptor-double-free | ||
free(ptr); | ||
} | ||
|
||
void alloc_and_free2() | ||
{ | ||
char *ptr = (char *)malloc(MEMSIZE); | ||
|
||
free(ptr); | ||
ptr = NULL; | ||
// ok: raptor-double-free | ||
free(ptr); | ||
} | ||
|
||
void alloc_and_free3() | ||
{ | ||
char *ptr = (char *)malloc(MEMSIZE); | ||
|
||
free(ptr); | ||
ptr = (char *)malloc(MEMSIZE); | ||
// ok: raptor-double-free | ||
free(ptr); | ||
} | ||
|
||
void double_free(int argc, char **argv) | ||
{ | ||
char *buf1R1; | ||
char *buf2R1; | ||
char *buf1R2; | ||
buf1R1 = (char *) malloc(BUFSIZE2); | ||
buf2R1 = (char *) malloc(BUFSIZE2); | ||
free(buf1R1); | ||
free(buf2R1); | ||
buf1R2 = (char *) malloc(BUFSIZE1); | ||
strncpy(buf1R2, argv[1], BUFSIZE1-1); | ||
// ruleid: raptor-double-free | ||
free(buf2R1); | ||
free(buf1R2); | ||
} | ||
|
||
int Packet *getNextPacket() | ||
{ | ||
Packet *y = (Packet *) malloc(1024); | ||
retval = waitForPacket(y); | ||
if(retval == OK) { | ||
return y; | ||
} else { | ||
return NULL; | ||
} | ||
} | ||
|
||
int bad() | ||
{ | ||
free(logData); | ||
pkt = getNextPacket(); | ||
if(!pkt) { | ||
return NULL; | ||
} | ||
logPktData(pkt); | ||
// ruleid: raptor-double-free | ||
free(logData); | ||
processPacket(pkt); | ||
} | ||
|
||
int main() | ||
{ | ||
printf("Hello, World!"); | ||
return 0; | ||
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,66 @@ | ||
// Marco Ivaldi <[email protected]> | ||
|
||
#include <stdio.h> | ||
#include <syslog.h> | ||
|
||
#define BUFSIZE 256 | ||
|
||
void build_string(char *string) | ||
{ | ||
char buf[BUFSIZE]; | ||
|
||
// ruleid: raptor-format-string-bugs | ||
snprintf(buf, BUFSIZE, string); | ||
|
||
// ok: raptor-format-string-bugs | ||
snprintf(buf, BUFSIZE, "%s", string); | ||
} | ||
|
||
void print_stuff(char *string) | ||
{ | ||
char buf[BUFSIZE]; | ||
|
||
// ruleid: raptor-format-string-bugs | ||
printf(string); | ||
|
||
// ok: raptor-format-string-bugs | ||
printf("%s\n", string); | ||
} | ||
|
||
void log_stuff(char *string) | ||
{ | ||
char buf[BUFSIZE]; | ||
|
||
// ruleid: raptor-format-string-bugs | ||
syslog(LOG_ERR, string); | ||
|
||
// ok: raptor-format-string-bugs | ||
syslog(LOG_ERR, "%s", string); | ||
} | ||
|
||
void log_error(char *fmt, ...) | ||
{ | ||
char buf[BUFSIZE]; | ||
va_list ap; | ||
|
||
va_start(ap, fmt); | ||
// ruleid: raptor-format-string-bugs | ||
vsnprintf(buf, sizeof(buf), fmt, ap); | ||
va_end(ap); | ||
// ruleid: raptor-format-string-bugs | ||
syslog(LOG_NOTICE, buf); | ||
} | ||
|
||
void printWrapper(char *string) | ||
{ | ||
// ruleid: raptor-format-string-bugs | ||
printf(string); | ||
} | ||
|
||
int main(int argc, char **argv) | ||
{ | ||
char buf[5012]; | ||
memcpy(buf, argv[1], 5012); | ||
printWrapper(argv[1]); | ||
return 0; | ||
} |
Oops, something went wrong.