Commit

added sast sample
shivaprakash236 committed Dec 12, 2024
0 parents commit bba2a28
Showing 361 changed files with 25,956 additions and 0 deletions.
170 changes: 170 additions & 0 deletions README.md
@@ -0,0 +1,170 @@
Default Semgrep rules for `endorctl` SAST scans reside in this repository. This includes rules authored by Endor Labs and ones from 3rd parties.

__Important__: Proper attribution of rules authored by 3rd parties is ensured through
- including the license and a link to the upstream repository and rule in the rule metadata,
- maintaining leading comments with license and copyright information in the YAML files, and
- including separate copyright notices and license files in the respective 3rd party subfolders.

# Directory structure

The directory structure looks as follows:
- Rules and samples are kept in separate directories.
- Content authored by 3rd parties resides in the subdirectory `3p`, whereas content from Endor Labs resides in `endor`.
- The directory structure for 3rd party rules follows that of the Git repository they were sourced from.
- The directory structure for rules from Endor Labs depends on:
  - `<category>`: one of `vuln`, `malware` or `api`
  - `<lang>`: one of `java`, `js`, `py` or `gen` (for cross-language rules)

```bash
.
├── rules
│   ├── 3p
│   │   └── <3rd-party>
│   │       └── <dir-structure-from-remote-repo>
│   └── endor
│       └── <category>
│           └── <lang>
│               └── <lang>-<rule-id>.yaml
└── samples
    ├── 3p
    │   └── <3rd-party>
    └── endor
        └── <category>
            └── <lang>
                └── <lang>-<rule-id>.<ext>
```
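
The layout for Endor Labs rules can be sketched as a small path helper. This is only an illustration of the tree above: the function name `endor_paths` and its parameters are hypothetical, and `name` stands for the part of the file name that follows the language prefix.

```python
from pathlib import PurePosixPath

# Allowed values as listed in the directory structure description.
CATEGORIES = {"vuln", "malware", "api"}
LANGS = {"java", "js", "py", "gen"}


def endor_paths(category: str, lang: str, name: str, ext: str):
    """Return (rule_path, sample_path) relative to the repository root.

    Hypothetical helper: it only mirrors the tree shown above.
    """
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category}")
    if lang not in LANGS:
        raise ValueError(f"unknown language: {lang}")
    stem = f"{lang}-{name}"
    rule = PurePosixPath("rules", "endor", category, lang, f"{stem}.yaml")
    sample = PurePosixPath("samples", "endor", category, lang, f"{stem}.{ext}")
    return rule, sample
```

For example, `endor_paths("vuln", "java", "http-repo", "xml")` yields a rule path under `rules/endor/vuln/java/` and a sample path with the same stem under `samples/endor/vuln/java/`.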

# Statistics

![No. of rules per OWASP Top 10](stats/rules_per_language_owasp.svg)

![No. of rules per language and category](stats/rules_per_language_category.svg)

![No. of rules per technology](stats/rules_per_technology.svg)

![No. of rules per language and confidence](stats/rules_per_language_confidence.svg)

![No. of 3rd party rules per language and license](stats/3p_rules_per_language_license.svg)


# Anomalies

The following charts and CSV files describe anomalies and shortcomings that should be addressed to improve rule quality:

| File/link | Description |
| --- | --- |
| todo | YAML files with more than 1 rule |
| todo | vulnerability rules with identical `description` |
| todo | vulnerability rules with `TODO` in `cwe` or `description` |
| [rules_without_confidence.csv](stats/rules_without_confidence.csv) | rules without `confidence` |
| [vuln_rules_without_cwe.csv](stats/vuln_rules_without_cwe.csv) | vulnerability rules without `cwe` |
| [vuln_rules_with_many_cwes.csv](stats/vuln_rules_with_many_cwes.csv) | vulnerability rules with more than one `cwe` |
| [vuln_rules_without_owasp_top10.csv](stats/vuln_rules_without_owasp_top10.csv) | vulnerability rules with a `cwe` that is not part of the OWASP Top 10 |

# Adding Rules

__Mandatory rule metadata__ to ensure correct processing and display:
- `confidence`: the confidence in the finding (`LOW`, `MEDIUM` or `HIGH`)
- `cwe`: a list of one or more strings in the form `CWE-xxx: Name` (only for category `vulnerability`)
- `description`: a short, user-facing description of the rule
- `endor-category`: one of `critical-api`, `malware-detection`, or `vulnerability`
- `endor-rule-origin.license`: the license of a 3rd party rule (or `none` if no corresponding information can be found in the upstream repository)
- `endor-rule-origin.url`: the Git URL including the commit hash that last touched the respective file in the upstream repository
- `endor-targets`: always `ENDOR_TARGET_REPOSITORY` for the time being
- `version`: a semantic version identifier
- `technology`: should be set if a rule targets a specific technology, library or framework (not the programming language, which would be redundant with `languages`)
  - Vue.js
  - Express
  - Angular
  - React
  - Spring
  - Spring Boot
  - Flask
  - Django
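
A pre-submission check for these fields could look roughly like the sketch below. It assumes the rule's `metadata` block has already been parsed into a Python dict, accepts `endor-targets` as either a string or a one-element list (the README does not say which), and leaves out the `endor-rule-origin` fields, which only apply to 3rd party rules. The function name `check_metadata` is hypothetical and not part of the repository's CI.

```python
import re

CONFIDENCE = {"LOW", "MEDIUM", "HIGH"}
ENDOR_CATEGORIES = {"critical-api", "malware-detection", "vulnerability"}
# "CWE-xxx: Name", e.g. "CWE-78: OS Command Injection"
CWE_RE = re.compile(r"^CWE-\d+: \S.*$")
TARGET = "ENDOR_TARGET_REPOSITORY"


def check_metadata(meta: dict) -> list:
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    if meta.get("confidence") not in CONFIDENCE:
        problems.append("confidence must be LOW, MEDIUM or HIGH")
    category = meta.get("endor-category")
    if category not in ENDOR_CATEGORIES:
        problems.append("endor-category has an unexpected value")
    description = meta.get("description")
    if not description or "TODO" in description:
        problems.append("description is missing or still contains TODO")
    if not meta.get("version"):
        problems.append("version is missing")
    # Assumption: both a plain string and a one-element list are accepted.
    if meta.get("endor-targets") not in (TARGET, [TARGET]):
        problems.append(f"endor-targets must be {TARGET}")
    if category == "vulnerability":
        cwes = meta.get("cwe")
        if not isinstance(cwes, list) or not cwes:
            problems.append("cwe must be a non-empty list for vulnerability rules")
        else:
            for cwe in cwes:
                if not isinstance(cwe, str) or not CWE_RE.match(cwe):
                    problems.append(f"malformed cwe entry: {cwe!r}")
    return problems
```

Returning a list of problems rather than raising on the first one makes it easier to fix a rule in a single pass.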

__Pull Requests and CI__:
Every change must be submitted as a pull request. The current settings require the approval of 1 reviewer, and all CI checks must pass before a PR can be merged.
A set of checks is triggered with each commit. These checks include:
- Semgrep validation: checks all rules for errors
- Semgrep tests: runs all rules against the samples provided
- Proto validation: ensures that the rules adhere to the protocol buffer specification from Endor Labs as defined [here](https://docs.endorlabs.com/api/#tag/SemgrepRuleService/operation/SemgrepRuleService_CreateSemgrepRule)
- Duplicate detection: ensures that the rules don't create duplicate results
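
The Semgrep tests rely on annotations in the sample files: a comment such as `// ruleid: <rule-id>` marks the next line as an expected finding, and `// ok: <rule-id>` marks it as an expected non-finding. The sketch below lists those expectations for a sample file. It is an illustration only, not how Semgrep implements its test runner; it handles just these two annotation kinds with `//` or `#` comments, and the name `expected_results` is hypothetical.

```python
import re

# Matches e.g. "// ruleid: raptor-command-injection" or "# ok: some-rule".
ANNOTATION_RE = re.compile(r"^\s*(?://|#)\s*(ruleid|ok):\s*(\S+)\s*$")


def expected_results(source: str) -> list:
    """Return (line_number, kind, rule_id) for every annotated line of code.

    Line numbers are 1-based and refer to the line directly below the
    annotation comment.
    """
    expectations = []
    lines = source.splitlines()
    for index, line in enumerate(lines):
        match = ANNOTATION_RE.match(line)
        if match and index + 1 < len(lines):
            kind, rule_id = match.groups()
            # index is 0-based, so the annotated line is number index + 2.
            expectations.append((index + 2, kind, rule_id))
    return expectations
```

Comparing this list with the findings a rule actually produces shows which annotations are unmet.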


## From 3rd party

3rd party rules must be sourced using the Python script `fetch-3p-rules.py`, to make sure that the above-mentioned metadata is auto-generated where possible.

__Prerequisites__:
```
python3 -m venv .venv
source .venv/bin/activate
python3 -m pip install -r bin/requirements.txt
```

__Import/Update__:
```
python3 bin/fetch-3p-rules.py --repo <URL of upstream repo> --clone-into .tmp --license <SPDX license identifier> --third-party <3rd-party> --repo-subdir <subdirectory in upstream repo> --copyright-notice <file in upstream repo>
```

The script downloads rule and sample files to `rules/3p/<name>` and `samples/3p/<name>`, where `<name>` is specified with the option `--third-party` and should correspond to the name of the GitHub/GitLab organization or repository.

__License and copyright__:
- The open source license of the rule must be specified as an [SPDX license identifier](https://spdx.org/licenses/) using `--license`. If the required license identifier is not yet among the choices, add it to the script.
- Additionally, the file containing the original copyright notice must be specified with `--copyright-notice`. It will be copied into `rules/3p/<name>`.

__Rule versioning__: The script loops over all files in the respective repo and subfolder (if any, specified with `--repo-subdir`) and checks whether the files already exist in the rules or samples subfolders of the monorepo:
- If not, the file is copied and `metadata.version` is set to `v1.0.0`.
- If yes, it compares the commit hash of the file in the upstream repo with the commit hash in the metadata field `endor-rule-origin.url` of the existing file. If the commits are identical, the file is not copied. If they are different, the file is copied and `metadata.version` is bumped.
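
The versioning decision described above can be sketched as follows. This is not the code of `fetch-3p-rules.py`: the function name and parameters are hypothetical, the existing commit hash is assumed to have been extracted from `endor-rule-origin.url` already, and bumping the minor component is an assumption, since the text does not say which part of the version is bumped.

```python
from typing import Optional, Tuple


def next_version(
    existing_version: Optional[str],
    existing_commit: Optional[str],
    upstream_commit: str,
) -> Tuple[bool, Optional[str]]:
    """Return (copy_file, version) for one upstream rule file.

    existing_version is None when the file is not yet in the repository.
    """
    if existing_version is None:
        # New file: copy it and start at the initial version.
        return True, "v1.0.0"
    if existing_commit == upstream_commit:
        # Unchanged upstream: keep the existing file and version.
        return False, existing_version
    major, minor, _patch = (int(p) for p in existing_version.lstrip("v").split("."))
    # Assumption: a changed upstream file bumps the minor version.
    return True, f"v{major}.{minor + 1}.0"
```

Keeping the decision in a pure function like this makes it straightforward to test without cloning a repository.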

__CWE and description__: The rules in the upstream repository may not have CWE metadata or a proper description. In such cases, the script adds them with a `TODO` in the YAML files. Search and fix those manually to meet the above-described metadata requirements.

3rd party rules from __GitLab__:

```
python3 bin/fetch-3p-rules.py --repo https://gitlab.com/gitlab-org/security-products/sast-rules --clone-into .tmp --license MIT --third-party gitlab --copyright-notice LICENSE --repo-subdir c
python3 bin/fetch-3p-rules.py --repo https://gitlab.com/gitlab-org/security-products/sast-rules --clone-into .tmp --license MIT --third-party gitlab --copyright-notice LICENSE --repo-subdir java
python3 bin/fetch-3p-rules.py --repo https://gitlab.com/gitlab-org/security-products/sast-rules --clone-into .tmp --license MIT --third-party gitlab --copyright-notice LICENSE --repo-subdir javascript
python3 bin/fetch-3p-rules.py --repo https://gitlab.com/gitlab-org/security-products/sast-rules --clone-into .tmp --license MIT --third-party gitlab --copyright-notice LICENSE --repo-subdir python
python3 bin/fetch-3p-rules.py --repo https://gitlab.com/gitlab-org/security-products/sast-rules --clone-into .tmp --license LGPL-3.0-only --third-party gitlab --repo-subdir rules/lgpl/javascript --copyright-notice rules/lgpl/LICENSE
```

3rd party rules from __akabe1__:
```
python3 bin/fetch-3p-rules.py --repo https://github.com/akabe1/akabe1-semgrep-rules --clone-into .tmp --license GPL-3.0-or-later --copyright-notice README.md --third-party akabe1 --repo-subdir java/xxe
```

3rd party rules from __chenlvtang__:

```
python3 bin/fetch-3p-rules.py --repo https://github.com/chenlvtang/MySemgrepRules --clone-into .tmp --license none --third-party chenlvtang --repo-subdir file-path-traversal
```

3rd party rules from __0xdea__:

```
python3 bin/fetch-3p-rules.py --repo https://github.com/0xdea/semgrep-rules --clone-into .tmp --repo-subdir c --third-party 0xdea --license MIT --copyright-notice LICENSE
```

## From Endor Labs

What we expect with a new rule:

**Do not look at external Semgrep rules for reference.**

**Using AI assistants such as ChatGPT or Copilot is acceptable and encouraged, but the result must be reviewed by a human.**

- All commits must be signed.
- A new rule should be added to the appropriate category and language directory.
- There should only be one Semgrep rule per YAML file.
- The rule-id should be named using the following format: **\<lang\>-\<name\>**, for example:
  - java-http-repo
- The file should be named using the following format: **\<rule-id\>.yaml**, for example:
  - java-http-repo.yaml
- The test target file should be named in the same way with the appropriate file extension, for example:
  - java-http-repo.xml
- The rule needs to adhere to the Semgrep syntax. [This page](https://semgrep.dev/docs/writing-rules/rule-syntax) describes the mandatory fields for a Semgrep rule.
- Every vulnerability-related rule must also have the metadata field `cwe` (cf. [Semgrep documentation](https://semgrep.dev/docs/contributing/contributing-to-semgrep-rules-repository#including-fields-required-by-security-category)). This CWE will be the basis for creating categories and subcategories that can be used to select a subset of Semgrep rules for a given scan or in the UI. Example categories are Top-N lists such as the OWASP Top 10 or the CWE Top 25 (cf. [example categories](https://cwe.mitre.org/scoring/index.html#top_n_lists)).
- Each rule should also adhere to the grammar supported by Endor Labs, defined [here](https://docs.endorlabs.com/api/#tag/SemgrepRuleService/operation/SemgrepRuleService_CreateSemgrepRule).
- The rule's `message` field must be spell-checked to make sure it can be shown as-is in our UI. Consider [this advice](https://semgrep.dev/docs/contributing/contributing-to-semgrep-rules-repository#rule-messages) regarding high-quality rule messages. Moreover, the message must not contain any metavariables. It should also give descriptive but general advice on how this type of finding impacts a user, and why and how it should be resolved.
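
Some of these conventions can be checked mechanically. The sketch below verifies the rule-id prefix, the file name and the absence of metavariables in the message. It assumes the rule file sits in a directory named after its language, as in the directory structure above; the function name `check_rule_file` is hypothetical.

```python
import re
from pathlib import PurePosixPath

# Semgrep metavariables look like $X, $FUNC_1 or $...ARGS.
METAVARIABLE_RE = re.compile(r"\$(?:\.\.\.)?[A-Z_][A-Z_0-9]*")


def check_rule_file(path: str, rule_id: str, message: str) -> list:
    """Return a list of naming and message problems (empty if none)."""
    problems = []
    file = PurePosixPath(path)
    # Assumption: the parent directory is the <lang> directory.
    lang = file.parent.name
    if not rule_id.startswith(f"{lang}-"):
        problems.append(f"rule-id should start with '{lang}-'")
    if file.name != f"{rule_id}.yaml":
        problems.append(f"file should be named '{rule_id}.yaml'")
    if METAVARIABLE_RE.search(message):
        problems.append("message must not contain metavariables")
    return problems
```

Spell-checking and the quality of the advice in the message still need a human reviewer.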


14 changes: 14 additions & 0 deletions samples/3p/0xdea/c/argv-envp-access.c
@@ -0,0 +1,14 @@
// Marco Ivaldi <[email protected]>

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char** argv)
{
    char cmd[CMD_MAX] = "/usr/bin/cat ";
    // ruleid: raptor-argv-envp-access
    strcat(cmd, argv[1]);
    system(cmd);

    return 0;
}
53 changes: 53 additions & 0 deletions samples/3p/0xdea/c/command-injection.c
@@ -0,0 +1,53 @@
// Marco Ivaldi <[email protected]>

#include <stdio.h>
#include <stdlib.h>

void invoke1(char *string)
{
    char buf[] = "uname -a; id";

    // ok: raptor-command-injection
    system(buf);

    // ok: raptor-command-injection
    system("whoami");

    // ruleid: raptor-command-injection
    system(string);
}

void invoke2(char *string)
{
    char buf[] = "uname -a; id";

    // ok: raptor-command-injection
    popen(buf, "r");

    // ok: raptor-command-injection
    popen("whoami", "r");

    // ruleid: raptor-command-injection
    popen(string, "r");
}

int send_mail(char *user)
{
    char buf[1024];
    FILE *fp;

    snprintf(buf, sizeof(buf), "/usr/bin/sendmail -s \"hi\" %s", user);

    // ruleid: raptor-command-injection
    fp = popen(buf, "w");

    if (fp == NULL)
        return 1;
    // ...
}

int main()
{
    printf("Hello, World!");
    return 0;
}
87 changes: 87 additions & 0 deletions samples/3p/0xdea/c/double-free.c
@@ -0,0 +1,87 @@
// Marco Ivaldi <[email protected]>

#include <stdio.h>
#include <stdlib.h>

#define MEMSIZE 256

void alloc_and_free1()
{
    int bailout = 1;
    char *ptr = (char *)malloc(MEMSIZE);

    // this should be catched but it isn't, due to a documented limitation in semgrep
    // https://semgrep.dev/docs/writing-rules/pattern-syntax/#ellipses-and-statement-blocks
    if (bailout)
        free(ptr);

    free(ptr);
    // ruleid: raptor-double-free
    free(ptr);
}

void alloc_and_free2()
{
    char *ptr = (char *)malloc(MEMSIZE);

    free(ptr);
    ptr = NULL;
    // ok: raptor-double-free
    free(ptr);
}

void alloc_and_free3()
{
    char *ptr = (char *)malloc(MEMSIZE);

    free(ptr);
    ptr = (char *)malloc(MEMSIZE);
    // ok: raptor-double-free
    free(ptr);
}

void double_free(int argc, char **argv)
{
    char *buf1R1;
    char *buf2R1;
    char *buf1R2;
    buf1R1 = (char *) malloc(BUFSIZE2);
    buf2R1 = (char *) malloc(BUFSIZE2);
    free(buf1R1);
    free(buf2R1);
    buf1R2 = (char *) malloc(BUFSIZE1);
    strncpy(buf1R2, argv[1], BUFSIZE1-1);
    // ruleid: raptor-double-free
    free(buf2R1);
    free(buf1R2);
}

int Packet *getNextPacket()
{
    Packet *y = (Packet *) malloc(1024);
    retval = waitForPacket(y);
    if(retval == OK) {
        return y;
    } else {
        return NULL;
    }
}

int bad()
{
    free(logData);
    pkt = getNextPacket();
    if(!pkt) {
        return NULL;
    }
    logPktData(pkt);
    // ruleid: raptor-double-free
    free(logData);
    processPacket(pkt);
}

int main()
{
    printf("Hello, World!");
    return 0;
}
66 changes: 66 additions & 0 deletions samples/3p/0xdea/c/format-string-bugs.c
@@ -0,0 +1,66 @@
// Marco Ivaldi <[email protected]>

#include <stdio.h>
#include <syslog.h>

#define BUFSIZE 256

void build_string(char *string)
{
    char buf[BUFSIZE];

    // ruleid: raptor-format-string-bugs
    snprintf(buf, BUFSIZE, string);

    // ok: raptor-format-string-bugs
    snprintf(buf, BUFSIZE, "%s", string);
}

void print_stuff(char *string)
{
    char buf[BUFSIZE];

    // ruleid: raptor-format-string-bugs
    printf(string);

    // ok: raptor-format-string-bugs
    printf("%s\n", string);
}

void log_stuff(char *string)
{
    char buf[BUFSIZE];

    // ruleid: raptor-format-string-bugs
    syslog(LOG_ERR, string);

    // ok: raptor-format-string-bugs
    syslog(LOG_ERR, "%s", string);
}

void log_error(char *fmt, ...)
{
    char buf[BUFSIZE];
    va_list ap;

    va_start(ap, fmt);
    // ruleid: raptor-format-string-bugs
    vsnprintf(buf, sizeof(buf), fmt, ap);
    va_end(ap);
    // ruleid: raptor-format-string-bugs
    syslog(LOG_NOTICE, buf);
}

void printWrapper(char *string)
{
    // ruleid: raptor-format-string-bugs
    printf(string);
}

int main(int argc, char **argv)
{
    char buf[5012];
    memcpy(buf, argv[1], 5012);
    printWrapper(argv[1]);
    return 0;
}