Fix missing text sections and update images #1833

Draft
wants to merge 15 commits into
base: main
from
2 changes: 1 addition & 1 deletion docs/deployment/add-semgrep-to-ci.md
Expand Up @@ -53,7 +53,7 @@ If your provider is **not** on this list, you can still integrate Semgrep into y

## Projects

Adding a Semgrep job to your CI provider also adds the repository's records, including findings, as a **project** in Semgrep AppSec Platform. Each Project can be individually configured to send notifications or tickets.
Adding a Semgrep job to your CI provider creates a project with that repository's name. Projects in Semgrep encapsulate all findings and settings for that repository. Each Project can be individually configured to send notifications or tickets.

![Semgrep Projects page](/img/projects-page.png)
_**Figure.** Semgrep **Projects** page. This displays all the repositories you have successfully added a Semgrep job to._
5 changes: 2 additions & 3 deletions docs/extensions/overview.md
Expand Up @@ -7,16 +7,15 @@ description: >-


import Login from "/src/components/procedure/_login-activate.mdx"
import ExtensionsIdes from "/src/components/reference/_extensions-ides.md"

# Extensions

Several third-party tools include Semgrep extensions.

### Official IDE extensions

- Microsoft Visual Studio Code: [`semgrep-vscode`](https://marketplace.visualstudio.com/items?itemName=semgrep.semgrep)
- IntelliJ Ultimate Idea (and most other IntelliJ products) [`semgrep-intellij`](https://plugins.jetbrains.com/plugin/22622-semgrep)
- Emacs: [`lsp-mode`](https://github.com/emacs-lsp/lsp-mode)
<ExtensionsIdes />

### The LSP Command

17 changes: 12 additions & 5 deletions docs/prerequisites.md
Expand Up @@ -8,6 +8,8 @@ tags:
- Deployment
---

import ExtensionsIdes from "/src/components/reference/_extensions-ides.md"

# Prerequisites

This document details the required software or services to run Semgrep products.
Expand All @@ -18,8 +20,8 @@ A programming language must be supported by Semgrep for your chosen product.

| Product | Scan type | Link |
| ------- | ------ | ------ |
| Semgrep OSS | SAST | [Supported languages](/supported-languages#language-maturity-levels) |
| Semgrep Code | SAST | [Supported languages](/supported-languages#language-maturity-levels) |
| Semgrep OSS | SAST | [Supported languages](/supported-languages#semgrep-code-and-oss) |
| Semgrep Code | SAST | [Supported languages](/supported-languages#semgrep-code-and-oss) |
| Semgrep Supply Chain | SCA | [Supported languages](/supported-languages#semgrep-supply-chain) |
| Semgrep Secrets | Secrets | Language-agnostic |

Expand All @@ -41,14 +43,19 @@ These requirements apply to both Semgrep Pro and Semgrep OSS.

## Semgrep AppSec Platform

These requirements apply to Semgrep Pro.
These requirements apply to Semgrep AppSec Platform.

- A GitHub or GitLab cloud account. The credentials are used to authenticate and identify you.
- A Git repository to scan, stored in any of the following source code managers:
- GitHub
- GitLab
- Bitbucket
- Azure DevOps
- A CI provider and sufficient permissions to create CI jobs.
- A CI provider and sufficient permissions to create CI jobs. Alternatively, you can grant Semgrep Code access.

## IDE extensions

Semgrep provides an official extension for the following IDEs:

<ExtensionsIdes />

<!-- IDEs - to add after -->
4 changes: 2 additions & 2 deletions docs/semgrep-assistant/getting-started.md
Expand Up @@ -111,8 +111,8 @@ If [auto-triage](/semgrep-assistant/overview/#auto-triage), which allows you to
![MR comment from Semgrep Assistant in GitLab](/img/assistant-gl-comment.png#md-width)
*Figure*. MR comment from Semgrep Assistant in GitLab.

#### Missing PR and comments
Semgrep Assistant messages only appear in your PR comments for rules that are set to Comment or Block mode on the Rule Management page. Ensure that:
#### Missing PR comments
Semgrep Assistant messages only appear in PR comments for rules that are set to **Comment** or **Block** mode in the Policies page. Ensure that:

* You have set rules to Comment or Block mode.
![ Policies modes](/img/semgrep-assistant-comment.png#md-width)
2 changes: 1 addition & 1 deletion docs/semgrep-code/editor.md
Expand Up @@ -261,7 +261,7 @@ All private rules for an organization are saved to the organization's folder. To

<ForkExistingRule />

### Contribute to the open-source Semgrep Registry
### Contribute to the open source Semgrep Registry

:::info
For general contributing guidelines, see [Contributing rules](/contributing/contributing-to-semgrep-rules-repository).
105 changes: 93 additions & 12 deletions docs/semgrep-code/remove-duplicates.md
Expand Up @@ -3,29 +3,110 @@ slug: remove-duplicates
append_help_link: true
title: Remove duplicate findings
hide_title: true
description: Learn how to remove duplicate findings and prevent them from displayed in Semgrep AppSec Platform.
description: Learn how to remove duplicate findings and prevent them from being displayed in Semgrep AppSec Platform.
tags:
- Semgrep Code
- Semgrep AppSec Platform
- Semgrep Code
- Semgrep AppSec Platform
---

# Remove duplicate findings

Semgrep scans are performed on both mainline (trunk) and non-mainline branches. The scope of the scan can differ depending on if Semgrep is called on a mainline or non-mainline branch.
Semgrep scans are performed on both primary (default, mainline, or trunk) and non-primary branches. The scope of the scan differs depending on which type of branch Semgrep is scanning.

<dl>
<dt>Full scan</dt>
<dd>Scans the repository in its entirety. It is recommended to perform full scans on mainline branches, such as <code>master</code> or <code>main</code>. This scan is performed on a scheduled basis.</dd>
<dt>Diff-aware scan</dt>
<dd>Diff-aware scans are performed on non-mainline branches, such as in pull requests and merge requests. Diff-aware scans traverse the repository's files based on the commit where the branch diverged from the mainline branch (or diverged from the last commit that was fully scanned)</dd>
<dt>Full scan</dt>
<dd>Scans the repository in its entirety. It is recommended to perform full scans on mainline branches, such as <code>master</code> or <code>main</code>. Full scans are typically performed on a scheduled basis or on merge to a default branch.</dd>
<dt>Diff-aware scan</dt>
<dd>Diff-aware scans are performed on non-mainline branches, such as in pull requests and merge requests. Diff-aware scans traverse the repository's files based on the commit where the branch diverged from the mainline branch.</dd>
</dl>

## Remove duplicate findings using Semgrep AppSec Platform
## How Semgrep distinguishes between new and duplicate findings

Regardless of the scope of a scan, Semgrep correlates findings across branches based on their unique fingerprint, automatically deduplicating findings and making it simpler to triage.
Semgrep generates a finding whenever it scans a repository and one of its rules matches a piece of code. Since Semgrep usually scans a repository multiple times, it needs a way to track the same finding in a file over time. Semgrep does this using two types of fingerprints: `match_based_id` and `syntactic_id`.

If a finding is fixed in one branch (such as `main`) but open in another (such as `production`), and the code fixes are present in both branches, initiate a scan through your CI job or SCM tool on the branch(es) with open findings to have Semgrep mark the findings as fixed.
:::info
The calculations used to determine whether findings are new are subject to change at any time as Semgrep improves its deduplication logic.
:::

### `match_based_id`

Using the `match_based_id`, Semgrep can determine if a given finding in a file is the same as a finding identified during a different scan, even if the code snippet that the rule matched had been moved to a different location in the file. This allows Semgrep to avoid generating a new finding and to deduplicate its records accordingly, even across multiple branches associated with the project. It also means that Semgrep can cross-correlate findings, so a finding that has been triaged in one branch will be flagged as triaged if it's identified in another branch.

Semgrep generates the `match_based_id` for a finding using the following information:

- The file path
- The name of the rule that generated the finding
- The rule pattern with the metavariables' values substituted in

This information is combined and then hashed. At this point, Semgrep appends the **index**, a value generated by determining the number of times the rule involved matched code in the file. Note that the index is appended to the hash, not combined with the other finding information before hashing. This is done to preserve information on how findings are related. For example, `finding0` with `match_based_id = 123_0` and `finding1` with `match_based_id = 123_1` indicate that both were generated from the same rule matching the same code pattern in the same file.

Because Semgrep uses the rule pattern instead of the literal code syntax to generate `match_based_id`, code changes that don't affect the matched pattern don't hinder Semgrep's ability to recognize that a finding is the same as an existing finding.

![Semgrep AppSec Platform groups together findings on different branches](/img/matched-findings.png)
_**Figure**. Semgrep AppSec Platform groups together the same finding identified as present on multiple branches._

For example, if the original file scanned is:

```python
a = 1
b = 2
spcd.get("foo")
c = 3
d = 4
sink("foo")
```

The rule pattern identified and used in generating the `match_based_id` is:

```python
spcd.get("foo")
...
sink("foo")
```

If the following change is made to the original file:

```python
a = 1
b = 2
spcd.get("foo")
c = 3
c_1 = 5
d = 4
sink("foo")
```

The rule pattern identified and used in generating the `match_based_id` does not change:

```python
spcd.get("foo")
...
sink("foo")
```

This means that the `match_based_id` itself doesn't change, allowing Semgrep to identify that the two findings are the same and to deduplicate them. Furthermore, this process enables Semgrep to ignore changes to lines that are not part of the matched pattern.
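
The fingerprinting steps described in this section can be sketched roughly as follows. This is a minimal illustration, not Semgrep's actual implementation: the function name, the SHA-256 hash, the newline separator, and the file and rule names are all assumptions made for the example.

```python
import hashlib


def match_based_id(file_path: str, rule_name: str, interpolated_pattern: str, index: int) -> str:
    """Hypothetical sketch: hash the path, rule name, and pattern, then append the index."""
    # Assumption: the three inputs are joined with newlines and hashed with SHA-256.
    payload = "\n".join([file_path, rule_name, interpolated_pattern])
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    # The index is appended after hashing, so related findings share a hash prefix.
    return f"{digest}_{index}"


# The rule pattern with metavariable values substituted in, as in the example above.
pattern = 'spcd.get("foo")\n...\nsink("foo")'

first = match_based_id("app/example.py", "example-rule", pattern, 0)
second = match_based_id("app/example.py", "example-rule", pattern, 1)
moved = match_based_id("lib/example.py", "example-rule", pattern, 0)

# Same file, rule, and pattern: only the appended index differs.
print(first.rsplit("_", 1)[0] == second.rsplit("_", 1)[0])
# A different file path produces a different hash.
print(first != moved)
```

Because the surrounding lines of the file are not an input, inserting `c_1 = 5` between the two matched statements leaves the ID unchanged in this sketch.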

### `syntactic_id`

Semgrep generates the `syntactic_id` for a finding using the following information:

- The file path
- The name of the rule that generated the finding
- The code syntax, or the literal piece of code that matched the rule
- The index, a value generated by determining the number of times the rule involved matched code in the file

This information is combined and then hashed for privacy before being stored.
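
For contrast with `match_based_id`, a rough sketch of this calculation is shown below. It is an illustration only: the function name, the SHA-256 hash, the newline separator, and the file and rule names are assumptions, not Semgrep's actual implementation.

```python
import hashlib


def syntactic_id(file_path: str, rule_name: str, matched_code: str, index: int) -> str:
    """Hypothetical sketch: hash the path, rule name, literal matched code, and index together."""
    # Assumption: the index is part of the hashed payload rather than appended afterwards.
    payload = "\n".join([file_path, rule_name, matched_code, str(index)])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


original = syntactic_id("app/example.py", "example-rule", 'sink("foo")', 0)
reformatted = syntactic_id("app/example.py", "example-rule", 'sink( "foo" )', 0)

# The literal code is an input, so even a whitespace-only edit changes the ID.
print(original != reformatted)
```

Only the hash is kept in this sketch, which mirrors the point above that the matched code itself does not need to be stored.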

:::info
The `syntactic_id` is primarily used by Semgrep for internal debugging purposes, since no code is stored except in cases where you have provided code access permissions to Semgrep.
:::

## Update findings by rescanning the project

Semgrep's correlation of findings across branches based on their unique fingerprint allows for automatic consolidation of findings and makes it simpler to triage findings.

If a finding is fixed in one branch (such as `main`) but open in another (such as `production`), and the code fixes are present in both branches, initiate scans through your CI job or SCM tool on the branches with open findings. Semgrep will reconcile the findings and mark them as fixed.

## Remove duplicate findings using Semgrep API

Semgrep API does not automatically deduplicate findings. If you are using Semgrep API to receive or pull findings data, set the `dedup` flag to `true` to deduplicate findings across refs or branches. Refer to [List all findings](https://semgrep.dev/api/v1/docs/#tag/Finding/operation/semgrep_app.saas.handlers.issue.openapi_list_recent_issues) in the Semgrep API docs for more information.
Semgrep API does not automatically group findings with the same match-based ID across branches. If you use Semgrep API to receive or pull findings data, set the `dedup` flag to `true` to deduplicate findings across refs or branches. Refer to [List all findings](https://semgrep.dev/api/v1/docs/#tag/Finding/operation/semgrep_app.saas.handlers.issue.openapi_list_recent_issues) in the Semgrep API docs for more information.
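
As a minimal sketch, the request URL with the `dedup` flag set might be built as shown below. The endpoint path and the deployment slug are assumptions for illustration; confirm the exact path and parameters in the Semgrep API docs before use.

```python
from urllib.parse import urlencode

# Base URL taken from the API docs link above.
API_BASE = "https://semgrep.dev/api/v1"


def findings_url(deployment_slug: str, dedup: bool = True) -> str:
    """Hypothetical helper: build a findings URL with the dedup flag set."""
    # Assumption: findings are listed under /deployments/<slug>/findings.
    query = urlencode({"dedup": str(dedup).lower()})
    return f"{API_BASE}/deployments/{deployment_slug}/findings?{query}"


# "my-deployment" is a placeholder slug.
print(findings_url("my-deployment"))
```

The sketch only builds the URL; sending the request also requires an API token, which is omitted here.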
4 changes: 2 additions & 2 deletions docs/semgrep-code/triage-remediation.md
Expand Up @@ -48,8 +48,8 @@ The following sections show you how to manage your findings by:

Note that some actions, such as ignoring and reopening findings, require different steps based on whether you have chosen **Group by Rule** or **No Grouping** when viewing your results on the **Findings** page.

![Screenshot of Semgrep AppSec Platform triage menu](/img/app-findings-triage.png#md-width)

![Screenshot of Semgrep AppSec Platform triage menu.](/img/app-findings-triage.png)
_**Figure**. Screenshot of Semgrep AppSec Platform triage menu._
### Fix a finding

To **fix a finding**, update or refactor the code such that the Semgrep rule pattern no longer matches the code.
3 changes: 3 additions & 0 deletions src/components/reference/_extensions-ides.md
@@ -0,0 +1,3 @@
- Microsoft Visual Studio Code: [**<i class="fas fa-external-link fa-xs"></i> semgrep-vscode**](https://marketplace.visualstudio.com/items?itemName=semgrep.semgrep)
- IntelliJ IDEA Ultimate (and most other IntelliJ products): [**<i class="fas fa-external-link fa-xs"></i> semgrep-intellij**](https://plugins.jetbrains.com/plugin/22622-semgrep)
- Emacs: [**<i class="fas fa-external-link fa-xs"></i> lsp-mode**](https://github.com/emacs-lsp/lsp-mode)
5 changes: 5 additions & 0 deletions src/components/reference/_pro-rules-language-coverage.mdx
@@ -1,11 +1,16 @@
<details>
<summary>Click to view languages with Pro rules coverage</summary>
- C and C++
- C#
- Go
- Java
- JavaScript
- JSX
- Kotlin
- PHP
- Python
- Ruby
- Rust
- Swift
- TypeScript
</details>
40 changes: 21 additions & 19 deletions src/components/reference/_supported-languages-table.mdx
@@ -1,3 +1,5 @@
<!-- ensure that you also update pro-rules-language-coverage.mdx and index.md
if you update this page -->
<div class="language-support-table">

<table>
Expand All @@ -8,7 +10,7 @@
</tr></thead>
<tbody>
<tr>
<td>C / C++</td>
<td>C and C++</td>
<td><strong>✅ Generally available</strong><br />
• Cross-file dataflow analysis<br />
• 150+ Pro rules </td>
Expand Up @@ -50,12 +52,24 @@
• Framework-specific control flow analysis<br />
• 70+ Pro rules</td>
</tr>
<tr>
<td>JSX</td>
<td><strong>✅ Generally available </strong><br />
• Cross-function dataflow analysis<br />
• 70+ Pro rules</td>
</tr>
<tr>
<td>Kotlin</td>
<td><strong>✅ Generally available </strong><br />
• Cross-file dataflow analysis<br />
• 60+ Pro rules</td>
</tr>
<tr>
<td>PHP</td>
<td><strong>✅ Generally available </strong><br />
• Cross-function dataflow analysis<br />
• 20+ Pro rules</td>
</tr>
<tr>
<td>[Python](/docs/semgrep-code/supported-languages-python)</td>
<td><strong>✅ Generally available</strong><br />
Expand All @@ -64,13 +78,6 @@
• 300+ Pro rules<br />
• See [Python-specific support details](/docs/semgrep-code/supported-languages-python)</td>
</tr>
<tr>
<td>Typescript</td>
<td><strong>✅ Generally available </strong><br />
• Cross-file dataflow analysis<br />
• Framework-specific control flow analysis<br />
• 70+ Pro rules</td>
</tr>
<tr>
<td>Ruby</td>
<td><strong>✅ Generally available </strong><br />
Expand All @@ -84,29 +91,24 @@
• 40+ Pro rules</td>
</tr>
<tr>
<td>JSX</td>
<td>Swift</td>
<td><strong>✅ Generally available </strong><br />
• Cross-function dataflow analysis<br />
• 70+ Pro rules</td>
• 50+ Pro rules</td>
</tr>
<tr>
<td>PHP</td>
<td>TypeScript</td>
<td><strong>✅ Generally available </strong><br />
• Cross-function dataflow analysis<br />
• 20+ Pro rules</td>
• Cross-file dataflow analysis<br />
• Framework-specific control flow analysis<br />
• 70+ Pro rules</td>
</tr>
<tr>
<td>Scala</td>
<td><strong>✅ Generally available </strong><br />
• Cross-function dataflow analysis<br />
• Community rules</td>
</tr>
<tr>
<td>Swift</td>
<td><strong>✅ Generally available </strong><br />
• Cross-function dataflow analysis<br />
• 50+ Pro rules</td>
</tr>
<tr>
<td>Terraform</td>
<td><strong>✅ Generally available</strong><br />
Binary file added static/img/matched-findings.png