Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: add taint source on plugin-added taints #10206

Merged
merged 29 commits into from
Feb 15, 2025

Conversation

Patrick-Remy
Copy link
Contributor

@Patrick-Remy Patrick-Remy commented Sep 15, 2023

Fixes #10057

Problem

Taints not found in psalm v5 using $codebase->addTaintSource() in plugin

With the change of immutability (d0be59e#diff-122c0df94b9a0557e4fd09b8d8c7324f4d6d8fff34f344e1e49318c8e7a0a242), addTaintSource() doesn't manipulate the $expr_type anymore and instead returns the adjusted $expr_type. But this breaks custom-added taint sources via plugin as shown in the docs currently. From plugin perspective you cannot anymore adjust the expression type inside the plugin.

See also my repro here: https://github.com/Patrick-Remy/repro-psalm-5-taint-analysis-issue/blob/master/README.md

<?php

namespace Some\Ns;

use PhpParser;
use Psalm\CodeLocation;
use Psalm\Context;
use Psalm\FileManipulation;
use Psalm\Plugin\EventHandler\AfterExpressionAnalysisInterface;
use Psalm\Plugin\EventHandler\Event\AfterExpressionAnalysisEvent;
use Psalm\Type\TaintKindGroup;

class BadSqlTainter implements AfterExpressionAnalysisInterface
{
    /**
     * Called after an expression has been checked
     *
     * @param  PhpParser\Node\Expr  $expr
     * @param  Context              $context
     * @param  FileManipulation[]   $file_replacements
     *
     * @return void
     */
    public static function afterExpressionAnalysis(AfterExpressionAnalysisEvent $event): ?bool {
        $expr = $event->getExpr();
        $statements_source = $event->getStatementsSource();
        $codebase = $event->getCodebase();
        if ($expr instanceof PhpParser\Node\Expr\Variable
            && $expr->name === 'bad_data'
        ) {
            $expr_type = $statements_source->getNodeTypeProvider()->getType($expr);

            // should be a globally unique id
            // you can use its line number/start offset
            $expr_identifier = '$bad_data'
                . '-' . $statements_source->getFileName()
                . ':' . $expr->getAttribute('startFilePos');

            if ($expr_type) {
                $codebase->addTaintSource(
                    $expr_type,
                    $expr_identifier,
                    TaintKindGroup::ALL_INPUT,
                    new CodeLocation($statements_source, $expr)
                );

                // Custom addition here to check if variable is really tainted
                echo "\n\n\nTAINTED\n\n\n";
            }
        }

        return null;
    }

This is probably an intended behaviour, but there needs to be another way for adding taints by plugin.

Taints not found in psalm v4 + v5 using AddTaintsInterface

I found the undocumented plugin interface AddTaintsInterface that makes it possible to return an array of taints that should be added to analyzed expression.
Unfortunately this also lead to no findings in both psalm v4 and psalm v5.

<?php

namespace Some\Ns;

use PhpParser;
use Psalm\Plugin\EventHandler\AddTaintsInterface;
use Psalm\Plugin\EventHandler\Event\AddRemoveTaintsEvent;
use Psalm\Type\TaintKindGroup;

class BadSqlTainter implements AddTaintsInterface
{

    /**
     * Called to see what taints should be added
     *
     * @return list<string>
     */
    public static function addTaints(AddRemoveTaintsEvent $event): array
    {
        $expr = $event->getExpr();
        if ($expr instanceof PhpParser\Node\Expr\Variable
            && $expr->name === 'bad_data'
        ) {
            echo "\n\n\nTAINTED\n\n\n";
            return TaintKindGroup::ALL_INPUT;
        }

        return [];
    }
}

Analysis

After deep-diving into the taint analysis, I found the reason why addTaints only sometimes could have an effect.
Other than $codebase->addSource() the callees most time do not add TaintSource via data_flow_graph->addSource(). Seems like only those Analyzers that analyze types that can be annotated with @psalm-taint-source like MethodCallReturnTypeFetcher already added those.

Implemented solution

  1. As anywhere where AddRemoveTaintsEvent is dispatched, a plugin could add new taints (also at Taint sinks, which seemed a bit strange to me), a new TaintSource has to be added, so that the already added data flow paths get connected. At some of those places where AddRemoveTaintsEvent was implemented, it was intended to only remove events, instead of also allowing it to add new taints
  2. At some places AddRemoveTaintsEvent wasn't called before, so it wasn't triggered to analyze some expressions and therefor it was impossible to add taints in some situations
  3. I also added a new 5.x+ compatible plugin example and test for addTaints and updated the documentaiton.
  4. Independently of AddRemoveTaintsEvent method-return taints weren't correctly inherited by return statements. Multiple return statements now merge and store the taints in the function-storage. When the return type of a method call gets analyzed, a taint source now gets created by the function storage.

@Patrick-Remy Patrick-Remy force-pushed the fix/custom-taint-sources-not-detected branch 2 times, most recently from b7910ca to 01a9575 Compare September 19, 2023 18:36
@Patrick-Remy
Copy link
Contributor Author

I have just refactored my implementation and more important added a new test case that covers the issue.

As now all checks are passing (ignoring the test-with-real-projects), could you please review it, so that this fix can be integrated into 5.x? As unfortunately this currently blocks our upgrade to psalm v5.

@orklah
Copy link
Collaborator

orklah commented Sep 28, 2023

Hi, thanks for this work and sorry for not answering earlier.

Does that mean that any existing plugin that may have tried to add taints or sink was broken?

I'm trying to establish the consequences for other existing plugins that may exist to see if this is a BC break or not

@Patrick-Remy
Copy link
Contributor Author

As far as I know, it may probably still work when adding taints via either addTaintSource() or AddTaintsInterface if there is already some taint source at a node (due to psalm) and you are adding another to the exact same node. But I haven't tested that so far. But as there weren't any tests related to plugin-added taints since now, it could also be completely broken.

There should probably be a full suite of tests related to plugin-added taints. If you wish I can add some more.

@orklah
Copy link
Collaborator

orklah commented Sep 29, 2023

More tests are always good :)

Should we deprecated addTaintSource for removal in Psalm 6?

@Patrick-Remy Patrick-Remy force-pushed the fix/custom-taint-sources-not-detected branch from 4640aa6 to bb10f13 Compare January 23, 2024 21:14
@Patrick-Remy
Copy link
Contributor Author

I was adding some more tests, and fastly failed with some odd behaviour where I am not sure, if it is due to my changes or if it was already previously.

Using the TaintBadDataPlugin

    public static function addTaints(AddRemoveTaintsEvent $event): array
    {
        $expr = $event->getExpr();

        if ($expr instanceof Variable && $expr->name === 'bad_data') {
            return TaintKindGroup::ALL_INPUT;
        }

        return [];
    }

The following taint is not recognized

<?php // --taint-analysis

$foo = $bad_data;
echo $foo;

whereas this is still recognized:

<?php // --taint-analysis

echo $bad_data;

Anyway I wanted to fix it, but I am still stucking on how inside the AssignmentAnalyzer. Any hints to me?

@Patrick-Remy
Copy link
Contributor Author

Any ideas @orklah ?

@Patrick-Remy Patrick-Remy force-pushed the fix/custom-taint-sources-not-detected branch 2 times, most recently from bb8a5d4 to bd424dc Compare July 1, 2024 16:12
@Patrick-Remy Patrick-Remy force-pushed the fix/custom-taint-sources-not-detected branch from bd424dc to ba334db Compare December 31, 2024 18:10
@Patrick-Remy Patrick-Remy force-pushed the fix/custom-taint-sources-not-detected branch from 3ea4dd3 to ab8ad62 Compare February 9, 2025 15:03
Change wasn't necessary, instead a taint source can be created if taints get
added during assignment.
Trigger AddRemoveTaintEvent and check if it results in added taints. Also allow
now removing taints from superglobals.
@Patrick-Remy Patrick-Remy force-pushed the fix/custom-taint-sources-not-detected branch from ab8ad62 to 44e6aa8 Compare February 9, 2025 15:16
@Patrick-Remy Patrick-Remy changed the base branch from 5.x to 6.x February 9, 2025 15:18
@Patrick-Remy Patrick-Remy force-pushed the fix/custom-taint-sources-not-detected branch from 71d9de1 to 1d6e525 Compare February 9, 2025 16:26
@Patrick-Remy
Copy link
Contributor Author

It's finally done! After a lot of work and deeply digging into the taint system of psalm, I was able to fix many taint bugs since 5.x and added a bunch of tests, that were previously failing. I updated the initial comment for reference.

As @danog is extremely active currently, I would kindly ask to review this and hoping for a fast release, as most (adding) taint plugins are broken since 1 1/2 year right now!

@Patrick-Remy
Copy link
Contributor Author

Someone? Maybe @orklah?

@orklah orklah added the release:fix The PR will be included in 'Fixes' section of the release notes label Feb 15, 2025
@orklah
Copy link
Collaborator

orklah commented Feb 15, 2025

Thanks!

Sorry for not answering earlier and congrats on this massive work!

@orklah orklah merged commit 20a21f9 into vimeo:6.x Feb 15, 2025
51 of 52 checks passed
@Patrick-Remy
Copy link
Contributor Author

Thank you so much! 😊

public static function fromNode(DataFlowNode $node): self
{
return new self(
$node->id,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, looking through the taint system now, shouldn't this be $node->unspecialized_id ?? $node->id to avoid re-specialization of already specalized nodes?

Copy link
Contributor Author

@Patrick-Remy Patrick-Remy Feb 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly, I have no idea... maybe the impact can checked when changing and checking if all tests still pass - at least if there are any specialization taint tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general my feedback to the taint system is, that it's really hard to overview and internal documentation is extremely short. In sum there are unfortunately only a few comments or docblocks in the codebase with explanations which makes it very hard to get into all the Analyzer classes and their (taint) handling.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release:fix The PR will be included in 'Fixes' section of the release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Adding custom taint sources via plugin ignored
3 participants