update provider configs
Signed-off-by: Pranav Gaikwad <[email protected]>
pranavgaikwad committed Jan 23, 2024
1 parent cf28a39 commit 5dee8dd
Showing 1 changed file with 33 additions and 17 deletions: enhancements/analyzer-rules-testing/README.md
This enhancement proposes a new test runner for YAML rules. It will allow rule authors to write tests for rules declaratively and run them with a CLI.

The biggest drawback with testing rules right now is that the output must be verified by the rule author manually. They have to go through all incidents / violations and verify that they see the desired result. Once they do verify the output, the only way to share this expected output is to share the entire _output.yaml_ file generated post-analysis. Comparing the expected output with the actual results is done by computing a _diff_ between the expected and actual output files. While this works well in end-to-end CI tests that run on a single rules file, there are limitations. First, it's very hard to understand what exactly is failing by only looking at diff output. And second, you have to generate that very first "expected output" file by running the analyzer. You cannot expect someone to write that file by hand.

Assume that you're a rule author creating rules for a certain technology. You write your first rule, run it against a sample application, and manually verify in the `output.yaml` file that your rule matched. You count the number of incidents and check that the produced messages are correct. You check for any tags created, if applicable. All good! You move on to writing your second rule, run the rules file again, and get another output file. This time, in addition to verifying your new rule, you make sure the first rule still produces the same output as before. In every iteration, you manually verify every rule in the ruleset. Ideally, we want the rule author to write a test first and then iteratively work towards perfecting their rule until it produces the desired output. That requires that they can write the test by hand and, on top of that, run a command to verify the results.

In addition to the UX concern above, there is another important concern. As analyzer-lsp adds more and more providers, contribution to rulesets will increase as a result. For the long-term health of the project, it is absolutely essential that any new rules added to rulesets have associated tests and sample data. An arduous manual testing process will only discourage rule authors from submitting tests.

Lastly, the project has been mainly relying on rules from the Windup project which…
Rule authors should be able to "declare" a test for a rule in a YAML file:

```yaml
providers:
  - name: java
    location: ./java-sample-data/
  - name: go
    location: ./go-sample-data/
tests:
  - ruleID: same-rule-id-as-the-rule-to-be-tested
    data: /path/to/sample/data/
    testCases:
      - description: In source-only mode, should match 10 times
        analysisParams:
        hasTags:
          - Category=tag1
```
* providers: List of configs needed to set up providers. The runner will take a settings file at runtime and mutate it with the values defined here for every run. Providers can also be defined at the ruleset level; values defined here take precedence:
  * name: Name of the provider to match in _provider_settings.json_
  * location: Path to sample data for this provider
* tests: List of tests, each testing a distinct rule
  * ruleID: ID of the rule this test applies to.
  * data: Optional path to a directory containing sample data for the test. By default, the sample data location declared at the ruleset level will be used (see next section for details).
  * testCases: A list of test cases for this test, each containing the following fields:
    * description: This is what will be printed as text next to the PASS / FAIL result
    * analysisParams: Additional _optional_ parameters to configure analysis when running this test. By default, these won't be set.
Notice that a test can contain more than one entry in _testCases_, each with different _analysisParams_. This is useful when you want to test the same rule under different analysis parameters.

Note that the above example contains a test for one rule. The YAML file will contain a list of such tests, each item being a test for a distinct rule.
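
For a more complete picture, here is a sketch of a single test with two test cases. The `hasTags`, `hasIncidents`, `messageContains`, and analysis-parameter concepts come from this document, but the exact key names under `analysisParams` and `hasIncidents` (`mode`, `count`) and the message text are assumptions:

```yaml
tests:
  - ruleID: storage-00001               # illustrative rule ID
    testCases:
      - description: In source-only mode, should match 10 times
        analysisParams:
          mode: source-only             # assumed key name
        hasIncidents:
          count: 10                     # assumed key name for the incident count
          messageContains: "use a PersistentVolume instead"   # illustrative message
      - description: Should tag the application
        hasTags:
          - Category=tag1
```
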
A testcase passes when all of the following conditions are met:
- either one of _hasTags_ or _hasIncidents_ is defined
- _hasTags_ is not defined OR all given tags are present in the output.
- _messageContains_ is not defined OR (it is defined AND message is found in all incidents)
- _codeSnipContains_ is not defined OR (it is defined AND code snip is found in all incidents)
> Having a schema for the tests would be nice. It will allow users to auto-generate the boilerplate.
#### Organizing test files in a ruleset
The YAML file we described in the previous section will be saved in the same directory as its rules file. The name of the test file will be the same as the rules file with a `_test` suffix appended. For instance, tests for a rules file `rules.yaml` will be stored in `rules_test.yaml` in the same directory. The analyzer-lsp engine will ignore any files ending in `_test.yaml` when parsing rules.
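
For illustration, a ruleset following this convention might be laid out like this (file names other than `ruleset.yaml` are made up; the `*_test.yaml` files are ignored by the engine during rule parsing):

```sh
my-ruleset/
├── ruleset.yaml
├── storage_rules.yaml
├── storage_rules_test.yaml
├── networking_rules.yaml
└── networking_rules_test.yaml
```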

Providers can also be defined at the ruleset level when users want to re-use the same provider configuration for all tests in a ruleset. A new field `testConfig` will be introduced in the `ruleset.yaml` file:

```yaml
name: my-ruleset
testConfig:
  providers:
    - name: java
      location: /path/to/java/
```
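
Because per-file provider configs take precedence over the ruleset-level ones, an individual test file can still point a provider at different sample data. A minimal sketch (rule ID and paths are illustrative):

```yaml
# rules_test.yaml: overrides the ruleset-level java location for this file only
providers:
  - name: java
    location: ./special-java-sample/
tests:
  - ruleID: storage-00001
    testCases:
      - description: matches against the special sample application
        hasTags:
          - Category=tag1
```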


#### Running the tests

We are proposing a new CLI that takes the tests as input.
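
As a sketch only (the binary name and the positional ruleset argument are assumptions; `--provider-settings` is the flag referenced later in this document), running the tests for a ruleset could look like:

```sh
# Hypothetical invocation of the proposed test runner; flag and binary names
# other than --provider-settings are illustrative.
analyzer-rules-test ./my-ruleset/ --provider-settings ./provider_settings.json
```
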
This output will be later used by the CI. When a new PR is opened in _konveyor/rulesets_, the CI will run the tests.
The Go structs are defined below:

```go
type TestConfig struct {
	Providers []ProviderConfig `yaml:"providers"`
	Tests     []RuleTest       `yaml:"tests"`
}

type ProviderConfig struct {
	Name     string `yaml:"name,omitempty"`
	Location string `yaml:"location,omitempty"`
}

type RuleTest struct {
	RuleID string `yaml:"ruleID"`
	Data   string `yaml:"data,omitempty"`
	// ...
}
```
Expand Down Expand Up @@ -226,9 +242,11 @@ Here are options for the test runner CLI:

#### Running Tests

For each test file, the runner will set up providers by merging two sources: the provider settings passed at runtime via `--provider-settings` and the locations specified in the test file itself for each provider. Tests will run one at a time for now; this can be made concurrent in the future.
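
A minimal sketch of that merge, reusing the `ProviderConfig` struct proposed above; the helper and the placeholder values are illustrative, not part of this enhancement:

```go
package main

import "fmt"

// ProviderConfig mirrors the struct proposed earlier in this document.
type ProviderConfig struct {
	Name     string `yaml:"name,omitempty"`
	Location string `yaml:"location,omitempty"`
}

// mergeLocations is a hypothetical helper: for every provider found in the
// runtime settings, prefer the location declared in the test file, if any.
func mergeLocations(runtimeSettings, testOverrides []ProviderConfig) []ProviderConfig {
	overrideByName := map[string]string{}
	for _, o := range testOverrides {
		overrideByName[o.Name] = o.Location
	}
	merged := make([]ProviderConfig, 0, len(runtimeSettings))
	for _, s := range runtimeSettings {
		if loc, ok := overrideByName[s.Name]; ok && loc != "" {
			s.Location = loc
		}
		merged = append(merged, s)
	}
	return merged
}

func main() {
	fromSettingsFile := []ProviderConfig{{Name: "java", Location: "/placeholder"}}
	fromTestFile := []ProviderConfig{{Name: "java", Location: "./java-sample-data/"}}
	fmt.Printf("%+v\n", mergeLocations(fromSettingsFile, fromTestFile))
}
```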

Depending on the analysis params used, there will be different combinations of analyses. Overlapping analysis runs can be grouped together to make things faster. Grouping might affect the results if some rules influence each other, but there is arguably value in surfacing such interactions early on.

The groups of tests will look something like this:

```sh
.
└── networking_rules_test.yaml
```

For this system to work correctly, it's necessary that every test specifies the correct _ruleID_ so that the runner can pick the correct rules.

The test output and logs will be generated in a temp directory per analysis run.

