Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Safely load metadata files. #19

Merged
merged 2 commits into from
Apr 19, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 4 additions & 1 deletion action.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,9 @@ inputs:
root:
description: "Root of the repository checkout to validate metadata within"
default: ${{ github.workspace }}
check_spdx:
description: "Check for two-line SDPX header conformance"
default: "False"
runs:
using: "composite"
steps:
Expand All @@ -15,5 +18,5 @@ runs:
run: pip install -e "${{ github.action_path }}"
shell: bash
- name: Run validator
run: python3 -m "aws_doc_sdk_examples_tools.validate" --root "${{ inputs.root }}"
run: python3 -m "aws_doc_sdk_examples_tools.validate" --root "${{ inputs.root }}" --check-spdx "${{ inputs.check_spdx }}"
shell: bash
61 changes: 37 additions & 24 deletions aws_doc_sdk_examples_tools/doc_gen.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@

from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict, Iterable, List, Optional, Set
from typing import Dict, Iterable, Optional, Set

# from os import glob

Expand Down Expand Up @@ -123,34 +123,47 @@ def clone(self) -> "DocGen":

def for_root(self, root: Path, config: Optional[Path] = None) -> "DocGen":
self.root = root
metadata = root / ".doc_gen/metadata"

if config is None:
config = Path(__file__).parent / "config"

with open(root / ".doc_gen" / "validation.yaml") as file:
validation = yaml.safe_load(file)
self.validation.allow_list.update(validation.get("allow_list", []))
self.validation.sample_files.update(validation.get("sample_files", []))

with open(config / "sdks.yaml", encoding="utf-8") as file:
meta = yaml.safe_load(file)
sdks, errs = parse_sdks("sdks.yaml", meta)
self.errors.extend(errs)

with open(config / "services.yaml", encoding="utf-8") as file:
meta = yaml.safe_load(file)
services, service_errors = parse_services("services.yaml", meta)
self.errors.extend(service_errors)

cross = set(
[path.name for path in (metadata.parent / "cross-content").glob("*.xml")]
)
try:
with open(root / ".doc_gen" / "validation.yaml", encoding="utf-8") as file:
validation = yaml.safe_load(file)
validation = {} if validation is None else validation
self.validation.allow_list.update(validation.get("allow_list", []))
self.validation.sample_files.update(validation.get("sample_files", []))
except Exception:
pass

try:
with open(config / "sdks.yaml", encoding="utf-8") as file:
meta = yaml.safe_load(file)
sdks, errs = parse_sdks("sdks.yaml", meta)
self.sdks = sdks
self.errors.extend(errs)
except Exception:
pass

try:
with open(config / "services.yaml", encoding="utf-8") as file:
meta = yaml.safe_load(file)
services, service_errors = parse_services("services.yaml", meta)
self.services = services
self.errors.extend(service_errors)
except Exception:
pass
Comment on lines +139 to +155
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If these silently fail then you'll have empty sdks and services, which will blow up later. Better to error out here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the default state, and there are clear error messages then. In fact, most tributaries are not expected to have sdks or services, so they need to be empty and need to inherit from the parent doc_gen.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If these files don't exist at all, even in the root repo, the first place I see an error happening is in metadata.parse_services, where it will raise an UnknownService exception. This seems a lot harder to track down than "no services defined" or something in this doc_gen function. I see that you don't want to error out on a missing file, but I think you'll want to check that at least some sdks and services exist before you start the parse_examples loop.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not an error here until the consumer of for_root says it is. doc_gen.py doesn't care if there are services or sdks or validation. Only consumers do.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But it is an error here if there are no services loaded at all. In this function it eventually calls parse_examples, which ultimately calls metadata.parse_services, which will fail with UnknownService. I think this is a bad experience and easily overcome by checking that sdks and services are not None or empty. I see that this is unlikely in that the main root would have to omit one of these files, so arguably "it will never happen", although saying that means it is now guaranteed to happen. If we just need to get this in today that's fine, I don't want to block on this.


self.root = root
self.sdks = sdks
self.services = services
self.cross_blocks = cross
metadata = root / ".doc_gen/metadata"
try:
self.cross_blocks = set(
[
path.name
for path in (metadata.parent / "cross-content").glob("*.xml")
]
)
except Exception:
pass

for path in metadata.glob("*_metadata.yaml"):
with open(path) as file:
Expand Down
9 changes: 7 additions & 2 deletions aws_doc_sdk_examples_tools/project_validator.py
Original file line number Diff line number Diff line change
Expand Up @@ -68,8 +68,13 @@ def check_files(
file_count += 1
logger.info("\nChecking File: %s", file_path)

with open(file_path, encoding="utf-8-sig") as f:
file_contents = f.read()
try:
with open(file_path, encoding="utf-8-sig") as f:
file_contents = f.read()
except Exception as e:
file_contents = ""
print(f"Could not verify {file_path}: {e}")
errors.append(MetadataError(file=str(file_path)))

verify_no_deny_list_words(file_contents, file_path, errors)
verify_no_secret_keys(file_contents, file_path, validation, errors)
Expand Down
4 changes: 3 additions & 1 deletion aws_doc_sdk_examples_tools/validate.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@
# SPDX-License-Identifier: Apache-2.0

import argparse
from ast import literal_eval
from pathlib import Path
from sys import exit

Expand All @@ -23,7 +24,8 @@ def main():
# )
parser.add_argument(
"--check-spdx",
default=True,
type=literal_eval,
default=False,
help="Verify all files start with SPDX header",
required=False,
)
Expand Down