SCM Inventory - GitHub Inventory IaC on AWS
yilmi committed Jun 27, 2024
1 parent 1ae0e48 commit 07bc544
Showing 15 changed files with 600 additions and 0 deletions.
146 changes: 146 additions & 0 deletions infrastructure/inventory/aws/scm-inventory/README.md
@@ -0,0 +1,146 @@
# SCM Inventory Module

The SCM Inventory module automates the deployment of the resources needed to scan source code management (SCM) platforms and pull an inventory from them. It currently supports pulling the repositories of GitHub organizations, along with their issues and pull requests, to generate and maintain an inventory.

By default, the inventory also includes the top 5 languages and the top 5 topics of each repository. This information can be customized to include additional data.

This Terraform module provisions an AWS EC2 instance, configures it with the necessary permissions, and sets up a workflow that fetches GitHub inventory data and pushes it to an S3 bucket. The module is designed to be flexible and can be customized to support additional SCM platforms and data sources.

## Supported SCM

- GitHub: For more information, see the Python module [github_inventory](scripts/inventory/github_inventory/README.md) stored in this repository.

## Prerequisites
- AWS CLI configured with appropriate credentials
- Access to an AWS account with permissions to create EC2 instances, IAM roles, policies, and S3 buckets
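- A GitHub token with permissions to access the repositories and organizations you wish to scan
- Terraform 1.7 or later
- [Poetry](https://python-poetry.org/) installed on the machine running Terraform, since the module runs `poetry build` locally to package the inventory script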

## Usage

**Configure AWS Credentials**

Ensure your AWS CLI is configured with credentials that have the necessary permissions to create the resources defined in this module.

**Prepare GitHub Token**

Store your GitHub token in AWS Secrets Manager as a plain-text secret. Note the name of the secret, as the module looks the secret up by name through the `github_token_secret_name` variable.
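
If you manage the secret with Terraform as well, a minimal sketch of a separate configuration could look like the following. The secret name `github-inventory-token` and the `github_token` variable are hypothetical and not part of this module; note that the token value will be stored in that configuration's Terraform state.

```hcl
# Hypothetical, separate Terraform configuration (not part of this module).
variable "github_token" {
  description = "GitHub token used by the inventory script"
  type        = string
  sensitive   = true
}

resource "aws_secretsmanager_secret" "github_token" {
  name = "github-inventory-token" # hypothetical secret name
}

# Stores the token as a plain-text SecretString, which is how the
# EC2 user data in main.tf reads it.
resource "aws_secretsmanager_secret_version" "github_token" {
  secret_id     = aws_secretsmanager_secret.github_token.id
  secret_string = var.github_token
}
```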

**Set Terraform Variables**

Customize the Terraform variables defined in the `variables.tf` file or provide a `terraform.tfvars` file with your specific values.

We recommend setting the variables in a `terraform.tfvars` file based on the provided [terraform.tfvars.example](infrastructure/inventory/aws/scm-inventory/deployment.tfvars.example) file.

Key variables include:
- `aws_profile`: The AWS profile to use for authentication.
- `aws_region`: The AWS region where resources will be deployed.
- `s3_bucket_name`: The name of the S3 bucket where the inventory will be stored. This bucket must be created beforehand.
- `github_token_secret_name`: The name of the AWS Secrets Manager secret containing your GitHub token. The secret must be provisioned separately.
- `project_name`: A name for your project.
- `scanned_org`: The GitHub organization you wish to scan.
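
As an illustration, a `terraform.tfvars` file covering the key variables above might look like this. Every value is a placeholder chosen for the example, not a default shipped with the module.

```hcl
# terraform.tfvars (illustrative values only; replace with your own)
aws_profile              = "my-profile"             # hypothetical AWS CLI profile
aws_region               = "us-east-1"
s3_bucket_name           = "my-inventory-bucket"    # hypothetical; must already exist
github_token_secret_name = "github-inventory-token" # hypothetical; identifies the Secrets Manager secret
project_name             = "scm-inventory"          # hypothetical project name
scanned_org              = "my-github-org"          # hypothetical GitHub organization
```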

**Initialize Terraform**

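Run `terraform init` in the `infrastructure/inventory/aws/scm-inventory/` directory to initialize the Terraform project. The `s3` backend is only partially configured in `providers.tf`, so the remaining settings must be supplied at init time, for example by filling in the provided `s3.tfbackend` file and passing it with `-backend-config=s3.tfbackend`.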

**Apply Terraform Configuration**

Execute `terraform apply` to create the resources. Review the plan and confirm the action.

**Access the Inventory**

Once the EC2 instance completes its run, the generated inventory is available in the specified S3 bucket under `outbound/json/inventory-<scanned_org>.json`. By default, the instance terminates itself after completion.

**Additional Notes**

The EC2 instance uses the `t2.micro` instance type by default, which keeps costs low. Set the `instance_type` variable to use a different type.

To keep the EC2 instance running after the inventory has been generated, which can be useful for debugging, set the `terminate_instance_after_completion` variable to `false`.

The module can optionally fetch issues and pull requests from the scanned GitHub organization. Enable this with the `fetch_issues` and `fetch_pr` variables, which both default to `false`.
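
The optional settings described in these notes can be combined in the same `terraform.tfvars` file. The snippet below is only a sketch: the variable names come from the Inputs table, while the values (including the `t3.small` instance type) are arbitrary examples.

```hcl
# Optional overrides (illustrative values)
fetch_issues                        = true       # also collect issues
fetch_pr                            = true       # also collect pull requests
terminate_instance_after_completion = false      # keep the instance for debugging
instance_type                       = "t3.small" # example of a larger instance type
```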

The inventory script is located in the `scripts/inventory/github_inventory` directory.

For detailed information on the resources created and managed by this module, refer to the automatically generated documentation below.


<!-- BEGINNING OF PRE-COMMIT-TERRAFORM DOCS HOOK -->
## Requirements

| Name | Version |
|------|---------|
| <a name="requirement_terraform"></a> [terraform](#requirement\_terraform) | >=1.7 |
| <a name="requirement_aws"></a> [aws](#requirement\_aws) | ~> 5.0 |

## Providers

| Name | Version |
|------|---------|
| <a name="provider_aws"></a> [aws](#provider\_aws) | ~> 5.0 |
| <a name="provider_local"></a> [local](#provider\_local) | n/a |
| <a name="provider_null"></a> [null](#provider\_null) | n/a |

## Modules

No modules.

## Resources

| Name | Type |
|------|------|
| [aws_iam_instance_profile.ec2_instance_profile](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_instance_profile) | resource |
| [aws_iam_policy.permissions_for_ec2_instance](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_policy) | resource |
| [aws_iam_policy.s3_access_policy](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_policy) | resource |
| [aws_iam_role.ec2_role](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_role) | resource |
| [aws_iam_role_policy_attachment.PermissionsForEC2InstancePolicyAttachment](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_role_policy_attachment) | resource |
| [aws_instance.ec2_inventory](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/instance) | resource |
| [aws_s3_object.poetry_dist](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/s3_object) | resource |
| [null_resource.poetry_build](https://registry.terraform.io/providers/hashicorp/null/latest/docs/resources/resource) | resource |
| [aws_ami.amazon_ami](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/ami) | data source |
| [aws_caller_identity.current](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/caller_identity) | data source |
| [aws_iam_policy_document.ec2_assume_role](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/iam_policy_document) | data source |
| [aws_iam_policy_document.policy_document_permissions_for_ec2_instance](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/iam_policy_document) | data source |
| [aws_iam_policy_document.s3_access_policy_document](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/iam_policy_document) | data source |
| [aws_s3_bucket.resources_and_results](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/s3_bucket) | data source |
| [aws_secretsmanager_secret.github_token_secret](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/secretsmanager_secret) | data source |
| [aws_security_group.default](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/security_group) | data source |
| [aws_security_groups.custom_security_groups](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/security_groups) | data source |
| [aws_subnet.selected](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/subnet) | data source |
| [aws_subnets.default](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/subnets) | data source |
| [aws_vpc.selected](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/vpc) | data source |
| [local_file.dist](https://registry.terraform.io/providers/hashicorp/local/latest/docs/data-sources/file) | data source |

## Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|:--------:|
| <a name="input_ami_image_filter"></a> [ami\_image\_filter](#input\_ami\_image\_filter) | Filter to use to find the Amazon Machine Image (AMI) to use for the EC2 instance the name can contain wildcards. Only GNU/Linux images are supported. | `string` | `"amzn2-ami-hvm*"` | no |
| <a name="input_ami_owner"></a> [ami\_owner](#input\_ami\_owner) | Owner of the Amazon Machine Image (AMI) to use for the EC2 instance | `string` | `"amazon"` | no |
| <a name="input_aws_default_security_groups_filters"></a> [aws\_default\_security\_groups\_filters](#input\_aws\_default\_security\_groups\_filters) | Filters to use to find the default security groups | `list(string)` | `[]` | no |
| <a name="input_aws_profile"></a> [aws\_profile](#input\_aws\_profile) | AWS profile to use for authentication | `string` | n/a | yes |
| <a name="input_aws_region"></a> [aws\_region](#input\_aws\_region) | AWS region where to deploy resources | `string` | `"us-east-1"` | no |
| <a name="input_ec2_workdir"></a> [ec2\_workdir](#input\_ec2\_workdir) | Working directory for the EC2 instance | `string` | `"~/github-inventory"` | no |
| <a name="input_environment_type"></a> [environment\_type](#input\_environment\_type) | Environment (PRODUCTION, PRE-PRODUCTION, QUALITY ASSURANCE, INTEGRATION TESTING, DEVELOPMENT, LAB) | `string` | `"PRODUCTION"` | no |
| <a name="input_fetch_issues"></a> [fetch\_issues](#input\_fetch\_issues) | Indicates whether to fetch issues for the repositories | `bool` | `false` | no |
| <a name="input_fetch_pr"></a> [fetch\_pr](#input\_fetch\_pr) | Indicates whether to fetch pull requests for the repositories | `bool` | `false` | no |
| <a name="input_github_token_secret_name"></a> [github\_token\_secret\_name](#input\_github\_token\_secret\_name) | Name of the AWS Secrets Manager secret containing the GitHub token of the Service Account | `string` | n/a | yes |
| <a name="input_instance_type"></a> [instance\_type](#input\_instance\_type) | Instance type to use for fetching the inventory | `string` | `"t2.micro"` | no |
| <a name="input_inventory_project_dir"></a> [inventory\_project\_dir](#input\_inventory\_project\_dir) | Path to the directory containing the inventory project | `string` | `"../../../../scripts/inventory/github_inventory"` | no |
| <a name="input_permissions_boundary_arn"></a> [permissions\_boundary\_arn](#input\_permissions\_boundary\_arn) | Permissions boundary to use for the IAM role | `string` | `null` | no |
| <a name="input_project_name"></a> [project\_name](#input\_project\_name) | Name of the project | `string` | `"secrets-detection"` | no |
| <a name="input_project_version"></a> [project\_version](#input\_project\_version) | Version of the project | `string` | `"0.1.0"` | no |
| <a name="input_s3_bucket_name"></a> [s3\_bucket\_name](#input\_s3\_bucket\_name) | S3 bucket name where to upload the scripts and results | `string` | n/a | yes |
| <a name="input_scanned_org"></a> [scanned\_org](#input\_scanned\_org) | Name of the organization to scan | `string` | n/a | yes |
| <a name="input_subnet_name"></a> [subnet\_name](#input\_subnet\_name) | Filter to select the subnet to use, this can use wildcards. | `string` | `null` | no |
| <a name="input_tags"></a> [tags](#input\_tags) | A map of tags to add to the resources | `map(string)` | `{}` | no |
| <a name="input_terminate_instance_after_completion"></a> [terminate\_instance\_after\_completion](#input\_terminate\_instance\_after\_completion) | Indicates whether the instance should be terminated once the scan has finished (set to false for debugging purposes) | `bool` | `true` | no |
| <a name="input_vpc_name"></a> [vpc\_name](#input\_vpc\_name) | Filter to select the VPC to use, this can use wildcards. | `string` | `""` | no |

## Outputs

| Name | Description |
|------|-------------|
| <a name="output_ec2_instance_arn"></a> [ec2\_instance\_arn](#output\_ec2\_instance\_arn) | n/a |
| <a name="output_ec2_instance_id"></a> [ec2\_instance\_id](#output\_ec2\_instance\_id) | n/a |
| <a name="output_ec2_role_arn"></a> [ec2\_role\_arn](#output\_ec2\_role\_arn) | n/a |
<!-- END OF PRE-COMMIT-TERRAFORM DOCS HOOK -->
113 changes: 113 additions & 0 deletions infrastructure/inventory/aws/scm-inventory/iam.tf
@@ -0,0 +1,113 @@
## Role assumable by EC2 instance
data "aws_iam_policy_document" "ec2_assume_role" {
  statement {
    effect = "Allow"
    principals {
      identifiers = ["ec2.amazonaws.com"]
      type        = "Service"
    }
    actions = ["sts:AssumeRole"]
  }
}

resource "aws_iam_role" "ec2_role" {
  name                 = "${var.project_name}-ec2-role"
  assume_role_policy   = data.aws_iam_policy_document.ec2_assume_role.json
  path                 = "/"
  permissions_boundary = var.permissions_boundary_arn
}

resource "aws_iam_instance_profile" "ec2_instance_profile" {
  name = "${var.project_name}-instance-profile"
  role = aws_iam_role.ec2_role.name
}

data "aws_iam_policy_document" "policy_document_permissions_for_ec2_instance" {
  # S3: List the bucket, and get and put objects in it
  statement {
    sid       = "ListS3Bucket"
    effect    = "Allow"
    actions   = ["s3:ListBucket"]
    resources = [data.aws_s3_bucket.resources_and_results.arn]
  }

  statement {
    sid    = "GetAndPutObjectsInS3Bucket"
    effect = "Allow"
    actions = [
      "s3:GetObject*",
      "s3:PutObject*"
    ]
    resources = ["${data.aws_s3_bucket.resources_and_results.arn}/*"]
  }

  # Secrets Manager: Get GitHub API token
  # Uses the ARN of the secret looked up in secrets.tf, which is the same
  # secret the user data in main.tf reads.
  statement {
    sid    = "FetchGitHubToken"
    effect = "Allow"
    actions = [
      "secretsmanager:GetSecretValue",
    ]
    resources = [data.aws_secretsmanager_secret.github_token_secret.arn]
  }

  # EC2: Allow instance to schedule termination for itself (end of scan)
  statement {
    sid    = "AllowTerminationOfEC2Instance"
    effect = "Allow"
    actions = [
      "ec2:TerminateInstances"
    ]
    resources = ["arn:aws:ec2:${var.aws_region}:${data.aws_caller_identity.current.account_id}:instance/*"]

    condition {
      test     = "StringLike"
      variable = "aws:ResourceTag/Name"
      values   = ["${var.project_name}*"]
    }

    condition {
      test     = "StringLike"
      variable = "ec2:InstanceProfile"
      values   = [aws_iam_instance_profile.ec2_instance_profile.arn]
    }
  }
}

resource "aws_iam_policy" "permissions_for_ec2_instance" {
  name        = "${var.project_name}-ec2-permissions"
  description = "Policy granting necessary permissions to EC2 instance"
  policy      = data.aws_iam_policy_document.policy_document_permissions_for_ec2_instance.json
}

resource "aws_iam_role_policy_attachment" "PermissionsForEC2InstancePolicyAttachment" {
  policy_arn = aws_iam_policy.permissions_for_ec2_instance.arn
  role       = aws_iam_role.ec2_role.name
}


data "aws_iam_policy_document" "s3_access_policy_document" {
  statement {
    sid       = "ListS3Bucket"
    effect    = "Allow"
    actions   = ["s3:ListBucket"]
    resources = [data.aws_s3_bucket.resources_and_results.arn]
  }

  statement {
    sid    = "GetAndListObjectsInS3Bucket"
    effect = "Allow"
    actions = [
      "s3:GetObject*",
      "s3:ListObject*"
    ]
    resources = ["${data.aws_s3_bucket.resources_and_results.arn}/*"]
  }
}

resource "aws_iam_policy" "s3_access_policy" {
  name        = "${var.project_name}-s3-access"
  description = "Policy allowing access to the S3 bucket used for Trufflehog"
  policy      = data.aws_iam_policy_document.s3_access_policy_document.json
}
9 changes: 9 additions & 0 deletions infrastructure/inventory/aws/scm-inventory/images.tf
@@ -0,0 +1,9 @@
data "aws_ami" "amazon_ami" {
  most_recent = true
  owners      = [var.ami_owner]

  filter {
    name   = "name"
    values = [var.ami_image_filter]
  }
}
4 changes: 4 additions & 0 deletions infrastructure/inventory/aws/scm-inventory/locals.tf
@@ -0,0 +1,4 @@
locals {
  environment = replace(lower(var.environment_type), " ", "-")
  tags        = var.tags
}
50 changes: 50 additions & 0 deletions infrastructure/inventory/aws/scm-inventory/main.tf
@@ -0,0 +1,50 @@

resource "aws_instance" "ec2_inventory" {
  ami                  = data.aws_ami.amazon_ami.id
  instance_type        = var.instance_type
  subnet_id            = data.aws_subnet.selected.id
  iam_instance_profile = aws_iam_instance_profile.ec2_instance_profile.name
  # The instance is launched into a VPC subnet, so security group IDs are passed
  # through vpc_security_group_ids rather than security_groups.
  vpc_security_group_ids      = length(var.aws_default_security_groups_filters) > 0 ? data.aws_security_groups.custom_security_groups[0].ids : [data.aws_security_group.default[0].id]
  user_data_replace_on_change = true
  metadata_options {
    http_tokens = "required"
  }

  root_block_device {
    volume_size           = 30
    volume_type           = "gp2"
    delete_on_termination = true
  }

  # Bootstrap script: download and unpack the packaged inventory project,
  # run it, upload the result to S3, then optionally terminate the instance.
  user_data = join("\n", [
    "#!/bin/bash",
    "aws configure set region ${var.aws_region}",
    "mkdir -p ${var.ec2_workdir}/github_inventory-${var.project_version}",
    "aws s3 cp s3://${data.aws_s3_bucket.resources_and_results.id}/${aws_s3_object.poetry_dist.key} ${var.ec2_workdir}/",
    "export GITHUB_INVENTORY_TOKEN=$(aws secretsmanager get-secret-value --secret-id ${data.aws_secretsmanager_secret.github_token_secret.arn} --query SecretString --output text)",
    "tar -xvf ${var.ec2_workdir}/github_inventory-${var.project_version}.tar.gz -C ${var.ec2_workdir}",
    "cd ${var.ec2_workdir}/github_inventory-${var.project_version}",
    "virtualenv local",
    "source local/bin/activate",
    "pip3 install poetry",
    "poetry lock && poetry install",
    var.fetch_pr ? "export GITHUB_INVENTORY_PR=True" : "",
    var.fetch_issues ? "export GITHUB_INVENTORY_ISSUES=True" : "",
    "poetry run python -m github_inventory --org ${var.scanned_org}",
    "aws s3 cp ${var.ec2_workdir}/github_inventory-${var.project_version}/inventory-${var.scanned_org}.json s3://${data.aws_s3_bucket.resources_and_results.id}/outbound/json/inventory-${var.scanned_org}.json",
    "TOKEN=$(curl -X PUT \"http://169.254.169.254/latest/api/token\" -H \"X-aws-ec2-metadata-token-ttl-seconds: 21600\")",
    "export INSTANCE_ID=$(curl -H \"X-aws-ec2-metadata-token: $TOKEN\" -s http://169.254.169.254/latest/meta-data/instance-id)",
    var.terminate_instance_after_completion ? "aws ec2 terminate-instances --instance-ids $INSTANCE_ID" : ""
  ])


  tags = merge(var.tags, { Name = "${var.project_name}-ec2-${var.scanned_org}" })

  depends_on = [
    data.local_file.dist,
    null_resource.poetry_build,
    aws_s3_object.poetry_dist,
    aws_iam_policy.permissions_for_ec2_instance,
    aws_iam_role_policy_attachment.PermissionsForEC2InstancePolicyAttachment,
  ]
}
11 changes: 11 additions & 0 deletions infrastructure/inventory/aws/scm-inventory/outputs.tf
@@ -0,0 +1,11 @@
output "ec2_role_arn" {
  value = aws_iam_role.ec2_role.arn
}

output "ec2_instance_id" {
  value = aws_instance.ec2_inventory.id
}

output "ec2_instance_arn" {
  value = aws_instance.ec2_inventory.arn
}
22 changes: 22 additions & 0 deletions infrastructure/inventory/aws/scm-inventory/providers.tf
@@ -0,0 +1,22 @@
terraform {
  required_version = ">=1.7"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }

  backend "s3" {
    encrypt = true
  }
}

provider "aws" {
  region  = var.aws_region
  profile = var.aws_profile
  default_tags {
    tags = local.tags
  }
}
27 changes: 27 additions & 0 deletions infrastructure/inventory/aws/scm-inventory/s3.tf
@@ -0,0 +1,27 @@
data "aws_s3_bucket" "resources_and_results" {
  bucket = var.s3_bucket_name
}

resource "null_resource" "poetry_build" {
  provisioner "local-exec" {
    command     = "poetry build -f sdist"
    working_dir = "${var.inventory_project_dir}/"
  }

  triggers = {
    always_run = timestamp()
  }
}

data "local_file" "dist" {
  filename   = "${var.inventory_project_dir}/dist/github_inventory-${var.project_version}.tar.gz"
  depends_on = [null_resource.poetry_build]
}

resource "aws_s3_object" "poetry_dist" {
  bucket      = data.aws_s3_bucket.resources_and_results.id
  key         = "inventory/scripts/poetry_dist/github_inventory-${var.project_version}.tar.gz"
  source      = "${var.inventory_project_dir}/dist/github_inventory-${var.project_version}.tar.gz"
  source_hash = data.local_file.dist.content_sha256
  depends_on  = [data.local_file.dist]
}
5 changes: 5 additions & 0 deletions infrastructure/inventory/aws/scm-inventory/s3.tfbackend
@@ -0,0 +1,5 @@
bucket         = "<bucket_name>"
key            = "<tf_state_s3_path>"
region         = "<aws_region>"
dynamodb_table = "<dynamodb_table>"
profile        = "<aws_profile>"
3 changes: 3 additions & 0 deletions infrastructure/inventory/aws/scm-inventory/secrets.tf
@@ -0,0 +1,3 @@
data "aws_secretsmanager_secret" "github_token_secret" {
  name = var.github_token_secret_name
}