terraform-aws-enrich-kinesis-ec2

A Terraform module which deploys the Snowplow Enrich service on EC2. If you want to use a custom AMI for this deployment, you will need to ensure it is based on Amazon Linux 2.

Telemetry

By default, this module collects and forwards telemetry information to Snowplow so that we can understand how our applications are being used. No identifying information about your sub-account and no account fingerprints are ever forwarded to us; the telemetry is limited to simple information about which modules and applications are deployed and active.

If you wish to subscribe to our mailing list for updates to these modules or security advisories please set the user_provided_id variable to include a valid email address which we can reach you at.

How do I disable it?

To disable telemetry simply set variable telemetry_enabled = false.
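As a minimal sketch, the setting sits alongside the other module inputs. The stream names, key name and `var.*` references below are placeholders borrowed from the Standard usage section of this README, not values the module requires.

```hcl
module "enrich_kinesis" {
  source = "snowplow-devops/enrich-kinesis-ec2/aws"

  accept_limited_use_license = true

  # Required inputs (placeholder values; adjust to your environment)
  name                 = "enrich-server"
  vpc_id               = var.vpc_id
  subnet_ids           = var.subnet_ids
  in_stream_name       = "raw-stream"
  enriched_stream_name = "enriched-stream"
  bad_stream_name      = "bad-1-stream"
  ssh_key_name         = "your-key-name"

  # Turn off telemetry for this deployment
  telemetry_enabled = false
}
```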

What are you collecting?

For details on what information is collected please see this module: https://github.com/snowplow-devops/terraform-snowplow-telemetry

Usage

Standard usage

Stream Enrich takes data from a raw input stream, pushes validated data to the enriched stream, and pushes failed data to the bad stream. As part of this validation process we leverage Iglu, Snowplow's schema repository and the home for event and entity definitions. If you are using custom events that you have defined yourself, you will need to link your own Iglu Registries into this module so that they can be discovered correctly.

By default this module enables 5 enrichments which you can find in the templates/enrichments directory of this module.

module "raw_stream" {
  source  = "snowplow-devops/kinesis-stream/aws"
  version = "0.2.0"

  name = "raw-stream"
}

module "enriched_stream" {
  source  = "snowplow-devops/kinesis-stream/aws"
  version = "0.2.0"

  name = "enriched-stream"
}

module "bad_1_stream" {
  source  = "snowplow-devops/kinesis-stream/aws"
  version = "0.2.0"

  name = "bad-1-stream"
}

module "enrich_kinesis" {
  source = "snowplow-devops/enrich-kinesis-ec2/aws"

  accept_limited_use_license = true

  name                 = "enrich-server"
  vpc_id               = var.vpc_id
  subnet_ids           = var.subnet_ids
  in_stream_name       = module.raw_stream.name
  enriched_stream_name = module.enriched_stream.name
  bad_stream_name      = module.bad_1_stream.name

  ssh_key_name     = "your-key-name"
  ssh_ip_allowlist = ["0.0.0.0/0"]

  # Linking in the custom Iglu Server here
  custom_iglu_resolvers = [
    {
      name            = "Iglu Server"
      priority        = 0
      uri             = "http://your-iglu-server-endpoint/api"
      api_key         = var.iglu_super_api_key
      vendor_prefixes = []
    }
  ]
}
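The example above references `var.vpc_id`, `var.subnet_ids` and `var.iglu_super_api_key` without declaring them. One possible set of declarations is sketched below. The descriptions are illustrative, and marking the API key as `sensitive` is an assumption about how you may want to treat it rather than something the module requires.

```hcl
variable "vpc_id" {
  description = "The VPC to deploy Enrich within (must have DNS hostnames enabled)"
  type        = string
}

variable "subnet_ids" {
  description = "The list of subnets to deploy Enrich across"
  type        = list(string)
}

variable "iglu_super_api_key" {
  description = "API key used to authenticate against the custom Iglu Server"
  type        = string

  # Assumption: hide the key from plan/apply output; remove if this
  # conflicts with how the value is used in your configuration.
  sensitive = true
}
```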

Inserting custom enrichments

To define your own enrichment configurations you will need to provide a JSON encoded string of the enrichment in the appropriate placeholder.

locals {
  enrichment_anon_ip = jsonencode(<<EOF
{
  "schema": "iglu:com.snowplowanalytics.snowplow/anon_ip/jsonschema/1-0-1",
  "data": {
    "name": "anon_ip",
    "vendor": "com.snowplowanalytics.snowplow",
    "enabled": true,
    "parameters": {
      "anonOctets": 1,
      "anonSegments": 1
    }
  }
}
EOF
  )
}

module "enrich_kinesis" {
  source = "snowplow-devops/enrich-kinesis-ec2/aws"

  accept_limited_use_license = true

  name                 = "enrich-server"
  vpc_id               = var.vpc_id
  subnet_ids           = var.subnet_ids
  in_stream_name       = module.raw_stream.name
  enriched_stream_name = module.enriched_stream.name
  bad_stream_name      = module.bad_1_stream.name

  ssh_key_name     = "your-key-name"
  ssh_ip_allowlist = ["0.0.0.0/0"]

  # Linking in the custom Iglu Server here
  custom_iglu_resolvers = [
    {
      name            = "Iglu Server"
      priority        = 0
      uri             = "http://your-iglu-server-endpoint/api"
      api_key         = var.iglu_super_api_key
      vendor_prefixes = []
    }
  ]

  # Enable this enrichment
  enrichment_anon_ip = local.enrichment_anon_ip
}

Disabling default enrichments

Disabling a default enrichment uses the same strategy as inserting a custom one: provide a configuration for that enrichment with enabled set to false. For example, to disable YAUAA you would do the following.

locals {
  enrichment_yauaa = jsonencode(<<EOF
{
  "schema": "iglu:com.snowplowanalytics.snowplow.enrichments/yauaa_enrichment_config/jsonschema/1-0-0",
  "data": {
    "enabled": false,
    "vendor": "com.snowplowanalytics.snowplow.enrichments",
    "name": "yauaa_enrichment_config"
  }
}
EOF
  )
}

module "enrich_kinesis" {
  source = "snowplow-devops/enrich-kinesis-ec2/aws"

  accept_limited_use_license = true

  name                 = "enrich-server"
  vpc_id               = var.vpc_id
  subnet_ids           = var.subnet_ids
  in_stream_name       = module.raw_stream.name
  enriched_stream_name = module.enriched_stream.name
  bad_stream_name      = module.bad_1_stream.name

  ssh_key_name     = "your-key-name"
  ssh_ip_allowlist = ["0.0.0.0/0"]

  # Linking in the custom Iglu Server here
  custom_iglu_resolvers = [
    {
      name            = "Iglu Server"
      priority        = 0
      uri             = "http://your-iglu-server-endpoint/api"
      api_key         = var.iglu_super_api_key
      vendor_prefixes = []
    }
  ]

  # Disable this enrichment
  enrichment_yauaa_enrichment_config = local.enrichment_yauaa
}

Requirements

| Name | Version |
|------|---------|
| terraform | >= 1.0.0 |
| aws | >= 3.72.0 |

Providers

| Name | Version |
|------|---------|
| aws | >= 3.72.0 |

Modules

| Name | Source | Version |
|------|--------|---------|
| config_autoscaling | snowplow-devops/dynamodb-autoscaling/aws | 0.2.0 |
| instance_type_metrics | snowplow-devops/ec2-instance-type-metrics/aws | 0.1.2 |
| kcl_autoscaling | snowplow-devops/dynamodb-autoscaling/aws | 0.2.0 |
| service | snowplow-devops/service-ec2/aws | 0.2.1 |
| telemetry | snowplow-devops/telemetry/snowplow | 0.5.0 |

Resources

| Name | Type |
|------|------|
| aws_cloudwatch_log_group.log_group | resource |
| aws_dynamodb_table.config | resource |
| aws_dynamodb_table.kcl | resource |
| aws_dynamodb_table_item.enrichment_anon_ip | resource |
| aws_dynamodb_table_item.enrichment_api_request_enrichment_config | resource |
| aws_dynamodb_table_item.enrichment_campaign_attribution | resource |
| aws_dynamodb_table_item.enrichment_cookie_extractor_config | resource |
| aws_dynamodb_table_item.enrichment_currency_conversion_config | resource |
| aws_dynamodb_table_item.enrichment_event_fingerprint_config | resource |
| aws_dynamodb_table_item.enrichment_http_header_extractor_config | resource |
| aws_dynamodb_table_item.enrichment_iab_spiders_and_bots_enrichment | resource |
| aws_dynamodb_table_item.enrichment_ip_lookups | resource |
| aws_dynamodb_table_item.enrichment_javascript_script_config | resource |
| aws_dynamodb_table_item.enrichment_pii_enrichment_config | resource |
| aws_dynamodb_table_item.enrichment_referer_parser | resource |
| aws_dynamodb_table_item.enrichment_sql_query_enrichment_config | resource |
| aws_dynamodb_table_item.enrichment_ua_parser_config | resource |
| aws_dynamodb_table_item.enrichment_weather_enrichment_config | resource |
| aws_dynamodb_table_item.enrichment_yauaa_enrichment_config | resource |
| aws_dynamodb_table_item.iglu_resolver | resource |
| aws_iam_instance_profile.instance_profile | resource |
| aws_iam_policy.iam_policy | resource |
| aws_iam_role.iam_role | resource |
| aws_iam_role_policy_attachment.policy_attachment | resource |
| aws_security_group.sg | resource |
| aws_security_group_rule.egress_tcp_443 | resource |
| aws_security_group_rule.egress_tcp_80 | resource |
| aws_security_group_rule.egress_tcp_custom | resource |
| aws_security_group_rule.egress_udp_123 | resource |
| aws_security_group_rule.ingress_tcp_22 | resource |
| aws_caller_identity.current | data source |
| aws_region.current | data source |

Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|----------|
| bad_stream_name | The name of the bad kinesis stream that the Enricher will insert bad data into | string | n/a | yes |
| enriched_stream_name | The name of the enriched kinesis stream that the Enricher will insert validated data into | string | n/a | yes |
| in_stream_name | The name of the input kinesis stream that the Enricher will pull data from | string | n/a | yes |
| name | A name which will be pre-pended to the resources created | string | n/a | yes |
| ssh_key_name | The name of the SSH key-pair to attach to all EC2 nodes deployed | string | n/a | yes |
| subnet_ids | The list of subnets to deploy Enrich across | list(string) | n/a | yes |
| vpc_id | The VPC to deploy Enrich within (must have DNS hostnames enabled) | string | n/a | yes |
| accept_limited_use_license | Acceptance of the SLULA terms (https://docs.snowplow.io/limited-use-license-1.0/) | bool | false | no |
| amazon_linux_2_ami_id | The AMI ID to use which must be based off of Amazon Linux 2; by default the latest community version is used | string | "" | no |
| app_version | App version to use. This variable facilitates dev flow; the modules may not work with anything other than the default value. | string | "3.9.0" | no |
| assets_update_period | Period after which enrich assets should be checked for updates (e.g. MaxMind DB) | string | "7 days" | no |
| associate_public_ip_address | Whether to assign a public ip address to this instance | bool | true | no |
| byte_limit | The amount of bytes to buffer events before pushing them to Kinesis | number | 1000000 | no |
| cloudwatch_logs_enabled | Whether application logs should be reported to CloudWatch | bool | true | no |
| cloudwatch_logs_retention_days | The length of time in days to retain logs for | number | 7 | no |
| custom_iglu_resolvers | The custom Iglu Resolvers that will be used by Enrichment to resolve and validate events | list(object({ name = string, priority = number, uri = string, api_key = string, vendor_prefixes = list(string) })) | [] | no |
| custom_s3_hosted_assets_bucket_name | Name of the bucket in which hosted database for the IP Lookups and/or IAB Enrichments are stored | string | "" | no |
| custom_tcp_egress_port_list | For opening up TCP ports to access other destinations not served over HTTP(s) (e.g. for SQL / API enrichments) | list(string) | [] | no |
| default_iglu_resolvers | The default Iglu Resolvers that will be used by Enrichment to resolve and validate events | list(object({ name = string, priority = number, uri = string, api_key = string, vendor_prefixes = list(string) })) | [ { "api_key": "", "name": "Iglu Central", "priority": 10, "uri": "http://iglucentral.com", "vendor_prefixes": [] }, { "api_key": "", "name": "Iglu Central - Mirror 01", "priority": 20, "uri": "http://mirror01.iglucentral.com", "vendor_prefixes": [] } ] | no |
| enable_auto_scaling | Whether to enable auto-scaling policies for the service | bool | true | no |
| enrichment_anon_ip | n/a | string | "" | no |
| enrichment_api_request_enrichment_config | n/a | string | "" | no |
| enrichment_campaign_attribution | n/a | string | "" | no |
| enrichment_cookie_extractor_config | n/a | string | "" | no |
| enrichment_currency_conversion_config | n/a | string | "" | no |
| enrichment_event_fingerprint_config | n/a | string | "" | no |
| enrichment_http_header_extractor_config | n/a | string | "" | no |
| enrichment_iab_spiders_and_bots_enrichment | Note: Requires paid database to function | string | "" | no |
| enrichment_ip_lookups | Note: Requires free or paid subscription to database to function | string | "" | no |
| enrichment_javascript_script_config | n/a | string | "" | no |
| enrichment_pii_enrichment_config | n/a | string | "" | no |
| enrichment_referer_parser | n/a | string | "" | no |
| enrichment_sql_query_enrichment_config | n/a | string | "" | no |
| enrichment_ua_parser_config | n/a | string | "" | no |
| enrichment_weather_enrichment_config | n/a | string | "" | no |
| enrichment_yauaa_enrichment_config | n/a | string | "" | no |
| iam_permissions_boundary | The permissions boundary ARN to set on IAM roles created | string | "" | no |
| initial_position | Where to start processing the input Kinesis Stream from (TRIM_HORIZON or LATEST) | string | "TRIM_HORIZON" | no |
| instance_type | The instance type to use | string | "t3a.small" | no |
| java_opts | Custom JAVA Options | string | "-XX:InitialRAMPercentage=75 -XX:MaxRAMPercentage=75" | no |
| kcl_read_max_capacity | The maximum READ capacity for the KCL DynamoDB table | number | 10 | no |
| kcl_read_min_capacity | The minimum READ capacity for the KCL DynamoDB table | number | 1 | no |
| kcl_write_max_capacity | The maximum WRITE capacity for the KCL DynamoDB table | number | 10 | no |
| kcl_write_min_capacity | The minimum WRITE capacity for the KCL DynamoDB table | number | 1 | no |
| max_size | The maximum number of servers in this server-group | number | 2 | no |
| min_size | The minimum number of servers in this server-group | number | 1 | no |
| private_ecr_registry | The URL of an ECR registry that the sub-account has access to (e.g. '000000000000.dkr.ecr.cn-north-1.amazonaws.com.cn/') | string | "" | no |
| record_limit | The number of events to buffer before pushing them to Kinesis | number | 500 | no |
| scale_down_cooldown_sec | Time (in seconds) until another scale-down action can occur | number | 600 | no |
| scale_down_cpu_threshold_percentage | The average CPU percentage that we must be below to scale-down | number | 20 | no |
| scale_down_eval_minutes | The number of consecutive minutes that we must be below the threshold to scale-down | number | 60 | no |
| scale_up_cooldown_sec | Time (in seconds) until another scale-up action can occur | number | 180 | no |
| scale_up_cpu_threshold_percentage | The average CPU percentage that must be exceeded to scale-up | number | 60 | no |
| scale_up_eval_minutes | The number of consecutive minutes that the threshold must be breached to scale-up | number | 5 | no |
| ssh_ip_allowlist | The list of CIDR ranges to allow SSH traffic from | list(any) | [ "0.0.0.0/0" ] | no |
| tags | The tags to append to this resource | map(string) | {} | no |
| telemetry_enabled | Whether or not to send telemetry information back to Snowplow Analytics Ltd | bool | true | no |
| time_limit_ms | The amount of time to buffer events before pushing them to Kinesis | number | 500 | no |
| user_provided_id | An optional unique identifier to identify the telemetry events emitted by this stack | string | "" | no |
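To illustrate how the optional inputs combine, the sketch below overrides the sizing, auto-scaling and buffering defaults listed above. Every value shown is an arbitrary illustration rather than a recommendation, and the stream references and `var.*` names are carried over from the Standard usage example.

```hcl
module "enrich_kinesis" {
  source = "snowplow-devops/enrich-kinesis-ec2/aws"

  accept_limited_use_license = true

  name                 = "enrich-server"
  vpc_id               = var.vpc_id
  subnet_ids           = var.subnet_ids
  in_stream_name       = module.raw_stream.name
  enriched_stream_name = module.enriched_stream.name
  bad_stream_name      = module.bad_1_stream.name
  ssh_key_name         = "your-key-name"

  # Sizing (illustrative values only)
  instance_type = "t3a.medium"
  min_size      = 2
  max_size      = 4

  # Scale up sooner and scale down more cautiously than the defaults
  enable_auto_scaling                 = true
  scale_up_cpu_threshold_percentage   = 50
  scale_up_eval_minutes               = 3
  scale_down_cpu_threshold_percentage = 15
  scale_down_eval_minutes             = 90

  # Buffer settings applied before pushing to Kinesis
  record_limit  = 250
  byte_limit    = 500000
  time_limit_ms = 250

  # Only process data arriving after deployment
  initial_position = "LATEST"

  tags = {
    environment = "example"
  }
}
```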

Outputs

| Name | Description |
|------|-------------|
| asg_id | ID of the ASG |
| asg_name | Name of the ASG |
| sg_id | ID of the security group attached to the Enrich servers |
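The outputs can be consumed like any other module outputs. The sketch below re-exports the ASG name and uses `sg_id` to let the Enrich servers reach a database on port 5432, which might be useful alongside the SQL enrichment. The `database_sg_id` variable, the port and the resource name are hypothetical and not part of this module; the matching egress port would still need to be opened through `custom_tcp_egress_port_list`.

```hcl
# Hypothetical: security group protecting a database used by the SQL enrichment
variable "database_sg_id" {
  description = "ID of the security group attached to the database"
  type        = string
}

# Re-export the ASG name so other stacks or tooling can find it
output "enrich_asg_name" {
  value = module.enrich_kinesis.asg_name
}

# Allow inbound traffic to the database from the Enrich servers
resource "aws_security_group_rule" "enrich_to_database" {
  type                     = "ingress"
  from_port                = 5432
  to_port                  = 5432
  protocol                 = "tcp"
  security_group_id        = var.database_sg_id
  source_security_group_id = module.enrich_kinesis.sg_id
}
```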

Copyright and license

Copyright 2021-current Snowplow Analytics Ltd.

Licensed under the Snowplow Limited Use License Agreement. (If you are uncertain how it applies to your use case, check our answers to frequently asked questions.)