
Add Minimal Data Platform blueprint (GoogleCloudPlatform#1362)
Minimal Data Platform blueprint
lcaggio authored May 8, 2023
1 parent f0d928f commit 3cc6c71
Showing 19 changed files with 1,404 additions and 3 deletions.
2 changes: 1 addition & 1 deletion blueprints/README.md
@@ -6,7 +6,7 @@ Currently available blueprints:

- **apigee** - [Apigee Hybrid on GKE](./apigee/hybrid-gke/), [Apigee X analytics in BigQuery](./apigee/bigquery-analytics), [Apigee network patterns](./apigee/network-patterns/)
- **cloud operations** - [Active Directory Federation Services](./cloud-operations/adfs), [Cloud Asset Inventory feeds for resource change tracking and remediation](./cloud-operations/asset-inventory-feed-remediation), [Fine-grained Cloud DNS IAM via Service Directory](./cloud-operations/dns-fine-grained-iam), [Cloud DNS & Shared VPC design](./cloud-operations/dns-shared-vpc), [Delegated Role Grants](./cloud-operations/iam-delegated-role-grants), [Networking Dashboard](./cloud-operations/network-dashboard), [Managing on-prem service account keys by uploading public keys](./cloud-operations/onprem-sa-key-management), [Compute Image builder with Hashicorp Packer](./cloud-operations/packer-image-builder), [Packer example](./cloud-operations/packer-image-builder/packer), [Compute Engine quota monitoring](./cloud-operations/quota-monitoring), [Scheduled Cloud Asset Inventory Export to Bigquery](./cloud-operations/scheduled-asset-inventory-export-bq), [Configuring workload identity federation with Terraform Cloud/Enterprise workflows](./cloud-operations/terraform-cloud-dynamic-credentials), [TCP healthcheck and restart for unmanaged GCE instances](./cloud-operations/unmanaged-instances-healthcheck), [Migrate for Compute Engine (v5) blueprints](./cloud-operations/vm-migration), [Configuring workload identity federation to access Google Cloud resources from apps running on Azure](./cloud-operations/workload-identity-federation)
-- **data solutions** - [GCE and GCS CMEK via centralized Cloud KMS](./data-solutions/cmek-via-centralized-kms), [Cloud Composer version 2 private instance, supporting Shared VPC and external CMEK key](./data-solutions/composer-2), [Cloud SQL instance with multi-region read replicas](./data-solutions/cloudsql-multiregion), [Data Platform](./data-solutions/data-platform-foundations), [Spinning up a foundation data pipeline on Google Cloud using Cloud Storage, Dataflow and BigQuery](./data-solutions/gcs-to-bq-with-least-privileges), [SQL Server Always On Groups blueprint](./data-solutions/sqlserver-alwayson), [Data Playground](./data-solutions/data-playground), [MLOps with Vertex AI](./data-solutions/vertex-mlops), [Shielded Folder](./data-solutions/shielded-folder), [BigQuery ML and Vertex AI Pipeline](./data-solutions/bq-ml)
+- **data solutions** - [GCE and GCS CMEK via centralized Cloud KMS](./data-solutions/cmek-via-centralized-kms), [Cloud Composer version 2 private instance, supporting Shared VPC and external CMEK key](./data-solutions/composer-2), [Cloud SQL instance with multi-region read replicas](./data-solutions/cloudsql-multiregion), [Data Platform](./data-solutions/data-platform-foundations), [Minimal Data Platform](./data-solutions/data-platform-minimal), [Spinning up a foundation data pipeline on Google Cloud using Cloud Storage, Dataflow and BigQuery](./data-solutions/gcs-to-bq-with-least-privileges), [SQL Server Always On Groups blueprint](./data-solutions/sqlserver-alwayson), [Data Playground](./data-solutions/data-playground), [MLOps with Vertex AI](./data-solutions/vertex-mlops), [Shielded Folder](./data-solutions/shielded-folder), [BigQuery ML and Vertex AI Pipeline](./data-solutions/bq-ml)
- **factories** - [The why and the how of Resource Factories](./factories), [Google Cloud Identity Group Factory](./factories/cloud-identity-group-factory), [Google Cloud BQ Factory](./factories/bigquery-factory), [Google Cloud VPC Firewall Factory](./factories/net-vpc-firewall-yaml), [Minimal Project Factory](./factories/project-factory)
- **GKE** - [Binary Authorization Pipeline Blueprint](./gke/binauthz), [Storage API](./gke/binauthz/image), [Multi-cluster mesh on GKE (fleet API)](./gke/multi-cluster-mesh-gke-fleet-api), [GKE Multitenant Blueprint](./gke/multitenant-fleet), [Shared VPC with GKE support](./networking/shared-vpc-gke/), [GKE Autopilot](./gke/autopilot)
- **networking** - [Calling a private Cloud Function from On-premises](./networking/private-cloud-function-from-onprem), [Decentralized firewall management](./networking/decentralized-firewall), [Decentralized firewall validator](./networking/decentralized-firewall/validator), [Network filtering with Squid](./networking/filtering-proxy), [GLB and multi-regional daisy-chaining through hybrid NEGs](./networking/glb-hybrid-neg-internal), [Hybrid connectivity to on-premise services through PSC](./networking/psc-hybrid), [HTTP Load Balancer with Cloud Armor](./networking/glb-and-armor), [Hub and Spoke via VPN](./networking/hub-and-spoke-vpn), [Hub and Spoke via VPC Peering](./networking/hub-and-spoke-peering), [Internal Load Balancer as Next Hop](./networking/ilb-next-hop), [Network filtering with Squid with isolated VPCs using Private Service Connect](./networking/filtering-proxy-psc), On-prem DNS and Google Private Access, [PSC Producer](./networking/psc-hybrid/psc-producer), [PSC Consumer](./networking/psc-hybrid/psc-consumer), [Shared VPC with optional GKE cluster](./networking/shared-vpc-gke)
11 changes: 9 additions & 2 deletions blueprints/data-solutions/README.md
@@ -29,8 +29,15 @@ This [blueprint](./composer-2/) creates a [Cloud Composer](https://cloud.google.

### Data Platform Foundations

<a href="./data-platform-foundations/" title="Data Platform Foundations"><img src="./data-platform-foundations/images/overview_diagram.png" align="left" width="280px"></a>
This [blueprint](./data-platform-foundations/) implements a robust and flexible Data Foundation on GCP that provides opinionated defaults, allowing customers to build and scale out additional data pipelines quickly and reliably.
<a href="./data-platform-foundations/" title="Data Platform"><img src="./data-platform-foundations/images/overview_diagram.png" align="left" width="280px"></a>
This [blueprint](./data-platform-foundations/) implements a robust and flexible Data Platform on GCP that provides opinionated defaults, allowing customers to build and scale out additional data pipelines quickly and reliably.

<br clear="left">

+### Minimal Data Platform
+
+<a href="./data-platform-minimal/" title="Minimal Data Platform"><img src="./data-platform-minimal/images/diagram.png" align="left" width="280px"></a>
+This [blueprint](./data-platform-minimal/) implements a minimal Data Platform on GCP that provides opinionated defaults, allowing customers to build and scale out additional data pipelines quickly and reliably.
+
+<br clear="left">
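For orientation, a minimal sketch of how the blueprint might be instantiated. The variable names (`prefix`, `project_config`, `location`, `region`) are taken from the Terraform files in this commit; the exact object shapes live in the blueprint's `variables.tf`, which is not shown here, so treat this as illustrative only:

```hcl
# Hypothetical terraform.tfvars; values and shapes are illustrative.
prefix = "pfx"
project_config = {
  billing_account_id = "012345-6789AB-CDEF01" # null to reuse existing projects
  parent             = "folders/1234567890"
}
location = "EU"           # multi-region for GCS buckets and BQ datasets
region   = "europe-west1" # region for Composer and Dataproc
```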

2 changes: 2 additions & 0 deletions blueprints/data-solutions/data-platform-foundations/README.md
@@ -2,6 +2,8 @@

This module implements an opinionated Data Platform Architecture that creates and sets up projects and related resources that compose an end-to-end data environment.

+For a minimal Data Platform, please refer to the [Minimal Data Platform](../data-platform-minimal/) blueprint.

The code is intentionally simple, as it's intended to provide a generic initial setup and then allow easy customizations to complete the implementation of the intended design.

The following diagram is a high-level reference of the resources created and managed here:
77 changes: 77 additions & 0 deletions blueprints/data-solutions/data-platform-minimal/01-landing.tf
@@ -0,0 +1,77 @@
# Copyright 2023 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# tfdoc:file:description Landing project and resources.
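
# The IAM map below gives the landing service account object creation, the
# Composer service account read access, and the Dataproc service account full
# object administration on the landing bucket.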

locals {
iam_lnd = {
"roles/storage.objectCreator" = [module.land-sa-cs-0.iam_email]
"roles/storage.objectViewer" = [module.processing-sa-cmp-0.iam_email]
"roles/storage.objectAdmin" = [module.processing-sa-dp-0.iam_email]
}
}
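
# Create a new project when a billing account is provided; otherwise attach to
# an existing project, granting roles additively so pre-existing IAM bindings
# are preserved.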

module "land-project" {
source = "../../../modules/project"
parent = var.project_config.parent
billing_account = var.project_config.billing_account_id
project_create = var.project_config.billing_account_id != null
prefix = var.project_config.billing_account_id == null ? null : var.prefix
name = (
var.project_config.billing_account_id == null
? var.project_config.project_ids.landing
: "${var.project_config.project_ids.landing}${local.project_suffix}"
)
iam = var.project_config.billing_account_id != null ? local.iam_lnd : null
iam_additive = var.project_config.billing_account_id == null ? local.iam_lnd : null
services = [
"cloudkms.googleapis.com",
"cloudresourcemanager.googleapis.com",
"iam.googleapis.com",
"serviceusage.googleapis.com",
"stackdriver.googleapis.com",
"storage.googleapis.com",
"storage-component.googleapis.com",
]
service_encryption_key_ids = {
bq = [var.service_encryption_keys.bq]
storage = [var.service_encryption_keys.storage]
}
}

# Cloud Storage

module "land-sa-cs-0" {
source = "../../../modules/iam-service-account"
project_id = module.land-project.project_id
prefix = var.prefix
name = "lnd-cs-0"
display_name = "Data platform GCS landing service account"
iam = {
"roles/iam.serviceAccountTokenCreator" = [
local.groups_iam.data-engineers
]
}
}

module "land-cs-0" {
source = "../../../modules/gcs"
project_id = module.land-project.project_id
prefix = var.prefix
name = "lnd-cs-0"
location = var.location
storage_class = "MULTI_REGIONAL"
encryption_key = var.service_encryption_keys.storage
force_destroy = var.data_force_destroy
}
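
The billing-account toggle above implies a specific shape for `project_config`. A hedged reconstruction, with attribute names read off the references in this file and the `project_ids` keys inferred from the `land-`, `processing-` and `cur-` module prefixes elsewhere in the diff (the real `variables.tf` is not part of this excerpt):

```hcl
# Illustrative variable definition; not part of this commit.
variable "project_config" {
  description = "Set billing_account_id to create projects; leave it null to reuse the projects named in project_ids."
  type = object({
    billing_account_id = optional(string)
    parent             = string # e.g. "folders/1234567890"
    project_ids = optional(object({
      landing    = string
      processing = string
      curated    = string
    }))
  })
}
```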
117 changes: 117 additions & 0 deletions blueprints/data-solutions/data-platform-minimal/02-composer.tf
@@ -0,0 +1,117 @@
# Copyright 2023 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# tfdoc:file:description Cloud Composer resources.
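
# The variables below are exported into the Composer environment so DAGs can
# resolve blueprint resources (projects, buckets, datasets, the Spark history
# server) by name instead of hardcoding them.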

locals {
env_variables = {
BQ_LOCATION = var.location
CURATED_BQ_DATASET = module.cur-bq-0.dataset_id
CURATED_GCS = module.cur-cs-0.url
CURATED_PRJ = module.cur-project.project_id
DP_KMS_KEY = var.service_encryption_keys.compute
DP_REGION = var.region
GCP_REGION = var.region
LAND_PRJ = module.land-project.project_id
LAND_GCS = module.land-cs-0.name
PHS_CLUSTER_NAME = module.processing-dp-historyserver.name
PROCESSING_GCS = module.processing-cs-0.name
PROCESSING_PRJ = module.processing-project.project_id
PROCESSING_SA_DP = module.processing-sa-dp-0.email
PROCESSING_SUBNET = local.processing_subnet
PROCESSING_VPC = local.processing_vpc
}
}

module "processing-sa-cmp-0" {
source = "../../../modules/iam-service-account"
project_id = module.processing-project.project_id
prefix = var.prefix
name = "prc-cmp-0"
display_name = "Data platform Composer service account"
iam = {
"roles/iam.serviceAccountTokenCreator" = [local.groups_iam.data-engineers]
"roles/iam.serviceAccountUser" = [module.processing-sa-cmp-0.iam_email]
}
}
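
# Environment creation is skipped entirely when composer_config.disable_deployment
# is set, so the rest of the platform can be applied without Composer.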

resource "google_composer_environment" "processing-cmp-0" {
count = var.composer_config.disable_deployment == true ? 0 : 1
project = module.processing-project.project_id
name = "${var.prefix}-prc-cmp-0"
region = var.region
config {
software_config {
airflow_config_overrides = var.composer_config.software_config.airflow_config_overrides
pypi_packages = var.composer_config.software_config.pypi_packages
env_variables = merge(
var.composer_config.software_config.env_variables, local.env_variables
)
image_version = var.composer_config.software_config.image_version
}
workloads_config {
scheduler {
cpu = var.composer_config.workloads_config.scheduler.cpu
memory_gb = var.composer_config.workloads_config.scheduler.memory_gb
storage_gb = var.composer_config.workloads_config.scheduler.storage_gb
count = var.composer_config.workloads_config.scheduler.count
}
web_server {
cpu = var.composer_config.workloads_config.web_server.cpu
memory_gb = var.composer_config.workloads_config.web_server.memory_gb
storage_gb = var.composer_config.workloads_config.web_server.storage_gb
}
worker {
cpu = var.composer_config.workloads_config.worker.cpu
memory_gb = var.composer_config.workloads_config.worker.memory_gb
storage_gb = var.composer_config.workloads_config.worker.storage_gb
min_count = var.composer_config.workloads_config.worker.min_count
max_count = var.composer_config.workloads_config.worker.max_count
}
}

environment_size = var.composer_config.environment_size

node_config {
network = local.processing_vpc
subnetwork = local.processing_subnet
service_account = module.processing-sa-cmp-0.email
enable_ip_masq_agent = true
tags = ["composer-worker"]
ip_allocation_policy {
cluster_secondary_range_name = var.network_config.composer_ip_ranges.pods_range_name
services_secondary_range_name = var.network_config.composer_ip_ranges.services_range_name
}
}
private_environment_config {
enable_private_endpoint = true
cloud_sql_ipv4_cidr_block = var.network_config.composer_ip_ranges.cloud_sql
master_ipv4_cidr_block = var.network_config.composer_ip_ranges.gke_master
cloud_composer_connection_subnetwork = var.network_config.composer_ip_ranges.connection_subnetwork
}
dynamic "encryption_config" {
for_each = (
var.service_encryption_keys.composer != null
? { 1 = 1 }
: {}
)
content {
kms_key_name = var.service_encryption_keys.composer
}
}
}
depends_on = [
module.processing-project
]
}
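
The resource above dereferences a fairly deep `composer_config` object. A sketch of the type it implies, reconstructed purely from the attribute paths used in this file; the optionals and defaults are assumptions, and the `network_config` ranges referenced above are omitted:

```hcl
# Illustrative only; reconstructed from attribute references in this file.
variable "composer_config" {
  type = object({
    disable_deployment = optional(bool, false)
    environment_size   = optional(string, "ENVIRONMENT_SIZE_SMALL")
    software_config = object({
      airflow_config_overrides = optional(map(string), {})
      pypi_packages            = optional(map(string), {})
      env_variables            = optional(map(string), {})
      image_version            = optional(string)
    })
    workloads_config = object({
      scheduler  = object({ cpu = number, memory_gb = number, storage_gb = number, count = number })
      web_server = object({ cpu = number, memory_gb = number, storage_gb = number })
      worker     = object({ cpu = number, memory_gb = number, storage_gb = number, min_count = number, max_count = number })
    })
  })
}
```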
121 changes: 121 additions & 0 deletions blueprints/data-solutions/data-platform-minimal/02-dataproc.tf
@@ -0,0 +1,121 @@
# Copyright 2023 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# tfdoc:file:description Cloud Dataproc resources.

module "processing-cs-dp-history" {
source = "../../../modules/gcs"
project_id = module.processing-project.project_id
prefix = var.prefix
name = "prc-cs-dp-history"
location = var.region
storage_class = "REGIONAL"
encryption_key = var.service_encryption_keys.storage
}

module "processing-sa-dp-0" {
source = "../../../modules/iam-service-account"
project_id = module.processing-project.project_id
prefix = var.prefix
name = "prc-dp-0"
display_name = "Dataproc service account"
iam = {
"roles/iam.serviceAccountTokenCreator" = [
local.groups_iam.data-engineers,
module.processing-sa-cmp-0.iam_email
],
"roles/iam.serviceAccountUser" = [
module.processing-sa-cmp-0.iam_email
]
}
}

module "processing-dp-staging-0" {
source = "../../../modules/gcs"
project_id = module.processing-project.project_id
prefix = var.prefix
name = "prc-stg-0"
location = var.location
storage_class = "MULTI_REGIONAL"
encryption_key = var.service_encryption_keys.storage
}

module "processing-dp-temp-0" {
source = "../../../modules/gcs"
project_id = module.processing-project.project_id
prefix = var.prefix
name = "prc-tmp-0"
location = var.location
storage_class = "MULTI_REGIONAL"
encryption_key = var.service_encryption_keys.storage
}

module "processing-dp-log-0" {
source = "../../../modules/gcs"
project_id = module.processing-project.project_id
prefix = var.prefix
name = "prc-log-0"
location = var.location
storage_class = "MULTI_REGIONAL"
encryption_key = var.service_encryption_keys.storage
}
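
# Persistent Spark History Server: a single-node cluster (zero workers allowed
# via dataproc:dataproc.allow.zero.workers) that serves Spark UIs for jobs run
# on other, ephemeral clusters; the /*/ wildcards below aggregate event logs
# written by any cluster into the staging bucket.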

module "processing-dp-historyserver" {
source = "../../../modules/dataproc"
project_id = module.processing-project.project_id
name = "history-server"
prefix = var.prefix
region = var.region
dataproc_config = {
cluster_config = {
staging_bucket = module.processing-dp-staging-0.name
temp_bucket = module.processing-dp-temp-0.name
gce_cluster_config = {
subnetwork = module.processing-vpc[0].subnets["${var.region}/${var.prefix}-processing"].self_link
zone = "${var.region}-b"
service_account = module.processing-sa-dp-0.email
service_account_scopes = ["cloud-platform"]
internal_ip_only = true
}
worker_config = {
num_instances = 0
machine_type = null
min_cpu_platform = null
image_uri = null
}
software_config = {
override_properties = {
"dataproc:dataproc.allow.zero.workers" = "true"
"dataproc:job.history.to-gcs.enabled" = "true"
"spark:spark.history.fs.logDirectory" = (
"gs://${module.processing-dp-staging-0.name}/*/spark-job-history"
)
"spark:spark.eventLog.dir" = (
"gs://${module.processing-dp-staging-0.name}/*/spark-job-history"
)
"spark:spark.history.custom.executor.log.url.applyIncompleteApplication" = "false"
"spark:spark.history.custom.executor.log.url" = (
"{{YARN_LOG_SERVER_URL}}/{{NM_HOST}}:{{NM_PORT}}/{{CONTAINER_ID}}/{{CONTAINER_ID}}/{{USER}}/{{FILE_NAME}}"
)
}
}
endpoint_config = {
enable_http_port_access = true
}
encryption_config = {
kms_key_name = var.service_encryption_keys.compute
}
}
}
}
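
To close the loop, a hedged sketch of how an ephemeral job cluster could point its Spark event logs at the same staging bucket so they appear in the history server above. This resource is not part of the commit; the name and properties are illustrative:

```hcl
# Illustrative companion resource; not part of this commit.
resource "google_dataproc_cluster" "ephemeral" {
  name    = "eph-0"
  project = module.processing-project.project_id
  region  = var.region
  cluster_config {
    staging_bucket = module.processing-dp-staging-0.name
    software_config {
      override_properties = {
        # Per-cluster prefix matched by the history server's
        # gs://<staging-bucket>/*/spark-job-history wildcard.
        "spark:spark.eventLog.enabled" = "true"
        "spark:spark.eventLog.dir"     = "gs://${module.processing-dp-staging-0.name}/eph-0/spark-job-history"
      }
    }
  }
}
```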
