What this tool does:
- Cloud Run Service: Deploys a Cloud Run service that houses the anti-pattern detection logic.
- BigQuery Remote Function: Creates a BigQuery User-Defined Function (UDF) that acts as a bridge between your SQL queries and the Cloud Run service.
- Anti-Pattern Detection: When you call the BigQuery UDF, it sends your SQL query to the Cloud Run service, which analyzes it for anti-patterns. The result is returned in JSON format.
This tutorial uses billable components of Google Cloud, including the following:
- BigQuery
- Cloud Run
- Artifact Registry
- Cloud Build
Use the pricing calculator to generate a cost estimate based on your projected usage.
For this tutorial, you need a Google Cloud project.
- Make sure that billing is enabled for your Google Cloud project.
- Activate Cloud Shell. At the bottom of the Cloud Console, a Cloud Shell session opens and displays a command-line prompt. Cloud Shell is a shell environment with the Cloud SDK already installed, including the gcloud command-line tool, and with values already set for your current project. It can take a few seconds for the session to initialize.
- In Cloud Shell, clone the source repository:
git clone https://github.com/GoogleCloudPlatform/bigquery-antipattern-recognition.git
cd bigquery-antipattern-recognition
- Enable all the required Google Cloud APIs:
gcloud services enable \
  artifactregistry.googleapis.com \
  cloudresourcemanager.googleapis.com \
  bigquery.googleapis.com \
  bigqueryconnection.googleapis.com \
  cloudbuild.googleapis.com \
  run.googleapis.com
- Authenticate using Application Default Credentials (ADC):
gcloud auth application-default login
- Initialize and apply the Terraform configuration to create all resources:
cd ./udf/terraform && \
terraform init && \
terraform apply
- Alternatively, to deploy with the provided Bash script instead of Terraform, authenticate using Application Default Credentials (ADC):
gcloud auth application-default login
- Set your variable names for resources:
export PROJECT_ID="<PROJECT_ID>"
export REGION="<REGION>"
export ARTIFACT_REGISTRY_NAME="<ARTIFACT_REGISTRY_NAME>"
export CLOUD_RUN_SERVICE_NAME="antipattern-service"
export BQ_FUNCTION_DATASET="fns"
- Run the Bash script to create all resources:
bash ./udf/deploy_udf.sh
Once the deployment script completes, open the BigQuery console and run the following test SQL:
SELECT fns.get_antipatterns("SELECT * from dataset.table ORDER BY 1")
The function returns a JSON string describing any anti-patterns found in the query. For example, the function returns the following response for the query above:
{
"antipatterns": [
{
"name": "SimpleSelectStar",
"result": "SELECT * on table: dataset.table. Check that all columns are needed."
},
{
"name": "OrderByWithoutLimit",
"result": "ORDER BY clause without LIMIT at line 1."
}
]
}
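Because the result is plain JSON, you can post-process it with standard tools. The following sketch (assuming `jq` is available, as it is in Cloud Shell) counts and lists the findings in a response like the one above:

```shell
# Save the sample response from above to a file
cat > /tmp/antipatterns.json <<'EOF'
{
  "antipatterns": [
    {
      "name": "SimpleSelectStar",
      "result": "SELECT * on table: dataset.table. Check that all columns are needed."
    },
    {
      "name": "OrderByWithoutLimit",
      "result": "ORDER BY clause without LIMIT at line 1."
    }
  ]
}
EOF

# Count the anti-patterns found
jq '.antipatterns | length' /tmp/antipatterns.json

# List just their names
jq -r '.antipatterns[].name' /tmp/antipatterns.json
```

The same filters can be applied to real responses, for example to fail a CI check when the count is nonzero.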
If you want to customize the deployment, use the following steps:
- In Cloud Shell, set the project, region, and resource names for your BigQuery and Cloud Run resources:
PROJECT_ID="<PROJECT_ID>"
REGION="<REGION_ID>"
ARTIFACT_REGISTRY_NAME="<ARTIFACT_DOCKER_REGISTRY_NAME>"
CLOUD_RUN_SERVICE_NAME="antipattern-service"
The detector is a containerized Spring Boot application. Create an Artifact Registry repository to store the application's container image:
gcloud artifacts repositories create "${ARTIFACT_REGISTRY_NAME}" \
--repository-format=docker \
--location="${REGION}" \
--description="Docker repository for Bigquery Functions" \
--project="${PROJECT_ID}"
Create a service account for Cloud Build and grant it the roles it needs:
gcloud iam service-accounts create cloud-build-sa \
  --display-name "Cloud Build Service Account"
gcloud projects add-iam-policy-binding $PROJECT_ID \
--member "serviceAccount:cloud-build-sa@$PROJECT_ID.iam.gserviceaccount.com" \
--role "roles/logging.logWriter"
gcloud projects add-iam-policy-binding $PROJECT_ID \
--member "serviceAccount:cloud-build-sa@$PROJECT_ID.iam.gserviceaccount.com" \
--role "roles/storage.objectViewer"
gcloud artifacts repositories add-iam-policy-binding \
$ARTIFACT_REGISTRY_NAME \
--location=$REGION \
--member "serviceAccount:cloud-build-sa@$PROJECT_ID.iam.gserviceaccount.com" \
--role "roles/artifactregistry.writer"
- Build the application container image using Cloud Build. Run this command from the root of the repository. NOTE: It can take up to 60 seconds for the roles assigned in the previous step to propagate, so wait before running this command.
gcloud builds submit . \
  --project=$PROJECT_ID \
  --config=cloudbuild-udf.yaml \
  --service-account=projects/$PROJECT_ID/serviceAccounts/cloud-build-sa@$PROJECT_ID.iam.gserviceaccount.com \
  --substitutions=_CONTAINER_IMAGE_NAME=${REGION}-docker.pkg.dev/$PROJECT_ID/$ARTIFACT_REGISTRY_NAME/$CLOUD_RUN_SERVICE_NAME:latest \
  --machine-type=e2-highcpu-8
- Deploy the container image to Cloud Run:
gcloud run deploy ${CLOUD_RUN_SERVICE_NAME} \
  --image="${REGION}-docker.pkg.dev/${PROJECT_ID}/${ARTIFACT_REGISTRY_NAME}/${CLOUD_RUN_SERVICE_NAME}:latest" \
  --region="${REGION}" \
  --no-allow-unauthenticated \
  --project ${PROJECT_ID}
- Retrieve and save the Cloud Run service URL:
RUN_URL="$(gcloud run services describe ${CLOUD_RUN_SERVICE_NAME} --region ${REGION} --project ${PROJECT_ID} --format="get(status.address.url)")"
- Create a BigQuery connection for accessing Cloud Run:
bq mk --connection \
  --display_name='External antipattern function connection' \
  --connection_type=CLOUD_RESOURCE \
  --project_id="${PROJECT_ID}" \
  --location="${REGION}" \
  ext-${CLOUD_RUN_SERVICE_NAME}
- Find the service account used by the BigQuery connection and remove the surrounding quotes:
CONNECTION_SA="$(bq --project_id ${PROJECT_ID} --format json show --connection ${PROJECT_ID}.${REGION}.ext-${CLOUD_RUN_SERVICE_NAME} | jq '.cloudResource.serviceAccountId')"
CONNECTION_SA="${CONNECTION_SA%\"}" # Remove trailing quote
CONNECTION_SA="${CONNECTION_SA#\"}" # Remove leading quote
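The two parameter expansions above simply trim one double quote from each end of the jq output. You can verify the idiom locally on a sample value (the service account name below is made up for illustration):

```shell
# jq without -r returns the JSON string value wrapped in double quotes
SAMPLE='"connection-sa@example.iam.gserviceaccount.com"'
SAMPLE="${SAMPLE%\"}"  # remove trailing quote
SAMPLE="${SAMPLE#\"}"  # remove leading quote
echo "$SAMPLE"
```

Alternatively, `jq -r` prints the raw string without quotes in the first place; the parameter expansions achieve the same result.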
- Grant the BigQuery connection's service account the Cloud Run Invoker role so it can call the Cloud Run service:
gcloud projects add-iam-policy-binding ${PROJECT_ID} \
  --member="serviceAccount:${CONNECTION_SA}" \
  --role='roles/run.invoker'
- Define the BigQuery dataset in which to create the remote function:
BQ_FUNCTION_DATASET="fns"
- Create the dataset if it doesn't exist:
bq mk --dataset \
  --project_id ${PROJECT_ID} \
  --location ${REGION} \
  ${BQ_FUNCTION_DATASET}
- Create the anti-pattern remote function:
bq query --project_id ${PROJECT_ID} \
  --use_legacy_sql=false \
  "CREATE OR REPLACE FUNCTION ${BQ_FUNCTION_DATASET}.get_antipatterns(query STRING)
  RETURNS JSON
  REMOTE WITH CONNECTION \`${PROJECT_ID}.${REGION}.ext-${CLOUD_RUN_SERVICE_NAME}\`
  OPTIONS (endpoint = '${RUN_URL}');"
- Ensure that the BigQuery dataset and the Cloud Run service are in the same cloud region.
- The anti-pattern remote function currently does not support the --advanced-analysis flag or AI rewrite.