Skip to content

Latest commit

 

History

History
774 lines (484 loc) · 29.5 KB

File metadata and controls

774 lines (484 loc) · 29.5 KB

HOL-3: Exercise 2: Azure Arc enabled PostgreSQL Hyperscale(READ ONLY)

Note: This is read only exercise, Azure Arc enabled PostgreSQL is in preview and Microsoft product team is working on enhancing it. This exercise is not dependent on any other exercise and users will be able to complete the other exercises without any issues

Duration: 40 minutes

Contoso has some applications that use PostgreSQL as the backend database. They have installed PostgreSQL on their Linux servers in their manufacturing plants but these locations don’t necessarily have local IT support to update the operating system and PostgreSQL with the latest security updates. Additionally, they have just recently migrated some of their very large Oracle databases to PostgreSQL. They have explored Azure Database for PostgreSQL Hyperscale and found that it meets their requirements and offers some unique capabilities such as distribution of the data across multiple nodes. However, due to latency issues and some compliance reasons, they want to run PostgreSQL databases in their own environment. Therefore they are excited about the opportunity of deploying PostgreSQL Hyperscale in their Azure Arc enabled environment.

Now that you are familiar with the existing Kubernetes cluster and Data controller, let's perform the following in this exercise:

  • Create a Postgres Hyperscale Server group
  • Configure & Scale, Connect source database
  • Backup & Restore on Postgres DB
  • Monitor/Visualize with Grafana & Kibana Dashboards

Task 1: Create a Postgres Hyperscale Server group and connect to the Azure Arc enabled PostgreSQL Hyperscale server group.

Contoso is happy with the current performance of PostgreSQL Hyperscale but they also anticipate that when they go in production, the database will grow significantly and there will be more users querying the database. Therefore they want to leverage the unique capability of PostgreSQL Hyperscale to scale out with additional worker nodes. They also like the fact that they will be able to add nodes and redistribute the data across the worker nodes as an online operation.

Info: (Server group)- The nodes in the server group collectively hold more data and use more CPU cores than would be possible on a single server. This architecture also allows the database to scale by adding more nodes to the server group.

Let's begin with learning how to create and connect the Postgres Hyperscale Server group.

  1. If you are not already in the Azure Arc Data Controller Dashboard, perform this step, else skip to next step. First, in the Connections panel, under AZURE ARC CONTROLLERS, right-click on the arcdc data controller and select Manage.

  2. Then click on + New Instance in the top left corner of the Azure Arc Data Controller dashboard.

  3. From the deployment options blade, select the PostgreSQL Hyperscale server group - Azure Arc and click on Select

  4. In the next blade, Accept the Privacy and license terms and click Next at the bottom.

  5. In the Deploy PostgreSQL Hyperscale server group - Azure Arc blade, enter the following information:

    Under General settings

    • Server group name: Enter postgres

      postgres
    • Password: Enter Password.!!1

      Password.!!1
    • Number of workers: Enter 2

      2
    • Port: Leave it default

    Note: By taking 2 worker nodes initially will help server group hold more data and use more CPU cores to run the queries across distributed data efficiently right from the startup. and leave empty for extension.

    Under Worker Nodes Compute Configuration

    • CPU request: Enter 2

      2
    • CPU limit: Enter 2.5

      2.5
    • Memory request: Enter 2.5

      2.5
    • Memory limit: Enter 2.5

      2.5

    Under Coordinator Node Compute Configuration

    • CPU request: Enter 2

      2
    • CPU limit: Enter 2.5

      2.5
    • Memory request: Enter 2.5

      2.5
    • Memory limit: Enter 2.5

      2.5

  6. Leave the Storage setting default.

  7. Once the values are entered, click on the Deploy button. This initiates the creation of the Azure Arc enabled PostgreSQL Hyperscale server group on the Arc enabled data services environment.

  8. Now, select New Python installation and click on Next button.

  9. On Install Dependencies tab, click on Install.

  10. Now a deploy.postgres.existing.arc Notebook will open after clicking on the deploy button and automatically execute the cells to deploy the PostgreSQL Hyperscale servergroup.

  11. Scroll down to the bottom of the page and you will see that the PostgreSQL Hyperscale - Azure Arc instance creation has been started.

  12. The deployment will take around 5 to 10 minutes and after successful deployment, you can see the output "arcpostgres is Ready"

    Please note that you have to wait for the deployment to be ready to proceed to the next step.

  13. Once the deployment is complete, in Azure Arc Data Controller dashboard page in the Azure Data Studio, click on Refresh.

  14. Under Azure Arc Resources, you will be able to see the recently created Postgres Hyperscale Server group. After that, minimize the Azure Data Studio.

Connect source database and Azure Arc enabled PostgreSQL server

  1. Open the command prompt window and run the following to get the Postgres servers list in AKS using the Azure Arc data controller. You will see the output as mentioned in the screenshot below.

    az postgres arc-server list --k8s-namespace arcdc --use-k8s

  2. On your LabVM, open the PgAdmin tool from desktop to connect to the Azure arc enabled Postgres Hyperscale database.

    Note: If you get any prompts to unlock master password, please provide Password.!!1, select Save Password and then click on OK.

    Password.!!1
  3. Now, right-click on the Servers and click on Create. Then, click on Server.

  4. Now provide the following details:

    • Name: postgres01
      postgres01

  5. Run the following command in command prompt window to get connection string details for postgres01 server endpoint.

     az postgres arc-server endpoint list --k8s-namespace arcdc --use-k8s

  6. Now, select the connection tab and enter the connection string details by getting the values from the output of the previous azdata endpoint list command. You will see the following line in the previous output. Fetch the IP Address from the endpoint URL and add it in the connection string details.

    PostgreSQL Instance   postgresql://postgres:<replace with password>@40.121.8.176:5432
    • Host name/address: Provide the copied IP Address
    • Port: 5432
    • Username: postgres
    • Password:
      Password.!!1
      
    • Tick "Save Password?" field

  7. After you click on Save, you will be connected to the server.

Task 2: How to migrate from different servers?

Now, let's migrate a PostgreSQL database to Azure Arc enabled PostgreSQL Hyperscale server group. The following steps are applicable to PostgreSQL database running on any cloud server or on-premises.

Backup from Source Server:

Let's take a backup of the Postgres Server running on a server named PostgreLocal. It contains a database named Arc-Demo-PG which has 4 tables.

Take a backup of the source database in on-premises

  1. In the pgAdmin, click on the arrow icon next to PostgreLocal Server to expand the menu. If you get any prompts for password, please provide "Password.!!1"

    Password.!!1

  2. You will get below screen after entring the password.

  3. Then right-click on the Arc-Demo-PG and select Backup... option.

  4. Then provide the following details:

    • Filename: C:\Users\arcadmin\Arc-Demo-Bkp

      C:\Users\arcadmin\Arc-Demo-Bkp
    • Format: Custom

      Custom

  5. You can keep rest of the values as default and then click on Backup.

  6. Now, the backup job will get started and will get completed successfully in a few seconds.

  7. Close the backup prompt by clicking on the close icon (x) in the prompt.

Restore to Destination Server:

Now let's restore the sample database Arc-Demo-Bkp from source server PostgresLocal on to the Postgres server postgres01 running on an Azure Arc environment with a new Database named Restored_Arc-Demo-Bkp.

First, let's create an empty database on the destination system in your Azure Arc enabled PostgreSQL Hyperscale server group

  1. Right-click on the postgres01 server on the pgAdmin and then Select Create -> Database option from postgres01 database menu

  2. In the create window that comes up, provide the Database name as Restored_Arc-Demo-PG and then click on the Save button at the bottom of the blade.

Now, you will restore the database into your Arc enabled data services environment:

  1. Expand the postres01 server by clicking on the " > " icon next to postgres01.

  2. Right-click on the Restored_Arc-Demo-PG database and then click on the Restore option.

  3. In the restore window that comes up, provide the following path for the Filename: C:\Users\arcadmin\Arc-Demo-Bkp

    C:\Users\arcadmin\Arc-Demo-Bkp
  4. Keep the default values for the rest of the options and then click on Restore.

  5. Now, a restore job will be created which will take around 1 second to complete successfully.

  6. Once the restoration is successful, you can close the prompt by clicking on x icon at the top right of the prompt.

Verification of the restored database on Azure Arc enabled PostgreSQL Hyperscale server group:

Now, we are done with the backup and restoration, let's verify if the database has been restored from the database running on the source server to the Arc enabled PostgreSQL Server by querying the data in both databases.

Now, use PgAdmin in which you are already connected to both Postgres instances ( local and Arc ).

  1. First, you will connect to the database Arc-Demo-PG in PostgreLocal server and query the row count in table pgbench_accounts:

  2. Right-click on the database Arc-Demo-PG and select Query tool to open a new query window

  3. Copy and paste the below query in the Query Editor

    select count(*) from Public.pgbench_accounts;
  4. Click on > icon on the top right to run the query to see the no of rows of pgbench_accounts table

    You'll see that the no of rows of data is 10000.

  5. Now since we have count from the local database, let's see the output for the same query in the new database we created. Connect to the database Restored_Arc-Demo-PG and query the row count in table pgbench_accounts:

  6. Right-click on the database Restored_Arc-Demo-PG and select Query tool to open a new query window

  7. Copy and paste the below query in the Query Editor

    select count(*) from Public.pgbench_accounts;
  8. Click on > icon on the top right to run the query to see the no of rows of pgbench_accounts table

The database migration was successful if the row counts are the same in both Arc-Demo-PG and Restored_Arc-Demo-PG Databases.

Task 3: Distribution(Shard) of data on worker nodes in Azure Arc PostgreSQL – Hyperscale

In this task, let us work on the Distribution (Shard) of data on worker nodes.

Info: Distributing table rows across multiple PostgreSQL servers is a key technique for scalable queries in Hyperscale (Citus). Together, multiple nodes can hold more data than a traditional database, and in many cases can use worker CPUs in parallel to execute queries. To know more about Shard data on PostgreSQL – Hyperscale, you can check : Shard data on PostgreSQL – Hyperscale

First, open the query tool by right-clicking on postgres database in postgres01 server and select the Query Tool... option.

In the previous tasks, we had created a Postgres Hyperscale server group with two worker nodes.

  1. First run the following query to check the active workers in the pg_dist_node table.

    select nodeid from pg_dist_node where isactive;

    You will get a similar result

Distribution of table data throughout the Hyperscale server group.

  1. You have to distribute table data throughout the Hyperscale server group.

    Info: Distributing a table assigns each row to a logical group called a shard.

    Run following queries to create a table and distribute it

    create table users ( email text primary key, bday date not null );

    select create_distributed_table('users', 'email');

    Hyperscale assigns each row to a shard. Every row will be in exactly one shard, and every shard can contain multiple rows.

  2. By default create_distributed_table() makes 32 shards, as you can see by counting in the metadata table pg_dist_shard

    Run following query to check the counting in the metadata table pg_dist_shard

    select logicalrelid, count(shardid) from pg_dist_shard group by logicalrelid;

  3. Run Following query to create sample data for users table

    insert into users
    select
       md5(random()::text) || '@test.com',
       date_trunc('day', now() - random()*'100 years'::interval)
    from generate_series(1, 1000);

View distributed data by Querying distributed tables

In the previous sections, you have seen how distributed table rows are placed in shards on worker nodes. Now let's see the distributed data present in the users table.

  1. You can look at the shard placements in pg_dist_placement. Joining it with the other metadata tables you can see where each shard is placed.

    select
     shard.logicalrelid as table,
     placement.shardid as shard,
     node.nodename as host
    from
     pg_dist_placement placement,
     pg_dist_node node,
     pg_dist_shard shard
    where placement.groupid = node.groupid
      and shard.shardid = placement.shardid
    order by shard
    limit 5;

    Note: In the above result you can see distribution of shards into different worker nodes under host text coloumn which shows DNS names or IP address of worker nodes.

  2. Run the following query to find the average age of users, treating the distributed users table like it's a normal table

    select avg(current_date - bday) as avg_days_old from users;

    The output shows an example of a Distribution (sharding) of data across worker nodes.

Task 4: Backup/Restore and Review the distribution of data

In this task, let's see how to backup the database in Arc enabled PostgreSQL Servers and restore it.

  1. To get started, open the Command Prompt window.

  2. Now, let's verify if the Hyperscale server group has been configured to use a backup storage class.

  3. Run the following command to view details of the existing PostgreSQL Hyperscale server

    az postgres arc-server show --name postgres --k8s-namespace arcdc --use-k8s

    Scroll and look at the storage section of the output and confirm backup storage class is there:

Take a manual full backup

  1. Run the following command to take a full backup of the entire data and log folders of the Postgres server group by :

    azdata arc postgres backup create --name backup01 --server-name arcpostgres

    This command will coordinate a distributed full backup across all the nodes that constitute Azure Arc enabled PostgreSQL Hyperscale server group.

    Where:

    • name indicates the name of a backup
    • server-name indicates a server group

    When the backup completes, the ID, name, size, state, and timestamp of the backup will be returned as showed in the image below.

    In the above result, "+00:00" means UTC time (UTC + 00 hour 00 minutes)

List backups

  1. Now let's list the backups that are available to restore.

    azdata arc postgres backup list --server-name arcpostgres

    Returns an output as mentioned in the image below:

Restore a full backup

  1. After confirming that the backup was successful, let's restore the backup of server group postgres onto itself.

    Run following command to restore postgres DB onto itself. Get the backup-id from the output of the last command.

    azdata arc postgres backup restore -sn arcpostgres --backup-id <backup-id>

    Note that this operation is only supported for PostgreSQL version 12 and higher.

Reviewing Distribution of data

Note: Distributing a table assigns each row to a logical group called a shard. Here only the table and it's rows are distributed to different shards. So, the resulted output data will remain same even after distributing the table rows.

  1. Now, go back to the pgAdmin portal in Edge Browser. In distributed table by default create_distributed_table() makes 32 shards, as you can see by counting in the metadata table pg_dist_shard

    Run following query against postgres database in postgres01 server to check the count in the metadata table pg_dist_shard

    select logicalrelid, count(shardid) from pg_dist_shard group by logicalrelid;

  2. You can look at the shard placements in pg_dist_placement. Joining it with the other metadata tables you can see where each shard is placed.

    select
     shard.logicalrelid as table,
     placement.shardid as shard,
     node.nodename as host
    from
     pg_dist_placement placement,
     pg_dist_node node,
     pg_dist_shard shard
    where placement.groupid = node.groupid
      and shard.shardid = placement.shardid
    order by shard
    limit 5;

    Note: In the above result you can see distribution of shards into different worker nodes under host text coloumn.

  3. Run the following query to find the average age of users which gets data from distributed table users.

    select avg(current_date - bday) as avg_days_old from users;

    You have executed the above query by treating the distributed users table like it's a normal table

    The output shows distributed data across worker nodes.

Task 5: Configure & Scale, and review the distribution of data.

Contoso is happy with the current performance of PostgreSQL Hyperscale but they also anticipate that when they go in production, the database will grow significantly and there will be more users querying the database. Therefore they want to leverage the unique capability of PostgreSQL Hyperscale to scale out with additional worker nodes. They also like the fact that they will be able to add nodes and redistribute the data across the worker nodes as an online operation.

Now let's see how to Configure & Scale, and review the distribution of data on the Database.

  1. Launch Command Prompt from desktop of the LabVM.

  2. Run the following query to verify that you currently have three Hyperscale worker nodes, each corresponding to a Kubernetes pod.

    azdata arc postgres server list

  3. Now to scale-out Azure Arc enabled PostgreSQL Hyperscale, in the command prompt run the following command. This will increase the number of worker nodes from 2 to 3.

    azdata arc postgres server edit -n arcpostgres -w 3

    Note: Command will take few minutes to complete

  4. Run the following command and verify that the server group is now using the additional worker nodes you added. In the output, you should be able to see 3 worker nodes.

    azdata arc postgres server list

    Info: Once the nodes are available, the Hyperscale Shard Rebalancer runs automatically and redistributes the data to the new nodes. The scale-out operation is an online operation. While the nodes are added and the data is redistributed across the nodes, the data remains available for queries.

  5. You can also scale up, from PostgreSQL Hyperscale - Azure Arc Dashboard.

Reviewing Distribution of data

Note: Distributing a table assigns each row to a logical group called a shard. Here only the table and it's rows are distributed to different shards. So, the resulted output data will remain same even after distributing the table rows.

  1. Now, go back to the pgAdmin portal in Edge Browser. In distributed table by default create_distributed_table() makes 32 shards, as you can see by counting in the metadata table pg_dist_shard

    Run following query to check the count in the metadata table pg_dist_shard

    select logicalrelid, count(shardid) from pg_dist_shard group by logicalrelid;

  2. You can look at the shard placements in pg_dist_placement. Joining it with the other metadata tables you can see where each shard is placed.

    select
     shard.logicalrelid as table,
     placement.shardid as shard,
     node.nodename as host
    from
     pg_dist_placement placement,
     pg_dist_node node,
     pg_dist_shard shard
    where placement.groupid = node.groupid
      and shard.shardid = placement.shardid
    order by shard
    limit 5;

    Note: In the above result you can see distribution of shards into different worker nodes under host text coloumn.

  3. Run the following query to find the average age of users which gets data from distributed table users.

    select avg(current_date - bday) as avg_days_old from users;

    You have executed the above query by treating the distributed users table like it's a normal table. You will get a similar output as shown below.

Task 6: Monitor/Visualize with Dashboards

Now that you are connected to a data controller, let's view the dashboards for the data controller and any SQL managed instances or PostgreSQL Hyperscale server group resources that you have.

  1. Open Azure Data Studio. In the Connections panel, under Arc Controllers right-click on the arcdc data controller and select Manage. and then click on arcpostgres under Azure Arc Resources.

  2. In the Azure Arc Data Controller dashboard, you can see Grafana and Kibana Dashboard URLs along with details about the data controller resource.

    Info: Kibana and Grafana web dashboards are provided to bring insight and clarity to the Kubernetes namespaces being used by Azure Arc enabled data services.

  3. Now copy the endpoint for Kibana dashboard and paste this endpoint url in a browser. If you get a prompt - connection isn't private, you can click on Advanced and then select Continue to ip address to access the link.

  4. Enter below user name and password for Postgres DB.

    Note You have to enter the credentials of the Azure Arc data controller.

    • User name : arcuser

      arcuser
    • Password : Password.1!!

      Password.1!!

    You can see all logs under discover tab from left side menu.

Info: You can learn more about kibana here: https://docs.microsoft.com/en-us/azure/azure-arc/data/monitor-grafana-kibana

View the Visualization and metric using grafana graph

  1. Now, copy the endpoint for the Grafana dashboard and paste this endpoint url in a browser. If you get a prompt - connection isn't private, you can click on Advanced and then select Continue to ip address to access the link.

  2. Enter below user name and password for Postgres DB.

    • Note You have to enter the credentials of the Azure Arc data controller.

    • User name : arcuser

      arcuser
    • Password : Password.1!!

      Password.1!!

    You can explore the metrics for postgres server on this Grafana page.

    You can learn more about Grafana here: https://docs.microsoft.com/en-us/azure/azure-arc/data/monitor-grafana-kibana

Task 7: Upload logs, metrics, and usages to Azure Monitor.

In this task, let's upload logs for your Azure Arc enabled PostgreSQL Hyperscale server groups to Azure Monitor and view your logs in the Azure portal

  1. Now in the command prompt run the following command to make sure that all environment variables required are set. As you can see from the outputs, we have already set the values in the environment for the variables. Azure Log Analytics Workspace has been pre-created for you and the Workspace ID and Shared Key are also added in the environment variables.

    echo %WORKSPACE_ID%
    echo %WORKSPACE_SHARED_KEY%
    echo %SPN_TENANT_ID%
    echo %SPN_CLIENT_ID%
    echo %SPN_CLIENT_SECRET%
    echo %SPN_AUTHORITY%

    Note: You have already added the variables of Service principal details, So you don't have to add these variables value.

  2. In the command prompt run the following to log in to the Azure Arc data controller. Follow the prompts to set the namespace as arcdc.

    azdata login
    • Namespace: Enter arcdc
    arcdc
    

  3. Export all logs to the specified file:

    azdata arc dc export --type logs --path logs.json
  4. Upload logs to an Azure monitor log analytics workspace:

    azdata arc dc upload --path logs.json

  5. Now to view your logs in the Azure portal, open the Azure portal and then search for log analytics workspace in the search bar at the top and then open it from the search results.

  6. In the Log Analytics workspaces page, select logazure-arc workspace.

  7. Now in your log analytics workspace page, from the left navigation menu under General select Logs and click on Get Started.

  8. Now, click on the x to close the Query popup.

  9. And then, click on >> icon to expand the Schema and Filter tab.

  10. Then, check if CustomLogs is there under Tables section. If you don't see CustomLogs there, refresh the page every 2 minutes until it is available. Expand Custom Logs at the bottom of the list of tables and you will see a table with name postgresInstances_logs_CL.

  11. Double click on the table called postgresInstances_logs_CL, you will see that a query window open. Now click on the run button to run the query.

  12. After running the query you will see some outputs generated.

After this exercise, you have learned the following

  • Create a Postgres Hyperscale Server group.
  • Backup/Restore and Review the distribution of data.
  • How to migrate from different server.
  • Distribution(Shard) of data on worker nodes.
  • Configure & Scale, and review the distribution of data.
  • Monitor/Visualize with Azure Data Studio Dashboards.
  • Upload usage data, metrics, and logs to Azure Monitor.