Commit
Ingest all datasets and fix scopes array bug (microsoft#1011)
flanakin authored Sep 27, 2024
1 parent 12bfaa9 commit cbf21df
Showing 23 changed files with 1,481 additions and 789 deletions.
4 changes: 0 additions & 4 deletions docs/_reporting/hubs/README.md
@@ -100,10 +100,6 @@ Once deployed, you can report on the data in Power BI or by connecting to the st

## ➕ Create a new hub

<blockquote class="note" markdown="1">
_FinOps hubs 0.4 introduces support for FOCUS 1.0. This is **not** a breaking change and is completely backwards compatible with v0.3. To learn more, please refer to the [Upgrade guide](./upgrade.md)._
</blockquote>

1. **Deploy your FinOps hub.**

{% include deploy.html template="finops-hub" public="1" gov="0" china="0" %}
32 changes: 24 additions & 8 deletions docs/_reporting/hubs/configure-scopes.md
@@ -45,8 +45,6 @@ If you cannot grant permissions for your scope, you can create Cost Management e
- **Type of data** = `Cost and usage details (FOCUS)`<sup>1</sup>
- **Dataset version** = `1.0`<sup>2</sup>
- **Frequency** = `Daily export of month-to-date costs`<sup>3</sup>
- **File partitioning** = On
- **Overwrite data** = Off<sup>4</sup>
- **Storage account** = (Use subscription/resource deployed with your hub)
- **Container** = `msexports`
- **Format** = `CSV`
@@ -56,14 +54,32 @@ If you cannot grant permissions for your scope, you can create Cost Management e
- _**MCA billing profile:** `billingProfiles/{billing-profile-id}`_
- _**Subscription:** `subscriptions/{subscription-id}`_
- _**Resource group:** `subscriptions/{subscription-id}/resourceGroups/{rg-name}`_
- **Format** = Parquet
- **Compression** = Snappy
- **File partitioning** = On
- **Overwrite data** = Off<sup>4</sup>

2. Create another export with the same settings except set **Frequency** to `Monthly export of last month's costs`.
3. Run your exports to initialize the dataset.
3. Create exports for any additional data you would like to include in your reports.
- Supported datasets and versions:
- Price sheet (2023-05-01)
- Reservation details (2023-03-01)
- Reservation recommendations (2023-05-01)
<blockquote class="note" markdown="1">
_Virtual machine reservation recommendations exports are required on the Reservation recommendations page of the Rate optimization report. If you do not create an export, the page will be empty._
</blockquote>
- Reservation transactions (2023-05-01)
- Supported formats: Parquet (preferred) or CSV
- Supported compression: Snappy or uncompressed
<blockquote class="important" markdown="1">
_GZip compression is not supported as of FinOps hubs 0.6. Please use uncompressed CSV if snappy parquet is not supported for the desired dataset._
</blockquote>
4. Run your exports to initialize the dataset.
- Exports can take up to a day to show up after they are first created.
- Use the **Run now** command at the top of the Cost Management Exports page.
- Your data should be available within 15 minutes or so, depending on how big your account is.
- If you want to backfill data, open the export details and select the **Export selected dates** command to export one month at a time or use the [Start-FinOpsCostExport PowerShell command](../../_automation/powershell/cost/Start-FinOpsCostExport.md) to export a larger date range.
4. Repeat steps 1-3 for each scope you want to monitor.
5. Repeat steps 1-4 for each scope you want to monitor.
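
Because backfill exports run one month at a time, it can help to enumerate the month boundaries before queuing them. The following is a minimal Python sketch, not part of the FinOps toolkit; the `month_ranges` helper and its parameters are hypothetical names used only for illustration.

```python
from datetime import date, timedelta


def month_ranges(last_month: date, months: int) -> list[tuple[date, date]]:
    """Return (first_day, last_day) pairs for `months` calendar months.

    The list ends with the month containing `last_month` and is ordered
    oldest first. Hypothetical helper, not a FinOps toolkit API.
    """
    ranges = []
    year, month = last_month.year, last_month.month
    for _ in range(months):
        first = date(year, month, 1)
        # The last day of a month is the day before the first day of the next month.
        if month == 12:
            next_first = date(year + 1, 1, 1)
        else:
            next_first = date(year, month + 1, 1)
        ranges.append((first, next_first - timedelta(days=1)))
        # Step back one calendar month.
        if month == 1:
            year, month = year - 1, 12
        else:
            month -= 1
    return list(reversed(ranges))


if __name__ == "__main__":
    for start, end in month_ranges(date(2024, 9, 27), 3):
        print(start.isoformat(), end.isoformat())
```

Each pair could then be used as the date range for one **Export selected dates** run or one backfill job.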

_<sup>1) FinOps hubs 0.2 and beyond require FOCUS cost data. As of July 2024, the option to export FOCUS cost data is only accessible from the central Cost Management experience in the Azure portal. If you do not see this option, please search for or navigate to [Cost Management Exports](https://portal.azure.com/#blade/Microsoft_Azure_CostManagement/Menu/open/exports).</sup>_
_<sup>2) FinOps hubs 0.4 supports both FOCUS 1.0 and FOCUS 1.0 preview. Power BI reports in 0.4 are aligned to FOCUS 1.0 regardless of whether data was ingested as FOCUS 1.0 preview. If you need 1.0 preview data and reports, please use FinOps hubs 0.3.</sup>_
@@ -129,11 +145,11 @@ Managed exports use a managed identity (MI) to configure the exports automatical

3. **Backfill historical data.**

As soon as you configure a new scope, FinOps hubs will start to monitor current and future costs. To backfill historical data, you must run the **config_RunBackfill** pipeline.
As soon as you configure a new scope, FinOps hubs will start to monitor current and future costs. To backfill historical data, you must run the **config_RunBackfillJob** pipeline for each month.

To run the pipeline from the Azure portal:

1. From the FinOps hub resource group, open the Data Factory instance, select **Launch Studio**, and navigate to **Author** > **Pipelines** > **config_RunBackfill**.
1. From the FinOps hub resource group, open the Data Factory instance, select **Launch Studio**, and navigate to **Author** > **Pipelines** > **config_RunBackfillJob**.
2. Select **Debug** in the command bar to run the pipeline. The total run time will vary depending on the retention period and number of scopes you're monitoring.

To run the pipeline from PowerShell:
@@ -146,7 +162,7 @@ Managed exports use a managed identity (MI) to configure the exports automatical
Invoke-AzDataFactoryV2Pipeline `
-ResourceGroupName $_.ResourceGroupName `
-DataFactoryName $_.DataFactoryName `
-PipelineName 'config_RunBackfill'
-PipelineName 'config_RunBackfillJob'
}
```

@@ -220,7 +236,7 @@ If this is the first time you are using the FinOps toolkit PowerShell module, re
2. Create the export and run it now to backfill up to 12 months of data.

```powershell
New-FinopsCostExport -Name 'ftk-FinOpsHub-costs' `
New-FinOpsCostExport -Name 'ftk-FinOpsHub-costs' `
-Scope "{scope-id}" `
-StorageAccountId "{storage-resource-id}" `
-Backfill 12 `
146 changes: 136 additions & 10 deletions docs/_reporting/hubs/data-processing.md
@@ -18,6 +18,7 @@ From data cleanup to normalization, FinOps hubs do the work so you can focus on
- [📥 Data ingestion](#-data-ingestion)
- [ℹ️ About ingestion](#ℹ️-about-ingestion)
- [ℹ️ About exports](#ℹ️-about-exports)
- [🗃️ FinOps hubs v0.4-0.5](#️-finops-hubs-v04-05)
- [🗃️ FinOps hubs v0.2-0.3](#️-finops-hubs-v02-03)
- [🗃️ FinOps hubs v0.1](#️-finops-hubs-v01)
- [⏭️ Next steps](#️-next-steps)
@@ -55,8 +56,8 @@ This diagram shows what happens when the daily and monthly schedules are run.
```mermaid
sequenceDiagram
config->>config: ① config_Daily/MonthlySchedule
config->>config: ② config_ExportData
config->>config: ③ config_RunExports
config->>config: ② config_StartExportProcess
config->>config: ③ config_RunExportJobs
config->>Cost Management: ③ POST /exports/foo/run
Cost Management->>msexports: ④ Export data
msexports->>msexports: ⑤ msexports_ExecuteETL
@@ -67,12 +68,12 @@ sequenceDiagram
<br>

1. The **config_DailySchedule** and **config_MonthlySchedule** triggers run on their respective schedules to kick off data ingestion.
2. The **config_ExportData** pipeline gets the applicable exports for the schedule that is running.
3. The **config_RunExports** pipeline executes each of the selected exports.
2. The **config_StartExportProcess** pipeline gets the applicable exports for the schedule that is running.
3. The **config_RunExportJobs** pipeline executes each of the selected exports.
4. Cost Management exports raw cost details to the **msexports** container. [Learn more](#ℹ️-about-exports).
5. The **msexports_ExecuteETL** pipeline kicks off the extract-transform-load (ETL) process when files are added to storage.
6. The **msexports_ETL_ingestion** pipeline transforms the data to a standard schema and saves the raw data in parquet format to the **ingestion** container. [Learn more](#ℹ️-about-ingestion).
7. Power BI reads cost data from the **ingestion** container.
6. The **msexports_ETL_ingestion** pipeline transforms the data to parquet format and moves it to the **ingestion** container using a scalable file structure. [Learn more](#ℹ️-about-ingestion).
7. Power BI or other tools read data from the **ingestion** container.

<br>

@@ -81,14 +82,13 @@ sequenceDiagram
FinOps hubs rely on a specific folder path in the **ingestion** container:

```text
ingestion/{scope-id}/{month}/focuscost
ingestion/{dataset}/{yyyy}/{mm}/{scope-id}
```

- `ingestion` is the container where the data pipeline saves data.
- `{scope-id}` is expected to be the fully-qualified resource ID of the scope the data is from.
- `{dataset}` is the exported dataset type.
- `{month}` is the year and month of the exported data formatted as `yyyyMM`.
- `focuscost` is the exported dataset.
> Hubs 0.2 only supports FOCUS cost exports. Other export types will be added in a future release.
- `{scope-id}` is expected to be the fully-qualified resource ID of the scope the data is from.

If you need to use hubs to monitor non-Azure data, convert the data to [FOCUS](../../_docs/focus/README.md) and drop it into the **ingestion** container. Please note this has not been explicitly tested in the latest release. If you experience any issues, please [create an issue](https://aka.ms/ftk/idea).
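
For anyone dropping custom data into the **ingestion** container, the `ingestion/{dataset}/{yyyy}/{mm}/{scope-id}` layout can be sketched as a small path builder. This is a Python illustration only: the `ingestion_path` function is hypothetical, and it assumes the dataset name is used as given and that leading and trailing slashes on the scope ID are dropped, neither of which the docs state.

```python
def ingestion_path(dataset: str, year: int, month: int, scope_id: str) -> str:
    """Build a folder path following ingestion/{dataset}/{yyyy}/{mm}/{scope-id}.

    Hypothetical helper: the dataset name is passed through unchanged and the
    scope ID has surrounding slashes stripped (both are assumptions).
    """
    if not 1 <= month <= 12:
        raise ValueError(f"month must be 1-12, got {month}")
    scope = scope_id.strip("/")
    # Zero-pad the year to four digits and the month to two digits.
    return f"ingestion/{dataset}/{year:04d}/{month:02d}/{scope}"


if __name__ == "__main__":
    print(ingestion_path("focuscost", 2024, 9, "/subscriptions/{subscription-id}"))
```

Keeping the dataset and date ahead of the scope means all scopes for a given month sit under one folder prefix.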

@@ -155,6 +155,132 @@ FinOps hubs leverage the following properties:

<a name="datasets"></a>FinOps hubs support the following dataset types, versions, and API versions:

- FocusCost: `1.0`, `1.0-preview(v1)`
- PriceSheet: `2023-05-01`
- ReservationDetails: `2023-03-01`
- ReservationRecommendations: `2023-05-01`
- ReservationTransactions: `2023-05-01`
- API versions: `2023-07-01-preview`
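
To check an export definition against this list before creating it, the supported combinations can be captured in a lookup table. The Python sketch below copies the values from the list above; the `is_supported` function and the exact, case-sensitive matching are assumptions made for illustration, not hub behavior.

```python
# Dataset types and versions copied from the list above.
SUPPORTED_DATASETS = {
    "FocusCost": {"1.0", "1.0-preview(v1)"},
    "PriceSheet": {"2023-05-01"},
    "ReservationDetails": {"2023-03-01"},
    "ReservationRecommendations": {"2023-05-01"},
    "ReservationTransactions": {"2023-05-01"},
}
SUPPORTED_API_VERSIONS = {"2023-07-01-preview"}


def is_supported(dataset_type: str, data_version: str, api_version: str) -> bool:
    """Return True when the dataset type, version, and API version are all listed.

    Hypothetical helper; matching is exact and case-sensitive (an assumption).
    """
    versions = SUPPORTED_DATASETS.get(dataset_type, set())
    return data_version in versions and api_version in SUPPORTED_API_VERSIONS


if __name__ == "__main__":
    print(is_supported("FocusCost", "1.0", "2023-07-01-preview"))
```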

<br>

## 🗃️ FinOps hubs v0.4-0.5

### Scope setup in v0.4-0.5

This diagram shows what happens when a new, managed scope is added to a hub instance. Unmanaged scopes (where Cost Management exports are manually configured) do not require any setup in hubs.

```mermaid
sequenceDiagram
config->>config: ① config_SettingsUpdated
config->>config: ② config_ConfigureExports
config->>Cost Management: ② PUT .../exports/foo
```

1. The **config_SettingsUpdated** trigger runs when the **settings.json** file is updated.
2. The **config_ConfigureExports** pipeline creates new exports for any new scopes that were added.

### Data ingestion in v0.4-0.5

This diagram shows what happens when the daily and monthly schedules are run.

```mermaid
sequenceDiagram
config->>config: ① config_Daily/MonthlySchedule
config->>config: ② config_ExportData
config->>config: ③ config_RunExports
config->>Cost Management: ③ POST /exports/foo/run
Cost Management->>msexports: ④ Export data
msexports->>msexports: ⑤ msexports_ExecuteETL
msexports->>ingestion: ⑥ msexports_ETL_ingestion
Power BI-->>ingestion: ⑦ Read data
```

1. The **config_DailySchedule** and **config_MonthlySchedule** triggers run on their respective schedules to kick off data ingestion.
2. The **config_ExportData** pipeline gets the applicable exports for the schedule that is running.
3. The **config_RunExports** pipeline executes each of the selected exports.
4. Cost Management exports raw cost details to the **msexports** container. [Learn more](#about-exports-in-v04-05).
5. The **msexports_ExecuteETL** pipeline kicks off the extract-transform-load (ETL) process when files are added to storage.
6. The **msexports_ETL_ingestion** pipeline transforms the data to a standard schema and saves the raw data in parquet format to the **ingestion** container. [Learn more](#about-ingestion-in-v04-05).
7. Power BI reads cost data from the **ingestion** container.

### About ingestion in v0.4-0.5

FinOps hubs rely on a specific folder path in the **ingestion** container:

```text
ingestion/{scope-id}/{month}/focuscost
```

- `ingestion` is the container where the data pipeline saves data.
- `{scope-id}` is expected to be the fully-qualified resource ID of the scope the data is from.
- `{month}` is the year and month of the exported data formatted as `yyyyMM`.
- `focuscost` is the exported dataset.
> Hubs 0.2 only supports FOCUS cost exports. Other export types will be added in a future release.

If you need to use hubs to monitor non-Azure data, convert the data to [FOCUS](../../_docs/focus/README.md) and drop it into the **ingestion** container. Please note this has not been explicitly tested in the latest release. If you experience any issues, please [create an issue](https://aka.ms/ftk/idea).

### About exports in v0.4-0.5

FinOps hubs leverage Cost Management exports to obtain cost data. Cost Management controls the folder structure for the exported data in the **msexports** container. A typical path looks like:

```text
{container}/{path}/{date-range}/{export-name}/{export-time}/{guid}/{file}
```

As of 0.4, FinOps hubs do not rely on file paths. Hubs utilize the manifest file to identify the scope, dataset, month, etc. The only important part of the path for hubs is the container, which must be **msexports**.

<blockquote class="warning" markdown="1">
_Do not export data to the **ingestion** container. Exported CSVs **must** be published to the **msexports** container to be processed by the hubs engine._

_To ingest custom data, save FOCUS-aligned parquet files in the **ingestion** container for the FinOps toolkit Power BI reports to work as expected._
</blockquote>

Export manifests can change with API versions. Here's an example with API version `2023-07-01-preview`:

```json
{
"exportConfig": {
"exportName": "<export-name>",
"resourceId": "/<scope>/providers/Microsoft.CostManagement/exports/<export-name>",
"dataVersion": "<dataset-version>",
"apiVersion": "<api-version>",
"type": "<dataset-type>",
"timeFrame": "OneTime|TheLastMonth|MonthToDate",
"granularity": "Daily"
},
"deliveryConfig": {
"partitionData": true,
"dataOverwriteBehavior": "CreateNewReport|OverwritePreviousReport",
"fileFormat": "Csv",
"containerUri": "<storage-resource-id>",
"rootFolderPath": "<path>"
},
"runInfo": {
"executionType": "Scheduled",
"submittedTime": "2024-02-03T18:33:03.1032074Z",
"runId": "af754a8e-30fc-4ef3-bfc6-71bd1efb8598",
"startDate": "2024-01-01T00:00:00",
"endDate": "2024-01-31T00:00:00"
},
"blobs": [
{
"blobName": "<path>/<export-name>/<date-range>/<export-time>/<guid>/<file-name>.csv",
"byteCount": ###
}
]
}
```

FinOps hubs leverage the following properties:

- `exportConfig.resourceId` to identify the scope.
- `exportConfig.type` to identify the dataset type.
- `exportConfig.dataVersion` to identify the dataset version.
- `runInfo.startDate` to identify the exported month.
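
The property lookups above can be sketched in Python against the manifest shape shown in the JSON example. This is an illustration, not the hub pipeline's actual implementation: `read_manifest` is a hypothetical name, the scope is assumed to be everything before `/providers/Microsoft.CostManagement/exports/` in `resourceId` (matched case-sensitively), and the month is formatted as `yyyyMM` to match the `{month}` folder convention.

```python
import json
from datetime import datetime

# Marker that separates the scope from the export name in the resource ID.
EXPORT_MARKER = "/providers/Microsoft.CostManagement/exports/"


def read_manifest(manifest_text: str) -> dict:
    """Extract scope, dataset type, dataset version, and month from a manifest.

    Hypothetical helper based only on the properties listed in the docs.
    """
    manifest = json.loads(manifest_text)
    config = manifest["exportConfig"]
    # Everything before the marker is treated as the scope (an assumption).
    scope, _, _ = config["resourceId"].partition(EXPORT_MARKER)
    start = datetime.fromisoformat(manifest["runInfo"]["startDate"])
    return {
        "scope": scope,
        "dataset_type": config["type"],
        "dataset_version": config["dataVersion"],
        "month": start.strftime("%Y%m"),
    }


if __name__ == "__main__":
    sample = {
        "exportConfig": {
            "resourceId": "/subscriptions/{subscription-id}" + EXPORT_MARKER + "{export-name}",
            "dataVersion": "1.0",
            "type": "FocusCost",
        },
        "runInfo": {"startDate": "2024-01-01T00:00:00"},
    }
    print(read_manifest(json.dumps(sample)))
```

Reading these values from the manifest rather than the blob path is what lets hubs ignore the export folder structure.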

FinOps hubs support the following dataset types, versions, and API versions:

- FocusCost
- 1.0
- 1.0-preview(v1)
12 changes: 6 additions & 6 deletions docs/_reporting/hubs/template.md
@@ -117,16 +117,16 @@ Resources use the following naming convention: `<hubName>-<purpose>-<unique-suff
- `msexports_ExecuteETL` – Queues the `msexports_ETL_ingestion` pipeline to account for Data Factory pipeline trigger limits.
- `msexports_ETL_transform` – Converts Cost Management exports into parquet and removes historical data duplicated in each day's export.
- `config_ConfigureExports` – Creates Cost Management exports for all scopes.
- `config_BackfillData` – Runs the backfill job for each month based on retention settings.
- `config_RunBackfill` – Creates and triggers exports for all defined scopes for the specified date range.
- `config_ExportData` – Gets a list of all Cost Management exports configured for this hub based on the scopes defined in settings.json, then runs each export using the config_RunExports pipeline.
- `config_RunExports` – Runs the specified Cost Management exports.
- `config_StartBackfillProcess` – Runs the backfill job for each month based on retention settings.
- `config_RunBackfillJob` – Creates and triggers exports for all defined scopes for the specified date range.
- `config_StartExportProcess` – Gets a list of all Cost Management exports configured for this hub based on the scopes defined in settings.json, then runs each export using the config_RunExportJobs pipeline.
- `config_RunExportJobs` – Runs the specified Cost Management exports.
- `msexports_ExecuteETL` – Triggers the ingestion process for Cost Management exports to account for Data Factory pipeline trigger limits.
- `msexports_ETL_transform` – Converts Cost Management exports into parquet and removes historical data duplicated in each day's export.
- Triggers:
- `config_SettingsUpdated` – Triggers the `config_ConfigureExports` pipeline when settings.json is updated.
- `config_DailySchedule` – Triggers the `config_RunExports` pipeline daily for the current month's cost data.
- `config_MonthlySchedule` – Triggers the `config_RunExports` pipeline monthly for the previous month's cost data.
- `config_DailySchedule` – Triggers the `config_RunExportJobs` pipeline daily for the current month's cost data.
- `config_MonthlySchedule` – Triggers the `config_RunExportJobs` pipeline monthly for the previous month's cost data.
- `msexports_FileAdded` – Triggers the `msexports_ExecuteETL` pipeline when Cost Management exports complete.
- `<hubName>-vault-<unique-suffix>` Key Vault instance
- Secrets: