Commit 8d9042b: add new stuff

calmacx committed Feb 14, 2024
1 parent 5556896 commit 8d9042b
Showing 3 changed files with 116 additions and 58 deletions.
69 changes: 11 additions & 58 deletions docs/creating-a-dataset.md
You should POST `application/json` data to the endpoint, where your metadata will be validated:

``` json
{"metadata": <metadata>}
```

=== " python requests "
=== " python "

Using python `requests`
``` python
import requests
import json

# Example metadata file following the HDRUK schema
metadata = json.load(open("example-hdruk212.json"))

client_id = "fScHE7KHejPZb0TLh4vgdJoitfymyGSMLt7oS10e"
app_id = "3pO6liuh64iYRkTlTEpZrdGGj8IJnTFH5h3l7HAC"
# Gateway API base URL (a local instance is assumed here, as in the locust example below)
api_path = "http://localhost:8000/api/v1"
headers = {
    "client_id": client_id,
    "app_id": app_id,
    "Content-Type": "application/json",
}

data = {"metadata": metadata}

response = requests.post(
    f"{api_path}/integrations/datasets",
    headers=headers,
    json=data,
)
print(json.dumps(response.json(), indent=6))
```


=== "python locust.io"

Create/using the file `api-test.py`
``` python
from locust import HttpUser, task, between
import json


class BetaTester(HttpUser):
    # Wait 5-9 seconds between tasks to simulate a real user
    wait_time = between(5, 9)

    metadata = json.load(open("example-hdruk212.json"))

    client_id = "fScHE7KHejPZb0TLh4vgdJoitfymyGSMLt7oS10e"
    app_id = "3pO6liuh64iYRkTlTEpZrdGGj8IJnTFH5h3l7HAC"
    api_path = "/api/v1"
    headers = {
        "client_id": client_id,
        "app_id": app_id,
        "Content-Type": "application/json",
    }


class UserCreatingDataset(BetaTester):

    @task
    def create_datasets(self):
        data = {"metadata": self.metadata}

        response = self.client.post(
            f"{self.api_path}/integrations/datasets",
            headers=self.headers,
            json=data,
        )
        if response.status_code != 201:
            print("Error:", response.status_code)
        else:
            print(json.dumps(response.json(), indent=6))
```

Run with (1 simulated user, spawned at 1 user per second, for 30 seconds):
```
locust -f api-test.py --headless -u 1 -r 1 -t 30 --host http://localhost:8000 UserCreatingDataset
```

=== "CURL"

```
Expand Down Expand Up @@ -222,4 +165,14 @@ You should POST application/JSON data to the endpoint where your metadata valida
}'
```

Running this returns:

```json
{
    "message": "created",
    "data": <dataset_id: Integer>,
    "version": <dataset_version_id: Integer>
}
```

You should make a record of the dataset ID that is returned in the `data` field when the dataset is created. There are various endpoints that you can use to retrieve all your datasets and the IDs for them.
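
For example, a minimal sketch of listing your datasets with Python `requests`. This assumes, purely for illustration, that the same `/integrations/datasets` path also accepts GET requests to list a team's datasets; check the API reference for the exact retrieval endpoints.

``` python
import requests

api_path = "http://localhost:8000/api/v1"
headers = {
    "client_id": "<your client_id>",
    "app_id": "<your app_id>",
}

# Assumed endpoint: a GET on the same integrations path used for creation
response = requests.get(f"{api_path}/integrations/datasets", headers=headers)
for dataset in response.json().get("data", []):
    print(dataset)
```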
104 changes: 104 additions & 0 deletions docs/fe-creating-fma.md
## Introduction

The Federated Metadata Automation (FMA) service enables data custodians to automate metadata transfer to the Gateway by configuring specific API endpoints. This technical guide provides instructions on how to set up and manage the FMA self-service on the Gateway. It also covers common pitfalls and error codes encountered during the integration testing process.

## How to set up an FMA process

The following diagram (Fig 1) illustrates the steps involved in setting up a Federated Metadata Automation process on the Gateway:

Fig 1: Federated Metadata Automation process

### Step 1: Sign in to the Gateway

Sign in to the Gateway using your preferred route. Make sure you have a Team set up on the Gateway. If you need assistance with this step, contact the HDR UK technology team using the link below.

[https://www.healthdatagateway.org/about/contact-us](https://www.healthdatagateway.org/about/contact-us)

### Step 2: Access the Gateway FMA service

The FMA service is designed to enable data custodians to maintain their datasets and integrations independently. If you have the necessary permissions (Team Administrator or Developer), you can access the service by following these steps:

- Go to Team Management > Integrations > Integration.
- Click on “Create new Integration” to initiate the configuration (Fig 2).

Fig 2: Create a new Integration

### Step 3: Create a new integration (integration configuration)

When creating a new integration, the following information needs to be provided (an illustrative sketch follows this list):

- Integration Type: Choose one of the available options (NOTE: only ‘datasets’ is currently available; in the future this may include the data use register, tools, etc.)
- Authentication Type: Select one of the following authentication methods: API_Key, Bearer, or No_Auth.
- **API_Key**: Provides a simple way for APIs to verify the systems accessing them.
- **Bearer**: The service's script supports a static bearer token. It is strongly recommended to use HTTPS at all times to ensure security. If secure HTTP is not available, it is advisable not to use the service to prevent potential exploitation.
- **No_Auth**: Choose this option when authentication is not required to access your catalogue.
- Synchronisation Time: Specify the time at which the synchronisation process starts pulling data each day.
- Base URL: Enter the main domain name of the API.
- Datasets Endpoint: Provide the URL for listing all datasets available in the metadata catalogue.
- Dataset Endpoint: Specify the URL that returns the latest version of a single dataset's metadata on the data custodian's server. Fill in this field manually so that the service does not have to make assumptions during the process.
- Auth Token: Enter the API key generated by the data custodian from their data server.
- Notification Contacts: Add the relevant email addresses for receiving notifications.
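
As a purely illustrative sketch of the fields above (these values are entered through the Gateway form shown in Fig 3, not in code, and the field names here are hypothetical):

``` python
# Hypothetical representation of an FMA integration configuration;
# every value below is a placeholder.
fma_config = {
    "integration_type": "datasets",      # only 'datasets' is currently available
    "auth_type": "API_KEY",              # API_Key, Bearer, or No_Auth
    "synchronisation_time": "02:00",     # time the daily pull starts
    "base_url": "https://catalogue.example.org",
    "datasets_endpoint": "https://catalogue.example.org/api/datasets",
    "dataset_endpoint": "https://catalogue.example.org/api/datasets/{id}",
    "auth_token": "<api-key-generated-by-the-custodian>",
    "notification_contacts": ["data-team@example.org"],
}
```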

Once all the required fields are filled, click on “Save configuration” to store the information on the Gateway (Fig 3). The next step is to run a test to ensure the API connection works without any errors.

Fig 3: Integration configuration form

### Step 4: Integration testing

The integration test covers two areas:

- Testing the connection to the server as per the defined server details.
- Verifying the given credentials for the authentication type provided.

If any of the above tests fail, an error message will be returned. If there are no errors, you can now enable the configuration, and the integration will go live (Fig 4).

**Note**: The configuration can only be enabled after a successful test.

Fig 4: Integration testing

#### Error Handling

If, during normal operation, the server changes or datasets are moved elsewhere, the integration may become invalid and FMA will disable it. In such cases, the synchronisation of datasets will cease and you will receive a notification. To re-enable the integration, you will need to follow the configuration process again.

#### Error Codes

The FMA service utilises a list of error codes (Table 1). These error codes help in identifying and handling specific issues encountered during the integration testing process.

| Error code | Message           | Status                |
| ---------- | ----------------- | --------------------- |
| HTTP 200   | Test Successful   | Success               |
| HTTP 400   | Test Unsuccessful | Bad Request           |
| HTTP 401   | Test Unsuccessful | Unauthorized          |
| HTTP 403   | Test Unsuccessful | Forbidden             |
| HTTP 404   | Test Unsuccessful | Not Found             |
| HTTP 500   | Test Unsuccessful | Internal Server Error |
| HTTP 501   | Test Unsuccessful | Not Implemented       |
| HTTP 503   | Test Unsuccessful | Service Unavailable   |

Table 1: Integration testing error codes

### Step 5: Manage integrations

Clicking on “Manage Integrations” displays a list of enabled and disabled integrations. This page provides an overview and allows for easy management and monitoring of the integrations (Fig 5).

Fig 5: Manage integrations

## Custodian Datasets Endpoint

The HDR UK custodian specification has been developed for interoperability: it provides a clear set of standards that custodians can follow to share metadata in a consistent format and to meet the minimum requirements for sharing metadata within the community.

The Interface Diagram below (Fig 6) shows how the Gateway integration ingestion script handles and processes metadata catalogues:

Fig 6: Integration script process metadata catalogues

The Gateway first contacts the /datasets endpoint you provide and interprets the response. It then compares the returned information with the existing records in the Gateway database. Based on the comparison, a decision will be made for each dataset on how the metadata will be handled. There are generally three scenarios:

### New Dataset

If new data is detected through the ingestion script, it will be retrieved and stored in the Gateway database, and it will be made visible on the Gateway.

### Updated Dataset

The Gateway ingestion script determines if a dataset has changed since the last synchronisation. It specifically compares the ID of the dataset and the version that was last provided with the current version. The script does not check for a newer version number but rather a different version number. This accounts for cases where a dataset may be reverted to a previous version. Updates to datasets are automatically made live on the Gateway, and the previous version of the dataset will be archived following existing Gateway processes.

### Delete Dataset

The ingestion script can detect datasets that have been removed from the custodian metadata catalogue. If a dataset ID is no longer found in the /datasets endpoint, it will be considered a deleted dataset. A "deleted" dataset will be archived on the Gateway, along with all previous versions, and will no longer be visible on the Gateway following existing processes.
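
The three scenarios can be summarised with a short sketch. This is not the Gateway's actual implementation, only an illustration of the decision logic described above: `remote` maps dataset IDs to the versions reported by your /datasets endpoint, and `local` maps dataset IDs to the versions last stored on the Gateway.

``` python
def plan_sync(remote: dict, local: dict) -> dict:
    """Illustrative sketch of the FMA synchronisation decisions."""
    plan = {"create": [], "update": [], "archive": []}
    for dataset_id, version in remote.items():
        if dataset_id not in local:
            plan["create"].append(dataset_id)      # New Dataset
        elif version != local[dataset_id]:
            # A *different* version, not necessarily a newer one,
            # triggers an update (covers reverts to earlier versions).
            plan["update"].append(dataset_id)      # Updated Dataset
    for dataset_id in local:
        if dataset_id not in remote:
            plan["archive"].append(dataset_id)     # Delete Dataset
    return plan
```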
1 change: 1 addition & 0 deletions mkdocs.yml
nav:
  - Home: index.md
  - Gateway 2.0:
      - API Management: fe-creating-an-app.md
      - FMA Setup: fe-creating-fma.md
  - Metadata:
      - Creating metadata: creating-metadata.md
      - Validating metadata: metadata-validation.md
