The Deployment Manager is a service that manages package deployment and application creation for a single PNDA cluster.
- It implements the Packages, Applications, Repository and EnvironmentEndpoints APIs used by operators.
- It parses and validates basic package structure.
- It interacts with a Repository to determine which packages are available, and with a Registrar to record currently deployed packages and applications.
- It includes a number of component specific Creator implementations that carry out the concrete steps necessary to set up different parts of the Core Platform.
- It is easily extensible to support additional component types and repository types.
The design consists of a main class that implements the APIs and coordinates between a Repository, a Registrar and an Application Creator that dynamically loads a number of component specific Creator classes as required by a particular package.
HTTP and Python bindings are provided for these APIs.
By default, the Deployment Manager is installed on the edge node. To access the API use: http://[cluster-name]-cdh-edge:5000
Packages are made available via a repository. The Deployment Manager is configured with a client of this repository at instantiation time. The reference repository is implemented as a thin wrapper over an Openstack Swift container.
The details of package deployments for a given service instance are recorded by a registrar. The registrar stores information in HBase in the platform_packages and platform_applications tables.
The Application Creator handles the creation and control of applications on behalf of the Deployment Manager. It implements business logic that is common to all components and delegates to a component specific Creator as required by a particular package. Creator subclasses are dynamically loaded as needed by the Application Creator.
Each component type is associated with a subclass of Creator. Each Creator implements the specific steps necessary to perform the following functions:
Each component type has a specific structure. Each Creator implements a validation function that checks that structure. All components are validated before the package is deployed. If any validation function fails, the package is deemed “bad” and package deployment fails. This provides an opportunity to catch simple package construction problems early in the deployment process.
Each component type has specific creation requirements and resource dependencies. Each Creator implements the process required to create components of a given type and returns “application_data”. The Deployment Manager aggregates the application data generated by the process of creating each of the components in the package, then persists an association between this and the package deployment using a Registrar.
Applications may be paused and restarted. This leaves all the installed components in-place and temporarily stops the running processes associated with those components.
Each Creator implements a specific set of steps to uninstall components of its associated type. The Creator is passed the application data associated with the package and component and uses this to execute those steps.
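The Creator responsibilities described above can be sketched roughly as follows. This is a minimal illustration, not the Deployment Manager's actual code: the method names (`validate`, `create`, `start`, `stop`, `destroy`), the `ExampleCreator` class, the helper functions and the shape of the `package` dictionary are all assumptions made for the example.

```python
from abc import ABC, abstractmethod


class Creator(ABC):
    """Hypothetical base class; names are illustrative, not the project's API."""

    @abstractmethod
    def validate(self, component):
        """Return a list of problems found; an empty list means valid."""

    @abstractmethod
    def create(self, component):
        """Create the component and return its application_data."""

    @abstractmethod
    def start(self, application_data):
        """Start the processes belonging to a created component."""

    @abstractmethod
    def stop(self, application_data):
        """Stop the processes but leave the component installed."""

    @abstractmethod
    def destroy(self, application_data):
        """Uninstall the component described by application_data."""


class ExampleCreator(Creator):
    """Toy Creator for a made-up 'example' component type."""

    def validate(self, component):
        return [] if "name" in component else ["component has no name"]

    def create(self, component):
        return {"component": component["name"], "running": False}

    def start(self, application_data):
        application_data["running"] = True

    def stop(self, application_data):
        application_data["running"] = False

    def destroy(self, application_data):
        application_data.clear()


def validate_package(package, creators):
    """Run every component through its type's Creator before deployment."""
    return [problem
            for ctype, components in package.items()
            for component in components
            for problem in creators[ctype].validate(component)]


def create_application(package, creators):
    """Create each component and aggregate the returned application_data."""
    return {ctype: [creators[ctype].create(c) for c in components]
            for ctype, components in package.items()}


creators = {"example": ExampleCreator()}  # assumed registry keyed by component type
package = {"example": [{"name": "job"}]}  # assumed in-memory package layout
if not validate_package(package, creators):
    print(create_application(package, creators))
```

A plain dictionary stands in here for the dynamic loading of Creator subclasses; the real service resolves them as required by the package.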
To build the Deployment Manager, change to the api directory, which contains the pom.xml file. Type mvn clean package on the command line. Once the build is successful, the built package will be placed in the target folder.
All API paths below are relative to a base URL that is defined by the scheme, host, port and base path at the root level of this API specification.
<scheme>://<host>:<port>/<base path>
By default, the API uses the 'https' scheme as the transfer protocol. Host is the domain name or hostname that serves the API. To access the API from outside the PNDA security perimeter, requests must go via the Knox service, using the domain name or FQDN specified when creating the PNDA cluster. The domain name or FQDN must be resolvable via a public or private DNS service. To access the deployment management API, the base path /gateway/pnda/deployment must be used as a prefix for all API paths.
e.g. https://knox.example.com:8443/gateway/pnda/deployment
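As an illustration of how the base URL and an API path fit together, the following sketch composes a full request URL. The function name `build_url`, its parameters and the user name `alice` are assumptions for the example; the default port and base path are taken from the example URL above.

```python
from urllib.parse import urlencode


def build_url(host, path, user_name, scheme="https", port=8443,
              base_path="/gateway/pnda/deployment", **params):
    """Compose <scheme>://<host>:<port>/<base path><path>?user.name=..."""
    # user.name is sent on every call; extra keyword arguments become
    # additional query parameters.
    query = urlencode({"user.name": user_name, **params})
    return f"{scheme}://{host}:{port}{base_path.rstrip('/')}{path}?{query}"


# 'alice' is a placeholder user name, not a real account.
print(build_url("knox.example.com", "/packages", "alice"))
```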
?recency=n may be used to control how many versions of each package are listed; by default, recency=1.
GET /repository/packages?user.name=<username>
Response Codes:
200 - OK
403 - Unauthorised user
500 - Server Error
Query Parameters:
user.name - User name to run this command as. Should have permissions to perform the action as defined in authorizer_rules.yaml.
Example response:
[
{
"latest_versions": [{
"version": "1.0.23",
"file": "spark-batch-example-app-1.0.23.tar.gz"
}],
"name": "spark-batch-example-app"
}
]
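A client might turn this listing into deployable package identifiers along the following lines. The sketch assumes that a package identifier is `<name>-<version>`, which is inferred from the examples in this document rather than stated by the API; the function name `available_packages` is hypothetical.

```python
import json

# Sample body copied from the example response above.
SAMPLE = """
[{"latest_versions": [{"version": "1.0.23",
                       "file": "spark-batch-example-app-1.0.23.tar.gz"}],
  "name": "spark-batch-example-app"}]
"""


def available_packages(listing):
    """Return '<name>-<version>' for every version in a repository listing.

    The '<name>-<version>' convention is an assumption based on the
    examples in this document.
    """
    return [f'{entry["name"]}-{version["version"]}'
            for entry in listing
            for version in entry["latest_versions"]]


print(available_packages(json.loads(SAMPLE)))
```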
GET /packages?user.name=<username>
Response Codes:
200 - OK
403 - Unauthorised user
500 - Server Error
Query Parameters:
user.name - User name to run this command as. Should have permissions to perform the action as defined in authorizer_rules.yaml.
Example response:
["spark-batch-example-app-1.0.23"]
GET /packages/<package>/status?user.name=<username>
Response Codes:
200 - OK
403 - Unauthorised user
500 - Server Error
Query Parameters:
user.name - User name to run this command as. Should have permissions to perform the action as defined in authorizer_rules.yaml.
Example response:
{"status": "DEPLOYED", "information": "human readable error message or other information about this status"}
Possible values for status:
NOTDEPLOYED
DEPLOYING
DEPLOYED
UNDEPLOYING
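Because deploy and undeploy requests are accepted asynchronously, a client typically polls this endpoint until the status settles. The sketch below shows one way to do that; `wait_for_package` and its parameters are hypothetical, and the HTTP call is passed in as a function so the polling logic can be exercised without a cluster.

```python
import time

# Statuses that indicate an operation is still in progress.
TRANSIENT = {"DEPLOYING", "UNDEPLOYING"}


def wait_for_package(fetch_status, timeout_sec=300.0, interval_sec=2.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll until the package status is no longer transient.

    fetch_status is any callable returning the decoded JSON body of
    GET /packages/<package>/status.
    """
    deadline = clock() + timeout_sec
    while True:
        body = fetch_status()
        if body["status"] not in TRANSIENT:
            return body
        if clock() >= deadline:
            raise TimeoutError(f"package still {body['status']}")
        sleep(interval_sec)


# Canned responses stand in for real HTTP calls in this example.
responses = iter([{"status": "DEPLOYING", "information": ""},
                  {"status": "DEPLOYED", "information": ""}])
print(wait_for_package(lambda: next(responses), sleep=lambda _: None))
```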
GET /packages/<package>?user.name=<username>
Response Codes:
200 - OK
403 - Unauthorised user
500 - Server Error
Query Parameters:
user.name - User name to run this command as. Should have permissions to perform the action as defined in authorizer_rules.yaml.
Example response:
{
"status": "DEPLOYED",
"version": "1.0.23",
"name": "spark-batch-example-app",
"user": "who-deployed-this",
"defaults": {
"oozie": {
"example": {
"end": "${deployment_end}",
"start": "${deployment_start}",
"driver_mem": "256M",
"input_data": "/user/pnda/PNDA_datasets/datasets/source=test-src/year=*",
"executors_num": "2",
"executors_mem": "256M",
"freq_in_mins": "180",
"job_name": "batch_example"
}
}
}
}
PUT /packages/<package>?user.name=<username>
Response Codes:
202 - Accepted, poll /packages/<package>/status for status
403 - Unauthorised user
404 - Package not found in repository
409 - Package already deployed
500 - Server Error
Query Parameters:
user.name - User name to run this command as. Should have permissions to perform the action as defined in authorizer_rules.yaml.
DELETE /packages/<package>?user.name=<username>
Response Codes:
202 - Accepted, poll /packages/<package>/status for status
403 - Unauthorised user
404 - Package not deployed
500 - Server Error
Query Parameters:
user.name - User name to run this command as. Should have permissions to perform the action as defined in authorizer_rules.yaml.
GET /applications?user.name=<username>
Response Codes:
200 - OK
403 - Unauthorised user
500 - Server Error
Query Parameters:
user.name - User name to run this command as. Should have permissions to perform the action as defined in authorizer_rules.yaml.
Example response:
["spark-batch-example-app-instance"]
GET /packages/<package>/applications?user.name=<username>
Response Codes:
200 - OK
403 - Unauthorised user
500 - Server Error
Query Parameters:
user.name - User name to run this command as. Should have permissions to perform the action as defined in authorizer_rules.yaml.
Example response:
["spark-batch-example-app-instance"]
GET /applications/<application>/status?user.name=<username>
Response Codes:
200 - OK
403 - Unauthorised user
404 - Application not known
500 - Server Error
Query Parameters:
user.name - User name to run this command as. Should have permissions to perform the action as defined in authorizer_rules.yaml.
Example response:
{"status": "STARTED", "information": "human readable error message or other information about this status"}
Possible values for status:
NOTCREATED
CREATING
CREATED
STARTING
STARTED
STOPPING
DESTROYING
GET /applications/<application>/detail?user.name=<username>
Response Codes:
200 - OK
403 - Unauthorised user
404 - Application not known
500 - Server Error
Query Parameters:
user.name - User name to run this command as. Should have permissions to perform the action as defined in authorizer_rules.yaml.
{
"yarn_applications": {
"oozie-example": {
"type": "oozie",
"yarn-id": "application_1479988623709_0015",
"component": "example",
"yarn-start-time": 1479992520527,
"yarn-state": "FINISHED"
}
},
"status": "STARTED",
"name": "spark-batch-example-app-instance"
}
GET /applications/<application>/summary?user.name=<username>
Response Codes:
200 - OK
403 - Unauthorised user
404 - Application not known
500 - Server Error
Query Parameters:
user.name - User name to run this command as. Should have permissions to perform the action as defined in authorizer_rules.yaml.
{
"spark-batch-py": {
"aggregate_status": "COMPLETED",
"oozie-1": {
"status": "OK",
"name": "spark-batch-py-workflow",
"actions": {
"job-1": {
"status": "OK",
"information": "",
"yarnId": "application_1531380960927_0152",
"applicationType": "spark",
"name": "process"
}
},
"componentType": "Oozie",
"aggregate_status": "COMPLETED",
"oozieId": "0000013-180712073712712-oozie-oozi-W"
}
}
}
{
"spark-stream": {
"aggregate_status": "RUNNING",
"sparkStreaming-1": {
"information": {
"stageSummary": {
"active": 0,
"number_of_stages": 1404,
"complete": 1000,
"pending": 0,
"failed": 0
},
"jobSummary": {
"unknown": 0,
"number_of_jobs": 351,
"running": 0,
"succeeded": 351,
"failed": 0
}
},
"name": "spark-stream-example-job",
"yarnId": "application_1531380960927_0153",
"componentType": "SparkStreaming",
"aggregate_status": "RUNNING",
"tracking_url": "http://st-2-std-hadoop-mgr-2.node.dc1.pnda.local:8088/proxy/application_1531380960927_0153/"
}
}
}
{
"test1": {
"aggregate_status": "RUNNING",
"flink-1": {
"information": {
"state": "OK",
"vertices": [
{
"status": "RUNNING",
"name": "Source"
}
],
"flinkJid": "e7a7163fef86ad81017a0239839207cb"
},
"name": "test1-example-job",
"yarnId": "application_1524556418619_0205",
"trackingUrl": "http://rhel-hadoop-mgr-1.node.dc1.pnda.local:8088/proxy/application_1524556418619_0205/#/jobs/e7a7163fef86ad81017a0239839207cb",
"componentType": "Flink",
"aggregate_status": "RUNNING"
}
}
}
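The three summary examples above share a common shape: a top-level key per application, an application-level aggregate_status, and one nested object per component carrying its own aggregate_status. Assuming that shape holds in general, a client could extract per-component statuses as sketched below; the function name `component_statuses` is hypothetical.

```python
def component_statuses(summary):
    """Map (application, component key) to the component's aggregate_status.

    Assumes every nested dictionary that carries 'aggregate_status' is a
    component entry, as in the example responses.
    """
    statuses = {}
    for app_name, app in summary.items():
        for key, value in app.items():
            if isinstance(value, dict) and "aggregate_status" in value:
                statuses[(app_name, key)] = value["aggregate_status"]
    return statuses


# Trimmed copy of the Spark streaming example above.
sample = {
    "spark-stream": {
        "aggregate_status": "RUNNING",
        "sparkStreaming-1": {
            "componentType": "SparkStreaming",
            "aggregate_status": "RUNNING",
        },
    }
}
print(component_statuses(sample))
```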
POST /applications/<application>/start?user.name=<username>
Response Codes:
202 - Accepted, poll /applications/<application>/status for status
403 - Unauthorised user
404 - Application not known
500 - Server Error
Query Parameters:
user.name - User name to run this command as. Should have permissions to perform the action as defined in authorizer_rules.yaml.
POST /applications/<application>/stop?user.name=<username>
Response Codes:
202 - Accepted, poll /applications/<application>/status for status
403 - Unauthorised user
404 - Application not known
500 - Server Error
Query Parameters:
user.name - User name to run this command as. Should have permissions to perform the action as defined in authorizer_rules.yaml.
GET /applications/<application>?user.name=<username>
Response Codes:
200 - OK
403 - Unauthorised user
404 - Application not known
500 - Server Error
Query Parameters:
user.name - User name to run this command as. Should have permissions to perform the action as defined in authorizer_rules.yaml.
Example response:
{
"status": "CREATED",
"overrides": {
"user": "somebody",
"package_name": "spark-batch-example-app-1.0.23",
"oozie": {
"example": {
"executors_num": "5"
}
}
},
"package_name": "spark-batch-example-app-1.0.23",
"name": "spark-batch-example-app-instance",
"defaults": {
"oozie": {
"example": {
"end": "${deployment_end}",
"input_data": "/user/pnda/PNDA_datasets/datasets/source=test-src/year=*",
"driver_mem": "256M",
"start": "${deployment_start}",
"executors_num": "2",
"freq_in_mins": "180",
"executors_mem": "256M",
"job_name": "batch_example"
}
}
}
}
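The response separates package defaults from per-application overrides. Assuming an override replaces the default of the same name (the field names suggest this, but the response itself does not state it), the effective settings for each component could be computed as in this sketch. The function name `effective_properties` is hypothetical.

```python
def effective_properties(defaults, overrides):
    """Merge overrides onto defaults for each component.

    Assumption: a property in 'overrides' wins over the same property in
    'defaults'. Non-component keys in overrides (such as 'user' and
    'package_name') are ignored because only component types present in
    defaults are looked up.
    """
    merged = {}
    for ctype, components in defaults.items():
        merged[ctype] = {}
        for cname, props in components.items():
            override = overrides.get(ctype, {}).get(cname, {})
            merged[ctype][cname] = {**props, **override}
    return merged


defaults = {"oozie": {"example": {"executors_num": "2", "driver_mem": "256M"}}}
overrides = {"user": "somebody", "oozie": {"example": {"executors_num": "5"}}}
print(effective_properties(defaults, overrides))
```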
PUT /applications/<application>?user.name=<username>
{
"package": "<package>",
"<componentType>": {
"<componentName>": {
"<property>": "<value>"
}
}
}
Response Codes:
202 - Accepted, poll /applications/<application>/status for status
400 - Request body failed validation
403 - Unauthorised user
404 - Package not found
409 - Application already exists
500 - Server Error
Query Parameters:
user.name - User name to run this command as. Should have permissions to perform the action as defined in authorizer_rules.yaml.
Example body:
{
"package": "<package>",
"oozie": {
"example": {
"executors_num": "5"
}
}
}
Package is mandatory, property settings are optional
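A request body of this shape could be assembled as follows. This is a sketch only; the function name `application_body` and its arguments are hypothetical, and the resulting JSON would be sent as the body of the PUT request.

```python
import json


def application_body(package, overrides=None):
    """Build the JSON body for PUT /applications/<application>.

    'package' is mandatory; 'overrides' optionally maps
    component type -> component name -> property -> value.
    """
    if not package:
        raise ValueError("package is mandatory")
    overrides = overrides or {}
    if "package" in overrides:
        raise ValueError("'package' cannot be used as a component type")
    return json.dumps({"package": package, **overrides})


print(application_body("spark-batch-example-app-1.0.23",
                       {"oozie": {"example": {"executors_num": "5"}}}))
```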
DELETE /applications/<application>?user.name=<username>
Response Codes:
200 - OK
403 - Unauthorised user
404 - Application not known
500 - Server Error
Query Parameters:
user.name - User name to run this command as. Should have permissions to perform the action as defined in authorizer_rules.yaml.
GET /environment/endpoints?user.name=<username>
Response Codes:
200 - OK
403 - Unauthorised user
500 - Server Error
Query Parameters:
user.name - User name to run this command as. Should have permissions to perform the action as defined in authorizer_rules.yaml.
Example response:
{"zookeeper_port": "2181", "cluster_root_user": "cloud-user", ... }
The following variables are made available for use in the configuration files for every component and injected as previously described.
application_user The user ID that this application's components will run as
component_application unique application ID
component_name name of component folder in package
component_job_name application_id-component_name-job
component_xxx setting xxx from properties.json
hdfspath_path_name generated from entries in hdfs.json
These can be obtained with the environment endpoints API
environment_app_packages_hdfs_path /pnda/deployment/app_packages
environment_hadoop_manager_host 192.168.1.2
environment_hadoop_manager_password admin
environment_hadoop_manager_username admin
environment_cluster_private_key ./dm.pem
environment_cluster_root_user cloud-user
environment_hbase_rest_port 20550
environment_hbase_rest_server cluster-cdh-mgr1
environment_hive_port 10000
environment_hive_server cluster-cdh-mgr1
environment_impala_host cluster-cdh-dn0
environment_impala_port 21050
environment_kafka_brokers 192.168.1.3:9092, ...
environment_kafka_manager https://192.168.1.4:443
environment_kafka_zookeeper 192.168.1.5:2181, ...
environment_metric_logger_url http://192.169.1.7:3001/metrics
environment_name_node hdfs://cluster-cdh-mgr1:8020
environment_namespace platform_app
environment_oozie_uri http://cluster-cdh-mgr1:11000/oozie
environment_opentsdb 192.168.1.6:4242
environment_queue_policy /opt/pnda/rm-wrapper/yarn-policy.sh
environment_webhdfs_host cluster-cdh-mgr1
environment_webhdfs_port 50070
environment_yarn_node_managers cluster-cdh-dn0
environment_yarn_resource_manager_host cluster-cdh-mgr1
environment_yarn_resource_manager_mr_port 8032
environment_yarn_resource_manager_port 8088
environment_zookeeper_port 2181
environment_zookeeper_quorum cluster-cdh-mgr1
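Comparing the environment endpoints example response (for instance `zookeeper_port`) with the variable names listed above suggests that each endpoint key is exposed with an `environment_` prefix. The sketch below applies that mapping and then substitutes the variables into a configuration string. The `${name}` placeholder syntax and the use of Python's `string.Template` are assumptions for illustration; they are not a description of the Deployment Manager's actual injection mechanism.

```python
from string import Template


def environment_variables(endpoints):
    """Prefix each environment endpoints key with 'environment_'."""
    return {f"environment_{key}": value for key, value in endpoints.items()}


def render(config_text, variables):
    """Substitute ${name} placeholders, leaving unknown ones untouched.

    The placeholder syntax is assumed for this example.
    """
    return Template(config_text).safe_substitute(variables)


endpoints = {"zookeeper_port": "2181", "cluster_root_user": "cloud-user"}
variables = environment_variables(endpoints)
print(render("zk.port=${environment_zookeeper_port}", variables))
```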
Both Spark streaming and Oozie components can be configured to use either Spark1 or Spark2. This may be set by including spark_version in properties.json and setting it to 1 or 2. It defaults to Spark1 if spark_version is not included.
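The defaulting rule can be expressed in a few lines. The function name `resolve_spark_version` is hypothetical, and rejecting values other than 1 or 2 is an assumption of this sketch rather than documented behaviour.

```python
def resolve_spark_version(properties):
    """Return '1' or '2' from a parsed properties.json, defaulting to '1'."""
    version = str(properties.get("spark_version", "1"))
    if version not in ("1", "2"):
        # Assumption: anything other than 1 or 2 is treated as an error.
        raise ValueError(f"unsupported spark_version: {version}")
    return version


print(resolve_spark_version({}))
print(resolve_spark_version({"spark_version": 2}))
```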
component_spark_version major version of spark to use. Set to '1' or '2'. Only applicable to HDP clusters
The following variables are only injected for Spark streaming components. They may be overridden in properties.json; for example, to override component_spark_version, include spark_version in properties.json.
component_spark_submit_args additional arguments to spark-submit
component_respawn_type whether to restart the process when it exits. Valid values are always, no, on-success, on-failure, on-abnormal, on-watchdog or on-abort. Refer to the systemd documentation for more information about each of these.
component_respawn_timeout_sec used with component_respawn_type to set how long to wait (in seconds) before restarting the process when it exits.
(java only) component_main_jar the jar containing the job code
(python only) component_main_py the python file containing the job code
(python only) component_py_files additional python files to pass to spark-submit
The following variables are only injected for Oozie components.
component_end 2016-03-31T17:07Z
component_start 2016-03-24T17:07Z
mapreduce.job.user.name hdfs
mapreduce.job.queuename root.applications.prod
oozie.coord.application.path hdfs://cluster-cdh-mgr1:8020/user/application_id/component_name/coordinator.xml
oozie.libpath /pnda/deployment/platform
oozie.use.system.libpath true
user.name prod1