shubhi0310/platform-deployment-manager

Platform Deployment Manager

Design

The Deployment Manager is a service that manages package deployment and application creation for a single PNDA cluster.

  • It implements the Packages, Applications, Repository and Environment Endpoints APIs used by operators.
  • It parses and validates basic package structure.
  • It interacts with a Repository to determine which packages are available, and with a Registrar to record which packages and applications are currently deployed.
  • It includes a number of component specific Creator implementations that carry out the concrete steps necessary to set up different parts of the Core Platform.
  • It is easily extensible to support additional component types and repository types.

The design consists of a main class that implements the APIs and coordinates between a Repository, a Registrar and an Application Creator that dynamically loads a number of component specific Creator classes as required by a particular package.

HTTP and Python bindings are provided for these APIs.

Connecting

By default, the Deployment Manager is installed on the edge node. To access the API use: http://[cluster-name]-cdh-edge:5000
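As a small illustration, the default address above can be assembled from the cluster name. This is a sketch only: the helper names are hypothetical, and it assumes the default edge-node host naming and port 5000 shown above.

```python
def api_base_url(cluster_name: str, port: int = 5000) -> str:
    """Return the default API root on the cluster's edge node."""
    return f"http://{cluster_name}-cdh-edge:{port}"


def endpoint(cluster_name: str, path: str) -> str:
    """Join an API path such as /packages onto the default API root."""
    return api_base_url(cluster_name) + "/" + path.lstrip("/")


# "mycluster" is a placeholder cluster name.
print(endpoint("mycluster", "/packages"))
```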

Repository

Packages are made available via a repository. The Deployment Manager is configured with a client of this repository at instantiation time. The reference repository is implemented as a thin wrapper over an OpenStack Swift container.

Registrar

The details of package deployments for a given service instance are recorded by a registrar. The registrar stores information in HBase in the platform_packages and platform_applications tables.

Application Creator

The Application Creator handles the creation and control of applications on behalf of the Deployment Manager. It implements business logic that is common to all components and delegates to a component specific Creator as required by a particular package. Creator subclasses are dynamically loaded as needed by the Application Creator.
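Dynamic loading of a class by name can be sketched with the standard importlib module. The module and class naming scheme the Application Creator actually uses is not described here, so load_creator_class is a hypothetical helper and a standard-library class stands in for a real Creator subclass.

```python
import importlib


def load_creator_class(module_name: str, class_name: str) -> type:
    """Import module_name on demand and return the class called class_name."""
    module = importlib.import_module(module_name)
    cls = getattr(module, class_name)
    if not isinstance(cls, type):
        raise TypeError(f"{module_name}.{class_name} is not a class")
    return cls


# Stand-in only: a real lookup would name a Creator module and subclass.
loaded = load_creator_class("json", "JSONDecoder")
print(loaded.__name__)
```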

Creator

Each component type is associated with a subclass of Creator. Each Creator implements the specific steps necessary to perform the following functions:

Validation

Each component type has a specific structure. Each Creator implements a validation function that checks that structure. All components are validated before the package is deployed. If any validation function fails, the package is deemed “bad” and package deployment fails. This provides an opportunity to catch simple package construction problems early in the deployment process.
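The "validate everything first, fail the package on any error" rule can be sketched as follows. This is an illustration under assumptions: validate_package, the validator signature and the properties.json rule are hypothetical and do not reflect the real Creator interface.

```python
from typing import Callable, Dict, List

# A validator takes a component's file names and returns a list of problems.
Validator = Callable[[List[str]], List[str]]


def validate_package(
    components: Dict[str, Dict[str, List[str]]],
    validators: Dict[str, Validator],
) -> Dict[str, List[str]]:
    """Run every component through its type's validator and collect problems."""
    errors: Dict[str, List[str]] = {}
    for component_type, by_name in components.items():
        validator = validators.get(component_type)
        for name, files in by_name.items():
            if validator is None:
                problems = [f"no creator for component type '{component_type}'"]
            else:
                problems = validator(files)
            if problems:
                errors[f"{component_type}/{name}"] = problems
    return errors


def needs_properties(files: List[str]) -> List[str]:
    """Hypothetical rule: a component must ship a properties.json file."""
    return [] if "properties.json" in files else ["missing properties.json"]


package = {"oozie": {"good": ["properties.json"], "broken": []}}
errors = validate_package(package, {"oozie": needs_properties})
# A non-empty result means the package is "bad" and must not be deployed.
print(errors)
```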

Application creation

Each component type has specific creation requirements and resource dependencies. Each Creator implements the process required to create components of a given type and returns “application_data”. The Deployment Manager aggregates the application data generated by the process of creating each of the components in the package, then persists an association between this and the package deployment using a Registrar.

Application control

Applications may be paused and restarted. Pausing an application leaves all installed components in place and temporarily stops the running processes associated with those components.

Undeployment

Each Creator implements a specific set of steps to uninstall components of its associated type. The Creator is passed the application data associated with the package and component and uses this to execute those steps.
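The round trip of application data, from creation through the registrar and back to the Creator at removal time, might look like the sketch below. Everything in it is hypothetical: EchoCreator, the shape of the application data and the in-memory registrar dictionary stand in for the real Creator classes and the HBase-backed Registrar.

```python
from typing import Dict, List


class EchoCreator:
    """Hypothetical Creator that only records what it would set up or remove."""

    def __init__(self) -> None:
        self.removed: List[str] = []

    def create(self, component_name: str) -> Dict[str, str]:
        # Return the data needed later to control or remove the component.
        return {"component": component_name, "install_path": f"/tmp/{component_name}"}

    def destroy(self, application_data: Dict[str, str]) -> None:
        # Use the stored data to decide what to uninstall.
        self.removed.append(application_data["install_path"])


# In-memory stand-in for the Registrar.
registrar: Dict[str, Dict[str, Dict[str, str]]] = {}


def create_application(app: str, components: List[str], creator: EchoCreator) -> None:
    """Create each component, aggregate its data and persist it under the app name."""
    registrar[app] = {name: creator.create(name) for name in components}


def destroy_application(app: str, creator: EchoCreator) -> None:
    """Hand each component's stored data back to the Creator, then forget the app."""
    for application_data in registrar.pop(app).values():
        creator.destroy(application_data)
```

Calling create_application followed by destroy_application with the same creator leaves the registrar empty and the creator holding the list of paths it was asked to remove.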

Requirements

Building

To build the Deployment Manager, change to the api directory, which contains the pom.xml file. Type mvn clean package on the command line. Once the build is successful, the built package will be placed in the target folder.

API Documentation

Repository API

List packages from the repository

?recency=n may be used to control how many versions of each package are listed; the default is recency=1.

GET /repository/packages

Response Codes:
200 - OK
500 - Server Error

Example response:
[
	{
		"latest_versions": [{
			"version": "1.0.23",
			"file": "spark-batch-example-app-1.0.23.tar.gz"
		}],
		"name": "spark-batch-example-app"
	}
]
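A client might build the request path and read the response along these lines. The function names are hypothetical, and the parsing assumes the response has exactly the shape of the example above.

```python
import json
from typing import Dict, List
from urllib.parse import urlencode


def repository_packages_path(recency: int = 1) -> str:
    """Build the request path, including the recency query parameter."""
    return "/repository/packages?" + urlencode({"recency": recency})


def latest_files(response_body: str) -> Dict[str, List[str]]:
    """Map each package name to the file names of its listed versions."""
    packages = json.loads(response_body)
    return {
        package["name"]: [version["file"] for version in package["latest_versions"]]
        for package in packages
    }


example = """[{"latest_versions": [{"version": "1.0.23",
  "file": "spark-batch-example-app-1.0.23.tar.gz"}],
  "name": "spark-batch-example-app"}]"""
print(repository_packages_path(recency=3))
print(latest_files(example))
```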

Packages API

List packages currently deployed to the cluster

GET /packages

Response Codes:
200 - OK
500 - Server Error

Example response:
["spark-batch-example-app-1.0.23"]

Get the status for package

GET /packages/<package>/status

Response Codes:
200 - OK
500 - Server Error

Example response:
{"status": "DEPLOYED", "information": "human readable error message or other information about this status"}

Possible values for status:
NOTDEPLOYED
DEPLOYING
DEPLOYED
UNDEPLOYING
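Because deployment and undeployment are asynchronous, a client typically polls this endpoint until the status is no longer transitional. The sketch below assumes DEPLOYING and UNDEPLOYING are the only transitional values; wait_for_package and its parameters are hypothetical, and the HTTP call is injected as fetch_status so the logic can run without a cluster.

```python
import time
from typing import Callable, Dict

TRANSITIONAL = {"DEPLOYING", "UNDEPLOYING"}


def wait_for_package(
    fetch_status: Callable[[], Dict[str, str]],
    interval: float = 1.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Poll until the package settles, or raise TimeoutError after max_polls."""
    for _ in range(max_polls):
        result = fetch_status()
        if result["status"] not in TRANSITIONAL:
            return result
        sleep(interval)
    raise TimeoutError("package status did not settle")


# Canned responses stand in for GET /packages/<package>/status.
canned = iter([
    {"status": "DEPLOYING", "information": ""},
    {"status": "DEPLOYED", "information": ""},
])
print(wait_for_package(lambda: next(canned), sleep=lambda seconds: None))
```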

Get full information for package

GET /packages/<package>

Response Codes:
200 - OK
500 - Server Error

Example response:
{
	"status": "DEPLOYED",
	"version": "1.0.23",
	"name": "spark-batch-example-app",
	"defaults": {
		"oozie": {
			"example": {
				"end": "${deployment_end}",
				"start": "${deployment_start}",
				"driver_mem": "256M",
				"input_data": "/user/pnda/PNDA_datasets/datasets/source=test-src/year=*",
				"executors_num": "2",
				"executors_mem": "256M",
				"freq_in_mins": "180",
				"job_name": "batch_example"
			}
		}
	}
}

Deploy package to the cluster

PUT /packages/<package>

Response Codes:
202 - Accepted, poll /packages/<package>/status for status
404 - Package not found in repository
409 - Package already deployed
500 - Server Error
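A deploy request and a reading of its response codes could be sketched with the standard library as shown here. The base URL is a placeholder, deploy_request and describe_outcome are hypothetical names, and the request object is only constructed, not sent.

```python
import urllib.request
from urllib.parse import quote

# Meanings taken from the response code list above.
DEPLOY_OUTCOMES = {
    202: "accepted, poll the package status",
    404: "package not found in repository",
    409: "package already deployed",
    500: "server error",
}


def deploy_request(base_url: str, package: str) -> urllib.request.Request:
    """Build, but do not send, the PUT request that deploys a package."""
    url = f"{base_url}/packages/{quote(package, safe='')}"
    return urllib.request.Request(url, method="PUT")


def describe_outcome(status_code: int) -> str:
    """Translate a response code into the meaning documented above."""
    return DEPLOY_OUTCOMES.get(status_code, f"unexpected response code {status_code}")


request = deploy_request("http://mycluster-cdh-edge:5000", "spark-batch-example-app-1.0.23")
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen would send it; a 4xx or 5xx response is then raised as urllib.error.HTTPError, whose code attribute can be given to describe_outcome.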

Undeploy package from the cluster

DELETE /packages/<package>

Response Codes:
202 - Accepted, poll /packages/<package>/status for status
404 - Package not deployed
500 - Server Error

Applications API

List all applications

GET /applications

Response Codes:
200 - OK
500 - Server Error

Example response:
["spark-batch-example-app-instance"]

List applications that have been created from package

GET /packages/<package>/applications

Response Codes:
200 - OK
500 - Server Error

Example response:
["spark-batch-example-app-instance"]

Get the status for application

GET /applications/<application>/status

Response Codes:
200 - OK
404 - Application not known
500 - Server Error

Example response:
{"status": "STARTED", "information": "human readable error message or other information about this status"}

Possible values for status:
NOTCREATED
CREATING
CREATED
STARTING
STARTED
STOPPING
DESTROYING

Get run-time details for application

GET /applications/<application>/detail

Response Codes:
200 - OK
404 - Application not known
500 - Server Error

Example response:
{
	"yarn_applications": {
		"oozie-example": {
			"type": "oozie",
			"yarn-id": "application_1479988623709_0015",
			"component": "example",
			"yarn-start-time": 1479992520527,
			"yarn-state": "FINISHED"
		}
	},
	"status": "STARTED",
	"name": "spark-batch-example-app-instance"
}

Get the summary status for application

GET /applications/<application>/summary

Response Codes:
200 - OK
404 - Application not known
500 - Server Error

Example summary status for an Oozie component:

{
  "oozie-application": {
    "aggregate_status": "STARTED_RUNNING_WITH_ERRORS",
    "oozie-1": {
      "status": "WARN",
      "aggregate_status": "STARTED_RUNNING_WITH_ERRORS",
      "actions": {
        "workflow-1": {
          "status": "WARN",
          "oozieId": "0000004-171229054340125-oozie-oozi-W",
          "actions": {
            "subworkflow-1": {
              "status": "WARN",
              "oozieId": "0000005-171229054340125-oozie-oozi-W",
              "actions": {
                "job-2": {
                  "status": "ERROR",
                  "information": "No JSON object could be decoded",
                  "applicationType": "SPARK",
                  "name": "process",
                  "yarnId": "application_1514526198433_0022"
                },
                "job-1": {
                  "status": "OK",
                  "information": null,
                  "applicationType": "MAPREDUCE",
                  "name": "download",
                  "yarnId": "application_1514526198433_0019"
                }
              },
              "name": "oozie-application-subworkflow"
            }
          },
          "name": "oozie-application-workflow"
        }
      },
      "oozieId": "0000003-171229054340125-oozie-oozi-C",
      "name": "oozie-application-coordinator"
    }
  }
}
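Since the Oozie summary nests actions inside actions, a recursive walk is a convenient way to locate failing jobs. The sketch below uses a trimmed copy of the example above; find_by_status is a hypothetical helper and assumes failing nodes carry a "status" field as shown.

```python
from typing import Any, List, Optional, Tuple


def find_by_status(
    node: Any, wanted: str, path: Tuple[str, ...] = ()
) -> List[Tuple[str, Optional[str]]]:
    """Return (path, yarnId) for every nested object whose status equals wanted."""
    found: List[Tuple[str, Optional[str]]] = []
    if isinstance(node, dict):
        if node.get("status") == wanted:
            found.append(("/".join(path), node.get("yarnId")))
        for key, child in node.items():
            found.extend(find_by_status(child, wanted, path + (key,)))
    return found


# Trimmed version of the example response (sub-workflow level omitted).
summary = {
    "oozie-application": {
        "aggregate_status": "STARTED_RUNNING_WITH_ERRORS",
        "oozie-1": {
            "status": "WARN",
            "actions": {
                "workflow-1": {
                    "status": "WARN",
                    "actions": {
                        "job-2": {"status": "ERROR", "yarnId": "application_1514526198433_0022"},
                        "job-1": {"status": "OK", "yarnId": "application_1514526198433_0019"},
                    },
                }
            },
        },
    }
}
print(find_by_status(summary, "ERROR"))
```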

Example summary status for a Spark Streaming component:

{
  "spark-streaming-application": {
    "aggregate_status": "STARTED_RUNNING_WITH_NO_ERRORS",
    "sparkStreaming-1": {
      "information": {
        "stageSummary": {
          "active": 0,
          "number_of_stages": 128,
          "complete": 128,
          "pending": 0,
          "failed": 0
        },
        "jobSummary": {
          "unknown": 0,
          "number_of_jobs": 32,
          "running": 0,
          "succeeded": 32,
          "failed": 0
        }
      },
      "aggregate_status": "STARTED_RUNNING_WITH_NO_ERRORS",
      "name": "spark-streaming-application-example-job",
      "yarnId": "application_1514526198433_0069"
    }
  }
}

Start application

POST /applications/<application>/start

Response Codes:
202 - Accepted, poll /applications/<application>/status for status
404 - Application not known
500 - Server Error

Stop application

POST /applications/<application>/stop

Response Codes:
202 - Accepted, poll /applications/<application>/status for status
404 - Application not known
500 - Server Error

Get full information for application

GET /applications/<application>

Response Codes:
200 - OK
404 - Application not known
500 - Server Error

Example response:
{
	"status": "CREATED",
	"overrides": {
		"user": "somebody",
		"package_name": "spark-batch-example-app-1.0.23",
		"oozie": {
			"example": {
				"executors_num": "5"
			}
		}
	},
	"package_name": "spark-batch-example-app-1.0.23",
	"name": "spark-batch-example-app-instance",
	"defaults": {
		"oozie": {
			"example": {
				"end": "${deployment_end}",
				"input_data": "/user/pnda/PNDA_datasets/datasets/source=test-src/year=*",
				"driver_mem": "256M",
				"start": "${deployment_start}",
				"executors_num": "2",
				"freq_in_mins": "180",
				"executors_mem": "256M",
				"job_name": "batch_example"
			}
		}
	}
}
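The response lists package defaults and per-application overrides separately. One plausible way to view the effective settings is to overlay the overrides on the defaults, as in this sketch; whether the Deployment Manager combines them in exactly this way is an assumption, and deep_merge is a hypothetical helper.

```python
from typing import Any, Dict


def deep_merge(defaults: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return defaults with overrides applied, recursing into nested objects."""
    merged = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


defaults = {"oozie": {"example": {"executors_num": "2", "driver_mem": "256M"}}}
overrides = {"oozie": {"example": {"executors_num": "5"}}}
print(deep_merge(defaults, overrides))
```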

Create application from package

PUT /applications/<application>
{
	"user": "<username>",
	"package": "<package>",
	"<componentType>": {
		"<componentName>": {
			"<property>": "<value>"
		}
	}
}

Response Codes:
202 - Accepted, poll /applications/<application>/status for status
400 - Request body failed validation
404 - Package not found
409 - Application already exists
500 - Server Error

Example body:
{
	"user": "somebody",
	"package": "<package>",
	"oozie": {
		"example": {
			"executors_num": "5"
		}
	}
}

The package and user fields are mandatory; property settings are optional.
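A client can check the mandatory fields before sending the request. This is a minimal sketch: build_create_body is a hypothetical helper, and the server remains the authority on what passes validation.

```python
import json
from typing import Any, Dict, Optional


def build_create_body(
    user: str, package: str, properties: Optional[Dict[str, Any]] = None
) -> str:
    """Serialise the PUT /applications/<application> body, checking mandatory fields."""
    if not user or not package:
        raise ValueError("user and package are mandatory")
    body: Dict[str, Any] = dict(properties or {})
    # Set the mandatory fields last so property settings cannot overwrite them.
    body["user"] = user
    body["package"] = package
    return json.dumps(body)


print(build_create_body(
    "somebody",
    "spark-batch-example-app-1.0.23",
    {"oozie": {"example": {"executors_num": "5"}}},
))
```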

Destroy application

DELETE /applications/<application>

Response Codes:
200 - OK
404 - Application not known
500 - Server Error

Environment Endpoints API

List environment variables known to the deployment manager

GET /environment/endpoints

Response Codes:
200 - OK
500 - Server Error

Example response:
{"zookeeper_port": "2181", "cluster_root_user": "cloud-user", ... }

Deployment Manager Variables

The following variables are made available for use in the configuration files for every component and injected as previously described.

Application Variables

application_user        The user ID that this application's components will run as

Component Variables

component_application   unique application ID
component_name          name of component folder in package
component_job_name      application_id-component_name-job
component_xxx           setting xxx from properties.json
hdfspath_path_name      generated from entries in hdfs.json
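The naming rules in this table can be sketched as a small function. It is an illustration only: component_variables is hypothetical, and the hdfspath_ entries are left out because the format of hdfs.json is not described here.

```python
from typing import Dict


def component_variables(
    application_id: str, component_name: str, properties: Dict[str, str]
) -> Dict[str, str]:
    """Derive the component_* variables for one component of an application."""
    variables = {
        "component_application": application_id,
        "component_name": component_name,
        "component_job_name": f"{application_id}-{component_name}-job",
    }
    # Each setting xxx from properties.json is exposed as component_xxx.
    for key, value in properties.items():
        variables[f"component_{key}"] = value
    return variables


print(component_variables("my-app", "example", {"executors_num": "2"}))
```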

Environment Variables

These can be obtained with the environment endpoints API

environment_app_packages_hdfs_path      /pnda/deployment/app_packages
environment_hadoop_manager_host         192.168.1.2
environment_hadoop_manager_password     admin
environment_hadoop_manager_username     admin
environment_cluster_private_key         ./dm.pem
environment_cluster_root_user           cloud-user
environment_hbase_rest_port             20550
environment_hbase_rest_server           cluster-cdh-mgr1
environment_hive_port                   10000
environment_hive_server                 cluster-cdh-mgr1
environment_impala_host                 cluster-cdh-dn0
environment_impala_port                 21050
environment_kafka_brokers               192.168.1.3:9092, ...
environment_kafka_manager               https://192.168.1.4:443
environment_kafka_zookeeper             192.168.1.5:2181, ...
environment_metric_logger_url           http://192.169.1.7:3001/metrics
environment_name_node                   hdfs://cluster-cdh-mgr1:8020
environment_namespace                   platform_app
environment_oozie_uri                   http://cluster-cdh-mgr1:11000/oozie
environment_opentsdb                    192.168.1.6:4242
environment_queue_policy                /opt/pnda/rm-wrapper/yarn-policy.sh
environment_webhdfs_host                cluster-cdh-mgr1
environment_webhdfs_port                50070
environment_yarn_node_managers          cluster-cdh-dn0
environment_yarn_resource_manager_host  cluster-cdh-mgr1
environment_yarn_resource_manager_mr_port 8032
environment_yarn_resource_manager_port  8088
environment_zookeeper_port              2181
environment_zookeeper_quorum            cluster-cdh-mgr1
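Comparing the example response of the Environment Endpoints API with the names in this table suggests that each key is exposed with an environment_ prefix. That mapping is inferred from the examples rather than confirmed, and to_environment_variables is a hypothetical helper.

```python
from typing import Dict


def to_environment_variables(endpoints: Dict[str, str]) -> Dict[str, str]:
    """Prefix each endpoint key with environment_ to form the variable name."""
    return {f"environment_{key}": value for key, value in endpoints.items()}


# Keys and values taken from the example response of GET /environment/endpoints.
endpoints = {"zookeeper_port": "2181", "cluster_root_user": "cloud-user"}
print(to_environment_variables(endpoints))
```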

Spark Streaming Specific Variables

The following variables are only injected for Spark Streaming components. They may be overridden in properties.json. For example, to override component_spark_version, include spark_version in properties.json.

component_spark_version            major version of Spark to use. Only applicable to HDP clusters; on CDH, PNDA does not support side-by-side Spark frameworks, so whichever version the spark-submit command runs is used.
component_spark_submit_args        additional arguments to spark-submit
(java only) component_main_jar     the jar containing the job code
(python only) component_main_py    the python file containing the job code
(python only) component_py_files   additional python files to pass to spark-submit

Oozie Specific Variables

The following variables are only injected for Oozie components.

component_end                  2016-03-31T17:07Z
component_start                2016-03-24T17:07Z
mapreduce.job.user.name        hdfs
mapreduce.job.queuename        root.applications.prod
oozie.coord.application.path   hdfs://cluster-cdh-mgr1:8020/user/application_id/component_name/coordinator.xml
oozie.libpath                  /pnda/deployment/platform
oozie.use.system.libpath       true
user.name                      prod1
