The Job publishes content and collection objects that have been submitted for publishing. It performs the following functions:
- Download and ECAR Packaging: The Job is responsible for downloading media files and packaging them into ECAR (E-content Archive) format, enabling offline consumption of content.
- Collection Hierarchy Publishing: It handles the publishing of updated or edited collection hierarchy data, ensuring that changes are correctly reflected in the published content.
- Metadata Update: The Job updates the metadata of content/collection object nodes with essential publish information, including ECAR paths, versionKey, pkgVersion, streamingUrl, status, and other relevant details.
- Elasticsearch Indexing: The Job indexes collection leaf node objects in Elasticsearch so that they can be searched and retrieved efficiently.
- Redis Cache Clearing: To maintain data integrity and consistency, the Job clears cached node data from Redis, ensuring that users always access up-to-date information.
- Output Topics: The Job emits events to output topics for post-publish processing and for video stream generation (the latter only if streaming is enabled and the content has a streamable mimeType).

Together, these functions ensure that published content is correctly packaged, its metadata is updated, collection leaf nodes are indexed for search, and stale cache entries are cleared.
content-publish
{% embed url="https://github.com/Sunbird-Knowlg/knowledge-platform-jobs/tree/release-5.5.0/publish-pipeline/content-publish" %}
During deployment, the configuration for all knowledge-platform-jobs is sourced from the sunbird-learning-platform repository. For local setups, the configuration is taken from the respective job folder within the knowledge-platform-jobs repository.
Kafka Topic:
```
kafka {
  input.topic = {{ env_name }}.publish.job.request
  groupId = {{ env_name }}-content-publish-group
}
```
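
The `{{ env_name }}` placeholder is filled in at deployment time. As a rough sketch of what the resolved names look like, assuming plain string substitution (the `render` and `resolve_kafka_config` helpers and the `dev` environment name are illustrative and not part of the job):

```python
import re

# Template values copied from the Kafka configuration above.
KAFKA_TEMPLATE = {
    "input.topic": "{{ env_name }}.publish.job.request",
    "groupId": "{{ env_name }}-content-publish-group",
}


def render(template, variables):
    """Replace each `{{ name }}` placeholder with its value (hypothetical helper)."""
    def substitute(match):
        return variables[match.group(1)]

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


def resolve_kafka_config(env_name):
    """Return the Kafka settings with the environment name substituted."""
    variables = {"env_name": env_name}
    return {key: render(value, variables) for key, value in KAFKA_TEMPLATE.items()}


if __name__ == "__main__":
    for key, value in resolve_kafka_config("dev").items():
        print(f"{key} = {value}")
```

With `env_name` set to `dev`, the job would read from `dev.publish.job.request` as a member of the `dev-content-publish-group` consumer group.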
Job configuration variables:
Variable | Purpose |
---|---|
content.bundleLocation | Used to specify local/server folder location where artifacts are to be downloaded for ECAR bundling. |
content.isECARExtractionEnabled | Used to specify whether ECAR extraction to the object's 'version' and 'latest' cloud locations, using its 'snapshot' version, is enabled. |
content.retry_asset_download_count | Used to specify the number of times the download of an asset belonging to the content/collection object is attempted until it succeeds. |
content.tmp_file_location | NOT USED |
content.objectType | Used to specify list of valid objectTypes supported for publishing. |
content.mimeType | Used to specify list of valid mimeTypes supported for publishing. |
content.asset_download_duration | Used to specify time in seconds to wait for the asset download request to respond. |
content.stream.enabled | Used to check if streaming is enabled for published objects. If it is enabled, content rendering is done using the 'streamingUrl' attribute; otherwise it is done via 'artifactUrl'. |
content.stream.mimeType | Used to check if the mimeType of the object being published is of streamable type. If yes, event for video-stream-generator job is generated. |
content.artifact.size.for_online | Used to set the maximum size of the object (in bytes) that can be downloaded for playback. Beyond this size, "contentDisposition" is set to "online-only". |
content.downloadFiles.spine | Used to specify the list of attributes that store asset URLs. The assets at these URLs are downloaded while packaging the SPINE ECAR. |
content.downloadFiles.full | Used to specify the list of attributes that store asset URLs. The assets at these URLs are downloaded while packaging the FULL ECAR. |
content.nested.fields | Used to specify the list of object properties that are of object types with nested attributes. |
cloud_storage.folder.content | Used to specify the cloud store container folder name for content file storage/extraction etc. |
cloud_storage.folder.artifact | Used to specify the cloud store container folder name for artifact (media) files storage. |
contentTypeToPrimaryCategory | Used to specify the mapping between the contentType and primaryCategory attributes. When an object has only one of the two, the mapping is used to populate the missing one. |
compositesearch.index.name | Used to specify the composite search index name where collection object nodes are synced with updated metadata. |
search.document.type | Used to specify the Elasticsearch document index type using which collection object nodes are synced with updated metadata. |
master.category.validation.enabled | Used to specify whether the object being published is to be enriched with framework metadata. |
service.print.basePath | NOT USED |
mimetype.allowed_extensions.word | Used to specify the list of file extensions allowed for an uploaded content object. |
enableDIALContextUpdate | Used to specify if the DIAL code context update data is to be computed using the linked/de-linked DIAL codes of the content/collection. |
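
To make a few of these variables concrete, the sketch below shows how the streaming, size and category settings described in the table could drive decisions during publishing. It is an illustration under stated assumptions, not the job's actual implementation: the function names are hypothetical, and every value in `CONFIG` is a placeholder rather than a real default.

```python
# Placeholder values for illustration only; the real values come from the
# job configuration described in the table above.
CONFIG = {
    "content.stream.enabled": True,
    "content.stream.mimeType": ["video/mp4", "video/webm"],
    "content.artifact.size.for_online": 200 * 1024 * 1024,  # bytes (assumed)
    "contentTypeToPrimaryCategory": {
        "Resource": "Learning Resource",
        "TextBook": "Digital Textbook",
    },
}


def needs_stream_event(mime_type, config=CONFIG):
    """True when streaming is enabled and the mimeType is listed as streamable."""
    return bool(config["content.stream.enabled"]) and (
        mime_type in config["content.stream.mimeType"]
    )


def is_online_only(size_bytes, config=CONFIG):
    """True when the object is larger than the configured maximum size."""
    return size_bytes > config["content.artifact.size.for_online"]


def fill_missing_category(metadata, config=CONFIG):
    """Populate whichever of contentType/primaryCategory is missing, if mapped."""
    mapping = config["contentTypeToPrimaryCategory"]
    enriched = dict(metadata)  # leave the caller's dict untouched
    content_type = enriched.get("contentType")
    primary_category = enriched.get("primaryCategory")
    if content_type and not primary_category:
        if content_type in mapping:
            enriched["primaryCategory"] = mapping[content_type]
    elif primary_category and not content_type:
        # Reverse lookup; assumes each primaryCategory maps back to one contentType.
        for candidate, category in mapping.items():
            if category == primary_category:
                enriched["contentType"] = candidate
                break
    return enriched


if __name__ == "__main__":
    print(needs_stream_event("video/mp4"))
    print(is_online_only(1024))
    print(fill_missing_category({"contentType": "Resource"}))
```

The reverse lookup in `fill_missing_category` only behaves predictably when the mapping is one-to-one, which is an assumption of this sketch.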
Sample Kafka event:
```json
{
  "eid": "BE_JOB_REQUEST",
  "ets": 1619527882745,
  "mid": "LP.1619527882745.32dc378a-430f-49f6-83b5-bd73b767ad36",
  "actor": {
    "id": "content-publish",
    "type": "System"
  },
  "context": {
    "channel": "01309282781705830427",
    "pdata": {
      "id": "org.sunbird.platform",
      "ver": "1.0"
    },
    "env": "dev"
  },
  "object": {
    "id": "do_11329603741667328018",
    "ver": "1619153418829"
  },
  "edata": {
    "publish_type": "public",
    "metadata": {
      "identifier": "do_11329603741667328018",
      "mimeType": "application/pdf",
      "objectType": "Content",
      "lastPublishedBy": "",
      "pkgVersion": 1
    },
    "action": "publish",
    "iteration": 1
  }
}
```
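
As a rough illustration of how a consumer might read this event, the sketch below parses a trimmed copy of it and checks the object against lists like those configured in `content.objectType` and `content.mimeType`. The helper names, the allowed lists, and the check on `action` are assumptions made for the example, not the job's actual logic.

```python
import json

# Trimmed copy of the sample event above, keeping only the fields read below.
SAMPLE_EVENT = """
{
  "eid": "BE_JOB_REQUEST",
  "object": {"id": "do_11329603741667328018", "ver": "1619153418829"},
  "edata": {
    "publish_type": "public",
    "metadata": {
      "identifier": "do_11329603741667328018",
      "mimeType": "application/pdf",
      "objectType": "Content",
      "pkgVersion": 1
    },
    "action": "publish",
    "iteration": 1
  }
}
"""

# Illustrative stand-ins for content.objectType and content.mimeType.
ALLOWED_OBJECT_TYPES = ["Content", "Collection"]
ALLOWED_MIME_TYPES = ["application/pdf", "video/mp4"]


def extract_publish_request(raw_event):
    """Pull the fields needed for publishing out of the raw JSON event."""
    event = json.loads(raw_event)
    edata = event.get("edata", {})
    metadata = edata.get("metadata", {})
    return {
        "identifier": metadata.get("identifier"),
        "mimeType": metadata.get("mimeType"),
        "objectType": metadata.get("objectType"),
        "pkgVersion": metadata.get("pkgVersion"),
        "action": edata.get("action"),
        "publishType": edata.get("publish_type"),
    }


def is_publishable(request, object_types=ALLOWED_OBJECT_TYPES, mime_types=ALLOWED_MIME_TYPES):
    """Accept only publish actions on supported object and mime types (assumed rule)."""
    return (
        request["action"] == "publish"
        and request["objectType"] in object_types
        and request["mimeType"] in mime_types
    )


if __name__ == "__main__":
    request = extract_publish_request(SAMPLE_EVENT)
    print(request["identifier"], is_publishable(request))
```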