Evidence Link Creation in the OnDemandDruidExhaust Job Data Product #29

Shakthieshwari · 2022-09-09T12:08:24Z

Hi Team,

As part of the 5.1 release, we have this story: https://project-sunbird.atlassian.net/browse/OB-70

The ML feature has resources called observation, survey, and projects, where users can upload evidence/attachments (files) that are stored in Azure cloud storage. In the Program Dashboard CSV, we want to send these attachments as links. Currently the ML data pipeline generates the link and stores it in Druid.

Now, to support multi-cloud storage (Azure, AWS, GCP, Oracle), we are planning to create the link dynamically in the OnDemandDruidExhaust data product itself, based on the cloud storage specified in the config file of the Druid query.

Sample link format: https://{{azure_storage_account}}.blob.core.windows.net/{{azure_container_name}}/survey/631041ecd58d74000aec9e7f/bc374bdf-8c59-4036-b1c0-b1db471da3f1/e768b582-2b61-4913-a8db-dccc650c767e/fix csv report.png

Example: https://sunbirdstagingpublic.blob.core.windows.net/samiksha/survey/631041ecd58d74000aec9e7f/bc374bdf-8c59-4036-b1c0-b1db471da3f1/e768b582-2b61-4913-a8db-dccc650c767e/fix csv report.png

Approach:
In our ML Druid datasource, we will store only the FileSourcePath. The Scala data product will generate the link and write it into the CSV for Program Dashboard usage.

To do this, we need to modify the Scala data product so that it reads the Druid query from the config and builds the evidence link from the FileSourcePath in the ML Druid datasource.
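The link construction in this approach can be sketched as below. This is only an illustration in Python (the actual data product is in Scala), and the function name `build_evidence_link` is hypothetical. It assumes the base URL already points at the container and that the stored path may contain characters such as spaces (as in the sample link), which need percent-encoding.

```python
from urllib.parse import quote


def build_evidence_link(base_url: str, file_source_path: str) -> str:
    """Join a storage base URL and a stored file path into a public link.

    Hypothetical helper: the name and signature are assumptions for
    illustration, not part of the OnDemandDruidExhaust job.
    """
    # Percent-encode unsafe characters (for example spaces) but keep "/"
    # so the folder structure of the stored path is preserved.
    encoded_path = quote(file_source_path.lstrip("/"), safe="/")
    # Avoid a double slash when the base URL already ends with "/".
    return base_url.rstrip("/") + "/" + encoded_path


base = "https://sunbirdstagingpublic.blob.core.windows.net/samiksha"
path = (
    "survey/631041ecd58d74000aec9e7f/bc374bdf-8c59-4036-b1c0-b1db471da3f1/"
    "e768b582-2b61-4913-a8db-dccc650c767e/fix csv report.png"
)
link = build_evidence_link(base, path)
print(link)
```

The same logic would apply per row of the exhaust output, replacing the stored path in the evidences column with the generated link before the CSV is written.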

Sample config for multi-cloud storage support (the new part is the "cloud_storage" key at the end):
{"id":"ml-obs-question-detail-exhaust","labels":{"questionName":"Question","user_districtName":"Declared District","evidences":"Evidences","questionResponseLabel":"Question_response_label","solutionExternalId":"Observation ID","school_code":"Declared School ID","user_type":"User Type","role_title":"User Sub Type","minScore":"Question score","programName":"Program Name","questionExternalId":"Question_external_id","organisation_name":"Organisation Name","user_boardName":"Declared Board","createdBy":"UUID","remarks":"Remarks","user_blockName":"Declared Block","solutionName":"Observation Name","user_schoolName":"Declared School Name","programExternalId":"Program ID","user_stateName":"Declared State","observationSubmissionId":"observation_submission_id","districtName":"District observed","blockName":"Block observed","schoolName":"School observed","schoolExternalId":"ID of school observed"},"dateRange":{"interval":{"startDate":"1901-01-01","endDate":"2101-01-01"},"granularity":"all","intervalSlider":0},"metrics":[{"metric":"total_content_plays_on_portal","label":"total_content_plays_on_portal","druidQuery":{"intervals":"1901-01-01T00:00+00:00/2101-01-01T00:00:00+00:00","dataSource":"sl-observation","columns":["createdBy","user_type","role_title","user_stateName","user_districtName","user_blockName","school_code","user_schoolName","user_boardName","organisation_name","programName","programExternalId","solutionName","solutionExternalId","districtName","blockName","schoolName","schoolExternalId","observationSubmissionId","questionExternalId","questionName","questionResponseLabel","minScore","evidences","remarks"],"queryType":"scan"}}],"output":[{"zip":false,"label":"","dims":["date"],"fileParameters":["id","dims"],"metrics":["createdBy","user_type","role_title","user_stateName","user_districtName","user_blockName","school_code","user_schoolName","user_boardName","organisation_name","programName","programExternalId","solutionName","solutionExternalId","districtName","blockName","schoolName","schoolExternalId","observationSubmissionId","questionExternalId","questionName","questionResponseLabel","minScore","evidences","remarks"],"type":"csv"}],"sort":["UUID","Program ID","Observation ID","observation_submission_id","Question_external_id"],"queryType":"scan","cloud_storage":{"type":"S3(Azure,GCP,Oracle)","storage_account":"xyz","bucket_name(container_name)":"abc","base_url":"http://s3-REGION-.amazonaws.com/BUCKET-NAME/KEY"}}

@sowmya-dixit @anandp504 As part of the 5.1 release, please let us know if we can make this enhancement in the OnDemandDruidExhaust data product job.

Cc- @aishwaryashikshalokam @Ashwiniev95 @Prateek-slokam @aks30 @kiranharidas187 @vijiurs @snehangsude

Please do the needful at the earliest

Awaiting your reply

Thanks

Shakthieshwari · 2022-09-19T08:41:18Z

@sowmya-dixit @anandp504 Any update on this?

Please do help us out

Thanks

0 replies

Shakthieshwari · 2022-09-20T10:03:33Z

@sowmya-dixit @anandp504 Any update on this?

Please do help us out. If required, we can connect as well.

Thanks

6 replies

Shakthieshwari Sep 23, 2022
Author

@anandp504 Our backend team uploads files to a specific cloud storage, but that is not a requirement here. The current requirement is that the OnDemandDruidExhaust job should generate a link for each file uploaded to cloud storage and share that link in the CSV file.

Can we do that in the data product?

Please help me out here. If required, we can set up a 30-minute call. Please let me know your availability.

Thanks

SanthoshVasabhaktula Sep 23, 2022
Maintainer

@Shakthieshwari

A data product cannot provide a signed URL, because the TTL of a signed URL is short. You have the following options:

1. If the uploaded files are public, provide the link directly in the CSV.
2. If the uploaded files are private, create an API that provides signed URLs for the private files on demand. The path to the API and the file name can be provided in the CSV. For example, the API can be like https:///survey/download/ and it can either download the file or return a signed URL with a TTL.
3. Create a zip containing all the uploaded files along with the CSV. Within the CSV, provide relative links to the files in the zip.
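The third option (a zip holding the CSV plus the files it links to) could look roughly like the sketch below. It is written in Python for illustration only. The function name `build_report_zip`, the `evidences/` folder, the `report.csv` file name, and the two-column layout are all assumptions, and file contents are passed in as bytes rather than fetched from cloud storage.

```python
import csv
import io
import zipfile


def build_report_zip(rows, files):
    """Bundle a CSV and its evidence files into one zip archive.

    rows:  list of (question, source_path) tuples (hypothetical layout)
    files: dict mapping source_path -> file content as bytes
    Returns the zip archive as bytes.
    """
    archive = io.BytesIO()
    written = set()
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        csv_text = io.StringIO()
        writer = csv.writer(csv_text)
        writer.writerow(["Question", "Evidences"])
        for question, source_path in rows:
            # The CSV stores a path relative to the zip root, so the link
            # keeps working wherever the archive is extracted.
            relative = "evidences/" + source_path
            writer.writerow([question, relative])
            if relative not in written:
                zf.writestr(relative, files[source_path])
                written.add(relative)
        zf.writestr("report.csv", csv_text.getvalue())
    return archive.getvalue()


rows = [("Q1", "survey/abc/photo.png"), ("Q2", "survey/abc/photo.png")]
files = {"survey/abc/photo.png": b"fake image bytes"}
zip_bytes = build_report_zip(rows, files)

with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
    names = sorted(zf.namelist())
    report = zf.read("report.csv").decode("utf-8")
print(names)
```

This avoids link expiry entirely, at the cost of a larger download.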

Shakthieshwari Sep 23, 2022
Author

@SanthoshVasabhaktula Yes, the files are public, so we will provide the link directly in the CSV. But our data product needs the logic to form the link from the base_url in the config (https://{{storage_account}}.blob.core.windows.net/{{container_name}}/) and the SourcePath in the data (survey/607d5d765597a306c20eb84e/59fa60e7-281d-4876-ae19-5c33ece11db6/3b637159-8f9f-45f4-901d-027734c71484/1650889156923.pdf).

Link: base_url + source_path

Config: {"cloud_storage":{"type":"S3(Azure,GCP)","storage_account":"xyz","bucket_name(container_name)":"abc","base_url":"http://s3-REGION-.amazonaws.com/BUCKET-NAME/KEY"}}

Please let us know if we can do this in the data product.

SanthoshVasabhaktula Sep 23, 2022
Maintainer

Don't take the base URL as a config value. Generate the base URL from the configuration provided via type, region, and bucket.
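A minimal sketch of deriving the base URL from the storage configuration, in Python for illustration (the data product itself is in Scala). The config keys `type`, `storage_account`, `region`, and `bucket` and the function name `base_url_from_config` are assumptions, not the job's actual schema. The Azure pattern is the one quoted earlier in this thread. The S3 and GCS patterns are the standard public endpoint formats for those services. Oracle is left out of the sketch.

```python
def base_url_from_config(cloud_storage: dict) -> str:
    """Derive a public base URL from a cloud storage config.

    Hypothetical helper: key names are assumptions for illustration.
    """
    storage_type = cloud_storage["type"].lower()
    bucket = cloud_storage["bucket"]

    if storage_type == "azure":
        # Azure Blob Storage: account in the host, container in the path.
        account = cloud_storage["storage_account"]
        return f"https://{account}.blob.core.windows.net/{bucket}"
    if storage_type in ("s3", "aws"):
        # Amazon S3 virtual-hosted style: bucket and region in the host.
        region = cloud_storage["region"]
        return f"https://{bucket}.s3.{region}.amazonaws.com"
    if storage_type == "gcp":
        # Google Cloud Storage: bucket in the path.
        return f"https://storage.googleapis.com/{bucket}"
    raise ValueError(f"Unsupported cloud storage type: {cloud_storage['type']}")


azure_url = base_url_from_config(
    {"type": "Azure", "storage_account": "sunbirdstagingpublic", "bucket": "samiksha"}
)
s3_url = base_url_from_config({"type": "S3", "region": "ap-south-1", "bucket": "abc"})
print(azure_url)
print(s3_url)
```

Keeping the URL pattern in code rather than in config means a deployment only has to declare which provider, region, and bucket it uses.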

Shakthieshwari Sep 23, 2022
Author

sure @SanthoshVasabhaktula
