Cassandra Approach for Project ML Exhaust #103
Replies: 2 comments 8 replies
-
@SanthoshVasabhaktula @reshmi-nair @rhwarrier Could you please help us with this at the earliest? Thanks
-
@SanthoshVasabhaktula @mathewjpallan As suggested by you, we explored the MongoDB approach and did a POC on it with the 2 approaches mentioned below :-
As per our understanding, we took the POC to the maximum extent and could not achieve it. Hence we are thinking of going back to the Cassandra approach design, which was reviewed earlier and proposed by us: https://project-sunbird.atlassian.net/l/cp/mMBCB4Xy
The only concerns raised by @SanthoshVasabhaktula about the Cassandra approach were :- DB sync between the 2 transactional DBs, and data duplication across 2 different transactional DBs. Please find below the approach we have put together to resolve them :-
A detailed write-up of the above approaches is at the bottom of the doc https://project-sunbird.atlassian.net/l/cp/mMBCB4Xy, please check.
Please review it and let us know the next steps. If required, we can get on a call as well.
@rakeshSgr Request you to take this forward from here.
Cc- @aks30 @kiranharidas187 @vijiurs @Vivek-M-08 @rakeshSgr
Thanks
-
Hi @reshmi-nair @amit-tarento,
As part of the next release, we are planning a few ML Exhaust optimisations for scaling.
Jira Ticket Link :- https://project-sunbird.atlassian.net/browse/LR-472
Problem Statement :- The CSV should be extracted from the transactional DB (Cassandra) and not from Druid, to avoid deleting the Druid datasource via batch ingestion.
Reason for Deletion of Datasource :- Since the status of a project changes over time and Druid doesn't support updating a record, we delete the entire data from Druid and re-ingest the whole data every day to get the updated status of a submission.
Concern :- Druid does not support handling huge data when it is extracted as a CSV.
Approach (Solution) :- Please check this Confluence doc https://project-sunbird.atlassian.net/l/cp/TRSTnzhN, where we have detailed the design.
Similar to the Data Product and Flink Jobs implemented for PII, we need to create the same for projects as well.
Note :- The design doc attached here is very similar to the design developed for ML PII.
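To illustrate the reasoning above, here is a minimal sketch and not the actual data product or Flink job: a plain Python dict stands in for a Cassandra table keyed by a hypothetical `submission_id`, so a status change overwrites the existing row instead of forcing a full delete and re-ingest. The field names and the helper functions `upsert` and `export_csv` are assumptions made only for this example.

```python
import csv
import io


def upsert(store, record):
    """Write a record keyed by submission_id; a later write replaces the earlier row."""
    store[record["submission_id"]] = record


def export_csv(store, fields):
    """Serialise the current state of the store as CSV text, one row per key."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for key in sorted(store):
        writer.writerow(store[key])
    return buf.getvalue()


# The dict is a stand-in for the transactional table (assumption for illustration).
store = {}
upsert(store, {"submission_id": "s1", "status": "started"})
upsert(store, {"submission_id": "s2", "status": "started"})
# The status of s1 changes: the row is updated in place, nothing is deleted or re-ingested.
upsert(store, {"submission_id": "s1", "status": "submitted"})

csv_text = export_csv(store, ["submission_id", "status"])
print(csv_text)
```

With a keyed store like this, the CSV always reflects the latest status per submission, which is the property the daily delete and re-ingest into Druid is currently used to obtain.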
@SanthoshVasabhaktula @reshmi-nair @rhwarrier Please provide your approval and suggestions on whether we can go ahead with this.
Cc- @aishwaryashikshalokam @Ashwiniev95 @Prateek-slokam @aks30 @kiranharidas187 @vijiurs
Please do the needful at the earliest.
Thanks