
feat: introduce the interface of RemoteJobScheduler #4124

Conversation

@zyy17 (Collaborator) commented Jun 9, 2024

I hereby agree to the terms of the GreptimeDB CLA.

Refer to a related PR or issue link (optional)

What's changed and what's your intention?

RemoteJobScheduler

In a storage-disaggregated system, we can offload CPU-intensive and IO-intensive tasks (for example, compaction and indexing) to remote workers. This PR introduces an abstraction for that scenario.

RemoteJobScheduler is a trait that defines the API for scheduling remote jobs. Its implementation is in GreptimeDB Enterprise.

/// RemoteJobScheduler is a trait that defines the API to schedule remote jobs.
#[async_trait::async_trait]
pub trait RemoteJobScheduler: Send + Sync + 'static {
    /// Sends a job to the scheduler and returns a unique identifier for the job.
    async fn schedule(&self, job: RemoteJob, notifier: Arc<dyn Notifier>) -> Result<JobId>;
}

/// Notifier is used to notify the mito engine when a remote job is completed.
#[async_trait::async_trait]
pub trait Notifier: Send + Sync + 'static {
    /// Notify the mito engine that a remote job is completed.
    async fn notify(&self, result: RemoteJobResult);
}
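To illustrate the `Notifier` contract, here is a hypothetical sketch of an implementation that forwards completion events back to the engine over a channel. It is simplified to synchronous code (the actual trait is async), and `ChannelNotifier` plus the pared-down `RemoteJobResult` are illustrative stand-ins, not types from the PR:

```rust
use std::sync::mpsc::{channel, Sender};

// Illustrative stand-in for the PR's job-result type (simplified).
#[derive(Debug)]
struct RemoteJobResult {
    job_id: u64,
    success: bool,
}

// Synchronous analogue of the async `Notifier` trait.
trait Notifier {
    fn notify(&self, result: RemoteJobResult);
}

// One possible implementation: push completion events onto a channel
// that the engine's event loop drains.
struct ChannelNotifier {
    tx: Sender<RemoteJobResult>,
}

impl Notifier for ChannelNotifier {
    fn notify(&self, result: RemoteJobResult) {
        // Ignore send errors if the receiving side has shut down.
        let _ = self.tx.send(result);
    }
}

fn main() {
    let (tx, rx) = channel();
    let notifier = ChannelNotifier { tx };
    notifier.notify(RemoteJobResult { job_id: 42, success: true });
    let got = rx.recv().unwrap();
    assert_eq!(got.job_id, 42);
    assert!(got.success);
}
```

The channel indirection keeps the remote worker's completion path decoupled from the engine's internals, which matches the trait's stated purpose of notifying the mito engine when a remote job finishes.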

The PR modifies schedule_compaction_request() to support remote compaction:

  • If the compaction request specifies remote_compaction in region_options and a RemoteJobScheduler is initialized, Mito will execute remote compaction;
  • If remote compaction fails, it falls back to local compaction;
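The dispatch logic above can be sketched as follows. This is a hypothetical, simplified model of the fallback behavior; the names and signature are illustrative, not the PR's actual implementation:

```rust
// Which compaction path ends up running.
#[derive(Debug, PartialEq)]
enum CompactionMode {
    Remote,
    Local,
}

// Decide the path, given the region option, scheduler availability, and
// the outcome of the remote scheduling attempt.
fn pick_compaction_mode(
    remote_compaction_enabled: bool,
    scheduler_initialized: bool,
    remote_schedule: impl Fn() -> Result<(), String>,
) -> CompactionMode {
    if remote_compaction_enabled && scheduler_initialized {
        // Try the remote path first; fall back to local on failure.
        return match remote_schedule() {
            Ok(()) => CompactionMode::Remote,
            Err(_) => CompactionMode::Local,
        };
    }
    CompactionMode::Local
}

fn main() {
    // Remote path taken when enabled, initialized, and scheduling succeeds.
    assert_eq!(pick_compaction_mode(true, true, || Ok(())), CompactionMode::Remote);
    // Remote scheduling fails -> fall back to local compaction.
    assert_eq!(
        pick_compaction_mode(true, true, || Err("worker unavailable".into())),
        CompactionMode::Local
    );
    // Option not set, or no scheduler -> local compaction.
    assert_eq!(pick_compaction_mode(false, true, || Ok(())), CompactionMode::Local);
    assert_eq!(pick_compaction_mode(true, false, || Ok(())), CompactionMode::Local);
}
```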

Other changes

  • Add the async keyword to all compaction-related functions because schedule_compaction_request needs to be async;

  • Use an Option type for the senders in CompactionFinished because we don't need them in the remote compaction scenario;

  • Add remote_compaction to the compaction options;

TODOs

  • Inject RemoteJobScheduler from the plugin system;
  • Design the API to fetch the Jobs from the scheduler. When the datanode restarts, it can rebuild the context of the remote job;
  • Add the unit tests for the RemoteJobScheduler;

Checklist

  • I have written the necessary rustdoc comments.
  • I have added the necessary unit tests and integration tests.
  • This PR requires documentation updates.

@github-actions github-actions bot added the docs-not-required This change does not impact docs. label Jun 9, 2024
@zyy17 zyy17 force-pushed the feat/add-experimental-remote-job-scheduler branch from c9d3214 to 90a4794 Compare June 17, 2024 03:28
@zyy17 zyy17 marked this pull request as ready for review June 19, 2024 10:07
@zyy17 zyy17 requested review from v0y4g3r, evenyag, waynexia and a team as code owners June 19, 2024 10:07
@zyy17 zyy17 requested review from sunng87 and shuiyisong June 19, 2024 10:07
@zyy17 (Collaborator, Author) commented Jun 19, 2024

@sunng87 @shuiyisong The PR contains part of the datanode plugin changes, PTAL.

@zyy17 (Collaborator, Author) commented Jun 19, 2024

@evenyag @v0y4g3r When I tried to design the API for rebuilding the job context after a datanode restart, I realized that we might need to keep metadata about the latest compaction status. It would also be useful when we use local compaction for heavy compaction tasks.

The metadata may be part of region metadata, for example:

last_compaction: {
    "timestamp": xxx,
    "status": "Processing"
}

When the datanode starts, it can fetch the metadata of the open regions and decide whether to schedule a compaction task.
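A hypothetical sketch of how such metadata might look as a Rust type; the field names mirror the JSON example above, and the status variants and `needs_reschedule` helper are illustrative assumptions, not part of the PR:

```rust
// Illustrative status values; only `Processing` appears in the example above.
#[derive(Debug, Clone, PartialEq)]
enum CompactionStatus {
    Processing,
    Finished,
    Failed,
}

// Mirrors the `last_compaction` JSON example from the comment above.
#[derive(Debug, Clone)]
struct LastCompaction {
    /// Unix timestamp of the last compaction attempt.
    timestamp: i64,
    status: CompactionStatus,
}

impl LastCompaction {
    /// On datanode restart, a still-`Processing` entry suggests the job
    /// was interrupted and may need to be rescheduled.
    fn needs_reschedule(&self) -> bool {
        self.status == CompactionStatus::Processing
    }
}

fn main() {
    let interrupted = LastCompaction {
        timestamp: 1_718_755_200,
        status: CompactionStatus::Processing,
    };
    assert!(interrupted.needs_reschedule());

    let done = LastCompaction {
        timestamp: 1_718_755_200,
        status: CompactionStatus::Finished,
    };
    assert!(!done.needs_reschedule());
}
```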


codecov bot commented Jun 19, 2024

Codecov Report

Attention: Patch coverage is 56.11814% with 104 lines in your changes missing coverage. Please review.

Project coverage is 84.83%. Comparing base (22d1268) to head (d08b56e).
Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4124      +/-   ##
==========================================
- Coverage   85.16%   84.83%   -0.33%     
==========================================
  Files        1020     1021       +1     
  Lines      179630   179777     +147     
==========================================
- Hits       152976   152519     -457     
- Misses      26654    27258     +604     


/// RemoteJobScheduler is a trait that defines the API to schedule remote jobs.
#[async_trait::async_trait]
pub trait RemoteJobScheduler: Send + Sync + 'static {
A project Member commented:

Can we use a single trait for both remote and local job?

@zyy17 (Collaborator, Author) replied Jun 20, 2024

I think it's possible in theory, for example:

#[async_trait::async_trait]
pub trait Scheduler: Send + Sync + 'static {
  async fn schedule(&self, job: Job, notifier: Option<Arc<dyn Notifier>>) -> Result<Option<JobId>>;
}
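A hypothetical, synchronous sketch of the unified-trait idea above: a local scheduler could run the job inline and return no JobId, while a remote one hands the job off and returns Some(id) to track it. All names here are illustrative stand-ins, not code from the PR:

```rust
use std::cell::Cell;

type JobId = u64;

// Synchronous analogue of the unified `Scheduler` idea: `None` means the
// job ran locally and inline, `Some(id)` means it was handed off remotely.
trait Scheduler {
    fn schedule(&self, job: &str) -> Result<Option<JobId>, String>;
}

struct LocalScheduler;

impl Scheduler for LocalScheduler {
    fn schedule(&self, _job: &str) -> Result<Option<JobId>, String> {
        // Local execution happens inline; there is no remote job to track.
        Ok(None)
    }
}

struct RemoteScheduler {
    next_id: Cell<JobId>,
}

impl Scheduler for RemoteScheduler {
    fn schedule(&self, _job: &str) -> Result<Option<JobId>, String> {
        // A remote scheduler hands the job off and returns an id to track it.
        let id = self.next_id.get();
        self.next_id.set(id + 1);
        Ok(Some(id))
    }
}

fn main() {
    assert_eq!(LocalScheduler.schedule("compaction").unwrap(), None);
    let remote = RemoteScheduler { next_id: Cell::new(1) };
    assert_eq!(remote.schedule("compaction").unwrap(), Some(1));
    assert_eq!(remote.schedule("compaction").unwrap(), Some(2));
}
```

The Option<JobId> return type is what lets a single trait cover both cases without forcing the local path to invent a meaningless id.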

@@ -693,7 +693,7 @@ pub(crate) struct CompactionFinished {
     /// Region id.
     pub(crate) region_id: RegionId,
     /// Compaction result senders.
-    pub(crate) senders: Vec<OutputTx>,
+    pub(crate) senders: Option<Vec<OutputTx>>,
A project Member commented:

I remember we use an empty vec for None in our code base @evenyag
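The difference between the two conventions can be sketched as follows; the types and helpers are illustrative, not from the code base:

```rust
// With Option<Vec<T>>, "no senders" has two representations (None and
// Some(empty vec)), so callers must handle both.
fn has_senders_option(senders: &Option<Vec<u32>>) -> bool {
    senders.as_ref().map_or(false, |s| !s.is_empty())
}

// With the empty-vec convention there is a single "no senders" state.
fn has_senders_vec(senders: &[u32]) -> bool {
    !senders.is_empty()
}

fn main() {
    // Two distinct "empty" states with Option.
    assert!(!has_senders_option(&None));
    assert!(!has_senders_option(&Some(vec![])));
    assert!(has_senders_option(&Some(vec![1])));

    // One "empty" state with a plain vec/slice.
    assert!(!has_senders_vec(&[]));
    assert!(has_senders_vec(&[1]));
}
```

Collapsing the two empty states into one is the usual argument for the empty-vec convention the reviewer mentions.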

@zyy17 zyy17 closed this Jun 20, 2024
@zyy17 zyy17 force-pushed the feat/add-experimental-remote-job-scheduler branch from d08b56e to 5bcd7a1 Compare June 20, 2024 12:04