
Is there any way to use self-defined threads when setting up multithreading? #1155

Closed
jiajia1417 opened this issue Jan 14, 2025 · 1 comment

jiajia1417 commented Jan 14, 2025

Problem description:
I have a matrix multiplication that needs to be called from externally managed threads.

arm_compute::IScheduler& scheduler = arm_compute::Scheduler::get();
scheduler.set_num_threads(total_thread);

void run_acl_gemm_thread(
    float* out, const float* x, const float* weight,
    int num_tokens, int in_channels, int out_channels,
    float alpha, float beta, int total_thread, int threadid) {

    arm_compute::Tensor tensor_x, tensor_weight, tensor_out;

    // ACL TensorShape takes (width, height): x is num_tokens x in_channels,
    // weight is out_channels x in_channels, out is num_tokens x out_channels.
    arm_compute::TensorShape shape_x(in_channels, num_tokens);
    arm_compute::TensorShape shape_weight(in_channels, out_channels);
    arm_compute::TensorShape shape_out(out_channels, num_tokens);

    // Describe each tensor as FP32 with the shapes above.
    tensor_x.allocator()->init(arm_compute::TensorInfo(shape_x, 1, arm_compute::DataType::F32));
    tensor_weight.allocator()->init(arm_compute::TensorInfo(shape_weight, 1, arm_compute::DataType::F32));
    tensor_out.allocator()->init(arm_compute::TensorInfo(shape_out, 1, arm_compute::DataType::F32));

    // Wrap the caller-owned buffers instead of allocating new memory.
    tensor_x.allocator()->import_memory(const_cast<float*>(x));
    tensor_weight.allocator()->import_memory(const_cast<float*>(weight));
    tensor_out.allocator()->import_memory(out);

    // D = alpha * A * B + beta * C (no C tensor here), with B pre-transposed.
    arm_compute::NEGEMM gemm;
    arm_compute::GEMMInfo gemm_info;
    gemm_info.set_pretranspose_B(true);
    gemm.configure(&tensor_x, &tensor_weight, nullptr, &tensor_out, alpha, beta, gemm_info);
    gemm.run();
}

But since I have already created other threads externally, is there any way to make the scheduler use my own multithreading?

std::vector<std::thread> threads;
for (int thread_id = 0; thread_id < total_thread; ++thread_id) {
    threads.emplace_back(run_acl_gemm_thread, out, x, weight,
                         num_tokens, in_channels, out_channels,
                         alpha, beta, total_thread, thread_id);
}

like this, so that the threads used by the scheduler point to these externally created threads.

@morgolock

Hi @jiajia1417

Just to clarify: arm_compute::IScheduler& scheduler is used internally by the NEGEMM function to create multiple threads for the computation. Each thread will run the same kernel on different non-overlapping slices of the data. For example, if there are 4 CPU cores, NEGEMM will use 4 threads and divide the work equally among them.
The scheduler's job is to create the threads for each kernel and execute the workloads. You do not have to set the scheduler explicitly; NEGEMM already knows how to use it.

From what I read above, you would like to run NEGEMM as a whole from multiple threads of your own, is this correct?
