Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP][flang][OpenMP] Experimental pass to map do concurrent to OMP #77285

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

ergawy
Copy link
Member

@ergawy ergawy commented Jan 8, 2024

Adds a pass to map do concurrent to OpenMP worksharing consturcts. For now, only maps basic loops to omp parallel do. This is still a WIP with more work needed for testing and mapping more advanced loops.

So far, the pass is not added to any flang pipelines; it only lives as a standalone pass tested by fir-opt. Once we hopefully agree on the approach and the pass matures, I will add it to flang.

@ergawy ergawy self-assigned this Jan 8, 2024
@llvmbot llvmbot added flang Flang issues not falling into any other category flang:fir-hlfir labels Jan 8, 2024
@ergawy ergawy requested a review from tblah January 8, 2024 08:43
@llvmbot
Copy link
Member

llvmbot commented Jan 8, 2024

@llvm/pr-subscribers-flang-fir-hlfir

Author: Kareem Ergawy (ergawy)

Changes

Adds a pass to map do concurrent to OpenMP worksharing consturcts. For now, only maps basic loops to omp parallel do. This is still a WIP with more work needed for testing and mapping more advanced loops.


Full diff: https://github.com/llvm/llvm-project/pull/77285.diff

6 Files Affected:

  • (modified) flang/include/flang/Optimizer/HLFIR/HLFIROps.td (+1-1)
  • (modified) flang/include/flang/Optimizer/Transforms/Passes.h (+2)
  • (modified) flang/include/flang/Optimizer/Transforms/Passes.td (+20)
  • (modified) flang/lib/Optimizer/Transforms/CMakeLists.txt (+1)
  • (added) flang/lib/Optimizer/Transforms/DoConcurrentConversion.cpp (+162)
  • (added) flang/test/Transforms/DoConcurrent/basic.mlir (+60)
diff --git a/flang/include/flang/Optimizer/HLFIR/HLFIROps.td b/flang/include/flang/Optimizer/HLFIR/HLFIROps.td
index 254771ca1780cf..718026e4fdfbe7 100644
--- a/flang/include/flang/Optimizer/HLFIR/HLFIROps.td
+++ b/flang/include/flang/Optimizer/HLFIR/HLFIROps.td
@@ -75,7 +75,7 @@ def hlfir_DeclareOp : hlfir_Op<"declare", [AttrSizedOperandSegments,
     func.func @foo(%arg0: !fir.ref<!fir.array<?x?x!fir.char<1,?>>>, %arg1: !fir.ref<i64>) {
       %c10 = arith.constant 10 : index
       %c20 = arith.constant 20 : index
-      %1 = fir.load %ag1 : fir.ref<i64>
+      %1 = fir.load %arg1 : fir.ref<i64>
       %2 = fir.shape_shift %c10, %1, %c20, %1 : (index, index, index, index) -> !fir.shapeshift<2>
       %3 = hfir.declare %arg0(%2) typeparams %1 {uniq_name = "c"} (fir.ref<!fir.array<?x?x!fir.char<1,?>>>, fir.shapeshift<2>, index) -> (fir.box<!fir.array<?x?x!fir.char<1,?>>>, fir.ref<!fir.array<?x?x!fir.char<1,?>>>)
       // ... uses %3#0 as "c"
diff --git a/flang/include/flang/Optimizer/Transforms/Passes.h b/flang/include/flang/Optimizer/Transforms/Passes.h
index 6970da8698ae84..1f389a1f571b75 100644
--- a/flang/include/flang/Optimizer/Transforms/Passes.h
+++ b/flang/include/flang/Optimizer/Transforms/Passes.h
@@ -93,6 +93,8 @@ std::unique_ptr<mlir::Pass> createFunctionAttrPass();
 std::unique_ptr<mlir::Pass>
 createFunctionAttrPass(FunctionAttrTypes &functionAttr);
 
+std::unique_ptr<mlir::Pass> createDoConcurrentConversionPass();
+
 // declarative passes
 #define GEN_PASS_REGISTRATION
 #include "flang/Optimizer/Transforms/Passes.h.inc"
diff --git a/flang/include/flang/Optimizer/Transforms/Passes.td b/flang/include/flang/Optimizer/Transforms/Passes.td
index e3c45d41f04cc7..41d9be4249b5e2 100644
--- a/flang/include/flang/Optimizer/Transforms/Passes.td
+++ b/flang/include/flang/Optimizer/Transforms/Passes.td
@@ -370,4 +370,24 @@ def FunctionAttr : Pass<"function-attr", "mlir::func::FuncOp"> {
   let constructor = "::fir::createFunctionAttrPass()";
 }
 
+def DoConcurrentConversionPass : Pass<"fopenmp-do-concurrent-conversion", "mlir::func::FuncOp"> {
+  let summary = "Map `DO CONCURRENT` loops to OpenMP worksharing loops.";
+
+  let description = [{ This is an experimental pass to map `DO CONCURRENR` loops
+     to their correspnding equivalent OpenMP worksharing constructs.
+
+     For now the following is supported:
+       - Mapping simple loops to `parallel do`.
+
+     Still to TODO:
+       - More extensive testing.
+       - Mapping to `target teams distribute parallel do`.
+       - Allowing the user to control mapping behavior: either to the host or
+         target.
+  }];
+
+  let constructor = "::fir::createDoConcurrentConversionPass()";
+  let dependentDialects = ["mlir::omp::OpenMPDialect"];
+}
+
 #endif // FLANG_OPTIMIZER_TRANSFORMS_PASSES
diff --git a/flang/lib/Optimizer/Transforms/CMakeLists.txt b/flang/lib/Optimizer/Transforms/CMakeLists.txt
index fc067ad3585395..13a46ef7422530 100644
--- a/flang/lib/Optimizer/Transforms/CMakeLists.txt
+++ b/flang/lib/Optimizer/Transforms/CMakeLists.txt
@@ -21,6 +21,7 @@ add_flang_library(FIRTransforms
   OMPMarkDeclareTarget.cpp
   VScaleAttr.cpp
   FunctionAttr.cpp
+  DoConcurrentConversion.cpp
 
   DEPENDS
   FIRDialect
diff --git a/flang/lib/Optimizer/Transforms/DoConcurrentConversion.cpp b/flang/lib/Optimizer/Transforms/DoConcurrentConversion.cpp
new file mode 100644
index 00000000000000..180c0bdf672af9
--- /dev/null
+++ b/flang/lib/Optimizer/Transforms/DoConcurrentConversion.cpp
@@ -0,0 +1,162 @@
+//===- DoConcurrentConversion.cpp -- map `DO CONCURRENT` to OpenMP loops --===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "flang/Optimizer/Dialect/FIRDialect.h"
+#include "flang/Optimizer/Dialect/FIROps.h"
+#include "flang/Optimizer/Dialect/FIRType.h"
+#include "flang/Optimizer/Dialect/Support/FIRContext.h"
+#include "flang/Optimizer/HLFIR/HLFIRDialect.h"
+#include "flang/Optimizer/Transforms/Passes.h"
+#include "mlir/Dialect/Func/IR/FuncOps.h"
+#include "mlir/Dialect/OpenMP/OpenMPDialect.h"
+#include "mlir/IR/Diagnostics.h"
+#include "mlir/IR/IRMapping.h"
+#include "mlir/Pass/Pass.h"
+#include "mlir/Transforms/DialectConversion.h"
+
+#include <memory>
+
+namespace fir {
+#define GEN_PASS_DEF_DOCONCURRENTCONVERSIONPASS
+#include "flang/Optimizer/Transforms/Passes.h.inc"
+} // namespace fir
+
+#define DEBUG_TYPE "fopenmp-do-concurrent-conversion"
+
+namespace {
+class DoConcurrentConversion : public mlir::OpConversionPattern<fir::DoLoopOp> {
+public:
+  using mlir::OpConversionPattern<fir::DoLoopOp>::OpConversionPattern;
+
+  mlir::LogicalResult
+  matchAndRewrite(fir::DoLoopOp doLoop, OpAdaptor adaptor,
+                  mlir::ConversionPatternRewriter &rewriter) const override {
+    mlir::OpPrintingFlags flags;
+    flags.printGenericOpForm();
+
+    mlir::omp::ParallelOp parallelOp =
+        rewriter.create<mlir::omp::ParallelOp>(doLoop.getLoc());
+
+    rewriter.createBlock(&parallelOp.getRegion());
+    mlir::Block &block = parallelOp.getRegion().back();
+
+    rewriter.setInsertionPointToEnd(&block);
+    rewriter.create<mlir::omp::TerminatorOp>(doLoop.getLoc());
+
+    rewriter.setInsertionPointToStart(&block);
+
+    // Clone the LB, UB, step defining ops inside the parallel region.
+    llvm::SmallVector<mlir::Value> lowerBound, upperBound, step;
+    lowerBound.push_back(
+        rewriter.clone(*doLoop.getLowerBound().getDefiningOp())->getResult(0));
+    upperBound.push_back(
+        rewriter.clone(*doLoop.getUpperBound().getDefiningOp())->getResult(0));
+    step.push_back(
+        rewriter.clone(*doLoop.getStep().getDefiningOp())->getResult(0));
+
+    auto wsLoopOp = rewriter.create<mlir::omp::WsLoopOp>(
+        doLoop.getLoc(), lowerBound, upperBound, step);
+    wsLoopOp.setInclusive(true);
+
+    auto outlineableOp =
+        mlir::dyn_cast<mlir::omp::OutlineableOpenMPOpInterface>(*parallelOp);
+    assert(outlineableOp);
+    rewriter.setInsertionPointToStart(outlineableOp.getAllocaBlock());
+
+    // For the induction variable, we need to privative its allocation and
+    // binding inside the parallel region.
+    llvm::SmallSetVector<mlir::Operation *, 2> workList;
+    // Therefore, we first discover the induction variable by discovering
+    // `fir.store`s where the source is the loop's block argument.
+    workList.insert(doLoop.getInductionVar().getUsers().begin(),
+                    doLoop.getInductionVar().getUsers().end());
+    llvm::SmallSetVector<fir::StoreOp, 2> inductionVarTargetStores;
+
+    // Walk the def-chain of the loop's block argument until we hit `fir.store`.
+    while (!workList.empty()) {
+      mlir::Operation *item = workList.front();
+
+      if (auto storeOp = mlir::dyn_cast<fir::StoreOp>(item)) {
+        inductionVarTargetStores.insert(storeOp);
+      } else {
+        workList.insert(item->getUsers().begin(), item->getUsers().end());
+      }
+
+      workList.remove(item);
+    }
+
+    // For each collected `fir.sotre`, find the target memref's alloca's and
+    // declare ops.
+    llvm::SmallSetVector<mlir::Operation *, 4> declareAndAllocasToClone;
+    for (auto storeOp : inductionVarTargetStores) {
+      mlir::Operation *storeTarget = storeOp.getMemref().getDefiningOp();
+
+      for (auto operand : storeTarget->getOperands()) {
+        declareAndAllocasToClone.insert(operand.getDefiningOp());
+      }
+      declareAndAllocasToClone.insert(storeTarget);
+    }
+
+    mlir::IRMapping mapper;
+
+    // Collect the memref defining ops in the parallel region.
+    for (mlir::Operation *opToClone : declareAndAllocasToClone) {
+      rewriter.clone(*opToClone, mapper);
+    }
+
+    // Clone the loop's body inside the worksharing construct using the mapped
+    // memref values.
+    rewriter.cloneRegionBefore(doLoop.getRegion(), wsLoopOp.getRegion(),
+                               wsLoopOp.getRegion().begin(), mapper);
+
+    mlir::Operation *terminator = wsLoopOp.getRegion().back().getTerminator();
+    rewriter.setInsertionPointToEnd(&wsLoopOp.getRegion().back());
+    rewriter.create<mlir::omp::YieldOp>(terminator->getLoc());
+    rewriter.eraseOp(terminator);
+
+    rewriter.eraseOp(doLoop);
+
+    return mlir::success();
+  }
+};
+
+class DoConcurrentConversionPass
+    : public fir::impl::DoConcurrentConversionPassBase<
+          DoConcurrentConversionPass> {
+public:
+  void runOnOperation() override {
+    mlir::func::FuncOp func = getOperation();
+
+    if (func.isDeclaration()) {
+      return;
+    }
+
+    auto *context = &getContext();
+    mlir::RewritePatternSet patterns(context);
+    patterns.insert<DoConcurrentConversion>(context);
+    mlir::ConversionTarget target(*context);
+    target.addLegalDialect<fir::FIROpsDialect, hlfir::hlfirDialect,
+                           mlir::arith::ArithDialect, mlir::func::FuncDialect,
+                           mlir::omp::OpenMPDialect>();
+
+    target.addDynamicallyLegalOp<fir::DoLoopOp>(
+        [](fir::DoLoopOp op) { return !op.getUnordered(); });
+
+    if (mlir::failed(mlir::applyFullConversion(getOperation(), target,
+                                               std::move(patterns)))) {
+      mlir::emitError(mlir::UnknownLoc::get(context),
+                      "error in converting do-concurrent op");
+      signalPassFailure();
+    }
+  }
+};
+} // namespace
+
+std::unique_ptr<mlir::Pass> fir::createDoConcurrentConversionPass() {
+  return std::make_unique<DoConcurrentConversionPass>();
+}
diff --git a/flang/test/Transforms/DoConcurrent/basic.mlir b/flang/test/Transforms/DoConcurrent/basic.mlir
new file mode 100644
index 00000000000000..b4a2ced45950e5
--- /dev/null
+++ b/flang/test/Transforms/DoConcurrent/basic.mlir
@@ -0,0 +1,60 @@
+// Tests mapping of a basic `do concurrent` loop to `!$omp parallel do`.
+
+// RUN: fir-opt --fopenmp-do-concurrent-conversion %s | FileCheck %s
+
+// CHECK-LABEL: func.func @do_concurrent_basic
+func.func @do_concurrent_basic() attributes {fir.bindc_name = "do_concurrent_basic"} {
+    // CHECK: %[[ARR:.*]]:2 = hlfir.declare %{{.*}}(%{{.*}}) {uniq_name = "_QFEa"} : (!fir.ref<!fir.array<10xi32>>, !fir.shape<1>) -> (!fir.ref<!fir.array<10xi32>>, !fir.ref<!fir.array<10xi32>>)
+    // CHECK: %[[C1:.*]] = arith.constant 1 : i32
+    // CHECK: %[[C10:.*]] = arith.constant 10 : i32
+
+    %0 = fir.alloca i32 {bindc_name = "i"}
+    %1:2 = hlfir.declare %0 {uniq_name = "_QFEi"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
+    %2 = fir.address_of(@_QFEa) : !fir.ref<!fir.array<10xi32>>
+    %c10 = arith.constant 10 : index
+    %3 = fir.shape %c10 : (index) -> !fir.shape<1>
+    %4:2 = hlfir.declare %2(%3) {uniq_name = "_QFEa"} : (!fir.ref<!fir.array<10xi32>>, !fir.shape<1>) -> (!fir.ref<!fir.array<10xi32>>, !fir.ref<!fir.array<10xi32>>)
+    %c1_i32 = arith.constant 1 : i32
+    %7 = fir.convert %c1_i32 : (i32) -> index
+    %c10_i32 = arith.constant 10 : i32
+    %8 = fir.convert %c10_i32 : (i32) -> index
+    %c1 = arith.constant 1 : index
+
+    // CHECK-NOT: fir.do_loop
+
+    // CHECK: omp.parallel {
+
+    // CHECK-NEXT: %[[ITER_VAR:.*]] = fir.alloca i32 {bindc_name = "i"}
+    // CHECK-NEXT: %[[BINDING:.*]]:2 = hlfir.declare %[[ITER_VAR]] {uniq_name = "_QFEi"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
+
+    // CHECK: %[[LB:.*]] = fir.convert %[[C1]] : (i32) -> index
+    // CHECK: %[[UB:.*]] = fir.convert %[[C10]] : (i32) -> index
+    // CHECK: %[[STEP:.*]] = arith.constant 1 : index
+
+     // CHECK: omp.wsloop for (%[[ARG0:.*]]) : index = (%[[LB]]) to (%[[UB]]) inclusive step (%[[STEP]]) {
+     // CHECK-NEXT: %[[IV_IDX:.*]] = fir.convert %[[ARG0]] : (index) -> i32
+     // CHECK-NEXT: fir.store %[[IV_IDX]] to %[[BINDING]]#1 : !fir.ref<i32>
+     // CHECK-NEXT: %[[IV_VAL1:.*]] = fir.load %[[BINDING]]#0 : !fir.ref<i32>
+     // CHECK-NEXT: %[[IV_VAL2:.*]] = fir.load %[[BINDING]]#0 : !fir.ref<i32>
+     // CHECK-NEXT: %[[IV_VAL_I64:.*]] = fir.convert %[[IV_VAL2]] : (i32) -> i64
+     // CHECK-NEXT: %[[ARR_ACCESS:.*]] = hlfir.designate %[[ARR]]#0 (%[[IV_VAL_I64]])  : (!fir.ref<!fir.array<10xi32>>, i64) -> !fir.ref<i32>
+     // CHECK-NEXT: hlfir.assign %[[IV_VAL1]] to %[[ARR_ACCESS]] : i32, !fir.ref<i32>
+     // CHECK-NEXT: omp.yield
+     // CHECK-NEXT: }
+
+     // CHECK-NEXT: omp.terminator
+     // CHECK-NEXT: }
+    fir.do_loop %arg0 = %7 to %8 step %c1 unordered {
+      %13 = fir.convert %arg0 : (index) -> i32
+      fir.store %13 to %1#1 : !fir.ref<i32>
+      %14 = fir.load %1#0 : !fir.ref<i32>
+      %15 = fir.load %1#0 : !fir.ref<i32>
+      %16 = fir.convert %15 : (i32) -> i64
+      %17 = hlfir.designate %4#0 (%16)  : (!fir.ref<!fir.array<10xi32>>, i64) -> !fir.ref<i32>
+      hlfir.assign %14 to %17 : i32, !fir.ref<i32>
+    }
+
+    // CHECK-NOT: fir.do_loop
+
+    return
+  }

@kiranchandramohan
Copy link
Contributor

Nice start.

In case you have missed it, there are some recommendations for do concurrent in https://github.com/llvm/llvm-project/blob/main/flang/docs/DoConcurrent.md.

I guess the things to consider are:

  1. Is any additional analysis necessary before this conversion?
  2. How to map locality constraints, i believe they are currently handled in lowering. This will need to be delayed.
  3. Ensure this is only enabled with driver flags.

@ergawy
Copy link
Member Author

ergawy commented Jan 9, 2024

Thanks @kiranchandramohan for looking at this!

Nice start.

In case you have missed it, there are some recommendations for do concurrent in https://github.com/llvm/llvm-project/blob/main/flang/docs/DoConcurrent.md.

I guess the things to consider are:

1. Is any additional analysis necessary before this conversion?

I also have the feeling that some analyses will be needed but I am not sure what exactly yet. Maybe a starting point would be to verify that input loops respect the constraints listed by the spec, for example: C1121-C1137. I know some of these constraints are enforced by the front-end already before lowering so we will have to consider them one by one and check the best place to check for these constraints. There is also section 11.1.7.5 Additional semantics for DO CONCURRENT constructs in the spec which I think can investigated for further analyses before we apply the mapping to OMP. Do you have any particular analyses in mind outside of this?

2. How to map locality constraints, i believe they are currently handled in lowering. This will need to be delayed.

Indeed this is currently handled in lowering. Below is an example without local_init and then the same loop with local_init:

Without local_init this is the loop directly after lowering:
    // do concurrent (integer :: j=i:10)
    // end do
    fir.do_loop %arg0 = %8 to %9 step %c1 unordered {
      %10 = fir.convert %arg0 : (index) -> i32
      fir.store %10 to %1#1 : !fir.ref<i32>
    }
With local_init this is the loop directly after lowering:
    //  do concurrent (integer :: j=i:10) local_init(i)
    //  end do
    fir.do_loop %arg0 = %8 to %9 step %c1 unordered {
      %10 = fir.convert %arg0 : (index) -> i32
      fir.store %10 to %1#1 : !fir.ref<i32>
      %11 = fir.alloca i32 {bindc_name = "i", pinned, uniq_name = "_QFEi"}
      %12:2 = hlfir.declare %11 {uniq_name = "_QFEi"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
      %13 = fir.load %6#0 : !fir.ref<i32>
      hlfir.assign %13 to %12#0 : i32, !fir.ref<i32>
    }
So, one possibility I think would be:
  • Extract the omp.map_info op to to some shared location between fir and omp dialects.
  • Use map_info to model locality constraints for do concurrent as well.

Do you see any blockers for such approach?

3. Ensure this is only enabled with driver flags.

Will do! For now, I suggest to just be a standalone pass until we flesh it out a bit more. Please let me know if you disagree.


In addition to what you mentioned, I think point number 4 would be how to model reductions. That's for the future though after the above points are addressed.


With that in-mind, do you mind if we review and merge the current PR as it is and defer further development to later PRs? I think doing all this in one PR won't be ideal.

@kiranchandramohan
Copy link
Contributor

Thanks @ergawy for the reply.

Since there are some strong opinions expressed about Do Concurrent in https://github.com/llvm/llvm-project/blob/main/flang/docs/DoConcurrent.md and also that there are several possible lowerings (sequential, CPU-OpenMP, GPU-OpenMP, CPU-SIMD, GPU-OpenACC), it might be good to agree on the design first and then proceed. The way to do this would be to write a design doc and submit a PR to the flang/docs directive and advertise in Flang Slack and discourse. In particular, we should be clear about the analysis required. I have not read the relevant sections in the standard in detail to reply to your question above, doesn't the document cover this?

@ergawy
Copy link
Member Author

ergawy commented Jan 9, 2024

Thanks @ergawy for the reply.

Since there are some strong opinions expressed about Do Concurrent in https://github.com/llvm/llvm-project/blob/main/flang/docs/DoConcurrent.md and also that there are several possible lowerings (sequential, CPU-OpenMP, GPU-OpenMP, CPU-SIMD, GPU-OpenACC), it might be good to agree on the design first and then proceed. The way to do this would be to write a design doc and submit a PR to the flang/docs directive and advertise in Flang Slack and discourse. In particular, we should be clear about the analysis required. I have not read the relevant sections in the standard in detail to reply to your question above, doesn't the document cover this?

Sure thing! I will start working on a design doc and open a separate (parent) PR for it.

@ergawy ergawy requested a review from klausler January 10, 2024 04:33
@klausler
Copy link
Contributor

Do not run DO CONCURRENT in parallel by default without ensuring first that the loop is free of non-parallelizable data usage.

@ergawy
Copy link
Member Author

ergawy commented Jan 11, 2024

Do not run DO CONCURRENT in parallel by default without ensuring first that the loop is free of non-parallelizable data usage.

Thanks for your reply. Yup, I am looking into understanding the inherent issues in do concurrent at the moment. I spent sometime going over your discussions on Github and J3 forums (not directly you but relevant to the issues you raised) ([1], [2], and the linked papers and documents) and trying to understand the issues in deeper detail. Once I form some more background, I will reach out to you if you don't mind since you thought about this problem a lot longer than I did.

@ergawy ergawy marked this pull request as draft January 11, 2024 06:45
@klausler
Copy link
Contributor

Do not run DO CONCURRENT in parallel by default without ensuring first that the loop is free of non-parallelizable data usage.

Thanks for your reply. Yup, I am looking into understanding the inherent issues in do concurrent at the moment. I spent sometime going over your discussions on Github and J3 forums (not directly you but relevant to the issues you raised) ([1], [2], and the linked papers and documents) and trying to understand the issues in deeper detail. Once I form some more background, I will reach out to you if you don't mind since you thought about this problem a lot longer than I did.

It is not complicated.

There are two common assumptions about DO CONCURRENT that are not true. First, due to the name, many people assume that DO CONCURRENT is an assertion by the programmer that the loop is safe to parallelize and/or a request to parallelize the loop. Second, that the restrictions imposed by the standard on DO CONCURRENT loops make any conformant program safe to parallelize.

The actual definition of DO CONCURRENT is such that it is safe to execute the iterations of the loop in any serial order. They are not sufficient to execute the iterations of the loop in parallel. You can write a DO CONCURRENT loop that is perfectly conformant with the standard but still unsafe to parallelize.

The standard could be fixed with a small change, but the committee won't do it.

If you have permission from the user via the command line to parallelize, fine. But by default, you still need to analyze the data flow in the loop to guarantee safe parallelization.

Adds a pass to map `do concurrent` to OpenMP worksharing consturcts. For
now, only maps basic loops to `omp parallel do`. This is still a WIP
with more work needed for testing and mapping more advanced loops.
This does not detail everything yet, still working through this.
@ergawy ergawy force-pushed the do_concurrent_to_omp branch from 4be76ec to 07d6c48 Compare February 16, 2024 09:32
Copy link
Contributor

@jeanPerier jeanPerier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @ergawy, very late review on my side. Thanks for working on this, it is a great feature. On my side, as long as this pass is run behind a flag (given the points made by Peter), the direction looks good to me (I appreciate the fact that this is done in a pass so that do concurrent can also easily be mapped to other concepts).

let summary = "Map `DO CONCURRENT` loops to OpenMP worksharing loops.";

let description = [{ This is an experimental pass to map `DO CONCURRENR` loops
to their correspnding equivalent OpenMP worksharing constructs.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
to their correspnding equivalent OpenMP worksharing constructs.
to their corresponding equivalent OpenMP worksharing constructs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
flang:fir-hlfir flang Flang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants