[LoopUnroll] Rotate loop before unrolling inside of UnrollRuntimeLoopRemainder #148243

Open
wants to merge 3 commits into
base: main
mark-sed

This MR is an alternative approach to #146540 (@fhahn @annamthomas).

This patch adds loop rotation to runtime loop unrolling when rotation makes the exit count from the latch computable, which may in turn enable additional unrolling of the loop. To minimize the chance of rotating without unrolling, the rotation is done directly inside UnrollRuntimeLoopRemainder.

As with the #146540 patch, I am thinking about refactoring the loop rotation function to split the legality and profitability checks, and will try to submit an NFC patch for it.
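
For readers unfamiliar with the transform, here is a minimal source-level sketch of what loop rotation does, written in plain C++ rather than LLVM IR. The function names `sumTopTested` and `sumRotated` are hypothetical and are not part of the patch. The case the patch targets is narrower (a loop whose latch exit count is not computable while the header's exit count is), so this only illustrates the basic shape change: the exit test moves from the header to the latch, behind a guard.

```cpp
#include <cassert>
#include <cstddef>

// Top-tested form: the exit test sits in the loop header, so the
// latch (the block that jumps back to the header) does not exit.
int sumTopTested(const int *Data, std::size_t N) {
  int Sum = 0;
  std::size_t I = 0;
  while (I < N) {
    Sum += Data[I];
    ++I;
  }
  return Sum;
}

// Rotated form: a guard checks the condition once up front, and the
// exit test is moved to the bottom of the loop (the latch).
int sumRotated(const int *Data, std::size_t N) {
  int Sum = 0;
  std::size_t I = 0;
  if (I < N) {        // guard: skip the loop when it would run zero times
    do {
      Sum += Data[I];
      ++I;
    } while (I < N);  // bottom test: the latch is now the exiting block
  }
  return Sum;
}
```

Both functions compute the same result for any `N`, including zero; only the position of the exit test differs.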

@llvmbot
Member

llvmbot commented Jul 11, 2025

@llvm/pr-subscribers-llvm-transforms

Author: Marek Sedláček (mark-sed)

Changes


Patch is 56.83 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/148243.diff

10 Files Affected:

  • (modified) llvm/include/llvm/Transforms/Utils/LoopRotationUtils.h (+9-6)
  • (modified) llvm/include/llvm/Transforms/Utils/UnrollLoop.h (+17-2)
  • (modified) llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp (+3-2)
  • (modified) llvm/lib/Transforms/Utils/LoopRotationUtils.cpp (+16-12)
  • (modified) llvm/lib/Transforms/Utils/LoopUnroll.cpp (+72-58)
  • (modified) llvm/lib/Transforms/Utils/LoopUnrollAndJam.cpp (+8-6)
  • (modified) llvm/lib/Transforms/Utils/LoopUnrollRuntime.cpp (+35-13)
  • (modified) llvm/test/Transforms/LoopUnroll/full-unroll-avoid-partial.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopUnroll/runtime-loop-multiexit-dom-verify.ll (+189-83)
  • (added) llvm/test/Transforms/LoopUnroll/runtime-unroll-after-rotate.ll (+106)
diff --git a/llvm/include/llvm/Transforms/Utils/LoopRotationUtils.h b/llvm/include/llvm/Transforms/Utils/LoopRotationUtils.h
index c3643e0f27f94..1c83bd706ee3e 100644
--- a/llvm/include/llvm/Transforms/Utils/LoopRotationUtils.h
+++ b/llvm/include/llvm/Transforms/Utils/LoopRotationUtils.h
@@ -13,6 +13,7 @@
 #ifndef LLVM_TRANSFORMS_UTILS_LOOPROTATIONUTILS_H
 #define LLVM_TRANSFORMS_UTILS_LOOPROTATIONUTILS_H
 
+#include "llvm/ADT/STLExtras.h"
 #include "llvm/Support/Compiler.h"
 
 namespace llvm {
@@ -32,12 +33,14 @@ class TargetTransformInfo;
 /// header. If the loop header's size exceeds the threshold, the loop rotation
 /// will give up. The flag IsUtilMode controls the heuristic used in the
 /// LoopRotation. If it is true, the profitability heuristic will be ignored.
-LLVM_ABI bool LoopRotation(Loop *L, LoopInfo *LI,
-                           const TargetTransformInfo *TTI, AssumptionCache *AC,
-                           DominatorTree *DT, ScalarEvolution *SE,
-                           MemorySSAUpdater *MSSAU, const SimplifyQuery &SQ,
-                           bool RotationOnly, unsigned Threshold,
-                           bool IsUtilMode, bool PrepareForLTO = false);
+/// The ProfitabilityCheck function can override the general profitability check.
+LLVM_ABI bool LoopRotation(
+    Loop *L, LoopInfo *LI, const TargetTransformInfo *TTI, AssumptionCache *AC,
+    DominatorTree *DT, ScalarEvolution *SE, MemorySSAUpdater *MSSAU,
+    const SimplifyQuery &SQ, bool RotationOnly, unsigned Threshold,
+    bool IsUtilMode, bool PrepareForLTO = false,
+    function_ref<bool(Loop *, ScalarEvolution *)> ProfitabilityCheck =
+        [](Loop *, ScalarEvolution *) { return false; });
 
 } // namespace llvm
 
diff --git a/llvm/include/llvm/Transforms/Utils/UnrollLoop.h b/llvm/include/llvm/Transforms/Utils/UnrollLoop.h
index 765c613b04a44..4499b6f43ba19 100644
--- a/llvm/include/llvm/Transforms/Utils/UnrollLoop.h
+++ b/llvm/include/llvm/Transforms/Utils/UnrollLoop.h
@@ -54,11 +54,14 @@ LLVM_ABI const Loop *addClonedBlockToLoopInfo(BasicBlock *OriginalBB,
                                               LoopInfo *LI,
                                               NewLoopsMap &NewLoops);
 
-/// Represents the result of a \c UnrollLoop invocation.
+/// Represents the result of a \c UnrollLoop or \c UnrollAndJamLoop invocation.
 enum class LoopUnrollResult {
   /// The loop was not modified.
   Unmodified,
 
+  /// The loop was modified, but not unrolled.
+  Modified,
+
   /// The loop was partially unrolled -- we still have a loop, but with a
   /// smaller trip count.  We may also have emitted epilogue loop if the loop
   /// had a non-constant trip count.
@@ -69,6 +72,18 @@ enum class LoopUnrollResult {
   FullyUnrolled
 };
 
+/// Represents the result of a \c UnrollRuntimeLoopRemainder invocation.
+enum class LoopReminderUnrollResult {
+  /// The loop remainder was not modified.
+  Unmodified,
+
+  /// The loop was rotated, but not unrolled.
+  Rotated,
+
+  /// The loop remainder was unrolled.
+  Unrolled
+};
+
 struct UnrollLoopOptions {
   unsigned Count;
   bool Force;
@@ -90,7 +105,7 @@ LLVM_ABI LoopUnrollResult UnrollLoop(Loop *L, UnrollLoopOptions ULO,
                                      Loop **RemainderLoop = nullptr,
                                      AAResults *AA = nullptr);
 
-LLVM_ABI bool UnrollRuntimeLoopRemainder(
+LLVM_ABI LoopReminderUnrollResult UnrollRuntimeLoopRemainder(
     Loop *L, unsigned Count, bool AllowExpensiveTripCount,
     bool UseEpilogRemainder, bool UnrollRemainder, bool ForgetAllSCEV,
     LoopInfo *LI, ScalarEvolution *SE, DominatorTree *DT, AssumptionCache *AC,
diff --git a/llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp b/llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp
index a22d84dcf014d..8b1ab5a9e2181 100644
--- a/llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp
+++ b/llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp
@@ -1357,8 +1357,9 @@ tryToUnrollLoop(Loop *L, DominatorTree &DT, LoopInfo *LI, ScalarEvolution &SE,
   ULO.RuntimeUnrollMultiExit = UP.RuntimeUnrollMultiExit;
   LoopUnrollResult UnrollResult = UnrollLoop(
       L, ULO, LI, &SE, &DT, &AC, &TTI, &ORE, PreserveLCSSA, &RemainderLoop, AA);
-  if (UnrollResult == LoopUnrollResult::Unmodified)
-    return LoopUnrollResult::Unmodified;
+  if (UnrollResult == LoopUnrollResult::Unmodified ||
+      UnrollResult == LoopUnrollResult::Modified)
+    return UnrollResult;
 
   if (RemainderLoop) {
     std::optional<MDNode *> RemainderLoopID =
diff --git a/llvm/lib/Transforms/Utils/LoopRotationUtils.cpp b/llvm/lib/Transforms/Utils/LoopRotationUtils.cpp
index 66d0573e83f65..3d93d8a1b7d4c 100644
--- a/llvm/lib/Transforms/Utils/LoopRotationUtils.cpp
+++ b/llvm/lib/Transforms/Utils/LoopRotationUtils.cpp
@@ -69,16 +69,19 @@ class LoopRotate {
   bool RotationOnly;
   bool IsUtilMode;
   bool PrepareForLTO;
+  function_ref<bool(Loop *, ScalarEvolution *)> ProfitabilityCheck;
 
 public:
   LoopRotate(unsigned MaxHeaderSize, LoopInfo *LI,
              const TargetTransformInfo *TTI, AssumptionCache *AC,
              DominatorTree *DT, ScalarEvolution *SE, MemorySSAUpdater *MSSAU,
              const SimplifyQuery &SQ, bool RotationOnly, bool IsUtilMode,
-             bool PrepareForLTO)
+             bool PrepareForLTO,
+             function_ref<bool(Loop *, ScalarEvolution *)> ProfitabilityCheck)
       : MaxHeaderSize(MaxHeaderSize), LI(LI), TTI(TTI), AC(AC), DT(DT), SE(SE),
         MSSAU(MSSAU), SQ(SQ), RotationOnly(RotationOnly),
-        IsUtilMode(IsUtilMode), PrepareForLTO(PrepareForLTO) {}
+        IsUtilMode(IsUtilMode), PrepareForLTO(PrepareForLTO),
+        ProfitabilityCheck(ProfitabilityCheck) {}
   bool processLoop(Loop *L);
 
 private:
@@ -440,9 +443,9 @@ bool LoopRotate::rotateLoop(Loop *L, bool SimplifiedLatch) {
 
     // Rotate if either the loop latch does *not* exit the loop, or if the loop
     // latch was just simplified. Or if we think it will be profitable.
-    if (L->isLoopExiting(OrigLatch) && !SimplifiedLatch && IsUtilMode == false &&
-        !profitableToRotateLoopExitingLatch(L) &&
-        !canRotateDeoptimizingLatchExit(L))
+    if (L->isLoopExiting(OrigLatch) && !SimplifiedLatch &&
+        IsUtilMode == false && !profitableToRotateLoopExitingLatch(L) &&
+        !canRotateDeoptimizingLatchExit(L) && !ProfitabilityCheck(L, SE))
       return Rotated;
 
     // Check size of original header and reject loop if it is very big or we can't
@@ -1053,13 +1056,14 @@ bool LoopRotate::processLoop(Loop *L) {
 
 
 /// The utility to convert a loop into a loop with bottom test.
-bool llvm::LoopRotation(Loop *L, LoopInfo *LI, const TargetTransformInfo *TTI,
-                        AssumptionCache *AC, DominatorTree *DT,
-                        ScalarEvolution *SE, MemorySSAUpdater *MSSAU,
-                        const SimplifyQuery &SQ, bool RotationOnly = true,
-                        unsigned Threshold = unsigned(-1),
-                        bool IsUtilMode = true, bool PrepareForLTO) {
+bool llvm::LoopRotation(
+    Loop *L, LoopInfo *LI, const TargetTransformInfo *TTI, AssumptionCache *AC,
+    DominatorTree *DT, ScalarEvolution *SE, MemorySSAUpdater *MSSAU,
+    const SimplifyQuery &SQ, bool RotationOnly = true,
+    unsigned Threshold = unsigned(-1), bool IsUtilMode = true,
+    bool PrepareForLTO,
+    function_ref<bool(Loop *, ScalarEvolution *)> ProfitabilityCheck) {
   LoopRotate LR(Threshold, LI, TTI, AC, DT, SE, MSSAU, SQ, RotationOnly,
-                IsUtilMode, PrepareForLTO);
+                IsUtilMode, PrepareForLTO, ProfitabilityCheck);
   return LR.processLoop(L);
 }
diff --git a/llvm/lib/Transforms/Utils/LoopUnroll.cpp b/llvm/lib/Transforms/Utils/LoopUnroll.cpp
index 86b268de43cf6..74662092a014f 100644
--- a/llvm/lib/Transforms/Utils/LoopUnroll.cpp
+++ b/llvm/lib/Transforms/Utils/LoopUnroll.cpp
@@ -486,12 +486,7 @@ llvm::UnrollLoop(Loop *L, UnrollLoopOptions ULO, LoopInfo *LI,
 
   // All these values should be taken only after peeling because they might have
   // changed.
-  BasicBlock *Preheader = L->getLoopPreheader();
-  BasicBlock *Header = L->getHeader();
   BasicBlock *LatchBlock = L->getLoopLatch();
-  SmallVector<BasicBlock *, 4> ExitBlocks;
-  L->getExitBlocks(ExitBlocks);
-  std::vector<BasicBlock *> OriginalLoopBlocks = L->getBlocks();
 
   const unsigned MaxTripCount = SE->getSmallConstantMaxTripCount(L);
   const bool MaxOrZero = SE->isBackedgeTakenCountMaxOrZero(L);
@@ -504,42 +499,6 @@ llvm::UnrollLoop(Loop *L, UnrollLoopOptions ULO, LoopInfo *LI,
   if (MaxTripCount && ULO.Count > MaxTripCount)
     ULO.Count = MaxTripCount;
 
-  struct ExitInfo {
-    unsigned TripCount;
-    unsigned TripMultiple;
-    unsigned BreakoutTrip;
-    bool ExitOnTrue;
-    BasicBlock *FirstExitingBlock = nullptr;
-    SmallVector<BasicBlock *> ExitingBlocks;
-  };
-  DenseMap<BasicBlock *, ExitInfo> ExitInfos;
-  SmallVector<BasicBlock *, 4> ExitingBlocks;
-  L->getExitingBlocks(ExitingBlocks);
-  for (auto *ExitingBlock : ExitingBlocks) {
-    // The folding code is not prepared to deal with non-branch instructions
-    // right now.
-    auto *BI = dyn_cast<BranchInst>(ExitingBlock->getTerminator());
-    if (!BI)
-      continue;
-
-    ExitInfo &Info = ExitInfos[ExitingBlock];
-    Info.TripCount = SE->getSmallConstantTripCount(L, ExitingBlock);
-    Info.TripMultiple = SE->getSmallConstantTripMultiple(L, ExitingBlock);
-    if (Info.TripCount != 0) {
-      Info.BreakoutTrip = Info.TripCount % ULO.Count;
-      Info.TripMultiple = 0;
-    } else {
-      Info.BreakoutTrip = Info.TripMultiple =
-          (unsigned)std::gcd(ULO.Count, Info.TripMultiple);
-    }
-    Info.ExitOnTrue = !L->contains(BI->getSuccessor(0));
-    Info.ExitingBlocks.push_back(ExitingBlock);
-    LLVM_DEBUG(dbgs() << "  Exiting block %" << ExitingBlock->getName()
-                      << ": TripCount=" << Info.TripCount
-                      << ", TripMultiple=" << Info.TripMultiple
-                      << ", BreakoutTrip=" << Info.BreakoutTrip << "\n");
-  }
-
   // Are we eliminating the loop control altogether?  Note that we can know
   // we're eliminating the backedge without knowing exactly which iteration
   // of the unrolled body exits.
@@ -552,17 +511,6 @@ llvm::UnrollLoop(Loop *L, UnrollLoopOptions ULO, LoopInfo *LI,
   if (CompletelyUnroll)
     ULO.Runtime = false;
 
-  // Go through all exits of L and see if there are any phi-nodes there. We just
-  // conservatively assume that they're inserted to preserve LCSSA form, which
-  // means that complete unrolling might break this form. We need to either fix
-  // it in-place after the transformation, or entirely rebuild LCSSA. TODO: For
-  // now we just recompute LCSSA for the outer loop, but it should be possible
-  // to fix it in-place.
-  bool NeedToFixLCSSA =
-      PreserveLCSSA && CompletelyUnroll &&
-      any_of(ExitBlocks,
-             [](const BasicBlock *BB) { return isa<PHINode>(BB->begin()); });
-
   // The current loop unroll pass can unroll loops that have
   // (1) single latch; and
   // (2a) latch is unconditional; or
@@ -587,21 +535,87 @@ llvm::UnrollLoop(Loop *L, UnrollLoopOptions ULO, LoopInfo *LI,
       UnrollRuntimeEpilog.getNumOccurrences() ? UnrollRuntimeEpilog
                                               : isEpilogProfitable(L);
 
+  LoopReminderUnrollResult UnrollReminderResult =
+      LoopReminderUnrollResult::Unmodified;
+  if (ULO.Runtime) {
+    UnrollReminderResult = UnrollRuntimeLoopRemainder(
+        L, ULO.Count, ULO.AllowExpensiveTripCount, EpilogProfitability,
+        ULO.UnrollRemainder, ULO.ForgetAllSCEV, LI, SE, DT, AC, TTI,
+        PreserveLCSSA, ULO.SCEVExpansionBudget, ULO.RuntimeUnrollMultiExit,
+        RemainderLoop);
+    LatchBlock = L->getLoopLatch();
+    LatchIsExiting = L->isLoopExiting(LatchBlock);
+  }
+
   if (ULO.Runtime &&
-      !UnrollRuntimeLoopRemainder(L, ULO.Count, ULO.AllowExpensiveTripCount,
-                                  EpilogProfitability, ULO.UnrollRemainder,
-                                  ULO.ForgetAllSCEV, LI, SE, DT, AC, TTI,
-                                  PreserveLCSSA, ULO.SCEVExpansionBudget,
-                                  ULO.RuntimeUnrollMultiExit, RemainderLoop)) {
+      UnrollReminderResult != LoopReminderUnrollResult::Unrolled) {
     if (ULO.Force)
       ULO.Runtime = false;
     else {
       LLVM_DEBUG(dbgs() << "Won't unroll; remainder loop could not be "
                            "generated when assuming runtime trip count\n");
-      return LoopUnrollResult::Unmodified;
+      // Loop might have been rotated inside of UnrollRuntimeLoopRemainder and
+      // this needs to be propagated.
+      return UnrollReminderResult == LoopReminderUnrollResult::Rotated
+                 ? LoopUnrollResult::Modified
+                 : LoopUnrollResult::Unmodified;
+      ;
     }
   }
 
+  BasicBlock *Preheader = L->getLoopPreheader();
+  BasicBlock *Header = L->getHeader();
+  SmallVector<BasicBlock *, 4> ExitBlocks;
+  L->getExitBlocks(ExitBlocks);
+  std::vector<BasicBlock *> OriginalLoopBlocks = L->getBlocks();
+
+  // Go through all exits of L and see if there are any phi-nodes there. We just
+  // conservatively assume that they're inserted to preserve LCSSA form, which
+  // means that complete unrolling might break this form. We need to either fix
+  // it in-place after the transformation, or entirely rebuild LCSSA. TODO: For
+  // now we just recompute LCSSA for the outer loop, but it should be possible
+  // to fix it in-place.
+  bool NeedToFixLCSSA =
+      PreserveLCSSA && CompletelyUnroll &&
+      any_of(ExitBlocks,
+             [](const BasicBlock *BB) { return isa<PHINode>(BB->begin()); });
+
+  struct ExitInfo {
+    unsigned TripCount;
+    unsigned TripMultiple;
+    unsigned BreakoutTrip;
+    bool ExitOnTrue;
+    BasicBlock *FirstExitingBlock = nullptr;
+    SmallVector<BasicBlock *> ExitingBlocks;
+  };
+  DenseMap<BasicBlock *, ExitInfo> ExitInfos;
+  SmallVector<BasicBlock *, 4> ExitingBlocks;
+  L->getExitingBlocks(ExitingBlocks);
+  for (auto *ExitingBlock : ExitingBlocks) {
+    // The folding code is not prepared to deal with non-branch instructions
+    // right now.
+    auto *BI = dyn_cast<BranchInst>(ExitingBlock->getTerminator());
+    if (!BI)
+      continue;
+
+    ExitInfo &Info = ExitInfos[ExitingBlock];
+    Info.TripCount = SE->getSmallConstantTripCount(L, ExitingBlock);
+    Info.TripMultiple = SE->getSmallConstantTripMultiple(L, ExitingBlock);
+    if (Info.TripCount != 0) {
+      Info.BreakoutTrip = Info.TripCount % ULO.Count;
+      Info.TripMultiple = 0;
+    } else {
+      Info.BreakoutTrip = Info.TripMultiple =
+          (unsigned)std::gcd(ULO.Count, Info.TripMultiple);
+    }
+    Info.ExitOnTrue = !L->contains(BI->getSuccessor(0));
+    Info.ExitingBlocks.push_back(ExitingBlock);
+    LLVM_DEBUG(dbgs() << "  Exiting block %" << ExitingBlock->getName()
+                      << ": TripCount=" << Info.TripCount
+                      << ", TripMultiple=" << Info.TripMultiple
+                      << ", BreakoutTrip=" << Info.BreakoutTrip << "\n");
+  }
+
   using namespace ore;
   // Report the unrolling decision.
   if (CompletelyUnroll) {
diff --git a/llvm/lib/Transforms/Utils/LoopUnrollAndJam.cpp b/llvm/lib/Transforms/Utils/LoopUnrollAndJam.cpp
index ca90bb65f5708..43189ddea4603 100644
--- a/llvm/lib/Transforms/Utils/LoopUnrollAndJam.cpp
+++ b/llvm/lib/Transforms/Utils/LoopUnrollAndJam.cpp
@@ -239,14 +239,16 @@ llvm::UnrollAndJamLoop(Loop *L, unsigned Count, unsigned TripCount,
 
   // We use the runtime remainder in cases where we don't know trip multiple
   if (TripMultiple % Count != 0) {
-    if (!UnrollRuntimeLoopRemainder(L, Count, /*AllowExpensiveTripCount*/ false,
-                                    /*UseEpilogRemainder*/ true,
-                                    UnrollRemainder, /*ForgetAllSCEV*/ false,
-                                    LI, SE, DT, AC, TTI, true,
-                                    SCEVCheapExpansionBudget, EpilogueLoop)) {
+    auto UnrollReminderResult = UnrollRuntimeLoopRemainder(
+        L, Count, /*AllowExpensiveTripCount*/ false,
+        /*UseEpilogRemainder*/ true, UnrollRemainder, /*ForgetAllSCEV*/ false,
+        LI, SE, DT, AC, TTI, true, SCEVCheapExpansionBudget, EpilogueLoop);
+    if (UnrollReminderResult != LoopReminderUnrollResult::Unrolled) {
       LLVM_DEBUG(dbgs() << "Won't unroll-and-jam; remainder loop could not be "
                            "generated when assuming runtime trip count\n");
-      return LoopUnrollResult::Unmodified;
+      return UnrollReminderResult == LoopReminderUnrollResult::Rotated
+                 ? LoopUnrollResult::Modified
+                 : LoopUnrollResult::Unmodified;
     }
   }
 
diff --git a/llvm/lib/Transforms/Utils/LoopUnrollRuntime.cpp b/llvm/lib/Transforms/Utils/LoopUnrollRuntime.cpp
index bf882d7406853..142f97b5c3a1a 100644
--- a/llvm/lib/Transforms/Utils/LoopUnrollRuntime.cpp
+++ b/llvm/lib/Transforms/Utils/LoopUnrollRuntime.cpp
@@ -37,6 +37,7 @@
 #include "llvm/Transforms/Utils/BasicBlockUtils.h"
 #include "llvm/Transforms/Utils/Cloning.h"
 #include "llvm/Transforms/Utils/Local.h"
+#include "llvm/Transforms/Utils/LoopRotationUtils.h"
 #include "llvm/Transforms/Utils/LoopUtils.h"
 #include "llvm/Transforms/Utils/ScalarEvolutionExpander.h"
 #include "llvm/Transforms/Utils/UnrollLoop.h"
@@ -574,7 +575,7 @@ static Value *CreateTripRemainder(IRBuilder<> &B, Value *BECount,
 ///        if (extraiters != 0) jump Epil: // Omitted if unroll factor is 2.
 /// EpilExit:
 
-bool llvm::UnrollRuntimeLoopRemainder(
+LoopReminderUnrollResult llvm::UnrollRuntimeLoopRemainder(
     Loop *L, unsigned Count, bool AllowExpensiveTripCount,
     bool UseEpilogRemainder, bool UnrollRemainder, bool ForgetAllSCEV,
     LoopInfo *LI, ScalarEvolution *SE, DominatorTree *DT, AssumptionCache *AC,
@@ -586,10 +587,31 @@ bool llvm::UnrollRuntimeLoopRemainder(
   LLVM_DEBUG(UseEpilogRemainder ? dbgs() << "Using epilog remainder.\n"
                                 : dbgs() << "Using prolog remainder.\n");
 
+  LoopReminderUnrollResult Result = LoopReminderUnrollResult::Unmodified;
+
+  // Rotate loop if it makes the exit count from the latch computable.
+  BasicBlock *OrigHeader = L->getHeader();
+  BranchInst *BI = dyn_cast<BranchInst>(OrigHeader->getTerminator());
+  if (BI && !BI->isUnconditional() &&
+      isa<SCEVCouldNotCompute>(SE->getExitCount(L, L->getLoopLatch())) &&
+      !isa<SCEVCouldNotCompute>(SE->getExitCount(L, OrigHeader))) {
+    LLVM_DEBUG(
+        dbgs() << "  Rotating loop to make the exit count computable.\n");
+    SimplifyQuery SQ{OrigHeader->getDataLayout()};
+    SQ.TLI = nullptr;
+    SQ.DT = DT;
+    SQ.AC = AC;
+    if (llvm::LoopRotation(L, LI, TTI, AC, DT, SE, nullptr /*MemorySSAUpdater*/,
+                           SQ, false /*RotationOnly*/, 16 /*Threshold*/,
+                           false /*IsUtilMode*/, false /*PrepareForLTO*/,
+                           [](Loop *, ScalarEvolution *) { return true; }))
+      Result = LoopReminderUnrollResult::Rotated;
+  }
+
   // Make sure the loop is in canonical form.
   if (!L->isLoopSimplifyForm()) {
     LLVM_DEBUG(dbgs() << "Not in simplify form!\n");
-    return false;
+    return Result;
   }
 
   // Guaranteed by LoopSimplifyForm.
@@ -603,7 +625,7 @@ bool llvm::UnrollRuntimeLoopRemainder(
     LLVM_DEBUG(
         dbgs()
         << "Loop latch not terminated by a conditional branch.\n");
-    return false;
+    return Result;
   }
 
   unsigned ExitIndex = LatchBR->getSuccessor(0) == Header ? 1 : 0;
@@ -615,7 +637,7 @@ bool llvm::UnrollRuntimeLoopRemainder(
     LLVM_DEBUG(
         dbgs()
         << "One of the loop latch successors must be the exit block.\n");
-    return false;
+    return Result;
   }
 
   // These are exit blocks other than the target of the latch exiting block.
@@ -627,12 +649,12 @@ bool llvm::UnrollRuntimeLoopRemainder(
     // We rely on LCSSA form being preserved when the exit blocks are transformed.
     // (Note that only an off-by-default mode of the old PM disables PreserveLCCA.)
     if (!PreserveLCSSA)
-      return false;
+      return Result;
 
     // Priority goes to UnrollRuntimeMultiExit if it's supplied.
     if (UnrollRuntimeMultiExit.getNumOccurrences()) {
       if (!UnrollRuntimeMultiExit)
-        return false;
+        return Result;
     } else {
       // Otherwis...
[truncated]
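
The two mechanisms the diff above introduces can be sketched in standalone C++ as follows. This is a simplified model under stated assumptions, not the LLVM code: the enums are local copies of the ones in the diff, `resultWhenRemainderFails` and `skipsRotation` are hypothetical helper names, and every LLVM query is replaced by a plain `bool`. The first helper mirrors how a `Rotated` remainder result is reported to the caller as `Modified` instead of `Unmodified`; the second mirrors the early-exit condition in `LoopRotate::rotateLoop`, where a `ProfitabilityCheck` returning true prevents the bail-out.

```cpp
#include <cassert>
#include <functional>

// Local copies of the enums from the diff (no LLVM headers are used here).
enum class LoopUnrollResult { Unmodified, Modified, PartiallyUnrolled, FullyUnrolled };
enum class LoopReminderUnrollResult { Unmodified, Rotated, Unrolled };

// Hypothetical helper: the value a caller returns when remainder
// generation did not end in Unrolled. A rotation still changed the IR,
// so it must not be reported as Unmodified.
LoopUnrollResult resultWhenRemainderFails(LoopReminderUnrollResult R) {
  assert(R != LoopReminderUnrollResult::Unrolled && "failure path only");
  return R == LoopReminderUnrollResult::Rotated ? LoopUnrollResult::Modified
                                                : LoopUnrollResult::Unmodified;
}

// Hypothetical model of the bail-out condition in rotateLoop: rotation is
// skipped only if every reason to rotate is absent, including the new
// caller-supplied ProfitabilityCheck.
bool skipsRotation(bool LatchIsExiting, bool SimplifiedLatch, bool IsUtilMode,
                   bool GenerallyProfitable, bool DeoptLatchRotatable,
                   const std::function<bool()> &ProfitabilityCheck) {
  return LatchIsExiting && !SimplifiedLatch && !IsUtilMode &&
         !GenerallyProfitable && !DeoptLatchRotatable && !ProfitabilityCheck();
}
```

With the default check from the header (always false) the condition behaves as before the patch; the always-true lambda passed from `UnrollRuntimeLoopRemainder` forces rotation past this particular check.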

@annamthomas
Contributor

Some comments inline.

Thinking more about this, how about a slightly different approach (now that we can correctly perform rotation in UnrollRuntimeLoopRemainder):

  • Pass the fact that we need rotation from the callee (UnrollRuntimeLoopRemainder) to the caller.
  • Let the caller decide whether it wants to do the rotation; if so, it rotates and then calls UnrollRuntimeLoopRemainder once more.

This has 2 benefits:

  • It reduces the chances of "rotation without unrolling" even further. We have passed all the legality and performance considerations for runtime unrolling, except for this rotation need. You can do what @fhahn previously suggested: assume we rotated and do the remaining checks on the header's exit count, i.e. as if the loop was rotated.
  • The caller can decide what to do with this (for example, we don't want unroll-and-jam to do anything with this). Similarly, downstream users of unrolling won't be affected by the "unexpected" rotation.
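
The control flow suggested above can be sketched roughly as follows. This is a toy model, not LLVM code: `LoopModel`, `RemainderStatus`, `tryRuntimeUnroll`, `rotate`, and `unrollWithCallerPolicy` are all hypothetical names, and the loop is reduced to two flags saying whether the exit count is computable from the latch or from the header. It also assumes, as a simplification, that rotation makes the former header exit the latch exit.

```cpp
#include <cassert>

// Hypothetical three-way outcome reported by the callee.
enum class RemainderStatus { Unrolled, Failed, NeedsRotation };

// Hypothetical stand-in for a loop; only the properties relevant here.
struct LoopModel {
  bool LatchExitComputable;
  bool HeaderExitComputable;
  bool Rotated;
};

// Callee: never rotates by itself, only reports that rotation would help.
RemainderStatus tryRuntimeUnroll(const LoopModel &L) {
  if (L.LatchExitComputable)
    return RemainderStatus::Unrolled;
  if (L.HeaderExitComputable)
    return RemainderStatus::NeedsRotation;
  return RemainderStatus::Failed;
}

// Simplified rotation: the header's exit test becomes the latch's.
void rotate(LoopModel &L) {
  L.LatchExitComputable = L.HeaderExitComputable;
  L.Rotated = true;
}

// Caller: decides whether rotation is acceptable, then retries once.
bool unrollWithCallerPolicy(LoopModel &L, bool AllowRotation) {
  RemainderStatus S = tryRuntimeUnroll(L);
  if (S == RemainderStatus::NeedsRotation && AllowRotation) {
    rotate(L);
    S = tryRuntimeUnroll(L);
  }
  return S == RemainderStatus::Unrolled;
}
```

A caller such as unroll-and-jam would pass `AllowRotation = false` in this model and therefore never see the loop rotated behind its back.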

@annamthomas annamthomas requested review from fhahn and annamthomas July 14, 2025 16:48
@mark-sed
Author

@annamthomas Good suggestions. I have now added an argument to UnrollRuntimeLoopRemainder that allows it to rotate the loop, and disabled it for UnrollAndJam. Next I'll add the other part: detecting whether unrolling worked without rotation and, if it did not, rotating the loop.

@mark-sed
Author

@annamthomas While rewriting the code I realized that it makes more sense to perform the rotation only after the checks in UnrollRuntimeLoopRemainder fail: LoopUnroll then does the rotation and calls UnrollRuntimeLoopRemainder again.

This simplifies UnrollRuntimeLoopRemainder (there is no longer any need to propagate the rotation into LoopUnroll) and should also lower the chances of rotation without unrolling, as @fhahn mentioned in the other PR.
