
[analyzer] Enforce not making overly complicated symbols #144327


Conversation

balazs-benics-sonarsource
Contributor

Out of the worst 500 entry points, 45 were improved by at least 10%. Out of these 45, 5 were improved by more than 50%. Out of these 45, 2 were improved by more than 80%.

For example, for the
DelugeFirmware/src/OSLikeStuff/fault_handler/fault_handler.c TU:

  • printPointers entry point was improved from 31.1 seconds to 1.1 seconds (28x).
  • handle_cpu_fault entry point was improved from 15.5 seconds to 3 seconds (5x).

We had 3'037'085 entry points in total in the test pool. Out of these, 390'156 were measured to run for over a second.


CPP-6182

@llvmbot added the clang label (Clang issues not falling into any other category) on Jun 16, 2025
@llvmbot
Member

llvmbot commented Jun 16, 2025

@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-static-analyzer-1

Author: Balázs Benics (balazs-benics-sonarsource)

Changes


Patch is 40.01 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/144327.diff

12 Files Affected:

  • (modified) clang/include/clang/StaticAnalyzer/Core/PathSensitive/SValBuilder.h (+12-12)
  • (modified) clang/include/clang/StaticAnalyzer/Core/PathSensitive/SymExpr.h (+13-12)
  • (modified) clang/include/clang/StaticAnalyzer/Core/PathSensitive/SymbolManager.h (+108-58)
  • (modified) clang/lib/StaticAnalyzer/Checkers/Taint.cpp (+1-1)
  • (modified) clang/lib/StaticAnalyzer/Checkers/TrustNonnullChecker.cpp (+1-1)
  • (modified) clang/lib/StaticAnalyzer/Core/ExprEngineC.cpp (+1-1)
  • (modified) clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp (+26-12)
  • (modified) clang/lib/StaticAnalyzer/Core/RangedConstraintManager.cpp (+11-4)
  • (modified) clang/lib/StaticAnalyzer/Core/SValBuilder.cpp (+48-41)
  • (modified) clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp (+18-11)
  • (modified) clang/lib/StaticAnalyzer/Core/SymbolManager.cpp (+6)
  • (added) clang/test/Analysis/ensure-capped-symbol-complexity.cpp (+53)
diff --git a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/SValBuilder.h b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/SValBuilder.h
index 2911554de9d97..0458a6125db9a 100644
--- a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/SValBuilder.h
+++ b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/SValBuilder.h
@@ -57,6 +57,8 @@ class SValBuilder {
 protected:
   ASTContext &Context;
 
+  const AnalyzerOptions &AnOpts;
+
   /// Manager of APSInt values.
   BasicValueFactory BasicVals;
 
@@ -68,8 +70,6 @@ class SValBuilder {
 
   ProgramStateManager &StateMgr;
 
-  const AnalyzerOptions &AnOpts;
-
   /// The scalar type to use for array indices.
   const QualType ArrayIndexTy;
 
@@ -317,21 +317,21 @@ class SValBuilder {
     return nonloc::LocAsInteger(BasicVals.getPersistentSValWithData(loc, bits));
   }
 
-  nonloc::SymbolVal makeNonLoc(const SymExpr *lhs, BinaryOperator::Opcode op,
-                               APSIntPtr rhs, QualType type);
+  DefinedOrUnknownSVal makeNonLoc(const SymExpr *lhs, BinaryOperator::Opcode op,
+                                  APSIntPtr rhs, QualType type);
 
-  nonloc::SymbolVal makeNonLoc(APSIntPtr rhs, BinaryOperator::Opcode op,
-                               const SymExpr *lhs, QualType type);
+  DefinedOrUnknownSVal makeNonLoc(APSIntPtr rhs, BinaryOperator::Opcode op,
+                                  const SymExpr *lhs, QualType type);
 
-  nonloc::SymbolVal makeNonLoc(const SymExpr *lhs, BinaryOperator::Opcode op,
-                               const SymExpr *rhs, QualType type);
+  DefinedOrUnknownSVal makeNonLoc(const SymExpr *lhs, BinaryOperator::Opcode op,
+                                  const SymExpr *rhs, QualType type);
 
-  NonLoc makeNonLoc(const SymExpr *operand, UnaryOperator::Opcode op,
-                    QualType type);
+  DefinedOrUnknownSVal makeNonLoc(const SymExpr *operand,
+                                  UnaryOperator::Opcode op, QualType type);
 
   /// Create a NonLoc value for cast.
-  nonloc::SymbolVal makeNonLoc(const SymExpr *operand, QualType fromTy,
-                               QualType toTy);
+  DefinedOrUnknownSVal makeNonLoc(const SymExpr *operand, QualType fromTy,
+                                  QualType toTy);
 
   nonloc::ConcreteInt makeTruthVal(bool b, QualType type) {
     return nonloc::ConcreteInt(BasicVals.getTruthValue(b, type));
diff --git a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/SymExpr.h b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/SymExpr.h
index aca14cf813c4b..11d0a22a31c46 100644
--- a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/SymExpr.h
+++ b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/SymExpr.h
@@ -51,9 +51,11 @@ class SymExpr : public llvm::FoldingSetNode {
   /// Note, however, that it can't be used in Profile because SymbolManager
   /// needs to compute Profile before allocating SymExpr.
   const SymbolID Sym;
+  const unsigned Complexity;
 
 protected:
-  SymExpr(Kind k, SymbolID Sym) : K(k), Sym(Sym) {}
+  SymExpr(Kind k, SymbolID Sym, unsigned Complexity)
+      : K(k), Sym(Sym), Complexity(Complexity) {}
 
   static bool isValidTypeForSymbol(QualType T) {
     // FIXME: Depending on whether we choose to deprecate structural symbols,
@@ -61,8 +63,6 @@ class SymExpr : public llvm::FoldingSetNode {
     return !T.isNull() && !T->isVoidType();
   }
 
-  mutable unsigned Complexity = 0;
-
 public:
   virtual ~SymExpr() = default;
 
@@ -108,7 +108,7 @@ class SymExpr : public llvm::FoldingSetNode {
     return llvm::make_range(symbol_iterator(this), symbol_iterator());
   }
 
-  virtual unsigned computeComplexity() const = 0;
+  unsigned complexity() const { return Complexity; }
 
   /// Find the region from which this symbol originates.
   ///
@@ -136,10 +136,15 @@ using SymbolRefSmallVectorTy = SmallVector<SymbolRef, 2>;
 /// A symbol representing data which can be stored in a memory location
 /// (region).
 class SymbolData : public SymExpr {
+  friend class SymbolManager;
   void anchor() override;
 
 protected:
-  SymbolData(Kind k, SymbolID sym) : SymExpr(k, sym) { assert(classof(this)); }
+  SymbolData(Kind k, SymbolID sym) : SymExpr(k, sym, computeComplexity()) {
+    assert(classof(this));
+  }
+
+  static unsigned computeComplexity(...) { return 1; }
 
 public:
   ~SymbolData() override = default;
@@ -147,14 +152,10 @@ class SymbolData : public SymExpr {
   /// Get a string representation of the kind of the region.
   virtual StringRef getKindStr() const = 0;
 
-  unsigned computeComplexity() const override {
-    return 1;
-  };
-
   // Implement isa<T> support.
-  static inline bool classof(const SymExpr *SE) {
-    Kind k = SE->getKind();
-    return k >= BEGIN_SYMBOLS && k <= END_SYMBOLS;
+  static bool classof(const SymExpr *SE) { return classof(SE->getKind()); }
+  static constexpr bool classof(Kind K) {
+    return K >= BEGIN_SYMBOLS && K <= END_SYMBOLS;
   }
 };
 
diff --git a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/SymbolManager.h b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/SymbolManager.h
index 86774ad5043dd..5239663788fb4 100644
--- a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/SymbolManager.h
+++ b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/SymbolManager.h
@@ -18,6 +18,7 @@
 #include "clang/AST/Type.h"
 #include "clang/Analysis/AnalysisDeclContext.h"
 #include "clang/Basic/LLVM.h"
+#include "clang/StaticAnalyzer/Core/AnalyzerOptions.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/APSIntPtr.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/MemRegion.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/StoreRef.h"
@@ -72,9 +73,9 @@ class SymbolRegionValue : public SymbolData {
   QualType getType() const override;
 
   // Implement isa<T> support.
-  static bool classof(const SymExpr *SE) {
-    return SE->getKind() == SymbolRegionValueKind;
-  }
+  static constexpr Kind ClassKind = SymbolRegionValueKind;
+  static bool classof(const SymExpr *SE) { return classof(SE->getKind()); }
+  static constexpr bool classof(Kind K) { return K == ClassKind; }
 };
 
 /// A symbol representing the result of an expression in the case when we do
@@ -128,9 +129,9 @@ class SymbolConjured : public SymbolData {
   }
 
   // Implement isa<T> support.
-  static bool classof(const SymExpr *SE) {
-    return SE->getKind() == SymbolConjuredKind;
-  }
+  static constexpr Kind ClassKind = SymbolConjuredKind;
+  static bool classof(const SymExpr *SE) { return classof(SE->getKind()); }
+  static constexpr bool classof(Kind K) { return K == ClassKind; }
 };
 
 /// A symbol representing the value of a MemRegion whose parent region has
@@ -172,9 +173,11 @@ class SymbolDerived : public SymbolData {
   }
 
   // Implement isa<T> support.
-  static bool classof(const SymExpr *SE) {
-    return SE->getKind() == SymbolDerivedKind;
+  static constexpr Kind ClassKind = SymbolDerivedKind;
+  static constexpr bool classof(const SymExpr *SE) {
+    return classof(SE->getKind());
   }
+  static constexpr bool classof(Kind K) { return K == ClassKind; }
 };
 
 /// SymbolExtent - Represents the extent (size in bytes) of a bounded region.
@@ -209,9 +212,9 @@ class SymbolExtent : public SymbolData {
   }
 
   // Implement isa<T> support.
-  static bool classof(const SymExpr *SE) {
-    return SE->getKind() == SymbolExtentKind;
-  }
+  static constexpr Kind ClassKind = SymbolExtentKind;
+  static bool classof(const SymExpr *SE) { return classof(SE->getKind()); }
+  static constexpr bool classof(Kind K) { return K == SymbolExtentKind; }
 };
 
 /// SymbolMetadata - Represents path-dependent metadata about a specific region.
@@ -278,13 +281,14 @@ class SymbolMetadata : public SymbolData {
   }
 
   // Implement isa<T> support.
-  static bool classof(const SymExpr *SE) {
-    return SE->getKind() == SymbolMetadataKind;
-  }
+  static constexpr Kind ClassKind = SymbolMetadataKind;
+  static bool classof(const SymExpr *SE) { return classof(SE->getKind()); }
+  static constexpr bool classof(Kind K) { return K == ClassKind; }
 };
 
 /// Represents a cast expression.
 class SymbolCast : public SymExpr {
+  friend class SymbolManager;
   const SymExpr *Operand;
 
   /// Type of the operand.
@@ -295,20 +299,19 @@ class SymbolCast : public SymExpr {
 
   friend class SymExprAllocator;
   SymbolCast(SymbolID Sym, const SymExpr *In, QualType From, QualType To)
-      : SymExpr(SymbolCastKind, Sym), Operand(In), FromTy(From), ToTy(To) {
+      : SymExpr(SymbolCastKind, Sym, computeComplexity(In, From, To)),
+        Operand(In), FromTy(From), ToTy(To) {
     assert(In);
     assert(isValidTypeForSymbol(From));
     // FIXME: GenericTaintChecker creates symbols of void type.
     // Otherwise, 'To' should also be a valid type.
   }
 
-public:
-  unsigned computeComplexity() const override {
-    if (Complexity == 0)
-      Complexity = 1 + Operand->computeComplexity();
-    return Complexity;
+  static unsigned computeComplexity(const SymExpr *In, QualType, QualType) {
+    return In->complexity() + 1;
   }
 
+public:
   QualType getType() const override { return ToTy; }
 
   LLVM_ATTRIBUTE_RETURNS_NONNULL
@@ -329,13 +332,14 @@ class SymbolCast : public SymExpr {
   }
 
   // Implement isa<T> support.
-  static bool classof(const SymExpr *SE) {
-    return SE->getKind() == SymbolCastKind;
-  }
+  static constexpr Kind ClassKind = SymbolCastKind;
+  static bool classof(const SymExpr *SE) { return classof(SE->getKind()); }
+  static constexpr bool classof(Kind K) { return K == ClassKind; }
 };
 
 /// Represents a symbolic expression involving a unary operator.
 class UnarySymExpr : public SymExpr {
+  friend class SymbolManager;
   const SymExpr *Operand;
   UnaryOperator::Opcode Op;
   QualType T;
@@ -343,7 +347,8 @@ class UnarySymExpr : public SymExpr {
   friend class SymExprAllocator;
   UnarySymExpr(SymbolID Sym, const SymExpr *In, UnaryOperator::Opcode Op,
                QualType T)
-      : SymExpr(UnarySymExprKind, Sym), Operand(In), Op(Op), T(T) {
+      : SymExpr(UnarySymExprKind, Sym, computeComplexity(In, Op, T)),
+        Operand(In), Op(Op), T(T) {
     // Note, some unary operators are modeled as a binary operator. E.g. ++x is
     // modeled as x + 1.
     assert((Op == UO_Minus || Op == UO_Not) && "non-supported unary expression");
@@ -354,13 +359,12 @@ class UnarySymExpr : public SymExpr {
     assert(!Loc::isLocType(T) && "unary symbol should be nonloc");
   }
 
-public:
-  unsigned computeComplexity() const override {
-    if (Complexity == 0)
-      Complexity = 1 + Operand->computeComplexity();
-    return Complexity;
+  static unsigned computeComplexity(const SymExpr *In, UnaryOperator::Opcode,
+                                    QualType) {
+    return In->complexity() + 1;
   }
 
+public:
   const SymExpr *getOperand() const { return Operand; }
   UnaryOperator::Opcode getOpcode() const { return Op; }
   QualType getType() const override { return T; }
@@ -380,9 +384,9 @@ class UnarySymExpr : public SymExpr {
   }
 
   // Implement isa<T> support.
-  static bool classof(const SymExpr *SE) {
-    return SE->getKind() == UnarySymExprKind;
-  }
+  static constexpr Kind ClassKind = UnarySymExprKind;
+  static bool classof(const SymExpr *SE) { return classof(SE->getKind()); }
+  static constexpr bool classof(Kind K) { return K == ClassKind; }
 };
 
 /// Represents a symbolic expression involving a binary operator
@@ -391,8 +395,9 @@ class BinarySymExpr : public SymExpr {
   QualType T;
 
 protected:
-  BinarySymExpr(SymbolID Sym, Kind k, BinaryOperator::Opcode op, QualType t)
-      : SymExpr(k, Sym), Op(op), T(t) {
+  BinarySymExpr(SymbolID Sym, Kind k, BinaryOperator::Opcode op, QualType t,
+                unsigned Complexity)
+      : SymExpr(k, Sym, Complexity), Op(op), T(t) {
     assert(classof(this));
     // Binary expressions are results of arithmetic. Pointer arithmetic is not
     // handled by binary expressions, but it is instead handled by applying
@@ -408,14 +413,14 @@ class BinarySymExpr : public SymExpr {
   BinaryOperator::Opcode getOpcode() const { return Op; }
 
   // Implement isa<T> support.
-  static bool classof(const SymExpr *SE) {
-    Kind k = SE->getKind();
-    return k >= BEGIN_BINARYSYMEXPRS && k <= END_BINARYSYMEXPRS;
+  static bool classof(const SymExpr *SE) { return classof(SE->getKind()); }
+  static constexpr bool classof(Kind K) {
+    return K >= BEGIN_BINARYSYMEXPRS && K <= END_BINARYSYMEXPRS;
   }
 
 protected:
   static unsigned computeOperandComplexity(const SymExpr *Value) {
-    return Value->computeComplexity();
+    return Value->complexity();
   }
   static unsigned computeOperandComplexity(const llvm::APSInt &Value) {
     return 1;
@@ -430,19 +435,28 @@ class BinarySymExpr : public SymExpr {
 };
 
 /// Template implementation for all binary symbolic expressions
-template <class LHSTYPE, class RHSTYPE, SymExpr::Kind ClassKind>
+template <class LHSTYPE, class RHSTYPE, SymExpr::Kind ClassK>
 class BinarySymExprImpl : public BinarySymExpr {
+  friend class SymbolManager;
   LHSTYPE LHS;
   RHSTYPE RHS;
 
   friend class SymExprAllocator;
   BinarySymExprImpl(SymbolID Sym, LHSTYPE lhs, BinaryOperator::Opcode op,
                     RHSTYPE rhs, QualType t)
-      : BinarySymExpr(Sym, ClassKind, op, t), LHS(lhs), RHS(rhs) {
+      : BinarySymExpr(Sym, ClassKind, op, t,
+                      computeComplexity(lhs, op, rhs, t)),
+        LHS(lhs), RHS(rhs) {
     assert(getPointer(lhs));
     assert(getPointer(rhs));
   }
 
+  static unsigned computeComplexity(LHSTYPE lhs, BinaryOperator::Opcode,
+                                    RHSTYPE rhs, QualType) {
+    // FIXME: Should we add 1 to complexity?
+    return computeOperandComplexity(lhs) + computeOperandComplexity(rhs);
+  }
+
 public:
   void dumpToStream(raw_ostream &os) const override {
     dumpToStreamImpl(os, LHS);
@@ -453,13 +467,6 @@ class BinarySymExprImpl : public BinarySymExpr {
   LHSTYPE getLHS() const { return LHS; }
   RHSTYPE getRHS() const { return RHS; }
 
-  unsigned computeComplexity() const override {
-    if (Complexity == 0)
-      Complexity =
-          computeOperandComplexity(RHS) + computeOperandComplexity(LHS);
-    return Complexity;
-  }
-
   static void Profile(llvm::FoldingSetNodeID &ID, LHSTYPE lhs,
                       BinaryOperator::Opcode op, RHSTYPE rhs, QualType t) {
     ID.AddInteger((unsigned)ClassKind);
@@ -474,7 +481,9 @@ class BinarySymExprImpl : public BinarySymExpr {
   }
 
   // Implement isa<T> support.
-  static bool classof(const SymExpr *SE) { return SE->getKind() == ClassKind; }
+  static constexpr Kind ClassKind = ClassK;
+  static bool classof(const SymExpr *SE) { return classof(SE->getKind()); }
+  static constexpr bool classof(Kind K) { return K == ClassKind; }
 };
 
 /// Represents a symbolic expression like 'x' + 3.
@@ -489,6 +498,33 @@ using IntSymExpr = BinarySymExprImpl<APSIntPtr, const SymExpr *,
 using SymSymExpr = BinarySymExprImpl<const SymExpr *, const SymExpr *,
                                      SymExpr::Kind::SymSymExprKind>;
 
+struct MaybeSymExpr {
+  MaybeSymExpr() = default;
+  explicit MaybeSymExpr(SymbolRef Sym) : Sym(Sym) {}
+  bool isValid() const { return Sym; }
+  bool isInvalid() const { return !isValid(); }
+  SymbolRef operator->() const { return Sym; }
+
+  SymbolRef getOrNull() const { return Sym; }
+  template <typename SymT> const SymT *getOrNull() const {
+    return llvm::dyn_cast_if_present<SymT>(Sym);
+  }
+
+  DefinedOrUnknownSVal getOrUnknown() const {
+    if (isInvalid())
+      return UnknownVal();
+    return nonloc::SymbolVal(Sym);
+  }
+
+  nonloc::SymbolVal getOrAssert() const {
+    assert(Sym);
+    return nonloc::SymbolVal(Sym);
+  }
+
+private:
+  SymbolRef Sym = nullptr;
+};
+
 class SymExprAllocator {
   SymbolID NextSymbolID = 0;
   llvm::BumpPtrAllocator &Alloc;
@@ -518,27 +554,27 @@ class SymbolManager {
   SymExprAllocator Alloc;
   BasicValueFactory &BV;
   ASTContext &Ctx;
+  const unsigned MaxCompComplexity;
 
 public:
   SymbolManager(ASTContext &ctx, BasicValueFactory &bv,
-                llvm::BumpPtrAllocator &bpalloc)
-      : SymbolDependencies(16), Alloc(bpalloc), BV(bv), Ctx(ctx) {}
+                llvm::BumpPtrAllocator &bpalloc, const AnalyzerOptions &Opts)
+      : SymbolDependencies(16), Alloc(bpalloc), BV(bv), Ctx(ctx),
+        MaxCompComplexity(Opts.MaxSymbolComplexity) {
+    assert(MaxCompComplexity > 0 && "Zero max complexity doesn't make sense");
+  }
 
   static bool canSymbolicate(QualType T);
 
   /// Create or retrieve a SymExpr of type \p SymExprT for the given arguments.
   /// Use the arguments to check for an existing SymExpr and return it,
   /// otherwise, create a new one and keep a pointer to it to avoid duplicates.
-  template <typename SymExprT, typename... Args>
-  const SymExprT *acquire(Args &&...args);
+  template <typename SymExprT, typename... Args> auto acquire(Args &&...args);
 
   const SymbolConjured *conjureSymbol(ConstCFGElementRef Elem,
                                       const LocationContext *LCtx, QualType T,
                                       unsigned VisitCount,
-                                      const void *SymbolTag = nullptr) {
-
-    return acquire<SymbolConjured>(Elem, LCtx, T, VisitCount, SymbolTag);
-  }
+                                      const void *SymbolTag = nullptr);
 
   QualType getType(const SymExpr *SE) const {
     return SE->getType();
@@ -672,7 +708,16 @@ class SymbolVisitor {
 };
 
 template <typename T, typename... Args>
-const T *SymbolManager::acquire(Args &&...args) {
+auto SymbolManager::acquire(Args &&...args) {
+  constexpr bool IsSymbolData = SymbolData::classof(T::ClassKind);
+  if constexpr (IsSymbolData) {
+    assert(T::computeComplexity(args...) == 1);
+  } else {
+    if (T::computeComplexity(args...) > MaxCompComplexity) {
+      return MaybeSymExpr();
+    }
+  }
+
   llvm::FoldingSetNodeID profile;
   T::Profile(profile, args...);
   void *InsertPos;
@@ -681,7 +726,12 @@ const T *SymbolManager::acquire(Args &&...args) {
     SD = Alloc.make<T>(std::forward<Args>(args)...);
     DataSet.InsertNode(SD, InsertPos);
   }
-  return cast<T>(SD);
+
+  if constexpr (IsSymbolData) {
+    return cast<T>(SD);
+  } else {
+    return MaybeSymExpr(SD);
+  }
 }
 
 } // namespace ento
diff --git a/clang/lib/StaticAnalyzer/Checkers/Taint.cpp b/clang/lib/StaticAnalyzer/Checkers/Taint.cpp
index e55d064253b84..f0dc889f15e7a 100644
--- a/clang/lib/StaticAnalyzer/Checkers/Taint.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/Taint.cpp
@@ -267,7 +267,7 @@ std::vector<SymbolRef> taint::getTaintedSymbolsImpl(ProgramStateRef State,
 
   // HACK:https://discourse.llvm.org/t/rfc-make-istainted-and-complex-symbols-friends/79570
   if (const auto &Opts = State->getAnalysisManager().getAnalyzerOptions();
-      Sym->computeComplexity() > Opts.MaxTaintedSymbolComplexity) {
+      Sym->complexity() > Opts.MaxTaintedSymbolComplexity) {
     return {};
   }
 
diff --git a/clang/lib/StaticAnalyzer/Checkers/TrustNonnullChecker.cpp b/clang/lib/StaticAnalyzer/Checkers/TrustNonnullChecker.cpp
index e2f8bd541c967..ab0e3d8f56d86 100644
--- a/clang/lib/StaticAnalyzer/Checkers/TrustNonnullChecker.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/TrustNonnullChecker.cpp
@@ -66,7 +66,7 @@ class TrustNonnullChecker : public Checker<check::PostCall,
                              SVal Cond,
                              bool Assumption) const {
     const SymbolRef CondS = Cond.getAsSymbol();
-    if (!CondS || CondS->computeComplexity() > ComplexityThreshold)
+    if (!CondS || CondS->complexity() > ComplexityThreshold)
       return State;
 
     for (SymbolRef Antecedent : CondS->symbols()) {
diff --git a/clang/lib/StaticAnalyzer/Core/ExprEngineC.cpp b/clang/lib/StaticAnalyzer/Core/ExprEngineC.cpp
index fa8e669b6bb2f..3486485dcd686 100644
--- a/clang/lib/StaticAnalyzer/Core/ExprEngineC.cpp
+++ b/clang/lib/StaticAnalyzer/Core/ExprEngineC.cpp
@@ -67,7 +67,7 @@ void ExprEngine::VisitBinaryOperator(const BinaryOperator* B,
       if (RightV.isUnknown()) {
         unsigned Count = currBldrCtx->blockCount();
         RightV = svalBuilder.conjureSymbolVal(nullptr, getCFGElementRef(), LCtx,
-                                              Count);
+                                              RHS->getType(), Count);
       }
       // Simulate the effects of a "store":  bind the value of the RHS
       // to the L-Valu...
[truncated]

Comment on the ExprEngineC.cpp hunk:

-                                              Count);
+                                              RHS->getType(), Count);
Contributor Author


Interestingly, in #137355 @fangyi-zhou changed the behavior of this line, so a tiny bit of adjustment was needed to make the new test pass while I was uplifting this downstream patch to current llvm main.
I didn't investigate the case beyond determining that this was the line that conjured a symbol of the wrong type after #137355. Probably in the past we directly passed a QualType here, but after that change we rely on deducing the type from getCFGElementRef(), which is apparently wrong. To see the behavior, revert this hunk and observe the broken test. There could be more places where this type mismatch on conjure causes issues, but I didn't audit the code further.

Contributor

@NagyDonat left a comment


Unfortunately I'm not convinced that this is the right direction for improving the analyzer runtime.

On the "risks" side I think that adding the corner case that "this may also return UnknownVal in rare situations" into many functions complicates the logic, burdens the code with early return branches and I fear that it will act as a footgun.

On the "benefits" side I fear that your statistics don't prove enough:

  1. You found that "Out of the worst 500 entry points, 45 were improved by at least 10%. Out of these 45, 5 were improved by more than 50%. Out of these 45, 2 were improved by more than 80%." but this only covers 9% of the worst 500 entry points. Eyeballing the graph suggests that there are some cases where the runtime actually got worse -- so please check that the overall effect of the change is also positive (e.g. the total runtime is reduced meaningfully).
  2. Moreover, if "worst 500 entry points" means "worst 500 in the first run", then it is a biased sample: if you pick the worst outliers (i.e. the entry points where the sum of expected runtime and luck factor is largest), then you are expected to get entry points with worse than average luck (because among two similar entry points, the one with bad luck ends up in the worst 500 while the one with good luck avoids it), so if you redo the measurement, then regression toward the mean will produce better results -- even if you do both measurements with the same setup! As a sanity check, please redo the statistics on the entry points that produced the worst 500 runtimes in the second run -- I fear that on that sample (which is biased in the opposite direction) you will see that the new revision is worse than the baseline.
  3. I'm also interested in comparing the statistical results with a second independent measurement -- is the set of "worst 500 entry points" stable between runs, or are these random unlucky functions that are hit with environmental issues?

If you can share the raw data, I can help with statistical calculations.

Comment on lines 320 to 334

-  nonloc::SymbolVal makeNonLoc(const SymExpr *lhs, BinaryOperator::Opcode op,
-                               APSIntPtr rhs, QualType type);
+  DefinedOrUnknownSVal makeNonLoc(const SymExpr *lhs, BinaryOperator::Opcode op,
+                                  APSIntPtr rhs, QualType type);

-  nonloc::SymbolVal makeNonLoc(APSIntPtr rhs, BinaryOperator::Opcode op,
-                               const SymExpr *lhs, QualType type);
+  DefinedOrUnknownSVal makeNonLoc(APSIntPtr rhs, BinaryOperator::Opcode op,
+                                  const SymExpr *lhs, QualType type);

-  nonloc::SymbolVal makeNonLoc(const SymExpr *lhs, BinaryOperator::Opcode op,
-                               const SymExpr *rhs, QualType type);
+  DefinedOrUnknownSVal makeNonLoc(const SymExpr *lhs, BinaryOperator::Opcode op,
+                                  const SymExpr *rhs, QualType type);

-  NonLoc makeNonLoc(const SymExpr *operand, UnaryOperator::Opcode op,
-                    QualType type);
+  DefinedOrUnknownSVal makeNonLoc(const SymExpr *operand,
+                                  UnaryOperator::Opcode op, QualType type);

   /// Create a NonLoc value for cast.
-  nonloc::SymbolVal makeNonLoc(const SymExpr *operand, QualType fromTy,
-                               QualType toTy);
+  DefinedOrUnknownSVal makeNonLoc(const SymExpr *operand, QualType fromTy,
+                                  QualType toTy);
Contributor


I think this function must be renamed, because its current name strongly promises that it always returns a NonLoc.

Collaborator


Unknown values will be eventually converted into SymbolConjured anyway, as soon as they make it into the Environment. In this sense, spawning new Unknown values is basically never a good idea. I'm perfectly ok with working towards removing them as a concept.

So with that in mind, what do you folks think about returning a fresh atomic symbol instead of the Unknown?

Eg., make a new symbol class SymbolTooComplex (name TBD) whose only field is the pointer to the symbolic expression that would have been constructed if we didn't reach the complexity limit. But you're not supposed to access that pointer while folding arithmetic, it's only there for deduplication purposes. In other words, every time we evaluate the same operation and hit the same complexity limit, we'd get the same symbol.

This would probably avoid a lot of clumsiness in these high-level APIs (they will continue to make a NonLoc as promised), as well as improve deduplication of symbols instead of worsening it.

(FWIW makeNonLoc() isn't a fantastic name. These could be five different names.)

(The caller would still need to avoid hard-casting the value back to SymIntExpr or something like that. But why would anybody want to do that when they already have the raw parts?)

Contributor Author


So with that in mind, what do you folks think about returning a fresh atomic symbol instead of the Unknown?

I thought about this, and problem is that conjuring a fresh symbol needs quite a bit of context: CFGElement, LocationContex, VisitCount that are not present. Passing these would be really annoying and intrusive for every use of the SValBuilder. I don't think this is the right way.

Eg., make a new symbol class SymbolTooComplex (name TBD) whose only field is the pointer to the symbolic expression that would have been constructed if we didn't reach the complexity limit. But you're not supposed to access that pointer while folding arithmetic, it's only there for deduplication purposes. In other words, every time we evaluate the same operation and hit the same complexity limit, we'd get the same symbol.

This would probably avoid a lot of clumsiness in these high-level APIs (they will continue to make a NonLoc as promised), as well as improve deduplication of symbols instead of worsening it.

I didn't think about adding a new Symbol kind. I'll give it a try.

(The caller would still need to avoid hard-casting the value back to SymIntExpr or something like that. But why would anybody want to do that when they already have the raw parts?)

I don't think anybody does that. They are definitely not supposed to unconditionally cast the results in moooost cases.

Contributor Author


I think it works, see f8a9d4c

@balazs-benics-sonarsource
Contributor Author

Unfortunately I'm not convinced that this is the right direction for improving the analyzer runtime.

On the "risks" side I think that adding the corner case that "this may also return UnknownVal in rare situations" into many functions complicates the logic, burdens the code with early return branches and I fear that it will act as a footgun.

On the "benefits" side I fear that your statistics don't prove enough:

  1. You found that "Out of the worst 500 entry points, 45 were improved by at least 10%. Out of these 45, 5 were improved by more than 50%. Out of these 45, 2 were improved by more than 80%." but this only covers 9% of the worst 500 entry points. Eyeballing the graph suggests that there are some cases where the runtime actually got worse -- so please check that the overall effect of the change is also positive (e.g. the total runtime is reduced meaningfully).
  2. Moreover, if "worst 500 entry points" means "worst 500 in the first run", then it is a biased sample: if you pick the worst outliers (i.e. the entry points where the sum of expected runtime and luck factor is largest), then you are expected to get entry points with worse than average luck (because among two similar entry points, the one with bad luck ends up in the worst 500 while the one with good luck avoids it), so if you redo the measurement, then regression toward the mean will produce better results -- even if you do both measurements with the same setup! As a sanity check, please redo the statistics on the entry points that produced the worst 500 runtimes in the second run -- I fear that on that sample (which is biased in the opposite direction) you will see that the new revision is worse than the baseline.
  3. I'm also interested in comparing the statistical results with a second independent measurement -- is the set of "worst 500 entry points" stable between runs, or are these random unlucky functions that are hit with environmental issues?

If you can share the raw data, I can help with statistical calculations.
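The regression-toward-the-mean effect described above can be illustrated with a short simulation (purely hypothetical numbers: each entry point is modeled as a fixed "true" cost plus independent Gaussian measurement noise in each run; none of this is derived from the actual measurement data):

```python
import random

random.seed(42)

N = 100_000
# Hypothetical model: fixed per-entry-point cost plus independent noise per run.
true_cost = [random.expovariate(1.0) for _ in range(N)]
run1 = [c + random.gauss(0, 1.0) for c in true_cost]
run2 = [c + random.gauss(0, 1.0) for c in true_cost]  # same setup, fresh noise

# Select the worst 500 entry points *as measured in run 1*.
worst = sorted(range(N), key=lambda i: run1[i], reverse=True)[:500]

mean1 = sum(run1[i] for i in worst) / len(worst)
mean2 = sum(run2[i] for i in worst) / len(worst)

# Regression toward the mean: the very same entry points look "improved"
# in run 2, although nothing changed between the two runs.
print(f"worst-500 mean, run 1: {mean1:.2f}")
print(f"worst-500 mean, run 2: {mean2:.2f}")
assert mean2 < mean1
```

Selecting the sample by the run-1 ranking biases it toward entries that got unlucky noise in run 1, so re-measuring the same sample almost always looks like an improvement even with no code change.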

Could you paraphrase your concerns?

How I read this you have mainly 2 concerns:

  1. The use of this strong type makes the existing APIs tedious to use, because one needs to unwrap the value and frequently make an early return to explicitly handle the case when symbol creation failed?
  2. The measurements weren't conclusive. There was no evidence provided that on the usual cases (all entry points) the RT would not regress. It was also not fair to look at only the longest 500 entry points to evaluate the effectiveness of limiting the max symbol complexity (in other words, honoring the max symbol complexity limit).

[...] regression toward the mean [...]

I formulated the test case in the tests by inspecting a long-running test case. It is consistently low-performing. You can also check on godbolt that it would not finish, because the symbol simplifier would need to do so much work walking the overly complicated symbols. I've also inspected (a couple of months ago) about 10 other radically improved cases, and all showed a large number of binary op manipulations like hashing, or the test case I supply in this patch. This doesn't seem to be a coincidence to me.
From my experience, our pool is large enough to be consistent and roughly reproducible for long-ish entry points. According to our data, usually an entry point should finish in about 1 second if not less. Above that suggests something to look at.

To me, encountering symbols with complexity over the dedicated max symbol complexity threshold is a bug.
Can you think of other ways to ensure we never create overly complicated symbols?

@NagyDonat
Contributor

NagyDonat commented Jun 16, 2025

How I read this you have mainly 2 concerns:

  1. The use of this strong type makes the existing APIs tedious to use, because one needs to unwrap the value and frequently make an early return to explicitly handle the case when symbol creation failed?

Yes, this is roughly what I mean. In addition to the tediousness of writing the unwrap + early return boilerplate I also fear that these early return branches would be difficult to keep in mind and difficult to cover with tests, so they will act as a persistent source of bugs.

  1. The measurements weren't conclusive. There was no evidence provided that on the usual cases (all entry points) the RT would not regress. It was also not fair to look at only the longest 500 entry points to evaluate the effectiveness of limiting the max symbol complexity (in other words, honoring the max symbol complexity limit).

I don't claim that there was no evidence at all, but I feel that it wasn't significant enough.

[...] regression toward the mean [...]

I formulated the test case in the tests by inspecting a long-running test case. It is consistently low-performing. You can also check on godbolt that it would not finish, because the symbol simplifier would need to do so much work walking the overly complicated symbols. I've also inspected (a couple of months ago) about 10 other radically improved cases, and all showed a large number of binary op manipulations like hashing, or the test case I supply in this patch. This doesn't seem to be a coincidence to me. From my experience, our pool is large enough to be consistent and roughly reproducible for long-ish entry points.

This additional information reduces my fears that the measured runtime difference is just environmental noise. However, extreme outliers (and I'd say that top 500 out of 3'000'000 or even just 390'000 is extreme outlier) are still a very treacherous ground for statistical conclusions (they can amplify small noises to a surprising degree), so I would like to see a more comprehensive statistical analysis. If you can share the raw data (the 3 million {old runtime, new runtime} pairs), I can do these statistical calculations myself.

According to our data, usually an entry point should finish in about 1 second if not less. Above that suggests something to look at.

Fair, although I'd emphasize that they are not "must be eliminated" bugs, but just "something to look at" -- which may or may not lead to an improvement. I don't think that symbol complexity is a "low-hanging fruit" for slow entry points -- instead of this I'd suggest investigating the heuristics related to inlining and inlined function size. However this is far off-topic -- I'll try to eventually start a discourse thread about it once I clarify my suspicions.

To me, encountering symbols with complexity over the dedicated max symbol complexity threshold is a bug.

I don't agree -- very complex symbols naturally appear as we define the symbols with the straightforward recursive definition that we use, and they correspond to well-formed expressions. You can say that symbols over a certain complexity threshold are so rare that the analyzer can arbitrarily discard them, and this may (or may not) be a useful heuristic -- but still the existence of the complex symbols is the "natural" bug-free state and suppressing them is the inaccurate behavior which complicates the situation.

Can you think of other ways to ensure we never create overly complicated symbols?

I'm not convinced that we need to ensure that we never create overly complicated symbols. I feel that this is a "cure is worse than the disease" situation -- this patch introduces a significant amount of code complexity for modest gains in performance.

However, if the statistical analysis confirms that this is an useful direction, I'd suggest eliminating the complex symbols (more precisely, the states that contain them) in a lazy fashion: put a boolean tag on the state when a complex symbol is stored in it and prevent further exploration from exploded nodes that contain a tagged state. This way this performance optimization hack could be limited to the engine, and the checkers wouldn't need to know about it at all.

@balazs-benics-sonarsource
Contributor Author

How I read this you have mainly 2 concerns:

  1. The use of this strong type makes the existing APIs tedious to use, because one needs to unwrap the value and frequently make an early return to explicitly handle the case when symbol creation failed?

Yes, this is roughly what I mean.

The purpose of the strong type is to reject the code from compiling when not unwrapped, so I don't have correctness concerns, rather the opposite, I'm feeling safe about correctness. Speaking of the sprinkle of early returns, well, in C++ we don't really have monad support, so this is the best that I can think of. I'd consider other alternatives if you have any in mind. Otherwise, I proposed the best I have, so I don't know how to proceed on this point.

I'm very likely biased, but I find this code simple. Any operation that combines or otherwise "complicates" a symbol can fail if it would hit the max complexity. So, in essence, in this refactor we would just use some optional type for representing the failure.
Some pieces of code want to handle such complexity failures differently, so the differently named getters will document the intentional choices - this is why a dedicated strong-type is better than just having a std::optional.
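The pattern being described can be sketched roughly as follows (a Python analogue for illustration only; the getter names are invented, and the actual patch uses a C++ strong type so that un-unwrapped uses are rejected at compile time):

```python
class SymbolCreationResult:
    """Toy analogue of a strong result type for symbol creation: callers
    must go through a named getter, which documents how they handle the
    "too complex" failure case."""

    def __init__(self, symbol=None):
        self._symbol = symbol  # None means the complexity cap was hit

    def get_or_none(self):
        """For call sites that bail out with an explicit early return."""
        return self._symbol

    def get_or_else(self, make_fallback):
        """For call sites that substitute a fallback value (e.g. a fresh
        conjured symbol) when symbol creation failed."""
        return self._symbol if self._symbol is not None else make_fallback()


ok = SymbolCreationResult("(x + y)")
too_complex = SymbolCreationResult()

assert ok.get_or_none() == "(x + y)"
assert too_complex.get_or_none() is None
assert too_complex.get_or_else(lambda: "conj_1") == "conj_1"
```

The point of having two (or more) named getters instead of a bare optional is that each call site spells out its chosen failure-handling policy.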

In addition to the tediousness of writing the unwrap + early return boilerplate I also fear that these early return branches would be difficult to keep in mind and difficult to cover with tests, so they will act as a persistent source of bugs.

I find it unnecessary to cover all possible combinations of when we can reach max symbol complexity. As you commented, there are many many early-return scenarios, but they also share a common structure, allowing us to structurally reason about them.

3. The measurements weren't conclusive. There was no evidence provided that on the usual cases (all entry points) the RT would not regress. It was also not fair to look at only the longest 500 entry points to evaluate the effectiveness of limiting the max symbol complexity (in other words, honoring the max symbol complexity limit).

I don't claim that there was no evidence at all, but I feel that it wasn't significant enough.

If I recall, there was no measurable difference at all. This was expected, as this patch would not change the common case, when the symbols are not getting complicated. Remember, sometimes the max complexity threshold was already obeyed (hence we had that threshold), but in other edge cases it was not, as in the case in the test. Judging the patch, I also don't see any particular reason why I should be more alert for RT regression, if we only avoid work by honoring this threshold - other than the couple of if statements for the early return cases.

This additional information reduces my fears that the measured runtime difference is just environmental noise. However, extreme outliers (and I'd say that top 500 out of 3'000'000 or even just 390'000 is extreme outlier) are still a very treacherous ground for statistical conclusions (they can amplify small noises to a surprising degree), so I would like to see a more comprehensive statistical analysis. If you can share the raw data (the 3 million {old runtime, new runtime} pairs), I can do these statistical calculations myself.

I should have explained why I picked the top 500, indeed. I had a look at the distribution, and there was a significant increase in the trend. -- I managed to find the script generating the plot along with the data used, but the data contains sensitive projects intermixed.
From what I can see there I picked the top 500 because if we exclude those, we end up with 14 seconds of analysis time max per entry point - according to my notes.

According to our data, usually an entry point should finish in about 1 second if not less. Above that suggests something to look at.

Fair, although I'd emphasize that they are not "must be eliminated" bugs, but just "something to look at" -- which may or may not lead to an improvement. I don't think that symbol complexity is a "low-hanging fruit" for slow entry points -- instead of this I'd suggest investigating the heuristics related to inlining and inlined function size. However this is far off-topic -- I'll try to eventually start a discourse thread about it once I clarify my suspicions.

A predictable upper bound is really important to scale the analysis and to bound the total analysis time of a TU.
This is just a pathological case when the analyzer behaves exceptionally poorly. From the investigated cases, I can share that the slowest entry points are caused by doing lookups in the environment and spend significant time in remove dead. Refer to the unsuccessful attempts at optimizing that in the relevant blog post.
However, the second reason why we had slow entry points was this one, although a lot less often impacting the entry points. You can see that not a lot of the slowest entry points are improved by a lot, quote:

Out of the worst 500 entry points, 45 were improved by at least 10%. Out of these 45, 5 were improved by more than 50%. Out of these 45, 2 were improved by more than 80%.

So, maybe a better way to look at this patch is by emphasizing less the RT improvements of the pathological few cases, but rather just enforcing an invariant about max symbol complexity. This patch is not a "low-hanging fruit" for improving the averages or the 99% of the cases. This is about the 45 cases out of 3'000'000, and the unlucky users who happen to have similar code to those 45 cases.

I agree that more intrusive changes in the heuristics offer much more room for potential improvement, but also at an increased perturbation in the results. I'm genuinely surprised that a patch like this that did not change any reports (beyond the usual noise levels), nor the average or the 99th percentile RT, brings an enforcement of an invariant and on top of all this also fixes a tiny fraction of pathological cases.

In hindsight, maybe my tactic should have been to emphasize the invariant aspects more than the RT improvements.

To me, encountering symbols with complexity over the dedicated max symbol complexity threshold is a bug.

I don't agree -- very complex symbols naturally appear as we define the symbols with the straightforward recursive definition that we use, and they correspond to well-formed expressions. You can say that symbols over a certain complexity threshold are so rare that the analyzer can arbitrarily discard them, and this may (or may not) be a useful heuristic -- but still the existence of the complex symbols is the "natural" bug-free state and suppressing them is the inaccurate behavior which complicates the situation.

Let me share what happens when the threshold is hit. I think this may have caused the misunderstanding. Having UnknownVal poisons a computation, and this patch may suggest at first glance that we issue UnknownVal a lot more often.

However, if you look carefully at the test provided, you can see that there is no Unknown. It uses a fresh conjured symbol as a substitute for the Unknown that the SValBuilder would create. If you look closer, you can see that the rest of the computation goes on and on, and each time the max complexity is reached, the engine substitutes the result with a new conjured symbol - keeping the effective complexity small, without compromising the expressiveness of the history of the symbol. One way to look at this is by judging how likely it is that we could reuse some information from the complicated SymExpr to conclude some range facts (when simplifying). The argument is that it's very unlikely, and even harder to explain later how we reached that conclusion for a symbol range. The engine does these Unknown-to-conjured substitutions at different places (I can't recall where exactly), but one instance is when trying to bind Unknown to a variable: it will just bind a fresh conjured symbol instead - to recover future precision, without poisoning subsequent operations. So in short, if we return Unknown here, that actually always materializes a conjured symbol. As I'm writing this, I strongly believe that this was the source of confusion and pushback against this patch.

Can you think of other ways to ensure we never create overly complicated symbols?

I'm not convinced that we need to ensure that we never create overly complicated symbols. I feel that this is a "cure is worse than the disease" situation -- this patch introduces a significant amount of code complexity for modest gains in performance.

By reading this, I have the impression you argue against having such a threshold. While I can't say why that threshold was introduced, I argue that whoever introduced it had a motivation. Right now, we have this threshold, but sometimes we still have overly complicated symbols. This is objectively not a good place to be in. If we didn't have this threshold, indeed, this code would make little sense.
However, assuming that we need this threshold, this code is necessary to obey this threshold in a systematic way.
If you check out the code and run the hugelyOverComplicatedSymbol() test, it runs for 14 seconds on my system and produces a symbol with complexity 800. The problem there is not that it's large, but rather that it's unbounded, for no particularly good reason or prospect of future use.

However, if the statistical analysis confirms that this is an useful direction, I'd suggest eliminating the complex symbols (more precisely, the states that contain them) in a lazy fashion: put a boolean tag on the state when a complex symbol is stored in it and prevent further exploration from exploded nodes that contain a tagged state.

I'll possibly come back to this point, but not part of this response.

This way this performance optimization hack could be limited to the engine, and the checkers wouldn't need to know about it at all.

Checkers wouldn't need to know about this at all. They should know that sometimes an evalBinOp returns Unknown. This was previously the contract there, and would remain the case. The contract of the makeNonLoc APIs was relaxed as part of this patch, but that API is not used by checkers, nor should it be.

@NagyDonat
Contributor

NagyDonat commented Jun 17, 2025

SUMMARY:

My primary concern is the readability of the refactored code -- for details see the inline answers. (My secondary concern is that I don't entirely trust the statistics, but even if I take your conclusions at face value, I don't see why they justify the added complexity.)

I didn't even think about the issue that returning Unknown "poisons the computation", so it wasn't a factor in my pushback.

You had good points in several areas (questions about correctness and testing, no overall RT regression, no impact on checkers).


FULL REPLY:

How I read this you have mainly 2 concerns:

  1. The use of this strong type makes the existing APIs tedious to use, because one needs to unwrap the value and frequently make an early return to explicitly handle the case when symbol creation failed?

Yes, this is roughly what I mean.

The purpose of the strong type is to reject the code from compiling when not unwrapped, so I don't have correctness concerns, rather the opposite, I'm feeling safe about correctness.

I did a more thorough re-read of the patch and it reduced my worries about correctness issues.

Speaking of the sprinkle of early returns, well, in C++ we don't really have monad support, so this is the best that I can think of. I'd consider other alternatives if you have any in mind. Otherwise, I proposed the best I have, so I don't know how to proceed on this point.

Unfortunately, my re-read solidified my impression that the additional verbosity of the wrap-unwrap methods and early return branches severely hinders the readability of the refactored methods – which are already very complicated without these additional details. (The main issue is complexity, not the raw number of tokens -- so the situation wouldn't be much better in a language like Haskell that properly supports monads.)

Unfortunately I cannot suggest a more readable implementation, but I still feel that doing nothing would be better than merging this commit, because the limited benefits (improvements in 0.0015% of the entry points, no observable change of overall runtime) do not justify the added code complexity. (If this commit was the only possible fix for a commonly occurring crash, I'd accept it.)

I'm very likely biased, but I find this code simple.

You are biased :) -- which is natural as the author of the source code.

Any operation that combines or otherwise "complicates" a symbol, can fail if it would hit the max complexity. So, in essence, in this refactor we would just use some optional type for representing the failure.

Your implementation is a relatively simple natural solution for the goal that symbol complexity should be limited and it is reasonably readable when presented as a diff. However, you are injecting this new goal (a single "thread") into complex methods that already fulfill many other goals ("big Gordian knot of several other threads") and the difficulty of reading code increases exponentially with the number of intermixed goals (even if each of them is simple in isolation).

As I am reading the source code, I need to resolve each name that appears (jump to definition – "oh, this is just a wrapper for that") and mentally separate all the different goals ("these early returns handle nullptrs", "here we check whether swapping the operands help", "this handles an obscure corner case for ObjC", "this limits symbol complexity"), and adding yet another goal significantly complicates this, even if that goal is simple in isolation / when it is highlighted for review as a diff. As return statements interrupt the control flow, they are especially taxing for the reader – to comprehend the behavior of a function, all return statements must be simultaneously juggled in the mind.

Some pieces of code want to handle such complexity failures differently, so the differently named getters will document the intentional choices - this is why a dedicated strong-type is better than just having a std::optional.

I agree that using this strong type is (at least slightly) better than std::optional – even if std::optional has the big advantage that most readers have its definition "cached in their mind", while your custom strong type will need a few "jump to definition" calls for any code that uses it.

In addition to the tediousness of writing the unwrap + early return boilerplate I also fear that these early return branches would be difficult to keep in mind and difficult to cover with tests, so they will act as a persistent source of bugs.

I find it unnecessary to cover all possible combinations of when we can reach max symbol complexity. As you commented, there are many many early-return scenarios, but they also share a common structure, allowing us to structurally reason about them.

Ok, perhaps you're right. I don't see why the test coverage is sufficient, but I believe you if you say so.

  1. The measurements weren't conclusive. There was no evidence provided that on the usual cases (all entry points) the RT would not regress. It was also not fair to look at only the longest 500 entry points to evaluate the effectiveness of limiting the max symbol complexity (in other words, honoring the max symbol complexity limit).

I don't claim that there was no evidence at all, but I feel that it wasn't significant enough.

If I recall, there was no measurable difference at all. This was expected, as this patch would not change the common case, when the symbols are not getting complicated. Remember, sometimes the max complexity threshold was already obeyed (hence we had that threshold), but in other edge cases it was not, as in the case in the test. Judging the patch, I also don't see any particular reason why I should be more alert for RT regression, if we only avoid work by honoring this threshold - other than the couple of if statements for the early return cases.

I didn't feel that overall RT regression was too likely -- although the runtime of the analyzer is very chaotic, so I feel there is a low baseline chance of RT regression for almost every commit. (Since the regressions caused by my "don't assume third iteration" commit I'm a bit paranoid...)

This additional information reduces my fears that the measured runtime difference is just environmental noise. However, extreme outliers (and I'd say that top 500 out of 3'000'000 or even just 390'000 is extreme outlier) are still a very treacherous ground for statistical conclusions (they can amplify small noises to a surprising degree), so I would like to see a more comprehensive statistical analysis. If you can share the raw data (the 3 million {old runtime, new runtime} pairs), I can do these statistical calculations myself.

I should have explained why I picked the top 500, indeed. I had a look at the distribution, and there was a significant increase in the trend.

Ok, I see – but this doesn't change that statistical calculations based on outliers are very shaky. It is possible to rigorously validate an effect like "this change has negligible effect overall, but reduces runtime for a subclass of outliers", but it would require significantly more complex statistical tools than what you used.

-- I managed to find the script generating the plot along with the data used, but the data contains sensitive projects intermixed. From what I can see there I picked the top 500 because if we exclude those, we end up with 14 seconds of analysis time max per entry point - according to my notes.

Do you think that the runtime values are sensitive even if you strip the names of the entry points and just share a big set of pairs of numbers? I would say that it's mathematically impossible to divine meaningful properties from the set of analysis runtime values, especially if you mix together many software projects.

According to our data, usually an entry point should finish in about 1 second if not less. Above that suggests something to look at.

Fair, although I'd emphasize that they are not "must be eliminated" bugs, but just "something to look at" -- which may or may not lead to an improvement. [...]

A predictable upper bound is really important to scale the analysis and to bound the total analysis time of a TU.

I don't agree with this -- I think it's perfectly possible to bound the total analysis runtime for all real-world code without spending extraordinary efforts and code complexity on policing outliers whose total runtime is negligible.

[...] So, maybe a better way to look at this patch is by emphasizing less the RT improvements of the pathological few cases, but rather just enforcing an invariant about max symbol complexity.

Why would you want to enforce this invariant if not for the RT improvements that it provides?

This patch is not a "low-hanging fruit" for improving the averages or the 99% of the cases. This is about the 45 cases out of 3'000'000, and the unlucky users who happen to have similar code to those 45 cases.

Are those unlucky users really "unlucky" in a meaningful way? Even if analyzing that entry point takes half a minute instead of a second, it's still a negligible time compared to the analysis of the full project. (Are you perhaps thinking about integrating the analyzer into a code editor, where quick re-analysis after edits makes the runtime more relevant?)

I agree that more intrusive changes in the heuristics offer much more room for potential improvement, but also at an increased perturbation in the results. I'm genuinely surprised that a patch like this that did not change any reports (beyond the usual noise levels), nor the average or the 99th percentile RT, brings an enforcement of an invariant and on top of all this also fixes a tiny fraction of pathological cases.

In hindsight, maybe my tactic should have been to emphasize the invariant aspects more than the RT improvements.

To me, encountering symbols with complexity over the dedicated max symbol complexity threshold is a bug.

I don't agree -- very complex symbols naturally appear as we define the symbols with the straightforward recursive definition that we use, and they correspond to well-formed expressions. You can say that symbols over a certain complexity threshold are so rare that the analyzer can arbitrarily discard them, and this may (or may not) be a useful heuristic -- but still the existence of the complex symbols is the "natural" bug-free state and suppressing them is the inaccurate behavior which complicates the situation.

Let me share what happens when the threshold is hit. I think this may have caused the misunderstanding. Having UnknownVal poisons a computation, and this patch may suggest at first glance that we issue UnknownVal a lot more often.

However, if you look carefully at the test provided, you can see that there is no Unknown. It uses a fresh conjured symbol as a substitute for the Unknown that the SValBuilder would create. If you look closer, you can see that the rest of the computation goes on and on, and each time the max complexity is reached, the engine substitutes the result with a new conjured symbol - keeping the effective complexity small, without compromising the expressiveness of the history of the symbol. One way to look at this is by judging how likely it is that we could reuse some information from the complicated SymExpr to conclude some range facts (when simplifying). The argument is that it's very unlikely, and even harder to explain later how we reached that conclusion for a symbol range. The engine does these Unknown-to-conjured substitutions at different places (I can't recall where exactly), but one instance is when trying to bind Unknown to a variable: it will just bind a fresh conjured symbol instead - to recover future precision, without poisoning subsequent operations. So in short, if we return Unknown here, that actually always materializes a conjured symbol. As I'm writing this, I strongly believe that this was the source of confusion and pushback against this patch.

No, this is not the source of the pushback, I didn't even think about this question. My primary concern is the loss of source code readability (which will slow down any further developer who needs to understand or edit those areas) and the secondary concern is that I don't trust the statistical conclusions. (However, I'd suggest not merging this PR even if it did produce all the runtime improvements that you claim based on your statistics.)

Can you think of other ways to ensure we never create overly complicated symbols?

I'm not convinced that we need to ensure that we never create overly complicated symbols. I feel that this is a "cure is worse than the disease" situation -- this patch introduces a significant amount of code complexity for modest gains in performance.

By reading this, I have the impression you argue against having such a threshold. While I can't say why that threshold was introduced, I argue that whoever introduced it had a motivation.

If the threshold and its current partial enforcement was introduced with a good cause (i.e. to avoid a crash or to avoid a slowdown that's visible on full projects), then I'm supporting its existence.

Right now, we have this threshold, but sometimes we still have overly complicated symbols. This is objectively not a good place to be in. If we didn't have this threshold, indeed, this code would make little sense. However, assuming that we need this threshold, this code is necessary to obey this threshold in a systematic way.

I don't support systematically enforcing the threshold if it complicates our code base and doesn't provide meaningful benefits for the user. This threshold is just a heuristic limitation of the engine, so I don't see a significant difference in elegance between the status quo (partial enforcement IIUC?) and full enforcement.

If you check out the code and run the hugelyOverComplicatedSymbol() test, it runs for 14 seconds on my system and produces a symbol with complexity 800. The problem there is not that it's large, but rather that it's unbounded, for no particularly good reason or prospect of future use.

The static analyzer doesn't guarantee acceptable runtimes on artificially crafted code -- I'd guess that it's not too difficult to craft many analogous examples that target various parts of the analyzer code. Unless it surfaces in real code with meaningful frequency, I don't think that it justifies complex code changes.

However, if the statistical analysis confirms that this is an useful direction, I'd suggest eliminating the complex symbols (more precisely, the states that contain them) in a lazy fashion: put a boolean tag on the state when a complex symbol is stored in it and prevent further exploration from exploded nodes that contain a tagged state.

I'll possibly come back to this point, but not part of this response.

Rereading my suggestion, I'm less confident about it – don't bother with revisiting it unless you find it especially intriguing.

This way this performance optimization hack could be limited to the engine, and the checkers wouldn't need to know about it at all.

Checkers wouldn't need to know about this at all. They should know that sometimes an evalBinOp returns Unknown. This was previously the contract there, and would remain the case. The contract of the makeNonLoc APIs was relaxed as part of this patch, but that API is not used by checkers, nor should it be.

Good point, I misunderstood this. Sorry!

@balazs-benics-sonarsource
Contributor Author

I was thinking about possible ways to unblock this change.

If the additional code complexity needs justification, I could provide it by measuring the average impact (to ensure no regression happens in the common cases) and by repeating the measurement of the handful of edge cases where it should bring a noticeable (10%+) improvement. My estimate of these efforts would be a couple of days of work. I probably can't afford to spend this time.

It's hard to say, but there are two other options I'm contemplating:

  • ask for second opinions on Discuss (or here by CC-ing the other maintainers), or
  • abandon the patch (for now, possibly indefinitely).

But I don't really like either of these options.

I'm open to suggestions about how to proceed with this patch.

@NagyDonat
Copy link
Contributor

NagyDonat commented Jun 23, 2025

I was thinking about possible ways to unblock this change.

If the additional code complexity needs justification, I could provide it by measuring the average impact (to ensure no regression happens in the common cases) and by repeating the measurement of the handful of edge cases where it should bring a noticeable (10%+) improvement. My estimate of this effort is a couple of days of work, which I probably can't afford to spend.

I don't think that these additional measurements would be worth the effort. The measurement quality was my chronologically first concern, but it's not the crux of the matter.

  • If I understand the situation correctly you don't claim that this patch would provide any benefit that can be detected by users who analyze a complete real-world software project (and not just individual outlier TUs).
  • I find it plausible that this PR indeed improves the runtime of a handful of edge-cases and I don't suspect that it would increase the overall runtime, but I don't think that the benefits which are only visible on isolated outlier entry points (or artificial test cases) justify the additional code complexity.

My personal opinion is that the best decision would be abandoning this PR (costs outweigh the benefits), but this is an inherently subjective judgement, so if you are convinced that merging this PR would be better, then asking other contributors is the right way forward (and I won't block the decision if others agree with it).

@balazs-benics-sonarsource
Copy link
Contributor Author

I was thinking about possible ways to unblock this change.
If the additional code complexity needs justification, I could provide it by measuring the average impact (to ensure no regression happens in the common cases) and by repeating the measurement of the handful of edge cases where it should bring a noticeable (10%+) improvement. My estimate of this effort is a couple of days of work, which I probably can't afford to spend.

I don't think that these additional measurements would be worth the effort. The measurement quality was my chronologically first concern, but it's not the crux of the matter.

  • If I understand the situation correctly you don't claim that this patch would provide any benefit that can be detected by users who analyze a complete real-world software project (and not just individual outlier TUs).
  • I find it plausible that this PR indeed improves the runtime of a handful of edge-cases and I don't suspect that it would increase the overall runtime, but I don't think that the benefits which are only visible on isolated outlier entry points (or artificial test cases) justify the additional code complexity.

I'd say yes to both, while pointing out again that the handful of cases where the improvement can be observed are also real-world code, but rare code.

My personal opinion is that the best decision would be abandoning this PR (costs outweigh the benefits), but this is an inherently subjective judgement, so if you are convinced that merging this PR would be better, then asking other contributors is the right way forward (and I won't block the decision if others agree with it).

Thank you for your flexibility. I still believe that obeying max symbol complexity on its own brings value, even without knowing that it sometimes also eliminates long-running edge cases. I believe it justifies the added complexity, but I'm very open to refining the patch in other directions that would also ensure max symbol complexity.

/cc @Xazax-hun @haoNoQ for second opinion.

Copy link
Collaborator

@haoNoQ haoNoQ left a comment


I haven't processed your entire discussion yet but at a glance I like this change. Since our analysis is fundamentally explosive and we're solving it with budgeting, we are cursed with an endless fight against the explosive corner cases.

Even if the overall analysis time of the entire project or an entire product or an entire OS doesn't go down significantly, it is still important to keep the worst-case scenario under control. This benefits the users who run analysis on a smaller scale (e.g. at their desk, over the project they're actively developing), for whom spending 90 seconds on a single entry point could easily become an insurmountable burden.

So as long as we aren't introducing any major regressions on individual entry points (which can be detected by running the original statistic in reverse, as @NagyDonat pointed out), I think this patch is a huge net positive even if it doesn't affect the overall analysis time, so we should consider it quite seriously.

I also have an idea for dealing with the increased API clumsiness - see the inline comment.

Comment on lines 320 to 334
DefinedOrUnknownSVal makeNonLoc(const SymExpr *lhs, BinaryOperator::Opcode op,
                                APSIntPtr rhs, QualType type);

nonloc::SymbolVal makeNonLoc(APSIntPtr rhs, BinaryOperator::Opcode op,
                             const SymExpr *lhs, QualType type);
DefinedOrUnknownSVal makeNonLoc(APSIntPtr rhs, BinaryOperator::Opcode op,
                                const SymExpr *lhs, QualType type);

nonloc::SymbolVal makeNonLoc(const SymExpr *lhs, BinaryOperator::Opcode op,
                             const SymExpr *rhs, QualType type);
DefinedOrUnknownSVal makeNonLoc(const SymExpr *lhs, BinaryOperator::Opcode op,
                                const SymExpr *rhs, QualType type);

NonLoc makeNonLoc(const SymExpr *operand, UnaryOperator::Opcode op,
                  QualType type);
DefinedOrUnknownSVal makeNonLoc(const SymExpr *operand,
                                UnaryOperator::Opcode op, QualType type);

/// Create a NonLoc value for cast.
nonloc::SymbolVal makeNonLoc(const SymExpr *operand, QualType fromTy,
                             QualType toTy);
DefinedOrUnknownSVal makeNonLoc(const SymExpr *operand, QualType fromTy,
                                QualType toTy);
Copy link
Collaborator


Unknown values will be eventually converted into SymbolConjured anyway, as soon as they make it into the Environment. In this sense, spawning new Unknown values is basically never a good idea. I'm perfectly ok with working towards removing them as a concept.

So with that in mind, what do you folks think about returning a fresh atomic symbol instead of the Unknown?

E.g., make a new symbol class SymbolTooComplex (name TBD) whose only field is the pointer to the symbolic expression that would have been constructed if we didn't reach the complexity limit. But you're not supposed to access that pointer while folding arithmetic, it's only there for deduplication purposes. In other words, every time we evaluate the same operation and hit the same complexity limit, we'd get the same symbol.

This would probably avoid a lot of clumsiness in these high-level APIs (they will continue to make a NonLoc as promised), as well as improve deduplication of symbols instead of worsening it.

(FWIW makeNonLoc() isn't a fantastic name. These could be five different names.)

(The caller would still need to avoid hard-casting the value back to SymIntExpr or something like that. But why would anybody want to do that when they already have the raw parts?)

Copy link

github-actions bot commented Jun 24, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

Copy link
Contributor

@NagyDonat NagyDonat left a comment


Thanks @haoNoQ for this great idea and @balazs-benics-sonarsource for the quick implementation!

I'm completely satisfied with the current direction of development on this PR -- using this fresh symbol type instead of Unknowns and nullopts and early returns addresses my concerns about code complexity issues.

It's probably prudent to do a bit of testing to avoid runtime regressions, but I don't expect any trouble on that front.

@balazs-benics-sonarsource
Copy link
Contributor Author

Once we are settled on the implementation here, I'll split the classof refactors from this PR and merge it before merging this one.

Should I measure the perf of this change?

@necto
Copy link
Contributor

necto commented Jun 24, 2025

Should I measure the perf of this change?

I think so, because the patch is substantially different. For example, the overly complex symbol tree is now preserved, even if it is apparently not traversed in its entirety any longer. That makes it not obvious if the initial performance gains are still there.

Copy link
Contributor

@NagyDonat NagyDonat left a comment


Thanks for your patience in this review! I'm happy to see the additional simplifications coming from just wrapping the over-complicated symbol instead of storing its parts separately. (I see that it's a bit inelegant to construct the very thing that we're trying to avoid, but this is the pragmatic choice...)

I don't expect any performance issues (the "frozen" complex symbols shouldn't be problematic if they aren't traversed), but performance testing is never completely useless.

Copy link
Contributor

@NagyDonat NagyDonat left a comment


LGTM if there are no surprising performance regressions.

@balazs-benics-sonarsource
Copy link
Contributor Author

Should I measure the perf of this change?

I think so, because the patch is substantially different. For example, the overly complex symbol tree is now preserved, even if it is apparently not traversed in its entirety any longer. That makes it not obvious if the initial performance gains are still there.

Agreed.

Thanks for your patience in this review! I'm happy to see the additional simplifications coming from just wrapping the over-complicated symbol instead of storing its parts separately. (I see that it's a bit inelegant to construct the very thing that we're trying to avoid, but this is the pragmatic choice...)
I don't expect any performance issues (the "frozen" complex symbols shouldn't be problematic if they aren't traversed), but performance testing is never completely useless.

I'll do a usual regression test, but not anything as detailed as the evaluation in the PR summary presents.
The test case sufficiently demonstrates that this enforcement works as intended. I also had a look at the time-trace JSONs of the test with the new way of this enforcement and the originally proposed variant. They look identical to the naked eye.
I also ran the test with the two versions and they run the same way according to hyperfine.

Nevertheless, I'll do a usual regression test.

@NagyDonat
Copy link
Contributor

The usual regression test is completely sufficient.

@balazs-benics-sonarsource balazs-benics-sonarsource force-pushed the bb/enforce-max-sym-complexity branch from bb82e63 to 8b90b3f Compare June 25, 2025 07:04
@balazs-benics-sonarsource balazs-benics-sonarsource force-pushed the bb/enforce-max-sym-complexity branch from 8b90b3f to bc7dfc2 Compare June 25, 2025 07:23
@balazs-benics-sonarsource
Copy link
Contributor Author

Just rebased the PR to exclude the refactor change of classof. I'll schedule a measurement now.

anthonyhatran pushed a commit to anthonyhatran/llvm-project that referenced this pull request Jun 26, 2025
This should enable more powerful type metaprograms.

Split from llvm#144327
@balazs-benics-sonarsource
Copy link
Contributor Author

Backporting this updated version to clang-19 was not easy, but it allowed me to verify this PR.
Unfortunately, there is a bug in this updated version, which I'll explain.
Even after I fixed this bug, however, it ran about 19.36% slower on a sample when analyzing hashing-heavy real-world code.

Example hashing-heavy real-world code from hyperscan:state_compress.c:storecompressed512_64bit
typedef unsigned long long __attribute__((aligned((8)))) u64a;

u64a compress64(u64a x, u64a m) {
  if ((x & m) == 0)
      return 0;
  x &= m;
  u64a mk = ~m << 1;
  for (unsigned i = 0; i < 6; i++) {
    u64a mp = mk ^ (mk << 1);
    mp ^= mp << 2;
    mp ^= mp << 4;
    mp ^= mp << 8;
    mp ^= mp << 16;
    mp ^= mp << 32;
    u64a mv = mp & m;
    m = (m ^ mv) | (mv >> (1 << i));
    u64a t = x & mv;
    x = (x ^ t) | (t >> (1 << i));
    mk = mk & ~mp;
  }
  return x;
}

void storecompressed512_64bit(u64a *m, u64a *x) {
  u64a v[8] = {
    compress64(x[0], m[0]),
    compress64(x[1], m[1]),
    compress64(x[2], m[2]),
    compress64(x[3], m[3]),
    compress64(x[4], m[4]),
    compress64(x[5], m[5]),
    compress64(x[6], m[6]),
    compress64(x[7], m[7]),
  };
  (void)v;
}

The bug was that we inserted the T into the expected InsertPos corresponding to the Profile of T - so far so good - but instead of returning it, we checked whether it was "overly complicated", and if so, created a new SymbolOverlyComplex symbol and returned that instead. However, the next time we called acquire<T>, we would simply get a successful lookup for the Profile of T and return the symbol we actually wanted to hide, instead of the SymbolOverlyComplex that we returned from the previous acquire<T> call. This took a while to debug.
My solution to this is pretty complicated and ugly. In short, check the complexity of the symbol before we do insertion/lookup.
If the complexity of this symbol would be below the threshold, then find or insert just like we did before.
Otherwise, we need to look up the wrapped symbol first, then check if we have a SymbolOverlyComplex with that wrapped symbol in the map. If there is, just return that one, otherwise we create it.

As you can tell, this is pretty convoluted. And it's still slow. On the attached test case, I observed a 19.36% slowdown (from 1.503s to 1.794s).
So I have 2 ways of resolving this:

  1. Continue the investigation about why this implementation is slow.
  2. Revert back to the variant I originally proposed when I opened this PR.

@NagyDonat
Copy link
Contributor

NagyDonat commented Jun 27, 2025

The bug was that we inserted the T into the expected InsertPos corresponding to the Profile of T - so far so good, but instead of returning that, we checked if this is "overly complicated", and if so, we created the new SymbolOverlyComplex symbol and returned that instead. However, when the next time we get to call acquired<T>, we would simply get a successful lookup for the Profile of T and return the symbol we actually wanted to hide! instead of returning the SymbolOverlyComplex that we returned in the previous acquire<T> call. [...]

Ugh, the stateful nature of these hashtable data structures is evil...

Otherwise, we need to look up the wrapped symbol first, then check if we have a SymbolOverlyComplex with that wrapped symbol in the map. If there is, just return that one, otherwise we create it.

I have to admit that I don't understand the role and meaning of "the wrapped symbol" in your code.


However, I think I see another, less complex approach for "fixing" the method acquire: instead of creating the Dummy symbol and calculating its complexity early, just modify the original state of the method (i.e. the parent revision of 8c29bbc ) by moving the block

if (SD->complexity() > MaxCompComplexity) {
  return cast<Ret>(acquire<SymbolOverlyComplex>(SD));
}

out of the if (!SD) block (placing it directly after the if (!SD) block).

This would guarantee that:

  • we always return SymbolOverlyComplex if the expression is too complex (even if there is a cache hit);
  • but even these overly complex symbols (which only live "inside" the SymbolOverlyComplex block) enjoy the benefit of caching and aren't recreated repeatedly.

I hope that this alternative approach could prevent the performance hit that you observed.

…of allocating it

This would bring 3 benefits:
 - Now a single `acquire<T>` call only inserts a single item into the
   `DataSet`.
 - Simplifies the logic slightly.
 - We have the invariant that the cache (`DataSet`) only ever contains
   symbols that obey the max complexity threshold.

Unfortunately, this performs just as badly as before this commit.
In other words, this didn't improve anything, but I'll leave it here for
history.
@balazs-benics-sonarsource
Copy link
Contributor Author

balazs-benics-sonarsource commented Jun 27, 2025

However, I think I see another, less complex approach for "fixing" the method acquire: instead of creating the Dummy symbol and calculating its complexity early, just modify the original state of the method (i.e. the parent revision of 8c29bbc ) by moving the block

if (SD->complexity() > MaxCompComplexity) {
  return cast<Ret>(acquire<SymbolOverlyComplex>(SD));
}

out of the if (!SD) block (placing it directly after the if (!SD) block).

This would guarantee that:

  • we always return SymbolOverlyComplex if the expression is too complex (even if there is a cache hit);
  • but even these overly complex symbols (which only live "inside" the SymbolOverlyComplex block) enjoy the benefit of caching and aren't recreated repeatedly.

You are right. Indeed this is a much simpler solution, IDK how I overlooked it xD

I hope that this alternative approach could prevent the performance hit that you observed.

No, the ~20% regression is still there. And now that I'm thinking about this a bit more, it makes sense. Here is the flame graph that I get no matter what I do with this patch.
[flame graph image]

We see a lot of simplifySValOnce calls that decompose the symbol and traverse the tree. With my original patch they are not present, likely because the Unknowns poison the tree so there is nothing to be traversed. This is why my original patch actually improves the performance of the sample rather than pessimizing it.

How I look at this is that returning Unknown is the right thing to do.

What I do not yet understand is how the baseline avoids these simplifySValOnce traversals in its flame graph, but I'm not sure that matters if the observable report differences with the solution that returns Unknown are practically none.

@NagyDonat
Copy link
Contributor

NagyDonat commented Jun 27, 2025

I'm still convinced that the solution that returns UnknownVals is unacceptable from a code readability and maintainability perspective. (The code complexity is already scary for new contributors; we don't want to block the road behind ourselves.) In my opinion, adding so much complexity would only be acceptable if it were the only way to fix a regularly occurring crash, and enforcing this invariant is not worth the change.

Of course, I understand that this is my personal opinion and can accept if other members of the community overrule it, but I'm strongly against it.

@NagyDonat NagyDonat closed this Jun 27, 2025
@NagyDonat
Copy link
Contributor

Closing this ticket was a misclick, sorry.

@NagyDonat NagyDonat reopened this Jun 27, 2025
@NagyDonat
Copy link
Contributor

We see a lot of simplifySValOnce that decomposes the symbol and traverses the tree. With my original patch it's not present likely because the Unknowns poison the the tree so there is nothing to be traversed. This is why with my original patch we actually improve the performance of the sample rather than pessimising it.

Oh wait a minute... which symbol is entered by the simplifySValOnce calls? I'm pretty sure that the simplification process cannot enter the complex structure in the SymbolOverlyComplex because its internal structure is a private implementation detail -- so the only difference between the SymbolOverlyComplex and the UnknownVal is that evalBinOp (the parent stack frame in your flame graph) special-cases UnknownVal:

SVal SValBuilder::evalBinOp(ProgramStateRef state, BinaryOperator::Opcode op,
                            SVal lhs, SVal rhs, QualType type) {             
  if (lhs.isUndef() || rhs.isUndef())                                        
    return UndefinedVal();                                                   
                                                                             
  if (lhs.isUnknown() || rhs.isUnknown())                                    
    return UnknownVal();
  // ... nontrivial part of the function
}

The parts of the flamegraph that you highlight are presumably coming from the other operand whose simplification is skipped due to the presence of the UnknownVal.

If you really want to bother with performance improvements that specifically target this 0.05% of the entrypoints, then you can insert one more early return here at the beginning of evalBinOp to skip some calculations if you encounter a SymbolOverlyComplex.

@balazs-benics-sonarsource
Copy link
Contributor Author

The parts of the flamegraph that you highlight are presumably coming from the other operand whose simplification is skipped due to the presence of the UnknownVal.

Yes, this is my theory, but it still wouldn't explain why I didn't see these simplify calls in the baseline flame graph.

If you really want to bother with performance improvements that specifically target this 0.05% of the entrypoints, then you can insert one more early return here at the beginning of evalBinOp to skip some calculations if you encounter a SymbolOverlyComplex.

This is the point. I don't think I can special case these because the computations still make sense to do.
So I'm concerned about adding something like this:

diff --git a/clang/lib/StaticAnalyzer/Core/SValBuilder.cpp b/clang/lib/StaticAnalyzer/Core/SValBuilder.cpp
@@ -492,6 +492,10 @@ SVal SValBuilder::evalBinOp(ProgramStateRef state, BinaryOperator::Opcode op,
   if (lhs.isUnknown() || rhs.isUnknown())
     return UnknownVal();
 
+  if (isa_and_nonnull<SymbolOverlyComplex>(lhs.getAsSymbol()) ||
+      isa_and_nonnull<SymbolOverlyComplex>(rhs.getAsSymbol()))
+    return UnknownVal();
+
   if (isa<nonloc::LazyCompoundVal>(lhs) || isa<nonloc::LazyCompoundVal>(rhs)) {
     return UnknownVal();
   }

Btw this would solve the performance problem (at least on the sample I shared), and it's technically a correct implementation, but I still find it unfair. As far as combining SymbolOverlyComplex with any other symbol goes, it should behave just like any other SymbolData.

Let me know if you meant something like this when you referred to "you can insert one more early return".

@NagyDonat
Copy link
Contributor

Let me know if you means something like this when you referred to "you can insert one more early return".

Yes, I thought about something like your code example.

If you really want to bother with performance improvements that specifically target this 0.05% of the entrypoints, then you can insert one more early return here at the beginning of evalBinOp to skip some calculations if you encounter a SymbolOverlyComplex.

This is the point. I don't think I can special case these because the computations still make sense to do.
[...]
Btw this would solve the performance problem (at least on the sample I shared), and it's technically a correct implementation, but I still find it unfair.

The whole "place a limit on symbol complexity" logic is a fundamentally unfair heuristic that stops some calculations that "make sense" and are theoretically similar to other calculations.

Generally speaking, I'm trying to follow the dichotomy that:

  • The normal, commonly occurring situation should be handled by effective and elegant code that runs quickly and produces many useful and accurate results -- even if this requires lots of code complexity.
  • The rare corner cases should be short-circuited by simple logic, sacrificing theoretical elegance and accepting false negatives. (Even some performance hits are acceptable if they are overall negligible.)

@NagyDonat
Copy link
Contributor

NagyDonat commented Jun 27, 2025

If I understand it correctly, it is easy to spot the calculations that would produce overcomplicated symbols, so instead of modifying "deep" functions like SValBuilder::makeNonLoc or SymbolManager::acquire, you can prevent the creation of overcomplicated symbols by placing an early return branch (that returns unknown) at the beginning of evalBinOp and a few similar functions. This way you wouldn't need to "bubble up" the UnknownVal or the SymbolOverlyComplex through several deeply nested, complex function calls. (If you want to be sure, you can place an assertion on makeNonLoc or acquire to guarantee that they don't emit overcomplicated symbols -- but the actual check can be placed in an outer layer to simplify the control flow.)


Original thought process

🤔 I agree that it wouldn't be elegant to add

+  if (isa_and_nonnull<SymbolOverlyComplex>(lhs.getAsSymbol()) ||
+      isa_and_nonnull<SymbolOverlyComplex>(rhs.getAsSymbol()))
+    return UnknownVal();

because this way SymbolOverlyComplex would become an overcomplicated intermediate step that always leads to UnknownVal.

However, as I think about this situation, I realized that it's completely arbitrary that we're putting a complexity limit on the result symbol -- we might as well introduce a complexity threshold above which symbols cannot participate in calculations:

if ((lhs.getAsSymbol() && lhs.getAsSymbol()->complexity() > Threshold) ||
    (rhs.getAsSymbol() && rhs.getAsSymbol()->complexity() > Threshold))
  return UnknownVal();

This different sort of limitation still guarantees that we cannot build overly complex symbols, but its implementation is significantly shorter (just three early return statements for unary operator, binary operator and cast evaluation).

💡 In fact, you can compare the threshold with the sum of complexity values of the two operands to get a heuristic which is practically equivalent to the complexity threshold that you want, but doesn't pass around UnknownVals in already complex code and doesn't introduce a new symbol kind.

@balazs-benics-sonarsource
Copy link
Contributor Author

[...] This different sort of limitation still guarantees that we cannot build overly complex symbols, but its implementation is significantly shorter (just three early return statements for unary operator, binary operator and cast evaluation).

💡 In fact, you can compare the threshold with the sum of complexity values of the two operands to get a heuristic which is practically equivalent to the complexity threshold that you want, but doesn't pass around UnknownVals in already complex code and doesn't introduce a new symbol kind.

I played with the idea and there is one wrinkle.
EvalBinOp applies tactics that can reduce the requested operation to known values or ranges after applying some logic, like:

  • eagerly fold away multiplications to 1
  • shifting 0 left or right to a cast
  • zero divided/modulo by some value to 0
  • others

Checking the sum of the operand complexities before applying these heuristics would mean losing out on these benefits, reducing cases to Unknown where previously we could have deduced a simple result.
The evalBinOp overloads use a bunch of APIs (makeSymExprValNN, makeNonLoc, etc.), so it's not easy to ensure that we obey the max complexity rule on all paths - unless we do the complexity check in makeNonLoc, where everything eventually boils down.

This was the originally proposed version, which was rejected because if makeNonLoc could return Unknown, then a bunch of call sites would need to be updated to accommodate this.

So, I think if we go with the evalbinop approach, then it should work as efficiently as my original proposal, while sacrificing the special cases that fold away the binop. I'm fine with either of the approaches.
I scheduled a measurement for the evalbinop approach, and I expect the results by tomorrow the latest.

@NagyDonat NagyDonat closed this Jun 30, 2025
@NagyDonat NagyDonat reopened this Jun 30, 2025
@NagyDonat
Copy link
Contributor

NagyDonat commented Jun 30, 2025

Yet another misclick -- the cursed „Close with comment” is placed at a location where I intuitively expect a „Cancel” button. (I used "Quote reply", then I felt that I don't want the whole previous comment in a quote, so I tried to press Cancel, but the "Close with comment" button is placed below the new comment textbox in the same location where the edit comment textbox has its "Cancel" button.)

@NagyDonat
Copy link
Contributor

I played with the idea and there is one wrinkle. EvalBinOp applies tactics that can reduce the requested operation to known values or ranges after applying some logic, like:

* eagerly fold away multiplications to 1

* shifting 0 left or right to a cast

* zero divided/modulo by some value to 0

* others

By checking the sum of the operand complexities before applying these heuristics would mean that we would lose out on these benefits, thus reduce cases into Unknown while in the past we could have deduced a simple result for the case.

Yes, doing the test at the beginning of EvalBinOp (instead of placing it in makeNonLoc) moves the threshold to a somewhat earlier step: the complexity cutoff will affect somewhat more symbols -- but this comes with a proportional performance improvement (as we skip the simplification steps that could create the overly complex symbol), so I don't think that this is a problem. (As a compensation, we could "sell" the performance advantage to slightly increase the complexity threshold -- but the threshold is an arbitrary round value anyway, so I don't think that we need to actually do this.)

So, I think if we go with the evalbinop approach, then it should work as efficiently as my original proposal, while sacrificing the special cases that fold away the binop. I'm fine with either of the approaches.
I scheduled a measurement for the evalbinop approach, and I expect the results by tomorrow the latest.

I'm looking forward to it :) I think this evalbinop approach could be a good compromise that eliminates the outliers without messing up the code.

@balazs-benics-sonarsource
Copy link
Contributor Author

So, I think if we go with the evalbinop approach, then it should work as efficiently as my original proposal, while sacrificing the special cases that fold away the binop. I'm fine with either of the approaches.
I scheduled a measurement for the evalbinop approach, and I expect the results by tomorrow the latest.

I'm looking forward to it :) I think this evalbinop approach could be a good compromise that eliminates the outliers without messing up the code.

I can already share that the assertion enforcing that we honor Max complexity would fire on about half of the projects in the test pool so far. This is still acceptable for us, but lessens the guarantees of the original patch where this assertion would have held.

I'll do another measurement now with no such assert in place to see if the performance looks good - in other words, whether we still make overly complicated symbols, but rarely enough that we don't care. In that case we would accept this patch.

rlavaee pushed a commit to rlavaee/llvm-project that referenced this pull request Jul 1, 2025
This should enable more powerful type metaprograms.

Split from llvm#144327
@balazs-benics-sonarsource
Copy link
Contributor Author

balazs-benics-sonarsource commented Jul 8, 2025

So, I think if we go with the evalbinop approach, then it should work as efficiently as my original proposal, while sacrificing the special cases that fold away the binop. I'm fine with either of the approaches.
I scheduled a measurement for the evalbinop approach, and I expect the results by tomorrow the latest.

I'm looking forward to it :) I think this evalbinop approach could be a good compromise that eliminates the outliers without messing up the code.

I can already share that the assertion enforcing that we honor Max complexity would fire on about half of the projects in the test pool so far. This is still acceptable for us, but lessens the guarantees of the original patch where this assertion would have held.

It turns out that evalCast is another API that creates overly complicated symbols besides evalBinOp, thus to assert that we never create overly complicated symbols, we would need an early return there comparing against the threshold too.
I haven't done that, though.

I'll do another measurement now with no such assert in place to see whether the performance still looks good — in other words, whether we still create overly complicated symbols, but rarely enough that it doesn't matter in practice. In that case we would accept this patch.

It took me a while to gather the data. I had to migrate the uses of the internal evalBinOpXX APIs to use the top-level evalBinOp from checkers and other recursive places, to ensure that the check within evalBinOp is honored.

Once this was done, the data looks more convincing.
It's not a clear win though, as you will see.
max-compl-upstream-eval-animated

This is a log-log scatter plot of the running times of the individual entry points. The plot only charts entry points where both the baseline and the new run took at least 50 milliseconds, so we have 758786 entry points on the plot.

'abs dRT' := RT_new - RT_base
'rel dRT%' := 'abs dRT' / RT_base * 100

The "outliers" are the entry points where abs('rel dRT%' - mean('rel dRT%')) > 10 * std('rel dRT%'), which accounts for 295 entry points out of those visualized, i.e. 0.04%. These are circled to make them stand out.
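The two deltas and the outlier criterion above can be sketched in pandas like this (toy numbers, not the real measurement data):

```python
import pandas as pd

# Toy running times in ms; stand-ins for the real merged_metrics.csv rows.
df = pd.DataFrame({
    "RT_base": [100.0, 200.0, 400.0, 50.0],
    "RT_new":  [110.0, 150.0, 400.0, 500.0],
})

# 'abs dRT' := RT_new - RT_base
df["abs dRT"] = df["RT_new"] - df["RT_base"]
# 'rel dRT%' := 'abs dRT' / RT_base * 100
df["rel dRT%"] = df["abs dRT"] / df["RT_base"] * 100

# Outliers: relative delta more than 10 standard deviations from the mean.
mean = df["rel dRT%"].mean()
std = df["rel dRT%"].std()
outliers = df[(df["rel dRT%"] - mean).abs() > 10 * std]
```

With only four toy rows the 10-sigma threshold flags nothing; on the real data it selects the 295 circled points.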

Basically, if the dot is on the diagonal, that means it runs roughly as in the baseline. This is the majority of the entry points.
If the dot is on the bottom right, then it ran faster after the change.
Similarly, on the upper left side the entry point ran slower than on the baseline.
The further the dot is from the diagonal, the slower/faster it was relative to the baseline.
The dots in the bottom left corner are the "fast-ish" entry points, as they ran within the expected 10**3 ms (i.e. 1 second) time budget.

If we are looking at the 1 second+ entry points, we can see that this patch greatly improves about 6 cases (marked with 5-10 on the chart).
However, it regresses one (1) entry point from 43 seconds to 93 seconds (2.2x). (project retdec:parameters.cpp serialize(...))
It also regresses quicker entry points, like (2) LibreOffice:documen5.cxx from 0.487 seconds to 4.2 seconds (8.7x).
And a bunch of others, like (3) libuv:signal.c:uv_signal_event from 3.7s to 9.4s (2.6x) and (4) libuv:stream.c:uv_write from 3.3s to 9.1s (2.7x).

# RT_base RT_new rel dRT% EntryPoint
1 43136 93100 115.83% retdec/src/config/parameters.cpp c:@N@retdec@N@config@S@Parameters@F@serialize<#$@N@rapidjson@S@PrettyWriter>#$@N@rapidjson@S@GenericStringBuffer>#$@N@rapidjson@S@UTF8>#C#$@N@rapidjson@S@CrtAllocator#S2_#S2_#S3_#Vi0>#&S0_#1
2 487 4226 767.76% LibreOffice/core/data/documen5.cxx c:@S@ScDocument@F@GetOldChartParameters#&1$@N@rtl@S@OUString#&$@S@ScRangeList#&b#S4_#
3 3688 9423 155.50% libuv/orkspce/src/unix/signal.c c:signal.c@F@uv__signal_event
4 3352 9124 172.20% libuv/src/unix/stream.c c:@F@uv_write
5 3126 120 -96.16% GCC/gcc/sel-sched.cc c:sel-sched.cc@F@moveup_set_expr#**$@S@_list_node#*$@S@rtx_insn#b#
6 33322 1292 -96.12% DelugeFirmware/src/OSLikeStuff/fault_handler/fault_handler.c c:@F@printPointers
7 3939 207 -94.74% LibreOffice/basic/source/basmgr/basmgr.cxx c:@S@LibraryContainer_Impl@F@getByName#&1$@N@rtl@S@OUString#
8 4422 505 -88.58% tensorflow/compiler/mlir/lite/flatbuffer_export.cc c:flatbuffer_export.cc@aN@S@Translator@F@Translator#$@N@mlir@S@ModuleOp#b#b#b#&1$@N@std@S@unordered_set>#$@N@std@N@__cxx11@S@basic_string>#C#$@N@std@S@char_traits>#C#$@N@std@S@allocator>#C#$@N@std@S@hash>#S3_#$@N@std@S@equal_to>#S3_#$@N@std@S@allocator>#S3_#S1_#*$@N@tensorflow@S@OpOrArgNameMapper#
9 2774 417 -84.97% LibreOffice/cppuhelper/source/tdmgr.cxx c:tdmgr.cxx@N@cppu@F@createCTD#&1$@N@com@N@sun@N@star@N@uno@S@Reference>#$@N@com@N@sun@N@star@N@container@S@XHierarchicalNameAccess#&1$@N@com@N@sun@N@star@N@uno@S@Reference>#$@N@com@N@sun@N@star@N@reflection@S@XTypeDescription#
10 6525 1050 -83.91% tensorflow/compiler/mlir/lite/flatbuffer_export.cc c:flatbuffer_export.cc@aN@S@Translator@F@GetOpcodeIndex#&1$@N@std@N@__cxx11@S@basic_string>#C#$@N@std@S@char_traits>#C#$@N@std@S@allocator>#C#$@N@tflite@E@BuiltinOperator#

If we only look at the entry points that ran for at least 1 second in at least one of the measurements, we have these numbers:

long.describe(percentiles=[0.0001, 0.001, .01, .1, .25, .5, .75, .9, .99, .999, .9999])
              RT_base         RT_new        abs dRT       rel dRT%
count   400184.000000  400184.000000  400184.000000  400184.000000
mean      3469.711195    3468.452514      -1.258681       0.426680
std       1413.859913    1425.623877     286.598363     117.717681
min          7.000000       5.000000  -32030.000000     -99.876695
0.01%      742.018300     685.036600   -4081.414400     -59.523150
0.1%       926.000000     913.000000   -1595.000000     -38.294830
1%        1029.000000    1028.000000    -950.000000     -21.638762
10%       1699.000000    1692.000000    -230.000000      -6.317705
25%       2534.000000    2535.000000     -89.000000      -2.810499
50%       3464.000000    3455.000000       4.000000       0.134168
75%       4289.000000    4301.000000      92.000000       2.890952
90%       5069.000000    5090.000000     238.000000       7.072818
99%       6879.000000    6856.000000     711.000000      18.337760
99.9%    11427.634000   11341.000000    1560.817000      63.733926
99.99%   28564.700700   28336.414400    2705.981700      80.511678
max     116043.000000  138428.000000   49964.000000   69714.285714
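The summary above comes from filtering to entry points that took over a second in at least one of the two runs, then calling pandas' describe(). A minimal sketch with synthetic numbers (the real table summarizes ~400k entry points):

```python
import numpy as np
import pandas as pd

# Synthetic running times (ms), standing in for the joined measurement data.
rng = np.random.default_rng(42)
joined = pd.DataFrame({"RT_base": rng.integers(500, 7000, size=2000)})
joined["RT_new"] = joined["RT_base"] + rng.integers(-300, 300, size=2000)
joined["abs dRT"] = joined["RT_new"] - joined["RT_base"]
joined["rel dRT%"] = joined["abs dRT"] / joined["RT_base"] * 100

# Keep entry points that ran over a second in at least one measurement.
long = joined[(joined["RT_base"] > 1000) | (joined["RT_new"] > 1000)]
summary = long.describe(percentiles=[0.01, 0.1, 0.25, 0.5, 0.75, 0.9, 0.99])
```

describe() emits one row per statistic (count, mean, std, min, the requested percentiles, max) and one column per numeric column, which is exactly the shape of the table above.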

I can share the anonymized data, where the entry point names are SHA-hashed, if you want to play with it.

Here is the script that can visualize the data.
# pip install pandas matplotlib
# (hashlib and tkinter ship with the Python standard library)

import hashlib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import math

b = pd.read_csv('baseline/merged_metrics.csv')
n = pd.read_csv('proposed/merged_metrics.csv')
max_orig_num_rows = max(len(b), len(n))

# keep only the columns we need
b = b[['EntryPoint', 'PathRunningTime']]
n = n[['EntryPoint', 'PathRunningTime']]

# Drop the rows where the EntryPoint appears multiple times.
b = b.drop_duplicates(subset=['EntryPoint'])
n = n.drop_duplicates(subset=['EntryPoint'])

# Rename the columns to RT_base, RT_new, and RT_refactored
b.rename(columns={'PathRunningTime': 'RT_base'}, inplace=True)
n.rename(columns={'PathRunningTime': 'RT_new'}, inplace=True)

# Join the three dataframes on the EntryPoint column
joined = pd.merge(b, n, on=['EntryPoint'], how='inner')
joined = joined.sort_values(by='RT_base')
joined.reset_index(inplace=True)

print(f'Dropped about {max_orig_num_rows - len(joined)} ({(max_orig_num_rows - len(joined)) / max_orig_num_rows * 100:.3f}%) rows')

# Create an anonymized EntryPoint name in case I'd want to share this data publicly.
joined['Hash'] = joined['EntryPoint'].apply(lambda x: hashlib.sha256(x.encode()).hexdigest())

# Save this so that other (more specialized) scripts can start from here. 
joined.to_csv('joined.csv', index=False)
# joined = pd.read_csv('joined.csv')

# Calculate the absolute and relative difference
joined['abs dRT'] = joined['RT_new'] - joined['RT_base']
joined['rel dRT%'] = joined['abs dRT'] / joined['RT_base'] * 100

long = joined[(joined['RT_base'] > 1000) | (joined['RT_new'] > 1000)].copy()
long.reset_index(drop=True, inplace=True)
print(f'{len(long)} ({len(long) / len(joined) * 100:.3f}%) EntryPoints ran longer than 1 second at least once')
long.describe(percentiles=[0.0001, 0.001, .01, .1, .25, .5, .75, .9, .99, .999, .9999])


relevant = joined[(joined['RT_base'] > 50) & (joined['RT_new'] > 50)].copy()
relevant.reset_index(drop=True, inplace=True)

# Identify outliers based on relative difference standard deviation
rel_drt_std = relevant['rel dRT%'].std()
rel_drt_mean = relevant['rel dRT%'].mean()
threshold = rel_drt_std * 10
outliers = relevant[abs(relevant['rel dRT%'] - rel_drt_mean) > threshold]
outliers.reset_index(inplace=True)
print(f'Found {len(outliers)} outlier datapoints (rel dRT% > {threshold:.2f} from mean)')
print(f'Mean rel dRT%: {rel_drt_mean:.2f}%, Std dev: {rel_drt_std:.2f}%')





# Create a side-by-side layout with plot and table
fig, (ax, ax_table) = plt.subplots(1, 2, figsize=(16, 8), width_ratios=[2, 1])

# Create scatter plots with hover functionality
scatter_outliers = ax.scatter(outliers['RT_base'], outliers['RT_new'], s=30, alpha=0.5, 
                             facecolors='none', edgecolors='black', linewidth=1,
                             label=f'Outliers ({len(outliers)} points, {len(outliers) / len(relevant)*100:.2f}%)')

scatter_relevant = ax.scatter(relevant['RT_base'], relevant['RT_new'], s=0.1, color='red', alpha=0.2)

# Add diagonal lines in semi-transparent colors
max_val = max(relevant['RT_base'].max(), relevant['RT_new'].max())
ax.plot([0, max_val], [0, max_val], color='skyblue', alpha=0.5, linestyle='-', linewidth=1, label='RT_new=RT_base')

RT_new_vs_base_ratio = relevant['RT_new'] / relevant['RT_base']
best_case = relevant.iloc[RT_new_vs_base_ratio.idxmin()]
worst_case = relevant.iloc[RT_new_vs_base_ratio.idxmax()]

num_guides = 5
for nth in range(1, num_guides+1):
    curr = round(nth * (worst_case['RT_new'] / worst_case['RT_base']) / num_guides, 1)
    ax.plot([0, max_val/curr], [0, max_val], color='orange', alpha=0.5, linestyle='--', linewidth=1, label=f'RT_new=RT_base*{curr}')
    #ax.text(max_val/curr, max_val, f'RT_new=RT_base/{curr}', color='orange', alpha=0.7, fontsize=8, ha='left', va='center')

max_ratio_inverse = math.floor(1/(best_case['RT_new'] / best_case['RT_base']))
for nth in range(1, num_guides+1):
    curr = round(nth * max_ratio_inverse / num_guides, 1)
    ax.plot([0, max_val], [0, max_val/curr], color='green', alpha=0.5, linestyle='--', linewidth=1, label=f'RT_new=RT_base/{curr}')
    #ax.text(max_val, max_val/curr, f'RT_new=RT_base*{curr}', color='green', alpha=0.7, fontsize=8, ha='left', va='center')


annotations = []
clicked_points = []

# Clear on right click, show info on left click
def onclick(event):
    global annotations, clicked_points
    
    # Only handle clicks on the main plot (not the table)
    if event.inaxes != ax:
        return
    
    # Right click - clear everything
    if event.button == 3:
        for annot in annotations:
            annot.set_visible(False)
        annotations.clear()
        clicked_points.clear()
        update_table()
        return
    
    # Left click - add numbered annotation
    if event.button == 1:
        distances = (event.xdata - outliers['RT_base'])**2 + (event.ydata - outliers['RT_new'])**2
        closest_point = outliers.iloc[distances.idxmin()]
        
        # Create numbered annotation
        number = len(annotations) + 1
        annot = ax.annotate(str(number), 
                           xy=(closest_point['RT_base'], closest_point['RT_new']),
                           bbox=dict(boxstyle="circle", edgecolor='none', facecolor='none', alpha=0.7),
                           fontsize=10, fontweight='bold', color='blue')
        
        annotations.append(annot)
        clicked_points.append(closest_point)
        
        update_table()

def update_table():
    ax_table.clear()
    ax_table.set_xticks([])
    ax_table.set_yticks([])
    
    if clicked_points:
        # Create table data with more columns for better readability
        table_data = []
        for i, point in enumerate(clicked_points):
            entry_point = point['EntryPoint']
            # Truncate long entry points for display
            if len(entry_point) > 50:
                entry_point = entry_point[:47] + "..."
            table_data.append([
                f"{i+1}",
                f"{point['RT_base']:.2f}",
                f"{point['RT_new']:.2f}",
                f"{point['rel dRT%']:.2f}%",
                entry_point
            ])
        
        table = ax_table.table(cellText=table_data, 
                              colLabels=['#', 'RT_base', 'RT_new', 'rel dRT%', 'EntryPoint'],
                              cellLoc='left',
                              loc='center',
                              bbox=[0, 0, 1, 1])
        
        # Style the table
        table.auto_set_font_size(False)
        table.set_fontsize(8)
        table.scale(1, 1.5)
        
        # Color header row
        for j in range(len(table_data[0])):
            table[(0, j)].set_facecolor('#E6E6E6')
            table[(0, j)].set_text_props(weight='bold')
        
        # Color alternating rows for better readability
        for i in range(len(table_data)):
            for j in range(len(table_data[0])):
                if i % 2 == 0:
                    table[(i+1, j)].set_facecolor('#F8F8F8')
    
    fig.canvas.draw()

fig.canvas.mpl_connect("button_press_event", onclick)

ax.set_xscale('log')
ax.set_yscale('log')
ax.set_xlabel('Baseline Running Time')
ax.set_ylabel('New Running Time')
ax.legend()

# Configure table subplot
ax_table.set_title('Selected EntryPoints', fontsize=12, fontweight='bold', pad=20)
ax_table.set_xticks([])
ax_table.set_yticks([])

# Add scrollable text widget for full entry point details
from matplotlib.widgets import TextBox
import tkinter as tk
from tkinter import ttk

# Create a separate window for detailed view
def show_details_window():
    if not clicked_points:
        return
    
    root = tk.Tk()
    root.title("EntryPoint Details")
    root.geometry("800x600")
    
    # Create frame for buttons
    button_frame = tk.Frame(root)
    button_frame.pack(fill='x', padx=5, pady=5)
    
    # Create treeview for spreadsheet-like interface
    tree = ttk.Treeview(root, columns=('Number', 'RT_base', 'RT_new', 'rel_dRT', 'EntryPoint'), show='headings')
    
    # Define columns
    tree.heading('Number', text='#')
    tree.heading('RT_base', text='RT_base')
    tree.heading('RT_new', text='RT_new')
    tree.heading('rel_dRT', text='rel dRT%')
    tree.heading('EntryPoint', text='EntryPoint')
    
    # Set column widths
    tree.column('Number', width=50)
    tree.column('RT_base', width=100)
    tree.column('RT_new', width=100)
    tree.column('rel_dRT', width=100)
    tree.column('EntryPoint', width=400)
    
    # Add scrollbars
    vsb = ttk.Scrollbar(root, orient="vertical", command=tree.yview)
    hsb = ttk.Scrollbar(root, orient="horizontal", command=tree.xview)
    tree.configure(yscrollcommand=vsb.set, xscrollcommand=hsb.set)
    
    # Pack layout
    tree.pack(side='left', fill='both', expand=True)
    vsb.pack(side='right', fill='y')
    hsb.pack(side='bottom', fill='x')
    
    # Populate data
    for i, point in enumerate(clicked_points):
        tree.insert('', 'end', values=(
            i+1,
            f"{point['RT_base']:.2f}",
            f"{point['RT_new']:.2f}",
            f"{point['rel dRT%']:.2f}%",
            point['EntryPoint']
        ))
    
    # Add copy to clipboard functionality
    def copy_selected():
        selected_items = tree.selection()
        if not selected_items:
            return
        
        clipboard_text = ""
        for item in selected_items:
            values = tree.item(item)['values']
            clipboard_text += "\t".join(str(v) for v in values) + "\n"
        
        root.clipboard_clear()
        root.clipboard_append(clipboard_text.strip())
    
    def copy_all():
        all_items = tree.get_children()
        if not all_items:
            return
        
        clipboard_text = "#\tRT_base\tRT_new\trel dRT%\tEntryPoint\n"
        for item in all_items:
            values = tree.item(item)['values']
            clipboard_text += "\t".join(str(v) for v in values) + "\n"
        
        root.clipboard_clear()
        root.clipboard_append(clipboard_text.strip())
    
    # Add buttons
    copy_selected_btn = tk.Button(button_frame, text="Copy Selected", command=copy_selected)
    copy_selected_btn.pack(side='left', padx=5)
    
    copy_all_btn = tk.Button(button_frame, text="Copy All", command=copy_all)
    copy_all_btn.pack(side='left', padx=5)
    
    # Add keyboard shortcuts
    def on_key(event):
        if event.state & 4:  # Ctrl key
            if event.keysym == 'c':
                copy_selected()
            elif event.keysym == 'a':
                tree.selection_set(tree.get_children())
    
    tree.bind('<Key>', on_key)
    
    root.mainloop()

# Add button to open details window
from matplotlib.widgets import Button
ax_button = plt.axes([0.83, 0.02, 0.15, 0.05])
button = Button(ax_button, 'View Details')
button.on_clicked(lambda x: show_details_window())

plt.tight_layout()
plt.show()

In conclusion, in contrast to the originally proposed variant of this patch, this evolved version is more controversial and the benefits are not clear-cut. I don't think it's simpler than the original variant, so I think we have reached a stalemate.

Unfortunately, I can't invest more time into this PR right now, so I'll close this.

@NagyDonat
Contributor

NagyDonat commented Jul 8, 2025

Thank you for the very thorough statistical analysis and visualization -- this paints a clear picture of the effects of the change.

In conclusion, in contrast to the originally proposed variant of this patch, this evolved version is more controversial and the benefits are not clear-cut. I don't think it's simpler than the original variant, so I think we have reached a stalemate.

Unfortunately, I can't invest more time into this PR right now, so I'll close this.

I'm sad that this "check complexity at the beginning of evalBinOp" idea didn't provide convincing results 🙁

I don't see a theoretical reason why these benefits are less clear-cut than those of your originally proposed change. (In fact, I'm not sure whether the originally proposed variant is really better -- as far as I can see, the statistics that you published originally don't exclude an effect similar to the one that you observed with this "check complexity at the beginning of evalBinOp" variant.)

I had to migrate the uses of the internal evalBinOpXX APIs to use the top-level evalBinOp from checkers and other recursive places, to ensure that the check within evalBinOp is honored.

This is probably irrelevant now, but as far as I can see, it would've been enough to put the complexity check at the beginning of evalBinOpNN (instead of evalBinOp), because it's the only "branch" of evalBinOp that can create complex symbols. (Pointer arithmetic is represented in a different way...)

Labels
clang:static analyzer clang Clang issues not falling into any other category