[analyzer] Performance optimizations for the CloneChecker Summary: This patch aims at optimizing the CloneChecker for larger programs. Before this patch we took around 102 seconds to analyze sqlite3 with a complexity value of 50. After this patch we now take 2.1 seconds to analyze sqlite3. The biggest performance optimization is that we now put the constraint for group size before the constraint for the complexity. The group size constraint is much faster in comparison to the complexity constraint as it only does a simple integer comparison. The complexity constraint on the other hand actually traverses each Stmt and even checks the macro stack, so it is obviously not able to handle larger amounts of incoming clones. The new order filters out all the single-clone groups that the type II constraint generates in a faster way before passing the fewer remaining clones to the complexity constraint. This reduced runtime by around 95%. The other change is that we also delay the verification part of the type II clones back in the chain of constraints. This required to split up the constraint into two parts - a verification and a hash constraint (which is also making it more similar to the original design of the clone detection algorithm). The reasoning for this is the same as before: The verification constraint has to traverse many statements and shouldn't be at the start of the constraint chain. However, as the type II hashing has to be the first step in our algorithm, we have no other choice but split this constrain into two different ones. Now our group size and complexity constrains filter out a chunk of the clones before they reach the slow verification step, which reduces the runtime by around 8%. I also kept the full type II constraint around - that now just calls it's two sub-constraints - in case someone doesn't care about the performance benefits of doing this. Reviewers: NoQ Reviewed By: NoQ Subscribers: klimek, v.g.vassilev, xazax.hun, cfe-commits Differential Revision: https://reviews.llvm.org/D34182 llvm-svn: 312222

commit: 70686a1590c359a9f4cf6e08434e3c8cd0a80b54 [log] [tgz]
author: Raphael Isemann <teemperor@gmail.com> Thu Aug 31 07:10:46 2017 +0000
committer: Raphael Isemann <teemperor@gmail.com> Thu Aug 31 07:10:46 2017 +0000
tree: 81b264caac43889e45686bff82b0b5506972d717
parent: 0a9c1ef2eb1f74e72cbf775a858294bfbe014b01 [diff] [blame]
diff --git a/clang/lib/Analysis/CloneDetection.cpp b/clang/lib/Analysis/CloneDetection.cpp
index 1cce6a4..637e042 100644
--- a/clang/lib/Analysis/CloneDetection.cpp
+++ b/clang/lib/Analysis/CloneDetection.cpp

@@ -238,9 +238,18 @@
   return HashCode;
 }
 
-size_t RecursiveCloneTypeIIConstraint::saveHash(
-    const Stmt *S, const Decl *D,
-    std::vector<std::pair<size_t, StmtSequence>> &StmtsByHash) {
+/// Generates and saves a hash code for the given Stmt.
+/// \param S The given Stmt.
+/// \param D The Decl containing S.
+/// \param StmtsByHash Output parameter that will contain the hash codes for
+///                    each StmtSequence in the given Stmt.
+/// \return The hash code of the given Stmt.
+///
+/// If the given Stmt is a CompoundStmt, this method will also generate
+/// hashes for all possible StmtSequences in the children of this Stmt.
+static size_t
+saveHash(const Stmt *S, const Decl *D,
+         std::vector<std::pair<size_t, StmtSequence>> &StmtsByHash) {
   llvm::MD5 Hash;
   ASTContext &Context = D->getASTContext();
 
@@ -340,7 +349,7 @@
   return DataLHS == DataRHS;
 }
 
-void RecursiveCloneTypeIIConstraint::constrain(
+void RecursiveCloneTypeIIHashConstraint::constrain(
     std::vector<CloneDetector::CloneGroup> &Sequences) {
   // FIXME: Maybe we can do this in-place and don't need this additional vector.
   std::vector<CloneDetector::CloneGroup> Result;
@@ -381,8 +390,7 @@
 
       for (; i < StmtsByHash.size(); ++i) {
         // A different hash value means we have reached the end of the sequence.
-        if (PrototypeHash != StmtsByHash[i].first ||
-            !areSequencesClones(StmtsByHash[i].second, Current.second)) {
+        if (PrototypeHash != StmtsByHash[i].first) {
           // The current sequence could be the start of a new CloneGroup. So we
           // decrement i so that we visit it again in the outer loop.
           // Note: i can never be 0 at this point because we are just comparing
@@ -405,6 +413,14 @@
   Sequences = Result;
 }
 
+void RecursiveCloneTypeIIVerifyConstraint::constrain(
+    std::vector<CloneDetector::CloneGroup> &Sequences) {
+  CloneConstraint::splitCloneGroups(
+      Sequences, [](const StmtSequence &A, const StmtSequence &B) {
+        return areSequencesClones(A, B);
+      });
+}
+
 size_t MinComplexityConstraint::calculateStmtComplexity(
     const StmtSequence &Seq, const std::string &ParentMacroStack) {
   if (Seq.empty())
commit	70686a1590c359a9f4cf6e08434e3c8cd0a80b54	[log] [tgz]
author	Raphael Isemann <teemperor@gmail.com>	Thu Aug 31 07:10:46 2017 +0000
committer	Raphael Isemann <teemperor@gmail.com>	Thu Aug 31 07:10:46 2017 +0000
tree	81b264caac43889e45686bff82b0b5506972d717
parent	0a9c1ef2eb1f74e72cbf775a858294bfbe014b01 [diff] [blame]