Retry: [llvm-profdata] Speed up merging by using a thread pool

Add a "-j" option to llvm-profdata to control the number of threads used.
Auto-detect NumThreads when it isn't specified, and avoid spawning threads when
they wouldn't be beneficial.

I tested this patch using a raw profile produced by clang (147MB). Here is the
time taken to merge 4 copies together on my laptop:

  No thread pool: 112.87s user 5.92s system 97% cpu 2:01.08 total
  With 2 threads: 134.99s user 26.54s system 164% cpu 1:33.31 total

Changes since the initial commit:

  - When handling odd-length inputs, call ThreadPool::wait() before merging the
    last profile. Should fix a race/off-by-one (see r275937).

Differential Revision: https://reviews.llvm.org/D22438

llvm-svn: 275938
diff --git a/llvm/lib/ProfileData/InstrProfWriter.cpp b/llvm/lib/ProfileData/InstrProfWriter.cpp
index e25299e..7fabcdb 100644
--- a/llvm/lib/ProfileData/InstrProfWriter.cpp
+++ b/llvm/lib/ProfileData/InstrProfWriter.cpp
@@ -182,6 +182,14 @@
   return Dest.takeError();
 }
 
+Error InstrProfWriter::mergeRecordsFromWriter(InstrProfWriter &&IPW) {
+  for (auto &I : IPW.FunctionData)
+    for (auto &Func : I.getValue())
+      if (Error E = addRecord(std::move(Func.second), 1))
+        return E;
+  return Error::success();
+}
+
 bool InstrProfWriter::shouldEncodeData(const ProfilingData &PD) {
   if (!Sparse)
     return true;