//===- Threads.h ------------------------------------------------*- C++ -*-===//
//
// The LLVM Linker
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
// LLD supports threads to distribute workloads to multiple cores. Using
// multiple cores is most effective when more than one core is idle. At
// the last step of a build, it is often the case that the linker is the
// only active process on a computer, so we are naturally interested in
// using threads wisely to reduce the latency of delivering results to
// users.
//
// That said, we don't want to do "too clever" things with threads.
// The correctness of complex multi-threaded algorithms is sometimes
// extremely hard to justify, and such algorithms can easily mess up
// the entire design.
//
// Fortunately, when a linker links a large program (which is when link
// time matters most), it spends most of its time working on a massive
// number of small pieces of data of the same kind, and there are
// opportunities for large-scale parallelism there. Here are two
// examples:
//
//  - We have hundreds of thousands of input sections that need to be
//    copied to the result file at the last step of the link. Once we
//    fix the file layout, each section can be copied to its destination
//    and its relocations can be applied independently.
//
//  - We have to handle tens of millions of small strings when
//    constructing a mergeable string section.
//
// For cases like the former, we can simply use parallel_for_each
// instead of std::for_each (or a plain for loop). Because the tasks are
// completely independent of each other, we can run them in parallel
// without any coordination between them. That's very easy to
// understand and justify.
//
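// For illustration, with the forEach helper defined below, writing out
// sections might look something like this sketch (Sections, Buf, and
// OutSecOff are hypothetical names, not necessarily the real ones):
//
//   forEach(Sections.begin(), Sections.end(), [&](InputSection *S) {
//     // Each section writes to a disjoint part of the output buffer,
//     // so no synchronization is needed.
//     S->writeTo(Buf + S->OutSecOff);
//   });
//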
// For cases like the latter, we can use parallel algorithms to deal
// with massive amounts of data. We have to write a tailored algorithm
// for each problem, but the complexity of multi-threading is isolated
// in a single pass and doesn't affect the linker's overall design.
//
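// One plausible shape for such a tailored pass (a hypothetical sketch,
// not necessarily what LLD does; NumShards, Shards, and addString are
// made-up names) is to shard the strings by hash so that each thread
// owns a disjoint bucket:
//
//   forLoop(0, NumShards, [&](size_t I) {
//     for (StringRef S : Shards[I])
//       addString(I, S); // Only thread I touches shard I; no locks.
//   });
//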
// The above approach seems to be working fairly well. As an example,
// when linking Chromium (output size 1.6 GB), using 4 cores reduces
// latency to 75% of the single-core time (from 12.66 seconds to 9.55
// seconds) on my Ivy Bridge Xeon 2.8 GHz machine. Using 40 cores
// reduces it to 63% (from 12.66 seconds to 7.95 seconds). Because of
// Amdahl's law the speedup is not linear, but adding more cores still
// makes it faster.
//
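// (Recall Amdahl's law: if a fraction P of the work can be
// parallelized, the best possible speedup on N cores is
// 1 / ((1 - P) + P / N), so the serial fraction 1 - P caps the speedup
// no matter how many cores you add.)
//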
// On a final note, if you are trying to optimize, keep the axiom "don't
// guess, measure!" in mind. Some important passes of the linker are not
// that slow. For example, resolving all symbols is not a very heavy
// pass, even though it would be very hard to parallelize. You want to
// identify a slow pass first and then optimize it.
//
//===----------------------------------------------------------------------===//

#ifndef LLD_ELF_THREADS_H
#define LLD_ELF_THREADS_H

#include "Config.h"

#include "lld/Core/Parallel.h"
#include <algorithm>
#include <functional>

namespace lld {
namespace elf {
template <class IterTy, class FuncTy>
void forEach(IterTy Begin, IterTy End, FuncTy Fn) {
  if (Config->Threads)
    parallel_for_each(Begin, End, Fn);
  else
    std::for_each(Begin, End, Fn);
}
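
// Apply Fn to each index I in the range [Begin, End). Runs in parallel
// when threading is enabled (Config->Threads), sequentially otherwise.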
inline void forLoop(size_t Begin, size_t End, std::function<void(size_t)> Fn) {
  if (Config->Threads) {
    parallel_for(Begin, End, Fn);
  } else {
    for (size_t I = Begin; I < End; ++I)
      Fn(I);
  }
}
} // namespace elf
} // namespace lld

#endif