//===- Threads.h ------------------------------------------------*- C++ -*-===//
//
// The LLVM Linker
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
// LLD supports threads to distribute workloads to multiple cores. Using
// multiple cores is most effective when more than one core is idle. At
// the last step of a build, it is often the case that the linker is the
// only active process on a computer, so we are naturally interested in
// using threads wisely to reduce the latency of delivering results to
// users.
//
// That said, we don't want to do "too clever" things with threads.
// The correctness of complex multi-threaded algorithms is sometimes
// extremely hard to justify, and such algorithms can easily mess up
// the entire design.
//
// Fortunately, when a linker links a large program (which is when link
// time matters most), it spends most of its time working on a massive
// number of small pieces of data of the same kind, and there are
// opportunities for large-scale parallelism there. Here are two
// examples:
//
//  - We have hundreds of thousands of input sections that need to be
//    copied to the result file at the last step of the link. Once we
//    fix the file layout, each section can be copied to its destination
//    and its relocations can be applied independently.
//
//  - We have to handle tens of millions of small strings when
//    constructing a mergeable string section.
//
// For cases like the former, we can simply use parallel_for_each
// instead of std::for_each (or a plain for loop). Because the tasks are
// completely independent of each other, we can run them in parallel
// without any coordination between them. That's very easy to
// understand and justify.
//
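// For illustration, with the forEach helper defined below, writing out
// sections might look something like this sketch (Sections, Buf, and
// OutSecOff are hypothetical names, not necessarily the real ones):
//
//   forEach(Sections.begin(), Sections.end(), [&](InputSection *S) {
//     // Each section writes to a disjoint part of the output buffer,
//     // so no synchronization is needed.
//     S->writeTo(Buf + S->OutSecOff);
//   });
//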
// For cases like the latter, we can use parallel algorithms to deal
// with massive amounts of data. We have to write a tailored algorithm
// for each problem, but the complexity of multi-threading is isolated
// in a single pass and doesn't affect the linker's overall design.
//
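// One plausible shape for such a tailored pass (a hypothetical sketch,
// not necessarily what LLD does; NumShards, Shards, and addString are
// made-up names) is to shard the strings by hash so that each thread
// owns a disjoint bucket:
//
//   forLoop(0, NumShards, [&](size_t I) {
//     for (StringRef S : Shards[I])
//       addString(I, S); // Only thread I touches shard I; no locks.
//   });
//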
// The above approach seems to be working fairly well. As an example,
// when linking Chromium (output size 1.6 GB), using 4 cores reduces
// latency to 75% of the single-core time (from 12.66 seconds to 9.55
// seconds) on my Ivy Bridge Xeon 2.8 GHz machine. Using 40 cores
// reduces it to 63% (from 12.66 seconds to 7.95 seconds). Because of
// Amdahl's law the speedup is not linear, but adding more cores still
// makes it faster.
//
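// (Recall Amdahl's law: if a fraction P of the work can be
// parallelized, the best possible speedup on N cores is
// 1 / ((1 - P) + P / N), so the serial fraction 1 - P caps the speedup
// no matter how many cores you add.)
//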
// On a final note, if you are trying to optimize, keep the axiom "don't
// guess, measure!" in mind. Some important passes of the linker are not
// that slow. For example, resolving all symbols is not a very heavy
// pass, even though it would be very hard to parallelize. You want to
// identify a slow pass first and then optimize it.
//
//===----------------------------------------------------------------------===//

#ifndef LLD_ELF_THREADS_H
#define LLD_ELF_THREADS_H

#include "Config.h"

#include "lld/Core/Parallel.h"
#include <algorithm>
#include <functional>

namespace lld {
namespace elf {
template <class IterTy, class FuncTy>
void forEach(IterTy Begin, IterTy End, FuncTy Fn) {
  if (Config->Threads)
    parallel_for_each(Begin, End, Fn);
  else
    std::for_each(Begin, End, Fn);
}
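
// Apply Fn to each index I in the range [Begin, End). Runs in parallel
// when threading is enabled (Config->Threads), sequentially otherwise.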
inline void forLoop(size_t Begin, size_t End, std::function<void(size_t)> Fn) {
  if (Config->Threads) {
    parallel_for(Begin, End, Fn);
  } else {
    for (size_t I = Begin; I < End; ++I)
      Fn(I);
  }
}
} // namespace elf
} // namespace lld

#endif