Blame - Documentation/dev-tools/kmemcheck.rst - kernel/msm-4.9

Jonathan Corbet	9c296b4	2016-08-07 16:12:28 -0600	[diff] [blame]	1	Getting started with kmemcheck
				2	==============================
				3
				4	Vegard Nossum <vegardno@ifi.uio.no>
				5
				6
				7	Introduction
				8	------------
				9
				10	kmemcheck is a debugging feature for the Linux kernel. More specifically, it
				11	is a dynamic checker that detects and warns about some uses of uninitialized
				12	memory.
				13
				14	Userspace programmers might be familiar with Valgrind's memcheck. The main
				15	difference between memcheck and kmemcheck is that memcheck works for userspace
				16	programs only, and kmemcheck works for the kernel only. The implementations
				17	are of course vastly different. Because of this, kmemcheck is not as accurate
				18	as memcheck, but it turns out to be good enough in practice to discover real
				19	programmer errors that the compiler is not able to find through static
				20	analysis.
				21
				22	Enabling kmemcheck on a kernel will probably slow it down to the extent that
				23	the machine will not be usable for normal workloads such as an
				24	interactive desktop. kmemcheck will also cause the kernel to use about twice
				25	as much memory as normal. For this reason, kmemcheck is strictly a debugging
				26	feature.
				27
				28
				29	Downloading
				30	-----------
				31
				32	As of version 2.6.31-rc1, kmemcheck is included in the mainline kernel.
				33
				34
				35	Configuring and compiling
				36	-------------------------
				37
				38	kmemcheck only works for the x86 (both 32- and 64-bit) platform. A number of
				39	configuration variables must have specific settings in order for the kmemcheck
				40	menu to even appear in "menuconfig". These are:
				41
				42	- ``CONFIG_CC_OPTIMIZE_FOR_SIZE=n``
				43	This option is located under "General setup" / "Optimize for size".
				44
				45	With size optimization enabled, gcc uses certain optimizations that usually
				46	lead to false positive warnings from kmemcheck. An example of this is a 16-bit
				47	field in a struct, where gcc may load 32 bits, then discard the upper
				48	16 bits. kmemcheck sees only the 32-bit load, and may trigger a
				49	warning for the upper 16 bits (if they're uninitialized).
				50
				51	- ``CONFIG_SLAB=y`` or ``CONFIG_SLUB=y``
				52	This option is located under "General setup" / "Choose SLAB
				53	allocator".
				54
				55	- ``CONFIG_FUNCTION_TRACER=n``
				56	This option is located under "Kernel hacking" / "Tracers" / "Kernel
				57	Function Tracer"
				58
				59	When function tracing is compiled in, gcc emits a call to another
				60	function at the beginning of every function. This means that when the
				61	page fault handler is called, the ftrace framework will be called
				62	before kmemcheck has had a chance to handle the fault. If ftrace then
				63	modifies memory that was tracked by kmemcheck, the result is an
				64	endless recursive page fault.
				65
				66	- ``CONFIG_DEBUG_PAGEALLOC=n``
				67	This option is located under "Kernel hacking" / "Memory Debugging"
				68	/ "Debug page memory allocations".
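
The false positive described for ``CONFIG_CC_OPTIMIZE_FOR_SIZE`` above can be modeled in user space. The following is a minimal sketch, not kmemcheck's implementation: the one-status-byte-per-data-byte shadow encoding, ``struct sample``, ``count_uninit()`` and ``mark_field_written()`` are all hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical shadow encoding: one status byte per data byte. */
enum { SHADOW_UNINIT = 0, SHADOW_INIT = 1 };

struct sample {
	uint16_t field;		/* the only member the code initializes */
	uint16_t unused;	/* never written; shares a 32-bit word with 'field' */
};

/* Count the bytes in [offset, offset + size) that are still uninitialized. */
static size_t count_uninit(const uint8_t *shadow, size_t shadow_len,
			   size_t offset, size_t size)
{
	size_t i, n = 0;

	assert(offset + size <= shadow_len);
	for (i = 0; i < size; i++)
		if (shadow[offset + i] == SHADOW_UNINIT)
			n++;
	return n;
}

/* Model a store to 'field': mark its two bytes as initialized. */
static void mark_field_written(uint8_t *shadow)
{
	memset(shadow + offsetof(struct sample, field), SHADOW_INIT,
	       sizeof(uint16_t));
}
```

In this model, a 16-bit read of ``field`` after ``mark_field_written()`` touches no uninitialized bytes, while a 32-bit read starting at the same offset touches two. A strict byte-by-byte checker would flag the wider read even though the program never uses the upper half.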
				69
				70	In addition, I highly recommend turning on ``CONFIG_DEBUG_INFO=y``. This is also
				71	located under "Kernel hacking". With this, you will be able to get line number
				72	information from the kmemcheck warnings, which is extremely valuable in
				73	debugging a problem. This option is not mandatory, however, because it slows
				74	down the compilation process and produces a much bigger kernel image.
				75
				76	Now the kmemcheck menu should be visible (under "Kernel hacking" / "Memory
				77	Debugging" / "kmemcheck: trap use of uninitialized memory"). Here follows
				78	a description of the kmemcheck configuration variables:
				79
				80	- ``CONFIG_KMEMCHECK``
				81	This must be enabled in order to use kmemcheck at all...
				82
				83	- ``CONFIG_KMEMCHECK_``[``DISABLED`` | ``ENABLED`` | ``ONESHOT``]``_BY_DEFAULT``
				84	This option controls the status of kmemcheck at boot-time. "Enabled"
				85	will enable kmemcheck right from the start, "disabled" will boot the
				86	kernel as normal (but with the kmemcheck code compiled in, so it can
				87	be enabled at run-time after the kernel has booted), and "one-shot" is
				88	a special mode which will turn kmemcheck off automatically after
				89	detecting the first use of uninitialized memory.
				90
				91	If you are using kmemcheck to actively debug a problem, then you
				92	probably want to choose "enabled" here.
				93
				94	The one-shot mode is mostly useful in automated test setups because it
				95	can prevent floods of warnings and increase the chances of the machine
				96	surviving in case something is really wrong. In other cases, the one-
				97	shot mode could actually be counter-productive because it would turn
				98	itself off at the very first error -- in the case of a false positive
				99	too -- and this would get in the way of debugging the specific
				100	problem you were interested in.
				101
				102	If you would like to use your kernel as normal, but with a chance to
				103	enable kmemcheck in case of some problem, it might be a good idea to
				104	choose "disabled" here. When kmemcheck is disabled, most of the run-
				105	time overhead is not incurred, and the kernel will be almost as fast
				106	as normal.
				107
				108	- ``CONFIG_KMEMCHECK_QUEUE_SIZE``
				109	Select the maximum number of error reports to store in an internal
				110	(fixed-size) buffer. Since errors can occur virtually anywhere and in
				111	any context, we need a temporary storage area which is guaranteed not
				112	to generate any other page faults when accessed. The queue will be
				113	emptied as soon as a tasklet may be scheduled. If the queue is full,
				114	new error reports will be lost.
				115
				116	The default value of 64 is probably fine. If some code produces more
				117	than 64 errors within an irqs-off section, then the code is likely to
				118	produce many, many more, too, and these additional reports seldom give
				119	any more information (the first report is usually the most valuable
				120	anyway).
				121
				122	This number might have to be adjusted if you are not using serial
				123	console or similar to capture the kernel log. If you are using the
				124	"dmesg" command to save the log, then getting a lot of kmemcheck
				125	warnings might overflow the kernel log itself, and the earlier reports
				126	will get lost in that way instead. Try setting this to 10 or so on
				127	such a setup.
				128
				129	- ``CONFIG_KMEMCHECK_SHADOW_COPY_SHIFT``
				130	Select the number of shadow bytes to save along with each entry of the
				131	error-report queue. These bytes indicate what parts of an allocation
				132	are initialized, uninitialized, etc. and will be displayed when an
				133	error is detected to help the debugging of a particular problem.
				134
				135	The number entered here is actually the base-2 logarithm of the number of
				136	bytes that will be saved. So if you pick for example 5 here, kmemcheck
				137	will save 2^5 = 32 bytes.
				138
				139	The default value should be fine for debugging most problems. It also
				140	fits nicely within 80 columns.
				141
				142	- ``CONFIG_KMEMCHECK_PARTIAL_OK``
				143	This option (when enabled) works around certain GCC optimizations that
				144	produce 32-bit reads from 16-bit variables where the upper 16 bits are
				145	thrown away afterwards.
				146
				147	The default value (enabled) is recommended. This may of course hide
				148	some real errors, but disabling it would probably produce a lot of
				149	false positives.
				150
				151	- ``CONFIG_KMEMCHECK_BITOPS_OK``
				152	This option silences warnings that would be generated for bit-field
				153	accesses where not all the bits are initialized at the same time. This
				154	may also hide some real bugs.
				155
				156	This option is probably obsolete, or it should be replaced with
				157	the kmemcheck-/bitfield-annotations for the code in question. The
				158	default value is therefore fine.
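
The behaviour described for ``CONFIG_KMEMCHECK_QUEUE_SIZE`` (a fixed-size buffer that loses new reports once full) and the meaning of ``CONFIG_KMEMCHECK_SHADOW_COPY_SHIFT`` can be illustrated with a small user-space sketch. This is not kmemcheck's code; ``struct report``, ``struct report_queue``, ``queue_push()`` and ``queue_drain()`` are hypothetical names, and the shift value 5 is simply the example used in the text.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_SIZE		64	/* the documented default */
#define SHADOW_COPY_SHIFT	5	/* the example value from the text */
#define SHADOW_COPY_BYTES	(1u << SHADOW_COPY_SHIFT)	/* 2^5 = 32 */

struct report {
	unsigned long address;
	unsigned char shadow_copy[SHADOW_COPY_BYTES];
};

struct report_queue {
	struct report entries[QUEUE_SIZE];	/* fixed size, no allocation */
	size_t count;				/* reports currently stored */
	size_t dropped;				/* reports lost because the queue was full */
};

/* Store a report. Returns 1 if it was stored, 0 if it was lost. */
static int queue_push(struct report_queue *q, unsigned long address)
{
	assert(q != NULL);
	if (q->count == QUEUE_SIZE) {
		q->dropped++;
		return 0;
	}
	q->entries[q->count++].address = address;
	return 1;
}

/* Empty the queue (printing is elided). Returns the number of reports drained. */
static size_t queue_drain(struct report_queue *q)
{
	size_t drained;

	assert(q != NULL);
	drained = q->count;
	q->count = 0;
	return drained;
}
```

Pushing 70 reports into a zero-initialized queue before draining it keeps the first 64 and counts 6 as dropped, which is why the earliest reports are the ones that survive a burst.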
				159
				160	Now compile the kernel as usual.
				161
				162
				163	How to use
				164	----------
				165
				166	Booting
				167	~~~~~~~
				168
				169	First some information about the command-line options. There is only one
				170	option specific to kmemcheck, and this is called "kmemcheck". It can be used
				171	to override the default mode as chosen by the ``CONFIG_KMEMCHECK_*_BY_DEFAULT``
				172	option. Its possible settings are:
				173
				174	- ``kmemcheck=0`` (disabled)
				175	- ``kmemcheck=1`` (enabled)
				176	- ``kmemcheck=2`` (one-shot mode)
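
As an illustration of the three settings listed above, here is a hedged user-space sketch that maps a ``kmemcheck=`` argument to a mode. It is not the kernel's parameter parser; the enum and ``parse_kmemcheck_option()`` are hypothetical names, and rejecting values outside 0..2 is an assumption of this sketch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Numeric values follow the list above. */
enum kmemcheck_mode {
	KMEMCHECK_DISABLED = 0,
	KMEMCHECK_ENABLED = 1,
	KMEMCHECK_ONESHOT = 2
};

/*
 * Parse a single command-line word such as "kmemcheck=2".
 * Returns 0 and stores the mode on success, -1 if the word is not a
 * kmemcheck= option or its value is not 0, 1 or 2.
 */
static int parse_kmemcheck_option(const char *arg, enum kmemcheck_mode *mode)
{
	static const char prefix[] = "kmemcheck=";
	const char *value;
	char *end;
	long n;

	assert(arg != NULL && mode != NULL);
	if (strncmp(arg, prefix, sizeof(prefix) - 1) != 0)
		return -1;
	value = arg + sizeof(prefix) - 1;
	n = strtol(value, &end, 10);
	if (end == value || *end != '\0' || n < 0 || n > 2)
		return -1;
	*mode = (enum kmemcheck_mode)n;
	return 0;
}
```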
				177
				178	If SLUB debugging has been enabled in the kernel, it may take precedence over
				179	kmemcheck in such a way that the slab caches which are under SLUB debugging
				180	will not be tracked by kmemcheck. In order to ensure that this doesn't happen
				181	(even though it shouldn't by default), use SLUB's boot option ``slub_debug``,
				182	like this: ``slub_debug=-``
				183
				184	In fact, this option may also be used for fine-grained control over SLUB vs.
				185	kmemcheck. For example, if the command line includes
				186	``kmemcheck=1 slub_debug=,dentry``, then SLUB debugging will be used only
				187	for the "dentry" slab cache, and with kmemcheck tracking all the other
				188	caches. This is advanced usage, however, and is not generally recommended.
				189
				190
				191	Run-time enable/disable
				192	~~~~~~~~~~~~~~~~~~~~~~~
				193
				194	When the kernel has booted, it is possible to enable or disable kmemcheck at
				195	run-time. WARNING: This feature is still experimental and may cause false
				196	positive warnings to appear. Therefore, try not to use this. If you find that
				197	it doesn't work properly (e.g. you see an unreasonable number of warnings), I
				198	will be happy to take bug reports.
				199
				200	Use the file ``/proc/sys/kernel/kmemcheck`` for this purpose, e.g.::
				201
				202	$ echo 0 > /proc/sys/kernel/kmemcheck # disables kmemcheck
				203
				204	The numbers are the same as for the ``kmemcheck=`` command-line option.
				205
				206
				207	Debugging
				208	~~~~~~~~~
				209
				210	A typical report will look something like this::
				211
				212	WARNING: kmemcheck: Caught 32-bit read from uninitialized memory (ffff88003e4a2024)
				213	80000000000000000000000000000000000000000088ffff0000000000000000
				214	 i i i i u u u u i i i i i i i i u u u u u u u u u u u u u u u u
				215	         ^
				216
				217	Pid: 1856, comm: ntpdate Not tainted 2.6.29-rc5 #264 945P-A
				218	RIP: 0010:[<ffffffff8104ede8>] [<ffffffff8104ede8>] __dequeue_signal+0xc8/0x190
				219	RSP: 0018:ffff88003cdf7d98 EFLAGS: 00210002
				220	RAX: 0000000000000030 RBX: ffff88003d4ea968 RCX: 0000000000000009
				221	RDX: ffff88003e5d6018 RSI: ffff88003e5d6024 RDI: ffff88003cdf7e84
				222	RBP: ffff88003cdf7db8 R08: ffff88003e5d6000 R09: 0000000000000000
				223	R10: 0000000000000080 R11: 0000000000000000 R12: 000000000000000e
				224	R13: ffff88003cdf7e78 R14: ffff88003d530710 R15: ffff88003d5a98c8
				225	FS: 0000000000000000(0000) GS:ffff880001982000(0063) knlGS:00000
				226	CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
				227	CR2: ffff88003f806ea0 CR3: 000000003c036000 CR4: 00000000000006a0
				228	DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
				229	DR3: 0000000000000000 DR6: 00000000ffff4ff0 DR7: 0000000000000400
				230	[<ffffffff8104f04e>] dequeue_signal+0x8e/0x170
				231	[<ffffffff81050bd8>] get_signal_to_deliver+0x98/0x390
				232	[<ffffffff8100b87d>] do_notify_resume+0xad/0x7d0
				233	[<ffffffff8100c7b5>] int_signal+0x12/0x17
				234	[<ffffffffffffffff>] 0xffffffffffffffff
				235
				236	The single most valuable piece of information in this report is the RIP (or EIP
				237	on 32-bit) value. This will help us pinpoint exactly which instruction caused
				238	the warning.
				239
				240	If your kernel was compiled with ``CONFIG_DEBUG_INFO=y``, then all we have to do
				241	is give this address to the addr2line program, like this::
				242
				243	$ addr2line -e vmlinux -i ffffffff8104ede8
				244	arch/x86/include/asm/string_64.h:12
				245	include/asm-generic/siginfo.h:287
				246	kernel/signal.c:380
				247	kernel/signal.c:410
				248
				249	The "``-e vmlinux``" tells addr2line which file to look in. IMPORTANT:
				250	This must be the vmlinux of the kernel that produced the warning in the
				251	first place! If not, the line number information will almost certainly be
				252	wrong.
				253
				254	The "``-i``" tells addr2line to also print the line numbers of inlined
				255	functions. In this case, the flag was very important, because otherwise,
				256	it would only have printed the first line, which is just a call to
				257	``memcpy()``, which could be called from a thousand places in the kernel, and
				258	is therefore not very useful. These inlined functions would not show up in
				259	the stack trace above, simply because the kernel doesn't load the extra
				260	debugging information. This technique can of course be used with ordinary
				261	kernel oopses as well.
				262
				263	In this case, it's the caller of ``memcpy()`` that is interesting, and it can be
				264	found in ``include/asm-generic/siginfo.h``, line 287::
				265
				266	281 static inline void copy_siginfo(struct siginfo *to, struct siginfo *from)
				267	282 {
				268	283 if (from->si_code < 0)
				269	284 memcpy(to, from, sizeof(*to));
				270	285 else
				271	286 /* _sigchld is currently the largest know union member */
				272	287 memcpy(to, from, __ARCH_SI_PREAMBLE_SIZE + sizeof(from->_sifields._sigchld));
				273	288 }
				274
				275	Since this was a read (kmemcheck usually warns about reads only, though it can
				276	warn about writes to unallocated or freed memory as well), it was probably the
				277	"from" argument which contained some uninitialized bytes. Following the chain
				278	of calls, we move upwards to see where "from" was allocated or initialized,
				279	``kernel/signal.c``, line 380::
				280
				281	359 static void collect_signal(int sig, struct sigpending *list, siginfo_t *info)
				282	360 {
				283	...
				284	367 list_for_each_entry(q, &list->list, list) {
				285	368 if (q->info.si_signo == sig) {
				286	369 if (first)
				287	370 goto still_pending;
				288	371 first = q;
				289	...
				290	377 if (first) {
				291	378 still_pending:
				292	379 list_del_init(&first->list);
				293	380 copy_siginfo(info, &first->info);
				294	381 __sigqueue_free(first);
				295	...
				296	392 }
				297	393 }
				298
				299	Here, it is ``&first->info`` that is being passed on to ``copy_siginfo()``. The
				300	variable ``first`` was found on a list -- passed in as the second argument to
				301	``collect_signal()``. We continue our journey through the stack, to figure out
				302	where the item on "list" was allocated or initialized. We move to line 410::
				303
				304	395 static int __dequeue_signal(struct sigpending *pending, sigset_t *mask,
				305	396 siginfo_t *info)
				306	397 {
				307	...
				308	410 collect_signal(sig, pending, info);
				309	...
				310	414 }
				311
				312	Now we need to follow the ``pending`` pointer, since that is being passed on to
				313	``collect_signal()`` as ``list``. At this point, we've run out of lines from the
				314	"addr2line" output. Not to worry, we just paste the next addresses from the
				315	kmemcheck stack dump, i.e.::
				316
				317	[<ffffffff8104f04e>] dequeue_signal+0x8e/0x170
				318	[<ffffffff81050bd8>] get_signal_to_deliver+0x98/0x390
				319	[<ffffffff8100b87d>] do_notify_resume+0xad/0x7d0
				320	[<ffffffff8100c7b5>] int_signal+0x12/0x17
				321
				322	$ addr2line -e vmlinux -i ffffffff8104f04e ffffffff81050bd8 \
				323	ffffffff8100b87d ffffffff8100c7b5
				324	kernel/signal.c:446
				325	kernel/signal.c:1806
				326	arch/x86/kernel/signal.c:805
				327	arch/x86/kernel/signal.c:871
				328	arch/x86/kernel/entry_64.S:694
				329
				330	Remember that since these addresses were found on the stack and not as the
				331	RIP value, they actually point to the _next_ instruction (they are return
				332	addresses). This becomes obvious when we look at the code for line 446::
				333
				334	422 int dequeue_signal(struct task_struct *tsk, sigset_t *mask, siginfo_t *info)
				335	423 {
				336	...
				337	431 signr = __dequeue_signal(&tsk->signal->shared_pending,
				338	432 mask, info);
				339	433 /*
				340	434 * itimer signal ?
				341	435 *
				342	436 * itimers are process shared and we restart periodic
				343	437 * itimers in the signal delivery path to prevent DoS
				344	438 * attacks in the high resolution timer case. This is
				345	439 * compliant with the old way of self restarting
				346	440 * itimers, as the SIGALRM is a legacy signal and only
				347	441 * queued once. Changing the restart behaviour to
				348	442 * restart the timer in the signal dequeue path is
				349	443 * reducing the timer noise on heavy loaded !highres
				350	444 * systems too.
				351	445 */
				352	446 if (unlikely(signr == SIGALRM)) {
				353	...
				354	489 }
				355
				356	So instead of looking at 446, we should be looking at 431, which is the line
				357	that executes just before 446. Here we see that what we are looking for is
				358	``&tsk->signal->shared_pending``.
				359
				360	Our next task is now to figure out which function puts items on this
				361	``shared_pending`` list. A crude but efficient tool is ``git grep``::
				362
				363	$ git grep -n 'shared_pending' kernel/
				364	...
				365	kernel/signal.c:828: pending = group ? &t->signal->shared_pending : &t->pending;
				366	kernel/signal.c:1339: pending = group ? &t->signal->shared_pending : &t->pending;
				367	...
				368
				369	There were more results, but none of them were related to list operations,
				370	and these were the only assignments. We inspect the line numbers more closely
				371	and find that this is indeed where items are being added to the list::
				372
				373	816 static int send_signal(int sig, struct siginfo *info, struct task_struct *t,
				374	817 int group)
				375	818 {
				376	...
				377	828 pending = group ? &t->signal->shared_pending : &t->pending;
				378	...
				379	851 q = __sigqueue_alloc(t, GFP_ATOMIC, (sig < SIGRTMIN &&
				380	852 (is_si_special(info) ||
				381	853 info->si_code >= 0)));
				382	854 if (q) {
				383	855 list_add_tail(&q->list, &pending->list);
				384	...
				385	890 }
				386
				387	and::
				388
				389	1309 int send_sigqueue(struct sigqueue *q, struct task_struct *t, int group)
				390	1310 {
				391	....
				392	1339 pending = group ? &t->signal->shared_pending : &t->pending;
				393	1340 list_add_tail(&q->list, &pending->list);
				394	....
				395	1347 }
				396
				397	In the first case, the list element we are looking for, ``q``, is being
				398	returned from the function ``__sigqueue_alloc()``, which looks like an
				399	allocation function. Let's take a look at it::
				400
				401	187 static struct sigqueue *__sigqueue_alloc(struct task_struct *t, gfp_t flags,
				402	188 int override_rlimit)
				403	189 {
				404	190 struct sigqueue *q = NULL;
				405	191 struct user_struct *user;
				406	192
				407	193 /*
				408	194 * We won't get problems with the target's UID changing under us
				409	195 * because changing it requires RCU be used, and if t != current, the
				410	196 * caller must be holding the RCU readlock (by way of a spinlock) and
				411	197 * we use RCU protection here
				412	198 */
				413	199 user = get_uid(__task_cred(t)->user);
				414	200 atomic_inc(&user->sigpending);
				415	201 if (override_rlimit ||
				416	202 atomic_read(&user->sigpending) <=
				417	203 t->signal->rlim[RLIMIT_SIGPENDING].rlim_cur)
				418	204 q = kmem_cache_alloc(sigqueue_cachep, flags);
				419	205 if (unlikely(q == NULL)) {
				420	206 atomic_dec(&user->sigpending);
				421	207 free_uid(user);
				422	208 } else {
				423	209 INIT_LIST_HEAD(&q->list);
				424	210 q->flags = 0;
				425	211 q->user = user;
				426	212 }
				427	213
				428	214 return q;
				429	215 }
				430
				431	We see that this function initializes ``q->list``, ``q->flags``, and
				432	``q->user``. It seems that now is the time to look at the definition of
				433	``struct sigqueue``, e.g.::
				434
				435	14 struct sigqueue {
				436	15 struct list_head list;
				437	16 int flags;
				438	17 siginfo_t info;
				439	18 struct user_struct *user;
				440	19 };
				441
				442	And, you might remember, it was a ``memcpy()`` on ``&first->info`` that
				443	caused the warning, so this makes perfect sense. It also seems reasonable
				444	to assume that it is the caller of ``__sigqueue_alloc()`` that has the
				445	responsibility of filling out (initializing) this member.
				446
				447	But just which fields of the struct were uninitialized? Let's look at
				448	kmemcheck's report again::
				449
				450	WARNING: kmemcheck: Caught 32-bit read from uninitialized memory (ffff88003e4a2024)
				451	80000000000000000000000000000000000000000088ffff0000000000000000
				452	 i i i i u u u u i i i i i i i i u u u u u u u u u u u u u u u u
				453	         ^
				454
				455	These first two lines are the memory dump of the memory object itself, and
				456	the shadow bytemap, respectively. The memory object itself is in this case
				457	``&first->info``. Just beware that the start of this dump is NOT the start
				458	of the object itself! The position of the caret (^) corresponds with the
				459	address of the read (ffff88003e4a2024).
				460
				461	The shadow bytemap dump legend is as follows:
				462
				463	- i: initialized
				464	- u: uninitialized
				465	- a: unallocated (memory has been allocated by the slab layer, but has not
				466	yet been handed off to anybody)
				467	- f: freed (memory has been allocated by the slab layer, but has been freed
				468	by the previous owner)
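
The legend above can be expressed as a small user-space sketch. The enum values, ``shadow_letter()``, ``shadow_render()`` and ``first_not_initialized()`` are hypothetical names for illustration only; kmemcheck's own encoding of these states is not shown in this document.

```c
#include <assert.h>
#include <stddef.h>

/* The four states from the legend; the numeric values are arbitrary here. */
enum shadow_state {
	SHADOW_UNALLOCATED,
	SHADOW_UNINITIALIZED,
	SHADOW_INITIALIZED,
	SHADOW_FREED
};

/* Map a state to its legend letter. */
static char shadow_letter(enum shadow_state s)
{
	switch (s) {
	case SHADOW_INITIALIZED:   return 'i';
	case SHADOW_UNINITIALIZED: return 'u';
	case SHADOW_UNALLOCATED:   return 'a';
	case SHADOW_FREED:         return 'f';
	}
	return '?';
}

/*
 * Render n states as " i i u ...": a space followed by the letter, so each
 * byte occupies two columns, like the two hex digits of a dumped byte.
 * 'out' must have room for 2 * n characters plus the terminating NUL.
 */
static void shadow_render(const enum shadow_state *shadow, size_t n,
			  char *out, size_t out_len)
{
	size_t i;

	assert(out_len >= 2 * n + 1);
	for (i = 0; i < n; i++) {
		out[2 * i] = ' ';
		out[2 * i + 1] = shadow_letter(shadow[i]);
	}
	out[2 * n] = '\0';
}

/* Index of the first byte that is not initialized, or n if there is none. */
static size_t first_not_initialized(const enum shadow_state *shadow, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (shadow[i] != SHADOW_INITIALIZED)
			break;
	return i;
}
```

For a shadow that starts with four initialized bytes followed by four uninitialized ones, ``first_not_initialized()`` returns 4, which is the byte the caret marks in the report.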
				469
				470	In order to figure out where (relative to the start of the object) the
				471	uninitialized memory was located, we have to look at the disassembly. For
				472	that, we'll need the RIP address again::
				473
				474	RIP: 0010:[<ffffffff8104ede8>] [<ffffffff8104ede8>] __dequeue_signal+0xc8/0x190
				475
				476	$ objdump -d --no-show-raw-insn vmlinux | grep -C 8 ffffffff8104ede8:
				477	ffffffff8104edc8: mov %r8,0x8(%r8)
				478	ffffffff8104edcc: test %r10d,%r10d
				479	ffffffff8104edcf: js ffffffff8104ee88 <__dequeue_signal+0x168>
				480	ffffffff8104edd5: mov %rax,%rdx
				481	ffffffff8104edd8: mov $0xc,%ecx
				482	ffffffff8104eddd: mov %r13,%rdi
				483	ffffffff8104ede0: mov $0x30,%eax
				484	ffffffff8104ede5: mov %rdx,%rsi
				485	ffffffff8104ede8: rep movsl %ds:(%rsi),%es:(%rdi)
				486	ffffffff8104edea: test $0x2,%al
				487	ffffffff8104edec: je ffffffff8104edf0 <__dequeue_signal+0xd0>
				488	ffffffff8104edee: movsw %ds:(%rsi),%es:(%rdi)
				489	ffffffff8104edf0: test $0x1,%al
				490	ffffffff8104edf2: je ffffffff8104edf5 <__dequeue_signal+0xd5>
				491	ffffffff8104edf4: movsb %ds:(%rsi),%es:(%rdi)
				492	ffffffff8104edf5: mov %r8,%rdi
				493	ffffffff8104edf8: callq ffffffff8104de60 <__sigqueue_free>
				494
				495	As expected, it's the "``rep movsl``" instruction from the ``memcpy()``
				496	that causes the warning. The ``REP MOVSL`` instruction uses the register
				497	``RCX`` to count the number of remaining iterations. By taking a look at the
				498	register dump again (from the kmemcheck report), we can figure out how many
				499	bytes were left to copy::
				500
				501	RAX: 0000000000000030 RBX: ffff88003d4ea968 RCX: 0000000000000009
				502
				503	By looking at the disassembly, we also see that ``%ecx`` is being loaded
				504	with the value ``$0xc`` just before (ffffffff8104edd8), so we are very
				505	lucky. Keep in mind that this is the number of iterations, not bytes. And
				506	since this is a "long" operation, we need to multiply by 4 to get the
				507	number of bytes. So this means that the uninitialized value was encountered
				508	at 4 * (0xc - 0x9) = 12 bytes from the start of the object.
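
The arithmetic above can be written down as a one-line helper. This is only a sketch of the calculation; ``rep_movs_offset()`` is a hypothetical name, and the values passed to it are the ones read from the disassembly and the register dump.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Offset (in bytes, from the start of the copy) of the element a REP MOVS
 * instruction was about to copy when it was interrupted.
 *
 * initial_count:   value loaded into the count register before the copy
 * remaining_count: value of the count register in the report
 * element_size:    bytes moved per iteration (4 for MOVSL)
 */
static size_t rep_movs_offset(size_t initial_count, size_t remaining_count,
			      size_t element_size)
{
	assert(remaining_count <= initial_count);
	return (initial_count - remaining_count) * element_size;
}
```

With the values from this report, ``rep_movs_offset(0xc, 0x9, 4)`` evaluates to 12.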
				509
				510	We can now try to figure out which field of the "``struct siginfo``" was
				511	not initialized. This is the beginning of the struct::
				512
				513	40 typedef struct siginfo {
				514	41 int si_signo;
				515	42 int si_errno;
				516	43 int si_code;
				517	44
				518	45 union {
				519	..
				520	92 } _sifields;
				521	93 } siginfo_t;
				522
				523	On 64-bit, an int is 4 bytes long, so the three int members end at offset 12 and
				524	it must be the union member that has not been initialized. We can verify this using gdb::
				525
				526	$ gdb vmlinux
				527	...
				528	(gdb) p &((struct siginfo *) 0)->_sifields
				529	$1 = (union {...} *) 0x10
				530
				531	Actually, it seems that the union member is located at offset 0x10 -- which
				532	means that gcc has inserted 4 bytes of padding between the members ``si_code``
				533	and ``_sifields``. We can now get a fuller picture of the memory dump::
				534
				535	        _----------------------------=> si_code
				536	       /        _--------------------=> (padding)
				537	       |       /        _------------=> _sifields(._kill._pid)
				538	       |       |       /        _----=> _sifields(._kill._uid)
				539	       |       |       |       /
				540	-------|-------|-------|-------|
				541	80000000000000000000000000000000000000000088ffff0000000000000000
				542	 i i i i u u u u i i i i i i i i u u u u u u u u u u u u u u u u
				543
				544	This allows us to realize another important fact: ``si_code`` contains the
				545	value 0x80. Remember that x86 is little endian, so the first 4 bytes
				546	"80000000" are really the number 0x00000080. With a bit of research, we
				547	find that this is actually the constant ``SI_KERNEL`` defined in
				548	``include/asm-generic/siginfo.h``::
				549
				550	144 #define SI_KERNEL 0x80 /* sent by the kernel from somewhere */
				551
				552	This macro is used in exactly one place in the x86 kernel: in ``send_signal()``
				553	in ``kernel/signal.c``::
				554
				555	816 static int send_signal(int sig, struct siginfo *info, struct task_struct *t,
				556	817 int group)
				557	818 {
				558	...
				559	828 pending = group ? &t->signal->shared_pending : &t->pending;
				560	...
				561	851 q = __sigqueue_alloc(t, GFP_ATOMIC, (sig < SIGRTMIN &&
				562	852 (is_si_special(info) ||
				563	853 info->si_code >= 0)));
				564	854 if (q) {
				565	855 list_add_tail(&q->list, &pending->list);
				566	856 switch ((unsigned long) info) {
				567	...
				568	865 case (unsigned long) SEND_SIG_PRIV:
				569	866 q->info.si_signo = sig;
				570	867 q->info.si_errno = 0;
				571	868 q->info.si_code = SI_KERNEL;
				572	869 q->info.si_pid = 0;
				573	870 q->info.si_uid = 0;
				574	871 break;
				575	...
				576	890 }
				577
				578	Not only does this match with the ``.si_code`` member, it also matches the place
				579	we found earlier when looking for where siginfo_t objects are enqueued on the
				580	``shared_pending`` list.
				581
				582	So to sum up: It seems that it is the padding introduced by the compiler
				583	between two struct fields that is uninitialized, and this gets reported when
				584	we do a ``memcpy()`` on the struct. This means that we have identified a false
				585	positive warning.
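
The compiler-inserted padding identified above can be reproduced in user space with ``offsetof()``. The struct below is a simplified, hypothetical stand-in for ``struct siginfo`` (three ints followed by a union that contains a pointer), not the kernel's definition, and ``padding_before_union()`` is a name chosen for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in: three ints followed by a union with a pointer member. */
struct demo_siginfo {
	int si_signo;
	int si_errno;
	int si_code;
	union {
		struct { int pid; unsigned int uid; } kill;
		struct { void *addr; } fault;	/* forces pointer alignment */
	} fields;
};

/* Number of padding bytes the compiler placed between si_code and the union. */
static size_t padding_before_union(void)
{
	size_t end_of_si_code = offsetof(struct demo_siginfo, si_code) + sizeof(int);

	assert(offsetof(struct demo_siginfo, fields) >= end_of_si_code);
	return offsetof(struct demo_siginfo, fields) - end_of_si_code;
}
```

On a platform with 4-byte ints and 8-byte aligned pointers, the union starts at offset 16 and ``padding_before_union()`` returns 4. Code that assigns the members one by one never writes those 4 bytes, yet a ``memcpy()`` of the whole struct reads them.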
				586
				587	Normally, kmemcheck will not report uninitialized accesses in ``memcpy()`` calls
				588	when both the source and destination addresses are tracked. (Instead, we copy
				589	the shadow bytemap as well). In this case, the destination address clearly
				590	was not tracked. We can dig a little deeper into the stack trace from above::
				591
				592	arch/x86/kernel/signal.c:805
				593	arch/x86/kernel/signal.c:871
				594	arch/x86/kernel/entry_64.S:694
				595
				596	And we clearly see that the destination siginfo object is located on the
				597	stack::
				598
				599	782 static void do_signal(struct pt_regs *regs)
				600	783 {
				601	784 struct k_sigaction ka;
				602	785 siginfo_t info;
				603	...
				604	804 signr = get_signal_to_deliver(&info, &ka, regs, NULL);
				605	...
				606	854 }
				607
				608	And this ``&info`` is what eventually gets passed to ``copy_siginfo()`` as the
				609	destination argument.
				610
				611	Now, even though we didn't find an actual error here, the example is still a
				612	good one, because it shows how one would go about finding out what the report
				613	was all about.
				614
				615
				616	Annotating false positives
				617	~~~~~~~~~~~~~~~~~~~~~~~~~~
				618
				619	There are a few different ways to make annotations in the source code that
				620	will keep kmemcheck from checking and reporting certain allocations. Here
				621	they are:
				622
				623	- ``__GFP_NOTRACK_FALSE_POSITIVE``
				624	This flag can be passed to ``kmalloc()`` or ``kmem_cache_alloc()``
				625	(therefore also to other functions that end up calling one of
				626	these) to indicate that the allocation should not be tracked
				627	because it would lead to a false positive report. This is a "big
				628	hammer" way of silencing kmemcheck; after all, even if the false
				629	positive pertains to a particular field in a struct, for example, we
				630	will now lose the ability to find (real) errors in other parts of
				631	the same struct.
				632
				633	Example::
				634
				635	/* No warnings will ever trigger on accessing any part of x */
				636	x = kmalloc(sizeof *x, GFP_KERNEL | __GFP_NOTRACK_FALSE_POSITIVE);
				637
				638	- ``kmemcheck_bitfield_begin(name)``/``kmemcheck_bitfield_end(name)`` and
				639	``kmemcheck_annotate_bitfield(ptr, name)``
				640	The first two of these three macros can be used inside struct
				641	definitions to signal, respectively, the beginning and end of a
				642	bitfield. Additionally, this will assign the bitfield a name, which
				643	is given as an argument to the macros.
				644
				645	Having used these markers, one can later use
				646	kmemcheck_annotate_bitfield() at the point of allocation, to indicate
				647	which parts of the allocation are part of a bitfield.
				648
				649	Example::
				650
				651	struct foo {
				652	int x;
				653
				654	kmemcheck_bitfield_begin(flags);
				655	int flag_a:1;
				656	int flag_b:1;
				657	kmemcheck_bitfield_end(flags);
				658
				659	int y;
				660	};
				661
				662	struct foo *x = kmalloc(sizeof *x, GFP_KERNEL);
				663
				664	/* No warnings will trigger on accessing the bitfield of x */
				665	kmemcheck_annotate_bitfield(x, flags);
				666
				667	Note that ``kmemcheck_annotate_bitfield()`` can be used even before the
				668	return value of ``kmalloc()`` is checked -- in other words, passing NULL
				669	as the first argument is legal (and will do nothing).
				670
				671
				672	Reporting errors
				673	----------------
				674
				675	As we have seen, kmemcheck will produce false positive reports. Therefore, it
				676	is not very wise to blindly post kmemcheck warnings to mailing lists and
				677	maintainers. Instead, I encourage maintainers and developers to find errors
				678	in their own code. If you get a warning, you can try to work around it, try
				679	to figure out if it's a real error or not, or simply ignore it. Most
				680	developers know their own code and will quickly and efficiently determine the
				681	root cause of a kmemcheck report. This is therefore also the most efficient
				682	way to work with kmemcheck.
				683
				684	That said, we (the kmemcheck maintainers) will always be on the lookout for
				685	false positives that we can annotate and silence. So whatever you find,
				686	please drop us a note privately! Kernel configs and steps to reproduce (if
				687	available) are of course a great help too.
				688
				689	Happy hacking!
				690
				691
				692	Technical description
				693	---------------------
				694
				695	kmemcheck works by marking memory pages non-present. This means that whenever
				696	somebody attempts to access the page, a page fault is generated. The page
				697	fault handler notices that the page was in fact only hidden, and so it calls
				698	on the kmemcheck code to make further investigations.
				699
				700	When the investigations are completed, kmemcheck "shows" the page by marking
				701	it present (as it would be under normal circumstances). This way, the
				702	interrupted code can continue as usual.
				703
				704	But after the instruction has been executed, we should hide the page again, so
				705	that we can catch the next access too! Now kmemcheck makes use of a debugging
				706	feature of the processor, namely single-stepping. When the processor has
				707	finished the one instruction that generated the memory access, a debug
				708	exception is raised. From here, we simply hide the page again and continue
				709	execution, this time with the single-stepping feature turned off.
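
The hide, fault, show, single-step, hide cycle described above can be modeled as a tiny state machine. This is a user-space sketch under simplifying assumptions (one page, no real page tables or exceptions); ``struct tracked_page`` and the three functions are hypothetical names, not kmemcheck's.

```c
#include <assert.h>
#include <stdbool.h>

struct tracked_page {
	bool present;		/* models the "present" bit of the page */
	bool single_step;	/* models the processor's single-step mode */
	unsigned long faults;	/* accesses intercepted so far */
};

/* Page fault on a hidden page: inspect the access (elided), then show it. */
static void on_page_fault(struct tracked_page *p)
{
	assert(!p->present);
	p->faults++;
	p->present = true;	/* let the interrupted instruction run */
	p->single_step = true;	/* but stop again right after it */
}

/* Debug exception after that one instruction: hide the page again. */
static void on_debug_exception(struct tracked_page *p)
{
	assert(p->single_step);
	p->present = false;
	p->single_step = false;
}

/* One memory access performed by the interrupted code. */
static void access_page(struct tracked_page *p)
{
	if (!p->present)
		on_page_fault(p);
	/* the instruction executes here, against the now visible page */
	if (p->single_step)
		on_debug_exception(p);
}
```

Because the page is hidden again after every access, each call to ``access_page()`` takes one fault. That is the source of the run-time overhead mentioned in the introduction.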
				710
				711	kmemcheck requires some assistance from the memory allocator in order to work.
				712	The memory allocator needs to
				713
				714	1. Tell kmemcheck about newly allocated pages and pages that are about to
				715	be freed. This allows kmemcheck to set up and tear down the shadow memory
				716	for the pages in question. The shadow memory stores the status of each
				717	byte in the allocation proper, e.g. whether it is initialized or
				718	uninitialized.
				719
				720	2. Tell kmemcheck which parts of memory should be marked uninitialized.
				721	There are actually a few more states, such as "not yet allocated" and
				722	"recently freed".
				723
				724	If a slab cache is set up using the SLAB_NOTRACK flag, it will never return
				725	memory that can take page faults because of kmemcheck.
				726
				727	If a slab cache is NOT set up using the SLAB_NOTRACK flag, callers can still
				728	request memory with the __GFP_NOTRACK or __GFP_NOTRACK_FALSE_POSITIVE flags.
				729	This does not prevent the page faults from occurring, however, but marks the
				730	object in question as being initialized so that no warnings will ever be
				731	produced for this object.
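
The two paragraphs above amount to a three-way decision, sketched here in user-space C. The flag constants are hypothetical placeholders with made-up values (they are NOT the kernel's ``SLAB_NOTRACK`` or ``__GFP_NOTRACK`` definitions), and ``classify()`` is a name chosen for this sketch.

```c
#include <assert.h>

/* Placeholder flag values for this sketch only. */
#define DEMO_SLAB_NOTRACK	0x1u	/* stands in for SLAB_NOTRACK */
#define DEMO_GFP_NOTRACK	0x2u	/* stands in for __GFP_NOTRACK and
					 * __GFP_NOTRACK_FALSE_POSITIVE */

enum object_tracking {
	NEVER_FAULTS,		/* cache was set up with SLAB_NOTRACK */
	FAULTS_BUT_SILENT,	/* page faults occur, object is marked initialized */
	FULLY_TRACKED		/* page faults occur and warnings are possible */
};

/* Decide how an allocation is treated, following the text above. */
static enum object_tracking classify(unsigned int cache_flags,
				     unsigned int gfp_flags)
{
	assert((cache_flags & ~DEMO_SLAB_NOTRACK) == 0);
	assert((gfp_flags & ~DEMO_GFP_NOTRACK) == 0);

	if (cache_flags & DEMO_SLAB_NOTRACK)
		return NEVER_FAULTS;
	if (gfp_flags & DEMO_GFP_NOTRACK)
		return FAULTS_BUT_SILENT;
	return FULLY_TRACKED;
}
```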
				732
				733	Currently, the SLAB and SLUB allocators are supported by kmemcheck.