Blame - Documentation/RCU/lockdep-splat.txt - kernel/msm-4.9

blob: bf9061142827fca8d7f19c6cb013f322b47bc4e2 [file] [log] [blame]

Paul E. McKenney	d7bd2d6	2011-07-21 17:10:40 -0700	[diff] [blame]	1	Lockdep-RCU was added to the Linux kernel in early 2010
				2	(http://lwn.net/Articles/371986/). This facility checks for some common
				3	misuses of the RCU API, most notably using one of the rcu_dereference()
				4	family to access an RCU-protected pointer without the proper protection.
				5	When such misuse is detected, an lockdep-RCU splat is emitted.
				6
				7	The usual cause of a lockdep-RCU slat is someone accessing an
				8	RCU-protected data structure without either (1) being in the right kind of
				9	RCU read-side critical section or (2) holding the right update-side lock.
				10	This problem can therefore be serious: it might result in random memory
				11	overwriting or worse. There can of course be false positives, this
				12	being the real world and all that.
				13
				14	So let's look at an example RCU lockdep splat from 3.0-rc5, one that
				15	has long since been fixed:
				16
				17	===============================
				18	[ INFO: suspicious RCU usage. ]
				19	-------------------------------
				20	block/cfq-iosched.c:2776 suspicious rcu_dereference_protected() usage!
				21
				22	other info that might help us debug this:
				23
				24
				25	rcu_scheduler_active = 1, debug_locks = 0
				26	3 locks held by scsi_scan_6/1552:
				27	#0: (&shost->scan_mutex){+.+.+.}, at: [<ffffffff8145efca>]
				28	scsi_scan_host_selected+0x5a/0x150
				29	#1: (&eq->sysfs_lock){+.+...}, at: [<ffffffff812a5032>]
				30	elevator_exit+0x22/0x60
				31	#2: (&(&q->__queue_lock)->rlock){-.-...}, at: [<ffffffff812b6233>]
				32	cfq_exit_queue+0x43/0x190
				33
				34	stack backtrace:
				35	Pid: 1552, comm: scsi_scan_6 Not tainted 3.0.0-rc5 #17
				36	Call Trace:
				37	[<ffffffff810abb9b>] lockdep_rcu_dereference+0xbb/0xc0
				38	[<ffffffff812b6139>] __cfq_exit_single_io_context+0xe9/0x120
				39	[<ffffffff812b626c>] cfq_exit_queue+0x7c/0x190
				40	[<ffffffff812a5046>] elevator_exit+0x36/0x60
				41	[<ffffffff812a802a>] blk_cleanup_queue+0x4a/0x60
				42	[<ffffffff8145cc09>] scsi_free_queue+0x9/0x10
				43	[<ffffffff81460944>] __scsi_remove_device+0x84/0xd0
				44	[<ffffffff8145dca3>] scsi_probe_and_add_lun+0x353/0xb10
				45	[<ffffffff817da069>] ? error_exit+0x29/0xb0
				46	[<ffffffff817d98ed>] ? _raw_spin_unlock_irqrestore+0x3d/0x80
				47	[<ffffffff8145e722>] __scsi_scan_target+0x112/0x680
				48	[<ffffffff812c690d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
				49	[<ffffffff817da069>] ? error_exit+0x29/0xb0
				50	[<ffffffff812bcc60>] ? kobject_del+0x40/0x40
				51	[<ffffffff8145ed16>] scsi_scan_channel+0x86/0xb0
				52	[<ffffffff8145f0b0>] scsi_scan_host_selected+0x140/0x150
				53	[<ffffffff8145f149>] do_scsi_scan_host+0x89/0x90
				54	[<ffffffff8145f170>] do_scan_async+0x20/0x160
				55	[<ffffffff8145f150>] ? do_scsi_scan_host+0x90/0x90
				56	[<ffffffff810975b6>] kthread+0xa6/0xb0
				57	[<ffffffff817db154>] kernel_thread_helper+0x4/0x10
				58	[<ffffffff81066430>] ? finish_task_switch+0x80/0x110
				59	[<ffffffff817d9c04>] ? retint_restore_args+0xe/0xe
				60	[<ffffffff81097510>] ? __init_kthread_worker+0x70/0x70
				61	[<ffffffff817db150>] ? gs_change+0xb/0xb
				62
				63	Line 2776 of block/cfq-iosched.c in v3.0-rc5 is as follows:
				64
				65	if (rcu_dereference(ioc->ioc_data) == cic) {
				66
				67	This form says that it must be in a plain vanilla RCU read-side critical
				68	section, but the "other info" list above shows that this is not the
				69	case. Instead, we hold three locks, one of which might be RCU related.
				70	And maybe that lock really does protect this reference. If so, the fix
				71	is to inform RCU, perhaps by changing __cfq_exit_single_io_context() to
				72	take the struct request_queue "q" from cfq_exit_queue() as an argument,
				73	which would permit us to invoke rcu_dereference_protected as follows:
				74
				75	if (rcu_dereference_protected(ioc->ioc_data,
				76	lockdep_is_held(&q->queue_lock)) == cic) {
				77
				78	With this change, there would be no lockdep-RCU splat emitted if this
				79	code was invoked either from within an RCU read-side critical section
				80	or with the ->queue_lock held. In particular, this would have suppressed
				81	the above lockdep-RCU splat because ->queue_lock is held (see #2 in the
				82	list above).
				83
				84	On the other hand, perhaps we really do need an RCU read-side critical
				85	section. In this case, the critical section must span the use of the
				86	return value from rcu_dereference(), or at least until there is some
				87	reference count incremented or some such. One way to handle this is to
				88	add rcu_read_lock() and rcu_read_unlock() as follows:
				89
				90	rcu_read_lock();
				91	if (rcu_dereference(ioc->ioc_data) == cic) {
				92	spin_lock(&ioc->lock);
				93	rcu_assign_pointer(ioc->ioc_data, NULL);
				94	spin_unlock(&ioc->lock);
				95	}
				96	rcu_read_unlock();
				97
				98	With this change, the rcu_dereference() is always within an RCU
				99	read-side critical section, which again would have suppressed the
				100	above lockdep-RCU splat.
				101
				102	But in this particular case, we don't actually deference the pointer
				103	returned from rcu_dereference(). Instead, that pointer is just compared
				104	to the cic pointer, which means that the rcu_dereference() can be replaced
				105	by rcu_access_pointer() as follows:
				106
				107	if (rcu_access_pointer(ioc->ioc_data) == cic) {
				108
				109	Because it is legal to invoke rcu_access_pointer() without protection,
				110	this change would also suppress the above lockdep-RCU splat.