The following diagram shows how a filesystem operation (in this
example unlink) is performed in FUSE.

NOTE: everything in this description is greatly simplified

 |  "rm /mnt/fuse/file"               |  FUSE filesystem daemon
 |                                    |
 |                                    |  >sys_read()
 |                                    |    >fuse_dev_read()
 |                                    |      >request_wait()
 |                                    |        [sleep on fc->waitq]
 |                                    |
 |  >sys_unlink()                     |
 |    >fuse_unlink()                  |
 |      [get request from             |
 |       fc->unused_list]             |
 |      >request_send()               |
 |        [queue req on fc->pending]  |
 |        [wake up fc->waitq]         |        [woken up]
 |        >request_wait_answer()      |
 |          [sleep on req->waitq]     |
 |                                    |      <request_wait()
 |                                    |      [remove req from fc->pending]
 |                                    |      [copy req to read buffer]
 |                                    |      [add req to fc->processing]
 |                                    |    <fuse_dev_read()
 |                                    |  <sys_read()
 |                                    |
 |                                    |  [perform unlink]
 |                                    |
 |                                    |  >sys_write()
 |                                    |    >fuse_dev_write()
 |                                    |      [look up req in fc->processing]
 |                                    |      [remove from fc->processing]
 |                                    |      [copy write buffer to req]
 |          [woken up]                |      [wake up req->waitq]
 |                                    |    <fuse_dev_write()
 |                                    |  <sys_write()
 |        <request_wait_answer()      |
 |      <request_send()               |
 |      [add request to               |
 |       fc->unused_list]             |
 |    <fuse_unlink()                  |
 |  <sys_unlink()                     |
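
The list transitions in the diagram above can be modelled in a few
lines of C.  This is only a sketch under stated assumptions: the
names model_req, model_conn, MODEL_OP_UNLINK and the model_*
functions are invented for illustration and are not kernel
interfaces.  There is no locking or sleeping, and a fixed array
searched in index order stands in for fc->unused_list, fc->pending
and fc->processing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model, not the kernel's structures: one state per list. */
enum model_state {
    MODEL_UNUSED,      /* on fc->unused_list */
    MODEL_PENDING,     /* on fc->pending */
    MODEL_PROCESSING,  /* on fc->processing */
    MODEL_FINISHED     /* answered, requester not yet resumed */
};

#define MODEL_OP_UNLINK 1  /* arbitrary opcode value for the example */
#define MODEL_NREQ 4

struct model_req { enum model_state state; int opcode; int error; };
struct model_conn { struct model_req reqs[MODEL_NREQ]; };

/* Return the first request in the given state, or NULL if there is none. */
static struct model_req *model_find(struct model_conn *fc, enum model_state st)
{
    for (size_t i = 0; i < MODEL_NREQ; i++)
        if (fc->reqs[i].state == st)
            return &fc->reqs[i];
    return NULL;
}

/* Requester: take an unused request and queue it as pending. */
static struct model_req *model_request_send(struct model_conn *fc, int opcode)
{
    struct model_req *req = model_find(fc, MODEL_UNUSED);
    if (req != NULL) {
        req->opcode = opcode;
        req->error = 0;
        req->state = MODEL_PENDING;  /* real code also wakes fc->waitq */
    }
    return req;
}

/* Daemon read(): move a pending request to processing.
 * NULL stands for "would sleep on fc->waitq". */
static struct model_req *model_dev_read(struct model_conn *fc)
{
    struct model_req *req = model_find(fc, MODEL_PENDING);
    if (req != NULL)
        req->state = MODEL_PROCESSING;
    return req;
}

/* Daemon write(): store the reply; real code then wakes req->waitq.
 * Returns -1 if the request is not currently being processed. */
static int model_dev_write(struct model_req *req, int error)
{
    if (req->state != MODEL_PROCESSING)
        return -1;
    req->error = error;
    req->state = MODEL_FINISHED;
    return 0;
}

/* Requester: pick up the reply and return the request to the unused pool. */
static int model_request_end(struct model_req *req)
{
    assert(req->state == MODEL_FINISHED);
    req->state = MODEL_UNUSED;
    return req->error;
}
```

Calling model_request_send(), model_dev_read(), model_dev_write()
and model_request_end() in that order walks one request through the
same sequence of lists as the diagram.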

There are a couple of ways in which to deadlock a FUSE filesystem.
Since we are talking about unprivileged userspace programs,
something must be done about these.

Scenario 1 - Simple deadlock
----------------------------

 |  "rm /mnt/fuse/file"               |  FUSE filesystem daemon
 |                                    |
 |  >sys_unlink("/mnt/fuse/file")     |
 |    [acquire inode semaphore        |
 |     for "file"]                    |
 |    >fuse_unlink()                  |
 |      [sleep on req->waitq]         |
 |                                    |  <sys_read()
 |                                    |  >sys_unlink("/mnt/fuse/file")
 |                                    |    [acquire inode semaphore
 |                                    |     for "file"]
 |                                    |    DEADLOCK

The solution for this is to allow requests to be interrupted while
they are in userspace:

 |      [interrupted by signal]       |
 |    <fuse_unlink()                  |
 |    [release semaphore]             |    [semaphore acquired]
 |  <sys_unlink()                     |
 |                                    |    >fuse_unlink()
 |                                    |      [queue req on fc->pending]
 |                                    |      [wake up fc->waitq]
 |                                    |      [sleep on req->waitq]

If the filesystem daemon is single-threaded, this will stop here,
since there's no other thread to dequeue and execute the request.
In this case the solution is to kill the FUSE daemon as well.  If
there are multiple serving threads, you just have to kill them as
long as any remain.

Moral: a filesystem which deadlocks can soon find itself dead.
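
Scenario 1 and its fix can be replayed in a single-threaded toy
model.  This is a sketch, not kernel code: model_sem, model_waiter
and the model_* functions are hypothetical names, the semaphore is
a plain flag, and "sleeping" is represented by a return value
instead of a blocked thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the inode semaphore. */
struct model_sem { bool held; };

/* Non-blocking acquire: false means a real down() would sleep here. */
static bool model_sem_try_down(struct model_sem *sem)
{
    if (sem->held)
        return false;
    sem->held = true;
    return true;
}

static void model_sem_up(struct model_sem *sem)
{
    sem->held = false;
}

/* Hypothetical state of a requester sleeping on req->waitq. */
struct model_waiter { bool answered; bool signal_pending; };

/* One step of an interruptible wait: 0 if the reply has arrived,
 * -EINTR if a signal is pending, -EAGAIN if it would keep sleeping. */
static int model_wait_answer(const struct model_waiter *w)
{
    if (w->answered)
        return 0;
    if (w->signal_pending)
        return -EINTR;
    return -EAGAIN;
}

/* Replay scenario 1.  Returns true if the daemon's own unlink can
 * finally take the inode semaphore, false if it stays blocked. */
static bool model_scenario1(bool deliver_signal)
{
    struct model_sem inode_sem = { false };
    struct model_waiter rm = { false, false };

    /* "rm": sys_unlink() takes the semaphore, fuse_unlink() then waits. */
    if (!model_sem_try_down(&inode_sem))
        return false;  /* cannot happen: the semaphore starts free */

    /* Daemon: sys_unlink() on the same file finds the semaphore held. */
    assert(!model_sem_try_down(&inode_sem));

    /* Fix: a signal interrupts "rm", which backs out and releases. */
    rm.signal_pending = deliver_signal;
    if (model_wait_answer(&rm) == -EINTR)
        model_sem_up(&inode_sem);

    return model_sem_try_down(&inode_sem);
}
```

Without a signal the daemon's attempt never succeeds, which is the
deadlock; with a signal the semaphore is released and the daemon
proceeds to queue its own request, as in the second diagram.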

Scenario 2 - Tricky deadlock
----------------------------

This one needs a carefully crafted filesystem.  It's a variation on
the above, only the call back to the filesystem is not explicit,
but is caused by a pagefault.

 |  Kamikaze filesystem thread 1      |  Kamikaze filesystem thread 2
 |                                    |
 |  [fd = open("/mnt/fuse/file")]     |  [request served normally]
 |  [mmap fd to 'addr']               |
 |  [close fd]                        |  [FLUSH triggers 'magic' flag]
 |  [read a byte from addr]           |
 |    >do_page_fault()                |
 |      [find or create page]         |
 |      [lock page]                   |
 |      >fuse_readpage()              |
 |        [queue READ request]        |
 |        [sleep on req->waitq]       |
 |                                    |  [read request to buffer]
 |                                    |  [create reply header before addr]
 |                                    |  >sys_write(addr - headerlength)
 |                                    |    >fuse_dev_write()
 |                                    |      [look up req in fc->processing]
 |                                    |      [remove from fc->processing]
 |                                    |      [copy write buffer to req]
 |                                    |        >do_page_fault()
 |                                    |          [find or create page]
 |                                    |          [lock page]
 |                                    |          * DEADLOCK *

The solution is again to let the request be interrupted (not
elaborated further).

An additional problem is that while the write buffer is being
copied to the request, the request must not be interrupted.  This
is because the destination address of the copy may not be valid
after the request is interrupted.

This is solved by doing the copy atomically, and allowing
interruption while the page(s) belonging to the write buffer are
faulted in with get_user_pages().  The 'req->locked' flag indicates
when the copy is taking place, and interruption is delayed until
this flag is unset.
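
The deferred interruption described above can be illustrated with a
small state model.  This is a sketch under assumptions, not the
kernel implementation: struct lock_req, its 'interrupted' and
'aborted' fields and the model_* functions are hypothetical; only
the role of the 'locked' flag is taken from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical request with a 'locked' flag in the spirit of req->locked. */
struct lock_req {
    bool locked;       /* a copy into the request is in progress */
    bool interrupted;  /* an interrupt has been requested */
    bool aborted;      /* the interrupt has taken effect */
    char buf[16];      /* destination of the copy */
};

/* Request an interrupt.  It takes effect at once unless a copy is
 * running, in which case it stays pending until the flag is cleared. */
static void model_interrupt(struct lock_req *req)
{
    req->interrupted = true;
    if (!req->locked)
        req->aborted = true;
}

/* Mark the start of the copy.  Fails if the request is already aborted. */
static bool model_lock_request(struct lock_req *req)
{
    if (req->aborted)
        return false;
    req->locked = true;
    return true;
}

/* Mark the end of the copy and let a pending interrupt take effect. */
static void model_unlock_request(struct lock_req *req)
{
    req->locked = false;
    if (req->interrupted)
        req->aborted = true;
}

/* Copy a reply into the request while it is locked.
 * Returns false if the reply does not fit or the request was aborted. */
static bool model_copy_reply(struct lock_req *req, const char *src, size_t len)
{
    if (len > sizeof(req->buf) || !model_lock_request(req))
        return false;
    memcpy(req->buf, src, len);
    model_unlock_request(req);
    return true;
}
```

An interrupt that arrives between model_lock_request() and
model_unlock_request() leaves 'aborted' unset until the unlock, so
the destination of the copy stays valid for the whole copy.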