Blame - Documentation/filesystems/fuse.txt - kernel/msm

Miklos Szeredi	334f485	2005-09-09 13:10:27 -0700	[diff] [blame]	1	Definitions
				2	~~~~~~~~~~~
				3
				4	Userspace filesystem:
				5
				6	A filesystem in which data and metadata are provided by an ordinary
				7	userspace process. The filesystem can be accessed normally through
				8	the kernel interface.
				9
				10	Filesystem daemon:
				11
				12	The process(es) providing the data and metadata of the filesystem.
				13
				14	Non-privileged mount (or user mount):
				15
				16	A userspace filesystem mounted by a non-privileged (non-root) user.
				17	The filesystem daemon is running with the privileges of the mounting
				18	user. NOTE: this is not the same as mounts allowed with the "user"
				19	option in /etc/fstab, which is not discussed here.
				20
Miklos Szeredi	bafa965	2006-06-25 05:48:51 -0700	[diff] [blame^]	21	Filesystem connection:
				22
				23	A connection between the filesystem daemon and the kernel. The
				24	connection exists until either the daemon dies, or the filesystem is
				25	umounted. Note that detaching (or lazy umounting) the filesystem
				26	does _not_ break the connection; in this case it will exist until
				27	the last reference to the filesystem is released.
				28
Miklos Szeredi	334f485	2005-09-09 13:10:27 -0700	[diff] [blame]	29	Mount owner:
				30
				31	The user who does the mounting.
				32
				33	User:
				34
				35	The user who is performing filesystem operations.
				36
				37	What is FUSE?
				38	~~~~~~~~~~~~~
				39
				40	FUSE is a userspace filesystem framework. It consists of a kernel
				41	module (fuse.ko), a userspace library (libfuse.*) and a mount utility
				42	(fusermount).
				43
				44	One of the most important features of FUSE is allowing secure,
				45	non-privileged mounts. This opens up new possibilities for the use of
				46	filesystems. A good example is sshfs: a secure network filesystem
				47	using the sftp protocol.
				48
				49	The userspace library and utilities are available from the FUSE
				50	homepage:
				51
				52	http://fuse.sourceforge.net/
				53
				54	Mount options
				55	~~~~~~~~~~~~~
				56
				57	'fd=N'
				58
				59	The file descriptor to use for communication between the userspace
				60	filesystem and the kernel. The file descriptor must have been
				61	obtained by opening the FUSE device ('/dev/fuse').
				62
				63	'rootmode=M'
				64
				65	The file mode of the filesystem's root in octal representation.
				66
				67	'user_id=N'
				68
				69	The numeric user id of the mount owner.
				70
				71	'group_id=N'
				72
				73	The numeric group id of the mount owner.
				74
				75	'default_permissions'
				76
				77	By default FUSE doesn't check file access permissions; the
				78	filesystem is free to implement its access policy or leave it to
				79	the underlying file access mechanism (e.g. in case of network
				80	filesystems). This option enables permission checking, restricting
				81	access based on file mode. This option is usually useful
				82	together with the 'allow_other' mount option.
				83
				84	'allow_other'
				85
				86	This option overrides the security measure restricting file access
				87	to the user mounting the filesystem. This option is by default only
				88	allowed to root, but this restriction can be removed with a
				89	(userspace) configuration option.
				90
Miklos Szeredi	334f485	2005-09-09 13:10:27 -0700	[diff] [blame]	91	'max_read=N'
				92
				93	With this option the maximum size of read operations can be set.
				94	The default is infinite. Note that the size of read requests is
				95	limited anyway to 32 pages (which is 128kbyte on i386).
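
As an illustration of how the options above fit together, the following
sketch formats 'fd', 'rootmode', 'user_id' and 'group_id' into a single
comma separated option string, of the kind passed as the data argument
of mount(2). The helper name fuse_format_opts() is hypothetical and is
not part of libfuse or fusermount; treat it as a sketch only.

```c
#include <stdio.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not a libfuse API): write the option string
 * "fd=N,rootmode=M,user_id=N,group_id=N" into buf.
 * Returns 0 on success, -1 if buf is too small.
 */
int fuse_format_opts(char *buf, size_t size, int fd,
                     unsigned int rootmode, uid_t uid, gid_t gid)
{
    /* rootmode is printed in octal, as the 'rootmode=M' option expects. */
    int n = snprintf(buf, size, "fd=%d,rootmode=%o,user_id=%u,group_id=%u",
                     fd, rootmode, (unsigned int)uid, (unsigned int)gid);

    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

For a directory root the mode would typically be S_IFDIR (040000), which
the helper prints as "rootmode=40000".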
				96
Miklos Szeredi	bafa965	2006-06-25 05:48:51 -0700	[diff] [blame^]	97	Control filesystem
				98	~~~~~~~~~~~~~~~~~~
Miklos Szeredi	bacac38	2006-01-16 22:14:47 -0800	[diff] [blame]	99
Miklos Szeredi	bafa965	2006-06-25 05:48:51 -0700	[diff] [blame^]	100	There's a control filesystem for FUSE, which can be mounted by:
Miklos Szeredi	bacac38	2006-01-16 22:14:47 -0800	[diff] [blame]	101
Miklos Szeredi	bafa965	2006-06-25 05:48:51 -0700	[diff] [blame^]	102	mount -t fusectl none /sys/fs/fuse/connections
Miklos Szeredi	bacac38	2006-01-16 22:14:47 -0800	[diff] [blame]	103
Miklos Szeredi	bafa965	2006-06-25 05:48:51 -0700	[diff] [blame^]	104	Mounting it under the '/sys/fs/fuse/connections' directory makes it
				105	backwards compatible with earlier versions.
Miklos Szeredi	bacac38	2006-01-16 22:14:47 -0800	[diff] [blame]	106
Miklos Szeredi	bafa965	2006-06-25 05:48:51 -0700	[diff] [blame^]	107	Under the fuse control filesystem each connection has a directory
				108	named by a unique number.
				109
				110	For each connection the following files exist within this directory:
Miklos Szeredi	bacac38	2006-01-16 22:14:47 -0800	[diff] [blame]	111
				112	'waiting'
				113
				114	The number of requests which are waiting to be transferred to
				115	userspace or being processed by the filesystem daemon. If there is
				116	no filesystem activity and 'waiting' is non-zero, then the
				117	filesystem is hung or deadlocked.
				118
				119	'abort'
				120
				121	Writing anything into this file will abort the filesystem
				122	connection. This means that all waiting requests will be aborted and an
				123	error returned for all aborted and new requests.
				124
Miklos Szeredi	bafa965	2006-06-25 05:48:51 -0700	[diff] [blame^]	125	Only the owner of the mount may read or write these files.
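
The two control files can be used from a small program roughly as
follows. This is a minimal sketch: the function names are hypothetical,
and the base directory is taken as a parameter on the assumption that
the control filesystem is mounted at '/sys/fs/fuse/connections'.

```c
#include <stdio.h>

/*
 * Hypothetical helper: read <base>/<conn>/waiting.
 * Returns 0 and stores the value in *count, or -1 on any error.
 */
int fusectl_read_waiting(const char *base, unsigned int conn, long *count)
{
    char path[4096];
    int n = snprintf(path, sizeof path, "%s/%u/waiting", base, conn);

    if (n < 0 || (size_t)n >= sizeof path)
        return -1;

    FILE *f = fopen(path, "r");
    if (!f)
        return -1;

    int ok = (fscanf(f, "%ld", count) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/*
 * Hypothetical helper: abort a connection by writing to
 * <base>/<conn>/abort. What is written does not matter.
 * Returns 0 on success, -1 on any error.
 */
int fusectl_abort(const char *base, unsigned int conn)
{
    char path[4096];
    int n = snprintf(path, sizeof path, "%s/%u/abort", base, conn);

    if (n < 0 || (size_t)n >= sizeof path)
        return -1;

    FILE *f = fopen(path, "w");
    if (!f)
        return -1;

    int ok = (fputs("1\n", f) >= 0);
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

A monitoring tool could poll fusectl_read_waiting() and call
fusectl_abort() when the count stays non-zero with no filesystem
activity.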
Miklos Szeredi	bacac38	2006-01-16 22:14:47 -0800	[diff] [blame]	126
				127	Aborting a filesystem connection
				128	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
				129
				130	It is possible to get into certain situations where the filesystem is
				131	not responding. Reasons for this may be:
				132
				133	a) Broken userspace filesystem implementation
				134
				135	b) Network connection down
				136
				137	c) Accidental deadlock
				138
				139	d) Malicious deadlock
				140
				141	(For more on c) and d) see later sections)
				142
				143	In any of these cases it may be useful to abort the connection to
				144	the filesystem. There are several ways to do this:
				145
				146	- Kill the filesystem daemon. Works in case of a) and b)
				147
				148	- Kill the filesystem daemon and all users of the filesystem. Works
				149	in all cases except some malicious deadlocks
				150
				151	- Use forced umount (umount -f). Works in all cases but only if
				152	filesystem is still attached (it hasn't been lazy unmounted)
				153
Miklos Szeredi	bafa965	2006-06-25 05:48:51 -0700	[diff] [blame^]	154	- Abort filesystem through the FUSE control filesystem. Most
				155	powerful method, always works.
Miklos Szeredi	bacac38	2006-01-16 22:14:47 -0800	[diff] [blame]	156
Miklos Szeredi	334f485	2005-09-09 13:10:27 -0700	[diff] [blame]	157	How do non-privileged mounts work?
				158	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
				159
				160	Since the mount() system call is a privileged operation, a helper
				161	program (fusermount) is needed, which is installed setuid root.
				162
				163	The implication of providing non-privileged mounts is that the mount
				164	owner must not be able to use this capability to compromise the
				165	system. Obvious requirements arising from this are:
				166
				167	A) mount owner should not be able to get elevated privileges with the
				168	help of the mounted filesystem
				169
				170	B) mount owner should not get illegitimate access to information from
				171	other users' and the super user's processes
				172
				173	C) mount owner should not be able to induce undesired behavior in
				174	other users' or the super user's processes
				175
				176	How are requirements fulfilled?
				177	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
				178
				179	A) The mount owner could gain elevated privileges by either:
				180
				181	1) creating a filesystem containing a device file, then opening
				182	this device
				183
				184	2) creating a filesystem containing a suid or sgid application,
				185	then executing this application
				186
				187	The solution is not to allow opening device files and ignore
				188	setuid and setgid bits when executing programs. To ensure this
				189	fusermount always adds "nosuid" and "nodev" to the mount options
				190	for non-privileged mounts.
				191
				192	B) If another user is accessing files or directories in the
				193	filesystem, the filesystem daemon serving requests can record the
				194	exact sequence and timing of operations performed. This
				195	information is otherwise inaccessible to the mount owner, so this
				196	counts as an information leak.
				197
				198	The solution to this problem will be presented in point 2) of C).
				199
				200	C) There are several ways in which the mount owner can induce
				201	undesired behavior in other users' processes, such as:
				202
				203	1) mounting a filesystem over a file or directory which the mount
				204	owner would otherwise not be able to modify (or could only
				205	make limited modifications).
				206
				207	This is solved in fusermount, by checking the access
				208	permissions on the mountpoint and only allowing the mount if
				209	the mount owner can do unlimited modification (has write
				210	access to the mountpoint, and mountpoint is not a "sticky"
				211	directory)
				212
				213	2) Even if 1) is solved the mount owner can change the behavior
				214	of other users' processes.
				215
				216	i) It can slow down or indefinitely delay the execution of a
				217	filesystem operation creating a DoS against the user or the
				218	whole system. For example a suid application locking a
				219	system file, and then accessing a file on the mount owner's
				220	filesystem could be stopped, thus causing the system
				221	file to be locked forever.
				222
				223	ii) It can present files or directories of unlimited length, or
				224	directory structures of unlimited depth, possibly causing a
				225	system process to eat up diskspace, memory or other
				226	resources, again causing DoS.
				227
				228	The solution to this as well as B) is to deny access to the
				229	filesystem for processes which the mount owner could not
				230	otherwise monitor or manipulate. Since if the
				231	mount owner can ptrace a process, it can do all of the above
				232	without using a FUSE mount, the same criteria as used in
				233	ptrace can be used to check if a process is allowed to access
				234	the filesystem or not.
				235
				236	Note that the ptrace check is not strictly necessary to
				237	prevent C/2/i, it is enough to check if mount owner has enough
				238	privilege to send signal to the process accessing the
				239	filesystem, since SIGSTOP can be used to get a similar effect.
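
The mountpoint check described in point 1) of C) can be sketched as
below. This follows only the two conditions stated there (write access,
and not a "sticky" directory); the function name is hypothetical and
the real fusermount may apply further or different checks.

```c
#include <sys/stat.h>
#include <unistd.h>

/* Fallback in case the headers hide S_ISVTX in strict modes. */
#ifndef S_ISVTX
#define S_ISVTX 01000
#endif

/*
 * Hypothetical helper: returns 1 if the calling user may mount over
 * path according to the two conditions above, 0 otherwise.
 */
int mountpoint_allowed(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return 0;

    /* A sticky directory (such as /tmp) allows only limited modification. */
    if (S_ISDIR(st.st_mode) && (st.st_mode & S_ISVTX))
        return 0;

    /*
     * access() tests with the real uid, which in a setuid helper is
     * the invoking user rather than root.
     */
    return access(path, W_OK) == 0;
}
```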
				240
				241	I think these limitations are unacceptable?
				242	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
				243
				244	If a sysadmin trusts the users enough, or can ensure through other
				245	measures, that system processes will never enter non-privileged
				246	mounts, it can relax the last limitation with a "user_allow_other"
				247	config option. If this config option is set, the mounting user can
				248	add the "allow_other" mount option which disables the check for other
				249	users' processes.
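
The effect of "allow_other" on the access check can be modelled in a few
lines. This is a simplified userspace model, not kernel code: the
struct and function names are hypothetical, and the ptrace-like
criterion is reduced to comparing user ids only.

```c
#include <stdbool.h>

/* Hypothetical model of a mount: its owner and the allow_other flag. */
struct demo_mount {
    unsigned int owner_uid;
    bool allow_other;
};

/* Hypothetical model of the credentials of the accessing process. */
struct demo_cred {
    unsigned int uid;   /* real uid */
    unsigned int euid;  /* effective uid */
    unsigned int suid;  /* saved uid */
};

/*
 * Returns true if the process may access the filesystem.
 * Without allow_other, all uids of the process must equal the mount
 * owner's uid, so a setuid program started by the owner is refused.
 */
bool demo_may_access(const struct demo_mount *m, const struct demo_cred *c)
{
    if (m->allow_other)
        return true;

    return c->uid == m->owner_uid &&
           c->euid == m->owner_uid &&
           c->suid == m->owner_uid;
}
```

In this model a process with uid 1000 but effective uid 0 is refused on
a mount owned by uid 1000, which corresponds to the suid example in
point 2) of C).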
				250
				251	Kernel - userspace interface
				252	~~~~~~~~~~~~~~~~~~~~~~~~~~~~
				253
				254	The following diagram shows how a filesystem operation (in this
				255	example unlink) is performed in FUSE.
				256
				257	NOTE: everything in this description is greatly simplified
				258
				259	|  "rm /mnt/fuse/file"                 |  FUSE filesystem daemon
				260	|                                      |
				261	|                                      |  >sys_read()
				262	|                                      |    >fuse_dev_read()
				263	|                                      |      >request_wait()
				264	|                                      |        [sleep on fc->waitq]
				265	|                                      |
				266	|  >sys_unlink()                       |
				267	|    >fuse_unlink()                    |
				268	|      [get request from               |
				269	|       fc->unused_list]               |
				270	|      >request_send()                 |
				271	|        [queue req on fc->pending]    |
				272	|        [wake up fc->waitq]           |        [woken up]
				273	|        >request_wait_answer()        |
				274	|          [sleep on req->waitq]       |
				275	|                                      |      <request_wait()
				276	|                                      |      [remove req from fc->pending]
				277	|                                      |      [copy req to read buffer]
				278	|                                      |      [add req to fc->processing]
				279	|                                      |    <fuse_dev_read()
				280	|                                      |  <sys_read()
				281	|                                      |
				282	|                                      |  [perform unlink]
				283	|                                      |
				284	|                                      |  >sys_write()
				285	|                                      |    >fuse_dev_write()
				286	|                                      |      [look up req in fc->processing]
				287	|                                      |      [remove from fc->processing]
				288	|                                      |      [copy write buffer to req]
				289	|          [woken up]                  |      [wake up req->waitq]
				290	|                                      |    <fuse_dev_write()
				291	|                                      |  <sys_write()
				292	|        <request_wait_answer()        |
				293	|      <request_send()                 |
				294	|      [add request to                 |
				295	|       fc->unused_list]               |
				296	|    <fuse_unlink()                    |
				297	|  <sys_unlink()                       |
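
The life cycle of a request in the diagram above (unused, pending,
processing, answered, unused again) can be modelled in userspace as
follows. This is a toy model only: it has no sleeping or waking, the
demo_* names are hypothetical, and the numeric 'id' used to look up a
request on reply is an assumption of the sketch.

```c
#include <stddef.h>

enum demo_state { DEMO_UNUSED, DEMO_PENDING, DEMO_PROCESSING, DEMO_DONE };

struct demo_req {
    unsigned long id;       /* lookup key for the reply (assumption) */
    enum demo_state state;  /* which list the request is on */
    int result;             /* filled in from the daemon's reply */
};

#define DEMO_NREQ 4

struct demo_conn {
    struct demo_req reqs[DEMO_NREQ];
    unsigned long next_id;
};

/* Caller side: take an unused request and queue it as pending. */
struct demo_req *demo_request_send(struct demo_conn *fc)
{
    for (size_t i = 0; i < DEMO_NREQ; i++) {
        struct demo_req *req = &fc->reqs[i];
        if (req->state == DEMO_UNUSED) {
            req->id = ++fc->next_id;
            req->state = DEMO_PENDING;
            return req;
        }
    }
    return NULL; /* no unused request available */
}

/* Daemon side: move the oldest pending request to processing.
 * Returns its id, or 0 if nothing is pending. */
unsigned long demo_dev_read(struct demo_conn *fc)
{
    struct demo_req *oldest = NULL;

    for (size_t i = 0; i < DEMO_NREQ; i++) {
        struct demo_req *req = &fc->reqs[i];
        if (req->state == DEMO_PENDING && (!oldest || req->id < oldest->id))
            oldest = req;
    }
    if (!oldest)
        return 0;
    oldest->state = DEMO_PROCESSING;
    return oldest->id;
}

/* Daemon side: look up a request in processing and store the reply.
 * Returns 0 on success, -1 if no such request is being processed. */
int demo_dev_write(struct demo_conn *fc, unsigned long id, int result)
{
    for (size_t i = 0; i < DEMO_NREQ; i++) {
        struct demo_req *req = &fc->reqs[i];
        if (req->state == DEMO_PROCESSING && req->id == id) {
            req->result = result;
            req->state = DEMO_DONE;
            return 0;
        }
    }
    return -1;
}

/* Caller side: collect the result and mark the request unused again. */
int demo_request_end(struct demo_req *req)
{
    int result = req->result;
    req->state = DEMO_UNUSED;
    return result;
}
```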
				298
				299	There are a couple of ways in which to deadlock a FUSE filesystem.
				300	Since we are talking about unprivileged userspace programs,
				301	something must be done about these.
				302
				303	Scenario 1 - Simple deadlock
				304	-----------------------------
				305
				306	|  "rm /mnt/fuse/file"                 |  FUSE filesystem daemon
				307	|                                      |
				308	|  >sys_unlink("/mnt/fuse/file")       |
				309	|    [acquire inode semaphore          |
				310	|     for "file"]                      |
				311	|    >fuse_unlink()                    |
				312	|      [sleep on req->waitq]           |
				313	|                                      |  <sys_read()
				314	|                                      |  >sys_unlink("/mnt/fuse/file")
				315	|                                      |    [acquire inode semaphore
				316	|                                      |     for "file"]
				317	|                                      |    DEADLOCK
				318
Miklos Szeredi	51eb01e	2006-06-25 05:48:50 -0700	[diff] [blame]	319	The solution for this is to allow the filesystem to be aborted.
Miklos Szeredi	334f485	2005-09-09 13:10:27 -0700	[diff] [blame]	320
				321	Scenario 2 - Tricky deadlock
				322	----------------------------
				323
				324	This one needs a carefully crafted filesystem. It's a variation on
				325	the above, only the call back to the filesystem is not explicit,
				326	but is caused by a pagefault.
				327
				328	|  Kamikaze filesystem thread 1        |  Kamikaze filesystem thread 2
				329	|                                      |
				330	|  [fd = open("/mnt/fuse/file")]       |  [request served normally]
				331	|  [mmap fd to 'addr']                 |
				332	|  [close fd]                          |  [FLUSH triggers 'magic' flag]
				333	|  [read a byte from addr]             |
				334	|    >do_page_fault()                  |
				335	|      [find or create page]           |
				336	|      [lock page]                     |
				337	|      >fuse_readpage()                |
				338	|        [queue READ request]          |
				339	|        [sleep on req->waitq]         |
				340	|                                      |  [read request to buffer]
				341	|                                      |  [create reply header before addr]
				342	|                                      |  >sys_write(addr - headerlength)
				343	|                                      |    >fuse_dev_write()
				344	|                                      |      [look up req in fc->processing]
				345	|                                      |      [remove from fc->processing]
				346	|                                      |      [copy write buffer to req]
				347	|                                      |        >do_page_fault()
				348	|                                      |          [find or create page]
				349	|                                      |          [lock page]
				350	|                                      |          * DEADLOCK *
				351
Miklos Szeredi	51eb01e	2006-06-25 05:48:50 -0700	[diff] [blame]	352	The solution is basically the same as above.
Miklos Szeredi	334f485	2005-09-09 13:10:27 -0700	[diff] [blame]	353
				354	An additional problem is that while the write buffer is being
				355	copied to the request, the request must not be interrupted. This
				356	is because the destination address of the copy may not be valid
				357	after the request is interrupted.
				358
Miklos Szeredi	51eb01e	2006-06-25 05:48:50 -0700	[diff] [blame]	359	This is solved by doing the copy atomically, and allowing abort
				360	while the page(s) belonging to the write buffer are faulted with
				361	get_user_pages(). The 'req->locked' flag indicates when the copy is
				362	taking place, and abort is delayed until this flag is unset.