Documentation/filesystems/Locking (kernel/msm-4.9)

The text below describes the locking rules for VFS-related methods.
It is (believed to be) up-to-date. Please, if you change anything in
prototypes or locking protocols, update this file. And update the relevant
instances in the tree; don't leave that to maintainers of filesystems/devices/
etc. At the very least, put the list of dubious cases at the end of this file.
Don't turn it into a log - maintainers of out-of-the-tree code are supposed to
be able to use diff(1).
One thing currently missing here: socket operations. Alexey?

--------------------------- dentry_operations --------------------------
prototypes:
	int (*d_revalidate)(struct dentry *, int);
	int (*d_hash) (struct dentry *, struct qstr *);
	int (*d_compare) (struct dentry *, struct qstr *, struct qstr *);
	int (*d_delete)(struct dentry *);
	void (*d_release)(struct dentry *);
	void (*d_iput)(struct dentry *, struct inode *);
	char *(*d_dname)(struct dentry *dentry, char *buffer, int buflen);

locking rules:
	none have BKL
		dcache_lock	rename_lock	->d_lock	may block
d_revalidate:	no		no		no		yes
d_hash:		no		no		no		yes
d_compare:	no		yes		no		no
d_delete:	yes		no		yes		no
d_release:	no		no		no		yes
d_iput:		no		no		no		yes
d_dname:	no		no		no		no

--------------------------- inode_operations ---------------------------
prototypes:
	int (*create) (struct inode *,struct dentry *,int, struct nameidata *);
	struct dentry * (*lookup) (struct inode *,struct dentry *,
			struct nameidata *);
	int (*link) (struct dentry *,struct inode *,struct dentry *);
	int (*unlink) (struct inode *,struct dentry *);
	int (*symlink) (struct inode *,struct dentry *,const char *);
	int (*mkdir) (struct inode *,struct dentry *,int);
	int (*rmdir) (struct inode *,struct dentry *);
	int (*mknod) (struct inode *,struct dentry *,int,dev_t);
	int (*rename) (struct inode *, struct dentry *,
			struct inode *, struct dentry *);
	int (*readlink) (struct dentry *, char __user *,int);
	int (*follow_link) (struct dentry *, struct nameidata *);
	void (*truncate) (struct inode *);
	int (*permission) (struct inode *, int, struct nameidata *);
	int (*setattr) (struct dentry *, struct iattr *);
	int (*getattr) (struct vfsmount *, struct dentry *, struct kstat *);
	int (*setxattr) (struct dentry *, const char *,const void *,size_t,int);
	ssize_t (*getxattr) (struct dentry *, const char *, void *, size_t);
	ssize_t (*listxattr) (struct dentry *, char *, size_t);
	int (*removexattr) (struct dentry *, const char *);

locking rules:
	all may block, none have BKL
		i_mutex(inode)
lookup:		yes
create:		yes
link:		yes (both)
mknod:		yes
symlink:	yes
mkdir:		yes
unlink:		yes (both)
rmdir:		yes (both)	(see below)
rename:		yes (all)	(see below)
readlink:	no
follow_link:	no
truncate:	yes		(see below)
setattr:	yes
permission:	no
getattr:	no
setxattr:	yes
getxattr:	no
listxattr:	no
removexattr:	yes
Additionally, ->rmdir(), ->unlink() and ->rename() have ->i_mutex on
the victim.
Cross-directory ->rename() has (per-superblock) ->s_vfs_rename_sem.
->truncate() is never called directly - it's a callback, not a
method. It's called by vmtruncate(), a library function normally used by
->setattr(). The locking information above applies to that call (i.e. it is
inherited from ->setattr(); vmtruncate() is used when ATTR_SIZE has been
passed).
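
The ->setattr() -> vmtruncate() -> ->truncate() chain can be pictured with a
small user-space model. This is only a sketch: model_inode, model_setattr(),
model_vmtruncate(), model_truncate() and MODEL_ATTR_SIZE are hypothetical
stand-ins for struct inode, ->setattr(), vmtruncate(), ->truncate() and
ATTR_SIZE, and the i_mutex_held flag merely records that the (modelled) VFS
caller holds ->i_mutex.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; none of these names are kernel API. */
#define MODEL_ATTR_SIZE 0x1u

struct model_inode;

struct model_inode_ops {
	void (*truncate)(struct model_inode *);	/* callback, not a method */
};

struct model_inode {
	long long i_size;
	bool i_mutex_held;	/* set by the (modelled) VFS caller */
	const struct model_inode_ops *ops;
	int truncate_calls;
};

struct model_iattr {
	unsigned int valid;	/* which attributes are being changed */
	long long size;
};

/* Plays the role of vmtruncate(): update the size, then run ->truncate().
 * It takes no locks of its own; everything is inherited from ->setattr(). */
static int model_vmtruncate(struct model_inode *inode, long long size)
{
	inode->i_size = size;
	if (inode->ops->truncate)
		inode->ops->truncate(inode);
	return 0;
}

/* A filesystem's ->setattr(): only a size change goes through vmtruncate(). */
static int model_setattr(struct model_inode *inode,
			 const struct model_iattr *attr)
{
	assert(inode->i_mutex_held);	/* ->setattr() runs under i_mutex */
	if (attr->valid & MODEL_ATTR_SIZE)
		return model_vmtruncate(inode, attr->size);
	return 0;
}

/* A filesystem's ->truncate(): sees the same i_mutex as ->setattr(). */
static void model_truncate(struct model_inode *inode)
{
	assert(inode->i_mutex_held);
	inode->truncate_calls++;
}
```

Without MODEL_ATTR_SIZE in ->valid, model_setattr() never reaches
->truncate(); with it, ->truncate() runs once, still under the caller's lock.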

See Documentation/filesystems/directory-locking for a more detailed discussion
of the locking scheme for directory operations.

--------------------------- super_operations ---------------------------
prototypes:
	struct inode *(*alloc_inode)(struct super_block *sb);
	void (*destroy_inode)(struct inode *);
	void (*dirty_inode) (struct inode *);
	int (*write_inode) (struct inode *, int);
	void (*drop_inode) (struct inode *);
	void (*delete_inode) (struct inode *);
	void (*put_super) (struct super_block *);
	void (*write_super) (struct super_block *);
	int (*sync_fs)(struct super_block *sb, int wait);
	int (*freeze_fs) (struct super_block *);
	int (*unfreeze_fs) (struct super_block *);
	int (*statfs) (struct dentry *, struct kstatfs *);
	int (*remount_fs) (struct super_block *, int *, char *);
	void (*clear_inode) (struct inode *);
	void (*umount_begin) (struct super_block *);
	int (*show_options)(struct seq_file *, struct vfsmount *);
	ssize_t (*quota_read)(struct super_block *, int, char *, size_t, loff_t);
	ssize_t (*quota_write)(struct super_block *, int, const char *, size_t, loff_t);

locking rules:
	All may block, except where noted below.
	None have BKL
			s_umount
alloc_inode:
destroy_inode:
dirty_inode:				(must not sleep)
write_inode:
drop_inode:				!!!inode_lock!!!
delete_inode:
put_super:		write
write_super:		read
sync_fs:		read
freeze_fs:		read
unfreeze_fs:		read
statfs:			no
remount_fs:		maybe		(see below)
clear_inode:
umount_begin:		no
show_options:		no		(namespace_sem)
quota_read:		no		(see below)
quota_write:		no		(see below)

->remount_fs() will have the s_umount exclusive lock if it's already mounted.
When called from get_sb_single, it does NOT have the s_umount lock.
->quota_read() and ->quota_write() are both guaranteed by the quota code (via
dqio_sem) to be the only functions operating on the quota file, unless an
admin really wants to screw something up and writes to quota files with
quotas on. For other details about locking, see also the dquot_operations
section.

--------------------------- file_system_type ---------------------------
prototypes:
	int (*get_sb) (struct file_system_type *, int,
		       const char *, void *, struct vfsmount *);
	void (*kill_sb) (struct super_block *);
locking rules:
		may block	BKL
get_sb:		yes		no
kill_sb:	yes		no

->get_sb() returns an error, or 0 with a locked superblock attached to the
vfsmount (exclusive on ->s_umount).
->kill_sb() takes a write-locked superblock, does all shutdown work on it,
unlocks it and drops the reference.

--------------------------- address_space_operations --------------------------
prototypes:
	int (*writepage)(struct page *page, struct writeback_control *wbc);
	int (*readpage)(struct file *, struct page *);
	int (*sync_page)(struct page *);
	int (*writepages)(struct address_space *, struct writeback_control *);
	int (*set_page_dirty)(struct page *page);
	int (*readpages)(struct file *filp, struct address_space *mapping,
			struct list_head *pages, unsigned nr_pages);
	int (*write_begin)(struct file *, struct address_space *mapping,
				loff_t pos, unsigned len, unsigned flags,
				struct page **pagep, void **fsdata);
	int (*write_end)(struct file *, struct address_space *mapping,
				loff_t pos, unsigned len, unsigned copied,
				struct page *page, void *fsdata);
	sector_t (*bmap)(struct address_space *, sector_t);
	int (*invalidatepage) (struct page *, unsigned long);
	int (*releasepage) (struct page *, int);
	int (*direct_IO)(int, struct kiocb *, const struct iovec *iov,
			loff_t offset, unsigned long nr_segs);
	int (*launder_page) (struct page *);

locking rules:
	All except set_page_dirty may block

			BKL	PageLocked(page)	i_mutex
writepage:		no	yes, unlocks (see below)
readpage:		no	yes, unlocks
sync_page:		no	maybe
writepages:		no
set_page_dirty:		no	no
readpages:		no
write_begin:		no	locks the page		yes
write_end:		no	yes, unlocks		yes
perform_write:		no	n/a			yes
bmap:			no
invalidatepage:		no	yes
releasepage:		no	yes
direct_IO:		no
launder_page:		no	yes

->write_begin(), ->write_end(), ->sync_page() and ->readpage()
may be called from the request handler (/dev/loop).

->readpage() unlocks the page, either synchronously or via I/O
completion.

->readpages() populates the pagecache with the passed pages and starts
I/O against them. The pages are unlocked upon I/O completion.

->writepage() is used for two purposes: for "memory cleansing" and for
"sync". These are quite different operations and the behaviour may differ
depending upon the mode.

If writepage is called for sync (wbc->sync_mode != WBC_SYNC_NONE) then
it must start I/O against the page, even if that would involve
blocking on in-progress I/O.

If writepage is called for memory cleansing (sync_mode ==
WBC_SYNC_NONE) then its role is to get as much writeout underway as
possible. So writepage should try to avoid blocking against
currently-in-progress I/O.

If the filesystem is not called for "sync" and it determines that it
would need to block against in-progress I/O to be able to start new I/O
against the page, the filesystem should redirty the page with
redirty_page_for_writepage(), then unlock the page and return zero.
This may also be done to avoid internal deadlocks, but rarely.

If the filesystem is called for sync then it must wait on any
in-progress I/O and then start new I/O.

The filesystem should unlock the page synchronously, before returning to the
caller, unless ->writepage() returns the special WRITEPAGE_ACTIVATE
value. WRITEPAGE_ACTIVATE means that the page cannot really be written out
currently, and the VM should stop calling ->writepage() on this page for some
time. The VM does this by moving the page to the head of the active list,
hence the name.

Unless the filesystem is going to redirty_page_for_writepage(), unlock the page
and return zero, writepage must run set_page_writeback() against the page,
followed by unlocking it. Once set_page_writeback() has been run against the
page, write I/O can be submitted and the write I/O completion handler must run
end_page_writeback() once the I/O is complete. If no I/O is submitted, the
filesystem must run end_page_writeback() against the page before returning from
writepage.

That is: after 2.5.12, pages which are under writeout are not locked. Note,
if the filesystem needs the page to be locked during writeout, that is ok, too;
the page is allowed to be unlocked at any point in time between the calls to
set_page_writeback() and end_page_writeback().

Note, failure to run either redirty_page_for_writepage() or the combination of
set_page_writeback()/end_page_writeback() on a page submitted to writepage
will leave the page itself marked clean but still tagged as dirty in the
radix tree. This incoherency can lead to all sorts of hard-to-debug problems
in the filesystem, like having dirty inodes at umount and losing written data.
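
The writepage rules above boil down to a small state machine over the page
lock, the page's dirty flag, its radix-tree dirty tag and the writeback bit.
The user-space sketch below models only that. struct model_page and the
model_* helpers are hypothetical stand-ins for struct page,
redirty_page_for_writepage(), set_page_writeback() and end_page_writeback();
it assumes the caller has already cleared the page's dirty flag (but not the
tag) before calling ->writepage(), and real I/O is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model of the ->writepage() contract.
 * "dirty" is the page flag, "tagged_dirty" the radix-tree dirty tag. */
struct model_page {
	bool locked;
	bool dirty;
	bool tagged_dirty;
	bool writeback;
};

enum model_sync_mode { MODEL_SYNC_NONE, MODEL_SYNC_ALL };

/* Like redirty_page_for_writepage(): the page stays dirty (flag and tag). */
static void model_redirty(struct model_page *p)
{
	p->dirty = true;
	p->tagged_dirty = true;
}

/* Like set_page_writeback() on a clean page: the dirty tag is dropped. */
static void model_set_writeback(struct model_page *p)
{
	p->writeback = true;
	if (!p->dirty)
		p->tagged_dirty = false;
}

/* Like end_page_writeback(), run from the I/O completion path. */
static void model_end_writeback(struct model_page *p)
{
	p->writeback = false;
}

/* The page arrives locked. io_busy says whether writing it now would
 * mean waiting for in-flight I/O. */
static int model_writepage(struct model_page *p, enum model_sync_mode mode,
			   bool io_busy)
{
	assert(p->locked);
	if (mode == MODEL_SYNC_NONE && io_busy) {
		/* Memory cleansing: don't block, leave the page for later. */
		model_redirty(p);
		p->locked = false;
		return 0;
	}
	/* For sync, the real thing would wait for in-flight I/O here. */
	model_set_writeback(p);
	p->locked = false;	/* pages under writeout need not stay locked */
	/* ...submit I/O; its completion calls model_end_writeback()... */
	return 0;
}

/* Flag and tag agree: no "clean but tagged dirty" page. */
static bool model_page_coherent(const struct model_page *p)
{
	return p->dirty == p->tagged_dirty;
}
```

A page that is merely unlocked and returned, with neither redirtying nor
writeback, ends up clean but still tagged dirty, which is the incoherency
described above; model_page_coherent() reports it as false.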

->sync_page() locking rules are not well-defined - usually it is called
with the page locked, but that is not guaranteed. Judging by the currently
existing instances of this method, ->sync_page() itself doesn't look
well-defined...

->writepages() is used for periodic writeback and for syscall-initiated
sync operations. The address_space should start I/O against at least
*nr_to_write pages. *nr_to_write must be decremented for each page which is
written. The address_space implementation may write more (or fewer) pages
than *nr_to_write asks for, but it should try to be reasonably close. If
nr_to_write is NULL, all dirty pages must be written.

writepages should _only_ write pages which are present on
mapping->io_pages.
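
The nr_to_write budget can be sketched as a simple loop. This is a
hypothetical user-space model, not kernel code: struct model_wbc stands in
for struct writeback_control, a bool array stands in for the pages on the
mapping, and the "write everything" (NULL) case is not modelled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the nr_to_write budget; not kernel code. */
struct model_wbc {
	long nr_to_write;	/* pages the caller would like written */
};

/* Start "I/O" on dirty pages in dirty[0..n) until the budget runs out,
 * decrementing nr_to_write once per page written. Returns pages written. */
static size_t model_writepages(bool *dirty, size_t n, struct model_wbc *wbc)
{
	size_t written = 0;

	for (size_t i = 0; i < n && wbc->nr_to_write > 0; i++) {
		if (!dirty[i])
			continue;	/* clean pages cost nothing */
		dirty[i] = false;	/* I/O started on this page */
		wbc->nr_to_write--;
		written++;
	}
	return written;
}
```

With five dirty pages and a budget of three, the first three dirty pages are
written and nr_to_write ends at zero; the rest wait for the next pass.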

->set_page_dirty() is called from various places in the kernel
when the target page is marked as needing writeback. It may be called
under spinlock (it cannot block) and is sometimes called with the page
not locked.

->bmap() is currently used by the legacy ioctl() (FIBMAP) provided by some
filesystems and by the swapper. The latter will eventually go away. None of
the instances actually needs the BKL. Please, keep it that way and don't
breed new callers.

->invalidatepage() is called when the filesystem must attempt to drop
some or all of the buffers from the page when it is being truncated. It
returns zero on success. If ->invalidatepage is NULL, the kernel uses
block_invalidatepage() instead.
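
The "NULL method means use the library helper" convention can be sketched in
user space as follows. All names (model_buf_page, model_aops,
model_block_invalidatepage(), model_do_invalidatepage()) are hypothetical;
the generic helper here only handles a truncate of the whole page, which is a
simplification of what block_invalidatepage() does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; the names are not kernel API. */
struct model_buf_page {
	bool locked;
	int nr_buffers;		/* buffers still attached to the page */
};

typedef int (*model_invalidatepage_t)(struct model_buf_page *, unsigned long);

struct model_aops {
	model_invalidatepage_t invalidatepage;	/* optional, may be NULL */
};

/* Stand-in for the generic helper: truncating from offset 0 covers the
 * whole page, so every buffer goes. Partial pages are not modelled. */
static int model_block_invalidatepage(struct model_buf_page *page,
				      unsigned long offset)
{
	if (offset == 0)
		page->nr_buffers = 0;
	return 0;
}

/* Caller side: the page is locked (see the table above), and a NULL
 * method means "use the generic helper". */
static int model_do_invalidatepage(const struct model_aops *aops,
				   struct model_buf_page *page,
				   unsigned long offset)
{
	model_invalidatepage_t fn = aops->invalidatepage;

	assert(page->locked);
	if (fn == NULL)
		fn = model_block_invalidatepage;
	return fn(page, offset);
}
```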

->releasepage() is called when the kernel is about to try to drop the
buffers from the page in preparation for freeing it. It returns nonzero if
the buffers were released (so the page may be freed) and zero if they could
not be. If ->releasepage is NULL, the kernel assumes that the fs has no
private interest in the buffers.

->launder_page() may be called prior to releasing a page if
it is still found to be dirty. It returns zero if the page was successfully
cleaned, or an error value if not. Note that in order to prevent the page
from getting mapped back in and redirtied, it needs to be kept locked
across the entire operation.

Note: currently almost all instances of address_space methods are
using the BKL for internal serialization, and that's one of the worst sources
of contention. Normally they call library functions (in fs/buffer.c)
and pass foo_get_block() as a callback (on local block-based filesystems,
indeed). The BKL is not needed for library stuff and is usually taken by
foo_get_block(). It's overkill, since block bitmaps can be protected by
internal fs locking and the real critical areas are much smaller than the
areas filesystems protect now.

----------------------- file_lock_operations ------------------------------
prototypes:
	void (*fl_insert)(struct file_lock *);	/* lock insertion callback */
	void (*fl_remove)(struct file_lock *);	/* lock removal callback */
	void (*fl_copy_lock)(struct file_lock *, struct file_lock *);
	void (*fl_release_private)(struct file_lock *);

locking rules:
			BKL	may block
fl_insert:		yes	no
fl_remove:		yes	no
fl_copy_lock:		yes	no
fl_release_private:	yes	yes

----------------------- lock_manager_operations ---------------------------
prototypes:
	int (*fl_compare_owner)(struct file_lock *, struct file_lock *);
	void (*fl_notify)(struct file_lock *);  /* unblock callback */
	void (*fl_copy_lock)(struct file_lock *, struct file_lock *);
	void (*fl_release_private)(struct file_lock *);
	void (*fl_break)(struct file_lock *); /* break_lease callback */

locking rules:
			BKL	may block
fl_compare_owner:	yes	no
fl_notify:		yes	no
fl_copy_lock:		yes	no
fl_release_private:	yes	yes
fl_break:		yes	no

Currently only NFSD and NLM provide instances of this class. None of
them block. If you have out-of-tree instances, please show up. Locking
in that area will change.

--------------------------- buffer_head -----------------------------------
prototypes:
	void (*b_end_io)(struct buffer_head *bh, int uptodate);

locking rules:
	Called from interrupts. In other words, extreme care is needed here.
bh is locked, but that's the only guarantee we have here. Currently only RAID1,
highmem, fs/buffer.c, and fs/ntfs/aops.c are providing these. Block devices
call this method upon I/O completion.

--------------------------- block_device_operations -----------------------
prototypes:
	int (*open) (struct inode *, struct file *);
	int (*release) (struct inode *, struct file *);
	int (*ioctl) (struct inode *, struct file *, unsigned, unsigned long);
	int (*media_changed) (struct gendisk *);
	int (*revalidate_disk) (struct gendisk *);

locking rules:
			BKL	bd_sem
open:			yes	yes
release:		yes	yes
ioctl:			yes	no
media_changed:		no	no
revalidate_disk:	no	no

The last two are called only from check_disk_change().

--------------------------- file_operations -------------------------------
prototypes:
	loff_t (*llseek) (struct file *, loff_t, int);
	ssize_t (*read) (struct file *, char __user *, size_t, loff_t *);
	ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *);
	ssize_t (*aio_read) (struct kiocb *, const struct iovec *, unsigned long, loff_t);
	ssize_t (*aio_write) (struct kiocb *, const struct iovec *, unsigned long, loff_t);
	int (*readdir) (struct file *, void *, filldir_t);
	unsigned int (*poll) (struct file *, struct poll_table_struct *);
	int (*ioctl) (struct inode *, struct file *, unsigned int,
			unsigned long);
	long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
	long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
	int (*mmap) (struct file *, struct vm_area_struct *);
	int (*open) (struct inode *, struct file *);
	int (*flush) (struct file *);
	int (*release) (struct inode *, struct file *);
	int (*fsync) (struct file *, struct dentry *, int datasync);
	int (*aio_fsync) (struct kiocb *, int datasync);
	int (*fasync) (int, struct file *, int);
	int (*lock) (struct file *, int, struct file_lock *);
	ssize_t (*readv) (struct file *, const struct iovec *, unsigned long,
			loff_t *);
	ssize_t (*writev) (struct file *, const struct iovec *, unsigned long,
			loff_t *);
	ssize_t (*sendfile) (struct file *, loff_t *, size_t, read_actor_t,
			void __user *);
	ssize_t (*sendpage) (struct file *, struct page *, int, size_t,
			loff_t *, int);
	unsigned long (*get_unmapped_area)(struct file *, unsigned long,
			unsigned long, unsigned long, unsigned long);
	int (*check_flags)(int);

locking rules:
	All may block.
		BKL
llseek:		no	(see below)
read:		no
aio_read:	no
write:		no
aio_write:	no
readdir:	no
poll:		no
ioctl:		yes	(see below)
unlocked_ioctl:	no	(see below)
compat_ioctl:	no
mmap:		no
open:		no
flush:		no
release:	no
fsync:		no	(see below)
aio_fsync:	no
fasync:		no
lock:		yes
readv:		no
writev:		no
sendfile:	no
sendpage:	no
get_unmapped_area:	no
check_flags:	no

->llseek() locking has moved from llseek to the individual llseek
implementations. If your fs is not using generic_file_llseek, you
need to acquire and release the appropriate locks in your ->llseek().
For many filesystems, it is probably safe to acquire the inode
mutex. Note that some filesystems (e.g. remote ones) provide no
protection for i_size, so you will need to use the BKL.
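
An ->llseek() that does its own locking might look like the user-space sketch
below, where a pthread_mutex_t stands in for the inode mutex. The names
model_inode, model_file and model_llseek() are hypothetical, and this is not
the source of generic_file_llseek; it only shows i_size and f_pos being read
and updated under one lock, with negative results rejected.

```c
#include <errno.h>
#include <pthread.h>
#include <stdio.h>	/* SEEK_SET, SEEK_CUR, SEEK_END */

/* Hypothetical model; pthread_mutex_t stands in for the inode mutex. */
struct model_inode {
	pthread_mutex_t i_mutex;
	long long i_size;
};

struct model_file {
	struct model_inode *inode;
	long long f_pos;
};

/* An ->llseek() that takes its own lock around the i_size/f_pos access. */
static long long model_llseek(struct model_file *file, long long offset,
			      int origin)
{
	struct model_inode *inode = file->inode;
	long long ret;

	pthread_mutex_lock(&inode->i_mutex);
	switch (origin) {
	case SEEK_END:
		offset += inode->i_size;	/* i_size read under the lock */
		break;
	case SEEK_CUR:
		offset += file->f_pos;
		break;
	case SEEK_SET:
		break;
	default:
		offset = -1;			/* unknown origin: reject below */
		break;
	}
	if (offset < 0) {
		ret = -EINVAL;
	} else {
		file->f_pos = offset;
		ret = offset;
	}
	pthread_mutex_unlock(&inode->i_mutex);
	return ret;
}
```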

Note: ext2_release() was the source of contention on fs-intensive
loads, and dropping the BKL on ->release() helps to get rid of that (we still
grab the BKL when we close a file that had been opened r/w, but that
can and should be done using internal locking with smaller critical areas).
The current worst offender is ext2_get_block()...

->fasync() is called without BKL protection, and is responsible for
maintaining the FASYNC bit in filp->f_flags. Most instances call
fasync_helper(), which does that maintenance, so it's not normally
something one needs to worry about. Return values > 0 will be mapped to
zero in the VFS layer.
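
A user-space sketch of that division of labour: the helper keeps the flag in
step with the notification list, the driver's ->fasync() just forwards to it,
and the VFS folds positive results to zero. MODEL_FASYNC, model_file and the
model_* functions are hypothetical; in particular, "the helper returns 1 when
it changed something" is an assumption of this model.

```c
#include <stdbool.h>

/* Hypothetical model; MODEL_FASYNC stands in for the FASYNC bit. */
#define MODEL_FASYNC 0x2000u

struct model_file {
	unsigned int f_flags;
	bool on_async_list;	/* registered for SIGIO delivery */
};

/* Plays the role of fasync_helper(): keeps the list entry and the flag
 * in step. Assumption: returns 1 when it changed something, else 0. */
static int model_fasync_helper(struct model_file *filp, int on)
{
	bool changed = filp->on_async_list != (on != 0);

	filp->on_async_list = (on != 0);
	if (on)
		filp->f_flags |= MODEL_FASYNC;
	else
		filp->f_flags &= ~MODEL_FASYNC;
	return changed ? 1 : 0;
}

/* A driver's ->fasync() usually just forwards to the helper. */
static int model_driver_fasync(int fd, struct model_file *filp, int on)
{
	(void)fd;
	return model_fasync_helper(filp, on);
}

/* VFS side: any positive return value from ->fasync() becomes 0. */
static int model_vfs_setfl_fasync(struct model_file *filp, int on)
{
	int ret = model_driver_fasync(-1, filp, on);

	return ret > 0 ? 0 : ret;
}
```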
Linus Torvalds	1da177e	2005-04-16 15:20:36 -0700	[diff] [blame]	446
				447	->readdir() and ->ioctl() on directories must be changed. Ideally we would
				448	move ->readdir() to inode_operations and use a separate method for directory
				449	->ioctl() or kill the latter completely. One of the problems is that for
				450	anything that resembles union-mount we won't have a struct file for all
				451	components. And there are other reasons why the current interface is a mess...
				452
				453	->ioctl() on regular files is superceded by the ->unlocked_ioctl() that
				454	doesn't take the BKL.
				455
				456	->read on directories probably must go away - we should just enforce -EISDIR
				457	in sys_read() and friends.
				458
Artem Bityutskiy	a7bc02f	2007-05-09 07:53:16 +0200	[diff] [blame]	459	->fsync() has i_mutex on inode.
Linus Torvalds	1da177e	2005-04-16 15:20:36 -0700	[diff] [blame]	460
--------------------------- dquot_operations -------------------------------
prototypes:
	int (*write_dquot) (struct dquot *);
	int (*acquire_dquot) (struct dquot *);
	int (*release_dquot) (struct dquot *);
	int (*mark_dirty) (struct dquot *);
	int (*write_info) (struct super_block *, int);

These operations are intended to be more or less wrapping functions that ensure
proper locking wrt the filesystem and call the generic quota operations.

What the filesystem should expect from the generic quota functions:

		FS recursion	Held locks when called
write_dquot:	yes		dqonoff_sem or dqptr_sem
acquire_dquot:	yes		dqonoff_sem or dqptr_sem
release_dquot:	yes		dqonoff_sem or dqptr_sem
mark_dirty:	no		-
write_info:	yes		dqonoff_sem

FS recursion means calling ->quota_read() and ->quota_write() from superblock
operations.

More details about quota locking can be found in fs/dquot.c.

--------------------------- vm_operations_struct -----------------------------
prototypes:
	void (*open)(struct vm_area_struct*);
	void (*close)(struct vm_area_struct*);
	int (*fault)(struct vm_area_struct*, struct vm_fault *);
	int (*page_mkwrite)(struct vm_area_struct *, struct vm_fault *);
	int (*access)(struct vm_area_struct *, unsigned long, void*, int, int);

locking rules:
		BKL	mmap_sem	PageLocked(page)
open:		no	yes
close:		no	yes
fault:		no	yes		can return with page locked
page_mkwrite:	no	yes		can return with page locked
access:		no	yes

->fault() is called when a previously not present pte is about
to be faulted in. The filesystem must find and return the page associated
with the passed in "pgoff" in the vm_fault structure. If it is possible that
the page may be truncated and/or invalidated, then the filesystem must lock
the page, then ensure it is not already truncated (the page lock will block
subsequent truncate), and then return with VM_FAULT_LOCKED, and the page
locked. The VM will unlock the page.
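
The lock-then-recheck step can be sketched in user space as below. Everything
here is hypothetical: a fixed array stands in for the page cache, a NULL
->mapping marks a truncated page, and MODEL_FAULT_AGAIN is an invented code
meaning "lost a race with truncate, look the page up again" (a real handler
would typically retry the lookup itself).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical return codes, loosely named after the VM_FAULT_* ones. */
#define MODEL_VM_FAULT_LOCKED	0x1	/* page returned locked */
#define MODEL_VM_FAULT_SIGBUS	0x2	/* nothing at this offset */
#define MODEL_FAULT_AGAIN	0x4	/* hypothetical: retry the lookup */

#define MODEL_NR_PAGES 8

struct model_mapping;

struct model_page {
	bool locked;
	const struct model_mapping *mapping;	/* NULL once truncated */
};

struct model_mapping {
	struct model_page *pages[MODEL_NR_PAGES];	/* the "page cache" */
};

struct model_vm_fault {
	unsigned long pgoff;		/* input: offset to fault in */
	struct model_page *page;	/* output: the page, locked */
};

static int model_fault(const struct model_mapping *m,
		       struct model_vm_fault *vmf)
{
	struct model_page *page;

	if (vmf->pgoff >= MODEL_NR_PAGES || m->pages[vmf->pgoff] == NULL)
		return MODEL_VM_FAULT_SIGBUS;
	page = m->pages[vmf->pgoff];

	page->locked = true;		/* lock the page: holds off truncate */
	if (page->mapping != m) {	/* truncated before we got the lock */
		page->locked = false;
		return MODEL_FAULT_AGAIN;
	}
	vmf->page = page;
	return MODEL_VM_FAULT_LOCKED;	/* the VM unlocks the page later */
}
```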

->page_mkwrite() is called when a previously read-only pte is
about to become writeable. The filesystem again must ensure that there are
no truncate/invalidate races, and then return with the page locked. If
the page has been truncated, the filesystem should not look up a new page
like the ->fault() handler, but simply return with VM_FAULT_NOPAGE, which
will cause the VM to retry the fault.
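
A matching user-space sketch of ->page_mkwrite(), under the same kind of
assumptions: MODEL_VM_FAULT_LOCKED and MODEL_VM_FAULT_NOPAGE are hypothetical
stand-ins for VM_FAULT_LOCKED and VM_FAULT_NOPAGE, a NULL ->mapping marks a
truncated page, and space_reserved stands for whatever filesystem-specific
preparation the write needs.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for VM_FAULT_LOCKED and VM_FAULT_NOPAGE. */
#define MODEL_VM_FAULT_LOCKED	0x1	/* page returned locked */
#define MODEL_VM_FAULT_NOPAGE	0x2	/* VM retries the whole fault */

struct model_mapping {
	int id;
};

struct model_page {
	bool locked;
	const struct model_mapping *mapping;	/* NULL once truncated */
	bool space_reserved;			/* e.g. blocks set aside for the write */
};

/* A read-only pte is about to become writeable. */
static int model_page_mkwrite(const struct model_mapping *m,
			      struct model_page *page)
{
	page->locked = true;		/* no truncate/invalidate race from here */
	if (page->mapping != m) {
		/* Truncated: don't look up a new page like ->fault() would. */
		page->locked = false;
		return MODEL_VM_FAULT_NOPAGE;
	}
	page->space_reserved = true;	/* filesystem-specific preparation */
	return MODEL_VM_FAULT_LOCKED;
}
```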
Linus Torvalds	1da177e	2005-04-16 15:20:36 -0700	[diff] [blame]	516
Rik van Riel	28b2ee2	2008-07-23 21:27:05 -0700	[diff] [blame]	517	->access() is called when get_user_pages() fails in
				518	acces_process_vm(), typically used to debug a process through
				519	/proc/pid/mem or ptrace. This function is needed only for
				520	VM_IO \| VM_PFNMAP VMAs.
				521
================================================================================
			Dubious stuff

(if you break something or notice that it is broken and do not fix it yourself,
at least put it here)

ipc/shm.c::shm_delete() - may need BKL.
->read() and ->write() in many drivers are (probably) missing BKL.