Changes since 2.5.0:

---
[recommended]

New helpers: sb_bread(), sb_getblk(), sb_find_get_block(), set_bh(),
	sb_set_blocksize() and sb_min_blocksize().

Use them.

(sb_find_get_block() replaces 2.4's get_hash_table())

---
[recommended]

New methods: ->alloc_inode() and ->destroy_inode().

Remove inode->u.foo_inode_i.
Declare
	struct foo_inode_info {
		/* fs-private stuff */
		struct inode vfs_inode;
	};
	static inline struct foo_inode_info *FOO_I(struct inode *inode)
	{
		return list_entry(inode, struct foo_inode_info, vfs_inode);
	}

Use FOO_I(inode) instead of &inode->u.foo_inode_i.

Add foo_alloc_inode() and foo_destroy_inode(): the former should allocate
foo_inode_info and return the address of ->vfs_inode, the latter should free
FOO_I(inode) (see in-tree filesystems for examples).

Make them ->alloc_inode and ->destroy_inode in your super_operations.

Keep in mind that you now need explicit initialization of private data,
typically between calling iget_locked() and unlocking the inode.

At some point that will become mandatory.

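A minimal userspace sketch of the embedding pattern above, for illustration only. Everything here is a stand-in for kernel code: struct inode is a one-field stub, container_of() is re-created with offsetof() (in the kernel, list_entry() is a wrapper around container_of()), the superblock argument of the real ->alloc_inode() is dropped, calloc()/free() replace the inode slab cache, and foo_flags is a made-up private field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-in for the kernel's container_of()/list_entry(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Stub: the real struct inode has many more fields. */
struct inode {
	unsigned long i_ino;
};

struct foo_inode_info {
	unsigned int foo_flags;		/* fs-private stuff (hypothetical) */
	struct inode vfs_inode;		/* embedded VFS inode */
};

static inline struct foo_inode_info *FOO_I(struct inode *inode)
{
	return container_of(inode, struct foo_inode_info, vfs_inode);
}

/* ->alloc_inode(): allocate the container, hand the VFS the embedded inode. */
static struct inode *foo_alloc_inode(void)
{
	struct foo_inode_info *fi = calloc(1, sizeof(*fi));

	if (!fi)
		return NULL;
	return &fi->vfs_inode;
}

/* ->destroy_inode(): free the whole container, not just the inode. */
static void foo_destroy_inode(struct inode *inode)
{
	free(FOO_I(inode));
}
```

The point to notice is that ->destroy_inode() frees FOO_I(inode), the start of the container, not the inode pointer the VFS was handed.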
---
[mandatory]

Change of file_system_type method (->read_super to ->get_sb)

->read_super() is no more. Ditto for DECLARE_FSTYPE and DECLARE_FSTYPE_DEV.

Turn your foo_read_super() into a function that would return 0 in case of
success and a negative number in case of error (-EINVAL unless you have a more
informative error value to report). Call it foo_fill_super(). Now declare

int foo_get_sb(struct file_system_type *fs_type,
	int flags, const char *dev_name, void *data, struct vfsmount *mnt)
{
	return get_sb_bdev(fs_type, flags, dev_name, data, foo_fill_super,
			   mnt);
}

(or similar with s/bdev/nodev/ or s/bdev/single/, depending on the kind of
filesystem).

Replace DECLARE_FSTYPE... with an explicit initializer and have ->get_sb set as
foo_get_sb.

---
[mandatory]

Locking change: ->s_vfs_rename_sem is taken only by cross-directory renames.
Most likely there is no need to change anything, but if you relied on
global exclusion between renames for some internal purpose, you need to
change your internal locking. Otherwise the exclusion guarantees remain the
same (i.e. parents and victim are locked, etc.).

---
[informational]

Now we have exclusion between ->lookup() and directory removal (by
->rmdir() and ->rename()). If you used to need that exclusion and did
it by internal locking (most filesystems couldn't care less), you
can relax your locking.

---
[mandatory]

->lookup(), ->truncate(), ->create(), ->unlink(), ->mknod(), ->mkdir(),
->rmdir(), ->link(), ->lseek(), ->symlink(), ->rename()
and ->readdir() are called without BKL now. Grab it on entry, drop it upon
return - that will guarantee the same locking you used to have. If your method
or parts of it do not need BKL, better yet: now you can shift lock_kernel() and
unlock_kernel() so that they protect exactly what needs to be
protected.

---
[mandatory]

BKL is also moved from around sb operations. BKL should have been shifted into
individual fs sb_op functions. If you don't need it, remove it.

---
[informational]

The check that the ->link() target is not a directory is done by callers. Feel
free to drop it...

---
[informational]

->link() callers hold ->i_mutex on the object we are linking to. Some of your
problems might be over...

---
[mandatory]

New file_system_type method - kill_sb(superblock). If you are converting
an existing filesystem, set it according to ->fs_flags:
	FS_REQUIRES_DEV		-	kill_block_super
	FS_LITTER		-	kill_litter_super
	neither			-	kill_anon_super
FS_LITTER is gone - just remove it from fs_flags.

---
[mandatory]

	FS_SINGLE is gone (actually, that had happened back when ->get_sb()
went in - and hadn't been documented ;-/). Just remove it from fs_flags
(and see ->get_sb() entry for other actions).

---
[mandatory]

->setattr() is called without BKL now. Caller _always_ holds ->i_mutex, so
watch for ->i_mutex-grabbing code that might be used by your ->setattr().
Callers of notify_change() need ->i_mutex now.

---
[recommended]

New super_block field "struct export_operations *s_export_op" for
explicit support for exporting, e.g. via NFS. The structure is fully
documented at its declaration in include/linux/fs.h, and in
Documentation/filesystems/nfs/Exporting.

Briefly, it allows for the definition of decode_fh and encode_fh operations
to encode and decode filehandles, and allows the filesystem to use
a standard helper function for decode_fh, and provide filesystem-specific
support for this helper, particularly get_parent.

It is planned that this will be required for exporting once the code
settles down a bit.

---
[mandatory]

s_export_op is now required for exporting a filesystem.
isofs, ext2, ext3, reiserfs, fat
can be used as examples of very different filesystems.

---
[mandatory]

iget4() and the read_inode2 callback have been superseded by iget5_locked()
which has the following prototype:

    struct inode *iget5_locked(struct super_block *sb, unsigned long ino,
				int (*test)(struct inode *, void *),
				int (*set)(struct inode *, void *),
				void *data);

'test' is an additional function that can be used when the inode
number is not sufficient to identify the actual file object. 'set'
should be a non-blocking function that initializes those parts of a
newly created inode to allow the test function to succeed. 'data' is
passed as an opaque value to both test and set functions.

When the inode has been created by iget5_locked(), it will be returned with the
I_NEW flag set and will still be locked. The filesystem then needs to finalize
the initialization. Once the inode is initialized it must be unlocked by
calling unlock_new_inode().

The filesystem is responsible for setting (and possibly testing) i_ino
when appropriate. There is also a simpler iget_locked function that
just takes the superblock and inode number as arguments and does the
test and set for you.

e.g.
	inode = iget_locked(sb, ino);
	if (!inode)			/* inode allocation failed */
		return -ENOMEM;
	if (inode->i_state & I_NEW) {
		err = read_inode_from_disk(inode);	/* your fs-specific reader */
		if (err < 0) {
			iget_failed(inode);
			return err;
		}
		unlock_new_inode(inode);
	}

Note that if the process of setting up a new inode fails, then iget_failed()
should be called on the inode to render it dead, and an appropriate error
should be passed back to the caller.

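To show why 'test' and 'set' exist, here is a toy, single-threaded model of the lookup described above. It is not the kernel implementation: toy_iget5_locked(), toy_unlock_new_inode(), the fixed-size table, TOY_I_NEW and the objid key are all hypothetical, and the locking and the waiting on I_NEW that the real function does are left out.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_I_NEW	0x1	/* stand-in for the kernel's I_NEW state bit */
#define TOY_MAX_INODES	16

struct toy_inode {
	unsigned long hashval;		/* value passed as 'ino' to the lookup */
	unsigned long i_ino;		/* set by the filesystem's set() callback */
	unsigned int i_state;
	unsigned long long objid;	/* hypothetical fs-private identity */
};

static struct toy_inode *toy_table[TOY_MAX_INODES];
static int toy_count;

/*
 * Toy model of iget5_locked(): inodes sharing a hash value are told apart
 * by test(); a miss allocates a new inode, lets set() initialize enough of
 * it for test() to succeed later, and returns it with TOY_I_NEW set.
 */
static struct toy_inode *toy_iget5_locked(unsigned long hashval,
		int (*test)(struct toy_inode *, void *),
		int (*set)(struct toy_inode *, void *),
		void *data)
{
	struct toy_inode *inode;

	for (int i = 0; i < toy_count; i++) {
		inode = toy_table[i];
		if (inode->hashval == hashval && test(inode, data))
			return inode;			/* cache hit */
	}
	if (toy_count == TOY_MAX_INODES)
		return NULL;
	inode = calloc(1, sizeof(*inode));
	if (!inode)
		return NULL;
	inode->hashval = hashval;
	if (set(inode, data)) {				/* set() failed */
		free(inode);
		return NULL;
	}
	inode->i_state = TOY_I_NEW;
	toy_table[toy_count++] = inode;
	return inode;
}

static void toy_unlock_new_inode(struct toy_inode *inode)
{
	inode->i_state &= ~TOY_I_NEW;
}

/* Example callbacks keyed on a 64-bit object id passed through 'data'. */
static int foo_test(struct toy_inode *inode, void *data)
{
	return inode->objid == *(unsigned long long *)data;
}

static int foo_set(struct toy_inode *inode, void *data)
{
	inode->objid = *(unsigned long long *)data;
	inode->i_ino = (unsigned long)inode->objid;	/* fs sets i_ino itself */
	return 0;
}
```

For example, two objects whose ids map to the same hash value passed as 'ino' end up as two separate inodes, because foo_test() compares the full id; a lookup keyed only on the inode number could not tell them apart.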
---
[recommended]

->getattr() is finally getting used. See instances in nfs, minix, etc.

---
[mandatory]

->revalidate() is gone. If your filesystem had it, provide ->getattr()
and let it call whatever you had as ->revalidate() + (for symlinks that
had ->revalidate()) add calls in ->follow_link()/->readlink().

---
[mandatory]

->d_parent changes are not protected by BKL anymore. Read access is safe
if at least one of the following is true:
	* filesystem has no cross-directory rename()
	* we know that parent had been locked (e.g. we are looking at
	  ->d_parent of ->lookup() argument).
	* we are called from ->rename().
	* the child's ->d_lock is held
Audit your code and add locking if needed. Notice that any place that is
not protected by the conditions above is risky even in the old tree - you
had been relying on BKL and that's prone to screwups. Old tree had quite
a few holes of that kind - unprotected access to ->d_parent leading to
anything from oops to silent memory corruption.

---
[mandatory]

	FS_NOMOUNT is gone. If you use it, just set MS_NOUSER in flags
(see rootfs for one kind of solution and bdev/socket/pipe for another).

---
[recommended]

	Use bdev_read_only(bdev) instead of is_read_only(kdev). The latter
is still alive, but only because of the mess in drivers/s390/block/dasd.c.
As soon as it gets fixed is_read_only() will die.

---
[mandatory]

->permission() is called without BKL now. Grab it on entry, drop it upon
return - that will guarantee the same locking you used to have. If
your method or parts of it do not need BKL, better yet: now you can
shift lock_kernel() and unlock_kernel() so that they protect
exactly what needs to be protected.

---
[mandatory]

->statfs() is now called without BKL held. BKL should have been
shifted into individual fs sb_op functions where it's not clear that
it's safe to remove it. If you don't need it, remove it.

---
[mandatory]

	is_read_only() is gone; use bdev_read_only() instead.

---
[mandatory]

	destroy_buffers() is gone; use invalidate_bdev().

---
[mandatory]

	fsync_dev() is gone; use fsync_bdev(). NOTE: lvm breakage is
deliberate; as soon as struct block_device * is propagated in a reasonable
way by that code fixing will become trivial; until then nothing can be
done.

---
[mandatory]

	Block truncation on error exit from ->write_begin, and ->direct_IO
moved from generic methods (block_write_begin, cont_write_begin,
nobh_write_begin, blockdev_direct_IO*) to callers. Take a look at
ext2_write_failed and callers for an example.

---
[mandatory]

	->truncate is gone. The whole truncate sequence needs to be
implemented in ->setattr, which is now mandatory for filesystems
implementing on-disk size changes. Start with a copy of the old inode_setattr
and vmtruncate, and then reorder the vmtruncate + foofs_vmtruncate sequence
into this order: zeroing blocks using block_truncate_page or similar helpers,
size update, and finally on-disk truncation, which should not fail.
inode_change_ok now includes the size checks for ATTR_SIZE and must be called
in the beginning of ->setattr unconditionally.

---
[mandatory]

	->clear_inode() and ->delete_inode() are gone; ->evict_inode() should
be used instead. It gets called whenever the inode is evicted, whether it has
remaining links or not. Caller does *not* evict the pagecache or inode-associated
metadata buffers; the method has to use truncate_inode_pages_final() to get rid
of those. Caller makes sure async writeback cannot be running for the inode while
(or after) ->evict_inode() is called.

	->drop_inode() returns int now; it's called on final iput() with
inode->i_lock held and it returns true if the filesystem wants the inode to be
dropped. As before, generic_drop_inode() is still the default and it's been
updated appropriately. generic_delete_inode() is also alive and it consists
simply of return 1. Note that all actual eviction work is done by caller after
->drop_inode() returns.

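A sketch of the decision ->drop_inode() makes, using a stub inode. The drop rule below (no links left, or no longer hashed) is a reading of generic_drop_inode() in the kernels this file covers and should be checked against fs/inode.c; the toy_ names and the 'hashed' field (standing in for !inode_unhashed()) are hypothetical.

```c
#include <assert.h>

/* Stub with only the fields the decision needs. */
struct toy_inode {
	unsigned int i_nlink;
	int hashed;			/* stands in for !inode_unhashed(inode) */
};

/*
 * Model of generic_drop_inode(): drop the inode on final iput() if it has
 * no links left or is no longer in the inode hash; otherwise keep it cached.
 */
static int toy_generic_drop_inode(const struct toy_inode *inode)
{
	return !inode->i_nlink || !inode->hashed;
}

/* Model of generic_delete_inode(): always drop. */
static int toy_generic_delete_inode(const struct toy_inode *inode)
{
	(void)inode;
	return 1;
}
```

Returning 0 leaves a still-linked, still-hashed inode cached after the last iput(); returning 1 sends it to eviction, which the caller then performs.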
	As before, clear_inode() must be called exactly once on each call of
->evict_inode() (as it used to be for each call of ->delete_inode()). Unlike
before, if you are using inode-associated metadata buffers (i.e.
mark_buffer_dirty_inode()), it's your responsibility to call
invalidate_inode_buffers() before clear_inode().

	NOTE: checking i_nlink in the beginning of ->write_inode() and bailing out
if it's zero is not *and* *never* *had* *been* enough. Final unlink() and iput()
may happen while the inode is in the middle of ->write_inode(); e.g. if you blindly
free the on-disk inode, you may end up doing that while ->write_inode() is writing
to it.

---
[mandatory]

	.d_delete() now only advises the dcache as to whether or not to cache
unreferenced dentries, and is now only called when the dentry refcount goes to
0. Even on 0 refcount transition, it must be able to tolerate being called 0,
1, or more times (i.e. constant, idempotent).

---
[mandatory]

	.d_compare() calling convention and locking rules are significantly
changed. Read updated documentation in Documentation/filesystems/vfs.txt (and
look at examples of other filesystems) for guidance.

---
[mandatory]

	.d_hash() calling convention and locking rules are significantly
changed. Read updated documentation in Documentation/filesystems/vfs.txt (and
look at examples of other filesystems) for guidance.

---
[mandatory]
	dcache_lock is gone, replaced by fine grained locks. See fs/dcache.c
for details of what locks to replace dcache_lock with in order to protect
particular things. Most of the time, a filesystem only needs ->d_lock, which
protects *all* the dcache state of a given dentry.

--
[mandatory]

	Filesystems must RCU-free their inodes, if they can have been accessed
via rcu-walk path walk (basically, if the file can have had a path name in the
vfs namespace).

	Even though i_dentry and i_rcu share storage in a union, we will
initialize the former in inode_init_always(), so just leave it alone in
the callback. It used to be necessary to clean it there, but not anymore
(starting at 3.2).

--
[recommended]
	vfs now tries to do path walking in "rcu-walk mode", which avoids
atomic operations and scalability hazards on dentries and inodes (see
Documentation/filesystems/path-lookup.txt). d_hash and d_compare changes
(above) are examples of the changes required to support this. For more complex
filesystem callbacks, the vfs drops out of rcu-walk mode before the fs call, so
no changes are required to the filesystem. However, this is costly and loses
the benefits of rcu-walk mode. We will begin to add filesystem callbacks that
are rcu-walk aware, shown below. Filesystems should take advantage of this
where possible.

--
[mandatory]
	d_revalidate is a callback that is made on every path element (if
the filesystem provides it), which requires dropping out of rcu-walk mode. This
may now be called in rcu-walk mode (nd->flags & LOOKUP_RCU). -ECHILD should be
returned if the filesystem cannot handle rcu-walk. See
Documentation/filesystems/vfs.txt for more details.

	permission and check_acl are inode permission checks that are called
on many or all directory inodes on the way down a path walk (to check for
exec permission). These must now be rcu-walk aware (flags & IPERM_FLAG_RCU).
See Documentation/filesystems/vfs.txt for more details.

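A userspace sketch of what "rcu-walk aware" means for ->d_revalidate(). Assumptions: TOY_LOOKUP_RCU stands in for LOOKUP_RCU, the flags arrive as a plain unsigned int (the form later entries in this file describe) instead of via nd->flags, and struct toy_dentry with its 'valid' and 'must_block' fields is invented for the example.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for LOOKUP_RCU; the real value lives in namei.h. */
#define TOY_LOOKUP_RCU	0x0040

struct toy_dentry {
	int valid;		/* result the check would produce */
	int must_block;		/* e.g. needs a round trip to a server */
};

/*
 * Sketch of an rcu-walk aware ->d_revalidate(): 1 = valid, 0 = invalid,
 * negative = error. In rcu-walk mode it may not sleep, so any check that
 * could block returns -ECHILD and the VFS retries in ref-walk mode.
 */
static int foo_d_revalidate(const struct toy_dentry *dentry, unsigned int flags)
{
	if (dentry->must_block && (flags & TOY_LOOKUP_RCU))
		return -ECHILD;
	return dentry->valid ? 1 : 0;
}
```

The cheap path answers directly even in rcu-walk mode; only a check that might sleep bails out with -ECHILD, so the VFS drops to ref-walk and calls the method again without LOOKUP_RCU set.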
--
[mandatory]
	In ->fallocate() you must check the mode option passed in. If your
filesystem does not support hole punching (deallocating space in the middle of a
file) you must return -EOPNOTSUPP if FALLOC_FL_PUNCH_HOLE is set in mode.
Currently you can only have FALLOC_FL_PUNCH_HOLE with FALLOC_FL_KEEP_SIZE set,
so the i_size should not change when hole punching, even when punching the end of
a file off.

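A sketch of the mode checks described above, for a hypothetical filesystem that may or may not support hole punching. foo_fallocate_mode() and its can_punch/new_size parameters are made up; the flag values match the kernel's falloc.h. Only the mode and i_size logic is modeled; no blocks are allocated or freed.

```c
#include <assert.h>
#include <errno.h>

/* Flag values as defined in the kernel's falloc.h. */
#ifndef FALLOC_FL_KEEP_SIZE
#define FALLOC_FL_KEEP_SIZE	0x01
#endif
#ifndef FALLOC_FL_PUNCH_HOLE
#define FALLOC_FL_PUNCH_HOLE	0x02
#endif

/*
 * Hypothetical mode check for a ->fallocate(). On success, *new_size
 * receives the i_size the file would have afterwards.
 */
static int foo_fallocate_mode(int mode, int can_punch, long long i_size,
			      long long offset, long long len,
			      long long *new_size)
{
	if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE))
		return -EOPNOTSUPP;		/* unknown mode bits */
	if (mode & FALLOC_FL_PUNCH_HOLE) {
		if (!can_punch)
			return -EOPNOTSUPP;	/* no hole punching support */
		if (!(mode & FALLOC_FL_KEEP_SIZE))
			return -EOPNOTSUPP;	/* invalid combination */
		*new_size = i_size;		/* punching never changes i_size */
		return 0;
	}
	if (!(mode & FALLOC_FL_KEEP_SIZE) && offset + len > i_size)
		*new_size = offset + len;	/* plain preallocation may extend */
	else
		*new_size = i_size;
	return 0;
}
```

Punching the tail of a 4096-byte file, for example, leaves i_size at 4096, while a plain preallocation past EOF without FALLOC_FL_KEEP_SIZE grows it.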
--
[mandatory]
	->get_sb() is gone. Switch to use of ->mount(). Typically it's just
a matter of switching from calling get_sb_... to mount_... and changing the
function type. If you were doing it manually, just switch from setting ->mnt_root
to some pointer to returning that pointer. On errors return ERR_PTR(...).

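Returning ERR_PTR(...) from ->mount() relies on the kernel's error-pointer convention, where the top MAX_ERRNO addresses encode a negative errno. Below is a simplified userspace re-creation of those helpers, modeled on include/linux/err.h, plus a hypothetical foo_mount() that returns either a root dentry or an encoded error; struct toy_dentry is a stub.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Userspace re-creation of the kernel's error-pointer helpers. */
#define MAX_ERRNO	4095

static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	/* the last MAX_ERRNO addresses are reserved for -errno values */
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

struct toy_dentry {
	const char *name;
};

static struct toy_dentry toy_root = { "/" };

/* Hypothetical ->mount()-style function: root dentry or ERR_PTR(-errno). */
static struct toy_dentry *foo_mount(const char *dev_name)
{
	if (dev_name == NULL)
		return ERR_PTR(-EINVAL);
	return &toy_root;
}
```

A caller checks IS_ERR() before using the result and recovers the errno with PTR_ERR(); a NULL return would not be recognized as an error by that check.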
--
[mandatory]
	->permission() and generic_permission() have lost the flags
argument; instead of passing IPERM_FLAG_RCU we add MAY_NOT_BLOCK into mask.
	generic_permission() has also lost the check_acl argument; ACL checking
has been taken to VFS and filesystems need to provide a non-NULL ->i_op->get_acl
to read an ACL from disk.

--
[mandatory]
	If you implement your own ->llseek() you must handle SEEK_HOLE and
SEEK_DATA. You can handle this by returning -EINVAL, but it would be nicer to
support it in some way. The generic handler assumes that the entire file is
data and there is a virtual hole at the end of the file. So if the provided
offset is less than i_size and SEEK_DATA is specified, return the same offset.
If the above is true for the offset and you are given SEEK_HOLE, return the end
of the file. If the offset is i_size or greater, return -ENXIO in either case.

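The generic rule above can be written as a small helper. This is a sketch, not the kernel's generic llseek code: foo_seek_data_hole() is hypothetical, handles only SEEK_DATA and SEEK_HOLE, and rejecting negative offsets with -EINVAL is an assumption of the sketch.

```c
#include <assert.h>
#include <errno.h>

/* Linux values of the whence constants, in case libc headers lack them. */
#ifndef SEEK_DATA
#define SEEK_DATA	3
#endif
#ifndef SEEK_HOLE
#define SEEK_HOLE	4
#endif

/*
 * Hypothetical helper implementing the generic rule: the whole file is data
 * and there is one virtual hole starting at i_size.
 */
static long long foo_seek_data_hole(long long offset, int whence,
				    long long i_size)
{
	if (whence != SEEK_DATA && whence != SEEK_HOLE)
		return -EINVAL;		/* only these two are handled here */
	if (offset < 0)
		return -EINVAL;		/* assumption: negative offsets rejected */
	if (offset >= i_size)
		return -ENXIO;		/* at or past EOF: no data, no hole */
	return whence == SEEK_DATA ? offset : i_size;
}
```

A filesystem that tracks real extents would replace the final line with a search of its own block map, keeping the -ENXIO rule for offsets at or beyond i_size.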
--
[mandatory]
	If you have your own ->fsync() you must make sure to call
filemap_write_and_wait_range() so that all dirty pages are synced out properly.
You must also keep in mind that ->fsync() is not called with i_mutex held
anymore, so if you require i_mutex locking you must make sure to take it and
release it yourself.

--
[mandatory]
	d_alloc_root() is gone, along with a lot of bugs caused by code
misusing it. Replacement: d_make_root(inode). The difference is that
d_make_root() drops the reference to inode if dentry allocation fails.

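A refcount-only model of the d_make_root() contract. toy_d_make_root(), toy_iput() and the fail_alloc knob are hypothetical; the NULL-inode case follows fs/dcache.c of that period, but is worth verifying against your tree.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_inode {
	int i_count;			/* reference count */
};

struct toy_dentry {
	struct toy_inode *d_inode;
};

static void toy_iput(struct toy_inode *inode)
{
	inode->i_count--;
}

/*
 * Model of d_make_root(): the inode reference is consumed either way.
 * On success it is owned by the new dentry; on allocation failure it is
 * dropped here, so the caller must not iput() it again. A NULL inode
 * (e.g. from a failed iget) simply yields NULL. 'fail_alloc' is a test knob.
 */
static struct toy_dentry *toy_d_make_root(struct toy_inode *inode,
					  int fail_alloc)
{
	struct toy_dentry *res = NULL;

	if (inode) {
		res = fail_alloc ? NULL : malloc(sizeof(*res));
		if (res)
			res->d_inode = inode;
		else
			toy_iput(inode);
	}
	return res;
}
```

In practice this lets a fill_super() do sb->s_root = d_make_root(inode); if (!sb->s_root) return -ENOMEM; with no iput() on the failure path.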
--
[mandatory]
	The witch is dead! Well, 2/3 of it, anyway. ->d_revalidate() and
->lookup() do *not* take struct nameidata anymore; just the flags.
--
[mandatory]
	->create() doesn't take struct nameidata *; unlike the previous
two, it gets an "is it an O_EXCL or equivalent?" boolean argument. Note that
local filesystems can ignore that argument - they are guaranteed that the
object doesn't exist. It's remote/distributed ones that might care...
--
[mandatory]
	FS_REVAL_DOT is gone; if you used to have it, add ->d_weak_revalidate()
in your dentry operations instead.
--
[mandatory]
	vfs_readdir() is gone; switch to iterate_dir() instead.
--
[mandatory]
	->readdir() is gone now; switch to ->iterate().
--
[mandatory]
	vfs_follow_link has been removed. Filesystems must use nd_set_link
	from ->follow_link for normal symlinks, or nd_jump_link for magic
	/proc/<pid> style links.
--
[mandatory]
	iget5_locked()/ilookup5()/ilookup5_nowait() test() callback used to be
	called with both ->i_lock and inode_hash_lock held; the former is *not*
	taken anymore, so verify that your callbacks do not rely on it (none
	of the in-tree instances did). inode_hash_lock is still held,
	of course, so they are still serialized wrt removal from inode hash,
	as well as wrt set() callback of iget5_locked().