Changes since 2.5.0:

---
[recommended]

New helpers: sb_bread(), sb_getblk(), sb_find_get_block(), set_bh(),
	sb_set_blocksize() and sb_min_blocksize().

Use them.

(sb_find_get_block() replaces 2.4's get_hash_table())

---
[recommended]

New methods: ->alloc_inode() and ->destroy_inode().

Remove inode->u.foo_inode_i.
Declare
	struct foo_inode_info {
		/* fs-private stuff */
		struct inode vfs_inode;
	};
	static inline struct foo_inode_info *FOO_I(struct inode *inode)
	{
		return list_entry(inode, struct foo_inode_info, vfs_inode);
	}

Use FOO_I(inode) instead of &inode->u.foo_inode_i.

Add foo_alloc_inode() and foo_destroy_inode(): the former should allocate
foo_inode_info and return the address of ->vfs_inode, the latter should free
FOO_I(inode) (see in-tree filesystems for examples).

Make them ->alloc_inode and ->destroy_inode in your super_operations.

Keep in mind that you now need explicit initialization of private data,
typically between calling iget_locked() and unlocking the inode.

At some point that will become mandatory.

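The embedding described above can be tried out in userspace. This is a model, not kernel code: struct inode is a hypothetical one-field stand-in, container_of() is defined locally with offsetof() (in the kernel, list_entry() expands to container_of()), calloc()/free() replace the slab allocator, and foo_alloc_inode() drops the struct super_block * argument that the real ->alloc_inode() receives.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace copy of the kernel's container_of(); list_entry() expands to it. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, heavily reduced stand-in for the VFS struct inode. */
struct inode {
	unsigned long i_ino;
};

struct foo_inode_info {
	unsigned int foo_flags;		/* fs-private stuff (hypothetical) */
	struct inode vfs_inode;		/* embedded, not a pointer */
};

static inline struct foo_inode_info *FOO_I(struct inode *inode)
{
	return container_of(inode, struct foo_inode_info, vfs_inode);
}

/* ->alloc_inode(): allocate the container, return the embedded inode. */
static struct inode *foo_alloc_inode(void)
{
	struct foo_inode_info *fi = calloc(1, sizeof(*fi));

	if (!fi)
		return NULL;
	/* Private data needs explicit initialization; calloc() zeroes it here. */
	return &fi->vfs_inode;
}

/* ->destroy_inode(): free the whole container, not just the inode. */
static void foo_destroy_inode(struct inode *inode)
{
	assert(inode != NULL);
	free(FOO_I(inode));
}
```

Because FOO_I() is plain pointer arithmetic it costs nothing at runtime; the only rule is that every struct inode passed to it really lives inside a foo_inode_info, which is why ->alloc_inode and ->destroy_inode must come as a pair.
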
---
[mandatory]

Change of file_system_type method (->read_super to ->get_sb)

->read_super() is no more. Ditto for DECLARE_FSTYPE and DECLARE_FSTYPE_DEV.

Turn your foo_read_super() into a function that returns 0 on success and a
negative number on error (-EINVAL unless you have a more informative error
value to report). Call it foo_fill_super(). Now declare

int foo_get_sb(struct file_system_type *fs_type,
	int flags, const char *dev_name, void *data, struct vfsmount *mnt)
{
	return get_sb_bdev(fs_type, flags, dev_name, data, foo_fill_super,
			   mnt);
}

(or similar with s/bdev/nodev/ or s/bdev/single/, depending on the kind of
filesystem).

Replace DECLARE_FSTYPE... with an explicit initializer and set ->get_sb to
foo_get_sb.

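The return-value convention for foo_fill_super() can be sketched in userspace. The real callback handed to get_sb_bdev() takes (struct super_block *sb, void *data, int silent); in this model the superblock is replaced by a hypothetical foo_sb_info and the on-disk bytes are passed in directly. FOO_MAGIC, the on-disk layout and the block-size check are all assumptions for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FOO_MAGIC 0x464f4f31u	/* hypothetical on-disk magic */

/* Hypothetical on-disk superblock layout. */
struct foo_disk_super {
	uint32_t s_magic;
	uint32_t s_block_size;
};

/* Hypothetical in-memory state that ->s_fs_info would point to. */
struct foo_sb_info {
	uint32_t block_size;
};

/* Returns 0 on success or a negative errno; -EINVAL is the default. */
static int foo_fill_super(struct foo_sb_info *sbi, const void *raw,
			  size_t len, int silent)
{
	struct foo_disk_super ds;

	(void)silent;	/* real code uses it to suppress error messages */
	if (len < sizeof(ds))
		return -EINVAL;
	memcpy(&ds, raw, sizeof(ds));
	if (ds.s_magic != FOO_MAGIC)
		return -EINVAL;
	/* Reject block sizes that are zero or not a power of two. */
	if (ds.s_block_size == 0 ||
	    (ds.s_block_size & (ds.s_block_size - 1)) != 0)
		return -EINVAL;
	sbi->block_size = ds.s_block_size;
	return 0;
}
```

Only state that has passed every check is written into sbi, so a failed mount leaves nothing half-initialized for the caller to undo.
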
---
[mandatory]

Locking change: ->s_vfs_rename_sem is taken only by cross-directory renames.
Most likely there is no need to change anything, but if you relied on
global exclusion between renames for some internal purpose, you need to
change your internal locking. Otherwise the exclusion guarantees remain the
same (i.e. parents and victim are locked, etc.).

---
[informational]

Now we have exclusion between ->lookup() and directory removal (by
->rmdir() and ->rename()). If you used to need that exclusion and did
it by internal locking (most filesystems couldn't care less), you
can relax your locking.

---
[mandatory]

->lookup(), ->truncate(), ->create(), ->unlink(), ->mknod(), ->mkdir(),
->rmdir(), ->link(), ->lseek(), ->symlink(), ->rename()
and ->readdir() are called without BKL now. Grab it on entry, drop it upon
return; that will guarantee the same locking you used to have. If your
method or parts of it do not need BKL, better yet: you can now shift
lock_kernel() and unlock_kernel() so that they protect exactly what needs
to be protected.

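A common way to do this conversion is to take the lock at the top and funnel every return path through a single unlock label, so error exits cannot leak it. In this userspace sketch the BKL is replaced by a hypothetical depth counter (model_lock_kernel() and model_unlock_kernel() are not kernel APIs), and foo_unlink() takes a made-up link count and read-only flag instead of the real (struct inode *, struct dentry *) arguments.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for lock_kernel()/unlock_kernel(): they only
 * count how deeply the "BKL" is held so that leaks are detectable. */
static int bkl_depth;

static void model_lock_kernel(void)
{
	bkl_depth++;
}

static void model_unlock_kernel(void)
{
	assert(bkl_depth > 0);
	bkl_depth--;
}

/* Hypothetical ->unlink() body; nlink is the victim's link count. */
static int foo_unlink(int *nlink, int dir_is_readonly)
{
	int err = 0;

	model_lock_kernel();		/* grab it on entry ... */
	if (dir_is_readonly) {
		err = -EROFS;
		goto out;
	}
	if (*nlink <= 0) {
		err = -ENOENT;
		goto out;
	}
	(*nlink)--;
out:
	model_unlock_kernel();		/* ... and drop it on every return path */
	return err;
}
```

Once the whole body is bracketed like this, the lock_kernel()/unlock_kernel() pair can be narrowed step by step to just the code that still needs it.
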
---
[mandatory]

BKL is also moved from around sb operations. ->write_super() is now called
without BKL held. BKL should have been shifted into individual fs sb_op
functions. If you don't need it, remove it.

---
[informational]

The check that the ->link() target is not a directory is done by callers.
Feel free to drop it...

---
[informational]

->link() callers hold ->i_mutex on the object we are linking to. Some of your
problems might be over...

---
[mandatory]

New file_system_type method: kill_sb(superblock). If you are converting
an existing filesystem, set it according to ->fs_flags:
	FS_REQUIRES_DEV		-	kill_block_super
	FS_LITTER		-	kill_litter_super
	neither			-	kill_anon_super
FS_LITTER is gone; just remove it from fs_flags.

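The table can be read as a tiny decision function. In a real filesystem the choice is made once, statically, in the file_system_type initializer (for example .kill_sb = kill_block_super); this sketch only encodes the mapping from the old flags. The MODEL_FS_* bit values are hypothetical rather than the kernel's, the three kill_*_super() functions are userspace stubs that merely record which one ran, and checking FS_REQUIRES_DEV first is an assumption for the unlikely case where both flags are set.

```c
#include <assert.h>

/* Hypothetical bit values; the kernel's own definitions may differ. */
#define MODEL_FS_REQUIRES_DEV	0x1u
#define MODEL_FS_LITTER		0x2u

struct super_block;	/* opaque in this model */

/* Stubs standing in for the real helpers; they record which one ran. */
static int last_killed;

static void kill_block_super(struct super_block *sb)
{
	(void)sb;
	last_killed = 1;
}

static void kill_litter_super(struct super_block *sb)
{
	(void)sb;
	last_killed = 2;
}

static void kill_anon_super(struct super_block *sb)
{
	(void)sb;
	last_killed = 3;
}

typedef void (*kill_sb_fn)(struct super_block *);

/* Which ->kill_sb a converted filesystem should get, given its old flags. */
static kill_sb_fn pick_kill_sb(unsigned int old_fs_flags)
{
	if (old_fs_flags & MODEL_FS_REQUIRES_DEV)
		return kill_block_super;
	if (old_fs_flags & MODEL_FS_LITTER)
		return kill_litter_super;
	return kill_anon_super;
}
```
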
---
[mandatory]

	FS_SINGLE is gone (actually, that had happened back when ->get_sb()
went in - and hadn't been documented ;-/). Just remove it from fs_flags
(and see ->get_sb() entry for other actions).

---
[mandatory]

->setattr() is called without BKL now. Caller _always_ holds ->i_mutex, so
watch for ->i_mutex-grabbing code that might be used by your ->setattr().
Callers of notify_change() need ->i_mutex now.

---
[recommended]

New super_block field "struct export_operations *s_export_op" for
explicit support for exporting, e.g. via NFS. The structure is fully
documented at its declaration in include/linux/fs.h, and in
Documentation/filesystems/nfs/Exporting.

Briefly, it allows for the definition of decode_fh and encode_fh operations
to encode and decode filehandles, and allows the filesystem to use
a standard helper function for decode_fh and provide filesystem-specific
support for this helper, particularly get_parent.

It is planned that this will be required for exporting once the code
settles down a bit.

[mandatory]

s_export_op is now required for exporting a filesystem.
isofs, ext2, ext3, reiserfs, fat
can be used as examples of very different filesystems.

---
[mandatory]

iget4() and the read_inode2 callback have been superseded by iget5_locked(),
which has the following prototype:

	struct inode *iget5_locked(struct super_block *sb, unsigned long ino,
				   int (*test)(struct inode *, void *),
				   int (*set)(struct inode *, void *),
				   void *data);

'test' is an additional function that can be used when the inode
number is not sufficient to identify the actual file object. 'set'
should be a non-blocking function that initializes those parts of a
newly created inode that allow the test function to succeed. 'data' is
passed as an opaque value to both test and set functions.

When the inode has been created by iget5_locked(), it will be returned with the
I_NEW flag set and will still be locked. The filesystem then needs to finalize
the initialization. Once the inode is initialized it must be unlocked by
calling unlock_new_inode().

The filesystem is responsible for setting (and possibly testing) i_ino
when appropriate. There is also a simpler iget_locked() function that
just takes the superblock and inode number as arguments and does the
test and set for you.

e.g.
	inode = iget_locked(sb, ino);
	if (!inode)
		return -ENOMEM;
	if (inode->i_state & I_NEW) {
		err = read_inode_from_disk(inode);
		if (err < 0) {
			iget_failed(inode);
			return err;
		}
		unlock_new_inode(inode);
	}

Note that if the process of setting up a new inode fails, then iget_failed()
should be called on the inode to render it dead, and an appropriate error
should be passed back to the caller.

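To make the roles of 'test', 'set' and I_NEW concrete, here is a userspace model of the lookup-or-create logic. model_iget5_locked(), model_unlock_new_inode(), MODEL_I_NEW and the fixed-size cache are hypothetical: the real iget5_locked() uses the inode hash table, takes locks, waits for I_NEW to clear on a cache hit, and returns a new inode locked. In the model, two objects that share a hash value stay distinct because 'test' compares the full 64-bit object id carried in 'data'; how foo_set() derives i_ino is likewise an assumption.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_I_NEW	0x1u	/* stand-in for the kernel's I_NEW */
#define MODEL_NR_INODES	16

struct model_inode {
	int in_use;
	unsigned long hashval;		/* value passed to the lookup */
	unsigned long i_ino;		/* set by the filesystem, not the cache */
	unsigned int i_state;
	unsigned long long objid;	/* hypothetical full fs-private key */
};

static struct model_inode cache[MODEL_NR_INODES];

/* Lookup-or-create in the spirit of iget5_locked(); no locking at all. */
static struct model_inode *
model_iget5_locked(unsigned long hashval,
		   int (*test)(struct model_inode *, void *),
		   int (*set)(struct model_inode *, void *), void *data)
{
	size_t i;

	/* A cached inode matches only if the hash agrees and 'test' accepts it. */
	for (i = 0; i < MODEL_NR_INODES; i++)
		if (cache[i].in_use && cache[i].hashval == hashval &&
		    test(&cache[i], data))
			return &cache[i];

	/* Miss: take a free slot, mark it new and let 'set' identify it. */
	for (i = 0; i < MODEL_NR_INODES; i++) {
		struct model_inode *inode = &cache[i];

		if (inode->in_use)
			continue;
		inode->in_use = 1;
		inode->hashval = hashval;
		inode->i_state = MODEL_I_NEW;
		if (set(inode, data) != 0) {
			inode->in_use = 0;
			return NULL;
		}
		return inode;
	}
	return NULL;	/* cache full */
}

static void model_unlock_new_inode(struct model_inode *inode)
{
	inode->i_state &= ~MODEL_I_NEW;
}

/* 'test': is this the object identified by *data? */
static int foo_test(struct model_inode *inode, void *data)
{
	return inode->objid == *(unsigned long long *)data;
}

/* 'set': non-blocking; store what foo_test() compares, and set i_ino. */
static int foo_set(struct model_inode *inode, void *data)
{
	inode->objid = *(unsigned long long *)data;
	inode->i_ino = (unsigned long)inode->objid;	/* may truncate */
	return 0;
}
```

A caller would check MODEL_I_NEW after the lookup, read the rest of the inode from disk, and only then call model_unlock_new_inode(), mirroring the iget_locked() example above.
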
---
[recommended]

->getattr() is finally getting used. See instances in nfs, minix, etc.

---
[mandatory]

->revalidate() is gone. If your filesystem had it, provide ->getattr()
and let it call whatever you had as ->revalidate() (and, for symlinks that
had ->revalidate(), add calls in ->follow_link()/->readlink()).

---
[mandatory]

->d_parent changes are not protected by BKL anymore. Read access is safe
if at least one of the following is true:
	* filesystem has no cross-directory rename()
	* dcache_lock is held
	* we know that parent had been locked (e.g. we are looking at
	  ->d_parent of ->lookup() argument).
	* we are called from ->rename().
	* the child's ->d_lock is held
Audit your code and add locking if needed. Notice that any place that is
not protected by the conditions above is risky even in the old tree - you
had been relying on BKL and that's prone to screwups. Old tree had quite
a few holes of that kind - unprotected access to ->d_parent leading to
anything from oops to silent memory corruption.

---
[mandatory]

	FS_NOMOUNT is gone. If you use it, just set MS_NOUSER in flags
(see rootfs for one kind of solution and bdev/socket/pipe for another).

---
[recommended]

	Use bdev_read_only(bdev) instead of is_read_only(kdev). The latter
is still alive, but only because of the mess in drivers/s390/block/dasd.c.
As soon as it gets fixed is_read_only() will die.

---
[mandatory]

->permission() is called without BKL now. Grab it on entry, drop it upon
return; that will guarantee the same locking you used to have. If
your method or parts of it do not need BKL, better yet: you can now
shift lock_kernel() and unlock_kernel() so that they protect
exactly what needs to be protected.

---
[mandatory]

->statfs() is now called without BKL held. BKL should have been
shifted into individual fs sb_op functions where it's not clear that
it's safe to remove it. If you don't need it, remove it.

---
[mandatory]

	is_read_only() is gone; use bdev_read_only() instead.

---
[mandatory]

	destroy_buffers() is gone; use invalidate_bdev().

---
[mandatory]

	fsync_dev() is gone; use fsync_bdev(). NOTE: lvm breakage is
deliberate; as soon as struct block_device * is propagated in a reasonable
way by that code, fixing it will become trivial; until then nothing can be
done.

---
[mandatory]

	Block truncation on error exit from ->write_begin and ->direct_IO
moved from the generic methods (block_write_begin, cont_write_begin,
nobh_write_begin, blockdev_direct_IO*) to their callers. Take a look at
ext2_write_failed and its callers for an example.

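The caller-side cleanup can be sketched as follows. This is a userspace model under loose assumptions: a hypothetical inode tracks only i_size and the end of the instantiated blocks, foo_write_begin() and its 'fail' flag are invented to simulate an error after blocks were instantiated, and foo_write_failed() follows the shape of ext2_write_failed() (trim back to i_size only when the failed write reached past it) without calling any kernel helper.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical inode: i_size plus how far blocks have been instantiated. */
struct model_inode {
	long long i_size;
	long long alloc_end;
};

/* Modelled on the shape of ext2_write_failed(). */
static void foo_write_failed(struct model_inode *inode, long long to)
{
	/*
	 * Drop blocks instantiated past i_size by the failed write.
	 * In the kernel: truncate_pagecache() plus fs-specific truncation.
	 */
	if (to > inode->i_size && inode->alloc_end > inode->i_size)
		inode->alloc_end = inode->i_size;
}

/* Hypothetical ->write_begin(); 'fail' simulates an error. */
static int foo_write_begin(struct model_inode *inode, long long pos,
			   unsigned int len, int fail)
{
	long long end = pos + len;

	/* The generic helper may instantiate blocks before failing. */
	if (end > inode->alloc_end)
		inode->alloc_end = end;
	if (fail) {
		/* No longer done by the generic helper: the caller cleans up. */
		foo_write_failed(inode, end);
		return -ENOSPC;
	}
	return 0;
}
```
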
---
[mandatory]

	->truncate is going away. The whole truncate sequence needs to be
implemented in ->setattr, which is now mandatory for filesystems
implementing on-disk size changes. Start with a copy of the old inode_setattr
and vmtruncate, and then reorder the vmtruncate + foofs_vmtruncate sequence
into this order: zeroing blocks using block_truncate_page or similar helpers,
updating the size, and finally the on-disk truncation, which should not fail.
inode_change_ok now includes the size checks for ATTR_SIZE and must be called
at the beginning of ->setattr unconditionally.

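The required order can be shown on a toy model. Everything below is hypothetical: a "file" is an in-memory array of four 8-byte blocks, model_inode_change_ok() stands in for inode_change_ok() (the -EINVAL/-EFBIG choices are the model's own), and zeroing the tail of the new last block plays the role of block_truncate_page(). Growing only records the new size; the shrink path shows the zero, size update, on-disk truncation order.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define MODEL_BLOCK_SIZE	8
#define MODEL_MAX_BLOCKS	4
#define MODEL_ATTR_SIZE		0x1u	/* stand-in for ATTR_SIZE */

/* Hypothetical file: in-core size plus a tiny array of "disk" blocks. */
struct model_file {
	long long i_size;
	int nr_blocks;			/* blocks currently allocated on "disk" */
	char data[MODEL_MAX_BLOCKS][MODEL_BLOCK_SIZE];
};

/* Stand-in for inode_change_ok(), including the ATTR_SIZE checks. */
static int model_inode_change_ok(unsigned int valid, long long new_size)
{
	if (!(valid & MODEL_ATTR_SIZE))
		return 0;
	if (new_size < 0)
		return -EINVAL;
	if (new_size > (long long)MODEL_MAX_BLOCKS * MODEL_BLOCK_SIZE)
		return -EFBIG;
	return 0;
}

static int foo_setattr(struct model_file *f, unsigned int valid,
		       long long new_size)
{
	/* Checks first, unconditionally, before anything is modified. */
	int err = model_inode_change_ok(valid, new_size);

	if (err)
		return err;
	if (!(valid & MODEL_ATTR_SIZE) || new_size == f->i_size)
		return 0;
	if (new_size > f->i_size) {
		f->i_size = new_size;	/* growing: leave a hole */
		return 0;
	}
	/* 1. Zero the tail of the new last block (block_truncate_page()'s job). */
	if (new_size % MODEL_BLOCK_SIZE)
		memset(&f->data[new_size / MODEL_BLOCK_SIZE]
			       [new_size % MODEL_BLOCK_SIZE], 0,
		       MODEL_BLOCK_SIZE - new_size % MODEL_BLOCK_SIZE);
	/* 2. Update the size. */
	f->i_size = new_size;
	/* 3. On-disk truncation last; in this model it cannot fail. */
	f->nr_blocks = (int)((new_size + MODEL_BLOCK_SIZE - 1) / MODEL_BLOCK_SIZE);
	return 0;
}
```

Doing the checks before any modification is what lets a rejected request leave the file untouched, which is the point of calling inode_change_ok unconditionally at the top.
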
---
[mandatory]

	->clear_inode() and ->delete_inode() are gone; ->evict_inode() should
be used instead. It gets called whenever the inode is evicted, whether it has
remaining links or not. The caller does *not* evict the pagecache or
inode-associated metadata buffers; getting rid of those is the responsibility
of the method, as it had been for ->delete_inode().
	->drop_inode() returns int now; it's called on final iput() with
inode_lock held and it returns true if the filesystem wants the inode to be
dropped. As before, generic_drop_inode() is still the default and it's been
updated appropriately. generic_delete_inode() is also alive and it consists
simply of return 1. Note that all actual eviction work is done by the caller
after ->drop_inode() returns.
	clear_inode() is gone; use end_writeback() instead. As before, it must
be called exactly once on each call of ->evict_inode() (as it used to be for
each call of ->delete_inode()). Unlike before, if you are using
inode-associated metadata buffers (i.e. mark_buffer_dirty_inode()), it's your
responsibility to call invalidate_inode_buffers() before end_writeback().
	No async writeback (and thus no calls of ->write_inode()) will happen
after end_writeback() returns, so actions that should not overlap with
->write_inode() (e.g. freeing the on-disk inode if i_nlink is 0) ought to be
done after that call.

	NOTE: checking i_nlink at the beginning of ->write_inode() and bailing out
if it's zero is not *and* *never* *had* *been* enough. Final unlink() and iput()
may happen while the inode is in the middle of ->write_inode(); e.g. if you
blindly free the on-disk inode, you may end up doing that while ->write_inode()
is writing to it.
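The ordering rules for ->evict_inode() can be sketched with stubs. In the kernel the calls would be truncate_inode_pages(&inode->i_data, 0), invalidate_inode_buffers() and end_writeback(); here they are hypothetical model_* functions that only append to a call log, and foo_free_on_disk_inode() and the uses_metadata_buffers flag are invented, so the sequence can be checked in userspace.

```c
#include <assert.h>
#include <string.h>

#define MODEL_LOG_MAX 8

/* Hypothetical call log so the ordering can be inspected. */
static const char *call_log[MODEL_LOG_MAX];
static int nr_calls;

static void log_call(const char *name)
{
	assert(nr_calls < MODEL_LOG_MAX);
	call_log[nr_calls++] = name;
}

struct model_inode {
	unsigned int i_nlink;
	int uses_metadata_buffers;	/* i.e. mark_buffer_dirty_inode() used */
};

/* Stubs standing in for the kernel helpers named in the text. */
static void model_truncate_inode_pages(struct model_inode *inode)
{
	(void)inode;
	log_call("truncate_inode_pages");
}

static void model_invalidate_inode_buffers(struct model_inode *inode)
{
	(void)inode;
	log_call("invalidate_inode_buffers");
}

static void model_end_writeback(struct model_inode *inode)
{
	(void)inode;
	log_call("end_writeback");
}

static void foo_free_on_disk_inode(struct model_inode *inode)
{
	(void)inode;
	log_call("free_on_disk_inode");
}

static void foo_evict_inode(struct model_inode *inode)
{
	/* The caller no longer drops the pagecache; the method must. */
	model_truncate_inode_pages(inode);
	/* Required before end_writeback() if metadata buffers were attached. */
	if (inode->uses_metadata_buffers)
		model_invalidate_inode_buffers(inode);
	/* Exactly once per ->evict_inode() call. */
	model_end_writeback(inode);
	/* No ->write_inode() can run past this point, so freeing is safe. */
	if (inode->i_nlink == 0)
		foo_free_on_disk_inode(inode);
}
```

The same method runs for inodes that still have links; only the final step is conditional, which is why freeing the on-disk inode sits after end_writeback() rather than in ->write_inode().
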