David Howells | 2d6fff6 | 2009-04-03 16:42:36 +0100 | [diff] [blame] | 1 | =============================== |
| 2 | FS-CACHE NETWORK FILESYSTEM API |
| 3 | =============================== |
| 4 | |
| 5 | There's an API by which a network filesystem can make use of the FS-Cache |
| 6 | facilities. This is based around a number of principles: |
| 7 | |
| 8 | (1) Caches can store a number of different object types. There are two main |
| 9 | object types: indices and files. The first is a special type used by |
| 10 | FS-Cache to make finding objects faster and to make retiring of groups of |
| 11 | objects easier. |
| 12 | |
| 13 | (2) Every index, file or other object is represented by a cookie. This cookie |
| 14 | may or may not have anything associated with it, but the netfs doesn't |
| 15 | need to care. |
| 16 | |
| 17 | (3) Barring the top-level index (one entry per cached netfs), the index |
| 18 | hierarchy for each netfs is structured according to the whim of the netfs. |
| 19 | |
| 20 | This API is declared in <linux/fscache.h>. |
| 21 | |
| 22 | This document contains the following sections: |
| 23 | |
| 24 | (1) Network filesystem definition |
| 25 | (2) Index definition |
| 26 | (3) Object definition |
| 27 | (4) Network filesystem (un)registration |
| 28 | (5) Cache tag lookup |
| 29 | (6) Index registration |
| 30 | (7) Data file registration |
| 31 | (8) Miscellaneous object registration |
| 32 | (9) Setting the data file size |
| 33 | (10) Page alloc/read/write |
| 34 | (11) Page uncaching |
| 35 | (12) Index and data file consistency |
| 36 | (13) Cookie enablement |
| 37 | (14) Miscellaneous cookie operations |
| 38 | (15) Cookie unregistration |
| 39 | (16) Index invalidation |
| 40 | (17) Data file invalidation |
| 41 | (18) FS-Cache specific page flags. |
| 42 | |
| 43 | |
| 44 | ============================= |
| 45 | NETWORK FILESYSTEM DEFINITION |
| 46 | ============================= |
| 47 | |
| 48 | FS-Cache needs a description of the network filesystem. This is specified |
| 49 | using a record of the following structure: |
| 50 | |
| 51 | struct fscache_netfs { |
| 52 |         uint32_t version; |
| 53 |         const char *name; |
| 54 |         struct fscache_cookie *primary_index; |
| 55 |         ... |
| 56 | }; |
| 57 | |
| 58 | The first two fields should be filled in before registration, and the third |
| 59 | will be filled in by the registration function; any other fields should just be |
| 60 | ignored and are for internal use only. |
| 61 | |
| 62 | The fields are: |
| 63 | |
| 64 | (1) The name of the netfs (used as the key in the top-level index). |
| 65 | |
| 66 | (2) The version of the netfs (if the name matches but the version doesn't, the |
| 67 | entire in-cache hierarchy for this netfs will be scrapped and begun |
| 68 | afresh). |
| 69 | |
| 70 | (3) The cookie representing the primary index will be allocated according to |
| 71 | another parameter passed into the registration function. |
| 72 | |
| 73 | For example, kAFS (linux/fs/afs/) uses the following definitions to describe |
| 74 | itself: |
| 75 | |
| 76 | struct fscache_netfs afs_cache_netfs = { |
| 77 |         .version = 0, |
| 78 |         .name = "afs", |
| 79 | }; |
| 80 | |
| 81 | |
| 82 | ================ |
| 83 | INDEX DEFINITION |
| 84 | ================ |
| 85 | |
| 86 | Indices are used for two purposes: |
| 87 | |
| 88 | (1) To aid the finding of a file based on a series of keys (such as AFS's |
| 89 | "cell", "volume ID", "vnode ID"). |
| 90 | |
| 91 | (2) To make it easier to discard a subset of all the files cached based around |
| 92 | a particular key - for instance to mirror the removal of an AFS volume. |
| 93 | |
| 94 | However, since it's unlikely that any two netfs's are going to want to define |
| 95 | their index hierarchies in quite the same way, FS-Cache tries to impose as few |
| 96 | constraints as possible on how an index is structured and where it is placed in |
| 97 | the tree. The netfs can even mix indices and data files at the same level, but |
| 98 | it's not recommended. |
| 99 | |
| 100 | Each index entry consists of a key of indeterminate length plus some auxiliary |
| 101 | data, also of indeterminate length. |
| 102 | |
| 103 | There are some limits on indices: |
| 104 | |
| 105 | (1) Any index containing non-index objects should be restricted to a single |
| 106 | cache. Any such objects created within an index will be created in the |
| 107 | first cache only. The cache in which an index is created can be |
| 108 | controlled by cache tags (see below). |
| 109 | |
| 110 | (2) The entry data must be atomically journallable, so it is limited to about |
| 111 | 400 bytes at present. At least 400 bytes will be available. |
| 112 | |
| 113 | (3) The depth of the index tree should be judged with care as the search |
| 114 | function is recursive. Too many layers will run the kernel out of stack. |
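Limit (2) can be checked by the netfs before it commits to a key layout. The
helper below is a minimal sketch: EXAMPLE_ENTRY_BUDGET and example_entry_fits()
are hypothetical names, and the value 400 is simply the figure quoted above,
not a constant exported by FS-Cache.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed budget, taken from the "about 400 bytes" limit quoted in the text;
 * FS-Cache does not export a constant with this name. */
#define EXAMPLE_ENTRY_BUDGET 400

/* Return true if a key and its auxiliary data together stay within the
 * amount of entry data that a netfs can rely on being available. */
static bool example_entry_fits(uint16_t key_len, uint16_t aux_len)
{
	/* Widen before adding so that two large uint16_t values cannot wrap. */
	return (uint32_t)key_len + (uint32_t)aux_len <= EXAMPLE_ENTRY_BUDGET;
}
```

A netfs could call such a helper once per object type, with the largest key
and auxiliary data sizes it ever expects to produce.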
| 115 | |
| 116 | |
| 117 | ================= |
| 118 | OBJECT DEFINITION |
| 119 | ================= |
| 120 | |
| 121 | To define an object, a structure of the following type should be filled out: |
| 122 | |
| 123 | struct fscache_cookie_def |
| 124 | { |
| 125 |         uint8_t name[16]; |
| 126 |         uint8_t type; |
| 127 | |
| 128 |         struct fscache_cache_tag *(*select_cache)( |
| 129 |                 const void *parent_netfs_data, |
| 130 |                 const void *cookie_netfs_data); |
| 131 | |
| 132 |         uint16_t (*get_key)(const void *cookie_netfs_data, |
| 133 |                             void *buffer, |
| 134 |                             uint16_t bufmax); |
| 135 | |
| 136 |         void (*get_attr)(const void *cookie_netfs_data, |
| 137 |                          uint64_t *size); |
| 138 | |
| 139 |         uint16_t (*get_aux)(const void *cookie_netfs_data, |
| 140 |                             void *buffer, |
| 141 |                             uint16_t bufmax); |
| 142 | |
| 143 |         enum fscache_checkaux (*check_aux)(void *cookie_netfs_data, |
| 144 |                                            const void *data, |
| 145 |                                            uint16_t datalen); |
| 146 | |
| 147 |         void (*get_context)(void *cookie_netfs_data, void *context); |
| 148 | |
| 149 |         void (*put_context)(void *cookie_netfs_data, void *context); |
| 150 | |
| 151 |         void (*mark_pages_cached)(void *cookie_netfs_data, |
| 152 |                                   struct address_space *mapping, |
| 153 |                                   struct pagevec *cached_pvec); |
| 154 | |
| 155 |         void (*now_uncached)(void *cookie_netfs_data); |
| 156 | }; |
| 157 | |
| 158 | This has the following fields: |
| 159 | |
| 160 | (1) The type of the object [mandatory]. |
| 161 | |
| 162 | This is one of the following values: |
| 163 | |
| 164 | (*) FSCACHE_COOKIE_TYPE_INDEX |
| 165 | |
| 166 | This defines an index, which is a special FS-Cache type. |
| 167 | |
| 168 | (*) FSCACHE_COOKIE_TYPE_DATAFILE |
| 169 | |
| 170 | This defines an ordinary data file. |
| 171 | |
| 172 | (*) Any other value between 2 and 255 |
| 173 | |
| 174 | This defines an extraordinary object such as an XATTR. |
| 175 | |
| 176 | (2) The name of the object type (NUL terminated unless all 16 chars are used) |
| 177 | [optional]. |
| 178 | |
| 179 | (3) A function to select the cache in which to store an index [optional]. |
| 180 | |
| 181 | This function is invoked when an index needs to be instantiated in a cache |
| 182 | during the instantiation of a non-index object. Only the immediate index |
| 183 | parent for the non-index object will be queried. Any indices above that |
| 184 | in the hierarchy may be stored in multiple caches. This function does not |
| 185 | need to be supplied for any non-index object or any index that will only |
| 186 | have index children. |
| 187 | |
| 188 | If this function is not supplied or if it returns NULL then the first |
| 189 | cache in the parent's list will be chosen, or failing that, the first |
| 190 | cache in the master list. |
| 191 | |
| 192 | (4) A function to retrieve an object's key from the netfs [mandatory]. |
| 193 | |
| 194 | This function will be called with the netfs data that was passed to the |
| 195 | cookie acquisition function and the maximum length of key data that it may |
| 196 | provide. It should write the required key data into the given buffer and |
| 197 | return the quantity it wrote. |
| 198 | |
| 199 | (5) A function to retrieve attribute data from the netfs [optional]. |
| 200 | |
| 201 | This function will be called with the netfs data that was passed to the |
| 202 | cookie acquisition function. It should return the size of the file if |
| 203 | this is a data file. The size may be used to govern how much space must |
| 204 | be reserved for this file in the cache. |
| 205 | |
| 206 | If the function is absent, a file size of 0 is assumed. |
| 207 | |
| 208 | (6) A function to retrieve auxiliary data from the netfs [optional]. |
| 209 | |
| 210 | This function will be called with the netfs data that was passed to the |
| 211 | cookie acquisition function and the maximum length of auxiliary data that |
| 212 | it may provide. It should write the auxiliary data into the given buffer |
| 213 | and return the quantity it wrote. |
| 214 | |
| 215 | If this function is absent, the auxiliary data length will be set to 0. |
| 216 | |
| 217 | The length of the auxiliary data buffer may be dependent on the key |
| 218 | length. A netfs mustn't rely on being able to provide more than 400 bytes |
| 219 | for both. |
| 220 | |
| 221 | (7) A function to check the auxiliary data [optional]. |
| 222 | |
| 223 | This function will be called to check that a match found in the cache for |
| 224 | this object is valid. For instance with AFS it could check the auxiliary |
| 225 | data against the data version number returned by the server to determine |
| 226 | whether the index entry in a cache is still valid. |
| 227 | |
| 228 | If this function is absent, it will be assumed that matching objects in a |
| 229 | cache are always valid. |
| 230 | |
| 231 | If present, the function should return one of the following values: |
| 232 | |
| 233 | (*) FSCACHE_CHECKAUX_OKAY - the entry is okay as is |
| 234 | (*) FSCACHE_CHECKAUX_NEEDS_UPDATE - the entry requires update |
| 235 | (*) FSCACHE_CHECKAUX_OBSOLETE - the entry should be deleted |
| 236 | |
| 237 | This function can also be used to extract data from the auxiliary data in |
| 238 | the cache and copy it into the netfs's structures. |
| 239 | |
| 240 | (8) A pair of functions to manage contexts for the completion callback |
| 241 | [optional]. |
| 242 | |
| 243 | The cache read/write functions are passed a context which is then passed |
| 244 | to the I/O completion callback function. To ensure this context remains |
| 245 | valid until after the I/O completion is called, two functions may be |
| 246 | provided: one to get an extra reference on the context, and one to drop a |
| 247 | reference to it. |
| 248 | |
| 249 | If the context is not used or is a type of object that won't go out of |
| 250 | scope, then these functions are not required. These functions are not |
| 251 | required for indices as indices may not contain data. These functions may |
| 252 | be called in interrupt context and so may not sleep. |
| 253 | |
| 254 | (9) A function to mark a page as retaining cache metadata [optional]. |
| 255 | |
| 256 | This is called by the cache to indicate that it is retaining in-memory |
| 257 | information for this page and that the netfs should uncache the page when |
| 258 | it has finished. This does not indicate whether there's data on the disk |
| 259 | or not. Note that several pages at once may be presented for marking. |
| 260 | |
| 261 | The PG_fscache bit is set on the pages before this function would be |
| 262 | called, so the function need not be provided if this is sufficient. |
| 263 | |
| 264 | This function is not required for indices as they're not permitted data. |
| 265 | |
| 266 | (10) A function to unmark all the pages retaining cache metadata [mandatory]. |
| 267 | |
| 268 | This is called by FS-Cache to indicate that a backing store is being |
| 269 | unbound from a cookie and that all the marks on the pages should be |
| 270 | cleared to prevent confusion. Note that the cache will have torn down all |
| 271 | its tracking information so that the pages don't need to be explicitly |
| 272 | uncached. |
| 273 | |
| 274 | This function is not required for indices as they're not permitted data. |
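As a rough illustration of items (4) to (7), the callbacks for a data file
might look like the sketch below. The example_vnode structure and the
example_*() function names are hypothetical and are not taken from kAFS. The
enum is redeclared only so that the sketch compiles outside the kernel tree; in
a real netfs it comes from <linux/fscache.h>. Returning 0 when the buffer is
too small, and treating any version mismatch as obsolete, are assumptions of
this sketch.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for the enum declared in <linux/fscache.h>; redeclared here only
 * so that the sketch builds in userspace. */
enum fscache_checkaux {
	FSCACHE_CHECKAUX_OKAY,
	FSCACHE_CHECKAUX_NEEDS_UPDATE,
	FSCACHE_CHECKAUX_OBSOLETE,
};

/* Hypothetical netfs object handed to the cookie acquisition function. */
struct example_vnode {
	uint32_t vnode_id;	/* used as the index key */
	uint64_t data_version;	/* stored as auxiliary data */
	uint64_t size;		/* reported through get_attr() */
};

/* Item (4): copy the key into the buffer and return its length. */
static uint16_t example_get_key(const void *cookie_netfs_data,
				void *buffer, uint16_t bufmax)
{
	const struct example_vnode *vnode = cookie_netfs_data;

	if (bufmax < sizeof(vnode->vnode_id))
		return 0;	/* assumption: report no key if it won't fit */
	memcpy(buffer, &vnode->vnode_id, sizeof(vnode->vnode_id));
	return sizeof(vnode->vnode_id);
}

/* Item (5): report the file size. */
static void example_get_attr(const void *cookie_netfs_data, uint64_t *size)
{
	const struct example_vnode *vnode = cookie_netfs_data;

	*size = vnode->size;
}

/* Item (6): copy the auxiliary data into the buffer and return its length. */
static uint16_t example_get_aux(const void *cookie_netfs_data,
				void *buffer, uint16_t bufmax)
{
	const struct example_vnode *vnode = cookie_netfs_data;

	if (bufmax < sizeof(vnode->data_version))
		return 0;
	memcpy(buffer, &vnode->data_version, sizeof(vnode->data_version));
	return sizeof(vnode->data_version);
}

/* Item (7): compare the cached auxiliary data with the current version. */
static enum fscache_checkaux example_check_aux(void *cookie_netfs_data,
					       const void *data,
					       uint16_t datalen)
{
	struct example_vnode *vnode = cookie_netfs_data;
	uint64_t cached_version;

	if (datalen != sizeof(cached_version))
		return FSCACHE_CHECKAUX_OBSOLETE;
	memcpy(&cached_version, data, sizeof(cached_version));
	if (cached_version != vnode->data_version)
		return FSCACHE_CHECKAUX_OBSOLETE;
	return FSCACHE_CHECKAUX_OKAY;
}
```

These functions would be assigned to the get_key, get_attr, get_aux and
check_aux members of a struct fscache_cookie_def whose type is
FSCACHE_COOKIE_TYPE_DATAFILE.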
| 275 | |
| 276 | |
| 277 | =================================== |
| 278 | NETWORK FILESYSTEM (UN)REGISTRATION |
| 279 | =================================== |
| 280 | |
| 281 | The first step is to declare the network filesystem to the cache. This also |
| 282 | involves specifying the layout of the primary index (for AFS, this would be the |
| 283 | "cell" level). |
| 284 | |
| 285 | The registration function is: |
| 286 | |
| 287 | int fscache_register_netfs(struct fscache_netfs *netfs); |
| 288 | |
| 289 | It just takes a pointer to the netfs definition. It returns 0 or an error as |
| 290 | appropriate. |
| 291 | |
| 292 | For kAFS, registration is done as follows: |
| 293 | |
| 294 | ret = fscache_register_netfs(&afs_cache_netfs); |
| 295 | |
| 296 | The last step is, of course, unregistration: |
| 297 | |
| 298 | void fscache_unregister_netfs(struct fscache_netfs *netfs); |
| 299 | |
| 300 | |
| 301 | ================ |
| 302 | CACHE TAG LOOKUP |
| 303 | ================ |
| 304 | |
| 305 | FS-Cache permits the use of more than one cache. To permit particular index |
| 306 | subtrees to be bound to particular caches, the second step is to look up cache |
| 307 | representation tags. This step is optional; it can be left entirely up to |
| 308 | FS-Cache as to which cache should be used. The problem with doing that is that |
| 309 | FS-Cache will always pick the first cache that was registered. |
| 310 | |
| 311 | To get the representation for a named tag: |
| 312 | |
| 313 | struct fscache_cache_tag *fscache_lookup_cache_tag(const char *name); |
| 314 | |
| 315 | This takes a text string as the name and returns a representation of a tag. It |
| 316 | will never return an error. It may return a dummy tag, however, if it runs out |
| 317 | of memory; this will inhibit caching with this tag. |
| 318 | |
| 319 | Any representation so obtained must be released by passing it to this function: |
| 320 | |
| 321 | void fscache_release_cache_tag(struct fscache_cache_tag *tag); |
| 322 | |
| 323 | The tag will be retrieved by FS-Cache when it calls the object definition |
| 324 | operation select_cache(). |
| 325 | |
| 326 | |
| 327 | ================== |
| 328 | INDEX REGISTRATION |
| 329 | ================== |
| 330 | |
| 331 | The third step is to inform FS-Cache about part of an index hierarchy that can |
| 332 | be used to locate files. This is done by requesting a cookie for each index in |
| 333 | the path to the file: |
| 334 | |
| 335 | struct fscache_cookie * |
| 336 | fscache_acquire_cookie(struct fscache_cookie *parent, |
| 337 | const struct fscache_cookie_def *def, |
| 338 | void *netfs_data, |
| 339 | bool enable); |
| 340 | |
| 341 | This function creates an index entry in the index represented by parent, |
| 342 | filling in the index entry by calling the operations pointed to by def. |
| 343 | |
| 344 | Note that this function never returns an error - all errors are handled |
| 345 | internally. It may, however, return NULL to indicate no cookie. It is quite |
| 346 | acceptable to pass this token back to this function as the parent to another |
| 347 | acquisition (or even to the relinquish cookie, read page and write page |
| 348 | functions - see below). |
| 349 | |
| 350 | Note also that no indices are actually created in a cache until a non-index |
| 351 | object needs to be created somewhere down the hierarchy. Furthermore, an index |
| 352 | may be created in several different caches independently at different times. |
| 353 | This is all handled transparently, and the netfs doesn't see any of it. |
| 354 | |
| 355 | A cookie will be created in the disabled state if enable is false. A cookie |
| 356 | must be enabled to do anything with it. A disabled cookie can be enabled by |
| 357 | calling fscache_enable_cookie() (see below). |
| 358 | |
| 359 | For example, with AFS, a cell would be added to the primary index. This index |
| 360 | entry would have a dependent inode containing a volume location index for the |
| 361 | volume mappings within this cell: |
| 362 | |
| 363 | cell->cache = |
| 364 | fscache_acquire_cookie(afs_cache_netfs.primary_index, |
| 365 | &afs_cell_cache_index_def, |
| 366 | cell, true); |
| 367 | |
| 368 | Then when a volume location was accessed, it would be entered into the cell's |
| 369 | index and an inode would be allocated that acts as a volume type and hash chain |
| 370 | combination: |
| 371 | |
| 372 | vlocation->cache = |
| 373 | fscache_acquire_cookie(cell->cache, |
| 374 | &afs_vlocation_cache_index_def, |
| 375 | vlocation, true); |
| 376 | |
| 377 | And then a particular flavour of volume (R/O for example) could be added to |
| 378 | that index, creating another index for vnodes (AFS inode equivalents): |
| 379 | |
| 380 | volume->cache = |
| 381 | fscache_acquire_cookie(vlocation->cache, |
| 382 | &afs_volume_cache_index_def, |
| 383 | volume, true); |
| 384 | |
| 385 | |
| 386 | ====================== |
| 387 | DATA FILE REGISTRATION |
| 388 | ====================== |
| 389 | |
| 390 | The fourth step is to request a data file be created in the cache. This is |
| 391 | almost identical to index cookie acquisition. The only difference is that the |
| 392 | type in the object definition should be something other than index type. |
| 393 | |
| 394 | vnode->cache = |
| 395 | fscache_acquire_cookie(volume->cache, |
| 396 | &afs_vnode_cache_object_def, |
| 397 | vnode, true); |
| 398 | |
| 399 | |
| 400 | ================================= |
| 401 | MISCELLANEOUS OBJECT REGISTRATION |
| 402 | ================================= |
| 403 | |
| 404 | An optional step is to request an object of miscellaneous type be created in |
| 405 | the cache. This is almost identical to index cookie acquisition. The only |
| 406 | difference is that the type in the object definition should be something other |
| 407 | than index type. Whilst the parent object could be an index, it's more likely |
| 408 | it would be some other type of object such as a data file. |
| 409 | |
| 410 | xattr->cache = |
| 411 | fscache_acquire_cookie(vnode->cache, |
| 412 | &afs_xattr_cache_object_def, |
| 413 | xattr, true); |
| 414 | |
| 415 | Miscellaneous objects might be used to store extended attributes or directory |
| 416 | entries for example. |
| 417 | |
| 418 | |
| 419 | ========================== |
| 420 | SETTING THE DATA FILE SIZE |
| 421 | ========================== |
| 422 | |
| 423 | The fifth step is to set the physical attributes of the file, such as its size. |
| 424 | This doesn't automatically reserve any space in the cache, but permits the |
| 425 | cache to adjust its metadata for data tracking appropriately: |
| 426 | |
| 427 | int fscache_attr_changed(struct fscache_cookie *cookie); |
| 428 | |
| 429 | The cache will return -ENOBUFS if there is no backing cache or if there is no |
| 430 | space to allocate any extra metadata required in the cache. The attributes |
| 431 | will be accessed with the get_attr() cookie definition operation. |
| 432 | |
| 433 | Note that attempts to read or write data pages in the cache over this size may |
| 434 | be rebuffed with -ENOBUFS. |
| 435 | |
| 436 | This operation schedules an attribute adjustment to happen asynchronously at |
| 437 | some point in the future, and as such, it may happen after the function returns |
| 438 | to the caller. The attribute adjustment excludes read and write operations. |
| 439 | |
| 440 | |
| 441 | ===================== |
| 442 | PAGE ALLOC/READ/WRITE |
| 443 | ===================== |
| 444 | |
| 445 | And the sixth step is to store and retrieve pages in the cache. There are |
| 446 | three functions that are used to do this. |
| 447 | |
| 448 | Note: |
| 449 | |
| 450 | (1) A page should not be re-read or re-allocated without uncaching it first. |
| 451 | |
| 452 | (2) A read or allocated page must be uncached when the netfs page is released |
| 453 | from the pagecache. |
| 454 | |
| 455 | (3) A page should only be written to the cache if previously read or allocated. |
| 456 | |
| 457 | This permits the cache to maintain its page tracking in proper order. |
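The three rules above amount to a two-state lifecycle for each netfs page. The
sketch below encodes them as predicates that a netfs could use in debugging
assertions. The enum and the example_may_*() names are hypothetical; FS-Cache
does not provide them, and a real netfs would normally rely on the PG_fscache
page flag rather than on separate bookkeeping of this kind.

```c
#include <stdbool.h>

/* Hypothetical per-page bookkeeping kept by the netfs for debugging. */
enum example_page_state {
	EXAMPLE_PAGE_UNCACHED,	/* never read/allocated, or uncached since */
	EXAMPLE_PAGE_CACHED,	/* read or allocated and not yet uncached */
};

/* Rule (1): a page may only be read or allocated again once uncached. */
static bool example_may_read_or_alloc(enum example_page_state state)
{
	return state == EXAMPLE_PAGE_UNCACHED;
}

/* Rule (3): a page may only be written if it was read or allocated. */
static bool example_may_write(enum example_page_state state)
{
	return state == EXAMPLE_PAGE_CACHED;
}

/* Rule (2): a page must be uncached before it leaves the pagecache. */
static bool example_may_release(enum example_page_state state)
{
	return state == EXAMPLE_PAGE_UNCACHED;
}
```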
| 458 | |
| 459 | |
| 460 | PAGE READ |
| 461 | --------- |
| 462 | |
| 463 | Firstly, the netfs should ask FS-Cache to examine the caches and read the |
| 464 | contents cached for a particular page of a particular file if present, or else |
| 465 | allocate space to store the contents if not: |
| 466 | |
| 467 | typedef |
| 468 | void (*fscache_rw_complete_t)(struct page *page, |
| 469 | void *context, |
| 470 | int error); |
| 471 | |
| 472 | int fscache_read_or_alloc_page(struct fscache_cookie *cookie, |
| 473 | struct page *page, |
| 474 | fscache_rw_complete_t end_io_func, |
| 475 | void *context, |
| 476 | gfp_t gfp); |
| 477 | |
| 478 | The cookie argument must specify a cookie for an object that isn't an index, |
| 479 | the page specified will have the data loaded into it (and is also used to |
| 480 | specify the page number), and the gfp argument is used to control how any |
| 481 | memory allocations made are satisfied. |
| 482 | |
| 483 | If the cookie indicates the inode is not cached: |
| 484 | |
| 485 | (1) The function will return -ENOBUFS. |
| 486 | |
| 487 | Else if there's a copy of the page resident in the cache: |
| 488 | |
| 489 | (1) The mark_pages_cached() cookie operation will be called on that page. |
| 490 | |
| 491 | (2) The function will submit a request to read the data from the cache's |
| 492 | backing device directly into the page specified. |
| 493 | |
| 494 | (3) The function will return 0. |
| 495 | |
| 496 | (4) When the read is complete, end_io_func() will be invoked with: |
| 497 | |
| 500 | (*) The page descriptor. |
| 501 | |
| 502 | (*) The context argument passed to the above function. This will be |
| 503 | maintained with the get_context/put_context functions mentioned above. |
| 504 | |
| 505 | (*) An argument that's 0 on success or negative for an error code. |
| 506 | |
| 507 | If an error occurs, it should be assumed that the page contains no usable |
| 508 | data. fscache_readpages_cancel() may need to be called. |
| 509 | |
| 510 | end_io_func() will be called in process context if the read results in |
| 511 | an error, but it might be called in interrupt context if the read is |
| 512 | successful. |
| 513 | |
| 514 | Otherwise, if there's not a copy available in cache, but the cache may be able |
| 515 | to store the page: |
| 516 | |
| 517 | (1) The mark_pages_cached() cookie operation will be called on that page. |
| 518 | |
| 519 | (2) A block may be reserved in the cache and attached to the object at the |
| 520 | appropriate place. |
| 521 | |
| 522 | (3) The function will return -ENODATA. |
| 523 | |
| 524 | This function may also return -ENOMEM or -EINTR, in which case it won't have |
| 525 | read any data from the cache. |
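The return values described above suggest a simple dispatch in the netfs read
path. The sketch below maps each documented return value of
fscache_read_or_alloc_page() onto the next step for the netfs. The enum and
the function name are hypothetical, and propagating -ENOMEM and -EINTR to the
caller is an assumption of this sketch; a netfs could equally fall back to
reading from the server in those cases.

```c
#include <errno.h>

/* Hypothetical names for what the netfs does next; not part of FS-Cache. */
enum example_read_action {
	EXAMPLE_WAIT_FOR_CACHE,		/* read submitted; end_io_func() will run */
	EXAMPLE_FETCH_THEN_WRITE,	/* block allocated; fetch, then write back */
	EXAMPLE_FETCH_ONLY,		/* object not cached; fetch from server only */
	EXAMPLE_PROPAGATE_ERROR,	/* assumption: hand the error to the caller */
};

/* Map a return value of fscache_read_or_alloc_page() to the next step. */
static enum example_read_action example_after_read_or_alloc(int ret)
{
	switch (ret) {
	case 0:
		return EXAMPLE_WAIT_FOR_CACHE;
	case -ENODATA:
		return EXAMPLE_FETCH_THEN_WRITE;
	case -ENOBUFS:
		return EXAMPLE_FETCH_ONLY;
	default:	/* -ENOMEM, -EINTR or anything unexpected */
		return EXAMPLE_PROPAGATE_ERROR;
	}
}
```

In the -ENODATA case the page has been marked and a block may have been
reserved, so the page still needs uncaching when it is released, whether or
not it is later written to the cache.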
| 526 | |
| 527 | |
| 528 | PAGE ALLOCATE |
| 529 | ------------- |
| 530 | |
| 531 | Alternatively, if there's not expected to be any data in the cache for a page |
| 532 | because the file has been extended, a block can simply be allocated instead: |
| 533 | |
| 534 | int fscache_alloc_page(struct fscache_cookie *cookie, |
| 535 | struct page *page, |
| 536 | gfp_t gfp); |
| 537 | |
| 538 | This is similar to the fscache_read_or_alloc_page() function, except that it |
| 539 | never reads from the cache. It will return 0 if a block has been allocated, |
| 540 | rather than -ENODATA as the other would. One or the other must be performed |
| 541 | before writing to the cache. |
| 542 | |
| 543 | The mark_pages_cached() cookie operation will be called on the page if |
| 544 | successful. |
| 545 | |
| 546 | |
| 547 | PAGE WRITE |
| 548 | ---------- |
| 549 | |
| 550 | Secondly, if the netfs changes the contents of the page (either due to an |
| 551 | initial download or if a user performs a write), then the page should be |
| 552 | written back to the cache: |
| 553 | |
| 554 | int fscache_write_page(struct fscache_cookie *cookie, |
| 555 | struct page *page, |
| 556 | gfp_t gfp); |
| 557 | |
| 558 | The cookie argument must specify a data file cookie, the page specified should |
| 559 | contain the data to be written (and is also used to specify the page number), |
| 560 | and the gfp argument is used to control how any memory allocations made are |
| 561 | satisfied. |
| 562 | |
| 563 | The page must have first been read or allocated successfully and must not have |
| 564 | been uncached before writing is performed. |
| 565 | |
| 566 | If the cookie indicates the inode is not cached then: |
| 567 | |
| 568 | (1) The function will return -ENOBUFS. |
| 569 | |
| 570 | Else if space can be allocated in the cache to hold this page: |
| 571 | |
| 572 | (1) PG_fscache_write will be set on the page. |
| 573 | |
| 574 | (2) The function will submit a request to write the data to the cache's backing |
| 575 | device directly from the page specified. |
| 576 | |
| 577 | (3) The function will return 0. |
| 578 | |
| 579 | (4) When the write is complete PG_fscache_write is cleared on the page and |
| 580 | anyone waiting for that bit will be woken up. |
| 581 | |
| 582 | Else if there's no space available in the cache, -ENOBUFS will be returned. It |
| 583 | is also possible for the PG_fscache_write bit to be cleared when no write took |
| 584 | place if unforeseen circumstances arose (such as a disk error). |
| 585 | |
| 586 | Writing takes place asynchronously. |
| 587 | |
| 588 | |
| 589 | MULTIPLE PAGE READ |
| 590 | ------------------ |
| 591 | |
| 592 | A facility is provided to read several pages at once, as requested by the |
| 593 | readpages() address space operation: |
| 594 | |
| 595 | int fscache_read_or_alloc_pages(struct fscache_cookie *cookie, |
| 596 | struct address_space *mapping, |
| 597 | struct list_head *pages, |
| 598 | int *nr_pages, |
| 599 | fscache_rw_complete_t end_io_func, |
| 600 | void *context, |
| 601 | gfp_t gfp); |
| 602 | |
| 603 | This works in a similar way to fscache_read_or_alloc_page(), except: |
| 604 | |
| 605 | (1) Any page it can retrieve data for is removed from pages and nr_pages and |
| 606 | dispatched for reading to the disk. Reads of adjacent pages on disk may |
| 607 | be merged for greater efficiency. |
| 608 | |
| 609 | (2) The mark_pages_cached() cookie operation will be called on several pages |
| 610 | at once if they're being read or allocated. |
| 611 | |
| 612 | (3) If there was a general error, then that error will be returned. |
| 613 | |
| 614 | Else if some pages couldn't be allocated or read, then -ENOBUFS will be |
| 615 | returned. |
| 616 | |
| 617 | Else if some pages couldn't be read but were allocated, then -ENODATA will |
| 618 | be returned. |
| 619 | |
| 620 | Otherwise, if all pages had reads dispatched, then 0 will be returned, the |
| 621 | list will be empty and *nr_pages will be 0. |
| 622 | |
| 623 | (4) end_io_func will be called once for each page being read as the reads |
| 624 | complete. It will be called in process context if error != 0, but it may |
| 625 | be called in interrupt context if there is no error. |
| 626 | |
| 627 | Note that a return of -ENODATA, -ENOBUFS or any other error does not preclude |
| 628 | some of the pages being read and some being allocated. Those pages will have |
| 629 | been marked appropriately and will need uncaching. |
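The precedence among the return values listed in item (3) can be written down
as a small model. This is only an illustration of the documented ordering, not
the FS-Cache implementation; the function name and its parameters are
hypothetical.

```c
#include <errno.h>

/*
 * Model of the documented return value of fscache_read_or_alloc_pages().
 *
 * general_error:     0, or a negative error that applies to the whole call
 * nr_unavailable:    pages that could be neither read nor allocated
 * nr_allocated_only: pages that were allocated but had no data to read
 */
static int example_readpages_result(int general_error,
				    unsigned int nr_unavailable,
				    unsigned int nr_allocated_only)
{
	if (general_error)
		return general_error;
	if (nr_unavailable)
		return -ENOBUFS;
	if (nr_allocated_only)
		return -ENODATA;
	return 0;	/* every page had a read dispatched */
}
```

The model makes the note above concrete: a return of -ENOBUFS says nothing
about how many other pages in the same call were read or allocated, so the
caller cannot skip uncaching on the strength of the return value alone.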
| 630 | |
| 631 | |
| 632 | CANCELLATION OF UNREAD PAGES |
| 633 | ---------------------------- |
| 634 | |
| 635 | If one or more pages are passed to fscache_read_or_alloc_pages() but not then |
| 636 | read from the cache and also not read from the underlying filesystem then |
| 637 | those pages will need to have any marks and reservations removed. This can be |
| 638 | done by calling: |
| 639 | |
| 640 | void fscache_readpages_cancel(struct fscache_cookie *cookie, |
| 641 | struct list_head *pages); |
| 642 | |
| 643 | prior to returning to the caller. The cookie argument should be as passed to |
| 644 | fscache_read_or_alloc_pages(). Every page in the pages list will be examined |
| 645 | and any that have PG_fscache set will be uncached. |
| 646 | |
| 647 | |
| 648 | ============== |
| 649 | PAGE UNCACHING |
| 650 | ============== |
| 651 | |
| 652 | To uncache a page, this function should be called: |
| 653 | |
| 654 | void fscache_uncache_page(struct fscache_cookie *cookie, |
| 655 | struct page *page); |
| 656 | |
| 657 | This function permits the cache to release any in-memory representation it |
| 658 | might be holding for this netfs page. This function must be called once for |
| 659 | each page on which the read or write page functions above have been called to |
| 660 | make sure the cache's in-memory tracking information gets torn down. |
| 661 | |
| 662 | Note that pages can't be explicitly deleted from a data file. The whole |
| 663 | data file must be retired (see the relinquish cookie function below). |
| 664 | |
| 665 | Furthermore, note that this does not cancel the asynchronous read or write |
| 666 | operation started by the read/alloc and write functions, so the page |
| 667 | invalidation functions must use:
| 668 |
| 669 | bool fscache_check_page_write(struct fscache_cookie *cookie, |
| 670 | struct page *page); |
| 671 | |
| 672 | to see if a page is being written to the cache, and: |
| 673 | |
| 674 | void fscache_wait_on_page_write(struct fscache_cookie *cookie, |
| 675 | struct page *page); |
| 676 | |
| 677 | to wait for it to finish if it is. |
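
The ordering described above (check for a pending write, wait for it, then
uncache) might look roughly like this in a netfs page invalidation path. This
is a userspace sketch: the three fscache_*() bodies are mocks written for this
example, struct page is reduced to two flags, and demo_netfs_invalidate_page()
is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins (assumptions), not the kernel definitions. */
struct fscache_cookie { int unused; };
struct page {
	bool fscache_mark;	/* models PG_fscache */
	bool write_in_flight;	/* models a pending write to the cache */
};

/* Mock: report whether the page is still being written to the cache. */
static bool fscache_check_page_write(struct fscache_cookie *cookie,
				     struct page *page)
{
	(void)cookie;
	return page->write_in_flight;
}

/* Mock: the pending write "completes" immediately. */
static void fscache_wait_on_page_write(struct fscache_cookie *cookie,
				       struct page *page)
{
	(void)cookie;
	page->write_in_flight = false;
}

/* Mock: tear down the cache's tracking of the page. */
static void fscache_uncache_page(struct fscache_cookie *cookie,
				 struct page *page)
{
	(void)cookie;
	page->fscache_mark = false;
}

/* Hypothetical netfs invalidation helper. */
static void demo_netfs_invalidate_page(struct fscache_cookie *cookie,
				       struct page *page)
{
	if (!page->fscache_mark)
		return;		/* the cache has no interest in this page */

	/* Uncaching does not cancel the write, so let it finish first. */
	if (fscache_check_page_write(cookie, page))
		fscache_wait_on_page_write(cookie, page);

	fscache_uncache_page(cookie, page);
	assert(!page->fscache_mark);
}
```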
| 678 | |
| 679 | |
| 680 | When releasepage() is being implemented, a special FS-Cache function exists to
| 681 | manage the heuristics of coping with vmscan trying to eject pages, which may |
| 682 | conflict with the cache trying to write pages to the cache (which may itself |
| 683 | need to allocate memory): |
| 684 | |
| 685 | bool fscache_maybe_release_page(struct fscache_cookie *cookie, |
| 686 | struct page *page, |
| 687 | gfp_t gfp); |
| 688 | |
| 689 | This takes the netfs cookie, and the page and gfp arguments as supplied to
| 690 | releasepage(). It returns false if the page cannot be released yet for some
| 691 | reason. If it returns true, the page has been uncached and can now be
| 692 | released.
| 693 | |
| 694 | To make a page available for release, this function may wait for an outstanding |
| 695 | storage request to complete, or it may attempt to cancel the storage request - |
| 696 | in which case the page will not be stored in the cache this time. |
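
A releasepage() implementation built on this function could be sketched as
follows. This is a userspace model only: gfp_t, struct page and the body of
fscache_maybe_release_page() are stand-ins written for this example (the mock
simply refuses while a write is pending, whereas the real function may wait or
cancel), and demo_netfs_releasepage() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-ins (assumptions), not the kernel definitions. */
typedef unsigned int gfp_t;
struct fscache_cookie { int unused; };
struct page {
	bool fscache_mark;	/* models PG_fscache */
	bool write_in_flight;	/* models a pending write to the cache */
};

/* Mock: refuse while a write is pending, otherwise uncache the page. */
static bool fscache_maybe_release_page(struct fscache_cookie *cookie,
				       struct page *page, gfp_t gfp)
{
	(void)cookie;
	(void)gfp;
	if (page->write_in_flight)
		return false;
	page->fscache_mark = false;
	return true;
}

/* Hypothetical netfs releasepage(): returns 1 if the page may be freed. */
static int demo_netfs_releasepage(struct fscache_cookie *cookie,
				  struct page *page, gfp_t gfp)
{
	if (page->fscache_mark &&
	    !fscache_maybe_release_page(cookie, page, gfp))
		return 0;	/* the cache still needs the page */

	assert(!page->fscache_mark);
	return 1;
}
```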
| 697 | |
| 698 | |
| 699 | BULK INODE PAGE UNCACHE
| 700 | ----------------------- |
| 701 | |
| 702 | A convenience routine is provided to perform an uncache on all the pages |
| 703 | attached to an inode. This assumes that the pages on the inode correspond on a |
| 704 | 1:1 basis with the pages in the cache. |
| 705 | |
| 706 | void fscache_uncache_all_inode_pages(struct fscache_cookie *cookie, |
| 707 | struct inode *inode); |
| 708 | |
| 709 | This takes the netfs cookie that the pages were cached with and the inode that |
| 710 | the pages are attached to. This function will wait for pages to finish being |
| 711 | written to the cache and for the cache to finish with the page generally. No |
| 712 | error is returned. |
| 713 | |
| 714 | |
| 715 | ===============================
| 716 | INDEX AND DATA FILE CONSISTENCY |
| 717 | =============================== |
| 718 | |
| 719 | To find out whether auxiliary data for an object is up to date within the
| 720 | cache, the following function can be called: |
| 721 | |
| 722 | int fscache_check_consistency(struct fscache_cookie *cookie);
| 723 | |
| 724 | This will call back to the netfs to check whether the auxiliary data associated |
| 725 | with a cookie is correct. It returns 0 if it is and -ESTALE if it isn't; it |
| 726 | may also return -ENOMEM and -ERESTARTSYS. |
| 727 |
| 728 | To request an update of the index data for an index or other object, the |
| 729 | following function should be called: |
| 730 | |
| 731 | void fscache_update_cookie(struct fscache_cookie *cookie); |
| 732 | |
| 733 | This function will refer back to the netfs_data pointer stored in the cookie by |
| 734 | the acquisition function to obtain the data to write into each revised index |
| 735 | entry. The update method in the parent index definition will be called to |
| 736 | transfer the data. |
| 737 | |
| 738 | Note that partial updates may happen automatically at other times, such as when |
| 739 | data blocks are added to a data file object. |
| 740 | |
| 741 | |
| 742 | =================
| 743 | COOKIE ENABLEMENT |
| 744 | ================= |
| 745 | |
| 746 | Cookies exist in one of two states: enabled and disabled. If a cookie is |
| 747 | disabled, it ignores all attempts to acquire child cookies; check, update or |
| 748 | invalidate its state; allocate, read or write backing pages - though it is |
| 749 | still possible to uncache pages and relinquish the cookie. |
| 750 | |
| 751 | The initial enablement state is set by fscache_acquire_cookie(), but the cookie |
| 752 | can be enabled or disabled later. To disable a cookie, call: |
| 753 | |
| 754 | void fscache_disable_cookie(struct fscache_cookie *cookie, |
| 755 | bool invalidate); |
| 756 | |
| 757 | If the cookie is not already disabled, this locks the cookie against other |
| 758 | enable and disable ops, marks the cookie as being disabled, discards or |
| 759 | invalidates any backing objects and waits for cessation of activity on any |
| 760 | associated object before unlocking the cookie. |
| 761 | |
| 762 | All possible failures are handled internally. The caller should consider |
| 763 | calling fscache_uncache_all_inode_pages() afterwards to make sure all page |
| 764 | markings are cleared up. |
| 765 | |
| 766 | Cookies can be enabled or reenabled with: |
| 767 | |
| 768 | void fscache_enable_cookie(struct fscache_cookie *cookie, |
| 769 | bool (*can_enable)(void *data), |
| 770 | void *data);
| 771 | |
| 772 | If the cookie is not already enabled, this locks the cookie against other |
| 773 | enable and disable ops, invokes can_enable() and, if the cookie is not an index |
| 774 | cookie, will begin the procedure of acquiring backing objects. |
| 775 | |
| 776 | The optional can_enable() function is passed the data argument and returns a |
| 777 | ruling as to whether or not enablement should actually be permitted to begin. |
| 778 | |
| 779 | All possible failures are handled internally. The cookie will only be marked |
| 780 | as enabled if provisional backing objects are allocated. |
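
To illustrate the can_enable() hook, the sketch below models a netfs that only
allows caching to be enabled while a file has no writers. Everything here is a
userspace stand-in written for this example: the cookie is reduced to a single
flag, the body of fscache_enable_cookie() is a mock that skips locking and
object acquisition, and struct demo_inode and demo_netfs_can_enable() are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in (assumption): a cookie reduced to its enabled state. */
struct fscache_cookie { bool enabled; };

/* Mock: consult can_enable() if one was given, then mark the cookie enabled. */
static void fscache_enable_cookie(struct fscache_cookie *cookie,
				  bool (*can_enable)(void *data),
				  void *data)
{
	assert(cookie != NULL);
	if (cookie->enabled)
		return;			/* already enabled: nothing to do */
	if (can_enable && !can_enable(data))
		return;			/* the netfs vetoed enablement */
	cookie->enabled = true;
}

/* Hypothetical netfs state consulted by the hook. */
struct demo_inode { int writers; };

/* Hypothetical policy: only permit caching when nobody has the file open
 * for writing. */
static bool demo_netfs_can_enable(void *data)
{
	struct demo_inode *inode = data;

	return inode->writers == 0;
}
```

Passing NULL for can_enable in this model enables the cookie unconditionally,
matching the description of the hook as optional.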
| 781 | |
| 782 | |
| 783 | ===============================
| 784 | MISCELLANEOUS COOKIE OPERATIONS |
| 785 | =============================== |
| 786 | |
| 787 | There are a number of operations that can be used to control cookies: |
| 788 | |
| 789 | (*) Cookie pinning: |
| 790 | |
| 791 | int fscache_pin_cookie(struct fscache_cookie *cookie); |
| 792 | void fscache_unpin_cookie(struct fscache_cookie *cookie); |
| 793 | |
| 794 | These operations permit data cookies to be pinned into the cache and to |
| 795 | have the pinning removed. They are not permitted on index cookies. |
| 796 | |
| 797 | The pinning function will return 0 if successful, -ENOBUFS if the cookie
| 798 | isn't backed by a cache, -EOPNOTSUPP if the cache doesn't support pinning, |
| 799 | -ENOSPC if there isn't enough space to honour the operation, -ENOMEM or |
| 800 | -EIO if there's any other problem. |
| 801 | |
| 802 | (*) Data space reservation: |
| 803 | |
| 804 | int fscache_reserve_space(struct fscache_cookie *cookie, loff_t size); |
| 805 | |
| 806 | This permits a netfs to request cache space be reserved to store up to the |
| 807 | given amount of a file. It is permitted to ask for more than the current |
| 808 | size of the file to allow for future file expansion. |
| 809 | |
| 810 | If size is given as zero then the reservation will be cancelled. |
| 811 | |
| 812 | The function will return 0 if successful, -ENOBUFS if the cookie isn't
| 813 | backed by a cache, -EOPNOTSUPP if the cache doesn't support reservations, |
| 814 | -ENOSPC if there isn't enough space to honour the operation, -ENOMEM or |
| 815 | -EIO if there's any other problem. |
| 816 | |
| 817 | Note that this doesn't pin an object in a cache; it can still be culled to |
| 818 | make space if it's not in use. |
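
One way a netfs might react to the return codes listed above for
fscache_reserve_space() is sketched below. The policy shown (treating
-ENOBUFS and -EOPNOTSUPP as "carry on without a reservation" and passing
other errors up) is an assumption made for this example, not something the
API mandates. The cookie fields and the body of fscache_reserve_space() are
userspace mocks, and demo_netfs_reserve() is a hypothetical name.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

typedef long long loff_t;	/* stand-in for the kernel type */

/* Userspace stand-in (assumption): just enough state to drive the mock. */
struct fscache_cookie {
	bool backed;			/* is there a cache behind the cookie? */
	bool supports_reservation;
	loff_t free_space;
	loff_t reserved;
};

/* Mock that returns the error codes named in the text. */
static int fscache_reserve_space(struct fscache_cookie *cookie, loff_t size)
{
	if (!cookie->backed)
		return -ENOBUFS;
	if (!cookie->supports_reservation)
		return -EOPNOTSUPP;
	if (size > cookie->free_space)
		return -ENOSPC;
	cookie->reserved = size;	/* a size of zero cancels the reservation */
	return 0;
}

/* Hypothetical policy: a missing reservation is not fatal to the netfs. */
static int demo_netfs_reserve(struct fscache_cookie *cookie, loff_t size)
{
	int ret = fscache_reserve_space(cookie, size);

	switch (ret) {
	case -ENOBUFS:
	case -EOPNOTSUPP:
		return 0;	/* no cache or no support: carry on regardless */
	default:
		return ret;	/* 0 on success; -ENOSPC etc. go to the caller */
	}
}
```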
| 819 | |
| 820 | |
| 821 | ===================== |
| 822 | COOKIE UNREGISTRATION |
| 823 | ===================== |
| 824 | |
| 825 | To get rid of a cookie, this function should be called:
| 826 | |
| 827 | void fscache_relinquish_cookie(struct fscache_cookie *cookie, |
| 828 | bool retire);
| 829 |
| 830 | If retire is true, then the object will be marked for recycling, and all
| 831 | copies of it will be removed from all active caches in which it is present.
| 832 | Not only that but all child objects will also be retired.
| 833 |
| 834 | If retire is false, then the object may be available again when next the
| 835 | acquisition function is called. Retirement here will overrule the pinning on a |
| 836 | cookie. |
| 837 | |
| 838 | One very important note - relinquish must NOT be called for a cookie unless all |
| 839 | the cookies for "child" indices, objects and pages have been relinquished |
| 840 | first. |
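
The ordering rule above can be sketched as follows: child cookies are
relinquished before the index cookie that contains them. This is a userspace
model; the cookie layout and the body of fscache_relinquish_cookie() are mocks
written for this example (the mock asserts the children-first rule rather
than reproducing what the kernel does), and demo_netfs_drop_index() is a
hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in (assumption): a cookie with a parent link and a count
 * of child cookies that have not yet been relinquished. */
struct fscache_cookie {
	struct fscache_cookie *parent;
	int n_children;
	bool relinquished;
	bool retired;
};

/* Mock: enforce the children-first rule and record what happened. */
static void fscache_relinquish_cookie(struct fscache_cookie *cookie,
				      bool retire)
{
	assert(cookie->n_children == 0);
	if (cookie->parent)
		cookie->parent->n_children--;
	cookie->relinquished = true;
	cookie->retired = retire;
}

/* Hypothetical teardown of an index and the data file cookies under it. */
static void demo_netfs_drop_index(struct fscache_cookie *index,
				  struct fscache_cookie **files,
				  size_t nr_files, bool retire)
{
	/* Children go first... */
	for (size_t i = 0; i < nr_files; i++)
		fscache_relinquish_cookie(files[i], retire);

	/* ...and only then the index that contains them. */
	fscache_relinquish_cookie(index, retire);
}
```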
| 841 | |
| 842 | |
| 843 | ==================
| 844 | INDEX INVALIDATION |
| 845 | ================== |
| 846 |
| 847 | There is no direct way to invalidate an index subtree. To do this, the caller
| 848 | should relinquish and retire the cookie they have, and then acquire a new one. |
| 849 | |
| 850 | |
| 851 | ====================== |
| 852 | DATA FILE INVALIDATION |
| 853 | ====================== |
| 854 | |
| 855 | Sometimes it will be necessary to invalidate an object that contains data. |
| 856 | Typically this will be necessary when the server tells the netfs of a foreign |
| 857 | change - at which point the netfs has to throw away all the state it had for an |
| 858 | inode and reload from the server. |
| 859 | |
| 860 | To indicate that a cache object should be invalidated, the following function |
| 861 | can be called: |
| 862 | |
| 863 | void fscache_invalidate(struct fscache_cookie *cookie); |
| 864 | |
| 865 | This can be called with spinlocks held as it defers the work to a thread pool. |
| 866 | All extant storage, retrieval and attribute change ops at this point are |
| 867 | cancelled and discarded. Some future operations will be rejected until the |
| 868 | cache has had a chance to insert a barrier in the operations queue. After |
| 869 | that, operations will be queued again behind the invalidation operation. |
| 870 | |
| 871 | The invalidation operation will perform an attribute change operation and an |
| 872 | auxiliary data update operation as it is very likely these will have changed. |
| 873 | |
| 874 | Using the following function, the netfs can wait for the invalidation operation |
| 875 | to have reached a point at which it can start submitting ordinary operations |
| 876 | once again: |
| 877 | |
| 878 | void fscache_wait_on_invalidate(struct fscache_cookie *cookie); |
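
As an illustration, a netfs revalidation path might combine
fscache_check_consistency() (described under index and data file consistency)
with the two invalidation calls, roughly as below. Invalidating on -ESTALE is
a policy assumed for this example. The cookie fields and all three fscache_*()
bodies are userspace mocks, and demo_netfs_revalidate() is a hypothetical
name.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Userspace stand-in (assumption): version numbers model auxiliary data. */
struct fscache_cookie {
	int cached_version;	/* what the cache holds */
	int netfs_version;	/* what the netfs currently believes */
	bool invalidating;	/* an invalidation is still in progress */
};

/* Mock: 0 if the cached auxiliary data is current, -ESTALE otherwise. */
static int fscache_check_consistency(struct fscache_cookie *cookie)
{
	return cookie->cached_version == cookie->netfs_version ? 0 : -ESTALE;
}

/* Mock: start an invalidation, which also refreshes the auxiliary data. */
static void fscache_invalidate(struct fscache_cookie *cookie)
{
	cookie->invalidating = true;
	cookie->cached_version = cookie->netfs_version;
}

/* Mock: the invalidation "reaches its barrier" immediately. */
static void fscache_wait_on_invalidate(struct fscache_cookie *cookie)
{
	cookie->invalidating = false;
}

/* Hypothetical revalidation helper. */
static int demo_netfs_revalidate(struct fscache_cookie *cookie)
{
	int ret = fscache_check_consistency(cookie);

	if (ret != -ESTALE)
		return ret;	/* 0: cache is current; other errors propagate */

	fscache_invalidate(cookie);
	/* Hold off new reads and writes until they can be queued again. */
	fscache_wait_on_invalidate(cookie);
	assert(!cookie->invalidating);
	return 0;
}
```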
| 879 |
| 880 | |
| 881 | =========================== |
| 882 | FS-CACHE SPECIFIC PAGE FLAG |
| 883 | =========================== |
| 884 | |
| 885 | FS-Cache makes use of a page flag, PG_private_2, for its own purpose. This is |
| 886 | given the alternative name PG_fscache. |
| 887 | |
| 888 | PG_fscache is used to indicate that the page is known by the cache, and that |
| 889 | the cache must be informed if the page is going to go away. It's an indication |
| 890 | to the netfs that the cache has an interest in this page, where an interest may |
| 891 | be a pointer to it, resources allocated or reserved for it, or I/O in progress |
| 892 | upon it. |
| 893 | |
| 894 | The netfs can use this information in methods such as releasepage() to |
| 895 | determine whether it needs to uncache a page or update it. |
| 896 | |
| 897 | Furthermore, if this bit is set, releasepage() and invalidatepage() operations |
| 898 | will be called on a page to get rid of it, even if PG_private is not set. This |
| 899 | allows caching to be attempted on a page before read_cache_pages() is called
| 900 | after fscache_read_or_alloc_pages(), as the former will try to release pages
| 901 | it was given under certain circumstances.
| 902 | |
| 903 | This bit does not overlap with other bits such as PG_private. This means
| 904 | that FS-Cache can be used with a filesystem that uses the block buffering code.
| 905 | |
| 906 | There are a number of operations defined on this flag: |
| 907 | |
| 908 | int PageFsCache(struct page *page); |
| 909 | void SetPageFsCache(struct page *page);
| 910 | void ClearPageFsCache(struct page *page);
| 911 | int TestSetPageFsCache(struct page *page);
| 912 | int TestClearPageFsCache(struct page *page);
| 913 | |
| 914 | These functions are, respectively, bit test, bit set, bit clear, bit test and
| 915 | set, and bit test and clear operations on PG_fscache.
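
The semantics of the five operations can be modelled in userspace as below.
This is only a sketch of the behaviour: struct page is reduced to a flags
word, DEMO_PG_FSCACHE is an arbitrary bit position chosen for this example,
and these versions are not atomic, unlike the kernel's page flag operations.

```c
#include <assert.h>

/* Userspace stand-ins (assumptions), not the kernel definitions. */
struct page { unsigned long flags; };
#define DEMO_PG_FSCACHE	0	/* hypothetical bit number */
#define DEMO_FSCACHE_MASK	(1UL << DEMO_PG_FSCACHE)

static int PageFsCache(struct page *page)
{
	return (page->flags & DEMO_FSCACHE_MASK) != 0;
}

static void SetPageFsCache(struct page *page)
{
	page->flags |= DEMO_FSCACHE_MASK;
}

static void ClearPageFsCache(struct page *page)
{
	page->flags &= ~DEMO_FSCACHE_MASK;
}

/* Test-and-set: set the bit and return its previous value. */
static int TestSetPageFsCache(struct page *page)
{
	int old = PageFsCache(page);

	SetPageFsCache(page);
	return old;
}

/* Test-and-clear: clear the bit and return its previous value. */
static int TestClearPageFsCache(struct page *page)
{
	int old = PageFsCache(page);

	ClearPageFsCache(page);
	return old;
}

/* Walk a page through the states the five operations can produce. */
static void demo_flag_roundtrip(struct page *page)
{
	ClearPageFsCache(page);
	assert(TestSetPageFsCache(page) == 0);	 /* was clear, now set */
	assert(TestSetPageFsCache(page) == 1);	 /* was already set */
	assert(TestClearPageFsCache(page) == 1); /* was set, now clear */
	assert(PageFsCache(page) == 0);
}
```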