David Howells | 2d6fff6 | 2009-04-03 16:42:36 +0100

                          ==========================
                          General Filesystem Caching
                          ==========================

========
OVERVIEW
========

This facility is a general purpose cache for network filesystems, though it
could be used for caching other things such as ISO9660 filesystems too.

FS-Cache mediates between cache backends (such as CacheFS) and network
filesystems:

        +---------+
        |         |                        +--------------+
        |   NFS   |--+                     |              |
        |         |  |                 +-->|   CacheFS    |
        +---------+  |   +----------+  |   |  /dev/hda5   |
                     |   |          |  |   +--------------+
        +---------+  +-->|          |  |
        |         |      |          |--+
        |   AFS   |----->| FS-Cache |
        |         |      |          |--+
        +---------+  +-->|          |  |
                     |   |          |  |   +--------------+
        +---------+  |   +----------+  |   |              |
        |         |  |                 +-->|  CacheFiles  |
        |  ISOFS  |--+                     |  /var/cache  |
        |         |                        +--------------+
        +---------+

Or to look at it another way, FS-Cache is a module that provides a caching
facility to a network filesystem such that the cache is transparent to the
user:

        +---------+
        |         |
        | Server  |
        |         |
        +---------+
             |                  NETWORK
        ~~~~~|~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
             |
             |           +----------+
             V           |          |
        +---------+      |          |
        |         |      |          |
        |   NFS   |----->| FS-Cache |
        |         |      |          |--+
        +---------+      |          |  |   +--------------+   +--------------+
             |           |          |  |   |              |   |              |
             V           +----------+  +-->|  CacheFiles  |-->|  Ext3        |
        +---------+                        |  /var/cache  |   |  /dev/sda6   |
        |         |                        +--------------+   +--------------+
        |   VFS   |                                ^                     ^
        |         |                                |                     |
        +---------+                                +--------------+      |
             |                  KERNEL SPACE                      |      |
        ~~~~~|~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~|~~~~~~|~~~~
             |                  USER SPACE                        |      |
             V                                                    |      |
        +---------+                                           +--------------+
        |         |                                           |              |
        | Process |                                           | cachefilesd  |
        |         |                                           |              |
        +---------+                                           +--------------+


FS-Cache does not load every opened netfs file into a cache in its entirety
before permitting access, and then serve the pages out of that cache rather
than the netfs inode, because:

 (1) It must be practical to operate without a cache.

 (2) The size of any accessible file must not be limited to the size of the
     cache.

 (3) The combined size of all opened files (this includes mapped libraries)
     must not be limited to the size of the cache.

 (4) The user should not be forced to download an entire file just to do a
     one-off access of a small portion of it (such as might be done with the
     "file" program).

It instead serves the cache out in PAGE_SIZE chunks as and when requested by
the netfs (or netfs's) using it.

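As a rough illustration of page-granular access, the sketch below works out
which PAGE_SIZE chunks a byte range touches. It is not kernel code: PAGE_SIZE
is assumed to be 4096 here (the real value depends on the architecture) and
pages_for_range() is a hypothetical helper named for this example only.

```python
PAGE_SIZE = 4096  # assumption: the real value depends on the architecture


def pages_for_range(offset, length):
    """Return the page indices covering bytes [offset, offset + length)."""
    if length <= 0:
        return []
    first = offset // PAGE_SIZE
    last = (offset + length - 1) // PAGE_SIZE
    return list(range(first, last + 1))


# A one-off read of 100 bytes at offset 5000 touches a single page.
print(pages_for_range(5000, 100))
```

Under these assumptions a small read needs only the chunks it touches, however
large the file is, which is the point of reason (4) above.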

FS-Cache provides the following facilities:

 (1) More than one cache can be used at once. Caches can be selected
     explicitly by use of tags.

 (2) Caches can be added / removed at any time.

 (3) The netfs is provided with an interface that allows either party to
     withdraw caching facilities from a file (required for (2)).

 (4) The interface to the netfs returns as few errors as possible, preferring
     rather to let the netfs remain oblivious.

 (5) Cookies are used to represent indices, files and other objects to the
     netfs. The simplest cookie is just a NULL pointer - indicating nothing
     cached there.

 (6) The netfs is allowed to propose - dynamically - any index hierarchy it
     desires, though it must be aware that the index search function is
     recursive, stack space is limited, and indices can only be children of
     indices.

 (7) Data I/O is done direct to and from the netfs's pages. The netfs
     indicates that page A is at index B of the data-file represented by cookie
     C, and that it should be read or written. The cache backend may or may
     not start I/O on that page, but if it does, a netfs callback will be
     invoked to indicate completion. The I/O may be either synchronous or
     asynchronous.

 (8) Cookies can be "retired" upon release. At this point FS-Cache will mark
     them as obsolete and the index hierarchy rooted at that point will get
     recycled.

 (9) The netfs provides a "match" function for index searches. In addition to
     saying whether a match was made or not, this can also specify that an
     entry should be updated or deleted.

(10) As much as possible is done asynchronously.

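Facility (9) can be pictured as a function that returns a verdict rather than
a plain yes or no. The following is only a sketch of the idea: the Verdict
names, the entry fields (key, data_version, mtime) and match() itself are
assumptions made for illustration and are not the FS-Cache interface.

```python
from enum import Enum


class Verdict(Enum):
    # Hypothetical names; the real interface is defined by the netfs API.
    NO_MATCH = 0   # entry is for some other object
    MATCH = 1      # entry is the right object and is current
    UPDATE = 2     # right object, but its stored metadata needs refreshing
    DELETE = 3     # right object, but the cached data is stale


def match(entry, key, data_version, mtime):
    """Compare a cached index entry with what the netfs currently expects."""
    if entry["key"] != key:
        return Verdict.NO_MATCH
    if entry["data_version"] != data_version:
        return Verdict.DELETE
    if entry["mtime"] != mtime:
        return Verdict.UPDATE
    return Verdict.MATCH


entry = {"key": "fh1", "data_version": 3, "mtime": 100}
print(match(entry, "fh1", 3, 200))
```

Which differences count as "update" and which as "delete" is a policy choice
made by the netfs; the split used here is arbitrary.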

FS-Cache maintains a virtual indexing tree in which all indices, files, objects
and pages are kept. Bits of this tree may actually reside in one or more
caches.

                                        FSDEF
                                          |
                        +------------------------------------+
                        |                                    |
                       NFS                                  AFS
                        |                                    |
           +--------------------------+                +-----------+
           |                          |                |           |
        homedir                     mirror          afs.org   redhat.com
           |                          |                            |
     +------------+           +---------------+              +----------+
     |            |           |               |              |          |
   00001        00002       00007           00125        vol00001   vol00002
     |            |           |               |                         |
 +---+---+     +-----+      +---+      +------+------+            +-----+----+
 |   |   |     |     |      |   |      |      |      |            |     |    |
PG0 PG1 PG2   PG0  XATTR   PG0 PG1   DIRENT DIRENT DIRENT        R/W   R/O  Bak
                     |                                            |
                    PG0                                       +-------+
                                                              |       |
                                                            00001   00003
                                                              |
                                                          +---+---+
                                                          |   |   |
                                                         PG0 PG1 PG2

In the example above, you can see two netfs's being backed: NFS and AFS. These
have different index hierarchies:

 (*) The NFS primary index contains per-server indices. Each server index is
     indexed by NFS file handles to get data file objects. Each data file
     object can have an array of pages, but may also have further child
     objects, such as extended attributes and directory entries. Extended
     attribute objects themselves have page-array contents.

 (*) The AFS primary index contains per-cell indices. Each cell index contains
     per-logical-volume indices. Each volume index contains up to three
     indices for the read-write, read-only and backup mirrors of that volume.
     Each of these contains vnode data file objects, each of which contains an
     array of pages.

The very top index is the FS-Cache master index in which individual netfs's
have entries.

Any index object may reside in more than one cache, provided it only has index
children. Any index with non-index object children will be assumed to only
reside in one cache.

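The placement rule can be modelled with a small tree. This is an illustrative
sketch only: the Node class and may_span_caches() are hypothetical names, not
part of FS-Cache, and the nodes built below copy one branch of the AFS example.

```python
class Node:
    """A node in a toy index tree; is_index separates indices from data
    files and other non-index objects."""

    def __init__(self, name, is_index, parent=None):
        if parent is not None and is_index and not parent.is_index:
            # Indices can only be children of indices.
            raise ValueError("an index must have an index as its parent")
        self.name = name
        self.is_index = is_index
        self.children = []
        if parent is not None:
            parent.children.append(self)


def may_span_caches(node):
    """An index may reside in more than one cache only if all of its
    children are themselves indices."""
    return node.is_index and all(c.is_index for c in node.children)


fsdef = Node("FSDEF", True)
afs = Node("AFS", True, fsdef)
cell = Node("redhat.com", True, afs)
vol = Node("vol00002", True, cell)
rw = Node("R/W", True, vol)
vnode = Node("00001", False, rw)  # a data file object

for n in (fsdef, vol, rw):
    print(n.name, may_span_caches(n))
```

In this model the R/W index is pinned to a single cache because it holds a
data file object, while the indices above it remain free to span caches.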

The netfs API to FS-Cache can be found in:

        Documentation/filesystems/caching/netfs-api.txt

The cache backend API to FS-Cache can be found in:

        Documentation/filesystems/caching/backend-api.txt


=======================
STATISTICAL INFORMATION
=======================

If FS-Cache is compiled with the following options enabled:

        CONFIG_FSCACHE_PROC=y (implied by the following two)
        CONFIG_FSCACHE_STATS=y
        CONFIG_FSCACHE_HISTOGRAM=y

then it will gather certain statistics and display them through a number of
proc files.

 (*) /proc/fs/fscache/stats

     This shows counts of a number of events that can happen in FS-Cache:

        CLASS   EVENT   MEANING
        ======= ======= =======================================================
        Cookies idx=N   Number of index cookies allocated
                dat=N   Number of data storage cookies allocated
                spc=N   Number of special cookies allocated
        Objects alc=N   Number of objects allocated
                nal=N   Number of object allocation failures
                avl=N   Number of objects that reached the available state
                ded=N   Number of objects that reached the dead state
        ChkAux  non=N   Number of objects that didn't have a coherency check
                ok=N    Number of objects that passed a coherency check
                upd=N   Number of objects that needed a coherency data update
                obs=N   Number of objects that were declared obsolete
        Pages   mrk=N   Number of pages marked as being cached
                unc=N   Number of uncache page requests seen
        Acquire n=N     Number of acquire cookie requests seen
                nul=N   Number of acq reqs given a NULL parent
                noc=N   Number of acq reqs rejected due to no cache available
                ok=N    Number of acq reqs succeeded
                nbf=N   Number of acq reqs rejected due to error
                oom=N   Number of acq reqs failed on ENOMEM
        Lookups n=N     Number of lookup calls made on cache backends
                neg=N   Number of negative lookups made
                pos=N   Number of positive lookups made
                crt=N   Number of objects created by lookup
        Updates n=N     Number of update cookie requests seen
                nul=N   Number of upd reqs given a NULL parent
                run=N   Number of upd reqs granted CPU time
        Relinqs n=N     Number of relinquish cookie requests seen
                nul=N   Number of rlq reqs given a NULL parent
                wcr=N   Number of rlq reqs waited on completion of creation
        AttrChg n=N     Number of attribute changed requests seen
                ok=N    Number of attr changed requests queued
                nbf=N   Number of attr changed rejected -ENOBUFS
                oom=N   Number of attr changed failed -ENOMEM
                run=N   Number of attr changed ops given CPU time
        Allocs  n=N     Number of allocation requests seen
                ok=N    Number of successful alloc reqs
                wt=N    Number of alloc reqs that waited on lookup completion
                nbf=N   Number of alloc reqs rejected -ENOBUFS
                ops=N   Number of alloc reqs submitted
                owt=N   Number of alloc reqs waited for CPU time
        Retrvls n=N     Number of retrieval (read) requests seen
                ok=N    Number of successful retr reqs
                wt=N    Number of retr reqs that waited on lookup completion
                nod=N   Number of retr reqs returned -ENODATA
                nbf=N   Number of retr reqs rejected -ENOBUFS
                int=N   Number of retr reqs aborted -ERESTARTSYS
                oom=N   Number of retr reqs failed -ENOMEM
                ops=N   Number of retr reqs submitted
                owt=N   Number of retr reqs waited for CPU time
        Stores  n=N     Number of storage (write) requests seen
                ok=N    Number of successful store reqs
                agn=N   Number of store reqs on a page already pending storage
                nbf=N   Number of store reqs rejected -ENOBUFS
                oom=N   Number of store reqs failed -ENOMEM
                ops=N   Number of store reqs submitted
                run=N   Number of store reqs granted CPU time
        Ops     pend=N  Number of times async ops added to pending queues
                run=N   Number of times async ops given CPU time
                enq=N   Number of times async ops queued for processing
                dfr=N   Number of async ops queued for deferred release
                rel=N   Number of async ops released
                gc=N    Number of deferred-release async ops garbage collected


 (*) /proc/fs/fscache/histogram

        cat /proc/fs/fscache/histogram
        +HZ   +TIME OBJ INST  OP RUNS   OBJ RUNS  RETRV DLY RETRIEVLS
        ===== ===== ========= ========= ========= ========= =========

     This shows, for a variety of tasks, how many times each task took each
     amount of time between 0 jiffies and HZ-1 jiffies to run. The columns
     are as follows:

        COLUMN          TIME MEASUREMENT
        =======         =======================================================
        OBJ INST        Length of time to instantiate an object
        OP RUNS         Length of time a call to process an operation took
        OBJ RUNS        Length of time a call to process an object event took
        RETRV DLY       Time between requesting a read and lookup completing
        RETRIEVLS       Time between beginning and end of a retrieval

     Each row shows the number of events that took a particular range of
     times. Each step is 1 jiffy in size. The +HZ column indicates the
     particular jiffy range covered, and the +TIME column the equivalent
     number of seconds.

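As a sketch of how these two files might be consumed from user space, the
example below parses stats-style text into nested dictionaries and converts a
histogram row number into seconds. The raw line layout is not shown in this
document, so the form "Class: ev=N ev=N ..." is an assumption, as are the
sample numbers, the HZ value and the helper names.

```python
HZ = 250  # assumption: must match the tick rate of the running kernel


def parse_stats(text):
    """Parse 'Class: ev=N ev=N ...' lines into {class: {event: count}}."""
    stats = {}
    for line in text.splitlines():
        cls, sep, rest = line.partition(":")
        if not sep:
            continue  # not a 'Class: ...' line, e.g. a banner
        events = stats.setdefault(cls.strip(), {})
        for field in rest.split():
            name, eq, value = field.partition("=")
            if eq and value.isdigit():
                events[name] = int(value)
    return stats


def jiffies_to_seconds(jiffies, hz=HZ):
    """Convert a +HZ row number into seconds, as in the +TIME column."""
    return jiffies / hz


# Made-up sample data; a class may be split over more than one line.
sample = """\
Cookies: idx=2 dat=5 spc=0
Retrvls: n=10 ok=7 wt=1 nod=2 nbf=1 int=0 oom=0
Retrvls: ops=9 owt=0
"""
stats = parse_stats(sample)
print(stats["Retrvls"]["ok"], "of", stats["Retrvls"]["n"], "retrievals ok")
print(jiffies_to_seconds(125))
```

On a kernel built with the options above, the real file could be read with
parse_stats(open("/proc/fs/fscache/stats").read()), provided the assumed line
layout holds.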

=========
DEBUGGING
=========

The FS-Cache facility can have runtime debugging enabled by adjusting the value
in:

        /sys/module/fscache/parameters/debug

This is a bitmask of debugging streams to enable:

        BIT     VALUE   STREAM                          POINT
        ======= ======= =============================== =======================
        0       1       Cache management                Function entry trace
        1       2                                       Function exit trace
        2       4                                       General
        3       8       Cookie management               Function entry trace
        4       16                                      Function exit trace
        5       32                                      General
        6       64      Page handling                   Function entry trace
        7       128                                     Function exit trace
        8       256                                     General
        9       512     Operation management            Function entry trace
        10      1024                                    Function exit trace
        11      2048                                    General

The appropriate set of values should be OR'd together and the result written to
the control file. For example:

        echo $((1|8|64|512)) >/sys/module/fscache/parameters/debug

will turn on all function entry debugging.

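The table is regular: each stream owns three consecutive bits, in the order
entry, exit, general. The sketch below derives mask values from that layout.
The short stream and point names and both helper functions are shorthand
invented for this example; only the bit positions come from the table.

```python
# Order matters: it mirrors the rows of the table (bits 0-2, 3-5, 6-8, 9-11).
STREAMS = ["cache", "cookie", "page", "operation"]
POINTS = ["entry", "exit", "general"]


def debug_bit(stream, point):
    """Return the VALUE column entry for one stream/point pair."""
    return 1 << (3 * STREAMS.index(stream) + POINTS.index(point))


def debug_mask(pairs):
    """OR together the values for a list of (stream, point) pairs."""
    mask = 0
    for stream, point in pairs:
        mask |= debug_bit(stream, point)
    return mask


# Function entry tracing for every stream listed in the table.
print(debug_mask([(s, "entry") for s in STREAMS]))
```

The printed number is what would be written to the control file.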