File management in the Linux kernel
-----------------------------------

This document describes how locking works for files (struct file)
and for the file descriptor table (struct files_struct).

Up until 2.6.12, the file descriptor table was protected
by a lock (files->file_lock) and a reference count (files->count).
->file_lock protected accesses to all the file-related fields
of the table. ->count was used for sharing the file descriptor
table between tasks cloned with the CLONE_FILES flag. Typically
this would be the case for POSIX threads. As with the common
refcounting model in the kernel, the last task doing
a put_files_struct() frees the file descriptor (fd) table.
The files (struct file) themselves are protected by a
reference count (->f_count).

In the new lock-free model of file descriptor management,
the reference counting is similar, but the locking is
based on RCU. The file descriptor table contains multiple
elements: the fd sets (open_fds and close_on_exec), the
array of file pointers, the sizes of the sets and the array,
etc. In order for the updates to appear atomic to
a lock-free reader, all the elements of the file descriptor
table are in a separate structure, struct fdtable.
files_struct contains a pointer to struct fdtable through
which the actual fd table is accessed. Initially the
fdtable is embedded in files_struct itself. On a subsequent
expansion of fdtable, a new fdtable structure is allocated
and files->fdt points to the new structure. The old fdtable
structure is freed with RCU, and lock-free readers see either
the old fdtable or the new fdtable, making the update
appear atomic. Here are the locking rules for
the fdtable structure:

1. All references to the fdtable must be done through
   the files_fdtable() macro:

	struct fdtable *fdt;

	rcu_read_lock();

	fdt = files_fdtable(files);
	....
	if (n <= fdt->max_fds)
		....
	...
	rcu_read_unlock();

   files_fdtable() uses the rcu_dereference() macro, which takes care of
   the memory barrier requirements for lock-free dereference.
   The fdtable pointer must be read within the read-side
   critical section.

2. Reading of the fdtable as described above must be protected
   by rcu_read_lock()/rcu_read_unlock().

3. For any update to the fd table, files->file_lock must
   be held.

4. To look up the file structure given an fd, a reader
   must use either fcheck() or fcheck_files() APIs. These
   take care of barrier requirements due to lock-free lookup.
   An example:

	struct file *file;

	rcu_read_lock();
	file = fcheck(fd);
	if (file) {
		...
	}
	....
	rcu_read_unlock();

5. Handling of the file structures is special. Since the look-up
   of the fd (fget()/fget_light()) is lock-free, it is possible
   that a look-up may race with the last put() operation on the
   file structure. This is avoided by using atomic_long_inc_not_zero()
   on ->f_count:

	rcu_read_lock();
	file = fcheck_files(files, fd);
	if (file) {
		if (atomic_long_inc_not_zero(&file->f_count))
			*fput_needed = 1;
		else
			/* Didn't get the reference, someone's freed */
			file = NULL;
	}
	rcu_read_unlock();
	....
	return file;

   atomic_long_inc_not_zero() detects if the refcount is already zero or
   goes to zero during the increment. If it does, we fail
   fget()/fget_light().

6. Since both fdtable and file structures can be looked up
   lock-free, they must be installed using rcu_assign_pointer()
   API. If they are looked up lock-free, rcu_dereference()
   must be used. However it is advisable to use files_fdtable()
   and fcheck()/fcheck_files() which take care of these issues.

7. While updating, the fdtable pointer must be looked up while
   holding files->file_lock. If ->file_lock is dropped, then
   another thread may expand the files, thereby creating a new
   fdtable and making the earlier fdtable pointer stale.
   For example:

	spin_lock(&files->file_lock);
	fd = locate_fd(files, file, start);
	if (fd >= 0) {
		/* locate_fd() may have expanded fdtable, load the ptr */
		fdt = files_fdtable(files);
		FD_SET(fd, fdt->open_fds);
		FD_CLR(fd, fdt->close_on_exec);
		spin_unlock(&files->file_lock);
	.....

   Since locate_fd() can drop ->file_lock (and reacquire ->file_lock),
   the fdtable pointer (fdt) must be loaded after locate_fd().
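
The fdtable replacement described in the rules above (build a complete
new table, publish it through a single pointer, let readers pick up
either the old or the new table) can be modelled in userspace with C11
atomics. This is only a sketch under stated assumptions, not the kernel
code: every demo_* name is hypothetical, an acquire load stands in for
rcu_dereference(), a release store stands in for rcu_assign_pointer(),
and the caller of demo_expand() is assumed to hold the equivalent of
->file_lock. The old table is handed back to the caller instead of being
freed, since the sketch has no RCU grace period to wait for.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for struct fdtable / files_struct. */
struct demo_fdtable {
	unsigned int max_fds;
	void **fd;			/* array of max_fds slots */
};

struct demo_files {
	_Atomic(struct demo_fdtable *) fdt;	/* current table */
	struct demo_fdtable fdtab;		/* initial, embedded table */
	void *fd_array[4];
};

static void demo_files_init(struct demo_files *files)
{
	memset(files->fd_array, 0, sizeof(files->fd_array));
	files->fdtab.max_fds = 4;
	files->fdtab.fd = files->fd_array;
	atomic_init(&files->fdt, &files->fdtab);
}

/* Reader side: one acquire load yields a fully initialised table. */
static struct demo_fdtable *demo_files_fdtable(struct demo_files *files)
{
	return atomic_load_explicit(&files->fdt, memory_order_acquire);
}

/*
 * Updater side; the caller is assumed to serialise updaters.
 * Returns the old table, which may only be released once no reader
 * can still be using it, or NULL if allocation failed.
 */
static struct demo_fdtable *demo_expand(struct demo_files *files,
					unsigned int nr)
{
	struct demo_fdtable *old = demo_files_fdtable(files);
	struct demo_fdtable *nfdt;

	assert(nr > old->max_fds);
	nfdt = malloc(sizeof(*nfdt));
	if (!nfdt)
		return NULL;
	nfdt->fd = calloc(nr, sizeof(*nfdt->fd));
	if (!nfdt->fd) {
		free(nfdt);
		return NULL;
	}
	/* Fill in every field before the table becomes visible. */
	memcpy(nfdt->fd, old->fd, old->max_fds * sizeof(*old->fd));
	nfdt->max_fds = nr;
	/* Publish: readers now see either 'old' or 'nfdt', never a mix. */
	atomic_store_explicit(&files->fdt, nfdt, memory_order_release);
	return old;
}
```

Because max_fds and the fd array live in the same structure, a reader
that loaded the pointer before the expansion keeps using a size and an
array that match each other, which is the reason the document gives for
moving these fields into struct fdtable.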
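
The "increment unless zero" step from rule 5 can also be sketched in
userspace C11. This is an illustration of the idea only, assuming a
plain compare-and-swap loop; it is not the kernel's implementation of
atomic_long_inc_not_zero(), and demo_file, demo_inc_not_zero() and
demo_put() are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct file; only the count matters here. */
struct demo_file {
	atomic_long f_count;
};

/*
 * Take a reference only if the count is still non-zero.
 * Returns true on success, false if the last reference is already gone.
 */
static bool demo_inc_not_zero(atomic_long *count)
{
	long old = atomic_load(count);

	while (old != 0) {
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(count, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true if this was the last one. */
static bool demo_put(atomic_long *count)
{
	long prev = atomic_fetch_sub(count, 1);

	assert(prev > 0);	/* dropping a reference nobody holds is a bug */
	return prev == 1;
}
```

A lookup that loses the race with the final demo_put() sees a count of
zero and gets false back, so it never revives an object that is about to
be freed. That corresponds to the "file = NULL" branch in rule 5.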