===================================================
Adding reference counters (krefs) to kernel objects
===================================================

:Author: Corey Minyard <minyard@acm.org>
:Author: Thomas Hellstrom <thellstrom@vmware.com>

A lot of this was lifted from Greg Kroah-Hartman's 2004 OLS paper and
presentation on krefs, which can be found at:

  - http://www.kroah.com/linux/talks/ols_2004_kref_paper/Reprint-Kroah-Hartman-OLS2004.pdf
  - http://www.kroah.com/linux/talks/ols_2004_kref_talk/

Introduction
============

krefs allow you to add reference counters to your objects. If you
have objects that are used in multiple places and passed around, and
you don't have refcounts, your code is almost certainly broken. If
you want refcounts, krefs are the way to go.

To use a kref, add one to your data structures like::

    struct my_data
    {
        .
        .
        struct kref refcount;
        .
        .
    };

The kref can occur anywhere within the data structure.

Initialization
==============

You must initialize the kref after you allocate it. To do this, call
kref_init() as follows::

    struct my_data *data;

    data = kmalloc(sizeof(*data), GFP_KERNEL);
    if (!data)
        return -ENOMEM;
    kref_init(&data->refcount);

This sets the refcount in the kref to 1.
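
The counting that starts with kref_init() can be modelled outside the
kernel. The following is a minimal userspace sketch built on C11 atomics,
not the kernel implementation. The names toy_kref, toy_kref_init(),
toy_kref_get(), toy_kref_put() and toy_kref_demo() are hypothetical and only
mirror the shape of the real API.

```c
#include <stdatomic.h>

/* Hypothetical userspace stand-in for struct kref. */
struct toy_kref {
    atomic_int count;
};

/* Like kref_init(): a freshly created object has exactly one owner. */
static void toy_kref_init(struct toy_kref *k)
{
    atomic_init(&k->count, 1);
}

/* Like kref_get(): the caller must already hold a valid reference. */
static void toy_kref_get(struct toy_kref *k)
{
    atomic_fetch_add(&k->count, 1);
}

/* Like kref_put(): returns 1 if this call dropped the last reference. */
static int toy_kref_put(struct toy_kref *k, void (*release)(struct toy_kref *))
{
    if (atomic_fetch_sub(&k->count, 1) == 1) {
        release(k);
        return 1;
    }
    return 0;
}

static int released;    /* number of times the release routine ran */

static void toy_release(struct toy_kref *k)
{
    (void)k;            /* a real release routine would free the object */
    released++;
}

/*
 * Walks the count through 1 -> 2 -> 1 -> 0.
 * Returns 1 if every step behaved as expected, 0 otherwise.
 */
static int toy_kref_demo(void)
{
    struct toy_kref k;
    int after_init, first_put, second_put;

    released = 0;
    toy_kref_init(&k);
    after_init = atomic_load(&k.count);         /* one owner */
    toy_kref_get(&k);                           /* second owner */
    first_put = toy_kref_put(&k, toy_release);  /* one owner left */
    second_put = toy_kref_put(&k, toy_release); /* last owner gone */

    return after_init == 1 && first_put == 0 &&
           second_put == 1 && released == 1;
}
```

Only the put that takes the count from 1 to 0 runs the release routine,
which is the property the rules below rely on.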

Kref rules
==========

Once you have an initialized kref, you must follow these rules:

1) If you make a non-temporary copy of a pointer, especially if
   it can be passed to another thread of execution, you must
   increment the refcount with kref_get() before passing it off::

       kref_get(&data->refcount);

   If you already have a valid pointer to a kref-ed structure (the
   refcount cannot go to zero) you may do this without a lock.

2) When you are done with a pointer, you must call kref_put()::

       kref_put(&data->refcount, data_release);

   If this is the last reference to the pointer, the release
   routine will be called. If the code never tries to get
   a valid pointer to a kref-ed structure without already
   holding a valid pointer, it is safe to do this without
   a lock.

3) If the code attempts to gain a reference to a kref-ed structure
   without already holding a valid pointer, it must serialize access
   so that a kref_put() cannot occur during the kref_get(), and the
   structure must remain valid during the kref_get().

For example, if you allocate some data and then pass it to another
thread to process::

    void data_release(struct kref *ref)
    {
        struct my_data *data = container_of(ref, struct my_data, refcount);
        kfree(data);
    }

    /* kthread_run() expects a thread function that returns int. */
    int more_data_handling(void *cb_data)
    {
        struct my_data *data = cb_data;
        .
        . do stuff with data here
        .
        kref_put(&data->refcount, data_release);
        return 0;
    }

    int my_data_handler(void)
    {
        int rv = 0;
        struct my_data *data;
        struct task_struct *task;

        data = kmalloc(sizeof(*data), GFP_KERNEL);
        if (!data)
            return -ENOMEM;
        kref_init(&data->refcount);

        kref_get(&data->refcount);
        task = kthread_run(more_data_handling, data, "more_data_handling");
        if (IS_ERR(task)) {
            /* The thread never ran, so drop the reference taken for it. */
            rv = PTR_ERR(task);
            kref_put(&data->refcount, data_release);
            goto out;
        }

        .
        . do stuff with data here
        .
    out:
        kref_put(&data->refcount, data_release);
        return rv;
    }

This way, it doesn't matter in what order the two threads handle the
data: kref_put() knows when the data is no longer referenced and
releases it. The kref_get() does not require a lock, since we already
have a valid pointer that we own a refcount for. The put needs no lock
because nothing tries to get the data without already holding a
pointer.
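
The order independence can be checked with a small userspace simulation.
This is a sketch under stated assumptions, not kernel code: it replays the
two possible completion orders sequentially instead of using real threads,
and toy_data, toy_put() and run_order() are hypothetical names built on C11
atomics.

```c
#include <stdatomic.h>
#include <stdlib.h>

enum owner { CREATOR, WORKER };

/* Hypothetical object with an embedded counter standing in for a kref. */
struct toy_data {
    atomic_int refcount;
};

static int releases;            /* how many times the object was freed */
static enum owner released_by;  /* which owner dropped the last reference */

/* Drop one reference; free the object when the last one goes away. */
static void toy_put(struct toy_data *d, enum owner who)
{
    if (atomic_fetch_sub(&d->refcount, 1) == 1) {
        releases++;
        released_by = who;
        free(d);
    }
}

/*
 * Replays one completion order and returns the number of releases,
 * or -1 if the allocation failed.
 */
static int run_order(int worker_first)
{
    struct toy_data *d = malloc(sizeof(*d));

    if (!d)
        return -1;
    releases = 0;
    atomic_init(&d->refcount, 1);       /* the creator's reference */
    atomic_fetch_add(&d->refcount, 1);  /* taken BEFORE the handoff */

    if (worker_first) {
        toy_put(d, WORKER);
        toy_put(d, CREATOR);    /* last reference: frees the object */
    } else {
        toy_put(d, CREATOR);
        toy_put(d, WORKER);     /* last reference: frees the object */
    }
    return releases;
}
```

In both orders the object is freed exactly once, by whichever owner
finishes last.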

Note that the "before" in rule 1 is very important. You should never
do something like::

    task = kthread_run(more_data_handling, data, "more_data_handling");
    if (IS_ERR(task)) {
        rv = PTR_ERR(task);
        goto out;
    } else
        /* BAD BAD BAD - get is after the handoff */
        kref_get(&data->refcount);

Don't assume you know what you are doing and use the above construct.
First of all, you may not know what you are doing. Second, you may
know what you are doing (there are some situations where locking is
involved where the above may be legal) but someone else who doesn't
know what they are doing may change the code or copy the code. It's
bad style. Don't do it.

There are some situations where you can optimize the gets and puts.
For instance, if you are done with an object and enqueuing it for
something else or passing it off to something else, there is no reason
to do a get then a put::

    /* Silly extra get and put */
    kref_get(&obj->ref);
    enqueue(obj);
    kref_put(&obj->ref, obj_cleanup);

Just do the enqueue. A comment about this is always welcome::

    enqueue(obj);
    /* We are done with obj, so we pass our refcount off
       to the queue.  DON'T TOUCH obj AFTER HERE! */
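
The handoff can be illustrated with a userspace sketch in which the
producer's single reference travels with the object through a one-slot
queue. Everything here is hypothetical (toy_obj, toy_enqueue(),
toy_dequeue(), toy_put(), handoff_demo()) and there is no locking, since the
sketch runs in one thread; it only shows that the count never needs to rise
above 1.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical object with an embedded counter standing in for a kref. */
struct toy_obj {
    atomic_int ref;
};

static struct toy_obj *slot;    /* a one-element "queue" */
static int freed;               /* how many objects were released */

/* The queue takes over the caller's reference; no get is needed. */
static void toy_enqueue(struct toy_obj *o)
{
    slot = o;
}

/* The consumer takes over the queue's reference. */
static struct toy_obj *toy_dequeue(void)
{
    struct toy_obj *o = slot;

    slot = NULL;
    return o;
}

static void toy_put(struct toy_obj *o)
{
    if (atomic_fetch_sub(&o->ref, 1) == 1) {
        freed++;
        free(o);
    }
}

/* Returns 1 if the object went through the queue with a count of 1
 * and was freed exactly once, 0 otherwise, -1 on allocation failure. */
static int handoff_demo(void)
{
    struct toy_obj *obj = malloc(sizeof(*obj));
    struct toy_obj *taken;
    int count_in_flight;

    if (!obj)
        return -1;
    freed = 0;
    atomic_init(&obj->ref, 1);

    toy_enqueue(obj);
    obj = NULL;     /* our reference now belongs to the queue */

    taken = toy_dequeue();
    count_in_flight = atomic_load(&taken->ref);
    toy_put(taken); /* the consumer drops the only reference */

    return count_in_flight == 1 && freed == 1;
}
```

Setting the producer's pointer to NULL after the enqueue is one way to
enforce the "don't touch obj" comment in code rather than by convention.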

The last rule (rule 3) is the nastiest one to handle. Say, for
instance, you have a list of items that are each kref-ed, and you wish
to get the first one. You can't just pull the first item off the list
and kref_get() it. That violates rule 3 because you are not already
holding a valid pointer. You must add a mutex (or some other lock).
For instance::

    static DEFINE_MUTEX(mutex);
    static LIST_HEAD(q);
    struct my_data
    {
        struct kref refcount;
        struct list_head link;
    };

    static struct my_data *get_entry(void)
    {
        struct my_data *entry = NULL;
        mutex_lock(&mutex);
        if (!list_empty(&q)) {
            entry = container_of(q.next, struct my_data, link);
            kref_get(&entry->refcount);
        }
        mutex_unlock(&mutex);
        return entry;
    }

    static void release_entry(struct kref *ref)
    {
        struct my_data *entry = container_of(ref, struct my_data, refcount);

        list_del(&entry->link);
        kfree(entry);
    }

    static void put_entry(struct my_data *entry)
    {
        mutex_lock(&mutex);
        kref_put(&entry->refcount, release_entry);
        mutex_unlock(&mutex);
    }

The kref_put() return value is useful if you do not want to hold the
lock during the whole release operation. Say you didn't want to call
kfree() with the lock held in the example above (since it is kind of
pointless to do so). You could use kref_put() as follows::

    static void release_entry(struct kref *ref)
    {
        /* All work is done after the return from kref_put(). */
    }

    static void put_entry(struct my_data *entry)
    {
        mutex_lock(&mutex);
        if (kref_put(&entry->refcount, release_entry)) {
            list_del(&entry->link);
            mutex_unlock(&mutex);
            kfree(entry);
        } else
            mutex_unlock(&mutex);
    }

This is really more useful if you have to call other routines as part
of the free operations that could take a long time or might claim the
same lock. Note that doing everything in the release routine is still
preferred as it is a little neater.

The above example could also be optimized using kref_get_unless_zero() in
the following way::

    static struct my_data *get_entry(void)
    {
        struct my_data *entry = NULL;
        mutex_lock(&mutex);
        if (!list_empty(&q)) {
            entry = container_of(q.next, struct my_data, link);
            if (!kref_get_unless_zero(&entry->refcount))
                entry = NULL;
        }
        mutex_unlock(&mutex);
        return entry;
    }

    static void release_entry(struct kref *ref)
    {
        struct my_data *entry = container_of(ref, struct my_data, refcount);

        mutex_lock(&mutex);
        list_del(&entry->link);
        mutex_unlock(&mutex);
        kfree(entry);
    }

    static void put_entry(struct my_data *entry)
    {
        kref_put(&entry->refcount, release_entry);
    }

This removes the need to hold the mutex around kref_put() in put_entry(),
but kref_get_unless_zero() must be enclosed in the same critical section
that finds the entry in the lookup table; otherwise
kref_get_unless_zero() may reference already freed memory.
Note that it is illegal to use kref_get_unless_zero() without checking its
return value. If you are sure (by already having a valid pointer) that
kref_get_unless_zero() will return true, then use kref_get() instead.
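
The "unless zero" behaviour can be sketched in userspace as a
compare-and-swap loop that refuses to revive a counter that has already
reached zero. This is an illustration using C11 atomics, not the kernel
implementation, and toy_kref, toy_get_unless_zero() and
get_unless_zero_demo() are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct kref. */
struct toy_kref {
    atomic_int count;
};

/*
 * Take a reference only if the object is still alive.
 * Returns true on success, false if the count had already hit zero.
 */
static bool toy_get_unless_zero(struct toy_kref *k)
{
    int old = atomic_load(&k->count);

    while (old != 0) {
        /* On failure, old is refreshed with the current value. */
        if (atomic_compare_exchange_weak(&k->count, &old, old + 1))
            return true;
    }
    return false;
}

/* Returns 1 if a live counter is incremented and a dead one is left alone. */
static int get_unless_zero_demo(void)
{
    struct toy_kref live, dying;
    bool got_live, got_dying;

    atomic_init(&live.count, 1);    /* still has an owner */
    atomic_init(&dying.count, 0);   /* release is already under way */

    got_live = toy_get_unless_zero(&live);
    got_dying = toy_get_unless_zero(&dying);

    return got_live && !got_dying &&
           atomic_load(&live.count) == 2 &&
           atomic_load(&dying.count) == 0;
}
```

The check can only be trusted while the memory holding the counter is
still valid, which is why the lookup and the get have to share one critical
section.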

Krefs and RCU
=============

The function kref_get_unless_zero() also makes it possible to use RCU
locking for lookups in the above example::

    struct my_data
    {
        struct rcu_head rhead;
        .
        struct kref refcount;
        .
        .
    };

    static struct my_data *get_entry_rcu(void)
    {
        struct my_data *entry = NULL;
        rcu_read_lock();
        if (!list_empty(&q)) {
            entry = container_of(q.next, struct my_data, link);
            if (!kref_get_unless_zero(&entry->refcount))
                entry = NULL;
        }
        rcu_read_unlock();
        return entry;
    }

    static void release_entry_rcu(struct kref *ref)
    {
        struct my_data *entry = container_of(ref, struct my_data, refcount);

        mutex_lock(&mutex);
        list_del_rcu(&entry->link);
        mutex_unlock(&mutex);
        kfree_rcu(entry, rhead);
    }

    static void put_entry(struct my_data *entry)
    {
        kref_put(&entry->refcount, release_entry_rcu);
    }

But note that the struct kref member needs to remain in valid memory for an
RCU grace period after release_entry_rcu() is called. That can be accomplished
by using kfree_rcu(entry, rhead) as done above, or by calling synchronize_rcu()
before using kfree(), but note that synchronize_rcu() may sleep for a
substantial amount of time.