                             ====================
                             CREDENTIALS IN LINUX
                             ====================

By: David Howells <dhowells@redhat.com>

Contents:

 (*) Overview.

 (*) Types of credentials.

 (*) File markings.

 (*) Task credentials.

     - Immutable credentials.
     - Accessing task credentials.
     - Accessing another task's credentials.
     - Altering credentials.
     - Managing credentials.

 (*) Open file credentials.

 (*) Overriding the VFS's use of credentials.


========
OVERVIEW
========

There are several parts to the security check performed by Linux when one
object acts upon another:

 (1) Objects.

     Objects are things in the system that may be acted upon directly by
     userspace programs. Linux has a variety of actionable objects, including:

        - Tasks
        - Files/inodes
        - Sockets
        - Message queues
        - Shared memory segments
        - Semaphores
        - Keys

     As a part of the description of all these objects there is a set of
     credentials. What's in the set depends on the type of object.

 (2) Object ownership.

     Amongst the credentials of most objects, there will be a subset that
     indicates the ownership of that object. This is used for resource
     accounting and limitation (disk quotas and task rlimits for example).

     In a standard UNIX filesystem, for instance, this will be defined by the
     UID marked on the inode.

 (3) The objective context.

     Also amongst the credentials of those objects, there will be a subset that
     indicates the 'objective context' of that object. This may or may not be
     the same set as in (2) - in standard UNIX files, for instance, this is
     defined by the UID and the GID marked on the inode.

     The objective context is used as part of the security calculation that is
     carried out when an object is acted upon.

 (4) Subjects.

     A subject is an object that is acting upon another object.

     Most of the objects in the system are inactive: they don't act on other
     objects within the system. Processes/tasks are the obvious exception:
     they do stuff; they access and manipulate things.

     Objects other than tasks may under some circumstances also be subjects.
     For instance an open file may send SIGIO to a task using the UID and EUID
     given to it by a task that called fcntl(F_SETOWN) upon it. In this case,
     the file struct will have a subjective context too.

 (5) The subjective context.

     A subject has an additional interpretation of its credentials. A subset
     of its credentials forms the 'subjective context'. The subjective context
     is used as part of the security calculation that is carried out when a
     subject acts.

     A Linux task, for example, has the FSUID, FSGID and the supplementary
     group list for when it is acting upon a file - which are quite separate
     from the real UID and GID that normally form the objective context of the
     task.

 (6) Actions.

     Linux has a number of actions available that a subject may perform upon an
     object. The set of actions available depends on the nature of the subject
     and the object.

     Actions include reading, writing, creating and deleting files; forking or
     signalling and tracing tasks.

 (7) Rules, access control lists and security calculations.

     When a subject acts upon an object, a security calculation is made. This
     involves taking the subjective context, the objective context and the
     action, and searching one or more sets of rules to see whether the subject
     is granted or denied permission to act in the desired manner on the
     object, given those contexts.

     There are two main sources of rules:

     (a) Discretionary access control (DAC):

         Sometimes the object will include sets of rules as part of its
         description. This is an 'Access Control List' or 'ACL'. A Linux
         file may supply more than one ACL.

         A traditional UNIX file, for example, includes a permissions mask that
         is an abbreviated ACL with three fixed classes of subject ('user',
         'group' and 'other'), each of which may be granted certain privileges
         ('read', 'write' and 'execute' - whatever those map to for the object
         in question). UNIX file permissions do not allow the arbitrary
         specification of subjects, however, and so are of limited use.

         A Linux file might also sport a POSIX ACL. This is a list of rules
         that grants various permissions to arbitrary subjects.

     (b) Mandatory access control (MAC):

         The system as a whole may have one or more sets of rules that get
         applied to all subjects and objects, regardless of their source.
         SELinux and Smack are examples of this.

         In the case of SELinux and Smack, each object is given a label as part
         of its credentials. When an action is requested, they take the
         subject label, the object label and the action and look for a rule
         that says that this action is either granted or denied.
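
As an illustration of the security calculation described in (7), the following
is a minimal userspace sketch of the traditional UNIX permissions-mask check.
It is not kernel code: the names subj_ctx, obj_ctx, in_group() and
dac_permits() are hypothetical, and the model deliberately ignores
capabilities, POSIX ACLs and LSM rules. It only shows how a subjective context
(FSUID, FSGID, supplementary groups), an objective context (UID, GID, mode) and
an action combine into a grant or a denial.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical model types; these are not kernel structures. */
struct subj_ctx {		/* subjective context of the acting task */
	uid_t fsuid;
	gid_t fsgid;
	const gid_t *groups;	/* supplementary group list */
	size_t ngroups;
};

struct obj_ctx {		/* objective context of the file */
	uid_t uid;
	gid_t gid;
	mode_t mode;		/* low nine bits: rwxrwxrwx */
};

/* Requested actions, using the same bit layout as one rwx triplet. */
enum { MAY_EXEC = 1, MAY_WRITE = 2, MAY_READ = 4 };

static bool in_group(const struct subj_ctx *s, gid_t gid)
{
	if (s->fsgid == gid)
		return true;
	for (size_t i = 0; i < s->ngroups; i++)
		if (s->groups[i] == gid)
			return true;
	return false;
}

/*
 * Select exactly one class ('user', else 'group', else 'other') and
 * test whether every requested bit is granted to that class.
 */
static bool dac_permits(const struct subj_ctx *s, const struct obj_ctx *o,
			int want)
{
	unsigned int bits = o->mode & 0777;

	assert((want & ~7) == 0);
	if (s->fsuid == o->uid)
		bits >>= 6;		/* 'user' triplet */
	else if (in_group(s, o->gid))
		bits >>= 3;		/* 'group' triplet */
	/* otherwise the low triplet already holds 'other' */
	return (bits & 7 & (unsigned int)want) == (unsigned int)want;
}
```

Note that the class is chosen first and only that class's bits are consulted,
so in this model an owner whose 'user' triplet is empty is denied even if the
'group' triplet would have allowed the action.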


====================
TYPES OF CREDENTIALS
====================

The Linux kernel supports the following types of credentials:

 (1) Traditional UNIX credentials.

        Real User ID
        Real Group ID

     The UID and GID are carried by most, if not all, Linux objects, even if in
     some cases it has to be invented (FAT or CIFS files for example, which are
     derived from Windows). These (mostly) define the objective context of
     that object, with tasks being slightly different in some cases.

        Effective, Saved and FS User ID
        Effective, Saved and FS Group ID
        Supplementary groups

     These are additional credentials used by tasks only. Usually, an
     EUID/EGID/GROUPS will be used as the subjective context, and real UID/GID
     will be used as the objective. For tasks, it should be noted that this is
     not always true.

 (2) Capabilities.

        Set of permitted capabilities
        Set of inheritable capabilities
        Set of effective capabilities
        Capability bounding set

     These are only carried by tasks. They indicate superior capabilities
     granted piecemeal to a task that an ordinary task wouldn't otherwise have.
     These are manipulated implicitly by changes to the traditional UNIX
     credentials, but can also be manipulated directly by the capset() system
     call.

     The permitted capabilities are those caps that the process might grant
     itself to its effective or permitted sets through capset(). The
     inheritable set might also be so constrained.

     The effective capabilities are the ones that a task is actually allowed to
     make use of itself.

     The inheritable capabilities are the ones that may get passed across
     execve().

     The bounding set limits the capabilities that may be inherited across
     execve(), especially when a binary is executed that will execute as UID 0.

 (3) Secure management flags (securebits).

     These are only carried by tasks. These govern the way the above
     credentials are manipulated and inherited over certain operations such as
     execve(). They aren't used directly as objective or subjective
     credentials.

 (4) Keys and keyrings.

     These are only carried by tasks. They carry and cache security tokens
     that don't fit into the other standard UNIX credentials. They are for
     making such things as network filesystem keys available to the file
     accesses performed by processes, without the necessity of ordinary
     programs having to know about the security details involved.

     Keyrings are a special type of key. They carry sets of other keys and can
     be searched for the desired key. Each process may subscribe to a number
     of keyrings:

        Per-thread keyring
        Per-process keyring
        Per-session keyring

     When a process accesses a key, if not already present, it will normally be
     cached on one of these keyrings for future accesses to find.

     For more information on using keys, see Documentation/keys.txt.

 (5) LSM

     The Linux Security Module allows extra controls to be placed over the
     operations that a task may do. Currently Linux supports two main
     alternate LSM options: SELinux and Smack.

     Both work by labelling the objects in a system and then applying sets of
     rules (policies) that say what operations a task with one label may do to
     an object with another label.

 (6) AF_KEY

     This is a socket-based approach to credential management for networking
     stacks [RFC 2367]. It isn't discussed by this document as it doesn't
     interact directly with task and file credentials; rather it keeps system
     level credentials.


When a file is opened, part of the opening task's subjective context is
recorded in the file struct created. This allows operations using that file
struct to use those credentials instead of the subjective context of the task
that issued the operation. An example of this would be a file opened on a
network filesystem where the credentials of the opened file should be presented
to the server, regardless of who is actually doing a read or a write upon it.
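
The traditional UNIX credentials that a task carries can be inspected from
userspace. The sketch below is a Linux-specific illustration, assuming that
/proc is mounted and that the 'Uid:' line of /proc/self/status lists the real,
effective, saved and filesystem UIDs in that order. The names task_uids,
read_task_uids() and uids_match_syscalls() are hypothetical helpers and not
part of any kernel or libc interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical container for the four user IDs a task carries. */
struct task_uids {
	unsigned int ruid, euid, suid, fsuid;
};

/* Parse the "Uid:" line of /proc/self/status; 0 on success, -1 on failure. */
static int read_task_uids(struct task_uids *out)
{
	char line[256];
	int ret = -1;
	FILE *f;

	assert(out != NULL);
	f = fopen("/proc/self/status", "r");
	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f)) {
		if (strncmp(line, "Uid:", 4) == 0 &&
		    sscanf(line + 4, "%u %u %u %u", &out->ruid, &out->euid,
			   &out->suid, &out->fsuid) == 4) {
			ret = 0;
			break;
		}
	}
	fclose(f);
	return ret;
}

/* Cross-check the parsed real and effective UIDs against the system calls. */
static bool uids_match_syscalls(const struct task_uids *u)
{
	return u->ruid == (unsigned int)getuid() &&
	       u->euid == (unsigned int)geteuid();
}
```

For an ordinary unprivileged process all four values are normally equal; they
diverge for set-UID programs or after calls such as setresuid() or setfsuid().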


=============
FILE MARKINGS
=============

Files on disk or obtained over the network may have annotations that form the
objective security context of that file. Depending on the type of filesystem,
this may include one or more of the following:

 (*) UNIX UID, GID, mode;

 (*) Windows user ID;

 (*) Access control list;

 (*) LSM security label;

 (*) UNIX exec privilege escalation bits (SUID/SGID);

 (*) File capabilities exec privilege escalation bits.

These are compared to the task's subjective security context, and certain
operations allowed or disallowed as a result. In the case of execve(), the
privilege escalation bits come into play, and may allow the resulting process
extra privileges, based on the annotations on the executable file.
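
Some of these markings are visible to userspace through stat(). The following
is a small sketch, assuming a POSIX system, that collects the UNIX UID, GID,
permission bits and SUID/SGID bits of a file. The names file_marking and
read_file_marking() are hypothetical; ACLs, LSM labels and file capabilities
are stored elsewhere (typically in extended attributes) and are not covered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical summary of the UNIX markings carried by a file. */
struct file_marking {
	uid_t uid;		/* owner */
	gid_t gid;		/* owning group */
	mode_t perms;		/* rwxrwxrwx permission bits only */
	bool suid, sgid;	/* exec privilege escalation bits */
};

/* Fill *out from stat(path); returns 0 on success, -1 on failure. */
static int read_file_marking(const char *path, struct file_marking *out)
{
	struct stat st;

	assert(path != NULL && out != NULL);
	if (stat(path, &st) != 0)
		return -1;
	out->uid = st.st_uid;
	out->gid = st.st_gid;
	out->perms = st.st_mode & 0777;
	out->suid = (st.st_mode & S_ISUID) != 0;
	out->sgid = (st.st_mode & S_ISGID) != 0;
	return 0;
}
```

Calling read_file_marking() on a set-UID executable would be expected to
report suid as true, which is the marking execve() consults when deciding
whether to change the effective UID of the resulting process.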


================
TASK CREDENTIALS
================

In Linux, all of a task's credentials are held in a refcounted structure of
type 'struct cred', either directly in it (uid, gid) or through pointers from
it (groups, keys, LSM security). Each task points to its credentials by a
pointer called 'cred' in its task_struct.

Once a set of credentials has been prepared and committed, it may not be
changed, barring the following exceptions:

 (1) its reference count may be changed;

 (2) the reference count on the group_info struct it points to may be changed;

 (3) the reference count on the security data it points to may be changed;

 (4) the reference count on any keyrings it points to may be changed;

 (5) any keyrings it points to may be revoked, expired or have their security
     attributes changed; and

 (6) the contents of any keyrings to which it points may be changed (the whole
     point of keyrings being a shared set of credentials, modifiable by anyone
     with appropriate access).

To alter anything in the cred struct, the copy-and-replace principle must be
adhered to. First take a copy, then alter the copy and then use RCU to change
the task pointer to make it point to the new copy. There are wrappers to aid
with this (see below).

A task may only alter its _own_ credentials; it is no longer permitted for a
task to alter another's credentials. This means the capset() system call is no
longer permitted to take any PID other than the one of the current process.
Also, the keyctl_instantiate() and keyctl_negate() functions no longer permit
attachment to process-specific keyrings in the requesting process, as the
instantiating process may need to create them.


IMMUTABLE CREDENTIALS
---------------------

Once a set of credentials has been made public (by calling commit_creds() for
example), it must be considered immutable, barring two exceptions:

 (1) The reference count may be altered.

 (2) Whilst the keyring subscriptions of a set of credentials may not be
     changed, the keyrings subscribed to may have their contents altered.

To catch accidental credential alteration at compile time, struct task_struct
has _const_ pointers to its credential sets, as does struct file. Furthermore,
certain functions such as get_cred() and put_cred() operate on const pointers,
so that callers need no casts; internally, however, they have to discard the
const qualification temporarily to be able to alter the reference count.
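
The const-pointer arrangement can be sketched in userspace as follows. This is
only a model under stated assumptions: cred_model, cred_model_alloc(),
get_cred_model() and put_cred_model() are hypothetical names, the reference
count is a C11 atomic rather than the kernel's own type, and the final free
happens immediately instead of being deferred through RCU as the kernel does.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for struct cred. */
struct cred_model {
	atomic_int usage;	/* reference count: the one mutable field */
	unsigned int uid, gid;
};

/* Allocate a set with a single reference held by the caller. */
static struct cred_model *cred_model_alloc(unsigned int uid, unsigned int gid)
{
	struct cred_model *c = malloc(sizeof(*c));

	if (!c)
		return NULL;
	atomic_init(&c->usage, 1);
	c->uid = uid;
	c->gid = gid;
	return c;
}

/* Take a reference through a const pointer; the cast stays in here. */
static const struct cred_model *get_cred_model(const struct cred_model *cred)
{
	struct cred_model *nonconst = (struct cred_model *)cred;

	assert(cred != NULL);
	atomic_fetch_add(&nonconst->usage, 1);
	return cred;
}

/* Drop a reference; returns 1 if this was the last one and the set was freed. */
static int put_cred_model(const struct cred_model *cred)
{
	struct cred_model *nonconst = (struct cred_model *)cred;

	assert(cred != NULL);
	if (atomic_fetch_sub(&nonconst->usage, 1) == 1) {
		free(nonconst);
		return 1;
	}
	return 0;
}
```

Holders of a 'const struct cred_model *' can read uid and gid and can take or
drop references, but an assignment such as cred->uid = 0 is rejected by the
compiler, which is the kind of accidental alteration the const pointers are
meant to catch.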


ACCESSING TASK CREDENTIALS
--------------------------

A task being able to alter only its own credentials permits the current process
to read or replace its own credentials without the need for any form of locking
- which simplifies things greatly. It can just call:

	const struct cred *current_cred()

to get a pointer to its credentials structure, and it doesn't have to release
it afterwards.

There are convenience wrappers for retrieving specific aspects of a task's
credentials (the value is simply returned in each case):

	uid_t current_uid(void)                 Current's real UID
	gid_t current_gid(void)                 Current's real GID
	uid_t current_euid(void)                Current's effective UID
	gid_t current_egid(void)                Current's effective GID
	uid_t current_fsuid(void)               Current's file access UID
	gid_t current_fsgid(void)               Current's file access GID
	kernel_cap_t current_cap(void)          Current's effective capabilities
	void *current_security(void)            Current's LSM security pointer
	struct user_struct *current_user(void)  Current's user account

There are also convenience wrappers for retrieving specific associated pairs of
a task's credentials:

	void current_uid_gid(uid_t *, gid_t *);
	void current_euid_egid(uid_t *, gid_t *);
	void current_fsuid_fsgid(uid_t *, gid_t *);

which return these pairs of values through their arguments after retrieving
them from the current task's credentials.


In addition, there is a function for obtaining a reference on the current
process's current set of credentials:

	const struct cred *get_current_cred(void);

and functions for getting references to one of the credentials that don't
actually live in struct cred:

	struct user_struct *get_current_user(void);
	struct group_info *get_current_groups(void);

which get references to the current process's user accounting structure and
supplementary groups list respectively.

Once a reference has been obtained, it must be released with put_cred(),
free_uid() or put_group_info() as appropriate.


ACCESSING ANOTHER TASK'S CREDENTIALS
------------------------------------

Whilst a task may access its own credentials without the need for locking, the
same is not true of a task wanting to access another task's credentials. It
must use the RCU read lock and rcu_dereference().

The rcu_dereference() is wrapped by:

	const struct cred *__task_cred(struct task_struct *task);

This should be used inside the RCU read lock, as in the following example:

	void foo(struct task_struct *t, struct foo_data *f)
	{
		const struct cred *tcred;
		...
		rcu_read_lock();
		tcred = __task_cred(t);
		/* Plain values can simply be copied out under the lock. */
		f->uid = tcred->uid;
		f->gid = tcred->gid;
		/* Pointed-to data needs its own reference to outlive the lock. */
		f->groups = get_group_info(tcred->groups);
		rcu_read_unlock();
		...
	}

Should it be necessary to hold another task's credentials for a long period of
time, and possibly to sleep whilst doing so, then the caller should get a
reference on them using:

	const struct cred *get_task_cred(struct task_struct *task);

This does all the RCU magic inside of it. The caller must call put_cred() on
the credentials so obtained when it has finished with them.

There are a couple of convenience functions to access bits of another task's
credentials, hiding the RCU magic from the caller:

	uid_t task_uid(task)		Task's real UID
	uid_t task_euid(task)		Task's effective UID

If the caller is holding the RCU read lock at the time anyway, then:

	__task_cred(task)->uid
	__task_cred(task)->euid

should be used instead. Similarly, if multiple aspects of a task's credentials
need to be accessed, the RCU read lock should be taken, __task_cred() called,
the result stored in a temporary pointer and then the credential aspects read
from that before dropping the lock. This prevents the potentially expensive
RCU magic from being invoked multiple times.

Should some other single aspect of another task's credentials need to be
accessed, then this can be used:

	task_cred_xxx(task, member)

where 'member' is a non-pointer member of the cred struct. For instance:

	uid_t task_cred_xxx(task, suid);

will retrieve 'struct cred::suid' from the task, doing the appropriate RCU
magic. This may not be used for pointer members as what they point to may
disappear the moment the RCU read lock is dropped.


ALTERING CREDENTIALS
--------------------

As previously mentioned, a task may only alter its own credentials, and may not
alter those of another task. This means that it doesn't need to use any
locking to alter its own credentials.

To alter the current process's credentials, a function should first prepare a
new set of credentials by calling:

	struct cred *prepare_creds(void);

This locks current->cred_replace_mutex and then allocates and constructs a
duplicate of the current process's credentials, returning with the mutex still
held if successful. It returns NULL if not successful (out of memory).

The mutex prevents ptrace() from altering the ptrace state of a process whilst
security checks on credentials construction and changing are taking place, as
the ptrace state may alter the outcome, particularly in the case of execve().

The new credentials set should be altered appropriately, and any security
checks and hooks done. Both the current and the proposed sets of credentials
are available for this purpose as current_cred() will return the current set
still at this point.


When the credential set is ready, it should be committed to the current process
by calling:

	int commit_creds(struct cred *new);

This will alter various aspects of the credentials and the process, giving the
LSM a chance to do likewise, then it will use rcu_assign_pointer() to actually
commit the new credentials to current->cred, it will release
current->cred_replace_mutex to allow ptrace() to take place, and it will notify
the scheduler and others of the changes.

This function is guaranteed to return 0, so that it can be tail-called at the
end of such functions as sys_setresuid().

Note that this function consumes the caller's reference to the new credentials.
The caller should _not_ call put_cred() on the new credentials afterwards.

Furthermore, once this function has been called on a new set of credentials,
those credentials may _not_ be changed further.


Should the security checks fail or some other error occur after prepare_creds()
has been called, then the following function should be invoked:

	void abort_creds(struct cred *new);

This releases the lock on current->cred_replace_mutex that prepare_creds() got
and then releases the new credentials.


A typical credentials alteration function would look something like this:

	int alter_suid(uid_t suid)
	{
		struct cred *new;
		int ret;

		/* Take a private, still-mutable copy of current's creds. */
		new = prepare_creds();
		if (!new)
			return -ENOMEM;

		/* Alter the copy, then let the security check vet it. */
		new->suid = suid;
		ret = security_alter_suid(new);
		if (ret < 0) {
			/* Discard the copy; current's creds are untouched. */
			abort_creds(new);
			return ret;
		}

		/* Publish the copy; this consumes our reference to it. */
		return commit_creds(new);
	}
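
The prepare/alter/commit-or-abort flow can also be modelled outside the kernel
to make the copy-and-replace principle concrete. The following is a
single-threaded userspace sketch, not the kernel implementation: every name
ending in _sketch is hypothetical, there is no cred_replace_mutex or RCU, the
reference count is a plain int, and the "security check" is an invented policy
that only accepts a saved UID equal to the current real UID.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct cred and task_struct. */
struct cred_sketch {
	int usage;			/* reference count */
	unsigned int uid, suid;
};

struct task_sketch {
	const struct cred_sketch *cred;	/* published set: treated as immutable */
};

/* Drop one reference, freeing the set when the last one goes. */
static void put_cred_sketch(const struct cred_sketch *c)
{
	struct cred_sketch *nc = (struct cred_sketch *)c;

	if (--nc->usage == 0)
		free(nc);
}

/* Copy the task's current set into a new, private, mutable one. */
static struct cred_sketch *prepare_creds_sketch(const struct task_sketch *t)
{
	struct cred_sketch *new = malloc(sizeof(*new));

	if (!new)
		return NULL;
	*new = *t->cred;
	new->usage = 1;
	return new;
}

/* Publish the new set (consuming the caller's reference), drop the old. */
static int commit_creds_sketch(struct task_sketch *t, struct cred_sketch *new)
{
	const struct cred_sketch *old = t->cred;

	t->cred = new;	/* the kernel uses rcu_assign_pointer() here */
	put_cred_sketch(old);
	return 0;
}

/* Throw away a prepared set that was never published. */
static void abort_creds_sketch(struct cred_sketch *new)
{
	put_cred_sketch(new);
}

static int alter_suid_sketch(struct task_sketch *t, unsigned int suid)
{
	struct cred_sketch *new = prepare_creds_sketch(t);

	if (!new)
		return -ENOMEM;
	new->suid = suid;
	if (suid != t->cred->uid) {	/* invented policy check */
		abort_creds_sketch(new);
		return -EPERM;
	}
	return commit_creds_sketch(t, new);
}
```

In this model a rejected change leaves the task's published pointer and its
contents exactly as they were, because only the private copy was ever
modified.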


MANAGING CREDENTIALS
--------------------

There are some functions to help manage credentials:

 (*) void put_cred(const struct cred *cred);

     This releases a reference to the given set of credentials. If the
     reference count reaches zero, the credentials will be scheduled for
     destruction by the RCU system.

 (*) const struct cred *get_cred(const struct cred *cred);

     This gets a reference on a live set of credentials, returning a pointer to
     that set of credentials.

 (*) struct cred *get_new_cred(struct cred *cred);

     This gets a reference on a set of credentials that is under construction
     and is thus still mutable, returning a pointer to that set of credentials.


=====================
OPEN FILE CREDENTIALS
=====================

When a new file is opened, a reference is obtained on the opening task's
credentials and this is attached to the file struct as 'f_cred' in place of
'f_uid' and 'f_gid'. Code that used to access file->f_uid and file->f_gid
should now access file->f_cred->fsuid and file->f_cred->fsgid.

It is safe to access f_cred without the use of RCU or locking because the
pointer will not change over the lifetime of the file struct, and nor will the
contents of the cred struct pointed to, barring the exceptions listed above
(see the Task Credentials section).


=======================================
OVERRIDING THE VFS'S USE OF CREDENTIALS
=======================================

Under some circumstances it is desirable to override the credentials used by
the VFS, and that can be done by calling into functions such as vfs_mkdir()
with a different set of credentials. This is done in the following places:

 (*) sys_faccessat().

 (*) do_coredump().

 (*) nfs4recover.c.