.. _gfp_mask_from_fs_io:

=================================
GFP masks used from FS/IO context
=================================

:Date: May, 2018
:Author: Michal Hocko <mhocko@kernel.org>

Introduction
============

Code paths in the filesystem and IO stacks must be careful when
allocating memory to prevent recursion deadlocks caused by direct
memory reclaim calling back into the FS or IO paths and blocking on
already held resources (e.g. locks - most commonly those used for the
transaction context).

The traditional way to avoid this deadlock problem is to clear __GFP_FS
or __GFP_IO (note that clearing the latter implies clearing the former
as well) in the gfp mask when calling an allocator. GFP_NOFS and GFP_NOIO,
respectively, can be used as shortcuts. It turned out, though, that this
approach has led to abuses where the restricted gfp mask is used "just in
case" without deeper consideration. This causes problems because
excessive use of GFP_NOFS/GFP_NOIO can lead to memory over-reclaim or
other memory reclaim issues.

New API
=======

Since 4.12 there is a generic scope API for both NOFS and NOIO contexts:
``memalloc_nofs_save`` and ``memalloc_nofs_restore``, and respectively
``memalloc_noio_save`` and ``memalloc_noio_restore``. These allow marking
a scope as a critical section from a filesystem or I/O point of view. Any
allocation from that scope will implicitly drop __GFP_FS or __GFP_IO,
respectively, from the given mask, so no memory allocation can recurse
back into the FS/IO paths.

.. kernel-doc:: include/linux/sched/mm.h
   :functions: memalloc_nofs_save memalloc_nofs_restore
.. kernel-doc:: include/linux/sched/mm.h
   :functions: memalloc_noio_save memalloc_noio_restore

FS/IO code then simply calls the appropriate save function before
any critical section with respect to reclaim is started, e.g. when
taking a lock shared with the reclaim context or when transaction
context nesting would be possible via reclaim. The restore function
should be called when the critical section ends. Ideally, all of this
is accompanied by a comment explaining what the reclaim context is, for
easier maintenance.
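
As an illustration of the pattern described above, a minimal sketch of a
filesystem path might look like the following. ``struct myfs_journal``, its
``lock`` member and ``myfs_commit_prepare`` are hypothetical names invented
for this example, and the claim that the lock is shared with reclaim is an
assumption of the sketch. Only ``memalloc_nofs_save``,
``memalloc_nofs_restore`` and the usual ``kmalloc`` and mutex helpers are
existing kernel interfaces.

.. code-block:: c

    #include <linux/mutex.h>
    #include <linux/sched/mm.h>
    #include <linux/slab.h>

    /* Hypothetical filesystem structure, for illustration only. */
    struct myfs_journal {
        struct mutex lock;  /* assumed to be taken from reclaim as well */
    };

    static void *myfs_commit_prepare(struct myfs_journal *j, size_t size)
    {
        unsigned int nofs_flags;
        void *buf;

        /*
         * Reclaim context: j->lock is (by assumption) also taken on the
         * reclaim path, so reclaim must not recurse into the filesystem
         * while it is held.
         */
        nofs_flags = memalloc_nofs_save();
        mutex_lock(&j->lock);

        /* Treated as a NOFS allocation although GFP_KERNEL is passed. */
        buf = kmalloc(size, GFP_KERNEL);

        mutex_unlock(&j->lock);
        memalloc_nofs_restore(nofs_flags);

        return buf;
    }

The value returned by the save function is handed back unchanged to the
matching restore function, which is what keeps the calls correctly paired.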

Please note that the proper pairing of save/restore functions
allows nesting, so it is safe to call ``memalloc_nofs_save`` or
``memalloc_noio_save``, and the matching restore function, from
within an existing NOIO or NOFS scope.
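
To see why properly paired calls nest safely, the following is a simplified
userspace model, not the kernel implementation. It assumes that each scope is
tracked as one bit in a per-task flags word and that the save function returns
the previous state of that bit. All ``model_`` and ``MODEL_`` names and the
bit values are made up for this sketch.

.. code-block:: c

    /* Illustrative bit values; the real ones live in the kernel headers. */
    #define MODEL_GFP_IO      0x1u
    #define MODEL_GFP_FS      0x2u
    #define MODEL_GFP_KERNEL  (MODEL_GFP_IO | MODEL_GFP_FS)

    #define MODEL_PF_NOFS     0x1u
    #define MODEL_PF_NOIO     0x2u

    /* Stand-in for the per-task flags word. */
    static unsigned int model_task_flags;

    /* Enter a NOFS scope and return the previous state of the NOFS bit. */
    static inline unsigned int model_nofs_save(void)
    {
        unsigned int old = model_task_flags & MODEL_PF_NOFS;

        model_task_flags |= MODEL_PF_NOFS;
        return old;
    }

    /* Leave a NOFS scope by putting back the saved state of the bit. */
    static inline void model_nofs_restore(unsigned int old)
    {
        model_task_flags = (model_task_flags & ~MODEL_PF_NOFS) | old;
    }

    /* Same pair for the NOIO scope. */
    static inline unsigned int model_noio_save(void)
    {
        unsigned int old = model_task_flags & MODEL_PF_NOIO;

        model_task_flags |= MODEL_PF_NOIO;
        return old;
    }

    static inline void model_noio_restore(unsigned int old)
    {
        model_task_flags = (model_task_flags & ~MODEL_PF_NOIO) | old;
    }

    /* Mask an allocator would effectively use for a requested gfp mask. */
    static inline unsigned int model_effective_gfp(unsigned int gfp)
    {
        if (model_task_flags & MODEL_PF_NOIO)
            gfp &= ~(MODEL_GFP_IO | MODEL_GFP_FS);
        else if (model_task_flags & MODEL_PF_NOFS)
            gfp &= ~MODEL_GFP_FS;
        return gfp;
    }

In this model an inner restore only puts back the state that its own save
observed, so leaving a nested scope never ends the enclosing one early.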

What about __vmalloc(GFP_NOFS)
==============================

vmalloc doesn't support GFP_NOFS semantics because there are hardcoded
GFP_KERNEL allocations deep inside the allocator which are quite non-trivial
to fix up. That means that calling ``vmalloc`` with GFP_NOFS/GFP_NOIO is
almost always a bug. The good news is that the NOFS/NOIO semantics can be
achieved with the scope API.

In an ideal world, upper layers should already mark dangerous contexts,
so no special care is required and vmalloc can be called without
any problems. Sometimes, if the context is not really clear or there are
layering violations, the recommended workaround is to wrap ``vmalloc``
in the scope API with a comment explaining the problem.
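
A minimal sketch of such a wrapper could look like the following.
``myfs_alloc_table`` is a hypothetical helper, and the situation described in
its comment is an assumption made for the sake of the example.

.. code-block:: c

    #include <linux/sched/mm.h>
    #include <linux/vmalloc.h>

    /* Hypothetical helper, for illustration only. */
    static void *myfs_alloc_table(unsigned long size)
    {
        unsigned int nofs_flags;
        void *table;

        /*
         * Assumed problem: this helper can be reached with filesystem
         * locks held, and its callers do not establish a NOFS scope
         * themselves. Mark the scope here rather than passing GFP_NOFS
         * down to the vmalloc family.
         */
        nofs_flags = memalloc_nofs_save();
        table = vmalloc(size);
        memalloc_nofs_restore(nofs_flags);

        return table;
    }

Once the upper layers mark the scope themselves, a wrapper like this becomes
redundant and the plain ``vmalloc`` call is enough.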