.. _pagemap:

=============================
Examining Process Page Tables
=============================

pagemap is a set of interfaces in the kernel (available since 2.6.25) that
allow userspace programs to examine the page tables and related information by
reading files in ``/proc``.

There are four components to pagemap:

 * ``/proc/pid/pagemap``.  This file lets a userspace process find out which
   physical frame each virtual page is mapped to.  It contains one 64-bit
   value for each virtual page, containing the following data (from
   ``fs/proc/task_mmu.c``, above pagemap_read):

    * Bits 0-54  page frame number (PFN) if present
    * Bits 0-4   swap type if swapped
    * Bits 5-54  swap offset if swapped
    * Bit  55    pte is soft-dirty (see
      :ref:`Documentation/admin-guide/mm/soft-dirty.rst <soft_dirty>`)
    * Bit  56    page exclusively mapped (since 4.2)
    * Bits 57-60 zero
    * Bit  61    page is file-page or shared-anon (since 3.5)
    * Bit  62    page swapped
    * Bit  63    page present

   Since Linux 4.0 only users with the CAP_SYS_ADMIN capability can get PFNs.
   In 4.0 and 4.1 opens by unprivileged users fail with -EPERM.  Starting from
   4.2 the PFN field is zeroed if the user does not have CAP_SYS_ADMIN.
   Reason: information about PFNs helps in exploiting the Rowhammer
   vulnerability.

   If the page is not present but in swap, then the PFN contains an
   encoding of the swap file number and the page's offset into the
   swap.  Unmapped pages return a null PFN.  This allows determining
   precisely which pages are mapped (or in swap) and comparing mapped
   pages between processes.

   Efficient users of this interface will use ``/proc/pid/maps`` to
   determine which areas of memory are actually mapped and llseek to
   skip over unmapped regions.

 * ``/proc/kpagecount``.  This file contains a 64-bit count of the number of
   times each page is mapped, indexed by PFN.

 * ``/proc/kpageflags``.  This file contains a 64-bit set of flags for each
   page, indexed by PFN.

   The flags are (from ``fs/proc/page.c``, above kpageflags_read):

    0. LOCKED
    1. ERROR
    2. REFERENCED
    3. UPTODATE
    4. DIRTY
    5. LRU
    6. ACTIVE
    7. SLAB
    8. WRITEBACK
    9. RECLAIM
    10. BUDDY
    11. MMAP
    12. ANON
    13. SWAPCACHE
    14. SWAPBACKED
    15. COMPOUND_HEAD
    16. COMPOUND_TAIL
    17. HUGE
    18. UNEVICTABLE
    19. HWPOISON
    20. NOPAGE
    21. KSM
    22. THP
    23. BALLOON
    24. ZERO_PAGE
    25. IDLE

 * ``/proc/kpagecgroup``.  This file contains a 64-bit inode number of the
   memory cgroup each page is charged to, indexed by PFN.  Only available when
   CONFIG_MEMCG is set.

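The bit layout of a ``/proc/pid/pagemap`` entry listed above can be unpacked
with a few shifts and masks.  The following is a minimal Python sketch, not
kernel code; the names ``PagemapEntry`` and ``decode_pagemap_entry`` are
hypothetical and exist only for illustration.  It assumes the layout used
since Linux 4.2, where bits 55-60 are always flags.

```python
from typing import NamedTuple, Optional


class PagemapEntry(NamedTuple):
    """Decoded view of one 64-bit pagemap value (illustrative type)."""
    present: bool
    swapped: bool
    file_or_shared_anon: bool
    exclusive: bool
    soft_dirty: bool
    pfn: Optional[int]
    swap_type: Optional[int]
    swap_offset: Optional[int]


def decode_pagemap_entry(value: int) -> PagemapEntry:
    present = bool((value >> 63) & 1)   # bit 63
    swapped = bool((value >> 62) & 1)   # bit 62
    low = value & ((1 << 55) - 1)       # bits 0-54
    return PagemapEntry(
        present=present,
        swapped=swapped,
        file_or_shared_anon=bool((value >> 61) & 1),  # bit 61
        exclusive=bool((value >> 56) & 1),            # bit 56
        soft_dirty=bool((value >> 55) & 1),           # bit 55
        # Bits 0-54 hold the PFN only when the page is present.
        pfn=low if present else None,
        # For swapped pages the same bits hold swap type and offset.
        swap_type=(low & 0x1F) if swapped else None,  # bits 0-4
        swap_offset=(low >> 5) if swapped else None,  # bits 5-54
    )
```

Note that a process without CAP_SYS_ADMIN will see ``pfn == 0`` for present
pages on 4.2 and later, so a zero PFN should not be treated as a real frame.
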
Short descriptions of the page flags
====================================

0 - LOCKED
   page is being locked for exclusive access, e.g. by undergoing read/write IO
7 - SLAB
   page is managed by the SLAB/SLOB/SLUB/SLQB kernel memory allocator.
   When a compound page is used, SLUB/SLQB will only set this flag on the head
   page; SLOB will not flag it at all.
10 - BUDDY
    a free memory block managed by the buddy system allocator.
    The buddy system organizes free memory in blocks of various orders.
    An order N block has 2^N physically contiguous pages, with the BUDDY flag
    set for and _only_ for the first page.
15 - COMPOUND_HEAD
    A compound page with order N consists of 2^N physically contiguous pages.
    A compound page with order 2 takes the form of "HTTT", where H denotes its
    head page and T denotes its tail page(s).  The major consumers of compound
    pages are hugeTLB pages
    (:ref:`Documentation/admin-guide/mm/hugetlbpage.rst <hugetlbpage>`),
    the SLUB etc. memory allocators and various device drivers.
    However in this interface, only huge/giga pages are made visible
    to end users.
16 - COMPOUND_TAIL
    A compound page tail (see description above).
17 - HUGE
    this is an integral part of a HugeTLB page
19 - HWPOISON
    hardware detected memory corruption on this page: don't touch the data!
20 - NOPAGE
    no page frame exists at the requested address
21 - KSM
    identical memory pages dynamically shared between one or more processes
22 - THP
    contiguous pages which construct transparent hugepages
23 - BALLOON
    balloon compaction page
24 - ZERO_PAGE
    zero page for pfn_zero or huge_zero page
25 - IDLE
    page has not been accessed since it was marked idle (see
    :ref:`Documentation/admin-guide/mm/idle_page_tracking.rst <idle_page_tracking>`).
    Note that this flag may be stale in case the page was accessed via
    a PTE.  To make sure the flag is up-to-date one has to read
    ``/sys/kernel/mm/page_idle/bitmap`` first.

IO related page flags
---------------------

1 - ERROR
   IO error occurred
3 - UPTODATE
   page has up-to-date data,
   i.e. for a file-backed page: (in-memory data revision >= on-disk one)
4 - DIRTY
   page has been written to, hence contains new data,
   i.e. for a file-backed page: (in-memory data revision > on-disk one)
8 - WRITEBACK
   page is being synced to disk

LRU related page flags
----------------------

5 - LRU
   page is in one of the LRU lists
6 - ACTIVE
   page is in the active LRU list
18 - UNEVICTABLE
   page is in the unevictable (non-)LRU list.  It is somehow pinned and
   not a candidate for LRU page reclaims, e.g. ramfs pages,
   shmctl(SHM_LOCK) and mlock() memory segments
2 - REFERENCED
   page has been referenced since last LRU list enqueue/requeue
9 - RECLAIM
   page will be reclaimed soon after its pageout IO completes
11 - MMAP
   a memory mapped page
12 - ANON
   a memory mapped page that is not part of a file
13 - SWAPCACHE
   page is mapped to swap space, i.e. has an associated swap entry
14 - SWAPBACKED
   page is backed by swap/RAM

The page-types tool in the tools/vm directory can be used to query the
above flags.

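As a rough illustration of how a value read from ``/proc/kpageflags`` maps
onto the flag numbers above, the sketch below turns a 64-bit flag word into a
list of names.  It is not a replacement for page-types; ``KPF_NAMES`` and
``flag_names`` are hypothetical names, and the table only covers bits 0-25 as
listed in this document.

```python
from typing import List

# Index in this list == bit number in the kpageflags value.
KPF_NAMES = [
    "LOCKED", "ERROR", "REFERENCED", "UPTODATE", "DIRTY", "LRU", "ACTIVE",
    "SLAB", "WRITEBACK", "RECLAIM", "BUDDY", "MMAP", "ANON", "SWAPCACHE",
    "SWAPBACKED", "COMPOUND_HEAD", "COMPOUND_TAIL", "HUGE", "UNEVICTABLE",
    "HWPOISON", "NOPAGE", "KSM", "THP", "BALLOON", "ZERO_PAGE", "IDLE",
]


def flag_names(flags: int) -> List[str]:
    """Return the names of all documented flag bits set in `flags`."""
    return [name for bit, name in enumerate(KPF_NAMES) if (flags >> bit) & 1]
```

For example, a page on the active LRU list that belongs to an anonymous
mapping would have at least bits 5, 6 and 12 set, which this helper reports
as ``LRU``, ``ACTIVE`` and ``ANON``.
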
Using pagemap to do something useful
====================================

The general procedure for using pagemap to find out about a process' memory
usage goes like this:

 1. Read ``/proc/pid/maps`` to determine which parts of the memory space are
    mapped to what.
 2. Select the maps you are interested in -- all of them, or a particular
    library, or the stack or the heap, etc.
 3. Open ``/proc/pid/pagemap`` and seek to the pages you would like to examine.
 4. Read a u64 for each page from pagemap.
 5. Open ``/proc/kpagecount`` and/or ``/proc/kpageflags``.  For each PFN you
    just read, seek to that entry in the file, and read the data you want.

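Steps 1 to 4 above can be sketched in Python as follows.  This is only an
illustration under several assumptions: the helper names
(``parse_maps_line``, ``pagemap_offset``, ``read_pagemap``) are hypothetical,
the system page size is taken from ``mmap.PAGESIZE``, entries are assumed to
be in native byte order, and the caller is assumed to have permission to read
the target's ``/proc`` files.

```python
import mmap
import struct

PAGE_SIZE = mmap.PAGESIZE
ENTRY_SIZE = 8  # one 64-bit value per virtual page


def parse_maps_line(line):
    """Split one /proc/pid/maps line into (start, end, perms, path)."""
    fields = line.split(maxsplit=5)
    start, end = (int(x, 16) for x in fields[0].split("-"))
    path = fields[5].strip() if len(fields) > 5 else ""
    return start, end, fields[1], path


def pagemap_offset(vaddr, page_size=PAGE_SIZE):
    """File offset in pagemap of the entry describing `vaddr`."""
    return (vaddr // page_size) * ENTRY_SIZE


def read_pagemap(pid, start, end, page_size=PAGE_SIZE):
    """Return raw pagemap values for the virtual range [start, end)."""
    count = (end - start) // page_size
    # Unbuffered so the kernel sees exactly one aligned seek and read.
    with open(f"/proc/{pid}/pagemap", "rb", buffering=0) as f:
        f.seek(pagemap_offset(start, page_size))
        data = f.read(count * ENTRY_SIZE)
    # A short read is possible; only decode the complete entries returned.
    return list(struct.unpack(f"={len(data) // ENTRY_SIZE}Q", data))
```

A caller would typically iterate over the lines of ``/proc/<pid>/maps``, keep
the regions of interest, and pass each ``start`` and ``end`` to
``read_pagemap``, which skips unmapped regions by seeking rather than reading
through them.
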
For example, to find the "unique set size" (USS), which is the amount of
memory that a process is using that is not shared with any other process,
you can go through every map in the process, find the PFNs, look those up
in kpagecount, and tally up the number of pages that are only referenced
once.

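The USS tally described above might look like the following sketch.  The
function names are hypothetical, and the count lookup is passed in as a
callable so that the logic can be exercised without access to
``/proc/kpagecount``, which is typically readable only by root.  Entries whose
PFN is zero are skipped, since on 4.2 and later the PFN field is zeroed for
callers without CAP_SYS_ADMIN.

```python
import struct

PFN_MASK = (1 << 55) - 1   # bits 0-54 of a pagemap entry
PRESENT = 1 << 63          # bit 63: page present


def read_kpagecount(pfn):
    """Read the 64-bit map count for one PFN from /proc/kpagecount."""
    with open("/proc/kpagecount", "rb", buffering=0) as f:
        f.seek(pfn * 8)
        data = f.read(8)
    if len(data) != 8:
        raise ValueError(f"no kpagecount entry for PFN {pfn}")
    return struct.unpack("=Q", data)[0]


def unique_set_size(entries, mapcount_of, page_size):
    """Bytes in present pages that are mapped exactly once.

    entries:     iterable of raw 64-bit pagemap values
    mapcount_of: callable taking a PFN and returning its map count
    """
    unique_pages = 0
    for value in entries:
        pfn = value & PFN_MASK
        # Skip non-present pages and entries whose PFN was zeroed.
        if value & PRESENT and pfn and mapcount_of(pfn) == 1:
            unique_pages += 1
    return unique_pages * page_size
```

On a live system ``read_kpagecount`` would be passed as ``mapcount_of``.  On
4.2 and later, bit 56 of the pagemap entry ("page exclusively mapped") may
serve a similar purpose without reading ``/proc/kpagecount`` at all.
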
Other notes
===========

Reading from any of the files will return -EINVAL if you are not starting
the read on an 8-byte boundary (e.g., if you sought an odd number of bytes
into the file), or if the size of the read is not a multiple of 8 bytes.

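One way to avoid the -EINVAL case above is to widen every request to 8-byte
boundaries before issuing it.  The helper below is a hypothetical sketch of
that arithmetic only; it performs no I/O.

```python
ENTRY_SIZE = 8  # every record in these files is one 64-bit value


def aligned_request(offset, length):
    """Widen (offset, length) so both ends fall on 8-byte boundaries."""
    start = offset - offset % ENTRY_SIZE                    # round down
    end = -(-(offset + length) // ENTRY_SIZE) * ENTRY_SIZE  # round up
    return start, end - start
```

In practice it is simpler to compute offsets as ``index * 8`` and sizes as
``count * 8`` from the start, which makes both values aligned by construction.
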
Before Linux 3.11 pagemap bits 55-60 were used for "page-shift" (which is
always 12 on most architectures).  Since Linux 3.11 their meaning changes
after the first clear of soft-dirty bits.  Since Linux 4.2 they are used for
flags unconditionally.