pagemap, from the userspace perspective
---------------------------------------

pagemap is a new (as of 2.6.25) set of interfaces in the kernel that allow
userspace programs to examine the page tables and related information by
reading files in /proc.

There are three components to pagemap:

 * /proc/pid/pagemap. This file lets a userspace process find out which
   physical frame each virtual page is mapped to. It contains one 64-bit
   value for each virtual page, containing the following data (from
   fs/proc/task_mmu.c, above pagemap_read):

    * Bits 0-54  page frame number (PFN) if present
    * Bits 0-4   swap type if swapped
    * Bits 5-54  swap offset if swapped
    * Bit  55    pte is soft-dirty (see Documentation/vm/soft-dirty.txt)
    * Bits 56-60 zero
    * Bit  61    page is file-page or shared-anon
    * Bit  62    page swapped
    * Bit  63    page present

   If the page is not present but in swap, then bits 0-54 contain an
   encoding of the swap file number (the swap type) and the page's offset
   into the swap. Unmapped pages return a null PFN. This allows determining
   precisely which pages are mapped (or in swap) and comparing mapped
   pages between processes.

   Efficient users of this interface will use /proc/pid/maps to
   determine which areas of memory are actually mapped and llseek to
   skip over unmapped regions.

 * /proc/kpagecount. This file contains a 64-bit count of the number of
   times each page is mapped, indexed by PFN.

 * /proc/kpageflags. This file contains a 64-bit set of flags for each
   page, indexed by PFN.

   The flags are (from fs/proc/page.c, above kpageflags_read):

     0. LOCKED
     1. ERROR
     2. REFERENCED
     3. UPTODATE
     4. DIRTY
     5. LRU
     6. ACTIVE
     7. SLAB
     8. WRITEBACK
     9. RECLAIM
    10. BUDDY
    11. MMAP
    12. ANON
    13. SWAPCACHE
    14. SWAPBACKED
    15. COMPOUND_HEAD
    16. COMPOUND_TAIL
    17. HUGE
    18. UNEVICTABLE
    19. HWPOISON
    20. NOPAGE
    21. KSM
    22. THP

Short descriptions of the page flags:

 0. LOCKED
    page is being locked for exclusive access, e.g. while undergoing read/write IO

 7. SLAB
    page is managed by the SLAB/SLOB/SLUB/SLQB kernel memory allocator.
    When a compound page is used, SLUB/SLQB will only set this flag on the head
    page; SLOB will not flag it at all.

10. BUDDY
    a free memory block managed by the buddy system allocator.
    The buddy system organizes free memory in blocks of various orders.
    An order N block has 2^N physically contiguous pages, with the BUDDY flag
    set for and _only_ for the first page.

15. COMPOUND_HEAD
16. COMPOUND_TAIL
    A compound page with order N consists of 2^N physically contiguous pages.
    A compound page with order 2 takes the form of "HTTT", where H denotes its
    head page and T denotes its tail page(s). The major consumers of compound
    pages are hugeTLB pages (Documentation/vm/hugetlbpage.txt), the SLUB etc.
    memory allocators and various device drivers. However, in this interface
    only huge/giga pages are made visible to end users.

17. HUGE
    this is an integral part of a HugeTLB page

19. HWPOISON
    hardware detected memory corruption on this page: don't touch the data!

20. NOPAGE
    no page frame exists at the requested address

21. KSM
    identical memory pages dynamically shared between one or more processes

22. THP
    contiguous pages which construct transparent hugepages
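
The "HTTT" pattern described under COMPOUND_HEAD/COMPOUND_TAIL can be
recognized by walking kpageflags values for consecutive PFNs. This C sketch
assumes the caller has already read those values (e.g. from /proc/kpageflags
at offset PFN * 8); compound_pages(), compound_order() and the FLAG_* macros
are hypothetical names. As noted above, only huge pages are expected to show
these flags through this interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit numbers from the kpageflags list above (hypothetical local names). */
#define FLAG_COMPOUND_HEAD 15
#define FLAG_COMPOUND_TAIL 16
#define HAS_FLAG(v, bit) (((v) >> (bit)) & 1)

/* Return the number of pages in the compound page whose head is at
 * flags[head], or 0 if that page is not a compound head. For a complete
 * order-N compound page this is 2^N: one H followed by 2^N - 1 Ts.
 * A run that reaches the end of the array is truncated at n. */
static size_t compound_pages(const uint64_t *flags, size_t n, size_t head)
{
    size_t len = 1;

    assert(flags != NULL && head < n);
    if (!HAS_FLAG(flags[head], FLAG_COMPOUND_HEAD))
        return 0;
    while (head + len < n && HAS_FLAG(flags[head + len], FLAG_COMPOUND_TAIL))
        len++;
    return len;
}

/* Order of a compound page from its page count; -1 if not a power of two. */
static int compound_order(size_t pages)
{
    int order = 0;

    if (pages == 0 || (pages & (pages - 1)) != 0)
        return -1;
    while ((pages >>= 1) != 0)
        order++;
    return order;
}
```

A new head page carries COMPOUND_HEAD but not COMPOUND_TAIL, so two adjacent
compound pages ("HTTTHT") are measured separately.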

[IO related page flags]
 1. ERROR       IO error occurred
 3. UPTODATE    page has up-to-date data
                i.e. for file backed page: (in-memory data revision >= on-disk one)
 4. DIRTY       page has been written to, hence contains new data
                i.e. for file backed page: (in-memory data revision > on-disk one)
 8. WRITEBACK   page is being synced to disk

[LRU related page flags]
 5. LRU         page is in one of the LRU lists
 6. ACTIVE      page is in the active LRU list
18. UNEVICTABLE page is in the unevictable (non-)LRU list
                It is somehow pinned and not a candidate for LRU page reclaims,
                e.g. ramfs pages, shmctl(SHM_LOCK) and mlock() memory segments
 2. REFERENCED  page has been referenced since last LRU list enqueue/requeue
 9. RECLAIM     page will be reclaimed soon after its pageout IO completed
11. MMAP        a memory mapped page
12. ANON        a memory mapped page that is not part of a file
13. SWAPCACHE   page is mapped to swap space, i.e. has an associated swap entry
14. SWAPBACKED  page is backed by swap/RAM
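
A raw kpageflags value is easier to read with flag names attached. This
minimal C sketch assumes only the 23 flags (bits 0-22) listed above and
silently skips any other bit a newer kernel may set; kpf_names and
kpf_format() are hypothetical names, not part of the page-types tool.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Flag names indexed by bit number, in the order listed above. */
static const char *const kpf_names[] = {
    "LOCKED", "ERROR", "REFERENCED", "UPTODATE", "DIRTY", "LRU", "ACTIVE",
    "SLAB", "WRITEBACK", "RECLAIM", "BUDDY", "MMAP", "ANON", "SWAPCACHE",
    "SWAPBACKED", "COMPOUND_HEAD", "COMPOUND_TAIL", "HUGE", "UNEVICTABLE",
    "HWPOISON", "NOPAGE", "KSM", "THP",
};
#define KPF_NAMES_COUNT (sizeof(kpf_names) / sizeof(kpf_names[0]))

/* Write the names of all set, known flags into buf as "NAME,NAME,...".
 * If buf is too small, stop at the last name that fits completely;
 * the result is always NUL-terminated. Returns buf. */
static char *kpf_format(uint64_t flags, char *buf, size_t size)
{
    size_t used = 0;

    assert(buf != NULL && size > 0);
    buf[0] = '\0';
    for (size_t bit = 0; bit < KPF_NAMES_COUNT; bit++) {
        if (!(flags & (UINT64_C(1) << bit)))
            continue;
        size_t len = strlen(kpf_names[bit]);
        size_t sep = used ? 1 : 0;            /* comma before all but the first */
        if (used + sep + len + 1 > size)
            break;                            /* no room for this name and NUL */
        if (sep)
            buf[used++] = ',';
        memcpy(buf + used, kpf_names[bit], len);
        used += len;
        buf[used] = '\0';
    }
    return buf;
}
```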

The page-types tool in this directory can be used to query the above flags.

Using pagemap to do something useful:

The general procedure for using pagemap to find out about a process' memory
usage goes like this:

 1. Read /proc/pid/maps to determine which parts of the memory space are
    mapped to what.
 2. Select the maps you are interested in -- all of them, or a particular
    library, or the stack or the heap, etc.
 3. Open /proc/pid/pagemap and seek to the pages you would like to examine.
    The entry for a virtual address is at byte offset
    (address / page size) * 8.
 4. Read a u64 for each page from pagemap.
 5. Open /proc/kpagecount and/or /proc/kpageflags. For each PFN you just
    read, seek to that entry in the file (byte offset PFN * 8), and read the
    data you want.
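
Steps 1 and 3-5 above can be sketched in C as follows, assuming Linux and
pread(2), which combines the seek and the read into one call. The helper
names parse_maps_range(), pagemap_offset() and read_u64_at() are
hypothetical. Note that on some later kernels the PFN field reads as zero
unless the reader has CAP_SYS_ADMIN.

```c
#define _FILE_OFFSET_BITS 64
#define _XOPEN_SOURCE 700          /* for pread() and fileno() */
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Step 1: parse the "start-end" range at the head of a /proc/pid/maps line,
 * e.g. "00400000-0040c000 r-xp 00000000 08:01 1234 /bin/cat".
 * Returns 0 on success, -1 if the line does not start with a range. */
static int parse_maps_range(const char *line, uint64_t *start, uint64_t *end)
{
    assert(line != NULL && start != NULL && end != NULL);
    return sscanf(line, "%" SCNx64 "-%" SCNx64, start, end) == 2 ? 0 : -1;
}

/* Step 3: byte offset in /proc/pid/pagemap of the entry for vaddr
 * (one 8-byte entry per virtual page). */
static off_t pagemap_offset(uint64_t vaddr, uint64_t page_size)
{
    assert(page_size > 0);
    return (off_t)(vaddr / page_size * sizeof(uint64_t));
}

/* Steps 4-5: read one u64 at byte offset off. Use pagemap_offset() for
 * pagemap, and pfn * 8 for /proc/kpagecount or /proc/kpageflags. */
static int read_u64_at(int fd, off_t off, uint64_t *out)
{
    ssize_t n;

    assert(out != NULL);
    n = pread(fd, out, sizeof *out, off);
    return n == (ssize_t)sizeof *out ? 0 : -1;
}
```

In use, one would open("/proc/self/pagemap", O_RDONLY), take the page size
from sysconf(_SC_PAGESIZE), and call read_u64_at(fd, pagemap_offset(addr,
page_size), &entry) for each page in a range returned by parse_maps_range().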

For example, to find the "unique set size" (USS), which is the amount of
memory that a process is using that is not shared with any other process,
you can go through every map in the process, find the PFNs, look those up
in kpagecount, and tally up the number of pages that are only referenced
once.
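
The USS tally just described might look like the C sketch below. It takes
pagemap entries already read for the maps of interest, uses the present bit
and PFN mask from the layout above, and asks a caller-supplied lookup for
each PFN's kpagecount value: one lookup reads /proc/kpagecount with pread(2),
the other serves an in-memory table, which is convenient for testing. All
names are hypothetical. Following the text, a page counts as unique when it
is mapped exactly once; swapped-out pages have no PFN and are not counted, so
this covers resident memory only. Multiply the result by the page size to
get bytes.

```c
#define _FILE_OFFSET_BITS 64
#define _XOPEN_SOURCE 700          /* for pread() */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

#define PM_PRESENT  (UINT64_C(1) << 63)
#define PM_PFN_MASK ((UINT64_C(1) << 55) - 1)

/* Look up the kpagecount value for pfn; return 0 on success. */
typedef int (*kpagecount_fn)(uint64_t pfn, uint64_t *count, void *ctx);

/* ctx points to an open file descriptor for /proc/kpagecount. */
static int kpagecount_from_fd(uint64_t pfn, uint64_t *count, void *ctx)
{
    int fd = *(const int *)ctx;
    ssize_t n = pread(fd, count, sizeof *count, (off_t)(pfn * sizeof *count));

    return n == (ssize_t)sizeof *count ? 0 : -1;
}

/* ctx points to an in-memory table indexed by PFN (e.g. for tests). */
struct kpagecount_table { const uint64_t *counts; size_t len; };

static int kpagecount_from_table(uint64_t pfn, uint64_t *count, void *ctx)
{
    const struct kpagecount_table *t = ctx;

    if (pfn >= t->len)
        return -1;
    *count = t->counts[pfn];
    return 0;
}

/* Count resident pages whose physical frame is mapped exactly once. */
static size_t uss_pages(const uint64_t *entries, size_t n,
                        kpagecount_fn lookup, void *ctx)
{
    size_t unique = 0;

    assert(lookup != NULL && (entries != NULL || n == 0));
    for (size_t i = 0; i < n; i++) {
        uint64_t count;

        if (!(entries[i] & PM_PRESENT))
            continue;                   /* unmapped or swapped: no PFN */
        if (lookup(entries[i] & PM_PFN_MASK, &count, ctx) != 0)
            continue;                   /* lookup failed: skip, don't guess */
        if (count == 1)
            unique++;
    }
    return unique;
}
```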

Other notes:

Reading from any of the files will fail with EINVAL (the kernel returns
-EINVAL) if the read does not start on an 8-byte boundary (e.g., if you
sought an odd number of bytes into the file), or if the size of the read is
not a multiple of 8 bytes.
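
The 8-byte rule above can be checked before issuing a read. This C sketch
(hypothetical helper names) tests an offset/length pair and widens an
arbitrary byte range to the smallest 8-byte-aligned range that covers it, so
the extra bytes can simply be discarded after the read.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ENTRY_SIZE 8u   /* every entry in these files is one u64 */

/* True if a read of len bytes starting at off satisfies the 8-byte rule. */
static bool read_is_aligned(uint64_t off, size_t len)
{
    return off % ENTRY_SIZE == 0 && len % ENTRY_SIZE == 0;
}

/* Widen [off, off + len) to the smallest aligned range that contains it. */
static void align_read(uint64_t off, size_t len, uint64_t *aoff, size_t *alen)
{
    uint64_t start = off - off % ENTRY_SIZE;                 /* round down */
    uint64_t end = off + len;

    assert(aoff != NULL && alen != NULL);
    end = (end + ENTRY_SIZE - 1) / ENTRY_SIZE * ENTRY_SIZE;  /* round up */
    *aoff = start;
    *alen = (size_t)(end - start);
    assert(read_is_aligned(*aoff, *alen));
}
```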