memcg: fix VM_BUG_ON from page migration
Page migration gave me free_hot_cold_page's VM_BUG_ON on page->page_cgroup.
remove_migration_pte was calling mem_cgroup_charge on the new page whenever it
found a swap pte, before it had determined it to be a migration entry. That
left a surplus reference count on the page_cgroup, so it was still attached
when the page was later freed.
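In outline, the buggy ordering looked like this (a condensed sketch of
remove_migration_pte before this patch; the page-table walk and the unlock
paths are elided, but the identifiers are the ones the patch touches):

	ptep = pte_offset_map(pmd, addr);
	if (!is_swap_pte(*ptep)) {
		pte_unmap(ptep);
		return;
	}
	/* charged here, on any swap pte at all... */
	if (mem_cgroup_charge(new, mm, GFP_KERNEL)) {
		pte_unmap(ptep);
		return;
	}
	ptl = pte_lockptr(mm, pmd);
	spin_lock(ptl);
	pte = *ptep;
	entry = pte_to_swp_entry(pte);
	if (!is_migration_entry(entry) || migration_entry_to_page(entry) != old)
		goto out;	/* ...but a plain swap pte bails out here */

A plain swap pte reaches the goto out after the new page has already been
charged, and nothing ever uncharges it: hence the stale page_cgroup still
attached when the page is freed.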
Move that mem_cgroup_charge down to where we're sure it's a migration entry.
We were already under i_mmap_lock or anon_vma->lock, both spinlocks, so the
GFP_KERNEL there (which may sleep) was already inappropriate: change it to
GFP_ATOMIC.
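The rule at work: code holding a spinlock must not sleep, and a GFP_KERNEL
allocation may sleep to reclaim memory, so only a non-sleeping flag such as
GFP_ATOMIC is safe there. A minimal sketch of the pattern, with demo_lock
and demo_alloc made up for illustration and no relation to this patch:

	#include <linux/slab.h>
	#include <linux/spinlock.h>
	#include <linux/types.h>

	static DEFINE_SPINLOCK(demo_lock);

	/* hypothetical example: allocating while holding a spinlock */
	static void *demo_alloc(size_t size)
	{
		void *obj;

		spin_lock(&demo_lock);
		/* kmalloc(size, GFP_KERNEL) here could sleep with the
		 * lock held - a bug, and a potential deadlock. */
		obj = kmalloc(size, GFP_ATOMIC);	/* may return NULL */
		spin_unlock(&demo_lock);
		return obj;
	}

The price of GFP_ATOMIC is that it can fail under memory pressure, so any
caller switching to it has to decide what a failed allocation means.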
It's essential that remove_migration_pte removes all the migration entries;
other crashes follow if it does not. So proceed even when the charge fails:
normally it cannot fail, but after a mem_cgroup_force_empty it might; see the
comment in the code.
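Condensed, the flow after the patch charges only once we know this is our
migration entry, and deliberately ignores the result (identifiers as in the
hunks below):

	ptl = pte_lockptr(mm, pmd);
	spin_lock(ptl);
	pte = *ptep;
	if (!is_swap_pte(pte))
		goto out;
	entry = pte_to_swp_entry(pte);
	if (!is_migration_entry(entry) || migration_entry_to_page(entry) != old)
		goto out;
	/* certain it's our migration entry: charge, tolerating failure */
	mem_cgroup_charge(new, mm, GFP_ATOMIC);
	get_page(new);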
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hirokazu Takahashi <taka@valinux.co.jp>
Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Cc: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
diff --git a/mm/migrate.c b/mm/migrate.c
index a73504f..4e0eccc 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -153,11 +153,6 @@
 		return;
 	}
 
-	if (mem_cgroup_charge(new, mm, GFP_KERNEL)) {
-		pte_unmap(ptep);
-		return;
-	}
-
 	ptl = pte_lockptr(mm, pmd);
 	spin_lock(ptl);
 	pte = *ptep;
@@ -169,6 +164,20 @@
 	if (!is_migration_entry(entry) || migration_entry_to_page(entry) != old)
 		goto out;
 
+	/*
+	 * Yes, ignore the return value from a GFP_ATOMIC mem_cgroup_charge.
+	 * Failure is not an option here: we're now expected to remove every
+	 * migration pte, and will cause crashes otherwise. Normally this
+	 * is not an issue: mem_cgroup_prepare_migration bumped up the old
+	 * page_cgroup count for safety, that's now attached to the new page,
+	 * so this charge should just be another incrementation of the count,
+	 * to keep in balance with rmap.c's mem_cgroup_uncharging. But if
+	 * there's been a force_empty, those reference counts may no longer
+	 * be reliable, and this charge can actually fail: oh well, we don't
+	 * make the situation any worse by proceeding as if it had succeeded.
+	 */
+	mem_cgroup_charge(new, mm, GFP_ATOMIC);
+
 	get_page(new);
 	pte = pte_mkold(mk_pte(new, vma->vm_page_prot));
 	if (is_write_migration_entry(entry))