oom: task->mm == NULL doesn't mean the memory was freed

exit_mm() sets ->mm == NULL then it does mmput()->exit_mmap() which
frees the memory.

However select_bad_process() checks ->mm != NULL before TIF_MEMDIE,
so it continues to kill other tasks even if we have the oom-killed
task freeing its memory.

Change select_bad_process() to check ->mm after TIF_MEMDIE, but skip
the tasks which have already passed exit_notify() to ensure a zombie
with TIF_MEMDIE set can't block oom-killer. Alternatively we could
probably clear TIF_MEMDIE after exit_mmap().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index eafff89..626303b5 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -303,7 +303,7 @@
 	do_each_thread(g, p) {
 		unsigned int points;
 
-		if (!p->mm)
+		if (p->exit_state)
 			continue;
 		if (oom_unkillable_task(p, mem, nodemask))
 			continue;
@@ -319,6 +319,8 @@
 		 */
 		if (test_tsk_thread_flag(p, TIF_MEMDIE))
 			return ERR_PTR(-1UL);
+		if (!p->mm)
+			continue;
 
 		if (p->flags & PF_EXITING) {
 			/*