[PATCH] set EXIT_DEAD state in do_exit(), not in schedule()
schedule() checks PF_DEAD on every context switch and sets ->state = EXIT_DEAD
to ensure that the exiting task will be deactivated. Note that this EXIT_DEAD
is in fact a "random" value, we can use any bit except normal TASK_XXX values.
It is better to set this state in do_exit() along with PF_DEAD flag and remove
that check in schedule().
We are safe wrt concurrent try_to_wake_up() (for example ptrace, tkill), it
can not change task's ->state: the 'state' argument of try_to_wake_up() can't
have EXIT_DEAD bit. And in case when try_to_wake_up() sees a stale value of
->state == TASK_RUNNING it will do nothing.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
diff --git a/kernel/exit.c b/kernel/exit.c
index 4a28085..3d759c9 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -957,6 +957,7 @@
preempt_disable();
BUG_ON(tsk->flags & PF_DEAD);
tsk->flags |= PF_DEAD;
+ tsk->state = EXIT_DEAD;
schedule();
BUG();
diff --git a/kernel/sched.c b/kernel/sched.c
index 155a33d..e1646b0 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -3348,9 +3348,6 @@
spin_lock_irq(&rq->lock);
- if (unlikely(prev->flags & PF_DEAD))
- prev->state = EXIT_DEAD;
-
switch_count = &prev->nivcsw;
if (prev->state && !(preempt_count() & PREEMPT_ACTIVE)) {
switch_count = &prev->nvcsw;