2012-02-07 06:48:13

by Anton Vorontsov

Subject: [PATCH 0/8] Fixes for common mistakes w/ for_each_process and task->mm

Hi all,

While working on the lowmemorykiller driver, I stumbled upon several
places where we traverse the tasklist in an unsafe manner, plus there
are a few cases of unsafe access to task->mm.
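
For reference, the safe pattern this series converges on looks roughly
like this (a minimal sketch, not verbatim from any single patch):

	struct task_struct *p;

	read_lock(&tasklist_lock);
	for_each_process(p) {
		task_lock(p);	/* pins p->mm: exit_mm() clears it under task_lock() */
		if (p->mm) {
			/* safe to use p->mm here, but we must not sleep */
		}
		task_unlock(p);
	}
	read_unlock(&tasklist_lock);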

Note that some patches were not tested (e.g. sh, blackfin), so please
take a closer look in case there are silly mistakes. Also, special
attention is needed for patch 7/8 (UML-specific).

Oleg,

For the sysrq case I kept the force_sig() usage; this is because in the
sysrq case I believe we do want to kill PID namespace init processes. If
using force_sig() is still a bad idea, I guess we should fix it some
other way.

Thanks.

--
Anton Vorontsov
Email: [email protected]


2012-02-07 06:49:43

by Anton Vorontsov

Subject: [PATCH 1/8] sysrq: Fix possible race with exiting task

sysrq should grab the tasklist lock; otherwise calling force_sig() is
not safe, as it might race with an exiting task, whose ->sighand might
already be set to NULL.
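
To illustrate the race (a simplified sketch, not the exact mainline
call chain):

	CPU 0 (sysrq)                      CPU 1 (task exiting)
	---------------------------        ------------------------------
	for_each_process(p)                release_task(p)
	  force_sig(sig, p)                  write_lock_irq(&tasklist_lock)
	    /* dereferences                  __exit_signal(p)
	       p->sighand->siglock */          p->sighand = NULL

release_task() clears ->sighand under the tasklist write lock, so
holding read_lock(&tasklist_lock) around the loop keeps ->sighand
stable.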

Signed-off-by: Anton Vorontsov <[email protected]>
---
drivers/tty/sysrq.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/tty/sysrq.c b/drivers/tty/sysrq.c
index 7867b7c..a1bcad7 100644
--- a/drivers/tty/sysrq.c
+++ b/drivers/tty/sysrq.c
@@ -322,11 +322,13 @@ static void send_sig_all(int sig)
{
struct task_struct *p;

+ read_lock(&tasklist_lock);
for_each_process(p) {
if (p->mm && !is_global_init(p))
/* Not swapper, init nor kernel thread */
force_sig(sig, p);
}
+ read_unlock(&tasklist_lock);
}

static void sysrq_handle_term(int key)
--
1.7.7.6

2012-02-07 06:49:55

by Anton Vorontsov

Subject: [PATCH 2/8] sysrq: Properly check for kernel threads

There's a real possibility of killing kernel threads here: a kthread
might have issued use_mm(), so its mm can be non-NULL.

This patch fixes the issue by checking for PF_KTHREAD (just as
get_task_mm() does).
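
For comparison, get_task_mm() in kernel/fork.c does roughly the
following -- a kthread that borrowed an mm via use_mm() still has
PF_KTHREAD set, so it is filtered out:

	struct mm_struct *get_task_mm(struct task_struct *task)
	{
		struct mm_struct *mm;

		task_lock(task);
		mm = task->mm;
		if (mm) {
			if (task->flags & PF_KTHREAD)
				mm = NULL;	/* don't treat kthreads as mm users */
			else
				atomic_inc(&mm->mm_users);
		}
		task_unlock(task);
		return mm;
	}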

Suggested-by: Oleg Nesterov <[email protected]>
Signed-off-by: Anton Vorontsov <[email protected]>
---
drivers/tty/sysrq.c | 9 ++++++---
1 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/tty/sysrq.c b/drivers/tty/sysrq.c
index a1bcad7..8db9125 100644
--- a/drivers/tty/sysrq.c
+++ b/drivers/tty/sysrq.c
@@ -324,9 +324,12 @@ static void send_sig_all(int sig)

read_lock(&tasklist_lock);
for_each_process(p) {
- if (p->mm && !is_global_init(p))
- /* Not swapper, init nor kernel thread */
- force_sig(sig, p);
+ if (p->flags & PF_KTHREAD)
+ continue;
+ if (is_global_init(p))
+ continue;
+
+ force_sig(sig, p);
}
read_unlock(&tasklist_lock);
}
--
1.7.7.6

2012-02-07 06:50:11

by Anton Vorontsov

Subject: [PATCH 3/8] arm: Fix possible race on task->mm

Checking for task->mm is dangerous, as ->mm might disappear (exit_mm()
assigns NULL under task_lock(), so the tasklist lock is not enough).

We can't use the get_task_mm()/mmput() pair, as mmput() might sleep,
so let's take the task lock while we care about its mm.
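
The relevant part of exit_mm() in kernel/exit.c, roughly (abridged):

	static void exit_mm(struct task_struct *tsk)
	{
		struct mm_struct *mm = tsk->mm;
		...
		task_lock(tsk);
		tsk->mm = NULL;		/* tasklist_lock is NOT held here */
		up_read(&mm->mmap_sem);
		enter_lazy_tlb(mm, current);
		task_unlock(tsk);
		mmput(mm);		/* may sleep */
	}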

Signed-off-by: Anton Vorontsov <[email protected]>
---
arch/arm/kernel/smp.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index 57db122..85db3f2 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -168,8 +168,10 @@ int __cpu_disable(void)

read_lock(&tasklist_lock);
for_each_process(p) {
+ task_lock(p);
if (p->mm)
cpumask_clear_cpu(cpu, mm_cpumask(p->mm));
+ task_unlock(p);
}
read_unlock(&tasklist_lock);

--
1.7.7.6

2012-02-07 06:50:27

by Anton Vorontsov

Subject: [PATCH 4/8] powerpc/mm: Fix possible race on task->mm

Checking for task->mm is dangerous, as ->mm might disappear (exit_mm()
assigns NULL under task_lock(), so the tasklist lock is not enough).

We can't use the get_task_mm()/mmput() pair, as mmput() might sleep,
so let's take the task lock while we care about its mm.

Signed-off-by: Anton Vorontsov <[email protected]>
---
arch/powerpc/mm/mmu_context_nohash.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/mm/mmu_context_nohash.c b/arch/powerpc/mm/mmu_context_nohash.c
index 5b63bd3..13ec484 100644
--- a/arch/powerpc/mm/mmu_context_nohash.c
+++ b/arch/powerpc/mm/mmu_context_nohash.c
@@ -43,6 +43,7 @@
#include <linux/kernel.h>
#include <linux/mm.h>
#include <linux/init.h>
+#include <linux/sched.h>
#include <linux/spinlock.h>
#include <linux/bootmem.h>
#include <linux/notifier.h>
@@ -360,8 +361,10 @@ static int __cpuinit mmu_context_cpu_notify(struct notifier_block *self,
/* We also clear the cpu_vm_mask bits of CPUs going away */
read_lock(&tasklist_lock);
for_each_process(p) {
+ task_lock(p);
if (p->mm)
cpumask_clear_cpu(cpu, mm_cpumask(p->mm));
+ task_unlock(p);
}
read_unlock(&tasklist_lock);
break;
--
1.7.7.6

2012-02-07 06:50:44

by Anton Vorontsov

Subject: [PATCH 5/8] sh: Fix possible race on task->mm

Checking for task->mm is dangerous, as ->mm might disappear (exit_mm()
assigns NULL under task_lock(), so the tasklist lock is not enough).

We can't use the get_task_mm()/mmput() pair, as mmput() might sleep,
so let's take the task lock while we care about its mm.

Signed-off-by: Anton Vorontsov <[email protected]>
---
arch/sh/kernel/smp.c | 5 ++++-
1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/arch/sh/kernel/smp.c b/arch/sh/kernel/smp.c
index 3147a9a..11d29dc 100644
--- a/arch/sh/kernel/smp.c
+++ b/arch/sh/kernel/smp.c
@@ -154,9 +154,12 @@ int __cpu_disable(void)
local_flush_tlb_all();

read_lock(&tasklist_lock);
- for_each_process(p)
+ for_each_process(p) {
+ task_lock(p);
if (p->mm)
cpumask_clear_cpu(cpu, mm_cpumask(p->mm));
+ task_unlock(p);
+ }
read_unlock(&tasklist_lock);

return 0;
--
1.7.7.6

2012-02-07 06:50:59

by Anton Vorontsov

Subject: [PATCH 6/8] blackfin: Fix possible race on task->mm

Even in atomic context, grabbing the irqsave variant of the tasklist
lock is not enough to protect task->mm from disappearing on SMP
machines: exit_mm() clears ->mm under task_lock() without taking the
tasklist lock at all. Instead, we have to grab the task lock.

Signed-off-by: Anton Vorontsov <[email protected]>
---

p.s. I don't have blackfin toolchain handy, so I did not build-test
the patch.

arch/blackfin/kernel/trace.c | 21 +++++++++------------
1 files changed, 9 insertions(+), 12 deletions(-)

diff --git a/arch/blackfin/kernel/trace.c b/arch/blackfin/kernel/trace.c
index 050db44..19bc02a 100644
--- a/arch/blackfin/kernel/trace.c
+++ b/arch/blackfin/kernel/trace.c
@@ -10,6 +10,7 @@
#include <linux/hardirq.h>
#include <linux/thread_info.h>
#include <linux/mm.h>
+#include <linux/sched.h>
#include <linux/uaccess.h>
#include <linux/module.h>
#include <linux/kallsyms.h>
@@ -27,7 +28,6 @@ void decode_address(char *buf, unsigned long address)
struct task_struct *p;
struct mm_struct *mm;
unsigned long flags, offset;
- unsigned char in_atomic = (bfin_read_IPEND() & 0x10) || in_atomic();
struct rb_node *n;

#ifdef CONFIG_KALLSYMS
@@ -113,15 +113,13 @@ void decode_address(char *buf, unsigned long address)
*/
write_lock_irqsave(&tasklist_lock, flags);
for_each_process(p) {
- mm = (in_atomic ? p->mm : get_task_mm(p));
+ task_lock(p);
+ mm = p->mm;
if (!mm)
- continue;
+ goto __continue;

- if (!down_read_trylock(&mm->mmap_sem)) {
- if (!in_atomic)
- mmput(mm);
- continue;
- }
+ if (!down_read_trylock(&mm->mmap_sem))
+ goto __continue;

for (n = rb_first(&mm->mm_rb); n; n = rb_next(n)) {
struct vm_area_struct *vma;
@@ -163,8 +161,7 @@ void decode_address(char *buf, unsigned long address)
name, vma->vm_start, vma->vm_end);

up_read(&mm->mmap_sem);
- if (!in_atomic)
- mmput(mm);
+ task_unlock(p);

if (buf[0] == '\0')
sprintf(buf, "[ %s ] dynamic memory", name);
@@ -174,8 +171,8 @@ void decode_address(char *buf, unsigned long address)
}

up_read(&mm->mmap_sem);
- if (!in_atomic)
- mmput(mm);
+__continue:
+ task_unlock(p);
}

/*
--
1.7.7.6

2012-02-07 06:51:26

by Anton Vorontsov

Subject: [PATCH 7/8] um: Should hold tasklist_lock while traversing processes

Traversing the task list requires holding tasklist_lock; otherwise the
traversal is unsafe.

Signed-off-by: Anton Vorontsov <[email protected]>
---

p.s. However, I'm not sure that calling os_kill_ptraced_process()
in 'atomic' context is correct. It seems to work, but please
take a closer look.

arch/um/kernel/reboot.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/um/kernel/reboot.c b/arch/um/kernel/reboot.c
index 4d93dff..66d754c 100644
--- a/arch/um/kernel/reboot.c
+++ b/arch/um/kernel/reboot.c
@@ -4,6 +4,7 @@
*/

#include "linux/sched.h"
+#include "linux/spinlock.h"
#include "linux/slab.h"
#include "kern_util.h"
#include "os.h"
@@ -22,6 +23,7 @@ static void kill_off_processes(void)
struct task_struct *p;
int pid;

+ read_lock(&tasklist_lock);
for_each_process(p) {
if (p->mm == NULL)
continue;
@@ -29,6 +31,7 @@ static void kill_off_processes(void)
pid = p->mm->context.id.u.pid;
os_kill_ptraced_process(pid, 1);
}
+ read_unlock(&tasklist_lock);
}
}

--
1.7.7.6

2012-02-07 06:51:45

by Anton Vorontsov

Subject: [PATCH 8/8] um: Fix possible race on task->mm

Checking for task->mm is dangerous, as ->mm might disappear (exit_mm()
assigns NULL under task_lock(), so the tasklist lock is not enough).

We can't use the get_task_mm()/mmput() pair, as mmput() might sleep,
so let's take the task lock while we care about its mm.

Signed-off-by: Anton Vorontsov <[email protected]>
---
arch/um/kernel/reboot.c | 7 +++++--
1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/um/kernel/reboot.c b/arch/um/kernel/reboot.c
index 66d754c..1411f4e 100644
--- a/arch/um/kernel/reboot.c
+++ b/arch/um/kernel/reboot.c
@@ -25,10 +25,13 @@ static void kill_off_processes(void)

read_lock(&tasklist_lock);
for_each_process(p) {
- if (p->mm == NULL)
+ task_lock(p);
+ if (!p->mm) {
+ task_unlock(p);
continue;
-
+ }
pid = p->mm->context.id.u.pid;
+ task_unlock(p);
os_kill_ptraced_process(pid, 1);
}
read_unlock(&tasklist_lock);
--
1.7.7.6

2012-02-08 16:07:10

by Oleg Nesterov

Subject: Re: [PATCH 0/8] Fixes for common mistakes w/ for_each_process and task->mm

On 02/07, Anton Vorontsov wrote:
>
> For the sysrq case I kept the force_sig() usage; this is because in the
> sysrq case I believe we do want to kill PID namespace init processes.

Agreed,

> If using
> force_sig() is still a bad idea, I guess we should fix it some other
> way.

Yes, I think it is simply wrong here. And I think that force_*
should not take a "struct task_struct *" argument; it should be
used for synchronous signals only.

I am thinking about the patch below. With it, send_sig_all() can use
send_sig_info(SEND_SIG_FORCED); a sketch of the result follows the
patch. And probably send_sig_all() doesn't need the tasklist lock in
this case.

I'll recheck this but I need to disappear until Friday, sorry.

Oleg.


--- x/kernel/signal.c
+++ x/kernel/signal.c
@@ -58,21 +58,20 @@ static int sig_handler_ignored(void __us
(handler == SIG_DFL && sig_kernel_ignore(sig));
}

-static int sig_task_ignored(struct task_struct *t, int sig,
- int from_ancestor_ns)
+static int sig_task_ignored(struct task_struct *t, int sig, bool force)
{
void __user *handler;

handler = sig_handler(t, sig);

if (unlikely(t->signal->flags & SIGNAL_UNKILLABLE) &&
- handler == SIG_DFL && !from_ancestor_ns)
+ handler == SIG_DFL && !force)
return 1;

return sig_handler_ignored(handler, sig);
}

-static int sig_ignored(struct task_struct *t, int sig, int from_ancestor_ns)
+static int sig_ignored(struct task_struct *t, int sig, bool force)
{
/*
* Blocked signals are never ignored, since the
@@ -82,7 +81,7 @@ static int sig_ignored(struct task_struc
if (sigismember(&t->blocked, sig) || sigismember(&t->real_blocked, sig))
return 0;

- if (!sig_task_ignored(t, sig, from_ancestor_ns))
+ if (!sig_task_ignored(t, sig, force))
return 0;

/*
@@ -855,7 +854,7 @@ static void ptrace_trap_notify(struct ta
* Returns true if the signal should be actually delivered, otherwise
* it should be dropped.
*/
-static int prepare_signal(int sig, struct task_struct *p, int from_ancestor_ns)
+static int prepare_signal(int sig, struct task_struct *p, bool force)
{
struct signal_struct *signal = p->signal;
struct task_struct *t;
@@ -915,7 +914,7 @@ static int prepare_signal(int sig, struc
}
}

- return !sig_ignored(p, sig, from_ancestor_ns);
+ return !sig_ignored(p, sig, force);
}

/*
@@ -1059,7 +1058,8 @@ static int __send_signal(int sig, struct

assert_spin_locked(&t->sighand->siglock);

- if (!prepare_signal(sig, t, from_ancestor_ns))
+ if (!prepare_signal(sig, t,
+ from_ancestor_ns || (info == SEND_SIG_FORCED)))
return 0;

pending = group ? &t->signal->shared_pending : &t->pending;
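
With the above applied, send_sig_all() could then look something like
this (an illustrative, untested sketch):

	static void send_sig_all(int sig)
	{
		struct task_struct *p;

		read_lock(&tasklist_lock);
		for_each_process(p) {
			if (p->flags & PF_KTHREAD)
				continue;
			if (is_global_init(p))
				continue;

			send_sig_info(sig, SEND_SIG_FORCED, p);
		}
		read_unlock(&tasklist_lock);
	}

(And since send_sig_info() goes through lock_task_sighand(), which
copes with a NULL ->sighand, the tasklist lock could presumably be
relaxed to rcu_read_lock() here.)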

2012-02-08 16:15:19

by Oleg Nesterov

Subject: Re: [PATCH 3/8] arm: Fix possible race on task->mm

On 02/07, Anton Vorontsov wrote:
>
> Checking for task->mm is dangerous, as ->mm might disappear (exit_mm()
> assigns NULL under task_lock(), so the tasklist lock is not enough).
>
> We can't use the get_task_mm()/mmput() pair, as mmput() might sleep,

Yes, but

> so let's take the task lock while we care about its mm.

it seems that this needs find_lock_task_mm() too ?

the same for the rest of the patches in this series.
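
For reference, find_lock_task_mm() (mm/oom_kill.c) is roughly this --
it falls back to other threads in the group whose ->mm is still set,
and returns with the task lock held:

	struct task_struct *find_lock_task_mm(struct task_struct *p)
	{
		struct task_struct *t = p;

		do {
			task_lock(t);
			if (likely(t->mm))
				return t;	/* task_lock(t) stays held */
			task_unlock(t);
		} while_each_thread(p, t);

		return NULL;
	}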

Oleg.

2012-02-08 16:27:27

by Oleg Nesterov

Subject: Re: [PATCH 6/8] blackfin: Fix possible race on task->mm

On 02/07, Anton Vorontsov wrote:
>
> Even in atomic context, grabbing the irqsave variant of the tasklist
> lock is not enough to protect task->mm from disappearing on SMP
> machines: exit_mm() clears ->mm under task_lock() without taking the
> tasklist lock at all. Instead, we have to grab the task lock.

Yes, but afaics there is no reason for write_lock_irqsave(tasklist).
read_lock() should be enough.

I know nothing about arch/blackfin/, but in fact this looks simply wrong.

For example: sysrq_showregs_othercpus() does smp_call_function(showacpu),
and showacpu() does show_stack()->decode_address(). Now suppose that the
IPI interrupts a task holding read_lock(tasklist): the write_lock_irqsave()
in decode_address() will spin forever, because the interrupted reader on
this CPU can never drop the lock -- deadlock.

Mike?

Oleg.

2012-02-09 01:46:19

by David Rientjes

Subject: Re: [PATCH 2/8] sysrq: Properly check for kernel threads

On Tue, 7 Feb 2012, Anton Vorontsov wrote:

> There's a real possibility of killing kernel threads here: a kthread
> might have issued use_mm(), so its mm can be non-NULL.
>
> This patch fixes the issue by checking for PF_KTHREAD (just as
> get_task_mm() does).
>
> Suggested-by: Oleg Nesterov <[email protected]>
> Signed-off-by: Anton Vorontsov <[email protected]>

Acked-by: David Rientjes <[email protected]>

Definitely needed.

2012-02-09 01:46:40

by David Rientjes

Subject: Re: [PATCH 3/8] arm: Fix possible race on task->mm

On Tue, 7 Feb 2012, Anton Vorontsov wrote:

> Checking for task->mm is dangerous, as ->mm might disappear (exit_mm()
> assigns NULL under task_lock(), so the tasklist lock is not enough).
>
> We can't use the get_task_mm()/mmput() pair, as mmput() might sleep,
> so let's take the task lock while we care about its mm.
>
> Signed-off-by: Anton Vorontsov <[email protected]>

Acked-by: David Rientjes <[email protected]>

2012-02-09 01:47:33

by David Rientjes

Subject: Re: [PATCH 1/8] sysrq: Fix possible race with exiting task

On Tue, 7 Feb 2012, Anton Vorontsov wrote:

> sysrq should grab the tasklist lock; otherwise calling force_sig() is
> not safe, as it might race with an exiting task, whose ->sighand might
> already be set to NULL.
>
> Signed-off-by: Anton Vorontsov <[email protected]>

Acked-by: David Rientjes <[email protected]>

2012-02-09 15:33:52

by Anton Vorontsov

Subject: Re: [PATCH 3/8] arm: Fix possible race on task->mm

On Wed, Feb 08, 2012 at 05:08:08PM +0100, Oleg Nesterov wrote:
> On 02/07, Anton Vorontsov wrote:
> >
> > Checking for task->mm is dangerous, as ->mm might disappear (exit_mm()
> > assigns NULL under task_lock(), so the tasklist lock is not enough).
> >
> > We can't use the get_task_mm()/mmput() pair, as mmput() might sleep,
>
> Yes, but
>
> > so let's take the task lock while we care about its mm.
>
> it seems that this needs find_lock_task_mm() too ?
>
> the same for the rest patches in this series.

Yep, I think you're right, will add this change.

--
Anton Vorontsov
Email: [email protected]

2012-02-09 15:43:42

by Anton Vorontsov

Subject: Re: [PATCH 3/8] arm: Fix possible race on task->mm

On Thu, Feb 09, 2012 at 07:33:46PM +0400, Anton Vorontsov wrote:
> On Wed, Feb 08, 2012 at 05:08:08PM +0100, Oleg Nesterov wrote:
> > On 02/07, Anton Vorontsov wrote:
> > >
> > > Checking for task->mm is dangerous, as ->mm might disappear (exit_mm()
> > > assigns NULL under task_lock(), so the tasklist lock is not enough).
> > >
> > > We can't use the get_task_mm()/mmput() pair, as mmput() might sleep,
> >
> > Yes, but
> >
> > > so let's take the task lock while we care about its mm.
> >
> > it seems that this needs find_lock_task_mm() too ?
> >
> > the same for the rest patches in this series.
>
> Yep, I think you're right, will add this change.

Thinking about it more... making the code use find_lock_task_mm()
would be a behaviour change: it may return another thread of the group
whose mm is still set. Sure, in trivial cases like ARM this looks like
a 100% safe thing to do, but in e.g. the UML case I wouldn't bet much
money on 'mm->context.id.u.pid' still being meaningful.

So, I'd rather do it in a separate change, so this can be easily
reverted.

--
Anton Vorontsov
Email: [email protected]

2012-02-10 20:28:29

by Oleg Nesterov

Subject: Re: [PATCH 3/8] arm: Fix possible race on task->mm

On 02/09, Anton Vorontsov wrote:
>
> On Thu, Feb 09, 2012 at 07:33:46PM +0400, Anton Vorontsov wrote:
> > On Wed, Feb 08, 2012 at 05:08:08PM +0100, Oleg Nesterov wrote:
> > > On 02/07, Anton Vorontsov wrote:
> > > >
> > > > Checking for task->mm is dangerous, as ->mm might disappear (exit_mm()
> > > > assigns NULL under task_lock(), so the tasklist lock is not enough).
> > > >
> > > > We can't use the get_task_mm()/mmput() pair, as mmput() might sleep,
> > >
> > > Yes, but
> > >
> > > > so let's take the task lock while we care about its mm.
> > >
> > > it seems that this needs find_lock_task_mm() too ?
> > >
> > > the same for the rest patches in this series.
> >
> > Yep, I think you're right, will add this change.
>
> Thinking about it more... making the code use find_lock_task_mm()
> would be a behaviour change: it may return another thread of the group
> whose mm is still set. Sure, in trivial cases like ARM this looks like
> a 100% safe thing to do, but in e.g. the UML case I wouldn't bet much
> money on 'mm->context.id.u.pid' still being meaningful.

OK, perhaps UML differs. I don't know what context.id.u.pid means.
Although at first glance it would be meaningful anyway...

> So, I'd rather do it in a separate change, so this can be easily
> reverted.

In the !UML case find_lock_task_mm() "obviously" looks like the right
thing...

But I won't argue, up to you.

Oleg.