Hi all,
While working on the lowmemorykiller driver, I stumbled upon several
places where we traverse the tasklist in an unsafe manner, plus a few
cases of unsafe access to task->mm.
Note that some patches were not tested (e.g. sh, blackfin), so please
take a closer look in case there are silly mistakes. Patch 7/8 (UML
specific) also needs special attention.
Oleg,
For the sysrq case I kept the force_sig() usage, because in the sysrq
case I believe we do want to kill PID namespace init processes. If
using force_sig() is still a bad idea, I guess we should fix it some
other way.
Thanks.
--
Anton Vorontsov
Email: [email protected]
sysrq should grab the tasklist lock; otherwise calling force_sig() is
not safe, as it might race with an exiting task whose ->sighand might
already be set to NULL.
Signed-off-by: Anton Vorontsov <[email protected]>
---
drivers/tty/sysrq.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)
diff --git a/drivers/tty/sysrq.c b/drivers/tty/sysrq.c
index 7867b7c..a1bcad7 100644
--- a/drivers/tty/sysrq.c
+++ b/drivers/tty/sysrq.c
@@ -322,11 +322,13 @@ static void send_sig_all(int sig)
{
struct task_struct *p;
+ read_lock(&tasklist_lock);
for_each_process(p) {
if (p->mm && !is_global_init(p))
/* Not swapper, init nor kernel thread */
force_sig(sig, p);
}
+ read_unlock(&tasklist_lock);
}
static void sysrq_handle_term(int key)
--
1.7.7.6
There is a real possibility of killing kernel threads here: a kthread
might have issued use_mm(), so its mm might become non-NULL.
This patch fixes the issue by checking for PF_KTHREAD (just as
get_task_mm() does).
Suggested-by: Oleg Nesterov <[email protected]>
Signed-off-by: Anton Vorontsov <[email protected]>
---
drivers/tty/sysrq.c | 9 ++++++---
1 files changed, 6 insertions(+), 3 deletions(-)
diff --git a/drivers/tty/sysrq.c b/drivers/tty/sysrq.c
index a1bcad7..8db9125 100644
--- a/drivers/tty/sysrq.c
+++ b/drivers/tty/sysrq.c
@@ -324,9 +324,12 @@ static void send_sig_all(int sig)
read_lock(&tasklist_lock);
for_each_process(p) {
- if (p->mm && !is_global_init(p))
- /* Not swapper, init nor kernel thread */
- force_sig(sig, p);
+ if (p->flags & PF_KTHREAD)
+ continue;
+ if (is_global_init(p))
+ continue;
+
+ force_sig(sig, p);
}
read_unlock(&tasklist_lock);
}
--
1.7.7.6
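For comparison, get_task_mm() applies the same PF_KTHREAD filter;
roughly (a trimmed sketch of the kernel/fork.c helper, shown here only
for reference):

struct mm_struct *get_task_mm(struct task_struct *task)
{
	struct mm_struct *mm;

	task_lock(task);
	mm = task->mm;
	if (mm) {
		if (task->flags & PF_KTHREAD)
			mm = NULL;	/* never hand out a kthread's borrowed mm */
		else
			atomic_inc(&mm->mm_users);
	}
	task_unlock(task);
	return mm;
}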
Checking for task->mm is dangerous as ->mm might disappear (exit_mm()
assigns NULL under task_lock(), so tasklist lock is not enough).
We can't use get_task_mm()/mmput() pair as mmput() might sleep,
so let's take the task lock while we care about its mm.
Signed-off-by: Anton Vorontsov <[email protected]>
---
arch/arm/kernel/smp.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)
diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index 57db122..85db3f2 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -168,8 +168,10 @@ int __cpu_disable(void)
read_lock(&tasklist_lock);
for_each_process(p) {
+ task_lock(p);
if (p->mm)
cpumask_clear_cpu(cpu, mm_cpumask(p->mm));
+ task_unlock(p);
}
read_unlock(&tasklist_lock);
--
1.7.7.6
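For context, here is why task_lock() is the right synchronization
point: exit_mm() (kernel/exit.c) clears ->mm under that lock, not under
tasklist_lock. Roughly (a heavily trimmed sketch):

static void exit_mm(struct task_struct *tsk)
{
	struct mm_struct *mm = tsk->mm;

	mm_release(tsk, mm);
	if (!mm)
		return;
	/* ... coredump synchronization trimmed ... */
	task_lock(tsk);
	tsk->mm = NULL;			/* tasklist_lock is not held here */
	up_read(&mm->mmap_sem);
	enter_lazy_tlb(mm, current);
	task_unlock(tsk);
	mm_update_next_owner(mm);
	mmput(mm);			/* may sleep, which is why these loops avoid mmput() */
}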
Checking for task->mm is dangerous as ->mm might disappear (exit_mm()
assigns NULL under task_lock(), so tasklist lock is not enough).
We can't use get_task_mm()/mmput() pair as mmput() might sleep,
so let's take the task lock while we care about its mm.
Signed-off-by: Anton Vorontsov <[email protected]>
---
arch/powerpc/mm/mmu_context_nohash.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)
diff --git a/arch/powerpc/mm/mmu_context_nohash.c b/arch/powerpc/mm/mmu_context_nohash.c
index 5b63bd3..13ec484 100644
--- a/arch/powerpc/mm/mmu_context_nohash.c
+++ b/arch/powerpc/mm/mmu_context_nohash.c
@@ -43,6 +43,7 @@
#include <linux/kernel.h>
#include <linux/mm.h>
#include <linux/init.h>
+#include <linux/sched.h>
#include <linux/spinlock.h>
#include <linux/bootmem.h>
#include <linux/notifier.h>
@@ -360,8 +361,10 @@ static int __cpuinit mmu_context_cpu_notify(struct notifier_block *self,
/* We also clear the cpu_vm_mask bits of CPUs going away */
read_lock(&tasklist_lock);
for_each_process(p) {
+ task_lock(p);
if (p->mm)
cpumask_clear_cpu(cpu, mm_cpumask(p->mm));
+ task_unlock(p);
}
read_unlock(&tasklist_lock);
break;
--
1.7.7.6
Checking for task->mm is dangerous as ->mm might disappear (exit_mm()
assigns NULL under task_lock(), so tasklist lock is not enough).
We can't use get_task_mm()/mmput() pair as mmput() might sleep,
so let's take the task lock while we care about its mm.
Signed-off-by: Anton Vorontsov <[email protected]>
---
arch/sh/kernel/smp.c | 5 ++++-
1 files changed, 4 insertions(+), 1 deletions(-)
diff --git a/arch/sh/kernel/smp.c b/arch/sh/kernel/smp.c
index 3147a9a..11d29dc 100644
--- a/arch/sh/kernel/smp.c
+++ b/arch/sh/kernel/smp.c
@@ -154,9 +154,12 @@ int __cpu_disable(void)
local_flush_tlb_all();
read_lock(&tasklist_lock);
- for_each_process(p)
+ for_each_process(p) {
+ task_lock(p);
if (p->mm)
cpumask_clear_cpu(cpu, mm_cpumask(p->mm));
+ task_unlock(p);
+ }
read_unlock(&tasklist_lock);
return 0;
--
1.7.7.6
Even when in atomic context, grabbing the irqsave variant of the
tasklist lock is not enough to protect task->mm from disappearing on
SMP machines. Instead, we have to grab the task lock.
Signed-off-by: Anton Vorontsov <[email protected]>
---
p.s. I don't have a blackfin toolchain handy, so I did not build-test
the patch.
arch/blackfin/kernel/trace.c | 21 +++++++++------------
1 files changed, 9 insertions(+), 12 deletions(-)
diff --git a/arch/blackfin/kernel/trace.c b/arch/blackfin/kernel/trace.c
index 050db44..19bc02a 100644
--- a/arch/blackfin/kernel/trace.c
+++ b/arch/blackfin/kernel/trace.c
@@ -10,6 +10,7 @@
#include <linux/hardirq.h>
#include <linux/thread_info.h>
#include <linux/mm.h>
+#include <linux/sched.h>
#include <linux/uaccess.h>
#include <linux/module.h>
#include <linux/kallsyms.h>
@@ -27,7 +28,6 @@ void decode_address(char *buf, unsigned long address)
struct task_struct *p;
struct mm_struct *mm;
unsigned long flags, offset;
- unsigned char in_atomic = (bfin_read_IPEND() & 0x10) || in_atomic();
struct rb_node *n;
#ifdef CONFIG_KALLSYMS
@@ -113,15 +113,13 @@ void decode_address(char *buf, unsigned long address)
*/
write_lock_irqsave(&tasklist_lock, flags);
for_each_process(p) {
- mm = (in_atomic ? p->mm : get_task_mm(p));
+ task_lock(p);
+ mm = p->mm;
if (!mm)
- continue;
+ goto __continue;
- if (!down_read_trylock(&mm->mmap_sem)) {
- if (!in_atomic)
- mmput(mm);
- continue;
- }
+ if (!down_read_trylock(&mm->mmap_sem))
+ goto __continue;
for (n = rb_first(&mm->mm_rb); n; n = rb_next(n)) {
struct vm_area_struct *vma;
@@ -163,8 +161,7 @@ void decode_address(char *buf, unsigned long address)
name, vma->vm_start, vma->vm_end);
up_read(&mm->mmap_sem);
- if (!in_atomic)
- mmput(mm);
+ task_unlock(p);
if (buf[0] == '\0')
sprintf(buf, "[ %s ] dynamic memory", name);
@@ -174,8 +171,8 @@ void decode_address(char *buf, unsigned long address)
}
up_read(&mm->mmap_sem);
- if (!in_atomic)
- mmput(mm);
+__continue:
+ task_unlock(p);
}
/*
--
1.7.7.6
Traversing the task list requires holding tasklist_lock; otherwise it
is unsafe.
Signed-off-by: Anton Vorontsov <[email protected]>
---
p.s. However, I'm not sure that calling os_kill_ptraced_process()
in the 'atomic' context is correct. It seems to work, but please
take a closer look.
arch/um/kernel/reboot.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)
diff --git a/arch/um/kernel/reboot.c b/arch/um/kernel/reboot.c
index 4d93dff..66d754c 100644
--- a/arch/um/kernel/reboot.c
+++ b/arch/um/kernel/reboot.c
@@ -4,6 +4,7 @@
*/
#include "linux/sched.h"
+#include "linux/spinlock.h"
#include "linux/slab.h"
#include "kern_util.h"
#include "os.h"
@@ -22,6 +23,7 @@ static void kill_off_processes(void)
struct task_struct *p;
int pid;
+ read_lock(&tasklist_lock);
for_each_process(p) {
if (p->mm == NULL)
continue;
@@ -29,6 +31,7 @@ static void kill_off_processes(void)
pid = p->mm->context.id.u.pid;
os_kill_ptraced_process(pid, 1);
}
+ read_unlock(&tasklist_lock);
}
}
--
1.7.7.6
Checking for task->mm is dangerous as ->mm might disappear (exit_mm()
assigns NULL under task_lock(), so tasklist lock is not enough).
We can't use get_task_mm()/mmput() pair as mmput() might sleep,
so let's take the task lock while we care about its mm.
Signed-off-by: Anton Vorontsov <[email protected]>
---
arch/um/kernel/reboot.c | 7 +++++--
1 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/arch/um/kernel/reboot.c b/arch/um/kernel/reboot.c
index 66d754c..1411f4e 100644
--- a/arch/um/kernel/reboot.c
+++ b/arch/um/kernel/reboot.c
@@ -25,10 +25,13 @@ static void kill_off_processes(void)
read_lock(&tasklist_lock);
for_each_process(p) {
- if (p->mm == NULL)
+ task_lock(p);
+ if (!p->mm) {
+ task_unlock(p);
continue;
-
+ }
pid = p->mm->context.id.u.pid;
+ task_unlock(p);
os_kill_ptraced_process(pid, 1);
}
read_unlock(&tasklist_lock);
--
1.7.7.6
On 02/07, Anton Vorontsov wrote:
>
> For the sysrq case I kept the force_sig() usage, because in the sysrq
> case I believe we do want to kill PID namespace init processes.
Agreed,
> If using
> force_sig() is still a bad idea, I guess we should fix it some
> other way.
Yes, I think it is simply wrong here. And I think that force_*
should not take a "struct task_struct *" argument; it should be
used for synchronous signals only.
I am thinking about the patch below. With this patch send_sig_all()
can use send_sig_info(SEND_SIG_FORCED). And probably send_sig_all()
doesn't need the tasklist lock in this case.
I'll recheck this, but I need to disappear until Friday, sorry.
Oleg.
--- x/kernel/signal.c
+++ x/kernel/signal.c
@@ -58,21 +58,20 @@ static int sig_handler_ignored(void __us
(handler == SIG_DFL && sig_kernel_ignore(sig));
}
-static int sig_task_ignored(struct task_struct *t, int sig,
- int from_ancestor_ns)
+static int sig_task_ignored(struct task_struct *t, int sig, bool force)
{
void __user *handler;
handler = sig_handler(t, sig);
if (unlikely(t->signal->flags & SIGNAL_UNKILLABLE) &&
- handler == SIG_DFL && !from_ancestor_ns)
+ handler == SIG_DFL && !force)
return 1;
return sig_handler_ignored(handler, sig);
}
-static int sig_ignored(struct task_struct *t, int sig, int from_ancestor_ns)
+static int sig_ignored(struct task_struct *t, int sig, bool force)
{
/*
* Blocked signals are never ignored, since the
@@ -82,7 +81,7 @@ static int sig_ignored(struct task_struc
if (sigismember(&t->blocked, sig) || sigismember(&t->real_blocked, sig))
return 0;
- if (!sig_task_ignored(t, sig, from_ancestor_ns))
+ if (!sig_task_ignored(t, sig, force))
return 0;
/*
@@ -855,7 +854,7 @@ static void ptrace_trap_notify(struct ta
* Returns true if the signal should be actually delivered, otherwise
* it should be dropped.
*/
-static int prepare_signal(int sig, struct task_struct *p, int from_ancestor_ns)
+static int prepare_signal(int sig, struct task_struct *p, bool force)
{
struct signal_struct *signal = p->signal;
struct task_struct *t;
@@ -915,7 +914,7 @@ static int prepare_signal(int sig, struc
}
}
- return !sig_ignored(p, sig, from_ancestor_ns);
+ return !sig_ignored(p, sig, force);
}
/*
@@ -1059,7 +1058,8 @@ static int __send_signal(int sig, struct
assert_spin_locked(&t->sighand->siglock);
- if (!prepare_signal(sig, t, from_ancestor_ns))
+ if (!prepare_signal(sig, t,
+ from_ancestor_ns || (info == SEND_SIG_FORCED)))
return 0;
pending = group ? &t->signal->shared_pending : &t->pending;
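With a change like this, send_sig_all() could presumably end up looking
something like the sketch below (untested, just to illustrate the idea;
whether the tasklist lock can be dropped as well is left open here):

static void send_sig_all(int sig)
{
	struct task_struct *p;

	read_lock(&tasklist_lock);
	for_each_process(p) {
		if (p->flags & PF_KTHREAD)
			continue;
		if (is_global_init(p))
			continue;

		/*
		 * Unlike force_sig(), this copes with ->sighand going
		 * away, and SEND_SIG_FORCED still reaches pid-ns inits.
		 */
		send_sig_info(sig, SEND_SIG_FORCED, p);
	}
	read_unlock(&tasklist_lock);
}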
On 02/07, Anton Vorontsov wrote:
>
> Checking for task->mm is dangerous as ->mm might disappear (exit_mm()
> assigns NULL under task_lock(), so tasklist lock is not enough).
>
> We can't use get_task_mm()/mmput() pair as mmput() might sleep,
Yes, but
> so let's take the task lock while we care about its mm.
it seems that this needs find_lock_task_mm() too?
The same goes for the rest of the patches in this series.
Oleg.
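For reference, find_lock_task_mm() (mm/oom_kill.c at the time) looks
roughly like this; it also falls back to other threads of the group
when the group leader has already passed through exit_mm():

struct task_struct *find_lock_task_mm(struct task_struct *p)
{
	struct task_struct *t = p;

	do {
		task_lock(t);
		if (likely(t->mm))
			return t;	/* returned with task_lock(t) still held */
		task_unlock(t);
	} while_each_thread(p, t);

	return NULL;
}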
On 02/07, Anton Vorontsov wrote:
>
> Even when in atomic context, grabbing the irqsave variant of the
> tasklist lock is not enough to protect task->mm from disappearing on
> SMP machines. Instead, we have to grab the task lock.
Yes, but afaics there is no reason for write_lock_irqsave(tasklist).
read_lock() should be enough.
I know nothing about arch/blackfin/, but in fact this looks simply wrong.
For example, sysrq_showregs_othercpus() does smp_call_function(showacpu),
and showacpu() does show_stack()->decode_address(). Now suppose that the
IPI interrupts a task holding read_lock(tasklist).
Mike?
Oleg.
On Tue, 7 Feb 2012, Anton Vorontsov wrote:
> There is a real possibility of killing kernel threads here: a kthread
> might have issued use_mm(), so its mm might become non-NULL.
>
> This patch fixes the issue by checking for PF_KTHREAD (just as
> get_task_mm() does).
>
> Suggested-by: Oleg Nesterov <[email protected]>
> Signed-off-by: Anton Vorontsov <[email protected]>
Acked-by: David Rientjes <[email protected]>
Definitely needed.
On Tue, 7 Feb 2012, Anton Vorontsov wrote:
> Checking for task->mm is dangerous as ->mm might disappear (exit_mm()
> assigns NULL under task_lock(), so tasklist lock is not enough).
>
> We can't use get_task_mm()/mmput() pair as mmput() might sleep,
> so let's take the task lock while we care about its mm.
>
> Signed-off-by: Anton Vorontsov <[email protected]>
Acked-by: David Rientjes <[email protected]>
On Tue, 7 Feb 2012, Anton Vorontsov wrote:
> sysrq should grab the tasklist lock; otherwise calling force_sig() is
> not safe, as it might race with an exiting task whose ->sighand might
> already be set to NULL.
>
> Signed-off-by: Anton Vorontsov <[email protected]>
Acked-by: David Rientjes <[email protected]>
On Wed, Feb 08, 2012 at 05:08:08PM +0100, Oleg Nesterov wrote:
> On 02/07, Anton Vorontsov wrote:
> >
> > Checking for task->mm is dangerous as ->mm might disappear (exit_mm()
> > assigns NULL under task_lock(), so tasklist lock is not enough).
> >
> > We can't use get_task_mm()/mmput() pair as mmput() might sleep,
>
> Yes, but
>
> > so let's take the task lock while we care about its mm.
>
> it seems that this needs find_lock_task_mm() too?
>
> The same goes for the rest of the patches in this series.
Yep, I think you're right, will add this change.
--
Anton Vorontsov
Email: [email protected]
On Thu, Feb 09, 2012 at 07:33:46PM +0400, Anton Vorontsov wrote:
> On Wed, Feb 08, 2012 at 05:08:08PM +0100, Oleg Nesterov wrote:
> > On 02/07, Anton Vorontsov wrote:
> > >
> > > Checking for task->mm is dangerous as ->mm might disappear (exit_mm()
> > > assigns NULL under task_lock(), so tasklist lock is not enough).
> > >
> > > We can't use get_task_mm()/mmput() pair as mmput() might sleep,
> >
> > Yes, but
> >
> > > so let's take the task lock while we care about its mm.
> >
> > it seems that this needs find_lock_task_mm() too?
> >
> > The same goes for the rest of the patches in this series.
>
> Yep, I think you're right, will add this change.
Thinking about it more... making the code use find_lock_task_mm
would be a behaviour change. Sure, in trivial cases like ARM this
looks like a 100% safe thing to do, but in e.g. the UML case I
wouldn't bet much money on 'mm->context.id.u.pid' still being
meaningful.
So, I'd rather do it in a separate change, so this can be easily
reverted.
--
Anton Vorontsov
Email: [email protected]
On 02/09, Anton Vorontsov wrote:
>
> On Thu, Feb 09, 2012 at 07:33:46PM +0400, Anton Vorontsov wrote:
> > On Wed, Feb 08, 2012 at 05:08:08PM +0100, Oleg Nesterov wrote:
> > > On 02/07, Anton Vorontsov wrote:
> > > >
> > > > Checking for task->mm is dangerous as ->mm might disappear (exit_mm()
> > > > assigns NULL under task_lock(), so tasklist lock is not enough).
> > > >
> > > > We can't use get_task_mm()/mmput() pair as mmput() might sleep,
> > >
> > > Yes, but
> > >
> > > > so let's take the task lock while we care about its mm.
> > >
> > > it seems that this needs find_lock_task_mm() too?
> > >
> > > The same goes for the rest of the patches in this series.
> >
> > Yep, I think you're right, will add this change.
>
> Thinking about it more... making the code use find_lock_task_mm
> would be a behaviour change. Sure, in trivial cases like ARM this
> looks like a 100% safe thing to do, but in e.g. the UML case I
> wouldn't bet much money on 'mm->context.id.u.pid' still being
> meaningful.
OK, perhaps UML differs. I don't know what context.id.u.pid means.
Although at first glance it would be meaningful anyway...
> So, I'd rather do it in a separate change, so this can be easily
> reverted.
In the !UML case find_lock_task_mm() obviously looks like the right
thing...
But I won't argue, up to you.
Oleg.