2012-10-19 08:35:47

by Xiaotian Feng

Subject: [PATCH] sched, autogroup: fix kernel crashes caused by runtime disable autogroup

There's a regression from commit 800d4d30 in autogroup_move_group():

	p->signal->autogroup = autogroup_kref_get(ag);

	if (!ACCESS_ONCE(sysctl_sched_autogroup_enabled))
		goto out;
	...
out:
	autogroup_kref_put(prev);

So the kernel changes p's autogroup to ag but never calls sched_move_task(p).
The previous autogroup of p is then released, which may free the task_group
associated with p. After commit 8323f26ce, p->sched_task_group might still
point to this stale task_group, which causes kernel crashes.

This is very easy to reproduce: add "kernel.sched_autogroup_enabled = 0"
to your /etc/sysctl.conf and your system will never boot up. It is not
reasonable to put the sysctl enabled check in autogroup_move_group(); the
kernel should check it before autogroup_create() in
sched_autogroup_create_attach().

Reported-by: cwillu <[email protected]>
Reported-by: Luis Henriques <[email protected]>
Signed-off-by: Xiaotian Feng <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
---
kernel/sched/auto_group.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/auto_group.c b/kernel/sched/auto_group.c
index 0984a21..ac62415 100644
--- a/kernel/sched/auto_group.c
+++ b/kernel/sched/auto_group.c
@@ -143,15 +143,11 @@ autogroup_move_group(struct task_struct *p, struct autogroup *ag)

 	p->signal->autogroup = autogroup_kref_get(ag);

-	if (!ACCESS_ONCE(sysctl_sched_autogroup_enabled))
-		goto out;
-
 	t = p;
 	do {
 		sched_move_task(t);
 	} while_each_thread(p, t);

-out:
 	unlock_task_sighand(p, &flags);
 	autogroup_kref_put(prev);
 }
@@ -159,8 +155,12 @@ out:
 /* Allocates GFP_KERNEL, cannot be called under any spinlock */
 void sched_autogroup_create_attach(struct task_struct *p)
 {
-	struct autogroup *ag = autogroup_create();
+	struct autogroup *ag;
+
+	if (!ACCESS_ONCE(sysctl_sched_autogroup_enabled))
+		return;

+	ag = autogroup_create();
 	autogroup_move_group(p, ag);
 	/* drop extra reference added by autogroup_create() */
 	autogroup_kref_put(ag);
--
1.7.9.5
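
The failure mode described in the changelog above can be modelled in userspace. The following is only a sketch under stated assumptions: `model_group`, `model_task`, and every function here are hypothetical stand-ins, not kernel code. Releasing a group sets a `freed` flag instead of freeing memory, so the stale pointer can be inspected safely.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins for the kernel structures. */
struct model_group {
	int refs;	/* reference count, like the kref in struct autogroup */
	bool freed;	/* set instead of freeing, so the model stays inspectable */
};

struct model_task {
	struct model_group *autogroup;		/* models p->signal->autogroup */
	struct model_group *sched_task_group;	/* models the pointer cached after 8323f26ce */
};

static struct model_group *group_get(struct model_group *g)
{
	g->refs++;
	return g;
}

static void group_put(struct model_group *g)
{
	if (--g->refs == 0)
		g->freed = true;	/* the kernel would release the task_group here */
}

/* Models the pre-patch autogroup_move_group(), including the early exit. */
static void model_move_group(struct model_task *p, struct model_group *ag,
			     bool enabled)
{
	struct model_group *prev = p->autogroup;

	p->autogroup = group_get(ag);
	if (enabled)
		p->sched_task_group = ag;	/* stands in for sched_move_task() */
	group_put(prev);
}

/* True when the task's cached group pointer refers to a released group. */
static bool cached_group_is_stale(const struct model_task *p)
{
	return p->sched_task_group->freed;
}
```

With `enabled` false, the task's autogroup pointer is switched and the previous group loses its last reference, while the cached pointer still refers to it. That is the stale state the changelog describes.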


2012-10-19 13:42:39

by Peter Zijlstra

Subject: Re: [PATCH] sched, autogroup: fix kernel crashes caused by runtime disable autogroup

Always try and CC people who wrote the code..

On Fri, 2012-10-19 at 16:36 +0800, Xiaotian Feng wrote:
> There's a regression from commit 800d4d30, in autogroup_move_group()
>
> p->signal->autogroup = autogroup_kref_get(ag);
>
> if (!ACCESS_ONCE(sysctl_sched_autogroup_enabled))
> goto out;
> ...
> out:
> autogroup_kref_put(prev);
>
> So kernel changed p's autogroup to ag, but never sched_move_task(p).
> Then previous autogroup of p is released, which may release task_group
> related with p. After commit 8323f26ce, p->sched_task_group might point
> to this stale value, and thus caused kernel crashes.
>
> This is very easy to reproduce, add "kernel.sched_autogroup_enabled = 0"
> to your /etc/sysctl.conf, your system will never boot up. It is not reasonable
> to put the sysctl enabled check in autogroup_move_group(), kernel should check
> it before autogroup_create in sched_autogroup_create_attach().
>
> Reported-by: cwillu <[email protected]>
> Reported-by: Luis Henriques <[email protected]>
> Signed-off-by: Xiaotian Feng <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> ---
> kernel/sched/auto_group.c | 10 +++++-----
> 1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/sched/auto_group.c b/kernel/sched/auto_group.c
> index 0984a21..ac62415 100644
> --- a/kernel/sched/auto_group.c
> +++ b/kernel/sched/auto_group.c
> @@ -143,15 +143,11 @@ autogroup_move_group(struct task_struct *p, struct autogroup *ag)
>
> p->signal->autogroup = autogroup_kref_get(ag);
>
> - if (!ACCESS_ONCE(sysctl_sched_autogroup_enabled))
> - goto out;
> -
> t = p;
> do {
> sched_move_task(t);
> } while_each_thread(p, t);
>
> -out:
> unlock_task_sighand(p, &flags);
> autogroup_kref_put(prev);
> }

So I've looked at this for all of 1 minute, but why isn't moving that
check up one line to be above the p->signal->autogroup assignment
enough?

> @@ -159,8 +155,12 @@ out:
> /* Allocates GFP_KERNEL, cannot be called under any spinlock */
> void sched_autogroup_create_attach(struct task_struct *p)
> {
> - struct autogroup *ag = autogroup_create();
> + struct autogroup *ag;
> +
> + if (!ACCESS_ONCE(sysctl_sched_autogroup_enabled))
> + return;
>
> + ag = autogroup_create();
> autogroup_move_group(p, ag);
> /* drop extra reference added by autogroup_create() */
> autogroup_kref_put(ag);

Man,.. so on memory allocation fail we'll put the group in
autogroup_default, which I think ends up being the root cgroup.

But what happens when sysctl_sched_autogroup_enabled is false?

It looks like sched_autogroup_fork() is effective in that case, which
would mean we'll stay in whatever group our parent is in, which is not
the same as being disabled.
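
The alternative raised in this reply, moving the enabled check above the p->signal->autogroup assignment, can be sketched with a small userspace model. Everything here is hypothetical (`model_group`, `model_task`, `model_move_group_checked_first`). The sketch also assumes that the early exit skips the put of the previous group, which the thread does not spell out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins, not kernel structures. */
struct model_group {
	int refs;	/* reference count */
	bool freed;	/* set instead of freeing memory */
};

struct model_task {
	struct model_group *autogroup;		/* models p->signal->autogroup */
	struct model_group *sched_task_group;	/* models the cached group pointer */
};

static void group_put(struct model_group *g)
{
	if (--g->refs == 0)
		g->freed = true;
}

/*
 * Variant with the enabled check hoisted above the assignment.
 * Assumption: the early exit leaves every reference count untouched.
 * Returns true if the task was moved to ag.
 */
static bool model_move_group_checked_first(struct model_task *p,
					   struct model_group *ag,
					   bool enabled)
{
	struct model_group *prev = p->autogroup;

	if (!enabled)
		return false;	/* nothing changed, nothing released */

	ag->refs++;
	p->autogroup = ag;
	p->sched_task_group = ag;	/* stands in for sched_move_task() */
	group_put(prev);
	return true;
}
```

In this model the disabled path keeps both pointers on the old group, so nothing goes stale. It does not answer the second question raised here, namely which group a task ends up in while the switch is off.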

2012-10-20 06:43:01

by Xiaotian Feng

Subject: Re: [PATCH] sched, autogroup: fix kernel crashes caused by runtime disable autogroup

On Fri, Oct 19, 2012 at 9:42 PM, Peter Zijlstra <[email protected]> wrote:
> Always try and CC people who wrote the code..
>
> On Fri, 2012-10-19 at 16:36 +0800, Xiaotian Feng wrote:
>> There's a regression from commit 800d4d30, in autogroup_move_group()
>>
>> p->signal->autogroup = autogroup_kref_get(ag);
>>
>> if (!ACCESS_ONCE(sysctl_sched_autogroup_enabled))
>> goto out;
>> ...
>> out:
>> autogroup_kref_put(prev);
>>
>> So kernel changed p's autogroup to ag, but never sched_move_task(p).
>> Then previous autogroup of p is released, which may release task_group
>> related with p. After commit 8323f26ce, p->sched_task_group might point
>> to this stale value, and thus caused kernel crashes.
>>
>> This is very easy to reproduce, add "kernel.sched_autogroup_enabled = 0"
>> to your /etc/sysctl.conf, your system will never boot up. It is not reasonable
>> to put the sysctl enabled check in autogroup_move_group(), kernel should check
>> it before autogroup_create in sched_autogroup_create_attach().
>>
>> Reported-by: cwillu <[email protected]>
>> Reported-by: Luis Henriques <[email protected]>
>> Signed-off-by: Xiaotian Feng <[email protected]>
>> Cc: Ingo Molnar <[email protected]>
>> Cc: Peter Zijlstra <[email protected]>
>> ---
>> kernel/sched/auto_group.c | 10 +++++-----
>> 1 file changed, 5 insertions(+), 5 deletions(-)
>>
>> diff --git a/kernel/sched/auto_group.c b/kernel/sched/auto_group.c
>> index 0984a21..ac62415 100644
>> --- a/kernel/sched/auto_group.c
>> +++ b/kernel/sched/auto_group.c
>> @@ -143,15 +143,11 @@ autogroup_move_group(struct task_struct *p, struct autogroup *ag)
>>
>> p->signal->autogroup = autogroup_kref_get(ag);
>>
>> - if (!ACCESS_ONCE(sysctl_sched_autogroup_enabled))
>> - goto out;
>> -
>> t = p;
>> do {
>> sched_move_task(t);
>> } while_each_thread(p, t);
>>
>> -out:
>> unlock_task_sighand(p, &flags);
>> autogroup_kref_put(prev);
>> }
>
> So I've looked at this for all of 1 minute, but why isn't moving that
> check up one line to be above the p->signal->autogroup assignment
> enough?

I think that if autogroup is disabled, we don't need to call
autogroup_create() to create a new ag and tg, because the kernel will
not use them.

>
>> @@ -159,8 +155,12 @@ out:
>> /* Allocates GFP_KERNEL, cannot be called under any spinlock */
>> void sched_autogroup_create_attach(struct task_struct *p)
>> {
>> - struct autogroup *ag = autogroup_create();
>> + struct autogroup *ag;
>> +
>> + if (!ACCESS_ONCE(sysctl_sched_autogroup_enabled))
>> + return;
>>
>> + ag = autogroup_create();
>> autogroup_move_group(p, ag);
>> /* drop extra reference added by autogroup_create() */
>> autogroup_kref_put(ag);
>
> Man,.. so on memory allocation fail we'll put the group in
> autogroup_default, which I think ends up being the root cgroup.
>
> But what happens when sysctl_sched_autogroup_enabled is false?
>

Runtime disabling of autogroup is very nasty, as it might happen at any
point in sched_move_group() for any setsid task.
After sysctl_sched_autogroup_enabled is changed to false,
autogroup_task_group(p, tg) will return tg, which comes from the task's
cpu cgroup.

> It looks like sched_autogroup_fork() is effective in that case, which
> would mean we'll stay in whatever group our parent is in, which is not
> the same as being disabled.

It's true, but after autogroup is disabled, p->signal->autogroup will
never be used, as autogroup_task_group() will not use it. But
after autogroup is enabled again, it might be a disaster....

I think we'd better delete the runtime enable/disable support for
autogroup, because it might mess up the group scheduler....
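
The group selection described above can be sketched as a userspace model. This is a simplification under stated assumptions: `model_tg`, `model_task`, `model_root`, and `model_autogroup_task_group` are hypothetical names, and only the two conditions visible in this thread are modelled (the enabled flag and the root task group check).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct task_group. */
struct model_tg {
	const char *name;
};

struct model_task {
	struct model_tg *autogroup_tg;	/* models p->signal->autogroup->tg */
};

/* Stands in for root_task_group. */
static struct model_tg model_root = { "root" };

/*
 * Models autogroup_task_group(): the autogroup's task group is used only
 * when the switch is on and the task sits in the root task group.
 * Otherwise the group passed in (from the cpu cgroup) is returned as-is.
 */
static struct model_tg *
model_autogroup_task_group(struct model_task *p, struct model_tg *tg,
			   bool enabled)
{
	if (enabled && tg == &model_root)
		return p->autogroup_tg;

	return tg;
}
```

Once the flag flips to false, the same task resolves to a different group on the next lookup, even though p->signal->autogroup itself is unchanged.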

>
>

2012-10-20 12:37:54

by Mike Galbraith

Subject: Re: [PATCH] sched, autogroup: fix kernel crashes caused by runtime disable autogroup

On Sat, 2012-10-20 at 14:42 +0800, Xiaotian Feng wrote:
> On Fri, Oct 19, 2012 at 9:42 PM, Peter Zijlstra <[email protected]> wrote:
> > Always try and CC people who wrote the code..
> >
> > On Fri, 2012-10-19 at 16:36 +0800, Xiaotian Feng wrote:
> >> There's a regression from commit 800d4d30, in autogroup_move_group()
> >>
> >> p->signal->autogroup = autogroup_kref_get(ag);
> >>
> >> if (!ACCESS_ONCE(sysctl_sched_autogroup_enabled))
> >> goto out;
> >> ...
> >> out:
> >> autogroup_kref_put(prev);
> >>
> >> So kernel changed p's autogroup to ag, but never sched_move_task(p).
> >> Then previous autogroup of p is released, which may release task_group
> >> related with p. After commit 8323f26ce, p->sched_task_group might point
> >> to this stale value, and thus caused kernel crashes.
> >>
> >> This is very easy to reproduce, add "kernel.sched_autogroup_enabled = 0"
> >> to your /etc/sysctl.conf, your system will never boot up. It is not reasonable
> >> to put the sysctl enabled check in autogroup_move_group(), kernel should check
> >> it before autogroup_create in sched_autogroup_create_attach().
> >>
> >> Reported-by: cwillu <[email protected]>
> >> Reported-by: Luis Henriques <[email protected]>
> >> Signed-off-by: Xiaotian Feng <[email protected]>
> >> Cc: Ingo Molnar <[email protected]>
> >> Cc: Peter Zijlstra <[email protected]>
> >> ---
> >> kernel/sched/auto_group.c | 10 +++++-----
> >> 1 file changed, 5 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/kernel/sched/auto_group.c b/kernel/sched/auto_group.c
> >> index 0984a21..ac62415 100644
> >> --- a/kernel/sched/auto_group.c
> >> +++ b/kernel/sched/auto_group.c
> >> @@ -143,15 +143,11 @@ autogroup_move_group(struct task_struct *p, struct autogroup *ag)
> >>
> >> p->signal->autogroup = autogroup_kref_get(ag);
> >>
> >> - if (!ACCESS_ONCE(sysctl_sched_autogroup_enabled))
> >> - goto out;
> >> -
> >> t = p;
> >> do {
> >> sched_move_task(t);
> >> } while_each_thread(p, t);
> >>
> >> -out:
> >> unlock_task_sighand(p, &flags);
> >> autogroup_kref_put(prev);
> >> }
> >
> > So I've looked at this for all of 1 minute, but why isn't moving that
> > check up one line to be above the p->signal->autogroup assignment
> > enough?
>
> I think if autogroup is disabled, we don't need to use
> autogroup_create() to create a new ag and tg, kernel will not use it.
>
> >
> >> @@ -159,8 +155,12 @@ out:
> >> /* Allocates GFP_KERNEL, cannot be called under any spinlock */
> >> void sched_autogroup_create_attach(struct task_struct *p)
> >> {
> >> - struct autogroup *ag = autogroup_create();
> >> + struct autogroup *ag;
> >> +
> >> + if (!ACCESS_ONCE(sysctl_sched_autogroup_enabled))
> >> + return;
> >>
> >> + ag = autogroup_create();
> >> autogroup_move_group(p, ag);
> >> /* drop extra reference added by autogroup_create() */
> >> autogroup_kref_put(ag);
> >
> > Man,.. so on memory allocation fail we'll put the group in
> > autogroup_default, which I think ends up being the root cgroup.
> >
> > But what happens when sysctl_sched_autogroup_enabled is false?
> >
>
> autogroup runtime disable is very nasty, as it might happen at any
> place of sched_move_group() for any setsid task.
> After sysctl_sched_autogroup_enabled is changed to false,
> autogroup_task_group(p, tg) will return tg, which is from its cpu
> cgroup.
>
> > It looks like sched_autogroup_fork() is effective in that case, which
> > would mean we'll stay in whatever group our parent is in, which is not
> > the same as being disabled.
>
> It's true, but after autogroup is disabled, p->signal->autogroup will
> never be used then, as autogroup_task_group() will not use it. But
> after autogroup is enabled again, it might be a disaster....

Autogroups are intended to always exist; enable/disable is only a choice
of whether you use them or not.

> I think we'd better delete the runtime enable/disable support for
> autogroup, because it might mess up the group scheduler....

Disabling runtime on/off sounds good to me too. Not because it will
mess up the scheduler, it doesn't, but the on/off switch does not take
effect instantly, and in some cases doesn't ever take effect (fully
functional on/off was shot down, so doing that now won't fly either).

So what I would do is either let the user decide once at boot (in which
case, if off, creating groups would be stupid), or just rip autogroup
completely out, since systemd is taking over the known universe anyway.

-Mike

2012-10-26 20:29:03

by Mike Galbraith

Subject: Re: [PATCH] sched, autogroup: fix kernel crashes caused by runtime disable autogroup

On Sat, 2012-10-20 at 08:38 -0400, Mike Galbraith wrote:

> So what I would do is either let the user decide once at boot, in which
> case if off, creating groups would be stupid), or, just rip autogroup
> completely out, since systemd is taking over the known universe anyway.

I'm traveling, but have somewhat functional connectivity ATM, so..

Peter: which would you prefer? A simple noautogroup -> autogroup
one-time-only boot-time enable, where autogroup lives on (I like it for
my laptop) with a backport for stable; or fix stable as above, and whack
it upstream as an annoyance, since systemd (one daemon to bind them..) is
being adopted everywhere?

(other?.. fully functional on/off switch? revert 800d4d30?)

-Mike

2012-10-27 18:25:43

by Mike Galbraith

Subject: [PATCH] sched, autogroup: fix crash on reboot when autogroup is disabled

On Fri, 2012-10-26 at 13:29 -0700, Mike Galbraith wrote:
> On Sat, 2012-10-20 at 08:38 -0400, Mike Galbraith wrote:
>
> > So what I would do is either let the user decide once at boot, in which
> > case if off, creating groups would be stupid), or, just rip autogroup
> > completely out, since systemd is taking over the known universe anyway.
>
> I'm traveling, but have somewhat functional connectivity ATM, so..
>
> Peter: which would prefer. Simple noautogroup -> autogroup one time
> only boottime enable, and autogroup lives on (I like it for my laptop)
> with backport for stable.

Like so, with bonus points for extra minus signs.

sched, autogroup: fix crash on reboot when autogroup is disabled

Between 8323f26ce and 800d4d30, autogroup is a wreck. With both
applied, all you have to do to crash a box is disable autogroup
during boot up, then reboot.. boom, NULL pointer dereference due
to 800d4d30 not allowing autogroup to move things, and 8323f26ce
making that the only way to switch runqueues.

Remove all of the knobs, and make autogroup only go active if the
user provides 'autogroup' on the command line. This allows distros
to offer it; once the user asks for it, it's on. If the user then
fiddles with cgroups, tough: once tasks are moved, autogroup won't
mess with them again unless they call setsid().

No knobs, no glitz, nada, just a cute little thing folks can turn
on if they don't want to muck about with cgroups and/or systemd.

Signed-off-by: Mike Galbraith <[email protected]>
Cc: [email protected] v3.6

---
Documentation/kernel-parameters.txt | 4 -
fs/proc/base.c | 78 ------------------------------------
include/linux/sched.h | 2
kernel/sched/auto_group.c | 74 ++++++----------------------------
kernel/sched/auto_group.h | 9 ----
kernel/sysctl.c | 11 -----
6 files changed, 17 insertions(+), 161 deletions(-)

--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -367,6 +367,8 @@ bytes respectively. Such letter suffixes
autoconf= [IPV6]
See Documentation/networking/ipv6.txt.

+ autogroup Enable scheduler automatic task group creation.
+
show_lapic= [APIC,X86] Advanced Programmable Interrupt Controller
Limit apic dumping. The parameter defines the maximal
number of local apics being dumped. Also it is possible
@@ -1810,8 +1812,6 @@ bytes respectively. Such letter suffixes
noapic [SMP,APIC] Tells the kernel to not make use of any
IOAPICs that may be present in the system.

- noautogroup Disable scheduler automatic task group creation.
-
nobats [PPC] Do not use BATs for mapping kernel lowmem
on "Classic" PPC cores.

--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -1165,81 +1165,6 @@ static const struct file_operations proc

#endif

-#ifdef CONFIG_SCHED_AUTOGROUP
-/*
- * Print out autogroup related information:
- */
-static int sched_autogroup_show(struct seq_file *m, void *v)
-{
- struct inode *inode = m->private;
- struct task_struct *p;
-
- p = get_proc_task(inode);
- if (!p)
- return -ESRCH;
- proc_sched_autogroup_show_task(p, m);
-
- put_task_struct(p);
-
- return 0;
-}
-
-static ssize_t
-sched_autogroup_write(struct file *file, const char __user *buf,
- size_t count, loff_t *offset)
-{
- struct inode *inode = file->f_path.dentry->d_inode;
- struct task_struct *p;
- char buffer[PROC_NUMBUF];
- int nice;
- int err;
-
- memset(buffer, 0, sizeof(buffer));
- if (count > sizeof(buffer) - 1)
- count = sizeof(buffer) - 1;
- if (copy_from_user(buffer, buf, count))
- return -EFAULT;
-
- err = kstrtoint(strstrip(buffer), 0, &nice);
- if (err < 0)
- return err;
-
- p = get_proc_task(inode);
- if (!p)
- return -ESRCH;
-
- err = proc_sched_autogroup_set_nice(p, nice);
- if (err)
- count = err;
-
- put_task_struct(p);
-
- return count;
-}
-
-static int sched_autogroup_open(struct inode *inode, struct file *filp)
-{
- int ret;
-
- ret = single_open(filp, sched_autogroup_show, NULL);
- if (!ret) {
- struct seq_file *m = filp->private_data;
-
- m->private = inode;
- }
- return ret;
-}
-
-static const struct file_operations proc_pid_sched_autogroup_operations = {
- .open = sched_autogroup_open,
- .read = seq_read,
- .write = sched_autogroup_write,
- .llseek = seq_lseek,
- .release = single_release,
-};
-
-#endif /* CONFIG_SCHED_AUTOGROUP */
-
static ssize_t comm_write(struct file *file, const char __user *buf,
size_t count, loff_t *offset)
{
@@ -2550,9 +2475,6 @@ static const struct pid_entry tgid_base_
#ifdef CONFIG_SCHED_DEBUG
REG("sched", S_IRUGO|S_IWUSR, proc_pid_sched_operations),
#endif
-#ifdef CONFIG_SCHED_AUTOGROUP
- REG("autogroup", S_IRUGO|S_IWUSR, proc_pid_sched_autogroup_operations),
-#endif
REG("comm", S_IRUGO|S_IWUSR, proc_pid_set_comm_operations),
#ifdef CONFIG_HAVE_ARCH_TRACEHOOK
INF("syscall", S_IRUGO, proc_pid_syscall),
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2020,8 +2020,6 @@ int sched_rt_handler(struct ctl_table *t
loff_t *ppos);

#ifdef CONFIG_SCHED_AUTOGROUP
-extern unsigned int sysctl_sched_autogroup_enabled;
-
extern void sched_autogroup_create_attach(struct task_struct *p);
extern void sched_autogroup_detach(struct task_struct *p);
extern void sched_autogroup_fork(struct signal_struct *sig);
--- a/kernel/sched/auto_group.c
+++ b/kernel/sched/auto_group.c
@@ -9,7 +9,7 @@
#include <linux/security.h>
#include <linux/export.h>

-unsigned int __read_mostly sysctl_sched_autogroup_enabled = 1;
+unsigned int __read_mostly autogroup_enabled = 0;
static struct autogroup autogroup_default;
static atomic_t autogroup_seq_nr;

@@ -110,6 +110,9 @@ static inline struct autogroup *autogrou

bool task_wants_autogroup(struct task_struct *p, struct task_group *tg)
{
+ if (!autogroup_enabled)
+ return false;
+
if (tg != &root_task_group)
return false;

@@ -143,15 +146,11 @@ autogroup_move_group(struct task_struct

p->signal->autogroup = autogroup_kref_get(ag);

- if (!ACCESS_ONCE(sysctl_sched_autogroup_enabled))
- goto out;
-
t = p;
do {
sched_move_task(t);
} while_each_thread(p, t);

-out:
unlock_task_sighand(p, &flags);
autogroup_kref_put(prev);
}
@@ -159,8 +158,11 @@ autogroup_move_group(struct task_struct
/* Allocates GFP_KERNEL, cannot be called under any spinlock */
void sched_autogroup_create_attach(struct task_struct *p)
{
- struct autogroup *ag = autogroup_create();
+ struct autogroup *ag;

+ if (!autogroup_enabled)
+ return;
+ ag = autogroup_create();
autogroup_move_group(p, ag);
/* drop extra reference added by autogroup_create() */
autogroup_kref_put(ag);
@@ -176,74 +178,26 @@ EXPORT_SYMBOL(sched_autogroup_detach);

void sched_autogroup_fork(struct signal_struct *sig)
{
+ if (!autogroup_enabled)
+ return;
sig->autogroup = autogroup_task_get(current);
}

void sched_autogroup_exit(struct signal_struct *sig)
{
+ if (!autogroup_enabled)
+ return;
autogroup_kref_put(sig->autogroup);
}

static int __init setup_autogroup(char *str)
{
- sysctl_sched_autogroup_enabled = 0;
+ autogroup_enabled = 1;

return 1;
}

-__setup("noautogroup", setup_autogroup);
-
-#ifdef CONFIG_PROC_FS
-
-int proc_sched_autogroup_set_nice(struct task_struct *p, int nice)
-{
- static unsigned long next = INITIAL_JIFFIES;
- struct autogroup *ag;
- int err;
-
- if (nice < -20 || nice > 19)
- return -EINVAL;
-
- err = security_task_setnice(current, nice);
- if (err)
- return err;
-
- if (nice < 0 && !can_nice(current, nice))
- return -EPERM;
-
- /* this is a heavy operation taking global locks.. */
- if (!capable(CAP_SYS_ADMIN) && time_before(jiffies, next))
- return -EAGAIN;
-
- next = HZ / 10 + jiffies;
- ag = autogroup_task_get(p);
-
- down_write(&ag->lock);
- err = sched_group_set_shares(ag->tg, prio_to_weight[nice + 20]);
- if (!err)
- ag->nice = nice;
- up_write(&ag->lock);
-
- autogroup_kref_put(ag);
-
- return err;
-}
-
-void proc_sched_autogroup_show_task(struct task_struct *p, struct seq_file *m)
-{
- struct autogroup *ag = autogroup_task_get(p);
-
- if (!task_group_is_autogroup(ag->tg))
- goto out;
-
- down_read(&ag->lock);
- seq_printf(m, "/autogroup-%ld nice %d\n", ag->id, ag->nice);
- up_read(&ag->lock);
-
-out:
- autogroup_kref_put(ag);
-}
-#endif /* CONFIG_PROC_FS */
+__setup("autogroup", setup_autogroup);

#ifdef CONFIG_SCHED_DEBUG
int autogroup_path(struct task_group *tg, char *buf, int buflen)
--- a/kernel/sched/auto_group.h
+++ b/kernel/sched/auto_group.h
@@ -4,11 +4,6 @@
#include <linux/rwsem.h>

struct autogroup {
- /*
- * reference doesn't mean how many thread attach to this
- * autogroup now. It just stands for the number of task
- * could use this autogroup.
- */
struct kref kref;
struct task_group *tg;
struct rw_semaphore lock;
@@ -29,9 +24,7 @@ extern bool task_wants_autogroup(struct
static inline struct task_group *
autogroup_task_group(struct task_struct *p, struct task_group *tg)
{
- int enabled = ACCESS_ONCE(sysctl_sched_autogroup_enabled);
-
- if (enabled && task_wants_autogroup(p, tg))
+ if (task_wants_autogroup(p, tg))
return p->signal->autogroup->tg;

return tg;
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -362,17 +362,6 @@ static struct ctl_table kern_table[] = {
.mode = 0644,
.proc_handler = sched_rt_handler,
},
-#ifdef CONFIG_SCHED_AUTOGROUP
- {
- .procname = "sched_autogroup_enabled",
- .data = &sysctl_sched_autogroup_enabled,
- .maxlen = sizeof(unsigned int),
- .mode = 0644,
- .proc_handler = proc_dointvec_minmax,
- .extra1 = &zero,
- .extra2 = &one,
- },
-#endif
#ifdef CONFIG_CFS_BANDWIDTH
{
.procname = "sched_cfs_bandwidth_slice_us",

2012-10-28 10:25:27

by Ingo Molnar

Subject: Re: [PATCH] sched, autogroup: fix crash on reboot when autogroup is disabled


* Mike Galbraith <[email protected]> wrote:

> On Fri, 2012-10-26 at 13:29 -0700, Mike Galbraith wrote:
> > On Sat, 2012-10-20 at 08:38 -0400, Mike Galbraith wrote:
> >
> > > So what I would do is either let the user decide once at boot, in which
> > > case if off, creating groups would be stupid), or, just rip autogroup
> > > completely out, since systemd is taking over the known universe anyway.
> >
> > I'm traveling, but have somewhat functional connectivity ATM, so..
> >
> > Peter: which would prefer. Simple noautogroup -> autogroup one time
> > only boottime enable, and autogroup lives on (I like it for my laptop)
> > with backport for stable.
>
> Like so, with bonus points for extra minus signs.
>
> sched, autogroup: fix crash on reboot when autogroup is disabled
>
> Between 8323f26ce and 800d4d30, autogroup is a wreck. With both
> applied, all you have to do to crash a box is disable autogroup
> during boot up, then reboot.. boom, NULL pointer dereference due
> to 800d4d30 not allowing autogroup to move things, and 8323f26ce
> making that the only way to switch runqueues.
>
> Remove all of the knobs, and make autogroup only go active if the
> user provides 'autogroup' on the command line. This allows distros
> to offer it, once the user asks for it, it's on. If the user then
> fiddles with cgroups, tough, once tasks are moved, autogroup won't
> mess with them again unless they call setsid().
>
> No knobs, no glitz, nada, just a cute little thing folks can turn
> on if they don't want to muck about with cgroups and/or systemd.

Please also keep the Kconfig switch and reuse it to turn on the
'autogroups' knob.

That way people with existing .config's don't have to change a
thing to get this functionality.

Thanks,

Ingo

2012-10-28 13:12:57

by Mike Galbraith

Subject: Re: [PATCH] sched, autogroup: fix crash on reboot when autogroup is disabled

On Sun, 2012-10-28 at 11:25 +0100, Ingo Molnar wrote:
> * Mike Galbraith <[email protected]> wrote:
>

> > No knobs, no glitz, nada, just a cute little thing folks can turn
> > on if they don't want to muck about with cgroups and/or systemd.
>
> Please also keep the Kconfig switch and reuse it to turn on the
> 'autogroups' knob.
>
> That way people with existing .config's don't have to change a
> thing to get this functionality.

The Kconfig option is still there. The noautogroup -> autogroup arg
change just makes it off by default (since an on/off switch would have
to be a full move everybody thing post 8323f26ce race fix), so distros
can make it available in their swiss army knife config, but it'll be out
of the way unless specifically asked for by the user at boot.

I can make it default 'on' by removing that arg change if you think
that's the better way to go, but opt in at boot sounded better to me
given there is no runtime on/off switch at all now.

-Mike

2012-10-28 13:19:56

by Ingo Molnar

Subject: Re: [PATCH] sched, autogroup: fix crash on reboot when autogroup is disabled


* Mike Galbraith <[email protected]> wrote:

> On Sun, 2012-10-28 at 11:25 +0100, Ingo Molnar wrote:
> > * Mike Galbraith <[email protected]> wrote:
> >
>
> > > No knobs, no glitz, nada, just a cute little thing folks can turn
> > > on if they don't want to muck about with cgroups and/or systemd.
> >
> > Please also keep the Kconfig switch and reuse it to turn on
> > the 'autogroups' knob.
> >
> > That way people with existing .config's don't have to change
> > a thing to get this functionality.
>
> The Kconfig option is still there. The noautogroup ->
> autogroup arg change just makes it off by default (since an
> on/off switch would have to be a full move everybody thing
> post 8323f26ce race fix), so distros can make it available in
> their swiss army knife config, but it'll be out of the way
> unless specifically asked for by the user at boot.
>
> I can make it default 'on' by removing that arg change if you
> think that's the better way to go, but opt in at boot sounded
> better to me given there is no runtime on/off switch at all
> now.

If I got your patch right then adding a command line option to
turn it on will disable it in essence for pretty much everyone
who has CONFIG_SCHED_AUTOGROUP=y in their .config today.

The patch should not change the defaults for existing .config's.

I.e. if autogroups was off, it should stay off, but if
autogroups was enabled in the .config and the kernel booted with
it enabled, then it should continue to do so in the future as
well.

Adding a boot tweak and removing the runtime knobs is OK -
changing the current default functionality is not.

Thanks,

Ingo
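
The difference between the two defaults under discussion can be sketched in userspace. This is a minimal model, not kernel code: `cmdline_has_word`, `autogroup_active_optin`, and `autogroup_active_optout` are hypothetical helpers, and the command line is treated as a plain space-separated string.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Returns true if `word` appears as a whole space-separated token in
 * `cmdline`. `word` must be a non-empty string.
 */
static bool cmdline_has_word(const char *cmdline, const char *word)
{
	size_t len = strlen(word);
	const char *p = cmdline;

	while ((p = strstr(p, word)) != NULL) {
		bool starts = (p == cmdline) || (p[-1] == ' ');
		bool ends = (p[len] == '\0') || (p[len] == ' ');

		if (starts && ends)
			return true;
		p += len;
	}
	return false;
}

/* Opt-in, as in the patch under review: off unless 'autogroup' is given. */
static bool autogroup_active_optin(const char *cmdline)
{
	return cmdline_has_word(cmdline, "autogroup");
}

/* Opt-out, the existing behaviour: on unless 'noautogroup' is given. */
static bool autogroup_active_optout(const char *cmdline)
{
	return !cmdline_has_word(cmdline, "noautogroup");
}
```

For a command line that mentions neither word, the opt-in variant returns false and the opt-out variant returns true. That is the change in default for existing configurations that is objected to here.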

2012-10-28 13:32:52

by Mike Galbraith

Subject: Re: [PATCH] sched, autogroup: fix crash on reboot when autogroup is disabled

On Sun, 2012-10-28 at 14:19 +0100, Ingo Molnar wrote:
> * Mike Galbraith <[email protected]> wrote:
>
> > On Sun, 2012-10-28 at 11:25 +0100, Ingo Molnar wrote:
> > > * Mike Galbraith <[email protected]> wrote:
> > >
> >
> > > > No knobs, no glitz, nada, just a cute little thing folks can turn
> > > > on if they don't want to muck about with cgroups and/or systemd.
> > >
> > > Please also keep the Kconfig switch and reuse it to turn on
> > > the 'autogroups' knob.
> > >
> > > That way people with existing .config's don't have to change
> > > a thing to get this functionality.
> >
> > The Kconfig option is still there. The noautogroup ->
> > autogroup arg change just makes it off by default (since an
> > on/off switch would have to be a full move everybody thing
> > post 8323f26ce race fix), so distros can make it available in
> > their swiss army knife config, but it'll be out of the way
> > unless specifically asked for by the user at boot.
> >
> > I can make it default 'on' by removing that arg change if you
> > think that's the better way to go, but opt in at boot sounded
> > better to me given there is no runtime on/off switch at all
> > now.
>
> If I got your patch right then adding a command line option to
> turn it on will disable it in essence for pretty much everyone
> who has CONFIG_SCHED_AUTOGROUP=y in their .config today.

With no user intervention, yes.

> The patch should not change the defaults for existing .config's.
>
> I.e. if autogroups was off, it should stay off, but if
> autogroups was enabled in the .config and the kernel booted with
> it enabled, then it should continue to do so in the future as
> well.
>
> Adding a boot tweak and removing the runtime knobs is OK -
> changing the current default functionality is not.

Ok, I'll whack the arg change and respin.

-Mike

2012-10-28 14:05:51

by Ingo Molnar

Subject: Re: [PATCH] sched, autogroup: fix crash on reboot when autogroup is disabled


* Mike Galbraith <[email protected]> wrote:

> On Sun, 2012-10-28 at 14:19 +0100, Ingo Molnar wrote:
> > * Mike Galbraith <[email protected]> wrote:
> >
> > > On Sun, 2012-10-28 at 11:25 +0100, Ingo Molnar wrote:
> > > > * Mike Galbraith <[email protected]> wrote:
> > > >
> > >
> > > > > No knobs, no glitz, nada, just a cute little thing folks can turn
> > > > > on if they don't want to muck about with cgroups and/or systemd.
> > > >
> > > > Please also keep the Kconfig switch and reuse it to turn on
> > > > the 'autogroups' knob.
> > > >
> > > > That way people with existing .config's don't have to change
> > > > a thing to get this functionality.
> > >
> > > The Kconfig option is still there. The noautogroup ->
> > > autogroup arg change just makes it off by default (since an
> > > on/off switch would have to be a full move everybody thing
> > > post 8323f26ce race fix), so distros can make it available in
> > > their swiss army knife config, but it'll be out of the way
> > > unless specifically asked for by the user at boot.
> > >
> > > I can make it default 'on' by removing that arg change if you
> > > think that's the better way to go, but opt in at boot sounded
> > > better to me given there is no runtime on/off switch at all
> > > now.
> >
> > If I got your patch right then adding a command line option to
> > turn it on will disable it in essence for pretty much everyone
> > who has CONFIG_SCHED_AUTOGROUP=y in their .config today.
>
> With no user intervention, yes.

'No user intervention' is what happens with new kernel
commandline options, in 99.9999% of the cases.

> > The patch should not change the defaults for existing
> > .config's.
> >
> > I.e. if autogroups was off, it should stay off, but if
> > autogroups was enabled in the .config and the kernel booted
> > with it enabled, then it should continue to do so in the
> > future as well.
> >
> > Adding a boot tweak and removing the runtime knobs is OK -
> > changing the current default functionality is not.
>
> Ok, I'll whack the arg change and respin.

Thanks!

I'd also suggest still exposing the state of autogroup in
/proc/sys, read-only, so that its status can be checked.

Ingo
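
The read-only status check suggested above could be done from user space roughly as follows. This is a minimal sketch, assuming the knob stays at /proc/sys/kernel/sched_autogroup_enabled (the path is inferred from the "sched_autogroup_enabled" procname in kern_table); the helper names parse_autogroup_state() and read_autogroup_state() are hypothetical and not part of any kernel or libc API.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Assumed location, derived from the "sched_autogroup_enabled" procname. */
#define AUTOGROUP_SYSCTL "/proc/sys/kernel/sched_autogroup_enabled"

/*
 * Hypothetical helper: parse the text read from the sysctl file.
 * Returns 0 (disabled), 1 (enabled), or -1 if the text is not "0" or "1".
 */
int parse_autogroup_state(const char *text)
{
	char *end;
	long value;

	errno = 0;
	value = strtol(text, &end, 10);
	if (errno != 0 || end == text)
		return -1;

	/* Only trailing whitespace (typically one newline) may follow. */
	while (*end == ' ' || *end == '\t' || *end == '\n')
		end++;
	if (*end != '\0')
		return -1;

	if (value != 0 && value != 1)
		return -1;
	return (int)value;
}

/*
 * Hypothetical helper: read and parse the file at 'path'.
 * Returns -1 if the file cannot be opened or read, e.g. when the
 * kernel was built without CONFIG_SCHED_AUTOGROUP.
 */
int read_autogroup_state(const char *path)
{
	char buf[16];
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return parse_autogroup_state(buf);
}
```

A caller would pass AUTOGROUP_SYSCTL to read_autogroup_state() and treat -1 as "state unknown". Since the file only needs to be read, mode 0444 is sufficient for this kind of check.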

2012-10-28 14:26:41

by Mike Galbraith

Subject: Re: [PATCH] sched, autogroup: fix crash on reboot when autogroup is disabled

On Sun, 2012-10-28 at 15:05 +0100, Ingo Molnar wrote:

> I'd also suggest still exposing the state of autogroup in
> /proc/sys, read-only, so that its status can be checked.

(Aw poo, less pretty minus signs;)

Ok, will do.

-Mike

2012-10-28 19:19:00

by Mike Galbraith

Subject: [PATCH] V2 sched, autogroup: fix crash on reboot when autogroup is disabled

On Sun, 2012-10-28 at 15:05 +0100, Ingo Molnar wrote:
> * Mike Galbraith <[email protected]> wrote:
>
> > On Sun, 2012-10-28 at 14:19 +0100, Ingo Molnar wrote:
> > > * Mike Galbraith <[email protected]> wrote:
> > >
> > > > On Sun, 2012-10-28 at 11:25 +0100, Ingo Molnar wrote:
> > > > > * Mike Galbraith <[email protected]> wrote:
> > > > >
> > > >
> > > > > > No knobs, no glitz, nada, just a cute little thing folks can turn
> > > > > > on if they don't want to muck about with cgroups and/or systemd.
> > > > >
> > > > > Please also keep the Kconfig switch and reuse it to turn on
> > > > > the 'autogroups' knob.
> > > > >
> > > > > That way people with existing .config's don't have to change
> > > > > a thing to get this functionality.
> > > >
> > > > The Kconfig option is still there. The noautogroup ->
> > > > autogroup arg change just makes it off by default (since an
> > > > on/off switch would have to be a full move everybody thing
> > > > post 8323f26ce race fix), so distros can make it available in
> > > > their swiss army knife config, but it'll be out of the way
> > > > unless specifically asked for by the user at boot.
> > > >
> > > > I can make it default 'on' by removing that arg change if you
> > > > think that's the better way to go, but opt in at boot sounded
> > > > better to me given there is no runtime on/off switch at all
> > > > now.
> > >
> > > If I got your patch right then adding a command line option to
> > > turn it on will disable it in essence for pretty much everyone
> > > who has CONFIG_SCHED_AUTOGROUP=y in their .config today.
> >
> > With no user intervention, yes.
>
> 'No user intervention' is what happens with new kernel
> commandline options, in 99.9999% of the cases.
>
> > > The patch should not change the defaults for existing
> > > .config's.
> > >
> > > I.e. if autogroups was off, it should stay off, but if
> > > autogroups was enabled in the .config and the kernel booted
> > > with it enabled, then it should continue to do so in the
> > > future as well.
> > >
> > > Adding a boot tweak and removing the runtime knobs is OK -
> > > changing the current default functionality is not.
> >
> > Ok, I'll whack the arg change and respin.
>
> Thanks!
>
> I'd also suggest still exposing the state of autogroup in
> /proc/sys, read-only, so that its status can be checked.

Done. Autogroup remains default on with noautogroup opt out at boot
time as before. Sysctl remains intact, read only. Knobs are gone.

sched, autogroup: fix crash on reboot when autogroup is disabled

Between 8323f26ce and 800d4d30, autogroup is a wreck. With both
applied, all you have to do to crash a box is disable autogroup
during boot up, then reboot.. boom, NULL pointer dereference due
to 800d4d30 not allowing autogroup to move things, and 8323f26ce
making that the only way to switch runqueues.

Remove all of the knobs, since those that weren't already broken would
be once autogroup is made exclusively on or off from boot, with off
meaning autogroups are not created and so are unavailable to the proc
interface.

If the user fiddles with cgroups hereafter, tough, once tasks are
moved, autogroup won't mess with them again unless they call setsid().

No knobs, no glitz, nada, just a cute little thing folks can turn
on if they don't want to muck about with cgroups and/or systemd.

Signed-off-by: Mike Galbraith <[email protected]>
Cc: [email protected] v3.6

---
fs/proc/base.c | 78 ----------------------------------------------
kernel/sched/auto_group.c | 68 ++++++----------------------------------
kernel/sched/auto_group.h | 9 -----
kernel/sysctl.c | 6 +--
4 files changed, 14 insertions(+), 147 deletions(-)

--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -1165,81 +1165,6 @@ static const struct file_operations proc

#endif

-#ifdef CONFIG_SCHED_AUTOGROUP
-/*
- * Print out autogroup related information:
- */
-static int sched_autogroup_show(struct seq_file *m, void *v)
-{
- struct inode *inode = m->private;
- struct task_struct *p;
-
- p = get_proc_task(inode);
- if (!p)
- return -ESRCH;
- proc_sched_autogroup_show_task(p, m);
-
- put_task_struct(p);
-
- return 0;
-}
-
-static ssize_t
-sched_autogroup_write(struct file *file, const char __user *buf,
- size_t count, loff_t *offset)
-{
- struct inode *inode = file->f_path.dentry->d_inode;
- struct task_struct *p;
- char buffer[PROC_NUMBUF];
- int nice;
- int err;
-
- memset(buffer, 0, sizeof(buffer));
- if (count > sizeof(buffer) - 1)
- count = sizeof(buffer) - 1;
- if (copy_from_user(buffer, buf, count))
- return -EFAULT;
-
- err = kstrtoint(strstrip(buffer), 0, &nice);
- if (err < 0)
- return err;
-
- p = get_proc_task(inode);
- if (!p)
- return -ESRCH;
-
- err = proc_sched_autogroup_set_nice(p, nice);
- if (err)
- count = err;
-
- put_task_struct(p);
-
- return count;
-}
-
-static int sched_autogroup_open(struct inode *inode, struct file *filp)
-{
- int ret;
-
- ret = single_open(filp, sched_autogroup_show, NULL);
- if (!ret) {
- struct seq_file *m = filp->private_data;
-
- m->private = inode;
- }
- return ret;
-}
-
-static const struct file_operations proc_pid_sched_autogroup_operations = {
- .open = sched_autogroup_open,
- .read = seq_read,
- .write = sched_autogroup_write,
- .llseek = seq_lseek,
- .release = single_release,
-};
-
-#endif /* CONFIG_SCHED_AUTOGROUP */
-
static ssize_t comm_write(struct file *file, const char __user *buf,
size_t count, loff_t *offset)
{
@@ -2550,9 +2475,6 @@ static const struct pid_entry tgid_base_
#ifdef CONFIG_SCHED_DEBUG
REG("sched", S_IRUGO|S_IWUSR, proc_pid_sched_operations),
#endif
-#ifdef CONFIG_SCHED_AUTOGROUP
- REG("autogroup", S_IRUGO|S_IWUSR, proc_pid_sched_autogroup_operations),
-#endif
REG("comm", S_IRUGO|S_IWUSR, proc_pid_set_comm_operations),
#ifdef CONFIG_HAVE_ARCH_TRACEHOOK
INF("syscall", S_IRUGO, proc_pid_syscall),
--- a/kernel/sched/auto_group.c
+++ b/kernel/sched/auto_group.c
@@ -110,6 +110,9 @@ static inline struct autogroup *autogrou

bool task_wants_autogroup(struct task_struct *p, struct task_group *tg)
{
+ if (!sysctl_sched_autogroup_enabled)
+ return false;
+
if (tg != &root_task_group)
return false;

@@ -143,15 +146,11 @@ autogroup_move_group(struct task_struct

p->signal->autogroup = autogroup_kref_get(ag);

- if (!ACCESS_ONCE(sysctl_sched_autogroup_enabled))
- goto out;
-
t = p;
do {
sched_move_task(t);
} while_each_thread(p, t);

-out:
unlock_task_sighand(p, &flags);
autogroup_kref_put(prev);
}
@@ -159,8 +158,11 @@ autogroup_move_group(struct task_struct
/* Allocates GFP_KERNEL, cannot be called under any spinlock */
void sched_autogroup_create_attach(struct task_struct *p)
{
- struct autogroup *ag = autogroup_create();
+ struct autogroup *ag;

+ if (!sysctl_sched_autogroup_enabled)
+ return;
+ ag = autogroup_create();
autogroup_move_group(p, ag);
/* drop extra reference added by autogroup_create() */
autogroup_kref_put(ag);
@@ -176,11 +178,15 @@ EXPORT_SYMBOL(sched_autogroup_detach);

void sched_autogroup_fork(struct signal_struct *sig)
{
+ if (!sysctl_sched_autogroup_enabled)
+ return;
sig->autogroup = autogroup_task_get(current);
}

void sched_autogroup_exit(struct signal_struct *sig)
{
+ if (!sysctl_sched_autogroup_enabled)
+ return;
autogroup_kref_put(sig->autogroup);
}

@@ -193,58 +199,6 @@ static int __init setup_autogroup(char *

__setup("noautogroup", setup_autogroup);

-#ifdef CONFIG_PROC_FS
-
-int proc_sched_autogroup_set_nice(struct task_struct *p, int nice)
-{
- static unsigned long next = INITIAL_JIFFIES;
- struct autogroup *ag;
- int err;
-
- if (nice < -20 || nice > 19)
- return -EINVAL;
-
- err = security_task_setnice(current, nice);
- if (err)
- return err;
-
- if (nice < 0 && !can_nice(current, nice))
- return -EPERM;
-
- /* this is a heavy operation taking global locks.. */
- if (!capable(CAP_SYS_ADMIN) && time_before(jiffies, next))
- return -EAGAIN;
-
- next = HZ / 10 + jiffies;
- ag = autogroup_task_get(p);
-
- down_write(&ag->lock);
- err = sched_group_set_shares(ag->tg, prio_to_weight[nice + 20]);
- if (!err)
- ag->nice = nice;
- up_write(&ag->lock);
-
- autogroup_kref_put(ag);
-
- return err;
-}
-
-void proc_sched_autogroup_show_task(struct task_struct *p, struct seq_file *m)
-{
- struct autogroup *ag = autogroup_task_get(p);
-
- if (!task_group_is_autogroup(ag->tg))
- goto out;
-
- down_read(&ag->lock);
- seq_printf(m, "/autogroup-%ld nice %d\n", ag->id, ag->nice);
- up_read(&ag->lock);
-
-out:
- autogroup_kref_put(ag);
-}
-#endif /* CONFIG_PROC_FS */
-
#ifdef CONFIG_SCHED_DEBUG
int autogroup_path(struct task_group *tg, char *buf, int buflen)
{
--- a/kernel/sched/auto_group.h
+++ b/kernel/sched/auto_group.h
@@ -4,11 +4,6 @@
#include <linux/rwsem.h>

struct autogroup {
- /*
- * reference doesn't mean how many thread attach to this
- * autogroup now. It just stands for the number of task
- * could use this autogroup.
- */
struct kref kref;
struct task_group *tg;
struct rw_semaphore lock;
@@ -29,9 +24,7 @@ extern bool task_wants_autogroup(struct
static inline struct task_group *
autogroup_task_group(struct task_struct *p, struct task_group *tg)
{
- int enabled = ACCESS_ONCE(sysctl_sched_autogroup_enabled);
-
- if (enabled && task_wants_autogroup(p, tg))
+ if (task_wants_autogroup(p, tg))
return p->signal->autogroup->tg;

return tg;
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -367,10 +367,8 @@ static struct ctl_table kern_table[] = {
.procname = "sched_autogroup_enabled",
.data = &sysctl_sched_autogroup_enabled,
.maxlen = sizeof(unsigned int),
- .mode = 0644,
- .proc_handler = proc_dointvec_minmax,
- .extra1 = &zero,
- .extra2 = &one,
+ .mode = 0444,
+ .proc_handler = proc_dointvec,
},
#endif
#ifdef CONFIG_CFS_BANDWIDTH

2012-10-29 02:42:41

by Xiaotian Feng

Subject: Re: [PATCH] V2 sched, autogroup: fix crash on reboot when autogroup is disabled

On Mon, Oct 29, 2012 at 3:19 AM, Mike Galbraith <[email protected]> wrote:
> On Sun, 2012-10-28 at 15:05 +0100, Ingo Molnar wrote:
>> * Mike Galbraith <[email protected]> wrote:
>>
>> > On Sun, 2012-10-28 at 14:19 +0100, Ingo Molnar wrote:
>> > > * Mike Galbraith <[email protected]> wrote:
>> > >
>> > > > On Sun, 2012-10-28 at 11:25 +0100, Ingo Molnar wrote:
>> > > > > * Mike Galbraith <[email protected]> wrote:
>> > > > >
>> > > >
>> > > > > > No knobs, no glitz, nada, just a cute little thing folks can turn
>> > > > > > on if they don't want to muck about with cgroups and/or systemd.
>> > > > >
>> > > > > Please also keep the Kconfig switch and reuse it to turn on
>> > > > > the 'autogroups' knob.
>> > > > >
>> > > > > That way people with existing .config's don't have to change
>> > > > > a thing to get this functionality.
>> > > >
>> > > > The Kconfig option is still there. The noautogroup ->
>> > > > autogroup arg change just makes it off by default (since an
>> > > > on/off switch would have to be a full move everybody thing
>> > > > post 8323f26ce race fix), so distros can make it available in
>> > > > their swiss army knife config, but it'll be out of the way
>> > > > unless specifically asked for by the user at boot.
>> > > >
>> > > > I can make it default 'on' by removing that arg change if you
>> > > > think that's the better way to go, but opt in at boot sounded
>> > > > better to me given there is no runtime on/off switch at all
>> > > > now.
>> > >
>> > > If I got your patch right then adding a command line option to
>> > > turn it on will disable it in essence for pretty much everyone
>> > > who has CONFIG_SCHED_AUTOGROUP=y in their .config today.
>> >
>> > With no user intervention, yes.
>>
>> 'No user intervention' is what happens with new kernel
>> commandline options, in 99.9999% of the cases.
>>
>> > > The patch should not change the defaults for existing
>> > > .config's.
>> > >
>> > > I.e. if autogroups was off, it should stay off, but if
>> > > autogroups was enabled in the .config and the kernel booted
>> > > with it enabled, then it should continue to do so in the
>> > > future as well.
>> > >
>> > > Adding a boot tweak and removing the runtime knobs is OK -
>> > > changing the current default functionality is not.
>> >
>> > Ok, I'll whack the arg change and respin.
>>
>> Thanks!
>>
>> I'd also suggest still exposing the state of autogroup in
>> /proc/sys, read-only, so that its status can be checked.
>
> Done. Autogroup remains default on with noautogroup opt out at boot
> time as before. Sysctl remains intact, read only. Knobs are gone.
>
> sched, autogroup: fix crash on reboot when autogroup is disabled
>
> Between 8323f26ce and 800d4d30, autogroup is a wreck. With both
> applied, all you have to do to crash a box is disable autogroup
> during boot up, then reboot.. boom, NULL pointer dereference due
> to 800d4d30 not allowing autogroup to move things, and 8323f26ce
> making that the only way to switch runqueues.
>
> Remove all of the knobs, since those that weren't already broken would
> be once autogroup is made exclusively on or off from boot, with off
> meaning autogroups are not created and so are unavailable to the proc
> interface.
>

I'm okay with removing the runtime enable/disable of autogroup. But can
we simply remove these two knobs, which have been exposed to user space
since 2.6.38? So we can no longer cat /proc/<pid>/autogroup or
echo <nice level> > /proc/<pid>/autogroup, even if autogroup is on?


> If the user fiddles with cgroups hereafter, tough, once tasks are
> moved, autogroup won't mess with them again unless they call setsid().
>
> No knobs, no glitz, nada, just a cute little thing folks can turn
> on if they don't want to muck about with cgroups and/or systemd.
>
> Signed-off-by: Mike Galbraith <[email protected]>
> Cc: [email protected] v3.6
>
> ---
> fs/proc/base.c | 78 ----------------------------------------------
> kernel/sched/auto_group.c | 68 ++++++----------------------------------
> kernel/sched/auto_group.h | 9 -----
> kernel/sysctl.c | 6 +--
> 4 files changed, 14 insertions(+), 147 deletions(-)
>
> --- a/fs/proc/base.c
> +++ b/fs/proc/base.c
> @@ -1165,81 +1165,6 @@ static const struct file_operations proc
>
> #endif
>
> -#ifdef CONFIG_SCHED_AUTOGROUP
> -/*
> - * Print out autogroup related information:
> - */
> -static int sched_autogroup_show(struct seq_file *m, void *v)
> -{
> - struct inode *inode = m->private;
> - struct task_struct *p;
> -
> - p = get_proc_task(inode);
> - if (!p)
> - return -ESRCH;
> - proc_sched_autogroup_show_task(p, m);
> -
> - put_task_struct(p);
> -
> - return 0;
> -}
> -
> -static ssize_t
> -sched_autogroup_write(struct file *file, const char __user *buf,
> - size_t count, loff_t *offset)
> -{
> - struct inode *inode = file->f_path.dentry->d_inode;
> - struct task_struct *p;
> - char buffer[PROC_NUMBUF];
> - int nice;
> - int err;
> -
> - memset(buffer, 0, sizeof(buffer));
> - if (count > sizeof(buffer) - 1)
> - count = sizeof(buffer) - 1;
> - if (copy_from_user(buffer, buf, count))
> - return -EFAULT;
> -
> - err = kstrtoint(strstrip(buffer), 0, &nice);
> - if (err < 0)
> - return err;
> -
> - p = get_proc_task(inode);
> - if (!p)
> - return -ESRCH;
> -
> - err = proc_sched_autogroup_set_nice(p, nice);
> - if (err)
> - count = err;
> -
> - put_task_struct(p);
> -
> - return count;
> -}
> -
> -static int sched_autogroup_open(struct inode *inode, struct file *filp)
> -{
> - int ret;
> -
> - ret = single_open(filp, sched_autogroup_show, NULL);
> - if (!ret) {
> - struct seq_file *m = filp->private_data;
> -
> - m->private = inode;
> - }
> - return ret;
> -}
> -
> -static const struct file_operations proc_pid_sched_autogroup_operations = {
> - .open = sched_autogroup_open,
> - .read = seq_read,
> - .write = sched_autogroup_write,
> - .llseek = seq_lseek,
> - .release = single_release,
> -};
> -
> -#endif /* CONFIG_SCHED_AUTOGROUP */
> -
> static ssize_t comm_write(struct file *file, const char __user *buf,
> size_t count, loff_t *offset)
> {
> @@ -2550,9 +2475,6 @@ static const struct pid_entry tgid_base_
> #ifdef CONFIG_SCHED_DEBUG
> REG("sched", S_IRUGO|S_IWUSR, proc_pid_sched_operations),
> #endif
> -#ifdef CONFIG_SCHED_AUTOGROUP
> - REG("autogroup", S_IRUGO|S_IWUSR, proc_pid_sched_autogroup_operations),
> -#endif
> REG("comm", S_IRUGO|S_IWUSR, proc_pid_set_comm_operations),
> #ifdef CONFIG_HAVE_ARCH_TRACEHOOK
> INF("syscall", S_IRUGO, proc_pid_syscall),
> --- a/kernel/sched/auto_group.c
> +++ b/kernel/sched/auto_group.c
> @@ -110,6 +110,9 @@ static inline struct autogroup *autogrou
>
> bool task_wants_autogroup(struct task_struct *p, struct task_group *tg)
> {
> + if (!sysctl_sched_autogroup_enabled)
> + return false;
> +
> if (tg != &root_task_group)
> return false;
>
> @@ -143,15 +146,11 @@ autogroup_move_group(struct task_struct
>
> p->signal->autogroup = autogroup_kref_get(ag);
>
> - if (!ACCESS_ONCE(sysctl_sched_autogroup_enabled))
> - goto out;
> -
> t = p;
> do {
> sched_move_task(t);
> } while_each_thread(p, t);
>
> -out:
> unlock_task_sighand(p, &flags);
> autogroup_kref_put(prev);
> }
> @@ -159,8 +158,11 @@ autogroup_move_group(struct task_struct
> /* Allocates GFP_KERNEL, cannot be called under any spinlock */
> void sched_autogroup_create_attach(struct task_struct *p)
> {
> - struct autogroup *ag = autogroup_create();
> + struct autogroup *ag;
>
> + if (!sysctl_sched_autogroup_enabled)
> + return;
> + ag = autogroup_create();
> autogroup_move_group(p, ag);
> /* drop extra reference added by autogroup_create() */
> autogroup_kref_put(ag);
> @@ -176,11 +178,15 @@ EXPORT_SYMBOL(sched_autogroup_detach);
>
> void sched_autogroup_fork(struct signal_struct *sig)
> {
> + if (!sysctl_sched_autogroup_enabled)
> + return;
> sig->autogroup = autogroup_task_get(current);
> }
>
> void sched_autogroup_exit(struct signal_struct *sig)
> {
> + if (!sysctl_sched_autogroup_enabled)
> + return;
> autogroup_kref_put(sig->autogroup);
> }
>
> @@ -193,58 +199,6 @@ static int __init setup_autogroup(char *
>
> __setup("noautogroup", setup_autogroup);
>
> -#ifdef CONFIG_PROC_FS
> -
> -int proc_sched_autogroup_set_nice(struct task_struct *p, int nice)
> -{
> - static unsigned long next = INITIAL_JIFFIES;
> - struct autogroup *ag;
> - int err;
> -
> - if (nice < -20 || nice > 19)
> - return -EINVAL;
> -
> - err = security_task_setnice(current, nice);
> - if (err)
> - return err;
> -
> - if (nice < 0 && !can_nice(current, nice))
> - return -EPERM;
> -
> - /* this is a heavy operation taking global locks.. */
> - if (!capable(CAP_SYS_ADMIN) && time_before(jiffies, next))
> - return -EAGAIN;
> -
> - next = HZ / 10 + jiffies;
> - ag = autogroup_task_get(p);
> -
> - down_write(&ag->lock);
> - err = sched_group_set_shares(ag->tg, prio_to_weight[nice + 20]);
> - if (!err)
> - ag->nice = nice;
> - up_write(&ag->lock);
> -
> - autogroup_kref_put(ag);
> -
> - return err;
> -}
> -
> -void proc_sched_autogroup_show_task(struct task_struct *p, struct seq_file *m)
> -{
> - struct autogroup *ag = autogroup_task_get(p);
> -
> - if (!task_group_is_autogroup(ag->tg))
> - goto out;
> -
> - down_read(&ag->lock);
> - seq_printf(m, "/autogroup-%ld nice %d\n", ag->id, ag->nice);
> - up_read(&ag->lock);
> -
> -out:
> - autogroup_kref_put(ag);
> -}
> -#endif /* CONFIG_PROC_FS */
> -
> #ifdef CONFIG_SCHED_DEBUG
> int autogroup_path(struct task_group *tg, char *buf, int buflen)
> {
> --- a/kernel/sched/auto_group.h
> +++ b/kernel/sched/auto_group.h
> @@ -4,11 +4,6 @@
> #include <linux/rwsem.h>
>
> struct autogroup {
> - /*
> - * reference doesn't mean how many thread attach to this
> - * autogroup now. It just stands for the number of task
> - * could use this autogroup.
> - */
> struct kref kref;
> struct task_group *tg;
> struct rw_semaphore lock;
> @@ -29,9 +24,7 @@ extern bool task_wants_autogroup(struct
> static inline struct task_group *
> autogroup_task_group(struct task_struct *p, struct task_group *tg)
> {
> - int enabled = ACCESS_ONCE(sysctl_sched_autogroup_enabled);
> -
> - if (enabled && task_wants_autogroup(p, tg))
> + if (task_wants_autogroup(p, tg))
> return p->signal->autogroup->tg;
>
> return tg;
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -367,10 +367,8 @@ static struct ctl_table kern_table[] = {
> .procname = "sched_autogroup_enabled",
> .data = &sysctl_sched_autogroup_enabled,
> .maxlen = sizeof(unsigned int),
> - .mode = 0644,
> - .proc_handler = proc_dointvec_minmax,
> - .extra1 = &zero,
> - .extra2 = &one,
> + .mode = 0444,
> + .proc_handler = proc_dointvec,
> },
> #endif
> #ifdef CONFIG_CFS_BANDWIDTH
>
>
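
For context on the per-process knob asked about in this message: the removed proc_sched_autogroup_set_nice() accepts a nice level from -20 to 19 and converts it into group shares through prio_to_weight[nice + 20]. A minimal standalone sketch of that mapping follows. The function name autogroup_nice_to_shares() is hypothetical, and the table is reproduced from the scheduler's prio_to_weight array as an assumption; it is not part of this patch.

```c
#include <errno.h>

/*
 * Nice-to-weight table, assumed to match the scheduler's prio_to_weight[]:
 * index 0 is nice -20, index 20 is nice 0, index 39 is nice 19.
 */
static const int prio_to_weight[40] = {
	/* -20 */ 88761, 71755, 56483, 46273, 36291,
	/* -15 */ 29154, 23254, 18705, 14949, 11916,
	/* -10 */  9548,  7620,  6100,  4904,  3906,
	/*  -5 */  3121,  2501,  1991,  1586,  1277,
	/*   0 */  1024,   820,   655,   526,   423,
	/*   5 */   335,   272,   215,   172,   137,
	/*  10 */   110,    87,    70,    56,    45,
	/*  15 */    36,    29,    23,    18,    15,
};

/*
 * Hypothetical helper mirroring only the range check and table lookup of
 * the removed knob. The permission checks, rate limiting and locking that
 * the kernel function performs are deliberately left out.
 * Returns 0 and stores the weight in *shares, or -EINVAL if out of range.
 */
int autogroup_nice_to_shares(int nice, int *shares)
{
	if (nice < -20 || nice > 19)
		return -EINVAL;

	*shares = prio_to_weight[nice + 20];
	return 0;
}
```

In this table each nice step changes the weight by roughly 25%, so writing a nice level to the knob rescaled the whole autogroup relative to its siblings rather than any single task.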

2012-10-29 12:09:48

by Mike Galbraith

Subject: Re: [PATCH] V2 sched, autogroup: fix crash on reboot when autogroup is disabled

On Mon, 2012-10-29 at 10:42 +0800, Xiaotian Feng wrote:

> I'm okay with removing the runtime enable/disable of autogroup. But can
> we simply remove these two knobs, which have been exposed to user space
> since 2.6.38? So we can no longer cat /proc/<pid>/autogroup or
> echo <nice level> > /proc/<pid>/autogroup, even if autogroup is on?

I should have never made those knobs, as it's against the whole idea of
automatic no muss no fuss group scheduling. If you want knobs, you want
full cgroups, so I think we should call this a fix opportunity.

If the knobs are seen as being cast in stone, I'll do the full move for
the on/off switch instead, and there will be no user visible change. I'd
rather not do that. This way, it's really out of the way when disabled,
smaller and simpler is better for this feature.

It's a dinky little sidelines feature, so let's just quietly make kaboom
go away, non-functional on/off go away, along with the silly knobs which
will make kaboom unless you jump through hoops just to keep them alive.

-Mike
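
The "kaboom" discussed in this thread can be modelled in user space. The sketch below is a toy model, not kernel code: struct group, struct task, move_group() and create_attach() are hypothetical stand-ins for the autogroup, the task, autogroup_move_group() and sched_autogroup_create_attach(). A released flag replaces the actual free so that the stale pointer can be inspected safely.

```c
#include <stdbool.h>

/* Toy stand-in for an autogroup and its task_group. */
struct group {
	int refs;	/* models the autogroup kref */
	bool released;	/* set instead of freeing, to keep the model safe */
};

/* Toy stand-in for a task. */
struct task {
	struct group *autogroup;	/* models p->signal->autogroup */
	struct group *sched_group;	/* models the cached p->sched_task_group */
};

/* Drop one reference; mark the group released when the last one goes. */
static void group_put(struct group *g)
{
	if (--g->refs == 0)
		g->released = true;
}

/*
 * Models autogroup_move_group(). With skip_move set it mimics the early
 * "goto out" taken when autogroup was disabled: the autogroup pointer is
 * switched, but the cached scheduling group is never refreshed.
 */
void move_group(struct task *p, struct group *ag, bool skip_move)
{
	struct group *prev = p->autogroup;

	ag->refs++;
	p->autogroup = ag;
	if (!skip_move)
		p->sched_group = ag;	/* models sched_move_task() */
	group_put(prev);
}

/*
 * Models the patched sched_autogroup_create_attach(): when autogroup is
 * disabled, return before creating or moving anything.
 */
bool create_attach(struct task *p, struct group *fresh, bool enabled)
{
	if (!enabled)
		return false;

	fresh->refs = 1;		/* reference held by the creator */
	fresh->released = false;
	move_group(p, fresh, false);
	group_put(fresh);		/* drop the creator's extra reference */
	return true;
}
```

Calling move_group() with skip_move set leaves sched_group pointing at a group whose last reference was just dropped, which corresponds to the crash. Checking the enabled flag before creating anything, as create_attach() does, means that path is never entered.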

2012-10-30 12:30:21

by Mike Galbraith

Subject: [tip:sched/urgent] sched/autogroup: Fix crash on reboot when autogroup is disabled

Commit-ID: 5258f386ea4e8454bc801fb443e8a4217da1947c
Gitweb: http://git.kernel.org/tip/5258f386ea4e8454bc801fb443e8a4217da1947c
Author: Mike Galbraith <[email protected]>
AuthorDate: Sun, 28 Oct 2012 12:19:23 -0700
Committer: Ingo Molnar <[email protected]>
CommitDate: Tue, 30 Oct 2012 10:26:04 +0100

sched/autogroup: Fix crash on reboot when autogroup is disabled

Due to these two commits:

8323f26ce342 sched: Fix race in task_group()
800d4d30c8f2 sched, autogroup: Stop going ahead if autogroup is disabled

... autogroup scheduling's dynamic knobs are wrecked.

With both patches applied, all you have to do to crash a box is
disable autogroup during boot up, then reboot.. boom, NULL pointer
dereference due to 800d4d30 not allowing autogroup to move things,
and 8323f26ce making that the only way to switch runqueues.

Remove most of the (dysfunctional) knobs and turn the remaining
sched_autogroup_enabled knob readonly.

If the user fiddles with cgroups hereafter, once tasks
are moved, autogroup won't mess with them again unless
they call setsid().

No knobs, no glitz, nada, just a cute little thing folks can
turn on if they don't want to muck about with cgroups and/or
systemd.

Signed-off-by: Mike Galbraith <[email protected]>
Cc: Xiaotian Feng <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Xiaotian Feng <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: <[email protected]> # v3.6
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
fs/proc/base.c | 78 ---------------------------------------------
kernel/sched/auto_group.c | 68 ++++++---------------------------------
kernel/sched/auto_group.h | 9 +-----
kernel/sysctl.c | 6 +--
4 files changed, 14 insertions(+), 147 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 1b6c84c..bb1d962 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -1271,81 +1271,6 @@ static const struct file_operations proc_pid_sched_operations = {

#endif

-#ifdef CONFIG_SCHED_AUTOGROUP
-/*
- * Print out autogroup related information:
- */
-static int sched_autogroup_show(struct seq_file *m, void *v)
-{
- struct inode *inode = m->private;
- struct task_struct *p;
-
- p = get_proc_task(inode);
- if (!p)
- return -ESRCH;
- proc_sched_autogroup_show_task(p, m);
-
- put_task_struct(p);
-
- return 0;
-}
-
-static ssize_t
-sched_autogroup_write(struct file *file, const char __user *buf,
- size_t count, loff_t *offset)
-{
- struct inode *inode = file->f_path.dentry->d_inode;
- struct task_struct *p;
- char buffer[PROC_NUMBUF];
- int nice;
- int err;
-
- memset(buffer, 0, sizeof(buffer));
- if (count > sizeof(buffer) - 1)
- count = sizeof(buffer) - 1;
- if (copy_from_user(buffer, buf, count))
- return -EFAULT;
-
- err = kstrtoint(strstrip(buffer), 0, &nice);
- if (err < 0)
- return err;
-
- p = get_proc_task(inode);
- if (!p)
- return -ESRCH;
-
- err = proc_sched_autogroup_set_nice(p, nice);
- if (err)
- count = err;
-
- put_task_struct(p);
-
- return count;
-}
-
-static int sched_autogroup_open(struct inode *inode, struct file *filp)
-{
- int ret;
-
- ret = single_open(filp, sched_autogroup_show, NULL);
- if (!ret) {
- struct seq_file *m = filp->private_data;
-
- m->private = inode;
- }
- return ret;
-}
-
-static const struct file_operations proc_pid_sched_autogroup_operations = {
- .open = sched_autogroup_open,
- .read = seq_read,
- .write = sched_autogroup_write,
- .llseek = seq_lseek,
- .release = single_release,
-};
-
-#endif /* CONFIG_SCHED_AUTOGROUP */
-
static ssize_t comm_write(struct file *file, const char __user *buf,
size_t count, loff_t *offset)
{
@@ -3036,9 +2961,6 @@ static const struct pid_entry tgid_base_stuff[] = {
#ifdef CONFIG_SCHED_DEBUG
REG("sched", S_IRUGO|S_IWUSR, proc_pid_sched_operations),
#endif
-#ifdef CONFIG_SCHED_AUTOGROUP
- REG("autogroup", S_IRUGO|S_IWUSR, proc_pid_sched_autogroup_operations),
-#endif
REG("comm", S_IRUGO|S_IWUSR, proc_pid_set_comm_operations),
#ifdef CONFIG_HAVE_ARCH_TRACEHOOK
INF("syscall", S_IRUGO, proc_pid_syscall),
diff --git a/kernel/sched/auto_group.c b/kernel/sched/auto_group.c
index 0984a21..0f1bacb 100644
--- a/kernel/sched/auto_group.c
+++ b/kernel/sched/auto_group.c
@@ -110,6 +110,9 @@ out_fail:

bool task_wants_autogroup(struct task_struct *p, struct task_group *tg)
{
+ if (!sysctl_sched_autogroup_enabled)
+ return false;
+
if (tg != &root_task_group)
return false;

@@ -143,15 +146,11 @@ autogroup_move_group(struct task_struct *p, struct autogroup *ag)

p->signal->autogroup = autogroup_kref_get(ag);

- if (!ACCESS_ONCE(sysctl_sched_autogroup_enabled))
- goto out;
-
t = p;
do {
sched_move_task(t);
} while_each_thread(p, t);

-out:
unlock_task_sighand(p, &flags);
autogroup_kref_put(prev);
}
@@ -159,8 +158,11 @@ out:
/* Allocates GFP_KERNEL, cannot be called under any spinlock */
void sched_autogroup_create_attach(struct task_struct *p)
{
- struct autogroup *ag = autogroup_create();
+ struct autogroup *ag;

+ if (!sysctl_sched_autogroup_enabled)
+ return;
+ ag = autogroup_create();
autogroup_move_group(p, ag);
/* drop extra reference added by autogroup_create() */
autogroup_kref_put(ag);
@@ -176,11 +178,15 @@ EXPORT_SYMBOL(sched_autogroup_detach);

void sched_autogroup_fork(struct signal_struct *sig)
{
+ if (!sysctl_sched_autogroup_enabled)
+ return;
sig->autogroup = autogroup_task_get(current);
}

void sched_autogroup_exit(struct signal_struct *sig)
{
+ if (!sysctl_sched_autogroup_enabled)
+ return;
autogroup_kref_put(sig->autogroup);
}

@@ -193,58 +199,6 @@ static int __init setup_autogroup(char *str)

__setup("noautogroup", setup_autogroup);

-#ifdef CONFIG_PROC_FS
-
-int proc_sched_autogroup_set_nice(struct task_struct *p, int nice)
-{
- static unsigned long next = INITIAL_JIFFIES;
- struct autogroup *ag;
- int err;
-
- if (nice < -20 || nice > 19)
- return -EINVAL;
-
- err = security_task_setnice(current, nice);
- if (err)
- return err;
-
- if (nice < 0 && !can_nice(current, nice))
- return -EPERM;
-
- /* this is a heavy operation taking global locks.. */
- if (!capable(CAP_SYS_ADMIN) && time_before(jiffies, next))
- return -EAGAIN;
-
- next = HZ / 10 + jiffies;
- ag = autogroup_task_get(p);
-
- down_write(&ag->lock);
- err = sched_group_set_shares(ag->tg, prio_to_weight[nice + 20]);
- if (!err)
- ag->nice = nice;
- up_write(&ag->lock);
-
- autogroup_kref_put(ag);
-
- return err;
-}
-
-void proc_sched_autogroup_show_task(struct task_struct *p, struct seq_file *m)
-{
- struct autogroup *ag = autogroup_task_get(p);
-
- if (!task_group_is_autogroup(ag->tg))
- goto out;
-
- down_read(&ag->lock);
- seq_printf(m, "/autogroup-%ld nice %d\n", ag->id, ag->nice);
- up_read(&ag->lock);
-
-out:
- autogroup_kref_put(ag);
-}
-#endif /* CONFIG_PROC_FS */
-
#ifdef CONFIG_SCHED_DEBUG
int autogroup_path(struct task_group *tg, char *buf, int buflen)
{
diff --git a/kernel/sched/auto_group.h b/kernel/sched/auto_group.h
index 8bd0471..4552c6b 100644
--- a/kernel/sched/auto_group.h
+++ b/kernel/sched/auto_group.h
@@ -4,11 +4,6 @@
#include <linux/rwsem.h>

struct autogroup {
- /*
- * reference doesn't mean how many thread attach to this
- * autogroup now. It just stands for the number of task
- * could use this autogroup.
- */
struct kref kref;
struct task_group *tg;
struct rw_semaphore lock;
@@ -29,9 +24,7 @@ extern bool task_wants_autogroup(struct task_struct *p, struct task_group *tg);
static inline struct task_group *
autogroup_task_group(struct task_struct *p, struct task_group *tg)
{
- int enabled = ACCESS_ONCE(sysctl_sched_autogroup_enabled);
-
- if (enabled && task_wants_autogroup(p, tg))
+ if (task_wants_autogroup(p, tg))
return p->signal->autogroup->tg;

return tg;
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 81c7b1a..2914d0f 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -363,10 +363,8 @@ static struct ctl_table kern_table[] = {
.procname = "sched_autogroup_enabled",
.data = &sysctl_sched_autogroup_enabled,
.maxlen = sizeof(unsigned int),
- .mode = 0644,
- .proc_handler = proc_dointvec_minmax,
- .extra1 = &zero,
- .extra2 = &one,
+ .mode = 0444,
+ .proc_handler = proc_dointvec,
},
#endif
#ifdef CONFIG_CFS_BANDWIDTH