2014-01-15 08:27:39

by Daniel Lezcano

[permalink] [raw]
Subject: [BUG] [ tip/sched/core ] System unresponsive after booting


Hi all,

I use the tip/sched/core branch.

After git pulling yesterday, my host is unresponsive after booting the OS.

* It boots normally
* It sends info to the console
* The graphics does not work
* The terminals show the prompt, I can enter the username but after
pressing enter, it does not give the password prompt
* sysrq works more or less, I can't get the process stack but it
receives the command

It is like no new process can be created.

I have a dual Xeon processor E5325 (2 x 4 cores).

After git bisecting, the following patch seems to introduce the bug.

commit d50dde5a10f305253cbc3855307f608f8a3c5f73
Author: Dario Faggioli <[email protected]>
Date: Thu Nov 7 14:43:36 2013 +0100

sched: Add new scheduler syscalls to support an extended scheduling
parameters ABI

Add the syscalls needed for supporting scheduling algorithms
with extended scheduling parameters (e.g., SCHED_DEADLINE).


[ ... ]


Signed-off-by: Dario Faggioli <[email protected]>
[ Rewrote to use sched_attr. ]
Signed-off-by: Juri Lelli <[email protected]>
[ Removed sched_setscheduler2() for now. ]
Signed-off-by: Peter Zijlstra <[email protected]>
Link:
http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>

--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog


2014-01-15 09:22:56

by Ingo Molnar

[permalink] [raw]
Subject: Re: [BUG] [ tip/sched/core ] System unresponsive after booting


* Daniel Lezcano <[email protected]> wrote:

>
> Hi all,
>
> I use the tip/sched/core branch.
>
> After git pulling yesterday, my host is unresponsive after booting the OS.
>
> * It boots normally
> * It sends info to the console
> * The graphics does not work
> * The terminals show the prompt, I can enter the username but after
> pressing enter, it does not give the password prompt
> * sysrq works more or less, I can't get the process stack but it
> receives the command
>
> It is like no new process can be created.
>
> I have a dual Xeon processor E5325 (2 x 4 cores).
>
> After git bisecting, the following patch seems to introduce the bug.
>
> commit d50dde5a10f305253cbc3855307f608f8a3c5f73
> Author: Dario Faggioli <[email protected]>
> Date: Thu Nov 7 14:43:36 2013 +0100
>
> sched: Add new scheduler syscalls to support an extended
> scheduling parameters ABI
>
> Add the syscalls needed for supporting scheduling algorithms
> with extended scheduling parameters (e.g., SCHED_DEADLINE).
>
>
> [ ... ]
>
>
> Signed-off-by: Dario Faggioli <[email protected]>
> [ Rewrote to use sched_attr. ]
> Signed-off-by: Juri Lelli <[email protected]>
> [ Removed sched_setscheduler2() for now. ]
> Signed-off-by: Peter Zijlstra <[email protected]>
> Link: http://lkml.kernel.org/r/[email protected]
> Signed-off-by: Ingo Molnar <[email protected]>

I checked this patch again, and noticed a few oddities:

1)

There's this change to __setscheduler():

-__setscheduler(struct rq *rq, struct task_struct *p, int policy, int prio)
+/* Actually do priority change: must hold pi & rq lock. */
+static void __setscheduler(struct rq *rq, struct task_struct *p,
+ const struct sched_attr *attr)
{
+ int policy = attr->sched_policy;
+
p->policy = policy;
- p->rt_priority = prio;
+
+ if (rt_policy(policy))
+ p->rt_priority = attr->sched_priority;
+ else
+ p->static_prio = NICE_TO_PRIO(attr->sched_nice);
+


doesnt this change keep p->rt_priority uninitialized in the
normalize_task() case?

I.e. rt_priority should still be set unconditionally. In the
SCHED_NORMAL case that will be a zero initialization.

2)

It's not clear why this change to __setscheduler() was done:

/*
* Allow unprivileged RT tasks to decrease priority:
*/
if (user && !capable(CAP_SYS_NICE)) {
+ if (fair_policy(policy)) {
+ if (!can_nice(p, attr->sched_nice))
+ return -EPERM;
+ }
+
if (rt_policy(policy)) {


3)

On ARM:

-#define __NR_syscalls (380)
+#define __NR_syscalls (384)

but:

#define __NR_finit_module (__NR_SYSCALL_BASE+379)
+#define __NR_sched_setattr (__NR_SYSCALL_BASE+380)
+#define __NR_sched_getattr (__NR_SYSCALL_BASE+381)

/*


Why is the syscall table increased by 4, while we only add 2 new
syscalls?

4)

In hindsight this patch should have been 3 patches or so:

- one that just mechanically extends __setscheduler() with an 'attr'
way to pass parameters

- one that adds whatever other desired changes to __setscheduler(),
with an explanation.

- one that adds the new syscalls.

Which would ease the debugging of such bugs.

Thanks,

Ingo

2014-01-15 09:32:46

by Michael wang

[permalink] [raw]
Subject: Re: [BUG] [ tip/sched/core ] System unresponsive after booting

Hi, Daniel

On 01/15/2014 04:27 PM, Daniel Lezcano wrote:
[snip]
> commit d50dde5a10f305253cbc3855307f608f8a3c5f73
> Author: Dario Faggioli <[email protected]>
> Date: Thu Nov 7 14:43:36 2013 +0100
>
> sched: Add new scheduler syscalls to support an extended scheduling
> parameters ABI
>
> Add the syscalls needed for supporting scheduling algorithms
> with extended scheduling parameters (e.g., SCHED_DEADLINE).

Will this do any helps?

Regards,
Michael Wang

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 0326c06..bf4a6ed 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3464,6 +3464,10 @@ int sched_setscheduler(struct task_struct *p, int
policy,
.sched_policy = policy,
.sched_priority = param->sched_priority
};
+
+ if (fair_policy(policy))
+ attr.sched_nice = PRIO_TO_NICE(attr.sched_priority);
+
return __sched_setscheduler(p, &attr, true);
}
EXPORT_SYMBOL_GPL(sched_setscheduler);
@@ -3494,6 +3498,10 @@ int sched_setscheduler_nocheck(struct task_struct
*p, int policy,
.sched_policy = policy,
.sched_priority = param->sched_priority
};
+
+ if (fair_policy(policy))
+ attr.sched_nice = PRIO_TO_NICE(attr.sched_priority);
+
return __sched_setscheduler(p, &attr, false);
}



>
>
> [ ... ]
>
>
> Signed-off-by: Dario Faggioli <[email protected]>
> [ Rewrote to use sched_attr. ]
> Signed-off-by: Juri Lelli <[email protected]>
> [ Removed sched_setscheduler2() for now. ]
> Signed-off-by: Peter Zijlstra <[email protected]>
> Link:
> http://lkml.kernel.org/r/[email protected]
>
> Signed-off-by: Ingo Molnar <[email protected]>
>

2014-01-15 10:40:13

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [BUG] [ tip/sched/core ] System unresponsive after booting

On Wed, Jan 15, 2014 at 10:22:45AM +0100, Ingo Molnar wrote:
> 3)
>
> On ARM:
>
> -#define __NR_syscalls (380)
> +#define __NR_syscalls (384)
>
> but:
>
> #define __NR_finit_module (__NR_SYSCALL_BASE+379)
> +#define __NR_sched_setattr (__NR_SYSCALL_BASE+380)
> +#define __NR_sched_getattr (__NR_SYSCALL_BASE+381)
>
> /*
>
>
> Why is the syscall table increased by 4, while we only add 2 new
> syscalls?

arch/arm/kernel/entry-common.S:.ifne NR_syscalls - __NR_syscalls
arch/arm/kernel/calls.S:.equ syscalls_padding, ((NR_syscalls + 3) & ~3) - NR_syscalls

Totally confusing, I know.

2014-01-15 11:00:45

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [BUG] [ tip/sched/core ] System unresponsive after booting

On Wed, Jan 15, 2014 at 10:22:45AM +0100, Ingo Molnar wrote:
> 1)
>
> There's this change to __setscheduler():
>
> -__setscheduler(struct rq *rq, struct task_struct *p, int policy, int prio)
> +/* Actually do priority change: must hold pi & rq lock. */
> +static void __setscheduler(struct rq *rq, struct task_struct *p,
> + const struct sched_attr *attr)
> {
> + int policy = attr->sched_policy;
> +
> p->policy = policy;
> - p->rt_priority = prio;
> +
> + if (rt_policy(policy))
> + p->rt_priority = attr->sched_priority;
> + else
> + p->static_prio = NICE_TO_PRIO(attr->sched_nice);
> +
>
>
> doesnt this change keep p->rt_priority uninitialized in the
> normalize_task() case?
>
> I.e. rt_priority should still be set unconditionally. In the
> SCHED_NORMAL case that will be a zero initialization.

But why? SCHED_NORMAL doesn't care about rt_priority afaict. And once it
gets priority boosted we'll set the rt_priority again to whatever is
propagated from the pi-chain.

> 2)
>
> It's not clear why this change to __setscheduler() was done:
>
> /*
> * Allow unprivileged RT tasks to decrease priority:
> */
> if (user && !capable(CAP_SYS_NICE)) {
> + if (fair_policy(policy)) {
> + if (!can_nice(p, attr->sched_nice))
> + return -EPERM;
> + }
> +
> if (rt_policy(policy)) {o

Bah, I'm pretty sure I wrote that :/ And I can't for the life of me
remember why I did that. Complete fail there.

2014-01-15 11:30:38

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [BUG] [ tip/sched/core ] System unresponsive after booting

On Wed, Jan 15, 2014 at 05:25:55PM +0800, Michael wang wrote:
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 0326c06..bf4a6ed 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3464,6 +3464,10 @@ int sched_setscheduler(struct task_struct *p, int
> policy,
> .sched_policy = policy,
> .sched_priority = param->sched_priority
> };
> +
> + if (fair_policy(policy))
> + attr.sched_nice = PRIO_TO_NICE(attr.sched_priority);
> +
> return __sched_setscheduler(p, &attr, true);
> }
> EXPORT_SYMBOL_GPL(sched_setscheduler);
> @@ -3494,6 +3498,10 @@ int sched_setscheduler_nocheck(struct task_struct
> *p, int policy,
> .sched_policy = policy,
> .sched_priority = param->sched_priority
> };
> +
> + if (fair_policy(policy))
> + attr.sched_nice = PRIO_TO_NICE(attr.sched_priority);
> +
> return __sched_setscheduler(p, &attr, false);
> }

That seems wrong; the manpage confirms, from sched_setscheduler(2):

For processes scheduled under one of the normal scheduling
policies (SCHED_OTHER, SCHED_IDLE, SCHED_BATCH), sched_priority
is not used in scheduling decisions (it must be speci‐ fied as
0).

That said; we should probably preserve the nice value, which with the
current code we reset to 0.

---
kernel/sched/core.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 0326c06953eb..f3abbb7dae62 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3462,7 +3462,8 @@ int sched_setscheduler(struct task_struct *p, int policy,
{
struct sched_attr attr = {
.sched_policy = policy,
- .sched_priority = param->sched_priority
+ .sched_priority = param->sched_priority,
+ .sched_nice = PRIO_TO_NICE(p->static_prio),
};
return __sched_setscheduler(p, &attr, true);
}
@@ -3492,7 +3493,8 @@ int sched_setscheduler_nocheck(struct task_struct *p, int policy,
{
struct sched_attr attr = {
.sched_policy = policy,
- .sched_priority = param->sched_priority
+ .sched_priority = param->sched_priority,
+ .sched_nice = PRIO_TO_NICE(p->static_prio),
};
return __sched_setscheduler(p, &attr, false);
}

2014-01-15 12:04:40

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [BUG] [ tip/sched/core ] System unresponsive after booting

On Wed, Jan 15, 2014 at 09:27:34AM +0100, Daniel Lezcano wrote:
>
> Hi all,
>
> I use the tip/sched/core branch.
>
> After git pulling yesterday, my host is unresponsive after booting the OS.
>
> * It boots normally
> * It sends info to the console
> * The graphics does not work
> * The terminals show the prompt, I can enter the username but after
> pressing enter, it does not give the password prompt
> * sysrq works more or less, I can't get the process stack but it receives
> the command
>
> It is like no new process can be created.
>
> I have a dual Xeon processor E5325 (2 x 4 cores).
>
> After git bisecting, the following patch seems to introduce the bug.
>
> commit d50dde5a10f305253cbc3855307f608f8a3c5f73

OK, so my headless WSM-EP boots just fine. Obviously it cannot confirm
if graphics works, but I can ssh in and work on it without bother.

I can even log in on the serial console without problems.

I tried both tip/master and tip/sched/core.

Would you happen to have a .config for me to try?

2014-01-15 12:24:59

by Ingo Molnar

[permalink] [raw]
Subject: Re: [BUG] [ tip/sched/core ] System unresponsive after booting


* Peter Zijlstra <[email protected]> wrote:

> On Wed, Jan 15, 2014 at 09:27:34AM +0100, Daniel Lezcano wrote:
> >
> > Hi all,
> >
> > I use the tip/sched/core branch.
> >
> > After git pulling yesterday, my host is unresponsive after booting the OS.
> >
> > * It boots normally
> > * It sends info to the console
> > * The graphics does not work
> > * The terminals show the prompt, I can enter the username but after
> > pressing enter, it does not give the password prompt
> > * sysrq works more or less, I can't get the process stack but it receives
> > the command
> >
> > It is like no new process can be created.
> >
> > I have a dual Xeon processor E5325 (2 x 4 cores).
> >
> > After git bisecting, the following patch seems to introduce the bug.
> >
> > commit d50dde5a10f305253cbc3855307f608f8a3c5f73
>
> OK, so my headless WSM-EP boots just fine. Obviously it cannot confirm
> if graphics works, but I can ssh in and work on it without bother.
>
> I can even log in on the serial console without problems.
>
> I tried both tip/master and tip/sched/core.

I too booted various sytems and have yet to notice similar oddity.

Thanks,

Ingo

2014-01-15 13:09:51

by Daniel Lezcano

[permalink] [raw]
Subject: Re: [BUG] [ tip/sched/core ] System unresponsive after booting

On 01/15/2014 01:04 PM, Peter Zijlstra wrote:
> On Wed, Jan 15, 2014 at 09:27:34AM +0100, Daniel Lezcano wrote:
>>
>> Hi all,
>>
>> I use the tip/sched/core branch.
>>
>> After git pulling yesterday, my host is unresponsive after booting the OS.
>>
>> * It boots normally
>> * It sends info to the console
>> * The graphics does not work
>> * The terminals show the prompt, I can enter the username but after
>> pressing enter, it does not give the password prompt
>> * sysrq works more or less, I can't get the process stack but it receives
>> the command
>>
>> It is like no new process can be created.
>>
>> I have a dual Xeon processor E5325 (2 x 4 cores).
>>
>> After git bisecting, the following patch seems to introduce the bug.
>>
>> commit d50dde5a10f305253cbc3855307f608f8a3c5f73
>
> OK, so my headless WSM-EP boots just fine. Obviously it cannot confirm
> if graphics works, but I can ssh in and work on it without bother.
>
> I can even log in on the serial console without problems.
>
> I tried both tip/master and tip/sched/core.
>
> Would you happen to have a .config for me to try?

Yes, sure.

http://pastebin.com/Hj4nEZHc


--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

2014-01-15 13:27:34

by Daniel Lezcano

[permalink] [raw]
Subject: Re: [BUG] [ tip/sched/core ] System unresponsive after booting

On 01/15/2014 10:25 AM, Michael wang wrote:
> Hi, Daniel
>
> On 01/15/2014 04:27 PM, Daniel Lezcano wrote:
> [snip]
>> commit d50dde5a10f305253cbc3855307f608f8a3c5f73
>> Author: Dario Faggioli <[email protected]>
>> Date: Thu Nov 7 14:43:36 2013 +0100
>>
>> sched: Add new scheduler syscalls to support an extended scheduling
>> parameters ABI
>>
>> Add the syscalls needed for supporting scheduling algorithms
>> with extended scheduling parameters (e.g., SCHED_DEADLINE).
>
> Will this do any helps?

Thanks for the patch but unfortunately it leads to a NULL pointer
dereference.

BUG: unable to handle kernel NULL pointer dereference at 00000216
IP: [<c129ede0>] cfq_prio_tree_add+0x50/0xc0
*pde = 00000000


modem-managOops: 0000 [#1] PREEMPT SMP
CPU: 3 PID: 2348 Comm: rs:main Q:Reg Tainted: G W
3.13.0-rc7-00171-g130816c-dirty #436
Hardware name: /S2696, BIOS 6.00 11/07/2007
task: f525bf00 ti: f2294000 task.ti: f2294000
er[2344]: <info>EIP: 0060:[<c129ede0>] EFLAGS: 00010002 CPU: 3
Loaded plugin EIP is at cfq_prio_tree_add+0x50/0xc0
EAX: 00000202 EBX: f59eda18 ECX: 00000000 EDX: f515bfec
ESI: 00000000 EDI: 002ce530 EBP: f2295b38 ESP: f2295b2c
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
CR0: 80050033 CR2: 00000216 CR3: 3218c000 CR4: 00000790
Stack:
f59eda18Option High-Spee f4cdf3b0 f511c000 f2295b78 c12a0b8b 00000010
00000001 f2295b6c
d

modem-ma c12a7477 00000000 f2304600 00000000 00000001
f4c08000nager[2344]: <info> Loaded plugin Wavecom
00000000 00000000
f4cdf3b0
modem-manager[2 f59eda18 00000000 f2295bd0 c12a14fb344]:
<info> Lo f2295b94 00000001aded plugin Huaw 00000000
Call Trace:
[<c12a0b8b>] cfq_add_rq_rb+0x8b/0x190
[<c12a7477>] ? radix_tree_insert+0x77/0x210
ei

modem-m [<c12a14fb>] cfq_insert_request+0x8b/0x6c0
anager[2344]: <i [<c12864f5>] __elv_add_request+0x175/0x320
nfo> Loaded plu [<c128d026>] blk_queue_bio+0x226/0x310
gin AnyData
[<c128b303>] generic_make_request+0xa3/0xd0

[<c128b381>] submit_bio+0x51/0x100
[<c118797e>] ? ext4_da_get_block_prep+0x8e/0x640
[<c1127ae0>] ? end_buffer_read_nobh+0x10/0x10
[<c112bc6b>] ? bio_alloc_bioset+0x7b/0x170
[<c1128ed7>] _submit_bh+0x1a7/0x200
[<c1128f3f>] submit_bh+0xf/0x20
[<c1128fab>] ll_rw_block+0x5b/0xa0
[<c112a265>] __block_write_begin+0x1a5/0x320
[<c118bc59>] ext4_da_write_begin+0x109/0x300
[<c11878f0>] ? ext4_readpage+0xc0/0xc0
[<c11b364f>] ? __ext4_journal_stop+0x5f/0x90
[<c105e595>] ? preempt_count_sub+0x45/0x50
[<c10ccc4c>] generic_file_buffered_write+0xdc/0x220
[<c10ccfaa>] __generic_file_aio_write+0x21a/0x470
[<c1769e55>] ? _raw_spin_unlock+0x15/0x30
[<c10cd25d>] generic_file_aio_write+0x5d/0xa0
[<c1182f8d>] ext4_file_write+0xbd/0x450
[<c110562d>] ? final_putname+0x1d/0x40
[<c110815f>] ? user_path_at_empty+0x4f/0x80
[<c12ae229>] ? _copy_to_user+0x39/0x50
[<c105e595>] ? preempt_count_sub+0x45/0x50
[<c10fd560>] do_sync_write+0x60/0xa0
[<c10fd759>] vfs_write+0x99/0x190
[<c10fd500>] ? vfs_read+0x150/0x150
[<c1115fd6>] ? fget_light+0x76/0xe0
[<c10fd947>] SyS_write+0x57/0xa0
[<c176a690>] syscall_call+0x7/0xb
Code: 66 83 7b 70 03 74 46 8b 43 30 85 c0 74 3f 0f b7 53 6e 31 c9 8d 54
96 3c 89 53 28 8b 78 3c 8b 70 40 8b 02 85 c0 74 3f 8d 74 26 00 <8b> 50
14 8b 4a 3c 8b 52 40 39 d6 72 53 77 21 39 cf 77 1d 39 d6
EIP: [<c129ede0>] cfq_prio_tree_add+0x50/0xc0 SS:ESP 0068:f2295b2c
CR2: 0000000000000216
---[ end trace c743276118479205 ]---


>
> Regards,
> Michael Wang
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 0326c06..bf4a6ed 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3464,6 +3464,10 @@ int sched_setscheduler(struct task_struct *p, int
> policy,
> .sched_policy = policy,
> .sched_priority = param->sched_priority
> };
> +
> + if (fair_policy(policy))
> + attr.sched_nice = PRIO_TO_NICE(attr.sched_priority);
> +
> return __sched_setscheduler(p, &attr, true);
> }
> EXPORT_SYMBOL_GPL(sched_setscheduler);
> @@ -3494,6 +3498,10 @@ int sched_setscheduler_nocheck(struct task_struct
> *p, int policy,
> .sched_policy = policy,
> .sched_priority = param->sched_priority
> };
> +
> + if (fair_policy(policy))
> + attr.sched_nice = PRIO_TO_NICE(attr.sched_priority);
> +
> return __sched_setscheduler(p, &attr, false);
> }
>
>
>
>>
>>
>> [ ... ]
>>
>>
>> Signed-off-by: Dario Faggioli <[email protected]>
>> [ Rewrote to use sched_attr. ]
>> Signed-off-by: Juri Lelli <[email protected]>
>> [ Removed sched_setscheduler2() for now. ]
>> Signed-off-by: Peter Zijlstra <[email protected]>
>> Link:
>> http://lkml.kernel.org/r/[email protected]
>>
>> Signed-off-by: Ingo Molnar <[email protected]>
>>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>


--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

2014-01-15 13:28:10

by Daniel Lezcano

[permalink] [raw]
Subject: Re: [BUG] [ tip/sched/core ] System unresponsive after booting

On 01/15/2014 12:30 PM, Peter Zijlstra wrote:
> On Wed, Jan 15, 2014 at 05:25:55PM +0800, Michael wang wrote:
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index 0326c06..bf4a6ed 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -3464,6 +3464,10 @@ int sched_setscheduler(struct task_struct *p, int
>> policy,
>> .sched_policy = policy,
>> .sched_priority = param->sched_priority
>> };
>> +
>> + if (fair_policy(policy))
>> + attr.sched_nice = PRIO_TO_NICE(attr.sched_priority);
>> +
>> return __sched_setscheduler(p, &attr, true);
>> }
>> EXPORT_SYMBOL_GPL(sched_setscheduler);
>> @@ -3494,6 +3498,10 @@ int sched_setscheduler_nocheck(struct task_struct
>> *p, int policy,
>> .sched_policy = policy,
>> .sched_priority = param->sched_priority
>> };
>> +
>> + if (fair_policy(policy))
>> + attr.sched_nice = PRIO_TO_NICE(attr.sched_priority);
>> +
>> return __sched_setscheduler(p, &attr, false);
>> }
>
> That seems wrong; the manpage confirms, from sched_setscheduler(2):
>
> For processes scheduled under one of the normal scheduling
> policies (SCHED_OTHER, SCHED_IDLE, SCHED_BATCH), sched_priority
> is not used in scheduling decisions (it must be speci‐ fied as
> 0).
>
> That said; we should probably preserve the nice value, which with the
> current code we reset to 0.

I tried this patch also but it does not solve the issue.


> ---
> kernel/sched/core.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 0326c06953eb..f3abbb7dae62 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3462,7 +3462,8 @@ int sched_setscheduler(struct task_struct *p, int policy,
> {
> struct sched_attr attr = {
> .sched_policy = policy,
> - .sched_priority = param->sched_priority
> + .sched_priority = param->sched_priority,
> + .sched_nice = PRIO_TO_NICE(p->static_prio),
> };
> return __sched_setscheduler(p, &attr, true);
> }
> @@ -3492,7 +3493,8 @@ int sched_setscheduler_nocheck(struct task_struct *p, int policy,
> {
> struct sched_attr attr = {
> .sched_policy = policy,
> - .sched_priority = param->sched_priority
> + .sched_priority = param->sched_priority,
> + .sched_nice = PRIO_TO_NICE(p->static_prio),
> };
> return __sched_setscheduler(p, &attr, false);
> }
>


--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

2014-01-15 13:45:16

by Daniel Lezcano

[permalink] [raw]
Subject: Re: [BUG] [ tip/sched/core ] System unresponsive after booting

On 01/15/2014 01:24 PM, Ingo Molnar wrote:
>
> * Peter Zijlstra <[email protected]> wrote:
>
>> On Wed, Jan 15, 2014 at 09:27:34AM +0100, Daniel Lezcano wrote:
>>>
>>> Hi all,
>>>
>>> I use the tip/sched/core branch.
>>>
>>> After git pulling yesterday, my host is unresponsive after booting the OS.
>>>
>>> * It boots normally
>>> * It sends info to the console
>>> * The graphics does not work
>>> * The terminals show the prompt, I can enter the username but after
>>> pressing enter, it does not give the password prompt
>>> * sysrq works more or less, I can't get the process stack but it receives
>>> the command
>>>
>>> It is like no new process can be created.
>>>
>>> I have a dual Xeon processor E5325 (2 x 4 cores).
>>>
>>> After git bisecting, the following patch seems to introduce the bug.
>>>
>>> commit d50dde5a10f305253cbc3855307f608f8a3c5f73
>>
>> OK, so my headless WSM-EP boots just fine. Obviously it cannot confirm
>> if graphics works, but I can ssh in and work on it without bother.
>>
>> I can even log in on the serial console without problems.
>>
>> I tried both tip/master and tip/sched/core.
>
> I too booted various sytems and have yet to notice similar oddity.

Ok, I was able to boot and use the system with init=/bin/bash and a
busybox but the ubuntu distro ends up to block after booting. So I guess
there a service triggering something in the kernel leading to this issue.


--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

2014-01-15 14:05:47

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [BUG] [ tip/sched/core ] System unresponsive after booting

On Wed, Jan 15, 2014 at 12:00:18PM +0100, Peter Zijlstra wrote:
> On Wed, Jan 15, 2014 at 10:22:45AM +0100, Ingo Molnar wrote:
> > 2)
> >
> > It's not clear why this change to __setscheduler() was done:
> >
> > /*
> > * Allow unprivileged RT tasks to decrease priority:
> > */
> > if (user && !capable(CAP_SYS_NICE)) {
> > + if (fair_policy(policy)) {
> > + if (!can_nice(p, attr->sched_nice))
> > + return -EPERM;
> > + }
> > +
> > if (rt_policy(policy)) {o
>
> Bah, I'm pretty sure I wrote that :/ And I can't for the life of me
> remember why I did that. Complete fail there.

Ah, I remember, its because we can now set nice through this path as
well, so we have to do permission checks.

The existing 'nice' syscalls have their own permission checks and do not
user __sched_setscheduler() at all.

Subject: [tip:sched/core] sched: Preserve the nice level over sched_setscheduler() and sched_setparam() calls

Commit-ID: e3de300d1212b42aa9d0d6031b12fca06ac00dd9
Gitweb: http://git.kernel.org/tip/e3de300d1212b42aa9d0d6031b12fca06ac00dd9
Author: Peter Zijlstra <[email protected]>
AuthorDate: Wed, 15 Jan 2014 12:30:15 +0100
Committer: Ingo Molnar <[email protected]>
CommitDate: Thu, 16 Jan 2014 09:27:14 +0100

sched: Preserve the nice level over sched_setscheduler() and sched_setparam() calls

Previously sched_setscheduler() and sched_setparam() would not affect
the nice value of a task, restore this behaviour.

Signed-off-by: Peter Zijlstra <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: Michael wang <[email protected]>
Cc: Daniel Lezcano <[email protected]>
Fixes: d50dde5a10f3 ("sched: Add new scheduler syscalls to support an extended scheduling parameters ABI")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
kernel/sched/core.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 26af370..c1b3d7e 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3451,7 +3451,8 @@ int sched_setscheduler(struct task_struct *p, int policy,
{
struct sched_attr attr = {
.sched_policy = policy,
- .sched_priority = param->sched_priority
+ .sched_priority = param->sched_priority,
+ .sched_nice = PRIO_TO_NICE(p->static_prio),
};
return __sched_setscheduler(p, &attr, true);
}
@@ -3481,7 +3482,8 @@ int sched_setscheduler_nocheck(struct task_struct *p, int policy,
{
struct sched_attr attr = {
.sched_policy = policy,
- .sched_priority = param->sched_priority
+ .sched_priority = param->sched_priority,
+ .sched_nice = PRIO_TO_NICE(p->static_prio),
};
return __sched_setscheduler(p, &attr, false);
}

2014-01-16 13:48:56

by Daniel Lezcano

[permalink] [raw]
Subject: Re: [BUG] [ tip/sched/core ] System unresponsive after booting

On 01/15/2014 01:04 PM, Peter Zijlstra wrote:
> On Wed, Jan 15, 2014 at 09:27:34AM +0100, Daniel Lezcano wrote:
>>
>> Hi all,
>>
>> I use the tip/sched/core branch.
>>
>> After git pulling yesterday, my host is unresponsive after booting the OS.
>>
>> * It boots normally
>> * It sends info to the console
>> * The graphics does not work
>> * The terminals show the prompt, I can enter the username but after
>> pressing enter, it does not give the password prompt
>> * sysrq works more or less, I can't get the process stack but it receives
>> the command
>>
>> It is like no new process can be created.
>>
>> I have a dual Xeon processor E5325 (2 x 4 cores).
>>
>> After git bisecting, the following patch seems to introduce the bug.
>>
>> commit d50dde5a10f305253cbc3855307f608f8a3c5f73
>
> OK, so my headless WSM-EP boots just fine. Obviously it cannot confirm
> if graphics works, but I can ssh in and work on it without bother.
>
> I can even log in on the serial console without problems.
>
> I tried both tip/master and tip/sched/core.
>
> Would you happen to have a .config for me to try?

I was able to reduce the scope and reproduce the issue.

AFAICT, that happens with rsyslogd. When login in a tty, the login
command sends a message through /dev/log. But rsyslogd is never woken up
and blocked in poll_schedule_timeout. The login process is blocked in
unix_wait_for_peer.

I can strace rsyslogd at startup. The two last sched_setscheduler calls
fail.

> grep sched trace.out

3570 sched_getparam(3570, { 0 }) = 0
3570 sched_getscheduler(3570) = 0 (SCHED_OTHER)
3570 sched_get_priority_min(SCHED_OTHER) = 0
3570 sched_get_priority_max(SCHED_OTHER) = 0
3571 sched_get_priority_min(SCHED_OTHER) = 0
3571 sched_get_priority_max(SCHED_OTHER) = 0
3571 sched_get_priority_min(SCHED_OTHER) = 0
3571 sched_get_priority_max(SCHED_OTHER) = 0
3571 sched_setscheduler(3572, SCHED_OTHER, { 0 } <unfinished ...>
3571 <... sched_setscheduler resumed> ) = 0
3571 sched_get_priority_min(SCHED_OTHER <unfinished ...>
3571 <... sched_get_priority_min resumed> ) = 0
3571 sched_get_priority_max(SCHED_OTHER <unfinished ...>
3571 <... sched_get_priority_max resumed> ) = 0
3571 sched_setscheduler(3573, SCHED_OTHER, { 0 } <unfinished ...>
3571 <... sched_setscheduler resumed> ) = -1 EPERM (Operation not
permitted)
3571 sched_get_priority_min(SCHED_OTHER <unfinished ...>
3571 <... sched_get_priority_min resumed> ) = 0
3571 sched_get_priority_max(SCHED_OTHER <unfinished ...>
3571 <... sched_get_priority_max resumed> ) = 0
3571 sched_setscheduler(3574, SCHED_OTHER, { 0 } <unfinished ...>
3571 <... sched_setscheduler resumed> ) = -1 EPERM (Operation not
permitted)

The same strace but on a kernel which does not hang. The calls to
sched_setscheduler do not fail.

3292 sched_getparam(3292, { 0 }) = 0
3292 sched_getscheduler(3292) = 0 (SCHED_OTHER)
3292 sched_get_priority_min(SCHED_OTHER) = 0
3292 sched_get_priority_max(SCHED_OTHER) = 0
3293 sched_get_priority_min(SCHED_OTHER) = 0
3293 sched_get_priority_max(SCHED_OTHER) = 0
3293 sched_get_priority_min(SCHED_OTHER) = 0
3293 sched_get_priority_max(SCHED_OTHER) = 0
3293 sched_setscheduler(3294, SCHED_OTHER, { 0 } <unfinished ...>
3293 <... sched_setscheduler resumed> ) = 0
3293 sched_get_priority_min(SCHED_OTHER <unfinished ...>
3293 <... sched_get_priority_min resumed> ) = 0
3293 sched_get_priority_max(SCHED_OTHER <unfinished ...>
3293 <... sched_get_priority_max resumed> ) = 0
3293 sched_setscheduler(3295, SCHED_OTHER, { 0 } <unfinished ...>
3293 <... sched_setscheduler resumed> ) = 0
3293 sched_get_priority_min(SCHED_OTHER <unfinished ...>
3293 <... sched_get_priority_min resumed> ) = 0
3293 sched_get_priority_max(SCHED_OTHER <unfinished ...>
3293 <... sched_get_priority_max resumed> ) = 0
3293 sched_setscheduler(3296, SCHED_OTHER, { 0 } <unfinished ...>
3293 <... sched_setscheduler resumed> ) = 0

The EPERM error comes from kernel/sched/core.c:3303

...
if (fair_policy(policy)) {
if (!can_nice(p, attr->sched_nice))
return -EPERM;
}
...


But I don't know why this is leading to block a process or making
rsyslogd being not woken up by a packet coming in the af_unix socket.

I hope that helps

-- Daniel


--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

2014-01-16 14:18:09

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [BUG] [ tip/sched/core ] System unresponsive after booting

On Thu, Jan 16, 2014 at 02:48:51PM +0100, Daniel Lezcano wrote:
> 3570 sched_getparam(3570, { 0 }) = 0
> 3570 sched_getscheduler(3570) = 0 (SCHED_OTHER)
> 3570 sched_get_priority_min(SCHED_OTHER) = 0
> 3570 sched_get_priority_max(SCHED_OTHER) = 0
> 3571 sched_get_priority_min(SCHED_OTHER) = 0
> 3571 sched_get_priority_max(SCHED_OTHER) = 0
> 3571 sched_get_priority_min(SCHED_OTHER) = 0
> 3571 sched_get_priority_max(SCHED_OTHER) = 0
> 3571 sched_setscheduler(3572, SCHED_OTHER, { 0 } <unfinished ...>
> 3571 <... sched_setscheduler resumed> ) = 0
> 3571 sched_get_priority_min(SCHED_OTHER <unfinished ...>
> 3571 <... sched_get_priority_min resumed> ) = 0
> 3571 sched_get_priority_max(SCHED_OTHER <unfinished ...>
> 3571 <... sched_get_priority_max resumed> ) = 0
> 3571 sched_setscheduler(3573, SCHED_OTHER, { 0 } <unfinished ...>
> 3571 <... sched_setscheduler resumed> ) = -1 EPERM (Operation not
> permitted)
> 3571 sched_get_priority_min(SCHED_OTHER <unfinished ...>
> 3571 <... sched_get_priority_min resumed> ) = 0
> 3571 sched_get_priority_max(SCHED_OTHER <unfinished ...>
> 3571 <... sched_get_priority_max resumed> ) = 0
> 3571 sched_setscheduler(3574, SCHED_OTHER, { 0 } <unfinished ...>
> 3571 <... sched_setscheduler resumed> ) = -1 EPERM (Operation not
> permitted)
>
> The same strace but on a kernel which does not hang. The calls to
> sched_setscheduler do not fail.
>
> 3292 sched_getparam(3292, { 0 }) = 0
> 3292 sched_getscheduler(3292) = 0 (SCHED_OTHER)
> 3292 sched_get_priority_min(SCHED_OTHER) = 0
> 3292 sched_get_priority_max(SCHED_OTHER) = 0
> 3293 sched_get_priority_min(SCHED_OTHER) = 0
> 3293 sched_get_priority_max(SCHED_OTHER) = 0
> 3293 sched_get_priority_min(SCHED_OTHER) = 0
> 3293 sched_get_priority_max(SCHED_OTHER) = 0
> 3293 sched_setscheduler(3294, SCHED_OTHER, { 0 } <unfinished ...>
> 3293 <... sched_setscheduler resumed> ) = 0
> 3293 sched_get_priority_min(SCHED_OTHER <unfinished ...>
> 3293 <... sched_get_priority_min resumed> ) = 0
> 3293 sched_get_priority_max(SCHED_OTHER <unfinished ...>
> 3293 <... sched_get_priority_max resumed> ) = 0
> 3293 sched_setscheduler(3295, SCHED_OTHER, { 0 } <unfinished ...>
> 3293 <... sched_setscheduler resumed> ) = 0
> 3293 sched_get_priority_min(SCHED_OTHER <unfinished ...>
> 3293 <... sched_get_priority_min resumed> ) = 0
> 3293 sched_get_priority_max(SCHED_OTHER <unfinished ...>
> 3293 <... sched_get_priority_max resumed> ) = 0
> 3293 sched_setscheduler(3296, SCHED_OTHER, { 0 } <unfinished ...>
> 3293 <... sched_setscheduler resumed> ) = 0
>
> The EPERM error comes from kernel/sched/core.c:3303
>
> ...
> if (fair_policy(policy)) {
> if (!can_nice(p, attr->sched_nice))
> return -EPERM;
> }
> ...
>
>
> But I don't know why this is leading to block a process or making rsyslogd
> being not woken up by a packet coming in the af_unix socket.

Could you test with a fresh tip/master, Ingo just pushed out a stack of
fixes, in particularly:

e3de300d1212b ("sched: Preserve the nice level over sched_setscheduler() and sched_setparam() calls")
39fd8fd22b322 ("sched: Fix up scheduler syscall LTP fails")

Could have affected things.

Meanwhile I'll try and better read what the above says.

Thanks!

2014-01-16 14:20:36

by Daniel Lezcano

[permalink] [raw]
Subject: Re: [BUG] [ tip/sched/core ] System unresponsive after booting

On 01/16/2014 03:17 PM, Peter Zijlstra wrote:
> On Thu, Jan 16, 2014 at 02:48:51PM +0100, Daniel Lezcano wrote:
>> 3570 sched_getparam(3570, { 0 }) = 0
>> 3570 sched_getscheduler(3570) = 0 (SCHED_OTHER)
>> 3570 sched_get_priority_min(SCHED_OTHER) = 0
>> 3570 sched_get_priority_max(SCHED_OTHER) = 0
>> 3571 sched_get_priority_min(SCHED_OTHER) = 0
>> 3571 sched_get_priority_max(SCHED_OTHER) = 0
>> 3571 sched_get_priority_min(SCHED_OTHER) = 0
>> 3571 sched_get_priority_max(SCHED_OTHER) = 0
>> 3571 sched_setscheduler(3572, SCHED_OTHER, { 0 } <unfinished ...>
>> 3571 <... sched_setscheduler resumed> ) = 0
>> 3571 sched_get_priority_min(SCHED_OTHER <unfinished ...>
>> 3571 <... sched_get_priority_min resumed> ) = 0
>> 3571 sched_get_priority_max(SCHED_OTHER <unfinished ...>
>> 3571 <... sched_get_priority_max resumed> ) = 0
>> 3571 sched_setscheduler(3573, SCHED_OTHER, { 0 } <unfinished ...>
>> 3571 <... sched_setscheduler resumed> ) = -1 EPERM (Operation not
>> permitted)
>> 3571 sched_get_priority_min(SCHED_OTHER <unfinished ...>
>> 3571 <... sched_get_priority_min resumed> ) = 0
>> 3571 sched_get_priority_max(SCHED_OTHER <unfinished ...>
>> 3571 <... sched_get_priority_max resumed> ) = 0
>> 3571 sched_setscheduler(3574, SCHED_OTHER, { 0 } <unfinished ...>
>> 3571 <... sched_setscheduler resumed> ) = -1 EPERM (Operation not
>> permitted)
>>
>> The same strace but on a kernel which does not hang. The calls to
>> sched_setscheduler do not fail.
>>
>> 3292 sched_getparam(3292, { 0 }) = 0
>> 3292 sched_getscheduler(3292) = 0 (SCHED_OTHER)
>> 3292 sched_get_priority_min(SCHED_OTHER) = 0
>> 3292 sched_get_priority_max(SCHED_OTHER) = 0
>> 3293 sched_get_priority_min(SCHED_OTHER) = 0
>> 3293 sched_get_priority_max(SCHED_OTHER) = 0
>> 3293 sched_get_priority_min(SCHED_OTHER) = 0
>> 3293 sched_get_priority_max(SCHED_OTHER) = 0
>> 3293 sched_setscheduler(3294, SCHED_OTHER, { 0 } <unfinished ...>
>> 3293 <... sched_setscheduler resumed> ) = 0
>> 3293 sched_get_priority_min(SCHED_OTHER <unfinished ...>
>> 3293 <... sched_get_priority_min resumed> ) = 0
>> 3293 sched_get_priority_max(SCHED_OTHER <unfinished ...>
>> 3293 <... sched_get_priority_max resumed> ) = 0
>> 3293 sched_setscheduler(3295, SCHED_OTHER, { 0 } <unfinished ...>
>> 3293 <... sched_setscheduler resumed> ) = 0
>> 3293 sched_get_priority_min(SCHED_OTHER <unfinished ...>
>> 3293 <... sched_get_priority_min resumed> ) = 0
>> 3293 sched_get_priority_max(SCHED_OTHER <unfinished ...>
>> 3293 <... sched_get_priority_max resumed> ) = 0
>> 3293 sched_setscheduler(3296, SCHED_OTHER, { 0 } <unfinished ...>
>> 3293 <... sched_setscheduler resumed> ) = 0
>>
>> The EPERM error comes from kernel/sched/core.c:3303
>>
>> ...
>> if (fair_policy(policy)) {
>> if (!can_nice(p, attr->sched_nice))
>> return -EPERM;
>> }
>> ...
>>
>>
>> But I don't know why this is leading to block a process or making rsyslogd
>> being not woken up by a packet coming in the af_unix socket.
>
> Could you test with a fresh tip/master, Ingo just pushed out a stack of
> fixes, in particularly:
>
> e3de300d1212b ("sched: Preserve the nice level over sched_setscheduler() and sched_setparam() calls")
> 39fd8fd22b322 ("sched: Fix up scheduler syscall LTP fails")
>
> Could have affected things.
>
> Meanwhile I'll try and better read what the above says.

Already tested. The last commits do not change the issue described above.


--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

2014-01-16 14:26:08

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [BUG] [ tip/sched/core ] System unresponsive after booting

On Thu, Jan 16, 2014 at 03:20:20PM +0100, Daniel Lezcano wrote:
> Already tested. The last commits do not change the issue described above.

Bummer, ok, what version of Ubuntu/rsyslogd are you using? I have a
laptop around that should have some recent ubuntu on it, let me go find
it.

2014-01-16 14:30:41

by Daniel Lezcano

[permalink] [raw]
Subject: Re: [BUG] [ tip/sched/core ] System unresponsive after booting

On 01/16/2014 03:25 PM, Peter Zijlstra wrote:
> On Thu, Jan 16, 2014 at 03:20:20PM +0100, Daniel Lezcano wrote:
>> Already tested. The last commits do not change the issue described above.
>
> Bummer, ok, what version of Ubuntu/rsyslogd are you using? I have a
> laptop around that should have some recent ubuntu on it, let me go find
> it.

It is a Ubuntu 12.04 and rsyslog 5.8.6-1ubuntu8.4



--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

2014-01-16 15:28:11

by Daniel Lezcano

[permalink] [raw]
Subject: Re: [BUG] [ tip/sched/core ] System unresponsive after booting

On 01/16/2014 03:25 PM, Peter Zijlstra wrote:
> On Thu, Jan 16, 2014 at 03:20:20PM +0100, Daniel Lezcano wrote:
>> Already tested. The last commits do not change the issue described above.
>
> Bummer, ok, what version of Ubuntu/rsyslogd are you using? I have a
> laptop around that should have some recent ubuntu on it, let me go find
> it.
>

Removing the lines below fix the issue.

These changes were introduced by the commit
d50dde5a10f305253cbc3855307f608f8a3c5f73.

I don't get the connection between what is supposed to do the patch as
described in the commit log and these four lines added.

diff --git a/arch/arm/include/asm/unistd.h b/arch/arm/include/asm/unistd.h
index 141baa3..acabef1 100644
--- a/arch/arm/include/asm/unistd.h
+++ b/arch/arm/include/asm/unistd.h
@@ -15,7 +15,7 @@

#include <uapi/asm/unistd.h>
dlezcano@mai:~/Work/src/cpuidle-next (sched/idle-balance)$ git diff
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 93a2836..673edff 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3295,10 +3295,6 @@ recheck:
* Allow unprivileged RT tasks to decrease priority:
*/
if (user && !capable(CAP_SYS_NICE)) {
- if (fair_policy(policy)) {
- if (!can_nice(p, attr->sched_nice))
- return -EPERM;
- }

if (rt_policy(policy)) {
unsigned long rlim_rtprio =



--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

2014-01-16 15:42:48

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [BUG] [ tip/sched/core ] System unresponsive after booting

On Thu, Jan 16, 2014 at 03:30:34PM +0100, Daniel Lezcano wrote:
> On 01/16/2014 03:25 PM, Peter Zijlstra wrote:
> >On Thu, Jan 16, 2014 at 03:20:20PM +0100, Daniel Lezcano wrote:
> >>Already tested. The last commits do not change the issue described above.
> >
> >Bummer, ok, what version of Ubuntu/rsyslogd are you using? I have a
> >laptop around that should have some recent ubuntu on it, let me go find
> >it.
>
> It is a Ubuntu 12.04 and rsyslog 5.8.6-1ubuntu8.4

My laptop has 12.10, I've not yet tried to reproduce but while staring
at the code I noticed a difference with the other can_nice() tests.

Could you try the below while I build a pristine tip/master for the
laptop?

---
kernel/sched/core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 0326c06953eb..e4a3f7e3d613 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3298,7 +3298,7 @@ static int __sched_setscheduler(struct task_struct *p,
*/
if (user && !capable(CAP_SYS_NICE)) {
if (fair_policy(policy)) {
- if (!can_nice(p, attr->sched_nice))
+ if (attr->sched_nice < TASK_NICE(p) && !can_nice(p, attr->sched_nice))
return -EPERM;
}

2014-01-16 15:48:11

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [BUG] [ tip/sched/core ] System unresponsive after booting

On Thu, Jan 16, 2014 at 04:28:05PM +0100, Daniel Lezcano wrote:
> On 01/16/2014 03:25 PM, Peter Zijlstra wrote:
> >On Thu, Jan 16, 2014 at 03:20:20PM +0100, Daniel Lezcano wrote:
> >>Already tested. The last commits do not change the issue described above.
> >
> >Bummer, ok, what version of Ubuntu/rsyslogd are you using? I have a
> >laptop around that should have some recent ubuntu on it, let me go find
> >it.
> >
>
> Removing the lines below fix the issue.
>
> These changes were introduced by the commit
> d50dde5a10f305253cbc3855307f608f8a3c5f73.
>
> I don't get the connection between what is supposed to do the patch as
> described in the commit log and these four lines added.

The interface adds:

sched_attr::sched_nice

And extends __sched_setscheduler() to also set nice values as provided
by the sched_attr, therefore we must check to see if we've got
permission to change the nice value.

2014-01-16 15:50:51

by Daniel Lezcano

[permalink] [raw]
Subject: Re: [BUG] [ tip/sched/core ] System unresponsive after booting

On 01/16/2014 04:42 PM, Peter Zijlstra wrote:
> On Thu, Jan 16, 2014 at 03:30:34PM +0100, Daniel Lezcano wrote:
>> On 01/16/2014 03:25 PM, Peter Zijlstra wrote:
>>> On Thu, Jan 16, 2014 at 03:20:20PM +0100, Daniel Lezcano wrote:
>>>> Already tested. The last commits do not change the issue described above.
>>>
>>> Bummer, ok, what version of Ubuntu/rsyslogd are you using? I have a
>>> laptop around that should have some recent ubuntu on it, let me go find
>>> it.
>>
>> It is a Ubuntu 12.04 and rsyslog 5.8.6-1ubuntu8.4
>
> My laptop has 12.10, I've not yet tried to reproduce but while staring
> at the code I noticed a difference with the other can_nice() tests.
>
> Could you try the below while I build a pristine tip/master for the
> laptop?

Yes, it fixes the issue.

> ---
> kernel/sched/core.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 0326c06953eb..e4a3f7e3d613 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3298,7 +3298,7 @@ static int __sched_setscheduler(struct task_struct *p,
> */
> if (user && !capable(CAP_SYS_NICE)) {
> if (fair_policy(policy)) {
> - if (!can_nice(p, attr->sched_nice))
> + if (attr->sched_nice < TASK_NICE(p) && !can_nice(p, attr->sched_nice))
> return -EPERM;
> }
>
>


--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

2014-01-16 15:51:38

by Daniel Lezcano

[permalink] [raw]
Subject: Re: [BUG] [ tip/sched/core ] System unresponsive after booting

On 01/16/2014 04:48 PM, Peter Zijlstra wrote:
> On Thu, Jan 16, 2014 at 04:28:05PM +0100, Daniel Lezcano wrote:
>> On 01/16/2014 03:25 PM, Peter Zijlstra wrote:
>>> On Thu, Jan 16, 2014 at 03:20:20PM +0100, Daniel Lezcano wrote:
>>>> Already tested. The last commits do not change the issue described above.
>>>
>>> Bummer, ok, what version of Ubuntu/rsyslogd are you using? I have a
>>> laptop around that should have some recent ubuntu on it, let me go find
>>> it.
>>>
>>
>> Removing the lines below fix the issue.
>>
>> These changes were introduced by the commit
>> d50dde5a10f305253cbc3855307f608f8a3c5f73.
>>
>> I don't get the connection between what is supposed to do the patch as
>> described in the commit log and these four lines added.
>
> The interface adds:
>
> sched_attr::sched_nice
>
> And extends __sched_setscheduler() to also set nice values as provided
> by the sched_attr, therefore we must check to see if we've got
> permission to change the nice value.

Ah, ok.

Thanks for the clarification.


--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

2014-01-16 16:54:34

by Peter Zijlstra

[permalink] [raw]
Subject: [PATCH] sched: Fix __sched_setscheduler() nice test

Subject: sched: Fix __sched_setscheduler() nice test

With the introduction of sched_attr::sched_nice we need to check if
we've got permission to actually change the nice value.

Daniel found that can_nice() would always fail; and upon inspection it
turns out that can_nice() only tests to see if we can lower the nice
value, but it doesn't validate if we're lowering or not.

Therefore amend the test to only call can_nice() when we lower the nice
value.

Cc: [email protected]
Cc: [email protected]
Fixes: d50dde5a10f3 ("sched: Add new scheduler syscalls to support an extended scheduling parameters ABI")
Reported-and-Tested-by: Daniel Lezcano <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
---
kernel/sched/core.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 93a2836b6220..36c951b7eef8 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3296,7 +3296,8 @@ static int __sched_setscheduler(struct task_struct *p,
*/
if (user && !capable(CAP_SYS_NICE)) {
if (fair_policy(policy)) {
- if (!can_nice(p, attr->sched_nice))
+ if (attr->sched_nice < TASK_NICE(p) &&
+ !can_nice(p, attr->sched_nice))
return -EPERM;
}

2014-01-16 18:33:17

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] sched: Fix __sched_setscheduler() nice test


So I tried to make can_nice() do what I thought it did; but I'm not sure
the result is worth the effort.

Also, binder seems to use can_nice() -- but I really couldn't be arsed
to look at it.

---

Index: linux-2.6/kernel/sched/auto_group.c
===================================================================
--- linux-2.6.orig/kernel/sched/auto_group.c
+++ linux-2.6/kernel/sched/auto_group.c
@@ -210,7 +210,7 @@ int proc_sched_autogroup_set_nice(struct
if (err)
return err;

- if (nice < 0 && !can_nice(current, nice))
+ if (!can_nice(current, nice))
return -EPERM;

/* this is a heavy operation taking global locks.. */
Index: linux-2.6/kernel/sched/core.c
===================================================================
--- linux-2.6.orig/kernel/sched/core.c
+++ linux-2.6/kernel/sched/core.c
@@ -3017,17 +3017,22 @@ void set_user_nice(struct task_struct *p
EXPORT_SYMBOL(set_user_nice);

/*
- * can_nice - check if a task can reduce its nice value
+ * can_nice - check if a task can change its nice value
* @p: task
* @nice: nice value
*/
-int can_nice(const struct task_struct *p, const int nice)
+bool can_nice(const struct task_struct *p, const int nice)
{
/* convert nice value [19,-20] to rlimit style value [1,40] */
int nice_rlim = 20 - nice;

- return (nice_rlim <= task_rlimit(p, RLIMIT_NICE) ||
- capable(CAP_SYS_NICE));
+ if (nice >= TASK_NICE(p))
+ return true;
+
+ if (nice_rlim <= task_rlimit(p, RLIMIT_NICE) || capable(CAP_SYS_NICE))
+ return true;
+
+ return false;
}

#ifdef __ARCH_WANT_SYS_NICE
@@ -3059,7 +3064,7 @@ SYSCALL_DEFINE1(nice, int, increment)
if (nice > 19)
nice = 19;

- if (increment < 0 && !can_nice(current, nice))
+ if (!can_nice(current, nice))
return -EPERM;

retval = security_task_setnice(current, nice);
@@ -3296,8 +3301,7 @@ static int __sched_setscheduler(struct t
*/
if (user && !capable(CAP_SYS_NICE)) {
if (fair_policy(policy)) {
- if (attr->sched_nice < TASK_NICE(p) &&
- !can_nice(p, attr->sched_nice))
+ if (!can_nice(p, attr->sched_nice))
return -EPERM;
}

@@ -3320,8 +3324,18 @@ static int __sched_setscheduler(struct t
* SCHED_NORMAL if the RLIMIT_NICE would normally permit it.
*/
if (p->policy == SCHED_IDLE && policy != SCHED_IDLE) {
- if (!can_nice(p, TASK_NICE(p)))
- return -EPERM;
+ int static_prio = p->static_prio;
+ int err = 0;
+
+ p->static_prio = NICE_TO_PRIO(20);
+
+ if (!can_nice(p, PRIO_TO_NICE(static_prio)))
+ err = -EPERM;
+
+ p->static_prio = static_prio;
+
+ if (err)
+ return err;
}

/* can't change other user's priorities */
Index: linux-2.6/kernel/sys.c
===================================================================
--- linux-2.6.orig/kernel/sys.c
+++ linux-2.6/kernel/sys.c
@@ -144,7 +144,7 @@ static int set_one_prio(struct task_stru
error = -EPERM;
goto out;
}
- if (niceval < task_nice(p) && !can_nice(p, niceval)) {
+ if (!can_nice(p, niceval)) {
error = -EACCES;
goto out;
}

Subject: [tip:sched/core] sched: Fix __sched_setscheduler() nice test

Commit-ID: eaad45132c564ce377e6dce05e78e08e456d5315
Gitweb: http://git.kernel.org/tip/eaad45132c564ce377e6dce05e78e08e456d5315
Author: Peter Zijlstra <[email protected]>
AuthorDate: Thu, 16 Jan 2014 17:54:25 +0100
Committer: Ingo Molnar <[email protected]>
CommitDate: Thu, 16 Jan 2014 18:07:08 +0100

sched: Fix __sched_setscheduler() nice test

With the introduction of sched_attr::sched_nice we need to check
if we've got permission to actually change the nice value.

Daniel found that can_nice() would always fail; and upon
inspection it turns out that can_nice() only tests to see if we
can lower the nice value, but it doesn't validate if we're
lowering or not.

Therefore amend the test to only call can_nice() when we lower
the nice value.

Reported-and-Tested-by: Daniel Lezcano <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: Daniel Lezcano <[email protected]>
Fixes: d50dde5a10f3 ("sched: Add new scheduler syscalls to support an extended scheduling parameters ABI")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
kernel/sched/core.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 93a2836..36c951b 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3296,7 +3296,8 @@ recheck:
*/
if (user && !capable(CAP_SYS_NICE)) {
if (fair_policy(policy)) {
- if (!can_nice(p, attr->sched_nice))
+ if (attr->sched_nice < TASK_NICE(p) &&
+ !can_nice(p, attr->sched_nice))
return -EPERM;
}