2007-12-04 05:08:22

by Linus Torvalds

Subject: Linux 2.6.24-rc4


We should have one week between -rc releases, but I was gone for a week
over Thanksgiving (as were some other kernel developers), so this one is a
bit late. It's been almost the rule rather than the exception, but I
promise I'll be better...

Anyway, there aren't a lot of exciting changes here, but there's still a
_lot_ more churn than I really hoped for at the -rc4 stage. Blackfin, MIPS
and Power do stand out in the diffstats, but ARM and x86 got some updates
too.

And we had some ACPI churn (processor throttling etc), along with various
driver updates: ATA, IDE, InfiniBand, SCSI, USB and network drivers. And
on the filesystem side, cifs, NFS, ocfs2 and proc. Ugh. Too much.

In fact, the diff from -rc3 is almost 36,000 lines, and that's the smaller
git one with the renames shown as renames (not the ones I upload as
patches to kernel.org - those are done so that people with GNU patch and
other legacy patch programs can use the diffs). I'll blame the two-week
window for some of it, but even so, this is a bit disheartening. I'm
really hoping that we're slowing down and -rc5 won't be anywhere near that
large.

That said, none of the changes are really _exciting_ or really scary. And
we should have fixed a number of regressions, although more certainly
remain.

Linus


2007-12-04 10:24:01

by Kamalesh Babulal

Subject: [build failure] Re: Linux 2.6.24-rc4 on S390x

Hi,

The patch ctc: make use of alloc_netdev() (commit 1c1478859017452a1179dbbdf7b9eb5b48438746)
introduces the build failure

CC [M] drivers/s390/net/fsm.o
CC [M] drivers/s390/net/smsgiucv.o
CC [M] drivers/s390/net/ctcmain.o
drivers/s390/net/ctcmain.c: In function `ctc_init_netdevice':
drivers/s390/net/ctcmain.c:2805: error: implicit declaration of function `SET_MODULE_OWNER'
make[2]: *** [drivers/s390/net/ctcmain.o] Error 1
make[1]: *** [drivers/s390/net] Error 2
make: *** [drivers/s390] Error 2

--
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.

2007-12-04 10:31:45

by Martin Schwidefsky

Subject: Re: [build failure] Re: Linux 2.6.24-rc4 on S390x

On Tue, 2007-12-04 at 15:53 +0530, Kamalesh Babulal wrote:
> The patch ctc: make use of alloc_netdev() (commit 1c1478859017452a1179dbbdf7b9eb5b48438746)
> introduces the build failure
>
> CC [M] drivers/s390/net/fsm.o
> CC [M] drivers/s390/net/smsgiucv.o
> CC [M] drivers/s390/net/ctcmain.o
> drivers/s390/net/ctcmain.c: In function `ctc_init_netdevice':
> drivers/s390/net/ctcmain.c:2805: error: implicit declaration of function `SET_MODULE_OWNER'
> make[2]: *** [drivers/s390/net/ctcmain.o] Error 1
> make[1]: *** [drivers/s390/net] Error 2
> make: *** [drivers/s390] Error 2

Hi Uschi,
that last patch reverted commit 10d024c1b2fd58af8362670d7d6e5ae52fc33353.
That needs to get re-added.

--
blue skies,
Martin.

"Reality continues to ruin my life." - Calvin.

2007-12-04 10:33:13

by Ingo Molnar

Subject: Re: [build failure] Re: Linux 2.6.24-rc4 on S390x


* Kamalesh Babulal <[email protected]> wrote:

> The patch ctc: make use of alloc_netdev() (commit
> 1c1478859017452a1179dbbdf7b9eb5b48438746) introduces the build failure
>
> CC [M] drivers/s390/net/fsm.o
> CC [M] drivers/s390/net/smsgiucv.o
> CC [M] drivers/s390/net/ctcmain.o
> drivers/s390/net/ctcmain.c: In function `ctc_init_netdevice':
> drivers/s390/net/ctcmain.c:2805: error: implicit declaration of function `SET_MODULE_OWNER'
> make[2]: *** [drivers/s390/net/ctcmain.o] Error 1
> make[1]: *** [drivers/s390/net] Error 2
> make: *** [drivers/s390] Error 2

the patch below should fix this.

Ingo

------------>
Subject: drivers/s390/net/ctcmain.c: fix build bug
From: Ingo Molnar <[email protected]>

SET_MODULE_OWNER() is obsolete.

Signed-off-by: Ingo Molnar <[email protected]>
---
drivers/s390/net/ctcmain.c | 1 -
1 file changed, 1 deletion(-)

Index: linux/drivers/s390/net/ctcmain.c
===================================================================
--- linux.orig/drivers/s390/net/ctcmain.c
+++ linux/drivers/s390/net/ctcmain.c
@@ -2802,7 +2802,6 @@ void ctc_init_netdevice(struct net_devic
dev->type = ARPHRD_SLIP;
dev->tx_queue_len = 100;
dev->flags = IFF_POINTOPOINT | IFF_NOARP;
- SET_MODULE_OWNER(dev);
}

2007-12-04 13:22:18

by Nicolas Pitre

Subject: Re: Linux 2.6.24-rc4

On Mon, 3 Dec 2007, Linus Torvalds wrote:

> That said, none of the changes are really _exciting_ or really scary. And
> we should have fixed a number of regressions, although more certainly
> remain.

Any reason for this:

mode change 100644 => 100755 drivers/net/chelsio/cxgb2.c
mode change 100644 => 100755 drivers/net/chelsio/pm3393.c
mode change 100644 => 100755 drivers/net/chelsio/sge.c
mode change 100644 => 100755 drivers/net/chelsio/sge.h


Nicolas

Subject: [local DoS] Re: Linux 2.6.24-rc4

Em Mon, 3 Dec 2007 21:08:12 -0800 (PST)
Linus Torvalds <[email protected]> escreveu:

| That said, none of the changes are really _exciting_ or really scary. And
| we should have fixed a number of regressions, although more certainly
| remain.

A Mandriva user reported this bug last week. Run the following program
as a normal user.

"""
#include <stdio.h>
#include <sched.h>

int main(void)
{
	sched_rr_get_interval(1, NULL);
	return 0;
}
"""

You should get the following OOPS and the machine will hang.

"""
divide error: 0000 [#1] SMP
Modules linked in: af_packet snd_seq_dummy snd_seq_oss snd_seq_midi_event ipv6 snd_seq snd_pcm_oss snd_mie

Pid: 4202, comm: unhide Not tainted (2.6.24-desktop-0.rc3.2mdv #1)
EIP: 0060:[<c01276cb>] EFLAGS: 00010046 CPU: 0
EIP is at sched_slice+0x3b/0x60
EAX: 00000004 EBX: c4b40000 ECX: 00000004 EDX: 00000000
ESI: 00000000 EDI: 00000000 EBP: d7d2bf84 ESP: d7d2bf78
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process unhide (pid: 4202, ti=d7d2a000 task=d7d29580 task.ti=d7d2a000)
Stack: c140a0a0 df98a000 00000000 d7d2bfb0 c012cc1e d7d2bfb8 d7c019f4 00000064
0804a898 00000000 00000286 00000001 b7f3acc0 00000000 d7d2a000 c010830e
00000001 bffdc2b8 b7f0cff4 b7f3acc0 00000000 bffdc2c8 000000a1 0000007b
Call Trace:
[<c01094ca>] show_trace_log_lvl+0x1a/0x30
[<c010958b>] show_stack_log_lvl+0xab/0xd0
[<c010966d>] show_registers+0xbd/0x1c0
[<c0109894>] die+0x124/0x250
[<c0109a51>] do_trap+0x91/0xc0
[<c0109f55>] do_divide_error+0x85/0x90
[<c033bc6a>] error_code+0x72/0x78
[<c012cc1e>] sys_sched_rr_get_interval+0x7e/0xf0
[<c010830e>] sysenter_past_esp+0x6b/0xa1
=======================
Code: d6 89 7c 24 08 8b 40 08 e8 b3 fe ff ff 8b 0e 8b 3b 89 d6 0f af f1 f7 e1 8d 1c 16 89 da 89 d1 31 d2
EIP: [<c01276cb>] sched_slice+0x3b/0x60 SS:ESP 0068:d7d2bf78
"""

That OOPS is from a -rc3-git1 Mandriva kernel, but the same thing
happens with your latest tree.

I've reported it to vendor-sec, but it looks like it's only
present in 2.6.24-rcs and Ingo's CFS backports.

Ingo is usually very responsive and he hasn't answered me so
far, so I'm starting to think you can't reproduce this problem?

Anyway, the problem seems to be in sched_slice() called by
sys_sched_rr_get_interval():

time_slice = NS_TO_JIFFIES(sched_slice(cfs_rq_of(se), se));

sched_slice() will use 'cfs_rq->load.weight' as the base for a
division, which is zero for process 1.

The following hack fixes the problem for me.

-----

Index: linux-2.6.23/kernel/sched_fair.c
===================================================================
--- linux-2.6.23.orig/kernel/sched_fair.c
+++ linux-2.6.23/kernel/sched_fair.c
@@ -266,7 +266,8 @@ static u64 sched_slice(struct cfs_rq *cf
 	u64 slice = __sched_period(cfs_rq->nr_running);
 
 	slice *= se->load.weight;
-	do_div(slice, cfs_rq->load.weight);
+	if (likely(cfs_rq->load.weight))
+		do_div(slice, cfs_rq->load.weight);
 
 	return slice;
 }


--
Luiz Fernando N. Capitulino
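
The divide error described in this report can be modelled outside the
kernel. What follows is a minimal user-space sketch, not kernel code and
not the patch above: the model_* types, the fixed MODEL_PERIOD_NS value
and the "return 0 on a zero weight" convention are all assumptions made
for illustration. It only shows why a runqueue weight of zero has to be
checked before it is used as a divisor.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's load_weight, sched_entity, cfs_rq. */
struct model_load_weight { unsigned long weight; };
struct model_sched_entity { struct model_load_weight load; };
struct model_cfs_rq { struct model_load_weight load; unsigned long nr_running; };

/* Assumed fixed period in nanoseconds; the kernel derives it from nr_running. */
#define MODEL_PERIOD_NS 20000000ULL

/*
 * Scale the period by the entity's share of the runqueue weight.
 * Returns 1 and stores the result in *slice, or returns 0 without
 * dividing when the runqueue weight is zero (the case that oopsed).
 */
static int model_sched_slice(const struct model_cfs_rq *cfs_rq,
                             const struct model_sched_entity *se,
                             uint64_t *slice)
{
    if (cfs_rq->load.weight == 0)
        return 0;

    *slice = MODEL_PERIOD_NS * se->load.weight / cfs_rq->load.weight;
    return 1;
}
```

With an entity weight of 1024 on a runqueue whose total weight is 2048,
the model yields half the period; with a total weight of 0 it reports
failure instead of dividing.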

2007-12-04 15:57:37

by Linus Torvalds

Subject: Re: [local DoS] Re: Linux 2.6.24-rc4



On Tue, 4 Dec 2007, Luiz Fernando N. Capitulino wrote:
>
> sched_rr_get_interval(1, NULL);

Looks like we have a zero "cfs_rq->load.weight".

Ingo? Both sched_slice() and __sched_slice() do a divide by the runqueue
weight, and at least dequeue_task_fair() explicitly checks for that being
zero, so clearly zero is a possible value. Hmm?

Linus

2007-12-04 16:00:32

by Ingo Molnar

Subject: Re: [local DoS] Re: Linux 2.6.24-rc4


* Linus Torvalds <[email protected]> wrote:

>
>
> On Tue, 4 Dec 2007, Luiz Fernando N. Capitulino wrote:
> >
> > sched_rr_get_interval(1, NULL);
>
> Looks like we have a zero "cfs_rq->load.weight".
>
> Ingo? Both sched_slice() and __sched_slice() do a divide by the
> runqueue weight, and at least dequeue_task_fair() explicitly checks
> for that being zero, so clearly zero is a possible value. Hmm?

yeah, i can reproduce this crash too.

The problem is on SMP: if sched_rr_get_interval() gets a task from an
otherwise idle runqueue, then rq->load.weight is 0. Normally
sched_slice() is only used on a busy runqueue. So the correct fixup site
is not in sched_slice() but in sys_sched_rr_get_interval() - i'm working
on the right fix, i hope to be able to send a pull request in a few
minutes.

Ingo

2007-12-04 16:04:29

by Jeff Garzik

Subject: Re: Linux 2.6.24-rc4

Nicolas Pitre wrote:
> On Mon, 3 Dec 2007, Linus Torvalds wrote:
>
>> That said, none of the changes are really _exciting_ or really scary. And
>> we should have fixed a number of regressions, although more certainly
>> remain.
>
> Any reason for this:
>
> mode change 100644 => 100755 drivers/net/chelsio/cxgb2.c
> mode change 100644 => 100755 drivers/net/chelsio/pm3393.c
> mode change 100644 => 100755 drivers/net/chelsio/sge.c
> mode change 100644 => 100755 drivers/net/chelsio/sge.h

As repeatedly mentioned on the list :) it is a mistake.

Jeff


Subject: Re: [local DoS] Re: Linux 2.6.24-rc4

Em Tue, 4 Dec 2007 17:00:05 +0100
Ingo Molnar <[email protected]> escreveu:

|
| * Linus Torvalds <[email protected]> wrote:
|
| >
| >
| > On Tue, 4 Dec 2007, Luiz Fernando N. Capitulino wrote:
| > >
| > > sched_rr_get_interval(1, NULL);
| >
| > Looks like we have a zero "cfs_rq->load.weight".
| >
| > Ingo? Both sched_slice() and __sched_slice() do a divide by the
| > runqueue weight, and at least dequeue_task_fair() explicitly checks
| > for that being zero, so clearly zero is a possible value. Hmm?
|
| yeah, i can reproduce this crash too.
|
| The problem is on SMP: if sched_rr_get_interval() gets a task from an
| otherwise idle runqueue, then rq->load.weight is 0. Normally
| sched_slice() is only used on a busy runqueue. So the correct fixup site
| is not in sched_slice() but in sys_sched_rr_get_interval() - i'm working
| on the right fix, i hope to be able to send a pull request in a few
| minutes.

Ingo, I can reproduce this w/o SMP support as well.

(Also, the backtrace I sent was reproduced on a UP machine with a
SMP kernel).

--
Luiz Fernando N. Capitulino

2007-12-04 16:09:17

by Ingo Molnar

Subject: Re: [local DoS] Re: Linux 2.6.24-rc4


* Luiz Fernando N. Capitulino <[email protected]> wrote:

> | The problem is on SMP: if sched_rr_get_interval() gets a task from
> | an otherwise idle runqueue, then rq->load.weight is 0. Normally
> | sched_slice() is only used on a busy runqueue. So the correct fixup
> | site is not in sched_slice() but in sys_sched_rr_get_interval() -
> | i'm working on the right fix, i hope to be able to send a pull
> | request in a few minutes.
>
> Ingo, I can reproduce this w/o SMP support as well.

hm, if you run this as an RT task, right? Or can you trigger it via pure
SCHED_OTHER tasks as well? Below is my candidate fix.

Ingo

--------------->
Subject: sched: fix crash in sys_sched_rr_get_interval()
From: Ingo Molnar <[email protected]>

Luiz Fernando N. Capitulino reported that sched_rr_get_interval()
crashes for SCHED_OTHER tasks that are on an idle runqueue.

The fix is to return a 0 timeslice for tasks that are on an idle
runqueue. (and which are not running, obviously)

Reported-by: Luiz Fernando N. Capitulino <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
kernel/sched.c | 14 +++++++++-----
1 file changed, 9 insertions(+), 5 deletions(-)

Index: linux/kernel/sched.c
===================================================================
--- linux.orig/kernel/sched.c
+++ linux/kernel/sched.c
@@ -4850,17 +4850,21 @@ long sys_sched_rr_get_interval(pid_t pid
 	if (retval)
 		goto out_unlock;
 
-	if (p->policy == SCHED_FIFO)
-		time_slice = 0;
-	else if (p->policy == SCHED_RR)
+	/*
+	 * Time slice is 0 for SCHED_FIFO tasks and for SCHED_OTHER
+	 * tasks that are on an otherwise idle runqueue:
+	 */
+	time_slice = 0;
+	if (p->policy == SCHED_RR) {
 		time_slice = DEF_TIMESLICE;
-	else {
+	} else {
 		struct sched_entity *se = &p->se;
 		unsigned long flags;
 		struct rq *rq;
 
 		rq = task_rq_lock(p, &flags);
-		time_slice = NS_TO_JIFFIES(sched_slice(cfs_rq_of(se), se));
+		if (rq->cfs.load.weight)
+			time_slice = NS_TO_JIFFIES(sched_slice(&rq->cfs, se));
 		task_rq_unlock(rq, &flags);
 	}
 	read_unlock(&tasklist_lock);
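
The control flow of the candidate fix can be followed in a small
user-space model. This is only a sketch under stated assumptions: the
model_* names, the MODEL_DEF_TIMESLICE value and the period_jiffies
field are hypothetical and stand in for DEF_TIMESLICE, the runqueue and
NS_TO_JIFFIES(sched_slice(...)). It mirrors the branch structure of the
patch as posted: the time slice starts at 0, SCHED_RR gets the default
slice, and every other policy only divides when the CFS runqueue weight
is non-zero.

```c
#include <stdint.h>

enum model_policy { MODEL_SCHED_OTHER, MODEL_SCHED_FIFO, MODEL_SCHED_RR };

/* Assumed default round-robin slice, in jiffies. */
#define MODEL_DEF_TIMESLICE 100u

struct model_task {
    enum model_policy policy;
    unsigned long weight;        /* stands in for p->se.load.weight */
};

struct model_rq {
    unsigned long cfs_weight;    /* stands in for rq->cfs.load.weight */
    uint64_t period_jiffies;     /* assumed scheduling period, in jiffies */
};

/* Returns the modelled time slice in jiffies; never divides by zero. */
static unsigned int model_rr_interval(const struct model_task *p,
                                      const struct model_rq *rq)
{
    unsigned int time_slice = 0;

    if (p->policy == MODEL_SCHED_RR) {
        time_slice = MODEL_DEF_TIMESLICE;
    } else if (rq->cfs_weight) {
        /* Same branch shape as the patch: taken for every non-RR policy. */
        time_slice = (unsigned int)(rq->period_jiffies * p->weight /
                                    rq->cfs_weight);
    }
    return time_slice;
}
```

A SCHED_OTHER task on a runqueue with cfs_weight == 0 falls through both
branches and gets the initial 0, which is the behaviour the patch
description asks for.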

2007-12-04 16:18:52

by Ingo Molnar

Subject: [git pull] scheduler fixes


* Ingo Molnar <[email protected]> wrote:

> The problem is on SMP: if sched_rr_get_interval() gets a task from an
> otherwise idle runqueue, then rq->load.weight is 0. Normally
> sched_slice() is only used on a busy runqueue. So the correct fixup
> site is not in sched_slice() but in sys_sched_rr_get_interval() - i'm
> working on the right fix, i hope to be able to send a pull request in
> a few minutes.

the problem is on UP too - if there are no SCHED_OTHER tasks. I've
tested the fix and it solves the problem for various combinations of
crash.c. I've updated sched.git, please pull it from:

git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched.git

It has another commit besides this fix. Thanks,

Ingo

------------------>

Ingo Molnar (2):
sched: fix crash in sys_sched_rr_get_interval()
sched: default to more agressive yield for SCHED_BATCH tasks

sched.c | 14 +++++++++-----
sched_fair.c | 7 ++++---
2 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 59ff6b1..b062856 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -4850,17 +4850,21 @@ long sys_sched_rr_get_interval(pid_t pid, struct timespec __user *interval)
 	if (retval)
 		goto out_unlock;
 
-	if (p->policy == SCHED_FIFO)
-		time_slice = 0;
-	else if (p->policy == SCHED_RR)
+	/*
+	 * Time slice is 0 for SCHED_FIFO tasks and for SCHED_OTHER
+	 * tasks that are on an otherwise idle runqueue:
+	 */
+	time_slice = 0;
+	if (p->policy == SCHED_RR) {
 		time_slice = DEF_TIMESLICE;
-	else {
+	} else {
 		struct sched_entity *se = &p->se;
 		unsigned long flags;
 		struct rq *rq;
 
 		rq = task_rq_lock(p, &flags);
-		time_slice = NS_TO_JIFFIES(sched_slice(cfs_rq_of(se), se));
+		if (rq->cfs.load.weight)
+			time_slice = NS_TO_JIFFIES(sched_slice(&rq->cfs, se));
 		task_rq_unlock(rq, &flags);
 	}
 	read_unlock(&tasklist_lock);
diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 37bb265..c33f0ce 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -799,8 +799,9 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int sleep)
  */
 static void yield_task_fair(struct rq *rq)
 {
-	struct cfs_rq *cfs_rq = task_cfs_rq(rq->curr);
-	struct sched_entity *rightmost, *se = &rq->curr->se;
+	struct task_struct *curr = rq->curr;
+	struct cfs_rq *cfs_rq = task_cfs_rq(curr);
+	struct sched_entity *rightmost, *se = &curr->se;
 
 	/*
 	 * Are we the only task in the tree?
@@ -808,7 +809,7 @@ static void yield_task_fair(struct rq *rq)
 	if (unlikely(cfs_rq->nr_running == 1))
 		return;
 
-	if (likely(!sysctl_sched_compat_yield)) {
+	if (likely(!sysctl_sched_compat_yield) && curr->policy != SCHED_BATCH) {
 		__update_rq_clock(rq);
 		/*
 		 * Update run-time statistics of the 'current'.

Subject: Re: [git pull] scheduler fixes

Em Tue, 4 Dec 2007 17:18:27 +0100
Ingo Molnar <[email protected]> escreveu:

|
| * Ingo Molnar <[email protected]> wrote:
|
| > The problem is on SMP: if sched_rr_get_interval() gets a task from an
| > otherwise idle runqueue, then rq->load.weight is 0. Normally
| > sched_slice() is only used on a busy runqueue. So the correct fixup
| > site is not in sched_slice() but in sys_sched_rr_get_interval() - i'm
| > working on the right fix, i hope to be able to send a pull request in
| > a few minutes.
|
| the problem is on UP too - if there are no SCHED_OTHER tasks. I've
| tested the fix and it solves the problem for various combinations of
| crash.c. I've updated sched.git, please pull it from:
|
| git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched.git
|
| It has another commit besides this fix. Thanks,

Yes, I tested the 'sched: fix crash in sys_sched_rr_get_interval()'
one and it really fixes the problem.

Thanks a lot Ingo.

--
Luiz Fernando N. Capitulino

2007-12-04 18:26:31

by Greg KH

Subject: Re: [git pull] scheduler fixes

On Tue, Dec 04, 2007 at 05:18:27PM +0100, Ingo Molnar wrote:
>
> * Ingo Molnar <[email protected]> wrote:
>
> > The problem is on SMP: if sched_rr_get_interval() gets a task from an
> > otherwise idle runqueue, then rq->load.weight is 0. Normally
> > sched_slice() is only used on a busy runqueue. So the correct fixup
> > site is not in sched_slice() but in sys_sched_rr_get_interval() - i'm
> > working on the right fix, i hope to be able to send a pull request in
> > a few minutes.
>
> the problem is on UP too - if there are no SCHED_OTHER tasks. I've
> tested the fix and it solves the problem for various combinations of
> crash.c. I've updated sched.git, please pull it from:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched.git
>
> It has another commit besides this fix. Thanks,

Can you make up something that I can apply for 2.6.23-stable? or is
this not an issue on that tree?

thanks,

greg k-h

Subject: Re: [git pull] scheduler fixes

Em Tue, 4 Dec 2007 10:28:51 -0800
Greg KH <[email protected]> escreveu:

| On Tue, Dec 04, 2007 at 05:18:27PM +0100, Ingo Molnar wrote:
| >
| > * Ingo Molnar <[email protected]> wrote:
| >
| > > The problem is on SMP: if sched_rr_get_interval() gets a task from an
| > > otherwise idle runqueue, then rq->load.weight is 0. Normally
| > > sched_slice() is only used on a busy runqueue. So the correct fixup
| > > site is not in sched_slice() but in sys_sched_rr_get_interval() - i'm
| > > working on the right fix, i hope to be able to send a pull request in
| > > a few minutes.
| >
| > the problem is on UP too - if there are no SCHED_OTHER tasks. I've
| > tested the fix and it solves the problem for various combinations of
| > crash.c. I've updated sched.git, please pull it from:
| >
| > git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched.git
| >
| > It has another commit besides this fix. Thanks,
|
| Can you make up something that I can apply for 2.6.23-stable? or is
| this not an issue on that tree?

FWIW I couldn't reproduce the problem with 2.6.23.9. sched_slice()
is quite different on that kernel and _maybe_ it will never divide
by zero.

My original report on vendor-sec was wrong. I said that 2.6.23.9
had the same bug, but it turns out the kernel I tested had Ingo's
CFS backport patch applied. I didn't know that; I thought it was a
vanilla kernel.

Btw, I think it's important to release a new CFS backport patch
because maybe some distro is using it (Mandriva stable kernel is
using the CFS backport patch, but we haven't updated to the latest
version yet).

--
Luiz Fernando N. Capitulino

2007-12-04 21:04:59

by Ingo Molnar

Subject: Re: [git pull] scheduler fixes


* Luiz Fernando N. Capitulino <[email protected]> wrote:

> | > git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched.git
> | >
> | > It has another commit besides this fix. Thanks,
> |
> | Can you make up something that I can apply for 2.6.23-stable? or is
> | this not an issue on that tree?
>
> FWIW I couldn't reproduce the problem with 2.6.23.9. sched_slice() is
> quite different on that kernel and _maybe_ it will never divide by
> zero.

no, this is due to a fairly recent commit, so 2.6.23 should not be
affected. (We cleaned up sched_rr_get_interval() in one of the 2.6.24
scheduler commits.)

> My original report on vendor-sec was wrong. I said that 2.6.23.9 had
> the same bug, but it turns out the kernel I tested had Ingo's CFS
> backport patch applied. I didn't know that; I thought it was a vanilla
> kernel.
>
> Btw, I think it's important to release a new CFS backport patch
> because maybe some distro is using it (Mandriva stable kernel is using
> the CFS backport patch, but we haven't updated to the latest version
> yet).

this should only affect the v24 CFS version - i've updated the v24
backport patches. sched_rr_get_interval() is almost never used, and it's
basically never used for SCHED_OTHER tasks.

Ingo
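
For readers who have not used the call discussed in this thread, here is
a minimal user-space sketch of sched_rr_get_interval() with a valid
result buffer, unlike the NULL pointer in the reproducer. The helper
names query_rr_interval() and print_rr_interval() are hypothetical
wrappers added for illustration; the value reported for a SCHED_OTHER
task depends on the kernel version and is not asserted here.

```c
#define _POSIX_C_SOURCE 200809L
#include <sched.h>
#include <stdio.h>
#include <sys/types.h>
#include <time.h>

/*
 * Query the round-robin interval for `pid` (0 means the calling
 * process). Returns 0 on success, -1 on error with errno set.
 */
static int query_rr_interval(pid_t pid, struct timespec *ts)
{
    return sched_rr_get_interval(pid, ts);
}

/* Print the interval for `pid`, or the error reported by the call. */
static void print_rr_interval(pid_t pid)
{
    struct timespec ts;

    if (query_rr_interval(pid, &ts) != 0) {
        perror("sched_rr_get_interval");
        return;
    }
    printf("interval: %ld.%09ld s\n", (long)ts.tv_sec, ts.tv_nsec);
}
```

Calling print_rr_interval(0) from main() reports the interval for the
current process.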

2007-12-05 00:25:42

by Diego Calleja

Subject: Re: Linux 2.6.24-rc4

As usual, if anyone finds errors in http://kernelnewbies.org/Linux_2_6_24 ,
let me know or fix them yourself.