2008-06-05 07:52:35

by Stephen Rothwell

[permalink] [raw]
Subject: linux-next: Tree for June 5

Hi all,

Changes since next-20080604:

The hid tree fixed the conflicts with Linus' tree.

The v4l-dvb tree no longer needed the fixup patch.

The galak tree gained a conflict with the net tree.

The wireless tree now has the same conflict (in the ps3_gelic driver) as
the semaphore-removal tree.

The rr tree gained a conflict with the net-current tree.

The ldp tree suffered from the v4l-dvb struct members renaming.

I have applied the following temporary patch for known build problems:

"Fix various 8390 builds" - the net tree broke builds on various
architectures - hopefully this patch will go into the net tree shortly.
"firmware: build fixes 2" - the firmware tree broke some arm builds.

----------------------------------------------------------------------------

I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/sfr/linux-next.git
(patches at
http://www.kernel.org/pub/linux/kernel/people/sfr/linux-next/). If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one. You should use "git fetch" as mentioned in the FAQ on the wiki
(see below).

You can see which trees have been included by looking in the Next/Trees
file in the source. There are also quilt-import.log and merge.log files
in the Next directory. Between each merge, the tree was built with
a ppc64_defconfig for powerpc and an allmodconfig for x86_64. After the
final fixups, it is also built with powerpc allnoconfig,
44x_defconfig and allyesconfig and i386, sparc and sparc64 defconfig.

Below is a summary of the state of the merge.

We are up to 87 trees (counting Linus' and 13 trees of patches pending for
Linus' tree), more are welcome (even if they are currently empty).
Thanks to those who have contributed, and to those who haven't, please do.

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next . If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Jan Dittmer for adding the linux-next tree to his build tests
at http://l4x.org/k/ , the guys at http://test.kernel.org/ and Randy
Dunlap for doing many randconfig builds.

There is a wiki covering stuff to do with linux-next at
http://linux.f-seidel.de/linux-next/pmwiki/ . Thanks to Frank Seidel.

--
Cheers,
Stephen Rothwell [email protected]

$ git checkout master
$ git reset --hard stable
Merging origin/master
Merging powerpc-merge/merge
Merging scsi-rc-fixes/master
Merging net-current/master
Merging sparc-current/master
Merging sound-current/for-linus
Merging arm-current/master
Merging pci-current/for-linus
Merging wireless-current/master
Merging kbuild-current/master
Merging quilt/driver-core.current
Merging quilt/usb.current
Merging cpufreq-current/fixes
Merging input-current/for-linus
Merging quilt/driver-core
CONFLICT (content): Merge conflict in drivers/s390/kvm/kvm_virtio.c
CONFLICT (content): Merge conflict in drivers/virtio/virtio.c
CONFLICT (content): Merge conflict in drivers/virtio/virtio_pci.c
Merging quilt/usb
Merging x86/auto-x86-next
Merging sched/auto-sched-next
Merging ftrace/auto-ftrace-next
Applying ftrace: fix rculist split fallout
Merging pci/linux-next
CONFLICT (content): Merge conflict in drivers/base/power/main.c
CONFLICT (content): Merge conflict in include/linux/device.h
Merging quilt/device-mapper
Merging hid/mm
Merging quilt/i2c
CONFLICT (content): Merge conflict in drivers/i2c/i2c-core.c
Merging quilt/kernel-doc
Merging avr32/avr32-arch
Merging v4l-dvb/stable
Merging s390/features
CONFLICT (content): Merge conflict in drivers/s390/block/dasd.c
CONFLICT (content): Merge conflict in drivers/s390/block/dasd_eckd.c
CONFLICT (content): Merge conflict in drivers/s390/block/dasd_fba.c
CONFLICT (content): Merge conflict in drivers/s390/char/tape_core.c
CONFLICT (content): Merge conflict in drivers/s390/cio/device_fsm.c
CONFLICT (content): Merge conflict in drivers/s390/net/claw.c
CONFLICT (content): Merge conflict in drivers/s390/net/ctcm_main.c
CONFLICT (content): Merge conflict in drivers/s390/net/lcs.c
Merging sh/master
Merging jfs/next
Merging kbuild/master
Merging quilt/ide
Merging libata/NEXT
Merging nfs/linux-next
Merging xfs/master
Merging infiniband/for-next
Merging acpi/test
Merging blackfin/for-linus
Merging nfsd/nfsd-next
Merging ieee1394/for-next
Merging hwmon/testing
Merging ubi/master
Merging kvm/master
Merging dlm/next
Merging scsi/master
Applying scsi: fix fallout from KOBJ_NAME_LEN removal
Merging ia64/test
Merging tests/master
CONFLICT (content): Merge conflict in lib/Kconfig.debug
Merging ocfs2/linux-next
Merging selinux/for-akpm
Merging quilt/m68k
Merging powerpc/powerpc-next
Merging hrt/mm
Merging lblnet/master
Merging ext4/next
Merging 4xx/next
Merging async_tx/next
Merging udf/for_next
Merging security-testing/next
Merging net/master
Merging sparc/master
Merging galak/powerpc-next
CONFLICT (content): Merge conflict in Documentation/powerpc/booting-without-of.txt
Merging mtd/master
Merging wireless/master
CONFLICT (content): Merge conflict in drivers/net/ps3_gelic_wireless.c
CONFLICT (content): Merge conflict in drivers/net/wireless/libertas/main.c
CONFLICT (content): Merge conflict in drivers/net/wireless/rt2x00/rt2x00dev.c
Merging crypto/master
Merging vfs/vfs-2.6.25
Merging sound/master
Merging arm/devel
CONFLICT (content): Merge conflict in arch/arm/mach-pxa/tosa.c
Merging cpufreq/next
Merging v9fs/for-next
Merging quilt/rr
CONFLICT (content): Merge conflict in drivers/net/virtio_net.c
Merging cifs/master
Merging mmc/next
Merging gfs2/master
Merging rcu/core/rcu
Merging locking/core/locking
Merging safe-poison-pointers/safe-poison-pointers
Merging stackprotector/stackprotector
Merging input/next
Merging semaphore/semaphore
Merging semaphore-removal/semaphore-removal
CONFLICT (content): Merge conflict in drivers/net/ps3_gelic_wireless.c
CONFLICT (content): Merge conflict in drivers/scsi/qla2xxx/qla_attr.c
CONFLICT (content): Merge conflict in drivers/scsi/qla2xxx/qla_def.h
CONFLICT (content): Merge conflict in drivers/scsi/qla2xxx/qla_mbx.c
CONFLICT (content): Merge conflict in drivers/scsi/qla2xxx/qla_mid.c
CONFLICT (content): Merge conflict in drivers/scsi/qla2xxx/qla_os.c
Merging quilt/ldp.next
Applying ldp: fix fallout from v4l struct element renaming
Merging bkl-removal/bkl-removal
Merging trivial/next
Merging ubifs/for_andrew
Merging lsm/for-next
Merging block/for-next
Merging embedded/master
Merging firmware/master
CONFLICT (content): Merge conflict in drivers/usb/serial/Kconfig
CONFLICT (delete/modify): drivers/usb/serial/ti_fw_3410.h deleted in firmware/master and modified in HEAD. Version HEAD of drivers/usb/serial/ti_fw_3410.h left in tree.
CONFLICT (delete/modify): drivers/usb/serial/ti_fw_5052.h deleted in firmware/master and modified in HEAD. Version HEAD of drivers/usb/serial/ti_fw_5052.h left in tree.
CONFLICT (content): Merge conflict in drivers/usb/serial/ti_usb_3410_5052.c
CONFLICT (content): Merge conflict in sound/pci/Kconfig
CONFLICT (content): Merge conflict in sound/pci/maestro3.c
CONFLICT (content): Merge conflict in sound/pci/ymfpci/ymfpci_main.c
Applying Fix various 8390 builds
Applying firmware: build fixes 2


Attachments:
(No filename) (7.13 kB)
(No filename) (197.00 B)
Download all attachments

2008-06-06 02:57:05

by Andrew Morton

[permalink] [raw]
Subject: Re: linux-next: Tree for June 5

On Thu, 5 Jun 2008 17:52:17 +1000 Stephen Rothwell <[email protected]> wrote:

> I have created today's linux-next tree at
> git://git.kernel.org/pub/scm/linux/kernel/git/sfr/linux-next.git

Instantly oopses on two x86_64 boxes with this config:
http://userweb.kernel.org/~akpm/config-akpm2.txt

oops: http://userweb.kernel.org/~akpm/p6056454.jpg

At a guess I'd say the sched_domains code is calling into slab before
slab is initalised. Something like that.


I had to do this:

--- a/arch/x86/kernel/traps_64.c~a
+++ a/arch/x86/kernel/traps_64.c
@@ -504,6 +504,7 @@ void show_registers(struct pt_regs *regs
}
}
printk("\n");
+ for ( ;; );
}

int is_valid_bugaddr(unsigned long ip)
_

to collect that oops. Otherwise it scrolled away due to "trying to
kill init" doing a dump_stack. pause_on_oops seems to not be working
properly any more. It used to.

2008-06-06 03:46:31

by Andrew Morton

[permalink] [raw]
Subject: Re: linux-next: Tree for June 5

On Thu, 5 Jun 2008 19:56:04 -0700 Andrew Morton <[email protected]> wrote:

> On Thu, 5 Jun 2008 17:52:17 +1000 Stephen Rothwell <[email protected]> wrote:
>
> > I have created today's linux-next tree at
> > git://git.kernel.org/pub/scm/linux/kernel/git/sfr/linux-next.git
>
> Instantly oopses on two x86_64 boxes with this config:
> http://userweb.kernel.org/~akpm/config-akpm2.txt
>
> oops: http://userweb.kernel.org/~akpm/p6056454.jpg
>
> At a guess I'd say the sched_domains code is calling into slab before
> slab is initalised. Something like that.

With CONFIG_SLUB=y it dies differently:

http://userweb.kernel.org/~akpm/p6056455.jpg

2008-06-06 07:17:39

by Ingo Molnar

[permalink] [raw]
Subject: Re: linux-next: Tree for June 5


* Andrew Morton <[email protected]> wrote:

> On Thu, 5 Jun 2008 17:52:17 +1000 Stephen Rothwell <[email protected]> wrote:
>
> > I have created today's linux-next tree at
> > git://git.kernel.org/pub/scm/linux/kernel/git/sfr/linux-next.git
>
> Instantly oopses on two x86_64 boxes with this config:
> http://userweb.kernel.org/~akpm/config-akpm2.txt
>
> oops: http://userweb.kernel.org/~akpm/p6056454.jpg
>
> At a guess I'd say the sched_domains code is calling into slab before
> slab is initalised. Something like that.

did SLUB change in linux-next? There no such problem in -tip.

> I had to do this:
>
> --- a/arch/x86/kernel/traps_64.c~a
> +++ a/arch/x86/kernel/traps_64.c
> @@ -504,6 +504,7 @@ void show_registers(struct pt_regs *regs
> }
> }
> printk("\n");
> + for ( ;; );
> }
>
> int is_valid_bugaddr(unsigned long ip)
> _
>
> to collect that oops. Otherwise it scrolled away due to "trying to
> kill init" doing a dump_stack. pause_on_oops seems to not be working
> properly any more. It used to.

hm, perhaps mdelay(1) does not loop for 1 msec anymore? You'll probably
be able to work it around via pause_on_oops=5000000 or so.

Ingo

2008-06-06 07:26:06

by Ingo Molnar

[permalink] [raw]
Subject: Re: linux-next: Tree for June 5


* Ingo Molnar <[email protected]> wrote:

> * Andrew Morton <[email protected]> wrote:
>
> > On Thu, 5 Jun 2008 17:52:17 +1000 Stephen Rothwell <[email protected]> wrote:
> >
> > > I have created today's linux-next tree at
> > > git://git.kernel.org/pub/scm/linux/kernel/git/sfr/linux-next.git
> >
> > Instantly oopses on two x86_64 boxes with this config:
> > http://userweb.kernel.org/~akpm/config-akpm2.txt
> >
> > oops: http://userweb.kernel.org/~akpm/p6056454.jpg
> >
> > At a guess I'd say the sched_domains code is calling into slab before
> > slab is initalised. Something like that.
>
> did SLUB change in linux-next? There is no such problem in -tip.

i just successfully booted your config on 4 separate 64-bit test-systems
with latest -tip. (two dual-core boxes, a quad and a 16way box) Latest
-tip includes sched-next and x86-next as well.

Ingo

2008-06-06 07:30:28

by Andrew Morton

[permalink] [raw]
Subject: Re: linux-next: Tree for June 5

On Fri, 6 Jun 2008 09:17:07 +0200 Ingo Molnar <[email protected]> wrote:

> * Andrew Morton <[email protected]> wrote:
>
> > On Thu, 5 Jun 2008 17:52:17 +1000 Stephen Rothwell <[email protected]> wrote:
> >
> > > I have created today's linux-next tree at
> > > git://git.kernel.org/pub/scm/linux/kernel/git/sfr/linux-next.git
> >
> > Instantly oopses on two x86_64 boxes with this config:
> > http://userweb.kernel.org/~akpm/config-akpm2.txt
> >
> > oops: http://userweb.kernel.org/~akpm/p6056454.jpg
> >
> > At a guess I'd say the sched_domains code is calling into slab before
> > slab is initalised. Something like that.
>
> did SLUB change in linux-next? There no such problem in -tip.

It crashes on two quite different machines with both slab and slub.

2008-06-06 07:33:53

by Stephen Rothwell

[permalink] [raw]
Subject: Re: linux-next: Tree for June 5

Hi Ingo,

On Fri, 6 Jun 2008 09:17:07 +0200 Ingo Molnar <[email protected]> wrote:
>
> * Andrew Morton <[email protected]> wrote:
>
> > Instantly oopses on two x86_64 boxes with this config:
> > http://userweb.kernel.org/~akpm/config-akpm2.txt
> >
> > oops: http://userweb.kernel.org/~akpm/p6056454.jpg
> >
> > At a guess I'd say the sched_domains code is calling into slab before
> > slab is initalised. Something like that.
>
> did SLUB change in linux-next? There no such problem in -tip.

$ git rev-list stable..next-20080605 -- mm/slub.c include/linux/slub_def.h
139e2551f25697a242de2b9c61d4514c2f762ca8
0bb08241ce68aaa70b7b804b4d6319d8bad3ae24

The first is the merge of the trivial tree and the second is in that tree
and only changes some comments in slub.c

$ git rev-list stable..next-20080605 -- mm/slab.c include/linux/slab*
(nothing)

--
Cheers,
Stephen Rothwell [email protected]
http://www.canb.auug.org.au/~sfr/


Attachments:
(No filename) (960.00 B)
(No filename) (197.00 B)
Download all attachments

2008-06-06 07:34:14

by Andrew Morton

[permalink] [raw]
Subject: Re: linux-next: Tree for June 5

On Fri, 6 Jun 2008 09:25:36 +0200 Ingo Molnar <[email protected]> wrote:

>
> * Ingo Molnar <[email protected]> wrote:
>
> > * Andrew Morton <[email protected]> wrote:
> >
> > > On Thu, 5 Jun 2008 17:52:17 +1000 Stephen Rothwell <[email protected]> wrote:
> > >
> > > > I have created today's linux-next tree at
> > > > git://git.kernel.org/pub/scm/linux/kernel/git/sfr/linux-next.git
> > >
> > > Instantly oopses on two x86_64 boxes with this config:
> > > http://userweb.kernel.org/~akpm/config-akpm2.txt
> > >
> > > oops: http://userweb.kernel.org/~akpm/p6056454.jpg
> > >
> > > At a guess I'd say the sched_domains code is calling into slab before
> > > slab is initalised. Something like that.
> >
> > did SLUB change in linux-next? There is no such problem in -tip.
>
> i just successfully booted your config on 4 separate 64-bit test-systems
> with latest -tip. (two dual-core boxes, a quad and a 16way box) Latest
> -tip includes sched-next and x86-next as well.

What's the point in testing a radically differenet kernel from the one
which is known to be crashing?

2008-06-06 07:42:23

by Ingo Molnar

[permalink] [raw]
Subject: Re: linux-next: Tree for June 5


* Andrew Morton <[email protected]> wrote:

> > > did SLUB change in linux-next? There is no such problem in -tip.
> >
> > i just successfully booted your config on 4 separate 64-bit
> > test-systems with latest -tip. (two dual-core boxes, a quad and a
> > 16way box) Latest -tip includes sched-next and x86-next as well.
>
> What's the point in testing a radically differenet kernel from the one
> which is known to be crashing?

well, you Cc:-ed me, so i wanted to exclude -tip's 750+ commits in this
area (scheduling, 64-bit x86) in the first step.

Ingo

2008-06-06 07:48:43

by Andrew Morton

[permalink] [raw]
Subject: Re: linux-next: Tree for June 5

On Fri, 6 Jun 2008 09:41:37 +0200 Ingo Molnar <[email protected]> wrote:

>
> * Andrew Morton <[email protected]> wrote:
>
> > > > did SLUB change in linux-next? There is no such problem in -tip.
> > >
> > > i just successfully booted your config on 4 separate 64-bit
> > > test-systems with latest -tip. (two dual-core boxes, a quad and a
> > > 16way box) Latest -tip includes sched-next and x86-next as well.
> >
> > What's the point in testing a radically differenet kernel from the one
> > which is known to be crashing?
>
> well, you Cc:-ed me, so i wanted to exclude -tip's 750+ commits in this
> area (scheduling, 64-bit x86) in the first step.
>

What's the relationship between -tip and linux-next?

The crash seems to be due to sched_domains startup ordering, at a guess.

My third bisect iteration has hit this:

arch/x86/mm/kmmio.c: In function 'get_kmmio_probe':
arch/x86/mm/kmmio.c:85: error: implicit declaration of function 'list_for_each_entry_rcu'
arch/x86/mm/kmmio.c:85: error: 'list' undeclared (first use in this function)
arch/x86/mm/kmmio.c:85: error: (Each undeclared identifier is reported only once
arch/x86/mm/kmmio.c:85: error: for each function it appears in.)
arch/x86/mm/kmmio.c:85: error: syntax error before '{' token
arch/x86/mm/kmmio.c:88: warning: no return statement in function returning non-void
arch/x86/mm/kmmio.c: In function 'get_kmmio_fault_page':
arch/x86/mm/kmmio.c:100: error: 'list' undeclared (first use in this function)
arch/x86/mm/kmmio.c:100: error: syntax error before '{' token
arch/x86/mm/kmmio.c:103: warning: no return statement in function returning non-void
arch/x86/mm/kmmio.c: In function 'add_kmmio_fault_page':
arch/x86/mm/kmmio.c:328: error: implicit declaration of function 'list_add_rcu'
arch/x86/mm/kmmio.c: In function 'remove_kmmio_fault_pages':
arch/x86/mm/kmmio.c:420: error: implicit declaration of function 'list_del_rcu'

2008-06-06 07:54:27

by Stephen Rothwell

[permalink] [raw]
Subject: Re: linux-next: Tree for June 5

Hi Andrew,

On Fri, 6 Jun 2008 00:47:43 -0700 Andrew Morton <[email protected]> wrote:
>
> My third bisect iteration has hit this:
>
> arch/x86/mm/kmmio.c: In function 'get_kmmio_probe':
> arch/x86/mm/kmmio.c:85: error: implicit declaration of function 'list_for_each_entry_rcu'

You need the following patch from linux-next. Which should be the commit
immediately after the merge of the ftrace tree.

--
Cheers,
Stephen Rothwell [email protected]
http://www.canb.auug.org.au/~sfr/

>From ee19aa543ada9ce11a0b3b8480f3a268ff86cb02 Mon Sep 17 00:00:00 2001
From: Stephen Rothwell <[email protected]>
Date: Tue, 27 May 2008 12:53:04 +1000
Subject: [PATCH] ftrace: fix rculist split fallout

Signed-off-by: Stephen Rothwell <[email protected]>
---
arch/x86/mm/kmmio.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/x86/mm/kmmio.c b/arch/x86/mm/kmmio.c
index b65871e..7bfdad7 100644
--- a/arch/x86/mm/kmmio.c
+++ b/arch/x86/mm/kmmio.c
@@ -23,6 +23,7 @@
#include <linux/errno.h>
#include <asm/debugreg.h>
#include <linux/mmiotrace.h>
+#include <linux/rculist.h>

#define KMMIO_PAGE_HASH_BITS 4
#define KMMIO_PAGE_TABLE_SIZE (1 << KMMIO_PAGE_HASH_BITS)
--
1.5.5.1

2008-06-06 08:02:58

by Andrew Morton

[permalink] [raw]
Subject: Re: linux-next: Tree for June 5

On Fri, 6 Jun 2008 17:53:58 +1000 Stephen Rothwell <[email protected]> wrote:

> On Fri, 6 Jun 2008 00:47:43 -0700 Andrew Morton <[email protected]> wrote:
> >
> > My third bisect iteration has hit this:
> >
> > arch/x86/mm/kmmio.c: In function 'get_kmmio_probe':
> > arch/x86/mm/kmmio.c:85: error: implicit declaration of function 'list_for_each_entry_rcu'
>
> You need the following patch from linux-next. Which should be the commit
> immediately after the merge of the ftrace tree.

Well yes - I just bodged it by hand then unbodged it later. But we
have a bisection break there. Admittedly a minor one, unless the bug
you're bisecting for requires that kprobes be configured. But it would
be nice to squish it.

I hope Ingo isn't following this
once-you've-checked-it-in-you-can't-fix-it stupidity :(

2008-06-06 08:22:28

by Stephen Rothwell

[permalink] [raw]
Subject: Re: linux-next: Tree for June 5

Hi Andrew,

On Fri, 6 Jun 2008 01:01:49 -0700 Andrew Morton <[email protected]> wrote:
>
> Well yes - I just bodged it by hand then unbodged it later. But we
> have a bisection break there. Admittedly a minor one, unless the bug
> you're bisecting for requires that kprobes be configured. But it would
> be nice to squish it.
>
> I hope Ingo isn't following this
> once-you've-checked-it-in-you-can't-fix-it stupidity :(

Its a break caused by the merge of the ftrace tree into the linux-next
tree (because at the point I merge the ftrace tree, linux-next contains
the rcu tree which has moves stuff into rculist.h), so logically that
patch should become part of the merge commit. If it was part of the
merge, you could never bisect to a point where you got this build
breakage.

Each tree is fine on its own if you go one step back from the merge.

--
Cheers,
Stephen Rothwell [email protected]
http://www.canb.auug.org.au/~sfr/


Attachments:
(No filename) (966.00 B)
(No filename) (197.00 B)
Download all attachments

2008-06-06 08:24:15

by Ingo Molnar

[permalink] [raw]
Subject: Re: linux-next: Tree for June 5


* Andrew Morton <[email protected]> wrote:

> > > > i just successfully booted your config on 4 separate 64-bit
> > > > test-systems with latest -tip. (two dual-core boxes, a quad and a
> > > > 16way box) Latest -tip includes sched-next and x86-next as well.
> > >
> > > What's the point in testing a radically differenet kernel from the one
> > > which is known to be crashing?
> >
> > well, you Cc:-ed me, so i wanted to exclude -tip's 750+ commits in this
> > area (scheduling, 64-bit x86) in the first step.
>
> What's the relationship between -tip and linux-next?

most of the -tip topics (there are 75 of them currently) are present in
linux-next - about ~70% of all -tip commits are in linux-next already.
The stuff that is not in linux-next yet is either because it's:
miscellany fixes (i.e. intentionally grabbed out-of-tree to make our
tests work better), not cooked enough yet, or because we are still
working it out - tip is less than a month old still.

in general the rule is that if there's anything we want to push
upstream, it will show up in linux-next.

> The crash seems to be due to sched_domains startup ordering, at a guess.
>
> My third bisect iteration has hit this:
>
> arch/x86/mm/kmmio.c: In function 'get_kmmio_probe':
> arch/x86/mm/kmmio.c:85: error: implicit declaration of function 'list_for_each_entry_rcu'
> arch/x86/mm/kmmio.c:85: error: 'list' undeclared (first use in this function)

hm, which commit is this exactly? I've never hit it myself in bisection
(and there are days when i bisect -tip several times). We'll respin
tip/tracing/mmiotrace if it's bisection-hostile. You can probably nudge
it into building via "git-bisect next".

Ingo

2008-06-06 08:27:50

by Ingo Molnar

[permalink] [raw]
Subject: Re: linux-next: Tree for June 5


* Stephen Rothwell <[email protected]> wrote:

> > arch/x86/mm/kmmio.c: In function 'get_kmmio_probe':
> > arch/x86/mm/kmmio.c:85: error: implicit declaration of function 'list_for_each_entry_rcu'
>
> You need the following patch from linux-next. Which should be the commit
> immediately after the merge of the ftrace tree.
>
> --
> Cheers,
> Stephen Rothwell [email protected]
> http://www.canb.auug.org.au/~sfr/
>
> >From ee19aa543ada9ce11a0b3b8480f3a268ff86cb02 Mon Sep 17 00:00:00 2001
> From: Stephen Rothwell <[email protected]>
> Date: Tue, 27 May 2008 12:53:04 +1000
> Subject: [PATCH] ftrace: fix rculist split fallout

ah, that one - that's in the tip/tracing/mmiotrace-mergefixups branch.
ideally this should be embedded in the merge commit of
tip/core/rcu+tip/auto-ftrace-next [so that no bisection can ever hit the
combined trees without also getting the merge fixup], but havent found a
good Git way of doing that yet.

Ingo

----------->
commit 668a6c3654560aef8741642478973e205a4f02bf
Author: Ingo Molnar <[email protected]>
Date: Mon May 19 13:35:24 2008 +0200

- fix mmioftrace + rcu merge interaction

Signed-off-by: Thomas Gleixner <[email protected]>

diff --git a/arch/x86/mm/kmmio.c b/arch/x86/mm/kmmio.c
index b65871e..93d8203 100644
--- a/arch/x86/mm/kmmio.c
+++ b/arch/x86/mm/kmmio.c
@@ -6,6 +6,7 @@
*/

#include <linux/list.h>
+#include <linux/rculist.h>
#include <linux/spinlock.h>
#include <linux/hash.h>
#include <linux/init.h>

2008-06-06 08:29:31

by Stephen Rothwell

[permalink] [raw]
Subject: Re: linux-next: Tree for June 5

On Fri, 6 Jun 2008 10:23:25 +0200 Ingo Molnar <[email protected]> wrote:
>
> hm, which commit is this exactly? I've never hit it myself in bisection
> (and there are days when i bisect -tip several times). We'll respin
> tip/tracing/mmiotrace if it's bisection-hostile. You can probably nudge
> it into building via "git-bisect next".

See my other email. This is because the ftrace tree does not merge well
with the rcu tree.

I may start merging such build breakage fixes into the actual merge
commits that cause them. That will make linux-next more bisectable, but
means I have to remember that I did it for Linus' sake.

--
Cheers,
Stephen Rothwell [email protected]
http://www.canb.auug.org.au/~sfr/


Attachments:
(No filename) (729.00 B)
(No filename) (197.00 B)
Download all attachments

2008-06-06 08:32:15

by Andrew Morton

[permalink] [raw]
Subject: Re: linux-next: Tree for June 5

On Fri, 6 Jun 2008 18:22:06 +1000 Stephen Rothwell <[email protected]> wrote:

> Hi Andrew,
>
> On Fri, 6 Jun 2008 01:01:49 -0700 Andrew Morton <[email protected]> wrote:
> >
> > Well yes - I just bodged it by hand then unbodged it later. But we
> > have a bisection break there. Admittedly a minor one, unless the bug
> > you're bisecting for requires that kprobes be configured. But it would
> > be nice to squish it.
> >
> > I hope Ingo isn't following this
> > once-you've-checked-it-in-you-can't-fix-it stupidity :(
>
> Its a break caused by the merge of the ftrace tree into the linux-next
> tree (because at the point I merge the ftrace tree, linux-next contains
> the rcu tree which has moves stuff into rculist.h), so logically that
> patch should become part of the merge commit. If it was part of the
> merge, you could never bisect to a point where you got this build
> breakage.
>
> Each tree is fine on its own if you go one step back from the merge.

Well OK. But patches in fact _do_ go into Linux as a single linear
stream of commits. But the whole git model ignores that reality and
here we see the result.

And saying "git doesn't work like that - you don't understand" just
doesn't cut it. It is a tool's job to permit humans to implement the
workflow which they wish to follow. Not to go and force them into
doing something inferior.

Sigh.

/usualrant

2008-06-06 08:33:49

by Ingo Molnar

[permalink] [raw]
Subject: Re: linux-next: Tree for June 5


* Stephen Rothwell <[email protected]> wrote:

> On Fri, 6 Jun 2008 10:23:25 +0200 Ingo Molnar <[email protected]> wrote:
> >
> > hm, which commit is this exactly? I've never hit it myself in
> > bisection (and there are days when i bisect -tip several times).
> > We'll respin tip/tracing/mmiotrace if it's bisection-hostile. You
> > can probably nudge it into building via "git-bisect next".
>
> See my other email. This is because the ftrace tree does not merge
> well with the rcu tree.

yeah, we have this fixup in -tip as well, in a structured way: you might
want to start tracking the tip/tracing/mmiotrace-mergefixups branch to
pick it up.

or we could offer you a full auto-tip-next plug-and-play branch as well.
(there's no reason to redo all these integration steps)

Ingo

2008-06-06 08:37:21

by Ingo Molnar

[permalink] [raw]
Subject: Re: linux-next: Tree for June 5


* Andrew Morton <[email protected]> wrote:

> > Each tree is fine on its own if you go one step back from the merge.
>
> Well OK. But patches in fact _do_ go into Linux as a single linear
> stream of commits. But the whole git model ignores that reality and
> here we see the result.

it's fixable via "git-merge -n" and then doing a second git-merge, to
create only a single commit. OTOH, it's more transparent to have such
manual fixups in a followup commit.

Ingo

2008-06-06 08:39:20

by Andrew Morton

[permalink] [raw]
Subject: Re: linux-next: Tree for June 5

On Fri, 6 Jun 2008 10:23:25 +0200 Ingo Molnar <[email protected]> wrote:

> * Andrew Morton <[email protected]> wrote:
>
> > > > > i just successfully booted your config on 4 separate 64-bit
> > > > > test-systems with latest -tip. (two dual-core boxes, a quad and a
> > > > > 16way box) Latest -tip includes sched-next and x86-next as well.
> > > >
> > > > What's the point in testing a radically differenet kernel from the one
> > > > which is known to be crashing?
> > >
> > > well, you Cc:-ed me, so i wanted to exclude -tip's 750+ commits in this
> > > area (scheduling, 64-bit x86) in the first step.
> >
> > What's the relationship between -tip and linux-next?
>
> most of the -tip topics (there are 75 of them currently) are present in
> linux-next - about ~70% of all -tip commits are in linux-next already.
> The stuff that is not in linux-next yet is either because it's:
> miscellany fixes (i.e. intentionally grabbed out-of-tree to make our
> tests work better), not cooked enough yet, or because we are still
> working it out - tip is less than a month old still.
>
> in general the rule is that if there's anything we want to push
> upstream, it will show up in linux-next.

I don't think it's a good idea for you guys to be off working on 2.6.28
material when we're trying to stabilise 2.6.25, 2.6.26 and preparing
for 2.6.27.

What's especially regrettable is that, afaik, you are expending testing
resources on a tree which nobody will ever run rather than upon the
tree which everyone _will_ run :( We'd all be better off if that testing
was being performed against linux-next. Or at least some (most) of it.


ho hum.

Bisecting: 23 revisions left to test after this
[919b0a2702e5a0284094f63215da65539f6ef692] Merge branch 'x86/ptemask' into auto-x86-next

No -mm today...

2008-06-06 08:50:22

by Ingo Molnar

[permalink] [raw]
Subject: Re: linux-next: Tree for June 5


* Andrew Morton <[email protected]> wrote:

> > most of the -tip topics (there are 75 of them currently) are present
> > in linux-next - about ~70% of all -tip commits are in linux-next
> > already. The stuff that is not in linux-next yet is either because
> > it's: miscellany fixes (i.e. intentionally grabbed out-of-tree to
> > make our tests work better), not cooked enough yet, or because we
> > are still working it out - tip is less than a month old still.
> >
> > in general the rule is that if there's anything we want to push
> > upstream, it will show up in linux-next.
>
> I don't think it's a good idea for you guys to be off working on
> 2.6.28 material when we're trying to stabilise 2.6.25, 2.6.26 and
> preparing for 2.6.27.
>
> What's especially regrettable is that, afaik, you are expending
> testing resources on a tree which nobody will ever run rather than
> upon the tree which everyone _will_ run :( [...]

what do you mean? We are testing commits that everybody will run and are
pre-filtering them for sanity and stability before they hit linux-next.

Ingo

2008-06-06 09:01:54

by Andrew Morton

[permalink] [raw]
Subject: Re: linux-next: Tree for June 5

On Fri, 6 Jun 2008 10:49:49 +0200 Ingo Molnar <[email protected]> wrote:

>
> * Andrew Morton <[email protected]> wrote:
>
> > > most of the -tip topics (there are 75 of them currently) are present
> > > in linux-next - about ~70% of all -tip commits are in linux-next
> > > already. The stuff that is not in linux-next yet is either because
> > > it's: miscellany fixes (i.e. intentionally grabbed out-of-tree to
> > > make our tests work better), not cooked enough yet, or because we
> > > are still working it out - tip is less than a month old still.
> > >
> > > in general the rule is that if there's anything we want to push
> > > upstream, it will show up in linux-next.
> >
> > I don't think it's a good idea for you guys to be off working on
> > 2.6.28 material when we're trying to stabilise 2.6.25, 2.6.26 and
> > preparing for 2.6.27.
> >
> > What's especially regrettable is that, afaik, you are expending
> > testing resources on a tree which nobody will ever run rather than
> > upon the tree which everyone _will_ run :( [...]
>
> what do you mean? We are testing commits that everybody will run and are
> pre-filtering them for sanity and stability before they hit linux-next.
>

One doesn't test commits - one tests a tree. And the -tip tree is
2.6.26-rc5 plus a bunch of x86 changes. That tree will never be run by
anyone. Testing -tip fails to pick up problems which are caused by
integration of the x86 changes with everyone else's work and it fails
to pick up problems which lie wholly outside the x86 changes.

For both these reasons it would be more valuable were that testing
effort to be expended on our 2.6.27 candidate tree.

Plus, of course, there's the risk that linux-next contains x86-only
regressions which were fixed or avoided in -tip.




Bisecting: 5 revisions left to test after this
[29657a44f8660acd8751d7e9f5aac06ec8633481] x86: cleanup early per cpu variables/accesses v4

2008-06-06 09:48:43

by Andrew Morton

[permalink] [raw]
Subject: Re: linux-next: Tree for June 5

On Fri, 6 Jun 2008 00:29:57 -0700 Andrew Morton <[email protected]> wrote:

> It crashes on two quite different machines with both slab and slub.

OK, I seem to be screwed here.

Five commits to go and my bisection point is at

Author: Mike Travis <[email protected]> 2008-05-12 12:21:13
Committer: Thomas Gleixner <[email protected]> 2008-05-23 09:08:00
Parent: a9ad585c8a18f7ba754b85f5786976609b9d7d29 (x86: remove the static 256k node_to_cpumask_map)
Branch:
Follows: v2.6.26-rc2
Precedes: next-20080526

But here I'm getting a totally different crash - an early exception.

I'll try a linear search starting at

Author: Mike Travis <[email protected]> 2008-05-12 12:21:12
Committer: Thomas Gleixner <[email protected]> 2008-05-23 09:05:25
Parent: b65e04b53ffcb4002737a5346c9ff8865c37be58 (x86: don't call pxm_to_node again)
Child: dfdf1d75efee39e9396f8384c6f3bf555349ed60 (x86: modify Kconfig to allow up to 4096 cpus)
Branch:

and ending at

Author: Mike Travis <[email protected]> 2008-05-12 12:21:13
Committer: Thomas Gleixner <[email protected]> 2008-05-23 09:08:00
Parent: a9ad585c8a18f7ba754b85f5786976609b9d7d29 (x86: remove the static 256k node_to_cpumask_map)
Branch:
Follows: v2.6.26-rc2
Precedes: next-20080526


2008-06-06 09:54:40

by Andrew Morton

[permalink] [raw]
Subject: Re: linux-next: Tree for June 5

On Fri, 6 Jun 2008 02:48:11 -0700 Andrew Morton <[email protected]> wrote:

> On Fri, 6 Jun 2008 00:29:57 -0700 Andrew Morton <[email protected]> wrote:
>
> > It crashes on two quite different machines with both slab and slub.
>
> OK, I seem to be screwed here.
>
> Five commits to go and my bisection point is at

argh, that was copy-n-pasted from gitk which doesn't give the commit IDs.

> Author: Mike Travis <[email protected]> 2008-05-12 12:21:13
> Committer: Thomas Gleixner <[email protected]> 2008-05-23 09:08:00
> Parent: a9ad585c8a18f7ba754b85f5786976609b9d7d29 (x86: remove the static 256k node_to_cpumask_map)
> Branch:
> Follows: v2.6.26-rc2
> Precedes: next-20080526

29657a44f8660acd8751d7e9f5aac06ec8633481
x86: cleanup early per cpu variables/accesses v4

> But here I'm getting a totally different crash - an early exception.
>
> I'll try a linear search starting at
>
> Author: Mike Travis <[email protected]> 2008-05-12 12:21:12
> Committer: Thomas Gleixner <[email protected]> 2008-05-23 09:05:25
> Parent: b65e04b53ffcb4002737a5346c9ff8865c37be58 (x86: don't call pxm_to_node again)
> Child: dfdf1d75efee39e9396f8384c6f3bf555349ed60 (x86: modify Kconfig to allow up to 4096 cpus)
> Branch:

ff0e010ef613b0e7136f2f40ec4b51273676b085
x86: fix remove cpu_pda table patch

> and ending at
>
> Author: Mike Travis <[email protected]> 2008-05-12 12:21:13
> Committer: Thomas Gleixner <[email protected]> 2008-05-23 09:08:00
> Parent: a9ad585c8a18f7ba754b85f5786976609b9d7d29 (x86: remove the static 256k node_to_cpumask_map)
> Branch:
> Follows: v2.6.26-rc2
> Precedes: next-20080526


78d49c6d890aee9cf8aea371011c9d7b0121b822
x86: remove static boot_cpu_pda array v2

2008-06-06 10:10:52

by Ingo Molnar

[permalink] [raw]
Subject: Re: linux-next: Tree for June 5


* Andrew Morton <[email protected]> wrote:

> > Author: Mike Travis <[email protected]> 2008-05-12 12:21:13
> > Committer: Thomas Gleixner <[email protected]> 2008-05-23 09:08:00
> > Parent: a9ad585c8a18f7ba754b85f5786976609b9d7d29 (x86: remove the static 256k node_to_cpumask_map)
> > Branch:
> > Follows: v2.6.26-rc2
> > Precedes: next-20080526
>
> 29657a44f8660acd8751d7e9f5aac06ec8633481
> x86: cleanup early per cpu variables/accesses v4

hm, these commits have not caused problems in testing before. Mike
Cc:-ed.

Ingo

2008-06-06 10:47:36

by Ingo Molnar

[permalink] [raw]
Subject: Re: linux-next: Tree for June 5


* Andrew Morton <[email protected]> wrote:

> > what do you mean? We are testing commits that everybody will run and
> > are pre-filtering them for sanity and stability before they hit
> > linux-next.
>
> One doesn't test commits - one tests a tree. And the -tip tree is
> 2.6.26-rc5 plus a bunch of x86 changes. [...]

no, 90%+ of all bugs are not due to tree interaction effects but are
caused by individual commits, triggerable on a particular
system/workload. (Our historic regression list is the proof for that,
can give you itemized statistics if you want.)

also, the -tip tree is not "2.6.26-rc5 plus a bunch of x86 changes" but
v2.6.26-rc5-84-g39b945a plus 75 topic trees we maintain:

build, core/futex-64bit, core/kill-the-BKL, core/locking, core/percpu,
core/printk, core/rcu, core/rodata, core/softirq, core/softlockup,
core/stacktrace, core/urgent, cpus4096, genirq, hrtimers, kmemcheck,
out-of-tree, pci-for-jesse, safe-poison-pointers, sched, sched-devel,
scratch, stackprotector, timers/clockevents, timers/hpet,
timers/hrtimers, timers/nohz, timers/posixtimers, tip, tracing/ftrace,
tracing/ftrace-mergefixups, tracing/immediates, tracing/markers,
tracing/mmiotrace, tracing/mmiotrace-mergefixups, tracing/nmisafe,
tracing/sched_markers, tracing/stopmachine-allcpus, tracing/sysprof,
tracing/textedit, x86/apic, x86/apm, x86/bitops, x86/build, x86/checkme,
x86/cleanups, x86/cpa, x86/cpu, x86/defconfig, x86/gart, x86/i8259,
x86/intel, x86/irq, x86/irqstats, x86/kconfig, x86/ldt, x86/mce,
x86/memtest, x86/mmio, x86/mpparse, x86/nmi, x86/numa, x86/numa-fixes,
x86/pat, x86/pebs, x86/ptemask, x86/resumetrace, x86/scratch, x86/setup,
x86/threadinfo, x86/timers, x86/urgent, x86/uv, x86/vdso, x86/xen,
x86/xsave.

most of which are in linux-next (around 70%), or will be shortly in
linux-next (more than 90%).

> [...] That tree will never be run by anyone. Testing -tip fails to
> pick up problems which are caused by integration of the x86 changes
> with everyone else's work and it fails to pick up problems which lie
> wholly outside the x86 changes.

that's wrong, and here's a very clear counter-example: 95% of the trees
we all test during a bisection session is executed for the first time
ever and wont ever be run by anyone else. If the integration aspects
mattered as much as you claim then bisection would almost never work in
practice.

Dont get me wrong, integration _does_ matter (and hence we do it
ourselves, instead of dumping 70+ trees on you!), but the reality is
that 90% of the bugs are introduced by a single commit and go away if
the change done by that commit is removed.

The real benefit of integration is not the technical effects of
integration but the testing effects: people are enabled to test more
commits at once.

> For both these reasons it would be more valuable were that testing
> effort to be expended on our 2.6.27 candidate tree.

but that's blatantly wrong: my testing would only be wasted if my test
capacity was unused. In reality it's fully utilized: half of it is spent
on general upstream problems we trigger [9381 commits since v2.6.25 and
counting], the other half of it is spent on our incoming -tip flow of
patches for v2.6.27 [750 commits and counting].

If there's spare capacity we do volunteer to debug whatever problem that
comes up. In fact i'd say i still test way more than i should ;-)

> Plus, of course, there's the risk that linux-next contains x86-only
> regressions which were fixed or avoided in -tip.

there's risk from every single line of source code difference. There's
risk from having just a single binary bit of difference between two
user-space installations. The question is always the amount of risk and
how to manage that risk.

Ingo

2008-06-06 10:54:53

by Andrew Morton

[permalink] [raw]
Subject: Re: linux-next: Tree for June 5

On Fri, 6 Jun 2008 02:48:11 -0700 Andrew Morton <[email protected]> wrote:

> I'll try a linear search starting at

ff0e010ef613b0e7136f2f40ec4b51273676b085
Author: Mike Travis <[email protected]> 2008-05-12 12:21:12
Committer: Thomas Gleixner <[email protected]> 2008-05-23 09:05:25
Parent: b65e04b53ffcb4002737a5346c9ff8865c37be58 (x86: don't call pxm_to_node again)
Child: dfdf1d75efee39e9396f8384c6f3bf555349ed60 (x86: modify Kconfig to allow up to 4096 cpus)
Branch:
Follows: v2.6.26-rc2
Precedes: next-20080526

x86: fix remove cpu_pda table patch


Good

dfdf1d75efee39e9396f8384c6f3bf555349ed60
Author: Mike Travis <[email protected]> 2008-05-12 12:21:12
Committer: Thomas Gleixner <[email protected]> 2008-05-23 09:05:39
Parent: ff0e010ef613b0e7136f2f40ec4b51273676b085 (x86: fix remove cpu_pda table patch)
Child: 29657a44f8660acd8751d7e9f5aac06ec8633481 (x86: cleanup early per cpu variables/accesses v4)
Branch:
Follows: v2.6.26-rc2
Precedes: next-20080526

x86: modify Kconfig to allow up to 4096 cpus

Good

29657a44f8660acd8751d7e9f5aac06ec8633481
Author: Mike Travis <[email protected]> 2008-05-12 12:21:12
Committer: Thomas Gleixner <[email protected]> 2008-05-23 09:07:23
Parent: dfdf1d75efee39e9396f8384c6f3bf555349ed60 (x86: modify Kconfig to allow up to 4096 cpus)
Child: 543e21916497be5a4005fd5820264ce1de9bd56d (x86: restore pda nodenumber field)
Branch:
Follows: v2.6.26-rc2
Precedes: next-20080526

x86: cleanup early per cpu variables/accesses v4

Good


543e21916497be5a4005fd5820264ce1de9bd56d
Author: Mike Travis <[email protected]> 2008-05-12 12:21:12
Committer: Thomas Gleixner <[email protected]> 2008-05-23 09:07:37
Parent: 29657a44f8660acd8751d7e9f5aac06ec8633481 (x86: cleanup early per cpu variables/accesses v4)
Child: a9ad585c8a18f7ba754b85f5786976609b9d7d29 (x86: remove the static 256k node_to_cpumask_map)
Branch:
Follows: v2.6.26-rc2
Precedes: next-20080526

x86: restore pda nodenumber field

Good

a9ad585c8a18f7ba754b85f5786976609b9d7d29
Author: Mike Travis <[email protected]> 2008-05-12 12:21:12
Committer: Thomas Gleixner <[email protected]> 2008-05-23 09:07:47
Parent: 543e21916497be5a4005fd5820264ce1de9bd56d (x86: restore pda nodenumber field)
Child: 78d49c6d890aee9cf8aea371011c9d7b0121b822 (x86: remove static boot_cpu_pda array v2)
Branch:
Follows: v2.6.26-rc2
Precedes: next-20080526

x86: remove the static 256k node_to_cpumask_map

crash, as described earlier.

I don't know what happened to that early exception - it didn't come back.

The below revert gets linux-next working for me.



From: Andrew Morton <[email protected]>

Revert

commit a9ad585c8a18f7ba754b85f5786976609b9d7d29
Author: Mike Travis <[email protected]>
Date: Mon May 12 21:21:12 2008 +0200

x86: remove the static 256k node_to_cpumask_map

* Consolidate node_to_cpumask operations and remove the 256k
byte node_to_cpumask_map. This is done by allocating the
node_to_cpumask_map array after the number of possible nodes
(nr_node_ids) is known.

* Debug printouts when CONFIG_DEBUG_PER_CPU_MAPS is active have
been increased. It now shows faults when calling node_to_cpumask()
and node_to_cpumask_ptr().

For inclusion into sched-devel/latest tree.

Based on:
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
+ sched-devel/latest .../mingo/linux-2.6-sched-devel.git

Signed-off-by: Mike Travis <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>

Cc: Mike Travis <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
---

arch/x86/kernel/setup.c | 132 +----------------------------------
arch/x86/mm/numa_64.c | 6 +
include/asm-x86/topology.h | 25 ++----
3 files changed, 19 insertions(+), 144 deletions(-)

diff -puN arch/x86/kernel/setup.c~revert-x86-remove-the-static-256k-node_to_cpumask_map arch/x86/kernel/setup.c
--- a/arch/x86/kernel/setup.c~revert-x86-remove-the-static-256k-node_to_cpumask_map
+++ a/arch/x86/kernel/setup.c
@@ -35,16 +35,6 @@ EXPORT_EARLY_PER_CPU_SYMBOL(x86_bios_cpu
/* map cpu index to node index */
DEFINE_EARLY_PER_CPU(int, x86_cpu_to_node_map, NUMA_NO_NODE);
EXPORT_EARLY_PER_CPU_SYMBOL(x86_cpu_to_node_map);
-
-/* which logical CPUs are on which nodes */
-cpumask_t *node_to_cpumask_map;
-EXPORT_SYMBOL(node_to_cpumask_map);
-
-/* setup node_to_cpumask_map */
-static void __init setup_node_to_cpumask_map(void);
-
-#else
-static inline void setup_node_to_cpumask_map(void) { }
#endif

#if defined(CONFIG_HAVE_SETUP_PER_CPU_AREA) && defined(CONFIG_SMP)
@@ -191,15 +181,11 @@ void __init setup_per_cpu_areas(void)

}

- printk(KERN_DEBUG "NR_CPUS: %d, nr_cpu_ids: %d, nr_node_ids %d\n",
- NR_CPUS, nr_cpu_ids, nr_node_ids);
+ printk(KERN_DEBUG "NR_CPUS: %d, nr_cpu_ids: %d\n", NR_CPUS, nr_cpu_ids);

/* Setup percpu data maps */
setup_per_cpu_maps();

- /* Setup node to cpumask map */
- setup_node_to_cpumask_map();
-
/* Setup cpumask_of_cpu map */
setup_cpumask_of_cpu();
}
@@ -220,35 +206,6 @@ void __cpuinit amd_enable_pci_ext_cfg(st
#endif

#ifdef X86_64_NUMA
-
-/*
- * Allocate node_to_cpumask_map based on number of available nodes
- * Requires node_possible_map to be valid.
- *
- * Note: node_to_cpumask() is not valid until after this is done.
- */
-static void __init setup_node_to_cpumask_map(void)
-{
- unsigned int node, num = 0;
- cpumask_t *map;
-
- /* setup nr_node_ids if not done yet */
- if (nr_node_ids == MAX_NUMNODES) {
- for_each_node_mask(node, node_possible_map)
- num = node;
- nr_node_ids = num + 1;
- }
-
- /* allocate the map */
- map = alloc_bootmem_low(nr_node_ids * sizeof(cpumask_t));
-
- Dprintk(KERN_DEBUG "Node to cpumask map at %p for %d nodes\n",
- map, nr_node_ids);
-
- /* node_to_cpumask() will now work */
- node_to_cpumask_map = map;
-}
-
void __cpuinit numa_set_node(int cpu, int node)
{
int *cpu_to_node_map = early_per_cpu_ptr(x86_cpu_to_node_map);
@@ -271,8 +228,6 @@ void __cpuinit numa_clear_node(int cpu)
numa_set_node(cpu, NUMA_NO_NODE);
}

-#ifndef CONFIG_DEBUG_PER_CPU_MAPS
-
void __cpuinit numa_add_cpu(int cpu)
{
cpu_set(cpu, node_to_cpumask_map[early_cpu_to_node(cpu)]);
@@ -282,44 +237,9 @@ void __cpuinit numa_remove_cpu(int cpu)
{
cpu_clear(cpu, node_to_cpumask_map[cpu_to_node(cpu)]);
}
+#endif /* CONFIG_NUMA */

-#else /* CONFIG_DEBUG_PER_CPU_MAPS */
-
-/*
- * --------- debug versions of the numa functions ---------
- */
-static void __cpuinit numa_set_cpumask(int cpu, int enable)
-{
- int node = cpu_to_node(cpu);
- cpumask_t *mask;
- char buf[64];
-
- if (node_to_cpumask_map == NULL) {
- printk(KERN_ERR "node_to_cpumask_map NULL\n");
- dump_stack();
- return;
- }
-
- mask = &node_to_cpumask_map[node];
- if (enable)
- cpu_set(cpu, *mask);
- else
- cpu_clear(cpu, *mask);
-
- cpulist_scnprintf(buf, sizeof(buf), *mask);
- printk(KERN_DEBUG "%s cpu %d node %d: mask now %s\n",
- enable? "numa_add_cpu":"numa_remove_cpu", cpu, node, buf);
- }
-
-void __cpuinit numa_add_cpu(int cpu)
-{
- numa_set_cpumask(cpu, 1);
-}
-
-void __cpuinit numa_remove_cpu(int cpu)
-{
- numa_set_cpumask(cpu, 0);
-}
+#if defined(CONFIG_DEBUG_PER_CPU_MAPS) && defined(CONFIG_X86_64)

int cpu_to_node(int cpu)
{
@@ -333,10 +253,6 @@ int cpu_to_node(int cpu)
}
EXPORT_SYMBOL(cpu_to_node);

-/*
- * Same function as cpu_to_node() but used if called before the
- * per_cpu areas are setup.
- */
int early_cpu_to_node(int cpu)
{
if (early_per_cpu_ptr(x86_cpu_to_node_map))
@@ -345,47 +261,9 @@ int early_cpu_to_node(int cpu)
if (!per_cpu_offset(cpu)) {
printk(KERN_WARNING
"early_cpu_to_node(%d): no per_cpu area!\n", cpu);
- dump_stack();
+ dump_stack();
return NUMA_NO_NODE;
}
return per_cpu(x86_cpu_to_node_map, cpu);
}
-
-/*
- * Returns a pointer to the bitmask of CPUs on Node 'node'.
- */
-cpumask_t *_node_to_cpumask_ptr(int node)
-{
- if (node_to_cpumask_map == NULL) {
- printk(KERN_WARNING
- "_node_to_cpumask_ptr(%d): no node_to_cpumask_map!\n",
- node);
- dump_stack();
- return &cpu_online_map;
- }
- return &node_to_cpumask_map[node];
-}
-EXPORT_SYMBOL(_node_to_cpumask_ptr);
-
-/*
- * Returns a bitmask of CPUs on Node 'node'.
- */
-cpumask_t node_to_cpumask(int node)
-{
- if (node_to_cpumask_map == NULL) {
- printk(KERN_WARNING
- "node_to_cpumask(%d): no node_to_cpumask_map!\n", node);
- dump_stack();
- return cpu_online_map;
- }
- return node_to_cpumask_map[node];
-}
-EXPORT_SYMBOL(node_to_cpumask);
-
-/*
- * --------- end of debug versions of the numa functions ---------
- */
-
-#endif /* CONFIG_DEBUG_PER_CPU_MAPS */
-
-#endif /* X86_64_NUMA */
+#endif
diff -puN arch/x86/mm/numa_64.c~revert-x86-remove-the-static-256k-node_to_cpumask_map arch/x86/mm/numa_64.c
--- a/arch/x86/mm/numa_64.c~revert-x86-remove-the-static-256k-node_to_cpumask_map
+++ a/arch/x86/mm/numa_64.c
@@ -35,6 +35,9 @@ s16 apicid_to_node[MAX_LOCAL_APIC] __cpu
[0 ... MAX_LOCAL_APIC-1] = NUMA_NO_NODE
};

+cpumask_t node_to_cpumask_map[MAX_NUMNODES] __read_mostly;
+EXPORT_SYMBOL(node_to_cpumask_map);
+
int numa_off __initdata;
static unsigned long __initdata nodemap_addr;
static unsigned long __initdata nodemap_size;
@@ -557,6 +560,9 @@ void __init numa_initmem_init(unsigned l
node_set(0, node_possible_map);
for (i = 0; i < NR_CPUS; i++)
numa_set_node(i, 0);
+ /* cpumask_of_cpu() may not be available during early startup */
+ memset(&node_to_cpumask_map[0], 0, sizeof(node_to_cpumask_map[0]));
+ cpu_set(0, node_to_cpumask_map[0]);
e820_register_active_regions(0, start_pfn, last_pfn);
setup_node_bootmem(0, start_pfn << PAGE_SHIFT, last_pfn << PAGE_SHIFT);
}
diff -puN include/asm-x86/topology.h~revert-x86-remove-the-static-256k-node_to_cpumask_map include/asm-x86/topology.h
--- a/include/asm-x86/topology.h~revert-x86-remove-the-static-256k-node_to_cpumask_map
+++ a/include/asm-x86/topology.h
@@ -57,16 +57,10 @@ static inline int cpu_to_node(int cpu)
}
#define early_cpu_to_node(cpu) cpu_to_node(cpu)

-/* Returns a bitmask of CPUs on Node 'node'. */
-static inline cpumask_t node_to_cpumask(int node)
-{
- return node_to_cpumask_map[node];
-}
-
#else /* CONFIG_X86_64 */

/* Mappings between node number and cpus on that node. */
-extern cpumask_t *node_to_cpumask_map;
+extern cpumask_t node_to_cpumask_map[];

/* Mappings between logical cpu number and node number */
DECLARE_EARLY_PER_CPU(int, x86_cpu_to_node_map);
@@ -110,6 +104,7 @@ static inline cpumask_t node_to_cpumask(
}

#endif /* !CONFIG_DEBUG_PER_CPU_MAPS */
+#endif /* CONFIG_X86_64 */

/* Replace default node_to_cpumask_ptr with optimized version */
#define node_to_cpumask_ptr(v, node) \
@@ -118,7 +113,12 @@ static inline cpumask_t node_to_cpumask(
#define node_to_cpumask_ptr_next(v, node) \
v = _node_to_cpumask_ptr(node)

-#endif /* CONFIG_X86_64 */
+/* Returns the number of the first CPU on Node 'node'. */
+static inline int node_to_first_cpu(int node)
+{
+ node_to_cpumask_ptr(mask, node);
+ return first_cpu(*mask);
+}

/*
* Returns the number of the node containing Node 'node'. This
@@ -204,15 +204,6 @@ static inline int node_to_first_cpu(int

#include <asm-generic/topology.h>

-#ifdef CONFIG_NUMA
-/* Returns the number of the first CPU on Node 'node'. */
-static inline int node_to_first_cpu(int node)
-{
- node_to_cpumask_ptr(mask, node);
- return first_cpu(*mask);
-}
-#endif
-
extern cpumask_t cpu_coregroup_map(int cpu);

#ifdef ENABLE_TOPO_DEFINES
_

2008-06-06 11:21:19

by Vegard Nossum

[permalink] [raw]
Subject: Re: linux-next: Tree for June 5

On Fri, Jun 6, 2008 at 12:54 PM, Andrew Morton
<[email protected]> wrote:
> On Fri, 6 Jun 2008 02:48:11 -0700 Andrew Morton <[email protected]> wrote:

...

> commit a9ad585c8a18f7ba754b85f5786976609b9d7d29
> Author: Mike Travis <[email protected]>
> Date: Mon May 12 21:21:12 2008 +0200
>
> x86: remove the static 256k node_to_cpumask_map
>
> * Consolidate node_to_cpumask operations and remove the 256k
> byte node_to_cpumask_map. This is done by allocating the
> node_to_cpumask_map array after the number of possible nodes
> (nr_node_ids) is known.
>
> * Debug printouts when CONFIG_DEBUG_PER_CPU_MAPS is active have
> been increased. It now shows faults when calling node_to_cpumask()
> and node_to_cpumask_ptr().

This might be obvious, but maybe enabling CONFIG_DEBUG_PER_CPU_MAPS
will give you some more (valuable) info? It looks to me like maybe
nr_node_ids is returning inconsistent numbers or used somewhere before
it's properly initialized.


Vegard

--
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
-- E. W. Dijkstra, EWD1036

2008-06-06 11:50:26

by Paul Mackerras

[permalink] [raw]
Subject: Re: linux-next: Tree for June 5

Andrew Morton writes:

> Well OK. But patches in fact _do_ go into Linux as a single linear
> stream of commits.

Well no, they don't. Multiple people work on things independently and
then put their stuff together. Sometimes there are then conflicts
that have to be sorted out. That's what merging is all about.

> But the whole git model ignores that reality and
> here we see the result.

No, the git model (and the BK model before it) expresses the reality
that there is lots of development going on in parallel in many
different places.

> And saying "git doesn't work like that - you don't understand" just
> doesn't cut it. It is a tool's job to permit humans to implement the
> workflow which they wish to follow. Not to go and force them into
> doing something inferior.

You'd prefer to be the bunny that keeps every single subsystem's
string of patches all bundled together in a single humungous quilt
series? With all due respect (and with a sense of admiration at how
much patch-wrangling you already do), I don't think you'd scale that
far. :)

Paul.

2008-06-06 11:58:32

by Ingo Molnar

[permalink] [raw]
Subject: Re: linux-next: Tree for June 5


* Andrew Morton <[email protected]> wrote:

> Good
>
> a9ad585c8a18f7ba754b85f5786976609b9d7d29
> Author: Mike Travis <[email protected]> 2008-05-12 12:21:12
> Committer: Thomas Gleixner <[email protected]> 2008-05-23 09:07:47
> Parent: 543e21916497be5a4005fd5820264ce1de9bd56d (x86: restore pda nodenumber field)
> Child: 78d49c6d890aee9cf8aea371011c9d7b0121b822 (x86: remove static boot_cpu_pda array v2)
> Branch:
> Follows: v2.6.26-rc2
> Precedes: next-20080526
>
> x86: remove the static 256k node_to_cpumask_map
>
> crash, as described earlier.

thanks for tracking it down! This was the origin of the commit:

# tip/x86/numa: a9ad585: x86: remove the static 256k node_to_cpumask_map

which has been in -tip since May 12 and in linux-next for two weeks
AFAICS, which is beyond the point of being something freshly wrong.

So i suspect something more subtle here. What compiler version are you
using? This crash is not something that has been found in testing before
- i use rather new compilers, gcc 4.2.2 most of the time. Previous
compilers miscompile the kernel seriously so it's not usable for our
regression testing grid.

until more is found out i've put the revert into tip/x86/numa for now.
Note, you'll also need the commit below for 32-bit NUMA.

Ingo

---------------->
commit f418f2b4a9b6ef4035cc8c9a166873a2b275e4ef
Author: Ingo Molnar <[email protected]>
Date: Fri Jun 6 13:54:52 2008 +0200

x86: fix revert side-effects

fix 32-bit NUMA.

diff --git a/include/asm-x86/topology.h b/include/asm-x86/topology.h
index c0e6ff7..abd3aa8 100644
--- a/include/asm-x86/topology.h
+++ b/include/asm-x86/topology.h
@@ -57,6 +57,18 @@ static inline int cpu_to_node(int cpu)
}
#define early_cpu_to_node(cpu) cpu_to_node(cpu)

+/* Returns a pointer to the cpumask of CPUs on Node 'node'. */
+static inline cpumask_t *_node_to_cpumask_ptr(int node)
+{
+ return &node_to_cpumask_map[node];
+}
+
+/* Returns a bitmask of CPUs on Node 'node'. */
+static inline cpumask_t node_to_cpumask(int node)
+{
+ return node_to_cpumask_map[node];
+}
+
#else /* CONFIG_X86_64 */

/* Mappings between node number and cpus on that node. */

2008-06-06 12:34:18

by Vegard Nossum

[permalink] [raw]
Subject: Re: linux-next: Tree for June 5

On Fri, Jun 6, 2008 at 1:57 PM, Ingo Molnar <[email protected]> wrote:
>
> * Andrew Morton <[email protected]> wrote:
>
>> Good
>>
>> a9ad585c8a18f7ba754b85f5786976609b9d7d29
>> Author: Mike Travis <[email protected]> 2008-05-12 12:21:12
>> Committer: Thomas Gleixner <[email protected]> 2008-05-23 09:07:47
>> Parent: 543e21916497be5a4005fd5820264ce1de9bd56d (x86: restore pda nodenumber field)
>> Child: 78d49c6d890aee9cf8aea371011c9d7b0121b822 (x86: remove static boot_cpu_pda array v2)
>> Branch:
>> Follows: v2.6.26-rc2
>> Precedes: next-20080526
>>
>> x86: remove the static 256k node_to_cpumask_map
>>
>> crash, as described earlier.
>
> thanks for tracking it down! This was the origin of the commit:
>
> # tip/x86/numa: a9ad585: x86: remove the static 256k node_to_cpumask_map
>
> which has been in -tip since May 12 and in linux-next for two weeks
> AFAICS, which is beyond the point of being something freshly wrong.
>
> So i suspect something more subtle here. What compiler version are you
> using? This crash is not something that has been found in testing before
> - i use rather new compilers, gcc 4.2.2 most of the time. Previous
> compilers miscompile the kernel seriously so it's not usable for our
> regression testing grid.
>

Hi,

I reproced it with gc 4.1.2. I think the error is somewhere in kernel/sched.c.

static int __build_sched_domains(const cpumask_t *cpu_map,
struct sched_domain_attr *attr)
{
...
for (i = 0; i < MAX_NUMNODES; i++) {
...
sg = kmalloc_node(sizeof(struct sched_group), GFP_KERNEL, i);
...

This code is calling into the allocator with a spurious value of i,
which causes SLAB to use an index (of 4 in my case) that is out of
bounds for its nodelist array (at least it hasn't been initialized).

This bit of code (a bit further down, inside the same loop) is also dubious:

sg = kmalloc_node(sizeof(struct sched_group),
GFP_KERNEL, i);
if (!sg) {
printk(KERN_WARNING
"Can not alloc domain group for node %d\n", j);
goto error;
}

Where it passes i to kmalloc_node() but reports an allocation for node
j. Which one is correct?

Hope this helps, will send an update if I find out more.


Vegard

--
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
-- E. W. Dijkstra, EWD1036

2008-06-06 13:29:16

by Mike Travis

[permalink] [raw]
Subject: Re: linux-next: Tree for June 5

Andrew Morton wrote:
> On Fri, 6 Jun 2008 02:48:11 -0700 Andrew Morton <[email protected]> wrote:
>
>> I'll try a linear search starting at
>
> ff0e010ef613b0e7136f2f40ec4b51273676b085
> Author: Mike Travis <[email protected]> 2008-05-12 12:21:12
> Committer: Thomas Gleixner <[email protected]> 2008-05-23 09:05:25
> Parent: b65e04b53ffcb4002737a5346c9ff8865c37be58 (x86: don't call pxm_to_node again)
> Child: dfdf1d75efee39e9396f8384c6f3bf555349ed60 (x86: modify Kconfig to allow up to 4096 cpus)
> Branch:
> Follows: v2.6.26-rc2
> Precedes: next-20080526
>
> x86: fix remove cpu_pda table patch
>
>
> Good
>
> dfdf1d75efee39e9396f8384c6f3bf555349ed60
> Author: Mike Travis <[email protected]> 2008-05-12 12:21:12
> Committer: Thomas Gleixner <[email protected]> 2008-05-23 09:05:39
> Parent: ff0e010ef613b0e7136f2f40ec4b51273676b085 (x86: fix remove cpu_pda table patch)
> Child: 29657a44f8660acd8751d7e9f5aac06ec8633481 (x86: cleanup early per cpu variables/accesses v4)
> Branch:
> Follows: v2.6.26-rc2
> Precedes: next-20080526
>
> x86: modify Kconfig to allow up to 4096 cpus
>
> Good
>
> 29657a44f8660acd8751d7e9f5aac06ec8633481
> Author: Mike Travis <[email protected]> 2008-05-12 12:21:12
> Committer: Thomas Gleixner <[email protected]> 2008-05-23 09:07:23
> Parent: dfdf1d75efee39e9396f8384c6f3bf555349ed60 (x86: modify Kconfig to allow up to 4096 cpus)
> Child: 543e21916497be5a4005fd5820264ce1de9bd56d (x86: restore pda nodenumber field)
> Branch:
> Follows: v2.6.26-rc2
> Precedes: next-20080526
>
> x86: cleanup early per cpu variables/accesses v4
>
> Good
>
>
> 543e21916497be5a4005fd5820264ce1de9bd56d
> Author: Mike Travis <[email protected]> 2008-05-12 12:21:12
> Committer: Thomas Gleixner <[email protected]> 2008-05-23 09:07:37
> Parent: 29657a44f8660acd8751d7e9f5aac06ec8633481 (x86: cleanup early per cpu variables/accesses v4)
> Child: a9ad585c8a18f7ba754b85f5786976609b9d7d29 (x86: remove the static 256k node_to_cpumask_map)
> Branch:
> Follows: v2.6.26-rc2
> Precedes: next-20080526
>
> x86: restore pda nodenumber field
>
> Good
>
> a9ad585c8a18f7ba754b85f5786976609b9d7d29
> Author: Mike Travis <[email protected]> 2008-05-12 12:21:12
> Committer: Thomas Gleixner <[email protected]> 2008-05-23 09:07:47
> Parent: 543e21916497be5a4005fd5820264ce1de9bd56d (x86: restore pda nodenumber field)
> Child: 78d49c6d890aee9cf8aea371011c9d7b0121b822 (x86: remove static boot_cpu_pda array v2)
> Branch:
> Follows: v2.6.26-rc2
> Precedes: next-20080526
>
> x86: remove the static 256k node_to_cpumask_map
>
> crash, as described earlier.
>
> I don't know what happened to that early exception - it didn't come back.
>
> The below revert gets linux-next working for me.

Did you try using the DEBUG_PER_CPU_MAPS option? This should trigger on any
use of the node_to_cpumask map before it's been allocated. (Hmm, I should
check if it also validates the node number - as when MAX_NUMNODES is used
instead of nr_node_ids.)

Also, could you send me the config file and a short description of what kind
of system you are testing on?

Thanks,
Mike

2008-06-06 13:34:00

by Mike Travis

[permalink] [raw]
Subject: Re: linux-next: Tree for June 5

Vegard Nossum wrote:
> On Fri, Jun 6, 2008 at 1:57 PM, Ingo Molnar <[email protected]> wrote:
>> * Andrew Morton <[email protected]> wrote:
>>
>>> Good
>>>
>>> a9ad585c8a18f7ba754b85f5786976609b9d7d29
>>> Author: Mike Travis <[email protected]> 2008-05-12 12:21:12
>>> Committer: Thomas Gleixner <[email protected]> 2008-05-23 09:07:47
>>> Parent: 543e21916497be5a4005fd5820264ce1de9bd56d (x86: restore pda nodenumber field)
>>> Child: 78d49c6d890aee9cf8aea371011c9d7b0121b822 (x86: remove static boot_cpu_pda array v2)
>>> Branch:
>>> Follows: v2.6.26-rc2
>>> Precedes: next-20080526
>>>
>>> x86: remove the static 256k node_to_cpumask_map
>>>
>>> crash, as described earlier.
>> thanks for tracking it down! This was the origin of the commit:
>>
>> # tip/x86/numa: a9ad585: x86: remove the static 256k node_to_cpumask_map
>>
>> which has been in -tip since May 12 and in linux-next for two weeks
>> AFAICS, which is beyond the point of being something freshly wrong.
>>
>> So i suspect something more subtle here. What compiler version are you
>> using? This crash is not something that has been found in testing before
>> - i use rather new compilers, gcc 4.2.2 most of the time. Previous
>> compilers miscompile the kernel seriously so it's not usable for our
>> regression testing grid.
>>
>
> Hi,
>
> I reproced it with gc 4.1.2. I think the error is somewhere in kernel/sched.c.
>
> static int __build_sched_domains(const cpumask_t *cpu_map,
> struct sched_domain_attr *attr)
> {
> ...
> for (i = 0; i < MAX_NUMNODES; i++) {
> ...
> sg = kmalloc_node(sizeof(struct sched_group), GFP_KERNEL, i);
> ...
>
> This code is calling into the allocator with a spurious value of i,
> which causes SLAB to use an index (of 4 in my case) that is out of
> bounds for its nodelist array (at least it hasn't been initialized).
>
> This bit of code (a bit further down, inside the same loop) is also dubious:
>
> sg = kmalloc_node(sizeof(struct sched_group),
> GFP_KERNEL, i);
> if (!sg) {
> printk(KERN_WARNING
> "Can not alloc domain group for node %d\n", j);
> goto error;
> }
>
> Where it passes i to kmalloc_node() but reports an allocation for node
> j. Which one is correct?
>
> Hope this helps, will send an update if I find out more.
>
>
> Vegard
>

Thanks Vegard for tracking this down. My thoughts were along the same
wavelength... ;-)

Mike

2008-06-06 13:50:17

by Vegard Nossum

[permalink] [raw]
Subject: Re: linux-next: Tree for June 5

On Fri, Jun 6, 2008 at 3:33 PM, Mike Travis <[email protected]> wrote:
> Vegard Nossum wrote:
>>
>> I reproced it with gc 4.1.2. I think the error is somewhere in kernel/sched.c.
>>
>> static int __build_sched_domains(const cpumask_t *cpu_map,
>> struct sched_domain_attr *attr)
>> {
>> ...
>> for (i = 0; i < MAX_NUMNODES; i++) {
>> ...
>> sg = kmalloc_node(sizeof(struct sched_group), GFP_KERNEL, i);
>> ...
>>
>> This code is calling into the allocator with a spurious value of i,
>> which causes SLAB to use an index (of 4 in my case) that is out of
>> bounds for its nodelist array (at least it hasn't been initialized).
>>
>> This bit of code (a bit further down, inside the same loop) is also dubious:
>>
>> sg = kmalloc_node(sizeof(struct sched_group),
>> GFP_KERNEL, i);
>> if (!sg) {
>> printk(KERN_WARNING
>> "Can not alloc domain group for node %d\n", j);
>> goto error;
>> }
>>
>> Where it passes i to kmalloc_node() but reports an allocation for node
>> j. Which one is correct?
>>

Hm, I think I'm wrong and the code is correct. However...

>> Hope this helps, will send an update if I find out more.
>>
>>
>> Vegard
>>
>
> Thanks Vegard for tracking this down. My thoughts were along the same
> wavelength... ;-)

I applied this patch
@@ -7133,6 +7133,14 @@ static int __build_sched_domains(const
cpumask_t *cpu_map,
cpus_clear(*covered);

cpus_and(*nodemask, *nodemask, *cpu_map);
+
+ printk("node %d\n", i);
+ for (j = 0; j < NR_CPUS; ++j)
+ printk("%c", cpu_isset(j, *nodemask) ? 'X' : '.');
+ printk("\n");
+
+ printk("empty = %d\n", cpus_empty(*nodemask));
+
if (cpus_empty(*nodemask)) {
sched_group_nodes[i] = NULL;
continue;

and it shows some really strange output, maybe it makes sense to you:

(the X means cpu is in the node)

Total of 2 processors activated (11976.24 BogoMIPS).
node 0
XX..............................................................................
................................................................................
................................................................................
...............
empty = 0
node 1
XX..............................................................................
................................................................................
................................................................................
...............
empty = 0
l3 = cachep->nodelists[0] (size-64) = ffff81003f824340
node 2
................................................................................
................................................................................
................................................................................
...............
empty = 1
node 3
................................................................................
................................................................................
................................................................................
...............
empty = 1
node 4
X...............................................................................
................................................................................
................................................................................
...............
empty = 0

This is a P4 3.0GHz with 1 physical CPU (but HT, so two logical CPUs).
Yet node 4 is claimed to have a cpu too. That's bogus!

(But I don't think it's an error in sched.c any more, probably the
code that sets up the node maps.)


Vegard

--
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
-- E. W. Dijkstra, EWD1036

2008-06-06 14:07:28

by Vegard Nossum

[permalink] [raw]
Subject: Re: linux-next: Tree for June 5

On Fri, Jun 6, 2008 at 3:50 PM, Vegard Nossum <[email protected]> wrote:
> On Fri, Jun 6, 2008 at 3:33 PM, Mike Travis <[email protected]> wrote:
>> Vegard Nossum wrote:
>>>
>>> I reproced it with gc 4.1.2. I think the error is somewhere in kernel/sched.c.
>>>
>>> static int __build_sched_domains(const cpumask_t *cpu_map,
>>> struct sched_domain_attr *attr)
>>> {
>>> ...
>>> for (i = 0; i < MAX_NUMNODES; i++) {
>>> ...
>>> sg = kmalloc_node(sizeof(struct sched_group), GFP_KERNEL, i);
>>> ...
>>>
>>> This code is calling into the allocator with a spurious value of i,
>>> which causes SLAB to use an index (of 4 in my case) that is out of
>>> bounds for its nodelist array (at least it hasn't been initialized).
>>>
>>> This bit of code (a bit further down, inside the same loop) is also dubious:
>>>
>>> sg = kmalloc_node(sizeof(struct sched_group),
>>> GFP_KERNEL, i);
>>> if (!sg) {
>>> printk(KERN_WARNING
>>> "Can not alloc domain group for node %d\n", j);
>>> goto error;
>>> }
>>>
>>> Where it passes i to kmalloc_node() but reports an allocation for node
>>> j. Which one is correct?
>>>
>
> Hm, I think I'm wrong and the code is correct. However...
>
>>> Hope this helps, will send an update if I find out more.
>>>
>>>
>>> Vegard
>>>
>>
>> Thanks Vegard for tracking this down. My thoughts were along the same
>> wavelength... ;-)

...

>
> This is a P4 3.0GHz with 1 physical CPU (but HT, so two logical CPUs).
> Yet node 4 is claimed to have a cpu too. That's bogus!
>
> (But I don't think it's an error in sched.c any more, probably the
> code that sets up the node maps.)

Aha.

The error is of course that the node masks for nodes > nr_node_ids are
not valid. While this function ignores that:

cpumask_t *_node_to_cpumask_ptr(int node)
{
if (node_to_cpumask_map == NULL) {
printk(KERN_WARNING
"_node_to_cpumask_ptr(%d): no node_to_cpumask_map!\n",
node);
dump_stack();
return &cpu_online_map;
}
return &node_to_cpumask_map[node];
}
EXPORT_SYMBOL(_node_to_cpumask_ptr);

Notice the return statement. It needs to check if node < nr_node_ids.


Vegard

--
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
-- E. W. Dijkstra, EWD1036

2008-06-06 14:13:36

by Mike Travis

[permalink] [raw]
Subject: Re: linux-next: Tree for June 5

Vegard Nossum wrote:
> On Fri, Jun 6, 2008 at 3:33 PM, Mike Travis <[email protected]> wrote:
>> Vegard Nossum wrote:
>>> I reproced it with gc 4.1.2. I think the error is somewhere in kernel/sched.c.
>>>
>>> static int __build_sched_domains(const cpumask_t *cpu_map,
>>> struct sched_domain_attr *attr)
>>> {
>>> ...
>>> for (i = 0; i < MAX_NUMNODES; i++) {
>>> ...
>>> sg = kmalloc_node(sizeof(struct sched_group), GFP_KERNEL, i);
>>> ...
>>>
>>> This code is calling into the allocator with a spurious value of i,
>>> which causes SLAB to use an index (of 4 in my case) that is out of
>>> bounds for its nodelist array (at least it hasn't been initialized).
>>>
>>> This bit of code (a bit further down, inside the same loop) is also dubious:
>>>
>>> sg = kmalloc_node(sizeof(struct sched_group),
>>> GFP_KERNEL, i);
>>> if (!sg) {
>>> printk(KERN_WARNING
>>> "Can not alloc domain group for node %d\n", j);
>>> goto error;
>>> }
>>>
>>> Where it passes i to kmalloc_node() but reports an allocation for node
>>> j. Which one is correct?
>>>
>
> Hm, I think I'm wrong and the code is correct. However...
>
>>> Hope this helps, will send an update if I find out more.
>>>
>>>
>>> Vegard
>>>
>> Thanks Vegard for tracking this down. My thoughts were along the same
>> wavelength... ;-)
>
> I applied this patch
> @@ -7133,6 +7133,14 @@ static int __build_sched_domains(const
> cpumask_t *cpu_map,
> cpus_clear(*covered);
>
> cpus_and(*nodemask, *nodemask, *cpu_map);
> +
> + printk("node %d\n", i);
> + for (j = 0; j < NR_CPUS; ++j)
> + printk("%c", cpu_isset(j, *nodemask) ? 'X' : '.');
> + printk("\n");
> +
> + printk("empty = %d\n", cpus_empty(*nodemask));
> +
> if (cpus_empty(*nodemask)) {
> sched_group_nodes[i] = NULL;
> continue;
>
> and it shows some really strange output, maybe it makes sense to you:
>
> (the X means cpu is in the node)
>
> Total of 2 processors activated (11976.24 BogoMIPS).
> node 0
> XX..............................................................................
> ................................................................................
> ................................................................................
> ...............
> empty = 0
> node 1
> XX..............................................................................
> ................................................................................
> ................................................................................
> ...............
> empty = 0
> l3 = cachep->nodelists[0] (size-64) = ffff81003f824340
> node 2
> ................................................................................
> ................................................................................
> ................................................................................
> ...............
> empty = 1
> node 3
> ................................................................................
> ................................................................................
> ................................................................................
> ...............
> empty = 1
> node 4
> X...............................................................................
> ................................................................................
> ................................................................................
> ...............
> empty = 0
>
> This is a P4 3.0GHz with 1 physical CPU (but HT, so two logical CPUs).
> Yet node 4 is claimed to have a cpu too. That's bogus!
>
> (But I don't think it's an error in sched.c any more, probably the
> code that sets up the node maps.)
>
>
> Vegard
>

Could you send me the full console log and your config file? The setup of
the node_to_cpumask map is dependent on the early discovery (usually in the
apic code) and there's been some changes in that area recently.

Thanks,
Mike

2008-06-06 14:20:40

by Mike Travis

[permalink] [raw]
Subject: Re: linux-next: Tree for June 5

Vegard Nossum wrote:
> On Fri, Jun 6, 2008 at 3:50 PM, Vegard Nossum <[email protected]> wrote:
>> On Fri, Jun 6, 2008 at 3:33 PM, Mike Travis <[email protected]> wrote:
>>> Vegard Nossum wrote:
>>>> I reproced it with gc 4.1.2. I think the error is somewhere in kernel/sched.c.
>>>>
>>>> static int __build_sched_domains(const cpumask_t *cpu_map,
>>>> struct sched_domain_attr *attr)
>>>> {
>>>> ...
>>>> for (i = 0; i < MAX_NUMNODES; i++) {
>>>> ...
>>>> sg = kmalloc_node(sizeof(struct sched_group), GFP_KERNEL, i);
>>>> ...
>>>>
>>>> This code is calling into the allocator with a spurious value of i,
>>>> which causes SLAB to use an index (of 4 in my case) that is out of
>>>> bounds for its nodelist array (at least it hasn't been initialized).
>>>>
>>>> This bit of code (a bit further down, inside the same loop) is also dubious:
>>>>
>>>> sg = kmalloc_node(sizeof(struct sched_group),
>>>> GFP_KERNEL, i);
>>>> if (!sg) {
>>>> printk(KERN_WARNING
>>>> "Can not alloc domain group for node %d\n", j);
>>>> goto error;
>>>> }
>>>>
>>>> Where it passes i to kmalloc_node() but reports an allocation for node
>>>> j. Which one is correct?
>>>>
>> Hm, I think I'm wrong and the code is correct. However...
>>
>>>> Hope this helps, will send an update if I find out more.
>>>>
>>>>
>>>> Vegard
>>>>
>>> Thanks Vegard for tracking this down. My thoughts were along the same
>>> wavelength... ;-)
>
> ...
>
>> This is a P4 3.0GHz with 1 physical CPU (but HT, so two logical CPUs).
>> Yet node 4 is claimed to have a cpu too. That's bogus!
>>
>> (But I don't think it's an error in sched.c any more, probably the
>> code that sets up the node maps.)
>
> Aha.
>
> The error is of course that the node masks for nodes > nr_node_ids are
> not valid. While this function ignores that:
>
> cpumask_t *_node_to_cpumask_ptr(int node)
> {
> if (node_to_cpumask_map == NULL) {
> printk(KERN_WARNING
> "_node_to_cpumask_ptr(%d): no node_to_cpumask_map!\n",
> node);
> dump_stack();
> return &cpu_online_map;
> }
> return &node_to_cpumask_map[node];
> }
> EXPORT_SYMBOL(_node_to_cpumask_ptr);
>
> Notice the return statement. It needs to check if node < nr_node_ids.
>
>
> Vegard
>


Thanks, yes I had that some after thought. It should check the node
index if CONFIG_DEBUG_PER_CPU_MAPS is enabled. One gotcha is that
nr_node_ids is intialized to MAX_NUMNODES until setup_node_to_cpumask_map()
sets it to the correct value. So uses before that should be caught by
the earlier check.

Mike

2008-06-06 14:36:20

by Vegard Nossum

[permalink] [raw]
Subject: Re: linux-next: Tree for June 5

On Fri, Jun 6, 2008 at 4:20 PM, Mike Travis <[email protected]> wrote:
> Vegard Nossum wrote:
>> On Fri, Jun 6, 2008 at 3:50 PM, Vegard Nossum <[email protected]> wrote:
>>> On Fri, Jun 6, 2008 at 3:33 PM, Mike Travis <[email protected]> wrote:
>>>> Vegard Nossum wrote:
>>>>> I reproced it with gc 4.1.2. I think the error is somewhere in kernel/sched.c.
>>>>>
>>>>> static int __build_sched_domains(const cpumask_t *cpu_map,
>>>>> struct sched_domain_attr *attr)
>>>>> {
>>>>> ...
>>>>> for (i = 0; i < MAX_NUMNODES; i++) {
>>>>> ...
>>>>> sg = kmalloc_node(sizeof(struct sched_group), GFP_KERNEL, i);
>>>>> ...
>>>>>
>>>>> This code is calling into the allocator with a spurious value of i,
>>>>> which causes SLAB to use an index (of 4 in my case) that is out of
>>>>> bounds for its nodelist array (at least it hasn't been initialized).
>>>>>

...

>> The error is of course that the node masks for nodes > nr_node_ids are
>> not valid. While this function ignores that:
>>
>> cpumask_t *_node_to_cpumask_ptr(int node)
>> {
>> if (node_to_cpumask_map == NULL) {
>> printk(KERN_WARNING
>> "_node_to_cpumask_ptr(%d): no node_to_cpumask_map!\n",
>> node);
>> dump_stack();
>> return &cpu_online_map;
>> }
>> return &node_to_cpumask_map[node];
>> }
>> EXPORT_SYMBOL(_node_to_cpumask_ptr);
>>
>> Notice the return statement. It needs to check if node < nr_node_ids.
>>

...

>
> Thanks, yes I had that some after thought. It should check the node
> index if CONFIG_DEBUG_PER_CPU_MAPS is enabled. One gotcha is that
> nr_node_ids is intialized to MAX_NUMNODES until setup_node_to_cpumask_map()
> sets it to the correct value. So uses before that should be caught by
> the earlier check.

I think it should always check the node index. The code in
kernel/sched.c (see above) calls node_to_cpumask(i) on nodes 0 < i <
MAX_NUMNODES and it WILL use invalid pointers. Or should
kernel/sched.c be changed to use nr_node_ids instead of MAX_NUMNODES?
I believe there are more places that do this than just sched.c.

I have attached two patches. The sched one fixes Andrew's boot
problem. The x86 one is untested, but I believe it is better to BUG
than silently corrupt some arbitrary memory. (Then the callers can be
found easily and fixed at least.)


Vegard

--
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
-- E. W. Dijkstra, EWD1036


Attachments:
(No filename) (2.60 kB)
0001-sched-don-t-call-node_to_cpumask-on-nodes-nr_no.patch (1.79 kB)
0001-x86-don-t-return-invalid-pointers-from-node_to_cpum.patch (966.00 B)
Download all attachments

2008-06-06 14:41:55

by Mike Travis

[permalink] [raw]
Subject: Re: linux-next: Tree for June 5

Vegard Nossum wrote:
> On Fri, Jun 6, 2008 at 4:20 PM, Mike Travis <[email protected]> wrote:
>> Vegard Nossum wrote:
>>> On Fri, Jun 6, 2008 at 3:50 PM, Vegard Nossum <[email protected]> wrote:
>>>> On Fri, Jun 6, 2008 at 3:33 PM, Mike Travis <[email protected]> wrote:
>>>>> Vegard Nossum wrote:
>>>>>> I reproced it with gc 4.1.2. I think the error is somewhere in kernel/sched.c.
>>>>>>
>>>>>> static int __build_sched_domains(const cpumask_t *cpu_map,
>>>>>> struct sched_domain_attr *attr)
>>>>>> {
>>>>>> ...
>>>>>> for (i = 0; i < MAX_NUMNODES; i++) {
>>>>>> ...
>>>>>> sg = kmalloc_node(sizeof(struct sched_group), GFP_KERNEL, i);
>>>>>> ...
>>>>>>
>>>>>> This code is calling into the allocator with a spurious value of i,
>>>>>> which causes SLAB to use an index (of 4 in my case) that is out of
>>>>>> bounds for its nodelist array (at least it hasn't been initialized).
>>>>>>
>
> ...
>
>>> The error is of course that the node masks for nodes > nr_node_ids are
>>> not valid. While this function ignores that:
>>>
>>> cpumask_t *_node_to_cpumask_ptr(int node)
>>> {
>>> if (node_to_cpumask_map == NULL) {
>>> printk(KERN_WARNING
>>> "_node_to_cpumask_ptr(%d): no node_to_cpumask_map!\n",
>>> node);
>>> dump_stack();
>>> return &cpu_online_map;
>>> }
>>> return &node_to_cpumask_map[node];
>>> }
>>> EXPORT_SYMBOL(_node_to_cpumask_ptr);
>>>
>>> Notice the return statement. It needs to check if node < nr_node_ids.
>>>
>
> ...
>
>> Thanks, yes I had that some after thought. It should check the node
>> index if CONFIG_DEBUG_PER_CPU_MAPS is enabled. One gotcha is that
>> nr_node_ids is intialized to MAX_NUMNODES until setup_node_to_cpumask_map()
>> sets it to the correct value. So uses before that should be caught by
>> the earlier check.
>
> I think it should always check the node index. The code in
> kernel/sched.c (see above) calls node_to_cpumask(i) on nodes 0 < i <
> MAX_NUMNODES and it WILL use invalid pointers. Or should
> kernel/sched.c be changed to use nr_node_ids instead of MAX_NUMNODES?
> I believe there are more places that do this than just sched.c.

Yes, using MAX_NUMNODES is usually incorrect (the same for NR_CPUS).
When I originally submitted the patch I searched for all usages to
make sure they were correct. Unfortunately, later changes might not
have been validated. (Hmm, maybe adding to checkpatch.pl a similar
warning as it now does for NR_CPUS...?)

>
> I have attached two patches. The sched one fixes Andrew's boot
> problem. The x86 one is untested, but I believe it is better to BUG
> than silently corrupt some arbitrary memory. (Then the callers can be
> found easily and fixed at least.)

Andrew (or maybe it was Ingo) had suggested that instead of BUG use
dump_stack() and continue whenever possible. In this case returning
an empty cpumask would be correct.

Thanks,
Mike

2008-06-06 14:51:24

by Mike Travis

[permalink] [raw]
Subject: Re: linux-next: Tree for June 5

Mike Travis wrote:
> Vegard Nossum wrote:
>> On Fri, Jun 6, 2008 at 4:20 PM, Mike Travis <[email protected]> wrote:
>>> Vegard Nossum wrote:
>>>> On Fri, Jun 6, 2008 at 3:50 PM, Vegard Nossum <[email protected]> wrote:
>>>>> On Fri, Jun 6, 2008 at 3:33 PM, Mike Travis <[email protected]> wrote:
>>>>>> Vegard Nossum wrote:
>>>>>>> I reproced it with gc 4.1.2. I think the error is somewhere in kernel/sched.c.
>>>>>>>
>>>>>>> static int __build_sched_domains(const cpumask_t *cpu_map,
>>>>>>> struct sched_domain_attr *attr)
>>>>>>> {
>>>>>>> ...
>>>>>>> for (i = 0; i < MAX_NUMNODES; i++) {
>>>>>>> ...
>>>>>>> sg = kmalloc_node(sizeof(struct sched_group), GFP_KERNEL, i);
>>>>>>> ...
>>>>>>>
>>>>>>> This code is calling into the allocator with a spurious value of i,
>>>>>>> which causes SLAB to use an index (of 4 in my case) that is out of
>>>>>>> bounds for its nodelist array (at least it hasn't been initialized).
>>>>>>>
>> ...
>>
>>>> The error is of course that the node masks for nodes > nr_node_ids are
>>>> not valid. While this function ignores that:
>>>>
>>>> cpumask_t *_node_to_cpumask_ptr(int node)
>>>> {
>>>> if (node_to_cpumask_map == NULL) {
>>>> printk(KERN_WARNING
>>>> "_node_to_cpumask_ptr(%d): no node_to_cpumask_map!\n",
>>>> node);
>>>> dump_stack();
>>>> return &cpu_online_map;
>>>> }
>>>> return &node_to_cpumask_map[node];
>>>> }
>>>> EXPORT_SYMBOL(_node_to_cpumask_ptr);
>>>>
>>>> Notice the return statement. It needs to check if node < nr_node_ids.
>>>>
>> ...
>>
>>> Thanks, yes I had that some after thought. It should check the node
>>> index if CONFIG_DEBUG_PER_CPU_MAPS is enabled. One gotcha is that
>>> nr_node_ids is intialized to MAX_NUMNODES until setup_node_to_cpumask_map()
>>> sets it to the correct value. So uses before that should be caught by
>>> the earlier check.
>> I think it should always check the node index. The code in
>> kernel/sched.c (see above) calls node_to_cpumask(i) on nodes 0 < i <
>> MAX_NUMNODES and it WILL use invalid pointers. Or should
>> kernel/sched.c be changed to use nr_node_ids instead of MAX_NUMNODES?
>> I believe there are more places that do this than just sched.c.
>
> Yes, using MAX_NUMNODES is usually incorrect (the same for NR_CPUS).
> When I originally submitted the patch I searched for all usages to
> make sure they were correct. Unfortunately, later changes might not
> have been validated. (Hmm, maybe adding to checkpatch.pl a similar
> warning as it now does for NR_CPUS...?)
>
>> I have attached two patches. The sched one fixes Andrew's boot
>> problem. The x86 one is untested, but I believe it is better to BUG
>> than silently corrupt some arbitrary memory. (Then the callers can be
>> found easily and fixed at least.)
>
> Andrew (or maybe it was Ingo) had suggested that instead of BUG use
> dump_stack() and continue whenever possible. In this case returning
> an empty cpumask would be correct.
>
> Thanks,
> Mike

Aha, here's the missing patch:

a953e4597abd51b74c99e0e3b7074532a60fd031

2008-06-06 14:54:58

by Mike Travis

[permalink] [raw]
Subject: Re: linux-next: Tree for June 5

Mike Travis wrote:
> Mike Travis wrote:
>> Vegard Nossum wrote:
>>> On Fri, Jun 6, 2008 at 4:20 PM, Mike Travis <[email protected]> wrote:
>>>> Vegard Nossum wrote:
>>>>> On Fri, Jun 6, 2008 at 3:50 PM, Vegard Nossum <[email protected]> wrote:
>>>>>> On Fri, Jun 6, 2008 at 3:33 PM, Mike Travis <[email protected]> wrote:
>>>>>>> Vegard Nossum wrote:
>>>>>>>> I reproced it with gc 4.1.2. I think the error is somewhere in kernel/sched.c.
>>>>>>>>
>>>>>>>> static int __build_sched_domains(const cpumask_t *cpu_map,
>>>>>>>> struct sched_domain_attr *attr)
>>>>>>>> {
>>>>>>>> ...
>>>>>>>> for (i = 0; i < MAX_NUMNODES; i++) {
>>>>>>>> ...
>>>>>>>> sg = kmalloc_node(sizeof(struct sched_group), GFP_KERNEL, i);
>>>>>>>> ...
>>>>>>>>
>>>>>>>> This code is calling into the allocator with a spurious value of i,
>>>>>>>> which causes SLAB to use an index (of 4 in my case) that is out of
>>>>>>>> bounds for its nodelist array (at least it hasn't been initialized).
>>>>>>>>
>>> ...
>>>
>>>>> The error is of course that the node masks for nodes > nr_node_ids are
>>>>> not valid. While this function ignores that:
>>>>>
>>>>> cpumask_t *_node_to_cpumask_ptr(int node)
>>>>> {
>>>>> if (node_to_cpumask_map == NULL) {
>>>>> printk(KERN_WARNING
>>>>> "_node_to_cpumask_ptr(%d): no node_to_cpumask_map!\n",
>>>>> node);
>>>>> dump_stack();
>>>>> return &cpu_online_map;
>>>>> }
>>>>> return &node_to_cpumask_map[node];
>>>>> }
>>>>> EXPORT_SYMBOL(_node_to_cpumask_ptr);
>>>>>
>>>>> Notice the return statement. It needs to check if node < nr_node_ids.
>>>>>
>>> ...
>>>
>>>> Thanks, yes I had that some after thought. It should check the node
>>>> index if CONFIG_DEBUG_PER_CPU_MAPS is enabled. One gotcha is that
>>>> nr_node_ids is intialized to MAX_NUMNODES until setup_node_to_cpumask_map()
>>>> sets it to the correct value. So uses before that should be caught by
>>>> the earlier check.
>>> I think it should always check the node index. The code in
>>> kernel/sched.c (see above) calls node_to_cpumask(i) on nodes 0 < i <
>>> MAX_NUMNODES and it WILL use invalid pointers. Or should
>>> kernel/sched.c be changed to use nr_node_ids instead of MAX_NUMNODES?
>>> I believe there are more places that do this than just sched.c.
>> Yes, using MAX_NUMNODES is usually incorrect (the same for NR_CPUS).
>> When I originally submitted the patch I searched for all usages to
>> make sure they were correct. Unfortunately, later changes might not
>> have been validated. (Hmm, maybe adding to checkpatch.pl a similar
>> warning as it now does for NR_CPUS...?)
>>
>>> I have attached two patches. The sched one fixes Andrew's boot
>>> problem. The x86 one is untested, but I believe it is better to BUG
>>> than silently corrupt some arbitrary memory. (Then the callers can be
>>> found easily and fixed at least.)
>> Andrew (or maybe it was Ingo) had suggested that instead of BUG use
>> dump_stack() and continue whenever possible. In this case returning
>> an empty cpumask would be correct.
>>
>> Thanks,
>> Mike
>
> Aha, here's the missing patch:
>
> a953e4597abd51b74c99e0e3b7074532a60fd031
>

Oops, message got away from me prematurely... ;-)

Ingo - can we push this from tip to linux-next?

Thanks,
Mike

2008-06-06 14:57:46

by Ingo Molnar

[permalink] [raw]
Subject: Re: linux-next: Tree for June 5


* Vegard Nossum <[email protected]> wrote:

> > Thanks, yes I had that some after thought. It should check the node
> > index if CONFIG_DEBUG_PER_CPU_MAPS is enabled. One gotcha is that
> > nr_node_ids is intialized to MAX_NUMNODES until
> > setup_node_to_cpumask_map() sets it to the correct value. So uses
> > before that should be caught by the earlier check.
>
> I think it should always check the node index. The code in
> kernel/sched.c (see above) calls node_to_cpumask(i) on nodes 0 < i <
> MAX_NUMNODES and it WILL use invalid pointers. Or should
> kernel/sched.c be changed to use nr_node_ids instead of MAX_NUMNODES?
> I believe there are more places that do this than just sched.c.
>
> I have attached two patches. The sched one fixes Andrew's boot
> problem. The x86 one is untested, but I believe it is better to BUG
> than silently corrupt some arbitrary memory. (Then the callers can be
> found easily and fixed at least.)

nice fixes! I have applied both of them to -tip, this one to
tip/sched-devel:

> Subject: [PATCH] sched: don't call node_to_cpumask() on nodes > nr_node_ids

AFAICS this is not yet required for v2.6.26, as the requirement to never
iterate to MAX_NUMNODES and call nr_cpus_node() with the index only got
introduced by Mike's patch.

and this one to tip/x86/numa:

> Subject: [PATCH] x86: don't return invalid pointers from node_to_cpumask()

and i've undone the revert of "x86: remove the static 256k
node_to_cpumask_map" as well.

agreed?

Ingo

2008-06-06 15:01:39

by Ingo Molnar

[permalink] [raw]
Subject: Re: linux-next: Tree for June 5


* Ingo Molnar <[email protected]> wrote:

> > Subject: [PATCH] sched: don't call node_to_cpumask() on nodes >
> > nr_node_ids
>
> AFAICS this is not yet required for v2.6.26, as the requirement to
> never iterate to MAX_NUMNODES and call nr_cpus_node() with the index
> only got introduced by Mike's patch.

the one below is needed as well i think.

Ingo
---
kernel/sched.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

Index: tip/kernel/sched.c
===================================================================
--- tip.orig/kernel/sched.c
+++ tip/kernel/sched.c
@@ -6576,9 +6576,9 @@ static int find_next_best_node(int node,

min_val = INT_MAX;

- for (i = 0; i < MAX_NUMNODES; i++) {
+ for (i = 0; i < nr_node_ids; i++) {
/* Start at @node */
- n = (node + i) % MAX_NUMNODES;
+ n = (node + i) % nr_node_ids;

if (!nr_cpus_node(n))
continue;

2008-06-06 15:04:20

by Mike Travis

[permalink] [raw]
Subject: Re: linux-next: Tree for June 5

Ingo Molnar wrote:
> * Vegard Nossum <[email protected]> wrote:
>
>>> Thanks, yes I had that some after thought. It should check the node
>>> index if CONFIG_DEBUG_PER_CPU_MAPS is enabled. One gotcha is that
>>> nr_node_ids is intialized to MAX_NUMNODES until
>>> setup_node_to_cpumask_map() sets it to the correct value. So uses
>>> before that should be caught by the earlier check.
>> I think it should always check the node index. The code in
>> kernel/sched.c (see above) calls node_to_cpumask(i) on nodes 0 < i <
>> MAX_NUMNODES and it WILL use invalid pointers. Or should
>> kernel/sched.c be changed to use nr_node_ids instead of MAX_NUMNODES?
>> I believe there are more places that do this than just sched.c.
>>
>> I have attached two patches. The sched one fixes Andrew's boot
>> problem. The x86 one is untested, but I believe it is better to BUG
>> than silently corrupt some arbitrary memory. (Then the callers can be
>> found easily and fixed at least.)
>
> nice fixes! I have applied both of them to -tip, this one to
> tip/sched-devel:
>
>> Subject: [PATCH] sched: don't call node_to_cpumask() on nodes > nr_node_ids
>
> AFAICS this is not yet required for v2.6.26, as the requirement to never
> iterate to MAX_NUMNODES and call nr_cpus_node() with the index only got
> introduced by Mike's patch.
>
> and this one to tip/x86/numa:
>
>> Subject: [PATCH] x86: don't return invalid pointers from node_to_cpumask()
>
> and i've undone the revert of "x86: remove the static 256k
> node_to_cpumask_map" as well.
>
> agreed?
>
> Ingo

Hi Ingo,

My -tip branch has:

a953e4597abd51b74c99e0e3b7074532a60fd031

sched: replace MAX_NUMNODES with nr_node_ids in kernel/sched.c
committed: 2008-05-23 09:22:17

The check for node > nr_node_ids however should be included (at least
when CONFIG_DEBUG_PER_CPU_MAPS is enabled.)

Thanks,
Mike

2008-06-06 15:13:41

by Ingo Molnar

[permalink] [raw]
Subject: Re: linux-next: Tree for June 5


* Ingo Molnar <[email protected]> wrote:

> > Subject: [PATCH] sched: don't call node_to_cpumask() on nodes > nr_node_ids
>
> AFAICS this is not yet required for v2.6.26, as the requirement to
> never iterate to MAX_NUMNODES and call nr_cpus_node() with the index
> only got introduced by Mike's patch.

note that we already had these changes in tip/cpus4096, the fact that
PERCPU_DEBUG didnt properly detect the problem was icing on the cake.

Ingo

2008-06-06 15:14:21

by Vegard Nossum

[permalink] [raw]
Subject: Re: linux-next: Tree for June 5

On Fri, Jun 6, 2008 at 5:01 PM, Ingo Molnar <[email protected]> wrote:
>
> * Ingo Molnar <[email protected]> wrote:
>
>> > Subject: [PATCH] sched: don't call node_to_cpumask() on nodes >
>> > nr_node_ids
>>
>> AFAICS this is not yet required for v2.6.26, as the requirement to
>> never iterate to MAX_NUMNODES and call nr_cpus_node() with the index
>> only got introduced by Mike's patch.
>
> the one below is needed as well i think.

Yeah. I think you had better take Mike's patches, I don't trust even
that my patch and your fixlet does everything correctly.


Vegard

--
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
-- E. W. Dijkstra, EWD1036

2008-06-06 15:20:38

by Mike Travis

[permalink] [raw]
Subject: Re: linux-next: Tree for June 5

Mike Travis wrote:
>
> Hi Ingo,
>
> My -tip branch has:
>
> a953e4597abd51b74c99e0e3b7074532a60fd031
>
> sched: replace MAX_NUMNODES with nr_node_ids in kernel/sched.c
> committed: 2008-05-23 09:22:17
>
> The check for node > nr_node_ids however should be included (at least
> when CONFIG_DEBUG_PER_CPU_MAPS is enabled.)
>
> Thanks,
> Mike

Note this was in the following set of patches:


- Subject: [PATCH 01/10] percpu: Use a kconfig variable to signal arch specific percpu setup
- Subject: [PATCH 00/11] x86: cleanup early per cpu variables/accesses v5-folded
- Subject: [PATCH 04/11] x86: remove the static 256k node_to_cpumask_map
- Subject: [PATCH 03/11] x86: restore pda nodenumber field
- Subject: [PATCH 08/11] x86: Add performance variants of cpumask operators
- Subject: [PATCH 09/11] x86: Use performance variant for_each_cpu_mask_nr
- Subject: [PATCH 02/11] x86: cleanup early per cpu variables/accesses v4
- Subject: [PATCH 06/11] cpu: change some globals to statics in drivers/base/cpu.c v2
- Subject: [PATCH 07/11] x86: remove static boot_cpu_pda array
- Subject: [PATCH 05/11] sched: replace MAX_NUMNODES with nr_node_ids in kernel/sched.c
- Subject: [PATCH 11/11] net: Pass reference to cpumask variable in net/sunrpc/svc.c

- Date: Fri, 25 Apr 2008 17:15:48 -0700

(Some have been further modified by later patches.)

The patch ordering was incorrect as I removed the node_to_cpumask_map before I replaced
the MAX_NUMNODES, should have been the opposite.

2008-06-06 15:23:59

by Ingo Molnar

[permalink] [raw]
Subject: Re: linux-next: Tree for June 5


* Vegard Nossum <[email protected]> wrote:

> >> AFAICS this is not yet required for v2.6.26, as the requirement to
> >> never iterate to MAX_NUMNODES and call nr_cpus_node() with the
> >> index only got introduced by Mike's patch.
> >
> > the one below is needed as well i think.
>
> Yeah. I think you had better take Mike's patches, I don't trust even
> that my patch and your fixlet does everything correctly.

yep, just discovered that we had them already ;-)

Thomas has just scripted up a new "detect if a commit is not in
linux-next yet" script that should avoid such problems in the future.

your second patch is still wanted, it would have detected the problem
earlier.

Ingo

2008-06-06 15:34:18

by Ingo Molnar

[permalink] [raw]
Subject: Re: linux-next: Tree for June 5


* Mike Travis <[email protected]> wrote:

> The patch ordering was incorrect as I removed the node_to_cpumask_map
> before I replaced the MAX_NUMNODES, should have been the opposite.

It needed the combination 4 failures along the line: the debug check was
not complete, the ordering was bad and thus the splitup was bad as well
- and then one component went missing in linux-next and the combined
effect created this bug that needed a bisection by Andrew and Vegard to
figure out.

the moral: we now tightened the debug check, fixed the integration bug
and tightened the checks we have for patch propagation. (Thomas just
added the new tip-check-integration script to tip/tip that implements
this)

Ingo

2008-06-06 15:53:00

by Mike Travis

[permalink] [raw]
Subject: Re: linux-next: Tree for June 5

Ingo Molnar wrote:
> * Vegard Nossum <[email protected]> wrote:
>
>>>> AFAICS this is not yet required for v2.6.26, as the requirement to
>>>> never iterate to MAX_NUMNODES and call nr_cpus_node() with the
>>>> index only got introduced by Mike's patch.
>>> the one below is needed as well i think.
>> Yeah. I think you had better take Mike's patches, I don't trust even
>> that my patch and your fixlet does everything correctly.
>
> yep, just discovered that we had them already ;-)
>
> Thomas has just scripted up a new "detect if a commit is not in
> linux-next yet" script that should avoid such problems in the future.
>
> your second patch is still wanted, it would have detected the problem
> earlier.
>
> Ingo

Thanks, yes, I agree. However I would like to modify it slightly:
---
Subject: [PATCH 1/1] x86: Add check for node passed to node_to_cpumask

* When CONFIG_DEBUG_PER_CPU_MAPS is set, the node passed to
node_to_cpumask and node_to_cpumask_ptr should be validated.

Signed-off-by: Mike Travis <[email protected]>
---
arch/x86/kernel/setup.c | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)

--- linux-2.6.tip.orig/arch/x86/kernel/setup.c
+++ linux-2.6.tip/arch/x86/kernel/setup.c
@@ -399,6 +399,10 @@ int early_cpu_to_node(int cpu)
return per_cpu(x86_cpu_to_node_map, cpu);
}

+
+/* empty cpumask */
+static cpumask_t cpu_mask_none;
+
/*
* Returns a pointer to the bitmask of CPUs on Node 'node'.
*/
@@ -411,6 +415,13 @@ cpumask_t *_node_to_cpumask_ptr(int node
dump_stack();
return &cpu_online_map;
}
+ if (node >= nr_node_ids) {
+ printk(KERN_WARNING
+ "_node_to_cpumask_ptr(%d): node > nr_node_ids(%d)\n",
+ node, nr_node_ids);
+ dump_stack();
+ return &cpu_mask_none;
+ }
return &node_to_cpumask_map[node];
}
EXPORT_SYMBOL(_node_to_cpumask_ptr);
@@ -426,6 +437,13 @@ cpumask_t node_to_cpumask(int node)
dump_stack();
return cpu_online_map;
}
+ if (node >= nr_node_ids) {
+ printk(KERN_WARNING
+ "node_to_cpumask(%d): node > nr_node_ids(%d)\n",
+ node, nr_node_ids);
+ dump_stack();
+ return cpu_mask_none;
+ }
return node_to_cpumask_map[node];
}
EXPORT_SYMBOL(node_to_cpumask);

2008-06-06 16:38:18

by Ingo Molnar

[permalink] [raw]
Subject: Re: linux-next: Tree for June 5


* Ingo Molnar <[email protected]> wrote:

> * Andrew Morton <[email protected]> wrote:
>
> > > what do you mean? We are testing commits that everybody will run and
> > > are pre-filtering them for sanity and stability before they hit
> > > linux-next.
> >
> > One doesn't test commits - one tests a tree. And the -tip tree is
> > 2.6.26-rc5 plus a bunch of x86 changes. [...]
>
> no, 90%+ of all bugs are not due to tree interaction effects but are
> caused by individual commits, triggerable on a particular
> system/workload. (Our historic regression list is the proof for that,
> can give you itemized statistics if you want.)
>
> also, the -tip tree is not "2.6.26-rc5 plus a bunch of x86 changes" but
> v2.6.26-rc5-84-g39b945a plus 75 topic trees we maintain:
>
> build, core/futex-64bit, core/kill-the-BKL, core/locking, core/percpu,
> core/printk, core/rcu, core/rodata, core/softirq, core/softlockup,
> core/stacktrace, core/urgent, cpus4096, genirq, hrtimers, kmemcheck,
> out-of-tree, pci-for-jesse, safe-poison-pointers, sched, sched-devel,
> scratch, stackprotector, timers/clockevents, timers/hpet,
> timers/hrtimers, timers/nohz, timers/posixtimers, tip, tracing/ftrace,
> tracing/ftrace-mergefixups, tracing/immediates, tracing/markers,
> tracing/mmiotrace, tracing/mmiotrace-mergefixups, tracing/nmisafe,
> tracing/sched_markers, tracing/stopmachine-allcpus, tracing/sysprof,
> tracing/textedit, x86/apic, x86/apm, x86/bitops, x86/build, x86/checkme,
> x86/cleanups, x86/cpa, x86/cpu, x86/defconfig, x86/gart, x86/i8259,
> x86/intel, x86/irq, x86/irqstats, x86/kconfig, x86/ldt, x86/mce,
> x86/memtest, x86/mmio, x86/mpparse, x86/nmi, x86/numa, x86/numa-fixes,
> x86/pat, x86/pebs, x86/ptemask, x86/resumetrace, x86/scratch, x86/setup,
> x86/threadinfo, x86/timers, x86/urgent, x86/uv, x86/vdso, x86/xen,
> x86/xsave.
>
> most of which are in linux-next (around 70%), or will be shortly in
> linux-next (more than 90%).

we created some stats and in fact not 70% but 80% of all -tip commits
are in linux-next right now.

Here are the full -tip commit stats (merge commits excluded):

total commits in auto-next-branches: 617
auto-branches commits in linux-next: 553
total commits in tip/auto-latest: 686
total commits in tip/master: 699

that propotion should go up to 90% on the next linux-next iteration.
(barring any problems with the new topics)

Ingo

2008-06-06 17:15:36

by Ingo Molnar

[permalink] [raw]
Subject: Re: linux-next: Tree for June 5


* Andrew Morton <[email protected]> wrote:

> x86: remove the static 256k node_to_cpumask_map
>
> crash, as described earlier.
>
> I don't know what happened to that early exception - it didn't come back.

hm, early exception, there's one commit i can think of right now:

# tip/x86/xen: b20aecc: xen: fix early bootup crash on native hardware

but that got propagated to linux-next:

# linux-next: b20aecc: xen: fix early bootup crash on native hardware

Ingo

2008-06-18 08:27:25

by Ingo Molnar

[permalink] [raw]
Subject: Re: linux-next: Tree for June 5


* Mike Travis <[email protected]> wrote:

> Ingo Molnar wrote:
> > * Vegard Nossum <[email protected]> wrote:
> >
> >>>> AFAICS this is not yet required for v2.6.26, as the requirement to
> >>>> never iterate to MAX_NUMNODES and call nr_cpus_node() with the
> >>>> index only got introduced by Mike's patch.
> >>> the one below is needed as well i think.
> >> Yeah. I think you had better take Mike's patches, I don't trust even
> >> that my patch and your fixlet does everything correctly.
> >
> > yep, just discovered that we had them already ;-)
> >
> > Thomas has just scripted up a new "detect if a commit is not in
> > linux-next yet" script that should avoid such problems in the future.
> >
> > your second patch is still wanted, it would have detected the problem
> > earlier.
> >
> > Ingo
>
> Thanks, yes, I agree. However I would like to modify it slightly:
> ---
> Subject: [PATCH 1/1] x86: Add check for node passed to node_to_cpumask
>
> * When CONFIG_DEBUG_PER_CPU_MAPS is set, the node passed to
> node_to_cpumask and node_to_cpumask_ptr should be validated.
>
> Signed-off-by: Mike Travis <[email protected]>
> ---
> arch/x86/kernel/setup.c | 18 ++++++++++++++++++
> 1 file changed, 18 insertions(+)
>
> --- linux-2.6.tip.orig/arch/x86/kernel/setup.c
> +++ linux-2.6.tip/arch/x86/kernel/setup.c
> @@ -399,6 +399,10 @@ int early_cpu_to_node(int cpu)
> return per_cpu(x86_cpu_to_node_map, cpu);
> }
>
> +
> +/* empty cpumask */
> +static cpumask_t cpu_mask_none;

hm, this should be __read_mostly, maybe even const so that it becomes
readonly?

Ingo