2003-02-24 17:58:52

by Martin J. Bligh

Subject: 2.5.62-mjb3 (scalability / NUMA patchset)

The patchset contains mainly scalability and NUMA stuff, and anything
else that stops things from irritating me. It's meant to be pretty stable,
not so much a testing ground for new stuff.

I'd be very interested in feedback from anyone willing to test on any
platform, however large or small.

ftp://ftp.kernel.org/pub/linux/kernel/people/mbligh/2.5.62/patch-2.5.62-mjb3.bz2

additional:

http://www.aracnet.com/~fletch/linux/2.5.59/pidmaps_nodepages

Since 2.5.62-mjb2 (~ = changed, + = added, - = dropped)

Notes: Fixes some critical scheduler hangs.

- discontig_x440 Pat Gaughen / IBM NUMA team
+ early_ioremap Dave Hansen
+ x440disco_A0 Pat Gaughen / IBM NUMA team
+ fix_was_sched Ingo / wli / Rick Lindsley
+ no_kirq Martin J. Bligh
+ auto_disable_tsc John Stultz
+ cleaner_inodes Andrew Morton

Pending:
scheduler callers profiling (Anton)
PPC64 NUMA patches (Anton)
Child runs first (akpm)
Kexec
e1000 fixes
Non-PAE aligned kernel splits (Dave Hansen)
Update the lost timer ticks code
Ingo scheduler updates

Present in this patch:

early_printk Dave Hansen et al.
Allow printk before console_init

confighz Andrew Morton / Dave Hansen
Make HZ a config option of 100 Hz or 1000 Hz

config_page_offset Dave Hansen / Andrea
Make PAGE_OFFSET a config option

vmalloc_stats Dave Hansen
Expose useful vmalloc statistics

local_pgdat William Lee Irwin
Move the pgdat structure into the remapped space with lmem_map

numameminfo Martin Bligh / Keith Mannthey
Expose NUMA meminfo information under /proc/meminfo.numa

notsc Martin Bligh
Enable notsc option for NUMA-Q (new version for new config system)

mpc_apic_id Martin J. Bligh
Fix null ptr dereference (optimised away, but ...)

doaction Martin J. Bligh
Fix cruel torture of macros and small furry animals in io_apic.c

kgdb Andrew Morton / Various People
The older version of kgdb, synched with 2.5.54-mm1

noframeptr Martin Bligh
Disable -fomit-frame-pointer

ingosched Ingo Molnar
Modify NUMA scheduler to have independent tick basis.

schedstat Rick Lindsley
Provide stats about the scheduler under /proc/stat

sched_tunables Robert Love
Provide tunable parameters for the scheduler (+ NUMA scheduler)

early_ioremap Dave Hansen
Provide ioremap in very early boot when we only have 8MB address space

x440disco_A0 Pat Gaughen / IBM NUMA team
SLIT/SRAT parsing for x440 discontigmem

acpi_x440_hack Anonymous Coward
Stops x440 crashing, but owner is ashamed of it ;-)

numa_pci_fix Dave Hansen
Fix a potential error in the numa pci code from Stanford Checker

pfn_to_nid William Lee Irwin
Turn pfn_to_nid into a macro

kprobes Vamsi Krishna S
Add kernel probes hooks to the kernel

dmc_exit1 Dave McCracken
Speed up the exit path, pt 1.

dmc_exit2 Dave McCracken
Speed up the exit path, pt 2.

shpte Dave McCracken
Shared pagetables (as a config option)

thread_info_cleanup (4K stacks pt 1) Dave Hansen / Ben LaHaise
Prep work to reduce kernel stacks to 4K

interrupt_stacks (4K stacks pt 2) Dave Hansen / Ben LaHaise
Create a per-cpu interrupt stack.

stack_usage_check (4K stacks pt 3) Dave Hansen / Ben LaHaise
Check for kernel stack overflows.

4k_stack (4K stacks pt 4) Dave Hansen
Config option to reduce kernel stacks to 4K

fix_kgdb Dave Hansen
Fix interaction between kgdb and 4K stacks

stacks_from_slab William Lee Irwin
Take kernel stacks from the slab cache, not page allocation.

thread_under_page William Lee Irwin
Fix THREAD_SIZE < PAGE_SIZE case

lkcd LKCD team
Linux kernel crash dump support

percpu_loadavg Martin J. Bligh
Provide per-cpu load averages, and real load averages

irq_affinity Martin J. Bligh
Workaround for irq_affinity on clustered apic mode systems (e.g. x440)

kirq_clustered_fix Dave Hansen / Martin J. Bligh
Fix kirq for clustered apic systems (e.g. x440)

fix_was_sched Ingo / wli / Rick Lindsley
Fix scheduler hangs from deadlocks

no_kirq Martin J. Bligh
Allow disabling of kirq to work properly

auto_disable_tsc John Stultz
Automatically disable the TSC for NUMA-Q

cleaner_inodes Andrew Morton
Make noatime filesystems more efficient

-mjb Martin J. Bligh
Add a tag to the makefile


2003-02-26 15:25:55

by Mark Haverkamp

Subject: Re: 2.5.62-mjb3 (scalability / NUMA patchset)

On Mon, 2003-02-24 at 10:08, Martin J. Bligh wrote:
> The patchset contains mainly scalability and NUMA stuff, and anything
> else that stops things from irritating me. It's meant to be pretty stable,
> not so much a testing ground for new stuff.
>
> I'd be very interested in feedback from anyone willing to test on any
> platform, however large or small.
>
> ftp://ftp.kernel.org/pub/linux/kernel/people/mbligh/2.5.62/patch-2.5.62-mjb3.bz2
>

Martin,

I have been seeing system hangs on my 16 processor numaq while running
contest. The system will hang within a few seconds to half an hour.
Unfortunately there is no stack trace or any other indication on the
system console. I have been running your 2.5.62-mjb2 without problems
previously. Any ideas what I can do to narrow this down?

Mark.
--
Mark Haverkamp

2003-02-26 15:45:22

by Martin J. Bligh

Subject: Re: 2.5.62-mjb3 (scalability / NUMA patchset)

>> The patchset contains mainly scalability and NUMA stuff, and anything
>> else that stops things from irritating me. It's meant to be pretty
>> stable, not so much a testing ground for new stuff.
>>
>> I'd be very interested in feedback from anyone willing to test on any
>> platform, however large or small.
>>
>> ftp://ftp.kernel.org/pub/linux/kernel/people/mbligh/2.5.62/patch-2.5.62-mjb3.bz2
>>
>
> Martin,
>
> I have been seeing system hangs on my 16 processor numaq while running
> contest. The system will hang within a few seconds to half an hour.
> Unfortunately there is no stack trace or any other indication on the
> system console. I have been running your 2.5.62-mjb2 without problems
> previously. Any ideas what I can do to narrow this down?

Humpf. Can you try backing out this patch (it caused me similar problems on
59, but seemed fine in 62). I suspect it's just changing timing enough that
we hit some other bug ... if you could, would be nice to try the ALT+SYSRQ
stuff, or turn on NMI watchdogs and get a backtrace ... I've not been able
to reproduce this on recent kernels.

Thanks,

M.

diff -urpN -X /home/fletch/.diff.exclude 330-no_kirq/include/asm-i386/mach-numaq/mach_mpparse.h 340-auto_disable_tsc/include/asm-i386/mach-numaq/mach_mpparse.h
--- 330-no_kirq/include/asm-i386/mach-numaq/mach_mpparse.h	Fri Jan 17 09:18:31 2003
+++ 340-auto_disable_tsc/include/asm-i386/mach-numaq/mach_mpparse.h	Mon Feb 24 08:14:42 2003
@@ -32,6 +32,7 @@ static inline void mps_oem_check(struct
 	if (mpc->mpc_oemptr)
 		smp_read_mpc_oem((struct mp_config_oemtable *) mpc->mpc_oemptr,
 				mpc->mpc_oemsize);
+	tsc_disable=1;
 }

 /* Hook from generic ACPI tables.c */

2003-02-26 15:53:40

by Mark Haverkamp

Subject: Re: 2.5.62-mjb3 (scalability / NUMA patchset)

On Wed, 2003-02-26 at 07:55, Martin J. Bligh wrote:
> >> The patchset contains mainly scalability and NUMA stuff, and anything
> >> else that stops things from irritating me. It's meant to be pretty
> >> stable, not so much a testing ground for new stuff.
> >>
> >> I'd be very interested in feedback from anyone willing to test on any
> >> platform, however large or small.
> >>
> >> ftp://ftp.kernel.org/pub/linux/kernel/people/mbligh/2.5.62/patch-2.5.62-mjb3.bz2
> >>
> >
> > Martin,
> >
> > I have been seeing system hangs on my 16 processor numaq while running
> > contest. The system will hang within a few seconds to half an hour.
> > Unfortunately there is no stack trace or any other indication on the
> > system console. I have been running your 2.5.62-mjb2 without problems
> > previously. Any ideas what I can do to narrow this down?
>
> Humpf. Can you try backing out this patch (it caused me similar problems on
> 59, but seemed fine in 62). I suspect it's just changing timing enough that
> we hit some other bug ...

OK, I'll try this.


> if you could, would be nice to try the ALT+SYSRQ
> stuff, or turn on NMI watchdogs and get a backtrace ... I've not been able
> to reproduce this on recent kernels.

I'll try these first and see what happens.

Mark.


--
Mark Haverkamp

2003-02-26 22:38:33

by Mark Haverkamp

Subject: Re: 2.5.62-mjb3 (scalability / NUMA patchset)

On Wed, 2003-02-26 at 07:55, Martin J. Bligh wrote:
> >> The patchset contains mainly scalability and NUMA stuff, and anything
> >> else that stops things from irritating me. It's meant to be pretty
> >> stable, not so much a testing ground for new stuff.
> >>
> >> I'd be very interested in feedback from anyone willing to test on any
> >> platform, however large or small.
> >>
> >> ftp://ftp.kernel.org/pub/linux/kernel/people/mbligh/2.5.62/patch-2.5.62-mjb3.bz2
> >>
> >
> > Martin,
> >
> > I have been seeing system hangs on my 16 processor numaq while running
> > contest. The system will hang within a few seconds to half an hour.
> > Unfortunately there is no stack trace or any other indication on the
> > system console. I have been running your 2.5.62-mjb2 without problems
> > previously. Any ideas what I can do to narrow this down?
>
> Humpf. Can you try backing out this patch (it caused me similar problems on
> 59, but seemed fine in 62). I suspect it's just changing timing enough that
> we hit some other bug ... if you could, would be nice to try the ALT+SYSRQ
> stuff, or turn on NMI watchdogs and get a backtrace ... I've not been able
> to reproduce this on recent kernels.
>
> Thanks,
>
> M.
>
> diff -urpN -X /home/fletch/.diff.exclude 330-no_kirq/include/asm-i386/mach-numaq/mach_mpparse.h 340-auto_disable_tsc/include/asm-i386/mach-numaq/mach_mpparse.h
> --- 330-no_kirq/include/asm-i386/mach-numaq/mach_mpparse.h	Fri Jan 17 09:18:31 2003
> +++ 340-auto_disable_tsc/include/asm-i386/mach-numaq/mach_mpparse.h	Mon Feb 24 08:14:42 2003
> @@ -32,6 +32,7 @@ static inline void mps_oem_check(struct
>  	if (mpc->mpc_oemptr)
>  		smp_read_mpc_oem((struct mp_config_oemtable *) mpc->mpc_oemptr,
>  				mpc->mpc_oemsize);
> +	tsc_disable=1;
>  }
>
>  /* Hook from generic ACPI tables.c */
>

I turned on NMI watchdogs and when the system hung, I saw no output. My
serial console is through a terminal server that isn't set up to pass
along the sysrq, so I need to get this fixed. In any case I backed out
the patch that you suggested and I have had no system hangs since.

Mark.
--
Mark Haverkamp

2003-02-26 22:46:07

by Randy.Dunlap

Subject: Re: 2.5.62-mjb3 (scalability / NUMA patchset)

|
| I turned on NMI watchdogs and when the system hung, I saw no output. My
| serial console is through a terminal server that isn't set up to pass
| along the sysrq, so I need to get this fixed. In any case I backed out
| the patch that you suggested and I have had no system hangs since.
|
| Mark.
| --
| Mark Haverkamp

Mark,

You can also use my "echo key > sysrq" patch.
It was updated to 2.5.62 by Zwane M.
It's available at http://www.osdl.org/archive/rddunlap/patches/magickey_2562.patch
(after a possible 15-minute rsync delay).
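
For anyone reading along without that patch to hand: the idea is to trigger a SysRq action by writing its key character to a file instead of pressing the key combination on the console. The patch's own file path is not shown in this thread, so the sketch below takes the path as a parameter. The helper name send_sysrq_key is made up for the example. Later mainline kernels expose a comparable interface as /proc/sysrq-trigger; whether this patch uses the same path is an assumption.

```c
#include <stdio.h>

/*
 * Hypothetical helper: write a single SysRq command character
 * (for example 't' for a task dump) to a trigger file.
 * Returns 0 on success and -1 on any I/O error.
 */
static int send_sysrq_key(const char *path, char key)
{
    FILE *f = fopen(path, "w");

    if (!f)
        return -1;
    if (fputc(key, f) == EOF) {
        fclose(f);
        return -1;
    }
    /* fclose flushes the buffered byte, so report its failure as well. */
    return fclose(f) == 0 ? 0 : -1;
}
```

On a kernel that provides /proc/sysrq-trigger, calling send_sysrq_key("/proc/sysrq-trigger", 't') as root would request a task dump, which gets around a terminal server that swallows the break sequence.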

--
~Randy

2003-02-26 22:43:34

by Martin J. Bligh

Subject: Re: 2.5.62-mjb3 (scalability / NUMA patchset)

> I turned on NMI watchdogs and when the system hung, I saw no output. My
> serial console is through a terminal server that isn't set up to pass
> along the sysrq, so I need to get this fixed. In any case I backed out
> the patch that you suggested and I have had no system hangs since.

OK, I'll back out that patch for now, but it seems to indicate underlying
crud. What parameter did you set for NMI watchdog?

M.

2003-02-26 22:54:56

by Martin J. Bligh

Subject: Re: 2.5.62-mjb3 (scalability / NUMA patchset)

>> > I turned on NMI watchdogs and when the system hung, I saw no output.
>> > My serial console is through a terminal server that isn't set up to
>> > pass along the sysrq, so I need to get this fixed. In any case I
>> > backed out the patch that you suggested and I have had no system hangs
>> > since.
>>
>> OK, I'll back out that patch for now, but it seems to indicate underlying
>> crud. What parameter did you set for NMI watchdog?
>
> I set it to 1. In Documentation/nmi_watchdog.txt this looked like the
> only option. Now that I look at apic.h, I see that I could set it to 2
> also. If you like I can try this also.

2 is what we used successfully last time, but I can't remember the
difference off the top of my head ... if you could try that, would be most
useful.
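
On the difference between the two values: as far as I know, the mode constants in include/asm-i386/apic.h are the four shown below, with 1 selecting the IO-APIC watchdog and 2 the local-APIC one. The two helper functions are hypothetical and only illustrate how a boot argument maps onto those modes; they are not the kernel's own parsing code.

```c
#include <stdlib.h>

/* Mode values as defined in include/asm-i386/apic.h (2.5-era kernels). */
#define NMI_NONE        0
#define NMI_IO_APIC     1
#define NMI_LOCAL_APIC  2
#define NMI_INVALID     3

/* Hypothetical helper: map the text after "nmi_watchdog=" to a mode. */
static int parse_nmi_watchdog(const char *arg)
{
    char *end;
    long val = strtol(arg, &end, 10);

    /* Reject empty input, trailing junk, and out-of-range values. */
    if (end == arg || *end != '\0' || val < NMI_NONE || val >= NMI_INVALID)
        return NMI_INVALID;
    return (int)val;
}

/* Hypothetical helper: describe what drives the watchdog in each mode. */
static const char *nmi_watchdog_source(int mode)
{
    switch (mode) {
    case NMI_NONE:
        return "disabled";
    case NMI_IO_APIC:
        return "timer interrupt delivered as NMI through the IO-APIC";
    case NMI_LOCAL_APIC:
        return "per-CPU performance counter through the local APIC";
    default:
        return "invalid";
    }
}
```

Mode 2 does not depend on interrupts being routed through the IO-APIC, which may be why it behaves differently from mode 1 on some machines.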

M.

2003-02-26 22:52:55

by Mark Haverkamp

Subject: Re: 2.5.62-mjb3 (scalability / NUMA patchset)

On Wed, 2003-02-26 at 14:53, Martin J. Bligh wrote:
> > I turned on NMI watchdogs and when the system hung, I saw no output. My
> > serial console is through a terminal server that isn't set up to pass
> > along the sysrq, so I need to get this fixed. In any case I backed out
> > the patch that you suggested and I have had no system hangs since.
>
> OK, I'll back out that patch for now, but it seems to indicate underlying
> crud. What parameter did you set for NMI watchdog?

I set it to 1. In Documentation/nmi_watchdog.txt this looked like the
only option. Now that I look at apic.h, I see that I could set it to 2
also. If you like I can try this also.

Mark.
--
Mark Haverkamp

2003-02-27 17:11:32

by Mark Haverkamp

Subject: Re: 2.5.62-mjb3 (scalability / NUMA patchset)

On Wed, 2003-02-26 at 15:05, Martin J. Bligh wrote:
> >> > I turned on NMI watchdogs and when the system hung, I saw no output.
> >> > My serial console is through a terminal server that isn't set up to
> >> > pass along the sysrq, so I need to get this fixed. In any case I
> >> > backed out the patch that you suggested and I have had no system hangs
> >> > since.
> >>
> >> OK, I'll back out that patch for now, but it seems to indicate underlying
> >> crud. What parameter did you set for NMI watchdog?
> >
> > I set it to 1. In Documentation/nmi_watchdog.txt this looked like the
> > only option. Now that I look at apic.h, I see that I could set it to 2
> > also. If you like I can try this also.
>
> 2 is what we used successfully last time, but I can't remember the
> difference off the top of my head ... if you could try that, would be most
> useful.


Still no luck getting a stack trace. With nmi_watchdog=2, I get messages
like these on occasion:


Uhhuh. NMI received for unknown reason 35 on CPU 11.
Dazed and confused, but trying to continue
Do you have a strange power saving mode enabled?



But when the system finally froze, there was nothing.
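
For context on the "unknown reason" wording: as I understand the i386 NMI path, the handler reads a status byte from I/O port 0x61 and has dedicated handling only for bit 7 (memory parity) and bit 6 (I/O check). Any other value falls through to this message. The sketch below is a simplified illustration of that classification, not the kernel's code. The helper name and macros are made up for the example, and it assumes the reason in the log is printed in hexadecimal (0x35).

```c
/* Bits of the NMI status byte (I/O port 0x61) that have dedicated handlers. */
#define NMI_REASON_MEM_PARITY 0x80  /* bit 7: memory parity / system error */
#define NMI_REASON_IO_CHECK   0x40  /* bit 6: I/O channel check */

/*
 * Hypothetical helper: classify a reason byte. A value with neither
 * bit set has no known source and is reported as "unknown".
 */
static const char *classify_nmi_reason(unsigned char reason)
{
    if (reason & NMI_REASON_MEM_PARITY)
        return "memory parity error";
    if (reason & NMI_REASON_IO_CHECK)
        return "I/O check error";
    return "unknown";
}
```

With reason 0x35 neither bit is set, so the value is classified as unknown, which matches the messages above.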

Mark.



--

Mark Haverkamp