2003-02-17 23:12:49

by Linus Torvalds

Subject: Linux v2.5.62


Hmm.. Mostly lots of small updates, although the merge with Andrew
included the RCU dcache patches from IBM that he has carried along for a
while (ie fairly fundamental, but also very well tested).

ARM, PPC, PPC64, alpha, kbuild.

Oh, and as a sign that 2.6.x really _is_ approaching, people have started
sending me spelling fixes. Kernel coders are apparently all atrocious
spellers, and for some reason the spelling police always comes out of the
woodwork when stable releases get closer.

Linus

---

Summary of changes from v2.5.61 to v2.5.62
============================================

<[email protected]>:
o PPC32: Export additional symbols for CONFIG_4xx

<[email protected]>:
o ppc64: revised machine check exception handler
o ppc64: new scanlog interface

Adrian Bunk <[email protected]>:
o [netdrvr] make CONFIG_MII one-line desc more pretty

Alan Cox <[email protected]>:
o Add printk levels to mtrr, also clarify
o merge the NEC98 parsing code
o make the io-apic printk generate less junk mail
o printk levels for mpparse
o remove bogowarning
o itanic people cant spell either
o nor PPC people ;)
o specialix fix from 2.4 missing in 2.5
o bring 2.5 arcnet into line with 2.4
o Fix aha1542
o mca 53c9x also needs mca-legacy
o another ia64 typo
o header update for arcnet updates (again to match 2.4)

Andrew Morton <[email protected]>:
o ppc64: kill ppc64 unused var warning
o ppc64: fix warning in smp_prepare_cpus
o JFS build fix with gcc-2.95.3
o flush_tlb_all is not preempt safe
o move fault_in_pages_readable/writeable to header
o separate checks from generic_file_aio_write
o fix ext3 BUG due to race with truncate
o crc32 improvements
o dcache_rcu: revert fast_walk code
o dcache_rcu
o error checking in ext3 xattr code
o xattr: listxattr fix
o xattr: infrastructure for permission overrides
o xattr: allow kernel code to override EA permissions
o xattr: trusted extended attributes
o blk_congestion_wait tuning and lockup fix
o cciss driver update
o cciss, fix array bounds overrun
o direct-io return value fix
o direct-io: allow reading of the part-filled EOF block
o Fix ext3 build when EXT3_DEBUG is defined
o Make the world safe for -Wundef
o fix compile breakage on drivers/scsi/NCR53C9x.c
o Use table lookup for radix_tree_maxindex()
o elv_former_request reversion

Andries E. Brouwer <[email protected]>:
o add static, fix typo

Anton Blanchard <[email protected]>:
o ppc64: add TCSBRKP
o ppc64: Remove sys32_mremap, not required on ppc64 since we alter
TASK_SIZE
o ppc64: fix compile warnings
o ppc64: clean up some of big bad sys_ppc32.c
o ppc64: always compile in 32bit ELF support
o ppc64: Never call event-scan faster than once per second, required
on some machines
o ppc64: dont attempt a traceback table lookup for userspace
addresses
o ppc64: warning fix, caused by me
o ppc64: use get_user in alignment exception handler
o ppc64: ptrace signal fix
o ppc64: make sure socketcall_table is 8 byte aligned
o ppc64: add set_tid_address and fadvise64
o disable printout of interrupts in /proc/stat on ppc64
o enable OFFB on ppc64
o remove stale comment
o compat futex fix

Art Haas <[email protected]>:
o C99 initializers for drivers/net/aironet4500_proc.c
o C99 initializers for drivers/char/rtc.c
o C99 initializers for drivers/cdrom/cdrom.c
o C99 initializers for drivers/net/arlan-proc.c

Ben Collins <[email protected]>:
o IEEE-1394 Updates

Brian Gerst <[email protected]>:
o remove .mod.c files in make clean

Daniel Jacobowitz <[email protected]>:
o Clean up ptrace_setoptions and PT_* constants
o Set ptrace_message before PT_TRACE_EXIT

Dave Kleikamp <[email protected]>:
o JFS: Fix jfs_sync_fs

Dominik Brodowski <[email protected]>:
o pcmcia: add device_class pcmcia_socket, update devices & drivers
o pcmcia: use device_class->add_device/remove_device
o cpufreq: move frequency table helpers to extra module
o cpufreq: move /proc/cpufreq interface code to extra module
o cpufreq: fix compilation of ACPI if !CPU_FREQ
o pcmcia: small bugfix & cleanup

François Romieu <[email protected]>:
o [netdrvr rrunner] small fixes and cleanups

Jaroslav Kysela <[email protected]>:
o ALSA update

Jeff Wiedemeier <[email protected]>:
o alpha numa setup_memory leaves meaningless {min,max}_low_pfn
o delay marvel agp printk until after !hose check

Jens Axboe <[email protected]>:
o deadline ioscheduler bug fixes
o fix request-to-request front merging
o missing lock in get_request_wait()
o front merge fix (really!)

Kai Germaschewski <[email protected]>:
o kbuild: Always postprocess modules
o kbuild: Move the version magic generation into module
postprocessing
o kbuild: Use list of modules for "make modules_install"
o kbuild: Do module post processing in C
o kbuild: Add dependency info to modules
o kbuild: Add dependency info to modules
o kbuild: Figure endianness / word size at compile time
o kbuild: Merge file2alias into scripts/modpost.c
o kbuild: Rename some module postprocessing stuff
o kbuild: scripts/elfconfig.h is generated
o kbuild: Warn on undefined exported symbols
o kbuild: Fix modules_install w/o modules error
o kbuild: Fix a 64-bit issue in scripts/modpost.c
o kbuild: Fix a "make -j" bug

Linus Torvalds <[email protected]>:
o Fix futex compile breakage introduced by the compat code
o Clean up and fix locking around signal rendering
o Do proper signal locking for the old-style /proc/stat too
o It's usually considered stupid to lock the same spinlock twice in
close succession. However, for this once we'll just call it
"inspired".
o Fix locking for "send_sig_info()", to avoid possible races with
signal state changes due to execve() and exit(). We need to hold
the tasklist lock to guarantee stability of "task->sighand".

Marc Zyngier <[email protected]>:
o EISA/sysfs updates

Matthew Wilcox <[email protected]>:
o Fix mandatory locking

Paul Mackerras <[email protected]>:
o PPC32: Changes to accommodate recent signal changes
(current->sighand)
o PPC32: Fix compile warnings in some programs used in the build
process
o PPC32: Add set_tid_address and fadvise64 system calls
o PPC32: declare pm_power_off
o PPC32: use ptrace_notify

Randy Dunlap <[email protected]>:
o fix Documentation/cli-sti-removal.txt thinko

Richard Henderson <[email protected]>:
o [ALPHA] Add missing sighand bits
o [ALPHA] Add isa_eth_io_copy_and_sum
o [ALPHA] Add fadvise64

Rob Weryk <[email protected]>:
o Fix small typo

Robert Love <[email protected]>:
o trivial: unused var in sunrpc

Roger Luethi <[email protected]>:
o [netdrvr via-rhine] trivial bits
o [netdrvr via-rhine] fix broken tx-underrun handling
o [netdrvr via-rhine] various duplex-related fixes
o [netdrvr via-rhine] reset function rewrite
o [netdrvr via-rhine] bump version, use constant instead of magic
number
o Fix 8139too device close

Russell King <[email protected]>:
o [ARM] Fix resource initialisation for IOP310
o [ARM] Miscellaneous cleanups
o [ARM] Reduce scope of "safe_buffers"
o [ARM PATCH] 1372/1: EPXA10DB: Add missing include files to irq.c
for 2.5.59
o [ARM PATCH] 1373/1: EPXA10DB: Update def-config file
o [ARM PATCH] 1376/1: Use #defines for iq80310 serial port
o [ARM PATCH] 1377/1: Retain endianess state on XScale CPUs during
boot
o [ARM PATCH] 1368/1: Fix some typos in proc-armv/system.h
o [ARM] Better handling of bad IRQ implementations
o [ARM PATCH] 1380/1: Big-Endian support for jiffies
o [ARM] Add init_sighand for 2.5.60
o [ARM] Ensure backtrace terminates on corrupted frame pointers
o [ARM] Update Acorn SCSI drivers
o [ARM] Update wdt285 and wdt977 watchdog drivers
o [ARM] Add input_devclass support to SA1111 PS/2 port driver
o [ARM PATCH] 1099/4: trizeps MTD support
o [ARM] Update signal handling for ARM

Rusty Russell <[email protected]>:
o kbuild: Module alias and device table support
o kbuild: Do modversions checks on module structure
o get rid of exec_usermodehelper, replace with call_usermodehelper
o kbuild: Fix non-verbose make modules_install output

Sam Ravnborg <[email protected]>:
o fix warning in kernel/dma.c
o char/drivers/random.c - fix warning

Scott Anderson <[email protected]>:
o PPC32: Invalidate the icache before use on PPC40x

Stephen Rothwell <[email protected]>:
o compat_sys_futex 1/3 generic, parisc, ppc64, s390x and x86_64

Steve French <[email protected]>:
o Merge in fixes from version 0.6.5 of the CIFS VFS. Greatly
improved performance including improved distributed caching support
and support for readpages and larger read sizes. Cache data now
flushed properly at file close time. Socket and memory leak fixed.
Fix two oops. Fix error logging and made more consistent. Generic
sendfile added

Steven Cole <[email protected]>:
o [tokenring proteon] trivial, spelling fix
o high pedantry in ppc spelling
o alpha typo fix
o 2.5.61 fix erroneous spellings of error
o 2.5.61 Reduce the number of "nuber" by four
o 2.5.61 fix spelling of necessary in 11 files
o fix different spellings of different and differences
o correct the spelling of correction and correctly
o more accurate spelling of accuracy
o yet more pedantry: complement vs compliment

Tom Rini <[email protected]>:
o PPC32: Fix some license drain bamage. Noticed by Christoph Hellwig



2003-02-17 23:53:04

by Chris Wedgwood

Subject: Linux v2.5.62 --- spontaneous reboots

On Mon, Feb 17, 2003 at 03:18:43PM -0800, Linus Torvalds wrote:

> Oh, and as a sign that 2.6.x really _is_ approaching, people have
> started sending me spelling fixes.

FWIW, I can't get 2.5.59+ (maybe earlier) to run reliably for me
without spontaneous rebooting under load (kernel compile in a loop).

I wondered if it was specific to my system here except a few other
people have reported this on *very* different hardware (I have a UP
Athlon with IDE, they have 8-way P4 with SCSI).

Is anyone else seeing this? Might there be some bogon causing triple
faults or similar lurking that I'm just unlucky enough to hit often?

I note the 2.5.59-mjb4 seems pretty reliable and doesn't have this
problem...




--cw

2003-02-18 00:36:40

by Chris Wedgwood

Subject: Re: Linux v2.5.62 --- spontaneous reboots

On Mon, Feb 17, 2003 at 07:44:08PM -0500, Jeff Garzik wrote:

> ACPI, or no?

nope

> highmem, or no?

no for me --- yes for them I assume (8-way P4)

> Are you running your UP Athlon with CONFIG_X86_UP_APIC?

I was... I wondered if that might do it, so I tried without. Still
reboots. Built kernel as 486 kernel with no IO-APIC too, still
reboots.

Nothing is logged (serial console).

Tried gcc-2.95 and gcc-3.2.



--cw

2003-02-18 00:34:36

by Jeff Garzik

Subject: Re: Linux v2.5.62 --- spontaneous reboots

Chris Wedgwood wrote:
> On Mon, Feb 17, 2003 at 03:18:43PM -0800, Linus Torvalds wrote:
>
>
>>Oh, and as a sign that 2.6.x really _is_ approaching, people have
>>started sending me spelling fixes.
>
>
> FWIW, I can't get 2.5.59+ (maybe earlier) to run reliably for me
> without spontaneous rebooting under load (kernel compile in a loop).


ACPI, or no?

highmem, or no?

Are you running your UP Athlon with CONFIG_X86_UP_APIC?

Jeff



2003-02-18 01:35:56

by Linus Torvalds

Subject: Re: Linux v2.5.62 --- spontaneous reboots


On Mon, 17 Feb 2003, Chris Wedgwood wrote:
>
> FWIW, I can't get 2.5.59+ (maybe earlier) to run reliably for me
> without spontaneous rebooting under load (kernel compile in a loop).
>
> I note the 2.5.59-mjb4 seems pretty reliable and doesn't have this
> problem...

It would be interesting to hear exactly when the trouble started. And if
plain 2.5.59 does it (which is unclear from your description), but 59-mjb4
doesn't, then that's an interesting data point.

Linus

2003-02-18 01:43:54

by Chris Wedgwood

Subject: Re: Linux v2.5.62 --- spontaneous reboots

On Mon, Feb 17, 2003 at 05:42:38PM -0800, Linus Torvalds wrote:

> It would be interesting to hear exactly when the trouble
> started. And if plain 2.5.59 does it (which is unclear from your
> description), but 59-mjb4 doesn't, then that's an interesting data
> point.

plain 2.5.59 does

59-mjb4 does NOT

I tested 59-mjb4 at the suggestion of mbligh after hearing that other
people had discovered the same bug and were now using 59-mjb4


--cw

2003-02-18 01:55:05

by Linus Torvalds

Subject: Re: Linux v2.5.62 --- spontaneous reboots


On Mon, 17 Feb 2003, Chris Wedgwood wrote:
>
> plain 2.5.59 does
>
> 59-mjb4 does NOT

Can you check mjb 1-3 too? The better it gets pinpointed, the easier it's
going to be to find.

Also, if you can figure out _which_ part of the patch makes a difference,
that would obviously be even better. Part of the stuff in mjb is already
merged in later kernels (ie things like using sequence locks for xtime is
already there in 2.5.60, so clearly that doesn't seem to be the thing that
helps your situation).

Martin cc'd, in case he has suggestions on how/what to split up the patch.

Do you use the starfire driver? That's a big part of the patch, for
example.. And part of the patch just makes the timer interrupt happen much
less often, if you haven't configured for 1000Hz - and it may well be that
small perturbations like that are the things that matter to you.

Linus

2003-02-18 02:06:15

by Chris Wedgwood

Subject: Re: Linux v2.5.62 --- spontaneous reboots

On Mon, Feb 17, 2003 at 06:02:03PM -0800, Linus Torvalds wrote:

> Can you check mjb 1-3 too? The better it gets pinpointed, the easier
> it's going to be to find.

Sure... I'll test them later on.

> Also, if you can figure out _which_ part of the patch makes a
> difference, that would obviously be even better.

I'll try to narrow this down.

> Part of the stuff in mjb is already merged in later kernels (ie
> things like using sequence locks for xtime is already there in
> 2.5.60, so clearly that doesn't seem to be the thing that helps your
> situation).

I don't think it's anything really obvious. If the problem I'm seeing
is the same as the one showing up on *some* IBM NUMA-Q (or whatever
they are) boxen then it's probably not a driver or fs thing --- as we
have nothing in common.

Now... it could be two different problems, except the same kernel
which the IBM people found works for them also works for me.

Oddly, wli has not seen this problem and he's using similar hardware
(I think) to the other IBM people and the same compiler as me.

> Do you use the starfire driver?

Nope.

A stripped down kernel, compiled for a 486 with no IO-APIC support (in
an attempt to slow things down and hopefully avoid possible hardware
problems such as overheating) still reboots on me.

The only thing I can think of is a triple-fault... I'm wondering
about using gcc-3.2 instead of 2.95.4 (Debian blah blort blem) on the
off chance it's a weird compiler problem.



--cw

2003-02-18 02:27:04

by Linus Torvalds

Subject: Re: Linux v2.5.62 --- spontaneous reboots


On Mon, 17 Feb 2003, Chris Wedgwood wrote:
>
> The only thing I can think of is a triple-fault... I'm wondering
> about using gcc-3.2 instead of 2.95.4 (Debian blah blort blem) on the
> off chance it's a weird compiler problem.

A lot of people seem to be using gcc-3.2 these days, since it's what RH-8
comes with as standard. I don't think there are any _known_ problems with
that compiler, at least on x86.

Now, interestingly enough, the mjb patch _does_ contain a change to
mm/memory.c that really makes no sense _except_ in the case of a compiler
bug. So you could check whether that (small) mm/memory.c patch is the
thing that makes a difference for you..

It would also be interesting to see if you can check just the scheduler
part of the mjb patch. On the whole the mjb patch looks like it should be
fairly easy to cut into specific parts, and Martin may actually have it
somewhere as separate patches.

Linus

2003-02-18 03:11:59

by Martin J. Bligh

Subject: Re: Linux v2.5.62 --- spontaneous reboots

>> plain 2.5.59 does
>>
>> 59-mjb4 does NOT
>
> Can you check mjb 1-3 too? The better it gets pinpointed, the easier it's
> going to be to find.

I should note that our performance team also has triple-faults on some
database app on an 8x machine ... that goes away with mjb4, not sure why
as yet. There's nothing in there that I can think of that would fix
a triple fault, so it may well be something annoyingly subtle.

Try -mjb1 first, if that still fixes it, then I'll start hacking off
chunks for you to test. Try 62 as well ... that has dcache_rcu merged,
which is another major chunk of the patch. kgdb is also big, and may
well change timings ...

> Also, if you can figure out _which_ part of the patch makes a difference,
> that would obviously be even better. Part of the stuff in mjb is already
> merged in later kernels (ie things like using sequence locks for xtime is
> already there in 2.5.60, so clearly that doesn't seem to be the thing that
> helps your situation).

Yup, a lot of it is designed to give our performance team a stable base
to work from - so minimal changes to a 59 base.

I use gcc-2.95.4 (Debian) as Chris does and have found that extremely
stable, not sure what the perf team were using, I'll find out.

> Now, interestingly enough, the mjb patch _does_ contain a change to
> mm/memory.c that really makes no sense _except_ in the case of a compiler
> bug. So you could check whether that (small) mm/memory.c patch is the
> thing that makes a difference for you..

That's the config_page_offset patch, which Dave ported forward from
Andrea's tree ... I've split that out below:

diff -urpN -X /home/fletch/.diff.exclude 21-config_hz/arch/i386/Kconfig 22-config_page_offset/arch/i386/Kconfig
--- 21-config_hz/arch/i386/Kconfig Wed Feb 5 22:22:59 2003
+++ 22-config_page_offset/arch/i386/Kconfig Wed Feb 5 22:23:00 2003
@@ -660,6 +660,44 @@ config HIGHMEM64G

endchoice

+choice
+ help
+ On i386, a process can only virtually address 4GB of memory. This
+ lets you select how much of that virtual space you would like to
+ devoted to userspace, and how much to the kernel.
+
+ Some userspace programs would like to address as much as possible and
+ have few demands of the kernel other than it get out of the way. These
+ users may opt to use the 3.5GB option to give their userspace program
+ as much room as possible. Due to alignment issues imposed by PAE,
+ the "3.5GB" option is unavailable if "64GB" high memory support is
+ enabled.
+
+ Other users (especially those who use PAE) may be running out of
+ ZONE_NORMAL memory. Those users may benefit from increasing the
+ kernel's virtual address space size by taking it away from userspace,
+ which may not need all of its space. An indicator that this is
+ happening is when /proc/Meminfo's "LowFree:" is a small percentage of
+ "LowTotal:" while "HighFree:" is very large.
+
+ If unsure, say "3GB"
+ prompt "User address space size"
+ default 1GB
+
+config 05GB
+ bool "3.5 GB"
+ depends on !HIGHMEM64G
+
+config 1GB
+ bool "3 GB"
+
+config 2GB
+ bool "2 GB"
+
+config 3GB
+ bool "1 GB"
+endchoice
+
config HIGHMEM
bool
depends on HIGHMEM64G || HIGHMEM4G
diff -urpN -X /home/fletch/.diff.exclude 21-config_hz/arch/i386/Makefile 22-config_page_offset/arch/i386/Makefile
--- 21-config_hz/arch/i386/Makefile Fri Jan 17 09:18:19 2003
+++ 22-config_page_offset/arch/i386/Makefile Wed Feb 5 22:23:00 2003
@@ -89,6 +89,7 @@ drivers-$(CONFIG_OPROFILE) += arch/i386

CFLAGS += $(mflags-y)
AFLAGS += $(mflags-y)
+AFLAGS_vmlinux.lds.o += -imacros $(TOPDIR)/include/asm-i386/page.h

boot := arch/i386/boot

diff -urpN -X /home/fletch/.diff.exclude 21-config_hz/arch/i386/vmlinux.lds.S 22-config_page_offset/arch/i386/vmlinux.lds.S
--- 21-config_hz/arch/i386/vmlinux.lds.S Fri Jan 17 09:18:20 2003
+++ 22-config_page_offset/arch/i386/vmlinux.lds.S Wed Feb 5 22:23:00 2003
@@ -10,7 +10,7 @@ ENTRY(_start)
jiffies = jiffies_64;
SECTIONS
{
- . = 0xC0000000 + 0x100000;
+ . = __PAGE_OFFSET + 0x100000;
/* read-only */
_text = .; /* Text and read-only data */
.text : {
diff -urpN -X /home/fletch/.diff.exclude 21-config_hz/include/asm-i386/page.h 22-config_page_offset/include/asm-i386/page.h
--- 21-config_hz/include/asm-i386/page.h Tue Jan 14 10:06:18 2003
+++ 22-config_page_offset/include/asm-i386/page.h Wed Feb 5 22:23:00 2003
@@ -89,7 +89,16 @@ typedef struct { unsigned long pgprot; }
* and CONFIG_HIGHMEM64G options in the kernel configuration.
*/

-#define __PAGE_OFFSET (0xC0000000)
+#include <linux/config.h>
+#ifdef CONFIG_05GB
+#define __PAGE_OFFSET (0xE0000000)
+#elif defined(CONFIG_1GB)
+#define __PAGE_OFFSET (0xC0000000)
+#elif defined(CONFIG_2GB)
+#define __PAGE_OFFSET (0x80000000)
+#elif defined(CONFIG_3GB)
+#define __PAGE_OFFSET (0x40000000)
+#endif

/*
* This much address space is reserved for vmalloc() and iomap()
diff -urpN -X /home/fletch/.diff.exclude 21-config_hz/include/asm-i386/processor.h 22-config_page_offset/include/asm-i386/processor.h
--- 21-config_hz/include/asm-i386/processor.h Thu Jan 2 22:05:15 2003
+++ 22-config_page_offset/include/asm-i386/processor.h Wed Feb 5 22:23:00 2003
@@ -279,7 +279,11 @@ extern unsigned int mca_pentium_flag;
/* This decides where the kernel will search for a free chunk of vm
* space during mmap's.
*/
+#ifdef CONFIG_05GB
+#define TASK_UNMAPPED_BASE (PAGE_ALIGN(TASK_SIZE / 16))
+#else
#define TASK_UNMAPPED_BASE (PAGE_ALIGN(TASK_SIZE / 3))
+#endif

/*
* Size of io_bitmap in longwords: 32 is ports 0-0x3ff.
diff -urpN -X /home/fletch/.diff.exclude 21-config_hz/mm/memory.c 22-config_page_offset/mm/memory.c
--- 21-config_hz/mm/memory.c Mon Jan 13 21:09:28 2003
+++ 22-config_page_offset/mm/memory.c Wed Feb 5 22:23:00 2003
@@ -101,8 +101,7 @@ static inline void free_one_pmd(struct m

static inline void free_one_pgd(struct mmu_gather *tlb, pgd_t * dir)
{
- int j;
- pmd_t * pmd;
+ pmd_t * pmd, * md, * emd;

if (pgd_none(*dir))
return;
@@ -113,8 +112,21 @@ static inline void free_one_pgd(struct m
}
pmd = pmd_offset(dir, 0);
pgd_clear(dir);
- for (j = 0; j < PTRS_PER_PMD ; j++)
- free_one_pmd(tlb, pmd+j);
+ /*
+ * Beware if changing the loop below. It once used int j,
+ * for (j = 0; j < PTRS_PER_PMD; j++)
+ * free_one_pmd(pmd+j);
+ * but some older i386 compilers (e.g. egcs-2.91.66, gcc-2.95.3)
+ * terminated the loop with a _signed_ address comparison
+ * using "jle", when configured for HIGHMEM64GB (X86_PAE).
+ * If also configured for 3GB of kernel virtual address space,
+ * if page at physical 0x3ffff000 virtual 0x7ffff000 is used as
+ * a pmd, when that mm exits the loop goes on to free "entries"
+ * found at 0x80000000 onwards. The loop below compiles instead
+ * to be terminated by unsigned address comparison using "jb".
+ */
+ for (md = pmd, emd = pmd + PTRS_PER_PMD; md < emd; md++)
+ free_one_pmd(tlb,md);
pmd_free_tlb(tlb, pmd);
}
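
The comment in the mm/memory.c hunk above describes a loop whose exit test was compiled as a signed address comparison. Below is a minimal sketch of that failure mode, with 32-bit addresses modelled as uint32_t. It rests on assumptions: that the compiler rewrote the loop to compare a running pointer against the address of the last entry (the patch comment only says the test used "jle"; the generated code is not shown in this thread), and that a pmd holds 512 eight-byte entries, as under PAE. The names walk_signed, walk_unsigned and ITER_CAP are invented for this illustration and are not kernel code.

```c
#include <stdint.h>

#define PTRS_PER_PMD   512u   /* assumed PAE value */
#define PMD_ENTRY_SIZE 8u     /* assumed: 64-bit pmd entries under PAE */
#define ITER_CAP       1024u  /* hypothetical safety cap so the demo terminates */

/* Count iterations when the exit test is a signed "p <= last" (like "jle").
 * The uint32_t -> int32_t conversion is implementation-defined in C; on the
 * usual two's-complement targets 0x80000000 becomes INT32_MIN. */
static unsigned walk_signed(uint32_t base)
{
    uint32_t last = base + (PTRS_PER_PMD - 1) * PMD_ENTRY_SIZE;
    uint32_t p = base;
    unsigned n = 0;

    do {
        n++;                  /* stands in for free_one_pmd() */
        p += PMD_ENTRY_SIZE;
    } while ((int32_t)p <= (int32_t)last && n < ITER_CAP);
    return n;
}

/* Same walk with an unsigned exit test (like "jb"/"jbe"). */
static unsigned walk_unsigned(uint32_t base)
{
    uint32_t last = base + (PTRS_PER_PMD - 1) * PMD_ENTRY_SIZE;
    uint32_t p = base;
    unsigned n = 0;

    do {
        n++;
        p += PMD_ENTRY_SIZE;
    } while (p <= last && n < ITER_CAP);
    return n;
}
```

With the pmd at virtual 0x7ffff000, the last entry sits at 0x7ffffff8 and the next address is 0x80000000. Read as a signed value that address is negative, so the signed test never fails and the walk runs past the end of the page (here until the cap). The unsigned test stops after 512 entries. For a pmd that does not end at the 0x80000000 boundary, both variants behave the same.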

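The include/asm-i386/page.h hunk above maps each new config symbol to a __PAGE_OFFSET value. On i386 the virtual address space is 4GB, so user space is everything below __PAGE_OFFSET and the kernel gets the remainder. The sketch below only does that arithmetic; the helper names user_space_mb and kernel_space_mb are hypothetical and do not appear in the patch.

```c
#include <stdint.h>

/* Megabytes of virtual address space below __PAGE_OFFSET (user side). */
static uint32_t user_space_mb(uint32_t page_offset)
{
    return page_offset >> 20;
}

/* Megabytes of virtual address space from __PAGE_OFFSET up to 4GB (kernel side). */
static uint32_t kernel_space_mb(uint32_t page_offset)
{
    return (uint32_t)(((1ULL << 32) - page_offset) >> 20);
}
```

Applied to the four values in the patch: 0xE0000000 gives 3584MB user and 512MB kernel, 0xC0000000 gives 3072MB and 1024MB, 0x80000000 gives 2048MB each, and 0x40000000 gives 1024MB user and 3072MB kernel. This also shows that the config symbol names (05GB, 1GB, 2GB, 3GB) describe the kernel's share, while the Kconfig prompts ("3.5 GB", "3 GB", "2 GB", "1 GB") describe the user's share.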

2003-02-18 12:03:06

by Pavel Machek

Subject: Re: Linux v2.5.62 --- spontaneous reboots

Hi!

> > Oh, and as a sign that 2.6.x really _is_ approaching, people have
> > started sending me spelling fixes.
>
> FWIW, I can't get 2.5.59+ (maybe earlier) to run reliably for me
> without spontaneous rebooting under load (kernel compile in a loop).
>
> I wondered if it was specific to my system here except a few other
> people have reported this on *very* different hardware (I have a UP
> Athlon with IDE, they have 8-way P4 with SCSI).
>
> Is anyone else seeing this? Might there be some bogon causing triple
> faults or similar lurking that I'm just unlucky enough to hit often?

I'm seeing loop-related problems around 2.5.60+...
Pavel

--
Casualities in World Trade Center: ~3k dead inside the building,
cryptography in U.S.A. and free speech in Czech Republic.

2003-02-18 12:56:56

by Ed Tomlinson

Subject: Re: Linux v2.5.62 --- spontaneous reboots

Linus Torvalds wrote:

> A lot of people seem to be using gcc-3.2 these days, since it's what RH-8
> comes with as standard. I don't think there are any known problems with
> that compiler, at least on x86.

Not so,

See the lkml thread

Re: [BUG] link error in usbserial with gcc3.2

Ed Tomlinson




2003-02-18 21:34:33

by Chris Wedgwood

Subject: Re: Linux v2.5.62 --- spontaneous reboots

On Mon, Feb 17, 2003 at 05:42:38PM -0800, Linus Torvalds wrote:

> It would be interesting to hear exactly when the trouble
> started. And if plain 2.5.59 does it (which is unclear from your
> description), but 59-mjb4 doesn't, then that's an interesting data
> point.

After much testing, which is still in progress, it would seem that
*maybe* mjb4 does have the problem too, although it's much harder to
hit. Please note that this is a single data point where for other
kernels I have two or more occurrences of spontaneous reboots.

I've been checking older kernels... it would seem the problem first
occurs in 2.5.53 (that is 2.5.53 through 2.5.62-bk all reboot for me).
2.5.51 doesn't appear to and thus far neither does 2.5.52.

I say thus far, because the problem usually appears after about 15
minutes of compiling, but it sometimes takes a little longer. I'm
running 2.5.52 now and after 45 minutes it's still going.


As to what difference it might be between '52 and '53 I have no idea.
I had a quick look and the changes there are considerable.

I've tried different compiles, with and without preempt, with and
without IO-APIC and trimming down the kernel...



--cw

2003-02-18 21:50:02

by Chris Wedgwood

Subject: Re: Linux v2.5.62 --- spontaneous reboots

On Tue, Feb 18, 2003 at 01:44:31PM -0800, Chris Wedgwood wrote:

> I say thus far, because the problem usually appears after about 15
> minutes of compiling, but it sometimes takes a little longer. I'm
> running 2.5.52 now and after 45 minutes it's still going.

Of course, Murphy being the optimist he is; about two minutes after I
make a claim that 2.5.52 does NOT spontaneously reboot --- it *DOES*.

I'm back to 2.5.51 and I'll beat it hard and see what happens. I
guess until I (or someone else who sees this) can get some concrete
data points you'll have to ignore this.


--cw

2003-02-18 22:07:14

by Linus Torvalds

Subject: Re: Linux v2.5.62 --- spontaneous reboots


On Tue, 18 Feb 2003, Chris Wedgwood wrote:
>
> Of course, Murphy being the optimist he is; about two minutes after I
> make a claim that 2.5.52 does NOT spontaneously reboot --- it *DOES*.
>
> I'm back to 2.5.51 and I'll beat it hard and see what happens. I
> guess until I (or someone else who sees this) can get some concrete
> data points you'll have to ignore this.

Ok. Especially if it seems that -mjb4 also potentially does it (just
harder to trigger), I don't see many other alternatives than just going
back in time to see when it started.

But if it was getting hard to trigger with 2.5.52 too, things might be
getting hairier and hairier.. If it becomes hard enough to trigger as to
be practically nondeterministic, a better approach might be to just go
back to -mjb4, and even if it is still there in -mjb4 try to see which
part of the patch seems to be making it more stable. That might give us
more clues, and it's a much smaller problem set than going arbitrarily far
back in the 2.5.x series.

Linus

2003-02-18 22:28:03

by Linus Torvalds

Subject: Re: Linux v2.5.62 --- spontaneous reboots


On Tue, 18 Feb 2003, Linus Torvalds wrote:
>
> But if it was getting hard to trigger with 2.5.52 too, things might be
> getting hairier and hairier.. If it becomes hard enough to trigger as to
> be practically nondeterministic, a better approach might be to just go
> back to -mjb4, and even if it is still there in -mjb4 try to see which
> part of the patch seems to be making it more stable.

Btw, this is particularly true if it takes you potentially hours to test
something like 2.5.51 for stability, but you can reboot 2.5.59 at will in
ten minutes.

In that case, you can test several versions of "2.5.59 + partial -mjb
patches" much more quickly than you can walk backwards in 2.5.x, and try
to pinpoint the "this part of -mjb makes it much less likely to reboot".
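
The "2.5.59 + partial -mjb patches" search can be sketched as a bisection over how many chunks of the patch are applied. This is only an illustration under strong assumptions: that the chunks can be applied cumulatively in a fixed order, that plain 2.5.59 (zero chunks) is unstable and the full patch is stable, and that each test runs long enough to give a trustworthy answer, which the thread shows is not guaranteed. The names first_stabilising_chunk, stable_fn and stable_from are hypothetical.

```c
/* Callback: returns nonzero if a kernel built with the first
 * `chunks_applied` chunks of the patch survives the load test. */
typedef int (*stable_fn)(int chunks_applied, void *ctx);

/* Find the smallest number of chunks that yields a stable kernel,
 * assuming 0 chunks is unstable and n_chunks chunks is stable. */
static int first_stabilising_chunk(int n_chunks, stable_fn is_stable, void *ctx)
{
    int lo = 0;         /* known unstable: no chunks applied */
    int hi = n_chunks;  /* assumed stable: whole patch applied */

    while (hi - lo > 1) {
        int mid = lo + (hi - lo) / 2;
        if (is_stable(mid, ctx))
            hi = mid;   /* the stabilising chunk is at or before mid */
        else
            lo = mid;   /* the stabilising chunk is after mid */
    }
    return hi;
}

/* Stand-in predicate for demonstration only: pretends the kernel is
 * stable once at least *ctx chunks are applied. A real test would be
 * booting the kernel and running the compile loop. */
static int stable_from(int chunks_applied, void *ctx)
{
    return chunks_applied >= *(int *)ctx;
}
```

For a patch cut into 20 chunks this needs at most five test kernels rather than twenty, which matters when a single "stable" verdict can take an hour or more of load.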

Also, with the -mjb patch there are some new configuration options. For
example, CONFIG_100HZ on -mjb has very different behaviour than a plain
2.5.59 kernel that defaults to 1kHz timer clock, and maybe the reason -mjb
seems more stable is that you may have selected a configuration option
that made -mjb act differently.

Regardless, it would be very interesting to hear what the -mjb split-down
results would be. Even if the answer might be "at 1kHz timer it is
unstable, at 100Hz it is stable" (and if that were to be it, then you'd
have to walk backwards to 2.5.24 to find the old 2.5.x kernel that had a
slow tick rate).

Linus

2003-02-18 22:51:49

by Chris Wedgwood

Subject: Re: Linux v2.5.62 --- spontaneous reboots

On Tue, Feb 18, 2003 at 02:13:00PM -0800, Linus Torvalds wrote:

> > I'm back to 2.5.51 and I'll beat it hard and see what happens. I
> > guess until I (or someone else who sees this) can get some
> > concrete data points you'll have to ignore this.
>
> Ok. Especially if it seems that -mjb4 also potentially does it (just
> harder to trigger), I don't see many other alternatives than just
> going back in time to see when it started.

It seems 2.5.51 *does* also show this... but it took nearly an hour
this time.

> But if it was getting hard to trigger with 2.5.52 too, things might
> be getting hairier and hairier... If it becomes hard enough to
> trigger as to be practically nondeterministic, a better approach
> might be to just go back to -mjb4, and even if it is still there in
> -mjb4 try to see which part of the patch seems to be making it more
> stable.

I may have to do that... it seems older kernels do have this problem,
it's just harder to hit for some reason.

I'd suspect it was an Athlon or chipset problem if it weren't for the
fact 2.4.x is stable for 8+ hours doing the same exact thing[1].

> That might give us more clues, and it's a much smaller problem set
> than going arbitrarily far back in the 2.5.x series.

Sure thing.


--cw

2003-02-18 23:38:37

by walt

Subject: Re: Linux v2.5.62 --- spontaneous reboots

Chris Wedgwood wrote:

> ...I'd suspect it was an Athlon or chipset problem if it weren't for the
> fact 2.4.x is stable for 8+ hours doing the same exact thing[1].

Unfortunately this is not proof :-( I can tell you from personal
experience that the BSD kernels are much more sensitive to overheating
hardware than linux is, for example -- so one linux kernel could just
as easily be more sensitive to overheating than another linux kernel.

I've never found out why this is, but I know it's true. When I try
to run a BSD kernel on a dust-covered motherboard I'll get random
crashes all over the place even though a linux kernel will run just
fine on the same machine. All I do is blow the dust off the motherboard
and both kernels run again without problem. Absolutely for sure.

I'd love to know what makes the difference.



2003-02-19 10:43:48

by David Ford

Subject: Re: Linux v2.5.62

2.5.60+ is rather unstable for me on an Athlon CPU w/ gcc 3.2.2. If I'm
careful and do very little in X, it seems to stay up for a few days. If
I do any sort of fast graphics or sound, etc, it'll die very quickly.
'tis an instant death with no OOPS, nothing at all on screen, nothing on
serial console.

Just an FYI, I'm trying to narrow it down.

David

Linus Torvalds wrote:

>Hmm.. Mostly lots of small updates, although the merge with Andrew
>included the RCU dcache patches from IBM that he has carried along for a
>while (ie fairly fundamental, but also very well tested).
>
>ARM, PPC, PPC64, alpha, kbuild.
>
>Oh, and as a sign that 2.6.x really _is_ approaching, people have started
>sending me spelling fixes. Kernel coders are apparently all atrocious
>spellers, and for some reason the spelling police always comes out of the
>woodwork when stable releases get closer.
>
>

2003-02-19 10:55:09

by Duncan Sands

Subject: Re: Linux v2.5.62

David, sounds like what I described in the email
"2.5.6x hard freeze playing DVDs". I have made
no progress because I don't know how to proceed.

All the best,

Duncan.

On Wednesday 19 February 2003 11:53, David Ford wrote:
> 2.5.60+ is rather unstable for me on an Athlon CPU w/ gcc 3.2.2. If I'm
> careful and do very little in X, it seems to stay up for a few days. If
> I do any sort of fast graphics or sound, etc, it'll die very quickly.
> 'tis an instant death with no OOPS, nothing at all on screen, nothing on
> serial console.
>
> Just an FYI, I'm trying to narrow it down.

2003-02-19 10:58:02

by David Ford

Subject: Re: Linux v2.5.62 --- spontaneous reboots

I have a 2.5.58 box that's a simple firewall/router w/ iptables running
on it. It crashes and reboots automatically roughly every other day.
It's been doing that for a long time and I never had the time to debug
it. I'll put .62 on it with a serial console and see what it comes up
with. It runs two PPPoE channels over ethX. PPPoE is known to blow up
(OOPS) on pppd hangup/restarts.

David


2003-02-19 10:58:04

by William Lee Irwin III

Subject: Re: Linux v2.5.62

On Wed, Feb 19, 2003 at 12:04:55PM +0100, Duncan Sands wrote:
> David, sounds like what I described in the email
> "2.5.6x hard freeze playing DVDs". I have made
> no progress because I don't know how to proceed.
> All the best,

Well, there's always the NMI oopser + serial console to log oopses.
X seems to make VGA console unavailable, plus it's not loggable.


-- wli

2003-02-19 11:07:33

by Hirling Endre

Subject: Re: Linux v2.5.62

On Wed, 2003-02-19 at 11:53, David Ford wrote:
> 2.5.60+ is rather unstable for me on an Athlon CPU w/ gcc 3.2.2. If I'm
> careful and do very little in X, it seems to stay up for a few days. If
> I do any sort of fast graphics or sound, etc, it'll die very quickly.
> 'tis an instant death with no OOPS, nothing at all on screen, nothing on
> serial console.

You're lucky, for me 2.5.60+ freezes right after "uncompressing kernel".
Tried with and without ACPI, with and without 'noapic', with APIC
enabled and disabled in the BIOS.

2.5.59 is just unstable.

(msi kt4 ultra MB, athlon xp 2200+, gcc 3.2.2)

I'll try a minimal 2.5.62 now.

endre

2003-02-19 11:15:05

by Duncan Sands

Subject: Re: Linux v2.5.62

On Wednesday 19 February 2003 12:17, Hirling Endre wrote:
> On Wed, 2003-02-19 at 11:53, David Ford wrote:
> > 2.5.60+ is rather unstable for me on an Athlon CPU w/ gcc 3.2.2. If I'm
> > careful and do very little in X, it seems to stay up for a few days. If
> > I do any sort of fast graphics or sound, etc, it'll die very quickly.
> > 'tis an instant death with no OOPS, nothing at all on screen, nothing on
> > serial console.
>
> You're lucky, for me 2.5.60+ freezes right after "uncompressing kernel".
> Tried with and without ACPI, with and without 'noapic', with APIC
> enabled and disabled in the BIOS.
>
> 2.5.59 is just unstable.
>
> (msi kt4 ultra MB, athlon xp 2200+, gcc 3.2.2)
>
> I'll try a minimal 2.5.62 now.

Endre, check out this thread:

Re: 2.5.62 fails to boot, Uncompressing... and then nothing

Duncan.

2003-02-19 11:42:14

by Hirling Endre

Subject: Re: Linux v2.5.62

On Wed, 2003-02-19 at 12:24, Duncan Sands wrote:

> Endre, check out this thread:
>
> Re: 2.5.62 fails to boot, Uncompressing... and then nothing

Been there, done that. I tried without ACPI and I have VT console
enabled. Haven't tried early_printk, though.

endre

2003-02-19 11:49:41

by Duncan Sands

Subject: Re: Linux v2.5.62

On Wednesday 19 February 2003 12:07, William Lee Irwin III wrote:
> On Wed, Feb 19, 2003 at 12:04:55PM +0100, Duncan Sands wrote:
> > David, sounds like what I described in the email
> > "2.5.6x hard freeze playing DVDs". I have made
> > no progress because I don't know how to proceed.
> > All the best,
>
> Well, there's always the NMI oopser + serial console to log oopses.
> X seems to make VGA console unavailable, plus it's not loggable.

AMD K6-2 w/o APIC...

Duncan.

2003-02-19 11:54:56

by William Lee Irwin III

Subject: Re: Linux v2.5.62

On Wednesday 19 February 2003 12:07, William Lee Irwin III wrote:
>> Well, there's always the NMI oopser + serial console to log oopses.
>> X seems to make VGA console unavailable, plus it's not loggable.

On Wed, Feb 19, 2003 at 12:58:58PM +0100, Duncan Sands wrote:
> AMD K6-2 w/o APIC...

Hmm. Could you be convinced to upgrade to a machine with a real
interrupt controller?

Well, hook up serial anyway. Maybe it's oopsing and you just can't
see what it's trying to printk.


-- wli

2003-02-19 12:39:43

by Thomas Molina

Subject: Re: Linux v2.5.62

On Wed, 19 Feb 2003, David Ford wrote:

> 2.5.60+ is rather unstable for me on an Athlon CPU w/ gcc 3.2.2. If I'm
> careful and do very little in X, it seems to stay up for a few days. If
> I do any sort of fast graphics or sound, etc, it'll die very quickly.
> 'tis an instant death with no OOPS, nothing at all on screen, nothing on
> serial console.

2.5.60 is where it started getting the most stable for me with similar
equipment. My system is Athlon 1.3 GHz, ASUS A7V, RedHat 8.0, Yamaha 754
soundcard.

I'm not burning any DVDs, but I am making a few CDs, lots of kernel
compiles, playing with modules, etc.

2003-02-19 18:50:43

by Zilvinas Valinskas

Subject: Re: Linux v2.5.62

On Wed, Feb 19, 2003 at 05:53:43AM -0500, David Ford wrote:
> 2.5.60+ is rather unstable for me on an Athlon CPU w/ gcc 3.2.2. If I'm
> careful and do very little in X, it seems to stay up for a few days. If
> I do any sort of fast graphics or sound, etc, it'll die very quickly.
> 'tis an instant death with no OOPS, nothing at all on screen, nothing on
> serial console.
>
> Just an FYI, I'm trying to narrow it down.

It might triple fault? Who knows. One thing I am sure of: if I don't
load agpgart + intel-agp, the laptop in question works flawlessly.
Otherwise, the first time I log out of KDE and try to log in as a
different user I get an instant reboot.

That's the clue.

ps.
Hardware :

Compaq EVO 800
Intel P4, 1.7GHz, 256MB RAM, ATI Radeon Mobility LY (something).

>
> David

2003-02-19 21:36:25

by Remco Post

Subject: Re: Linux v2.5.62

Hi all,

Just to let you all know, the Linus 2.5.62 (plain as can be) just booted
on my Motorola PowerStack II system. No modules, but also no oops on
boot, unlike 2.5.59 and almost every other 2.5 before that.

-- Remco

2003-02-19 22:13:43

by Remco Post

Subject: Re: Linux v2.5.62

On Wed, 19 Feb 2003 22:46:27 +0100
Remco Post <[email protected]> wrote:

> Hi all,
>
> Just to let you all know, the Linus 2.5.62 (plain as can be) just booted
> on my Motorola PowerStack II system. No modules, but also no oops on
> boot, unlike 2.5.59 and almost every other 2.5 before that.
>
> -- Remco

and fortunately, I also have some use for booting this kernel:

When the ethernet link goes down on my on-board dec-tulip:

eth1: timeout expired stopping DMA
kernel BUG at drivers/net/tulip/de2104x.c:925!
Oops: Exception in kernel mode, sig: 4
NIP: C0138248 LR: C0138248 SP: C0275E00 REGS: c0275d50 TRAP: 0700 Not tainted
MSR: 00089032 EE: 1 PR: 0 FP: 0 ME: 1 IR/DR: 11
TASK = c022f550[0] 'swapper' Last syscall: 120
GPR00: C0138248 C0275E00 C022F550 0000002F 00000001 C0275CB8 C0271800 C02B0000
GPR08: 0000161F 00000000 00000000 C0275D30 4000C088 00000000 00000000 00000000
GPR16: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
GPR24: 00000000 00000000 00000002 00001032 C03DD000 00009032 FFFFFFCE C03DD1C0
Call trace: [c0138588] [c002066c] [c001b85c] [c0007e80] [c00061c4] [c00039
Kernel panic: Aiee, killing interrupt handler!
In interrupt handler - not syncing

Only after some traffic was supposed to leave the machine, not that it ever does
with this kernel....


-- Remco

2003-02-19 23:30:19

by Linus Torvalds

Subject: doublefault debugging (was Re: Linux v2.5.62 --- spontaneous reboots)


Ok, I wrote up this doublefault task-gate handler which has gotten some
very very minimal testing, and which is probably totally buggered on SMP
machines etc, but which has caught at least one double-fault on one of my
test-machines (which I forced to double-fault by making %esp contain an
invalid value in kernel mode).

If the reboot is due to a triple-fault, this may give out some debugging
information and then lock up hard instead of rebooting.

Change the "ptr_ok()" to match your hardware (or just make it do

#define ptr_ok(x) (1)

since I only really wrote it that way due to debugging the damn thing).

Anyway, this patch should apply pretty directly on top of 2.5.62, and if
you run UP it might even work. So apply this, and try to crash the
machine, and see if it spits out any interesting information.

NOTE NOTE NOTE! When the double-fault happens, the machine as-is will be
COMPLETELY DEAD! Don't try to access "current" or anything like that,
since the stack is scrogged. That's why it gets the state by actually
reading the current value of gdt, and following it to the TSS structure.

If this approach works, we can try to make the doublefault handling less
prone to lock up the machine (ie kill the offending task and continue),
but in the meantime at least it should avoid having things like stack
errors result in triple faults and reboots.

Improvements welcome (and boy was this a bitch to debug).

Linus
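
The GDT-to-TSS walk described above comes down to reassembling a base
address that an i386 segment descriptor stores in three pieces. A minimal
user-space sketch of that step, assuming the standard 8-byte descriptor
layout; descriptor_base() is a hypothetical helper name and is not part of
the patch:

```c
#include <stdint.h>

/*
 * Reassemble the 32-bit base address held in an 8-byte i386 segment
 * descriptor. The base is scattered across the descriptor:
 *   bytes 2-3: base bits  0-15
 *   byte  4:   base bits 16-23
 *   byte  7:   base bits 24-31
 * Each byte is widened to uint32_t before shifting so that the top
 * byte never overflows a signed int.
 */
static uint32_t descriptor_base(const uint8_t desc[8])
{
	uint32_t base;

	base  = (uint32_t)desc[2];
	base |= (uint32_t)desc[3] << 8;
	base |= (uint32_t)desc[4] << 16;
	base |= (uint32_t)desc[7] << 24;
	return base;
}
```

Given a pointer to the TSS entry of the table that sgdt reports, the
returned value is the address the handler treats as a struct tss_struct
pointer, after the same sanity check that ptr_ok() performs.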

-----
===== arch/i386/kernel/Makefile 1.35 vs edited =====
--- 1.35/arch/i386/kernel/Makefile Tue Feb 18 18:59:01 2003
+++ edited/arch/i386/kernel/Makefile Wed Feb 19 11:56:49 2003
@@ -6,7 +6,8 @@

obj-y := process.o semaphore.o signal.o entry.o traps.o irq.o vm86.o \
ptrace.o i8259.o ioport.o ldt.o setup.o time.o sys_i386.o \
- pci-dma.o i386_ksyms.o i387.o dmi_scan.o bootflag.o
+ pci-dma.o i386_ksyms.o i387.o dmi_scan.o bootflag.o \
+ doublefault.o

obj-y += cpu/
obj-y += timers/
===== arch/i386/kernel/head.S 1.24 vs edited =====
--- 1.24/arch/i386/kernel/head.S Tue Feb 18 18:58:53 2003
+++ edited/arch/i386/kernel/head.S Wed Feb 19 11:56:50 2003
@@ -476,6 +476,13 @@
.quad 0x00009a0000000000 /* 0xc0 APM CS 16 code (16 bit) */
.quad 0x0040920000000000 /* 0xc8 APM DS data */

+ .quad 0x0000000000000000 /* 0xd0 - unused */
+ .quad 0x0000000000000000 /* 0xd8 - unused */
+ .quad 0x0000000000000000 /* 0xe0 - unused */
+ .quad 0x0000000000000000 /* 0xe8 - unused */
+ .quad 0x0000000000000000 /* 0xf0 - unused */
+ .quad 0x0000000000000000 /* 0xf8 - GDT entry 31: double-fault TSS */
+
#if CONFIG_SMP
.fill (NR_CPUS-1)*GDT_ENTRIES,8,0 /* other CPU's GDT */
#endif
===== arch/i386/kernel/traps.c 1.44 vs edited =====
--- 1.44/arch/i386/kernel/traps.c Sat Feb 15 19:30:17 2003
+++ edited/arch/i386/kernel/traps.c Wed Feb 19 11:56:50 2003
@@ -775,7 +775,7 @@
}
#endif

-#define _set_gate(gate_addr,type,dpl,addr) \
+#define _set_gate(gate_addr,type,dpl,addr,seg) \
do { \
int __d0, __d1; \
__asm__ __volatile__ ("movw %%dx,%%ax\n\t" \
@@ -785,7 +785,7 @@
:"=m" (*((long *) (gate_addr))), \
"=m" (*(1+(long *) (gate_addr))), "=&a" (__d0), "=&d" (__d1) \
:"i" ((short) (0x8000+(dpl<<13)+(type<<8))), \
- "3" ((char *) (addr)),"2" (__KERNEL_CS << 16)); \
+ "3" ((char *) (addr)),"2" ((seg) << 16)); \
} while (0)


@@ -797,22 +797,27 @@
*/
void set_intr_gate(unsigned int n, void *addr)
{
- _set_gate(idt_table+n,14,0,addr);
+ _set_gate(idt_table+n,14,0,addr,__KERNEL_CS);
}

static void __init set_trap_gate(unsigned int n, void *addr)
{
- _set_gate(idt_table+n,15,0,addr);
+ _set_gate(idt_table+n,15,0,addr,__KERNEL_CS);
}

static void __init set_system_gate(unsigned int n, void *addr)
{
- _set_gate(idt_table+n,15,3,addr);
+ _set_gate(idt_table+n,15,3,addr,__KERNEL_CS);
}

static void __init set_call_gate(void *a, void *addr)
{
- _set_gate(a,12,3,addr);
+ _set_gate(a,12,3,addr,__KERNEL_CS);
+}
+
+static void __init set_task_gate(unsigned int n, unsigned int gdt_entry)
+{
+ _set_gate(idt_table+n,5,0,0,(gdt_entry<<3));
}


@@ -843,7 +848,7 @@
set_system_gate(5,&bounds);
set_trap_gate(6,&invalid_op);
set_trap_gate(7,&device_not_available);
- set_trap_gate(8,&double_fault);
+ set_task_gate(8,GDT_ENTRY_DOUBLEFAULT_TSS);
set_trap_gate(9,&coprocessor_segment_overrun);
set_trap_gate(10,&invalid_TSS);
set_trap_gate(11,&segment_not_present);
===== arch/i386/kernel/cpu/common.c 1.17 vs edited =====
--- 1.17/arch/i386/kernel/cpu/common.c Sat Dec 28 09:17:17 2002
+++ edited/arch/i386/kernel/cpu/common.c Wed Feb 19 11:56:50 2003
@@ -490,6 +490,10 @@
load_TR_desc();
load_LDT(&init_mm.context);

+ /* Set up doublefault TSS pointer in the GDT */
+ __set_tss_desc(cpu, GDT_ENTRY_DOUBLEFAULT_TSS, &doublefault_tss);
+ cpu_gdt_table[cpu][GDT_ENTRY_DOUBLEFAULT_TSS].b &= 0xfffffdff;
+
/* Clear %fs and %gs. */
asm volatile ("xorl %eax, %eax; movl %eax, %fs; movl %eax, %gs");

===== include/asm-i386/desc.h 1.12 vs edited =====
--- 1.12/include/asm-i386/desc.h Sat Dec 28 09:18:49 2002
+++ edited/include/asm-i386/desc.h Wed Feb 19 11:56:51 2003
@@ -42,10 +42,12 @@
"rorl $16,%%eax" \
: "=m"(*(n)) : "a" (addr), "r"(n), "ir"(limit), "i"(type))

-static inline void set_tss_desc(unsigned int cpu, void *addr)
+static inline void __set_tss_desc(unsigned int cpu, unsigned int entry, void *addr)
{
- _set_tssldt_desc(&cpu_gdt_table[cpu][GDT_ENTRY_TSS], (int)addr, 235, 0x89);
+ _set_tssldt_desc(&cpu_gdt_table[cpu][entry], (int)addr, 235, 0x89);
}
+
+#define set_tss_desc(cpu,addr) __set_tss_desc(cpu, GDT_ENTRY_TSS, addr)

static inline void set_ldt_desc(unsigned int cpu, void *addr, unsigned int size)
{
===== include/asm-i386/processor.h 1.39 vs edited =====
--- 1.39/include/asm-i386/processor.h Fri Feb 14 18:24:10 2003
+++ edited/include/asm-i386/processor.h Wed Feb 19 11:56:51 2003
@@ -83,6 +83,7 @@
extern struct cpuinfo_x86 boot_cpu_data;
extern struct cpuinfo_x86 new_cpu_data;
extern struct tss_struct init_tss[NR_CPUS];
+extern struct tss_struct doublefault_tss;

#ifdef CONFIG_SMP
extern struct cpuinfo_x86 cpu_data[];
===== include/asm-i386/segment.h 1.5 vs edited =====
--- 1.5/include/asm-i386/segment.h Sat Dec 28 09:18:49 2002
+++ edited/include/asm-i386/segment.h Wed Feb 19 11:56:52 2003
@@ -37,6 +37,13 @@
* 23 - APM BIOS support
* 24 - APM BIOS support
* 25 - APM BIOS support
+ *
+ * 26 - unused
+ * 27 - unused
+ * 28 - unused
+ * 29 - unused
+ * 30 - unused
+ * 31 - TSS for double fault handler
*/
#define GDT_ENTRY_TLS_ENTRIES 3
#define GDT_ENTRY_TLS_MIN 6
@@ -64,10 +71,12 @@
#define GDT_ENTRY_PNPBIOS_BASE (GDT_ENTRY_KERNEL_BASE + 6)
#define GDT_ENTRY_APMBIOS_BASE (GDT_ENTRY_KERNEL_BASE + 11)

+#define GDT_ENTRY_DOUBLEFAULT_TSS 31
+
/*
- * The GDT has 25 entries but we pad it to cacheline boundary:
+ * The GDT has 32 entries
*/
-#define GDT_ENTRIES 28
+#define GDT_ENTRIES 32

#define GDT_SIZE (GDT_ENTRIES * 8)

--- /dev/null 2002-08-30 16:31:37.000000000 -0700
+++ ./arch/i386/kernel/doublefault.c 2003-02-19 15:26:44.000000000 -0800
@@ -0,0 +1,65 @@
+#include <linux/mm.h>
+#include <linux/sched.h>
+#include <linux/init.h>
+#include <linux/init_task.h>
+#include <linux/fs.h>
+
+#include <asm/uaccess.h>
+#include <asm/pgtable.h>
+#include <asm/desc.h>
+
+#define DOUBLEFAULT_STACKSIZE (1024)
+static unsigned long doublefault_stack[DOUBLEFAULT_STACKSIZE];
+#define STACK_START (unsigned long)(doublefault_stack+DOUBLEFAULT_STACKSIZE)
+
+#define ptr_ok(x) ((x) > 0xc0000000 && (x) < 0xc1000000)
+
+static void doublefault_fn(void)
+{
+ struct Xgt_desc_struct gdt_desc = {0, 0};
+ unsigned long gdt, tss;
+
+ __asm__ __volatile__("sgdt %0": "=m" (gdt_desc): :"memory");
+ gdt = gdt_desc.address;
+
+ printk("double fault, gdt at %08lx [%d bytes]\n", gdt, gdt_desc.size);
+
+ if (ptr_ok(gdt)) {
+ gdt += GDT_ENTRY_TSS << 3;
+ tss = *(u16 *)(gdt+2);
+ tss += *(u8 *)(gdt+4) << 16;
+ tss += *(u8 *)(gdt+7) << 24;
+ printk("double fault, tss at %08lx\n", tss);
+
+ if (ptr_ok(tss)) {
+ struct tss_struct *t = (struct tss_struct *)tss;
+
+ printk("eip = %08lx, esp = %08lx\n", t->eip, t->esp);
+
+ printk("eax = %08lx, ebx = %08lx, ecx = %08lx, edx = %08lx\n",
+ t->eax, t->ebx, t->ecx, t->edx);
+ printk("esi = %08lx, edi = %08lx\n",
+ t->esi, t->edi);
+ }
+ }
+
+ for (;;) /* nothing */;
+}
+
+struct tss_struct doublefault_tss __cacheline_aligned = {
+ .esp0 = STACK_START,
+ .ss0 = __KERNEL_DS,
+ .ldt = 0,
+ .bitmap = INVALID_IO_BITMAP_OFFSET,
+ .io_bitmap = { [0 ... IO_BITMAP_SIZE ] = ~0 },
+
+ .eip = (unsigned long) doublefault_fn,
+ .eflags = 0x00000082,
+ .esp = STACK_START,
+ .es = __USER_DS,
+ .cs = __KERNEL_CS,
+ .ss = __KERNEL_DS,
+ .ds = __USER_DS,
+
+ .__cr3 = __pa(swapper_pg_dir)
+};
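
The _set_gate() change in traps.c packs a selector, an offset and a
type/DPL word into two 32-bit halves of an IDT entry. The following is a
hedged plain-C restatement of what the inline asm computes, not code from
the patch; struct gate_desc, make_gate() and make_task_gate() are
illustrative names only:

```c
#include <stdint.h>

/* The two 32-bit halves of an i386 IDT gate descriptor. */
struct gate_desc {
	uint32_t lo;	/* selector in bits 16-31, offset bits 0-15 below */
	uint32_t hi;	/* offset bits 16-31, present bit, DPL, gate type */
};

/*
 * Build a gate the way the asm in _set_gate() does:
 *   lo = (selector << 16) | low 16 bits of the handler address
 *   hi = high 16 bits of the address | 0x8000 | (dpl << 13) | (type << 8)
 */
static struct gate_desc make_gate(unsigned int type, unsigned int dpl,
				  uint32_t addr, uint16_t selector)
{
	struct gate_desc g;

	g.lo = ((uint32_t)selector << 16) | (addr & 0xffffu);
	g.hi = (addr & 0xffff0000u) | 0x8000u | (dpl << 13) | (type << 8);
	return g;
}

/*
 * A task gate is type 5 and carries no offset: only the TSS selector
 * (GDT index shifted left by 3) is meaningful.
 */
static struct gate_desc make_task_gate(unsigned int gdt_entry)
{
	return make_gate(5, 0, 0, (uint16_t)(gdt_entry << 3));
}
```

With GDT entry 31 the selector comes out as 0xf8, which matches the slot
the head.S hunk reserves for the double-fault TSS.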

2003-02-20 01:03:40

by Tom Rini

Subject: Re: Linux v2.5.62

On Wed, Feb 19, 2003 at 11:23:44PM +0100, Remco Post wrote:
>
> On Wed, 19 Feb 2003 22:46:27 +0100
> Remco Post <[email protected]> wrote:
>
> > Hi all,
> >
> > Just to let you all know, the Linus 2.5.62 (plain as can be) just booted
> > on my Motorola PowerStack II system. No modules, but also no oops on
> > boot, unlike 2.5.59 and almost every other 2.5 before that.
> >
> > -- Remco
>
> and fortunately, I also have some use for booting this kernel:
>
> When the ethernet link goes down on my on-board dec-tulip:
>
> eth1: timeout expired stopping DMA
> kernel BUG at drivers/net/tulip/de2104x.c:925!
> Oops: Exception in kernel mode, sig: 4
> NIP: C0138248 LR: C0138248 SP: C0275E00 REGS: c0275d50 TRAP: 0700 Not tainted
> MSR: 00089032 EE: 1 PR: 0 FP: 0 ME: 1 IR/DR: 11
> TASK = c022f550[0] 'swapper' Last syscall: 120
> GPR00: C0138248 C0275E00 C022F550 0000002F 00000001 C0275CB8 C0271800 C02B0000
> GPR08: 0000161F 00000000 00000000 C0275D30 4000C088 00000000 00000000 00000000
> GPR16: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> GPR24: 00000000 00000000 00000002 00001032 C03DD000 00009032 FFFFFFCE C03DD1C0
> Call trace: [c0138588] [c002066c] [c001b85c] [c0007e80] [c00061c4] [c00039
> Kernel panic: Aiee, killing interrupt handler!
> In interrupt handler - not syncing

What does that decode to?

--
Tom Rini
http://gate.crashing.org/~trini/

2003-02-20 02:14:52

by Zwane Mwaikambo

Subject: Re: doublefault debugging (was Re: Linux v2.5.62 --- spontaneous reboots)

Thanks!
Here is a triple fault case (2.5.62-pgcl) and since i'm not a Real
Man i had to use a simulator ;) Unfortunately i can't unwind the stack.

Freeing unused kernel memory: 100k freed
double fault, gdt at c0268020 [255 bytes]
double fault, tss at c027d800
eip = c01181c4, esp = f7f9bf90
eax = c0003dfc, ebx = ffffffff, ecx = 0000007b, edx = f7f9c04c
esi = 00000003, edi = c01181b0

0xc01181c4 <do_page_fault+20>: mov %eax,0xc(%esp,1)

(0) [0x001139e4] 0060:c01139e4 (t doublefault_fn+c4): jmp c0113ae4 ; ebfe

eax 0x1f 31
ecx 0xc027d800 -1071130624
edx 0xc027d800 -1071130624
ebx 0xc027d800 -1071130624
esp 0xc029f7ec 0xc029f7ec
ebp 0x0 0x0
esi 0xffffffff -1
edi 0x0 0
eip 0xc01139e4 0xc01139e4
eflags 0x4082 16514
cs 0x60 96
ss 0x68 104
ds 0x7b 123
es 0x7b 123
fs 0x0 0
gs 0x0 0

CR0=0x8005003b
PG=paging=1
CD=cache disable=0
NW=not write through=0
AM=alignment mask=1
WP=write protect=1
NE=numeric error=1
ET=extension type=1
TS=task switched=1
EM=FPU emulation=0
MP=monitor coprocessor=1
PE=protection enable=1
CR2=page fault linear address=0xf7f9bf8c
CR3=0x00101000
PCD=page-level cache disable=0
PWT=page-level writes transparent=0
CR4=0x000000b0
VME=virtual-8086 mode extensions=0
PVI=protected-mode virtual interrupts=0
TSD=time stamp disable=0
DE=debugging extensions=0
PSE=page size extensions=1
PAE=physical address extension=1
MCE=machine check enable=0
PGE=page global enable=1
PCE=performance-monitor counter enable=0
OXFXSR=OS support for FXSAVE/FXRSTOR=0
OSXMMEXCPT=OS support for unmasked SIMD FP exceptions=0

Global Descriptor Table (0xc0268020):
GDT[0x00]=??? descriptor hi=00000000, lo=00000000
GDT[0x01]=??? descriptor hi=00000000, lo=00000000
GDT[0x02]=??? descriptor hi=00000000, lo=00000000
GDT[0x03]=??? descriptor hi=00000000, lo=00000000
GDT[0x04]=??? descriptor hi=00000000, lo=00000000
GDT[0x05]=??? descriptor hi=00000000, lo=00000000
GDT[0x06]=??? descriptor hi=00000000, lo=00000000
GDT[0x07]=??? descriptor hi=00000000, lo=00000000
GDT[0x08]=??? descriptor hi=00000000, lo=00000000
GDT[0x09]=??? descriptor hi=00000000, lo=00000000
GDT[0x0a]=??? descriptor hi=00000000, lo=00000000
GDT[0x0b]=??? descriptor hi=00000000, lo=00000000
GDT[0x0c]=Code segment, linearaddr=00000000, len=fffff * 4Kbytes, Execute/Read, 32-bit addrs
GDT[0x0d]=Data segment, linearaddr=00000000, len=fffff * 4Kbytes, Read/Write, Accessed
GDT[0x0e]=Code segment, linearaddr=00000000, len=fffff * 4Kbytes, Execute/Read, 32-bit addrs
GDT[0x0f]=Data segment, linearaddr=00000000, len=fffff * 4Kbytes, Read/Write, Accessed
GDT[0x10]=32-Bit TSS (Busy) at c027d800, length 0x000eb
GDT[0x11]=LDT
GDT[0x12]=Code segment, linearaddr=00000000, len=00000 * 4Kbytes, Execute/Read, 32-bit addrs
GDT[0x13]=Code segment, linearaddr=00000000, len=00000 * 4Kbytes, Execute/Read, 16-bit addrs
GDT[0x14]=Data segment, linearaddr=00000000, len=00000 * 4Kbytes, Read/Write
GDT[0x15]=Data segment, linearaddr=00000000, len=00000 * 4Kbytes, Read/Write
GDT[0x16]=Data segment, linearaddr=00000000, len=00000 * 4Kbytes, Read/Write
GDT[0x17]=Code segment, linearaddr=00000000, len=00000 bytes, Execute/Read, 32-bit addrs
GDT[0x18]=Code segment, linearaddr=00000000, len=00000 bytes, Execute/Read, 16-bit addrs
GDT[0x19]=Data segment, linearaddr=00000000, len=00000 bytes, Read/Write
GDT[0x1a]=??? descriptor hi=00000000, lo=00000000
GDT[0x1b]=??? descriptor hi=00000000, lo=00000000
GDT[0x1c]=??? descriptor hi=00000000, lo=00000000
GDT[0x1d]=??? descriptor hi=00000000, lo=00000000
GDT[0x1e]=??? descriptor hi=00000000, lo=00000000
GDT[0x1f]=32-Bit TSS (Busy) at c027f500, length 0x000eb

2003-02-20 02:17:29

by William Lee Irwin III

Subject: Re: doublefault debugging (was Re: Linux v2.5.62 --- spontaneous reboots)

On Wed, Feb 19, 2003 at 09:22:42PM -0500, Zwane Mwaikambo wrote:
> Here is a triple fault case (2.5.62-pgcl) and since i'm not a Real
> Man i had to use a simulator ;) Unfortunately i can't unwind the stack.
>
> CR2=page fault linear address=0xf7f9bf8c
> CR3=0x00101000
> PCD=page-level cache disable=0
> PWT=page-level writes transparent=0

Looks like either a pagetable or physmap/vmalloc/fixmap screwup.
What do the bootlogs have for those things?


-- wli

2003-02-20 02:47:47

by Zwane Mwaikambo

Subject: Re: doublefault debugging (was Re: Linux v2.5.62 --- spontaneous reboots)

On Wed, 19 Feb 2003, William Lee Irwin III wrote:

> On Wed, Feb 19, 2003 at 09:22:42PM -0500, Zwane Mwaikambo wrote:
> > Here is a triple fault case (2.5.62-pgcl) and since i'm not a Real
> > Man i had to use a simulator ;) Unfortunately i can't unwind the stack.
> >
> > CR2=page fault linear address=0xf7f9bf8c
> > CR3=0x00101000
> > PCD=page-level cache disable=0
> > PWT=page-level writes transparent=0
>
> Looks like either a pagetable or physmap/vmalloc/fixmap screwup.
> What do the bootlogs have for those things?

Verified there were no overlapping regions. If you really really really
want them i can put in some printks

Zwane
--
function.linuxpower.ca

2003-02-20 03:06:14

by William Lee Irwin III

Subject: Re: doublefault debugging (was Re: Linux v2.5.62 --- spontaneous reboots)

On Wed, 19 Feb 2003, William Lee Irwin III wrote:
>> Looks like either a pagetable or physmap/vmalloc/fixmap screwup.
>> What do the bootlogs have for those things?

On Wed, Feb 19, 2003 at 09:55:47PM -0500, Zwane Mwaikambo wrote:
> Verified there were no overlapping regions. If you really really really
> want them i can put in some printks

The printk's should have come in with the pgcl patch. Did you keep the
bootlogs? I'm looking for rounding errors in my pagetable init stuff
to see if we're trying to use memory beyond the edge of a 2MB region
we didn't bother mapping or something but that only matters for phys
mappings and so on. If you hit vmallocspace or fixmapspace it's an
entirely different question. There are also small "holes"...

So it'd be very handy to figure out which of the three spaces the
address that turned up in %cr2 was supposed to be in. I can probably
guess a little better if you tell me your PAGE_MMUSHIFT value also.


-- wli

2003-02-20 04:45:47

by Linus Torvalds

Subject: Re: doublefault debugging (was Re: Linux v2.5.62 --- spontaneous reboots)


On Wed, 19 Feb 2003, Zwane Mwaikambo wrote:
>
> Here is a triple fault case (2.5.62-pgcl) and since i'm not a Real
> Man i had to use a simulator ;) Unfortunately i can't unwind the stack.

Well, the reason you can't unwind the stack is the same reason you got the
double fault: the stack pointer is crap.

> Freeing unused kernel memory: 100k freed
> double fault, gdt at c0268020 [255 bytes]
> double fault, tss at c027d800
> eip = c01181c4, esp = f7f9bf90
> eax = c0003dfc, ebx = ffffffff, ecx = 0000007b, edx = f7f9c04c
> esi = 00000003, edi = c01181b0

Whee. So the double-fault patch actually ends up being useful? It didn't
help with Chris' problem, but hey, if it helps with something else..

Anyway, that %esp is crap, which also explains this:

> 0xc01181c4 <do_page_fault+20>: mov %eax,0xc(%esp,1)

Took a page fault because 0xc(%esp) wasn't there, and the page fault
couldn't write the fault trace to the stack (same reason), so you got a
double fault.

Anyway, it's hard to try to re-create any state from the above. Very few
clues about why the stack pointer is so messed up, but _usually_ a messed
up stack pointer is because the stack itself got hammered, and then the
stack pointer gets corrupted when somebody restores it off the stack (ie
the normal

movl %ebp,%esp
popl %ebp
ret

kind of epilogue thing).

You could try to make the double-fault handler print out more information,
suggested starting point something like the following: the stack pointer
is corrupted, but we know what the original top-of-stack was (esp0), so we
could print out part of that stack to get a guess about what it was doing
when it all went south..

Linus

------

===== arch/i386/kernel/doublefault.c 1.1 vs edited =====
--- 1.1/arch/i386/kernel/doublefault.c Wed Feb 19 17:48:55 2003
+++ edited/arch/i386/kernel/doublefault.c Wed Feb 19 20:50:47 2003
@@ -33,13 +33,26 @@

if (ptr_ok(tss)) {
struct tss_struct *t = (struct tss_struct *)tss;
+ unsigned long esp0 = t->esp0;

printk("eip = %08lx, esp = %08lx\n", t->eip, t->esp);

printk("eax = %08lx, ebx = %08lx, ecx = %08lx, edx = %08lx\n",
t->eax, t->ebx, t->ecx, t->edx);
- printk("esi = %08lx, edi = %08lx\n",
- t->esi, t->edi);
+ printk("esi = %08lx, edi = %08lx, %ebp = %08lx\n",
+ t->esi, t->edi, t->ebp);
+
+ /*
+ * We could print out the stack contents here: esp0
+ * is the beginning of the stack, we could print out
+ * all the code points we can find underneath it or
+ * something..
+ */
+
+ /* This might be a point to try to kill the process and clean up */
+ t->esp = esp0;
+ t->eip = (unsigned long) do_exit;
+ asm volatile("iret");
}
}
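
The comment in the patch above suggests printing "all the code points we
can find underneath" esp0. One rough way to do that is to scan the stack
words and keep those that fall inside the kernel text range. This is only
a sketch under that assumption: find_code_pointers() is a hypothetical
helper, and the text bounds are passed in by the caller rather than taken
from any real kernel symbol:

```c
#include <stddef.h>

/*
 * Walk nwords words starting at stack (the lowest address of the region
 * to inspect) and copy every word that lies in [text_start, text_end)
 * into out. At most max_out candidates are stored; the number stored is
 * returned. A hit only means "looks like a code address": stale values
 * left on the stack match too, so this is a hint, not a real backtrace.
 */
static size_t find_code_pointers(const unsigned long *stack, size_t nwords,
				 unsigned long text_start,
				 unsigned long text_end,
				 unsigned long *out, size_t max_out)
{
	size_t i, found = 0;

	for (i = 0; i < nwords && found < max_out; i++) {
		unsigned long w = stack[i];

		if (w >= text_start && w < text_end)
			out[found++] = w;
	}
	return found;
}
```

Inside the double-fault handler the scanned range would have to be checked
with something like ptr_ok() first, since the whole point is that the
faulting task's stack state cannot be trusted.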


2003-02-20 04:58:22

by William Lee Irwin III

Subject: Re: doublefault debugging (was Re: Linux v2.5.62 --- spontaneous reboots)

On Wed, Feb 19, 2003 at 08:52:46PM -0800, Linus Torvalds wrote:
> Whee. So the double-fault patch actually ends up being useful? It didn't
> help with Chris' problem, but hey, if it helps with something else..
> Anyway, that %esp is crap, which also explains this:
>> 0xc01181c4 <do_page_fault+20>: mov %eax,0xc(%esp,1)
> Took a page fault because 0xc(%esp) wasn't there, and the page fault
> couldn't write the fault trace to the stack (same reason), so you got a
> double fault.

Not sure where he got his %esp, but I extracted the following:

<zwane> MAXMEM=0x33e00000
<zwane> vmalloc: start = 0xf3e1f000, end = 0xfbe21000
<zwane> fixaddr: start = 0xfbe23000, end = 0xfffff000

which means somehow %esp landed in a tidbit in the middle of
vmallocspace that isn't even mapped. I highly suspect rounding
errors of mine since I squished vmallocspace, fixmapspace, and the
physical mapping so close together they might share L3 pagetables, i.e.
they're separated by 2*MMUPAGE_SIZE instead of the customary 8MB or so.


-- wli
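
Sorting an address into one of the three spaces can be written down
directly from the numbers quoted above. A small sketch, with two
assumptions that are not stated in the thread: PAGE_OFFSET is taken to be
0xc0000000 (the usual i386 3G/1G split), and each "end" value is treated
as exclusive. The enum and function names are made up for illustration:

```c
#include <stdint.h>

enum kspace { KS_USER, KS_PHYSMAP, KS_VMALLOC, KS_FIXMAP, KS_HOLE };

#define SK_PAGE_OFFSET   0xc0000000u	/* assumption: 3G/1G split */
#define SK_MAXMEM        0x33e00000u	/* MAXMEM from the boot log */
#define SK_VMALLOC_START 0xf3e1f000u
#define SK_VMALLOC_END   0xfbe21000u
#define SK_FIXADDR_START 0xfbe23000u
#define SK_FIXADDR_END   0xfffff000u

/* Report which region of the address space a virtual address is in. */
static enum kspace classify_addr(uint32_t addr)
{
	if (addr < SK_PAGE_OFFSET)
		return KS_USER;
	if (addr - SK_PAGE_OFFSET < SK_MAXMEM)
		return KS_PHYSMAP;	/* direct mapping of low memory */
	if (addr >= SK_VMALLOC_START && addr < SK_VMALLOC_END)
		return KS_VMALLOC;
	if (addr >= SK_FIXADDR_START && addr < SK_FIXADDR_END)
		return KS_FIXMAP;
	return KS_HOLE;			/* gaps between the regions */
}
```

With these bounds, both the reported %esp (0xf7f9bf90) and the %cr2 value
(0xf7f9bf8c) classify as vmalloc space.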

2003-02-20 05:57:43

by Zwane Mwaikambo

Subject: Re: doublefault debugging (was Re: Linux v2.5.62 --- spontaneous reboots)

On Wed, 19 Feb 2003, Linus Torvalds wrote:

> + printk("esi = %08lx, edi = %08lx, %ebp = %08lx\n",
> + t->esi, t->edi, t->ebp);

Too much AT&T for you ;) '%ebp'

> + * We could print out the stack contents here: esp0
> + * is the beginning of the stack, we could print out
> + * all the code points we can find underneath it or
> + * something..
> + */

Simulator managed to dump stack for me, nothing interesting though

> +
> + /* This might be a point to try to kill the process and clean up */
> + t->esp = esp0;
> + t->eip = (unsigned long) do_exit;
> + asm volatile("iret");
> }
> }
>
>
>

Here is what i managed to fish out from the sim, not a real call trace,
i just piped the stack contents through ksymoops.

Trace; c02b97ec <doublefault_stack+fec/1000>
Trace; c02b97ee <doublefault_stack+fee/1000>
Trace; c02b97f0 <doublefault_stack+ff0/1000>
Trace; c02b97f2 <doublefault_stack+ff2/1000>
Trace; c02b97f4 <doublefault_stack+ff4/1000>
Trace; c02b97f6 <doublefault_stack+ff6/1000>
Trace; c02b97f8 <doublefault_stack+ff8/1000>
Trace; c02b97fa <doublefault_stack+ffa/1000>
Trace; c02b97fc <doublefault_stack+ffc/1000>
Trace; c02b97fe <doublefault_stack+ffe/1000>
Trace; c02b9800 <use_tsc+0/4>
Trace; c02b9802 <use_tsc+2/4>
Trace; c02b9804 <delay_at_last_interrupt+0/4>
Trace; c02b9806 <delay_at_last_interrupt+2/4>
Trace; c02b9808 <last_tsc_low+0/4>
Trace; c02b980a <last_tsc_low+2/4>
Trace; c02b980c <fast_gettimeoffset_quotient+0/4>
Trace; c02b980e <fast_gettimeoffset_quotient+2/4>
Trace; c02b9810 <pm_power_off+0/4>
Trace; c02b9812 <pm_power_off+2/4>
Trace; c02b9814 <no_idt+0/8>
Trace; c02b9816 <no_idt+2/8>
Trace; c02b9818 <no_idt+4/8>
Trace; c02b981a <no_idt+6/8>
Trace; c02b981c <reboot_mode+0/4>
Trace; c02b981e <reboot_mode+2/4>
Trace; c02b9820 <reboot_thru_bios+0/4>
Trace; c02b9822 <reboot_thru_bios+2/4>
Trace; c02b9824 <flush_cpumask+0/4>
Trace; c02b9826 <flush_cpumask+2/4>
Trace; c02b9828 <flush_mm+0/4>
Trace; c02b982a <flush_mm+2/4>
Trace; c02b982c <flush_va+0/4>
Trace; c02b982e <flush_va+2/4>
Trace; c02b9830 <call_data+0/8>
Trace; c02b9832 <call_data+2/8>
Trace; c02b9834 <call_data+4/8>
Trace; c02b9836 <call_data+6/8>
Trace; c02b9838 <cacheflush_time+0/8>
Trace; c02b983a <cacheflush_time+2/8>
Trace; c02b983c <cacheflush_time+4/8>
Trace; c02b983e <cacheflush_time+6/8>
Trace; c02b9840 <cpu_online_map+0/4>
Trace; c02b9842 <cpu_online_map+2/4>
Trace; c02b9844 <cpu_callout_map+0/4>
Trace; c02b9846 <cpu_callout_map+2/4>
Trace; c02b9848 <smp_threads_ready+0/4>
Trace; c02b984a <smp_threads_ready+2/4>
Trace; c02b984c <cache_decay_ticks+0/4>
Trace; c02b984e <cache_decay_ticks+2/4>
Trace; c02b9850 <phys_proc_id+0/4>
Trace; c02b9852 <phys_proc_id+2/4>
Trace; c02b9854 <cpu_callin_map+0/4>
Trace; c02b9856 <cpu_callin_map+2/4>
Trace; c02b9858 <smp_commenced_mask+0/4>
Trace; c02b985a <smp_commenced_mask+2/4>
Trace; c02b985c <trampoline_base+0/4>
Trace; c02b985e <trampoline_base+2/4>
Trace; c02b9860 <tsc_values+0/8>
Trace; c02b9862 <tsc_values+2/8>
Trace; c02b9864 <tsc_values+4/8>
Trace; c02b9866 <tsc_values+6/8>
Trace; c02b9868 <init_deasserted+0/4>
Trace; c02b986a <init_deasserted+2/4>

--
function.linuxpower.ca

2003-02-20 11:37:14

by Ingo Molnar

Subject: Re: doublefault debugging (was Re: Linux v2.5.62 --- spontaneous reboots)


i think i managed to trigger a potentially useful oops, with BK-curr:

Unable to handle kernel paging request at virtual address 6b6b6b8b
printing eip:
c011944b
*pde = 00000000
Oops: 0002
CPU: 0
EIP: 0060:[<c011944b>] Not tainted
EFLAGS: 00010046
EIP is at do_page_fault+0x7b/0x4e4
eax: 6b6b6b8b ebx: 6b6b6b6b ecx: 0000002b edx: c02dd6ac
esi: 6b6b6b8b edi: ca095320 ebp: ca092170 esp: ca0920c8
ds: 007b es: 007b ss: 0068
Process start-threads (pid: 21685, threadinfo=ca090000 task=ca094ce0)
Stack: c02dd6ac 0000002b 6b6b6b6b 6b6b6b6b 6b6b6b6b 6b6b6b8b 6b6b6b6b 6b6b6b6b
6b6b6b6b 00030001 6b6b6b6b 6b6b6b6b 6b6b6b6b 6b6b6b6b 6b6b6b6b 6b6b6b6b
6b6b6b6b 6b6b6b6b 6b6b6b6b 6b6b6b6b 6b6b6b6b 6b6b6b6b 6b6b6b6b 6b6b6b6b
Call Trace:

[tons of pagefault recursion]

[<c01193d0>] do_page_fault+0x0/0x4e4
[<c010a691>] error_code+0x2d/0x38
[<c011944b>] do_page_fault+0x7b/0x4e4
[<c01193d0>] do_page_fault+0x0/0x4e4
[<c010a691>] error_code+0x2d/0x38
[<c011944b>] do_page_fault+0x7b/0x4e4
[<c01294f8>] do_timer+0xc8/0xd0
[<c013330c>] rcu_process_callbacks+0x17c/0x1b0
[<c011b4bf>] scheduler_tick+0x3ff/0x410
[<c0125113>] tasklet_action+0x73/0xc0
[<c01193d0>] do_page_fault+0x0/0x4e4
[<c010a691>] error_code+0x2d/0x38
[<c011b598>] schedule+0xb8/0x3d0
[<c01219fd>] release_task+0x17d/0x200
[<c011e70f>] mmput+0x1f/0xc0
[<c0122cad>] do_exit+0x31d/0x3b0
[<c010b328>] do_nmi+0x58/0x60
[<c012a93e>] __dequeue_signal+0x6e/0xb0
[<c0122ef0>] do_group_exit+0x110/0x140
[<c012a9ae>] dequeue_signal+0x2e/0x60
[<c012c2b1>] get_signal_to_deliver+0x2b1/0x440
[<c01099a2>] do_signal+0xb2/0xf0
[<c01296c4>] schedule_timeout+0x74/0xc0
[<c012c4f9>] sigprocmask+0x89/0x140
[<c0129640>] process_timeout+0x0/0x10
[<c012c62d>] sys_rt_sigprocmask+0x7d/0x1a0
[<c0129944>] sys_nanosleep+0x154/0x180
[<c0109a3b>] do_notify_resume+0x5b/0x60
[<c0109c72>] work_notifysig+0x13/0x15


2003-02-20 12:06:24

by William Lee Irwin III

Subject: Re: doublefault debugging (was Re: Linux v2.5.62 --- spontaneous reboots)

On Thu, Feb 20, 2003 at 12:46:51PM +0100, Ingo Molnar wrote:
> i think i managed to trigger a potentially useful oops, with BK-curr:
> Stack: c02dd6ac 0000002b 6b6b6b6b 6b6b6b6b 6b6b6b6b 6b6b6b8b 6b6b6b6b 6b6b6b6b
> 6b6b6b6b 00030001 6b6b6b6b 6b6b6b6b 6b6b6b6b 6b6b6b6b 6b6b6b6b 6b6b6b6b
> 6b6b6b6b 6b6b6b6b 6b6b6b6b 6b6b6b6b 6b6b6b6b 6b6b6b6b 6b6b6b6b 6b6b6b6b

Looks like some kind of serious use-after-free slab issue. IF is clear,
so we aren't under spin_lock_irq(&rq->lock) on the initial fault. It
might be interesting to find a way to trap it earlier. Reproducible?
If so, how?
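
For readers following along: 0x6b is the byte that slab debugging writes
over freed objects, which is why a stack full of 6b6b6b6b words points at
use-after-free. A minimal userspace sketch (the helper names are made up
for illustration, they are not kernel functions) that counts the poison
words in the stack dump quoted above:

```c
#include <stddef.h>
#include <stdint.h>

/* Byte that slab debugging writes into freed objects. */
#define POISON_BYTE 0x6bu

/* Hypothetical helper: true if all four bytes of the word are poison. */
static int is_poison_word(uint32_t w)
{
    return w == POISON_BYTE * 0x01010101u;
}

/* Hypothetical helper: count fully poisoned words in a dump. */
static size_t count_poison_words(const uint32_t *words, size_t n)
{
    size_t hits = 0;

    for (size_t i = 0; i < n; i++)
        if (is_poison_word(words[i]))
            hits++;
    return hits;
}

/* The 24 stack words from the oops quoted above. */
static const uint32_t oops_stack[24] = {
    0xc02dd6ac, 0x0000002b, 0x6b6b6b6b, 0x6b6b6b6b,
    0x6b6b6b6b, 0x6b6b6b8b, 0x6b6b6b6b, 0x6b6b6b6b,
    0x6b6b6b6b, 0x00030001, 0x6b6b6b6b, 0x6b6b6b6b,
    0x6b6b6b6b, 0x6b6b6b6b, 0x6b6b6b6b, 0x6b6b6b6b,
    0x6b6b6b6b, 0x6b6b6b6b, 0x6b6b6b6b, 0x6b6b6b6b,
    0x6b6b6b6b, 0x6b6b6b6b, 0x6b6b6b6b, 0x6b6b6b6b,
};
```

Almost the whole dump is pure poison, and the faulting address 6b6b6b8b
is a poisoned word plus 0x20, which is what a field access at offset 0x20
through a freed pointer would look like.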


-- wli

2003-02-20 12:25:56

by Ingo Molnar

Subject: Re: doublefault debugging (was Re: Linux v2.5.62 --- spontaneous reboots)


i had some other stuff in my tree as well, which could be the culprit. The
crash looked unrelated though. (procfs optimizations for the threaded
case.)

Ingo

2003-02-20 13:09:45

by Dave Jones

Subject: Re: Linux v2.5.62

On Wed, Feb 19, 2003 at 08:50:17PM +0200, Zilvinas Valinskas wrote:
> it might triple fault ? Who knows. One thing I am sure of, if I don't
> load agpgart + intel-agp, laptop in questions, works flawlessly.
> Otherwise first time I log of KDE trying to login as different user I
> get instant reboot.

Ok, there were quite a few changes in that area in .61.
Can you check whether .60 was ok, and whether .61 crashes the same way?
If .61 is ok, agp is a red herring, as it didn't change in .62

Dave

--
| Dave Jones. http://www.codemonkey.org.uk
| SuSE Labs

2003-02-20 13:47:05

by Zilvinas Valinskas

Subject: Re: Linux v2.5.62

Hello Dave,

it was the same with 2.5.59, 2.5.60 (not sure now, I will check that
later) and with 2.5.61 (and yesterday's most current bk snapshot as
well).

Can it be related to DRI? (That would be my guess.) Even though I
can't use DRI on debian unstable, because libGL.so mistakenly recognizes
a Pentium 4 as 3DNow! capable and crashes immediately.

For some reason, once I log off, the system reboots most of the
time when agpgart & agp-intel are loaded. If these are not loaded, DRI
cannot be initialized and the system is always stable during log off from
a KDE session.



On Thu, 2003-02-20 at 15:31, Dave Jones wrote:
> On Wed, Feb 19, 2003 at 08:50:17PM +0200, Zilvinas Valinskas wrote:
> > it might triple fault ? Who knows. One thing I am sure of, if I don't
> > load agpgart + intel-agp, laptop in questions, works flawlessly.
> > Otherwise first time I log of KDE trying to login as different user I
> > get instant reboot.
>
> Ok, there were quite a few changes in that area in .61.
> Can you check .60 was ok, and .61 crashes the same way ?
> If .61 is ok, agp is a red-herring, as it didnt change in .62
>
> Dave
--
Zilvinas Valinskas
Best regards

2003-02-20 13:54:59

by Zwane Mwaikambo

Subject: Re: doublefault debugging (was Re: Linux v2.5.62 --- spontaneous reboots)

On Thu, 20 Feb 2003, Ingo Molnar wrote:

>
> i had some other stuff in my tree as well, which could be the culprit. The
> crash looked unrelated though. (procfs optimizations for the threaded
> case.)

I can provide more debug information when i get back from work later.

Cheers,
Zwane
--
function.linuxpower.ca

2003-02-20 13:52:20

by Zwane Mwaikambo

Subject: Re: doublefault debugging (was Re: Linux v2.5.62 --- spontaneous reboots)

On Thu, 20 Feb 2003, Ingo Molnar wrote:

>
> i think i managed to trigger a potentially useful oops, with BK-curr:
>
> Unable to handle kernel paging request at virtual address 6b6b6b8b
> printing eip:
> c011944b
> *pde = 00000000
> Oops: 0002
> CPU: 0
> EIP: 0060:[<c011944b>] Not tainted
> EFLAGS: 00010046
> EIP is at do_page_fault+0x7b/0x4e4
> eax: 6b6b6b8b ebx: 6b6b6b6b ecx: 0000002b edx: c02dd6ac
> esi: 6b6b6b8b edi: ca095320 ebp: ca092170 esp: ca0920c8
> ds: 007b es: 007b ss: 0068
> Process start-threads (pid: 21685, threadinfo=ca090000 task=ca094ce0)
> Stack: c02dd6ac 0000002b 6b6b6b6b 6b6b6b6b 6b6b6b6b 6b6b6b8b 6b6b6b6b 6b6b6b6b
> 6b6b6b6b 00030001 6b6b6b6b 6b6b6b6b 6b6b6b6b 6b6b6b6b 6b6b6b6b 6b6b6b6b
> 6b6b6b6b 6b6b6b6b 6b6b6b6b 6b6b6b6b 6b6b6b6b 6b6b6b6b 6b6b6b6b 6b6b6b6b
> Call Trace:

I've seen this with 2.5.62, it's here;

00407434086i[CPU0 ] task_switch: bad LDT segment at c0121a00
00407434086i[CPU0 ] task switch: posting exception 10 after commit point
00407434086p[CPU0 ] >>PANIC<< can_push(): SS invalidated.
00407434086i[SYS ] Last time is 1045745354
00407434086i[XGUI ] Exit.
00407434086i[CPU0 ] protected mode
00407434086i[CPU0 ] CS.d_b = 32 bit
00407434086i[CPU0 ] SS.d_b = 32 bit
00407434086i[CPU0 ] | EAX=f7ffd6b4 EBX=ffffffff ECX=0000007b EDX=f7f9c048
00407434086i[CPU0 ] | ESP=c02b97dc EBP=00000001 ESI=00000000 EDI=c0118250
00407434086i[CPU0 ] | IOPL=0 NV UP DI NG NZ NA PO NC
00407434086i[CPU0 ] | SEG selector base limit G D
00407434086i[CPU0 ] | SEG sltr(index|ti|rpl) base limit G D
00407434086i[CPU0 ] | DS:007b( 000f| 0| 3) 00000000 000fffff 1 1
00407434086i[CPU0 ] | ES:007b( 000f| 0| 3) 00000000 000fffff 1 1
00407434086i[CPU0 ] | FS:0000( 0000| 0| 0) 00000000 000fffff 1 1
00407434086i[CPU0 ] | GS:0000( 0000| 0| 0) 00000000 000fffff 1 1
00407434086i[CPU0 ] | SS:0068( 000d| 0| 0) 00000000 000fffff 1 1
00407434086i[CPU0 ] | CS:0060( 000c| 0| 0) 00000000 000fffff 1 1
00407434086i[CPU0 ] | EIP=c0121a00 (c0121a00)
00407434086i[CPU0 ] | CR0=0x8005003b CR1=0x00000000 CR2=0xf7f9bf88
00407434086i[CPU0 ] | CR3=0x00000000 CR4=0x000000b0
00407434086i[CPU0 ] >> 55
00407434086i[CPU0 ] >> : push EBP

(gdb) disassemble 0xc0121a00
Dump of assembler code for function do_exit:
0xc0121a00 <do_exit>: push %ebp
0xc0121a01 <do_exit+1>: push %edi
0xc0121a02 <do_exit+2>: push %esi
0xc0121a03 <do_exit+3>: push %ebx

--
function.linuxpower.ca

2003-02-20 14:08:51

by Dave Jones

Subject: Re: Linux v2.5.62

On Thu, Feb 20, 2003 at 03:57:05PM +0200, Zilvinas Valinskas wrote:

> it was the same with 2.5.59,2.5.60 (not sure now, I will check that
> later) and with 2.5.61 (and yesterdays most current bk snapshot as
> well).

.59 ? Ugh, a load of stuff has changed in agpgart/ since then.
Can you recall when it last actually worked for you ?

> Can it be related to DRI ? (that might be my guess).

You can test basic GART functionality with testgart
(http://www.codemonkey.org.uk/cruft/testgart.c)

> Event though I
> can't use DRI on debian unstable because libGL.so mistakenly recognizes
> Pentium 4 as 3Dnow! capable and crashes immediately.

If that's what I think it is, it's not a bug. This has come up a number
of times on the dri-devel list.
libGL does a test which runs 3dnow instructions. Obviously it'll
crash on a non-3dnow capable box, but prior to the test it installs
an exception handler to fix things up if it all goes awry.

What's the debian bugzilla number for this bug, out of interest?

> For some reasons always, once I log off - system reboots most of the
> times when agpgart & agp-intel loaded (if these are not loaded) - DRI
> can not be initialized and system is always stable during log off from
> KDE session.

The latter is normal, the former isn't (obviously).
Does it reboot as soon as you modprobe them, or when X/DRI starts ?

Dave

--
| Dave Jones. http://www.codemonkey.org.uk
| SuSE Labs

2003-02-20 15:37:43

by Linus Torvalds

Subject: Re: doublefault debugging (was Re: Linux v2.5.62 --- spontaneous reboots)


On Thu, 20 Feb 2003, Ingo Molnar wrote:
>
> i think i managed to trigger a potentially useful oops, with BK-curr:

Ok, this is definitely a stack overflow:

> EIP is at do_page_fault+0x7b/0x4e4
> eax: 6b6b6b8b ebx: 6b6b6b6b ecx: 0000002b edx: c02dd6ac
> esi: 6b6b6b8b edi: ca095320 ebp: ca092170 esp: ca0920c8
> ds: 007b es: 007b ss: 0068
> Process start-threads (pid: 21685, threadinfo=ca090000 task=ca094ce0)

Note the "threadinfo=ca090000" and "esp: ca0920c8".

If the threadinfo isn't on the same double-page as the stack, then you're
screwed, and you've just overwritten the _real_ threadinfo, and the stack
is probably screwed. In fact, any recursion on do_page_fault() is
_probably_ due to the fact that you overwrote thread-info.
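
As a rough illustration of the point above (a userspace sketch, assuming
the 8kB "double-page" i386 stack and the usual trick of finding
thread_info by masking the stack pointer; the function names here are
made up for the example):

```c
#include <stdint.h>

/* Assumption: 8kB kernel stacks, i.e. the "double-page" mentioned above. */
#define THREAD_SIZE 8192u

/* thread_info lives at the bottom of the stack area, so it is found by
 * masking off the low bits of the stack pointer. */
static uint32_t thread_info_from_esp(uint32_t esp)
{
    return esp & ~(THREAD_SIZE - 1u);
}

/* True if the reported threadinfo is the one %esp would imply. */
static int esp_matches_threadinfo(uint32_t esp, uint32_t threadinfo)
{
    return thread_info_from_esp(esp) == threadinfo;
}
```

With the values from the oops, esp ca0920c8 masks down to ca092000,
which is not the reported threadinfo=ca090000, so the two are not on the
same double-page.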

This could explain Chris' problems too - my doublefault thing won't help
much if recursion on the stack has clobbered a lot of kernel state (and
the doublefault will likely happen only after enough state is clobbered
that even the doublefault handling might have trouble).

> [tons of pagefault recursion]
>
> [<c01193d0>] do_page_fault+0x0/0x4e4
> [<c010a691>] error_code+0x2d/0x38
> [<c011944b>] do_page_fault+0x7b/0x4e4
> [<c01193d0>] do_page_fault+0x0/0x4e4
> [<c010a691>] error_code+0x2d/0x38
> [<c011944b>] do_page_fault+0x7b/0x4e4
> [<c01294f8>] do_timer+0xc8/0xd0
> [<c013330c>] rcu_process_callbacks+0x17c/0x1b0
> [<c011b4bf>] scheduler_tick+0x3ff/0x410
> [<c0125113>] tasklet_action+0x73/0xc0
> [<c01193d0>] do_page_fault+0x0/0x4e4
> [<c010a691>] error_code+0x2d/0x38
> [<c011b598>] schedule+0xb8/0x3d0
> [<c01219fd>] release_task+0x17d/0x200
> [<c011e70f>] mmput+0x1f/0xc0
> [<c0122cad>] do_exit+0x31d/0x3b0
> [<c010b328>] do_nmi+0x58/0x60
> [<c012a93e>] __dequeue_signal+0x6e/0xb0
> [<c0122ef0>] do_group_exit+0x110/0x140
> [<c012a9ae>] dequeue_signal+0x2e/0x60
> [<c012c2b1>] get_signal_to_deliver+0x2b1/0x440
> [<c01099a2>] do_signal+0xb2/0xf0
> [<c01296c4>] schedule_timeout+0x74/0xc0
> [<c012c4f9>] sigprocmask+0x89/0x140
> [<c0129640>] process_timeout+0x0/0x10
> [<c012c62d>] sys_rt_sigprocmask+0x7d/0x1a0
> [<c0129944>] sys_nanosleep+0x154/0x180
> [<c0109a3b>] do_notify_resume+0x5b/0x60
> [<c0109c72>] work_notifysig+0x13/0x15

I bet the doublefaults are on "tsk->mm" accesses (specifically,
"tsk->mm->mmap_sem", which should be the first of them).

That easily happens if "tsk" is crud (either because recursion has already
overwritten it, _or_ because %esp has recursed so far down that the
"current()" logic ends up hitting the next page).

The stack doesn't look _that_ deep to me, but if some of these functions
have a large local frame, then that would certainly do it.. At a guess, it
looks like a fairly deep "schedule()" coupled with deep RCU processing.

And that RCU path is reasonably new. The infrastructure was put in 2.5.43,
which might explain Chris' case too ("somewhere before 2.5.51").

Does anybody have an up-to-date "use -pg and a special 'mcount()'
function to check stack depth" patch? The CONFIG_DEBUG_STACKOVERFLOW thing
is quite possibly too stupid to find things like this (it only finds
interrupts that overflow the stack, not deep call sequences).

Guys: you could try to enable CONFIG_DEBUG_STACKOVERFLOW, and then perhaps
make it a bit more aggressive (right now it does:

if (unlikely(esp < (sizeof(struct thread_info) + 1024))) {

and I'd suggest changing it to something more like

/* Have we used up more than half the stack? */
if (unlikely(esp < 4096)) {

and add a "for (;;)" after doing the dump_stack() because otherwise the
machine may reboot before you get anywhere.
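
To make the difference between the two thresholds concrete, here is a
small userspace model (a sketch only: it assumes "esp" in the check above
is the stack pointer masked down to its offset inside the 8kB stack area,
and the thread_info size used here is a made-up placeholder):

```c
#include <stdint.h>

#define THREAD_SIZE 8192u
/* Placeholder for illustration; the real sizeof(struct thread_info)
 * depends on the kernel version and config. */
#define ASSUMED_THREAD_INFO_SIZE 64u

/* Bytes between the stack pointer and the bottom of the stack area. */
static uint32_t stack_bytes_left(uint32_t esp)
{
    return esp & (THREAD_SIZE - 1u);
}

/* Model of the existing check: only fires when nearly out of stack. */
static int check_current(uint32_t esp)
{
    return stack_bytes_left(esp) < ASSUMED_THREAD_INFO_SIZE + 1024u;
}

/* Model of the suggested check: fires once half the stack is used. */
static int check_aggressive(uint32_t esp)
{
    return stack_bytes_left(esp) < 4096u;
}
```

A stack pointer 3000 bytes above the bottom of the area passes the
existing check but trips the more aggressive one.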

Linus

2003-02-20 15:44:01

by Ingo Molnar

Subject: Re: doublefault debugging (was Re: Linux v2.5.62 --- spontaneous reboots)


another datapoint: on SMP i can get various types of backtraces, on UP
it's the spontaneous reboot that triggers.

Ingo

2003-02-20 16:02:43

by Martin J. Bligh

Subject: Re: doublefault debugging (was Re: Linux v2.5.62 --- spontaneous reboots)

> Does anybody have an up-to-date "use -gp and a special 'mcount()'
> function to check stack depth" patch? The CONFIG_DEBUG_STACKOVERFLOW thing
> is quite possibly too stupid to find things like this (it only finds
> interrupts that overflow the stack, not deep call sequences).
>
> Guys: you could try to enable CONFIG_DEBUG_STACKOVERFLOW, and then perhaps
> make it a bit more aggressive (rigth now it does:
>
> if (unlikely(esp < (sizeof(struct thread_info) + 1024))) {
>
> and I'd suggest changing it to something more like
>
> /* Have we used up more than half the stack? */
> if (unlikely(esp < 4096)) {
>
> and add a "for (;;)" after doing the dump_stack() because otherwise the
> machine may reboot before you get anywhere.

There are patches in -mjb from Dave Hansen / Ben LaHaise to detect stack
overflow, included with the stuff for the 4K stacks patch (intended for
scaling to large numbers of tasks). I've split them out and attached them;
they should apply to mainline reasonably easily.

M.

PS. Linus, I think the attachments will work for you as they're text/plain,
if not, I'll resend them all inline.


Attachments:
(No filename) (1.11 kB)
220-thread_info_cleanup (4.23 kB)
221-interrupt_stacks (13.51 kB)
222-stack_usage_check (6.45 kB)
223-4k_stacks (1.62 kB)

2003-02-20 16:34:50

by Ingo Molnar

Subject: Re: doublefault debugging (was Re: Linux v2.5.62 --- spontaneous reboots)


On Thu, 20 Feb 2003, Linus Torvalds wrote:

> Ok, this is definitely a stack overflow:

> Does anybody have an up-to-date "use -gp and a special 'mcount()'
> function to check stack depth" patch? The CONFIG_DEBUG_STACKOVERFLOW
> thing is quite possibly too stupid to find things like this (it only
> finds interrupts that overflow the stack, not deep call sequences).

i had CONFIG_DEBUG_STACKOVERFLOW on, but i'll make it more aggressive. It's
fairly easy to reproduce the oops. (at least it was when i was trying to
avoid them :-)

Ingo

2003-02-20 16:48:16

by Linus Torvalds

Subject: Re: doublefault debugging (was Re: Linux v2.5.62 --- spontaneous reboots)


On Thu, 20 Feb 2003, Martin J. Bligh wrote:
>
> There are patches in -mjb from Dave Hansen / Ben LaHaise to detect stack
> overflow included with the stuff for the 4K stacks patch (intended for
> scaling to large numbers of tasks). I've split them out attatched, should
> apply to mainline reasonably easily.

Ok, the 4kB stack definitely won't work in real life, but that's because
we have some hopelessly bad stack users in the kernel. But the debugging
part would be good to try (in fact, it might be a good idea to keep the
8kB stack, but with rather anal debugging. Just the "mcount" part should
do that).

A sorted list of bad stack users (more than 256 bytes) in my default build
follows. Anybody can create their own with something like

objdump -d linux/vmlinux |
grep 'sub.*$0x...,.*esp' |
awk '{ print $9,$1 }' |
sort > bigstack

and a script to look up the addresses.

That ide_unregister() thing uses up >2kB in just one call! And there are
several in the 1.5kB range too, with a long list of ~500 byte offenders.

Yeah, and this assumes we don't have alloca() users or other dynamic
stack allocators (non-constant-size automatic arrays). I hope we don't
have that kind of crap anywhere..
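
To read the list below in bytes rather than hex, something like this
userspace sketch would do (the helper name is made up, and it assumes
lines in exactly the format shown in the list):

```c
#include <stdio.h>

/* Hypothetical helper: parse one line of the list, e.g.
 *   0xc02ae062 <ide_unregister+8>: sub $0x8c4,%esp
 * and return the frame size in bytes, or -1 if the line does not match. */
static long frame_bytes(const char *line)
{
    unsigned long addr, size;

    /* %*[^>] skips the symbol name without storing it. */
    if (sscanf(line, "%lx <%*[^>]>: sub $%lx", &addr, &size) != 2)
        return -1;
    return (long)size;
}
```

For the ide_unregister line this gives 2244 bytes (0x8c4), and 0x5b0 for
huft_build is 1456 bytes.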

Linus

-----
0xc02ae062 <ide_unregister+8>: sub $0x8c4,%esp
0xc010535d <huft_build+9>: sub $0x5b0,%esp
0xc0326a53 <snd_pcm_oss_change_params+6>: sub $0x590,%esp
0xc0106156 <inflate_dynamic+6>: sub $0x554,%esp
0xc0176150 <elf_core_dump+13>: sub $0x4b4,%esp
0xc0105fb8 <inflate_fixed+7>: sub $0x4ac,%esp
0xc035935e <pci_sanity_check+6>: sub $0x398,%esp
0xc035986d <pcibios_fixup_peer_bridges+5>: sub $0x394,%esp
0xc0334b85 <snd_pcm_hw_params_old_user+8>: sub $0x37c,%esp
0xc0334a97 <snd_pcm_hw_refine_old_user+8>: sub $0x37c,%esp
0xc02fbc74 <cb_alloc+6>: sub $0x32c,%esp
0xc0211b2a <pci_do_scan_bus+14>: sub $0x314,%esp
0xc034be58 <snd_seq_midisynth_register_port+12>: sub $0x2f0,%esp
0xc0264406 <extract_entropy+6>: sub $0x2d8,%esp
0xc02fcdde <ds_ioctl+3>: sub $0x2c8,%esp
0xc01dbd6b <udf_load_pvoldesc+6>: sub $0x2bc,%esp
0xc0329c6e <snd_pcm_oss_proc_write+6>: sub $0x298,%esp
0xc02a218f <pcnet_config+6>: sub $0x294,%esp
0xc01c8457 <nlmclnt_proc+14>: sub $0x294,%esp
0xc0327ecc <snd_pcm_oss_get_formats+12>: sub $0x290,%esp
0xc01d781f <udf_add_entry+6>: sub $0x290,%esp
0xc01c8e56 <nlmclnt_reclaim+18>: sub $0x280,%esp
0xc0330802 <snd_pcm_hw_params_user+8>: sub $0x27c,%esp
0xc03304af <snd_pcm_hw_refine_user+8>: sub $0x27c,%esp
0xc01ea4c9 <reiserfs_rename+13>: sub $0x27c,%esp
0xc029b57c <e100_ethtool_eeprom+10>: sub $0x260,%esp
0xc020a9df <semctl_main+12>: sub $0x25c,%esp
0xc0267205 <do_kdgkb_ioctl+24>: sub $0x244,%esp
0xc01d0ac8 <do_udf_readdir+6>: sub $0x240,%esp
0xc01e137a <udf_get_filename+3>: sub $0x23c,%esp
0xc01bd38c <find_exported_dentry+8>: sub $0x234,%esp
0xc01a5fa4 <fat_readdirx+15>: sub $0x230,%esp
0xc01fe813 <reiserfs_delete_solid_item+6>: sub $0x22c,%esp
0xc031f24d <snd_iprintf+3>: sub $0x21c,%esp
0xc02b4d6f <cdrom_read_intr+8>: sub $0x21c,%esp
0xc024adfb <pnp_printf+3>: sub $0x218,%esp
0xc02b4cac <cdrom_buffer_sectors+11>: sub $0x210,%esp
0xc01ebf96 <reiserfs_get_block+8>: sub $0x210,%esp
0xc020b2f0 <sys_semtimedop+3>: sub $0x208,%esp
0xc01fe58d <reiserfs_delete_item+12>: sub $0x208,%esp
0xc0529e98 <snd_seq_oss_create_client+12>: sub $0x204,%esp
0xc038efed <tcp_check_req+6>: sub $0x1f8,%esp
0xc038b462 <tcp_v4_conn_request+6>: sub $0x1f8,%esp
0xc01fef81 <reiserfs_cut_from_item+6>: sub $0x1f8,%esp
0xc038df7f <tcp_timewait_state_process+8>: sub $0x1e4,%esp
0xc0325539 <snd_mixer_oss_build_input+3>: sub $0x1e0,%esp
0xc01d9328 <udf_symlink+13>: sub $0x1cc,%esp
0xc01ffb15 <reiserfs_insert_item+6>: sub $0x1c4,%esp
0xc01ffa03 <reiserfs_paste_into_item+6>: sub $0x1c4,%esp
0xc01c43b6 <svc_export_parse+3>: sub $0x1c4,%esp
0xc02f6770 <pcmcia_validate_cis+3>: sub $0x1c0,%esp
0xc052a2c7 <snd_seq_system_client_init+24>: sub $0x1bc,%esp
0xc03511c9 <snd_intel8x0_mixer+13>: sub $0x1bc,%esp
0xc01a54f8 <fat_search_long+6>: sub $0x1b4,%esp
0xc052a0a1 <snd_seq_oss_midi_lookup_ports+9>: sub $0x1ac,%esp
0xc02e99f5 <sg_ioctl+6>: sub $0x19c,%esp
0xc0320fb0 <snd_ctl_card_info+12>: sub $0x198,%esp
0xc0171860 <ep_send_events+8>: sub $0x198,%esp
0xc0155ad4 <blkdev_get+11>: sub $0x194,%esp
0xc01b3bea <nfs_symlink+6>: sub $0x18c,%esp
0xc01b2699 <nfs_readdir+9>: sub $0x18c,%esp
0xc01b347d <nfs_mknod+6>: sub $0x17c,%esp
0xc01d71e3 <udf_find_entry+6>: sub $0x178,%esp
0xc01b333d <nfs_create+6>: sub $0x178,%esp
0xc01b35ca <nfs_mkdir+6>: sub $0x174,%esp
0xc02873a3 <radeon_cp_vertex2+3>: sub $0x16c,%esp
0xc01583a5 <do_execve+3>: sub $0x158,%esp
0xc033e177 <snd_seq_oss_ioctl+3>: sub $0x154,%esp
0xc02f13d9 <mmc_ioctl+3>: sub $0x154,%esp
0xc017d267 <elf_kcore_store_hdr+6>: sub $0x150,%esp
0xc01f048d <reiserfs_readdir+6>: sub $0x148,%esp
0xc01b28aa <nfs_lookup_revalidate+11>: sub $0x148,%esp
0xc036d0e8 <rt_cache_seq_show+6>: sub $0x144,%esp
0xc01d4115 <udf_fill_inode+6>: sub $0x144,%esp
0xc032fec8 <snd_pcm_info_user+3>: sub $0x140,%esp
0xc0286167 <radeon_cp_clear+3>: sub $0x13c,%esp
0xc019608f <journal_commit_transaction+6>: sub $0x13c,%esp
0xc0174db5 <load_elf_binary+20>: sub $0x13c,%esp
0xc03b5ba4 <ip_map_parse+3>: sub $0x138,%esp
0xc035c698 <sys_sendmsg+8>: sub $0x134,%esp
0xc02f66fe <read_tuple+3>: sub $0x134,%esp
0xc01b2ed9 <nfs_lookup+6>: sub $0x134,%esp
0xc0172105 <aout_core_dump+21>: sub $0x134,%esp
0xc02df535 <ahc_linux_proc_info+11>: sub $0x130,%esp
0xc02d8097 <ahc_linux_info+16>: sub $0x130,%esp
0xc034d3db <snd_rawmidi_info_select_user+3>: sub $0x12c,%esp
0xc032e77a <snd_pcm_proc_info_read+4>: sub $0x12c,%esp
0xc0308874 <proc_getdriver+3>: sub $0x12c,%esp
0xc01d4c5c <udf_update_inode+6>: sub $0x12c,%esp
0xc034d2a5 <snd_rawmidi_info_user+3>: sub $0x128,%esp
0xc01e148f <udf_put_filename+3>: sub $0x128,%esp
0xc01d9c88 <udf_rename+6>: sub $0x128,%esp
0xc0325433 <snd_mixer_oss_build_test+3>: sub $0x124,%esp
0xc0321351 <snd_ctl_elem_info+11>: sub $0x124,%esp
0xc02f4c26 <verify_cis_cache+6>: sub $0x124,%esp
0xc0242307 <acpi_pci_bind+32>: sub $0x124,%esp
0xc01e8aff <reiserfs_add_entry+11>: sub $0x124,%esp
0xc01cc029 <nlmsvc_proc_granted_msg+3>: sub $0x124,%esp
0xc01cbfab <nlmsvc_proc_unlock_msg+3>: sub $0x124,%esp
0xc01cbf2d <nlmsvc_proc_cancel_msg+3>: sub $0x124,%esp
0xc01cbeaf <nlmsvc_proc_lock_msg+3>: sub $0x124,%esp
0xc01cbe31 <nlmsvc_proc_test_msg+3>: sub $0x124,%esp
0xc017c649 <meminfo_read_proc+15>: sub $0x124,%esp
0xc016a6b5 <setxattr+8>: sub $0x124,%esp
0xc01e37ef <autofs4_expire_run+12>: sub $0x120,%esp
0xc0198244 <log_do_checkpoint+6>: sub $0x120,%esp
0xc016a969 <getxattr+3>: sub $0x120,%esp
0xc0257e97 <parport_pc_probe_port+12>: sub $0x11c,%esp
0xc024263a <acpi_pci_bind_root+32>: sub $0x11c,%esp
0xc01e3118 <autofs4_notify_daemon+12>: sub $0x11c,%esp
0xc035c92a <sys_recvmsg+3>: sub $0x118,%esp
0xc031c91e <i8042_interrupt+8>: sub $0x118,%esp
0xc02ee068 <sg_proc_hoststrs_info+6>: sub $0x118,%esp
0xc02551f0 <do_autoprobe+3>: sub $0x118,%esp
0xc0241aab <acpi_pci_irq_add_prt+20>: sub $0x118,%esp
0xc016adc3 <removexattr+3>: sub $0x118,%esp
0xc02deecd <copy_info+3>: sub $0x114,%esp
0xc02c05c8 <scsi_request_sense+6>: sub $0x114,%esp
0xc020c619 <sys_shmctl+3>: sub $0x114,%esp
0xc0203c55 <reiserfs_breada+6>: sub $0x114,%esp
0xc012a88b <sys_reboot+10>: sub $0x114,%esp
0xc052aeea <pirq_peer_trick+13>: sub $0x110,%esp
0xc01a059f <ext2_get_parent+3>: sub $0x110,%esp
0xc01719cf <ep_events_transfer+11>: sub $0x110,%esp
0xc02efd9b <dvd_read_bca+3>: sub $0x10c,%esp
0xc02550e0 <do_active_device+8>: sub $0x10c,%esp
0xc01d2ab5 <inode_getblk+6>: sub $0x10c,%esp
0xc01898b5 <ext3_get_parent+12>: sub $0x10c,%esp
0xc024839d <acpi_bus_match+8>: sub $0x108,%esp
0xc029ac87 <e100_do_ethtool_ioctl+10>: sub $0x100,%esp
0xc01beba6 <write_filehandle+3>: sub $0x100,%esp

2003-02-20 17:16:11

by Jeff Garzik

Subject: Re: doublefault debugging (was Re: Linux v2.5.62 --- spontaneous reboots)

On Thu, Feb 20, 2003 at 08:54:55AM -0800, Linus Torvalds wrote:
> A sorted list of bad stack users (more than 256 bytes) in my default build
> follows. Anybody can create their own with something like
[...]

Yum. Thanks for this list (and means to reproduce)...

2003-02-20 17:54:33

by Linus Torvalds

Subject: Re: doublefault debugging (was Re: Linux v2.5.62 --- spontaneous reboots)


On Thu, 20 Feb 2003, Ingo Molnar wrote:
>
> a true heisenbug. I cannot reproduce it anymore. Anyway, from the serial
> console i collected 3 instances of crashes - whatever it's worth.

Pretty much every single time, release_task() has been there on the
backtrace.

In fact, I bet you this code in do_exit() is the cause:

preempt_disable();

if (tsk->exit_signal == -1)
*** release_task(tsk); ***

schedule();

Note how "release_task()" will be releasing the stack that the process is
running on right now. And the reason it doesn't crash _every_ time is
simply that you need to have:

- another memory allocation that picks up that page and fills it with
something else in order to get a corrupted stack
- and something delays schedule() so that you have time to race _and_ you
need the stack. Which is why most of the oopses have an interrupt come
in inside schedule (see the "common_interrupt()" thing)

In other words, I think we need to have schedule_tail() do the
release_task(), otherwise we'd release it too early while the task
structure (and the stack) are both still in use.

You owe me a patch.

Linus

---

> [<c01219fd>] release_task+0x17d/0x200
> [<c011e70f>] mmput+0x1f/0xc0
> [<c0122cad>] do_exit+0x31d/0x3b0

> [<c010a594>] common_interrupt+0x18/0x20
> [<c010a691>] error_code+0x2d/0x38
> [<c011b881>] schedule+0x3a1/0x3d0
> [<c01219fd>] release_task+0x17d/0x200
> [<c011e70f>] mmput+0x1f/0xc0
> [<c0122cad>] do_exit+0x31d/0x3b0

> [<c010bc28>] handle_IRQ_event+0x38/0x60
> [<c010bf6b>] do_IRQ+0x14b/0x1e0
> [<c010a594>] common_interrupt+0x18/0x20
> [<c010a691>] error_code+0x2d/0x38
> [<c011b881>] schedule+0x3a1/0x3d0
> [<c01219fd>] release_task+0x17d/0x200
> [<c011e70f>] mmput+0x1f/0xc0
> [<c0122cad>] do_exit+0x31d/0x3b0

> [<c010bf6b>] do_IRQ+0x14b/0x1e0
> [<c010a594>] common_interrupt+0x18/0x20
> [<c010a691>] error_code+0x2d/0x38
> [<c011b881>] schedule+0x3a1/0x3d0
> [<c01219fd>] release_task+0x17d/0x200
> [<c011e70f>] mmput+0x1f/0xc0
> [<c0122cad>] do_exit+0x31d/0x3b0


> [<c010bf6b>] do_IRQ+0x14b/0x1e0
> [<c010a594>] common_interrupt+0x18/0x20
> [<c010a691>] error_code+0x2d/0x38
> [<c011b881>] schedule+0x3a1/0x3d0
> [<c01219fd>] release_task+0x17d/0x200
> [<c011e70f>] mmput+0x1f/0xc0
> [<c0122cad>] do_exit+0x31d/0x3b0

> [<c011e06c>] __put_task_struct+0x7c/0x90
> [<c0122cad>] do_exit+0x31d/0x3b0

> [<c010bc28>] handle_IRQ_event+0x38/0x60
> [<c010bf6b>] do_IRQ+0x14b/0x1e0
> [<c010a594>] common_interrupt+0x18/0x20
> [<c010a691>] error_code+0x2d/0x38
> [<c011b881>] schedule+0x3a1/0x3d0
> [<c01219fd>] release_task+0x17d/0x200
> [<c011e70f>] mmput+0x1f/0xc0
> [<c0122cad>] do_exit+0x31d/0x3b0

> [<c010bc28>] handle_IRQ_event+0x38/0x60
> [<c010bf6b>] do_IRQ+0x14b/0x1e0
> [<c010a594>] common_interrupt+0x18/0x20
> [<c010a691>] error_code+0x2d/0x38
> [<c011b881>] schedule+0x3a1/0x3d0
> [<c01219fd>] release_task+0x17d/0x200
> [<c011e70f>] mmput+0x1f/0xc0
> [<c0122cad>] do_exit+0x31d/0x3b0



2003-02-20 18:17:32

by Linus Torvalds

Subject: Re: doublefault debugging (was Re: Linux v2.5.62 --- spontaneous reboots)


On Thu, 20 Feb 2003, Linus Torvalds wrote:
>
> In other words, I think we need to have schedule_tail() do the
> release_task(), otherwise we'd release it too early while the task
> structure (and the stack) are both still in use.

Well, it's not "schedule_tail()" any more, since that is no longer called
by the normal schedule end-path.

Test suggestion:

- remove the

if (tsk->exit_signal == -1)
release_task(tsk);

from kernel/exit.c

- make "finish_switch()" something like

static inline void finish_switch(struct runqueue *rq, struct task_struct *prev)
{
	finish_arch_switch(rq, prev);
	if ((prev->state & TASK_ZOMBIE) && (prev->exit_signal == -1))
		release_task(prev);
}

- make all of "kernel/sched.c" use "finish_switch()" instead of
"finish_arch_switch()" (ie replace it in both schedule_tail() and the
end of schedule() itself).

At some point we can think about trying to speed up that test for
release_task(), ie add some extra task-state or something that is set in
kernel/exit.c so that we don't slow down the task switching unnecessarily.

How does this sound?

Also, for debugging, how about this simple (but expensive) debugging thing
that only works without HIGHMEM (and is obviously whitespace-damaged due
to indenting it):

--- 1.148/mm/page_alloc.c Wed Feb 5 20:05:13 2003
+++ edited/mm/page_alloc.c Thu Feb 20 10:22:42 2003
@@ -685,6 +685,7 @@
void __free_pages(struct page *page, unsigned int order)
{
if (!PageReserved(page) && put_page_testzero(page)) {
+ memset(page_address(page), 0x01, PAGE_SIZE << order);
if (order == 0)
free_hot_page(page);
else

which should show the effects of a buggy "release_task()" much more
consistently.
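
The idea can be tried out in userspace with a toy model (a sketch only:
one static buffer stands in for the page allocator so that reading it
after the "free" is well-defined C, and all names here are made up):

```c
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE   4096u
#define FREE_POISON 0x01    /* same byte as in the patch above */

/* A single static "page" stands in for the page allocator. */
static uint32_t fake_page[PAGE_SIZE / sizeof(uint32_t)];

static uint32_t *toy_alloc_page(void)
{
    return fake_page;
}

/* Poison on free, so that stale users see garbage right away. */
static void toy_free_page(uint32_t *page)
{
    memset(page, FREE_POISON, PAGE_SIZE);
}

/* Model of a stack page that is released while still in use. */
static uint32_t stale_read_after_free(void)
{
    uint32_t *stack = toy_alloc_page();

    stack[10] = 0xc0122cadu;    /* pretend saved return address */
    toy_free_page(stack);       /* released too early */
    return stack[10];           /* stale user reads poison */
}
```

Without the memset the stale read would usually still return the old
value and the bug would stay hidden until something else reused the page.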

Ehh?

Linus

2003-02-20 18:53:31

by Ingo Molnar

Subject: Re: doublefault debugging (was Re: Linux v2.5.62 --- spontaneous reboots)


On Thu, 20 Feb 2003, Linus Torvalds wrote:

> > a true heisenbug. I cannot reproduce it anymore. Anyway, from the serial
> > console i collected 3 instances of crashes - whatever it's worth.
>
> Pretty much every single time, release_task() has been there on the
> backtrace.
>
> In fact, I bet you this code in do_exit() is the cause:
>
> preempt_disable();
>
> if (tsk->exit_signal == -1)
> *** release_task(tsk); ***
>
> schedule();
>
> Note how "release_task()" will be releasing the stack that the process
> is running on right now. [...]

but, release_task() is a delayed thing for exactly this reason. It fills
out the per-CPU task_cache but does not free the task.

the release_task() + schedule() must be atomic though - ie. we must not be
preempted anytime in between [because that other task could free the
task_cache] - but i wasn't running with CONFIG_PREEMPT, so i cannot see how
it could happen.

Ingo

2003-02-20 19:32:41

by Remco Post

Subject: Re: Linux v2.5.62

On Wed, 19 Feb 2003 18:13:39 -0700
Tom Rini <[email protected]> wrote:

> On Wed, Feb 19, 2003 at 11:23:44PM +0100, Remco Post wrote:
> >
> > On Wed, 19 Feb 2003 22:46:27 +0100
> > Remco Post <[email protected]> wrote:
> >
> > > Hi all,
> > >
> > > just to let you all know, The linus 2.5.62 (plain as can be) just booted
> > > on my motorola powerstack II system. No modules, but also, no oops on
> > > boot, like 2.5.59 and allmost every other 2.5 before that....
> > >
> > > -- Remco
> >
> > and fortunately, I also have some use for booting this kernel:
> >
> > When the ethernet link goed down on my on-board dec-tulip:
> >
> > eth1: timeout expired stopping DMA
> > kernel BUG at drivers/net/tulip/de2104x.c:925!
> > Oops: Exception in kernel mode, sig: 4
> > NIP: C0138248 LR: C0138248 SP: C0275E00 REGS: c0275d50 TRAP: 0700
> > Not taintedMSR: 00089032 EE: 1 PR: 0 FP: 0 ME: 1 IR/DR: 11
> > TASK = c022f550[0] 'swapper' Last syscall: 120
> > GPR00: C0138248 C0275E00 C022F550 0000002F 00000001 C0275CB8 C0271800 C02B0000
> > GPR08: 0000161F 00000000 00000000 C0275D30 4000C088 00000000 00000000 00000000
> > GPR16: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> > GPR24: 00000000 00000000 00000002 00001032 C03DD000 00009032 FFFFFFCE C03DD1C0
> > Call trace: [c0138588] [c002066c] [c001b85c] [c0007e80] [c00061c4] [c00039
> > Kernel panic: Aiee, killing interrupt handler!
> > In interrupt handler - not syncing
>
> What does that decode to?
>

Well, it doesn't, of course. Relevant addresses close to the ones in the call trace:

c00061c4 T ret_from_except
c0003904 t setup_disp_bat
c0003950 T init_idle_6xx
c0003988 T ppc6xx_idle
c0007bfc T timer_interrupt
c0007e94 T do_gettimeofday
c001b7d4 T do_softirq
c001b8d8 T raise_softirq
c0020560 t run_timer_softirq
c00206c4 T run_local_timers
c0138460 t de21040_media_timer
c0138620 t de_ok_to_advertise
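
Doing the lookup by hand amounts to finding the last symbol at or below
each call-trace address. A small sketch of that, using only the excerpt
above (so the result is only as good as the excerpt; a full System.map
may have closer symbols, and the names below are made up for the
example):

```c
#include <stddef.h>
#include <stdint.h>

struct sym {
    uint32_t addr;
    const char *name;
};

/* The symbols listed above, sorted by address. */
static const struct sym map[] = {
    { 0xc0003904, "setup_disp_bat" },
    { 0xc0003950, "init_idle_6xx" },
    { 0xc0003988, "ppc6xx_idle" },
    { 0xc00061c4, "ret_from_except" },
    { 0xc0007bfc, "timer_interrupt" },
    { 0xc0007e94, "do_gettimeofday" },
    { 0xc001b7d4, "do_softirq" },
    { 0xc001b8d8, "raise_softirq" },
    { 0xc0020560, "run_timer_softirq" },
    { 0xc00206c4, "run_local_timers" },
    { 0xc0138460, "de21040_media_timer" },
    { 0xc0138620, "de_ok_to_advertise" },
};

/* Return the last symbol whose address is <= addr, or NULL if none. */
static const struct sym *lookup(uint32_t addr)
{
    const struct sym *best = NULL;

    for (size_t i = 0; i < sizeof(map) / sizeof(map[0]); i++) {
        if (map[i].addr > addr)
            break;
        best = &map[i];
    }
    return best;
}
```

With this table, c0138588 lands in de21040_media_timer+0x128, c002066c
in run_timer_softirq+0x10c, c001b85c in do_softirq+0x88, c0007e80 in
timer_interrupt+0x284, and c00061c4 is ret_from_except itself.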


> --
> Tom Rini
> http://gate.crashing.org/~trini/

Hope this is about what you're looking for... If not, please let me know...


--

Remco

2003-02-20 19:29:27

by Ingo Molnar

Subject: Re: doublefault debugging (was Re: Linux v2.5.62 --- spontaneous reboots)


hm, i think i can see the SMP race.

the last put_task_struct() can also be done by procfs - and nothing keeps
it from freeing the task in __put_task_struct(), while the task struct is
after its final put_task_struct(), but before the switch_to().

this does not explain the UP crash though.

Ingo


2003-02-20 19:36:15

by Tom Rini

Subject: Re: Linux v2.5.62

On Thu, Feb 20, 2003 at 08:42:36PM +0100, Remco Post wrote:
> On Wed, 19 Feb 2003 18:13:39 -0700
> Tom Rini <[email protected]> wrote:
>
> > On Wed, Feb 19, 2003 at 11:23:44PM +0100, Remco Post wrote:
> > >
> > > On Wed, 19 Feb 2003 22:46:27 +0100
> > > Remco Post <[email protected]> wrote:
> > >
> > > > Hi all,
> > > >
> > > > just to let you all know, The linus 2.5.62 (plain as can be) just booted
> > > > on my motorola powerstack II system. No modules, but also, no oops on
> > > > boot, like 2.5.59 and almost every other 2.5 before that....
> > > >
> > > > -- Remco
> > >
> > > and fortunately, I also have some use for booting this kernel:
> > >
> > > When the ethernet link went down on my on-board dec-tulip:
> > >
> > > eth1: timeout expired stopping DMA
> > > kernel BUG at drivers/net/tulip/de2104x.c:925!
> > > Oops: Exception in kernel mode, sig: 4
> > > > NIP: C0138248 LR: C0138248 SP: C0275E00 REGS: c0275d50 TRAP: 0700 Not tainted
> > > > MSR: 00089032 EE: 1 PR: 0 FP: 0 ME: 1 IR/DR: 11
> > > TASK = c022f550[0] 'swapper' Last syscall: 120
> > > GPR00: C0138248 C0275E00 C022F550 0000002F 00000001 C0275CB8 C0271800 C02B0000
> > > GPR08: 0000161F 00000000 00000000 C0275D30 4000C088 00000000 00000000 00000000
> > > GPR16: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> > > GPR24: 00000000 00000000 00000002 00001032 C03DD000 00009032 FFFFFFCE C03DD1C0
> > > Call trace: [c0138588] [c002066c] [c001b85c] [c0007e80] [c00061c4] [c00039
> > > Kernel panic: Aiee, killing interrupt handler!
> > > In interrupt handler - not syncing
> >
> > What does that decode to?
> >
>
> Well it doesn't, of course, relevant addresses close to the ones in the call trace:

Um, ksymoops should be able to decode that fine...

--
Tom Rini
http://gate.crashing.org/~trini/

2003-02-20 19:47:01

by Ingo Molnar

Subject: Re: doublefault debugging (was Re: Linux v2.5.62 --- spontaneous reboots)


On Thu, 20 Feb 2003, Ingo Molnar wrote:

> hm, i think i can see the SMP race.
>
> the last put_task_struct() can also be done by procfs - and nothing
> keeps it from freeing the task in __put_task_struct(), while the task
> struct is after its final put_task_struct(), but before the switch_to().

this race can be solved by moving the wait_task_inactive() from
release_task() into the tsk != current branch of __put_task_struct().

Ingo

2003-02-20 19:50:29

by Ingo Molnar

Subject: Re: doublefault debugging (was Re: Linux v2.5.62 --- spontaneous reboots)


ie. something like:

(untested yet.)

--- linux/kernel/exit.c.orig2 2003-02-20 21:55:56.000000000 +0100
+++ linux/kernel/exit.c 2003-02-20 21:56:02.000000000 +0100
@@ -66,9 +66,6 @@

BUG_ON(p->state < TASK_ZOMBIE);

- if (p != current)
- wait_task_inactive(p);
-
atomic_dec(&p->user->processes);
security_task_free(p);
free_uid(p->user);
--- linux/kernel/fork.c.orig2 2003-02-20 21:55:59.000000000 +0100
+++ linux/kernel/fork.c 2003-02-20 21:57:07.000000000 +0100
@@ -75,6 +75,8 @@
void __put_task_struct(struct task_struct *tsk)
{
if (tsk != current) {
+ if (tsk != current)
+ wait_task_inactive(tsk);
free_thread_info(tsk->thread_info);
kmem_cache_free(task_struct_cachep,tsk);
} else {

2003-02-20 19:55:28

by Remco Post

Subject: Re: Linux v2.5.62

On Thu, 20 Feb 2003 12:46:14 -0700
Tom Rini <[email protected]> wrote:

>
> On Thu, Feb 20, 2003 at 08:42:36PM +0100, Remco Post wrote:
> > On Wed, 19 Feb 2003 18:13:39 -0700
> > Tom Rini <[email protected]> wrote:
> >
> > > On Wed, Feb 19, 2003 at 11:23:44PM +0100, Remco Post wrote:
> > > >
> > > > On Wed, 19 Feb 2003 22:46:27 +0100
> > > > Remco Post <[email protected]> wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > just to let you all know, The linus 2.5.62 (plain as can be) just booted
> > > > > on my motorola powerstack II system. No modules, but also, no oops on
> > > > > > boot, like 2.5.59 and almost every other 2.5 before that....
> > > > >
> > > > > -- Remco
> > > >
> > > > and fortunately, I also have some use for booting this kernel:
> > > >
> > > > > When the ethernet link went down on my on-board dec-tulip:
> > > >
> > > > eth1: timeout expired stopping DMA
> > > > kernel BUG at drivers/net/tulip/de2104x.c:925!
> > > > Oops: Exception in kernel mode, sig: 4
> > > > > NIP: C0138248 LR: C0138248 SP: C0275E00 REGS: c0275d50 TRAP: 0700 Not tainted
> > > > > MSR: 00089032 EE: 1 PR: 0 FP: 0 ME: 1 IR/DR: 11
> > > > TASK = c022f550[0] 'swapper' Last syscall: 120
> > > > GPR00: C0138248 C0275E00 C022F550 0000002F 00000001 C0275CB8 C0271800 C02B0000
> > > > GPR08: 0000161F 00000000 00000000 C0275D30 4000C088 00000000 00000000 00000000
> > > > GPR16: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> > > > GPR24: 00000000 00000000 00000002 00001032 C03DD000 00009032 FFFFFFCE C03DD1C0
> > > > Call trace: [c0138588] [c002066c] [c001b85c] [c0007e80] [c00061c4] [c00039
> > > > Kernel panic: Aiee, killing interrupt handler!
> > > > In interrupt handler - not syncing
> > >
> > > What does that decode to?
> > >
> >
> > Well it doesn't, of course, relevant addresses close to the ones in the call trace:
>
> Um, ksymoops should be able to decode that fine...
>

That's the hint I needed:

$ ksymoops -v vmlinux -O -K -L -m System.map ~/oops.file
ksymoops 2.4.5 on ppc 2.4.18-powerpc. Options used
-v vmlinux (specified)
-K (specified)
-L (specified)
-O (specified)
-m System.map (specified)

Oops: Exception in kernel mode, sig: 4
NIP: C0138248 LR: C0138248 SP: C0275E00 REGS: c0275d50 TRAP: 0700 Not tainted
MSR: 00089032 EE: 1 PR: 0 FP: 0 ME: 1 IR/DR: 11
TASK = c022f550[0] 'swapper' Last syscall: 120
GPR00: C0138248 C0275E00 C022F550 0000002F 00000001 C0275CB8 C0271800 C02B0000
GPR08: 0000161F 00000000 00000000 C0275D30 4000C088 00000000 00000000 00000000
GPR16: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
GPR24: 00000000 00000000 00000002 00001032 C03DD000 00009032 FFFFFFCE C03DD1C0
Call trace: [c0138588] [c002066c] [c001b85c] [c0007e80] [c00061c4] [c00039
Kernel panic: Aiee, killing interrupt handler!
In interrupt handler - not syncing
Using defaults from ksymoops -t elf32-powerpc -a powerpc:common
Warning (Oops_read): Code line not seen, dumping what data is available


>>NIP; c0138248 <de_set_media+48/1f0> <=====


1 warning issued. Results may not be reliable.
$

> --
> Tom Rini
> http://gate.crashing.org/~trini/
>
> ** Sent via the linuxppc-dev mail list. See http://lists.linuxppc.org/
>

--

Remco

2003-02-20 20:07:56

by Ingo Molnar

Subject: Re: doublefault debugging (was Re: Linux v2.5.62 --- spontaneous reboots)


On Thu, 20 Feb 2003, Ingo Molnar wrote:

> ie. something like:
>
> (untested yet.)

tested it - works fine, but i was unable to reproduce the crash in the
past couple of hours, so this datapoint is of little value ATM.

Ingo

2003-02-20 20:02:56

by Chris Wedgwood

Subject: Re: doublefault debugging (was Re: Linux v2.5.62 --- spontaneous reboots)

On Thu, Feb 20, 2003 at 07:43:16AM -0800, Linus Torvalds wrote:

> This could explain Chris' problems too - my doublefault thing won't
> help much if recursion on the stack has clobbered a lot of kernel
> state (and the doublefault will likely happen only after enough
> state is clobbered that even the doublefault handling might have
> trouble).

An overflow *might* explain why

- it never happens under 2.4.x

- for some configurations of 2.5.x it never seems to happen either

- for some configurations of 2.5.x it does happen, but it's very
nebulous as to which options are required to make this happen;
with very few options it seems stable, with many options it crashes
quickly, and in between it lasts for what might be slightly longer
periods of time

Now, one thing I'm using that many people may not be is XFS, ACLs &
quota. Since IRIX has almost infinite memory available in
kernel-space, I should check to make sure XFS isn't sucking too much
stack space somewhere... it could be that it is, and depending on the
right magic internal XFS state and when an interrupt arrives or
similar, something goes splat.

I have the stack checking on, but as observed it may not suffice. I
wonder if 16k stacks are possible for testing?



--cw

2003-02-20 20:11:37

by Alan

Subject: Re: doublefault debugging (was Re: Linux v2.5.62 --- spontaneous reboots)

On Thu, 2003-02-20 at 16:54, Linus Torvalds wrote:
> Ok, the 4kB stack definitely won't work in real life, but that's because
> we have some hopelessly bad stack users in the kernel. But the debugging
> part would be good to try (in fact, it might be a good idea to keep the
> 8kB stack, but with rather anal debugging. Just the "mcount" part should
> do that).

You also need IRQ stacks to get down to 4K. The wrong pattern of ten
different IRQ handlers using a mere 200 bytes each will eventually
happen and eventually kill you otherwise.

> That ide_unregister() thing uses up >2kB in just one call! And there are
> several in the 1.5kB range too, with a long list of ~500 byte offenders.

ide_unregister is a really stupid one. It's copying a struct mostly to
restore fields it shouldn't be restoring but should be setting in the
allocator. I hadn't realised quite how bad it was. Added to the ide
shitlist


2003-02-20 20:14:17

by Martin J. Bligh

Subject: Re: doublefault debugging (was Re: Linux v2.5.62 --- spontaneous reboots)

>> Ok, the 4kB stack definitely won't work in real life, but that's because
>> we have some hopelessly bad stack users in the kernel. But the debugging
>> part would be good to try (in fact, it might be a good idea to keep the
>> 8kB stack, but with rather anal debugging. Just the "mcount" part should
>> do that).
>
> You also need IRQ stacks to get down to 4K. The wrong pattern of ten
> different IRQ handlers using a mere 200 bytes each will eventually
> happen and eventually kill you otherwise.

That's in Dave's patchset, and 4K stacks is a config option for now.

M.

2003-02-20 20:16:10

by Linus Torvalds

Subject: Re: doublefault debugging (was Re: Linux v2.5.62 --- spontaneous reboots)


On 20 Feb 2003, Alan Cox wrote:
> On Thu, 2003-02-20 at 16:54, Linus Torvalds wrote:
> > Ok, the 4kB stack definitely won't work in real life, but that's because
> > we have some hopelessly bad stack users in the kernel. But the debugging
> > part would be good to try (in fact, it might be a good idea to keep the
> > 8kB stack, but with rather anal debugging. Just the "mcount" part should
> > do that).
>
> You also need IRQ stacks to get down to 4K. The wrong pattern of ten
> different IRQ handlers using a mere 200 bytes each will eventually
> happen and eventually kill you otherwise.

Martin's patch set included the per-IRQ stacks, so that part should be ok.
However, since even a single function will overflow the stack depth test
of "half the stack", I'm just saying that right now the 4kB stacks
obviously shouldn't be used for overflow testing (and the 8kB stack
version right now is way too permissive).
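
To put rough numbers on this, here is a small C sketch of the arithmetic. It assumes the "half the stack" test fires when usage exceeds stack_size / 2, and it uses the 0x8c4-byte frame that the objdump listing in this thread reports for ide_unregister(). The helper names are hypothetical and are not kernel functions.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Assumption: the overflow check warns once more than half of the
 * stack is in use. Returns 1 if a single frame alone trips it.
 */
int frame_trips_half_stack_check(size_t frame_bytes, size_t stack_bytes)
{
    return frame_bytes > stack_bytes / 2;
}

/* Worst case when `handlers` interrupt handlers nest on one stack. */
size_t nested_irq_bytes(size_t handlers, size_t bytes_each)
{
    return handlers * bytes_each;
}
```

A frame of 0x8c4 = 2244 bytes is above the 2048-byte threshold of a 4kB stack, so that one call trips the test on its own, while it stays well below the 4096-byte threshold of an 8kB stack. Ten nested handlers at 200 bytes each add up to 2000 bytes, nearly half of a 4kB stack, before any process-context usage is counted.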

> > That ide_unregister() thing uses up >2kB in just one call! And there are
> > several in the 1.5kB range too, with a long list of ~500 byte offenders.
>
> ide_unregister is a really stupid one. It's copying a struct mostly to
> restore fields it shouldn't be restoring but should be setting in the
> allocator. I hadn't realised quite how bad it was. Added to the ide
> shitlist

Well, ide_unregister() was only the worst of a fairly large bunch of crap.

Although I guess nobody is really surprised.

Linus

2003-02-20 20:11:42

by Linus Torvalds

Subject: Re: doublefault debugging (was Re: Linux v2.5.62 --- spontaneous reboots)


On Thu, 20 Feb 2003, Ingo Molnar wrote:
>
> ie. something like:

Well, please remove the double test for task inequality.

I like the patch conceptually, HOWEVER, I'm not sure it's correct. The
thing is, moving the wait_task_inactive() to __put_task_struct() means
that we will be doing the "release_task()" teardown while the task is
still potentially active on another CPU.

In particular, we'll be freeing the security stuff and the signals while
the process may still be active in the scheduler on another CPU. This can
be dangerous, ie doing things like calling "free_uid()" on a process that
is still running means that suddenly you have issues like not being able
to trust "current->user" from interrupts. We may not care right now, but
it's still wrong (imagine us doing per-user time accounting - which makes
a _lot_ of sense).

Linus

2003-02-20 20:34:34

by William Lee Irwin III

Subject: Re: doublefault debugging (was Re: Linux v2.5.62 --- spontaneous reboots)

At some point in the past, _A_ wrote:
>> You also need IRQ stacks to get down to 4K. The wrong pattern of ten
>> different IRQ handlers using a mere 200 bytes each will eventually
>> happen and eventually kill you otherwise.

On Thu, Feb 20, 2003 at 12:23:49PM -0800, Martin J. Bligh wrote:
> That's in Dave's patchset, and 4K stacks is a config option for now.

You might want to grab aeb's fully non-recursive pathwalking if
you really want to cut back the stack to 4KB, as well as fixing
whatever stackblasting drivers are about.


-- wli

2003-02-20 20:42:48

by Andrew Morton

Subject: Re: doublefault debugging (was Re: Linux v2.5.62 --- spontaneous reboots)

Linus Torvalds <[email protected]> wrote:
>
> wait_task_inactive()

There are two other bugs in this exact area. I received the below from Bill
Irwin and Rick Lindsley yesterday. Can someone take this off my hands?


Fixes two deadlocks in the scheduler exit path:

1: We're calling mmdrop() under spin_lock_irq(&rq->lock). But mmdrop
calls vfree(), which calls smp_call_function().

It is not legal to call smp_call_function() with irq's off. Because
another CPU may be running smp_call_function() against _this_ CPU, which
deadlocks.

So the patch arranges for mmdrop() to not be called under
spin_lock_irq(&rq->lock).

2: We are leaving local interrupts disabled coming out of exit_notify().
But we are about to call wait_task_inactive() which spins, waiting for
another CPU to end a task. If that CPU has issued smp_call_function() to
this CPU, deadlock.

So the patch enables interrupts again before returning from exit_notify().

Also, exit_notify() returns with preemption disabled, so there is no
need to perform another preempt_disable() in do_exit().


exit.c | 17 +++++++++++------
sched.c | 43 ++++++++++++++++++++++++++++++++++++++-----
2 files changed, 49 insertions(+), 11 deletions(-)

diff -puN kernel/exit.c~wli-mem-leak-fix kernel/exit.c
--- 25/kernel/exit.c~wli-mem-leak-fix 2003-02-20 03:10:08.000000000 -0800
+++ 25-akpm/kernel/exit.c 2003-02-20 03:10:35.000000000 -0800
@@ -674,13 +674,19 @@ static void exit_notify(struct task_stru

tsk->state = TASK_ZOMBIE;
/*
- * No need to unlock IRQs, we'll schedule() immediately
- * anyway. In the preemption case this also makes it
- * impossible for the task to get runnable again (thus
- * the "_raw_" unlock - to make sure we don't try to
- * preempt here).
+ * In the preemption case it must be impossible for the task
+ * to get runnable again, so use "_raw_" unlock to keep
+ * preempt_count elevated until we schedule().
+ *
+ * To avoid deadlock on SMP, interrupts must be unmasked. If we
+ * don't, subsequently called functions (e.g, wait_task_inactive()
+ * via release_task()) will spin, with interrupt flags
+ * unwittingly blocked, until the other task sleeps. That task
+ * may itself be waiting for smp_call_function() to answer and
+ * complete, and with interrupts blocked that will never happen.
*/
_raw_write_unlock(&tasklist_lock);
+ local_irq_enable();
}

NORET_TYPE void do_exit(long code)
@@ -727,7 +733,6 @@ NORET_TYPE void do_exit(long code)

tsk->exit_code = code;
exit_notify(tsk);
- preempt_disable();

if (tsk->exit_signal == -1)
release_task(tsk);
diff -puN kernel/sched.c~wli-mem-leak-fix kernel/sched.c
--- 25/kernel/sched.c~wli-mem-leak-fix 2003-02-20 03:10:08.000000000 -0800
+++ 25-akpm/kernel/sched.c 2003-02-20 03:10:08.000000000 -0800
@@ -152,6 +152,7 @@ struct runqueue {
unsigned long nr_running, nr_switches, expired_timestamp,
nr_uninterruptible;
task_t *curr, *idle;
+ struct mm_struct *prev_mm;
prio_array_t *active, *expired, arrays[2];
int prev_nr_running[NR_CPUS];
#ifdef CONFIG_NUMA
@@ -388,7 +389,10 @@ static inline void resched_task(task_t *
* wait_task_inactive - wait for a thread to unschedule.
*
* The caller must ensure that the task *will* unschedule sometime soon,
- * else this function might spin for a *long* time.
+ * else this function might spin for a *long* time. This function can't
+ * be called with interrupts off, or it may introduce deadlock with
+ * smp_call_function() if an IPI is sent by the same process we are
+ * waiting to become inactive.
*/
void wait_task_inactive(task_t * p)
{
@@ -558,10 +562,24 @@ void sched_exit(task_t * p)
/**
* schedule_tail - first thing a freshly forked thread must call.
* @prev: the thread we just switched away from.
+ *
+ * Note that we may have delayed dropping an mm in context_switch(). If
+ * so, we finish that here outside of the runqueue lock. (Doing it
+ * with the lock held can cause deadlocks; see schedule() for
+ * details.)
+ */
+if (mm)
*/
asmlinkage void schedule_tail(task_t *prev)
{
- finish_arch_switch(this_rq(), prev);
+ runqueue_t *rq = this_rq();
+ struct mm_struct *mm = rq->prev_mm;
+
+ rq->prev_mm = NULL;
+ finish_arch_switch(rq, prev);
+ if (mm)
+ mmdrop(mm);
+
if (current->set_child_tid)
put_user(current->pid, current->set_child_tid);
}
@@ -570,7 +588,7 @@ asmlinkage void schedule_tail(task_t *pr
* context_switch - switch to the new MM and the new
* thread's register state.
*/
-static inline task_t * context_switch(task_t *prev, task_t *next)
+static inline task_t * context_switch(runqueue_t *rq, task_t *prev, task_t *next)
{
struct mm_struct *mm = next->mm;
struct mm_struct *oldmm = prev->active_mm;
@@ -584,7 +602,8 @@ static inline task_t * context_switch(ta

if (unlikely(!prev->mm)) {
prev->active_mm = NULL;
- mmdrop(oldmm);
+ WARN_ON(rq->prev_mm);
+ rq->prev_mm = oldmm;
}

/* Here we just switch the register state and the stack. */
@@ -1223,14 +1242,28 @@ switch_tasks:
RCU_qsctr(prev->thread_info->cpu)++;

if (likely(prev != next)) {
+ struct mm_struct *prev_mm;
rq->nr_switches++;
rq->curr = next;

prepare_arch_switch(rq, next);
- prev = context_switch(prev, next);
+ prev = context_switch(rq, prev, next);
barrier();
rq = this_rq();
+ prev_mm = rq->prev_mm;
+ rq->prev_mm = NULL;
+
+ /*
+ * It's extremely important to drop the runqueue lock
+ * before mmdrop(): on i386, destroy_context(), called
+ * by mmdrop(), can potentially vfree() LDT's. This may
+ * generate interrupts to processors spinning (with
+ * interrupts blocked) on the runqueue lock we're holding.
+ */
finish_arch_switch(rq, prev);
+
+ if (prev_mm)
+ mmdrop(prev_mm);
} else
spin_unlock_irq(&rq->lock);


_
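
The first fix follows a general pattern: while the lock is held, only remember what needs to be dropped, and do the actual drop after the lock is released. Below is a minimal userspace C sketch of that pattern. It is not the kernel code: `struct mm`, `struct runqueue`, `switch_away()`, `finish_switch()` and the `locked` flag are stand-ins invented for illustration, and the sketch assumes, as the patch comment implies, that finish_arch_switch() is the point where the runqueue lock is released.

```c
#include <assert.h>
#include <stddef.h>

struct mm {
    int users;                  /* stand-in for the mm reference count */
};

struct runqueue {
    int locked;                 /* stand-in for rq->lock held with IRQs off */
    struct mm *prev_mm;         /* mm whose drop was deferred */
};

int drops_under_lock;           /* counts drops done while "locked" (a bug) */

/* Stand-in for mmdrop(): must never run with the runqueue "locked". */
void mm_drop(struct runqueue *rq, struct mm *mm)
{
    if (rq->locked)
        drops_under_lock++;
    mm->users--;
}

/* Runs with the runqueue "locked": only record the mm, do not drop it. */
void switch_away(struct runqueue *rq, struct mm *oldmm)
{
    assert(rq->locked);
    assert(rq->prev_mm == NULL);    /* mirrors WARN_ON(rq->prev_mm) */
    rq->prev_mm = oldmm;
}

/* Tail of the switch: take the deferred mm, unlock, then drop it. */
void finish_switch(struct runqueue *rq)
{
    struct mm *mm = rq->prev_mm;

    rq->prev_mm = NULL;
    rq->locked = 0;                 /* lock released before the drop */
    if (mm)
        mm_drop(rq, mm);
}
```

Every path that finishes a switch has to consume the deferred pointer; a path that skips it would leave the mm referenced forever.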

2003-02-20 20:46:07

by Linus Torvalds

Subject: Re: doublefault debugging (was Re: Linux v2.5.62 --- spontaneous reboots)


On Thu, 20 Feb 2003, William Lee Irwin III wrote:
>
> You might want to grab aeb's fully non-recursive pathwalking if
> you really want to cut back the stack to 4KB, as well as fixing
> whatever stackblasting drivers are about.

The path walking should really not be an issue. Each level of a symlink
takes something like 64 bytes of stack on x86 (I checked it some time ago,
maybe it's changed a bit), since the actual recursive part is very shallow
indeed.

And since we don't recurse deeper than 5 levels anyway, the symlink
recursion ends up not being a real problem compared to a lot of other
code (never mind the single functions with hundreds of bytes of stack
space: just regular function calls 5 levels deep is quite normal).

That fs recursion was not the problem even back in the days when the max
stack depth was <3kB (4kB allocation, 1kB task_struct). It used to be 8
levels deep or something, it was changed to 5 not because we ran out on
x86, but because of those stupid sparc register windows (causing much
bigger minimum function stack requirements than on x86).

Linus

2003-02-20 21:55:09

by Ingo Molnar

Subject: Re: doublefault debugging (was Re: Linux v2.5.62 --- spontaneous reboots)


On Thu, 20 Feb 2003, Andrew Morton wrote:

> Fixes two deadlocks in the scheduler exit path:
>
> 1: We're calling mmdrop() under spin_lock_irq(&rq->lock). But mmdrop
> calls vfree(), which calls smp_call_function().

this has been fixed in the -F3 scheduler patch.

Ingo

2003-02-20 21:51:16

by Ingo Molnar

Subject: Re: doublefault debugging (was Re: Linux v2.5.62 --- spontaneous reboots)


On Thu, 20 Feb 2003, Linus Torvalds wrote:

> > ie. something like:
>
> Well, please remove the double test for task inequality.

ok.

> I like the patch conceptually, HOWEVER, I'm not sure it's correct. The
> thing is, moving the wait_task_inactive() to __put_task_struct() means
> that we will be doing the "release_task()" teardown while the task is
> still potentially active on another CPU.
>
> In particular, we'll be freeing the security stuff and the signals while
> the process may still be active in the scheduler on another CPU. This
> can be dangerous, ie doing things like calling "free_uid()" on a process
> that is still running means that suddenly you have issues like not being
> able to trust "current->user" from interrupts. We may not care right
> now, but it's still wrong (imagine us doing per-user time accounting -
> which makes a _lot_ of sense).

well, we can do the wait_task_inactive() in both cases - in
release_task(), and in __put_task_struct(). [in the release_task() path
that will just be a nop]. This further simplifies the patch.

Ingo

--- kernel/fork.c.orig
+++ kernel/fork.c
@@ -75,6 +75,7 @@
void __put_task_struct(struct task_struct *tsk)
{
if (tsk != current) {
+ wait_task_inactive(tsk);
free_thread_info(tsk->thread_info);
kmem_cache_free(task_struct_cachep,tsk);
} else {

2003-02-20 22:26:27

by Linus Torvalds

Subject: Re: doublefault debugging (was Re: Linux v2.5.62 --- spontaneous reboots)


On Thu, 20 Feb 2003, Ingo Molnar wrote:
>
> well, we can do the wait_task_inactive() in both cases - in
> release_task(), and in __put_task_struct(). [in the release_task() path
> that will just be a nop]. This further simplifies the patch.

I think the _real_ simplification is to just have the task switch do this
in the tail:

if (prev->state & TASK_DEAD)
put_task_struct(prev);

suddenly we don't have any issues at all with possibly freeing stuff
before its time, since we're guaranteed to keep the process around until
we've properly scheduled out of it.

Suggested patch (against current BK, which has the finish_task_switch()
cleanups I mentioned earlier) appended. No special cases, no subtlety with
__put_task_struct() caches, no nothing.

Linus

-----
===== kernel/exit.c 1.97 vs edited =====
--- 1.97/kernel/exit.c Thu Feb 20 03:10:35 2003
+++ edited/kernel/exit.c Thu Feb 20 14:28:39 2003
@@ -103,7 +103,6 @@
dput(proc_dentry);
}
release_thread(p);
- put_task_struct(p);
}

/* we are using it only for SMP init */
===== kernel/sched.c 1.160 vs edited =====
--- 1.160/kernel/sched.c Thu Feb 20 05:42:54 2003
+++ edited/kernel/sched.c Thu Feb 20 14:27:23 2003
@@ -581,6 +581,8 @@
finish_arch_switch(rq, prev);
if (mm)
mmdrop(mm);
+ if (prev->state & TASK_DEAD)
+ put_task_struct(prev);
}

/**

2003-02-20 22:33:10

by William Lee Irwin III

Subject: Re: doublefault debugging (was Re: Linux v2.5.62 --- spontaneous reboots)

On Thu, 20 Feb 2003, Andrew Morton wrote:
>> Fixes two deadlocks in the scheduler exit path:
>> 1: We're calling mmdrop() under spin_lock_irq(&rq->lock). But mmdrop
>> calls vfree(), which calls smp_call_function().

On Thu, Feb 20, 2003 at 11:04:41PM +0100, Ingo Molnar wrote:
> this has been fixed in the -F3 scheduler patch.

Not quite. It leaks mm's because schedule_tail() isn't cleaning
up rq->prev_mm.


-- wli

2003-02-20 22:36:25

by Linus Torvalds

Subject: Re: doublefault debugging (was Re: Linux v2.5.62 --- spontaneous reboots)


On Thu, 20 Feb 2003, Linus Torvalds wrote:
>
> Suggested patch (against current BK, which has the finish_task_switch()
> cleanups I mentioned earlier) appended. No special cases, no subtlety with
> __put_task_struct() caches, no nothing.

Yeah, don't bother to tell me it doesn't work. We need the task pointer to
include information on _both_ "I'm still using it" (the task itself) _and_
the "I'm waiting for it" case. So it's not just a matter of moving the
put_task() thing around, it needs to get the accounting right..

Linus

2003-02-20 22:39:39

by Linus Torvalds

Subject: Re: doublefault debugging (was Re: Linux v2.5.62 --- spontaneous reboots)


On Thu, 20 Feb 2003, Linus Torvalds wrote:
>
> Yeah, don't bother to tell me it doesn't work. We need the task pointer to
> include information on _both_ "I'm still using it" (the task itself) _and_
> the "I'm waiting for it" case. So it's not just a matter of moving the
> put_task() thing around, it needs to get the accounting right..

And the way to get the accounting right (I think) is actually truly
trivial: we should initialize the task count to _two_ at process creation
time, since we have two users (the parent who will do the wait, and our
own usage).

This should mean that we'd actually have the process count right, and
wouldn't need the games we play right now. Ie the patch should be
something like the appended (which again is totally untested, it might
easily have serious problems, that's not really the point. The point is
that reference counting is the only sane memory management policy, and we
did it wrong).

Linus

---
===== kernel/fork.c 1.106 vs edited =====
--- 1.106/kernel/fork.c Tue Feb 18 13:54:44 2003
+++ edited/kernel/fork.c Thu Feb 20 14:42:25 2003
@@ -217,7 +217,9 @@
*tsk = *orig;
tsk->thread_info = ti;
ti->task = tsk;
- atomic_set(&tsk->usage,1);
+
+ /* One for us, one for whoever does the "release_task()" (usually parent) */
+ atomic_set(&tsk->usage,2);
return tsk;
}

===== kernel/sched.c 1.160 vs edited =====
--- 1.160/kernel/sched.c Thu Feb 20 05:42:54 2003
+++ edited/kernel/sched.c Thu Feb 20 14:27:23 2003
@@ -581,6 +581,8 @@
finish_arch_switch(rq, prev);
if (mm)
mmdrop(mm);
+ if (prev->state & TASK_DEAD)
+ put_task_struct(prev);
}

/**
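
The accounting idea can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions: `struct task`, the `dead` and `freed` flags and all function names are invented for illustration, C11 atomics stand in for the kernel's atomic_t, and "freeing" just sets a flag. It starts the count at two and shows that the task goes away only after both the releaser and the final switch away from it have dropped their reference, in either order.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct task {
    atomic_int usage;   /* reference count */
    bool dead;          /* stand-in for state & TASK_DEAD */
    bool freed;         /* set instead of really freeing the task */
};

/* Creation: one reference for the task itself, one for the releaser. */
void task_init(struct task *t)
{
    atomic_init(&t->usage, 2);
    t->dead = false;
    t->freed = false;
}

/* Drop one reference; the last one out "frees" the task. */
void put_task(struct task *t)
{
    if (atomic_fetch_sub(&t->usage, 1) == 1)
        t->freed = true;
}

/* Stand-in for the releaser (usually the parent after wait). */
void release_task_sketch(struct task *t)
{
    put_task(t);
}

/* Stand-in for the scheduler tail: drop the task's own reference. */
void finish_switch_from(struct task *prev)
{
    if (prev->dead)
        put_task(prev);
}
```

Whichever of the two callers runs last performs the free, so neither side needs to wait for the other.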

2003-02-20 22:58:58

by Chris Wedgwood

Subject: Re: doublefault debugging (was Re: Linux v2.5.62 --- spontaneous reboots)

On Thu, Feb 20, 2003 at 08:11:31AM -0800, Martin J. Bligh wrote:

> There are patches in -mjb from Dave Hansen / Ben LaHaise to detect
> stack overflow included with the stuff for the 4K stacks patch
> (intended for scaling to large numbers of tasks). I've split them
> out attached, should apply to mainline reasonably easily.

I tried with these patches and also wli's sched deadlock fix to see if
that helps.

Sadly not, I can still easily reproduce a reboot.


--cw

2003-02-20 22:47:51

by John Levon

Subject: Re: doublefault debugging (was Re: Linux v2.5.62 --- spontaneous reboots)

On Thu, Feb 20, 2003 at 02:32:02PM -0800, Linus Torvalds wrote:

> I think the _real_ simplification is to just have the task switch do this
> in the tail:
>
> if (prev->state & TASK_DEAD)
> put_task_struct(prev);
>
> suddenly we don't have any issues at all with possibly freeing stuff
> before its time, since we're guaranteed to keep the process around until
> we've properly scheduled out of it.

Side note ... if there's a sleepable context in which oprofile can
synchronise its buffers (i.e. after the task can no longer run on a CPU,
and before the task_struct itself is freed/reused), that would be
very handy.

Currently we're masking out any samples when PF_EXITING is set for
current(), which is obviously less than ideal.

Would this be such a spot ? Basically somewhere that profile_exit_task
can sit.

regards
john

2003-02-20 23:12:14

by Ingo Molnar

Subject: Re: doublefault debugging (was Re: Linux v2.5.62 --- spontaneous reboots)


On Thu, 20 Feb 2003, Linus Torvalds wrote:

> > well, we can do the wait_task_inactive() in both cases - in
> > release_task(), and in __put_task_struct(). [in the release_task() path
> > that will just be a nop]. This further simplifies the patch.
>
> I think the _real_ simplification is to just have the task switch do
> this in the tail:

if possible i'd avoid putting more overhead into the scheduler - it's
clearly more performance-sensitive than the task create/exit path.

Ingo

2003-02-20 23:30:42

by Linus Torvalds

Subject: Re: doublefault debugging (was Re: Linux v2.5.62 --- spontaneous reboots)


On Fri, 21 Feb 2003, Ingo Molnar wrote:
>
> if possible i'd avoid putting more overhead into the scheduler - it's
> clearly more performance-sensitive than the task create/exit path.

This is a single non-serializing bit test, and if it means that the task
counters are _right_, that's definitely the right thing to do.

Linus

2003-02-21 06:50:21

by Ingo Molnar

Subject: Re: doublefault debugging (was Re: Linux v2.5.62 --- spontaneous reboots)


On Thu, 20 Feb 2003, Linus Torvalds wrote:

> > if possible i'd avoid putting more overhead into the scheduler - it's
> > clearly more performance-sensitive than the task create/exit path.
>
> This is a single non-serializing bit test, and if it means that the task
> counters are _right_, that's definitely the right thing to do.

ok. Plus the wait_task_inactive() stuff was always a bit volatile. Now we
could in fact remove it from release_task(), right?

Ingo

2003-02-21 06:55:07

by Ingo Molnar

Subject: Re: doublefault debugging (was Re: Linux v2.5.62 --- spontaneous reboots)


On Thu, 20 Feb 2003, William Lee Irwin III wrote:

> >> 1: We're calling mmdrop() under spin_lock_irq(&rq->lock). But mmdrop
> >> calls vfree(), which calls smp_call_function().
>
> On Thu, Feb 20, 2003 at 11:04:41PM +0100, Ingo Molnar wrote:
> > this has been fixed in the -F3 scheduler patch.
>
> Not quite. It leaks mm's because schedule_tail() isn't cleaning
> up rq->prev_mm.

hm, this i think was a forward-porting oversight. Anyway, now the separate
patch is in, and it's better that way, the fix was unrelated to the main
things -F3 does.

Ingo

2003-02-21 07:32:52

by Muli Ben-Yehuda

Subject: [PATCH] snd_pcm_oss_change_params is a stack offender

On Thu, Feb 20, 2003 at 08:54:55AM -0800, Linus Torvalds wrote:

> Ok, the 4kB stack definitely won't work in real life, but that's because
> we have some hopelessly bad stack users in the kernel. But the debugging
> part would be good to try (in fact, it might be a good idea to keep the
> 8kB stack, but with rather anal debugging. Just the "mcount" part should
> do that).
>
> A sorted list of bad stack users (more than 256 bytes) in my default build
> follows. Anybody can create their own with something like
>
> objdump -d linux/vmlinux |
> grep 'sub.*$0x...,.*esp' |
> awk '{ print $9,$1 }' |
> sort > bigstack
>
> and a script to look up the addresses.
>
[snipped]

> 0xc02ae062 <ide_unregister+8>: sub $0x8c4,%esp
> 0xc010535d <huft_build+9>: sub $0x5b0,%esp
> 0xc0326a53 <snd_pcm_oss_change_params+6>: sub $0x590,%esp

Here's a quick patch to fix the third worst offender,
snd_pcm_oss_change_params. Compiles fine but not tested yet.
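
The general shape of the change, moving large structs from the stack to the heap and unwinding with goto on failure, looks roughly like this in userspace C. This is a sketch only: the struct sizes are made up, calloc()/free() stand in for kmalloc()+memset()/kfree(), and `alloc_params()` and `release_params()` are hypothetical names rather than the functions in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Placeholder types; the real ALSA structs are larger and differ. */
struct hw_params { char data[512]; };
struct sw_params { char data[128]; };

/*
 * Allocate all three zeroed structs or none of them.
 * Returns 0 on success, -ENOMEM on failure with nothing left allocated.
 */
int alloc_params(struct hw_params **params, struct hw_params **sparams,
                 struct sw_params **sw_params)
{
    *params = calloc(1, sizeof(**params));
    if (!*params)
        goto out;

    *sparams = calloc(1, sizeof(**sparams));
    if (!*sparams)
        goto free_params;

    *sw_params = calloc(1, sizeof(**sw_params));
    if (!*sw_params)
        goto free_sparams;

    return 0;

free_sparams:               /* undo in reverse order of allocation */
    free(*sparams);
    *sparams = NULL;
free_params:
    free(*params);
    *params = NULL;
out:
    return -ENOMEM;
}

/* Every exit path of the caller must release what was allocated. */
void release_params(struct hw_params *params, struct hw_params *sparams,
                    struct sw_params *sw_params)
{
    free(sw_params);
    free(sparams);
    free(params);
}
```

The cost of this approach is that every return path of the caller now has to free the three structs, which a stack allocation gave for free.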

# sound/core/oss/pcm_oss.c 1.20 -> 1.21
#
# The following is the BitKeeper ChangeSet Log
# --------------------------------------------
# 03/02/21 [email protected] 1.1007
# snd_pcm_oss_change_params was a stack offender, having three large
# structs on the stack. Allocate those structs on the heap and change
# the code accordingly.
# --------------------------------------------
#
diff -Nru a/sound/core/oss/pcm_oss.c b/sound/core/oss/pcm_oss.c
--- a/sound/core/oss/pcm_oss.c Fri Feb 21 09:35:24 2003
+++ b/sound/core/oss/pcm_oss.c Fri Feb 21 09:35:24 2003
@@ -291,11 +291,51 @@
return snd_pcm_hw_param_near(substream, params, SNDRV_PCM_HW_PARAM_RATE, best_rate, 0);
}

+static int alloc_param_structs(snd_pcm_hw_params_t** params,
+ snd_pcm_hw_params_t** sparams,
+ snd_pcm_sw_params_t** sw_params)
+{
+ snd_pcm_hw_params_t* hwp;
+ snd_pcm_sw_params_t* swp;
+
+ if (!(hwp = kmalloc(sizeof(*hwp), GFP_KERNEL)))
+ goto out;
+
+ memset(hwp, 0, sizeof(*hwp));
+ *params = hwp;
+
+ if (!(hwp = kmalloc(sizeof(*hwp), GFP_KERNEL)))
+ goto free_params;
+
+ memset(hwp, 0, sizeof(*hwp));
+ *sparams = hwp;
+
+ if (!(swp = kmalloc(sizeof(*swp), GFP_KERNEL)))
+ goto free_sparams;
+
+ memset(swp, 0, sizeof(*swp));
+ *sw_params = swp;
+
+ return 0;
+
+ free_sparams:
+ kfree(*sparams);
+ *sparams = NULL;
+
+ free_params:
+ kfree(*params);
+ *params = NULL;
+
+ out:
+ return -ENOMEM;
+}
+
+
static int snd_pcm_oss_change_params(snd_pcm_substream_t *substream)
{
snd_pcm_runtime_t *runtime = substream->runtime;
- snd_pcm_hw_params_t params, sparams;
- snd_pcm_sw_params_t sw_params;
+ snd_pcm_hw_params_t *params, *sparams;
+ snd_pcm_sw_params_t *sw_params;
ssize_t oss_buffer_size, oss_period_size;
size_t oss_frame_size;
int err;
@@ -311,9 +351,14 @@
direct = (setup != NULL && setup->direct);
}

- _snd_pcm_hw_params_any(&sparams);
- _snd_pcm_hw_param_setinteger(&sparams, SNDRV_PCM_HW_PARAM_PERIODS);
- _snd_pcm_hw_param_min(&sparams, SNDRV_PCM_HW_PARAM_PERIODS, 2, 0);
+ if ((err = alloc_param_structs(&params, &sparams, &sw_params))) {
+ snd_printd("out of memory\n");
+ return err;
+ }
+
+ _snd_pcm_hw_params_any(sparams);
+ _snd_pcm_hw_param_setinteger(sparams, SNDRV_PCM_HW_PARAM_PERIODS);
+ _snd_pcm_hw_param_min(sparams, SNDRV_PCM_HW_PARAM_PERIODS, 2, 0);
snd_mask_none(&mask);
if (atomic_read(&runtime->mmap_count))
snd_mask_set(&mask, SNDRV_PCM_ACCESS_MMAP_INTERLEAVED);
@@ -322,17 +367,17 @@
if (!direct)
snd_mask_set(&mask, SNDRV_PCM_ACCESS_RW_NONINTERLEAVED);
}
- err = snd_pcm_hw_param_mask(substream, &sparams, SNDRV_PCM_HW_PARAM_ACCESS, &mask);
+ err = snd_pcm_hw_param_mask(substream, sparams, SNDRV_PCM_HW_PARAM_ACCESS, &mask);
if (err < 0) {
snd_printd("No usable accesses\n");
return -EINVAL;
}
- choose_rate(substream, &sparams, runtime->oss.rate);
- snd_pcm_hw_param_near(substream, &sparams, SNDRV_PCM_HW_PARAM_CHANNELS, runtime->oss.channels, 0);
+ choose_rate(substream, sparams, runtime->oss.rate);
+ snd_pcm_hw_param_near(substream, sparams, SNDRV_PCM_HW_PARAM_CHANNELS, runtime->oss.channels, 0);

format = snd_pcm_oss_format_from(runtime->oss.format);

- sformat_mask = *hw_param_mask(&sparams, SNDRV_PCM_HW_PARAM_FORMAT);
+ sformat_mask = *hw_param_mask(sparams, SNDRV_PCM_HW_PARAM_FORMAT);
if (direct)
sformat = format;
else
@@ -349,46 +394,46 @@
return -EINVAL;
}
}
- err = _snd_pcm_hw_param_set(&sparams, SNDRV_PCM_HW_PARAM_FORMAT, sformat, 0);
+ err = _snd_pcm_hw_param_set(sparams, SNDRV_PCM_HW_PARAM_FORMAT, sformat, 0);
snd_assert(err >= 0, return err);

if (direct) {
- params = sparams;
+ memcpy(params, sparams, sizeof(*params));
} else {
- _snd_pcm_hw_params_any(&params);
- _snd_pcm_hw_param_set(&params, SNDRV_PCM_HW_PARAM_ACCESS,
+ _snd_pcm_hw_params_any(params);
+ _snd_pcm_hw_param_set(params, SNDRV_PCM_HW_PARAM_ACCESS,
SNDRV_PCM_ACCESS_RW_INTERLEAVED, 0);
- _snd_pcm_hw_param_set(&params, SNDRV_PCM_HW_PARAM_FORMAT,
+ _snd_pcm_hw_param_set(params, SNDRV_PCM_HW_PARAM_FORMAT,
snd_pcm_oss_format_from(runtime->oss.format), 0);
- _snd_pcm_hw_param_set(&params, SNDRV_PCM_HW_PARAM_CHANNELS,
+ _snd_pcm_hw_param_set(params, SNDRV_PCM_HW_PARAM_CHANNELS,
runtime->oss.channels, 0);
- _snd_pcm_hw_param_set(&params, SNDRV_PCM_HW_PARAM_RATE,
+ _snd_pcm_hw_param_set(params, SNDRV_PCM_HW_PARAM_RATE,
runtime->oss.rate, 0);
pdprintf("client: access = %i, format = %i, channels = %i, rate = %i\n",
- params_access(&params), params_format(&params),
- params_channels(&params), params_rate(&params));
+ params_access(params), params_format(params),
+ params_channels(params), params_rate(params));
}
pdprintf("slave: access = %i, format = %i, channels = %i, rate = %i\n",
- params_access(&sparams), params_format(&sparams),
- params_channels(&sparams), params_rate(&sparams));
+ params_access(sparams), params_format(sparams),
+ params_channels(sparams), params_rate(sparams));

- oss_frame_size = snd_pcm_format_physical_width(params_format(&params)) *
- params_channels(&params) / 8;
+ oss_frame_size = snd_pcm_format_physical_width(params_format(params)) *
+ params_channels(params) / 8;

snd_pcm_oss_plugin_clear(substream);
if (!direct) {
/* add necessary plugins */
snd_pcm_oss_plugin_clear(substream);
if ((err = snd_pcm_plug_format_plugins(substream,
- &params,
- &sparams)) < 0) {
+ params,
+ sparams)) < 0) {
snd_printd("snd_pcm_plug_format_plugins failed: %i\n", err);
snd_pcm_oss_plugin_clear(substream);
return err;
}
if (runtime->oss.plugin_first) {
snd_pcm_plugin_t *plugin;
- if ((err = snd_pcm_plugin_build_io(substream, &sparams, &plugin)) < 0) {
+ if ((err = snd_pcm_plugin_build_io(substream, sparams, &plugin)) < 0) {
snd_printd("snd_pcm_plugin_build_io failed: %i\n", err);
snd_pcm_oss_plugin_clear(substream);
return err;
@@ -405,51 +450,50 @@
}
}

- err = snd_pcm_oss_period_size(substream, &params, &sparams);
+ err = snd_pcm_oss_period_size(substream, params, sparams);
if (err < 0)
return err;

n = snd_pcm_plug_slave_size(substream, runtime->oss.period_bytes / oss_frame_size);
- err = snd_pcm_hw_param_near(substream, &sparams, SNDRV_PCM_HW_PARAM_PERIOD_SIZE, n, 0);
+ err = snd_pcm_hw_param_near(substream, sparams, SNDRV_PCM_HW_PARAM_PERIOD_SIZE, n, 0);
snd_assert(err >= 0, return err);

- err = snd_pcm_hw_param_near(substream, &sparams, SNDRV_PCM_HW_PARAM_PERIODS,
+ err = snd_pcm_hw_param_near(substream, sparams, SNDRV_PCM_HW_PARAM_PERIODS,
runtime->oss.periods, 0);
snd_assert(err >= 0, return err);

snd_pcm_kernel_ioctl(substream, SNDRV_PCM_IOCTL_DROP, 0);

- if ((err = snd_pcm_kernel_ioctl(substream, SNDRV_PCM_IOCTL_HW_PARAMS, &sparams)) < 0) {
+ if ((err = snd_pcm_kernel_ioctl(substream, SNDRV_PCM_IOCTL_HW_PARAMS, sparams)) < 0) {
snd_printd("HW_PARAMS failed: %i\n", err);
return err;
}

- memset(&sw_params, 0, sizeof(sw_params));
if (runtime->oss.trigger) {
- sw_params.start_threshold = 1;
+ sw_params->start_threshold = 1;
} else {
- sw_params.start_threshold = runtime->boundary;
+ sw_params->start_threshold = runtime->boundary;
}
if (atomic_read(&runtime->mmap_count))
- sw_params.stop_threshold = runtime->boundary;
+ sw_params->stop_threshold = runtime->boundary;
else
- sw_params.stop_threshold = runtime->buffer_size;
- sw_params.tstamp_mode = SNDRV_PCM_TSTAMP_NONE;
- sw_params.period_step = 1;
- sw_params.sleep_min = 0;
- sw_params.avail_min = runtime->period_size;
- sw_params.xfer_align = 1;
- sw_params.silence_threshold = 0;
- sw_params.silence_size = 0;
+ sw_params->stop_threshold = runtime->buffer_size;
+ sw_params->tstamp_mode = SNDRV_PCM_TSTAMP_NONE;
+ sw_params->period_step = 1;
+ sw_params->sleep_min = 0;
+ sw_params->avail_min = runtime->period_size;
+ sw_params->xfer_align = 1;
+ sw_params->silence_threshold = 0;
+ sw_params->silence_size = 0;

- if ((err = snd_pcm_kernel_ioctl(substream, SNDRV_PCM_IOCTL_SW_PARAMS, &sw_params)) < 0) {
+ if ((err = snd_pcm_kernel_ioctl(substream, SNDRV_PCM_IOCTL_SW_PARAMS, sw_params)) < 0) {
snd_printd("SW_PARAMS failed: %i\n", err);
return err;
}
runtime->control->avail_min = runtime->period_size;

- runtime->oss.periods = params_periods(&sparams);
- oss_period_size = snd_pcm_plug_client_size(substream, params_period_size(&sparams));
+ runtime->oss.periods = params_periods(sparams);
+ oss_period_size = snd_pcm_plug_client_size(substream, params_period_size(sparams));
snd_assert(oss_period_size >= 0, return -EINVAL);
if (runtime->oss.plugin_first) {
err = snd_pcm_plug_alloc(substream, oss_period_size);
@@ -468,12 +512,12 @@
runtime->oss.period_bytes,
runtime->oss.buffer_bytes);
pdprintf("slave: period_size = %i, buffer_size = %i\n",
- params_period_size(&sparams),
- params_buffer_size(&sparams));
+ params_period_size(sparams),
+ params_buffer_size(sparams));

- runtime->oss.format = snd_pcm_oss_format_to(params_format(&params));
- runtime->oss.channels = params_channels(&params);
- runtime->oss.rate = params_rate(&params);
+ runtime->oss.format = snd_pcm_oss_format_to(params_format(params));
+ runtime->oss.channels = params_channels(params);
+ runtime->oss.rate = params_rate(params);

runtime->oss.params = 0;
runtime->oss.prepare = 1;


--
Muli Ben-Yehuda
http://www.mulix.org

2003-02-21 07:49:01

by Andreas Dilger

Subject: Re: [PATCH] snd_pcm_oss_change_params is a stack offender

On Feb 21, 2003 09:39 +0200, Muli Ben-Yehuda wrote:
> +static int alloc_param_structs(snd_pcm_hw_params_t** params,
> + snd_pcm_hw_params_t** sparams,
> + snd_pcm_sw_params_t** sw_params)

So, it looks like you've changed a large stack user into a leaker of
memory. Nowhere is the allocated memory freed, AFAICS, not upon
successful completion, nor at any of the error exits.

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/

2003-02-21 08:13:57

by Muli Ben-Yehuda

Subject: Re: [PATCH] snd_pcm_oss_change_params is a stack offender

[again, with a real subject line this time. This just isn't my day].

On Fri, Feb 21, 2003 at 12:58:52AM -0700, Andreas Dilger wrote:
> On Feb 21, 2003 09:39 +0200, Muli Ben-Yehuda wrote:
> > +static int alloc_param_structs(snd_pcm_hw_params_t** params,
> > + snd_pcm_hw_params_t** sparams,
> > + snd_pcm_sw_params_t** sw_params)
>
> So, it looks like you've changed a large stack user into a leaker of
> memory. Nowhere is the allocated memory freed, AFAICS, not upon
> successful completion, nor at any of the error exits.

Thanks for spotting. I can only claim not having woken up yet.

Here's a fixed patch, which frees the allocations properly. I didn't
want to make more than the minimal changes necessary, but if it's ok
with the maintainer, it should be switched to the common "goto style",
and something should be done about those snd_asserts. Jaroslav, ok to
rewrite?
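
For what the "goto style" mentioned above usually looks like, here is a minimal userspace sketch, assuming `calloc`/`free` in place of `kmalloc`/`kfree`. The struct sizes, `do_step` and `change_params_sketch` are hypothetical stand-ins, not the ALSA code: every failure jumps to one label that frees everything, relying on `free(NULL)` being a no-op.

```c
#include <errno.h>
#include <stdlib.h>

/* Stand-in types with made-up sizes; not the real ALSA structures. */
struct hw_params { char data[600]; };
struct sw_params { char data[200]; };

/* Hypothetical step that can fail; returns 0 or a negative errno. */
static int do_step(int step, int fail_at)
{
	return step == fail_at ? -EINVAL : 0;
}

/* Allocate three structs, run two steps, and leave through one exit. */
int change_params_sketch(int fail_at)
{
	struct hw_params *params = NULL, *sparams = NULL;
	struct sw_params *sw_params = NULL;
	int err = -ENOMEM;

	params = calloc(1, sizeof(*params));
	sparams = calloc(1, sizeof(*sparams));
	sw_params = calloc(1, sizeof(*sw_params));
	if (!params || !sparams || !sw_params)
		goto out;

	err = do_step(1, fail_at);
	if (err < 0)
		goto out;

	err = do_step(2, fail_at);
	if (err < 0)
		goto out;

	err = 0;
out:
	/* Single cleanup path: runs on success and on every failure. */
	free(sw_params);
	free(sparams);
	free(params);
	return err;
}
```

With one exit label, a new early return cannot forget the frees, which is the failure mode Andreas pointed out.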

# sound/core/oss/pcm_oss.c 1.20 -> 1.22
#
# The following is the BitKeeper ChangeSet Log
# --------------------------------------------
# 03/02/21 [email protected] 1.1007
# snd_pcm_oss_change_params was a stack offender, having three large
# structs on the stack. Allocate those structs on the heap and change
# the code accordingly.
# --------------------------------------------
# 03/02/21 [email protected] 1.1008
# This time, also free the memory :-((
# Thanks to Andreas Dilger for spotting.
# --------------------------------------------
#
diff -Nru a/sound/core/oss/pcm_oss.c b/sound/core/oss/pcm_oss.c
--- a/sound/core/oss/pcm_oss.c Fri Feb 21 10:15:10 2003
+++ b/sound/core/oss/pcm_oss.c Fri Feb 21 10:15:10 2003
@@ -291,11 +291,58 @@
return snd_pcm_hw_param_near(substream, params, SNDRV_PCM_HW_PARAM_RATE, best_rate, 0);
}

+static int alloc_param_structs(snd_pcm_hw_params_t** params,
+ snd_pcm_hw_params_t** sparams,
+ snd_pcm_sw_params_t** sw_params)
+{
+ snd_pcm_hw_params_t* hwp;
+ snd_pcm_sw_params_t* swp;
+
+ if (!(hwp = kmalloc(sizeof(*hwp), GFP_KERNEL)))
+ goto out;
+
+ memset(hwp, 0, sizeof(*hwp));
+ *params = hwp;
+
+ if (!(hwp = kmalloc(sizeof(*hwp), GFP_KERNEL)))
+ goto free_params;
+
+ memset(hwp, 0, sizeof(*hwp));
+ *sparams = hwp;
+
+ if (!(swp = kmalloc(sizeof(*swp), GFP_KERNEL)))
+ goto free_sparams;
+
+ memset(swp, 0, sizeof(*swp));
+ *sw_params = swp;
+
+ return 0;
+
+ free_sparams:
+ kfree(*sparams);
+ *sparams = NULL;
+
+ free_params:
+ kfree(*params);
+ *params = NULL;
+
+ out:
+ return -ENOMEM;
+}
+
+static void free_param_structs(snd_pcm_hw_params_t* params, snd_pcm_hw_params_t* sparams,
+ snd_pcm_sw_params_t* sw_params)
+{
+ kfree(params);
+ kfree(sparams);
+ kfree(sw_params);
+}
+
static int snd_pcm_oss_change_params(snd_pcm_substream_t *substream)
{
snd_pcm_runtime_t *runtime = substream->runtime;
- snd_pcm_hw_params_t params, sparams;
- snd_pcm_sw_params_t sw_params;
+ snd_pcm_hw_params_t *params, *sparams;
+ snd_pcm_sw_params_t *sw_params;
ssize_t oss_buffer_size, oss_period_size;
size_t oss_frame_size;
int err;
@@ -311,9 +358,14 @@
direct = (setup != NULL && setup->direct);
}

- _snd_pcm_hw_params_any(&sparams);
- _snd_pcm_hw_param_setinteger(&sparams, SNDRV_PCM_HW_PARAM_PERIODS);
- _snd_pcm_hw_param_min(&sparams, SNDRV_PCM_HW_PARAM_PERIODS, 2, 0);
+ if ((err = alloc_param_structs(&params, &sparams, &sw_params))) {
+ snd_printd("out of memory\n");
+ return err;
+ }
+
+ _snd_pcm_hw_params_any(sparams);
+ _snd_pcm_hw_param_setinteger(sparams, SNDRV_PCM_HW_PARAM_PERIODS);
+ _snd_pcm_hw_param_min(sparams, SNDRV_PCM_HW_PARAM_PERIODS, 2, 0);
snd_mask_none(&mask);
if (atomic_read(&runtime->mmap_count))
snd_mask_set(&mask, SNDRV_PCM_ACCESS_MMAP_INTERLEAVED);
@@ -322,17 +374,18 @@
if (!direct)
snd_mask_set(&mask, SNDRV_PCM_ACCESS_RW_NONINTERLEAVED);
}
- err = snd_pcm_hw_param_mask(substream, &sparams, SNDRV_PCM_HW_PARAM_ACCESS, &mask);
+ err = snd_pcm_hw_param_mask(substream, sparams, SNDRV_PCM_HW_PARAM_ACCESS, &mask);
if (err < 0) {
+ free_param_structs(params, sparams, sw_params);
snd_printd("No usable accesses\n");
return -EINVAL;
}
- choose_rate(substream, &sparams, runtime->oss.rate);
- snd_pcm_hw_param_near(substream, &sparams, SNDRV_PCM_HW_PARAM_CHANNELS, runtime->oss.channels, 0);
+ choose_rate(substream, sparams, runtime->oss.rate);
+ snd_pcm_hw_param_near(substream, sparams, SNDRV_PCM_HW_PARAM_CHANNELS, runtime->oss.channels, 0);

format = snd_pcm_oss_format_from(runtime->oss.format);

- sformat_mask = *hw_param_mask(&sparams, SNDRV_PCM_HW_PARAM_FORMAT);
+ sformat_mask = *hw_param_mask(sparams, SNDRV_PCM_HW_PARAM_FORMAT);
if (direct)
sformat = format;
else
@@ -345,50 +398,53 @@
break;
}
if (sformat > SNDRV_PCM_FORMAT_LAST) {
+ free_param_structs(params, sparams, sw_params);
snd_printd("Cannot find a format!!!\n");
return -EINVAL;
}
}
- err = _snd_pcm_hw_param_set(&sparams, SNDRV_PCM_HW_PARAM_FORMAT, sformat, 0);
- snd_assert(err >= 0, return err);
+ err = _snd_pcm_hw_param_set(sparams, SNDRV_PCM_HW_PARAM_FORMAT, sformat, 0);
+ snd_assert(err >= 0, {free_param_structs(params, sparams, sw_params); return err;});

if (direct) {
- params = sparams;
+ memcpy(params, sparams, sizeof(*params));
} else {
- _snd_pcm_hw_params_any(&params);
- _snd_pcm_hw_param_set(&params, SNDRV_PCM_HW_PARAM_ACCESS,
+ _snd_pcm_hw_params_any(params);
+ _snd_pcm_hw_param_set(params, SNDRV_PCM_HW_PARAM_ACCESS,
SNDRV_PCM_ACCESS_RW_INTERLEAVED, 0);
- _snd_pcm_hw_param_set(&params, SNDRV_PCM_HW_PARAM_FORMAT,
+ _snd_pcm_hw_param_set(params, SNDRV_PCM_HW_PARAM_FORMAT,
snd_pcm_oss_format_from(runtime->oss.format), 0);
- _snd_pcm_hw_param_set(&params, SNDRV_PCM_HW_PARAM_CHANNELS,
+ _snd_pcm_hw_param_set(params, SNDRV_PCM_HW_PARAM_CHANNELS,
runtime->oss.channels, 0);
- _snd_pcm_hw_param_set(&params, SNDRV_PCM_HW_PARAM_RATE,
+ _snd_pcm_hw_param_set(params, SNDRV_PCM_HW_PARAM_RATE,
runtime->oss.rate, 0);
pdprintf("client: access = %i, format = %i, channels = %i, rate = %i\n",
- params_access(&params), params_format(&params),
- params_channels(&params), params_rate(&params));
+ params_access(params), params_format(params),
+ params_channels(params), params_rate(params));
}
pdprintf("slave: access = %i, format = %i, channels = %i, rate = %i\n",
- params_access(&sparams), params_format(&sparams),
- params_channels(&sparams), params_rate(&sparams));
+ params_access(sparams), params_format(sparams),
+ params_channels(sparams), params_rate(sparams));

- oss_frame_size = snd_pcm_format_physical_width(params_format(&params)) *
- params_channels(&params) / 8;
+ oss_frame_size = snd_pcm_format_physical_width(params_format(params)) *
+ params_channels(params) / 8;

snd_pcm_oss_plugin_clear(substream);
if (!direct) {
/* add necessary plugins */
snd_pcm_oss_plugin_clear(substream);
if ((err = snd_pcm_plug_format_plugins(substream,
- &params,
- &sparams)) < 0) {
+ params,
+ sparams)) < 0) {
+ free_param_structs(params, sparams, sw_params);
snd_printd("snd_pcm_plug_format_plugins failed: %i\n", err);
snd_pcm_oss_plugin_clear(substream);
return err;
}
if (runtime->oss.plugin_first) {
snd_pcm_plugin_t *plugin;
- if ((err = snd_pcm_plugin_build_io(substream, &sparams, &plugin)) < 0) {
+ if ((err = snd_pcm_plugin_build_io(substream, sparams, &plugin)) < 0) {
+ free_param_structs(params, sparams, sw_params);
snd_printd("snd_pcm_plugin_build_io failed: %i\n", err);
snd_pcm_oss_plugin_clear(substream);
return err;
@@ -399,67 +455,73 @@
err = snd_pcm_plugin_insert(plugin);
}
if (err < 0) {
+ free_param_structs(params, sparams, sw_params);
snd_pcm_oss_plugin_clear(substream);
return err;
}
}
}

- err = snd_pcm_oss_period_size(substream, &params, &sparams);
- if (err < 0)
+ err = snd_pcm_oss_period_size(substream, params, sparams);
+ if (err < 0) {
+ free_param_structs(params, sparams, sw_params);
return err;
+ }

n = snd_pcm_plug_slave_size(substream, runtime->oss.period_bytes / oss_frame_size);
- err = snd_pcm_hw_param_near(substream, &sparams, SNDRV_PCM_HW_PARAM_PERIOD_SIZE, n, 0);
- snd_assert(err >= 0, return err);
+ err = snd_pcm_hw_param_near(substream, sparams, SNDRV_PCM_HW_PARAM_PERIOD_SIZE, n, 0);
+ snd_assert(err >= 0, {free_param_structs(params, sparams, sw_params); return err;});

- err = snd_pcm_hw_param_near(substream, &sparams, SNDRV_PCM_HW_PARAM_PERIODS,
+ err = snd_pcm_hw_param_near(substream, sparams, SNDRV_PCM_HW_PARAM_PERIODS,
runtime->oss.periods, 0);
- snd_assert(err >= 0, return err);
+ snd_assert(err >= 0, {free_param_structs(params, sparams, sw_params); return err;});

snd_pcm_kernel_ioctl(substream, SNDRV_PCM_IOCTL_DROP, 0);

- if ((err = snd_pcm_kernel_ioctl(substream, SNDRV_PCM_IOCTL_HW_PARAMS, &sparams)) < 0) {
+ if ((err = snd_pcm_kernel_ioctl(substream, SNDRV_PCM_IOCTL_HW_PARAMS, sparams)) < 0) {
+ free_param_structs(params, sparams, sw_params);
snd_printd("HW_PARAMS failed: %i\n", err);
return err;
}

- memset(&sw_params, 0, sizeof(sw_params));
if (runtime->oss.trigger) {
- sw_params.start_threshold = 1;
+ sw_params->start_threshold = 1;
} else {
- sw_params.start_threshold = runtime->boundary;
+ sw_params->start_threshold = runtime->boundary;
}
if (atomic_read(&runtime->mmap_count))
- sw_params.stop_threshold = runtime->boundary;
+ sw_params->stop_threshold = runtime->boundary;
else
- sw_params.stop_threshold = runtime->buffer_size;
- sw_params.tstamp_mode = SNDRV_PCM_TSTAMP_NONE;
- sw_params.period_step = 1;
- sw_params.sleep_min = 0;
- sw_params.avail_min = runtime->period_size;
- sw_params.xfer_align = 1;
- sw_params.silence_threshold = 0;
- sw_params.silence_size = 0;
+ sw_params->stop_threshold = runtime->buffer_size;
+ sw_params->tstamp_mode = SNDRV_PCM_TSTAMP_NONE;
+ sw_params->period_step = 1;
+ sw_params->sleep_min = 0;
+ sw_params->avail_min = runtime->period_size;
+ sw_params->xfer_align = 1;
+ sw_params->silence_threshold = 0;
+ sw_params->silence_size = 0;

- if ((err = snd_pcm_kernel_ioctl(substream, SNDRV_PCM_IOCTL_SW_PARAMS, &sw_params)) < 0) {
+ if ((err = snd_pcm_kernel_ioctl(substream, SNDRV_PCM_IOCTL_SW_PARAMS, sw_params)) < 0) {
+ free_param_structs(params, sparams, sw_params);
snd_printd("SW_PARAMS failed: %i\n", err);
return err;
}
runtime->control->avail_min = runtime->period_size;

- runtime->oss.periods = params_periods(&sparams);
- oss_period_size = snd_pcm_plug_client_size(substream, params_period_size(&sparams));
- snd_assert(oss_period_size >= 0, return -EINVAL);
+ runtime->oss.periods = params_periods(sparams);
+ oss_period_size = snd_pcm_plug_client_size(substream, params_period_size(sparams));
+ snd_assert(oss_period_size >= 0, {free_param_structs(params, sparams, sw_params); return -EINVAL;});
if (runtime->oss.plugin_first) {
err = snd_pcm_plug_alloc(substream, oss_period_size);
- if (err < 0)
+ if (err < 0) {
+ free_param_structs(params, sparams, sw_params);
return err;
+ }
}
oss_period_size *= oss_frame_size;

oss_buffer_size = oss_period_size * runtime->oss.periods;
- snd_assert(oss_buffer_size >= 0, return -EINVAL);
+ snd_assert(oss_buffer_size >= 0, {free_param_structs(params, sparams, sw_params); return -EINVAL;});

runtime->oss.period_bytes = oss_period_size;
runtime->oss.buffer_bytes = oss_buffer_size;
@@ -468,12 +530,12 @@
runtime->oss.period_bytes,
runtime->oss.buffer_bytes);
pdprintf("slave: period_size = %i, buffer_size = %i\n",
- params_period_size(&sparams),
- params_buffer_size(&sparams));
+ params_period_size(sparams),
+ params_buffer_size(sparams));

- runtime->oss.format = snd_pcm_oss_format_to(params_format(&params));
- runtime->oss.channels = params_channels(&params);
- runtime->oss.rate = params_rate(&params);
+ runtime->oss.format = snd_pcm_oss_format_to(params_format(params));
+ runtime->oss.channels = params_channels(params);
+ runtime->oss.rate = params_rate(params);

runtime->oss.params = 0;
runtime->oss.prepare = 1;
@@ -483,6 +545,8 @@
runtime->oss.buffer_used = 0;
if (runtime->dma_area)
snd_pcm_format_set_silence(runtime->format, runtime->dma_area, bytes_to_samples(runtime, runtime->dma_bytes));
+
+ free_param_structs(params, sparams, sw_params);
return 0;
}


--
Muli Ben-Yehuda
http://www.mulix.org


2003-02-21 13:07:15

by Alexander Hoogerhuis

Subject: Re: Linux v2.5.62

Zilvinas Valinskas <[email protected]> writes:

> On Wed, Feb 19, 2003 at 05:53:43AM -0500, David Ford wrote:
> > 2.5.60+ is rather unstable for me on an Athlon CPU w/ gcc 3.2.2. If I'm
> > careful and do very little in X, it seems to stay up for a few days. If
> > I do any sort of fast graphics or sound, etc, it'll die very quickly.
> > 'tis an instant death with no OOPS, nothing at all on screen, nothing on
> > serial console.
> >
> > Just an FYI, I'm trying to narrow it down.
>
> It might triple fault? Who knows. One thing I am sure of: if I don't
> load agpgart + intel-agp, the laptop in question works flawlessly.
> Otherwise, the first time I log out of KDE to log in as a different
> user, I get an instant reboot.
>

I'm seeing the same on my Evo800c. I think it's very much
ACPI-related, as logging out of Gnome and back in worked before I got
a newer ACPI patch on 2.4. Currently on 2.4.20 with the ACPI patch from
early January.

I'm planning to test the latest ACPI patch, dated February 18th,
along with 2.4.21-pre4 now, and to tinker a bit with the DSDT to make it
useful; I'll let you know how it works out.

>
> Compaq EVO 800
> Intel P4, 1.7GHz, 256MB RAM, ATI Radeon Mobility LY (something).
>

Got the same box, only 512Mb more RAM ;)

mvh,
A

--
Alexander Hoogerhuis | [email protected]
CCNP - CCDP - MCNE - CCSE | +47 908 21 485
"You have zero privacy anyway. Get over it." --Scott McNealy

2003-02-21 14:58:35

by Linus Torvalds

Subject: Re: doublefault debugging (was Re: Linux v2.5.62 --- spontaneous reboots)


On Fri, 21 Feb 2003, Ingo Molnar wrote:
> >
> > This is a single non-serializing bit test, and if it means that the task
> > counters are _right_, that's definitely the right thing to do.
>
> ok. Plus the wait_task_inactive() stuff was always a bit volatile. Now we
> could in fact remove it from release_task(), right?

Yes, except for the same concerns I had about your patch moving it.

That part could be cleanly solved by just moving a lot of the tear-down
of the "struct task_struct" entirely into "__put_task_struct()" (which now
can never be called with "current == tsk"), ie if we do the "free_user()"
_there_, then I think we can remove the wait_task_inactive() entirely from
the wait path.
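
A rough userspace analogy of the idea above: if teardown hangs off the final reference drop, it runs exactly once and only after every holder is done with the object. All names here (`task_sketch`, `task_sketch_put`, the `teardown_count` hook) are made up for illustration; this is not the kernel's `__put_task_struct()`.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical reference-counted object standing in for a task. */
struct task_sketch {
	atomic_int usage;
	int *teardown_count;	/* lets a caller observe when teardown ran */
};

/* Create an object holding one reference; returns NULL on failure. */
struct task_sketch *task_sketch_new(int *teardown_count)
{
	struct task_sketch *t = malloc(sizeof(*t));

	if (!t)
		return NULL;
	atomic_init(&t->usage, 1);
	t->teardown_count = teardown_count;
	return t;
}

/* Take an additional reference. */
void task_sketch_get(struct task_sketch *t)
{
	atomic_fetch_add(&t->usage, 1);
}

/* Drop a reference; whoever drops the last one does the teardown. */
void task_sketch_put(struct task_sketch *t)
{
	/* atomic_fetch_sub returns the value before the decrement. */
	if (atomic_fetch_sub(&t->usage, 1) == 1) {
		(*t->teardown_count)++;	/* stand-in for the real cleanup work */
		free(t);
	}
}
```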

Linus

2003-02-22 04:25:34

by Alexander Hoogerhuis

Subject: Re: Linux v2.5.62

Alexander Hoogerhuis <[email protected]> writes:

> Zilvinas Valinskas <[email protected]> writes:
>
> > On Wed, Feb 19, 2003 at 05:53:43AM -0500, David Ford wrote:
> > > 2.5.60+ is rather unstable for me on an Athlon CPU w/ gcc 3.2.2. If I'm
> > > careful and do very little in X, it seems to stay up for a few days. If
> > > I do any sort of fast graphics or sound, etc, it'll die very quickly.
> > > 'tis an instant death with no OOPS, nothing at all on screen, nothing on
> > > serial console.
> > >
> > > Just an FYI, I'm trying to narrow it down.
> >
> > It might triple fault? Who knows. One thing I am sure of: if I don't
> > load agpgart + intel-agp, the laptop in question works flawlessly.
> > Otherwise, the first time I log out of KDE to log in as a different
> > user, I get an instant reboot.
> >
>
> I'm seeing the same on my Evo800c. I think it's very much
> ACPI-related, as logging out of Gnome and back in worked before I got
> a newer ACPI patch on 2.4. Currently on 2.4.20 with the ACPI patch from
> early January.
>
> I'm planning to test the latest ACPI patch, dated February 18th,
> along with 2.4.21-pre4 now, and to tinker a bit with the DSDT to make it
> useful; I'll let you know how it works out.
>

Built a new kernel: 2.4.21-pre4 with the ACPI patch from 0218 applied.
With the built-in DSDT it's fine, but with my own supplied DSDT the
machine instantly reboots when I hit the logout button in Gnome
2.2.

How can I find out exactly what went pear-shaped when the
machine just reboots like that?

mvh,
A
--
Alexander Hoogerhuis | [email protected]
CCNP - CCDP - MCNE - CCSE | +47 908 21 485
"You have zero privacy anyway. Get over it." --Scott McNealy

2003-02-27 18:45:20

by Randy.Dunlap

Subject: Re: doublefault debugging (was Re: Linux v2.5.62 --- spontaneous reboots)

On Thu, 20 Feb 2003 08:54:55 -0800 (PST)
Linus Torvalds <[email protected]> wrote:

| On Thu, 20 Feb 2003, Martin J. Bligh wrote:
| >
| > There are patches in -mjb from Dave Hansen / Ben LaHaise to detect stack
| > overflow included with the stuff for the 4K stacks patch (intended for
| scaling to large numbers of tasks). I've split them out and attached them; they should
| > apply to mainline reasonably easily.
|
| Ok, the 4kB stack definitely won't work in real life, but that's because
| we have some hopelessly bad stack users in the kernel. But the debugging
| part would be good to try (in fact, it might be a good idea to keep the
| 8kB stack, but with rather anal debugging. Just the "mcount" part should
| do that).
|
| A sorted list of bad stack users (more than 256 bytes) in my default build
| follows. Anybody can create their own with something like
|
| objdump -d linux/vmlinux |
| grep 'sub.*$0x...,.*esp' |
| awk '{ print $9,$1 }' |
| sort > bigstack
|
| and a script to look up the addresses.
|
| That ide_unregister() thing uses up >2kB in just one call! And there are
| several in the 1.5kB range too, with a long list of ~500 byte offenders.
|
| Yeah, and this assumes we don't have alloca() users or other dynamic
| stack allocators (non-constant-size automatic arrays). I hope we don't
| have that kind of crap anywhere..

I don't get a nice listing from this script like you did.
Example of mine is below. Do I just have a tools issue?

Thanks,
--
~Randy



$0x424,%esp c01f6bc0:
$0x490,%esp c0106010:
$0x4ac,%esp c016aec3:
$0x540,%esp c01061a6:
$0x5ac,%esp c010533e:
$0x798,%esp c02528b8:
$0x924,%esp c02484fb:

2003-02-27 19:38:54

by Muli Ben-Yehuda

Subject: Re: doublefault debugging (was Re: Linux v2.5.62 --- spontaneous reboots)

On Thu, Feb 27, 2003 at 10:50:56AM -0800, Randy.Dunlap wrote:
> On Thu, 20 Feb 2003 08:54:55 -0800 (PST)
> Linus Torvalds <[email protected]> wrote:

[snipped]

> | A sorted list of bad stack users (more than 256 bytes) in my default build
> | follows. Anybody can create their own with something like
> |
> | objdump -d linux/vmlinux |
> | grep 'sub.*$0x...,.*esp' |
> | awk '{ print $9,$1 }' |
> | sort > bigstack
> |
> | and a script to look up the addresses.

[snipped]

> I don't get a nice listing from this script like you did.
> Example of mine is below. Do I just have a tools issue?

See the part where Linus said "...and a script to look up the
addresses.". You can use 'ksymoops -v vmlinux -m System.map --no-ksyms
--no-lsmod -A 0xcodebabe' to translate address to symbol.
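
If ksymoops isn't handy, the lookup itself is simple: find the last System.map symbol at or below the address. A minimal sketch in userspace C, assuming the usual `address type name` System.map line format and a table already sorted by address; `struct sym`, `parse_map_line` and `lookup` are hypothetical names made up for this example.

```c
#include <stddef.h>
#include <stdio.h>

/* One symbol table entry; the 64-byte name limit is an arbitrary choice. */
struct sym {
	unsigned long addr;
	char name[64];
};

/* Parse one "address type name" line; returns 1 on success, 0 otherwise. */
int parse_map_line(const char *line, struct sym *out)
{
	char type;

	return sscanf(line, "%lx %c %63s", &out->addr, &type, out->name) == 3;
}

/*
 * Binary search for the last symbol whose address is <= addr.
 * tab must be sorted by ascending address. Returns NULL if addr lies
 * below the first symbol.
 */
const struct sym *lookup(const struct sym *tab, size_t n, unsigned long addr)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (tab[mid].addr <= addr)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo ? &tab[lo - 1] : NULL;
}
```

With `huft_build` at c0105354 in the table, looking up 0xc010535d returns that entry, and the difference of 9 matches the `<huft_build+9>` line in Linus's listing.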
--
Muli Ben-Yehuda
http://www.mulix.org



2003-02-27 19:42:15

by Randy.Dunlap

Subject: Re: doublefault debugging (was Re: Linux v2.5.62 --- spontaneous reboots)

On Thu, 27 Feb 2003 21:39:44 +0200
Muli Ben-Yehuda <[email protected]> wrote:

| On Thu, Feb 27, 2003 at 10:50:56AM -0800, Randy.Dunlap wrote:
| > On Thu, 20 Feb 2003 08:54:55 -0800 (PST)
| > Linus Torvalds <[email protected]> wrote:
|
| [snipped]
|
| > | A sorted list of bad stack users (more than 256 bytes) in my default build
| > | follows. Anybody can create their own with something like
| > |
| > | objdump -d linux/vmlinux |
| > | grep 'sub.*$0x...,.*esp' |
| > | awk '{ print $9,$1 }' |
| > | sort > bigstack
| > |
| > | and a script to look up the addresses.
|
| [snipped]
|
| > I don't get a nice listing from this script like you did.
| > Example of mine is below. Do I just have a tools issue?
|
| See the part where Linus said "...and a script to look up the
| addresses.". You can use 'ksymoops -v vmlinux -m System.map --no-ksyms
| --no-lsmod -A 0xcodebabe' to translate address to symbol.

Yes, sorry about skimming over that.
And yes, I'm familiar with that option of ksymoops.* :)

--
~Randy


*: since it's based on
http://www.osdl.org/archive/rddunlap/scripts/ksysmap

2003-02-27 23:27:03

by Randy.Dunlap

Subject: Re: doublefault debugging (was Re: Linux v2.5.62 --- spontaneous reboots)

| A sorted list of bad stack users (more than 256 bytes) in my default build
| follows. Anybody can create their own with something like
|
| objdump -d linux/vmlinux |
| grep 'sub.*$0x...,.*esp' |
| awk '{ print $9,$1 }' |
| sort > bigstack
|
| and a script to look up the addresses.
|
| That ide_unregister() thing uses up >2kB in just one call! And there are
| several in the 1.5kB range too, with a long list of ~500 byte offenders.
|
| Yeah, and this assumes we don't have alloca() users or other dynamic
| stack allocators (non-constant-size automatic arrays). I hope we don't
| have that kind of crap anywhere..

Keith Owens did such a script over 1 year ago. It's available from
http://kernelnewbies.org/scripts/check-stack.sh
It also identifies (flags) dynamic stack allocation.
(course, I can't read Keith's as well as I can Linus's)
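
For anyone wondering what "dynamic stack allocation" looks like in C: a variable-length array makes the stack adjustment depend on a run-time value, so there is typically no constant in the `sub ...,%esp` for the grep to match. A minimal sketch with made-up function names, next to a heap-based equivalent:

```c
#include <stdlib.h>

/* Stack use grows with n; the adjustment is typically done by register. */
long sum_squares_vla(size_t n)
{
	long buf[n ? n : 1];	/* variable-length array; size must be > 0 */
	long sum = 0;
	size_t i;

	for (i = 0; i < n; i++)
		buf[i] = (long)(i * i);
	for (i = 0; i < n; i++)
		sum += buf[i];
	return sum;
}

/* Same result with a fixed-size frame; returns -1 if allocation fails. */
long sum_squares_heap(size_t n)
{
	long *buf = malloc((n ? n : 1) * sizeof(*buf));
	long sum = 0;
	size_t i;

	if (!buf)
		return -1;
	for (i = 0; i < n; i++)
		buf[i] = (long)(i * i);
	for (i = 0; i < n; i++)
		sum += buf[i];
	free(buf);
	return sum;
}
```

The first version is the kind that only a register-adjustment check can flag; the second trades that for an allocation that can fail and must be freed.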

--
~Randy

2003-03-02 06:02:10

by Keith Owens

Subject: Re: doublefault debugging (was Re: Linux v2.5.62 --- spontaneous reboots)

Linus Torvalds <[email protected]> wrote:
> A sorted list of bad stack users (more than 256 bytes) in my default build
> follows. Anybody can create their own with something like
>
> objdump -d linux/vmlinux |
> grep 'sub.*$0x...,.*esp' |
> awk '{ print $9,$1 }' |
> sort > bigstack
>
> and a script to look up the addresses.
>
> Yeah, and this assumes we don't have alloca() users or other dynamic
> stack allocators (non-constant-size automatic arrays). I hope we don't
> have that kind of crap anywhere..

We do.

kernel.stack identifies big offenders, dynamic stacks and tells you
which procedure is at fault. This must be at least the fifth time I
have published this script.

#!/bin/bash
#
# Run a compiled ix86 kernel and print large local stack usage.
#
# />:/{s/[<>:]*//g; h; } On lines that contain '>:' (headings like
# c0100000 <_stext>:), remove <, > and : and hold the line. Identifies
# the procedure and its start address.
#
# /subl\?.*\$0x[^,][^,][^,].*,%esp/{ Select lines containing
# subl\?...0x...,%esp but only if there are at least 3 digits between 0x and
# ,%esp. These are local stacks of at least 0x100 bytes.
#
# s/.*$0x\([^,]*\).*/\1/; Extract just the stack adjustment
# /^[89a-f].......$/d; Ignore lines with 8 digit offsets that are
# negative. Some compilers adjust the stack on exit, seems to be related
# to goto statements
# G; Append the held line (procedure and start address).
# s/\(.*\)\n.* \(.*\)/\1 \2/; Remove the newline and procedure start
# address. Leaves just stack size and procedure name.
# p; }; Print stack size and procedure name.
#
# /subl\?.*%.*,%esp/{ Selects adjustment of %esp by register, dynamic
# arrays on stack.
# G; Append the held line (procedure and start address).
# s/\(.*\)\n\(.*\)/Dynamic \2 \1/; Reformat to "Dynamic", procedure
# start address, procedure name and the instruction that adjusts the
# stack, including its offset within the proc.
# p; }; Print the dynamic line.
#
#
# Leading spaces in the sed string are required.
#
objdump --disassemble "$@" | \
sed -ne '/>:/{s/[<>:]*//g; h; }
/subl\?.*\$0x[^,][^,][^,].*,%esp/{
s/.*\$0x\([^,]*\).*/\1/; /^[89a-f].......$/d; G; s/\(.*\)\n.* \(.*\)/\1 \2/; p; };
/subl\?.*%.*,%esp/{ G; s/\(.*\)\n\(.*\)/Dynamic \2 \1/; p; }; ' | \
sort