2003-05-02 08:48:39

by Andrew Morton

Subject: 2.5.68-mm4


ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.5/2.5.68/2.5.68-mm4/

. Much reworking of the disk IO scheduler patches due to the updated
dynamic-disk-request-allocation patch. No real functional changes here.

. Included the `kexec' patch - load Linux from Linux. Various people want
this for various reasons. I like the idea of going from a login prompt to
"Calibrating delay loop" in 0.5 seconds.

I tried it on four machines and it worked with small glitches on three of
them, and wedged up the fourth. So if it is to proceed this code needs
help with testing and careful bug reporting please.

There's a femto-HOWTO in the patch itself, reproduced here:



- enable kexec in config, build, install.

- grab kexec-tools from

http://www.osdl.org/archive/andyp/kexec/2.5.68/

- edit ./kexec/kexec-syscall.c and make sure __NR_kexec_load is set
to 269 (-mm kernels have an additional syscall)

- run `make distclean' and `make'

- I use this script:

#!/bin/sh

usage()
{
	echo "Usage: do-kexec.sh /boot/bzImage [commandline options]"
	exit 1
}

if [ $# -lt 1 ]
then
	usage
fi

sync
IMAGE=$1
shift
./objdir/build/sbin/kexec -l $IMAGE --command-line="$(cat /proc/cmdline) $*"
./objdir/build/sbin/kexec -e


invoked as

cd /usr/src/kexec-tools
./do-kexec.sh /boot/bzImage-2.5.68

This is fairly crude - it's an instant reboot, no shutdown or anything.
Only do this if you're using journalled filesystems!
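
The __NR_kexec_load step can be checked mechanically before building. A minimal sketch, assuming kexec-syscall.c defines the number on a single `#define __NR_kexec_load <n>` line (the layout of that file is not shown here, so this is a guess); `check_kexec_nr` is a made-up helper name:

```sh
#!/bin/sh
# check_kexec_nr FILE [EXPECTED]
# Hypothetical helper: report whether FILE defines __NR_kexec_load as EXPECTED
# (default 269, the value given above for -mm kernels).
# Assumes a single-line "#define __NR_kexec_load <number>".
check_kexec_nr()
{
	file=$1
	expected=${2:-269}

	# Print the third field of the first matching #define line.
	found=$(awk '$1 == "#define" && $2 == "__NR_kexec_load" { print $3; exit }' "$file")

	if [ "$found" = "$expected" ]
	then
		echo "__NR_kexec_load is $found, OK"
	else
		echo "__NR_kexec_load is '${found}', expected $expected" >&2
		return 1
	fi
}
```

From the kexec-tools directory this would be run as `check_kexec_nr ./kexec/kexec-syscall.c` before `make distclean' and `make'.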




Changes since 2.5.68-mm3:


linus.patch

Latest BK drop

-irqreturn-i2c.patch
-irqreturn-sound-2.patch
-irqreturn-smcc.patch
-SLAB_NO_GROW-fix.patch
-irqreturn-bttv.patch
-apm-locking-fix.patch
-xd-warning-fix.patch
-DAC960-interface-fixes.patch
-alt_instr-__KERNEL__.patch
-modular-jbd.patch
-hdlc-module-update.patch
-proc_file_read-fix.patch
-disk_name-size-check.patch
-cleanups.patch
-mwave-cleanup.patch
-ext3-ro-mount-fix.patch
-nr_threads-docco-fix.patch
-lost-tick-HZ-fix.patch
-nr_inactive-race-fix.patch
-blockdev-aio-support.patch
-percpu-counters-fix.patch
-config-menu-aesthetics.patch
-oom-kill-locking.patch
-restore-modinfo-section.patch
-implement-__module_get.patch

Merged

+compat-ioctl-fix.patch

Fix 32-bit ioctl fallback

+generic-subarch-missing-bit.patch

Some of the generic subarch patch got lost

+config-PAGE_OFFSET-025G.patch

Allow really small amounts of lowmem.

+dont-set-kernel-pgd-on-PAE.patch

little ia32 optimisation/cleanup

+shrink_slab-accounting.patch

Teach page reclaim to notice success due to slab shrinkage

-dynamic-request-allocation.patch
-dynamic-request-allocation-fix.patch
+rq-dyn-works.patch

latest dynamic disk request allocation patch

+as-iosched-dyn.patch

Update as-iosched for dynamic request allocation

+cfq-iosched-dyn.patch

Update cfq-iosched for dynamic request allocation

+security_d_instantiate-movement.patch
+ext3-security-xattr.patch
+ext2-security-xattr.patch

Security stuff

+pcmcia-fix.patch

Compile fix for the pcmcia fix.

+kexec.patch

kexec.




All 99 patches

linus.patch

mm.patch
add -mmN to EXTRAVERSION

compat-ioctl-fix.patch
Fix NULL handler for compat_ioctl

generic-subarch.patch
generic subarchitecture for ia32

generic-subarch-fix.patch
generic subarch: SMP only

generic-subarch-missing-bit.patch
generic subarch: missing chunk

ipmi-warning-fixes.patch

irqreturn-uml.patch
UML updates for the new IRQ API

irqreturn-aic79xx.patch
Fix aic79xx for new IRQ API

kgdb-ga.patch
kgdb stub for ia32 (George Anzinger's one)

kgdb-ga-ppc64-fix.patch

irqreturn-kgdb-ga.patch

irqreturn-drivers-net.patch

kgdb-ga-smp_num_cpus.patch

kgdb-ga-discontigmem-fixup.patch
kgdb: discontigmem fixup

slab-magazine-layer.patch
magazine layer for slab

config_spinline.patch
uninline spinlocks for profiling accuracy.

ppc64-reloc_hide.patch

ppc64-pci-patch.patch
Subject: pci patch

ppc64-aio-32bit-emulation.patch
32/64bit emulation for aio

ppc64-scruffiness.patch
Fix some PPC64 compile warnings

ppc64-update.patch
ppc64 update

ppc64-update-fixes.patch

ppc64-irqfixes.patch

ppc64-pci-bogons.patch

sym-do-160.patch
make the SYM driver do 160 MB/sec

misc.patch
misc fixes

config-PAGE_OFFSET.patch
Configurable kernel/user memory split

config-PAGE_OFFSET-025G.patch
3.75G config option

fat-speedup.patch
fat cluster search speedup

buffer-debug.patch
buffer.c debugging

ext3-truncate-ordered-pages.patch
ext3: explicitly free truncated pages

VM_RESERVED-check.patch
VM_RESERVED check

semop-race-fix.patch
semtimedop(): Fix racy BUG check

reiserfs_file_write-5.patch

rcu-stats.patch
RCU statistics reporting

ext3-journalled-data-assertion-fix.patch
Remove incorrect assertion from ext3

nfs-speedup.patch

nfs-oom-fix.patch
nfs oom fix

sk-allocation.patch
Subject: Re: nfs oom

nfs-more-oom-fix.patch

rpciod-atomic-allocations.patch
Make rpciod use atomic allocations

linux-isp.patch

isp-update-1.patch

dcache_lock-vs-tasklist_lock-take-2.patch
Fix dcache_lock/tasklist_lock ranking bug

clone-retval-fix.patch
copy_process return value fix

de_thread-fix.patch
de_thread memory corruption fix

list_del-debug.patch
list_del debug check

airo-schedule-fix.patch
airo.c: don't sleep in atomic regions

386-access_ok-race-fix.patch
access_ok() race fix for 80386.

synaptics-mouse-support.patch
Add Synaptics touchpad tweaking to psmouse driver

swapfile-hold-i_sem.patch
hold i_sem on swapfiles

dont-set-kernel-pgd-on-PAE.patch
remove unnecessary PAE pgd set

shrink_slab-accounting.patch
account for slab reclaim in try_to_free_pages()

rq-dyn-works.patch
rq-dyn, dynamic request allocation

kblockd.patch
Create `kblockd' workqueue

cfq-infrastructure.patch

elevator-completion-api.patch
elevator completion API

as-iosched.patch
anticipatory I/O scheduler

as-use-completion.patch
AS use completion notifier

as-remove-debug-checks.patch
AS: remove debug checks

as-iosched-dyn.patch
AS: update to dynamic request allocation API

unplug-use-kblockd.patch
Use kblockd for running request queues

cfq-2.patch
CFQ scheduler, #2

cfq-iosched-dyn.patch
CFQ: update to rq-dyn API

unmap-page-debugging.patch
unmap unused pages for debugging

fremap-all-mappings.patch
Make all executable mappings be nonlinear

sched-2.5.68-B2.patch
HT scheduler, sched-2.5.68-B2

sched_idle-typo-fix.patch
fix sched_idle typo

kgdb-ga-idle-fix.patch

sched-2.5.64-D3.patch
sched-2.5.64-D3, more interactivity changes

show_task-free-stack-fix.patch
show_task() fix and cleanup

htree-nfs-fix.patch
Fix ext3 htree / NFS compatibility problems

i8042-share-irqs.patch
allow i8042 interrupt sharing

select-speedup.patch
Subject: Re: IA64 changes to fs/select.c

select-speedup-fix.patch
select() speedup fix

slab_store_user-large-objects.patch
slab debug: perform redzoning against larger objects

htree-nfs-fix-2.patch
htree nfs fix

htree-leak-fix.patch
ext3: htree memory leak fix

put_task_struct-debug.patch

ia32-mknod64.patch
mknod64 for ia32

ext2-64-bit-special-inodes.patch
ext2: support for 64-bit device nodes

ext3-64-bit-special-inodes.patch
ext3: support for 64-bit device nodes

64-bit-dev_t-kdev_t.patch
64-bit dev_t and kdev_t

oops-dump-preceding-code.patch
i386 oops output: dump preceding code

lockmeter.patch

security_d_instantiate-movement.patch
Move security_d_instantiate hook calls

ext3-security-xattr.patch
ext3 xattr handler for security modules

ext2-security-xattr.patch
ext2 xattr handler for security modules

ext3-no-bkl.patch

journal_dirty_metadata-speedup.patch

journal_get_write_access-speedup.patch

ext3-concurrent-block-inode-allocation.patch
Subject: [PATCH] concurrent block/inode allocation for EXT3

ext3-orlov-approx-counter-fix.patch
Fix orlov allocator boundary case

ext3-concurrent-block-allocation-fix-1.patch

ext3-concurrent-block-allocation-hashed.patch
Subject: Re: [PATCH] concurrent block/inode allocation for EXT3

pcmcia-deadlock-fix-2.patch
Fix PCMCIA deadlock (rev. 2)

pcmcia-fix.patch

kexec.patch
kexec




2003-05-02 13:06:43

by William Lee Irwin III

Subject: Re: 2.5.68-mm4

On Fri, May 02, 2003 at 02:01:49AM -0700, Andrew Morton wrote:
> +dont-set-kernel-pgd-on-PAE.patch
> little ia32 optimisation/cleanup

It looks like no one listened to my commentary on the set_pgd() patch.

Remove pointless #ifdef, pointless set_pgd(), and a mysterious line
full of nothing but whitespace after the #endif, and update commentary.

-- wli

$ diffstat ../patches/mm4-2.5.68-2
fault.c | 12 ++++--------
1 files changed, 4 insertions(+), 8 deletions(-)

diff -urpN mm4-2.5.68-1/arch/i386/mm/fault.c mm4-2.5.68-2/arch/i386/mm/fault.c
--- mm4-2.5.68-1/arch/i386/mm/fault.c 2003-05-02 05:32:27.000000000 -0700
+++ mm4-2.5.68-2/arch/i386/mm/fault.c 2003-05-02 05:54:14.000000000 -0700
@@ -333,16 +333,12 @@ vmalloc_fault:

if (!pgd_present(*pgd_k))
goto no_context;
+
/*
- * kernel pmd pages are shared among all processes
- * with PAE on. Since vmalloc pages are always
- * in the kernel area, this will always be a
- * waste with PAE on.
+ * set_pgd(pgd, *pgd_k); here would be useless on PAE
+ * and redundant with the set_pmd() on non-PAE.
*/
-#ifndef CONFIG_X86_PAE
- set_pgd(pgd, *pgd_k);
-#endif
-
+
pmd = pmd_offset(pgd, address);
pmd_k = pmd_offset(pgd_k, address);
if (!pmd_present(*pmd_k))

2003-05-02 15:23:34

by Anton Blanchard

Subject: Re: 2.5.68-mm4


Hi,

> . Included the `kexec' patch - load Linux from Linux. Various people want
> this for various reasons. I like the idea of going from a login prompt to
> "Calibrating delay loop" in 0.5 seconds.

One thing that bothers me about kexec is how we grab low pages in
kimage_alloc_page(). On a partitioned ppc64 box I will need to grab
memory in the low 256MB and the machine might have 500GB of memory
free. That's going to take some time :)

I'd hate to introduce a separate zone just for this sort of stuff (we
currently throw all memory in the DMA zone). Could we add a hint to
the page allocator where it makes a best effort to grab memory below
a threshold?

Anton

2003-05-02 20:50:26

by Steven Cole

Subject: Re: 2.5.68-mm4

On Fri, 2003-05-02 at 14:49, Steven Cole wrote:
> On Fri, 2003-05-02 at 14:34, Andrew Morton wrote:
> > Steven Cole <[email protected]> wrote:
> > >
> > > For what it's worth, kexec has worked for me on the following
> > > two systems.
> > > ...
> > > 00:03.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100] (rev 08)
> >
> > Are you using eepro100 or e100? I found that e100 failed to bring up the
> > interface on restart ("failed selftest"), but eepro100 was OK.
>
> CONFIG_EEPRO100=y
> # CONFIG_EEPRO100_PIO is not set
> # CONFIG_E100 is not set
>
> I can test E100 again to verify if that would help.
>
# CONFIG_EEPRO100 is not set
CONFIG_E100=y

Well, e100 works for me with kexec. Sure is quick to just do:

cp arch/i386/boot/bzImage /boot/vmlinuz-2.5.68-mm4x
do-kexec.sh /boot/vmlinuz-2.5.68-mm4x

and no running /sbin/lilo. Nice.
( I put do-kexec.sh and kexec in /usr/local/bin )

Steven

2003-05-02 14:50:00

by Steven Cole

Subject: Re: 2.5.68-mm4

On Fri, 2003-05-02 at 08:45, Steven Cole wrote:
> On Fri, 2003-05-02 at 03:01, Andrew Morton wrote:

> > - grab kexec-tools from
> >
> > http://www.osdl.org/archive/andyp/kexec/2.5.68/
> >
> The andyp directory seems to be missing. I found kexec-tools-1.8 here:
> http://www.xmission.com/~ebiederm/files/kexec/
>
> Is that the latest version?

Now kexec-tools-1.8-2.5.68.tgz is there at the original URL. Thanks.

Steven

2003-05-02 16:43:02

by Martin J. Bligh

Subject: Re: 2.5.68-mm4

Fix up NUMA-Q build with new generic apic mode stuff


Attachments:
(No filename) (54.00 B)
mm4-fixed (777.00 B)

2003-05-02 21:37:55

by Andy Pfiffer

Subject: Re: 2.5.68-mm4


> > > > I found that e100 failed to bring up the
> > > > interface on restart ("failed selftest"), but eepro100 was OK.

> Here is a snippet from dmesg output for a successful kexec e100 boot:

Any chance we could get lspci output from both of these systems?

2003-05-02 16:36:47

by Dave Hansen

Subject: Re: 2.5.68-mm4

William Lee Irwin III wrote:
>> On Fri, May 02, 2003 at 02:01:49AM -0700, Andrew Morton wrote:
>>+dont-set-kernel-pgd-on-PAE.patch
>> little ia32 optimisation/cleanup
>
> It looks like no one listened to my commentary on the set_pgd() patch.
>
> Remove pointless #ifdef, pointless set_pgd(), and a mysterious line
> full of nothing but whitespace after the #endif, and update commentary.
> -#ifndef CONFIG_X86_PAE
> - set_pgd(pgd, *pgd_k);
> -#endif

I was thinking that the PMD set in 4G mode was a noop. But, it isn't,
so it makes up for the completely removed pgd set.

This comment needs to get updated in include/asm-i386/pgtable-2level.h:
/*
* (pmds are folded into pgds so this doesn't get actually called,
* but the define is needed for a generic inline function.)
*/
#define set_pmd(pmdptr, pmdval) (*(pmdptr) = pmdval)
#define set_pgd(pgdptr, pgdval) (*(pgdptr) = pgdval)

--
Dave Hansen
[email protected]

2003-05-02 19:53:52

by Steven Cole

Subject: Re: 2.5.68-mm4

On Fri, 2003-05-02 at 03:01, Andrew Morton wrote:
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.5/2.5.68/2.5.68-mm4/
>
> . Much reworking of the disk IO scheduler patches due to the updated
> dynamic-disk-request-allocation patch. No real functional changes here.
>
> . Included the `kexec' patch - load Linux from Linux. Various people want
> this for various reasons. I like the idea of going from a login prompt to
> "Calibrating delay loop" in 0.5 seconds.
>
> I tried it on four machines and it worked with small glitches on three of
> them, and wedged up the fourth. So if it is to proceed this code needs
> help with testing and careful bug reporting please.
>

For what it's worth, kexec has worked for me on the following
two systems.

single P-III 933MHz, 256MB, IDE (system 1)

00:00.0 Host bridge: Intel Corp. 82810E DC-133 GMCH [Graphics Memory Controller Hub] (rev 03)
00:01.0 VGA compatible controller: Intel Corp. 82810E DC-133 CGC [Chipset Graphics Controller] (rev 03)
00:1e.0 PCI bridge: Intel Corp. 82801AA PCI Bridge (rev 02)
00:1f.0 ISA bridge: Intel Corp. 82801AA ISA Bridge (LPC) (rev 02)
00:1f.1 IDE interface: Intel Corp. 82801AA IDE (rev 02)
00:1f.2 USB Controller: Intel Corp. 82801AA USB (rev 02)
00:1f.3 SMBus: Intel Corp. 82801AA SMBus (rev 02)
00:1f.5 Multimedia audio controller: Intel Corp. 82801AA AC'97 Audio (rev 02)
01:0c.0 Ethernet controller: 3Com Corporation 3c905C-TX/TX-M [Tornado] (rev 78)


dual P-III 1000MHz, 1024MB, SCSI (system 2)

00:00.0 Host bridge: ServerWorks CNB20LE Host Bridge (rev 06)
00:00.1 Host bridge: ServerWorks CNB20LE Host Bridge (rev 06)
00:02.0 VGA compatible controller: ATI Technologies Inc 3D Rage IIC 215IIC [Mach64 GT IIC] (rev 7a)
00:03.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100] (rev 08)
00:0f.0 ISA bridge: ServerWorks OSB4 South Bridge (rev 50)
00:0f.1 IDE interface: ServerWorks OSB4 IDE Controller
01:04.0 SCSI storage controller: Adaptec AIC-7899P U160/m (rev 01)
01:04.1 SCSI storage controller: Adaptec AIC-7899P U160/m (rev 01)

The times for reboot back to run level 3 are:

            normal         kexec
system 1    69 seconds     35 seconds
system 2    150 seconds    75 seconds
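
Put another way, kexec roughly halves the time back to run level 3 on both boxes. A quick check of that arithmetic, as a sketch in plain sh and awk (nothing kexec-specific; `saved_percent` is just an illustrative name):

```sh
#!/bin/sh
# saved_percent NORMAL KEXEC
# Percentage of reboot time saved, rounded to a whole number.
saved_percent()
{
	awk -v n="$1" -v k="$2" 'BEGIN { printf "%.0f\n", (n - k) * 100 / n }'
}

saved_percent 69 35	# system 1
saved_percent 150 75	# system 2
```

With the numbers from the table this prints 49 for system 1 and 50 for system 2.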

Steven


2003-05-02 20:38:09

by Steven Cole

Subject: Re: 2.5.68-mm4

On Fri, 2003-05-02 at 14:34, Andrew Morton wrote:
> Steven Cole <[email protected]> wrote:
> >
> > For what it's worth, kexec has worked for me on the following
> > two systems.
> > ...
> > 00:03.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100] (rev 08)
>
> Are you using eepro100 or e100? I found that e100 failed to bring up the
> interface on restart ("failed selftest"), but eepro100 was OK.

CONFIG_EEPRO100=y
# CONFIG_EEPRO100_PIO is not set
# CONFIG_E100 is not set

I can test E100 again to verify if that would help.

Also, I found that if I mistyped the argument to do-kexec.sh, the
system would stay up, but the interface would get hosed, fixable with
/etc/rc.d/init.d/network restart.

Otherwise, kexec works fine here so far over about a dozen reboots on
both machines.

Steven

2003-05-02 21:09:27

by Steven Cole

Subject: Re: 2.5.68-mm4

On Fri, 2003-05-02 at 15:05, Andrew Morton wrote:
> Steven Cole <[email protected]> wrote:
> >
> > On Fri, 2003-05-02 at 14:34, Andrew Morton wrote:
> > > Steven Cole <[email protected]> wrote:
> > > >
> > > > For what it's worth, kexec has worked for me on the following
> > > > two systems.
> > > > ...
> > > > 00:03.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100] (rev 08)
> > >
> > > Are you using eepro100 or e100? I found that e100 failed to bring up the
> > > interface on restart ("failed selftest"), but eepro100 was OK.
> >
> > CONFIG_EEPRO100=y
> > # CONFIG_EEPRO100_PIO is not set
> > # CONFIG_E100 is not set
> >
> > I can test E100 again to verify if that would help.
>
> May as well.
>
> There's something in the driver shutdown which is failing to bring the
> device into a state in which the driver startup can start it up. Probably
> just a missing device reset. I'll bug Scott about it if we get that far.
>
Here is a snippet from dmesg output for a successful kexec e100 boot:

Intel(R) PRO/100 Network Driver - version 2.2.21-k1
Copyright (c) 2003 Intel Corporation

e100: selftest OK.
Freeing alive device c1b23000, eth%d
e100: eth0: Intel(R) PRO/100 Network Connection
Hardware receive checksums enabled
cpu cycle saver enabled

I booted the e100 2.5.68-mm4 kernel twice with kexec, initially from the
eepro100 version, and once from the e100 version. Both worked OK.

Steven



2003-05-02 19:46:33

by J. Hidding

Subject: Re: 2.5.68-mm4

Hi,

2.5.68-mm4 went dead when I tried to mount a floppy. It
gave no time to record any logs, or sysrqs.

Thanks,
Johan Hidding

---
Running latest Gentoo on AMD Athlon on Via MoBo.

2003-05-02 20:25:08

by Andrew Morton

Subject: Re: 2.5.68-mm4

Steven Cole <[email protected]> wrote:
>
> For what it's worth, kexec has worked for me on the following
> two systems.
> ...
> 00:03.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100] (rev 08)

Are you using eepro100 or e100? I found that e100 failed to bring up the
interface on restart ("failed selftest"), but eepro100 was OK.

2003-05-02 20:56:14

by Andrew Morton

Subject: Re: 2.5.68-mm4

Steven Cole <[email protected]> wrote:
>
> On Fri, 2003-05-02 at 14:34, Andrew Morton wrote:
> > Steven Cole <[email protected]> wrote:
> > >
> > > For what it's worth, kexec has worked for me on the following
> > > two systems.
> > > ...
> > > 00:03.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100] (rev 08)
> >
> > Are you using eepro100 or e100? I found that e100 failed to bring up the
> > interface on restart ("failed selftest"), but eepro100 was OK.
>
> CONFIG_EEPRO100=y
> # CONFIG_EEPRO100_PIO is not set
> # CONFIG_E100 is not set
>
> I can test E100 again to verify if that would help.

May as well.

There's something in the driver shutdown which is failing to bring the
device into a state in which the driver startup can start it up. Probably
just a missing device reset. I'll bug Scott about it if we get that far.

> Also, I found that if I mistyped the argument to do-kexec.sh, the
> system would stay up, but the interface would get hosed, fixable with
> /etc/rc.d/init.d/network restart.

Yes, kexec userspace shuts down the network interfaces then tries to exec
the new kernel. But none was loaded and the syscall returns -EINVAL.
You're left with downed interfaces. The script should be checking the
success of the initial image loading.
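
A minimal sketch of such a check, reusing the do-kexec.sh logic from the HOWTO. `safe_kexec` and the `KEXEC` override are names made up for this example. The only kexec options used are the `-l`, `--command-line=` and `-e` that the original script already uses, and it is assumed that `kexec -l` exits non-zero when the load fails:

```sh
#!/bin/sh
# Sketch: load the image first, and only jump to it if the load succeeded.
# KEXEC is a hypothetical override so the function can be pointed at a
# different kexec binary; the default is the path used in the HOWTO.
KEXEC=${KEXEC:-./objdir/build/sbin/kexec}

safe_kexec()
{
	if [ $# -lt 1 ]
	then
		echo "Usage: safe_kexec /boot/bzImage [commandline options]" >&2
		return 1
	fi

	image=$1
	shift

	# Catch a mistyped path before kexec is involved at all.
	if [ ! -r "$image" ]
	then
		echo "safe_kexec: cannot read $image" >&2
		return 1
	fi

	# Stage the new kernel. If this fails, stop here so that "kexec -e"
	# never runs and the network interfaces are left alone.
	if ! "$KEXEC" -l "$image" --command-line="$(cat /proc/cmdline) $*"
	then
		echo "safe_kexec: loading $image failed, not rebooting" >&2
		return 1
	fi

	sync
	"$KEXEC" -e
}
```

Ending the script with `safe_kexec "$@"` gives the same command line as do-kexec.sh, but a typo in the image name now returns an error instead of leaving the interfaces down.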


2003-05-02 14:34:56

by Steven Cole

Subject: Re: 2.5.68-mm4

On Fri, 2003-05-02 at 03:01, Andrew Morton wrote:
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.5/2.5.68/2.5.68-mm4/
>
> . Much reworking of the disk IO scheduler patches due to the updated
> dynamic-disk-request-allocation patch. No real functional changes here.
>
> . Included the `kexec' patch - load Linux from Linux. Various people want
> this for various reasons. I like the idea of going from a login prompt to
> "Calibrating delay loop" in 0.5 seconds.
>
> I tried it on four machines and it worked with small glitches on three of
> them, and wedged up the fourth. So if it is to proceed this code needs
> help with testing and careful bug reporting please.
>
> There's a femto-HOWTO in the patch itself, reproduced here:
>
>
>
> - enable kexec in config, build, install.
>
> - grab kexec-tools from
>
> http://www.osdl.org/archive/andyp/kexec/2.5.68/
>
The andyp directory seems to be missing. I found kexec-tools-1.8 here:
http://www.xmission.com/~ebiederm/files/kexec/

Is that the latest version?

Steven

2003-05-02 21:49:39

by Steven Cole

Subject: Re: 2.5.68-mm4

On Fri, 2003-05-02 at 15:49, Andy Pfiffer wrote:
> > > > > I found that e100 failed to bring up the
> > > > > interface on restart ("failed selftest"), but eepro100 was OK.
>
> > Here is a snippet from dmesg output for a successful kexec e100 boot:
>
> Any chance we could get lspci output from both of these systems?

Sure. I posted that initially. See this:

http://marc.theaimsgroup.com/?l=linux-kernel&m=105190618322919&w=2

Steven


2003-05-02 23:09:44

by Matt Bernstein

Subject: Re: 2.5.68-mm4

On May 2 Steven Cole wrote:

>Here is a snippet from dmesg output for a successful kexec e100 boot:

Bizarrely I have a nasty crash on modprobing e100 *without* kexec (having
previously modprobed unix, af_packet and mii) and then trying to modprobe
serio (which then deadlocks the machine).

http://www.dcs.qmul.ac.uk/~mb/oops/

More information available on request

Matt

2003-05-02 23:33:00

by Andrew Morton

Subject: Re: 2.5.68-mm4

Matt Bernstein <[email protected]> wrote:
>
> On May 2 Steven Cole wrote:
>
> >Here is a snippet from dmesg output for a successful kexec e100 boot:
>
> Bizarrely I have a nasty crash on modprobing e100 *without* kexec (having
> previously modprobed unix, af_packet and mii) and then trying to modprobe
> serio (which then deadlocks the machine).
>
> http://www.dcs.qmul.ac.uk/~mb/oops/
>

Andi, it died in the middle of modprobe->apply_alternatives()

2003-05-03 01:02:08

by Herbert Xu

Subject: Re: 2.5.68-mm4

Andrew Morton <[email protected]> wrote:
>
> Are you using eepro100 or e100? I found that e100 failed to bring up the
> interface on restart ("failed selftest"), but eepro100 was OK.

That's because the e100 driver puts the card into state D3 when shutting
down but can't get it back to D0 afterwards.
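
One way to see which power state a card is sitting in is the Power Management capability in `lspci -vv` output. A rough sketch, assuming a pciutils that prints a `Status: D0 ...` line for that capability (older versions may not); `pm_state` is an illustrative helper that reads lspci output on stdin:

```sh
#!/bin/sh
# pm_state
# Read `lspci -vv` output for one device on stdin and print the PCI power
# state (D0..D3) taken from the Power Management "Status:" line.
# Assumption: that line looks like "Status: D0 PME-Enable- ...".
# Returns non-zero if no such line is found.
pm_state()
{
	awk '$1 == "Status:" && $2 ~ /^D[0-3]/ { print substr($2, 1, 2); found = 1; exit }
	     END { if (!found) exit 1 }'
}
```

For the Ethernet controller at 00:03.0 this could be used as `lspci -vv -s 00:03.0 | pm_state`, probably as root so that the capability list is readable.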

Please send info about your chipset to Intel so they can work this out.
--
Debian GNU/Linux 3.0 is out! ( http://www.debian.org/ )
Email: Herbert Xu 许志壬 <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2003-05-03 02:41:10

by Andi Kleen

Subject: Re: 2.5.68-mm4

On Sat, May 03, 2003 at 01:41:59AM +0200, Andrew Morton wrote:
> Matt Bernstein <[email protected]> wrote:
> >
> > On May 2 Steven Cole wrote:
> >
> > >Here is a snippet from dmesg output for a successful kexec e100 boot:
> >
> > Bizarrely I have a nasty crash on modprobing e100 *without* kexec (having
> > previously modprobed unix, af_packet and mii) and then trying to modprobe
> > serio (which then deadlocks the machine).
> >
> > http://www.dcs.qmul.ac.uk/~mb/oops/
> >
>
> Andi, it died in the middle of modprobe->apply_alternatives()

The important part of the oops, the first lines, is missing in the .png.

What is the failing address? And can you send me your e100.o ?

-Andi

2003-05-03 03:02:33

by Andi Kleen

Subject: Re: 2.5.68-mm4

On Sat, May 03, 2003 at 01:41:59AM +0200, Andrew Morton wrote:
> Andi, it died in the middle of modprobe->apply_alternatives()

BTW I just loaded an e100 module with BK-CVS current and there were no
problems.

-Andi

2003-05-03 06:56:28

by Matt Bernstein

Subject: Re: 2.5.68-mm4

At 04:53 +0200 Andi Kleen wrote:
>> >
>> > Bizarrely I have a nasty crash on modprobing e100 *without* kexec (having
>> > previously modprobed unix, af_packet and mii) and then trying to modprobe
>> > serio (which then deadlocks the machine).
>> >
>> > http://www.dcs.qmul.ac.uk/~mb/oops/
>>
>> Andi, it died in the middle of modprobe->apply_alternatives()
>
>The important part of the oops, the first lines, is missing in the .png.
>
>What is the failing address? And can you send me your e100.o ?

I'm sorry I can't get to the machine now till Tuesday. I'll try to get it
into a smaller font, or failing that a serial console if you like.

I've posted e100.{,k}o, vmlinux and System.map to the above URL. FWIW,
they both give "c010e840 T apply_alternatives". I've also posted ".config"
which Apache elects not to list :)
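
An address can be turned into symbol+offset by hand from System.map. A rough sketch in plain sh, assuming map lines of the form quoted above (`c010e840 T apply_alternatives`) and no blank lines; `addr_to_symbol` is an illustrative helper, and it only knows about symbols in the map, so module addresses will not resolve:

```sh
#!/bin/sh
# addr_to_symbol MAPFILE HEXADDR
# Print "symbol+0xoffset" for the symbol with the highest address that is
# still at or below HEXADDR (given without a leading 0x).
# MAPFILE lines look like "c010e840 T apply_alternatives" (System.map format).
addr_to_symbol()
{
	map=$1
	addr=$((0x$2))
	best_name=
	best_addr=0

	while read -r hex type name
	do
		val=$((0x$hex))
		if [ "$val" -le "$addr" ] && [ "$val" -ge "$best_addr" ]
		then
			best_addr=$val
			best_name=$name
		fi
	done < "$map"

	# Nothing at or below the address: give up.
	[ -n "$best_name" ] || return 1
	printf '%s+0x%x\n' "$best_name" "$((addr - best_addr))"
}
```

Usage would be along the lines of `addr_to_symbol System.map <EIP from the oops>`. The loop is slow on a full System.map, but it is enough for checking a handful of addresses.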

Does any of the above help?

2003-05-03 11:38:53

by Charlie Baylis

Subject: Re: 2.5.68-mm4


Hi

2.5.68-mm4 fixes APM suspend on my Vaio (problem reported with 2.5.68-mm2) but
my PCMCIA ethernet is still broken after suspend and requires ifconfig eth0
down; cardctl eject; cardctl insert before it will come to life (it took two
goes at that the first time, only one the second)

As before, I get thousands of "eth0: command 0x5800 did not complete!" after
resume, and I got the following backtrace after resume (possibly triggered by
the cardctl commands).

As before, Sony Vaio, pre-empt, APM, combined ethernet/modem PCMCIA using
3c574_cs.

Any more info required?

Charlie

irq 11: nobody cared!
Call Trace:
[<c010b640>] handle_IRQ_event+0x90/0x100
[<c010b897>] do_IRQ+0x97/0x120
[<c0109c68>] common_interrupt+0x18/0x20
[<c01ab1e3>] pci_bus_write_config_word+0x73/0x90
[<c882c180>] +0x0/0x840 [yenta_socket]
[<c88a7890>] dead_socket+0x0/0xc [pcmcia_core]
[<c882984a>] yenta_set_socket+0xba/0x1b0 [yenta_socket]
[<c882c180>] +0x0/0x840 [yenta_socket]
[<c882a027>] yenta_clear_maps+0x57/0x90 [yenta_socket]
[<c882c180>] +0x0/0x840 [yenta_socket]
[<c88a7890>] dead_socket+0x0/0xc [pcmcia_core]
[<c882c180>] +0x0/0x840 [yenta_socket]
[<c882a20b>] yenta_init+0x1b/0x30 [yenta_socket]
[<c882c180>] +0x0/0x840 [yenta_socket]
[<c882c180>] +0x0/0x840 [yenta_socket]
[<c882aa70>] ricoh_init+0x10/0xe0 [yenta_socket]
[<c882c180>] +0x0/0x840 [yenta_socket]
[<c8829036>] +0x36/0x40 [yenta_socket]
[<c882c180>] +0x0/0x840 [yenta_socket]
[<c889dada>] init_socket+0x2a/0x30 [pcmcia_core]
[<c889df25>] shutdown_socket+0x15/0x100 [pcmcia_core]
[<c889e16a>] socket_shutdown+0x4a/0x60 [pcmcia_core]
[<c889e47a>] socket_insert+0x7a/0x80 [pcmcia_core]
[<c889da1a>] get_socket_status+0x1a/0x20 [pcmcia_core]
[<c889e6ad>] pccardd+0x13d/0x1f0 [pcmcia_core]
[<c0119df0>] default_wake_function+0x0/0x20
[<c01091d2>] ret_from_fork+0x6/0x14
[<c0119df0>] default_wake_function+0x0/0x20
[<c889e570>] pccardd+0x0/0x1f0 [pcmcia_core]
[<c010722d>] kernel_thread_helper+0x5/0x18

handlers:
[<c8862410>] (el3_interrupt+0x0/0x260 [3c574_cs])

2003-05-03 13:30:38

by Diego Calleja

Subject: Re: 2.5.68-mm4

On Fri, 2 May 2003 16:41:59 -0700
Andrew Morton <[email protected]> wrote:

> > http://www.dcs.qmul.ac.uk/~mb/oops/
> >
>
> Andi, it died in the middle of modprobe->apply_alternatives()

I just got this oops under -mm4 while connecting with a ppp link:

Probably it was modprobe'ing one of those:

ppp_deflate 5312 0 [unsafe]
zlib_deflate 21912 1 ppp_deflate
zlib_inflate 21408 1 ppp_deflate
bsd_comp 5600 0 [unsafe]
ppp_async 10496 1 [unsafe]
ppp_generic 27080 5 ppp_deflate,bsd_comp,ppp_async
slhc 5952 1 ppp_generic


CSLIP: code copyright 1989 Regents of the University of California
PPP generic driver version 2.4.2
Unable to handle kernel paging request at virtual address c03390df
printing eip:
c0110a39
*pde = 00102027
*pte = 00339000
Oops: 0000 [#1]
CPU: 0
EIP: 0060:[<c0110a39>] Not tainted VLI
EFLAGS: 00013202
EIP is at apply_alternatives+0xd9/0x120
eax: 00000001 ebx: d088fb8c ecx: 00000000 edx: 00000001
esi: c03390df edi: d08898cf ebp: ccfe5eec esp: ccfe5ed8
ds: 007b es: 007b ss: 0068
Process modprobe (pid: 375, threadinfo=ccfe4000 task=ce236690)
Stack: c02f68e0 00000003 d0883880 0000008f d08835b7 ccfe5f0c c011a436 d088fb8c
d088fc1b d0883504 d087c000 d0891b20 00000460 ccfe5f94 c013c8ae d087c000
d0883600 d0891b20 00000016 d0891b20 00000000 00000000 00000000 00000488
Call Trace:
[<c011a436>] module_finalize+0x96/0xa0
[<c013c8ae>] load_module+0x64e/0x870
[<c013cb6c>] sys_init_module+0x9c/0x2b0
[<c0109aef>] syscall_call+0x7/0xb

Code: 00 00 8b 0b 83 fa 09 b8 08 00 00 00 0f 4c c2 8b 7d f0 01 cf 8b 4d ec 8b 34
81 89 c1 c1 e9 02 f3 a5 a8 02 74 02 66 a5 a8 01 74 01 <a4> 01 45 f0 29 c2 85 d2
7f cd 83 c3 0c 3b 5d 0c 0f 82 71 ff ff




2003-05-03 14:02:49

by Eric W. Biederman

Subject: Re: 2.5.68-mm4

Anton Blanchard <[email protected]> writes:

> Hi,
>
> > . Included the `kexec' patch - load Linux from Linux. Various people want
> > this for various reasons. I like the idea of going from a login prompt to
> > "Calibrating delay loop" in 0.5 seconds.
>
> One thing that bothers me about kexec is how we grab low pages in
> kimage_alloc_page(). On a partitioned ppc64 box I will need to grab
> memory in the low 256MB and the machine might have 500GB of memory
> free. That's going to take some time :)

Could you explain to me the need to allocate memory in the low 256MB?
Generally the design is that you can allocate the memory anywhere
and then relocate_kernel.S will move it to where it needs to be.

I have had people wanting to use 300MB initial ramdisks and the like.
If you have 500GB of memory what is the point of keeping anything on a disk?

When you have 4TB on a cluster or a NUMA machine I can understand
wanting to keep things local to a node. But in those cases you want
to have local node zones so the problem does not come up.

In general I hate restricting the memory you can use, because kexec is
not just about booting Linux; it is about booting anything that
we reasonably can. The only case I have seen so far that makes sense
is when your physical memory is larger than your virtual memory.

> I'd hate to introduce a separate zone just for this sort of stuff (we
> currently throw all memory in the DMA zone). Could we add a hint to
> the page allocator where it makes a best effort to grab memory below
> a threshold?

I suspect so. And I can't imagine it would be that hard to implement.

But I think I would like to see why you need that.

Eric

2003-05-05 03:37:30

by Eric W. Biederman

Subject: Re: 2.5.68-mm4 && kexec

Andrew Morton <[email protected]> writes:

> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.5/2.5.68/2.5.68-mm4/
>
>
> . Much reworking of the disk IO scheduler patches due to the updated
> dynamic-disk-request-allocation patch. No real functional changes here.
>
> . Included the `kexec' patch - load Linux from Linux. Various people want
> this for various reasons. I like the idea of going from a login prompt to
> "Calibrating delay loop" in 0.5 seconds.
>
> I tried it on four machines and it worked with small glitches on three of
> them, and wedged up the fourth. So if it is to proceed this code needs
> help with testing and careful bug reporting please.

The current state of the code is that APM is not expected to work. The
user space tool needs a fix to pass the address of the APM entry points
to the new kernel.

But beyond that everything should work barring drivers which have
problems shutting themselves down and restarting.

Eric

2003-05-06 14:23:46

by Andi Kleen

Subject: Re: 2.5.68-mm4

On Tue, May 06, 2003 at 04:15:55PM +0200, Matt Bernstein wrote:
> Is this helpful?

What I really need is an oops decoded with ksymoops, not jpegs.

Also you seem to be the only one with the problem so just to avoid
any weird build problems do a make distclean and rebuild from scratch
and reinstall the modules.

-Andi

2003-05-06 15:38:33

by Matt Bernstein

[permalink] [raw]
Subject: Re: 2.5.68-mm4

At 16:35 +0200 Andi Kleen wrote:

>On Tue, May 06, 2003 at 04:15:55PM +0200, Matt Bernstein wrote:
>> Is this helpful?
>
>What I really need is an oops, probably decoded with ksymoops, not jpegs.

OK, I'll do this tomorrow morning (I think I can do it without a serial
console now).

>Also you seem to be the only one with the problem so just to avoid
>any weird build problems do a make distclean and rebuild from scratch
>and reinstall the modules.

The only odd thing I think I'm doing is hacking this into rc.sysinit:

awk '/version 2\.5\./ {exit 1}' /proc/version || egrep -v '^#' /etc/sysconfig/modules | while read i
do
    action $"Loading $i module: " /sbin/modprobe $i
done

This might be naughty, but it shouldn't be able to hang the box!

I'd prefer to have a proper set of aliases for 2.5 in /etc/modules.conf,
but I'm too lazy to google for one. Also, I'd prefer yet more to shunt
this stuff into an initramfs but I'll wait for documentation to appear for
that :)

Cheers,

Matt
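
The rc.sysinit hack above relies on awk exiting with status 1 on a 2.5
kernel, so that the `||' branch runs the module loop. A minimal sketch of
the same logic, split into functions so it can be tried without touching
/proc or modprobe, might look like this. The function names are
hypothetical, the loader only prints the command it would run, and blank
lines are skipped as well as comments, which the original does not do.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the version string names a 2.5.x kernel.
is_25_kernel() {
    case "$1" in
        *"version 2.5."*) return 0 ;;
        *) return 1 ;;
    esac
}

# Print module names from a list file, skipping comments and blank lines.
list_modules() {
    grep -v -e '^#' -e '^[[:space:]]*$' "$1"
}

# Dry run: print the modprobe command for each module, but only on 2.5.
# Arguments: version string (e.g. contents of /proc/version), list file.
load_modules() {
    is_25_kernel "$1" || return 0
    list_modules "$2" | while read -r mod
    do
        echo "modprobe $mod"
    done
}
```

Invoked as

load_modules "$(cat /proc/version)" /etc/sysconfig/modules

it prints one line per module on a 2.5 kernel and nothing on anything else.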

2003-05-07 10:14:42

by Matt Bernstein

[permalink] [raw]
Subject: Re: 2.5.68-mm4

On May 6 Andi Kleen wrote:

>On Tue, May 06, 2003 at 04:15:55PM +0200, Matt Bernstein wrote:
>> Is this helpful?
>
>What I really need is an oops, probably decoded with ksymoops, not jpegs.

ksymoops 2.4.9 on i686 2.4.20-8. Options used
-v /opt/linux-2.5.69-mm1/vmlinux (specified)
-K (specified)
-L (specified)
-o /lib/modules/2.5.69-mm1 (specified)
-m /boot/System.map-2.5.69-mm1 (specified)

No modules in ksyms, skipping objects
ACPI: LAPIC_NMI (acpi_id[0xff] polarity[0x0] trigger[0x0] lint[0x1])
Machine check exception polling timer started.
Unable to handle kernel paging request at virtual address c03b6e83
c010e93f
*pde = 00102027
Oops: 0000 [#1]
CPU: 0
EIP: 0060:[<c010e93f>] Not tainted VLI
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010202
eax: e094c580 ebx: 00000001 ecx: 00000000 edx: c0345740
esi: c03b6e83 edi: e0944f1d ebp: 00000001 esp: dcfb5ed8
ds: 007b es: 007b ss: 0068
Stack: 00000000 dcfb5ee8 00000001 c0345740 00000003 e094c580 e093c448 c030321c
e093c19f c030320b c01149fe e094c568 e094c5f7 e093c0f2 e092f000 e093c1f0
00000460 00000460 c012f7e1 e092f000 e093c1f0 e09505c0 00000016 e09505c0
Call Trace:
[<c01149fe>] module_finalize+0x8e/0xa0
[<c012f7e1>] load_module+0x6d1/0x920
[<c012faa8>] sys_init_module+0x78/0x1d0
[<c01091e5>] sysenter_past_esp+0x52/0x71
Code: 8b 54 24 0c 0f 4c dd 8b 7c 24 10 03 38 81 fb ff 01 00 00 8b 34 9a 77 39 89 d9 c1 e9 02 f3 a5 f6 c3 02 74 02 66 a5 f6 c3 01 74 01 <a4> 29 dd 01 5c 24 10 85 ed 7f be 83 44 24 14 0c 8b 5c 24 30 39


>>EIP; c010e93f <apply_alternatives+ff/180> <=====

>>eax; e094c580 <_end+20555520/3fc06fa0>
>>edx; c0345740 <k7_nops+0/24>
>>esi; c03b6e83 <k7nops+0/2d>
>>edi; e0944f1d <_end+2054debd/3fc06fa0>
>>esp; dcfb5ed8 <_end+1cbbee78/3fc06fa0>

Trace; c01149fe <module_finalize+8e/a0>
Trace; c012f7e1 <load_module+6d1/920>
Trace; c012faa8 <sys_init_module+78/1d0>
Trace; c01091e5 <sysenter_past_esp+52/71>

This architecture has variable length instructions, decoding before eip
is unreliable, take these instructions with a pinch of salt.

Code; c010e914 <apply_alternatives+d4/180>
00000000 <_EIP>:
Code; c010e914 <apply_alternatives+d4/180>
0: 8b 54 24 0c mov 0xc(%esp,1),%edx
Code; c010e918 <apply_alternatives+d8/180>
4: 0f 4c dd cmovl %ebp,%ebx
Code; c010e91b <apply_alternatives+db/180>
7: 8b 7c 24 10 mov 0x10(%esp,1),%edi
Code; c010e91f <apply_alternatives+df/180>
b: 03 38 add (%eax),%edi
Code; c010e921 <apply_alternatives+e1/180>
d: 81 fb ff 01 00 00 cmp $0x1ff,%ebx
Code; c010e927 <apply_alternatives+e7/180>
13: 8b 34 9a mov (%edx,%ebx,4),%esi
Code; c010e92a <apply_alternatives+ea/180>
16: 77 39 ja 51 <_EIP+0x51>
Code; c010e92c <apply_alternatives+ec/180>
18: 89 d9 mov %ebx,%ecx
Code; c010e92e <apply_alternatives+ee/180>
1a: c1 e9 02 shr $0x2,%ecx
Code; c010e931 <apply_alternatives+f1/180>
1d: f3 a5 repz movsl %ds:(%esi),%es:(%edi)
Code; c010e933 <apply_alternatives+f3/180>
1f: f6 c3 02 test $0x2,%bl
Code; c010e936 <apply_alternatives+f6/180>
22: 74 02 je 26 <_EIP+0x26>
Code; c010e938 <apply_alternatives+f8/180>
24: 66 a5 movsw %ds:(%esi),%es:(%edi)
Code; c010e93a <apply_alternatives+fa/180>
26: f6 c3 01 test $0x1,%bl
Code; c010e93d <apply_alternatives+fd/180>
29: 74 01 je 2c <_EIP+0x2c>

This decode from eip onwards should be reliable

Code; c010e93f <apply_alternatives+ff/180>
00000000 <_EIP>:
Code; c010e93f <apply_alternatives+ff/180> <=====
0: a4 movsb %ds:(%esi),%es:(%edi) <=====
Code; c010e940 <apply_alternatives+100/180>
1: 29 dd sub %ebx,%ebp
Code; c010e942 <apply_alternatives+102/180>
3: 01 5c 24 10 add %ebx,0x10(%esp,1)
Code; c010e946 <apply_alternatives+106/180>
7: 85 ed test %ebp,%ebp
Code; c010e948 <apply_alternatives+108/180>
9: 7f be jg ffffffc9 <_EIP+0xffffffc9>
Code; c010e94a <apply_alternatives+10a/180>
b: 83 44 24 14 0c addl $0xc,0x14(%esp,1)
Code; c010e94f <apply_alternatives+10f/180>
10: 8b 5c 24 30 mov 0x30(%esp,1),%ebx
Code; c010e953 <apply_alternatives+113/180>
14: 39 .byte 0x39

>Also you seem to be the only one with the problem so just to avoid
>any weird build problems do a make distclean and rebuild from scratch
>and reinstall the modules.

Will do later today if the above isn't helpful. One other thing I did do
was a make -j19 KBUILD_VERBOSE=0 but I've been told this is completely
safe these days.

Cheers,

Matt
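
The addresses in the decode above can be cross-checked with plain shell
arithmetic. This is only a sketch using numbers copied from the oops:
the EIP, the +ff offset that ksymoops reports into apply_alternatives,
and the 43 bytes that precede the <a4> marker in the Code: line. The
helper name hex_sub is made up for illustration.

```shell
#!/bin/sh
# hex_sub A B: print A - B as eight hex digits.
hex_sub() {
    printf '%08x\n' $(( $1 - $2 ))
}

eip=0xc010e93f      # EIP from the oops
fault=c03b6e83      # address in the "Unable to handle" line
esi=c03b6e83        # source pointer of the faulting movsb

# EIP minus the reported offset gives the start of apply_alternatives.
echo "function start:  $(hex_sub $eip 0xff)"

# 43 code bytes precede <a4>, so the Code: dump begins 43 bytes before EIP.
echo "code dump start: $(hex_sub $eip 43)"

# The movsb reads through %esi, so %esi should equal the faulting address.
if [ "$esi" = "$fault" ]
then
    echo "fault is the read through %esi"
fi
```

The second value should agree with the first Code; line of the ksymoops
output (apply_alternatives+d4), which is a quick way to confirm that the
decode and the register dump belong together.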

2003-05-07 12:22:59

by Andi Kleen

[permalink] [raw]
Subject: Re: 2.5.68-mm4

On Wed, May 07, 2003 at 12:27:02PM +0200, Matt Bernstein wrote:
>
> Will do later today if the above isn't helpful. One other thing I did do
> was a make -j19 KBUILD_VERBOSE=0 but I've been told this is completely
> safe these days.

It tries to patch an instruction past the kernel text.

It could be in the discarded .exit.text/.text.exit. With new binutils you should
get a link error when this happens, but perhaps yours are too old for that.

When you comment these entries out from the DISCARD statement in
arch/i386/vmlinux.lds.S does it go away? Alternatively use Andrew's
latest 2.5.69-mm*, that has the patch too.

-Andi
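
One way to test the "past the kernel text" theory by hand is to compare
an address against the _text and _etext symbols in System.map. The
following is a rough sketch, not something from the patch: the function
names are hypothetical, and it assumes the usual three-column System.map
layout (address, type, symbol) with both symbols present.

```shell
#!/bin/sh
# sym_addr MAPFILE SYMBOL: print the address of an exact symbol name.
sym_addr() {
    awk -v s="$2" '$3 == s { print $1; exit }' "$1"
}

# in_kernel_text MAPFILE ADDR: succeed if ADDR lies in [_text, _etext).
# ADDR is hex, with or without a leading 0x. Returns 2 if either
# symbol is missing from the map.
in_kernel_text() {
    start=$(sym_addr "$1" _text)
    end=$(sym_addr "$1" _etext)
    [ -n "$start" ] && [ -n "$end" ] || return 2
    addr=${2#0x}
    [ $((0x$addr)) -ge $((0x$start)) ] && [ $((0x$addr)) -lt $((0x$end)) ]
}
```

Invoked as

in_kernel_text /boot/System.map-2.5.69-mm1 c03b6e83 && echo "in text"

it stays silent for an address outside the text segment.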

2003-05-07 15:33:21

by Matt Bernstein

[permalink] [raw]
Subject: Re: 2.5.68-mm4

At 14:35 +0200 Andi Kleen wrote:

>It tries to patch an instruction past the kernel text.
>
>It could be in the discarded .exit.text/.text.exit. With new binutils you should
>get a link error when this happens, but perhaps yours are too old for that.

I'm using the RH 9 standard 2.13.90.0.18-9. My environment is exactly RH9
+ modutils 2.4.22-10 from rawhide, on a single Athlon XP.

>When you comment these entries out from the DISCARD statement in
>arch/i386/vmlinux.lds.S does it go away? Alternatively use Andrew's
>latest 2.5.69-mm*, that has the patch too.

Tried 2.5.69-mm2, it crashed the same way :-/