Hi,
I have an old Compaq Armada 1592DT. The thing goes automagically into
suspend mode after being forgotten for a while. And there is this button
to wake it up (the blue one, above the keyboard).
Last time I tried to wake it up, it produced the attached oops.
The "Unknown key" messages are probably the blue button.
After printing out the oops, the system went back into suspend.
-alex
Suspending devices
Suspending device c03219ac
Unable to handle kernel NULL pointer dereference at virtual address 00000090
printing eip:
c011459f
*pde = 00000000
Oops: 0000 [#1]
CPU: 0
EIP: 0060:[<c011459f>] Not tainted
EFLAGS: 00010202
EIP is at fix_processor_context+0x5f/0x100
eax: 0000007c ebx: c5f0e000 ecx: 00000002 edx: 00000000
esi: 00000060 edi: 00000000 ebp: c5f0ff5c esp: c5f0ff54
ds: 007b es: 007b ss: 0068
Process kapmd (pid: 4, threadinfo=c5f0e000 task=c5fbc640)
Stack: c5f0e000 00000060 c5f0ff64 c0114529 c5f0ff78 c01135c8 00000002 00000000
00000002 c5f0ff8c c0113845 00000001 c5f0e000 c5f0ffb4 c5f0ffdc c0113aa4
00000000 c5fbc640 c0117950 00000000 00000000 c0290000 c030f6b4 00000000
Call Trace:
[<c0114529>] restore_processor_state+0x69/0x80
[<c01135c8>] suspend+0x138/0x200
[<c0113845>] check_events+0xf5/0x230
[<c0113aa4>] apm_mainloop+0x94/0xb0
[<c0117950>] default_wake_function+0x0/0x20
[<c0117950>] default_wake_function+0x0/0x20
[<c01141a0>] apm+0x0/0x280
[<c0114262>] apm+0xc2/0x280
[<c0107255>] kernel_thread_helper+0x5/0x10
Code: 8b 48 14 8b 42 7c 85 c0 75 0a b9 00 10 29 c0 b8 05 00 00 00
<6>note: kapmd[4] exited with preempt_count 2
hda: dma_timer_expiry: dma status == 0x61
atkbd.c: Unknown key (set 2, scancode 0xb6, on isa0060/serio0) pressed.
atkbd.c: Unknown key (set 2, scancode 0x9d, on isa0060/serio0) pressed.
atkbd.c: Unknown key (set 2, scancode 0x19d, on isa0060/serio0) pressed.
atkbd.c: Unknown key (set 2, scancode 0xb8, on isa0060/serio0) pressed.
hda: timeout waiting for DMA
hda: timeout waiting for DMA
hda: (__ide_dma_test_irq) called while not waiting
atkbd.c: Unknown key (set 2, scancode 0x1b8, on isa0060/serio0) pressed.
Alex Riesen writes:
> Hi,
>
> I have an old Compaq Armada 1592DT. The thing goes automagically into
> suspend mode after being forgotten for a while. And there is this button
> to wake it up (the blue one, above the keyboard).
>
> Last time i tried to wake it up it produced the attached oops.
> "Unknown key"s are probable the blue button.
> After printing out the oops, the system went back into suspend.
>
> -alex
>
> Suspending devices
> Suspending device c03219ac
> Unable to handle kernel NULL pointer dereference at virtual address 00000090
> printing eip:
> c011459f
> *pde = 00000000
> Oops: 0000 [#1]
> CPU: 0
> EIP: 0060:[<c011459f>] Not tainted
> EFLAGS: 00010202
> EIP is at fix_processor_context+0x5f/0x100
> eax: 0000007c ebx: c5f0e000 ecx: 00000002 edx: 00000000
> esi: 00000060 edi: 00000000 ebp: c5f0ff5c esp: c5f0ff54
> ds: 007b es: 007b ss: 0068
> Process kapmd (pid: 4, threadinfo=c5f0e000 task=c5fbc640)
> Stack: c5f0e000 00000060 c5f0ff64 c0114529 c5f0ff78 c01135c8 00000002 00000000
> 00000002 c5f0ff8c c0113845 00000001 c5f0e000 c5f0ffb4 c5f0ffdc c0113aa4
> 00000000 c5fbc640 c0117950 00000000 00000000 c0290000 c030f6b4 00000000
> Call Trace:
> [<c0114529>] restore_processor_state+0x69/0x80
> [<c01135c8>] suspend+0x138/0x200
> [<c0113845>] check_events+0xf5/0x230
> [<c0113aa4>] apm_mainloop+0x94/0xb0
> [<c0117950>] default_wake_function+0x0/0x20
> [<c0117950>] default_wake_function+0x0/0x20
> [<c01141a0>] apm+0x0/0x280
> [<c0114262>] apm+0xc2/0x280
> [<c0107255>] kernel_thread_helper+0x5/0x10
>
> Code: 8b 48 14 8b 42 7c 85 c0 75 0a b9 00 10 29 c0 b8 05 00 00 00
Since 2.5.69-bk8 or so, apm.c will invoke restore_processor_state()
at resume-time. This is needed to reinitialise the SYSENTER MSRs
used by 2.5's new system call mechanism.
> <6>note: kapmd[4] exited with preempt_count 2
This I don't like. I'm not convinced the resume path is preempt-safe.
Please try again, either with CONFIG_PREEMPT disabled or with a
preempt_disable() / preempt_enable() pair around apm.c's suspend code,
as in the patch below. (Untested; you may need to add an #include
<linux/preempt.h> somewhere in apm.c to make it compile.)
/Mikael
--- linux-2.5.69-bk8/arch/i386/kernel/apm.c.~1~ 2003-05-14 14:31:31.000000000 +0200
+++ linux-2.5.69-bk8/arch/i386/kernel/apm.c 2003-05-14 15:01:03.000000000 +0200
@@ -1213,9 +1213,11 @@
 	spin_unlock(&i8253_lock);
 	write_sequnlock_irq(&xtime_lock);
+	preempt_disable();
 	save_processor_state();
 	err = set_system_power_state(APM_STATE_SUSPEND);
 	restore_processor_state();
+	preempt_enable();
 	write_seqlock_irq(&xtime_lock);
 	spin_lock(&i8253_lock);
Mikael, Wed, May 14, 2003 15:04:38 +0200:
> > I have an old Compaq Armada 1592DT. The thing goes automagically into
> > suspend mode after being forgotten for a while. And there is this button
> > to wake it up (the blue one, above the keyboard).
> >
> > Last time i tried to wake it up it produced the attached oops.
> > "Unknown key"s are probable the blue button.
> > After printing out the oops, the system went back into suspend.
> >
> > Suspending devices
> > Suspending device c03219ac
> > Unable to handle kernel NULL pointer dereference at virtual address 00000090
...
> > EIP is at fix_processor_context+0x5f/0x100
...
> > Call Trace:
> > [<c0114529>] restore_processor_state+0x69/0x80
> > [<c01135c8>] suspend+0x138/0x200
> > [<c0113845>] check_events+0xf5/0x230
> > [<c0113aa4>] apm_mainloop+0x94/0xb0
> > [<c0117950>] default_wake_function+0x0/0x20
> > [<c0117950>] default_wake_function+0x0/0x20
> > [<c01141a0>] apm+0x0/0x280
> > [<c0114262>] apm+0xc2/0x280
> > [<c0107255>] kernel_thread_helper+0x5/0x10
>
> Since 2.5.69-bk8 or so, apm.c will invoke restore_processor_state()
> at resume-time. This is needed to reinitialise the SYSENTER MSRs
> used by 2.5's new system call mechanism.
And is it supposed to oops?
> > <6>note: kapmd[4] exited with preempt_count 2
> This I don't like. I'm not convinced the resume path is preempt-safe.
> Please try again, either with CONFIG_PREEMPT disabled, or with a
> preempt_disable() / preempt_enable() pair around apm.c's suspend code,
> like in the patch below. (Untested, you may need to stick an #include
> <preempt.h> somewhere in apm.c to make it compile.)
It changed things a bit: preempt_count is 3 now.
The oops itself didn't change.
The system still works, afaict. I didn't notice any ill effects.
-alex
Alex Riesen writes:
> Mikael, Wed, May 14, 2003 15:04:38 +0200:
> > Since 2.5.69-bk8 or so, apm.c will invoke restore_processor_state()
> > at resume-time. This is needed to reinitialise the SYSENTER MSRs
> > used by 2.5's new system call mechanism.
>
> and it supposed to go oops?
Of course not. It doesn't oops my Dell Latitude: on that laptop it
prevents oopses since otherwise user-space processes will oops the kernel
as soon as they make a system call or return from a system call. But this
only happens if both the CPU and glibc are capable of using SYSENTER.
> > > <6>note: kapmd[4] exited with preempt_count 2
> > This I don't like. I'm not convinced the resume path is preempt-safe.
> > Please try again, either with CONFIG_PREEMPT disabled, or with a
> > preempt_disable() / preempt_enable() pair around apm.c's suspend code,
> > like in the patch below. (Untested, you may need to stick an #include
> > <preempt.h> somewhere in apm.c to make it compile.)
>
> It changed things a bit. preempt_count is 3 now.
> Oops didn't change.
Ok so it wasn't preempt.
Can you identify in which statement the oops occurs?
And can you confirm that commenting out the calls in apm.c to
save_processor_state() and restore_processor_state() eliminates the oops?
/Mikael
Mikael, Wed, May 14, 2003 16:03:55 +0200:
> > > Since 2.5.69-bk8 or so, apm.c will invoke restore_processor_state()
> > > at resume-time. This is needed to reinitialise the SYSENTER MSRs
> > > used by 2.5's new system call mechanism.
> >
> > and it supposed to go oops?
>
> Of course not. It doesn't oops my Dell Latitude: on that laptop it
> prevents oopses since otherwise user-space processes will oops the kernel
> as soon as they make a system call or return from a system call. But this
> only happens if both the CPU and glibc are capable of using SYSENTER.
>
I'm not sure if my glibc uses sysenter. But I'd like to have the system
prepared if I eventually get one which does.
> > > > <6>note: kapmd[4] exited with preempt_count 2
> > > This I don't like. I'm not convinced the resume path is preempt-safe.
> > > Please try again, either with CONFIG_PREEMPT disabled, or with a
> > > preempt_disable() / preempt_enable() pair around apm.c's suspend code,
> > > like in the patch below. (Untested, you may need to stick an #include
> > > <preempt.h> somewhere in apm.c to make it compile.)
> >
> > It changed things a bit. preempt_count is 3 now.
> > Oops didn't change.
>
> Ok so it wasn't preempt.
> Can you identify in which statement the oops occurs?
Not really. Somewhere around fix_processor_context+0x5f/0x100, which is
where EIP points. But latest bk doesn't have this anymore, so I think
I'll try it first.
> And can you confirm that commenting out the calls in apm.c to
> save_processor_state() and restore_processor_state() eliminates the oops?
I'll do that after I have tried it.
Alex Riesen writes:
> I'm not sure if my glibc uses sysenter. But I'd like to have the system
> prepared if I eventually get one which does.
RH9 on a P6/K7 or higher will use sysenter. Old P5s don't have it.
> not really. Somewhere fix_processor_context+0x5f/0x100, that's where EIP
> points.
I need to know your .config and gcc version if I'm to investigate this.
Otherwise I can't build a kernel similar to yours, and without that,
the EIP address you quoted is meaningless to me.
/Mikael
Mikael, Wed, May 14, 2003 16:37:58 +0200:
> Alex Riesen writes:
> > I'm not sure if my glibc uses sysenter. But I'd like to have the system
> > prepared if I eventually get one which does.
>
> RH9 on a P6/K7 or higher will use sysenter. Old P5s don't have it.
>
P5-233. Dunno how old it really is (model 8, Mobile Pentium MMX, stepping 1).
> > not really. Somewhere fix_processor_context+0x5f/0x100, that's where EIP
> > points.
>
> I need to know your .config and gcc version if I'm to investigate this.
> Otherwise I can't build a kernel similar to yours, and without that,
> the EIP address you quoted is meaningless to me.
>
Sorry. Here it goes (comments stripped):
CONFIG_X86=y
CONFIG_MMU=y
CONFIG_UID16=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_EXPERIMENTAL=y
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_SYSCTL=y
CONFIG_LOG_BUF_SHIFT=14
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
CONFIG_MODULE_FORCE_UNLOAD=y
CONFIG_OBSOLETE_MODPARM=y
CONFIG_KMOD=y
CONFIG_X86_PC=y
CONFIG_M586MMX=y
CONFIG_X86_CMPXCHG=y
CONFIG_X86_XADD=y
CONFIG_X86_L1_CACHE_SHIFT=5
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_X86_PPRO_FENCE=y
CONFIG_X86_F00F_BUG=y
CONFIG_X86_WP_WORKS_OK=y
CONFIG_X86_INVLPG=y
CONFIG_X86_BSWAP=y
CONFIG_X86_POPAD_OK=y
CONFIG_X86_ALIGNMENT_16=y
CONFIG_X86_GOOD_APIC=y
CONFIG_X86_INTEL_USERCOPY=y
CONFIG_PREEMPT=y
CONFIG_X86_TSC=y
CONFIG_X86_MSR=m
CONFIG_X86_CPUID=m
CONFIG_EDD=y
CONFIG_NOHIGHMEM=y
CONFIG_MTRR=y
CONFIG_HAVE_DEC_LOCK=y
CONFIG_PM=y
CONFIG_APM=y
CONFIG_APM_CPU_IDLE=y
CONFIG_APM_DISPLAY_BLANK=y
CONFIG_PCI=y
CONFIG_PCI_GOANY=y
CONFIG_PCI_BIOS=y
CONFIG_PCI_DIRECT=y
CONFIG_PCI_NAMES=y
CONFIG_ISA=y
CONFIG_HOTPLUG=y
CONFIG_PCMCIA=m
CONFIG_CARDBUS=y
CONFIG_I82092=m
CONFIG_I82365=m
CONFIG_PCMCIA_PROBE=y
CONFIG_KCORE_ELF=y
CONFIG_BINFMT_ELF=y
CONFIG_BINFMT_MISC=m
CONFIG_PARPORT=m
CONFIG_PARPORT_PC=m
CONFIG_PARPORT_PC_CML1=m
CONFIG_PARPORT_SERIAL=m
CONFIG_PARPORT_PC_FIFO=y
CONFIG_PARPORT_PC_SUPERIO=y
CONFIG_PARPORT_1284=y
CONFIG_PNP=y
CONFIG_PNP_NAMES=y
CONFIG_PNP_DEBUG=y
CONFIG_ISAPNP=y
CONFIG_PNPBIOS=y
CONFIG_BLK_DEV_FD=m
CONFIG_BLK_DEV_LOOP=m
CONFIG_IDE=y
CONFIG_BLK_DEV_IDE=y
CONFIG_BLK_DEV_IDEDISK=y
CONFIG_IDEDISK_MULTI_MODE=y
CONFIG_BLK_DEV_IDECS=m
CONFIG_BLK_DEV_IDECD=y
CONFIG_BLK_DEV_IDESCSI=m
CONFIG_BLK_DEV_IDEPCI=y
CONFIG_BLK_DEV_GENERIC=y
CONFIG_IDEPCI_SHARE_IRQ=y
CONFIG_BLK_DEV_IDEDMA_PCI=y
CONFIG_IDEDMA_PCI_AUTO=y
CONFIG_BLK_DEV_IDEDMA=y
CONFIG_BLK_DEV_ADMA=y
CONFIG_BLK_DEV_OPTI621=y
CONFIG_IDEDMA_AUTO=y
CONFIG_BLK_DEV_IDE_MODES=y
CONFIG_SCSI=m
CONFIG_BLK_DEV_SD=m
CONFIG_BLK_DEV_SR=m
CONFIG_CHR_DEV_SG=m
CONFIG_SCSI_MULTI_LUN=y
CONFIG_SCSI_REPORT_LUNS=y
CONFIG_SCSI_CONSTANTS=y
CONFIG_NET=y
CONFIG_PACKET=y
CONFIG_PACKET_MMAP=y
CONFIG_NETFILTER=y
CONFIG_UNIX=y
CONFIG_NET_KEY=m
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
CONFIG_INET_ECN=y
CONFIG_SYN_COOKIES=y
CONFIG_IP_NF_CONNTRACK=m
CONFIG_IP_NF_FTP=m
CONFIG_IP_NF_IRC=m
CONFIG_IP_NF_IPTABLES=m
CONFIG_IP_NF_MATCH_LIMIT=m
CONFIG_IP_NF_MATCH_MAC=m
CONFIG_IP_NF_MATCH_PKTTYPE=m
CONFIG_IP_NF_MATCH_MARK=m
CONFIG_IP_NF_MATCH_MULTIPORT=m
CONFIG_IP_NF_MATCH_TOS=m
CONFIG_IP_NF_MATCH_ECN=m
CONFIG_IP_NF_MATCH_DSCP=m
CONFIG_IP_NF_MATCH_AH_ESP=m
CONFIG_IP_NF_MATCH_LENGTH=m
CONFIG_IP_NF_MATCH_TTL=m
CONFIG_IP_NF_MATCH_TCPMSS=m
CONFIG_IP_NF_MATCH_HELPER=m
CONFIG_IP_NF_MATCH_STATE=m
CONFIG_IP_NF_MATCH_CONNTRACK=m
CONFIG_IP_NF_MATCH_UNCLEAN=m
CONFIG_IP_NF_MATCH_OWNER=m
CONFIG_IP_NF_FILTER=m
CONFIG_IP_NF_TARGET_REJECT=m
CONFIG_IP_NF_TARGET_MIRROR=m
CONFIG_IP_NF_NAT=m
CONFIG_IP_NF_NAT_NEEDED=y
CONFIG_IP_NF_TARGET_MASQUERADE=m
CONFIG_IP_NF_TARGET_REDIRECT=m
CONFIG_IP_NF_NAT_IRC=m
CONFIG_IP_NF_NAT_FTP=m
CONFIG_IP_NF_MANGLE=m
CONFIG_IP_NF_TARGET_TOS=m
CONFIG_IP_NF_TARGET_ECN=m
CONFIG_IP_NF_TARGET_DSCP=m
CONFIG_IP_NF_TARGET_MARK=m
CONFIG_IP_NF_TARGET_LOG=m
CONFIG_IP_NF_TARGET_ULOG=m
CONFIG_IP_NF_TARGET_TCPMSS=m
CONFIG_IPV6=m
CONFIG_IPV6_SCTP__=m
CONFIG_NETDEVICES=y
CONFIG_DUMMY=m
CONFIG_NET_PCMCIA=y
CONFIG_PCMCIA_3C589=m
CONFIG_PCMCIA_3C574=m
CONFIG_PCMCIA_FMVJ18X=m
CONFIG_PCMCIA_PCNET=m
CONFIG_PCMCIA_NMCLAN=m
CONFIG_PCMCIA_SMC91C92=m
CONFIG_PCMCIA_XIRC2PS=m
CONFIG_IRDA=m
CONFIG_IRLAN=m
CONFIG_IRCOMM=m
CONFIG_IRDA_ULTRA=y
CONFIG_IRDA_CACHE_LAST_LSAP=y
CONFIG_IRDA_FAST_RR=y
CONFIG_IRDA_DEBUG=y
CONFIG_IRTTY_SIR=m
CONFIG_NSC_FIR=m
CONFIG_WINBOND_FIR=m
CONFIG_TOSHIBA_OLD=m
CONFIG_TOSHIBA_FIR=m
CONFIG_SMC_IRCC_FIR=m
CONFIG_ALI_FIR=m
CONFIG_VLSI_FIR=m
CONFIG_INPUT=y
CONFIG_INPUT_MOUSEDEV=m
CONFIG_INPUT_MOUSEDEV_SCREEN_X=1024
CONFIG_INPUT_MOUSEDEV_SCREEN_Y=768
CONFIG_SOUND_GAMEPORT=y
CONFIG_SERIO=y
CONFIG_SERIO_I8042=y
CONFIG_INPUT_KEYBOARD=y
CONFIG_KEYBOARD_ATKBD=y
CONFIG_INPUT_MOUSE=y
CONFIG_MOUSE_PS2=m
CONFIG_INPUT_MISC=y
CONFIG_INPUT_PCSPKR=y
CONFIG_VT=y
CONFIG_VT_CONSOLE=y
CONFIG_HW_CONSOLE=y
CONFIG_SERIAL_8250=y
CONFIG_SERIAL_8250_CONSOLE=y
CONFIG_SERIAL_CORE=y
CONFIG_SERIAL_CORE_CONSOLE=y
CONFIG_UNIX98_PTYS=y
CONFIG_UNIX98_PTY_COUNT=256
CONFIG_RTC=y
CONFIG_EXT2_FS=y
CONFIG_EXT2_FS_XATTR=y
CONFIG_EXT3_FS=y
CONFIG_EXT3_FS_XATTR=y
CONFIG_EXT3_FS_POSIX_ACL=y
CONFIG_JBD=y
CONFIG_FS_MBCACHE=y
CONFIG_FS_POSIX_ACL=y
CONFIG_ISO9660_FS=m
CONFIG_JOLIET=y
CONFIG_ZISOFS=y
CONFIG_ZISOFS_FS=m
CONFIG_FAT_FS=m
CONFIG_MSDOS_FS=m
CONFIG_VFAT_FS=m
CONFIG_PROC_FS=y
CONFIG_DEVPTS_FS=y
CONFIG_TMPFS=y
CONFIG_RAMFS=y
CONFIG_NFS_FS=m
CONFIG_NFS_V3=y
CONFIG_LOCKD=m
CONFIG_LOCKD_V4=y
CONFIG_SUNRPC=m
CONFIG_SMB_FS=m
CONFIG_CIFS=m
CONFIG_MSDOS_PARTITION=y
CONFIG_SMB_NLS=y
CONFIG_NLS=y
CONFIG_NLS_DEFAULT="iso8859-1"
CONFIG_NLS_CODEPAGE_437=y
CONFIG_NLS_CODEPAGE_850=m
CONFIG_NLS_CODEPAGE_852=m
CONFIG_NLS_CODEPAGE_855=m
CONFIG_NLS_CODEPAGE_866=m
CONFIG_NLS_CODEPAGE_1250=m
CONFIG_NLS_CODEPAGE_1251=m
CONFIG_NLS_ISO8859_1=y
CONFIG_NLS_ISO8859_2=m
CONFIG_NLS_ISO8859_5=m
CONFIG_NLS_ISO8859_15=m
CONFIG_NLS_KOI8_R=m
CONFIG_NLS_KOI8_U=m
CONFIG_NLS_UTF8=m
CONFIG_FB=y
CONFIG_FB_VESA=y
CONFIG_VIDEO_SELECT=y
CONFIG_VGA_CONSOLE=y
CONFIG_DUMMY_CONSOLE=y
CONFIG_FRAMEBUFFER_CONSOLE=y
CONFIG_PCI_CONSOLE=y
CONFIG_FONTS=y
CONFIG_FONT_8x8=y
CONFIG_FONT_8x16=y
CONFIG_LOGO=y
CONFIG_LOGO_LINUX_CLUT224=y
CONFIG_SOUND=m
CONFIG_SND=m
CONFIG_SND_SEQUENCER=m
CONFIG_SND_OSSEMUL=y
CONFIG_SND_MIXER_OSS=m
CONFIG_SND_PCM_OSS=m
CONFIG_SND_SEQUENCER_OSS=y
CONFIG_SND_DEBUG=y
CONFIG_SND_DEBUG_DETECT=y
CONFIG_SND_DUMMY=m
CONFIG_SND_SB8=m
CONFIG_SND_SB16=m
CONFIG_SOUND_PRIME=m
CONFIG_SOUND_OSS=m
CONFIG_SOUND_TRACEINIT=y
CONFIG_SOUND_VMIDI=m
CONFIG_SOUND_SB=m
CONFIG_USB=m
CONFIG_USB_DEVICEFS=y
CONFIG_USB_BANDWIDTH=y
CONFIG_USB_OHCI_HCD=m
CONFIG_USB_PRINTER=m
CONFIG_USB_STORAGE=m
CONFIG_USB_STORAGE_DATAFAB=y
CONFIG_USB_STORAGE_FREECOM=y
CONFIG_USB_STORAGE_ISD200=y
CONFIG_USB_STORAGE_DPCM=y
CONFIG_USB_STORAGE_HP8200e=y
CONFIG_USB_STORAGE_SDDR09=y
CONFIG_USB_STORAGE_SDDR55=y
CONFIG_USB_STORAGE_JUMPSHOT=y
CONFIG_USB_HID=m
CONFIG_USB_HIDINPUT=y
CONFIG_USB_SCANNER=m
CONFIG_DEBUG_KERNEL=y
CONFIG_DEBUG_SLAB=y
CONFIG_MAGIC_SYSRQ=y
CONFIG_KALLSYMS=y
CONFIG_DEBUG_SPINLOCK_SLEEP=y
CONFIG_FRAME_POINTER=y
CONFIG_CRC32=m
CONFIG_ZLIB_INFLATE=m
CONFIG_X86_BIOS_REBOOT=y
On Wed, 14 May 2003 11:48:13 +0200, Alex Riesen wrote:
>I have an old Compaq Armada 1592DT. The thing goes automagically into
>suspend mode after being forgotten for a while. And there is this button
>to wake it up (the blue one, above the keyboard).
>
>Last time i tried to wake it up it produced the attached oops.
>"Unknown key"s are probable the blue button.
>After printing out the oops, the system went back into suspend.
>
>-alex
>
>Suspending devices
>Suspending device c03219ac
>Unable to handle kernel NULL pointer dereference at virtual address 00000090
> printing eip:
>c011459f
>*pde = 00000000
>Oops: 0000 [#1]
>CPU: 0
>EIP: 0060:[<c011459f>] Not tainted
>EFLAGS: 00010202
>EIP is at fix_processor_context+0x5f/0x100
>eax: 0000007c ebx: c5f0e000 ecx: 00000002 edx: 00000000
>esi: 00000060 edi: 00000000 ebp: c5f0ff5c esp: c5f0ff54
>ds: 007b es: 007b ss: 0068
>Process kapmd (pid: 4, threadinfo=c5f0e000 task=c5fbc640)
After receiving Alex' .config and gcc version (3.2.3), I've been
able to decipher this. current->mm is NULL in the kapmd task. The call
    load_LDT(&current->mm->context); /* This does lldt */
in fix_processor_context() computes the address of context as
(current->mm)+0x7c, which is 0x7c (the value in eax above).
load_LDT_nolock() then dereferences 0x7c+0x14 = 0x90
(void *segments = pc->ldt), which is the faulting address, and the
oops follows.
As to _why_ kapmd's current->mm is NULL, I don't know. It isn't
when I test APM suspend in 2.5.69-bk. A lot of code dereferences
current->mm without checking, so I guess current->mm==NULL is a bug.
/Mikael
Mikael, Mon, May 19, 2003 14:16:24 +0200:
> On Wed, 14 May 2003 11:48:13 +0200, Alex Riesen wrote:
> >I have an old Compaq Armada 1592DT. The thing goes automagically into
> >suspend mode after being forgotten for a while. And there is this button
> >to wake it up (the blue one, above the keyboard).
> >
> >Last time i tried to wake it up it produced the attached oops.
> >"Unknown key"s are probable the blue button.
> >After printing out the oops, the system went back into suspend.
> >
> >-alex
> >
> >Suspending devices
> >Suspending device c03219ac
> >Unable to handle kernel NULL pointer dereference at virtual address 00000090
> > printing eip:
> >c011459f
> >*pde = 00000000
> >Oops: 0000 [#1]
> >CPU: 0
> >EIP: 0060:[<c011459f>] Not tainted
> >EFLAGS: 00010202
> >EIP is at fix_processor_context+0x5f/0x100
> >eax: 0000007c ebx: c5f0e000 ecx: 00000002 edx: 00000000
> >esi: 00000060 edi: 00000000 ebp: c5f0ff5c esp: c5f0ff54
> >ds: 007b es: 007b ss: 0068
> >Process kapmd (pid: 4, threadinfo=c5f0e000 task=c5fbc640)
>
> After receiving Alex' .config and gcc version (3.2.3), I've been
> able to decipher this. current->mm is NULL in the kapmd task. The call
>
> load_LDT(&current->mm->context); /* This does lldt */
>
> in fix_processor_context() computes the address of context as
> (current->mm)+0x7c, which is 0x7c. load_LDT_nolock() dereferences
> 0x7c+0x14 (void *segments = pc->ldt) and the oops follows.
>
> As to _why_ kapmd's current->mm is NULL, I don't know. It isn't
> when I test APM suspend in 2.5.69-bk. A lot of code dereferences
> current->mm without checking, so I guess current->mm==NULL is a bug.
>
I'll go and try it with the latest -bk.
-alex
Alex Riesen, Mon, May 19, 2003 14:31:19 +0200:
> > >EIP is at fix_processor_context+0x5f/0x100
> > >Process kapmd (pid: 4, threadinfo=c5f0e000 task=c5fbc640)
> >
> > After receiving Alex' .config and gcc version (3.2.3), I've been
> > able to decipher this. current->mm is NULL in the kapmd task. The call
> >
> > load_LDT(&current->mm->context); /* This does lldt */
> >
> > in fix_processor_context() computes the address of context as
> > (current->mm)+0x7c, which is 0x7c. load_LDT_nolock() dereferences
> > 0x7c+0x14 (void *segments = pc->ldt) and the oops follows.
> >
> > As to _why_ kapmd's current->mm is NULL, I don't know. It isn't
> > when I test APM suspend in 2.5.69-bk. A lot of code dereferences
> > current->mm without checking, so I guess current->mm==NULL is a bug.
> >
>
> i just go and try it with the latest -bk.
>
No change. It still oopses.
Is it safe to trace this path with printks? I'm about to put them in,
but good advice might still arrive before the compilation finishes.
-alex
Alex Riesen, Mon, May 19, 2003 16:41:30 +0200:
> Alex Riesen, Mon, May 19, 2003 14:31:19 +0200:
> > > >EIP is at fix_processor_context+0x5f/0x100
> > > >Process kapmd (pid: 4, threadinfo=c5f0e000 task=c5fbc640)
> > >
> > > After receiving Alex' .config and gcc version (3.2.3), I've been
> > > able to decipher this. current->mm is NULL in the kapmd task. The call
> > >
> > > load_LDT(&current->mm->context); /* This does lldt */
> > >
> > > in fix_processor_context() computes the address of context as
> > > (current->mm)+0x7c, which is 0x7c. load_LDT_nolock() dereferences
> > > 0x7c+0x14 (void *segments = pc->ldt) and the oops follows.
> > >
> > > As to _why_ kapmd's current->mm is NULL, I don't know. It isn't
> > > when I test APM suspend in 2.5.69-bk. A lot of code dereferences
> > > current->mm without checking, so I guess current->mm==NULL is a bug.
> > >
> >
> > i just go and try it with the latest -bk.
> >
>
> no change. Still oopses.
>
> Is it safe to trace this path with printks? I'm about to put them in,
> but a good advice could probably come before the compilation finishes.
>
current->mm is NULL even before save_processor_state() is called.
The unlucky wakeup afterwards made the system unstable:
Unable to handle kernel NULL pointer dereference at virtual address 000003ff
printing eip:
c0180015
*pde = 00000000
Oops: 0002 [#2]
CPU: 0
EIP: 0060:[<c0180015>] Not tainted
EFLAGS: 00010297
EIP is at ext3_get_inode_loc+0xf5/0x180
eax: 000003ff ebx: 00000300 ecx: c5ae8604 edx: c12ee9a0
esi: c5ebb200 edi: 00000260 ebp: c5af1c90 esp: c5af1c78
ds: 007b es: 007b ss: 0068
Process syslogd (pid: 171, threadinfo=c5af0000 task=c5bf38c0)
Stack: 00000016 00000013 00026007 00000000 c58cc3c4 c5af1cc4 c5af1cb0 c0180b1d
c5ae8604 c5af1cc3 c5af1cc4 c5af1cc4 c5ae8604 c58cc3c4 c5af1ce0 c0180bba
c58cc3c4 c5ae8604 c5af1cc4 c5af1ce0 c018ab09 c5ee1a80 c58cc3c4 c58cc3c4
Call Trace:
[<c0180b1d>] ext3_reserve_inode_write+0x1d/0xa0
[<c0180bba>] ext3_mark_inode_dirty+0x1a/0x40
[<c018ab09>] journal_start+0x89/0xb0
[<c0180c97>] ext3_dirty_inode+0xb7/0xc0
[<c01688a7>] __mark_inode_dirty+0xf7/0x100
[<c0162e78>] inode_update_time+0x68/0xa0
[<c0132567>] generic_file_aio_write_nolock+0x207/0xac0
[<c020573b>] __kfree_skb+0x7b/0xf0
[<c0252841>] unix_dgram_recvmsg+0x141/0x1f0
[<c0132e8d>] generic_file_write_nolock+0x6d/0x90
[<c0203333>] sys_recvfrom+0x83/0xe0
[<c0130b4d>] unlock_page+0xd/0x50
[<c013d023>] do_wp_page+0x3c3/0x420
[<c015bc2a>] poll_freewait+0x3a/0x50
[<c013307d>] generic_file_writev+0x3d/0x60
[<c014aeb3>] do_readv_writev+0x143/0x270
[<c014aa10>] do_sync_write+0x0/0xb0
[<c014b07b>] vfs_writev+0x4b/0x50
[<c014b0fe>] sys_writev+0x2e/0x50
[<c0109187>] syscall_call+0x7/0xb
Code: 89 10 8b 4a 18 01 cb 89 58 04 8b 55 ec 89 50 08 31 c0 e9 58
There were more oopses, more or less like this one. This was the first.
On Mon, 19 May 2003, Mikael wrote:
> As to _why_ kapmd's current->mm is NULL, I don't know. It isn't
> when I test APM suspend in 2.5.69-bk. A lot of code dereferences
> current->mm without checking, so I guess current->mm==NULL is a bug.
For kernel threads (as is the case with kapmd) current->mm would be NULL.
Zwane
--
function.linuxpower.ca
Alex Riesen wrote:
> Alex Riesen, Mon, May 19, 2003 14:31:19 +0200:
>
>>>>EIP is at fix_processor_context+0x5f/0x100
>>>>Process kapmd (pid: 4, threadinfo=c5f0e000 task=c5fbc640)
>>>
>>>After receiving Alex' .config and gcc version (3.2.3), I've been
>>>able to decipher this. current->mm is NULL in the kapmd task. The call
>>>
>>> load_LDT(&current->mm->context); /* This does lldt */
>>>
>>>in fix_processor_context() computes the address of context as
>>>(current->mm)+0x7c, which is 0x7c. load_LDT_nolock() dereferences
>>>0x7c+0x14 (void *segments = pc->ldt) and the oops follows.
>>>
>>>As to _why_ kapmd's current->mm is NULL, I don't know. It isn't
>>>when I test APM suspend in 2.5.69-bk. A lot of code dereferences
>>>current->mm without checking, so I guess current->mm==NULL is a bug.
>>
>>i just go and try it with the latest -bk.
>
> no change. Still oopses.
Could you try to compile with gcc-3.3? In another thread (2.5.69-mm6:
pccard oops) this helped IIRC. I'm suspecting gcc 3.2.3 generates
incorrect code for some cases.
Regards,
Carl-Daniel
--
http://www.hailfinger.org/
Carl-Daniel Hailfinger, Tue, May 20, 2003 12:39:54 +0200:
> >>>>EIP is at fix_processor_context+0x5f/0x100
> >>>>Process kapmd (pid: 4, threadinfo=c5f0e000 task=c5fbc640)
> >>>
> >>>After receiving Alex' .config and gcc version (3.2.3), I've been
> >>>able to decipher this. current->mm is NULL in the kapmd task. The call
> >>>
> >>> load_LDT(&current->mm->context); /* This does lldt */
> >>>
> >>>in fix_processor_context() computes the address of context as
> >>>(current->mm)+0x7c, which is 0x7c. load_LDT_nolock() dereferences
> >>>0x7c+0x14 (void *segments = pc->ldt) and the oops follows.
> >>>
> >>>As to _why_ kapmd's current->mm is NULL, I don't know. It isn't
> >>>when I test APM suspend in 2.5.69-bk. A lot of code dereferences
> >>>current->mm without checking, so I guess current->mm==NULL is a bug.
> >>
> >>i just go and try it with the latest -bk.
> >
> > no change. Still oopses.
>
> Could you try to compile with gcc-3.3? In another thread (2.5.69-mm6:
> pccard oops) this helped IIRC. I'm suspecting gcc 3.2.3 generates
> incorrect code for some cases.
I'd rather not. Zwane already mentioned that current->mm is allowed to
be NULL for kernel threads, and I have seen it be NULL even before suspending.
-alex
Shouldn't this just use active_mm? Can somebody test?
By the way, I saw this with a 486 kernel compiled by
gcc version 2.96 20000731 (Red Hat Linux 7.3 2.96-110)
on a Toshiba 2105 (aka 2100 +- sw) 486DX2/50, although I am not
at that computer presently to test.
milton
===== arch/i386/kernel/suspend.c 1.15 vs edited =====
--- 1.15/arch/i386/kernel/suspend.c Sat May 10 09:24:02 2003
+++ edited/arch/i386/kernel/suspend.c Tue May 20 11:26:18 2003
@@ -114,7 +114,7 @@
 	cpu_gdt_table[cpu][GDT_ENTRY_TSS].b &= 0xfffffdff;
 	load_TR_desc();				/* This does ltr */
-	load_LDT(&current->mm->context);	/* This does lldt */
+	load_LDT(&current->active_mm->context);	/* This does lldt */
 	/*
 	 * Now maybe reload the debug registers
Milton Miller, Tue, May 20, 2003 18:34:09 +0200:
> Shouldn't this just use active_mm? Can somebody test?
It helped.
> by the way, I saw this with a 486 kernel compiled by
> gcc version 2.96 20000731 (Red Hat Linux 7.3 2.96-110)
>
> on a Toshiba 2105 (aka 2100 +- sw) 486DX2/50, although I am not
> at that computer presenlty to test.
>
Also, the stability problems I mentioned before are gone.
-alex
Alex Riesen, Tue, May 20, 2003 19:00:54 +0200:
> Milton Miller, Tue, May 20, 2003 18:34:09 +0200:
> > Shouldn't this just use active_mm? Can somebody test?
>
> It helped.
>
> > by the way, I saw this with a 486 kernel compiled by
> > gcc version 2.96 20000731 (Red Hat Linux 7.3 2.96-110)
> >
> > on a Toshiba 2105 (aka 2100 +- sw) 486DX2/50, although I am not
> > at that computer presenlty to test.
> >
>
> Also the stability problems I mentioned before gone.
>
The last sentence is not true:
Unable to handle kernel paging request at virtual address 8bd6c008
printing eip:
c0180010
*pde = 00000000
Oops: 0000 [#1]
CPU: 0
EIP: 0060:[<c0180010>] Not tainted
EFLAGS: 00010216
EIP is at ext3_get_inode_loc+0xc0/0x180
eax: c5eb6000 ebx: 00066080 ecx: 0000000c edx: 00000066
esi: c5ebbe00 edi: 00000100 ebp: c5c89e20 esp: c5c89e08
ds: 007b es: 007b ss: 0068
Process find (pid: 22, threadinfo=c5c88000 task=c5e7e080)
Stack: 00000cc1 00000008 c5c89e30 c49bb984 c5ebbe00 c49bb900 c5c89e58 c018018d
c49bb984 c5c89e3c c5c89e54 c0162920 c5ebbe00 c10fe4f4 000209c2 c10fe4f4
000209c2 c49bb984 c5ebbe00 c56ad0a0 c5c89e74 c018215e c49bb984 c20861c8
Call Trace:
[<c018018d>] ext3_read_inode+0x2d/0x3c0
[<c0162920>] iget_locked+0x90/0xc0
[<c018215e>] ext3_lookup+0x12e/0x140
[<c01570cc>] real_lookup+0xac/0xd0
[<c015739e>] do_lookup+0x6e/0x80
[<c015783a>] link_path_walk+0x48a/0x900
[<c01b74be>] write_chan+0x16e/0x220
[<c0156cd8>] getname+0x78/0xc0
[<c01580e2>] __user_walk+0x32/0x50
[<c01533f7>] vfs_lstat+0x17/0x50
[<c01539d4>] sys_lstat64+0x14/0x30
[<c014aba2>] vfs_write+0xb2/0xf0
[<c014ac5e>] sys_write+0x2e/0x50
[<c0109187>] syscall_call+0x7/0xb
Code: 8b 4c 00 08 01 ca 89 55 f0 8b 46 0c 50 52 8b 86 8c 00 00 00
It is harder to trigger, but possible.
I booted with init=/bin/bash. Then I started this:
find / -type f -fprint /dev/stderr -print | xargs cat > /dev/null
and began going into suspend mode and back.
At some point it broke with the oops above.
-alex
This isn't the first time crashes have been seen suspending when hardly
any memory has been used. With the 2.4 swsusp project, we've seen
crashes appear when trying to suspend with init=/bin/bash that don't
appear when more memory has been used. Perhaps there's a common issue
here.
Regards,
Nigel
On Wed, 2003-05-21 at 05:17, Alex Riesen wrote:
> It is harder to trigger, but possible.
> I booted with init=/bin/bash. Than I started this
> find / -type f -fprint /dev/stderr -print | xargs cat > /dev/null
> and began going in suspend mode and back.
>
> At some point it broke with oops above.
>
> -alex
--
Nigel Cunningham
495 St Georges Road South, Hastings 4201, New Zealand
Be diligent to present yourself approved to God as a workman who does
not need to be ashamed, handling accurately the word of truth.
-- 2 Timothy 2:14, NASB.
On Wed, 21 May 2003, Nigel Cunningham wrote:
> On Wed, 2003-05-21 at 05:17, Alex Riesen wrote:
> > It is harder to trigger, but possible.
> > I booted with init=/bin/bash. Than I started this
> > find / -type f -fprint /dev/stderr -print | xargs cat > /dev/null
> > and began going in suspend mode and back.
> >
> > At some point it broke with oops above.
Very nice, I'll try it and see what dies.
Zwane
--
function.linuxpower.ca