2001-03-22 17:14:37

by Camm Maguire

Subject: PROBLEM: 2.2.18 oops leaves umount hung in disk sleep

Please reply directly as I'm in ECN exile from this mailing list!


[1.] One line summary of the problem:

2.2.18 oops leaves umount hung in disk sleep

[2.] Full description of the problem/report:

Greetings! We have a backup server running 2.2.18,
nfs-kernel-server 0.1.9.1-1 (Debian), mount 2.10f-5.1, autofs
3.1.4-9. Two nights ago, the kernel had an oops when autofs
tried to umount an nfs-mounted filesystem, leaving the umount
process in an uninterruptible state.

15655 ? D 0:00 /bin/umount /mnt/i19

[3.] Keywords (i.e., modules, networking, kernel):

nfs, autofs, mount

[4.] Kernel version (from /proc/version):

intech9# cat /proc/version

Linux version 2.2.18-i586tsc (root@intech19) (gcc version 2.95.2
20000220 (Debian GNU/Linux)) #1 Tue Feb 13 14:31:34 EST 2001


[5.] Output of Oops.. message (if applicable) with symbolic information
resolved (see Documentation/oops-tracing.txt)

intech9# cat oo.txt

Unable to handle kernel NULL pointer dereference at virtual address 00000000
current->tss.cr3 = 02872000, %%cr3 = 02872000
*pde = 00000000
Oops: 0000
CPU: 0

intech9# ksymoops <oo.txt

ksymoops 2.3.4 on i586 2.2.18-i586tsc. Options used
-V (default)
-k /proc/ksyms (default)
-l /proc/modules (default)
-o /lib/modules/2.2.18-i586tsc/ (default)
-m /boot/System.map-2.2.18-i586tsc (default)

Warning: You did not tell me where to find symbol information. I
will assume that the log matches the kernel and modules that are
running right now and I'll use the default options above for
symbol resolution. If the current kernel and/or modules do not
match the log, you can get more accurate output by telling me the
kernel version and where to find map, modules, ksyms etc.
ksymoops -h explains the options.

Warning (compare_maps): ksyms_base symbol
module_list_R__ver_module_list not found in System.map. Ignoring
ksyms_base entry

Unable to handle kernel NULL pointer dereference at virtual
address 00000000 current->tss.cr3 = 02872000, %%cr3 = 02872000
*pde = 00000000 Oops: 0000 CPU: 0

2 warnings issued. Results may not be reliable.



[6.] A small shell script or example program which triggers the
problem (if possible)

NA

[7.] Environment
[7.1.] Software (add the output of the ver_linux script here)

intech9# sh ./ver_linux
sh ./ver_linux
-- Versions installed: (if some fields are empty or looks
-- unusual then possibly you have very old versions)
Linux intech9 2.2.18-i586tsc #1 Tue Feb 13 14:31:34 EST 2001 i586 unknown
Kernel modules 2.4.1
Gnu C 2.95.2
Binutils 2.9.5.0.37
Linux C Library 2.1.3
Dynamic linker ldd: version 1.9.11
Procps 2.0.6
Mount 2.10f
Net-tools 2.05
Console-tools 0.2.3
Sh-utils 2.0
Modules Loaded nls_cp437 smbfs st ide-scsi scsi_mod nfsd parport_probe parport_pc lp parport nfs autofs lockd sunrpc ne2k-pci 8390 serial ide-disk ide-probe ide-mod


[7.2.] Processor information (from /proc/cpuinfo):

intech9# cat /proc/cpuinfo
cat /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 5
model : 8
model name : AMD-K6(tm) 3D processor
stepping : 12
cpu MHz : 451.019
cache size : 64 KB
fdiv_bug : no
hlt_bug : no
sep_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr mce cx8 sep mtrr pge mmx 3dnow
bogomips : 901.12



[7.3.] Module information (from /proc/modules):

intech9# cat /proc/modules
cat /proc/modules
nls_cp437 3920 0 (autoclean)
smbfs 26144 0 (autoclean)
st 24160 0 (autoclean)
ide-scsi 7424 0 (autoclean)
scsi_mod 38160 2 (autoclean) [st ide-scsi]
nfsd 161952 8 (autoclean)
parport_probe 3408 0 (autoclean)
parport_pc 7408 1 (autoclean)
lp 5296 0 (autoclean) (unused)
parport 7584 1 (autoclean) [parport_probe parport_pc lp]
nfs 44480 0 (autoclean)
autofs 9136 4 (autoclean)
lockd 43312 1 (autoclean) [nfsd nfs]
sunrpc 58768 1 (autoclean) [nfsd nfs lockd]
ne2k-pci 4176 1 (autoclean)
8390 6144 0 (autoclean) [ne2k-pci]
serial 18016 0 (autoclean)
ide-disk 5736 8 (autoclean)
ide-probe 6116 0 (autoclean)
ide-mod 43384 8 (autoclean) [ide-scsi ide-disk ide-probe]


[7.4.] SCSI information (from /proc/scsi/scsi)

intech9# cat /proc/scsi/scsi
cat /proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
Vendor: CONNER Model: CTT8000-A Rev: 1.17
Type: Sequential-Access ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 01 Lun: 00
Vendor: HP Model: COLORADO 14GB Rev: 4.00
Type: Sequential-Access ANSI SCSI revision: 02
intech9#



[7.5.] Other information that might be relevant to the problem
(please look in /proc and include all information that you
think to be relevant):

intech9# cat /proc/15655/status
cat /proc/15655/status
Name: umount
State: D (disk sleep)
Pid: 15655
PPid: 305
Uid: 0 0 0 0
Gid: 0 0 0 0
Groups:
VmSize: 1024 kB
VmLck: 0 kB
VmRSS: 408 kB
VmData: 32 kB
VmStk: 8 kB
VmExe: 36 kB
VmLib: 924 kB
SigPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 8000000000000000
SigCgt: 0000000000000000
CapInh: 0000000000000000
CapPrm: 00000000fffffeff
CapEff: 00000000fffffeff
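
For anyone wanting to check for other tasks stuck the same way, here is a minimal sketch that scans the status files under /proc for processes in state D. It assumes the "Key: value" layout shown in the status output above. The function names and the SAMPLE text are illustrative only and not part of any existing tool.

```python
import glob


def parse_status(text):
    """Turn the 'Key: value' lines of a /proc/<pid>/status file into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def is_uninterruptible(fields):
    """True when the State field starts with 'D' (disk sleep)."""
    return fields.get("State", "").startswith("D")


def find_stuck_processes(pattern="/proc/[0-9]*/status"):
    """Return sorted (pid, name) pairs for every task currently in state D."""
    stuck = []
    for path in glob.glob(pattern):
        try:
            with open(path) as handle:
                fields = parse_status(handle.read())
        except OSError:
            continue  # the process exited while we were scanning
        if is_uninterruptible(fields) and fields.get("Pid", "").isdigit():
            stuck.append((int(fields["Pid"]), fields.get("Name", "?")))
    return sorted(stuck)


# Abbreviated copy of the status output quoted above (hypothetical sample text).
SAMPLE = "Name: umount\nState: D (disk sleep)\nPid: 15655\nPPid: 305\n"

if __name__ == "__main__":
    print(is_uninterruptible(parse_status(SAMPLE)))
    for pid, name in find_stuck_processes():
        print(pid, name)
```

A task that stays in this list across repeated runs is blocked inside the kernel and cannot be killed, which matches the hung umount here.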

Here are the syslog excerpts:

Mar 21 01:09:01 intech9 automount[15648]: running expiration on path /mnt
Mar 21 01:09:01 intech9 automount[15648]: expired /mnt/local
Mar 21 01:09:01 intech9 automount[15648]: expired /mnt/i19d
Mar 21 01:09:47 intech9 automount[305]: >> umount: /mnt/i19: device is busy
Mar 21 01:09:47 intech9 automount[305]: using kernel protocol version 3 on reawaken
Mar 21 01:11:03 intech9 automount[305]: >> umount: /mnt/i19: device is busy
Mar 21 01:11:03 intech9 automount[305]: using kernel protocol version 3 on reawaken
Mar 21 01:12:18 intech9 automount[305]: >> umount: /mnt/i19: device is busy
Mar 21 01:12:18 intech9 automount[305]: using kernel protocol version 3 on reawaken
Mar 21 01:13:33 intech9 automount[305]: >> umount: /mnt/i19: device is busy
Mar 21 01:13:33 intech9 automount[305]: using kernel protocol version 3 on reawaken
Mar 21 01:14:01 intech9 automount[15653]: running expiration on path /mnt
Mar 21 01:14:01 intech9 automount[15653]: expired /mnt/local
Mar 21 01:14:01 intech9 automount[15653]: expired /mnt/i19d
Mar 21 01:14:49 intech9 kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000000
Mar 21 01:14:49 intech9 kernel: current->tss.cr3 = 02872000, %%cr3 = 02872000
Mar 21 01:14:49 intech9 kernel: *pde = 00000000
Mar 21 01:14:49 intech9 kernel: Oops: 0000
Mar 21 01:14:49 intech9 kernel: CPU: 0
Mar 21 01:14:49 intech9 automount[305]: using kernel protocol version 3 on reawaken

The nfs host i19 does not show the client as having an active mount at
present.




[X.] Other notes, patches, fixes, workarounds:


Andrea's vmglobal patch applied to kernel. PIII SSE enabling
patch also applied.



--
Camm Maguire [email protected]
==========================================================================
"The earth is but one country, and mankind its citizens." -- Baha'u'llah


2001-03-22 17:46:27

by Giuliano Pochini

Subject: 4GB -> no boot



If I compile the kernel with 4GB support, it doesn't
boot. The kernel starts fine, but the boot sequence
ends with an error about modprobe that repeats
endlessly.
With the 1GB setting it works fine, but it doesn't use
128MB of very useful memory.
Suggestions?


[Dual-800, 1GB, 2.4.2, egcs-2.91.66, modutils 2.4.2]
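
On the missing 128MB: a likely explanation, assuming the default i386 layout (3G/1G user/kernel split and a 128MB reserve for vmalloc-style mappings), is that a kernel built without high-memory support can directly map only 896MB of RAM. The arithmetic is sketched below. The constant and function names are illustrative, and the values are assumptions about the default configuration rather than something read from this machine.

```python
MIB = 1 << 20

# Assumed defaults for i386: kernel addresses start at 3GB, and the top
# 128MB of kernel address space is reserved for vmalloc and similar uses.
PAGE_OFFSET = 0xC0000000
VMALLOC_RESERVE = 128 * MIB


def lowmem_limit(page_offset=PAGE_OFFSET, reserve=VMALLOC_RESERVE):
    """Bytes of RAM that can be mapped directly without high-memory support."""
    kernel_address_space = (1 << 32) - page_offset
    return kernel_address_space - reserve


if __name__ == "__main__":
    ram = 1024 * MIB
    usable = min(ram, lowmem_limit())
    print("usable MB:", usable // MIB)          # 896
    print("unused MB:", (ram - usable) // MIB)  # 128
```

Under those assumptions, the unused amount on a 1GB machine comes out to exactly the 128MB reported.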


Bye.
Giuliano Pochini ->)|(<- Shiny Network {AS6665} ->)|(<-

2001-03-22 18:44:58

by Trond Myklebust

Subject: Re: PROBLEM: 2.2.18 oops leaves umount hung in disk sleep

>>>>> " " == Camm Maguire <[email protected]> writes:

> 2.2.18 oops leaves umount hung in disk sleep

This is normal behaviour for an Oops ;-)

> Unable to handle kernel NULL pointer dereference at
> virtual address 00000000
> current->tss.cr3 = 02872000, %%cr3 = 02872000
> *pde = 00000000 Oops: 0000 CPU: 0

> intech9# ksymoops <oo.txt

> ksymoops 2.3.4 on i586 2.2.18-i586tsc. Options used -V
> (default) -k /proc/ksyms (default) -l /proc/modules
> (default) -o /lib/modules/2.2.18-i586tsc/ (default) -m
> /boot/System.map-2.2.18-i586tsc (default)

> Warning: You did not tell me where to find symbol
> information. I will assume that the log matches the
> kernel and modules that are running right now and I'll use
> the default options above for symbol resolution. If the
> current kernel and/or modules do not match the log, you
> can get more accurate output by telling me the kernel
> version and where to find map, modules, ksyms etc.
> ksymoops -h explains the options.

> Warning (compare_maps): ksyms_base symbol
> module_list_R__ver_module_list not found in System.map.
> Ignoring ksyms_base entry

> Unable to handle kernel NULL pointer dereference at
> virtual address 00000000 current->tss.cr3 = 02872000,
> %%cr3 = 02872000 *pde = 00000000 Oops: 0000 CPU: 0

Do you have the full ksymoops decode available? The above is somewhat
minimal.

Also please could you try to duplicate the problem with a standard
autofs v3 daemon? I'm not sure that the v4 'automount' is quite as
well tested as the v3 daemon (it still seems to be in beta).

Cheers,
Trond

2001-03-22 19:40:27

by Camm Maguire

Subject: Re: PROBLEM: 2.2.18 oops leaves umount hung in disk sleep

Greetings, and thanks for your reply!

Trond Myklebust <[email protected]> writes:

> >>>>> " " == Camm Maguire <[email protected]> writes:
>
> > 2.2.18 oops leaves umount hung in disk sleep
>
> This is normal behaviour for an Oops ;-)
>
> > Unable to handle kernel NULL pointer dereference at
> > virtual address 00000000
> > current->tss.cr3 = 02872000, %%cr3 = 02872000
> > *pde = 00000000 Oops: 0000 CPU: 0
>
> > intech9# ksymoops <oo.txt
>
> > Unable to handle kernel NULL pointer dereference at
> > virtual address 00000000 current->tss.cr3 = 02872000,
> > %%cr3 = 02872000 *pde = 00000000 Oops: 0000 CPU: 0
>
> Do you have the full ksymoops decode available? The above is somewhat
> minimal.
>

I'd be happy to generate one if I could. I've got the system map.
The defaults reported by ksymoops are all correct. Don't know why it
didn't give me more info. Normally, the info is reported by klogd
anyway, but not here. I've sent you all I currently have. If you can
suggest how I can get more, would be glad to do so.

Take care,

> Also please could you try to duplicate the problem with a standard
> autofs v3 daemon? I'm not sure that the v4 'automount' is quite as
> well tested as the v3 daemon (it still seems to be in beta).
>

I thought I was running v3. Can't seem to find anything now which
indicates the protocol version in use, but was under the impression
that v4 was only an option in 2.4.x, no?

Also, the system is in general *very* stable. It will take quite a
while for this to resurface, I'd guess.

Take care,

> Cheers,
> Trond
>
>

--
Camm Maguire [email protected]
==========================================================================
"The earth is but one country, and mankind its citizens." -- Baha'u'llah

2001-03-22 22:10:29

by Trond Myklebust

Subject: Re: PROBLEM: 2.2.18 oops leaves umount hung in disk sleep

>>>>> " " == Camm Maguire <[email protected]> writes:

> I'd be happy to generate one if I could. I've got the system
> map. The defaults reported by ksymoops are all correct. Don't
> know why it didn't give me more info. Normally, the info is
> reported by klogd anyway, but not here. I've sent you all I
> currently have. If you can suggest how I can get more, would
> be glad to do so.


Unless you happen to have a dump from 'dmesg', there's probably not
much you can do to recover the rest of the Oops...

We need at least the line 'EIP:' if we're to find out where the fault
occurred. Are you certain that it can't be found in the syslog?

> I thought I was running v3. Can't seem to find anything now
> which indicates the protocol version in use, but was under the
> impression that v4 was only an option in 2.4.x, no?


Mar 21 01:14:49 intech9 automount[305]: using kernel protocol version 3 on reawaken

Sorry, the above message fooled me.


Cheers,
Trond

2001-03-23 01:46:44

by Camm Maguire

Subject: Re: PROBLEM: 2.2.18 oops leaves umount hung in disk sleep

Greetings! Here are the contiguous lines from kern.log:

Mar 21 01:14:47 intech9 kernel: eth0: bogus packet: status=0x80 nxpg=0x57 size=1270
Mar 21 01:14:49 intech9 kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000000
Mar 21 01:14:49 intech9 kernel: current->tss.cr3 = 02872000, %%cr3 = 02872000
Mar 21 01:14:49 intech9 kernel: *pde = 00000000
Mar 21 01:14:49 intech9 kernel: Oops: 0000
Mar 21 01:14:49 intech9 kernel: CPU: 0
Mar 22 12:30:08 intech9 kernel: klogd 1.3-3#33.1, log source = /proc/kmsg started.

Why would this have not been included, would you happen to know? In
any case, I understand that it's pretty much impossible to debug now,
right? dmesg wrapped around by the time I got to it (I seem to be
having a lot of ethernet bogus packet messages, as shown above. I've
chalked this up to the heavy traffic during the amanda backup, but
maybe something is wrong here too/instead?)

Thanks again!

Trond Myklebust <[email protected]> writes:

> >>>>> " " == Camm Maguire <[email protected]> writes:
>
> > I'd be happy to generate one if I could. I've got the system
> > map. The defaults reported by ksymoops are all correct. Don't
> > know why it didn't give me more info. Normally, the info is
> > reported by klogd anyway, but not here. I've sent you all I
> > currently have. If you can suggest how I can get more, would
> > be glad to do so.
>
>
> Unless you happen to have a dump from 'dmesg', there's probably not
> much you can do to recover the rest of the Oops...
>
> We need at least the line 'EIP:' if we're to find out where the fault
> occurred. Are you certain that it can't be found in the syslog?
>
> > I thought I was running v3. Can't seem to find anything now
> > which indicates the protocol version in use, but was under the
> > impression that v4 was only an option in 2.4.x, no?
>
>
> Mar 21 01:14:49 intech9 automount[305]: using kernel protocol version 3 on reawaken
>
> Sorry, the above message fooled me.
>
>
> Cheers,
> Trond
>
>

--
Camm Maguire [email protected]
==========================================================================
"The earth is but one country, and mankind its citizens." -- Baha'u'llah

2001-03-27 15:16:53

by Camm Maguire

Subject: Re: PROBLEM: 2.2.18 oops leaves umount hung in disk sleep

Hello again! We're in luck! The oops happened again, and this time,
the full oops dump appeared on the screen, which I have copied below:

=============================================================================
Unable to handle kernel paging request at virtual address 6e617274
current->tss.c3 = 03d06000, %cr3 = 03d06000
*pde = 00000000
Oops = 0
CPU = 0
EIP = 0010:[<c012facf>]
EFLAGS = 00010a87
eax = 00000000 ebx = 6e617274 ecx = 6e617274 edx = 00000006
esi = 00000006 edi = c3bda800 ebp = 00000006 esp = c2381f5c
ds = 0018 es = 0018 ss = 0018
Process umount (pid:6942, process nr:58,stackpage = c2381000)
Stack: fffffffe c01281a2 c3bda800 00000006 c25db80c 00000006 fffffffa c01282e8
00000006 00000000 00000000 00000000 08050006 c0b176e0 08054208 00000000
c01283e3 00000006 00000000 c2380000 08054209 0804fa20 c01283fc 08054208
Call Trace: [<c01281a2>][<c01282e8>][<c01283e3>][<c01283fc>][<c01094fc>]
code: 8b 1b 39 79 34 75 ef 8b 41 04 8b 11 89 42 04 89 10 a1 e4 4d
=============================================================================
and through ksymoops:
=============================================================================
intech9# ksymoops</home/camm/oops
ksymoops</home/camm/oops
ksymoops 2.3.4 on i586 2.2.18-i586tsc. Options used
-V (default)
-k /proc/ksyms (default)
-l /proc/modules (default)
-o /lib/modules/2.2.18-i586tsc/ (default)
-m /boot/System.map-2.2.18-i586tsc (default)

Warning: You did not tell me where to find symbol information. I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc. ksymoops -h explains the options.

Warning (compare_maps): ksyms_base symbol module_list_R__ver_module_list not found in System.map. Ignoring ksyms_base entry
Unable to handle kernel paging request at virtual address 6e617274
current->tss.c3 = 03d06000, %cr3 = 03d06000
*pde = 00000000
Stack: fffffffe c01281a2 c3bda800 00000006 c25db80c 00000006 fffffffa c01282e8
00000006 00000000 00000000 00000000 08050006 c0b176e0 08054208 00000000
c01283e3 00000006 00000000 c2380000 08054209 0804fa20 c01283fc 08054208
Call Trace: [<c01281a2>][<c01282e8>][<c01283e3>][<c01283fc>][<c01094fc>]
code: 8b 1b 39 79 34 75 ef 8b 41 04 8b 11 89 42 04 89 10 a1 e4 4d
Using defaults from ksymoops -t elf32-i386 -a i386

Trace; c01281a2 <do_umount+5a/144>
Trace; c01282e8 <umount_dev+5c/ac>
Trace; c01283e3 <sys_umount+ab/b8>
Trace; c01283fc <sys_oldumount+c/10>
Trace; c01094fc <system_call+34/38>
Code; 00000000 Before first symbol
00000000 <_EIP>:
Code; 00000000 Before first symbol
0: 8b 1b mov (%ebx),%ebx
Code; 00000002 Before first symbol
2: 39 79 34 cmp %edi,0x34(%ecx)
Code; 00000005 Before first symbol
5: 75 ef jne fffffff6 <_EIP+0xfffffff6> fffffff6 <END_OF_CODE+3b764797/????>
Code; 00000007 Before first symbol
7: 8b 41 04 mov 0x4(%ecx),%eax
Code; 0000000a Before first symbol
a: 8b 11 mov (%ecx),%edx
Code; 0000000c Before first symbol
c: 89 42 04 mov %eax,0x4(%edx)
Code; 0000000f Before first symbol
f: 89 10 mov %edx,(%eax)
Code; 00000011 Before first symbol
11: a1 e4 4d 00 00 mov 0x4de4,%eax


2 warnings issued. Results may not be reliable.
=============================================================================

As before, the oops was truncated in the log.
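
One detail worth noting in the register dump: the faulting address 6e617274 (also the value in ebx and ecx) consists entirely of printable ASCII bytes. The sketch below decodes a 32-bit register value as little-endian text, which is how the bytes would sit in memory on i386. Reading this as "a pointer was overwritten with string data" is only a guess, not a diagnosis, though it would be consistent with the corruption possibility Trond raises in the quoted text below. The function name is illustrative.

```python
import struct


def as_ascii(value):
    """Return the 4 bytes of a 32-bit value (little-endian) as text,
    or None if any byte is not printable ASCII."""
    raw = struct.pack("<I", value)
    if all(32 <= byte < 127 for byte in raw):
        return raw.decode("ascii")
    return None


if __name__ == "__main__":
    # Register values copied from the oops above.
    for name, value in [("ebx", 0x6E617274), ("edi", 0xC3BDA800), ("edx", 0x00000006)]:
        print(name, hex(value), as_ascii(value))
```

Only the ebx/ecx value decodes to text ("tran"); edi looks like an ordinary kernel address and edx like a small integer.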

I received two eth0 (ne2k-pci,using 8390) errors before this oops:
eth0: next frame inconsistency, 0xf2
eth0: next frame inconsistency, 0xb8

And then one eth0 error after the oops before the machine died:
eth0: bogus packet: status=0x80 nxpg=0x7b size=1518

I will be trying to recompile with gcc272 to see if anything changes.
In the meantime, I'd greatly appreciate any insights!

Take care,

Trond Myklebust <[email protected]> writes:

> >>>>> " " == Camm Maguire <[email protected]> writes:
>
> > Greetings! Here are the contiguous lines from kern.log: Mar 21
> > 01:14:47 intech9 kernel: eth0: bogus packet: status=0x80
> > nxpg=0x57 size=1270 Mar 21 01:14:49 intech9 kernel: Unable to
> > handle kernel NULL pointer dereference at virtual address
> > 00000000 Mar 21 01:14:49 intech9 kernel: current->tss.cr3 =
> > 02872000, %%cr3 = 02872000 Mar 21 01:14:49 intech9 kernel: *pde
> > = 00000000 Mar 21 01:14:49 intech9 kernel: Oops: 0000 Mar 21
> > 01:14:49 intech9 kernel: CPU: 0 Mar 22 12:30:08 intech9 kernel:
> > klogd 1.3-3#33.1, log source = /proc/kmsg started.
>
> > Why would this have not been included, would you happen to
> > know? In any case, I understand that its pretty much
>
> I've no idea why it wasn't logged. Did you possibly reboot without
> syncing the disk?
>
> > impossible to debug now, right? dmesg wrapped around by the
> > time I got to it (I seem to be having a lot of ethernet bogus
> > packet messages, as shown above. I've chalked this up to the
> > heavy traffic during the amanda backup, but maybe something is
> > wrong here too/instead?)
>
> Have you tried to use an older version of gcc? AFAIK gcc-2.95.2 has a
> lot of bugs that are known to cause problems with the kernel. If you
> are having additional problems such as the bogus ethernet packets,
> then it might be worth your while to experiment a bit to see whether
> this might be some corruption problem.
>
> Cheers,
> Trond
>
>

--
Camm Maguire [email protected]
==========================================================================
"The earth is but one country, and mankind its citizens." -- Baha'u'llah

2001-03-28 23:20:37

by Camm Maguire

Subject: Re: PROBLEM: 2.2.18 oops leaves umount hung in disk sleep

OK, some new information. Apparently, the ethernet traffic is getting
corrupted by heavy disk access to the second disk on my primary ALI
5229 controller. I suspect this is related to the oops, as the kernel
log messages reporting the errors tend to come roughly at the same
time as the oopses.

Here is the test: I run netpipe-tcp to the host while running bonnie
on the second disk. I then receive quite a few messages reading:
Mar 28 17:55:33 intech9 kernel: eth0: bogus packet: status=0x80 nxpg=0x6e size=1518
Mar 28 17:56:25 intech9 kernel: eth0: bogus packet: status=0x80 nxpg=0x69 size=1518

The size is always 1518. There are also other less frequent messages
which I can collate if needed. Running the same test with bonnie on
the first disk does not produce the error. Turning dma off on both
disks does not help.
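
To check the "always 1518" observation over a whole log rather than by eye, here is a minimal sketch that tallies the size field of the bogus packet messages. The regular expression is written against the message format quoted in this thread, the sample lines are copied from it, and the function name is illustrative. For reference, 1518 bytes is the maximum standard Ethernet frame: 1500 bytes of payload, a 14-byte header and a 4-byte checksum.

```python
import re
from collections import Counter

# Matches the 8390 driver message as it appears in the logs quoted here.
BOGUS = re.compile(
    r"eth0: bogus packet: status=(0x[0-9a-f]+) nxpg=(0x[0-9a-f]+) size=(\d+)"
)

MAX_FRAME = 1500 + 14 + 4  # payload + Ethernet header + frame check sequence


def tally_sizes(lines):
    """Count how often each reported packet size occurs."""
    sizes = Counter()
    for line in lines:
        match = BOGUS.search(line)
        if match:
            sizes[int(match.group(3))] += 1
    return sizes


SAMPLE = [
    "Mar 21 01:14:47 intech9 kernel: eth0: bogus packet: status=0x80 nxpg=0x57 size=1270",
    "Mar 28 17:55:33 intech9 kernel: eth0: bogus packet: status=0x80 nxpg=0x6e size=1518",
    "Mar 28 17:56:25 intech9 kernel: eth0: bogus packet: status=0x80 nxpg=0x69 size=1518",
]

if __name__ == "__main__":
    for size, count in tally_sizes(SAMPLE).most_common():
        note = " (maximum frame)" if size == MAX_FRAME else ""
        print(size, count, note)
```

To run it on a real log, pass an open file instead of SAMPLE, for example `tally_sizes(open("/var/log/kern.log"))`; the path is an assumption and varies by system.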

Here is my disk information:
=============================================================================
intech9:/proc/ide/ide0/hda# for i in * ; do echo "---"$i"---" ; cat $i ; done
for i in * ; do echo "---"$i"---" ; cat $i ; done
---cache---
512
---capacity---
12672450
---driver---
ide-disk version 1.08
---geometry---
physical 13410/15/63
logical 788/255/63
---identify---
045a 3462 0000 000f 0000 0000 003f 0000
0000 0000 2020 2020 2020 2020 2020 2020
3035 3338 3232 3537 0000 0400 0004 4544
2d30 332d 3038 4655 4a49 5453 5520 4d50
4533 3036 3441 5420 2020 2020 2020 2020
2020 2020 2020 2020 2020 2020 2020 8010
0000 0b00 0000 0200 0000 0007 3462 000f
003f 5dc2 00c1 0000 5dc2 00c1 0000 0007
0003 0078 0078 0078 0078 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
001e 0000 346b 4008 4000 0061 0000 4000
041f 0006 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0001 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
---media---
disk
---model---
FUJITSU MPE3064AT
---settings---
name value min max mode
---- ----- --- --- ----
bios_cyl 788 0 65535 rw
bios_head 255 0 255 rw
bios_sect 63 0 63 rw
breada_readahead 4 0 127 rw
bswap 0 0 1 r
file_readahead 124 0 2097151 rw
io_32bit 0 0 3 rw
keepsettings 0 0 1 rw
max_kb_per_request 64 1 127 rw
multcount 0 0 8 rw
nice1 1 0 1 rw
nowerr 0 0 1 rw
pio_mode write-only 0 255 w
slow 0 0 1 rw
unmaskirq 0 0 1 rw
using_dma 0 0 1 rw
---smart_thresholds---
0010 2001 0000 0000 0000 0000 0000 1402
0000 0000 0000 0000 0000 1903 0000 0000
0000 0000 0000 1004 0000 0000 0000 0000
0000 1805 0000 0000 0000 0000 0000 1407
0000 0000 0000 0000 0000 1308 0000 0000
0000 0000 0000 1409 0000 0000 0000 0000
0000 140a 0000 0000 0000 0000 0000 140c
0000 0000 0000 0000 0000 18c4 0000 0000
0000 0000 0000 14c5 0000 0000 0000 0000
0000 14c6 0000 0000 0000 0000 0000 00c7
0000 0000 0000 0000 0000 14c8 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 a900
---smart_values---
0010 0b01 6400 5164 02e2 0000 0000 0502
6400 0064 0000 0000 0000 0703 6000 0160
0000 0000 0000 1204 6400 2964 0000 0000
0000 3305 6400 0064 0000 0000 0000 0b07
6400 e664 0006 0000 0000 0508 6400 0064
0000 0000 0000 1209 3800 7d38 71c5 0001
0000 130a 6400 0064 0000 0000 0000 320c
6400 2964 0000 0000 0000 33c4 6400 0064
0000 0000 0000 10c5 6400 0064 0000 0000
0000 10c6 6400 0064 0000 0000 0000 0ac7
c800 00c8 0000 0000 0000 0bc8 6400 0d61
0001 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 00b4 1b00
0002 0001 0802 0000 0000 0000 0000 0000
0000 3e50 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 671d 2958 087e 7b50 9700
intech9:/proc/ide/ide0/hda# cd ../hdb
cd ../hdb
intech9:/proc/ide/ide0/hdb# for i in * ; do echo "---"$i"---" ; cat $i ; done
for i in * ; do echo "---"$i"---" ; cat $i ; done
---cache---
128
---capacity---
4127760
---driver---
ide-disk version 1.08
---geometry---
physical 4095/16/63
logical 4095/16/63
---identify---
045a 0fff 0000 0010 ffff 03b7 003f 0000
0000 0000 2020 2020 2020 2020 2020 2020
4a42 3831 3830 3334 0003 0100 0016 3038
2e30 382e 3031 5354 3332 3134 3041 2020
2020 2020 2020 2020 2020 2020 2020 2020
2020 2020 2020 2020 2020 2020 2020 8020
0000 0b00 0000 0200 0000 0003 0fff 0010
003f fc10 003e 0100 fc10 003e 0007 0407
0003 0078 0078 00b4 0078 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
---media---
disk
---model---
ST32140A
---settings---
name value min max mode
---- ----- --- --- ----
bios_cyl 4095 0 65535 rw
bios_head 16 0 255 rw
bios_sect 63 0 63 rw
breada_readahead 4 0 127 rw
bswap 0 0 1 r
file_readahead 124 0 2097151 rw
io_32bit 0 0 3 rw
keepsettings 0 0 1 rw
max_kb_per_request 64 1 127 rw
multcount 0 0 16 rw
nice1 1 0 1 rw
nowerr 0 0 1 rw
pio_mode write-only 0 255 w
slow 0 0 1 rw
unmaskirq 0 0 1 rw
using_dma 0 0 1 rw
---smart_thresholds---
0005 5505 0000 0000 0000 0000 0000 5f07
0000 0000 0000 0000 0000 620a 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 cf00
---smart_values---
0005 1305 6300 0263 0000 0000 0000 0b07
6400 0064 0000 0000 0000 130a 6400 0064
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0003 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 e61e
0403 002c 0000 104a 07cf 1c9b 01f2 0afd
0eed 19c5 140c 0000 0000 0000 0000 0000
0000 0000 0000 0000 0502 3307 0000 0700
intech9:/proc/ide/ide0/hdb# hdparm -d 1 /dev/hda
hdparm -d 1 /dev/hda

/dev/hda:
setting using_dma to 1 (on)
using_dma = 1 (on)
intech9:/proc/ide/ide0/hdb# hdparm /dev/hda
hdparm /dev/hda

/dev/hda:
multcount = 0 (off)
I/O support = 0 (default 16-bit)
unmaskirq = 0 (off)
using_dma = 1 (on)
keepsettings = 0 (off)
nowerr = 0 (off)
readonly = 0 (off)
readahead = 8 (on)
geometry = 788/255/63, sectors = 12672450, start = 0
intech9:/proc/ide/ide0/hdb# hdparm /dev/hdb
hdparm /dev/hdb

/dev/hdb:
multcount = 0 (off)
I/O support = 0 (default 16-bit)
unmaskirq = 0 (off)
using_dma = 0 (off)
keepsettings = 0 (off)
nowerr = 0 (off)
readonly = 0 (off)
readahead = 8 (on)
geometry = 4095/16/63, sectors = 4127760, start = 0
=============================================================================
and the controller:
=============================================================================
Mar 28 09:56:45 intech9 kernel: ALI15X3: IDE controller on PCI bus 00 dev 78
Mar 28 09:56:45 intech9 kernel: ALI15X3: not 100%% native mode: will probe irqs later
Mar 28 09:56:45 intech9 kernel: ide0: BM-DMA at 0xb400-0xb407, BIOS settings: hda:DMA, hdb:DMA
Mar 28 09:56:45 intech9 kernel:
Mar 28 09:56:45 intech9 kernel: ************************************
Mar 28 09:56:45 intech9 kernel: * ALi IDE driver (1.0 beta3) *
Mar 28 09:56:45 intech9 kernel: * Chip Revision is C1 *
Mar 28 09:56:45 intech9 kernel: * Maximum capability is - UDMA 33 *
Mar 28 09:56:45 intech9 kernel: ************************************
Mar 28 09:56:45 intech9 kernel:
Mar 28 09:56:45 intech9 kernel: ide1: BM-DMA at 0xb408-0xb40f, BIOS settings: hdc:pio, hdd:pio
Mar 28 09:56:45 intech9 kernel: hda: FUJITSU MPE3064AT, ATA DISK drive
Mar 28 09:56:45 intech9 kernel: hdb: ST32140A, ATA DISK drive
Mar 28 09:56:45 intech9 kernel: hdc: CONNER CTT8000-A, ATAPI TAPE drive
Mar 28 09:56:45 intech9 kernel: hdd: HP COLORADO 14GB, ATAPI TAPE drive
Mar 28 09:56:45 intech9 kernel: ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
Mar 28 09:56:45 intech9 kernel: ide1 at 0x170-0x177,0x376 on irq 15
Mar 28 09:56:45 intech9 kernel: ALI15X3: Ultra DMA enabled
Mar 28 09:56:45 intech9 kernel: hda: FUJITSU MPE3064AT, 6187MB w/512kB Cache, CHS=788/255/63, (U)DMA
Mar 28 09:56:45 intech9 kernel: ALI15X3: MultiWord DMA enabled
Mar 28 09:56:45 intech9 kernel: hdb: ST32140A, 2015MB w/128kB Cache, CHS=4095/16/63, DMA
Mar 28 09:56:45 intech9 kernel: hda: hda1 hda2 hda3 hda4 < hda5 hda6 hda7 >
Mar 28 09:56:45 intech9 kernel: hdb: hdb1 hdb2

intech9:/proc/ide/ide0/hdb# lspci
lspci
00:00.0 Host bridge: Acer Laboratories Inc. [ALi] M1541 (rev 04)
00:01.0 PCI bridge: Acer Laboratories Inc. [ALi] M5243 (rev 04)
00:03.0 Bridge: Acer Laboratories Inc. [ALi] M7101 PMU
00:07.0 ISA bridge: Acer Laboratories Inc. [ALi] M1533 PCI to ISA Bridge [Aladdin IV] (rev c3)
00:09.0 Ethernet controller: Winbond Electronics Corp W89C940
00:0f.0 IDE interface: Acer Laboratories Inc. [ALi] M5229 IDE (rev c1)
01:00.0 VGA compatible controller: ATI Technologies Inc Rage XL AGP (rev 65)

=============================================================================


Any insights most appreciated!


Camm Maguire <[email protected]> writes:

> Hello again! We're in luck! The oops happened again, and this time,
> the full oops dump appeared on the screen, which I have copied below:
>
> =============================================================================
> Unable to handle kernel paging request at virtual address 6e617274
> current->tss.cr3 = 03d06000, %cr3 = 03d06000
> *pde = 00000000
> Oops = 0
> CPU = 0
> EIP = 0010:[<c012facf>]
> EFLAGS = 00010a87
> eax = 00000000 ebx = 6e617274 ecx = 6e617274 edx = 00000006
> esi = 00000006 edi = c3bda800 ebp = 00000006 esp = c2381f5c
> ds = 0018 es = 0018 ss = 0018
> Process umount (pid:6942, process nr:58,stackpage = c2381000)
> Stack: fffffffe c01281a2 c3bda800 00000006 c25db80c 00000006 fffffffa c01282e8
> 00000006 00000000 00000000 00000000 08050006 c0b176e0 08054208 00000000
> c01283e3 00000006 00000000 c2380000 08054209 0804fa20 c01283fc 08054208
> Call Trace: [<c01281a2>][<c01282e8>][<c01283e3>][<c01283fc>][<c01094fc>]
> code: 8b 1b 39 79 34 75 ef 8b 41 04 8b 11 89 42 04 89 10 a1 e4 4d
> =============================================================================
> and through ksymoops:
> =============================================================================
> intech9# ksymoops</home/camm/oops
> ksymoops</home/camm/oops
> ksymoops 2.3.4 on i586 2.2.18-i586tsc. Options used
> -V (default)
> -k /proc/ksyms (default)
> -l /proc/modules (default)
> -o /lib/modules/2.2.18-i586tsc/ (default)
> -m /boot/System.map-2.2.18-i586tsc (default)
>
> Warning: You did not tell me where to find symbol information. I will
> assume that the log matches the kernel and modules that are running
> right now and I'll use the default options above for symbol resolution.
> If the current kernel and/or modules do not match the log, you can get
> more accurate output by telling me the kernel version and where to find
> map, modules, ksyms etc. ksymoops -h explains the options.
>
> Warning (compare_maps): ksyms_base symbol module_list_R__ver_module_list not found in System.map. Ignoring ksyms_base entry
> Unable to handle kernel paging request at virtual address 6e617274
> current->tss.cr3 = 03d06000, %cr3 = 03d06000
> *pde = 00000000
> Stack: fffffffe c01281a2 c3bda800 00000006 c25db80c 00000006 fffffffa c01282e8
> 00000006 00000000 00000000 00000000 08050006 c0b176e0 08054208 00000000
> c01283e3 00000006 00000000 c2380000 08054209 0804fa20 c01283fc 08054208
> Call Trace: [<c01281a2>][<c01282e8>][<c01283e3>][<c01283fc>][<c01094fc>]
> code: 8b 1b 39 79 34 75 ef 8b 41 04 8b 11 89 42 04 89 10 a1 e4 4d
> Using defaults from ksymoops -t elf32-i386 -a i386
>
> Trace; c01281a2 <do_umount+5a/144>
> Trace; c01282e8 <umount_dev+5c/ac>
> Trace; c01283e3 <sys_umount+ab/b8>
> Trace; c01283fc <sys_oldumount+c/10>
> Trace; c01094fc <system_call+34/38>
> Code; 00000000 Before first symbol
> 00000000 <_EIP>:
> Code; 00000000 Before first symbol
> 0: 8b 1b mov (%ebx),%ebx
> Code; 00000002 Before first symbol
> 2: 39 79 34 cmp %edi,0x34(%ecx)
> Code; 00000005 Before first symbol
> 5: 75 ef jne fffffff6 <_EIP+0xfffffff6> fffffff6 <END_OF_CODE+3b764797/????>
> Code; 00000007 Before first symbol
> 7: 8b 41 04 mov 0x4(%ecx),%eax
> Code; 0000000a Before first symbol
> a: 8b 11 mov (%ecx),%edx
> Code; 0000000c Before first symbol
> c: 89 42 04 mov %eax,0x4(%edx)
> Code; 0000000f Before first symbol
> f: 89 10 mov %edx,(%eax)
> Code; 00000011 Before first symbol
> 11: a1 e4 4d 00 00 mov 0x4de4,%eax
>
>
> 2 warnings issued. Results may not be reliable.
> =============================================================================
>
> As before, the oops was truncated in the log.
>
> I received two eth0 (ne2k-pci, using 8390) errors before this oops:
> eth0: next frame inconsistency, 0xf2
> eth0: next frame inconsistency, 0xb8
>
> And then one eth0 error after the oops before the machine died:
> eth0: bogus packet: status=0x80 nxpg=0x7b size=1518
>
> I will be trying to recompile with gcc272 to see if anything changes.
> In the meantime, I'd greatly appreciate any insights!
>
> Take care,
>
> Trond Myklebust <[email protected]> writes:
>
> > >>>>> " " == Camm Maguire <[email protected]> writes:
> >
> > > Greetings! Here are the contiguous lines from kern.log:
> > > Mar 21 01:14:47 intech9 kernel: eth0: bogus packet: status=0x80 nxpg=0x57 size=1270
> > > Mar 21 01:14:49 intech9 kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000000
> > > Mar 21 01:14:49 intech9 kernel: current->tss.cr3 = 02872000, %cr3 = 02872000
> > > Mar 21 01:14:49 intech9 kernel: *pde = 00000000
> > > Mar 21 01:14:49 intech9 kernel: Oops: 0000
> > > Mar 21 01:14:49 intech9 kernel: CPU: 0
> > > Mar 22 12:30:08 intech9 kernel: klogd 1.3-3#33.1, log source = /proc/kmsg started.
> >
> > > Why would this not have been included, would you happen to
> > > know? In any case, I understand that it's pretty much
> >
> > I've no idea why it wasn't logged. Did you possibly reboot without
> > syncing the disk?
> >
> > > impossible to debug now, right? dmesg wrapped around by the
> > > time I got to it (I seem to be having a lot of ethernet bogus
> > > packet messages, as shown above. I've chalked this up to the
> > > heavy traffic during the amanda backup, but maybe something is
> > > wrong here too/instead?)
> >
> > Have you tried to use an older version of gcc? AFAIK gcc-2.95.2 has a
> > lot of bugs that are known to cause problems with the kernel. If you
> > are having additional problems such as the bogus ethernet packets,
> > then it might be worth your while to experiment a bit to see whether
> > this might be some corruption problem.
> >
> > Cheers,
> > Trond
> >
> >
>
> --
> Camm Maguire [email protected]
> ==========================================================================
> "The earth is but one country, and mankind its citizens." -- Baha'u'llah
>
>

--
Camm Maguire [email protected]
==========================================================================
"The earth is but one country, and mankind its citizens." -- Baha'u'llah
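
One observation on the second oops quoted above: the faulting address
6e617274 (the same value sits in ebx and ecx) is made up entirely of
printable ASCII bytes. The short Python sketch below decodes it, assuming
the little-endian byte order of i386; the helper name pointer_as_ascii is
hypothetical and used only for illustration. A list pointer that looks like
text would be consistent with the memory-corruption theory Trond raised,
but it is only a hint, not proof.

```python
import struct


def pointer_as_ascii(value: int) -> str:
    """Show the 4 bytes of a 32-bit value in little-endian memory order.

    Non-printable bytes are rendered as '.' so the result is always 4 chars.
    """
    raw = struct.pack("<I", value)  # byte order as stored in i386 memory
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


# Faulting address taken from the oops report (also the value of ebx/ecx).
fault_address = 0x6E617274
print(pointer_as_ascii(fault_address))  # prints: tran
```

The same check can be applied to any other suspicious register value in an
oops; ordinary 2.2 kernel pointers on i386 normally start at c0000000 or
above, so a value that decodes cleanly to text is worth a second look.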