2002-06-19 09:25:25

by Richard Ems

Subject: kernel OOPS: 2.4.18, nscd, nfsd

Hi all!

Two kernel Oopses in a short time (22:35:59 and 22:50:00). The computer stayed alive until 00:00:00, when the daily cron jobs are started, and then ... kernel panic, LEDs were blinking :(

kernel is 2.4.18, from SuSE's k_deflt-2.4.18-174 package (2.4.19-pre10aa2)

Please CC to [email protected], I'm not on the linux-kernel mailing list.

Thanks, Richard Ems


# cat /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 6
model : 6
model name : AMD Athlon(TM) XP1800+
stepping : 2
cpu MHz : 1544.555
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov pat pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow
bogomips : 3080.19



# lspci
00:00.0 Host bridge: VIA Technologies, Inc. VT8367 [KT266]
00:01.0 PCI bridge: VIA Technologies, Inc. VT8367 [KT266 AGP]
00:05.0 Multimedia audio controller: C-Media Electronics Inc CM8738 (rev 10)
00:0d.0 Ethernet controller: 3Com Corporation 3c905B 100BaseTX [Cyclone] (rev 30)
00:11.0 ISA bridge: VIA Technologies, Inc.: Unknown device 3147
00:11.1 IDE interface: VIA Technologies, Inc. Bus Master IDE (rev 06)
00:11.2 USB Controller: VIA Technologies, Inc. UHCI USB (rev 23)
00:11.3 USB Controller: VIA Technologies, Inc. UHCI USB (rev 23)
01:00.0 VGA compatible controller: Matrox Graphics, Inc. MGA G400 AGP (rev 04)



# scripts/ver_linux
If some fields are empty or look unusual you may have an old version.
Compare to the current minimal requirements in Documentation/Changes.

Linux bingo 2.4.18-4GB #1 Fri Jun 14 17:46:33 UTC 2002 i686 unknown

Gnu C 2.95.3
Gnu make 3.79.1
util-linux 2.11n
mount 2.11n
modutils 2.4.12
e2fsprogs 1.26
PPP 2.4.1
Linux C Library x 1 root root 1394238 Mar 23 19:34 /lib/libc.so.6
Dynamic linker (ldd) 2.2.5
Procps 2.0.7
Net-tools 1.60
Kbd 1.06
Sh-utils 2.0
Modules Loaded nfsd autofs4 matroxfb_base matroxfb_Ti3026 matroxfb_DAC1064 matroxfb_accel fbcon-cfb4 g450_pll matroxfb_misc 3c59x ext3 jbd lvm-mod


Extract from /var/log/messages:

...
Jun 18 22:21:36 bingo automount[23936]: expired /net/jupiter
Jun 18 22:22:52 bingo automount[23942]: expired /net/diablo
Jun 18 22:30:00 bingo /USR/SBIN/CRON[23961]: (root) CMD ( /usr/lib/sa/sa1 )
Jun 18 22:35:59 bingo kernel: Unable to handle kernel paging request at virtual address 92766008
Jun 18 22:35:59 bingo kernel: printing eip:
Jun 18 22:35:59 bingo kernel: c0217944
Jun 18 22:35:59 bingo kernel: *pde = 00000000
Jun 18 22:35:59 bingo kernel: Oops: 0000
Jun 18 22:35:59 bingo kernel: CPU: 0
Jun 18 22:35:59 bingo kernel: EIP: 0010:[sock_poll+4/32] Not tainted
Jun 18 22:35:59 bingo kernel: EFLAGS: 00210282
Jun 18 22:35:59 bingo kernel: eax: c0217940 ebx: 00000145 ecx: 00000000 edx: 92766000
Jun 18 22:35:59 bingo kernel: esi: dcd88000 edi: c2bf76e0 ebp: 00000000 esp: c3419f28
Jun 18 22:35:59 bingo kernel: ds: 0018 es: 0018 ss: 0018
Jun 18 22:35:59 bingo kernel: Process nscd (pid: 863, stackpage=c3419000)
Jun 18 22:35:59 bingo kernel: Stack: c0149610 92766000 00000000 00000000 000005dd 00000000 00000000 c01496fc
Jun 18 22:35:59 bingo kernel: 00000001 dcd88000 c3419f60 c3419f64 c3418000 c3418000 00000000 00000000
Jun 18 22:35:59 bingo kernel: 00000001 bf7ffa04 00000000 00000001 c0149860 00000001 00000000 00000001
Jun 18 22:35:59 bingo kernel: Call Trace: [do_pollfd+128/144] [do_poll+220/240] [sys_poll+336/704] [sys_time+17/80] [system_call+51/64]
Jun 18 22:35:59 bingo kernel:
Jun 18 22:35:59 bingo kernel: Code: 8b 42 08 8b 40 08 05 14 01 00 00 8b 48 08 ff 74 24 08 50 52
Jun 18 22:35:59 bingo kernel: klogd 1.4.1, ---------- state change ----------
Jun 18 22:35:59 bingo kernel: Inspecting /boot/System.map-2.4.18-4GB
Jun 18 22:35:59 bingo kernel: Loaded 13574 symbols from /boot/System.map-2.4.18-4GB.
Jun 18 22:35:59 bingo kernel: Symbols match kernel version 2.4.18.
Jun 18 22:35:59 bingo kernel: Loaded 168 symbols from 13 modules.
Jun 18 22:40:00 bingo /USR/SBIN/CRON[24006]: (root) CMD ( /usr/lib/sa/sa1 )
Jun 18 22:50:00 bingo /USR/SBIN/CRON[24053]: (root) CMD ( /usr/lib/sa/sa1 )
Jun 18 22:50:00 bingo kernel: <1>Unable to handle kernel paging request at virtual address 42627044
Jun 18 22:50:00 bingo kernel: printing eip:
Jun 18 22:50:00 bingo kernel: 42627044
Jun 18 22:50:00 bingo kernel: *pde = 00000000
Jun 18 22:50:00 bingo kernel: Oops: 0000
Jun 18 22:50:00 bingo kernel: CPU: 0
Jun 18 22:50:00 bingo kernel: EIP: 0010:[zisofs_cleanup+1113747380/-1072693392] Not tainted
Jun 18 22:50:00 bingo kernel: EFLAGS: 00210282
Jun 18 22:50:00 bingo kernel: eax: c1607270 ebx: c39d5c9c ecx: cdb20020 edx: cdb20d40
Jun 18 22:50:00 bingo kernel: esi: 00000011 edi: 00000003 ebp: 00000007 esp: c370bf30
Jun 18 22:50:00 bingo kernel: ds: 0018 es: 0018 ss: 0018
Jun 18 22:50:00 bingo kernel: Process nfsd (pid: 839, stackpage=c370b000)
Jun 18 22:50:00 bingo kernel: Stack: cdb20d40 d7f64160 c2668fb8 00000001 00000003 00000011 c364d800 c362f580
Jun 18 22:50:00 bingo kernel: c6934014 c362f580 c3620505 c364d800 00000002 cf131240 c6934014 c364dc9c
Jun 18 22:50:00 bingo kernel: c02687ec c364d800 c6934014 00000000 00000027 00000007 c6934014 00000000
Jun 18 22:50:00 bingo kernel: Call Trace: [nfsd:__insmod_nfsd_S.data_L2432+1888/2432] [nfsd:__insmod_nfsd_S.data_L2432+1888/2432] [nfsd:__insmod_nfsd_S.text_L52871+1189/52872] [svc_process+1100/1344] [nfsd:__insmod_nfsd_S.text_L52871+806/52872]
Jun 18 22:50:00 bingo kernel: [nfsd:__insmod_nfsd_S.data_L2432+0/2432] [kernel_thread+38/48] [nfsd:__insmod_nfsd_S.text_L52871+352/52872]
Jun 18 22:50:00 bingo kernel:
Jun 18 22:50:00 bingo kernel: Code: Bad EIP value.
Jun 18 22:59:00 bingo /USR/SBIN/CRON[24075]: (root) CMD ( rm -f /var/spool/cron/lastrun/cron.hourly)
Jun 18 23:00:00 bingo /USR/SBIN/CRON[24080]: (root) CMD ( /usr/lib/sa/sa1 )
Jun 18 23:10:00 bingo /USR/SBIN/CRON[24137]: (root) CMD ( /usr/lib/sa/sa1 )
Jun 18 23:20:00 bingo /USR/SBIN/CRON[24184]: (root) CMD ( /usr/lib/sa/sa1 )
Jun 18 23:30:00 bingo /USR/SBIN/CRON[24209]: (root) CMD ( /usr/lib/sa/sa1 )
Jun 18 23:40:00 bingo /USR/SBIN/CRON[24254]: (root) CMD ( /usr/lib/sa/sa1 )
Jun 18 23:50:00 bingo /USR/SBIN/CRON[24301]: (root) CMD ( /usr/lib/sa/sa1 )
Jun 18 23:59:00 bingo /USR/SBIN/CRON[24323]: (root) CMD ( rm -f /var/spool/cron/lastrun/cron.hourly)
Jun 19 00:00:00 bingo /USR/SBIN/CRON[24329]: (root) CMD ( /usr/lib/sa/sa2 -A #update reports every 6 hour)
Jun 19 00:00:00 bingo /USR/SBIN/CRON[24330]: (root) CMD ( /usr/lib/sa/sa1 )
Jun 19 10:25:25 bingo syslogd 1.4.1: restart.
...
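
A quick consistency check on the first Oops, offered as a sketch rather than a diagnosis. It assumes that, as on i386 2.4 kernels, the "Code:" bytes start at the faulting EIP, so the first three bytes (8b 42 08) can be decoded by hand as a load through a base register and compared with the reported fault address. The register values and code bytes are copied from the log above; the helper name decode_mov_load is made up for this illustration and handles only this one instruction form.

```python
# Values copied from the 22:35:59 oops above.
FAULT_ADDR = 0x92766008
REGS = {"eax": 0xC0217940, "ebx": 0x00000145, "ecx": 0x00000000, "edx": 0x92766000}
CODE = bytes.fromhex("8b 42 08 8b 40 08 05 14 01 00 00 8b 48 08 ff 74 24 08 50 52")

# i386 register numbering as used in the ModRM byte.
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_mov_load(code):
    """Decode `mov disp8(%base), %dst`: opcode 0x8b, ModRM mod=01, no SIB byte.

    Hypothetical helper: it rejects anything that is not this exact form.
    """
    opcode, modrm, disp = code[0], code[1], code[2]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if opcode != 0x8B or mod != 1 or rm == 4:
        raise ValueError("not a simple disp8 load")
    if disp >= 0x80:          # disp8 is a signed byte
        disp -= 0x100
    return REG32[reg], REG32[rm], disp


dst, base, disp = decode_mov_load(CODE)
effective = (REGS[base] + disp) & 0xFFFFFFFF

print(f"mov {disp:#x}(%{base}), %{dst}")
print(f"effective address {effective:#010x}, reported fault {FAULT_ADDR:#010x}")
```

If the assumption holds, the faulting access is a read 8 bytes into whatever %edx pointed at. The same value, 92766000, is also the second word on the stack, which would be consistent with a bad pointer being handed to sock_poll rather than a bug at that instruction.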


Thanks again, Richard

--
Richard Ems
... e-mail: [email protected]
... Computer Science, University of Hamburg

Unix IS user friendly. It's just selective about who its friends are.



2002-06-19 10:22:39

by Richard Ems

Subject: Re: kernel OOPS: 2.4.18, nscd, nfsd

I already noticed that SuSE's k_deflt-2.4.18-174 has nfsd compiled as a module and the k_deflt-2.4.18-58 kernel that came with SuSE 8.0 had it compiled in ...???

Thanks, Richard



2002-06-19 12:01:58

by Richard Ems

Subject: Re: kernel OOPS: 2.4.18, nscd, nfsd


Richard Ems wrote:

> I already noticed that SuSE's k_deflt-2.4.18-174 has nfsd compiled as a module and the k_deflt-2.4.18-58 kernel that came with SuSE 8.0 had it compiled in ...???

This is NOT TRUE.
On both kernels nfsd is compiled as a module!

Thanks, Richard



2002-06-21 06:21:28

by NeilBrown

Subject: Re: kernel OOPS: 2.4.18, nscd, nfsd

On Wednesday June 19, [email protected] wrote:
> Hi all!
>
> Two kernel Oopses in a short time (22:35:59 and 22:50:00). The computer stayed alive until 00:00:00, when the daily cron jobs are started, and then ... kernel panic, LEDs were blinking :(
>
> kernel is 2.4.18, from SuSE's k_deflt-2.4.18-174 package (2.4.19-pre10aa2)
>
> Please CC to [email protected], I'm not on the linux-kernel mailing
> list.

Would I be right in surmising that you are exporting an ISO filesystem
over NFS?? That would be the second Oops in as many days with that
scenario.

If that is the case, then I'm afraid that I cannot point you to any
fix, though exporting with "no_subtree_check" may reduce the incidence.

NeilBrown

2002-06-22 15:41:17

by Richard Ems

Subject: Re: kernel OOPS: 2.4.18, nscd, nfsd


Neil Brown wrote:

> On Wednesday June 19, [email protected] wrote:
> > Hi all!
> >
> > Two kernel Oopses in a short time (22:35:59 and 22:50:00). The computer stayed alive until 00:00:00, when the daily cron jobs are started, and then ... kernel panic, LEDs were blinking :(
> >
> > kernel is 2.4.18, from SuSE's k_deflt-2.4.18-174 package (2.4.19-pre10aa2)
> >
> > Please CC to [email protected], I'm not on the linux-kernel mailing
> > list.
>
> Would I be right in surmising that you are exporting an ISO filesystem
> over NFS?? That would be the second Oops in as many days with that
> scenario.
>
> If that is the case, then I'm afraid that I cannot point you to any
> fix, though exporting with "no_subtree_check" may reduce the incidence.
>
> NeilBrown

No, no ISO fs exported.

My /etc/exports:

# cat /etc/exports
/home \
diablo(rw,no_root_squash) @linux(rw,root_squash) @unix(rw,root_squash) mtgvaio1(ro,root_squash) @cluster01_hosts(rw,root_squash)
/tmp \
diablo(rw,no_root_squash) @linux(rw,root_squash) @unix(ro,root_squash) mtgvaio1(ro,root_squash) @cluster01_hosts(rw,root_squash)
/usr/local \
diablo(ro,no_root_squash) @linux(ro,root_squash) @unix(ro,root_squash) mtgvaio1(ro,root_squash) @cluster01_hosts(ro,root_squash)

and

# egrep " /home | /tmp | /usr " /proc/mounts
/dev/vg01/home /home ext3 rw 0 0
/dev/vg01/tmp /tmp ext3 rw 0 0
/dev/vg01/usr /usr ext3 rw 0 0

All exported filesystems are ext3 over LVM.
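
The check above (exported paths against mounted filesystem types) can be sketched as a small script. This is only an illustration under stated assumptions: the sample data is the /etc/exports and /proc/mounts output quoted above with the client lists shortened, the function names are hypothetical, and it assumes /usr/local has no separate mount of its own, since the egrep pattern above would not have shown one.

```python
# Exported paths from the /etc/exports above (client lists shortened).
EXPORTS = r"""
/home \
diablo(rw,no_root_squash) @linux(rw,root_squash)
/tmp \
diablo(rw,no_root_squash) @linux(rw,root_squash)
/usr/local \
diablo(ro,no_root_squash) @linux(ro,root_squash)
"""

# The three /proc/mounts lines quoted above.
MOUNTS = """\
/dev/vg01/home /home ext3 rw 0 0
/dev/vg01/tmp /tmp ext3 rw 0 0
/dev/vg01/usr /usr ext3 rw 0 0
"""


def exported_paths(text):
    """Return the exported directories, joining backslash-continued lines."""
    joined = text.replace("\\\n", " ")
    paths = []
    for line in joined.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            paths.append(line.split()[0])
    return paths


def parse_mounts(text):
    """Return (device, mountpoint, fstype) tuples from /proc/mounts text."""
    return [tuple(line.split()[:3]) for line in text.splitlines() if line.strip()]


def mount_for(path, mounts):
    """Pick the mount with the longest mountpoint that contains `path`."""
    best = None
    for dev, mountpoint, fstype in mounts:
        prefix = mountpoint.rstrip("/") + "/"
        if path == mountpoint or path.startswith(prefix):
            if best is None or len(mountpoint) > len(best[1]):
                best = (dev, mountpoint, fstype)
    return best


mounts = parse_mounts(MOUNTS)
report = {path: mount_for(path, mounts) for path in exported_paths(EXPORTS)}
for path, mount in report.items():
    print(path, "->", mount)

iso_exports = [p for p, m in report.items() if m and m[2] == "iso9660"]
print("ISO exports:", iso_exports)
```

On a live system the two strings would be read from /etc/exports and /proc/mounts instead. With the sample data, /usr/local resolves to the ext3 /usr mount and the ISO list comes out empty.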

(I stopped exporting ISO filesystems over NFS a while ago; I'm now copying the distro DVD (SuSE) to a hard disk ... Is there any "good" solution for exporting ISO filesystems over NFS? The problem always comes when you need the
CD-ROM/DVD drive and want to umount the NFS-exported CD-ROM/DVD, then ...)

What about the first Oops?

Thanks, Richard





2002-07-05 10:02:24

by Andrea Arcangeli

Subject: Re: kernel OOPS: 2.4.18, nscd, nfsd

On Wed, Jun 19, 2002 at 11:25:08AM +0200, Richard Ems wrote:
> Jun 18 22:35:59 bingo kernel: Unable to handle kernel paging request at virtual address 92766008
> Jun 18 22:35:59 bingo kernel: printing eip:
> Jun 18 22:35:59 bingo kernel: c0217944
> Jun 18 22:35:59 bingo kernel: *pde = 00000000
> Jun 18 22:35:59 bingo kernel: Oops: 0000
> Jun 18 22:35:59 bingo kernel: CPU: 0
> Jun 18 22:35:59 bingo kernel: EIP: 0010:[sock_poll+4/32] Not tainted
> Jun 18 22:35:59 bingo kernel: EFLAGS: 00210282
> Jun 18 22:35:59 bingo kernel: eax: c0217940 ebx: 00000145 ecx: 00000000 edx: 92766000
> Jun 18 22:35:59 bingo kernel: esi: dcd88000 edi: c2bf76e0 ebp: 00000000 esp: c3419f28
> Jun 18 22:35:59 bingo kernel: ds: 0018 es: 0018 ss: 0018
> Jun 18 22:35:59 bingo kernel: Process nscd (pid: 863, stackpage=c3419000)
> Jun 18 22:35:59 bingo kernel: Stack: c0149610 92766000 00000000 00000000 000005dd 00000000 00000000 c01496fc
> Jun 18 22:35:59 bingo kernel: 00000001 dcd88000 c3419f60 c3419f64 c3418000 c3418000 00000000 00000000
> Jun 18 22:35:59 bingo kernel: 00000001 bf7ffa04 00000000 00000001 c0149860 00000001 00000000 00000001
> Jun 18 22:35:59 bingo kernel: Call Trace: [do_pollfd+128/144] [do_poll+220/240] [sys_poll+336/704] [sys_time+17/80] [system_call+51/64]
> Jun 18 22:35:59 bingo kernel:
> Jun 18 22:35:59 bingo kernel: Code: 8b 42 08 8b 40 08 05 14 01 00 00 8b 48 08 ff 74 24 08 50 52
> Jun 18 22:35:59 bingo kernel: klogd 1.4.1, ---------- state change ----------
> Jun 18 22:35:59 bingo kernel: Inspecting /boot/System.map-2.4.18-4GB
> Jun 18 22:35:59 bingo kernel: Loaded 13574 symbols from /boot/System.map-2.4.18-4GB.
> Jun 18 22:35:59 bingo kernel: Symbols match kernel version 2.4.18.
> Jun 18 22:35:59 bingo kernel: Loaded 168 symbols from 13 modules.
> Jun 18 22:40:00 bingo /USR/SBIN/CRON[24006]: (root) CMD ( /usr/lib/sa/sa1 )
> Jun 18 22:50:00 bingo /USR/SBIN/CRON[24053]: (root) CMD ( /usr/lib/sa/sa1 )
> Jun 18 22:50:00 bingo kernel: <1>Unable to handle kernel paging request at virtual address 42627044
> Jun 18 22:50:00 bingo kernel: printing eip:
> Jun 18 22:50:00 bingo kernel: 42627044
> Jun 18 22:50:00 bingo kernel: *pde = 00000000
> Jun 18 22:50:00 bingo kernel: Oops: 0000
> Jun 18 22:50:00 bingo kernel: CPU: 0
> Jun 18 22:50:00 bingo kernel: EIP: 0010:[zisofs_cleanup+1113747380/-1072693392] Not tainted

Not that we can trust a second oops, and I guess the first oops
isn't near the bug either, but are you actively using zisofs, or is
the above just a random EIP?
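
On the "random EIP" question, the numbers in the klogd annotation can be checked with a few lines of arithmetic. This is a sketch, not a statement about how klogd works internally: it assumes the annotation has the form symbol+offset/size, that the size was printed as a signed 32-bit value, and that PAGE_OFFSET is the default 0xC0000000 on this i386 kernel.

```python
EIP = 0x42627044          # from "printing eip:" in the second oops
OFFSET = 1113747380       # zisofs_cleanup+OFFSET/SIZE as printed by klogd
SIZE = -1072693392
PAGE_OFFSET = 0xC0000000  # assumption: default 3G/1G split on i386

# Address klogd must have had for the symbol, and where that "symbol" ends
# when SIZE is reinterpreted as an unsigned 32-bit number.
symbol_addr = EIP - OFFSET
symbol_end = (symbol_addr + SIZE) & 0xFFFFFFFF

print(hex(symbol_addr), hex(symbol_end), EIP < PAGE_OFFSET)
# prints: 0x90 0xc0100000 True
```

Under these assumptions the EIP lies below PAGE_OFFSET, so outside kernel text, and the symbol it was matched to sits at 0x90 with the next one at 0xc0100000. That looks like a nearest-preceding-symbol lookup on a garbage address, which together with "Code: Bad EIP value." suggests the zisofs_cleanup name by itself does not implicate zisofs.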

Also note that if you exported the ISO via NFS in the past without a reboot
and fsck -f in between, you could have memory/fs corruption that triggers
days after you stopped exporting the ISO via nfsd.

If you can reproduce it, that would be helpful.

thanks,

Andrea