2004-11-18 17:38:01

by martin f krafft

Subject: 2.6.9 nfsd crashing often

We upgraded our cluster master server to 2.6.9 this morning and are
now experiencing major problems with the kernel NFS server, which
crashes every hour or so:

Unable to handle kernel NULL pointer dereference at virtual address 00000000
printing eip:
00000000
*pde = 00000000
Oops: 0000 [#1]
PREEMPT
Modules linked in: sd_mod scsi_mod af_packet ipv6 ipt_mac ipt_LOG ipt_limit ipt_REJECT ipt_state ip_conntrack iptable_filter ip_tables uhci_hcd ohci_hcd ehci_hcd usbcore 8139cp shpchp pciehp pci_hotplug via_agp agpgart pcspkr evdev 8139too mii crc32 via_ircc irda crc_ccitt nfsd exportfs lockd sunrpc capability commoncap ide_cd cdrom rtc isofs xfs ext2 ext3 jbd mbcache ide_generic via82cxxx ide_disk ide_core raid1 md unix fbcon font vesafb cfbcopyarea cfbimgblt cfbfillrect
CPU: 0
EIP: 0060:[<00000000>] Not tainted VLI
EFLAGS: 00010286 (2.6.9-1-k7)
EIP is at 0x0
eax: c0390f00 ebx: fffffff4 ecx: d131bae4 edx: d131bae4
esi: d1c6f22c edi: d131ba28 ebp: 00000000 esp: db4abeac
ds: 007b es: 007b ss: 0068
Process nfsd (pid: 2134, threadinfo=db4aa000 task=dea3e020)
Stack: c0166197 d1c6f22c d131ba28 00000000 ffffffff dfd49aba 0000000a dfd49ab0
c01661ef db4abee8 d131bab4 00000000 c0166261 db4abee8 d131bab4 ead0df9b
0000000a dfd49ab0 daf85804 d131bab4 daf85804 e0a99493 dfd49ab0 d131bab4
Call Trace:
[<c0166197>] __lookup_hash+0xa7/0xe0
[<c01661ef>] lookup_hash+0x1f/0x30
[<c0166261>] lookup_one_len+0x61/0x70
[<e0a99493>] nfsd_lookup+0x113/0x4e0 [nfsd]
[<e0aa26a1>] nfsd3_proc_lookup+0xa1/0xe0 [nfsd]
[<e0a96729>] nfsd_dispatch+0xd9/0x230 [nfsd]
[<e0a3e6ed>] svc_process+0x56d/0x780 [sunrpc]
[<c011a120>] default_wake_function+0x0/0x20
[<e0a96495>] nfsd+0x1f5/0x3b0 [nfsd]
[<e0a962a0>] nfsd+0x0/0x3b0 [nfsd]
[<c01042b1>] kernel_thread_helper+0x5/0x14
Code: Bad EIP value.

We had seen this problem exactly once before with a 2.6.8.1
kernel, which otherwise ran very stably.

There is no reason to believe that the hardware is at fault.
Everything else seems to work fine, memtest86 does not report
any problems, and the hard drives are okay.

The machine runs Debian sarge. The kernel is a stock kernel.

The hardware is an AMD Athlon XP 2200+ with 512 MB of RAM. /dev/hda
and /dev/hdc run as part of a software RAID 1.

0000:00:00.0 Host bridge: VIA Technologies, Inc. VT8366/A/7 [Apollo KT266/A/333]
0000:00:01.0 PCI bridge: VIA Technologies, Inc. VT8366/A/7 [Apollo KT266/A/333 AGP]
0000:00:0e.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
0000:00:11.0 ISA bridge: VIA Technologies, Inc. VT8233 PCI to ISA Bridge
0000:00:11.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
0000:01:00.0 VGA compatible controller: nVidia Corporation NV11 [GeForce2 MX/MX 400] (rev b2)

Reverting to 2.6.8.1 for now. I would appreciate any help, would be
glad to provide additional information, and hope that this is either
fixable, or not a kernel problem at all.

--
martin; (greetings from the heart of the sun.)
\____ echo mailto: !#^."<*>"|tr "<*> mailto:" net@madduck

invalid/expired pgp subkeys? use subkeys.pgp.net as keyserver!
spamtraps: [email protected]

"verbing weirds language."
-- calvin



2004-11-19 04:38:01

by NeilBrown

Subject: Re: 2.6.9 nfsd crashing often

On Thursday November 18, [email protected] wrote:
> We upgraded our cluster master server to 2.6.9 this morning and are
> now experiencing major problems with the kernel NFS server, which
> crashes every hour or so:
>
> Unable to handle kernel NULL pointer dereference at virtual address 00000000
> printing eip:
> 00000000
...
> Call Trace:
> [<c0166197>] __lookup_hash+0xa7/0xe0

Looks like i_op->lookup == NULL, and I don't think nfsd could do that.

What filesystem are you using?

NeilBrown
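
To illustrate the diagnosis above: an EIP of 0x0 with __lookup_hash as the
caller is what one would expect if a call went through a NULL function
pointer in the inode's operations table. The following is only a minimal
user-space sketch of that failure mode. The toy_* types, dir_ops, file_ops
and toy_lookup are hypothetical stand-ins, not kernel code, and the guard
with its -ENOTDIR return value is a choice made for this illustration rather
than a description of what the kernel does.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified stand-ins for an inode and its ops table. */
struct toy_inode;

struct toy_inode_ops {
	/* Returns 0 on success or a negative errno value. */
	int (*lookup)(struct toy_inode *dir, const char *name);
};

struct toy_inode {
	const struct toy_inode_ops *i_op;
};

/* A lookup method such as a directory inode would provide. */
static int dir_lookup(struct toy_inode *dir, const char *name)
{
	(void)dir;
	assert(name != NULL);
	printf("lookup(%s)\n", name);
	return 0;
}

/* Ops table with a lookup method, and one that leaves it NULL. */
const struct toy_inode_ops dir_ops  = { .lookup = dir_lookup };
const struct toy_inode_ops file_ops = { .lookup = NULL };

/*
 * Guarded call: refuse to jump through a NULL pointer. An unguarded
 * dir->i_op->lookup(dir, name) on an inode using file_ops would
 * transfer control to address 0, which matches "EIP is at 0x0".
 */
int toy_lookup(struct toy_inode *dir, const char *name)
{
	if (dir->i_op == NULL || dir->i_op->lookup == NULL)
		return -ENOTDIR;
	return dir->i_op->lookup(dir, name);
}
```

Calling toy_lookup() on an inode that uses dir_ops reaches dir_lookup(),
while an inode that uses file_ops gets an error back instead of a jump to
address 0.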

2004-11-19 08:45:37

by martin f krafft

Subject: Re: 2.6.9 nfsd crashing often

also sprach Neil Brown <[email protected]> [2004.11.19.0542 +0100]:
> > Call Trace:
> > [<c0166197>] __lookup_hash+0xa7/0xe0
>
> Looks like i_op->lookup == NULL, and I don't think nfsd could do that.
>
> What filesystem are you using?

XFS.

Thanks for following up!

--
martin; (greetings from the heart of the sun.)
\____ echo mailto: !#^."<*>"|tr "<*> mailto:" net@madduck

invalid/expired pgp subkeys? use subkeys.pgp.net as keyserver!
spamtraps: [email protected]

"moderation is a fatal thing. enough is as bad as a meal. more than
enough is as good as a feast."
-- oscar wilde



2004-11-19 11:17:35

by Christoph Hellwig

Subject: Re: 2.6.9 nfsd crashing often

On Fri, Nov 19, 2004 at 09:44:58AM +0100, martin f krafft wrote:
> also sprach Neil Brown <[email protected]> [2004.11.19.0542 +0100]:
> > > Call Trace:
> > > [<c0166197>] __lookup_hash+0xa7/0xe0
> >
> > Looks like i_op->lookup == NULL, and I don't think nfsd could do that.
> >
> > What filesystem are you using?
>
> XFS.
>
> Thanks for following up!

Care to check whether it goes away with the patch below?
(Also, I'm not sure 2.6.9 has all the relevant XFS and nfsd fixes;
updating to 2.6.10-rc is a good idea.)


Index: fs/xfs/xfs_dmapi.c
===================================================================
RCS file: /cvs/linux-2.6-xfs/fs/xfs/xfs_dmapi.c,v
retrieving revision 1.108
diff -u -p -r1.108 xfs_dmapi.c
--- fs/xfs/xfs_dmapi.c 28 Oct 2004 22:43:01 -0000 1.108
+++ fs/xfs/xfs_dmapi.c 18 Nov 2004 13:01:48 -0000
@@ -2666,8 +2666,10 @@ xfs_dm_set_fileattr(
}

VOP_SETATTR(vp, &vat, ATTR_DMI, NULL, error);
- if (!error)
- vn_revalidate(vp); /* update Linux inode flags */
+ if (!error) {
+ vn_revalidate_core(vp, &vat);
+ VUNMODIFY(vp);
+ }
return(-error); /* Return negative error to DMAPI */
}

Index: fs/xfs/xfs_vnodeops.c
===================================================================
RCS file: /cvs/linux-2.6-xfs/fs/xfs/xfs_vnodeops.c,v
retrieving revision 1.636
diff -u -p -r1.636 xfs_vnodeops.c
--- fs/xfs/xfs_vnodeops.c 30 Sep 2004 03:40:12 -0000 1.636
+++ fs/xfs/xfs_vnodeops.c 18 Nov 2004 13:01:50 -0000
@@ -909,6 +909,9 @@ xfs_setattr(
mandlock_after);
}

+ vap->va_mask = XFS_AT_STAT|XFS_AT_XFLAGS;
+ code = xfs_getattr(bdp, vap, ATTR_LAZY, credp);
+
xfs_iunlock(ip, lock_flags);

/*
@@ -929,6 +932,7 @@ xfs_setattr(
NULL, DM_RIGHT_NULL, NULL, NULL,
0, 0, AT_DELAY_FLAG(flags));
}
+
return 0;

abort_return:
Index: fs/xfs/linux-2.6/xfs_file.c
===================================================================
RCS file: /cvs/linux-2.6-xfs/fs/xfs/linux-2.6/xfs_file.c,v
retrieving revision 1.109
diff -u -p -r1.109 xfs_file.c
--- fs/xfs/linux-2.6/xfs_file.c 26 Oct 2004 22:56:13 -0000 1.109
+++ fs/xfs/linux-2.6/xfs_file.c 18 Nov 2004 13:01:51 -0000
@@ -409,8 +409,11 @@ linvfs_file_mmap(
vma->vm_ops = &linvfs_file_vm_ops;

VOP_SETATTR(vp, &va, XFS_AT_UPDATIME, NULL, error);
- if (!error)
- vn_revalidate(vp); /* update Linux inode flags */
+ if (!error) {
+ vn_revalidate_core(vp, &va);
+ VUNMODIFY(vp);
+ }
+
return 0;
}

Index: fs/xfs/linux-2.6/xfs_ioctl.c
===================================================================
RCS file: /cvs/linux-2.6-xfs/fs/xfs/linux-2.6/xfs_ioctl.c,v
retrieving revision 1.116
diff -u -p -r1.116 xfs_ioctl.c
--- fs/xfs/linux-2.6/xfs_ioctl.c 27 Oct 2004 12:06:24 -0000 1.116
+++ fs/xfs/linux-2.6/xfs_ioctl.c 18 Nov 2004 13:01:51 -0000
@@ -1102,8 +1102,10 @@ xfs_ioc_xattr(
va.va_extsize = fa.fsx_extsize;

VOP_SETATTR(vp, &va, attr_flags, NULL, error);
- if (!error)
- vn_revalidate(vp); /* update Linux inode flags */
+ if (!error) {
+ vn_revalidate_core(vp, &va);
+ VUNMODIFY(vp);
+ }
return -error;
}

@@ -1147,8 +1149,10 @@ xfs_ioc_xattr(
xfs_dic2xflags(&ip->i_d, ARCH_NOCONVERT));

VOP_SETATTR(vp, &va, attr_flags, NULL, error);
- if (!error)
- vn_revalidate(vp); /* update Linux inode flags */
+ if (!error) {
+ vn_revalidate_core(vp, &va);
+ VUNMODIFY(vp);
+ }
return -error;
}

Index: fs/xfs/linux-2.6/xfs_iops.c
===================================================================
RCS file: /cvs/linux-2.6-xfs/fs/xfs/linux-2.6/xfs_iops.c,v
retrieving revision 1.223
diff -u -p -r1.223 xfs_iops.c
--- fs/xfs/linux-2.6/xfs_iops.c 1 Oct 2004 05:55:52 -0000 1.223
+++ fs/xfs/linux-2.6/xfs_iops.c 18 Nov 2004 13:01:51 -0000
@@ -531,11 +531,10 @@ linvfs_setattr(
vattr.va_mask |= XFS_AT_CTIME;
vattr.va_ctime = attr->ia_ctime;
}
+
if (ia_valid & ATTR_MODE) {
vattr.va_mask |= XFS_AT_MODE;
vattr.va_mode = attr->ia_mode;
- if (!in_group_p(inode->i_gid) && !capable(CAP_FSETID))
- inode->i_mode &= ~S_ISGID;
}

if (ia_valid & (ATTR_MTIME_SET | ATTR_ATIME_SET))
@@ -546,10 +545,11 @@ linvfs_setattr(
#endif

VOP_SETATTR(vp, &vattr, flags, NULL, error);
- if (error)
- return -error;
- vn_revalidate(vp);
- return error;
+ if (!error) {
+ vn_revalidate_core(vp, &vattr);
+ VUNMODIFY(vp);
+ }
+ return -error;
}

STATIC void