2004-11-22 19:10:39

by Phil Dier

Subject: oops with dual xeon 2.8ghz 4gb ram +smp, software raid, lvm, and xfs

Hi,

I'm setting up a storage array with Linux, software RAID, LVM, and XFS,
but I keep getting oopses during heavy I/O. I've been able to reproduce
this with 2.6.6, 2.6.8.1, 2.6.9, and 2.6.10-rc2-bk4. I have dual Xeon
2.8s with 4GB of RAM. I'm using Adaptec and Fusion MPT SCSI devices
(more details in the following link). Connected are 2 Ultra160 SCSI
JBODs w/ 2 disks apiece. I'm using RAID 10 (or should it be 01?) mirrored
stripes.

Due to its size, I've posted my debug info at this location (I've included
output from all of the above kernels):

<http://www.icglink.com/cluster-debug-info.html> (~235kb)

Please let me know if I've left anything out that would help in locating
the source of the problem. I'm very willing to try out any patches/config
changes.

Please CC me on any replies, as I am not subscribed to the list.

Thanks,

--

Phil Dier (ICGLink.com -- 615 370-1530 x733)

/* vim:set noai nocindent ts=8 sw=8: */


2004-11-23 00:19:28

by Andrew Morton

Subject: Re: oops with dual xeon 2.8ghz 4gb ram +smp, software raid, lvm, and xfs

Phil Dier <[email protected]> wrote:
>
> I'm setting up a storage array with Linux, software RAID, LVM, and XFS,
> but I keep getting oopses during heavy I/O. I've been able to reproduce
> this with 2.6.6, 2.6.8.1, 2.6.9, and 2.6.10-rc2-bk4. I have dual xeon
> 2.8s with 4gb of ram. I'm using adaptec and a fusion mpt scsi devices
> (more details in the following link). Connected are 2 ultra160 scsi
> jbods w/ 2 disks apiece. I'm using raid 10 (or should it be 01?) mirrored
> stripes.
>
> Due to its size, I've posted my debug info at this location (I've included
> output from all of the above kernels):
>
> <http://www.icglink.com/cluster-debug-info.html> (~235kb)

yow. The dread combination of XFS, LVM, software RAID and bloaty scsi
drivers. Looks like a stack overrun.

Can you rebuild the kernel with CONFIG_4KSTACKS=n?

2004-11-23 15:54:26

by Phil Dier

Subject: Re: oops with dual xeon 2.8ghz 4gb ram +smp, software raid, lvm, and xfs

On Mon, 22 Nov 2004 16:17:25 -0800
Andrew Morton <[email protected]> wrote:

> yow. The dread combination of XFS, LVM, software RAID and bloaty scsi
> drivers. Looks like a stack overrun.
>
> Can you rebuild the kernel with CONFIG_4KSTACKS=n?
>

Thanks for the suggestion.. I'm doing a burn-in right now with 8k
stacks, and so far, so good.

I'm building this system with stability and flexibility foremost in
mind. Am I foolish in using all of these technologies with a new-ish
version of 2.6? Is there a particular version that would be better
suited for my application? Any other suggestions you (or anyone else
on the list) could give regarding stability would be greatly appreciated.

Thanks,

--

Phil Dier (ICGLink.com -- 615 370-1530 x733)

/* vim:set noai nocindent ts=8 sw=8: */

2004-11-23 18:41:45

by Phil Dier

Subject: Re: oops with dual xeon 2.8ghz 4gb ram +smp, software raid, lvm, and xfs

On Tue, 23 Nov 2004 18:02:23 +0100
Jakob Oestergaard <[email protected]> wrote:

> If you'll be exporting via NFS, it seems that there are still problems
> with XFS+NFS.
>
> With SMP, what I see is that sometimes a directory might decide that
> it's a file - but I can't delete it, because it isn't 'empty' (it's
> still somehow a directory). Waiting a day or two, the system will
> change its mind back to letting the directory be a directory. Sometimes
> modes will be fscked up as well - a regular file can change owner, or it
> can change modes from '-rw-rw---' to '?---------'. Weird stuff, no
> way to reproduce it reliably.
>
> With UP, I know someone who's seeing stale handles reported by the NFS
> server. The only known workaround is to stat the directories in question
> on the *server* side - a little bash with 'while true; do sleep 5; ls -l
> /directory; done' will do the trick.
>
> All of what I describe here are production environments - so it sucks to
> have that kind of problems. Some of it can be reproduced (the stale
> handle errors), and some of it can't.
>
> I guess the good news would be, that I don't know of any problems with
> XFS+LVM+MD if you do not export the FS via NFS :)
>
> That is, if you run 2.6.9. Any earlier kernel will b0rk your XFS under
> load.

Thanks for the tips, Jakob.

I *will* be exporting via NFS, so this is definitely good to know. I've
been looking at using jfs and reiser as well, but some preliminary
benchmarks suggested that xfs was the best performer for the kind of
workload that I'm anticipating. I guess xfs is out of the question now,
as I definitely don't want to deal with weird interactions like that.

Can anyone speak on the stability of (reiser|jfs|other) with nfs? My
biggest requirements are online resizing and stability (ext3 online
resize is still beta IIRC, but I wouldn't be opposed to using it if
someone could tell me otherwise); speed would be nice, but I'm willing
to sacrifice speed for the sake of reliability.

I'm personally using lvm + reiser + nfs without consequence on my
fileserver at home, but it's not seeing nearly the loads that this box
is going to see.

Thanks again,
--

Phil Dier (ICGLink.com -- 615 370-1530 x733)

/* vim:set noai nocindent ts=8 sw=8: */

2004-11-23 21:38:20

by Jakob Oestergaard

Subject: Re: oops with dual xeon 2.8ghz 4gb ram +smp, software raid, lvm, and xfs

On Tue, Nov 23, 2004 at 09:37:44AM -0600, Phil Dier wrote:
...
> I'm building this system with stability and flexibility foremost in
> mind. Am I foolish in using all of these technologies with a new-ish
> version of 2.6? Is there a particular version that would be better
> suited for my application? Any other suggestions you (or anyone else
> on the list) could give regarding stability would be greatly appreciated.

If you'll be exporting via NFS, it seems that there are still problems
with XFS+NFS.

With SMP, what I see is that sometimes a directory might decide that
it's a file - but I can't delete it, because it isn't 'empty' (it's
still somehow a directory). Waiting a day or two, the system will
change its mind back to letting the directory be a directory. Sometimes
modes will be fscked up as well - a regular file can change owner, or it
can change modes from '-rw-rw---' to '?---------'. Weird stuff, no
way to reproduce it reliably.

With UP, I know someone who's seeing stale handles reported by the NFS
server. The only known workaround is to stat the directories in question
on the *server* side - a little bash with 'while true; do sleep 5; ls -l
/directory; done' will do the trick.

All of what I describe here are production environments - so it sucks to
have that kind of problems. Some of it can be reproduced (the stale
handle errors), and some of it can't.

I guess the good news would be, that I don't know of any problems with
XFS+LVM+MD if you do not export the FS via NFS :)

That is, if you run 2.6.9. Any earlier kernel will b0rk your XFS under
load.

--

/ jakob

2004-11-23 22:43:50

by Christoph Hellwig

Subject: Re: oops with dual xeon 2.8ghz 4gb ram +smp, software raid, lvm, and xfs

On Tue, Nov 23, 2004 at 06:02:23PM +0100, Jakob Oestergaard wrote:
> With SMP, what I see is that sometimes a directory might decide that
> it's a file - but I can't delete it, because it isn't 'empty' (it's
> still somehow a directory). Waiting a day or two, the system will
> change its mind back to letting the directory be a directory. Sometimes
> modes will be fscked up as well - a regular file can change owner, or it
> can change modes from '-rw-rw---' to '?---------'. Weird stuff, no
> way to reproduce it reliably.

Actually I can reproduce it reliably by running nfs_fsstress.sh for a
looong time. The problem is that in the current XFS code the inode
generation counter starts at 0, but higher level code uses that as
a wildcard for any possible generation, so you may get a newly created
file for a stale NFS file handle of a deleted file with the same inode
number.

The patch below fixes it for me:


Index: fs/xfs/xfs_inode.c
===================================================================
RCS file: /cvs/linux-2.6-xfs/fs/xfs/xfs_inode.c,v
retrieving revision 1.406
diff -u -p -r1.406 xfs_inode.c
--- fs/xfs/xfs_inode.c 27 Oct 2004 12:06:24 -0000 1.406
+++ fs/xfs/xfs_inode.c 23 Nov 2004 20:40:56 -0000
@@ -1224,9 +1224,16 @@ xfs_ialloc(
ip->i_d.di_nextents = 0;
ASSERT(ip->i_d.di_nblocks == 0);
xfs_ichgtime(ip, XFS_ICHGTIME_CHG|XFS_ICHGTIME_ACC|XFS_ICHGTIME_MOD);
+
/*
- * di_gen will have been taken care of in xfs_iread.
+ * Bump the generation count so no one will confuse us with
+ * earlier incarnations of this inode.
+ *
+ * Done early to skip generation 0, which is used as a wildcard
+ * by higher level code.
*/
+ ip->i_d.di_gen++;
+
ip->i_d.di_extsize = 0;
ip->i_d.di_dmevmask = 0;
ip->i_d.di_dmstate = 0;
@@ -2370,11 +2377,6 @@ xfs_ifree(
XFS_IFORK_DSIZE(ip) / (uint)sizeof(xfs_bmbt_rec_t);
ip->i_d.di_format = XFS_DINODE_FMT_EXTENTS;
ip->i_d.di_aformat = XFS_DINODE_FMT_EXTENTS;
- /*
- * Bump the generation count so no one will be confused
- * by reincarnations of this inode.
- */
- ip->i_d.di_gen++;
xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);

if (delete) {

2004-11-23 23:15:13

by Christoph Hellwig

Subject: Re: oops with dual xeon 2.8ghz 4gb ram +smp, software raid, lvm, and xfs

On Tue, Nov 23, 2004 at 11:56:50PM +0100, Jakob Oestergaard wrote:
> Very nice!
>
> Is that patch on its way into mainline kernels, or is it waiting for
> more test data ?
>
> I could apply it and test it here if that would help (?)

It's waiting for review right now, but should go into mainline fairly
soon. Additional testing is of course always welcome.

2004-11-23 22:59:48

by Jakob Oestergaard

Subject: Re: oops with dual xeon 2.8ghz 4gb ram +smp, software raid, lvm, and xfs

On Tue, Nov 23, 2004 at 10:39:35PM +0000, Christoph Hellwig wrote:
> On Tue, Nov 23, 2004 at 06:02:23PM +0100, Jakob Oestergaard wrote:
> > With SMP, what I see is that sometimes a directory might decide that
> > it's a file - but I can't delete it, because it isn't 'empty' (it's
> > still somehow a directory). Waiting a day or two, the system will
> > change its mind back to letting the directory be a directory. Sometimes
> > modes will be fscked up as well - a regular file can change owner, or it
> > can change modes from '-rw-rw---' to '?---------'. Weird stuff, no
> > way to reproduce it reliably.
>
> Actually I can reproduce it reliably by running nfs_fsstress.sh for a
> looong time. The problem is that in the current XFS code the inode
> generation counter starts at 0, but higher level code uses that as
> a wildcard for any possible generation, so you may get a newly created
> file for a stale NFS file handle of a deleted file with the same inode
> number.
>
> The patch below fixes it for me:

Very nice!

Is that patch on its way into mainline kernels, or is it waiting for
more test data ?

I could apply it and test it here if that would help (?)

--

/ jakob

2004-11-24 16:50:21

by Phil Dier

Subject: Re: oops with dual xeon 2.8ghz 4gb ram +smp, software raid, lvm, and xfs

On Mon, 22 Nov 2004 16:17:25 -0800
Andrew Morton <[email protected]> wrote:

> Phil Dier <[email protected]> wrote:
> >
> > I'm setting up a storage array with Linux, software RAID, LVM, and XFS,
> > but I keep getting oopses during heavy I/O. I've been able to reproduce
> > this with 2.6.6, 2.6.8.1, 2.6.9, and 2.6.10-rc2-bk4. I have dual xeon
> > 2.8s with 4gb of ram. I'm using adaptec and a fusion mpt scsi devices
> > (more details in the following link). Connected are 2 ultra160 scsi
> > jbods w/ 2 disks apiece. I'm using raid 10 (or should it be 01?) mirrored
> > stripes.
> >
> > Due to its size, I've posted my debug info at this location (I've included
> > output from all of the above kernels):
> >
> > <http://www.icglink.com/cluster-debug-info.html> (~235kb)
>
> yow. The dread combination of XFS, LVM, software RAID and bloaty scsi
> drivers. Looks like a stack overrun.
>
> Can you rebuild the kernel with CONFIG_4KSTACKS=n?
>


Looks like 8k stacks did the trick, at least for the oops. Now I'm
seeing the stuff below.

I got a ton more of this with jfs and xfs, but it seems much less with
reiser. Should I be worried, or is this something I can safely ignore?
It doesn't lock the system.. Could files be getting corrupted?


Nov 23 17:38:20 calculon swapper: page allocation failure. order:0, mode:0x20
Nov 23 17:38:20 calculon [<c013c854>] __alloc_pages+0x1b9/0x35e
Nov 23 17:38:20 calculon [<c013ca1e>] __get_free_pages+0x25/0x3f
Nov 23 17:38:20 calculon [<c013fccb>] kmem_getpages+0x21/0xc9
Nov 23 17:38:20 calculon [<c0140813>] alloc_slabmgmt+0x55/0x5f
Nov 23 17:38:20 calculon [<c0140992>] cache_grow+0xab/0x14d
Nov 23 17:38:20 calculon [<c0140ba8>] cache_alloc_refill+0x174/0x219
Nov 23 17:38:20 calculon [<c0140ffe>] __kmalloc+0x85/0x8c
Nov 23 17:38:20 calculon [<c03f4f89>] alloc_skb+0x47/0xe0
Nov 23 17:38:20 calculon [<c032ebe5>] e1000_alloc_rx_buffers+0x44/0xe3
Nov 23 17:38:20 calculon [<c032e8e0>] e1000_clean_rx_irq+0x189/0x44a
Nov 23 17:38:20 calculon [<c032e4f2>] e1000_intr+0x36/0x83
Nov 23 17:38:20 calculon [<c0107899>] handle_IRQ_event+0x31/0x65
Nov 23 17:38:20 calculon [<c0107c19>] do_IRQ+0xb0/0x15f
Nov 23 17:38:20 calculon [<c0105a68>] common_interrupt+0x18/0x20
Nov 23 17:38:20 calculon [<c010301e>] default_idle+0x0/0x2c
Nov 23 17:38:20 calculon [<c0103047>] default_idle+0x29/0x2c
Nov 23 17:38:20 calculon [<c01030bc>] cpu_idle+0x3f/0x58
Nov 23 17:38:20 calculon swapper: page allocation failure. order:0, mode:0x20
Nov 23 17:38:20 calculon [<c013c854>] __alloc_pages+0x1b9/0x35e
Nov 23 17:38:20 calculon [<c013ca1e>] __get_free_pages+0x25/0x3f
Nov 23 17:38:20 calculon [<c013fccb>] kmem_getpages+0x21/0xc9
Nov 23 17:38:20 calculon [<c0140992>] cache_grow+0xab/0x14d
Nov 23 17:38:20 calculon [<c0140ba8>] cache_alloc_refill+0x174/0x219
Nov 23 17:38:20 calculon [<c0140ffe>] __kmalloc+0x85/0x8c
Nov 23 17:38:20 calculon [<c03f4f89>] alloc_skb+0x47/0xe0
Nov 23 17:38:20 calculon [<c032ebe5>] e1000_alloc_rx_buffers+0x44/0xe3
Nov 23 17:38:20 calculon [<c032e8e0>] e1000_clean_rx_irq+0x189/0x44a
Nov 23 17:38:20 calculon [<c032e4f2>] e1000_intr+0x36/0x83
Nov 23 17:38:20 calculon [<c0107899>] handle_IRQ_event+0x31/0x65
Nov 23 17:38:20 calculon [<c0107c19>] do_IRQ+0xb0/0x15f
Nov 23 17:38:20 calculon [<c0105a68>] common_interrupt+0x18/0x20
Nov 23 17:38:20 calculon [<c010301e>] default_idle+0x0/0x2c
Nov 23 17:38:20 calculon [<c0103047>] default_idle+0x29/0x2c
Nov 23 17:38:20 calculon [<c01030bc>] cpu_idle+0x3f/0x58
Nov 23 17:38:20 calculon swapper: page allocation failure. order:0, mode:0x20
Nov 23 17:38:20 calculon [<c013c854>] __alloc_pages+0x1b9/0x35e
Nov 23 17:38:20 calculon [<c013ca1e>] __get_free_pages+0x25/0x3f
Nov 23 17:38:20 calculon [<c013fccb>] kmem_getpages+0x21/0xc9
Nov 23 17:38:20 calculon [<c0140992>] cache_grow+0xab/0x14d
Nov 23 17:38:20 calculon [<c0140ba8>] cache_alloc_refill+0x174/0x219
Nov 23 17:38:20 calculon [<c0140ffe>] __kmalloc+0x85/0x8c
Nov 23 17:38:20 calculon [<c03f4f89>] alloc_skb+0x47/0xe0
Nov 23 17:38:20 calculon [<c032ebe5>] e1000_alloc_rx_buffers+0x44/0xe3
Nov 23 17:38:20 calculon [<c032e8e0>] e1000_clean_rx_irq+0x189/0x44a
Nov 23 17:38:20 calculon [<c032e4f2>] e1000_intr+0x36/0x83
Nov 23 17:38:20 calculon [<c0107899>] handle_IRQ_event+0x31/0x65
Nov 23 17:38:20 calculon [<c0107c19>] do_IRQ+0xb0/0x15f
Nov 23 17:38:20 calculon [<c0105a68>] common_interrupt+0x18/0x20
Nov 23 17:38:20 calculon [<c010301e>] default_idle+0x0/0x2c
Nov 23 17:38:20 calculon [<c0103047>] default_idle+0x29/0x2c
Nov 23 17:38:20 calculon [<c01030bc>] cpu_idle+0x3f/0x58
Nov 23 17:38:20 calculon swapper: page allocation failure. order:0, mode:0x20
Nov 23 17:38:20 calculon [<c013c854>] __alloc_pages+0x1b9/0x35e
Nov 23 17:38:20 calculon [<c013ca1e>] __get_free_pages+0x25/0x3f
Nov 23 17:38:20 calculon [<c013fccb>] kmem_getpages+0x21/0xc9
Nov 23 17:38:20 calculon [<c0140992>] cache_grow+0xab/0x14d
Nov 23 17:38:20 calculon [<c0140ba8>] cache_alloc_refill+0x174/0x219
Nov 23 17:38:20 calculon [<c0140ffe>] __kmalloc+0x85/0x8c
Nov 23 17:38:20 calculon [<c03f4f89>] alloc_skb+0x47/0xe0
Nov 23 17:38:20 calculon [<c032ebe5>] e1000_alloc_rx_buffers+0x44/0xe3
Nov 23 17:38:20 calculon [<c032e8e0>] e1000_clean_rx_irq+0x189/0x44a
Nov 23 17:38:20 calculon [<c0140ff0>] __kmalloc+0x77/0x8c
Nov 23 17:38:20 calculon [<c032e4f2>] e1000_intr+0x36/0x83
Nov 23 17:38:20 calculon [<c03f5020>] alloc_skb+0xde/0xe0
Nov 23 17:38:20 calculon [<c0107899>] handle_IRQ_event+0x31/0x65
Nov 23 17:38:20 calculon [<c0107c19>] do_IRQ+0xb0/0x15f
Nov 23 17:38:20 calculon [<c0105a68>] common_interrupt+0x18/0x20
Nov 23 17:38:20 calculon [<c03fb243>] net_rx_action+0x62/0xf6
Nov 23 17:38:20 calculon [<c0121beb>] __do_softirq+0xb7/0xc6
Nov 23 17:38:20 calculon [<c0121c27>] do_softirq+0x2d/0x2f
Nov 23 17:38:20 calculon [<c0107c8d>] do_IRQ+0x124/0x15f
Nov 23 17:38:20 calculon [<c0105a68>] common_interrupt+0x18/0x20
Nov 23 17:38:20 calculon [<c010301e>] default_idle+0x0/0x2c
Nov 23 17:38:20 calculon [<c0103047>] default_idle+0x29/0x2c
Nov 23 17:38:20 calculon [<c01030bc>] cpu_idle+0x3f/0x58

Nov 24 01:18:09 calculon swapper: page allocation failure. order:0, mode:0x20
Nov 24 01:18:09 calculon [<c013c854>] __alloc_pages+0x1b9/0x35e
Nov 24 01:18:09 calculon [<c040ce57>] ip_local_deliver_finish+0x0/0x181
Nov 24 01:18:09 calculon [<c013ca1e>] __get_free_pages+0x25/0x3f
Nov 24 01:18:09 calculon [<c013fccb>] kmem_getpages+0x21/0xc9
Nov 24 01:18:09 calculon [<c0140813>] alloc_slabmgmt+0x55/0x5f
Nov 24 01:18:09 calculon [<c0140992>] cache_grow+0xab/0x14d
Nov 24 01:18:09 calculon [<c0140ba8>] cache_alloc_refill+0x174/0x219
Nov 24 01:18:09 calculon [<c0140ffe>] __kmalloc+0x85/0x8c
Nov 24 01:18:09 calculon [<c03f4f89>] alloc_skb+0x47/0xe0
Nov 24 01:18:09 calculon [<c032ebe5>] e1000_alloc_rx_buffers+0x44/0xe3
Nov 24 01:18:09 calculon [<c032e8e0>] e1000_clean_rx_irq+0x189/0x44a
Nov 24 01:18:09 calculon [<c012d45d>] rcu_check_quiescent_state+0x78/0x8e
Nov 24 01:18:09 calculon [<c032e4f2>] e1000_intr+0x36/0x83
Nov 24 01:18:09 calculon [<c0107899>] handle_IRQ_event+0x31/0x65
Nov 24 01:18:09 calculon [<c0107c19>] do_IRQ+0xb0/0x15f
Nov 24 01:18:09 calculon [<c0105a68>] common_interrupt+0x18/0x20
Nov 24 01:18:09 calculon [<c010301e>] default_idle+0x0/0x2c
Nov 24 01:18:09 calculon [<c0103047>] default_idle+0x29/0x2c
Nov 24 01:18:09 calculon [<c01030bc>] cpu_idle+0x3f/0x58
Nov 24 01:18:09 calculon swapper: page allocation failure. order:0, mode:0x20
Nov 24 01:18:09 calculon [<c013c854>] __alloc_pages+0x1b9/0x35e
Nov 24 01:18:09 calculon [<c013ca1e>] __get_free_pages+0x25/0x3f
Nov 24 01:18:09 calculon [<c013fccb>] kmem_getpages+0x21/0xc9
Nov 24 01:18:09 calculon [<c0140992>] cache_grow+0xab/0x14d
Nov 24 01:18:09 calculon [<c0140ba8>] cache_alloc_refill+0x174/0x219
Nov 24 01:18:09 calculon [<c0140ffe>] __kmalloc+0x85/0x8c
Nov 24 01:18:09 calculon [<c03f4f89>] alloc_skb+0x47/0xe0
Nov 24 01:18:09 calculon [<c032ebe5>] e1000_alloc_rx_buffers+0x44/0xe3
Nov 24 01:18:09 calculon [<c032e8e0>] e1000_clean_rx_irq+0x189/0x44a
Nov 24 01:18:09 calculon [<c012d45d>] rcu_check_quiescent_state+0x78/0x8e
Nov 24 01:18:09 calculon [<c032e4f2>] e1000_intr+0x36/0x83
Nov 24 01:18:09 calculon [<c0107899>] handle_IRQ_event+0x31/0x65
Nov 24 01:18:09 calculon [<c0107c19>] do_IRQ+0xb0/0x15f
Nov 24 01:18:09 calculon [<c0105a68>] common_interrupt+0x18/0x20
Nov 24 01:18:09 calculon [<c010301e>] default_idle+0x0/0x2c
Nov 24 01:18:09 calculon [<c0103047>] default_idle+0x29/0x2c
Nov 24 01:18:09 calculon [<c01030bc>] cpu_idle+0x3f/0x58



--

Phil Dier (ICGLink.com -- 615 370-1530 x733)

/* vim:set noai nocindent ts=8 sw=8: */

2004-11-24 17:36:40

by Christoph Hellwig

Subject: Re: oops with dual xeon 2.8ghz 4gb ram +smp, software raid, lvm, and xfs

On Wed, Nov 24, 2004 at 09:45:49AM -0600, Phil Dier wrote:
> Looks like 8k stacks did the trick, at least for the oops. Now I'm
> seeing the stuff below.
>
> I got a ton more of this with jfs and xfs, but it seems much less with
> reiser. Should I be worried, or is this something I can safely ignore?
> It doesn't lock the system.. Could files be getting corrupted?
>
>
> Nov 23 17:38:20 calculon swapper: page allocation failure. order:0, mode:0x20

This is pretty harmless. It just means the NIC driver couldn't allocate as
much memory in the RX path as it wanted. Try increasing
/proc/sys/vm/min_free_kbytes to make the warnings go away and get fewer packet
drops.

2004-11-24 23:12:15

by Andrew Morton

Subject: Re: oops with dual xeon 2.8ghz 4gb ram +smp, software raid, lvm, and xfs

Phil Dier <[email protected]> wrote:
>
> > Can you rebuild the kernel with CONFIG_4KSTACKS=n?
> >
>
>
> Looks like 8k stacks did the trick, at least for the oops. Now I'm
> seeing the stuff below.
>
> I got a ton more of this with jfs and xfs, but it seems much less with
> reiser. Should I be worried, or is this something I can safely ignore?
> It doesn't lock the system.. Could files be getting corrupted?
>
>
> Nov 23 17:38:20 calculon swapper: page allocation failure. order:0, mode:0x20
> Nov 23 17:38:20 calculon [<c013c854>] __alloc_pages+0x1b9/0x35e
> Nov 23 17:38:20 calculon [<c013ca1e>] __get_free_pages+0x25/0x3f
> Nov 23 17:38:20 calculon [<c013fccb>] kmem_getpages+0x21/0xc9
> Nov 23 17:38:20 calculon [<c0140813>] alloc_slabmgmt+0x55/0x5f
> Nov 23 17:38:20 calculon [<c0140992>] cache_grow+0xab/0x14d
> Nov 23 17:38:20 calculon [<c0140ba8>] cache_alloc_refill+0x174/0x219
> Nov 23 17:38:20 calculon [<c0140ffe>] __kmalloc+0x85/0x8c
> Nov 23 17:38:20 calculon [<c03f4f89>] alloc_skb+0x47/0xe0
> Nov 23 17:38:20 calculon [<c032ebe5>] e1000_alloc_rx_buffers+0x44/0xe3

You didn't mention the kernel version. 2.6.9 had problems in this area, so
2.6.10-rc2 should be better. And there are post-2.6.10-rc2 fixes which
will provide more headroom.

2004-11-24 23:36:37

by NeilBrown

Subject: Re: oops with dual xeon 2.8ghz 4gb ram +smp, software raid, lvm, and xfs

On Monday November 22, [email protected] wrote:
> > <http://www.icglink.com/cluster-debug-info.html> (~235kb)
>
> yow. The dread combination of XFS, LVM, software RAID and bloaty scsi
> drivers. Looks like a stack overrun.
>
> Can you rebuild the kernel with CONFIG_4KSTACKS=n?
>

Would the following (untested-but-seems-to-compile -
explanation-of-concept) patch be at all reasonable to avoid stack
depth problems with stacked block devices, or is adding stuff to
task_struct frowned upon?

NeilBrown

==============================================
Signed-off-by: Neil Brown <[email protected]>

### Diffstat output
./drivers/block/ll_rw_blk.c | 38 +++++++++++++++++++++++++++++++++++++-
./include/linux/sched.h | 3 +++
2 files changed, 40 insertions(+), 1 deletion(-)

diff ./drivers/block/ll_rw_blk.c~current~ ./drivers/block/ll_rw_blk.c
--- ./drivers/block/ll_rw_blk.c~current~ 2004-11-16 15:55:55.000000000 +1100
+++ ./drivers/block/ll_rw_blk.c 2004-11-25 10:05:14.000000000 +1100
@@ -2609,7 +2609,7 @@ static inline void block_wait_queue_runn
* bi_sector for remaps as it sees fit. So the values of these fields
* should NOT be depended on after the call to generic_make_request.
*/
-void generic_make_request(struct bio *bio)
+static inline void __generic_make_request(struct bio *bio)
{
request_queue_t *q;
sector_t maxsector;
@@ -2686,6 +2686,42 @@ end_io:
} while (ret);
}

+/*
+ * We only want one ->make_request_fn to be active at a time,
+ * else stack usage with stacked devices could be a problem.
+ * So use current->bio_{list,tail} to keep a list of requests
+ * submitted by a make_request_fn function.
+ * current->bio_tail is also used as a flag to say if
+ * generic_make_request is currently active in this task or not.
+ * If it is NULL, then no make_request is active. If it is non-NULL,
+ * then a make_request is active, and new requests should be added
+ * at the tail
+ */
+void generic_make_request(struct bio *bio)
+{
+	if (current->bio_tail) {
+		/* make_request is active */
+		*(current->bio_tail) = bio;
+		bio->bi_next = NULL;
+		current->bio_tail = &bio->bi_next;
+		return;
+	}
+	/* not active yet, make it active */
+	current->bio_list = NULL;
+	current->bio_tail = &current->bio_list;
+	__generic_make_request(bio);
+	while (current->bio_list) {
+		bio = current->bio_list;
+		current->bio_list = bio->bi_next;
+		if (bio->bi_next == NULL)
+			current->bio_tail = &current->bio_list;
+		else
+			bio->bi_next = NULL;
+		__generic_make_request(bio);
+	}
+	current->bio_tail = NULL; /* deactivate */
+}
+
EXPORT_SYMBOL(generic_make_request);

/**

diff ./include/linux/sched.h~current~ ./include/linux/sched.h
--- ./include/linux/sched.h~current~ 2004-11-25 09:57:07.000000000 +1100
+++ ./include/linux/sched.h 2004-11-25 09:57:34.000000000 +1100
@@ -649,6 +649,9 @@ struct task_struct {

/* journalling filesystem info */
void *journal_info;
+
+/* stacked block device info */
+ struct bio *bio_list, **bio_tail;

/* VM state */
struct reclaim_state *reclaim_state;

2004-11-25 00:18:05

by NeilBrown

Subject: Re: oops with dual xeon 2.8ghz 4gb ram +smp, software raid, lvm, and xfs

On Wednesday November 24, [email protected] wrote:
> Neil Brown <[email protected]> wrote:
> >
> > Would the following (untested-but-seems-to-compile -
> > explanation-of-concept) patch be at all reasonable to avoid stack
> > depth problems with stacked block devices, or is adding stuff to
> > task_struct frowned upon?
>
> It's always a tradeoff - we've put things in task_struct before to get
> around sticky situations. Certainly, removing potentially unbounded stack
> utilisation is a worthwhile thing to do.
>
> The patch bends my brain a bit.

Recursion is like that (... like recursion, that is :-).

> Shouldn't the queueing happen in
> submit_bio()?

Both md and dm call generic_make_request rather than submit_bio to
start IO on slaves, so it wouldn't work in submit_bio. If dm and md
were changed to use submit_bio, then the counts (page-in, page-out)
would be quite different...

>
> Is bi_next free in there? If anyone tries to do synchronous I/O things
> will get stuck.

It is my understanding that bi_next is free. It is available for use
by ->make_request_fn and below. __make_request uses it for chaining
bio's together into a request. raid5 uses it for other things.

If a ->make_request_fn did synchronous IO things would definitely get
unstuck. But I don't think they should and doubt if they do (md
certainly doesn't).

NeilBrown

2004-11-25 00:20:41

by Andrew Morton

Subject: Re: oops with dual xeon 2.8ghz 4gb ram +smp, software raid, lvm, and xfs

Neil Brown <[email protected]> wrote:
>
> Would the following (untested-but-seems-to-compile -
> explanation-of-concept) patch be at all reasonable to avoid stack
> depth problems with stacked block devices, or is adding stuff to
> task_struct frowned upon?

It's always a tradeoff - we've put things in task_struct before to get
around sticky situations. Certainly, removing potentially unbounded stack
utilisation is a worthwhile thing to do.

The patch bends my brain a bit. Shouldn't the queueing happen in
submit_bio()?

Is bi_next free in there? If anyone tries to do synchronous I/O things
will get stuck.

2004-11-25 01:15:47

by Phil Dier

Subject: Re: oops with dual xeon 2.8ghz 4gb ram +smp, software raid, lvm, and xfs

On Wed, 24 Nov 2004 15:12:34 -0800
Andrew Morton <[email protected]> wrote:

> You didn't mention the kernel version. 2.6.9 had problems in this
> area, so 2.6.10-rc2 should be better. And there are post-2.6.10-rc2
> fixes which will provide more headroom.
>

Sorry, yes, it is 2.6.9 that I'm using atm. I pushed
/proc/sys/vm/min_free_kbytes up to 2048 (it was at 987 or something)
as Christoph suggested and so far, so good. It was such an infrequent
thing though, it's hard to tell if it did any good. I left some stuff
hammering on the array to run over the holiday break, so hopefully any
bad stuff will shake out. I'll give 2.6.10-rc2+ a whirl when I get back
on Monday.


Thanks everyone,

Phil

2004-11-25 01:20:14

by Andrew Morton

Subject: Re: oops with dual xeon 2.8ghz 4gb ram +smp, software raid, lvm, and xfs

Neil Brown <[email protected]> wrote:
>
> If a ->make_request_fn did synchronous IO things would definitely get
> unstuck. But I don't think they should and doubt if they do (md
> certainly doesn't).

generic_make_request() can block in get_request_wait(), but I can't
immediately think of a way in which that can deadlock things, especially if
each level is using a distinct queue.

It could certainly deadlock if a higher-level make_request() caller
required allocation of two or more requests at a lower level - all we'd
need is N/2 processes each trying to allocate two requests. But such a
lockup could happen in the current code anyway..

2004-11-25 06:58:29

by Jens Axboe

Subject: Re: oops with dual xeon 2.8ghz 4gb ram +smp, software raid, lvm, and xfs

On Thu, Nov 25 2004, Neil Brown wrote:
> On Wednesday November 24, [email protected] wrote:
> > Neil Brown <[email protected]> wrote:
> > >
> > > Would the following (untested-but-seems-to-compile -
> > > explanation-of-concept) patch be at all reasonable to avoid stack
> > > depth problems with stacked block devices, or is adding stuff to
> > > task_struct frowned upon?
> >
> > It's always a tradeoff - we've put things in task_struct before to get
> > around sticky situations. Certainly, removing potentially unbounded stack
> > utilisation is a worthwhile thing to do.
> >
> > The patch bends my brain a bit.
>
> Recursion is like that (... like recursion, that is :-).

Pardon my ignorance, but where is the bug that called for something like
this? I can't say I love the idea of adding a bio list structure to the
tasklist, it feels pretty hacky. generic_make_request() doesn't really
use that much stack, if you just kill the BDEVNAME_SIZE struct.

===== drivers/block/ll_rw_blk.c 1.280 vs edited =====
--- 1.280/drivers/block/ll_rw_blk.c 2004-11-15 11:21:40 +01:00
+++ edited/drivers/block/ll_rw_blk.c 2004-11-25 07:56:10 +01:00
@@ -67,6 +67,11 @@
EXPORT_SYMBOL(blk_max_low_pfn);
EXPORT_SYMBOL(blk_max_pfn);

+struct b_name {
+ char b[BDEVNAME_SIZE];
+};
+static DEFINE_PER_CPU(struct b_name, b_cpu_name);
+
/* Amount of time in which a process may batch requests */
#define BLK_BATCH_TIME (HZ/50UL)

@@ -2622,19 +2627,21 @@

if (maxsector < nr_sectors ||
maxsector - nr_sectors < sector) {
- char b[BDEVNAME_SIZE];
+ struct b_name *bn = &get_cpu_var(b_cpu_name);
+
/* This may well happen - the kernel calls
* bread() without checking the size of the
* device, e.g., when mounting a device. */
printk(KERN_INFO
"attempt to access beyond end of device\n");
printk(KERN_INFO "%s: rw=%ld, want=%Lu, limit=%Lu\n",
- bdevname(bio->bi_bdev, b),
+ bdevname(bio->bi_bdev, bn->b),
bio->bi_rw,
(unsigned long long) sector + nr_sectors,
(long long) maxsector);

set_bit(BIO_EOF, &bio->bi_flags);
+ put_cpu_var(bn);
goto end_io;
}
}

> > Shouldn't the queueing happen in
> > submit_bio()?
>
> Both md and dm call generic_make_request rather than submit_bio to
> start IO on slaves, so it wouldn't work in submit_bio. If dm and md
> were changed to use submit_bio, then the counts (page-in, page-out)
> would be quite different...

generic_make_request() has always been where the unstacking has
happened, so yeah submit_bio() would not work.

> >
> > Is bi_next free in there? If anyone tries to do synchronous I/O things
> > will get stuck.
>
> It is my understanding that bi_next is free. It is available for use
> by ->make_request_fn and below. __make_request uses it for chaining
> bio's together into a request. raid5 uses it for other things.

That's correct, bi_next is only used for request chaining. So it's
available for free use by the stacking drivers up until they call
make_request on a bio.

> If a ->make_request_fn did synchronous IO things would definitely get
> unstuck. But I don't think they should and doubt if they do (md
> certainly doesn't).

There's nothing guaranteeing that a make_request would not do sync io.

--
Jens Axboe
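
The idea under discussion, queueing nested submissions instead of recursing, can be modelled in userspace. What follows is a minimal sketch and not Neil's patch: `toy_bio`, `toy_task` and the `toy_*` functions are hypothetical names, the per-task list stands in for the structure proposed for task_struct, and pending bios are chained through `bi_next`, which the thread describes as free for the stacking drivers at this point.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct bio; only the fields this sketch needs. */
struct toy_bio {
	struct toy_bio *bi_next;	/* free until a bottom driver takes the bio */
	int layers_left;		/* stacking layers still below this bio */
};

/* Hypothetical per-task state, standing in for a list hung off task_struct. */
struct toy_task {
	struct toy_bio *pending_head;
	struct toy_bio **pending_tail;	/* NULL when no submission is in progress */
	int depth;			/* make_request calls currently on the stack */
	int max_depth;			/* deepest nesting seen so far */
	int completed;			/* bios that reached the bottom device */
};

void toy_generic_make_request(struct toy_task *t, struct toy_bio *bio);

/* Models a stacking driver (md/dm): remap the bio and resubmit one layer down. */
static void toy_make_request(struct toy_task *t, struct toy_bio *bio)
{
	t->depth++;
	if (t->depth > t->max_depth)
		t->max_depth = t->depth;

	if (bio->layers_left > 0) {
		bio->layers_left--;
		toy_generic_make_request(t, bio);
	} else {
		t->completed++;		/* bottom device: the I/O would start here */
	}
	t->depth--;
}

void toy_generic_make_request(struct toy_task *t, struct toy_bio *bio)
{
	if (t->pending_tail) {
		/* Nested call: queue the bio and return instead of recursing. */
		bio->bi_next = NULL;
		*t->pending_tail = bio;
		t->pending_tail = &bio->bi_next;
		return;
	}

	/* Outermost call: dispatch this bio, then drain whatever got queued. */
	t->pending_head = NULL;
	t->pending_tail = &t->pending_head;
	while (bio) {
		toy_make_request(t, bio);

		bio = t->pending_head;
		if (bio) {
			t->pending_head = bio->bi_next;
			if (!t->pending_head)
				t->pending_tail = &t->pending_head;
			bio->bi_next = NULL;
		}
	}
	assert(t->depth == 0);
	t->pending_tail = NULL;
}
```

Submitting a bio with five layers below it leaves `max_depth` at 1 in this model: each driver's make_request runs from the outermost loop rather than on top of its parent's frame. The model also shows the concern raised above. A driver that waited synchronously inside its make_request for a bio it had just queued would never see that bio dispatched.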

2004-11-25 07:14:57

by Andrew Morton

Subject: Re: oops with dual xeon 2.8ghz 4gb ram +smp, software raid, lvm, and xfs

Jens Axboe <[email protected]> wrote:
>
> On Thu, Nov 25 2004, Neil Brown wrote:
> > On Wednesday November 24, [email protected] wrote:
> > > Neil Brown <[email protected]> wrote:
> > > >
> > > > Would the following (untested-but-seems-to-compile -
> > > > explanation-of-concept) patch be at all reasonable to avoid stack
> > > > depth problems with stacked block devices, or is adding stuff to
> > > > task_struct frowned upon?
> > >
> > > It's always a tradeoff - we've put things in task_struct before to get
> > > around sticky situations. Certainly, removing potentially unbounded stack
> > > utilisation is a worthwhile thing to do.
> > >
> > > The patch bends my brain a bit.
> >
> > Recursion is like that (... like recursion, that is :-).
>
> Pardon my ignorance, but where is the bug that called for something like
> this?

Well there was an xfs-on-raid-on-lvm stack overrun reported, but the
general problem we're addressing here is that stacking drivers can cause
arbitrary amounts of kernel stack windup.

> I can't say I love the idea of adding a bio list structure to the
> tasklist, it feels pretty hacky. generic_make_request() doesn't really
> use that much stack, if you just kill the BDEVNAME_SIZE struct.

Looks like a sensible thing to do, although it would be tidier to move the
whole thing into a separate function, no?


--- 25/drivers/block/ll_rw_blk.c~generic_make_request-stack-savings 2004-11-24 23:03:06.347778648 -0800
+++ 25-akpm/drivers/block/ll_rw_blk.c 2004-11-24 23:07:39.798207864 -0800
@@ -2584,6 +2584,20 @@ static inline void block_wait_queue_runn
 	}
 }

+static void handle_bad_sector(struct bio *bio)
+{
+	char b[BDEVNAME_SIZE];
+
+	printk(KERN_INFO "attempt to access beyond end of device\n");
+	printk(KERN_INFO "%s: rw=%ld, want=%Lu, limit=%Lu\n",
+			bdevname(bio->bi_bdev, b),
+			bio->bi_rw,
+			(unsigned long long)bio->bi_sector + bio_sectors(bio),
+			(long long)(bio->bi_bdev->bd_inode->i_size >> 9));
+
+	set_bit(BIO_EOF, &bio->bi_flags);
+}
+
 /**
  * generic_make_request: hand a buffer to its device driver for I/O
  * @bio: The bio describing the location in memory and on the device.
@@ -2620,21 +2634,13 @@ void generic_make_request(struct bio *bi
 	if (maxsector) {
 		sector_t sector = bio->bi_sector;

-		if (maxsector < nr_sectors ||
-		    maxsector - nr_sectors < sector) {
-			char b[BDEVNAME_SIZE];
-			/* This may well happen - the kernel calls
-			 * bread() without checking the size of the
-			 * device, e.g., when mounting a device. */
-			printk(KERN_INFO
-			       "attempt to access beyond end of device\n");
-			printk(KERN_INFO "%s: rw=%ld, want=%Lu, limit=%Lu\n",
-			       bdevname(bio->bi_bdev, b),
-			       bio->bi_rw,
-			       (unsigned long long) sector + nr_sectors,
-			       (long long) maxsector);
-
-			set_bit(BIO_EOF, &bio->bi_flags);
+		if (maxsector < nr_sectors || maxsector - nr_sectors < sector) {
+			/*
+			 * This may well happen - the kernel calls bread()
+			 * without checking the size of the device, e.g., when
+			 * mounting a device.
+			 */
+			handle_bad_sector(bio);
 			goto end_io;
 		}
 	}
_
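
The shape of this change can be tried outside the kernel. Below is a minimal userspace sketch under stated assumptions: `toy_req`, `toy_check_range` and `toy_handle_bad_sector` are hypothetical names, `TOY_NAME_SIZE` merely stands in for BDEVNAME_SIZE, and the `noinline` attribute is a GCC/Clang extension that the patch above does not use. It keeps the name buffer in a cold helper and uses the same overflow-safe form of the range test.

```c
#include <assert.h>
#include <stdio.h>

#define TOY_NAME_SIZE 32	/* assumed stand-in for BDEVNAME_SIZE */

/* Toy request: just enough state to model the range check. */
struct toy_req {
	unsigned long long sector;	/* first sector requested */
	unsigned long long nr_sectors;	/* length of the request in sectors */
	unsigned long long dev_sectors;	/* device size in sectors, 0 = unknown */
	const char *dev_name;
	int eof;			/* stands in for the BIO_EOF flag */
};

/*
 * Cold error path. Kept out of line so the name buffer only occupies
 * stack while an error is being reported.
 */
__attribute__((noinline))
static void toy_handle_bad_sector(struct toy_req *req)
{
	char b[TOY_NAME_SIZE];

	snprintf(b, sizeof(b), "%s", req->dev_name);
	fprintf(stderr, "attempt to access beyond end of device\n");
	fprintf(stderr, "%s: want=%llu, limit=%llu\n",
		b, req->sector + req->nr_sectors, req->dev_sectors);
	req->eof = 1;
}

/* Returns 0 when the request fits on the device, -1 otherwise. */
int toy_check_range(struct toy_req *req)
{
	unsigned long long maxsector = req->dev_sectors;

	/*
	 * Same shape as the kernel test: comparing against
	 * maxsector - nr_sectors avoids computing sector + nr_sectors,
	 * which could wrap around for a huge sector number.
	 */
	if (maxsector &&
	    (maxsector < req->nr_sectors ||
	     maxsector - req->nr_sectors < req->sector)) {
		toy_handle_bad_sector(req);
		return -1;
	}
	return 0;
}
```

A request that ends exactly at the device boundary passes, one sector further fails, and a device size of zero skips the check, as in the original code. Whether a compiler keeps a single-caller static helper out of line without an annotation is not guaranteed, which is why the sketch marks it explicitly.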

2004-11-25 07:17:20

by Jens Axboe

Subject: Re: oops with dual xeon 2.8ghz 4gb ram +smp, software raid, lvm, and xfs

On Wed, Nov 24 2004, Andrew Morton wrote:
> Jens Axboe <[email protected]> wrote:
> >
> > On Thu, Nov 25 2004, Neil Brown wrote:
> > > On Wednesday November 24, [email protected] wrote:
> > > > Neil Brown <[email protected]> wrote:
> > > > >
> > > > > Would the following (untested-but-seems-to-compile -
> > > > > explanation-of-concept) patch be at all reasonable to avoid stack
> > > > > depth problems with stacked block devices, or is adding stuff to
> > > > > task_struct frowned upon?
> > > >
> > > > It's always a tradeoff - we've put things in task_struct before to get
> > > > around sticky situations. Certainly, removing potentially unbounded stack
> > > > utilisation is a worthwhile thing to do.
> > > >
> > > > The patch bends my brain a bit.
> > >
> > > Recursion is like that (... like recursion, that is :-).
> >
> > Pardon my ignorance, but where is the bug that called for something like
> > this?
>
> Well there was an xfs-on-raid-on-lvm stack overrun reported, but the
> general problem we're addressing here is that stacking drivers can cause
> arbitrary amounts of kernel stack windup.

Ok. Without b[] on the stack locally, I don't think it's an issue.

> > I can't say I love the idea of adding a bio list structure to the
> > tasklist, it feels pretty hacky. generic_make_request() doesn't really
> > use that much stack, if you just kill the BDEVNAME_SIZE struct.
>
> Looks like a sensible thing to do, although it would be tidier to move the
> whole thing into a separate function, no?

Yep, works for me.

--
Jens Axboe

2004-11-28 11:31:15

by David Greaves

Subject: Re: oops with dual xeon 2.8ghz 4gb ram +smp, software raid, lvm, and xfs

Andrew Morton wrote:

>Phil Dier <[email protected]> wrote:
>
>
>>>Can you rebuild the kernel with CONFIG_4KSTACKS=n?
>>>
>>>
>>>
>>Looks like 8k stacks did the trick, at least for the oops. Now I'm
>>seeing the stuff below.
>>
>>I got a ton more of this with jfs and xfs, but it seems much less with
>>reiser. Should I be worried, or is this something I can safely ignore?
>>It doesn't lock the system.. Could files be getting corrupted?
>>
>>
>>Nov 23 17:38:20 calculon swapper: page allocation failure. order:0, mode:0x20
>>Nov 23 17:38:20 calculon [<c013c854>] __alloc_pages+0x1b9/0x35e
>>Nov 23 17:38:20 calculon [<c013ca1e>] __get_free_pages+0x25/0x3f
>>Nov 23 17:38:20 calculon [<c013fccb>] kmem_getpages+0x21/0xc9
>>Nov 23 17:38:20 calculon [<c0140813>] alloc_slabmgmt+0x55/0x5f
>>Nov 23 17:38:20 calculon [<c0140992>] cache_grow+0xab/0x14d
>>Nov 23 17:38:20 calculon [<c0140ba8>] cache_alloc_refill+0x174/0x219
>>Nov 23 17:38:20 calculon [<c0140ffe>] __kmalloc+0x85/0x8c
>>Nov 23 17:38:20 calculon [<c03f4f89>] alloc_skb+0x47/0xe0
>>Nov 23 17:38:20 calculon [<c032ebe5>] e1000_alloc_rx_buffers+0x44/0xe3
>>
>>
>
>You didn't mention the kernel version. 2.6.9 had problems in this area, so
>2.6.10-rc2 should be better. And there are post-2.6.10-rc2 fixes which
>will provide more headroom.
>
>
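
For reference, the `mode:0x20` in the page allocation failure quoted above can be decoded by hand. The sketch below is only an aid under assumptions: the flag values are as recalled from the 2.6.9-era include/linux/gfp.h and should be checked against the tree in use, and the `TOY_*` names and both functions are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Flag values as recalled from 2.6.9's include/linux/gfp.h (assumption). */
#define TOY_GFP_DMA     0x01u
#define TOY_GFP_HIGHMEM 0x02u
#define TOY_GFP_WAIT    0x10u	/* caller may sleep */
#define TOY_GFP_HIGH    0x20u	/* may dip into emergency reserves */
#define TOY_GFP_IO      0x40u	/* may start physical I/O */
#define TOY_GFP_FS      0x80u	/* may call into the filesystem */

static const struct {
	unsigned int bit;
	const char *name;
} toy_gfp_names[] = {
	{ TOY_GFP_DMA,     "__GFP_DMA" },
	{ TOY_GFP_HIGHMEM, "__GFP_HIGHMEM" },
	{ TOY_GFP_WAIT,    "__GFP_WAIT" },
	{ TOY_GFP_HIGH,    "__GFP_HIGH" },
	{ TOY_GFP_IO,      "__GFP_IO" },
	{ TOY_GFP_FS,      "__GFP_FS" },
};

/* An allocation may sleep only if __GFP_WAIT is set. */
int toy_gfp_can_sleep(unsigned int mode)
{
	return (mode & TOY_GFP_WAIT) != 0;
}

/* Write the names of the set flags into buf, separated by '|'. */
void toy_gfp_describe(unsigned int mode, char *buf, size_t len)
{
	size_t i;

	if (len == 0)
		return;
	buf[0] = '\0';
	for (i = 0; i < sizeof(toy_gfp_names) / sizeof(toy_gfp_names[0]); i++) {
		size_t used = strlen(buf);
		int n;

		if (!(mode & toy_gfp_names[i].bit))
			continue;
		n = snprintf(buf + used, len - used, "%s%s",
			     used ? "|" : "", toy_gfp_names[i].name);
		if (n < 0 || (size_t)n >= len - used)
			break;		/* output truncated, stop here */
	}
}
```

Under these values 0x20 is `__GFP_HIGH` alone, an allocation that may not sleep, which fits a network driver refilling receive buffers. By the same table 0xd0 would be `__GFP_WAIT|__GFP_IO|__GFP_FS`.
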
Hi
I have a system that's running 2.6.10rc2
It has libata sata_promise + sata_sil drives in an md raid5 array that's
used by lvm2 and then xfs; then exported via nfs.
I saw this thread, upgraded to 2.6.10rc2 and I'm posting this in case
it's related (it's hard to tell)

This oops happened whilst the box was quiet

Hopefully relevant config bits:
Single processor
echo 16384 > /proc/sys/vm/min_free_kbytes
CONFIG_4KSTACKS=n
I've done a memtest.
I haven't applied the inode patch - I'm usually writing a single 1-3GB
file whilst reading another.

Can I help by providing anything else?

Nov 28 09:05:03 cu kernel: Unable to handle kernel paging request at
virtual address 00100104
Nov 28 09:05:03 cu kernel: printing eip:
Nov 28 09:05:03 cu kernel: c0139a62
Nov 28 09:05:03 cu kernel: *pde = 00000000
Nov 28 09:05:03 cu kernel: Oops: 0002 [#1]
Nov 28 09:05:03 cu kernel: Modules linked in: nfs af_packet ipv6 e100
mii usblp uhci_hcd usbcore nfsd exportfs lockd sunrpc sk98lin unix
Nov 28 09:05:03 cu kernel: CPU: 0
Nov 28 09:05:03 cu kernel: EIP: 0060:[cache_alloc_refill+210/528]
Not tainted VLI
Nov 28 09:05:03 cu kernel: EFLAGS: 00010046 (2.6.10-rc2)
Nov 28 09:05:03 cu kernel: EIP is at cache_alloc_refill+0xd2/0x210
Nov 28 09:05:03 cu kernel: eax: 00100100 ebx: dffe2a00 ecx:
ffffffff edx: dffe3a6c
Nov 28 09:05:03 cu kernel: esi: c6118020 edi: c6118038 ebp:
dffe2a10 esp: dd627e40
Nov 28 09:05:03 cu kernel: ds: 007b es: 007b ss: 0068
Nov 28 09:05:03 cu kernel: Process nfsd (pid: 2230, threadinfo=dd626000
task=df1a7a00)
Nov 28 09:05:03 cu kernel: Stack: 0000002c 00000008 ca45dcbc c6118038
dffe3a6c dffe3a74 00000296 ca45dcbc
Nov 28 09:05:03 cu kernel: d12c7b7c 00000000 c0139d8e dffe3a60
000000d0 fffffff4 c0162d8c dffe3a60
Nov 28 09:05:03 cu kernel: 000000d0 dd627ee4 d12c7b7c 00000000
c015922d d12c7b7c fffffff4 ca45dcbc
Nov 28 09:05:03 cu kernel: Call Trace:
Nov 28 09:05:03 cu kernel: [kmem_cache_alloc+62/64]
kmem_cache_alloc+0x3e/0x40
Nov 28 09:05:03 cu kernel: [d_alloc+28/416] d_alloc+0x1c/0x1a0
Nov 28 09:05:03 cu kernel: [cached_lookup+125/144] cached_lookup+0x7d/0x90
Nov 28 09:05:03 cu kernel: [__lookup_hash+139/224] __lookup_hash+0x8b/0xe0
Nov 28 09:05:03 cu kernel: [lookup_hash+31/48] lookup_hash+0x1f/0x30
Nov 28 09:05:03 cu kernel: [lookup_one_len+97/112] lookup_one_len+0x61/0x70
Nov 28 09:05:03 cu kernel: [pg0+550179216/1069196288]
nfsd_lookup+0x110/0x490 [nfsd]
Nov 28 09:05:03 cu kernel: [pg0+550211681/1069196288]
nfsd3_proc_lookup+0xa1/0xe0[nfsd]
Nov 28 09:05:03 cu kernel: [pg0+550167977/1069196288]
nfsd_dispatch+0xd9/0x230 [nfsd]
Nov 28 09:05:03 cu kernel: [pg0+550042452/1069196288]
svc_process+0x4a4/0x690 [sunrpc]
Nov 28 09:05:03 cu kernel: [default_wake_function+0/32]
default_wake_function+0x0/0x20
Nov 28 09:05:03 cu kernel: [pg0+550167404/1069196288] nfsd+0x18c/0x2f0
[nfsd]
Nov 28 09:05:03 cu kernel: [pg0+550167008/1069196288] nfsd+0x0/0x2f0 [nfsd]
Nov 28 09:05:03 cu kernel: [kernel_thread_helper+5/20]
kernel_thread_helper+0x5/0x14
Nov 28 09:05:03 cu kernel: Code: 8b 56 10 0f b7 46 14 42 89 56 10 8b 7c
24 0c 0f b7 04 47 66 89 46 14 8b 44 24 2c 3b 50 3c 73 06 49 83 f9 ff 75
c3 8b 56 04 8b 06 <89> 50 04 89 02 c7 46 04 00 02 20 00 66 83 7e 14 ff
c7 06 00 01

2004-11-28 18:29:05

by Andrew Morton

Subject: Re: oops with dual xeon 2.8ghz 4gb ram +smp, software raid, lvm, and xfs

David Greaves <[email protected]> wrote:
>
> ...
> I have a system that's running 2.6.10rc2
> It has libata sata_promise + sata_sil drives in an md raid5 array that's
> used by lvm2 and then xfs; then exported via nfs.
> I saw this thread, upgraded to 2.6.10rc2 and I'm posting this in case
> it's related (it's hard to tell)
>
> This oops happened whilst the box was quiet
>
> Hopefully relevant config bits:
> Single processor
> echo 16384 > /proc/sys/vm/min_free_kbytes
> CONFIG_4KSTACKS=n
> I've done a memtest.
> I haven't applied the inode patch - I'm usually writing a single 1-3GB
> file whilst reading another.
>
> Can I help by providing anything else?
>
> Nov 28 09:05:03 cu kernel: Unable to handle kernel paging request at
> virtual address 00100104

That's the list_del() poisoning pattern.

> Nov 28 09:05:03 cu kernel: printing eip:
> Nov 28 09:05:03 cu kernel: c0139a62
> Nov 28 09:05:03 cu kernel: *pde = 00000000
> Nov 28 09:05:03 cu kernel: Oops: 0002 [#1]
> Nov 28 09:05:03 cu kernel: Modules linked in: nfs af_packet ipv6 e100
> mii usblp uhci_hcd usbcore nfsd exportfs lockd sunrpc sk98lin unix
> Nov 28 09:05:03 cu kernel: CPU: 0
> Nov 28 09:05:03 cu kernel: EIP: 0060:[cache_alloc_refill+210/528]
> Not tainted VLI
> Nov 28 09:05:03 cu kernel: EFLAGS: 00010046 (2.6.10-rc2)
> Nov 28 09:05:03 cu kernel: EIP is at cache_alloc_refill+0xd2/0x210
> Nov 28 09:05:03 cu kernel: eax: 00100100 ebx: dffe2a00 ecx:
> ffffffff edx: dffe3a6c
> Nov 28 09:05:03 cu kernel: esi: c6118020 edi: c6118038 ebp:
> dffe2a10 esp: dd627e40
> Nov 28 09:05:03 cu kernel: ds: 007b es: 007b ss: 0068
> Nov 28 09:05:03 cu kernel: Process nfsd (pid: 2230, threadinfo=dd626000
> task=df1a7a00)
> Nov 28 09:05:03 cu kernel: Stack: 0000002c 00000008 ca45dcbc c6118038
> dffe3a6c dffe3a74 00000296 ca45dcbc
> Nov 28 09:05:03 cu kernel: d12c7b7c 00000000 c0139d8e dffe3a60
> 000000d0 fffffff4 c0162d8c dffe3a60
> Nov 28 09:05:03 cu kernel: 000000d0 dd627ee4 d12c7b7c 00000000
> c015922d d12c7b7c fffffff4 ca45dcbc
> Nov 28 09:05:03 cu kernel: Call Trace:
> Nov 28 09:05:03 cu kernel: [kmem_cache_alloc+62/64]
> kmem_cache_alloc+0x3e/0x40
> Nov 28 09:05:03 cu kernel: [d_alloc+28/416] d_alloc+0x1c/0x1a0
> Nov 28 09:05:03 cu kernel: [cached_lookup+125/144] cached_lookup+0x7d/0x90
> Nov 28 09:05:03 cu kernel: [__lookup_hash+139/224] __lookup_hash+0x8b/0xe0
> Nov 28 09:05:03 cu kernel: [lookup_hash+31/48] lookup_hash+0x1f/0x30
> Nov 28 09:05:03 cu kernel: [lookup_one_len+97/112] lookup_one_len+0x61/0x70
> Nov 28 09:05:03 cu kernel: [pg0+550179216/1069196288]
> nfsd_lookup+0x110/0x490 [nfsd]
> Nov 28 09:05:03 cu kernel: [pg0+550211681/1069196288]
> nfsd3_proc_lookup+0xa1/0xe0[nfsd]
> Nov 28 09:05:03 cu kernel: [pg0+550167977/1069196288]
> nfsd_dispatch+0xd9/0x230 [nfsd]
> Nov 28 09:05:03 cu kernel: [pg0+550042452/1069196288]
> svc_process+0x4a4/0x690 [sunrpc]
> Nov 28 09:05:03 cu kernel: [default_wake_function+0/32]
> default_wake_function+0x0/0x20
> Nov 28 09:05:03 cu kernel: [pg0+550167404/1069196288] nfsd+0x18c/0x2f0
> [nfsd]
> Nov 28 09:05:03 cu kernel: [pg0+550167008/1069196288] nfsd+0x0/0x2f0 [nfsd]
> Nov 28 09:05:03 cu kernel: [kernel_thread_helper+5/20]

It appears that the dentry cache's slab freelists have become corrupted.
Odd, because everyone uses that code a lot. I'd suggest that you enable
CONFIG_DEBUG_SLAB, see if that catches anything.
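
The link between the faulting address 00100104 and list_del() poisoning is plain arithmetic. The sketch below assumes the poison values as recalled from the 2.6-era include/linux/list.h (0x00100100 for `next`, 0x00200200 for `prev`) and the 32-bit x86 layout of struct list_head, with `next` at offset 0 and `prev` at offset 4. The `toy_*` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Poison values as recalled from 2.6-era include/linux/list.h (assumption). */
#define TOY_LIST_POISON1 0x00100100u	/* stored in entry->next by list_del() */
#define TOY_LIST_POISON2 0x00200200u	/* stored in entry->prev by list_del() */

/* Byte offsets inside struct list_head on 32-bit x86. */
#define TOY_OFF_NEXT 0u
#define TOY_OFF_PREV 4u

enum toy_poison_hit { TOY_NOT_POISON, TOY_VIA_POISON1, TOY_VIA_POISON2 };

/* Address written by "next->prev = prev" when next holds the given value. */
uint32_t toy_prev_store_address(uint32_t next)
{
	assert(next <= UINT32_MAX - TOY_OFF_PREV);
	return next + TOY_OFF_PREV;
}

/* Address written by "prev->next = next" when prev holds the given value. */
uint32_t toy_next_store_address(uint32_t prev)
{
	return prev + TOY_OFF_NEXT;
}

/* Does a faulting address fall inside a list_head located at a poison value? */
enum toy_poison_hit toy_classify_fault(uint32_t addr)
{
	const uint32_t list_head_size = 8;	/* two 32-bit pointers */

	if (addr >= TOY_LIST_POISON1 &&
	    addr - TOY_LIST_POISON1 < list_head_size)
		return TOY_VIA_POISON1;
	if (addr >= TOY_LIST_POISON2 &&
	    addr - TOY_LIST_POISON2 < list_head_size)
		return TOY_VIA_POISON2;
	return TOY_NOT_POISON;
}
```

With these values, a second unlink of an already deleted entry stores through `next->prev` at 0x00100100 + 4, which is the address in the report. That reading is consistent with eax holding 00100100 in the register dump, though the sketch itself proves nothing about how the entry came to be poisoned.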

2004-11-30 17:40:53

by Phil Dier

Subject: Re: oops with dual xeon 2.8ghz 4gb ram +smp, software raid, lvm, and xfs

Using the patch below with nfs_fsstress.sh results in this oops:

Unable to handle kernel NULL pointer dereference at virtual address 00000000
printing eip:
00000000
*pde = 00000000
Oops: 0000 [#1]
SMP
Modules linked in:
CPU: 1
EIP: 0060:[<00000000>] Not tainted VLI
EFLAGS: 00010286 (2.6.9)
EIP is at 0x0
eax: c05b2be0 ebx: fffffff4 ecx: f590e744 edx: f590e744
esi: f03508e0 edi: f2f418c0 ebp: 00000000 esp: f7bafeac
ds: 007b es: 007b ss: 0068
Process nfsd (pid: 6095, threadinfo=f7bae000 task=f70260b0)
Stack: c01638b6 f03508e0 f2f418c0 00000000 ffffffff f6f860d9 c3183204 f6f860c8
c0163905 f7bafee8 f590e710 00000000 c0163970 f7bafee8 f590e710 b28e88ba
00000011 f6f860c8 00000011 f590e710 00000011 c01f4f16 f6f860c8 f590e710
Call Trace:
[<c01638b6>] __lookup_hash+0xa6/0xd6
[<c0163905>] lookup_hash+0x1f/0x23
[<c0163970>] lookup_one_len+0x67/0x74
[<c01f4f16>] nfsd_lookup+0x115/0x4be
[<c01fd791>] nfsd3_proc_lookup+0xa1/0xe0
[<c01f23c7>] nfsd_dispatch+0xd9/0x1fa
[<c043adda>] svc_process+0x56a/0x784
[<c0119d71>] default_wake_function+0x0/0x12
[<c01f2148>] nfsd+0x1f3/0x399
[<c01f1f55>] nfsd+0x0/0x399
[<c0103271>] kernel_thread_helper+0x5/0xb
Code: Bad EIP value.
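
The number after "Oops:" is the page-fault error code, and reading it helps when comparing this trace with the earlier one in the thread. The decoder below is a minimal sketch, assuming the three low bits as recalled from the comment in arch/i386/mm/fault.c; the `toy_*` names and the line parser are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Bit meanings as recalled from arch/i386/mm/fault.c (assumption). */
#define TOY_PF_PROT  0x1u	/* 0: page not present, 1: protection violation */
#define TOY_PF_WRITE 0x2u	/* 0: read access,      1: write access */
#define TOY_PF_USER  0x4u	/* 0: kernel mode,      1: user mode */

struct toy_fault {
	int protection;
	int write;
	int user;
};

/* Extract the hex error code from an "Oops: NNNN ..." line; 0 on success. */
int toy_parse_oops_line(const char *line, unsigned int *code)
{
	unsigned int value;

	if (sscanf(line, "Oops: %x", &value) != 1)
		return -1;
	*code = value;
	return 0;
}

/* Split the error code into its three documented bits. */
struct toy_fault toy_decode_oops(unsigned int code)
{
	struct toy_fault f;

	f.protection = (code & TOY_PF_PROT) != 0;
	f.write = (code & TOY_PF_WRITE) != 0;
	f.user = (code & TOY_PF_USER) != 0;
	return f;
}
```

Read this way, "Oops: 0000" here is a kernel-mode read of a page that is not present, which fits a jump through a NULL function pointer with EIP at 0. "Oops: 0002" in the earlier report is a kernel-mode write to a page that is not present.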

Here is my .config:

#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.9
# Tue Nov 30 10:20:05 2004
#
CONFIG_X86=y
CONFIG_MMU=y
CONFIG_UID16=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y

#
# Code maturity level options
#
CONFIG_EXPERIMENTAL=y
CONFIG_CLEAN_COMPILE=y

#
# General setup
#
CONFIG_LOCALVERSION=""
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_POSIX_MQUEUE=y
# CONFIG_BSD_PROCESS_ACCT is not set
CONFIG_SYSCTL=y
CONFIG_AUDIT=y
CONFIG_AUDITSYSCALL=y
CONFIG_LOG_BUF_SHIFT=15
CONFIG_HOTPLUG=y
# CONFIG_IKCONFIG is not set
# CONFIG_EMBEDDED is not set
CONFIG_KALLSYMS=y
# CONFIG_KALLSYMS_ALL is not set
# CONFIG_KALLSYMS_EXTRA_PASS is not set
CONFIG_FUTEX=y
CONFIG_EPOLL=y
CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_AS=y
CONFIG_IOSCHED_DEADLINE=y
CONFIG_IOSCHED_CFQ=y
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
CONFIG_SHMEM=y
# CONFIG_TINY_SHMEM is not set

#
# Loadable module support
#
CONFIG_MODULES=y
# CONFIG_MODULE_UNLOAD is not set
CONFIG_OBSOLETE_MODPARM=y
# CONFIG_MODVERSIONS is not set
CONFIG_KMOD=y

#
# Processor type and features
#
CONFIG_X86_PC=y
# CONFIG_X86_ELAN is not set
# CONFIG_X86_VOYAGER is not set
# CONFIG_X86_NUMAQ is not set
# CONFIG_X86_SUMMIT is not set
# CONFIG_X86_BIGSMP is not set
# CONFIG_X86_VISWS is not set
# CONFIG_X86_GENERICARCH is not set
# CONFIG_X86_ES7000 is not set
# CONFIG_M386 is not set
# CONFIG_M486 is not set
# CONFIG_M586 is not set
# CONFIG_M586TSC is not set
# CONFIG_M586MMX is not set
# CONFIG_M686 is not set
# CONFIG_MPENTIUMII is not set
# CONFIG_MPENTIUMIII is not set
# CONFIG_MPENTIUMM is not set
CONFIG_MPENTIUM4=y
# CONFIG_MK6 is not set
# CONFIG_MK7 is not set
# CONFIG_MK8 is not set
# CONFIG_MCRUSOE is not set
# CONFIG_MWINCHIPC6 is not set
# CONFIG_MWINCHIP2 is not set
# CONFIG_MWINCHIP3D is not set
# CONFIG_MCYRIXIII is not set
# CONFIG_MVIAC3_2 is not set
# CONFIG_X86_GENERIC is not set
CONFIG_X86_CMPXCHG=y
CONFIG_X86_XADD=y
CONFIG_X86_L1_CACHE_SHIFT=7
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_X86_WP_WORKS_OK=y
CONFIG_X86_INVLPG=y
CONFIG_X86_BSWAP=y
CONFIG_X86_POPAD_OK=y
CONFIG_X86_GOOD_APIC=y
CONFIG_X86_INTEL_USERCOPY=y
CONFIG_X86_USE_PPRO_CHECKSUM=y
CONFIG_HPET_TIMER=y
# CONFIG_HPET_EMULATE_RTC is not set
CONFIG_SMP=y
CONFIG_NR_CPUS=4
CONFIG_SCHED_SMT=y
# CONFIG_PREEMPT is not set
CONFIG_X86_LOCAL_APIC=y
CONFIG_X86_IO_APIC=y
CONFIG_X86_TSC=y
CONFIG_X86_MCE=y
CONFIG_X86_MCE_NONFATAL=y
CONFIG_X86_MCE_P4THERMAL=y
# CONFIG_TOSHIBA is not set
# CONFIG_I8K is not set
# CONFIG_MICROCODE is not set
# CONFIG_X86_MSR is not set
# CONFIG_X86_CPUID is not set

#
# Firmware Drivers
#
# CONFIG_EDD is not set
# CONFIG_NOHIGHMEM is not set
CONFIG_HIGHMEM4G=y
# CONFIG_HIGHMEM64G is not set
CONFIG_HIGHMEM=y
CONFIG_HIGHPTE=y
# CONFIG_MATH_EMULATION is not set
CONFIG_MTRR=y
# CONFIG_EFI is not set
CONFIG_IRQBALANCE=y
CONFIG_HAVE_DEC_LOCK=y
# CONFIG_REGPARM is not set

#
# Power management options (ACPI, APM)
#
CONFIG_PM=y
# CONFIG_PM_DEBUG is not set
# CONFIG_SOFTWARE_SUSPEND is not set

#
# ACPI (Advanced Configuration and Power Interface) Support
#
CONFIG_ACPI=y
CONFIG_ACPI_BOOT=y
CONFIG_ACPI_INTERPRETER=y
CONFIG_ACPI_SLEEP=y
CONFIG_ACPI_SLEEP_PROC_FS=y
CONFIG_ACPI_AC=y
CONFIG_ACPI_BATTERY=y
CONFIG_ACPI_BUTTON=y
CONFIG_ACPI_FAN=y
CONFIG_ACPI_PROCESSOR=y
CONFIG_ACPI_THERMAL=y
# CONFIG_ACPI_ASUS is not set
# CONFIG_ACPI_TOSHIBA is not set
CONFIG_ACPI_BLACKLIST_YEAR=0
# CONFIG_ACPI_DEBUG is not set
CONFIG_ACPI_BUS=y
CONFIG_ACPI_EC=y
CONFIG_ACPI_POWER=y
CONFIG_ACPI_PCI=y
CONFIG_ACPI_SYSTEM=y
# CONFIG_X86_PM_TIMER is not set

#
# APM (Advanced Power Management) BIOS Support
#
# CONFIG_APM is not set

#
# CPU Frequency scaling
#
# CONFIG_CPU_FREQ is not set

#
# Bus options (PCI, PCMCIA, EISA, MCA, ISA)
#
CONFIG_PCI=y
# CONFIG_PCI_GOBIOS is not set
# CONFIG_PCI_GOMMCONFIG is not set
# CONFIG_PCI_GODIRECT is not set
CONFIG_PCI_GOANY=y
CONFIG_PCI_BIOS=y
CONFIG_PCI_DIRECT=y
CONFIG_PCI_MMCONFIG=y
# CONFIG_PCI_MSI is not set
CONFIG_PCI_LEGACY_PROC=y
CONFIG_PCI_NAMES=y
CONFIG_ISA=y
# CONFIG_EISA is not set
# CONFIG_MCA is not set
# CONFIG_SCx200 is not set

#
# PCMCIA/CardBus support
#
# CONFIG_PCMCIA is not set
CONFIG_PCMCIA_PROBE=y

#
# PCI Hotplug Support
#
# CONFIG_HOTPLUG_PCI is not set

#
# Executable file formats
#
CONFIG_BINFMT_ELF=y
CONFIG_BINFMT_AOUT=y
CONFIG_BINFMT_MISC=y

#
# Device Drivers
#

#
# Generic Driver Options
#
CONFIG_STANDALONE=y
CONFIG_PREVENT_FIRMWARE_BUILD=y
CONFIG_FW_LOADER=m
# CONFIG_DEBUG_DRIVER is not set

#
# Memory Technology Devices (MTD)
#
# CONFIG_MTD is not set

#
# Parallel port support
#
CONFIG_PARPORT=y
CONFIG_PARPORT_PC=y
CONFIG_PARPORT_PC_CML1=y
# CONFIG_PARPORT_SERIAL is not set
# CONFIG_PARPORT_PC_FIFO is not set
# CONFIG_PARPORT_PC_SUPERIO is not set
# CONFIG_PARPORT_OTHER is not set
# CONFIG_PARPORT_1284 is not set

#
# Plug and Play support
#
CONFIG_PNP=y
# CONFIG_PNP_DEBUG is not set

#
# Protocols
#
# CONFIG_ISAPNP is not set
# CONFIG_PNPBIOS is not set

#
# Block devices
#
CONFIG_BLK_DEV_FD=y
# CONFIG_BLK_DEV_XD is not set
# CONFIG_PARIDE is not set
# CONFIG_BLK_CPQ_DA is not set
# CONFIG_BLK_CPQ_CISS_DA is not set
# CONFIG_BLK_DEV_DAC960 is not set
# CONFIG_BLK_DEV_UMEM is not set
CONFIG_BLK_DEV_LOOP=y
# CONFIG_BLK_DEV_CRYPTOLOOP is not set
# CONFIG_BLK_DEV_NBD is not set
# CONFIG_BLK_DEV_SX8 is not set
# CONFIG_BLK_DEV_UB is not set
# CONFIG_BLK_DEV_RAM is not set
CONFIG_LBD=y

#
# ATA/ATAPI/MFM/RLL support
#
CONFIG_IDE=y
CONFIG_BLK_DEV_IDE=y

#
# Please see Documentation/ide.txt for help/info on IDE drives
#
# CONFIG_BLK_DEV_IDE_SATA is not set
# CONFIG_BLK_DEV_HD_IDE is not set
CONFIG_BLK_DEV_IDEDISK=y
CONFIG_IDEDISK_MULTI_MODE=y
CONFIG_BLK_DEV_IDECD=y
# CONFIG_BLK_DEV_IDETAPE is not set
# CONFIG_BLK_DEV_IDEFLOPPY is not set
# CONFIG_BLK_DEV_IDESCSI is not set
# CONFIG_IDE_TASK_IOCTL is not set
CONFIG_IDE_TASKFILE_IO=y

#
# IDE chipset support/bugfixes
#
CONFIG_IDE_GENERIC=y
# CONFIG_BLK_DEV_CMD640 is not set
# CONFIG_BLK_DEV_IDEPNP is not set
CONFIG_BLK_DEV_IDEPCI=y
CONFIG_IDEPCI_SHARE_IRQ=y
# CONFIG_BLK_DEV_OFFBOARD is not set
CONFIG_BLK_DEV_GENERIC=y
# CONFIG_BLK_DEV_OPTI621 is not set
# CONFIG_BLK_DEV_RZ1000 is not set
CONFIG_BLK_DEV_IDEDMA_PCI=y
# CONFIG_BLK_DEV_IDEDMA_FORCED is not set
CONFIG_IDEDMA_PCI_AUTO=y
# CONFIG_IDEDMA_ONLYDISK is not set
# CONFIG_BLK_DEV_AEC62XX is not set
# CONFIG_BLK_DEV_ALI15X3 is not set
# CONFIG_BLK_DEV_AMD74XX is not set
# CONFIG_BLK_DEV_ATIIXP is not set
# CONFIG_BLK_DEV_CMD64X is not set
# CONFIG_BLK_DEV_TRIFLEX is not set
# CONFIG_BLK_DEV_CY82C693 is not set
# CONFIG_BLK_DEV_CS5520 is not set
# CONFIG_BLK_DEV_CS5530 is not set
# CONFIG_BLK_DEV_HPT34X is not set
# CONFIG_BLK_DEV_HPT366 is not set
# CONFIG_BLK_DEV_SC1200 is not set
CONFIG_BLK_DEV_PIIX=y
# CONFIG_BLK_DEV_NS87415 is not set
# CONFIG_BLK_DEV_PDC202XX_OLD is not set
# CONFIG_BLK_DEV_PDC202XX_NEW is not set
# CONFIG_BLK_DEV_SVWKS is not set
# CONFIG_BLK_DEV_SIIMAGE is not set
# CONFIG_BLK_DEV_SIS5513 is not set
# CONFIG_BLK_DEV_SLC90E66 is not set
# CONFIG_BLK_DEV_TRM290 is not set
# CONFIG_BLK_DEV_VIA82CXXX is not set
# CONFIG_IDE_ARM is not set
# CONFIG_IDE_CHIPSETS is not set
CONFIG_BLK_DEV_IDEDMA=y
# CONFIG_IDEDMA_IVB is not set
CONFIG_IDEDMA_AUTO=y
# CONFIG_BLK_DEV_HD is not set

#
# SCSI device support
#
CONFIG_SCSI=y
CONFIG_SCSI_PROC_FS=y

#
# SCSI support type (disk, tape, CD-ROM)
#
CONFIG_BLK_DEV_SD=y
# CONFIG_CHR_DEV_ST is not set
# CONFIG_CHR_DEV_OSST is not set
# CONFIG_BLK_DEV_SR is not set
CONFIG_CHR_DEV_SG=y

#
# Some SCSI devices (e.g. CD jukebox) support multiple LUNs
#
# CONFIG_SCSI_MULTI_LUN is not set
CONFIG_SCSI_CONSTANTS=y
CONFIG_SCSI_LOGGING=y

#
# SCSI Transport Attributes
#
# CONFIG_SCSI_SPI_ATTRS is not set
# CONFIG_SCSI_FC_ATTRS is not set

#
# SCSI low-level drivers
#
# CONFIG_BLK_DEV_3W_XXXX_RAID is not set
# CONFIG_SCSI_3W_9XXX is not set
# CONFIG_SCSI_7000FASST is not set
# CONFIG_SCSI_ACARD is not set
# CONFIG_SCSI_AHA152X is not set
# CONFIG_SCSI_AHA1542 is not set
# CONFIG_SCSI_AACRAID is not set
# CONFIG_SCSI_AIC7XXX is not set
# CONFIG_SCSI_AIC7XXX_OLD is not set
CONFIG_SCSI_AIC79XX=y
CONFIG_AIC79XX_CMDS_PER_DEVICE=32
CONFIG_AIC79XX_RESET_DELAY_MS=15000
# CONFIG_AIC79XX_ENABLE_RD_STRM is not set
# CONFIG_AIC79XX_DEBUG_ENABLE is not set
CONFIG_AIC79XX_DEBUG_MASK=0
# CONFIG_AIC79XX_REG_PRETTY_PRINT is not set
# CONFIG_SCSI_DPT_I2O is not set
# CONFIG_SCSI_IN2000 is not set
CONFIG_MEGARAID_NEWGEN=y
CONFIG_MEGARAID_MM=y
CONFIG_MEGARAID_MAILBOX=y
# CONFIG_SCSI_SATA is not set
# CONFIG_SCSI_BUSLOGIC is not set
# CONFIG_SCSI_DMX3191D is not set
# CONFIG_SCSI_DTC3280 is not set
# CONFIG_SCSI_EATA is not set
# CONFIG_SCSI_EATA_PIO is not set
# CONFIG_SCSI_FUTURE_DOMAIN is not set
# CONFIG_SCSI_GDTH is not set
# CONFIG_SCSI_GENERIC_NCR5380 is not set
# CONFIG_SCSI_GENERIC_NCR5380_MMIO is not set
# CONFIG_SCSI_IPS is not set
# CONFIG_SCSI_INIA100 is not set
# CONFIG_SCSI_PPA is not set
# CONFIG_SCSI_IMM is not set
# CONFIG_SCSI_NCR53C406A is not set
# CONFIG_SCSI_SYM53C8XX_2 is not set
# CONFIG_SCSI_IPR is not set
# CONFIG_SCSI_PAS16 is not set
# CONFIG_SCSI_PSI240I is not set
# CONFIG_SCSI_QLOGIC_FAS is not set
# CONFIG_SCSI_QLOGIC_ISP is not set
# CONFIG_SCSI_QLOGIC_FC is not set
# CONFIG_SCSI_QLOGIC_1280 is not set
CONFIG_SCSI_QLA2XXX=y
# CONFIG_SCSI_QLA21XX is not set
# CONFIG_SCSI_QLA22XX is not set
# CONFIG_SCSI_QLA2300 is not set
# CONFIG_SCSI_QLA2322 is not set
# CONFIG_SCSI_QLA6312 is not set
# CONFIG_SCSI_QLA6322 is not set
# CONFIG_SCSI_SYM53C416 is not set
# CONFIG_SCSI_DC395x is not set
# CONFIG_SCSI_DC390T is not set
# CONFIG_SCSI_T128 is not set
# CONFIG_SCSI_U14_34F is not set
# CONFIG_SCSI_ULTRASTOR is not set
# CONFIG_SCSI_NSP32 is not set
# CONFIG_SCSI_DEBUG is not set

#
# Old CD-ROM drivers (not SCSI, not IDE)
#
# CONFIG_CD_NO_IDESCSI is not set

#
# Multi-device support (RAID and LVM)
#
CONFIG_MD=y
CONFIG_BLK_DEV_MD=y
# CONFIG_MD_LINEAR is not set
CONFIG_MD_RAID0=y
CONFIG_MD_RAID1=y
CONFIG_MD_RAID10=y
# CONFIG_MD_RAID5 is not set
# CONFIG_MD_RAID6 is not set
# CONFIG_MD_MULTIPATH is not set
CONFIG_BLK_DEV_DM=y
# CONFIG_DM_CRYPT is not set
CONFIG_DM_SNAPSHOT=y
CONFIG_DM_MIRROR=y
CONFIG_DM_ZERO=y

#
# Fusion MPT device support
#
CONFIG_FUSION=y
CONFIG_FUSION_MAX_SGE=40
# CONFIG_FUSION_CTL is not set

#
# IEEE 1394 (FireWire) support
#
# CONFIG_IEEE1394 is not set

#
# I2O device support
#
# CONFIG_I2O is not set

#
# Networking support
#
CONFIG_NET=y

#
# Networking options
#
CONFIG_PACKET=y
# CONFIG_PACKET_MMAP is not set
# CONFIG_NETLINK_DEV is not set
CONFIG_UNIX=y
# CONFIG_NET_KEY is not set
CONFIG_INET=y
# CONFIG_IP_MULTICAST is not set
# CONFIG_IP_ADVANCED_ROUTER is not set
# CONFIG_IP_PNP is not set
# CONFIG_NET_IPIP is not set
# CONFIG_NET_IPGRE is not set
# CONFIG_ARPD is not set
# CONFIG_SYN_COOKIES is not set
# CONFIG_INET_AH is not set
# CONFIG_INET_ESP is not set
# CONFIG_INET_IPCOMP is not set
# CONFIG_INET_TUNNEL is not set
# CONFIG_IPV6 is not set
# CONFIG_NETFILTER is not set

#
# SCTP Configuration (EXPERIMENTAL)
#
# CONFIG_IP_SCTP is not set
# CONFIG_ATM is not set
# CONFIG_BRIDGE is not set
# CONFIG_VLAN_8021Q is not set
# CONFIG_DECNET is not set
# CONFIG_LLC2 is not set
# CONFIG_IPX is not set
# CONFIG_ATALK is not set
# CONFIG_X25 is not set
# CONFIG_LAPB is not set
# CONFIG_NET_DIVERT is not set
# CONFIG_ECONET is not set
# CONFIG_WAN_ROUTER is not set
# CONFIG_NET_HW_FLOWCONTROL is not set

#
# QoS and/or fair queueing
#
# CONFIG_NET_SCHED is not set
# CONFIG_NET_CLS_ROUTE is not set

#
# Network testing
#
# CONFIG_NET_PKTGEN is not set
# CONFIG_NETPOLL is not set
# CONFIG_NET_POLL_CONTROLLER is not set
# CONFIG_HAMRADIO is not set
# CONFIG_IRDA is not set
# CONFIG_BT is not set
CONFIG_NETDEVICES=y
CONFIG_DUMMY=y
# CONFIG_BONDING is not set
# CONFIG_EQUALIZER is not set
# CONFIG_TUN is not set
# CONFIG_NET_SB1000 is not set

#
# ARCnet devices
#
# CONFIG_ARCNET is not set

#
# Ethernet (10 or 100Mbit)
#
CONFIG_NET_ETHERNET=y
CONFIG_MII=y
# CONFIG_HAPPYMEAL is not set
# CONFIG_SUNGEM is not set
# CONFIG_NET_VENDOR_3COM is not set
# CONFIG_LANCE is not set
# CONFIG_NET_VENDOR_SMC is not set
# CONFIG_NET_VENDOR_RACAL is not set

#
# Tulip family network device support
#
# CONFIG_NET_TULIP is not set
# CONFIG_AT1700 is not set
# CONFIG_DEPCA is not set
# CONFIG_HP100 is not set
# CONFIG_NET_ISA is not set
CONFIG_NET_PCI=y
# CONFIG_PCNET32 is not set
# CONFIG_AMD8111_ETH is not set
# CONFIG_ADAPTEC_STARFIRE is not set
# CONFIG_AC3200 is not set
# CONFIG_APRICOT is not set
# CONFIG_B44 is not set
# CONFIG_FORCEDETH is not set
# CONFIG_CS89x0 is not set
# CONFIG_DGRS is not set
# CONFIG_EEPRO100 is not set
CONFIG_E100=y
# CONFIG_E100_NAPI is not set
# CONFIG_FEALNX is not set
# CONFIG_NATSEMI is not set
# CONFIG_NE2K_PCI is not set
# CONFIG_8139CP is not set
# CONFIG_8139TOO is not set
# CONFIG_SIS900 is not set
# CONFIG_EPIC100 is not set
# CONFIG_SUNDANCE is not set
# CONFIG_TLAN is not set
# CONFIG_VIA_RHINE is not set
# CONFIG_VIA_VELOCITY is not set
# CONFIG_NET_POCKET is not set

#
# Ethernet (1000 Mbit)
#
# CONFIG_ACENIC is not set
# CONFIG_DL2K is not set
CONFIG_E1000=y
# CONFIG_E1000_NAPI is not set
# CONFIG_NS83820 is not set
# CONFIG_HAMACHI is not set
# CONFIG_YELLOWFIN is not set
# CONFIG_R8169 is not set
# CONFIG_SK98LIN is not set
# CONFIG_TIGON3 is not set

#
# Ethernet (10000 Mbit)
#
# CONFIG_IXGB is not set
CONFIG_S2IO=m
# CONFIG_S2IO_NAPI is not set

#
# Token Ring devices
#
# CONFIG_TR is not set

#
# Wireless LAN (non-hamradio)
#
# CONFIG_NET_RADIO is not set

#
# Wan interfaces
#
# CONFIG_WAN is not set
# CONFIG_FDDI is not set
# CONFIG_HIPPI is not set
# CONFIG_PLIP is not set
# CONFIG_PPP is not set
# CONFIG_SLIP is not set
# CONFIG_NET_FC is not set
# CONFIG_SHAPER is not set
# CONFIG_NETCONSOLE is not set

#
# ISDN subsystem
#
# CONFIG_ISDN is not set

#
# Telephony Support
#
# CONFIG_PHONE is not set

#
# Input device support
#
CONFIG_INPUT=y

#
# Userland interfaces
#
CONFIG_INPUT_MOUSEDEV=y
CONFIG_INPUT_MOUSEDEV_PSAUX=y
CONFIG_INPUT_MOUSEDEV_SCREEN_X=1024
CONFIG_INPUT_MOUSEDEV_SCREEN_Y=768
# CONFIG_INPUT_JOYDEV is not set
# CONFIG_INPUT_TSDEV is not set
CONFIG_INPUT_EVDEV=y
# CONFIG_INPUT_EVBUG is not set

#
# Input I/O drivers
#
# CONFIG_GAMEPORT is not set
CONFIG_SOUND_GAMEPORT=y
CONFIG_SERIO=y
CONFIG_SERIO_I8042=y
# CONFIG_SERIO_SERPORT is not set
# CONFIG_SERIO_CT82C710 is not set
# CONFIG_SERIO_PARKBD is not set
# CONFIG_SERIO_PCIPS2 is not set
# CONFIG_SERIO_RAW is not set

#
# Input Device Drivers
#
CONFIG_INPUT_KEYBOARD=y
CONFIG_KEYBOARD_ATKBD=y
# CONFIG_KEYBOARD_SUNKBD is not set
# CONFIG_KEYBOARD_LKKBD is not set
# CONFIG_KEYBOARD_XTKBD is not set
# CONFIG_KEYBOARD_NEWTON is not set
CONFIG_INPUT_MOUSE=y
CONFIG_MOUSE_PS2=y
# CONFIG_MOUSE_SERIAL is not set
# CONFIG_MOUSE_INPORT is not set
# CONFIG_MOUSE_LOGIBM is not set
# CONFIG_MOUSE_PC110PAD is not set
# CONFIG_MOUSE_VSXXXAA is not set
# CONFIG_INPUT_JOYSTICK is not set
# CONFIG_INPUT_TOUCHSCREEN is not set
# CONFIG_INPUT_MISC is not set

#
# Character devices
#
CONFIG_VT=y
CONFIG_VT_CONSOLE=y
CONFIG_HW_CONSOLE=y
# CONFIG_SERIAL_NONSTANDARD is not set

#
# Serial drivers
#
CONFIG_SERIAL_8250=y
CONFIG_SERIAL_8250_CONSOLE=y
# CONFIG_SERIAL_8250_ACPI is not set
CONFIG_SERIAL_8250_NR_UARTS=4
# CONFIG_SERIAL_8250_EXTENDED is not set

#
# Non-8250 serial port support
#
CONFIG_SERIAL_CORE=y
CONFIG_SERIAL_CORE_CONSOLE=y
CONFIG_UNIX98_PTYS=y
CONFIG_LEGACY_PTYS=y
CONFIG_LEGACY_PTY_COUNT=256
# CONFIG_PRINTER is not set
# CONFIG_PPDEV is not set
# CONFIG_TIPAR is not set

#
# IPMI
#
# CONFIG_IPMI_HANDLER is not set

#
# Watchdog Cards
#
# CONFIG_WATCHDOG is not set
CONFIG_HW_RANDOM=y
# CONFIG_NVRAM is not set
CONFIG_RTC=y
# CONFIG_DTLK is not set
# CONFIG_R3964 is not set
# CONFIG_APPLICOM is not set
# CONFIG_SONYPI is not set

#
# Ftape, the floppy tape device driver
#
# CONFIG_AGP is not set
# CONFIG_DRM is not set
# CONFIG_MWAVE is not set
# CONFIG_RAW_DRIVER is not set
# CONFIG_HPET is not set
# CONFIG_HANGCHECK_TIMER is not set

#
# I2C support
#
# CONFIG_I2C is not set

#
# Dallas's 1-wire bus
#
# CONFIG_W1 is not set

#
# Misc devices
#
# CONFIG_IBM_ASM is not set

#
# Multimedia devices
#
# CONFIG_VIDEO_DEV is not set

#
# Digital Video Broadcasting Devices
#
# CONFIG_DVB is not set

#
# Graphics support
#
# CONFIG_FB is not set
# CONFIG_VIDEO_SELECT is not set

#
# Console display driver support
#
CONFIG_VGA_CONSOLE=y
# CONFIG_MDA_CONSOLE is not set
CONFIG_DUMMY_CONSOLE=y

#
# Sound
#
# CONFIG_SOUND is not set

#
# USB support
#
CONFIG_USB=y
# CONFIG_USB_DEBUG is not set

#
# Miscellaneous USB options
#
CONFIG_USB_DEVICEFS=y
# CONFIG_USB_BANDWIDTH is not set
# CONFIG_USB_DYNAMIC_MINORS is not set
# CONFIG_USB_SUSPEND is not set
# CONFIG_USB_OTG is not set

#
# USB Host Controller Drivers
#
# CONFIG_USB_EHCI_HCD is not set
CONFIG_USB_OHCI_HCD=y
CONFIG_USB_UHCI_HCD=y

#
# USB Device Class drivers
#
# CONFIG_USB_BLUETOOTH_TTY is not set
# CONFIG_USB_ACM is not set
CONFIG_USB_PRINTER=y
CONFIG_USB_STORAGE=y
# CONFIG_USB_STORAGE_DEBUG is not set
# CONFIG_USB_STORAGE_RW_DETECT is not set
CONFIG_USB_STORAGE_DATAFAB=y
CONFIG_USB_STORAGE_FREECOM=y
CONFIG_USB_STORAGE_ISD200=y
CONFIG_USB_STORAGE_DPCM=y
# CONFIG_USB_STORAGE_HP8200e is not set
CONFIG_USB_STORAGE_SDDR09=y
CONFIG_USB_STORAGE_SDDR55=y
CONFIG_USB_STORAGE_JUMPSHOT=y

#
# USB Human Interface Devices (HID)
#
CONFIG_USB_HID=y
CONFIG_USB_HIDINPUT=y
# CONFIG_HID_FF is not set
CONFIG_USB_HIDDEV=y
# CONFIG_USB_AIPTEK is not set
# CONFIG_USB_WACOM is not set
# CONFIG_USB_KBTAB is not set
# CONFIG_USB_POWERMATE is not set
# CONFIG_USB_MTOUCH is not set
# CONFIG_USB_EGALAX is not set
# CONFIG_USB_XPAD is not set
# CONFIG_USB_ATI_REMOTE is not set

#
# USB Imaging devices
#
# CONFIG_USB_MDC800 is not set
# CONFIG_USB_MICROTEK is not set
# CONFIG_USB_HPUSBSCSI is not set

#
# USB Multimedia devices
#
# CONFIG_USB_DABUSB is not set

#
# Video4Linux support is needed for USB Multimedia device support
#

#
# USB Network adaptors
#
# CONFIG_USB_CATC is not set
# CONFIG_USB_KAWETH is not set
# CONFIG_USB_PEGASUS is not set
# CONFIG_USB_RTL8150 is not set
# CONFIG_USB_USBNET is not set

#
# USB port drivers
#
# CONFIG_USB_USS720 is not set

#
# USB Serial Converter support
#
# CONFIG_USB_SERIAL is not set

#
# USB Miscellaneous drivers
#
# CONFIG_USB_EMI62 is not set
# CONFIG_USB_EMI26 is not set
# CONFIG_USB_TIGL is not set
# CONFIG_USB_AUERSWALD is not set
# CONFIG_USB_RIO500 is not set
# CONFIG_USB_LEGOTOWER is not set
# CONFIG_USB_LCD is not set
# CONFIG_USB_LED is not set
# CONFIG_USB_CYTHERM is not set
# CONFIG_USB_PHIDGETSERVO is not set
# CONFIG_USB_TEST is not set

#
# USB Gadget Support
#
# CONFIG_USB_GADGET is not set

#
# File systems
#
CONFIG_EXT2_FS=y
# CONFIG_EXT2_FS_XATTR is not set
CONFIG_EXT3_FS=y
# CONFIG_EXT3_FS_XATTR is not set
CONFIG_JBD=y
# CONFIG_JBD_DEBUG is not set
CONFIG_REISERFS_FS=y
# CONFIG_REISERFS_CHECK is not set
# CONFIG_REISERFS_PROC_INFO is not set
# CONFIG_REISERFS_FS_XATTR is not set
CONFIG_JFS_FS=y
# CONFIG_JFS_POSIX_ACL is not set
# CONFIG_JFS_DEBUG is not set
# CONFIG_JFS_STATISTICS is not set
CONFIG_XFS_FS=y
# CONFIG_XFS_RT is not set
# CONFIG_XFS_QUOTA is not set
# CONFIG_XFS_SECURITY is not set
# CONFIG_XFS_POSIX_ACL is not set
# CONFIG_MINIX_FS is not set
# CONFIG_ROMFS_FS is not set
# CONFIG_QUOTA is not set
# CONFIG_AUTOFS_FS is not set
# CONFIG_AUTOFS4_FS is not set

#
# CD-ROM/DVD Filesystems
#
CONFIG_ISO9660_FS=y
CONFIG_JOLIET=y
# CONFIG_ZISOFS is not set
CONFIG_UDF_FS=y
CONFIG_UDF_NLS=y

#
# DOS/FAT/NT Filesystems
#
CONFIG_FAT_FS=y
CONFIG_MSDOS_FS=y
CONFIG_VFAT_FS=y
CONFIG_FAT_DEFAULT_CODEPAGE=437
CONFIG_FAT_DEFAULT_IOCHARSET="iso8859-1"
# CONFIG_NTFS_FS is not set

#
# Pseudo filesystems
#
CONFIG_PROC_FS=y
CONFIG_PROC_KCORE=y
CONFIG_SYSFS=y
# CONFIG_DEVFS_FS is not set
# CONFIG_DEVPTS_FS_XATTR is not set
CONFIG_TMPFS=y
# CONFIG_HUGETLBFS is not set
# CONFIG_HUGETLB_PAGE is not set
CONFIG_RAMFS=y

#
# Miscellaneous filesystems
#
# CONFIG_ADFS_FS is not set
# CONFIG_AFFS_FS is not set
# CONFIG_HFS_FS is not set
# CONFIG_HFSPLUS_FS is not set
# CONFIG_BEFS_FS is not set
# CONFIG_BFS_FS is not set
# CONFIG_EFS_FS is not set
# CONFIG_CRAMFS is not set
# CONFIG_VXFS_FS is not set
# CONFIG_HPFS_FS is not set
# CONFIG_QNX4FS_FS is not set
# CONFIG_SYSV_FS is not set
# CONFIG_UFS_FS is not set

#
# Network File Systems
#
CONFIG_NFS_FS=y
CONFIG_NFS_V3=y
# CONFIG_NFS_V4 is not set
# CONFIG_NFS_DIRECTIO is not set
CONFIG_NFSD=y
CONFIG_NFSD_V3=y
# CONFIG_NFSD_V4 is not set
CONFIG_NFSD_TCP=y
CONFIG_LOCKD=y
CONFIG_LOCKD_V4=y
CONFIG_EXPORTFS=y
CONFIG_SUNRPC=y
# CONFIG_RPCSEC_GSS_KRB5 is not set
# CONFIG_RPCSEC_GSS_SPKM3 is not set
CONFIG_SMB_FS=y
# CONFIG_SMB_NLS_DEFAULT is not set
# CONFIG_CIFS is not set
# CONFIG_NCP_FS is not set
# CONFIG_CODA_FS is not set
# CONFIG_AFS_FS is not set

#
# Partition Types
#
# CONFIG_PARTITION_ADVANCED is not set
CONFIG_MSDOS_PARTITION=y

#
# Native Language Support
#
CONFIG_NLS=y
CONFIG_NLS_DEFAULT="iso8859-1"
CONFIG_NLS_CODEPAGE_437=y
# CONFIG_NLS_CODEPAGE_737 is not set
# CONFIG_NLS_CODEPAGE_775 is not set
# CONFIG_NLS_CODEPAGE_850 is not set
# CONFIG_NLS_CODEPAGE_852 is not set
# CONFIG_NLS_CODEPAGE_855 is not set
# CONFIG_NLS_CODEPAGE_857 is not set
# CONFIG_NLS_CODEPAGE_860 is not set
# CONFIG_NLS_CODEPAGE_861 is not set
# CONFIG_NLS_CODEPAGE_862 is not set
# CONFIG_NLS_CODEPAGE_863 is not set
# CONFIG_NLS_CODEPAGE_864 is not set
# CONFIG_NLS_CODEPAGE_865 is not set
# CONFIG_NLS_CODEPAGE_866 is not set
# CONFIG_NLS_CODEPAGE_869 is not set
# CONFIG_NLS_CODEPAGE_936 is not set
# CONFIG_NLS_CODEPAGE_950 is not set
# CONFIG_NLS_CODEPAGE_932 is not set
# CONFIG_NLS_CODEPAGE_949 is not set
# CONFIG_NLS_CODEPAGE_874 is not set
# CONFIG_NLS_ISO8859_8 is not set
# CONFIG_NLS_CODEPAGE_1250 is not set
# CONFIG_NLS_CODEPAGE_1251 is not set
# CONFIG_NLS_ASCII is not set
CONFIG_NLS_ISO8859_1=y
# CONFIG_NLS_ISO8859_2 is not set
# CONFIG_NLS_ISO8859_3 is not set
# CONFIG_NLS_ISO8859_4 is not set
# CONFIG_NLS_ISO8859_5 is not set
# CONFIG_NLS_ISO8859_6 is not set
# CONFIG_NLS_ISO8859_7 is not set
# CONFIG_NLS_ISO8859_9 is not set
# CONFIG_NLS_ISO8859_13 is not set
# CONFIG_NLS_ISO8859_14 is not set
# CONFIG_NLS_ISO8859_15 is not set
# CONFIG_NLS_KOI8_R is not set
# CONFIG_NLS_KOI8_U is not set
# CONFIG_NLS_UTF8 is not set

#
# Profiling support
#
# CONFIG_PROFILING is not set

#
# Kernel hacking
#
CONFIG_DEBUG_KERNEL=y
CONFIG_MAGIC_SYSRQ=y
# CONFIG_DEBUG_SLAB is not set
# CONFIG_DEBUG_SPINLOCK is not set
# CONFIG_DEBUG_SPINLOCK_SLEEP is not set
# CONFIG_DEBUG_HIGHMEM is not set
# CONFIG_DEBUG_INFO is not set
# CONFIG_FRAME_POINTER is not set
CONFIG_EARLY_PRINTK=y
CONFIG_DEBUG_STACKOVERFLOW=y
# CONFIG_KPROBES is not set
CONFIG_DEBUG_STACK_USAGE=y
# CONFIG_DEBUG_PAGEALLOC is not set
# CONFIG_4KSTACKS is not set
# CONFIG_SCHEDSTATS is not set
CONFIG_X86_FIND_SMP_CONFIG=y
CONFIG_X86_MPPARSE=y

#
# Security options
#
# CONFIG_SECURITY is not set

#
# Cryptographic options
#
CONFIG_CRYPTO=y
# CONFIG_CRYPTO_HMAC is not set
# CONFIG_CRYPTO_NULL is not set
# CONFIG_CRYPTO_MD4 is not set
# CONFIG_CRYPTO_MD5 is not set
# CONFIG_CRYPTO_SHA1 is not set
# CONFIG_CRYPTO_SHA256 is not set
# CONFIG_CRYPTO_SHA512 is not set
# CONFIG_CRYPTO_WP512 is not set
# CONFIG_CRYPTO_DES is not set
# CONFIG_CRYPTO_BLOWFISH is not set
# CONFIG_CRYPTO_TWOFISH is not set
# CONFIG_CRYPTO_SERPENT is not set
# CONFIG_CRYPTO_AES_586 is not set
# CONFIG_CRYPTO_CAST5 is not set
# CONFIG_CRYPTO_CAST6 is not set
# CONFIG_CRYPTO_TEA is not set
# CONFIG_CRYPTO_ARC4 is not set
# CONFIG_CRYPTO_KHAZAD is not set
# CONFIG_CRYPTO_DEFLATE is not set
# CONFIG_CRYPTO_MICHAEL_MIC is not set
# CONFIG_CRYPTO_CRC32C is not set
# CONFIG_CRYPTO_TEST is not set

#
# Library routines
#
# CONFIG_CRC_CCITT is not set
CONFIG_CRC32=y
CONFIG_LIBCRC32C=m
CONFIG_X86_SMP=y
CONFIG_X86_HT=y
CONFIG_X86_BIOS_REBOOT=y
CONFIG_X86_TRAMPOLINE=y
CONFIG_PC=y

On Tue, 23 Nov 2004 22:39:35 +0000
Christoph Hellwig <[email protected]> wrote:

> Actually I can reproduce it reliably by running nfs_fsstress.sh for a
> long time. The problem is that in the current XFS code the inode
> generation counter starts at 0, but higher level code uses that as
> a wildcard for any possible generation, so you may get a newly created
> file for a stale NFS file handle of a deleted file with the same inode
> number.
>
> The patch below fixes it for me:
>
>
> Index: fs/xfs/xfs_inode.c
> ===================================================================
> RCS file: /cvs/linux-2.6-xfs/fs/xfs/xfs_inode.c,v
> retrieving revision 1.406
> diff -u -p -r1.406 xfs_inode.c
> --- fs/xfs/xfs_inode.c 27 Oct 2004 12:06:24 -0000 1.406
> +++ fs/xfs/xfs_inode.c 23 Nov 2004 20:40:56 -0000
> @@ -1224,9 +1224,16 @@ xfs_ialloc(
> ip->i_d.di_nextents = 0;
> ASSERT(ip->i_d.di_nblocks == 0);
> xfs_ichgtime(ip, XFS_ICHGTIME_CHG|XFS_ICHGTIME_ACC|XFS_ICHGTIME_MOD);
> +
> /*
> - * di_gen will have been taken care of in xfs_iread.
> + * Bump the generation count so no one will confuse us with an
> + * earlier incarnations of this inode.
> + *
> + * Done early to skip generation 0, which is used as a wildcard
> + * by higher level code.
> */
> + ip->i_d.di_gen++;
> +
> ip->i_d.di_extsize = 0;
> ip->i_d.di_dmevmask = 0;
> ip->i_d.di_dmstate = 0;
> @@ -2370,11 +2377,6 @@ xfs_ifree(
> XFS_IFORK_DSIZE(ip) / (uint)sizeof(xfs_bmbt_rec_t);
> ip->i_d.di_format = XFS_DINODE_FMT_EXTENTS;
> ip->i_d.di_aformat = XFS_DINODE_FMT_EXTENTS;
> - /*
> - * Bump the generation count so no one will be confused
> - * by reincarnations of this inode.
> - */
> - ip->i_d.di_gen++;
> xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
>
> if (delete) {


--

Phil Dier (ICGLink.com -- 615 370-1530 x733)

/* vim:set noai nocindent ts=8 sw=8: */