2004-11-24 18:43:14

by Anders Saaby

Subject: 2.6.9 Oops: Major problems with XFS and ext3 (VFS related?)

Hi Lists, (XFS list CC'ed)

We are encountering what looks like a race on both ext3 and XFS on a high-load
mailserver.

Here is the situation:
We have a high-load mailserver serving IMAP from Maildirs. We originally had
the Maildirs on ext3, but the kernel Oopsed roughly every 20 hours (Oops
included). We then moved the Maildirs to XFS, thinking the problems were
history, but now we get a somewhat similar error from XFS (included). They
both look like a race to me, but I am not able to get more out of it.

System: IBM dual Xeon (P4), IBM ips RAID controller (RAID 0+1), ~150G.
Kernel: Linux 2.6.9 SMP

So the bottom line is that both ext3 and XFS cause crashes. Comments, anyone?
We are desperate.

Here is what XFS says:
<SNIP>
Filesystem "sdb1": xfs_trans_delete_ail: attempting to delete a log item that
is not in the AIL
xfs_force_shutdown(sdb1,0x8) called from line 382 of file
fs/xfs/xfs_trans_ail.c. Return address = 0xc0216a56
@Linux version 2.6.9 ([email protected]) (gcc version 2.96 20000731 (Red
Hat Linux 7.3 2.96-113)) #1 SMP Tue Oct 19 16:04:55 CEST 2004
</SNIP>

Here is what ext3 says:
<SNIP>
Unable to handle kernel NULL pointer dereference at virtual address 0000000c
printing eip:
c018b2f5
*pde = 00000000
Oops: 0002 [#1]
SMP
Modules linked in: nfs e1000 iptable_nat rtc
CPU:    2
EIP:    0060:[<c018b2f5>]    Not tainted VLI
EFLAGS: 00010286   (2.6.9)
EIP is at journal_commit_transaction+0x545/0x11b0
eax: d971826c   ebx: 00000000   ecx: e489eefc   edx: 00000014
esi: d971826c   edi: f7406000   ebp: ea0a6f80   esp: f7407d8c
ds: 007b   es: 007b   ss: 0068
Process kjournald (pid: 177, threadinfo=f7406000 task=f7df63b0)
Stack: 03afe6b2 c2157478 f7407e40 f7406000 c2157414 00000000 00000000 00000000
       00000000 00000000 e489ebfc cd61056c 000010e8 01c2bf60 c040e020 00000000
       f7406000 0000001e f7407e1c c0412f80 00000008 f7407e5c c01134e3 f7407e1c
Call Trace:
 [<c01134e3>] find_busiest_group+0xf3/0x300
 [<c0113799>] find_busiest_queue+0xa9/0xd0
 [<c0115620>] autoremove_wake_function+0x0/0x40
 [<c0115620>] autoremove_wake_function+0x0/0x40
 [<c018e0e1>] kjournald+0xc1/0x230
 [<c0115620>] autoremove_wake_function+0x0/0x40
 [<c0112ba3>] finish_task_switch+0x33/0x70
 [<c0115620>] autoremove_wake_function+0x0/0x40
 [<c0103ff6>] ret_from_fork+0x6/0x14
 [<c018e000>] commit_timeout+0x0/0x10
 [<c018e020>] kjournald+0x0/0x230
 [<c010253d>] kernel_thread_helper+0x5/0x18
Code: 00 89 f0 e8 5e e1 17 00 83 c4 14 8b 45 18 85 c0 0f 84 49 01 00 00 bf 00
e0 ff ff 21 e7 89 f6 8d bc 27 00 00 00 00 8b 70 20 8b 1e <f0> ff 43 0c 8b 03
83 e0 04 74 4e 8b 94 24 e8 01 00 00 8d 82 c0
</SNIP>

I will be happy to supply any info and do some testing, if anyone is
interested! :-)

--
Med venlig hilsen - Best regards - Meilleures salutations

Anders Saaby
Systems Engineer
------------------------------------------------
Cohaesio A/S - Maglebjergvej 5D - DK-2800 Lyngby
Phone: +45 45 880 888 - Fax: +45 45 880 777
Mail: [email protected] - http://www.cohaesio.com
------------------------------------------------


2004-11-26 23:54:03

by Nathan Scott

Subject: Re: 2.6.9 Oops: Major problems with XFS and ext3 (VFS related?)

On Wed, Nov 24, 2004 at 06:12:33PM +0100, Anders Saaby wrote:
> Hi Lists, (XFS list CC'ed)

Hi there,

> Here is the situation:
> We have a high-load mailserver serving IMAP from Maildirs. We originally had
> the Maildirs on ext3, but the kernel Oopsed roughly every 20 hours (Oops
> included). We then moved the Maildirs to XFS, thinking the problems were
> history, but now we get a somewhat similar error from XFS (included). They
> both look like a race to me, but I am not able to get more out of it.
> ...
> Here is what XFS says:
> <SNIP>
> Filesystem "sdb1": xfs_trans_delete_ail: attempting to delete a log item that
> is not in the AIL
> xfs_force_shutdown(sdb1,0x8) called from line 382 of file
> fs/xfs/xfs_trans_ail.c. Return address = 0xc0216a56
> @Linux version 2.6.9 ([email protected]) (gcc version 2.96 20000731 (Red
> Hat Linux 7.3 2.96-113)) #1 SMP Tue Oct 19 16:04:55 CEST 2004
> ...
> I will be happy to supply any info and do some testing, if anyone is
> interested! :-)

Yep, very interested. So, "serving IMAP from Maildirs": from the
filesystem's perspective, can you describe that in detail for me?
I would guess that means a shallow directory tree, with quite large
directories (how large?) and many (how many?) small files (how small
on average?). How frequently are files added and removed?
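
One way to collect those numbers is a small script run against the
mail spool. The following is only a rough sketch: the function name
maildir_stats and the keys of the dictionary it returns are made up
for illustration, and it assumes the whole spool lives under a single
root directory passed on the command line.

```python
import os
import stat
import statistics
import sys


def maildir_stats(root):
    """Summarise tree depth, directory sizes and file sizes under root."""
    root = os.path.abspath(root)
    base_depth = root.rstrip(os.sep).count(os.sep)
    files_per_dir = []  # regular files found directly in each directory
    file_sizes = []     # size in bytes of every regular file
    max_depth = 0

    for dirpath, _dirnames, filenames in os.walk(root):
        depth = dirpath.rstrip(os.sep).count(os.sep) - base_depth
        max_depth = max(max_depth, depth)
        count = 0
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # message was moved or expunged while we walked
            if stat.S_ISREG(st.st_mode):
                file_sizes.append(st.st_size)
                count += 1
        files_per_dir.append(count)

    return {
        "directories": len(files_per_dir),
        "files": len(file_sizes),
        "max_depth": max_depth,
        "largest_dir_entries": max(files_per_dir, default=0),
        "mean_file_size": statistics.mean(file_sizes) if file_sizes else 0,
        "median_file_size": statistics.median(file_sizes) if file_sizes else 0,
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    for key, value in maildir_stats(sys.argv[1]).items():
        print(f"{key}: {value}")
```

Running it twice some minutes apart and comparing the file counts
gives a crude idea of the add/remove rate, although it cannot see
messages that are both delivered and expunged between the two runs.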

Is this easily reproducible for you? If so, can you send me
enough details that I can try to reproduce it locally?
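
For a local attempt, a synthetic load along the following lines may
be a starting point. It is an untested guess at the workload, not a
known reproducer: it follows the usual Maildir convention (write into
tmp/, rename into new/, then rename into cur/ with a ":2,S" flag
suffix when the message is read) and deletes most messages again. All
function names, the message size and the keep_every ratio are
assumptions chosen for illustration.

```python
import os
import socket
import time


def make_maildir(path):
    """Create the tmp/, new/ and cur/ subdirectories of a Maildir."""
    for sub in ("tmp", "new", "cur"):
        os.makedirs(os.path.join(path, sub), exist_ok=True)


def deliver(maildir, seq, body):
    """Write a message into tmp/, sync it, then rename it into new/."""
    name = f"{int(time.time())}.{os.getpid()}_{seq}.{socket.gethostname()}"
    tmp_path = os.path.join(maildir, "tmp", name)
    with open(tmp_path, "wb") as f:
        f.write(body)
        f.flush()
        os.fsync(f.fileno())
    os.rename(tmp_path, os.path.join(maildir, "new", name))
    return name


def mark_seen(maildir, name):
    """Move a message from new/ to cur/, as a mail reader would."""
    cur_name = name + ":2,S"
    os.rename(os.path.join(maildir, "new", name),
              os.path.join(maildir, "cur", cur_name))
    return cur_name


def churn(maildir, messages, body_size=4096, keep_every=10):
    """Deliver, read and mostly delete messages; return how many are kept."""
    make_maildir(maildir)
    body = b"x" * body_size
    kept = 0
    for seq in range(messages):
        cur_name = mark_seen(maildir, deliver(maildir, seq, body))
        if seq % keep_every:
            os.unlink(os.path.join(maildir, "cur", cur_name))  # expunge
        else:
            kept += 1  # every keep_every-th message stays in cur/
    return kept
```

A single loop like this is far gentler than a busy IMAP server, so
approximating the real load would presumably need many such processes
running in parallel against separate mailboxes on the same filesystem.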

thanks.

--
Nathan