2003-08-24 20:57:52

by Pascal Schmidt

Subject: [2.4.22-rc1] ext3/jbd assertion failure transaction.c:1164


Hi!

I was running fsx to test a userspace NFSv3 server. The underlying
filesystem was ext3. After about 10 seconds into the fsx run, I hit the
following BUG() in transaction.c. data=journal was used. I could not start
any new processes after the incident and had to press the reset button.

Is this a known problem?

Assertion failure in journal_dirty_metadata() at transaction.c:1164:
"jh->b_frozen_data == 0"

ksymoops 2.4.4 on i686 2.4.22-rc1. Options used
-V (default)
-k /proc/ksyms (default)
-l /proc/modules (default)
-o /lib/modules/2.4.22-rc1/ (default)
-m /boot/System.map-2.4.22-rc1 (default)

Warning: You did not tell me where to find symbol information. I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc. ksymoops -h explains the options.

Error (regular_file): read_ksyms stat /proc/ksyms failed
No modules in ksyms, skipping objects
No ksyms, skipping lsmod
kernel BUG at transaction.c:1164!
invalid operand: 0000
CPU: 0
EIP: 0010:[journal_dirty_metadata+359/416] Not tainted
EIP: 0010:[<c015dcc7>] Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010292
eax: 00000061 ebx: e6044f30 ecx: 00000005 edx: e77a9f44
esi: e77a67c0 edi: e7c447c0 ebp: e655c940 esp: d1459dcc
ds: 0018 es: 0018 ss: 0018
Process fsx (pid: 4689, stackpage=d1459000)
Stack: c02d8fa0 c02d4126 c02d49b8 0000048c c02d4bd1 d0785640 e655c940 00000000
00001000 c015581a e655c940 d0785640 00000246 00000000 00000246 00000000
d0785640 d07d7000 00001000 0000001e 00001000 d0785640 0000001c c01637f7
Call Trace: [commit_write_fn+26/96] [__jbd_kmalloc+39/160] [walk_page_buffers+93/128] [ext3_commit_write+166/448] [commit_write_fn+0/96]
Call Trace: [<c015581a>] [<c01637f7>] [<c015557d>] [<c0155906>] [<c0155800>]
[<c01276cd>] [<c0127ad0>] [<c01531ff>] [<c0132245>] [<c0131e20>] [<c0131fce>]
[<c01088a3>]
Code: 0f 0b 8c 04 b8 49 2d c0 83 c4 14 6a 03 ff 75 00 53 e8 43 0a

>>EIP; c015dcc7 <journal_dirty_metadata+167/1a0> <=====
Trace; c015581a <commit_write_fn+1a/60>
Trace; c01637f7 <__jbd_kmalloc+27/a0>
Trace; c015557d <walk_page_buffers+5d/80>
Trace; c0155906 <ext3_commit_write+a6/1c0>
Trace; c0155800 <commit_write_fn+0/60>
Trace; c01276cd <do_generic_file_write+29d/3e0>
Trace; c0127ad0 <generic_file_write+f0/110>
Trace; c01531ff <ext3_file_write+1f/b0>
Trace; c0132245 <sys_write+95/f0>
Trace; c0131e20 <generic_file_llseek+0/b0>
Trace; c0131fce <sys_lseek+6e/80>
Trace; c01088a3 <system_call+33/38>
Code; c015dcc7 <journal_dirty_metadata+167/1a0>
00000000 <_EIP>:
Code; c015dcc7 <journal_dirty_metadata+167/1a0> <=====
0: 0f 0b ud2a <=====
Code; c015dcc9 <journal_dirty_metadata+169/1a0>
2: 8c 04 b8 movl %es,(%eax,%edi,4)
Code; c015dccc <journal_dirty_metadata+16c/1a0>
5: 49 dec %ecx
Code; c015dccd <journal_dirty_metadata+16d/1a0>
6: 2d c0 83 c4 14 sub $0x14c483c0,%eax
Code; c015dcd2 <journal_dirty_metadata+172/1a0>
b: 6a 03 push $0x3
Code; c015dcd4 <journal_dirty_metadata+174/1a0>
d: ff 75 00 pushl 0x0(%ebp)
Code; c015dcd7 <journal_dirty_metadata+177/1a0>
10: 53 push %ebx
Code; c015dcd8 <journal_dirty_metadata+178/1a0>
11: e8 43 0a 00 00 call a59 <_EIP+0xa59> c015e720 <__journal_file_buffer+0/1e0>


1 warning and 1 error issued. Results may not be reliable.


--
Ciao,
Pascal
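
A note on reading the "Code:" line in the report above. On 2.4 i386 kernels, BUG() is, as far as I know, emitted as a ud2 instruction (0f 0b) followed by a 16-bit line number and a 32-bit pointer to the file name. That would explain why ksymoops disassembles the bytes after ud2 into nonsense instructions: they are data, not code. A minimal sketch in Python, assuming that layout; decode_bug_trap is a made-up helper name, not part of ksymoops:

```python
import struct


def decode_bug_trap(code_bytes: str):
    """Decode 'ud2; .word line; .long file_ptr' from an oops Code: line.

    Assumes the 2.4 i386 BUG() layout described in the lead-in.
    """
    raw = bytes.fromhex(code_bytes.replace(" ", ""))
    if raw[:2] != b"\x0f\x0b":
        raise ValueError("Code does not start with ud2 (0f 0b)")
    # Little-endian: unsigned 16-bit line number, then unsigned 32-bit pointer.
    line, file_ptr = struct.unpack_from("<HI", raw, 2)
    return line, file_ptr


# First eight bytes of the Code: line from the report above.
line, file_ptr = decode_bug_trap("0f 0b 8c 04 b8 49 2d c0")
print(line, hex(file_ptr))  # 1164 0xc02d49b8
```

The decoded line number agrees with "transaction.c:1164", and the value c02d49b8 also appears in the stack dump of the same report.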


2003-08-26 12:20:29

by Marcelo Tosatti

Subject: Re: [2.4.22-rc1] ext3/jbd assertion failure transaction.c:1164


Pascal,

I've never seen this oops reported before.

Can you reproduce the problem?

Andrew, Stephen?

On Sun, 24 Aug 2003, Pascal Schmidt wrote:

>
> Hi!
>
> I was running fsx to test a userspace NFSv3 server. The underlying
> filesystem was ext3. After about 10 seconds into the fsx run, I hit the
> following BUG() in transaction.c. data=journal was used. I could not start
> any new processes after the incident and had to press the reset button.
>
> Is this a known problem?
>
> Assertion failure in journal_dirty_metadata() at transaction.c:1164:
> "jh->b_frozen_data == 0"
>
>
>
>

2003-08-26 15:07:59

by Pascal Schmidt

Subject: Re: [2.4.22-rc1] ext3/jbd assertion failure transaction.c:1164

On Tue, 26 Aug 2003, Marcelo Tosatti wrote:

> I've never seen this oops reported before.
> Can you reproduce the problem?

Yes. Running fsx directly on my ext3 /home partition gets me the
BUG within a few seconds, with the exact same backtrace as below.
There don't seem to be any jbd changes from -rc1 to final 2.4.22,
so I assume the problem exists in 2.4.22 as well.

Box survives a night of memtest86, so I figure it's not a memory
problem.

> > Assertion failure in journal_dirty_metadata() at transaction.c:1164:
> > "jh->b_frozen_data == 0"
> > kernel BUG at transaction.c:1164!

> > >>EIP; c015dcc7 <journal_dirty_metadata+167/1a0> <=====
> > Trace; c015581a <commit_write_fn+1a/60>
> > Trace; c01637f7 <__jbd_kmalloc+27/a0>
> > Trace; c015557d <walk_page_buffers+5d/80>
> > Trace; c0155906 <ext3_commit_write+a6/1c0>
> > Trace; c0155800 <commit_write_fn+0/60>
> > Trace; c01276cd <do_generic_file_write+29d/3e0>
> > Trace; c0127ad0 <generic_file_write+f0/110>
> > Trace; c01531ff <ext3_file_write+1f/b0>
> > Trace; c0132245 <sys_write+95/f0>
> > Trace; c0131e20 <generic_file_llseek+0/b0>
> > Trace; c0131fce <sys_lseek+6e/80>
> > Trace; c01088a3 <system_call+33/38>

--
Ciao,
Pascal
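
For readers who have not met fsx: it hammers a single file with a random mix of reads, writes, truncates and mmap accesses, and checks the file against a copy kept in memory. A rough Python sketch of that idea follows. It is not the real fsx.c; mini_fsx and its parameters are invented here, and nothing in this thread suggests that this simplified mix would trigger the assertion.

```python
import mmap
import os
import random
import tempfile


def mini_fsx(path, ops=300, max_len=64 * 1024, seed=1):
    """Apply random file operations and check them against an in-memory copy."""
    rng = random.Random(seed)
    model = bytearray()  # expected file contents
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        for step in range(ops):
            op = rng.choice(("write", "mapwrite", "truncate"))
            if op == "truncate":
                size = rng.randrange(max_len)
                os.ftruncate(fd, size)
                if size < len(model):
                    del model[size:]
                else:
                    model.extend(bytes(size - len(model)))
            elif op == "write":
                off = rng.randrange(max_len)
                count = rng.randrange(1, 4096)
                data = bytes(rng.getrandbits(8) for _ in range(count))
                os.pwrite(fd, data, off)
                if off > len(model):
                    # A hole left by writing past EOF reads back as zeros.
                    model.extend(bytes(off - len(model)))
                model[off:off + len(data)] = data
            elif model:  # mapwrite needs a non-empty file
                off = rng.randrange(len(model))
                n = min(rng.randrange(1, 4096), len(model) - off)
                data = bytes(rng.getrandbits(8) for _ in range(n))
                with mmap.mmap(fd, len(model)) as m:
                    m[off:off + n] = data
                    m.flush()
                model[off:off + n] = data
            # Verify the whole file after every operation.
            got = os.pread(fd, len(model) + 1, 0)
            if got != bytes(model):
                raise AssertionError(f"mismatch after step {step} ({op})")
    finally:
        os.close(fd)
    return len(model)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print("final size:", mini_fsx(os.path.join(d, "fsx-test")))
```

The point of interest for this thread is the mix itself: ordinary write() calls and writes through a shared mapping land on the same pages of one file, which is an unusual pattern for normal applications.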

2003-08-26 15:57:47

by Pascal Schmidt

Subject: Re: [2.4.22-rc1] ext3/jbd assertion failure transaction.c:1164

On Tue, 26 Aug 2003, Pascal Schmidt wrote:

> Yes. Running fsx directly on my ext3 /home partition gets me the
> BUG within a few seconds, with the exact same backtrace as below.
> There don't seem to be any jbd changes from -rc1 to final 2.4.22,
> so I assume the problem exists in 2.4.22 as well.

I've just updated to 2.4.22-rc3 (since bkcvs doesn't seem to have
final 2.4.22 yet). There, the BUG is not triggered. Instead I get
tons of these:

Unexpected dirty buffer encountered at do_get_write_access:616 (03:07 blocknr 2472965)
Unexpected dirty buffer encountered at do_get_write_access:616 (03:07 blocknr 2472966)
Unexpected dirty buffer encountered at do_get_write_access:616 (03:07 blocknr 2472967)
Unexpected dirty buffer encountered at do_get_write_access:616 (03:07 blocknr 2472952)
Unexpected dirty buffer encountered at do_get_write_access:616 (03:07 blocknr 2472954)
Unexpected dirty buffer encountered at do_get_write_access:616 (03:07 blocknr 2472955)
Unexpected dirty buffer encountered at do_get_write_access:616 (03:07 blocknr 71346)
Unexpected dirty buffer encountered at do_get_write_access:616 (03:07 blocknr 66050)
Unexpected dirty buffer encountered at do_get_write_access:616 (03:07 blocknr 66050)
Unexpected dirty buffer encountered at do_get_write_access:616 (03:07 blocknr 66049)
Unexpected dirty buffer encountered at do_get_write_access:616 (03:07 blocknr 66050)

These messages pile up as long as I run fsx.

Again, this is on an ext3 partition mounted with data=journal.

--
Ciao,
Pascal
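
When these warnings pile up, it can help to see which blocks are hit repeatedly. A small Python sketch that tallies them per device and block number; it assumes the "03:07" field is major:minor in hexadecimal (major 3, minor 7 would conventionally be /dev/hda7), which is my reading of the message format rather than something stated in this thread. summarize is a hypothetical helper name.

```python
import re
from collections import Counter

LINE_RE = re.compile(
    r"Unexpected dirty buffer encountered at (\w+):(\d+) "
    r"\(([0-9a-fA-F]+):([0-9a-fA-F]+) blocknr (\d+)\)"
)


def summarize(log_text):
    """Count warnings per (major, minor, block); device fields assumed hex."""
    counts = Counter()
    for m in LINE_RE.finditer(log_text):
        major = int(m.group(3), 16)
        minor = int(m.group(4), 16)
        block = int(m.group(5))
        counts[(major, minor, block)] += 1
    return counts


# The last five warnings quoted in the message above.
sample = """\
Unexpected dirty buffer encountered at do_get_write_access:616 (03:07 blocknr 71346)
Unexpected dirty buffer encountered at do_get_write_access:616 (03:07 blocknr 66050)
Unexpected dirty buffer encountered at do_get_write_access:616 (03:07 blocknr 66050)
Unexpected dirty buffer encountered at do_get_write_access:616 (03:07 blocknr 66049)
Unexpected dirty buffer encountered at do_get_write_access:616 (03:07 blocknr 66050)
"""

for (major, minor, block), n in summarize(sample).most_common(1):
    print(major, minor, block, n)  # 3 7 66050 3
```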

2003-08-26 16:17:19

by Pascal Schmidt

Subject: Re: [2.4.22-rc1] ext3/jbd assertion failure transaction.c:1164

On Tue, 26 Aug 2003, Pascal Schmidt wrote:

> I've just updated to 2.4.22-rc3 (since bkcvs doesn't seem to have
> final 2.4.22 yet). There, the BUG is not triggered.

Sigh. I spoke too soon. Turns out I have two different versions of
fsx.c around. The one that caused the BUG before still does, but
it's a different one now:

Unexpected dirty buffer encountered at do_get_write_access:616 (03:07 blocknr 84449)
Unexpected dirty buffer encountered at do_get_write_access:616 (03:07 blocknr 84450)
Unexpected dirty buffer encountered at do_get_write_access:616 (03:07 blocknr 84451)
Assertion failure in journal_commit_transaction() at commit.c:712: "!(((bh)->b_state & (1UL << BH_Dirty)) != 0)"
kernel BUG at commit.c:712!

ksymoops 2.4.4 on i686 2.4.22-rc3. Options used
-V (default)
-k /proc/ksyms (default)
-l /proc/modules (default)
-o /lib/modules/2.4.22-rc3/ (default)
-m /boot/System.map-2.4.22-rc3 (default)

Warning: You did not tell me where to find symbol information. I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc. ksymoops -h explains the options.

Error (regular_file): read_ksyms stat /proc/ksyms failed
No modules in ksyms, skipping objects
No ksyms, skipping lsmod
kernel BUG at commit.c:712!
invalid operand: 0000
CPU: 0
EIP: 0010:[journal_commit_transaction+4387/4881] Not tainted
EIP: 0010:[<c015fd63>] Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010296
eax: 00000074 ebx: e605b9c0 ecx: ffffffff edx: e6ccff44
esi: 00000000 edi: e42207c0 ebp: 00000000 esp: e609fe08
ds: 0018 es: 0018 ss: 0018
Process kjournald (pid: 1523, stackpage=e609f000)
Stack: c02d9640 c02d53bf c02d53b6 000002c8 c02dc3e0 e7c44880 00000000 00000f2c
e2a8a0d4 00000000 00000000 e752b440 e605b270 00000b07 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Call Trace: [schedule+786/832] [kjournald+371/704] [commit_timeout+0/16] [arch_kernel_thread+38/48] [kjournald+0/704]
Call Trace: [<c0114662>] [<c01620e3>] [<c0161f50>] [<c0107116>] [<c0161f70>]
Code: 0f 0b c8 02 b6 53 2d c0 83 c4 14 8b 73 18 85 f6 74 29 68 c0

>>EIP; c015fd63 <journal_commit_transaction+1123/1311> <=====
Trace; c0114662 <schedule+312/340>
Trace; c01620e3 <kjournald+173/2c0>
Trace; c0161f50 <commit_timeout+0/10>
Trace; c0107116 <arch_kernel_thread+26/30>
Trace; c0161f70 <kjournald+0/2c0>
Code; c015fd63 <journal_commit_transaction+1123/1311>
00000000 <_EIP>:
Code; c015fd63 <journal_commit_transaction+1123/1311> <=====
0: 0f 0b ud2a <=====
Code; c015fd65 <journal_commit_transaction+1125/1311>
2: c8 02 b6 53 enter $0xb602,$0x53
Code; c015fd69 <journal_commit_transaction+1129/1311>
6: 2d c0 83 c4 14 sub $0x14c483c0,%eax
Code; c015fd6e <journal_commit_transaction+112e/1311>
b: 8b 73 18 mov 0x18(%ebx),%esi
Code; c015fd71 <journal_commit_transaction+1131/1311>
e: 85 f6 test %esi,%esi
Code; c015fd73 <journal_commit_transaction+1133/1311>
10: 74 29 je 3b <_EIP+0x3b> c015fd9e <journal_commit_transaction+115e/1311>
Code; c015fd75 <journal_commit_transaction+1135/1311>
12: 68 c0 00 00 00 push $0xc0


1 warning and 1 error issued. Results may not be reliable.

--
Ciao,
Pascal
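
The condition quoted in the commit.c assertion is a preprocessor expansion of a buffer-state test. Read back, it appears to say "the buffer must not be dirty at this point in the commit". A tiny Python sketch of the bit test; the bit position BH_DIRTY = 1 is an assumption about the 2.4 buffer_head flags, and the helper names are illustrative only:

```python
BH_DIRTY = 1  # assumed bit index of the dirty flag in b_state


def buffer_dirty(b_state: int) -> bool:
    """Mirror of ((bh)->b_state & (1UL << BH_Dirty)) != 0."""
    return (b_state & (1 << BH_DIRTY)) != 0


def commit_assertion_holds(b_state: int) -> bool:
    """The asserted condition: the buffer is NOT dirty."""
    return not buffer_dirty(b_state)


print(commit_assertion_holds(0b0001))  # True: dirty bit clear
print(commit_assertion_holds(0b0011))  # False: dirty bit set, BUG() would fire
```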

2003-08-26 18:51:39

by Stephen C. Tweedie

Subject: Re: [2.4.22-rc1] ext3/jbd assertion failure transaction.c:1164

Hi,

On Tue, 2003-08-26 at 17:15, Pascal Schmidt wrote:

> Sigh. I spoke too soon. Turns out I have two different versions of
> fsx.c around. The one that caused the BUG before still does, but
> it's a different one now:

OK, could you send me both of your versions so that I can try them
here? I've got an uptodate fsx around myself, but not necessarily the
same version as you, and evidently the precise version matters here.

Thanks,
Stephen

2003-08-26 20:45:52

by Lorenzo Allegrucci

Subject: Re: [2.4.22-rc1] ext3/jbd assertion failure transaction.c:1164

On Tuesday 26 August 2003 18:51, Stephen C. Tweedie wrote:
> Hi,
>
> On Tue, 2003-08-26 at 17:15, Pascal Schmidt wrote:
> > Sigh. I spoke too soon. Turns out I have two different versions of
> > fsx.c around. The one that caused the BUG before still does, but
> > it's a different one now:
>
> OK, could you send me both of your versions so that I can try them
> here? I've got an uptodate fsx around myself, but not necessarily the
> same version as you, and evidently the precise version matters here.

I've just got a similar oops from 2.4.21:

Aug 26 22:01:34 odyssey kernel: Unexpected dirty buffer encountered at do_get_write_access:616 (03:42 blocknr 10108)
Aug 26 22:01:34 odyssey kernel: Unexpected dirty buffer encountered at do_get_write_access:616 (03:42 blocknr 10108)
Aug 26 22:01:59 odyssey kernel: Unexpected dirty buffer encountered at do_get_write_access:616 (03:42 blocknr 9824)
Aug 26 22:02:09 odyssey kernel: Unexpected dirty buffer encountered at do_get_write_access:616 (03:42 blocknr 9600)
Aug 26 22:02:09 odyssey kernel: Unexpected dirty buffer encountered at do_get_write_access:616 (03:42 blocknr 9601)
Aug 26 22:02:09 odyssey kernel: Unexpected dirty buffer encountered at do_get_write_access:616 (03:42 blocknr 9595)
Aug 26 22:02:09 odyssey kernel: Unexpected dirty buffer encountered at do_get_write_access:616 (03:42 blocknr 8923)
Aug 26 22:02:09 odyssey kernel: Unexpected dirty buffer encountered at do_get_write_access:616 (03:42 blocknr 9608)
Aug 26 22:02:09 odyssey kernel: Unexpected dirty buffer encountered at do_get_write_access:616 (03:42 blocknr 9608)
Aug 26 22:02:09 odyssey kernel: Unexpected dirty buffer encountered at do_get_write_access:616 (03:42 blocknr 9609)
Aug 26 22:02:09 odyssey kernel: Unexpected dirty buffer encountered at do_get_write_access:616 (03:42 blocknr 9609)
Aug 26 22:02:09 odyssey kernel: Assertion failure in journal_dirty_metadata() at transaction.c:1164: "jh->b_frozen_data == 0"

ksymoops 2.4.8 on i686 2.4.21. Options used
-V (default)
-k /proc/ksyms (default)
-l /proc/modules (default)
-o /lib/modules/2.4.21/ (default)
-m /boot/System.map-2.4.21 (default)

Warning: You did not tell me where to find symbol information. I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc. ksymoops -h explains the options.

Aug 26 22:02:09 odyssey kernel: kernel BUG at transaction.c:1164!
Aug 26 22:02:09 odyssey kernel: invalid operand: 0000
Aug 26 22:02:09 odyssey kernel: CPU: 0
Aug 26 22:02:09 odyssey kernel: EIP: 0010:[journal_dirty_metadata+411/496] Not tainted
Aug 26 22:02:09 odyssey kernel: EFLAGS: 00010282
Aug 26 22:02:09 odyssey kernel: eax: 00000061 ebx: de8c94c0 ecx: dd786000 edx: deaf7f7c
Aug 26 22:02:09 odyssey kernel: esi: c1593cf4 edi: c1593c80 ebp: dc6f88c0 esp: dd787e68
Aug 26 22:02:09 odyssey kernel: ds: 0018 es: 0018 ss: 0018
Aug 26 22:02:09 odyssey kernel: Process fsx-linux (pid: 554, stackpage=dd787000)
Aug 26 22:02:09 odyssey kernel: Stack: c0248f80 c024652a c0246c9d 0000048c c0246e0e df19b570 dcc15e40 dc6f88c0
Aug 26 22:02:09 odyssey kernel: 00000000 00001000 c015a5b4 dc6f88c0 dcc15e40 c015a27f dc6f88c0 dcc15e40
Aug 26 22:02:09 odyssey kernel: 00001000 c0161f9b dcc15e40 dcc15e40 00001000 c015a251 dc6f88c0 c6330200
Aug 26 22:02:09 odyssey kernel: Call Trace: [commit_write_fn+36/128] [do_journal_get_write_access+31/128] [new_handle+75/112] [walk_page_buffers+113/128] [ext3_commit_write+139/512]
Aug 26 22:02:09 odyssey kernel: Code: 0f 0b 8c 04 9d 6c 24 c0 eb ad c7 44 24 10 c0 b2 24 c0 c7 44
Using defaults from ksymoops -t elf32-i386 -a i386


>>ebx; de8c94c0 <_end+1e5dde00/205489c0>
>>ecx; dd786000 <_end+1d49a940/205489c0>
>>edx; deaf7f7c <_end+1e80c8bc/205489c0>
>>esi; c1593cf4 <_end+12a8634/205489c0>
>>edi; c1593c80 <_end+12a85c0/205489c0>
>>ebp; dc6f88c0 <_end+1c40d200/205489c0>
>>esp; dd787e68 <_end+1d49c7a8/205489c0>

Code; 00000000 Before first symbol
00000000 <_EIP>:
Code; 00000000 Before first symbol
0: 0f 0b ud2a
Code; 00000002 Before first symbol
2: 8c 04 9d 6c 24 c0 eb movl %es,0xebc0246c(,%ebx,4)
Code; 00000009 Before first symbol
9: ad lods %ds:(%esi),%eax
Code; 0000000a Before first symbol
a: c7 44 24 10 c0 b2 24 movl $0xc024b2c0,0x10(%esp,1)
Code; 00000011 Before first symbol
11: c0
Code; 00000012 Before first symbol
12: c7 44 00 00 00 00 00 movl $0x0,0x0(%eax,%eax,1)
Code; 00000019 Before first symbol
19: 00


1 warning issued. Results may not be reliable.


I can reproduce the 2.4.22 oops too.
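
A side note for comparing the traces in this thread: the raw kernel log prints symbol offsets in decimal (offset/size), while the ksymoops "Trace;" and ">>EIP;" lines print the same pair in hex. A short Python sketch of the conversion; to_hex_offsets is a hypothetical helper, not a ksymoops feature:

```python
import re


def to_hex_offsets(sym: str) -> str:
    """Convert 'name+OFFSET/SIZE' from decimal to hex notation."""
    m = re.fullmatch(r"(\w+)\+(\d+)/(\d+)", sym)
    if m is None:
        raise ValueError(f"not a symbol+offset/size string: {sym!r}")
    name, off, size = m.group(1), int(m.group(2)), int(m.group(3))
    return f"{name}+{off:x}/{size:x}"


# EIP from the 2.4.21 log above, then the one from the original 2.4.22-rc1 report.
print(to_hex_offsets("journal_dirty_metadata+411/496"))  # journal_dirty_metadata+19b/1f0
print(to_hex_offsets("journal_dirty_metadata+359/416"))  # journal_dirty_metadata+167/1a0
```

The second result matches the ksymoops line in the first report, so both kernels stop inside the same function even though the function sizes differ between the two builds.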

2003-08-28 11:15:27

by Stephen C. Tweedie

Subject: Re: [2.4.22-rc1] ext3/jbd assertion failure transaction.c:1164

Hi,

On Tue, 2003-08-26 at 21:43, Pascal Schmidt wrote:

> fsx1.c (from 2001, whoa) is the one that causes the bad problems,
> while fsx2.c only yields tons of messages but no BUG().

Many thanks --- I was able to reproduce this very easily, and I know of
one or two very unusual things that fsx does which might well be the
trigger here. I'll let you know how things go.

Cheers,
Stephen

2003-08-28 13:58:14

by Pascal Schmidt

Subject: Re: [2.4.22-rc1] ext3/jbd assertion failure transaction.c:1164

On 28 Aug 2003, Stephen C. Tweedie wrote:

> Many thanks --- I was able to reproduce this very easily, and I know of
> one or two very unusual things that fsx does which might well be the
> trigger here. I'll let you know how things go.

Good, at least it's not a bug that only happens here and is hard to
reproduce elsewhere.

I hope this does not happen under normal fs usage. ;)

--
Ciao,
Pascal

2003-08-28 21:19:32

by Stephen C. Tweedie

Subject: Re: [2.4.22-rc1] ext3/jbd assertion failure transaction.c:1164

Hi,

On Thu, 2003-08-28 at 14:57, Pascal Schmidt wrote:

> > Many thanks --- I was able to reproduce this very easily, and I know of
> > one or two very unusual things that fsx does which might well be the
> > trigger here. I'll let you know how things go.
>
> Good, at least it's not a bug that only happens here and is hard to
> reproduce elsewhere.
>
> I hope this does not happen under normal fs usage. ;)

It's all down to ext3_writepage() using data-journaling rather than
metadata journaling.

The obvious fix is just to make the journal_dirty_async_data() code
commit its writes as metadata if the inode is marked for
data-journaling, and to set the transaction handle to be synchronous in
that case. Sounds like a recipe for deadlock if done incorrectly,
though, so I'll give it a more careful look tomorrow.

Cheers,
Stephen
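
To make the proposal above a little more concrete, here is a toy model in Python of the dispatch being described: file the buffer as metadata and mark the handle synchronous when the inode journals its data, otherwise keep the async-data path. This is only an illustration of the idea as worded in the message. It is not kernel code, every name in it (Handle, dirty_page_buffer, inode_journals_data) is invented, and it says nothing about the locking or deadlock concerns Stephen mentions.

```python
from dataclasses import dataclass, field


@dataclass
class Handle:
    """Toy stand-in for a transaction handle (hypothetical, not the jbd struct)."""
    sync: bool = False
    metadata: list = field(default_factory=list)
    async_data: list = field(default_factory=list)


def dirty_page_buffer(handle: Handle, block: int, inode_journals_data: bool) -> None:
    """Route one written-back block according to the inode's journaling mode."""
    if inode_journals_data:
        # data=journal inode: treat the block like metadata and force a sync handle.
        handle.metadata.append(block)
        handle.sync = True
    else:
        # Ordered/writeback-style inode: keep the existing async-data path.
        handle.async_data.append(block)


h = Handle()
dirty_page_buffer(h, 66050, inode_journals_data=True)
dirty_page_buffer(h, 71346, inode_journals_data=False)
print(h.sync, h.metadata, h.async_data)  # True [66050] [71346]
```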

2003-09-12 12:10:59

by Marcelo Tosatti

Subject: Re: [2.4.22-rc1] ext3/jbd assertion failure transaction.c:1164


On 28 Aug 2003, Stephen C. Tweedie wrote:

> Hi,
>
> On Thu, 2003-08-28 at 14:57, Pascal Schmidt wrote:
>
> > > Many thanks --- I was able to reproduce this very easily, and I know of
> > > one or two very unusual things that fsx does which might well be the
> > > trigger here. I'll let you know how things go.
> >
> > Good, at least it's not a bug that only happens here and is hard to
> > reproduce elsewhere.
> >
> > I hope this does not happen under normal fs usage. ;)
>
> It's all down to ext3_writepage() using data-journaling rather than
> metadata journaling.
>
> The obvious fix is just to make the journal_dirty_async_data() code
> commit its writes as metadata if the inode is marked for
> data-journaling, and to set the transaction handle to be synchronous in
> that case. Sounds like a recipe for deadlock if done incorrectly,
> though, so I'll give it a more careful look tomorrow.

Hello Stephen,

What's the status of this?

You told me the other day you knew how to fix it but needed some more
thought...

Thanks