2007-01-19 17:57:50

by Zan Lynx

Subject: linux-2.6.20-rc4-mm1 Reiser4 filesystem freeze and corruption

I have been running 2.6.20-rc2-mm1 without problems, but both rc3-mm1
and rc4-mm1 have been giving me these freezes. They were happening
inside X and without external console it was impossible to get anything,
plus I was reluctant to test it since the freeze sometimes requires a
full fsck.reiser4 --build-fs to recover the filesystem.

But I finally got some output in a console session. I wasn't able to
get it all, I made some notes of what I think the problem is. I may try
again later once I get netconsole working (netconsole fails as a
built-in, I'll try it as a module next).

1 lock held by pdflush/185:
#0: (&type->s_umount_key#15) ... writeback_inodes+0x89

3 locks held by realsync/12942:
#0: (&type->s_umount_key#15) at ... __sync_inodes+0x78
#1: (&mgr->commit_mutex) ... reiser4_txn_end+0x37a
#2: (&qp->mutex) ... synchronize_qrcu+0x19

So, I *think* the problem is two locks on s_umount_key#15. Does that
sound likely? I also noticed QRCU may be involved.

Perhaps someone will look at this and instantly know what the problem
is.

If not, I'll be following up with more details like .config and perhaps
a full sysrq-T dump as soon as that fsck finishes.


Attachments:
signature.asc (189.00 B)
This is a digitally signed message part

2007-01-19 19:53:54

by Edward Shishkin

Subject: Re: linux-2.6.20-rc4-mm1 Reiser4 filesystem freeze and corruption

Zan Lynx wrote:

>I have been running 2.6.20-rc2-mm1 without problems, but both rc3-mm1
>and rc4-mm1 have been giving me these freezes.
>

I haven't investigated it in detail yet; other file systems also freeze
for me:
http://marc.theaimsgroup.com/?l=linux-kernel&m=116809282829254&w=2

> They were happening
>inside X and without external console it was impossible to get anything,
>plus I was reluctant to test it since the freeze sometimes requires a
>full fsck.reiser4 --build-fs to recover the filesystem.
>
>

Why did you decide to recover? Did you get an oops after mount, or something else?

>[snip]

2007-01-19 23:34:38

by Vladimir V. Saveliev

Subject: Re: linux-2.6.20-rc4-mm1 Reiser4 filesystem freeze and corruption

Hello

On Friday 19 January 2007 20:58, Zan Lynx wrote:
> I have been running 2.6.20-rc2-mm1 without problems, but both rc3-mm1
> and rc4-mm1 have been giving me these freezes. They were happening
> inside X and without external console it was impossible to get anything,
> plus I was reluctant to test it since the freeze sometimes requires a
> full fsck.reiser4 --build-fs to recover the filesystem.
>
> [...]
>
Yes, please provide more information. The full kernel output at the time of the freeze is very desirable.

2007-01-23 07:47:07

by Vince

Subject: Re: linux-2.6.20-rc4-mm1 Reiser4 filesystem freeze and corruption

Zan Lynx wrote:
> I have been running 2.6.20-rc2-mm1 without problems, but both rc3-mm1
> and rc4-mm1 have been giving me these freezes. They were happening
> inside X and without external console it was impossible to get anything,
> plus I was reluctant to test it since the freeze sometimes requires a
> full fsck.reiser4 --build-fs to recover the filesystem.
> [...]

Hi,

I don't know if it is related, but I've had the following BUG on
2.6.20-rc4-mm1 (+ hot-fixes patches applied):

-------------------
kernel BUG at fs/reiser4/plugin/item/extent_file_ops.c:973!
invalid opcode: 0000 [#1]
PREEMPT
last sysfs file: /devices/pci0000:00/0000:00:13.0/eth0/statistics/collisions
Modules linked in: binfmt_misc nfs lockd sunrpc radeon drm reiser4
ati_remote fuse usbhid snd_via82xx snd_ac97_codec ac97_bus snd_pcm_oss
snd_mixer_oss snd_pcm snd_page_alloc snd_mpu401_uart snd_seq_oss
snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer
snd_seq_device ohci1394 ieee1394 psmouse sr_mod cdrom sg ehci_hcd
via_agp agpgart uhci_hcd usbcore i2c_viapro snd soundcore
CPU: 0
EIP: 0060:[<f9b8a2e0>] Not tainted VLI
EFLAGS: 00010282 (2.6.20-rc4-mm1 #1)
EIP is at reiser4_write_extent+0xd5/0x626 [reiser4]
eax: ccca139c ebx: 00000200 ecx: f5bec400 edx: ffffffe4
esi: 00000000 edi: f5bec414 ebp: da6ff274 esp: e17d7e34
ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068
Process sstrip (pid: 23858, ti=e17d6000 task=d8ffc570 task.ti=e17d6000)
Stack: 00000000 00100100 00200200 00100100 00000034 bf826a50 e083ff00 00000000
       c0000000 da6ff2c8 dccba4c0 00000005 000001ff 0000021e 00000000 00000000
       00000000 00000000 00000004 f9b6cdad 00000000 00000004 00000004 00000001
Call Trace:
[<f9b6cdad>] reiser4_update_sd+0x22/0x28 [reiser4]
[<c0162459>] notify_change+0x200/0x20f
[<c01b89ed>] vsscanf+0x1e2/0x3ff
[<f9b75c80>] write_unix_file+0x0/0x495 [reiser4]
[<c013630d>] __remove_suid+0x10/0x14
[<c013d847>] mark_page_accessed+0x1c/0x2e
[<f9b5fbc2>] reiser4_txn_begin+0x1c/0x2e [reiser4]
[<f9b8a20b>] reiser4_write_extent+0x0/0x626 [reiser4]
[<f9b75eda>] write_unix_file+0x25a/0x495 [reiser4]
[<c0142601>] __handle_mm_fault+0x2bd/0x79b
[<f9b75c80>] write_unix_file+0x0/0x495 [reiser4]
[<c01514e9>] vfs_write+0x8a/0x136
[<c0151a27>] sys_write+0x41/0x67
[<c0103c86>] sysenter_past_esp+0x5f/0x85
=======================
Code: 04 89 0c 24 31 c9 89 5c 24 04 e8 52 fc ff ff 31 d2 e9 59 05 00 00
64 a1 08 00 00 00 8b 80 b4 04 00 00 8b 40 38 83 78 08 00 74 04 <0f> 0b
eb fe 8b 8c 24 e0 00 00 00 31 db 8b 01 8b 51 04 89 c1 0f
EIP: [<f9b8a2e0>] reiser4_write_extent+0xd5/0x626 [reiser4] SS:ESP 0068:e17d7e34
<4>reiser4[sstrip(23858)]: release_unix_file (fs/reiser4/plugin/file/file.c:2417)[vs-44]:
WARNING: out of memory?
reiser4[sstrip(23858)]: release_unix_file (fs/reiser4/plugin/file/file.c:2417)[vs-44]:
WARNING: out of memory?

2007-01-23 18:16:10

by Vladimir V. Saveliev

Subject: Re: linux-2.6.20-rc4-mm1 Reiser4 filesystem freeze and corruption

Hello

On Tuesday 23 January 2007 10:38, Vince wrote:
> Zan Lynx wrote:
> > I have been running 2.6.20-rc2-mm1 without problems, but both rc3-mm1
> > and rc4-mm1 have been giving me these freezes. They were happening
> > inside X and without external console it was impossible to get anything,
> > plus I was reluctant to test it since the freeze sometimes requires a
> > full fsck.reiser4 --build-fs to recover the filesystem.
> > [...]
>
> Hi,
>
> I don't know if it is related, but I've had the following BUG on
> 2.6.20-rc4-mm1 (+ hot-fixes patches applied):
>
> -------------------
> kernel BUG at fs/reiser4/plugin/item/extent_file_ops.c:973!

This is a different problem from Zan's. The attached patch should fix it.

Andrew, please apply.


From: Vladimir Saveliev <[email protected]>

remove_suid may open a transaction in reiser4, which must be restarted
before entering the main write loop.

Signed-off-by: Vladimir Saveliev <[email protected]>




diff -puN fs/reiser4/plugin/file/file.c~reiser4-restart-transaction-after-remove_suid fs/reiser4/plugin/file/file.c
--- linux-2.6.20-rc3-mm1/fs/reiser4/plugin/file/file.c~reiser4-restart-transaction-after-remove_suid 2007-01-23 18:59:14.000000000 +0300
+++ linux-2.6.20-rc3-mm1-vs/fs/reiser4/plugin/file/file.c 2007-01-23 19:00:37.000000000 +0300
@@ -2175,6 +2175,8 @@ ssize_t write_unix_file(struct file *fil
 		reiser4_exit_context(ctx);
 		return result;
 	}
+	/* remove_suid might create a transaction */
+	reiser4_txn_restart(ctx);
 
 	uf_info = unix_file_inode_data(inode);
 

_



2007-01-25 07:33:36

by Vince

Subject: Re: linux-2.6.20-rc4-mm1 Reiser4 filesystem freeze and corruption

Vladimir V. Saveliev wrote:
> Hello
>
> On Tuesday 23 January 2007 10:38, Vince wrote:
[...]
>> I don't know if it is related, but I've had the following BUG on
>> 2.6.20-rc4-mm1 (+ hot-fixes patches applied):
>>
>> -------------------
>> kernel BUG at fs/reiser4/plugin/item/extent_file_ops.c:973!
>
> This is a different problem from Zan's. The attached patch should fix it.
>
> Andrew, please apply.
>
>
> From: Vladimir Saveliev <[email protected]>
>
> remove_suid may open a transaction in reiser4, which must be restarted
> before entering the main write loop.
>
> Signed-off-by: Vladimir Saveliev <[email protected]>

I'm pleased to confirm I wasn't able to reproduce the bug with your
patch applied.

Regards,

Vince

2007-02-01 15:55:07

by Edward Shishkin

Subject: Re: linux-2.6.20-rc4-mm1 Reiser4 filesystem freeze and corruption

Zan Lynx wrote:

>On Sat, 2007-01-20 at 03:34 +0300, Vladimir V. Saveliev wrote:
>
>
>>Hello
>>
>>On Friday 19 January 2007 20:58, Zan Lynx wrote:
>>
>>
>>>I have been running 2.6.20-rc2-mm1 without problems, but both rc3-mm1
>>>and rc4-mm1 have been giving me these freezes. They were happening
>>>inside X and without external console it was impossible to get anything,
>>>plus I was reluctant to test it since the freeze sometimes requires a
>>>full fsck.reiser4 --build-fs to recover the filesystem.
>>>
>>>But I finally got some output in a console session. I wasn't able to
>>>get it all, I made some notes of what I think the problem is. I may try
>>>again later once I get netconsole working (netconsole fails as a
>>>built-in, I'll try it as a module next).
>>>
>>>
>[snip]
>
>
>>yes, please provide more information. Full kernel output at time of freeze is very desirable.
>>
>>
>
>Here comes a full-sized bug report, as best as I can do it. This is
>kernel 2.6.20-rc6-mm3 instead of rc4-mm1. Still has the problem.
>
>

Thanks for the dump.

>[ 3138.456588] [<ffffffff8033f5de>] current_atom_finish_all_fq+0x12e/0x280
>[ 3138.456661] [<ffffffff80296510>] autoremove_wake_function+0x0/0x30
>[ 3138.456674] [<ffffffff803350ac>] submit_wb_list+0x11c/0x130
>[ 3138.456690] [<ffffffff80335409>] reiser4_txn_end+0x349/0x530
>[ 3138.456710] [<ffffffff803355f9>] reiser4_txn_restart+0x9/0x20
>[ 3138.456781] [<ffffffff80335680>] force_commit_atom+0x50/0x60
>[ 3138.456793] [<ffffffff8034cfb1>] writepages_unix_file+0x671/0x780
>[ 3138.456824] [<ffffffff802590b3>] do_writepages+0x43/0x80
>[ 3138.456838] [<ffffffff8024dbf8>] __filemap_fdatawrite_range+0x58/0x70
>[ 3138.456914] [<ffffffff8024e19d>] do_fsync+0x3d/0xe0
>[ 3138.456930] [<ffffffff802c2473>] sys_msync+0x143/0x1f0
>[ 3138.456945] [<ffffffff8025c11e>] system_call+0x7e/0x83
>
>

This is waiting for I/O completion, which never comes because of the new
plugging policy introduced by the block layer folks. The attached patch should help.
Andrew, please apply.

Thanks,
Edward.


Attachments:
reiser4-vs-git-block3.patch (1.93 kB)

2007-02-02 00:21:41

by Zan Lynx

Subject: Re: linux-2.6.20-rc4-mm1 Reiser4 filesystem freeze and corruption

On Thu, 2007-02-01 at 18:54 +0300, Edward Shishkin wrote:
[snip]
> Thanks for the dump.
>
> [snip]
>
> This is waiting for I/O completion, which never comes because of the new
> plugging policy introduced by the block layer folks. The attached patch should help.
> Andrew, please apply.

OK, I have been using it with your patch for many hours and it has not
frozen up yet. I believe that the patch did indeed fix it.

Thank you.
--
Zan Lynx <[email protected]>

