2005-03-02 10:10:18

by Florian Engelhardt

Subject: freezes with reiser4 in a raid1 with 2.6.11-rc5-mm1

Hello,

I'm having some trouble here with my testing server.
It runs the 2.6.11-rc5-mm1 kernel and has three hard disks
(all reiser4): hda is the system and boot disk, hdc and hdd are
in a RAID 1 (via the kernel's multiple device driver).
Without the RAID, the system works as expected, but as soon as
I activate the RAID, the system becomes unstable and in
some cases freezes or reboots.

I activated the RAID (/dev/md0), mounted it, and then
started NFS. I was able to mount the share
on my desktop, and creating directories was no problem, but
as soon as I copied a file to the share, the server
froze.
Creating files locally (while logged in via ssh) leaves
the process stuck in state D.
Sometimes, when I start nfsd, the system reboots immediately,
sometimes not.
At the moment, most of the processes are in state D, reboot
does not work, and I am not at home, so I am unable to reboot
the machine manually.

Every process that tries to do any I/O on the RAID
now remains in state D.
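
For reference, tasks stuck in state D (uninterruptible sleep) can be listed
together with the kernel function they are waiting in. This is only a minimal
sketch, assuming a procps-style ps; the function names filter_d_state and
show_blocked are made up for this example.

```shell
#!/bin/sh
# Keep only lines whose second column (STAT) starts with "D",
# i.e. tasks in uninterruptible sleep. The "ps" header line is dropped
# because its second column is the literal word "STAT".
filter_d_state() {
    awk '$2 ~ /^D/ { print }'
}

# Print PID, state, wait channel and command name of all blocked tasks.
# Assumes procps "ps"; "wchan:32" only widens the wait-channel column.
show_blocked() {
    ps -eo pid,stat,wchan:32,comm | filter_d_state
}
```

Calling show_blocked on the affected machine should show which tasks are
blocked; the wchan column may hint at whether they are waiting in md, in
reiser4 or in nfsd.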

Are there any known problems with reiser4, Linux RAID and NFS?

Kind Regards

Florian Engelhardt

PS: I am not on the list, so please CC me


2005-03-02 10:39:16

by Brad Campbell

Subject: Re: freezes with reiser4 in a raid1 with 2.6.11-rc5-mm1

Florian Engelhardt wrote:
>
> I activated the RAID (/dev/md0), mounted it, and then
> started NFS. I was able to mount the share
> on my desktop, and creating directories was no problem, but
> as soon as I copied a file to the share, the server
> froze.
> Creating files locally (while logged in via ssh) leaves
> the process stuck in state D.
> Sometimes, when I start nfsd, the system reboots immediately,
> sometimes not.
> At the moment, most of the processes are in state D, reboot
> does not work, and I am not at home, so I am unable to reboot
> the machine manually.

Here is a neat trick which I only discovered in desperation last week when battling a RAID lockup on
the -rc4-mm1 kernel on a remote box.

I was also having hard lockup issues, but reseating all my PCI cards appears to have rectified that one.

As root: echo b > /proc/sysrq-trigger

Of course, this only works if you have alt-sysrq built in.
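
A slightly gentler variant asks the kernel to sync and remount read-only before
the reboot. The following is only a sketch, assuming a kernel built with
CONFIG_MAGIC_SYSRQ; the helper names sysrq_send and emergency_reboot, and the
optional trigger-file and delay arguments, are made up for this example.

```shell
#!/bin/sh
# Write a single sysrq command letter to the trigger file.
# $1: command letter, $2: trigger file (default: /proc/sysrq-trigger)
sysrq_send() {
    cmd=$1
    trigger=${2:-/proc/sysrq-trigger}
    if [ ! -w "$trigger" ]; then
        echo "cannot write $trigger (not root, or sysrq not built in?)" >&2
        return 1
    fi
    printf '%s\n' "$cmd" > "$trigger"
}

# Emergency sync, remount read-only, then reboot immediately.
# $1: trigger file (default: /proc/sysrq-trigger)
# $2: seconds to wait between steps (default: 2)
emergency_reboot() {
    trigger=${1:-/proc/sysrq-trigger}
    delay=${2:-2}
    sysrq_send s "$trigger" || return 1   # s = emergency sync
    sleep "$delay"
    sysrq_send u "$trigger" || return 1   # u = remount filesystems read-only
    sleep "$delay"
    sysrq_send b "$trigger"               # b = reboot at once, no clean shutdown
}
```

Run emergency_reboot as root with no arguments on the stuck box. If the sync
itself cannot complete because the array is wedged, the final "b" still
reboots the machine, just as the plain echo does.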

Brad
--
"Human beings, who are almost unique in having the ability
to learn from the experience of others, are also remarkable
for their apparent disinclination to do so." -- Douglas Adams

2005-03-05 09:07:03

by Florian Engelhardt

Subject: Re: freezes with reiser4 in a raid1 with 2.6.11-rc5-mm1

Hello,

On Wed, 02 Mar 2005 14:38:59 +0400
Brad Campbell wrote:

> Florian Engelhardt wrote:
> >
> > I activated the RAID (/dev/md0), mounted it, and then
> > started NFS. I was able to mount the share
> > on my desktop, and creating directories was no problem, but
> > as soon as I copied a file to the share, the server
> > froze.
> > Creating files locally (while logged in via ssh) leaves
> > the process stuck in state D.
> > Sometimes, when I start nfsd, the system reboots immediately,
> > sometimes not.
> > At the moment, most of the processes are in state D, reboot
> > does not work, and I am not at home, so I am unable to reboot
> > the machine manually.
>
> Here is a neat trick which I only discovered in desperation last week
> when battling a RAID lockup on the -rc4-mm1 kernel on a remote box.
>
> I was also having hard lockup issues, but reseating all my PCI cards
> appears to have rectified that one.

Well, there are not many PCI cards in this server, and reseating them
didn't fix it.

> As root: echo b > /proc/sysrq-trigger
>
> Of course, this only works if you have alt-sysrq built in.

Thanks for that, I was able to reboot the machine with that trick, but
I couldn't find anything suspicious in the messages file.

I ran some further tests on the server:
With the RAID deactivated, formatting the disks (hdc and hdd) with
reiser4, mounting them and sharing them via NFS and FTP worked great: no
freezes, no reboots, everything perfect, even the performance.
But as soon as I activated the RAID, the server froze or rebooted.

Maybe this problem is not a bug in a single component (e.g. NFS or
reiser4). I think it is the combination of Linux RAID with reiser4, but
I don't know.
I will try to get the RAID up and running with ext3 and/or jfs.

Then we will know for sure whether it is the combination of RAID and reiser4.

Kind regards

Florian Engelhardt

--
"I may have invented it, but Bill made it famous"
David Bradley, who invented the (in)famous ctrl-alt-del key combination

2005-03-05 09:28:45

by Brad Campbell

Subject: Re: freezes with reiser4 in a raid1 with 2.6.11-rc5-mm1

Florian Engelhardt wrote:

>>Here is a neat trick which I only discovered in desperation last week
>>when battling a RAID lockup on the -rc4-mm1 kernel on a remote box.
>>
>>I was also having hard lockup issues, but reseating all my PCI cards
>>appears to have rectified that one.
>
>
> Well, there are not many PCI cards in this server, and reseating them
> didn't fix it.

Sorry, I was just pointing out what "appeared" to solve my hard-lock problems; I was not suggesting
it as a cure for yours.

>>As root: echo b > /proc/sysrq-trigger
>>
>>Of course, this only works if you have alt-sysrq built in.
>
>
> Thanks for that, I was able to reboot the machine with that trick, but
> I couldn't find anything suspicious in the messages file.
>
> I ran some further tests on the server:
> With the RAID deactivated, formatting the disks (hdc and hdd) with
> reiser4, mounting them and sharing them via NFS and FTP worked great: no
> freezes, no reboots, everything perfect, even the performance.
> But as soon as I activated the RAID, the server froze or rebooted.
>

A complete hard lock caused by a kernel bug appears to be very rare these days. It may be tickling a
hardware bug somewhere. My machine was only hard-locking when I was writing to the array. A complete
lock-up or reboot really does sound more like a hardware problem. Have you tried running something
like memtest86? I found that after a couple of hours of memtest86 my box would lock solid, which
eliminated the Linux kernel from the equation completely.

I'm running ext3 on all my machines, so I can't help with reiser4 at all.
I'm running a largish raid5 on 2.6.10-bk10 and a fairly large raid6 on 2.6.11-rc5-bk3. I had
problems with raid6 on 2.6.11-rc4-mm1 causing raid subsystem lockups, but nothing that prevented me
from rebooting with the sysrq-trigger.

Regards,
Brad
--
"Human beings, who are almost unique in having the ability
to learn from the experience of others, are also remarkable
for their apparent disinclination to do so." -- Douglas Adams