2004-04-14 06:53:47

by Paul Wagland

Subject: reiser4 and megaraid problems with debian 2.6.5

Hi all,

I would like to report on a problem that I am having. I am just testing
out the new megaraid unified driver, and have been doing some baseline
testing with bonnie++.

My problem is that, although reiserfs, ext2, jfs and xfs all work,
reiser4 fails with the following error:
---
Can't write block.
Bonnie: drastic I/O error (write(2)): No such file or directory
---

I am using the debian prepared kernel with the debian reiser4 patch. I
made a cursory examination of the patch, and it appears to correlate
fairly closely with the patch from the namesys site.

Given that this works with reiserfs, ext2, jfs and xfs, it would appear
to be a reiser4 problem. However, ext3 also fails, though with a
different error: it claims that the disk is full. It is only trying to
write two 1GB files onto a 2.5GB filesystem, so it should have enough
room, and indeed it did work two or three times out of about 10
runs (lots of timing :-). This implies that it might be a megaraid
problem. As you can tell, I really have no idea ;-)

I will try playing around tonight with an official kernel and the
official reiser4 patch to see if that makes any difference, but would
just like to raise this potential problem sooner rather than later.

If I can help debug this situation (I am probably the only person
trying this combination :-) please let me know how I should go about
it.

Cheers,
Paul



2004-04-14 09:06:02

by Domenico Andreoli

Subject: Re: reiser4 and megaraid problems with debian 2.6.5

[ bringing this also on reiserfs ml, a great place for this kind
of posts. this is also the reason of the full quoting. sorry ]

On Wed, Apr 14, 2004 at 08:51:53AM +0200, Paul Wagland wrote:
> Hi all,

hi Paul,

> I would like to report on a problem that I am having. I am just testing
> out the new megaraid unified driver, and have been doing some baseline
> testing with bonnie++.
>
> My problem is that, although reiserfs, ext2, jfs and xfs all work,
> reiser4 fails with the following error:
> ---
> Can't write block.
> Bonnie: drastic I/O error (write(2)): No such file or directory
> ---
>
> I am using the debian prepared kernel with the debian reiser4 patch. I
> made a cursory examination of the patch, and it appears to correlate
> fairly closely with the patch from the namesys site.

of course it is correlated to that of namesys! i have no skills at all
to invent reiser4 :))

you forgot to specify version of the patch you are talking about,
currently debian provides two versions. anyway i suppose you are talking
about version 20040326-2, aren't you?

> Given that this works with reiserfs, ext2, jfs and xfs it would appear
> to be a reiser4 problem, however ext3 also fails, though with a
> different error, it claims that the disk is full, but it is trying to
> write a 2 1GB files onto a 2.5GB filesystem, so it should have enough
> room, and indeed it did even work two or three times out of about 10
> runs (lots of timing :-). This implies that it might be a megaraid
> problem. As you can tell, I really have no idea ;-)
>
> I will try playing around tonight with an official kernel and the
> official reiser4 patch to see if that makes any difference, but would
> just like to raise this potential problem sooner rather than later.

latest reiser4 snapshot provided a patch which applied cleanly on
2.6.5-rc2 but not to 2.6.5. i had to modify it as suggested on the
reiserfs ml. if you look at the debian package's changelog you can find
the reference to that thread.

> If I can help debug this situation (I am probably the only person
> trying this combination :-) please let me know how I should go about
> it.

i'm sorry but i can't help further.

cheers
domenico

-----[ Domenico Andreoli, aka cavok
--[ http://filibusta.crema.unimi.it/~cavok/gpgkey.asc
---[ 3A0F 2F80 F79C 678A 8936 4FEE 0677 9033 A20E BC50

2004-04-14 12:37:25

by Paul Wagland

Subject: Re: reiser4 and megaraid problems with debian 2.6.5


On Apr 14, 2004, at 11:05, Domenico Andreoli wrote:

> [ bringing this also on reiserfs ml, a great place for this kind
> of posts. this is also the reason of the full quoting. sorry ]

Thanks ;-)

>> I am using the debian prepared kernel with the debian reiser4 patch. I
>> made a cursory examination of the patch, and it appears to correlate
>> fairly closely with the patch from the namesys site.
>
> you forgot to specify version of the patch you are talking about,
> currently debian provides two versions. anyway i suppose you are
> talking
> about version 20040326-2, aren't you?

Yes, that is correct.

>> If I can help debug this situation (I am probably the only person
>> trying this combination :-) please let me know how I should go about
>> it.
>
> i'm sorry but i can't help further.

Thanks for the tip... the link that you referred to was most useful. I
might now have an idea what the problem might be... Further on in the
thread <http://marc.theaimsgroup.com/?l=reiserfs&m=108117079808733&w=2>
it says that there is something in the patch that "can lead to a
dirtied_when in the future, and missed writeback". Well, what happens
if the directory that I am missing was in that writeback that got
missed?

I will try updating the debian patch myself and give it another test
tonight and will report back on my findings. But, before I do so, does
it seem likely that this could cause the problem?

Cheers,
Paul



2004-04-14 13:11:53

by Nikita Danilov

Subject: Re: reiser4 and megaraid problems with debian 2.6.5

Paul Wagland writes:
>
> On Apr 14, 2004, at 11:05, Domenico Andreoli wrote:
>
> > [ bringing this also on reiserfs ml, a great place for this kind
> > of posts. this is also the reason of the full quoting. sorry ]
>
> Thanks ;-)
>
> >> I am using the debian prepared kernel with the debian reiser4 patch. I
> >> made a cursory examination of the patch, and it appears to correlate
> >> fairly closely with the patch from the namesys site.
> >
> > you forgot to specify version of the patch you are talking about,
> > currently debian provides two versions. anyway i suppose you are
> > talking
> > about version 20040326-2, aren't you?
>
> Yes, that is correct.
>
> >> If I can help debug this situation (I am probably the only person
> >> trying this combination :-) please let me know how I should go about
> >> it.

Is there anything in the logs?

[...]

>
> Cheers,
> Paul

Nikita.

2004-04-14 13:25:57

by Paul Wagland

Subject: Re: reiser4 and megaraid problems with debian 2.6.5


On Apr 14, 2004, at 15:09, Nikita Danilov wrote:

>>> Paul Wagland writes:
>>>> If I can help debug this situation (I am probably the only person
>>>> trying this combination :-) please let me know how I should go about
>>>> it.
>
> Is there anything in the logs?

Sadly I forgot to check... though I will check again tonight since the
problem is quite reproducible for me. Will report back later...

Cheers,
Paul



2004-04-14 15:13:26

by Hans Reiser

Subject: Re: reiser4 and megaraid problems with debian 2.6.5

Paul Wagland wrote:

> Hi all,
>
> I would like to report on a problem that I am having. I am just
> testing out the new megaraid unified driver, and have been doing some
> baseline testing with bonnie++.
>
> My problem is that, although reiserfs, ext2, jfs and xfs all work,
> reiser4 fails with the following error:
> ---
> Can't write block.
> Bonnie: drastic I/O error (write(2)): No such file or directory
> ---
>
> I am using the debian prepared kernel with the debian reiser4 patch. I
> made a cursory examination of the patch, and it appears to correlate
> fairly closely with the patch from the namesys site.

In what way does it not correlate?

>
> Given that this works with reiserfs, ext2, jfs and xfs it would appear
> to be a reiser4 problem, however ext3 also fails, though with a
> different error, it claims that the disk is full, but it is trying to
> write a 2 1GB files onto a 2.5GB filesystem, so it should have enough
> room, and indeed it did even work two or three times out of about 10
> runs (lots of timing :-). This implies that it might be a megaraid
> problem. As you can tell, I really have no idea ;-)
>
> I will try playing around tonight with an official kernel and the
> official reiser4 patch to see if that makes any difference, but would
> just like to raise this potential problem sooner rather than later.
>
> If I can help debug this situation (I am probably the only person
> trying this combination :-) please let me know how I should go about it.
>
> Cheers,
> Paul

I don't have the hardware to test it, can you get the error without your
hardware?

--
Hans

2004-04-14 15:37:54

by Paul Wagland

Subject: Re: reiser4 and megaraid problems with debian 2.6.5

Hi,

On Apr 14, 2004, at 17:13, Hans Reiser wrote:

> Paul Wagland wrote:
>
>> I am using the debian prepared kernel with the debian reiser4 patch.
>> I made a cursory examination of the patch, and it appears to
>> correlate fairly closely with the patch from the namesys site.
>
> In what way does it not correlate?

As was mentioned by Domenico Andreoli the changes are just those
required to get reiser4 to work under 2.6.5. Other differences are line
offsets due to the fact that the debian kernel also has patches
applied.

>> If I can help debug this situation (I am probably the only person
>> trying this combination :-) please let me know how I should go about
>> it.
>
> I don't have the hardware to test it, can you get the error without
> your hardware?

Unfortunately, not easily, since this is the only box that I can
currently test this out on. However, there are a couple of tests that I
can still perform (as mentioned elsewhere in this thread) and I will
report back on the results of those later tonight.

Cheers,
Paul



2004-04-15 00:00:13

by Paul Wagland

Subject: Re: reiser4 and megaraid problems with debian 2.6.5

On Wed, 2004-04-14 at 15:25, Paul Wagland wrote:
> On Apr 14, 2004, at 15:09, Nikita Danilov wrote:
>
> >>> Paul Wagland writes:
> >>>> If I can help debug this situation (I am probably the only person
> >>>> trying this combination :-) please let me know how I should go about
> >>>> it.
> >
> > Is there anything in the logs?
>
> Sadly I forgot to check... though I will check again tonight since the
> problem is quite reproducible for me. Will report back later...

OK. There is nothing in the logs. I have recompiled the kernel with
extra REISER4 debugging and checking and still nothing.

This error is 100% reproducible for me.

I have had a thought: what if it is "only" the wrong error code that is
being returned? What if the real problem is that we are running out of
free blocks? To test this theory (a little at least) I ran:

# bonnie++ -q -x4 -d /mnt/sdq -u 0:0 -f -r500
name,file_size,putc,putc_cpu,put_block,put_block_cpu,rewrite,rewrite_cpu,getc,getc_cpu,get_block,get_block_cpu,seeks,seeks_cpu,num_files,seq_create,seq_create_cpu,seq_stat,seq_stat_cpu,seq_del,seq_del_cpu,ran_create,ran_create_cpu,ran_stat,ran_stat_cpu,ran_del,ran_del_cpu
tidbit.kungfoocoder.org,1G,,,55236,11,36165,10,,,73514,8,2138.3,2,16,+++++,+++,+++++,+++,25015,99,28712,100,+++++,+++,26846,100
tidbit.kungfoocoder.org,1G,,,55236,11,30073,8,,,84287,10,2046.9,2,16,+++++,+++,+++++,+++,24862,99,28340,99,+++++,+++,26490,99
tidbit.kungfoocoder.org,1G,,,55391,11,30140,9,,,84506,10,2050.2,2,16,+++++,+++,+++++,+++,24642,100,28725,100,+++++,+++,26653,100
tidbit.kungfoocoder.org,1G,,,55364,11,30165,8,,,83055,11,2051.9,2,16,+++++,+++,+++++,+++,24682,100,28264,100,+++++,+++,26804,99


Note that even with debugging turned on we are about 5% faster at
reading and 20% slower at writing compared to reiserfs. Pretty good I
dare say.
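
For anyone who wants to compare runs without eyeballing the columns, here
is a minimal sketch that averages the block throughput figures from the
four rows above. The header and rows are copied from the bonnie++ output;
the helper name column_means is made up for this example, and the K/sec
unit is my assumption about how bonnie++ reports block throughput.

```python
import csv
import io
from statistics import mean

# Header and rows copied verbatim from the bonnie++ -q output above.
HEADER = (
    "name,file_size,putc,putc_cpu,put_block,put_block_cpu,rewrite,rewrite_cpu,"
    "getc,getc_cpu,get_block,get_block_cpu,seeks,seeks_cpu,num_files,"
    "seq_create,seq_create_cpu,seq_stat,seq_stat_cpu,seq_del,seq_del_cpu,"
    "ran_create,ran_create_cpu,ran_stat,ran_stat_cpu,ran_del,ran_del_cpu"
)
ROWS = """
tidbit.kungfoocoder.org,1G,,,55236,11,36165,10,,,73514,8,2138.3,2,16,+++++,+++,+++++,+++,25015,99,28712,100,+++++,+++,26846,100
tidbit.kungfoocoder.org,1G,,,55236,11,30073,8,,,84287,10,2046.9,2,16,+++++,+++,+++++,+++,24862,99,28340,99,+++++,+++,26490,99
tidbit.kungfoocoder.org,1G,,,55391,11,30140,9,,,84506,10,2050.2,2,16,+++++,+++,+++++,+++,24642,100,28725,100,+++++,+++,26653,100
tidbit.kungfoocoder.org,1G,,,55364,11,30165,8,,,83055,11,2051.9,2,16,+++++,+++,+++++,+++,24682,100,28264,100,+++++,+++,26804,99
""".strip()


def column_means(header, rows, columns):
    """Average the named numeric columns over all result rows."""
    records = list(csv.DictReader(io.StringIO(header + "\n" + rows)))
    return {col: mean(float(rec[col]) for rec in records) for col in columns}


means = column_means(HEADER, ROWS, ["put_block", "rewrite", "get_block"])
for name, value in means.items():
    # Unit assumed to be K/sec.
    print(f"{name}: {value:.1f} K/sec")
```

The per-character columns (putc, getc) are empty here because -f skips
those tests, so the sketch only touches the block columns.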

However, when I run:

~# bonnie++ -x4 -d /mnt/sdq -u 0:0 -f -q -r800
name,file_size,putc,putc_cpu,put_block,put_block_cpu,rewrite,rewrite_cpu,getc,getc_cpu,get_block,get_block_cpu,seeks,seeks_cpu,num_files,seq_create,seq_create_cpu,seq_stat,seq_stat_cpu,seq_del,seq_del_cpu,ran_create,ran_create_cpu,ran_stat,ran_stat_cpu,ran_del,ran_del_cpu
Can't write block.
Bonnie: drastic I/O error (re write(2)): No such file or directory
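
To make the "wrong error code" idea concrete: the text after the colon
is the standard message for the errno value that the failed write left
behind. A small sketch, assuming a Linux/glibc-style system, that prints
the message for the code actually seen (ENOENT) next to the one that
would be expected if the filesystem had simply run out of blocks
(ENOSPC):

```python
import errno
import os

# Map the symbolic errno names to the message strerror() gives for them.
messages = {
    name: os.strerror(getattr(errno, name))
    for name in ("ENOENT", "ENOSPC")
}

for name, text in messages.items():
    print(f"{name} ({getattr(errno, name)}): {text}")
```

If the theory is right, the write path would be handing back ENOENT
where ENOSPC was meant. That is only a guess at this point.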

Using reiserfs I can happily run:
# bonnie++ -x4 -d /mnt/sdq -u 0:0 -f -q -r1008

and the partition is 2.5GB in size.

Some more background information: my hardware is not overclocked, and
has been 100% reliable. About two weeks ago I ran it through about 24
hours of memtest86+ without any problems. The machine has 1GB of RAM.
The logical partition that I am testing is 2.5GB.
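
The space arithmetic behind these three runs can be sketched as follows.
This assumes that bonnie++ sizes its test data at twice the value given
to -r when no -s is passed (which matches the "1G" shown for -r500
above), and that "2.5GB" means 2560MB. Both are assumptions, as is the
helper name bonnie_dataset_mb.

```python
PARTITION_MB = 2560  # "2.5GB", assumed to mean 2.5 * 1024 MB


def bonnie_dataset_mb(ram_mb, factor=2):
    """Test data size, assuming bonnie++ defaults to factor * RAM."""
    return ram_mb * factor


# (filesystem, value passed to -r, outcome reported above)
runs = [
    ("reiser4", 500, "works"),
    ("reiser4", 800, "fails"),
    ("reiserfs", 1008, "works"),
]

for fs, ram_mb, outcome in runs:
    size = bonnie_dataset_mb(ram_mb)
    headroom = PARTITION_MB - size
    print(f"{fs:8s} -r{ram_mb:<5d} data={size}MB headroom={headroom}MB {outcome}")
```

Under those assumptions reiser4 fails while a good third of the
partition is still unused, whereas reiserfs copes with much less
headroom.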

Here are the REISER4 settings from my configuration:
tidbit:~# grep REISER4 /boot/config-2.6.5pw-newmega-k7-1
CONFIG_REISER4_FS=m
# CONFIG_REISER4_FS_SYSCALL is not set
CONFIG_REISER4_LARGE_KEY=y
CONFIG_REISER4_CHECK=y
CONFIG_REISER4_FS_SYSCALL_DEBUG=y
# CONFIG_REISER4_DEBUG_MODIFY is not set
# CONFIG_REISER4_DEBUG_MEMCPY is not set
# CONFIG_REISER4_DEBUG_NODE is not set
# CONFIG_REISER4_ZERO_NEW_NODE is not set
# CONFIG_REISER4_TRACE is not set
# CONFIG_REISER4_EVENT_LOG is not set
# CONFIG_REISER4_STATS is not set
# CONFIG_REISER4_PROF is not set
# CONFIG_REISER4_LOCKPROF is not set
# CONFIG_REISER4_DEBUG_OUTPUT is not set
# CONFIG_REISER4_NOOPT is not set
CONFIG_REISER4_USE_EFLUSH=y
# CONFIG_REISER4_COPY_ON_CAPTURE is not set
# CONFIG_REISER4_BADBLOCKS is not set


I have removed the |1 from the jiffies|1 assignment. It still works,
which means that the kernel must have been fixed :-) But it didn't help
:-\

Hope this helps provide some illumination to the gurus out there...

Cheers,
Paul

2004-04-18 22:36:51

by Paul Wagland

Subject: Re: reiser4 and megaraid problems with debian 2.6.5 (*solved*)

Hi all,

Well, partly solved anyway... I am just posting this so that if anyone
finds this thread later they can also find this conclusion... There is
still more work to be done before this problem can be properly closed,
but at least now I am certain that it has nothing to do with the
hardware :-)

It appears (my own unsupported theory) that the problem is that reiser4
is taking some time to free up the blocks that are currently in use
by the wandering log. Since I was running a test that causes a lot of
wandering log to be created, and I was doing it on a filesystem with
very little free space, I ran into the problem.

Rerunning the test with either a) more space, or b) a smaller data set
solved the problem. On the reiserfs-list we are now trying to find out
exactly why this is happening, and how to solve the problem properly.
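
To illustrate the theory (and only the theory), here is a toy model in
which a rewrite always takes fresh blocks and the old ones become
reusable only after a commit. It is not reiser4's allocator; ToyDisk and
run are invented for this sketch, and one "block" stands in for 1MB so
that the numbers mirror the runs earlier in the thread.

```python
import errno


class ToyDisk:
    """Toy model: rewritten blocks are reusable only after commit()."""

    def __init__(self, total_blocks):
        self.free = total_blocks
        self.pending = 0  # released by rewrites, not yet reusable

    def _take(self, n):
        if n > self.free:
            raise OSError(errno.ENOSPC, "toy disk out of free blocks")
        self.free -= n

    def write_new(self, n):
        self._take(n)

    def rewrite(self, n):
        # New copies need fresh blocks; the old ones wait for a commit.
        self._take(n)
        self.pending += n

    def commit(self):
        self.free += self.pending
        self.pending = 0


def run(total, dataset, commit_every=None):
    """Write a dataset, then rewrite it; return 'ok' or the errno name."""
    disk = ToyDisk(total)
    try:
        disk.write_new(dataset)
        disk.commit()
        done = 0
        step = commit_every or dataset
        while done < dataset:
            chunk = min(step, dataset - done)
            disk.rewrite(chunk)
            done += chunk
            if commit_every:
                disk.commit()
        return "ok"
    except OSError as exc:
        return errno.errorcode[exc.errno]


print("1000 of 2560, no commits:   ", run(2560, 1000))
print("1600 of 2560, no commits:   ", run(2560, 1600))
print("1600 of 2560, commit per 256:", run(2560, 1600, commit_every=256))
```

In this model the smaller data set fits, the larger one runs out of
blocks during the rewrite pass, and committing more often makes the
larger one fit again. Whether the real code behaves anything like this
is exactly what still has to be established.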

Cheers,
Paul

