2002-10-08 21:58:29

by Benjamin LaHaise

Subject: [patch] silence an unnecessary raid5 debugging message

Hello Ingo,

LVM manages to trigger the "raid5: switching cache buffer size" printk
quite voluminously when using a snapshot device. The following patch
disables it by placing it under the debugging PRINTK macro.

-ben
--
"Do you seek knowledge in time travel?"

diff -urN linux.orig/drivers/md/raid5.c linux/drivers/md/raid5.c
--- linux.orig/drivers/md/raid5.c Mon Feb 25 14:37:58 2002
+++ linux/drivers/md/raid5.c Tue Oct 8 17:56:43 2002
@@ -282,7 +282,7 @@
}

if (conf->buffer_size != size) {
- printk("raid5: switching cache buffer size, %d --> %d\n", oldsize, size);
+ PRINTK("raid5: switching cache buffer size, %d --> %d\n", oldsize, size);
shrink_stripe_cache(conf);
if (size==0) BUG();
conf->buffer_size = size;
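
To illustrate what moving the message under a debugging macro does, here is a minimal user-space sketch. It assumes a compile-time flag (called RAID5_DEBUG here) that turns PRINTK into either a real print or a no-op; the actual macro definition in raid5.c is not shown in this thread, and fake_printk, switch_size_loud and switch_size_quiet are hypothetical names used only for illustration.

```c
#include <stdarg.h>
#include <stdio.h>

/* Assumption: a compile-time debug switch; 0 means debug output is off. */
#ifndef RAID5_DEBUG
#define RAID5_DEBUG 0
#endif

/* Counts how many messages actually reached the log (illustration only). */
static int messages_logged;

/* Hypothetical user-space stand-in for the kernel's printk(). */
static void fake_printk(const char *fmt, ...)
{
    va_list ap;

    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
    messages_logged++;
}

/* With debugging off, PRINTK expands to an empty statement. */
#if RAID5_DEBUG
#define PRINTK(...) fake_printk(__VA_ARGS__)
#else
#define PRINTK(...) do { } while (0)
#endif

/* Behaviour before the patch: the switch is always reported. */
static void switch_size_loud(int *buffer_size, int size)
{
    if (*buffer_size != size) {
        fake_printk("raid5: switching cache buffer size, %d --> %d\n",
                    *buffer_size, size);
        *buffer_size = size;
    }
}

/* Behaviour after the patch: reported only in a debug build. */
static void switch_size_quiet(int *buffer_size, int size)
{
    if (*buffer_size != size) {
        PRINTK("raid5: switching cache buffer size, %d --> %d\n",
               *buffer_size, size);
        *buffer_size = size;
    }
}
```

In a default build of this sketch the quiet variant changes the size without logging anything, which is the effect the patch is after; building with -DRAID5_DEBUG=1 would bring the message back.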


2002-10-08 23:27:56

by NeilBrown

Subject: Re: [patch] silence an unnecessary raid5 debugging message

On Tuesday October 8, [email protected] wrote:
> Hello Ingo,
>
> LVM manages to trigger the "raid5: switching cache buffer size" printk
> quite voluminously when using a snapshot device. The following patch
> disables it by placing it under the debugging PRINTK macro.

This is there as a 'printk' deliberately. It warns you that what you
are doing isn't really supported and will cause a substantial
performance hit (as all IO to the array is completely serialised
around one of these messages).

If you want to make it PRINTK for yourself, that is fine. But I would
rather it stayed as printk in the mainstream kernel until the root
problem is fixed (and I have seen patches that possibly fix the
problem, but I haven't had a chance to look at them).

NeilBrown

>
> -ben
> --
> "Do you seek knowledge in time travel?"
>
> diff -urN linux.orig/drivers/md/raid5.c linux/drivers/md/raid5.c
> --- linux.orig/drivers/md/raid5.c Mon Feb 25 14:37:58 2002
> +++ linux/drivers/md/raid5.c Tue Oct 8 17:56:43 2002
> @@ -282,7 +282,7 @@
> }
>
> if (conf->buffer_size != size) {
> - printk("raid5: switching cache buffer size, %d --> %d\n", oldsize, size);
> + PRINTK("raid5: switching cache buffer size, %d --> %d\n", oldsize, size);
> shrink_stripe_cache(conf);
> if (size==0) BUG();
> conf->buffer_size = size;

2002-10-08 23:31:13

by Benjamin LaHaise

Subject: Re: [patch] silence an unnecessary raid5 debugging message

On Wed, Oct 09, 2002 at 09:31:14AM +1000, Neil Brown wrote:
> This is there as a 'printk' deliberately. It warns you that what you
> are doing isn't really supported and will cause a substantial
> performance hit (as all IO to the array is completely serialised
> around one of these messages).

As it stands, the syslogging from the printk does more damage to performance
than the underlying problem. Besides, LVM snapshots are slow, but they're
useful for a class of problems anyways.

-ben

2002-10-09 00:54:07

by David Miller

Subject: Re: [patch] silence an unnecessary raid5 debugging message

   From: Benjamin LaHaise <[email protected]>
   Date: Tue, 8 Oct 2002 19:36:12 -0400

   As it stands, the syslogging from the printk does more damage to
   performance than the underlying problem. Besides, LVM snapshots
   are slow, but they're useful for a class of problems anyways.

He's just saying kill the real problem first, that's all.

2002-10-09 00:59:06

by Benjamin LaHaise

Subject: Re: [patch] silence an unnecessary raid5 debugging message

On Tue, Oct 08, 2002 at 05:51:16PM -0700, David S. Miller wrote:
> From: Benjamin LaHaise <[email protected]>
> Date: Tue, 8 Oct 2002 19:36:12 -0400
>
> As it stands, the syslogging from the printk does more damage to
> performance than the underlying problem. Besides, LVM snapshots
> are slow, but they're useful for a class of problems anyways.
>
> He's just saying kill the real problem first, that's all.

I'm just saying that the message is the only real problem I have with
the state of 2.4. Sure, 2.5 deserves it fixed correctly, but I doubt
the correct fix will make it into 2.4 anytime soon (it's far more
dangerous than we should consider shipping in a "stable" series).

-ben
--
"Do you seek knowledge in time travel?"

2002-10-09 11:49:50

by jw schultz

Subject: Re: [patch] silence an unnecessary raid5 debugging message

On Tue, Oct 08, 2002 at 09:02:49PM -0400, Benjamin LaHaise wrote:
> On Tue, Oct 08, 2002 at 05:51:16PM -0700, David S. Miller wrote:
> > From: Benjamin LaHaise <[email protected]>
> > Date: Tue, 8 Oct 2002 19:36:12 -0400
> >
> > As it stands, the syslogging from the printk does more damage to
> > performance than the underlying problem. Besides, LVM snapshots
> > are slow, but they're useful for a class of problems anyways.
> >
> > He's just saying kill the real problem first, that's all.
>
> I'm just saying that the message is the only real problem I have with
> the state of 2.4. Sure, 2.5 deserves it fixed correctly, but I doubt
> the correct fix will make it into 2.4 anytime soon (it's far more
> dangerous than we should consider shipping in a "stable" series).

It sounds to me like a big nasty message should be generated
when the snapshot volume is created to advise that
performance will be horribly impaired. Thrashing the log
file to warn the user is like breaking your arm to remind
yourself to trim your fingernails.
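
One way to keep a warning without thrashing the log is to print it once and only count later occurrences. This is a different technique from warning at snapshot creation time, and nothing in this thread says raid5.c does it; the sketch below is user-space C, and switch_warning and warn_switch_once are hypothetical names chosen for illustration.

```c
#include <stdio.h>

/* Hypothetical per-array state for a warn-once policy. */
struct switch_warning {
    int warned;               /* non-zero once the warning was printed */
    unsigned long suppressed; /* later switches counted, not logged */
};

/*
 * Report a buffer size switch. Prints on the first call only and
 * returns 1 in that case; later calls bump the counter and return 0.
 */
static int warn_switch_once(struct switch_warning *w, int oldsize, int size)
{
    if (w->warned) {
        w->suppressed++;
        return 0;
    }
    w->warned = 1;
    fprintf(stderr,
            "raid5: switching cache buffer size, %d --> %d "
            "(further switches are counted, not logged)\n",
            oldsize, size);
    return 1;
}
```

The operator still learns that the setup is hitting the slow path, and the suppressed count could be exposed elsewhere instead of being written to the log on every switch.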

--
________________________________________________________________
J.W. Schultz Pegasystems Technologies
email address: [email protected]

Remember Cernan and Schmitt

2002-10-09 17:31:25

by David Mansfield

Subject: Re: [patch] silence an unnecessary raid5 debugging message


>LVM manages to trigger the "raid5: switching cache buffer size" printk
>quite voluminously when using a snapshot device. The following patch
>disables it by placing it under the debugging PRINTK macro.

Ben (and Ingo),

I happen to hit this message thousands of times per second sometimes under
normal operation in certain loads (raw devices for oracle and fs on LVM on
raid5). I understand that it's annoying, but I actually think it shouldn't be
removed, because it's telling the operator important information.

As I understand it, the message is indicating a really bad performance
problem (i.e. a complete flush of the stripe cache), and that anyone
encountering it on a very frequent (i.e. annoying) basis should consider
changing their setup.

Encountering this message has forced us to plan to split the single raid5
we have into two, in order to satisfy the different request sizes of the
raw-device vs. the ext3 fs.
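
The effect described above can be sketched with a toy model: a cache that holds buffers of only one size and is flushed whenever a request of another size arrives. This is user-space C, not the raid5 code; stripe_cache_model, model_request and model_run are hypothetical names, and any sizes fed to it are made-up examples.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model: the whole cache holds buffers of exactly one size. */
struct stripe_cache_model {
    int buffer_size;       /* size every cached buffer currently has */
    unsigned long flushes; /* whole-cache flushes seen so far */
};

/* Handle one request; returns 1 if it forced a flush, 0 otherwise. */
static int model_request(struct stripe_cache_model *c, int size)
{
    assert(size != 0); /* mirrors "if (size==0) BUG();" in the quoted diff */
    if (c->buffer_size == size)
        return 0;
    /* Size mismatch: drop everything and switch to the new size. */
    c->buffer_size = size;
    c->flushes++;
    return 1;
}

/* Feed a sequence of request sizes; returns the flushes it caused. */
static unsigned long model_run(struct stripe_cache_model *c,
                               const int *sizes, size_t n)
{
    unsigned long before = c->flushes;
    size_t i;

    for (i = 0; i < n; i++)
        model_request(c, sizes[i]);
    return c->flushes - before;
}
```

Feeding one model an interleaved stream of two sizes makes almost every request flush the cache, while giving each size its own model produces no flushes at all, which is the reasoning behind splitting the array in two.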

David

P.S. Is there any hope of fixing this issue so that the stripe cache can
handle different sized requests? Or is this possibly a bug in LVM?

--
/==============================\
| David Mansfield |
| [email protected] |
\==============================/