2016-03-09 01:58:38

by NeilBrown

Subject: [PATCH] md/raid5: preserve STRIPE_PREREAD_ACTIVE in break_stripe_batch_list


break_stripe_batch_list breaks up a batch and copies some flags from
the batch head to the members, preserving others.

It doesn't preserve or copy STRIPE_PREREAD_ACTIVE. This is not
normally a problem as STRIPE_PREREAD_ACTIVE is cleared when a
stripe_head is added to a batch, and is not set on stripe_heads
already in a batch.

However there is no locking to ensure one thread doesn't set the flag
after it has just been cleared in another. This does occasionally happen.

md/raid5 maintains a count of the number of stripe_heads with
STRIPE_PREREAD_ACTIVE set: conf->preread_active_stripes. When
break_stripe_batch_list clears STRIPE_PREREAD_ACTIVE inadvertently
this count can become incorrect and will never again return to zero.

md/raid5 delays the handling of some stripe_heads until
preread_active_stripes becomes zero. So when the above-mentioned race
happens, those stripe_heads become blocked and never progress,
resulting in writes to the array hanging.

So: change break_stripe_batch_list to preserve STRIPE_PREREAD_ACTIVE
in the members of a batch.
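
To make the accounting problem concrete, here is a minimal user-space sketch, not the kernel code. It assumes set_mask_bits() clears the bits in its mask argument and then ORs in the new bits, and that the counter is only decremented when a later test-and-clear finds the flag still set. The bit numbers and the helpers set_mask_bits_model(), break_member() and counter_after_handling() are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical bit numbers; the real ones live in drivers/md/raid5.h. */
enum {
	STRIPE_INSYNC,
	STRIPE_PREREAD_ACTIVE,
	STRIPE_DEGRADED,
};

/*
 * User-space stand-in for set_mask_bits(): atomically clear every bit
 * in 'mask', then OR in 'bits'. Returns the new value.
 */
static unsigned long set_mask_bits_model(_Atomic unsigned long *ptr,
					 unsigned long mask,
					 unsigned long bits)
{
	unsigned long old = atomic_load(ptr), new;

	do {
		new = (old & ~mask) | bits;
	} while (!atomic_compare_exchange_weak(ptr, &old, new));
	return new;
}

/*
 * Break-up step for one batch member. 'keep' is the set of member bits
 * that survive; everything else is cleared and INSYNC is copied from
 * the head, so INSYNC itself must not be in the keep set.
 */
static unsigned long break_member(_Atomic unsigned long *member_state,
				  unsigned long head_state,
				  unsigned long keep)
{
	assert(!(keep & (1UL << STRIPE_INSYNC)));
	return set_mask_bits_model(member_state, ~keep,
				   head_state & (1UL << STRIPE_INSYNC));
}

/* Returns the value the counter is left at after the stripe is handled. */
static int counter_after_handling(unsigned long keep)
{
	_Atomic unsigned long member = 0;
	int preread_active_stripes = 0;

	/* The racing thread sets the flag and bumps the counter. */
	atomic_fetch_or(&member, 1UL << STRIPE_PREREAD_ACTIVE);
	preread_active_stripes++;

	break_member(&member, 1UL << STRIPE_INSYNC, keep);

	/* Later handling: test-and-clear, decrement only if it was set. */
	if (atomic_fetch_and(&member, ~(1UL << STRIPE_PREREAD_ACTIVE)) &
	    (1UL << STRIPE_PREREAD_ACTIVE))
		preread_active_stripes--;
	return preread_active_stripes;
}
```

With only STRIPE_DEGRADED in the keep set the flag is wiped during the break-up and the model's counter stays at 1; with STRIPE_PREREAD_ACTIVE added to the keep set it returns to 0.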

URL: https://bugzilla.kernel.org/show_bug.cgi?id=108741
URL: https://bugzilla.redhat.com/show_bug.cgi?id=1258153
URL: http://thread.gmane.org/[email protected]
Reported-by: Martin Svec <[email protected]> (and others)
Tested-by: Tom Weber <[email protected]>
Fixes: 1b956f7a8f9a ("md/raid5: be more selective about distributing flags across batch.")
Cc: [email protected] (v4.1 and later)
Signed-off-by: NeilBrown <[email protected]>
---
drivers/md/raid5.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index b4f02c9959f2..2e7d253be6ce 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -4236,7 +4236,6 @@ static void break_stripe_batch_list(struct stripe_head *head_sh,
 		WARN_ON_ONCE(sh->state & ((1 << STRIPE_ACTIVE) |
 					  (1 << STRIPE_SYNCING) |
 					  (1 << STRIPE_REPLACED) |
-					  (1 << STRIPE_PREREAD_ACTIVE) |
 					  (1 << STRIPE_DELAYED) |
 					  (1 << STRIPE_BIT_DELAY) |
 					  (1 << STRIPE_FULL_WRITE) |
@@ -4251,6 +4250,7 @@ static void break_stripe_batch_list(struct stripe_head *head_sh,
 					      (1 << STRIPE_REPLACED)));
 
 		set_mask_bits(&sh->state, ~(STRIPE_EXPAND_SYNC_FLAGS |
+					    (1 << STRIPE_PREREAD_ACTIVE) |
 					    (1 << STRIPE_DEGRADED)),
 			      head_sh->state & (1 << STRIPE_INSYNC));
 

--
2.7.2



2016-03-09 17:27:03

by Shaohua Li

Subject: Re: [PATCH] md/raid5: preserve STRIPE_PREREAD_ACTIVE in break_stripe_batch_list

On Wed, Mar 09, 2016 at 12:58:25PM +1100, Neil Brown wrote:
>
> break_stripe_batch_list breaks up a batch and copies some flags from
> the batch head to the members, preserving others.
>
> It doesn't preserve or copy STRIPE_PREREAD_ACTIVE. This is not
> normally a problem as STRIPE_PREREAD_ACTIVE is cleared when a
> stripe_head is added to a batch, and is not set on stripe_heads
> already in a batch.
>
> However there is no locking to ensure one thread doesn't set the flag
> after it has just been cleared in another. This does occasionally happen.
>
> md/raid5 maintains a count of the number of stripe_heads with
> STRIPE_PREREAD_ACTIVE set: conf->preread_active_stripes. When
> break_stripe_batch_list clears STRIPE_PREREAD_ACTIVE inadvertently
> this count can become incorrect and will never again return to zero.
>
> md/raid5 delays the handling of some stripe_heads until
> preread_active_stripes becomes zero. So when the above-mentioned race
> happens, those stripe_heads become blocked and never progress,
> resulting in writes to the array hanging.
>
> So: change break_stripe_batch_list to preserve STRIPE_PREREAD_ACTIVE
> in the members of a batch.
>
> URL: https://bugzilla.kernel.org/show_bug.cgi?id=108741
> URL: https://bugzilla.redhat.com/show_bug.cgi?id=1258153
> URL: http://thread.gmane.org/[email protected]
> Reported-by: Martin Svec <[email protected]> (and others)
> Tested-by: Tom Weber <[email protected]>
> Fixes: 1b956f7a8f9a ("md/raid5: be more selective about distributing flags across batch.")
> Cc: [email protected] (v4.1 and later)
> Signed-off-by: NeilBrown <[email protected]>

Applied, thanks Neil! I'll split the WARN_ON_ONCE and do it for each bit, so
that next time we have a clear clue which bit triggered it.

Thanks,
Shaohua

> ---
> drivers/md/raid5.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index b4f02c9959f2..2e7d253be6ce 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -4236,7 +4236,6 @@ static void break_stripe_batch_list(struct stripe_head *head_sh,
> WARN_ON_ONCE(sh->state & ((1 << STRIPE_ACTIVE) |
> (1 << STRIPE_SYNCING) |
> (1 << STRIPE_REPLACED) |
> - (1 << STRIPE_PREREAD_ACTIVE) |
> (1 << STRIPE_DELAYED) |
> (1 << STRIPE_BIT_DELAY) |
> (1 << STRIPE_FULL_WRITE) |
> @@ -4251,6 +4250,7 @@ static void break_stripe_batch_list(struct stripe_head *head_sh,
> (1 << STRIPE_REPLACED)));
>
> set_mask_bits(&sh->state, ~(STRIPE_EXPAND_SYNC_FLAGS |
> + (1 << STRIPE_PREREAD_ACTIVE) |
> (1 << STRIPE_DEGRADED)),
> head_sh->state & (1 << STRIPE_INSYNC));
>
> --
> 2.7.2
>


2016-03-09 19:20:05

by NeilBrown

Subject: Re: [PATCH] md/raid5: preserve STRIPE_PREREAD_ACTIVE in break_stripe_batch_list

On Thu, Mar 10 2016, Shaohua Li wrote:

> On Wed, Mar 09, 2016 at 12:58:25PM +1100, Neil Brown wrote:
>>
>> break_stripe_batch_list breaks up a batch and copies some flags from
>> the batch head to the members, preserving others.
>>
>> It doesn't preserve or copy STRIPE_PREREAD_ACTIVE. This is not
>> normally a problem as STRIPE_PREREAD_ACTIVE is cleared when a
>> stripe_head is added to a batch, and is not set on stripe_heads
>> already in a batch.
>>
>> However there is no locking to ensure one thread doesn't set the flag
>> after it has just been cleared in another. This does occasionally happen.
>>
>> md/raid5 maintains a count of the number of stripe_heads with
>> STRIPE_PREREAD_ACTIVE set: conf->preread_active_stripes. When
>> break_stripe_batch_list clears STRIPE_PREREAD_ACTIVE inadvertently
>> this count can become incorrect and will never again return to zero.
>>
>> md/raid5 delays the handling of some stripe_heads until
>> preread_active_stripes becomes zero. So when the above-mentioned race
>> happens, those stripe_heads become blocked and never progress,
>> resulting in writes to the array hanging.
>>
>> So: change break_stripe_batch_list to preserve STRIPE_PREREAD_ACTIVE
>> in the members of a batch.
>>
>> URL: https://bugzilla.kernel.org/show_bug.cgi?id=108741
>> URL: https://bugzilla.redhat.com/show_bug.cgi?id=1258153
>> URL: http://thread.gmane.org/[email protected]
>> Reported-by: Martin Svec <[email protected]> (and others)
>> Tested-by: Tom Weber <[email protected]>
>> Fixes: 1b956f7a8f9a ("md/raid5: be more selective about distributing flags across batch.")
>> Cc: [email protected] (v4.1 and later)
>> Signed-off-by: NeilBrown <[email protected]>
>
> Applied, thanks Neil! I'll split the WARN_ON_ONCE and do it for each bit, so
> that next time we have a clear clue which bit triggered it.

I personally think that would look ugly and increase the in-line code
size for minimal gain.
If you want to make a change (which I'm in two minds about) I think it
would be much cleaner to do
    if (WARN_ON_ONCE(...)) printk(....);

Then at least the extra code will be out of line - not even loaded into
the instruction cache until needed.
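
As a rough user-space illustration of that shape (a sketch only, not kernel code): the hot path does a single mask test, and the per-bit decoding lives in a separate function that only runs once the test has failed. The bit numbering, the UNEXPECTED mask, and the names report_bits() and check_member_state() are made up for this example, and fprintf() stands in for printk().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical subset of stripe state bits, numbered for this sketch only. */
enum { STRIPE_ACTIVE, STRIPE_SYNCING, STRIPE_DELAYED, STRIPE_BIT_DELAY, NR_BITS };

static const char *const bit_name[] = {
	"STRIPE_ACTIVE", "STRIPE_SYNCING", "STRIPE_DELAYED", "STRIPE_BIT_DELAY",
};
static_assert(sizeof(bit_name) / sizeof(bit_name[0]) == NR_BITS,
	      "one name per bit");

#define UNEXPECTED ((1UL << NR_BITS) - 1)

/* Cold path: name each offending bit. Returns how many were reported. */
static int report_bits(unsigned long bad)
{
	int n = 0;

	for (int i = 0; i < NR_BITS; i++) {
		if (bad & (1UL << i)) {
			fprintf(stderr, "unexpected state bit: %s\n", bit_name[i]);
			n++;
		}
	}
	return n;
}

/*
 * Stand-in for "if (WARN_ON_ONCE(...)) printk(...)": the banner is
 * printed once, the detail every time the condition is true.
 * Returns the number of offending bits reported.
 */
static int check_member_state(unsigned long state)
{
	static bool warned;	/* models the "once" part */
	unsigned long bad = state & UNEXPECTED;

	if (!bad)		/* hot path: one mask test, nothing else */
		return 0;
	if (!warned) {
		warned = true;
		fprintf(stderr, "WARNING: unexpected stripe state %#lx\n", state);
	}
	return report_bits(bad);
}
```

Calling check_member_state() with a clean state costs one comparison; the loop over bit names is only reached when something is already wrong.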

Thanks,
NeilBrown

>
> Thanks,
> Shaohua
>
>> ---
>> drivers/md/raid5.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
>> index b4f02c9959f2..2e7d253be6ce 100644
>> --- a/drivers/md/raid5.c
>> +++ b/drivers/md/raid5.c
>> @@ -4236,7 +4236,6 @@ static void break_stripe_batch_list(struct stripe_head *head_sh,
>> WARN_ON_ONCE(sh->state & ((1 << STRIPE_ACTIVE) |
>> (1 << STRIPE_SYNCING) |
>> (1 << STRIPE_REPLACED) |
>> - (1 << STRIPE_PREREAD_ACTIVE) |
>> (1 << STRIPE_DELAYED) |
>> (1 << STRIPE_BIT_DELAY) |
>> (1 << STRIPE_FULL_WRITE) |
>> @@ -4251,6 +4250,7 @@ static void break_stripe_batch_list(struct stripe_head *head_sh,
>> (1 << STRIPE_REPLACED)));
>>
>> set_mask_bits(&sh->state, ~(STRIPE_EXPAND_SYNC_FLAGS |
>> + (1 << STRIPE_PREREAD_ACTIVE) |
>> (1 << STRIPE_DEGRADED)),
>> head_sh->state & (1 << STRIPE_INSYNC));
>>
>> --
>> 2.7.2
>>



2016-03-09 19:23:42

by Shaohua Li

Subject: Re: [PATCH] md/raid5: preserve STRIPE_PREREAD_ACTIVE in break_stripe_batch_list

On Thu, Mar 10, 2016 at 06:19:42AM +1100, Neil Brown wrote:
> On Thu, Mar 10 2016, Shaohua Li wrote:
>
> > On Wed, Mar 09, 2016 at 12:58:25PM +1100, Neil Brown wrote:
> >>
> >> break_stripe_batch_list breaks up a batch and copies some flags from
> >> the batch head to the members, preserving others.
> >>
> >> It doesn't preserve or copy STRIPE_PREREAD_ACTIVE. This is not
> >> normally a problem as STRIPE_PREREAD_ACTIVE is cleared when a
> >> stripe_head is added to a batch, and is not set on stripe_heads
> >> already in a batch.
> >>
> >> However there is no locking to ensure one thread doesn't set the flag
> >> after it has just been cleared in another. This does occasionally happen.
> >>
> >> md/raid5 maintains a count of the number of stripe_heads with
> >> STRIPE_PREREAD_ACTIVE set: conf->preread_active_stripes. When
> >> break_stripe_batch_list clears STRIPE_PREREAD_ACTIVE inadvertently
> >> this count can become incorrect and will never again return to zero.
> >>
> >> md/raid5 delays the handling of some stripe_heads until
> >> preread_active_stripes becomes zero. So when the above-mentioned race
> >> happens, those stripe_heads become blocked and never progress,
> >> resulting in writes to the array hanging.
> >>
> >> So: change break_stripe_batch_list to preserve STRIPE_PREREAD_ACTIVE
> >> in the members of a batch.
> >>
> >> URL: https://bugzilla.kernel.org/show_bug.cgi?id=108741
> >> URL: https://bugzilla.redhat.com/show_bug.cgi?id=1258153
> >> URL: http://thread.gmane.org/[email protected]
> >> Reported-by: Martin Svec <[email protected]> (and others)
> >> Tested-by: Tom Weber <[email protected]>
> >> Fixes: 1b956f7a8f9a ("md/raid5: be more selective about distributing flags across batch.")
> >> Cc: [email protected] (v4.1 and later)
> >> Signed-off-by: NeilBrown <[email protected]>
> >
> > Applied, thanks Neil! I'll split the WARN_ON_ONCE and do it for each bit, so
> > that next time we have a clear clue which bit triggered it.
>
> I personally think that would look ugly and increase the in-line code
> size for minimal gain.
> If you want to make a change (which I'm in two minds about) I think it
> would be much cleaner to do
> if (WARN_ON_ONCE(...)) printk(....);
>
> Then at least the extra code will be out of line - not even loaded into
> the instruction cache until needed.

There is a handy WARN_ONCE(). It's like WARN_ON_ONCE() but allows printing extra info.
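
Roughly, and only as a simplified user-space model: the macro below warns at most once per call site and takes a printf-style message. WARN_ONCE_MODEL, warnings_emitted and check_state() are names invented for this sketch; unlike the kernel macro it does not yield the condition's value or dump a backtrace, and the message format is just an example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

static int warnings_emitted;	/* counter for this sketch only */

/* Warn at most once per call site, with a printf-style message. */
#define WARN_ONCE_MODEL(cond, ...)				\
	do {							\
		static bool warned_;				\
		if ((cond) && !warned_) {			\
			warned_ = true;				\
			warnings_emitted++;			\
			fprintf(stderr, __VA_ARGS__);		\
		}						\
	} while (0)

/* One combined check that still reports the full state word. */
static void check_state(unsigned long state, unsigned long unexpected)
{
	assert(unexpected != 0);	/* an empty mask would check nothing */
	WARN_ONCE_MODEL(state & unexpected,
			"stripe state: %#lx, unexpected bits: %#lx\n",
			state, state & unexpected);
}
```

Printing the whole state word in the message gives the same clue as a per-bit split, without adding a separate check for each bit.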

Thanks,
Shaohua