LinuxLists.cc - Re: [PATCH v5 03/15] scsi: ufs: implement scsi host timeout handler

2016-03-01 07:30:06

Subject: Re: [PATCH v5 03/15] scsi: ufs: implement scsi host timeout handler

On 02/28/2016 09:32 PM, Yaniv Gardi wrote:
> A race condition exists between request requeueing and scsi layer
> error handling:
> When UFS driver queuecommand returns a busy status for a request,
> it will be requeued and its tag will be freed and set to -1.
> At the same time it is possible that the request will timeout and
> scsi layer will start error handling for it. The scsi layer reuses
> the request and its tag to send error related commands to the device,
> however its tag is no longer valid.
Hmm. How can the host return a 'busy' status for a request?
From my understanding we have three possibilities:

1) queuecommand returns busy; however, that means that the command has
never been sent and this issue shouldn't occur
2) The command returns with BUSY status. But in this case it has already
been returned, so there cannot be any timeout coming in.
3) The host receives a command with a tag which is already in-use.
However, that should have been prevented by the block-layer, which
really should ensure that this situation never happens.

So either way I look at it, it really looks like a bug and adding a
timeout handler will just paper over it.
(Not that a timeout handler is a bad idea, in fact I'm convinced that
you need one. Just not for this purpose.)

So can you elaborate how this 'busy' status comes about?
Is the command sent to the device?

Cheers,

Hannes
--
Dr. Hannes Reinecke zSeries & Storage
[email protected] +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

2016-03-01 13:25:53

by Yaniv Gardi

Hi Hannes,

This is going to be a bit long :)
I think you are missing the point.
I will describe a race condition that we hit a while ago and that was
quite difficult to understand and fix.
This patch is not about "busy" being returned to the SCSI dispatch
routine; it is about the abort that is triggered after 30 seconds.

Imagine a request being queued and sent to the SCSI layer, and then to the
UFS driver. A timer, initialized to 30 seconds, starts ticking.
But the request is never sent to the UFS device, because queuecommand()
returns SCSI_MLQUEUE_HOST_BUSY.
Looking at the code, this can happen here, for example:
	err = ufshcd_hold(hba, true);
	if (err) {
		err = SCSI_MLQUEUE_HOST_BUSY;
		goto out;
	}

So now the request should be requeued and its timer should be reset.
(REMEMBER THIS POINT, let's call it "POINT A".)
BUT a context switch happens before it is actually requeued, and the CPU
moves on to other tasks and does other things for 30 seconds. Yes, it
sounds crazy, but it did happen.

NOW the timeout handler is invoked and the scsi_abort() routine starts
executing (since 30 seconds passed with no completion).
So far, so good.
But hey, another context switch happens right at the beginning of the
scsi_abort() routine, before anything useful happens (this is "POINT B").
So now the context goes back to "POINT A", to the blk_requeue_request()
routine, which calls:
	blk_delete_timer(rq); (which does nothing because the timer already expired)
and then calls:
	blk_queue_end_tag()
which places "-1" in the tag field of the request, marking the request as
"not tagged yet".

However, a context switch happens again and we are back in the scsi_abort()
routine ("POINT B"), which now needs to abort this very request. But hey,
what it sees in the "tag" field is "-1", which is obviously wrong.
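The interleaving described above can be replayed deterministically in a small
user-space model. This is only a sketch: struct model_request and the model_*
functions are hypothetical stand-ins invented for illustration, not kernel
code, and the three steps are simply run in the order the mail describes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a block-layer request; not the kernel's struct. */
struct model_request {
	int tag;            /* -1 means "not tagged" */
	bool timer_armed;   /* true while the 30 s timer is pending */
	bool abort_pending; /* true once the timeout path has started */
};

/* The timer expires: the timeout path starts but has not read the tag yet. */
static void model_timer_fires(struct model_request *rq)
{
	rq->timer_armed = false;
	rq->abort_pending = true;
}

/* POINT A: requeue. Deleting the expired timer changes nothing; the tag is released. */
static void model_requeue(struct model_request *rq)
{
	rq->timer_armed = false;
	rq->tag = -1;
}

/* POINT B: the abort path reads whatever tag the request holds right now. */
static int model_abort(const struct model_request *rq)
{
	assert(rq->abort_pending);
	return rq->tag;
}

/* Replays the order from the mail and returns the tag the abort path sees. */
int model_replay_race(int initial_tag)
{
	struct model_request rq = { initial_tag, true, false };

	model_timer_fires(&rq);  /* timeout starts, then gets preempted (POINT B) */
	model_requeue(&rq);      /* POINT A resumes and releases the tag */
	return model_abort(&rq); /* POINT B resumes with a stale request */
}
```

Whatever tag the request started with, model_replay_race() returns -1, which
is the invalid tag the abort path ends up with in the scenario above.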

This patch fixes this very rare race condition:
1. Upon timeout, blk_rq_timed_out() is called.
2. It then calls rq_timed_out_fn(), which eventually calls the new
   callback introduced in this patch: ufshcd_eh_timed_out().
3. This routine returns the right flag:
   BLK_EH_NOT_HANDLED or BLK_EH_RESET_TIMER.
4. blk_rq_timed_out() checks the returned value:
   in case of BLK_EH_NOT_HANDLED, it is handled normally, meaning
   scsi_abort() is called;
   in case of BLK_EH_RESET_TIMER, a new timer is started and scsi_abort()
   is never called.
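The decision in step 3 can be sketched as a small user-space model. The mail
does not say how the callback chooses its return value, so the criterion used
here (was the command actually handed to the hardware?) is an assumption, and
struct model_hba, MODEL_NUM_SLOTS and the MODEL_EH_* names are hypothetical
stand-ins for the kernel's types and BLK_EH_* values.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mirror of the two return values named in the mail. */
enum model_eh_return { MODEL_EH_NOT_HANDLED, MODEL_EH_RESET_TIMER };

#define MODEL_NUM_SLOTS 32

/* Hypothetical host state: which slots hold commands sent to the device. */
struct model_hba {
	unsigned long outstanding;             /* bit i set: slot i is in flight */
	const void *slot_cmd[MODEL_NUM_SLOTS]; /* command occupying each slot */
};

/*
 * Assumed policy: if the timed-out command is in flight, let normal error
 * handling (the abort) run; if it never reached the device, it is presumably
 * being requeued, so only restart the timer.
 */
enum model_eh_return model_eh_timed_out(const struct model_hba *hba,
					const void *cmd)
{
	assert(hba != NULL);

	for (size_t i = 0; i < MODEL_NUM_SLOTS; i++) {
		if ((hba->outstanding & (1UL << i)) && hba->slot_cmd[i] == cmd)
			return MODEL_EH_NOT_HANDLED;
	}
	return MODEL_EH_RESET_TIMER;
}
```

With this policy a command that was bounced with "busy" never reaches the
abort path, so the stale "-1" tag is never consulted.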

Hope that helps.
Regards,
Yaniv


2016-03-03 07:23:16

by Hannes Reinecke

On 03/01/2016 09:25 PM, [email protected] wrote:
> Hi Hannes,
>
> it's going to be a bit long :)
> I think you are missing the point.
> I will describe a race condition happened to us a while ago, that was
> quite difficult to understand and fix.
> So, this patch is not about the "busy" returning to the scsi dispatch
> routine. it's about the abort triggered after 30 seconds.
>
> imagine a request being queued and sent to the scsi, and then to the ufs.
> a timer, initialized to 30 seconds start ticking.
> but the request is never sent to the ufs device, as queuecommand() returns
> with "SCSI_MLQUEUE_HOST_BUSY"
> by looking at the code, this could happen, for example:
> err = ufshcd_hold(hba, true);
> if (err) {
> err = SCSI_MLQUEUE_HOST_BUSY;
> goto out;
> }
>
Uuhhh.
You probably should not have pointed me to that piece of code ...
open-coding loops in ufshcd_hold() ... shudder.
(Did I ever review that one? Must've ...)
_Anyway_: sleeping in queuecommand is always a bad idea, as then
precisely those issues you've just described will happen.

Couldn't you just call
ufshcd_hold(hba, false)
instead of
ufshcd_hold(hba, true)
?
The request will be requeued more-or-less immediately, avoiding the
issue with timeout handler kicking in.
And the queue will remain blocked until the ungate work item returns, at
which point I/O submission will continue.
As the request will be requeued to the head of the queue there won't be
other I/O competing with tags, so it shouldn't have any adverse effects.

Wouldn't that work?

Cheers,

Hannes
--
Dr. Hannes Reinecke zSeries & Storage
[email protected] +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

2016-03-03 09:10:34

by Yaniv Gardi

Hi Hannes,

This is a bug, and it should be fixed.
If you choose to bypass it by calling ufshcd_hold(hba, false), not only
is the race condition still there, ready to pop up at some other point in
the future, but I am also not sure what the consequences are of calling
ufshcd_hold(hba, false) instead of "true".
So changing the already tested and working code (so that it does not
return BUSY from queuecommand) is not a fix.
I strongly recommend we upstream this race-condition fix.

thanks,
Yaniv


2016-03-03 12:53:24

by Hannes Reinecke

On 03/03/2016 05:10 PM, [email protected] wrote:
> Hi Hannes
>
> This is a bug, and it should be fixed.
Oh, definitely agreed. The question is _where_.

> if you choose to bypass it, by calling ufshcd_hold(hba, false), not only
> the race condition is still there, and can pop-out at any other point in
> the future, but also, not sure what are the consequences of
> ufshcd_hold(hba, false) instead of "true".
Well ... seeing it's your driver, I would've thought _you_ should know ...

> so, changing the already tested and working code, (not to return BUSY from
> queuecommand) is not a fix.
Hey, I did _not_ suggest not to return BUSY from queuecommand.

I was suggesting this patch:

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index 9c1b94b..b9295ad 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -1388,7 +1388,7 @@ static int ufshcd_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *cmd)
 		goto out;
 	}

-	err = ufshcd_hold(hba, true);
+	err = ufshcd_hold(hba, false);
 	if (err) {
 		err = SCSI_MLQUEUE_HOST_BUSY;
 		clear_bit_unlock(tag, &hba->lrb_in_use);

which, by reading the code, should be avoiding this issue.
I was just asking you if you could give this patch a spin and see if it
works. If not (for whatever reason) I'm happy to accept your patch.
But first I would like to have an explanation why the above would _not_
work.

Unfortunately I don't have the hardware otherwise I'd be running the
tests myself.

Cheers,

Hannes
--
Dr. Hannes Reinecke zSeries & Storage
[email protected] +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

2016-03-06 10:33:20

by Yaniv Gardi

> I was suggesting this patch:
>
> diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
> index 9c1b94b..b9295ad 100644
> --- a/drivers/scsi/ufs/ufshcd.c
> +++ b/drivers/scsi/ufs/ufshcd.c
> @@ -1388,7 +1388,7 @@ static int ufshcd_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *cmd)
>  		goto out;
>  	}
>
> -	err = ufshcd_hold(hba, true);
> +	err = ufshcd_hold(hba, false);
>  	if (err) {
>  		err = SCSI_MLQUEUE_HOST_BUSY;
>  		clear_bit_unlock(tag, &hba->lrb_in_use);
>
> which, by reading the code, should be avoiding this issue.

Hannes,
We are not trying to avoid returning BUSY from queuecommand().
On the contrary: by returning BUSY we are actually requeuing the request,
which is exactly what we need to do.
Your patch doesn't fix the race condition.

thanks,
Yaniv


2016-03-08 11:48:33

by Yaniv Gardi

Hello, Hannes,

Re-sending

thanks,
Yaniv


2016-03-08 11:48:37

by Yaniv Gardi

2016-03-08 12:26:36

by Dolev Raviv

Subject: RE: [PATCH v5 03/15] scsi: ufs: implement scsi host timeout handler

>> On 03/03/2016 05:10 PM, [email protected] wrote:
>>>> On 03/01/2016 09:25 PM, [email protected] wrote:
>>>>>> On 02/28/2016 09:32 PM, Yaniv Gardi wrote:
>>>>>>> A race condition exists between request requeueing and scsi
>>>>>>> layer error handling:
>>>>>>> When UFS driver queuecommand returns a busy status for a
>>>>>>> request, it will be requeued and its tag will be freed and set to -1.
>>>>>>> At the same time it is possible that the request will timeout
>>>>>>> and scsi layer will start error handling for it. The scsi layer
>>>>>>> reuses the request and its tag to send error related commands to
>>>>>>> the device, however its tag is no longer valid.
>>>>>> Hmm. How can the host return a 'busy' status for a request?
>>>>>> From my understanding we have three possibilities:
>>>>>>
>>>>>> 1) queuecommand returns busy; however, that means that the
>>>>>> command has never been send and this issue shouldn't occur
>>>>>> 2) The command returns with BUSY status. But in this case it has
>>>>>> already been returned, so there cannot be any timeout coming in.
>>>>>> 3) The host receives a command with a tag which is already in-use.
>>>>>> However, that should have been prevented by the block-layer,
>>>>>> which really should ensure that this situation never happens.
>>>>>>
>>>>>> So either way I look at it, it really looks like a bug and adding
>>>>>> a timeout handler will just paper over it.
>>>>>> (Not that a timeout handler is a bad idea, in fact I'm convinced
>>>>>> that you need one. Just not for this purpose.)
>>>>>>
>>>>>> So can you elaborate how this 'busy' status comes about?
>>>>>> Is the command sent to the device?
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Hannes
>>>>>
>>>>>
>>>>> Hi Hannes,
>>>>>
>>>>> it's going to be a bit long :)
>>>>> I think you are missing the point.
>>>>> I will describe a race condition happened to us a while ago, that
>>>>> was quite difficult to understand and fix.
>>>>> So, this patch is not about the "busy" returning to the scsi
>>>>> dispatch routine. it's about the abort triggered after 30 seconds.
>>>>>
>>>>> imagine a request being queued and sent to the scsi, and then to
>>>>> the ufs.
>>>>> a timer, initialized to 30 seconds start ticking.
>>>>> but the request is never sent to the ufs device, as queuecommand()
>>>>> returns with "SCSI_MLQUEUE_HOST_BUSY"
>>>>> by looking at the code, this could happen, for example:
>>>>>     err = ufshcd_hold(hba, true);
>>>>>     if (err) {
>>>>>         err = SCSI_MLQUEUE_HOST_BUSY;
>>>>>         goto out;
>>>>>     }
>>>>>
>>>> Uuhhh.
>>>> You probably should not have pointed me to that piece of code ...
>>>> open-coding loops in ufshcd_hold() ... shudder.
>>>> (Did I ever review that one? Must've ...)
>>>> _Anyway_: sleeping in queuecommand is always a bad idea, as then
>>>> precisely those issues you've just described will happen.
>>>>
>>>> Couldn't you just call
>>>> ufshcd_hold(hba, false)
>>>> instead of
>>>> ufshcd_hold(hba, true)
>>>> ?
>>>> The request will be requeued more-or-less immediately, avoiding the
>>>> issue with timeout handler kicking in.
>>>> And the queue will remain blocked until the ungate work item
>>>> returns, at which point I/O submission will continue.
>>>> As the request will be requeued to the head of the queue there
>>>> won't be other I/O competing with tags, so it shouldn't have any
>>>> adverse effects.
>>>>
>>>> Wouldn't that work?
>>>>
>>>> Cheers,
>>>>
>>>> Hannes
>>>
>>> Hi Hannes
>>>
>>> This is a bug, and it should be fixed.
>> Oh, definitely agreed. The question is _where_.
>>
>>
>>> if you choose to bypass it, by calling ufshcd_hold(hba, false), not
>>> only the race condition is still there, and can pop-out at any other
>>> point in the future, but also, not sure what are the consequences of
>>> ufshcd_hold(hba, false) instead of "true".
>> Well ... seeing it's your driver, I would've thought _you_ should
>> know ...
>>
>>> so, changing the already tested and working code, (not to return
>>> BUSY from queuecommand) is not a fix.
>> Hey, I did _not_ suggest not to return BUSY from queuecommand.
>>
>> I was suggesting this patch:
>>
>> diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
>> index 9c1b94b..b9295ad 100644
>> --- a/drivers/scsi/ufs/ufshcd.c
>> +++ b/drivers/scsi/ufs/ufshcd.c
>> @@ -1388,7 +1388,7 @@ static int ufshcd_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *cmd)
>>          goto out;
>>      }
>>
>> -    err = ufshcd_hold(hba, true);
>> +    err = ufshcd_hold(hba, false);
>>      if (err) {
>>          err = SCSI_MLQUEUE_HOST_BUSY;
>>          clear_bit_unlock(tag, &hba->lrb_in_use);
>>
>> which, by reading the code, should be avoiding this issue.
>
>
> Hannes,
> we are not trying to avoid returning BUSY from queuecommand().
> On the contrary. By returning BUSY we are actually re-queuing the request
> which is exactly what we need to do.
> your patch doesn't fix the race condition.
>
> thanks,
> Yaniv
>
>> I was just asking you if you could give this patch a spin and see if
>> it works. If not (for whatever reason) I'm happy to accept your patch.
>> But first I would like to have an explanation why the above would
>> _not_ work.
>>
>> Unfortunately I don't have the hardware otherwise I'd be running the
>> tests myself.
>>
>> Cheers,
>>
>> Hannes
>> --
>> Dr. Hannes Reinecke zSeries & Storage
>> [email protected] +49 911 74053 688
>> SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
>> GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
>
>

I reviewed the patch, you can add

Reviewed-by: Dolev Raviv <[email protected]>

Thanks,
Dolev
--
Qualcomm Israel, on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux
Foundation Collaborative Project