2008-12-05 14:50:21

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH] Correctly release and allocate a new request on TUR retries

On Fri, Dec 05 2008, Alan D. Brunelle wrote:
>

>
> Commands needing to be retried (TUR in this case) would result in a block
> I/O request being re-used, without being re-initialized properly. This
> patch ensures that the requests are correctly re-initialized via
> standard allocation means.
>
> Prior to this patch, boots were failing consistently as in:
> http://lkml.org/lkml/2008/12/5/161
>
> With this patch in place, the system is booting reliably.
>
> Signed-off-by: Alan D. Brunelle <[email protected]>
> Cc: Jens Axboe <[email protected]>

Looks good.

Acked-by: Jens Axboe <[email protected]>

Perhaps James can push it in, I'm about to shutdown for the day...


> ---
> drivers/scsi/device_handler/scsi_dh_hp_sw.c | 6 ++++--
> 1 files changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/scsi/device_handler/scsi_dh_hp_sw.c b/drivers/scsi/device_handler/scsi_dh_hp_sw.c
> index 9aec4ca..1f6b6a8 100644
> --- a/drivers/scsi/device_handler/scsi_dh_hp_sw.c
> +++ b/drivers/scsi/device_handler/scsi_dh_hp_sw.c
> @@ -107,6 +107,7 @@ static int hp_sw_tur(struct scsi_device *sdev, struct hp_sw_dh_data *h)
> struct request *req;
> int ret;
>
> +retry:
> req = blk_get_request(sdev->request_queue, WRITE, GFP_NOIO);
> if (!req)
> return SCSI_DH_RES_TEMP_UNAVAIL;
> @@ -121,7 +122,6 @@ static int hp_sw_tur(struct scsi_device *sdev, struct hp_sw_dh_data *h)
> memset(req->sense, 0, SCSI_SENSE_BUFFERSIZE);
> req->sense_len = 0;
>
> -retry:
> ret = blk_execute_rq(req->q, NULL, req, 1);
> if (ret == -EIO) {
> if (req->sense_len > 0) {
> @@ -136,8 +136,10 @@ retry:
> h->path_state = HP_SW_PATH_ACTIVE;
> ret = SCSI_DH_OK;
> }
> - if (ret == SCSI_DH_IMM_RETRY)
> + if (ret == SCSI_DH_IMM_RETRY) {
> + blk_put_request(req);
> goto retry;
> + }
> if (ret == SCSI_DH_DEV_OFFLINED) {
> h->path_state = HP_SW_PATH_PASSIVE;
> ret = SCSI_DH_OK;
> --
> 1.5.6.3
>


--
Jens Axboe

2008-12-05 18:09:11

by Mike Anderson

[permalink] [raw]
Subject: Re: [PATCH] Correctly release and allocate a new request on TUR retries

Jens Axboe <[email protected]> wrote:
> On Fri, Dec 05 2008, Alan D. Brunelle wrote:
> >
>
> >
> > Commands needing to be retried (TUR in this case) would result in a block
> > I/O request being re-used, without being re-initialized properly. This
> > patch ensures that the requests are correctly re-initialized via
> > standard allocation means.
> >
> > Prior to this patch, boots were failing consistently as in:
> > http://lkml.org/lkml/2008/12/5/161
> >
> > With this patch in place, the system is booting reliably.
> >
> > Signed-off-by: Alan D. Brunelle <[email protected]>
> > Cc: Jens Axboe <[email protected]>
>
> Looks good.
>
> Acked-by: Jens Axboe <[email protected]>
>
> Perhaps James can push it in, I'm about to shutdown for the day...
>

I know a failure was not detected in the hp_sw_start_stop function, but it
uses the same retry method as hp_sw_tur we should update this function
also.

I made a quick scope of callers of blk_get_request and I did not see a
repeated of this retry usage model. I will make another pass to see if I
missed something.

>
> > ---
> > drivers/scsi/device_handler/scsi_dh_hp_sw.c | 6 ++++--
> > 1 files changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/scsi/device_handler/scsi_dh_hp_sw.c b/drivers/scsi/device_handler/scsi_dh_hp_sw.c
> > index 9aec4ca..1f6b6a8 100644
> > --- a/drivers/scsi/device_handler/scsi_dh_hp_sw.c
> > +++ b/drivers/scsi/device_handler/scsi_dh_hp_sw.c
> > @@ -107,6 +107,7 @@ static int hp_sw_tur(struct scsi_device *sdev, struct hp_sw_dh_data *h)
> > struct request *req;
> > int ret;
> >
> > +retry:
> > req = blk_get_request(sdev->request_queue, WRITE, GFP_NOIO);
> > if (!req)
> > return SCSI_DH_RES_TEMP_UNAVAIL;
> > @@ -121,7 +122,6 @@ static int hp_sw_tur(struct scsi_device *sdev, struct hp_sw_dh_data *h)
> > memset(req->sense, 0, SCSI_SENSE_BUFFERSIZE);
> > req->sense_len = 0;
> >
> > -retry:
> > ret = blk_execute_rq(req->q, NULL, req, 1);
> > if (ret == -EIO) {
> > if (req->sense_len > 0) {
> > @@ -136,8 +136,10 @@ retry:
> > h->path_state = HP_SW_PATH_ACTIVE;
> > ret = SCSI_DH_OK;
> > }
> > - if (ret == SCSI_DH_IMM_RETRY)
> > + if (ret == SCSI_DH_IMM_RETRY) {
> > + blk_put_request(req);
> > goto retry;
> > + }
> > if (ret == SCSI_DH_DEV_OFFLINED) {
> > h->path_state = HP_SW_PATH_PASSIVE;
> > ret = SCSI_DH_OK;
> > --
> > 1.5.6.3
> >
>
>
> --
> Jens Axboe
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

-andmike
--
Michael Anderson
[email protected]

2008-12-08 13:16:07

by Alan D. Brunelle

[permalink] [raw]
Subject: Re: [PATCH] Correctly release and allocate a new request on TUR retries

Mike Anderson wrote:
> Jens Axboe <[email protected]> wrote:
>> On Fri, Dec 05 2008, Alan D. Brunelle wrote:
>>> Commands needing to be retried (TUR in this case) would result in a block
>>> I/O request being re-used, without being re-initialized properly. This
>>> patch ensures that the requests are correctly re-initialized via
>>> standard allocation means.
>>>
>>> Prior to this patch, boots were failing consistently as in:
>>> http://lkml.org/lkml/2008/12/5/161
>>>
>>> With this patch in place, the system is booting reliably.
>>>
>>> Signed-off-by: Alan D. Brunelle <[email protected]>
>>> Cc: Jens Axboe <[email protected]>
>> Looks good.
>>
>> Acked-by: Jens Axboe <[email protected]>
>>
>> Perhaps James can push it in, I'm about to shutdown for the day...
>>
>
> I know a failure was not detected in the hp_sw_start_stop function, but it
> uses the same retry method as hp_sw_tur we should update this function
> also.
>
> I made a quick scope of callers of blk_get_request and I did not see a
> repeated of this retry usage model. I will make another pass to see if I
> missed something.

drivers/cdrom/cdrom.c:cdrom_read_cdda_bpc() is even worse: it gets one
request, then sits in a while loop re-using the same request over and
over again.

Since blk_rq_init() is an exported symbol, perhaps instead of having the
three callers realloc, it _may_ be sufficient to just have them call
that before re-use? (See attached un-tested patch for an example.)

Regards,
Alan


Attachments:
0001-Correctly-re-initialize-requests-on-retries.patch (1.79 kB)

2008-12-08 13:21:24

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH] Correctly release and allocate a new request on TUR retries

On Mon, Dec 08 2008, Alan D. Brunelle wrote:
> Mike Anderson wrote:
> > Jens Axboe <[email protected]> wrote:
> >> On Fri, Dec 05 2008, Alan D. Brunelle wrote:
> >>> Commands needing to be retried (TUR in this case) would result in a block
> >>> I/O request being re-used, without being re-initialized properly. This
> >>> patch ensures that the requests are correctly re-initialized via
> >>> standard allocation means.
> >>>
> >>> Prior to this patch, boots were failing consistently as in:
> >>> http://lkml.org/lkml/2008/12/5/161
> >>>
> >>> With this patch in place, the system is booting reliably.
> >>>
> >>> Signed-off-by: Alan D. Brunelle <[email protected]>
> >>> Cc: Jens Axboe <[email protected]>
> >> Looks good.
> >>
> >> Acked-by: Jens Axboe <[email protected]>
> >>
> >> Perhaps James can push it in, I'm about to shutdown for the day...
> >>
> >
> > I know a failure was not detected in the hp_sw_start_stop function, but it
> > uses the same retry method as hp_sw_tur we should update this function
> > also.
> >
> > I made a quick scope of callers of blk_get_request and I did not see a
> > repeated of this retry usage model. I will make another pass to see if I
> > missed something.
>
> drivers/cdrom/cdrom.c:cdrom_read_cdda_bpc() is even worse: it gets one
> request, then sits in a while loop re-using the same request over and
> over again.

Sigh, it does indeed look messy...

> Since blk_rq_init() is an exported symbol, perhaps instead of having the
> three callers realloc, it _may_ be sufficient to just have them call
> that before re-use? (See attached un-tested patch for an example.)

I think that's a really bad idea, since it basically just clears the
'rq'. If you have that rq on some list (timeout, for instance), the
kernel will not be happy. I think we have to, for now at least, put and
get a request before looping. Then for 2.6.29 we can hopefully improve
this situation!


>
> Regards,
> Alan

>
> Commands needing to be retried would result in a block I/O request being
> re-used, without being re-initialized properly. This patch ensures that
> the requests are correctly re-initialized via standard allocation means.
>
> Prior to this patch, boots were failing consistently as in:
> http://lkml.org/lkml/2008/12/5/161
>
> With this patch in place, the system is booting reliably.
>
> Signed-off-by: Alan D. Brunelle <[email protected]>
> Cc: Jens Axboe <[email protected]>
> Cc: Mike Anderson <[email protected]>
> Cc: James Bottomley <[email protected]>
> ---
> drivers/cdrom/cdrom.c | 2 ++
> drivers/scsi/device_handler/scsi_dh_hp_sw.c | 8 ++++++--
> 2 files changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/cdrom/cdrom.c b/drivers/cdrom/cdrom.c
> index d16b024..0b86d8a 100644
> --- a/drivers/cdrom/cdrom.c
> +++ b/drivers/cdrom/cdrom.c
> @@ -2131,6 +2131,8 @@ static int cdrom_read_cdda_bpc(struct cdrom_device_info *cdi, __u8 __user *ubuf,
> nframes -= nr;
> lba += nr;
> ubuf += len;
> +
> + blk_rq_init(q, rq);
> }
>
> blk_put_request(rq);
> diff --git a/drivers/scsi/device_handler/scsi_dh_hp_sw.c b/drivers/scsi/device_handler/scsi_dh_hp_sw.c
> index 9aec4ca..075ae35 100644
> --- a/drivers/scsi/device_handler/scsi_dh_hp_sw.c
> +++ b/drivers/scsi/device_handler/scsi_dh_hp_sw.c
> @@ -136,8 +136,10 @@ retry:
> h->path_state = HP_SW_PATH_ACTIVE;
> ret = SCSI_DH_OK;
> }
> - if (ret == SCSI_DH_IMM_RETRY)
> + if (ret == SCSI_DH_IMM_RETRY) {
> + blk_rq_init(req->q, q);
> goto retry;
> + }
> if (ret == SCSI_DH_DEV_OFFLINED) {
> h->path_state = HP_SW_PATH_PASSIVE;
> ret = SCSI_DH_OK;
> @@ -231,8 +233,10 @@ retry:
> ret = SCSI_DH_OK;
>
> if (ret == SCSI_DH_RETRY) {
> - if (--retry)
> + if (--retry) {
> + blk_rq_init(req->q, req);
> goto retry;
> + }
> ret = SCSI_DH_IO;
> }
>
> --
> 1.5.6.3
>


--
Jens Axboe

2008-12-08 13:26:15

by Alan D. Brunelle

[permalink] [raw]
Subject: Re: [PATCH] Correctly release and allocate a new request on TUR retries

Jens Axboe wrote:
> On Mon, Dec 08 2008, Alan D. Brunelle wrote:
>> Mike Anderson wrote:
>>> Jens Axboe <[email protected]> wrote:
>>>> On Fri, Dec 05 2008, Alan D. Brunelle wrote:
>>>>> Commands needing to be retried (TUR in this case) would result in a block
>>>>> I/O request being re-used, without being re-initialized properly. This
>>>>> patch ensures that the requests are correctly re-initialized via
>>>>> standard allocation means.
>>>>>
>>>>> Prior to this patch, boots were failing consistently as in:
>>>>> http://lkml.org/lkml/2008/12/5/161
>>>>>
>>>>> With this patch in place, the system is booting reliably.
>>>>>
>>>>> Signed-off-by: Alan D. Brunelle <[email protected]>
>>>>> Cc: Jens Axboe <[email protected]>
>>>> Looks good.
>>>>
>>>> Acked-by: Jens Axboe <[email protected]>
>>>>
>>>> Perhaps James can push it in, I'm about to shutdown for the day...
>>>>
>>> I know a failure was not detected in the hp_sw_start_stop function, but it
>>> uses the same retry method as hp_sw_tur we should update this function
>>> also.
>>>
>>> I made a quick scope of callers of blk_get_request and I did not see a
>>> repeated of this retry usage model. I will make another pass to see if I
>>> missed something.
>> drivers/cdrom/cdrom.c:cdrom_read_cdda_bpc() is even worse: it gets one
>> request, then sits in a while loop re-using the same request over and
>> over again.
>
> Sigh, it does indeed look messy...
>
>> Since blk_rq_init() is an exported symbol, perhaps instead of having the
>> three callers realloc, it _may_ be sufficient to just have them call
>> that before re-use? (See attached un-tested patch for an example.)
>
> I think that's a really bad idea, since it basically just clears the
> 'rq'. If you have that rq on some list (timeout, for instance), the
> kernel will not be happy. I think we have to, for now at least, put and
> get a request before looping. Then for 2.6.29 we can hopefully improve
> this situation!

OK, I'll work that one up for 2.6.28 and test it out this morning.

Alan

2008-12-08 13:29:33

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH] Correctly release and allocate a new request on TUR retries

On Mon, Dec 08 2008, Alan D. Brunelle wrote:
> Jens Axboe wrote:
> > On Mon, Dec 08 2008, Alan D. Brunelle wrote:
> >> Mike Anderson wrote:
> >>> Jens Axboe <[email protected]> wrote:
> >>>> On Fri, Dec 05 2008, Alan D. Brunelle wrote:
> >>>>> Commands needing to be retried (TUR in this case) would result in a block
> >>>>> I/O request being re-used, without being re-initialized properly. This
> >>>>> patch ensures that the requests are correctly re-initialized via
> >>>>> standard allocation means.
> >>>>>
> >>>>> Prior to this patch, boots were failing consistently as in:
> >>>>> http://lkml.org/lkml/2008/12/5/161
> >>>>>
> >>>>> With this patch in place, the system is booting reliably.
> >>>>>
> >>>>> Signed-off-by: Alan D. Brunelle <[email protected]>
> >>>>> Cc: Jens Axboe <[email protected]>
> >>>> Looks good.
> >>>>
> >>>> Acked-by: Jens Axboe <[email protected]>
> >>>>
> >>>> Perhaps James can push it in, I'm about to shutdown for the day...
> >>>>
> >>> I know a failure was not detected in the hp_sw_start_stop function, but it
> >>> uses the same retry method as hp_sw_tur we should update this function
> >>> also.
> >>>
> >>> I made a quick scope of callers of blk_get_request and I did not see a
> >>> repeated of this retry usage model. I will make another pass to see if I
> >>> missed something.
> >> drivers/cdrom/cdrom.c:cdrom_read_cdda_bpc() is even worse: it gets one
> >> request, then sits in a while loop re-using the same request over and
> >> over again.
> >
> > Sigh, it does indeed look messy...
> >
> >> Since blk_rq_init() is an exported symbol, perhaps instead of having the
> >> three callers realloc, it _may_ be sufficient to just have them call
> >> that before re-use? (See attached un-tested patch for an example.)
> >
> > I think that's a really bad idea, since it basically just clears the
> > 'rq'. If you have that rq on some list (timeout, for instance), the
> > kernel will not be happy. I think we have to, for now at least, put and
> > get a request before looping. Then for 2.6.29 we can hopefully improve
> > this situation!
>
> OK, I'll work that one up for 2.6.28 and test it out this morning.

Thanks Alan, if do you that and confirm that it works with your hw
handler problem at least, then I'll get it upstream asap.

--
Jens Axboe