2018-10-30 08:29:26

by Zengtao (B)

Subject: scsi_set_medium_removal timeout issue

Hi

I recently hit a scsi_set_medium_removal timeout issue that involves both
SCSI and USB mass storage.
Since I am not an expert in either SCSI or USB mass storage, I am writing to report
the issue and ask for a solution.

My test scenario is as follows:
1. Linux host ----- Linux mass storage gadget (the backing store is a flash partition).
2. The host mounts the device.
3. The host writes some data to the mass storage device.
4. The host unmounts the device.
Both Linux kernels (host and device) are Linux 4.9.
Someone reported the same issue a long time ago, but it is still there:
https://www.spinics.net/lists/linux-usb/msg53739.html

Here is what I observed:
In step 4, the host issues a PREVENT_ALLOW_MEDIUM_REMOVAL SCSI command,
and it times out because the device's fsg_lun_fsync_sub is very slow.
I found two ways to work around the issue:
1. Change the timeout of the host's PREVENT_ALLOW_MEDIUM_REMOVAL SCSI command
from 10 seconds to about 60 seconds.
2. Remove the fsg_lun_fsync_sub call from the device's mass storage gadget driver.

Thanks

Regards
zentao
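
For readers unfamiliar with the command issued in step 4, the sketch below builds its 6-byte command descriptor block in user space. It is an illustration only, not code from either kernel: the helper name build_prevent_allow and the enum names are made up for this example. The opcode value 0x1e and the use of byte 4 for the prevent state follow the standard SCSI definition of PREVENT ALLOW MEDIUM REMOVAL.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Standard SCSI opcode for PREVENT ALLOW MEDIUM REMOVAL. */
#define OP_PREVENT_ALLOW_MEDIUM_REMOVAL 0x1e

/* Hypothetical names for the two states used in this thread. */
enum removal_state {
	REMOVAL_ALLOW = 0,	/* sent when the last opener closes the disk */
	REMOVAL_PREVENT = 1	/* sent when the disk is opened */
};

/* Fill a 6-byte CDB: byte 0 is the opcode, byte 4 carries the state,
 * and every other byte is left at zero. */
static void build_prevent_allow(uint8_t cdb[6], enum removal_state state)
{
	assert(state == REMOVAL_ALLOW || state == REMOVAL_PREVENT);
	memset(cdb, 0, 6);
	cdb[0] = OP_PREVENT_ALLOW_MEDIUM_REMOVAL;
	cdb[4] = (uint8_t)state;
}
```

Calling build_prevent_allow(cdb, REMOVAL_ALLOW) yields the "allow" form of the command, which is the one that times out at unmount in this report.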


2018-10-30 08:57:22

by Oliver Neukum

Subject: Re: scsi_set_medium_removal timeout issue

On Di, 2018-10-30 at 08:28 +0000, Zengtao (B) wrote:
> Hi

> Here is what I observed:
> In step 4, the host issues a PREVENT_ALLOW_MEDIUM_REMOVAL SCSI command,
> and it times out because the device's fsg_lun_fsync_sub is very slow.
> I found two ways to work around the issue:
> 1. Change the timeout of the host's PREVENT_ALLOW_MEDIUM_REMOVAL SCSI command
> from 10 seconds to about 60 seconds.

That is near useless, because the gadget can be used with other
systems.

> 2. Remove the fsg_lun_fsync_sub in the device's Mass storage gadget driver.

It exists for a reason. The blocks have to be on the medium.
It seems to me that your gadget just allows too many dirty pages in the
cache.

Regards
Oliver
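
The point about dirty pages can be made concrete with a little arithmetic. The sketch below is a rough estimate under stated assumptions, not a measurement: the function names are hypothetical, and the throughput and backlog figures in the usage note are invented for illustration. It only shows how the amount of unwritten data on the gadget relates to the 10-second timeout mentioned in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Seconds needed to write back dirty_bytes at bytes_per_sec, rounded up. */
static uint64_t flush_seconds(uint64_t dirty_bytes, uint64_t bytes_per_sec)
{
	assert(bytes_per_sec > 0);
	return (dirty_bytes + bytes_per_sec - 1) / bytes_per_sec;
}

/* Largest dirty backlog that can still be flushed within timeout_secs. */
static uint64_t max_dirty_bytes(uint64_t timeout_secs, uint64_t bytes_per_sec)
{
	return timeout_secs * bytes_per_sec;
}
```

With an assumed flash write speed of 2 MiB/s, a 64 MiB backlog needs flush_seconds(64 MiB, 2 MiB/s) = 32 seconds, well past a 10-second timeout, whereas keeping the backlog at or below max_dirty_bytes(10, 2 MiB/s) = 20 MiB would fit.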


2018-10-30 09:25:27

by Zengtao (B)

Subject: RE: scsi_set_medium_removal timeout issue

Hi:

>-----Original Message-----
>From: Oliver Neukum
>Sent: Tuesday, October 30, 2018 4:56 PM
>To: Zengtao (B)
>Subject: Re: scsi_set_medium_removal timeout issue
>
>On Di, 2018-10-30 at 08:28 +0000, Zengtao (B) wrote:
>> Hi
>
>> Here is what I observed:
>> In step 4, the host issues a PREVENT_ALLOW_MEDIUM_REMOVAL SCSI command,
>> and it times out because the device's fsg_lun_fsync_sub is very slow.
>> I found two ways to work around the issue:
>> 1. Change the timeout of the host's PREVENT_ALLOW_MEDIUM_REMOVAL SCSI command
>> from 10 seconds to about 60 seconds.
>
>That is near useless, because the gadget can be used with other systems.
>

Is it reasonable to keep the current value? Or can we change it to cover
as many systems as possible?

>> 2. Remove the fsg_lun_fsync_sub in the device's Mass storage gadget driver.
>
>It exists for a reason. The blocks have to be on the medium.
>It seems to me that your gadget just allows too many dirty pages in the
>cache.
>
> Regards
> Oliver

Regards
Zengtao

2018-10-30 14:10:01

by Alan Stern

Subject: Re: scsi_set_medium_removal timeout issue

On Tue, 30 Oct 2018, Zengtao (B) wrote:

> Hi
>
> I have recently met a scsi_set_medium_removal timeout issue, and it's related
> to both SCSI and USB MASS storage.
> Since i am not an expert in either scsi or usb mass storage, i am writing to report
> the issue and ask for a solution from you guys.
>
> My test scenario is as follow:
> 1.Linux HOST-----Linux mass storage gadget(the back store is a flash partition).
> 2.Host mount the device.
> 3.Host writes some data to the Mass storage device.
> 4.Host Unmount the device.
> Both Linux kernels(Host and Device) are Linux 4.9.
> Some has reported the same issue a long time ago, but it remains there.
> https://www.spinics.net/lists/linux-usb/msg53739.html
>
> Here is what I observed:
> In step 4, the host issues a PREVENT_ALLOW_MEDIUM_REMOVAL SCSI command,
> and it times out because the device's fsg_lun_fsync_sub is very slow.

Something is wrong here. Before sending PREVENT-ALLOW MEDIUM
REMOVAL, the host should issue SYNCHRONIZE CACHE. This will force
fsg_lun_fsync_sub to run, and the host should allow a long timeout for
this command. Then when PREVENT-ALLOW MEDIUM REMOVAL is sent, nothing
will need to be flushed.

Alan Stern

> I found two ways to work around the issue:
> 1. Change the timeout of the host's PREVENT_ALLOW_MEDIUM_REMOVAL SCSI command
> from 10 seconds to about 60 seconds.
> 2. Remove the fsg_lun_fsync_sub call from the device's mass storage gadget driver.
>
> Thanks
>
> Regards
> zentao
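
The ordering Alan describes can be illustrated with a toy model. This is a sketch under stated assumptions, not kernel code: the struct and function names are hypothetical, the 10-second removal timeout is the figure reported in this thread, and the 60-second value merely stands in for "a long timeout". The model treats both commands as triggering a flush on the gadget, as described above, and only checks whether that flush fits within the host's timeout.

```c
#include <assert.h>
#include <stdbool.h>

/* Timeouts in seconds. 10 s is the value reported in this thread for
 * PREVENT ALLOW MEDIUM REMOVAL; 60 s is an assumed "long timeout". */
#define REMOVAL_TIMEOUT_SECS 10u
#define SYNC_TIMEOUT_SECS    60u

/* Hypothetical gadget model: only the pending writeback time matters. */
struct model_gadget {
	unsigned int pending_flush_secs;
};

/* Returns true if the flush finishes before the host's timeout expires. */
static bool model_flush(struct model_gadget *g, unsigned int timeout_secs)
{
	if (g->pending_flush_secs > timeout_secs)
		return false;	/* host gives up; later state is not modeled */
	g->pending_flush_secs = 0;
	return true;
}

static bool model_synchronize_cache(struct model_gadget *g)
{
	return model_flush(g, SYNC_TIMEOUT_SECS);
}

static bool model_allow_medium_removal(struct model_gadget *g)
{
	return model_flush(g, REMOVAL_TIMEOUT_SECS);
}
```

With 40 seconds of pending writeback, model_allow_medium_removal() alone fails, while calling model_synchronize_cache() first leaves nothing to flush and the removal command then succeeds.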


2018-10-31 02:36:39

by Zengtao (B)

Subject: RE: scsi_set_medium_removal timeout issue

Hi:

>-----Original Message-----
>From: Alan Stern
>Sent: Tuesday, October 30, 2018 10:08 PM
>To: Zengtao (B)
>Subject: Re: scsi_set_medium_removal timeout issue
>
>On Tue, 30 Oct 2018, Zengtao (B) wrote:
>
>> Hi
>>
>> I have recently met a scsi_set_medium_removal timeout issue, and it's
>> related to both SCSI and USB MASS storage.
>> Since i am not an expert in either scsi or usb mass storage, i am
>> writing to report the issue and ask for a solution from you guys.
>>
>> My test scenario is as follow:
>> 1.Linux HOST-----Linux mass storage gadget(the back store is a flash partition).
>> 2.Host mount the device.
>> 3.Host writes some data to the Mass storage device.
>> 4.Host Unmount the device.
>> Both Linux kernels(Host and Device) are Linux 4.9.
>> Some has reported the same issue a long time ago, but it remains there.
>> https://www.spinics.net/lists/linux-usb/msg53739.html
>>
>> For the issue itself, there is my observation:
>> In step 4, the host issues a PREVENT_ALLOW_MEDIUM_REMOVAL SCSI command,
>> and it times out because the device's fsg_lun_fsync_sub is very slow.
>
>Something is wrong here. Before sending PREVENT-ALLOW MEDIUM
>REMOVAL, the host should issue SYNCHRONIZE CACHE. This will force
>fsg_lun_fsync_sub to run, and the host should allow a long timeout for
>this command. Then when PREVENT-ALLOW MEDIUM REMOVAL is sent,
>nothing will need to be flushed.
>

Indeed, I did not see a SYNCHRONIZE CACHE from the host; it issued
PREVENT-ALLOW MEDIUM REMOVAL directly. So maybe something is wrong
with the SCSI layer or with the mass storage class driver?

Zengtao

>Alan Stern
>
>> I found two ways to work around the issue:
>> 1. Change the timeout of the host's PREVENT_ALLOW_MEDIUM_REMOVAL SCSI command
>> from 10 seconds to about 60 seconds.
>> 2. Remove the fsg_lun_fsync_sub call from the device's mass storage gadget driver.
>>
>> Thanks
>>
>> Regards
>> zentao

2018-10-31 14:22:58

by Alan Stern

Subject: RE: scsi_set_medium_removal timeout issue

On Wed, 31 Oct 2018, Zengtao (B) wrote:

> Hi:
>
> >-----Original Message-----
> >From: Alan Stern
> >Sent: Tuesday, October 30, 2018 10:08 PM
> >To: Zengtao (B)
> >Subject: Re: scsi_set_medium_removal timeout issue
> >
> >On Tue, 30 Oct 2018, Zengtao (B) wrote:
> >
> >> Hi
> >>
> >> I have recently met a scsi_set_medium_removal timeout issue, and it's
> >> related to both SCSI and USB MASS storage.
> >> Since i am not an expert in either scsi or usb mass storage, i am
> >> writing to report the issue and ask for a solution from you guys.
> >>
> >> My test scenario is as follow:
> >> 1.Linux HOST-----Linux mass storage gadget(the back store is a flash partition).
> >> 2.Host mount the device.
> >> 3.Host writes some data to the Mass storage device.
> >> 4.Host Unmount the device.
> >> Both Linux kernels(Host and Device) are Linux 4.9.
> >> Some has reported the same issue a long time ago, but it remains there.
> >> https://www.spinics.net/lists/linux-usb/msg53739.html
> >>
> >> For the issue itself, there is my observation:
> >> In step 4, the host issues a PREVENT_ALLOW_MEDIUM_REMOVAL SCSI command,
> >> and it times out because the device's fsg_lun_fsync_sub is very slow.
> >
> >Something is wrong here. Before sending PREVENT-ALLOW MEDIUM
> >REMOVAL, the host should issue SYNCHRONIZE CACHE. This will force
> >fsg_lun_fsync_sub to run, and the host should allow a long timeout for
> >this command. Then when PREVENT-ALLOW MEDIUM REMOVAL is sent,
> >nothing will need to be flushed.
> >
>
> Definitely, I haven't seen the SYNCHRONIZE CACHE from the host, it directly
> issued the PREVENT-ALLOW MEDIUM REMOVAL, so maybe something wrong
> with the scsi layer or something wrong with the mass storage class driver?

Or it could be something else. Can you please post the dmesg log from
the host, showing what happens when the device is first plugged in?

Alan Stern


2018-11-12 11:57:21

by Zengtao (B)

Subject: RE: scsi_set_medium_removal timeout issue

Hi Alan:

>-----Original Message-----
>From: Alan Stern
>Sent: Wednesday, October 31, 2018 10:20 PM
>To: Zengtao (B)
>Subject: RE: scsi_set_medium_removal timeout issue
>
>On Wed, 31 Oct 2018, Zengtao (B) wrote:
>
>> Hi:
>>
>> >-----Original Message-----
>> >From: Alan Stern
>> >Sent: Tuesday, October 30, 2018 10:08 PM
>> >To: Zengtao (B)
>> >Subject: Re: scsi_set_medium_removal timeout issue
>> >
>> >On Tue, 30 Oct 2018, Zengtao (B) wrote:
>> >
>> >> Hi
>> >>
>> >> I have recently met a scsi_set_medium_removal timeout issue, and
>> >> it's related to both SCSI and USB MASS storage.
>> >> Since i am not an expert in either scsi or usb mass storage, i am
>> >> writing to report the issue and ask for a solution from you guys.
>> >>
>> >> My test scenario is as follow:
>> >> 1.Linux HOST-----Linux mass storage gadget(the back store is a flash partition).
>> >> 2.Host mount the device.
>> >> 3.Host writes some data to the Mass storage device.
>> >> 4.Host Unmount the device.
>> >> Both Linux kernels(Host and Device) are Linux 4.9.
>> >> Someone reported the same issue a long time ago, but it is still there:
>> >> https://www.spinics.net/lists/linux-usb/msg53739.html
>> >>
>> >> For the issue itself, there is my observation:
>> >> In step 4, the host issues a PREVENT_ALLOW_MEDIUM_REMOVAL SCSI command,
>> >> and it times out because the device's fsg_lun_fsync_sub is very slow.
>> >
>> >Something is wrong here. Before sending PREVENT-ALLOW MEDIUM
>> >REMOVAL, the host should issue SYNCHRONIZE CACHE. This will force
>> >fsg_lun_fsync_sub to run, and the host should allow a long timeout
>> >for this command. Then when PREVENT-ALLOW MEDIUM REMOVAL is sent,
>> >nothing will need to be flushed.
>> >
>>
>> Definitely, I haven't seen the SYNCHRONIZE CACHE from the host, it
>> directly issued the PREVENT-ALLOW MEDIUM REMOVAL, so maybe something
>> wrong with the scsi layer or something wrong with the mass storage
>> class driver?
>
>Or it could be something else. Can you please post the dmesg log from
>the host, showing what happens when the device is first plugged in?
>

I have enabled SCSI logging on the host; please see the attachment.

Thanks.
Zengtao


Attachments:
1940.txt (57.71 kB)

2018-11-12 15:33:54

by Alan Stern

Subject: RE: scsi_set_medium_removal timeout issue

On Mon, 12 Nov 2018, Zengtao (B) wrote:

> >> >Something is wrong here. Before sending PREVENT-ALLOW MEDIUM
> >> >REMOVAL, the host should issue SYNCHRONIZE CACHE. This will force
> >> >fsg_lun_fsync_sub to run, and the host should allow a long timeout
> >> >for this command. Then when PREVENT-ALLOW MEDIUM REMOVAL is sent,
> >> >nothing will need to be flushed.
> >> >
> >>
> >> Definitely, I haven't seen the SYNCHRONIZE CACHE from the host, it
> >> directly issued the PREVENT-ALLOW MEDIUM REMOVAL, so maybe something
> >> wrong with the scsi layer or something wrong with the mass storage
> >> class driver?
> >
> >Or it could be something else. Can you please post the dmesg log from
> >the host, showing what happens when the device is first plugged in?
> >
>
> I have enabled the SCSI log for the host, please refer to the attachment.

The log you attached was incomplete -- it was missing some commands
from the beginning. In any case, it wasn't what I wanted. I asked
you to post the dmesg log, not the SCSI log.

Alan Stern


2018-11-14 02:49:03

by Zengtao (B)

Subject: RE: scsi_set_medium_removal timeout issue

Hi Alan:

>-----Original Message-----
>From: Alan Stern
>Sent: Monday, November 12, 2018 11:33 PM
>To: Zengtao (B)
>Subject: RE: scsi_set_medium_removal timeout issue
>
>On Mon, 12 Nov 2018, Zengtao (B) wrote:
>
>> >> >Something is wrong here. Before sending PREVENT-ALLOW MEDIUM
>> >> >REMOVAL, the host should issue SYNCHRONIZE CACHE. This will force
>> >> >fsg_lun_fsync_sub to run, and the host should allow a long timeout
>> >> >for this command. Then when PREVENT-ALLOW MEDIUM REMOVAL is sent,
>> >> >nothing will need to be flushed.
>> >> >
>> >>
>> >> Definitely, I haven't seen the SYNCHRONIZE CACHE from the host, it
>> >> directly issued the PREVENT-ALLOW MEDIUM REMOVAL, so maybe something
>> >> wrong with the scsi layer or something wrong with the mass storage
>> >> class driver?
>> >
>> >Or it could be something else. Can you please post the dmesg log
>> >from the host, showing what happens when the device is first plugged in?
>> >
>>
>> I have enabled the SCSI log for the host, please refer to the attachment.
>
>The log you attached was incomplete -- it was missing some commands

I enabled SCSI logging only in the middle of the umount operation, because I cannot
reproduce the issue when SCSI logging is enabled from the start.

>from the beginning. In any case, it wasn't what I wanted. I asked you to
>post the dmesg log, not the SCSI log.

Please see the new attachment for the dmesg log.

Thanks
Zengtao


Attachments:
dmesg.txt (13.96 kB)

2018-11-14 15:36:41

by Alan Stern

Subject: RE: scsi_set_medium_removal timeout issue

On Wed, 14 Nov 2018, Zengtao (B) wrote:

> I just enabled the scsi log in the middle of the umount operation, otherwise I can't
> reproduce the issue when the scsi log is enabled.
>
> >from the beginning. In any case, it wasn't what I wanted. I asked you to
> >post the dmesg log, not the SCSI log.
>
> Please refer to the new attachment for dmesg log.

Heh, yes, I see now.

Martin, shouldn't sd_release() call sd_sync_cache() in the same way
that sd_shutdown() does, before it calls scsi_set_medium_removal()?

Alan Stern


2018-11-29 03:14:29

by Zengtao (B)

Subject: RE: scsi_set_medium_removal timeout issue

Ping?

>-----Original Message-----
>From: Alan Stern
>Sent: Wednesday, November 14, 2018 11:35 PM
>To: Martin Petersen; Zengtao (B)
>Subject: RE: scsi_set_medium_removal timeout issue
>
>On Wed, 14 Nov 2018, Zengtao (B) wrote:
>
>> I just enabled the scsi log in the middle of the umount operation,
>> otherwise I can't reproduce the issue when the scsi log is enabled.
>>
>> >from the beginning. In any case, it wasn't what I wanted. I asked
>> >you to post the dmesg log, not the SCSI log.
>>
>> Please refer to the new attachment for dmesg log.
>
>Heh, yes, I see now.
>
>Martin, shouldn't sd_release() call sd_sync_cache() in the same way that
>sd_shutdown() does, before it calls scsi_set_medium_removal()?
>
>Alan Stern

2018-12-04 16:38:07

by Alan Stern

Subject: RE: scsi_set_medium_removal timeout issue

On Thu, 29 Nov 2018, Zengtao (B) wrote:

> Ping?
>
> >-----Original Message-----
> >From: Alan Stern
> >Sent: Wednesday, November 14, 2018 11:35 PM
> >To: Martin Petersen; Zengtao (B)
> >Subject: RE: scsi_set_medium_removal timeout issue
> >
> >On Wed, 14 Nov 2018, Zengtao (B) wrote:
> >
> >> I just enabled the scsi log in the middle of the umount operation,
> >> otherwise I can't reproduce the issue when the scsi log is enabled.
> >>
> >> >from the beginning. In any case, it wasn't what I wanted. I asked
> >> >you to post the dmesg log, not the SCSI log.
> >>
> >> Please refer to the new attachment for dmesg log.
> >
> >Heh, yes, I see now.
> >
> >Martin, shouldn't sd_release() call sd_sync_cache() in the same way that
> >sd_shutdown() does, before it calls scsi_set_medium_removal()?
> >
> >Alan Stern

I don't know if this is the right thing to do, but you can try out the
following patch to see if it helps.

Alan Stern



Index: usb-4.x/drivers/scsi/sd.c
===================================================================
--- usb-4.x.orig/drivers/scsi/sd.c
+++ usb-4.x/drivers/scsi/sd.c
@@ -113,6 +113,7 @@ static void sd_shutdown(struct device *)
 static int sd_suspend_system(struct device *);
 static int sd_suspend_runtime(struct device *);
 static int sd_resume(struct device *);
+static int sd_sync_cache(struct scsi_disk *sdkp, struct scsi_sense_hdr *sshdr);
 static void sd_rescan(struct device *);
 static int sd_init_command(struct scsi_cmnd *SCpnt);
 static void sd_uninit_command(struct scsi_cmnd *SCpnt);
@@ -1393,8 +1394,14 @@ static void sd_release(struct gendisk *d
 	SCSI_LOG_HLQUEUE(3, sd_printk(KERN_INFO, sdkp, "sd_release\n"));

 	if (atomic_dec_return(&sdkp->openers) == 0 && sdev->removable) {
-		if (scsi_block_when_processing_errors(sdev))
+		if (scsi_block_when_processing_errors(sdev)) {
+			if (sdkp->WCE && sdkp->media_present) {
+				sd_printk(KERN_NOTICE, sdkp,
+					  "Synchronizing SCSI cache\n");
+				sd_sync_cache(sdkp, NULL);
+			}
 			scsi_set_medium_removal(sdev, SCSI_REMOVAL_ALLOW);
+		}
 	}

 	/*
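
The control flow the patch above introduces can be exercised outside the kernel with a small mock. This is a sketch only: struct model_disk, model_send and model_release are hypothetical stand-ins, the error-handling check (scsi_block_when_processing_errors) is left out, and the mock records which commands would be sent instead of talking to a device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum model_cmd { CMD_SYNCHRONIZE_CACHE, CMD_ALLOW_MEDIUM_REMOVAL };

/* Hypothetical stand-in for the fields of struct scsi_disk and
 * struct scsi_device that the patch looks at. */
struct model_disk {
	int openers;
	bool removable;
	bool wce;		/* write cache enabled */
	bool media_present;
	enum model_cmd sent[4];	/* commands issued, in order */
	size_t nsent;
};

static void model_send(struct model_disk *d, enum model_cmd c)
{
	assert(d->nsent < sizeof(d->sent) / sizeof(d->sent[0]));
	d->sent[d->nsent++] = c;
}

/* Mirrors the patched release path: on the last close of a removable
 * disk, flush the cache first (if there is one to flush), then allow
 * medium removal. */
static void model_release(struct model_disk *d)
{
	if (--d->openers == 0 && d->removable) {
		if (d->wce && d->media_present)
			model_send(d, CMD_SYNCHRONIZE_CACHE);
		model_send(d, CMD_ALLOW_MEDIUM_REMOVAL);
	}
}
```

In this model the flush is issued only on the last close and only when the write cache is enabled and media is present; otherwise the release path behaves as it did before the patch.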


2019-04-01 20:29:18

by Alan Stern

Subject: RE: scsi_set_medium_removal timeout issue

On Tue, 4 Dec 2018, Alan Stern wrote:

> On Thu, 29 Nov 2018, Zengtao (B) wrote:
>
> > Ping?
> >
> > >-----Original Message-----
> > >From: Alan Stern
> > >Sent: Wednesday, November 14, 2018 11:35 PM
> > >To: Martin Petersen; Zengtao (B)
> > >Subject: RE: scsi_set_medium_removal timeout issue
> > >
> > >On Wed, 14 Nov 2018, Zengtao (B) wrote:
> > >
> > >> I just enabled the scsi log in the middle of the umount operation,
> > >> otherwise I can't reproduce the issue when the scsi log is enabled.
> > >>
> > >> >from the beginning. In any case, it wasn't what I wanted. I asked
> > >> >you to post the dmesg log, not the SCSI log.
> > >>
> > >> Please refer to the new attachment for dmesg log.
> > >
> > >Heh, yes, I see now.
> > >
> > >Martin, shouldn't sd_release() call sd_sync_cache() in the same way that
> > >sd_shutdown() does, before it calls scsi_set_medium_removal()?
> > >
> > >Alan Stern
>
> I don't know if this is the right thing to do, but you can try out the
> following patch to see if it helps.
>
> Alan Stern
>
>
>
> Index: usb-4.x/drivers/scsi/sd.c
> ===================================================================
> --- usb-4.x.orig/drivers/scsi/sd.c
> +++ usb-4.x/drivers/scsi/sd.c
> @@ -113,6 +113,7 @@ static void sd_shutdown(struct device *)
>  static int sd_suspend_system(struct device *);
>  static int sd_suspend_runtime(struct device *);
>  static int sd_resume(struct device *);
> +static int sd_sync_cache(struct scsi_disk *sdkp, struct scsi_sense_hdr *sshdr);
>  static void sd_rescan(struct device *);
>  static int sd_init_command(struct scsi_cmnd *SCpnt);
>  static void sd_uninit_command(struct scsi_cmnd *SCpnt);
> @@ -1393,8 +1394,14 @@ static void sd_release(struct gendisk *d
>  	SCSI_LOG_HLQUEUE(3, sd_printk(KERN_INFO, sdkp, "sd_release\n"));
>
>  	if (atomic_dec_return(&sdkp->openers) == 0 && sdev->removable) {
> -		if (scsi_block_when_processing_errors(sdev))
> +		if (scsi_block_when_processing_errors(sdev)) {
> +			if (sdkp->WCE && sdkp->media_present) {
> +				sd_printk(KERN_NOTICE, sdkp,
> +					  "Synchronizing SCSI cache\n");
> +				sd_sync_cache(sdkp, NULL);
> +			}
>  			scsi_set_medium_removal(sdev, SCSI_REMOVAL_ALLOW);
> +		}
>  	}
>
>  	/*

Zengtao, did you ever try out this patch? Did it fix your problem?

Alan Stern