2021-10-08 04:37:59

by Dexuan Cui

[permalink] [raw]
Subject: [PATCH v3] scsi: core: Fix shost->cmd_per_lun calculation in scsi_add_host_with_dma()

After commit ea2f0f77538c, a 416-CPU VM running on Hyper-V hangs during
boot because the hv_storvsc driver sets scsi_driver.can_queue to an "int"
value that exceeds SHRT_MAX, and hence scsi_add_host_with_dma() sets
shost->cmd_per_lun to a negative "short" value.

Use min_t(int, ...) to fix the issue.

Fixes: ea2f0f77538c ("scsi: core: Cap scsi_host cmd_per_lun at can_queue")
Cc: [email protected]
Signed-off-by: Dexuan Cui <[email protected]>
Reviewed-by: Haiyang Zhang <[email protected]>
Reviewed-by: Ming Lei <[email protected]>
---

v1 tried to fix the issue by changing the storvsc driver:
https://lwn.net/ml/linux-kernel/BYAPR21MB1270BBC14D5F1AE69FC31A16BFB09@BYAPR21MB1270.namprd21.prod.outlook.com/

v2 directly fixed the scsi core change instead as Michael Kelley suggested
(refer to the above link).

v3 simplified the commit log, as John Garry suggested.
Added Haiyang's and Ming's Reviewed-by.

drivers/scsi/hosts.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
index 3f6f14f0cafb..24b72ee4246f 100644
--- a/drivers/scsi/hosts.c
+++ b/drivers/scsi/hosts.c
@@ -220,7 +220,8 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost, struct device *dev,
goto fail;
}

- shost->cmd_per_lun = min_t(short, shost->cmd_per_lun,
+ /* Use min_t(int, ...) in case shost->can_queue exceeds SHRT_MAX */
+ shost->cmd_per_lun = min_t(int, shost->cmd_per_lun,
shost->can_queue);

error = scsi_init_sense_cache(shost);
--
2.25.1


2021-10-08 12:02:39

by John Garry

[permalink] [raw]
Subject: Re: [PATCH v3] scsi: core: Fix shost->cmd_per_lun calculation in scsi_add_host_with_dma()

On 08/10/2021 05:35, Dexuan Cui wrote:
> After commit ea2f0f77538c, a 416-CPU VM running on Hyper-V hangs during
> boot because the hv_storvsc driver sets scsi_driver.can_queue to an "int"
> value that exceeds SHRT_MAX, and hence scsi_add_host_with_dma() sets
> shost->cmd_per_lun to a negative "short" value.
>
> Use min_t(int, ...) to fix the issue.
>
> Fixes: ea2f0f77538c ("scsi: core: Cap scsi_host cmd_per_lun at can_queue")
> Cc:[email protected]
> Signed-off-by: Dexuan Cui<[email protected]>
> Reviewed-by: Haiyang Zhang<[email protected]>
> Reviewed-by: Ming Lei<[email protected]>

Reviewed-by: John Garry <[email protected]>

thanks

2021-10-12 16:44:08

by Martin K. Petersen

[permalink] [raw]
Subject: Re: [PATCH v3] scsi: core: Fix shost->cmd_per_lun calculation in scsi_add_host_with_dma()


Dexuan,

> After commit ea2f0f77538c, a 416-CPU VM running on Hyper-V hangs during
> boot because the hv_storvsc driver sets scsi_driver.can_queue to an "int"
> value that exceeds SHRT_MAX, and hence scsi_add_host_with_dma() sets
> shost->cmd_per_lun to a negative "short" value.
>
> Use min_t(int, ...) to fix the issue.

I queued this up as a short term workaround. However, I am hoping that
the rework of the scaling code in storvsc lands soon.

--
Martin K. Petersen Oracle Linux Engineering

2021-10-12 18:09:24

by Dexuan Cui

[permalink] [raw]
Subject: RE: [PATCH v3] scsi: core: Fix shost->cmd_per_lun calculation in scsi_add_host_with_dma()

> From: Martin K. Petersen <[email protected]>
> Sent: Tuesday, October 12, 2021 9:42 AM
>
> Dexuan,
>
> > After commit ea2f0f77538c, a 416-CPU VM running on Hyper-V hangs during
> > boot because the hv_storvsc driver sets scsi_driver.can_queue to an "int"
> > value that exceeds SHRT_MAX, and hence scsi_add_host_with_dma() sets
> > shost->cmd_per_lun to a negative "short" value.
> >
> > Use min_t(int, ...) to fix the issue.
>
> I queued this up as a short term workaround. However, I am hoping that
> the rework of the scaling code in storvsc lands soon.

Thanks, Martin! I know Michael Kelley will improve the netvsc.

Regarding this patch, I'm not sure if it's a "workaround": if it's incorrect to
set a bigger-than-SHRT_MAX scsi_driver.can_queue value, probably we should
change scsi_driver.can_queue from "int" to "u16"? BTW, I guess the "cmd_per_lun"
should also be "u16" rather than "short"?

This was discussed in May, and it looks like the conclusion was not clear to me:
https://lwn.net/ml/linux-kernel/[email protected]/

Thanks,
-- Dexuan

2021-10-12 20:39:08

by Martin K. Petersen

[permalink] [raw]
Subject: Re: [PATCH v3] scsi: core: Fix shost->cmd_per_lun calculation in scsi_add_host_with_dma()

On Thu, 7 Oct 2021 21:35:46 -0700, Dexuan Cui wrote:

> After commit ea2f0f77538c, a 416-CPU VM running on Hyper-V hangs during
> boot because the hv_storvsc driver sets scsi_driver.can_queue to an "int"
> value that exceeds SHRT_MAX, and hence scsi_add_host_with_dma() sets
> shost->cmd_per_lun to a negative "short" value.
>
> Use min_t(int, ...) to fix the issue.
>
> [...]

Applied to 5.15/scsi-fixes, thanks!

[1/1] scsi: core: Fix shost->cmd_per_lun calculation in scsi_add_host_with_dma()
https://git.kernel.org/mkp/scsi/c/50b6cb351636

--
Martin K. Petersen Oracle Linux Engineering

2021-10-12 21:30:20

by Martin K. Petersen

[permalink] [raw]
Subject: Re: [PATCH v3] scsi: core: Fix shost->cmd_per_lun calculation in scsi_add_host_with_dma()


Dexuan,

> Regarding this patch, I'm not sure if it's a "workaround": if it's
> incorrect to set a bigger-than-SHRT_MAX scsi_driver.can_queue value,
> probably we should change scsi_driver.can_queue from "int" to "u16"?

> BTW, I guess the "cmd_per_lun" should also be "u16" rather than
> "short"?

I agree that it would be nice to get all this cleaned up. Several,
somewhat peculiar, 25-year old design choices.

cmd_per_lun has traditionally been in the ballpark of low hundreds,
can_queue typically in the low thousands. And the block layer currently
caps at ~10K. Happy to take patches fixing this up, although I am a bit
worried about how much churn it will generate.

That said, I do think that cleaning this up is somewhat orthogonal to
the issue with storvsc. I suspect that allowing a huge amount of
concurrent outstanding commands is going to be detrimental to
performance for most workloads. And from that perspective I think that
the short->int fix, while valid given the type discrepancy, is just
treating the symptom.

Therefore I consider the short->int fix a workaround. And the proper fix
involves looking closely at things are scaled in the storvsc case. Which
I have noted that Michael is working on.

--
Martin K. Petersen Oracle Linux Engineering

2021-10-12 21:45:28

by Dexuan Cui

[permalink] [raw]
Subject: RE: [PATCH v3] scsi: core: Fix shost->cmd_per_lun calculation in scsi_add_host_with_dma()

> From: Martin K. Petersen <[email protected]>
> Sent: Tuesday, October 12, 2021 2:28 PM
> To: Dexuan Cui <[email protected]>
>
> > Regarding this patch, I'm not sure if it's a "workaround": if it's
> > incorrect to set a bigger-than-SHRT_MAX scsi_driver.can_queue value,
> > probably we should change scsi_driver.can_queue from "int" to "u16"?
>
> > BTW, I guess the "cmd_per_lun" should also be "u16" rather than
> > "short"?
>
> I agree that it would be nice to get all this cleaned up. Several,
> somewhat peculiar, 25-year old design choices.
>
> cmd_per_lun has traditionally been in the ballpark of low hundreds,
> can_queue typically in the low thousands. And the block layer currently
> caps at ~10K. Happy to take patches fixing this up, although I am a bit
> worried about how much churn it will generate.

Thanks for the explanation!

> That said, I do think that cleaning this up is somewhat orthogonal to
> the issue with storvsc. I suspect that allowing a huge amount of
> concurrent outstanding commands is going to be detrimental to
> performance for most workloads. And from that perspective I think that
> the short->int fix, while valid given the type discrepancy, is just
> treating the symptom.

Agreed.

> Therefore I consider the short->int fix a workaround. And the proper fix
> involves looking closely at things are scaled in the storvsc case. Which
> I have noted that Michael is working on.

Agreed. My v1 actually tried to work around the storvsc driver instread. :-)