2022-10-21 09:49:13

by Salvatore Bonaccorso

Subject: Report in downstream Debian: mpt3sas broken with xen dom0 with update to 5.10.149 in 5.10.y.

Hi,

We got the following report in Debian after an update from 5.10.140 to
the current 5.10.149. The full report is quoted below (from
https://bugs.debian.org/1022126). Does this ring a bell regarding any
known regressions?

On Thu, Oct 20, 2022 at 05:21:03PM +0200, Adi Kriegisch wrote:
> Package: linux-image-5.10.0-19-amd64
> Version: 5.10.149-1
> Severity: important
>
> Dear maintainers,
>
> with the upgrade to the latest bullseye kernel (5.10.149-1), our xen setup
> is unbootable due to swiotlb buffer errors:
> | sd 0:0:0:0: scsi_dma_map failed: request for 401408 bytes!
> and
> | mpt3sas 0000:01:00.0: swiotlb buffer is full (sz: 401408 bytes),
> | total 32768 (slots), used 0 (slots)
> (the byte sizes vary between boots).
>
> After reading bug #850425[1], we also tried to force 32bit mode in the
> mpt3sas driver by specifying a dom0 memory below 4G; this lets the machine
> boot, but almost immediately after that fails with the same error. Notable
> difference is that the used slots are 128.
>
> Xen commandline:
> dom0_mem=4096M,max:4096M dom0_max_vcpus=4 dom0_vcpus_pin
> ucode=scan xpti=dom0=false,domu=true gnttab_max_frames=128
>
> Using dom0-iommu=map-inclusive in some combinations with swiotlb on the
> kernel commandline gives us some used slots (way below 128) in the error
> message even in 64bit dma mode in the mpt3sas driver.
>
> The kernel works when booted without xen. We'd be more than happy to get
> pointers on how to fix that issue or patches to test!
>
> Thanks for your help!
>
> -- Adi
>
> [1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=850425

Regards,
Salvatore


2022-10-23 01:57:09

by Bart Van Assche

Subject: Re: Report in downstream Debian: mpt3sas broken with xen dom0 with update to 5.10.149 in 5.10.y.

On 10/21/22 02:22, Salvatore Bonaccorso wrote:
> We got the following report in Debian after an update from 5.10.140 to
> the current 5.10.149. Full quoting below (from
> https://bugs.debian.org/1022126). Does this ring some bell about known
> regressions?

Only three mpt3sas changes are new in v5.10.149 compared to v5.10.140:
$ git log --format=oneline v5.10.140..v5.10.149
2b9aba0c5d58e141e32bb1bb4c7cd91d19f075b8 scsi: mpt3sas: Fix return value check of dma_get_required_mask()
e7fafef9830c4a01e60f76e3860a9bef0262378d scsi: mpt3sas: Force PCIe scatterlist allocations to be within same 4 GB region
ea10a652ad2ae2cf3eced6f632a5c98f26727057 scsi: mpt3sas: Fix use-after-free warning

Sreekanth and Suganath, can you help with bisecting this issue? For the
full report, see also https://lore.kernel.org/linux-scsi/[email protected]/.

Thanks,

Bart.

2022-10-24 13:22:20

by Sreekanth Reddy

Subject: Re: Report in downstream Debian: mpt3sas broken with xen dom0 with update to 5.10.149 in 5.10.y.

On Sun, Oct 23, 2022 at 6:57 AM Bart Van Assche <[email protected]> wrote:
>
> On 10/21/22 02:22, Salvatore Bonaccorso wrote:
> > We got the following report in Debian after an update from 5.10.140 to
> > the current 5.10.149. Full quoting below (from
> > https://bugs.debian.org/1022126). Does this ring some bell about known
> > regressions?
>
> Only three mpt3sas changes are new in v5.10.149 compared to v5.10.140:
> $ git log --format=oneline v5.10.140..v5.10.149
> 2b9aba0c5d58e141e32bb1bb4c7cd91d19f075b8 scsi: mpt3sas: Fix return value check of dma_get_required_mask()
> e7fafef9830c4a01e60f76e3860a9bef0262378d scsi: mpt3sas: Force PCIe scatterlist allocations to be within same 4 GB region
> ea10a652ad2ae2cf3eced6f632a5c98f26727057 scsi: mpt3sas: Fix use-after-free warning
>
> Sreekanth and Suganath, can you help with bisecting this issue? For the
> full report, see also https://lore.kernel.org/linux-scsi/[email protected]/.

This issue is observed after the following patch was applied:
2b9aba0c5d58e141e32bb1bb4c7cd91d19f075b8 scsi: mpt3sas: Fix return
value check of dma_get_required_mask()

What is happening is that on the Xen hypervisor, the
dma_get_required_mask() API always returns a 32 bit DMA mask, i.e. it
says that the minimum DMA mask required to access the host memory is
32 bit, and hence the mpt3sas driver sets the DMA mask to 32 bit. On a
64 bit machine, if the driver sets the DMA mask to 32 bit, then
SWIOTLB's bounce buffers come into the picture during IOs. Since these
bounce buffers are limited in size, we observe IO hangs when large IOs
are issued.
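
To put rough numbers on the "limited in size" point, here is a small
userspace sketch of the slot arithmetic. It assumes the mainline swiotlb
constants of 2 KiB per slot (IO_TLB_SHIFT of 11) and at most 128
contiguous slots per single mapping (IO_TLB_SEGSIZE); the macro and
function names below are made up for illustration and are not kernel
APIs.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values, mirroring the mainline swiotlb constants. */
#define SLOT_SHIFT    11u   /* log2 of the slot size: 2 KiB per slot */
#define SEGMENT_SLOTS 128u  /* max contiguous slots for one mapping */

/* Number of slots needed to bounce a buffer of `bytes` bytes (rounded up). */
static size_t slots_needed(size_t bytes)
{
    size_t slot_size = (size_t)1 << SLOT_SHIFT;

    return (bytes + slot_size - 1) / slot_size;
}

/* Largest single mapping, in bytes, that one segment can hold. */
static size_t max_mapping_bytes(void)
{
    return (size_t)SEGMENT_SLOTS << SLOT_SHIFT;
}

/* Returns 1 if a single request of `bytes` bytes fits into one segment. */
static int fits_in_one_segment(size_t bytes)
{
    assert(bytes > 0);
    return slots_needed(bytes) <= SEGMENT_SLOTS;
}
```

Under these assumptions, the 401408-byte request from the report needs
196 slots, while a single mapping is capped at 128 slots (256 KiB). That
would be consistent with the "buffer is full" message appearing even
though the pool reports "used 0 (slots)".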

I am not sure whether this API's return value is correct or not in the
Xen environment. If it is correct then I have to modify the driver to
not use this API and directly set the DMA mask to 64 bit if the system
is a 64bit machine.

Thanks,
Sreekanth



>
> Thanks,
>
> Bart.



2022-10-24 15:27:58

by James Bottomley

Subject: Re: Report in downstream Debian: mpt3sas broken with xen dom0 with update to 5.10.149 in 5.10.y.

On Mon, 2022-10-24 at 17:26 +0530, Sreekanth Reddy wrote:
> On Sun, Oct 23, 2022 at 6:57 AM Bart Van Assche <[email protected]>
> wrote:
> > On 10/21/22 02:22, Salvatore Bonaccorso wrote:
> > > We got the following report in Debian after an update from
> > > 5.10.140 to
> > > the current 5.10.149. Full quoting below (from
> > > https://bugs.debian.org/1022126). Does this ring some bell about
> > > known
> > > regressions?
> >
> > Only three mpt3sas changes are new in v5.10.149 compared to
> > v5.10.140:
> > $ git log --format=oneline v5.10.140..v5.10.149
> > 2b9aba0c5d58e141e32bb1bb4c7cd91d19f075b8 scsi: mpt3sas: Fix return
> > value check of dma_get_required_mask()
> > e7fafef9830c4a01e60f76e3860a9bef0262378d scsi: mpt3sas: Force PCIe
> > scatterlist allocations to be within same 4 GB region
> > ea10a652ad2ae2cf3eced6f632a5c98f26727057 scsi: mpt3sas: Fix use-
> > after-free warning
> >
> > Sreekanth and Suganath, can you help with bisecting this issue? For
> > the
> > full report, see also
> > https://lore.kernel.org/linux-scsi/[email protected]/.
>
> This issue is getting observed after having the below patch changes,
> 2b9aba0c5d58e141e32bb1bb4c7cd91d19f075b8 scsi: mpt3sas: Fix return
> value check of dma_get_required_mask()
>
> What is happening is that on Xen hypervisor, this
> dma_get_required_mask() API always returns a 32 bit DMA mask. I.e. It
> says that the minimum DMA mask required to access the host memory is
> 32 bit and hence mpt3sas driver is setting the DMA mask to 32bit.

This sounds entirely correct because the VM is booted with (from the
original Debian bug report):

dom0_mem=4096M,max:4096M dom0_max_vcpus=4 dom0_vcpus_pin
ucode=scan xpti=dom0=false,domu=true gnttab_max_frames=128

So it has no memory above 4GB and thus 32 bit addressing is the minimum
required. If you boot a machine with >4GB and Xen still returns a 32
bit mask here, then we have a Xen problem.

> So, on a 64 bit machine, if the driver set's the DMA mask to 32 bit
> then SWIOTLB's bounce buffer comes into picture during IOs. Since
> these bounce buffers are limited in size and hence we observe the IO
> hang if the large IOs are issued.

Why is the SWIOTLB active if all the physical memory in the VM is
within the range of the DMA mask? If this is really happening, it
sounds like a SWIOTLB bug.

> I am not sure whether this API's return value is correct or not in
> the Xen environment. If it is correct then I have to modify the
> driver to not use this API and directly set the DMA mask to 64 bit if
> the system is a 64bit machine.

The original design of the API is to describe exactly the minimum
direct DMA requirements. There are a large number of cards with
multiple DMA register formats, the most common being to use either a
compact 32 bit or an expanded 64 bit register to describe a page
location. The former gives 39 bits of addressing and the latter 64.
If the DMA mask is 39 bits or below as described by this API, then the
card can use the compact address form.
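
As a minimal sketch of that intended use, assuming a hypothetical card
with the two descriptor formats described above: the driver asks for the
required mask once and picks the compact form only if 39 bits are
enough. The enum and function names are invented for illustration; only
the DMA_BIT_MASK() definition mirrors the kernel macro.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's DMA_BIT_MASK(n), for 1 <= n <= 64. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical descriptor formats of an imaginary card. */
enum desc_format {
    DESC_COMPACT_32,   /* compact register, reaches 39 bits of address */
    DESC_EXPANDED_64   /* expanded register, reaches 64 bits of address */
};

/*
 * Pick the cheapest descriptor format that can still reach all memory.
 * `required_mask` is what dma_get_required_mask() would report.
 */
static enum desc_format pick_desc_format(uint64_t required_mask)
{
    assert(required_mask != 0);
    return required_mask <= DMA_BIT_MASK(39) ? DESC_COMPACT_32
                                             : DESC_EXPANDED_64;
}
```

Note that the comparison is against a mask value, not against a bit
count.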

James


2022-10-24 15:36:25

by Juergen Gross

Subject: Re: Report in downstream Debian: mpt3sas broken with xen dom0 with update to 5.10.149 in 5.10.y.

On 24.10.22 13:56, Sreekanth Reddy wrote:
> On Sun, Oct 23, 2022 at 6:57 AM Bart Van Assche <[email protected]> wrote:
>>
>> On 10/21/22 02:22, Salvatore Bonaccorso wrote:
>>> We got the following report in Debian after an update from 5.10.140 to
>>> the current 5.10.149. Full quoting below (from
>>> https://bugs.debian.org/1022126). Does this ring some bell about known
>>> regressions?
>>
>> Only three mpt3sas changes are new in v5.10.149 compared to v5.10.140:
>> $ git log --format=oneline v5.10.140..v5.10.149
>> 2b9aba0c5d58e141e32bb1bb4c7cd91d19f075b8 scsi: mpt3sas: Fix return value check of dma_get_required_mask()
>> e7fafef9830c4a01e60f76e3860a9bef0262378d scsi: mpt3sas: Force PCIe scatterlist allocations to be within same 4 GB region
>> ea10a652ad2ae2cf3eced6f632a5c98f26727057 scsi: mpt3sas: Fix use-after-free warning
>>
>> Sreekanth and Suganath, can you help with bisecting this issue? For the
>> full report, see also https://lore.kernel.org/linux-scsi/[email protected]/.
>
> This issue is getting observed after having the below patch changes,
> 2b9aba0c5d58e141e32bb1bb4c7cd91d19f075b8 scsi: mpt3sas: Fix return
> value check of dma_get_required_mask()
>
> What is happening is that on Xen hypervisor, this
> dma_get_required_mask() API always returns a 32 bit DMA mask. I.e. It
> says that the minimum DMA mask required to access the host memory is
> 32 bit and hence mpt3sas driver is setting the DMA mask to 32bit. So,
> on a 64 bit machine, if the driver set's the DMA mask to 32 bit then
> SWIOTLB's bounce buffer comes into picture during IOs. Since these
> bounce buffers are limited in size and hence we observe the IO hang if
> the large IOs are issued.
>
> I am not sure whether this API's return value is correct or not in the
> Xen environment. If it is correct then I have to modify the driver to
> not use this API and directly set the DMA mask to 64 bit if the system
> is a 64bit machine.

Please recheck the backported patch in 5.10.y. It is _wrong_. The backport
has:

--- a/drivers/scsi/mpt3sas/mpt3sas_base.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.c
@@ -2993,7 +2993,7 @@ _base_config_dma_addressing(struct MPT3SAS_ADAPTER *ioc, struct pci_dev *pdev)

        if (ioc->is_mcpu_endpoint ||
            sizeof(dma_addr_t) == 4 || ioc->use_32bit_dma ||
-           dma_get_required_mask(&pdev->dev) <= 32)
+           dma_get_required_mask(&pdev->dev) <= DMA_BIT_MASK(32))
                ioc->dma_mask = 32;
        /* Set 63 bit DMA mask for all SAS3 and SAS35 controllers */
        else if (ioc->hba_mpi_version_belonged > MPI2_VERSION)

While the upstream patch has:

+       if (ioc->is_mcpu_endpoint || sizeof(dma_addr_t) == 4 ||
+           dma_get_required_mask(&pdev->dev) <= 32) {
                ioc->dma_mask = 32;
+               coherent_dma_mask = dma_mask = DMA_BIT_MASK(32);


Juergen



2022-10-24 16:54:39

by Juergen Gross

Subject: Re: Report in downstream Debian: mpt3sas broken with xen dom0 with update to 5.10.149 in 5.10.y.

On 24.10.22 14:51, James Bottomley wrote:
> On Mon, 2022-10-24 at 17:26 +0530, Sreekanth Reddy wrote:
>> On Sun, Oct 23, 2022 at 6:57 AM Bart Van Assche <[email protected]>
>> wrote:
>>> On 10/21/22 02:22, Salvatore Bonaccorso wrote:
>>>> We got the following report in Debian after an update from
>>>> 5.10.140 to
>>>> the current 5.10.149. Full quoting below (from
>>>> https://bugs.debian.org/1022126). Does this ring some bell about
>>>> known
>>>> regressions?
>>>
>>> Only three mpt3sas changes are new in v5.10.149 compared to
>>> v5.10.140:
>>> $ git log --format=oneline v5.10.140..v5.10.149
>>> 2b9aba0c5d58e141e32bb1bb4c7cd91d19f075b8 scsi: mpt3sas: Fix return
>>> value check of dma_get_required_mask()
>>> e7fafef9830c4a01e60f76e3860a9bef0262378d scsi: mpt3sas: Force PCIe
>>> scatterlist allocations to be within same 4 GB region
>>> ea10a652ad2ae2cf3eced6f632a5c98f26727057 scsi: mpt3sas: Fix use-
>>> after-free warning
>>>
>>> Sreekanth and Suganath, can you help with bisecting this issue? For
>>> the
>>> full report, see also
>>> https://lore.kernel.org/linux-scsi/[email protected]/.
>>
>> This issue is getting observed after having the below patch changes,
>> 2b9aba0c5d58e141e32bb1bb4c7cd91d19f075b8 scsi: mpt3sas: Fix return
>> value check of dma_get_required_mask()
>>
>> What is happening is that on Xen hypervisor, this
>> dma_get_required_mask() API always returns a 32 bit DMA mask. I.e. It
>> says that the minimum DMA mask required to access the host memory is
>> 32 bit and hence mpt3sas driver is setting the DMA mask to 32bit.
>
> This sounds entirely correct because the VM is booted with (from the
> original debian bug report):
>
> dom0_mem=4096M,max:4096M dom0_max_vcpus=4 dom0_vcpus_pin
> ucode=scan xpti=dom0=false,domu=true gnttab_max_frames=128
>
> So it has no memory above 4GB and thus 32 bit addressing is the minimum
> required. If you boot a machine with >4GB and Xen still returns a 32
> bit mask here, then we have a Xen problem.

Dom0 is (normally) a PV domain, so the physical memory can be still above
4 GB even with dom0_mem set to 4GB.

But please see my other reply to this thread. I think at least one of the
backports is plain wrong.


Juergen



2022-10-24 20:55:03

by Christoph Hellwig

Subject: Re: Report in downstream Debian: mpt3sas broken with xen dom0 with update to 5.10.149 in 5.10.y.

On Mon, Oct 24, 2022 at 05:28:05PM +0000, Andrew Cooper wrote:
> I don't know exactly how this translates to Linux internals, but most
> devices are fine and it's routinely the mpt2/3sas drivers which
> encounter problems.  It would be lovely if we could get to the bottom of
> this for once and for all.

So to summarize my two mails: I think the use of dma_get_required_mask
in mpt3sas is wrong, and the dma_get_required_mask return value from
xen-swiotlb is also wrong. Fixing either one should fix this problem,
and I think we should fix both.

2022-10-24 21:06:50

by Christoph Hellwig

Subject: Re: Report in downstream Debian: mpt3sas broken with xen dom0 with update to 5.10.149 in 5.10.y.

On Mon, Oct 24, 2022 at 03:20:43PM +0200, Juergen Gross wrote:
> Dom0 is (normally) a PV domain, so the physical memory can be still above
> 4 GB even with dom0_mem set to 4GB.

Which means that we need to ensure that the DMA ops for Xen-PV (which is
always xen-swiotlb I think?) return DMA_BIT_MASK(64) or whatever
is the highest possible address.
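
A toy userspace model of why the guest's memory size is not the right
input here. It assumes a PV-style table that maps each guest frame to an
arbitrary machine frame; the table, the function names, and the 4 KiB
page size are illustrative assumptions, and this is not how xen-swiotlb
is implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12u  /* assumed 4 KiB pages */

/* Smallest all-ones mask (2^n - 1) that is >= addr. */
static uint64_t mask_covering(uint64_t addr)
{
    uint64_t mask = 0;

    while (mask < addr)
        mask = (mask << 1) | 1;
    return mask;
}

/*
 * Required mask derived from where the guest's pages really live:
 * the last byte of the highest machine frame in the table.
 */
static uint64_t required_mask_from_mfns(const uint64_t *mfns, size_t count)
{
    uint64_t highest = 0;

    assert(mfns != NULL && count > 0);
    for (size_t i = 0; i < count; i++)
        if (mfns[i] > highest)
            highest = mfns[i];
    return mask_covering(((highest + 1) << PAGE_SHIFT) - 1);
}

/* Naive estimate that only looks at how many pages the guest owns. */
static uint64_t required_mask_from_size(size_t count)
{
    assert(count > 0);
    return mask_covering(((uint64_t)count << PAGE_SHIFT) - 1);
}
```

In this model a guest with 4 GB of memory yields a 32 bit mask from the
size-based estimate, while a single page backed by a machine frame at
6 GB already pushes the frame-based result to 33 bits.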

2022-10-24 21:07:38

by Christoph Hellwig

Subject: Re: Report in downstream Debian: mpt3sas broken with xen dom0 with update to 5.10.149 in 5.10.y.

On Mon, Oct 24, 2022 at 05:26:44PM +0530, Sreekanth Reddy wrote:
> This issue is getting observed after having the below patch changes,
> 2b9aba0c5d58e141e32bb1bb4c7cd91d19f075b8 scsi: mpt3sas: Fix return
> value check of dma_get_required_mask()

Looking at this commit it seems odd. dma_get_required_mask() should
only be used as an optimization for hardware that actually benefits
from a lower DMA Mask. That means either classic PCI that requires
DAC cycles, or firmware architectures like aic7xxx that do need
additional overhead. I don't think either is the case for mpt3sas,
so I think (in addition to fixing up the Xen required mask), mpt3sas
should do something like:

diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c b/drivers/scsi/mpt3sas/mpt3sas_base.c
index 4e981ccaac4163..295942a8989780 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.c
@@ -2992,8 +2992,7 @@ _base_config_dma_addressing(struct MPT3SAS_ADAPTER *ioc, struct pci_dev *pdev)
        struct sysinfo s;
        u64 coherent_dma_mask, dma_mask;

-       if (ioc->is_mcpu_endpoint || sizeof(dma_addr_t) == 4 ||
-           dma_get_required_mask(&pdev->dev) <= DMA_BIT_MASK(32)) {
+       if (ioc->is_mcpu_endpoint) {
                ioc->dma_mask = 32;
                coherent_dma_mask = dma_mask = DMA_BIT_MASK(32);
        /* Set 63 bit DMA mask for all SAS3 and SAS35 controllers */
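
For illustration, the mask selection that would result from the change
above can be modelled in userspace roughly as follows. The struct is a
simplified, hypothetical stand-in for the adapter state, and the final
64 bit fallback is an assumption, since that branch is not shown in this
thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the adapter state in the diff. */
struct adapter_model {
    bool is_mcpu_endpoint;  /* mirrors ioc->is_mcpu_endpoint */
    bool newer_than_mpi2;   /* stands in for
                             * hba_mpi_version_belonged > MPI2_VERSION */
};

/* Returns the DMA mask width in bits that the driver would request. */
static unsigned int pick_dma_bits(const struct adapter_model *a)
{
    assert(a != NULL);
    if (a->is_mcpu_endpoint)
        return 32;
    if (a->newer_than_mpi2)
        return 63;  /* "Set 63 bit DMA mask for all SAS3 and SAS35" */
    return 64;      /* assumed fallback, not shown in the thread */
}
```

The point of the sketch is that the result no longer depends on
dma_get_required_mask() at all, so a too-small value reported under Xen
could not force the driver into 32 bit mode.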

2022-10-24 21:34:49

by Adi Kriegisch

Subject: Re: Report in downstream Debian: mpt3sas broken with xen dom0 with update to 5.10.149 in 5.10.y.

Hi,


On Mon, Oct 24, 2022 at 05:26:44PM +0530, Sreekanth Reddy wrote:
> On Sun, Oct 23, 2022 at 6:57 AM Bart Van Assche <[email protected]> wrote:
> >
> > On 10/21/22 02:22, Salvatore Bonaccorso wrote:
> > > We got the following report in Debian after an update from 5.10.140 to
> > > the current 5.10.149. Full quoting below (from
> > > https://bugs.debian.org/1022126). Does this ring some bell about known
> > > regressions?
> >
> > Only three mpt3sas changes are new in v5.10.149 compared to v5.10.140:
> > $ git log --format=oneline v5.10.140..v5.10.149
> > 2b9aba0c5d58e141e32bb1bb4c7cd91d19f075b8 scsi: mpt3sas: Fix return value check of dma_get_required_mask()
> > e7fafef9830c4a01e60f76e3860a9bef0262378d scsi: mpt3sas: Force PCIe scatterlist allocations to be within same 4 GB region
> > ea10a652ad2ae2cf3eced6f632a5c98f26727057 scsi: mpt3sas: Fix use-after-free warning
> >
> > Sreekanth and Suganath, can you help with bisecting this issue? For the
> > full report, see also https://lore.kernel.org/linux-scsi/[email protected]/.
>
> This issue is getting observed after having the below patch changes,
> 2b9aba0c5d58e141e32bb1bb4c7cd91d19f075b8 scsi: mpt3sas: Fix return
> value check of dma_get_required_mask()

Reverting this patch fixed the issue for us.

> What is happening is that on Xen hypervisor, this
> dma_get_required_mask() API always returns a 32 bit DMA mask. I.e. It
> says that the minimum DMA mask required to access the host memory is
> 32 bit and hence mpt3sas driver is setting the DMA mask to 32bit. So,
> on a 64 bit machine, if the driver set's the DMA mask to 32 bit then
> SWIOTLB's bounce buffer comes into picture during IOs. Since these
> bounce buffers are limited in size and hence we observe the IO hang if
> the large IOs are issued.

I am not sure what is going on here, but while reverting the above
patch, I added a printk right above the if:
| printk("dma_get_required_mask =%lld\n",dma_get_required_mask(&pdev->dev));
It always prints 4294967295 (DMA_BIT_MASK(32)), whether booting dom0 with
2GB, 4GB or 16GB, or booting bare metal with 256GB.
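
To make the effect of the two comparisons concrete for that value, here
is a small userspace sketch. Only the DMA_BIT_MASK() definition mirrors
the kernel macro; the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's DMA_BIT_MASK(n), for 1 <= n <= 64. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Check before the patch: compares the mask against the number 32. */
static int wants_32bit_old(uint64_t required_mask)
{
    return required_mask <= 32;
}

/* Check after the patch: compares the mask against a 32 bit mask. */
static int wants_32bit_new(uint64_t required_mask)
{
    return required_mask <= DMA_BIT_MASK(32);
}

/* Width in bits of a mask of the form 2^n - 1. */
static unsigned int mask_bits(uint64_t mask)
{
    unsigned int bits = 0;

    assert((mask & (mask + 1)) == 0);  /* must be contiguous low bits */
    while (mask) {
        bits++;
        mask >>= 1;
    }
    return bits;
}
```

With the printed value 4294967295, the old check is false, so the driver
went on to pick a wider mask, while the new check is true and selects
the 32 bit mask. That matches the observation that reverting the patch
avoids the problem.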

> I am not sure whether this API's return value is correct or not in the
> Xen environment. If it is correct then I have to modify the driver to
> not use this API and directly set the DMA mask to 64 bit if the system
> is a 64bit machine.

Obviously, our server always reports a 32bit mask, which works just fine in
the bare metal case but not in the Xen dom0 case. Is there anything I
can do to help track the issue down?

Thanks and all the best,
Adi

> Thanks,
> Sreekanth
>
>
>
> >
> > Thanks,
> >
> > Bart.




2022-10-24 22:30:49

by James Bottomley

Subject: Re: Report in downstream Debian: mpt3sas broken with xen dom0 with update to 5.10.149 in 5.10.y.

On Mon, 2022-10-24 at 14:55 +0200, Juergen Gross wrote:
[...]
> Please recheck the backported patch in 5.10.y. It is _wrong_. The
> backport
> has:
>
> --- a/drivers/scsi/mpt3sas/mpt3sas_base.c
> +++ b/drivers/scsi/mpt3sas/mpt3sas_base.c
> @@ -2993,7 +2993,7 @@ _base_config_dma_addressing(struct
> MPT3SAS_ADAPTER *ioc,
> struct pci_dev *pdev)
>
>         if (ioc->is_mcpu_endpoint ||
>             sizeof(dma_addr_t) == 4 || ioc->use_32bit_dma ||
> -           dma_get_required_mask(&pdev->dev) <= 32)
> +           dma_get_required_mask(&pdev->dev) <= DMA_BIT_MASK(32))
>                 ioc->dma_mask = 32;
>         /* Set 63 bit DMA mask for all SAS3 and SAS35 controllers */
>         else if (ioc->hba_mpi_version_belonged > MPI2_VERSION)
>
> While the upstream patch has:
>
> +       if (ioc->is_mcpu_endpoint || sizeof(dma_addr_t) == 4 ||
> +           dma_get_required_mask(&pdev->dev) <= 32) {
>                 ioc->dma_mask = 32;
> +               coherent_dma_mask = dma_mask = DMA_BIT_MASK(32);
>

That's not the upstream commit. The quoted upstream patch in the
stable commit is:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e0e0747de0ea3dd87cdbb0393311e17471a9baf1

Which has the same diff as you quoted for the stable tree.

James



2022-10-25 04:47:48

by Juergen Gross

Subject: Re: Report in downstream Debian: mpt3sas broken with xen dom0 with update to 5.10.149 in 5.10.y.

On 24.10.22 20:50, James Bottomley wrote:
> On Mon, 2022-10-24 at 14:55 +0200, Juergen Gross wrote:
> [...]
>> Please recheck the backported patch in 5.10.y. It is _wrong_. The
>> backport
>> has:
>>
>> --- a/drivers/scsi/mpt3sas/mpt3sas_base.c
>> +++ b/drivers/scsi/mpt3sas/mpt3sas_base.c
>> @@ -2993,7 +2993,7 @@ _base_config_dma_addressing(struct
>> MPT3SAS_ADAPTER *ioc,
>> struct pci_dev *pdev)
>>
>>         if (ioc->is_mcpu_endpoint ||
>>             sizeof(dma_addr_t) == 4 || ioc->use_32bit_dma ||
>> -           dma_get_required_mask(&pdev->dev) <= 32)
>> +           dma_get_required_mask(&pdev->dev) <= DMA_BIT_MASK(32))
>>                 ioc->dma_mask = 32;
>>         /* Set 63 bit DMA mask for all SAS3 and SAS35 controllers */
>>         else if (ioc->hba_mpi_version_belonged > MPI2_VERSION)
>>
>> While the upstream patch has:
>>
>> +       if (ioc->is_mcpu_endpoint || sizeof(dma_addr_t) == 4 ||
>> +           dma_get_required_mask(&pdev->dev) <= 32) {
>>                 ioc->dma_mask = 32;
>> +               coherent_dma_mask = dma_mask = DMA_BIT_MASK(32);
>>
>
> That's not the upstream commit. The quoted upstream patch in the
> stable commit is:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e0e0747de0ea3dd87cdbb0393311e17471a9baf1
>
> Which has the same diff as you quoted for the stable tree.

Hmm, something is fishy here.

Looking at the source in current HEAD of Linus' tree shows me:

static int
_base_config_dma_addressing(struct MPT3SAS_ADAPTER *ioc, struct pci_dev *pdev)
{
        struct sysinfo s;
        u64 coherent_dma_mask, dma_mask;

        if (ioc->is_mcpu_endpoint || sizeof(dma_addr_t) == 4 ||
            dma_get_required_mask(&pdev->dev) <= 32) {
                ioc->dma_mask = 32;
                coherent_dma_mask = dma_mask = DMA_BIT_MASK(32);
        /* Set 63 bit DMA mask for all SAS3 and SAS35 controllers */

This doesn't look like commit e0e0747de0ea3dd has been applied.

git log drivers/scsi/mpt3sas/mpt3sas_base.c doesn't show it either.

git name-rev e0e0747de0ea3dd tells me it should be in v6.0-rc7.


Juergen



2022-10-25 07:32:55

by Juergen Gross

Subject: Re: Report in downstream Debian: mpt3sas broken with xen dom0 with update to 5.10.149 in 5.10.y.

On 24.10.22 14:55, Juergen Gross wrote:
> On 24.10.22 13:56, Sreekanth Reddy wrote:
>> On Sun, Oct 23, 2022 at 6:57 AM Bart Van Assche <[email protected]> wrote:
>>>
>>> On 10/21/22 02:22, Salvatore Bonaccorso wrote:
>>>> We got the following report in Debian after an update from 5.10.140 to
>>>> the current 5.10.149. Full quoting below (from
>>>> https://bugs.debian.org/1022126). Does this ring some bell about known
>>>> regressions?
>>>
>>> Only three mpt3sas changes are new in v5.10.149 compared to v5.10.140:
>>> $ git log --format=oneline v5.10.140..v5.10.149
>>> 2b9aba0c5d58e141e32bb1bb4c7cd91d19f075b8 scsi: mpt3sas: Fix return value
>>> check of dma_get_required_mask()
>>> e7fafef9830c4a01e60f76e3860a9bef0262378d scsi: mpt3sas: Force PCIe
>>> scatterlist allocations to be within same 4 GB region
>>> ea10a652ad2ae2cf3eced6f632a5c98f26727057 scsi: mpt3sas: Fix use-after-free
>>> warning
>>>
>>> Sreekanth and Suganath, can you help with bisecting this issue? For the
>>> full report, see also
>>> https://lore.kernel.org/linux-scsi/[email protected]/.
>>
>> This issue is getting observed after having the below patch changes,
>> 2b9aba0c5d58e141e32bb1bb4c7cd91d19f075b8 scsi: mpt3sas: Fix return
>> value check of dma_get_required_mask()
>>
>> What is happening is that on Xen hypervisor, this
>> dma_get_required_mask() API always returns a 32 bit DMA mask. I.e. It
>> says that the minimum DMA mask required to access the host memory is
>> 32 bit and hence mpt3sas driver is setting the DMA mask to 32bit. So,
>> on a 64 bit machine, if the driver set's the DMA mask to 32 bit then
>> SWIOTLB's bounce buffer comes into picture during IOs. Since these
>> bounce buffers are limited in size and hence we observe the IO hang if
>> the large IOs are issued.
>>
>> I am not sure whether this API's return value is correct or not in the
>> Xen environment. If it is correct then I have to modify the driver to
>> not use this API and directly set the DMA mask to 64 bit if the system
>> is a 64bit machine.
>
> Please recheck the backported patch in 5.10.y. It is _wrong_. The backport
> has:
>
> --- a/drivers/scsi/mpt3sas/mpt3sas_base.c
> +++ b/drivers/scsi/mpt3sas/mpt3sas_base.c
> @@ -2993,7 +2993,7 @@ _base_config_dma_addressing(struct MPT3SAS_ADAPTER *ioc,
> struct pci_dev *pdev)
>
>         if (ioc->is_mcpu_endpoint ||
>             sizeof(dma_addr_t) == 4 || ioc->use_32bit_dma ||
> -           dma_get_required_mask(&pdev->dev) <= 32)
> +           dma_get_required_mask(&pdev->dev) <= DMA_BIT_MASK(32))
>                 ioc->dma_mask = 32;
>         /* Set 63 bit DMA mask for all SAS3 and SAS35 controllers */
>         else if (ioc->hba_mpi_version_belonged > MPI2_VERSION)
>
> While the upstream patch has:
>
> +       if (ioc->is_mcpu_endpoint || sizeof(dma_addr_t) == 4 ||
> +           dma_get_required_mask(&pdev->dev) <= 32) {
>                 ioc->dma_mask = 32;
> +               coherent_dma_mask = dma_mask = DMA_BIT_MASK(32);

Sorry for this mistake of mine, which seems to have been caused by a git
inconsistency, as the upstream source is still showing the line

dma_get_required_mask(&pdev->dev) <= 32

I didn't double check which upstream patch was referenced by the backport
patch, but looked at the output of "git blame" to look at the last patch
older than the backport changing the line in question.

I didn't even think of the possibility that git could be wrong.


Juergen



2023-03-14 11:24:10

by Juergen Gross

Subject: Re: Report in downstream Debian: mpt3sas broken with xen dom0 with update to 5.10.149 in 5.10.y.

On 14.03.23 12:17, Joost Roeleveld wrote:
>> On Mon, Oct 24, 2022 at 05:28:05PM +0000, Andrew Cooper wrote:
>>> I don't know exactly how this translates to Linux internals, but most
>>> devices are fine and it's routinely the mpt2/3sas drivers which
>>> encounter problems.  It would be lovely if we could get to the bottom of
>>> this for once and for all.
>>
>> So to summarize my two mails: I think te use of dma_get_required_mask
>> in mpt3sas is wrong, and the dma_get_required_mask return value from
>> xen-swiotlb is also wrong. Fixing either one should fix this problem,
>> and I think we should fix both.
>
> Hi all,
>
> Is there anything that can be done to get this fixed in xen and the kernel somehow?
> I keep having to manually patch the mpt3sas driver to prevent it from being able
> to switch to 32bit mode each time I upgrade the kernel.

Just use 5.10.173.


Juergen



2023-03-14 11:44:19

by Andrew Cooper

Subject: Re: Report in downstream Debian: mpt3sas broken with xen dom0 with update to 5.10.149 in 5.10.y.

On 14/03/2023 11:23 am, Juergen Gross wrote:
> On 14.03.23 12:17, Joost Roeleveld wrote:
>>> On Mon, Oct 24, 2022 at 05:28:05PM +0000, Andrew Cooper wrote:
>>>> I don't know exactly how this translates to Linux internals, but most
>>>> devices are fine and it's routinely the mpt2/3sas drivers which
>>>> encounter problems.  It would be lovely if we could get to the
>>>> bottom of
>>>> this for once and for all.
>>>
>>> So to summarize my two mails: I think te use of dma_get_required_mask
>>> in mpt3sas is wrong, and the dma_get_required_mask return value from
>>> xen-swiotlb is also wrong. Fixing either one should fix this problem,
>>> and I think we should fix both.
>>
>> Hi all,
>>
>> Is there anything that can be done to get this fixed in xen and the
>> kernel somehow?
>> I keep having to manually patch the mpt3sas driver to prevent it from
>> being able to switch to 32bit mode each time I upgrade the kernel.
>
> Just use 5.10.173.

It's still broken even in 6.1.

We've just stumbled over this yet again in XenServer.

~Andrew

2023-03-14 12:08:32

by Salvatore Bonaccorso

Subject: Re: Report in downstream Debian: mpt3sas broken with xen dom0 with update to 5.10.149 in 5.10.y.

Hi Andrew,

On Tue, Mar 14, 2023 at 11:41:08AM +0000, Andrew Cooper wrote:
> On 14/03/2023 11:23 am, Juergen Gross wrote:
> > On 14.03.23 12:17, Joost Roeleveld wrote:
> >>> On Mon, Oct 24, 2022 at 05:28:05PM +0000, Andrew Cooper wrote:
> >>>> I don't know exactly how this translates to Linux internals, but most
> >>>> devices are fine and it's routinely the mpt2/3sas drivers which
> > >>>> encounter problems.  It would be lovely if we could get to the
> >>>> bottom of
> >>>> this for once and for all.
> >>>
> >>> So to summarize my two mails: I think te use of dma_get_required_mask
> >>> in mpt3sas is wrong, and the dma_get_required_mask return value from
> >>> xen-swiotlb is also wrong. Fixing either one should fix this problem,
> >>> and I think we should fix both.
> >>
> >> Hi all,
> >>
> >> Is there anything that can be done to get this fixed in xen and the
> >> kernel somehow?
> >> I keep having to manually patch the mpt3sas driver to prevent it from
> >> being able to switch to 32bit mode each time I upgrade the kernel.
> >
> > Just use 5.10.173.
>
> Its still broken even in 6.1
>
> We've just stumbled over this yet again in XenServer.

The mentioned commit has been backported to 6.1.16, 5.15.100 and
5.10.173. Is the issue still reproducible with 6.1.16?

Regards,
Salvatore