2020-04-17 18:50:22

by Alan Mikhak

[permalink] [raw]
Subject: [PATCH] genirq/msi: Check null pointer before copying struct msi_msg

From: Alan Mikhak <[email protected]>

Modify __get_cached_msi_msg() to check both pointers for null before
copying the contents from the struct msi_msg pointer to the pointer
provided by caller.

Without this sanity check, __get_cached_msi_msg() crashes when invoked by
dw_edma_irq_request() in drivers/dma/dw-edma/dw-edma-core.c running on a
Linux-based PCIe endpoint device. MSI interrupt are not received by PCIe
endpoint devices. As a result, irq_get_msi_desc() returns null since there
are no cached struct msi_msg entry on the endpoint side.

Signed-off-by: Alan Mikhak <[email protected]>
---
kernel/irq/msi.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index eb95f6106a1e..f39d42ef0d50 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -58,7 +58,8 @@ void free_msi_entry(struct msi_desc *entry)

void __get_cached_msi_msg(struct msi_desc *entry, struct msi_msg *msg)
{
- *msg = entry->msg;
+ if (entry && msg)
+ *msg = entry->msg;
}

void get_cached_msi_msg(unsigned int irq, struct msi_msg *msg)
--
2.7.4


2020-04-18 11:23:29

by Marc Zyngier

[permalink] [raw]
Subject: Re: [PATCH] genirq/msi: Check null pointer before copying struct msi_msg

On Fri, 17 Apr 2020 11:48:42 -0700
Alan Mikhak <[email protected]> wrote:

Hi Alan,

> From: Alan Mikhak <[email protected]>
>
> Modify __get_cached_msi_msg() to check both pointers for null before
> copying the contents from the struct msi_msg pointer to the pointer
> provided by caller.
>
> Without this sanity check, __get_cached_msi_msg() crashes when invoked by
> dw_edma_irq_request() in drivers/dma/dw-edma/dw-edma-core.c running on a
> Linux-based PCIe endpoint device. MSI interrupt are not received by PCIe
> endpoint devices. As a result, irq_get_msi_desc() returns null since there
> are no cached struct msi_msg entry on the endpoint side.
>
> Signed-off-by: Alan Mikhak <[email protected]>
> ---
> kernel/irq/msi.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
> index eb95f6106a1e..f39d42ef0d50 100644
> --- a/kernel/irq/msi.c
> +++ b/kernel/irq/msi.c
> @@ -58,7 +58,8 @@ void free_msi_entry(struct msi_desc *entry)
>
> void __get_cached_msi_msg(struct msi_desc *entry, struct msi_msg *msg)
> {
> - *msg = entry->msg;
> + if (entry && msg)
> + *msg = entry->msg;
> }
>
> void get_cached_msi_msg(unsigned int irq, struct msi_msg *msg)

I'm not convinced by this. If you know that, by construction, these
interrupts are not associated with an underlying MSI, why calling
get_cached_msi_msg() the first place?

There seem to be some assumptions in the DW EDMA driver that the
signaling would be MSI based, so maybe someone from Synopsys (Gustavo?)
could clarify that. From my own perspective, running on an endpoint
device means that it is *generating* interrupts, and I'm not sure what
the MSIs represent here.

Thanks,

M.
--
Jazz is not dead. It just smells funny...

2020-04-18 15:21:56

by Gustavo Pimentel

[permalink] [raw]
Subject: RE: [PATCH] genirq/msi: Check null pointer before copying struct msi_msg

Hi Marc and Alan,

> I'm not convinced by this. If you know that, by construction, these
> interrupts are not associated with an underlying MSI, why calling
> get_cached_msi_msg() the first place?
>
> There seem to be some assumptions in the DW EDMA driver that the
> signaling would be MSI based, so maybe someone from Synopsys (Gustavo?)
> could clarify that. From my own perspective, running on an endpoint
> device means that it is *generating* interrupts, and I'm not sure what
> the MSIs represent here.

Giving a little context to this topic.

The eDMA IP present on the Synopsys DesignWare PCIe Endpoints can be
configured and triggered *remotely* as well *locally*.
For the sake of simplicity let's assume for now the eDMA was implemented
on the EP and that is the IP that we want to configure and use.

When I say *remotely* I mean that this IP can be configurable through the
RC/CPU side, however, for that, it requires the eDMA registers to be
exposed through a PCIe BAR on the EP. This will allow setting the SAR,
DAR and other settings, also need(s) the interrupt(s) address(es) to be
set as well (MSI or MSI-X only) so that it can signal through PCIe (to
the RC and consecutively the associated EP driver) if the data transfer
has been completed, aborted or if the Linked List consumer algorithm has
passed in some linked element marked with a watermark.

It was based on this case that the eDMA driver was exclusively developed.

However, Alan, wants to expand a little more this, by being able to use
this driver on the EP side (through
pcitest/pci_endpoint_test/pci_epf_test) so that he can configure this IP
*locally*.
In fact, when doing this, he doesn't need to configure the interrupt
address (MSI or MSI-X), because this IP provides a local interrupt line
so that be connected to other blocks on the EP side.

Regards,
Gustavo

2020-04-20 09:17:57

by Marc Zyngier

[permalink] [raw]
Subject: Re: [PATCH] genirq/msi: Check null pointer before copying struct msi_msg

On 2020-04-18 16:19, Gustavo Pimentel wrote:
> Hi Marc and Alan,
>
>> I'm not convinced by this. If you know that, by construction, these
>> interrupts are not associated with an underlying MSI, why calling
>> get_cached_msi_msg() the first place?
>>
>> There seem to be some assumptions in the DW EDMA driver that the
>> signaling would be MSI based, so maybe someone from Synopsys
>> (Gustavo?)
>> could clarify that. From my own perspective, running on an endpoint
>> device means that it is *generating* interrupts, and I'm not sure what
>> the MSIs represent here.
>
> Giving a little context to this topic.
>
> The eDMA IP present on the Synopsys DesignWare PCIe Endpoints can be
> configured and triggered *remotely* as well *locally*.
> For the sake of simplicity let's assume for now the eDMA was
> implemented
> on the EP and that is the IP that we want to configure and use.
>
> When I say *remotely* I mean that this IP can be configurable through
> the
> RC/CPU side, however, for that, it requires the eDMA registers to be
> exposed through a PCIe BAR on the EP. This will allow setting the SAR,
> DAR and other settings, also need(s) the interrupt(s) address(es) to be
> set as well (MSI or MSI-X only) so that it can signal through PCIe (to
> the RC and consecutively the associated EP driver) if the data transfer
> has been completed, aborted or if the Linked List consumer algorithm
> has
> passed in some linked element marked with a watermark.
>
> It was based on this case that the eDMA driver was exclusively
> developed.
>
> However, Alan, wants to expand a little more this, by being able to use
> this driver on the EP side (through
> pcitest/pci_endpoint_test/pci_epf_test) so that he can configure this
> IP
> *locally*.
> In fact, when doing this, he doesn't need to configure the interrupt
> address (MSI or MSI-X), because this IP provides a local interrupt line
> so that be connected to other blocks on the EP side.

Right, so this confirms my hunch that the driver is being used in
a way that doesn't reflect the expected use case. Rather than
papering over the problem by hacking the core code, I'd rather see
the eDMA driver be updated to support both host and endpoint cases.
This probably boils down to a PCI vs non-PCI set of helpers.

Alan, could you confirm whether we got it right?

Thanks,

M.
--
Jazz is not dead. It just smells funny...

2020-04-20 16:09:58

by Alan Mikhak

[permalink] [raw]
Subject: Re: [PATCH] genirq/msi: Check null pointer before copying struct msi_msg

On Mon, Apr 20, 2020 at 2:14 AM Marc Zyngier <[email protected]> wrote:
>
> On 2020-04-18 16:19, Gustavo Pimentel wrote:
> > Hi Marc and Alan,
> >
> >> I'm not convinced by this. If you know that, by construction, these
> >> interrupts are not associated with an underlying MSI, why calling
> >> get_cached_msi_msg() the first place?
> >>
> >> There seem to be some assumptions in the DW EDMA driver that the
> >> signaling would be MSI based, so maybe someone from Synopsys
> >> (Gustavo?)
> >> could clarify that. From my own perspective, running on an endpoint
> >> device means that it is *generating* interrupts, and I'm not sure what
> >> the MSIs represent here.
> >
> > Giving a little context to this topic.
> >
> > The eDMA IP present on the Synopsys DesignWare PCIe Endpoints can be
> > configured and triggered *remotely* as well *locally*.
> > For the sake of simplicity let's assume for now the eDMA was
> > implemented
> > on the EP and that is the IP that we want to configure and use.
> >
> > When I say *remotely* I mean that this IP can be configurable through
> > the
> > RC/CPU side, however, for that, it requires the eDMA registers to be
> > exposed through a PCIe BAR on the EP. This will allow setting the SAR,
> > DAR and other settings, also need(s) the interrupt(s) address(es) to be
> > set as well (MSI or MSI-X only) so that it can signal through PCIe (to
> > the RC and consecutively the associated EP driver) if the data transfer
> > has been completed, aborted or if the Linked List consumer algorithm
> > has
> > passed in some linked element marked with a watermark.
> >
> > It was based on this case that the eDMA driver was exclusively
> > developed.
> >
> > However, Alan, wants to expand a little more this, by being able to use
> > this driver on the EP side (through
> > pcitest/pci_endpoint_test/pci_epf_test) so that he can configure this
> > IP
> > *locally*.
> > In fact, when doing this, he doesn't need to configure the interrupt
> > address (MSI or MSI-X), because this IP provides a local interrupt line
> > so that be connected to other blocks on the EP side.
>
> Right, so this confirms my hunch that the driver is being used in
> a way that doesn't reflect the expected use case. Rather than
> papering over the problem by hacking the core code, I'd rather see
> the eDMA driver be updated to support both host and endpoint cases.
> This probably boils down to a PCI vs non-PCI set of helpers.
>
> Alan, could you confirm whether we got it right?

Thanks Marc and Gustavo. I appreciate all your comments and feedback.

You both got it right. As Gustavo mentioned, I am trying to expand dw-edma
for additional use cases.

First new use case is for integration of dw-edma with pci-epf-test so the latter
can initiate dma transfers locally from endpoint memory to host memory over the
PCIe bus in response to a user command issued from the host-side command
prompt using the pcitest utility. When the locally-initiated dma
transfer completes
in this use case on the endpoint side, dw-edma issues an interrupt to the local
CPU on the endpoint side by way of a legacy interrupt and pci-epf-test issues
an interrupt toward the remote host CPU across the PCIe bus by way of legacy,
MSI, or possibly MSI-X interrupt.

Second new use case is for integration of dw-edma with pci_endpoint_test
running on the host CPU so the latter can initiate dma transfers locally from
host-side in response to a user command issued from the host-side command
prompt using the pcitest utility. This use case is for host systems that have
Synopsys DesignWare PCI eDMA hardware on the host side. When the
locally-initiated dma transfer completes in this use case on the host-side,
dw-edma issues a legacy interrupt to its local host CPU and pci-epf-test running
on the endpoint side issues a legacy, MSI, or possibly MSI-X interrupt
across the
PCIe bus toward the host CPU.

When both the host and endpoint sides have the Synopsys DesignWare PCI
eDMA hardware, more use cases become possible in which eDMA controllers
from both systems can be engaged to move data. Embedded DMA controllers
from other PCIe IP vendors may also be supported with additional dmaengine
drivers under the Linux PCI Endpoint Framework with pci-epf-test, pcitest, and
pci_endpoint_test suite as well as new PCI endpoint function drivers for such
applications that require dma, for example nvme or virtio_net endpoint function
drivers.

I submitted a recent patch [1] and [2] which Gustavo ACk'd to decouple dw-edma
from struct pci_dev. This enabled me to exercise dw-edma on some riscv host
and endpoint systems that I work with.

I will submit another patch to decouple dw-edma from struct msi_msg such
that it would only call get_cached_msi_msg() on the host-side in its original
use case with remotely initiated dma transfers using the BAR access method.

The crash that I reported in __get_cached_msi_msg() is probably worth fixing
too. It seems to be low impact since get_cached_msi_msg() seems to be called
infrequently by a few callers.

Regards,
Alan Mikhak

[1] https://patchwork.kernel.org/patch/11489607/
[2] https://patchwork.kernel.org/patch/11491757/

>
> Thanks,
>
> M.
> --
> Jazz is not dead. It just smells funny...

2020-04-20 19:21:56

by Alan Mikhak

[permalink] [raw]
Subject: Re: [PATCH] genirq/msi: Check null pointer before copying struct msi_msg

On Mon, Apr 20, 2020 at 9:08 AM Alan Mikhak <[email protected]> wrote:
>
> On Mon, Apr 20, 2020 at 2:14 AM Marc Zyngier <[email protected]> wrote:
> >
> > On 2020-04-18 16:19, Gustavo Pimentel wrote:
> > > Hi Marc and Alan,
> > >
> > >> I'm not convinced by this. If you know that, by construction, these
> > >> interrupts are not associated with an underlying MSI, why calling
> > >> get_cached_msi_msg() the first place?
> > >>
> > >> There seem to be some assumptions in the DW EDMA driver that the
> > >> signaling would be MSI based, so maybe someone from Synopsys
> > >> (Gustavo?)
> > >> could clarify that. From my own perspective, running on an endpoint
> > >> device means that it is *generating* interrupts, and I'm not sure what
> > >> the MSIs represent here.
> > >
> > > Giving a little context to this topic.
> > >
> > > The eDMA IP present on the Synopsys DesignWare PCIe Endpoints can be
> > > configured and triggered *remotely* as well *locally*.
> > > For the sake of simplicity let's assume for now the eDMA was
> > > implemented
> > > on the EP and that is the IP that we want to configure and use.
> > >
> > > When I say *remotely* I mean that this IP can be configurable through
> > > the
> > > RC/CPU side, however, for that, it requires the eDMA registers to be
> > > exposed through a PCIe BAR on the EP. This will allow setting the SAR,
> > > DAR and other settings, also need(s) the interrupt(s) address(es) to be
> > > set as well (MSI or MSI-X only) so that it can signal through PCIe (to
> > > the RC and consecutively the associated EP driver) if the data transfer
> > > has been completed, aborted or if the Linked List consumer algorithm
> > > has
> > > passed in some linked element marked with a watermark.
> > >
> > > It was based on this case that the eDMA driver was exclusively
> > > developed.
> > >
> > > However, Alan, wants to expand a little more this, by being able to use
> > > this driver on the EP side (through
> > > pcitest/pci_endpoint_test/pci_epf_test) so that he can configure this
> > > IP
> > > *locally*.
> > > In fact, when doing this, he doesn't need to configure the interrupt
> > > address (MSI or MSI-X), because this IP provides a local interrupt line
> > > so that be connected to other blocks on the EP side.
> >
> > Right, so this confirms my hunch that the driver is being used in
> > a way that doesn't reflect the expected use case. Rather than
> > papering over the problem by hacking the core code, I'd rather see
> > the eDMA driver be updated to support both host and endpoint cases.
> > This probably boils down to a PCI vs non-PCI set of helpers.
> >
> > Alan, could you confirm whether we got it right?
>
> Thanks Marc and Gustavo. I appreciate all your comments and feedback.
>
> You both got it right. As Gustavo mentioned, I am trying to expand dw-edma
> for additional use cases.
>
> First new use case is for integration of dw-edma with pci-epf-test so the latter
> can initiate dma transfers locally from endpoint memory to host memory over the
> PCIe bus in response to a user command issued from the host-side command
> prompt using the pcitest utility. When the locally-initiated dma
> transfer completes
> in this use case on the endpoint side, dw-edma issues an interrupt to the local
> CPU on the endpoint side by way of a legacy interrupt

Although dw-edma enables and handles such interrupts, to be correct, I should
have said that in this use case Synopsys DesignWare PCI eDMA hardware issues
an interrupt to the local CPU on the endpoint side by way of a legacy interrupt.

> and pci-epf-test issues
> an interrupt toward the remote host CPU across the PCIe bus by way of legacy,
> MSI, or possibly MSI-X interrupt.
>
> Second new use case is for integration of dw-edma with pci_endpoint_test
> running on the host CPU so the latter can initiate dma transfers locally from
> host-side in response to a user command issued from the host-side command
> prompt using the pcitest utility. This use case is for host systems that have
> Synopsys DesignWare PCI eDMA hardware on the host side. When the
> locally-initiated dma transfer completes in this use case on the host-side,
> dw-edma issues a legacy interrupt to its local host CPU

Although dw-edma enables and handles such interrupts, to be correct, I should
have said that in this case the Synopsys DesignWare PCI eDMA hardware issues
a legacy interrupt to its local host CPU in this use case.

> and pci-epf-test running
> on the endpoint side issues a legacy, MSI, or possibly MSI-X interrupt
> across the
> PCIe bus toward the host CPU.
>
> When both the host and endpoint sides have the Synopsys DesignWare PCI
> eDMA hardware, more use cases become possible in which eDMA controllers
> from both systems can be engaged to move data. Embedded DMA controllers
> from other PCIe IP vendors may also be supported with additional dmaengine
> drivers under the Linux PCI Endpoint Framework with pci-epf-test, pcitest, and
> pci_endpoint_test suite as well as new PCI endpoint function drivers for such
> applications that require dma, for example nvme or virtio_net endpoint function
> drivers.
>
> I submitted a recent patch [1] and [2] which Gustavo ACk'd to decouple dw-edma
> from struct pci_dev. This enabled me to exercise dw-edma on some riscv host
> and endpoint systems that I work with.
>
> I will submit another patch to decouple dw-edma from struct msi_msg such
> that it would only call get_cached_msi_msg() on the host-side in its original
> use case with remotely initiated dma transfers using the BAR access method.

Marc, I submitted another patch [3] to modify dw-edma to check if a
struct msi_desc
entry exists before copying the contents of its struct msi_msg
pointer. This is not
exactly decoupling dw-edma from struct msi_msg as I suggested earlier but this
set of dw-edma patches should show your concern is being addressed.

>
> The crash that I reported in __get_cached_msi_msg() is probably worth fixing
> too. It seems to be low impact since get_cached_msi_msg() seems to be called
> infrequently by a few callers.
>
> Regards,
> Alan Mikhak
>
> [1] https://patchwork.kernel.org/patch/11489607/
> [2] https://patchwork.kernel.org/patch/11491757/
[3] https://patchwork.kernel.org/patch/11499551/

>
> >
> > Thanks,
> >
> > M.
> > --
> > Jazz is not dead. It just smells funny...

2020-04-21 08:41:17

by Marc Zyngier

[permalink] [raw]
Subject: Re: [PATCH] genirq/msi: Check null pointer before copying struct msi_msg

On Mon, 20 Apr 2020 09:08:27 -0700
Alan Mikhak <[email protected]> wrote:

Alan,

> On Mon, Apr 20, 2020 at 2:14 AM Marc Zyngier <[email protected]> wrote:
> >
> > On 2020-04-18 16:19, Gustavo Pimentel wrote:
> > > Hi Marc and Alan,
> > >
> > >> I'm not convinced by this. If you know that, by construction, these
> > >> interrupts are not associated with an underlying MSI, why calling
> > >> get_cached_msi_msg() the first place?
> > >>
> > >> There seem to be some assumptions in the DW EDMA driver that the
> > >> signaling would be MSI based, so maybe someone from Synopsys
> > >> (Gustavo?)
> > >> could clarify that. From my own perspective, running on an endpoint
> > >> device means that it is *generating* interrupts, and I'm not sure what
> > >> the MSIs represent here.
> > >
> > > Giving a little context to this topic.
> > >
> > > The eDMA IP present on the Synopsys DesignWare PCIe Endpoints can be
> > > configured and triggered *remotely* as well *locally*.
> > > For the sake of simplicity let's assume for now the eDMA was
> > > implemented
> > > on the EP and that is the IP that we want to configure and use.
> > >
> > > When I say *remotely* I mean that this IP can be configurable through
> > > the
> > > RC/CPU side, however, for that, it requires the eDMA registers to be
> > > exposed through a PCIe BAR on the EP. This will allow setting the SAR,
> > > DAR and other settings, also need(s) the interrupt(s) address(es) to be
> > > set as well (MSI or MSI-X only) so that it can signal through PCIe (to
> > > the RC and consecutively the associated EP driver) if the data transfer
> > > has been completed, aborted or if the Linked List consumer algorithm
> > > has
> > > passed in some linked element marked with a watermark.
> > >
> > > It was based on this case that the eDMA driver was exclusively
> > > developed.
> > >
> > > However, Alan, wants to expand a little more this, by being able to use
> > > this driver on the EP side (through
> > > pcitest/pci_endpoint_test/pci_epf_test) so that he can configure this
> > > IP
> > > *locally*.
> > > In fact, when doing this, he doesn't need to configure the interrupt
> > > address (MSI or MSI-X), because this IP provides a local interrupt line
> > > so that be connected to other blocks on the EP side.
> >
> > Right, so this confirms my hunch that the driver is being used in
> > a way that doesn't reflect the expected use case. Rather than
> > papering over the problem by hacking the core code, I'd rather see
> > the eDMA driver be updated to support both host and endpoint cases.
> > This probably boils down to a PCI vs non-PCI set of helpers.
> >
> > Alan, could you confirm whether we got it right?
>
> Thanks Marc and Gustavo. I appreciate all your comments and feedback.
>
> You both got it right. As Gustavo mentioned, I am trying to expand dw-edma
> for additional use cases.
>
> First new use case is for integration of dw-edma with pci-epf-test so the latter
> can initiate dma transfers locally from endpoint memory to host memory over the
> PCIe bus in response to a user command issued from the host-side command
> prompt using the pcitest utility. When the locally-initiated dma
> transfer completes
> in this use case on the endpoint side, dw-edma issues an interrupt to the local
> CPU on the endpoint side by way of a legacy interrupt and pci-epf-test issues
> an interrupt toward the remote host CPU across the PCIe bus by way of legacy,
> MSI, or possibly MSI-X interrupt.
>
> Second new use case is for integration of dw-edma with pci_endpoint_test
> running on the host CPU so the latter can initiate dma transfers locally from
> host-side in response to a user command issued from the host-side command
> prompt using the pcitest utility. This use case is for host systems that have
> Synopsys DesignWare PCI eDMA hardware on the host side. When the
> locally-initiated dma transfer completes in this use case on the host-side,
> dw-edma issues a legacy interrupt to its local host CPU and pci-epf-test running
> on the endpoint side issues a legacy, MSI, or possibly MSI-X interrupt
> across the
> PCIe bus toward the host CPU.
>
> When both the host and endpoint sides have the Synopsys DesignWare PCI
> eDMA hardware, more use cases become possible in which eDMA controllers
> from both systems can be engaged to move data. Embedded DMA controllers
> from other PCIe IP vendors may also be supported with additional dmaengine
> drivers under the Linux PCI Endpoint Framework with pci-epf-test, pcitest, and
> pci_endpoint_test suite as well as new PCI endpoint function drivers for such
> applications that require dma, for example nvme or virtio_net endpoint function
> drivers.
>
> I submitted a recent patch [1] and [2] which Gustavo ACk'd to decouple dw-edma
> from struct pci_dev. This enabled me to exercise dw-edma on some riscv host
> and endpoint systems that I work with.
>
> I will submit another patch to decouple dw-edma from struct msi_msg such
> that it would only call get_cached_msi_msg() on the host-side in its
> original use case with remotely initiated dma transfers using the BAR
> access method.
>
> The crash that I reported in __get_cached_msi_msg() is probably worth
> fixing too. It seems to be low impact since get_cached_msi_msg()
> seems to be called infrequently by a few callers.

It isn't about the frequency of the calls, nor the overhead of this
function. It is about the fundamental difference between a wired
interrupt (in most case a level triggered one) and a MSI (edge by
definition on PCI). By making get_cached_msi_msg() "safe" to be called
for non-MSI IRQs, you hide a variety of design bugs which would
otherwise be obvious, like the one you are currently dealing with.

Your eDMA driver uses MSI by construction, and is likely to use the MSI
semantics (edge triggering, coalescing, memory barrier). On the other
hand, your use case is likely to have interrupts with very different
semantics (level triggered, no barrier effect). Papering over these
differences is not the way to go, I'm afraid.

I would recommend that you adapt the driver to have a separate
interrupt management for the non-MSI case, or at least not blindly use
MSI-specific APIs when not using them.

Thanks,

M.
--
Jazz is not dead. It just smells funny...

2020-04-21 23:46:07

by Alan Mikhak

[permalink] [raw]
Subject: Re: [PATCH] genirq/msi: Check null pointer before copying struct msi_msg

On Tue, Apr 21, 2020 at 1:39 AM Marc Zyngier <[email protected]> wrote:
>
> On Mon, 20 Apr 2020 09:08:27 -0700
> Alan Mikhak <[email protected]> wrote:
>
> Alan,
>
> > On Mon, Apr 20, 2020 at 2:14 AM Marc Zyngier <[email protected]> wrote:
> > >
> > > On 2020-04-18 16:19, Gustavo Pimentel wrote:
> > > > Hi Marc and Alan,
> > > >
> > > >> I'm not convinced by this. If you know that, by construction, these
> > > >> interrupts are not associated with an underlying MSI, why calling
> > > >> get_cached_msi_msg() the first place?
> > > >>
> > > >> There seem to be some assumptions in the DW EDMA driver that the
> > > >> signaling would be MSI based, so maybe someone from Synopsys
> > > >> (Gustavo?)
> > > >> could clarify that. From my own perspective, running on an endpoint
> > > >> device means that it is *generating* interrupts, and I'm not sure what
> > > >> the MSIs represent here.
> > > >
> > > > Giving a little context to this topic.
> > > >
> > > > The eDMA IP present on the Synopsys DesignWare PCIe Endpoints can be
> > > > configured and triggered *remotely* as well *locally*.
> > > > For the sake of simplicity let's assume for now the eDMA was
> > > > implemented
> > > > on the EP and that is the IP that we want to configure and use.
> > > >
> > > > When I say *remotely* I mean that this IP can be configurable through
> > > > the
> > > > RC/CPU side, however, for that, it requires the eDMA registers to be
> > > > exposed through a PCIe BAR on the EP. This will allow setting the SAR,
> > > > DAR and other settings, also need(s) the interrupt(s) address(es) to be
> > > > set as well (MSI or MSI-X only) so that it can signal through PCIe (to
> > > > the RC and consecutively the associated EP driver) if the data transfer
> > > > has been completed, aborted or if the Linked List consumer algorithm
> > > > has
> > > > passed in some linked element marked with a watermark.
> > > >
> > > > It was based on this case that the eDMA driver was exclusively
> > > > developed.
> > > >
> > > > However, Alan, wants to expand a little more this, by being able to use
> > > > this driver on the EP side (through
> > > > pcitest/pci_endpoint_test/pci_epf_test) so that he can configure this
> > > > IP
> > > > *locally*.
> > > > In fact, when doing this, he doesn't need to configure the interrupt
> > > > address (MSI or MSI-X), because this IP provides a local interrupt line
> > > > so that be connected to other blocks on the EP side.
> > >
> > > Right, so this confirms my hunch that the driver is being used in
> > > a way that doesn't reflect the expected use case. Rather than
> > > papering over the problem by hacking the core code, I'd rather see
> > > the eDMA driver be updated to support both host and endpoint cases.
> > > This probably boils down to a PCI vs non-PCI set of helpers.
> > >
> > > Alan, could you confirm whether we got it right?
> >
> > Thanks Marc and Gustavo. I appreciate all your comments and feedback.
> >
> > You both got it right. As Gustavo mentioned, I am trying to expand dw-edma
> > for additional use cases.
> >
> > First new use case is for integration of dw-edma with pci-epf-test so the latter
> > can initiate dma transfers locally from endpoint memory to host memory over the
> > PCIe bus in response to a user command issued from the host-side command
> > prompt using the pcitest utility. When the locally-initiated dma
> > transfer completes
> > in this use case on the endpoint side, dw-edma issues an interrupt to the local
> > CPU on the endpoint side by way of a legacy interrupt and pci-epf-test issues
> > an interrupt toward the remote host CPU across the PCIe bus by way of legacy,
> > MSI, or possibly MSI-X interrupt.
> >
> > Second new use case is for integration of dw-edma with pci_endpoint_test
> > running on the host CPU so the latter can initiate dma transfers locally from
> > host-side in response to a user command issued from the host-side command
> > prompt using the pcitest utility. This use case is for host systems that have
> > Synopsys DesignWare PCI eDMA hardware on the host side. When the
> > locally-initiated dma transfer completes in this use case on the host-side,
> > dw-edma issues a legacy interrupt to its local host CPU and pci-epf-test running
> > on the endpoint side issues a legacy, MSI, or possibly MSI-X interrupt
> > across the
> > PCIe bus toward the host CPU.
> >
> > When both the host and endpoint sides have the Synopsys DesignWare PCI
> > eDMA hardware, more use cases become possible in which eDMA controllers
> > from both systems can be engaged to move data. Embedded DMA controllers
> > from other PCIe IP vendors may also be supported with additional dmaengine
> > drivers under the Linux PCI Endpoint Framework with pci-epf-test, pcitest, and
> > pci_endpoint_test suite as well as new PCI endpoint function drivers for such
> > applications that require dma, for example nvme or virtio_net endpoint function
> > drivers.
> >
> > I submitted a recent patch [1] and [2] which Gustavo ACk'd to decouple dw-edma
> > from struct pci_dev. This enabled me to exercise dw-edma on some riscv host
> > and endpoint systems that I work with.
> >
> > I will submit another patch to decouple dw-edma from struct msi_msg such
> > that it would only call get_cached_msi_msg() on the host-side in its
> > original use case with remotely initiated dma transfers using the BAR
> > access method.
> >
> > The crash that I reported in __get_cached_msi_msg() is probably worth
> > fixing too. It seems to be low impact since get_cached_msi_msg()
> > seems to be called infrequently by a few callers.
>
> It isn't about the frequency of the calls, nor the overhead of this
> function. It is about the fundamental difference between a wired
> interrupt (in most case a level triggered one) and a MSI (edge by
> definition on PCI). By making get_cached_msi_msg() "safe" to be called
> for non-MSI IRQs, you hide a variety of design bugs which would
> otherwise be obvious, like the one you are currently dealing with.
>
> Your eDMA driver uses MSI by construction, and is likely to use the MSI
> semantics (edge triggering, coalescing, memory barrier). On the other
> hand, your use case is likely to have interrupts with very different
> semantics (level triggered, no barrier effect). Papering over these
> differences is not the way to go, I'm afraid.
>
> I would recommend that you adapt the driver to have a separate
> interrupt management for the non-MSI case, or at least not blindly use
> MSI-specific APIs when not using them.

Thanks Marc,

I understand that the crash I reported here is to be kept to catch such
issues. The design of dw-edma was correct for its original use case.
Since I am the one trying to expand its use cases, I accept your
recommendation and will see if I can offer patches that would be
acceptable there.

Regards,
Alan

>
> Thanks,
>
> M.
> --
> Jazz is not dead. It just smells funny...