While multi-MSI appears to work with pci-hyperv.c, there was a concern about
how Linux was doing the IRTE allocations. Patch 2 addresses the concern.

However, patch 2 exposed an issue with how compose_msi_msg() was freeing a
previous allocation when called for the Nth time. Imagine a driver using
pci_alloc_irq_vectors() to request 32 MSIs. This would cause compose_msi_msg()
to be called 32 times, once for each MSI. With patch 2, MSI0 would allocate
the IRTEs needed, and MSI1-31 would use the cached information. Then the driver
uses request_irq() on MSI1-17. This would call compose_msi_msg() again on those
MSIs, which would again use the cached information. Then unmask() would be
called to retarget the MSIs to the right VCPU vectors. Finally, the driver
calls request_irq() on MSI0. This would call compose_msi_msg(), which would
free the block of 32 MSIs and allocate a new block. This would undo the
retargeting of MSI1-17, and likely leave those MSIs targeting invalid VCPU
vectors.

This is addressed by patch 1, which is introduced first to prevent a
regression.
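For reference, the sequence above corresponds to roughly the following
driver-side flow. This is a hypothetical sketch: the example_* names are
illustrative, and error unwinding is omitted.

#include <linux/interrupt.h>
#include <linux/pci.h>

static irqreturn_t example_isr(int irq, void *data)
{
	return IRQ_HANDLED;
}

static int example_probe(struct pci_dev *pdev)
{
	int i, ret;

	/* 32 MSIs: compose_msi_msg() runs once per vector */
	ret = pci_alloc_irq_vectors(pdev, 32, 32, PCI_IRQ_MSI);
	if (ret < 0)
		return ret;

	/*
	 * request_irq() on MSI1-17: compose_msi_msg() reuses the cached
	 * information, then unmask() retargets each line to its final
	 * VCPU vector.
	 */
	for (i = 1; i <= 17; i++) {
		ret = request_irq(pci_irq_vector(pdev, i), example_isr,
				  0, "example", pdev);
		if (ret)
			return ret;
	}

	/*
	 * request_irq() on MSI0: without patch 1, the recompose frees
	 * and reallocates the whole 32-MSI block, undoing the
	 * retargeting of MSI1-17 above.
	 */
	return request_irq(pci_irq_vector(pdev, 0), example_isr, 0,
			   "example", pdev);
}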
Jeffrey Hugo (2):
PCI: hv: Reuse existing IRTE allocation in compose_msi_msg()
PCI: hv: Fix interrupt mapping for multi-MSI
drivers/pci/controller/pci-hyperv.c | 76 ++++++++++++++++++++++++++++---------
1 file changed, 59 insertions(+), 17 deletions(-)
--
2.7.4
> From: Jeffrey Hugo <[email protected]>
> Sent: Monday, May 9, 2022 2:48 PM
Thank you, Jeff, for working out the patches!
Thanks,
-- Dexuan
Currently, if compose_msi_msg() is called multiple times, it will free any
previous IRTE allocation and generate a new allocation. While nothing
prevents this from occurring, it is extraneous when Linux could just reuse
the existing allocation and avoid the extra overhead.

However, when future IRTE allocations operate on blocks of MSIs instead of
a single line, freeing the allocation will impact all of the lines. This
could cause an issue where an allocation of N MSIs occurs, then some of
the lines are retargeted, and finally the allocation is freed/reallocated.
Freeing the allocation removes all of the configuration for the entire
block, so all of the lines would need to be retargeted, which might not
happen since some lines might already be unmasked/active.
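To make the hazard concrete, here is a minimal userspace simulation of the
old and new behaviors. The block/target structures are illustrative
stand-ins, not the actual hypervisor interface.

#include <stdio.h>
#include <stdlib.h>

#define NVEC 32

/* models the per-block remapping state: one target vector per line */
struct block {
	int target[NVEC];
};

static struct block *blk;

/* old behavior: every compose frees and reallocates the block */
static void compose_free_realloc(void)
{
	free(blk);
	blk = calloc(1, sizeof(*blk));	/* all targets reset to 0 */
}

/* patched behavior: reuse the cached block if one already exists */
static void compose_reuse(void)
{
	if (!blk)
		blk = calloc(1, sizeof(*blk));
}

int main(void)
{
	compose_free_realloc();	/* initial allocation for the block */
	blk->target[1] = 5;	/* unmask() retargets line 1 */
	compose_free_realloc();	/* recompose of line 0... */
	printf("old: line 1 target = %d\n", blk->target[1]); /* 0, lost */

	blk->target[1] = 5;	/* retarget line 1 again */
	compose_reuse();	/* recompose with reuse */
	printf("new: line 1 target = %d\n", blk->target[1]); /* 5, kept */

	free(blk);
	return 0;
}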
Signed-off-by: Jeffrey Hugo <[email protected]>
---
drivers/pci/controller/pci-hyperv.c | 16 +++++++++-------
1 file changed, 9 insertions(+), 7 deletions(-)
diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index cf2fe575..5e2e637 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -1707,6 +1707,15 @@ static void hv_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
u32 size;
int ret;
+ /* Reuse the previous allocation */
+ if (data->chip_data) {
+ int_desc = data->chip_data;
+ msg->address_hi = int_desc->address >> 32;
+ msg->address_lo = int_desc->address & 0xffffffff;
+ msg->data = int_desc->data;
+ return;
+ }
+
pdev = msi_desc_to_pci_dev(irq_data_get_msi_desc(data));
dest = irq_data_get_effective_affinity_mask(data);
pbus = pdev->bus;
@@ -1716,13 +1725,6 @@ static void hv_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
if (!hpdev)
goto return_null_message;
- /* Free any previous message that might have already been composed. */
- if (data->chip_data) {
- int_desc = data->chip_data;
- data->chip_data = NULL;
- hv_int_desc_free(hpdev, int_desc);
- }
-
int_desc = kzalloc(sizeof(*int_desc), GFP_ATOMIC);
if (!int_desc)
goto drop_reference;
--
2.7.4
On 5/9/2022 5:23 PM, Dexuan Cui wrote:
>> From: Jeffrey Hugo <[email protected]>
>> Sent: Monday, May 9, 2022 2:48 PM
>
> Thank you, Jeff, for working out the patches!
Thanks for the feedback and reviews. This certainly would have been
more challenging without your assistance.
-Jeff
From: Jeffrey Hugo <[email protected]> Sent: Monday, May 9, 2022 2:48 PM
>
> [full cover letter snipped]
I tested these two patches in combination with the earlier two on
5.18-rc6 in an ARM64 VM in Azure. This was a smoke-test to ensure
everything compiled and that the changes aren't fundamentally broken
on ARM64. The PCI device in this case is the Mellanox Virtual Function
offered to VMs in Azure, which is MSI-X. As such, the new MSI "batch"
handling is not tested.
Tested-by: Michael Kelley <[email protected]>
On Mon, May 09, 2022 at 03:48:20PM -0600, Jeffrey Hugo wrote:
> [...]
>
> Jeffrey Hugo (2):
>   PCI: hv: Reuse existing IRTE allocation in compose_msi_msg()
>   PCI: hv: Fix interrupt mapping for multi-MSI
>
Applied to hyperv-next. Thanks.
On 5/11/2022 9:19 AM, Wei Liu wrote:
> On Wed, May 11, 2022 at 08:47:23AM -0600, Jeffrey Hugo wrote:
>> On 5/11/2022 8:41 AM, Wei Liu wrote:
>>> On Mon, May 09, 2022 at 03:48:20PM -0600, Jeffrey Hugo wrote:
>>>> [full cover letter snipped]
>>>
>>> Applied to hyperv-next. Thanks.
>>
>> Huh? I thought you wanted a V2. I was intending on sending that out today.
>>
>
> Please send them out. I will apply the new version.
Sure, sending shortly.
-Jeff
On Wed, May 11, 2022 at 08:47:23AM -0600, Jeffrey Hugo wrote:
> On 5/11/2022 8:41 AM, Wei Liu wrote:
> > On Mon, May 09, 2022 at 03:48:20PM -0600, Jeffrey Hugo wrote:
> > > [full cover letter snipped]
> >
> > Applied to hyperv-next. Thanks.
>
> Huh? I thought you wanted a V2. I was intending on sending that out today.
>
Please send them out. I will apply the new version.
Thanks,
Wei.
> -Jeff
On 5/11/2022 8:41 AM, Wei Liu wrote:
> On Mon, May 09, 2022 at 03:48:20PM -0600, Jeffrey Hugo wrote:
>> [full cover letter snipped]
>
> Applied to hyperv-next. Thanks.
Huh? I thought you wanted a V2. I was intending on sending that out today.
-Jeff