Hi All,
This is the v3 series to support passthrough on Xen when dom0 is PVH.
v2->v3 changes:
* patch#1: add a condition so that pcistub_init_device only calls xen_reset_device_state for non-PV domains.
* patch#2: drop the previous implementation that called unmask_irq. Instead, set up the gsi and map the pirq for the passthrough device in pcistub_init_device.
* patch#3: drop the previous implementation that added a new syscall to get the gsi from the irq. Instead, add a new sysfs attribute for the gsi, so that userspace can read the gsi number from sysfs.
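To illustrate the patch#3 change from the userspace side, here is a minimal sketch of reading the new attribute. It assumes the attribute appears as /sys/bus/pci/devices/<sbdf>/gsi next to the existing irq file, which is what adding it to pci_dev_attrs suggests; the helper names parse_gsi() and read_pci_gsi() are hypothetical and not part of this series.

```c
#include <assert.h>
#include <stdio.h>

/* Parse the text of a sysfs "gsi" attribute (e.g. "38\n").
 * Returns 0 on success, -1 if no unsigned number could be read. */
static int parse_gsi(const char *text, unsigned int *gsi)
{
	assert(text && gsi);
	return sscanf(text, "%u", gsi) == 1 ? 0 : -1;
}

/* Read the gsi of a PCI device given its sbdf, e.g. "0000:03:00.0".
 * The path layout is an assumption based on the existing "irq" file.
 * Returns 0 on success, -1 if the file is missing or unreadable. */
static int read_pci_gsi(const char *sbdf, unsigned int *gsi)
{
	char path[128];
	char buf[32];
	FILE *f;

	snprintf(path, sizeof(path), "/sys/bus/pci/devices/%s/gsi", sbdf);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return parse_gsi(buf, gsi);
}
```

A caller such as qemu could then pass the value it reads to xc_physdev_map_pirq() instead of the irq number.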
v2 link:
https://lore.kernel.org/lkml/[email protected]/T/#t
Below is the description of v2 cover letter:
This patch series is the v2 of the implementation of passthrough when dom0 is PVH on Xen.
We sent v1 upstream before, but it had many problems and we received lots of suggestions.
Below I describe the issues these patches try to fix and the differences between v1 and v2.
Issues we encountered:
1. pci_stub failed to write bar for a passthrough device.
Problem: when we run “sudo xl pci-assignable-add <sbdf>” to assign a device, pci_stub calls
“pcistub_init_device() -> pci_restore_state() -> pci_restore_config_space() ->
pci_restore_config_space_range() -> pci_restore_config_dword() -> pci_write_config_dword()”.
The pci config write is trapped by Xen and handled in bar_write(), but because bar->enabled
was set earlier, the write is not allowed. Later, when Qemu configures the passthrough device
in xen_pt_realize(), it gets invalid bar values.
Reason: we don't tell vPCI that the device has been reset, so the state cached in pdev->vpci
is out of date and differs from the real device state.
Solution: the first kernel patch (xen/pci: Add xen_reset_device_state function) and the first
Xen patch (xen/vpci: Clear all vpci status of device) add a new hypercall to reset the state
stored in vPCI when the state of the real device has changed.
Thanks to Roger for suggesting this approach for v2. It differs from v1
(https://lore.kernel.org/xen-devel/[email protected]/), which simply allowed
domU to write the pci bar and so did not comply with the design principles of vPCI.
2. failed to do PHYSDEVOP_map_pirq when dom0 is PVH
Problem: HVM domU does PHYSDEVOP_map_pirq for a passthrough device by using the gsi. See
xen_pt_realize->xc_physdev_map_pirq and pci_add_dm_done->xc_physdev_map_pirq.
xc_physdev_map_pirq then calls into Xen, but in hvm_physdev_op(), PHYSDEVOP_map_pirq is not allowed.
Reason: in hvm_physdev_op(), the variable "currd" is PVH dom0, and PVH has no X86_EMU_USE_PIRQ flag,
so it fails the has_pirq check.
Solution: I think we may need to allow PHYSDEVOP_map_pirq when "currd" is dom0 (at present dom0 is
PVH). The second Xen patch (x86/pvh: Open PHYSDEVOP_map_pirq for PVH dom0) allows PVH dom0 to do
PHYSDEVOP_map_pirq. This v2 patch is better than v1, which simply removed the has_pirq check (xen
https://lore.kernel.org/xen-devel/[email protected]/).
3. the gsi of a passthrough device is not unmasked
3.1 failed to check the permission of pirq
3.2 the gsi of passthrough device was not registered in PVH dom0
Problem:
3.1 the callback function pci_add_dm_done() is called when qemu configures a passthrough device for domU.
This function calls xc_domain_irq_permission()-> pirq_access_permitted() to check whether the gsi has
a corresponding mapping in dom0. It does not, so the check fails. See
XEN_DOMCTL_irq_permission->pirq_access_permitted: "current" is PVH dom0 and the returned irq is 0.
3.2 it's possible for a gsi (iow: vIO-APIC pin) to never get registered on PVH dom0, because the
devices of PVH are using MSI(-X) interrupts. However, the IO-APIC pin must be configured for it to be
able to be mapped into a domU.
Reason: after searching the code, I found that "map_pirq" and "register_gsi" are done in
vioapic_write_redirent->vioapic_hwdom_map_gsi when the gsi (aka ioapic's pin) is unmasked in PVH dom0.
So both problems come down to the gsi of a passthrough device not being unmasked.
Solution: to solve these problems, the second kernel patch (xen/pvh: Unmask irq for passthrough
device in PVH dom0) calls unmask_irq() when we assign a device for passthrough, so that
passthrough devices have the gsi mapping on PVH dom0 and the gsi can be registered. This v2 patch
is different from v1 (
kernel https://lore.kernel.org/xen-devel/[email protected]/,
kernel https://lore.kernel.org/xen-devel/[email protected]/ and
xen https://lore.kernel.org/xen-devel/[email protected]/),
which performed "map_pirq" and "register_gsi" on all pci devices on PVH dom0. That is unnecessary and
may cause multiple registrations.
4. failed to map pirq for gsi
Problem: qemu calls xc_physdev_map_pirq() in xen_pt_realize() to map a passthrough device’s gsi
to a pirq, but the call fails.
Reason: according to the implementation of xc_physdev_map_pirq(), it needs a gsi rather than an irq,
but qemu passes the irq and treats it as the gsi. The irq is read from the file
/sys/bus/pci/devices/xxxx:xx:xx.x/irq in xen_host_pci_device_get(). But the gsi number is not
necessarily equal to the irq. On PVH dom0, when an irq is allocated for a gsi in
acpi_register_gsi_ioapic(), the allocation is dynamic and first come, first served. If you debug
the kernel code (see __irq_alloc_descs), you will find that irq numbers are handed out in increasing
order, but gsis are not registered in increasing order: gsi 38 may be registered before gsi 28,
so gsi 38 gets a smaller irq number than gsi 28, and then gsi != irq.
Solution: we can record the relation between gsi and irq, so that when userspace (qemu) wants the gsi,
we can do a translation. The third kernel patch (xen/privcmd: Add new syscall to get gsi from irq)
records all the relations in acpi_register_gsi_xen_pvh() when dom0 initializes pci devices, and provides
a syscall for userspace to get the gsi from the irq. The third Xen patch (tools: Add new function to get
gsi from irq) adds a new function xc_physdev_gsi_from_irq() that calls the new syscall added on the kernel side.
Userspace can then use that function to get the gsi, and xc_physdev_map_pirq() will succeed. This v2
patch is the same as v1 (
kernel https://lore.kernel.org/xen-devel/[email protected]/ and
xen https://lore.kernel.org/xen-devel/[email protected]/)
The v2 qemu patch only changes an included header file; otherwise it is similar to v1 (
qemu https://lore.kernel.org/xen-devel/[email protected]/): it just calls
xc_physdev_gsi_from_irq() to get the gsi from the irq.
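The gsi != irq effect described in issue 4 can be reproduced with a small userspace model. This is only a sketch of first-come, first-served allocation, not the kernel's allocator: the struct, the function names, and the starting irq number 16 are all assumptions made for illustration.

```c
#include <assert.h>

#define MAX_GSI            64
#define FIRST_DYNAMIC_IRQ  16	/* hypothetical first free irq number */

/* Toy model: irq numbers are handed out in increasing order, in the
 * order in which gsis happen to be registered. */
struct gsi_irq_map {
	int irq_of_gsi[MAX_GSI];	/* -1 means "gsi not registered yet" */
	int next_irq;
};

static void map_init(struct gsi_irq_map *m)
{
	for (int i = 0; i < MAX_GSI; i++)
		m->irq_of_gsi[i] = -1;
	m->next_irq = FIRST_DYNAMIC_IRQ;
}

/* Register a gsi and return its irq; repeated calls return the same irq. */
static int register_gsi(struct gsi_irq_map *m, unsigned int gsi)
{
	assert(gsi < MAX_GSI);
	if (m->irq_of_gsi[gsi] < 0)
		m->irq_of_gsi[gsi] = m->next_irq++;
	return m->irq_of_gsi[gsi];
}

/* Reverse lookup: the translation userspace needs before map_pirq.
 * Returns the gsi, or -1 if no gsi is mapped to this irq. */
static int gsi_from_irq(const struct gsi_irq_map *m, int irq)
{
	if (irq < 0)
		return -1;
	for (int gsi = 0; gsi < MAX_GSI; gsi++)
		if (m->irq_of_gsi[gsi] == irq)
			return gsi;
	return -1;
}
```

Registering gsi 38 before gsi 28 in this model gives gsi 38 the smaller irq, so treating the irq as the gsi would be wrong; only the recorded relation recovers the right number.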
Jiqian Chen (3):
xen/pci: Add xen_reset_device_state function
xen/pvh: Setup gsi and map pirq for passthrough device
PCI/sysfs: Add gsi sysfs for pci_dev
arch/x86/xen/enlighten_pvh.c | 116 +++++++++++++++++++++++++++++
drivers/acpi/pci_irq.c | 3 +-
drivers/pci/pci-sysfs.c | 11 +++
drivers/xen/pci.c | 12 +++
drivers/xen/xen-pciback/pci_stub.c | 12 +++
include/linux/acpi.h | 1 +
include/linux/pci.h | 2 +
include/xen/acpi.h | 1 +
include/xen/interface/physdev.h | 8 ++
include/xen/pci.h | 6 ++
10 files changed, 171 insertions(+), 1 deletion(-)
--
2.34.1
When a device has been reset on the dom0 side, the vpci on the Xen
side does not get notified, so the cached state in vpci is out of
date compared with the real device state.
To solve that problem, add a new function to clear all vpci
device state when the device is reset on the dom0 side.
Call that function in pcistub_init_device, because when
"pci-assignable-add" is used to assign a passthrough device in
Xen, the device is reset, the vpci state becomes out of date,
and the device then fails to restore its bar state.
Co-developed-by: Huang Rui <[email protected]>
Signed-off-by: Jiqian Chen <[email protected]>
---
drivers/xen/pci.c | 12 ++++++++++++
drivers/xen/xen-pciback/pci_stub.c | 4 ++++
include/xen/interface/physdev.h | 8 ++++++++
include/xen/pci.h | 6 ++++++
4 files changed, 30 insertions(+)
diff --git a/drivers/xen/pci.c b/drivers/xen/pci.c
index 72d4e3f193af..e9b30bc09139 100644
--- a/drivers/xen/pci.c
+++ b/drivers/xen/pci.c
@@ -177,6 +177,18 @@ static int xen_remove_device(struct device *dev)
return r;
}
+int xen_reset_device_state(const struct pci_dev *dev)
+{
+ struct physdev_pci_device device = {
+ .seg = pci_domain_nr(dev->bus),
+ .bus = dev->bus->number,
+ .devfn = dev->devfn
+ };
+
+ return HYPERVISOR_physdev_op(PHYSDEVOP_pci_device_state_reset, &device);
+}
+EXPORT_SYMBOL_GPL(xen_reset_device_state);
+
static int xen_pci_notifier(struct notifier_block *nb,
unsigned long action, void *data)
{
diff --git a/drivers/xen/xen-pciback/pci_stub.c b/drivers/xen/xen-pciback/pci_stub.c
index e34b623e4b41..24f599eaec14 100644
--- a/drivers/xen/xen-pciback/pci_stub.c
+++ b/drivers/xen/xen-pciback/pci_stub.c
@@ -421,6 +421,10 @@ static int pcistub_init_device(struct pci_dev *dev)
else {
dev_dbg(&dev->dev, "resetting (FLR, D3, etc) the device\n");
__pci_reset_function_locked(dev);
+ if (!xen_pv_domain())
+ err = xen_reset_device_state(dev);
+ if (err)
+ goto config_release;
pci_restore_state(dev);
}
/* Now disable the device (this also ensures some private device
diff --git a/include/xen/interface/physdev.h b/include/xen/interface/physdev.h
index a237af867873..bed53afc4c52 100644
--- a/include/xen/interface/physdev.h
+++ b/include/xen/interface/physdev.h
@@ -256,6 +256,14 @@ struct physdev_pci_device_add {
*/
#define PHYSDEVOP_prepare_msix 30
#define PHYSDEVOP_release_msix 31
+/*
+ * On PVH dom0, when device is reset, the vpci on Xen side
+ * won't get notification, so that the cached state in vpci is
+ * all out of date with the real device state. Use this to reset
+ * the vpci state of device.
+ */
+#define PHYSDEVOP_pci_device_state_reset 32
+
struct physdev_pci_device {
/* IN */
uint16_t seg;
diff --git a/include/xen/pci.h b/include/xen/pci.h
index b8337cf85fd1..b2e2e856efd6 100644
--- a/include/xen/pci.h
+++ b/include/xen/pci.h
@@ -4,10 +4,16 @@
#define __XEN_PCI_H__
#if defined(CONFIG_XEN_DOM0)
+int xen_reset_device_state(const struct pci_dev *dev);
int xen_find_device_domain_owner(struct pci_dev *dev);
int xen_register_device_domain_owner(struct pci_dev *dev, uint16_t domain);
int xen_unregister_device_domain_owner(struct pci_dev *dev);
#else
+static inline int xen_reset_device_state(const struct pci_dev *dev)
+{
+ return -1;
+}
+
static inline int xen_find_device_domain_owner(struct pci_dev *dev)
{
return -1;
--
2.34.1
When dom0 is PVH, the gsi isn't unmasked, which causes two
problems.
First, in PVH dom0, the gsis don't get registered, but the gsi of
a passthrough device must be configured for it to be able to be
mapped into a domU.
When assigning a device for passthrough, proactively set up the gsi
of the device during that process.
Second, for an HVM guest, a pirq and irq are allocated for a
passthrough device by using the gsi. Before that, the gsi must
have a mapping in dom0; see the Xen code
pci_add_dm_done->xc_domain_irq_permission, which calls into Xen and
checks whether dom0 has the mapping. But currently PVH dom0 uses
the kernel local interrupt mechanism instead of the pirq. So
passing a device through to a guest on PVH dom0 fails at the
permission check.
When assigning a device for passthrough, proactively map the pirq for
the gsi of the device during that process.
Co-developed-by: Huang Rui <[email protected]>
Signed-off-by: Jiqian Chen <[email protected]>
---
arch/x86/xen/enlighten_pvh.c | 116 +++++++++++++++++++++++++++++
drivers/acpi/pci_irq.c | 2 +-
drivers/xen/xen-pciback/pci_stub.c | 8 ++
include/linux/acpi.h | 1 +
include/xen/acpi.h | 1 +
5 files changed, 127 insertions(+), 1 deletion(-)
diff --git a/arch/x86/xen/enlighten_pvh.c b/arch/x86/xen/enlighten_pvh.c
index ada3868c02c2..d74a221bfb81 100644
--- a/arch/x86/xen/enlighten_pvh.c
+++ b/arch/x86/xen/enlighten_pvh.c
@@ -1,6 +1,7 @@
// SPDX-License-Identifier: GPL-2.0
#include <linux/acpi.h>
#include <linux/export.h>
+#include <linux/pci.h>
#include <xen/hvc-console.h>
@@ -25,6 +26,121 @@
bool __ro_after_init xen_pvh;
EXPORT_SYMBOL_GPL(xen_pvh);
+typedef struct gsi_info {
+ u32 gsi;
+ int trigger;
+ int polarity;
+ int pirq;
+} gsi_info_t;
+
+struct acpi_prt_entry {
+ struct acpi_pci_id id;
+ u8 pin;
+ acpi_handle link;
+ u32 index; /* GSI, or link _CRS index */
+};
+
+static int xen_pvh_get_gsi_info(struct pci_dev *dev,
+ gsi_info_t *gsi_info)
+{
+ int gsi;
+ u8 pin = 0;
+ struct acpi_prt_entry *entry;
+ int trigger = ACPI_LEVEL_SENSITIVE;
+ int polarity = acpi_irq_model == ACPI_IRQ_MODEL_GIC ?
+ ACPI_ACTIVE_HIGH : ACPI_ACTIVE_LOW;
+
+ if (dev)
+ pin = dev->pin;
+ if (!dev || !pin || !gsi_info)
+ return -EINVAL;
+
+ entry = acpi_pci_irq_lookup(dev, pin);
+ if (entry) {
+ if (entry->link)
+ gsi = acpi_pci_link_allocate_irq(entry->link,
+ entry->index,
+ &trigger, &polarity,
+ NULL);
+ else
+ gsi = entry->index;
+ } else
+ return -EINVAL;
+
+ if (gsi < 0)
+ return -EINVAL;
+
+ gsi_info->gsi = gsi;
+ gsi_info->trigger = trigger;
+ gsi_info->polarity = polarity;
+
+ return 0;
+}
+
+static int xen_pvh_setup_gsi(gsi_info_t *gsi_info)
+{
+ struct physdev_setup_gsi setup_gsi;
+
+ if (!gsi_info)
+ return -EINVAL;
+
+ setup_gsi.gsi = gsi_info->gsi;
+ setup_gsi.triggering = (gsi_info->trigger == ACPI_EDGE_SENSITIVE ? 0 : 1);
+ setup_gsi.polarity = (gsi_info->polarity == ACPI_ACTIVE_HIGH ? 0 : 1);
+
+ return HYPERVISOR_physdev_op(PHYSDEVOP_setup_gsi, &setup_gsi);
+}
+
+static int xen_pvh_map_pirq(gsi_info_t *gsi_info)
+{
+ struct physdev_map_pirq map_irq;
+ int ret;
+
+ if (!gsi_info)
+ return -EINVAL;
+
+ map_irq.domid = DOMID_SELF;
+ map_irq.type = MAP_PIRQ_TYPE_GSI;
+ map_irq.index = gsi_info->gsi;
+ map_irq.pirq = gsi_info->gsi;
+
+ ret = HYPERVISOR_physdev_op(PHYSDEVOP_map_pirq, &map_irq);
+ gsi_info->pirq = map_irq.pirq;
+
+ return ret;
+}
+
+int xen_pvh_passthrough_gsi(struct pci_dev *dev)
+{
+ int ret;
+ gsi_info_t gsi_info;
+
+ if (!dev)
+ return -EINVAL;
+
+ ret = xen_pvh_get_gsi_info(dev, &gsi_info);
+ if (ret) {
+ xen_raw_printk("Fail to get gsi info!\n");
+ return ret;
+ }
+
+ ret = xen_pvh_setup_gsi(&gsi_info);
+ if (ret == -EEXIST) {
+ ret = 0;
+ xen_raw_printk("Already setup the GSI :%u\n", gsi_info.gsi);
+ } else if (ret) {
+ xen_raw_printk("Fail to setup gsi (%d)!\n", gsi_info.gsi);
+ return ret;
+ }
+
+ ret = xen_pvh_map_pirq(&gsi_info);
+ if (ret)
+ xen_raw_printk("Fail to map pirq for gsi (%d)!\n", gsi_info.gsi);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(xen_pvh_passthrough_gsi);
+
void __init xen_pvh_init(struct boot_params *boot_params)
{
u32 msr;
diff --git a/drivers/acpi/pci_irq.c b/drivers/acpi/pci_irq.c
index ff30ceca2203..630fe0a34bc6 100644
--- a/drivers/acpi/pci_irq.c
+++ b/drivers/acpi/pci_irq.c
@@ -288,7 +288,7 @@ static int acpi_reroute_boot_interrupt(struct pci_dev *dev,
}
#endif /* CONFIG_X86_IO_APIC */
-static struct acpi_prt_entry *acpi_pci_irq_lookup(struct pci_dev *dev, int pin)
+struct acpi_prt_entry *acpi_pci_irq_lookup(struct pci_dev *dev, int pin)
{
struct acpi_prt_entry *entry = NULL;
struct pci_dev *bridge;
diff --git a/drivers/xen/xen-pciback/pci_stub.c b/drivers/xen/xen-pciback/pci_stub.c
index 24f599eaec14..c3aeefbf4ba1 100644
--- a/drivers/xen/xen-pciback/pci_stub.c
+++ b/drivers/xen/xen-pciback/pci_stub.c
@@ -20,6 +20,7 @@
#include <linux/atomic.h>
#include <xen/events.h>
#include <xen/pci.h>
+#include <xen/acpi.h>
#include <xen/xen.h>
#include <asm/xen/hypervisor.h>
#include <xen/interface/physdev.h>
@@ -427,6 +428,13 @@ static int pcistub_init_device(struct pci_dev *dev)
goto config_release;
pci_restore_state(dev);
}
+
+ if (xen_initial_domain() && xen_pvh_domain()) {
+ err = xen_pvh_passthrough_gsi(dev);
+ if (err)
+ goto config_release;
+ }
+
/* Now disable the device (this also ensures some private device
* data is setup before we export)
*/
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 54189e0e5f41..a211bdcdd6ff 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -360,6 +360,7 @@ void acpi_unregister_gsi (u32 gsi);
struct pci_dev;
+struct acpi_prt_entry *acpi_pci_irq_lookup(struct pci_dev *dev, int pin);
int acpi_pci_irq_enable (struct pci_dev *dev);
void acpi_penalize_isa_irq(int irq, int active);
bool acpi_isa_irq_available(int irq);
diff --git a/include/xen/acpi.h b/include/xen/acpi.h
index b1e11863144d..ce7f5554f88e 100644
--- a/include/xen/acpi.h
+++ b/include/xen/acpi.h
@@ -67,6 +67,7 @@ static inline void xen_acpi_sleep_register(void)
acpi_suspend_lowlevel = xen_acpi_suspend_lowlevel;
}
}
+int xen_pvh_passthrough_gsi(struct pci_dev *dev);
#else
static inline void xen_acpi_sleep_register(void)
{
--
2.34.1
Some scenarios need a gsi sysfs attribute.
For example, when Xen passes a device through to a domU, it
uses the gsi to map the pirq, but currently userspace can't get
the gsi number.
So, add a gsi sysfs attribute for that and for other potential scenarios.
Co-developed-by: Huang Rui <[email protected]>
Signed-off-by: Jiqian Chen <[email protected]>
---
drivers/acpi/pci_irq.c | 1 +
drivers/pci/pci-sysfs.c | 11 +++++++++++
include/linux/pci.h | 2 ++
3 files changed, 14 insertions(+)
diff --git a/drivers/acpi/pci_irq.c b/drivers/acpi/pci_irq.c
index 630fe0a34bc6..739a58755df2 100644
--- a/drivers/acpi/pci_irq.c
+++ b/drivers/acpi/pci_irq.c
@@ -449,6 +449,7 @@ int acpi_pci_irq_enable(struct pci_dev *dev)
kfree(entry);
return 0;
}
+ dev->gsi = gsi;
rc = acpi_register_gsi(&dev->dev, gsi, triggering, polarity);
if (rc < 0) {
diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index 2321fdfefd7d..c51df88d079e 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -71,6 +71,16 @@ static ssize_t irq_show(struct device *dev,
}
static DEVICE_ATTR_RO(irq);
+static ssize_t gsi_show(struct device *dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+ struct pci_dev *pdev = to_pci_dev(dev);
+
+ return sysfs_emit(buf, "%u\n", pdev->gsi);
+}
+static DEVICE_ATTR_RO(gsi);
+
static ssize_t broken_parity_status_show(struct device *dev,
struct device_attribute *attr,
char *buf)
@@ -596,6 +606,7 @@ static struct attribute *pci_dev_attrs[] = {
&dev_attr_revision.attr,
&dev_attr_class.attr,
&dev_attr_irq.attr,
+ &dev_attr_gsi.attr,
&dev_attr_local_cpus.attr,
&dev_attr_local_cpulist.attr,
&dev_attr_modalias.attr,
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 60ca768bc867..7ef9060b239c 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -529,6 +529,8 @@ struct pci_dev {
/* These methods index pci_reset_fn_methods[] */
u8 reset_methods[PCI_NUM_RESET_METHODS]; /* In priority order */
+
+ unsigned int gsi;
};
static inline struct pci_dev *pci_physfn(struct pci_dev *dev)
--
2.34.1
On Mon, Dec 11, 2023 at 12:15:19AM +0800, Jiqian Chen wrote:
> There is a need for some scenarios to use gsi sysfs.
> For example, when xen passthrough a device to dumU, it will
> use gsi to map pirq, but currently userspace can't get gsi
> number.
> So, add gsi sysfs for that and for other potential scenarios.
>
> Co-developed-by: Huang Rui <[email protected]>
> Signed-off-by: Jiqian Chen <[email protected]>
> ---
> drivers/acpi/pci_irq.c | 1 +
> drivers/pci/pci-sysfs.c | 11 +++++++++++
> include/linux/pci.h | 2 ++
> 3 files changed, 14 insertions(+)
>
> diff --git a/drivers/acpi/pci_irq.c b/drivers/acpi/pci_irq.c
> index 630fe0a34bc6..739a58755df2 100644
> --- a/drivers/acpi/pci_irq.c
> +++ b/drivers/acpi/pci_irq.c
> @@ -449,6 +449,7 @@ int acpi_pci_irq_enable(struct pci_dev *dev)
> kfree(entry);
> return 0;
> }
> + dev->gsi = gsi;
It would be better if the gsi is fetched without requiring calling
acpi_pci_irq_enable(), as the gsi doesn't require the interrupt to be
enabled. The gsi is known at boot time and won't change for the
lifetime of the device.
>
> rc = acpi_register_gsi(&dev->dev, gsi, triggering, polarity);
> if (rc < 0) {
> diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
> index 2321fdfefd7d..c51df88d079e 100644
> --- a/drivers/pci/pci-sysfs.c
> +++ b/drivers/pci/pci-sysfs.c
> @@ -71,6 +71,16 @@ static ssize_t irq_show(struct device *dev,
> }
> static DEVICE_ATTR_RO(irq);
>
> +static ssize_t gsi_show(struct device *dev,
> + struct device_attribute *attr,
> + char *buf)
> +{
> + struct pci_dev *pdev = to_pci_dev(dev);
const
Thanks, Roger.
On 2023/12/12 01:57, Roger Pau Monné wrote:
> On Mon, Dec 11, 2023 at 12:15:19AM +0800, Jiqian Chen wrote:
>> There is a need for some scenarios to use gsi sysfs.
>> For example, when xen passthrough a device to dumU, it will
>> use gsi to map pirq, but currently userspace can't get gsi
>> number.
>> So, add gsi sysfs for that and for other potential scenarios.
>>
>> Co-developed-by: Huang Rui <[email protected]>
>> Signed-off-by: Jiqian Chen <[email protected]>
>> ---
>> drivers/acpi/pci_irq.c | 1 +
>> drivers/pci/pci-sysfs.c | 11 +++++++++++
>> include/linux/pci.h | 2 ++
>> 3 files changed, 14 insertions(+)
>>
>> diff --git a/drivers/acpi/pci_irq.c b/drivers/acpi/pci_irq.c
>> index 630fe0a34bc6..739a58755df2 100644
>> --- a/drivers/acpi/pci_irq.c
>> +++ b/drivers/acpi/pci_irq.c
>> @@ -449,6 +449,7 @@ int acpi_pci_irq_enable(struct pci_dev *dev)
>> kfree(entry);
>> return 0;
>> }
>> + dev->gsi = gsi;
>
> It would be better if the gsi if fetched without requiring calling
> acpi_pci_irq_enable(), as the gsi doesn't require the interrupt to be
> enabled. The gsi is known at boot time and won't change for the
> lifetime of the device.
Do you have any suggested places to do this?
>
>>
>> rc = acpi_register_gsi(&dev->dev, gsi, triggering, polarity);
>> if (rc < 0) {
>> diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
>> index 2321fdfefd7d..c51df88d079e 100644
>> --- a/drivers/pci/pci-sysfs.c
>> +++ b/drivers/pci/pci-sysfs.c
>> @@ -71,6 +71,16 @@ static ssize_t irq_show(struct device *dev,
>> }
>> static DEVICE_ATTR_RO(irq);
>>
>> +static ssize_t gsi_show(struct device *dev,
>> + struct device_attribute *attr,
>> + char *buf)
>> +{
>> + struct pci_dev *pdev = to_pci_dev(dev);
>
> const
Do you mean "const struct pci_dev *pdev = to_pci_dev(dev);" ?
>
> Thanks, Roger.
--
Best regards,
Jiqian Chen.
On Mon, Dec 11, 2023 at 12:15:17AM +0800, Jiqian Chen wrote:
> When device on dom0 side has been reset, the vpci on Xen side
> won't get notification, so that the cached state in vpci is
> all out of date with the real device state.
> To solve that problem, add a new function to clear all vpci
> device state when device is reset on dom0 side.
>
> And call that function in pcistub_init_device. Because when
> using "pci-assignable-add" to assign a passthrough device in
> Xen, it will reset passthrough device and the vpci state will
> out of date, and then device will fail to restore bar state.
>
> Co-developed-by: Huang Rui <[email protected]>
> Signed-off-by: Jiqian Chen <[email protected]>
> ---
> drivers/xen/pci.c | 12 ++++++++++++
> drivers/xen/xen-pciback/pci_stub.c | 4 ++++
> include/xen/interface/physdev.h | 8 ++++++++
> include/xen/pci.h | 6 ++++++
> 4 files changed, 30 insertions(+)
>
> diff --git a/drivers/xen/pci.c b/drivers/xen/pci.c
> index 72d4e3f193af..e9b30bc09139 100644
> --- a/drivers/xen/pci.c
> +++ b/drivers/xen/pci.c
> @@ -177,6 +177,18 @@ static int xen_remove_device(struct device *dev)
> return r;
> }
>
> +int xen_reset_device_state(const struct pci_dev *dev)
> +{
> + struct physdev_pci_device device = {
> + .seg = pci_domain_nr(dev->bus),
> + .bus = dev->bus->number,
> + .devfn = dev->devfn
> + };
> +
> + return HYPERVISOR_physdev_op(PHYSDEVOP_pci_device_state_reset, &device);
> +}
> +EXPORT_SYMBOL_GPL(xen_reset_device_state);
> +
> static int xen_pci_notifier(struct notifier_block *nb,
> unsigned long action, void *data)
> {
> diff --git a/drivers/xen/xen-pciback/pci_stub.c b/drivers/xen/xen-pciback/pci_stub.c
> index e34b623e4b41..24f599eaec14 100644
> --- a/drivers/xen/xen-pciback/pci_stub.c
> +++ b/drivers/xen/xen-pciback/pci_stub.c
> @@ -421,6 +421,10 @@ static int pcistub_init_device(struct pci_dev *dev)
> else {
> dev_dbg(&dev->dev, "resetting (FLR, D3, etc) the device\n");
> __pci_reset_function_locked(dev);
> + if (!xen_pv_domain())
> + err = xen_reset_device_state(dev);
> + if (err)
> + goto config_release;
I think you are missing other instances where
__pci_reset_function_locked() is called in pci_stub.c? See
pcistub_device_release() and pcistub_put_pci_dev().
Overall I'm not sure why the hypercall wrapper needs to live in
xen/pci.c. I think it would be better if you could create a static
wrapper in pci_stub.c that does the call to
__pci_reset_function_locked() plus PHYSDEVOP_pci_device_state_reset.
That would make it less likely that new callers of
__pci_reset_function_locked() are introduced without noticing the need
to also call PHYSDEVOP_pci_device_state_reset.
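A minimal userspace sketch of the wrapper shape suggested above follows. The kernel calls are replaced by mocks so the sketch compiles on its own: mock_pci_reset_function_locked() stands in for __pci_reset_function_locked(), mock_reset_vpci_state() for the PHYSDEVOP_pci_device_state_reset hypercall, and mock_is_pv_domain for xen_pv_domain(). The wrapper name pcistub_reset_device_state() is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Mock device: counters record which steps ran. */
struct pci_dev {
	int reset_count;	/* function resets performed */
	int vpci_reset_count;	/* successful vpci state resets */
};

static bool mock_is_pv_domain;	/* stands in for xen_pv_domain() */
static int mock_hypercall_ret;	/* return value of the mocked hypercall */

/* Stands in for __pci_reset_function_locked(). */
static int mock_pci_reset_function_locked(struct pci_dev *dev)
{
	dev->reset_count++;
	return 0;
}

/* Stands in for the PHYSDEVOP_pci_device_state_reset hypercall. */
static int mock_reset_vpci_state(struct pci_dev *dev)
{
	if (mock_hypercall_ret == 0)
		dev->vpci_reset_count++;
	return mock_hypercall_ret;
}

/* Single entry point: every reset also tells Xen to drop its cached
 * vpci state, except on PV domains as in the posted patch. The return
 * value of the function reset is ignored, as in the posted patch. */
static int pcistub_reset_device_state(struct pci_dev *dev)
{
	assert(dev);
	mock_pci_reset_function_locked(dev);
	if (mock_is_pv_domain)
		return 0;
	return mock_reset_vpci_state(dev);
}
```

With one static helper like this, pcistub_init_device(), pcistub_device_release() and pcistub_put_pci_dev() would all go through the same path, so a new caller could not reset the device without also resetting the vpci state.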
Thanks, Roger.
On Tue, Dec 12, 2023 at 06:34:27AM +0000, Chen, Jiqian wrote:
>
> On 2023/12/12 01:57, Roger Pau Monné wrote:
> > On Mon, Dec 11, 2023 at 12:15:19AM +0800, Jiqian Chen wrote:
> >> There is a need for some scenarios to use gsi sysfs.
> >> For example, when xen passthrough a device to dumU, it will
> >> use gsi to map pirq, but currently userspace can't get gsi
> >> number.
> >> So, add gsi sysfs for that and for other potential scenarios.
> >>
> >> Co-developed-by: Huang Rui <[email protected]>
> >> Signed-off-by: Jiqian Chen <[email protected]>
> >> ---
> >> drivers/acpi/pci_irq.c | 1 +
> >> drivers/pci/pci-sysfs.c | 11 +++++++++++
> >> include/linux/pci.h | 2 ++
> >> 3 files changed, 14 insertions(+)
> >>
> >> diff --git a/drivers/acpi/pci_irq.c b/drivers/acpi/pci_irq.c
> >> index 630fe0a34bc6..739a58755df2 100644
> >> --- a/drivers/acpi/pci_irq.c
> >> +++ b/drivers/acpi/pci_irq.c
> >> @@ -449,6 +449,7 @@ int acpi_pci_irq_enable(struct pci_dev *dev)
> >> kfree(entry);
> >> return 0;
> >> }
> >> + dev->gsi = gsi;
> >
> > It would be better if the gsi if fetched without requiring calling
> > acpi_pci_irq_enable(), as the gsi doesn't require the interrupt to be
> > enabled. The gsi is known at boot time and won't change for the
> > lifetime of the device.
> Do you have any suggest places to do this?
I'm not an expert on this, but drivers/pci/pci-sysfs.c would seem like
a better place, together with the rest of the resources.
Maybe my understanding is incorrect, but given the suggested placement
in acpi_pci_irq_enable() I think the device would need to bind the
interrupt in order for the gsi node to appear on sysfs?
Would the current approach work if the device is assigned to pciback
on the kernel command line, and thus never owned by any driver in
dom0?
> >
> >>
> >> rc = acpi_register_gsi(&dev->dev, gsi, triggering, polarity);
> >> if (rc < 0) {
> >> diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
> >> index 2321fdfefd7d..c51df88d079e 100644
> >> --- a/drivers/pci/pci-sysfs.c
> >> +++ b/drivers/pci/pci-sysfs.c
> >> @@ -71,6 +71,16 @@ static ssize_t irq_show(struct device *dev,
> >> }
> >> static DEVICE_ATTR_RO(irq);
> >>
> >> +static ssize_t gsi_show(struct device *dev,
> >> + struct device_attribute *attr,
> >> + char *buf)
> >> +{
> >> + struct pci_dev *pdev = to_pci_dev(dev);
> >
> > const
> Do you mean "const struct pci_dev *pdev = to_pci_dev(dev);" ?
Yup.
Thanks, Roger.
On 2023/12/12 16:08, Roger Pau Monné wrote:
> On Mon, Dec 11, 2023 at 12:15:17AM +0800, Jiqian Chen wrote:
>> When device on dom0 side has been reset, the vpci on Xen side
>> won't get notification, so that the cached state in vpci is
>> all out of date with the real device state.
>> To solve that problem, add a new function to clear all vpci
>> device state when device is reset on dom0 side.
>>
>> And call that function in pcistub_init_device. Because when
>> using "pci-assignable-add" to assign a passthrough device in
>> Xen, it will reset passthrough device and the vpci state will
>> out of date, and then device will fail to restore bar state.
>>
>> Co-developed-by: Huang Rui <[email protected]>
>> Signed-off-by: Jiqian Chen <[email protected]>
>> ---
>> drivers/xen/pci.c | 12 ++++++++++++
>> drivers/xen/xen-pciback/pci_stub.c | 4 ++++
>> include/xen/interface/physdev.h | 8 ++++++++
>> include/xen/pci.h | 6 ++++++
>> 4 files changed, 30 insertions(+)
>>
>> diff --git a/drivers/xen/pci.c b/drivers/xen/pci.c
>> index 72d4e3f193af..e9b30bc09139 100644
>> --- a/drivers/xen/pci.c
>> +++ b/drivers/xen/pci.c
>> @@ -177,6 +177,18 @@ static int xen_remove_device(struct device *dev)
>> return r;
>> }
>>
>> +int xen_reset_device_state(const struct pci_dev *dev)
>> +{
>> + struct physdev_pci_device device = {
>> + .seg = pci_domain_nr(dev->bus),
>> + .bus = dev->bus->number,
>> + .devfn = dev->devfn
>> + };
>> +
>> + return HYPERVISOR_physdev_op(PHYSDEVOP_pci_device_state_reset, &device);
>> +}
>> +EXPORT_SYMBOL_GPL(xen_reset_device_state);
>> +
>> static int xen_pci_notifier(struct notifier_block *nb,
>> unsigned long action, void *data)
>> {
>> diff --git a/drivers/xen/xen-pciback/pci_stub.c b/drivers/xen/xen-pciback/pci_stub.c
>> index e34b623e4b41..24f599eaec14 100644
>> --- a/drivers/xen/xen-pciback/pci_stub.c
>> +++ b/drivers/xen/xen-pciback/pci_stub.c
>> @@ -421,6 +421,10 @@ static int pcistub_init_device(struct pci_dev *dev)
>> else {
>> dev_dbg(&dev->dev, "resetting (FLR, D3, etc) the device\n");
>> __pci_reset_function_locked(dev);
>> + if (!xen_pv_domain())
>> + err = xen_reset_device_state(dev);
>> + if (err)
>> + goto config_release;
>
> I think you are missing other instances where
> __pci_reset_function_locked() is called in pci_stub.c? See
> pcistub_device_release() and pcistub_put_pci_dev().
Sorry, I didn't consider the case of freeing a passthrough device. You are right.
>
> Overall I'm not sure why the hypercall wrapper needs to live in
> xen/pci.c.
For other possible scenarios where this function may be used?
> I think it would be better if you could create a static wrapper in pci_stub.c that does the call to
> __pci_reset_function_locked() plus PHYSDEVOP_pci_device_state_reset.
> That would make it less likely that new callers of
> __pci_reset_function_locked() are introduced without noticing the need
> to also call PHYSDEVOP_pci_device_state_reset.
OK, I will add a new function in pci_stub.c that does __pci_reset_function_locked and PHYSDEVOP_pci_device_state_reset.
>
> Thanks, Roger.
--
Best regards,
Jiqian Chen.
On 2023/12/12 17:18, Roger Pau Monné wrote:
> On Tue, Dec 12, 2023 at 06:34:27AM +0000, Chen, Jiqian wrote:
>>
>> On 2023/12/12 01:57, Roger Pau Monné wrote:
>>> On Mon, Dec 11, 2023 at 12:15:19AM +0800, Jiqian Chen wrote:
>>>> There is a need for some scenarios to use gsi sysfs.
>>>> For example, when xen passthrough a device to dumU, it will
>>>> use gsi to map pirq, but currently userspace can't get gsi
>>>> number.
>>>> So, add gsi sysfs for that and for other potential scenarios.
>>>>
>>>> Co-developed-by: Huang Rui <[email protected]>
>>>> Signed-off-by: Jiqian Chen <[email protected]>
>>>> ---
>>>> drivers/acpi/pci_irq.c | 1 +
>>>> drivers/pci/pci-sysfs.c | 11 +++++++++++
>>>> include/linux/pci.h | 2 ++
>>>> 3 files changed, 14 insertions(+)
>>>>
>>>> diff --git a/drivers/acpi/pci_irq.c b/drivers/acpi/pci_irq.c
>>>> index 630fe0a34bc6..739a58755df2 100644
>>>> --- a/drivers/acpi/pci_irq.c
>>>> +++ b/drivers/acpi/pci_irq.c
>>>> @@ -449,6 +449,7 @@ int acpi_pci_irq_enable(struct pci_dev *dev)
>>>> kfree(entry);
>>>> return 0;
>>>> }
>>>> + dev->gsi = gsi;
>>>
>>> It would be better if the gsi if fetched without requiring calling
>>> acpi_pci_irq_enable(), as the gsi doesn't require the interrupt to be
>>> enabled. The gsi is known at boot time and won't change for the
>>> lifetime of the device.
>> Do you have any suggest places to do this?
>
> I'm not an expert on this, but drivers/pci/pci-sysfs.c would seem like
> a better place, together with the rest of the resources.
I'm not familiar with this either. But it seems pci-sysfs.c only creates the sysfs nodes and implements the read/write methods, without initializing the values.
If we want to initialize the value of gsi here, one approach would be to call acpi_pci_irq_lookup to get the gsi number the first time it is read?
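The "initialize on first read" idea can be sketched in userspace as follows. This is only a model under stated assumptions: struct mock_pci_dev, mock_lookup_gsi() (standing in for a lookup via acpi_pci_irq_lookup) and gsi_read() (standing in for gsi_show) are hypothetical names, and the gsi_valid flag is not part of the posted patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Mock device carrying a lazily filled gsi cache. */
struct mock_pci_dev {
	unsigned int gsi;	/* cached gsi, valid only if gsi_valid */
	bool gsi_valid;
	int lookup_calls;	/* how often the lookup actually ran */
	int prt_gsi;		/* what the mock lookup returns; < 0 = no entry */
};

/* Stands in for resolving the gsi from the ACPI routing table. */
static int mock_lookup_gsi(struct mock_pci_dev *dev)
{
	dev->lookup_calls++;
	return dev->prt_gsi;
}

/* Read path: resolve the gsi on the first read, then reuse the cache.
 * Returns 0 on success or the negative lookup result on failure. */
static int gsi_read(struct mock_pci_dev *dev, unsigned int *gsi)
{
	assert(dev && gsi);
	if (!dev->gsi_valid) {
		int ret = mock_lookup_gsi(dev);

		if (ret < 0)
			return ret;
		dev->gsi = (unsigned int)ret;
		dev->gsi_valid = true;
	}
	*gsi = dev->gsi;
	return 0;
}
```

In this shape the value no longer depends on acpi_pci_irq_enable() having run, which is the concern raised in the review; a failed lookup is not cached, so a later read retries it.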
>
> Maybe my understanding is incorrect, but given the suggested placement
> in acpi_pci_irq_enable() I think the device would need to bind the
> interrupt in order for the gsi node to appear on sysfs?
No, the gsi sysfs node already exists regardless; the code in acpi_pci_irq_enable only initializes the value of gsi.
>
> Would the current approach work if the device is assigned to pciback
> on the kernel command line, and thus never owned by any driver in
> dom0?
If the device is assigned to pciback, I think pciback will enable the device, acpi_pci_irq_enable will then be called, and the gsi will be initialized. So the current approach should work.
>
>>>
>>>>
>>>> rc = acpi_register_gsi(&dev->dev, gsi, triggering, polarity);
>>>> if (rc < 0) {
>>>> diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
>>>> index 2321fdfefd7d..c51df88d079e 100644
>>>> --- a/drivers/pci/pci-sysfs.c
>>>> +++ b/drivers/pci/pci-sysfs.c
>>>> @@ -71,6 +71,16 @@ static ssize_t irq_show(struct device *dev,
>>>> }
>>>> static DEVICE_ATTR_RO(irq);
>>>>
>>>> +static ssize_t gsi_show(struct device *dev,
>>>> + struct device_attribute *attr,
>>>> + char *buf)
>>>> +{
>>>> + struct pci_dev *pdev = to_pci_dev(dev);
>>>
>>> const
>> Do you mean "const struct pci_dev *pdev = to_pci_dev(dev);" ?
>
> Yup.
>
> Thanks, Roger.
--
Best regards,
Jiqian Chen.
On Wed, Dec 13, 2023 at 03:31:21AM +0000, Chen, Jiqian wrote:
> On 2023/12/12 17:18, Roger Pau Monné wrote:
> > On Tue, Dec 12, 2023 at 06:34:27AM +0000, Chen, Jiqian wrote:
> >>
> >> On 2023/12/12 01:57, Roger Pau Monné wrote:
> >>> On Mon, Dec 11, 2023 at 12:15:19AM +0800, Jiqian Chen wrote:
> >>>> There is a need for some scenarios to use gsi sysfs.
> >>>> For example, when Xen passes through a device to a domU, it will
> >>>> use gsi to map pirq, but currently userspace can't get gsi
> >>>> number.
> >>>> So, add gsi sysfs for that and for other potential scenarios.
> >>>>
> >>>> Co-developed-by: Huang Rui <[email protected]>
> >>>> Signed-off-by: Jiqian Chen <[email protected]>
> >>>> ---
> >>>> drivers/acpi/pci_irq.c | 1 +
> >>>> drivers/pci/pci-sysfs.c | 11 +++++++++++
> >>>> include/linux/pci.h | 2 ++
> >>>> 3 files changed, 14 insertions(+)
> >>>>
> >>>> diff --git a/drivers/acpi/pci_irq.c b/drivers/acpi/pci_irq.c
> >>>> index 630fe0a34bc6..739a58755df2 100644
> >>>> --- a/drivers/acpi/pci_irq.c
> >>>> +++ b/drivers/acpi/pci_irq.c
> >>>> @@ -449,6 +449,7 @@ int acpi_pci_irq_enable(struct pci_dev *dev)
> >>>> kfree(entry);
> >>>> return 0;
> >>>> }
> >>>> + dev->gsi = gsi;
> >>>
> >>> It would be better if the gsi is fetched without requiring a call to
> >>> acpi_pci_irq_enable(), as the gsi doesn't require the interrupt to be
> >>> enabled. The gsi is known at boot time and won't change for the
> >>> lifetime of the device.
> >> Do you have any suggested places to do this?
> >
> > I'm not an expert on this, but drivers/pci/pci-sysfs.c would seem like
> > a better place, together with the rest of the resources.
> I'm not familiar with this either. But it seems pci-sysfs.c only creates the sysfs nodes and implements their read/write methods; it does not initialize the values.
> If we want to initialize the value of gsi there, would one approach be to call acpi_pci_irq_lookup to get the gsi number the first time it is read?
Hm, maybe, I don't really have much experience with sysfs, so don't
know how nodes are usually initialized.
> >
> > Maybe my understanding is incorrect, but given the suggested placement
> > in acpi_pci_irq_enable() I think the device would need to bind the
> > interrupt in order for the gsi node to appear on sysfs?
> No, the gsi sysfs node already exists regardless; the code in acpi_pci_irq_enable only initializes the value of gsi.
>
> >
> > Would the current approach work if the device is assigned to pciback
> > on the kernel command line, and thus never owned by any driver in
> > dom0?
> If the device is assigned to pciback, I think pciback will enable the device, acpi_pci_irq_enable will then be called, and the gsi will be initialized. So the current approach should work.
This needs checking to be sure, I'm certainly not that familiar. You
would need to at least test that it works properly when the device is
hidden using xen-pciback.hide=(SBDF) in the Linux kernel command line.
Thanks, Roger.
On 2023/12/13 20:12, Roger Pau Monné wrote:
> On Wed, Dec 13, 2023 at 03:31:21AM +0000, Chen, Jiqian wrote:
>> On 2023/12/12 17:18, Roger Pau Monné wrote:
>>> On Tue, Dec 12, 2023 at 06:34:27AM +0000, Chen, Jiqian wrote:
>>>>
>>>> On 2023/12/12 01:57, Roger Pau Monné wrote:
>>>>> On Mon, Dec 11, 2023 at 12:15:19AM +0800, Jiqian Chen wrote:
>>>>>> There is a need for some scenarios to use gsi sysfs.
>>>>>> For example, when Xen passes through a device to a domU, it will
>>>>>> use gsi to map pirq, but currently userspace can't get gsi
>>>>>> number.
>>>>>> So, add gsi sysfs for that and for other potential scenarios.
>>>>>>
>>>>>> Co-developed-by: Huang Rui <[email protected]>
>>>>>> Signed-off-by: Jiqian Chen <[email protected]>
>>>>>> ---
>>>>>> drivers/acpi/pci_irq.c | 1 +
>>>>>> drivers/pci/pci-sysfs.c | 11 +++++++++++
>>>>>> include/linux/pci.h | 2 ++
>>>>>> 3 files changed, 14 insertions(+)
>>>>>>
>>>>>> diff --git a/drivers/acpi/pci_irq.c b/drivers/acpi/pci_irq.c
>>>>>> index 630fe0a34bc6..739a58755df2 100644
>>>>>> --- a/drivers/acpi/pci_irq.c
>>>>>> +++ b/drivers/acpi/pci_irq.c
>>>>>> @@ -449,6 +449,7 @@ int acpi_pci_irq_enable(struct pci_dev *dev)
>>>>>> kfree(entry);
>>>>>> return 0;
>>>>>> }
>>>>>> + dev->gsi = gsi;
>>>>>
>>>>> It would be better if the gsi is fetched without requiring a call to
>>>>> acpi_pci_irq_enable(), as the gsi doesn't require the interrupt to be
>>>>> enabled. The gsi is known at boot time and won't change for the
>>>>> lifetime of the device.
>>>> Do you have any suggested places to do this?
>>>
>>> I'm not an expert on this, but drivers/pci/pci-sysfs.c would seem like
>>> a better place, together with the rest of the resources.
>> I'm not familiar with this either. But it seems pci-sysfs.c only creates the sysfs nodes and implements their read/write methods; it does not initialize the values.
>> If we want to initialize the value of gsi there, would one approach be to call acpi_pci_irq_lookup to get the gsi number the first time it is read?
>
> Hm, maybe, I don't really have much experience with sysfs, so don't
> know how nodes are usually initialized.
Maybe the sysfs maintainers can suggest a suitable place to initialize the value of gsi.
>
>>>
>>> Maybe my understanding is incorrect, but given the suggested placement
>>> in acpi_pci_irq_enable() I think the device would need to bind the
>>> interrupt in order for the gsi node to appear on sysfs?
>> No, the gsi sysfs node already exists regardless; the code in acpi_pci_irq_enable only initializes the value of gsi.
>>
>>>
>>> Would the current approach work if the device is assigned to pciback
>>> on the kernel command line, and thus never owned by any driver in
>>> dom0?
>> If the device is assigned to pciback, I think pciback will enable the device, acpi_pci_irq_enable will then be called, and the gsi will be initialized. So the current approach should work.
>
> This needs checking to be sure, I'm certainly not that familiar. You
> would need to at least test that it works properly when the device is
> hidden using xen-pciback.hide=(SBDF) in the Linux kernel command line.
Sure, I have validated it on my side. Both static assignment with built-in xen-pciback (xen-pciback.hide=(SBDF)) and dynamic assignment with xl (sudo modprobe xen-pciback & sudo xl pci-assignable-add SBDF) work fine with the current implementation.
>
> Thanks, Roger.
--
Best regards,
Jiqian Chen.
On Thu, Dec 14, 2023 at 07:08:32AM +0000, Chen, Jiqian wrote:
> On 2023/12/13 20:12, Roger Pau Monné wrote:
> > On Wed, Dec 13, 2023 at 03:31:21AM +0000, Chen, Jiqian wrote:
> >> On 2023/12/12 17:18, Roger Pau Monné wrote:
> >>> On Tue, Dec 12, 2023 at 06:34:27AM +0000, Chen, Jiqian wrote:
> >>>>
> >>>> On 2023/12/12 01:57, Roger Pau Monné wrote:
> >>>>> On Mon, Dec 11, 2023 at 12:15:19AM +0800, Jiqian Chen wrote:
> >>>>>> There is a need for some scenarios to use gsi sysfs.
> >>>>>> For example, when Xen passes through a device to a domU, it will
> >>>>>> use gsi to map pirq, but currently userspace can't get gsi
> >>>>>> number.
> >>>>>> So, add gsi sysfs for that and for other potential scenarios.
> >>>>>>
> >>>>>> Co-developed-by: Huang Rui <[email protected]>
> >>>>>> Signed-off-by: Jiqian Chen <[email protected]>
> >>>>>> ---
> >>>>>> drivers/acpi/pci_irq.c | 1 +
> >>>>>> drivers/pci/pci-sysfs.c | 11 +++++++++++
> >>>>>> include/linux/pci.h | 2 ++
> >>>>>> 3 files changed, 14 insertions(+)
> >>>>>>
> >>>>>> diff --git a/drivers/acpi/pci_irq.c b/drivers/acpi/pci_irq.c
> >>>>>> index 630fe0a34bc6..739a58755df2 100644
> >>>>>> --- a/drivers/acpi/pci_irq.c
> >>>>>> +++ b/drivers/acpi/pci_irq.c
> >>>>>> @@ -449,6 +449,7 @@ int acpi_pci_irq_enable(struct pci_dev *dev)
> >>>>>> kfree(entry);
> >>>>>> return 0;
> >>>>>> }
> >>>>>> + dev->gsi = gsi;
> >>>>>
> >>>>> It would be better if the gsi is fetched without requiring a call to
> >>>>> acpi_pci_irq_enable(), as the gsi doesn't require the interrupt to be
> >>>>> enabled. The gsi is known at boot time and won't change for the
> >>>>> lifetime of the device.
> >>>> Do you have any suggested places to do this?
> >>>
> >>> I'm not an expert on this, but drivers/pci/pci-sysfs.c would seem like
> >>> a better place, together with the rest of the resources.
> >> I'm not familiar with this either. But it seems pci-sysfs.c only creates the sysfs nodes and implements their read/write methods; it does not initialize the values.
> >> If we want to initialize the value of gsi there, would one approach be to call acpi_pci_irq_lookup to get the gsi number the first time it is read?
> >
> > Hm, maybe, I don't really have much experience with sysfs, so don't
> > know how nodes are usually initialized.
> Maybe the sysfs maintainers can suggest a suitable place to initialize the value of gsi.
>
> >
> >>>
> >>> Maybe my understanding is incorrect, but given the suggested placement
> >>> in acpi_pci_irq_enable() I think the device would need to bind the
> >>> interrupt in order for the gsi node to appear on sysfs?
> >> No, the gsi sysfs node already exists regardless; the code in acpi_pci_irq_enable only initializes the value of gsi.
> >>
> >>>
> >>> Would the current approach work if the device is assigned to pciback
> >>> on the kernel command line, and thus never owned by any driver in
> >>> dom0?
> >> If the device is assigned to pciback, I think pciback will enable the device, acpi_pci_irq_enable will then be called, and the gsi will be initialized. So the current approach should work.
> >
> > This needs checking to be sure, I'm certainly not that familiar. You
> > would need to at least test that it works properly when the device is
> > hidden using xen-pciback.hide=(SBDF) in the Linux kernel command line.
> Sure, I have validated it on my side. Both static assignment with built-in xen-pciback (xen-pciback.hide=(SBDF)) and dynamic assignment with xl (sudo modprobe xen-pciback & sudo xl pci-assignable-add SBDF) work fine with the current implementation.
Oh, OK, if that's the case I don't have much objection in doing the
initialization in acpi_pci_irq_enable(), that's an internal Linux
detail. I mostly care about the GSI being exposed in sysfs in a
non-Xen specific way.
Thanks, Roger.
On 2023/12/14 16:46, Roger Pau Monné wrote:
> On Thu, Dec 14, 2023 at 07:08:32AM +0000, Chen, Jiqian wrote:
>> On 2023/12/13 20:12, Roger Pau Monné wrote:
>>> On Wed, Dec 13, 2023 at 03:31:21AM +0000, Chen, Jiqian wrote:
>>>> On 2023/12/12 17:18, Roger Pau Monné wrote:
>>>>> On Tue, Dec 12, 2023 at 06:34:27AM +0000, Chen, Jiqian wrote:
>>>>>>
>>>>>> On 2023/12/12 01:57, Roger Pau Monné wrote:
>>>>>>> On Mon, Dec 11, 2023 at 12:15:19AM +0800, Jiqian Chen wrote:
>>>>>>>> There is a need for some scenarios to use gsi sysfs.
>>>>>>>> For example, when Xen passes through a device to a domU, it will
>>>>>>>> use gsi to map pirq, but currently userspace can't get gsi
>>>>>>>> number.
>>>>>>>> So, add gsi sysfs for that and for other potential scenarios.
>>>>>>>>
>>>>>>>> Co-developed-by: Huang Rui <[email protected]>
>>>>>>>> Signed-off-by: Jiqian Chen <[email protected]>
>>>>>>>> ---
>>>>>>>> drivers/acpi/pci_irq.c | 1 +
>>>>>>>> drivers/pci/pci-sysfs.c | 11 +++++++++++
>>>>>>>> include/linux/pci.h | 2 ++
>>>>>>>> 3 files changed, 14 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/drivers/acpi/pci_irq.c b/drivers/acpi/pci_irq.c
>>>>>>>> index 630fe0a34bc6..739a58755df2 100644
>>>>>>>> --- a/drivers/acpi/pci_irq.c
>>>>>>>> +++ b/drivers/acpi/pci_irq.c
>>>>>>>> @@ -449,6 +449,7 @@ int acpi_pci_irq_enable(struct pci_dev *dev)
>>>>>>>> kfree(entry);
>>>>>>>> return 0;
>>>>>>>> }
>>>>>>>> + dev->gsi = gsi;
>>>>>>>
>>>>>>> It would be better if the gsi is fetched without requiring a call to
>>>>>>> acpi_pci_irq_enable(), as the gsi doesn't require the interrupt to be
>>>>>>> enabled. The gsi is known at boot time and won't change for the
>>>>>>> lifetime of the device.
>>>>>> Do you have any suggested places to do this?
>>>>>
>>>>> I'm not an expert on this, but drivers/pci/pci-sysfs.c would seem like
>>>>> a better place, together with the rest of the resources.
>>>> I'm not familiar with this either. But it seems pci-sysfs.c only creates the sysfs nodes and implements their read/write methods; it does not initialize the values.
>>>> If we want to initialize the value of gsi there, would one approach be to call acpi_pci_irq_lookup to get the gsi number the first time it is read?
>>>
>>> Hm, maybe, I don't really have much experience with sysfs, so don't
>>> know how nodes are usually initialized.
>> Maybe the sysfs maintainers can suggest a suitable place to initialize the value of gsi.
>>
>>>
>>>>>
>>>>> Maybe my understanding is incorrect, but given the suggested placement
>>>>> in acpi_pci_irq_enable() I think the device would need to bind the
>>>>> interrupt in order for the gsi node to appear on sysfs?
>>>> No, the gsi sysfs node already exists regardless; the code in acpi_pci_irq_enable only initializes the value of gsi.
>>>>
>>>>>
>>>>> Would the current approach work if the device is assigned to pciback
>>>>> on the kernel command line, and thus never owned by any driver in
>>>>> dom0?
>>>> If the device is assigned to pciback, I think pciback will enable the device, acpi_pci_irq_enable will then be called, and the gsi will be initialized. So the current approach should work.
>>>
>>> This needs checking to be sure, I'm certainly not that familiar. You
>>> would need to at least test that it works properly when the device is
>>> hidden using xen-pciback.hide=(SBDF) in the Linux kernel command line.
>> Sure, I have validated it on my side. Both static assignment with built-in xen-pciback (xen-pciback.hide=(SBDF)) and dynamic assignment with xl (sudo modprobe xen-pciback & sudo xl pci-assignable-add SBDF) work fine with the current implementation.
>
> Oh, OK, if that's the case I don't have much objection in doing the
> initialization in acpi_pci_irq_enable(), that's an internal Linux
> detail. I mostly care about the GSI being exposed in sysfs in a
> non-Xen specific way.
Yes, the current implementation is internal to Linux, not Xen-specific. On bare-metal Linux, I can also see the gsi sysfs node. Thank you.
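For illustration, userspace could read the value roughly as follows. This is a minimal sketch, not part of the patch: it assumes the attribute is named "gsi", sits next to "irq" in the device's sysfs directory (for example /sys/bus/pci/devices/<SBDF>/gsi), and holds one unsigned decimal number followed by a newline. The helper names parse_gsi() and read_pci_gsi() are hypothetical.

```c
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Parse the text of a sysfs attribute holding one unsigned decimal number.
 * Returns 0 on success, -1 on malformed or out-of-range input.
 */
static int parse_gsi(const char *text, unsigned int *gsi)
{
    char *end;
    unsigned long val;

    if (text[0] < '0' || text[0] > '9')
        return -1;      /* reject empty input, signs and leading spaces */

    errno = 0;
    val = strtoul(text, &end, 10);
    if (errno != 0 || val > UINT_MAX)
        return -1;
    if (*end != '\0' && *end != '\n')
        return -1;      /* trailing garbage after the number */

    *gsi = (unsigned int)val;
    return 0;
}

/*
 * Read <root>/<sbdf>/gsi, e.g. root = "/sys/bus/pci/devices".
 * The attribute name and format are assumptions, see above.
 * Returns 0 on success, -1 if the file is missing or unparsable.
 */
static int read_pci_gsi(const char *root, const char *sbdf, unsigned int *gsi)
{
    char path[256];
    char buf[32];
    FILE *f;
    int n;

    n = snprintf(path, sizeof(path), "%s/%s/gsi", root, sbdf);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;      /* path would not fit in the buffer */

    f = fopen(path, "r");
    if (!f)
        return -1;
    if (!fgets(buf, sizeof(buf), f)) {
        fclose(f);
        return -1;
    }
    fclose(f);

    return parse_gsi(buf, gsi);
}
```

A toolstack would call read_pci_gsi("/sys/bus/pci/devices", sbdf, &gsi) and, on success, use the gsi to map the pirq. Keeping the parsing separate from the file access makes it easy to check without a real device.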
>
> Thanks, Roger.
--
Best regards,
Jiqian Chen.