From: Bodong Wang <[email protected]>
Sometimes it is not desirable to probe the virtual functions after
SRIOV is enabled. This can save host side resource usage by VF
instances which would be eventually probed to VMs.
Added a new PCI sysfs interface "sriov_probe_vfs" to control that
from PF, all current callers still retain the same functionality.
To modify it, echo 0/n/N (disable probe) or 1/y/Y (enable probe) to
/sys/bus/pci/devices/<DOMAIN:BUS:DEVICE.FUNCTION>/sriov_probe_vfs
Note that, the choice must be made before enabling VFs. The change
will not take effect if VFs are already enabled. Simply, one can set
sriov_numvfs to 0, choose whether to probe or not, and then resume
sriov_numvfs.
Change-Id: I48e6db1e8c7b364bb371590e2b13b4d7ee87713c
Signed-off-by: Bodong Wang <[email protected]>
Signed-off-by: Eli Cohen <[email protected]>
---
Documentation/PCI/pci-iov-howto.txt | 10 ++++++++++
drivers/pci/iov.c | 1 +
drivers/pci/pci-driver.c | 15 +++++++++++----
drivers/pci/pci-sysfs.c | 28 ++++++++++++++++++++++++++++
drivers/pci/pci.h | 1 +
5 files changed, 51 insertions(+), 4 deletions(-)
diff --git a/Documentation/PCI/pci-iov-howto.txt b/Documentation/PCI/pci-iov-howto.txt
index 2d91ae2..902a528 100644
--- a/Documentation/PCI/pci-iov-howto.txt
+++ b/Documentation/PCI/pci-iov-howto.txt
@@ -68,6 +68,16 @@ To disable SR-IOV capability:
echo 0 > \
/sys/bus/pci/devices/<DOMAIN:BUS:DEVICE.FUNCTION>/sriov_numvfs
+To enable probing VFs by a compatible driver on the host:
+Before enabling SR-IOV capabilities, do:
+ echo 1 > \
+ /sys/bus/pci/devices/<DOMAIN:BUS:DEVICE.FUNCTION>/sriov_probe_vfs
+
+To disable probing VFs by a compatible driver on the host:
+Before enabling SR-IOV capabilities, do:
+ echo 0 > \
+ /sys/bus/pci/devices/<DOMAIN:BUS:DEVICE.FUNCTION>/sriov_probe_vfs
+
3.2 Usage example
Following piece of code illustrates the usage of the SR-IOV API.
diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 2479ae8..70691de 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -450,6 +450,7 @@ static int sriov_init(struct pci_dev *dev, int pos)
iov->total_VFs = total;
iov->pgsz = pgsz;
iov->self = dev;
+ iov->probe_vfs = true;
pci_read_config_dword(dev, pos + PCI_SRIOV_CAP, &iov->cap);
pci_read_config_byte(dev, pos + PCI_SRIOV_FUNC_LINK, &iov->link);
if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_END)
diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index afa7271..930552c 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -405,11 +405,18 @@ static int pci_device_probe(struct device *dev)
return error;
pci_dev_get(pci_dev);
- error = __pci_device_probe(drv, pci_dev);
- if (error) {
- pcibios_free_irq(pci_dev);
- pci_dev_put(pci_dev);
+#ifdef CONFIG_PCI_IOV
+ if (!pci_dev->is_virtfn ||
+ (pci_dev->is_virtfn && pci_dev->physfn->sriov->probe_vfs)) {
+#endif
+ error = __pci_device_probe(drv, pci_dev);
+ if (error) {
+ pcibios_free_irq(pci_dev);
+ pci_dev_put(pci_dev);
+ }
+#ifdef CONFIG_PCI_IOV
}
+#endif
return error;
}
diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index 25d010d..1d5b89d 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -526,10 +526,37 @@ static ssize_t sriov_numvfs_store(struct device *dev,
return count;
}
+static ssize_t sriov_probe_vfs_show(struct device *dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+ struct pci_dev *pdev = to_pci_dev(dev);
+
+ return sprintf(buf, "%u\n", pdev->sriov->probe_vfs);
+}
+
+static ssize_t sriov_probe_vfs_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct pci_dev *pdev = to_pci_dev(dev);
+ bool probe_vfs;
+
+ if (kstrtobool(buf, &probe_vfs) < 0)
+ return -EINVAL;
+
+ pdev->sriov->probe_vfs = probe_vfs;
+
+ return count;
+}
+
static struct device_attribute sriov_totalvfs_attr = __ATTR_RO(sriov_totalvfs);
static struct device_attribute sriov_numvfs_attr =
__ATTR(sriov_numvfs, (S_IRUGO|S_IWUSR|S_IWGRP),
sriov_numvfs_show, sriov_numvfs_store);
+static struct device_attribute sriov_probe_vfs_attr =
+ __ATTR(sriov_probe_vfs, (S_IRUGO|S_IWUSR|S_IWGRP),
+ sriov_probe_vfs_show, sriov_probe_vfs_store);
#endif /* CONFIG_PCI_IOV */
static ssize_t driver_override_store(struct device *dev,
@@ -1549,6 +1576,7 @@ static umode_t pci_dev_hp_attrs_are_visible(struct kobject *kobj,
static struct attribute *sriov_dev_attrs[] = {
&sriov_totalvfs_attr.attr,
&sriov_numvfs_attr.attr,
+ &sriov_probe_vfs_attr.attr,
NULL,
};
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 8dd38e6..a62c6bf 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -272,6 +272,7 @@ struct pci_sriov {
struct pci_dev *self; /* this PF */
struct mutex lock; /* lock for setting sriov_numvfs in sysfs */
resource_size_t barsz[PCI_SRIOV_NUM_BARS]; /* VF BAR size */
+ bool probe_vfs; /* probe VFs or not */
};
#ifdef CONFIG_PCI_ATS
--
1.8.3.1
On Mon, Mar 20, 2017 at 05:14:34PM +0200, [email protected] wrote:
>From: Bodong Wang <[email protected]>
>
>Sometimes it is not desirable to probe the virtual functions after
>SRIOV is enabled. This can save host side resource usage by VF
>instances which would be eventually probed to VMs.
>
>Added a new PCI sysfs interface "sriov_probe_vfs" to control that
>from PF, all current callers still retain the same functionality.
>To modify it, echo 0/n/N (disable probe) or 1/y/Y (enable probe) to
>
>/sys/bus/pci/devices/<DOMAIN:BUS:DEVICE.FUNCTION>/sriov_probe_vfs
>
>Note that, the choice must be made before enabling VFs. The change
>will not take effect if VFs are already enabled. Simply, one can set
>sriov_numvfs to 0, choose whether to probe or not, and then resume
>sriov_numvfs.
>
Bodong, I'm not sure if there is a requirement to load driver for the
specified number of VFs? That indicates no driver will be loaded for
other VFs. If so, this interface might serve the purpose as well.
>Change-Id: I48e6db1e8c7b364bb371590e2b13b4d7ee87713c
>Signed-off-by: Bodong Wang <[email protected]>
>Signed-off-by: Eli Cohen <[email protected]>
>---
> Documentation/PCI/pci-iov-howto.txt | 10 ++++++++++
> drivers/pci/iov.c | 1 +
> drivers/pci/pci-driver.c | 15 +++++++++++----
> drivers/pci/pci-sysfs.c | 28 ++++++++++++++++++++++++++++
> drivers/pci/pci.h | 1 +
> 5 files changed, 51 insertions(+), 4 deletions(-)
>
>diff --git a/Documentation/PCI/pci-iov-howto.txt b/Documentation/PCI/pci-iov-howto.txt
>index 2d91ae2..902a528 100644
>--- a/Documentation/PCI/pci-iov-howto.txt
>+++ b/Documentation/PCI/pci-iov-howto.txt
>@@ -68,6 +68,16 @@ To disable SR-IOV capability:
> echo 0 > \
> /sys/bus/pci/devices/<DOMAIN:BUS:DEVICE.FUNCTION>/sriov_numvfs
>
>+To enable probing VFs by a compatible driver on the host:
>+Before enabling SR-IOV capabilities, do:
>+ echo 1 > \
>+ /sys/bus/pci/devices/<DOMAIN:BUS:DEVICE.FUNCTION>/sriov_probe_vfs
>+
>+To disable probing VFs by a compatible driver on the host:
>+Before enabling SR-IOV capabilities, do:
>+ echo 0 > \
>+ /sys/bus/pci/devices/<DOMAIN:BUS:DEVICE.FUNCTION>/sriov_probe_vfs
>+
> 3.2 Usage example
>
> Following piece of code illustrates the usage of the SR-IOV API.
>diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
>index 2479ae8..70691de 100644
>--- a/drivers/pci/iov.c
>+++ b/drivers/pci/iov.c
>@@ -450,6 +450,7 @@ static int sriov_init(struct pci_dev *dev, int pos)
> iov->total_VFs = total;
> iov->pgsz = pgsz;
> iov->self = dev;
>+ iov->probe_vfs = true;
> pci_read_config_dword(dev, pos + PCI_SRIOV_CAP, &iov->cap);
> pci_read_config_byte(dev, pos + PCI_SRIOV_FUNC_LINK, &iov->link);
> if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_END)
>diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
>index afa7271..930552c 100644
>--- a/drivers/pci/pci-driver.c
>+++ b/drivers/pci/pci-driver.c
>@@ -405,11 +405,18 @@ static int pci_device_probe(struct device *dev)
> return error;
>
> pci_dev_get(pci_dev);
>- error = __pci_device_probe(drv, pci_dev);
>- if (error) {
>- pcibios_free_irq(pci_dev);
>- pci_dev_put(pci_dev);
>+#ifdef CONFIG_PCI_IOV
>+ if (!pci_dev->is_virtfn ||
>+ (pci_dev->is_virtfn && pci_dev->physfn->sriov->probe_vfs)) {
>+#endif
>+ error = __pci_device_probe(drv, pci_dev);
>+ if (error) {
>+ pcibios_free_irq(pci_dev);
>+ pci_dev_put(pci_dev);
>+ }
>+#ifdef CONFIG_PCI_IOV
> }
>+#endif
>
I think it's reasonable to have a inline function for this check:
#ifdef CONFIG_PCI_IOV
static inline bool pci_device_can_probe(struct pci_dev *pdev)
{
return (!pdev->is_virtfn || pdev->physfn->sriov->probe_vfs);
}
#else
static inline bool pci_device_can_probe(struct pci_dev *pdev)
{
return true;
}
#endif
> return error;
> }
>diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
>index 25d010d..1d5b89d 100644
>--- a/drivers/pci/pci-sysfs.c
>+++ b/drivers/pci/pci-sysfs.c
>@@ -526,10 +526,37 @@ static ssize_t sriov_numvfs_store(struct device *dev,
> return count;
> }
>
>+static ssize_t sriov_probe_vfs_show(struct device *dev,
>+ struct device_attribute *attr,
>+ char *buf)
>+{
>+ struct pci_dev *pdev = to_pci_dev(dev);
>+
>+ return sprintf(buf, "%u\n", pdev->sriov->probe_vfs);
>+}
>+
>+static ssize_t sriov_probe_vfs_store(struct device *dev,
>+ struct device_attribute *attr,
>+ const char *buf, size_t count)
>+{
>+ struct pci_dev *pdev = to_pci_dev(dev);
>+ bool probe_vfs;
>+
>+ if (kstrtobool(buf, &probe_vfs) < 0)
>+ return -EINVAL;
>+
>+ pdev->sriov->probe_vfs = probe_vfs;
>+
>+ return count;
>+}
>+
> static struct device_attribute sriov_totalvfs_attr = __ATTR_RO(sriov_totalvfs);
> static struct device_attribute sriov_numvfs_attr =
> __ATTR(sriov_numvfs, (S_IRUGO|S_IWUSR|S_IWGRP),
> sriov_numvfs_show, sriov_numvfs_store);
>+static struct device_attribute sriov_probe_vfs_attr =
>+ __ATTR(sriov_probe_vfs, (S_IRUGO|S_IWUSR|S_IWGRP),
>+ sriov_probe_vfs_show, sriov_probe_vfs_store);
> #endif /* CONFIG_PCI_IOV */
>
> static ssize_t driver_override_store(struct device *dev,
>@@ -1549,6 +1576,7 @@ static umode_t pci_dev_hp_attrs_are_visible(struct kobject *kobj,
> static struct attribute *sriov_dev_attrs[] = {
> &sriov_totalvfs_attr.attr,
> &sriov_numvfs_attr.attr,
>+ &sriov_probe_vfs_attr.attr,
> NULL,
> };
>
>diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
>index 8dd38e6..a62c6bf 100644
>--- a/drivers/pci/pci.h
>+++ b/drivers/pci/pci.h
>@@ -272,6 +272,7 @@ struct pci_sriov {
> struct pci_dev *self; /* this PF */
> struct mutex lock; /* lock for setting sriov_numvfs in sysfs */
> resource_size_t barsz[PCI_SRIOV_NUM_BARS]; /* VF BAR size */
>+ bool probe_vfs; /* probe VFs or not */
> };
>
> #ifdef CONFIG_PCI_ATS
Thanks,
Gavin
On 3/20/2017 6:07 PM, Gavin Shan wrote:
> On Mon, Mar 20, 2017 at 05:14:34PM +0200, [email protected] wrote:
>> From: Bodong Wang <[email protected]>
>>
>> Sometimes it is not desirable to probe the virtual functions after
>> SRIOV is enabled. This can save host side resource usage by VF
>> instances which would be eventually probed to VMs.
>>
>> Added a new PCI sysfs interface "sriov_probe_vfs" to control that
> >from PF, all current callers still retain the same functionality.
>> To modify it, echo 0/n/N (disable probe) or 1/y/Y (enable probe) to
>>
>> /sys/bus/pci/devices/<DOMAIN:BUS:DEVICE.FUNCTION>/sriov_probe_vfs
>>
>> Note that, the choice must be made before enabling VFs. The change
>> will not take effect if VFs are already enabled. Simply, one can set
>> sriov_numvfs to 0, choose whether to probe or not, and then resume
>> sriov_numvfs.
>>
> Bodong, I'm not sure if there is a requirement to load driver for the
> specified number of VFs? That indicates no driver will be loaded for
> other VFs. If so, this interface might serve the purpose as well.
Gavin, thanks for the review. That is indeed an interesting suggestion.
Theoretically, we can change that probe_vfs from boolean to integer.
And use it as a counter to probe the first N VFs(if N < total_vfs).
Let's see if there are any objections.
>
>
> +#ifdef CONFIG_PCI_IOV
> + if (!pci_dev->is_virtfn ||
> + (pci_dev->is_virtfn && pci_dev->physfn->sriov->probe_vfs)) {
> +#endif
> + error = __pci_device_probe(drv, pci_dev);
> + if (error) {
> + pcibios_free_irq(pci_dev);
> + pci_dev_put(pci_dev);
> + }
> +#ifdef CONFIG_PCI_IOV
> }
> +#endif
>
> I think it's reasonable to have a inline function for this check:
It's doable, but what's the benefit?
>
> #ifdef CONFIG_PCI_IOV
> static inline bool pci_device_can_probe(struct pci_dev *pdev)
> {
> return (!pdev->is_virtfn || pdev->physfn->sriov->probe_vfs);
should be return (!pdev->is_virtfn || (pci_dev->is_virtfn &&
pci_dev->physfn->sriov->probe_vfs));
We want to probe that device if 1) it's a PF 2) it'a VF and probe_vfs is set
> }
> #else
> static inline bool pci_device_can_probe(struct pci_dev *pdev)
> {
> return true;
> }
This function will be a waste if CONFIG_PCI_IOV is not defined.
> #endif
> Thanks,
> Gavin
>
On Mon, Mar 20, 2017 at 06:34:23PM -0500, Bodong Wang wrote:
>On 3/20/2017 6:07 PM, Gavin Shan wrote:
>>On Mon, Mar 20, 2017 at 05:14:34PM +0200, [email protected] wrote:
>>>From: Bodong Wang <[email protected]>
>>>
>>>Sometimes it is not desirable to probe the virtual functions after
>>>SRIOV is enabled. This can save host side resource usage by VF
>>>instances which would be eventually probed to VMs.
>>>
>>>Added a new PCI sysfs interface "sriov_probe_vfs" to control that
>>>from PF, all current callers still retain the same functionality.
>>>To modify it, echo 0/n/N (disable probe) or 1/y/Y (enable probe) to
>>>
>>>/sys/bus/pci/devices/<DOMAIN:BUS:DEVICE.FUNCTION>/sriov_probe_vfs
>>>
>>>Note that, the choice must be made before enabling VFs. The change
>>>will not take effect if VFs are already enabled. Simply, one can set
>>>sriov_numvfs to 0, choose whether to probe or not, and then resume
>>>sriov_numvfs.
>>>
>>Bodong, I'm not sure if there is a requirement to load driver for the
>>specified number of VFs? That indicates no driver will be loaded for
>>other VFs. If so, this interface might serve the purpose as well.
>Gavin, thanks for the review. That is indeed an interesting suggestion.
>Theoretically, we can change that probe_vfs from boolean to integer. And use
>it as a counter to probe the first N VFs(if N < total_vfs). Let's see if
>there are any objections.
Ok.
>>+#ifdef CONFIG_PCI_IOV
>>+ if (!pci_dev->is_virtfn ||
>>+ (pci_dev->is_virtfn && pci_dev->physfn->sriov->probe_vfs)) {
>>+#endif
>>+ error = __pci_device_probe(drv, pci_dev);
>>+ if (error) {
>>+ pcibios_free_irq(pci_dev);
>>+ pci_dev_put(pci_dev);
>>+ }
>>+#ifdef CONFIG_PCI_IOV
>> }
>>+#endif
>>
>>I think it's reasonable to have a inline function for this check:
>It's doable, but what's the benefit?
>>
>>#ifdef CONFIG_PCI_IOV
>>static inline bool pci_device_can_probe(struct pci_dev *pdev)
>>{
>> return (!pdev->is_virtfn || pdev->physfn->sriov->probe_vfs);
>should be return (!pdev->is_virtfn || (pci_dev->is_virtfn &&
>pci_dev->physfn->sriov->probe_vfs));
>
>We want to probe that device if 1) it's a PF 2) it'a VF and probe_vfs is set
>>}
>>#else
>>static inline bool pci_device_can_probe(struct pci_dev *pdev)
>>{
>> return true;
>>}
>This function will be a waste if CONFIG_PCI_IOV is not defined.
>>#endif
It makes the code a bit clean. Nope, the proposed conditional
expression is elaborate. Yeah, the purpose is exactly same as
you said: probe driver for non-VF or VFs that were allowed.
(!pdev->is_virtfn || pdev->physfn->sriov->probe_vfs);
When pdev->is_virtfn is flase, "pdev->physfn->sriov->probe_vfs"
doesn't take effect. Otherwise, it means pdev->is_virtfn is true
indirectly and going to check "pdev->physfn->sriov->probe_vfs".
So it needn't check pdev->is_virtfn explicitly in later case,
but it isn't wrong :)
Thanks,
Gavin
On 3/20/2017 7:24 PM, Gavin Shan wrote:
> On Mon, Mar 20, 2017 at 06:34:23PM -0500, Bodong Wang wrote:
>> On 3/20/2017 6:07 PM, Gavin Shan wrote:
>>> On Mon, Mar 20, 2017 at 05:14:34PM +0200, [email protected] wrote:
>>>> From: Bodong Wang <[email protected]>
>>>>
>>>> Sometimes it is not desirable to probe the virtual functions after
>>>> SRIOV is enabled. This can save host side resource usage by VF
>>>> instances which would be eventually probed to VMs.
>>>>
>>>> Added a new PCI sysfs interface "sriov_probe_vfs" to control that
>>> >from PF, all current callers still retain the same functionality.
>>>> To modify it, echo 0/n/N (disable probe) or 1/y/Y (enable probe) to
>>>>
>>>> /sys/bus/pci/devices/<DOMAIN:BUS:DEVICE.FUNCTION>/sriov_probe_vfs
>>>>
>>>> Note that, the choice must be made before enabling VFs. The change
>>>> will not take effect if VFs are already enabled. Simply, one can set
>>>> sriov_numvfs to 0, choose whether to probe or not, and then resume
>>>> sriov_numvfs.
>>>>
>>> Bodong, I'm not sure if there is a requirement to load driver for the
>>> specified number of VFs? That indicates no driver will be loaded for
>>> other VFs. If so, this interface might serve the purpose as well.
>> Gavin, thanks for the review. That is indeed an interesting suggestion.
>> Theoretically, we can change that probe_vfs from boolean to integer. And use
>> it as a counter to probe the first N VFs(if N < total_vfs). Let's see if
>> there are any objections.
> Ok.
>
>>> +#ifdef CONFIG_PCI_IOV
>>> + if (!pci_dev->is_virtfn ||
>>> + (pci_dev->is_virtfn && pci_dev->physfn->sriov->probe_vfs)) {
>>> +#endif
>>> + error = __pci_device_probe(drv, pci_dev);
>>> + if (error) {
>>> + pcibios_free_irq(pci_dev);
>>> + pci_dev_put(pci_dev);
>>> + }
>>> +#ifdef CONFIG_PCI_IOV
>>> }
>>> +#endif
>>>
>>> I think it's reasonable to have a inline function for this check:
>> It's doable, but what's the benefit?
>>> #ifdef CONFIG_PCI_IOV
>>> static inline bool pci_device_can_probe(struct pci_dev *pdev)
>>> {
>>> return (!pdev->is_virtfn || pdev->physfn->sriov->probe_vfs);
>> should be return (!pdev->is_virtfn || (pci_dev->is_virtfn &&
>> pci_dev->physfn->sriov->probe_vfs));
>>
>> We want to probe that device if 1) it's a PF 2) it'a VF and probe_vfs is set
>>> }
>>> #else
>>> static inline bool pci_device_can_probe(struct pci_dev *pdev)
>>> {
>>> return true;
>>> }
>> This function will be a waste if CONFIG_PCI_IOV is not defined.
>>> #endif
> It makes the code a bit clean. Nope, the proposed conditional
> expression is elaborate. Yeah, the purpose is exactly same as
> you said: probe driver for non-VF or VFs that were allowed.
>
> (!pdev->is_virtfn || pdev->physfn->sriov->probe_vfs);
>
> When pdev->is_virtfn is flase, "pdev->physfn->sriov->probe_vfs"
> doesn't take effect. Otherwise, it means pdev->is_virtfn is true
> indirectly and going to check "pdev->physfn->sriov->probe_vfs".
> So it needn't check pdev->is_virtfn explicitly in later case,
> but it isn't wrong :)
>
> Thanks,
> Gavin
>
Make sense :) Will apply in V1.
Thanks,
Bodong
On Mon, 20 Mar 2017 18:34:23 -0500
Bodong Wang <[email protected]> wrote:
> On 3/20/2017 6:07 PM, Gavin Shan wrote:
> > On Mon, Mar 20, 2017 at 05:14:34PM +0200, [email protected] wrote:
> >> From: Bodong Wang <[email protected]>
> >>
> >> Sometimes it is not desirable to probe the virtual functions after
> >> SRIOV is enabled. This can save host side resource usage by VF
> >> instances which would be eventually probed to VMs.
What resources would not be released when the VF driver is unbound?
> >> Added a new PCI sysfs interface "sriov_probe_vfs" to control that
> > >from PF, all current callers still retain the same functionality.
> >> To modify it, echo 0/n/N (disable probe) or 1/y/Y (enable probe) to
> >>
> >> /sys/bus/pci/devices/<DOMAIN:BUS:DEVICE.FUNCTION>/sriov_probe_vfs
> >>
> >> Note that, the choice must be made before enabling VFs. The change
> >> will not take effect if VFs are already enabled. Simply, one can set
> >> sriov_numvfs to 0, choose whether to probe or not, and then resume
> >> sriov_numvfs.
> >>
> > Bodong, I'm not sure if there is a requirement to load driver for the
> > specified number of VFs? That indicates no driver will be loaded for
> > other VFs. If so, this interface might serve the purpose as well.
> Gavin, thanks for the review. That is indeed an interesting suggestion.
> Theoretically, we can change that probe_vfs from boolean to integer.
> And use it as a counter to probe the first N VFs(if N < total_vfs).
> Let's see if there are any objections.
Is it just me or does this seem like a confusing user interface, ie. to
get binary on/off behavior a user now needs to 'cat total_vfs >
sriov_probe_vfs'. It's not very intuitive, what's the use case for it?
> >
> >
> > +#ifdef CONFIG_PCI_IOV
> > + if (!pci_dev->is_virtfn ||
> > + (pci_dev->is_virtfn && pci_dev->physfn->sriov->probe_vfs)) {
> > +#endif
> > + error = __pci_device_probe(drv, pci_dev);
> > + if (error) {
> > + pcibios_free_irq(pci_dev);
> > + pci_dev_put(pci_dev);
> > + }
> > +#ifdef CONFIG_PCI_IOV
> > }
> > +#endif
> >
> > I think it's reasonable to have a inline function for this check:
> It's doable, but what's the benefit?
Way cleaner.
> >
> > #ifdef CONFIG_PCI_IOV
> > static inline bool pci_device_can_probe(struct pci_dev *pdev)
> > {
> > return (!pdev->is_virtfn || pdev->physfn->sriov->probe_vfs);
> should be return (!pdev->is_virtfn || (pci_dev->is_virtfn &&
> pci_dev->physfn->sriov->probe_vfs));
>
> We want to probe that device if 1) it's a PF 2) it'a VF and probe_vfs is set
> > }
> > #else
> > static inline bool pci_device_can_probe(struct pci_dev *pdev)
> > {
> > return true;
> > }
> This function will be a waste if CONFIG_PCI_IOV is not defined.
> > #endif
>
> > Thanks,
> > Gavin
> >
>
On Mon, Mar 20, 2017 at 10:57:08PM -0600, Alex Williamson wrote:
>On Mon, 20 Mar 2017 18:34:23 -0500
>Bodong Wang <[email protected]> wrote:
.../...
>> > Bodong, I'm not sure if there is a requirement to load driver for the
>> > specified number of VFs? That indicates no driver will be loaded for
>> > other VFs. If so, this interface might serve the purpose as well.
>> Gavin, thanks for the review. That is indeed an interesting suggestion.
>> Theoretically, we can change that probe_vfs from boolean to integer.
>> And use it as a counter to probe the first N VFs(if N < total_vfs).
>> Let's see if there are any objections.
>
>Is it just me or does this seem like a confusing user interface, ie. to
>get binary on/off behavior a user now needs to 'cat total_vfs >
>sriov_probe_vfs'. It's not very intuitive, what's the use case for it?
>
After it's changed to integer, it accepts number. If users want to load
driver for all VFs and don't want to check the maximal number of VFs,
they can simply write 0xffffffff. So "on" and "off" are replaced with 0xffffffff
and 0, but users has to press the keyboard more times though.
drivers/net/ethernet/mellanox/mlx4/main.c::probe_vfs_argc allows to specify
the number of VFs with which we're going to bind drivers. Less time is needed
to enable SRIOV capability. As I had in some development environment: assume
PF supports 256 VFs and I'm going to enable all of them, but I only want to
load driver for two of them, then test the data path on those two VFs. Besides,
I can image the VF needn't a driver in host if it's going to be passed to guest.
Not sure how much sense it makes.
Thanks,
Gavin
On Tue, 21 Mar 2017 16:43:05 +1100
Gavin Shan <[email protected]> wrote:
> On Mon, Mar 20, 2017 at 10:57:08PM -0600, Alex Williamson wrote:
> >On Mon, 20 Mar 2017 18:34:23 -0500
> >Bodong Wang <[email protected]> wrote:
>
> .../...
>
> >> > Bodong, I'm not sure if there is a requirement to load driver for the
> >> > specified number of VFs? That indicates no driver will be loaded for
> >> > other VFs. If so, this interface might serve the purpose as well.
> >> Gavin, thanks for the review. That is indeed an interesting suggestion.
> >> Theoretically, we can change that probe_vfs from boolean to integer.
> >> And use it as a counter to probe the first N VFs(if N < total_vfs).
> >> Let's see if there are any objections.
> >
> >Is it just me or does this seem like a confusing user interface, ie. to
> >get binary on/off behavior a user now needs to 'cat total_vfs >
> >sriov_probe_vfs'. It's not very intuitive, what's the use case for it?
> >
>
> After it's changed to integer, it accepts number. If users want to load
> driver for all VFs and don't want to check the maximal number of VFs,
> they can simply write 0xffffffff. So "on" and "off" are replaced with 0xffffffff
> and 0, but users has to press the keyboard more times though.
>
> drivers/net/ethernet/mellanox/mlx4/main.c::probe_vfs_argc allows to specify
> the number of VFs with which we're going to bind drivers. Less time is needed
> to enable SRIOV capability. As I had in some development environment: assume
> PF supports 256 VFs and I'm going to enable all of them, but I only want to
> load driver for two of them, then test the data path on those two VFs. Besides,
> I can image the VF needn't a driver in host if it's going to be passed to guest.
> Not sure how much sense it makes.
Yes, I understand what you're trying to do, but I still think it's
confusing for a user interface. This also doesn't answer what's the
practical, typical user case you see where it's useful to probe some
VFs but not others. The case listed is a development case where you
could just as easily disable all probing, then manually bind the first
two VFs to the host driver. Which is the better design, impose a
confusing interface on all users to simplify an obscure development
environment or simplify the user interface and assume developers know
how to bind devices otherwise? Thanks,
Alex
On Tue, Mar 21, 2017 at 12:01:58AM -0600, Alex Williamson wrote:
>On Tue, 21 Mar 2017 16:43:05 +1100
>Gavin Shan <[email protected]> wrote:
>> On Mon, Mar 20, 2017 at 10:57:08PM -0600, Alex Williamson wrote:
>> >On Mon, 20 Mar 2017 18:34:23 -0500
>> >Bodong Wang <[email protected]> wrote:
>>
>> .../...
>>
>> >> > Bodong, I'm not sure if there is a requirement to load driver for the
>> >> > specified number of VFs? That indicates no driver will be loaded for
>> >> > other VFs. If so, this interface might serve the purpose as well.
>> >> Gavin, thanks for the review. That is indeed an interesting suggestion.
>> >> Theoretically, we can change that probe_vfs from boolean to integer.
>> >> And use it as a counter to probe the first N VFs(if N < total_vfs).
>> >> Let's see if there are any objections.
>> >
>> >Is it just me or does this seem like a confusing user interface, ie. to
>> >get binary on/off behavior a user now needs to 'cat total_vfs >
>> >sriov_probe_vfs'. It's not very intuitive, what's the use case for it?
>> >
>>
>> After it's changed to integer, it accepts number. If users want to load
>> driver for all VFs and don't want to check the maximal number of VFs,
>> they can simply write 0xffffffff. So "on" and "off" are replaced with 0xffffffff
>> and 0, but users has to press the keyboard more times though.
>>
>> drivers/net/ethernet/mellanox/mlx4/main.c::probe_vfs_argc allows to specify
>> the number of VFs with which we're going to bind drivers. Less time is needed
>> to enable SRIOV capability. As I had in some development environment: assume
>> PF supports 256 VFs and I'm going to enable all of them, but I only want to
>> load driver for two of them, then test the data path on those two VFs. Besides,
>> I can image the VF needn't a driver in host if it's going to be passed to guest.
>> Not sure how much sense it makes.
>
>Yes, I understand what you're trying to do, but I still think it's
>confusing for a user interface. This also doesn't answer what's the
>practical, typical user case you see where it's useful to probe some
>VFs but not others. The case listed is a development case where you
>could just as easily disable all probing, then manually bind the first
>two VFs to the host driver. Which is the better design, impose a
>confusing interface on all users to simplify an obscure development
>environment or simplify the user interface and assume developers know
>how to bind devices otherwise? Thanks,
>
Yeah, your explanation is also fairly reasonable. The interface has
been named as "probe_vfs" instead of "probe_vf" or "probe_vf_driver".
So it seems it should accept number of VFs on which drivers are loaded.
Besides, making this interface accept number corresponds to 3 possiblities:
all, none and load drivers on part of available VFs. So more flexibility
is gained.
User can theoritically have the use case as I had - passing through
some of the VFs to guest: (A) All VFs are bound with drivers; (B) unbind
the drivers for some of the VFs; (C) bind the VFs with vfio-pci; (D) passing
through; (A) is overhead in this scenario. Some CPU cycles are saved if (A)
is avoided.
Thanks,
Gavin
On 3/21/2017 1:01 AM, Alex Williamson wrote:
> On Tue, 21 Mar 2017 16:43:05 +1100
> Gavin Shan <[email protected]> wrote:
>
>> On Mon, Mar 20, 2017 at 10:57:08PM -0600, Alex Williamson wrote:
>>> On Mon, 20 Mar 2017 18:34:23 -0500
>>> Bodong Wang <[email protected]> wrote:
>> .../...
>>
>>>>> Bodong, I'm not sure if there is a requirement to load driver for the
>>>>> specified number of VFs? That indicates no driver will be loaded for
>>>>> other VFs. If so, this interface might serve the purpose as well.
>>>> Gavin, thanks for the review. That is indeed an interesting suggestion.
>>>> Theoretically, we can change that probe_vfs from boolean to integer.
>>>> And use it as a counter to probe the first N VFs(if N < total_vfs).
>>>> Let's see if there are any objections.
>>> Is it just me or does this seem like a confusing user interface, ie. to
>>> get binary on/off behavior a user now needs to 'cat total_vfs >
>>> sriov_probe_vfs'. It's not very intuitive, what's the use case for it?
>>>
>> After it's changed to integer, it accepts number. If users want to load
>> driver for all VFs and don't want to check the maximal number of VFs,
>> they can simply write 0xffffffff. So "on" and "off" are replaced with 0xffffffff
>> and 0, but users has to press the keyboard more times though.
>>
>> drivers/net/ethernet/mellanox/mlx4/main.c::probe_vfs_argc allows to specify
>> the number of VFs with which we're going to bind drivers. Less time is needed
>> to enable SRIOV capability. As I had in some development environment: assume
>> PF supports 256 VFs and I'm going to enable all of them, but I only want to
>> load driver for two of them, then test the data path on those two VFs. Besides,
>> I can image the VF needn't a driver in host if it's going to be passed to guest.
>> Not sure how much sense it makes.
> Yes, I understand what you're trying to do, but I still think it's
> confusing for a user interface. This also doesn't answer what's the
> practical, typical user case you see where it's useful to probe some
> VFs but not others. The case listed is a development case where you
> could just as easily disable all probing, then manually bind the first
> two VFs to the host driver. Which is the better design, impose a
> confusing interface on all users to simplify an obscure development
> environment or simplify the user interface and assume developers know
> how to bind devices otherwise? Thanks,
>
> Alex
I agree with Alex on this concern. Sometimes, I need to probe 1 or 2 VFs
to host side just for development purpose. Bind/unbind satisfy this case
perfectly, and that's how current implementation is designed. But, from
Gavin's use case, it will be a pain to bind/unbind hundreds of VFs. So,
I want to understand how common this use case is. If it's not common, I
prefer to keep current design because 1) the interface is much easier to
understand and use 2) less error prone because no need to check current
total_vfs and maintain a static counter to enable N vfs.
Bodong
On Tue, 21 Mar 2017 20:25:18 +1100
Gavin Shan <[email protected]> wrote:
> On Tue, Mar 21, 2017 at 12:01:58AM -0600, Alex Williamson wrote:
> >On Tue, 21 Mar 2017 16:43:05 +1100
> >Gavin Shan <[email protected]> wrote:
> >> On Mon, Mar 20, 2017 at 10:57:08PM -0600, Alex Williamson wrote:
> >> >On Mon, 20 Mar 2017 18:34:23 -0500
> >> >Bodong Wang <[email protected]> wrote:
> >>
> >> .../...
> >>
> >> >> > Bodong, I'm not sure if there is a requirement to load driver for the
> >> >> > specified number of VFs? That indicates no driver will be loaded for
> >> >> > other VFs. If so, this interface might serve the purpose as well.
> >> >> Gavin, thanks for the review. That is indeed an interesting suggestion.
> >> >> Theoretically, we can change that probe_vfs from boolean to integer.
> >> >> And use it as a counter to probe the first N VFs(if N < total_vfs).
> >> >> Let's see if there are any objections.
> >> >
> >> >Is it just me or does this seem like a confusing user interface, ie. to
> >> >get binary on/off behavior a user now needs to 'cat total_vfs >
> >> >sriov_probe_vfs'. It's not very intuitive, what's the use case for it?
> >> >
> >>
> >> After it's changed to integer, it accepts number. If users want to load
> >> driver for all VFs and don't want to check the maximal number of VFs,
> >> they can simply write 0xffffffff. So "on" and "off" are replaced with 0xffffffff
> >> and 0, but users has to press the keyboard more times though.
> >>
> >> drivers/net/ethernet/mellanox/mlx4/main.c::probe_vfs_argc allows to specify
> >> the number of VFs with which we're going to bind drivers. Less time is needed
> >> to enable SRIOV capability. As I had in some development environment: assume
> >> PF supports 256 VFs and I'm going to enable all of them, but I only want to
> >> load driver for two of them, then test the data path on those two VFs. Besides,
> >> I can image the VF needn't a driver in host if it's going to be passed to guest.
> >> Not sure how much sense it makes.
> >
> >Yes, I understand what you're trying to do, but I still think it's
> >confusing for a user interface. This also doesn't answer what's the
> >practical, typical user case you see where it's useful to probe some
> >VFs but not others. The case listed is a development case where you
> >could just as easily disable all probing, then manually bind the first
> >two VFs to the host driver. Which is the better design, impose a
> >confusing interface on all users to simplify an obscure development
> >environment or simplify the user interface and assume developers know
> >how to bind devices otherwise? Thanks,
> >
>
> Yeah, your explanation is also fairly reasonable. The interface has
> been named as "probe_vfs" instead of "probe_vf" or "probe_vf_driver".
> So it seems it should accept number of VFs on which drivers are loaded.
> Besides, making this interface accept number corresponds to 3 possiblities:
> all, none and load drivers on part of available VFs. So more flexibility
> is gained.
>
> User can theoritically have the use case as I had - passing through
> some of the VFs to guest: (A) All VFs are bound with drivers; (B) unbind
> the drivers for some of the VFs; (C) bind the VFs with vfio-pci; (D) passing
> through; (A) is overhead in this scenario. Some CPU cycles are saved if (A)
> is avoided.
Huh? I'm asking what the practical and typical use case is for this
and you're rehashing the name of the interface and giving me
theoretical examples. Outside of your development environment, why
would a user every actually want to do this?
If we want to talk about the ABI, I would suggest drawing from existing
ABIs. We already have drivers_autoprobe as part of the standard sysfs
ABI, so if we want a binary switch, then sriov_drivers_autoprobe might
be a logical choice. If you're concerned about this mythical overhead
of binding to one driver then another, then why not draw from the
driver_override interface to allow the user to specify the driver to
bind to, perhaps sriov_driver_override. Then if the user wants to bind
all the devices to vfio-pci, they can do so easily. I still fail to
see that probing some fixed number of the VFs and leaving the rest
unprobed has any practical value and I imagine bugs coming in because
users are confused why some of their VFs behave differently than
others. Thanks,
Alex
> If we want to talk about the ABI, I would suggest drawing from existing ABIs. We already have
> drivers_autoprobe as part of the standard sysfs ABI, so if we want a binary switch, then
>sriov_drivers_autoprobe might be a logical choice. If you're concerned about this mythical overhead of > binding to one driver then another, then why not draw from the driver_override interface to allow the
> user to specify the driver to bind to, perhaps sriov_driver_override. Then if the user wants to bind all
> the devices to vfio-pci, they can do so easily. I still fail to see that probing some fixed number of the VFs
> and leaving the rest unprobed has any practical value and I imagine bugs coming in because users are
> confused why some of their VFs behave differently than others. Thanks,
I agree with Alex - the interface should better be binary - either probe VFs or not. The rest can be done with binding/unbinding VFs as necessary. The main goal is to refrain from automatically initializing virtual functions at the hypervisor if they were initially instantiated to assign then to guests.
On Tue, Mar 21, 2017 at 08:23:29AM -0600, Alex Williamson wrote:
>On Tue, 21 Mar 2017 20:25:18 +1100
>Gavin Shan <[email protected]> wrote:
>> On Tue, Mar 21, 2017 at 12:01:58AM -0600, Alex Williamson wrote:
>> >On Tue, 21 Mar 2017 16:43:05 +1100
>> >Gavin Shan <[email protected]> wrote:
>> >> On Mon, Mar 20, 2017 at 10:57:08PM -0600, Alex Williamson wrote:
>> >> >On Mon, 20 Mar 2017 18:34:23 -0500
>> >> >Bodong Wang <[email protected]> wrote:
>> >>
>> >> .../...
>> >>
>> >> >> > Bodong, I'm not sure if there is a requirement to load driver for the
>> >> >> > specified number of VFs? That indicates no driver will be loaded for
>> >> >> > other VFs. If so, this interface might serve the purpose as well.
>> >> >> Gavin, thanks for the review. That is indeed an interesting suggestion.
>> >> >> Theoretically, we can change that probe_vfs from boolean to integer.
>> >> >> And use it as a counter to probe the first N VFs(if N < total_vfs).
>> >> >> Let's see if there are any objections.
>> >> >
>> >> >Is it just me or does this seem like a confusing user interface, ie. to
>> >> >get binary on/off behavior a user now needs to 'cat total_vfs >
>> >> >sriov_probe_vfs'. It's not very intuitive, what's the use case for it?
>> >> >
>> >>
>> >> After it's changed to integer, it accepts number. If users want to load
>> >> driver for all VFs and don't want to check the maximal number of VFs,
>> >> they can simply write 0xffffffff. So "on" and "off" are replaced with 0xffffffff
>> >> and 0, but users has to press the keyboard more times though.
>> >>
>> >> drivers/net/ethernet/mellanox/mlx4/main.c::probe_vfs_argc allows to specify
>> >> the number of VFs with which we're going to bind drivers. Less time is needed
>> >> to enable SRIOV capability. As I had in some development environment: assume
>> >> PF supports 256 VFs and I'm going to enable all of them, but I only want to
>> >> load driver for two of them, then test the data path on those two VFs. Besides,
>> >> I can image the VF needn't a driver in host if it's going to be passed to guest.
>> >> Not sure how much sense it makes.
>> >
>> >Yes, I understand what you're trying to do, but I still think it's
>> >confusing for a user interface. This also doesn't answer what's the
>> >practical, typical user case you see where it's useful to probe some
>> >VFs but not others. The case listed is a development case where you
>> >could just as easily disable all probing, then manually bind the first
>> >two VFs to the host driver. Which is the better design, impose a
>> >confusing interface on all users to simplify an obscure development
>> >environment or simplify the user interface and assume developers know
>> >how to bind devices otherwise? Thanks,
>> >
>>
>> Yeah, your explanation is also fairly reasonable. The interface has
>> been named as "probe_vfs" instead of "probe_vf" or "probe_vf_driver".
>> So it seems it should accept number of VFs on which drivers are loaded.
>> Besides, making this interface accept number corresponds to 3 possiblities:
>> all, none and load drivers on part of available VFs. So more flexibility
>> is gained.
>>
>> User can theoritically have the use case as I had - passing through
>> some of the VFs to guest: (A) All VFs are bound with drivers; (B) unbind
>> the drivers for some of the VFs; (C) bind the VFs with vfio-pci; (D) passing
>> through; (A) is overhead in this scenario. Some CPU cycles are saved if (A)
>> is avoided.
>
>Huh? I'm asking what the practical and typical use case is for this
>and you're rehashing the name of the interface and giving me
>theoretical examples. Outside of your development environment, why
>would a user every actually want to do this?
>
Frankly, I'm not sure how much sense it has as I mentioned from the
beginning. I knew mlx4 driver supports it and it's why I asked.
>If we want to talk about the ABI, I would suggest drawing from existing
>ABIs. We already have drivers_autoprobe as part of the standard sysfs
>ABI, so if we want a binary switch, then sriov_drivers_autoprobe might
>be a logical choice. If you're concerned about this mythical overhead
>of binding to one driver then another, then why not draw from the
>driver_override interface to allow the user to specify the driver to
>bind to, perhaps sriov_driver_override. Then if the user wants to bind
>all the devices to vfio-pci, they can do so easily. I still fail to
>see that probing some fixed number of the VFs and leaving the rest
>unprobed has any practical value and I imagine bugs coming in because
>users are confused why some of their VFs behave differently than
>others. Thanks,
>
Ok. I tend to agree with you - the use case isn't practical. On the other
hand, I think "sriov_driver_orverride" would be a good idea, but it's not
too much related to this patch. Thank you for the explanation.
Thanks,
Gavin
On Tue, Mar 21, 2017 at 02:34:46PM +0000, Eli Cohen wrote:
>> If we want to talk about the ABI, I would suggest drawing from existing ABIs. We already have
>> drivers_autoprobe as part of the standard sysfs ABI, so if we want a binary switch, then
>>sriov_drivers_autoprobe might be a logical choice. If you're concerned about this mythical overhead of > binding to one driver then another, then why not draw from the driver_override interface to allow the
>> user to specify the driver to bind to, perhaps sriov_driver_override. Then if the user wants to bind all
>> the devices to vfio-pci, they can do so easily. I still fail to see that probing some fixed number of the VFs
>> and leaving the rest unprobed has any practical value and I imagine bugs coming in because users are
>> confused why some of their VFs behave differently than others. Thanks,
>
>I agree with Alex - the interface should better be binary - either probe VFs or not. The rest can be done with binding/unbinding VFs as necessary. The main goal is to refrain from automatically initializing virtual functions at the hypervisor if they were initially instantiated to assign then to guests.
>
It's fairly reasonable. Thanks for confirm. I'll review v2.
Thanks,
Gavin