2009-03-06 20:37:50

by Matthew Wilcox

Subject: Re: [PATCH v10 4/7] PCI: add SR-IOV API for Physical Function driver

On Fri, Feb 20, 2009 at 02:54:45PM +0800, Yu Zhao wrote:
> + virtfn->sysdata = dev->bus->sysdata;
> + virtfn->dev.parent = dev->dev.parent;
> + virtfn->dev.bus = dev->dev.bus;
> + virtfn->devfn = devfn;
> + virtfn->hdr_type = PCI_HEADER_TYPE_NORMAL;
> + virtfn->cfg_size = PCI_CFG_SPACE_EXP_SIZE;
> + virtfn->error_state = pci_channel_io_normal;
> + virtfn->current_state = PCI_UNKNOWN;
> + virtfn->is_pcie = 1;
> + virtfn->pcie_type = PCI_EXP_TYPE_ENDPOINT;
> + virtfn->dma_mask = 0xffffffff;
> + virtfn->vendor = dev->vendor;
> + virtfn->subsystem_vendor = dev->subsystem_vendor;
> + virtfn->class = dev->class;

There seems to be a certain amount of commonality between this and
pci_scan_device(). Have you considered trying to make a common helper
function, or does it not work out well?

> + pci_device_add(virtfn, virtfn->bus);

Greg is probably going to ding you here for adding the device, then
creating the symlinks. I believe it's now best practice to create the
symlinks first, so there's no window where userspace can get confused.

> + mutex_unlock(&iov->pdev->sriov->lock);

I question the existence of this mutex now. What's it protecting?

Aren't we going to be implicitly protected by virtue of the Physical
Function device driver being the only one calling this function, and the
driver calling it from the ->probe routine, which is not called
simultaneously for the same device?

> + virtfn->physfn = pci_dev_get(dev);
> +
> + rc = pci_bus_add_device(virtfn);
> + if (rc)
> + goto failed1;
> + sprintf(buf, "%d", id);

%u, perhaps? And maybe 'id' should always be unsigned? Just a thought.
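
To illustrate the %u suggestion, here is a minimal userspace sketch. The helper name format_vf_link_name is hypothetical and not part of the patch, and the use of snprintf() instead of sprintf() is this sketch's own choice to keep the write bounded.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical helper (not from the patch): format a VF index as the
 * name used for a sysfs link. The id is unsigned, so "%u" matches it.
 * Returns 0 on success, -1 if the name did not fit in buf.
 */
static int format_vf_link_name(char *buf, size_t len, unsigned int id)
{
	int n = snprintf(buf, len, "%u", id);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

With a signed id and "%d", a negative value would produce a link name starting with '-'; keeping the type unsigned rules that out.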

> + rc = sysfs_create_link(&iov->dev.kobj, &virtfn->dev.kobj, buf);
> + if (rc)
> + goto failed1;
> + rc = sysfs_create_link(&virtfn->dev.kobj, &dev->dev.kobj, "physfn");
> + if (rc)
> + goto failed2;

I'm glad to see these symlinks documented in later patches!

> + nres = 0;
> + for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
> + res = dev->resource + PCI_SRIOV_RESOURCES + i;
> + if (!res->parent)
> + continue;
> + nres++;
> + }

Can't this be written more simply as:

	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
		res = dev->resource + PCI_SRIOV_RESOURCES + i;
		if (res->parent)
			nres++;
	}
?
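
The simplified loop can be tried outside the kernel with a small stand-in. This is only a sketch: struct resource is reduced to the one field the loop reads, the function name count_assigned_vf_bars is hypothetical, and PCI_SRIOV_NUM_BARS is restated locally (assumed to be 6, as in the kernel headers).

```c
#include <stddef.h>

#define PCI_SRIOV_NUM_BARS 6	/* restated here; assumed to match the kernel */

/* Minimal stand-in for the kernel's struct resource; only ->parent matters. */
struct resource {
	struct resource *parent;
};

/* Count the VF BAR resources that were claimed, i.e. have a parent set. */
static int count_assigned_vf_bars(const struct resource *res)
{
	int i, nres = 0;

	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++)
		if (res[i].parent)
			nres++;

	return nres;
}
```

The behaviour is the same as the original loop with `continue`; only the inverted test is removed.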

> + if (nres != iov->nres) {
> + dev_err(&dev->dev, "no enough MMIO for SR-IOV\n");
> + return -ENOMEM;
> + }

Randy, can you help us out with better wording here?

> + dev_err(&dev->dev, "no enough bus range for SR-IOV\n");

and here.

> + if (iov->link != dev->devfn) {
> + rc = -ENODEV;
> + list_for_each_entry(link, &dev->bus->devices, bus_list) {
> + if (link->sriov && link->devfn == iov->link)
> + rc = sysfs_create_link(&iov->dev.kobj,
> + &link->dev.kobj, "dep_link");

I skipped to the end and read patch 7/7 and I still don't understand
what dep_link is for. Can you explain please? In particular, how is it
different from physfn?

--
Matthew Wilcox Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."


2009-03-06 21:47:15

by Randy Dunlap

Subject: Re: [PATCH v10 4/7] PCI: add SR-IOV API for Physical Function driver

Matthew Wilcox wrote:
> On Fri, Feb 20, 2009 at 02:54:45PM +0800, Yu Zhao wrote:
>
>> + if (nres != iov->nres) {
>> + dev_err(&dev->dev, "no enough MMIO for SR-IOV\n");
>> + return -ENOMEM;
>> + }

"not enough MMIO BARs for SR-IOV"
or
"not enough MMIO resources for SR-IOV"
or
"too few MMIO BARs for SR-IOV"
?

> Randy, can you help us out with better wording here?
>
>> + dev_err(&dev->dev, "no enough bus range for SR-IOV\n");
>
> and here.

"SR-IOV: bus number too large"
or
"SR-IOV: bus number out of range"
or
"SR-IOV: cannot allocate valid bus number"
?

>> + if (iov->link != dev->devfn) {
>> + rc = -ENODEV;
>> + list_for_each_entry(link, &dev->bus->devices, bus_list) {
>> + if (link->sriov && link->devfn == iov->link)
>> + rc = sysfs_create_link(&iov->dev.kobj,
>> + &link->dev.kobj, "dep_link");

2009-03-07 02:58:42

by Greg KH

Subject: Re: [PATCH v10 4/7] PCI: add SR-IOV API for Physical Function driver

On Fri, Mar 06, 2009 at 01:37:18PM -0700, Matthew Wilcox wrote:
> On Fri, Feb 20, 2009 at 02:54:45PM +0800, Yu Zhao wrote:
> > + virtfn->sysdata = dev->bus->sysdata;
> > + virtfn->dev.parent = dev->dev.parent;
> > + virtfn->dev.bus = dev->dev.bus;
> > + virtfn->devfn = devfn;
> > + virtfn->hdr_type = PCI_HEADER_TYPE_NORMAL;
> > + virtfn->cfg_size = PCI_CFG_SPACE_EXP_SIZE;
> > + virtfn->error_state = pci_channel_io_normal;
> > + virtfn->current_state = PCI_UNKNOWN;
> > + virtfn->is_pcie = 1;
> > + virtfn->pcie_type = PCI_EXP_TYPE_ENDPOINT;
> > + virtfn->dma_mask = 0xffffffff;
> > + virtfn->vendor = dev->vendor;
> > + virtfn->subsystem_vendor = dev->subsystem_vendor;
> > + virtfn->class = dev->class;
>
> There seems to be a certain amount of commonality between this and
> pci_scan_device(). Have you considered trying to make a common helper
> function, or does it not work out well?
>
> > + pci_device_add(virtfn, virtfn->bus);
>
> Greg is probably going to ding you here for adding the device, then
> creating the symlinks. I believe it's now best practice to create the
> symlinks first, so there's no window where userspace can get confused.

If the uevent gets sent before the symlinks are created, it's a bug.

thanks,

greg k-h

2009-03-09 08:24:21

by Zhao, Yu

Subject: Re: [PATCH v10 4/7] PCI: add SR-IOV API for Physical Function driver

On Sat, Mar 07, 2009 at 04:37:18AM +0800, Matthew Wilcox wrote:
> On Fri, Feb 20, 2009 at 02:54:45PM +0800, Yu Zhao wrote:
> > + virtfn->sysdata = dev->bus->sysdata;
> > + virtfn->dev.parent = dev->dev.parent;
> > + virtfn->dev.bus = dev->dev.bus;
> > + virtfn->devfn = devfn;
> > + virtfn->hdr_type = PCI_HEADER_TYPE_NORMAL;
> > + virtfn->cfg_size = PCI_CFG_SPACE_EXP_SIZE;
> > + virtfn->error_state = pci_channel_io_normal;
> > + virtfn->current_state = PCI_UNKNOWN;
> > + virtfn->is_pcie = 1;
> > + virtfn->pcie_type = PCI_EXP_TYPE_ENDPOINT;
> > + virtfn->dma_mask = 0xffffffff;
> > + virtfn->vendor = dev->vendor;
> > + virtfn->subsystem_vendor = dev->subsystem_vendor;
> > + virtfn->class = dev->class;
>
> There seems to be a certain amount of commonality between this and
> pci_scan_device(). Have you considered trying to make a common helper
> function, or does it not work out well?

It's doable. Will enhance pci_setup_device() and use it to set up the VF.

> > + pci_device_add(virtfn, virtfn->bus);
>
> Greg is probably going to ding you here for adding the device, then
> creating the symlinks. I believe it's now best practice to create the
> symlinks first, so there's no window where userspace can get confused.

Yes, but unfortunately we can't create links before adding a device.
I double-checked device_add(): there is no place for those links to be
created before it sends the uevent. So for now, we have to trigger another
uevent for those links.

> > + mutex_unlock(&iov->pdev->sriov->lock);
>
> I question the existence of this mutex now. What's it protecting?
>
> Aren't we going to be implicitly protected by virtue of the Physical
> Function device driver being the only one calling this function, and the
> driver will be calling it from the ->probe routine which is not called
> simultaneously for the same device.

The PF driver patches I listed before support dynamically enabling and
disabling SR-IOV through a sysfs interface, so we have to protect the VF
bus allocation, as I explained before.

> > + virtfn->physfn = pci_dev_get(dev);
> > +
> > + rc = pci_bus_add_device(virtfn);
> > + if (rc)
> > + goto failed1;
> > + sprintf(buf, "%d", id);
>
> %u, perhaps? And maybe 'id' should always be unsigned? Just a thought.

Yes, will replace %d with %u.

> > + rc = sysfs_create_link(&iov->dev.kobj, &virtfn->dev.kobj, buf);
> > + if (rc)
> > + goto failed1;
> > + rc = sysfs_create_link(&virtfn->dev.kobj, &dev->dev.kobj, "physfn");
> > + if (rc)
> > + goto failed2;
>
> I'm glad to see these symlinks documented in later patches!
>
> > + nres = 0;
> > + for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
> > + res = dev->resource + PCI_SRIOV_RESOURCES + i;
> > + if (!res->parent)
> > + continue;
> > + nres++;
> > + }
>
> Can't this be written more simply as:
>
> for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
> res = dev->resource + PCI_SRIOV_RESOURCES + i;
> if (res->parent)
> nres++;
> }

Yes, will do

> ?
>
> > + if (nres != iov->nres) {
> > + dev_err(&dev->dev, "no enough MMIO for SR-IOV\n");
> > + return -ENOMEM;
> > + }
>
> Randy, can you help us out with better wording here?
>
> > + dev_err(&dev->dev, "no enough bus range for SR-IOV\n");
>
> and here.
>
> > + if (iov->link != dev->devfn) {
> > + rc = -ENODEV;
> > + list_for_each_entry(link, &dev->bus->devices, bus_list) {
> > + if (link->sriov && link->devfn == iov->link)
> > + rc = sysfs_create_link(&iov->dev.kobj,
> > + &link->dev.kobj, "dep_link");
>
> I skipped to the end and read patch 7/7 and I still don't understand
> what dep_link is for. Can you explain please? In particular, how is it
> different from physfn?

It's defined by spec as:

3.3.8. Function Dependency Link (12h)
The programming model for a Device may have vendor specific dependencies
between sets of Functions. The Function Dependency Link field is used to
describe these dependencies. This field describes dependencies between PFs.
VF dependencies are the same as the dependencies of their associated PFs.
If a PF is independent from other PFs of a Device, this field shall
contain its own Function Number. If a PF is dependent on other PFs of a
Device, this field shall contain the Function Number of the next PF in
the same Function Dependency List. The last PF in a Function Dependency
List shall contain the Function Number of the first PF in the Function
Dependency List. If PF p and PF q are in the same Function Dependency
List, then any SI that is assigned VF p,n shall also be assigned to VF q,n.
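
The circular list described in the spec text can be modelled in a few lines of userspace C. This is a sketch under stated assumptions, not how the patch handles dep_link: the array dep_link[] stands in for the Function Dependency Link field of each PF, indexed by Function Number, and the function name walk_dep_list is hypothetical.

```c
#include <stddef.h>

/*
 * Walk the Function Dependency List that contains PF 'start'.
 * dep_link[fn] models the Function Dependency Link field of PF 'fn'.
 * Members are stored in 'out' (capacity 'cap'). Returns the member
 * count, or -1 if the chain leaves the table, overflows 'out', or
 * does not loop back to 'start' within nfuncs steps.
 */
static int walk_dep_list(const unsigned char *dep_link, size_t nfuncs,
			 unsigned char start, unsigned char *out, size_t cap)
{
	size_t n = 0;
	unsigned int fn = start;

	do {
		if (fn >= nfuncs || n >= cap || n >= nfuncs)
			return -1;
		out[n++] = (unsigned char)fn;
		fn = dep_link[fn];
	} while (fn != start);

	return (int)n;
}
```

An independent PF holds its own Function Number in the field, so its list has one member. For links {2, 1, 0}, PF 0 and PF 2 form one list and PF 1 stands alone.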

Thanks,
Yu

2009-03-09 08:28:39

by Zhao, Yu

Subject: Re: [PATCH v10 4/7] PCI: add SR-IOV API for Physical Function driver

Thanks a lot, Randy!

On Sat, Mar 07, 2009 at 05:48:33AM +0800, Randy Dunlap wrote:
> Matthew Wilcox wrote:
> > On Fri, Feb 20, 2009 at 02:54:45PM +0800, Yu Zhao wrote:
> >
> >> + if (nres != iov->nres) {
> >> + dev_err(&dev->dev, "no enough MMIO for SR-IOV\n");
> >> + return -ENOMEM;
> >> + }
>
> "not enough MMIO BARs for SR-IOV"
> or
> "not enough MMIO resources for SR-IOV"
> or
> "too few MMIO BARs for SR-IOV"
> ?
>
> > Randy, can you help us out with better wording here?
> >
> >> + dev_err(&dev->dev, "no enough bus range for SR-IOV\n");
> >
> > and here.
>
> "SR-IOV: bus number too large"
> or
> "SR-IOV: bus number out of range"
> or
> "SR-IOV: cannot allocate valid bus number"
> ?
>
> >> + if (iov->link != dev->devfn) {
> >> + rc = -ENODEV;
> >> + list_for_each_entry(link, &dev->bus->devices, bus_list) {
> >> + if (link->sriov && link->devfn == iov->link)
> >> + rc = sysfs_create_link(&iov->dev.kobj,
> >> + &link->dev.kobj, "dep_link");

2009-03-09 19:46:44

by Greg KH

Subject: Re: [PATCH v10 4/7] PCI: add SR-IOV API for Physical Function driver

On Mon, Mar 09, 2009 at 04:25:05PM +0800, Yu Zhao wrote:
> > > + pci_device_add(virtfn, virtfn->bus);
> >
> > Greg is probably going to ding you here for adding the device, then
> > creating the symlinks. I believe it's now best practice to create the
> > symlinks first, so there's no window where userspace can get confused.
>
> Yes, but unfortunately we can't create links before adding a device.
> I double checked device_add(), there is no place for those links to be
> created before it sends uevent. So for now, we have to trigger another
> uevent for those links.

What exactly are you trying to do with a symlink here that you need to
do it this way? I vaguely remember you mentioning this in the past, but
I thought you had dropped the symlinks after our conversation about this
very problem.

thanks,

greg k-h

2009-03-10 01:37:07

by Zhao, Yu

Subject: Re: [PATCH v10 4/7] PCI: add SR-IOV API for Physical Function driver

On Tue, Mar 10, 2009 at 03:39:01AM +0800, Greg KH wrote:
> On Mon, Mar 09, 2009 at 04:25:05PM +0800, Yu Zhao wrote:
> > > > + pci_device_add(virtfn, virtfn->bus);
> > >
> > > Greg is probably going to ding you here for adding the device, then
> > > creating the symlinks. I believe it's now best practice to create the
> > > symlinks first, so there's no window where userspace can get confused.
> >
> > Yes, but unfortunately we can't create links before adding a device.
> > I double checked device_add(), there is no place for those links to be
> > created before it sends uevent. So for now, we have to trigger another
> > uevent for those links.
>
> What exactly are you trying to do with a symlink here that you need to
> do it this way? I vaguely remember you mentioning this in the past, but
> I thought you had dropped the symlinks after our conversation about this
> very problem.

I'd like to create some symlinks to reflect the relationship between
Physical Function and its associated Virtual Functions. The Physical
Function is like a master device that controls the allocation of its
Virtual Functions and owns the device physical resource. The Virtual
Functions are like slave devices of the Physical Function. For example,
if 01:00.0 is a Physical Function and 02:00.0 is a Virtual Function
associated with 01:00.0, then the symlinks (virtfnN and physfn) would
look like:

$ ls -l /sys/bus/pci/devices/0000:01:00.0/
...
... virtfn0 -> ../0000:02:00.0
... virtfn1 -> ../0000:02:00.1
... virtfn2 -> ../0000:02:00.2
...

$ ls -l /sys/bus/pci/devices/0000:02:00.0/
...
... physfn -> ../0000:01:00.0
...

This is very useful for userspace applications: both KVM and Xen need
to know this relationship so they can request permission from a
Physical Function before using its associated Virtual Functions.
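
As a rough illustration of how a userspace tool might consume the physfn link, here is a minimal sketch using readlink(2). The function name vf_to_pf is hypothetical, and it assumes the link target ends in the PF's address, as in the listing above.

```c
#define _POSIX_C_SOURCE 200809L	/* for readlink() and PATH_MAX in strict modes */
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Resolve the "physfn" link of a VF and copy the PF's address (the last
 * path component of the link target) into pf. 'devdir' is the VF's sysfs
 * directory, e.g. /sys/bus/pci/devices/0000:02:00.0 in the example above.
 * Returns 0 on success, -1 if the link is missing or a buffer is too small.
 */
static int vf_to_pf(const char *devdir, char *pf, size_t pflen)
{
	char path[PATH_MAX], target[PATH_MAX];
	const char *base;
	ssize_t n;

	if (snprintf(path, sizeof(path), "%s/physfn", devdir) >= (int)sizeof(path))
		return -1;

	n = readlink(path, target, sizeof(target) - 1);
	if (n < 0)
		return -1;	/* not a VF, or the link is not there (yet) */
	target[n] = '\0';	/* readlink() does not NUL-terminate */

	base = strrchr(target, '/');
	base = base ? base + 1 : target;
	if (strlen(base) >= pflen)
		return -1;
	strcpy(pf, base);
	return 0;
}
```

A caller has to treat the -1 case as "no link right now" rather than "not a VF", since the link is created after the device is added.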

Thanks,
Yu

2009-03-11 04:38:27

by Greg KH

Subject: Re: [PATCH v10 4/7] PCI: add SR-IOV API for Physical Function driver

On Tue, Mar 10, 2009 at 09:37:53AM +0800, Yu Zhao wrote:
> On Tue, Mar 10, 2009 at 03:39:01AM +0800, Greg KH wrote:
> > On Mon, Mar 09, 2009 at 04:25:05PM +0800, Yu Zhao wrote:
> > > > > + pci_device_add(virtfn, virtfn->bus);
> > > >
> > > > Greg is probably going to ding you here for adding the device, then
> > > > creating the symlinks. I believe it's now best practice to create the
> > > > symlinks first, so there's no window where userspace can get confused.
> > >
> > > Yes, but unfortunately we can't create links before adding a device.
> > > I double checked device_add(), there is no place for those links to be
> > > created before it sends uevent. So for now, we have to trigger another
> > > uevent for those links.
> >
> > What exactly are you trying to do with a symlink here that you need to
> > do it this way? I vaguely remember you mentioning this in the past, but
> > I thought you had dropped the symlinks after our conversation about this
> > very problem.
>
> I'd like to create some symlinks to reflect the relationship between
> Physical Function and its associated Virtual Functions. The Physical
> Function is like a master device that controls the allocation of its
> Virtual Functions and owns the device physical resource. The Virtual
> Functions are like slave devices of the Physical Function. For example,
> if 01:00.0 is a Physical Function and 02:00.0 is a Virtual Function
> associated with 01:00.0. Then the symlinks (virtfnN and physfn) would
> look like:
>
> $ ls -l /sys/bus/pci/devices/0000:01:00.0/
> ...
> ... virtfn0 -> ../0000:02:00.0
> ... virtfn1 -> ../0000:02:00.1
> ... virtfn2 -> ../0000:02:00.2
> ...
>
> $ ls -l /sys/bus/pci/devices/0000:02:00.0/
> ...
> ... physfn -> ../0000:01:00.0
> ...
>
> This is very useful for userspace applications, both KVM and Xen need
> to know this kind of relationship so they can request the permission
> from a Physical Function before using its associated Virtual Functions.

Ok, but then make sure you never rely on a udev rule or notifier to see
these symlinks when the device is added to the kernel, as there will be
a nice race condition there :)

thanks,

greg k-h