2008-11-25 21:24:35

by djwong

[permalink] [raw]
Subject: [PATCH] fakephp: Allocate PCI resources before adding the device

For PCI devices, pci_bus_assign_resources() must be called to set up the
pci_device->resource array before pci_bus_add_devices() can be called, else
attempts to load drivers results in BAR collision errors where there are none.
This is not done in fakephp, so devices can be "unplugged" but scanning the
parent bus won't bring the devices back due to resource unallocation. Move the
pci_bus_add_device-calling logic into pci_rescan_bus and preface it with a call
to pci_bus_assign_resources so that we only have to (re)allocate resources once
per bus where a new device is found.

Signed-off-by: Darrick J. Wong <[email protected]>
---
drivers/pci/hotplug/fakephp.c | 42 +++++++++++++++++++++++++----------------
1 files changed, 26 insertions(+), 16 deletions(-)

diff --git a/drivers/pci/hotplug/fakephp.c b/drivers/pci/hotplug/fakephp.c
index 3a2637a..a3826e6 100644
--- a/drivers/pci/hotplug/fakephp.c
+++ b/drivers/pci/hotplug/fakephp.c
@@ -195,13 +195,13 @@ static void remove_slot_worker(struct work_struct *work)
* Tries hard not to re-enable already existing devices;
* also handles scanning of subfunctions.
*/
-static void pci_rescan_slot(struct pci_dev *temp)
+static int pci_rescan_slot(struct pci_dev *temp)
{
struct pci_bus *bus = temp->bus;
struct pci_dev *dev;
int func;
- int retval;
u8 hdr_type;
+ int count = 0;

if (!pci_read_config_byte(temp, PCI_HEADER_TYPE, &hdr_type)) {
temp->hdr_type = hdr_type & 0x7f;
@@ -213,17 +213,12 @@ static void pci_rescan_slot(struct pci_dev *temp)
dbg("New device on %s function %x:%x\n",
bus->name, temp->devfn >> 3,
temp->devfn & 7);
- retval = pci_bus_add_device(dev);
- if (retval)
- dev_err(&dev->dev, "error adding "
- "device, continuing.\n");
- else
- add_slot(dev);
+ count++;
}
}
/* multifunction device? */
if (!(hdr_type & 0x80))
- return;
+ return count;

/* continue scanning for other functions */
for (func = 1, temp->devfn++; func < 8; func++, temp->devfn++) {
@@ -239,16 +234,13 @@ static void pci_rescan_slot(struct pci_dev *temp)
dbg("New device on %s function %x:%x\n",
bus->name, temp->devfn >> 3,
temp->devfn & 7);
- retval = pci_bus_add_device(dev);
- if (retval)
- dev_err(&dev->dev, "error adding "
- "device, continuing.\n");
- else
- add_slot(dev);
+ count++;
}
}
}
}
+
+ return count;
}


@@ -262,6 +254,8 @@ static void pci_rescan_bus(const struct pci_bus *bus)
{
unsigned int devfn;
struct pci_dev *dev;
+ int retval;
+ int found = 0;
dev = alloc_pci_dev();
if (!dev)
return;
@@ -270,7 +264,23 @@ static void pci_rescan_bus(const struct pci_bus *bus)
dev->sysdata = bus->sysdata;
for (devfn = 0; devfn < 0x100; devfn += 8) {
dev->devfn = devfn;
- pci_rescan_slot(dev);
+ found += pci_rescan_slot(dev);
+ }
+
+ if (found) {
+ pci_bus_assign_resources(bus);
+ list_for_each_entry(dev, &bus->devices, bus_list) {
+ /* Skip already-added devices */
+ if (dev->is_added)
+ continue;
+ retval = pci_bus_add_device(dev);
+ if (retval)
+ dev_err(&dev->dev,
+ "Error adding device, continuing\n");
+ else
+ add_slot(dev);
+ }
+ pci_bus_add_devices(bus);
}
kfree(dev);
}


2008-11-25 21:49:31

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH] fakephp: Allocate PCI resources before adding the device

On Tue, Nov 25, 2008 at 01:24:22PM -0800, Darrick J. Wong wrote:
> For PCI devices, pci_bus_assign_resources() must be called to set up the
> pci_device->resource array before pci_bus_add_devices() can be called, else
> attempts to load drivers results in BAR collision errors where there are none.
> This is not done in fakephp, so devices can be "unplugged" but scanning the
> parent bus won't bring the devices back due to resource unallocation. Move the
> pci_bus_add_device-calling logic into pci_rescan_bus and preface it with a call
> to pci_bus_assign_resources so that we only have to (re)allocate resources once
> per bus where a new device is found.
>
> Signed-off-by: Darrick J. Wong <[email protected]>

Darrick, I am no longer the PCI maintainer, Jesse is.

Please resend this to him, and the linux-pci mailing list if you want it
to have a chance to get accepted.

thanks,

greg k-h

2008-11-26 04:47:27

by Trent Piepho

[permalink] [raw]
Subject: Re: [PATCH] fakephp: Allocate PCI resources before adding the device

On Tue, 25 Nov 2008, Darrick J. Wong wrote:
> For PCI devices, pci_bus_assign_resources() must be called to set up the
> pci_device->resource array before pci_bus_add_devices() can be called, else
> attempts to load drivers results in BAR collision errors where there are none.

I've had a patch to fakephp that did something like this for a while, but I
called pci_bus_assign_resources() _after_ adding the devices with calls to
pci_bus_add_device(). It seems like that might be easier, to just add all
the devices and then call pci_bus_assign_resources() when done. It appears
to work fine for me. Is there a reason this is wrong?

2008-11-26 07:48:21

by djwong

[permalink] [raw]
Subject: Re: [PATCH] fakephp: Allocate PCI resources before adding the device

[Fixing cc/to list]

On Tue, Nov 25, 2008 at 08:46:37PM -0800, Trent Piepho wrote:
>
> I've had a patch to fakephp that did something like this for a while, but I
> called pci_bus_assign_resources() _after_ adding the devices with calls to
> pci_bus_add_device(). It seems like that might be easier, to just add all
> the devices and then call pci_bus_assign_resources() when done. It appears
> to work fine for me. Is there a reason this is wrong?

afaict, pci_bus_add_devices calls device_add to set up sysfs files and
trigger a event that can (ultimately) cause a pci probe action to
happen... but the probe will fail because the resources aren't ready.
In any case, if a device shows up in sysfs I'd assume that to mean that
the device is ready to go--all the BARs are reserved for the device,
etc. For sure, I woudn't expect to be racing
pci_bus_assign_resources().

--D

2008-11-26 09:56:56

by Trent Piepho

[permalink] [raw]
Subject: Re: [PATCH] fakephp: Allocate PCI resources before adding the device

On Tue, 25 Nov 2008, Darrick J. Wong wrote:
> On Tue, Nov 25, 2008 at 08:46:37PM -0800, Trent Piepho wrote:
> >
> > I've had a patch to fakephp that did something like this for a while, but I
> > called pci_bus_assign_resources() _after_ adding the devices with calls to
> > pci_bus_add_device(). It seems like that might be easier, to just add all
> > the devices and then call pci_bus_assign_resources() when done. It appears
> > to work fine for me. Is there a reason this is wrong?
>
> afaict, pci_bus_add_devices calls device_add to set up sysfs files and
> trigger a event that can (ultimately) cause a pci probe action to
> happen... but the probe will fail because the resources aren't ready.
> In any case, if a device shows up in sysfs I'd assume that to mean that
> the device is ready to go--all the BARs are reserved for the device,
> etc. For sure, I woudn't expect to be racing
> pci_bus_assign_resources().

Ok, that makes sense. The device I'm using fakephp for doesn't have a
kernel driver so I wouldn't have noticed that.

Have you tested this with a device that isn't present at boot? I found
that I needed to a call to pci_enable_device() after assigning resources,
otherwise the BARs wouldn't be enabled. This only happened if the device
wasn't present at boot time.

My hardware doesn't run on the latest kernel so I can't test it. It looks
like there have been a bunch of pci hotplug changes so back porting this
might not be feasible. It also looks a previous patch by Alex Chiang
completely changed the sysfs interface for fakephp. I thought sysfs
interfaces were supposed to be stable?! Also looks like it made fakephp
useless. How are you supposed to figure out which "fake-n" directory is
the right one to disable the device you want?

2008-11-26 18:18:54

by djwong

[permalink] [raw]
Subject: Re: [PATCH] fakephp: Allocate PCI resources before adding the device

On Wed, Nov 26, 2008 at 01:56:45AM -0800, Trent Piepho wrote:
> Ok, that makes sense. The device I'm using fakephp for doesn't have a
> kernel driver so I wouldn't have noticed that.
>
> Have you tested this with a device that isn't present at boot? I found
> that I needed to a call to pci_enable_device() after assigning resources,
> otherwise the BARs wouldn't be enabled. This only happened if the device
> wasn't present at boot time.

Yes, I was actually using this driver to <cough> turn the ioatdma
controller on after turning it off in the BIOS.

> My hardware doesn't run on the latest kernel so I can't test it. It looks
> like there have been a bunch of pci hotplug changes so back porting this
> might not be feasible. It also looks a previous patch by Alex Chiang
> completely changed the sysfs interface for fakephp. I thought sysfs
> interfaces were supposed to be stable?! Also looks like it made fakephp

I doubt that, sysfs interfaces change all the time. I think only the
syscall interface has any sort of stability guarantee.

> useless. How are you supposed to figure out which "fake-n" directory is
> the right one to disable the device you want?

cat /sys/bus/pci/slots/fake*/address

--D

2008-11-26 22:23:19

by Trent Piepho

[permalink] [raw]
Subject: Re: [PATCH] fakephp: Allocate PCI resources before adding the device

On Wed, 26 Nov 2008, Darrick J. Wong wrote:
> On Wed, Nov 26, 2008 at 01:56:45AM -0800, Trent Piepho wrote:
> > Ok, that makes sense. The device I'm using fakephp for doesn't have a
> > kernel driver so I wouldn't have noticed that.
> >
> > Have you tested this with a device that isn't present at boot? I found
> > that I needed to a call to pci_enable_device() after assigning resources,
> > otherwise the BARs wouldn't be enabled. This only happened if the device
> > wasn't present at boot time.
>
> Yes, I was actually using this driver to <cough> turn the ioatdma
> controller on after turning it off in the BIOS.

Maybe it's different on powerpc then? My pseudo-hotplugable device is also
the only thing connected to the PCI-E host bus controller. At boot the
controller is empty and so I think some code to enable its BARs gets
skipped. But without the pci_enable_device(), I get this:

01:00.0 Signal processing controller: Freescale Semiconductor Inc Aurora Nexus Trace Interface
Flags: fast devsel, IRQ 255
Memory at 40000000 (64-bit, prefetchable) [disabled] [size=4K]
Capabilities: [40] Power Management version 3
Capabilities: [48] Message Signalled Interrupts: Mask+ 64bit+ Queue=0/0 Enable-
Capabilities: [60] Express Endpoint IRQ 0
Capabilities: [100] Device Serial Number 00-00-00-00-00-00-00-00

> > might not be feasible. It also looks a previous patch by Alex Chiang
> > completely changed the sysfs interface for fakephp. I thought sysfs
> > interfaces were supposed to be stable?! Also looks like it made fakephp
>
> I doubt that, sysfs interfaces change all the time. I think only the
> syscall interface has any sort of stability guarantee.

The ones meant to be used from userspace usually don't. At least that
doesn't seem to be what Linus is saying here http://lwn.net/Articles/172986/

I've written software that uses an established interface, which has been
the same for years, and I see someone went and broke it, no warning. That
hardly seems reasonable.

> > useless. How are you supposed to figure out which "fake-n" directory is
> > the right one to disable the device you want?
>
> cat /sys/bus/pci/slots/fake*/address

Ahh, my kernel doesn't have an address attribute in the slot directories.

Still, this is much more inefficient than the previous way. It also has a
race condition. After scanning each fake* device to find the one with the
correct address there is a window before the power attribute is written.
The fake* device might change in that time and one could write to the wrong
power attribute.

2008-11-26 22:55:49

by Alex Chiang

[permalink] [raw]
Subject: Re: [PATCH] fakephp: Allocate PCI resources before adding the device

* Trent Piepho <[email protected]>:
> On Wed, 26 Nov 2008, Darrick J. Wong wrote:
> > On Wed, Nov 26, 2008 at 01:56:45AM -0800, Trent Piepho wrote:
> > > Ok, that makes sense. The device I'm using fakephp for doesn't have a
> > > kernel driver so I wouldn't have noticed that.
> > >
> > > Have you tested this with a device that isn't present at boot? I found
> > > that I needed to a call to pci_enable_device() after assigning resources,
> > > otherwise the BARs wouldn't be enabled. This only happened if the device
> > > wasn't present at boot time.
> >
> > Yes, I was actually using this driver to <cough> turn the ioatdma
> > controller on after turning it off in the BIOS.
>
> Maybe it's different on powerpc then? My pseudo-hotplugable device is also
> the only thing connected to the PCI-E host bus controller. At boot the
> controller is empty and so I think some code to enable its BARs gets
> skipped. But without the pci_enable_device(), I get this:
>
> 01:00.0 Signal processing controller: Freescale Semiconductor Inc Aurora Nexus Trace Interface
> Flags: fast devsel, IRQ 255
> Memory at 40000000 (64-bit, prefetchable) [disabled] [size=4K]
> Capabilities: [40] Power Management version 3
> Capabilities: [48] Message Signalled Interrupts: Mask+ 64bit+ Queue=0/0 Enable-
> Capabilities: [60] Express Endpoint IRQ 0
> Capabilities: [100] Device Serial Number 00-00-00-00-00-00-00-00
>
> > > might not be feasible. It also looks a previous patch by Alex Chiang
> > > completely changed the sysfs interface for fakephp. I thought sysfs
> > > interfaces were supposed to be stable?! Also looks like it made fakephp
> >
> > I doubt that, sysfs interfaces change all the time. I think only the
> > syscall interface has any sort of stability guarantee.
>
> The ones meant to be used from userspace usually don't. At least that
> doesn't seem to be what Linus is saying here http://lwn.net/Articles/172986/
>
> I've written software that uses an established interface, which has been
> the same for years, and I see someone went and broke it, no warning. That
> hardly seems reasonable.

Which commit are you talking about?

5fe6cc60680d29740b85278e17a002fa27b7e642
PCI: prevent duplicate slot names

fe99740cac117f208707488c03f3789cf4904957
PCI: construct one fakephp slot per PCI slot

Sorry for your frustration. Both patchsets went through multiple
revisions, and had plenty of input from the various PCI
developers. At the time, no one seemed to think that the proposed
changes were unreasonable.

> > > useless. How are you supposed to figure out which "fake-n" directory is
> > > the right one to disable the device you want?
> >
> > cat /sys/bus/pci/slots/fake*/address
>
> Ahh, my kernel doesn't have an address attribute in the slot directories.

That is bizarre. You are saying that you're getting slot entries
without address files?

Can you send the output of 'ls /sys/bus/pci/slots' and also:

$ for i in /sys/bus/pci/slots ; do ls $i ; done

What kernel are you on?

> Still, this is much more inefficient than the previous way. It also has a
> race condition. After scanning each fake* device to find the one with the
> correct address there is a window before the power attribute is written.
> The fake* device might change in that time and one could write to the wrong
> power attribute.

There is no way for fakephp to hot-add devices. The only use case
is to hot-remove devices.

Once fakephp is loaded, the fake* entry created in sysfs will not
change.

Maybe you're talking about something else. Some more context for
what you're trying to do, please?

/ac

2008-11-27 01:44:47

by Trent Piepho

[permalink] [raw]
Subject: Re: [PATCH] fakephp: Allocate PCI resources before adding the device

On Wed, 26 Nov 2008, Alex Chiang wrote:
> * Trent Piepho <[email protected]>:
> > > I doubt that, sysfs interfaces change all the time. I think only the
> > > syscall interface has any sort of stability guarantee.
> >
> > The ones meant to be used from userspace usually don't. At least that
> > doesn't seem to be what Linus is saying here http://lwn.net/Articles/172986/
> >
> > I've written software that uses an established interface, which has been
> > the same for years, and I see someone went and broke it, no warning. That
> > hardly seems reasonable.
>
> Which commit are you talking about?
>
> fe99740cac117f208707488c03f3789cf4904957
> PCI: construct one fakephp slot per PCI slot

This one. It will break any software that used the existing interface.

> Sorry for your frustration. Both patchsets went through multiple
> revisions, and had plenty of input from the various PCI
> developers. At the time, no one seemed to think that the proposed
> changes were unreasonable.

I wonder how many users of fakephp saw them? I can't monitor every patch
to linux kernel. I checked the lkml archive and didn't see any discussion
of the change to the fakephp interface. There was stuff about the other
patches, but nothing about that.

Is there a reason you had to change it? It breaks an existing interface.
It's clearly more inefficient and complicated to find a slot using it. The
exiting PCI device name, like "0000:01:00.0", has been used in the sysfs
interface and with tools like lspci since forever. Why should
/sys/bus/pci/slots use different names from /sys/bus/pci/devices for the
same device?

> > > > useless. How are you supposed to figure out which "fake-n" directory is
> > > > the right one to disable the device you want?
> > >
> > > cat /sys/bus/pci/slots/fake*/address
> >
> > Ahh, my kernel doesn't have an address attribute in the slot directories.
>
> That is bizarre. You are saying that you're getting slot entries
> without address files?
>
> Can you send the output of 'ls /sys/bus/pci/slots' and also:
>
> $ for i in /sys/bus/pci/slots ; do ls $i ; done
>
> What kernel are you on?

It's a 2.6.23 kernel, but I've back-ported the PCI code from 2.6.26 since
there were a number of important powerpc improvements that I needed. I
assure you there is no 'address' attribute in the directories in
bus/pci/slots. /sys/bus/pci/slots/* == /sys/bus/pci/devices/*, which is
quite nice.

> > Still, this is much more inefficient than the previous way. It also has a
> > race condition. After scanning each fake* device to find the one with the
> > correct address there is a window before the power attribute is written.
> > The fake* device might change in that time and one could write to the wrong
> > power attribute.
>
> There is no way for fakephp to hot-add devices. The only use case
> is to hot-remove devices.

Sure it's possible to hot-add devices. Look at the patch to fakephp just
previous to your first patch:

commit ca99eb8c2d56bdfff0161388b81e641f4e039b3f
Author: Trent Piepho <[email protected]>
Date: Mon Apr 7 16:04:16 2008 -0700

PCI: Hotplug: fakephp: Return success, not ENODEV, when bus rescan is triggered

What is the purpose of a bus rescan, if not to hot-add devices?

> Once fakephp is loaded, the fake* entry created in sysfs will not
> change.
>
> Maybe you're talking about something else. Some more context for
> what you're trying to do, please?

/ # lspci
00:00.0 PCI bridge: Freescale Semiconductor Inc MPC8572 PowerQUICC III PCIe Host Controller (rev 11)
01:00.0 Signal processing controller: Freescale Semiconductor Inc Aurora Nexus Trace Interface
/ # echo 0 > /sys/bus/pci/slots/`lspci -Dm -d 1957:fc00|cut -d\ -f1`/power
/ # lspci
00:00.0 PCI bridge: Freescale Semiconductor Inc MPC8572 PowerQUICC III PCIe Host Controller (rev 11)
/ # [reload the image on the FPGA, which implements the PCI-E interface]
/ # echo 1 > /sys/bus/pci/slots/0000:00:00.0/power
/ # lspci
00:00.0 PCI bridge: Freescale Semiconductor Inc MPC8572 PowerQUICC III PCIe Host Controller (rev 11)
01:00.0 Signal processing controller: Freescale Semiconductor Inc Aurora Nexus Trace Interface

Note that the device doesn't exist after boot, as the image on the FPGA
which implements the PCI-E interface must be loaded from Linux first. When
the image is reloaded in the middle step, it could change the device into a
completely different one. Different PCI id, different BARs, PCI-E link
width changing from 4x to 8x, and so on. I just don't have any visibly
different images handy that run on the hardware I have online at the
moment.

Having a PCI/PCI-E interface that exists in an FPGA isn't uncommon for
custom hardware. An IP block for a PCI/whatever interface is a standard
feature for any FPGA powerful enough for that sort of thing. I know there
are others who use fakephp for this same thing with there FPGA based
hardware.

2008-11-27 01:52:24

by djwong

[permalink] [raw]
Subject: Re: [PATCH] fakephp: Allocate PCI resources before adding the device

On Wed, Nov 26, 2008 at 03:55:35PM -0700, Alex Chiang wrote:
> > Maybe it's different on powerpc then? My pseudo-hotplugable device is also
> > the only thing connected to the PCI-E host bus controller. At boot the
> > controller is empty and so I think some code to enable its BARs gets
> > skipped. But without the pci_enable_device(), I get this:
> >
> > 01:00.0 Signal processing controller: Freescale Semiconductor Inc Aurora Nexus Trace Interface
> > Flags: fast devsel, IRQ 255
> > Memory at 40000000 (64-bit, prefetchable) [disabled] [size=4K]
Are you referring to this? ^^^^^^^^^^

Without seeing the raw dump of the PCI config space, it looks to me like
the memory space enable bit of the PCICMD register is unset. Probably
the device driver should call pci_enable_device() at init time, though I
suppose you did say earlier that there is no driver.

> There is no way for fakephp to hot-add devices. The only use case
> is to hot-remove devices.

Actually, you can "hot-add" devices that were previously removed. Just
rescan the removed device's bridge by running this command: echo 1 >
/sys/bus/pci/slots/fakeX/power. Onlining the bridge (even if it's
already online) causes the PCI code to rescan the bridge for devices,
and that which you've removed comes back.

> Maybe you're talking about something else. Some more context for
> what you're trying to do, please?

Here is my bizarro case. The MCH has a bit that controls whether or not
the ioat controller pays attention to accesses to PCI configuration
data. If the bit is set, the ioat device shows up as 0:8.0. If the bit
is unset, it looks like there's no device there. So I set this bit and
use the fakephp driver to rescan the root PCI bridge; after doing that,
the ioat controller pops up. Quite possibly this (ab)use of the fakephp
driver could be used to turn on other things from Linux that would
otherwise be disabled, but your machine might blow up too.

--D

2008-11-27 02:42:33

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH] fakephp: Allocate PCI resources before adding the device

On Wed, Nov 26, 2008 at 05:44:35PM -0800, Trent Piepho wrote:
> Is there a reason you had to change it? It breaks an existing interface.
> It's clearly more inefficient and complicated to find a slot using it. The
> exiting PCI device name, like "0000:01:00.0", has been used in the sysfs
> interface and with tools like lspci since forever. Why should
> /sys/bus/pci/slots use different names from /sys/bus/pci/devices for the
> same device?

Because fakephp was the odd one out. Every other hotplug driver
registers a slot number and puts the address in the directory. We're
making hotplug drivers simpler (by sharing more of the logic in the
core) and so fakephp had to change to match the other drivers. I'm
sorry for your inconvenience, but it's necessary.

We can discuss other ways to make your life better, but it can't be
changed back to one 'slot' per PCI function.

--
Matthew Wilcox Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."

2008-11-28 09:51:19

by Trent Piepho

[permalink] [raw]
Subject: Re: [PATCH] fakephp: Allocate PCI resources before adding the device

On Wed, 26 Nov 2008, Darrick J. Wong wrote:
> On Wed, Nov 26, 2008 at 03:55:35PM -0700, Alex Chiang wrote:
> > > Maybe it's different on powerpc then? My pseudo-hotplugable device is also
> > > the only thing connected to the PCI-E host bus controller. At boot the
> > > controller is empty and so I think some code to enable its BARs gets
> > > skipped. But without the pci_enable_device(), I get this:
> > >
> > > 01:00.0 Signal processing controller: Freescale Semiconductor Inc Aurora Nexus Trace Interface
> > > Flags: fast devsel, IRQ 255
> > > Memory at 40000000 (64-bit, prefetchable) [disabled] [size=4K]
> Are you referring to this? ^^^^^^^^^^
>
> Without seeing the raw dump of the PCI config space, it looks to me like
> the memory space enable bit of the PCICMD register is unset. Probably
> the device driver should call pci_enable_device() at init time, though I
> suppose you did say earlier that there is no driver.

Yes, that's it. It seems like since the BARs are normally enabled after a
device is scanned at boot time that they should also be enabled when a
device is found by a fakephp rescan. So I thought it seemed reasonable to
put pci_enable_device() in fakephp.

2008-11-28 10:12:03

by Trent Piepho

[permalink] [raw]
Subject: Re: [PATCH] fakephp: Allocate PCI resources before adding the device

On Wed, 26 Nov 2008, Matthew Wilcox wrote:
> On Wed, Nov 26, 2008 at 05:44:35PM -0800, Trent Piepho wrote:
> > Is there a reason you had to change it? It breaks an existing interface.
> > It's clearly more inefficient and complicated to find a slot using it. The
> > exiting PCI device name, like "0000:01:00.0", has been used in the sysfs
> > interface and with tools like lspci since forever. Why should
> > /sys/bus/pci/slots use different names from /sys/bus/pci/devices for the
> > same device?
>
> Because fakephp was the odd one out. Every other hotplug driver
> registers a slot number and puts the address in the directory. We're
> making hotplug drivers simpler (by sharing more of the logic in the
> core) and so fakephp had to change to match the other drivers. I'm
> sorry for your inconvenience, but it's necessary.

Seems like changing a long established interface like this should be put on
the feature removable schedule.

> We can discuss other ways to make your life better, but it can't be
> changed back to one 'slot' per PCI function.

Why not have a slot per pci function with each hotplug driver adding
attributes to the slot? The slot would be registered by the hotplug core.

Is there any reason the fakephp driver should not be allowed to enable and
disable individual functions? There a lot of mutli-function devices that
allow individual functions to be enabled and disabled with registers on the
primary function. For instance, it would be possible to work around a BIOS
that doesn't have an option to disabled a southbridge's USB controller by
doing so in Linux and then using fakephp to remove the device, and thus
free address space that other devices might need.

2008-11-28 18:43:03

by Rolf Eike Beer

[permalink] [raw]
Subject: Re: [PATCH] fakephp: Allocate PCI resources before adding the device

Trent Piepho wrote:
> On Wed, 26 Nov 2008, Darrick J. Wong wrote:
> > On Wed, Nov 26, 2008 at 03:55:35PM -0700, Alex Chiang wrote:
> > > > Maybe it's different on powerpc then? My pseudo-hotplugable device
> > > > is also the only thing connected to the PCI-E host bus controller.
> > > > At boot the controller is empty and so I think some code to enable
> > > > its BARs gets skipped. But without the pci_enable_device(), I get
> > > > this:
> > > >
> > > > 01:00.0 Signal processing controller: Freescale Semiconductor Inc
> > > > Aurora Nexus Trace Interface Flags: fast devsel, IRQ 255
> > > > Memory at 40000000 (64-bit, prefetchable) [disabled]
> > > > [size=4K]
> >
> > Are you referring to this? ^^^^^^^^^^
> >
> > Without seeing the raw dump of the PCI config space, it looks to me like
> > the memory space enable bit of the PCICMD register is unset. Probably
> > the device driver should call pci_enable_device() at init time, though I
> > suppose you did say earlier that there is no driver.
>
> Yes, that's it. It seems like since the BARs are normally enabled after a
> device is scanned at boot time that they should also be enabled when a
> device is found by a fakephp rescan. So I thought it seemed reasonable to
> put pci_enable_device() in fakephp.

No, pci_enable_device() will be called by the device driver. The hotplug
drivers have nothing to do with that.

Eike


Attachments:
(No filename) (1.40 kB)
signature.asc (197.00 B)
This is a digitally signed message part.
Download all attachments

2008-11-28 18:57:50

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH] fakephp: Allocate PCI resources before adding the device

On Fri, Nov 28, 2008 at 02:11:49AM -0800, Trent Piepho wrote:
> On Wed, 26 Nov 2008, Matthew Wilcox wrote:
> > Because fakephp was the odd one out. Every other hotplug driver
> > registers a slot number and puts the address in the directory. We're
> > making hotplug drivers simpler (by sharing more of the logic in the
> > core) and so fakephp had to change to match the other drivers. I'm
> > sorry for your inconvenience, but it's necessary.
>
> Seems like changing a long established interface like this should be put on
> the feature removable schedule.

sysfs is more fluid than other interfaces. Large parts of it
change without warning. I suppose it could have gone through
feature-removal-schedule, and perhaps we should try to do that in future.

> > We can discuss other ways to make your life better, but it can't be
> > changed back to one 'slot' per PCI function.
>
> Why not have a slot per pci function with each hotplug driver adding
> attributes to the slot? The slot would be registered by the hotplug core.

What we have now is the 'address' file managed by the core. This
doesn't include the function number.

> Is there any reason the fakephp driver should not be allowed to enable and
> disable individual functions? There a lot of mutli-function devices that
> allow individual functions to be enabled and disabled with registers on the
> primary function. For instance, it would be possible to work around a BIOS
> that doesn't have an option to disabled a southbridge's USB controller by
> doing so in Linux and then using fakephp to remove the device, and thus
> free address space that other devices might need.

There are lots of circumstances where enabling and disabling individual
functions is a good idea, but I don't think that fakephp is the right
way to do it. We need a better interface.

--
Matthew Wilcox Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."

2008-11-28 21:07:05

by Trent Piepho

[permalink] [raw]
Subject: Re: [PATCH] fakephp: Allocate PCI resources before adding the device

On Fri, 28 Nov 2008, Rolf Eike Beer wrote:
> Trent Piepho wrote:
> > On Wed, 26 Nov 2008, Darrick J. Wong wrote:
> > > On Wed, Nov 26, 2008 at 03:55:35PM -0700, Alex Chiang wrote:
> > > > > Maybe it's different on powerpc then? My pseudo-hotplugable device
> > > > > is also the only thing connected to the PCI-E host bus controller.
> > > > > At boot the controller is empty and so I think some code to enable
> > > > > its BARs gets skipped. But without the pci_enable_device(), I get
> > > > > this:
> > > > >
> > > > > 01:00.0 Signal processing controller: Freescale Semiconductor Inc
> > > > > Aurora Nexus Trace Interface Flags: fast devsel, IRQ 255
> > > > > Memory at 40000000 (64-bit, prefetchable) [disabled]
> > > > > [size=4K]
> > >
> > > Are you referring to this? ^^^^^^^^^^
> > >
> > > Without seeing the raw dump of the PCI config space, it looks to me like
> > > the memory space enable bit of the PCICMD register is unset. Probably
> > > the device driver should call pci_enable_device() at init time, though I
> > > suppose you did say earlier that there is no driver.
> >
> > Yes, that's it. It seems like since the BARs are normally enabled after a
> > device is scanned at boot time that they should also be enabled when a
> > device is found by a fakephp rescan. So I thought it seemed reasonable to
> > put pci_enable_device() in fakephp.
>
> No, pci_enable_device() will be called by the device driver. The hotplug
> drivers have nothing to do with that.

I guess you didn't read the part about there not being a device driver?

Why should a device added by fakephp be configured differently than one
added at boot time?

Is there something that will break if fakephp enables devices it finds?

2008-11-28 21:21:34

by Trent Piepho

[permalink] [raw]
Subject: Re: [PATCH] fakephp: Allocate PCI resources before adding the device

On Fri, 28 Nov 2008, Matthew Wilcox wrote:
> On Fri, Nov 28, 2008 at 02:11:49AM -0800, Trent Piepho wrote:
> > On Wed, 26 Nov 2008, Matthew Wilcox wrote:
> > > Because fakephp was the odd one out. Every other hotplug driver
> > > registers a slot number and puts the address in the directory. We're
> > > making hotplug drivers simpler (by sharing more of the logic in the
> > > core) and so fakephp had to change to match the other drivers. I'm
> > > sorry for your inconvenience, but it's necessary.
> >
> > Seems like changing a long established interface like this should be put on
> > the feature removable schedule.
>
> sysfs is more fluid than other interfaces. Large parts of it
> change without warning. I suppose it could have gone through
> feature-removal-schedule, and perhaps we should try to do that in future.

Sometimes I just want to give up on Linux. Is there a different interface
that isn't wantonly changed without warning? No? Do you not see the
problem this creates for developers? Do you not care?

> > > We can discuss other ways to make your life better, but it can't be
> > > changed back to one 'slot' per PCI function.
> >
> > Why not have a slot per pci function with each hotplug driver adding
> > attributes to the slot? The slot would be registered by the hotplug core.
>
> What we have now is the 'address' file managed by the core. This
> doesn't include the function number.

So the race condition doesn't matter? Alex Chiang wasn't even aware
fakephp could be used to add new devices, so this change obviously wasn't
though out very well.

> > Is there any reason the fakephp driver should not be allowed to enable and
> > disable individual functions? There a lot of mutli-function devices that
> > allow individual functions to be enabled and disabled with registers on the
> > primary function. For instance, it would be possible to work around a BIOS
> > that doesn't have an option to disabled a southbridge's USB controller by
> > doing so in Linux and then using fakephp to remove the device, and thus
> > free address space that other devices might need.
>
> There are lots of circumstances where enabling and disabling individual
> functions is a good idea, but I don't think that fakephp is the right
> way to do it. We need a better interface.

So maybe this better interface should be created before breaking the
existing one?

2008-11-28 21:30:53

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH] fakephp: Allocate PCI resources before adding the device

On Fri, Nov 28, 2008 at 01:21:22PM -0800, Trent Piepho wrote:
> Sometimes I just want to give up on Linux. Is there a different interface
> that isn't wantonly changed without warning? No? Do you not see the
> problem this creates for developers? Do you not care?

You see, we didn't know that anyone was using fakephp for real work.
It's marketed as being a way for developers to test their device drivers
with hotplug slots even if they don't have a real hotplug machine.
If we'd known, we'd've done a better job.

BTW, cut the crap about "Sometimes I just want to give up on Linux".
We're putting in a lot of work that you get to use for free. That doesn't
give you the right to be abusive. And if you stopped using Linux, we'd
have one fewer person complaining, so I don't personally find it a huge
motivator to drop everything and attend to your whims.

> > What we have now is the 'address' file managed by the core. This
> > doesn't include the function number.
>
> So the race condition doesn't matter? Alex Chiang wasn't even aware
> fakephp could be used to add new devices, so this change obviously wasn't
> though out very well.

There is no race condition. You can't remove a fakephp slot.

> > There are lots of circumstances where enabling and disabling individual
> > functions is a good idea, but I don't think that fakephp is the right
> > way to do it. We need a better interface.
>
> So maybe this better interface should be created before breaking the
> existing one?

Hey, I have a great idea. Why don't you help instead of just bitching?

--
Matthew Wilcox Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."

2008-11-28 23:18:33

by Alex Chiang

[permalink] [raw]
Subject: Re: [PATCH] fakephp: Allocate PCI resources before adding the device

* Trent Piepho <[email protected]>:
> On Wed, 26 Nov 2008, Alex Chiang wrote:
> >
> > Which commit are you talking about?
> >
> > fe99740cac117f208707488c03f3789cf4904957 PCI: construct one
> > fakephp slot per PCI slot
>
> This one. It will break any software that used the existing
> interface.
>
> > Sorry for your frustration. Both patchsets went through
> > multiple revisions, and had plenty of input from the various
> > PCI developers. At the time, no one seemed to think that the
> > proposed changes were unreasonable.
>
> I wonder how many users of fakephp saw them? I can't monitor
> every patch to linux kernel. I checked the lkml archive and
> didn't see any discussion of the change to the fakephp
> interface. There was stuff about the other patches, but
> nothing about that.

Until now, I hadn't realized that there were "real" users of
fakephp. My assumption was that anyone using it was a developer.

I see now that my assumption was wrong.

> Is there a reason you had to change it? It breaks an existing
> interface. It's clearly more inefficient and complicated to
> find a slot using it. The exiting PCI device name, like
> "0000:01:00.0", has been used in the sysfs interface and with
> tools like lspci since forever.

As Willy pointed out, we had to change fakephp to move to one
'slot' per device to normalize the other hotplug drivers. The
overall goal was to reduce the complexity of the individual
hotplug drivers and move more functionality into the hotplug
core. This change in fakephp was a side-effect.

> Why should /sys/bus/pci/slots use different names from
> /sys/bus/pci/devices for the same device?

Because they are fundamentally different concepts. Entries in
/sys/bus/pci/slots/ _should_ correspond to physical PCI slots.

> > That is bizarre. You are saying that you're getting slot
> > entries without address files?
> >
> > Can you send the output of 'ls /sys/bus/pci/slots' and also:
> >
> > $ for i in /sys/bus/pci/slots ; do ls $i ; done
> >
> > What kernel are you on?
>
> It's a 2.6.23 kernel, but I've back-ported the PCI code from
> 2.6.26 since there were a number of important powerpc
> improvements that I needed. I assure you there is no 'address'
> attribute in the directories in bus/pci/slots.

It sounds like you didn't backport enough stuff then.

> > There is no way for fakephp to hot-add devices. The only use case
> > is to hot-remove devices.
>
> Sure it's possible to hot-add devices. Look at the patch to fakephp just
> previous to your first patch:

Oh, ok. The reason I didn't realize this is because the fakeN
entry goes away after you echo a 0 into its power file. Since I'm
not a heavy fakephp user, I never realized that it was possible
to get it to rescan the bus.

> > Once fakephp is loaded, the fake* entry created in sysfs will not
> > change.
> >
> > Maybe you're talking about something else. Some more context for
> > what you're trying to do, please?
>
> / # lspci
> 00:00.0 PCI bridge: Freescale Semiconductor Inc MPC8572 PowerQUICC III PCIe Host Controller (rev 11)
> 01:00.0 Signal processing controller: Freescale Semiconductor Inc Aurora Nexus Trace Interface
> / # echo 0 > /sys/bus/pci/slots/`lspci -Dm -d 1957:fc00|cut -d\ -f1`/power
> / # lspci
> 00:00.0 PCI bridge: Freescale Semiconductor Inc MPC8572 PowerQUICC III PCIe Host Controller (rev 11)
> / # [reload the image on the FPGA, which implements the PCI-E interface]
> / # echo 1 > /sys/bus/pci/slots/0000:00:00.0/power
> / # lspci
> 00:00.0 PCI bridge: Freescale Semiconductor Inc MPC8572 PowerQUICC III PCIe Host Controller (rev 11)
> 01:00.0 Signal processing controller: Freescale Semiconductor Inc Aurora Nexus Trace Interface
>
> Note that the device doesn't exist after boot, as the image on the FPGA
> which implements the PCI-E interface must be loaded from Linux first. When
> the image is reloaded in the middle step, it could change the device into a
> completely different one. Different PCI id, different BARs, PCI-E link
> width changing from 4x to 8x, and so on. I just don't have any visibly
> different images handy that run on the hardware I have online at the
> moment.
>
> Having a PCI/PCI-E interface that exists in an FPGA isn't uncommon for
> custom hardware. An IP block for a PCI/whatever interface is a standard
> feature for any FPGA powerful enough for that sort of thing. I know there
> are others who use fakephp for this same thing with there FPGA based
> hardware.

I have to be honest, my first reaction to reading your
attack-filled mails was to shrug and think, "I don't know who
this jerk is, but I sure don't feel like helping him."

Whatever, I'm over it.

Let's try and find a solution that will work for everyone. I
don't think that going back to one entry in /sys/bus/pci/slots
per function is a solution.

Last time, Willy proposed a few ideas in this thread:

http://lkml.org/lkml/2008/9/9/28

I like this idea:

Maybe we want a /sys/bus/pci/scan or
/sys/bus/pci/devices/scan file that we can echo
"0000:01:02.3" to scan just that function, or
"0000:01:02" to scan the device.

Will that work for you, Trent?

/ac

2008-12-01 01:10:29

by Trent Piepho

[permalink] [raw]
Subject: Problems with fakephp

On Fri, 28 Nov 2008, Matthew Wilcox wrote:
> On Fri, Nov 28, 2008 at 01:21:22PM -0800, Trent Piepho wrote:
> > Sometimes I just want to give up on Linux. Is there a different interface
> > that isn't wantonly changed without warning? No? Do you not see the
> > problem this creates for developers? Do you not care?
>
> You see, we didn't know that anyone was using fakephp for real work.
> It's marketed as being a way for developers to test their device drivers
> with hotplug slots even if they don't have a real hotplug machine.
> If we'd known, we'd've done a better job.
>
> BTW, cut the crap about "Sometimes I just want to give up on Linux".
> We're putting in a lot of work that you get to use for free. That doesn't
> give you the right to be abusive. And if you stopped using Linux, we'd
> have one fewer person complaining, so I don't personally find it a huge
> motivator to drop everything and attend to your whims.

I apologize if I've come off as being abusive. I think what's been done to
fakephp is a real problem and being told "too bad" in not so many words or
"you're wrong" when I'm not is very discouraging.

I've been contributing code to open source projects for over a dozen years
and have fixed more than a few bugs in the linux kernel. When I had the
same problem Darrick Wong did with resource assignment, I didn't whine
about it, I made a patch and fixed it. But when someone thinks removing an
interface, that's been around for years, with no warning is ok, or assumes
that because they don't know someone they can't have a clue what they're
talking about, how can I patch that?

> > So the race condition doesn't matter? Alex Chiang wasn't even aware
>
> There is no race condition. You can't remove a fakephp slot.

Please take a glance at fakephp before telling me I'm wrong, as you'll see
that I'm not.

> > So maybe this better interface should be created before breaking the
> > existing one?
>
> Hey, I have a great idea. Why don't you help instead of just bitching?

When you get told, "your fix is unacceptable because it fixes a problem I
don't care about," it's clear than any fix to the problem will have the
same failing, so what's the point of even trying?

I was under the impression that you and Alex Chiang knew there were "real"
users of fakephp and just didn't care. But it looks like I was mistaken
about that, so maybe it's not hopeless.

As a token of goodwill, have this spare PCI patch:


>From d962157b2b36f2c54d147a296921553b4aefcf7b Mon Sep 17 00:00:00 2001
From: Trent Piepho <[email protected]>
Date: Sun, 30 Nov 2008 16:51:29 -0800
Subject: [PATCH] PCI: Make settable sysfs attributes more consistent

PCI devices have three settable boolean attributes, enable,
broken_parity_status, and msi_bus.

The store functions for these would silently interpret "0x01" as false,
"1llogical" as true, and "true" would be (silently!) ignored and do
nothing.

This is inconsistent with typical sysfs handling of settable attributes,
and just plain doesn't make much sense.

So, use strict_strtoul(), which was created for this purpose. The store
functions will treat a value of 0 as false, non-zero as true, and return
-EINVAL for a parse failure.

Additionally, is_enabled_store() and msi_bus_store() return -EPERM if
CAP_SYS_ADMIN is lacking, rather than silently doing nothing. This is more
typical behavior for sysfs attributes that need a capability.

And msi_bus_store() will only print the "forced subordinate bus ..."
warning if the MSI flag was actually forced to a different value.

Signed-off-by: Trent Piepho <[email protected]>
---
drivers/pci/pci-sysfs.c | 48 +++++++++++++++++++++++++++-------------------
1 files changed, 28 insertions(+), 20 deletions(-)

diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index 5d72866..9a5bdc0 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -58,13 +58,14 @@ static ssize_t broken_parity_status_store(struct device *dev,
const char *buf, size_t count)
{
struct pci_dev *pdev = to_pci_dev(dev);
- ssize_t consumed = -EINVAL;
+ unsigned long val;

- if ((count > 0) && (*buf == '0' || *buf == '1')) {
- pdev->broken_parity_status = *buf == '1' ? 1 : 0;
- consumed = count;
- }
- return consumed;
+ if (strict_strtoul(buf, 0, &val) < 0)
+ return -EINVAL;
+
+ pdev->broken_parity_status = !!val;
+
+ return count;
}

static ssize_t local_cpus_show(struct device *dev,
@@ -133,19 +134,23 @@ static ssize_t is_enabled_store(struct device *dev,
struct device_attribute *attr, const char *buf,
size_t count)
{
- ssize_t result = -EINVAL;
struct pci_dev *pdev = to_pci_dev(dev);
+ unsigned long val;
+ ssize_t result = strict_strtoul(buf, 0, &val);
+
+ if (result < 0)
+ return result;

/* this can crash the machine when done on the "wrong" device */
if (!capable(CAP_SYS_ADMIN))
- return count;
+ return -EPERM;

- if (*buf == '0') {
+ if (!val) {
if (atomic_read(&pdev->enable_cnt) != 0)
pci_disable_device(pdev);
else
result = -EIO;
- } else if (*buf == '1')
+ } else
result = pci_enable_device(pdev);

return result < 0 ? result : count;
@@ -185,25 +190,28 @@ msi_bus_store(struct device *dev, struct device_attribute *attr,
const char *buf, size_t count)
{
struct pci_dev *pdev = to_pci_dev(dev);
+ unsigned long val;
+
+ if (strict_strtoul(buf, 0, &val) < 0)
+ return -EINVAL;

/* bad things may happen if the no_msi flag is changed
* while some drivers are loaded */
if (!capable(CAP_SYS_ADMIN))
- return count;
+ return -EPERM;

+ /* Maybe pci devices without subordinate busses shouldn't even have this
+ * attribute in the first place? */
if (!pdev->subordinate)
return count;

- if (*buf == '0') {
- pdev->subordinate->bus_flags |= PCI_BUS_FLAGS_NO_MSI;
- dev_warn(&pdev->dev, "forced subordinate bus to not support MSI,"
- " bad things could happen.\n");
- }
+ /* Is the flag going to change, or keep the value it already had? */
+ if (!(pdev->subordinate->bus_flags & PCI_BUS_FLAGS_NO_MSI) ^
+ !!val) {
+ pdev->subordinate->bus_flags ^= PCI_BUS_FLAGS_NO_MSI;

- if (*buf == '1') {
- pdev->subordinate->bus_flags &= ~PCI_BUS_FLAGS_NO_MSI;
- dev_warn(&pdev->dev, "forced subordinate bus to support MSI,"
- " bad things could happen.\n");
+ dev_warn(&pdev->dev, "forced subordinate bus to%s support MSI,"
+ " bad things could happen\n", val ? "" : " not");
}

return count;
--
1.5.4.1

2008-12-01 13:00:29

by Trent Piepho

[permalink] [raw]
Subject: Problems with fakephp

On Fri, 28 Nov 2008, Alex Chiang wrote:
> * Trent Piepho <[email protected]>:
> Let's try and find a solution that will work for everyone. I
> don't think that going back to one entry in /sys/bus/pci/slots
> per function is a solution.
>
> Last time, Willy proposed a few ideas in this thread:
>
> http://lkml.org/lkml/2008/9/9/28

It looks like you weren't opposed to reverting the original patch. I
think that should be done for 2.6.27 and 2.6.28, to restore the existing
interface. I can made the patch. I realize that fakephp's (ab)use of the
hotplug system doesn't fit in with your plans, and there's no reason not
to do anything about that. But it should be done the right way. fakephp
was perhaps not the best interface for getting linux to re-scan for PCI
devices and for removing existing ones, but none the less it was the long
established and only interface for this.

What should be done is to create a better interface for disabling PCI
devices/functions/bridges and re-scanning. Then create a compatibility
system that will allow software the used fakephp to continue to work. Now
fakephp can change to be more like a real hotplug driver. At some time in
future, the compatibility layer, which will have been listed in the
feature removal schedule, can be removed.

You might also be interested to know that fakephp as it is now has a bug
in it. Another reason to revert the change. Removing a bridge doesn't
work so well.

# lspci -tv
+-10.0 LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ul
\-11.0-[0000:02]--+-00.0 Advanced Micro Devices [AMD] 79c970 [PCn
+-01.0 Ensoniq ES1371 [AudioPCI-97]
\-02.0 VMware Inc Unknown device 0770
# cat fake[6-9]/address
0000:00:11
0000:02:00
0000:02:01
0000:02:02
# echo 0 > fake6/power
# lspci -tv
\-10.0 LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ul
[ 00:11 and everything behind it are gone, as expected ]
# ls
fake1/ fake2/ fake3/ fake4/ fake5/ fake7/ fake8/ fake9/
[ notice that fake[7-9] are still there! ]
# cat fake6/address
Segmentation fault
BUG: unable to handle kernel NULL pointer dereference at 00000000
IP: [<c02089c3>] address_read_file+0x23/0x70

The problem is that fakephp's slots aren't really slots, they are devices.
If a PCI device goes away and it's not done with fakephp, then fakephp
doesn't know about it and the fakephp "slot" sticks around pointing to a
removed device. I think I can come up with a better system that fixes
these problems.

I also noticed this in drivers/pci/probe.c:pci_scan_device()

list_for_each_entry(slot, &bus->slots, list)
if(PCI_SLOT(devfn) == slot->number)
dev->slot = slot;

This appears to assume that there will only one slot with a given number
on the same bus. But if fakephp and real hotplug driver are both loaded,
this won't be the case anymore, will it?

I also have to wonder, if the bus + slot number is unique, then why not
use that to name the slot's kobject? Instead of something like "fake123",
where the number is basically meaningless. If you have a meaningful
unique ID, why not use that?

2008-12-01 13:37:29

by Trent Piepho

[permalink] [raw]
Subject: Problems with fakephp

On Fri, 28 Nov 2008, Alex Chiang wrote:
> I like this idea:
>
> Maybe we want a /sys/bus/pci/scan or
> /sys/bus/pci/devices/scan file that we can echo
> "0000:01:02.3" to scan just that function, or
> "0000:01:02" to scan the device.
>
> Will that work for you, Trent?

I don't think that's the best interface for it. If you want to know what
bus number a SCSI bus was assigned, you can find that out from /proc/scsi.
The SCSI ID is usually explicitly set by the user on the device. So if you
want to add a scsi device, you should be able to know all the parts of the
address.

But PCI is different. If the PCI devices on your computer right now
disappeared from Linux, would you know their IDs off the top of your head?
I sure wouldn't. And how would you find them out? Suppose someone plugged
an FPGA card into a sever PCI system with multiple busses and bridges. How
would they ever guess what ID to scan? What function numbers might be
present once the FPGA is programmed?

I think a much more useful interface would be a "scan" file that will just
trigger a rescan of everything. Even the users who know what ID to scan
would probably rather not be bothered to be forced to specify.

Maybe each PCI device could get a 'rescan' attribute that triggers a rescan
of that device for new functions and recursively rescans any subordinate
busses if it's a bridge? That might be useful too.

But anyway, how about removing PCI devices. I see adding a "remove"
attribute was mentioned. When I first had the problem with getting the
FPGA re-scanned I was going to add something like that, but searching for
what people with a similar problem has done turned up fakephp as the
established interface.

I've made a patch to add the remove attribute. It ends up not being that
much code. fakephp is a lot more complex that it needs to be.

However, fakephp does not like this! As I mentioned earlier, it can't
handle something other than itself causing a PCI device to be removed.

So I came up with a compatibility layer that creates directories in
bus/pci/slots the same way fakephp used too. This layer _can_ handle
devices being removed (and added!) by other means. It ends up being a lot
simpler than fakephp too. It doesn't use hotplug at all and should be able
to coexist with real hotplug drivers and fakephp.

I'll followup with the patches. So far this only handles removal. I
haven't done bus rescanning yet.

2008-12-01 14:09:19

by Trent Piepho

[permalink] [raw]
Subject: [PATCH] PCI: Method for removing PCI devices

This patch adds an attribute named "remove" to a PCI device's sysfs
directory. Writing a non-zero value to this attribute will remove the PCI
device and any children of it.
---
drivers/pci/pci-sysfs.c | 26 ++++++++++++++++++++++++++
1 files changed, 26 insertions(+), 0 deletions(-)

diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index 9a5bdc0..fe0f6a9 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -217,6 +217,31 @@ msi_bus_store(struct device *dev, struct device_attribute *attr,
return count;
}

+static void remove_callback(void *data)
+{
+ pci_remove_bus_device((struct pci_dev *)data);
+}
+
+static ssize_t
+remove_store(struct device *dev, struct device_attribute *dummy,
+ const char *buf, size_t count)
+{
+ struct pci_dev *pdev = to_pci_dev(dev);
+ unsigned long val;
+
+ if (strict_strtoul(buf, 0, &val) < 0)
+ return -EINVAL;
+
+ if (!capable(CAP_SYS_ADMIN))
+ return -EPERM;
+
+ if (val)
+ sysfs_schedule_callback(&dev->kobj, remove_callback, pdev,
+ THIS_MODULE);
+
+ return count;
+}
+
struct device_attribute pci_dev_attrs[] = {
__ATTR_RO(resource),
__ATTR_RO(vendor),
@@ -235,6 +260,7 @@ struct device_attribute pci_dev_attrs[] = {
__ATTR(broken_parity_status,(S_IRUGO|S_IWUSR),
broken_parity_status_show,broken_parity_status_store),
__ATTR(msi_bus, 0644, msi_bus_show, msi_bus_store),
+ __ATTR(remove, S_IWUSR, NULL, remove_store),
__ATTR_NULL,
};

--
1.5.4.1

2008-12-01 14:09:34

by Trent Piepho

[permalink] [raw]
Subject: [PATCH] PCI: Legacy fakephp driver

This module has an interface that's compatible with how fakephp's has been.

It puts entries in /sys/bus/pci/slots with the names of all PCI
devices/functions. Each one has a "power" attribute, which works the same
way as the fakephp driver's power attribute has worked.

There are a few improvements over fakephp, which couldn't handle PCI
devices being added or removed via a means other than fakephp's actions.
If is device was added another way, fakephp doesn't notice and create the
fake slot for it. If a device was removed another way, fakephp doesn't
delete the fake slot for it (and accessing the stale slot will cause an
oops). This is fixed. As a consequence of this, removing a bridge with
other devices behind it works as well, which is something else fakephp
couldn't do.

This duplicates a tiny bit of the code in the PCI core that does this same
function. Re-using that code ends up being more complex than duplicating
it, and it makes code in the PCI core more ugly just to support this legacy
fakephp interface compatibility layer.

Doing a rescan isn't supported yet.
---
drivers/pci/hotplug/Kconfig | 5 +
drivers/pci/hotplug/Makefile | 1 +
drivers/pci/hotplug/legacy_fakephp.c | 163 ++++++++++++++++++++++++++++++++++
3 files changed, 169 insertions(+), 0 deletions(-)
create mode 100644 drivers/pci/hotplug/legacy_fakephp.c

diff --git a/drivers/pci/hotplug/Kconfig b/drivers/pci/hotplug/Kconfig
index eacfb13..d26bc68 100644
--- a/drivers/pci/hotplug/Kconfig
+++ b/drivers/pci/hotplug/Kconfig
@@ -39,6 +39,11 @@ config HOTPLUG_PCI_FAKE

When in doubt, say N.

+config HOTPLUG_PCI_LEGACY_FAKE
+ tristate "Legacy Fake PCI Hotplug driver"
+ help
+ Provides an interface like the fakephp driver used to.
+
config HOTPLUG_PCI_COMPAQ
tristate "Compaq PCI Hotplug driver"
depends on X86 && PCI_BIOS && PCI_LEGACY
diff --git a/drivers/pci/hotplug/Makefile b/drivers/pci/hotplug/Makefile
index 9bdbe1a..9ad009d 100644
--- a/drivers/pci/hotplug/Makefile
+++ b/drivers/pci/hotplug/Makefile
@@ -14,6 +14,7 @@ obj-$(CONFIG_HOTPLUG_PCI_SHPC) += shpchp.o
obj-$(CONFIG_HOTPLUG_PCI_RPA) += rpaphp.o
obj-$(CONFIG_HOTPLUG_PCI_RPA_DLPAR) += rpadlpar_io.o
obj-$(CONFIG_HOTPLUG_PCI_SGI) += sgi_hotplug.o
+obj-$(CONFIG_HOTPLUG_PCI_LEGACY_FAKE) += legacy_fakephp.o

# Link this last so it doesn't claim devices that have a real hotplug driver
obj-$(CONFIG_HOTPLUG_PCI_FAKE) += fakephp.o
diff --git a/drivers/pci/hotplug/legacy_fakephp.c b/drivers/pci/hotplug/legacy_fakephp.c
new file mode 100644
index 0000000..a13cf98
--- /dev/null
+++ b/drivers/pci/hotplug/legacy_fakephp.c
@@ -0,0 +1,163 @@
+/* Works like the fakephp driver used to, except a little better.
+ *
+ * - It's possible to remove devices with subordinate busses.
+ * - New PCI devices that appear via any method, not just a fakephp triggered
+ * rescan, will be noticed.
+ * - Devices that are removed via any method, not just a fakephp triggered
+ * removal, will also be noticed.
+ *
+ * Uses nothing from the pci-hotplug subsystem.
+ *
+ * Currently you can't do a bus rescan, but that should be fixed.
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/list.h>
+#include <linux/kobject.h>
+#include <linux/sysfs.h>
+#include <linux/init.h>
+#include <linux/pci.h>
+#include "../pci.h"
+
+struct legacy_slot {
+ struct kobject kobj;
+ struct pci_dev *dev;
+ struct list_head list;
+};
+
+static LIST_HEAD(legacy_list);
+
+static ssize_t legacy_show(struct kobject *kobj, struct attribute *attr,
+ char *buf)
+{
+ struct legacy_slot *slot = container_of(kobj, typeof(*slot), kobj);
+ strcpy(buf, "1\n");
+ return 2;
+}
+
+static void remove_callback(void *data)
+{
+ pci_remove_bus_device((struct pci_dev *)data);
+}
+
+static ssize_t legacy_store(struct kobject *kobj, struct attribute *attr,
+ const char *buf, size_t len)
+{
+ struct legacy_slot *slot = container_of(kobj, typeof(*slot), kobj);
+ unsigned long val;
+
+ if (strict_strtoul(buf, 0, &val) < 0)
+ return -EINVAL;
+
+ if (val)
+ return len; /* rescan should go here */
+ else
+ sysfs_schedule_callback(&slot->dev->dev.kobj, remove_callback,
+ slot->dev, THIS_MODULE);
+ return len;
+}
+
+static struct attribute *legacy_attrs[] = {
+ &(struct attribute){ .name = "power", .mode = 0644 },
+ NULL,
+};
+
+static void legacy_release(struct kobject *kobj)
+{
+ struct legacy_slot *slot = container_of(kobj, typeof(*slot), kobj);
+
+ pci_dev_put(slot->dev);
+ kfree(slot);
+}
+
+static struct kobj_type legacy_ktype = {
+ .sysfs_ops = &(struct sysfs_ops){
+ .store = legacy_store, .show = legacy_show
+ },
+ .release = &legacy_release,
+ .default_attrs = legacy_attrs,
+};
+
+static int legacy_add_slot(struct pci_dev *pdev)
+{
+ struct legacy_slot *slot = kzalloc(sizeof(*slot), GFP_KERNEL);
+
+ if (!slot)
+ return -ENOMEM;
+
+ if (kobject_init_and_add(&slot->kobj, &legacy_ktype,
+ &pci_slots_kset->kobj, "%s",
+ pdev->dev.bus_id)) {
+ dev_warn(&pdev->dev, "Failed to created legacy fake slot\n");
+ return -EINVAL;
+ }
+ slot->dev = pci_dev_get(pdev);
+
+ list_add(&slot->list, &legacy_list);
+
+ return 0;
+}
+
+static int legacy_notify(struct notifier_block *nb,
+ unsigned long action, void *data)
+{
+ struct pci_dev *pdev = to_pci_dev(data);
+
+ if (action == BUS_NOTIFY_ADD_DEVICE) {
+ legacy_add_slot(pdev);
+ } else if (action == BUS_NOTIFY_DEL_DEVICE) {
+ struct legacy_slot *slot;
+
+ list_for_each_entry(slot, &legacy_list, list)
+ if (slot->dev == pdev)
+ goto found;
+
+ dev_warn(&pdev->dev, "Missing legacy fake slot?");
+ return -ENODEV;
+found:
+ kobject_del(&slot->kobj);
+ list_del(&slot->list);
+ kobject_put(&slot->kobj);
+ }
+
+ return 0;
+}
+
+static struct notifier_block legacy_notifier = {
+ .notifier_call = legacy_notify
+};
+
+static int __init init_legacy(void)
+{
+ struct pci_dev *pdev = NULL;
+
+ /* Add existing devices */
+ while ((pdev = pci_get_device(PCI_ANY_ID, PCI_ANY_ID, pdev)))
+ legacy_add_slot(pdev);
+
+ /* Be alerted of any new ones */
+ bus_register_notifier(&pci_bus_type, &legacy_notifier);
+ return 0;
+}
+module_init(init_legacy);
+
+static void __exit remove_legacy(void)
+{
+ struct legacy_slot *slot, *tmp;
+
+ bus_unregister_notifier(&pci_bus_type, &legacy_notifier);
+
+ list_for_each_entry_safe(slot, tmp, &legacy_list, list) {
+ list_del(&slot->list);
+ kobject_del(&slot->kobj);
+ kobject_put(&slot->kobj);
+ }
+}
+module_exit(remove_legacy);
+
+
+MODULE_AUTHOR("Trent Piepho <[email protected]>");
+MODULE_DESCRIPTION("Legacy version of the fakephp interface");
+MODULE_LICENSE("GPL");
--
1.5.4.1

2008-12-01 14:44:09

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH] PCI: Method for removing PCI devices

On Mon, Dec 01, 2008 at 06:08:09AM -0800, Trent Piepho wrote:
> This patch adds an attribute named "remove" to a PCI device's sysfs
> directory. Writing a non-zero value to this attribute will remove the PCI
> device and any children of it.

Please always add documentation for new sysfs files in Documentation/ABI
so that we know what they do, and how to use them.

thanks,

greg k-h

2008-12-01 17:08:49

by Rolf Eike Beer

[permalink] [raw]
Subject: Re: [PATCH] fakephp: Allocate PCI resources before adding the device

Trent Piepho wrote:
> On Fri, 28 Nov 2008, Rolf Eike Beer wrote:
> > Trent Piepho wrote:
> > > On Wed, 26 Nov 2008, Darrick J. Wong wrote:
> > > > On Wed, Nov 26, 2008 at 03:55:35PM -0700, Alex Chiang wrote:
> > > > > > Maybe it's different on powerpc then? My pseudo-hotplugable
> > > > > > device is also the only thing connected to the PCI-E host bus
> > > > > > controller. At boot the controller is empty and so I think some
> > > > > > code to enable its BARs gets skipped. But without the
> > > > > > pci_enable_device(), I get this:
> > > > > >
> > > > > > 01:00.0 Signal processing controller: Freescale Semiconductor Inc
> > > > > > Aurora Nexus Trace Interface Flags: fast devsel, IRQ 255
> > > > > > Memory at 40000000 (64-bit, prefetchable) [disabled]
> > > > > > [size=4K]
> > > >
> > > > Are you referring to this? ^^^^^^^^^^
> > > >
> > > > Without seeing the raw dump of the PCI config space, it looks to me
> > > > like the memory space enable bit of the PCICMD register is unset.
> > > > Probably the device driver should call pci_enable_device() at init
> > > > time, though I suppose you did say earlier that there is no driver.
> > >
> > > Yes, that's it. It seems like since the BARs are normally enabled
> > > after a device is scanned at boot time that they should also be enabled
> > > when a device is found by a fakephp rescan. So I thought it seemed
> > > reasonable to put pci_enable_device() in fakephp.
> >
> > No, pci_enable_device() will be called by the device driver. The hotplug
> > drivers have nothing to do with that.
>
> I guess you didn't read the part about there not being a device driver?

I read it, but that's the way a kernel works: if you want to talk to a device,
get a driver. You can write a rather minimal one that does only
pci_enable_device() on probe and pci_disable_device() on remove. Try the one
posted by Chris Wright in "[PATCH 2/2] PCI: pci-stub module to reserve pci
device" as a starting point.

Eike


Attachments:
(No filename) (1.96 kB)
signature.asc (197.00 B)
This is a digitally signed message part.
Download all attachments

2008-12-02 03:16:54

by Alex Chiang

[permalink] [raw]
Subject: Re: Problems with fakephp

Hi Trent,

I think there are now enough ideas in this thread that they're
starting to get confusing.

1) Function vs. device removal
2) User interface
3) Existing fakephp bugs

For (1), do you need function level hotplug? Or will you be able
to get away with device level?

The answer to (2) will naturally flow from your answer to (1).

A few responses for you, along with some of my thoughts...

* Trent Piepho <[email protected]>:
> It looks like you weren't opposed to reverting the original
> patch. I think that should be done for 2.6.27 and 2.6.28, to
> restore the existing interface.

Unfortunately, it won't be that simple. The reason is, the PCI
slot logic expects a bus/slot tuple. It doesn't know anything
about functions. In the context of the original goal (physical
slots), this design decision made sense.

So we can't do a simple revert of fe99740c because:

- fakephp registers with PCI hotplug core for device xxxx:yy.0
- fakephp tries to register device xxxx:yy.1
- PCI hotplug core returns -EBUSY because the _device_ is
already claimed

I played around with this today and encountered the above problem.

Now, if _device_ level hotplug is sufficient for you, then fixing
(2) and (3) within the context of fakephp is pretty easy. The
interface would have the limitation of only displaying function 0
per given device.

[root@canola slots]# pwd
/sys/bus/pci/slots

# my test patch applied to fakephp to get domain:bus:slot.fn in sysfs
[root@canola slots]# ls
0000:00:01.0 0000:0f:00.0 0000:4e:00.0 0000:50:01.0 0000:c2:00.0
0000:00:02.0 0000:10:00.0 0000:4f:00.0 0000:51:00.0
0000:00:04.0 0000:48:02.0 0000:50:00.0 0000:89:00.0

We can then work on fixing the fakephp bugs that you've found.

Something tells me that you're interested in _function_ level
hotplug though. Can you confirm this?

> What should be done is to create a better interface for
> disabling PCI devices/functions/bridges and re-scanning. Then
> create a compatibility system that will allow software the used
> fakephp to continue to work. Now fakephp can change to be more
> like a real hotplug driver. At some time in future, the
> compatibility layer, which will have been listed in the feature
> removal schedule, can be removed.

I took a _very_ brief look at the compatibility driver you wrote.
It _is_ a lot simpler than fakephp. ;)

You've convinced me that the 'fake%d' names are pretty useless. I
was thinking of submitting my test patch that shows the
domain:bus:slot.fn output above.

If I do that, then fakephp will have issues with the
legacy_fakephp driver you wrote because we will have name
collisions in sysfs.

So the question is, do we keep the existing sucky fakephp mess
that I created and go with your new legacy driver as the way
forward?

Or do we try and fix up fakephp?

Again, I think the answer to this question is whether you need
function or device level hotplug. If the former, then we go with
your new legacy driver and the fakephp interface has to remain
sucky.

If the latter, then we can keep fakephp and just get it usable
again.

> I also noticed this in drivers/pci/probe.c:pci_scan_device()
>
> list_for_each_entry(slot, &bus->slots, list)
> if(PCI_SLOT(devfn) == slot->number)
> dev->slot = slot;
>
> This appears to assume that there will only one slot with a given number
> on the same bus. But if fakephp and real hotplug driver are both loaded,
> this won't be the case anymore, will it?

Only one hotplug driver can claim a device at a time, so the
above won't be an issue.

> I think a much more useful interface would be a "scan" file
> that will just trigger a rescan of everything. Even the users
> who know what ID to scan would probably rather not be bothered
> to be forced to specify.

Yes, I think a generic /sys/bus/pci/scan is a good idea.

> Maybe each PCI device could get a 'rescan' attribute that
> triggers a rescan of that device for new functions and
> recursively rescans any subordinate busses if it's a bridge?
> That might be useful too.

Also a good idea.

Thanks.

/ac

2008-12-03 04:07:26

by Trent Piepho

[permalink] [raw]
Subject: Re: Problems with fakephp

On Mon, 1 Dec 2008, Alex Chiang wrote:
> I think there are now enough ideas in this thread that they're
> starting to get confusing.
>
> 1) Function vs. device removal
> 2) User interface
> 3) Existing fakephp bugs
>
> For (1), do you need function level hotplug? Or will you be able
> to get away with device level?

While I have some hardware the can use function level hotplug (secondary
functions can be controlled by registers in the primary function), it's not
something *I* make use of. But function level hotplug has been there for
years so it seems like a regression to remove that ability and break the
existing interface.

I guess my first though is should be there a new interface, as part of the
pci core rather than pci hotplug, for adding/removing devices from Linux's
view? By devices I mean in the Linux "struct device" sense, so PCI
functions.

I think that seems reasonable. fakephp isn't the best interface. My patch
to add "remove" to pci-sysfs ended up being very simple, unless there's a
serious flaw in it I've overlooked.

So once we have that the question becomes how to keep some compatibility
with the old fakephp interface. Either a new legacy compat module like
I've done or by fixing fakephp.

I'm more inclined to have the new legacy compat module:

- It's quite a bit simpler than fakephp so far.
- It already works better than fakephp ever did. fakephp can't do
recursive bridge removal and won't co-exist well with a new pci core
remove/add interface.
- Fakephp's use of devices as "slots" appears to be fundamentally at odds
with the hotplug core. It's just going to cause problems in the future.
The new compat module doesn't use hotplug at all, so it shouldn't get in
the way.

> > It looks like you weren't opposed to reverting the original
> > patch. I think that should be done for 2.6.27 and 2.6.28, to
> > restore the existing interface.
>
> So we can't do a simple revert of fe99740c because:
>
> - fakephp registers with PCI hotplug core for device xxxx:yy.0
> - fakephp tries to register device xxxx:yy.1
> - PCI hotplug core returns -EBUSY because the _device_ is
> already claimed

Hmm, I see what you mean. I'd really like to fix 2.6.27 though. I suppose
it would be if one left out sub-functions? Partially broken being better
than entirely broken.

> You've convinced me that the 'fake%d' names are pretty useless. I
> was thinking of submitting my test patch that shows the
> domain:bus:slot.fn output above.

If fn must be 0, then maybe use domain:bus:slot? That way there's no
conflict with the old fakephp interface.

2008-12-03 04:38:24

by Alex Chiang

[permalink] [raw]
Subject: Re: Problems with fakephp

* Trent Piepho <[email protected]>:
> On Mon, 1 Dec 2008, Alex Chiang wrote:
> > I think there are now enough ideas in this thread that they're
> > starting to get confusing.
> >
> > 1) Function vs. device removal
> > 2) User interface
> > 3) Existing fakephp bugs
> >
> > For (1), do you need function level hotplug? Or will you be able
> > to get away with device level?
>
> While I have some hardware the can use function level hotplug (secondary
> functions can be controlled by registers in the primary function), it's not
> something *I* make use of. But function level hotplug has been there for
> years so it seems like a regression to remove that ability and break the
> existing interface.

That seems reasonable.

> I guess my first though is should be there a new interface, as part of the
> pci core rather than pci hotplug, for adding/removing devices from Linux's
> view? By devices I mean in the Linux "struct device" sense, so PCI
> functions.
>
> I think that seems reasonable. fakephp isn't the best interface. My patch
> to add "remove" to pci-sysfs ended up being very simple, unless there's a
> serious flaw in it I've overlooked.

I think we should definitely merge your 'remove' attribute patch
for PCI functions. That should be independent of the rest of our
discussion.

It will probably help the SR-IOV folks too.

> So once we have that the question becomes how to keep some compatibility
> with the old fakephp interface. Either a new legacy compat module like
> I've done or by fixing fakephp.
>
> I'm more inclined to have the new legacy compat module:
>
> - It's quite a bit simpler than fakephp so far.
> - It already works better than fakephp ever did. fakephp can't do
> recursive bridge removal and won't co-exist well with a new pci core
> remove/add interface.
> - Fakephp's use of devices as "slots" appears to be fundamentally at odds
> with the hotplug core. It's just going to cause problems in the future.
> The new compat module doesn't use hotplug at all, so it shouldn't get in
> the way.

Maybe we should just replace fakephp wholesale with your new
driver?

Or coming at it from another angle, I don't see what benefit
we'll have from keeping both fakephp and your driver. And if
fakephp is as broken as you describe, then it will only cause
more confusion if a user loads both fakephp and legacy_fakephp.

If the user removes a bridge via the correct legacy_fakephp
interface, fakephp won't notice, and we'll just have a broken
mess on our hands.

It would be better to have just one, correctly working fakephp,
even if the implementation is 100% different and truly not even a
"real" hotplug driver.

I think the way forward is:

- merge in the function level hotplug patch
- wholesale replacement of fakephp with new fakephp
- schedule new fakephp for deprecation
- encourage anyone who wants function level hotplug to
use the 'remove' attribute

Thoughts? Jesse, Willy, Eike, Greg?

Thanks.

/ac

2008-12-03 17:23:19

by Rolf Eike Beer

[permalink] [raw]
Subject: Re: Problems with fakephp

Alex Chiang wrote:
> * Trent Piepho <[email protected]>:
> > On Mon, 1 Dec 2008, Alex Chiang wrote:
> > > I think there are now enough ideas in this thread that they're
> > > starting to get confusing.
> > >
> > > 1) Function vs. device removal
> > > 2) User interface
> > > 3) Existing fakephp bugs
> > >
> > > For (1), do you need function level hotplug? Or will you be able
> > > to get away with device level?
> >
> > While I have some hardware the can use function level hotplug (secondary
> > functions can be controlled by registers in the primary function), it's
> > not something *I* make use of. But function level hotplug has been there
> > for years so it seems like a regression to remove that ability and break
> > the existing interface.
>
> That seems reasonable.
>
> > I guess my first though is should be there a new interface, as part of
> > the pci core rather than pci hotplug, for adding/removing devices from
> > Linux's view? By devices I mean in the Linux "struct device" sense, so
> > PCI functions.
> >
> > I think that seems reasonable. fakephp isn't the best interface. My
> > patch to add "remove" to pci-sysfs ended up being very simple, unless
> > there's a serious flaw in it I've overlooked.
>
> I think we should definitely merge your 'remove' attribute patch
> for PCI functions. That should be independent of the rest of our
> discussion.
>
> It will probably help the SR-IOV folks too.
>
> > So once we have that the question becomes how to keep some compatibility
> > with the old fakephp interface. Either a new legacy compat module like
> > I've done or by fixing fakephp.
> >
> > I'm more inclined to have the new legacy compat module:
> >
> > - It's quite a bit simpler than fakephp so far.
> > - It already works better than fakephp ever did. fakephp can't do
> > recursive bridge removal and won't co-exist well with a new pci core
> > remove/add interface.
> > - Fakephp's use of devices as "slots" appears to be fundamentally at odds
> > with the hotplug core. It's just going to cause problems in the future.
> > The new compat module doesn't use hotplug at all, so it shouldn't get in
> > the way.
>
> Maybe we should just replace fakephp wholesale with your new
> driver?
>
> Or coming at it from another angle, I don't see what benefit
> we'll have from keeping both fakephp and your driver. And if
> fakephp is as broken as you describe, then it will only cause
> more confusion if a user loads both fakephp and legacy_fakephp.
>
> If the user removes a bridge via the correct legacy_fakephp
> interface, fakephp won't notice, and we'll just have a broken
> mess on our hands.
>
> It would be better to have just one, correctly working fakephp,
> even if the implementation is 100% different and truly not even a
> "real" hotplug driver.
>
> I think the way forward is:
>
> - merge in the function level hotplug patch

Sorry that I don't get the point. To PCI Hotplug core or to fakephp?

> - wholesale replacement of fakephp with new fakephp
> - schedule new fakephp for deprecation
^^^

I don't think so.

> - encourage anyone who wants function level hotplug to
> use the 'remove' attribute
>
> Thoughts? Jesse, Willy, Eike, Greg?

Oh yes, let's start using dummyphp ;) That one already handled the rescan long
ago. But I think it's a bit outdated at the moment, I haven't touched it for
month. Looks like I need to bring it back to live.

Eike


Attachments:
(No filename) (3.36 kB)
signature.asc (197.00 B)
This is a digitally signed message part.
Download all attachments

2008-12-03 17:43:43

by Alex Chiang

[permalink] [raw]
Subject: Re: Problems with fakephp

* Rolf Eike Beer <[email protected]>:
> Alex Chiang wrote:
> > I think the way forward is:
> >
> > - merge in the function level hotplug patch
>
> Sorry that I don't get the point. To PCI Hotplug core or to fakephp?

I was talking about Trent's patch to add the "remove" attribute
to the pci-sysfs. Not fakephp.

> > - wholesale replacement of fakephp with new fakephp
> > - schedule new fakephp for deprecation
> ^^^
>
> I don't think so.

If we get function level reset as part of the PCI core, then I
don't see what fakephp offers us anymore.

> > - encourage anyone who wants function level hotplug to
> > use the 'remove' attribute
> >
> > Thoughts? Jesse, Willy, Eike, Greg?
>
> Oh yes, let's start using dummyphp ;) That one already handled
> the rescan long ago. But I think it's a bit outdated at the
> moment, I haven't touched it for month. Looks like I need to
> bring it back to live.

I take it you are not impressed with my proposal? Care to explain why not?

/ac

2008-12-03 17:56:12

by Rolf Eike Beer

[permalink] [raw]
Subject: Re: Problems with fakephp

Alex Chiang wrote:
> * Rolf Eike Beer <[email protected]>:
> > Alex Chiang wrote:
> > > I think the way forward is:
> > >
> > > - merge in the function level hotplug patch
> >
> > Sorry that I don't get the point. To PCI Hotplug core or to fakephp?
>
> I was talking about Trent's patch to add the "remove" attribute
> to the pci-sysfs. Not fakephp.
>
> > > - wholesale replacement of fakephp with new fakephp
> > > - schedule new fakephp for deprecation
> >
> > ^^^
> >
> > I don't think so.
>
> If we get function level reset as part of the PCI core, then I
> don't see what fakephp offers us anymore.

Why add a new fakephp if you want to remove it right after that?

> > > - encourage anyone who wants function level hotplug to
> > > use the 'remove' attribute
> > >
> > > Thoughts? Jesse, Willy, Eike, Greg?
> >
> > Oh yes, let's start using dummyphp ;) That one already handled
> > the rescan long ago. But I think it's a bit outdated at the
> > moment, I haven't touched it for month. Looks like I need to
> > bring it back to live.
>
> I take it you are not impressed with my proposal? Care to explain why not?

It's a long fight between me and Greg about fakephp. I wrote dummyphp, fakephp
is a for of an early version of dummyphp that I never liked. So if anyone
comes up with "fakephp can not do $foobar" my first answer is "try dummyphp".
So if you want to remove fakephp I'm the first do support you *eg* Yes, this
probably is mostly personal taste and not so much technical, but I'm just
human ;)

Eike


Attachments:
(No filename) (1.52 kB)
signature.asc (197.00 B)
This is a digitally signed message part.
Download all attachments

2008-12-03 18:22:20

by Alex Chiang

[permalink] [raw]
Subject: Re: Problems with fakephp

* Rolf Eike Beer <[email protected]>:
> Alex Chiang wrote:
> > * Rolf Eike Beer <[email protected]>:
> > > Alex Chiang wrote:
> > > > - wholesale replacement of fakephp with new fakephp
> > > > - schedule new fakephp for deprecation
> > >
> > > ^^^
> > >
> > > I don't think so.
> >
> > If we get function level reset as part of the PCI core, then I
> > don't see what fakephp offers us anymore.
>
> Why add a new fakephp if you want to remove it right after that?

How we got here:

- we made changes to PCI core that says,
"/sys/bus/pci/slots/ is for _physical_ slots"

- that series of changes broke an existing interface
(fakephp) that had real users

- the existing interface was interesting because it was
the only way that allowed for function level hotplug

So, if we can get function level hotplug via a different
interface (by adding a "remove" attribute to pci-sysfs), then I
don't really see a strong reason to keep fakephp.

We also don't want to break existing software (more than we
already have :-/ ) so we need a transition period to get users
over to the new "remove" interface.

A deprecation period is usually pretty long, several years, iirc.

> > > > - encourage anyone who wants function level
> > > > hotplug to use the 'remove' attribute
> > > >
> > > > Thoughts? Jesse, Willy, Eike, Greg?
> > >
> > > Oh yes, let's start using dummyphp ;) That one already
> > > handled the rescan long ago. But I think it's a bit
> > > outdated at the moment, I haven't touched it for month.
> > > Looks like I need to bring it back to live.
> >
> > I take it you are not impressed with my proposal? Care to
> > explain why not?
>
> It's a long fight between me and Greg about fakephp. I wrote
> dummyphp, fakephp is a for of an early version of dummyphp that
> I never liked. So if anyone comes up with "fakephp can not do
> $foobar" my first answer is "try dummyphp". So if you want to
> remove fakephp I'm the first do support you *eg* Yes, this
> probably is mostly personal taste and not so much technical,
> but I'm just human ;)

Ok, well I haven't seen dummyphp.

I don't think we should really care that much about dummyphp vs.
new fakephp as long as we complete the end goal, which in my
opinion is:

/sys/bus/pci/slots/ represents physical slots and device hotplug

/sys/devices/<bus>/<function>/remove used for function hotplug

If we can use dummyphp to get us through the transition period,
meaning it can:

- provide an interface compatible with fakephp
- allow recursive hotplug (remove a bridge and all
functions behind it)
- recognize if a function has been hotplugged via a
different interface
- relatively simple implementation

Then I'm sure that Trent won't mind whether we use your dummyphp
or his new fakephp.

I'm wondering though, if dummyphp uses the PCI hotplug API, it
will probably suffer from the same limitation that the current
fakephp does, which is that the PCI hotplug core will only allow
drivers to claim on a _device_ level, not _function_ level.

Care to post dummyphp for us to see?

Thanks.

/ac

2008-12-08 21:10:20

by Rolf Eike Beer

[permalink] [raw]
Subject: Re: Problems with fakephp

Alex Chiang wrote:
> * Rolf Eike Beer <[email protected]>:
> > Alex Chiang wrote:
> > > * Rolf Eike Beer <[email protected]>:
> > > > Alex Chiang wrote:
> > > > > - wholesale replacement of fakephp with new fakephp
> > > > > - schedule new fakephp for deprecation
> > > >
> > > > ^^^
> > > >
> > > > I don't think so.
> > >
> > > If we get function level reset as part of the PCI core, then I
> > > don't see what fakephp offers us anymore.
> >
> > Why add a new fakephp if you want to remove it right after that?
>
> How we got here:
>
> - we made changes to PCI core that says,
> "/sys/bus/pci/slots/ is for _physical_ slots"
>
> - that series of changes broke an existing interface
> (fakephp) that had real users
>
> - the existing interface was interesting because it was
> the only way that allowed for function level hotplug
>
> So, if we can get function level hotplug via a different
> interface (by adding a "remove" attribute to pci-sysfs), then I
> don't really see a strong reason to keep fakephp.
>
> We also don't want to break existing software (more than we
> already have :-/ ) so we need a transition period to get users
> over to the new "remove" interface.
>
> A deprecation period is usually pretty long, several years, iirc.
>
> > > > > - encourage anyone who wants function level
> > > > > hotplug to use the 'remove' attribute
> > > > >
> > > > > Thoughts? Jesse, Willy, Eike, Greg?
> > > >
> > > > Oh yes, let's start using dummyphp ;) That one already
> > > > handled the rescan long ago. But I think it's a bit
> > > > outdated at the moment, I haven't touched it for month.
> > > > Looks like I need to bring it back to live.
> > >
> > > I take it you are not impressed with my proposal? Care to
> > > explain why not?
> >
> > It's a long fight between me and Greg about fakephp. I wrote
> > dummyphp, fakephp is a for of an early version of dummyphp that
> > I never liked. So if anyone comes up with "fakephp can not do
> > $foobar" my first answer is "try dummyphp". So if you want to
> > remove fakephp I'm the first do support you *eg* Yes, this
> > probably is mostly personal taste and not so much technical,
> > but I'm just human ;)
>
> Ok, well I haven't seen dummyphp.
>
> I don't think we should really care that much about dummyphp vs.
> new fakephp as long as we complete the end goal, which in my
> opinion is:
>
> /sys/bus/pci/slots/ represents physical slots and device hotplug
>
> /sys/devices/<bus>/<function>/remove used for function hotplug
>
> If we can use dummyphp to get us through the transition period,
> meaning it can:
>
> - provide an interface compatible with fakephp
> - allow recursive hotplug (remove a bridge and all
> functions behind it)
> - recognize if a function has been hotplugged via a
> different interface
> - relatively simple implementation
>
> Then I'm sure that Trent won't mind whether we use your dummyphp
> or his new fakephp.
>
> I'm wondering though, if dummyphp uses the PCI hotplug API, it
> will probably suffer from the same limitation that the current
> fakephp does, which is that the PCI hotplug core will only allow
> drivers to claim on a _device_ level, not _function_ level.

Exactly.

> Care to post dummyphp for us to see?

http://opensource.sf-tec.de/kernel/dummyphp-2.6.28-rc7.diff

Since I'm rather short on time it's not in a state I would like to see it in,
i.e. I've not tested it for the last year.

Greetings,

Eike


Attachments:
(No filename) (3.40 kB)
signature.asc (197.00 B)
This is a digitally signed message part.
Download all attachments

2008-12-16 19:35:33

by Jesse Barnes

[permalink] [raw]
Subject: Re: [PATCH] fakephp: Allocate PCI resources before adding the device

On Monday, December 1, 2008 9:08 am Rolf Eike Beer wrote:
> Trent Piepho wrote:
> > On Fri, 28 Nov 2008, Rolf Eike Beer wrote:
> > > Trent Piepho wrote:
> > > > On Wed, 26 Nov 2008, Darrick J. Wong wrote:
> > > > > On Wed, Nov 26, 2008 at 03:55:35PM -0700, Alex Chiang wrote:
> > > > > > > Maybe it's different on powerpc then? My pseudo-hotplugable
> > > > > > > device is also the only thing connected to the PCI-E host bus
> > > > > > > controller. At boot the controller is empty and so I think some
> > > > > > > code to enable its BARs gets skipped. But without the
> > > > > > > pci_enable_device(), I get this:
> > > > > > >
> > > > > > > 01:00.0 Signal processing controller: Freescale Semiconductor
> > > > > > > Inc Aurora Nexus Trace Interface Flags: fast devsel, IRQ 255
> > > > > > > Memory at 40000000 (64-bit, prefetchable) [disabled] [size=4K]
> > > > >
> > > > > Are you referring to this? ^^^^^^^^^^
> > > > >
> > > > > Without seeing the raw dump of the PCI config space, it looks to me
> > > > > like the memory space enable bit of the PCICMD register is unset.
> > > > > Probably the device driver should call pci_enable_device() at init
> > > > > time, though I suppose you did say earlier that there is no driver.
> > > >
> > > > Yes, that's it. It seems like since the BARs are normally enabled
> > > > after a device is scanned at boot time that they should also be
> > > > enabled when a device is found by a fakephp rescan. So I thought it
> > > > seemed reasonable to put pci_enable_device() in fakephp.
> > >
> > > No, pci_enable_device() will be called by the device driver. The
> > > hotplug drivers have nothing to do with that.
> >
> > I guess you didn't read the part about there not being a device driver?
>
> I read it, but that's the way a kernel works: if you want to talk to a
> device, get a driver. You can write a rather minimal one that does only
> pci_enable_device() on probe and pci_disable_device() on remove. Try the
> one posted by Chris Wright in "[PATCH 2/2] PCI: pci-stub module to reserve
> pci device" as a starting point.

Ok, so sounds like Darrick's original patch gets a NAK? I guess the fakephp
vs. dummyphp vs. new interface stuff can be dealt with in another thread...
--
Jesse Barnes, Intel Open Source Technology Center

2008-12-16 20:28:42

by Jesse Barnes

[permalink] [raw]
Subject: Re: fixup PCI device booleans in sysfs

On Sunday, November 30, 2008 5:10 pm Trent Piepho wrote:
> From d962157b2b36f2c54d147a296921553b4aefcf7b Mon Sep 17 00:00:00 2001
> From: Trent Piepho <[email protected]>
> Date: Sun, 30 Nov 2008 16:51:29 -0800
> Subject: [PATCH] PCI: Make settable sysfs attributes more consistent
>
> PCI devices have three settable boolean attributes, enable,
> broken_parity_status, and msi_bus.
>
> The store functions for these would silently interpret "0x01" as false,
> "1llogical" as true, and "true" would be (silently!) ignored and do
> nothing.
>
> This is inconsistent with typical sysfs handling of settable attributes,
> and just plain doesn't make much sense.
>
> So, use strict_strtoul(), which was created for this purpose. The store
> functions will treat a value of 0 as false, non-zero as true, and return
> -EINVAL for a parse failure.
>
> Additionally, is_enabled_store() and msi_bus_store() return -EPERM if
> CAP_SYS_ADMIN is lacking, rather than silently doing nothing. This is more
> typical behavior for sysfs attributes that need a capability.
>
> And msi_bus_store() will only print the "forced subordinate bus ..."
> warning if the MSI flag was actually forced to a different value.
>
> Signed-off-by: Trent Piepho <[email protected]>

Nice, I dug this out of the big fakephp flame thread. Looks like a good fix;
there are probably other sysfs interfaces in the kernel that need similar
treatment. This one is in my linux-next branch now. Hope there aren't many
scripts out there depending on the broken behavior! :)

Thanks,
--
Jesse Barnes, Intel Open Source Technology Center

2008-12-16 20:57:08

by djwong

[permalink] [raw]
Subject: Re: [PATCH] fakephp: Allocate PCI resources before adding the?device

On Tue, Dec 16, 2008 at 11:33:33AM -0800, Jesse Barnes wrote:
> > I read it, but that's the way a kernel works: if you want to talk to a
> > device, get a driver. You can write a rather minimal one that does only
> > pci_enable_device() on probe and pci_disable_device() on remove. Try the
> > one posted by Chris Wright in "[PATCH 2/2] PCI: pci-stub module to reserve
> > pci device" as a starting point.
>
> Ok, so sounds like Darrick's original patch gets a NAK? I guess the fakephp
> vs. dummyphp vs. new interface stuff can be dealt with in another thread...

I'd like to be able to (pretend to) add and remove PCI devices via
fakephp until this dummyphp/fakephp/newinterface stuff gets ironed out
and put into mainline.

In any case, I gave 2.6.24 a whirl. 2.6.24 fakephp sets up the BARs
correctly, so technically this is a regression fix too, even if only a
stopgap.

--D

2008-12-21 02:23:59

by Trent Piepho

[permalink] [raw]
Subject: Re: [PATCH] fakephp: Allocate PCI resources before adding the?device

On Tue, 16 Dec 2008, Darrick J. Wong wrote:
> On Tue, Dec 16, 2008 at 11:33:33AM -0800, Jesse Barnes wrote:
> > > I read it, but that's the way a kernel works: if you want to talk to a
> > > device, get a driver. You can write a rather minimal one that does only
> > > pci_enable_device() on probe and pci_disable_device() on remove. Try the
> > > one posted by Chris Wright in "[PATCH 2/2] PCI: pci-stub module to reserve
> > > pci device" as a starting point.
> >
> > Ok, so sounds like Darrick's original patch gets a NAK? I guess the fakephp
> > vs. dummyphp vs. new interface stuff can be dealt with in another thread...
>
While fakephp may be a mess, Darrick's patch does fix a problem with it.

> In any case, I gave 2.6.24 a whirl. 2.6.24 fakephp sets up the BARs
> correctly, so technically this is a regression fix too, even if only a
> stopgap.

I know 2.6.23 has the same problem with resources not being assigned
correctly to BARs as 2.6.27. It doesn't look like there are any patches in
2.6.24 that would have fixed it.