Hi all,
Well, I've been trying to figure out a way to remove the existing
pci_find_device(), and other pci_find_* functions from the 2.5 kernel
without hurting too many things (well, things that people care about.)
Turns out these are very useful functions outside of the "old" PCI
framework, and I can't really justify removing them, so they are staying
for now (or until someone else can think of a replacement...)
The main reason for wanting to do this is that any PCI driver that
relies on using pci_find_* to locate a device to control will not work
with the existing PCI hotplug code. Moving forward, those drivers will
also not work with driverfs, struct driver, or the device naming
code.
So if you own a PCI driver that does not conform to the "new" PCI API
(using pci_register_driver() and friends), consider yourself warned.
Your driver will NOT inherit any of the upcoming changes to the drivers
tree, which might cause it to break. Also remember, all of the people
who are buying hotplug PCI systems for their datacenters will not buy
your cards :)
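For reference, the whole of the new API a driver needs is pretty small.
Something like this (an untested skeleton; the vendor/device IDs are
made up) is the shape of it:

#include <linux/module.h>
#include <linux/init.h>
#include <linux/pci.h>

/* Hypothetical vendor/device IDs, just to show the shape. */
static struct pci_device_id example_ids[] = {
        { 0x1234, 0x5678, PCI_ANY_ID, PCI_ANY_ID, },
        { 0, }
};
MODULE_DEVICE_TABLE(pci, example_ids);

/* Called once per matching device, boot-time and hotplugged alike. */
static int __devinit example_probe(struct pci_dev *pdev,
                                   const struct pci_device_id *id)
{
        if (pci_enable_device(pdev))
                return -ENODEV;
        /* ... claim resources and set up the device here ... */
        return 0;
}

static void __devexit example_remove(struct pci_dev *pdev)
{
        /* ... undo whatever probe() did ... */
}

static struct pci_driver example_driver = {
        .name           = "example",
        .id_table       = example_ids,
        .probe          = example_probe,
        .remove         = __devexit_p(example_remove),
};

static int __init example_init(void)
{
        return pci_register_driver(&example_driver);
}

static void __exit example_exit(void)
{
        pci_unregister_driver(&example_driver);
}

module_init(example_init);
module_exit(example_exit);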
thanks,
greg k-h
On Sat, 2002-07-13 at 01:36, Greg KH wrote:
> So if you own a PCI driver that does not conform to the "new" PCI API
> (using pci_register_driver() and friends), consider yourself warned.
> Your driver will NOT inherit any of the upcoming changes to the drivers
> tree, which might cause it to break. Also remember, all of the people
> who are buying hotplug PCI systems for their datacenters will not buy
> your cards :)
I have several examples where the ordering of the PCI cards is critical
to get stuff like boot device and primary controller detection right.
pci_register_driver doesn't appear to have a good way to deal with this,
or have I missed something?
From: Alan Cox <[email protected]>
Date: 13 Jul 2002 03:23:29 +0100
I have several examples where the ordering of the PCI cards is critical
to get stuff like boot device and primary controller detection right.
pci_register_driver doesn't appear to have a good way to deal with this,
or have I missed something?
Cards get registered in the order they appear on the bus, or at least
that is the way the algorithm worked the last time I looked.
Or, what other facility do you need?
> I have several examples where the ordering of the PCI cards is critical
> to get stuff like boot device and primary controller detection right.
> pci_register_driver doesn't appear to have a good way to deal with this,
> or have I missed something?
Indeed, this is used for a variety of reasons:
1) Systems with both on-motherboard and add-in disk controllers which share
a driver, where you really need the on-motherboard controller to appear
first before any add-in cards. aacraid driver in 2.4.x does this today.
2) Systems with both an older and newer add-in card which share a driver,
where the older (original) card has your boot disks, and any newer card
would get added for adding more storage later. megaraid driver in
2.4.x/2.5.x does this today.
In both these cases, the pci_find_device() functions use an explicit
ordering to make it far more likely we can still boot the system after
adding new hardware. Unless/until there's a method for telling the
kernel/modules that a particular device is the boot device (a la BIOS
EDD 3.0, if vendors were to get around to implementing such), explicit
ordering in the drivers is the only way we can build complex storage
solutions and boot reliably.
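To illustrate, the kind of two-pass, explicitly ordered probe I mean
looks roughly like this (untested sketch; the device IDs, the
register_controller() helper, and the assumption that the onboard part
lives on bus 0 are all invented):

#include <linux/init.h>
#include <linux/pci.h>

/* Invented stand-in for whatever the driver does with each
 * controller it claims. */
extern void register_controller(struct pci_dev *dev);

static void __init find_controllers_in_order(void)
{
        struct pci_dev *dev = NULL;

        /* First pass: claim the on-motherboard controller.  Here we
         * assume it is the one sitting on bus 0. */
        while ((dev = pci_find_device(0x1234, 0x0001, dev)) != NULL) {
                if (dev->bus->number == 0)
                        register_controller(dev);
        }

        /* Second pass: any add-in cards, in bus order. */
        dev = NULL;
        while ((dev = pci_find_device(0x1234, 0x0001, dev)) != NULL) {
                if (dev->bus->number != 0)
                        register_controller(dev);
        }
}

Since pci_find_device() walks the devices in bus order, the result is
stable across boots as long as the hardware doesn't move.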
Thanks,
Matt
--
Matt Domsch
Sr. Software Engineer, Lead Engineer, Architect
Dell Linux Solutions http://www.dell.com/linux
Linux on Dell mailing lists @ http://lists.us.dell.com
#1 US Linux Server provider for 2001 and Q1/2002! (IDC May 2002)
[email protected] wrote:
> In both these cases, the pci_find_device() functions use an explicit
> ordering to make it far more likely we can still boot the system after
> adding new hardware. Unless/until there's a method for telling the
> kernel/modules that a particular device is the boot device (a la BIOS
> EDD 3.0, if vendors were to get around to implementing such), explicit
> ordering in the drivers is the only way we can build complex storage
> solutions and boot reliably.
IMO what devices are boot devices is a policy decision. Depending on
pci_find_device() use in a driver's kernel code, or kernel link
ordering, is simply hard-coding something that should really be in
userspace. Depending on pci_find_device logic / link order to
still-boot-the-system after adding new hardware sounds like an
incredibly fragile hope, not a reliable system users can trust.
Jeff
On Sat, Jul 13, 2002 at 12:41:56AM -0400, Jeff Garzik wrote:
> [email protected] wrote:
> > In both these cases, the pci_find_device() functions use an explicit
> > ordering to make it far more likely we can still boot the system after
> > adding new hardware. Unless/until there's a method for telling the
> > kernel/modules that a particular device is the boot device (a la BIOS
> > EDD 3.0, if vendors were to get around to implementing such), explicit
> > ordering in the drivers is the only way we can build complex storage
> > solutions and boot reliably.
>
>
> IMO what devices are boot devices is a policy decision. Depending on
> pci_find_device() use in a driver's kernel code, or kernel link
> ordering, is simply hard-coding something that should really be in
> userspace. Depending on pci_find_device logic / link order to
> still-boot-the-system after adding new hardware sounds like an
> incredibly fragile hope, not a reliable system users can trust.
Exactly.
In the same way we are moving naming policy out to userspace, you will
be able to specify exactly which disk, on which PCI bus, you want to
boot from (remember, initramfs will let us run userspace programs before
the boot disk is touched by the kernel.)
Yes, it still involves some handwaving at this moment in time, but it
will happen, and I do know about this requirement :)
thanks,
greg k-h
> > ordering, is simply hard-coding something that should really be in
> > userspace. Depending on pci_find_device logic / link order to
> > still-boot-the-system after adding new hardware sounds like an
> > incredibly fragile hope, not a reliable system users can trust.
Yes, but unfortunately it's all we've had for a long time.
> Yes, it still involves some handwaving at this moment in time, but it
> will happen, and I do know about this requirement :)
Then this will solve my #2 factory install problem. I look forward to
this restriction being removed properly. :-)
(In case you're curious, #1 is that customers can't specify the
partition strategy they want at order time, so they wind up blowing
away the factory install anyhow).
Thanks,
Matt
--
Matt Domsch
Sr. Software Engineer, Lead Engineer, Architect
Dell Linux Solutions http://www.dell.com/linux
Linux on Dell mailing lists @ http://lists.us.dell.com
#1 US Linux Server provider for 2001 and Q1/2002! (IDC May 2002)
On Sat, 2002-07-13 at 05:41, Jeff Garzik wrote:
> ordering, is simply hard-coding something that should really be in
> userspace. Depending on pci_find_device logic / link order to
> still-boot-the-system after adding new hardware sounds like an
> incredibly fragile hope, not a reliable system users can trust.
For hot plugging, obviously. At system boot time, however, the ordering
(and being able to see the ordering) is rather important, because in
many cases the ordering is what tells you about things like IDE
controller pairing. It tells you what order to assign many SCSI devices,
because the ordering is defined by their BIOS ROM.
One way to handle this generically would be to use pci_register_driver,
but in the register function for such wacky devices during boot-up we
merely keep track of what we have to look into.
That requires a way for drivers to register an init function that will
be called after the boot-time PCI device registration is done. At that
point it's as easy as:
    Register
    Collect list of devices
    [Kernel does PCI enumeration]
    Sort list in BIOS-specific ordering
    Feed list to registration code
    Flip registration function pointer to be the immediate register handler
    [Watch all the glue vanish into __init oblivion]
That seems preferable to keeping the old API around for registrations,
although it's still used for probing for things (which has locking
concerns); refcounting pci_devs might sort those out.
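Roughly (a very rough, uncompiled sketch; every name in it is
invented):

#include <linux/pci.h>
#include <linux/list.h>
#include <linux/slab.h>
#include <linux/init.h>

static LIST_HEAD(deferred);
static int booting = 1;

struct deferred_dev {
        struct list_head list;
        struct pci_dev *pdev;
};

/* wacky_register() and sort_in_bios_order() stand in for the
 * driver's real registration and the BIOS-specific comparator. */
extern int wacky_register(struct pci_dev *pdev);
extern void sort_in_bios_order(struct list_head *head);

static int wacky_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
        struct deferred_dev *d;

        if (!booting)                   /* hotplug path: register at once */
                return wacky_register(pdev);

        d = kmalloc(sizeof(*d), GFP_KERNEL);
        if (!d)
                return -ENOMEM;
        d->pdev = pdev;
        list_add_tail(&d->list, &deferred);
        return 0;                       /* collected; registered later */
}

/* Runs after the boot-time PCI enumeration is done. */
static int __init wacky_late_init(void)
{
        struct list_head *p, *n;

        sort_in_bios_order(&deferred);
        list_for_each_safe(p, n, &deferred) {
                struct deferred_dev *d =
                        list_entry(p, struct deferred_dev, list);
                wacky_register(d->pdev);
                list_del(&d->list);
                kfree(d);
        }
        booting = 0;            /* flip to immediate registration */
        return 0;
}

The hotplug path never defers, so devices added after boot still
register immediately.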
On Sat, 2002-07-13 at 02:12, David S. Miller wrote:
> From: Alan Cox <[email protected]>
> Date: 13 Jul 2002 03:23:29 +0100
>
> I have several examples where the ordering of the PCI cards is critical
> to get stuff like boot device and primary controller detection right.
> pci_register_driver doesn't appear to have a good way to deal with this,
> or have I missed something?
>
> Cards get registered in the order they appear on the bus, or at least
> that is the way the algorithm worked the last time I looked.
For most cards, that is all that has to be turned from a "happens to
work" into an "officially this is what happens". I've replied about how
to handle other stuff in an earlier message. It's not a big thing, and
getting rid of most of pci_find_device will help. We still have people
needing to find other devices, so there are refcount/locking things to
handle, akin to what was done to make the old networking dev_get() type
stuff actually DTRT.
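Something with this shape would do (entirely hypothetical: there is no
pci_bus_lock today, and pci_dev_get()/pci_dev_put() don't exist until
pci_dev grows a refcount):

#include <linux/pci.h>
#include <linux/spinlock.h>

/* All hypothetical names; none of this exists yet. */
extern spinlock_t pci_bus_lock;
extern struct pci_dev *pci_dev_get(struct pci_dev *dev);
extern void pci_dev_put(struct pci_dev *dev);

struct pci_dev *pci_get_device(unsigned int vendor, unsigned int device,
                               struct pci_dev *from)
{
        struct pci_dev *dev;

        spin_lock(&pci_bus_lock);
        dev = pci_find_device(vendor, device, from);
        if (dev)
                pci_dev_get(dev);       /* caller now holds a reference */
        spin_unlock(&pci_bus_lock);

        if (from)
                pci_dev_put(from);      /* drop the reference on "from" */
        return dev;
}

That is exactly the dev_get()/dev_put() pattern, just applied to
pci_dev.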
Alan Cox wrote:
> On Sat, 2002-07-13 at 05:41, Jeff Garzik wrote:
>
>>ordering, is simply hard-coding something that should really be in
>>userspace. Depending on pci_find_device logic / link order to
>>still-boot-the-system after adding new hardware sounds like an
>>incredibly fragile hope, not a reliable system users can trust.
>
>
> For hot plugging, obviously. At system boot time, however, the ordering
> (and being able to see the ordering) is rather important, because in
> many cases the ordering is what tells you about things like IDE
> controller pairing. It tells you what order to assign many SCSI devices,
> because the ordering is defined by their BIOS ROM.
>
> One way to handle this generically would be to use pci_register_driver,
> but in the register function for such wacky devices during boot-up we
> merely keep track of what we have to look into.
My point is that depending on any method of internal kernel ordering is
fragile.
I would rather have the kernel export which drives are listed in CMOS /
BIOS ROM, and let userspace say "my boot drive is the nth BIOS-listed
drive." For example, looking through the aic7xxx (or was it
ncr53c8xxx?) driver, it gets boot drive ordering from BIOS/CMOS. That
piece of info can either be exported by driverfs from the low-level SCSI
driver, or by a separate, tiny ncr53c8xxx_boot_drive driver.
Depending on pci_find_* ordering is very situation-dependent, and only
covers N cases. Then you have another N cases covered by the order in
which you modprobe key drivers. Then you have another N cases covered
by special case code somewhere. You'll never get all these cases right,
in the kernel, the way the user wants. That's why I say the
responsibility for figuring out the boot drive should be pushed to
initrd/initramfs.
Jeff
On Sat, 2002-07-13 at 16:37, Jeff Garzik wrote:
> My point is that depending on any method of internal kernel ordering is
> fragile.
It's actually -extremely- reliable. Simply because we've kept the
behaviour constant over time.
> I would rather have the kernel export which drives are listed in CMOS /
> BIOS ROM, and let userspace say "my boot drive is the nth BIOS-listed
> drive." For example, looking through the aic7xxx (or was it
There is a BIOS extension for this (EDD 3.0, I believe). It only
addresses where the boot device went, not how to sort the IDE device
ordering and the like.
> Depending on pci_find_* ordering is very situation-dependent, and only
> covers N cases. Then you have another N cases covered by the order in
> which you modprobe key drivers. Then you have another N cases covered
Forget about modprobe. The areas where this bites people are areas
where the ordering is compiled-in stuff (e.g. IDE) and where you have
multiple of the same controller.
A good example here is that many systems order devices internally based
on mainboard versus external. Dell do this a lot. That ordering
sometimes happens not to be the PCI scan order.
Even with BIOS help you have to know this. And with only the basic BIOS
you have to know the full ROM initialisation ordering, which is -very-
non-trivial for complex systems.
> in the kernel, the way the user wants. That's why I say the
> responsibility for figuring out the boot drive should be pushed to
> initrd/initramfs.
Finding the rootfs by label is a minor problem; figuring out how to
name the controllers consistently between 2.2/2.4/2.6 is a showstopper
in the real world, even if it's not in happy hackerdom.
Alan Cox wrote:
> On Sat, 2002-07-13 at 16:37, Jeff Garzik wrote:
>
>>My point is that depending on any method of internal kernel ordering is
>>fragile.
>
> It's actually -extremely- reliable. Simply because we've kept the
> behaviour constant over time.
For the user, agreed. For the kernel hacker, it's a fragile balance
trying to keep the user presentation constant. And we haven't always
been successful.
>>I would rather have the kernel export which drives are listed in CMOS /
>>BIOS ROM, and let userspace say "my boot drive is the nth BIOS-listed
>>drive." For example, looking through the aic7xxx (or was it
>
>
> There is a BIOS extension for this (EDD 3.0, I believe). It only
> addresses where the boot device went, not how to sort the IDE device
> ordering and the like.
>
>
>>Depending on pci_find_* ordering is very situation-dependent, and only
>>covers N cases. Then you have another N cases covered by the order in
>>which you modprobe key drivers. Then you have another N cases covered
>
>
> Forget about modprobe. The areas where this bites people are areas
> where the ordering is compiled-in stuff (e.g. IDE) and where you have
> multiple of the same controller.
Sorry for the confusion; I really equate modprobe and link order -- they
both define the overall order of initialization.
> A good example here is that many systems order devices internally based
> on mainboard versus external. Dell do this a lot. That ordering
> sometimes happens not to be the PCI scan order.
>
> Even with BIOS help you have to know this. And with only the basic BIOS
> you have to know the full ROM initialisation ordering, which is -very-
> non-trivial for complex systems.
[...]
> Finding the rootfs by label is a minor problem; figuring out how to
> name the controllers consistently between 2.2/2.4/2.6 is a showstopper
> in the real world, even if it's not in happy hackerdom.
Everything you are saying here just convinces me more that we should do
this stuff in initramfs. At the summit, Linus endorsed using
/sbin/hotplug when storage devices appear... combine that with
initramfs, and you should have all you need to handle whatever complex
scenario you come up with. It sounds straightforward to have some
find-the-root-device code in initramfs that can contain "if
(dell_mainboard)" code all over the place.
Jeff
Jeff Garzik wrote:
> Everything you are saying here just convinces me more that we should do
> this stuff in initramfs. At the summit, Linus endorsed using
> /sbin/hotplug when storage devices appear... combine that with
> initramfs, and you should have all you need to handle whatever complex
> scenario you come up with. It sounds straightforward to have some
> find-the-root-device code in initramfs that can contain "if
> (dell_mainboard)" code all over the place.
IOW, strive to make order of kernel device initialization irrelevant --
init the kernel drivers, then figure out the boot device.
Jeff
From: Alan Cox <[email protected]>
Date: 13 Jul 2002 15:46:55 +0100
We still have people needing to find other devices
In particular things like "if on PCI host controller DEV/ID, enable hw
bug workaround foo". I'm going to need to do crap like this even in
the TG3 driver, it has to be worked around in the TG3 driver code
itself so this isn't a PCI black-list type thing where we swizzle bits
in the PCI host controller registers.
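I.e., something as simple as this (sketch; the bridge vendor/device IDs
are invented):

#include <linux/pci.h>

#define BROKEN_BRIDGE_VENDOR    0xabcd  /* invented IDs */
#define BROKEN_BRIDGE_DEVICE    0x0123

static int tg3_needs_workaround(void)
{
        /* Is the known-broken host bridge present anywhere in the
         * system?  If so, flip the workaround on in the tg3 code. */
        return pci_find_device(BROKEN_BRIDGE_VENDOR,
                               BROKEN_BRIDGE_DEVICE, NULL) != NULL;
}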
>In particular things like "if on PCI host controller DEV/ID, enable hw
>bug workaround foo". I'm going to need to do crap like this even in
>the TG3 driver, it has to be worked around in the TG3 driver code
>itself so this isn't a PCI black-list type thing where we swizzle bits
>in the PCI host controller registers.
That case shouldn't be a problem, since by the time your device gets
discovered, hopefully, the host controller is already there. Though in
some cases, host controllers just appear as a sibling device, and in
that specific case it may not have been "discovered" yet.
There can be other bad dependencies between "sibling" devices
(especially functions of the same physical device), which is why I would
make sure that all devices on a given level have been probed (that is,
their pci_dev structures created) before the various drivers get
notified.
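Roughly this (untested sketch; the two helpers are invented names):

#include <linux/pci.h>
#include <linux/list.h>

/* create_pci_dev() and notify_driver() are invented helpers. */
extern void create_pci_dev(struct pci_bus *bus, int devfn);
extern void notify_driver(struct pci_dev *dev);

static void probe_bus_level(struct pci_bus *bus)
{
        struct list_head *ln;
        int devfn;

        /* Phase 1: build a pci_dev for every device on this bus, so
         * every sibling exists before any driver runs. */
        for (devfn = 0; devfn < 0x100; devfn++)
                create_pci_dev(bus, devfn);

        /* Phase 2: only now hand the devices to their drivers; a
         * probe routine can then safely look up a sibling function. */
        list_for_each(ln, &bus->devices)
                notify_driver(pci_dev_b(ln));
}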
Ben.
Greg KH <[email protected]> writes:
> Hi all,
>
> Well, I've been trying to figure out a way to remove the existing
> pci_find_device(), and other pci_find_* functions from the 2.5 kernel
> without hurting too many things (well, things that people care about.)
>
> Turns out these are very useful functions outside of the "old" PCI
> framework, and I can't really justify removing them, so they are staying
> for now (or until someone else can think of a replacement...)
>
> The main reason for wanting to do this is that any PCI driver that
> relies on using pci_find_* to locate a device to control will not work
> with the existing PCI hotplug code. Moving forward, those drivers will
> also not work with driverfs, struct driver, or the device naming
> code.
>
> So if you own a PCI driver that does not conform to the "new" PCI API
> (using pci_register_driver() and friends), consider yourself warned.
> Your driver will NOT inherit any of the upcoming changes to the drivers
> tree, which might cause it to break. Also remember, all of the people
> who are buying hotplug PCI systems for their datacenters will not buy
> your cards :)
I do, but it doesn't use pci_register_driver only because that doesn't
work.
The driver is an MTD map driver. It knows there is a ROM chip behind
a PCI->ISA bridge, and it needs to find the PCI->ISA bridge to
properly set it up to access the ROM chip (enable writes and the
like).
It isn't a driver for the PCI->ISA bridge (I'm not even certain we
have a good model for that), so it does not use pci_register_driver.
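The guts of it look roughly like this (untested sketch; the bridge ID
and the write-enable register/bit are from memory and may well be
wrong):

#include <linux/pci.h>

static struct pci_dev *find_rom_bridge(void)
{
        /* We are not a driver for the bridge; we just need to find
         * it.  0x8086:0x2440 is the ICH2 LPC bridge, from memory. */
        return pci_find_device(0x8086, 0x2440, NULL);
}

static int enable_rom_writes(struct pci_dev *bridge)
{
        u8 byte;

        /* Set the BIOS write-enable bit in the bridge's config
         * space (offset and bit from memory; check the datasheet). */
        if (pci_read_config_byte(bridge, 0x4e, &byte))
                return -EIO;
        return pci_write_config_byte(bridge, 0x4e, byte | 0x01);
}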
If you can give me a good proposal for how to accomplish that kind of
functionality I would be happy to use the appropriate
xxx_register_driver.
Eric
From: Benjamin Herrenschmidt <[email protected]>
Date: Sat, 13 Jul 2002 15:45:53 +0200
That case shouldn't be a problem, since by the time your device gets
discovered, hopefully, the host controller is already there. Though in
some cases, host controllers just appear as a sibling device, and in
that specific case it may not have been "discovered" yet.
That's not what I'm concerned about; what I care about is that there
will still be a pci_find_*() I can call to see if DEV/ID is on the bus.
That is the easiest way to perform that search right now.
On Sun, Jul 14, 2002 at 02:07:01PM -0600, Eric W. Biederman wrote:
>
> The driver is an MTD map driver. It knows there is a ROM chip behind
> a PCI->ISA bridge, and it needs to find the PCI->ISA bridge to
> properly set it up to access the ROM chip (enable writes and the
> like).
>
> It isn't a driver for the PCI->ISA bridge (I'm not even certain we
> have a good model for that), so it does not use pci_register_driver.
>
> If you can give me a good proposal for how to accomplish that kind of
> functionality I would be happy to use the appropriate
> xxx_register_driver.
I don't think there is a good way for you to convert over to
_register_driver(); that's the main reason I'm keeping the pci_find_*
functions around: they are quite useful for lots of situations.
It doesn't sound like you are worrying about your device working in a
PCI hotplug system, and you would probably be willing to do any PCI
device conversion work to the new driver model yourself, right? :)
thanks,
greg k-h
On Sun, Jul 14, 2002 at 10:25:27PM -0700, David S. Miller wrote:
> From: Benjamin Herrenschmidt <[email protected]>
> Date: Sat, 13 Jul 2002 15:45:53 +0200
>
> That case shouldn't be a problem, since by the time your device gets
> discovered, hopefully, the host controller is already there. Though in
> some cases, host controllers just appear as a sibling device, and in
> that specific case it may not have been "discovered" yet.
>
> That's not what I'm concerned about; what I care about is that there
> will still be a pci_find_*() I can call to see if DEV/ID is on the bus.
> That is the easiest way to perform that search right now.
Yes, it will stay. It is needed for situations just like these, and lots
of other valid reasons.
thanks,
greg k-h
Greg KH <[email protected]> writes:
> On Sun, Jul 14, 2002 at 02:07:01PM -0600, Eric W. Biederman wrote:
> >
> > The driver is an MTD map driver. It knows there is a ROM chip behind
> > a PCI->ISA bridge, and it needs to find the PCI->ISA bridge to
> > properly set it up to access the ROM chip (enable writes and the
> > like).
> >
> > It isn't a driver for the PCI->ISA bridge (I'm not even certain we
> > have a good model for that), so it does not use pci_register_driver.
> >
> > If you can give me a good proposal for how to accomplish that kind of
> > functionality I would be happy to use the appropriate
> > xxx_register_driver.
>
> I don't think there is a good way for you to convert over to
> _register_driver(); that's the main reason I'm keeping the pci_find_*
> functions around: they are quite useful for lots of situations.
>
> It doesn't sound like you are worrying about your device working in a
> PCI hotplug system, and you would probably be willing to do any PCI
> device conversion work to the new driver model yourself, right? :)
Assuming I can actually fit in better with the new driver model. As
far as hot-plug goes, it is an abuse, but I regularly hot-swap my ROM
chips in my development system.
I am probably looking at this from the wrong angle, but my problem with
the current code base seems to be that I can only have one driver per
PCI device.
In any case, I would like to have code that fits in nicely with the
new driver system. I can put up with about one change in the kernel
API. For the most part the drivers are trivial, and having non-trivial
maintenance for trivial code is less than ideal.
Eric
On 16 Jul 2002, Eric W. Biederman wrote:
> Greg KH <[email protected]> writes:
>
> > On Sun, Jul 14, 2002 at 02:07:01PM -0600, Eric W. Biederman wrote:
> > >
> > > The driver is an MTD map driver. It knows there is a ROM chip behind
> > > a PCI->ISA bridge, and it needs to find the PCI->ISA bridge to
> > > properly set it up to access the ROM chip (enable writes and the
> > > like).
> > >
> > > It isn't a driver for the PCI->ISA bridge (I'm not even certain we
> > > have a good model for that), so it does not use pci_register_driver.
> > >
> > > If you can give me a good proposal for how to accomplish that kind of
> > > functionality I would be happy to use the appropriate
> > > xxx_register_driver.
> >
> > I don't think there is a good way for you to convert over to
> > _register_driver(); that's the main reason I'm keeping the pci_find_*
> > functions around: they are quite useful for lots of situations.
> >
> > It doesn't sound like you are worrying about your device working in a
> > PCI hotplug system, and you would probably be willing to do any PCI
> > device conversion work to the new driver model yourself, right? :)
>
> Assuming I can actually fit in better with the new driver model. As
> far as hot-plug goes, it is an abuse, but I regularly hot-swap my ROM
> chips in my development system.
No, but you do do firmware, and you have a desire to tell the kernel,
from the firmware, which devices are in the system. The code path once
you discover the device is exactly the same as if you were to actually
plug in the device, or probe for it natively.
Though making legacy drivers hotpluggable seems absurd, the capability
is actually a requirement for supporting many firmwares.
> I am probably looking at this from the wrong angle, but my problem with
> the current code base seems to be that I can only have one driver per
> PCI device.
Don't most people? :)
> In any case, I would like to have code that fits in nicely with the
> new driver system. I can put up with about one change in the kernel
> API. For the most part the drivers are trivial, and having non-trivial
> maintenance for trivial code is less than ideal.
We don't want to make things difficult. It's a PITA right now, since
the documentation is lacking and not all the infrastructure is in place
to really start plowing ahead. But it will get better...
-pat
> There is a BIOS extension for this (EDD 3.0, I believe).
Unfortunately, EDD 3.0 isn't implemented by very many BIOSes or option
ROMs. The Adaptec aic7xxx series BIOSes and recent LSI-based PERC3
BIOSes are the only ones I've found that do, but their implementations
are slightly buggy. Granted, I've limited my testing to those cards
and/or onboard devices that Dell sells.
I made a simple DOS program if people want to test boards for
themselves. (EDD 3.0 specifies a real-mode int13 extension, so it was
easy to do it in DOS, and only slightly harder to do it in the boot
loader or kernel before switching to protected mode.) It's posted at
http://domsch.com/linux/edd30/. There's a results page there that I'm
starting to fill in. Please send results to [email protected] per the
included README.
Thanks,
Matt
--
Matt Domsch
Sr. Software Engineer, Lead Engineer, Architect
Dell Linux Solutions http://www.dell.com/linux
Linux on Dell mailing lists @ http://lists.us.dell.com
#1 US Linux Server provider for 2001 and Q1/2002! (IDC May 2002)
Patrick Mochel <[email protected]> writes:
> On 16 Jul 2002, Eric W. Biederman wrote:
>
> > Greg KH <[email protected]> writes:
> >
> > > I don't think there is a good way for you to convert over to
> > > _register_driver(); that's the main reason I'm keeping the pci_find_*
> > > functions around: they are quite useful for lots of situations.
> > >
> > > It doesn't sound like you are worrying about your device working in a
> > > PCI hotplug system, and you would probably be willing to do any PCI
> > > device conversion work to the new driver model yourself, right? :)
> >
> > Assuming I can actually fit in better with the new driver model. As
> > far as hot-plug goes, it is an abuse, but I regularly hot-swap my ROM
> > chips in my development system.
>
> No, but you do do firmware, and you have a desire to tell the kernel,
> from the firmware, which devices are in the system. The code path once
> you discover the device is exactly the same as if you were to actually
> plug in the device, or probe for it natively.
A clarification here. I am thinking of drivers/mtd/maps/ich2rom.c, or
drivers/mtd/maps/amd766rom.c (should be in 2.4.19). What the drivers
do is find a PCI bridge device behind which ROM chips are usually
found, and then probe for a ROM chip behind the bridge.
Despite the bus being LPC/ISA, there is a moderately standard way of
getting a chip ID from a ROM chip (see
drivers/mtd/chips/jedec_probe.c). Armed with the chip ID, I dynamically
select the chip driver.
In practice my driver really is a driver for a subset of the bridge
chip, allowing access to the ROM chip. Besides giving a clue about
which addresses to probe, the map driver also enables writes to the ROM
chip.
From the firmware side it is easy to tell the kernel there is a ROM
chip at address xxx for yyy bytes behind zzz. The challenging part is
what structure the driver should really take, and for that I am asking
for advice, or at least some ideas.
> > In any case, I would like to have code that fits in nicely with the
> > new driver system. I can put up with about one change in the kernel
> > API. For the most part the drivers are trivial, and having non-trivial
> > maintenance for trivial code is less than ideal.
>
> We don't want to make things difficult. It's a PITA right now, since
> the documentation is lacking and not all the infrastructure is in place
> to really start plowing ahead. But it will get better...
Well, I want to keep the reminders coming about weird things that are
actually supportable right now, and to ask for help in finding better
ways to construct the drivers. If I could just do firmware, my job
would be so much easier :)
Eric