2004-04-28 21:51:28

by Brian King

[permalink] [raw]
Subject: userspace pci config space accesses

I recently ran into a problem where lspci was trying to read pci config space
of a pci adapter while the device driver for that adapter was running BIST
on it. On ppc64, this resulted in a PCI error and puts the slot into an error
state making it unusable for the remainder of that system boot. Should there
be some blocking in place so that userspace pci config reads will not occur
in these windows or is using tools like lspci user beware? Part of my problem
was that lspci was being called as a result of a boot script so on a particular
machine I was hitting this every boot.


--
Brian King
eServer Storage I/O
IBM Linux Technology Center


2004-04-28 22:54:07

by Greg KH

[permalink] [raw]
Subject: Re: userspace pci config space accesses

On Wed, Apr 28, 2004 at 04:49:02PM -0500, Brian King wrote:
> I recently ran into a problem where lspci was trying to read pci config
> space
> of a pci adapter while the device driver for that adapter was running BIST
> on it. On ppc64, this resulted in a PCI error and puts the slot into an
> error state making it unusable for the remainder of that system boot.
> Should there be some blocking in place so that userspace pci config
> reads will not occur in these windows or is using tools like lspci
> user beware?

There already is a pci_config_lock that should be grabbed when accessing
pci config space. It sounds like the driver needs to play a bit nicer
when it's running a self test :)

What driver is doing this?

thanks,

greg k-h

2004-04-28 23:29:52

by Brian King

[permalink] [raw]
Subject: Re: userspace pci config space accesses

Greg KH wrote:
> On Wed, Apr 28, 2004 at 04:49:02PM -0500, Brian King wrote:
>
>>I recently ran into a problem where lspci was trying to read pci config
>>space
>>of a pci adapter while the device driver for that adapter was running BIST
>>on it. On ppc64, this resulted in a PCI error and puts the slot into an
>>error state making it unusable for the remainder of that system boot.
>>Should there be some blocking in place so that userspace pci config
>>reads will not occur in these windows or is using tools like lspci
>>user beware?
>
>
> There already is a pci_config_lock that should be grabbed when accessing
> pci config space. It sounds like the driver needs to play a bit nicer
> when it's running a self test :)

Found the lock. Unfortunately, its not exported, so a device driver can't use
it without changing that. Additionally, its a spinlock, and it takes 2 seconds
to complete BIST, which seems a bit too long to hold a spinlock.

> What driver is doing this?

The ipr driver, a scsi device driver for ppc64.

http://marc.theaimsgroup.com/?l=linux-scsi&m=108144942527994&w=2

The driver runs BIST at device initialization time to ensure that the device
is in a clean state. It will also run BIST on module unload, and in various
error scenarios.


--
Brian King
eServer Storage I/O
IBM Linux Technology Center

2004-04-28 23:39:45

by Greg KH

[permalink] [raw]
Subject: Re: userspace pci config space accesses

On Wed, Apr 28, 2004 at 06:26:53PM -0500, Brian King wrote:
> Greg KH wrote:
> >On Wed, Apr 28, 2004 at 04:49:02PM -0500, Brian King wrote:
> >
> >>I recently ran into a problem where lspci was trying to read pci config
> >>space
> >>of a pci adapter while the device driver for that adapter was running BIST
> >>on it. On ppc64, this resulted in a PCI error and puts the slot into an
> >>error state making it unusable for the remainder of that system boot.
> >>Should there be some blocking in place so that userspace pci config
> >>reads will not occur in these windows or is using tools like lspci
> >>user beware?
> >
> >
> >There already is a pci_config_lock that should be grabbed when accessing
> >pci config space. It sounds like the driver needs to play a bit nicer
> >when it's running a self test :)
>
> Found the lock. Unfortunately, its not exported, so a device driver can't
> use it without changing that. Additionally, its a spinlock, and it
> takes 2 seconds to complete BIST, which seems a bit too long to hold a
> spinlock.

Yes, a driver shouldn't mess with that lock anyway. I was pointing out
that there is already a lock to prevent reading and writing config
errors from happening.

> >What driver is doing this?
>
> The ipr driver, a scsi device driver for ppc64.
>
> http://marc.theaimsgroup.com/?l=linux-scsi&m=108144942527994&w=2
>
> The driver runs BIST at device initialization time to ensure that the device
> is in a clean state.

Ick. It sounds like "clean state" isn't always true if userspace can
mess the device up by simply reading its config space :)

Worse case thing, stop the whole machine while doing BIST if you want to
prevent this from happening (not that I'm actually suggesting you do it,
but if you really think it's the only way...)

Is there any way you can not run BIST all the time, except when
explicitly asked for from the user?

thanks,

greg k-h

2004-04-29 00:38:17

by Brian King

[permalink] [raw]
Subject: Re: userspace pci config space accesses

Greg KH wrote:
>>>What driver is doing this?
>>
>>The ipr driver, a scsi device driver for ppc64.
>>
>>http://marc.theaimsgroup.com/?l=linux-scsi&m=108144942527994&w=2
>>
>>The driver runs BIST at device initialization time to ensure that the device
>>is in a clean state.
>
>
> Ick. It sounds like "clean state" isn't always true if userspace can
> mess the device up by simply reading its config space :)

Yeah, its not so much the device, bus the way pSeries PCI bridges work.
Other adapters would have the same problem, but after a quick grep it
doesn't look like running BIST is a very common thing to do in most
Linux drivers.

> Worse case thing, stop the whole machine while doing BIST if you want to
> prevent this from happening (not that I'm actually suggesting you do it,
> but if you really think it's the only way...)

Yeah, mdelay(2000) kind of sticks out in a code review;)

> Is there any way you can not run BIST all the time, except when
> explicitly asked for from the user?

Not really. The times when told to explicitly by the user are actually
in the minority. Here are the times when BIST currently gets run and why:

1. module load time. Ensure the adapter is ready to accepts commands. I
might be able to remove this one, but would need to talk with the system
firmware folks to make sure they can guarantee the adapter will be in a
clean state. At one point there was some talk that this couldn't be
guaranteed, but that may have changed.

2. Fatal adapter error. The adapter signals an interrupt to the driver
and the driver needs to run BIST to recover.

3. scsi_eh_host_reset. The adapter is having problems and commands are
probably timing out. Last resort ERP.

4. Module unload time. This is required since there are kernel buffers
that the adapter thinks it owns that must be freed and the only way to
relinquish ownership of them from the adapter is to run BIST. Ugly, but
we are stuck with it.

5. Microcode download. User initiated update of adapter microcode.

6. User writes an adapter reset sysfs file.


Two ideas I had would either be to create interfaces in the pci layer
that a device driver could call to disable all pci adapter accesses and
one to re-enable them. We could probably just make all pci accesses fail
when disabled. These interfaces could then grab the lock and set the
state on the pci_dev, then the read/write interfaces would check the
state after acquiring the lock.

The other idea would be to create an async interface in the pci layer to
run BIST for me and have it manage the locking in a similar manner.


thanks

-Brian



2004-04-29 10:12:19

by Arjan van de Ven

[permalink] [raw]
Subject: Re: userspace pci config space accesses


> Yeah, mdelay(2000) kind of sticks out in a code review;)

mdelay() isn't enough; think SMP.


Attachments:
signature.asc (189.00 B)
This is a digitally signed message part