2002-09-23 21:23:01

by Jeff Garzik

[permalink] [raw]
Subject: Quick aic7xxx bug hunt...

Justin T. Gibbs wrote:
> On some motherboards with some chipsets, you can get these messages if
> another busmaster (say an IDE drive or a sound card) is hogging the bus.
> Usually this is with a VIA chipset. Its not clear why the aic7xxx_old
> driver would behave differently other than it disables memory write
> and invalidate PCI transactions on this chip. The new driver doesn't
> need that work around.


Justin,

One thing I notice is at least one PCI posting bug. When using MMIO
(write[bwlq] under Linux), you _must_ use a read[bwlq] to flush the
write to PCI, if you wish to ensure the write posts at a certain point
in the code.

Here is the example PCI posting bug, in ahc_clear_critical_section:
> ahc_outb(ahc, HCNTRL, ahc->unpause);
> do {
> ahc_delay(200);
> } while (!ahc_is_paused(ahc));

As you can see, there is no read before the udelay(), which is very
wrong on modern CPUs with write posting... that's definitely a driver
bug that will bite you on modern x86 motherboards [and is totally broken
on ia64 and other platforms].

Please let me know if you have further questions on PCI write posting...

Jeff




2002-09-23 22:14:16

by Jeff Garzik

[permalink] [raw]
Subject: Re: Quick aic7xxx bug hunt...

Justin T. Gibbs wrote:
> I somewhat doubt that any CPU would hold onto a posted write for 200us
> since you are not guaranteed that a read will occur quickly and you want
> those write buffers to be availble for other clients, but regardless, the
> code has not been as you describe since November of last year.


Great, I stand corrected. Looks like 2.5 code is ancient then?

comments on the 2.4 code:
* the 1000us delay in ahc_reset needs to be turned into a sleep, instead
all paths to that function [AFAICS] can sleep. likewise for the huge
delay in ahc_acquire_seeprom.

* 400ms worst case udelay() is in ahc_clear_critical_section is kinda
annoying [but I suppose it can be lived with, if the average is a lot
less than that :)]

* the delay in ahc_init should be replaced with a sleep

* PCI posting? (aic7xxx_core.c, line 1322, the last statement in the
function...)

ahc_outb(ahc, CLRINT, CLRSCSIINT);

I'll look it over some more later.

2002-09-23 21:50:02

by Justin T. Gibbs

[permalink] [raw]
Subject: Re: Quick aic7xxx bug hunt...

> Justin T. Gibbs wrote:
>> On some motherboards with some chipsets, you can get these messages if
>> another busmaster (say an IDE drive or a sound card) is hogging the bus.
>> Usually this is with a VIA chipset. Its not clear why the aic7xxx_old
>> driver would behave differently other than it disables memory write
>> and invalidate PCI transactions on this chip. The new driver doesn't
>> need that work around.
>
>
> Justin,
>
> One thing I notice is at least one PCI posting bug. When using MMIO
> (write[bwlq] under Linux), you _must_ use a read[bwlq] to flush the write
> to PCI, if you wish to ensure the write posts at a certain point in the
> code.

>
> Here is the example PCI posting bug, in ahc_clear_critical_section:
>> ahc_outb(ahc, HCNTRL, ahc->unpause);
>> do {
>> ahc_delay(200);
>> } while (!ahc_is_paused(ahc));
>
> As you can see, there is no read before the udelay(), which is very wrong
> on modern CPUs with write posting... that's definitely a driver bug that
> will bite you on modern x86 motherboards [and is totally broken on ia64
> and other platforms].
>
> Please let me know if you have further questions on PCI write posting...

I somewhat doubt that any CPU would hold onto a posted write for 200us
since you are not guaranteed that a read will occur quickly and you want
those write buffers to be availble for other clients, but regardless, the
code has not been as you describe since November of last year. The
CHANGELOG reads:

Always perform a bus read prior to waiting in
a delay loop waiting for a bus write to take
effect. This ensures that the first time
through the loop the delay occurs after the
write has taken effect.

Of course, the "bug" was benign since the loop does perform a read
so the write is guaranteed to post during the first itteration through
the loop.

--
Justin

2002-09-23 22:40:35

by Justin T. Gibbs

[permalink] [raw]
Subject: Re: Quick aic7xxx bug hunt...

> Great, I stand corrected. Looks like 2.5 code is ancient then?

Yes. I didn't do the original port and am now just finishing up my
port to 2.5.X.

> comments on the 2.4 code:
> * the 1000us delay in ahc_reset needs to be turned into a sleep, instead
> all paths to that function [AFAICS] can sleep. likewise for the huge
> delay in ahc_acquire_seeprom.

For all of these delays, I'd be more than happy to make them all into
sleeps if I can tell, from inside ahc_delay() if I'm in a context where
it is safe to sleep. On the other platforms that this core code runs on
I'm usually not in a context where it is safe to sleep, so I don't want
to switch to using a different driver primitive.

> * PCI posting? (aic7xxx_core.c, line 1322, the last statement in the
> function...)
>
> ahc_outb(ahc, CLRINT, CLRSCSIINT);

I don't care when the write occurs only that it will occur eventually.
The buffer will get flushed eventually so there is no need to call
ahc_flush_device_writes().

--
Justin

2002-09-23 22:50:45

by Justin T. Gibbs

[permalink] [raw]
Subject: Re: Quick aic7xxx bug hunt...

>> For all of these delays, I'd be more than happy to make them all into
>> sleeps if I can tell, from inside ahc_delay() if I'm in a context where
>> it is safe to sleep. On the other platforms that this core code runs on
>> I'm usually not in a context where it is safe to sleep, so I don't want
>> to switch to using a different driver primitive.
>
> For Linux it's unconditionally safe, and other platforms is sounds like
> it's unconditionally not. So, s/ahc_delay/ahc_sleep/ for the places I
> pointed out, and just make ahc_delay==ahc_sleep on non-Linux platforms
> (or any similarly-functioning solution)

So you can sleep while in an interrupt context? I didn't know that
2.5 had switched to using interrupt threads or some similar construct.

> It's pretty much impossible to detect if you are inside certain
> spinlocks, in a generic fashion.

In 2.5, I should only need to release my own host lock assuming it is
held.

--
Justin

2002-09-23 22:45:54

by Jeff Garzik

[permalink] [raw]
Subject: Re: Quick aic7xxx bug hunt...

Justin T. Gibbs wrote:
>>Great, I stand corrected. Looks like 2.5 code is ancient then?
>
>
> Yes. I didn't do the original port and am now just finishing up my
> port to 2.5.X.
>
>
>>comments on the 2.4 code:
>>* the 1000us delay in ahc_reset needs to be turned into a sleep, instead
>>all paths to that function [AFAICS] can sleep. likewise for the huge
>>delay in ahc_acquire_seeprom.
>
>
> For all of these delays, I'd be more than happy to make them all into
> sleeps if I can tell, from inside ahc_delay() if I'm in a context where
> it is safe to sleep. On the other platforms that this core code runs on
> I'm usually not in a context where it is safe to sleep, so I don't want
> to switch to using a different driver primitive.

For Linux it's unconditionally safe, and other platforms is sounds like
it's unconditionally not. So, s/ahc_delay/ahc_sleep/ for the places I
pointed out, and just make ahc_delay==ahc_sleep on non-Linux platforms
(or any similarly-functioning solution)

It's pretty much impossible to detect if you are inside certain
spinlocks, in a generic fashion.


>>* PCI posting? (aic7xxx_core.c, line 1322, the last statement in the
>>function...)
>>
>> ahc_outb(ahc, CLRINT, CLRSCSIINT);
>
>
> I don't care when the write occurs only that it will occur eventually.
> The buffer will get flushed eventually so there is no need to call
> ahc_flush_device_writes().

ok, thanks for clarifying.

Jeff



2002-09-23 23:17:39

by Jeff Garzik

[permalink] [raw]
Subject: Re: Quick aic7xxx bug hunt...

Justin T. Gibbs wrote:
>>>For all of these delays, I'd be more than happy to make them all into
>>>sleeps if I can tell, from inside ahc_delay() if I'm in a context where
>>>it is safe to sleep. On the other platforms that this core code runs on
>>>I'm usually not in a context where it is safe to sleep, so I don't want
>>>to switch to using a different driver primitive.
>>
>>For Linux it's unconditionally safe, and other platforms is sounds like
>>it's unconditionally not. So, s/ahc_delay/ahc_sleep/ for the places I
>>pointed out, and just make ahc_delay==ahc_sleep on non-Linux platforms
>>(or any similarly-functioning solution)
>
>
> So you can sleep while in an interrupt context? I didn't know that
> 2.5 had switched to using interrupt threads or some similar construct.


Of course not :)

ahc_reset
aic7770_config -> can sleep
ahc_pci_config -> can sleep
ahc_shutdown -> can't sleep, whoops
ahc_resume -> dead code
ahc_init
aic7770_config -> can sleep
ahc_pci_config -> can sleep
ahc_acquire_seeprom
check_extport -> can sleep
ahc_proc_write_seeprom -> can sleep

so, ahc_init and ahc_acquire_seeprom can s/ahc_delay/ahc_sleep/ safely.

Oh, and I found another bug: never use check_region, it's inherently
racy. Use request_region and check its return value.

Note I agree with a comment in the code, that wrapping SHT->detect() in
io_request_lock is silly... the comment describing the rationale in
drivers/scsi/scsi.c is not really accurate...

Jeff



2002-09-23 23:23:48

by Jeff Garzik

[permalink] [raw]
Subject: Re: Quick aic7xxx bug hunt...

Jeff Garzik wrote:
> ahc_reset
> aic7770_config -> can sleep
> ahc_pci_config -> can sleep
> ahc_shutdown -> can't sleep, whoops


Though, to answer your question from a previous email, you can call the
function in_interrupt() to see if you're in interrupt context.

Jeff



2002-09-24 09:31:37

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: Quick aic7xxx bug hunt...

>Jeff Garzik wrote:
>> ahc_reset
>> aic7770_config -> can sleep
>> ahc_pci_config -> can sleep
>> ahc_shutdown -> can't sleep, whoops
>
>
>Though, to answer your question from a previous email, you can call the
>function in_interrupt() to see if you're in interrupt context.

Well... no, you can't rely on it for knowing if you can sleep or
not. in_interrupt() won't tell you interesting things like you
are holding a spinlock...

Ben.