inb_p and outb_p issue outb's to port 0x80 to achieve a short delay.
On a reasonable system nothing is listening on port 0x80, or at most
a POST card is, but there are no other devices there.
On a modern system with no POST card the outb travels its
way down to the LPC bus, where it is terminated by an abort
because nothing is listening.
So far so good, except for the fact that recent high-volume
ROM chips get confused when they see an abort on the LPC
bus, making it problematic to update the ROM from under Linux.
I don't know if there are other buggy LPC devices or not. But
I do know that it is generally bad form to do I/O to a random port.
So can we gradually kill inb_p, outb_p in 2.6? And the other
miscellaneous users of I/O port 0x80 for I/O delays?
Or possibly rewriting outb_p to look something like:
outb(); udelay(200); or whatever the appropriate delay is?
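The proposal could be sketched roughly as below. This is a hypothetical illustration, not kernel code: the port I/O and the delay are replaced by stand-ins that merely record what happened, so the composition can be demonstrated in user space, and the 200us figure is just the one floated above.

```c
#include <assert.h>

static int last_port = -1;          /* last port "written" */
static unsigned long delayed_us;    /* total delay "spent" */

/* Stand-in for the real outb(): record instead of doing port I/O. */
static void outb(unsigned char val, unsigned short port)
{
    (void)val;
    last_port = port;
}

/* Stand-in for the real udelay(): record instead of spinning. */
static void udelay(unsigned long us)
{
    delayed_us += us;
}

/* The proposed outb_p: the write, then a timed delay, instead of a
 * dummy write to port 0x80. */
static void outb_p(unsigned char val, unsigned short port)
{
    outb(val, port);
    udelay(200);    /* "or whatever the appropriate delay is" */
}
```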
When debugging this I modified arch/i386/io.h to read:
#define __SLOW_DOWN_IO__ ""
Which totally removed the delay and the system ran fine.
Eric
On Llu, 2003-09-22 at 01:27, Eric W. Biederman wrote:
> So can we gradually kill inb_p, outb_p in 2.6? And the other
> miscellaneous users of I/O port 0x80 for I/O delays?
>
> Or possibly rewriting outb_p to look something like:
> outb(); udelay(200); or whatever the appropriate delay is?
The delay should be 8 ISA clocks. While you can easily fix inb_p and
outb_p you also need to fix up the udelay() code since if you stick
a BUG() check in udelay you'll find it gets used before the clock is
initialized even now, let alone with inb_p relying on it. But that
itself is quite fixable too.
(one part of the problem of course is you need inb_p/outb_p to drive
the timer chip on some x86 boards in order to calibrate the udelay
timer)
> When debugging this I modified arch/i386/io.h to read:
> #define __SLOW_DOWN_IO__ ""
> Which totally removed the delay and the system ran fine.
Not all systems do - we had breakages from both the keyboard controller
and the timer chips even on some modern boards when this got messed up.
Alan Cox wrote:
> (one part of the problem of course is you need inb_p/outb_p to drive
> the timer chip on some x86 boards in order to calibrate the udelay
> timer)
What sort of timer chip problems do you see? Is it something that can
be auto-detected, so that timer chip accesses can be made faster on
boards where that is fine?
I'm sure I've seen timer chip code in DOS programs that didn't have
the extra delay I/Os. Surely it cannot be a very widespread problem.
> > When debugging this I modified arch/i386/io.h to read:
> > #define __SLOW_DOWN_IO__ ""
> > Which totally removed the delay and the system ran fine.
>
> Not all systems do - we had breakages from both the keyboard controller
> and the timer chips even on some modern boards when this got messed up.
I've also seen much DOS code that didn't have extra delays for
keyboard I/Os. What sort of breakage did you observe with the
keyboard?
Thanks,
-- Jamie
On Llu, 2003-09-22 at 17:26, Jamie Lokier wrote:
> What sort of timer chip problems do you see? Is it something that can
> be auto-detected, so that timer chip accesses can be made faster on
> boards where that is fine?
CS5520 is one example. Also VIA VP2 seems to care but only very very
occasionally. On my 386 board it's reliably borked without the delays
(not sure what chipset, and it's ISA so harder to tell).
> I've also seen much DOS code that didn't have extra delays for
> keyboard I/Os. What sort of breakage did you observe with the
> keyboard?
DEC laptops hanging is the well-known example of that one.
I'm *for* making this change to udelay, it just has to start up with a
suitably pessimal udelay assumption until calibrated.
On Mon, 2003-09-22 at 18:33, Alan Cox wrote:
> > I've also seen much DOS code that didn't have extra delays for
> > keyboard I/Os. What sort of breakage did you observe with the
> > keyboard?
>
> DEC laptops hanging is the well-known example of that one.
>
> I'm *for* making this change to udelay, it just has to start up with a
> suitably pessimal udelay assumption until calibrated.
or we make udelay() do the port 80 access in the uncalibrated case....
The first person to complain about the extra branch miss in udelay for
this will get laughed at by me ;)
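Arjan's suggestion might look roughly like the sketch below. The names (`loops_per_us`, `outb_port80`) are made up for illustration, and the port access is mocked so the branch logic can run anywhere: before calibration each microsecond of delay becomes one dummy ISA write (roughly 1us each), after calibration a normal delay loop is used.

```c
#include <assert.h>

static unsigned long loops_per_us = 0;   /* 0 = udelay not yet calibrated */
static unsigned long port80_writes;      /* counts mocked outb(0, 0x80) */
static unsigned long loops_spun;         /* counts mocked delay-loop work */

/* Stand-in for outb(0, 0x80): just count it. */
static void outb_port80(void)
{
    port80_writes++;
}

static void udelay(unsigned long us)
{
    if (loops_per_us == 0) {
        /* Uncalibrated case: an ISA write takes ~1us, so N dummy
         * writes give roughly N microseconds of delay. */
        while (us--)
            outb_port80();
    } else {
        /* Calibrated case: spin the delay loop, no bus traffic. */
        unsigned long n = us * loops_per_us;
        while (n--)
            loops_spun++;
    }
}
```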
Arjan van de Ven wrote:
> The first person to complain about the extra branch miss in udelay for
> this will get laughed at by me ;)
udelay(1) is too slow on a 386 even without the branch miss.
If you think I/O operations are infinitely slower than other
instructions, please explain why there is asm-optimised I/O code in
asm-i386/floppy.h.
:)
-- Jamie
[email protected] writes:
> > inb_p and outb_p issue outb's to port 0x80 to achieve a short delay.
> > In a reasonable system there is nothing listening to port 0x80
> > or there is a post card, but there are no other devices there.
> >
> > On a modern system with no POST card the outb travels its
> > way down to the LPC bus, and the outb is terminated by an abort
> > because nothing is listening.
> >
> > So far so good, except for the fact that recent high-volume
> > ROM chips get confused when they see an abort on the LPC
> > bus, making it problematic to update the ROM from under Linux.
>
> Just when we think we've found a safe time delay, somebody goes and
> screws it up.
>
> 1) Could you specify the ROM chips concerned? Is it possible to
> unconfuse them in software as a workaround?
SST. And they don't stay confused. They just drop whatever
command you were sending them. And dropping writes to a ROM
chip is very nasty. If you don't catch and fix it your system
will not boot the next time.
> 2) Do the BIOSes not write to port 0x80 themselves? The port was
> chosen precisely because most current BIOSes write boot progress
> indicators there, so Linux shouldn't be doing anything new.
I think they do. I have not been able to hook up a post card to
check. The difference is that they don't do it while talking
to the ROM chip.
> > I don't know if there are other buggy LPC devices or not. But
> > I do know that it is generally bad form to do I/O to a random port.
>
> It's bad form to do I/O on a *random* port because you don't know what
> might be listening. It's *not* bad form to write to a known unused
> I/O port. On the original ISA bus, that's harmless, and the LPC bus is
> supposed to emulate that.
>
> And, as I mentioned, most BIOSes write to that port periodically.
Yes. There are a few boards and a few weird cases where 0x80 is
not safe. This just adds to the list of problematic cases.
> > So can we gradually kill inb_p, outb_p in 2.6? And the other
> > miscellaneous users of I/O port 0x80 for I/O delays?
>
> Actually, it's not easy. The issue got debated a lot a few years ago.
> A read is also acceptable, and allows a few more ports to be
> potentially used, but that corrupts %al and thus bloats the code.
>
> > Or possibly rewriting outb_p to look something like:
> > outb(); udelay(200); or whatever the appropriate delay is?
>
> As Alan points out, udelay() requires either using the 8254 PIT or knowing
> at least one system clock speed to calibrate a bogomips-style delay loop.
> All of the known clock frequencies are on ISA-bus peripherals. So we
> have to access them BEFORE udelay() is calibrated.
>
> And the 8254 is one of the chips which requires the pause.
> This creates a significant boot-order challenge.
Yes, I agree. And during bootstrap the 0x80 case does not cause me
problems. But when the system is running I either need
cli();
......
sti();
pairs in my code, a global lpc bus lock, or I need to avoid
the writes to port 0x80. I can't even use cli()/sti() because
they have been removed in 2.6.
> > When debugging this I modified arch/i386/io.h to read:
> > #define __SLOW_DOWN_IO__ ""
> > Which totally removed the delay and the system ran fine.
>
> Yes, in 98% of modern boards it *does* work fine to just omit the delays
> entirely, because the motherboard chipset emulation can cope.
> But the original chip specs call for the delay, and the kernel
> has a hard time figuring out what's what early in the boot process.
Right. So this does need to be handled, and delicately.
My main goal is to remove the deep magic voodoo currently in place. I
would object less if the code were clearly commented and isolated to
those few places that need it.
Eric
Arjan van de Ven <[email protected]> writes:
> On Mon, 2003-09-22 at 18:33, Alan Cox wrote:
> > > I've also seen much DOS code that didn't have extra delays for
> > > keyboard I/Os. What sort of breakage did you observe with the
> > > keyboard?
> >
> > DEC laptops hanging is the well-known example of that one.
> >
> > I'm *for* making this change to udelay, it just has to start up with a
> > suitably pessimal udelay assumption until calibrated.
>
> or we make udelay() do the port 80 access in the uncalibrated case....
>
> The first person to complain about the extra branch miss in udelay for
> this will get laughed at by me ;)
Sounds like a solution. I will see what I can do in that direction.
Maintaining a suitably pessimistic udelay with multi-gigahertz chips
sounds like a challenge, so using outb to port 0x80 may be a
reasonable solution there.
Alan, can you describe a little more what the original delay is needed
for? I don't see it documented in my 8254 data sheet. The better I
can understand the problem the better I can write the comments on this
magic bit of code as I fix it.
The oldest machine I have is a 386 MCA system. Any chance of the bug
showing up there? I'd love to have a test case.
Another reason for fixing this is we are killing who knows how much
I/O bandwidth with this stream of failing writes to port 0x80.
Eric
Jamie Lokier <[email protected]> writes:
> Arjan van de Ven wrote:
> > The first person to complain about the extra branch miss in udelay for
> > this will get laughed at by me ;)
>
> udelay(1) is too slow on a 386 even without the branch miss.
Hmm. I will have to test that one.
> If you think I/O operations are infinitely slower than other
> instructions, please explain why there is asm-optimised I/O code in
> asm-i386/floppy.h.
>
> :)
Because the kernel was initially written in assembly and then fixed?
Eric
A few architectures define in[bwl]_p & out[bwl]_p differently from
in[bwl] and out[bwl]. These are:
i386 (all machines)
- four methods are offered; the normal one is 1 write to port
0x80. The others are selected by manually editing a file.
- The port 0x80 is hard-coded in assembly language in
<asm-i386/floppy.h>, although that code is no longer used.
m68k
- A delay is done for some machines. It is a write
to port 0x80.
MIPS
- normally disabled, but can be enabled if CONF_SLOWDOWN_IO
is manually edited in a file. It is a write to port
0x80.
PPC64
- the input operations don't have a delay
- the output operations have a udelay(1), _before_ the I/O.
- PPC (32 bit) doesn't have delays at all
SH (all machines except ec3104)
- only byte-size access has the delay method,
except on the "overdrive" where all sizes have it.
- delay is done by a special I/O operation,
ctrl_inw(0xa0000000)
- one SH board uses udelay(10) for the delay.
- another SH board has a funky single delay after
insw/insl/outsw/outsl (but not insb/outsb), using a
different special I/O, ctrl_inb(0xba000000).
x86_64
- four methods just like i386
Some of the architectures _call_ the _p operators even for some
architecture-specific devices, even though those operators don't have
a delay.
The m68k implementation is nice: it defines a function called
isa_delay(), which contains the delay which is done after the I/O by
_p operations.
IMHO, it would be nice if all architectures simply provided an
appropriate isa_delay() function, and the _p I/O operators were simply
removed. That would make the intent of drivers using those operators
a little clearer, too.
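The isa_delay() idea could be sketched as below. This is only an illustration of the structure, not the m68k code it is modeled on: the I/O and the delay are mocked into an event log so the ordering is visible, and the point is that the delay lives in one named function that drivers can call explicitly instead of relying on the implicit _p suffix.

```c
#include <assert.h>

static int events[8];   /* log of what happened, in order */
static int nevents;

/* Stand-in for outb(): log the port written. */
static void outb(unsigned char val, unsigned short port)
{
    (void)val;
    events[nevents++] = port;
}

/* One visible home for the magic post-I/O delay; here it just logs a
 * sentinel instead of writing port 0x80 or spinning. */
static void isa_delay(void)
{
    events[nevents++] = -1;
}

/* The _p form then becomes mere composition; drivers that call
 * outb() + isa_delay() directly make the intent at each call site
 * explicit. */
static void outb_p(unsigned char val, unsigned short port)
{
    outb(val, port);
    isa_delay();
}
```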
Devices
=======
I think they're all ISA devices, or emulations of them.
Unfortunately, a _lot_ of drivers in the kernel use the _p I/O
operators so it's not possible to look for a few special quirky
devices. I suspect that in many cases, the _p operators are not
required but have been used for safety. There is no harm in this of
course.
Alan mentioned the PIT timer chip and keyboard which appear on x86
machines as being specifically quirky.
I find it interesting, but awfully difficult to know whether it's
"correct", that the i386 PIT code uses inb/outb for some operations
and inb_p/outb_p for some others.
Alan Cox wrote:
> On Llu, 2003-09-22 at 17:26, Jamie Lokier wrote:
> > What sort of timer chip problems do you see? Is it something that can
> > be auto-detected, so that timer chip accesses can be made faster on
> > boards where that is fine?
>
> CS5520 is one example. Also VIA VP2 seems to care but only very very
> occasionally. On my 386 board it's reliably borked without the delays
> (not sure what chipset, and it's ISA so harder to tell).
Yeah, but what's the problem and can it be detected? :)
> > I've also seen much DOS code that didn't have extra delays for
> > keyboard I/Os. What sort of breakage did you observe with the
> > keyboard?
>
> DEC laptops hanging is the well-known example of that one.
>
> I'm *for* making this change to udelay, it just has to start up with a
> suitably pessimal udelay assumption until calibrated.
I'm wondering if there's a way to detect how much udelay is needed on
a particular board, and reduce or remove it on boards where it isn't
needed.
udelay() is also unreliable nowadays, due to CPUs changing clock
speeds according to the whims of the BIOS. On laptops, even the rdtsc
rate varies. If the delay is critical to system reliability for
unknown reasons, then switching to udelay() removes some of that "we
always did this and it fixed the unknown problems" legacy driver safety.
Unfortunately, there are a lot of drivers, and a lot of x86
arch-specific code, which use the delay operators. There's no real
way to verify that all the drivers are fine when the delay is reduced
or removed.
The delay should only be effective for ISA devices. I wonder if it
makes sense to separate the delay into an isa_delay() or io_delay()
function, sprinkled into source where the _p operators currently
appear, to make it clearer where the delays should appear.
In the i386 floppy driver, for example, it's clear why the delay is
there in one part of the virtual DMA loop and not the other: to allow
a device time to propagate a state change. That also explains the
need for a delay after writing to port 0x42 of the PIT but not after
writing port 0x43. I think calls to isa_delay() would make the intent
there a little clearer.
-- Jamie
Eric W. Biederman wrote:
> Alan, can you describe a little more what the original delay is needed
> for? I don't see it documented in my 8254 data sheet. The better I
> can understand the problem the better I can write the comments on this
> magic bit of code as I fix it.
>
> The oldest machine I have is a 386 MCA system. Any chance of the bug
> showing up there? I'd love to have a test case.
>
> Another reason for fixing this is we are killing who knows how much
> I/O bandwidth with this stream of failing writes to port 0x80.
Unfortunately, a lot of drivers use the _p operators so if the delay
is simply removed, it may take a while before the ill effects of that
are discovered.
-- Jamie
On Mon, Sep 22, 2003 at 07:28:08PM +0100, Jamie Lokier wrote:
> Arjan van de Ven wrote:
> > The first person to complain about the extra branch miss in udelay for
> > this will get laughed at by me ;)
>
> udelay(1) is too slow on a 386 even without the branch miss.
ok we have ndelay() now as well in 2.6
Jamie Lokier <[email protected]> writes:
> A few architectures define in[bwl]_p & out[bwl]_p differently from
> in[bwl] and out[bwl]. These are:
> x86_64
> - four methods just like i386
At least until some new chipsets come out I can certify that x86_64
works just fine without the delay.
> Some of the architectures _call_ the _p operators even for some
> architecture-specific devices, even though those operators don't have
> a delay.
>
> The m68k implementation is nice: it defines a function called
> isa_delay(), which contains the delay which is done after the I/O by
> _p operations.
>
> IMHO, it would be nice if all architectures simply provided an
> appropriate isa_delay() function, and the _p I/O operators were simply
> removed. That would make the intent of drivers using those operators
> a little clearer, too.
It certainly removes the magic coupling.
> Devices
> =======
>
> I think they're all ISA devices, or emulations of them.
>
> Unfortunately, a _lot_ of drivers in the kernel use the _p I/O
> operators so it's not possible to look for a few special quirky
> devices. I suspect that in many cases, the _p operators are not
> required but have been used for safety. There is no harm in this of
> course.
>
> Alan mentioned the PIT timer chip and keyboard which appear on x86
> machines as being specifically quirky.
>
> I find it interesting, but awfully difficult to know whether it's
> "correct", that the i386 PIT code uses inb/outb for some operations
> and inb_p/outb_p for some others.
>
> I'm wondering if there's a way to detect how much udelay is needed on
> a particular board, and reduce or remove it on boards where it isn't
> needed.
Until the problem is more clearly defined I don't think we can
auto-detect. The best we can do is to have drivers that replace
the legacy drivers when appropriate and don't do the delay. If the
delay is done with udelay though it does not cause any bus traffic and
another device can be using the bus.
> udelay() is also unreliable nowadays, due to CPUs changing clock
> speeds according to the whims of the BIOS. On laptops, even the rdtsc
> rate varies. If the delay is critical to system reliability for
> unknown reasons, then switching to udelay() removes some of that "we
> always did this and it fixed the unknown problems" legacy driver safety.
As long as udelay is calibrated at the fast system clock speed we are
ok. When the clock slows down the delay just increases, which is fine.
> Unfortunately, there are a lot of drivers, and a lot of x86
> arch-specific code, which use the delay operators. There's no real
> way to verify that all the drivers are fine when the delay is reduced
> or removed.
We just need something sufficiently good. If the delay is removed
on a system that needs it someone will complain.
> The delay should only be effective for ISA devices. I wonder if it
> makes sense to separate the delay into an isa_delay() or io_delay()
> function, sprinkled into source where the _p operators currently
> appear, to make it clearer where the delays should appear.
>
> In the i386 floppy driver, for example, it's clear why the delay is
> there in one part of the virtual DMA loop and not the other: to allow
> a device time to propagate a state change. That also explains the
> need for a delay after writing to port 0x42 of the PIT but not after
> writing port 0x43. I think calls to isa_delay() would make the intent
> there a little clearer.
I would agree with that. Although I wonder if we are not mixing up
various delays into one mechanism. In any event it is a small, safe
step toward disentangling this legacy confusion.
Eric
> Another reason for fixing this is we are killing who knows how much
> I/O bandwidth with this stream of failing writes to port 0x80.
Assuming we do stop using I/O to port 0x80 for timing purposes, would
it be worth adding code to make existing POST cards double as a poor
man's front panel display once the kernel has booted?
John.
On Mon, Sep 22, 2003 at 10:37:32PM +0100, Jamie Lokier wrote:
> We already see this problem with pure PCI devices. The standard
> solution with PCI devices is to issue a PCI read after the write, to
> flush the write.
AFAIK only PCI memory accesses are posted, not I/O port accesses.
John Bradford wrote:
> > Another reason for fixing this is we are killing who knows how much
> > I/O bandwidth with this stream of failing writes to port 0x80.
>
> Assuming we do stop using I/O to port 0x80 for timing purposes, would
> it be worth adding code to make existing POST cards double as a poor
> man's front panel display once the kernel has booted?
Problem: what if the ISA device is behind a PCI bridge?
At the moment, outb_p followed by inb will be seen by the ISA device
as "write, write, read," giving the ISA device time to propagate state
changes between the end of the first write and the beginning of the
read.
If that is replaced by "write, udelay(1), read," then it will be sent
over the PCI-ISA bridge and may be seen as "write, read" with only a
small delay between them, due to the PCI subsystem delaying the write.
We already see this problem with pure PCI devices. The standard
solution with PCI devices is to issue a PCI read after the write, to
flush the write.
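The read-after-write flush pattern described here can be sketched with mocked MMIO (Alan notes below that port I/O, unlike memory I/O, is not posted, so this models the pure-PCI memory case only). A posted write sits in a buffer until a read to the same device forces it out, which is why drivers follow a write with a dummy read of any device register.

```c
#include <assert.h>

static int posted_val = -1;   /* write sitting in the bridge buffer */
static int device_reg;        /* value actually visible at the device */

/* Stand-in for an MMIO write: may be posted, i.e. buffered. */
static void mmio_write(int val)
{
    posted_val = val;
}

/* Stand-in for an MMIO read: a read to the device forces any prior
 * posted write out of the buffer before completing. */
static int mmio_read(void)
{
    if (posted_val != -1) {
        device_reg = posted_val;   /* the read flushes the write */
        posted_val = -1;
    }
    return device_reg;
}
```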
Over the PCI-ISA bridge, some transaction is needed to flush the first
write, unfortunately, if we want to guarantee the same delays that are
there at the moment.
I think that means we need to keep write to port 0x80, otherwise we
will be changing the behaviour of the many legacy drivers which use _p
operators.
-- Jamie
[email protected] writes:
> > So can we gradually kill inb_p, outb_p in 2.6? And the other
> > miscellaneous users of I/O port 0x80 for I/O delays?
>
> Actually, it's not easy. The issue got debated a lot a few years ago.
> A read is also acceptable, and allows a few more ports to be
> potentially used, but that corrupts %al and thus bloats the code.
It bloats the code a lot less than udelay() calls or any other
solution which keeps the delay!
In the worst case, the bloat from a read _should_ be two bytes: "push
%eax; inb $0x80,%al; pop %eax". Whereas a call to udelay is 5 bytes,
for a call instruction.
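These byte counts can be checked against the standard x86 encodings (stated here from the instruction set, not taken from any kernel source): `out %al,$imm8` is 0xE6 ib (2 bytes), `in $imm8,%al` is 0xE4 ib (2 bytes), `push %eax`/`pop %eax` are 0x50/0x58 (1 byte each), and `call rel32` is 0xE8 plus a 4-byte displacement (5 bytes).

```c
#include <assert.h>

/* Encodings of the three delay alternatives under discussion. */
static const unsigned char write_delay[] =
    { 0xE6, 0x80 };                 /* out %al,$0x80            : 2 bytes */
static const unsigned char read_delay[] =
    { 0x50, 0xE4, 0x80, 0x58 };     /* push %eax; in $0x80,%al;
                                       pop %eax                 : 4 bytes */
static const unsigned char call_insn[] =
    { 0xE8, 0x00, 0x00, 0x00, 0x00 }; /* call rel32 (to udelay) : 5 bytes */
```

So the read-based delay costs two bytes more than the write-based one at every call site, while a udelay() call costs at least five, before any argument setup.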
-- Jamie
Arjan van de Ven wrote:
> > > The first person to complain about the extra branch miss in udelay for
> > > this will get laughed at by me ;)
> >
> > udelay(1) is too slow on a 386 even without the branch miss.
>
> ok we have ndelay() now as well in 2.6
I was sort of joking, but the point is that a function call might be
too slow on a 386.
On a 386, a function call for every byte transferred to/from the
floppy disk might show up as a real overhead - this occurred to me
because someone once thought it was worth rewriting the floppy data
transfer code in assembly language, and presumably it did improve
performance at the time.
Of course the answer is to not use udelay() on 386-optimised
configurations.
-- Jamie
On Llu, 2003-09-22 at 22:37, Jamie Lokier wrote:
> At the moment, outb_p followed by inb will be seen by the ISA device
> as "write, write, read," giving the ISA device time to propagate state
> changes between the end of the first write and the beginning of the
> read.
This isn't MMIO so it won't be posted.
On Llu, 2003-09-22 at 20:00, Jamie Lokier wrote:
> > CS5520 is one example. Also VIA VP2 seems to care but only very very
> > occasionally. On my 386 board it's reliably borked without the delays
> > (not sure what chipset, and it's ISA so harder to tell).
>
> Yeah, but what's the problem and can it be detected? :)
You get bogus results.
> I'm wondering if there's a way to detect how much udelay is needed on
> a particular board, and reduce or remove it on boards where it isn't
> needed.
8 ISA cycles will be nice and safe - see the specification for the ISA
bus. It's a nice, easy, known value, and it's how we moved various
drivers that used ISA delay cycles for timing loops over to udelay.
> udelay() is also unreliable nowadays, due to CPUs changing clock
> speeds according to the whims of the BIOS. On laptops, even the rdtsc
> rate varies. If the delay is critical to system reliability for
> unknown reasons, then switching to udelay() removes some of that "we
> always did this and it fixed the unknown problems" legacy driver safety.
Delaying too long is ok, delaying too little isn't good, but as you say
most modern hardware seems not to care.
On Llu, 2003-09-22 at 19:58, Eric W. Biederman wrote:
> Alan, can you describe a little more what the original delay is needed
> for? I don't see it documented in my 8254 data sheet. The better I
> can understand the problem the better I can write the comments on this
> magic bit of code as I fix it.
If I remember rightly it's because it is a 2MHz part on an 8MHz bus.
> The oldest machine I have is a 386 MCA system. Any chance of the bug
> showing up there? I'd love to have a test case.
No idea
> Another reason for fixing this is we are killing who knows how much
> I/O bandwidth with this stream of failing writes to port 0x80.
Definitely - and if we can boot up with udelay() using some pessimal
value then we don't even need the branch, just the udelay for the 8 ISA
cycles.
Alan
In article <[email protected]>,
Eric W. Biederman <[email protected]> wrote:
| Jamie Lokier <[email protected]> writes:
| > Unfortunately, there are a lot of drivers, and a lot of x86
| > arch-specific code, which use the delay operators. There's no real
| > way to verify that all the drivers are fine when the delay is reduced
| > or removed.
|
| We just need something sufficiently good. If the delay is removed
| on a system that needs it someone will complain.
The only problem with that is that (a) a complaint and a dollar
will get you a cheap beer, but this is Linux and no one *needs* to fix
it, therefore not breaking it becomes more important. The top developers
are not running legacy 386's, I bet. And (b) if the problem comes up
months from now, will anyone think to try timing changes for "every once
in a while" problems?
I really like the isa_delay() idea, or similar, which will be in a
single place and probably get enough attention to make it work. It just
sounds like a safer way to go with equal benefits.
--
bill davidsen <[email protected]>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.
In article <[email protected]>,
Alan Cox <[email protected]> wrote:
| On Llu, 2003-09-22 at 19:58, Eric W. Biederman wrote:
| > Alan, can you describe a little more what the original delay is needed
| > for? I don't see it documented in my 8254 data sheet. The better I
| > can understand the problem the better I can write the comments on this
| > magic bit of code as I fix it.
|
| If I remember rightly it's because it is a 2MHz part on an 8MHz bus.
And I thought I was a hotshot overclocker ;-)
--
bill davidsen <[email protected]>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.
In article <[email protected]>,
Jamie Lokier <[email protected]> wrote:
| Arjan van de Ven wrote:
| > The first person to complain about the extra branch miss in udelay for
| > this will get laughed at by me ;)
|
| udelay(1) is too slow on a 386 even without the branch miss.
|
| If you think I/O operations are infinitely slower than other
| instructions, please explain why there is asm-optimised I/O code in
| asm-i386/floppy.h.
|
| :)
The choices are:
1 - there really were some old crappy chips which were both slow and
timing sensitive
2 - someone thought that would optimize access
3 - gcc of the time generated bad code if you didn't
--
bill davidsen <[email protected]>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.
In article <[email protected]>,
Jamie Lokier <[email protected]> wrote:
| [email protected] writes:
| > > So can we gradually kill inb_p, outb_p in 2.6? And the other
| > > miscellaneous users of I/O port 0x80 for I/O delays?
| >
| > Actually, it's not easy. The issue got debated a lot a few years ago.
| > A read is also acceptable, and allows a few more ports to be
| > potentially used, but that corrupts %al and thus bloats the code.
|
| It bloats the code a lot less than udelay() calls or any other
| solution which keeps the delay!
|
| In the worst case, the bloat from a read _should_ be two bytes: "push
| %eax; inb $80,%al; pop %eax". Whereas a call to udelay is 5 bytes,
| for a call instruction.
Isn't one of the benefits of a rethink not to use any i/o bus cycles?
--
bill davidsen <[email protected]>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.
bill davidsen wrote:
>
> Isn't one of the benefits of a rethink not to use any i/o bus cycles?
I wouldn't worry about the bloat as much as I do about synchronization.
Doing I/O to an ISA device not only causes a delay, but tends to actually
force the PCI forwarding buffers to flush.
Of course, IOIO shouldn't be buffered anyway, and if we wanted to flush
stuff we'd actually be better with a read, so..
But what we _could_ do is to make "inb_p()" be more like this
#define inb_p(port) ({ unsigned char val; \
	asm volatile("call __inb_p" \
		:"=a" (val) \
		:"d" ((unsigned short)(port))); \
	val; })
where we call to an out-of-line function with a magic calling convention (so
that it doesn't flush the register state like a normal call would).
That would likely shrink the code, and it would mean that we could more
easily play with what we do in the delay case (including deciding the code
at boot-time).
Anybody want to try that?
Linus