commit b2a10c0b8e6c1c73b940e60fae4cbe9db9ca9e3b
Author: Rene Herman <[email protected]>
Date: Mon Dec 17 21:23:55 2007 +0100
x86: provide a DMI based port 0x80 I/O delay override.
Certain (HP/Compaq) laptops experience trouble from our port 0x80
I/O delay writes. This patch provides for a DMI based switch to the
"alternate diagnostic port" 0xed (as used by some BIOSes as well)
for these.
David P. Reed confirmed that using port 0xed works and provides a
proper delay on his HP Pavilion dv9000z, Islam Amer confirmed that
it does so on a Compaq Presario V6000. Both are Quanta boards, types
30B9 and 30B7 respectively, and are the (only) machines for which
the DMI based switch triggers. The HP Pavilion dv6000z is expected to
also need this but its DMI info hasn't been verified yet.
The symptom of _not_ working is a hanging machine, with "hwclock"
use being a direct trigger; as a consequence, bootup itself often
hangs on these machines.
Earlier versions of this patch attempted to simply use udelay(2),
with the 2 being a value tested to be a nicely conservative
upper-bound with help from many on the linux-kernel mailing list,
but that approach has two problems.
First, before loops_per_jiffy calibration (which happens after PIT
init, while some PIT implementations are among the historically
problematic devices that need the delay), udelay() isn't
particularly well-defined. We could initialise loops_per_jiffy
conservatively (and based on CPU family, so as to not unduly delay
old machines), which would sort of work, but that still leaves:
Second, delaying isn't the only effect that a write to port 0x80
has. It's also a PCI posting barrier which some devices may be
explicitly or implicitly relying on. Alan Cox did a survey and
found evidence that various drivers are additionally racy on SMP
without the bus-locking outb.
Switching to an inb() makes the timing too unpredictable and as
such, this DMI based switch should be the safest approach for now.
Any more invasive changes should get more rigorous testing first.
Moreover, only very few machines have the problem, and a DMI based
hack seems to fit that situation.
An early boot parameter to make the choice manually (and override any
possible DMI based decision) is also provided:
io_delay=standard|alternate
This does not change the io_delay() in the boot code, which uses
the same port 0x80 I/O delay, but those do not appear to be a
problem, as tested by David P. Reed. He moreover reported that
booting with "acpi=off" also fixed things, and since ACPI isn't
touched until after this DMI based I/O port switch, leaving the
ones in the boot code alone is safe.
This patch is partly based on earlier patches from Pavel Machek and
David P. Reed.
Signed-off-by: Rene Herman <[email protected]>
Tested-by: David P. Reed <[email protected]>
Tested-by: Islam Amer <[email protected]>
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 33121d6..6948e25 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -785,6 +785,12 @@ and is between 256 and 4096 characters. It is defined in the file
for translation below 32 bit and if not available
then look in the higher range.
+ io_delay= [X86-32,X86-64] I/O delay port
+ standard
+ Use the 0x80 standard I/O delay port (default)
+ alternate
+ Use the 0xed alternate I/O delay port
+
io7= [HW] IO7 for Marvel based alpha systems
See comment before marvel_specify_io7 in
arch/alpha/kernel/core_marvel.c.
diff --git a/arch/x86/boot/compressed/misc_32.c b/arch/x86/boot/compressed/misc_32.c
index b74d60d..288e162 100644
--- a/arch/x86/boot/compressed/misc_32.c
+++ b/arch/x86/boot/compressed/misc_32.c
@@ -276,10 +276,10 @@ static void putstr(const char *s)
RM_SCREEN_INFO.orig_y = y;
pos = (x + cols * y) * 2; /* Update cursor position */
- outb_p(14, vidport);
- outb_p(0xff & (pos >> 9), vidport+1);
- outb_p(15, vidport);
- outb_p(0xff & (pos >> 1), vidport+1);
+ outb(14, vidport);
+ outb(0xff & (pos >> 9), vidport+1);
+ outb(15, vidport);
+ outb(0xff & (pos >> 1), vidport+1);
}
static void* memset(void* s, int c, unsigned n)
diff --git a/arch/x86/boot/compressed/misc_64.c b/arch/x86/boot/compressed/misc_64.c
index 6ea015a..43e5fcc 100644
--- a/arch/x86/boot/compressed/misc_64.c
+++ b/arch/x86/boot/compressed/misc_64.c
@@ -269,10 +269,10 @@ static void putstr(const char *s)
RM_SCREEN_INFO.orig_y = y;
pos = (x + cols * y) * 2; /* Update cursor position */
- outb_p(14, vidport);
- outb_p(0xff & (pos >> 9), vidport+1);
- outb_p(15, vidport);
- outb_p(0xff & (pos >> 1), vidport+1);
+ outb(14, vidport);
+ outb(0xff & (pos >> 9), vidport+1);
+ outb(15, vidport);
+ outb(0xff & (pos >> 1), vidport+1);
}
static void* memset(void* s, int c, unsigned n)
diff --git a/arch/x86/kernel/Makefile_32 b/arch/x86/kernel/Makefile_32
index a7bc93c..0cc1981 100644
--- a/arch/x86/kernel/Makefile_32
+++ b/arch/x86/kernel/Makefile_32
@@ -8,7 +8,7 @@ CPPFLAGS_vmlinux.lds += -Ui386
obj-y := process_32.o signal_32.o entry_32.o traps_32.o irq_32.o \
ptrace_32.o time_32.o ioport_32.o ldt_32.o setup_32.o i8259_32.o sys_i386_32.o \
pci-dma_32.o i386_ksyms_32.o i387_32.o bootflag.o e820_32.o\
- quirks.o i8237.o topology.o alternative.o i8253.o tsc_32.o
+ quirks.o i8237.o topology.o alternative.o i8253.o tsc_32.o io_delay.o
obj-$(CONFIG_STACKTRACE) += stacktrace.o
obj-y += cpu/
diff --git a/arch/x86/kernel/Makefile_64 b/arch/x86/kernel/Makefile_64
index 5a88890..08a68f0 100644
--- a/arch/x86/kernel/Makefile_64
+++ b/arch/x86/kernel/Makefile_64
@@ -11,7 +11,7 @@ obj-y := process_64.o signal_64.o entry_64.o traps_64.o irq_64.o \
x8664_ksyms_64.o i387_64.o syscall_64.o vsyscall_64.o \
setup64.o bootflag.o e820_64.o reboot_64.o quirks.o i8237.o \
pci-dma_64.o pci-nommu_64.o alternative.o hpet.o tsc_64.o bugs_64.o \
- i8253.o
+ i8253.o io_delay.o
obj-$(CONFIG_STACKTRACE) += stacktrace.o
obj-y += cpu/
diff --git a/arch/x86/kernel/io_delay.c b/arch/x86/kernel/io_delay.c
new file mode 100644
index 0000000..77a8bcd
--- /dev/null
+++ b/arch/x86/kernel/io_delay.c
@@ -0,0 +1,77 @@
+/*
+ * I/O delay strategies for inb_p/outb_p
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/dmi.h>
+#include <asm/io.h>
+
+/*
+ * Allow for a DMI based override of port 0x80
+ */
+#define IO_DELAY_PORT_STD 0x80
+#define IO_DELAY_PORT_ALT 0xed
+
+static unsigned short io_delay_port __read_mostly = IO_DELAY_PORT_STD;
+
+void native_io_delay(void)
+{
+ asm volatile ("outb %%al, %w0" : : "d" (io_delay_port));
+}
+EXPORT_SYMBOL(native_io_delay);
+
+static int __init dmi_io_delay_port_alt(const struct dmi_system_id *id)
+{
+ printk(KERN_NOTICE "%s: using alternate I/O delay port\n", id->ident);
+ io_delay_port = IO_DELAY_PORT_ALT;
+ return 0;
+}
+
+static struct dmi_system_id __initdata dmi_io_delay_port_alt_table[] = {
+ {
+ .callback = dmi_io_delay_port_alt,
+ .ident = "Compaq Presario V6000",
+ .matches = {
+ DMI_MATCH(DMI_BOARD_VENDOR, "Quanta"),
+ DMI_MATCH(DMI_BOARD_NAME, "30B7")
+ }
+ },
+ {
+ .callback = dmi_io_delay_port_alt,
+ .ident = "HP Pavilion dv9000z",
+ .matches = {
+ DMI_MATCH(DMI_BOARD_VENDOR, "Quanta"),
+ DMI_MATCH(DMI_BOARD_NAME, "30B9")
+ }
+ },
+ {
+ }
+};
+
+static int __initdata io_delay_override;
+
+static int __init io_delay_param(char *s)
+{
+ if (!s)
+ return -EINVAL;
+
+ if (!strcmp(s, "standard"))
+ io_delay_port = IO_DELAY_PORT_STD;
+ else if (!strcmp(s, "alternate"))
+ io_delay_port = IO_DELAY_PORT_ALT;
+ else
+ return -EINVAL;
+
+ io_delay_override = 1;
+ return 0;
+}
+
+early_param("io_delay", io_delay_param);
+
+void __init io_delay_init(void)
+{
+ if (!io_delay_override)
+ dmi_check_system(dmi_io_delay_port_alt_table);
+}
diff --git a/arch/x86/kernel/setup_32.c b/arch/x86/kernel/setup_32.c
index e1e18c3..6c3a3b4 100644
--- a/arch/x86/kernel/setup_32.c
+++ b/arch/x86/kernel/setup_32.c
@@ -648,6 +648,8 @@ void __init setup_arch(char **cmdline_p)
dmi_scan_machine();
+ io_delay_init();
+
#ifdef CONFIG_X86_GENERICARCH
generic_apic_probe();
#endif
diff --git a/arch/x86/kernel/setup_64.c b/arch/x86/kernel/setup_64.c
index 30d94d1..ec976ed 100644
--- a/arch/x86/kernel/setup_64.c
+++ b/arch/x86/kernel/setup_64.c
@@ -311,6 +311,8 @@ void __init setup_arch(char **cmdline_p)
dmi_scan_machine();
+ io_delay_init();
+
#ifdef CONFIG_SMP
/* setup to use the static apicid table during kernel startup */
x86_cpu_to_apicid_ptr = (void *)&x86_cpu_to_apicid_init;
diff --git a/include/asm-x86/io_32.h b/include/asm-x86/io_32.h
index fe881cd..690b8f4 100644
--- a/include/asm-x86/io_32.h
+++ b/include/asm-x86/io_32.h
@@ -250,10 +250,8 @@ static inline void flush_write_buffers(void)
#endif /* __KERNEL__ */
-static inline void native_io_delay(void)
-{
- asm volatile("outb %%al,$0x80" : : : "memory");
-}
+extern void io_delay_init(void);
+extern void native_io_delay(void);
#if defined(CONFIG_PARAVIRT)
#include <asm/paravirt.h>
diff --git a/include/asm-x86/io_64.h b/include/asm-x86/io_64.h
index a037b07..b2d4994 100644
--- a/include/asm-x86/io_64.h
+++ b/include/asm-x86/io_64.h
@@ -35,13 +35,18 @@
* - Arnaldo Carvalho de Melo <[email protected]>
*/
-#define __SLOW_DOWN_IO "\noutb %%al,$0x80"
+extern void io_delay_init(void);
+extern void native_io_delay(void);
+static inline void slow_down_io(void)
+{
+ native_io_delay();
#ifdef REALLY_SLOW_IO
-#define __FULL_SLOW_DOWN_IO __SLOW_DOWN_IO __SLOW_DOWN_IO __SLOW_DOWN_IO __SLOW_DOWN_IO
-#else
-#define __FULL_SLOW_DOWN_IO __SLOW_DOWN_IO
+ native_io_delay();
+ native_io_delay();
+ native_io_delay();
#endif
+}
/*
* Talk about misusing macros..
@@ -50,21 +55,21 @@
#define __OUT1(s,x) \
static inline void out##s(unsigned x value, unsigned short port) {
#define __OUT2(s,s1,s2) \
-__asm__ __volatile__ ("out" #s " %" s1 "0,%" s2 "1"
+__asm__ __volatile__ ("out" #s " %" s1 "0,%" s2 "1" : : "a" (value), "Nd" (port))
#define __OUT(s,s1,x) \
-__OUT1(s,x) __OUT2(s,s1,"w") : : "a" (value), "Nd" (port)); } \
-__OUT1(s##_p,x) __OUT2(s,s1,"w") __FULL_SLOW_DOWN_IO : : "a" (value), "Nd" (port));} \
+__OUT1(s,x) __OUT2(s,s1,"w"); } \
+__OUT1(s##_p,x) __OUT2(s,s1,"w"); slow_down_io(); }
#define __IN1(s) \
static inline RETURN_TYPE in##s(unsigned short port) { RETURN_TYPE _v;
#define __IN2(s,s1,s2) \
-__asm__ __volatile__ ("in" #s " %" s2 "1,%" s1 "0"
+__asm__ __volatile__ ("in" #s " %" s2 "1,%" s1 "0" : "=a" (_v) : "Nd" (port))
-#define __IN(s,s1,i...) \
-__IN1(s) __IN2(s,s1,"w") : "=a" (_v) : "Nd" (port) ,##i ); return _v; } \
-__IN1(s##_p) __IN2(s,s1,"w") __FULL_SLOW_DOWN_IO : "=a" (_v) : "Nd" (port) ,##i ); return _v; } \
+#define __IN(s,s1) \
+__IN1(s) __IN2(s,s1,"w"); return _v; } \
+__IN1(s##_p) __IN2(s,s1,"w"); slow_down_io(); return _v; }
#define __INS(s) \
static inline void ins##s(unsigned short port, void * addr, unsigned long count) \
On Sun, 30 Dec 2007, Rene Herman wrote:
>
> This fixes "hwclock" triggered boottime hangs for a few HP/Compaq laptops
> and might as such be applicable to 2.6.24 still.
It's not a regression as far as I can see (ie we've always done that port
80 access for slow-down), and quite frankly, I think the code is horribly
ugly.
Using a DMI quirk for something like this is just not maintainable. Are we
going to live with doing new quirks forever? I'd rather just remove the
slowdown entirely (obviously that is not for 2.6.24 either, though!), and
drivers that then are shown to really need it could use their *own* ports.
Linus
> drivers that then are shown to really need it could use their *own* ports.
The i8259 driver uses it and it is known to be needed on some old chipsets.
But it doesn't really have any "own" ports to use afaik.
-Andi
* Linus Torvalds <[email protected]> wrote:
> > This fixes "hwclock" triggered boottime hangs for a few HP/Compaq
> > laptops and might as such be applicable to 2.6.24 still.
>
> It's not a regression as far as I can see (ie we've always done that
> port 80 access for slow-down), and quite frankly, I think the code is
> horribly ugly.
>
> Using a DMI quirk for something like this is just not maintainable.
> Are we going to live with doing new quirks forever? I'd rather just
> remove the slowdown entirely (obviously that is not for 2.6.24 either,
> though!), and drivers that then are shown to really need it could use
> their *own* ports.
yep, that's exactly the plan: in x86.git we've got it all set up so that
we can switch over to ioport=nodelay by default in v2.6.25, and then get
rid of all the iodelay infrastructure in 2.6.26 altogether if things
work out fine (which is the expectation from all test feedback so far).
Ingo
* Andi Kleen <[email protected]> wrote:
> > drivers that then are shown to really need it could use their *own*
> > ports.
>
> The i8259 driver uses it and it is known to be needed on some old
> chipsets. But it doesn't really have any "own" ports to use afaik.
we'll solve that via an i8259-specific quirk. That is a lot cleaner and
maintainable than the current generic, always-enabled "opt out"
port-0x80 quirk.
Ingo
On 30-12-07 10:30, Linus Torvalds wrote:
> On Sun, 30 Dec 2007, Rene Herman wrote:
>> This fixes "hwclock" triggered boottime hangs for a few HP/Compaq laptops
>> and might as such be applicable to 2.6.24 still.
>
> It's not a regression as far as I can see (ie we've always done that port
> 80 access for slow-down), and quite frankly, I think the code is horribly
> ugly.
It is indeed not a regression. I submitted it as a stop-gap measure for
those specific afflicted machines, but I guess they'll mostly be able to
google up the problem and patch by now as well.
> Using a DMI quirk for something like this is just not maintainable. Are we
> going to live with doing new quirks forever? I'd rather just remove the
> slowdown entirely (obviously that is not for 2.6.24 either, though!), and
> drivers that then are shown to really need it could use their *own* ports.
And yes, "elegant" it is neither. It's a bit of a pesky problem though. Port
0x80 is a decidedly non-random port selection in so far that it's just about
the only available port with guaranteed (in a PC sense) effects -- various
chipsets make specific efforts to forward port 0x80 writes onto ISA due to
its use as a POST port by the PC BIOS meaning the outb outside its bus-level
effects also has fairly well defined timing characteristics. In practice, a
udelay(2) is going to satisfy the delay property though -- but doesn't do
anything for the other things the outb() does.
The legacy PIT, PIC, DMA and KB controllers have been mentioned in this
and previous incarnations of this same thread as hardware that in some
implementations needs the outb to function properly but of course, no
_sane_ implementations do. With an arch that purports to support just
about anything, though, there's some fairly justified fear, uncertainty
and doubt that the ones to break aren't going to be found and reported
quickly/easily. In itself, that could mean it's also not something to be
overly worried about, but it's still not nice.
Given the various races in (legacy) drivers, an early suggestion by
Andi Kleen to leave the outb in place for a DMI year < X (or no DMI
available) and just do nothing for > X might in fact be justified.
Rene.
> slowdown entirely (obviously that is not for 2.6.24 either, though!), and
> drivers that then are shown to really need it could use their *own* ports.
*No* - that is the one thing they cannot do. The _p cycles on ISA for 2MHz
parts on a standard ISA bus need the delay to come off another device.
For modern systems we should just use tsc delays, but we have to fix all
the drivers first as right now 0x80 causes posting and we have some PCI
users (I think probably all bogus), and we need to fix the tons of
locking errors that are mostly covered by the inb 0x80 being an
indivisible operation so not getting split by interrupts/SMP.
I've been going through the drivers that use it - the biggest mess
appears to be in the watchdog drivers all of which copied an original
lack of locking from the mid 1990s caused by umm.. me. I guess my past is
catching up with me ;)
Some of the ISA network users (like the scc driver) are going to be quite
foul to fix but most of it looks quite sane.
The X server also appears to touch 0x80 in some cases but we can hope
only on ancient hardware.
Alan
* Alan Cox <[email protected]> wrote:
> For modern systems we should just use tsc delays, but we have to fix
> all the drivers first as right now 0x80 causes posting and we have
> some PCI users (I think probably all bogus), and we need to fix the
> tons of locking errors that are mostly covered by the inb 0x80 being
> an indivisible operation so not getting split by interrupts/SMP.
i dont get your last point. Firstly, we do an "outb $0x80" not an inb.
Secondly, outb $0x80 has no PCI posting side-effects AFAICS. Thirdly,
even assuming that it has PCI posting side-effects, how can any locking
error be covered up by an outb 0x80 sticking together with the inb it
does before it? The sequence we emit is:
inb $some_port
outb $0x80
and i see that the likelihood of getting such sequences from two CPUs
'mixed up' is low, but how can this have any smp locking side-effects?
How can this provide any workaround/coverup?
> I've been going through the drivers that use it - the biggest mess
> appears to be in the watchdog drivers all of which copied an original
> lack of locking from the mid 1990s caused by umm.. me. I guess my past
> is catching up with me ;)
heh :-)
> The X server also appears to touch 0x80 in some cases but we can hope
> only on ancient hardware.
do you have any memories about the outb_p() use of misc_32.c:
pos = (x + cols * y) * 2; /* Update cursor position */
outb_p(14, vidport);
outb_p(0xff & (pos >> 9), vidport+1);
outb_p(15, vidport);
outb_p(0xff & (pos >> 1), vidport+1);
was this ever needed? This is so early in the bootup that we cannot
do any sensible delay. Perhaps we could try a natural delay sequence via
inb from 0x3cc:
outb(14, vidport);
inb(0x3cc); /* delay */
outb(0xff & (pos >> 9), vidport+1);
inb(0x3cc); /* delay */
outb(15, vidport);
inb(0x3cc); /* delay */
outb(0xff & (pos >> 1), vidport+1);
inb(0x3cc); /* delay */
as a dummy delay (totally untested).
Reading from the 0x3cc port does not impact the cursor position update
sequence IIRC - i think the vidport is even ignored for the input
direction by most hardware, there's a separate input register. The 0x3cc
port is a well-defined VGA register which should be unused on non-VGA
hardware. (which makes it a perfect delay register in any case)
Ingo
> i dont get your last point. Firstly, we do an "outb $0x80" not an inb.
outb not inb sorry yes
> Secondly, outb $0x80 has no PCI posting side-effects AFAICS. Thirdly,
It does. The last mmio write cycle to the bridge gets pushed out before
the 0x80 cycle goes to the PCI bridge, times out and goes to the LPC bus.
I still don't believe any of our _p users in PCI space are actually real
- but someone needs to look at the scsi ones.
> even assuming that it has PCI posting side-effects, how can any locking
> error be covered up by an outb 0x80 sticking together with the inb it
> does before it? The sequence we emit is:
>
> inb $some_port
> outb $0x80
>
> and i see that the likelihood of getting such sequences from two CPUs
> 'mixed up' is low, but how can this have any smp locking side-effects?
> How can this provide any workaround/coverup?
We issue inb port
We issue outb 0x80
The CPU core stalls and the LPC bus stalls
On the other CPU we issue another access to the LPC bus because our
locking is wrong. With the 0x80 outb use this stalls so the delay is
applied unless the two inb's occur perfectly in time. With a udelay() the
udelay can be split and we get a second access which breaks the needed
device delay. We end up relying on the bus locking non splitting
properties of the 0x80 port access to paper over bugs - see the watchdog
fix example I sent you about a week ago.
That btw is another argument for removing 0x80 usage as much as possible
- it's bad for real time.
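[The watchdog fix Alan mentions was not included in the thread. As a
minimal sketch of the class of fix he describes -- explicit locking
around a multi-step port sequence; WDT_PORT and the two command values
are hypothetical, not taken from any real driver:]

/*
 * Two-step port sequence of the kind Alan describes. With the
 * outb-0x80 delay the two steps travel the bus back-to-back; with a
 * plain udelay() another CPU can slip its own access in between, so
 * the driver needs its own locking.
 */
#include <linux/spinlock.h>
#include <asm/io.h>

#define WDT_PORT   0x443                /* hypothetical device port */
#define WDT_UNLOCK 0x80                 /* hypothetical unlock command */
#define WDT_PING   0x01                 /* hypothetical ping command */

static DEFINE_SPINLOCK(wdt_lock);       /* the locking that was missing */

static void wdt_ping(void)
{
        unsigned long flags;

        spin_lock_irqsave(&wdt_lock, flags);
        outb_p(WDT_UNLOCK, WDT_PORT);   /* step 1: unlock the device */
        outb_p(WDT_PING, WDT_PORT);     /* step 2: must directly follow 1 */
        spin_unlock_irqrestore(&wdt_lock, flags);
}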
> do you have any memories about the outb_p() use of misc_32.c:
>
> pos = (x + cols * y) * 2; /* Update cursor position */
> outb_p(14, vidport);
> outb_p(0xff & (pos >> 9), vidport+1);
> outb_p(15, vidport);
> outb_p(0xff & (pos >> 1), vidport+1);
>
> > was this ever needed? This is so early in the bootup that we cannot
None - but we don't care. The problems with 0x80 and the wacko HP systems
occur once ACPI is enabled so we are fine using 0x80. I don't myself know
why the _p versions ended up being used. A rummage through archives found
me nothing useful on this but notes that outb not outw is required for
some devices.
For that matter does anyone actually have video cards old enough for us
to care actually still in use with Linux today ?
Alan
On 30-12-07 16:28, Ingo Molnar wrote:
> Reading from the 0x3cc port does not impact the cursor position update
> sequence IIRC - i think the vidport is even ignored for the input
> direction by most hardware, there's a separate input register. The 0x3cc
> port is a well-defined VGA register which should be unused on non-VGA
> hardware. (which makes it a perfect delay register in any case)
Hardly. Duron 1300 on AMD756:
rene@7ixe4:~/src/port80$ su -c ./port80
cycles: out 2400, in 2401
rene@7ixe4:~/src/port80$ su -c ./port3cc
cycles: out 459, in 394
As stated a few dozen times by now already, port 0x80 is _decidedly_ _non_
_random_
Rene.
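[The ./port80 and ./port3cc tools themselves were not posted in the
thread. A minimal userspace sketch of such a measurement -- assuming an
x86 CPU with a TSC and root privileges for iopl(); the file name and
loop count are made up -- could look like this:]

/* port-delay.c: rough cycles-per-access measurement, a hypothetical
 * reconstruction of the unposted ./port80 and ./port3cc tools.
 * Build: gcc -O2 -o port-delay port-delay.c; run as root. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/io.h>

static inline unsigned long long rdtsc(void)
{
        unsigned int lo, hi;

        asm volatile ("rdtsc" : "=a" (lo), "=d" (hi));
        return ((unsigned long long)hi << 32) | lo;
}

int main(int argc, char **argv)
{
        unsigned short port = argc > 1 ? strtoul(argv[1], NULL, 0) : 0x80;
        unsigned long long t0, t_out, t_in;
        int i, loops = 1000;

        if (iopl(3)) {                  /* raise the I/O privilege level */
                perror("iopl");
                return 1;
        }
        t0 = rdtsc();
        for (i = 0; i < loops; i++)
                outb(0, port);
        t_out = rdtsc() - t0;

        t0 = rdtsc();
        for (i = 0; i < loops; i++)
                (void)inb(port);
        t_in = rdtsc() - t0;

        printf("cycles: out %llu, in %llu\n", t_out / loops, t_in / loops);
        return 0;
}

[Running "./port-delay 0x80" and "./port-delay 0x3cc" would then give
numbers comparable to the two outputs quoted above.]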
* Alan Cox <[email protected]> wrote:
> > i dont get your last point. Firstly, we do an "outb $0x80" not an
> > inb.
>
> outb not inb sorry yes
>
> > Secondly, outb $0x80 has no PCI posting side-effects AFAICS.
> > Thirdly,
>
> It does. The last mmio write cycle to the bridge gets pushed out
> before the 0x80 cycle goes to the PCI bridge, times out and goes to
> the LPC bus.
ok. Is it more of a "gets flushed due to timing out", or a
specified-for-sure POST flushing property of all out 0x80 cycles going
to the PCI bridge? I thought PCI posting policy is up to the CPU, it can
delay PCI space writes arbitrarily (within reasonable timeouts) as long
as no read is done from the _same_ IO space address. Note that the port
0x80 cycle is neither a read, nor for the same address.
> I still don't believe any of our _p users in PCI space are actually
> real - but someone needs to look at the scsi ones.
i'm wondering, how safe would it be to just dumbly replace outb_p()
with:
out(port);
in(port);
in these drivers. Side-effects of inb() would not be unheard of for the
ancient IO ports, but for even relatively old SCSI hardware, would that
really be a problem?
this would give us explicit PCI posting.
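[At a call site the transformation would look something like this;
REG_PORT is a hypothetical register, and as Alan's reply below points
out, this is only safe if reading the port has no side effects:]

#include <asm/io.h>

#define REG_PORT 0x170                  /* hypothetical device register */

static void reg_write_flushed(unsigned char val)
{
        outb(val, REG_PORT);            /* was: outb_p(val, REG_PORT) */
        (void)inb(REG_PORT);            /* read-back flushes the posted write */
}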
> > even assuming that it has PCI posting side-effects, how can any locking
> > error be covered up by an outb 0x80 sticking together with the inb it
> > does before it? The sequence we emit is:
> >
> > inb $some_port
> > outb $0x80
> >
> > and i see that the likelihood of getting such sequences from two CPUs
> > 'mixed up' is low, but how can this have any smp locking side-effects?
> > How can this provide any workaround/coverup?
>
> We issue inb port
> We issue outb 0x80
>
> The CPU core stalls and the LPC bus stalls
>
> On the other CPU we issue another access to the LPC bus because our
> locking is wrong. With the 0x80 outb use this stalls so the delay is
> applied unless the two inb's occur perfectly in time. With a udelay()
> the udelay can be split and we get a second access which breaks the
> needed device delay. We end up relying on the bus locking non
> splitting properties of the 0x80 port access to paper over bugs - see
> the watchdog fix example I sent you about a week ago.
ah, i understand. So i guess a stupid udelay_serialized() which takes a
global spinlock would solve these sort of races? But i guess making them
more likely to trigger would lead to a better kernel in the end ...
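[Presumably something along these lines -- a sketch only, since no such
helper exists in the tree; note that it closes the SMP window but still
would not reproduce the bus-level stall the port 0x80 write provides:]

#include <linux/spinlock.h>
#include <linux/delay.h>

static DEFINE_SPINLOCK(udelay_lock);

/* A udelay() that cannot be interleaved with another CPU doing the same. */
static void udelay_serialized(unsigned long usecs)
{
        unsigned long flags;

        spin_lock_irqsave(&udelay_lock, flags);
        udelay(usecs);
        spin_unlock_irqrestore(&udelay_lock, flags);
}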
> > do you have any memories about the outb_p() use of misc_32.c:
> >
> > pos = (x + cols * y) * 2; /* Update cursor position */
> > outb_p(14, vidport);
> > outb_p(0xff & (pos >> 9), vidport+1);
> > outb_p(15, vidport);
> > outb_p(0xff & (pos >> 1), vidport+1);
> >
> > was this ever needed? This is so early in the bootup that we cannot
>
> None - but we don't care. The problems with 0x80 and the wacko HP
> systems occur once ACPI is enabled so we are fine using 0x80. I don't
> myself know why the _p versions ended up being used. A rummage through
> archives found me nothing useful on this but notes that outb not outw
> is required for some devices.
>
> For that matter does anyone actually have video cards old enough for
> us to care actually still in use with Linux today ?
we had port 0x80 removal patches floating in the past decade, and i'm
sure if it broke anything for sure we'd know about it. It was always
this "general scope impact" property of it that scared us away from
doing it - but we'll do the plunge in v2.6.25 and make io_delay=udelay
the default, hm? Thomas has a real 386DX system, if that doesnt break
then nothing would i guess ;-) We wont forget about needed PCI posting
driver fixups either because the _p() API use would still be in place.
By 2.6.26 we could remove all of them.
Ingo
On Sun, Dec 30, 2007 at 02:05:44PM +0100, Ingo Molnar wrote:
>
> * Andi Kleen <[email protected]> wrote:
>
> > > drivers that then are shown to really need it could use their *own*
> > > ports.
> >
> > The i8259 driver uses it and it is known to be needed on some old
> > chipsets. But it doesn't really have any "own" ports to use afaik.
>
> we'll solve that via an i8259-specific quirk. That is a lot cleaner and
> maintainable than the current generic, always-enabled "opt out"
> port-0x80 quirk.
You mean using pci quirks + udelay? It will probably be challenging to
collect PCI-IDs for that. And there might be old systems needing it
without PCI.
They likely won't have DMI either.
In theory you could make it a DMI year cut off of course (and assume old
if no DMI, although that happens occasionally with new systems too); but
that is generally considered ugly.
I don't think it's a big problem to keep delays of some form by default in 8259 --
people who care about performance should definitely be using APIC mode instead.
-Andi
* Rene Herman <[email protected]> wrote:
> On 30-12-07 16:28, Ingo Molnar wrote:
>
>> Reading from the 0x3cc port does not impact the cursor position update
>> sequence IIRC - i think the vidport is even ignored for the input
>> direction by most hardware, there's a separate input register. The 0x3cc
>> port is a well-defined VGA register which should be unused on non-VGA
>> hardware. (which makes it a perfect delay register in any case)
>
> Hardly. Duron 1300 on AMD756:
but that does not matter at all: that's not '90s era hardware that we
are (slightly) worried about wrt. IO delays in misc_32.c. (i.e. on
_real_ ISA systems)
> rene@7ixe4:~/src/port80$ su -c ./port80
> cycles: out 2400, in 2401
> rene@7ixe4:~/src/port80$ su -c ./port3cc
> cycles: out 459, in 394
of course: VGA is implemented in the southbridge or on the video
card, so it's much faster than a true ISA cycle.
the only (minor) worry we have here is really ancient systems relying on
delays there. Modern VGA hardware most definitely does not need any such
delays.
Ingo
On 30-12-07 17:07, Ingo Molnar wrote:
> * Rene Herman <[email protected]> wrote:
>
>> On 30-12-07 16:28, Ingo Molnar wrote:
>>
>>> hardware. (which makes it a perfect delay register in any case)
>> Hardly. Duron 1300 on AMD756:
>
> but that does not matter at all: that's not '90s era hardware that we
> are (slightly) worried about wrt. IO delays in misc_32.c. (i.e. on
> _real_ ISA systems)
Real ISA systems will also generally respond faster to it than the unused
port (this thing actually has an ISA bus but not VGA on it of course),
which means that "a perfect delay register" it is not. But yes, I have an
actual Am386DX-40 with ISA VGA up and running which also doesn't care
either way, about the ones in misc_32.c or anywhere else for that matter.
I myself have never seen anything actually care (using that machine
actively was in fact the reason I got involved), so don't get me wrong:
doing away with 0x80 use would be quite sensible. It's just that various
machines that _do_ need it (and which were reported to exist) are by now
gathering dust in basements and will not respond/test this in a timely
fashion. That, again, also means their possible regression might not be
considered all that serious, but still: if x86 is to support anything
under the sun, it's a sensible worry.
Rene.
* Andi Kleen <[email protected]> wrote:
> > > The i8259 driver uses it and it is known to be needed on some old
> > > chipsets. But it doesn't really have any "own" ports to use afaik.
> >
> > we'll solve that via an i8259-specific quirk. That is a lot cleaner
> > and maintainable than the current generic, always-enabled "opt out"
> > port-0x80 quirk.
>
> You mean using pci quirks + udelay? Will be probably challenging to
> collect PCI-IDs for that. And there might be old systems needing it
> without PCI. They likely won't have DMI either.
>
> In theory you could make it a DMI year cut off of course (and assume
> old if no DMI, although that happens occasionally with new systems
> too); but that is generally considered ugly.
>
> I don't think it's a big problem to keep delays of some form by
> default in 8259 -- people who care about performance should be
> definitely using APIC mode instead.
do you remember which old systems/chipsets were affected by this
problem? We had many - meanwhile fixed - PIC related problems, maybe
it's a red herring and the delay just papered it over.
Ingo
> ok. Is it more of a "gets flushed due to timing out", or a
> specified-for-sure POST flushing property of all out 0x80 cycles going
> to the PCI bridge? I thought PCI posting policy is up to the CPU, it can
> delay PCI space writes arbitrarily (within reasonable timeouts) as long
> as no read is done from the _same_ IO space address. Note that the port
> 0x80 cycle is neither a read, nor for the same address.
It's what appears to happen reliably on real computers.
> i'm wondering, how safe would it be to just dumbly replace outb_p()
> with:
>
> out(port);
> in(port);
Catastrophic I imagine. If the delay is for timing access then you've just
broken the timing; if the port has side effects you've just broken the
driver.
> in these drivers. Side-effects of inb() would not be unheard of for the
> ancient IO ports, but for even relatively old SCSI hardware, would that
> really be a problem?
The specific drivers need reviewing. There are very few uses in PCI space
so it's a minor job.
> ah, i understand. So i guess a stupid udelay_serialized() which takes a
> global spinlock would solve these sort of races? But i guess making them
> more likely to trigger would lead to a better kernel in the end ...
Better to just fix the drivers. I don't think that will take too many
days after everyone is back working.
> doing it - but we'll do the plunge in v2.6.25 and make io_delay=udelay
> the default, hm? Thomas has a real 386DX system, if that doesnt break
For processors with TSC I think we should aim for 2.6.25 to do this and
to have the major other _p fixups done. I pity whoever does stuff like
the scc drivers but most of the rest isn't too bad.
Alan
* Rene Herman <[email protected]> wrote:
>>> Hardly. Duron 1300 on AMD756:
>>
>> but that does not matter at all: that's not '90s era hardware that we
>> are (slightly) worried about wrt. IO delays in misc_32.c. (i.e. on
>> _real_ ISA systems)
>
> Real ISA systems will also generally respond faster to it than the
> unused port (this thing actually has an ISA bus but not VGA on it
> of course), which means that "a perfect delay register" it is not. But
> yes, I have an actual Am386DX-40 with ISA VGA up and running which
> also doesn't care either way, about the ones in misc_32.c or anywhere
> else for that matter.
yeah - and that's typical of most _p() use: most of them are totally
bogus, but the global existence of the delay was used as an "it _might_
break systems" boogey-man against replacing it.
so _IF_ we do any delay in x86 platform drivers, we at most do a delay
on the order of the round-trip latency to the same piece of hardware we
are handling. That isolates the quirk to the same hardware category,
instead of creating these cross-dependencies and assumed dependencies on
fixed, absolute timings. (and most hardware timing bugs are not absolute
but depend on some bus speed/frequency, thus round-trip latency of that
hardware is a good approximation of that. The round-trip to the same
hardware also correctly adds any assumed PCI posting dependencies.)
So the current plan is to go with an io_delay=udelay default in v2.6.25,
to give this a migration window, and io_delay=none in v2.6.26 [and a
complete removal of arch/x86/kernel/io_delay.c], once the _p() uses are
fixed up. This is gradual enough to notice any regressions we care about
and also makes it nicely bisectable and gradual.
Ingo
On Sunday 30 December 2007 16:38, Alan Cox wrote:
> > do you have any memories about the outb_p() use of misc_32.c:
> >
> > pos = (x + cols * y) * 2; /* Update cursor position */
> > outb_p(14, vidport);
> > outb_p(0xff & (pos >> 9), vidport+1);
> > outb_p(15, vidport);
> > outb_p(0xff & (pos >> 1), vidport+1);
> >
> > was this ever needed? This is so early in the bootup that we cannot
>
> None - but we don't care.
Was this embedded outb to 0x80 for delay only? Maybe I'm wrong. But in the
case above it forces the chipselect signal to deselect the hardware between
the access to vidport and vidport+1. Some devices need this to latch the
values correctly. Otherwise the chipselect signal would be active for all
four accesses in the example above (and only data and addresses are changing
from device's view).
Juergen
* Alan Cox <[email protected]> wrote:
> > ah, i understand. So i guess a stupid udelay_serialized() which
> > takes a global spinlock would solve these sort of races? But i guess
> > making them more likely to trigger would lead to a better kernel in
> > the end ...
>
> Better to just fix the drivers. I don't think that will take too many
> days after everyone is back working.
ok.
> > doing it - but we'll do the plunge in v2.6.25 and make
> > io_delay=udelay the default, hm? Thomas has a real 386DX system, if
> > that doesnt break
>
> For processors with TSC I think we should aim for 2.6.25 to do this
> and to have the major other _p fixups done. I pity whoever does stuff
> like the scc drivers but most of the rest isn't too bad.
ok, sounds good to me. The current io_delay= stuff for v2.6.25 is
already shaped as a debugging/transition helper, towards complete
elimination of _p() uses.
Ingo
Ingo Molnar <[email protected]> wrote:
> do you have any memories about the outb_p() use of misc_32.c:
>
> pos = (x + cols * y) * 2; /* Update cursor position */
> outb_p(14, vidport);
> outb_p(0xff & (pos >> 9), vidport+1);
> outb_p(15, vidport);
> outb_p(0xff & (pos >> 1), vidport+1);
>
> was this ever needed? This is so early in the bootup that we cannot
> do any sensible delay. Perhaps we could try a natural delay sequence via
> inb from 0x3cc:
>
> outb(14, vidport);
> inb(0x3cc); /* delay */
> outb(0xff & (pos >> 9), vidport+1);
I've never seen code which would do that, and it was not suggested by any
tutorial I ever saw. I'd expect any machine to break on all kinds of software
if it required this. The only thing I remember being warned about is writing
the index and the data register at the same time using outw, because that
would write both registers at the same time on 16-bit-cards.
BTW: The error function in linux-2.6.23/arch/i386/boot/compressed/misc.c
uses while(1) without cpu_relax() in order to halt the machine. Is this fixed?
Should it be fixed?
On 30-12-07 18:06, Ingo Molnar wrote:
> * Rene Herman <[email protected]> wrote:
>> Real ISA systems will also generally respond faster to it than the
>> unused port (this thing actually has an ISA bus but not VGA on it
>> ofcourse) which means that "a perfect delay register" it is not. But
>> yes, I have an actual Am386DX-40 with ISA VGA up and running which
>> also doesn't care either way, about the ones in misc_32.c or anywhere
>> else for that matter.
>
> yeah - and that's typical of most _p() use: most of them are totally
> bogus, but the global existence of the delay was used as a "it _might_
> break system" boogey-man against replacing it.
No delaying at all does break a few systems.
> so _IF_ we do any delay in x86 platform drivers, we at most do a delay
> on the order of the round-trip latency to the same piece of hardware we
> are handling.
Given that part of the problem is 2 MHz devices on an 8 MHz bus, you can't do
this generally.
Rene.
* Bodo Eggert <[email protected]> wrote:
> BTW: The error function in
> linux-2.6.23/arch/i386/boot/compressed/misc.c uses while(1) without
> cpu_relax() in order to halt the machine. Is this fixed? Should it be
> fixed?
this is early bootup so there's no need to be "nice" to other cores or
sockets - none of them are really running.
Ingo
On 30-12-07 17:48, Alan Cox wrote:
> For processors with TSC I think we should aim for 2.6.25 to do this and
> to have the major other _p fixups done. I pity whoever does stuff like
> the scc drivers but most of the rest isn't too bad.
I'm by the way looking at drivers/net/wd.c which my 386 uses for its dual
mode NE2000/WD8013 clone ISA NIC and while it specifically needs no delay at
all it seems, the mixed use of out and outb_p seems to suggest that someone
once thought about that. Would you advise sticking in a udelay(2) manually
there?
Rene.
> do you remember which old systems/chipsets were affected by this
> problem? We had many - meanwhile fixed - PIC related problems, maybe
> it's a red herring and the delay just papered it over.
Some old VIA chipsets at least iirc. Might be more.
-Andi
Ingo Molnar wrote:
> * Alan Cox <[email protected]> wrote:
>
>>> i dont get your last point. Firstly, we do an "outb $0x80" not an
>>> inb.
>> outb not inb sorry yes
>>
>>> Secondly, outb $0x80 has no PCI posting side-effects AFAICS.
>>> Thirdly,
>> It does. The last mmio write cycle to the bridge gets pushed out
>> before the 0x80 cycle goes to the PCI bridge, times out and goes to
>> the LPC bus.
>
> ok. Is it more of a "gets flushed due to timing out", or a
> specified-for-sure POST flushing property of all out 0x80 cycles going
> to the PCI bridge? I thought PCI posting policy is up to the CPU, it can
> delay PCI space writes arbitrarily (within reasonable timeouts) as long
> as no read is done from the _same_ IO space address. Note that the port
> 0x80 cycle is neither a read, nor for the same address.
There's no guarantee in the spec that any IO access will flush pending
MMIO writes. However, I suspect in the majority of implementations
(perhaps all), it indeed does.
--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [email protected]
Home Page: http://www.roberthancock.com/
> So the current plan is to go with an io_delay=udelay default in v2.6.25,
> to give this a migration window, and io_delay=none in v2.6.26 [and a
> complete removal of arch/x86/kernel/io_delay.c], once the _p() uses are
> fixed up. This is gradual enough to notice any regressions we care about
> and also makes it nicely bisectable and gradual.
You will break systems if you blindly go around disabling _p delays for
ISA and LPC bus devices. The DEC Hinote laptops for example are well
known for requiring the correct ISA and other keyboard controller delays.
I don't expect anyone to test with a hinote or see it until it hits
Debian or similar 'low resource' friendly devices.
A 2.6.26 plan for io_delay=none is very very foolish indeed. We don't burn
the processor manuals, overclock the CPU and use undefined behaviour
hacks; we shouldn't do the same for I/O devices. Your claim of
bisectability is also completely confused and wrong. If, for example, you
write to an SCC without delays then the chances are it will work most
times. Bisecting doesn't work for random timing-dependent failures.
We have four categories of _p users
- Devices that don't need it -> Eliminate use
- Old Devices that do need it -> Codify use and fix locking
- Legacy Devices that we don't need to use on modern systems -> Avoid use
- Devices that sometimes need it -> Evaluate options
There is absolutely no point in breaking, overclocking and introducing
random unreliabilities (that may be stepping or even device instance
specific) into device drivers. Quite the reverse in fact - the way to
drive out _p misuse for debugging is to make it *very* visible. An
io_delay=debug which beeps the keyboard buzzer on each _p access will be
most informative and lead to far better and correct debugging.
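[A sketch of what such an io_delay=debug mode might look like -- never
implemented; it assumes the PC speaker gate bits at port 0x61 can be
toggled to make each _p access audible:]

#include <asm/io.h>

/*
 * Hypothetical io_delay=debug handler: make every _p() delay audible
 * by briefly toggling the PC speaker gate bits (port 0x61, bits 0-1)
 * so that _p misuse is impossible to miss.
 */
static void debug_io_delay(void)
{
        unsigned char v = inb(0x61);

        outb(v ^ 3, 0x61);              /* toggle timer-2 gate + speaker enable */
        outb(v, 0x61);                  /* restore */
}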
The components in question for the typical user of a modern system are:
ISA DMA controller (doesn't get used)
Keyboard interface (notoriously sensitive to timing, going USB)
PIC (use the APIC instead)
Legacy Timers (use the newer timers instead)
CMOS (slow as **** anyway so udelay 2 doesn't matter)
Floppy (dying out and slow anyway)
So there is nothing to gain from going with "No delay" and everything to
lose. What we actually want to do is to make it as visible as possible so
we can avoid it whenever possible.
Alan
> A 2.6.26 plan for io_delay=none is very very foolish indeed. We don't burn
It also seems quite risky to me; at least if not paired with a DMI
year master switch.
Switching to udelay() by default should be probably ok though.
-Andi
On Sun, 30 Dec 2007, Rene Herman wrote:
>
> rene@7ixe4:~/src/port80$ su -c ./port80
> cycles: out 2400, in 2401
> rene@7ixe4:~/src/port80$ su -c ./port3cc
> cycles: out 459, in 394
>
> As stated a few dozen times by now already, port 0x80 is _decidedly_ _non_
> _random_
Oh, I agree 100% that port 80 is not random. It's very much selected on
purpose.
The reason we use port 80 is because it's commonly used as the BIOS POST
debug port, which means that it's been a reasonably "safe" port to use,
since nobody would be so crazy as to actually hook up a real device behind
that port (and since it is below 0x100, it's also part of the "motherboard
range", so you won't have any crazy plug-in devices either).
Pretty much all other ports in the low 256 bytes of IO have been used at
some point or other, because people put special motherboard devices in and
pick from the very limited list of open ports at random. So there are
ports that are not commonly used (notably 0xB0-0xBF and 0xE0-0xEF), but
they are quite often used for stuff like the magic Sony VAIO rocker-button
devices etc.
So 0x80 _is_ special. We've been able to use it for 15+ years with
basically nobody being so insane as to put anything there.
However, I'd like to point out that the *timings* aren't special per se.
The only reason you see such slow accesses to port 80 is not because port
80 is special from a timing standpoint, but because it falls under the
heading of "no device wanted to accept the access", and it wasn't decoded
by any bridge or device. So it hits the "access timed out" case, which is
just about the slowest access you can have.
But that doesn't mean that other ports won't have the same timings. Also,
it doesn't mean that we really need to have exactly *those* timings.
But yes, the timeout timing is pretty convenient, because it's basically
almost universally always going to take one microsecond to time out,
regardless of speed of CPU. It's been impressively stable over the years.
But do we *need* it that stable? It probably would be perfectly fine to
pick something that gets faster with CPU's getting faster, because it's
generally only really old devices that need that delay in the first place.
In other words, the really *traditional* delay is not to do an IO port
access at all, but to just do two short unconditional jumps. That was
enough of a slowdown on the old machines, and the old machines are likely
the only machines that really care about or want the slowdown in the first
place!
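[For reference, early x86 Linux carried essentially this jump-based
definition, selectable via SLOW_IO_BY_JUMPING instead of the port 0x80
write:]

/* The "two short jumps" I/O delay: each taken jump flushed the
 * prefetch queue on a 286/386 and cost a few bus cycles; on modern
 * CPUs it is essentially free, which is exactly the point above. */
#define __SLOW_DOWN_IO \
        __asm__ __volatile__("jmp 1f\n1:\tjmp 1f\n1:")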
In other words, what I'm trying to say is:
- yes, "port 80" is very much special
- yes, the timings on any port that is unconnected (or connected to some
internal ISA bus like the LPC often is) have been impressively stable
over the years at about 1us.
- but no, I don't think we really need those special timings. The fact
is, hardware manufacturers have been *so* careful about backwards
compatibility, that they generally made sure that the "two short jumps"
delay (which is no delay at all these days!) _also_ continued working.
I also think that the worries about PCI write posting are unnecessary. IO
port accesses (ie a regular "inb()" and "outb()" even _without_ the "_p()"
format slowdown) are already synchronous not only by the CPU but by all
chipsets. That's why that "outb" to port 0x80 takes a microsecond: because
unlike a MMIO write, it's not only synchronous all the way to the chipset,
it's even synchronous as far as the CPU core is concerned too (which is
also why all high-performance devices avoid PIO like the plague).
So even if that "port 80" access will also cause PCI postings to be
flushed, so would the actual IO access that accompanies it, so I don't
think that is a very strong argument.
With all that said: it is certainly possible that the 1us timing makes a
difference on some machine, and it is certainly *also* theoretically
possible that there is a buggy chipset that posts too much, and the port
80 access might make a difference, but it's not all that likely, and I
suspect we'd be better off handling those devices/drivers on a one-by-one
basis as we find them.
Linus
On Sun, 30 Dec 2007 19:14:40 +0100
Rene Herman <[email protected]> wrote:
> On 30-12-07 17:48, Alan Cox wrote:
>
> > For processors with TSC I think we should aim for 2.6.25 to do this and
> > to have the major other _p fixups done. I pity whoever does stuff like
> > the scc drivers but most of the rest isn't too bad.
>
> I'm by the way looking at drivers/net/wd.c which my 386 uses for its dual
> mode NE2000/WD8013 clone ISA NIC and while it specifically needs no delay at
> all it seems, the mixed use of out and outb_p seems to suggest that someone
> once thought about that. Would you advise sticking in a udelay(2) manually
> there?
I would need to dig out the documentation and NE2000 reference code if I
even still have them. From memory NE2K needs them but I don't know
offhand if the WD80x3 devices do, or if only some of them do. It'll also
depend on the port - the DPRAM is different to the 8390.
Don Becker wrote the drivers and at the time he tuned them carefully for
performance, so I would expect the delays present to be the ones needed.
Alan
On 30-12-07 19:39, Alan Cox wrote:
> On Sun, 30 Dec 2007 19:14:40 +0100
> Rene Herman <[email protected]> wrote:
>> I'm by the way looking at drivers/net/wd.c which my 386 uses for its dual
>> mode NE2000/WD8013 clone ISA NIC and while it specifically needs no delay at
>> all it seems, the mixed use of out and outb_p seems to suggest that someone
>> once thought about that. Would you advise sticking in a udelay(2) manually
>> there?
>
> I would need to dig out the documentation and NE2000 reference code if I
> even still have them. From memory NE2K needs them but I don't know
> offhand if the WD80x3 devices do, or if only some of them do. It'll also
> depend on the port - the DPRAM is different to the 8390.
>
> Don Becker wrote the drivers and at the time he tuned them carefully for
> performance so I would expect delays to be the ones needed
This NIC (a Networth UTP16B) has a National Semiconductor DP83905 AT/LANTIC
for which I'm reading the software developers guide now. It doesn't seem to
list specific delays...
I also just now dug up a "WDC (C) 1987" WD8003EBT and a "Novell, Inc (C)
1990" NE1000, both 8-bit ISA NICs and the ownership of which, I would
suggest, makes me a really cool person. Both are coax and a little clumsy to
test but that 1987 one is probably going to be close to the oldest type around.
I've been testing with the 386's own 2.2.26 kernel up to now but I'll try and
compile a 2.6 system on there with uclibc and busybox or some such and test
more.
Rene.
On Sun, 30 Dec 2007, Rene Herman wrote:
>
> I also just now dug up a "WDC (C) 1987" WD8003EBT and a "Novell, Inc (C) 1990"
> NE1000, both 8-bit ISA NICs, the ownership of which, I would suggest, makes
> me a really cool person.
.. I'm also told that mentioning this is a really good way to pick up any
hot chicks in singles bars.
"If you've got it, flaunt it".
Please let us know how it turns out for you,
Linus
On 30-12-07 21:00, Linus Torvalds wrote:
> On Sun, 30 Dec 2007, Rene Herman wrote:
>> I also just now dug up a "WDC (C) 1987" WD8003EBT and a "Novell, Inc (C) 1990"
>> NE1000, both 8-bit ISA NICs, the ownership of which, I would suggest, makes
>> me a really cool person.
>
> .. I'm also told that mentioning this is a really good way to pick up any
> hot chicks in singles bars.
>
> "If you've got it, flaunt it".
>
> Please let us know how it turns out for you,
Ah, check, thanks, will do!
Rene.
* Linus Torvalds <[email protected]> wrote:
> So even if that "port 80" access will also cause PCI postings to be
> flushed, so would the actual IO access that accompanies it, so I don't
> think that is a very strong argument.
>
> With all that said: it is certainly possible that the 1us timing makes
> a difference on some machine, and it is certainly *also* theoretically
> possible that there is a buggy chipset that posts too much, and the
> port 80 access might make a difference, but it's not all that likely,
> and I suspect we'd be better off handling those devices/drivers on a
> one-by-one basis as we find them.
yeah, wholeheartedly agreed, and this is what x86.git is heading
towards. All test feedback so far is positive. With strong tools like
bisection there's no reason why we couldnt approach it this way. If this
change breaks anything, it will be bisected down to the patch below. In
fact even io_delay=udelay would be wrong because any problem will be
less clearly triggerable and thus less bisectable/debuggable.
Ingo
----------------------->
Subject: x86: make io_delay=none the default
From: Ingo Molnar <[email protected]>
make io_delay=none the default. This is the first step towards removing
all the legacy io-delay API uses.
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
---
arch/x86/Kconfig.debug | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
Index: linux-x86.q/arch/x86/Kconfig.debug
===================================================================
--- linux-x86.q.orig/arch/x86/Kconfig.debug
+++ linux-x86.q/arch/x86/Kconfig.debug
@@ -133,7 +133,7 @@ config IO_DELAY_TYPE_NONE
choice
prompt "IO delay type"
- default IO_DELAY_0X80
+ default IO_DELAY_NONE
config IO_DELAY_0X80
bool "port 0x80 based port-IO delay [recommended]"
Juergen Beisert wrote:
> On Sunday 30 December 2007 16:38, Alan Cox wrote:
>>> do you have any memories about the outb_p() use of misc_32.c:
>>>
>>> pos = (x + cols * y) * 2; /* Update cursor position */
>>> outb_p(14, vidport);
>>> outb_p(0xff & (pos >> 9), vidport+1);
>>> outb_p(15, vidport);
>>> outb_p(0xff & (pos >> 1), vidport+1);
>>>
>>> was this ever needed? This is so early in the bootup that we cannot
>> None - but we don't care.
>
> Was this embedded outb to 0x80 for delay only? Maybe I'm wrong. But in the
> case above it forces the chipselect signal to deselect the hardware between
> the access to vidport and vidport+1. Some devices need this to latch the
> values correctly. Otherwise the chipselect signal would be active for all
> four accesses in the example above (and only data and addresses are changing
> from device's view).
>
Presumably you're talking about an actual ISA bus here. On those, you
don't really have a chip select; but you'd expect the latch to happen on
the rising edge of IOW#, not on an internally generated chip select.
Now, I think there is a specific reason to believe that EGA/VGA (but
perhaps not CGA/MDA) didn't need these kinds of hacks: the video cards
of the day were touched, directly, by an interminable number of DOS
applications. CGA/MDA generally *were not*, due to the unsynchronized
memory of the original versions (writing could cause snow), so most
applications tended to fall back to using the BIOS access methods for
CGA and MDA.
-hpa
Bodo Eggert wrote:
>
> I've never seen code which would do that, and it was not suggested by any
> tutorial I ever saw. I'd expect any machine to break on all kinds of software
> if it required this. The only thing I remember being warned about is writing
> the index and the data register at the same time using outw, because that
> would write both registers at the same time on 16-bit-cards.
>
And we use that, and have been for 15 years. I haven't seen any screams
of pain about it.
-hpa
* Alan Cox <[email protected]> wrote:
> > So the current plan is to go with an io_delay=udelay default in v2.6.25,
> > to give this a migration window, and io_delay=none in v2.6.26 [and a
> > complete removal of arch/x86/kernel/io_delay.c], once the _p() uses are
> > fixed up. This is gradual enough to notice any regressions we care about
> > and also makes it nicely bisectable and gradual.
>
> You will break systems if you blindly go around disabling _p delays
> for ISA and LPC bus devices. The DEC Hinote laptops for example are
> well known for requiring the correct ISA and other keyboard controller
> delays. I don't expect anyone to test with a hinote or see it until it
> hits Debian or similar 'low resource' friendly devices.
well, using io_delay=udelay is not 'blindly disabling'. io_delay=none
would be the end goal, once all _p() API uses are eliminated by
transformation. In drivers/ alone that's more than 1000 callsites, so
it's quite frequently used, and wont go away overnight.
Ingo
Ingo Molnar wrote:
> * Bodo Eggert <[email protected]> wrote:
>
>> BTW: The error function in
>> linux-2.6.23/arch/i386/boot/compressed/misc.c uses while(1) without
>> cpu_relax() in order to halt the machine. Is this fixed? Should it be
>> fixed?
>
> this is early bootup so there's no need to be "nice" to other cores or
> sockets - none of them are really running.
>
It probably should actually HLT, to avoid sucking power, and stressing
the thermal system. We're dead at this point, and the early 486's which
had problems with HLT will lock up - we don't care.
-hpa
On 30-12-07 21:46, Ingo Molnar wrote:
> * Alan Cox <[email protected]> wrote:
>
>>> So the current plan is to go with an io_delay=udelay default in v2.6.25,
>>> to give this a migration window, and io_delay=none in v2.6.26 [and a
>>> complete removal of arch/x86/kernel/io_delay.c], once the _p() uses are
>>> fixed up. This is gradual enough to notice any regressions we care about
>>> and also makes it nicely bisectable and gradual.
>> You will break systems if you blindly go around disabling _p delays
>> for ISA and LPC bus devices. The DEC Hinote laptops for example are
>> well known for requiring the correct ISA and other keyboard controller
>> delays. I don't expect anyone to test with a hinote or see it until it
>> hits Debian or similar 'low resource' friendly devices.
>
> well, using io_delay=udelay is not 'blindly disabling'.
On the other hand, the patch you just posted that makes io_delay=none the
default _is_ blindly disabling. So that wasn't for consumption?
io_delay=udelay additionally blindly disables the race-hiding effect that
the outb has on SMP, of which Alan is seeing so many instances. It should
also wait for more driver review.
Rene.
* H. Peter Anvin <[email protected]> wrote:
> Ingo Molnar wrote:
>> * Bodo Eggert <[email protected]> wrote:
>>
>>> BTW: The error function in linux-2.6.23/arch/i386/boot/compressed/misc.c
>>> uses while(1) without cpu_relax() in order to halt the machine. Is this
>>> fixed? Should it be fixed?
>>
>> this is early bootup so there's no need to be "nice" to other cores or
>> sockets - none of them are really running.
>>
>
> It probably should actually HLT, to avoid sucking power, and stressing
> the thermal system. We're dead at this point, and the early 486's
> which had problems with HLT will lock up - we don't care.
ok. Like the patch below?
Ingo
---------->
Subject: x86: hlt on early crash
From: Ingo Molnar <[email protected]>
H. Peter Anvin <[email protected]> wrote:
> It probably should actually HLT, to avoid sucking power, and stressing
> the thermal system. We're dead at this point, and the early 486's
> which had problems with HLT will lock up - we don't care.
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/boot/compressed/misc_32.c | 2 +-
arch/x86/boot/compressed/misc_64.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
Index: linux-x86.q/arch/x86/boot/compressed/misc_32.c
===================================================================
--- linux-x86.q.orig/arch/x86/boot/compressed/misc_32.c
+++ linux-x86.q/arch/x86/boot/compressed/misc_32.c
@@ -339,7 +339,7 @@ static void error(char *x)
putstr(x);
putstr("\n\n -- System halted");
- while(1); /* Halt */
+ asm("cli; hlt"); /* Halt */
}
asmlinkage void decompress_kernel(void *rmode, unsigned long end,
Index: linux-x86.q/arch/x86/boot/compressed/misc_64.c
===================================================================
--- linux-x86.q.orig/arch/x86/boot/compressed/misc_64.c
+++ linux-x86.q/arch/x86/boot/compressed/misc_64.c
@@ -338,7 +338,7 @@ static void error(char *x)
putstr(x);
putstr("\n\n -- System halted");
- while(1); /* Halt */
+ asm("cli; hlt"); /* Halt */
}
asmlinkage void decompress_kernel(void *rmode, unsigned long heap,
I am so happy that there will be a way for people who don't build their
own kernels to run Linux on their HP and Compaq laptops that have
problems with gazillions of writes to port 80, and I'm also happy that
some of the strange driver code will be cleaned up over time. Thank you
all. Some thoughts you all might consider, take or leave, in this
process, from an old engineering manager who once had to worry about QA
for software on nearly every personal computer model in the 1980-1992
period:
You know, there is a class of devices that are defined to use port
0x80... it's that historically useful class of devices that show/record
the POST diagnostics. It certainly was not designed for "delay"
purposes. In fact, some of those same silly devices are still used in
industry during manufacturing test. I wonder what would happen if
Windows were not part of manufacturing test, and instead Linux were the
"standard" for some category of machines...
When I was still working at Lotus in the late '80s, when we still
supported machines like 286's, there were lots of problems with timing
loops in drivers in applications (even Win 3.0 had some in hard disk
drivers, as did some of our printer drivers, ...), as clock speeds
continued to ramp. There were major news stories of machines that
"crashed when xyz application or zyx peripheral were added". It was
Intel, as I recall, that started "publicly" berating companies in the PC
industry for using the "two short jumps" solutions, and suggesting that
they measure the processor speed at bootup, using the BIOS standard for
doing that with the int 15 BIOS elapsed time calls, and always use
"calibrated" timing loops. Which all of us who supported device
drivers started to do (remember, apps had device drivers in those days
for many devices that talked directly with the registers).
I was impressed when I dug into Linux eventually, that this operating
system "got it right" by measuring the timing during boot and creating a
udelay function that really worked!
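The idea generalizes easily. As an illustration only (a userspace sketch,
not the kernel's actual calibration code; spin(), calibrate() and
my_udelay() are names made up for this example), calibrating a busy-loop
once against a real clock and then scaling it gives a working delay
primitive:

#include <stdio.h>
#include <time.h>

static volatile unsigned long sink;	/* keeps the loop from being optimized away */
static unsigned long loops_per_usec;

/* A busy-loop whose iteration count we can calibrate. */
static void spin(unsigned long loops)
{
	while (loops--)
		sink++;
}

/* Measure how many loop iterations fit in a known time span... */
static void calibrate(void)
{
	struct timespec t0, t1;
	unsigned long loops = 1UL << 24;
	double ns;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	spin(loops);
	clock_gettime(CLOCK_MONOTONIC, &t1);
	ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
	loops_per_usec = (unsigned long)(loops / (ns / 1000.0));
}

/* ...then scale the loop count for any requested delay. */
static void my_udelay(unsigned long usec)
{
	spin(usec * loops_per_usec);
}

int main(void)
{
	calibrate();
	printf("calibrated: %lu loops/us\n", loops_per_usec);
	my_udelay(100);		/* roughly 100 microseconds */
	return 0;
}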
So I have to say that when I was tracing down the problem that
originally kicked off this thread - just accessing the RTC using the
standard CMOS_READ macros in a loop caused a hang - these
"outb al,80h" things were there. And I noticed your skeptical comment
in the code, Linus. Knowing that there was never in any of the
documented RTC chipsets a need for a pause between accesses (going back
to my days at Software Arts working on just about every old machine
there was...) I changed it on a lark to do no pause at all. And my
machine never hung...
Now what's interesting is that the outb to port 80 is *faster* than an
outb to an unused port, on my machine. So there's something there -
actually accepting the bus transaction. In the ancient 5150 PC, port 80
was unused because it belonged to the DMA logic that drove memory
refresh, and so a write to it had no effect.
Now my current hypothesis (not having access to Quanta's design specs
for a board they designed and have shipped in quantity, or having taken
the laptop apart recently) is that there is logic there on port 80,
doing something. Perhaps even "POST diagnostic recording" as every PC
since the XT has supported... perhaps supporting post-crash
diagnostics... And that that something has a buffer, perhaps even in
the "Embedded Controller" that may need emptying periodically. It
takes several tens of thousands of "outb" to port 80 to hang the
hardware solid - so something is either rare or overflowing. In any
case, if this hypothesis is correct - the hardware may have an erratum,
but the hardware is doing a very desirable thing - standardizing on an
error mechanism that was already in the "standard" as an option... It's
Linux that is using a "standard" in a wrong way (a diagnostic port as a
delay).
So I say all this, mainly to point out that Linux has done timing loops
right (udelay and ndelay) - except one place where there was some
skepticism expressed, right there in the code. Linus may have some
idea why it was thought important to do an essential delay with a bus
transaction that had uncertain timing. My hypothesis is that
"community" projects have the danger of "magical theories" and
"coolness" overriding careful engineering design practices.
Cleaning up that "clever hack" that seemed so good at the time is hugely
difficult, especially when the driver writer didn't write down why he
used it.
Thus I would suggest that the _p functions be deprecated, and if there
needs to be a timing-delay after in/out instructions, define
in_pause(port, nsec_delay) with an explicit delay. And if the delay is
dependent on bus speeds, define a bus-speed ratio calibration.
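A minimal sketch of what such an interface could look like (in_pause() and
its nsec_delay argument are the hypothetical API proposed above, not
anything that exists in the kernel; ndelay() is the kernel's existing
nanosecond-delay helper):

#include <linux/delay.h>
#include <asm/io.h>

/*
 * Hypothetical replacement for inb_p(): read the port, then wait an
 * explicit, caller-specified number of nanoseconds instead of relying
 * on a write to port 0x80 for the timing.
 */
static inline unsigned char in_pause(unsigned short port,
				     unsigned long nsec_delay)
{
	unsigned char value = inb(port);

	ndelay(nsec_delay);
	return value;
}

A driver would then document its device's actual timing requirement at the
call site, e.g. in_pause(RTC_PORT(1), 500) for a part that needs half a
microsecond between accesses.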
Thus in future driver writing, people will be forced to think clearly
about what the timing characteristics of their device on its bus must
be. That presupposes that driver writers understand the timing
issues. If they do not, they should not be writing drivers.
> But that doesn't mean that other ports won't have the same timings. Also,
> it doesn't mean that we really need to have exactly *those* timings.
For ISA bus you want "at least" those timings. That is an easy case
anyway - ISA bus boxes, old processors and generally no TSC so we can
fall back to 0x80 - we know from 15 years' experience the problem only
occurs with recent non-ISA systems that have borked firmware.
Lots of ISA hardware does really need the delays and most of it will be
on old processors as well naturally enough.
> I also think that the worries about PCI write posting are unnecessary. IO
> port accesses (ie a regular "inb()" and "outb()" even _without_ the "_p()"
> format slowdown) are already synchronous not only by the CPU but by all
Ok then the SCSI examples should be fine (although as I said I think they
are possibly bogus anyway)
Alan
* Rene Herman <[email protected]> wrote:
> On 30-12-07 21:46, Ingo Molnar wrote:
>> * Alan Cox <[email protected]> wrote:
>>
>>>> So the current plan is to go with an io_delay=udelay default in v2.6.25,
>>>> to give this a migration window, and io_delay=none in v2.6.26 [and a
>>>> complete removal of arch/x86/kernel/io_delay.c], once the _p() uses are
>>>> fixed up. This is gradual enough to notice any regressions we care about
>>>> and also makes it nicely bisectable and gradual.
>>> You will break systems if you blindly go around disabling _p delays for
>>> ISA and LPC bus devices. The DEC Hinote laptops for example are well
>>> known for requiring the correct ISA and other keyboard controller delays.
>>> I don't expect anyone to test with a hinote or see it until it hits
>>> Debian or similar 'low resource' friendly devices.
>>
>> well, using io_delay=udelay is not 'blindly disabling'.
>
> On the other hand, the patch you just posted that makes io_delay=none
> the default _is_ blindly disabling. So that wasn't for consumption?
if you want to see the current x86.git intention then do:
git-clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git linux-2.6.git
cd linux-2.6.git
git-branch x86
git-checkout x86
git-pull git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86.git mm
right now the default is io_delay=udelay.
Ingo
On Sun, 30 Dec 2007, Ingo Molnar wrote:
> * H. Peter Anvin <[email protected]> wrote:
> > Ingo Molnar wrote:
> >> * Bodo Eggert <[email protected]> wrote:
> >>> BTW: The error function in linux-2.6.23/arch/i386/boot/compressed/misc.c
> >>> uses while(1) without cpu_relax() in order to halt the machine. Is this
> >>> fixed? Should it be fixed?
> >>
> >> this is early bootup so there's no need to be "nice" to other cores or
> >> sockets - none of them are really running.
> >>
> >
> > It probably should actually HLT, to avoid sucking power, and stressing
> > the thermal system. We're dead at this point, and the early 486's
> > which had problems with HLT will lock up - we don't care.
>
> ok. Like the patch below?
>
> - while(1); /* Halt */
> + asm("cli; hlt"); /* Halt */
The other users would loop around the hlt. Cargo Cult?
--
Top 100 things you don't want the sysadmin to say:
97. Go get your backup tape. (You _do_ have a backup tape?)
> fact even io_delay=udelay would be wrong because any problem will be
> less clearly triggerable and thus less bisectable/debuggable.
And if this eats someone's disk because you drive the hardware out of spec
you are going to sit there and tell them to bisect it ? Lovely.
Ingo - put the Christmas wine away and have a coffee. Now think first.
You won't bisect obscure timing-triggered problems, and the _p users are
almost all for hardware where performance doesn't matter one iota (eg
CMOS).
This isn't even all down to the chipset internal logic - several of my
boxes have external CMOS NVRAM/RTC chips which are probably the same
design (if a little smaller) as ten years ago.
io_delay = none is exactly the same thing as CPU overclocking. Hard to
debug, unpredictable and stupid.
Alan
On Sun, 30 Dec 2007 21:46:50 +0100
Ingo Molnar <[email protected]> wrote:
> well, using io_delay=udelay is not 'blindly disabling'. io_delay=none
> would be the end goal, once all _p() API uses are eliminated by
> transformation.
io_delay = none is not the end goal. Correctness is the end goal.
Alan
On Sun, 30 Dec 2007 12:53:02 -0800
"H. Peter Anvin" <[email protected]> wrote:
> Bodo Eggert wrote:
> >
> > I've never seen code which would do that, and it was not suggested by any
> > tutorial I ever saw. I'd expect any machine to break on all kinds of software
> > if it required this. The only thing I remember being warned about is writing
> > the index and the data register at the same time using outw, because that
> > would write both registers at the same time on 16-bit cards.
> >
>
> And we use that, and have been for 15 years. I haven't seen any screams
> of pain about it.
Actually there were, and I sent numerous people patches for that back in
ISA days.
Alan
> ok. Like the patch below?
Not quite - you still need the loop in case you NMI and then run off into
oblivion
Ingo Molnar wrote:
>>>
>> It probably should actually HLT, to avoid sucking power, and stressing
>> the thermal system. We're dead at this point, and the early 486's
>> which had problems with HLT will lock up - we don't care.
>
> ok. Like the patch below?
>
Don't need the cli; we're already running with interrupts disabled.
I'd do:
while (1)
asm volatile("hlt");
... mostly on general principles.
-hpa
> Now what's interesting is that the outb to port 80 is *faster* than an
> outb to an unused port, on my machine. So there's something there -
> actually accepting the bus transaction. In the ancient 5150 PC, 80 was
Yes and I even told you a while back how to verify where it is. From the
timing you get it's not on the LPC bus but chipset core, so pretty
certainly an SMM trap, as other systems with the same chipset don't have
the bug. Probably all that is needed is a BIOS upgrade.
Alan
On 30-12-07 22:44, H. Peter Anvin wrote:
> Ingo Molnar wrote:
>>>>
>>> It probably should actually HLT, to avoid sucking power, and
>>> stressing the thermal system. We're dead at this point, and the
>>> early 486's which had problems with HLT will lock up - we don't care.
>>
>> ok. Like the patch below?
>>
>
> Don't need the cli; we're already running with interrupts disabled.
>
> I'd do:
>
> while (1)
> asm volatile("hlt");
>
> ... mostly on general principles.
At least with current GCC the volatile isn't strictly needed, as it's implied
when there are no output operands, but I was only certain after checking that.
Do you remember if that used to be different for previous GCC versions? I tend
to also stick volatiles on them still...
Rene.
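For reference, the GCC rule being described: an asm statement with no
output operands is implicitly volatile, so with current compilers the two
forms below behave identically; the explicit volatile merely documents the
intent.

asm("hlt");		/* no output operands: implicitly volatile */
asm volatile("hlt");	/* same effect, with the intent spelled out */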
* Alan Cox <[email protected]> wrote:
> You won't bisect obscure timing-triggered problems, and the _p users
> are almost all for hardware where performance doesn't matter one iota
> (eg CMOS).
actually, people have, and i have too. But i agree that io_delay=none
would be stupid now, and would probably be stupid in v2.6.26 too.
i also have a debug patch that counts the number of _p() API uses and
prints a stacktrace (once per bootup) if it occurs [wrote it 2 weeks
ago] - so i agree with you that we can do this more gradually and more
intelligently. As long as it does not turn into a BKL situation. It's
2008 in a day and we've still got the NFS client code running under the
BKL - quite ridiculous IMO.
Ingo
* Alan Cox <[email protected]> wrote:
> On Sun, 30 Dec 2007 21:46:50 +0100
> Ingo Molnar <[email protected]> wrote:
>
> > well, using io_delay=udelay is not 'blindly disabling'. io_delay=none
> > would be the end goal, once all _p() API uses are eliminated by
> > transformation.
>
> io_delay = none is not the end goal. Correctness is the end goal.
the end goal will be for io_delay=none to be a NOP, because nothing will
use the _p() ops anymore.
Ingo
* Alan Cox <[email protected]> wrote:
> > ok. Like the patch below?
>
> Not quite - you still need the loop in case you NMI and then run off
> into oblivion
yes indeed. Updated patch below.
Ingo
-------------->
Subject: x86: hlt on early crash
From: Ingo Molnar <[email protected]>
H. Peter Anvin <[email protected]> wrote:
> It probably should actually HLT, to avoid sucking power, and stressing
> the thermal system. We're dead at this point, and the early 486's
> which had problems with HLT will lock up - we don't care.
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/boot/compressed/misc_32.c | 3 ++-
arch/x86/boot/compressed/misc_64.c | 3 ++-
2 files changed, 4 insertions(+), 2 deletions(-)
Index: linux-x86.q/arch/x86/boot/compressed/misc_32.c
===================================================================
--- linux-x86.q.orig/arch/x86/boot/compressed/misc_32.c
+++ linux-x86.q/arch/x86/boot/compressed/misc_32.c
@@ -339,7 +339,8 @@ static void error(char *x)
putstr(x);
putstr("\n\n -- System halted");
- while(1); /* Halt */
+ while (1)
+ asm("hlt");
}
asmlinkage void decompress_kernel(void *rmode, unsigned long end,
Index: linux-x86.q/arch/x86/boot/compressed/misc_64.c
===================================================================
--- linux-x86.q.orig/arch/x86/boot/compressed/misc_64.c
+++ linux-x86.q/arch/x86/boot/compressed/misc_64.c
@@ -338,7 +338,8 @@ static void error(char *x)
putstr(x);
putstr("\n\n -- System halted");
- while(1); /* Halt */
+ while (1)
+ asm("hlt");
}
asmlinkage void decompress_kernel(void *rmode, unsigned long heap,
Alan Cox wrote:
>> Now what's interesting is that the outb to port 80 is *faster* than an
>> outb to an unused port, on my machine. So there's something there -
>> actually accepting the bus transaction. In the ancient 5150 PC, 80 was
>>
>
> Yes and I even told you a while back how to verify where it is. From the
> timing you get it's not on the LPC bus but chipset core, so pretty
> certainly an SMM trap, as other systems with the same chipset don't have
> the bug. Probably all that is needed is a BIOS upgrade.
>
>
Actually, I could check whether it was SMM trapping via the AMD MSRs that
would allow such trapping, or via the performance or debug registers.
Nothing was set to trap with SMI or other traps on any port outputs. But
I'm continuing to investigate for a cause. It would be nice if it were a
BIOS-fixable problem. It would be even nicer if the BIOS were GPL...
David P. Reed wrote:
> Alan Cox wrote:
>>> Now what's interesting is that the outb to port 80 is *faster* than
>>> an outb to an unused port, on my machine. So there's something there
>>> - actually accepting the bus transaction. In the ancient 5150 PC,
>>> 80 was
>>
>> Yes and I even told you a while back how to verify where it is. From the
>> timing you get it's not on the LPC bus but chipset core, so pretty
>> certainly an SMM trap, as other systems with the same chipset don't have
>> the bug. Probably all that is needed is a BIOS upgrade.
>>
>>
> Actually, I could check whether it was SMM trapping via the AMD MSRs that
> would allow such trapping, or via the performance or debug registers.
> Nothing was set to trap with SMI or other traps on any port outputs. But
> I'm continuing to investigate for a cause. It would be nice if it were a
> BIOS-fixable problem. It would be even nicer if the BIOS were GPL...
If it was an SMM trap, I would expect it to be trapped in the SuperIO chip.
-hpa
H. Peter Anvin wrote:
> Now, I think there is a specific reason to believe that EGA/VGA (but
> perhaps not CGA/MDA) didn't need these kinds of hacks: the video cards
> of the day were touched, directly, by an interminable number of DOS
> applications. CGA/MDA generally *were not*, due to the unsynchronized
> memory of the original versions (writing could cause snow), so most
> applications tended to fall back to using the BIOS access methods for
> CGA and MDA.
>
A little history... not that it really matters, but some might be
interested in a 55-year-old hacker's sentimental recollections... As
someone who actually wrote drivers for CGA and MDA on the original IBM
PC, I can tell you that back to back I/O *port* writes and reads were
perfectly fine. The "snow" problem had nothing to do with I/O ports.
It had to do with the memory on the CGA adapter card not being dual
ported, and in high-res (80x25) character mode (only!) a CPU read or
write access caused a read of the adapter memory by the
character-generator to fail, causing one character-position of the
current scanline being output to get all random bits, which was then put
through the character-generator and generated whatever the character
generator did with 8 random bits of character or attributes as an index
into the character generator's font table.
In particular, the solution in both the BIOS and in Visicalc, 1-2-3, and
other products that did NOT use the BIOS or DOS for I/O to the CGA or
MDA because they were Dog Slow, was to detect the CGA, and do a *very*
tight loop doing "inb" instructions from one of the CGA status
registers, looking for a 0-1 transition on the horizontal retrace flag.
It would then do a write to display memory with all interrupts locked
out, because that was all it could do during the horizontal retrace,
given the speed of the processor. One of the hacks I did in those days
(I wrote the CGA driver for Visicalc Advanced Version and several other
Software Arts programs, some of which were sold to Lotus when they
bought our assets, and hired me, in 1985) was to measure the "horizontal
retrace time" and the "vertical blanking interval" when the program
started, and compile screen-writing code that squeezed as many writes as
possible into both horizontal retraces and vertical retraces. That was
actually a "selling point" for spreadsheets - the reviewers actually
measured whether you could use the down-arrow key in auto-repeat mode
and keep the screen scrolling at the relevant rate! That was hard on an
8088 or 80286 processor, with a CGA card.
It was great when EGA and VGA came out, but we still had to support the
CGA long after. Which is why I fully understand the need not to break
old machines. We had to run on every machine that was claimed to be "PC
compatible" - many of which were hardly so compatible (the PS/2 model
50 had a completely erroneous serial chip that claimed to emulate the
original 8250, but had an immense pile of bugs, for example, that IBM
begged ISVs to call a software problem and fix so they didn't get sued).
The IBM PC bus (predecessor of the current ISA bus, which came from the
PC-AT's 16-bit bus), did just fine electrically - any I/O port-specific
timing problems had to do with the timing of the chips attached to the
bus. For example, if a bus write to a port was routed into a particular
chip, the timing of that chip's subsequent processing might be such that
it was not ready to respond to another read or write. That's not a
"signalling" problem - it has nothing to do with capacitance on the bus,
e.g., but a functional speed problem in the chip (if on the motherboard)
or the adapter card.
Rant off. This has nothing, of course, to do with present issues.
David P. Reed wrote:
>
>
> H. Peter Anvin wrote:
>> Now, I think there is a specific reason to believe that EGA/VGA (but
>> perhaps not CGA/MDA) didn't need these kinds of hacks: the video cards
>> of the day were touched, directly, by an interminable number of DOS
>> applications. CGA/MDA generally *were not*, due to the unsynchronized
>> memory of the original versions (writing could cause snow), so most
>> applications tended to fall back to using the BIOS access methods for
>> CGA and MDA.
>>
> A little history... not that it really matters, but some might be
> interested in a 55-year-old hacker's sentimental recollections... As
> someone who actually wrote drivers for CGA and MDA on the original IBM
> PC, I can tell you that back to back I/O *port* writes and reads were
> perfectly fine. The "snow" problem had nothing to do with I/O ports.
> It had to do with the memory on the CGA adapter card not being dual
> ported, and in high-res (80x25) character mode (only!) a CPU read or
> write access caused a read of the adapter memory by the
> character-generator to fail, causing one character-position of the
> current scanline being output to get all random bits, which was then put
> through the character-generator and generated whatever the character
> generator did with 8 random bits of character or attributes as an index
> into the character generator's font table.
>
[Additional history snipped]
This is all true of course (and a useful history lesson to those not
familiar with it) but what I wrote above is still true: due to the lack
of synchronized memory (it doesn't have to be dual-ported, just
synchronized, if it has enough bandwidth), most DOS applications *in the
i386+ timeframe* just invoked the BIOS rather than dealing with the
synchronization needs themselves (anything compiled with a Borland
compiler using their conio library, for example.)
Hence the variety of software that poked directly at CGA/MDA as opposed
to EGA/VGA was smaller, but I never claimed it was uncommon.
-hpa
On Sun, 30 Dec 2007 16:23:20 -0800
"H. Peter Anvin" <[email protected]> wrote:
> > continuing to investigate for a cause. It would be nice if it were a
> > BIOS-fixable problem. It would be even nicer if the BIOS were GPL...
>
> If it was an SMM trap, I would expect it to be trapped in the SuperIO chip.
Many SuperIO chips do handle port 0x80, but they do it over the LPC bus
and in hardware, latching the value onto the parallel port data lines.
The timings posted for 0x80 on his box are really a bit fast for LPC.
On Sun 2007-12-30 21:46:50, Ingo Molnar wrote:
>
> * Alan Cox <[email protected]> wrote:
>
> > > So the current plan is to go with an io_delay=udelay default in v2.6.25,
> > > to give this a migration window, and io_delay=none in v2.6.26 [and a
> > > complete removal of arch/x86/kernel/io_delay.c], once the _p() uses are
> > > fixed up. This is gradual enough to notice any regressions we care about
> > > and also makes it nicely bisectable and gradual.
> >
> > You will break systems if you blindly go around disabling _p delays
> > for ISA and LPC bus devices. The DEC Hinote laptops for example are
> > well known for requiring the correct ISA and other keyboard controller
> > delays. I don't expect anyone to test with a hinote or see it until it
> > hits Debian or similar 'low resource' friendly devices.
>
> well, using io_delay=udelay is not 'blindly disabling'. io_delay=none
> would be the end goal, once all _p() API uses are eliminated by
> transformation. In drivers/ alone that's more than 1000 callsites, so
> it's quite frequently used, and won't go away overnight.
IOW elimination of broken inb_p()/outb_p() interfaces is the ultimate
goal. Agreed.
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Hi!
> > rene@7ixe4:~/src/port80$ su -c ./port80
> > cycles: out 2400, in 2401
> > rene@7ixe4:~/src/port80$ su -c ./port3cc
> > cycles: out 459, in 394
> >
> > As stated a few dozen times by now already, port 0x80 is _decidedly_ _non_
> > _random_
>
> Oh, I agree 100% that port 80 is not random. It's very much selected on
> purpose.
>
> The reason we use port 80 is because it's commonly used as the BIOS POST
> debug port, which means that it's been a reasonably "safe" port to use,
> since nobody would be so crazy as to actually hook up a real device behind
> that port (and since it is below 0x100, it's also part of the "motherboard
> range", so you won't have any crazy plug-in devices either).
Eh?
I have two mainboards here that have debug displays hooked to port
0x80. I have a PCI DEBUG card that has a display on port 0x80.
"You plug in a PCI DEBUG card and it overclocks your machine" is a bad
scenario... (I don't know if it does... can a PCI card emulate ISA timings?)
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
On Sun, 30 Dec 2007 19:14:40 +0100
Rene Herman <[email protected]> wrote:
> On 30-12-07 17:48, Alan Cox wrote:
>
> > For processors with TSC I think we should aim for 2.6.25 to do this and
> > to have the major other _p fixups done. I pity whoever does stuff like
> > the scc drivers but most of the rest isn't too bad.
>
> I'm by the way looking at drivers/net/wd.c, which my 386 uses for its
> dual-mode NE2000/WD8013 clone ISA NIC, and while it seems to need no delay
> at all, the mixed use of outb and outb_p suggests that someone once
> thought about that. Would you advise sticking in a udelay(2) manually
> there?
I dug out the reference drivers. They use the delay, and the 8390
datasheet confirms it is necessary.
The Crynwr driver has some interesting things to say:
| The National 8390 Chip (NIC) requires 4 bus clocks between successive
| chip selects (National DP8390 Data Sheet Addendum, June 1990)
Also " To establish a minimum delay, an I/O instruction must be used. A
good rule of ; thumb is that ISA I/O instructions take ~1.0 microseconds
and MCA I/O ; instructions take ~0.5 microseconds. Reading the NMI Status
Register (0x61) ; is a good way to pause on all machines."
But all the official drivers use pauses and the manual says they are
needed for correct, reliable behaviour - at least with a genuine 8390.
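Expressed as an explicit delay, that datasheet rule would look something
like this (a sketch only, not existing driver code; ndelay() is the
kernel's nanosecond-delay helper, and the 6 MHz figure is the worst-case
ISA bus clock discussed later in the thread):

#include <linux/delay.h>

/* DP8390 datasheet: at least 4 bus clocks between successive chip selects. */
#define ISA_BUS_HZ_MIN	6000000UL			/* worst case: 6 MHz */
#define ISA_BUS_CLK_NS	(1000000000UL / ISA_BUS_HZ_MIN)	/* ~167 ns */

static inline void dp8390_cs_delay(void)
{
	ndelay(4 * ISA_BUS_CLK_NS);	/* ~667 ns worst case */
}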
> "You plug in PCI DEBUG card and it overclocks your machine" is bad
> scenario.. (I don't know if it does... can PCI card emulate ISA timings?)
Easily. It's a bit more restricted by later spec revisions but it can halt
your box for a week or two if it wants. Video cards used to pull this
stunt for marketing benchmark numbers.
/* gcc -W -Wall -O2 -o portime portime.c */
#include <stdlib.h>
#include <stdio.h>
#include <inttypes.h>
#include <sys/io.h>
#define LOOPS 10000
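/*
 * Each timed loop below brackets its port access with the same pair of
 * serializing cpuid instructions as the empty baseline loop, so
 * subtracting the baseline isolates the cycles spent on the port access
 * itself.
 */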
static inline uint64_t rdtsc(void)
{
	uint32_t hi, lo;

	asm ("rdtsc": "=d" (hi), "=a" (lo));
	return (uint64_t)hi << 32 | lo;
}

static inline void serialize(void)
{
	asm ("cpuid": : : "eax", "ebx", "ecx", "edx");
}

int main(void)
{
	uint64_t tsc0, tsc1, tsc2, tsc3, tsc4;
	uint64_t out, in8, in6;
	int i;

	if (iopl(3) < 0) {
		perror("iopl");
		return EXIT_FAILURE;
	}

	asm ("cli");
	tsc0 = rdtsc();
	for (i = 0; i < LOOPS; i++) {
		serialize();
		serialize();
	}
	tsc1 = rdtsc();
	for (i = 0; i < LOOPS; i++) {
		serialize();
		asm ("outb %al, $0x80");
		serialize();
	}
	tsc2 = rdtsc();
	for (i = 0; i < LOOPS; i++) {
		serialize();
		asm ("inb $0x80, %%al": : : "al");
		serialize();
	}
	tsc3 = rdtsc();
	for (i = 0; i < LOOPS; i++) {
		serialize();
		asm ("inb $0x61, %%al": : : "al");
		serialize();
	}
	tsc4 = rdtsc();
	asm ("sti");

	out = ((tsc2 - tsc1) - (tsc1 - tsc0)) / LOOPS;
	in8 = ((tsc3 - tsc2) - (tsc1 - tsc0)) / LOOPS;
	in6 = ((tsc4 - tsc3) - (tsc1 - tsc0)) / LOOPS;

	printf("out 0x80: %" PRIu64 " cycles\n", out);
	printf("in 0x80: %" PRIu64 " cycles\n", in8);
	printf("in 0x61: %" PRIu64 " cycles\n", in6);

	return EXIT_SUCCESS;
}
On Sun, 30 Dec 2007, Alan Cox wrote:
> On Sun, 30 Dec 2007 12:53:02 -0800
> "H. Peter Anvin" <[email protected]> wrote:
> > Bodo Eggert wrote:
> > > I've never seen code which would do that, and it was not suggested by any
> > > tutorial I ever saw. I'd expect any machine to break on all kinds of software
> > > if it required this. The only thing I remember being warned about is writing
> > > the index and the data register at the same time using outw, because that
> > > would write both registers at the same time on 16-bit cards.
> > >
> >
> > And we use that, and have been for 15 years. I haven't seen any screams
> > of pain about it.
>
> Actually there were, and I sent numerous people patches for that back in
> ISA days.
Are you talking about VGA cards requiring a delay between outb index/outb
data, VGA cards barfing on outw or systems barfing on outb(0x80,42)?
--
Programming is an art form that fights back.
On Sun, 30 Dec 2007 21:13:29 +0000
Alan Cox <[email protected]> wrote:
> > But that doesn't mean that other ports won't have the same timings.
> > Also, it doesn't mean that we really need to have exactly *those*
> > timings.
>
> For ISA bus you want "at least" those timings. That is an easy case
> anyway - ISA bus boxes, old processors and generally no TSC so we can
> fall back to 0x80 - we know from 15 years' experience the problem only
> occurs with recent non-ISA systems that have borked firmware.
>
> Lots of ISA hardware does really need the delays and most of it will
> be on old processors as well naturally enough.
If I recall correctly, the MediaGX/Geode processor does need _p for the
PIT accesses, and that CPU family does have a TSC (even though the TSC
stops at times so is hard to use). I also seem to remember that the
breakage did not happen very often, but running a system without _p
overnight usually showed one hiccup where a read from the counter got
corrupted.
So unless I'm wrong (which I very well could be, it's been a couple of
years since I was debugging the PIT code on a misbehaving Geode SC1200
based system) there is at least one fairly modern CPU, which is used in
lots of embedded systems, and in active use, which does need the _p.
Just a data point... It's not only ancient systems that need _p.
/Christer
--
"Just how much can I get away with and still go to heaven?"
Christer Weinigel <[email protected]> http://www.weinigel.se
> Okay. Am about to go stuff my face with new year's celebrations but will
> definitely try to make that old WD8003 hiccup.
Have fun. Is it an 8390 or an 83905?
> By the way, expected, but before anyone else mentions it -- no, reading from
> port 0x61 is not a reliable delay today. Duron 1300 / AMD756:
No big surprise - the comment is from about 1992.
Alan
On Mon, 31 Dec 2007 15:39:02 +0100 (CET)
> > Actually there were, and I sent numerous people patches for that back in
> > ISA days.
>
> Are you talking about VGA cards requiring a delay between outb index/outb
> data, VGA cards barfing on outw or systems barfing on outb(0x80,42)?
VGA cards barfing on outw - on some Trident cards at least it would cause
weird display mess-ups when scrolling the text console that would right
themselves on the next scroll.
Alan Cox wrote:
> On Sun, 30 Dec 2007 16:23:20 -0800
> "H. Peter Anvin" <[email protected]> wrote:
>
>>> continuing to investigate for a cause. It would be nice if it were a
>>> BIOS-fixable problem. It would be even nicer if the BIOS were GPL...
>> If it was an SMM trap, I would expect it to be trapped in the SuperIO chip.
>
> Many SuperIO chips do handle port 0x80, but they do it over the LPC bus
> and in hardware, latching the value onto the parallel port data lines.
> The timings posted for 0x80 on his box are really a bit fast for LPC.
Ah, that would eliminate the SuperIO chip.
-hpa
Alan Cox wrote:
>> "You plug in PCI DEBUG card and it overclocks your machine" is bad
>> scenario.. (I don't know if it does... can PCI card emulate ISA timings?)
>
> Easily. It's a bit more restricted by later spec revisions but it can halt
> your box for a week or two if it wants. Video cards used to pull this
> stunt for marketing benchmark numbers.
The drivers, specifically (the old "don't check if the command FIFO is
full before writing, just write anyway and if it's full let the whole
PCI bus stall while the FIFO empties out" trick).
I rather doubt any of those PCI POST debug cards would bother to
accurately emulate the ISA timings of normal port 0x80 accesses,
however. Most likely if you plug those in, port 0x80 accesses suddenly
become lots faster now that the writes are completing on the PCI bus
before ever hitting ISA/LPC...
--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [email protected]
Home Page: http://www.roberthancock.com/
On Monday 31 December 2007 16:56:00 Alan Cox wrote:
> > Okay. Am about to go stuff my face with new year's celebrations but will
> > definitely try to make that old WD8003 hiccup.
>
> Have fun. Is it an 8390 or an 83905?
>
What about HP PCLan 16/TP+ cards? I have one that runs 24/7 in a 486 box
(2.6.20.6 kernel) and one spare. It has some VLSI HP chip and also ST-NIC
DP83902AV - is that a good candidate for testing?
I also have WD8013EP with DP8390DV chip - that's probably even better.
--
Ondrej Zary
> What about HP PCLan 16/TP+ cards? I have one that runs 24/7 in a 486 box
> (2.6.20.6 kernel) and one spare. It has some VLSI HP chip and also ST-NIC
> DP83902AV - is that a good candidate for testing?
What are you trying to test? The documentation explicitly says you need
the delays and that the delays are in bus clocks, not microseconds. That
means the existing code is correct and it needs a delay dependent on the
ISA bus clock frequency (somewhere between 6 and 12 MHz). Note that the
delay depends on the bus clock frequency, not time.
We don't do overclocking, we don't support overclocking, please do not
overclock your ethernet chip.
Alan
Alan Cox wrote:
>> What about HP PCLan 16/TP+ cards? I have one that runs 24/7 in a 486 box
>> (2.6.20.6 kernel) and one spare. It has some VLSI HP chip and also ST-NIC
>> DP83902AV - is that a good candidate for testing?
>
> What are you trying to test? The documentation explicitly says you need
> the delays and that the delays are in bus clocks, not microseconds. That
> means the existing code is correct and it needs a delay dependent on the
> ISA bus clock frequency (somewhere between 6 and 12 MHz). Note that the
> delay depends on the bus clock frequency, not time.
>
> We don't do overclocking, we don't support overclocking, please do not
> overclock your ethernet chip.
>
However, assuming a bus clock of 6 MHz should be safe (167 ns).
4 bus clocks would be 667 ns, or we can round it up to 1 microsecond to deal with
bus delay effects.
None of this really helps with *memory-mapped* 8390, though, since
memory mapped writes can be posted. Putting any IOIO transaction in the
middle has the effect of flushing the posting queues; an MMIO read would
also work. The WD80x3 cards were memory-mapped, in particular (and
were some of the very first cards supported by Linux.)
-hpa
> However, assuming a bus clock of 6 MHz should be safe (167 ns).
Agreed - or ISA timings directly. Boxes using WD80x3 are not going to
have a TSC so might as well stick with port 0x80 as they have done just
fine for the past 15 years.
> None of this really helps with *memory-mapped* 8390, though, since
> memory mapped writes can be posted. Putting any IOIO transaction in the
ISA isn't posted, only PCI is.
PCI 8390 clones seem to be a mix of ASICs and 8390x chips with
some quite disgusting FPGA glue logic.
Alan
Alan Cox wrote:
>> However, assuming a bus clock of 6 MHz should be safe (167 ns).
>
> Agreed - or ISA timings directly. Boxes using WD80x3 are not going to
> have a TSC so might as well stick with port 0x80 as they have done just
> fine for the past 15 years.
>
>> None of this really helps with *memory-mapped* 8390, though, since
>> memory mapped writes can be posted. Putting any IOIO transaction in the
>
> ISA isn't posted, only PCI is.
>
> PCI 8390 clones seem to be a mix of ASICs and 8390x chips with
> some quite disgusting FPGA glue logic.
>
ISA isn't posted, no, but on several chipsets the upstream PCI bus will
post MMIO writes to ISA space regardless of the spec.
-hpa
* Pavel Machek <[email protected]> wrote:
> > well, using io_delay=udelay is not 'blindly disabling'.
> > io_delay=none would be the end goal, once all _p() API uses are
> > eliminated by transformation. In drivers/ alone that's more than
> > 1000 callsites, so it's quite frequently used, and won't go away
> > overnight.
>
> IOW elimination of broken inb_p()/outb_p() interfaces is the ultimate
> goal. Agreed.
yeah - although i'd not call it "broken", it's simply historic, and due
to the side-effects of the _implementation_, a few non-standard uses
(such as reliance on PCI posting/flushing effects) grew.
Ingo
On 31-12-07 16:56, Alan Cox wrote:
>> Okay. Am about to go stuff my face with new year's celebrations but will
>> definitely try to make that old WD8003 hiccup.
>
> Have fun. Is it an 8390 or an 83905?
A DP8390BN. And I have a DP8390CN on a 3Com Etherlink II. The NE1000 has a
DP83901AV, and my new-fangled Networth combo WD/NE cards have DP83905s.
Rene.