2000-12-06 00:02:39

by H. Peter Anvin

Subject: That horrible hack from hell called A20

Okay, here is my latest attempt to find a way to toggle A20M# that
genuinely works on all machines -- including Olivettis, IBM Aptivas,
bizarre notebooks, yadda yadda.

The bizarre notebooks with broken SMM code are the ones I really worry
most about... there may very well be NO WAY to support them that actually
works on all machines, because you don't get any kind of feedback that
they are broken.

If you have had A20M# problems with any kernel -- recent or not --
*please* try this patch, against 2.4.0-test12-pre5:

ftp://ftp.kernel.org/pub/linux/kernel/people/hpa/a20-test12-pre5.diff

--- arch/i386/boot/setup.S.12p5 Tue Dec 5 15:19:10 2000
+++ arch/i386/boot/setup.S Tue Dec 5 15:22:13 2000
@@ -636,6 +636,7 @@
# First, try the "fast A20 gate".
#
inb $0x92,%al
+ movb %al,%dl
orb $0x02,%al # Fast A20 on
andb $0xfe,%al # Don't reset CPU!
outb %al,$0x92
@@ -648,9 +649,17 @@
# did the trick and stop its probing at that stage; but subsequent ones
# must not do so.
#
+ pushw %dx
movb $0x01,%dl # A20-sensitive
call empty_8042
+ popw %dx
jnz a20_wait # A20 already on?
+
+# If A20 is still off, we need to go to the KBC. Set the
+# "fast A20 gate" to its original value, since some smartass
+# manufacturers have apparently decided to use A20M# as a GPIO!
+ movb %dl,%al
+ outb %al,$0x92

movb $0xD1, %al # command write
outb %al, $0x64
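
The save/modify/restore handling of port 0x92 in the patch above can be sketched in C. This is only a model, not the setup.S code: `struct sim_machine`, `sim_a20_enabled`, `fast_a20_try` and the `fast_gate_works` flag are hypothetical names made up for illustration, and the I/O port is simulated by a plain byte.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical simulated machine: one byte standing in for port 0x92. */
struct sim_machine {
    uint8_t port92;        /* current contents of the simulated port */
    bool fast_gate_works;  /* does bit 1 of port 0x92 really drive A20? */
};

/* A20 counts as enabled only if bit 1 is set on a machine that honors it. */
static bool sim_a20_enabled(const struct sim_machine *m)
{
    return m->fast_gate_works && (m->port92 & 0x02) != 0;
}

/*
 * Model of the patched sequence: remember the old value, set bit 1,
 * clear bit 0, and write the old value back if A20 is still off, so
 * that a GPIO-style use of the bit is left as it was found.
 * Returns true if the fast gate enabled A20.
 */
static bool fast_a20_try(struct sim_machine *m)
{
    uint8_t saved = m->port92;                     /* inb $0x92 ; movb %al,%dl */

    m->port92 = (uint8_t)((saved | 0x02) & 0xfe);  /* orb $0x02 ; andb $0xfe */
    if (sim_a20_enabled(m))
        return true;

    m->port92 = saved;                             /* movb %dl,%al ; outb */
    return false;
}
```

On a simulated machine where the bit does nothing for A20, the port ends up holding exactly the value it started with.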


--
<[email protected]> at work, <[email protected]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt


2000-12-06 00:45:34

by Linus Torvalds

Subject: Re: That horrible hack from hell called A20



On Tue, 5 Dec 2000, H. Peter Anvin wrote:
>
> Okay, here is my latest attempt to find a way to toggle A20M# that
> genuinely works on all machines -- including Olivettis, IBM Aptivas,
> bizarre notebooks, yadda yadda.

I really think that the 0x92 accesses are still unsafe.

I will bet that the same way some manufacturers use the A20 output as a
GPIO, they might also use the keyboard _reset_ output as a GPIO. This
would explain why we have problems on getting back from resume.

So the "orb $2,%al ; andb $0xfe,%al" will potentially change both of
these. And I'd feel a hell of a lot more safe, if we avoided using 0x92
except when we find that we absolutely _have_ to.

How about making the keyboard controller timeouts shorter, and moving all
the 0x92 games to after the keyboard controller games. That, I feel, would
be the safest approach: try the really old approach first (that people are
the least likely to use as GPIO - it's just too damn painful to go through
the keyboard controller, and the keyboard controller A20 logic is just too
well documented, so nobody would use it for anything else).

If the keyboard controller times out, or if A20 still doesn't seem to be
enabled, only _then_ would we do the 0x92 testing.

Btw, do we actually know of any machine that really needs the "and $0xfe"?
That register really makes me nervous.

Linus
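
The ordering proposed above (keyboard controller first, port 0x92 only if A20 is still off afterwards) can be sketched as a small C simulation. Every name here (`struct sim_board`, `sim_try_kbc`, `sim_try_port92`, `enable_a20_ordered`) is hypothetical, and the hardware is reduced to a few flags; it only shows the control flow, not real port accesses.

```c
#include <stdbool.h>

/* Hypothetical description of a machine, for simulation only. */
struct sim_board {
    bool kbc_present;     /* keyboard controller answers at all */
    bool kbc_gates_a20;   /* KBC command actually switches A20 */
    bool fast_gates_a20;  /* port 0x92 bit 1 actually switches A20 */
    bool a20_on;          /* current A20 state */
    bool touched_port92;  /* records whether port 0x92 was ever written */
};

enum a20_method { A20_ALREADY_ON, A20_VIA_KBC, A20_VIA_PORT92, A20_FAILED };

/* Models the KBC attempt; an absent controller stands for a timeout. */
static void sim_try_kbc(struct sim_board *b)
{
    if (b->kbc_present && b->kbc_gates_a20)
        b->a20_on = true;
}

/* Models the port 0x92 attempt and records that the port was touched. */
static void sim_try_port92(struct sim_board *b)
{
    b->touched_port92 = true;
    if (b->fast_gates_a20)
        b->a20_on = true;
}

/* KBC first; port 0x92 is written only when A20 is still off. */
static enum a20_method enable_a20_ordered(struct sim_board *b)
{
    if (b->a20_on)
        return A20_ALREADY_ON;

    sim_try_kbc(b);
    if (b->a20_on)
        return A20_VIA_KBC;

    sim_try_port92(b);
    return b->a20_on ? A20_VIA_PORT92 : A20_FAILED;
}
```

The point of the ordering is visible in `touched_port92`: on a board whose keyboard controller works, the flag stays false, so a manufacturer's creative reuse of port 0x92 is never disturbed.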

2000-12-06 00:47:24

by H. Peter Anvin

Subject: Re: That horrible hack from hell called A20

Linus Torvalds wrote:
>
> So the "orb $2,%al ; andb $0xfe,%al" will potentially change both of
> these. And I'd feel a hell of a lot more safe, if we avoided using 0x92
> except when we find that we absolutely _have_ to.
>
> How about making the keyboard controller timeouts shorter, and moving all
> the 0x92 games to after the keyboard controller games. That, I feel, would
> be the safest approach: try the really old approach first (that people are
> the least likely to use as GPIO - it's just too damn painful to go through
> the keyboard controller, and the keyboard controller A20 logic is just too
> well documented, so nobody would use it for anything else).
>
> If the keyboard controller times out, or if A20 still doesn't seem to be
> enabled, only _then_ would we do the 0x92 testing.
>
> Btw, do we actually know of any machine that really needs the "and $0xfe"?
> That register really makes me nervous.
>

Good question. The whole thing makes me nervous... in fact, perhaps we
should really consider using the BIOS INT 15h interrupt to enter
protected mode?

-hpa

--
<[email protected]> at work, <[email protected]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt

2000-12-06 00:52:34

by Kai Germaschewski

Subject: Re: That horrible hack from hell called A20


On Tue, 5 Dec 2000, H. Peter Anvin wrote:

> If you have had A20M# problems with any kernel -- recent or not --
> *please* try this patch, against 2.4.0-test12-pre5:

Just a datapoint: This patch doesn't fix the problem here (Sony
PCG-Z600NE). Still the spontaneous reboot exactly the moment I expect to
get my console back from resuming.

--Kai


2000-12-06 01:20:10

by Linus Torvalds

Subject: Re: That horrible hack from hell called A20



On Wed, 6 Dec 2000, Kai Germaschewski wrote:

>
> On Tue, 5 Dec 2000, H. Peter Anvin wrote:
>
> > If you have had A20M# problems with any kernel -- recent or not --
> > *please* try this patch, against 2.4.0-test12-pre5:
>
> Just a datapoint: This patch doesn't fix the problem here (Sony
> PCG-Z600NE). Still the spontaneous reboot exactly the moment I expect to
> get my console back from resuming.

Can you test whether it's the "and 0xfe" or the "or $2" that does it for
you?

Right now we know that the Olivetti M4 has problems with the "or $2". I'd
like to know if this is the same bit #1, or whether it's #0.

[ And I agree with Peter - if somebody knows BIOS programming and how to
use "int 15" to enter protected mode, then that might well be the
easiest solution. The only real reason the linux setup code does it by
hand is that the original code was written that way - and it was written
that way because I had never used the BIOS in my life before, _and_ I
wanted to learn the i386. Both of which were valid reasons back in 1991.
Neither of which is probably a very good reason ten years later ;]

Linus

2000-12-06 01:38:13

by Linus Torvalds

Subject: Re: That horrible hack from hell called A20



On Wed, 6 Dec 2000, Kai Germaschewski wrote:

>
> On Tue, 5 Dec 2000, H. Peter Anvin wrote:
>
> > If you have had A20M# problems with any kernel -- recent or not --
> > *please* try this patch, against 2.4.0-test12-pre5:
>
> Just a datapoint: This patch doesn't fix the problem here (Sony
> PCG-Z600NE). Still the spontaneous reboot exactly the moment I expect to
> get my console back from resuming.

Actually, I bet I know what's up.

Want to bet $5 USD that suspend/resume saves the keyboard A20 state, but
does NOT save the fast-A20 gate information?

So anything that enables A20 with only the fast A20 gate will find that
A20 is disabled again on resume.

Which would make Linux _really_ unhappy, needless to say. Instant death in
the form of a triple fault (all of the Linux kernel code is in the 1-2MB
area, which would be invisible), resulting in an instant reboot.

Peter, we definitely need to do the keyboard A20, even if fast-A20 works
fine.

Linus
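
Why a disabled A20 hides the 1-2MB area can be shown with a few lines of C. This is a sketch under the usual model that a gated A20 line forces address bit 20 to zero; `A20_BIT` and `a20_effective_addr` are names invented here for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define A20_BIT (UINT32_C(1) << 20)

/*
 * Address that actually reaches memory. With A20 gated off, bit 20 is
 * forced to zero, so every odd megabyte aliases onto the even megabyte
 * just below it.
 */
static uint32_t a20_effective_addr(uint32_t addr, bool a20_enabled)
{
    return a20_enabled ? addr : (addr & ~A20_BIT);
}
```

With A20 off, 0x100000 maps to 0x000000 and 0x1FFFFF maps to 0x0FFFFF, so code fetched from the 1-2MB range really comes from the first megabyte.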

2000-12-06 01:41:24

by H. Peter Anvin

Subject: Re: That horrible hack from hell called A20

Linus Torvalds wrote:
>
> Actually, I bet I know what's up.
>
> Want to bet $5 USD that suspend/resume saves the keyboard A20 state, but
> does NOT save the fast-A20 gate information?
>
> So anything that enables A20 with only the fast A20 gate will find that
> A20 is disabled again on resume.
>
> Which would make Linux _really_ unhappy, needless to say. Instant death in
> the form of a triple fault (all of the Linux kernel code is in the 1-2MB
> area, which would be invisible), resulting in an instant reboot.
>
> Peter, we definitely need to do the keyboard A20, even if fast-A20 works
> fine.
>

Yup. It's a BIOS bug, oh what a shocker... (that never happens, right)?

I might hack on using INT 15h to do the jump to protected mode, as ugly
as it is, but I won't have time before my trip. It would require quite a
bit of restructuring in setup.S, and would probably break LOADLIN.

-hpa

--
<[email protected]> at work, <[email protected]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt

2000-12-06 01:53:59

by Alan

Subject: Re: That horrible hack from hell called A20

> Okay, here is my latest attempt to find a way to toggle A20M# that
> genuinely works on all machines -- including Olivettis, IBM Aptivas,
> bizarre notebooks, yadda yadda.

Can I suggest a slightly different hammer. Flip the A20 via the keyboard
controller and set the timeout to say 1 second. If that fails then kick the
0x92 stuff ?


2000-12-06 02:07:53

by Alan

Subject: Re: That horrible hack from hell called A20

> Good question. The whole thing makes me nervous... in fact, perhaps we
> should really consider using the BIOS INT 15h interrupt to enter
> protected mode?

From my experience with BIOS authors, only if Windows 98 uses the same function
with the same arguments, the same stuff top of stack and the same segment
registers loaded ;)


2000-12-06 02:09:03

by Linus Torvalds

Subject: Re: That horrible hack from hell called A20



On Tue, 5 Dec 2000, H. Peter Anvin wrote:
>
> I might hack on using INT 15h to do the jump to protected mode, as ugly
> as it is, but I won't have time before my trip. It would require quite a
> bit of restructuring in setup.S, and would probably break LOADLIN.

Right now this is my interim patch (to clean test11). The thing to note is
that I decreased the keyboard controller timeout by a factor of about 167,
while making the "delay" a bit longer.

Now, if you don't have a keyboard controller, the bootup delay should be
on the order of 1.2 seconds or so (calling empty_8042 three times, each
around 0.4 seconds to time out). Which is acceptable. Especially as the
non-keyboard-controller machines that don't even emulate one are quite
rare. And it's still long enough that if the keyboard controller hasn't
emptied in 0.4 seconds, something else is badly wrong.

The non-keyboard-controller timeout used to be around three minutes
before. Which _definitely_ is excessive. Most people would assume that the
machine had hung.

Linus

----
--- v2.4.0-test11/linux/arch/i386/boot/setup.S Tue Oct 31 12:42:26 2000
+++ linux/arch/i386/boot/setup.S Tue Dec 5 17:31:53 2000
@@ -825,10 +825,18 @@
#
# Some machines have delusions that the keyboard buffer is always full
# with no keyboard attached...
+#
+# If there is no keyboard controller, we will usually get 0xff
+# to all the reads. With each IO taking a microsecond and
+# a timeout of 100,000 iterations, this can take about half a
+# second ("delay" == outb to port 0x80). That should be ok,
+# and should also be plenty of time for a real keyboard controller
+# to empty.
+#

empty_8042:
pushl %ecx
- movl $0x00FFFFFF, %ecx
+ movl $100000, %ecx

empty_8042_loop:
decl %ecx
@@ -867,7 +875,7 @@

# Delay is needed after doing I/O
delay:
- jmp .+2 # jmp $+2
+ outb %al,$0x80
ret

# Descriptor tables
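
The timeout figures can be checked with a small C simulation. Note the assumptions: the loop body below is a guess at what empty_8042 does (the diff only shows its first lines), the "1 microsecond per port access" figure is taken from the patch comment, and `port_read_fn`, `bus_floating`, `kbc_idle` and `empty_8042_sim` are hypothetical names.

```c
#include <stdint.h>

typedef uint8_t (*port_read_fn)(uint16_t port);

/* No keyboard controller: every read floats to 0xff. */
static uint8_t bus_floating(uint16_t port) { (void)port; return 0xff; }

/* Idle controller: both buffers empty. */
static uint8_t kbc_idle(uint16_t port) { (void)port; return 0x00; }

/*
 * Assumed shape of the loop: delay, read status (0x64); if the output
 * buffer is full, delay and discard a byte from 0x60; stop once the
 * input buffer is empty. Each delay is one write to port 0x80.
 * Counts simulated port accesses in *io_ops.
 * Returns 1 if the controller drained, 0 on timeout.
 */
static int empty_8042_sim(uint32_t limit, port_read_fn inb,
                          unsigned long *io_ops)
{
    uint32_t count = limit;

    while (--count != 0) {          /* decl %ecx ; jz end */
        uint8_t status;

        (*io_ops)++;                /* delay */
        status = inb(0x64);
        (*io_ops)++;
        if (status & 0x01) {        /* output buffer full */
            (*io_ops)++;            /* delay */
            (void)inb(0x60);        /* discard the byte */
            (*io_ops)++;
            continue;
        }
        if (!(status & 0x02))       /* input buffer empty */
            return 1;
    }
    return 0;
}
```

Under these assumptions, a floating bus with a limit of 100000 costs 399,996 port accesses before the loop gives up, or about 0.4 seconds at 1 microsecond each, in line with the per-call figure quoted above. The old limit of 0x00FFFFFF is roughly 167 times larger.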

2000-12-06 02:13:24

by Linus Torvalds

Subject: Re: That horrible hack from hell called A20



On Tue, 5 Dec 2000, Linus Torvalds wrote:
>
> Right now this is my interim patch (to clean test11). The thing to note is
> that I decreased the keyboard controller timeout by a factor of about 167,
> while making the "delay" a bit longer.

Oh, btw, I forgot to ask people to give this a whirl. I assume it fixes
the APM problems for Kai.

It definitely won't fix the silly Olivetti M4 issue (we still touch bit #1
in 0x92). We'll need to fix that by testing A20 before bothering with the
0x92 stuff. Alan, that should get fixed in 2.2.x too - clearly those
Olivetti machines can be considered buggy, but even so..

Who else had trouble with the keyboard controller?

Linus

2000-12-06 02:37:28

by H. Peter Anvin

Subject: Re: That horrible hack from hell called A20

Alan Cox wrote:
>
> > Okay, here is my latest attempt to find a way to toggle A20M# that
> > genuinely works on all machines -- including Olivettis, IBM Aptivas,
> > bizarre notebooks, yadda yadda.
>
> Can I suggest a slightly different hammer. Flip the A20 via the keyboard
> controller and set the timeout to say 1 second. If that fails then kick the
> 0x92 stuff ?

I think that's pretty much the one remaining plan.

-hpa

--
<[email protected]> at work, <[email protected]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt

2000-12-06 03:10:04

by H. Peter Anvin

Subject: Re: That horrible hack from hell called A20

Linus Torvalds wrote:
>
> On Tue, 5 Dec 2000, Linus Torvalds wrote:
> >
> > Right now this is my interim patch (to clean test11). The thing to note is
> > that I decreased the keyboard controller timeout by a factor of about 167,
> > while making the "delay" a bit longer.
>
> Oh, btw, I forgot to ask people to give this a whirl. I assume it fixes
> the APM problems for Kai.
>
> It definitely won't fix the silly Olivetti M4 issue (we still touch bit #1
> in 0x92). We'll need to fix that by testing A20 before bothering with the
> 0x92 stuff. Alan, that should get fixed in 2.2.x too - clearly those
> Olivetti machines can be considered buggy, but even so..
>
> Who else had trouble with the keyboard controller?
>

Some IBM Aptiva box...

-hpa

--
<[email protected]> at work, <[email protected]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt

2000-12-06 10:47:41

by Kai Germaschewski

Subject: Re: That horrible hack from hell called A20



On Tue, 5 Dec 2000, Linus Torvalds wrote:

> On Wed, 6 Dec 2000, Kai Germaschewski wrote:
>
> >
> > On Tue, 5 Dec 2000, H. Peter Anvin wrote:
> >
> > > If you have had A20M# problems with any kernel -- recent or not --
> > > *please* try this patch, against 2.4.0-test12-pre5:
> >
> > Just a datapoint: This patch doesn't fix the problem here (Sony
> > PCG-Z600NE). Still the spontaneous reboot exactly the moment I expect to
> > get my console back from resuming.

First of all, I tested removing the "or $2" in hpa's patch.
That made resume work. Then, putting back the "or" but removing the
"and $0xfe" had no effect at all (i.e. resume -> reboot).

I suppose not setting bit 1 made fast A20 enable not succeed, so kbc A20
was used -> resume works fine.

With setting bit 1 and with or w/o clearing bit 0, fast A20 enable seems
to succeed -> kbc A20 is not used -> resume reboots.


> Want to bet $5 USD that suspend/resume saves the keyboard A20 state, but
> does NOT save the fast-A20 gate information?

I suppose I'd lose that bet.

> So anything that enables A20 with only the fast A20 gate will find that
> A20 is disabled again on resume.
>
> Which would make Linux _really_ unhappy, needless to say. Instant death in
> the form of a triple fault (all of the Linux kernel code is in the 1-2MB
> area, which would be invisible), resulting in an instant reboot.

Makes sense to me.

test12-pre6 now works perfectly :-)

--Kai


2000-12-06 13:13:59

by Giacomo A. Catenazzi

Subject: Re: That horrible hack from hell called A20

"H. Peter Anvin" wrote:
>
>
> Good question. The whole thing makes me nervous... in fact, perhaps we
> should really consider using the BIOS INT 15h interrupt to enter
> protected mode?
>

Maybe it is better to try with INT15 AX=2401 (Enable A20 gate).

INT 15-2400 disable A20
INT 15-2401 enable A20
INT 15-2402 query status A20
INT 15-2403 query A20 support (kbd or port 92)

IBM classifies these functions as optional, but they are enabled on a lot
of new BIOSes with no known conflicts, so we can call this function to
enable A20, check the result, and try the old methods only after failure.


giacomo

2000-12-06 13:32:11

by Alan Cox

Subject: Re: That horrible hack from hell called A20

> INT 15-2401 enable A20
> INT 15-2402 query status A20
> INT 15-2403 query A20 support (kbd or port 92)
>
> IBM classifies these functions as optional, but they are enabled on a lot
> of new BIOSes with no known conflicts, so we can call this function to
> enable A20, check the result, and try the old methods only after failure.

I trust Linus over BIOS vendors, every single time.

2000-12-06 16:24:50

by Richard B. Johnson

Subject: Re: That horrible hack from hell called A20

On Tue, 5 Dec 2000, Linus Torvalds wrote:

>
>
> On Wed, 6 Dec 2000, Kai Germaschewski wrote:
>
> >
> > On Tue, 5 Dec 2000, H. Peter Anvin wrote:
> >
> > > If you have had A20M# problems with any kernel -- recent or not --
> > > *please* try this patch, against 2.4.0-test12-pre5:
> >
> > Just a datapoint: This patch doesn't fix the problem here (Sony
> > PCG-Z600NE). Still the spontaneous reboot exactly the moment I expect to
> > get my console back from resuming.
>
> Can you test whether it's the "and 0xfe" or the "or $2" that does it for
> you?
>
> Right now we know that the Olivetti M4 has problems with the "or $2". I'd
> like to know if this is the same bit #1, or whether it's #0.
>
> [ And I agree with Peter - if somebody knows BIOS programming and how to
> use "int 15" to enter protected mode, then that might well be the
> easiest solution. The only real reason the linux setup code does it by
> hand is that the original code was written that way - and it was written
> that way because I had never used the BIOS in my life before, _and_ I
> wanted to learn the i386. Both of which were valid reasons back in 1991.
> Neither of which is probably a very good reason ten years later ;]
>
> Linus
>

The protected-mode switch in INT 15 is probably the least tested BIOS
function ever. I wouldn't trust it, and relying on it will put further
burden on embedded Linux developers, many of whom don't even have a
BIOS. It is 'least tested' because there is no way provided to get
back to real-mode. This implies that somebody probably 'tested' it
once, verified that some simple 32-bit function executed for a
few microseconds, then declared, "It works!".

Many new chip-sets snoop for the sequence:

Write 0xd1 to port 0x64
Write 0xN1 to port 0x60

... Where 'N' are any bits and the LSB enables A<20> propagation.

The writes have to be in sequence, therefore, one must read 0x60
first, OR in bit 0. Write 0xD1 to 0x64, then write the new enable-
value to port 0x60.

It takes about 700 to 1500 microseconds for a real keyboard controller
to enable the A<20> propagation bit. It takes only a few hundred
nanoseconds for the virtual sequence, above, to do the same thing.

On all machines I have looked at, including several lap-tops, the
'fast' A<20> enable port is R/W. This means that you don't have to
crash the machine by setting some secret reserved bit. Just read first,
OR in your A<20> bit, then write.

You can experiment by booting DOS (or free-DOS), and play using DEBUG.
Setting A<20> while in real-mode DOS won't hurt anything. You can even
check for wrap beyond FFFF:0000 from DEBUG to see if the bit is enabled.
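
The wrap check from DEBUG comes down to real-mode address arithmetic, sketched here in C. The helper names `real_mode_linear` and `real_mode_phys` are made up for this example; the model assumes that with A20 off the address simply wraps at 1MB.

```c
#include <stdbool.h>
#include <stdint.h>

/* Real-mode linear address: segment * 16 + offset (can exceed 1MB). */
static uint32_t real_mode_linear(uint16_t seg, uint16_t off)
{
    return ((uint32_t)seg << 4) + off;
}

/* Address that reaches memory: wraps to 20 bits when A20 is off. */
static uint32_t real_mode_phys(uint16_t seg, uint16_t off, bool a20_enabled)
{
    uint32_t linear = real_mode_linear(seg, off);

    return a20_enabled ? linear : (linear & UINT32_C(0xFFFFF));
}
```

FFFF:0010 is the first address past 1MB (linear 0x100000). With A20 off it lands on 0000:0000, so if a byte written at one of the two locations shows up at the other, the line is still gated.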

I suggest that a "universal" sequence is the D1/N1 sequence shown above
(this will work with real keyboard controllers), you don't have to
wait for completion of the last command as long as you don't have
more than two commands in sequence. This is because the controller
doesn't know if you ever read the status, and it will execute the
next command, if one exists, as soon as it writes the completion status
from the previous.

The next step of the "universal" sequence, if the previous doesn't
work, would be to enable the fast A<20> bit (only) in port 0x92.


Cheers,
Dick Johnson

Penguin : Linux version 2.4.0 on an i686 machine (799.54 BogoMips).

"Memory is like gasoline. You use it up when you are running. Of
course you get it all back when you reboot..."; Actual explanation
obtained from the Micro$oft help desk.


2000-12-06 18:08:13

by H. Peter Anvin

Subject: Re: That horrible hack from hell called A20

Alan Cox wrote:
>
> > INT 15-2401 enable A20
> > INT 15-2402 query status A20
> > INT 15-2403 query A20 support (kbd or port 92)
> >
> > IBM classifies these functions as optional, but they are enabled on a lot
> > of new BIOSes with no known conflicts, so we can call this function to
> > enable A20, check the result, and try the old methods only after failure.
>
> I trust Linus over BIOS vendors, every single time.
>

The problem here is Linus doing things right, and the BIOS vendors not...
and the BIOS getting confused. If INT 15:24xx is supported, we might
actually want to use it, under the assumption that if it works, the BIOS
Suspend-to-Foo routines probably won't get too confused.

-hpa

--
<[email protected]> at work, <[email protected]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt

2000-12-06 18:57:59

by H. Peter Anvin

Subject: Re: That horrible hack from hell called A20

"Richard B. Johnson" wrote:
>
> The protected-mode switch in INT 15 is probably the least tested BIOS
> function ever. I wouldn't trust it, and relying on it will put further
> burden on embedded Linux developers, many of whom don't even have a
> BIOS. It is 'least tested' because there is no way provided to get
> back to real-mode. This implies that somebody probably 'tested' it
> once, verified that some simple 32-bit function executed for a
> few microseconds, then declared, "It works!".
>

And of course, that's pretty much all we'd trust it to do. Personally,
I'd rather try to use the A20 gate function, if it works. I suspect that
between the machines where the BIOS or the KBC works, we should be close
to 100% coverage.

--
<[email protected]> at work, <[email protected]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt

2000-12-08 20:48:29

by Riley Williams

Subject: Re: That horrible hack from hell called A20

Hi Peter.

On Tue, 5 Dec 2000, H. Peter Anvin wrote:

> Linus Torvalds wrote:

>> Actually, I bet I know what's up.
>>
>> Want to bet $5 USD that suspend/resume saves the keyboard A20 state,
>> but does NOT save the fast-A20 gate information?
>>
>> So anything that enables A20 with only the fast A20 gate will find
>> that A20 is disabled again on resume.
>>
>> Which would make Linux _really_ unhappy, needless to say. Instant
>> death in the form of a triple fault (all of the Linux kernel code is
>> in the 1-2MB area, which would be invisible), resulting in an
>> instant reboot.
>>
>> Peter, we definitely need to do the keyboard A20, even if fast-A20
>> works fine.

> Yup. It's a BIOS bug, oh what a shocker... (that never happens,
> right)?

One alternative would presumably be to reserve a block in the 0-1M
region for a routine to be called on resume that makes sure everything
is set up correctly. However, from the various comments, I gather that
such is not viable as it's already been excluded for other reasons, but
nobody seems to say precisely what the problems with this idea are.

I would presume such a routine would be set up such that when it's time
to suspend, a call is made to that routine at its 0-1M address, so when
the resume kicks in, it sees an IP in the 0-1M region to resume to.

As part of the kernel start-up, the kernel would reserve the page in
question, then copy the suspend/resume code to it, and only then would
it enable the suspend/resume facility.

Best wishes from Riley.

---
* Why didn't Linus Torvalds write the resume specification,
* rather than those idiots at MacroHard !!!