2002-01-09 18:50:06

by Sipos Ferenc

Subject: system time issue

Hi!

I have a Red Hat 7.2 box and compiled the latest 2.4.18pre2 kernel with
the 2.4.18pre2aa1 patches in order to use gcc pre-3.1. When I shut down
the system under the new kernel, it prints the usual message about
syncing the system time to the hardware clock. With kernels compiled by
gcc 2.96 everything is OK, but with this kernel the BIOS beeps after
reboot and the hardware time and date are reset to their initial values,
so I have to set them manually before POST continues. I think gcc
pre-3.1 miscompiles something in the kernel.

Paco



2002-01-09 19:45:13

by Richard B. Johnson

Subject: Re: system time issue

On 9 Jan 2002, Sipos Ferenc wrote:

> Hi!
>
> I have a Red Hat 7.2 box and compiled the latest 2.4.18pre2 kernel with
> the 2.4.18pre2aa1 patches in order to use gcc pre-3.1. When I shut down
> the system under the new kernel, it prints the usual message about
> syncing the system time to the hardware clock. With kernels compiled by
> gcc 2.96 everything is OK, but with this kernel the BIOS beeps after
> reboot and the hardware time and date are reset to their initial values,
> so I have to set them manually before POST continues. I think gcc
> pre-3.1 miscompiles something in the kernel.
>

Here is a test program I would like everybody to try. It is a module,
but it is not intended to be implemented in the kernel as a
module once everybody finds out that it will reliably cold-start even
an SMP machine. This is for Intel CPUs only.

The purpose of the module is to execute two simple CPU instructions
in kernel mode. This forces a processor reset, just like hitting
the reset button.

There are two scripts that you should look at because you might
want to go to single-user mode first and unmount your disk(s)
if your `kill -1` command kills you as well as everybody else.

I think all the complex reboot stuff in the kernel can be removed
and the simple two-line assembly substituted.

If anybody finds that executing the 'open()' of this test module
does not cause a processor reset in which a cold boot is forced,
please let me know.

Once most everybody has faith that this will produce a clean
reboot, somebody, maybe Allan, can use it to replace the reboot
code in the kernel.

The code works by disabling paging while executing code where
there is not a 1:1 physical/virtual page mapping. I have never
found a system, even one with two CPUs, that did not instantly
reset.
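
As a rough illustration of why clearing the paging bit is fatal here, the
sketch below models only the address arithmetic in user space. It does not
touch CR0 and it is not the code from reboot.tar.gz, which is not
reproduced in this thread. It assumes the usual i386 layout in which
directly mapped kernel memory appears at physical address + 0xC0000000;
the helper names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural CR0 bits on IA-32. */
#define CR0_PE (UINT32_C(1) << 0)    /* protected mode enable */
#define CR0_PG (UINT32_C(1) << 31)   /* paging enable */

/* Assumption: default i386 kernel offset for directly mapped memory. */
#define PAGE_OFFSET UINT32_C(0xC0000000)

/* Hypothetical helper: the value a "disable paging" sequence
 * would write back to CR0. Only the PG bit changes. */
static inline uint32_t cr0_without_paging(uint32_t cr0)
{
    /* PG may only be set while PE is set, so reject impossible input. */
    assert(!(cr0 & CR0_PG) || (cr0 & CR0_PE));
    return cr0 & ~CR0_PG;
}

/* Hypothetical helper: physical address behind a directly mapped
 * kernel virtual address (what the kernel's __pa() computes). */
static inline uint32_t direct_map_phys(uint32_t virt)
{
    return virt - PAGE_OFFSET;
}

/* Code keeps running after PG is cleared only if the address in EIP
 * is also the physical address of that code (a 1:1 mapping). */
static inline int is_identity_mapped(uint32_t virt)
{
    return direct_map_phys(virt) == virt;
}
```

With paging off, the address in EIP is used as a physical address, so for
kernel text at, say, 0xC0100000 the next fetch is aimed about 3 GB away
from where the code really lives at 0x00100000.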

Cheers,
Dick Johnson

Penguin : Linux version 2.4.1 on an i686 machine (797.90 BogoMips).

I was going to compile a list of innovations that could be
attributed to Microsoft. Once I realized that Ctrl-Alt-Del
was handled in the BIOS, I found that there aren't any.


Attachments:
reboot.tar.gz (1.24 kB)

2002-01-09 19:58:05

by Brian Gerst

Subject: Re: system time issue

"Richard B. Johnson" wrote:
>
> The code works by disabling paging while executing code where
> there is not a 1:1 physical/virtual page mapping. I have never
> found a system, even one with two CPUs, that did not instantly
> reset.

All you are doing is causing a triple fault, most likely starting with an
invalid-opcode fault. There are many ways of doing that, including the
no-IDT way the kernel currently uses, which IMHO would be more reliable
than depending on the processor crashing on random memory.

--

Brian Gerst

2002-01-09 20:11:38

by Richard B. Johnson

Subject: Re: system time issue

On Wed, 9 Jan 2002, Brian Gerst wrote:

> "Richard B. Johnson" wrote:
> >
> > The code works by disabling paging while executing code where
> > there is not a 1:1 physical/virtual page mapping. I have never
> > found a system, even one with two CPUs, that did not instantly
> > reset.
>
> All you are doing is causing a triple fault, most likely starting with an
> invalid-opcode fault. There are many ways of doing that, including the
> no-IDT way the kernel currently uses, which IMHO would be more reliable
> than depending on the processor crashing on random memory.
>
> --
>
> Brian Gerst

Kernel versions 2.4.1 through 2.4.17 (the last I checked) used a bunch of
ways, including the keyboard controller, the aux control port, and then
finally a transition to 16-bit address space with direct execution
of the reset vector. I found that the only reason the transition
to 16 bits "worked" was because of coding errors which caused the
processor reset.
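
For reference, the keyboard-controller method amounts to writing a "pulse
output lines" command to the controller's command port. The sketch below
only decodes the command byte in user space and performs no port I/O. It
assumes the conventional 8042 interface: status/command at port 0x64,
"input buffer full" in status bit 1, and commands 0xF0-0xFF pulsing the
output lines whose low command bits are 0. The helper names are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define KBC_COMMAND_PORT    0x64  /* 8042 status (read) / command (write) */
#define KBC_STATUS_IBF      0x02  /* input buffer full: controller busy */
#define KBC_CMD_PULSE_RESET 0xFE  /* pulse output line 0 (CPU reset) low */
#define KBC_LINE_RESET      0x01  /* output line 0 is wired to CPU reset */

/* The controller accepts a new command only when its input buffer
 * is empty; this is the condition a wait loop polls for. */
static inline int kbc_ready(uint8_t status)
{
    return (status & KBC_STATUS_IBF) == 0;
}

/* For a pulse command (0xF0-0xFF), return a mask of the output
 * lines that get pulsed: a 0 bit in the low nibble selects a line. */
static inline uint8_t kbc_pulsed_lines(uint8_t cmd)
{
    assert(cmd >= 0xF0);            /* only pulse commands are valid here */
    return (uint8_t)(~cmd & 0x0F);
}
```

Decoding 0xFE this way yields only the reset line, which is why that
single byte is enough to request a reset on hardware where the line is
actually connected.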

Therefore, I created a deliberate "coding error" which actually
doesn't require fetching random instructions. The processor never
even gets to fetch anything because a CS selector for the correct
segment has never been loaded. It can't fetch instructions and,
in fact, a logic analyzer shows that the last memory access was
the dword load of the trash CR0 instructions plus some additional
cache-line fill of valid code.

Cheers,
Dick Johnson

Penguin : Linux version 2.4.1 on an i686 machine (797.90 BogoMips).

I was going to compile a list of innovations that could be
attributed to Microsoft. Once I realized that Ctrl-Alt-Del
was handled in the BIOS, I found that there aren't any.


2002-01-09 23:36:05

by H. Peter Anvin

Subject: Re: system time issue

Followup to: <[email protected]>
By author: "Richard B. Johnson" <[email protected]>
In newsgroup: linux.dev.kernel
>
> Kernel versions 2.4.1 through 2.4.17 (the last I checked) used a bunch of
> ways, including the keyboard controller, the aux control port, and then
> finally a transition to 16-bit address space with direct execution
> of the reset vector. I found that the only reason the transition
> to 16 bits "worked" was because of coding errors which caused the
> processor reset.
>

That is only invoked *IF REQUESTED BY USERSPACE*.

This is the real termination:

	if(!reboot_thru_bios) {
		/* rebooting needs to touch the page at absolute addr 0 */
		*((unsigned short *)__va(0x472)) = reboot_mode;
		for (;;) {
			int i;
			for (i=0; i<100; i++) {
				kb_wait();
				udelay(50);
				outb(0xfe,0x64);	/* pulse reset low */
				udelay(50);
			}
			/* That didn't work - force a triple fault.. */
			__asm__ __volatile__("lidt %0": :"m" (no_idt));
			__asm__ __volatile__("int3");
		}
	}

Zero the IDT and force an interrupt -> triple fault.
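
To spell out why a zeroed IDT leads to a triple fault, here is a small
user-space sketch of the limit check the CPU applies when it looks up a
vector. It assumes 32-bit protected mode (6-byte LIDT operand, 8-byte
gates) and GCC/Clang for the packed attribute. The struct and function
names are illustrative and are not taken from the kernel source.

```c
#include <assert.h>
#include <stdint.h>

/* Memory operand of LIDT in 32-bit mode: 16-bit limit, 32-bit base. */
struct idt_pointer {
    uint16_t limit;   /* offset of the last valid byte in the table */
    uint32_t base;    /* linear address of the table */
} __attribute__((packed));

_Static_assert(sizeof(struct idt_pointer) == 6, "LIDT operand is 6 bytes");

#define IDT_GATE_SIZE 8u   /* bytes per gate in 32-bit protected mode */

/* A vector is usable only if its whole 8-byte gate lies within the
 * limit; otherwise delivering it raises a general-protection fault. */
static inline int vector_in_limit(const struct idt_pointer *p, unsigned vector)
{
    assert(vector < 256);
    return vector * IDT_GATE_SIZE + (IDT_GATE_SIZE - 1) <= p->limit;
}
```

With limit 0, vector 3 (int3) is out of range, so the CPU tries to raise
a general-protection fault (vector 13), then a double fault (vector 8).
Neither has a reachable gate, so the processor shuts down and the
chipset normally turns that into a reset.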

-hpa
--
<[email protected]> at work, <[email protected]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt <[email protected]>