2000-10-26 17:03:34

by Vojtech Pavlik

Subject: Re: Possible critical VIA vt82c686a chip bug (private question)

On Thu, Oct 26, 2000 at 12:04:21PM -0400, Richard B. Johnson wrote:

> ../drivers/block/ide.c, line 162, on version 2.2.17 does bad things
> to the timer. It writes 0 to the control-word for timer 0. This
> does the following:
>
> o Selects timer 0.
> o Latches the timer.
> o Selects mode 0.
> o Programs it to a 16 bit counter.
>
> The result is a latched (stopped) counter. Bits 5 and 4 should have been
> selected. Then you read bits 0-7 from 0x40, followed by bits 8-15 from
> the same port.
>
> Also, there is no spin-lock protecting access to these ports. If anybody
> else is mucking with the timer, all bets are off.

Well, at least on 2.4.0-test9, the above timing code is #ifed to
DISK_RECOVERY_TIME > 0, which in turn is #defined to 0 in
include/linux/ide.h.

So this is not our problem here. Anyway, I guess it's time to hunt for
i8253/i8254 (PIT) accesses in the kernel that lack the necessary spinlock,
even if they're probably not the cause of the problem we see here.
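
For reference, the control-word decoding described in the quoted message
can be sketched in C as below. This is only an illustration: the names
(pit_ctrl, pit_decode, pit_is_latch_command, pit_combine) are made up here
and are not taken from ide.c, and the bit layout is as I understand it from
the 8253/8254 datasheet. As far as I can tell from that datasheet, a latch
command only freezes the value in the output latch until it is read; the
counter itself keeps running.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of an 8253/8254 control word (the byte written to port 0x43). */
struct pit_ctrl {
    unsigned counter; /* bits 7-6: counter select (3 = read-back on the 8254) */
    unsigned rw;      /* bits 5-4: 0 = latch command, 1 = LSB, 2 = MSB, 3 = LSB then MSB */
    unsigned mode;    /* bits 3-1: operating mode (ignored by a latch command) */
    unsigned bcd;     /* bit 0: 0 = binary, 1 = BCD (ignored by a latch command) */
};

/* Split a control word into its fields. */
struct pit_ctrl pit_decode(uint8_t word)
{
    struct pit_ctrl c;

    c.counter = (word >> 6) & 0x3;
    c.rw      = (word >> 4) & 0x3;
    c.mode    = (word >> 1) & 0x7;
    c.bcd     = word & 0x1;
    return c;
}

/* A counter-latch command selects a real counter and has bits 5-4 clear. */
int pit_is_latch_command(uint8_t word)
{
    struct pit_ctrl c = pit_decode(word);

    return c.counter != 3 && c.rw == 0;
}

/* Combine the two bytes read back from port 0x40 (bits 0-7 first, then 8-15). */
uint16_t pit_combine(uint8_t lsb, uint8_t msb)
{
    return (uint16_t)(lsb | (msb << 8));
}

/* Usage: the value 0 discussed above decodes as a latch of counter 0. */
void pit_example(void)
{
    assert(pit_is_latch_command(0x00));
    assert(pit_decode(0x00).counter == 0);
    assert(pit_combine(0x34, 0x12) == 0x1234);
}
```

So writing 0 and then reading 0x40 twice is the usual latch-and-read
sequence, with the low byte arriving first.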

--
Vojtech Pavlik
SuSE Labs


2000-10-26 17:43:57

by Richard B. Johnson

Subject: Re: Possible critical VIA vt82c686a chip bug (private question)

On Thu, 26 Oct 2000, Vojtech Pavlik wrote:

> On Thu, Oct 26, 2000 at 12:04:21PM -0400, Richard B. Johnson wrote:
>
> > ../drivers/block/ide.c, line 162, on version 2.2.17 does bad things
> > to the timer. It writes 0 to the control-word for timer 0. This
> > does the following:
[Snipped...]
>
> Well, at least on 2.4.0-test9, the above timing code is #ifed to
> DISK_RECOVERY_TIME > 0, which in turn is #defined to 0 in
> include/linux/ide.h.
>
> So this is not our problem here. Anyway, I guess it's time to hunt for
> i8253/i8254 (PIT) accesses in the kernel that lack the necessary spinlock,
> even if they're probably not the cause of the problem we see here.

Okay, good.

Cheers,
Dick Johnson

Penguin : Linux version 2.2.17 on an i686 machine (801.18 BogoMips).

"Memory is like gasoline. You use it up when you are running. Of
course you get it all back when you reboot..."; Actual explanation
obtained from the Micro$oft help desk.


2000-10-26 18:02:53

by Vojtech Pavlik

Subject: Re: Possible critical VIA vt82c686a chip bug (private question)

On Thu, Oct 26, 2000 at 01:42:29PM -0400, Richard B. Johnson wrote:

> > > ../drivers/block/ide.c, line 162, on version 2.2.17 does bad things
> > > to the timer. It writes 0 to the control-word for timer 0. This
> > > does the following:
> [Snipped...]
> >
> > Well, at least on 2.4.0-test9, the above timing code is #ifed to
> > DISK_RECOVERY_TIME > 0, which in turn is #defined to 0 in
> > include/linux/ide.h.
> >
> > So this is not our problem here. Anyway, I guess it's time to hunt for
> > i8253/i8254 (PIT) accesses in the kernel that lack the necessary spinlock,
> > even if they're probably not the cause of the problem we see here.
>
> Okay, good.

Ok, here is a list of places within the kernel that access the PIT
timer, plus the method of locking (i386 arch only):

Usage:                                          Lock method:

arch/i386/kernel/time.c:170:                    spin_lock()
arch/i386/kernel/time.c:491:                    spin_lock()
arch/i386/kernel/time.c:575:                    none (init)
arch/i386/kernel/i8259.c:491:                   none (init)
arch/i386/kernel/apm.c:871:                     cli()
arch/i386/kernel/apic.c:398:                    spin_lock_irqsave()

drivers/char/vt.c:121:                          cli()
drivers/char/ftape/lowlevel/ftape-calibr.c:80:  cli()
drivers/char/ftape/lowlevel/ftape-calibr.c:99:  cli()
drivers/char/joystick/analog.c:142:             cli() __cli()
drivers/char/joystick/gameport.c:66:            cli()
drivers/ide/hd.c:137:                           cli()
drivers/ide/ide.c:206:                          __cli()

I guess we'll need to fix this. While races here are not likely (the
most likely is a beep by vt.c at a wrong moment), they're possible.

However, these don't seem to be the cause of the problem we see here
anyway.
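
To illustrate why all of these sites would have to share one lock: the
latch write and the two byte reads form a single sequence, and a second
user cutting in between them disturbs the LSB/MSB ordering. Below is a
minimal userspace sketch of that, assuming a simulated counter (struct
fake_pit) and a C11 atomic_flag standing in for a kernel spinlock. None of
these names (pit_lock, fake_pit_write_ctrl, fake_pit_read, pit_read_count)
exist in the kernel; they are hypothetical and only model the latch
command for counter 0.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical lock shared by every PIT user; stands in for a kernel spinlock. */
static atomic_flag pit_lock = ATOMIC_FLAG_INIT;

static void pit_lock_acquire(void)
{
    while (atomic_flag_test_and_set_explicit(&pit_lock, memory_order_acquire))
        ; /* spin until the current holder releases the lock */
}

static void pit_lock_release(void)
{
    atomic_flag_clear_explicit(&pit_lock, memory_order_release);
}

/* Simulated counter 0: live count, output latch, and the LSB/MSB read toggle. */
struct fake_pit {
    uint16_t count;
    uint16_t latch;
    int latched;
    int next_is_msb;
};

/* Simulated control-port write; only the latch command for counter 0 is modelled. */
void fake_pit_write_ctrl(struct fake_pit *p, uint8_t word)
{
    if ((word & 0xf0) == 0x00 && !p->latched) {
        p->latch = p->count;
        p->latched = 1;
    }
}

/* Simulated data-port read: LSB first, then MSB, which releases the latch. */
uint8_t fake_pit_read(struct fake_pit *p)
{
    uint16_t v = p->latched ? p->latch : p->count;

    if (!p->next_is_msb) {
        p->next_is_msb = 1;
        return (uint8_t)(v & 0xff);
    }
    p->next_is_msb = 0;
    p->latched = 0;
    return (uint8_t)(v >> 8);
}

/* Latch and read both bytes as one critical section. */
uint16_t pit_read_count(struct fake_pit *p)
{
    uint8_t lsb, msb;

    pit_lock_acquire();
    assert(!p->next_is_msb); /* a stray unlocked read would leave the toggle at MSB */
    fake_pit_write_ctrl(p, 0x00);
    lsb = fake_pit_read(p);
    msb = fake_pit_read(p);
    pit_lock_release();
    return (uint16_t)(lsb | (msb << 8));
}
```

If a second reader skips the lock and arrives between the two byte reads,
it gets the first reader's MSB where it expects its own LSB. That is the
kind of race meant above.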

--
Vojtech Pavlik
SuSE Labs

2000-10-26 20:12:20

by Yoann Vandoorselaere

Subject: Re: Possible critical VIA vt82c686a chip bug (private question)

Vojtech Pavlik <[email protected]> writes:

> On Thu, Oct 26, 2000 at 01:42:29PM -0400, Richard B. Johnson wrote:
>
> > > > ../drivers/block/ide.c, line 162, on version 2.2.17 does bad things
> > > > to the timer. It writes 0 to the control-word for timer 0. This
> > > > does the following:
> > [Snipped...]
> > >
> > > Well, at least on 2.4.0-test9, the above timing code is #ifed to
> > > DISK_RECOVERY_TIME > 0, which in turn is #defined to 0 in
> > > include/linux/ide.h.
> > >
> > > So this is not our problem here. Anyway, I guess it's time to hunt for
> > > i8253/i8254 (PIT) accesses in the kernel that lack the necessary spinlock,
> > > even if they're probably not the cause of the problem we see here.
> >
> > Okay, good.
>
> Ok, here is a list of places within the kernel that access the PIT
> timer, plus the method of locking (i386 arch only):

[...]

Ok, I just tested if the problem was always present without
the IDE subsystem...

The answer is it is not... so it isn't an IDE problem.

--
-- Yoann http://www.mandrakesoft.com/~yoann/
An engineer from NVidia, while asking him to release cards specs said :
"Actually, we do write our drivers without documentation."

2000-10-26 20:17:20

by Vojtech Pavlik

Subject: Re: Possible critical VIA vt82c686a chip bug (private question)

On Thu, Oct 26, 2000 at 10:11:54PM +0200, Yoann Vandoorselaere wrote:

> > > > > ../drivers/block/ide.c, line 162, on version 2.2.17 does bad things
> > > > > to the timer. It writes 0 to the control-word for timer 0. This
> > > > > does the following:
> > > [Snipped...]
> > > >
> > > > Well, at least on 2.4.0-test9, the above timing code is #ifed to
> > > > DISK_RECOVERY_TIME > 0, which in turn is #defined to 0 in
> > > > include/linux/ide.h.
> > > >
> > > > So this is not our problem here. Anyway, I guess it's time to hunt for
> > > > i8253/i8254 (PIT) accesses in the kernel that lack the necessary spinlock,
> > > > even if they're probably not the cause of the problem we see here.
> > >
> > > Okay, good.
> >
> > Ok, here is a list of places within the kernel that access the PIT
> > timer, plus the method of locking (i386 arch only):
>
> [...]
>
> Ok, I just tested if the problem was always present without
> the IDE subsystem...
>
> The answer is it is not... so it isn't an IDE problem.

Uh, guess too many negations. You wanted to say that the problem was
present even when you disabled the IDE subsystem, right?

So now it seems that possibly enough PCI traffic / busmastering traffic
can cause the problem ...

--
Vojtech Pavlik
SuSE Labs

2000-10-26 21:05:44

by Yoann Vandoorselaere

Subject: Re: Possible critical VIA vt82c686a chip bug (private question)

Vojtech Pavlik <[email protected]> writes:

> On Thu, Oct 26, 2000 at 10:11:54PM +0200, Yoann Vandoorselaere wrote:
>
> > > > > > ../drivers/block/ide.c, line 162, on version 2.2.17 does bad things
> > > > > > to the timer. It writes 0 to the control-word for timer 0. This
> > > > > > does the following:
> > > > [Snipped...]
> > > > >
> > > > > Well, at least on 2.4.0-test9, the above timing code is #ifed to
> > > > > DISK_RECOVERY_TIME > 0, which in turn is #defined to 0 in
> > > > > include/linux/ide.h.
> > > > >
> > > > > So this is not our problem here. Anyway, I guess it's time to hunt for
> > > > > i8253/i8254 (PIT) accesses in the kernel that lack the necessary spinlock,
> > > > > even if they're probably not the cause of the problem we see here.
> > > >
> > > > Okay, good.
> > >
> > > Ok, here is a list of places within the kernel that access the PIT
> > > timer, plus the method of locking (i386 arch only):
> >
> > [...]
> >
> > Ok, I just tested if the problem was always present without
> > the IDE subsystem...
> >
> > The answer is it is not... so it isn't an IDE problem.
>
> Uh, guess too many negations. You wanted to say that the problem was
> present even when you disabled the IDE subsystem, right?

yop

>
> So now it seems that possibly enough PCI traffic / busmastering traffic
> can cause the problem ...

yop, I've done:

make -j10 World
in the xfree tree and simultaneously:

while true; do make dep && make clean && make bzImage; done
in the kernel tree


--
-- Yoann http://www.mandrakesoft.com/~yoann/
An engineer from NVidia, while asking him to release cards specs said :
"Actually, we do write our drivers without documentation."

2000-10-26 21:16:19

by Vojtech Pavlik

Subject: Re: Possible critical VIA vt82c686a chip bug (private question)

On Thu, Oct 26, 2000 at 11:05:04PM +0200, Yoann Vandoorselaere wrote:

> yop, I've done:
>
> make -j10 World
> in the xfree tree and simultaneously:
>
> while true; do make dep && make clean && make bzImage; done
> in the kernel tree

Now it'd be nice to verify that the problem also happens when the system
is not running out of memory (which -j10 probably causes, I think) ...

--
Vojtech Pavlik
SuSE Labs

2000-10-26 21:25:00

by Yoann Vandoorselaere

Subject: Re: Possible critical VIA vt82c686a chip bug (private question)

Vojtech Pavlik <[email protected]> writes:

> On Thu, Oct 26, 2000 at 11:05:04PM +0200, Yoann Vandoorselaere wrote:
>
> > yop, I've done:
> >
> > make -j10 World
> > in the xfree tree and simultaneously:
> >
> > while true; do make dep && make clean && make bzImage; done
> > in the kernel tree
>
> Now it'd be nice to verify that the problem also happens when the system
> is not running out of memory (which -j10 probably causes, I think) ...

Nope, my system was loaded, but was usable
(at least until the problem occurred)...

Athlon 750 with 128 MB of RAM and 103 MB of swap.

--
-- Yoann http://www.mandrakesoft.com/~yoann/
An engineer from NVidia, while asking him to release cards specs said :
"Actually, we do write our drivers without documentation."

2000-10-26 21:26:50

by Vojtech Pavlik

Subject: Re: Possible critical VIA vt82c686a chip bug (private question)

On Thu, Oct 26, 2000 at 11:24:38PM +0200, Yoann Vandoorselaere wrote:
> Vojtech Pavlik <[email protected]> writes:
>
> > On Thu, Oct 26, 2000 at 11:05:04PM +0200, Yoann Vandoorselaere wrote:
> >
> > > yop, I've done:
> > >
> > > make -j10 World
> > > in the xfree tree and simultaneously:
> > >
> > > while true; do make dep && make clean && make bzImage; done
> > > in the kernel tree
> >
> > Now it'd be nice to verify that the problem also happens when the system
> > is not running out of memory (which -j10 probably causes, I think) ...
>
> Nope, my system was loaded, but was usable
> (at least until the problem occurred)...

Good to know.

--
Vojtech Pavlik
SuSE Labs