2001-04-13 08:00:59

by Dennis Björklund

[permalink] [raw]
Subject: Data-corruption bug in VIA chipsets

Here might be one of the resons for the trouble with VIA chipsets:

http://www.theregister.co.uk/content/3/18267.html

Some DMA error corrupting data, sounds like a really nasty bug. The
information is minimal on that page.

I just bought one of these babies and I should probably return it
directly. I have seven days to return it and get my money back. I have not
even opened the box yet.

They seems to think they can correct it by some bios updates, but who
knows what that fix might be. Maybe they turn of DMA alltogether
(hopefully not).

If anybody knows more about it I'm very interested.

--
/Dennis


2001-04-13 09:45:27

by Ingo Oeser

[permalink] [raw]
Subject: Re: Data-corruption bug in VIA chipsets

On Fri, Apr 13, 2001 at 10:00:32AM +0200, Dennis Bjorklund wrote:
> Here might be one of the resons for the trouble with VIA chipsets:
>
> http://www.theregister.co.uk/content/3/18267.html
>
> Some DMA error corrupting data, sounds like a really nasty bug. The
> information is minimal on that page.

These are the things, that one of the German links[1] suggest
(translated only, because I'm not the IDE guy ;-)):

- PCI Delay Transaction = 0 (off) (Register 0x70, Bit 1)
- PCI Master Read Caching = 0 (off) (Register 0x70, Bit 2)
- PCI Latency = 0 (values between 0 and 32 *seem* to be safe,
everything above seems to be *not* !)

Note: This also fixes some related USB issues according to [1].

Some hassles of setting the "PCI Latency" are described and one
of their reader found out, that it is "PCI Bus Master Time-Out"
on his board.

Register 0x75, Bits 0-3 are at 0001, which means 32 as latency
value. He set it to 0000 and it helps. This setting also does no
harm according to the magazine.

The observations are valid for the VT82C686B. One of their
readers also observed it at VT82C686A too and reported, that the
workaround helps.

So we might want to enable these workarounds for this
southbridge, too.

Hope this translation helps our maintainers a little ;-)

Regards

Ingo Oeser

[1] http://home.tiscalinet.de/au-ja/review-kt133a-4.html
--
10.+11.03.2001 - 3. Chemnitzer LinuxTag <http://www.tu-chemnitz.de/linux/tag>
<<<<<<<<<<<< been there and had much fun >>>>>>>>>>>>

2001-04-13 13:05:18

by Alan

[permalink] [raw]
Subject: Re: Data-corruption bug in VIA chipsets

> Here might be one of the resons for the trouble with VIA chipsets:
>
> http://www.theregister.co.uk/content/3/18267.html
>
> Some DMA error corrupting data, sounds like a really nasty bug. The
> information is minimal on that page.

What annoys me is that we've known about the problem for _ages_. If you look
the 2.4 kernel has experimental workarounds for this problem. VIA never once
even returned an email to say 'we are looking into this'. Instead people sat
there flashing multiple BIOS images and seeing what made the difference.

> I just bought one of these babies and I should probably return it
> directly. I have seven days to return it and get my money back. I have not
> even opened the box yet.

Disabling pci master read caching is likely to reduce the performance of the
board.

> They seems to think they can correct it by some bios updates, but who
> knows what that fix might be. Maybe they turn of DMA alltogether
> (hopefully not).

The -ac kernel does this on the KT7 series boards which seemed the worst
affected.

Hopefully now someone in VIA will have the decency to tell the community
how to detect setups that need a BIOS upgrade so we can warn them before the
chipset bug turns there file systems into sludge.

Alan

2001-04-13 13:11:08

by Alan

[permalink] [raw]
Subject: Re: Data-corruption bug in VIA chipsets

> These are the things, that one of the German links[1] suggest
> (translated only, because I'm not the IDE guy ;-)):
>
> - PCI Delay Transaction = 0 (off) (Register 0x70, Bit 1)
> - PCI Master Read Caching = 0 (off) (Register 0x70, Bit 2)
> - PCI Latency = 0 (values between 0 and 32 *seem* to be safe,
> everything above seems to be *not* !)
>
> Note: This also fixes some related USB issues according to [1].

If you set the latency only within 0 and 32 then numerous other cards will
stop working (because they set the latency up to fix pci bugs or get
performance) - eg the buslogic scsi cards set the latency in their bios. The
3c59x needs a high value.

The values they quote are ones people tried and they were pulled because those
were the values that generated all the 'my tv card has broken' 'my ethernet
stopped working' reports.

Alan

2001-04-13 13:30:36

by Doug McNaught

[permalink] [raw]
Subject: Re: Data-corruption bug in VIA chipsets

Alan Cox <[email protected]> writes:

> > Here might be one of the resons for the trouble with VIA chipsets:
> >
> > http://www.theregister.co.uk/content/3/18267.html
> >
> > Some DMA error corrupting data, sounds like a really nasty bug. The
> > information is minimal on that page.
>
> What annoys me is that we've known about the problem for _ages_. If you look
> the 2.4 kernel has experimental workarounds for this problem. VIA never once
> even returned an email to say 'we are looking into this'. Instead people sat
> there flashing multiple BIOS images and seeing what made the difference.

Is this problem likely to affect 2.2.X? I have a VIA-based board on
order (Tyan Trinity) and I don't plan to run 2.4 on it anytime soon
(it's upgrading a stock RH6.2 box).

Am I safe if I stay in PIO mode?

-Doug

2001-04-13 13:35:17

by Alan

[permalink] [raw]
Subject: Re: Data-corruption bug in VIA chipsets

> Is this problem likely to affect 2.2.X? I have a VIA-based board on
> order (Tyan Trinity) and I don't plan to run 2.4 on it anytime soon
> (it's upgrading a stock RH6.2 box).
>
> Am I safe if I stay in PIO mode?

I have received exactly zero reports of 2.2 problems, and as the 2.2 maintainer
I would have expected more (I delete 2.2 + ide-patch reports). My suspicion is
the problem requires UDMA to occur, or to occur with any probability.

The real concern (as with all of these things) is going to be what the
workaround does to performance - as measured in frames/second for most folks ;)


2001-04-13 14:02:59

by Doug McNaught

[permalink] [raw]
Subject: Re: Data-corruption bug in VIA chipsets

Alan Cox <[email protected]> writes:

> > Is this problem likely to affect 2.2.X? I have a VIA-based board on
> > order (Tyan Trinity) and I don't plan to run 2.4 on it anytime soon
> > (it's upgrading a stock RH6.2 box).
> >
> > Am I safe if I stay in PIO mode?
>
> I have received exactly zero reports of 2.2 problems, and as the 2.2
> maintainer I would have expected more (I delete 2.2 + ide-patch
> reports). My suspicion is the problem requires UDMA to occur, or to
> occur with any probability.

This is good to know. I'll stay away from UDMA and the ide-patches
until things seem clearer then.

> The real concern (as with all of these things) is going to be what the
> workaround does to performance - as measured in frames/second for most folks ;)

Well, this is a compile server (and will have a lot of RAM) so running
PIO for a while shouldn't have much impact.

Thanks, Alan.

-Doug

2001-04-13 22:58:48

by Jamie Lokier

[permalink] [raw]
Subject: Re: Data-corruption bug in VIA chipsets

Alan Cox wrote:
> > Is this problem likely to affect 2.2.X? I have a VIA-based board on
> > order (Tyan Trinity) and I don't plan to run 2.4 on it anytime soon
> > (it's upgrading a stock RH6.2 box).
> >
> > Am I safe if I stay in PIO mode?
>
> I have received exactly zero reports of 2.2 problems, and as the 2.2
> maintainer I would have expected more (I delete 2.2 + ide-patch
> reports). My suspicion is the problem requires UDMA to occur, or to
> occur with any probability.

Are you talking about IDE DMA problems on any VIA boards, or the Tyan in
particular? I've sent several reports of sudden system death on a VIA
motherboard, that were confirmed by a few other people. It's still
present in 2.2: Mandrake 7's installer froze, twice, until I added
"ide=nodma" (or whatever the option is). Note, this is _without_ UDMA:
the board is not capable of UDMA.

-- Jamie

2001-04-14 05:50:24

by Dan Podeanu

[permalink] [raw]
Subject: Re: Data-corruption bug in VIA chipsets

On 13 Apr 2001, Doug McNaught wrote:

> Alan Cox <[email protected]> writes:
>
> > > Here might be one of the resons for the trouble with VIA chipsets:
> > >
> > > http://www.theregister.co.uk/content/3/18267.html
> > >
> > > Some DMA error corrupting data, sounds like a really nasty bug. The
> > > information is minimal on that page.
> >
> > What annoys me is that we've known about the problem for _ages_. If you look
> > the 2.4 kernel has experimental workarounds for this problem. VIA never once
> > even returned an email to say 'we are looking into this'. Instead people sat
> > there flashing multiple BIOS images and seeing what made the difference.
>
> Is this problem likely to affect 2.2.X? I have a VIA-based board on
> order (Tyan Trinity) and I don't plan to run 2.4 on it anytime soon
> (it's upgrading a stock RH6.2 box).
>

We've had HUGE problems with 2.4.x kernels and VIA based boards. We have
here 3 VIA boards with Athlon/850 and Duron/900 CPUs. The problem goes
like this:

Compile 2.4.3 with VIA and Athlon support.
Reboot.
Ooopses (between 1 and continuously scrolling of them) occur at random
periods of time, between mounting the root filesystem and 2-3 minutes of
functionality.

Note that the problem occurs with all versions of 2.4 but not with
2.2.17-2.2.19. Also, enabling or disabling DMA doesn't help fix the
problem.

After several compiles, it appears that compiling with 586 cpu support
instead of Athlon resulted in a working, stable kernel (which is rather
strange imo, given there are different CPUs but the same board model).
However, compiling with 386 support didn't (which was my first guess of a
stable system).

I was a bit lazy and didn't save/ksymoops any of the oopses, but will do
if this is not a well-known problem (and it appears that more or less it
is).

My lspci:
00:00.0 Host bridge: VIA Technologies, Inc.: Unknown device 0305 (rev 03)
00:01.0 PCI bridge: VIA Technologies, Inc.: Unknown device 8305
00:07.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super] (rev
40)
00:07.1 IDE interface: VIA Technologies, Inc. VT82C586 IDE [Apollo] (rev
06)
00:07.2 USB Controller: VIA Technologies, Inc. VT82C586B USB (rev 16)
00:07.4 Bridge: VIA Technologies, Inc. VT82C686 [Apollo Super ACPI] (rev
40)
00:07.5 Multimedia audio controller: VIA Technologies, Inc. VT82C686
[Apollo Super AC97/Audio] (rev 50)
00:0b.0 Ethernet controller: Intel Corporation 82557 [Ethernet Pro 100]
(rev 08)01:00.0 VGA compatible controller: Matrox Graphics, Inc. MGA G400
AGP (rev 04)

(although the cards stuck in the PCIs aren't the same for all three
systems)

and /proc/cpuinfo (from the kernel with 586 support):
processor : 0
vendor_id : AuthenticAMD
cpu family : 6
model : 4
model name : AMD Athlon(tm) Processor
stepping : 2
cpu MHz : 851.961
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov
pat pse36 mmx fxsr syscall mmxext 3dnowext 3dnow
bogomips : 1697.38

I know that all in all this is rather irrelevant, but if further
information is needed, drop me a line and I'll try be a good boy (tm) :)

Yours, Dan.

2001-04-15 17:45:38

by Thomas Molina

[permalink] [raw]
Subject: Re: Data-corruption bug in VIA chipsets

On Sat, 14 Apr 2001, Dan Podeanu wrote:

> On 13 Apr 2001, Doug McNaught wrote:
>
> > Alan Cox <[email protected]> writes:
> >
> > > > Here might be one of the resons for the trouble with VIA chipsets:
> > > >
> > > > http://www.theregister.co.uk/content/3/18267.html
> > > >
> > > > Some DMA error corrupting data, sounds like a really nasty bug. The
> > > > information is minimal on that page.
> > >
> > > What annoys me is that we've known about the problem for _ages_. If you look
> > > the 2.4 kernel has experimental workarounds for this problem. VIA never once
> > > even returned an email to say 'we are looking into this'. Instead people sat
> > > there flashing multiple BIOS images and seeing what made the difference.
> >
> > Is this problem likely to affect 2.2.X? I have a VIA-based board on
> > order (Tyan Trinity) and I don't plan to run 2.4 on it anytime soon
> > (it's upgrading a stock RH6.2 box).
> >
>
> We've had HUGE problems with 2.4.x kernels and VIA based boards. We have
> here 3 VIA boards with Athlon/850 and Duron/900 CPUs. The problem goes
> like this:
>
> Compile 2.4.3 with VIA and Athlon support.
> Reboot.
> Ooopses (between 1 and continuously scrolling of them) occur at random
> periods of time, between mounting the root filesystem and 2-3 minutes of
> functionality.
>

Interesting. I have an ASUS A7V board I'm running an Athlon 900 on with
none of the problems noted here. Are there specific hardware
correlations that people have noted? It does have the 686B southbridge
noted in the article as causing problems.

The BIOS thing is interesting though. I work part time in a computer
repair shop. With the A7V boards you never know which BIOS version will
be on the board. The A7V is one of the most popular ones we have for
AMD chips. We sell a ton of them, so if there are problems I'd sure
like to be kept up to date.

2001-04-16 06:47:06

by Eric W. Biederman

[permalink] [raw]
Subject: Re: Data-corruption bug in VIA chipsets

Alan Cox <[email protected]> writes:

> > Here might be one of the resons for the trouble with VIA chipsets:
> >
> > http://www.theregister.co.uk/content/3/18267.html
> >
> > Some DMA error corrupting data, sounds like a really nasty bug. The
> > information is minimal on that page.
>
> What annoys me is that we've known about the problem for _ages_. If you look
> the 2.4 kernel has experimental workarounds for this problem. VIA never once
> even returned an email to say 'we are looking into this'. Instead people sat
> there flashing multiple BIOS images and seeing what made the difference.
>
> > I just bought one of these babies and I should probably return it
> > directly. I have seven days to return it and get my money back. I have not
> > even opened the box yet.
>
> Disabling pci master read caching is likely to reduce the performance of the
> board.
>
> > They seems to think they can correct it by some bios updates, but who
> > knows what that fix might be. Maybe they turn of DMA alltogether
> > (hopefully not).
>
> The -ac kernel does this on the KT7 series boards which seemed the worst
> affected.
>
> Hopefully now someone in VIA will have the decency to tell the community
> how to detect setups that need a BIOS upgrade so we can warn them before the
> chipset bug turns there file systems into sludge.

I wonder if someone at VIA even knows what is going on. In working
with linuxBIOS Ron Minnich was worked with VIA to get it up on some of
their chipsets. He ran into a few cases where his code wouldn't work,
he'd show it to the engineers at VIA and they also wouldn't have a
clue why his code was failing. And that it looked like only Award
knew how the chipset really worked. This is northbridge code not
southbridge code so it may be an entirely different ball game but...

Anyway Alan you might want to bounce off Ron. He might have a clue
how to help you get get VIA's attention...

Eric