LinuxLists.cc - data corruption with nvidia chipsets and IDE/SATA drives // memory hole mapping related bug?!

2006-12-02 00:56:12

Subject: data corruption with nvidia chipsets and IDE/SATA drives // memory hole mapping related bug?!

Hi.

Perhaps some of you have read my older two threads:
http://marc.theaimsgroup.com/?t=116312440000001&r=1&w=2 and the even
older http://marc.theaimsgroup.com/?t=116291314500001&r=1&w=2

The issue was basically the following:
I found a severe bug, mainly by luck, because it occurs very rarely.
My test looks like this: I have about 30GB of test data on my hard
disk, and I repeatedly verify sha512 sums of these files and check
whether errors occur.
One test pass verifies the 30GB 50 times; about one to four
differences are found in each pass.
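
The test procedure described above can be sketched in a few lines of Python. This is only an illustration, not the script actually used for the tests in this thread: the function names (`sha512_of`, `build_reference`, `verify_passes`) and the idea of keeping the reference sums in a dictionary are assumptions. It hashes every file once to build a reference, then re-hashes everything for a number of passes and records each mismatch.

```python
import hashlib
from pathlib import Path


def sha512_of(path, chunk_size=1 << 20):
    """Return the hex SHA-512 digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_reference(root):
    """Hash every regular file under root once (hypothetical helper)."""
    files = sorted(p for p in Path(root).rglob("*") if p.is_file())
    return {str(p): sha512_of(p) for p in files}


def verify_passes(reference, passes=50):
    """Re-hash all files `passes` times; return (pass_number, path) mismatches."""
    mismatches = []
    for pass_number in range(1, passes + 1):
        for path, expected in reference.items():
            if sha512_of(path) != expected:
                mismatches.append((pass_number, path))
    return mismatches
```

For such a test to exercise the disk path at all, the data set has to be clearly larger than the installed RAM (as with 30GB of data on a 4GB machine); otherwise later passes may simply be served from the page cache.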

The corrupted data is not one single completely wrong block of data;
if you look at the area of the file where differences are found, some
bytes are ok, some are wrong, and so on (seemingly at random).
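
To characterise such a pattern, one can compare a corrupted read against a known-good copy byte by byte. The following is a minimal sketch under the assumption that such a known-good copy exists (the sha512 test alone only says that a file differs, not where); `differing_offsets` and `group_runs` are hypothetical helper names.

```python
def differing_offsets(good, bad):
    """Return the offsets at which two equally long byte strings differ."""
    if len(good) != len(bad):
        raise ValueError("inputs must have the same length")
    return [i for i, (a, b) in enumerate(zip(good, bad)) if a != b]


def group_runs(offsets):
    """Collapse sorted offsets into (start, length) runs of adjacent bytes."""
    runs = []
    for offset in offsets:
        if runs and offset == runs[-1][0] + runs[-1][1]:
            # Offset directly follows the previous run: extend it.
            runs[-1] = (runs[-1][0], runs[-1][1] + 1)
        else:
            runs.append((offset, 1))
    return runs
```

Many short runs scattered over one area would match the description above, whereas a single long run would point to a completely wrong block.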

Also, there seems to be no event that triggers the corruption; it
seems to be random, too.

It is definitely not a hardware issue (see my old threads, my
emails to Tyan/Hitachi and my "workaround" below). My system isn't
overclocked.

My System:
Mainboard: Tyan S2895
Chipsets: Nvidia nforce professional 2200 and 2050 and AMD 8131
CPU: 2x DualCore Opterons model 275
RAM: 4GB Kingston Registered/ECC
Diskdrives: IBM/Hitachi: 1 PATA, 2 SATA

The data corruption error occurs on all drives.

You might have a look at the emails between me and Tyan and Hitachi,..
they probably contain lots of valuable information (especially my
different tests).

Some days ago, an engineer at Tyan suggested that I boot the kernel
with mem=3072M.
When doing this, the issue did not occur (I don't want to say it was
solved. Why? See my last emails to Tyan!)
Then he suggested that I disable the memory hole mapping in the BIOS.
When doing so, the error doesn't occur either.
But I lose about 2GB of RAM, and, more importantly, I can't believe
that this is responsible for the whole issue. I don't consider it a
solution but rather a poor workaround which perhaps only by luck solves
the issue (Why? See my last emails to Tyan ;) )

So I'd like to ask you if you perhaps could read the current information
in this and previous mails,.. and tell me your opinions.
It is very likely that a large number of users suffer from this error
(namely all Nvidia chipset users) but only few (there are some,.. I
found most of them in the Nvidia forums,.. and they have exactly the
same issue) identify this as an error because it's so rare.

Perhaps someone has an idea why disabling the memhole mapping solves
it. I've always thought that memhole mapping just moves some address
space to higher addresses to avoid the conflict between address space for
PCI devices and address space for physical memory.
But this should be just a simple addition and should not solve this
obviously complex error.
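
The "simple addition" can be written down as follows. This is a sketch of the general idea of memory hoisting only, not of what the S2895 BIOS or the Opteron memory controller actually does; the hole base is a parameter because the real value on this board is not stated anywhere in this thread.

```python
GIB = 1 << 30


def hoist(dram_offset, hole_base, installed):
    """Map an offset into installed DRAM to the physical address seen by
    the CPU when the DRAM hidden behind the PCI hole is moved above 4 GiB.

    hole_base is the (assumed) start of the PCI address window below 4 GiB.
    """
    if not 0 <= dram_offset < installed:
        raise ValueError("offset outside installed memory")
    if dram_offset < hole_base:
        return dram_offset  # below the hole: address is unchanged
    # At or above the hole: shift up by the size of the hole.
    return dram_offset + (4 * GIB - hole_base)


def lost_without_hoisting(hole_base, installed):
    """DRAM that stays unreachable if the mapping is disabled."""
    return max(0, installed - hole_base)
```

With an assumed hole base of 2 GiB and 4 GiB installed, `lost_without_hoisting` gives 2 GiB, which would be consistent with the roughly 2GB loss mentioned above, but that hole base is a guess.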

Lots of thanks in advance.

Best wishes,
Chris.

#########################################################################
### email #1 to Tyan/Hitachi ###
#########################################################################

(Sorry for reposting, but the AMD support system requires adding some keywords to
the subject, and I wanted to have the correct subject for all other parties
(Tyan and Hitachi) too, so that CC'ing would be possible for all.)

Hi.

I provide this information to:
- Tyan ([email protected]) - Mr. Rodger Dusatko
- Hitachi ([email protected], please add the GST Support
Request #627-602-082-5 in the subject) - Mr. Schledz
- and with this email for the first time to AMD [email protected]
(for the AMD people: please have a look at the information at the very
end of this email first,... there you'll find links where you can read
the introduction and description about the whole issue).

It might be useful if you contact each other (and especially nvidia,
which I wasn't able to contact myself),.. but please CC me in all your
communications.
Also, please forward my emails/information to your responsible technical
engineers and developers.

Please do not ignore this problem:
- it exists,
- several users are experiencing it (thus this is not a single failure
of my system),
- it can cause severe data corruption (which is even more grave, as
a user won't notice it through error messages) and
- it happens with different operating systems (at least Linux and Windows).

This is my current state of testing. For further information,.. please
do not hesitate to ask.
You'll find old information (included in my previous mails or found at
the linux-kernel mailinglist thread I've included in my mails) at the end.

- In the meantime I no longer use diff for my tests, simply
because it takes much longer than using sha512sums to verify
data integrity (but this has no effect on the testing or the issue
itself; it just proves that the error is not in the diff program).

- I always test 30GB instead of 15GB

- As before I'm still very sure that the following components are fully
working and not damaged (see my old mails or lkml):
CPUs => due to extensive GIMPS/mprime torture tests
memory => due to extensive memtest86+ tests
harddisks => because I use three different disks (SATA-II and PATA) (but
all from IBM/Hitachi or Hitachi) and I did extensive badblock scans
temperature should be ok in my system => proper chassis (not one of the
cheap ones) with several fans, CPUs between 38°C and 45°C, system ambient
about 46°C, video card between 55°C and 88°C (when under full 3D use),...
the chipsetS (!) don't have temperature monitoring,.. and seem to be
quite hot, but according to Tyan this is normal.

Ok now my current state:
- I found (although it was difficult) a lot of resources on the Internet
where users report the same or a very similar problem using the
same hardware components. Some of them:
http://forums.nvidia.com/index.php?showtopic=8171&st=0
http://forums.nvidia.com/index.php?showtopic=18923
http://lkml.org/lkml/2006/8/14/42 (see http://lkml.org/lkml/2006/8/15/109)
Note that I've opened a thread at the nvidia forums myself:
http://forums.nvidia.com/index.php?showtopic=21576

All of them have in common, that the issue is/was not a hardware failure
and it seems that none of them was able to reproduce the failure.

- As far as I understand the Tyan S2895 mainboard manual
ftp://ftp.tyan.com/manuals/m_s2895_101.pdf on page 9,... both the IDE
and SATA are connected to the Nvidia nforce professional 2200,.. so this
may be nvidia related
(If anyone of you has the ability to contact nvidia,.. please do so and
send them all my information (also the old one). It seems that it's not
easily possible to contact them for "end-users")

- I tried different cable routings in my chassis (as far as this was
possible), which did not solve the problem.
I also switched off all other devices in my rooms that might produce
electromagnetic interference,
thus electromagnetic interference is unlikely.

- I found the errata for the AMD 8131 (which is one of my chipsets):
http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26310.pdf
Please have a look at it (all the issues) as some might be responsible
for the whole issue.

- I tried to use older BIOS versions (1.02 and 1.03) but all of them
gave me an OPROM error (at bus 12 device 06 function 1), and although
the system still booted, the problem still exists.
According to Linux's dmesg this is:

12:06.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X
Fusion-MPT Dual Ultra320 SCSI (rev 07)
12:06.1 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X
Fusion-MPT Dual Ultra320 SCSI (rev 07)

- I tried with 8131 Errata 56 PCLK (which actually disables the AMD
Errata 56 fix) but the issue still exists

- I activated my BIOS's Spread Spectrum option (although it is not
described what it does).
Is this just for harddisks? And would I have to activate Spread Spectrum
in the Hitachi Feature Tools, too, for it to have an effect?

- I tried everything with the lowest possible BIOS settings,.. which
solved nothing.

- According to the information found in the threads at the nvidia boards
(see the links above), this may be an nvidia-Hitachi or nvidia-Maxtor (and
even other manufacturers of HDDs) related problem.
Some even claimed that deactivating Native Command Queueing (NCQ)
helped,.. BUT only for a limited time.
But as far as I know, Linux doesn't support NCQ at all (at the moment).

Thank you very much for now.
Best wishes,
Christoph Anton Mitterer.

----------------------------
Old information:
As a reminder my software/hardware data:

CPU: 2x DualCore Opteron 275
Mainboard: Tyan Thunder K8WE (S2895)
Chipsets: Nvidia nForce professional 2200, Nvidia nForce professional
2050, AMD 8131
Memory: Kingston ValueRAM 4x 1GB Registered ECC
Harddisks: 1x PATA IBM/Hitachi, 2x SATA IBM/Hitachi
Additional Devices/Drives: Plextor PX760A DVD/CD, TerraTec Aureon 7.1
Universe soundcard, Hauppage Nova 500T DualDVB-T card.
Distribution: Debian sid
Kernel: self-compiled 2.6.18.2 (see below for .config) with applied EDAC
patches

The system should be cooled enough so I don't think that this comes from
overheating issues. Nothing is overclocked.

The issue was:

For an in depth description of the problem please have a look at the
linux-kernel mailing list.
The full, current thread:
http://marc.theaimsgroup.com/?t=116312440000001&r=1&w=2

(You'll find there my system specs, too.)

An older thread with the same problem, but where I thought the problem was
FAT32 related (anyway, might be interesting, too):
http://marc.theaimsgroup.com/?t=116291314500001&r=1&w=2
#########################################################################
### email #2 to Tyan/Hitachi ###
#########################################################################

Hi Mr. Dusatko, hi Mr. Ebner.

Rodger Dusatko - Tyan wrote:

> > Thanks so much for your e-mail.
>
Well I have to thank you for your support, too :-)

> > You seem to have tried many different tests.
> >
>
Dozens ^^

> > If I understand the problem correctly, when you use the on-board SATA, you
> > are receiving corrupted data.
> >
>
It happens with the onboard IDE, too !!!! This sounds reasonable as both
IDE and SATA are connected to the nforce 2200.
If you read the links to pages where other people report about the same
problem (especially at the Nvidia forums) you'll see that others think
that this is nforce related, too.

I established contact with some of them and most think that this may
be (but of course there is no definite proof for this) related to an
nforce/disk manufacturer combination. So all of them report, for example,
that the error occurs with nforce/Hitachi.
Some of them think that it might be NCQ related (did you read the NCQ
related parts of my last email? As far as I can see NCQ will be added in
the kernel for sata_nv in 2.6.19).
Both of these sound somewhat strange to me...
From my understanding of computer technology I would assume that this
should be a general harddisk error,.. and not only Hitachi or e.g.
Maxtor related.

(I didn't test it with the onboard SCSI, as I don't have any SCSI drives.)
Not that

> > Sometimes we have solved this problem by simply readjusting bios settings.
> >
>
Does this mean that you were able to reproduce the problem?

> > Please try the following:
> >
> > in the Linux boot prompt, please try (mem=3072M). This will show whether it
> > might be a problem related to the memory hole.
> > or use only 2gb of memory.
> >
> >
>
I'm going to test this in a few minutes (although I think I already did a
similar test)...
Anyway, from a theoretical point of view it sounds very unlikely to me
that this is a memory related issue at all. Not only because of my
memtest86+ test,.. but also because of the way the Linux kernel works in
that area.

> > If it is a memory hole problem, you should have (with Linux) the following
> > settings:
> >
>
My current memhole settings are these (the ones that I use under
"normal" production):
IOMMU -> enabled
IOMMU -> 64 MB
Memhole -> AUTO
mapping -> HARDWARE

Other memory settings

-> Node Memory Interleave -> enabled
-> Dram Bank Interleave -> enabled
-> MTTR Mapping -> discrete
-> Memory Hole
-> Memory Hole mapping -> enabled
-> Memory Config
-> Memory Clock DDR400
-> Swizzle memory banks enabled

> > CMOS reset (press CMOS Clear button for 20 seconds).
> > Go into Bios -> Set all values to default (F9)
> > Main -> Installed O/S -> Linux
> > Advanced -> Hammer Config
> > -> Node Memory Interleave -> disabled
> > -> Dram Bank Interleave -> disabled
> > -> MTTR Mapping -> discrete
> > -> Memory Hole
> > -> Memory Hole mapping -> Disabled
> > -> Memory Config
> > -> Memory Clock DDR333
> > -> Swizzle memory banks disabled
> >
>
I've already checked exactly this setting ;) except that I used
DDR400,... could that make any difference?

> > You might try SATA HDDs from another manufacturer.
> >
>
I'm already trying to do so but so far none of my friends has been able to
lend me any devices,... I'm also going to check the issue with other
operating systems (at least if I find any that support the Nvidia
chipsets at all),.. maybe some *BSD or OpenSolaris.

> > Also, I have a newer beta bios version available.
> >
> > ftp://ftp.tech2.de/boards/28xx/2895/BIOS/ -> 2895_1047.zip you might want to
> > try.
> >
>
Please don't get me wrong,... I still would like you to help and
investigate this issue... but currently I think (although I may be
wrong) that this could be harddisk firmware related.
So what _exactly_ did you change in that version,.. or is it just a
crappy solution or workaround,...?

Any idea about that spread spectrum option?:

> > - I activated my BIOS's Spread Spectrum Option (although it is not
> > described what it does).
> > Is this just for harddisks? And would I have to activate SpredSpectrum
> > at the Hitachi Feature Tools, too, for having an effect?
> >
>

Thanks so far.

Chris.
#########################################################################
### email #3 to Tyan/Hitachi ###
#########################################################################

Rodger Dusatko - Tyan wrote:

> > Hello Christoph,
> >
> > another customer having data corruption problems said by entering the
> > command mem=3072M he no longer has data corruption problems.
> >
> > Please let me know as soon as possible, that I might know how to help
> > further.
> >
>
I just finished my test....
Used my "old" BIOS settings (not the ones included in your mail)... but
set mem=3072M.
It seems (although I'm not yet fully convinced, as I've already had cases
where an error occurred only after lots of sha512 passes) that with mem=3072M
_no error occurs_

But of course I get only 2GB RAM (of my 4GB, which I wanted to upgrade to
even more memory in the next months).
So just using mem=3072M is not acceptable.

And I must admit that I have strong doubts that memhole
settings are a proper fix for that.
Of course I'd be glad if I could fix that... but from my own extensive
system programming experience I know that there are many cases where a
fix isn't really a fix for a problem,... but only hides the problem in
conjunction with other errors (that are not yet found).

I'd be glad if you could give me a better explanation of the
memhole solution (and especially how to solve it without mem=3072M
because I'd like to have my full memory) ... because I'd like to fully
understand the issue to make sure whether it is really fixed or not.

I'll test your beta BIOS tomorrow and report my results.

If you wish I could also call you via phone (just give me your phone-#).

Thanks in advance,
Chris.
#########################################################################
### email #4 to Tyan/Hitachi ###
#########################################################################
One thing I forgot,...
Although using it very very rarely,.. there are some cases where I have
to use M$ Windows.... and afaik,.. you cannot tell windows something
like mem=3072M
So it wouldn't solve that for Windows.

Chris.
#########################################################################
### email #5 to Tyan/Hitachi ###
#########################################################################
Dear Mr. Dusatko, Mr. Ebner and dear sir at the Hitachi GST Support.

I'd like to give you my current status of the problem.

First of all, AMD hasn't even answered so far; the same applies to my
request at Nvidia's knowledge base,... says something about these
companies I think.

For the people at Hitachi: With the advice of Mr. Dusatko from Tyan I
was able to work around the problem:

Rodger Dusatko - Tyan wrote:

> > as I mentioned earlier, you can do some of these memory hole settings
> > : (for
> > Linux)
>
>>> >>> Go into Bios -> Set all values to default (F9)
>>> >>> Main -> Installed O/S -> Linux
>>> >>> Advanced -> Hammer Config
>>> >>> -> Node Memory Interleave -> disabled
>>> >>> -> Dram Bank Interleave -> disabled
>>> >>> -> MTTR Mapping -> discrete
>>> >>> -> Memory Hole
>>> >>> -> Memory Hole mapping -> Disabled
>>> >>> -> Memory Config
>>> >>> -> Memory Clock DDR333
>>> >>> -> Swizzle memory banks disabled
>>>
The above settings for the BIOS actually lead to a system that did not
make any errors during one of my complete tests (that is verifying
sha512sums 50 times on 30 GB of data).

Actually it seems to depend on only one of the above settings: Memory
hole mapping.
Currently I'm using the following:
Main -> Installed O/S -> Linux
Advanced -> Hammer Config
-> Node Memory Interleave -> Auto
-> Dram Bank Interleave -> Auto
-> MTTR Mapping -> discrete
-> Memory Hole
-> Memory Hole mapping -> Disabled
-> Memory Config
-> Memory Clock->DDR400
->Swizzle memory banks -> Enabled
And still no error occurs.
But as soon as I set Memory Hole mapping to one of the other values
(Auto, Hardware or Software),.. the error occurs.
(Especially for Tyan: Note that when using Software, Node Memory
Interleave is always automatically set to Disabled after reboot, while
when using Hardware, Auto works - perhaps a bug?)

Ok,.. now you might think,... problem solved,.. but it is definitely not:

1) Disabling Memory Hole mapping costs me 2GB of my 4GB RAM (which are unusable
because of the memory hole),.. this is not really acceptable.
The beta BIOS Mr. Dusatko from Tyan gave me might solve this, but I wasn't
able to test this yet.

2) But even if this would solve the problem I'm still very concerned and
encourage especially the people at Hitachi to try to find another reason.
Why? Because I cannot imagine how the memory hole leads to the whole issue:
- The memory hole is a quite simple process where the BIOS / hardware
remaps some portions of physical RAM to higher areas,.. to give the
lower areas to PCI devices that make use of mmap.
Even if there were an error,... that would not only affect IDE/SATA
but also CD/DVD/SCSI drives and any other memory operations at all.
AND complete blocks would be corrupted,.. not only
several bytes (remember: I've reported that in a corrupted block some
bytes are ok,.. some are not,... and so on).

-If you look at the board description
(ftp://ftp.tyan.com/manuals/m_s2895_101.pdf page 9) you see that both
IDE and SATA are connected to the nforce professional 2200, right?
Why should the memhole settings affect only the IDE/SATA drives? If
there was an error in the memory controller it would affect every memory
operation in the system (see above) because the memory controller is not
onboard,.. but integrated in the Opteron CPUs. (This is also the reason
why, if the memory controller had design errors, not only people
using nvidia chipsets would have this problem,.. yet apparently only
they do.)

-Last but not least,.. (as also noted above) the errors are always like
the following: not a complete block is corrupted but just perhaps half
of all its bytes (in any order). Could this come from the simple memory
hole remapping???? In my opinion, definitely not.

So I think "we" are not yet finished with work.
- I ask the Hitachi people to continue their work (or start with it ;) )
in taking a special look at their firmware and how it interoperates with
nforce chipsets.
I found (really) lots of reports where people say that this issue has
been resolved by firmware upgrades from their vendor (especially for
Maxtor devices).
Nvidia itself suggests this:
http://nvidia.custhelp.com/cgi-bin/nvidia.cfg/php/enduser/std_adp.php?p_faqid=768&p_created=1138923863&p_sid=9qSJ8Yni&p_accessibility=0&p_redirect=&p_lva=&p_sp=cF9zcmNoPSZwX3NvcnRfYnk9JnBfZ3JpZHNvcnQ9JnBfcm93X2NudD00MzImcF9wcm9kcz0mcF9jYXRzPSZwX3B2PSZwX2N2PSZwX3NlYXJjaF90eXBlPWFuc3dlcnMuc2VhcmNoX2ZubCZwX3BhZ2U9MQ**&p_li=&p_topview=1
(although they think that the issue appears only on SATA which is
definitely not true)
Please have a detailed look on the NCQ of the drives:
This would be (according to how NCQ works) the most likely reason for
the error,... and some people say that deactivating it under Windows,
solved the issue. Anyway,... if NCQ was responsible for the error,.. it
would not appear on the IDE drives (but it does).
And I'm not even sure if Linux/libata (until kernel 2.6.18.x) even uses
NCQ. I always thought it would not but I might be wrong. See this part
of my dmesg:
sata_nv 0000:00:07.0: version 2.0
ACPI: PCI Interrupt Link [LTID] enabled at IRQ 22
GSI 18 sharing vector 0xD9 and IRQ 18
ACPI: PCI Interrupt 0000:00:07.0[A] -> Link [LTID] -> GSI 22 (level,
high) -> IRQ 217
PCI: Setting latency timer of device 0000:00:07.0 to 64
ata1: SATA max UDMA/133 cmd 0x1C40 ctl 0x1C36 bmdma 0x1C10 irq 217
ata2: SATA max UDMA/133 cmd 0x1C38 ctl 0x1C32 bmdma 0x1C18 irq 217
scsi0 : sata_nv
ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata1.00: ATA-7, max UDMA/133, 488397168 sectors: LBA48 NCQ (depth 0/32)
ata1.00: ata1: dev 0 multi count 16
ata1.00: configured for UDMA/133
scsi1 : sata_nv
ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata2.00: ATA-7, max UDMA/133, 488397168 sectors: LBA48 NCQ (depth 0/32)
ata2.00: ata2: dev 0 multi count 16
ata2.00: configured for UDMA/133
Vendor: ATA Model: HDT722525DLA380 Rev: V44O
Type: Direct-Access ANSI SCSI revision: 05
Vendor: ATA Model: HDT722525DLA380 Rev: V44O
Type: Direct-Access ANSI SCSI revision: 05

It says something about NCQ...
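
The NCQ information in such dmesg lines can be extracted mechanically. The sketch below rests on an assumption: that the two numbers in "NCQ (depth 0/32)" are the queue depth used by the host driver and the depth offered by the drive, which is how this libata message is commonly read. If that reading is right, a host depth of 0 would mean that the drive advertises NCQ but the driver does not use it here. The helper name `ncq_depths` is hypothetical.

```python
import re

# Matches e.g. "ata1.00: ATA-7, ... LBA48 NCQ (depth 0/32)".
NCQ_PATTERN = re.compile(r"^(ata\d+\.\d+): .*\bNCQ \(depth (\d+)/(\d+)\)")


def ncq_depths(dmesg_lines):
    """Return {device: (host_depth, device_depth)} for lines reporting NCQ.

    The interpretation of the two numbers is an assumption (see above).
    """
    depths = {}
    for line in dmesg_lines:
        match = NCQ_PATTERN.search(line.strip())
        if match:
            device, host_depth, device_depth = match.groups()
            depths[device] = (int(host_depth), int(device_depth))
    return depths
```

Lines that do not mention NCQ at all, such as the "configured for UDMA/133" lines, are simply skipped.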

-It would also be great if Hitachi could inform me about their current
progress or how long it will take until their engineers start to have a
look at my issue.

Especially for Mr. Dusatko at tyan:

> > Just because memtest86 works it doesn't mean that the memory you are using
> > is compatible memory. That is why we have a recommended list.
> >
> > Each of the modules on our recommended list have been thoroughly tested.
> > Most memories pass the memtest86 test, yet many of these do not past our
> > tests.
> >
> > my tel-nr.
>
My memory modules are actually on your compatible list (they're the
Kingston KVR400D8R3A/1G) so this cannot be the point.

I still was not able to test your beta BIOS but I'll do so as soon as
possible and report the results. And I'm going to call you this or next
week (have to work at the Leibniz-Supercomputing Centre today and
tomorrow,.. so don't know when I'll have enough time).

Thanks for now.

Best wishes,
Chris.
#########################################################################
### email #6 to Tyan/Hitachi ###
#########################################################################
Rodger Dusatko - Tyan wrote:

> > you mention:
> >
> >
>
>> >> My memory modules are actually on your compatible list (they're the
>> >> Kingston KVR400D8R3A/1G) so this cannot be the point.
>> >>
>>
> >
> > I have talked with so many customers about this very problem. Just because
> > the part-nr. of the Kingston modules is correct, this means absolutely
> > nothing.
> >
> > You need to also have the same chips as on our recommended website. The
> > chips being used are even more important than the kingston part-nr.
> >
> > The chips on the KVR400D8R3A/1G must be Micron, having chip part-nr.
> > MT46V64M8TG-5B D as shown on our recommended memory page.
> >
>
I'll check this these days and inform you about the exact chips on the DIMMs

Anyway...
What do you say to the reasons why I don't think that the memhole stuff
is a real solution but more a poor workaround (see my last email,..
which is attached below).
You didn't comment on my ideas in your last answer.

> > This is a grave problem with Kingston memory and why I would only recommend
> > Kingston memory when your supplier is willing to help you to get the exact
> > modules which we have tested.
> >
>
Well, are you absolutely sure that this is memory related? (See also my
comments in my last email)
Note that lots of users were able to solve this via disk drive firmware
upgrades and many of them didn't have Kingston RAMs.
Also,... all RAMs "should" be usable as all "should" follow the SDRAM
standard...

If there were a Kingston error,.. that data corruption issue should
appear everywhere, shouldn't it? And not only on hard disk accesses.

With all due respect, and please believe me that I truly respect your
knowledge (because you surely know more about hardware, as my
computer science studies are more about theoretical stuff)... but I
cannot believe that this is the simple reason,... "wrong RAMs wrong BIOS
settings and you cannot use your full RAM" (see my reasons in my last
email)...
I'd say that there is a real and perhaps grave error somewhere....
either on the board itself or the nvidia chipset (which I suspect as the
evil here ;-) ).
And I think the error is severe enough that a considerable effort
should be made to solve it, or at least to locate exactly where the
error is, and why disabling the memhole solves it.

And remember,... it may be the case that the data corruption doesn't
appear when UDMA (at PATA drives) is disabled,.. but this shouldn't have
anything to do with memory vendor or memhole settings,... so why would
this solve the issue, too (if it actually does, which I cannot prove)?

I'm also going to start my test with changing the following BIOS settings:
SCSI Bus master from my current setting Enabled to Disabled
Disk Access Mode (don't recall the actual name) from Other to DOS.

I'm going to report the results to you next week,.. and I'll probably
call you again.

> > With ATP or other vendors, they stick usually to the same chips as long as
> > the vendor part-nr is the same. In such a case, you probably would have been
> > right when the vendor part-nr matches your part-nr.
> >
> > The problems you are having, as I mentioned before, may disappear if you use
> > memory on our recommended memory list.
> >
>
Is it possible for Tyan to lend me such memory for testing? I live in
Munich and Tyan Germany is in Munich too, if I recall correctly.

Thanks in advance.

Best wishes,
Chris.
#########################################################################
### email #7 to Tyan/Hitachi ###
#########################################################################
Sorry I forgot one thing:
The beta BIOS you gave me did not change anything.
As soon as I activate memhole mapping (either to software, hardware or
auto),.. data corruption occurs.

Chris.
#########################################################################
### reply to #1 from Tyan ###
#########################################################################
Hello Chris,

there are often problems which are not really so easy to understand.

As I understand it, the hard disk uses DMA (Direct Memory Access), which is
supported by the chipset.

The processor uses the DMA access to the DIMMs through the chipset to write
to the disks.

Now, I really am not an expert on this, but normally the DMA is not used by
the processor when communicating with the memory, but rather the
hypertransport connection.

This may be an explanation of what is causing the problem. Because a driver
for HDDs also exists, there may be different links where the problem is
occuring.

The driver may be able to solve problems which can make it that even using
the hardware setting for memory hole causes no problems. However, there are
many different amd cpu steppings, all different in how they manage memory
(and in this case, the memory hole). If the drivers take all of these
considerations, they may be able to adjust according to the processor being
used. But I am not sure if the people who write these drivers get involved
with this.

Rodger

----- Original Message -----
...
...
...

Those were all the (important) emails so far.

2006-12-02 01:15:09

by Erik Andersen

Subject: Re: data corruption with nvidia chipsets and IDE/SATA drives // memory hole mapping related bug?!

On Sat Dec 02, 2006 at 01:56:06AM +0100, Christoph Anton Mitterer wrote:
> The issue was basically the following:
> I found a severe bug mainly by fortune because it occurs very rarely.
> My test looks like the following: I have about 30GB of testing data on
> my harddisk,... I repeat verifying sha512 sums on these files and check
> if errors occur.
> One test pass verifies the 30GB 50 times,... about one to four
> differences are found in each pass.

Doh! I have a Tyan S2895 in my system, and I've been pulling my
hair out trying to track down the cause of a similar somewhat
rare failure for the pre-computed sha1 of a block of data to
actually match the calculated sha1. I'd been hunting in vain the
past few days trying to find a cause -- looking for buffer
overflows, non thread safe code, or similar usual suspects.

It is a relief to see I am not alone!

-Erik

--
Erik B. Andersen http://codepoet-consulting.com/
--This message was written using 73% post-consumer electrons--

2006-12-02 01:28:15

Ville Herva wrote:
> I saw something very similar with Via KT133 years ago. Then the culprit was
> botched PCI implementation that sometimes corrupted PCI transfers when there
> was heavy PCI I/O going on. Usually that meant running two disk transfers at
> the same time. Doing heavy network I/O at the time made it more likely to
> happen.
Hm, I only run one test at a time,... and the network is not used very much
during the tests.

> I used this crude hack:
> http://v.iki.fi/~vherva/tmp/wrchk.c
>
I'll have a look at it :)

> If the problem in your case is that the PCI transfer gets corrupted when it
> happens to a certain memory area, I guess you could try to binary search for
> the bad spot with the kernel BadRam patches
> http://www.linuxjournal.com/article/4489 (I seem to recall it was possible
> to turn off memory areas with vanilla kernel boot params without a patch,
> but I can't find a reference.)
>

I know badram,.. but the thing is,.. that it's highly unlikely that my
RAMs are damaged. Many hours of memtest86+ runs did not show any error
(not even ECC errors),...

And why should disabling memhole mapping solve the issue if memory was
damaged? That could only be if the bad blocks were in the address
space used by the memhole....

Chris.

2006-12-11 09:25:07

by Karsten Weiss

Subject: Re: data corruption with nvidia chipsets and IDE/SATA drives // memory hole mapping related bug?!

On Sat, 2 Dec 2006, Karsten Weiss wrote:

> On Sat, 2 Dec 2006, Christoph Anton Mitterer wrote:
>
> > I found a severe bug mainly by fortune because it occurs very rarely.
> > My test looks like the following: I have about 30GB of testing data on
>
> This sounds very familiar! One of the Linux compute clusters I
> administer at work is a 336 node system consisting of the
> following components:
>
> * 2x Dual-Core AMD Opteron 275
> * Tyan S2891 mainboard
> * Hitachi HDS728080PLA380 harddisk
> * 4 GB RAM (some nodes have 8 GB) - intensively tested with
> memtest86+
> * SUSE 9.3 x86_64 (kernel 2.6.11.4-21.14-smp) - But I've also
> e.g. tried the latest openSUSE 10.2 RC1+ kernel 2.6.18.2-33 which
> makes no difference.
>
> We are running LS-Dyna on these machines and discovered a
> testcase which shows a similar data corruption. So I can
> confirm that the problem is real and not a hardware defect
> of a single machine!

Last week we did some more testing with the following result:

We could not reproduce the data corruption anymore if we boot the machines
with the kernel parameter "iommu=soft" i.e. if we use software bounce
buffering instead of the hw-iommu. (As mentioned before, booting with
mem=2g works fine, too, because this disables the iommu altogether.)

I.e. on these systems the data corruption only happens if the hw-iommu
(PCI-GART) of the Opteron CPUs is in use.

Christoph, Erik, Chris: I would appreciate if you would test and hopefully
confirm this workaround, too.
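
To confirm which of these boot options a machine is actually running with, one can parse its kernel command line. The following is a minimal Python sketch and was not posted in the thread; the helper names `parse_cmdline` and `iommu_mode` are made up for illustration, and the code treats the options purely as text tokens.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into {option: value}.

    Bare flags such as "nosmp" map to None.
    """
    options = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        options[key] = value if sep else None
    return options


def iommu_mode(cmdline):
    """Return the value given to iommu=, or "default" if it is absent."""
    value = parse_cmdline(cmdline).get("iommu")
    return value if value else "default"
```

On a running Linux system the active command line can be read from /proc/cmdline, for example `iommu_mode(open("/proc/cmdline").read())`.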

Best regards,
Karsten

--
__________________________________________creating IT solutions
Dipl.-Inf. Karsten Weiss science + computing ag
phone: +49 7071 9457 452 Hagellocher Weg 73
teamline: +49 7071 9457 681 72070 Tuebingen, Germany
email: [email protected] http://www.science-computing.de

2006-12-12 06:18:27

by Chris Wedgwood

On Wed, Dec 13, 2006 at 08:18:21PM +0100, Christoph Anton Mitterer wrote:

> booting with iommu=soft => works fine
> booting with iommu=noagp => DOESN'T solve the error
> booting with iommu=off => the system doesn't even boot and panics

> When I set IOMMU to disabled in the BIOS the error is not solved-
> I tried to set bigger space for the IOMMU in the BIOS (256MB instead of
> 64MB),.. but it does not solve the problem.

> Any ideas why iommu=disabled in the bios does not solve the issue?

If it can, the kernel will still use the IOMMU even when the BIOS doesn't
set it up. Check your dmesg for IOMMU strings; there might be something
printed to this effect.
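
Searching the kernel log for IOMMU-related lines can be scripted. This is a small Python sketch under stated assumptions: the keyword list (`iommu`, `gart`, `aperture`) is a guess at what is worth looking for, not a list of messages the kernel is known to print, and `iommu_lines` is a hypothetical helper name.

```python
import re

# Assumed keywords; adjust them to whatever the kernel in question prints.
IOMMU_PATTERN = re.compile(r"iommu|gart|aperture", re.IGNORECASE)


def iommu_lines(log_text):
    """Return the lines of a saved kernel log that mention any keyword."""
    return [line for line in log_text.splitlines()
            if IOMMU_PATTERN.search(line)]
```

The function works on text, so the log can come from a saved `dmesg` output file.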

> 1) And does this now mean that there's an error in the hardware
> (chipset or CPU/memcontroller)?

My guess is it's a kernel bug, but I don't know for certain. Perhaps we
should start making a more comprehensive list of affected kernels &
CPUs?

2006-12-13 20:01:06

by Chris Wedgwood

Subject: Re: data corruption with nvidia chipsets and IDE/SATA drives // memory hole mapping related bug?!

On Wed, Dec 13, 2006 at 08:20:59PM +0100, Christoph Anton Mitterer wrote:

> Did anyone run any tests under Windows? I cannot set
> iommu=soft there, can I?

Windows never uses the hardware iommu, so it's always doing the
equivalent of iommu=soft.

2006-12-13 20:02:44

Karsten Weiss wrote:

> Of course, the big question "Why does the hardware iommu *not*
> work on those machines?" still remains.
>
I'm going to check AMD's errata documents in the next few days; perhaps I'll find
something related. But I'd ask you to do the same, as I don't
consider myself an expert in these matters ;-)

Chris Wedgwood said that the IOMMU isn't used under Windows at all, so I
think the following three explanations are possible:
- error in the Opteron (memory controller)
- error in the Nvidia chipsets
- error in the kernel

> I have also tried setting "memory hole mapping" to "disabled"
> instead of "hardware" on some of the machines and this *seems*
> to work stable, too. However, I did only test it on about a
> dozen machines because this bios setting costs us 1 GB memory
> (and iommu=soft does not).
>
Yes, losing so much memory is a big drawback. Anyway, it would be
great if you could run some more extensive tests so that we can say
whether memory hole mapping=disabled in the BIOS really solves the issue, too, or
not.

Does anyone know how memory hole mapping in the BIOS relates to the IOMMU?
Is it likely or explainable that both would solve the issue?

> BTW: Maybe I should also mention that other machines types
> (e.g. the HP xw9300 dual opteron workstations) which also use a
> NVIDIA chipset and Opterons never had this problem as far as I
> know.
>
Hm, that's really strange. I would have thought that this would
affect all systems that use either the (maybe) buggy nforce chipset
or the (maybe) buggy Opteron.

Did those systems have exactly the same Nvidia chipset type? Same question for
the CPU (perhaps the issue only occurs with a specific stepping).
Again I have:
nforce professional 2200
nforce professional 2050
Opteron model 275 (stepping E6)

btw: I think this is already clear, but again:
Both "solutions" solve the problem for me:
Either
- memhole mapping=disabled in the BIOS (but you lose some memory)
- without any iommu= option for the kernel
or
- memhole mapping=hardware in the BIOS (I suppose it will work with
software too)
- with iommu=soft for the kernel

Best wishes,
Chris.

2006-12-13 20:16:38

by Erik Andersen

Subject: Re: data corruption with nvidia chipsets and IDE/SATA drives // memory hole mapping related bug?!

On Mon Dec 11, 2006 at 10:24:02AM +0100, Karsten Weiss wrote:
> Last week we did some more testing with the following result:
>
> We could not reproduce the data corruption anymore if we boot the machines
> with the kernel parameter "iommu=soft" i.e. if we use software bounce
> buffering instead of the hw-iommu. (As mentioned before, booting with
> mem=2g works fine, too, because this disables the iommu altogether.)
>
> I.e. on these systems the data corruption only happens if the hw-iommu
> (PCI-GART) of the Opteron CPUs is in use.
>
> Christoph, Erik, Chris: I would appreciate if you would test and hopefully
> confirm this workaround, too.

What did you set the BIOS to when testing this setting?
Memory Hole enabled? IOMMU enabled?

-Erik

--
Erik B. Andersen http://codepoet-consulting.com/
--This message was written using 73% post-consumer electrons--

2006-12-13 20:26:19

by Karsten Weiss

Subject: Re: data corruption with nvidia chipsets and IDE/SATA drives // memory hole mapping related bug?!

On Wed, 13 Dec 2006, Erik Andersen wrote:

> On Mon Dec 11, 2006 at 10:24:02AM +0100, Karsten Weiss wrote:
> > Last week we did some more testing with the following result:
> >
> > We could not reproduce the data corruption anymore if we boot the machines
> > with the kernel parameter "iommu=soft" i.e. if we use software bounce
> > buffering instead of the hw-iommu. (As mentioned before, booting with
> > mem=2g works fine, too, because this disables the iommu altogether.)
> >
> > I.e. on these systems the data corruption only happens if the hw-iommu
> > (PCI-GART) of the Opteron CPUs is in use.
> >
> > Christoph, Erik, Chris: I would appreciate if you would test and hopefully
> > confirm this workaround, too.
>
> What did you set the BIOS to when testing this setting?
> Memory Hole enabled? IOMMU enabled?

"Memory hole mapping" was set to "hardware". With "disabled" we only
see 3 of our 4 GB memory.
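
The arithmetic behind "3 of our 4 GB" can be written down as a simplified model. This Python sketch assumes a flat layout in which the hole sits directly below the 4 GiB boundary and, when remapping is on, all RAM from the start of the hole upward is shifted up by the hole size; real chipsets may lay memory out differently, and `memory_layout` is a name invented here.

```python
GIB = 1 << 30


def memory_layout(installed, hole_size, remap):
    """Return (visible_bytes, top_address) for a simplified flat memory map.

    The hole occupies [4 GiB - hole_size, 4 GiB) and is reserved for devices.
    """
    hole_start = 4 * GIB - hole_size
    if installed <= hole_start:
        # RAM ends below the hole, so nothing is hidden or moved.
        return installed, installed
    if remap:
        # RAM that would collide with the hole is relocated above it.
        return installed, installed + hole_size
    # Without remapping, the RAM hidden behind the hole is simply lost.
    shadowed = min(installed, 4 * GIB) - hole_start
    top = installed if installed > 4 * GIB else hole_start
    return installed - shadowed, top
```

In this model a 4 GiB machine with a 1 GiB hole shows 3 GiB without remapping, and with remapping shows all 4 GiB but with the last 1 GiB addressed above the 4 GiB boundary, which is where devices limited to 32-bit DMA need help from an IOMMU or bounce buffers.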

Best regards,
Karsten

--
__________________________________________creating IT solutions
Dipl.-Inf. Karsten Weiss science + computing ag
phone: +49 7071 9457 452 Hagellocher Weg 73
teamline: +49 7071 9457 681 72070 Tuebingen, Germany
email: [email protected] http://www.science-computing.de

2006-12-13 20:29:26

by Erik Andersen

Subject: Re: data corruption with nvidia chipsets and IDE/SATA drives // memory hole mapping related bug?!

On Mon Dec 11, 2006 at 10:24:02AM +0100, Karsten Weiss wrote:
> We could not reproduce the data corruption anymore if we boot
> the machines with the kernel parameter "iommu=soft" i.e. if we
> use software bounce buffering instead of the hw-iommu.

I just realized that booting with "iommu=soft" makes my pcHDTV
HD5500 DVB cards not work. Time to go back to disabling the
memhole and losing 1 GB. :-(

-Erik

--
Erik B. Andersen http://codepoet-consulting.com/
--This message was written using 73% post-consumer electrons--

2006-12-13 20:32:14

Lennart Sorensen wrote:
> I upgrade my plextor firmware using linux. pxupdate for most devices,
> and pxfw for new drivers (like the PX760). Works perfectly for me. It
> is one of the reasons I buy plextors.
Yes, I know about it, although I've never tested it. Anyway, the main
reason for Windows is Exact Audio Copy (but Andre Wiethoff is working
on a C port :-D )

Unfortunately my PX760 seems to be defective. I posted about the issue to
lkml, but without success :-(

Best wishes,
Chris.

2006-12-13 23:16:33

by Lennart Sorensen

Subject: Re: data corruption with nvidia chipsets and IDE/SATA drives // memory hole mapping related bug?!

On Wed, Dec 13, 2006 at 08:57:23PM +0100, Christoph Anton Mitterer wrote:
> Don't get me wrong, I don't use Windows (except for upgrading
> my Plextor firmware and EAC ;) ), but I ask because the more
> information we get (even if it's not Linux specific) the more steps we
> can take ;)

I upgrade my plextor firmware using linux. pxupdate for most devices,
and pxfw for new drivers (like the PX760). Works perfectly for me. It
is one of the reasons I buy plextors.

--
Len Sorensen

2006-12-13 23:33:28

Muli Ben-Yehuda wrote:
>> 4)
>> And does someone know if the nforce/opteron iommu requires IBM Calgary
>> IOMMU support?
>>
> It doesn't; Calgary isn't found in machines with Opteron CPUs or NForce
> chipsets (AFAIK). However, compiling Calgary in should make no
> difference, as we detect at run-time which IOMMU is found in the
> machine.
Yes,.. I've read the relevant section shortly after sending that email ;-)

btw & for everybody:
I'm working (as student) at the LRZ (Leibniz Computing Centre) in Munich
where we have very large Linux Cluster and lots of different other
machines,...
I'm going to test for that error on most of the different types of
systems we have,.. and will inform you about my results (if they're
interesting).

Chris.

2006-12-15 00:12:58

by Dax Kelson

Subject: Re: data corruption with nvidia chipsets and IDE/SATA drives // memory hole mapping related bug?!

On Sat, 2006-12-02 at 01:56 +0100, Christoph Anton Mitterer wrote:
> Hi.
>
> Perhaps some of you have read my older two threads:
> http://marc.theaimsgroup.com/?t=116312440000001&r=1&w=2 and the even
> older http://marc.theaimsgroup.com/?t=116291314500001&r=1&w=2
>
> The issue was basically the following:
> I found a severe bug mainly by fortune because it occurs very rarely.
> My test looks like the following: I have about 30GB of testing data on
> my harddisk,... I repeat verifying sha512 sums on these files and check
> if errors occur.
> One test pass verifies the 30GB 50 times,... about one to four
> differences are found in each pass.

This sounds very similar to a corruption issue I was experiencing on my
nforce4 based system. After replacing most of my hardware to no avail, I
discovered that if I increased the voltage for my RAM chips the corruption
went away. Note that I was not overclocking at all.

Worth a try.

Dax Kelson

2006-12-15 16:39:00

by Paul Slootman

Subject: Re: data corruption with nvidia chipsets and IDE/SATA drives // memory hole mapping related bug?!

[email protected] wrote:
>On Wed, Dec 13, 2006 at 09:11:29PM +0100, Christoph Anton Mitterer wrote:
>
>> - error in the Opteron (memory controller)
>> - error in the Nvidia chipsets
>> - error in the kernel
>
>My guess without further information would be that some, but not all
>BIOSes are doing some work to avoid this.
>
>Does anyone have an amd64 with an nforce4 chipset and >4GB that does
>NOT have this problem? If so it might be worth chasing the BIOS
>vendors to see what errata they are dealing with.

We have a number of Tyan S2891 systems at work, most with 8GB but all at
least 4GB (data corruption still occurs whether 4 or 8GB is installed;
didn't try less than 4GB...). All have 2 of the following CPUs:
vendor_id : AuthenticAMD
cpu family : 15
model : 37
model name : AMD Opteron(tm) Processor 248
stepping : 1
cpu MHz : 2210.208
cache size : 1024 KB

- the older models have no problem with data corruption,
but fail to boot 2.6.18 and up (exactly like
http://bugzilla.kernel.org/show_bug.cgi?id=7505 )

- the newer models had problems with data corruption (running md5sum
over a large number of files would show differences from run to run).
Sometimes the system would hang (no messages on the serial console,
no magic sysrq, nothing).
These have no problem booting 2.6.18 and up, however.
These were delivered with a 2.02 BIOS version.
On a whim I tried booting with "nosmp noapic", and running on one CPU
the systems seemed stable, no data corruption and no crashes.

- The older models flashed to the latest 2.02 BIOS from the Tyan website
still have no data corruption but still won't boot 2.6.18 and up.

- The newer models flashed (downgraded!) to the 2.01 BIOS available from the Tyan
website seem to work fine, no data corruption while running on both
CPUs and no crashes (although perhaps time is too short to tell for
sure, first one I did was 10 days ago).

- I have an idea that perhaps the 2.02 BIOS the newer systems were
delivered with is a subtly different version than the one on the
website. I may try flashing 2.02 again once the current 2.01 on these
systems has proven to be stable.

- Apparently there's something different on the motherboards from the
first batch and the second batch, otherwise I couldn't explain the
difference in ability to boot 2.6.18 and up. However, I haven't had an
opportunity to open two systems up to compare them visually.
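
The repeated-checksum test used throughout this thread (hash a large set of files, then hash them again and again and look for differences from run to run) can be sketched in Python as follows. This is an illustration, not the script any poster actually used; the function names and the default of 50 passes (taken from the original report) are assumptions.

```python
import hashlib
import os


def file_digest(path, algorithm="sha512", chunk_size=1 << 20):
    """Hash one file in chunks so large files need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def snapshot(root, algorithm="sha512"):
    """Map every file path below root to its digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            digests[path] = file_digest(path, algorithm)
    return digests


def verify_passes(root, passes=50, algorithm="sha512"):
    """Take a reference snapshot, re-hash `passes` times, and return
    a list of (pass_number, path) for every digest that differs."""
    reference = snapshot(root, algorithm)
    mismatches = []
    for n in range(1, passes + 1):
        current = snapshot(root, algorithm)
        for path, digest in reference.items():
            if current.get(path) != digest:
                mismatches.append((n, path))
    return mismatches
```

Two caveats: the data set should be much larger than RAM, otherwise later passes are served from the page cache and never touch the disk path; and the reference snapshot itself could be the corrupted read, so a mismatch only says that two reads disagreed.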

Paul Slootman

2006-12-23 02:04:52

John A Chaves wrote:
> I didn't need to run a specific test for this. The normal workload of the
> machine approximates a continuous selftest for almost the last year.
>
> Large files (4-12GB is typical) are being continuously packed and unpacked
> with gzip and bzip2. Statistical analysis of the datasets is followed by
> verification of the data, sometimes using diff, or md5sum, or python
> scripts using numarray to mmap 2GB chunks at a time. The machine
> often goes for days with a load level of 20+ and 32GB RAM + another 32GB
> swap in use. It would be very unlikely for data corruption to go unnoticed.
>
> When I first got the machine I did have some problems with disks being
> dropped from the RAID and occasional log messages implicating the IOMMU.
> But that was with kernel 2.6.16.?, Kernels since 2.6.17 haven't had any
> problem.
>
Ah, thanks for that info. As far as I can tell, this "testing
environment" should have found any corruption if there had been any.

So I think we can take this as our first working system where the
issue doesn't occur although we would expect it...

Chris.

2007-01-03 15:03:07

by Christoph Anton Mitterer

Subject: Re: data corruption with nvidia chipsets and IDE/SATA drives // memory hole mapping related bug?!

Hi everybody.

After my last mails on this issue (btw: anything new in the meantime? I
received no replies) I wrote again to Nvidia and AMD.
This time with some more success.

Below is the answer from Mr. Friedman to my mail. He says that he wasn't
able to reproduce the problem and asks for a test system.
Unfortunately I cannot ship my system, as it is my only home PC and I
need it for daily work. But perhaps someone else here has a system
(with the error) that he can send to Nvidia.

I cc'ed Mr. Friedman so he'll read your replies.

To Mr. Friedman: What system did you exactly use for your testing?
(Hardware configuration, BIOS settings and so on). As we've seen before
it might be possible that some BIOSes correct the problem.

Best wishes,
Chris.

Lonni J Friedman wrote:
> Christoph,
> Thanks for your email. I'm aware of the LKML threads, and have spent
> considerable time attempting to reproduce this problem on one of our
> reference motherboards without success. If you could ship a system
> which reliably reproduces the problem, I'd be happy to investigate further.
>
> Thanks,
> Lonni J Friedman
> NVIDIA Corporation
>
> Christoph Anton Mitterer wrote:
>
>> Hi.
>>
>> First of all: This is only a copy from a thread to nvnews.net
>> (http://www.nvnews.net/vbulletin/showthread.php?t=82909). You probably
>> should read the description there.
>>
>> Please note that this is also a very important issue. It is most likely
>> not only Linux related but a general nforce chipset design flaw, so
>> perhaps you should forward this mail to your engineers too. (Please CC me
>> in all mails).
>>
>> Also note: I'm not one of the normal "end users" with simple problems or
>> damaged hardware. I study computer science and work in one of Europe's
>> largest supercomputing centres (Leibniz supercomputing centre).
>> Believe me: I know what I'm talking about, and I have been investigating
>> this issue (with many others) for some weeks now.
>>
>> Please answer either to the specific lkml thread, to the nvnews.net post
>> or directly to me (via email).
>> And I'd be grateful if you could give me email addresses of your
>> developers or engineers, or even better, forward this email to them and
>> CC me. Of course I'll keep their email addresses absolutely confidential
>> if you wish.
>>
>> Best wishes,
>> Christoph Anton Mitterer.
>> Munich University of Applied Sciences / Department of Mathematics and
>> Computer Science
>> Leibniz Supercomputing Centre / Department for High Performance
>> Computing and Compute Servers
>>
>>
>>
>>
>> Here is the copy:
>> Hi.
>>
>> I've already tried to "resolve" this via the nvidia knowledgebase but
>> either they don't want to know about that issue or there is no one who is
>> competent enough to give information/solutions about it.
>> They finally pointed me to this forum and told me that Linux support would
>> be handled here (they did not realise that this is probably a hardware
>> flaw and not OS related).
>>
>> I must admit that I'm a little bit bored with Nvidia's policy in such
>> matters and thus I only describe the problem in brief.
>> If there is any competent chipset engineer who reads this, then he might
>> read the main discussion thread (and some spin-off threads) of the issue
>> which takes place at the linux-kernel mailing list (again this is
>> probably not Linux related).
>> You can find the archive here:
>> http://marc.theaimsgroup.com/?t=116502121800001&r=1&w=2
>>
>>
>> Now a short description:
>> -I (and many others) found a data corruption issue that happens on AMD
>> Opteron / Nvidia chipset systems.
>>
>> -What happens: If one reads/writes large amounts of data there are errors.
>> We test this the following way: Create some test data (huge amounts
>> of it), make md5sums of it (or use other hash algorithms), then verify
>> them over and over.
>> The test shows differences (refer to the lkml thread for more information
>> about this). Always in different files (!!!!). It may happen on read AND
>> write access.
>> Note that even for affected users the error occurs rarely (but this is
>> of course still far too often): My personal tests show about the following:
>> Test data: 30GB (of random data), I verify sha512sum 50 times (that is
>> what I call one complete test). So I verify 30*50GB. In one complete
>> test there are about 1-3 files with differences, with about 100
>> corrupted bytes (at least very small amounts of data, far below one MB).
>>
>> -It probably happens with all the nforce chipsets (see the lkml thread
>> where everybody tells his hardware)
>>
>> -The reasons are not single hardware defects (dozens of high quality
>> memory, CPU,
>> PCI bus, HDD bad block scans, PCI parity, ECC, etc. tests showed this,
>> and even with different hardware components the issue remained)
>>
>> -It is probably not an Operating System related bug, although Windows
>> won't suffer from it. The reason for this is that Windows is (too
>> stupid) ... I mean unable to use the hardware iommu at all.
>>
>> -It happens with both PATA and SATA disks. To be exact: It may be that
>> this has nothing special to do with harddisks at all.
>> It is probably PCI-DMA related (see lkml for more infos and reasons for
>> this thesis).
>>
>> -Only users with much main memory (I don't know the exact value by heart
>> and I'm too lazy to look it up), say 4GB, will suffer from this problem.
>> Why? Only users who need the memory hole mapping and the iommu will
>> suffer from the problem (this is why we think it is chipset related).
>>
>> -We found two "workarounds" but both have big problems:
>> Workaround 1: Disable Memory Hole Mapping in the system BIOS altogether.
>> The issue no longer occurs, BUT you lose a big part of your main memory
>> (depending on the size of the memhole, which itself depends on the PCI
>> devices). In my case I lose 1.5GB of my 4GB. Most users will probably
>> lose 1GB.
>> => unacceptable
>>
>> Workaround 2: As mentioned, Windows won't suffer from the problem because it
>> always uses a software iommu. (btw: the same applies to Intel CPUs
>> with EM64T/Intel 64; these CPUs don't even have a hardware iommu).
>> Linux is able to use the hardware iommu (which of course accelerates the
>> whole system).
>> If you tell the kernel (Linux) to use a software iommu (with the kernel
>> parameter iommu=soft), the issue won't appear.
>> => this is better than workaround 1 but still not really acceptable.
>> Why? There are the following problems:
>>
>> The hardware iommu and systems with such big main memory are widely used
>> in computing centres. Those groups won't give up the hwiommu in
>> general, simply because some Opteron (and perhaps Athlon) / Nvidia
>> combinations cause problems.
>> (I can tell this because I work at the Leibniz Supercomputing Centre,..
>> one of the largest in Europe)
>>
>> But as we don't know the exact reason for the issue, we cannot
>> selectively switch the iommu=soft for affected
>> mainboards/chipsets/cpu-steppings/and alike.
>>
>> We'd have to use a kernel wide iommu=soft as a catchall solution.
>> But it is highly unlikely that this is accepted by the Linux community
>> (not to talk about end users like the supercomputing centres) and I
>> don't want to talk about other OS'es.
>>
>>
>> So we (and of course all, and especially professional, customers) need
>> Nvidias help.
>>
>> Perhaps this might be solvable via BIOS fixes, but of course not by the
>> stupid-solution "disable hwiommu via the BIOS".
>> Perhaps the reason is a Linux kernel bug (although this is highly unlikely).
>> Last but not least,.. perhaps this is AMD Opteron/Athlon (Note: These
>> CPUs have the memory controllers directly integrated) issue and/or
>> Nvidia nforce chipset issue.
>>
>> Regards,
>> Chris.
>> btw: For answers from Nvidia engineers/developers or end-users who
>> suffer from that issue too: please post them to the lkml thread (see
>> above for the link) or, if that is not possible, here.
>> You may also contact me via email ([email protected]) or personal
>> messages.
>>
>> PS: Please post any other resources/links to threads about this or
>> similar problems.
>>
>

2007-01-03 23:42:50

by Robert Hancock

Christoph Anton Mitterer wrote:
> Sorry, as always I've forgot some things... *g*
>
>
> Robert Hancock wrote:
>
>> If this is related to some problem with using the GART IOMMU with memory
>> hole remapping enabled
> What is that GART thing exactly? Is this the hardware IOMMU? I've always
> thought GART was something graphics card related,.. but if so,.. how
> could this solve our problem (that seems to occur mainly on harddisks)?

The GART built into the Athlon 64/Opteron CPUs is normally used for
remapping graphics memory so that an AGP graphics card can see
physically non-contiguous memory as one contiguous region. However,
Linux can also use it as an IOMMU which allows devices which normally
can't access memory above 4GB to see a mapping of that memory that
resides below 4GB. In pre-2.6.20 kernels both the SATA and PATA
controllers on the nForce 4 chipsets can only access memory below 4GB so
transfers to memory above this mark have to go through the IOMMU. In
2.6.20 this limitation is lifted on the nForce4 SATA controllers.
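
The rule described here, that a device limited to 32-bit DMA needs remapping (or a bounce buffer) for any buffer that reaches above 4GB, can be expressed as a small check. This Python sketch is illustrative only; `dma_reachable` and `needs_remap` are invented names and are not kernel functions.

```python
FOUR_GB = 1 << 32


def dma_reachable(phys_addr, length, dma_bits=32):
    """True if the whole buffer lies inside the device's addressable range."""
    limit = 1 << dma_bits
    return phys_addr + length <= limit


def needs_remap(phys_addr, length, dma_bits=32):
    """True if the device cannot reach the buffer directly, so an IOMMU
    mapping or a bounce buffer below the limit would be required."""
    return not dma_reachable(phys_addr, length, dma_bits)
```

A buffer at exactly 4GB needs remapping for a 32-bit device but not for one with 64-bit DMA, which matches the distinction drawn above between the older interface and the 2.6.20 SATA driver.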

>
>> then 2.6.20-rc kernels may avoid this problem on
>> nForce4 CK804/MCP04 chipsets as far as transfers to/from the SATA
>> controller are concerned
> Does this mean that PATA is not related? The corruption appears on PATA
> disks too, so why should it only solve the issue for SATA disks? Sounds a
> bit strange to me.

The PATA controller will still be using 32-bit DMA and so may also use
the IOMMU, so this problem would not be avoided.

>
>> as the sata_nv driver now supports 64-bit DMA
>> on these chipsets and so no longer requires the IOMMU.
>>
> Can you explain this a little bit more please? Is this a drawback (like
> a performance decrease)? Like under Windows where they never use the
> hardware iommu but always do it via software?

No, it shouldn't cause any performance loss. In previous kernels the
nForce4 SATA controller was controlled using an interface quite similar
to a PATA controller. In 2.6.20 kernels they use a more efficient
interface that NVidia calls ADMA, which in addition to supporting NCQ
also supports DMA without any 4GB limitations, so it can access all
memory directly without requiring IOMMU assistance.

Note that if this corruption problem is, as has been suggested, related
to memory hole remapping and the IOMMU, then this change only prevents
the SATA controller transfers from experiencing this problem. Transfers
on the PATA controller as well as any other devices with 32-bit DMA
limitations might still have problems. As such this really just avoids
the problem, not fixes it.

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [email protected]
Home Page: http://www.roberthancock.com/

2007-01-16 13:54:38

Chris Wedgwood wrote:
> right now i'm thinking if we can't figure out which cpu/bios
> combinations are safe we might almost be better off doing iommu=soft
> for *all* k8 stuff except for those that are whitelisted; though this
> seems extremely drastic
>
I agree, it seems drastic, but this is the only really secure solution.
But it seems that none of the responsible developers has read our thread or
the bug report and given an opinion on the issue.

> it's not clear if this only affect nvidia based chipsets, the nature
> of the corruption makes me think it's not an iommu software bug (we
> see a few bytes not entire pages corrupted, it's not even clear if
> it's entire cachelines trashed) --- perhaps other vendors have more
> recent bios errata or maybe it's just that nvidia has sold a lot of
> these so they are more visible? (i'm assuming at this point it might
> be some kind of cpu errata that some bioses deal with because some
> mainboards don't ever seem to see this whilst others do)
>
Well we can hope that Nvidia will find out more (though I'm not too
optimistic).

> in some ways the problem is worse with recent kernels --- because the
> ethernet and sata can address over 4GB and don't use the iommu anymore
> the problem is going to be *much* harder to hit, but still here
> lurking to cause problems for people.
Yes, I agree, this is a dangerous situation.
But we should not forget about the issue just because SATA is no
longer affected.

Chris.

2007-01-16 20:17:17

by Arkadiusz Miśkiewicz

Subject: Re: data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?)

On Tuesday 16 January 2007 19:01, Chris Wedgwood wrote:
> On Tue, Jan 16, 2007 at 08:26:05AM -0600, Robert Hancock wrote:
> > >If one use iommu=soft the sata_nv will continue to use the new code
> > >for the ADMA, right?
> >
> > Right, that shouldn't affect it.
>
> right now i'm thinking if we can't figure out which cpu/bios
> combinations are safe we might almost be better off doing iommu=soft
> for *all* k8 stuff except for those that are whitelisted; though this
> seems extremely drastic
>
> it's not clear if this only affect nvidia based chipsets, the nature
> of the corruption makes me think it's not an iommu software bug (we
> see a few bytes not entire pages corrupted, it's not even clear if
> it's entire cachelines trashed) --- perhaps other vendors have more
> recent bios errata or maybe it's just that nvidia has sold a lot of
> these so they are more visible? (i'm assuming at this point it might
> be some kind of cpu errata that some bioses deal with because some
> mainboards don't ever seem to see this whilst others do)

FYI it seems that I was also hit by this bug with qlogic fc card + adaptec
taro raid controller on Thunder K8SRE S2891 mainboard with nvidia chipset on
it.

http://groups.google.com/group/fa.linux.kernel/browse_thread/thread/b8bdbde9721f7d35/45701994c95fe2cf?lnk=st&q=arkadiusz+fibre&rnum=8#45701994c95fe2cf

--
Arkadiusz Miśkiewicz PLD/Linux Team
arekm / maven.pl http://ftp.pld-linux.org/

2007-01-16 20:21:15

Chris Wedgwood wrote:
> I'd like to hear from Andi how he feels about this? It seems like a
> somewhat drastic solution in some ways given a lot of hardware doesn't
> seem to be affected (or maybe in those cases it's just really hard to
> hit, I don't know).
>
Yes, this might be true. Those who have reported working systems might
just have a configuration where the error happens even more rarely or where
some other event(s) work around it.

>> Well we can hope that Nvidia will find out more (though I'm not too
>> optimistic).
>>
> Ideally someone from AMD needs to look into this, if some mainboards
> really never see this problem, then why is that? Is there errata that
> some BIOS/mainboard vendors are dealing with that others are not?
>
Some time ago I asked here in a post whether some of you could try to
contact AMD and/or Nvidia. As no one did, I wrote to them again (to
all forums and email addresses I knew). (You can see the text here:
http://www.nvnews.net/vbulletin/showthread.php?t=82909).
Now Nvidia has replied and it seems (thanks to Mr. Friedman) that they're
actually trying to investigate the issue.

I received one reply from AMD (actually in German, which is strange as I
wrote to their US support), in which they told me they had forwarded
my mail to their Linux engineers, but there has been no reply since then.

Perhaps some of you have some "contacts" and can use them...

2007-01-17 01:17:11

joachim wrote:
> Not only has it only been on Nvidia chipsets but we have only seen
> reports on the Nvidia CK804 SATA controller. Please write in or add
> yourself to the bugzilla entry [1] and tell us which hardware you have
> if you get 4kB pagesize corruption and it goes away with "iommu=soft".
How do I find out if I get a 4kB pagesize corruption (or is this the
same as "our corruption")?
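
One way to tell the two patterns apart is to compare a known-good copy of a file with a corrupted read and look at how the differing bytes are grouped. The Python sketch below is a rough heuristic with invented helper names; it assumes both copies are available as byte strings of equal length and a page size of 4096 bytes.

```python
def differing_runs(good, bad):
    """Return [(start, length)] for each run of consecutive differing bytes."""
    if len(good) != len(bad):
        raise ValueError("buffers must have the same length")
    runs = []
    start = None
    for i, (a, b) in enumerate(zip(good, bad)):
        if a != b:
            if start is None:
                start = i          # a new run of differences begins
        elif start is not None:
            runs.append((start, i - start))
            start = None
    if start is not None:          # run extends to the end of the buffer
        runs.append((start, len(good) - start))
    return runs


def is_whole_page(run, page_size=4096):
    """True if a run starts on a page boundary and covers whole pages."""
    start, length = run
    return start % page_size == 0 and length % page_size == 0
```

Short, scattered runs would correspond to the "some bytes ok, some wrong" pattern described at the start of this thread, while page-aligned runs of 4096 bytes would point to whole pages being replaced. A replaced page can still share individual byte values with the original by chance, so runs that almost fill a page deserve a closer look as well.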

Chris.

btw: Should we only post the controller, or other hardware details, too?

2007-01-18 14:43:39

On Wed, 17 Jan 2007, Andi Kleen wrote:

> On Wednesday 17 January 2007 07:31, Chris Wedgwood wrote:
> > On Tue, Jan 16, 2007 at 08:52:32PM +0100, Christoph Anton Mitterer wrote:
> > > I agree,... it seems drastic, but this is the only really secure
> > > solution.
> >
> > I'd like to hear from Andi how he feels about this? It seems like a
> > somewhat drastic solution in some ways given a lot of hardware doesn't
> > seem to be affected (or maybe in those cases it's just really hard to
> > hit, I don't know).
>
> AMD is looking at the issue. Only Nvidia chipsets seem to be affected,
> although there were similar problems on VIA in the past too.
> Unless a good workaround comes around soon I'll probably default
> to iommu=soft on Nvidia.

We (Sun, AMD, Nvidia and Red Hat) have been testing a patch that seems
to solve the problem. AMD and Nvidia analyzed an HDT trace that
seemed to indicate that CPU updates of the GATT were still in cache
when a subsequent table walk caused by a device load used a stale GATT
PTE. That analysis inspired this patch, submitted to this list as an
RFC. It is not obvious (to me, at least) why this problem has only
shown up on Nvidia SATA controllers.

We are continuing to investigate.

diff --git a/arch/x86_64/kernel/pci-gart.c b/arch/x86_64/kernel/pci-gart.c
index 030eb37..1dd461a 100644
--- a/arch/x86_64/kernel/pci-gart.c
+++ b/arch/x86_64/kernel/pci-gart.c
@@ -69,6 +69,8 @@ static u32 gart_unmapped_entry;
 #define AGPEXTERN
 #endif

+#define GATT_CLFLUSH(i) asm volatile ("clflush (%0)" :: "r" (iommu_gatt_base + (i)))
+
 /* backdoor interface to AGP driver */
 AGPEXTERN int agp_memory_reserved;
 AGPEXTERN __u32 *agp_gatt_table;
@@ -221,6 +223,7 @@ static dma_addr_t dma_map_area(struct device *dev, dma_addr_t phys_mem,
 	for (i = 0; i < npages; i++) {
 		iommu_gatt_base[iommu_page + i] = GPTE_ENCODE(phys_mem);
 		SET_LEAK(iommu_page + i);
+		GATT_CLFLUSH(iommu_page + i);
 		phys_mem += PAGE_SIZE;
 	}
 	return iommu_bus_base + iommu_page*PAGE_SIZE + (phys_mem & ~PAGE_MASK);
@@ -348,6 +351,7 @@ static int __dma_map_cont(struct scatterlist *sg, int start, int stopat,
 		while (pages--) {
 			iommu_gatt_base[iommu_page] = GPTE_ENCODE(addr);
 			SET_LEAK(iommu_page);
+			GATT_CLFLUSH(iommu_page);
 			addr += PAGE_SIZE;
 			iommu_page++;
 		}
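
For readers who want to see the idea of the patch outside the kernel tree, here is a minimal user-space sketch: write a GATT entry, then flush the cache line that holds it so a later table walk by the device does not read a stale entry from memory. The helper names (`gpte_encode`, `flush_line`, `gatt_set_entry`) are hypothetical, and the PTE bit layout is an assumption based on my recollection of pci-gart.c; please check it against the actual source before relying on it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed bit layout of a GART PTE (not verified against the tree). */
#define GPTE_VALID    1u
#define GPTE_COHERENT 2u

/* Encode a physical address as a 32-bit GART PTE:
 * bits 31:12 stay in place, bits 39:32 move to bits 11:4. */
static uint32_t gpte_encode(uint64_t phys)
{
    return (uint32_t)((phys & 0xfffff000u) | ((phys >> 32) << 4)
                      | GPTE_VALID | GPTE_COHERENT);
}

/* Flush the cache line containing *p. On non-x86 targets this is only
 * a compiler barrier so that the sketch still compiles. */
static void flush_line(volatile void *p)
{
#if defined(__x86_64__) || defined(__i386__)
    __asm__ volatile ("clflush %0" : "+m" (*(volatile char *)p));
#else
    __asm__ volatile ("" ::: "memory");
#endif
}

/* Write one GATT entry and push it out of the CPU cache right away,
 * mirroring what GATT_CLFLUSH does after each table update. */
static void gatt_set_entry(volatile uint32_t *gatt, size_t idx, uint64_t phys)
{
    assert(gatt != NULL);
    gatt[idx] = gpte_encode(phys);
    flush_line(&gatt[idx]);
}
```

In the patch the flush is issued once per updated entry inside the mapping loops; the sketch keeps that one-write, one-flush pairing in a single helper.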

Chip

--
Charles M. "Chip" Coldwell
Senior Software Engineer
Red Hat, Inc
978-392-2426