2006-03-15 22:30:55

by Dax Kelson

Subject: Warning - Maxtor SATA II and Nvidia nforce4

Short version
==============
Nvidia Nforce4 chipset with Maxtor SATA II drives with certain firmware
revisions cause data corruption and system instability when under
moderate to heavy I/O load.

After being suspected for over a year, it was acknowledged just in the
last few weeks, see:

http://maxtor.custhelp.com/cgi-bin/maxtor.cfg/php/enduser/std_adp.php?p_faqid=2685

(there is a list of affected HD model numbers and HD firmware versions)

If it is possible to determine the firmware version, maybe some printk
warnings could be generated.
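
As a rough illustration of the detection idea (in userspace, not a kernel
patch), a script could read the model and firmware strings that hdparm -I
prints and compare them against the affected list. This is only a sketch:
the helper names are made up, /dev/sda is a placeholder, and the only
model/firmware pair it knows is the one reported in this message (6B300S
with BANC1B70). The full list is on the Maxtor page and is not reproduced
here.

```shell
#!/bin/sh
# identify_field NAME: read `hdparm -I` output on stdin and print the value
# of the field NAME (for example "Model Number" or "Firmware Revision"),
# with surrounding whitespace removed.
identify_field() {
    sed -n "s/^[[:space:]]*$1:[[:space:]]*\(.*[^[:space:]]\)[[:space:]]*\$/\1/p" | head -n 1
}

# is_reported_combo MODEL FIRMWARE: succeed only for the single combination
# reported in this thread. ASSUMPTION: a real check would carry the whole
# list from the Maxtor FAQ, which is not known here.
is_reported_combo() {
    model=$1 firmware=$2
    case "$model" in
        *6B300S*) [ "$firmware" = "BANC1B70" ] ;;
        *) return 1 ;;
    esac
}

# Usage (needs root; /dev/sda is a placeholder device):
#   info=$(hdparm -I /dev/sda)
#   model=$(printf '%s\n' "$info" | identify_field "Model Number")
#   fw=$(printf '%s\n' "$info" | identify_field "Firmware Revision")
#   is_reported_combo "$model" "$fw" && echo "warning: $model firmware $fw"
```

Keeping the parsing separate from the lookup means the list can grow
without touching the code that reads the drive.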

Long version
=============

About a year ago I got a new home uber system all decked out.

AMD FX-55
Nforce4 SLI motherboard
2GB RAM
300GB Maxtor SATA II HDD x2 (model 6B300S with firmware BANC1B70)

Off and on I have experienced the following problems:

* kernel panics
* freezes
* insta-reboots
* on-board RAID (nforce fake raid) de-syncing
* LCD blinking on and off (most common symptom for me)
* segfaults and application crashes

The problems were not continuous; they appeared at seemingly random
intervals.

My memory tested fine with memtest86+ over repeated tests the past 12
months. I RMA'd my video cards and got a new motherboard. I was
contemplating swapping my CPU when I finally did a Google search on
"Maxtor nforce4" and my eyes were opened. Pages and pages of posts in
hundreds of different forums.

Dax Kelson


2006-03-15 22:47:20

by Jeff Garzik

Subject: Re: Warning - Maxtor SATA II and Nvidia nforce4


Ah, I see this made it to LKML :)

Dax Kelson wrote:
> Short version
> ==============
> Nvidia Nforce4 chipset with Maxtor SATA II drives with certain firmware
> revisions cause data corruption and system instability when under
> moderate to heavy I/O load.

I'm a bit suspicious of this.

Looking at the link, there are three problem areas and two problem blame
targets implied:

Data corruption -> blame nvidia driver
NCQ -> blame nvidia driver
Detection -> blame maxtor firmware

The first one likely applies to the Windows driver not Linux's sata_nv,
and thus irrelevant here. The second one OBVIOUSLY applies only to
Windows, since sata_nv (and libata itself) don't yet enable NCQ. The
third one could potentially apply to Linux. Lastly, your mention of
"nforce fake raid" almost certainly indicates Windows or proprietary
drivers.

Therefore, I ask:
* are you reporting only a drive detection problem?
* why are you reporting unrelated Windows problems to a Linux list?
* if you are indeed reporting a problem on Linux, where is the kernel
and driver version info, as requested in REPORTING-BUGS?
* and can you provide such info *and reproduce the problems* without
proprietary drivers loaded?

Your email is just a list of highly general symptoms. Your link seems
to indicate two NV driver bugs on Windows, and a Maxtor firmware upgrade
for undescribed detection problems.

My recommended action for users is:
1) Avoid Windows.
2) Don't panic.

Jeff


2006-03-15 23:22:50

by Dax Kelson

Subject: Re: Warning - Maxtor SATA II and Nvidia nforce4

On Wed, 2006-03-15 at 17:47 -0500, Jeff Garzik wrote:
> Ah, I see this made it to LKML :)
>
> Dax Kelson wrote:
> > Short version
> > ==============
> > Nvidia Nforce4 chipset with Maxtor SATA II drives with certain firmware
> > revisions cause data corruption and system instability when under
> > moderate to heavy I/O load.
>
> I'm a bit suspicious of this.
>
> Looking at the link, there are three problem areas and two problem blame
> targets implied:
>
> Data corruption -> blame nvidia driver
> NCQ -> blame nvidia driver
> Detection -> blame maxtor firmware
>
> The first one likely applies to the Windows driver not Linux's sata_nv,
> and thus irrelevant here.

No.

Take a big file (5-10 GB):

$ cp bigfile newfile
$ cp bigfile newfile2
$ cp bigfile newfile3
$ cp bigfile newfile4
$ md5sum bigfile newfile*
[results are all different, assuming kernel doesn't panic during test]
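
The manual test above could be scripted roughly as follows. This is only
a sketch: the function names and the copyN file names are made up for
illustration, and it assumes cp and md5sum from coreutils. The source file
probably needs to be larger than RAM so that the reads come from the disk
rather than from the page cache.

```shell
#!/bin/sh
# checksum_of FILE: print only the MD5 hex digest of FILE.
checksum_of() {
    md5sum < "$1" | cut -d' ' -f1
}

# verify_copies REF FILE...: compare every FILE against REF by checksum.
# Prints one line per mismatch plus a summary; succeeds only if all match.
verify_copies() {
    ref=$1
    shift
    want=$(checksum_of "$ref")
    bad=0
    for f in "$@"; do
        if [ "$(checksum_of "$f")" != "$want" ]; then
            echo "MISMATCH: $f"
            bad=$((bad + 1))
        fi
    done
    echo "$bad of $# copies differ from $ref"
    [ "$bad" -eq 0 ]
}

# stress_copy REF COUNT DIR: make COUNT copies of REF in DIR (copy1, copy2,
# ...), then verify them. Returns 2 if a copy itself fails.
stress_copy() {
    ref=$1 count=$2 dir=$3
    i=1
    set --
    while [ "$i" -le "$count" ]; do
        cp "$ref" "$dir/copy$i" || return 2
        set -- "$@" "$dir/copy$i"
        i=$((i + 1))
    done
    verify_copies "$ref" "$@"
}

# Usage, mirroring the four copies above:
#   stress_copy bigfile 4 .
```

On healthy hardware every copy should match the source, so any MISMATCH
line points at corruption somewhere between the read and the write.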

When I use the "stress" utility from
http://weather.ou.edu/~apw/projects/stress/

The box usually makes it an hour or two before a kernel panic or I/O
errors wedge the box.

I set up a netdump/netconsole server on my network and I have several
crashes captured. If you are interested I can send them on to you. I
filed most of them in the Red Hat bugzilla, but closed them after I
discovered they were a hardware problem.

> The second one OBVIOUSLY applies only to
> Windows, since sata_nv (and libata itself) don't yet enable NCQ. The
> third one could potentially apply to Linux. Lastly, your mention of
> "nforce fake raid" almost certainly indicates Windows or proprietary
> drivers.

Linux device mapper is proprietary? :)

The corruption occurs with a single disk or when using a device mapper
"nvraid".

> Therefore, I ask:
> * are you reporting only a drive detection problem?

No. Detection was never a problem for me.

> * why are you reporting unrelated Windows problems to a Linux list?

I'm not, see above.

> * if you are indeed reporting a problem on Linux, where is the kernel
> and driver version info, as requested in REPORTING-BUGS?

Well, what can Linux do about this hardware problem? Maybe there is a
workaround that can be done, but I'm not counting on it. A warning would
be nice if it is possible to detect the conditions where this can occur.
This way others can troubleshoot and identify this problem quicker.

I used mostly late model FC5 rawhide kernels which I believe are based
off of 2.6.16rc5-git12/git13 or thereabouts.

> * and can you provide such info *and reproduce the problems* without
> proprietary drivers loaded?

Sorry for the misunderstanding. Again, no proprietary drivers ever
loaded. Problem is 100% reproducible. See above, etc.

Dax Kelson


2006-03-16 06:30:28

by Sander

Subject: Re: Warning - Maxtor SATA II and Nvidia nforce4

Jeff Garzik wrote (ao):
> Ah, I see this made it to LKML :)

I'm not the OP. The Maxtor notice is a few months old already.

> Dax Kelson wrote:
> >Short version
> >==============
> >Nvidia Nforce4 chipset with Maxtor SATA II drives with certain firmware
> >revisions cause data corruption and system instability when under
> >moderate to heavy I/O load.
>
> I'm a bit suspicious of this.
>
> Looking at the link, there are three problem areas and two problem blame
> targets implied:
>
> Data corruption -> blame nvidia driver
> NCQ -> blame nvidia driver
> Detection -> blame maxtor firmware
>
> The first one likely applies to the Windows driver not Linux's sata_nv,
> and thus irrelevant here. The second one OBVIOUSLY applies only to
> Windows, since sata_nv (and libata itself) don't yet enable NCQ. The
> third one could potentially apply to Linux. Lastly, your mention of
> "nforce fake raid" almost certainly indicates Windows or proprietary
> drivers.
>
> Therefore, I ask:
> * are you reporting only a drive detection problem?
> * why are you reporting unrelated Windows problems to a Linux list?
> * if you are indeed reporting a problem on Linux, where is the kernel
> and driver version info, as requested in REPORTING-BUGS?
> * and can you provide such info *and reproduce the problems* without
> proprietary drivers loaded?
>
> Your email is just a list of highly general symptoms. Your link seems
> to indicate two NV driver bugs on Windows, and a Maxtor firmware upgrade
> for undescribed detection problems.
>
> My recommended action for users is:
> 1) Avoid Windows.
> 2) Don't panic.

Last December I requested new firmware for my drives. Maxtor called me
and asked whether I had any problems. I did not, but just wanted to fix
the problem before I noticed any.

The Maxtor guy then told me that hard disk firmware upgrades are best
avoided unless needed, and asked what operating system I run (answer:
Linux). He said that the problems only exist with Windows, and that
Linux should be ok.

In fact, I have yet to see a problem with my SATA Maxtor disks connected
to the onboard nForce4 controller. This supports Jeff Garzik's story.

I do notice that the nForce4 controller most of the time fails to
detect some of the drives (seemingly at random) on a reboot. A powerdown
and fresh boot lets the controller detect all disks again.

Sander

--
Humilis IT Services and Solutions
http://www.humilis.net