2008-08-23 14:45:19

by Sergey Spiridonov

Subject: GA-MA790FX-DS5 SATA ahci NCQ errors on Jmicron 20360/20363 (JMB363) kernel 2.6.25-2 Debian/Lenny

Hi

I get kernel errors [1] and [2], followed by a SATA link reset, under
heavy load on the hard drive connected to the GA-MA790FX-DS5 onboard
controller Jmicron 20360/20363 (JMB363) (lspci output is in [3]). A
hard drive connected to the other onboard controller (the AMD SB600
south bridge) works without problems.

I have two 1TB Seagate hard disks, ST31000340AS and ST31000340NS. I
connected one to the Jmicron JMB363 and the other to the SB600. After
some testing with several instances of bonnie++ I got kernel errors [1]
and [2]. I then swapped the connections: the disk that had been on the
JMB363 went to the SB600 and vice versa. The errors, timeouts and drive
resets always happened on whichever drive was connected to the JMB363
(it is sdb in the log file). There are no errors if both drives are
connected to the SB600.

Here [4] is the complete dmesg output after the system has booted
(before I get the errors).

I have already replaced the power supply, memory, video card and DVD
drive with parts taken from a working PC. I get the same problems with
these devices too. So the problem must be the motherboard, the software
or the CPU. The CPU seems to work O.K.

It looks like the problem is the motherboard or the ahci ATA driver.
Does anybody have a clue about it? Is the JMB363 chip broken, or is the
Linux driver broken?

[1] http://hurd.homeunix.org/~sena/GA-MA790FX-DS5/dmesg-sata-errors.txt
[2] http://hurd.homeunix.org/~sena/GA-MA790FX-DS5/dmesg-sata-errors2.txt
[3] http://hurd.homeunix.org/~sena/GA-MA790FX-DS5/lspci.txt
[4] http://hurd.homeunix.org/~sena/GA-MA790FX-DS5/dmesg-after-boot.txt

Here is the complete hardware description:
--------------------------------------------
Motherboard : GA-MA790FX-DS5(rev. 1.0)
BIOS Ver : F6 (was tested also with F5)
VGA Brand : Asus Model : EN8400GS HTP TD 256MB
CPU Brand : AMD Model : AM2 Athlon64 X2 4850E boxed Speed : 2500 MHz
Operating System : Debian GNU/Linux Lenny with kernel 2.6.25-2
Memory Brand : Kingston Type : DDRII
Memory Size : 1GB Speed : 800 MHz
Power Supply : 600W MS-Tech MP-600 W
--------------------------------------------

Here is part of the error log, in case the links do not work:

[ 373.263823] ata7.00: exception Emask 0x10 SAct 0x777ff SErr 0x580100 action 0x2
[ 373.263904] ata7.00: irq_stat 0x08000000
[ 373.263973] ata7: SError: { UnrecovData 10B8B Dispar Handshk }
[ 373.264039] ata7.00: cmd 61/00:00:ce:5a:68/04:00:0a:00:00/40 tag 0 ncq 524288 out
[ 373.264041] res 40/00:70:46:4f:68/00:00:0a:00:00/40 Emask 0x10 (ATA bus error)
[ 373.264197] ata7.00: status: { DRDY }
[ 373.264266] ata7.00: cmd 61/00:08:ce:5e:68/03:00:0a:00:00/40 tag 1 ncq 393216 out
[ 373.264267] res 40/00:70:46:4f:68/00:00:0a:00:00/40 Emask 0x10 (ATA bus error)
[ 373.264415] ata7.00: status: { DRDY }
[ 373.264484] ata7.00: cmd 61/30:10:d6:69:68/02:00:0a:00:00/40 tag 2 ncq 286720 out
[ 373.264485] res 40/00:70:46:4f:68/00:00:0a:00:00/40 Emask 0x10 (ATA bus error)

...


[ 373.271291] ata7: hard resetting link
[ 373.915361] ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 376.158770] ata7.00: configured for UDMA/133
[ 376.158770] ata7: exception Emask 0x1 SAct 0x0 SErr 0x0 action 0x0 t4
[ 376.158770] ata7: irq_stat 0x48000000
[ 376.158770] ata7: EH complete
[ 376.158770] sd 6:0:0:0: [sdb] 1953523055 512-byte hardware sectors (1000204 MB)
[ 376.158770] sd 6:0:0:0: [sdb] Write Protect is off
[ 376.158770] sd 6:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[ 376.158770] sd 6:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 1557.808227] ata7.00: exception Emask 0x2 SAct 0x7e7ff SErr 0x980400 action 0x2
[ 1557.808227] ata7.00: irq_stat 0x08000000
[ 1557.808227] ata7: SError: { Proto 10B8B Dispar LinkSeq }
[ 1557.808227] ata7.00: cmd 61/00:00:4e:d5:41/04:00:11:00:00/40 tag 0 ncq 524288 out
[ 1557.808227] res 40/00:80:d6:21:41/00:00:11:00:00/40 Emask 0x2 (HSM violation)
[ 1557.808227] ata7.00: status: { DRDY }
[ 1557.808227] ata7.00: cmd 61/60:08:0e:2e:42/02:00:11:00:00/40 tag 1 ncq 311296 out
[ 1557.808227] res 40/00:80:d6:21:41/00:00:11:00:00/40 Emask 0x2 (HSM violation)
[ 1557.808227] ata7.00: status: { DRDY }
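
For anyone reading the "cmd" lines in the log above, here is a rough
Python sketch of how the taskfile bytes can be picked apart. It assumes
the usual libata layout (command/feature:nsect:lbal:lbam:lbah, then the
same five "hob" high-order bytes, then the device register) and the NCQ
convention that the sector count lives in the feature registers while
the tag sits in bits 7:3 of the sector count register. The function
name and the returned dict are made up for illustration; the 512-byte
sector size is taken from the "512-byte hardware sectors" line.

```python
import re

# Assumed layout of a libata "cmd" line:
# cmd CMD/FEAT:NSECT:LBAL:LBAM:LBAH/HOB_FEAT:HOB_NSECT:HOB_LBAL:HOB_LBAM:HOB_LBAH/DEV
HEX = r"([0-9a-fA-F]{2})"
CMD_RE = re.compile(
    "cmd " + HEX + "/" + ":".join([HEX] * 5) + "/" + ":".join([HEX] * 5) + "/" + HEX
)


def decode_ncq_cmd(line, sector_size=512):
    """Decode one 'cmd' line of an NCQ command (hypothetical helper)."""
    m = CMD_RE.search(line)
    if m is None:
        raise ValueError("no cmd taskfile found in line")
    (cmd, feat, nsect, lbal, lbam, lbah,
     hob_feat, hob_nsect, hob_lbal, hob_lbam, hob_lbah, dev) = (
        int(x, 16) for x in m.groups())
    # For NCQ commands the transfer length is carried in the feature registers.
    sectors = (hob_feat << 8) | feat
    # 48-bit LBA: three low bytes plus three high-order ("hob") bytes.
    lba = (hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24) \
        | (lbah << 16) | (lbam << 8) | lbal
    return {
        "command": cmd,          # 0x61 is WRITE FPDMA QUEUED
        "tag": nsect >> 3,       # NCQ tag is in bits 7:3 of the sector count
        "sectors": sectors,
        "bytes": sectors * sector_size,
        "lba": lba,
    }


if __name__ == "__main__":
    sample = ("[ 373.264484] ata7.00: cmd 61/30:10:d6:69:68/02:00:0a:00:00/40 "
              "tag 2 ncq 286720 out")
    print(decode_ncq_cmd(sample))
```

The decoded tag and byte count can be compared with the "tag N ncq
NNNNNN" part that the kernel prints on the same line.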


lspci:

03:00.0 SATA controller: JMicron Technologies, Inc. JMicron 20360/20363 AHCI Controller (rev 02)
03:00.1 IDE interface: JMicron Technologies, Inc. JMicron 20360/20363 AHCI Controller (rev 02)
04:00.0 SATA controller: JMicron Technologies, Inc. JMicron 20360/20363 AHCI Controller (rev 02)
04:00.1 IDE interface: JMicron Technologies, Inc. JMicron 20360/20363 AHCI Controller (rev 02)
--
Best regards, Sergey Spiridonov


2008-08-23 23:25:01

by Christian Lamparter

Subject: Re: GA-MA790FX-DS5 SATA ahci NCQ errors on Jmicron 20360/20363 (JMB363) kernel 2.6.25-2 Debian/Lenny

On Sunday 24 August 2008 00:38:36 Jeff Garzik wrote:
> See http://ata.wiki.kernel.org/index.php/Libata_error_messages for an
> introduction.
>
> In general, tons of ATA bus errors and SError register bits means that
> problems are coming from the ATA bus, a.k.a. the SATA cable and its
> related connections.
>
> So... suspect bad cables, bad port connectors, cable interference,
> motherboard-caused interference or grounding problems, power supply
> problems.
>
hmm, or something totally odd...

what happens if you do this? (after you have made a backup!)
"dd if=/dev/sdX of=/dev/null bs=1" (where X is your affected hdd)
Note:
The important bit is the small bs (blocksize) number.
You can throw in an O_DIRECT flag to bypass the caches, or,
if you have some "empty" partition space, you can "dd" into
it with a small blocksize too.
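
For those who would rather script the read test than run dd, a minimal
Python sketch of the same idea follows: read a path sequentially in
very small blocks and throw the data away. The function name and the
max_blocks cap are made up for illustration. It deliberately does not
use O_DIRECT, since direct I/O generally needs block-aligned request
sizes, so this goes through the page cache like a plain dd would.

```python
import os
import time


def small_block_read(path, block_size=1, max_blocks=100_000):
    """Read `path` in tiny blocks and discard the data (hypothetical helper).

    Returns (blocks_read, bytes_read, elapsed_seconds).
    """
    blocks = 0
    total = 0
    start = time.monotonic()
    fd = os.open(path, os.O_RDONLY)
    try:
        while blocks < max_blocks:
            chunk = os.read(fd, block_size)
            if not chunk:  # end of file or device
                break
            blocks += 1
            total += len(chunk)
    finally:
        os.close(fd)
    return blocks, total, time.monotonic() - start


if __name__ == "__main__":
    import sys
    # Example: python small_read.py /dev/sdX   (read-only, needs root for a disk)
    print(small_block_read(sys.argv[1]))
```

It only reads, so it should be harmless to the data, but the advice
about having a backup still applies if the link starts resetting.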

my seagate and even a samsung hd103uj don't like that and will spew
out the same sort of problems you have just posted... (but they work
fine if I don't do nasty dd things!)

and unfortunately my md (raid1) seems to do lots of "small" reads &
writes when it starts to check/resync the whole 1TB array :-/.

Regards,
Chr

2008-08-24 17:02:47

by xerces8

Subject: Re: GA-MA790FX-DS5 SATA ahci NCQ errors on Jmicron 20360/20363 (JMB363) kernel 2.6.25-2 Debian/Lenny

Sergey Spiridonov wrote:

> Jeff Garzik wrote:
>
> > So... suspect bad cables, bad port connectors, cable interference,
> > motherboard-caused interference or grounding problems, power supply
> > problems.
>
> I did exchange power supply and I did exchange hard drives. The same
> hard drive with the same cable works with SB600 and produces errors with
> JMB363. So looks like it is not cable or hard drive problem. May be the
> problem is JMB363 port connector on the motherboard. How can I check it?

Hi!

I have a JMB363 myself and it has its share of problems.
I would say it is buggy hardware. (Why else would they
release a new Windows driver every week, if not to work around
bugs in the HW? ;-)

My WD MyBook Studio Edition 500 GB external eSATA drive does not
work correctly on the JMB363 no matter what I try, under both
Linux and Windows. I think the best was 30 minutes of
(apparently) error-free operation under Windows.

If interested, I can supply logs, data etc.
(I have a bunch of drives to try).

Regards,
David

2008-08-23 22:38:49

by Jeff Garzik

Subject: Re: GA-MA790FX-DS5 SATA ahci NCQ errors on Jmicron 20360/20363 (JMB363) kernel 2.6.25-2 Debian/Lenny

Sergey Spiridonov wrote:
> Hi
>
> I got kernel errors [1] and [2] followed by SATA reset on heavy load on
> the hard drive connected to the GA-MA790FX-DS5 onboard controller
> Jmicron 20360/20363 (JMB363) (here is lspci [3]). Hard drive connected
> to the another onboard (south bridge from AMD SB600) controller works
> without problem.
>
> I got two 1TB Seagate hard disks, ST31000340AS and ST31000340NS. I
> connected one to Jmicron JMB363, another to SB600. After some testing
> with several instances of bonnie++ I got kernel errors [1] and [2].
> After this I exchanged hard disks connections. The one which was
> connected to JMB363 I connected to SB600 and vs versa. Errors, timeouts
> and hard drive resetting happened always on the hard drive which is
> connected to the JMB363 (in log file it is sdb). There are no errors if
> both drives are connected to the SB600.
>
> Here [4] is complete (before i get errors) dmesg output after system is
> booted.
>
> I already replaced (took from working PC) power supply, memory, video
> card and dvd drive. I get same problems also with this devices. So
> problem must be motherboard, software or CPU. CPU seems to work O.K.
>
> It looks like the problem is motherboard or ahci ata driver. Does
> somebody have any clue about it? Is chip JMB363 broken or linux driver
> is broken?
>
> [1] http://hurd.homeunix.org/~sena/GA-MA790FX-DS5/dmesg-sata-errors.txt
> [2] http://hurd.homeunix.org/~sena/GA-MA790FX-DS5/dmesg-sata-errors2.txt
> [3] http://hurd.homeunix.org/~sena/GA-MA790FX-DS5/lspci.txt
> [4] http://hurd.homeunix.org/~sena/GA-MA790FX-DS5/dmesg-after-boot.txt


See http://ata.wiki.kernel.org/index.php/Libata_error_messages for an
introduction.

In general, tons of ATA bus errors and SError register bits mean that
problems are coming from the ATA bus, a.k.a. the SATA cable and its
related connections.

So... suspect bad cables, bad port connectors, cable interference,
motherboard-caused interference or grounding problems, power supply
problems.
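
To make the "SError register bits" concrete, here is a small Python
sketch that turns the SErr and SAct values from an exception line into
readable form. The bit-to-name table is written from the SATA SError
layout with the abbreviations libata prints, and should be treated as
an assumption; the names that show up in this thread's log
(UnrecovData, Proto, 10B8B, Dispar, Handshk, LinkSeq) can be
cross-checked against the kernel's own "SError: { ... }" lines. The
function names are made up for illustration.

```python
# Assumed SError bit positions and libata-style abbreviations.
SERR_BITS = {
    0: "RecovData",
    1: "RecovComm",
    8: "UnrecovData",
    9: "Persist",
    10: "Proto",
    11: "HostInt",
    16: "PHYRdyChg",
    17: "PHYInt",
    18: "CommWake",
    19: "10B8B",
    20: "Dispar",
    21: "BadCRC",
    22: "Handshk",
    23: "LinkSeq",
    24: "TrStaTrns",
    25: "UnrecFIS",
    26: "DevExch",
}


def decode_serr(serr):
    """Return the names of the SError bits set in `serr`, lowest bit first."""
    return [name for bit, name in sorted(SERR_BITS.items()) if serr & (1 << bit)]


def active_tags(sact):
    """Return the NCQ tags marked active in an SAct bitmask (one bit per tag)."""
    return [tag for tag in range(32) if sact & (1 << tag)]


if __name__ == "__main__":
    for serr in (0x580100, 0x980400):
        print(hex(serr), "{", " ".join(decode_serr(serr)), "}")
    print("outstanding tags:", active_tags(0x777ff))
```

Bits 16 and up are the link-level diagnostics (10B8B, Dispar, Handshk,
LinkSeq and so on), which is why a pile of them points at the physical
connection rather than at a particular command.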

Jeff


2008-08-24 00:24:18

by Sergey Spiridonov

Subject: Re: GA-MA790FX-DS5 SATA ahci NCQ errors on Jmicron 20360/20363 (JMB363) kernel 2.6.25-2 Debian/Lenny

Hi

Jeff Garzik wrote:

> So... suspect bad cables, bad port connectors, cable interference,
> motherboard-caused interference or grounding problems, power supply
> problems.

I did exchange the power supply and I did exchange the hard drives. The
same hard drive with the same cable works with the SB600 and produces
errors with the JMB363. So it looks like it is not a cable or hard drive
problem. Maybe the problem is the JMB363 port connector on the
motherboard. How can I check it?
--
Best regards, Sergey Spiridonov

2008-08-24 04:39:49

by Jeff Garzik

Subject: Re: GA-MA790FX-DS5 SATA ahci NCQ errors on Jmicron 20360/20363 (JMB363) kernel 2.6.25-2 Debian/Lenny

Sergey Spiridonov wrote:
> Hi
>
> Jeff Garzik wrote:
>
>> So... suspect bad cables, bad port connectors, cable interference,
>> motherboard-caused interference or grounding problems, power supply
>> problems.
>
> I did exchange power supply and I did exchange hard drives. The same
> hard drive with the same cable works with SB600 and produces errors with
> JMB363. So looks like it is not cable or hard drive problem. May be the
> problem is JMB363 port connector on the motherboard. How can I check it?

Try a new motherboard of the same brand and model :/

In general, tons of ATA bus errors and SError complaints indicate some
sort of problem at the physical layer/level. It's always possible that
software is to blame, but bug report patterns so far tend to point to
hardware.

Jeff

2008-08-25 14:52:22

by xerces8

Subject: Re: GA-MA790FX-DS5 SATA ahci NCQ errors on Jmicron 20360/20363 (JMB363) kernel 2.6.25-2 Debian/Lenny

Sergey Spiridonov wrote:

> Seagate
> support does not tell anything except something like "it should work"...

I just got this* link from WD support, accompanied with the text:
"This sounds like an incompatibility perhaps with the ESATA controller.
Please see the link below for tested ESATA controllers."

* - http://www.wdc.com/en/products/resources/esataupgrade.asp

Regards,
David

2008-08-30 11:01:47

by Tejun Heo

Subject: Re: GA-MA790FX-DS5 SATA ahci NCQ errors on Jmicron 20360/20363 (JMB363) kernel 2.6.25-2 Debian/Lenny

xerces8 wrote:
> I have a JMB363 myself and it has its share of problems.
> I would say it is buggy hardware. (why would they otherwise
> release a new windows driver every week ? if not to workaround
> bugs in HW ;-)

Well, FWIW, JMB ahci's are one of my favorites and usually very well
behaved.

> My WD MyBook Studio Edition 500 GB external eSATA drive does not
> work on the JMB363 correctly no matter what I try. Both
> under linux and windows. I think the best was 30 minutes of
> (apparent) error free operation under windows.

This one is being discussed both with JMB and WD. It seems the bridge
chip used in the WD external drives is somehow incompatible with the
JMB ahci's. Don't know whose fault it is or how it can be worked
around yet. The issue is being tracked in the following bugzilla.

http://bugzilla.kernel.org/show_bug.cgi?id=9913

--
tejun

2008-08-30 18:13:52

by xerces8

Subject: Re: GA-MA790FX-DS5 SATA ahci NCQ errors on Jmicron 20360/20363 (JMB363) kernel 2.6.25-2 Debian/Lenny

Tejun Heo wrote:

> xerces8 wrote:
> > My WD MyBook Studio Edition 500 GB external eSATA drive does not
> > work on the JMB363 correctly no matter what I try. Both
> > under linux and windows. I think the best was 30 minutes of
> > (apparent) error free operation under windows.
>
> This one is being discussed both with JMB and WD. It seems the bridge
> chip used in the WD external drives is somehow incompatible with the
> JMB ahci's. Don't know whose fault it is or how it can be worked
> around yet. The issue is being tracked in the following bugzilla.
>
> http://bugzilla.kernel.org/show_bug.cgi?id=9913

I know, I'm David Balažic (the last commenter on the bug, besides you) ;-)

Regards,
David

2008-08-31 09:35:23

by Tejun Heo

Subject: Re: GA-MA790FX-DS5 SATA ahci NCQ errors on Jmicron 20360/20363 (JMB363) kernel 2.6.25-2 Debian/Lenny

xerces8 wrote:
> Tejun Heo wrote:
>
>> xerces8 wrote:
>>> My WD MyBook Studio Edition 500 GB external eSATA drive does not
>>> work on the JMB363 correctly no matter what I try. Both
>>> under linux and windows. I think the best was 30 minutes of
>>> (apparent) error free operation under windows.
>> This one is being discussed both with JMB and WD. It seems the bridge
>> chip used in the WD external drives is somehow incompatible with the
>> JMB ahci's. Don't know whose fault it is or how it can be worked
>> around yet. The issue is being tracked in the following bugzilla.
>>
>> http://bugzilla.kernel.org/show_bug.cgi?id=9913
>
> I know, I'm David Balažic (the last commenter on bug, besides you) ;-)

Somehow I've been confusing people a lot lately. I asked my AMD contact
a few times about sata_nv problems somehow thinking AMD acquired NVidia
instead of ATI. :-)

--
tejun