2003-01-01 15:31:59

by Soeren Sonnenburg

Subject: ide harddisk freeze WDC WD1800JB vs VIA VT8235

Hi.

I still get hard disk freezes on two WD1800JB drives (ASUS A7V8X mobo),
i.e., I have to power-cycle the system to get the hard disk working again.
A cold reset is not enough.

The hard disks are connected to the primary and secondary channels of the
VT8235 controller (both disks are masters).

When a hard disk 'freezes', the IDE light stays on continuously. The
machine is still pingable when that happens, but since these two disks
form a software RAID0, it hangs on any disk access.
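To illustrate why one frozen member is enough to hang the whole array: in RAID0, consecutive chunks alternate between the member disks, so most accesses larger than one chunk touch both drives. This is a simplified sketch, assuming two equal-sized members and a hypothetical 64 KiB chunk size (the actual chunk size of this array is not stated); raid0_map and disks_touched are illustrative names, not md driver functions.

```python
# Simplified RAID0 (striping) address mapping, assuming equal-sized
# member disks. CHUNK_SIZE is a hypothetical value for illustration only.
CHUNK_SIZE = 64 * 1024  # bytes per chunk (assumed)
MEMBERS = ["/dev/hda", "/dev/hdc"]  # the two masters from the report


def raid0_map(offset, chunk_size=CHUNK_SIZE, members=MEMBERS):
    """Map a byte offset on the array to (member disk, byte offset on that disk)."""
    chunk_index, within_chunk = divmod(offset, chunk_size)
    disk = members[chunk_index % len(members)]   # chunks alternate between disks
    stripe = chunk_index // len(members)         # full stripes before this chunk
    return disk, stripe * chunk_size + within_chunk


def disks_touched(offset, length, chunk_size=CHUNK_SIZE, members=MEMBERS):
    """Return the set of member disks that a read of `length` bytes at `offset` hits."""
    first = offset // chunk_size
    last = (offset + length - 1) // chunk_size
    return {members[i % len(members)] for i in range(first, last + 1)}


if __name__ == "__main__":
    print(raid0_map(0))                          # ('/dev/hda', 0)
    print(raid0_map(CHUNK_SIZE))                 # ('/dev/hdc', 0)
    print(raid0_map(2 * CHUNK_SIZE + 10))        # ('/dev/hda', 65546)
    print(sorted(disks_touched(0, 128 * 1024)))  # ['/dev/hda', '/dev/hdc']
```

A 4 KiB read inside one chunk hits only one disk, but a 128 KiB read already needs both, so ordinary filesystem activity soon blocks on the frozen drive.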

This problem occurs roughly once a week, and only very rarely within 3
days.

The system is an Athlon XP 2.4G+ on an A7V8X mainboard with 1.5 GB of
DDR333 memory, running kernel 2.4.21-pre2 (it did not work with 2.4.20,
which is why I moved to the pre-release).

So far I have ruled out the following:
- it is not a cable problem (I tested several cables)
- the memory of the machine seems to be OK (memtest ran for 2 days
without reporting an error)
- it is not a power supply problem (I tested another power supply)
- the hard disks are OK (I ran the vendor's diagnostic program)

Now it could be that the VIA VT8235 chipset has a bug, that the Linux IDE
code is buggy, or that the drive firmware is faulty.

It is probably a VT8235 or Linux driver problem, since I know of 8
WD1800JB disks with the same firmware version that have been running for
more than a month in a hardware RAID setup without problems.

I attached the kernel config, dmesg, and hdparm -I output for one of the drives.

If you have any ideas or need further information, don't hesitate to contact me.

Thanks in advance,
Soeren.


Attachments:
config (22.91 kB)
dmesg (14.96 kB)
hdparm (1.78 kB)

2003-01-01 15:51:39

by Mark Rutherford

Subject: Re: ide harddisk freeze WDC WD1800JB vs VIA VT8235

Q: are you using cables with a slave connector that is not in use?
I had this problem with this chipset, and it turned out that it didn't like
having a long 80-wire cable with a dangling, unused connector.
I got a round cable with just 2 connectors, 1 for the board and 1 for the
drive, and it never locked up again.
I thought it was strange as well,
but with my setup it happened more frequently, say once every 2-3 hours.
Hope this is of some help.



--
Regards,
Mark Rutherford




2003-01-01 16:11:37

by Soeren Sonnenburg

Subject: Re: ide harddisk freeze WDC WD1800JB vs VIA VT8235

On Wed, 2003-01-01 at 16:59, Mark Rutherford wrote:
> Q: are you using cables with a slave connector, but its not in use?
> I had this problem, with this chipset and it turned out that it didnt like
> having
> a long 80 wire cable with a loose connector
> I got a round cable with just 2 connectors, 1 for the board and 1 for the
> drive....
> never locked again.
> I thought it to be strange as well.
> but with my setup it happened more frequently, say once every 2-3 hours.
> hope this is of any help.

Thanks for your answer.

Indeed, these cables have slave connectors (which are unused). I tried a
round cable and two different flat ones. I also went down to UDMA4
(== UDMA/66), to no avail.
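For reference, the nominal Ultra DMA burst rates, and the numeric value that hdparm's -X option takes for them (the UDMA mode number plus 64, so udma4 is -X 68), can be tabulated as follows. This is a small sketch; udma_label and hdparm_x_value are hypothetical helper names.

```python
# Nominal Ultra DMA burst rates in MB/s per mode, as defined by the ATA specs.
UDMA_RATES_MBPS = {0: 16.7, 1: 25.0, 2: 33.3, 3: 44.4, 4: 66.7, 5: 100.0, 6: 133.3}


def udma_label(mode):
    """Return the conventional 'UDMA/xx' label for a UDMA mode number."""
    return f"UDMA/{int(UDMA_RATES_MBPS[mode])}"


def hdparm_x_value(mode):
    """Numeric argument for `hdparm -X` selecting UDMA `mode` (mode number + 64)."""
    if mode not in UDMA_RATES_MBPS:
        raise ValueError(f"unknown UDMA mode {mode}")
    return 64 + mode


if __name__ == "__main__":
    for mode in sorted(UDMA_RATES_MBPS):
        print(f"udma{mode}: {udma_label(mode)}, hdparm -X {hdparm_x_value(mode)}")
```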

I am not sure whether a hard disk should ever freeze so hard that only a power cycle brings it back.

Soeren.

2003-01-02 07:41:36

by Soeren Sonnenburg

Subject: Re: ide harddisk freeze WDC WD1800JB vs VIA VT8235

On Wed, 2003-01-01 at 18:37, Mark Hahn wrote:


> ide cables are <= 18", PERIOD. if only one connector is used,
> it must be the end one. this is all quite clear in the ATA spec.

The flat cables I tried were all about 24 cm long, well under 18" (about
46 cm); I tried two different sets. I have not measured the round cables.

> the original poster should consider simplifying his system first:
> for instance, don't load all the random IO devices, especially not
> the NVidia module, consider testing with ext2/3, etc.

I don't see the point of this. This system (with a K7 and an older VIA
chipset) was working reliably. All that changed were the 180 GB hard
disks, the mainboard and the processor.

In any case, the kernel is not crashing, and isn't it very unlikely that a
bug in some unrelated I/O device would reproducibly cause this specific
problem across different kernel versions?

In particular, changing the filesystem type will not gain anything.

However, it *might* be some 48-bit IDE access (LBA48) problem, as both
disks are 180 GB in size, more than 28-bit LBA can address.
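Whether a drive needs 48-bit addressing at all is simple arithmetic: 28-bit LBA can address 2^28 sectors of 512 bytes, about 137.4 GB. A minimal sketch, assuming 512-byte sectors and taking 180 GB as the vendor's decimal capacity; the function names are hypothetical.

```python
# 28-bit vs 48-bit LBA addressing limits, assuming 512-byte sectors.
SECTOR_SIZE = 512  # bytes


def lba_limit_bytes(address_bits, sector_size=SECTOR_SIZE):
    """Largest capacity (in bytes) addressable with `address_bits`-bit LBA."""
    return (2 ** address_bits) * sector_size


def needs_lba48(capacity_bytes):
    """True if some sectors of the drive lie beyond the 28-bit LBA limit."""
    return capacity_bytes > lba_limit_bytes(28)


if __name__ == "__main__":
    wd1800jb = 180 * 10**9  # nominal 180 GB (decimal, as vendors count)
    print(lba_limit_bytes(28))    # 137438953472 bytes, about 137.4 GB
    print(needs_lba48(wd1800jb))  # True
```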

Unfortunately, I have failed to trigger the problem: I could run
badblocks -p 0 on /dev/hda and simultaneously on /dev/hdc for more than
a day (>16 passes) without any trouble.

So the problem seems to occur only very rarely. IMHO this is some
'transfer failure' between disk and controller, and this error condition
is not handled properly.

I don't have the slightest idea what that could be.
Soeren.