LinuxLists.cc - Detecting I/O error and Halting System

2006-03-27 14:55:08

Subject: Detecting I/O error and Halting System

Hi everybody,

I have an I/O error that occurs on servers based on a
VIA VT82C686 chipset, and I have to prevent or stop the
error. I spent a lot of time searching for solutions to
stop the error but didn't find anything, so I want
to write a module that will monitor the HDD and
stop the system after sending an email.

I have read a lot of documents about the kernel and writing
modules, but I don't know how to start... Help,
please.

I'm open to other solutions (like smartd, or
writing classic programs).

Best regards.

Zine

PS: these errors are due to the VIA chipset; does anyone
know what is happening...?

Feb 12 04:46:03 porte_de_clignancourt_nds_b kernel:
hda: timeout waiting for DMA
Feb 12 04:46:06 alesia_nds_b ucd-snmp[812]: Connection
from 104.25.3.11
Feb 12 04:46:23 porte_de_clignancourt_nds_b kernel:
ide_dmaproc: chipset supported ide_dma_timeout func
only: 14
Feb 12 04:46:23 porte_de_clignancourt_nds_b kernel:
hda: status timeout: status=0xd0 { Busy }
(the disk adapter reports a DMA busy status)
Feb 12 04:46:23 porte_de_clignancourt_nds_b kernel:
hda: drive not ready for command
Feb 12 04:46:23 porte_de_clignancourt_nds_b
ucd-snmp[813]: Connection from 104.1.3.11
Feb 12 04:46:23 porte_de_clignancourt_nds_b
ucd-snmp[813]: Connection from 104.1.3.11
Feb 12 04:46:23 porte_de_clignancourt_nds_b last
message repeated 3 times
Feb 12 04:46:23 porte_de_clignancourt_nds_b kernel:
ide0: reset: success
Feb 12 10:22:38 porte_de_clignancourt_nds_b kernel:
hda: timeout waiting for DMA
Feb 12 10:24:46 porte_de_clignancourt_nds_b kernel:
ide_dmaproc: chipset supported ide_dma_timeout func
only: 14
Feb 12 10:24:46 porte_de_clignancourt_nds_b kernel:
hda: status timeout: status=0xd0 { Busy }
Feb 12 10:24:47 porte_de_clignancourt_nds_b kernel:
hda: drive not ready for command
Feb 12 10:24:47 porte_de_clignancourt_nds_b
ucd-snmp[813]: Connection from 104.1.3.11
Feb 12 10:24:47 porte_de_clignancourt_nds_b last
message repeated 4 times
Feb 12 10:24:47 porte_de_clignancourt_nds_b kernel:
ide0: reset timed-out, status=0x80
(the first reset of ide0 failed)
Feb 12 10:24:47 porte_de_clignancourt_nds_b kernel:
hda: status timeout: status=0x80 { Busy }
Feb 12 10:24:47 porte_de_clignancourt_nds_b kernel:
hda: drive not ready for command
Feb 12 10:24:47 porte_de_clignancourt_nds_b
ucd-snmp[813]: Connection from 104.1.3.11
Feb 12 10:24:47 porte_de_clignancourt_nds_b kernel:
ide0: reset: success

Feb 12 10:24:47 porte_de_clignancourt_nds_b kernel:
hda: irq timeout: status=0xd0 { Busy }

Feb 12 10:24:47 porte_de_clignancourt_nds_b kernel:
hda: DMA disabled

Feb 12 10:24:47 porte_de_clignancourt_nds_b kernel:
ide0: reset timed-out, status=0x80
Feb 12 10:24:47 porte_de_clignancourt_nds_b kernel:
hda: status timeout: status=0x80 { Busy }
(another reset failure)
Feb 12 10:24:47 porte_de_clignancourt_nds_b kernel:
hda: drive not ready for command
Feb 12 10:24:47 porte_de_clignancourt_nds_b kernel:
ide0: reset: success

Feb 12 13:45:38 porte_de_clignancourt_nds_b kernel:
ide0: reset timed-out, status=0x80
Feb 12 13:45:38 porte_de_clignancourt_nds_b kernel:
hda: status timeout: status=0x80 { Busy }
Feb 12 13:45:38 porte_de_clignancourt_nds_b kernel:
hda: drive not ready for command
Feb 12 13:45:38 porte_de_clignancourt_nds_b kernel:
ide0: reset timed-out, status=0x80
Feb 12 13:45:38 porte_de_clignancourt_nds_b kernel:
end_request: I/O error, dev 03:02 (hda), sector 102263
Feb 12 13:45:38 porte_de_clignancourt_nds_b syslogd:
/var/log/maillog: Input/output error
Feb 12 13:45:38 porte_de_clignancourt_nds_b kernel:
end_request: I/O error, dev 03:02 (hda), sector 110720
Feb 12 13:45:38 porte_de_clignancourt_nds_b kernel:
end_request: I/O error, dev 03:02 (hda), sector 110728
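Zine is open to an ordinary user-space program instead of a module. As a minimal sketch of the detection half (not something proposed in the thread; the function name `probe_disk_write` and the idea of a dedicated probe file are assumptions), a monitor could periodically write and fsync() a small file on the suspect disk and treat a failure such as EIO, the same error syslogd reported for /var/log/maillog above, as its trigger:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical probe: write a small marker file on the monitored disk and
 * force it to the device with fsync(). Returns 0 on success or -errno on
 * failure (for example -EIO, the error behind the "end_request: I/O error"
 * lines above). */
int probe_disk_write(const char *path)
{
    static const char marker[] = "disk-probe\n";
    const size_t len = sizeof marker - 1;   /* exclude the trailing NUL */
    int err = 0;

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -errno;

    ssize_t n = write(fd, marker, len);
    if (n < 0)
        err = errno;
    else if ((size_t)n != len)
        err = EIO;                          /* treat a short write as an I/O failure */
    else if (fsync(fd) < 0)                 /* don't trust "success" from the page cache */
        err = errno;

    /* Report a close() failure only if nothing failed earlier. */
    if (close(fd) < 0 && err == 0)
        err = errno;
    return -err;
}
```

The caller decides what to do on failure (send mail, then run shutdown). Because the IDE layer retries and resets the drive first, the write or fsync() may block for a long time before failing, judging by the gaps between timestamps in the log.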

2006-03-27 15:24:47

by linux-os (Dick Johnson)

Subject: Re: Detecting I/O error and Halting System

On Mon, 27 Mar 2006, zine el abidine Hamid wrote:

Maybe you should just fix the problem rather than attempting
to work around it! The problem is that your system has
difficulty communicating with a hard disk. This is caused
by one of the following:

(1) Bad hard disk.
(2) Bad cable.
(3) Improper configuration of one or more disks.

Since a reset timed out, it is likely that one of the disks
that share the same cable is defective, not necessarily /dev/hda
if you have another drive on the same cable. If you have only
one drive per cable (or only one drive), it is likely that
/dev/hda is (or is becoming) defective. Note that the disk can
fail if it gets too hot.

Cheers,
Dick Johnson
Penguin : Linux version 2.6.15.4 on an i686 machine (5589.42 BogoMips).
Warning : 98.36% of all statistics are fiction, book release in April.

2006-03-27 16:48:42

by Alan

Subject: Re: Detecting I/O error and Halting System

On Mon, 2006-03-27 at 16:55 +0200, zine el abidine Hamid wrote:
> hda: status timeout: status=0xd0 { Busy }
> (the disk adapter reports a DMA busy status)

If I'm reading the translation right, then your hard disk decided
it was busy and then never came back.

> Feb 12 04:46:23 porte_de_clignancourt_nds_b kernel:
> ide0: reset: success

So the IDE layer tried to reset it

> Feb 12 10:22:38 porte_de_clignancourt_nds_b kernel:
> hda: timeout waiting for DMA

Which didn't help

> Feb 12 10:24:47 porte_de_clignancourt_nds_b kernel:
> ide0: reset: success

Still trying

> Feb 12 10:24:47 porte_de_clignancourt_nds_b kernel:
> hda: irq timeout: status=0xd0 { Busy }
>
> Feb 12 10:24:47 porte_de_clignancourt_nds_b kernel:
> hda: DMA disabled

We gave up on DMA to see if PIO would help
>
>
> Feb 12 10:24:47 porte_de_clignancourt_nds_b kernel:
> ide0: reset timed-out, status=0x80
> Feb 12 10:24:47 porte_de_clignancourt_nds_b kernel:
> hda: status timeout: status=0x80 { Busy }
> (another reset failure)
> Feb 12 10:24:47 porte_de_clignancourt_nds_b kernel:
> hda: drive not ready for command
> Feb 12 10:24:47 porte_de_clignancourt_nds_b kernel:
> ide0: reset: success

And reset...

> Feb 12 13:45:38 porte_de_clignancourt_nds_b kernel:
> hda: status timeout: status=0x80 { Busy }
> Feb 12 13:45:38 porte_de_clignancourt_nds_b kernel:
> hda: drive not ready for command
> Feb 12 13:45:38 porte_de_clignancourt_nds_b kernel:
> ide0: reset timed-out, status=0x80
> Feb 12 13:45:38 porte_de_clignancourt_nds_b kernel:
> end_request: I/O error, dev 03:02 (hda), sector 102263
> Feb 12 13:45:38 porte_de_clignancourt_nds_b syslogd:
> /var/log/maillog: Input/output error
> Feb 12 13:45:38 porte_de_clignancourt_nds_b kernel:
> end_request: I/O error, dev 03:02 (hda), sector 110720
> Feb 12 13:45:38 porte_de_clignancourt_nds_b kernel:
> end_request: I/O error, dev 03:02 (hda), sector 110728

Eventually we give up.

First thing to check would be the disk and the temperature, then the
cabling. In particular make sure the *long* part of the cable is between
the drive and the controller.
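Alan's walk-through maps each kernel message to a step in the IDE error handling. A monitor could follow the same steps by scanning syslog output. The classifier below is a hypothetical sketch (the enum and function names are ours) that matches only the message strings quoted in this thread; other kernel versions may word them differently:

```c
#include <string.h>

/* Hypothetical categories for the IDE kernel messages walked through above. */
enum ide_event {
    IDE_EV_NONE = 0,      /* not one of the messages we track */
    IDE_EV_DMA_TIMEOUT,   /* "timeout waiting for DMA" */
    IDE_EV_BUSY_TIMEOUT,  /* "status timeout" / "irq timeout" with Busy set */
    IDE_EV_RESET_OK,      /* "reset: success" */
    IDE_EV_RESET_FAILED,  /* "reset timed-out" */
    IDE_EV_DMA_DISABLED,  /* "DMA disabled": fell back to PIO */
    IDE_EV_IO_ERROR       /* "end_request: I/O error": the kernel gave up */
};

/* Classify one syslog line by searching for the exact phrases seen in the log. */
enum ide_event classify_ide_line(const char *line)
{
    if (strstr(line, "end_request: I/O error"))
        return IDE_EV_IO_ERROR;
    if (strstr(line, "timeout waiting for DMA"))
        return IDE_EV_DMA_TIMEOUT;
    if (strstr(line, "DMA disabled"))
        return IDE_EV_DMA_DISABLED;
    if (strstr(line, "reset timed-out"))
        return IDE_EV_RESET_FAILED;
    if (strstr(line, "reset: success"))
        return IDE_EV_RESET_OK;
    if (strstr(line, "status timeout") || strstr(line, "irq timeout"))
        return IDE_EV_BUSY_TIMEOUT;
    return IDE_EV_NONE;
}
```

Fed one syslog line at a time, a monitor might warn on IDE_EV_RESET_FAILED and act on IDE_EV_IO_ERROR, the point at which, per Alan, the kernel gives up.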

2006-03-28 15:07:15

by zine el abidine Hamid

Subject: Re: Detecting I/O error and Halting System

First of all, thank you for your analysis.

I don't think it's an HDD problem or a cable
problem, because the servers are new. We have tried
different HDDs (Seagate, Maxtor), but it hasn't helped.
It may be a temperature problem, but we have run a lot of
tests under harsh conditions (high temperature)
successfully...

We think the problem comes from the VIA VT82C686
chipset (that is also the opinion of Dick Johnson
(linux-os), who advised me to try UDMA33 instead of
UDMA66).

How can I determine the problem?

I want to add that the HDD seems to be disconnected
(the BIOS can't find any drive to boot from) after a
simple reset. We must switch the servers off to get
them working again.
However, it takes a long time (4 months or more)
before the HDD fails. I want to work around this by
writing a module that will supervise the HDD. I know
how to write a module (I used the LKMPG guide,
http://www.tldp.org/LDP/lkmpg/), but how can I
shut down Linux from inside a module...?

Best regards.

Zine.

2006-03-28 15:28:28

by Alan

Subject: Re: Detecting I/O error and Halting System

On Tue, 2006-03-28 at 17:07 +0200, zine el abidine Hamid wrote:
> I don't think it's an HDD problem or a cable
> problem, because the servers are new. We have tried

New. That's another word for "untested" I believe 8)

> How can I determine the problem?

I would consult the hardware vendor.

> I want to add that the HDD seems to be disconnected
> (the BIOS can't find any drive to boot from) after a
> simple reset. We must switch the servers off to get
> them working again.

That strongly indicates a hardware problem.

> http://www.tldp.org/LDP/lkmpg/), but how can I
> shut down Linux from inside a module...?

See the softdog driver for an example.
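Alan's pointer to softdog suggests turning the problem around: instead of halting the machine from kernel code, a user-space daemon writes to /dev/watchdog only while the disk looks healthy. If it stops, softdog's timer expires and, by default, the kernel restarts the machine (a reboot, not a clean halt). A minimal sketch under those assumptions; `watchdog_step`, `watchdog_disarm` and `path_exists_check` are hypothetical names:

```c
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical health-check callback: returns nonzero while things look OK. */
typedef int (*health_check_fn)(void *ctx);

/* Example check: the path in ctx can still be stat()ed. This is weak (the
 * answer may come from cached metadata); a check that forces a write to the
 * disk would be more convincing. */
int path_exists_check(void *ctx)
{
    struct stat st;
    return stat((const char *)ctx, &st) == 0;
}

/* One iteration of a keep-alive loop. wd_fd is assumed to be a descriptor
 * open on /dev/watchdog with softdog loaded. Returns 1 if the watchdog was
 * fed, 0 if it was deliberately starved because the check failed, and -1 if
 * the write failed. */
int watchdog_step(int wd_fd, health_check_fn check, void *ctx)
{
    /* Any byte resets the timer; 'V' is reserved for the magic close below. */
    static const char ping = 'k';

    if (!check(ctx))
        return 0;                   /* stop feeding; the timer will expire */
    return write(wd_fd, &ping, 1) == 1 ? 1 : -1;
}

/* Clean exit: write the magic 'V' before close() so the watchdog is disarmed
 * instead of firing (ignored if the driver uses nowayout). */
int watchdog_disarm(int wd_fd)
{
    static const char magic = 'V';
    int wrote = write(wd_fd, &magic, 1) == 1;
    int closed = close(wd_fd) == 0;
    return (wrote && closed) ? 0 : -1;
}
```

A daemon would open /dev/watchdog once and call watchdog_step() every few seconds, well inside softdog's timeout (its soft_margin module parameter). Given Zine's report that the drive is not seen again until a power cycle, a watchdog reboot alone may not bring the server back.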

2006-03-28 17:55:47

by Gene Heskett

Subject: Re: Detecting I/O error and Halting System

On Tuesday 28 March 2006 10:07, zine el abidine Hamid wrote:

I take it that you are aware of a drive monitoring utility called
smartd? By querying the drive after a fresh power-up, you may be able to
extract useful information about its health.

--
Cheers, Gene
People having trouble with vz bouncing email to me should add the word
'online' between the 'verizon', and the dot which bypasses vz's
stupid bounce rules. I do use spamassassin too. :-)
Yahoo.com and AOL/TW attorneys please note, additions to the above
message by Gene Heskett are:
Copyright 2006 by Maurice Eugene Heskett, all rights reserved.

2006-03-30 08:14:55

by zine el abidine Hamid

Subject: Re: Detecting I/O error and Halting System

I know about smartd, but the HDDs are OK. When the
problem happens, we have to switch the servers off and on,
and then they carry on without any errors; after that, the servers
work as if nothing had happened...

2006-03-30 12:28:37

by be-news06

Subject: Re: Detecting I/O error and Halting System

Alan Cox <[email protected]> wrote:
> See the softdog driver for an example.

The user-mode agent (watchdog(8)) can, by the way, monitor the availability of a
file; there is no need to write a module. Maybe this feature was added after somebody
took that code over from you? :)

watchdog.conf(5) says:

# file = <filename> Set file name for file mode. This option can be
# given as often as you like to check several files.

Bernd