2007-10-10 08:28:55

by Jimmy

Subject: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen

I get this on brand new hardware, 2xHitachi Deathstar 320gb SATA2
(sata_via driver)

I get this a lot: the disk makes some sound after heavy IO, the
system hangs for a few seconds, and then this comes up:

ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata1.00: cmd 25/00:00:3f:76:30/00:04:00:00:00/e0 tag 0 cdb 0x0 data 524288 in
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata1: port is slow to respond, please be patient (Status 0xd0)
ata1: soft resetting port
ata1.00: configured for UDMA/133
ata1: EH complete
sd 0:0:0:0: [sda] 625142448 512-byte hardware sectors (320073 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't
support DPO or FUA
raid1: Disk failure on sdb1, disabling device.

This is on kernel 2.6.23
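
As a reading aid for the log above, here is a minimal Python sketch that
decodes the ATA status bytes it contains: `Status 0xd0` from the "slow to
respond" line and the leading `40` of the `res` line. The bit names follow
the classic ATA status register layout; several of the lower bits are
obsolete or redefined in later ATA revisions, so treat those names as
approximate. The function name `decode_status` is made up for this
illustration.

```python
# ATA status register bits (classic layout). Bits 4, 2 and 1 were later
# marked obsolete or redefined, so those names are historical.
STATUS_BITS = [
    (0x80, "BSY"),   # device busy
    (0x40, "DRDY"),  # device ready
    (0x20, "DF"),    # device fault
    (0x10, "DSC"),   # seek complete (historical name)
    (0x08, "DRQ"),   # data request
    (0x04, "CORR"),  # corrected data (historical name)
    (0x02, "IDX"),   # index (historical name)
    (0x01, "ERR"),   # error
]


def decode_status(value):
    """Return the names of the status bits set in `value`, high bit first."""
    return [name for mask, name in STATUS_BITS if value & mask]


# Values taken from the log: "Status 0xd0" and "res 40/..."
for raw in (0xD0, 0x40):
    print(hex(raw), decode_status(raw))
```

Read this way, 0xd0 has BSY set, which matches the "port is slow to
respond" message: the device was still busy when the error handler looked
at it.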


2007-10-10 19:17:28

by Greg Cormier

Subject: Re: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen

I'd like to hop in on this, and add my similar problem. This is my
first post so please excuse me if I'm doing something wrong.

I've been having issues recently (couple of weeks?) with my server. I
have three WD5000YS (500gb) drives in RAID5, on an Asus A8N
motherboard which is nForce 4. I've even RMA'd one of the drives, but
now I'm thinking the drives are fine.

The drive seems to have issues under moderate to heavy IO. I unmounted
my raid and forced an e2fsck. e2fsck didn't even print anything before
I got this:

Oct 10 14:50:40 zeus kernel: ata3: EH in ADMA mode, notifier 0x0
notifier_error 0x0 gen_ctl 0x1501000 status 0x400 next cpb count 0x0
next cpb idx 0x0
Oct 10 14:50:40 zeus kernel: ata3: CPB 0: ctl_flags 0x1f, resp_flags 0x2
Oct 10 14:50:40 zeus kernel: ata3: timeout waiting for ADMA IDLE, stat=0x400
Oct 10 14:50:40 zeus kernel: ata3: timeout waiting for ADMA LEGACY, stat=0x400
Oct 10 14:50:40 zeus kernel: ata3.00: exception Emask 0x0 SAct 0x1
SErr 0x1c00000 action 0x2 frozen
Oct 10 14:50:40 zeus kernel: ata3.00: cmd
61/08:00:bf:4b:38/00:00:3a:00:00/40 tag 0 cdb 0x0 data 4096 out
Oct 10 14:50:40 zeus kernel: res
40/00:f2:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Oct 10 14:50:40 zeus kernel: ata3: soft resetting port
Oct 10 14:50:40 zeus kernel: ata3: SATA link up 3.0 Gbps (SStatus 123
SControl 300)
Oct 10 14:50:40 zeus kernel: ata3.00: configured for UDMA/133
Oct 10 14:50:40 zeus kernel: ata3: EH complete
Oct 10 14:50:40 zeus kernel: sd 2:0:0:0: [sdb] 976773168 512-byte
hardware sectors (500108 MB)
Oct 10 14:50:40 zeus kernel: sd 2:0:0:0: [sdb] Write Protect is off
Oct 10 14:50:40 zeus kernel: sd 2:0:0:0: [sdb] Write cache: enabled,
read cache: enabled, doesn't support DPO or FUA
Oct 10 14:51:40 zeus kernel: ata3: EH in ADMA mode, notifier 0x0
notifier_error 0x0 gen_ctl 0x1501000 status 0x400 next cpb count 0x0
next cpb idx 0x0
Oct 10 14:51:40 zeus kernel: ata3: CPB 0: ctl_flags 0x1f, resp_flags 0x2
Oct 10 14:51:40 zeus kernel: ata3: timeout waiting for ADMA IDLE, stat=0x400
Oct 10 14:51:40 zeus kernel: ata3: timeout waiting for ADMA LEGACY, stat=0x400
Oct 10 14:51:40 zeus kernel: ata3.00: exception Emask 0x0 SAct 0x1
SErr 0x400000 action 0x2 frozen
Oct 10 14:51:40 zeus kernel: ata3.00: cmd
61/08:00:bf:4b:38/00:00:3a:00:00/40 tag 0 cdb 0x0 data 4096 out
Oct 10 14:51:40 zeus kernel: res
40/00:f2:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Oct 10 14:51:41 zeus kernel: ata3: soft resetting port
Oct 10 14:51:41 zeus kernel: ata3: SATA link up 3.0 Gbps (SStatus 123
SControl 300)
Oct 10 14:51:41 zeus kernel: ata3.00: configured for UDMA/133
Oct 10 14:51:41 zeus kernel: ata3: EH complete
Oct 10 14:51:41 zeus kernel: sd 2:0:0:0: [sdb] 976773168 512-byte
hardware sectors (500108 MB)
Oct 10 14:51:41 zeus kernel: sd 2:0:0:0: [sdb] Write Protect is off
Oct 10 14:51:41 zeus kernel: sd 2:0:0:0: [sdb] Write cache: enabled,
read cache: enabled, doesn't support DPO or FUA
Oct 10 14:52:19 zeus kernel: device eth0 left promiscuous mode
Oct 10 14:52:41 zeus kernel: ata3: EH in ADMA mode, notifier 0x0
notifier_error 0x0 gen_ctl 0x1501000 status 0x400 next cpb count 0x0
next cpb idx 0x0
Oct 10 14:52:41 zeus kernel: ata3: CPB 0: ctl_flags 0x1f, resp_flags 0x2
Oct 10 14:52:41 zeus kernel: ata3: timeout waiting for ADMA IDLE, stat=0x400
Oct 10 14:52:41 zeus kernel: ata3: timeout waiting for ADMA LEGACY, stat=0x400
Oct 10 14:52:41 zeus kernel: ata3.00: exception Emask 0x0 SAct 0x1
SErr 0x400000 action 0x2 frozen
Oct 10 14:52:41 zeus kernel: ata3.00: cmd
61/08:00:bf:4b:38/00:00:3a:00:00/40 tag 0 cdb 0x0 data 4096 out
Oct 10 14:52:41 zeus kernel: res
40/00:f2:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Oct 10 14:52:41 zeus kernel: ata3: soft resetting port
Oct 10 14:52:42 zeus kernel: ata3: SATA link up 3.0 Gbps (SStatus 123
SControl 300)
Oct 10 14:52:42 zeus kernel: ata3.00: configured for UDMA/133
Oct 10 14:52:42 zeus kernel: ata3: EH complete
Oct 10 14:52:42 zeus kernel: sd 2:0:0:0: [sdb] 976773168 512-byte
hardware sectors (500108 MB)
Oct 10 14:52:42 zeus kernel: sd 2:0:0:0: [sdb] Write Protect is off
Oct 10 14:52:42 zeus kernel: sd 2:0:0:0: [sdb] Write cache: enabled,
read cache: enabled, doesn't support DPO or FUA

These errors have been happening on various .22 kernels, and this
message is from the hot-off-the-press .23 kernel. This message is
followed by a hard freeze.
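
For anyone reading the `SErr` values in the log above, the following is a
small Python sketch that decodes them using the SError register bit layout
from the Serial ATA specification (the same positions libata uses for its
`SERR_*` constants). The human-readable names and the function name
`decode_serr` are chosen for this illustration only, and the decoding says
nothing about the root cause.

```python
# SError register bits per the Serial ATA specification.
# Bits 0-11 are the ERR field, bits 16-26 the DIAG field.
SERR_BITS = {
    0: "recovered data integrity error",
    1: "recovered communication error",
    8: "transient data integrity error",
    9: "persistent communication or data integrity error",
    10: "protocol error",
    11: "internal error",
    16: "PhyRdy change",
    17: "PHY internal error",
    18: "comm wake",
    19: "10b to 8b decode error",
    20: "disparity error",
    21: "CRC error",
    22: "handshake error",
    23: "link sequence error",
    24: "transport state transition error",
    25: "unrecognized FIS",
    26: "device exchanged",
}


def decode_serr(value):
    """Return descriptions of the SError bits set in `value`, low bit first."""
    return [name for bit, name in sorted(SERR_BITS.items()) if value & (1 << bit)]


# The two distinct SErr values from the log above.
for raw in (0x1C00000, 0x400000):
    print(hex(raw), decode_serr(raw))
```

Both values have the handshake bit set, and the first one additionally
reports link sequence and transport state transition errors, so the
timeouts here were accompanied by link-level trouble, unlike the
`SErr 0x0` case that started the thread.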

I'm in the process of figuring out why netconsole isn't quite working,
so hopefully I can provide more information soon. The server is
currently frozen, when I get home I can perhaps provide more
information? lspci?

Looks like another rebuild of the array when I get home.


Thanks,
Greg

2007-10-11 02:37:54

by Steen Eugen Poulsen

Subject: Re: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen

Greg Cormier wrote:
> I've been having issues recently (couple of weeks?) with my server. I
> have three WD5000YS (500gb) drives in RAID5, on an Asus A8N
> motherboard which is nForce 4. I've even RMA'd one of the drives, but
> now I'm thinking the drives are fine.

Intel/VIA motherboard and software raid. Same errors in dmesg though.

The machine ran for a month with no issues, then one disk died. After a
reboot one disk was missing; after additional reboots both disks are
running again.

I'm thinking broken hardware and not a kernel issue on this, but I'm a
software guy, so there could be a lot I don't understand here.




2007-10-13 00:42:36

by Andrew Morton

Subject: Re: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen

On Wed, 10 Oct 2007 10:28:45 +0200 (CEST)
[email protected] wrote:

> I get this on brand new hardware, 2xHitachi Deathstar 320gb SATA2
> (sata_via driver)
>
> I get this a lot, the disk makes some sound after heavy IO and then the
> system hangs for a few seconds, then this comes up:
>
> ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> ata1.00: cmd 25/00:00:3f:76:30/00:04:00:00:00/e0 tag 0 cdb 0x0 data 524288 in
> res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
> ata1: port is slow to respond, please be patient (Status 0xd0)
> ata1: soft resetting port
> ata1.00: configured for UDMA/133
> ata1: EH complete
> sd 0:0:0:0: [sda] 625142448 512-byte hardware sectors (320073 MB)
> sd 0:0:0:0: [sda] Write Protect is off
> sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
> sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't
> support DPO or FUA
> raid1: Disk failure on sdb1, disabling device.
>
> This is on kernel 2.6.23
>

(added linux-ide)

2007-10-13 00:54:58

by Andrew Morton

Subject: Re: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen

On Wed, 10 Oct 2007 15:17:19 -0400
"Greg Cormier" <[email protected]> wrote:

> I'd like to hop in on this, and add my similar problem. This is my
> first post so please excuse me if I'm doing something wrong.

Please cc [email protected] on ide, sata and pata reports.

A "hard freeze" is fairly serious.

> I've been having issues recently (couple of weeks?) with my server. I
> have three WD5000YS (500gb) drives in RAID5, on an Asus A8N
> motherboard which is nForce 4. I've even RMA'd one of the drives, but
> now I'm thinking the drives are fine.
>
> The drive seems to have issues under heavy to moderate IO. I unmounted
> my raid, and forced an e2fsck. e2fsck didn't even print anything out,
> I got this.
>
> Oct 10 14:50:40 zeus kernel: ata3: EH in ADMA mode, notifier 0x0
> notifier_error 0x0 gen_ctl 0x1501000 status 0x400 next cpb count 0x0
> next cpb idx 0x0
> Oct 10 14:50:40 zeus kernel: ata3: CPB 0: ctl_flags 0x1f, resp_flags 0x2
> Oct 10 14:50:40 zeus kernel: ata3: timeout waiting for ADMA IDLE, stat=0x400
> Oct 10 14:50:40 zeus kernel: ata3: timeout waiting for ADMA LEGACY, stat=0x400
> Oct 10 14:50:40 zeus kernel: ata3.00: exception Emask 0x0 SAct 0x1
> SErr 0x1c00000 action 0x2 frozen
> Oct 10 14:50:40 zeus kernel: ata3.00: cmd
> 61/08:00:bf:4b:38/00:00:3a:00:00/40 tag 0 cdb 0x0 data 4096 out
> Oct 10 14:50:40 zeus kernel: res
> 40/00:f2:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
> Oct 10 14:50:40 zeus kernel: ata3: soft resetting port
> Oct 10 14:50:40 zeus kernel: ata3: SATA link up 3.0 Gbps (SStatus 123
> SControl 300)
> Oct 10 14:50:40 zeus kernel: ata3.00: configured for UDMA/133
> Oct 10 14:50:40 zeus kernel: ata3: EH complete
> Oct 10 14:50:40 zeus kernel: sd 2:0:0:0: [sdb] 976773168 512-byte
> hardware sectors (500108 MB)
> Oct 10 14:50:40 zeus kernel: sd 2:0:0:0: [sdb] Write Protect is off
> Oct 10 14:50:40 zeus kernel: sd 2:0:0:0: [sdb] Write cache: enabled,
> read cache: enabled, doesn't support DPO or FUA
> Oct 10 14:51:40 zeus kernel: ata3: EH in ADMA mode, notifier 0x0
> notifier_error 0x0 gen_ctl 0x1501000 status 0x400 next cpb count 0x0
> next cpb idx 0x0
> Oct 10 14:51:40 zeus kernel: ata3: CPB 0: ctl_flags 0x1f, resp_flags 0x2
> Oct 10 14:51:40 zeus kernel: ata3: timeout waiting for ADMA IDLE, stat=0x400
> Oct 10 14:51:40 zeus kernel: ata3: timeout waiting for ADMA LEGACY, stat=0x400
> Oct 10 14:51:40 zeus kernel: ata3.00: exception Emask 0x0 SAct 0x1
> SErr 0x400000 action 0x2 frozen
> Oct 10 14:51:40 zeus kernel: ata3.00: cmd
> 61/08:00:bf:4b:38/00:00:3a:00:00/40 tag 0 cdb 0x0 data 4096 out
> Oct 10 14:51:40 zeus kernel: res
> 40/00:f2:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
> Oct 10 14:51:41 zeus kernel: ata3: soft resetting port
> Oct 10 14:51:41 zeus kernel: ata3: SATA link up 3.0 Gbps (SStatus 123
> SControl 300)
> Oct 10 14:51:41 zeus kernel: ata3.00: configured for UDMA/133
> Oct 10 14:51:41 zeus kernel: ata3: EH complete
> Oct 10 14:51:41 zeus kernel: sd 2:0:0:0: [sdb] 976773168 512-byte
> hardware sectors (500108 MB)
> Oct 10 14:51:41 zeus kernel: sd 2:0:0:0: [sdb] Write Protect is off
> Oct 10 14:51:41 zeus kernel: sd 2:0:0:0: [sdb] Write cache: enabled,
> read cache: enabled, doesn't support DPO or FUA
> Oct 10 14:52:19 zeus kernel: device eth0 left promiscuous mode
> Oct 10 14:52:41 zeus kernel: ata3: EH in ADMA mode, notifier 0x0
> notifier_error 0x0 gen_ctl 0x1501000 status 0x400 next cpb count 0x0
> next cpb idx 0x0
> Oct 10 14:52:41 zeus kernel: ata3: CPB 0: ctl_flags 0x1f, resp_flags 0x2
> Oct 10 14:52:41 zeus kernel: ata3: timeout waiting for ADMA IDLE, stat=0x400
> Oct 10 14:52:41 zeus kernel: ata3: timeout waiting for ADMA LEGACY, stat=0x400
> Oct 10 14:52:41 zeus kernel: ata3.00: exception Emask 0x0 SAct 0x1
> SErr 0x400000 action 0x2 frozen
> Oct 10 14:52:41 zeus kernel: ata3.00: cmd
> 61/08:00:bf:4b:38/00:00:3a:00:00/40 tag 0 cdb 0x0 data 4096 out
> Oct 10 14:52:41 zeus kernel: res
> 40/00:f2:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
> Oct 10 14:52:41 zeus kernel: ata3: soft resetting port
> Oct 10 14:52:42 zeus kernel: ata3: SATA link up 3.0 Gbps (SStatus 123
> SControl 300)
> Oct 10 14:52:42 zeus kernel: ata3.00: configured for UDMA/133
> Oct 10 14:52:42 zeus kernel: ata3: EH complete
> Oct 10 14:52:42 zeus kernel: sd 2:0:0:0: [sdb] 976773168 512-byte
> hardware sectors (500108 MB)
> Oct 10 14:52:42 zeus kernel: sd 2:0:0:0: [sdb] Write Protect is off
> Oct 10 14:52:42 zeus kernel: sd 2:0:0:0: [sdb] Write cache: enabled,
> read cache: enabled, doesn't support DPO or FUA
>
> These errors have been happening on various .22 kernels, and this
> message is from the hot-off-the-press .23 kernel. This message is
> followed by a hard freeze.
>
> I'm in the process of figuring out why netconsole isn't quite working,
> so hopefully I can provide more information soon. The server is
> currently frozen, when I get home I can perhaps provide more
> information? lspci?
>
> Looks like another rebuild of the array when I get home.
>
>
> Thanks,
> Greg

2007-10-13 05:50:26

by Steen Eugen Poulsen

Subject: Re: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen

Andrew Morton wrote:
> On Wed, 10 Oct 2007 10:28:45 +0200 (CEST)
> [email protected] wrote:
>
>> I get this on brand new hardware, 2xHitachi Deathstar 320gb SATA2
>> (sata_via driver)

Sep 28 04:32:40 locker ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0
action 0x2 frozen
Sep 28 04:32:40 locker ata1.00: cmd b0/d2:f1:00:4f:c2/00:00:00:00:00/00
tag 0 cdb 0x0 data 123392 in
Sep 28 04:32:40 locker res 50/00:f1:00:4f:c2/00:00:00:00:00/00 Emask
0x202 (HSM violation)
Sep 28 04:32:41 locker current size: 625140335 sectors
Sep 28 04:32:41 locker native size: 625142448 sectors
Sep 28 04:32:41 locker current size: 625140335 sectors
Sep 28 04:32:41 locker native size: 625142448 sectors

Another machine:

Sep 28 03:47:55 dragonslair ata1.00: exception Emask 0x0 SAct 0x0 SErr
0x0 action 0x2 frozen
Sep 28 03:47:55 dragonslair ata1.00: cmd
b0/db:f8:00:4f:c2/00:00:00:00:00/00 tag 0 cdb 0x0 data 126976 in
Sep 28 03:47:55 dragonslair res 50/00:f8:00:4f:c2/00:00:00:00:00/00
Emask 0x202 (HSM violation)
Sep 28 03:47:55 dragonslair ata1: soft resetting port
Sep 28 03:47:55 dragonslair ata1.00: configured for UDMA/133
Sep 28 03:47:55 dragonslair ata1: EH complete
Sep 28 03:47:55 dragonslair sd 0:0:0:0: [sda] Write cache: enabled, read
cache: enabled, doesn't support DPO or FUA
Sep 28 03:47:55 dragonslair sd 0:0:0:0: [sda] 156250000 512-byte
hardware sectors (80000 MB)
Sep 28 03:47:55 dragonslair sd 0:0:0:0: [sda] Write Protect is off
Sep 28 03:47:55 dragonslair sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
Sep 28 03:47:55 dragonslair sd 0:0:0:0: [sda] Write cache: enabled, read
cache: enabled, doesn't support DPO or FUA

And yet another:

Sep 28 04:33:52 liferaft kernel: ata1.00: exception Emask 0x0 SAct 0x0
SErr 0x0 action 0x2 frozen
Sep 28 04:33:55 liferaft kernel: ata1.00: cmd
b0/d2:f1:00:4f:c2/00:00:00:00:00/00 tag 0 cdb 0x0 data 123392 in
Sep 28 04:33:55 liferaft kernel: res
50/00:f1:00:4f:c2/00:00:00:00:00/00 Emask 0x202 (HSM violation)
Sep 28 04:33:55 liferaft kernel: ata1: soft resetting port
Sep 28 04:33:55 liferaft kernel: ata1: SATA link up 3.0 Gbps (SStatus
123 SControl 300)
Sep 28 04:33:55 liferaft kernel: ata1.00: configured for UDMA/133
Sep 28 04:33:55 liferaft kernel: ata1: EH complete



Here are a few more cases, taken from semi-random places in my logs to
show the variety of data; maybe some of it can help.

Weirdness 1: I have 3 machines that decided to spew this garbage within
the same second? (smartd was running, and this is around the hour that
smartd would run, but it was only on this one day, Sep 28, that it was
this bad.)

Note 1: Bad/failing hardware creates this type of error.

Note 2: The hardware didn't freeze for me, and I believe the freeze is
due to swap breaking because of the errors.

Note 3: dragonslair's hard disk actually crashed. The kernel didn't die,
it just remounted read-only. After a reboot the disk was missing; after
more reboots the machine started with all disks running again, and it
has been stable since the 28th of Sep. (knock on wood)

Note 4: I've changed hardware and kernels in an uncontrolled manner, so
I was waiting for another occurrence of these errors, at which point I
would be able to write down the kernel config.

I'm not sure, but I believe the key factors here are SMP and 2.6.22:
older kernels don't seem to trigger this, and non-SMP seems to avoid it
on 2.6.22. However, I can't trigger the error on demand, so there is no
way of knowing whether that conclusion can be trusted.

Dragonslair:
P4 x2 3.0 GHz
Chips Intel 865GV & ICH5
32bit SMP kernel (2.6.22)
2 SATA disks WDC WD800JD-75MSA3
(I'm guessing this one has a physically bad disk, since it's the only
machine where the disk has physically failed and the only one with a
worrying SMART error:
1 Raw_Read_Error_Rate 0x000f 200 200 051 Pre-fail Always - 5)
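
The SMART line quoted above is one row of the attribute table printed by
`smartctl -A`. As a rough sketch, assuming the usual column order of that
table (ID, name, flag, value, worst, threshold, type, updated, when_failed,
raw value), it can be split into named fields like this; the class and
function names are invented for the example.

```python
from typing import NamedTuple


class SmartAttr(NamedTuple):
    attr_id: int
    name: str
    flag: int
    value: int        # normalized current value
    worst: int        # worst normalized value seen so far
    thresh: int       # failure threshold for the normalized value
    attr_type: str    # "Pre-fail" or "Old_age" (a class, not a verdict)
    updated: str
    when_failed: str  # "-" means the attribute has never failed
    raw: str          # vendor-specific raw value


def parse_attr(line):
    """Split one smartctl attribute row into a SmartAttr tuple."""
    p = line.split(None, 9)
    if len(p) != 10:
        raise ValueError("unexpected attribute row: %r" % line)
    return SmartAttr(int(p[0]), p[1], int(p[2], 16), int(p[3]),
                     int(p[4]), int(p[5]), p[6], p[7], p[8], p[9].strip())


attr = parse_attr("1 Raw_Read_Error_Rate 0x000f 200 200 051 Pre-fail Always - 5")
print(attr.name, "value", attr.value, "thresh", attr.thresh, "raw", attr.raw)
print("below threshold:", attr.value <= attr.thresh)
```

Note that the normalized value (200) is well above the threshold (51) and
the when_failed column is "-", so by the drive's own criteria this
attribute has not failed; the raw value of 5 is vendor-specific and may or
may not be significant for this drive.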

Locker:
AMD 64x2
Chips Nvidia 570
32bit SMP kernel (2.6.22)
6 SATA disks 2xWD3200YS-01PGB0 4xWD3200AAKS-00TMA0

Liferaft:
AMD 64x2
Chips Nvidia 590
32bit SMP kernel (vserver 2.6.22 based)
1 SATA disk WD2500KS-00MJB0



2007-10-23 09:56:19

by Tejun Heo

Subject: Re: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen

Hello,

Steen Eugen Poulsen wrote:
> Sep 28 04:32:40 locker ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0
> action 0x2 frozen
> Sep 28 04:32:40 locker ata1.00: cmd b0/d2:f1:00:4f:c2/00:00:00:00:00/00
> tag 0 cdb 0x0 data 123392 in
> Sep 28 04:32:40 locker res 50/00:f1:00:4f:c2/00:00:00:00:00/00 Emask
> 0x202 (HSM violation)
[--snip--]
> Another machine:
>
> Sep 28 03:47:55 dragonslair ata1.00: exception Emask 0x0 SAct 0x0 SErr
> 0x0 action 0x2 frozen
> Sep 28 03:47:55 dragonslair ata1.00: cmd
> b0/db:f8:00:4f:c2/00:00:00:00:00/00 tag 0 cdb 0x0 data 126976 in
>
> Sep 28 03:47:55 dragonslair res 50/00:f8:00:4f:c2/00:00:00:00:00/00
> Emask 0x202 (HSM violation)
[--snip--]
> Sep 28 04:33:52 liferaft kernel: ata1.00: exception Emask 0x0 SAct 0x0
> SErr 0x0 action 0x2 frozen
> Sep 28 04:33:55 liferaft kernel: ata1.00: cmd
> b0/d2:f1:00:4f:c2/00:00:00:00:00/00 tag 0 cdb 0x0 data 123392 in
> Sep 28 04:33:55 liferaft kernel: res
> 50/00:f1:00:4f:c2/00:00:00:00:00/00 Emask 0x202 (HSM violation)
> Sep 28 04:33:55 liferaft kernel: ata1: soft resetting port
> Sep 28 04:33:55 liferaft kernel: ata1: SATA link up 3.0 Gbps (SStatus
> 123 SControl 300)
> Sep 28 04:33:55 liferaft kernel: ata1.00: configured for UDMA/133
> Sep 28 04:33:55 liferaft kernel: ata1: EH complete

All these are caused by smartd. Updating should fix the problem.
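
For readers who want to check which command a given `cmd` line refers to,
here is a minimal Python sketch that decodes the taskfile string. It
assumes the fields are printed in the order
CMD/FEATURE:NSECT:LBAL:LBAM:LBAH/HOB_FEATURE:HOB_NSECT:HOB_LBAL:HOB_LBAM:HOB_LBAH/DEVICE,
which is my understanding of libata's error report format. The small
lookup tables cover only the opcodes that appear in this thread, with
names taken from the ATA command set, and the function names are
hypothetical.

```python
import re

# Assumed field order (see lead-in): one command byte, five low-order
# bytes, five high-order (HOB) bytes, one device byte.
TF_RE = re.compile(
    r"\b([0-9a-f]{2})/((?:[0-9a-f]{2}:){4}[0-9a-f]{2})"
    r"/((?:[0-9a-f]{2}:){4}[0-9a-f]{2})/([0-9a-f]{2})\b",
    re.IGNORECASE,
)

# Only the opcodes seen in this thread.
COMMANDS = {0x25: "READ DMA EXT", 0x61: "WRITE FPDMA QUEUED", 0xB0: "SMART"}
# SMART subcommands are selected by the feature register.
SMART_FEATURES = {
    0xD2: "ENABLE/DISABLE ATTRIBUTE AUTOSAVE",
    0xDB: "ENABLE/DISABLE AUTOMATIC OFF-LINE",
}


def parse_taskfile(text):
    """Extract the taskfile registers from a libata 'cmd' or 'res' line."""
    m = TF_RE.search(text)
    if m is None:
        raise ValueError("no taskfile found in: %r" % text)
    feature, nsect, lbal, lbam, lbah = (int(x, 16) for x in m.group(2).split(":"))
    hob = [int(x, 16) for x in m.group(3).split(":")]
    hob_lbal, hob_lbam, hob_lbah = hob[2], hob[3], hob[4]
    return {
        "command": int(m.group(1), 16),
        "feature": feature,
        "nsect": nsect,
        # 48-bit LBA; only meaningful for LBA48 read/write commands.
        "lba48": (lbal | lbam << 8 | lbah << 16
                  | hob_lbal << 24 | hob_lbam << 32 | hob_lbah << 40),
        "device": int(m.group(4), 16),
    }


def describe(text):
    """Return a short name for the command in a libata 'cmd' line."""
    tf = parse_taskfile(text)
    name = COMMANDS.get(tf["command"], "unknown command 0x%02x" % tf["command"])
    if tf["command"] == 0xB0:
        name += " / " + SMART_FEATURES.get(
            tf["feature"], "feature 0x%02x" % tf["feature"])
    return name


for line in (
    "cmd b0/d2:f1:00:4f:c2/00:00:00:00:00/00 tag 0 cdb 0x0 data 123392 in",
    "cmd b0/db:f8:00:4f:c2/00:00:00:00:00/00 tag 0 cdb 0x0 data 126976 in",
    "cmd 25/00:00:3f:76:30/00:04:00:00:00/e0 tag 0 cdb 0x0 data 524288 in",
):
    print(describe(line))
```

All three HSM violation reports decode to opcode 0xb0 (SMART), which is
consistent with smartd being the source, whereas the timeout in the first
message of the thread was on an ordinary read (0x25).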

> Note 2: The hardware didn't freeze for me, and I believe the freeze is
> due to swap breaking because of the errors.

The above HSM violations should be harmless apart from those messages.
libata resets the device and should just carry on.

> Note 3: dragonslair's harddisk actually crashed, kernel didn't die, it
> just remounted read only. Reboot and the disk was missing, more reboot
> and the machine started with all disks running again, been stable since
> the 28th Sep. (knock on wood)

libata EH can't really recover from actual hardware failures, but some
drives come back if you hot unplug and then replug them.

--
tejun

2007-10-26 01:40:54

by Tejun Heo

Subject: Re: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen

[please don't drop cc. restored]

Steen Eugen Poulsen wrote:
> Tejun Heo wrote:
>> All these are caused by smartd. Updating should fix the problem.
>
> Okay, but there is no newer smartd than what I'm using. (5.37)

Bruce? Original thread can be read from...

http://thread.gmane.org/gmane.linux.kernel/588972

--
tejun

2007-10-26 04:03:08

by Jim Paris

Subject: Re: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen

Tejun Heo wrote:
> [please don't drop cc. restored]
>
> Steen Eugen Poulsen wrote:
> >Tejun Heo wrote:
> >>All these are caused by smartd. Updating should fix the problem.
> >
> >Okay, but there is no newer smartd than what I'm using. (5.37)
>
> Bruce? Original thread can be read from...
>
> http://thread.gmane.org/gmane.linux.kernel/588972

The fixes were added in smartmontools CVS, but there hasn't been a
release since then.

-jim

2007-11-06 10:05:32

by Bruce Allen

Subject: Re: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen

>>>> All these are caused by smartd. Updating should fix the problem.
>>>
>>> Okay, but there is no newer smartd than what I'm using. (5.37)
>>
>> Bruce? Original thread can be read from...
>>
>> http://thread.gmane.org/gmane.linux.kernel/588972
>
> The fixes were added in smartmontools CVS, but there hasn't been a
> release since then.

I think we'll do a new smartmontools release fairly soon.

Cheers,
Bruce