2013-03-28 02:01:43

by Ken Moffat

[permalink] [raw]
Subject: smartd broken in 3.9.0-rc4

Hi,

just tested my first 3.9 kernel today. During boot, smartd (from
smartmontools-6.0) fails to start. Works fine in 3.8.4.

In 3.8.4 I get messages like this :

Mar 27 22:02:02 ac4tv smartd[3981]: smartd 6.0 2012-10-10 r3643
[x86_64-linux-3.8.4] (local build)
Mar 27 22:02:02 ac4tv smartd[3981]: Copyright (C) 2002-12, Bruce
Allen, Christian Franke, http://www.smartmontools.org
Mar 27 22:02:02 ac4tv smartd[3981]: Opened configuration file
/etc/smartd.conf
Mar 27 22:02:02 ac4tv smartd[3981]: Configuration file
/etc/smartd.conf parsed.
Mar 27 22:02:02 ac4tv smartd[3981]: Device: /dev/sda, opened
Mar 27 22:02:02 ac4tv smartd[3981]: Device: /dev/sda, WDC
WD5000AAKX-001CA0, S/N:WD-WMAYU6818967, WWN:5-0014ee-0ad9caed4,
FW:15.01H15, 500 GB
Mar 27 22:02:02 ac4tv smartd[3981]: Device: /dev/sda, found in
smartd database: Western Digital Caviar Blue Serial ATA
Mar 27 22:02:02 ac4tv smartd[3981]: Device: /dev/sda, enabled SMART
Automatic Offline Testing.
Mar 27 22:02:02 ac4tv smartd[3981]: Device: /dev/sda, is SMART
capable. Adding to "monitor" list.
Mar 27 22:02:02 ac4tv smartd[3981]: Monitoring 1 ATA and 0 SCSI
devices

but in 3.9.0-rc4 all I get is
Mar 28 00:39:32 ac4tv smartd[2487]: Copyright (C) 2002-12, Bruce
Allen, Christian Franke, http://www.smartmontools.org
Mar 28 00:39:32 ac4tv smartd[2487]: Opened configuration file
/etc/smartd.conf
Mar 28 00:39:32 ac4tv smartd[2487]: Configuration file
/etc/smartd.conf parsed.
Mar 28 00:39:32 ac4tv smartd[2487]: Device: /dev/sda, opened
Mar 28 00:39:32 ac4tv smartd[2487]: Device: /dev/sda, WDC
WD5000AAKX-001CA0, S/N:WD-WMAYU6818967, WWN:5-0014ee-0ad9caed4,
FW:15.01H15, 500 GB
Mar 28 00:39:32 ac4tv smartd[2487]: Device: /dev/sda, found in
smartd database: Western Digital Caviar Blue Serial ATA
Mar 28 00:39:32 ac4tv smartd[2487]: Device: /dev/sda, could not
enable SMART capability
Mar 28 00:39:32 ac4tv smartd[2487]: Unable to register ATA device
/dev/sda at line 1 of file /etc/smartd.conf
Mar 28 00:39:32 ac4tv smartd[2487]: Unable to register device
/dev/sda (no Directive -d removable). Exiting.

Using strace, in the failing version I get
2643 ioctl(3, HDIO_DRIVE_CMD, 0x7fff53288a60) = -1 EIO (Input/output error)

instead of the normal
3981 ioctl(3, HDIO_DRIVE_CMD, 0x7fff04437340) = 0
3981 ioctl(3, HDIO_DRIVE_TASK, 0x7fff04437350) = 0
3981 ioctl(3, HDIO_DRIVE_CMD, 0x7fff04437330) = 0
3981 ioctl(3, HDIO_DRIVE_CMD, 0x7fff04437330) = 0
3981 ioctl(3, HDIO_DRIVE_TASK, 0x7fff04437340) = 0

Looks like I'll be bisecting.

ken
--
das eine Mal als Trag?die, das andere Mal als Farce


2013-03-28 20:59:22

by Ken Moffat

[permalink] [raw]
Subject: Re: smartd broken in 3.9.0-rc4 : bisected

On Thu, Mar 28, 2013 at 01:01:48AM +0000, Ken Moffat wrote:

Adding Cc:s, further details at the end.
> Hi,
>
> just tested my first 3.9 kernel today. During boot, smartd (from
> smartmontools-6.0) fails to start. Works fine in 3.8.4.
>
> In 3.8.4 I get messages like this :
>
> Mar 27 22:02:02 ac4tv smartd[3981]: smartd 6.0 2012-10-10 r3643
> [x86_64-linux-3.8.4] (local build)
> Mar 27 22:02:02 ac4tv smartd[3981]: Copyright (C) 2002-12, Bruce
> Allen, Christian Franke, http://www.smartmontools.org
> Mar 27 22:02:02 ac4tv smartd[3981]: Opened configuration file
> /etc/smartd.conf
> Mar 27 22:02:02 ac4tv smartd[3981]: Configuration file
> /etc/smartd.conf parsed.
> Mar 27 22:02:02 ac4tv smartd[3981]: Device: /dev/sda, opened
> Mar 27 22:02:02 ac4tv smartd[3981]: Device: /dev/sda, WDC
> WD5000AAKX-001CA0, S/N:WD-WMAYU6818967, WWN:5-0014ee-0ad9caed4,
> FW:15.01H15, 500 GB
> Mar 27 22:02:02 ac4tv smartd[3981]: Device: /dev/sda, found in
> smartd database: Western Digital Caviar Blue Serial ATA
> Mar 27 22:02:02 ac4tv smartd[3981]: Device: /dev/sda, enabled SMART
> Automatic Offline Testing.
> Mar 27 22:02:02 ac4tv smartd[3981]: Device: /dev/sda, is SMART
> capable. Adding to "monitor" list.
> Mar 27 22:02:02 ac4tv smartd[3981]: Monitoring 1 ATA and 0 SCSI
> devices
>
> but in 3.9.0-rc4 all I get is
> Mar 28 00:39:32 ac4tv smartd[2487]: Copyright (C) 2002-12, Bruce
> Allen, Christian Franke, http://www.smartmontools.org
> Mar 28 00:39:32 ac4tv smartd[2487]: Opened configuration file
> /etc/smartd.conf
> Mar 28 00:39:32 ac4tv smartd[2487]: Configuration file
> /etc/smartd.conf parsed.
> Mar 28 00:39:32 ac4tv smartd[2487]: Device: /dev/sda, opened
> Mar 28 00:39:32 ac4tv smartd[2487]: Device: /dev/sda, WDC
> WD5000AAKX-001CA0, S/N:WD-WMAYU6818967, WWN:5-0014ee-0ad9caed4,
> FW:15.01H15, 500 GB
> Mar 28 00:39:32 ac4tv smartd[2487]: Device: /dev/sda, found in
> smartd database: Western Digital Caviar Blue Serial ATA
> Mar 28 00:39:32 ac4tv smartd[2487]: Device: /dev/sda, could not
> enable SMART capability
> Mar 28 00:39:32 ac4tv smartd[2487]: Unable to register ATA device
> /dev/sda at line 1 of file /etc/smartd.conf
> Mar 28 00:39:32 ac4tv smartd[2487]: Unable to register device
> /dev/sda (no Directive -d removable). Exiting.
>
> Using strace, in the failing version I get
> 2643 ioctl(3, HDIO_DRIVE_CMD, 0x7fff53288a60) = -1 EIO (Input/output error)
>
> instead of the normal
> 3981 ioctl(3, HDIO_DRIVE_CMD, 0x7fff04437340) = 0
> 3981 ioctl(3, HDIO_DRIVE_TASK, 0x7fff04437350) = 0
> 3981 ioctl(3, HDIO_DRIVE_CMD, 0x7fff04437330) = 0
> 3981 ioctl(3, HDIO_DRIVE_CMD, 0x7fff04437330) = 0
> 3981 ioctl(3, HDIO_DRIVE_TASK, 0x7fff04437340) = 0
>
> Looks like I'll be bisecting.
>
> ken
Bisection blames :

commit 84a9a8cd9d0aa93c17e5815ab8a9cc4c0a765c63
Author: Gwendal Grignou <[email protected]>
Date: Fri Jan 18 10:56:43 2013 -0800

[libata] Set proper SK when CK_COND is set.

When the user application sends a ATA_12 or ATA_16 PASSTHROUGH
scsi command, put the task file register in the sense data with the
proper Sense Key. Instead of NO SENSE, set RECOVERED, as
specified in [SAT2]12.2.5 Table 92.

That reverts cleanly from 3.9.0-rc4, and with it reverted smartd
works again. Obviously that does nothing to fix whatever problem
this was supposed to fix. I can test any follow-up patches if
needed.

ken
--
das eine Mal als Trag?die, das andere Mal als Farce

2013-03-29 00:56:24

by Gwendal Grignou

[permalink] [raw]
Subject: Re: smartd broken in 3.9.0-rc4 : bisected

My mistake.
In ata_cmd_ioctl(), we clean the results and remove CHECK_CONDITION
when necessary. I did not update that code for the new Sense Key:
if (cmd_result & SAM_STAT_CHECK_CONDITION) {
struct scsi_sense_hdr sshdr;
scsi_normalize_sense(sensebuf, SCSI_SENSE_BUFFERSIZE,
&sshdr);
if (sshdr.sense_key == 0 &&
sshdr.asc == 0 && sshdr.ascq == 0)
cmd_result &= ~SAM_STAT_CHECK_CONDITION;
}

A patch in progress.
Gwendal.

On Thu, Mar 28, 2013 at 1:59 PM, Ken Moffat <[email protected]> wrote:
> On Thu, Mar 28, 2013 at 01:01:48AM +0000, Ken Moffat wrote:
>
> Adding Cc:s, further details at the end.
>> Hi,
>>
>> just tested my first 3.9 kernel today. During boot, smartd (from
>> smartmontools-6.0) fails to start. Works fine in 3.8.4.
>>
>> In 3.8.4 I get messages like this :
>>
>> Mar 27 22:02:02 ac4tv smartd[3981]: smartd 6.0 2012-10-10 r3643
>> [x86_64-linux-3.8.4] (local build)
>> Mar 27 22:02:02 ac4tv smartd[3981]: Copyright (C) 2002-12, Bruce
>> Allen, Christian Franke, http://www.smartmontools.org
>> Mar 27 22:02:02 ac4tv smartd[3981]: Opened configuration file
>> /etc/smartd.conf
>> Mar 27 22:02:02 ac4tv smartd[3981]: Configuration file
>> /etc/smartd.conf parsed.
>> Mar 27 22:02:02 ac4tv smartd[3981]: Device: /dev/sda, opened
>> Mar 27 22:02:02 ac4tv smartd[3981]: Device: /dev/sda, WDC
>> WD5000AAKX-001CA0, S/N:WD-WMAYU6818967, WWN:5-0014ee-0ad9caed4,
>> FW:15.01H15, 500 GB
>> Mar 27 22:02:02 ac4tv smartd[3981]: Device: /dev/sda, found in
>> smartd database: Western Digital Caviar Blue Serial ATA
>> Mar 27 22:02:02 ac4tv smartd[3981]: Device: /dev/sda, enabled SMART
>> Automatic Offline Testing.
>> Mar 27 22:02:02 ac4tv smartd[3981]: Device: /dev/sda, is SMART
>> capable. Adding to "monitor" list.
>> Mar 27 22:02:02 ac4tv smartd[3981]: Monitoring 1 ATA and 0 SCSI
>> devices
>>
>> but in 3.9.0-rc4 all I get is
>> Mar 28 00:39:32 ac4tv smartd[2487]: Copyright (C) 2002-12, Bruce
>> Allen, Christian Franke, http://www.smartmontools.org
>> Mar 28 00:39:32 ac4tv smartd[2487]: Opened configuration file
>> /etc/smartd.conf
>> Mar 28 00:39:32 ac4tv smartd[2487]: Configuration file
>> /etc/smartd.conf parsed.
>> Mar 28 00:39:32 ac4tv smartd[2487]: Device: /dev/sda, opened
>> Mar 28 00:39:32 ac4tv smartd[2487]: Device: /dev/sda, WDC
>> WD5000AAKX-001CA0, S/N:WD-WMAYU6818967, WWN:5-0014ee-0ad9caed4,
>> FW:15.01H15, 500 GB
>> Mar 28 00:39:32 ac4tv smartd[2487]: Device: /dev/sda, found in
>> smartd database: Western Digital Caviar Blue Serial ATA
>> Mar 28 00:39:32 ac4tv smartd[2487]: Device: /dev/sda, could not
>> enable SMART capability
>> Mar 28 00:39:32 ac4tv smartd[2487]: Unable to register ATA device
>> /dev/sda at line 1 of file /etc/smartd.conf
>> Mar 28 00:39:32 ac4tv smartd[2487]: Unable to register device
>> /dev/sda (no Directive -d removable). Exiting.
>>
>> Using strace, in the failing version I get
>> 2643 ioctl(3, HDIO_DRIVE_CMD, 0x7fff53288a60) = -1 EIO (Input/output error)
>>
>> instead of the normal
>> 3981 ioctl(3, HDIO_DRIVE_CMD, 0x7fff04437340) = 0
>> 3981 ioctl(3, HDIO_DRIVE_TASK, 0x7fff04437350) = 0
>> 3981 ioctl(3, HDIO_DRIVE_CMD, 0x7fff04437330) = 0
>> 3981 ioctl(3, HDIO_DRIVE_CMD, 0x7fff04437330) = 0
>> 3981 ioctl(3, HDIO_DRIVE_TASK, 0x7fff04437340) = 0
>>
>> Looks like I'll be bisecting.
>>
>> ken
> Bisection blames :
>
> commit 84a9a8cd9d0aa93c17e5815ab8a9cc4c0a765c63
> Author: Gwendal Grignou <[email protected]>
> Date: Fri Jan 18 10:56:43 2013 -0800
>
> [libata] Set proper SK when CK_COND is set.
>
> When the user application sends a ATA_12 or ATA_16 PASSTHROUGH
> scsi command, put the task file register in the sense data with the
> proper Sense Key. Instead of NO SENSE, set RECOVERED, as
> specified in [SAT2]12.2.5 Table 92.
>
> That reverts cleanly from 3.9.0-rc4, and with it reverted smartd
> works again. Obviously that does nothing to fix whatever problem
> this was supposed to fix. I can test any follow-up patches if
> needed.
>
> ken
> --
> das eine Mal als Trag?die, das andere Mal als Farce

2013-03-29 05:57:26

by Gwendal Grignou

[permalink] [raw]
Subject: [PATCH] [libata] Fix HDIO_DRIVE_CMD ioctl sense data check

commit 84a9a8cd9d0aa93c17e5815ab8a9cc4c0a765c63 changed the sense key
used for returning task registers, but HDIO_DRIVE_CMD ioctl was
not changed accordingly.

Tested: check that SMART ENABLE sent using HDIO_DRIVE_CMD returns 0
instead of EIO.

Signed-off-by: Gwendal Grignou <[email protected]>
---
drivers/ata/libata-scsi.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 318b413..e05bf4c 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -532,8 +532,8 @@ int ata_cmd_ioctl(struct scsi_device *scsidev, void __user *arg)
struct scsi_sense_hdr sshdr;
scsi_normalize_sense(sensebuf, SCSI_SENSE_BUFFERSIZE,
&sshdr);
- if (sshdr.sense_key == 0 &&
- sshdr.asc == 0 && sshdr.ascq == 0)
+ if (sshdr.sense_key == RECOVERED_ERROR &&
+ sshdr.asc == 0 && sshdr.ascq == 0x1D)
cmd_result &= ~SAM_STAT_CHECK_CONDITION;
}

--
1.8.1.3

2013-03-29 18:31:10

by Ken Moffat

[permalink] [raw]
Subject: Re: [PATCH] [libata] Fix HDIO_DRIVE_CMD ioctl sense data check

On Thu, Mar 28, 2013 at 10:56:49PM -0700, Gwendal Grignou wrote:

Hmm, not sure. Smartd started and was happy to monitor the disk,
but I got two new messages between 'found in smartd database' and
'is SMART capable. Adding to "monitor" list' -

Mar 29 17:26:42 ac4tv smartd[2481]: Device: /dev/sda, not capable of
SMART Health Status check
Mar 29 17:26:42 ac4tv smartd[2481]: Device: /dev/sda, enable SMART
Automatic Offline Testing failed.

I've seen the first (intermittently) when a drive was starting to
fail, and apparently there was a taskfile issue in the days of 2.6.22
which also caused it to appear. I don't think I've seen the second
of these before.

After going back and forth between the kernel where I reverted your
original patch, and regular rc4 plus this new patch the output from
running smartctl as root all seems to be consistent (including
'Passed' for the health check).

I'm now running with the patch again, and I've started a manual
'long' test (which will take 85 minutes, the default 'offline' is
about 150 minutes).

> commit 84a9a8cd9d0aa93c17e5815ab8a9cc4c0a765c63 changed the sense key
> used for returning task registers, but HDIO_DRIVE_CMD ioctl was
> not changed accordingly.
>
> Tested: check that SMART ENABLE sent using HDIO_DRIVE_CMD returns 0
> instead of EIO.
>
> Signed-off-by: Gwendal Grignou <[email protected]>
> ---
> drivers/ata/libata-scsi.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
> index 318b413..e05bf4c 100644
> --- a/drivers/ata/libata-scsi.c
> +++ b/drivers/ata/libata-scsi.c
> @@ -532,8 +532,8 @@ int ata_cmd_ioctl(struct scsi_device *scsidev, void __user *arg)
> struct scsi_sense_hdr sshdr;
> scsi_normalize_sense(sensebuf, SCSI_SENSE_BUFFERSIZE,
> &sshdr);
> - if (sshdr.sense_key == 0 &&
> - sshdr.asc == 0 && sshdr.ascq == 0)
> + if (sshdr.sense_key == RECOVERED_ERROR &&
> + sshdr.asc == 0 && sshdr.ascq == 0x1D)
> cmd_result &= ~SAM_STAT_CHECK_CONDITION;
> }
>
> --
> 1.8.1.3

--
das eine Mal als Trag?die, das andere Mal als Farce

2013-03-29 20:34:49

by Ken Moffat

[permalink] [raw]
Subject: Re: [PATCH] [libata] Fix HDIO_DRIVE_CMD ioctl sense data check

On Fri, Mar 29, 2013 at 06:31:03PM +0000, Ken Moffat wrote:
> On Thu, Mar 28, 2013 at 10:56:49PM -0700, Gwendal Grignou wrote:
>
> Hmm, not sure. Smartd started and was happy to monitor the disk,
> but I got two new messages between 'found in smartd database' and
> 'is SMART capable. Adding to "monitor" list' -
>
> Mar 29 17:26:42 ac4tv smartd[2481]: Device: /dev/sda, not capable of
> SMART Health Status check
> Mar 29 17:26:42 ac4tv smartd[2481]: Device: /dev/sda, enable SMART
> Automatic Offline Testing failed.
>
> I've seen the first (intermittently) when a drive was starting to
> fail, and apparently there was a taskfile issue in the days of 2.6.22
> which also caused it to appear. I don't think I've seen the second
> of these before.
>
> After going back and forth between the kernel where I reverted your
> original patch, and regular rc4 plus this new patch the output from
> running smartctl as root all seems to be consistent (including
> 'Passed' for the health check).
>
> I'm now running with the patch again, and I've started a manual
> 'long' test (which will take 85 minutes, the default 'offline' is
> about 150 minutes).
>

Looks like the problem is confined to smartd, smartctl is
different and working fine. The new messages only come from smartd.cpp :
(sorry, long lines to avoid word wrapping)

// capability check: SMART status
if (cfg.smartcheck && ataSmartStatus2(atadev) == -1) {
PrintOut(LOG_INFO,"Device: %s, not capable of SMART Health Status check\n",name);
cfg.smartcheck = false;
}

and

// enable/disable automatic on-line testing
if (cfg.autoofflinetest) {
// is this an enable or disable request?
const char *what=(cfg.autoofflinetest==1)?"disable":"enable";
if (!smart_val_ok)
PrintOut(LOG_INFO,"Device: %s, could not %s SMART Automatic Offline Testing.\n",name, what);
else {
// if command appears unsupported, issue a warning...
if (!isSupportAutomaticTimer(&state.smartval))
PrintOut(LOG_INFO,"Device: %s, SMART Automatic Offline Testing unsupported...\n",name);
// ... but then try anyway
if ((cfg.autoofflinetest==1)?ataDisableAutoOffline(atadev):ataEnableAutoOffline(atadev))
PrintOut(LOG_INFO,"Device: %s, %s SMART Automatic Offline Testing failed.\n", name, what);
else
PrintOut(LOG_INFO,"Device: %s, %sd SMART Automatic Offline Testing.\n", name, what);
}
}

I've no idea about the details, but it looks to me as if smartd is
still getting different values returned to it. The capability check
normally was ok (silent), the automatic testing normally showed as
'enabled'd.

ken
--
das eine Mal als Trag?die, das andere Mal als Farce

2013-03-29 23:30:26

by Ken Moffat

[permalink] [raw]
Subject: Re: [PATCH] [libata] Fix HDIO_DRIVE_CMD ioctl sense data check

On Fri, Mar 29, 2013 at 08:34:43PM +0000, Ken Moffat wrote:
> On Fri, Mar 29, 2013 at 06:31:03PM +0000, Ken Moffat wrote:

> I've no idea about the details, but it looks to me as if smartd is
> still getting different values returned to it. The capability check
> normally was ok (silent), the automatic testing normally showed as
> 'enabled'd.
>
After that I found today's posts in Krzystof Mazur's earlier
thread. When I first read that I thought it was for a specific
drive, and didn't remember it when I had time to try rc4.

The v2 of the patch (with two hunks) fixes it for me too.

ken
--
das eine Mal als Trag?die, das andere Mal als Farce

2013-04-03 23:49:52

by Jeff Garzik

[permalink] [raw]
Subject: Re: [PATCH] [libata] Fix HDIO_DRIVE_CMD ioctl sense data check

On 03/29/2013 01:56 AM, Gwendal Grignou wrote:
> commit 84a9a8cd9d0aa93c17e5815ab8a9cc4c0a765c63 changed the sense key
> used for returning task registers, but HDIO_DRIVE_CMD ioctl was
> not changed accordingly.
>
> Tested: check that SMART ENABLE sent using HDIO_DRIVE_CMD returns 0
> instead of EIO.
>
> Signed-off-by: Gwendal Grignou <[email protected]>
> ---
> drivers/ata/libata-scsi.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)

applied the version from Krzysztof Mazur, which covered both cases


2013-04-04 18:25:43

by Gwendal Grignou

[permalink] [raw]
Subject: Re: [PATCH] [libata] Fix HDIO_DRIVE_CMD ioctl sense data check

Thank you.
Gwendal.

On Wed, Apr 3, 2013 at 4:49 PM, Jeff Garzik <[email protected]> wrote:
> On 03/29/2013 01:56 AM, Gwendal Grignou wrote:
>>
>> commit 84a9a8cd9d0aa93c17e5815ab8a9cc4c0a765c63 changed the sense key
>> used for returning task registers, but HDIO_DRIVE_CMD ioctl was
>> not changed accordingly.
>>
>> Tested: check that SMART ENABLE sent using HDIO_DRIVE_CMD returns 0
>> instead of EIO.
>>
>> Signed-off-by: Gwendal Grignou <[email protected]>
>> ---
>> drivers/ata/libata-scsi.c | 4 ++--
>> 1 file changed, 2 insertions(+), 2 deletions(-)
>
>
> applied the version from Krzysztof Mazur, which covered both cases
>
>
>

2013-04-08 22:07:43

by Ken Moffat

[permalink] [raw]
Subject: Re: [PATCH] [libata] Fix HDIO_DRIVE_CMD ioctl sense data check

On Wed, Apr 03, 2013 at 07:49:48PM -0400, Jeff Garzik wrote:
>
> applied the version from Krzysztof Mazur, which covered both cases
>
>
According to linus's changelog for -rc6, this doesn't seem to have
been included ?

ken
--
das eine Mal als Trag?die, das andere Mal als Farce