LinuxLists.cc - sata_nv + ADMA + Samsung disk problem

Subject: Re: sata_nv + ADMA + Samsung disk problem

Gabor Gombas wrote:
> Hi,
>
> Since I have upgraded to 2.6.22.1 from 2.6.20 I have problems with
> Samsung disks. Sometimes the disks stall for about half a minute and
> then I have these messages in the logs:
>
> Aug 6 20:10:11 twister kernel: ata7: EH in ADMA mode, notifier 0x0 notifier_error 0x0 gen_ctl 0x1501000 status 0x400 next cpb count 0x0 next cpb idx 0x0
> Aug 6 20:10:12 twister kernel: ata7: CPB 0: ctl_flags 0x9, resp_flags 0x0
> Aug 6 20:10:12 twister kernel: ata7: timeout waiting for ADMA IDLE, stat=0x400
> Aug 6 20:10:12 twister kernel: ata7: timeout waiting for ADMA LEGACY, stat=0x400
> Aug 6 20:10:12 twister kernel: ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> Aug 6 20:10:12 twister kernel: ata7.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 0x0 data 0
> Aug 6 20:10:12 twister kernel: res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
> Aug 6 20:10:12 twister kernel: ata7: soft resetting port
> Aug 6 20:10:12 twister kernel: ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> Aug 6 20:10:12 twister kernel: ata7.00: configured for UDMA/133
> Aug 6 20:10:12 twister kernel: ata7: EH complete
> Aug 6 20:10:12 twister kernel: sd 6:0:0:0: [sdc] 488397168 512-byte hardware sectors (250059 MB)
> Aug 6 20:10:12 twister kernel: sd 6:0:0:0: [sdc] Write Protect is off
> Aug 6 20:10:12 twister kernel: sd 6:0:0:0: [sdc] Mode Sense: 00 3a 00 00
> Aug 6 20:10:12 twister kernel: sd 6:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> Aug 6 20:20:25 twister kernel: ata8: EH in ADMA mode, notifier 0x0 notifier_error 0x0 gen_ctl 0x1501000 status 0x400 next cpb count 0x0 next cpb idx 0x0
> Aug 6 20:20:25 twister kernel: ata8: CPB 0: ctl_flags 0x9, resp_flags 0x0
> Aug 6 20:20:25 twister kernel: ata8: timeout waiting for ADMA IDLE, stat=0x400
> Aug 6 20:20:25 twister kernel: ata8: timeout waiting for ADMA LEGACY, stat=0x400
> Aug 6 20:20:25 twister kernel: ata8.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> Aug 6 20:20:25 twister kernel: ata8.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 0x0 data 0
> Aug 6 20:20:25 twister kernel: res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
> Aug 6 20:20:25 twister kernel: ata8: soft resetting port
> Aug 6 20:20:25 twister kernel: ata8: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> Aug 6 20:20:25 twister kernel: ata8.00: configured for UDMA/133
> Aug 6 20:20:25 twister kernel: ata8: EH complete
> Aug 6 20:20:25 twister kernel: sd 7:0:0:0: [sdd] 488397168 512-byte hardware sectors (250059 MB)
> Aug 6 20:20:25 twister kernel: sd 7:0:0:0: [sdd] Write Protect is off
> Aug 6 20:20:25 twister kernel: sd 7:0:0:0: [sdd] Mode Sense: 00 3a 00 00
> Aug 6 20:20:25 twister kernel: sd 7:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
>
> I also have two Maxtor disks on the same controller but they are working
> correctly in ADMA mode. I now disabled ADMA mode and that seems to help.

Hmmm... That's timeout on cache flush, indicative of failing disk.
Please post the result of 'smartctl -a /dev/sdc'.
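
For readers following along, the cache-flush reading comes from the first byte of the "cmd" line in the log: ea is the ATA opcode for FLUSH CACHE EXT. A minimal sketch of that decoding, assuming the log format shown above; the opcode table is deliberately tiny and `decode_cmd` is a hypothetical helper, not part of libata or smartmontools:

```python
import re

# Small, deliberately incomplete opcode table (values from the ATA command set).
ATA_OPCODES = {
    0xE7: "FLUSH CACHE",
    0xEA: "FLUSH CACHE EXT",
    0xB0: "SMART",
    0xEC: "IDENTIFY DEVICE",
}

# The opcode is the first hex byte after "cmd " in a libata exception report.
CMD_RE = re.compile(r"\bcmd ([0-9a-f]{2})/")


def decode_cmd(log_line):
    """Return (opcode, name) for a libata 'cmd xx/...' line, or None."""
    match = CMD_RE.search(log_line)
    if match is None:
        return None
    opcode = int(match.group(1), 16)
    return opcode, ATA_OPCODES.get(opcode, "unknown")


line = ("ata7.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 "
        "tag 0 cdb 0x0 data 0")
print(decode_cmd(line))  # (234, 'FLUSH CACHE EXT')
```

Lines without a "cmd" field, such as "ata7: EH complete", return None.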

--
tejun

2007-08-14 12:02:45

Subject: Re: sata_nv + ADMA + Samsung disk problem

On Tue, Aug 14, 2007 at 06:30:28PM +0900, Tejun Heo wrote:

> Hmmm... That's timeout on cache flush, indicative of failing disk.
> Please post the result of 'smartctl -a /dev/sdc'.

Will do when I get home. Note however that this only occurs in ADMA
mode. It never occurred with 2.6.20 and it has never occurred with 2.6.22
since I disabled ADMA.

Gabor

--
---------------------------------------------------------
MTA SZTAKI Computer and Automation Research Institute
Hungarian Academy of Sciences
---------------------------------------------------------

2007-08-16 16:06:50

Subject: Re: sata_nv + ADMA + Samsung disk problem

Hi,

On Tue, Aug 14, 2007 at 06:30:28PM +0900, Tejun Heo wrote:

> Hmmm... That's timeout on cache flush, indicative of failing disk.
> Please post the result of 'smartctl -a /dev/sdc'.

Ok, so something is fishy in 2.6.22 wrt. SMART.

First, booting back to 2.6.20.5 I confirmed that SMART works without any
problems for all 4 disks, so all the following is a regression in
2.6.22.

I have 4 disks: two Maxtors (hdparm -I output below): sda/sdb, and two
Samsung (hdparm -I output is in my previous mail): sdc/sdd.

<==================== cut ====================>
/dev/sda:

ATA device, with non-removable media
Model Number: Maxtor 6B250S0
Serial Number: XXXXXXXX
Firmware Revision: BANC1G10
Standards:
Used: ATA/ATAPI-7 T13 1532D revision 0
Supported: 7 6 5 4
Configuration:
Logical max current
cylinders 16383 16383
heads 16 16
sectors/track 63 63
--
CHS current addressable sectors: 16514064
LBA user addressable sectors: 268435455
LBA48 user addressable sectors: 490234752
device size with M = 1024*1024: 239372 MBytes
device size with M = 1000*1000: 251000 MBytes (251 GB)
Capabilities:
LBA, IORDY(can be disabled)
Queue depth: 32
Standby timer values: spec'd by Standard, no device specific minimum
R/W multiple sector transfer: Max = 16 Current = 16
Advanced power management level: unknown setting (0x0000)
Recommended acoustic management value: 192, current value: 128
DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6
Cycle time: min=120ns recommended=120ns
PIO: pio0 pio1 pio2 pio3 pio4
Cycle time: no flow control=120ns IORDY flow control=120ns
Commands/features:
Enabled Supported:
* SMART feature set
Security Mode feature set
* Power Management feature set
* Write cache
* Look-ahead
* Host Protected Area feature set
* WRITE_VERIFY command
* WRITE_BUFFER command
* READ_BUFFER command
* NOP cmd
* DOWNLOAD_MICROCODE
Advanced Power Management feature set
SET_MAX security extension
* Automatic Acoustic Management feature set
* 48-bit Address feature set
* Device Configuration Overlay feature set
* Mandatory FLUSH_CACHE
* FLUSH_CACHE_EXT
* SMART error logging
* SMART self-test
Media Card Pass-Through
* General Purpose Logging feature set
* WRITE_{DMA|MULTIPLE}_FUA_EXT
* URG for READ_STREAM[_DMA]_EXT
* URG for WRITE_STREAM[_DMA]_EXT
* SATA-I signaling speed (1.5Gb/s)
* Native Command Queueing (NCQ)
Software settings preservation
* SMART Command Transport (SCT) feature set
* SCT Data Tables (AC5)
Security:
Master password revision code = 65534
supported
not enabled
not locked
frozen
not expired: security count
not supported: enhanced erase
Checksum: correct
<==================== cut ====================>
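
As a sanity check on the sizes above, the reported capacities follow directly from the sector counts, assuming 512-byte sectors (which the kernel log states for the Samsung disks and which is assumed here for the Maxtor). The `capacity` helper is only an illustration:

```python
SECTOR_BYTES = 512  # assumed logical sector size


def capacity(sectors):
    """Return (bytes, MB with M = 1000*1000, MBytes with M = 1024*1024)."""
    size = sectors * SECTOR_BYTES
    return size, size // (1000 * 1000), size // (1024 * 1024)


# LBA48 sector count from the hdparm -I output for the Maxtor.
maxtor = capacity(490234752)
# Sector count from the "sd ... [sdc]" kernel log line for the Samsung.
samsung = capacity(488397168)

print(maxtor)   # (251000193024, 251000, 239372)
print(samsung[:2])  # (250059350016, 250059)
```

The byte counts match the "User Capacity" values that smartctl prints for the two models.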

Under 2.6.22.1, when I try to do "smartctl -d ata -s on /dev/sd[ab]" or
"smartctl -d ata -a /dev/sd[ab]", I get the following error:

<==================== cut ====================>
smartctl version 5.37 [x86_64-unknown-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family: Maxtor DiamondMax 10 family (ATA/133 and SATA/150)
Device Model: Maxtor 6B250S0
Serial Number: XXXXXXXX
Firmware Version: BANC1G10
User Capacity: 251,000,193,024 bytes
Device is: In smartctl database [for details use: -P show]
ATA Version is: 7
ATA Standard is: ATA/ATAPI-7 T13 1532D revision 0
Local Time is: Wed Aug 15 12:01:38 2007 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

Error SMART Status command failed
Please get assistance from http://smartmontools.sourceforge.net/
Register values returned from SMART Status command are:
CMD=0x50
FR =0x00
NS =0x00
SC =0x00
CL =0xc2
CH =0x00
SEL=0x00
A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.
<==================== cut ====================>

To repeat, this does not happen under 2.6.20.5. Using "-T permissive" works:

<==================== cut ====================>
smartctl version 5.37 [x86_64-unknown-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family: Maxtor DiamondMax 10 family (ATA/133 and SATA/150)
Device Model: Maxtor 6B250S0
Serial Number: XXXXXXXX
Firmware Version: BANC1G10
User Capacity: 251,000,193,024 bytes
Device is: In smartctl database [for details use: -P show]
ATA Version is: 7
ATA Standard is: ATA/ATAPI-7 T13 1532D revision 0
Local Time is: Wed Aug 15 12:01:47 2007 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

Error SMART Status command failed
Please get assistance from http://smartmontools.sourceforge.net/
Register values returned from SMART Status command are:
CMD=0x50
FR =0x00
NS =0x00
SC =0x00
CL =0xc2
CH =0x00
SEL=0x00
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x05) Offline data collection activity
was aborted by an interrupting command from host.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (1922) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 99) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
3 Spin_Up_Time 0x0027 185 183 063 Pre-fail Always - 14093
4 Start_Stop_Count 0x0032 253 253 000 Old_age Always - 1050
5 Reallocated_Sector_Ct 0x0033 253 253 063 Pre-fail Always - 0
6 Read_Channel_Margin 0x0001 253 253 100 Pre-fail Offline - 0
7 Seek_Error_Rate 0x000a 253 252 000 Old_age Always - 0
8 Seek_Time_Performance 0x0027 247 239 187 Pre-fail Always - 33104
9 Power_On_Minutes 0x0032 246 246 000 Old_age Always - 440h+23m
10 Spin_Retry_Count 0x002b 253 252 157 Pre-fail Always - 0
11 Calibration_Retry_Count 0x002b 253 252 223 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 251 251 000 Old_age Always - 1057
192 Power-Off_Retract_Count 0x0032 253 253 000 Old_age Always - 0
193 Load_Cycle_Count 0x0032 253 253 000 Old_age Always - 0
194 Temperature_Celsius 0x0032 035 253 000 Old_age Always - 38
195 Hardware_ECC_Recovered 0x000a 253 252 000 Old_age Always - 8387
196 Reallocated_Event_Count 0x0008 253 253 000 Old_age Offline - 0
197 Current_Pending_Sector 0x0008 253 253 000 Old_age Offline - 0
198 Offline_Uncorrectable 0x0008 253 253 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0008 199 199 000 Old_age Offline - 0
200 Multi_Zone_Error_Rate 0x000a 253 252 000 Old_age Always - 0
201 Soft_Read_Error_Rate 0x000a 253 252 000 Old_age Always - 0
202 TA_Increase_Count 0x000a 253 252 000 Old_age Always - 0
203 Run_Out_Cancel 0x000b 253 252 180 Pre-fail Always - 0
204 Shock_Count_Write_Opern 0x000a 253 252 000 Old_age Always - 0
205 Shock_Rate_Write_Opern 0x000a 253 252 000 Old_age Always - 0
207 Spin_High_Current 0x002a 253 252 000 Old_age Always - 0
208 Spin_Buzz 0x002a 253 252 000 Old_age Always - 0
209 Offline_Seek_Performnce 0x0024 234 234 000 Old_age Offline - 230
210 Unknown_Attribute 0x0032 253 252 000 Old_age Always - 0
211 Unknown_Attribute 0x0032 253 252 000 Old_age Always - 0
212 Unknown_Attribute 0x0032 253 252 000 Old_age Always - 0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 2618 -
# 2 Short offline Completed without error 00% 2600 -
# 3 Short offline Completed without error 00% 2574 -
# 4 Short offline Self-test routine in progress 80% 2574 -
# 5 Short offline Completed without error 00% 2565 -
# 6 Short offline Completed without error 00% 2556 -
# 7 Short offline Completed without error 00% 2550 -
# 8 Extended offline Aborted by host 40% 2550 -
# 9 Short offline Completed without error 00% 2525 -
#10 Short offline Completed without error 00% 2516 -
#11 Short offline Completed without error 00% 2516 -
#12 Short offline Completed without error 00% 2504 -
#13 Short offline Completed without error 00% 2499 -
#14 Short offline Completed without error 00% 2494 -
#15 Short offline Completed without error 00% 2489 -
#16 Short offline Completed without error 00% 2484 -
#17 Short offline Completed without error 00% 2469 -
#18 Short offline Completed without error 00% 2450 -
#19 Short offline Completed without error 00% 2442 -
#20 Short offline Completed without error 00% 2435 -
#21 Short offline Completed without error 00% 2422 -

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
<==================== cut ====================>

Doing "smartctl -d ata -s on /dev/sd[cd]" or "smartctl -d ata -a
/dev/sd[cd]" works _most of the time_, but sometimes I get errors like
this:

<==================== cut ====================>
smartctl version 5.37 [x86_64-unknown-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family: SAMSUNG SpinPoint P120 series
Device Model: SAMSUNG SP2504C
Serial Number: XXXXXXXXXXXXXX
Firmware Version: VT100-33
User Capacity: 250,059,350,016 bytes
Device is: In smartctl database [for details use: -P show]
ATA Version is: 7
ATA Standard is: ATA/ATAPI-7 T13 1532D revision 4a
Local Time is: Wed Aug 15 11:37:35 2007 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

Error SMART Status command failed
Please get assistance from http://smartmontools.sourceforge.net/
Register values returned from SMART Status command are:
CMD=0x50
FR =0x00
NS =0x00
SC =0x49
CL =0x41
CH =0x16
SEL=0x00
A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.
<==================== cut ====================>

This error is infrequent and not deterministic; I got it 2 times in
about 15-20 tries (I think the second time the SC/CL/CH values were
different but I did not manage to capture that).
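
For context on why smartctl rejects these replies: ATA defines only two valid results for SMART RETURN STATUS, signalled in the registers smartctl prints as CL and CH. 0x4f/0xc2 means no threshold was exceeded and 0xf4/0x2c means a threshold was exceeded. A small sketch classifying the values captured above; `classify_smart_status` is a hypothetical helper, not smartctl's own code:

```python
def classify_smart_status(cl, ch):
    """Classify the CL/CH register pair returned by SMART RETURN STATUS."""
    if (cl, ch) == (0x4F, 0xC2):
        return "passed"    # no threshold exceeded
    if (cl, ch) == (0xF4, 0x2C):
        return "failing"   # a threshold was exceeded
    return "invalid"       # neither signature: the reply itself is bogus


print(classify_smart_status(0xC2, 0x00))  # Maxtor reply: invalid
print(classify_smart_status(0x41, 0x16))  # Samsung reply: invalid
```

Both captured replies fall into the third case, so the failure is in what gets reported back for the command rather than in a bad health verdict from the disk.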

When the smart command succeeds, I get the following output:

<==================== cut ====================>
smartctl version 5.37 [x86_64-unknown-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family: SAMSUNG SpinPoint P120 series
Device Model: SAMSUNG SP2504C
Serial Number: XXXXXXXXXXXXXX
Firmware Version: VT100-33
User Capacity: 250,059,350,016 bytes
Device is: In smartctl database [for details use: -P show]
ATA Version is: 7
ATA Standard is: ATA/ATAPI-7 T13 1532D revision 4a
Local Time is: Wed Aug 15 11:47:00 2007 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (4867) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 81) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 100 100 051 Pre-fail Always - 0
3 Spin_Up_Time 0x0007 100 100 025 Pre-fail Always - 6080
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 817
5 Reallocated_Sector_Ct 0x0033 253 253 010 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 253 253 051 Pre-fail Always - 0
8 Seek_Time_Performance 0x0025 253 253 015 Pre-fail Offline - 11557
9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 2342
10 Spin_Retry_Count 0x0033 253 253 051 Pre-fail Always - 0
11 Calibration_Retry_Count 0x0012 253 002 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 783
190 Temperature_Celsius 0x0022 148 124 000 Old_age Always - 30
194 Temperature_Celsius 0x0022 148 124 000 Old_age Always - 30
195 Hardware_ECC_Recovered 0x001a 100 100 000 Old_age Always - 2820219
196 Reallocated_Event_Count 0x0032 253 253 000 Old_age Always - 0
197 Current_Pending_Sector 0x0012 253 253 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 253 253 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x000a 100 100 000 Old_age Always - 0
201 Soft_Read_Error_Rate 0x000a 253 100 000 Old_age Always - 0
202 TA_Increase_Count 0x0032 253 253 000 Old_age Always - 0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 2336 -
# 2 Short offline Completed without error 00% 2318 -
# 3 Short offline Completed without error 00% 2292 -
# 4 Short offline Interrupted (host reset) 80% 2291 -
# 5 Short offline Completed without error 00% 2283 -
# 6 Short offline Completed without error 00% 2273 -
# 7 Short offline Completed without error 00% 2268 -
# 8 Short offline Completed without error 00% 2241 -
# 9 Short offline Completed without error 00% 2233 -
#10 Short offline Completed without error 00% 2233 -
#11 Short offline Completed without error 00% 2221 -
#12 Short offline Completed without error 00% 2216 -
#13 Short offline Completed without error 00% 2210 -
#14 Short offline Completed without error 00% 2206 -
#15 Short offline Completed without error 00% 2200 -
#16 Short offline Completed without error 00% 2185 -
#17 Extended offline Completed without error 00% 2181 -
#18 Short offline Completed without error 00% 2166 -
#19 Short offline Completed without error 00% 2158 -
#20 Extended offline Completed without error 00% 2155 -
#21 Short offline Completed without error 00% 2151 -

SMART Selective Self-Test Log Data Structure Revision Number (0) should be 1
SMART Selective self-test log data structure revision number 0
Warning: ATA Specification requires selective self-test log data structure revision number = 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
<==================== cut ====================>

The output for the other disk is nearly identical, the differences in
the reported values are insignificant.
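
To compare attribute tables between disks without eyeballing them, something like the following could be used. It is a rough sketch that assumes the ten-column layout smartctl 5.37 prints above; `parse_attributes` and `at_or_below_threshold` are made-up helper names, and the sample rows are copied from the Samsung output:

```python
def parse_attributes(text):
    """Parse 'smartctl -a' attribute rows into dicts; other lines are skipped."""
    rows = []
    for line in text.splitlines():
        fields = line.split(None, 9)
        # Attribute rows have ten columns and start with a numeric ID.
        if len(fields) != 10 or not fields[0].isdigit():
            continue
        rows.append({
            "id": int(fields[0]),
            "name": fields[1],
            "value": int(fields[3]),
            "worst": int(fields[4]),
            "thresh": int(fields[5]),
            "raw": fields[9],
        })
    return rows


def at_or_below_threshold(rows):
    """Names of attributes whose normalized value has reached a nonzero threshold."""
    return [r["name"] for r in rows
            if r["thresh"] > 0 and r["value"] <= r["thresh"]]


sample = """\
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
5 Reallocated_Sector_Ct 0x0033 253 253 010 Pre-fail Always - 0
197 Current_Pending_Sector 0x0012 253 253 000 Old_age Always - 0
195 Hardware_ECC_Recovered 0x001a 100 100 000 Old_age Always - 2820219
"""
rows = parse_attributes(sample)
print(at_or_below_threshold(rows))  # []
```

A threshold of 000 is treated as "never fails", which is why those rows are ignored by the check.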

SMART shows the same behavior whether or not ADMA is enabled. However,
when booting with ADMA enabled I got a SATA exception for sdc within about
10 minutes in single-user mode, so with basically no I/O load. I never got a
SATA exception with ADMA turned off, even during heavy I/O load.
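
When comparing runs it helps to confirm which mode the driver is actually in. The sketch below assumes that sata_nv exposes its "adma" module parameter read-only under /sys/module/sata_nv/parameters/adma, with boolean parameters read back as Y or N. The thread does not say how ADMA was switched off, so the path and the `adma_setting` helper are illustrative assumptions:

```python
from pathlib import Path

# Assumed sysfs location of the sata_nv "adma" module parameter.
ADMA_PARAM = Path("/sys/module/sata_nv/parameters/adma")


def adma_setting(path=ADMA_PARAM):
    """Return the parameter value as a string, or None if it cannot be read."""
    try:
        return path.read_text().strip()
    except OSError:
        # Module not loaded, built without the parameter, or not a Linux host.
        return None


if __name__ == "__main__":
    print("sata_nv adma:", adma_setting())
```

A return value of None only means the file could not be read. It says nothing about whether ADMA is in use.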

Gabor

--
---------------------------------------------------------
MTA SZTAKI Computer and Automation Research Institute
Hungarian Academy of Sciences
---------------------------------------------------------

2007-08-16 18:50:54

by Jim Paris

Subject: Re: sata_nv + ADMA + Samsung disk problem

Gabor Gombas wrote:
> On Tue, Aug 14, 2007 at 06:30:28PM +0900, Tejun Heo wrote:
> > Hmmm... That's timeout on cache flush, indicative of failing disk.
> > Please post the result of 'smartctl -a /dev/sdc'.
>
> Ok, so something is fishy in 2.6.22 wrt. SMART.

See http://lkml.org/lkml/2007/7/8/198

-jim

2008-01-01 16:44:29

Subject: Re: sata_nv + ADMA + Samsung disk problem

Hi,

Just FYI I've tried to enable ADMA again (now running 2.6.24-rc6) but
the bug is still present:

Jan 1 16:11:21 host kernel: ata7: EH in ADMA mode, notifier 0x0 notifier_error 0x0 gen_ctl 0x1501000 status 0x400 next cpb count 0x0 next cpb idx 0x0
Jan 1 16:11:21 host kernel: ata7: CPB 0: ctl_flags 0x9, resp_flags 0x0
Jan 1 16:11:21 host kernel: ata7: timeout waiting for ADMA IDLE, stat=0x400
Jan 1 16:11:21 host kernel: ata7: timeout waiting for ADMA LEGACY, stat=0x400
Jan 1 16:11:21 host kernel: ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Jan 1 16:11:21 host kernel: ata7.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
Jan 1 16:11:21 host kernel: res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Jan 1 16:11:21 host kernel: ata7.00: status: { DRDY }
Jan 1 16:11:21 host kernel: ata7: soft resetting link
Jan 1 16:11:22 host kernel: ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jan 1 16:11:22 host kernel: ata7.00: configured for UDMA/133
Jan 1 16:11:22 host kernel: ata7: EH complete
Jan 1 16:11:22 host kernel: sd 6:0:0:0: [sdc] 488397168 512-byte hardware sectors (250059 MB)
Jan 1 16:11:22 host kernel: sd 6:0:0:0: [sdc] Write Protect is off
Jan 1 16:11:22 host kernel: sd 6:0:0:0: [sdc] Mode Sense: 00 3a 00 00
Jan 1 16:11:22 host kernel: sd 6:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

This time, though, the above happened more than 3 hours after boot,
which is much better than with 2.6.22. For the past ~4 months ADMA was
disabled and I never had any libata-related error messages.

SMART does not show anything interesting:

smartctl version 5.37 [x86_64-unknown-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family: SAMSUNG SpinPoint P120 series
Device Model: SAMSUNG SP2504C
Serial Number: XXXXXXXXXXXXXX
Firmware Version: VT100-33
User Capacity: 250,059,350,016 bytes
Device is: In smartctl database [for details use: -P show]
ATA Version is: 7
ATA Standard is: ATA/ATAPI-7 T13 1532D revision 4a
Local Time is: Tue Jan 1 17:38:21 2008 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (4867) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 81) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 100 100 051 Pre-fail Always - 0
3 Spin_Up_Time 0x0007 100 100 025 Pre-fail Always - 6144
4 Start_Stop_Count 0x0032 099 099 000 Old_age Always - 1218
5 Reallocated_Sector_Ct 0x0033 253 253 010 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 253 253 051 Pre-fail Always - 0
8 Seek_Time_Performance 0x0025 253 253 015 Pre-fail Offline - 11363
9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 3325
10 Spin_Retry_Count 0x0033 253 253 051 Pre-fail Always - 0
11 Calibration_Retry_Count 0x0012 253 002 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 985
190 Temperature_Celsius 0x0022 160 124 000 Old_age Always - 26
194 Temperature_Celsius 0x0022 160 124 000 Old_age Always - 26
195 Hardware_ECC_Recovered 0x001a 100 100 000 Old_age Always - 4346055
196 Reallocated_Event_Count 0x0032 253 253 000 Old_age Always - 0
197 Current_Pending_Sector 0x0012 253 253 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 253 253 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x000a 100 100 000 Old_age Always - 0
201 Soft_Read_Error_Rate 0x000a 253 100 000 Old_age Always - 0
202 TA_Increase_Count 0x0032 253 253 000 Old_age Always - 0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 3315 -
# 2 Short offline Completed without error 00% 3298 -
# 3 Short offline Completed without error 00% 3268 -
# 4 Extended offline Completed without error 00% 3262 -
# 5 Short offline Completed without error 00% 3243 -
# 6 Short offline Completed without error 00% 3236 -
# 7 Short offline Completed without error 00% 3228 -
# 8 Extended offline Completed without error 00% 3218 -
# 9 Short offline Completed without error 00% 3186 -
#10 Short offline Completed without error 00% 3178 -
#11 Short offline Completed without error 00% 3175 -
#12 Short offline Completed without error 00% 3165 -
#13 Short offline Completed without error 00% 3155 -
#14 Short offline Completed without error 00% 3146 -
#15 Short offline Completed without error 00% 3139 -
#16 Extended offline Completed without error 00% 3135 -
#17 Short offline Completed without error 00% 3125 -
#18 Extended offline Completed without error 00% 3082 -
#19 Short offline Completed without error 00% 3068 -
#20 Short offline Completed without error 00% 3066 -
#21 Short offline Completed without error 00% 3052 -

SMART Selective Self-Test Log Data Structure Revision Number (0) should be 1
SMART Selective self-test log data structure revision number 0
Warning: ATA Specification requires selective self-test log data structure revision number = 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Gabor

--
---------------------------------------------------------
MTA SZTAKI Computer and Automation Research Institute
Hungarian Academy of Sciences
---------------------------------------------------------

2008-01-02 03:25:50

Subject: Re: sata_nv + ADMA + Samsung disk problem

[cc'ing Robert Hancock and NVidia people]

Whole thread can be read from the following URL.

http://thread.gmane.org/gmane.linux.ide/21710

In a nutshell, with ADMA enabled, FLUSH_EXT occasionally times out. I
first suspected faulty disk (reallocation failure on flush) but SMART
reports nothing suspicious and w/ ADMA disabled, the drive works just fine.

On a side note, on 2.6.22.1, SMART fails from time to time but the
problem went away on 2.6.24-rc6. This was apparently fixed during that
period. I guess we can ignore this for now.

Thanks.

Gabor Gombas wrote:
> Hi,
>
> Just FYI I've tried to enable ADMA again (now running 2.6.24-rc6) but
> the bug is still present:
>
> Jan 1 16:11:21 host kernel: ata7: EH in ADMA mode, notifier 0x0 notifier_error 0x0 gen_ctl 0x1501000 status 0x400 next cpb count 0x0 next cpb idx 0x0
> Jan 1 16:11:21 host kernel: ata7: CPB 0: ctl_flags 0x9, resp_flags 0x0
> Jan 1 16:11:21 host kernel: ata7: timeout waiting for ADMA IDLE, stat=0x400
> Jan 1 16:11:21 host kernel: ata7: timeout waiting for ADMA LEGACY, stat=0x400
> Jan 1 16:11:21 host kernel: ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> Jan 1 16:11:21 host kernel: ata7.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
> Jan 1 16:11:21 host kernel: res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
> Jan 1 16:11:21 host kernel: ata7.00: status: { DRDY }
> Jan 1 16:11:21 host kernel: ata7: soft resetting link
> Jan 1 16:11:22 host kernel: ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> Jan 1 16:11:22 host kernel: ata7.00: configured for UDMA/133
> Jan 1 16:11:22 host kernel: ata7: EH complete
> Jan 1 16:11:22 host kernel: sd 6:0:0:0: [sdc] 488397168 512-byte hardware sectors (250059 MB)
> Jan 1 16:11:22 host kernel: sd 6:0:0:0: [sdc] Write Protect is off
> Jan 1 16:11:22 host kernel: sd 6:0:0:0: [sdc] Mode Sense: 00 3a 00 00
> Jan 1 16:11:22 host kernel: sd 6:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
>
> Although this time the above happened more than 3 hours after boot
> which is much better than 2.6.22 was. In the past ~4 months ADMA was
> disabled and I never had any libata-related error messages.
>
> SMART does not show anything interesting:
>
> smartctl version 5.37 [x86_64-unknown-linux-gnu] Copyright (C) 2002-6 Bruce Allen
> Home page is http://smartmontools.sourceforge.net/
>
> === START OF INFORMATION SECTION ===
> Model Family: SAMSUNG SpinPoint P120 series
> Device Model: SAMSUNG SP2504C
> Serial Number: XXXXXXXXXXXXXX
> Firmware Version: VT100-33
> User Capacity: 250,059,350,016 bytes
> Device is: In smartctl database [for details use: -P show]
> ATA Version is: 7
> ATA Standard is: ATA/ATAPI-7 T13 1532D revision 4a
> Local Time is: Tue Jan 1 17:38:21 2008 CET
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled
>
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
>
> General SMART Values:
> Offline data collection status: (0x82) Offline data collection activity
> was completed without error.
> Auto Offline Data Collection: Enabled.
> Self-test execution status: ( 0) The previous self-test routine completed
> without error or no self-test has ever
> been run.
> Total time to complete Offline
> data collection: (4867) seconds.
> Offline data collection
> capabilities: (0x5b) SMART execute Offline immediate.
> Auto Offline data collection on/off support.
> Suspend Offline collection upon new
> command.
> Offline surface scan supported.
> Self-test supported.
> No Conveyance Self-test supported.
> Selective Self-test supported.
> SMART capabilities: (0x0003) Saves SMART data before entering
> power-saving mode.
> Supports SMART auto save timer.
> Error logging capability: (0x01) Error logging supported.
> General Purpose Logging supported.
> Short self-test routine
> recommended polling time: ( 1) minutes.
> Extended self-test routine
> recommended polling time: ( 81) minutes.
>
> SMART Attributes Data Structure revision number: 16
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>   1 Raw_Read_Error_Rate     0x000f   100   100   051    Pre-fail  Always       -       0
>   3 Spin_Up_Time            0x0007   100   100   025    Pre-fail  Always       -       6144
>   4 Start_Stop_Count        0x0032   099   099   000    Old_age   Always       -       1218
>   5 Reallocated_Sector_Ct   0x0033   253   253   010    Pre-fail  Always       -       0
>   7 Seek_Error_Rate         0x000f   253   253   051    Pre-fail  Always       -       0
>   8 Seek_Time_Performance   0x0025   253   253   015    Pre-fail  Offline      -       11363
>   9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       3325
>  10 Spin_Retry_Count        0x0033   253   253   051    Pre-fail  Always       -       0
>  11 Calibration_Retry_Count 0x0012   253   002   000    Old_age   Always       -       0
>  12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       985
> 190 Temperature_Celsius     0x0022   160   124   000    Old_age   Always       -       26
> 194 Temperature_Celsius     0x0022   160   124   000    Old_age   Always       -       26
> 195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       4346055
> 196 Reallocated_Event_Count 0x0032   253   253   000    Old_age   Always       -       0
> 197 Current_Pending_Sector  0x0012   253   253   000    Old_age   Always       -       0
> 198 Offline_Uncorrectable   0x0030   253   253   000    Old_age   Offline      -       0
> 199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
> 200 Multi_Zone_Error_Rate   0x000a   100   100   000    Old_age   Always       -       0
> 201 Soft_Read_Error_Rate    0x000a   253   100   000    Old_age   Always       -       0
> 202 TA_Increase_Count       0x0032   253   253   000    Old_age   Always       -       0
> SMART Error Log Version: 1
> No Errors Logged
>
> SMART Self-test log structure revision number 1
> Num  Test_Description    Status                   Remaining  LifeTime(hours)  LBA_of_first_error
> # 1  Short offline       Completed without error        00%       3315         -
> # 2  Short offline       Completed without error        00%       3298         -
> # 3  Short offline       Completed without error        00%       3268         -
> # 4  Extended offline    Completed without error        00%       3262         -
> # 5  Short offline       Completed without error        00%       3243         -
> # 6  Short offline       Completed without error        00%       3236         -
> # 7  Short offline       Completed without error        00%       3228         -
> # 8  Extended offline    Completed without error        00%       3218         -
> # 9  Short offline       Completed without error        00%       3186         -
> #10  Short offline       Completed without error        00%       3178         -
> #11  Short offline       Completed without error        00%       3175         -
> #12  Short offline       Completed without error        00%       3165         -
> #13  Short offline       Completed without error        00%       3155         -
> #14  Short offline       Completed without error        00%       3146         -
> #15  Short offline       Completed without error        00%       3139         -
> #16  Extended offline    Completed without error        00%       3135         -
> #17  Short offline       Completed without error        00%       3125         -
> #18  Extended offline    Completed without error        00%       3082         -
> #19  Short offline       Completed without error        00%       3068         -
> #20  Short offline       Completed without error        00%       3066         -
> #21  Short offline       Completed without error        00%       3052         -
>
> SMART Selective Self-Test Log Data Structure Revision Number (0) should be 1
> SMART Selective self-test log data structure revision number 0
> Warning: ATA Specification requires selective self-test log data structure revision number = 1
> SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
> 1 0 0 Not_testing
> 2 0 0 Not_testing
> 3 0 0 Not_testing
> 4 0 0 Not_testing
> 5 0 0 Not_testing
> Selective self-test flags (0x0):
> After scanning selected spans, do NOT read-scan remainder of disk.
> If Selective self-test is pending on power-up, resume after 0 minute delay.
>
> Gabor
>

--
tejun

2008-01-02 04:03:26

Subject: Re: sata_nv + ADMA + Samsung disk problem

Tejun Heo wrote:
> [cc'ing Robert Hancock and NVidia people]
>
> Whole thread can be read from the following URL.
>
> http://thread.gmane.org/gmane.linux.ide/21710
>
> In a nutshell, with ADMA enabled, FLUSH_EXT occasionally times out. I
> first suspected faulty disk (reallocation failure on flush) but SMART
> reports nothing suspicious and w/ ADMA disabled, the drive works just fine.
>
> On a side note, on 2.6.22.1, SMART fails from time to time but the
> problem went away on 2.6.24-rc6. This was apparently fixed during that
> period. I guess we can ignore this for now.
>
> Thanks.

This is kind of a longstanding problem which has been partially worked
around, but it seems not entirely. This is what I had diagnosed some
time ago:

"recently, some issues cropped up with command timeouts when a cache
flush command was immediately followed by an NCQ write. In this case,
sometimes when the NCQ write was issued, the status register changed
from 0x500 (Stopped and Idle) to 0x400 (Stopped) as it normally appears
to, however it seems like the controller would get hung in that state,
and we would time out with no notifiers set, the gen_ctl register not
indicating interrupt status, and the CPB response flags still 0 as we
left them, seemingly indicating the controller hasn't done anything with
it. Then, when the error handler kicks in we clear the GO bit to put it
back into register mode, but the Legacy flag in the status register
doesn't get set (or at least it takes longer than 1 microsecond).
Finally when we do an ADMA channel reset that seems to get it responding
again, until this happens the next time.

From some experimentation, I found that when we are issuing a NCQ
command when the last command was non-NCQ, or vice versa, if I added in
a delay of 20 microseconds between setting up the CPB and writing to the
append register, the problem appeared to go away. Problem is I don't
know if that's because it actually needs this delay, or because it
changes the timing so that it happens to work even though we're doing
something wrong, there's some event we're not waiting for, etc.

I've now verified that no switches between ADMA and register mode occur
near the time of these timeouts. Neither are we reading or writing any
of the ATA shadow registers while we're in ADMA mode."

It seems likely that this is what is happening here (a switch from an
NCQ command to a non-NCQ command, then the non-NCQ times out). It could
be in some cases the 20 microsecond delay is not enough. But it seems
bogus that we should need such an arbitrary delay in the first place.
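
The workaround described in the quoted diagnosis can be sketched roughly as follows. This is only an illustration under stated assumptions: `port_state` and `issue_delay_us` are hypothetical names, not the actual sata_nv code, and a real driver would busy-wait for the returned time just before writing the append register instead of returning a number.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-port bookkeeping; not the real sata_nv structure. */
struct port_state {
    bool last_issue_ncq;   /* was the previously issued command NCQ? */
};

/*
 * Return how many microseconds to wait between setting up the CPB and
 * writing the append register. The 20 us figure is the empirical value
 * from the thread; nothing here explains why it helps.
 */
static unsigned int issue_delay_us(struct port_state *pp, bool curr_ncq)
{
    assert(pp != NULL);
    if (curr_ncq == pp->last_issue_ncq)
        return 0;                  /* same command type: no extra delay */
    pp->last_issue_ncq = curr_ncq; /* remember the new command type */
    return 20;                     /* NCQ <-> non-NCQ switch: delay */
}
```

Only the switch itself pays the delay, so a long run of NCQ writes followed by a single flush costs two delays in total (one into the flush, one back out).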

The question I had for NVIDIA regarding this, which I never got answered,
was: is there any reason why we would need a delay when switching
between NCQ and non-NCQ commands on ADMA, and if not, is there any known
condition that could put the controller into this seemingly
locked-up state?

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [email protected]
Home Page: http://www.roberthancock.com/

2008-01-02 04:20:57

Subject: Re: sata_nv + ADMA + Samsung disk problem

Robert Hancock wrote:
> Tejun Heo wrote:
>> [cc'ing Robert Hancock and NVidia people]
>>
>> Whole thread can be read from the following URL.
>>
>> http://thread.gmane.org/gmane.linux.ide/21710
>>
>> In a nutshell, with ADMA enabled, FLUSH_EXT occasionally times out. I
>> first suspected faulty disk (reallocation failure on flush) but SMART
>> reports nothing suspicious and w/ ADMA disabled, the drive works just
>> fine.
>>
>> On a side note, on 2.6.22.1, SMART fails from time to time but the
>> problem went away on 2.6.24-rc6. This was apparently fixed during that
>> period. I guess we can ignore this for now.
>>
>> Thanks.
>
> This is kind of a longstanding problem which has been partially worked
> around, but it seems not entirely. This is what I had diagnosed some
> time ago:
>
> "recently, some issues cropped up with command timeouts when a cache
> flush command was immediately followed by an NCQ write. In this case,
> sometimes when the NCQ write was issued, the status register changed
> from 0x500 (Stopped and Idle) to 0x400 (Stopped) as it normally appears
> to, however it seems like the controller would get hung in that state,
> and we would time out with no notifiers set, the gen_ctl register not
> indicating interrupt status, and the CPB response flags still 0 as we
> left them, seemingly indicating the controller hasn't done anything with
> it. Then, when the error handler kicks in we clear the GO bit to put it
> back into register mode, but the Legacy flag in the status register
> doesn't get set (or at least it takes longer than 1 microsecond).
> Finally when we do an ADMA channel reset that seems to get it responding
> again, until this happens the next time.
>
> From some experimentation, I found that when we are issuing a NCQ
> command when the last command was non-NCQ, or vice versa, if I added in
> a delay of 20 microseconds between setting up the CPB and writing to the
> append register, the problem appeared to go away. Problem is I don't
> know if that's because it actually needs this delay, or because it
> changes the timing so that it happens to work even though we're doing
> something wrong, there's some event we're not waiting for, etc.
>
> I've now verified that no switches between ADMA and register mode occur
> near the time of these timeouts. Neither are we reading or writing any
> of the ATA shadow registers while we're in ADMA mode."
>
> It seems likely that this is what is happening here (a switch from an
> NCQ command to a non-NCQ command, then the non-NCQ times out). It could
> be in some cases the 20 microsecond delay is not enough. But it seems
> bogus that we should need such an arbitrary delay in the first place.
>
> The question I had for NVIDIA regarding this that I never got answered
> was, is there any reason why we would need a delay when switching
> between NCQ and non-NCQ commands on ADMA, and if not, is there any known
> cause that could cause the controller to get into this seemingly
> locked-up state?

Well, I guess I did sort of get an answer, but the only theory was that
the flush and the NCQ commands were being overlapped, which shouldn't be
possible (the libata core guarantees that, and if it didn't work it
would affect all controllers).

I'm kind of wondering if there's something funny going on with the
notifier register stuff, which is supposed to tell us what commands have
completed. We don't really use it at all (we had some problems with
missed completions, etc. when I tried using it, also it doesn't work if
ATAPI is enabled on the other port on the controller, apparently). I
know these controllers will do strange things like not signalling
interrupts for later events if you don't clear the notifiers in just the
right way (that being mostly determined by trial and error).

Or, maybe somehow the flush is getting issued before the controller is
really "ready" for it somehow (it's not finished cleaning up after
preceding NCQ command).

It's pretty hard for me to figure out which of the above might be the
case, especially without access to the detailed controller documentation.

2008-01-02 04:25:42

Subject: Re: sata_nv + ADMA + Samsung disk problem

Robert Hancock wrote:
>> This is kind of a longstanding problem which has been partially worked
>> around, but it seems not entirely. This is what I had diagnosed some
>> time ago:
>>
>> "recently, some issues cropped up with command timeouts when a cache
>> flush command was immediately followed by an NCQ write. In this case,
>> sometimes when the NCQ write was issued, the status register changed
>> from 0x500 (Stopped and Idle) to 0x400 (Stopped) as it normally
>> appears to, however it seems like the controller would get hung in
>> that state, and we would time out with no notifiers set, the gen_ctl
>> register not indicating interrupt status, and the CPB response flags
>> still 0 as we left them, seemingly indicating the controller hasn't
>> done anything with it. Then, when the error handler kicks in we clear
>> the GO bit to put it back into register mode, but the Legacy flag in
>> the status register doesn't get set (or at least it takes longer than
>> 1 microsecond). Finally when we do an ADMA channel reset that seems to
>> get it responding again, until this happens the next time.
>>
>> From some experimentation, I found that when we are issuing a NCQ
>> command when the last command was non-NCQ, or vice versa, if I added in
>> a delay of 20 microseconds between setting up the CPB and writing to the
>> append register, the problem appeared to go away. Problem is I don't
>> know if that's because it actually needs this delay, or because it
>> changes the timing so that it happens to work even though we're doing
>> something wrong, there's some event we're not waiting for, etc.
>>
>> I've now verified that no switches between ADMA and register mode
>> occur near the time of these timeouts. Neither are we reading or
>> writing any of the ATA shadow registers while we're in ADMA mode."
>>
>> It seems likely that this is what is happening here (a switch from an
>> NCQ command to a non-NCQ command, then the non-NCQ times out). It
>> could be in some cases the 20 microsecond delay is not enough. But it
>> seems bogus that we should need such an arbitrary delay in the first
>> place.
>>
>> The question I had for NVIDIA regarding this that I never got answered
>> was, is there any reason why we would need a delay when switching
>> between NCQ and non-NCQ commands on ADMA, and if not, is there any
>> known cause that could cause the controller to get into this seemingly
>> locked-up state?
>
> Well, I guess I did sort of get an answer, but the only theory was that
> the flush and the NCQ commands were being overlapped, which shouldn't be
> possible (the libata core guarantees that, and if it didn't work it
> would affect all controllers).
>
> I'm kind of wondering if there's something funny going on with the
> notifier register stuff, which is supposed to tell us what commands have
> completed. We don't really use it at all (we had some problems with
> missed completions, etc. when I tried using it, also it doesn't work if
> ATAPI is enabled on the other port on the controller, apparently). I
> know these controllers will do strange things like not signalling
> interrupts for later events if you don't clear the notifiers in just the
> right way (that being mostly determined by trial and error).
>
> Or, maybe somehow the flush is getting issued before the controller is
> really "ready" for it somehow (it's not finished cleaning up after
> preceding NCQ command).
>
> It's pretty hard for me to figure out which of the above might be the
> case, especially without access to the detailed controller documentation..

Thanks a lot for the detailed explanation. Nvidia ppl, any ideas?
FLUSH is used regularly. We really need to fix this.

--
tejun

2008-01-02 06:19:33

by Jeff Garzik

Subject: Re: sata_nv + ADMA + Samsung disk problem

Tejun Heo wrote:
> Thanks a lot for the detailed explanation. Nvidia ppl, any ideas?
> FLUSH is used regularly. We really need to fix this.

I reiterate my opinion :) ... We should remove ADMA support from
sata_nv. It's only in a few chips, it's not appearing in any new chips,
and nasty problems have lingered since ADMA support was introduced.

Definitely sounds like we should disable ADMA by default for 2.6.24-rc, too.

Jeff

2008-01-02 06:40:29

Subject: Re: sata_nv + ADMA + Samsung disk problem

Jeff Garzik wrote:
> Tejun Heo wrote:
>> Thanks a lot for the detailed explanation. Nvidia ppl, any ideas?
>> FLUSH is used regularly. We really need to fix this.
>
>
> I reiterate my opinion :) ... We should remove ADMA support from
> sata_nv. It's only in a few chips, it's not appearing in any new chips,
> and nasty problems have lingered since ADMA support was introduced.
>
> Definitely sounds like we should disable ADMA by default for 2.6.24-rc,
> too.

I wouldn't agree.. It's only in a few chips (CK804/MCP04), but those
chips are very common in desktop, workstation, even some server
machines. Given the huge number of these chips out there, problem
reports have been quite rare.

2008-01-02 06:56:06

Subject: Re: sata_nv + ADMA + Samsung disk problem

Robert Hancock wrote:
> Jeff Garzik wrote:
>> Tejun Heo wrote:
>>> Thanks a lot for the detailed explanation. Nvidia ppl, any ideas?
>>> FLUSH is used regularly. We really need to fix this.
>>
>>
>> I reiterate my opinion :) ... We should remove ADMA support from
>> sata_nv. It's only in a few chips, it's not appearing in any new
>> chips, and nasty problems have lingered since ADMA support was
>> introduced.
>>
>> Definitely sounds like we should disable ADMA by default for
>> 2.6.24-rc, too.
>
> I wouldn't agree.. It's only in a few chips (CK804/MCP04), but those
> chips are very common in desktop, workstation, even some server
> machines. Given the huge number of these chips out there, problem
> reports have been quite rare.

I agree with Jeff here. Maybe not remove it, but disable it by default and
warn loudly when it is enabled. NCQ just doesn't buy enough for its cost
when the cost includes erratic behavior. Only a very small fraction of
error cases actually make it to bugzilla or this mailing list.

Nvidia gents, is there any way (be it NDA or whatever) to get Robert or
any of us the technical documentation?

Thanks.

--
tejun

2008-01-02 17:36:06

by Allen Martin

Subject: RE: sata_nv + ADMA + Samsung disk problem

> The question I had for NVIDIA regarding this that I never got
> answered
> was, is there any reason why we would need a delay when switching
> between NCQ and non-NCQ commands on ADMA, and if not, is
> there any known
> cause that could cause the controller to get into this seemingly
> locked-up state?

When switching from NCQ to non NCQ or vice versa you must make sure all
outstanding commands are completed before issuing the new command. The
hardware doesn't do anything to prevent queued and non queued commands
from going out on the wire at the same time which will certainly cause
some drives to fail.

-Allen
-----------------------------------------------------------------------------------
This email message is for the sole use of the intended recipient(s) and may contain
confidential information. Any unauthorized review, use, disclosure or distribution
is prohibited. If you are not the intended recipient, please contact the sender by
reply email and destroy all copies of the original message.
-----------------------------------------------------------------------------------

2008-01-02 18:57:58

by Jeff Garzik

Subject: Re: sata_nv + ADMA + Samsung disk problem

Allen Martin wrote:
>> The question I had for NVIDIA regarding this that I never got
>> answered
>> was, is there any reason why we would need a delay when switching
>> between NCQ and non-NCQ commands on ADMA, and if not, is
>> there any known
>> cause that could cause the controller to get into this seemingly
>> locked-up state?
>
> When switching from NCQ to non NCQ or vice versa you must make sure all
> outstanding commands are completed before issuing the new command. The
> hardware doesn't do anything to prevent queued and non queued commands
> from going out on the wire at the same time which will certainly cause
> some drives to fail.

The software definitely provides that guarantee for all NCQ-capable
controllers.
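
The guarantee being discussed here amounts to a deferral rule applied before a command is issued. A minimal model of such a rule is sketched below; `link_model` and `must_defer` are hypothetical names used for illustration and are not the libata data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of what is currently in flight on one link. */
struct link_model {
    bool     non_ncq_active;   /* one non-queued command is in flight */
    uint32_t ncq_active_mask;  /* bit n set: NCQ tag n is in flight */
};

/*
 * Return true if the new command has to wait. NCQ commands may join
 * other NCQ commands, but never overlap a non-queued one; a non-queued
 * command (such as a cache flush) needs the link completely empty.
 */
static bool must_defer(const struct link_model *link, bool new_is_ncq)
{
    assert(link != NULL);
    if (new_is_ncq)
        return link->non_ncq_active;
    return link->non_ncq_active || link->ncq_active_mask != 0;
}
```

With a rule of this shape, a flush is held back until every outstanding NCQ tag has completed, which is why overlap on the wire should not be possible from the software side.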

Jeff

2008-01-02 23:29:29

by Allen Martin

Subject: RE: sata_nv + ADMA + Samsung disk problem

> The software definitely provides that guarantee for all NCQ-capable
> controllers.
>

Well if that's not it, it must be some problem entering ADMA legacy
mode. Here's what the Windows driver does:

ADMACtrl.aGO = 0
ADMACtrl.aEIEN = 0
poll {
until ADMAStatus.aLGCY = 1 || timeout
}

2008-01-03 00:22:42

Subject: Re: sata_nv + ADMA + Samsung disk problem

Allen Martin wrote:
>> The software definitely provides that guarantee for all NCQ-capable
>> controllers.
>>
>
> Well if that's not it, it must be some problem entering ADMA legacy
> mode. Here's what the Windows driver does:
>
>
> ADMACtrl.aGO = 0
> ADMACtrl.aEIEN = 0
> poll {
> until ADMAStatus.aLGCY = 1 || timeout
> }

What we're doing to enter legacy mode is essentially:

-wait until ADMA status indicates IDLE bit set (max wait of 1 microsecond)
-clear GO bit in control register
-wait until status indicates LEGACY bit set (max wait of 1 microsecond)

and to enter ADMA mode:

-set GO bit in control register
-wait until status indicates LEGACY bit cleared and IDLE bit set (max
wait of 1 microsecond)
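
The two sequences above can be modelled in a few lines of C. This is a sketch against a simulated port, not driver code: `fake_port` and every function name are hypothetical, the STOPPED and IDLE values follow from the 0x500/0x400 readings quoted in this thread, and the LEGACY and GO bit positions are assumptions made for the model. The poll budget stands in for the 1 microsecond limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ADMA_STAT_IDLE    0x100u /* implied by the thread: 0x500 = STOPPED | IDLE */
#define ADMA_STAT_LEGACY  0x200u /* assumed value */
#define ADMA_STAT_STOPPED 0x400u /* given in the thread */
#define ADMA_CTL_GO       0x080u /* assumed value */

/* Simulated port: STAT follows a CTL write only after `latency` status reads. */
struct fake_port {
    uint16_t ctl;
    uint16_t stat;
    int latency;  /* status reads needed before a CTL write takes effect */
    int pending;  /* status reads left until the last write takes effect */
};

static void fake_apply(struct fake_port *p)
{
    if (p->ctl & ADMA_CTL_GO)
        p->stat = (uint16_t)((p->stat & ~ADMA_STAT_LEGACY) | ADMA_STAT_IDLE);
    else
        p->stat = (uint16_t)(p->stat | ADMA_STAT_LEGACY);
}

static void write_ctl(struct fake_port *p, uint16_t val)
{
    p->ctl = val;
    p->pending = p->latency;
    if (p->pending == 0)
        fake_apply(p);
}

static uint16_t read_stat(struct fake_port *p)
{
    if (p->pending > 0 && --p->pending == 0)
        fake_apply(p);
    return p->stat;
}

/* Poll STAT until (stat & mask) == want, giving up after max_polls reads. */
static bool wait_stat(struct fake_port *p, uint16_t mask, uint16_t want,
                      int max_polls)
{
    assert(p != NULL);
    for (int i = 0; i < max_polls; i++)
        if ((read_stat(p) & mask) == want)
            return true;
    return false;
}

/* Wait for IDLE, clear GO, wait for LEGACY. False if either wait timed out. */
static bool enter_legacy_mode(struct fake_port *p, int max_polls)
{
    bool idle = wait_stat(p, ADMA_STAT_IDLE, ADMA_STAT_IDLE, max_polls);
    write_ctl(p, (uint16_t)(p->ctl & ~ADMA_CTL_GO));
    bool legacy = wait_stat(p, ADMA_STAT_LEGACY, ADMA_STAT_LEGACY, max_polls);
    return idle && legacy;
}

/* Set GO, then wait for LEGACY cleared and IDLE set. */
static bool enter_adma_mode(struct fake_port *p, int max_polls)
{
    write_ctl(p, (uint16_t)(p->ctl | ADMA_CTL_GO));
    return wait_stat(p, ADMA_STAT_LEGACY | ADMA_STAT_IDLE, ADMA_STAT_IDLE,
                     max_polls);
}
```

A port that starts at 0x400 and never reacts fails both waits in `enter_legacy_mode`, which corresponds to the pair of "timeout waiting for ADMA IDLE" and "timeout waiting for ADMA LEGACY" messages in the original report.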

The 1 microsecond timeout is pretty aggressive admittedly, but it
apparently isn't being broken (the only timeouts when switching modes
I've seen are during error handling after a command timeout has already
occurred). What timeout value is the Windows driver using?

Also, I see you are clearing the aEIEN bit when in register mode, while
we're not. Is that important/necessary?

Aside from all this though, in the case of NCQ writes followed by a
cache flush, that sequence of commands won't put us into legacy mode at
all since the cache flush is a no-data command which we should be able
to handle in ADMA mode, from my understanding (correct me if I'm wrong).
So I don't imagine legacy/ADMA mode switch could be the cause of this
problem.

I also saw in my previous investigation that a flush immediately
followed by a write could cause the write to time out as well.

From some of the traces I took previously (posted on LKML as "sata_nv
ADMA controller lockup investigation" way back in Feb 07), what seems to
occur is that when the second command is issued very rapidly (within
less than 20 microseconds, or potentially longer) after the previous
command's completion, the ADMA status changes from 0x500 (STOPPED and
IDLE) to 0x400 (just STOPPED) as it typically does, but then it sticks
there, no interrupt is ever raised, and CPB response flags remain at 0.
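
Read bit by bit, using the split given earlier in this thread (0x500 described as Stopped and Idle, 0x400 as Stopped), the two status values decode as in the sketch below. The macro and function names are hypothetical and chosen for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ADMA_STAT_IDLE    0x100u
#define ADMA_STAT_STOPPED 0x400u

/* The thread describes 0x500 as "Stopped and Idle". */
static_assert((ADMA_STAT_STOPPED | ADMA_STAT_IDLE) == 0x500u,
              "bit split must match the values quoted in the thread");

static bool adma_stat_idle(uint16_t stat)
{
    return (stat & ADMA_STAT_IDLE) != 0;
}

static bool adma_stat_stopped(uint16_t stat)
{
    return (stat & ADMA_STAT_STOPPED) != 0;
}

/* Human-readable summary of the two bits discussed in the thread. */
static const char *adma_stat_describe(uint16_t stat)
{
    bool stopped = adma_stat_stopped(stat);
    bool idle = adma_stat_idle(stat);

    if (stopped && idle)
        return "STOPPED and IDLE";
    if (stopped)
        return "STOPPED only";
    if (idle)
        return "IDLE only";
    return "neither STOPPED nor IDLE";
}
```

Under this reading, the hang shows up as the IDLE bit dropping when the command is appended and never coming back, while STOPPED stays set.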

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [email protected]
Home Page: http://www.roberthancock.com/

2008-01-03 00:28:14

Subject: Re: sata_nv + ADMA + Samsung disk problem

Tejun Heo wrote:
> Robert Hancock wrote:
>> Jeff Garzik wrote:
>>> Tejun Heo wrote:
>>>> Thanks a lot for the detailed explanation. Nvidia ppl, any ideas?
>>>> FLUSH is used regularly. We really need to fix this.
>>>
>>> I reiterate my opinion :) ... We should remove ADMA support from
>>> sata_nv. It's only in a few chips, it's not appearing in any new
>>> chips, and nasty problems have lingered since ADMA support was
>>> introduced.
>>>
>>> Definitely sounds like we should disable ADMA by default for
>>> 2.6.24-rc, too.
>> I wouldn't agree.. It's only in a few chips (CK804/MCP04), but those
>> chips are very common in desktop, workstation, even some server
>> machines. Given the huge number of these chips out there, problem
>> reports have been quite rare.
>
> I agree with Jeff here. Maybe not remove but disable it by default and
> when enabling warn loudly. NCQ just doesn't enough for its cost when
> the cost includes erratic behaviors. Only very small fraction of error
> cases actually make to bugzilla or this mailing list.
>
> Nvidia gents, is there anyway (be it NDA or whatever) to get Robert or
> any of us technical documentation?
>
> Thanks.

Last I heard, NVIDIA management gave the thumbs down to any more NDAs
for ADMA documentation. It would be nice if they would reconsider.
Apparently Jeff does have the docs, though..

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [email protected]
Home Page: http://www.roberthancock.com/

2008-01-03 04:14:17

Subject: Re: sata_nv + ADMA + Samsung disk problem

Robert Hancock wrote:
>
> What we're doing to enter legacy mode is essentially:
>
> -wait until ADMA status indicates IDLE bit set (max wait of 1 microsecond)
> -clear GO bit in control register
> -wait until status indicates LEGACY bit set (max wait of 1 microsecond)
>
> and to enter ADMA mode:
>
> -set GO bit in control register
> -wait until status indicates LEGACY bit cleared and IDLE bit set (max
> wait of 1 microsecond)
..

If there are outstanding TCQ/NCQ commands (any drive),
then this could take (much) longer to enter legacy mode,
as the ADMA engine will wait for them all to finish.

But for normal, "nothing outstanding" mode, it should be very quick.

Cheers

2008-01-03 04:17:34

Subject: Re: sata_nv + ADMA + Samsung disk problem

Robert Hancock wrote:
..
> From some of the traces I took previously (posted on LKML as "sata_nv
> ADMA controller lockup investigation" way back in Feb 07), what seems to
> occur is that when the second command is issued very rapidly (within
> less than 20 microseconds, or potentially longer) after the previous
> command's completion, the ADMA status changes from 0x500 (STOPPED and
> IDLE) to 0x400 (just STOPPED) as it typically does, but then it sticks
> there, no interrupt is ever raised, and CPB response flags remain at 0.
..

Assuming that NVidia got their ADMA core logic from Pacific Digital
(the inventors), then it may have some of the same bugs as the original.

One of those bugs is that the aGO trigger is sampled in a "racey" way,
such that it sometimes may miss a recent addition to the ring.

The *only* way to guarantee things with the original Pacific Digital core
was to (1) always retrigger aGO for a full ring scan with each new addition,
and (2) poll periodically (every half second or so) rather than relying
exclusively on the IRQ actually working..
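
Point (2) works safely only if completion handling is idempotent, so that the interrupt handler and the periodic poll can both run it without completing a command twice. A minimal sketch of that idea follows; `ring_model` and `reap_completions` are hypothetical names and do not reflect the Pacific Digital register layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a command ring with up to 32 slots. */
struct ring_model {
    uint32_t done_mask;    /* bit n set: hardware has finished slot n */
    uint32_t reaped_mask;  /* bit n set: driver already completed slot n */
};

/*
 * Complete every slot the hardware has finished but the driver has not
 * yet seen. Intended to be called from both the IRQ handler and a
 * periodic timer; returns the mask of newly completed slots.
 */
static uint32_t reap_completions(struct ring_model *r)
{
    assert(r != NULL);
    uint32_t fresh = r->done_mask & ~r->reaped_mask;
    r->reaped_mask |= fresh;
    return fresh;
}
```

If an interrupt is lost, the next timer tick finds the same slots in `fresh`; if the interrupt did arrive, the timer call simply returns 0.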

Dunno about the NVidia version.

Cheers

2008-01-03 04:55:20

Subject: Re: sata_nv + ADMA + Samsung disk problem

Mark Lord wrote:
> Robert Hancock wrote:
> ..
>> From some of the traces I took previously (posted on LKML as "sata_nv
>> ADMA controller lockup investigation" way back in Feb 07), what seems
>> to occur is that when the second command is issued very rapidly
>> (within less than 20 microseconds, or potentially longer) after the
>> previous command's completion, the ADMA status changes from 0x500
>> (STOPPED and IDLE) to 0x400 (just STOPPED) as it typically does, but then
>> it sticks there, no interrupt is ever raised, and CPB response flags
>> remain at 0.
> ..
>
> Assuming that NVidia got their ADMA core logic from Pacific Digital
> (the inventors), then it may have some of the same bugs as the original.
>
> One of those bugs is that the aGO trigger is sampled in a "racey" way,
> such that it sometimes may miss a recent addition to the ring.
>
> The *only* way to guarantee things with the original Pacific Digital core
> was to (1) always retrigger aGO for a full ring scan with each new
> addition,
> and (2) poll periodically (every half second or so) rather than relying
> exclusively on the IRQ actually working..
>
> Dunno about the NVidia version.

Theirs works rather differently - the GO bit is there, but there's
another append register which is used to tell the controller that a new
tag has been added to the CPB list.

The only thing we currently use the GO bit for is to switch between ADMA
and port register mode. Could be there's something we need to do there,
though, who knows..

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [email protected]
Home Page: http://www.roberthancock.com/

2008-01-03 15:48:54

Subject: Re: sata_nv + ADMA + Samsung disk problem

Robert Hancock wrote:
> Mark Lord wrote:
>> Robert Hancock wrote:
>> ..
>>> From some of the traces I took previously (posted on LKML as
>>> "sata_nv ADMA controller lockup investigation" way back in Feb 07),
>>> what seems to occur is that when the second command is issued very
>>> rapidly (within less than 20 microseconds, or potentially longer)
>>> after the previous command's completion, the ADMA status changes from
>>> 0x500 (STOPPED and IDLE) to 0x400 (just STOPPED) as it typically does,
>>> but then it sticks there, no interrupt is ever raised, and CPB
>>> response flags remain at 0.
>> ..
>>
>> Assuming that NVidia got their ADMA core logic from Pacific Digital
>> (the inventors), then it may have some of the same bugs as the original.
>>
>> One of those bugs is that the aGO trigger is sampled in a "racey" way,
>> such that it sometimes may miss a recent addition to the ring.
>>
>> The *only* way to guarantee things with the original Pacific Digital core
>> was to (1) always retrigger aGO for a full ring scan with each new
>> addition,
>> and (2) poll periodically (every half second or so) rather than relying
>> exclusively on the IRQ actually working..
>>
>> Dunno about the NVidia version.
>
> Theirs works rather differently - the GO bit is there, but there's
> another append register which is used to tell the controller that a new
> tag has been added to the CPB list.
..

The PacDigi core uses a "search count" register for that purpose,
but the buggy nature of the core required that it always be set
to "2 * ring_size" to ensure nothing got missed.

Here's some comments from the original ADMA driver.
Maybe something from here might help with the NV stuff, too.

// There is a chance that the chip will skip over a CPB if a SERVICE interrupt
// occurs while it's reading the CPB header. This won't cause us to get
// stuck anywhere, but it might slow down execution of the new CPB if
// it has to wait for the next time we hit aGO. So.. Dxxx/Dxxx suggest
// that all we need to do is tell the chip to do two passes around the ring
// from an aGO instead of one pass, so that it will find the "missed" CPB
// on the second pass. This isn't as bad as it first looks.
//
writew(channel->num_cpbs * 2, &adma_regs->cpb_search_count);

Or again, the NV stuff may be completely different (?).

2008-01-03 15:49:55

Subject: Re: sata_nv + ADMA + Samsung disk problem

Mark Lord wrote:
> Robert Hancock wrote:
>> Mark Lord wrote:
>>> Robert Hancock wrote:
>>> ..
>>>> From some of the traces I took previously (posted on LKML as
>>>> "sata_nv ADMA controller lockup investigation" way back in Feb 07),
>>>> what seems to occur is that when the second command is issued very
>>>> rapidly (within less than 20 microseconds, or potentially longer)
>>>> after the previous command's completion, the ADMA status changes
>>>> from 0x500 (STOPPED and IDLE) to 0x400 (just STOPPED) as it
>>>> typically does, but then it sticks there, no interrupt is ever
>>>> raised, and CPB response flags remain at 0.
>>> ..
>>>
>>> Assuming that NVidia got their ADMA core logic from Pacific Digital
>>> (the inventors), then it may have some of the same bugs as the original.
>>>
>>> One of those bugs is that the aGO trigger is sampled in a "racey" way,
>>> such that it sometimes may miss a recent addition to the ring.
>>>
>>> The *only* way to guarantee things with the original Pacific Digital core
>>> was to (1) always retrigger aGO for a full ring scan with each new addition,
>>> and (2) poll periodically (every half second or so) rather than relying
>>> exclusively on the IRQ actually working..
>>>
>>> Dunno about the NVidia version.
>>
>> Theirs works rather differently - the GO bit is there, but there's
>> another append register which is used to tell the controller that a
>> new tag has been added to the CPB list.
> ..
>
> The PacDigi core uses a "search count" register for that purpose,
> but the buggy nature of the core required that it always be set
> to "2 * ring_size" to ensure nothing got missed.
>
> Here's some comments from the original ADMA driver.
> Maybe something from here might help with the NV stuff, too.
>
> // There is a chance that the chip will skip over a CPB if a SERVICE interrupt
> // occurs while it's reading the CPB header. This won't cause us to get
> // stuck anywhere, but it might slow down execution of the new CPB if
> // it has to wait for the next time we hit aGO. So.. Dxxx/Dxxx suggest
> // that all we need to do is tell the chip to do two passes around the ring
> // from an aGO instead of one pass, so that it will find the "missed" CPB
> // on the second pass. This isn't as bad as it first looks.
> //
> writew(channel->num_cpbs * 2, &adma_regs->cpb_search_count);
>
> Or again, the NV stuff may be completely different (?).
..

Another thing about the PacDigi core: one has to be very careful
to avoid sequential accesses to sequential PCI locations when
programming the chip -- it cannot handle merged register writes.

So for any group of sequentially laid out registers, the code has
to ensure it never writes two adjacent registers in sequence..

-ml
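
For reference, a minimal sketch of the "two passes around the ring" workaround from the quoted driver comment, written against a fake register block so it compiles in user space. Only `cpb_search_count` and the `num_cpbs * 2` rule come from the quoted code; the struct layout, the `ADMA_CTL_GO` bit value and the `adma_kick()` helper are assumptions made up for illustration.

```c
#include <stdint.h>

/* Hypothetical register block; the real chip layout is not shown in this thread. */
struct fake_adma_regs {
    volatile uint16_t control;          /* assumed to hold the aGO bit */
    volatile uint16_t pad;              /* spacer so the two registers below/above are not adjacent */
    volatile uint16_t cpb_search_count; /* name taken from the quoted driver */
};

#define ADMA_CTL_GO 0x0080u /* assumed bit position, illustration only */

/* Re-arm the engine after appending a CPB: request two full passes
 * around the ring so a CPB skipped on the first pass is found on the second. */
static void adma_kick(struct fake_adma_regs *regs, unsigned int num_cpbs)
{
    regs->cpb_search_count = (uint16_t)(num_cpbs * 2);
    regs->control = (uint16_t)(regs->control | ADMA_CTL_GO);
}
```

Calling this on every new addition corresponds to point (1) above; the periodic polling in point (2) would still be needed separately.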

2008-01-03 21:14:21

by Benjamin Herrenschmidt

Subject: Re: sata_nv + ADMA + Samsung disk problem

> Another thing about the PacDigi core: one has to be very careful
> to avoid sequential accesses to sequential PCI locations when
> programming the chip -- it cannot handle merged register writes.
>
> So for any group of sequentially laid out registers, the code has
> to ensure it never writes two adjacent registers in sequence..

Ugh ? Write combining isn't permitted on normal registers afaik...

Ben.

2008-01-04 00:44:21

by Allen Martin

Subject: RE: sata_nv + ADMA + Samsung disk problem

> > Dunno about the NVidia version.
>
> Theirs works rather differently - the GO bit is there, but there's
> another append register which is used to tell the controller that a
> new tag has been added to the CPB list.
>
> The only thing we currently use the GO bit for is to switch between
> ADMA and port register mode. Could be there's something we need to do
> there, though, who knows..
>

You shouldn't ever need to touch GO other than the ADMA / legacy mode
switch as you say.

The NVIDIA ADMA hw is not based on the Pacific Digital core.

2008-01-04 01:43:58

Subject: Re: sata_nv + ADMA + Samsung disk problem

Benjamin Herrenschmidt wrote:
>> Another thing about the PacDigi core: one has to be very careful
>> to avoid sequential accesses to sequential PCI locations when
>> programming the chip -- it cannot handle merged register writes.
>>
>> So for any group of sequentially laid out registers, the code has
>> to ensure it never writes two adjacent registers in sequence..
>
> Ugh ? Write combining isn't permitted on normal registers afaik...
>
> Ben.

Byte merging can be done by the chipset on MMIO writes (merging multiple
8 or 16-bit writes into a single 32-bit cycle).
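
To illustrate what such a merge looks like from the device's side, here is a small user-space sketch (not driver code). It assumes little-endian byte lanes, as on PCI, and the helper name `merge16_pair()` is made up for the example.

```c
#include <stdint.h>

/* Value a device would see if the chipset merged a 16-bit write of `lo`
 * at offset N and a 16-bit write of `hi` at offset N+2 into a single
 * 32-bit cycle at offset N (little-endian byte lanes assumed). */
static uint32_t merge16_pair(uint16_t lo, uint16_t hi)
{
    return (uint32_t)lo | ((uint32_t)hi << 16);
}
```

A device that only expects separate 16-bit accesses to each register may not decode the single merged cycle correctly, which is the kind of failure described for the PacDigi core.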

2008-01-04 02:53:04

Subject: Re: sata_nv + ADMA + Samsung disk problem

Allen Martin wrote:
>
>>> Dunno about the NVidia version.
>> Theirs works rather differently - the GO bit is there, but there's
>> another append register which is used to tell the controller that a
>> new tag has been added to the CPB list.
>>
>> The only thing we currently use the GO bit for is to switch between
>> ADMA and port register mode. Could be there's something we need to do
>> there, though, who knows..
>>
>
> You shouldn't ever need to touch GO other than the ADMA / legacy mode
> switch as you say.
>
> The NVIDIA ADMA hw is not based on the Pacific Digital core.

That answers that question, I guess. Still guessing at why the
controller would get stuck in IDLE state with no interrupt raised, then..
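
For anyone following the status values in this thread, a small decoder sketch. The two bit values are inferred from the description quoted earlier (0x500 = STOPPED and IDLE, 0x400 = just IDLE); the macro and function names are made up here, and all other status bits are ignored.

```c
#include <stdbool.h>
#include <stdint.h>

#define ADMA_STAT_STOPPED 0x100u /* inferred: 0x500 with the IDLE bit removed */
#define ADMA_STAT_IDLE    0x400u /* inferred: "0x400 (just IDLE)" */

/* Matches the stat=0x400 value from the timeout messages:
 * IDLE set, STOPPED clear. */
static bool adma_idle_not_stopped(uint32_t status)
{
    return (status & ADMA_STAT_IDLE) && !(status & ADMA_STAT_STOPPED);
}
```

Note that 0x400 is also the normal state right after a command is issued, so a check like this only indicates a hang when it persists with no interrupt and the CPB response flags still at 0.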

2008-01-04 05:52:38

by Benjamin Herrenschmidt

Subject: Re: sata_nv + ADMA + Samsung disk problem

On Thu, 2008-01-03 at 19:43 -0600, Robert Hancock wrote:
> Benjamin Herrenschmidt wrote:
> >> Another thing about the PacDigi core: one has to be very careful
> >> to avoid sequential accesses to sequential PCI locations when
> >> programming the chip -- it cannot handle merged register writes.
> >>
> >> So for any group of sequentially laid out registers, the code has
> >> to ensure it never writes two adjacent registers in sequence..
> >
> > Ugh ? Write combining isn't permitted on normal registers afaik...
> >
> > Ben.
>
> Byte merging can be done by the chipset on MMIO writes (merging multiple
> 8 or 16-bit writes into a single 32-bit cycle).

That is true, if they are consecutive. You mean that this HW is f*cked
up enough to actually have separate 8/16 bits registers that are
contiguous ? Yuck... I'm afraid you -have- to add reads in between to
guarantee that no merging will occur.

Cheers,

Ben.
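
A rough sketch of the "read in between" approach, using a fake register block in ordinary memory so it compiles outside the kernel. `reg_write16()` and `reg_read16()` are hypothetical stand-ins for the kernel's `writew()`/`readw()`, and the two register names are invented; the point is only the ordering: write, read back, write.

```c
#include <stdint.h>

/* Two contiguous 16-bit registers (names invented for the example). */
struct fake_regs {
    volatile uint16_t reg_a;
    volatile uint16_t reg_b;
};

/* Stand-ins for writew()/readw(); in a driver these would be MMIO accessors. */
static void reg_write16(volatile uint16_t *reg, uint16_t val) { *reg = val; }
static uint16_t reg_read16(const volatile uint16_t *reg) { return *reg; }

/* Program both registers with a read separating the two writes.
 * On real MMIO, a read from the device forces earlier posted writes
 * to complete, so the writes cannot be merged into one 32-bit cycle.
 * Here, in plain memory, only the ordering is being shown. */
static void program_pair(struct fake_regs *regs, uint16_t a, uint16_t b)
{
    reg_write16(&regs->reg_a, a);
    (void)reg_read16(&regs->reg_a); /* read back before touching the neighbour */
    reg_write16(&regs->reg_b, b);
}
```

The cost is one extra bus read per adjacent pair, which is slow but is paid only when programming the chip, not per data transfer.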

2008-01-08 00:11:56

Subject: Re: sata_nv + ADMA + Samsung disk problem

Allen Martin wrote:
>
>>> Dunno about the NVidia version.
>> Theirs works rather differently - the GO bit is there, but there's
>> another append register which is used to tell the controller that a
>> new tag has been added to the CPB list.
>>
>> The only thing we currently use the GO bit for is to switch between
>> ADMA and port register mode. Could be there's something we need to do
>> there, though, who knows..
>>
>
> You shouldn't ever need to touch GO other than the ADMA / legacy mode
> switch as you say.
>
> The NVIDIA ADMA hw is not based on the Pacific Digital core.

Gabor, I just noticed you said that it worked OK in 2.6.20, yet 2.6.22
fails. 2.6.20 had ADMA support as well, so I wonder what change started
causing the problem. Would it be possible for you to do a git bisect (or
at least try 2.6.21 to try and narrow it down)?

2008-01-11 23:18:23

Subject: Re: sata_nv + ADMA + Samsung disk problem

On Mon, Jan 07, 2008 at 06:10:29PM -0600, Robert Hancock wrote:

> Gabor, I just noticed you said that it worked OK in 2.6.20, yet 2.6.22
> fails. 2.6.20 had ADMA support as well, so I wonder what change started
> causing the problem. Would it be possible for you to do a git bisect (or
> at least try 2.6.21 to try and narrow it down)?

I've now booted 2.6.21.7, we'll see. The problem with the bisection is
that I can't explicitly trigger the bug, so I can't say for sure whether
a kernel is good or just needs more time to trigger. The average
uptime of this machine is just a couple of hours a day.

For example, with 2.6.24-rc6 it took over 3 hours for the first disk to
trigger the bug and the second disk needed more than 7 hours. This
machine is seldom turned on for that long.

Gabor

--
---------------------------------------------------------
MTA SZTAKI Computer and Automation Research Institute
Hungarian Academy of Sciences
---------------------------------------------------------

2008-01-12 01:11:09